  • Stars: 640
  • Rank: 69,928 (Top 2 %)
  • Language: Java
  • License: Apache License 2.0
  • Created about 2 years ago
  • Updated 6 months ago

Repository Details

Amoro is a Lakehouse management system built on open data lake formats.

Arctic is a Lakehouse management system with an open architecture. On top of open data lake formats, it adds optimizations for streaming and upsert scenarios, along with a set of pluggable self-optimizing mechanisms and management services. Arctic helps data platforms, tools, and products quickly build out-of-the-box Lakehouses that unify streaming and batch processing.

What is Arctic

Currently, Arctic is a Lakehouse management system built on top of the Iceberg format. Thanks to the thriving Apache Iceberg ecosystem, Arctic can be used on many kinds of data lakes, on premises or in the cloud, with a variety of engines. A few concepts are worth knowing before you go deeper:

  • AMS and optimizers - The Arctic Management Service (AMS) provides management features, including self-optimizing mechanisms that run on optimizers. Optimizers can be scaled on demand and scheduled on different platforms.
  • Multiple formats - Arctic uses table formats much as MySQL or ClickHouse use storage engines, to suit different scenarios. Two formats have been available since Arctic v0.4:
    • Iceberg format - for format details and usage with different engines, see the Iceberg Docs
    • Mixed streaming format - if you are interested in advanced features such as auto-bucket, LogStore, Hive compatibility, and strict primary key constraints, see the Arctic Mixed Iceberg format and Mixed Hive format

Arctic features

  • Defining keys - supports primary keys with strict constraints; more key types are planned
  • Self-optimizing - asynchronous self-optimization, transparent to users, keeps the lakehouse fresh and healthy
  • Management features - a dashboard UI for catalog and table management, a SQL terminal, and a wide range of metrics
  • Format compatibility - Hive and Iceberg format compatibility allows reading and writing through the native Hive and Iceberg connectors
  • Better data pipeline SLA - uses a LogStore such as Kafka to accelerate streaming data pipelines to millisecond- or second-level latency
  • Better OLAP performance - provides an auto-bucket feature for better compaction and merge-on-read performance
  • Concurrent conflict resolution - Flink and Spark can write data concurrently without worrying about conflicts
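
The primary-key and merge-on-read features above can be pictured with a small sketch. It is a conceptual illustration only, not Arctic's API: the class, method, and field names are hypothetical, and a real table merges data files rather than in-memory maps. It shows a reader applying keyed changes, in order, on top of base rows so that each primary key yields at most one row.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: hypothetical names, not Arctic's API.
final class MergeOnReadSketch {

    // A change addressed by primary key: an upsert carries a value, a delete does not.
    static final class Change {
        final String key;
        final String value;
        final boolean isDelete;

        private Change(String key, String value, boolean isDelete) {
            this.key = key;
            this.value = value;
            this.isDelete = isDelete;
        }

        static Change upsert(String key, String value) {
            return new Change(key, value, false);
        }

        static Change delete(String key) {
            return new Change(key, null, true);
        }
    }

    // Builds the reader's view: base rows with changes applied in order.
    // The base map is copied, so the stored "base" data is left untouched.
    static Map<String, String> mergeOnRead(Map<String, String> base, List<Change> changes) {
        Map<String, String> view = new LinkedHashMap<>(base);
        for (Change change : changes) {
            if (change.isDelete) {
                view.remove(change.key);
            } else {
                view.put(change.key, change.value); // insert or overwrite by primary key
            }
        }
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("1", "alice");
        base.put("2", "bob");

        List<Change> changes = Arrays.asList(
                Change.upsert("2", "bobby"),
                Change.delete("1"),
                Change.upsert("3", "carol"));

        System.out.println(mergeOnRead(base, changes)); // prints {2=bobby, 3=carol}
    }
}
```

In this picture, the fewer unmerged changes a reader has to apply, the cheaper the read, which is one way to understand why background compaction matters for merge-on-read performance.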

Modules

Arctic contains the following modules:

  • arctic-core contains core abstractions and common implementations used by other modules
  • arctic-flink is the module for integrating with Apache Flink (use arctic-flink-runtime for a shaded version)
  • arctic-spark is the module for integrating with Apache Spark (use arctic-spark-runtime for a shaded version)
  • arctic-trino provides query integration with Apache Trino and is built on JDK 17
  • arctic-ams is the Arctic Management Service (AMS) module
    • ams-api contains the AMS Thrift API
    • ams-dashboard is the dashboard frontend for AMS
    • ams-server is the backend server for AMS
    • ams-optimizer provides the default optimizer implementation

Building

Arctic is built using Maven with Java 1.8, plus Java 17 for the Trino module only.

  • Building the Trino module requires a toolchains.xml file in the ${user.home}/.m2/ directory with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
    <toolchain>
        <type>jdk</type>
        <provides>
            <version>17</version>
            <vendor>sun</vendor>
        </provides>
        <configuration>
            <jdkHome>${YourJDK17Home}</jdkHome>
        </configuration>
    </toolchain>
</toolchains>
  • To build and run tests: mvn package -P toolchain
  • To skip tests: mvn -DskipTests package -P toolchain
  • To package without the Trino module and its Java 17 dependency: mvn clean package -DskipTests -pl '!trino'
  • To build with Hadoop 2.x (the default is 3.x): mvn clean package -DskipTests -Dhadoop=v2
  • To specify the Flink version for the optimizer (the default is 1.14; 1.15 and 1.16 are also available): mvn clean package -DskipTests -Doptimizer.flink=1.15
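
The build options listed above can be gathered into one small helper, for example when scripting builds from Java. This is a minimal sketch with hypothetical class and method names. It assumes the individual flags can be combined in a single invocation, which the list above does not state explicitly, and it only assembles the argument list without running Maven.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a Maven argument list from the flags listed above.
final class ArcticBuildCommand {

    // optimizerFlink may be null to keep the default Flink version for the optimizer.
    static List<String> mavenArgs(boolean skipTests, boolean withTrino,
                                  boolean hadoop2, String optimizerFlink) {
        List<String> args = new ArrayList<>();
        args.add("mvn");
        args.add("clean");
        args.add("package");
        if (skipTests) {
            args.add("-DskipTests");
        }
        if (withTrino) {
            // The Trino module needs the JDK 17 toolchain profile.
            args.add("-P");
            args.add("toolchain");
        } else {
            // Exclude the Trino module; no shell quoting is needed in an argument list.
            args.add("-pl");
            args.add("!trino");
        }
        if (hadoop2) {
            args.add("-Dhadoop=v2");
        }
        if (optimizerFlink != null) {
            args.add("-Doptimizer.flink=" + optimizerFlink);
        }
        return args;
    }

    public static void main(String[] args) {
        // prints: mvn clean package -DskipTests -pl !trino -Dhadoop=v2 -Doptimizer.flink=1.15
        System.out.println(String.join(" ", mavenArgs(true, false, true, "1.15")));
    }
}
```

The returned list can be passed to java.lang.ProcessBuilder as is. When typing the same command in a shell, keep the quotes around '!trino' as shown in the list above.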

Engines supported

Arctic supports the following processing engines:

Processing Engine    Version
Flink                1.12.x, 1.14.x and 1.15.x
Spark                3.1, 3.2, 3.3
Trino                406

Quickstart

Visit https://arctic.netease.com/ch/quickstart/setup/ to quickly explore what Arctic can do.

Join Community

If you are interested in Lakehouses and data lake formats, you are welcome to join our community. We welcome organizations, teams, and individuals to grow together with us, and we sincerely hope to help users make better use of data lake formats through open source.

Join the Arctic WeChat Group: add "kllnn999" as a friend on WeChat and specify "Arctic lover".
