zhangbutao / arctic

This project forked from apache/amoro

Arctic is a streaming lake warehouse service open sourced by NetEase

Home Page: https://arctic.netease.com/

License: Apache License 2.0

JavaScript 0.21% Python 0.09% Java 93.23% Scala 3.03% TypeScript 0.75% CSS 0.04% ANTLR 0.73% Thrift 0.06% HTML 0.06% Vue 1.47% Dockerfile 0.07% Less 0.07% Shell 0.18%

arctic's Introduction

Arctic is a LakeHouse management system with an open architecture. Built on top of open data lake formats, it provides additional optimizations for streaming and upsert scenarios, as well as a set of pluggable self-optimizing mechanisms and management services. Arctic helps data platforms, tools and products quickly build out-of-the-box LakeHouses that unify streaming and batch processing.

What is arctic

Currently, Arctic is a LakeHouse management system built on top of the Iceberg format. Benefiting from the thriving ecosystem of Apache Iceberg, Arctic can be used on many kinds of data lakes, on premises or in the cloud, with a variety of engines. Several concepts are worth knowing before you go deeper:

Introduce

  • AMS and optimizers - Arctic Management Service (AMS) provides management features, including self-optimizing mechanisms that run on optimizers, which can be scaled on demand and scheduled on different platforms.
  • Multiple formats - Arctic uses table formats much as MySQL or ClickHouse use storage engines, so that different scenarios can be served. Two formats have been available since Arctic v0.4.
    • Iceberg format - learn more about the Iceberg format and its usage with different engines: Iceberg Docs
    • Mixed streaming format - if you are interested in advanced features such as auto-bucket, LogStore, Hive compatibility and strict PK constraints, learn about the Arctic Mixed Iceberg format and Mixed Hive format

Arctic features

  • Defining keys - supports defining primary keys with strict constraints, with more types of keys planned for the future
  • Self-optimizing - asynchronous self-optimization mechanisms, transparent to users, keep the lakehouse fresh and healthy
  • Management features - a dashboard UI supporting catalog/table management, a SQL terminal and all kinds of metrics
  • Format compatibility - Hive/Iceberg format compatibility means tables can be written and read through the native Hive/Iceberg connectors
  • Better data pipeline SLA - uses a LogStore such as Kafka to accelerate streaming data pipelines to millisecond/second latency
  • Better OLAP performance - provides an auto-bucket feature for better compaction and merge-on-read performance
  • Concurrent conflict resolution - Flink and Spark can write data concurrently without worrying about conflicts

Modules

Arctic contains modules as below:

  • arctic-core contains core abstractions and common implementations for other modules
  • arctic-flink is the module for integrating with Apache Flink (use arctic-flink-runtime for a shaded version)
  • arctic-spark is the module for integrating with Apache Spark (use arctic-spark-runtime for a shaded version)
  • arctic-trino provides query integration with Trino, built on JDK 11
  • arctic-optimizing exposes the optimizing container/group API and provides a default implementation
  • arctic-ams is the Arctic meta service module
    • ams-api contains ams thrift api
    • ams-dashboard is the dashboard frontend for ams
    • ams-server is the backend server for ams

Building

Arctic is built using Maven with Java 1.8 and Java 11 (only for the Trino module).

  • To build the Trino module, configure toolchains.xml in the ${user.home}/.m2/ directory with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
    <toolchain>
        <type>jdk</type>
        <provides>
            <version>11</version>
            <vendor>sun</vendor>
        </provides>
        <configuration>
            <jdkHome>${yourJdk11Home}</jdkHome>
        </configuration>
    </toolchain>
</toolchains>
  • To invoke a build and run tests: mvn package -P toolchain
  • To skip tests: mvn -DskipTests package -P toolchain
  • To package without the Trino module and its Java 11 dependency: mvn clean package -DskipTests -pl '!trino'
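
The build steps above can be scripted. The following is a minimal Python sketch, not part of Arctic itself: the helper names (render_toolchains, write_toolchains, read_jdk_home, maven_command) are hypothetical and introduced only for illustration. It renders the toolchains.xml shown above for a given JDK 11 home and assembles the three Maven invocations listed in this section as argument lists.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Optional
from xml.sax.saxutils import escape

# Same structure as the toolchains.xml shown in this section;
# only the JDK home is filled in.
TOOLCHAINS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
    <toolchain>
        <type>jdk</type>
        <provides>
            <version>11</version>
            <vendor>sun</vendor>
        </provides>
        <configuration>
            <jdkHome>{jdk_home}</jdkHome>
        </configuration>
    </toolchain>
</toolchains>
"""


def render_toolchains(jdk11_home: str) -> str:
    """Return toolchains.xml content pointing at the given JDK 11 home."""
    return TOOLCHAINS_TEMPLATE.format(jdk_home=escape(jdk11_home))


def write_toolchains(jdk11_home: str, m2_dir: Path) -> Path:
    """Write toolchains.xml into m2_dir (normally ~/.m2) and return its path."""
    m2_dir.mkdir(parents=True, exist_ok=True)
    target = m2_dir / "toolchains.xml"
    target.write_text(render_toolchains(jdk11_home), encoding="utf-8")
    return target


def read_jdk_home(xml_text: str) -> Optional[str]:
    """Parse toolchains.xml content and return the configured jdkHome."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return root.findtext("toolchain/configuration/jdkHome")


def maven_command(skip_tests: bool = False, with_trino: bool = True) -> List[str]:
    """Build the argument list for the Maven invocations listed above."""
    if not with_trino:
        # Mirrors: mvn clean package -DskipTests -pl '!trino'
        return ["mvn", "clean", "package", "-DskipTests", "-pl", "!trino"]
    cmd = ["mvn"]
    if skip_tests:
        cmd.append("-DskipTests")
    cmd += ["package", "-P", "toolchain"]
    return cmd
```

For example, write_toolchains("/path/to/jdk11", Path.home() / ".m2") followed by subprocess.run(maven_command(skip_tests=True), check=True) would correspond to the "skip tests" build; the JDK path here is a placeholder to be replaced with a real JDK 11 installation.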

Engines supported

Arctic supports multiple processing engines, as listed below:

Processing Engine    Version
Flink                1.12.x, 1.14.x and 1.15.x
Spark                2.3, 3.1
Trino                380

Quickstart

Visit https://arctic.netease.com/ch/docker-quickstart/ to quickly explore what Arctic can do.

Join Community

If you are interested in Lakehouse and Data Lake Format technologies, you are welcome to join our community. We welcome organizations, teams and individuals to grow together with us, and we sincerely hope to help users make better use of Data Lake Formats through open source.

Join the Arctic WeChat Group: add "kllnn999" as a friend on WeChat and mention "Arctic lover".

arctic's People

Contributors

zhoujinsong, hzluting, hameizi, hellojinsilei, baiyangtx, huiyuanz, shidayang, wangtaohz, huangfru, zstraw, yesorno828, shendanfengg, aireed, lklhdu, stenicholas, zhongqishang, majin1102, jamesishuang, wuqiao, zhendongbai, lvyanquan, chenhong02, nicochen, xbaith, mcruijie
