
apache / inlong

Apache InLong - a one-stop, full-scenario integration framework for massive data

Home Page: https://inlong.apache.org/

License: Apache License 2.0

Languages: Java 87.10%, JavaScript 4.25%, TypeScript 3.71%, C++ 2.32%, Go 1.39%, Shell 0.59%, CSS 0.25%, Scala 0.08%, CMake 0.08%, Python 0.07%, Dockerfile 0.06%, Less 0.04%, Batchfile 0.03%, Smarty 0.02%, HTML 0.01%, Makefile 0.01%
Topics: inlong, one-stop-service, data-streaming, event-streaming, framework, massive-data-integration, full-scenario-service

inlong's Introduction


What is Apache InLong?


Apache InLong is a one-stop, full-scenario integration framework for massive data that supports Data Ingestion, Data Synchronization and Data Subscription, and it provides automatic, secure and reliable data transmission capabilities. InLong also supports both batch and stream data processing at the same time, which offers great power to build data analysis, modeling and other real-time applications based on streaming data.

InLong (应龙) is a divine beast in Chinese mythology that guides rivers into the sea, which serves as a metaphor for how the InLong system reports data streams.

InLong was originally built at Tencent, where it has served online businesses for more than 8 years, supporting massive data reporting services (more than 80 trillion records per day) in big data scenarios. The platform integrates 5 modules: Ingestion, Convergence, Caching, Sorting, and Management. A business only needs to provide the data sources, data service quality, data landing clusters, and data landing formats, and the data is then continuously pushed from the source to the target cluster, which meets the data reporting requirements of business big data scenarios.

For more information, please visit our project documentation at https://inlong.apache.org/.

Features

Apache InLong offers a variety of features:

  • Ease of Use: a SaaS-based service platform. Users can easily and quickly report, transfer, and distribute data by publishing and subscribing to data based on topics.
  • Stability & Reliability: derived from the actual online production environment. It delivers high-performance processing capabilities for 10 trillion-level data streams and highly reliable services for 100 billion-level data streams.
  • Comprehensive Features: supports various types of data access methods and can be integrated with different types of Message Queue (MQ). It also provides real-time data extract, transform, and load (ETL) and sorting capabilities based on rules. InLong also allows users to plug features to extend system capabilities.
  • Service Integration: provides unified system monitoring and alert services. It provides fine-grained metrics to facilitate data visualization. Users can view the running status of queues and topic-based data statistics in a unified data metric platform. Users can also configure the alert service based on their business requirements so that users can be alerted when errors occur.
  • Scalability: adopts a pluggable architecture that allows you to plug modules into the system based on specific protocols. Users can replace components and add features based on their business requirements.

When should I use InLong?

InLong aims to provide a one-stop, full-scenario integration framework for massive data, with which users can easily build stream-based data applications. It supports Data Ingestion, Data Synchronization, and Data Subscription at the same time. It is well suited to environments that need to build a data reporting platform quickly, to ultra-large-scale data reporting environments, and to environments that need to automatically sort and land the reported data.

You can use InLong in the following ways:

Supported Data Nodes (Updating)

Type          Name               Version
Extract Node  Auto Push          None
              File               None
              Kafka              2.x
              MongoDB            >= 3.6
              MQTT               >= 3.1
              MySQL              5.6, 5.7, 8.0.x
              Oracle             11, 12, 19
              PostgreSQL         9.6, 10, 11, 12
              Pulsar             2.8.x
              Redis              2.6.x
              SQLServer          2012, 2014, 2016, 2017, 2019
Load Node     Auto Consumption   None
              ClickHouse         20.7+
              Elasticsearch      6.x, 7.x
              Greenplum          4.x, 5.x, 6.x
              HBase              2.2.x
              HDFS               2.x, 3.x
              Hive               1.x, 2.x, 3.x
              Iceberg            0.12.x
              Hudi               0.12.x
              Kafka              2.x
              MySQL              5.6, 5.7, 8.0.x
              Oracle             11, 12, 19
              PostgreSQL         9.6, 10, 11, 12
              SQLServer          2012, 2014, 2016, 2017, 2019
              TDSQL-PostgreSQL   10.17
              Doris              >= 0.13
              StarRocks          >= 2.0
              Kudu               >= 1.12.0
              Redis              >= 3.0

Build InLong

More detailed instructions can be found in the Quick Start section of the documentation.

Requirements:

CodeStyle:

mvn spotless:apply

Compile and install:

mvn clean install -DskipTests

(Optional) Compile using docker image:

docker pull maven:3.6-openjdk-8
docker run -v `pwd`:/inlong  -w /inlong maven:3.6-openjdk-8 mvn clean install -DskipTests

After the build completes successfully, you can find the distribution file in inlong-distribution/target.

Deploy InLong

Develop InLong

Contribute to InLong

Contact Us

Documentation

License

© Contributors Licensed under an Apache-2.0 license.

inlong's People

Contributors

aloyszhang, apocalypsewan, baomingyu, bluewang, ciscozhou, dockerzhang, doleyzi, e-mhui, emsnap, featzhang, fuweng11, ganfengtan, gong, gosonzhang, haifxu, healchow, justinwwhuang, kipshi, leezng, lucaspeng12138, luchunliang, lvjiancheng, pocozh, shink, technoboy-, thesumery, tszkitlo40, vernedeng, woofyzhao, yunqingmoswu


inlong's Issues

[INLONG-26] correct spelling (difftime-> diffTime)

ConsumerSamplePrint#printExceptionCaught difftime -> diffTime
BrokerSamplePrint#printExceptionCaught difftime -> diffTime
DiskSamplePrint#printExceptionCaught difftime -> diffTime
BdbStoreSamplePrint#printExceptionCaught difftime -> diffTime

JIRA link - [INLONG-26] created by technoboy

[INLONG-6] Improvements&Corrections for Translation

I have uploaded three articles: the TubeMQ Lib Interface Usage, the TubeMQ console operation guide, and the TubeMQ VS Kafka Performance Comparison Test Summary.

Further suggestions on the translation are very welcome.

JIRA link - [INLONG-6] created by TodoElMundo

[INLONG-35] check illegal package's field value

When a message is decoded, the number of ByteBuffer lists (listSize) and the length (len) of each ByteBuffer in the received RPC package may be negative. These values must be checked for negativity before they are used.
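The check described above could look roughly like the following. This is only a minimal sketch: the frame layout (an int listSize followed by len-prefixed entries) and the class name FrameDecoderSketch are assumptions made for illustration, not the actual TubeMQ RPC format or code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical frame layout, assumed for illustration only:
// [int listSize] followed, for each entry, by [int len][len bytes].
final class FrameDecoderSketch {

    /** Decodes a frame, rejecting negative or oversized listSize/len values. */
    static List<ByteBuffer> decode(ByteBuffer in) {
        if (in.remaining() < Integer.BYTES) {
            throw new IllegalArgumentException("truncated frame: missing listSize");
        }
        int listSize = in.getInt();
        if (listSize < 0) {
            throw new IllegalArgumentException("negative listSize: " + listSize);
        }
        // Do not pre-size the list from listSize: the value is untrusted.
        List<ByteBuffer> parts = new ArrayList<>();
        for (int i = 0; i < listSize; i++) {
            if (in.remaining() < Integer.BYTES) {
                throw new IllegalArgumentException("truncated frame: missing len at index " + i);
            }
            int len = in.getInt();
            // A negative len, or one larger than the remaining bytes, is illegal.
            if (len < 0 || len > in.remaining()) {
                throw new IllegalArgumentException("illegal len " + len + " at index " + i);
            }
            byte[] data = new byte[len];
            in.get(data);
            parts.add(ByteBuffer.wrap(data));
        }
        return parts;
    }
}
```

Checking len against the remaining bytes as well as against zero also guards against a large positive value triggering an oversized allocation.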

 

 

JIRA link - [INLONG-35] created by gosonzhang

[INLONG-2] Multiple languages (other than Java and C++) support in TubeMQ SDKs

At present, the SDK is limited to Java and C++. In the short and medium term, our efforts will be devoted to improving the core processing flow, so it is hard to rely on the existing contributors to support other languages for now.
However, the TubeMQ project needs SDKs for more languages, such as Go, Python, PHP, and C#, so that it can be adopted in a wider range of scenarios.

As for tooling, our environment uses a DO separation mode, so we may not have a good understanding of the tools you use every day. Would you like to share some ideas, or contribute tools that enrich TubeMQ's operation and maintenance in this area?

Go SDK see https://issues.apache.org/jira/browse/TUBEMQ-25

JIRA link - [INLONG-2] created by zhangguocheng

[INLONG-40] Optimize message disk store classes's logic

Recently, in actual testing and troubleshooting, we analyzed and organized the logic of file storage (org.apache.tubemq.server.broker.msgstore.disk), and found some problems that need to be optimized:

1. Because data is aged out by expiration, the FileSegment file indexing method is unnecessary, and this logic interferes with the use of the system;

2. The FileReadView class is intended to construct a separate view for read processing and to control read operations within that view. Actual troubleshooting shows that this logic is unnecessary: no request reads expired files, so the read-encapsulation class is not needed;

3. For the FileSegmentList class, testing shows that a List is no faster than an array. This class would also work best as a plain container without much other business logic, which suits its role better.

I'll append my changes to improve them.

JIRA link - [INLONG-40] created by gosonzhang

[INLONG-12] Change to use Apache License V2

I've noticed that the license header in the code files needs to be modified, for example:

/*
 * Tencent is pleased to support the open source community by making TubeMQ available.
 *
 * Copyright (C) 2012-2019 Tencent. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use
 * this file except in compliance with the License. You may obtain a copy of the
 * License at
 *

I'll modify the files to use the standard Apache License declaration.

JIRA link - [INLONG-12] created by gosonzhang

[INLONG-42] Add peer information about message received

When a consumer obtains messages, they do not carry source information, such as which broker and which partition each message came from. This makes message statistics difficult to compute, so the implementation needs to return this information along with the messages.
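To illustrate why the source information helps with statistics, here is a minimal sketch. The ReceivedMessage wrapper, its fields, and the "host#partition" key are hypothetical names chosen for this example; they are not the TubeMQ consumer API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PeerInfoSketch {

    /** Hypothetical message wrapper that carries its source broker and partition. */
    static final class ReceivedMessage {
        final String brokerHost;
        final int partitionId;
        final byte[] payload;

        ReceivedMessage(String brokerHost, int partitionId, byte[] payload) {
            this.brokerHost = brokerHost;
            this.partitionId = partitionId;
            this.payload = payload;
        }

        /** Identifies the source as "brokerHost#partitionId". */
        String partitionKey() {
            return brokerHost + "#" + partitionId;
        }
    }

    /** Counts received messages per source partition. */
    static Map<String, Integer> countByPartition(List<ReceivedMessage> messages) {
        Map<String, Integer> counts = new HashMap<>();
        for (ReceivedMessage m : messages) {
            counts.merge(m.partitionKey(), 1, Integer::sum);
        }
        return counts;
    }
}
```

With the peer fields attached to each message, per-broker or per-partition statistics reduce to a simple grouping step on the consumer side.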

JIRA link - [INLONG-42] created by gosonzhang

[INLONG-30] Add Contact info in README.md

We need to provide the project's communication methods in a centralized manner, including mailing lists, issue management entrances, etc., so that interested parties can join the project easily and quickly.

JIRA link - [INLONG-30] created by gosonzhang

[INLONG-3] C++ SDK support in TubeMQ

I will contribute the C++ SDK, which is currently being sorted out. Without changing the server-side interaction protocol, the semantics will be the same as the Java implementation.


Task
1. IO thread management
2. Connection interface
3. Connection pool management
4. IO buffer
5. Future/Promise support
6. Codec interface and TubeMQ interactive encoding
7. Functionalization of service interface parameter settings
8. Future service interface
9. Master implementation, metadata acquisition and reading interface
10. Consumer implementation
11. Producer implementation
12. Client configuration
13. Producer API package
14. Consumer API package

Plan
Phase 1 (Available): implement basic functions and ensure unit test coverage
RPC support
Message consumption (pull mode)
Write message (synchronous)
Connection pool implementation: automatic reconnection of broken links, automatic recovery of idle connections, sharing according to sessionFactory
Test cases and specification construction

Phase 2 (Practical): implement the features of each function and withstand a certain level of stress testing
Asynchronous production
Authentication and authorization; prevent production and consumption that bypass the Master
Consumption from a precisely specified partition offset
Frequency control
Consumption of multiple topics in a single group
Server-side filtered consumption

Phase 3 (Easy to use): continuous iteration toward a full-featured implementation, adding the latest function points according to priority
Handling of inactivity beyond a specified number of minutes (for example, 3 minutes), mainly on the producer side
Automatic isolation of faulty nodes: detect faulty nodes algorithmically and automatically stop sending data to a faulty Broker
TLS
Push consumption

 

Feature
1. RPC development
Pre-research--asio development
Connection interface
Connection pool management-connection reuse, automatic reconnection of broken links, automatic recovery of idle connections, sharing according to sessionFactory
Streaming request support-based on serialNo
Future/Promise interface
Codec interface
Log
Config
Buffer

2. Service interface development
API service interface, such as heartbeat package variable function
Request Future/Promise function encapsulation
Config: client, consumer, producer, Master, rpc
Message id

3. Client
Start and stop
Configuration input
Consumer API
Producer API

4. Master
Authentication
Prevent production and consumption that bypass the Master: get the token from the Master and update it to the local metadata
Metadata management: Topic, Group, Master
Master Heartbeat

5. Consumer
User interface
Heartbeat support
pull consumption
Consumption of multiple topics in a single group
Authentication
Precisely specify offset partition consumption

6. Producer
User interface
Synchronous production
Asynchronous production
Load balancing algorithm: polling
Load balancing algorithm: hash

7. Advanced features
Master disconnects and reconnects, traversing DNS to obtain the Master host IP
Rate limiting
Effectively-Once
TLS: asio + OpenSSL
Filter consumption
Handling of inactivity beyond a specified number of minutes (for example, 3 minutes), mainly on the producer side
Circuit breaking: detect faulty nodes algorithmically and automatically stop sending data to a faulty Broker
Reporting of latency and failures; static weight (number of partitions), dynamic weight
Connection pool management: automatic recovery of idle connections, sharing according to sessionFactory
Push consumption
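The two producer load-balancing algorithms listed under "6. Producer" can be sketched as follows. The example is written in Java for consistency with the rest of the project, reads "polling" as round-robin selection (an assumption), and uses a hypothetical class name; it is not the SDK's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical partition selector, for illustration only.
final class PartitionSelectorSketch {
    private final AtomicInteger next = new AtomicInteger();
    private final int partitionCount;

    PartitionSelectorSketch(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /** Round-robin: each call returns the next partition index in turn. */
    int roundRobin() {
        // floorMod keeps the index non-negative even after the counter overflows.
        return Math.floorMod(next.getAndIncrement(), partitionCount);
    }

    /** Hash: the same key always maps to the same partition index. */
    int byHash(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```

Round-robin spreads load evenly regardless of message content, while hashing keeps all messages with the same key on one partition.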

 

 

JIRA link - [INLONG-3] created by zhangguocheng

[INLONG-39] Optimize the loadMessageStores() logic

After a log file is successfully initialized, the task should write the instance directly into dataStores. There is no need to return the result from loadStoresInParallel() and write it afterwards; writing directly streamlines the overall task and lets it be released sooner.
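A minimal sketch of this idea follows, assuming dataStores is a concurrent map shared by the loading tasks. The class name, the String-valued store, and the pool size are placeholders chosen for illustration, not the broker's actual types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelStoreLoadSketch {

    /** Loads one placeholder store per topic; each task registers its own result. */
    static ConcurrentHashMap<String, String> loadStores(List<String> topics) {
        ConcurrentHashMap<String, String> dataStores = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String topic : topics) {
                pending.add(pool.submit(() -> {
                    String store = "store-for-" + topic; // stands in for real initialization
                    dataStores.put(topic, store);        // written directly, nothing returned
                }));
            }
            // Futures are kept only to wait for completion and surface failures.
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while loading stores", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("store initialization failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return dataStores;
    }
}
```

Because each task publishes its own result, the caller no longer collects and merges return values after the parallel phase.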

JIRA link - [INLONG-39] created by gosonzhang

[INLONG-4] Add community join document

How can users join the community: a mailing list, QQ group, Slack, or anything else?

How can developers join and see where they can help contribute?

We need a document to guide them.

JIRA link - [INLONG-4] created by netroby

[INLONG-9] Remove some unnecessary code

1. 'String.valueOf()'

String strCallbackFun = req.getParameter("callback");
if ((TStringUtils.isNotEmpty(strCallbackFun))
&& (strCallbackFun.length() <= TBaseConstants.META_MAX_CALLBACK_STRING_LENGTH)
&& (strCallbackFun.matches(TBaseConstants.META_TMP_CALLBACK_STRING_VALUE))) {
    strCallbackFun = String.valueOf(strCallbackFun).trim();
}

 

com.tencent.tubemq.server.master.web.action.screen.Webapi#execute

 

2. {String}.substring()

if (path.startsWith("/")) {
    path = path.substring(1, path.length());
}

com.tencent.tubemq.server.master.web.simplemvc.RequestContext#normalizePath

 

3. Unnecessary boolean initialization

private boolean isOverTLS = false;

com.tencent.tubemq.corerpc.netty.NettyRpcServer  isOverTLS 
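The first two items can be checked with a small standalone comparison. This is a sketch with hypothetical method names, not code from the TubeMQ sources: String.valueOf on a non-null String returns the same string, and substring(1) already runs to the end of the string. For the third item, a boolean field defaults to false, so the explicit initializer is redundant.

```java
// Compares the original forms with their simplified equivalents.
final class RedundantCallsSketch {

    // Original form: explicit end index equal to the string length.
    static String stripLeadingSlashVerbose(String path) {
        return path.startsWith("/") ? path.substring(1, path.length()) : path;
    }

    // Simplified form: substring(int) already extends to the end.
    static String stripLeadingSlash(String path) {
        return path.startsWith("/") ? path.substring(1) : path;
    }

    // Original form: String.valueOf is a no-op for a non-null String.
    static String trimVerbose(String s) {
        return String.valueOf(s).trim();
    }

    // Simplified form; the caller has already verified that s is not empty.
    static String trimSimple(String s) {
        return s.trim();
    }
}
```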

 

 

PR #3

JIRA link - [INLONG-9] created by lan.liang

[INLONG-25] [Feature]Go SDK support for TubeMQ

Many users work in C/C++, but more and more businesses use Go. Offering only a C/C++ SDK is not a long-term solution, and many users prefer a pure Go SDK.

I will contribute a Go SDK. Without changing the server-side interaction protocol, the semantics will be the same as the Java and C++ implementations.

  1. Multiplexed connection pool
  2. Connection interface
  3. Codec interface and TubeMQ interactive encoding
  4. Selector
  5. Master implementation, metadata acquisition and reading interface
  6. Consumer realization
  7. Client configuration
  8. Consumer API

 

 

 

 

JIRA link - [INLONG-25] created by gosonzhang

[INLONG-50] Replace fastjson to gson

Recently, the fastjson security vulnerability problem has again emerged. This component is also referenced in the TubeMQ code. In the near future, it is planned to replace fastjson with gson.

JIRA link - [INLONG-50] created by gosonzhang

[INLONG-43] Add DeletePolicy's value check

The value of DeletePolicy is structured data. When this value is entered, its content should be validated to avoid inconsistency between the configured value and the policy that actually takes effect.
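Such a content check might be sketched as below. The issue does not spell out the structure of the value, so the format used here ("delete,<hours>h", for example "delete,168h") is an assumption for illustration, as are the class and method names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeletePolicyCheckSketch {
    // Assumed format, for illustration only: "delete,<positive hours>h".
    private static final Pattern POLICY = Pattern.compile("delete,([1-9][0-9]{0,5})h");

    /** Returns the retention in hours, or throws if the value is malformed. */
    static int parseRetentionHours(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("deletePolicy is null");
        }
        Matcher m = POLICY.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("illegal deletePolicy: " + raw);
        }
        return Integer.parseInt(m.group(1)); // at most 6 digits, so it fits in an int
    }
}
```

Rejecting malformed input at entry time means a stored value can always be parsed into the policy that actually takes effect.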

JIRA link - [INLONG-43] created by gosonzhang
