
zfthink / alldata


This project forked from alldatacenter/alldata


🔥🔥 BigData 💥 AllData big data platform: an open-source community big data platform built by customizing big data ecosystem components and covering big data collection, storage, computation, and development. Contact the author: https://docs.qq.com/doc/DVFVMYUp6cFhSRVJs

Home Page: https://alldata.readthedocs.io

License: Apache License 2.0

alldata's Introduction

AllData: One-Stop Big Data Platform




Demo site | account/password: poc/123456

Demo edition screens:

Home

Data Integration

Metadata Management

Metadata Harvesting

Application Analysis

System Menu Management

Data Quality

Data Market

Data Standards

BI Reports

Data Assets

Workflow Orchestration

AllData AI Studio Community Edition

AllData Studio Community Edition


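1. AllData inputs

Real-time development: Dlink

Offline development: FlinkX

Data governance: ElAdmin

Lakehouse: Dlink+CDC+Hudi

Machine learning algorithm platform: cube-studio

Data integration: ElAdmin

Data middle platform: ElAdmin

Big data cluster operations platform: Rancher

Data analysis: Hive+Doris

Real-time synchronization: Dlink+FlinkCDC+Doris

Task scheduling: DolphinScheduler

Operations center: SREWorks

Data warehouse modeling: Doris

Low-code engine: lowcode-engine

Product prototypes: Modao (墨刀)

2. Outputs

MVP product

Design documents

Project meetings

3. Architecture

Front-end development

Product design

Back-end architecture

Cloud-native architecture

Big data architecture

UI design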

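Deployment

The database must be MySQL 5.7 or later.

1. Initialize the eladmin databases (run the following source commands in the mysql client)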

1.1 source install/eladmin/eladmin_alldatadc.sql

1.2 source install/eladmin/eladmin_dts.sql

1.3 source install/datax/eladmin_data_cloud.sql

1.4 source install/datax/eladmin_cloud_quartz.sql

1.5 source install/datax/eladmin_foodmart2.sql

1.6 source install/datax/eladmin_robot.sql

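2. Update the datax-config configuration center

Edit the configuration files in the config folder and update the Redis, MySQL, and RabbitMQ settings.

3. Install aspose-words into the local Maven repository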

cd install/datax

mvn install:install-file -DgroupId=com.aspose -DartifactId=aspose-words -Dversion=20.3 -Dpackaging=jar -Dfile=aspose-words-20.3.jar

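4. Run mvn install in the project root directory

This produces the package build/eladmin-release-2.6.tar.gz.

Upload it to the server and extract it.

5. Deploy the microservices: start each service from its own directory

5.1 Required services, which must be started in this order: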

eureka->config->gateway
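The fixed order above is a dependency chain: each service presumably needs the previous one to be up before it starts. As a minimal sketch, assuming only the dependencies implied by that order (config depends on eureka, gateway depends on config), the Java below derives a valid start order with Kahn's topological sort and rejects cyclic dependencies. The `StartupOrder` class and the dependency map are hypothetical illustrations, not part of AllData.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StartupOrder {
    // Returns the services in an order where every dependency starts before the services that need it.
    // Kahn's algorithm: repeatedly start a service whose dependencies have all been started.
    static List<String> order(Map<String, List<String>> dependsOn) {
        Map<String, Integer> pending = new TreeMap<>();          // service -> number of unstarted dependencies
        Map<String, List<String>> dependents = new HashMap<>();  // dependency -> services waiting on it
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((service, count) -> { if (count == 0) ready.add(service); });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String service = ready.poll();
            result.add(service);
            for (String next : dependents.getOrDefault(service, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (result.size() != pending.size()) {
            throw new IllegalStateException("dependency cycle among: " + pending.keySet());
        }
        return result;
    }
}
```

With `config -> [eureka]` and `gateway -> [config]`, `StartupOrder.order(...)` returns `[eureka, config, gateway]`; adding a hypothetical on-demand service that depends on `gateway` places it after `gateway`.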

5.2 Optional services, started as needed: cd install/16gmaster

For example, to start metadata management:

sh install/16gmaster/data-metadata-service.sh

tail -100f install/16gmaster/data-metadata-service.log

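5.3 Optional services, started as needed: cd install/16gdata

Start the relevant services as needed.

5.4 Optional services, started as needed: cd install/16gslave

Start the relevant services as needed.

6. Deploy Eladmin:

6.1 Start the Eladmin system service: sh install/16gmaster/eladmin-system.sh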

6.2 Deploy the Eladmin front end

source /etc/profile

cd $(dirname $0)

source /root/.bashrc && nvm use v10.15.3

nohup npm run dev &

6.3 Open the Eladmin page

curl http://localhost:8013

Username: admin, password: 123456


## Knowledge Graph

Knowledge graph construction methodology:

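1. Technical architecture of the knowledge graph: decide how knowledge is represented and how it is stored.

2. Construction methodology: building a knowledge graph proceeds through the stages of knowledge modeling, knowledge extraction, and knowledge verification.

By content, knowledge extraction divides further into entity extraction, attribute extraction, relation extraction, and event extraction:

Entity extraction detects nameable entities in the data sources and classifies them into the modeled types, such as person, organization, location, and time.

Attribute extraction identifies the specific attributes of named entities.

Relation extraction identifies relations between entities. For example, from the sentence "著名歌手周杰伦的妻子昆凌" ("the famous singer Jay Chou's wife, Hannah Quinlivan"), it identifies the spousal relation between Jay Chou (周杰伦) and Hannah Quinlivan (昆凌).

Event extraction identifies event information related to named entities; for example, Jay Chou marrying Hannah Quinlivan is an event.

Entity, attribute, and relation extraction thus populate the topology defined during knowledge modeling, while event extraction populates the event model. A domain knowledge graph therefore needs extraction methods for both the data-preparation domain and the handling domain.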
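To make the extraction outputs concrete: entity, attribute, and relation extraction all end up as (subject, predicate, object) triples. The toy Java below uses a hand-written dictionary for entity extraction and a single hypothetical pattern, `<A>'s wife <B>`, for relation extraction on an English rendering of the example sentence. It is a rule-based illustration only; learned extractors such as DeepKE work very differently, and the `Triple` and `ToyExtractor` names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A (subject, predicate, object) fact: the unit a knowledge graph stores.
record Triple(String subject, String predicate, String object) {}

class ToyExtractor {
    // Entity extraction: dictionary lookup of known names, returning name -> modeled type.
    static Map<String, String> extractEntities(String text, Map<String, String> dictionary) {
        Map<String, String> found = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            if (text.contains(e.getKey())) {
                found.put(e.getKey(), e.getValue());
            }
        }
        return found;
    }

    // Relation extraction: one hand-written pattern "<A>'s wife <B>" yields (A, spouse, B).
    static List<Triple> extractSpouse(String text, Set<String> entities) {
        List<Triple> triples = new ArrayList<>();
        for (String a : entities) {
            for (String b : entities) {
                if (!a.equals(b) && text.contains(a + "'s wife " + b)) {
                    triples.add(new Triple(a, "spouse", b));
                }
            }
        }
        return triples;
    }
}
```

On "the famous singer Jay Chou's wife Hannah Quinlivan" with both names in the dictionary as PERSON, this yields the single triple (Jay Chou, spouse, Hannah Quinlivan).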

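Knowledge verification

Knowledge extracted from different data sources is not necessarily valid, so it must be verified and only valid, correct knowledge admitted into the knowledge base. Inaccurate knowledge usually comes from errors in the raw data, ambiguous terminology, conflicting knowledge, and the like. For example, "1#" PWR, "1号" PWR, and "一号" PWR (three ways of writing "pressurized water reactor No. 1") all refer to the same entity. If the extraction rules do not account for this, it has to be resolved in the verification stage, which closes the loop.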
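One way to handle the reactor example during verification is alias normalization: map every surface form to one canonical entity before loading it into the knowledge base. Below is a minimal Java sketch using English renderings of the three forms ("1# PWR", "No. 1 PWR", "Number One PWR") and a hypothetical canonical ID `PWR-1`; mentions with no registered alias pass through unchanged so they can be flagged for review.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AliasNormalizer {
    private final Map<String, String> aliasToCanonical = new HashMap<>();

    // Registers the canonical name plus every surface form that refers to the same entity.
    void addAliases(String canonical, String... aliases) {
        aliasToCanonical.put(canonical, canonical);
        for (String alias : aliases) {
            aliasToCanonical.put(alias, canonical);
        }
    }

    // Resolves a mention to its canonical name; unknown mentions pass through unchanged for review.
    String canonical(String mention) {
        String key = mention.trim();
        return aliasToCanonical.getOrDefault(key, key);
    }

    // Collapses extracted mentions into the set of distinct entities to load into the knowledge base.
    Set<String> distinctEntities(List<String> mentions) {
        Set<String> out = new TreeSet<>();
        for (String m : mentions) {
            out.add(canonical(m));
        }
        return out;
    }
}
```

A static alias table is the simplest choice; a production pipeline would more likely combine such rules with similarity-based entity linking.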

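3. Applications built on the knowledge graph: each kind of application has a different focus, uses different techniques, and achieves different results. We group them into knowledge reasoning, knowledge presentation, knowledge question answering, and knowledge sharing.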

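1. Knowledge graph construction

1.1 Manual data annotation tool: https://github.com/doccano/doccano

1.2 Automatic annotation + knowledge extraction: https://github.com/zjunlp/DeepKE

2. Knowledge storage: https://github.com/alibaba/GraphScope

3. Knowledge graph applications: https://github.com/lemonhu/stock-knowledge-graph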

Dinky: new hive2flink task type

1. Supports submitting and executing Hive SQL running on Flink

2. Test code

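@Test
void testCreateDatabase() {
    // Each sql(input).ok(expected) call parses the Hive DDL and asserts the SQL it unparses to;
    // identifiers come back upper-cased and backtick-quoted.
    sql("create database db1").ok("CREATE DATABASE `DB1`");
    // COMMENT and LOCATION clauses
    sql("create database db1 comment 'comment db1' location '/path/to/db1'")
            .ok(
                    "CREATE DATABASE `DB1`\n"
                            + "COMMENT 'comment db1'\n"
                            + "LOCATION '/path/to/db1'");
    // DBPROPERTIES key/value pairs
    sql("create database db1 with dbproperties ('k1'='v1','k2'='v2')")
            .ok(
                    "CREATE DATABASE `DB1` WITH DBPROPERTIES (\n"
                            + "  'k1' = 'v1',\n"
                            + "  'k2' = 'v2'\n"
                            + ")");
}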

3. Result preview

The FlinkHiveSqlParser test passed.


First look at Flink data lineage

1 Result preview

2 Create the Flink DDL

See Resource/FlinkDDLSQL.sql

CREATE TABLE data_gen (
    amount BIGINT
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'number-of-rows' = '3',
    'fields.amount.kind' = 'random',
    'fields.amount.min' = '10',
    'fields.amount.max' = '11'
);

CREATE TABLE mysql_sink (
    amount BIGINT,
    PRIMARY KEY (amount) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://localhost:3306/test_db',
    'table-name' = 'test_table',
    'username' = 'root',
    'password' = '123456',
    -- lookup.cache.* only takes effect when this table is used as a lookup (dimension) table in a join
    'lookup.cache.max-rows' = '5000',
    'lookup.cache.ttl' = '10min'
);

INSERT INTO mysql_sink SELECT amount AS amount FROM data_gen;
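Because mysql_sink declares PRIMARY KEY (amount), Flink's JDBC sink writes in upsert mode (for MySQL, INSERT ... ON DUPLICATE KEY UPDATE), so a later row with the same amount overwrites the earlier one instead of adding a new row. With 3 random rows drawn from [10, 11], test_table should therefore end up with at most 2 rows, assuming it also has amount as its primary key in MySQL. The Java below is a minimal simulation of that reasoning, not of the connector itself; the generator uses java.util.Random rather than Flink's datagen implementation, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class UpsertSinkSketch {
    // Simulates the datagen source: `rows` random values, each in [min, max] inclusive.
    static List<Long> generate(long seed, int rows, long min, long max) {
        Random random = new Random(seed);
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(min + random.nextInt((int) (max - min + 1)));
        }
        return out;
    }

    // Simulates an upsert keyed on `amount`: a row with an existing key overwrites it
    // instead of adding a new row.
    static Map<Long, Long> upsert(List<Long> amounts) {
        Map<Long, Long> table = new LinkedHashMap<>();
        for (Long amount : amounts) {
            table.put(amount, amount); // key and value are both the amount column
        }
        return table;
    }
}
```

For example, `upsert(List.of(10L, 11L, 10L))` holds two rows, keyed 10 and 11.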

3 Run com.platform.FlinkLineageBuild

Result:

1. Flink lineage build result, tables:

[LineageTable{id='4', name='data_gen', columns=[LineageColumn{name='amount', title='amount'}]},

LineageTable{id='6', name='mysql_sink', columns=[LineageColumn{name='amount', title='amount'}]}]

Table ID: 4

Table name: data_gen

Table ID: 4

Table name: data_gen

Table column: LineageColumn{name='amount', title='amount'}

Table ID: 6

Table name: mysql_sink

Table ID: 6

Table name: mysql_sink

Table column: LineageColumn{name='amount', title='amount'}

2. Flink lineage build result, edges:

[LineageRelation{id='1', srcTableId='4', tgtTableId='6', srcTableColName='amount', tgtTableColName='amount'}]

Table edge: LineageRelation{id='1', srcTableId='4', tgtTableId='6', srcTableColName='amount', tgtTableColName='amount'}
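The printed result is a small graph: tables (with their columns) as nodes and column-to-column relations as edges. The Java sketch below reproduces the data_gen to mysql_sink result from a target-to-source column mapping. The records only mirror the field names visible in the output above and are not the project's actual classes; the mapping is passed in by hand here, whereas FlinkLineageBuild presumably derives it from the SQL.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative records named after the fields printed above.
record LineageColumn(String name, String title) {}
record LineageTable(String id, String name, List<LineageColumn> columns) {}
record LineageRelation(String id, String srcTableId, String tgtTableId,
                       String srcTableColName, String tgtTableColName) {}

class LineageSketch {
    // Builds one column-level edge per (target column <- source column) pair of an
    // INSERT INTO target SELECT ... FROM source statement.
    static List<LineageRelation> edges(LineageTable src, LineageTable tgt, Map<String, String> tgtToSrc) {
        Set<String> srcCols = columnNames(src);
        Set<String> tgtCols = columnNames(tgt);
        List<LineageRelation> out = new ArrayList<>();
        int nextId = 1;
        // TreeMap gives a deterministic edge order regardless of the input map type.
        for (Map.Entry<String, String> e : new TreeMap<>(tgtToSrc).entrySet()) {
            if (!tgtCols.contains(e.getKey()) || !srcCols.contains(e.getValue())) {
                throw new IllegalArgumentException("unknown column in mapping: " + e);
            }
            out.add(new LineageRelation(String.valueOf(nextId++), src.id(), tgt.id(), e.getValue(), e.getKey()));
        }
        return out;
    }

    private static Set<String> columnNames(LineageTable table) {
        Set<String> names = new HashSet<>();
        for (LineageColumn c : table.columns()) {
            names.add(c.name());
        }
        return names;
    }
}
```

With tables 4 (data_gen) and 6 (mysql_sink) and the mapping amount to amount, this yields one edge with id 1 from table 4 to table 6, matching the printed relation.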

AllData Doris

AllData: a newly customized, one-stop, scenario-based big data middle platform


Big data component management: DOCKER FOR DATA PLATFORM

1. Configure the service hosts (HOST)

2. Start the big data cluster

3. YARN is accessible

4. HIVE works

5. HDFS is accessible

6. ES health check

7. KIBANA UI access

8. PRESTO UI access

9. HBASE is accessible

10. FLINK RUNTIME WEB is accessible


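Using a Docker/K8s cloud-native approach to start and stop components

1. BUSINESS FOR ALL DATA PLATFORM: commercial projects

2. BUSINESS FOR ALL DATA PLATFORM: compute engine

3. DEVOPS FOR ALL DATA PLATFORM: operations engine

4. DATA GOVERN FOR ALL DATA PLATFORM: data governance engine

5. DATA Integrate FOR ALL DATA PLATFORM: data integration engine

6. AI FOR ALL DATA PLATFORM: artificial intelligence engine

7. DATA ODS FOR ALL DATA PLATFORM: data collection engine

8. OLAP FOR ALL DATA PLATFORM: OLAP query engine

9. OPTIMIZE FOR ALL DATA PLATFORM: performance optimization engine

10. DATABASES FOR ALL DATA PLATFORM: distributed storage engine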

Flink Table Store && Lake Storage POC

2.1 SQL: Flink Table Store POC

set execution.checkpointing.interval=15sec;

CREATE CATALOG alldata_catalog WITH (
    'type' = 'table-store',
    'warehouse' = 'file:/tmp/table_store'
);

USE CATALOG alldata_catalog;

CREATE TABLE word_count (
    word STRING PRIMARY KEY NOT ENFORCED,
    cnt BIGINT
);

CREATE TEMPORARY TABLE word_table (
    word STRING
) WITH (
    'connector' = 'datagen',
    'fields.word.length' = '1'
);

INSERT INTO word_count SELECT word, COUNT(*) FROM word_table GROUP BY word;

-- POC test: OLAP query
SET sql-client.execution.result-mode = 'tableau';
RESET execution.checkpointing.interval;
SET execution.runtime-mode = 'batch';
SELECT * FROM word_count;

-- POC test: streaming query (INTERVAL is a reserved keyword, so the alias is backtick-quoted)
-- SET execution.runtime-mode = 'streaming';
-- SELECT `interval`, COUNT(*) AS interval_cnt FROM
--     (SELECT cnt / 10000 AS `interval` FROM word_count) GROUP BY `interval`;
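What the POC computes can be checked on a small input: the streaming INSERT keeps one row per word in the primary-keyed word_count table, holding the latest count, and the commented-out stream query buckets those counts by cnt / 10000. The plain-Java sketch below mirrors both steps with the bucket width as a parameter; it assumes integer division for the bucket, and the class and method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // INSERT INTO word_count SELECT word, COUNT(*) ... GROUP BY word:
    // the primary key is `word`, so each update overwrites that word's row with its latest count.
    static Map<String, Long> wordCount(List<String> words) {
        Map<String, Long> table = new TreeMap<>();
        for (String w : words) {
            table.merge(w, 1L, Long::sum);
        }
        return table;
    }

    // SELECT `interval`, COUNT(*) FROM (SELECT cnt / width AS `interval` FROM word_count) GROUP BY `interval`
    // Integer division assigns each word's count to a bucket of the given width.
    static Map<Long, Long> intervalHistogram(Map<String, Long> counts, long width) {
        Map<Long, Long> histogram = new TreeMap<>();
        for (long cnt : counts.values()) {
            histogram.merge(cnt / width, 1L, Long::sum);
        }
        return histogram;
    }
}
```

With the POC's width of 10000, bucket 0 counts the words seen 0 to 9999 times, bucket 1 those seen 10000 to 19999 times, and so on.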

2.2 Flink Runtime Web

2.3 Flink Batch

2.4 Flink OLAP Read

2.5 Flink Stream Read


Dlink customization: added Flink 1.16.0 support

1. Configure the Flink Table Store dependencies in Dlink

2. Dlink starts and runs successfully

3. OLAP query

4. Streaming read with Dlink on Flink 1.16.0

4.1 Stream Read 1

4.2 Stream Read 2

Architecture




| Component | Description | Role |
| --- | --- | --- |
| ai | AI STUDIO FOR ALL DATA PLATFORM | Artificial intelligence engine |
| assembly | WHOLE PACKAGE BUILD FOR ALL DATA PLATFORM | Full-package build engine |
| buried | BURIED FOR ALL DATA PLATFORM | Data acquisition engine (event tracking solution) |
| buried-trade | BURIED TRADE FOR ALL DATA PLATFORM | Commerce engine (business system) |
| cluster | DATA SRE FOR ALL DATA PLATFORM | Intelligent big data operations engine |
| crawlerlab | CRAWLER PLATFORM FOR ALL DATA PLATFORM | Web crawler engine |
| document | DOCUMENT FOR ALL DATA PLATFORM | Official documentation |
| dts | DTS FOR ALL DATA PLATFORM | Data integration (DTS) engine |
| fs | DATA STORAGE FOR ALL DATA PLATFORM | Big data storage engine |
| govern | DATA GOVERN FOR ALL DATA PLATFORM | Data governance engine |
| iot | IOT FOR ALL DATA PLATFORM | Cloud-native IoT development framework |
| knowledge | KNOWLEDGE GRAPH FOR ALL DATA PLATFORM | Knowledge graph engine |
| lakehouse | ONE LAKE FOR ALL DATA PLATFORM | Data lake engine |
| market | MARKET FOR ALL DATA PLATFORM | Data experimentation engine |
| olap | OLAP FOR ALL DATA PLATFORM | Hybrid OLAP query engine |
| studio | ONE HUB FOR ALL DATA PLATFORM | AllData headquarters front-end and back-end solution |
| trade | TRADE FOR ALL DATA PLATFORM | TRADE engine |
| wiki | WIKI FOR ALL DATA PLATFORM | AllData knowledge base |
| alldata | The AllData community project builds a one-stop big data platform by customizing big data ecosystem components and covering big data collection, storage, computation, and development | GitHub one-stop open-source big data platform, AllData community project |

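AllData community business plan diagram

AllData community project business flow diagram

AllData community project tree diagram

Full-stack AllData product roadmap

AllData community project timeline

Real-time recommendation system business flow diagram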


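AllData headquarters front-end and back-end solution

Covers the AllData front-end and back-end solution and the front and back ends of the multi-tenant operations platform.

The AllData front-end and back-end solution is built on eladmin + tenant:

1. AllData front-end solution: studio/eladmin-web

2. AllData back-end solution: studio/eladmin

3. Multi-tenant operations platform front end: studio/tenant

4. Multi-tenant operations platform back end: studio/tenantBack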

Integration

Data Quality


Viewing jobs via Livy

Offline e-commerce data warehouse showcase


Community

Contact the author: https://docs.qq.com/doc/DVFVMYUp6cFhSRVJs

alldata's People

Contributors: 1820586026, alldatafounder, ccckdi, vue-penghong, yg9538

Forkers: zfthink88
