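Elasticsearch reader and writer plugins for DataX
- Get the Alibaba DataX source code, build it, and install datax-common into your local Maven repository
- Build with Maven:
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
- Copy the built plugin packages into the corresponding locations under the DataX directory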
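
The last step can be sketched in Python, the language DataX's own launcher script uses. This is a minimal sketch, not part of the project: it assumes the usual DataX layout of `plugin/reader/<name>` and `plugin/writer/<name>` under the DataX home directory, and the function name `install_plugin` and every path passed to it are hypothetical.

```python
import shutil
from pathlib import Path


def install_plugin(built_plugin_dir, datax_home, kind, name):
    """Copy a built plugin directory to <datax_home>/plugin/<kind>/<name>.

    kind must be "reader" or "writer"; the directory layout is an assumption.
    """
    if kind not in ("reader", "writer"):
        raise ValueError("kind must be 'reader' or 'writer'")
    src = Path(built_plugin_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"built plugin directory not found: {src}")
    dest = Path(datax_home) / "plugin" / kind / name
    # dirs_exist_ok (Python 3.8+) lets a rebuild overwrite an earlier install.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

For example, `install_plugin("path/to/built/elasticsearchwriter", "/opt/datax", "writer", "elasticsearchwriter")` would place the writer where a job that names `elasticsearchwriter` can find it; both paths are placeholders for wherever the build output and DataX actually live.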
Type | Data source | Reader | Writer | Docs |
---|---|---|---|---|
Unstructured data store | Elasticsearch | √ | √ | Read, Write |
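Reader and writer plugins for synchronizing data between DataX and Elasticsearch. The writer can convert one-to-many flat rows into nested Elasticsearch objects; the reader can read nested objects and filter them with OGNL expressions. In theory, the nesting depth is unlimited.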
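
To illustrate what converting one-to-many flat rows into a nested object means, here is a minimal Python sketch. It is not the plugin's implementation; the function `flat_to_nested` and its parameters are hypothetical, and the field names are borrowed from the `TACHE` table example further down this page.

```python
def flat_to_nested(rows, parent_keys, child_name, child_keys):
    """Group flat one-to-many rows into one document per parent.

    Rows sharing the same parent_keys values are merged into a single
    document whose child_name field lists one object per row.
    """
    docs = {}  # dicts keep insertion order, so parents stay in first-seen order
    for row in rows:
        key = tuple(row[k] for k in parent_keys)
        if key not in docs:
            doc = {k: row[k] for k in parent_keys}
            doc[child_name] = []
            docs[key] = doc
        docs[key][child_name].append({k: row[k] for k in child_keys})
    return list(docs.values())


rows = [
    {"flow_id": 1, "tch_id": 10, "tch_mod": "a"},
    {"flow_id": 1, "tch_id": 11, "tch_mod": "b"},
    {"flow_id": 2, "tch_id": 20, "tch_mod": "c"},
]
nested = flat_to_nested(rows, ["flow_id"], "taches", ["tch_id", "tch_mod"])
```

Here the three flat rows collapse into two documents, one per `flow_id`, each carrying its child rows under `taches`.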
License: Apache License 2.0
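Hello, and thank you for developing the ES reader. After using it for a while, I have three questions:
Fetching data with dfs_query_then_fetch appears to pull the index contents first and only then aggregate them into a result. Because the index is large, reads are usually very slow and WaitReaderTime is long; switching to scan fails with an error saying the feature is no longer supported. Being able to fetch the already-aggregated data might solve this, and the target side would then not need to process the data again. A data sample is posted in the reply below.
Can data be fetched directly with a query, for example: index=public_pubalert* @message in ("ORA-") | stats count by date_histogram(timestamp,1d)
Many thanks!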
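
For reference, the per-day count in the example could be expressed as an Elasticsearch search body roughly as follows. This is only a sketch of the request: whether the reader plugin can consume aggregation results is not stated here. The helper name `daily_count_query` and the aggregation name `per_day` are made up, `match_phrase` is one assumed way to express the `in ("ORA-")` condition, and `calendar_interval` assumes Elasticsearch 7.2 or later (older versions use `interval`).

```python
import json


def daily_count_query(message_field, phrase, time_field):
    """Build a search body that counts matching documents per day."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "query": {"match_phrase": {message_field: phrase}},
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": time_field,
                    "calendar_interval": "1d",
                }
            }
        },
    }


body = daily_count_query("@message", "ORA-", "timestamp")
print(json.dumps(body, indent=2))
```

The body would be sent to the `_search` endpoint of the `public_pubalert*` indices; the buckets under `aggregations.per_day` then hold one count per day.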
com.alibaba.datax.common.exception.DataXException: Code:[Framework-13], Description:[DataX插件运行时出错, 具体原因请参看DataX运行结束时的错误诊断信息 .]. - java.lang.UnsupportedOperationException: JsonObject
at com.google.gson.JsonElement.getAsLong(JsonElement.java:230)
at io.searchbox.core.SearchResult.getTotal(SearchResult.java:218)
at com.alibaba.datax.plugin.reader.elasticsearchreader.EsReader$Task.transportRecords(EsReader.java:285)
at com.alibaba.datax.plugin.reader.elasticsearchreader.EsReader$Task.startRead(EsReader.java:159)
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:57)
at java.lang.Thread.run(Thread.java:748)
java.lang.UnsupportedOperationException: JsonObject
at com.google.gson.JsonElement.getAsLong(JsonElement.java:230)
at io.searchbox.core.SearchResult.getTotal(SearchResult.java:218)
at com.alibaba.datax.plugin.reader.elasticsearchreader.EsReader$Task.transportRecords(EsReader.java:285)
at com.alibaba.datax.plugin.reader.elasticsearchreader.EsReader$Task.startRead(EsReader.java:159)
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:57)
at java.lang.Thread.run(Thread.java:748)
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:40) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.scheduler.processinner.ProcessInnerScheduler.dealFailedStat(ProcessInnerScheduler.java:39) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.scheduler.AbstractScheduler.schedule(AbstractScheduler.java:99) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.JobContainer.schedule(JobContainer.java:535) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:119) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.start(Engine.java:92) [datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.entry(Engine.java:171) [datax-core-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.Engine.main(Engine.java:204) [datax-core-0.0.1-SNAPSHOT.jar:na]
java.lang.UnsupportedOperationException: JsonObject
at com.google.gson.JsonElement.getAsLong(JsonElement.java:230) ~[na:na]
at io.searchbox.core.SearchResult.getTotal(SearchResult.java:218) ~[na:na]
at com.alibaba.datax.plugin.reader.elasticsearchreader.EsReader$Task.transportRecords(EsReader.java:285) ~[na:na]
at com.alibaba.datax.plugin.reader.elasticsearchreader.EsReader$Task.startRead(EsReader.java:159) ~[na:na]
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:57) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
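
The trace shows `SearchResult.getTotal` calling `getAsLong` on a JSON object. One plausible cause, not confirmed in this thread, is that the cluster runs Elasticsearch 7.x or later, where `hits.total` is an object with `value` and `relation` fields instead of a plain number. The Python sketch below shows a version-tolerant way to read the total; the function name `total_hits` is hypothetical and this is not the plugin's code.

```python
def total_hits(search_response):
    """Return the hit count from a search response as an int.

    Handles both the pre-7.x shape (a number) and the 7.x shape
    (an object such as {"value": 42, "relation": "eq"}).
    """
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        return int(total["value"])
    return int(total)


old_style = {"hits": {"total": 42, "hits": []}}
new_style = {"hits": {"total": {"value": 42, "relation": "eq"}, "hits": []}}
```

Under the same assumption, sending the search with the `rest_total_hits_as_int=true` request parameter asks Elasticsearch 7.x to return the total as a plain number, which may be an alternative when the client library cannot be changed.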
When a field's type is set to nested in job.json, the insert reports success, but the index mapping shows no type entry for that field.
The job.json is as follows:
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "app_user",
            "password": "appuser",
            "connection": [
              {
                "querySql": [
                  "select out_trade_no,order_type from `order_info` where create_time > '2021-01-10 00:00:00' "
                ],
                "jdbcUrl": [
                  "jdbc:mysql://192.168.101.3:9002/order"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "elasticsearchwriter",
          "parameter": {
            "flatToNested": true,
            "endpoint": "http://192.168.101.80:9200",
            "accessId": "elastic",
            "accessKey": "qvz6pguDN8FYcZSgslRA",
            "index": "order_test",
            "type": "_doc",
            "cleanup": false,
            "settings": {
              "index": {
                "number_of_shards": 1,
                "number_of_replicas": 0
              }
            },
            "discovery": false,
            "batchSize": 1000,
            "splitter": ",",
            "alias": "order_test",
            "column": [
              {
                "name": "out_trade_no",
                "type": "text",
                "colNo": 0
              },
              {
                "name": "order_type",
                "type": "text",
                "colNo": 1
              },
              {
                "name": "orders",
                "type": "nested",
                "child": [
                  {
                    "name": "out_trade_no",
                    "type": "text",
                    "colNo": 0
                  },
                  {
                    "name": "order_type",
                    "type": "text",
                    "colNo": 1
                  }
                ]
              }
            ]
          }
        }
      }
    ]
  }
}
The index mapping after the insert is as follows:
{
  "mappings": {
    "_doc": {
      "properties": {
        "order_type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "orders": {
          "properties": {
            "order_type": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "out_trade_no": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "out_trade_no": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
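
In the mapping above, `orders` has `properties` but no `type`, which is how Elasticsearch displays a plain object field; a nested field would carry `"type": "nested"`. The `text` plus `keyword` sub-field pattern is also what dynamic mapping produces by default, so one possible explanation, not confirmed in this thread, is that the index was created by dynamic mapping rather than from the job's column definitions. The Python sketch below checks a field's effective type in a mapping of this shape; the function name `field_type` is hypothetical.

```python
def field_type(mapping, type_name, field):
    """Return the effective type of a top-level field in a typed mapping.

    Elasticsearch omits "type" for plain object fields, so a field that
    has no explicit type is reported here as "object".
    """
    props = mapping["mappings"][type_name]["properties"]
    return props[field].get("type", "object")


observed = {
    "mappings": {"_doc": {"properties": {
        "orders": {"properties": {"order_type": {"type": "text"}}},
    }}}
}
expected = {
    "mappings": {"_doc": {"properties": {
        "orders": {"type": "nested",
                   "properties": {"order_type": {"type": "text"}}},
    }}}
}
```

An existing object field cannot be switched to nested in place, so if this is the cause, the index would need to be recreated with the intended mapping before loading the data again.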
"table":{ "name": "TACHE", "filter": "pk != null", "nameCase": "UPPERCASE", "column": [ { "name": "flow_id", "alias": "pk", }, { "name": "taches", "child": [ { "name": "tch_id" }, { "name": "tch_mod" }, { "name": "flow_id" } ] } ] }
The build fails because datax-common cannot be found. Is there a release available?
How should the JSON configuration file be written?