Comments (6)
{
"took":2,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"failed":0
},
"hits":{
"total":13844,
"max_score":1,
"hits":[
{
"_index":"public_pubalert_eoi_20210101_20211231",
"_type":"log",
"_id":"AXa6OUpWXaSEQ6uiHp1P",
"_score":1,
"_source":{
"@timestamp":"2021-01-01T03:14:29.000+08:00",
"source":"alert_testdb1",
"timestamp":"2021-01-01T03:14:29.000+08:00",
"@message":"Fri Jan 01 03:14:29 2021 Stopping background process CJQ0",
"@tags":[
"tag_on_failure",
"SGrok: "
],
"@collectiontime":"2021-01-01T03:14:40.065+08:00",
"@rownumber":24667,
"instance":"testdb1",
"@ip":"10.0.0.1",
"@hostname":"testdb1",
"@@id":"18980b4999d9dddba5836c23bbad8866",
"@@datasetid":"263322181648384",
"@dataset":"pubalert",
"@path":"/u01/oracle/diag/rdbms/testdb/testdb1/trace/alert_testdb1.log"
}
}
]
}
}
from datax-elasticsearch.
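Hello author, thank you for developing the es reader. After using it for a while, I have three questions:
- How can I get aggregated results? With the dfs_query_then_fetch search type, the reader seems to fetch the index data first and aggregate afterwards. Because the index is large, reads are usually very slow and the WaitReaderTime wait is long. Switching to scan fails with an error saying the feature is no longer supported. If the aggregated data could be fetched directly, that would probably solve the problem, and the target side would not need to process the data again. A sample record is posted in a separate reply.
- Can data be fetched directly with a query, as in this example:
index=public_pubalert* @message in ("ORA-") | stats count by date_histogram(timestamp,1d)
That would also solve question 1 indirectly.
- Taking the query in question 2 as an example: if the source es index has no count column, can the target still output the count column produced by the aggregation?
Thank you very much!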
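Features differ between es versions, so configure and use the reader according to the features of the version you are running.
1. scan fails because your version is 5.0 or newer, where the scan and count search_type values are no longer supported. For a large index you should page through the data. The reader currently provides scroll-based paging; see the reader documentation for details.
2. Question 2 is unclear. Please provide the native es syntax. The reader's search setting accepts native es query syntax, and every attribute under the _source field of the returned results can be retrieved through configuration.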
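For readers unfamiliar with scroll paging, here is a minimal Python sketch of the request sequence against the es 5.x REST API. It is not the reader's implementation. The send callable is a hypothetical transport (anything that posts a JSON body to a path and returns the decoded response), and scroll_sources and page_size are names made up for this illustration.

```python
from typing import Callable, Iterator

# Hypothetical transport: takes (path, body) and returns the decoded JSON response.
Transport = Callable[[str, dict], dict]


def scroll_sources(send: Transport, index: str, query: dict,
                   scroll: str = "3m", page_size: int = 1000) -> Iterator[dict]:
    """Yield the _source of every matching document, one scroll page at a time."""
    # The first request opens the scroll context and returns the first page.
    page = send(f"/{index}/_search?scroll={scroll}",
                {"size": page_size, "query": query})
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit["_source"]
        # Follow-up requests pass the scroll id back until a page comes back empty.
        page = send("/_search/scroll",
                    {"scroll": scroll, "scroll_id": page["_scroll_id"]})
```

Each page holds at most page_size documents, so memory use stays bounded no matter how large the index is.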
from datax-elasticsearch.
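Thanks for the reply!
My es version is 5.4.3. The statement below is native syntax; the logic is to take the maximum value for each second.
However, the max function has no aggregating effect: every value for every second is returned.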
{
"job": {
"setting": {
"speed": {
"byte": 1048576000
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [
{
"reader": {
"name": "elasticsearchreader",
"parameter": {
"endpoint": "http://10.0.0.1:9200",
"accessId": "xxxx",
"accessKey": "xxxx",
"index": "index_1*",
"type": "log",
"searchType": "dfs_query_then_fetch",
"headers": {
},
"scroll": "3m",
"search": [
{
"from": 0,
"query": {
"bool": {
"must": [
{"range": {
"@timestamp": {
"gte": "now-1d",
"lte": "now",
"format": "yyyy-MM-dd-HH",
"time_zone":"+08:00"
}
}
},
{
"wildcard": {
"@ip": {
"wildcard": "*10.0.0.20*",
"boost": 1
}
}
}
],
"disable_coord": false,
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"date_histogram_timestamp": {
"date_histogram": {
"format": "HH:mm:ss",
"keyed": false,
"field": "timestamp",
"min_doc_count": 0,
"interval": "1m",
"offset": 0,
"order": {
"_key": "asc"
},
"time_zone": "Asia/Shanghai"
},
"aggregations": {
"max_delays": {
"max": {
"field": "delays"
}
}
}
}
}
}
],
"table":{
"name": "index_1*",
"column": [
{
"name": "@timestamp"
},
{
"name": "ip_dst"
},
{
"name": "trans_count"
}
]
}
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"path": "/tmp/out",
"fileName":"new-inflxudb.csv",
"print": true,
"encoding": "UTF-8"
}
}
}
]
}
}
from datax-elasticsearch.
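Elasticsearch officially supports returning only the aggregation results by adding the "size": 0 parameter, but with datax this exports 0 rows. What is the reason?
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/returning-only-agg-results.html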
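A small Python sketch of what a "size": 0 response looks like may help here. The response is hand-written with made-up values in the shape Elasticsearch returns for a date_histogram with a max sub-aggregation. rows_from_hits is a hypothetical stand-in for a consumer that builds rows only from hits.hits, which is an assumption about how the rows are produced, not the plugin's actual code.

```python
# Hand-written response in the shape Elasticsearch returns for a "size": 0
# search with a date_histogram + max aggregation. All values are made up.
sample_response = {
    "hits": {"total": 3, "max_score": 0.0, "hits": []},
    "aggregations": {
        "date_histogram_timestamp": {
            "buckets": [
                {"key_as_string": "03:14:00", "key": 1609442040000,
                 "doc_count": 2, "max_delays": {"value": 7.0}},
                {"key_as_string": "03:15:00", "key": 1609442100000,
                 "doc_count": 1, "max_delays": {"value": 4.0}},
            ]
        }
    },
}


def rows_from_hits(response: dict) -> list:
    """Rows as seen by a consumer that reads only hits.hits[*]._source."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


def bucket_count(response: dict, agg_name: str) -> int:
    """Number of buckets stored under the top-level aggregations key."""
    return len(response["aggregations"][agg_name]["buckets"])


print(len(rows_from_hits(sample_response)))                        # rows from hits
print(bucket_count(sample_response, "date_histogram_timestamp"))   # buckets
```

With "size": 0 the hits array is empty and the buckets sit under a separate aggregations key, so anything that iterates only over hits has nothing to emit.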
from datax-elasticsearch.
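@john5480 1. Syntax questions will not be answered here; this thread only covers plugin usage and functionality. (Your goal aside, your syntax also has problems: the fields do not match the index structure.) 2. The plugin does not support fetching aggregation results. It can only read attributes under the _source node. I suggest doing the data analysis at the application level, since the plugin is meant only for data synchronization. If you really need this, you can develop it yourself or look for an alternative.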
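As a rough illustration of doing the aggregation at the application level, the Python sketch below takes rows already exported from _source and computes the maximum per minute. The field names timestamp and delays are taken from the job posted above and are assumptions about the real index. max_per_minute is a hypothetical helper, not part of the plugin.

```python
from datetime import datetime
from typing import Dict, Iterable


def max_per_minute(rows: Iterable[dict], ts_field: str = "timestamp",
                   value_field: str = "delays") -> Dict[str, float]:
    """Return the maximum of value_field per minute, keyed by 'YYYY-MM-DD HH:MM'."""
    result: Dict[str, float] = {}
    for row in rows:
        # Timestamps are expected in the form 2021-01-01T03:14:29.000+08:00.
        ts = datetime.fromisoformat(row[ts_field])
        # Bucket by the minute as written in the record (its own UTC offset is kept).
        minute = ts.strftime("%Y-%m-%d %H:%M")
        value = float(row[value_field])
        if minute not in result or value > result[minute]:
            result[minute] = value
    return result


# Made-up rows for illustration only.
rows = [
    {"timestamp": "2021-01-01T03:14:29.000+08:00", "delays": 3},
    {"timestamp": "2021-01-01T03:14:50.000+08:00", "delays": 7},
    {"timestamp": "2021-01-01T03:15:02.000+08:00", "delays": 4},
]
print(max_per_minute(rows))
```

The function accepts any iterable, so it can consume rows as they are read instead of loading the whole export into memory.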
from datax-elasticsearch.
OK, thank you very much.
from datax-elasticsearch.