
Comments (6)

john5480 commented on June 9, 2024
{
    "took":2,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "failed":0
    },
    "hits":{
        "total":13844,
        "max_score":1,
        "hits":[
            {
                "_index":"public_pubalert_eoi_20210101_20211231",
                "_type":"log",
                "_id":"AXa6OUpWXaSEQ6uiHp1P",
                "_score":1,
                "_source":{
                    "@timestamp":"2021-01-01T03:14:29.000+08:00",
                    "source":"alert_testdb1",
                    "timestamp":"2021-01-01T03:14:29.000+08:00",
                    "@message":"Fri Jan 01 03:14:29 2021 Stopping background process CJQ0",
                    "@tags":[
                        "tag_on_failure",
                        "SGrok: "
                    ],
                    "@collectiontime":"2021-01-01T03:14:40.065+08:00",
                    "@rownumber":24667,
                    "instance":"testdb1",
                    "@ip":"10.0.0.1",
                    "@hostname":"testdb1",
                    "@@id":"18980b4999d9dddba5836c23bbad8866",
                    "@@datasetid":"263322181648384",
                    "@dataset":"pubalert",
                    "@path":"/u01/oracle/diag/rdbms/testdb/testdb1/trace/alert_testdb1.log"
                }
            }
        ]
    }
}

from datax-elasticsearch.

Kestrong commented on June 9, 2024

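Hello author, and thank you for developing the ES reader. After using it for a while, I have three questions:

  1. How can I retrieve aggregated results? With dfs_query_then_fetch, the reader seems to pull the index data first and only then aggregate it. Because the index is large, reads are usually very slow, with a long WaitReaderTime wait. Switching to scan fails with an error saying that search type is no longer supported. Being able to fetch the aggregated data directly might solve this, and the target side would not need to process the data again. A data sample is posted in a separate reply.
  2. Can the reader fetch data directly from a query expression, for example:
    index=public_pubalert* @message in ("ORA-") | stats count by date_histogram(timestamp,1d)
    That would also work around question 1.
  3. Taking the query in question 2 as an example: if the source ES index has no count column, can the target still output the count column produced by the aggregation?

Thank you very much!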

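Features differ between ES versions, so configure and use the reader according to the features of the version you run.
1. scan fails because your version is newer than 5.0, where the scan and count search_type values are no longer supported. For a large index you should paginate instead; the reader currently offers scroll-based pagination, see the reader documentation for details.
2. Question 2 is unclear; please provide native ES syntax. The reader's search parameter accepts native ES query syntax, and every property under the _source field of the returned result can be read through configuration.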
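
As an illustration of the point about _source, the sketch below shows how configured column names could be mapped onto the hits of a search response. It is not the plugin's implementation; `extract_rows` is a hypothetical helper, and the sample response is a trimmed copy of the data sample posted in this thread.

```python
from typing import Any, Dict, List


def extract_rows(response: Dict[str, Any], columns: List[str]) -> List[List[Any]]:
    """Build one row per hit, reading each configured column from the hit's _source."""
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        # A column that is absent from _source becomes None instead of raising.
        rows.append([source.get(name) for name in columns])
    return rows


# Trimmed version of the sample response posted in this thread.
sample = {
    "hits": {
        "total": 13844,
        "hits": [
            {
                "_source": {
                    "@timestamp": "2021-01-01T03:14:29.000+08:00",
                    "instance": "testdb1",
                    "@ip": "10.0.0.1",
                }
            }
        ],
    }
}

print(extract_rows(sample, ["@timestamp", "instance", "@ip"]))
```

Only fields that exist under _source can be selected this way; anything returned under a different top-level key of the response is not visible to such a mapping.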


john5480 commented on June 9, 2024

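Thanks for the reply!
I am running ES 5.4.3. The query below uses native ES syntax; the intent is to take the maximum value for each second.
However, the max function has no aggregating effect: every value for each second is returned.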

{
  "job": {
    "setting": {
      "speed": {
        "byte": 1048576000
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "elasticsearchreader",
          "parameter": {
            "endpoint": "http://10.0.0.1:9200",
            "accessId": "xxxx",
            "accessKey": "xxxx",
            "index": "index_1*",
            "type": "log",
            "searchType": "dfs_query_then_fetch",
            "headers": {
            },
            "scroll": "3m",
            "search": [
              {
                "from": 0,
                "query": {
                  "bool": {
                    "must": [
                      {
                        "range": {
                          "@timestamp": {
                            "gte": "now-1d",
                            "lte": "now",
                            "format": "yyyy-MM-dd-HH",
                            "time_zone": "+08:00"
                          }
                        }
                      },
                      {
                        "wildcard": {
                          "@ip": {
                            "wildcard": "*10.0.0.20*",
                            "boost": 1
                          }
                        }
                      }
                    ],
                    "disable_coord": false,
                    "adjust_pure_negative": true,
                    "boost": 1
                  }
                },
                "aggregations": {
                  "date_histogram_timestamp": {
                    "date_histogram": {
                      "format": "HH:mm:ss",
                      "keyed": false,
                      "field": "timestamp",
                      "min_doc_count": 0,
                      "interval": "1m",
                      "offset": 0,
                      "order": {
                        "_key": "asc"
                      },
                      "time_zone": "Asia/Shanghai"
                    },
                    "aggregations": {
                      "max_delays": {
                        "max": {
                          "field": "delays"
                        }
                      }
                    }
                  }
                }
              }
            ],
            "table":{
              "name": "index_1*",
              "column": [
                {
                  "name": "@timestamp"
                },
                {
                  "name": "ip_dst"
                },
                {
                  "name": "trans_count"
                }
              ]
            }
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "path": "/tmp/out",
            "fileName":"new-inflxudb.csv",
            "print": true,
            "encoding": "UTF-8"
          }
        }
      }
    ]
  }
}



john5480 commented on June 9, 2024

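ES officially supports returning only the aggregation results: adding the "size": 0 parameter is enough. With datax, however, that setting exports 0 rows. What is the reason?
Documentation reference: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/returning-only-agg-results.html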
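
For reference, the aggregation-only request described in the linked documentation can be sketched as follows. `aggregation_only` is a hypothetical helper, not part of datax or of any ES client; it only returns a copy of a search body with "size" set to 0, so that the response carries aggregations and an empty hits list.

```python
import copy
from typing import Any, Dict


def aggregation_only(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a search body that asks ES for aggregations but no hits."""
    result = copy.deepcopy(body)  # leave the caller's body untouched
    result["size"] = 0            # "size": 0 suppresses the hits list
    return result


# Reduced form of the aggregation posted in this thread.
search_body = {
    "query": {"range": {"@timestamp": {"gte": "now-1d", "lte": "now"}}},
    "aggregations": {
        "date_histogram_timestamp": {
            "date_histogram": {"field": "timestamp", "interval": "1m"},
            "aggregations": {"max_delays": {"max": {"field": "delays"}}},
        }
    },
}

request = aggregation_only(search_body)
print(request["size"])
```

With such a request the hits list in the response is empty, so any consumer that builds its rows from the hits has nothing to emit.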


Kestrong commented on June 9, 2024

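@john5480 1. I will not answer query-syntax questions here, only questions about using the plugin and its features. (Setting your goal aside, your syntax also has problems: the fields do not match the index structure.) 2. The plugin does not support reading aggregation results; it can only read properties under the _source node. I suggest doing the data analysis at the application layer, since the plugin is meant only for data synchronization. If you really need this feature, you can develop it yourself or look for an alternative.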
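
A minimal sketch of what application-layer handling could look like, assuming the search is sent to ES directly rather than through the plugin. `flatten_buckets` is a hypothetical helper; the aggregation names come from the query posted above, and the bucket values in the sample response are invented for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[int], Optional[float]]


def flatten_buckets(response: Dict[str, Any], agg_name: str, metric_name: str) -> List[Row]:
    """Turn date_histogram buckets into (time, doc_count, metric value) rows."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    rows: List[Row] = []
    for bucket in buckets:
        metric = bucket.get(metric_name, {})
        # A max over an empty bucket comes back as null, which maps to None here.
        rows.append((bucket.get("key_as_string"), bucket.get("doc_count"), metric.get("value")))
    return rows


# Invented response in the shape ES returns for a date_histogram with a max sub-aggregation.
agg_response = {
    "hits": {"total": 3, "hits": []},
    "aggregations": {
        "date_histogram_timestamp": {
            "buckets": [
                {"key_as_string": "03:14:00", "key": 1609442040000, "doc_count": 2, "max_delays": {"value": 17.0}},
                {"key_as_string": "03:15:00", "key": 1609442100000, "doc_count": 0, "max_delays": {"value": None}},
                {"key_as_string": "03:16:00", "key": 1609442160000, "doc_count": 1, "max_delays": {"value": 5.0}},
            ]
        }
    },
}

for row in flatten_buckets(agg_response, "date_histogram_timestamp", "max_delays"):
    print(row)
```

Each bucket becomes one row, so the doc_count and max columns exist in the output even though the source index has no such fields.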


john5480 commented on June 9, 2024

OK, thank you very much.

