BigqueryGateway

Elasticsearch stuff

List all indices

$ curl -s http://es-hist-test-01.totango:9200/_cat/indices\?v
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   291_2018_07_1_idx   9UVGYooMS9-_E_BpkjlqGQ   1   1          0            0       320b           160b
green  open   244_2019_03_1_idx   2xlIxf1qQ6G1mEmO5dBbPA   2   1          0            0       636b           318b
green  open   241_2018_07_1_idx   uo-tYaNvTYWg415pCJANJA   1   1          0            0       320b           160b
green  open   271_2019_04_1_idx   7x9sJHkBRwWaDamY9ZJ6NQ   2   1   11797542            0      1.2gb        640.5mb
green  open   515_2018_12_1_idx   vVlYZUKdT4KU1qWXyZreYA   2   1          0            0       640b           320b
green  open   106_2018_10_1_idx   WzDzxS0HS7eX0VqmTbxMWg   2   1       3510           65    836.9kb        418.4kb
green  open   246_2019_01_1_idx   fMCHCD_zRLKe_lcRkZESFw   2   1     220131            0     26.4mb         13.2mb
green  open   82_2018_12_1_idx    iL11jtM0S-Gb-0dOWXnJuQ   2   1   13414523            0      1.2gb        646.7mb

Index sizes for the totango service (indices starting with 880)

$ curl -s http://es-hist-test-01.totango:9200/_cat/indices\?v | awk '{print $3, $9}' | grep '^880_' | sort
880_2018_04_1_idx 1gb
880_2018_05_1_idx 5.2gb
880_2018_06_1_idx 5.6gb
880_2018_07_1_idx 5.9gb
880_2018_08_1_idx 7.4gb
880_2018_09_1_idx 6.6gb
880_2018_10_1_idx 7.8gb
880_2018_11_1_idx 7.7gb
880_2018_12_1_idx 9.9gb
880_2019_01_1_idx 8gb
880_2019_02_1_idx 9gb
880_2019_03_1_idx 8.6gb
880_2019_04_1_idx 17.1gb
880_2019_05_1_idx 632.2mb
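
The same filtering can be done on the raw `_cat/indices` text when the sizes are needed as numbers. The following is a minimal sketch, not part of this repository: the function names are hypothetical, and the index naming pattern `<service_id>_<year>_<month>_<n>_idx` is an assumption inferred from the listings above.

```python
import re

# Assumed naming pattern: <service_id>_<year>_<month>_<n>_idx
INDEX_NAME = re.compile(r"^(\d+)_(\d{4})_(\d{2})_(\d+)_idx$")

# Elasticsearch reports sizes in powers of 1024.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def size_to_bytes(text):
    """Convert a size such as '836.9kb' or '1gb' into a number of bytes."""
    number, unit = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text).groups()
    return float(number) * UNITS[unit]


def service_indices(cat_output, service_id):
    """Return sorted (index name, store size in bytes) pairs for one service."""
    rows = []
    for line in cat_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) < 9:
            continue
        name = INDEX_NAME.match(columns[2])
        if name and name.group(1) == service_id:
            # Column 3 is the index name, column 9 is store.size.
            rows.append((columns[2], size_to_bytes(columns[8])))
    return sorted(rows)
```

Calling `service_indices(output, "880")` on the output of the first curl command would return the 880 indices in name order, which for this naming scheme is also chronological order.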

Dumping the historical prod cluster state

$ 

Get the mappings

$ curl http://es-hist-test-01.totango:9200/160_2019_02_1_idx/_mappings | json_pp
$ curl -XPOST http://es-hist-test-01.totango:9200/160_2019_02_1_idx/_search\?size\=5 > ./5_rows.json

Determining the bigquery schema from the elasticsearch mapping

Let's use the "products" object as an example

products' elasticsearch mapping

"products" : {
  "properties" : {
     "all" : {
        "type" : "keyword"
     },
     "direct" : {
        "type" : "keyword"
     }
  }
}

In Elasticsearch, any field can contain zero or more values (of the same type), and the mapping does not say which, so let's look at the data.

Get some sample documents that have the "products" field, to see its data structure

$ curl -s http://es-hist-test-01.totango:9200/248_2019_05_1_idx/_search -d '{"query":{"query_string":{"fields":["products"],"query":"*"}}}'

Extracting the products object

"products":{"all":["Transportation","Marketing"],"direct":["Transportation","Marketing"]}

So both "all" and "direct" hold arrays of strings, which means they need to be repeated fields in BigQuery.
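
Checking sample documents by eye does not scale to a large mapping, so the check can be automated. The following is a minimal sketch, not code from this repository: `repeated_paths` is a hypothetical helper that walks one `_source` document and collects the dotted paths whose values are lists. It does not descend into lists of objects.

```python
def repeated_paths(doc, prefix=""):
    """Return the dotted paths in a document whose values are lists."""
    found = set()
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, list):
            found.add(path)
        elif isinstance(value, dict):
            # Recurse into nested objects such as "products".
            found |= repeated_paths(value, path)
    return found


source = {
    "products": {
        "all": ["Transportation", "Marketing"],
        "direct": ["Transportation", "Marketing"],
    }
}
print(sorted(repeated_paths(source)))  # ['products.all', 'products.direct']
```

A field that happens to hold a single value in every sampled document will not be detected this way, so the union over many documents is more reliable than a single sample.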

The bigquery schema will be

{
	"type": "record",
	"name": "products",
	"mode": "nullable",
	"fields": [{
		"type": "string",
		"name": "all",
		"mode": "repeated"
	}, {
		"type": "string",
		"name": "direct",
		"mode": "repeated"
	}]
}
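
The conversion above can be sketched in code. This is an illustration under stated assumptions, not the implementation used by this project: the function name `es_field_to_bq` and the type table `ES_TO_BQ_TYPE` are hypothetical, the type table is a guess that only covers common Elasticsearch types, and the set of repeated fields has to be supplied by the caller because the mapping does not contain it.

```python
import json

# Assumed Elasticsearch-to-BigQuery type mapping; extend as needed.
ES_TO_BQ_TYPE = {
    "keyword": "string",
    "text": "string",
    "long": "integer",
    "integer": "integer",
    "double": "float",
    "float": "float",
    "boolean": "boolean",
    "date": "timestamp",
}


def es_field_to_bq(name, es_field, repeated, path=""):
    """Convert one Elasticsearch mapping entry into a BigQuery schema field."""
    full_path = f"{path}.{name}" if path else name
    mode = "repeated" if full_path in repeated else "nullable"
    if "properties" in es_field:
        # An object mapping becomes a BigQuery record with nested fields.
        return {
            "type": "record",
            "name": name,
            "mode": mode,
            "fields": [
                es_field_to_bq(child, child_field, repeated, full_path)
                for child, child_field in es_field["properties"].items()
            ],
        }
    return {"type": ES_TO_BQ_TYPE[es_field["type"]], "name": name, "mode": mode}


products_mapping = {
    "properties": {
        "all": {"type": "keyword"},
        "direct": {"type": "keyword"},
    }
}
# Multi-valued fields, as observed in the sample document.
repeated = {"products.all", "products.direct"}

schema = es_field_to_bq("products", products_mapping, repeated)
print(json.dumps(schema, indent=2))
```

For the "products" mapping this prints the same schema as the one written by hand above.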

Grafana

Historical production cluster

http://seer-prod/d/000000046/elasticsearch-client?orgId=1&from=now%2Fw&to=now%2Fw&var-environment=prod&var-serviceId=All&var-component=application&var-activity=All&var-documentType=historical_account
http://seer-prod/d/000000038/elasticsearch?orgId=1&var-env=prod&var-cluster=hist&var-host=All&from=now-24h&to=now
http://seer-prod/d/000000038/elasticsearch?panelId=7&fullscreen&orgId=1&var-env=prod&var-cluster=hist&var-host=All&from=now%2Fw&to=now%2Fw

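Vim: put each document on its own line

In a Vim substitution, \r in the replacement breaks the line, while \n inserts a NUL character, so only \r is used here.

:%s/"_source":{/\r{/g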

Working with elasticsearch scroll

$ curl -s -XPOST http://es-hist-test-01.totango:9200/160_2019_04_1_idx/_search\?scroll\=1m -d '{ "size": 5,"query": { "range" : { "date" : { "gte" : "now-1d/d" } } } }'
{"_scroll_id":"the-scroll-id-AbC12","took":1,"timed_out":false,"_shards":{"total":2,"successful":2,"failed":0},"hits":{"total":0,"max_score":null,"hits":[{doc-a},{doc-b},{doc-c},{doc-k}]}}

$ curl -s -XPOST http://es-hist-test-01.totango:9200/_search/scroll -d '{"scroll": "1m", "scroll_id":"the-scroll-id-AbC12"}'
{"_scroll_id":"the-scroll-id-AbC12","took":1,"timed_out":false,"_shards":{"total":2,"successful":2,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

$ curl -s -XPOST http://es-hist-test-01.totango:9200/_search/scroll -d '{"scroll": "1m", "scroll_id":"the-scroll-id-AbC12"}'
{"error":{"root_cause":[{"type":"search_context_missing_exception","reason":"No search context found for id [51641]"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":-1,"index":null,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [51641]"}}],"caused_by":{"type":"search_context_missing_exception","reason":"No search context found for id [51641]"}},"status":404}
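
The curl session above follows the usual scroll pattern: open a scroll with the first search, then keep posting the returned scroll ID to `/_search/scroll` until a page comes back empty. Below is a minimal sketch of that loop using only the Python standard library. The function names are hypothetical, and the `post` parameter exists only so the loop can be exercised without a live cluster.

```python
import json
import urllib.request


def http_post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def scroll_hits(endpoint, index, query, size=5, keep_alive="1m", post=http_post_json):
    """Yield every hit for a query, one scroll page at a time."""
    page = post(
        f"{endpoint}/{index}/_search?scroll={keep_alive}",
        {"size": size, "query": query},
    )
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Each response carries the scroll ID to use for the next page.
        page = post(
            f"{endpoint}/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
```

With the values from the session above, the call would be `scroll_hits("http://es-hist-test-01.totango:9200", "160_2019_04_1_idx", {"range": {"date": {"gte": "now-1d/d"}}})`. The loop stops at the first empty page, so it never reuses a scroll ID after the scroll is exhausted, and each page has to be requested within the keep-alive window.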

Creating the BigQuery table

$ mix run ./lib/bigquery_gateway/create_bq_table.exs --project "promenade-222313" --dataset "integration_hub" --table "historical" --schema "./priv/historical_schema.json" --partition-on "date" --cluster-on "service_id"

Copying from Elasticsearch to BigQuery

$ mix run ./lib/bigquery_gateway/es_2bq_copier.exs --es-endpoint "http://es-hist-test-01.totango:9200" --es-index "880_2019_05_1_idx" --es-days-back 1 --bq-project "promenade-222313" --bq-dataset "integration_hub" --bq-table "historical" --es-scroll-size 2000 --bq-batch-size 500
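
The copier command takes a scroll size of 2000 and a BigQuery batch size of 500, which suggests that each scroll page is split into smaller batches before it is written. The sketch below shows one way to do that split. It is an illustration only and says nothing about how `es_2bq_copier.exs` is actually implemented; the helper name `batches` is hypothetical.

```python
from itertools import islice


def batches(items, batch_size):
    """Split any iterable into lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# One scroll page of 2000 documents becomes four batches of 500.
page = [{"doc": n} for n in range(2000)]
print([len(batch) for batch in batches(page, 500)])  # [500, 500, 500, 500]
```

Because the helper accepts any iterable, it can also consume a stream of hits across scroll pages, in which case only the final batch may be smaller than the batch size.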

Contributors

izhar
