contentsquare / chproxy
Open-Source ClickHouse http proxy and load balancer
Home Page: https://www.chproxy.org/
License: MIT License
chproxy is not working with the https://github.com/yandex/clickhouse-odbc/releases/tag/v1.0.0.20190201 driver
Hi!
Is there any way to configure a proxy to query other proxies?
I found that a proxy can't pass the heartbeat check from another proxy (unauthorized).
And I would like to configure compression between proxies, if cascading is possible.
Thanks!
I use clickhouse-jdbc to write data to ClickHouse (1.1.54318),
with the properties compress and decompress set to true.
The JDBC driver uses LZ4 to compress and decompress the data, and it can transfer data to ClickHouse (1.1.54318) directly.
When I connect through chproxy, it doesn't work properly.
Error log:
Caused by: java.io.IOException: Magic is not correct: 103
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.readNextBlock(ClickHouseLZ4Stream.java:93) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.checkNext(ClickHouseLZ4Stream.java:74) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.read(ClickHouseLZ4Stream.java:50) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.StreamSplitter.readFromStream(StreamSplitter.java:85) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.StreamSplitter.next(StreamSplitter.java:47) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.ClickHouseResultSet.<init>(ClickHouseResultSet.java:65) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:117) ~[clickhouse-jdbc-0.1.34.jar:na]
config.yml
hack_me_please: false
server:
  http:
    listen_addr: "0.0.0.0:9090"
    allowed_networks: ["192.168.1.0/24"]
  metrics:
    allowed_networks: ["192.168.1.0/24"]
users:
  - name: "default"
    to_cluster: "default"
    to_user: "default"
    allow_cors: true
We have some clients that send "max_result_rows" (I'm looking at you, JDBC connector!), and we liked that chproxy stripped this out, since readonly users can't change this parameter. It would be good if the allowedParams could be configured.
Configuration may be updated without a restart - just send the SIGHUP signal to the chproxy process.
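For example, on a Linux host running a single chproxy process:
$ kill -HUP "$(pidof chproxy)"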
https://github.com/Vertamedia/chproxy/blob/master/main.go#L118
Maybe a setting in the config - enableTCP6?
Is there anything available to deploy this on Kubernetes?
Helm charts or some values.yaml you guys might use?
Is it possible to insert into multiple clusters?
This would be a nice feature, since it could come in handy when migrating to a new cluster.
The only way to connect to the cluster is via HTTPS with a Yandex certificate.
Trying to connect via HTTPS to the rc1b-z5qstya9********.mdb.yandexcloud.net:8443 node, but receiving an error:
ERROR: 2019/10/11 13:19:13 scope.go:638: error while health-checking "rc1b-z5qstya9********.mdb.yandexcloud.net:8443" host: cannot send request in 82.730564ms: Get https://rc1b-z5qstya9********.mdb.yandexcloud.net:8443: x509: certificate signed by unknown authority
Scheme: "https" is set.
But how do I add a certificate to the "cluster" section?
Another solution would be to allow insecure HTTPS connections. Is that possible?
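chproxy is written in Go, and Go's TLS stack trusts the system CA store, so one hedged workaround is to install the Yandex Cloud CA system-wide (Debian/Ubuntu paths assumed; the certificate URL is the one documented by Yandex Cloud):
$ sudo curl -o /usr/local/share/ca-certificates/yandex-cloud.crt \
    https://storage.yandexcloud.net/cloud-certs/CA.pem
$ sudo update-ca-certificates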
Grafana generates queries with the time in seconds, which makes them very hard to cache (if I use a time range from some date to now, "now" changes every second).
It would be very cool if chproxy could round the dates in the SQL with some options: by seconds, minutes, hours, or days. I think that feature could be implemented with a regex (see the sketch below).
What do you think?
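A minimal sketch of the regex idea, outside of chproxy: round the unix timestamps that Grafana puts into toDateTime(...) down to a fixed step, so that consecutive refreshes produce an identical query and hence an identical cache key (the pattern and the step are assumptions):

package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var tsRe = regexp.MustCompile(`toDateTime\((\d{10})\)`)

// normalizeTimestamps rounds every matched timestamp down to the nearest
// step in seconds, trading up to `step` seconds of staleness for cacheability.
func normalizeTimestamps(query string, step int64) string {
	return tsRe.ReplaceAllStringFunc(query, func(m string) string {
		ts, _ := strconv.ParseInt(tsRe.FindStringSubmatch(m)[1], 10, 64)
		return fmt.Sprintf("toDateTime(%d)", ts-ts%step)
	})
}

func main() {
	q := "SELECT count() FROM t WHERE ts >= toDateTime(1538611207) AND ts < toDateTime(1538611267)"
	fmt.Println(normalizeTimestamps(q, 60)) // both timestamps rounded down to the minute
}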
Please add support for the enable_http_compression parameter to get gzipped responses from the server.
According to the ClickHouse doc https://clickhouse.yandex/docs/en/interfaces/http_interface/:
"Also standard gzip-based HTTP compression can be used. To send gzip compressed POST data, just add Content-Encoding: gzip to the request headers and gzip the POST body. To get the response compressed, you need to add Accept-Encoding: gzip to the request headers and turn on the ClickHouse setting called enable_http_compression."
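For illustration, this is what the documented behavior looks like against a ClickHouse node directly (host and credentials are placeholders); this is the request chproxy would need to forward:
$ echo 'SELECT 1' | curl -sS -H 'Accept-Encoding: gzip' \
    --data-binary @- 'http://localhost:8123/?enable_http_compression=1' | gunzip
1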
users:
  - name: "insert"
    to_cluster: "stats-raw"
    to_user: "default"
clusters:
  - name: "stats-raw"
    # Requests are spread in `round-robin` + `least-loaded` fashion among nodes.
    # Unreachable and unhealthy nodes are skipped.
    nodes: [
      "10.10.10.1:8123",
      "10.10.10.2:8123",
      "10.10.10.3:8123",
    ]
Take the above configuration as an example.
What I don't understand is how chproxy executes INSERT statements.
I.e., given requests A, B, and C, where A has 10 INSERT statements inserting 10 rows, B has 100 INSERTs, and C has 1000 INSERTs:
will A, B, and C be sent to 10.10.10.1, 10.10.10.2, and 10.10.10.3 respectively, so that 10.10.10.1 has 10 records in its local table, 10.10.10.2 has 100, and 10.10.10.3 has 1000?
Or will the total of 10+100+1000=1110 rows be inserted evenly across the three nodes?
Sorry for my poor English.
I have several clients written over JDBC using chproxy. I have a standard configuration:
log_debug: true
hack_me_please: true
server:
  http:
    listen_addr: ":9090"
users:
  - name: "..."
clusters:
  - name: ".."
    # Requests are spread in `round-robin` + `least-loaded` fashion among nodes.
    # Unreachable and unhealthy nodes are skipped.
    nodes: [
      ....
    ]
And when I try to send a request with external data (using the POST method, I suppose), chproxy adds the parameter no_cache, and ClickHouse throws an exception: Code: 115, e.displayText() = DB::Exception: Unknown setting no_cache.
I suppose it would be nice to remove this:
// disable cache for external_data queries
params.Set("no_cache", "1")
log.Debugf("external data params detected - cache will be disabled")
Right now I can set cache settings in seconds, but for heavy requests (in my project) I use the cache for 3-7 days, so it would be good if I could set the setting in days.
Example:
Now I use: expire: 259000s
I want: expire: 3d
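A hedged workaround until a day unit exists, assuming the duration parser accepts hour units (the field names are taken from the cache configs elsewhere on this page):
caches:
  - name: "longterm"
    dir: "/data/cache"
    max_size: 150Mb
    expire: 72h   # 3 days expressed in hours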
Hi,
checking whether chproxy is still alive would be much easier if it had an internal status/health page that returns 200 if everything is okay, like ClickHouse answers with 200 on / and /ping.
Thanks,
bernd
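A minimal sketch of such a status page (not chproxy's actual code; the port and path are assumptions):

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Answer 200 with a tiny body so load balancers and monitoring can
	// check liveness without issuing a real query.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Ok.")
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}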
We want to add users on the fly, without restarting.
For now, we try to modify config.yml to add user entries,
but chproxy needs a restart to apply configuration changes.
Chproxy users will lose their connections if we restart,
so that looks like a bad option.
At the same time, if we modify the configuration in ClickHouse, it applies without restarting.
Could we have this feature in chproxy? Thanks.
Running 'go test' on master results in:
./scope.go:82:9: Sprintf format %s has arg time.Since(s.startTime).Nanoseconds() / 1000.0 of wrong type int64
This was introduced in #70, which changed the value from a Duration to an int64 without updating the format specifier.
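A hedged sketch of the two obvious fixes, reduced to a standalone program (the surrounding chproxy code is not reproduced here):

package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now()
	// Nanoseconds()/1000.0 is still an int64 (the untyped constant 1000.0
	// converts to int64), so it does not match the %s verb. Either print
	// the integer microseconds with %d ...
	fmt.Printf("Duration: %d µs\n", time.Since(start).Nanoseconds()/1000)
	// ... or format the time.Duration itself, which does satisfy %s.
	fmt.Printf("Duration: %s\n", time.Since(start))
}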
By design, all passwords are stored in the config file in plaintext. During config loading/reloading, the whole config, including the passwords, is printed to the log:
https://github.com/Vertamedia/chproxy/blob/master/main.go#L272
We're running chproxy in Docker and collecting the logs from the service for further analysis. Passwords stored in a log file create a security issue for us.
Also, I think it's useful to print the loaded config while debugging (with log_debug: true), but it makes no sense during normal operation.
Is it possible to print the config only with log_debug: true? Or at least hide the passwords somehow?
Thanks!
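One hedged way to address this, sketched outside of chproxy: blank out password values before the config text reaches the log (the regex assumes the password: "..." layout used in the configs on this page):

package main

import (
	"fmt"
	"regexp"
)

var passwordRe = regexp.MustCompile(`(password:\s*)"[^"]*"`)

// maskPasswords replaces every password value with *** so the loaded
// config can be logged without leaking credentials.
func maskPasswords(cfg string) string {
	return passwordRe.ReplaceAllString(cfg, `${1}"***"`)
}

func main() {
	fmt.Println(maskPasswords(`users:
  - name: "grafana"
    password: "secret"`))
}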
Hi,
my English is poor, but I have a question: I want to delete data across 2 shards. How do I do it? Thanks.
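For reference, the usual ClickHouse way to delete the same data on every shard is an ON CLUSTER mutation; the database, table, cluster name, and condition below are placeholders:

ALTER TABLE db.local_table ON CLUSTER my_cluster DELETE WHERE event_date < '2019-01-01';

This runs the delete mutation on all shards at once instead of node by node.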
There should be a possibility to set read/write timeouts for the proxy via the config, as sketched below.
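A hedged sketch of what this could look like; read_timeout already appears in a config later on this page, while write_timeout is assumed by analogy and should be checked against the config reference:
server:
  http:
    listen_addr: ":9090"
    read_timeout: 5m    # maximum time to read a client request
    write_timeout: 10m  # assumed by analogy with read_timeout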
We are setting format_schema as a query parameter in our POST request; however, when chproxy proxies the request, this parameter is not included, and ClickHouse errors with DB::Exception: The format Protobuf requires a schema. The 'format_schema' setting should be set (version 19.14.6.12 (official build)).
This would be resolved if chproxy forwarded this query parameter to the server.
I think identical requests should be refused after the grace time elapses, but I noticed that all remaining requests are delegated to the ClickHouse servers after the first request times out, and no option for this can be configured. When grace_time: -1s is set, the server can't be started and displays the error: can't load config "/data/service/ck-proxy/config.yml": not a valid duration string: "-1s"
The log function is very simple now. When using it in a production environment, I think it would be better to have some more functionality: first, log rotation with a configurable log path; second, a configurable limit on the log file size.
Does the core team have this in the roadmap? Or we can discuss it here, and if more people think it is useful, I can implement it after the details have been discussed.
Actually, this might be a bit out of scope for a plain proxy, but in some of our projects the only reason for a client-facing, self-written middleware instead of just chproxy is that we want another abstraction layer in between, for example to prevent users from running super expensive queries.
I see that this is anything but simple and fast, but I think it would increase the value of chproxy a lot.
Would love to see support for prepared statements in the future.
Let me know if you need more details.
Best regards,
Marcel
We have 3 ClickHouse nodes (3 shards) in a cluster. The Readme notes for INSERTs that:
..The routing logic may be embedded either directly into applications generating INSERTs or may be moved to a proxy. Proxy approach is better since it allows re-configuring ClickHouse cluster without modification of application configs and without application downtime.
We are inserting in batches using begin->prepare->(exec in a loop)->commit, so each individual commit will contain multiple rows that have to be distributed among the shards.
If we use a distributed table, then the ClickHouse shard decides, based on the sharding logic, which shard each row has to go to. How will the proxy know which shard each row should be written into, i.e., how does the proxy know the sharding logic?
Is it possible with chproxy and inserts to say that the
We are currently using chproxy version 1.12.0.
When running heavier queries that take a long time, they are killed after some time, despite having the max_execution_time value set to 0 in both chproxy and ClickHouse.
Here is one example:
$ curl -v "[email protected]:9090/?query=SELECT+toDate%28%60date_time%60%2C+%27UTC%27%29+as+%60ex_ch_date_time%60%2C+sumMerge%28%60invalid_impression%60%29+as+%60ex_ch_invalid_impression%60%2C+sumMerge%28%60advertiser_value_usd%60%29+as+%60ex_ch_advertiser_value_usd%60%2C+sumMerge%28%60impression%60%29+as+%60ex_ch_impression%60%2C+sumMerge%28%60invalid_click%60%29+as+%60ex_ch_invalid_click%60%2C+sumMerge%28%60advertiser_value_eur%60%29+as+%60ex_ch_advertiser_value_eur%60%2C+sumMerge%28%60margin_eur%60%29+as+%60ex_ch_margin_eur%60%2C+sumMerge%28%60click%60%29+as+%60ex_ch_click%60%2C+sumMerge%28%60publisher_value_eur%60%29+as+%60ex_ch_publisher_value_eur%60%2C+sumMerge%28%60margin_usd%60%29+as+%60ex_ch_margin_usd%60%2C+sumMerge%28%60publisher_value_usd%60%29+as+%60ex_ch_publisher_value_usd%60+FROM+%60dist_mv_agg_day_new%60+WHERE+%28%60date_time%60+%3E%3D+toDateTime%28%272018-09-04+00%3A00%3A00%27%2C+%27UTC%27%29+and+%60date_time%60+%3C%3D+toDateTime%28%272018-10-04+00%3A00%3A00%27%2C+%27UTC%27%29%29+GROUP+BY+date_time+FORMAT+JSONEachRow"
* Trying 10.240.0.23...
* TCP_NODELAY set
* Connected to 10.240.0.23 (10.240.0.23) port 9090 (#0)
* Server auth using Basic with user 'benchmark'
> GET /?query=SELECT+toDate%28%60date_time%60%2C+%27UTC%27%29+as+%60ex_ch_date_time%60%2C+sumMerge%28%60invalid_impression%60%29+as+%60ex_ch_invalid_impression%60%2C+sumMerge%28%60advertiser_value_usd%60%29+as+%60ex_ch_advertiser_value_usd%60%2C+sumMerge%28%60impression%60%29+as+%60ex_ch_impression%60%2C+sumMerge%28%60invalid_click%60%29+as+%60ex_ch_invalid_click%60%2C+sumMerge%28%60advertiser_value_eur%60%29+as+%60ex_ch_advertiser_value_eur%60%2C+sumMerge%28%60margin_eur%60%29+as+%60ex_ch_margin_eur%60%2C+sumMerge%28%60click%60%29+as+%60ex_ch_click%60%2C+sumMerge%28%60publisher_value_eur%60%29+as+%60ex_ch_publisher_value_eur%60%2C+sumMerge%28%60margin_usd%60%29+as+%60ex_ch_margin_usd%60%2C+sumMerge%28%60publisher_value_usd%60%29+as+%60ex_ch_publisher_value_usd%60+FROM+%60dist_mv_agg_day_new%60+WHERE+%28%60date_time%60+%3E%3D+toDateTime%28%272018-09-04+00%3A00%3A00%27%2C+%27UTC%27%29+and+%60date_time%60+%3C%3D+toDateTime%28%272018-10-04+00%3A00%3A00%27%2C+%27UTC%27%29%29+GROUP+BY+date_time+FORMAT+JSONEachRow HTTP/1.1
> Host: 10.240.0.23:9090
> Authorization: Basic YmVuY2htYXJrOg==
> User-Agent: curl/7.61.1
> Accept: */*
>
* Empty reply from server
* Connection #0 to host 10.240.0.23 left intact
curl: (52) Empty reply from server
As you can see, the connection is killed. Here are some relevant logs from chproxy:
Oct 04 11:34:41 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:34:41 proxy.go:74: [ Id: 155A63380D95D3C6; User "benchmark"(1) proxying as "readonly"(1) to "10.240.0.16:8123"(1); RemoteAddr: "192.168.11.231:49580"; LocalAddr: "10.240.0.23:9090"; Duration: 51.858µs ]: request start
Oct 04 11:37:20 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:37:20 proxy.go:199: [ Id: 155A63380D95D3C6; User "benchmark"(1) proxying as "readonly"(1) to "10.240.0.16:8123"(1); RemoteAddr: "192.168.11.231:49580"; LocalAddr: "10.240.0.23:9090"; Duration: 2m39.080545361s ]: remote client closed the connection in 2m39.080353788s; query: "SELECT toDate(`date_time`, 'UTC') as `ex_ch_date_time`, sumMerge(`invalid_impression`) as `ex_ch_invalid_impression`, sumMerge(`advertiser_value_usd`) as `ex_ch_advertiser_value_usd`, sumMerge(`impression`) as `ex_ch_impression`, sumMerge(`invalid_click`) as `ex_ch_invalid_click`, sumMerge(`advertiser_value_eur`) as `ex_ch_advertiser_value_eur`, sumMerge(`margin_eur`) as `ex_ch_margin_eur`, sumMerge(`click`) as `ex_ch_click`, sumMerge(`publisher_value_eur`) as `ex_ch_publisher_value_eur`, sumMerge(`margin_usd`) as `ex_ch_margin_usd`, sumMerge(`publisher_value_usd`) as `ex_ch_publisher_value_usd` FROM `dist_mv_agg_day_new` WHERE (`date_time` >= toDateTime('2018-09-04 00:00:00', 'UTC') and `date_time` <= toDateTime('2018-10-04 00:00:00', 'UTC')) GROUP BY date_time FORMAT JSONEachRow"
Oct 04 11:37:20 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:37:20 scope.go:243: killing the query with query_id=155A63380D95D3C6
Oct 04 11:37:20 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:37:20 scope.go:283: killed the query with query_id=155A63380D95D3C6; respBody: ""
Oct 04 11:37:20 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:37:20 proxy.go:115: [ Id: 155A63380D95D3C6; User "benchmark"(1) proxying as "readonly"(1) to "10.240.0.16:8123"(1); RemoteAddr: "192.168.11.231:49580"; LocalAddr: "10.240.0.23:9090"; Duration: 2m39.161807652s ]: request failure: non-200 status code 499; query: "SELECT toDate(`date_time`, 'UTC') as `ex_ch_date_time`, sumMerge(`invalid_impression`) as `ex_ch_invalid_impression`, sumMerge(`advertiser_value_usd`) as `ex_ch_advertiser_value_usd`, sumMerge(`impression`) as `ex_ch_impression`, sumMerge(`invalid_click`) as `ex_ch_invalid_click`, sumMerge(`advertiser_value_eur`) as `ex_ch_advertiser_value_eur`, sumMerge(`margin_eur`) as `ex_ch_margin_eur`, sumMerge(`click`) as `ex_ch_click`, sumMerge(`publisher_value_eur`) as `ex_ch_publisher_value_eur`, sumMerge(`margin_usd`) as `ex_ch_margin_usd`, sumMerge(`publisher_value_usd`) as `ex_ch_publisher_value_usd` FROM `dist_mv_agg_day_new` WHERE (`date_time` >= toDateTime('2018-09-04 00:00:00', 'UTC') and `date_time` <= toDateTime('2018-10-04 00:00:00', 'UTC')) GROUP BY date_time FORMAT JSONEachRow"; URL: "http://10.240.0.16:8123/?query=SELECT+toDate%28%60date_time%60%2C+%27UTC%27%29+as+%60ex_ch_date_time%60%2C+sumMerge%28%60invalid_impression%60%29+as+%60ex_ch_invalid_impression%60%2C+sumMerge%28%60advertiser_value_usd%60%29+as+%60ex_ch_advertiser_value_usd%60%2C+sumMerge%28%60impression%60%29+as+%60ex_ch_impression%60%2C+sumMerge%28%60invalid_click%60%29+as+%60ex_ch_invalid_click%60%2C+sumMerge%28%60advertiser_value_eur%60%29+as+%60ex_ch_advertiser_value_eur%60%2C+sumMerge%28%60margin_eur%60%29+as+%60ex_ch_margin_eur%60%2C+sumMerge%28%60click%60%29+as+%60ex_ch_click%60%2C+sumMerge%28%60publisher_value_eur%60%29+as+%60ex_ch_publisher_value_eur%60%2C+sumMerge%28%60margin_usd%60%29+as+%60ex_ch_margin_usd%60%2C+sumMerge%28%60publisher_value_usd%60%29+as+%60ex_ch_publisher_value_usd%60+FROM+%60dist_mv_agg_day_new%60+WHERE+%28%60date_time%60+%3E%3D+toDateTime%28%272018-09-04+00%3A00%3A00%27%2C+%27UTC%27%29+and+%60date_time%60+%3C%3D+toDateTime%28%272018-10-04+00%3A00%3A00%27%2C+%27UTC%27%29%29+GROUP+BY+date_time+FORMAT+JSONEachRow&query_id=155A63380D95D3C6"
And from clickhouse:
2018.10.04 11:34:41.713028 [ 11290 ] <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: GET, Address: [::ffff:10.240.0.23]:43198, User-Agent: RemoteAddr: 192.168.11.231:49580; LocalAddr: 10.240.0.23:9090; CHProxy-User: benchmark; CHProxy-ClusterUser: readonly; curl/7.61.1
2018.10.04 11:34:41.713126 [ 11290 ] <Trace> HTTPHandler: Request URI: /?query=SELECT+toDate%28%60date_time%60%2C+%27UTC%27%29+as+%60ex_ch_date_time%60%2C+sumMerge%28%60invalid_impression%60%29+as+%60ex_ch_invalid_impression%60%2C+sumMerge%28%60advertiser_value_usd%60%29+as+%60ex_ch_advertiser_value_usd%60%2C+sumMerge%28%60impression%60%29+as+%60ex_ch_impression%60%2C+sumMerge%28%60invalid_click%60%29+as+%60ex_ch_invalid_click%60%2C+sumMerge%28%60advertiser_value_eur%60%29+as+%60ex_ch_advertiser_value_eur%60%2C+sumMerge%28%60margin_eur%60%29+as+%60ex_ch_margin_eur%60%2C+sumMerge%28%60click%60%29+as+%60ex_ch_click%60%2C+sumMerge%28%60publisher_value_eur%60%29+as+%60ex_ch_publisher_value_eur%60%2C+sumMerge%28%60margin_usd%60%29+as+%60ex_ch_margin_usd%60%2C+sumMerge%28%60publisher_value_usd%60%29+as+%60ex_ch_publisher_value_usd%60+FROM+%60dist_mv_agg_day_new%60+WHERE+%28%60date_time%60+%3E%3D+toDateTime%28%272018-09-04+00%3A00%3A00%27%2C+%27UTC%27%29+and+%60date_time%60+%3C%3D+toDateTime%28%272018-10-04+00%3A00%3A00%27%2C+%27UTC%27%29%29+GROUP+BY+date_time+FORMAT+JSONEachRow&query_id=155A63380D95D3C6
2018.10.04 11:34:41.769439 [ 11290 ] <Debug> executeQuery: (from [::ffff:10.240.0.23]:43198, user: readonly, query_id: 155A63380D95D3C6) SELECT toDate(`date_time`, 'UTC') as `ex_ch_date_time`, sumMerge(`invalid_impression`) as `ex_ch_invalid_impression`, sumMerge(`advertiser_value_usd`) as `ex_ch_advertiser_value_usd`, sumMerge(`impression`) as `ex_ch_impression`, sumMerge(`invalid_click`) as `ex_ch_invalid_click`, sumMerge(`advertiser_value_eur`) as `ex_ch_advertiser_value_eur`, sumMerge(`margin_eur`) as `ex_ch_margin_eur`, sumMerge(`click`) as `ex_ch_click`, sumMerge(`publisher_value_eur`) as `ex_ch_publisher_value_eur`, sumMerge(`margin_usd`) as `ex_ch_margin_usd`, sumMerge(`publisher_value_usd`) as `ex_ch_publisher_value_usd` FROM `dist_mv_agg_day_new` WHERE (`date_time` >= toDateTime('2018-09-04 00:00:00', 'UTC') and `date_time` <= toDateTime('2018-10-04 00:00:00', 'UTC')) GROUP BY date_time FORMAT JSONEachRow
2018.10.04 11:34:41.975876 [ 11290 ] <Debug> default..inner.mv_agg_day_new (SelectExecutor): Key condition: (column 0 in [1536019200, +inf)), (column 0 in (-inf, 1538611200]), and
2018.10.04 11:34:41.975914 [ 11290 ] <Debug> default..inner.mv_agg_day_new (SelectExecutor): MinMax index condition: (column 0 in [1536019200, +inf)), (column 0 in (-inf, 1538611200]), and
2018.10.04 11:34:42.092220 [ 11290 ] <Debug> default..inner.mv_agg_day_new (SelectExecutor): Selected 21 parts by date, 21 parts by key, 432494 marks to read from 21 ranges
2018.10.04 11:34:42.096082 [ 11290 ] <Trace> default..inner.mv_agg_day_new (SelectExecutor): Reading approx. 3542990848 rows
2018.10.04 11:34:42.096146 [ 11290 ] <Trace> InterpreterSelectQuery: FetchColumns -> WithMergeableState
2018.10.04 11:34:42.097373 [ 11290 ] <Trace> InterpreterSelectQuery: WithMergeableState -> Complete
2018.10.04 11:34:42.172805 [ 11290 ] <Debug> executeQuery: Query pipeline:
2018.10.04 11:34:42.174612 [ 11290 ] <Trace> Aggregator: Reading blocks of partially aggregated data.
2018.10.04 11:37:20.693869 [ 11290 ] <Trace> Aggregator: Read 5 blocks of partially aggregated data, total 155 rows.
2018.10.04 11:37:20.693938 [ 11290 ] <Trace> Aggregator: Merging partially aggregated single-level data.
2018.10.04 11:37:20.721704 [ 11290 ] <Trace> Aggregator: Merged partially aggregated single-level data.
2018.10.04 11:37:20.739659 [ 11290 ] <Trace> Aggregator: Converting aggregated data to blocks
2018.10.04 11:37:20.739773 [ 11290 ] <Trace> Aggregator: Converted aggregated data to blocks. 31 rows, 0.002 MiB in 0.000 sec. (489777.862 rows/sec., 39.235 MiB/sec.)
2018.10.04 11:37:20.768479 [ 11290 ] <Trace> UnionBlockInputStream: Waiting for threads to finish
2018.10.04 11:37:20.768664 [ 11290 ] <Trace> UnionBlockInputStream: Waited for threads to finish
2018.10.04 11:37:20.768708 [ 11290 ] <Information> executeQuery: Read 17806359988 rows, 3.87 TiB in 158.999 sec., 111990246 rows/sec., 24.93 GiB/sec.
2018.10.04 11:37:20.786022 [ 11290 ] <Trace> UnionBlockInputStream: Waiting for threads to finish
2018.10.04 11:37:20.786042 [ 11290 ] <Trace> UnionBlockInputStream: Waited for threads to finish
2018.10.04 11:37:20.786088 [ 11290 ] <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2018.10.04 11:37:20.792317 [ 11290 ] <Debug> MemoryTracker: Peak memory usage (for query): 65.77 MiB.
2018.10.04 11:37:20.792482 [ 11290 ] <Information> HTTPHandler: Done processing query
2018.10.04 11:37:20.793398 [ 11290 ] <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: POST, Address: [::ffff:10.240.0.23]:43198, User-Agent: Go-http-client/1.1
2018.10.04 11:37:20.793451 [ 11290 ] <Trace> HTTPHandler: Request URI: /
2018.10.04 11:37:20.793667 [ 11290 ] <Debug> executeQuery: (from [::ffff:10.240.0.23]:43198, query_id: de0d107a-1617-4759-b275-96ff578c8d5e) KILL QUERY WHERE query_id = '155A63380D95D3C6'
2018.10.04 11:37:20.873350 [ 11290 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2018.10.04 11:37:20.873699 [ 11290 ] <Information> executeQuery: Read 2 rows, 1.45 KiB in 0.080 sec., 25 rows/sec., 18.17 KiB/sec.
2018.10.04 11:37:20.873732 [ 11290 ] <Debug> MemoryTracker: Peak memory usage (for query): 129.84 KiB.
2018.10.04 11:37:20.873843 [ 11290 ] <Information> HTTPHandler: Done processing query
Is there any reason for such behavior?
Using chproxy makes working with database tools like DBeaver or DbVisualizer uncomfortable, because they use query parameters to limit the data shown.
Example:
POST /?compress=1&password=123&result_overflow_mode=break&extremes=0&max_result_rows=100&user=default&database=default HTTP/1.1
Content-Length: 83
Content-Type: text/plain; charset=UTF-8
Host: localhost
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.2 (Java/1.8.0_181)
SELECT * FROM "default"."table" FORMAT TabSeparatedWithNamesAndTypes;
chproxy cuts the parameters result_overflow_mode, extremes, and max_result_rows, so when I open a table in a database tool, ClickHouse tries to load ALL the data from the table instead of loading only 100 rows.
Please add these parameters to the allowed list.
When I configure chproxy to access ClickHouse on port 9000, the chproxy log shows a connect error. Is there any way to access ClickHouse through chproxy via port 9000?
It would be good if chproxy could have a global cache, not only a per-user one.
When using agents, what are the causes of the following problem?
Caused by: java.lang.Throwable: [ Id: 160857F337DC563A; User "tmplarge"(1) proxying as "default"(1) to "d085126100.aliyun.com:8123"(6); RemoteAddr: "10.13.56.73:51080"; LocalAddr: "10.85.129.101:9090"; Duration: 825 ?s]: cannot reach d085126100.aliyun.com:8123; query: "select timezone()\nFORMAT TabSeparatedWithNamesAndTypes;"
For deploying chproxy via Kubernetes (or similar), it would be good if there was an endpoint like /health that could be used for liveness and readiness checks, e.g.:
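A hedged sketch of how such an endpoint could be wired into a pod spec, assuming chproxy answered on /health on its usual listen port (the endpoint does not exist yet; the path and port are assumptions):
livenessProbe:
  httpGet:
    path: /health
    port: 9090
readinessProbe:
  httpGet:
    path: /health
    port: 9090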
using "jdbc:clickhouse://172.0.0.3:8123/db?insert_deduplicate=0" connect to clickhouse server
the insert_deduplicate is works.
The front is chproxy, using "jdbc:clickhouse://172.0.0.3:9090/db?insert_deduplicate=0" connect to chproxy server, the insert_deduplicate is not works.
Is there a good solution?
Could node weights be supported in a future release?
Trying to post a query with an empty Authorization header leads to a 400 response.
For example, the following fails through chproxy, while it works when submitted directly to a ClickHouse node:
nc 127.0.0.1 8123
POST /?user=default&database=default&compress=1 HTTP/1.1
Authorization
Content-Type: text/plain; charset=UTF-8
User-Agent: Java/11.0.1
Host: localhost:8123
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 55
select timezone() FORMAT TabSeparatedWithNamesAndTypes
Note that this type of request is generated by the https://github.com/blynkkk/clickhouse4j JDBC driver, which is quite popular.
Add the ability to log INSERT and SELECT queries in an audit log for compliance purposes and performance analysis. Log the partial SQL statement, the origin, and the user-id of each query.
It would be good if I could keep the cache in RAM (and perfect if I could set expire times for RAM and HDD separately).
I found that chproxy closes each connection to the upstream server very quickly, within about a second. With hundreds or thousands of concurrent requests, the available ports can easily run out, since most of them sit in the TIME_WAIT state.
Could we introduce some connection-pool ideas to keep a configurable number of active connections to the upstream servers, set a custom keep-alive time per cluster, or honor the ClickHouse keep-alive time in the response headers?
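For reference, a sketch of the standard net/http mechanism in Go for pooling upstream connections; the numbers are illustrative, not chproxy defaults:

package main

import (
	"net/http"
	"time"
)

// newUpstreamClient returns a client whose Transport keeps idle
// connections open for reuse instead of closing them immediately,
// which avoids piling sockets up in TIME_WAIT.
func newUpstreamClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			MaxIdleConns:        1024,
			MaxIdleConnsPerHost: 256,
			IdleConnTimeout:     90 * time.Second,
		},
	}
}

func main() {
	client := newUpstreamClient()
	resp, err := client.Get("http://10.10.10.1:8123/ping")
	if err == nil {
		resp.Body.Close() // closing the body returns the connection to the pool
	}
}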
Is chproxy hashing the user-agent string or something similar into the cache key? I find that even though I can force-refresh the same page a million times and verify the result comes from the cache, the query gets re-run if I switch to a different browser (or use cURL).
Many months ago, as described in the ClickHouse docs, I added the variable http_server_default_response to the ClickHouse config to serve Tabix in the browser. Now I need to set up chproxy, and I see this in the log for every ClickHouse node:
ERROR: 2019/05/16 07:59:00 scope.go:553: error while health-checking "10.60.11.241:8123" host: unexpected response: <html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>
my config.yml:
hack_me_please: true
log_debug: true
server:
  http:
    listen_addr: ":9191"
users:
  - name: "grafana"
    password: "***"
    to_cluster: "click-na"
    to_user: "grafana"
    max_concurrent_queries: 20
    max_execution_time: 120s
    requests_per_minute: 50
    allow_cors: true
    max_queue_size: 40
    max_queue_time: 30s
    cache: "shortterm"
clusters:
  - name: "click-na"
    nodes: [
      "10.60.11.241:8123",
      "10.60.11.242:8123",
      "10.60.11.243:8123",
    ]
    users:
      - name: "grafana"
        password: "***"
caches:
  - name: "shortterm"
    dir: "/tmp/cache"
    max_size: 150Mb
    expire: 600s
A curl request to / of a ClickHouse node:
# curl -v 10.60.11.241:8123/
* Trying 10.60.11.241...
* TCP_NODELAY set
* Connected to 10.60.11.241 (10.60.11.241) port 8123 (#0)
> GET / HTTP/1.1
> Host: 10.60.11.241:8123
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 16 May 2019 08:20:29 GMT
< Connection: Keep-Alive
< Keep-Alive: timeout=3
< Content-Type: text/html; charset=UTF-8
< Content-Length: 183
<
* Curl_http_done: called premature == 0
* Connection #0 to host 10.60.11.241 left intact
<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>
I think a simple request to '/' that searches for 'Ok' in the body is not a good database healthcheck.
Maybe a SELECT 1; request would be more suitable, because it always returns a predefined result.
For example:
# echo "SELECT 1;" | curl -u grafana:*** --data-binary @- 10.60.11.241:8123
1
Hi, I have a config that works fine in the office:
server:
  http:
    listen_addr: :8124
users:
hack_me_please: true
clusters:
  - name: xxx-clickhouse
    nodes:
    users:
caches:
However, at home through the VPN I can't get a connection:
ERROR: 2020/03/22 18:01:44 scope.go:638: error while health-checking "ch-1.xxx.net:8123" host: cannot send request in 3.000085934s: Get http://ch-1.xxx.net:8123: dial tcp XX.X.XX.XXX:8123: i/o timeout
Tell me, what am I doing wrong?
I'm using a JDBC client written in Java. When I run a long-time query I get this exception:
Caused by: java.sql.SQLException: org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 4096; actual size: 4088)
at ru.yandex.clickhouse.response.ClickHouseResultSet.hasNext(ClickHouseResultSet.java:129)
at ru.yandex.clickhouse.response.ClickHouseResultSet.next(ClickHouseResultSet.java:143)
at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:93)
at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:61)
at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:679)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:617)
... 45 more
Caused by: org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 4096; actual size: 4088)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:198)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at com.google.common.io.ByteStreams.read(ByteStreams.java:859)
at com.google.common.io.ByteStreams.readFully(ByteStreams.java:738)
at com.google.common.io.ByteStreams.readFully(ByteStreams.java:722)
at com.google.common.io.LittleEndianDataInputStream.readFully(LittleEndianDataInputStream.java:65)
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.readNextBlock(ClickHouseLZ4Stream.java:101)
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.checkNext(ClickHouseLZ4Stream.java:74)
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.read(ClickHouseLZ4Stream.java:60)
at ru.yandex.clickhouse.response.StreamSplitter.readFromStream(StreamSplitter.java:92)
at ru.yandex.clickhouse.response.StreamSplitter.next(StreamSplitter.java:54)
at ru.yandex.clickhouse.response.ClickHouseResultSet.hasNext(ClickHouseResultSet.java:116)
Here is my config:
hack_me_please: true
server:
  http:
    listen_addr: ":9090"
    read_timeout: 50m
users:
  - name: ""
    password: ""
    to_cluster: ""
    to_user: ""
clusters:
  - name: ""
    nodes: []
I also tried the same query directly against the database, and it works fine.
Hi guys,
I run chproxy with Docker in the production environment.
It brings me a lot of convenience; I like Docker.
Can you provide a Docker image? I think this is a good idea.
Of course, I have also created a Docker image of chproxy.
If you think it's good, you can use it directly.
Dockerfile
Docker Hub
Merry Christmas 🎄🎄🎄🎉🎉
curl 'http://proxy-bip-readonly:@localhost:9090/' -d 'use hl_user_db; show databases'
Code: 62, e.displayText() = DB::Exception: Syntax error (Multi-statements are not allowed): failed at position 5: hl_user_db; show databases . (version 19.16.3.6 (official build))
Why doesn't chproxy support multi-statement input? I think it would be very useful. Can I do it another way?
Can you tell me about the release schedule? Do you have one? When is the next release planned?
Thanks!
I've noticed that chproxy doesn't support external data for queries. Documentation: https://clickhouse.yandex/docs/en/table_engines/external_data/
Example with clickhouse server:
$ echo -ne "1\n2\n3\n" > /tmp/testdata
$ curl -F 'testdata=@/tmp/testdata;' 'http://localhost:8123?user=user&password=password&query=SELECT+*+FROM+testdata&testdata_structure=id+UInt32'
Response:
1
2
3
Example with chproxy:
$ echo -ne "1\n2\n3\n" > /tmp/testdata
$ curl -F 'testdata=@/tmp/testdata;' 'http://localhost:9090?user=CHPROXY_USER&password=CHPROXY_PASSWORD&query=SELECT+*+FROM+testdata&testdata_structure=id+UInt32'
Response:
Code: 36, e.displayText() = DB::Exception: Neither structure nor types have not been provided for external table testdata. Use fields testdata_structure or testdata_types to do so., e.what() = DB::Exception
Are there any plans to fix this?
There is a feature of the ClickHouse server that allows sending the current query execution progress via HTTP headers: https://clickhouse.yandex/docs/en/operations/settings/settings/#settings-send_progress_in_http_headers
There is a CLI supporting the HTTP interface of ClickHouse: https://github.com/hatarist/clickhouse-cli
It's very useful with chproxy, because chproxy doesn't support the native protocol, but a CLI tool is very handy for running manual queries. This CLI can draw a progress bar (as the native clickhouse-client does) using data from the X-Clickhouse-Progress headers. Unfortunately, it doesn't work correctly with chproxy.
When querying the ClickHouse server directly with send_progress_in_http_headers=1, the server sends X-Clickhouse-Progress headers periodically (twice per second) during query execution:
$ echo "select avg(number) from (select number from system.numbers limit 50000000);" | curl -v 'http://<host>:8123/?cancel_http_readonly_queries_on_client_close=1&send_progress_in_http_headers=1&query=' --data-binary @-
* Hostname was NOT found in DNS cache
* Trying <ip>...
* Connected to <host> (<ip>) port 8123 (#0)
> POST /?cancel_http_readonly_queries_on_client_close=1&send_progress_in_http_headers=1&query= HTTP/1.1
> User-Agent: curl/7.38.0
> Host: <host>:8123
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 76 out of 76 bytes
< HTTP/1.1 200 OK
< Date: Tue, 09 Jul 2019 08:20:21 GMT
< Connection: Keep-Alive
< Content-Type: text/tab-separated-values; charset=UTF-8
< X-ClickHouse-Server-Display-Name: <host>
< Transfer-Encoding: chunked
< Query-Id: 07731bdc-e5ea-4786-94a7-26e98abfe8e7
< Keep-Alive: timeout=3
< X-ClickHouse-Progress: {"read_rows":"13631488","read_bytes":"109051904","total_rows":"0"}
< X-ClickHouse-Progress: {"read_rows":"25755648","read_bytes":"206045184","total_rows":"0"}
< X-ClickHouse-Progress: {"read_rows":"41549824","read_bytes":"332398592","total_rows":"0"}
<
24999999.5
* Connection #0 to host <host> left intact
In this example the X-ClickHouse-Progress headers appear immediately after the query is sent.
When querying the ClickHouse server via chproxy with send_progress_in_http_headers=1, chproxy sends all the X-Clickhouse-Progress headers only when query execution has finished:
$ echo "select avg(number) from (select number from system.numbers limit 100000000);" | curl -v 'http://test:test@<chproxy>:9090' --data-binary @-
* Rebuilt URL to: http://test:test@<chproxy>:9090/
* Hostname was NOT found in DNS cache
* Trying <ip>...
* Connected to <chproxy> (<ip>) port 9090 (#0)
* Server auth using Basic with user 'test'
> POST / HTTP/1.1
> Authorization: Basic dGVzdDp0ZXN0
> User-Agent: curl/7.38.0
> Host: <chproxy>:9090
> Accept: */*
> Content-Length: 77
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 77 out of 77 bytes
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: text/tab-separated-values; charset=UTF-8
< Date: Tue, 09 Jul 2019 08:24:30 GMT
< Query-Id: 15AFAF990FDA68BB
< X-Clickhouse-Progress: {"read_rows":"15597568","read_bytes":"124780544","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"31588352","read_bytes":"252706816","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"47841280","read_bytes":"382730240","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"64028672","read_bytes":"512229376","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"80347136","read_bytes":"642777088","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"96600064","read_bytes":"772800512","total_rows":"0"}
< X-Clickhouse-Server-Display-Name: <host>
< Content-Length: 11
<
49999999.5
* Connection #0 to host <chproxy> left intact
DEBUG: 2019/07/09 08:24:30 proxy.go:113: [ Id: 15AFAF990FDA68BB; User "test"(1) proxying as "readonly"(1) to "<host>:8123"(1); RemoteAddr: "<ip>:45546"; LocalAddr:
"<ip>:9090"; Duration: 626.197469ms ]: request success; query: "select avg(number) from (select number from system.numbers limit 100000000);\n"; URL: "http://<host>
:8123/?cancel_http_readonly_queries_on_client_close=1&query_id=15AFAF990FDA68BB&send_progress_in_http_headers=1"
In this example the X-Clickhouse-Progress headers only appear at the end, so a progress bar can't be displayed during query execution.
Is it possible to send the X-Clickhouse-Progress headers from chproxy to the client as the ClickHouse server does?
Thanks!
Hello,
I think it would be nice if there were settings to allow only specific queries or to deny specific queries (or maybe both).
For example:
allowed_query_words = SELECT,USE,SHOW,DESCRIBE,EXISTS,USE,KILL
disallowed_query_words = CREATE,ALTER,DROP,INSERT
So the user would only be able to send queries starting with "SELECT".
I know there is a readonly mechanism inside ClickHouse, but through the proxy we could restrict proxy users in a more granular way.
What do you think?
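A hedged sketch of the proposed check, outside of chproxy: accept a query only if its first keyword is on the user's allowlist (the function name and the keyword set are just the ones proposed above):

package main

import (
	"fmt"
	"strings"
)

var allowedQueryWords = map[string]bool{
	"SELECT": true, "USE": true, "SHOW": true,
	"DESCRIBE": true, "EXISTS": true, "KILL": true,
}

// queryAllowed reports whether the query's first keyword is allowlisted.
func queryAllowed(q string) bool {
	fields := strings.Fields(strings.TrimSpace(q))
	if len(fields) == 0 {
		return false
	}
	return allowedQueryWords[strings.ToUpper(fields[0])]
}

func main() {
	fmt.Println(queryAllowed("select 1"))          // true
	fmt.Println(queryAllowed("DROP TABLE events")) // false
}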