contentsquare / chproxy
Open-Source ClickHouse http proxy and load balancer
Home Page: https://www.chproxy.org/
License: MIT License
chproxy is not working with the https://github.com/yandex/clickhouse-odbc/releases/tag/v1.0.0.20190201 driver
Hi!
Is there any way to configure a proxy to query other proxies?
I found that a proxy can't pass the heartbeat check from another proxy (unauthorized).
And I would like to configure compression between proxies, if cascading is possible.
Thanks!
I use clickhouse-jdbc to write data to ClickHouse (1.1.54318),
with the properties compress and decompress set to true.
The JDBC driver uses LZ4 to compress and decompress the data, and it can transfer data to ClickHouse (1.1.54318) directly.
When I connect through chproxy, it doesn't work properly.
Error log:
Caused by: java.io.IOException: Magic is not correct: 103
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.readNextBlock(ClickHouseLZ4Stream.java:93) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.checkNext(ClickHouseLZ4Stream.java:74) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.read(ClickHouseLZ4Stream.java:50) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.StreamSplitter.readFromStream(StreamSplitter.java:85) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.StreamSplitter.next(StreamSplitter.java:47) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.response.ClickHouseResultSet.<init>(ClickHouseResultSet.java:65) ~[clickhouse-jdbc-0.1.34.jar:na]
at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:117) ~[clickhouse-jdbc-0.1.34.jar:na]
config.yml
hack_me_please: false
server:
  http:
    listen_addr: "0.0.0.0:9090"
    allowed_networks: ["192.168.1.0/24"]
  metrics:
    allowed_networks: ["192.168.1.0/24"]
users:
  - name: "default"
    to_cluster: "default"
    to_user: "default"
    allow_cors: true
We have some clients that send "max_result_rows" (I'm looking at you, JDBC connector!), and we liked that chproxy stripped this out, since readonly users can't change this parameter. It would be good if the allowedParams could be configured.
Configuration may be updated without a restart - just send the SIGHUP signal to the chproxy process.
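For example, on a Linux host running a single chproxy process:
$ kill -HUP "$(pidof chproxy)"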
https://github.com/Vertamedia/chproxy/blob/master/main.go#L118
Maybe a setting in the config - enableTCP6?
Is there anything available to deploy this on Kubernetes?
Helm charts or some values.yaml you guys might use?
Is it possible to insert into multiple clusters?
This would be a nice feature, since it could come in handy when migrating to a new cluster.
The only way to connect to the cluster is via HTTPS with a Yandex certificate.
Trying to connect via HTTPS to the rc1b-z5qstya9********.mdb.yandexcloud.net:8443 node, but receiving an error:
ERROR: 2019/10/11 13:19:13 scope.go:638: error while health-checking "rc1b-z5qstya9********.mdb.yandexcloud.net:8443" host: cannot send request in 82.730564ms: Get https://rc1b-z5qstya9********.mdb.yandexcloud.net:8443: x509: certificate signed by unknown authority
Scheme: "https" is set.
But how do I add a certificate to the "cluster" section?
Another solution would be to allow insecure HTTPS connections. Is that possible?
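chproxy is written in Go, and Go's TLS stack trusts the system CA store, so one hedged workaround is to install the Yandex Cloud CA system-wide (Debian/Ubuntu paths assumed; the certificate URL is the one documented by Yandex Cloud):
$ sudo curl -o /usr/local/share/ca-certificates/yandex-cloud.crt \
    https://storage.yandexcloud.net/cloud-certs/CA.pem
$ sudo update-ca-certificates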
Grafana generates queries with the time in seconds, which makes them very hard to cache (if I use a time range from some date to now, "now" changes every second).
It would be very cool if chproxy could round the dates in the SQL with some options: by seconds, minutes, hours, or days. I think that feature could be implemented with a regex (see the sketch below).
What do you think?
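A minimal sketch of the regex idea, outside of chproxy: round the unix timestamps that Grafana puts into toDateTime(...) down to a fixed step, so that consecutive refreshes produce an identical query and hence an identical cache key (the pattern and the step are assumptions):

package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var tsRe = regexp.MustCompile(`toDateTime\((\d{10})\)`)

// normalizeTimestamps rounds every matched timestamp down to the nearest
// step in seconds, trading up to `step` seconds of staleness for cacheability.
func normalizeTimestamps(query string, step int64) string {
	return tsRe.ReplaceAllStringFunc(query, func(m string) string {
		ts, _ := strconv.ParseInt(tsRe.FindStringSubmatch(m)[1], 10, 64)
		return fmt.Sprintf("toDateTime(%d)", ts-ts%step)
	})
}

func main() {
	q := "SELECT count() FROM t WHERE ts >= toDateTime(1538611207) AND ts < toDateTime(1538611267)"
	fmt.Println(normalizeTimestamps(q, 60)) // both timestamps rounded down to the minute
}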
Please add support for the enable_http_compression parameter to get gzipped responses from the server.
According to the ClickHouse doc https://clickhouse.yandex/docs/en/interfaces/http_interface/:
"Also standard gzip-based HTTP compression can be used. To send gzip compressed POST data, just add Content-Encoding: gzip to the request headers and gzip the POST body. To get the response compressed, you need to add Accept-Encoding: gzip to the request headers and turn on the ClickHouse setting called enable_http_compression."
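For illustration, this is what the documented behavior looks like against a ClickHouse node directly (host and credentials are placeholders); this is the request chproxy would need to forward:
$ echo 'SELECT 1' | curl -sS -H 'Accept-Encoding: gzip' \
    --data-binary @- 'http://localhost:8123/?enable_http_compression=1' | gunzip
1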
users:
  - name: "insert"
    to_cluster: "stats-raw"
    to_user: "default"
clusters:
  - name: "stats-raw"
    # Requests are spread in `round-robin` + `least-loaded` fashion among nodes.
    # Unreachable and unhealthy nodes are skipped.
    nodes: [
      "10.10.10.1:8123",
      "10.10.10.2:8123",
      "10.10.10.3:8123",
    ]
Take the above configuration as an example.
What I don't understand is how chproxy executes INSERT statements.
I.e., given requests A, B, and C, where A has 10 INSERT statements inserting 10 rows, B has 100 INSERTs, and C has 1000 INSERTs:
will A, B, and C be sent to 10.10.10.1, 10.10.10.2, and 10.10.10.3 respectively, so that 10.10.10.1 has 10 records in its local table, 10.10.10.2 has 100, and 10.10.10.3 has 1000?
Or will the total of 10+100+1000=1110 rows be inserted evenly across the three nodes?
Sorry for my poor English.
I have several clients written over JDBC using chproxy. I have a standard configuration:
log_debug: true
hack_me_please: true
server:
  http:
    listen_addr: ":9090"
users:
  - name: "..."
clusters:
  - name: ".."
    # Requests are spread in `round-robin` + `least-loaded` fashion among nodes.
    # Unreachable and unhealthy nodes are skipped.
    nodes: [
      ....
    ]
And when I try to send a request with external data (using the POST method, I suppose), chproxy adds the parameter no_cache, and ClickHouse throws an exception: Code: 115, e.displayText() = DB::Exception: Unknown setting no_cache.
I suppose it would be nice to remove this:
// disable cache for external_data queries
params.Set("no_cache", "1")
log.Debugf("external data params detected - cache will be disabled")
Right now I can set cache settings in seconds, but for heavy requests (in my project) I use the cache for 3-7 days, so it would be good if I could set the setting in days.
Example:
Now I use: expire: 259000s
I want: expire: 3d
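A hedged workaround until a day unit exists, assuming the duration parser accepts hour units (the field names are taken from the cache configs elsewhere on this page):
caches:
  - name: "longterm"
    dir: "/data/cache"
    max_size: 150Mb
    expire: 72h   # 3 days expressed in hours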
Hi,
checking whether chproxy is still alive would be much easier if it had an internal status/health page that returns 200 if everything is okay, like ClickHouse answers with 200 on / and /ping.
Thanks,
bernd
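A minimal sketch of such a status page (not chproxy's actual code; the port and path are assumptions):

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Answer 200 with a tiny body so load balancers and monitoring can
	// check liveness without issuing a real query.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Ok.")
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}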
We want to add users on the fly, without restarting.
For now, we try to modify config.yml to add user entries,
but chproxy needs a restart to apply configuration changes.
Chproxy users will lose their connections if we restart,
so that looks like a bad option.
At the same time, if we modify the configuration in ClickHouse, it applies without restarting.
Could we have this feature in chproxy? Thanks.
Running 'go test' on master results in:
./scope.go:82:9: Sprintf format %s has arg time.Since(s.startTime).Nanoseconds() / 1000.0 of wrong type int64
This was introduced in #70, which changed the value from a Duration to an int64 without updating the format specifier.
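A hedged sketch of the two obvious fixes, reduced to a standalone program (the surrounding chproxy code is not reproduced here):

package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now()
	// Nanoseconds()/1000.0 is still an int64 (the untyped constant 1000.0
	// converts to int64), so it does not match the %s verb. Either print
	// the integer microseconds with %d ...
	fmt.Printf("Duration: %d µs\n", time.Since(start).Nanoseconds()/1000)
	// ... or format the time.Duration itself, which does satisfy %s.
	fmt.Printf("Duration: %s\n", time.Since(start))
}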
By design, all passwords are stored in the config file in plaintext. During config loading/reloading, the whole config, including the passwords, is printed to the log:
https://github.com/Vertamedia/chproxy/blob/master/main.go#L272
We're running chproxy in Docker and collecting the logs from the service for further analysis. Passwords stored in a log file create a security issue for us.
Also, I think it's useful to print the loaded config while debugging (with log_debug: true), but it makes no sense during normal operation.
Is it possible to print the config only with log_debug: true? Or at least hide the passwords somehow?
Thanks!
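One hedged way to address this, sketched outside of chproxy: blank out password values before the config text reaches the log (the regex assumes the password: "..." layout used in the configs on this page):

package main

import (
	"fmt"
	"regexp"
)

var passwordRe = regexp.MustCompile(`(password:\s*)"[^"]*"`)

// maskPasswords replaces every password value with *** so the loaded
// config can be logged without leaking credentials.
func maskPasswords(cfg string) string {
	return passwordRe.ReplaceAllString(cfg, `${1}"***"`)
}

func main() {
	fmt.Println(maskPasswords(`users:
  - name: "grafana"
    password: "secret"`))
}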
Hi,
my English is poor, but I have a question: I want to delete data across 2 shards. How do I do it? Thanks.
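For reference, the usual ClickHouse way to delete the same data on every shard is an ON CLUSTER mutation; the database, table, cluster name, and condition below are placeholders:

ALTER TABLE db.local_table ON CLUSTER my_cluster DELETE WHERE event_date < '2019-01-01';

This runs the delete mutation on all shards at once instead of node by node.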
There should be a possibility to set read/write timeouts for the proxy via the config, as sketched below.
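A hedged sketch of what this could look like; read_timeout already appears in a config later on this page, while write_timeout is assumed by analogy and should be checked against the config reference:
server:
  http:
    listen_addr: ":9090"
    read_timeout: 5m    # maximum time to read a client request
    write_timeout: 10m  # assumed by analogy with read_timeout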
We are setting format_schema as a query parameter in our POST request; however, when chproxy proxies the request, this parameter is not included, and ClickHouse errors with DB::Exception: The format Protobuf requires a schema. The 'format_schema' setting should be set (version 19.14.6.12 (official build)).
This would be resolved if chproxy forwarded this query parameter to the server.
I think identical requests should be refused after the grace time elapses, but I noticed that all remaining requests are delegated to the ClickHouse servers after the first request times out, and no option for this can be configured. When grace_time: -1s is set, the server can't be started and displays the error: can't load config "/data/service/ck-proxy/config.yml": not a valid duration string: "-1s"
The log function is very simple now. When using it in a production environment, I think it would be better to have some more functionality: first, log rotation with a configurable log path; second, a configurable limit on the log file size.
Does the core team have this in the roadmap? Or we can discuss it here, and if more people think it is useful, I can implement it after the details have been discussed.
Actually, this might be a bit out of scope for a plain proxy, but in some of our projects the only reason for a client-facing, self-written middleware instead of just chproxy is that we want another abstraction layer in between, for example to prevent users from running super expensive queries.
I see that this is anything but simple and fast, but I think it would increase the value of chproxy a lot.
Would love to see support for prepared statements in the future.
Let me know if you need more details.
Best regards,
Marcel
We have 3 ClickHouse nodes (3 shards) in a cluster. The Readme notes for INSERTs that:
..The routing logic may be embedded either directly into applications generating INSERTs or may be moved to a proxy. Proxy approach is better since it allows re-configuring ClickHouse cluster without modification of application configs and without application downtime.
We are inserting in batches using begin->prepare->(exec in a loop)->commit, so each individual commit will contain multiple rows that have to be distributed among the shards.
If we use a distributed table, then the ClickHouse shard decides, based on the sharding logic, which shard each row has to go to. How will the proxy know which shard each row should be written into, i.e., how does the proxy know the sharding logic?
Is it possible with chproxy and inserts to say that the
We are currently using chproxy version 1.12.0.
When running heavier queries that take a long time, they are killed after some time, despite having the max_execution_time value set to 0 in both chproxy and ClickHouse.
Here is one example:
$ curl -v "[email protected]:9090/?query=SELECT+toDate%28%60date_time%60%2C+%27UTC%27%29+as+%60ex_ch_date_time%60%2C+sumMerge%28%60invalid_impression%60%29+as+%60ex_ch_invalid_impression%60%2C+sumMerge%28%60advertiser_value_usd%60%29+as+%60ex_ch_advertiser_value_usd%60%2C+sumMerge%28%60impression%60%29+as+%60ex_ch_impression%60%2C+sumMerge%28%60invalid_click%60%29+as+%60ex_ch_invalid_click%60%2C+sumMerge%28%60advertiser_value_eur%60%29+as+%60ex_ch_advertiser_value_eur%60%2C+sumMerge%28%60margin_eur%60%29+as+%60ex_ch_margin_eur%60%2C+sumMerge%28%60click%60%29+as+%60ex_ch_click%60%2C+sumMerge%28%60publisher_value_eur%60%29+as+%60ex_ch_publisher_value_eur%60%2C+sumMerge%28%60margin_usd%60%29+as+%60ex_ch_margin_usd%60%2C+sumMerge%28%60publisher_value_usd%60%29+as+%60ex_ch_publisher_value_usd%60+FROM+%60dist_mv_agg_day_new%60+WHERE+%28%60date_time%60+%3E%3D+toDateTime%28%272018-09-04+00%3A00%3A00%27%2C+%27UTC%27%29+and+%60date_time%60+%3C%3D+toDateTime%28%272018-10-04+00%3A00%3A00%27%2C+%27UTC%27%29%29+GROUP+BY+date_time+FORMAT+JSONEachRow"
* Trying 10.240.0.23...
* TCP_NODELAY set
* Connected to 10.240.0.23 (10.240.0.23) port 9090 (#0)
* Server auth using Basic with user 'benchmark'
> GET /?query=SELECT+toDate%28%60date_time%60%2C+%27UTC%27%29+as+%60ex_ch_date_time%60%2C+sumMerge%28%60invalid_impression%60%29+as+%60ex_ch_invalid_impression%60%2C+sumMerge%28%60advertiser_value_usd%60%29+as+%60ex_ch_advertiser_value_usd%60%2C+sumMerge%28%60impression%60%29+as+%60ex_ch_impression%60%2C+sumMerge%28%60invalid_click%60%29+as+%60ex_ch_invalid_click%60%2C+sumMerge%28%60advertiser_value_eur%60%29+as+%60ex_ch_advertiser_value_eur%60%2C+sumMerge%28%60margin_eur%60%29+as+%60ex_ch_margin_eur%60%2C+sumMerge%28%60click%60%29+as+%60ex_ch_click%60%2C+sumMerge%28%60publisher_value_eur%60%29+as+%60ex_ch_publisher_value_eur%60%2C+sumMerge%28%60margin_usd%60%29+as+%60ex_ch_margin_usd%60%2C+sumMerge%28%60publisher_value_usd%60%29+as+%60ex_ch_publisher_value_usd%60+FROM+%60dist_mv_agg_day_new%60+WHERE+%28%60date_time%60+%3E%3D+toDateTime%28%272018-09-04+00%3A00%3A00%27%2C+%27UTC%27%29+and+%60date_time%60+%3C%3D+toDateTime%28%272018-10-04+00%3A00%3A00%27%2C+%27UTC%27%29%29+GROUP+BY+date_time+FORMAT+JSONEachRow HTTP/1.1
> Host: 10.240.0.23:9090
> Authorization: Basic YmVuY2htYXJrOg==
> User-Agent: curl/7.61.1
> Accept: */*
>
* Empty reply from server
* Connection #0 to host 10.240.0.23 left intact
curl: (52) Empty reply from server
As you can see, the connection is killed. Here are some relevant logs from chproxy:
Oct 04 11:34:41 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:34:41 proxy.go:74: [ Id: 155A63380D95D3C6; User "benchmark"(1) proxying as "readonly"(1) to "10.240.0.16:8123"(1); RemoteAddr: "192.168.11.231:49580"; LocalAddr: "10.240.0.23:9090"; Duration: 51.858µs ]: request start
Oct 04 11:37:20 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:37:20 proxy.go:199: [ Id: 155A63380D95D3C6; User "benchmark"(1) proxying as "readonly"(1) to "10.240.0.16:8123"(1); RemoteAddr: "192.168.11.231:49580"; LocalAddr: "10.240.0.23:9090"; Duration: 2m39.080545361s ]: remote client closed the connection in 2m39.080353788s; query: "SELECT toDate(`date_time`, 'UTC') as `ex_ch_date_time`, sumMerge(`invalid_impression`) as `ex_ch_invalid_impression`, sumMerge(`advertiser_value_usd`) as `ex_ch_advertiser_value_usd`, sumMerge(`impression`) as `ex_ch_impression`, sumMerge(`invalid_click`) as `ex_ch_invalid_click`, sumMerge(`advertiser_value_eur`) as `ex_ch_advertiser_value_eur`, sumMerge(`margin_eur`) as `ex_ch_margin_eur`, sumMerge(`click`) as `ex_ch_click`, sumMerge(`publisher_value_eur`) as `ex_ch_publisher_value_eur`, sumMerge(`margin_usd`) as `ex_ch_margin_usd`, sumMerge(`publisher_value_usd`) as `ex_ch_publisher_value_usd` FROM `dist_mv_agg_day_new` WHERE (`date_time` >= toDateTime('2018-09-04 00:00:00', 'UTC') and `date_time` <= toDateTime('2018-10-04 00:00:00', 'UTC')) GROUP BY date_time FORMAT JSONEachRow"
Oct 04 11:37:20 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:37:20 scope.go:243: killing the query with query_id=155A63380D95D3C6
Oct 04 11:37:20 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:37:20 scope.go:283: killed the query with query_id=155A63380D95D3C6; respBody: ""
Oct 04 11:37:20 gceuw1-chp01.c.lucky-eon-543.internal chproxy[13621]: DEBUG: 2018/10/04 11:37:20 proxy.go:115: [ Id: 155A63380D95D3C6; User "benchmark"(1) proxying as "readonly"(1) to "10.240.0.16:8123"(1); RemoteAddr: "192.168.11.231:49580"; LocalAddr: "10.240.0.23:9090"; Duration: 2m39.161807652s ]: request failure: non-200 status code 499; query: "SELECT toDate(`date_time`, 'UTC') as `ex_ch_date_time`, sumMerge(`invalid_impression`) as `ex_ch_invalid_impression`, sumMerge(`advertiser_value_usd`) as `ex_ch_advertiser_value_usd`, sumMerge(`impression`) as `ex_ch_impression`, sumMerge(`invalid_click`) as `ex_ch_invalid_click`, sumMerge(`advertiser_value_eur`) as `ex_ch_advertiser_value_eur`, sumMerge(`margin_eur`) as `ex_ch_margin_eur`, sumMerge(`click`) as `ex_ch_click`, sumMerge(`publisher_value_eur`) as `ex_ch_publisher_value_eur`, sumMerge(`margin_usd`) as `ex_ch_margin_usd`, sumMerge(`publisher_value_usd`) as `ex_ch_publisher_value_usd` FROM `dist_mv_agg_day_new` WHERE (`date_time` >= toDateTime('2018-09-04 00:00:00', 'UTC') and `date_time` <= toDateTime('2018-10-04 00:00:00', 'UTC')) GROUP BY date_time FORMAT JSONEachRow"; URL: "http://10.240.0.16:8123/?query=SELECT+toDate%28%60date_time%60%2C+%27UTC%27%29+as+%60ex_ch_date_time%60%2C+sumMerge%28%60invalid_impression%60%29+as+%60ex_ch_invalid_impression%60%2C+sumMerge%28%60advertiser_value_usd%60%29+as+%60ex_ch_advertiser_value_usd%60%2C+sumMerge%28%60impression%60%29+as+%60ex_ch_impression%60%2C+sumMerge%28%60invalid_click%60%29+as+%60ex_ch_invalid_click%60%2C+sumMerge%28%60advertiser_value_eur%60%29+as+%60ex_ch_advertiser_value_eur%60%2C+sumMerge%28%60margin_eur%60%29+as+%60ex_ch_margin_eur%60%2C+sumMerge%28%60click%60%29+as+%60ex_ch_click%60%2C+sumMerge%28%60publisher_value_eur%60%29+as+%60ex_ch_publisher_value_eur%60%2C+sumMerge%28%60margin_usd%60%29+as+%60ex_ch_margin_usd%60%2C+sumMerge%28%60publisher_value_usd%60%29+as+%60ex_ch_publisher_value_usd%60+FROM+%60dist_mv_agg_day_new%60+WHERE+%28%60date_time%60+%3E%3D+toDateTime%28%272018-09-04+00%3A00%3A00%27%2C+%27UTC%27%29+and+%60date_time%60+%3C%3D+toDateTime%28%272018-10-04+00%3A00%3A00%27%2C+%27UTC%27%29%29+GROUP+BY+date_time+FORMAT+JSONEachRow&query_id=155A63380D95D3C6"
And from clickhouse:
2018.10.04 11:34:41.713028 [ 11290 ] <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: GET, Address: [::ffff:10.240.0.23]:43198, User-Agent: RemoteAddr: 192.168.11.231:49580; LocalAddr: 10.240.0.23:9090; CHProxy-User: benchmark; CHProxy-ClusterUser: readonly; curl/7.61.1
2018.10.04 11:34:41.713126 [ 11290 ] <Trace> HTTPHandler: Request URI: /?query=SELECT+toDate%28%60date_time%60%2C+%27UTC%27%29+as+%60ex_ch_date_time%60%2C+sumMerge%28%60invalid_impression%60%29+as+%60ex_ch_invalid_impression%60%2C+sumMerge%28%60advertiser_value_usd%60%29+as+%60ex_ch_advertiser_value_usd%60%2C+sumMerge%28%60impression%60%29+as+%60ex_ch_impression%60%2C+sumMerge%28%60invalid_click%60%29+as+%60ex_ch_invalid_click%60%2C+sumMerge%28%60advertiser_value_eur%60%29+as+%60ex_ch_advertiser_value_eur%60%2C+sumMerge%28%60margin_eur%60%29+as+%60ex_ch_margin_eur%60%2C+sumMerge%28%60click%60%29+as+%60ex_ch_click%60%2C+sumMerge%28%60publisher_value_eur%60%29+as+%60ex_ch_publisher_value_eur%60%2C+sumMerge%28%60margin_usd%60%29+as+%60ex_ch_margin_usd%60%2C+sumMerge%28%60publisher_value_usd%60%29+as+%60ex_ch_publisher_value_usd%60+FROM+%60dist_mv_agg_day_new%60+WHERE+%28%60date_time%60+%3E%3D+toDateTime%28%272018-09-04+00%3A00%3A00%27%2C+%27UTC%27%29+and+%60date_time%60+%3C%3D+toDateTime%28%272018-10-04+00%3A00%3A00%27%2C+%27UTC%27%29%29+GROUP+BY+date_time+FORMAT+JSONEachRow&query_id=155A63380D95D3C6
2018.10.04 11:34:41.769439 [ 11290 ] <Debug> executeQuery: (from [::ffff:10.240.0.23]:43198, user: readonly, query_id: 155A63380D95D3C6) SELECT toDate(`date_time`, 'UTC') as `ex_ch_date_time`, sumMerge(`invalid_impression`) as `ex_ch_invalid_impression`, sumMerge(`advertiser_value_usd`) as `ex_ch_advertiser_value_usd`, sumMerge(`impression`) as `ex_ch_impression`, sumMerge(`invalid_click`) as `ex_ch_invalid_click`, sumMerge(`advertiser_value_eur`) as `ex_ch_advertiser_value_eur`, sumMerge(`margin_eur`) as `ex_ch_margin_eur`, sumMerge(`click`) as `ex_ch_click`, sumMerge(`publisher_value_eur`) as `ex_ch_publisher_value_eur`, sumMerge(`margin_usd`) as `ex_ch_margin_usd`, sumMerge(`publisher_value_usd`) as `ex_ch_publisher_value_usd` FROM `dist_mv_agg_day_new` WHERE (`date_time` >= toDateTime('2018-09-04 00:00:00', 'UTC') and `date_time` <= toDateTime('2018-10-04 00:00:00', 'UTC')) GROUP BY date_time FORMAT JSONEachRow
2018.10.04 11:34:41.975876 [ 11290 ] <Debug> default..inner.mv_agg_day_new (SelectExecutor): Key condition: (column 0 in [1536019200, +inf)), (column 0 in (-inf, 1538611200]), and
2018.10.04 11:34:41.975914 [ 11290 ] <Debug> default..inner.mv_agg_day_new (SelectExecutor): MinMax index condition: (column 0 in [1536019200, +inf)), (column 0 in (-inf, 1538611200]), and
2018.10.04 11:34:42.092220 [ 11290 ] <Debug> default..inner.mv_agg_day_new (SelectExecutor): Selected 21 parts by date, 21 parts by key, 432494 marks to read from 21 ranges
2018.10.04 11:34:42.096082 [ 11290 ] <Trace> default..inner.mv_agg_day_new (SelectExecutor): Reading approx. 3542990848 rows
2018.10.04 11:34:42.096146 [ 11290 ] <Trace> InterpreterSelectQuery: FetchColumns -> WithMergeableState
2018.10.04 11:34:42.097373 [ 11290 ] <Trace> InterpreterSelectQuery: WithMergeableState -> Complete
2018.10.04 11:34:42.172805 [ 11290 ] <Debug> executeQuery: Query pipeline:
2018.10.04 11:34:42.174612 [ 11290 ] <Trace> Aggregator: Reading blocks of partially aggregated data.
2018.10.04 11:37:20.693869 [ 11290 ] <Trace> Aggregator: Read 5 blocks of partially aggregated data, total 155 rows.
2018.10.04 11:37:20.693938 [ 11290 ] <Trace> Aggregator: Merging partially aggregated single-level data.
2018.10.04 11:37:20.721704 [ 11290 ] <Trace> Aggregator: Merged partially aggregated single-level data.
2018.10.04 11:37:20.739659 [ 11290 ] <Trace> Aggregator: Converting aggregated data to blocks
2018.10.04 11:37:20.739773 [ 11290 ] <Trace> Aggregator: Converted aggregated data to blocks. 31 rows, 0.002 MiB in 0.000 sec. (489777.862 rows/sec., 39.235 MiB/sec.)
2018.10.04 11:37:20.768479 [ 11290 ] <Trace> UnionBlockInputStream: Waiting for threads to finish
2018.10.04 11:37:20.768664 [ 11290 ] <Trace> UnionBlockInputStream: Waited for threads to finish
2018.10.04 11:37:20.768708 [ 11290 ] <Information> executeQuery: Read 17806359988 rows, 3.87 TiB in 158.999 sec., 111990246 rows/sec., 24.93 GiB/sec.
2018.10.04 11:37:20.786022 [ 11290 ] <Trace> UnionBlockInputStream: Waiting for threads to finish
2018.10.04 11:37:20.786042 [ 11290 ] <Trace> UnionBlockInputStream: Waited for threads to finish
2018.10.04 11:37:20.786088 [ 11290 ] <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2018.10.04 11:37:20.792317 [ 11290 ] <Debug> MemoryTracker: Peak memory usage (for query): 65.77 MiB.
2018.10.04 11:37:20.792482 [ 11290 ] <Information> HTTPHandler: Done processing query
2018.10.04 11:37:20.793398 [ 11290 ] <Trace> HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: POST, Address: [::ffff:10.240.0.23]:43198, User-Agent: Go-http-client/1.1
2018.10.04 11:37:20.793451 [ 11290 ] <Trace> HTTPHandler: Request URI: /
2018.10.04 11:37:20.793667 [ 11290 ] <Debug> executeQuery: (from [::ffff:10.240.0.23]:43198, query_id: de0d107a-1617-4759-b275-96ff578c8d5e) KILL QUERY WHERE query_id = '155A63380D95D3C6'
2018.10.04 11:37:20.873350 [ 11290 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2018.10.04 11:37:20.873699 [ 11290 ] <Information> executeQuery: Read 2 rows, 1.45 KiB in 0.080 sec., 25 rows/sec., 18.17 KiB/sec.
2018.10.04 11:37:20.873732 [ 11290 ] <Debug> MemoryTracker: Peak memory usage (for query): 129.84 KiB.
2018.10.04 11:37:20.873843 [ 11290 ] <Information> HTTPHandler: Done processing query
Is there any reason for such behavior?
Using chproxy makes working with database tools like DBeaver or DbVisualizer uncomfortable, because they use query parameters to limit the data shown.
Example:
POST /?compress=1&password=123&result_overflow_mode=break&extremes=0&max_result_rows=100&user=default&database=default HTTP/1.1
Content-Length: 83
Content-Type: text/plain; charset=UTF-8
Host: localhost
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.2 (Java/1.8.0_181)
SELECT * FROM "default"."table" FORMAT TabSeparatedWithNamesAndTypes;
chproxy cuts the parameters result_overflow_mode, extremes, and max_result_rows, so when I open a table in a database tool, ClickHouse tries to load ALL the data from the table instead of loading only 100 rows.
Please add these parameters to the allowed list.
When I configure chproxy to access ClickHouse on port 9000, the chproxy log shows a connect error. Is there any way to access ClickHouse through chproxy via port 9000?
It would be good if chproxy could have a global cache, not only a per-user one.
When using agents, what are the causes of the following problem?
Caused by: java.lang.Throwable: [ Id: 160857F337DC563A; User "tmplarge"(1) proxying as "default"(1) to "d085126100.aliyun.com:8123"(6); RemoteAddr: "10.13.56.73:51080"; LocalAddr: "10.85.129.101:9090"; Duration: 825 ?s]: cannot reach d085126100.aliyun.com:8123; query: "select timezone()\nFORMAT TabSeparatedWithNamesAndTypes;"
For deploying chproxy via Kubernetes (or similar), it would be good if there was an endpoint like /health that could be used for liveness and readiness checks, e.g.:
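A hedged sketch of how such an endpoint could be wired into a pod spec, assuming chproxy answered on /health on its usual listen port (the endpoint does not exist yet; the path and port are assumptions):
livenessProbe:
  httpGet:
    path: /health
    port: 9090
readinessProbe:
  httpGet:
    path: /health
    port: 9090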
using "jdbc:clickhouse://172.0.0.3:8123/db?insert_deduplicate=0" connect to clickhouse server
the insert_deduplicate is works.
The front is chproxy, using "jdbc:clickhouse://172.0.0.3:9090/db?insert_deduplicate=0" connect to chproxy server, the insert_deduplicate is not works.
Is there a good solution?
Could node weights be supported in a future release?
Trying to post a query with an empty Authorization header leads to a 400 response.
For example, the following fails through chproxy, while it works when submitted directly to a ClickHouse node:
nc 127.0.0.1 8123
POST /?user=default&database=default&compress=1 HTTP/1.1
Authorization
Content-Type: text/plain; charset=UTF-8
User-Agent: Java/11.0.1
Host: localhost:8123
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 55
select timezone() FORMAT TabSeparatedWithNamesAndTypes
Note that this type of request is generated by the https://github.com/blynkkk/clickhouse4j JDBC driver, which is quite popular.
Add the ability to log INSERT and SELECT queries in an audit log for compliance purposes and performance analysis. Log the partial SQL statement, the origin, and the user-id of each query.
It would be good if I could keep the cache in RAM (and perfect if I could set expire times for RAM and HDD separately).
I found that chproxy closes each connection to the upstream server very quickly, within about a second. With hundreds or thousands of concurrent requests, the available ports can easily run out, since most of them sit in the TIME_WAIT state.
Could we introduce some connection-pool ideas to keep a configurable number of active connections to the upstream servers, set a custom keep-alive time per cluster, or honor the ClickHouse keep-alive time in the response headers?
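For reference, a sketch of the standard net/http mechanism in Go for pooling upstream connections; the numbers are illustrative, not chproxy defaults:

package main

import (
	"net/http"
	"time"
)

// newUpstreamClient returns a client whose Transport keeps idle
// connections open for reuse instead of closing them immediately,
// which avoids piling sockets up in TIME_WAIT.
func newUpstreamClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			MaxIdleConns:        1024,
			MaxIdleConnsPerHost: 256,
			IdleConnTimeout:     90 * time.Second,
		},
	}
}

func main() {
	client := newUpstreamClient()
	resp, err := client.Get("http://10.10.10.1:8123/ping")
	if err == nil {
		resp.Body.Close() // closing the body returns the connection to the pool
	}
}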
Is chproxy hashing the user-agent string or something similar into the cache key? I find that even though I can force-refresh the same page a million times and verify the result comes from the cache, the query gets re-run if I switch to a different browser (or use cURL).
Many months ago, as described in the ClickHouse docs, I added the variable http_server_default_response to the ClickHouse config to serve Tabix in the browser. Now I need to set up chproxy, and I see this in the log for every ClickHouse node:
ERROR: 2019/05/16 07:59:00 scope.go:553: error while health-checking "10.60.11.241:8123" host: unexpected response: <html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>
my config.yml:
hack_me_please: true
log_debug: true
server:
  http:
    listen_addr: ":9191"
users:
  - name: "grafana"
    password: "***"
    to_cluster: "click-na"
    to_user: "grafana"
    max_concurrent_queries: 20
    max_execution_time: 120s
    requests_per_minute: 50
    allow_cors: true
    max_queue_size: 40
    max_queue_time: 30s
    cache: "shortterm"
clusters:
  - name: "click-na"
    nodes: [
      "10.60.11.241:8123",
      "10.60.11.242:8123",
      "10.60.11.243:8123",
    ]
    users:
      - name: "grafana"
        password: "***"
caches:
  - name: "shortterm"
    dir: "/tmp/cache"
    max_size: 150Mb
    expire: 600s
A curl request to / of a ClickHouse node:
# curl -v 10.60.11.241:8123/
* Trying 10.60.11.241...
* TCP_NODELAY set
* Connected to 10.60.11.241 (10.60.11.241) port 8123 (#0)
> GET / HTTP/1.1
> Host: 10.60.11.241:8123
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 16 May 2019 08:20:29 GMT
< Connection: Keep-Alive
< Keep-Alive: timeout=3
< Content-Type: text/html; charset=UTF-8
< Content-Length: 183
<
* Curl_http_done: called premature == 0
* Connection #0 to host 10.60.11.241 left intact
<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>
I think a simple request to '/' that searches for 'Ok' in the body is not a good database healthcheck.
Maybe a SELECT 1; request would be more suitable, because it always returns a predefined result.
For example:
# echo "SELECT 1;" | curl -u grafana:*** --data-binary @- 10.60.11.241:8123
1
Hi, I have a config that works fine in the office:
server:
  http:
    listen_addr: :8124
users:
hack_me_please: true
clusters:
  - name: xxx-clickhouse
    nodes:
    users:
caches:
However, at home through the VPN I can't get a connection:
ERROR: 2020/03/22 18:01:44 scope.go:638: error while health-checking "ch-1.xxx.net:8123" host: cannot send request in 3.000085934s: Get http://ch-1.xxx.net:8123: dial tcp XX.X.XX.XXX:8123: i/o timeout
Tell me, what am I doing wrong?
I'm using a JDBC client written in Java. When I run a long-time query I get this exception:
Caused by: java.sql.SQLException: org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 4096; actual size: 4088)
at ru.yandex.clickhouse.response.ClickHouseResultSet.hasNext(ClickHouseResultSet.java:129)
at ru.yandex.clickhouse.response.ClickHouseResultSet.next(ClickHouseResultSet.java:143)
at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:93)
at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:61)
at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:679)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:617)
... 45 more
Caused by: org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 4096; actual size: 4088)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:198)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at com.google.common.io.ByteStreams.read(ByteStreams.java:859)
at com.google.common.io.ByteStreams.readFully(ByteStreams.java:738)
at com.google.common.io.ByteStreams.readFully(ByteStreams.java:722)
at com.google.common.io.LittleEndianDataInputStream.readFully(LittleEndianDataInputStream.java:65)
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.readNextBlock(ClickHouseLZ4Stream.java:101)
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.checkNext(ClickHouseLZ4Stream.java:74)
at ru.yandex.clickhouse.response.ClickHouseLZ4Stream.read(ClickHouseLZ4Stream.java:60)
at ru.yandex.clickhouse.response.StreamSplitter.readFromStream(StreamSplitter.java:92)
at ru.yandex.clickhouse.response.StreamSplitter.next(StreamSplitter.java:54)
at ru.yandex.clickhouse.response.ClickHouseResultSet.hasNext(ClickHouseResultSet.java:116)
Here is my config:
hack_me_please: true
server:
  http:
    listen_addr: ":9090"
    read_timeout: 50m
users:
  - name: ""
    password: ""
    to_cluster: ""
    to_user: ""
clusters:
  - name: ""
    nodes: []
I also tried the same query directly against the database, and it works fine.
Hi guys,
I run chproxy with Docker in the production environment.
It brings me a lot of convenience; I like Docker.
Can you provide a Docker image? I think this is a good idea.
Of course, I have also created a Docker image of chproxy.
If you think it's good, you can use it directly.
Dockerfile
Docker Hub
Merry Christmas 🎄🎄🎄🎉🎉
curl 'http://proxy-bip-readonly:@localhost:9090/' -d 'use hl_user_db; show databases'
Code: 62, e.displayText() = DB::Exception: Syntax error (Multi-statements are not allowed): failed at position 5: hl_user_db; show databases . (version 19.16.3.6 (official build))
Why doesn't chproxy support multi-statement input? I think it would be very useful. Can I do it another way?
Can you tell me about the release schedule? Do you have one? When is the next release planned?
Thanks!
I've noticed that chproxy doesn't support external data for queries. Documentation: https://clickhouse.yandex/docs/en/table_engines/external_data/
Example with clickhouse server:
$ echo -ne "1\n2\n3\n" > /tmp/testdata
$ curl -F 'testdata=@/tmp/testdata;' 'http://localhost:8123?user=user&password=password&query=SELECT+*+FROM+testdata&testdata_structure=id+UInt32'
Response:
1
2
3
Example with chproxy:
$ echo -ne "1\n2\n3\n" > /tmp/testdata
$ curl -F 'testdata=@/tmp/testdata;' 'http://localhost:9090?user=CHPROXY_USER&password=CHPROXY_PASSWORD&query=SELECT+*+FROM+testdata&testdata_structure=id+UInt32'
Response:
Code: 36, e.displayText() = DB::Exception: Neither structure nor types have not been provided for external table testdata. Use fields testdata_structure or testdata_types to do so., e.what() = DB::Exception
Are there any plans to fix this?
There is a feature of the ClickHouse server that allows sending the current query execution progress via HTTP headers: https://clickhouse.yandex/docs/en/operations/settings/settings/#settings-send_progress_in_http_headers
There is a CLI supporting the HTTP interface of ClickHouse: https://github.com/hatarist/clickhouse-cli
It's very useful with chproxy, because chproxy doesn't support the native protocol, but a CLI tool is very handy for running manual queries. This CLI can draw a progress bar (as the native clickhouse-client does) using data from the X-Clickhouse-Progress headers. Unfortunately, it doesn't work correctly with chproxy.
When querying the ClickHouse server directly with send_progress_in_http_headers=1, the server sends X-Clickhouse-Progress headers periodically (twice per second) during query execution:
$ echo "select avg(number) from (select number from system.numbers limit 50000000);" | curl -v 'http://<host>:8123/?cancel_http_readonly_queries_on_client_close=1&send_progress_in_http_headers=1&query=' --data-binary @-
* Hostname was NOT found in DNS cache
* Trying <ip>...
* Connected to <host> (<ip>) port 8123 (#0)
> POST /?cancel_http_readonly_queries_on_client_close=1&send_progress_in_http_headers=1&query= HTTP/1.1
> User-Agent: curl/7.38.0
> Host: <host>:8123
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 76 out of 76 bytes
< HTTP/1.1 200 OK
< Date: Tue, 09 Jul 2019 08:20:21 GMT
< Connection: Keep-Alive
< Content-Type: text/tab-separated-values; charset=UTF-8
< X-ClickHouse-Server-Display-Name: <host>
< Transfer-Encoding: chunked
< Query-Id: 07731bdc-e5ea-4786-94a7-26e98abfe8e7
< Keep-Alive: timeout=3
< X-ClickHouse-Progress: {"read_rows":"13631488","read_bytes":"109051904","total_rows":"0"}
< X-ClickHouse-Progress: {"read_rows":"25755648","read_bytes":"206045184","total_rows":"0"}
< X-ClickHouse-Progress: {"read_rows":"41549824","read_bytes":"332398592","total_rows":"0"}
<
24999999.5
* Connection #0 to host <host> left intact
In this example the X-ClickHouse-Progress headers appear immediately after the query is sent.
When querying the ClickHouse server via chproxy with send_progress_in_http_headers=1, chproxy sends all the X-Clickhouse-Progress headers only when query execution has finished:
$ echo "select avg(number) from (select number from system.numbers limit 100000000);" | curl -v 'http://test:test@<chproxy>:9090' --data-binary @-
* Rebuilt URL to: http://test:test@<chproxy>:9090/
* Hostname was NOT found in DNS cache
* Trying <ip>...
* Connected to <chproxy> (<ip>) port 9090 (#0)
* Server auth using Basic with user 'test'
> POST / HTTP/1.1
> Authorization: Basic dGVzdDp0ZXN0
> User-Agent: curl/7.38.0
> Host: <chproxy>:9090
> Accept: */*
> Content-Length: 77
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 77 out of 77 bytes
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: text/tab-separated-values; charset=UTF-8
< Date: Tue, 09 Jul 2019 08:24:30 GMT
< Query-Id: 15AFAF990FDA68BB
< X-Clickhouse-Progress: {"read_rows":"15597568","read_bytes":"124780544","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"31588352","read_bytes":"252706816","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"47841280","read_bytes":"382730240","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"64028672","read_bytes":"512229376","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"80347136","read_bytes":"642777088","total_rows":"0"}
< X-Clickhouse-Progress: {"read_rows":"96600064","read_bytes":"772800512","total_rows":"0"}
< X-Clickhouse-Server-Display-Name: <host>
< Content-Length: 11
<
49999999.5
* Connection #0 to host <chproxy> left intact
DEBUG: 2019/07/09 08:24:30 proxy.go:113: [ Id: 15AFAF990FDA68BB; User "test"(1) proxying as "readonly"(1) to "<host>:8123"(1); RemoteAddr: "<ip>:45546"; LocalAddr:
"<ip>:9090"; Duration: 626.197469ms ]: request success; query: "select avg(number) from (select number from system.numbers limit 100000000);\n"; URL: "http://<host>
:8123/?cancel_http_readonly_queries_on_client_close=1&query_id=15AFAF990FDA68BB&send_progress_in_http_headers=1"
In this example the X-Clickhouse-Progress headers only appear at the end, so a progress bar can't be displayed during query execution.
Is it possible to send the X-Clickhouse-Progress headers from chproxy to the client as the ClickHouse server does?
Thanks!
Hello,
I think it would be nice if there were settings to allow only specific queries or to deny specific queries (or maybe both).
For example:
allowed_query_words = SELECT,USE,SHOW,DESCRIBE,EXISTS,USE,KILL
disallowed_query_words = CREATE,ALTER,DROP,INSERT
So the user would only be able to send queries starting with "SELECT".
I know there is a readonly mechanism inside ClickHouse, but through the proxy we could restrict proxy users in a more granular way.
What do you think?
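A hedged sketch of the proposed check, outside of chproxy: accept a query only if its first keyword is on the user's allowlist (the function name and the keyword set are just the ones proposed above):

package main

import (
	"fmt"
	"strings"
)

var allowedQueryWords = map[string]bool{
	"SELECT": true, "USE": true, "SHOW": true,
	"DESCRIBE": true, "EXISTS": true, "KILL": true,
}

// queryAllowed reports whether the query's first keyword is allowlisted.
func queryAllowed(q string) bool {
	fields := strings.Fields(strings.TrimSpace(q))
	if len(fields) == 0 {
		return false
	}
	return allowedQueryWords[strings.ToUpper(fields[0])]
}

func main() {
	fmt.Println(queryAllowed("select 1"))          // true
	fmt.Println(queryAllowed("DROP TABLE events")) // false
}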