go-graphite / graphite-clickhouse
Graphite cluster backend with ClickHouse support
License: MIT License
Add an option to send internal metrics (requests, memory, etc.) to a carbon server (like carbonapi, go-carbon, and carbon-clickhouse do).
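go-carbon and carbon-clickhouse already report such counters over the plain Graphite line protocol, so a minimal sketch of what the requested option might do could look like the following. The endpoint address, metric prefix, and metric names here are hypothetical, not the project's actual internals.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// formatLine renders one metric in the plain Graphite line protocol
// ("<path> <value> <timestamp>\n"), which carbon servers accept on TCP.
func formatLine(prefix, name string, value float64, ts int64) string {
	return fmt.Sprintf("%s.%s %v %d\n", prefix, name, value, ts)
}

// sendInternalMetrics ships one snapshot of internal counters to a carbon
// server. A real implementation would run this on a metric-interval ticker.
func sendInternalMetrics(addr, prefix string, metrics map[string]float64) error {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return err
	}
	defer conn.Close()
	now := time.Now().Unix()
	for name, value := range metrics {
		if _, err := fmt.Fprint(conn, formatLine(prefix, name, value, now)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Hypothetical prefix and metric names, for illustration only.
	fmt.Print(formatLine("carbon.graphite-clickhouse.host1", "requests", 42, 1600000000))
}
```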
I am running graphite-clickhouse with the recommended docker-compose setup and I am conducting a stress test with a command-line tool.
I publish metrics every second, with names like STRESS.host.ip-0.com.graphite.stresser.a.count, and I changed the rollup.xml to test its performance, like so:
<yandex>
<graphite_rollup>
<pattern>
<regexp>^STRESS\.</regexp>
<function>avg</function>
<retention>
<age>0</age>
<precision>1</precision>
</retention>
<retention>
<age>120</age>
<precision>5</precision>
</retention>
<retention>
<age>600</age>
<precision>60</precision>
</retention>
</pattern>
<default>
<function>avg</function>
<retention>
<age>0</age>
<precision>60</precision>
</retention>
<retention>
<age>2592000</age>
<precision>3600</precision>
</retention>
</default>
</graphite_rollup>
</yandex>
(I tried with the avg function as well, but the problem persisted.)
It is returning inconsistent, gapped points:
watch -n 1 "curl 'localhost:8080/render/?target=STRESS.host.ip-0.com.graphite.stresser.a.count&format=csv&from=-30s'"
watch -n 1 "curl 'localhost:8080/render/?target=STRESS.host.ip-0.com.graphite.stresser.a.count&format=csv&from=-150s&to=-120'"
With the following blacklist:
target-blacklist = ["^carbon.*","^clickhouse.*"]
the metrics finder still shows that the blacklisted prefixes exist:
$ curl http://10.47.127.37:8081/metrics/find/?query='*'
[{"allowChildren":1,"expandable":1,"leaf":0,"id":"clickhouse","text":"clickhouse","context":{}},{"allowChildren":1,"expandable":1,"leaf":0,"id":"carbon","text":"carbon","context":{}}]
although it hides their content properly:
$ curl http://10.47.127.37:8081/metrics/find/?query='carbon.*'
[]
I'd expect that the '*' query would also hide all blacklisted series.
Thanks for fixing!
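A fix would presumably apply the same target-blacklist regexps to /metrics/find results, so that top-level nodes like "carbon" disappear from a '*' query as well. A minimal sketch of such filtering (this is not the actual handler code, just an illustration of the idea):

```go
package main

import (
	"fmt"
	"regexp"
)

// compileBlacklist mirrors the target-blacklist config option:
// a list of regular expressions whose matches must not be returned.
func compileBlacklist(patterns []string) []*regexp.Regexp {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		res = append(res, regexp.MustCompile(p))
	}
	return res
}

// filterFind drops find results whose id matches any blacklist entry,
// so "carbon" and "clickhouse" would vanish from a '*' query too.
func filterFind(ids []string, blacklist []*regexp.Regexp) []string {
	var out []string
	for _, id := range ids {
		blocked := false
		for _, re := range blacklist {
			if re.MatchString(id) {
				blocked = true
				break
			}
		}
		if !blocked {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	bl := compileBlacklist([]string{"^carbon.*", "^clickhouse.*"})
	fmt.Println(filterFind([]string{"carbon", "clickhouse", "STRESS"}, bl))
	// prints: [STRESS]
}
```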
I use graphite-clickhouse built from current master,
config:
[common]
listen = ":9090"
max-cpu = 8
memory-return-interval = "0s"
max-metrics-in-find-answer = 0
[clickhouse]
url = "http://127.0.0.1:8123/"
extra-prefix = ""
data-table = "graphite.points_cluster"
data-timeout = "1m0s"
rollup-conf = "/etc/graphite-clickhouse/graphite_rollup.xml"
index-table = "graphite.index_cluster"
index-use-daily = true
index-timeout = "1m"
tagged-table = "graphite.tagged_cluster"
[prometheus]
external-url = "https://grafana.example.com/prometheus"
page-title = "Prometheus Time Series Collection and Processing Server"
[[data-table]]
table = "graphite.points_cluster"
reverse = true
rollup-conf = "/etc/graphite-clickhouse/graphite_rollup.xml"
[[logging]]
logger = ""
file = "stdout"
level = "info"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"
graphite_rollup:
<pattern>
<regexp>^.*scrape_interval=10s.*$</regexp>
<function>avg</function>
<retention>
<age>0</age>
<precision>10</precision>
</retention>
<retention>
<age>259200</age>
<precision>60</precision>
</retention>
<retention>
<age>2592000</age>
<precision>600</precision>
</retention>
</pattern>
I collect 10 s metrics with Telegraf and send them to carbon-clickhouse.
If I request a metric for a period longer than 7 days, I get an error:
Object
xhrStatus:"complete"
request:Object
method:"GET"
url:"api/datasources/proxy/18/api/v1/query_range?query=mem_used_percent%7Bhost%3D%22metric%22%7D&start=1568832000&end=1569523200&step=600&timeout=10s"
response:Object
status:"error"
errorType:"execution"
error:"bufio.Scanner: token too long"
message:"bufio.Scanner: token too long"
In the graphite-clickhouse log:
[2019-09-26T21:42:56.285+0300] INFO [query] query {"query": "SELECT splitByChar('=', Tag1)[2] as value FROM graphite.tagged_cluster WHERE (Tag1 LIKE 'host=%') AND (Date >= '2019-09-19') GROUP BY value ORDER BY value", "request_id": "81fe3ec423d58c03e75dccd575f9c19a", "time": 0.00851583}
[2019-09-26T21:42:56.285+0300] INFO access {"request_id": "81fe3ec423d58c03e75dccd575f9c19a", "grafana": "Org:1; Dashboard:; Panel:", "time": 0.008859073, "method": "GET", "url": "/api/v1/label/host/values", "peer": "[::1]:6660", "status": 200}
[2019-09-26T21:42:56.351+0300] INFO [query] query {"query": "SELECT Path FROM graphite.tagged_cluster WHERE ((Tag1='__name__=mem_used_percent') AND (arrayExists((x) -> x='host=metric', Tags))) AND (Date >='2019-09-18' AND Date <= '2019-09-26') GROUP BY Path", "request_id": "5d67a79bb41a4e85e2d6c5eea1038326", "time": 0.011381311}
[2019-09-26T21:42:56.394+0300] INFO access {"request_id": "5d67a79bb41a4e85e2d6c5eea1038326", "grafana": "Org:1; Dashboard:400; Panel:3", "time": 0.054455356, "method": "GET", "url": "/api/v1/query_range?query=mem_used_percent%7Bhost%3D%22metric%22%7D&start=1568832000&end=1569523200&step=600&timeout=10s", "peer": "[::1]:6660", "status": 422}
The request is for only one metric; if I request data for a shorter period, for example 1 to 10 days ago, the data is displayed.
I've got the following concerns about the current HTTP error handling:
Hello!
When I try using tags I get:
$ graphite-clickhouse -tags
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7b9ad2]
goroutine 1 [running]:
github.com/lomik/graphite-clickhouse/tagger.(*Set).Merge(0x0, 0x0, 0x1104955)
/root/go/src/github.com/lomik/graphite-clickhouse/_vendor/src/github.com/lomik/graphite-clickhouse/tagger/set.go:53 +0x22
github.com/lomik/graphite-clickhouse/tagger.Make(0xc420210000, 0x8, 0x1)
/root/go/src/github.com/lomik/graphite-clickhouse/_vendor/src/github.com/lomik/graphite-clickhouse/tagger/tagger.go:225 +0xd23
main.main()
/root/go/src/github.com/lomik/graphite-clickhouse/_vendor/src/github.com/lomik/graphite-clickhouse/graphite-clickhouse.go:149 +0x3b5
Hi,
I trust you and your loved ones are well and safe.
I'm probably misunderstanding things here so your patience is appreciated.
It appears that when an empty data set is returned, it is done so with an HTTP 404 status.
I'm honestly not sure if this is intentional, but the behaviour appears to trigger the retry functionality in carbonapi; it could be that it should not...
So my question is, should graphite-clickhouse return a 404 when correctly returning an empty data set?
Thanks.
Looks like the protocol in https://github.com/lomik/graphite-clickhouse/blob/master/carbonzipperpb/carbonzipper.proto is derived from v1 of go-graphite/protocol.
How about supporting v2 or v3? Are there any fundamental limitations to this?
Hello,
I'm using GraphiteMergeTree with the rollup function 'quantile' (https://clickhouse-docs.readthedocs.io/en/latest/agg_functions/#quantile-level-x). It seems that graphite-clickhouse does not support this function.
2020/01/14 10:48:19 unknown function "quantile(0.95)"
It probably should be here: https://github.com/lomik/graphite-clickhouse/blob/a6fed39d02064bf520d2201dfdfafd341a66d051/helper/rollup/aggr.go
Is it possible to add it? :)
Regards,
Mateusz
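Assuming the rollup aggregations are, in essence, functions from a slice of values to one value (the real signature in helper/rollup/aggr.go may differ), a quantile with linear interpolation could be sketched as below. Note that ClickHouse's quantile(level) is an approximate, reservoir-sampling variant, so its results may differ slightly from an exact computation.

```go
package main

import (
	"fmt"
	"sort"
)

// quantile returns the q-th quantile (0 <= q <= 1) of values using linear
// interpolation between closest ranks. This is only a sketch of what a
// "quantile(0.95)" rollup entry could compute; it is exact, unlike
// ClickHouse's sampling-based quantile().
func quantile(values []float64, q float64) float64 {
	if len(values) == 0 {
		return 0
	}
	s := append([]float64(nil), values...) // don't mutate the caller's slice
	sort.Float64s(s)
	pos := q * float64(len(s)-1)
	lo := int(pos)
	if lo >= len(s)-1 {
		return s[len(s)-1]
	}
	frac := pos - float64(lo)
	return s[lo]*(1-frac) + s[lo+1]*frac
}

func main() {
	fmt.Println(quantile([]float64{1, 2, 3, 4, 5}, 0.95)) // prints: 4.8
}
```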
After getting tags correctly, I'm unable to populate the tags table using the command, nor can I get any results when talking directly to the API.
[root@core-clickhouse01 log]# curl http://127.0.0.1:9090/tags/autoComplete/tags?pretty=1&limit=100
[1] 4937
[root@core-clickhouse01 log]# clickhouse response status 404: Code: 46, e.displayText() = DB::Exception: Unknown table function WHERE, e.what() = DB::Exception
[root@core-clickhouse01 log]# graphite-clickhouse -tags
[root@core-clickhouse01 log]# clickhouse-client
ClickHouse client version 1.1.54380.
Connecting to localhost:9000.
Connected to ClickHouse server version 1.1.54380.
core-clickhouse01 :) select * from graphite_tag;
SELECT *
FROM graphite_tag
┌───────Date─┬─Level─┬─Tag1─┬─Path─┬─IsLeaf─┬─Tags─┬────Version─┐
│ 2016-11-01 │ 0 │ │ │ 0 │ [] │ 1525973670 │
└────────────┴───────┴──────┴──────┴────────┴──────┴────────────┘
┌───────Date─┬─Level─┬─Tag1─┬─Path─┬─IsLeaf─┬─Tags─┬────Version─┐
│ 2016-11-01 │ 0 │ │ │ 0 │ [] │ 1525958477 │
└────────────┴───────┴──────┴──────┴────────┴──────┴────────────┘
2 rows in set. Elapsed: 10.025 sec.
I'm feeding in tagged data
SELECT *
FROM graphite
WHERE Path LIKE 'system.%'
LIMIT 10
┌─Path───────────────────────────────────────────┬─Value─┬───────Time─┬───────Date─┬──Timestamp─┐
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958330 │ 2018-05-10 │ 1525958354 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958350 │ 2018-05-10 │ 1525958369 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958360 │ 2018-05-10 │ 1525958389 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958380 │ 2018-05-10 │ 1525958414 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958410 │ 2018-05-10 │ 1525958444 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958450 │ 2018-05-10 │ 1525958479 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958480 │ 2018-05-10 │ 1525958519 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958530 │ 2018-05-10 │ 1525958564 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958590 │ 2018-05-10 │ 1525958614 │
│ system.core.count?host=core-dddev01&type=gauge │ 1 │ 1525958630 │ 2018-05-10 │ 1525958669 │
└────────────────────────────────────────────────┴───────┴────────────┴────────────┴────────────┘
10 rows in set. Elapsed: 0.056 sec. Processed 24.58 thousand rows, 2.09 MB (442.29 thousand rows/s., 37.63 MB/s.)
graphite-config.conf
[common]
listen = ":9090"
max-cpu = 1
[clickhouse]
url = "http://localhost:8123"
data-table = "graphite"
tree-table = "graphite_tree"
date-tree-table = ""
date-tree-table-version = 0
rollup-conf = "/etc/graphite-clickhouse/rollup.xml"
tag-table = "graphite_tag"
extra-prefix = ""
data-timeout = "1m0s"
tree-timeout = "1m0s"
[carbonlink]
server = ""
threads-per-request = 10
connect-timeout = "50ms"
query-timeout = "50ms"
total-timeout = "500ms"
[[logging]]
logger = ""
file = "/var/log/graphite-clickhouse.log"
level = "debug"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"
carbon-clickhouse.conf
[common]
metric-prefix = "carbon.ck-agents.{host}"
metric-endpoint = "local"
metric-interval = "30s"
max-cpu = 2
[logging]
file = "/var/log/carbon-clickhouse.log"
level = "debug"
[data]
path = "/var/local/carbon-clickhouse/data"
chunk-interval = "1s"
chunk-auto-interval = ""
[upload.graphite]
type = "points"
table = "graphite"
threads = 1
url = "http://localhost:8123/"
timeout = "1m0s"
[upload.graphite_tree]
type = "tree"
table = "graphite_tree"
date = "2016-11-01"
threads = 1
url = "http://localhost:8123/"
timeout = "1m0s"
cache-ttl = "12h0m0s"
[upload.graphite_tagged]
type = "tagged"
table = "graphite_tagged"
threads = 1
url = "http://localhost:8123/"
timeout = "1m0s"
cache-ttl = "12h0m0s"
[udp]
listen = ":2003"
enabled = true
[tcp]
listen = ":2003"
enabled = true
The data in question looks like this:
system.swap.total;host=core-dddev01;type=gauge 0 1525976850
system.swap.free;host=core-dddev01;type=gauge 0 1525976850
system.mem.usable;host=core-dddev01;type=gauge 630.55859375 1525976850
system.mem.pct_usable;host=core-dddev01;type=gauge 0.6871520032692537 1525976850
I think I have all my bases covered, so I must be missing something.
Hello,
I would like to know how to send metrics from ClickHouse to Graphite so I can visualize them in Grafana. Could you help me? Thank you!
We use the following table for the date-tree:
graphite.date_metrics ( Path String, Level UInt32, Date Date ) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/graphite.date_metrics', '{replica}', Date, (Level, Path, Date), 8192) AS SELECT toUInt32(length(splitByChar('.', Path))) AS Level, Date, Path FROM graphite.data;
Starting from graphite-clickhouse 0.7, it queries with the Deleted field even when date-tree-table-version is set to 1.
Now it's not possible because they have an extra element.
Apart from that, the files are identical; it makes no sense to keep two sets of files, one for ClickHouse and another for graphite-clickhouse.
With a big dataset, the HTTP interface performs considerably slower.
Clickhouse version: 20.3.12.112-stable
export QUERY="SELECT Path, Time, Value, Timestamp
FROM graphite_points
PREWHERE Date >='2020-06-25' AND Date <= '2020-07-02'
WHERE
Time >= 1593063300
AND Time <= 1593668401
AND Path IN (SELECT Path FROM graphite_tagged WHERE Tag1='__name__=notifications_latency_ms_bucket' AND Date >='2020-06-25' AND Date <= '2020-07-02' GROUP BY Path)
FORMAT RowBinary;"
$ time clickhouse-client -h 127.0.0.1 --port 9000 <<< $QUERY | wc -c
5596887430
real 0m8.670s
user 0m3.551s
sys 0m4.532s
$ time curl -s --data-binary @- 'http://127.0.0.1:8123/' <<< $QUERY | wc -c
5596887430
real 0m22.171s
user 0m6.714s
sys 0m15.519s
Could you explain in a few words what graphite-clickhouse does? I am familiar with the basic original Graphite stack.
Per my understanding, in the ClickHouse case graphite-clickhouse needs to sit in front of ClickHouse to provide the same metrics (as Whisper would) to a Graphite frontend (graphite-web, carbonapi, etc.). Is that correct? Useful links are also appreciated.
I have a metric called my.something_sum with the following rollup rules set up in graphite-clickhouse and clickhouse-server:
<pattern>
<regexp>^my\..*_sum(\.|$)</regexp>
<function>sum</function>
<retention>
<age>0</age>
<precision>60</precision>
</retention>
<retention>
<age>1209600</age>
<precision>900</precision>
</retention>
<retention>
<age>5184000</age>
<precision>86400</precision>
</retention>
</pattern>
So this request, /render/?target=my.something_sum&from=-64d&format=csv, can return a list of values with either resolution = 60 s or resolution = 86400 s, but I expect the resolution to be 86400 s.
The problem, as I see it, lies here: https://github.com/lomik/graphite-clickhouse/blob/138d67073ca210eceb7f5c8efa31814c220d0877/helper/rollup/rollup.go#L215
Here the first point's timestamp is analyzed. When the event represented by this metric is rare, the first non-null value (only non-null values are stored in ClickHouse) can have an age less than 5184000 seconds, so the coarser precision isn't applied.
Graphite-web always applies the resolution corresponding to the from field of the request, not to the age of the metric's oldest point.
Isn't that a problem?
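The graphite-web behaviour described above can be sketched as follows: pick the precision from the age of the requested interval (now - from), not from the timestamp of the first stored point. The Retention type here is a simplification for illustration, not the project's actual structs.

```go
package main

import (
	"fmt"
	"time"
)

// Retention mirrors one <retention> entry: data older than Age seconds
// is stored (and should be served) at Precision-second resolution.
type Retention struct {
	Age       uint32
	Precision uint32
}

// precisionFor picks the precision the way graphite-web does: from the age
// of the *requested* interval, not from the first stored point's timestamp.
// Retentions must be sorted by ascending Age.
func precisionFor(retentions []Retention, from, now uint32) uint32 {
	age := now - from
	precision := retentions[0].Precision
	for _, r := range retentions {
		if age >= r.Age {
			precision = r.Precision
		}
	}
	return precision
}

func main() {
	// The rollup rules from the issue: 60 s, then 900 s after 14 d,
	// then 86400 s after 60 d.
	rollup := []Retention{{0, 60}, {1209600, 900}, {5184000, 86400}}
	now := uint32(time.Now().Unix())
	// from=-64d is past the 5184000 s (60 d) boundary, so the whole
	// response should use 86400 s resolution.
	fmt.Println(precisionFor(rollup, now-64*24*3600, now))
}
```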
Requests for tagged metrics are slow when there are many tags in ClickHouse. In our test case we had 3 tags with 10K values each, and metrics were generated over a period of one year, so a seriesByTag request took about 1 second.
I found that graphite-clickhouse makes 2 queries to ClickHouse when processing seriesByTag requests:
Query to get Paths by tags (from graphite_tagged table).
Query to get metrics by these Paths.
The first query is pretty slow (0.6 s in our test). Can we make the second query without it, just doing
WHERE Path LIKE '%tag1_name=tag1_value%' AND Path LIKE '%tag2_name=tag2_value%' ...
instead of
WHERE Path IN (<results of the first query>)?
It is much faster, taking just 0.04 s on the same test data.
Maybe there are some cases when original queries are faster and it would be better to have a config option to enable the proposed changes.
Are there any issues with this? If not, I can implement it and make a PR.
UPD: This feature could also be used to avoid problems with too-long queries.
Besides, the maximum number of metrics per query should be checked.
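A sketch of building the proposed LIKE filter (tag names and values here are placeholders). One caveat worth noting: a bare '%name=value%' substring match can false-positive, e.g. "host=web1" also matches Paths containing "vhost=web1..." or "host=web10", so a real implementation would need to anchor on the tag separators used in the stored Path and escape LIKE metacharacters.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscape escapes ClickHouse LIKE metacharacters in a literal fragment.
func likeEscape(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	s = strings.ReplaceAll(s, `%`, `\%`)
	s = strings.ReplaceAll(s, `_`, `\_`)
	return s
}

// whereByTags builds the proposed substring filter over Path. See the
// caveat above: without anchoring on separators this can over-match.
func whereByTags(tags map[string]string) string {
	conds := make([]string, 0, len(tags))
	for name, value := range tags {
		conds = append(conds,
			fmt.Sprintf("Path LIKE '%%%s=%s%%'", likeEscape(name), likeEscape(value)))
	}
	return "WHERE " + strings.Join(conds, " AND ")
}

func main() {
	fmt.Println(whereByTags(map[string]string{"host": "web1"}))
	// prints: WHERE Path LIKE '%host=web1%'
}
```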
Hello. I would like to implement a feature to mitigate the error DB::Exception: Syntax error: failed at position 262125: ..... Max query size exceeded.
The idea is to fetch the current user's setting from ClickHouse, then split the query at https://github.com/lomik/graphite-clickhouse/blob/master/render/handler.go#L162 into multiple queries and send them in parallel. The setting could be refreshed in the background once per minute, like the rollup config.
What do you think?
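The splitting step could be sketched as below, under the assumption that what overflows max_query_size is a long "Path IN (...)" list: greedily pack paths into groups whose rendered fragment stays under the byte budget, then send one query per group (in parallel) and merge the results.

```go
package main

import "fmt"

// splitByQuerySize greedily packs paths into groups whose rendered
// "Path IN (...)" fragment stays under maxBytes, so each group fits
// within ClickHouse's max_query_size when substituted into the query.
func splitByQuerySize(paths []string, maxBytes int) [][]string {
	var groups [][]string
	var cur []string
	size := 0
	for _, p := range paths {
		cost := len(p) + 4 // rough budget for quotes, comma, spacing
		if len(cur) > 0 && size+cost > maxBytes {
			groups = append(groups, cur)
			cur, size = nil, 0
		}
		cur = append(cur, p)
		size += cost
	}
	if len(cur) > 0 {
		groups = append(groups, cur)
	}
	return groups
}

func main() {
	paths := []string{"a.b.c", "d.e.f", "g.h.i"}
	// With a tiny 16-byte budget each path lands in its own group.
	fmt.Println(len(splitByQuerySize(paths, 16)))
}
```

Each group would then be executed concurrently (e.g. one goroutine per group) and the row sets concatenated before rollup is applied.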
Hi,
8ea233a says version 0.11.4, but the last tag is v0.11.1 - could you push your git tags?
Thanks,
Bernd
Hi, it seems that this backend does not support tag queries very well when there is also some computation.
The following query displays the series as "perSecond(interface_in_octets" and "perSecond(interface_out_octets":
aliasByTags(perSecond(seriesByTag('hostname=$host', 'ifName=~${interface:regex}', 'name=~interface_(in|out)_octets')), 'name')
While the following query aliases correctly (but does not show the correct data):
aliasByTags(seriesByTag('hostname=$host', 'ifName=~${interface:regex}', 'name=~interface_(in|out)_octets'), 'name')
Hello,
I'm using reversed Paths for the data table. This works well; however, the rollup conf in graphite-clickhouse does not seem to apply its regexps to the reversed Path. So I have to maintain two rollup.xml files: one for graphite-clickhouse that matches the non-reversed Path, and another for ClickHouse that matches the reversed Path. For example:
For graphite-clickhouse:
<pattern>
<regexp>^carbon\.</regexp>
<function>avg</function>
<retention>
<age>0</age>
<precision>60</precision>
</retention>
<retention>
<age>1296000</age>
<precision>600</precision>
</retention>
</pattern>
And for CH:
<pattern>
<regexp>\.carbon$</regexp>
<function>avg</function>
<retention>
<age>0</age>
<precision>60</precision>
</retention>
<retention>
<age>1296000</age>
<precision>600</precision>
</retention>
</pattern>
Would it be possible to use the same logic in graphite-clickhouse as in CH?
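One way to apply a single rollup config to both forms would be to reverse the metric name before (or after) matching, so that one set of regexps suffices. A minimal sketch of the reversal itself:

```go
package main

import (
	"fmt"
	"strings"
)

// reversePath turns "carbon.agents.host1.cpu" into "cpu.host1.agents.carbon",
// the form stored in a reverse = true data table. Reversing the metric name
// before matching would let graphite-clickhouse reuse the ClickHouse-side
// rollup.xml written for reversed Paths.
func reversePath(path string) string {
	parts := strings.Split(path, ".")
	for i, j := 0, len(parts)-1; i < j; i, j = i+1, j-1 {
		parts[i], parts[j] = parts[j], parts[i]
	}
	return strings.Join(parts, ".")
}

func main() {
	fmt.Println(reversePath("carbon.agents.host1.cpu"))
	// prints: cpu.host1.agents.carbon
}
```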
I'm currently using carbonapi & go-carbon but would like to switch to clickhouse for ingest performance. The following query takes <1s to run over the past 3h of data (each * expands to roughly 500 metrics):
aliasByNode(highestAverage(group(
scale(divideSeriesLists($prefix.$group.$server.cloudlinux.*.counter.CPU,$prefix.$group.$server.cloudlinux.*.gauge.lCPU), 0.00001),
divideSeriesLists($prefix.$group.$server.cloudlinux.*.counter.IOPS,$prefix.$group.$server.cloudlinux.*.gauge.lIOPS),
divideSeriesLists($prefix.$group.$server.cloudlinux.*.counter.IO,$prefix.$group.$server.cloudlinux.*.gauge.lIO),
divideSeriesLists($prefix.$group.$server.cloudlinux.*.gauge.NPROC,$prefix.$group.$server.cloudlinux.*.gauge.lNPROC),
divideSeriesLists($prefix.$group.$server.cloudlinux.*.gauge.EP,$prefix.$group.$server.cloudlinux.*.gauge.lEP),
divideSeriesLists($prefix.$group.$server.cloudlinux.*.gauge.MEMPHY,$prefix.$group.$server.cloudlinux.*.gauge.lMEMPHY)
) , 10),4,6)
When I switch carbonapi to point at graphite-clickhouse, the same query takes ~27 s, i.e. roughly 30-60× slower. How can I debug this and improve the performance?
Where could the error be in this configuration? Queries do not go to the archive table.
[common]
listen = ":9090"
max-cpu = 8
# Daemon returns empty response if query matches any of regular expressions
# target-blacklist = ["^not_found.*"]
[clickhouse]
# You can add user/password (http://user:password@localhost:8123) and any clickhouse options (GET-parameters) to url
# It is recommended to create read-only user
url = "http://localhost:8123"
data-table = "graphite.points_cluster"
tree-table = "graphite.tree_cluster"
# Optional table with daily series list.
# Useful for installations with big count of short-lived series
date-tree-table = "graphite.series_daily_cluster"
#date-tree-table = ""
# Supported several schemas of date-tree-table:
# 1 (default): table only with Path, Date, Level fields. Described here: https://habrahabr.ru/company/avito/blog/343928/
# 2: table with Path, Date, Level, Deleted, Version fields. Table type "series" in the carbon-clickhouse
date-tree-table-version = 2
rollup-conf = "/etc/graphite-clickhouse/rollup.xml"
# `tagged` table from carbon-clickhouse. Required for seriesByTag
tagged-table = ""
# Add extra prefix (directory in graphite) for all metrics
extra-prefix = ""
data-timeout = "30s"
tree-timeout = "30s"
[carbonlink]
server = ""
threads-per-request = 10
connect-timeout = "50ms"
query-timeout = "50ms"
total-timeout = "500ms"
[[data-table]]
table = "graphite.points_cluster"
max-age = "240h"
reverse = true
rollup-conf = "/etc/graphite-clickhouse/rollup.xml"
[[data-table]]
table = "graphite.points_archive_cluster"
min-age = "240h"
reverse = true
rollup-conf = "/etc/graphite-clickhouse/rollup_archive.xml"
[[logging]]
logger = ""
file = "/var/log/graphite-clickhouse/graphite-clickhouse.log"
level = "info"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"
Hi,
we use graphite-clickhouse and carbon-clickhouse in production and are very happy with the performance improvements over our previous graphite stack.
The query logging is too much for us, but we would like to keep the access logging.
So, ideally, we would like to log the query logger at warn and everything else at info.
We have unsuccessfully tried this configuration but feel that we are misunderstanding how this should work:
[[logging]]
logger = "query"
file = "/var/log/graphite-clickhouse/graphite-clickhouse.log"
level = "warn"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"
[[logging]]
logger = ""
file = "/var/log/graphite-clickhouse/graphite-clickhouse.log"
level = "info"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"
We reversed the ordering and tried playing around with it in general, but the result is the same: when we set any logger to warn, we get no logs at all. We don't see many warnings or errors, so perhaps we have not been patient enough for them, but I would have expected to see many access log lines.
Cheers!
I have the following docker-compose file:
clickhouse:
image: yandex/clickhouse-server:19.6.2.11
volumes:
- "./rollup.xml:/etc/clickhouse-server/config.d/rollup.xml"
- "./init.sql:/docker-entrypoint-initdb.d/init.sql"
- "./data/clickhouse/data:/var/lib/clickhouse/data"
- "./data/clickhouse/metadata:/var/lib/clickhouse/metadata"
carbon-clickhouse:
image: lomik/carbon-clickhouse:v0.10.2
volumes:
- "./data/carbon-clickhouse:/data/carbon-clickhouse"
- "./carbon-clickhouse.conf:/etc/carbon-clickhouse/carbon-clickhouse.conf"
ports:
- "2003:2003" # plain tcp
- "2003:2003/udp" # plain udp
- "2004:2004" # pickle
- "2006:2006" # prometheus remote write
links:
- clickhouse
graphite-clickhouse:
image: lomik/graphite-clickhouse:v0.11.1
volumes:
- "./rollup.xml:/etc/graphite-clickhouse/rollup.xml"
- "./graphite-clickhouse.conf:/etc/graphite-clickhouse/graphite-clickhouse.conf"
links:
- clickhouse
grafana:
image: grafana/grafana
restart: always
ports:
- 3000:3000
links:
- graphite-clickhouse
I get an error message when adding the Graphite data source:
I also put data into Graphite via nc:
echo "local.random.diceroll 4 `date +%s`" | nc 127.0.0.1 2003
Graphite-web seems to work fine. Is there any way to get data into Grafana from graphite-clickhouse?
Hi.
I have installed the new version 0.11.6 and now I have a problem with autoComplete/tags.
When I try to request all tags in Grafana (a dropdown in the interface), there are no tags. It calls tags/autoComplete/tags.
I see in the carbonapi logs:
ERROR zipper error fetching result {"type": "protoV2Group", "name": "clickhouse-cluster", "type": "tagName", "function": "HttpQuery.doRequest", "server": "http://graphite-clickhouse:9090", "name": "clickhouse-cluster", "uri": "http://graphite-clickhouse:9090/tags/autoComplete/tags", "error": "Get http://graphite-clickhouse:9090/tags/autoComplete/tags: EOF"}
I see the following message in the graphite-clickhouse logs:
[2019-07-15T05:09:15.687Z] INFO access {"request_id": "744edfcb576e13af8d578e289f0fc50c", "grafana": "Org:6; Dashboard:; Panel:", "time": 0.006001445, "method": "GET", "url": "/metrics/find/?format=protobuf&query=%2A", "peer": "10.0.0.29:40438", "status": 200}
2019/07/15 05:09:15 http: panic serving 10.0.0.29:40438: runtime error: invalid memory address or nil pointer dereference
goroutine 1003 [running]:
net/http.(*conn).serve.func1(0xc0003de3c0)
/usr/local/go/src/net/http/server.go:1769 +0x139
panic(0x16bbd60, 0x2cac7c0)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/lomik/graphite-clickhouse/pkg/where.(*Where).And(0x0, 0xc0007bc400, 0x14)
/go/src/github.com/lomik/graphite-clickhouse/_vendor/src/github.com/lomik/graphite-clickhouse/pkg/where/where.go:126 +0x37
github.com/lomik/graphite-clickhouse/pkg/where.(*Where).Andf(0x0, 0x18aa925, 0xc, 0xc000a63870, 0x1, 0x1)
/go/src/github.com/lomik/graphite-clickhouse/_vendor/src/github.com/lomik/graphite-clickhouse/pkg/where/where.go:145 +0x75
github.com/lomik/graphite-clickhouse/autocomplete.(*Handler).ServeTags(0xc0007bb990, 0x1d4fce0, 0xc00050a820, 0xc000742100)
/go/src/github.com/lomik/graphite-clickhouse/_vendor/src/github.com/lomik/graphite-clickhouse/autocomplete/autocomplete.go:118 +0x267
github.com/lomik/graphite-clickhouse/autocomplete.(*Handler).ServeHTTP(0xc0007bb990, 0x1d4fce0, 0xc00050a820, 0xc000742100)
/go/src/github.com/lomik/graphite-clickhouse/_vendor/src/github.com/lomik/graphite-clickhouse/autocomplete/autocomplete.go:45 +0x80
main.Handler.func1(0x1d50020, 0xc0004a41c0, 0xc000742000)
/go/src/github.com/lomik/graphite-clickhouse/_vendor/src/github.com/lomik/graphite-clickhouse/graphite-clickhouse.go:65 +0xe4
net/http.HandlerFunc.ServeHTTP(0xc00000f3a0, 0x1d50020, 0xc0004a41c0, 0xc000742000)
/usr/local/go/src/net/http/server.go:1995 +0x44
net/http.(*ServeMux).ServeHTTP(0x2cd1720, 0x1d50020, 0xc0004a41c0, 0xc000742000)
/usr/local/go/src/net/http/server.go:2375 +0x1d6
net/http.serverHandler.ServeHTTP(0xc00047d520, 0x1d50020, 0xc0004a41c0, 0xc000742000)
/usr/local/go/src/net/http/server.go:2774 +0xa8
net/http.(*conn).serve(0xc0003de3c0, 0x1d5b1a0, 0xc000702700)
/usr/local/go/src/net/http/server.go:1878 +0x851
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2884 +0x2f4
If the request comes with defined tags, everything works correctly, for example seriesByTag('service_name=tech_consul_node', 'host=tech-node3', 'cpu=cpu-total').
Hi,
Would it be possible to build a new release? The latest build is from 0.9.0. A changelog would also be nice.
Looking at the format=json output for a recently added metric, I see:
[{"target": "collectd.........", "datapoints": [[null, 1492522390], [null, 1492522400], [null, 1492522410], [null, 1492522420], [null, 1492522430], [null, 1492522440], [null, 1492522450], [null, 1492522460], [null, 1492522470], [null, 1492522480], [null, 1492522490], [null, 1492522500], [null, 1492522510], [null, 1492522520], [null, 1492522530], [null, 1492522540], [null, 1492522550], [null, 1492522560], [null, 1492522570], [null, 1492522580], [null, 1492522590], [null, 1492522600], [null, 1492522610], [null, 1492522620], [null, 1492522630], [null, 1492522640], [null, 1492522650], [null, 1492522660],
........,
[NaN, 1492606510], [0.0, 1492606520], [0.0, 1492606530], [0.0, 1492606540], [0.0, 1492606550], [0.0, 1492606560], [0.0, 1492606570], [0.0, 1492606580], [0.0, 1492606590], [0.0, 1492606600], [0.0, 1492606610], [0.0, 1492606620], [0.0, 1492606630], [0.0, 1492606640], [0.0, 1492606650], [0.0, 1492606660], [0.0, 1492606670], [0.0, 1492606680], [0.0, 1492606690], [0.0, 1492606700], [0.0, 1492606710], [0.0, 1492606720], [0.0, 1492606730], [0.0, 1492606740], [0.0, 1492606750], [0.0, 1492606760], [0.0, 1492606770], [0.0, 1492606780], [0.0, 1492606790], [0.0, 1492606800], [0.0, 1492606810], [0.0, 1492606820], [0.0, 1492606830], [0.0, 1492606840], [0.0, 1492606850], [0.0, 1492606860], [0.0, 1492606870], [0.0, 1492606880], [0.0, 1492606890], [0.0, 1492606900], [0.0, 1492606910], [0.0, 1492606920], [0.0, 1492606930], [0.0, 1492606940], [0.0, 1492606950], [0.0, 1492606960], [0.0, 1492606970], [0.0, 1492606980], [0.0, 1492606990], [0.0, 1492607000], [0.0, 1492607010], [0.0, 1492607020], [0.0, 1492607030], [0.0, 1492607040], [0.0, 1492607050], [0.0, 1492607060], [0.0, 1492607070], [0.0, 1492607080], [0.0, 1492607090], [0.0, 1492607100], [0.0, 1492607110], [0.0, 1492607120], [0.0, 1492607130], [0.0, 1492607140], [0.0, 1492607150], [0.0, 1492607160], [0.0, 1492607170], [0.0, 1492607180], [0.0, 1492607190], [0.0, 1492607200], [0.0, 1492607210], [0.0, 1492607220], [0.0, 1492607230], [0.0, 1492607240], [0.0, 1492607250], [0.0, 1492607260], [0.0, 1492607270], [0.0, 1492607280], [0.0, 1492607290], [0.0, 1492607300], [0.0, 1492607310], [0.0, 1492607320], [0.0, 1492607330], [0.0, 1492607340], [0.0, 1492607350], [0.0, 1492607360], [0.0, 1492607370], [0.0, 1492607380], [0.0, 1492607390], [0.0, 1492607400], [0.0, 1492607410], [0.0, 1492607420], [0.0, 1492607430], [0.0, 1492607440], [0.0, 1492607450], [0.0, 1492607460], [0.0, 1492607470], [0.0, 1492607480], [0.0, 1492607490], [0.0, 1492607500], [0.0, 1492607510], [0.0, 1492607520], [0.0, 1492607530], [0.0, 1492607540], [0.0, 1492607550], 
[0.0, 1492607560], [0.0, 1492607570], [0.0, 1492607580], [0.0, 1492607590], [0.0, 1492607600], [0.0, 1492607610], [0.0, 1492607620], [0.0, 1492607630], [0.0, 1492607640], [0.0, 1492607650], [0.0, 1492607660], [0.0, 1492607670], [0.0, 1492607680], [0.0, 1492607690], [0.0, 1492607700], [0.0, 1492607710], [0.0, 1492607720], [0.0, 1492607730], [0.0, 1492607740], [0.0, 1492607750], [0.0, 1492607760], [0.0, 1492607770], [0.0, 1492607780], [0.0, 1492607790], [0.0, 1492607800], [0.0, 1492607810], [0.0, 1492607820], [0.0, 1492607830], [0.0, 1492607840], [0.0, 1492607850], [0.0, 1492607860], [0.0, 1492607870], [0.0, 1492607880], [0.0, 1492607890], [0.0, 1492607900], [0.0, 1492607910], [0.0, 1492607920], [0.0, 1492607930], [0.0, 1492607940], [0.0, 1492607950], [0.0, 1492607960], [0.0, 1492607970], [0.0, 1492607980], [0.0, 1492607990], [0.0, 1492608000], [0.0, 1492608010], [0.0, 1492608020], [0.0, 1492608030], [0.0, 1492608040], [0.0, 1492608050], [0.0, 1492608060], [0.0, 1492608070], [0.0, 1492608080], [0.0, 1492608090], [0.0, 1492608100], [0.0, 1492608110], [0.0, 1492608120], [0.0, 1492608130], [0.0, 1492608140], [0.0, 1492608150], [0.0, 1492608160], [0.0, 1492608170], [0.0, 1492608180], [0.0, 1492608190], [0.0, 1492608200], [0.0, 1492608210], [0.0, 1492608220], [0.0, 1492608230], [0.0, 1492608240], [0.0, 1492608250], [0.0, 1492608260], [0.0, 1492608270], [0.0, 1492608280], [0.0, 1492608290], [0.0, 1492608300], [0.0, 1492608310], [0.0, 1492608320], [0.0, 1492608330], [0.0, 1492608340], [0.0, 1492608350], [0.0, 1492608360], [0.0, 1492608370], [0.0, 1492608380], [0.0, 1492608390], [0.0, 1492608400], [0.0, 1492608410], [0.0, 1492608420], [0.0, 1492608430], [0.0, 1492608440], [0.0, 1492608450], [0.0, 1492608460], [0.0, 1492608470], [0.0, 1492608480], [0.0, 1492608490], [0.0, 1492608500], [0.0, 1492608510], [0.0, 1492608520], [0.0, 1492608530], [0.0, 1492608540], [0.0, 1492608550], [0.0, 1492608560], [0.0, 1492608570], [0.0, 1492608580], [0.0, 1492608590], [0.0, 1492608600], 
[0.0, 1492608610], [0.0, 1492608620], [0.0, 1492608630], [0.0, 1492608640], [0.0, 1492608650], [0.0, 1492608660], [0.0, 1492608670], [0.0, 1492608680], [0.0, 1492608690], [0.0, 1492608700], [0.0, 1492608710], [0.0, 1492608720], [0.0, 1492608730], [0.0, 1492608740], [0.0, 1492608750], [0.0, 1492608760], [0.0, 1492608770], [null, 1492608780]]}]
Trying to look at the format=png image of such a query results in the following backtrace:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/lib/python2.7/dist-packages/graphite/render/views.py", line 215, in renderView
image = doImageRender(requestOptions['graphClass'], graphOptions)
File "/usr/lib/python2.7/dist-packages/graphite/render/views.py", line 436, in doImageRender
img = graphClass(**graphOptions)
File "/usr/lib/python2.7/dist-packages/graphite/render/glyph.py", line 196, in __init__
self.drawGraph(**params)
File "/usr/lib/python2.7/dist-packages/graphite/render/glyph.py", line 678, in drawGraph
self.setupYAxis()
File "/usr/lib/python2.7/dist-packages/graphite/render/glyph.py", line 1137, in setupYAxis
self.yLabelWidth = max([self.getExtents(label)['width'] for label in self.yLabels])
ValueError: max() arg is an empty sequence
Right now I have no idea where this issue comes from; it might be the NaN value. With the normal Graphite whisper backend the issue does not happen at all.
Hi,
the old graphite_tree table had a "Deleted" column to mark paths as deleted, which was very helpful in those cases where badly named series appeared for whatever reason.
Is there a way to make this work with the index table?
Thanks,
Bernd
I'm trying to expose tags, and I'm getting the following errors using v0.6.3:
# graphite-clickhouse -tags
2018/05/09 06:52:44 clickhouse response status 500: Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 14 (line 1, col 14): (Date,Version,Level,Path,IsLeaf,Tags,Tag1) FORMAT RowBinary �B ��Z. Expected one of: TABLE, identifier, FUNCTION, e.what() = DB::Exception
And in the log file:
[2018-05-09T07:36:42.694-0500] INFO [tagger] parse rules {}
[2018-05-09T07:36:42.695-0500] INFO [tagger] parse rules {"time": 0.000354584, "mem_rss_mb": 2}
[2018-05-09T07:36:42.695-0500] INFO [tagger] read and parse tree {}
[2018-05-09T07:36:42.705-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 0 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.010675804}
[2018-05-09T07:36:42.710-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 1 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.004206918}
[2018-05-09T07:36:42.714-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 2 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.003856585}
[2018-05-09T07:36:42.719-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 3 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.003746844}
[2018-05-09T07:36:42.724-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 4 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.004499933}
[2018-05-09T07:36:42.730-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 5 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.005991626}
[2018-05-09T07:36:42.735-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 6 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.004849088}
[2018-05-09T07:36:42.741-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 7 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.00503664}
[2018-05-09T07:36:42.745-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 8 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.004059725}
[2018-05-09T07:36:42.750-0500] INFO [query] query {"query": "SELECT Path FROM graphite_tree WHERE cityHash64(Path) % 10 == 9 GROUP BY Path HAVING argMax(Deleted, Version)==0 FORMAT RowBinary", "request_id": "", "time": 0.00467767}
[2018-05-09T07:36:42.753-0500] INFO [tagger] read and parse tree {"time": 0.057897203, "mem_rss_mb": 4}
[2018-05-09T07:36:42.753-0500] INFO [tagger] sort {}
[2018-05-09T07:36:42.759-0500] INFO [tagger] sort {"time": 0.00616382, "mem_rss_mb": 4}
[2018-05-09T07:36:42.759-0500] INFO [tagger] make map {}
[2018-05-09T07:36:42.760-0500] INFO [tagger] make map {"time": 0.000638004, "mem_rss_mb": 4}
[2018-05-09T07:36:42.760-0500] INFO [tagger] match {}
[2018-05-09T07:36:42.764-0500] INFO [tagger] match {"time": 0.004148708, "mem_rss_mb": 4}
[2018-05-09T07:36:42.764-0500] INFO [tagger] copy tags from childs to parents {}
[2018-05-09T07:36:42.764-0500] INFO [tagger] copy tags from childs to parents {"time": 0.000559527, "mem_rss_mb": 4}
[2018-05-09T07:36:42.764-0500] INFO [tagger] marshal RowBinary + gzip {}
[2018-05-09T07:36:42.765-0500] INFO [tagger] marshal RowBinary + gzip {"time": 0.000569245, "mem_rss_mb": 7}
[2018-05-09T07:36:42.765-0500] INFO [tagger] upload to clickhouse {}
[2018-05-09T07:36:42.794-0500] ERROR [query] query {"query": "INSERT INTO (Date,Version,Level,Path,IsLeaf,Tags,Tag1) FORMAT RowBinary", "request_id": "", "time": 0.028687229, "error": "clickhouse response status 500: Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 14 (line 1, col 14): (Date,Version,Level,Path,IsLeaf,Tags,Tag1) FORMAT RowBinary\n\ufffdBZ\ufffd\ufffdZ\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000. Expected one of: TABLE, identifier, FUNCTION, e.what() = DB::Exception\n"}
ClickHouse version is 1.1.54380.
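For reference, the statement that fails is missing its target table entirely: ClickHouse is asked to parse `INSERT INTO (Date,…)`, which is exactly what the "Expected one of: TABLE, identifier" syntax error complains about. This suggests the table the tagger should upload to is unset in the config. A well-formed statement would name the destination, e.g. (the table name here is only an example):

```sql
INSERT INTO graphite_tree (Date, Version, Level, Path, IsLeaf, Tags, Tag1) FORMAT RowBinary
```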
Hi,
it would be nice to have the option to blacklist all metrics and whitelist a few chosen ones.
I might come up with a PR at some point, but I have the hope that you are faster ;)
Thanks,
Bernd
After upgrading to v0.6.4, Grafana hangs at tag_values().
On graphite-web: Read timed out. (read timeout=10.0)
On the ClickHouse cluster: traffic rose from 200 Mb/s to 4 Gb/s.
Hello!
Are there any options to return zero or null to Grafana if there are no datapoints in the time range?
graphite-clickhouse -version
0.11.7
cat /etc/clickhouse-server/config.d/rollup.xml
<yandex>
<graphite_rollup>
<default>
<function>avg</function>
<retention>
<age>0</age>
<precision>10</precision>
</retention>
<retention>
<age>259200</age>
<precision>30</precision>
</retention>
<retention>
<age>1209600</age>
<precision>300</precision>
</retention>
<retention>
<age>2419200</age>
<precision>900</precision>
</retention>
<retention>
<age>29030400</age>
<precision>3600</precision>
</retention>
</default>
</graphite_rollup>
</yandex>
The Distributed table:
SHOW CREATE TABLE graphite_reverse
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.graphite_reverse (`Path` String, `Value` Float64, `Time` UInt32, `Date` Date, `Timestamp` UInt32) ENGINE = Distributed(shardOne, shardOne, graphite_reverse, sipHash64(Path)) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
The table with the data:
SHOW CREATE TABLE shardOne.graphite_reverse
┌─statement───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE shardOne.graphite_reverse (`Path` String, `Value` Float64, `Time` UInt32, `Date` Date, `Timestamp` UInt32) ENGINE = GraphiteMergeTree(Date, (Path, Time), 8192, 'graphite_rollup') │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
The retention rules from the database:
SELECT *
FROM system.graphite_retentions
┌─config_name─────┬─regexp─┬─function─┬──────age─┬─precision─┬─priority─┬─is_default─┬─Tables.database─┬─Tables.table─────────┐
│ graphite_rollup │ │ avg │ 29030400 │ 3600 │ 65535 │ 1 │ ['shardOne'] │ ['graphite_reverse'] │
│ graphite_rollup │ │ avg │ 2419200 │ 900 │ 65535 │ 1 │ ['shardOne'] │ ['graphite_reverse'] │
│ graphite_rollup │ │ avg │ 1209600 │ 300 │ 65535 │ 1 │ ['shardOne'] │ ['graphite_reverse'] │
│ graphite_rollup │ │ avg │ 259200 │ 30 │ 65535 │ 1 │ ['shardOne'] │ ['graphite_reverse'] │
│ graphite_rollup │ │ avg │ 0 │ 10 │ 65535 │ 1 │ ['shardOne'] │ ['graphite_reverse'] │
└─────────────────┴────────┴──────────┴──────────┴───────────┴──────────┴────────────┴─────────────────┴──────────────────────┘
The query graphite-clickhouse runs to fetch these rules:
SELECT regexp, function, age, precision, is_default FROM system.graphite_retentions ARRAY JOIN Tables AS table WHERE (table.database = 'default') AND (table.table = 'graphite_reverse') ORDER BY is_default ASC, priority ASC, regexp ASC, age ASC
The config:
cat /etc/graphite-clickhouse/graphite-clickhouse.conf
[common]
listen = ":9090"
max-cpu = 8
[clickhouse]
url = "http://localhost:8123/?max_query_size=2097152&readonly=2"
data-table = ""
index-table = "graphite_index"
rollup-conf = "auto"
data-timeout = "1m0s"
index-timeout = "1m0s"
tagged-table = "graphite_tagged"
[[data-table]]
table = "graphite_reverse"
reverse = true
rollup-conf = "auto"
[[logging]]
logger = ""
file = "/var/log/graphite-clickhouse/graphite-clickhouse.log"
level = "info"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"
With rollup-conf = "auto", points were still selected once per minute, i.e. with the default retention settings. This happens because the rules are looked up for the table configured in the graphite-clickhouse config, but if that table is a Distributed one, it is not mentioned in system.graphite_retentions at all.
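A possible workaround (a sketch, assuming the rollup XML is readable on the graphite-clickhouse host): instead of "auto", point rollup-conf for the Distributed table at the rollup file directly, so the rules do not have to be discovered via system.graphite_retentions:

```toml
[[data-table]]
table = "graphite_reverse"
reverse = true
# read the rules from the same XML ClickHouse uses; the path is an example
rollup-conf = "/etc/clickhouse-server/config.d/rollup.xml"
```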
Hi,
it would be awesome if graphite-clickhouse could filter series based on an HTTP header (or something similar). What I'm thinking about is basically:
[12345]
blacklisted-prefixes = ["*"]
whitelisted-prefixes = ["foo", "bar", "carbon"]
With such a configuration, graphite-clickhouse should behave just as it always does, but limit all its output to the metrics matching the whitelisted (or not blacklisted) prefixes.
Thanks for considering,
Bernd
A query like this
seriesByTag('type_instance=~nonpaged|active|used|wired')
is incorrectly translated into
Tag1 LIKE 'type\\\\_instance=nonpaged%' AND match(Tag1, 'type_instance=nonpaged|active|used|wired')
and simply won't work as expected, because the first value of the alternation ends up in the LIKE and effectively blocks every other alternative. The source of the problem seems to be https://github.com/lomik/graphite-clickhouse/blob/master/pkg/where/where.go#L31 . It should be something like:
Tag1 LIKE 'type\\\\_instance=%' AND match(Tag1, 'type_instance=nonpaged|active|used|wired')
And even then the regex is incorrect: it matches type_instance=nonpaged, or a bare active, used, or wired. Wrapping the regex value into (?: and ) produces sane results.
As a temporary workaround I'm wrapping the right side of the match expression into (?: and ) myself, and everything works as expected. For example:
seriesByTag('plugin=memory', 'type_instance=~(?:nonpaged|active|used|wired)')
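The mistranslation comes down to regex operator precedence: `|` binds loosest, so without a group the tag prefix attaches only to the first alternative. A quick demonstration with Python's `re` (which follows the same precedence rules):

```python
import re

# Without a group, the pattern means:
#   "type_instance=nonpaged"  OR  "active"  OR  "used"  OR  "wired"
ungrouped = r'type_instance=nonpaged|active|used|wired'
assert re.fullmatch(ungrouped, 'active')                    # a bare value matches
assert not re.fullmatch(ungrouped, 'type_instance=active')  # the intended form does not

# A non-capturing group restores the intended meaning.
grouped = r'type_instance=(?:nonpaged|active|used|wired)'
assert re.fullmatch(grouped, 'type_instance=active')
assert not re.fullmatch(grouped, 'active')
print("ok")
```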
Hi,
as I mentioned in #67 (comment), would it be possible to make the bufio.Scanner buffer size a config option and print a clear log message when the size is too small? I have increased the size again to be able to look back more than 75 days.
@ihard, where have you seen this error message? I haven't seen it; I had to find out the hard way.
I have no idea what the right fixed size would be. Maybe someone would like to look back several years.
So making it a config option would be nice, if possible. I'm really not a developer, so I can't do it myself.
Ralph
Please document the usage of the 'tag' table and config files.
Hi,
We are using carbon-clickhouse as remote storage for Prometheus, with tags (graphite + graphite_tagged tables). Data is sent every 15s.
graphite-clickhouse does not return datapoints younger than 1 hour with the following rollup config:
<graphite_rollup>
<pattern>
<regexp>nginx_http_stats_requests</regexp>
<function>sum</function>
<retention>
<age>3600</age>
<precision>60</precision>
</retention>
</pattern>
<default>
<function>any</function>
<retention>
<age>0</age>
<precision>15</precision>
</retention>
</default>
</graphite_rollup>
grafana request to graphite-web:
aliasByTags(groupByTags(perSecond(seriesByTag('name=nginx_http_stats_requests_count', 'region=~${region:regex}', 'job=~$role', 'http_host=~$http_host','http_code=~$http_code')), 'sum', 'hostname','http_code'), 'hostname', 'http_code')
When requesting data for the last 3 hours, it returns as expected; the graph is populated up to the last minute.
When requesting the last 30 minutes, all datapoints are null.
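One thing worth double-checking (a guess on my side, not a confirmed diagnosis): rollup patterns are normally written with a retention that starts at age 0, and the nginx_http_stats_requests pattern starts only at age 3600, so no precision is defined for the most recent hour. A pattern with an explicit raw retention matching the 15s send interval would look like:

```xml
<pattern>
    <regexp>nginx_http_stats_requests</regexp>
    <function>sum</function>
    <retention>
        <age>0</age>
        <precision>15</precision>
    </retention>
    <retention>
        <age>3600</age>
        <precision>60</precision>
    </retention>
</pattern>
```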
Hi,
In prometheus grafana data source one could get available tag values with the following function:
label_values(metric_name{label1=~"name.*",label2=~"value.*"},hostname)
This will return all hostname label values where label1 and label2 values matches regex expression
http://docs.grafana.org/features/datasources/prometheus/#query-variable
Is it possible to achieve same functionality with graphite datasource and graphite-clickhouse?
Hello. Here I'm trying to provide the changelog. If it fits the requirements, I'd very much appreciate a new release.
Changes since v0.11.7
## Features
- Add memory-return-interval option
- Decrease amount of transferred data with an aggregation of values and timestamps by Path
- Increase `scanner.Buffer` to fix `bufio.Scanner: token too long`
- Add `noprom` tag to make prometheus dependency optional
- Upload packages to https://packagecloud.io/go-graphite/
## Bugfix
- Fix metric finder content type for `protobuf`
- Remove unused escape functions
- Sort labels with __name__
- Allow using `graphite..inner.data` as rollup table
- Fix broken deb-compression fpm argument
Hi,
I have set up the following rollup:
<graphite_rollup>
<default>
<function>avg</function>
<retention>
<age>0</age>
<precision>1</precision>
</retention>
<retention>
<age>86400</age>
<precision>60</precision>
</retention>
<retention>
<age>63072000</age>
<precision>86400</precision>
</retention>
</default>
</graphite_rollup>
And I got the expected results for the first interval (30 min), but after that I got null values between the actual values returned from Graphite (see screenshot).
Have I misconfigured it?
Hi,
right now the current version from master does not work for me at all. Looking at the log, it seems that the query against graphite_tree has an extra "." at the end which should not be there.
[2017-07-12 12:58:41] I clickhouse.go:71: query {"request_id":"2","query":"SELECT Path FROM graphite_tree WHERE (Level = 5) AND (Path = 'carbon.agents.graphitedev002.pickle.metricsReceived' OR Path = 'carbon.agents.graphitedev002.pickle.metricsReceived.') GROUP BY Path HAVING argMax(Deleted, Version)==0","runtime_ns":6432321,"runtime":"6.432321ms"}
[2017-07-12 12:58:41] I graphite-clickhouse.go:74: access {"request_id":"2","runtime":"6.548465ms","runtime_ns":6548465,"method":"GET","url":"/metrics/find/?local=1&format=pickle&query=carbon.agents.graphitedev002.pickle.metricsReceived&from=1499770721&until=1499857121","peer":"127.0.0.1:48454","status":200}
[2017-07-12 12:58:41] I clickhouse.go:71: query {"request_id":"6","query":"SELECT Path FROM graphite_tree WHERE (Level = 5) AND (Path = 'carbon.agents.graphitedev002.pickle.metricsReceived' OR Path = 'carbon.agents.graphitedev002.pickle.metricsReceived.') GROUP BY Path HAVING argMax(Deleted, Version)==0","runtime_ns":4658751,"runtime":"4.658751ms"}
[2017-07-12 12:58:41] I clickhouse.go:71: query {"request_id":"6","query":" SELECT Path, Time, Value, Timestamp FROM graphite WHERE (Path IN ('carbon.agents.graphitedev002.pickle.metricsReceived')) AND ((Date >='2017-07-11' AND Date <= '2017-07-12' AND Time >= 1499770721 AND Time <= 1499857139)) FORMAT RowBinary ","runtime_ns":11681202,"runtime":"11.681202ms"}
[2017-07-12 12:58:41] I graphite-clickhouse.go:74: access {"request_id":"6","runtime":"19.191811ms","runtime_ns":19191811,"method":"GET","url":"/render/?format=pickle&local=1&noCache=1&from=1499770721&until=1499857121&target=carbon.agents.graphitedev002.pickle.metricsReceived&now=1499857121","peer":"127.0.0.1:48454","status":200}
I've bisected it down to
117efa3b9d07125d57c3bc6f13f0d2ab751597bd is the first bad commit
commit 117efa3b9d07125d57c3bc6f13f0d2ab751597bd
Author: Roman Lomonosov <[email protected]>
Date: Mon May 15 21:32:19 2017 +0300
update zap
:100644 100644 f9fc933eb6a2a06169461103f3c1b53ceb980699 7bd0e607ddec34a4c97039b72eb8d2b0e069fbda M .gitmodules
:040000 040000 54d57556519c0c07f3aa0ca107929341d9e6e375 37e986474ae52d2c816b5640b7a7a4f24200bd47 M config
:100644 100644 3e5506c35e38055699a0f7ba69931caebcae90af cd4d14d048048e943ba3e6e3b469999349e07a0c M graphite-clickhouse.go
:040000 040000 e55d6aed02ce35347137a5c8fec2321c56cfed98 2196e146f095e058191e10134267c9913d188218 M helper
:040000 040000 52be63571bafd8ae7b25bcc29433defcc71c7fa5 1f74a7862fcb8534276b1f93aaaeaafd1c513f89 M render
:040000 040000 3b6a0ce99138ea284590349d0aa33f66d8017ef4 83b6cabd5f099cd38d9f514c69bb3cad52605254 M tagger
:040000 040000 52ad2c251dd4e077bfb886dd5ecf107af53c75b5 af4377c27303521a02d6d4cc2a4a859bbbee958f M vendor
Right now I don't have the time to debug this further, but maybe later.
Hi,
I'm trying to set up carbon-clickhouse, ClickHouse, and graphite-clickhouse with replicated ClickHouse tables. My question is: when using carbonapi, should I configure all servers as endpoints, or only one (e.g. behind a VIP) and switch to another when that one is down?
I don't know if there is any other place (forum, chat, ...) to ask questions, so I'm trying here.
Hello. Since v19.2, ClickHouse supports a setting to kill a query after the HTTP client has disconnected: ClickHouse/ClickHouse@39f8eb5 + ClickHouse/ClickHouse#4213
What do you think about using it by default?
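Until it becomes the default, it can presumably be enabled per connection via the URL parameters in the graphite-clickhouse config (the parameter name below is the setting introduced by the linked PR; treat it as an assumption):

```toml
[clickhouse]
url = "http://localhost:8123/?readonly=2&cancel_http_readonly_queries_on_client_close=1"
```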
I posted an issue to the carbon-clickhouse repo some time ago asking how to delete specific metrics.
I ended up finding a way using some SQL queries, but I think this project would greatly benefit from another tool to manage the metrics as a whole.
In traditional Graphite use, you can rename and delete metrics easily by just renaming or deleting folders/files on disk, with the changes being almost instantly visible in any front-end.
With ClickHouse as the backend, you'd need to figure out very specific SQL queries to do the same thing.
Would it be possible to create a way to do this? I'm thinking of either a standalone tool that talks directly to the ClickHouse instance, or maybe a REST API that works via graphite-clickhouse on a new URL (/metrics, or /api/metrics), which would allow for some typical management tasks.
Hello!
I'm running some benchmarks between graphite-clickhouse and go-carbon.
One of the test cases is a query that returns around 3k metrics with 114 million data points, for which go-carbon is considerably faster. I can share more details about the setup and the test if that helps.
I had a look at the code and was wondering why the sorting is implemented in graphite-clickhouse instead of relying on ORDER BY (Path, Time) in ClickHouse.
[2019-08-06T07:59:31.195Z] INFO render {"request_id": "c1c74d80768cc1363a839879dad61a36", "read_bytes": 14907614050, "read_points": 113399566}
[2019-08-06T07:59:31.196Z] DEBUG parse {"request_id": "c1c74d80768cc1363a839879dad61a36", "runtime": "1m11.01137844s", "runtime_ns": 71.01137844}
[2019-08-06T08:00:08.777Z] DEBUG sort {"request_id": "c1c74d80768cc1363a839879dad61a36", "runtime": "37.581330127s", "runtime_ns": 37.581330127}
[2019-08-06T08:00:22.976Z] DEBUG reply {"request_id": "c1c74d80768cc1363a839879dad61a36", "runtime": "12.760293752s", "runtime_ns": 12.760293752}
Have you also considered parallelizing the parse?
I'm basically looking into improving overall run time for heavy queries.
Thanks!
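If each ClickHouse response already arrives ordered by (Path, Time), the client-side sort could be replaced by a streaming merge of the k pre-sorted streams, which is O(n log k) and needs no full in-memory re-sort. A minimal Python sketch of the idea (the data and names are illustrative, not graphite-clickhouse internals):

```python
import heapq

# Imagine each shard/table returns rows already ordered by (Path, Time),
# e.g. via ORDER BY (Path, Time) in the query.
shard_a = [("a.cpu", 10, 0.1), ("a.cpu", 20, 0.2), ("b.mem", 10, 5.0)]
shard_b = [("a.cpu", 15, 0.15), ("b.mem", 20, 6.0)]

# heapq.merge streams the combined result: no need to buffer and sort
# the full 114M-point response before replying.
merged = list(heapq.merge(shard_a, shard_b, key=lambda r: (r[0], r[1])))
print(merged[0], merged[-1])
```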
Hello again. I've run some experiments and found that the current SELECT is very inefficient for a big set of data. Here are the results:
>>> time clickhouse-client --optimize_throw_if_noop 1 -d graphite --log_queries 0 -q "SELECT Path, Time, Value, Timestamp FROM graphite.carbon PREWHERE (Date >= '2019-08-29') AND (Date <= '2019-08-29') WHERE (Path IN _data) FORMAT RowBinary" --external --file=metrics --types=String | wc
188699 1777390 2455102113
real 0m48.303s
user 0m45.364s
sys 0m2.240s
>>> time clickhouse-client --optimize_throw_if_noop 1 -d graphite --log_queries 0 -q "SELECT Path, Time, Value, Timestamp FROM graphite.carbon PREWHERE (Date >= '2019-08-29') AND (Date <= '2019-08-29') WHERE (Path IN _data) FORMAT Null" --external --file=metrics --types=String | wc
0 0 0
real 0m13.941s
user 0m1.396s
sys 0m0.464s
>>> time clickhouse-client --optimize_throw_if_noop 1 -d graphite --log_queries 0 -q "SELECT Path, Time, Value, Timestamp FROM graphite.carbon PREWHERE (Date >= '2019-08-29') AND (Date <= '2019-08-29') WHERE (Path IN _data)" --external --file=metrics --types=String | wc
24261245 97044980 2683087586
real 0m43.619s
user 0m43.384s
sys 0m2.388s
>>> time clickhouse-client --optimize_throw_if_noop 1 -d graphite --log_queries 0 -q "SELECT Path, groupArray(Time), groupArray(Value), groupArray(Timestamp) FROM graphite.carbon PREWHERE (Date >= '2019-08-29') AND (Date <= '2019-08-29') WHERE (Path IN _data) GROUP BY Path FORMAT RowBinary" --external --file=metrics --types=String | wc
188739 1779468 387864252
real 0m28.289s
user 0m14.300s
sys 0m0.576s
>>> time clickhouse-client --optimize_throw_if_noop 1 -d graphite --log_queries 0 -q "SELECT Path, groupArray(Time), groupArray(Value), groupArray(Timestamp) FROM graphite.carbon PREWHERE (Date >= '2019-08-29') AND (Date <= '2019-08-29') WHERE (Path IN _data) GROUP BY Path FORMAT Null" --external --file=metrics --types=String | wc
0 0 0
real 0m14.882s
user 0m0.200s
sys 0m0.072s
>>> time clickhouse-client --optimize_throw_if_noop 1 -d graphite --log_queries 0 -q "SELECT Path, groupArray(Time), groupArray(Value), groupArray(Timestamp) FROM graphite.carbon PREWHERE (Date >= '2019-08-29') AND (Date <= '2019-08-29') WHERE (Path IN _data) GROUP BY Path" --external --file=metrics --types=String | wc
25792 103168 597917411
real 0m24.955s
user 0m10.348s
sys 0m0.576s
groupArray has a tiny overhead for the calculation, but it is more than two times faster for transferring the same data. I've included the TSV numbers to give an idea of the total amount of points.
Do you mind if I try to implement groupArray in DataParse?
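A back-of-the-envelope illustration of why the grouped variant transfers so much less: per-row encoding repeats the (often long) metric path for every point, while grouping by Path serializes it once per series. The sizes below are assumptions for RowBinary (one Float64 plus two UInt32s per point); the path is an example:

```python
path = "STRESS.host.ip-0.com.graphite.stresser.a.count"
points = 8640                  # one point per 10s over a day
point_bytes = 8 + 4 + 4        # Float64 Value + UInt32 Time + UInt32 Timestamp

per_row = points * (len(path) + point_bytes)   # path repeated per point
grouped = len(path) + points * point_bytes     # path serialized once

print(per_row / grouped)       # a severalfold reduction for a path this long
```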
Is there a way to filter the metric name list so that I can exclude metrics that have not been inserted in the past few days? I can see on the RU wiki page there are details about the old index_tree table where you could set Deleted = 1, but I can't seem to find any documentation about how to do this currently.
On large results, /metrics/find sets an incorrect content type in the header:
Content-Type: text/plain; charset=utf-8
Transfer-Encoding: chunked
instead of
Content-Type: application/octet-stream