Hi @zhangxin511, you should keep in mind the proposed architecture layout.
As you can see, there are:
- 2 influxdb-srelay instances
- 2 syncflux instances
running on both nodes.
In this layout, the backend names in influxdb-srelay.conf and the influxdb names in syncflux.conf should be the same.
Suppose the rwha-sample layout:
influxdb-srelay.conf (on myinfluxdb01_server)
...
...
[[influxdb]]
name = "myinfluxdb01"
location = "http://myinfluxdb01_server:8086/"
timeout = "10s"
[[influxdb]]
name = "myinfluxdb02"
location = "http://myinfluxdb02_server:8086/"
timeout = "10s"
[[influxcluster]]
# name = cluster id for route configs and logs
name = "ha_cluster"
# members = array of influxdb backends
members = ["myinfluxdb01","myinfluxdb02"]
log-file = "ha_cluster.log"
log-level = "info"
type = "HA"
query-router-endpoint-api = ["http://myinfluxdb01_server:4090/api/queryactive","http://myinfluxdb02_server:4090/api/queryactive"]
...
...
influxdb-srelay.conf (on myinfluxdb02_server)
...
...
[[influxdb]]
name = "myinfluxdb01"
location = "http://myinfluxdb01_server:8086/"
timeout = "10s"
[[influxdb]]
name = "myinfluxdb02"
location = "http://myinfluxdb02_server:8086/"
timeout = "10s"
[[influxcluster]]
# name = cluster id for route configs and logs
name = "ha_cluster"
# members = array of influxdb backends
members = ["myinfluxdb02","myinfluxdb01"]
log-file = "ha_cluster.log"
log-level = "info"
type = "HA"
query-router-endpoint-api = ["http://myinfluxdb02_server:4090/api/queryactive","http://myinfluxdb01_server:4090/api/queryactive"]
...
...
The only change is the order of members and query-router-endpoint-api, so each node queries its own syncflux first.
syncflux.conf (on myinfluxdb01_server)
master-db = "myinfluxdb01"
slave-db = "myinfluxdb02"
[[influxdb]]
release = "1x"
name = "myinfluxdb01"
location = "http://myinfluxdb01_server:8086/"
admin-user = "admin"
admin-passwd = "admin"
timeout = "10s"
[[influxdb]]
release = "1x"
name = "myinfluxdb02"
location = "http://myinfluxdb02_server:8086/"
admin-user = "admin"
admin-passwd = "admin"
timeout = "10s"
syncflux.conf (on myinfluxdb02_server)
Only the master and slave values are swapped:
master-db = "myinfluxdb02"
slave-db = "myinfluxdb01"
[[influxdb]]
release = "1x"
name = "myinfluxdb01"
location = "http://myinfluxdb01_server:8086/"
admin-user = "admin"
admin-passwd = "admin"
timeout = "10s"
[[influxdb]]
release = "1x"
name = "myinfluxdb02"
location = "http://myinfluxdb02_server:8086/"
admin-user = "admin"
admin-passwd = "admin"
timeout = "10s"
About your questions:
- As in the previous example (let me know if you have more doubts on that issue).
- Right now, there is no way to have more than one (this is also a very young project), but if needed it won't be difficult to add this feature.
3,4) Configure as in the layout and example above. If you have more questions about configuration or possible errors, please open a specific issue in the https://github.com/toni-moreno/syncflux issue tracker.
I hope you can understand how smart-relay and syncflux can work together to build a better HA solution when we cannot run an InfluxDB Enterprise cluster.
Any other questions?
from influxdb-srelay.
Thanks for your detailed info. I tried your approach; since I am not sure where the HA load balancing comes from, I set up only one srelay instance but kept the others as you suggested, but I still can't get the data in sync when one node is down.
This is what I have done:
docker-compose up
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE mydb"
curl -XPOST http://localhost:8087/query --data-urlencode "q=CREATE DATABASE mydb"
(I haven't tried turning on the admin interface on influx to use your admin endpoint.)
- Baseline:
curl -i -XPOST "http://127.0.0.1:9096/write?db=mydb" --data-binary "cpu_load_short,host=server01,region=us-west value=0.64 1434055561000000000"
Both database backends got the data: 2015-06-11T20:46:01Z server01 us-west 0.64
- Stop influx-a (running on 8086):
docker-compose stop influx-a
- Try inserting data while a is down:
curl -i -XPOST "http://127.0.0.1:9096/write?db=mydb" --data-binary "cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000"
which is 2015-06-11T20:46:02Z server01 us-west 0.64
- Start influx-a again:
docker-compose start influx-a
- Wait, then check the databases: the entry
2015-06-11T20:46:02Z server01 us-west 0.64
never synced back to a, while b has both data entries.
Here is my setup:
docker-compose.yml
version: '3.7'
services:
  influx-a:
    image: influxdb:1.7
    ports:
      - 8086:8086
    volumes:
      - C:/Docker/InfluxHA/Influxdb/a:/var/lib/influxdb
  influx-b:
    image: influxdb:1.7
    ports:
      - 8087:8086
    volumes:
      - C:/Docker/InfluxHA/Influxdb/b:/var/lib/influxdb
  influx-relay:
    image: tonimoreno/influxdb-srelay:latest
    ports:
      - 9096:9096
    links:
      - influx-a
      - influx-b
      - sync-flux-a
      - sync-flux-b
    volumes:
      - C:/Docker/InfluxHA/Influx-srelay/conf/influxdb-srelay.conf:/etc/influxdb-srelay/influxdb-srelay.conf
      - C:/Docker/InfluxHA/Influx-srelay/log/:/var/log/
  sync-flux-a:
    image: tonimoreno/syncflux
    ports:
      - 4090:4090
    links:
      - influx-a
      - influx-b
    volumes:
      - C:/Docker/InfluxHA/Sync-flux/a/conf/:/opt/syncflux/conf/
      - C:/Docker/InfluxHA/Sync-flux/a/log/:/opt/syncflux/log/
  sync-flux-b:
    image: tonimoreno/syncflux
    ports:
      - 4091:4090
    links:
      - influx-a
      - influx-b
    volumes:
      - C:/Docker/InfluxHA/Sync-flux/b/conf/:/opt/syncflux/conf/
      - C:/Docker/InfluxHA/Sync-flux/b/log/:/opt/syncflux/log/
The configuration files and folder structure are attached. I am sorry to bother you like this, but could you take a look and let me know what went wrong? Much appreciated!
InfluxHA.zip
Hi @zhangxin511, I will check your config ASAP.
Hi @zhangxin511, the first thing I've detected is in your syncflux.toml config.
The db names should be the same in both the srelay and syncflux configs:
influxdb-srelay.conf
[[influxdb]]
name = "myinfluxdb01"
location = "http://influx-a:8086/"
timeout = "10s"
[[influxdb]]
name = "myinfluxdb02"
location = "http://influx-b:8086/"
timeout = "10s"
syncflux-a.toml
master-db = "myinfluxdb01"
slave-db = "myinfluxdb02"
[[influxdb]]
release = "1x"
name = "myinfluxdb01"
location = "http://influx-a:8086/"
admin-user = "admin"
admin-passwd = "admin"
timeout = "10s"
[[influxdb]]
release = "1x"
name = "myinfluxdb02"
location = "http://influx-b:8086/"
admin-user = "admin"
admin-passwd = "admin"
timeout = "10s"
syncflux-b.toml
master-db = "myinfluxdb02"
slave-db = "myinfluxdb01"
[[influxdb]]
release = "1x"
name = "myinfluxdb01"
location = "http://influx-a:8086/"
admin-user = "admin"
admin-passwd = "admin"
timeout = "10s"
[[influxdb]]
release = "1x"
name = "myinfluxdb02"
location = "http://influx-b:8086/"
admin-user = "admin"
admin-passwd = "admin"
timeout = "10s"
Could you fix these config files and test again please?
@toni-moreno I was not able to make recovery work using your suggestions alone. But after changing the docker-compose file from tonimoreno/syncflux to tonimoreno/syncflux:latest, the data got synced! It looks like I was using some old version of syncflux which might not have the recover feature (I was able to see part of the recovery logs, but not all). Anyway, it is finally working for me now. I will do more performance testing and keep you posted.
I understand this is still in the early development phase. A suggestion based on my issue: it looks like srelay and syncflux are tightly related, so maybe consider merging these two repos into one, or breaking the naming restriction?
Sorry, I spoke too early: it looks like data recovery is not ALWAYS working in my case. I actually got only one good replication so far, and all the others failed.
I can see from the log that syncflux was trying to recover data. When the recovery does not recover any data, it shows:
time="2019-05-29 18:45:42" level=info msg="HACluster check...."
time="2019-05-29 18:45:42" level=info msg="HACLuster: detected UP Last(2019-05-29 18:45:32.1542501 +0000 UTC m=+540.024756701) Duratio OK (9.9980293s) RECOVERING"
time="2019-05-29 18:45:42" level=info msg="HACLUSTER: INIT RECOVERY : FROM [ 2019-05-29 18:44:32.1545497 +0000 UTC m=+480.024959901 ] TO [ 2019-05-29 18:45:32.1542501 +0000 UTC m=+540.024756701 ]"
time="2019-05-29 18:45:42" level=info msg="Replicating Data from DB mydb RP autogen..."
time="2019-05-29 18:45:42" level=debug msg="SYNC-DB-RP[mydb|&{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[]}] From:2019-05-29 18:44:32.1545497 +0000 UTC m=+480.024959901 To:2019-05-29 18:45:32.1542501 +0000 UTC m=+540.024756701 | Duration: 59.9997968s || #chunks: 1 | chunk Duration 1h0m0s "
time="2019-05-29 18:45:42" level=info msg="InfluxMonitor: InfluxDB : myinfluxdb01 OK (Version 1.7.6 : Duration 1.454ms )"
time="2019-05-29 18:45:42" level=info msg="Processed Chunk [1/1](100%) from [1559151932][2019-05-29 17:45:32 +0000 UTC] to [1559155532][2019-05-29 18:45:32 +0000 UTC] (0) Points Took [9.7µs]"
time="2019-05-29 18:45:42" level=info msg="Processed DB data from myinfluxdb02[mydb|autogen] to myinfluxdb01[mydb|autogen] has done #Points (0) Took [798.4µs] !\n"
time="2019-05-29 18:45:42" level=info msg="HACLUSTER: DATA SYNCRONIZATION Took 1.9629ms"
Only once was there a good recovery, which gives this output:
time="2019-05-29 18:23:34" level=info msg="HACluster check...."
time="2019-05-29 18:23:34" level=info msg="HACLuster: detected UP Last(2019-05-29 18:23:24.5806122 +0000 UTC m=+280.038978101) Duratio OK (9.9964807s) RECOVERING"
time="2019-05-29 18:23:34" level=info msg="HACLUSTER: INIT RECOVERY : FROM [ 2019-05-29 18:22:34.5799461 +0000 UTC m=+230.038399201 ] TO [ 2019-05-29 18:23:24.5806122 +0000 UTC m=+280.038978101 ]"
time="2019-05-29 18:23:34" level=info msg="Replicating Data from DB mydb RP autogen..."
time="2019-05-29 18:23:34" level=debug msg="SYNC-DB-RP[mydb|&{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[cpu_load_short:%!s(*agent.MeasurementSch=&{cpu_load_short map[value:0xc000276600]})]}] From:2019-05-29 18:22:34.5799461 +0000 UTC m=+230.038399201 To:2019-05-29 18:23:24.5806122 +0000 UTC m=+280.038978101 | Duration: 50.0005789s || #chunks: 1 | chunk Duration 1h0m0s "
time="2019-05-29 18:23:34" level=debug msg="processing Database mydb Measurement cpu_load_short from 1559150604 to 1559154204"
time="2019-05-29 18:23:34" level=info msg="InfluxMonitor: InfluxDB : myinfluxdb01 OK (Version 1.7.6 : Duration 1.6828ms )"
time="2019-05-29 18:23:34" level=info msg="InfluxMonitor: InfluxDB : myinfluxdb02 OK (Version 1.7.6 : Duration 2.414ms )"
time="2019-05-29 18:23:34" level=debug msg="Query [select * from \"cpu_load_short\" where time > 1559150604s and time < 1559154204s group by *] took 1.5465ms "
time="2019-05-29 18:23:34" level=debug msg="processed 4 points"
time="2019-05-29 18:23:34" level=debug msg="Write attempt [1] took 3.3763ms "
time="2019-05-29 18:23:34" level=info msg="Processed Chunk [1/1](100%) from [1559150604][2019-05-29 17:23:24 +0000 UTC] to [1559154204][2019-05-29 18:23:24 +0000 UTC] (4) Points Took [6.3572ms]"
time="2019-05-29 18:23:34" level=info msg="Processed DB data from myinfluxdb02[mydb|autogen] to myinfluxdb01[mydb|autogen] has done #Points (4) Took [6.6578ms] !\n"
time="2019-05-29 18:23:34" level=info msg="HACLUSTER: DATA SYNCRONIZATION Took 8.6714ms"
It looks like this block of code is not always executed, and I have no idea why: https://github.com/toni-moreno/syncflux/blob/6627a8281cd93305f9315b6b6be325f4cdbd0dbb/pkg/agent/client.go#L594-L615
Hi @zhangxin511, my workmate @sbengo will review your case ASAP.
Thank you @toni-moreno for your continuous help! Let me know if you need anything else, @sbengo.
@sbengo @toni-moreno I partially figured out why my data was not recovered:
- Whenever the log
processing Database mydb Measurement cpu_load_short
appears, it means a "good" recovery state for me. But even in the good state, there are two issues with the logic of syncflux:
getvalues := fmt.Sprintf("select * from \"%v\" where time > %vs and time < %vs group by *", m, startsec, endsec)
(here: https://github.com/toni-moreno/syncflux/blob/6627a8281cd93305f9315b6b6be325f4cdbd0dbb/pkg/agent/client.go#L602):
- If I insert data without specifying a time, influx will use its current UTC time for the time key. The logic above will find the data missed while the node was down and backfill it when the node comes alive again. However, due to possible time differences between the influx servers, and depending on when syncflux detects that the cluster went down, it may add duplicate records. For example, the value 2 got inserted before the node went down, and after the node recovered, syncflux thought it should add this value again, so it got duplicated.
- For us, most of the data we insert has a past timestamp due to delayed/scheduled/batch jobs. Say a node was down at time 8, at time 10 we insert an entry that happened at time 3, and the broken node recovers at time 11. That entry will not be synced at all due to the above logic.
- I still don't know why sometimes the
processing Database mydb Measurement cpu_load_short
log never appears, putting the recovery in a "bad" state.
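The late-timestamp case described above can be sketched as follows (illustrative Python, not syncflux's actual Go code; the window-based select mirrors the getvalues query quoted earlier):

```python
# Sketch of window-based recovery: only points whose *timestamp* falls
# inside the downtime window are selected for replication, mirroring
#   select * from "m" where time > startsec and time < endsec
def points_to_recover(points, down_from, down_to):
    return [p for p in points if down_from < p["time"] < down_to]

# Node down from t=8 to t=11; at t=10 a batch job writes a point whose
# timestamp is t=3 (the event happened earlier).
written_while_down = [{"time": 3, "value": 0.64}]

missed = points_to_recover(written_while_down, down_from=8, down_to=11)
print(missed)  # [] -> the late-timestamped point is never synced back
```

Recovering by insertion time rather than by point timestamp would require some record of what arrived during the outage, which is a different design from replaying a time window.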
Hi @zhangxin511, thanks for the info and sorry for the late response!
When SyncFlux starts, it gets info about the available databases, RPs and the measurements attached to them (a.k.a. the schema), and currently it never refreshes it (only on init).
As I can see in your logs, in the failing case the schema seems to be empty, so it won't iterate over the measurements (in the linked function).
Bad case:
...
time="2019-05-29 18:45:42" level=debug msg="SYNC-DB-RP[mydb|&{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[]}] From:2019-05-29 18:44:32.1545497 +0000 UTC m=+480.024959901 To:2019-05-29 18:45:32.1542501 +0000 UTC m=+540.024756701 | Duration: 59.9997968s || #chunks: 1 | chunk Duration 1h0m0s "
...
Working case:
time="2019-05-29 18:23:34" level=debug msg="SYNC-DB-RP[mydb|&{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[cpu_load_short:%!s(*agent.MeasurementSch=&{cpu_load_short map[value:0xc000276600]})]}] From:2019-05-29 18:22:34.5799461 +0000 UTC m=+230.038399201 To:2019-05-29 18:23:24.5806122 +0000 UTC m=+280.038978101 | Duration: 50.0005789s || #chunks: 1 | chunk Duration 1h0m0s "
Review:
I think it's related to schema creation (if there was no data, the schema would be empty: only the db and rp were stored). So:
- Was there data in your DB when you brought up the srelay + syncflux stack?
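The init-only schema snapshot can be illustrated with a small sketch (hypothetical Python, not the actual Go implementation):

```python
# Sketch: the schema snapshot is taken once at startup; databases or
# measurements created afterwards are invisible to the sync loop
# (the "map[]" in the bad-case log).
class SyncAgent:
    def __init__(self, fetch_schema):
        self.schema = fetch_schema()  # read once, never refreshed

    def measurements_to_sync(self):
        # recovery iterates only over the cached snapshot
        return list(self.schema)

current = {}                                    # stack started before CREATE DATABASE
agent = SyncAgent(lambda: dict(current))
current["cpu_load_short"] = {"value": "float"}  # measurement created later
print(agent.measurements_to_sync())             # [] -> nothing to replicate
```

Reloading the schema right before each sync pass removes this staleness, at the cost of an extra metadata query per cycle.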
@toni-moreno opened an issue (I think it was before your comment!) asking for a schema reload: toni-moreno/syncflux#16. We have discussed it and we expect to add this feature in the next few days, so the schema will always be reloaded before the data sync process.
About the timing issues/feature, we will keep discussing it, but we currently don't support those cases.
Thanks,
Regards!
@sbengo Thank you for your detailed response.
Yes, I created the DB AFTER starting syncflux, so syncflux didn't know about the DB. The change you mentioned makes sense.
It would be great to backfill data based on when it was inserted instead of purely on the time tag, because a lot of InfluxDB data is inserted by scheduled jobs rather than in real time.
Lastly, I think syncflux takes time to start; there is a noticeable delay. I hope you can take a look at it.
With that said, I have a full srelay setup working as you specified, so I will close this issue now. I appreciate all your help, @toni-moreno and @sbengo.