
Comments (11)

toni-moreno commented on June 19, 2024

Hi @zhangxin511, you should keep in mind the proposed architecture layout.

[diagram: HA architecture with syncflux]

As you can see, there are:

  • 2 influxdb-srelay instances
  • 2 syncflux instances

running on both nodes.

In this layout the backend names in influxdb-srelay.conf and the influxdb names in syncflux.conf should be the same.

Suppose we start from the rwha sample configs.

influxdb-srelay.conf (on myinfluxdb01_server)

...
...
[[influxdb]]
  name = "myinfluxdb01"
  location = "http://myinfluxdb01_server:8086/"
  timeout = "10s"


[[influxdb]]
  name = "myinfluxdb02"
  location = "http://myinfluxdb02_server:8086/"
  timeout = "10s"

[[influxcluster]]
  # name = cluster id for route configs and logs
  name  = "ha_cluster"
  # members = array of influxdb backends
  members = ["myinfluxdb01","myinfluxdb02"]
  log-file = "ha_cluster.log"
  log-level = "info"
  type = "HA"
  query-router-endpoint-api = ["http://myinfluxdb01_server:4090/api/queryactive","http://myinfluxdb02_server:4090/api/queryactive"]
...
...

influxdb-srelay.conf (on myinfluxdb02_server)

...
...
[[influxdb]]
  name = "myinfluxdb01"
  location = "http://myinfluxdb01_server:8086/"
  timeout = "10s"


[[influxdb]]
  name = "myinfluxdb02"
  location = "http://myinfluxdb02_server:8086/"
  timeout = "10s"

[[influxcluster]]
  # name = cluster id for route configs and logs
  name  = "ha_cluster"
  # members = array of influxdb backends
  members = ["myinfluxdb02","myinfluxdb01"] 
  log-file = "ha_cluster.log"
  log-level = "info"
  type = "HA"
  query-router-endpoint-api = ["http://myinfluxdb02_server:4090/api/queryactive","http://myinfluxdb01_server:4090/api/queryactive"]
...
...

Only the order of members and query-router-endpoint-api changes, so that each node queries its own syncflux first.

syncflux.conf (on myinfluxdb01_server )

 master-db = "myinfluxdb01"
 slave-db = "myinfluxdb02"

[[influxdb]]
 release = "1x"
 name = "myinfluxdb01"
 location = "http://myinfluxdb01_server:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

[[influxdb]]
 release = "1x"
 name = "myinfluxdb02"
 location = "http://myinfluxdb02_server:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

syncflux.conf (on myinfluxdb02_server )

Only the master-db and slave-db values are swapped.

 master-db = "myinfluxdb02"
 slave-db = "myinfluxdb01"

[[influxdb]]
 release = "1x"
 name = "myinfluxdb01"
 location = "http://myinfluxdb01_server:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

[[influxdb]]
 release = "1x"
 name = "myinfluxdb02"
 location = "http://myinfluxdb02_server:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

About your questions:

  1. As in the previous example (let me know if you have more doubts on that issue).
  2. Right now there is no way to have more than one (this is also a very young project), but if needed it won't be difficult to add this feature.
  3. (and 4) Configure as in the layout and example above. If you have more questions about how to configure it, or run into errors, please open a specific issue in the https://github.com/toni-moreno/syncflux issue tracker.

I hope this shows how smart-relay (influxdb-srelay) and syncflux can work together to build a better HA solution when we cannot run an InfluxDB Enterprise cluster.

Any other questions?


zhangxin511 commented on June 19, 2024

Thanks for the detailed info. I tried your approach; since I am not sure where the HA load balancing comes from, I ran only one srelay instance but kept everything else as you suggested, and I still can't get the data back in sync when one node is down.
This is what I have done:

  1. docker-compose up
  2. curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE mydb"
  3. curl -XPOST http://localhost:8087/query --data-urlencode "q=CREATE DATABASE mydb" (I haven't tried turning on the admin on influx to use your admin endpoint)
  4. Baseline, curl -i -XPOST "http://127.0.0.1:9096/write?db=mydb" --data-binary "cpu_load_short,host=server01,region=us-west value=0.64 1434055561000000000", both database backend got the data 2015-06-11T20:46:01Z server01 us-west 0.64
  5. Stop influx-a (running on 8086): docker-compose stop influx-a
  6. Try inserting data while a is down: curl -i -XPOST "http://127.0.0.1:9096/write?db=mydb" --data-binary "cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000", which is 2015-06-11T20:46:02Z server01 us-west 0.64
  7. Start influx-a again: docker-compose start influx-a
  8. Wait, then check the databases: the 2015-06-11T20:46:02Z server01 us-west 0.64 entry never synced back to a, while b has both entries (checked by querying each backend directly, as in the queries below).
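For reference, the two backends can be compared directly with the standard InfluxDB 1.x query API (using the host ports mapped in the docker-compose.yml below: influx-a on 8086, influx-b on 8087):

curl -G "http://127.0.0.1:8086/query?db=mydb" --data-urlencode "q=SELECT * FROM cpu_load_short"   # influx-a
curl -G "http://127.0.0.1:8087/query?db=mydb" --data-urlencode "q=SELECT * FROM cpu_load_short"   # influx-b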

Here is my setup:

docker-compose.yml

version: '3.7'
services:
  influx-a:
    image: influxdb:1.7
    ports:
      - 8086:8086
    volumes:
      - C:/Docker/InfluxHA/Influxdb/a:/var/lib/influxdb
  influx-b:
    image: influxdb:1.7
    ports: 
      - 8087:8086
    volumes:
      - C:/Docker/InfluxHA/Influxdb/b:/var/lib/influxdb
  influx-relay:
    image: tonimoreno/influxdb-srelay:latest
    ports:
      - 9096:9096
    links:
      - influx-a
      - influx-b
      - sync-flux-a
      - sync-flux-b
    volumes:
      - C:/Docker/InfluxHA/Influx-srelay/conf/influxdb-srelay.conf:/etc/influxdb-srelay/influxdb-srelay.conf
      - C:/Docker/InfluxHA/Influx-srelay/log/:/var/log/
  sync-flux-a:
    image: tonimoreno/syncflux
    ports:
      - 4090:4090
    links:
      - influx-a
      - influx-b
    volumes:
      - C:/Docker/InfluxHA/Sync-flux/a/conf/:/opt/syncflux/conf/
      - C:/Docker/InfluxHA/Sync-flux/a/log/:/opt/syncflux/log/
  sync-flux-b:
    image: tonimoreno/syncflux
    ports:
      - 4091:4090
    links:
      - influx-a
      - influx-b
    volumes:
      - C:/Docker/InfluxHA/Sync-flux/b/conf/:/opt/syncflux/conf/
      - C:/Docker/InfluxHA/Sync-flux/b/log/:/opt/syncflux/log/  

The configuration files and folder structure are attached. I am sorry to bother you like this, but could you take a look and let me know what went wrong? Much appreciated!
InfluxHA.zip


toni-moreno commented on June 19, 2024

Hi @zhangxin511, I will check your config ASAP.


toni-moreno commented on June 19, 2024

Hi @zhangxin511, the first thing I've detected is in your syncflux.toml config.

The DB backend names should be the same in both engines' configs (srelay and syncflux):

influxdb-srelay.conf

  [[influxdb]]
    name = "myinfluxdb01"
    location = "http://influx-a:8086/"
    timeout = "10s"
  
  [[influxdb]]
    name = "myinfluxdb02"
    location = "http://influx-b:8086/"
    timeout = "10s"

syncflux-a.toml

master-db = "myinfluxdb01"
slave-db = "myinfluxdb02"

[[influxdb]]
 release = "1x"
 name = "myinfluxdb01"
 location = "http://influx-a:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

[[influxdb]]
 release = "1x"
 name = "myinfluxdb02"
 location = "http://influx-b:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

syncflux-b.toml

master-db = "myinfluxdb02"
slave-db = "myinfluxdb01"

[[influxdb]]
 release = "1x"
 name = "myinfluxdb01"
 location = "http://influx-a:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

[[influxdb]]
 release = "1x"
 name = "myinfluxdb02"
 location = "http://influx-b:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

Could you fix these config files and test again please?


zhangxin511 commented on June 19, 2024

@toni-moreno I was not able to make recovery work using your suggestions alone. But after changing the docker-compose file from tonimoreno/syncflux to tonimoreno/syncflux:latest, data got synced! It looks like I was using an old version of syncflux that might not have the recovery feature (I was able to see part of the recovery logs, but not all). Anyway, it is finally working for me now. I will do more performance testing and keep you posted.
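If the root cause was a stale locally cached image, a simple way to force the newest syncflux image (just a sketch, using the service names from my docker-compose.yml above) is:

# fetch the latest published image and recreate the containers with it
docker-compose pull sync-flux-a sync-flux-b
docker-compose up -d sync-flux-a sync-flux-b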

I understand this is still in an early development phase. A suggestion based on my issue: srelay and syncflux look tightly coupled, so maybe consider merging the two repos into one, or relaxing the naming restriction?


zhangxin511 commented on June 19, 2024

Sorry, I spoke too soon: it looks like data recovery is not ALWAYS working in my case. I have actually gotten only one good replication so far; the others all failed.

I do see from the logs that syncflux was trying to recover data. When the recovery does not actually recover any data, it shows:

time="2019-05-29 18:45:42" level=info msg="HACluster check...."
time="2019-05-29 18:45:42" level=info msg="HACLuster: detected UP Last(2019-05-29 18:45:32.1542501 +0000 UTC m=+540.024756701) Duratio OK (9.9980293s) RECOVERING"
time="2019-05-29 18:45:42" level=info msg="HACLUSTER: INIT RECOVERY : FROM [ 2019-05-29 18:44:32.1545497 +0000 UTC m=+480.024959901 ] TO [ 2019-05-29 18:45:32.1542501 +0000 UTC m=+540.024756701 ]"
time="2019-05-29 18:45:42" level=info msg="Replicating Data from DB mydb RP autogen..."
time="2019-05-29 18:45:42" level=debug msg="SYNC-DB-RP[mydb|&{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[]}] From:2019-05-29 18:44:32.1545497 +0000 UTC m=+480.024959901 To:2019-05-29 18:45:32.1542501 +0000 UTC m=+540.024756701 | Duration: 59.9997968s || #chunks: 1  | chunk Duration 1h0m0s "
time="2019-05-29 18:45:42" level=info msg="InfluxMonitor: InfluxDB : myinfluxdb01  OK (Version  1.7.6 : Duration 1.454ms )"
time="2019-05-29 18:45:42" level=info msg="Processed Chunk [1/1](100%) from [1559151932][2019-05-29 17:45:32 +0000 UTC] to [1559155532][2019-05-29 18:45:32 +0000 UTC] (0) Points Took [9.7µs]"
time="2019-05-29 18:45:42" level=info msg="Processed DB data from myinfluxdb02[mydb|autogen] to myinfluxdb01[mydb|autogen] has done  #Points (0)  Took [798.4µs] !\n"
time="2019-05-29 18:45:42" level=info msg="HACLUSTER: DATA SYNCRONIZATION Took 1.9629ms"

Only once was there a good recovery, which gave this output:

time="2019-05-29 18:23:34" level=info msg="HACluster check...."
time="2019-05-29 18:23:34" level=info msg="HACLuster: detected UP Last(2019-05-29 18:23:24.5806122 +0000 UTC m=+280.038978101) Duratio OK (9.9964807s) RECOVERING"
time="2019-05-29 18:23:34" level=info msg="HACLUSTER: INIT RECOVERY : FROM [ 2019-05-29 18:22:34.5799461 +0000 UTC m=+230.038399201 ] TO [ 2019-05-29 18:23:24.5806122 +0000 UTC m=+280.038978101 ]"
time="2019-05-29 18:23:34" level=info msg="Replicating Data from DB mydb RP autogen..."
time="2019-05-29 18:23:34" level=debug msg="SYNC-DB-RP[mydb|&{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[cpu_load_short:%!s(*agent.MeasurementSch=&{cpu_load_short map[value:0xc000276600]})]}] From:2019-05-29 18:22:34.5799461 +0000 UTC m=+230.038399201 To:2019-05-29 18:23:24.5806122 +0000 UTC m=+280.038978101 | Duration: 50.0005789s || #chunks: 1  | chunk Duration 1h0m0s "
time="2019-05-29 18:23:34" level=debug msg="processing Database mydb Measurement cpu_load_short from 1559150604 to 1559154204"
time="2019-05-29 18:23:34" level=info msg="InfluxMonitor: InfluxDB : myinfluxdb01  OK (Version  1.7.6 : Duration 1.6828ms )"
time="2019-05-29 18:23:34" level=info msg="InfluxMonitor: InfluxDB : myinfluxdb02  OK (Version  1.7.6 : Duration 2.414ms )"
time="2019-05-29 18:23:34" level=debug msg="Query [select * from  \"cpu_load_short\" where time  > 1559150604s and time < 1559154204s group by *] took 1.5465ms "
time="2019-05-29 18:23:34" level=debug msg="processed 4 points"
time="2019-05-29 18:23:34" level=debug msg="Write attempt [1] took 3.3763ms "
time="2019-05-29 18:23:34" level=info msg="Processed Chunk [1/1](100%) from [1559150604][2019-05-29 17:23:24 +0000 UTC] to [1559154204][2019-05-29 18:23:24 +0000 UTC] (4) Points Took [6.3572ms]"
time="2019-05-29 18:23:34" level=info msg="Processed DB data from myinfluxdb02[mydb|autogen] to myinfluxdb01[mydb|autogen] has done  #Points (4)  Took [6.6578ms] !\n"
time="2019-05-29 18:23:34" level=info msg="HACLUSTER: DATA SYNCRONIZATION Took 8.6714ms"

It looks like this block of code is not always executed, and I have no idea why: https://github.com/toni-moreno/syncflux/blob/6627a8281cd93305f9315b6b6be325f4cdbd0dbb/pkg/agent/client.go#L594-L615


toni-moreno commented on June 19, 2024

Hi @zhangxin511, my workmate @sbengo will review your case ASAP.


zhangxin511 commented on June 19, 2024

Thank you @toni-moreno for your continuous help! Let me know if you need anything else, @sbengo.


zhangxin511 commented on June 19, 2024

@sbengo @toni-moreno I have partially figured out why my data was not recovered:

  1. Whenever the log line processing Database mydb Measurement cpu_load_short appears, the recovery is in a "good" state for me. But even in the good state, there are two issues with the syncflux logic getvalues := fmt.Sprintf("select * from \"%v\" where time > %vs and time < %vs group by *", m, startsec, endsec) (here: https://github.com/toni-moreno/syncflux/blob/6627a8281cd93305f9315b6b6be325f4cdbd0dbb/pkg/agent/client.go#L602):
  • If I insert data without specifying a time, influx uses its current UTC time as the point timestamp. The logic above will find the data missed while the node was down and backfill it when the node is alive again. However, because of clock differences between the influx servers and the moment when syncflux detects that the cluster went down, it may add duplicate records. For example, a value got inserted just before the node went down, and after the node recovered syncflux thought it should add this value again, so it ended up duplicated:
    [screenshot: duplicated records]
  • For us, most data is inserted with an earlier timestamp because of delayed/scheduled/batch jobs. Say a node went down at time 8, at time 10 we insert an entry whose timestamp is time 3, and the broken node recovers at time 11. That entry will never be synced because of the logic above (see the sketch after this list).
  2. I still don't know why the processing Database mydb Measurement cpu_load_short log sometimes never appears, putting the recovery in a "bad" state.
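To make the second bullet concrete, here is a minimal reproduction sketch of the delayed-timestamp case, reusing the endpoints from my setup above (the timestamp 1434055553000000000 is just a hypothetical value that falls before the outage window):

# while influx-a is down, write a point whose timestamp predates the outage window
curl -i -XPOST "http://127.0.0.1:9096/write?db=mydb" --data-binary "cpu_load_short,host=server01,region=us-west value=0.70 1434055553000000000"

# after influx-a comes back, the recovery query only scans the outage window,
# so influx-b has the point but influx-a never receives it
curl -G "http://127.0.0.1:8087/query?db=mydb" --data-urlencode "q=SELECT * FROM cpu_load_short"   # influx-b: point present
curl -G "http://127.0.0.1:8086/query?db=mydb" --data-urlencode "q=SELECT * FROM cpu_load_short"   # influx-a: point missing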


sbengo commented on June 19, 2024

Hi @zhangxin511, thanks for the info and sorry for the late response!

When Syncflux starts, it gets info about the available databases, RPs and the measurements attached to them (a.k.a. the schema), and it currently never refreshes it (only on init).

As I can see in your logs, in the failing case the schema seems to be empty, so it won't iterate over the measurements (in the linked function).

Bad case:

...
time="2019-05-29 18:45:42" level=debug msg="SYNC-DB-RP[mydb|&{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[]}] From:2019-05-29 18:44:32.1545497 +0000 UTC m=+480.024959901 To:2019-05-29 18:45:32.1542501 +0000 UTC m=+540.024756701 | Duration: 59.9997968s || #chunks: 1  | chunk Duration 1h0m0s "
...

Working case:

time="2019-05-29 18:23:34" level=debug msg="SYNC-DB-RP[mydb|&{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[cpu_load_short:%!s(*agent.MeasurementSch=&{cpu_load_short map[value:0xc000276600]})]}] From:2019-05-29 18:22:34.5799461 +0000 UTC m=+230.038399201 To:2019-05-29 18:23:24.5806122 +0000 UTC m=+280.038978101 | Duration: 50.0005789s || #chunks: 1  | chunk Duration 1h0m0s "

Review

I think it's related to schema creation (if there was no data, the schema would be empty: only the db and rp would be stored). So:

  • Was there data in your DB when you brought up the srelay + syncflux stack?

@toni-moreno opened an issue (I think it was before your comment!) asking for a schema reload: toni-moreno/syncflux#16. We have discussed it and we think we will add this feature in the next few days, so the schema will always be reloaded before the data sync process.
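Until that lands, a workaround that follows from the explanation above (just a sketch, reusing the ports and service names from the docker-compose.yml earlier in this thread) is to make sure the databases already exist before syncflux builds its schema, e.g. by restarting the syncflux containers after creating them:

curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE mydb"
curl -XPOST http://localhost:8087/query --data-urlencode "q=CREATE DATABASE mydb"
# syncflux only reads the schema at init, so restart it once the DBs exist
docker-compose restart sync-flux-a sync-flux-b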


About the timing issues/feature, we will keep discussing it, but we currently don't support those cases.

Thanks,
Regards!


zhangxin511 commented on June 19, 2024

@sbengo Thank you for your detailed response.
Yes, I created the DB AFTER starting syncflux, so syncflux didn't know about the DB. The change you mentioned makes sense.

It would be great to backfill data based on when the data was inserted instead of purely on the time tag, because a lot of InfluxDB data is inserted by scheduled jobs rather than in real time.

Lastly, I think syncflux takes time to start; there is a noticeable delay. I hope you can take a look at that.

With that said, I have a full srelay setup working as you specified, so I will close this issue now. I appreciate all your help, @toni-moreno and @sbengo.

