
syncflux's Introduction

SyncFlux

SyncFlux is an open-source InfluxDB data synchronization and replication tool with an HTTP API interface. Its main goal is to recover lost data from any hand-made HA InfluxDB 1.X cluster (built with any simple relay, such as https://github.com/influxdata/influxdb-relay, or our smart relay http://github.com/toni-moreno/influxdb-srelay).

Install from precompiled packages

Debian: deb (signature)
RedHat: rpm (signature)
Docker:

docker run -d --name=syncflux_instance00 -p 4090:4090 -v /mylocal/conf:/opt/syncflux/conf -v /mylocal/log:/opt/syncflux/log tonimoreno/syncflux

All releases are available on the project's releases page.

Run from master

If you want to build a package yourself, or contribute, here is a guide for how to do that.

Dependencies

  • Go 1.11

Get Code

go get -d github.com/toni-moreno/syncflux/...

Building the backend

cd $GOPATH/src/github.com/toni-moreno/syncflux
go run build.go build           

Creating minimal package tar.gz

After building the frontend and backend, run:

go run build.go pkg-min-tar

Creating rpm and deb packages

You will need the fpm/rpm and deb packaging tools installed beforehand. After building the frontend and backend, run:

go run build.go latest

Running first time

To run for the first time you need a minimal syncflux.toml config file in the conf directory:

cp conf/sample.syncflux.toml conf/syncflux.toml
./bin/syncflux [options]

Creating and running docker image

make -f Makefile.docker
docker run tonimoreno/syncflux:latest -version
docker run  tonimoreno/syncflux:latest -h
docker run  -p 4090:4090 -v /mylocal/conf:/opt/syncflux/conf -v /mylocal/log:/opt/syncflux/log tonimoreno/syncflux:latest [options]

Recompile backend on source change (only for developers)

To rebuild on source change (requires that you have run godep restore):

go get github.com/Unknwon/bra
bra run  

bra will watch the source tree and automatically recompile and restart the backend on every change.

Basic Usage

Execution parameters

Usage of ./bin/syncflux:
   -action: hamonitor (default), copy, fullcopy, replicaschema
    -chunk: set the RW chunk period, as in the data-chuck-duration config param
   -config: config file
-copyorder: backward (most to least recent, default), forward (least to most recent)
       -db: regex selecting the source db(s) to work on
      -end: set the end time for the action (not valid in hamonitor), default now
     -full: copy the full database, or now() - max-retention-interval if the retention policy is greater
  -logmode: log mode [console/file], default console
     -logs: log directory (only applies if action=hamonitor and logmode=file)
   -master: choose the master ID, from all those in the config file, from which to read data (overrides the master-db parameter in the config file)
     -meas: regex selecting the measurement(s) to work on
    -newdb: set the destination db to write to
    -newrp: set the destination rp to write to
  -pidfile: path to pid file
       -rp: regex selecting the source rp(s) to work on
    -slave: choose the slave ID, from all those in the config file, to which to write data (overrides the slave-db parameter in the config file)
    -start: set the start time for the action (not valid in hamonitor), default now-24h
        -v: set log level to Info
  -version: display the version
       -vv: set log level to Debug
      -vvv: set log level to Trace

Set config file

# -*- toml -*-

# -------GENERAL SECTION ---------
# syncflux can work in several modes;
# not all General config parameters apply to all of them.
#  modes
#  "hamonitor" => runs syncflux as a daemon that monitors
#                2 InfluxDB 1.X OSS servers and syncs data
#                between them when needed (active monitoring)
#  "copy" => runs syncflux as a one-shot process to copy data
#            between master and slave databases
#  "replicaschema" => runs syncflux as a one-shot process to create
#             the database(s) and all their related retention policies
#  "fullcopy" => does database/rp replication and then a data copy

[General]
 # ------------------------
 # logdir ( only valid on hamonitor action)
 #  the directory in which to place the logs
 #  (the main log will be written there)
 #  

 logdir = "./log"

 # ------------------------
 # loglevel ( valid for all actions )
 #  sets the log level; valid values are:
 #  fatal, error, warn, info, debug, trace

 loglevel = "debug"

 # -----------------------------
 # sync-mode (only valid on hamonitor action)
 #  NOTE: right now only "onlyslave" (one-way sync) is valid
 #  (two-way sync is planned for the future)

 sync-mode = "onlyslave"

 # ---------------------------
 # master-db: choose one of the configured InfluxDB servers as the master DB
 # this parameter is overridden by the command line -master parameter
 
 master-db = "influxdb01"

 # ---------------------------
 # slave-db: choose one of the configured InfluxDB servers as the slave DB
 # this parameter is overridden by the command line -slave parameter
 
 slave-db = "influxdb02"

 # ------------------------------
 # check-interval
 # the interval for health checking on both master and slave databases
 
 check-interval = "10s"

 # ------------------------------
 # min-sync-interval
 # the interval at which the HA monitor checks that both databases are OK,
 # changing the cluster state if not and performing all needed recovery actions

 min-sync-interval = "20s"
 
 # ---------------------------------------------
 # initial-replication
 # tells syncflux whether some type of replication
 # on the slave database from the master database is needed on startup
 # (only valid on hamonitor action)
 #
 # none: no replication
 # schema: database and retention policies will be recreated on the slave database
 # data: data for all retention policies will be replicated
 #      be careful: this full data copy could take hours, even days
 # both: will replicate first the schema and then the full data

 initial-replication = "none"

 # 
 # monitor-retry-interval
 #
 # syncflux can only begin work when master and slave databases are both up;
 # if either is down, syncflux will retry indefinitely every monitor-retry-interval.
 monitor-retry-interval = "1m"

 # 
 # data-chuck-duration
 #
 # duration of each small read-from-master -> write-to-slave chunk of data.
 # smaller chunks use less memory in the syncflux process
 # and fewer resources on both master and slave databases;
 # bigger chunks improve sync speed

 data-chuck-duration = "60m"

 # 
 #  max-retention-interval
 #
 # for infinite (or very long) retention policies, full replication has to begin somewhere in time;
 # this parameter sets the maximum look-back interval.
 
 max-retention-interval = "8760h" # 1 year
 

# ---- HTTP API SECTION (Only valid on hamonitor action)
# Enables an HTTP API endpoint to check the cluster health

[http]
 name = "example-http-influxdb"
 bind-addr = "127.0.0.1:4090"
 admin-user = "admin"
 admin-passwd = "admin"
 cookie-id = "mysupercokie"

# ---- INFLUXDB  SECTION
# Sets the list of available InfluxDB servers that can be used
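The sample config is truncated here; the InfluxDB list itself is a TOML array of tables. The keys below are a sketch inferred from the [http] section above and the instance IDs used throughout this document (names, URLs and credentials are placeholders; check conf/sample.syncflux.toml for the authoritative format):

```toml
# One [[influxdb]] entry per server; the "name" values match the
# master-db / slave-db settings above (illustrative sketch only)
[[influxdb]]
 release = "1x"
 name = "influxdb01"
 location = "http://influxdb01.example.com:8086/"
 admin-user = "admin"
 admin-passwd = "admin"

[[influxdb]]
 release = "1x"
 name = "influxdb02"
 location = "http://influxdb02.example.com:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
```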

Run as a Database replication Tool

Available actions:

  • Replicate Schema
  • Copy data
  • Full copy (replicate schema + copy data)

Replicate schema

Allows the user to copy DB schemas from DB1 to DB2. A DB schema consists of databases and their retention policies (RPs).

Syntax

./bin/syncflux -action replicaschema [-master <master_id>] [-slave <slave_id>] [-db <db_regex_selector>] [-newdb <newdb_name>] [-rp <rp_regex_selector>] [-newrp <newrp_name>] [-meas <meas_regex_selector>]

Description of syntax

If no master or slave is provided, the defaults from the config file are used. The db selector allows filtering over all dbs with a regex expression. If the slave schema must differ from the master's, the new schema can be set using the newdb and newrp flags.
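Note that -db, -rp and -meas take regular expressions, not glob patterns; anchor them with ^ and $ to avoid partial matches. A quick illustration using grep (not syncflux itself):

```shell
# An unanchored "db1" also matches db10 and mydb1; "^db1$" matches only db1
printf 'db1\ndb10\nmydb1\n' | grep -c 'db1'     # matches all 3 names
printf 'db1\ndb10\nmydb1\n' | grep -c '^db1$'   # matches only db1
```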

Limitations

  • Only the default RP can be renamed

Important Notes

When copying big databases there are a few things you should take care of to ensure data is correctly copied.

SyncFlux copies data by running "select * from XXXXX where time > [INIT_CHUNK] AND time < [END_CHUNK]" for each of the existing measurements in the chosen database, with several of these queries running concurrently. Depending on measurement cardinality these queries can take a long time (be careful with timeouts) and need resources (mainly memory) on both databases, as well as in the syncflux process itself.
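The per-measurement, per-chunk read query can be sketched in shell (the measurement name and epoch bounds here are illustrative placeholders):

```shell
# Build the per-chunk read query syncflux issues against the master
meas="cpu"                  # one query per measurement
init_chunk=1570657557       # chunk start, epoch seconds
end_chunk=1570657857        # chunk end, epoch seconds
query="select * from \"$meas\" where time > ${init_chunk}s and time < ${end_chunk}s group by *"
echo "$query"
```

Each chunk of data-chuck-duration length generates one such query per measurement, so the total query count grows with both the time window and the number of measurements.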

We recommend increasing or disabling all query timeouts:
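For InfluxDB 1.x, the server-side query limits live in the [coordinator] section of influxdb.conf on both master and slave; an illustrative fragment (the values shown disable the limits entirely):

```toml
# influxdb.conf (both master and slave) -- illustrative values
[coordinator]
  # "0s" disables the server-side query timeout
  query-timeout = "0s"
  # 0 removes the cap on concurrently running queries
  max-concurrent-queries = 0
```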

Examples

Example 1: Copy schema from Influx01 to Influx02

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02"

The result will be that the schema of Influx01 will be replicated on Influx02

Influx02 schema
----------------
  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2

Example 2: Copy schema from Influx01-DB1 to Influx02

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02" -db "^db1$"

The result will be that the schema of Influx01's db1 will be replicated on Influx02

Influx02 schema
----------------
  |-- db1
    |-- rp1*
    |-- rp2

Example 3: Copy schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and only from rp1

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -rp "^rp1$"

The result will be that the schema of Influx01's db1.rp1 will be replicated on Influx02 as db3

Influx02 schema
----------------
  |-- db3
    |-- rp1*

Example 4: Copy schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and set the defaultrp to rp3

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -newrp "rp3"

The result will be that the schema of Influx01's db1 will be replicated on Influx02 as db3, with the default RP renamed to rp3

Influx02 schema
----------------
  |-- db3
    |-- rp3*
    |-- rp2

Example 5: Copy data and schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and only from meas "cpu.*"

Influx01 schema
----------------

  |-- db1
    |-- rp1*
      |-- cpu
      |-- mem
      |-- swap
      |-- ...
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -meas "cpu.*"

Copy data

Allows the user to copy DB data from the master to the slave.

Syntax

./bin/syncflux -action copy [-master <master_id>] [-slave <slave_id>] [-db <db_regex_selector>] [-newdb <newdb_name>] [-rp <rp_regex_selector>] [-newrp <newrp_name>] [-meas <meas_regex_selector>] { [-start <start_time>] [-end <end_time>] | [-full] }

Description of syntax

If no master or slave is provided, the defaults from the config file are used. The db selector allows filtering over all dbs with a regex expression. If the slave schema must differ from the master's, the new schema can be set using the newdb and newrp flags. The start and end flags define a time window for the data to copy. If -full is passed, data is copied from now back to max-retention-interval.

Remember that with this action the schema is not replicated, so if a DB or RP does not exist on the slave it will be skipped.

Limitations

  • ...

Examples

Example 1: Copy all data from Influx01 to Influx02

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02"

The command above will copy data from all dbs of Influx01 into Influx02

Influx02 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2

Example 2: Copy data from Influx01-DB1 to Influx02 on a time window and only from rp1

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -rp "^rp1$" -start -10h -end -5h

The command above will replicate all data from Influx01 to Influx02, but only from db1.rp1 and within a time window from -10h to -5h

Influx02 schema
----------------
  |-- db1
    |-- rp1*
    |-- rp2

Example 3: Copy data from Influx01-DB1 to Influx02-DB3 (existing db called DB3)

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3"

The command above will copy all data from Influx01-db1 to Influx02 into the DB called 'db3'

Influx02 schema
----------------
  |-- db3
    |-- rp1*
    |-- rp2

Example 4: Copy data from Influx01-DB1 to Influx02-DB3 (existing db called DB3) and set the defaultrp to existing rp3

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -newrp "rp3"

The command above will copy all data from Influx01-db1 to Influx02 into the DB 'db3', writing the default RP's data into rp3

Influx02 schema
----------------
  |-- db3
    |-- rp3*
    |-- rp2

Example 5: Copy data from Influx01-DB1 to Influx02-DB3 (new db called DB3) and only from meas "cpu.*"

Influx01 schema
----------------

  |-- db1
    |-- rp1*
      |-- cpu
      |-- mem
      |-- swap
      |-- ...
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -meas "cpu.*"

The command above will copy data from Influx01-db1 to Influx02 into the DB 'db3', but only for measurements matching "cpu.*"

Influx02 schema
----------------
  |-- db3
    |-- rp1*
      |-- cpu
    |-- rp2

Copy data + schema

Allows the user to replicate the DB schema and copy DB data from master to slave. A DB schema consists of databases and their retention policies (RPs).

Syntax

./bin/syncflux -action fullcopy [-master <master_id>] [-slave <slave_id>] [-db <db_regex_selector>] [-newdb <newdb_name>] [-rp <rp_regex_selector>] [-newrp <newrp_name>] [-meas <meas_regex_selector>] { [-start <start_time>] [-end <end_time>] | [-full] }

Description of syntax

If no master or slave is provided, the defaults from the config file are used. The db selector allows filtering over all dbs with a regex expression. If the slave schema must differ from the master's, the new schema can be set using the newdb and newrp flags. The start and end flags define a time window for the data to copy. If -full is passed, data is copied from now back to max-retention-interval.

With this action the schema is replicated first, so any DB or RP missing on the slave will be created before the data copy.

Limitations

  • Only the default RP can be renamed

Examples

Example 1: Copy all data and schema from Influx01 to Influx02

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "fullcopy" -master "influx01" -slave "influx02"

The command above will create the schema and copy data from all dbs of Influx01 into Influx02

Influx02 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2

Example 2: Copy data and schema from Influx01-DB1 to Influx02 on a time window

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "fullcopy" -master "influx01" -slave "influx02" -db "^db1$" -start -10h -end -5h

The command above will create the schema and replicate all data from Influx01 to Influx02, but only from db1 and within a time window from -10h to -5h

Influx02 schema
----------------
  |-- db1
    |-- rp1*
    |-- rp2

Example 3: Copy data from Influx01-DB1 to Influx02-DB3 (new db called DB3)

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "fullcopy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3"

The command above will create the schema and replicate all data from Influx01-db1 to Influx02 into a new DB called 'db3'

Influx02 schema
----------------
  |-- db3
    |-- rp1*
    |-- rp2

Example 4: Copy data and schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and set the defaultrp to rp3

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "fullcopy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -newrp "rp3"

The command above will create the schema and replicate all data from Influx01-db1 to Influx02 into a new DB called 'db3', with the default RP renamed to rp3

Influx02 schema
----------------
  |-- db3
    |-- rp3*
    |-- rp2

Example 5: Copy data and schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and only from meas "cpu.*"

Influx01 schema
----------------

  |-- db1
    |-- rp1*
      |-- cpu
      |-- mem
      |-- swap
      |-- ...
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "fullcopy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -meas "cpu.*"

The command above will create the schema and replicate data from Influx01-db1 to Influx02 into a new DB called 'db3', but only for measurements matching "cpu.*"

Influx02 schema
----------------
  |-- db3
    |-- rp1*
      |-- cpu
    |-- rp2

Run as a HA Cluster monitor

./bin/syncflux -config ./conf/syncflux.conf -action hamonitor 

By default syncflux searches for a syncflux.conf file in CWD/conf/, and hamonitor is the default action, so the command above is equivalent to:

./bin/syncflux  

You can check the cluster state with any HTTP client; possible values are:

  • OK: both nodes are ok
  • CHECK_SLAVE_DOWN: current slave is down
  • RECOVERING: both databases are working but the slave is missing some data and syncflux is recovering it
 % curl http://localhost:4090/api/health
{
  "ClusterState": "CHECK_SLAVE_DOWN",
  "ClusterNumRecovers": 0,
  "ClusterLastRecoverDuration": 0,
  "MasterState": true,
  "MasterLastOK": "2019-04-06T09:45:05.461897766+02:00",
  "SlaveState": false,
  "SlaveLastOK": "2019-04-06T09:44:55.465393243+02:00"
}

% curl http://localhost:4090/api/health
{
  "ClusterState": "RECOVERING",
  "ClusterNumRecovers": 0,
  "ClusterLastRecoverDuration": 0,
  "MasterState": true,
  "MasterLastOK": "2019-04-06T10:28:25.459701432+02:00",
  "SlaveState": true,
  "SlaveLastOK": "2019-04-06T10:28:25.55500823+02:00"
}


% curl http://localhost:4090/api/health
{
  "ClusterState": "OK",
  "ClusterNumRecovers": 1,
  "ClusterLastRecoverDuration": 2473620691,
  "MasterState": true,
  "MasterLastOK": "2019-04-06T10:28:25.459701432+02:00",
  "SlaveState": true,
  "SlaveLastOK": "2019-04-06T10:28:25.55500823+02:00"
}
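For scripted monitoring you can pull ClusterState out of the /api/health payload. This is a minimal sketch that parses a saved response with sed; in practice you would pipe `curl -s http://localhost:4090/api/health` into the same filter, or use a proper JSON tool such as jq:

```shell
# Extract ClusterState from the /api/health JSON shown above
response='{"ClusterState": "CHECK_SLAVE_DOWN", "MasterState": true, "SlaveState": false}'
state=$(printf '%s' "$response" | sed -n 's/.*"ClusterState": *"\([^"]*\)".*/\1/p')
echo "$state"   # CHECK_SLAVE_DOWN
```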

syncflux's People

Contributors

dependabot[bot], maxadamo, ptoews, sbengo, toni-moreno, wdhongtw


syncflux's Issues

Error synchronizing measurements with dots in the name

I have an error when I try to run syncflux on a database where there are measurements that have a dot in the name. For example, with the measurement CITY.TEMPERATURE, when I execute the following command this error appears:

/opt/syncflux # ./bin/syncflux -action "copy" -start -1h
INFO[2019-08-05 18:54:50] CFG :&{General:{InstanceID: LogDir:./log HomeDir: DataDir: LogLevel:debug SyncMode:onlyslave CheckInterval:10s MinSyncInterval:20s MasterDB:influxdb01 SlaveDB:influxdb02 InitialReplication:none MonitorRetryInterval:1m0s DataChunkDuration:5m0s MaxRetentionInterval:8760h0m0s RWMaxRetries:5 RWRetryDelay:10s NumWorkers:4 MaxPointsOnSingleWrite:20000} HTTP:{BindAddr:0.0.0.0:4090 AdminUser:admin AdminPassword:admin CookieID:mysupercokie} InfluxArray:[0xc0001b13e0 0xc0001b14a0]}
INFO[2019-08-05 18:54:50] Set Master DB influxdb01 from Command Line parameters
INFO[2019-08-05 18:54:50] Set Slave DB influxdb02 from Command Line parameters
WARN[2019-08-05 18:54:50] The response for Query is null, get Fields from DB WHEATHER Measurement CITY.TEMPERATURE error!

This is because “CITY” is taken as the retention policy.

[Bug] On creating RP must be surrounded with double quotes

Trying to create a new DB:

CREATE DATABASE "db_metrics" WITH DURATION 8736h REPLICATION 1 SHARD DURATION 72h NAME "1y"

It is giving the following error:

Error on Create DB &{...} on SlaveDB slaveinfluxserver : Error: error parsing query: found 1y, expected identifier at line 1, char XX

The RP name must be surrounded with double quotes

[Feature request] Speed up GetSchema

Loading the schema from a DB with a large number of measurements takes a long time. I've observed anywhere from 8-20 minutes before GetSchema completes.

I suspect the cause of long load times to be a result of:

mf[m.Name].Fields = GetFields(hac.Master.cli, db, m.Name, rp.Name)

This is making individual API calls for each measurement to fetch field keys.

I was thinking that it may be possible to use show field keys on <sdb>, so that the API responds with field keys for ALL measurements in the selected db. I think this would work, but I haven't investigated whether there are any size limitations with influxdb JSON responses, or the rest client used.

With 1000 measurements, the API took 12s to respond with a 1.72MB JSON payload. Compared to a request for fields on a single measurement, which took between 500-800ms within a small sample size of requests.

An alternate could be splitting the list of measurements and fetch field keys in batches, but this could also be very slow. For example, show field keys from disk,diskio,interrupts,kernel would take upward of 12s, sometimes even giving an empty response. Maybe influxdb does not index on this sort of query?

For my limited testing, I am running InfluxDB 1.7.7, with queries being routed through influxdb-srelay. Queries made directly to master were slightly faster, with all fields being returned in 4s, and batches of 4 varying between 4-12s per request.

It would be awesome if we could set a flag at the command line to force bulk loading of all field keys in a single request, or have some sort of logic that automatically switches to bulk loading if a certain amount of measurements are seen in one DB. If batching requests is workable with additional configuration in influxdb, that would also be great.

I'd be happy to submit a PR with my proposed solution, but would appreciate some feedback on the correct approach to take.

Cancel copy operation if one chunk failed

Hi,

to ensure continuity in the destination database, i.e. no missing data points in between, it would be nice if it were possible to tell syncflux to stop after a chunk could not be copied.
Currently, the number of retries and the wait delay for each chunk copy can be configured, but it seems that even if all retries fail the next chunk is always attempted.

Would something like this be possible?

initial replication works but hamonitor doesn't sync after that

after the initial replication "hamonitor" just does a cluster check and doesn't copy data anymore (there is no error)

when I start a syncflux with "copy" action manually (while hamonitor is running in the background), the data (after the initial replication) gets synced again, but since this is manual I have to do this every time

Shouldn't that sync be done by the hamonitor process?

[Feature Request] Make copy order configurable

First of all, thank you for this great tool! It fits our use case pretty well.
There is just one thing: We would like to use this to sync databases that cannot always be connected to each other, therefore there are long periods where big amounts are collected but not immediately transferred. Connection periods are rather short and will be interrupted often (this seems to be handled already pretty well).

To prevent data fragmentation, it would be ideal that the data is therefore copied starting from the start timestamp instead of going backwards from the current point in time. Then the data on the destination db would never have any gaps.

I think this is the corresponding location in the code, and I couldn't find any already existing configurable options there:

syncflux/pkg/agent/sync.go

Lines 144 to 146 in dd51b97

//sync from newer to older data
endsec := eEpoch.Unix() - (i * chunkSecond)
startsec := eEpoch.Unix() - ((i + 1) * chunkSecond)

Thanks!

P.S: Sadly I don't have any experience in go, so a PR would be difficult.

Sync between 2 dbs with the same name

Hi,

Can you please clarify how should the config file look if we have 2 DBs, a master and a slave, that have the same db name (live in 2 different servers) and need to be sync'd in HA?
From my testing, it seems Syncflux is assuming the last DB on the config file list as both master and slave, so no sync occurs.

Thanks

[bug] error copy when no db selected

This is the comand and the log with the bug..

# bin/syncflux -vv -action copy -start -1h
...
...
DEBU[2019-04-14 08:36:03] Database snmp not match to regex all:  skipping..  
DEBU[2019-04-14 08:36:03] Database _kapacitor not match to regex all:  skipping..  
DEBU[2019-04-14 08:36:03] Database telegraf_relay not match to regex all:  skipping..  
DEBU[2019-04-14 08:36:03] Database pseries not match to regex all:  skipping..  
DEBU[2019-04-14 08:36:03] Database test not match to regex all:  skipping..  
DEBU[2019-04-14 08:36:03] Database ml_metrics not match to regex all:  skipping..  
DEBU[2019-04-14 08:36:03] Database telegraf not match to regex all:  skipping..  

Error in log message

There is a format error ("Splitting %!s(int=134880)") and a logical error ("batchpoints into 50000 points chunks from 0 to 10000"):


time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 0 to 10000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 10000 to 20000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 20000 to 30000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 30000 to 40000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 40000 to 50000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 50000 to 60000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 60000 to 70000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 70000 to 80000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 80000 to 90000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 90000 to 100000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 100000 to 110000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 110000 to 120000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 120000 to 130000 "
time="2019-06-11 16:27:42" level=debug msg="Splitting %!s(int=134880) batchpoints into 50000  points chunks from 130000 to 134880 "

[Feature Request] Granular ha sync

In my use case, I need to sync multiple identical DBs (from multiple hosts) to a centralized host, with a specific db for each host.

So, it could be interested in hamonitor mode to use -db -newdb args for specific db source and specific db dest.

A quick read of the code suggests this is not possible at the moment.
What do you think about it?

I can look into making a pull request for it.
Thx
Dody

panic index out of range when execute the function agent.GetFields

log:

time="2020-11-10 06:34:45" level=debug msg="discovered measurement &{\x18.�\x1b\x00\v� map[]} on DB: prometheus-RP:autogen"
time="2020-11-10 06:34:45" level=debug msg="get fields query[show field keys from "\x18.�\x1b\x00\v�"],db[prometheus],meas[\x18.�\x1b\x00\v�]"
time="2020-11-10 06:34:45" level=debug msg="get fields from meas[\x18.�\x1b\x00\v�], response:[[]]"
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/toni-moreno/syncflux/pkg/agent.GetFields(0xbee080, 0xc00038aa90, 0xc00039a870, 0xa, 0xc00012c890, 0xb, 0xc00002f9a7, 0x7, 0xc0002ae000)
/home/golang/src/github.com/toni-moreno/syncflux/pkg/agent/client.go:256 +0x8fc
github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).GetSchema(0xc000120f70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0xb58088, 0xc0001cdd60, ...)
/home/golang/src/github.com/toni-moreno/syncflux/pkg/agent/hacluster.go:147 +0x8ea
github.com/toni-moreno/syncflux/pkg/agent.HAMonitorStart(0xc00012c2a0, 0xa, 0xc00012c380, 0xa)
/home/golang/src/github.com/toni-moreno/syncflux/pkg/agent/agent.go:247 +0x9b
main.main()
/home/golang/src/github.com/toni-moreno/syncflux/pkg/main.go:296 +0x53b

[Feature Request] Support uint64 columns

Currently, if a copy of an influx database with uint64 typed columns is attempted, these columns are skipped and the log shows
WARN[2020-10-10 16:32:49] Unhandled type &{data unsigned} in field data measuerment can.

I will see if I can fix this in a PR.

Fail to get response from query select * (and long pauses)

I'm using your syncflux tool (https://github.com/toni-moreno/syncflux) to get a full, up to date copy of a fairly large db (300Gb). While I have some questions about use case, I'm more immediately concerned about these errors. I'm doing a 'syncflux -action fullcopy.' I don't have any options in my syncflux.toml configuration (only servers defined) and the Influx configuration on both master/slave is default.

In running syncflux I'll see some data written to the receiving side, but then everything pauses for 30 seconds or more.

** PLEASE SEE LOG IN RECENT COMMENT **

I see the following in the logs:

time="2019-10-09 14:50:57" level=info msg="CFG :&{General:{InstanceID: LogDir:./log HomeDir: DataDir: LogLevel:debug SyncMode:onlyslave CheckInterval:10s MinSyncInterval:20s MasterDB:influxdb01 SlaveDB:influxdb02 InitialReplication:none MonitorRetryInterval:1m0s DataChunkDuration:5m0s MaxRetentionInterval:8760h0m0s RWMaxRetries:5 RWRetryDelay:10s NumWorkers:4 MaxPointsOnSingleWrite:20000} HTTP:{BindAddr:127.0.0.1:4090 AdminUser:admin AdminPassword:admin CookieID:mysupercokie} InfluxArray:[0xc00006fec0 0xc00006ff80]}"
time="2019-10-09 14:50:57" level=info msg="Set Master DB influxdb01 from Command Line parameters"
time="2019-10-09 14:50:57" level=info msg="Set Slave DB influxdb02 from Command Line parameters"
time="2019-10-09 14:51:09" level=warning msg="Fail to get response from query select * from "vsphere_host_sys" where time > 1570657557s and time < 1570657857s group by * on [telegraf|autogen] in attempt 1 / read database error: "
time="2019-10-09 14:51:09" level=warning msg="Trying again... in 10s sec"
time="2019-10-09 14:51:29" level=warning msg="Fail to get response from query select * from "vsphere_host_sys" where time > 1570657557s and time < 1570657857s group by * on [telegraf|autogen] in attempt 2 / read database error: "
time="2019-10-09 14:51:29" level=warning msg="Trying again... in 10s sec"
time="2019-10-09 14:51:49" level=warning msg="Fail to get response from query select * from "vsphere_host_sys" where time > 1570657557s and time < 1570657857s group by * on [telegraf|autogen] in attempt 3 / read database error: "
time="2019-10-09 14:51:49" level=warning msg="Trying again... in 10s sec"
time="2019-10-09 14:52:09" level=warning msg="Fail to get response from query select * from "vsphere_host_sys" where time > 1570657557s and time < 1570657857s group by * on [telegraf|autogen] in attempt 4 / read database error: "
time="2019-10-09 14:52:09" level=warning msg="Trying again... in 10s sec"
time="2019-10-09 14:52:29" level=warning msg="Fail to get response from query select * from "vsphere_host_sys" where time > 1570657557s and time < 1570657857s group by * on [telegraf|autogen] in attempt 5 / read database error: "
time="2019-10-09 14:52:29" level=warning msg="Trying again... in 10s sec"
time="2019-10-09 14:52:39" level=error msg="Max Retries (5) exceeded on read Data: Last error "
time="2019-10-09 14:52:39" level=error msg="error in read DB telegraf | Measurement vsphere_host_sys | ERR: "
time="2019-10-09 14:52:39" level=warning msg="Initializing Recovery for 1 chunks"
time="2019-10-09 14:52:39" level=warning msg="Recovery for Bad Chunk 1/1 from [1570657557][2019-10-09 14:45:57 -0700 PDT] to [1570657857][2019-10-09 14:50:57 -0700 PDT] (88763) Points Took [1m40.027365462s] ERRORS[R:1|W:0]"


10.41.86.23 - admin [09/Oct/2019:22:02:50 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" 891a12ff-eae0-11e9-8205-02cdb5175738 107
10.41.86.23 - admin [09/Oct/2019:22:02:50 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" 893e597a-eae0-11e9-8206-02cdb5175738 59
10.41.86.23 - admin [09/Oct/2019:22:02:50 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" 8948c0a6-eae0-11e9-8207-02cdb5175738 50
10.41.86.23 - admin [09/Oct/2019:22:02:50 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" 89534a09-eae0-11e9-8208-02cdb5175738 44
10.41.86.23 - admin [09/Oct/2019:22:02:50 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" 8959e0c7-eae0-11e9-8209-02cdb5175738 35

[Feature Request] Allow parallel sync/copy per measurement instead of per chunk

Right now sync works by chunk period: each chunk processes each of the measurements in parallel (with as many workers as configured), and if one measurement fails the whole chunk is marked as a bad chunk (even though all the other measurements were synced/copied OK).

Our DBs usually have one big measurement and several smaller ones; when processing by chunks, a problem with one big measurement impacts all the other data. Processing in parallel per measurement instead might copy the data faster and would also allow recovery per measurement.

This change requires a big refactor.

Unable to start; getting error in stderr.log

Hi @toni-moreno,

I have 25 measurements in total and data arrives continuously. While data is flowing, syncflux works fine, but when I restart it I get this error:

time="2020-01-21 04:39:34" level=info msg="CFG :&{General:{InstanceID: LogDir:./log HomeDir: DataDir: LogLevel:debug SyncMode:onlyslave CheckInterval:10s MinSyncInterval:20s MasterDB:influxdb01 SlaveDB:influxdb02 InitialReplication:none MonitorRetryInterval:1m0s DataChunkDuration:5m0s MaxRetentionInterval:43h0m0s RWMaxRetries:5 RWRetryDelay:10s NumWorkers:4 MaxPointsOnSingleWrite:40000} HTTP:{BindAddr:0.0.0.0:4090 AdminUser:admin AdminPassword:admin CookieID:mysupercokie} InfluxArray:[0xc0001f14a0 0xc0001f1560]}"
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/toni-moreno/syncflux/pkg/agent.GetFields(0xafbb40, 0xc0001500c0, 0xc0002f047a, 0x6, 0xc000025008, 0x5, 0xc0002f0577, 0x7, 0xc00041ff80)
/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/client.go:254 +0x6dc
github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).GetSchema(0xc00031a4e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0001abd01, 0xc0001abda0, 0x42d771, ...)
/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/hacluster.go:147 +0x5db
github.com/toni-moreno/syncflux/pkg/agent.HAMonitorStart(0xc0001fa210, 0xa, 0xc0001fa280, 0xa)
/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/agent.go:246 +0x9b
main.main()
/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/main.go:288 +0x4af

thanks

too many open connections

I noticed too many open connections on my server.
It should be resolved by #30

Do you mind having a look, if it makes sense this change to you?

p.s.: I believe that there might be other cases where Close() is missing.

deb package version number does not start with digit

When I tried to install syncflux from deb package I've encountered this error:

dpkg: error processing archive /home/<user>/Downloads/syncflux_v0.7.0_amd64.deb (--unpack):
 parsing file '/var/lib/dpkg/tmp.ci/control' near line 2 package 'syncflux':
 'Version' field value 'v0.7.0': version number does not start with digit

I fixed it simply by changing the version in DEBIAN/control from v0.7.0 to 0.7.0, but it'd be great to have this handled in the package-building step :).
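Since build.go drives packaging, the fix could be as small as stripping the leading "v" from the git tag before the version reaches the control file. A sketch of what build.go could do — the helper name is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// Debian policy requires the Version field to start with a digit, so a
// git tag like "v0.7.0" must be normalized before the .deb is built.
func debVersion(tag string) string {
	return strings.TrimPrefix(tag, "v")
}

func main() {
	fmt.Println(debVersion("v0.7.0")) // 0.7.0
	fmt.Println(debVersion("0.7.0"))  // unchanged when there is no prefix
}
```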

RPM systemd unit is broken because syncflux does not recognize all the given flags (arguments)

Here is the output:

Started SyncFlux Agent.
syncflux.service: Main process exited, code=exited, status=1/FAILURE
syncflux.service: Failed with result 'exit-code'.
syncflux.service: Service RestartSec=100ms expired, scheduling restart.
syncflux.service: Scheduled restart job, restart counter is at 5.
Stopped SyncFlux Agent.
syncflux.service: Start request repeated too quickly.
syncflux.service: Failed with result 'exit-code'.
Failed to start SyncFlux Agent.

Checking systemctl status syncflux gives this output:

● syncflux.service - SyncFlux Agent
   Loaded: loaded (/usr/lib/systemd/system/syncflux.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2022-01-27 16:35:35 EET; 29s ago
     Docs: http://github.com/toni-moreno/syncflux
  Process: 140031 ExecStart=/usr/sbin/syncflux --pidfile=${PID_FILE} --config=${CONF_FILE} --logs=${LOG_DIR} --home=${HOME_DIR} --data=${DATA_DIR} (code=exited, status=1/FAILURE)
 Main PID: 140031 (code=exited, status=1/FAILURE)

Taking all the variables from here and copy-pasting them into the terminal and then running command /usr/sbin/syncflux --pidfile=${PID_FILE} --config=${CONF_FILE} --logs=${LOG_DIR} --home=${HOME_DIR} --data=${DATA_DIR} gives this error:

flag provided but not defined: -home
Usage of /usr/sbin/syncflux:
...

Same goes with -data flag.

This should be related, I can see commented out lines: https://github.com/toni-moreno/syncflux/blob/master/pkg/main.go#L30-L35

workaround: run systemctl edit syncflux.service and set contents as per below:

[Service]
ExecStart=
ExecStart=/usr/sbin/syncflux --pidfile=${PID_FILE} --config=${CONF_FILE} --logs=${LOG_DIR}

[Feature Request] Mark errors on chunks

When doing a massive copy from a big database, the source database sometimes has query limitations and the destination database can have write limits.

If we could build a "bad chunk" map, we could restart the massive copy process by querying only the bad chunks, avoiding a rewrite of the complete database.

http_access.log always prints /api/health requests

Although I set loglevel to warn, it doesn't affect http_access.log, which always prints lines like these:
[Macaron] 2020-04-30 02:43:50: Started GET /api/health for 10.110.19.67
[Macaron] 2020-04-30 02:43:50: Completed GET /api/health 200 OK in 533.368µs
[Macaron] 2020-04-30 02:45:50: Started GET /api/health for 10.110.19.67
[Macaron] 2020-04-30 02:45:50: Completed GET /api/health 200 OK in 295.624µs

[BUG] Unhandled failed master down while recovering slave

If the master crashes while recovery is in progress, the local sync process crashes too.

INFO[2019-04-07 09:13:21] CFG :&{General:{InstanceID: LogDir:./log HomeDir: DataDir: LogLevel:debug SyncMode:onlyslave CheckInterval:10s MinSyncInterval:20s MasterDB:influxdb01 SlaveDB:influxdb02 InitialReplication:none MonitorRetryInterval:1m0s} HTTP:{BindAddr:127.0.0.1:4090 AdminUser:admin AdminPassword:admin CookieID:mysupercokie} InfluxArray:[0xc00007b980 0xc00007ba40]} 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa671da]

goroutine 378 [running]:
github.com/toni-moreno/syncflux/pkg/agent.ReadDB(0xd23720, 0xc000112f00, 0xc0001fc560, 0xa, 0xc0001cca70, 0x7, 0xc0001fc560, 0xa, 0xc0001cca70, 0x7, ...)
	/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/client.go:320 +0x3ba
github.com/toni-moreno/syncflux/pkg/agent.SyncDBRP(0xc00015c280, 0xc000246000, 0xc0001fc560, 0xa, 0xc0001e3380, 0xbf229ff52b5eee6e, 0x1793cdec144b, 0x117eb20, 0xbf229ffa2b5eee6e, 0x17987603dc4b, ...)
	/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/client.go:513 +0x75c
github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).ReplicateData(0xc000113380, 0xc00020a3c0, 0x7, 0x8, 0xbf229ff52b5eee6e, 0x1793cdec144b, 0x117eb20, 0xbf229ffa2b5eee6e, 0x17987603dc4b, 0x117eb20, ...)
	/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/hacluster.go:156 +0x25f
github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).checkCluster(0xc000113380)
	/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/hacluster.go:224 +0xb88
github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).startSupervisorGo(0xc000113380, 0x119ca30)
	/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/hacluster.go:258 +0x2f2
created by github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).SuperVisor
	/home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/hacluster.go:175 +0x6e
[Bra] 04-07 16:26:18 [ WARN] Fail to execute command: ./bin/syncflux [] - exit status 2

How to install?

Sorry, I am a noob with Docker. I have Docker installed on my InfluxDB server. What's next?

From what I read there should be a tar file for Docker to run syncflux... where can I find the precompiled packages?

panic: runtime error: index out of range, caused by hyphens in measurement name

Whichever action I choose (hamonitor, copy, replicaschema), I am not able to start the application:

INFO[2019-07-19 13:33:19] CFG :&{General:{InstanceID: LogDir:/var/log/syncflux HomeDir: DataDir: LogLevel:trace SyncMode:onlyslave CheckInterval:10s MinSyncInterval:20s MasterDB:sensu02 SlaveDB:sensu01 InitialReplication:none MonitorRetryInterval:1m0s DataChunkDuration:5m0s MaxRetentionInterval:8760h0m0s RWMaxRetries:5 RWRetryDelay:10s NumWorkers:4 MaxPointsOnSingleWrite:20000} HTTP:{BindAddr:83.97.94.46:4090 AdminUser:admin AdminPassword:admin CookieID:mysupercokie} InfluxArray:[0xc4201ed260 0xc4201ed320 0xc4201ed3e0]} 
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/toni-moreno/syncflux/pkg/agent.GetFields(0xa03e20, 0xc4201f2180, 0xc4202371a0, 0x5, 0xc420237600, 0x10, 0xc4202372f7, 0x7, 0xc420250e00)
	/home/maxadamo/go/src/github.com/toni-moreno/syncflux/pkg/agent/client.go:254 +0x6d2
github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).GetSchema(0xc42029a410, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc420195d90, 0x42bdf4, 0x9bcc40, ...)
	/home/maxadamo/go/src/github.com/toni-moreno/syncflux/pkg/agent/hacluster.go:147 +0x548
github.com/toni-moreno/syncflux/pkg/agent.HAMonitorStart(0xc4201e2720, 0x7, 0xc4201e27c0, 0x7)
	/home/maxadamo/go/src/github.com/toni-moreno/syncflux/pkg/agent/agent.go:246 +0x9c
main.main()
	/home/maxadamo/go/src/github.com/toni-moreno/syncflux/pkg/main.go:288 +0x4c1

[Feature Request] add measurement filtering on all modes.

We have detected, in an old InfluxDB version (1.0.0), some big measurements where it is impossible to query the data with "select * from ...".

If you need to copy such a database you won't be able to copy the data with syncflux; as a minimal fix, we could skip these measurements in the hamonitor/copy/fullcopy or replicaschema modes.
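Such filtering could reuse the regex matching syncflux already applies to database names ("not match to regex ... skipping" in the logs elsewhere in these reports). A hedged sketch — the helper is hypothetical, and no such measurement-filter flag exists today:

```go
package main

import (
	"fmt"
	"regexp"
)

// filterMeasurements keeps only the measurement names matching the
// given regex, so oversized measurements can be excluded from a copy.
func filterMeasurements(names []string, pattern string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	all := []string{"cpu", "huge_unqueryable_data", "mem"}
	fmt.Println(filterMeasurements(all, "^(cpu|mem)$")) // [cpu mem]
}
```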

Error while trying to write large amount of data

Hi,

Tried to write a large amount of data with:

data-chuck-duration = "24h"
> syncflux -action fullcopy -master masterinfluxserver -slave slaveinfluxserver -db db_metrics -start -720h

The following errors appeared:

...
DEBU[2019-04-12 11:35:32] processed 250560 points
INFO[2019-04-12 11:35:33] Fail to write to database, error: {"error":"Request Entity Too Large"}
...
WARN[2019-04-12 11:35:55] Fail to get response from query select * from  "mymeasurement" where time  > 1554975322s and time < 1555061722s group by *, read database error: unable to decode json: received status code 200 err: net/http: request canceled (Client.Timeout exceeded while reading body)
ERRO[2019-04-12 11:35:55] Data Replication error in DB [&{db_metrics 1y [%!s(*agent.RetPol=&{1y 31449600000000000 259200000000000 1 true})] map[DATA] RP [rp] | Error: unable to decode json: received status code 200 err: net/http: request canceled (Client.Timeout exceeded while reading body)
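One mitigation, consistent with the configuration keys shown throughout these reports, is to shrink both the chunk duration and the per-write batch so each request and response body stays under server limits; the values below are illustrative, not recommendations:

```toml
[General]
# smaller chunks mean smaller query results and smaller write bodies
data-chuck-duration = "1h"          # key name as spelled in the sample config
max-points-on-single-write = 5000
```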

Data doesn't sync between master and slave

Hi @toni-moreno !
I'm testing my cluster with two influxdb backend nodes and two influxdb-srelay + syncflux pods.
Configurations are similar to those provided in toni-moreno/influxdb-srelay#9.
When I delete the influxdb-1 (slave) instance for the test, syncflux starts the recovery process but no data is recovered.
What could be the reason for that? Thanks

Configurations:
SyncFlux

# -*- toml -*-
[General]
logdir = "./log"
loglevel = "trace"
sync-mode = "onlyslave"
master-db = "influxdb-0" #influxdb-1 for second instance
slave-db = "influxdb-1" #influxdb-0 for second instance
check-interval = "5s"
min-sync-interval = "20s"
initial-replication = "both"
monitor-retry-interval = "1m"
data-chuck-duration = "5m"
max-retention-interval = "8760h" # 1 year
rw-max-retries = 5
rw-retry-delay = "10s"
num-workers = 4
max-points-on-single-write = 20000
[http]
name = "http-syncflux"
bind-addr = "0.0.0.0:4090"
cookie-id = "mysupercokie"
# ---- INFLUXDB  SECTION
# Sets a list of available DB's that can be used 
# as master or slaves db's on any of the posible actions     

[[influxdb]]
  release = "1x"          
  name = "influxdb-0"
  location = "http://influxdb-0.influxdb:8086/"
  timeout="10s"
  admin-user = "admin"
  admin-passwd = "pasword"     

[[influxdb]]
  release = "1x"          
  name = "influxdb-1"
  location = "http://influxdb-1.influxdb:8086/"
  timeout="10s"
  admin-user = "admin"
  admin-passwd = "password"

Count query against each DB (64 and 100 rows):

/ $ curl -G http://influxdb-1.influxdb:8086/query?db=example --data-urlencode "q=select count(*) from cpu_load_short"
{"results":[{"statement_id":0,"series":[{"name":"cpu_load_short","columns":["time","count_value"],"values":[["1970-01-01T00:00:00Z",64]]}]}]}
/ $ curl -G http://influxdb-0.influxdb:8086/query?db=example --data-urlencode "q=select count(*) from cpu_load_short"
{"results":[{"statement_id":0,"series":[{"name":"cpu_load_short","columns":["time","count_value"],"values":[["1970-01-01T00:00:00Z",100]]}]}]}

Recovery Log

 12:39:23" level=info msg="HACLuster: detected UP Last(2020-08-11 12:39:18.130767922 +0000 UTC m=+320.108574677) Duratio OK (4.992932233s) RECOVERING"
time="2020-08-11 12:39:23" level=info msg="HACLUSTER: INIT RECOVERY : FROM [ 2020-08-11 12:38:03.12768705 +0000 UTC m=+245.105493774 ] TO [ 2020-08-11 12:39:18.130767922 +0000 UTC m=+320.108574677 ]"
time="2020-08-11 12:39:23" level=info msg="HACLUSTER: INIT REFRESH SCHEMA"
time="2020-08-11 12:39:23" level=debug msg="discovered database 0: [_internal]"
time="2020-08-11 12:39:23" level=debug msg="discovered database 1: [example]"
time="2020-08-11 12:39:23" level=debug msg="discovered retention Policies 0:  5 : []interface {}{\"autogen\", \"0s\", \"168h0m0s\", \"1\", true}"
time="2020-08-11 12:39:23" level=trace msg="SHOW DATABASES On InitPint: [{StatementId:0 Series:[{Name:databases Tags:map[] Columns:[name] Values:[[_internal] [example]] Partial:false}] Messages:[] Err:}]"
time="2020-08-11 12:39:23" level=info msg="InfluxMonitor: InfluxDB : influxdb-1  OK (Version  1.7.6 : Duration 3.169399ms )"
time="2020-08-11 12:39:23" level=trace msg="SHOW DATABASES On InitPint: [{StatementId:0 Series:[{Name:databases Tags:map[] Columns:[name] Values:[[_internal] [example]] Partial:false}] Messages:[] Err:}]"
time="2020-08-11 12:39:23" level=info msg="InfluxMonitor: InfluxDB : influxdb-0  OK (Version  1.7.6 : Duration 5.330761ms )"
time="2020-08-11 12:39:23" level=debug msg="discovered measurement  &{cpu_load_short map[]} on DB: example-RP:autogen"
time="2020-08-11 12:39:23" level=debug msg="Detected Field [value] type [float] on measurement [cpu_load_short]"
time="2020-08-11 12:39:23" level=info msg="HACLUSTER: INIT REPLICATION DATA PROCESS"
time="2020-08-11 12:39:23" level=info msg="Replicating Data from DB example RP autogen..."
time="2020-08-11 12:39:23" level=debug msg="SYNC-DB-RP[example|autogen] From:2020-08-11 12:38:03.12768705 +0000 UTC m=+245.105493774 To:2020-08-11 12:39:18.130767922 +0000 UTC m=+320.108574677 | Duration: 1m15.003080903s || #chunks: 1  | chunk Duration 5m0s "
time="2020-08-11 12:39:23" level=trace msg="SYNC-DB-RP Schema: &{autogen 0s 168h0m0s %!s(int64=1) %!s(bool=true) map[cpu_load_short:%!s(*agent.MeasurementSch=&{cpu_load_short map[value:0xc00009b340]})]}  "
time="2020-08-11 12:39:23" level=debug msg="Detected 1 measurements on example|autogen"
time="2020-08-11 12:39:23" level=trace msg="Processing measurement cpu_load_short with schema #&{Name:cpu_load_short Fields:map[value:0xc00009b340]}"
time="2020-08-11 12:39:23" level=debug msg="processing Database example Measurement cpu_load_short from 1597149258 to 1597149558"
time="2020-08-11 12:39:23" level=debug msg="Query [select * from  \"cpu_load_short\" where time  > 1597149258s and time < 1597149558s group by *] took 2.597468ms "
time="2020-08-11 12:39:23" level=trace msg="Reading 0 Series for db example"
time="2020-08-11 12:39:23" level=debug msg="processed 0 points"
time="2020-08-11 12:39:23" level=debug msg="MaxPointsOnSingleWrite [20000] "
time="2020-08-11 12:39:23" level=debug msg="Write attempt [1] took 2.855488ms "
time="2020-08-11 12:39:23" level=info msg="Processed Chunk [1/1](100%) from [1597149258][2020-08-11 12:34:18 +0000 UTC] to [1597149558][2020-08-11 12:39:18 +0000 UTC] (0) Points Took [5.611181ms] ERRORS[R:0|W:0]"

pkg/agent/client.go:262 index out of range

Hi.

syncflux -vvv -action "copy" -master "influxdb2-testers" -slave "influxdb3-testers" -db "^payment$" -config "/etc/syncflux/syncflux.toml"
INFO[2020-11-02 06:25:33] CFG :&{General:{InstanceID: LogDir:/var/log/syncflux/ HomeDir: DataDir: LogLevel:debug SyncMode:onlyslave CheckInterval:10s MinSyncInterval:20s MasterDB:influxdb3-testers SlaveDB:influxdb2-testers InitialReplication:none MonitorRetryInterval:1m0s DataChunkDuration:5m0s MaxRetentionInterval:8760h0m0s RWMaxRetries:5 RWRetryDelay:10s NumWorkers:4 MaxPointsOnSingleWrite:20000} HTTP:{BindAddr:127.0.0.1:4090 AdminUser:admin AdminPassword:admin CookieID:mysupercokie} InfluxArray:[0xc000067ec0 0xc000067f80]}
INFO[2020-11-02 06:25:33] Set Default directories :
   - Exec: /root
   - Config: /etc/syncflux
   -Logs: /root/log
INFO[2020-11-02 06:25:33] Initializing cluster
INFO[2020-11-02 06:25:33] Found MasterDB[influxdb2-testers] in config File &{Release:1x Name:influxdb2-testers Location:http://influxdb2-testers.mhd.local:8086/ AdminUser: AdminPasswd: Timeout:10s}
TRAC[2020-11-02 06:25:33] SHOW DATABASES On InitPint: [{StatementId:0 Series:[{Name:databases Tags:map[] Columns:[name] Values:[[_internal] [payment_monitoring] [payment]] Partial:false}] Messages:[] Err:}]
INFO[2020-11-02 06:25:33] Found SlaveDB[influxdb3-testers] in config File &{Release:1x Name:influxdb3-testers Location:http://127.0.0.1:8086/ AdminUser: AdminPasswd: Timeout:10s}
TRAC[2020-11-02 06:25:33] SHOW DATABASES On InitPint: [{StatementId:0 Series:[{Name:databases Tags:map[] Columns:[name] Values:[[_internal] [payment]] Partial:false}] Messages:[] Err:}]
DEBU[2020-11-02 06:25:33] discovered database 0: [_internal]
DEBU[2020-11-02 06:25:33] discovered database 1: [payment_monitoring]
DEBU[2020-11-02 06:25:33] discovered database 2: [payment]
DEBU[2020-11-02 06:25:33] Database payment_monitoring not match to regex ^payment$: skipping..
DEBU[2020-11-02 06:25:33] discovered retention Policies 0: 5 : []interface {}{"default", "0", "168h0m0s", "1", true}
DEBU[2020-11-02 06:25:33] discovered measurement &{transactions map[]} on DB: payment-RP:default

panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/toni-moreno/syncflux/pkg/agent.GetFields(0xb39a40, 0xc000020180, 0xc0001a1539, 0x7, 0xc0001a1820, 0xc, 0xc0001a16d0, 0x7, 0x0)
/home/vant/proyectos/otros/syncflux/pkg/agent/client.go:262 +0x667
github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).GetSchema(0xc000167ba0, 0x7ffef06ee7cf, 0x9, 0xa56ea2, 0x2, 0xa56ea2, 0x2, 0xb1cfa8, 0x1, 0xc000162438, ...)
/home/vant/proyectos/otros/syncflux/pkg/agent/hacluster.go:147 +0x8db
github.com/toni-moreno/syncflux/pkg/agent.Copy(0x7ffef06ee7a0, 0x11, 0x7ffef06ee7b9, 0x11, 0x7ffef06ee7cf, 0x9, 0x0, 0x0, 0xa56ea2, 0x2, ...)
/home/vant/proyectos/otros/syncflux/pkg/agent/agent.go:214 +0xd7
main.main()

4090 port can't be opened after pod restart

I installed influxdb-srelay and syncflux alongside InfluxDB on two hosts via a StatefulSet. When I change the config and the pod restarts, port 4090 of syncflux can't be opened, and the syncflux log looks like this:
root@mgt01:~# kubectl log influxdb-0 -n monitoring syncflux

log is DEPRECATED and will be removed in a future version. Use logs instead.
time="2020-09-03 07:33:41" level=info msg="CFG :&{General:{InstanceID: LogDir: HomeDir: DataDir: LogLevel:info SyncMode:onlyslave CheckInterval:10s MinSyncInterval:20s MasterDB:influxdb01 SlaveDB:influxdb02 InitialReplication:both MonitorRetryInterval:30s DataChunkDuration:5m0s MaxRetentionInterval:8760h0m0s RWMaxRetries:5 RWRetryDelay:10s NumWorkers:4 MaxPointsOnSingleWrite:20000} HTTP:{BindAddr:0.0.0.0:4090 AdminUser: AdminPassword: CookieID:} InfluxArray:[0xc0001cd0e0 0xc0001cd200]}"
time="2020-09-03 07:33:41" level=info msg="Set Master DB influxdb01 from Command Line parameters"
time="2020-09-03 07:33:41" level=info msg="Set Slave DB influxdb02 from Command Line parameters"
time="2020-09-03 07:33:41" level=info msg="Set log level to  info from Config File"
time="2020-09-03 07:33:41" level=info msg="Set Default directories : \n   - Exec: \n   - Config: conf\n   -Logs: log\n"
time="2020-09-03 07:33:41" level=info msg="Initializing cluster"
time="2020-09-03 07:33:41" level=info msg="Found MasterDB[influxdb01] in config File &{Release:1x Name:influxdb01 Location:http://influxdb-0.influxdb-svc:8086/ AdminUser: AdminPasswd: Timeout:10s}"
time="2020-09-03 07:33:41" level=error msg="Fail to build newclient to database http://influxdb-0.influxdb-svc:8086/, error: Get http://influxdb-0.influxdb-svc:8086/ping?wait_for_leader=10s: dial tcp: lookup influxdb-0.influxdb-svc on 100.105.0.3:53: server misbehaving\n"
time="2020-09-03 07:33:41" level=error msg="MasterDB[influxdb01] has  problems :Get http://influxdb-0.influxdb-svc:8086/ping?wait_for_leader=10s: dial tcp: lookup influxdb-0.influxdb-svc on 100.105.0.3:53: server misbehaving"
time="2020-09-03 07:33:41" level=info msg="Found SlaveDB[influxdb02] in config File &{Release:1x Name:influxdb02 Location:http://influxdb-1.influxdb-svc:8086/ AdminUser: AdminPasswd: Timeout:10s}"
time="2020-09-03 07:33:41" level=error msg="Master DB is not runing I should wait until both up to begin to chek sync status"
time="2020-09-03 07:34:11" level=info msg="Found MasterDB[influxdb01] in config File &{Release:1x Name:influxdb01 Location:http://influxdb-0.influxdb-svc:8086/ AdminUser: AdminPasswd: Timeout:10s}"
time="2020-09-03 07:34:11" level=error msg="Fail to build newclient to database http://influxdb-0.influxdb-svc:8086/, error: Get http://influxdb-0.influxdb-svc:8086/ping?wait_for_leader=10s: dial tcp: lookup influxdb-0.influxdb-svc on 100.105.0.3:53: server misbehaving\n"
time="2020-09-03 07:34:11" level=error msg="MasterDB[influxdb01] has  problems :Get http://influxdb-0.influxdb-svc:8086/ping?wait_for_leader=10s: dial tcp: lookup influxdb-0.influxdb-svc on 100.105.0.3:53: server misbehaving"
time="2020-09-03 07:34:11" level=info msg="Found SlaveDB[influxdb02] in config File &{Release:1x Name:influxdb02 Location:http://influxdb-1.influxdb-svc:8086/ AdminUser: AdminPasswd: Timeout:10s}"
time="2020-09-03 07:34:11" level=error msg="Master DB is not runing I should wait until both up to begin to chek sync status"
time="2020-09-03 07:34:41" level=info msg="Found MasterDB[influxdb01] in config File &{Release:1x Name:influxdb01 Location:http://influxdb-0.influxdb-svc:8086/ AdminUser: AdminPasswd: Timeout:10s}"
time="2020-09-03 07:34:41" level=info msg="Found SlaveDB[influxdb02] in config File &{Release:1x Name:influxdb02 Location:http://influxdb-1.influxdb-svc:8086/ AdminUser: AdminPasswd: Timeout:10s}"
time="2020-09-03 07:36:27" level=info msg="Replicating DB Schema from Master to Slave"
time="2020-09-03 07:36:27" level=info msg="Replicating DATA Schema from Master to Slave"
time="2020-09-03 07:36:27" level=info msg="Replicating Data from DB prometheus RP autogen...."

[Feature Request] Divide data into chunks based on amount rather than time

When I tried to sync a large database I experienced a few errors, for example "Request Entity Too Large", which I could not yet fix by increasing the max-points-on-single-write parameter; similar issues with large amounts of data have already been discussed here. But this is not the main point of this issue.

My data consists of ~50k points which are contained within about one minute, and I tried to sync the last month. So to decrease the amount of points per chunk, I would have to choose a chunk-interval of a few seconds, which results in a huge amount of empty chunks for this month. So I wondered: what is the reason for dividing the data based on time, instead of actual amount?
Granted, my example is a bit extreme, but in cases where the data distribution is uneven or has spikes this approach might not be the best. Instead it might be better to be able to define a chunk size, for example 1000 points, and then syncflux queries the first 1000 points, then the next 1000 points, and so on, resulting in very even and adjustable chunk sizes.
InfluxQL does support this with the LIMIT and OFFSET clauses.

I cannot even think of a reason why aggregating data over time would be better than simply over amount as described. Am I missing something? What do you think?
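The LIMIT/OFFSET approach described above could be expressed as a sequence of paged InfluxQL queries. A sketch of how such query text might be built — this is illustrative, not the query syncflux actually issues:

```go
package main

import "fmt"

// pagedQuery builds one page of a count-based chunking scheme:
// page 0 reads points 0..chunkSize-1, page 1 the next chunkSize, etc.
func pagedQuery(measurement string, chunkSize, page int) string {
	return fmt.Sprintf("select * from %q group by * limit %d offset %d",
		measurement, chunkSize, chunkSize*page)
}

func main() {
	// iterate pages until a query returns fewer than chunkSize points
	for page := 0; page < 3; page++ {
		fmt.Println(pagedQuery("cpu_load_short", 1000, page))
	}
}
```

One caveat worth checking before implementing this: with GROUP BY *, InfluxQL applies LIMIT and OFFSET per series rather than to the whole result, which may actually suit per-series chunking.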

Duplication of data

I have a problem with duplication of data.
I have 2 instances of InfluxDB (in1, in2); in front of them there is an influxdb-srelay with an HA config, so every write command is executed on both the in1 and in2 instances.
On in1/in2 I also run 2 instances of syncflux: on in1 with master:in1, slave:in2; on in2 with master:in2, slave:in1.
I execute some write queries and everything is OK: the queries are executed on both instances.
Now I shut down the in2 instance while still sending write commands. Next I restart in2 -> syncflux starts processing chunks from in1 and writes them to the in2 instance. The problem is that some of the data that was already present in in2 before the shutdown is also retrieved from in1 and added as duplicates by the chunk-processing step.

My configs are as simple as the examples from GitHub: srelay uses the HA example, and syncflux uses the default HA configuration with initial-replication = "both" (changing it to none doesn't help).
Why does syncflux duplicate the data? Why doesn't it check that the data is already present in the database?

The screenshot shows an example:
19:28 - the servers were started
19:29 - I executed one write command
19:30 - I stopped the second instance and executed two write commands
19:32 - I started the secondary database and syncflux rebuilt it, but it added a duplicate of the 19:29 write command, so the secondary graph shows 2 instead of 1

Screenshot 2020-11-25 at 19:38:41

Service stopping

I'm not sure exactly how to configure the syncflux service to stay up and keep the databases in sync. The service starts and seems to sync data properly, but then stops again. Could you please advise?
I am running on RHEL 7, installed from the RPM.

TOML file attached
syncflux_toml.txt

initial-replication config

When I configured the initial-replication option as all

initial-replication = "all"

Receive the error:

time="2020-04-20 11:46:09" level=error msg="Unknown replication config all"

[Panic] Panic trying fullcopy

The following panic was given when trying to do a full copy:

Config:

...
 #
 # data-chuck-duration
 #
 # duration for each small chunk of data read from master -> written to slave
 # smaller chunks of data will use less memory in the syncflux process
 # and also fewer resources on both master and slave databases
 # greater chunks of data will improve sync speed

 data-chuck-duration = "24h"
...

Command

> syncflux -action fullcopy -master masterinfluxdbserver -slave salveinfluxdbserver -start -24h -db db_metrics

Panic:

WARN[2019-04-11 12:28:08] Fail to get response from query select * from  "measurement_XXX" where time  > 1554892073s and time < 1554978473s group by *, read database error: unable to decode json: received status code 200 err: net/http: request canceled (Client.Timeout exceeded while reading body)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x818ac1]

goroutine 1 [running]:
github.com/toni-moreno/syncflux/pkg/agent.ReadDB(0xa53120, 0xc0001e6180, 0xc0001ce150, 0xb, 0xc0001ce518, 0x2, 0xc0001ce150, 0xb, 0xc0001ce518, 0x2, ...)
        /home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/client.go:320 +0x231
github.com/toni-moreno/syncflux/pkg/agent.SyncDBRP(0xc0002180a0, 0xc000324000, 0xc0001ce150, 0xb, 0xc0001cce40, 0xbf238ba641241262, 0xffffae253e9b1422, 0xdfe720, 0xbf23e38a40ede6de, 0x6c8099, ...)
        /home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/client.go:528 +0x65d
github.com/toni-moreno/syncflux/pkg/agent.(*HACluster).ReplicateData(0xc0001df110, 0xc0001c63b8, 0x1, 0x1, 0xbf238ba641241262, 0xffffae253e9b1422, 0xdfe720, 0xbf23e38a40ede6de, 0x6c8099, 0xdfe720, ...)
        /home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/hacluster.go:162 +0x1ee
github.com/toni-moreno/syncflux/pkg/agent.SchCopy(0x7ffd1088a7c9, 0xa, 0x7ffd1088a7db, 0xa, 0x7ffd1088a7f6, 0xb, 0xbf238ba641241262, 0xffffae253e9b1422, 0xdfe720, 0xbf23e38a40ede6de, ...)
        /home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/agent/agent.go:172 +0x21d
main.main()
        /home/developer/src/gospace/src/github.com/toni-moreno/syncflux/pkg/main.go:250 +0x488

Gaps after sync using ha-monitor

Hi syncflux guys,

Thanks for this extremely useful piece of software.
I have a setup with master and slave Influx DBs, using syncflux to sync them. The problem I am facing is that, after the initial sync is completed ("msg="Processed Chunk 1590/17473 from [16...." reaches 100%), there is a gap in the newest portion of the DB, that only gets filled when I restart the service. Is that the usual behavior or am I missing something in the configuration?

This is the command that the service runs:
/usr/sbin/syncflux -config=/etc/syncflux/syncflux.toml -logs=/var/log/syncflux -action hamonitor -chunk 30m -master influxdb-1 -slave influxdb-2

And this is the .toml file:

[General]
logdir = "./log"
loglevel = "info"
sync-mode = "onlyslave"

master-db = "influxdb-1"
slave-db = "influxdb-2"

check-interval = "10s"
min-sync-interval = "20s"

initial-replication = "both"
monitor-retry-interval = "1m"
data-chuck-duration = "20m"

rw-max-retries = 5
rw-retry-delay = "10s"
num-workers = 6
max-points-on-single-write = 10000

[http]
name = "influx-sync"
bind-addr = "10.1.1.14:4090"
admin-user = "admin"
admin-passwd = "passwd"
cookie-id = "mycookie"

[[influxdb]]
release = "1x"
name = "influxdb-1"
location = "http://10.1.1.15:8086/"
admin-user = "librenms"
admin-passwd = "passwd"
timeout = "60s"

[[influxdb]]
release = "1x"
name = "influxdb-2"
location = "http://10.1.1.14:8086/"
admin-user = "influxsync"
admin-passwd = "passwd"
timeout = "60s"

Thanks
