outflux's Introduction

Outflux - Migrate InfluxDB to TimescaleDB

This repo contains code for exporting complete InfluxDB databases or selected measurements to TimescaleDB.

Table of Contents

  1. Installation
  2. How to use
  3. Connection
  4. Known limitations

Installation

Binary releases

We provide binaries for GNU/Linux, Windows and macOS with each release; these can be found under releases. To use Outflux, download the binary, extract the compressed tarball and run the executable.

wget https://github.com/timescale/outflux/releases/download/v0.3.0/outflux_0.3.0_Linux_x86_64.tar.gz
tar xf outflux_0.3.0_Linux_x86_64.tar.gz
./outflux --help

Installing from source

Outflux is a Go project managed with Go modules. You can download it into any directory, and on the first build it will download its required dependencies.

Depending on where you downloaded it and the Go version you're using, you may need to set GO111MODULE to auto, on or off. Learn about the GO111MODULE environment variable here.

# Fetch the source code of Outflux in any directory
$ git clone git@github.com:timescale/outflux.git
$ cd ./outflux

# Install the Outflux binary (will automatically detect and download
# dependencies).
$ cd cmd/outflux
$ GO111MODULE=auto go install

# Building without installing will also fetch the required dependencies
$ GO111MODULE=auto go build ./... 

How to use

Outflux supports InfluxDB 1.x.

Outflux should also work with InfluxDB 2.x and 3.x through their 1.x-compatible query APIs. You will need to enable the 1.x APIs to use them; consult the InfluxDB documentation for more details.
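
On an InfluxDB 2.x server, enabling the 1.x compatibility API typically means creating a DBRP mapping and a v1-compatible authorization with the influx CLI. A hedged sketch (the bucket ID, database name and username are placeholders; exact commands may vary by version, so consult the InfluxDB docs):

$ influx v1 dbrp create --db benchmark --rp autogen --bucket-id <bucket-id> --default
$ influx v1 auth create --read-bucket <bucket-id> --username v1user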

Before using it

It is recommended that you have an InfluxDB database with some data. For testing purposes you can check out the TSBS Data Loader Tool, which is part of the Time Series Benchmark Suite. It can generate large amounts of data to load into InfluxDB. Data can be generated with one command (just specify the format as 'influx') and then loaded in with another.
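
A hedged sketch of that two-step flow (TSBS flag names may differ between versions; check each tool's --help output):

$ tsbs_generate_data --use-case=cpu-only --format=influx --scale=10 --timestamp-start=2019-01-01T00:00:00Z --timestamp-end=2019-01-02T00:00:00Z --log-interval=10s > /tmp/influx-data
$ tsbs_load_influx --file=/tmp/influx-data --urls=http://localhost:8086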

Connection params

Detailed information about how to pass the connection parameters to Outflux can be found at the bottom of this document at the Connection section.

Schema Transfer

The Outflux CLI has two commands. The first is schema-transfer. This command discovers the schema of an InfluxDB database, or of specific measurements in an InfluxDB database, and (depending on the strategy selected) creates or verifies a TimescaleDB database that can hold the data.

The possible flags for the command can be seen by running:

$ cd $GOPATH/bin/
$ ./outflux schema-transfer --help

Usage is outflux schema-transfer database [measure1 measure2 ...] [flags], where database is the name of the InfluxDB database you wish to export. [measure1 ...] are optional; if specified, only those measurements from the selected database are exported. Additionally, you can specify the retention policy with the retention-policy flag.

For example outflux schema-transfer benchmark cpu mem will discover the schema for the cpu and mem measurements from the benchmark database.
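
To target a non-default retention policy (a sketch; it assumes a retention policy named 'two_weeks' exists in the database):

$ ./outflux schema-transfer benchmark cpu --retention-policy=two_weeks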

Available flags for schema-transfer are:

| flag | type | default | description |
| --- | --- | --- | --- |
| input-server | string | http://localhost:8086 | Location of the input database, http(s)://location:port |
| input-pass | string | | Password to use when connecting to the input database |
| input-user | string | | Username to use when connecting to the input database |
| input-unsafe-https | bool | false | Should 'InsecureSkipVerify' be passed to the input connection |
| retention-policy | string | autogen | The retention policy to select the tags and fields from |
| output-conn | string | sslmode=disable | Connection string to use to connect to the output database |
| output-schema | string | | The schema of the output database that the data will be inserted into |
| schema-strategy | string | CreateIfMissing | Strategy to use for preparing the schema of the output database. Valid options: ValidateOnly, CreateIfMissing, DropAndCreate, DropCascadeAndCreate |
| tags-as-json | bool | false | If this flag is set to true, then the tags of the influx measures being exported will be combined into a single JSONB column in Timescale |
| tags-column | string | tags | When tags-as-json is set, this column specifies the name of the JSON column for the tags |
| fields-as-json | bool | false | If this flag is set to true, then the fields of the influx measures being exported will be combined into a single JSONB column in Timescale |
| fields-column | string | fields | When fields-as-json is set, this column specifies the name of the JSON column for the fields |
| multishard-int-float-cast | bool | false | If a field is Int64 in one shard, and Float64 in another, with this flag it will be cast to Float64 despite possible data loss |
| quiet | bool | false | If specified will suppress any log to STDOUT |
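
For example, a hypothetical invocation combining several of these flags (the database, measurement and column names are illustrative):

$ ./outflux schema-transfer benchmark cpu \
> --tags-as-json \
> --tags-column=tagset \
> --schema-strategy=ValidateOnly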

Migrate

The second command of the Outflux CLI is migrate. The possible flags for the command can be seen by running:

$ cd $GOPATH/bin/
$ ./outflux migrate --help

Usage is outflux migrate database [measure1 measure2 ...] [flags], where database is the name of the InfluxDB database you wish to export. [measure1 measure2 ...] are optional; if specified, only those measurements from the selected database are exported.

The retention policy can be specified with the retention-policy flag. By default, the 'autogen' retention policy is used.

For example outflux migrate benchmark cpu mem will export the cpu and mem measurements from the benchmark database. On the other hand outflux migrate benchmark will export all measurements in the benchmark database.

Available flags are:

| flag | type | default | description |
| --- | --- | --- | --- |
| input-server | string | http://localhost:8086 | Location of the input database, http(s)://location:port |
| input-pass | string | | Password to use when connecting to the input database |
| input-user | string | | Username to use when connecting to the input database |
| input-unsafe-https | bool | false | Should 'InsecureSkipVerify' be passed to the input connection |
| retention-policy | string | autogen | The retention policy to select the data from |
| limit | uint64 | 0 | If specified will limit the exported points to its value. 0 = NO LIMIT |
| from | string | | If specified will export data with a timestamp >= its value. Accepted format: RFC3339 |
| to | string | | If specified will export data with a timestamp <= its value. Accepted format: RFC3339 |
| output-conn | string | sslmode=disable | Connection string to use to connect to the output database |
| output-schema | string | public | The schema of the output database that the data will be inserted into |
| schema-strategy | string | CreateIfMissing | Strategy to use for preparing the schema of the output database. Valid options: ValidateOnly, CreateIfMissing, DropAndCreate, DropCascadeAndCreate |
| chunk-size | uint16 | 15000 | The export query will request data in chunks of this size. Must be > 0 |
| batch-size | uint16 | 8000 | The size of the batch inserted into the output database |
| data-buffer | uint16 | 15000 | Size of the buffer holding exported data ready to be inserted into the output database |
| max-parallel | uint8 | 2 | Number of parallel measure extractions. One InfluxDB measure is exported using 1 worker |
| rollback-on-external-error | bool | true | If set, the insertion will be rolled back when an error occurs while extracting the data. Otherwise it will try to commit |
| tags-as-json | bool | false | If this flag is set to true, then the tags of the influx measures being exported will be combined into a single JSONB column in Timescale |
| tags-column | string | tags | When tags-as-json is set, this column specifies the name of the JSON column for the tags |
| fields-as-json | bool | false | If this flag is set to true, then the fields of the influx measures being exported will be combined into a single JSONB column in Timescale |
| fields-column | string | fields | When fields-as-json is set, this column specifies the name of the JSON column for the fields |
| multishard-int-float-cast | bool | false | If a field is Int64 in one shard, and Float64 in another, with this flag it will be cast to Float64 despite possible data loss |
| quiet | bool | false | If specified will suppress any log to STDOUT |

Examples

  • Use environment variables to determine the output DB connection
$ PGPORT=5433
$ PGDATABASE=test
$ PGUSER=test
...
$ ./outflux schema-transfer benchmark
  • Export the complete 'benchmark' database on 'localhost:8086' to the 'targetdb' database on localhost:5432, using an environment variable to set the InfluxDB password
$ PGDATABASE=some_default_db
$ INFLUX_PASSWORD=test
...
$ outflux migrate benchmark \
> --input-user=test \
> --input-pass=test \
> --output-conn='dbname=targetdb user=test password=test'
  • Export only measurement 'cpu' from the 'two_week' retention policy in the 'benchmark' database. Drop the existing 'two_week.cpu' table in 'targetdb' if it exists, create it if not
$ outflux migrate benchmark two_week.cpu \
> --input-user=test \
> --input-pass=test \
> --output-conn='dbname=targetdb user=test password=test' \
> --schema-strategy=DropAndCreate
  • Export at most 1,000,000 rows from measurements 'cpu' and 'mem' in the 'benchmark' database, starting from Jan 1st 2019 09:00 UTC
$ ./outflux migrate benchmark cpu mem \
> --input-user=test \
> --input-pass=test \
> --limit=1000000 \
> --from=2019-01-01T09:00:00Z

Connection

TimescaleDB connection params

The connection parameters to the TimescaleDB instance can be passed to Outflux in several ways. One is through the Postgres environment variables. Supported environment variables are: PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD, PGSSLMODE, PGSSLCERT, PGSSLKEY, PGSSLROOTCERT, PGAPPNAME, PGCONNECT_TIMEOUT. If they are not specified, the defaults used are: host=localhost, dbname=postgres, user=$USER, and sslmode=disable.

The values of the environment variables can be overridden by specifying the '--output-conn' flag when executing Outflux.

The connection string can be in URI or DSN format:

  • example URI: "postgresql://username:password@host:port/dbname?connect_timeout=10"
  • example DSN: "user=username password=password host=1.2.3.4 port=5432 dbname=mydb sslmode=disable"
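
For example, these two invocations should be equivalent ways of pointing Outflux at the same output database (the host, credentials and database name are illustrative):

$ outflux migrate benchmark --output-conn='postgresql://test:test@localhost:5432/targetdb'
$ outflux migrate benchmark --output-conn='user=test password=test host=localhost port=5432 dbname=targetdb'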

InfluxDB connection params

The connection parameters to the InfluxDB instance can also be passed through flags or environment variables. Supported/expected environment variables are: INFLUX_USERNAME, INFLUX_PASSWORD. These are the same environment variables that the InfluxDB CLI uses.

If they are not set, or if you wish to override them, you can do so with the --input-user and --input-pass flags. You can also tell Outflux to skip HTTPS certificate verification when communicating with the InfluxDB server by setting the --input-unsafe-https flag to true.
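
For example (a sketch; the credentials are illustrative):

$ INFLUX_USERNAME=test INFLUX_PASSWORD=test ./outflux migrate benchmark
$ ./outflux migrate benchmark --input-user=test --input-pass=test --input-unsafe-https=true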

Known limitations

Fields with different data types across shards

Outflux doesn't support fields that have the same name but different data types across shards in InfluxDB, unless the field is stored as an integer in some shards and a float in others. InfluxDB can store fields as integer (64-bit integer), float (64-bit float), string, or boolean. For the integer/float case you can specify the multishard-int-float-cast flag, which tells Outflux to cast the integer values to floats. A 64-bit float can't represent all int64 values exactly, so this may produce inexact data for values > 2^53.

If the same field is of any of the other possible InfluxDB types, an error will be thrown, since the values can't be converted.

This is also an issue if you select a time interval in which a field has a consistent type, but the field exists as a different type in a shard outside of that interval. This is because SHOW FIELD KEYS FROM measurement_name doesn't accept a time interval to narrow the check.
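
As a concrete illustration of the precision loss, the round trip below can be run in psql against any PostgreSQL instance (2^53 + 1 is the first integer a 64-bit float cannot represent exactly):

SELECT 9007199254740993::bigint::float8::bigint;
-- returns 9007199254740992, not 9007199254740993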

outflux's People

Contributors

atanasovskib, freeznet, jamesguthrie, jamessewell, mfreed, petetnt, robatticus, toanalien


outflux's Issues

InfluxDB crashes because of high memory consumption

This is a problem with InfluxDB. I imagine outflux executes something like SELECT * FROM measurement. Queries like this are documented to exhaust memory resources on InfluxDB hosts (influxdata/influxdb#9313).
I worked around this issue by provisioning 40GB of RAM instead of the usual 8GB on my Google VM; memory consumption during export of a measurement with around 35M metrics reached 25GB.
As documented in the influx issue, using limits and offsets wouldn't work. What I've seen make a difference is using date ranges.

I understand this is not an outflux issue, but solving it would help facilitate the transition from influx to timescale

Reserved SQL names

Some fields in Influx might be labeled with reserved SQL keywords, and should probably be quoted when creating the DDL.

./outflux migrate farmusage --input-server http://metrics-long.laika.com:8086 --output-conn 'xxx' active_tasks
2019/02/22 10:49:18 Selected input database: farmusage
2019/02/22 10:49:18 Selected measurements for exporting: [active_tasks]
2019/02/22 10:49:18 Extracted data sets schema. Preparing output database
2019/02/22 10:49:18 Overriding PG envrionment variables for connection with: dbname=...
2019/02/22 10:49:18 Selected Schema Strategy: CreateIfMissing
2019/02/22 10:49:18 CreateIfMissing strategy: Table active_tasks does not exist. Creating
2019/02/22 10:49:18 Creating table with:
 CREATE TABLE active_tasks(time TIMESTAMP, collation TEXT, netgroup TEXT, owner TEXT, sequence TEXT, shot TEXT, value BIGINT)
2019/02/22 10:49:18
---------------------------------------------
Migration finished with errors:
could not prepare output data set 'active_tasks'
ERROR: syntax error at or near "collation" (SQLSTATE 42601)

Sure enough:

db01> CREATE TABLE active_tasks(time TIMESTAMP, collation TEXT, netgroup TEXT, owner TEXT, sequence TEXT, shot TEXT, value BIGINT);
ERROR:  syntax error at or near "collation"
LINE 1: CREATE TABLE active_tasks(time TIMESTAMP, collation TEXT, ne...

vs

db01> CREATE TABLE active_tasks(time TIMESTAMP, "collation" TEXT, netgroup TEXT, owner TEXT, sequence TEXT, shot TEXT, value BIGINT);
CREATE TABLE
Time: 3.127 ms

That'll probably be a similar case for a lot of keywords, e.g. user, select, etc. Might make more sense to just quote all column names in the CREATE TABLE.

duplicate column names found: index_name

When I try to transfer schema, I get:

outflux schema-transfer sun measures --input-server=https://influx.company.com --output-conn="dbname=sun user=julien password=sun" --schema-strategy=DropAndCreate
2020/08/12 08:42:53 Selected input database: sunchain
2020/08/12 08:42:53 Overriding PG environment variables for connection with: dbname=sun user=julien password=sun
2020/08/12 08:42:53 pipe_measures starting execution
2020/08/12 08:42:53 Discovering influx schema for measurement: measures
2020/08/12 08:42:54 could not transfer schema for measurement 'measures'
pipe_measures: could not prepare extractor
pipe_measures_ext: could not fetch data set definition for measure: measures
duplicate column names found: index_name

Yet, it is true: I have index_name and index_name_1 in my measurement. But as it is not possible to drop a field in Influx, how should I deal with it? It would be no problem to import it, and then I would be able to delete the duplicated column in Postgres.

isHypertableQueryTemplate incorrect for docker image timescale/timescaledb:2.0.0-pg12

Hi,
While migrating the data from InfluxDB I can transfer the schema correctly using outflux schema-transfer, but outflux migrate with the schema strategy CreateIfMissing fails with the following error:

2021/01/26 23:57:39 pipe_post: pipe_post: could not prepare ingestor
could not check if table post is hypertable

Since the table post is a hypertable, I dug into the code and found that the failing validation uses the isHypertableQueryTemplate query, which is:

SELECT EXISTS (
    SELECT 1
    FROM timescaledb_information.hypertable
    WHERE table_schema = 'public' AND table_name='post')

However, the hypertable table is not in the timescaledb_information schema; it is in _timescaledb_catalog.

So currently the only option for us to use outflux is to use the schema strategy DropAndCreate.

Thanks.
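
For reference, the equivalent check against the TimescaleDB 2.x information views would look something like this (a sketch, assuming the 2.x view and column names):

SELECT EXISTS (
    SELECT 1
    FROM timescaledb_information.hypertables
    WHERE hypertable_schema = 'public' AND hypertable_name = 'post');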

Error with Retention Policy in migration

Running the outflux command I get an error:

2021/05/03 16:14:43 Will ignore measurement 'thread_count' because:
error fetching fields for measurement 'thread_count'
error executing query: SHOW FIELD KEYS FROM "autogen"."thread_count"
retention policy not found: autogen

The command I am executing is:
./outflux migrate mydb --input-server https://testme:1234 --input-user testme --input-pass testme --output-conn='host=testme2 port=1234 dbname=migrate user=test password=test sslmode=verify-full'

What am I doing wrong? How should I use the retention policy?

PS: In several places the README uses output-con, which is wrong.

Unable to Migrate - show tag keys from query is failing...

Outflux is trying to execute the query SHOW TAG KEYS FROM .......
I believe it should be SHOW TAG KEYS ON database. I have tried the full query as SHOW TAG KEYS ON "database"."measurement" and it fails on my InfluxDB.

Hopefully I am making sense. Please let me know. Thank you.

Does this tool work with InfluxDB2?

I'm trying to figure out if this tool works to migrate data/schema from InfluxDB2 instances.

Has anyone managed to make it work? I'm stuck at:
2023/04/24 15:55:49 No measurements explicitly specified. Discovering automatically

Migrating same name fields from a measurement

I just ran into an edge case where InfluxDB has different data types for the same field name within different shards. This causes a problem with outflux, and the migration for the measurement fails.

More on why and how InfluxDB does this can be found here.

2019/07/17 23:47:34 Discovering influx schema for measurement: M1
2019/07/17 23:47:34 pipe_M1: pipe_M1: could not prepare extractor
pipe_M1_ext: could not fetch data set definition for measure: M1
duplicate column names found: signal

This is happening because the signal field type is a float and an integer. InfluxDB:

> show field keys from M1;
name: M1
fieldKey         fieldType
--------         ---------
...
signal           float
signal           integer
...

I'm not sure what the best way to handle this could be. Some ideas that come to mind:

  1. Ignore fields that exhibit this behavior and warn the user, but migrate the rest of the measurement. Currently the entire measurement migration fails.
  2. Keep (merge) the duplicate field names as a single column in TimescaleDB with the highest-precision data type, and 'promote' the rest to it during the migration (per the above-mentioned InfluxDB explanation), i.e. select signal::float from M1.

Data Structure in influxdb and timescaledb are different

In my influx measurement, the timestamp tag data type is timestamp, while in timescale it is timestamp with timezone.
Also, there is one more field which is of type string separated by commas, and I want to convert it into an array in timescale.

Please help me: in which file should I make the required changes, and how do I run the code for the same?

Adding an "append" option to the --schema-strategy

Is it possible to add support for an append option for the --schema-strategy parameter?

Example of what the syntax would look like if implemented --schema-strategy=Append

./outflux migrate dbmetrics cpu disk mem processes system --output-conn='dbname=dbmetrics user=monitor password=monitor' --schema-strategy=Append

There doesn't seem to be an existing option to accomplish that.

expected json response, got "text/plain" - but with JSON response body

I tried to test TimescaleDB by importing some data from InfluxDB, but the error below happens. The values in the response are what I have in InfluxDB; the only problem seems to be the Content-Type on the response. How should I proceed?

influx version
Influx CLI 2.0.3 (git: fe04d346df) build_date: 2020-12-15T01:00:16Z
./outflux --version
Outflux version 0.2.0
Run 'outflux --help' for usage
2021/01/26 11:21:44 Discovering influx schema for measurement: http_get
2021/01/26 11:21:44 could not transfer schema for measurement 'http_get'
pipe_http_get: could not prepare extractor
pipe_http_get_ext: could not fetch data set definition for measure: http_get
could not discover the tags of measurement 'http_get'
error fetching tags for measurement 'http_get'
error executing query: SHOW TAG KEYS FROM "autogen"."http_get"
expected json response, got "text/plain", with status: 200 and response body: "{\"results\":[{\"statement_id\":0,\"series\":[{\"name\":\"http_get\",\"columns\":[\"tagKey\"],\"values\":[[\"addr\"],[\"host\"],[\"ip\"],[\"site\"]]}]}]}\n"

Error fetching fields when GUIDs are involved

We have issues running outflux with our current database structure. We have a large number of measurements, each named with a unique GUID (example: "015fab01-477f-40a1-99fa-cd020b7e8bf5"), and each measurement contains a single tag column holding GUIDs that are unique for each time series (typically somewhere between 1 and 50 per measurement, and not the same as the measurement name). Finally, there is a field column named "value".

The output from outflux is
field keys query returned unexpected result. no values returned for measure '01111498-68f7-4512-86fe-2de98a8d67bf'
2020/08/24 15:05:50 Will ignore measurement '0120704f-ddc2-488f-84cf-86ee9b5abe96' because:
error fetching fields for measurement '0120704f-ddc2-488f-84cf-86ee9b5abe96'

And as it only says "unexpected result", we have trouble diagnosing it further.

cannot convert time column with nanoseconds

I am trying to use outflux to migrate InfluxDB data to TimescaleDB, but after transferring, the time column only has ms precision instead of ns. According to the TimescaleDB docs, the time column should be a long int; how can I create the hypertable with outflux to support ns?

outflux fails if pg user not superuser

If outflux migrate is run with a postgres user which is the owner of the destination database, but not super-user, the following message is output:

ERROR: permission denied for relation telemetry_metadata (SQLSTATE 42501)

and the procedure is aborted.

expected behaviour:

migration should succeed even without super-user privileges, without altering the telemetry table.

Outflux to compressed timeseries (with Timescale 1.5)

Hello,

The new column-oriented compression now makes TimescaleDB a viable alternative to InfluxDB. It would help, however, to have outflux directly handle the compression. At the moment, in my trials, I run a command similar to this one after running outflux migrate for each of my measurements:

ALTER TABLE co2_concentration SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'factory_location,sensor_location'
);
SELECT add_compress_chunks_policy('co2_concentration', INTERVAL '1 day');

It's pretty slow though: data is written to disk in uncompressed format and then compressed, IO is a bit of a bottleneck, and I have to process measurements one by one to avoid running out of disk space.

Would be nice if outflux could insert directly in a compressed format.

PS: the "compress_segmentby" option in my use case is really easy to figure out, it's simply the tags for the measure

show tag keys on my_database FROM "{measure}"

I suspect this would be a good enough default for outflux?
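
For reference, on TimescaleDB 2.x add_compress_chunks_policy was replaced by add_compression_policy, so the equivalent post-migration step would be (a sketch using the same illustrative table and tags):

ALTER TABLE co2_concentration SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'factory_location,sensor_location'
);
SELECT add_compression_policy('co2_concentration', INTERVAL '1 day');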

migrate command fails on tables with ~ 1300 columns

I manage many InfluxDB databases with scientific data. Some measurements have thousands of fields. I migrated the schema for a measurement with 1335 fields successfully, but when I did the data migration, the default procedure threw this error:

# 2019/10/25 14:56:23 pipe_lunch.gse_ext:Pulling chunks with size 15000
# 2019/10/25 14:56:23 'pipe_lunch.gse_ext': extractor 'pipe_lunch.gse_ext' could not execute a chunked query.
# expected json response, got "text/html", with status: 414 and response body: "<html>\r\n<head><title>414 Request-URI Too Large</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>414 Request-URI Too Large</h1></center>\r\n</body>\r\n</html>\r\n"

Is there a different way to migrate a measurement like mine?

Could not open connection to TimescaleDB server

[root@influx outflux]# ./outflux schema-transfer testdb measurement1 --input-server=http://localhost:8086 --output-conn="dbname=postgres user=postgres"
2021/07/22 16:43:19 Selected input database: testdb
2021/07/22 16:43:19 Overriding PG environment variables for connection with: sslmode=disable
2021/07/22 16:43:19 could not open connections to input and output database
could not open connection to TimescaleDB Server
FATAL: role "root" does not exist (SQLSTATE 28000)
[root@influx outflux]#

Both InfluxDB and TimescaleDB are installed on the same server. The "dbname=postgres user=postgres" couldn't be wrong. Why can't outflux connect to TimescaleDB?

unknown authentication type: 10

I just tried it out and I get this error... I also realised the last release is from 2019, and the last commit was 2 years ago.

./outflux schema-transfer database123
2022/01/06 07:20:17 Selected input database: database123
2022/01/06 07:20:17 Overriding PG environment variables for connection with: sslmode=disable
2022/01/06 07:20:17 could not open connections to input and output database
could not open connection to TimescaleDB Server
unknown authentication type: 10

Outflux fails silently when Influxdb server has query-timeout defined

Running outflux, it consistently stops after about 900 seconds:

2021/12/08 16:06:35 pipe_479_ext: Extracted 83655000 rows from Influx
2021/12/08 16:06:35 pipe_479_ext: Extracted 83670000 rows from Influx
2021/12/08 16:06:35 pipe_479_ing: Complete. Inserted 83670000 rows.
2021/12/08 16:06:35 All pipelines finished
2021/12/08 16:06:35 Migration execution time: 940.285 seconds

But, it's far from complete. It managed to load about a day of data, but it should load 6 months.

Our Influx server has a query-timeout of 900 seconds, so this seems to be because of that.

I can't even work around it by batching using multiple calls to outflux, because only DropAndCreate works, since querying the TimescaleDB version 2 schema fails.

Expected/requested behavior:

  1. At least detection of the error. According to the documentation, Influx 1 should say: ERR: query timeout reached.
  2. Ability to batch the reading in x number of rows, or time range, etc.

Outflux version 0.2.0

Allow unsafe HTTPS to InfluxDB

InfluxDB's documentation on HTTPS setup suggests setting up HTTPS with a self-signed certificate.
Outflux doesn't allow self-signed certificates, but allowing them would be easily done by adding an option flag --insecureSSL and passing InsecureSkipVerify: insecure to

clientConfig := influx.HTTPConfig{Addr: params.Server, Username: user, Password: pass}

workaround:

If you have a working Go toolchain, you can work around this limitation by fetching and modifying the code:

go get github.com/timescale/outflux
cd $GOPATH/src/github.com/timescale/outflux
nano internal/connections/influx_connection.go

replacing line 54 (HTTPConfig creation) with the following

clientConfig := influx.HTTPConfig{Addr: params.Server, Username: user, Password: pass, InsecureSkipVerify: true}

and then finally compiling and installing outflux with:

dep ensure -v
cd cmd/outflux
go install 

The binary program will be in $GOPATH/bin
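
Note: the flag tables earlier in this document list an --input-unsafe-https flag, which addresses this request without patching the source, e.g.:

$ ./outflux migrate benchmark --input-server=https://localhost:8086 --input-unsafe-https=true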

A migration error occurred when additionally migrating Influx Data containing a boolean type.

The following measurement from InfluxDB was used.
name: kensho001

fieldKey    fieldType
--------    ---------
test_bool   boolean
test_float  float
test_int    integer
test_string string

procedure:

  1. Create an empty table kensho001 in PostgreSQL.
  2. Data was transferred to kensho001 with Outflux. → Successful migration
2019/08/02 10:37:31 All pipelines scheduled
2019/08/02 10:37:31 Overriding PG environment variables for connection with: user=postgres password=postgres host=xxx.xxx.xxx.xxx  port=5432 dbname=postgres
2019/08/02 10:37:31 pipe_kensho001 starting execution
2019/08/02 10:37:31 Discovering influx schema for measurement: kensho001
2019/08/02 10:37:31 Discovered: DataSet { Name: kensho001, Columns: [Column { Name: time, DataType: IDRFTimestamp} Column { Name: product_no, DataType: String} Column { Name: test_bool, DataType: Boolean} Column { Name: test_float, DataType: Double} Column { Name: test_int, DataType: Integer64} Column { Name: test_string, DataType: String}], Time Column: time }
2019/08/02 10:37:31 Selected Schema Strategy: CreateIfMissing
2019/08/02 10:37:31 CreateIfMissing strategy: Table kensho001 does not exist. Creating
2019/08/02 10:37:31 Creating table with:
 CREATE TABLE "kensho001"("time" TIMESTAMP, "product_no" TEXT, "test_bool" BOOLEAN, "test_float" FLOAT, "test_int" BIGINT, "test_string" TEXT)
2019/08/02 10:37:31 Preparing TimescaleDB extension:
CREATE EXTENSION IF NOT EXISTS timescaledb
2019/08/02 10:37:31 Creating hypertable with: SELECT create_hypertable('"kensho001"', 'time');
2019/08/02 10:37:31 Installation metadata table doesn't exist in this TimescaleDB version
2019/08/02 10:37:31 Starting extractor 'pipe_kensho001_ext' for measure: kensho001
2019/08/02 10:37:31 pipe_kensho001_ext: Extracting data from database 'testdb'
2019/08/02 10:37:31 pipe_kensho001_ext: SELECT "time", "product_no", "test_bool", "test_float", "test_int", "test_string" FROM "autogen"."kensho001" WHERE time >= '2019-08-02T01:35:17.512358912Z' AND  time <= '2019-08-02T01:35:17.512358912Z'
2019/08/02 10:37:31 pipe_kensho001_ext:Pulling chunks with size 15000
2019/08/02 10:37:31 Starting data ingestor 'pipe_kensho001_ing'
2019/08/02 10:37:31 Will batch insert 8000 rows at once. With commit strategy: CommitOnEachBatch
2019/08/02 10:37:31 pipe_kensho001_ext: Extracted 1 rows from Influx
2019/08/02 10:37:31 pipe_kensho001_ing: Complete. Inserted 1 rows.
2019/08/02 10:37:31 All pipelines finished
2019/08/02 10:37:31 Migration execution time: 0.066 seconds
  3. Added measurement data to kensho001. → Migration error
2019/08/02 10:38:45 All pipelines scheduled
2019/08/02 10:38:45 Overriding PG environment variables for connection with: user=postgres password=postgres host=xxx.xxx.xxx.xxx  port=5432 dbname=postgres
2019/08/02 10:38:45 pipe_kensho001 starting execution
2019/08/02 10:38:45 Discovering influx schema for measurement: kensho001
2019/08/02 10:38:45 Discovered: DataSet { Name: kensho001, Columns: [Column { Name: time, DataType: IDRFTimestamp} Column { Name: product_no, DataType: String} Column { Name: test_bool, DataType: Boolean} Column { Name: test_float, DataType: Double} Column { Name: test_int, DataType: Integer64} Column { Name: test_string, DataType: String}], Time Column: time }
2019/08/02 10:38:45 Selected Schema Strategy: CreateIfMissing
2019/08/02 10:38:45 pipe_kensho001: pipe_kensho001: could not prepare ingestor
existing table in target db is not compatible with required. Required column test_bool of type Boolean is not compatible with existing type Unknown
2019/08/02 10:38:45 All pipelines finished
2019/08/02 10:38:45 Migration execution time: 0.047 seconds
2019/08/02 10:38:45 Migration finished with errors:
pipe_kensho001: could not prepare ingestor
existing table in target db is not compatible with required. Required column test_bool of type Boolean is not compatible with existing type Unknown

migrate crashes due to 0x00 character in string field

While running migrate, the process crashed with the following message:

2020/01/14 21:10:06 pipe_device_monitor_V2_ext: Extracted 4785000 rows from Influx
2020/01/14 21:10:06 pipe_device_monitor_V2_ext: Extracted 4800000 rows from Influx
2020/01/14 21:10:07 pipe_device_monitor_V2_ext: Extracted 4815000 rows from Influx
2020/01/14 21:10:08 pipe_device_monitor_V2_ext: Extracted 4830000 rows from Influx
2020/01/14 21:10:09 pipe_device_monitor_V2_ext: Extracted 4845000 rows from Influx
2020/01/14 21:10:10 pipe_device_monitor_V2_ext: Extracted 4860000 rows from Influx
2020/01/14 21:10:10 pipe_device_monitor_V2_ing could not insert batch of rows in output db
ERROR: invalid byte sequence for encoding "UTF8": 0x00 (SQLSTATE 22021)
2020/01/14 21:10:11 pipe_device_monitor_V2_ext: Extracted 4875000 rows from Influx
^C

The migration stalled and did not exit after that error was thrown; I had to ^C out.

Would it be advisable / possible to add a flag indicating that these characters be stripped out during the migrate process?

unable to migrate with tags with blanks or dashes

I get this error: error executing query: SHOW TAG KEYS FROM current load
but running SHOW TAG KEYS FROM "current load" works.

On another database I get this:
pipe_swap-pages_ext: could not fetch data set definition for measure: swap-pages
could not discover the tags of measurement 'swap-pages'
error fetching tags for measurement 'swap-pages'
error executing query: SHOW TAG KEYS FROM swap-pages

Can't install - cannot find package ...

Hi,

when I run installing command:
GO111MODULE=auto go install

I got errors about a number of packages not being found.
I don't have experience with Go at all, so maybe there is something wrong with my installation?

Thanks.

Errors:
migrate.go:10:2: cannot find package "github.com/influxdata/influxdb/client/v2" in any of:
/usr/lib/go-1.10/src/github.com/influxdata/influxdb/client/v2 (from $GOROOT)
/home/oldrich/go/src/github.com/influxdata/influxdb/client/v2 (from $GOPATH)
migrate.go:11:2: cannot find package "github.com/spf13/cobra" in any of:
/usr/lib/go-1.10/src/github.com/spf13/cobra (from $GOROOT)
/home/oldrich/go/src/github.com/spf13/cobra (from $GOPATH)
app_context.go:4:2: cannot find package "github.com/timescale/outflux/internal/cli" in any of:
/usr/lib/go-1.10/src/github.com/timescale/outflux/internal/cli (from $GOROOT)
/home/oldrich/go/src/github.com/timescale/outflux/internal/cli (from $GOPATH)
...
...
...

Additional debug output on schema validation

When trying to move an entire Influx database, if a specific measurement is in a bad way, it'd be nice to know which measurement has the problem.

./outflux schema-transfer farmusage --input-server http://metrics-long.laika.com:8086 --schema-strategy DropCascadeAndCreate --output-conn 'xxx'
2019/02/22 10:44:24 Selected input database: farmusage
2019/02/22 10:44:24 All measurements selected for exporting
2019/02/22 10:44:25 Couldn't discover the database schema
duplicate column names found: capacity
2019/02/22 10:44:25 could not discover the input database schema
duplicate column names found: capacity

In this case, "farmusage" has maaany measurements, so trying to discover which had a duplicate capacity field was unpleasant.

Nevermind how Influx even allows such a thing to be possible, but setting that aside -- having the outflux tool include the measurement name on error would be bueno.

Entry point to run the source code

Which is the entry point file to run the code?
While installing from source code, the documentation says to run dep ensure -v, but there is no .toml file in the current project.

When I tried to run the main.go file in the cmd/outflux folder, I got the below error
# command-line-arguments ./main.go:6:2: undefined: RootCmd

When I tried to run the root.go file in the cmd/outflux folder, I got the below error
../../internal/cli/pipe_service.go:6:2: cannot find package "github.com/influxdata/influxdb/client/v2" in any of:
/usr/local/go/src/github.com/influxdata/influxdb/client/v2 (from $GOROOT)
/home/komal/Documents/work/go_projects/src/github.com/influxdata/influxdb/client/v2 (from $GOPATH)
../../internal/connections/pgx_mock.go:3:8: cannot find package "github.com/jackc/pgx" in any of:
/usr/local/go/src/github.com/jackc/pgx (from $GOROOT)
/home/komal/Documents/work/go_projects/src/github.com/jackc/pgx (from $GOPATH)
root.go:7:2: cannot find package "github.com/spf13/cobra" in any of:
/usr/local/go/src/github.com/spf13/cobra (from $GOROOT)
/home/komal/Documents/work/go_projects/src/github.com/spf13/cobra (from $GOPATH)
../../internal/cli/flagparsers/connection_args_parser.go:6:2: cannot find package "github.com/spf13/pflag" in any of:
/usr/local/go/src/github.com/spf13/pflag (from $GOROOT)
/home/komal/Documents/work/go_projects/src/github.com/spf13/pflag (from $GOPATH)

Dot (.) in InfluxDB measurement name is interpreted wrong

Hi

I am testing Outflux to see if Timescale can offer an alternative to InfluxDB, but our measurements in InfluxDB often have dots in the measurement name (example: VS.AsrRxMalformedPkts), and this causes Outflux to fail. Outflux interprets "VS" as the retention policy and "AsrRxMalformedPkts" as the measurement name, and the resulting query fails:

2019/07/01 08:39:18 Selected input database: spgw
2019/07/01 08:39:18 Overriding PG environment variables for connection with: dbname=spgw user=postgres
2019/07/01 08:39:18 pipe_VS.AsrRxMalformedPkts starting execution
2019/07/01 08:39:18 Discovering influx schema for measurement: VS.AsrRxMalformedPkts
2019/07/01 08:39:19 could not transfer schema for measurement 'VS.AsrRxMalformedPkts'
pipe_VS.AsrRxMalformedPkts: could not prepare extractor
pipe_VS.AsrRxMalformedPkts_ext: could not fetch data set definition for measure: VS.AsrRxMalformedPkts
could not discover the fields of measure 'VS.AsrRxMalformedPkts'
error fetching fields for measurement 'VS.AsrRxMalformedPkts'
error executing query: SHOW FIELD KEYS FROM "VS"."AsrRxMalformedPkts"
retention policy not found: VS

Here is the output from Influx:

> SHOW FIELD KEYS FROM "VS"."AsrRxMalformedPkts"
ERR: retention policy not found: VS
> SHOW FIELD KEYS FROM "VS.AsrRxMalformedPkts"
name: VS.AsrRxMalformedPkts
fieldKey fieldType
-------- ---------
value    float
>

If there is a dot in the measurement name, the query has to look like this:
SHOW FIELD KEYS FROM "VS.AsrRxMalformedPkts"
