
pg_chameleon's Introduction


pg_chameleon is a MySQL to PostgreSQL replica system written in Python 3. The system uses the mysql-replication library to pull the row images from MySQL, which are stored in PostgreSQL as JSONB. A pl/pgsql function decodes the JSONB values and replays the changes against the PostgreSQL database.
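As a rough sketch of the replay side: the row images land in the table sch_chameleon.t_log_replica and are consumed by the function sch_chameleon.fn_process_batch (both objects appear in the issues below). The invocation here is illustrative only; the meaning of the two integer arguments (replica batch size and source id) is an assumption:

-- hypothetical manual call of the replay function; in normal operation
-- the replica process invokes it for you
SELECT sch_chameleon.fn_process_batch(1000, 1);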

pg_chameleon 2.0 is available on PyPI.

The documentation is available on pgchameleon.org

Please submit your bug reports on GitHub.

Requirements

Replica host

Operating system: Linux, FreeBSD, OpenBSD
Python: CPython 3.5+

Optional packages for building the documentation

Origin database

MySQL 5.5+

Aurora MySQL 5.7+

Destination database

PostgreSQL 9.5+

Example scenarios

  • Analytics
  • Migrations
  • Data aggregation from multiple MySQL databases

Features

  • Read from multiple MySQL schemas and restore them into a target PostgreSQL database. The source and target schema names can be different.
  • Set up PostgreSQL to act as a MySQL replica.
  • Support for enumerated and binary data types.
  • Basic DDL support (CREATE/DROP/ALTER TABLE, DROP PRIMARY KEY/TRUNCATE, RENAME).
  • Discards rubbish data coming from the replica.
  • Conservative approach to the replica: tables which generate errors are automatically excluded from the replica.
  • Possibility to refresh single tables or single schemas.
  • Basic replica monitoring.
  • Detach replica from MySQL for migration support.
  • Data type override (e.g. tinyint(1) to boolean).
  • Daemonised init_replica process.
  • Daemonised replica process with two separate subprocesses, one for the read and one for the replay.
  • Rollbar integration.
  • Support for geometrical data. Requires PostGIS on the target database.
  • Minimal locking during init_replica for transactional engines (e.g. InnoDB).

Caveats

The replica requires the tables to have a primary or unique key. Tables without a primary or unique key are initialised during the init_replica process but not replicated.
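A quick way to spot such tables before starting the replica is to query MySQL's information_schema. This check is not part of pg_chameleon, and the schema name is just an example:

-- list tables in a schema that have no PRIMARY KEY or UNIQUE constraint
-- and would therefore be initialised but not replicated
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON  c.table_schema = t.table_schema
  AND c.table_name   = t.table_name
  AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.table_schema = 'sakila'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;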

The copy_max_memory is just an estimate. The average row size is extracted from MySQL's information_schema and can be outdated. If the copy process fails with a memory error, check the failing table's row length and the number of rows for each slice.
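The figures behind the estimate can be inspected directly in MySQL; a sketch of the kind of query involved (not necessarily the exact one pg_chameleon runs), with the schema name as an example:

-- rows per slice is roughly copy_max_memory / avg_row_length;
-- stale statistics lead to a stale estimate (ANALYZE TABLE refreshes them)
SELECT table_name, table_rows, avg_row_length
FROM information_schema.tables
WHERE table_schema = 'sakila';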

Python 3 is supported only from version 3.5, as required by mysql-replication.

The lag is determined using the last received event timestamp and the PostgreSQL timestamp. If MySQL is read only, the lag will increase because no replica events are coming in.

The detach replica process resets the sequences in PostgreSQL to let the database work standalone. The foreign keys from the source MySQL schema are extracted and initially created as NOT VALID, without the ON DELETE or ON UPDATE clauses. A second run tries to validate the foreign keys. If an error occurs it gets logged according to the source configuration.
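A minimal sketch of that two-step approach on the PostgreSQL side; the table, column and constraint names here are hypothetical:

-- first pass: create the key without checking existing rows
ALTER TABLE sch_sakila.payment
  ADD CONSTRAINT fk_payment_customer
  FOREIGN KEY (customer_id) REFERENCES sch_sakila.customer (customer_id)
  NOT VALID;

-- second pass: validate the constraint against the existing data
ALTER TABLE sch_sakila.payment VALIDATE CONSTRAINT fk_payment_customer;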

Setup

RPM PGDG

pg_chameleon is included in the PGDG RPM repository thanks to Devrim.

Please follow the instructions on https://www.postgresql.org/download/linux/redhat/

openSUSE Build Service

pg_chameleon is available on the openSUSE Build Service.

Currently all releases are supported except SLE_12_SP5 because of unresolved dependencies.

Virtual env setup

  • Create a virtual environment (e.g. python3 -m venv venv)
  • Activate the virtual environment (e.g. source venv/bin/activate)
  • Upgrade pip with pip install pip --upgrade
  • Install pg_chameleon with pip install pg_chameleon
  • Create a user in MySQL for the replica (e.g. usr_replica)
  • Grant the user access to the replicated database (e.g. GRANT ALL ON sakila.* TO 'usr_replica';)
  • Grant the RELOAD privilege to the user (e.g. GRANT RELOAD ON *.* TO 'usr_replica';)
  • Grant the REPLICATION CLIENT privilege to the user (e.g. GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';)
  • Grant the REPLICATION SLAVE privilege to the user (e.g. GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';). The MySQL statements are gathered into one block after this list.
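The MySQL statements from the list above, collected into one block. The CREATE USER statement, the example password, and the '%' host part are assumptions not spelled out in the list; adjust them to your environment:

-- password and host part are placeholders
CREATE USER 'usr_replica'@'%' IDENTIFIED BY 'replica_password';
GRANT ALL ON sakila.* TO 'usr_replica'@'%';
GRANT RELOAD ON *.* TO 'usr_replica'@'%';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica'@'%';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica'@'%';
FLUSH PRIVILEGES;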

Configuration directory

The system-wide install is now supported correctly.

The configuration directory is created by the command chameleon set_configuration_files in $HOME/.pg_chameleon. Inside the directory there are three subdirectories.

  • configuration is where the configuration files are stored.
  • pid is where the replica pid file is created. It can be changed in the configuration file.
  • logs is where the replica logs are saved if log_dest is file. It can be changed in the configuration file.

You should use config-example.yaml as a template for the other configuration files. Check the configuration file reference for an overview.

pg_chameleon's People

Contributors

clifff, daniel-qcode, davidqc, jovankricka-everon, nikochiko, porshkevich, rascaldan, the4thdoctor


pg_chameleon's Issues

add support for rename statement

RENAME TABLE sakila.test_partition TO sakila._test_partition_old, sakila._test_partition_new TO sakila.test_partition is not captured.
This causes an issue with the Percona online schema change.

improve support for NULLable fields

Currently the ALTER TABLE or CREATE TABLE statements parsed from MySQL always generate nullable fields.
Improve this aspect by managing the implicit defaults when replicating the DDL.

PostgreSQL can't handle NUL characters in strings, but MySQL/MariaDB can. Migration of rows fails.

When debug log level is on, the following is found:

[ERROR] - pg_lib.py (547): unexpected error when processing the row
[ERROR] - pg_lib.py (548):  - > Table: deposit_imports
[ERROR] - pg_lib.py (549):  - > Insert list: "id","deposit_date","source","record_amount","reference","code"
[ERROR] - pg_lib.py (550):  - > Insert values: ['219456', '020160624', 'H2H', '+0000000005000.00',  '             \x00\x0001234567890     '

This can be addressed by replacing the NULs in MySQL with a space, but it means the original data is changed, which we want to avoid.

Can a method be incorporated to check for NULs in MySQL data strings on replication, since this is essentially an integral part of a migration process, especially if it is not a one-off exercise?
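A sketch of the workaround mentioned above; note that it does change the source data, which the reporter wanted to avoid. The table and column names are taken from the error output and may differ:

-- MySQL: replace embedded NUL bytes with a space before migrating
UPDATE deposit_imports
SET reference = REPLACE(reference, CHAR(0), ' ')
WHERE reference LIKE CONCAT('%', CHAR(0), '%');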

improvement: run the batch ingestion process in parallel

As an improvement:
run the batch process (executing fn_process_batch) in parallel (thread/fork) with the parsing of the binary logs when using start_replica.

That would help the replica keep up with the source database.

Note: I am not asking for parallel table processing when running init_replica, but that would also help ;)

The replica process crashed.

View details in Rollbar: https://rollbar.com/the4thdoctor/pg_chameleon/items/11/

The replica process crashed.
 Configuration: {'pid_dir': '~/.pg_chameleon/pid/', 'rollbar_key': 'cedda341e2374bb88f90fc6daa6fa135', 'log_level': 'debug', 'sources': {'example2': {'db_conn': {'database': 'db_replica', 'port': '3306', 'charset': 'utf8', 'host': 'localhost', 'password': 'never_commit_passwords', 'user': 'usr_replica'}, 'lock_timeout': '120s', 'copy_max_memory': '300M', 'sleep_loop': 5, 'limit_tables': None, 'replica_batch_size': 1000, 'batch_retention': '1 day', 'out_dir': '/tmp', 'my_server_id': 100, 'schema_mappings': {'my_schema': 'dest_schema3'}, 'copy_mode': 'file', 'skip_tables': None, 'grant_select_to': ['usr_readonly'], 'create_views_in_schema': ['sch_readonly']}, 'example': {'db_conn': {'connect_timeout': 10, 'host': 'construct', 'charset': 'utf8', 'port': '3306', 'password': 'root', 'user': 'root'}, 'lock_timeout': '120s', 'copy_max_memory': '300M', 'sleep_loop': 5, 'limit_tables': None, 'replica_batch_size': 1000, 'batch_retention': '1 day', 'out_dir': '/tmp', 'my_server_id': 100, 'schema_mappings': {'sakila': 'sch_sakila'}, 'copy_mode': 'file', 'skip_tables': None, 'grant_select_to': ['usr_readonly'], 'create_views_in_schema': ['sch_readonly']}}, 'rollbar_env': 'production', 'pg_conn': {'database': 'db_replica', 'port': '5433', 'charset': 'utf8', 'host': 'construct', 'password': 'replica', 'user': 'usr_replica'}, 'log_dir': '~/.pg_chameleon/logs/', 'log_days_keep': 10, 'log_dest': 'file', 'type_override': {'tinyint(1)': {'override_tables': ['*'], 'override_to': 'boolean'}}}

Can I manually specify the binlog file and location?

In addition to deploying a standalone cascading replica database, can I restore an instance from a backup and then specify the binlog file and position for incremental synchronization, effectively simulating slave synchronization?
FLUSH TABLES WITH READ LOCK is not acceptable on this online database.

SQLCODE: 22008 SQLERROR: ERROR: date/time field value out of range: "0000-00-00 00:00:00"

Is there a way to "transform" those?

Jul 24 13:55:39: [ERROR] - pg_lib.py (510): b'INSERT INTO "cpc"."meta_config" ("id","key","value","created_at","updated_at") VALUES (\'1\',\'payment_overdue_limit\',\'30\',\'0000-00-00 00:00:00\',\'0000-00-00 00:00:00\');'

I understand that it should look like this:

insert into subscriptions_sub(datetime_created) values (NULLIF(your_field_name,'0000-00-00 00:00:00')::timestamp)

to be totally safe...

Escaping table names

I would like to synchronize tables whose names are reserved keywords, for example "order" and "user". Escaping them in the config file with backticks ('`') didn't work. I don't know how they are created and synchronized in PostgreSQL because I couldn't get past init_replica. I would appreciate your help, as I would like to use this solution in the near future in a fairly large production e-commerce system.
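A side note rather than a pg_chameleon feature: in PostgreSQL, reserved words work as identifiers when double-quoted; backticks are MySQL syntax and have no meaning in the YAML configuration or in PostgreSQL:

-- double quotes make reserved words usable as table names in PostgreSQL
SELECT * FROM "order";
SELECT * FROM "user";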

MySQL changes not reflected in Postgres

Hi,

I'm test driving pg_chameleon, but I've hit a snag. None of the changes in MySQL show up in Postgres (both tables and rows). I've put all the relevant information I can think of into https://gist.github.com/samplesizeofone/bf9019e8ff0dcb272405afa3c47c7d09 (pg_chameleon config, the relevant lines from my deploy, logged output, test sql, and some of sch_chameleon.*).

It looks like information is making it to sch_chameleon.t_log_replica. I'm logged in as vagrant, which should be the replica user.

Any help is very appreciated.

Thanks!

init_replica: ERROR 1270 (HY000): Illegal mix of collations

We got the following table:

 CREATE TABLE `channel_attributes` (
  `channel_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `attribute_id` bigint(20) NOT NULL,
  `value` varchar(500) CHARACTER SET utf8 NOT NULL DEFAULT 'true',
  PRIMARY KEY (`channel_id`,`attribute_id`),
  KEY `attribute_id` (`attribute_id`),
  CONSTRAINT `channel_attributes_ibfk_2` FOREIGN KEY (`attribute_id`) REFERENCES `channel_attributes_domain` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

This table produces the following issue when running init_replica:

Jul 17 09:46:49: [INFO] - mysql_lib.py (612): copying table channel_attributes
Jul 17 09:46:49: [DEBUG] - mysql_lib.py (615): estimating rows in channel_attributes
Jul 17 09:46:49: [DEBUG] - mysql_lib.py (644): channel_attributes will be copied in 1 slices of 3054105 rows
Jul 17 09:46:49: [DEBUG] - mysql_lib.py (651): Executing query for table channel_attributes
Jul 17 09:46:49: [ERROR] - mysql_lib.py (654): error when pulling data from channel_attributes. sql executed: SELECT REPLACE(CONCAT('"',CONCAT_WS('","',COALESCE(REPLACE(`channel_id`, '"', '""'),'NULL') ,COALESCE(REPLACE(`attribute_id`, '"', '""'),'NULL') ,COALESCE(REPLACE(`value`, '"', '""'),'NULL') ),'"'),'"NULL"','NULL') as data FROM channel_attributes;
Jul 17 09:46:49: [DEBUG] - mysql_lib.py (656): Starting extraction loop for table channel_attributes
Jul 17 09:46:49: [INFO] - mysql_lib.py (692): the table channel_attributes does not exist

Running the SQL manually shows the following extra error:

mysql> SELECT REPLACE(CONCAT('"',CONCAT_WS('","',COALESCE(REPLACE(`channel_id`, '"', '""'),'NULL') ,COALESCE(REPLACE(`attribute_id`, '"', '""'),'NULL') ,COALESCE(REPLACE(`value`, '"', '""'),'NULL') ),'"'),'"NULL"','NULL') as data FROM channel_attributes;
ERROR 1270 (HY000): Illegal mix of collations (utf8_bin,NONE), (utf8_general_ci,COERCIBLE), (utf8_general_ci,COERCIBLE) for operation 'replace'

The issue is primarily with the value column: removing it from the SQL makes it work.
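A possible workaround sketch, not a confirmed pg_chameleon fix: force a single character set on the mixed-collation column before the REPLACE, for example with CONVERT:

-- MySQL: normalise the character set of the problematic column
SELECT REPLACE(CONVERT(`value` USING utf8), '"', '""') AS data
FROM channel_attributes;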

Views replication?

Hello!
Is it possible to replicate simple views?

For example, only views containing the primary key of a regular table and without a GROUP BY clause, e.g.:

CREATE VIEW USER_STATS as
	SELECT user.id, user.name, LPAD( user.sap_id,10,'0'), stats WHERE stats>0

It would be great to copy data without schema freezing.
Thanks!

Question: strangeness observed with replication

Hi!

We have observed the following:

Jul 24 08:53:09: [DEBUG] - mysql_lib.py (277): total events exceeded 200001. Writing batch.: {'Position': 678740220, 'Time': 1500649371, 'File': 'mysql-bin.000039'}
Jul 24 08:53:34: [DEBUG] - mysql_lib.py (277): total events exceeded 200000. Writing batch.: {'Position': 693897139, 'Time': 1500649947, 'File': 'mysql-bin.000039'}
Jul 24 08:53:59: [DEBUG] - mysql_lib.py (277): total events exceeded 200000. Writing batch.: {'Position': 710102852, 'Time': 1500650358, 'File': 'mysql-bin.000039'}
Jul 24 08:54:27: [DEBUG] - mysql_lib.py (277): total events exceeded 200000. Writing batch.: {'Position': 724663510, 'Time': 1500650843, 'File': 'mysql-bin.000039'}
Jul 24 08:54:51: [DEBUG] - mysql_lib.py (277): total events exceeded 200000. Writing batch.: {'Position': 740756958, 'Time': 1500651181, 'File': 'mysql-bin.000039'}
Jul 24 08:55:16: [DEBUG] - mysql_lib.py (277): total events exceeded 200000. Writing batch.: {'Position': 758713920, 'Time': 1500651730, 'File': 'mysql-bin.000039'}
Jul 24 08:55:40: [DEBUG] - mysql_lib.py (277): total events exceeded 200000. Writing batch.: {'Position': 771976313, 'Time': 1500651790, 'File': 'mysql-bin.000039'}
Jul 24 08:56:04: [DEBUG] - mysql_lib.py (277): total events exceeded 200000. Writing batch.: {'Position': 785997674, 'Time': 1500651905, 'File': 'mysql-bin.000039'}
Jul 24 08:56:26: [DEBUG] - mysql_lib.py (146): ROTATE EVENT - binlogfile mysql-bin.000040, position 4.
Jul 24 08:56:28: [DEBUG] - mysql_lib.py (317): trying to save the master data...
Jul 24 08:56:28: [INFO] - pg_lib.py (796): saving master data id source: 1 log file: mysql-bin.000040  log position:4 Last event: 1500651900
Jul 24 08:56:28: [DEBUG] - pg_lib.py (814): master data: table file t_log_replica_1_src_1, log name: mysql-bin.000040, log position: 4
Jul 24 08:56:28: [DEBUG] - mysql_lib.py (320): new batch created, saving id_batch 11 in class variable
Jul 24 08:56:28: [DEBUG] - mysql_lib.py (325): updating processed flag for id_batch 11
Jul 24 08:56:28: [DEBUG] - pg_lib.py (1007): updating batch 11 to processed
Jul 24 08:56:28: [DEBUG] - mysql_lib.py (328): replaying batch.

so the mysql-bin.000039 is replicated up to position 785997674 (byte offset?).

While:

mysql> show master logs;
+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000037 | 1059711419 |
| mysql-bin.000038 | 2143584141 |
| mysql-bin.000039 | 1802709339 |
| mysql-bin.000040 |  254273132 |
+------------------+------------+

Is this normal behaviour and to be expected, i.e. there is no data loss?

Note that the mysql master is actually a slave as well and is currently replaying the last 3 days of replication from the real master database.

Doesn't work with big tables (> ~9,000,000 rows).

(venv) [root@centoscloud ~]# chameleon.py create_schema
Jul 6 02:05:06: [DEBUG] - mysql_lib.py (475): getting table metadata
Jul 6 02:05:06: [INFO] - mysql_lib.py (478): table copy limited to tables: document
Jul 6 02:05:06: [INFO] - pg_lib.py (624): Installing service schema base
(venv) [root@centoscloud ~]# chameleon.py add_source --config default
Jul 6 02:05:19: [DEBUG] - mysql_lib.py (475): getting table metadata
Jul 6 02:05:19: [INFO] - mysql_lib.py (478): table copy limited to tables: document
(venv) [root@centoscloud ~]# chameleon.py init_replica --config default
Jul 6 02:05:33: [DEBUG] - mysql_lib.py (475): getting table metadata
Jul 6 02:05:33: [INFO] - mysql_lib.py (478): table copy limited to tables: document
Jul 6 02:05:33: [INFO] - global_lib.py (307): Importing mysql schema
Jul 6 02:05:33: [INFO] - mysql_lib.py (580): locking the tables
Jul 6 02:05:33: [INFO] - mysql_lib.py (590): copying table document
Jul 6 02:05:33: [DEBUG] - mysql_lib.py (595): estimating rows in document
Jul 6 02:05:33: [DEBUG] - mysql_lib.py (624): document will be copied in 24 slices of 390168 rows
Jul 6 02:05:33: [DEBUG] - mysql_lib.py (631): Executing query for table document
Jul 6 02:05:33: [DEBUG] - mysql_lib.py (636): Starting extraction loop for table document
Jul 6 02:05:43: [INFO] - mysql_lib.py (657): table document error in PostgreSQL copy, saving slice number for the fallback to insert statements
Jul 6 02:05:43: [INFO] - mysql_lib.py (513): Table document copied 4 %
Jul 6 02:05:55: [INFO] - mysql_lib.py (513): Table document copied 8 %
Jul 6 02:06:05: [INFO] - mysql_lib.py (657): table document error in PostgreSQL copy, saving slice number for the fallback to insert statements
Jul 6 02:06:05: [INFO] - mysql_lib.py (513): Table document copied 12 %
Jul 6 02:06:17: [INFO] - mysql_lib.py (513): Table document copied 16 %
Jul 6 02:06:28: [INFO] - mysql_lib.py (513): Table document copied 20 %
Jul 6 02:06:39: [INFO] - mysql_lib.py (513): Table document copied 25 %
Jul 6 02:06:50: [INFO] - mysql_lib.py (657): table document error in PostgreSQL copy, saving slice number for the fallback to insert statements
Jul 6 02:06:50: [INFO] - mysql_lib.py (513): Table document copied 29 %
Jul 6 02:07:01: [INFO] - mysql_lib.py (513): Table document copied 33 %
Jul 6 02:07:12: [INFO] - mysql_lib.py (513): Table document copied 37 %
Jul 6 02:07:23: [INFO] - mysql_lib.py (513): Table document copied 41 %
Jul 6 02:07:34: [INFO] - mysql_lib.py (513): Table document copied 45 %
Jul 6 02:07:46: [INFO] - mysql_lib.py (657): table document error in PostgreSQL copy, saving slice number for the fallback to insert statements
Jul 6 02:07:46: [INFO] - mysql_lib.py (513): Table document copied 50 %
Jul 6 02:07:57: [INFO] - mysql_lib.py (513): Table document copied 54 %
Jul 6 02:08:08: [INFO] - mysql_lib.py (513): Table document copied 58 %
Jul 6 02:08:19: [INFO] - mysql_lib.py (513): Table document copied 62 %
Jul 6 02:08:29: [INFO] - mysql_lib.py (657): table document error in PostgreSQL copy, saving slice number for the fallback to insert statements
Jul 6 02:08:29: [INFO] - mysql_lib.py (513): Table document copied 66 %
Jul 6 02:08:40: [INFO] - mysql_lib.py (513): Table document copied 70 %
Jul 6 02:08:51: [INFO] - mysql_lib.py (513): Table document copied 75 %
Jul 6 02:09:02: [INFO] - mysql_lib.py (513): Table document copied 79 %
Jul 6 02:09:14: [INFO] - mysql_lib.py (513): Table document copied 83 %
Jul 6 02:09:25: [INFO] - mysql_lib.py (513): Table document copied 87 %
Jul 6 02:09:36: [INFO] - mysql_lib.py (513): Table document copied 91 %
Jul 6 02:09:47: [INFO] - mysql_lib.py (513): Table document copied 95 %
Jul 6 02:09:58: [INFO] - mysql_lib.py (513): Table document copied 100 %
Jul 6 02:10:09: [INFO] - mysql_lib.py (657): table document error in PostgreSQL copy, saving slice number for the fallback to insert statements
Jul 6 02:10:09: [INFO] - mysql_lib.py (513): Table document copied 104 %
Jul 6 02:10:19: [INFO] - mysql_lib.py (657): table document error in PostgreSQL copy, saving slice number for the fallback to insert statements
Jul 6 02:10:19: [INFO] - mysql_lib.py (513): Table document copied 108 %
Jul 6 02:10:30: [INFO] - mysql_lib.py (513): Table document copied 112 %
Jul 6 02:10:41: [INFO] - mysql_lib.py (513): Table document copied 116 %
Jul 6 02:10:53: [INFO] - mysql_lib.py (513): Table document copied 120 %
Jul 6 02:10:57: [INFO] - mysql_lib.py (513): Table document copied 125 %
Traceback (most recent call last):
  File "/usr/bin/chameleon.py", line 41, in <module>
    replica.init_replica()
  File "/usr/lib/python2.7/site-packages/pg_chameleon/lib/global_lib.py", line 257, in init_replica
    self.copy_table_data()
  File "/usr/lib/python2.7/site-packages/pg_chameleon/lib/global_lib.py", line 489, in copy_table_data
    self.my_eng.copy_table_data(self.pg_eng, self.global_config.copy_max_memory)
  File "/usr/lib/python2.7/site-packages/pg_chameleon/lib/mysql_lib.py", line 670, in copy_table_data
    self.insert_table_data(pg_engine, ins_arg)
  File "/usr/lib/python2.7/site-packages/pg_chameleon/lib/mysql_lib.py", line 556, in insert_table_data
    pg_engine.insert_data(table_name, insert_data , self.my_tables)
  File "/usr/lib/python2.7/site-packages/pg_chameleon/lib/pg_lib.py", line 437, in insert_data
    self.pg_conn.pgsql_cur.execute(sql_head,column_values)
ValueError: A string literal cannot contain NUL (0x00) characters.

I've tried many different options in the configuration file, but large tables cannot be initialized. On small tables everything works fine.
P.S. Percona 5.6 (cascading replica), PostgreSQL 9.5, Python 3.6

Init replica

Add init_replica in order to initialise the replica.
This method will initialise an entire replica set or a specific schema within a source.

replication issue: null value in column "id" violates not-null constraint

Jul 18 08:13:44: [ERROR] - global_lib.py (413): An error occurred during the replica. (<class 'psycopg2.IntegrityError'>, IntegrityError('null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, null, null, null, null, 2, null, null, b:1:144,b:2:144, null, null, null, null, null, null, null, 0, 0, 0, null).
CONTEXT:  SQL statement "
UPDATE  switchv2.rate
  SET
   day=NULL,
   data=NULL,
   tier0='b:1:144,b:2:144',
   tier1=NULL,
   tier2=NULL,
   tier3=NULL,
   tier4=NULL,
   tier5=NULL,
   tier6=NULL,
   tier7=NULL,
   max_los=NULL,
   min_los='2',
   id=NULL,
   tier_active='0',
   max_los_thru=NULL,
   min_los_thru=NULL,
   rateplan_code=NULL,
   number_of_units=NULL,
   close_to_arrival='0',
   close_to_departure='0'
  WHERE ("id","rateplan_code","day")=('S000186','ST_BAR4','2017-08-25');
"
PL/pgSQL function sch_chameleon.fn_process_batch(integer,integer) line 216 at EXECUTE
',), <traceback object at 0x7f99ac2c5148>)

MySQL handles this correctly in its own replication (I am replicating off a slave), but I do not know the exact SQL statement that would trigger this right now.

I guess it is a row-level entry that fails.

Error in version 1.6. Can't start init_replica

Traceback (most recent call last):
  File "/usr/local/bin/chameleon.py", line 50, in <module>
    replica.init_replica()
  File "/usr/local/lib/python3.6/site-packages/pg_chameleon/lib/global_lib.py", line 255, in init_replica
    self.pg_eng.set_source_id('initialising')
  File "/usr/local/lib/python3.6/site-packages/pg_chameleon/lib/pg_lib.py", line 279, in set_source_id
    self.pg_conn.pgsql_cur.execute(sql_source, (source_status, source_name))
psycopg2.ProgrammingError: relation "sch_chameleon.t_sources" does not exist
LINE 2: UPDATE sch_chameleon.t_sources

is this normal: i_skipped > 0 on a finished batch

I have been looking at some of the tables used by pg_chameleon, and one interesting thing is that in t_replica_batch the i_skipped column is not 0 for a finished batch.

Is this normal behaviour?

i_id_batch  b_replayed  i_replayed  i_skipped
9           t           40664       10
10          t           4123        2399
11          t           7465        9
12          t           39083       23
13          t           2571        3
14          t           23766       9
15          t           24464       8
16          t           1747        7
17          f           50000       \N
18          f           \N          \N

Just wanting to make sure...

binary data in varchar sometimes cannot get replicated

We have a table with some binary data in a varchar field.
For some reason this cannot be replicated; there seems to be a "formatting" problem in MySQL.

The error in question looks like this:

(the error output was attached as an image in the original issue and is not available here)

The row itself looks like this:

mysql> select * from history where id="20573461" LIMIT 1;
+-------------+---------------------+--------------+---------------+-------------+---------------------+--------------+-------------------------------------------------------------------------------------------------+----------+
| user_id     | update_type         | code         | affected_from | affected_to | submit_time         | submit_ip    | submit_user                                                                                     | id       |
+-------------+---------------------+--------------+---------------+-------------+---------------------+--------------+-------------------------------------------------------------------------------------------------+----------+
| DEMO_MARKUS | OTA_XXXXXRQ         | XXXXX        | 2014-10-22    | 2014-10-29  | 2014-10-21 10:49:59 | 11.22.33.44 | �"q��ɭ�̹͕ɑ�խ��͕��������脅 ݵ���������                                                         | 20573461 |
+-------------+---------------------+--------------+---------------+-------------+---------------------+--------------+-------------------------------------------------------------------------------------------------+----------+
1 row in set (0.00 sec)

In this output you can see that the id 20573461 is printed before the |, similar to the error in the script itself.

Here when running the concat sql for this one row manually:

mysql>  SELECT
    ->  REPLACE(
    ->   CONCAT(
    ->    '"',
    ->    CONCAT_WS(
    ->     '","',
    ->     COALESCE(REPLACE(`user_id`, '"', '""'),'NULL'),
    ->     COALESCE(REPLACE(`update_type`, '"', '""'), 'NULL'),
    ->     COALESCE(REPLACE(`code`, '"', '""'), 'NULL'),
    ->     COALESCE(REPLACE(`affected_from`, '"', '""'), 'NULL'),
    ->     COALESCE(REPLACE(`affected_to`, '"', '""'), 'NULL'),
    ->     COALESCE(REPLACE(`submit_time`, '"', '""'), 'NULL'),
    ->     COALESCE(REPLACE(`submit_ip`, '"', '""'), 'NULL'),
    ->     COALESCE(REPLACE(`submit_user`, '"', '""'), 'NULL'),
    ->     COALESCE(REPLACE(`id`, '"', '""'), 'NULL')
    ->    ),
    ->    '"'
    ->   ),
    ->   '"NULL"',
    ->   'NULL'
    ->  ) as data
    -> FROM history
    ->  WHERE id=20573461;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| data                                                                                                                                                                                                                            |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "DEMO_MARKUS","OTA_XXXXXRQ","XXXXX","2014-10-22","2014-10-29","2014-10-21 10:49:59","11.22.33.44","�""q��ɭ�̹͕ɑ�խ��͕��������脅 ݵ���������","20573461"                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

So there is possibly a formatting issue from MySQL itself because 0x00 is part of the payload.
Maybe someone has an idea of how to handle this correctly.

Update statement can not be applied immediately

Hi,
my environment is CentOS 7.3 with Python 3.5.3. We have been testing pg_chameleon 2.0.0a3 for a week now. The problem is:
migrating data to PostgreSQL with init_replica works fine, but when I manually update a record in the source MySQL, the change is not applied in PostgreSQL immediately, unless I manually execute FLUSH LOGS or a DDL statement on the MySQL side to trigger the replay.
In addition, every time I run start_replica a replication thread is created on the MySQL side, even if I have executed stop_replica. Using netstat on the Linux host, the MySQL-side TCP connection shows the CLOSE_WAIT state.

We also analyzed the Python source code. In the class mysql_source, the read_replica_stream function's statement "for binlogevent in my_stream:" keeps waiting, regardless of whether the MySQL source has a new binlog event.

I hope someone can help me solve this, thank you very much.

Update schema mappings

Add methods for updating the schema mappings.
This method allows adding or removing a schema replica within a source.

MySQL numeric password breaks pg_chameleon

When using a numeric password in the config file, a Python error "AttributeError: 'int' object has no attribute 'encode'" occurs when running ./chameleon.py create_schema.

mysql_conn:
    host: hout1
    port: 3306
    user: mysql_rep
    passwd: 123456

A workaround is to enclose the password in double quotes, which makes YAML parse it as a string instead of an integer:

    passwd: "123456"

can I add tables synchronization to PG at any time?

My MySQL database is 1600 GB; synchronizing all tables to PG takes too long, and some of the data is not needed. I would like to synchronize only some of the tables to PG, but if the requirements change I want to be able to add other tables to the synchronization at any time. How can I achieve this?

I tried to add a new source, but got the error message:
There is a duplicate destination schema in the schema mappings

What should I do? Thanks

Question: TRUNCATE message in logs

Hi!

Is this a possible replication problem?

...
Jul 27 09:57:45: [DEBUG] - mysql_lib.py (172): QUERY EVENT - binlogfile mysql-bin.000015, position 576554713.
--------
TRUNCATE TABLE tableau.res_info
--------
Jul 27 09:57:45: [DEBUG] - mysql_lib.py (192): TOKEN: {'name': 'tableau', 'command': 'TRUNCATE'}
Jul 27 09:57:45: [DEBUG] - mysql_lib.py (352): trying to save the master data...
Jul 27 09:57:45: [INFO] - pg_lib.py (818): Saved master data for source: tableau
...

Thanks

Handling of datatypes when migrating schema

Hi,

We are currently evaluating pg_chameleon for migrating our database from MySQL to PostgreSQL with minimal downtime.

One blocking issue we found is that in MySQL booleans are represented as tinyint(1) or bit(1), while in PostgreSQL our ORM mapper expects real booleans.

We didn't find a way to do this kind of transformation with pg_chameleon. Other tools like pgloader have this built in.

The question is: can this functionality be added?
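For what it's worth, the feature list above mentions a data type override (e.g. tinyint(1) to boolean). A configuration sketch mirroring the type_override structure visible in the crash-report configuration earlier on this page; check the configuration file reference for the authoritative syntax:

type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"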

add sync_tables

The method allows syncing a single table within a source replication set.

Feedback & question: need a timeout on the MySQL connection

Thank you for a very useful replication solution; it has been very useful to me. However, I use an internet connection between the replicas, and with an unstable network an error message sometimes appears and replication stops.

Jul 21 03:52:05: [ERROR] - global_lib.py (409): An error occurred during the replica. (<class 'pymysql.err.OperationalError'>, OperationalError(2003, "Can't connect to MySQL server on '192.168.2.130' (timed out)"), <traceback object at 0x7efe501d6208>)

To run it again, I need to delete the pid file:
rm -rf /root/.pg_chameleon/pid/default.lock ; chameleon.py start_replica --config default

Can I increase the timeout so that the replica reconnects without stopping?

1.5: stop_replica does not work

chameleon.py stop_replica --config switchv2 does not work with 1.5.

I could only stop the chameleon.py start_replica --config switchv2 process with CTRL-C.

here the log:

Jul 18 15:03:11: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:03:19: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:03:23: [DEBUG] - mysql_lib.py (513): getting table metadata
Jul 18 15:03:24: [INFO] - global_lib.py (273): waiting for replica process to stop
Jul 18 15:03:27: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:03:35: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:03:43: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:03:51: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:03:58: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:04:06: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:04:14: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:04:22: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:04:30: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:04:38: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:04:46: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:04:54: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:05:02: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:05:10: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:05:19: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:05:27: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:05:35: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:05:44: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:05:52: [DEBUG] - pg_lib.py (1037): Batch loop value True
Jul 18 15:06:00: [ERROR] - global_lib.py (413): An error occurred during the replica. (<class 'SystemError'>, SystemError('<built-in function utf_8_decode> returned a result with an error set',), <traceback object at 0x7ff151295d08>)
Jul 18 15:06:03: [INFO] - global_lib.py (280): replica process stopped
