pgsync's Introduction

pgsync

Sync data from one Postgres database to another (like pg_dump/pg_restore). Designed for:

  • speed - tables are transferred in parallel
  • security - built-in methods to prevent sensitive data from ever leaving the server
  • flexibility - gracefully handles schema differences, like missing columns and extra columns
  • convenience - sync partial tables, groups of tables, and related records

🍊 Battle-tested at Instacart

Installation

pgsync is a command line tool. To install, run:

gem install pgsync

This will give you the pgsync command. If installation fails, you may need to install dependencies (see the Dependencies section below).

You can also install it with Homebrew:

brew install pgsync

Setup

In your project directory, run:

pgsync --init

This creates .pgsync.yml for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). pgsync commands can be run from this directory or any subdirectory.
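
A minimal config just points at the two databases. For example, a sketch (the connection URLs are placeholders):

from: postgres://user:password@remote-host:5432/myapp_production
to: postgres://localhost:5432/myapp_development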

How to Use

First, make sure your schema is set up in both databases. We recommend using a schema migration tool for this, but pgsync also provides a few convenience methods (see the Schema section below). Once that’s done, you’re ready to sync data.

Sync tables

pgsync

Sync specific tables

pgsync table1,table2

Works with wildcards as well

pgsync "table*"

Sync specific rows (existing rows are overwritten)

pgsync products "where store_id = 1"

You can also preserve existing rows

pgsync products "where store_id = 1" --preserve

Or truncate them

pgsync products "where store_id = 1" --truncate

Tables

Exclude specific tables

pgsync --exclude table1,table2

Add to .pgsync.yml to exclude by default

exclude:
  - table1
  - table2

Sync tables from all schemas or specific schemas (by default, only the search path is synced)

pgsync --all-schemas
# or
pgsync --schemas public,other
# or
pgsync public.table1,other.table2

Groups

Define groups in .pgsync.yml:

groups:
  group1:
    - table1
    - table2

And run:

pgsync group1

Variables

You can also use groups to sync a specific record and associated records in other tables.

To get product 123 with its reviews, last 10 coupons, and store, use:

groups:
  product:
    products: "where id = {1}"
    reviews: "where product_id = {1}"
    coupons: "where product_id = {1} order by created_at desc limit 10"
    stores: "where id in (select store_id from products where id = {1})"

And run:

pgsync product:123

Schema

Sync the schema before the data (this wipes out existing data)

pgsync --schema-first

Specify tables

pgsync table1,table2 --schema-first

Sync the schema without data (this wipes out existing data)

pgsync --schema-only

pgsync does not try to sync Postgres extensions.

Sensitive Data

Prevent sensitive data like email addresses from leaving the remote server.

Define rules in .pgsync.yml:

data_rules:
  email: unique_email
  last_name: random_letter
  birthday: random_date
  users.auth_token:
    value: secret
  visits_count:
    statement: "(RANDOM() * 10)::int"
  encrypted_*: null

last_name matches all columns named last_name and users.last_name matches only the users table. Wildcards are supported, and the first matching rule is applied.

Options for replacement are:

  • unique_email
  • unique_phone
  • unique_secret
  • random_letter
  • random_int
  • random_date
  • random_time
  • random_ip
  • value
  • statement
  • null
  • untouched

Rules starting with unique_ require the table to have a single column primary key. unique_phone requires a numeric primary key.

Foreign Keys

Foreign keys can make it difficult to sync data. Three options are:

  1. Defer constraints (recommended)
  2. Manually specify the order of tables
  3. Disable foreign key triggers, which can silently break referential integrity (not recommended)

To defer constraints, use:

pgsync --defer-constraints

To manually specify the order of tables, use --jobs 1 so tables are synced one-at-a-time.

pgsync table1,table2,table3 --jobs 1

To disable foreign key triggers and potentially break referential integrity, use:

pgsync --disable-integrity

This requires superuser privileges on the to database. If syncing to (not from) Amazon RDS, use the rds_superuser role. If syncing to (not from) Heroku, there doesn’t appear to be a way to disable integrity.

Triggers

Disable user triggers with:

pgsync --disable-user-triggers

Sequences

Skip syncing sequences with:

pgsync --no-sequences

Append-Only Tables

For extremely large, append-only tables, sync in batches.

pgsync large_table --in-batches

Note: This requires the table to have a numeric, increasing primary key.

The script will resume where it left off when run again, making it great for backfills.
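
Since runs are resumable, one option is to schedule the sync so new rows are pulled down regularly. A sketch (the schedule and project path are assumptions):

# crontab entry: backfill new rows from large_table nightly at 2am
0 2 * * * cd /path/to/project && pgsync large_table --in-batches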

Connection Security

Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.
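
For example, to require full certificate verification, append sslmode to the connection URL in .pgsync.yml (host and database are placeholders):

from: postgres://user:password@remote-host:5432/mydb?sslmode=verify-full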

Safety

To keep you from accidentally overwriting production, the destination is limited to localhost or 127.0.0.1 by default.

To use another host, add to_safe: true to your .pgsync.yml.
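
For example, a sketch of a config for a non-local destination (the host is a placeholder):

to: postgres://user:password@staging-host:5432/myapp_staging
to_safe: true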

Multiple Databases

To use with multiple databases, run:

pgsync --init db2

This creates .pgsync-db2.yml for you to edit. Specify a database in commands with:

pgsync --db db2
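
The --db flag should compose with the other commands shown above, for example (the table and filter are illustrative):

pgsync products "where store_id = 1" --db db2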

Integrations

Django

If you run pgsync --init in a Django project, migrations will be excluded in .pgsync.yml.

exclude:
  - django_migrations

Heroku

If you run pgsync --init in a Heroku project, the from database will be set in .pgsync.yml.

from: $(heroku config:get DATABASE_URL)?sslmode=require

Laravel

If you run pgsync --init in a Laravel project, migrations will be excluded in .pgsync.yml.

exclude:
  - migrations

Rails

If you run pgsync --init in a Rails project, Active Record metadata and schema migrations will be excluded in .pgsync.yml.

exclude:
  - ar_internal_metadata
  - schema_migrations

Debugging

To view the SQL that’s run, use:

pgsync --debug

Other Commands

Help

pgsync --help

Version

pgsync --version

List tables

pgsync --list

Scripts

Use groups when possible to take advantage of parallelism.

For Ruby scripts, you may need to do:

Bundler.with_unbundled_env do
  system "pgsync ..."
end
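
For example, a minimal sketch of a Rake task that pulls down data (the task name and table list are assumptions):

# lib/tasks/pgsync.rake (hypothetical)
task :sync_dev_data do
  Bundler.with_unbundled_env do
    # run pgsync outside the app's bundle; raise if it exits nonzero (Ruby 2.6+)
    system "pgsync products,stores", exception: true
  end
end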

Docker

Get the Docker image with:

docker pull ankane/pgsync
alias pgsync="docker run -ti ankane/pgsync"

This will give you the pgsync command.
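
Note that the containerized command may not be able to reach a database on the Docker host’s localhost without extra networking. On Linux, one option is host networking (whether you need this depends on your setup):

alias pgsync="docker run -ti --network=host ankane/pgsync"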

Dependencies

If installation fails, your system may be missing Ruby or libpq.

On Mac, run:

brew install libpq

On Ubuntu, run:

sudo apt-get install ruby-dev libpq-dev build-essential

Upgrading

Run:

gem install pgsync

To use master, run:

gem install specific_install
gem specific_install https://github.com/ankane/pgsync.git

With Homebrew, run:

brew upgrade pgsync

With Docker, run:

docker pull ankane/pgsync

Related Projects

Also check out:

  • Dexter - The automatic indexer for Postgres
  • PgHero - A performance dashboard for Postgres
  • pgslice - Postgres partitioning as easy as pie

Thanks

Inspired by heroku-pg-transfer.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

  • Report bugs
  • Fix bugs and submit pull requests
  • Write, clarify, or fix documentation
  • Suggest or add new features

To get started with development:

git clone https://github.com/ankane/pgsync.git
cd pgsync
bundle install

createdb pgsync_test1
createdb pgsync_test2
createdb pgsync_test3

bundle exec rake test

pgsync's People

Contributors

alexcharlton, aliou, ankane, arshsingh, atul9, dannyim, joehorsnell, julienanne, mtilda, ogerman, pyrliu, sandeep45, stem, stephenlacy, vgasparini

pgsync's Issues

OS lib dependencies in README

I think it's worth mentioning that on a clean installation of an OS, like Ubuntu 18.04, I had to install some dependencies first:

sudo apt install postgresql postgresql-server-dev-10

Otherwise I was getting:

Building native extensions. This could take a while...
ERROR:  Error installing pgsync:
	ERROR: Failed to build gem native extension.

    current directory: /var/lib/gems/2.5.0/gems/pg-1.1.3/ext
/usr/bin/ruby2.5 -r ./siteconf20181017-28433-1b2jfs9.rb extconf.rb
checking for pg_config... no
No pg_config... trying anyway. If building fails, please try again with
 --with-pg-config=/path/to/pg_config
checking for libpq-fe.h... no
Can't find the 'libpq-fe.h header
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

This may be pretty obvious to Ruby developers and anyone who knows the internals, but not showing this information may prevent newer developers from understanding why it doesn't install.

How to synchronize data between two databases?

Sorry, but from the README file I didn't understand how to do it.

I have an application that needs to be connected to two databases (a main database and a mirror). The Rails application will work with the main database, but the data in the main database and the secondary one need to be synchronized. How can this be implemented?

How to sync only a specific schema?

Hey!

I've seen that since version 0.3.5 it's possible to sync a specific schema rather than just public, but how to use this isn't documented. Can you provide an example?

Cheers,
Patryk

Not restoring truncated tables.

pgsync is not restoring truncated tables that are excluded in the yml file, so there is data loss.
I want to sync only one table, but it references other tables, which is why pgsync truncates them. I want to keep my old data in those truncated tables. Is there any way to do so?

Unable to update referenced row

* Syncing user
    where id = 19398
/var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/table_sync.rb:130:in `exec': ERROR:  update or delete on table "user" violates foreign key constraint "point_events_user_id_fkey" on table "point_events" (PG::ForeignKeyViolation)
DETAIL:  Key (id)=(19398) is still referenced from table "point_events".
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/table_sync.rb:130:in `block in sync'
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/table_sync.rb:129:in `transaction'
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/table_sync.rb:129:in `sync'
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/client.rb:95:in `block in sync'
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/client.rb:264:in `each'
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/client.rb:264:in `in_parallel'
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/client.rb:94:in `sync'
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/client.rb:24:in `perform'
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/exe/pgsync:5:in `<top (required)>'
        from /usr/local/bin/pgsync:23:in `load'
        from /usr/local/bin/pgsync:23:in `<main>'

Attached is the error faced when trying to update rows that are referenced in other tables. This would make sense if the row were being deleted, but it is just being updated.

select columns

Is there a way to only select specific columns of a table to copy?
Thank you

Win10: Process.fork is not supported by this Ruby

Hi. I'm using win10, ruby 2.4.2p198, gem 2.6.13, pgsync 0.3.7.

I'm not able to run pgsync --setup to generate the .yml file, but copying an existing one seems to work fine. With the copied .yml file (set to fetch from a remote server) I am able to run pgsync tablename --schema-only, but not pgsync tablename, because it throws the following error:

Process.fork is not supported by this Ruby
* Syncing instrumentdata
<internal:prelude>:78:in `__read_nonblock': A non-blocking socket operation could not be completed immediately. - read would block (IO::EWOULDBLOCKWaitReadable)
        from <internal:prelude>:78:in `read_nonblock'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing/mutex.rb:66:in `block in locked?'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing.rb:26:in `handle_interrupt'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing.rb:26:in `try_handle_interrupt'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing/mutex.rb:64:in `locked?'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing/mutex.rb:117:in `block in unlock'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing.rb:26:in `handle_interrupt'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing.rb:26:in `try_handle_interrupt'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing/mutex.rb:116:in `unlock'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing/mutex.rb:141:in `block in synchronize'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing.rb:26:in `handle_interrupt'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing.rb:26:in `try_handle_interrupt'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/multiprocessing-0.0.2/lib/multiprocessing/mutex.rb:133:in `synchronize'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:162:in `block (3 levels) in sync_table'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:394:in `with_connection'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:145:in `block (2 levels) in sync_table'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:394:in `with_connection'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:144:in `block in sync_table'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:405:in `benchmark'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:143:in `sync_table'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:129:in `block in perform'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/parallel-1.12.0/lib/parallel.rb:486:in `call_with_index'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/parallel-1.12.0/lib/parallel.rb:318:in `block in work_direct'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/parallel-1.12.0/lib/parallel.rb:495:in `with_instrumentation'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/parallel-1.12.0/lib/parallel.rb:317:in `work_direct'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/parallel-1.12.0/lib/parallel.rb:260:in `map'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/parallel-1.12.0/lib/parallel.rb:217:in `each'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:572:in `in_parallel'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:128:in `perform'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/exe/pgsync:5:in `<top (required)>'
        from C:/Ruby24-x64/bin/pgsync:23:in `load'
        from C:/Ruby24-x64/bin/pgsync:23:in `<main>'

Anything that can be done with this?

Thanks for making great tool.

Parallel::UndumpableException: PG::NotNullViolation: ERROR: null value in column "encrypted_password" violates not-null constraint

Rails 5.2.3
Devise 4.7.1


I'm trying to synchronize a users table between two different databases. The databases are different but have the same schema.

bundle exec pgsync --from db1 --to db2 --preserve users
bundle exec pgsync --from db2 --to db1 --preserve users

After executing the first command, I get this error:

bundler: failed to load command: pgsync (/Users/afuno/.rbenv/versions/2.6.2/bin/pgsync)
Parallel::UndumpableException: PG::NotNullViolation: ERROR:  null value in column "encrypted_password" violates not-null constraint

And:

DETAIL:  Failing row contains (705, [email protected], 2019-08-20 18:11:51.115133, 2019-08-20 18:12:21.307266, null, null, 1, 2019-08-20 18:11:51.175513, 2019-08-20 18:11:51.175513, ::1, ::1, 3ge6662f06fe1e331e97421c0511892bfrqb1ec8, ST, L, null, null, f-7_2x.png, London, null, t, t, 2019-08-20 18:12:21.307266, 203140, null).

The fifth value in the list above is encrypted_password. For some reason, it is null. I checked the user 705 through the rails console - this field is not empty for him.

What to do?

README suggestion

I wanted to suggest adding a "tip" to the README.md for anyone using pgsync via a rake task being run on Heroku.

The heroku/heroku-buildpack-cli is required in order to allow one to run the Heroku CLI within a dyno alongside application code. Take the example below:

from: $(heroku config:get DATABASE_URL)

Feel free to shelve this suggestion if it's too specific; I totally understand!

Error dumping schema: "Syntax error"

pgsync version: 0.3.7

Gday. I'm trying to use pgsync for the first time and I get this error:

$ pgsync --config .pgsync-app_db.yml --schema-only --debug
From: application on <host-omitted>:5432
To: p2p_analysis on localhost:5432
* Dumping schema
sh: 1: Syntax error: "(" unexpected
Completed in 2.7s

When I login using psql I see the following:

psql "postgres://<URI-omitted>/application"
psql (9.5.8, server 9.5.4)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

application=> \d

Any help appreciated!

Table name with capital letters.

Executed command: pgsync challengeTemplates --debug --overwrite
Output:

* Syncing challengeTemplates
/Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:438:in `exec_params': ERROR:  relation "challengetemplates" does not exist (PG::UndefinedTable)
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:438:in `primary_key'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:232:in `block (3 levels) in sync_table'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:391:in `with_connection'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:143:in `block (2 levels) in sync_table'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:391:in `with_connection'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:142:in `block in sync_table'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:402:in `benchmark'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:141:in `sync_table'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:127:in `block in perform'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:569:in `each'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:569:in `in_parallel'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync.rb:126:in `perform'
	from /Library/Ruby/Gems/2.3.0/gems/pgsync-0.3.8/exe/pgsync:5:in `<top (required)>'
	from /usr/local/bin/pgsync:22:in `load'
	from /usr/local/bin/pgsync:22:in `<main>'

I am able to sync all tables whose names are lowercase. As per the exception:

relation "challengetemplates" does not exist

whereas the actual table name is "challengeTemplates".

Better documentation for the data protection rules

Hi !

Thanks for this interesting project !

I wanted to test some features, especially the sensitive data protection, but I got really confused by the unique_email rule. It took me a while to understand that unique_email will only work if the table has a column named "id" (also assuming that this field is unique).

https://github.com/ankane/pgsync/blob/master/lib/pgsync/table_sync.rb#L181

I feel like the data protection rules should be presented in more detail in the README file, as most users don't have an "id" column on every table.

The same goes for the rules random_date, random_ip, and random_letter, which are not random at all.

Or maybe this is a tribute to https://xkcd.com/221/ ;-)

Trying to connect at a port other than specified in config file

Command executed: pgsync --schema-only

Logs printed:

From: production on localhost:5432
To: production on rm-gs55mx73mnim6lvqq.pgsql:3432
* Dumping schema
pg_dump: [archiver (db)] connection to database "production" failed: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5433?
could not connect to server: Cannot assign requested address
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
pg_restore: [archiver] input file is too short (read 0, expected 5)
Completed in 0.2s

The specified port in from is 5432, but as per the logs it is looking for TCP/IP connections on port 5433.

Syncing data with camelcase columns results in lowercase name

This was from master eea0b28

/var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/table_sync.rb:127:in 
`exec': PG::UndefinedColumn: ERROR:  column users.postid does not exist (Parallel::UndumpableException)
LINE 1: ...SELECT 1 FROM "public"."users" WHERE "public"."...
                                                             ^
HINT:  Perhaps you meant to reference the column "users.postId" or the column "users.postId".
        from /var/lib/gems/2.3.0/gems/pgsync-0.4.0/lib/pgsync/table_sync.rb:127:in `sync'

The correct column name is postId in the users table.
Edit: this is with the --preserve flag enabled.

Extensions are not being dumped

I'm trying to sync my db with (roughly) the following command:

pgsync --from postgres://foo:[email protected]:5432/mydb --to postgres://localhost:5432/my_local_db --schema-first

The sync fails to restore a table because the citext extension isn't enabled when pg_restore runs.

pg_restore: creating TABLE "public.vendors"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 274; 1259 4757491 TABLE vendors cgoalvqzioskmf
pg_restore: [archiver (db)] could not execute query: ERROR:  type "public.citext" does not exist
LINE 12:     other_locations public.citext,
                             ^
    Command was: CREATE TABLE public.vendors (
    id integer NOT NULL,
    name character varying NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    gateway_account_id character varying,
    gateway_access_token_response jsonb,
    address character varying,
    onboarded_at timestamp without time zone,
    tax_id character varying,
    insurance_info jsonb DEFAULT '{}'::jsonb,
    other_locations public.citext,
    approved_at timestamp without time zone
);

I'm guessing this is because the pg_catalog tables are all being excluded from the dump:

query = "SELECT table_schema, table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE' AND table_schema NOT IN ('information_schema', 'pg_catalog') ORDER BY 1, 2"

Meaning the pg_catalog/pg_extension table isn't being dumped. Thus the citext extension isn't being restored, and so tables depending on it cannot be restored. Is there a reason not to include the extensions table? Am I looking in the wrong place or using the wrong set of flags?

EDIT: Now I realize we cannot dump and restore this table because it's a system catalog. Derp. So that all makes sense. Though I wonder if there's a way to restore/re-enable the extensions prior to restoring the schema? Maybe I need to do this in a couple of passes? Or perhaps I cannot use the --schema-first flag, but must rely on the correct schema already being in place?

Thank you for your time, and please let me know what I can do to help investigate this issue.

Timeout expired with no backtrace or error messages

pgsync used to work for me and suddenly I'm starting to receive this very vague error.

timeout expired

I'm able to connect to the database manually via the postgres CLI, so I'm not sure what went wrong here.

UUID --in-batches

Will raise on SELECT MAX("id") FROM "my_awesome_table" with: No function matches the given name and argument types. You might need to add explicit type casts

Different table name

Is there a way to change the name of the table the data is being imported into?

Deadlock detected errors

Hi @ankane,
I get this error when dealing with data in a production database that has lots of PK-FK relations between tables. When pgsync processes tables with similar dependency lists in parallel, it causes a deadlock error like the one below.

# just show a list of dependencies for two tables
>pgsync tier
* Syncing tier
NOTICE:  truncate cascades to table "product"
NOTICE:  truncate cascades to table "rate"
NOTICE:  truncate cascades to table "offer"
NOTICE:  truncate cascades to table "action_impact"
NOTICE:  truncate cascades to table "uwm_data"

>pgsync state
* Syncing state
NOTICE:  truncate cascades to table "tp_insurance"
NOTICE:  truncate cascades to table "uwm_data"
NOTICE:  truncate cascades to table "c_score"
NOTICE:  truncate cascades to table "action_impact"
NOTICE:  truncate cascades to table "offer"
<...>

When we run pgsync or pgsync group1, the error occurs.

2019-10-02 09:43:06.500 UTC [12571] ERROR:  deadlock detected
2019-10-02 09:43:06.500 UTC [12571] DETAIL:  Process 12571 waits for AccessExclusiveLock on relation 40292(action_impact) of database 39735; blocked by process 12569.
        Process 12569 waits for AccessExclusiveLock on relation 40843(uwm_data) of database 39735; blocked by process 12571.
        Process 12571: TRUNCATE "postgres"."state" CASCADE
        Process 12569: TRUNCATE "postgres"."tier" CASCADE
2019-10-02 09:43:06.500 UTC [12571] HINT:  See server log for query details.
2019-10-02 09:43:06.500 UTC [12571] STATEMENT:  TRUNCATE "postgres"."state" CASCADE

This is an extract from the entire list; in a real situation I have 4 or more tables that are processed concurrently and have similar lists of related tables that need to be truncated.
All triggers on the target side are disabled.
I just use --debug as a workaround, but it completely disables multithreading.

invalid URI query parameter: "sslca"

When I provide a database url with a PEM, I get the following error:

<database_url>?sslca=config/rds-combined-ca-bundle.pem

Error:

invalid URI query parameter: "sslca"

second attempt at solving for FK constraints?

Hey @ankane, I know this PR has been stalled out for a long time:
#6

Would you be amenable to a fixed-up branch with the functionality? We'd really love to stop using --debug :)

If so, do you have any hard requirements around the discussion from that thread? (e.g. the naming). Perhaps a --defer-constraints option? Given the way the Mutex is handled on parallelization, I think that approach might work...

(also, perhaps a way to specify that option by default, given a codebase with many FKs would always want that option).

Not able to setup

Executed command: pgsync --setup

Exception printed:

/usr/lib/ruby/2.3.0/fileutils.rb:1392:in `initialize': No such file or directory @ rb_sysopen - /var/lib/gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync/../config.yml (Errno::ENOENT)
	from /usr/lib/ruby/2.3.0/fileutils.rb:1392:in `open'
	from /usr/lib/ruby/2.3.0/fileutils.rb:1392:in `copy_file'
	from /usr/lib/ruby/2.3.0/fileutils.rb:486:in `copy_file'
	from /usr/lib/ruby/2.3.0/fileutils.rb:403:in `block in cp'
	from /usr/lib/ruby/2.3.0/fileutils.rb:1571:in `block in fu_each_src_dest'
	from /usr/lib/ruby/2.3.0/fileutils.rb:1587:in `fu_each_src_dest0'
	from /usr/lib/ruby/2.3.0/fileutils.rb:1569:in `fu_each_src_dest'
	from /usr/lib/ruby/2.3.0/fileutils.rb:402:in `cp'
	from /var/lib/gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync/client.rb:195:in `setup'
	from /var/lib/gems/2.3.0/gems/pgsync-0.3.8/lib/pgsync/client.rb:54:in `perform'
	from /var/lib/gems/2.3.0/gems/pgsync-0.3.8/exe/pgsync:5:in `<top (required)>'
	from /usr/local/bin/pgsync:23:in `load'
	from /usr/local/bin/pgsync:23:in `<main>'

win10: Bad file descriptor (Errno::EBADF) during pgsync --setup

Hi again @ankane . Forgot to include this error in my last post. Pops up during pgsync --setup

λ pgsync --setup
C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/psych-2.2.4/lib/psych.rb:472:in `initialize': Bad file descriptor (Errno::EBADF)
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/psych-2.2.4/lib/psych.rb:472:in `open'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/psych-2.2.4/lib/psych.rb:472:in `load_file'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:340:in `config'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:41:in `block in perform'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:40:in `each'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/lib/pgsync.rb:40:in `perform'
        from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/pgsync-0.3.7/exe/pgsync:5:in `<top (required)>'
        from C:/Ruby24-x64/bin/pgsync:23:in `load'
        from C:/Ruby24-x64/bin/pgsync:23:in `<main>'

Does pgsync support syncing data between different pg versions?

I want to sync some tables from pg 9.1.9 to pg 9.6.8, but it doesn't work.
Some error logs follow:

/usr/local/share/gems/gems/pgsync-0.4.0/lib/pgsync/data_source.rb:145:in `conninfo': undefined method `conninfo_hash' for #<PG::Connection:0x0000000200ab20> (NoMethodError)
	from /usr/local/share/gems/gems/pgsync-0.4.0/lib/pgsync/data_source.rb:19:in `host'
	from /usr/local/share/gems/gems/pgsync-0.4.0/lib/pgsync/client.rb:47:in `sync'
	from /usr/local/share/gems/gems/pgsync-0.4.0/lib/pgsync/client.rb:24:in `perform'
	from /usr/local/share/gems/gems/pgsync-0.4.0/exe/pgsync:5:in `<top (required)>'
	from /usr/local/bin/pgsync:23:in `load'
	from /usr/local/bin/pgsync:23:in `<main>'

Thanks, best regards.

Ideas

pgsync 0.6.0

  • Drop support for Postgres < 9.5 (EOL)
  • Fix sync for multicolumn primary keys with the --overwrite and --preserve options
  • Use INSERT ... ON CONFLICT for the --overwrite and --preserve options (better for referential integrity with --overwrite)
  • Fetch all columns in a single query
  • Decide how to handle notices
  • Decide if should sync all schemas by default (and deprecate --all-schemas option)
  • Show schema differences before sync (like before 0.5.0)
  • Make --config flag more intuitive - don't search tree, throw error when it doesn't exist, throw error when both config and db specified
  • Remove deprecated options
  • Throw error when --in-batches used with more than one table

Sync to production db without wiping anything

Hi --

I am just about to use this tool but am afraid of wiping everything. Does sync from:to with both databases remote do anything unexpected, such as erasing existing data on either database? Or does it attempt to insert everything from "from" to "to"?

Thanks --
Tyler

Tables with column identifiers containing punctuation are not handled correctly

pgsync currently fails with the following error when it encounters a table with a column name containing punctuation, which is valid if the identifier is quoted:

Parallel::UndumpableException: PG::SyntaxError: ERROR:  syntax error at or near "?"
LINE 1: ...s"', 'column_with_punctuation?') AS column_with_punctuation?
                                                                      ^

    .../pgsync/lib/pgsync/data_source.rb:153:in `async_exec_params'
    .../pgsync/lib/pgsync/data_source.rb:153:in `execute'
    .../pgsync/lib/pgsync/data_source.rb:48:in `sequences'
    .../pgsync/lib/pgsync/table_sync.rb:20:in `sync'
    .../pgsync/lib/pgsync/client.rb:95:in `block in sync'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:506:in `call_with_index'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:471:in `process_incoming_jobs'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:453:in `block in worker'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:444:in `fork'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:444:in `worker'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:435:in `block in create_workers'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:434:in `each'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:434:in `each_with_index'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:434:in `create_workers'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:373:in `work_in_processes'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:277:in `map'
    /usr/local/var/rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/parallel-1.17.0/lib/parallel.rb:227:in `each'
    .../pgsync/lib/pgsync/client.rb:273:in `in_parallel'
    .../pgsync/lib/pgsync/client.rb:94:in `sync'
    .../pgsync/lib/pgsync/client.rb:24:in `perform'
    .../pgsync/test/pgsync_test.rb:77:in `block in assert_works'

PR incoming...

How to handle credentials?

What is the best way to handle credentials in the .pgsync.yml file? I would like to keep this file in git, so it's not great to have the database username and password in there.

How would you suggest doing this?
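
One approach, building on the $(command) substitution shown in the README's Heroku example: keep the URL in an environment variable so only a reference is committed (the variable name here is an assumption):

from: $(echo $PRODUCTION_DATABASE_URL)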

Wrap TRUNCATE in transaction

I could be understanding this incorrectly, but it seems that when I type the filter syntax for pgsync'ing a single record incorrectly (e.g. trying to sync a user via users:123), pgsync instead decides to try to sync the entire users table. This can be cancelled with Ctrl-C, but unfortunately pgsync also appears to TRUNCATE the users table on localhost before it starts syncing, and the Ctrl-C does not roll back the truncate. As a result, it is immediately and irrecoverably destructive to make a typo in pgsync. According to the Postgres docs, TRUNCATE can be wrapped in a DDL transaction. I assume the multi-user sync is already wrapped in a transaction (because when I Ctrl-C I don't end up with only some users, I always end up with 0 users), so perhaps it would be possible to include the TRUNCATE in this same transaction?

I apologize in advance for not having the time to read through the code and possibly having made multiple incorrect assumptions about pgsync or Postgres in this report.
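
For reference, Postgres does allow TRUNCATE inside a transaction, so a rollback restores the rows. A minimal psql sketch:

BEGIN;
TRUNCATE users;
-- load replacement rows here; if the session is interrupted before COMMIT,
-- the truncate is rolled back along with everything else
ROLLBACK;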

schema sync fails because of views

I'm working with a database that has several views. I typically run --schema-only, followed by a few commands to sync by group. However, on this particular database, some tables can't be dropped because they're in use by views. This means pgsync will fail to update the schema of those tables. In this case, with some tables modified and others not, it becomes difficult to get the database back to a correct state (rails migrations fail and manual work is required).

Is there a way to work around this within pgsync? Or should I plan to drop and recreate views before and after each sync?

Feature Request: Show row count in output

When you sync 0 rows from the remote database, pgsync's output does not signify this in any way (when using groups defined in yml file). It merrily prints out all of the dependent table names and timings as if it were successful, but when you check via Ruby/PG console you don't actually end up with any records. Would be great if pgsync could print the # of rows successfully synced.

Passing mutiple ids for groups

Hello, I'm trying to pass multiple ids to one group, with no luck:

pgsync product:(1,2,3)

groups:
  product:
    products: "where id IN {1}"
    reviews: "where product_id IN {1}"
    coupons: "where product_id IN {1} order by created_at desc limit 10"
    stores: "where id in (select store_id from products where id IN {1})"

[Question] Copy enum types

How to transfer all enum types?

I get an error

pg_restore: [archiver (db)] could not execute query: ERROR:  type "public.user_language" does not exist
LINE 25:     language public.user_language DEFAULT 'ru'::public.user_...

I have a large number of enum records; how can I copy them into an empty table?

SSH to AWS RDS

What is the recommended way to have Pgsync SSH into a server before connecting to the Postgres database?
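
The README's Connection Security section recommends SSH; a common pattern is a local port forward through a bastion host, then pointing pgsync at the tunnel (hostnames and ports here are assumptions):

# forward local port 5433 to the RDS instance through a bastion
ssh -N -L 5433:mydb.xxxxxxxx.us-east-1.rds.amazonaws.com:5432 user@bastion-host
# then sync through the tunnel
pgsync --from postgres://user:password@localhost:5433/mydb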

table names with non-word characters not supported

Our db has a table named "app*domain" which pgsync fails on with the error:

pgsync-0.3.6/lib/pgsync.rb:266:in `exec': PG::SyntaxError: ERROR: syntax error at or near "domain" (Parallel::UndumpableException)
LINE 1: TRUNCATE app*domain CASCADE
^
The solution would presumably be to always quote table names, à la blazer.

Which table is source?

Which table is the source? I'm trying pgsync table1,table2, and the result is always that records from table1 are deleted, because table2 is still empty. Shouldn't this sync records from table1 to table2?

Getting errors on alpine 3.10 with ruby-2.5.5

I'm trying to deploy pgsync in docker alpine v3.10. This version of alpine ships ruby v2.5.5, which is incompatible with the current pgsync version. BTW, it works with no issues on alpine 3.7 with ruby 2.4.6.

/ # pgsync
Traceback (most recent call last):
	3: from /usr/bin/pgsync:23:in `<main>'
	2: from /usr/bin/pgsync:23:in `load'
	1: from /usr/lib/ruby/gems/2.5.0/gems/pgsync-0.4.1/exe/pgsync:3:in `<top (required)>'
/usr/lib/ruby/gems/2.5.0/gems/pgsync-0.4.1/exe/pgsync:6:in `rescue in <top (required)>': uninitialized constant PgSync (NameError)

Merging multiple DBs?

Can pgsync be used to merge multiple databases? So if I have these DBs all with the same schemas:

DB1 with tables that use UUIDs as keys with data for company 1
DB2 with tables that use UUIDs as keys with data for company 2
DB3 where I want all data for company 1 and 2 in

I'd like to merge DB1 and DB2 together into DB3.

Disable trigger

Hello, I have a problem doing a table sync.
My table is called cards and has a column called cover_id, which can be null. The problem occurs when it holds the id of an image: the sync fails because it says it can't find the FK in the images table. But the images table references the cards table, so I run into the same problem when syncing the images table.

I was able to solve this problem by disabling the triggers on the cards table:
ALTER TABLE cards DISABLE TRIGGER ALL;

It would be cool if you could do something like
pgsync cards --preserve --disable-trigger

Sorry for my English.

Syncing individual rows fails every time.

I didn't know a clear way to succinctly word this in the title, but the issue I'm having affects anything where I give a "SQL template" (for lack of a better term) to a named group.

For example,

💻  pgsync review:1
From: performance on performance-prod-01.ckxih48awmiq.us-west-2.rds.amazonaws.com:5432
To: performance_development_clone on localhost:5432
Table does not exist in source: {"review_workflows"=>"where id = {1}"}
💻  pgsync review_workflows
From: performance on performance-prod-01.ckxih48awmiq.us-west-2.rds.amazonaws.com:5432
To: performance_development_clone on localhost:5432
* Syncing review_workflows
NOTICE:  truncate cascades to table "performance_reviews"
* DONE review_workflows (5.4s)
Completed in 6.5s

In the first instance it claims that the review_workflows table does not exist, but then when I specify that table, it works, so it's clearly there and pgsync is able to find it.

Here's the relevant section of my .pgsync.yaml:

groups:
  review:
    - review_workflows: "where id = {1}"

Remove Heroku Gem Dependency

On April 17th the Legacy API for Heroku will be sunset. I would love to see this gem continue supporting Heroku after that date. It looks like there is a new gem implemented to cover interaction with Heroku without having the Heroku Toolbelt installed. I haven't taken a super long look at it, but at a glance it looks like it can replace the Heroku gem here.

Link to gem: https://github.com/heroku/platform-api
API reference: https://devcenter.heroku.com/articles/platform-api-reference
