Giter VIP home page Giter VIP logo

pgpipe's Introduction

PgPipe

PgPipe is a Python package designed to efficiently synchronize data between two PostgreSQL databases. It is specifically designed to handle large amounts of data by dividing the transfer process into smaller, manageable chunks.

Features

  • Efficient data transfer using the COPY command
  • Divide data into smaller chunks for improved performance
  • Track transfer progress and allow restarting from checkpoints
  • Handle foreign key dependencies by ensuring the correct order of table transfers

Installation

To install PgPipe, simply use pip:

pip install pgpipe

Usage

The PgPipe tool provides a command line interface (CLI) for synchronizing data between two PostgreSQL databases. The following sections will guide you on how to use this tool effectively.

Setting up the CLI

PgPipe requires you to specify several settings to connect to your PostgreSQL databases and define the synchronization parameters. These settings can be provided through command line arguments or environment variables. Here is the list of command line arguments you can use:

  • --log-level: Set the log level. Default is 'INFO'.
  • --schemas: Specify the list of schemas. Default is 'public'.
  • --chunk-size-offset: Set the chunk size for offset-based chunking. Default is 50000.
  • --chunk-size-timedelta: Set the chunk size for time-range-based chunking in days. Default is 1.
  • --db-schema-path: Specify the path to the DB schema file. Default is './db_schema.json'.
  • --chunk-data-path: Specify the path to the chunk data file. Default is './chunk_data.json'.
  • --source-dsn: Specify the source database DSN. If not set, it is read from the environment variable 'PGPIPE_SOURCE_DSN'.
  • --destination-dsn: Specify the destination database DSN. If not set, it is read from the environment variable 'PGPIPE_DESTINATION_DSN'.

Running the Synchronization Steps

The synchronization process is divided into three steps, each corresponding to a command:

  1. Create Database Schema: To execute this step, run the create_db_schema command. This will read the source database metadata and store it in the DB schema file specified by --db-schema-path.
pgpipe create_db_schema --source-dsn="your_source_dsn" --db-schema-path="your_db_schema_path"
  1. Prepare Chunks: Run the prepare_chunks command to prepare chunk metadata for data transfer. The metadata will be saved to the chunk data file specified by --chunk-data-path.
pgpipe prepare_chunks --source-dsn="your_source_dsn" --db-schema-path="your_db_schema_path" --chunk-data-path="your_chunk_data_path" --chunk-size-offset=50000 --chunk-size-timedelta=1
  1. Transfer Data: The transfer_data command starts the data transfer process, using the prepared metadata to guide the process.
pgpipe transfer_data --source-dsn="your_source_dsn" --destination-dsn="your_destination_dsn" --chunk-data-path="your_chunk_data_path"

If no command is specified, the CLI will print a help message with a list of available commands and settings.

License

PgPipe is released under the MIT License. See the LICENSE file for more details.

pgpipe's People

Contributors

elboerto avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.