
AI-generated logo that confers more legitimacy than this project deserves

pg_branch

Pre-release Checks

A Postgres extension for quickly creating "branches" of individual databases within a Postgres cluster using copy-on-write file systems like BTRFS.

Table of Contents

  1. Introduction
  2. Getting Started
  3. Future Work
  4. Similar Projects

Introduction

Warning: This is pre-alpha software, meant as an experimental proof-of-concept. Use at your own risk!

Postgres makes it easy to create new, empty databases with the CREATE DATABASE command. It's so easy, in fact, that one would think that creating new databases from existing databases would be easy, too. But it's not.

Postgres provides the option to create one database from another using CREATE DATABASE name [WITH] [TEMPLATE template], but doing so has two major restrictions:

  1. there can be no active connections to the template database, and...
  2. performance degrades rapidly as the size of the database increases
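Restriction (1) is easy to reproduce: any open connection to the template blocks the copy. A sketch with two psql sessions (the database names here are hypothetical):

```shell
# Session A: hold a connection open to the would-be template database.
psql -d bigdb -c 'SELECT pg_sleep(60);' &

# Session B: attempting to copy it now fails with an error along the lines of
# ERROR:  source database "bigdb" is being accessed by other users
psql -d postgres -c 'CREATE DATABASE bigdb_copy WITH TEMPLATE bigdb;'
```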

pg_branch is a Postgres extension that solves those problems by giving CREATE DATABASE the power of snapshots. If your PGDATA directory is on a copy-on-write file system like BTRFS, the pg_branch extension turns every CREATE DATABASE into an atomic file system snapshot that takes seconds instead of minutes (or hours). In addition, the copy-on-write strategy keeps disk usage low by only writing new segment data files to disk when they're modified (rather than read).
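Conceptually, the underlying file system operation looks something like the snapshot below. This is only a sketch of the idea, not the extension's actual code path, and the OIDs are made up for illustration:

```shell
# Each database lives in $PGDATA/base/<oid>. If that directory is a btrfs
# subvolume, "copying" the database reduces to one atomic snapshot:
sudo btrfs subvolume snapshot "$PGDATA/base/16384" "$PGDATA/base/16385"
```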

TL;DR: CREATE EXTENSION pg_branch makes CREATE DATABASE <dbname> WITH TEMPLATE <bigdbname> super fast

Getting Started

Before installing pg_branch, it's important to configure the file system that the database cluster will use. The following steps will get you started:

Disclaimer: these steps are written with Linux in mind, and have only been tested on Linux.

  1. install prerequisites

    You'll need an installation of btrfs (usually packaged as btrfs-progs) as well as an up-to-date Rust toolchain and the pgrx subcommand for cargo.
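On a Debian/Ubuntu-style system, that might look like the following (package names and the rustup bootstrap vary by distro):

```shell
sudo apt install btrfs-progs                          # btrfs userspace tools
curl --proto '=https' -sSf https://sh.rustup.rs | sh  # Rust toolchain via rustup
cargo install --locked cargo-pgrx                     # adds the `cargo pgrx` subcommand
```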

  2. format a disk as BTRFS

    The easiest thing to do here is plug in a USB drive and check which disk it is with lsblk. Once you've figured out which disk you'd like to reformat, you can do so with:

    sudo mkfs.btrfs /dev/sdX # replace sdX with your drive
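If you're unsure which device the USB drive is, narrowing lsblk's output helps; the sample device in the comment below is made up:

```shell
# List whole disks only, with the transport type to spot USB devices:
lsblk -d -o NAME,SIZE,MODEL,TRAN
# A USB stick typically shows "usb" in the TRAN column, e.g. a line like:
#   sdb   28.9G  Ultra USB 3.0  usb
```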
  3. mount your btrfs-formatted disk

    You need a directory to mount this disk to, first. Something like:

    sudo mkdir /mnt/database

    ...which you can then use as a mount point for your new btrfs drive with:

    sudo mount /dev/sdX /mnt/database
  4. initialize a Postgres cluster on your mounted disk

    cargo pgrx can take care of initialization as long as it knows where to initialize the data through the PGRX_HOME variable. Something like:

    PGRX_HOME=/mnt/database cargo pgrx init
  5. clone this repo

    The rest of these steps will be done from within this repo, so make sure you've run git clone git@github.com:NAlexPear/pg_branch.git and cd pg_branch.

  6. convert all segment data directories to subvolumes

    Before pg_branch can take over database creation, the subdirectories in the newly-initialized data directory of your database cluster need to be converted to btrfs subvolumes. This repo provides an init.sh script that does just that, as long as it's provided a PGDATA variable pointing to the data directory of your cluster.

    pgrx data directories have a structure of $PGRX_HOME/data-$PG_VERSION. So if you initialized your cluster as instructed in step 4, you should be able to run the init.sh script in this repository like so:

    PGDATA=/mnt/database/data-15 ./init.sh

    ...and you should have successfully converted all of the initial databases in your cluster to subvolumes.
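For the curious: a directory can't be turned into a subvolume in place, so scripts like this typically create a subvolume next to each database directory, reflink-copy the contents across, and swap the two. A rough sketch of that idea (not init.sh's actual contents; only ever run something like this against a stopped cluster):

```shell
# Convert every per-database directory under $PGDATA/base to a subvolume.
for dir in "$PGDATA"/base/*/; do
    oid=$(basename "$dir")
    sudo btrfs subvolume create "$PGDATA/base/$oid.subvol"
    # reflink copies are nearly free on btrfs
    sudo cp -a --reflink=always "$dir." "$PGDATA/base/$oid.subvol/"
    sudo mv "$PGDATA/base/$oid" "$PGDATA/base/$oid.old"
    sudo mv "$PGDATA/base/$oid.subvol" "$PGDATA/base/$oid"
    sudo rm -rf "$PGDATA/base/$oid.old"
done
```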

  7. get into psql

    The quickest way to jump into a psql session that recognizes pg_branch is to run the following:

    PGRX_HOME=/your/mounted/btrfs/disk cargo pgrx run
  8. create the extension in psql with CREATE EXTENSION pg_branch

  9. create some databases

    After creating the extension, you can run CREATE DATABASE <dbname> WITH TEMPLATE <template_dbname> commands to quickly and atomically copy databases without requiring an exclusive lock or dedicated connection. To use the default CREATE DATABASE behavior again, pick an explicit STRATEGY other than SNAPSHOT (i.e. WAL_LOG or FILE_COPY).
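In a psql session that might look like the following (the database names are placeholders):

```shell
# With pg_branch installed, template copies become snapshots:
psql -c 'CREATE DATABASE fast_copy WITH TEMPLATE big_db;'

# Opting back into stock Postgres behavior with an explicit strategy:
psql -c 'CREATE DATABASE slow_copy WITH TEMPLATE big_db STRATEGY WAL_LOG;'
```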

Future Work

  1. distribute as pre-compiled extension
  2. implement a cluster-wide fork
  3. support more of the options supported by CREATE DATABASE
  4. streamline setup of the data directory and its file system
  5. support additional copy-on-write file systems like ZFS and XFS
  6. include an example Dockerfile

Similar Projects

This project's use of file system snapshots as a branching mechanism is heavily inspired by pgcow and Postgres.ai. And credit for the concept of "forking" Postgres clusters goes to Heroku's Database Fork feature.

pg_branch's People

Contributors

nalexpear


pg_branch's Issues

TimescaleDB CE Testing

I realize this is pre-alpha, so this is not an urgent issue, but testing this alongside TimescaleDB Community Edition (especially compression features) would be great to have as a "known extension compatibility" note, especially since branching instead of cloning is even more valuable at the scale of data that Timescale is designed to manage.

Neon's branching looks great, but it only supports Timescale's Apache features (so no compression) which is a no-go for us.

I'll try to update with some initial findings if I find some spare time.

I'm not enough of a Postgres expert to know whether there's a fundamental reason why something like compression or other Timescale features would be broken by branching, so if anyone knows better, I'd be very interested in the info.

Anyway, thanks for the extension; it's very exciting!
