Postgres-XL Docker

Postgres-XL Docker is a Docker image source for Postgres-XL, the PostgreSQL-based database cluster. The image is based on CentOS and contains compiled binaries for both the modified PostgreSQL and the pgxc_ctl program. An OpenSSH server also runs, so that pgxc_ctl can control the nodes.

The image allows for an arbitrary database cluster topology, starting from an empty configuration and allowing GTM, GTM Proxy, Coordinator, and Datanode nodes to be added as desired. Each service should run in its own container, with an additional Control container recommended for running pgxc_ctl and storing cluster configurations. The Control container only needs to run while the pgxc_ctl program is actually in use.

Usage

Instructions are for running on Docker Swarm using services and overlay networking. Services are constrained to specific nodes to ensure the availability of mounted volumes. If you're using Flocker or an alternative solution, you shouldn't need to do this. Also, if you have an alternative solution for DNS resolution which survives service container restarts, you might be able to reduce the number of services defined.

Note that the pg_hba.conf written is wide-open for any user on that network; if you use this method, be sure that you trust all users on that network, and isolate client connections using another network. Alternatively, you might like to configure ident or md5.
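As a rough illustration of the md5 alternative, the sketch below prints a pg_hba.conf entry that requires password authentication for a separate client subnet. The helper name pg_hba_md5_entries and the example subnet are hypothetical, and this has not been tested against the image; how the entry reaches pg_hba.conf depends on the image's init.sh and your pgxc_ctl configuration. Node-to-node connections inside the cluster may still rely on trust, so this is aimed at client connections only.

```shell
# Hypothetical helper: print a pg_hba.conf entry requiring md5 password
# authentication for all databases and users connecting from one subnet.
pg_hba_md5_entries() {
    client_subnet=$1    # e.g. the subnet of a separate client network
    printf 'host all all %s md5\n' "$client_subnet"
}
```

For example, pg_hba_md5_entries 10.0.0.0/24 prints a single host line for that subnet, which you could then append to whichever extra pg_hba file your setup uses.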

Networking

Create an overlay network:

pg_cluster=postgresql-1

docker network create \
    -d overlay \
    --internal \
    --opt encrypted \
    $pg_cluster

Control

Create a Control container, with write-access to the SSH volume:

pg_image=tiredpixel/postgres-xl
pg_data=/var/lib/postgres
pg_user=postgres
pg_cluster=postgresql-1
pg_service=ctl-1

docker service create \
    --name $pg_cluster-$pg_service \
    --network $pg_cluster \
    --mount type=volume,src=$pg_cluster-$pg_service,dst=$pg_data \
    --mount type=volume,src=$pg_cluster-ssh,dst=$pg_data/.ssh \
    --constraint node.hostname==$(hostname) \
    $pg_image

Initialise the Control container, optionally passing in the network subnet if you'd like to allow all users on that network to connect to any database (!). If you're using multiple Swarm nodes and not using Flocker or similar, you'll either need to replicate the SSH keys yourself, or alternatively generate multiple keys and add them to .ssh/authorized_keys yourself as appropriate. The Control container must be able to reach every node over SSH without a password.

pg_subnet=$(docker network inspect -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}' $pg_cluster | tr -d '\n')

docker exec -it -u $pg_user $(docker ps -q -f name=$pg_cluster-ctl) $pg_data/init.sh $pg_subnet
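Once the other cluster containers exist, passwordless access can be checked from inside the Control container with something like the sketch below. The function name check_ssh is hypothetical. BatchMode=yes makes ssh fail rather than prompt for a password; note that a first connection might also fail on host-key verification, depending on how the image configures SSH.

```shell
# Hypothetical helper: try a no-op command on each host over SSH,
# without allowing any interactive password prompt.
check_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    # Non-zero exit status if any host could not be reached.
    return $failed
}
```

Run as the postgres user, for example: check_ssh postgresql-1-gtm-m-1 postgresql-1-coord-m-1 postgresql-1-data-m-1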

Cluster

Create GTM, GTM Proxy, Coordinator, and Datanode containers as desired, with read-only access to the SSH volume. These should be created from the same image as the Control container, but have independent data volumes.

Create cluster containers, adjusting pg_service or running on different Swarm nodes as desired:

pg_image=tiredpixel/postgres-xl
pg_data=/var/lib/postgres
pg_user=postgres
pg_cluster=postgresql-1

for pg_service in gtm-m-1 coord-m-1 data-m-1 # ...
do
docker service create \
    --name $pg_cluster-$pg_service \
    --network $pg_cluster \
    --mount type=volume,src=$pg_cluster-$pg_service,dst=$pg_data \
    --mount type=volume,src=$pg_cluster-ssh,dst=$pg_data/.ssh,readonly \
    --constraint node.hostname==$(hostname) \
    $pg_image
done
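For larger topologies, the list of service names for the loop above can be generated rather than typed. This is a minimal sketch assuming one GTM master and equal numbers of Coordinators and Datanodes, following the naming used in this README; the function name pg_services is hypothetical.

```shell
# Hypothetical helper: print service names for one GTM master plus
# n Coordinator masters and n Datanode masters, one name per line.
pg_services() {
    n=$1
    echo "gtm-m-1"
    i=1
    while [ "$i" -le "$n" ]; do
        echo "coord-m-$i"
        echo "data-m-$i"
        i=$((i + 1))
    done
}
```

The loop header then becomes: for pg_service in $(pg_services 3)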

Check the status of the created services, each of which should boot a single instance:

docker service ls -f name=$pg_cluster

ID            NAME                    REPLICAS  IMAGE               COMMAND
aaaaaaaaaaa1  postgresql-1-coord-m-1  1/1       postgres-xl:latest  
aaaaaaaaaaa2  postgresql-1-coord-m-2  1/1       postgres-xl:latest  
aaaaaaaaaaa3  postgresql-1-coord-m-3  1/1       postgres-xl:latest  
aaaaaaaaaaa4  postgresql-1-ctl-1      1/1       postgres-xl:latest  
aaaaaaaaaaa5  postgresql-1-data-m-1   1/1       postgres-xl:latest  
aaaaaaaaaaa6  postgresql-1-data-m-2   1/1       postgres-xl:latest  
aaaaaaaaaaa7  postgresql-1-data-m-3   1/1       postgres-xl:latest  
aaaaaaaaaaa8  postgresql-1-gtm-m-1    1/1       postgres-xl:latest

Initialisation

Initialise the cluster using pgxc_ctl commands:

pg_user=postgres
pg_cluster=postgresql-1

docker exec -it $(docker ps -q -f name=$pg_cluster-ctl) su $pg_user

pgxc_ctl

add gtm master gtm_m_1 postgresql-1-gtm-m-1 5432 /var/lib/postgres/data/gtm

add coordinator master coord_m_1 postgresql-1-coord-m-1 5432 15432 /var/lib/postgres/data/coord         coordExtraConfig    coordExtraPgHba
add datanode    master data_m_1  postgresql-1-data-m-1  5432 15432 /var/lib/postgres/data/data  none datanodeExtraConfig datanodeExtraPgHba
add coordinator master coord_m_2 postgresql-1-coord-m-2 5432 15432 /var/lib/postgres/data/coord         coordExtraConfig    coordExtraPgHba
add datanode    master data_m_2  postgresql-1-data-m-2  5432 15432 /var/lib/postgres/data/data  none datanodeExtraConfig datanodeExtraPgHba
add coordinator master coord_m_3 postgresql-1-coord-m-3 5432 15432 /var/lib/postgres/data/coord         coordExtraConfig    coordExtraPgHba
add datanode    master data_m_3  postgresql-1-data-m-3  5432 15432 /var/lib/postgres/data/data  none datanodeExtraConfig datanodeExtraPgHba
# ...

monitor all

Running: gtm master
Running: coordinator master coord_m_1
Running: coordinator master coord_m_2
Running: coordinator master coord_m_3
Running: datanode master data_m_1
Running: datanode master data_m_2
Running: datanode master data_m_3
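The add commands above follow a regular pattern, so they can be generated for any number of nodes and pasted into the pgxc_ctl prompt. This is a sketch that simply mirrors the arguments shown above (ports, data directories, and the extra config names); the function name pgxc_add_commands is hypothetical.

```shell
# Hypothetical helper: print pgxc_ctl "add" commands for one GTM master
# plus n Coordinator masters and n Datanode masters.
pgxc_add_commands() {
    cluster=$1
    n=$2
    dir=/var/lib/postgres/data
    echo "add gtm master gtm_m_1 $cluster-gtm-m-1 5432 $dir/gtm"
    i=1
    while [ "$i" -le "$n" ]; do
        echo "add coordinator master coord_m_$i $cluster-coord-m-$i 5432 15432 $dir/coord coordExtraConfig coordExtraPgHba"
        echo "add datanode master data_m_$i $cluster-data-m-$i 5432 15432 $dir/data none datanodeExtraConfig datanodeExtraPgHba"
        i=$((i + 1))
    done
}
```

For the three-node example in this README, run pgxc_add_commands postgresql-1 3 and review the output before pasting it.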

Restart all containers, to allow Supervisor to control the processes. The type of node within the cluster is detected automatically.
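One possible way to restart everything on Swarm is to force an update of each service, which replaces its running container. This is a sketch only, assuming Docker 1.13 or later (where docker service update accepts --force) and that the named volumes are reattached on the same node as described above; the function name restart_cluster_services is hypothetical. It also restarts the Control service, since that matches the same name filter.

```shell
# Hypothetical helper: force-update every Swarm service whose name
# matches the cluster name, causing each container to be recreated.
restart_cluster_services() {
    cluster=$1
    for svc in $(docker service ls -q -f name="$cluster"); do
        docker service update --force "$svc"
    done
}
```

Run on a Swarm manager, for example: restart_cluster_services postgresql-1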

Verify that all containers have booted their services correctly by running monitor all in pgxc_ctl again.

After this point, it is no longer necessary to run a Control container. However, you might find it useful to keep for monitoring or modifying the cluster.

Verification

Verify that the cluster is operating as expected by logging into a Coordinator container:

pg_user=postgres
pg_cluster=postgresql-1

docker exec -it $(docker ps -q -f name=$pg_cluster-coord-m) su $pg_user

psql

SELECT * FROM pgxc_node;

 node_name | node_type | node_port |       node_host        | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+------------------------+----------------+------------------+-------------
 coord_m_1 | C         |      5432 | postgresql-1-coord-m-1 | f              | f                |   767536994
 data_m_1  | D         |      5432 | postgresql-1-data-m-1  | f              | f                |     3954295
 coord_m_2 | C         |      5432 | postgresql-1-coord-m-2 | f              | f                |   752803535
 data_m_2  | D         |      5432 | postgresql-1-data-m-2  | f              | f                |  1425277018
 coord_m_3 | C         |      5432 | postgresql-1-coord-m-3 | f              | f                | -2069407832
 data_m_3  | D         |      5432 | postgresql-1-data-m-3  | f              | f                |   686298590
(6 rows)

Thoughts

Do Coordinator nodes route to Datanodes constrained to be on the same Swarm host, or do they route cross-network expensively?
How can GTM Proxies be utilised efficiently?
What is the role of the pooler ports, and do they work properly?
How can the existing pg_hba.conf rules be improved to be more secure, not requiring trust on the subnet?
Do Coordinator and Datanode Slaves work correctly?
How can GTM Slaves and automated failover be achieved?

Blessing

May you find peace, and help others to do likewise.
