
sda-services-backup

Elasticsearch, PostgreSQL and MongoDB backups

Build the app

go build -ldflags "-extldflags -static" -o backup-svc . 

Configuration

The specific config file to be used can be set via the environment variable CONFIGFILE, which holds the full path to the config file.

Every value in the config file can also be set as an environment variable, with _ as the separator between levels, e.g. the S3 accesskey can be set as S3_ACCESSKEY. Environment variables override values set in the config file.
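For example, a minimal sketch of both mechanisms (the config path and all values are illustrative; the variable names just follow the separator rule above):

export CONFIGFILE=/etc/backup-svc/config.yaml
export S3_ACCESSKEY=my-access-key    # overrides s3.accesskey from the config file
export ELASTIC_PASSWORD=new-password # overrides elastic.password
./backup-svc --action es_backup --name my-index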

For a complete example of configuration options see the example at the bottom.

For deploying the backup service, see the deployment example.

Create a crypt4gh key pair

The key pair can be created using the crypt4gh tool

Golang version

crypt4gh -n "key-name" -p "passphrase"

Python version

crypt4gh-keygen --sk private-key.sec.pem --pk public-key.pub.pem

Elasticsearch

Backing up encrypted index to S3

./backup-svc --action es_backup --name INDEX-NAME
  • the --name value can be a glob, e.g. `*INDEX-NAME*`
  • the backup will be stored in S3 as FULL-ES-INDEX-NAME.bup

Verify that the backup worked:

s3cmd -c PATH_TO_S3CONF_FILE ls s3://BUCKET-NAME/*INDEX-NAME
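For instance, with a bucket named my-backup-bucket and indices matching logs-* (bucket name and listing output are illustrative):

s3cmd -c ~/.s3cfg ls "s3://my-backup-bucket/*logs-*"
# 2024-01-15 10:32   1048576   s3://my-backup-bucket/logs-2024.01.14.bup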

Restoring index from S3 to ES

./backup-svc --action es_restore --name S3-OBJECT-NAME

Create some indices in ES (only for testing)

./backup-svc --action es_create --name INDEX-NAME
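Putting the three actions together, an illustrative round trip (index and object names are examples; the stored object name may also include the configured filePrefix):

./backup-svc --action es_create --name test-index        # create a test index
./backup-svc --action es_backup --name "test-index*"     # back it up to S3
./backup-svc --action es_restore --name test-index.bup   # restore from the S3 object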

Postgres backup

Backing up a database

Dump

  • backup will be stored in S3 in the format of YYYYMMDDhhmmss-DBNAME.sqldump
./backup-svc --action pg_dump
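To check that the dump landed in S3, the same s3cmd approach as in the Elasticsearch section works (config path and bucket name as above):

s3cmd -c PATH_TO_S3CONF_FILE ls s3://BUCKET-NAME/ | grep .sqldump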

Pg_basebackup

  • backup will be stored in S3 in the format of YYYYMMDDhhmmss-DBNAME.tar
docker container run --rm -i --name pg-backup --network=host $(docker build -f dev_tools/Dockerfile-backup -q -t backup .) /bin/sda-backup --action pg_basebackup
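The one-liner above builds the image and runs the resulting image ID in one step; an equivalent two-step form may be easier to debug (the image tag backup comes from the build command itself):

docker build -f dev_tools/Dockerfile-backup -t backup .
docker container run --rm -i --name pg-backup --network=host backup /bin/sda-backup --action pg_basebackup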

NOTE

This type of backup runs through a Docker container because of compatibility issues that can arise between PostgreSQL 13 running in the db container and the locally installed version.

Restoring a database

Restore dump file

  • The target database must exist when restoring the data.
./backup-svc --action pg_restore --name PG-DUMP-FILE
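Since the target database must already exist, an illustrative sequence (host, user, database and dump file names are examples; the dump name follows the YYYYMMDDhhmmss-DBNAME.sqldump pattern above):

psql -h pg.example.com -U db-user -d postgres -c 'CREATE DATABASE "database-name";'
./backup-svc --action pg_restore --name 20240115103000-database-name.sqldump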

Restore from physical copy

This is done in several stages (a consolidated sketch of the whole flow follows the list):

  • The target database must be stopped before restoring it.

  • Create a docker volume for the physical copy.

  • Get the physical copy from S3 and unpack it into the docker volume created in the previous step:

docker container run --rm -i --name pg-backup --network=host -v <docker-volume>:/home $(docker build -f dev_tools/Dockerfile-backup -q -t backup .) /bin/sda-backup --action pg_db-unpack --name TAR-FILE

  • Copy the backup from its docker volume to the pgdata directory in the database's docker volume:

docker run --rm -v <docker-volume>:/pg-backup -v <database-docker-volume>:/pg-data alpine cp -r /pg-backup/db-backup/ /pg-data/<target-pgdata>/

  • Start the database container.
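A consolidated sketch of the whole flow, assuming a database container named target-db with data volume target-db-data, a scratch volume pg-restore, and pgdata mounted at /pg-data/pgdata (all names, and the tar file name, are illustrative):

docker stop target-db                    # stop the target database
docker volume create pg-restore          # scratch volume for the physical copy
docker container run --rm -i --name pg-backup --network=host -v pg-restore:/home $(docker build -f dev_tools/Dockerfile-backup -q -t backup .) /bin/sda-backup --action pg_db-unpack --name 20240115103000-database-name.tar
docker run --rm -v pg-restore:/pg-backup -v target-db-data:/pg-data alpine cp -r /pg-backup/db-backup/ /pg-data/pgdata/
docker start target-db                   # start the database again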

NOTE

As in the Pg_basebackup section, a Docker container is used here for the same compatibility reason.

MongoDB

Backing up a database

  • backup will be stored in S3 in the format of YYYYMMDDhhmmss-DBNAME.archive
./backup-svc --action mongo_dump --name <DBNAME>

Restoring a database

./backup-svc --action mongo_restore --name MONGO-ARCHIVE-FILE
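An illustrative round trip (database name and timestamp are examples; the archive name follows the YYYYMMDDhhmmss-DBNAME.archive pattern above):

./backup-svc --action mongo_dump --name mydb
s3cmd -c PATH_TO_S3CONF_FILE ls s3://BUCKET-NAME/ | grep mydb.archive   # verify the upload
./backup-svc --action mongo_restore --name 20240115103000-mydb.archive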

Example configuration file

crypt4ghPublicKey: "publicKey.pub.pem"
crypt4ghPrivateKey: "privateKey.sec.pem"
crypt4ghPassphrase: ""
loglevel: debug
s3:
  url: "FQDN URI" #https://s3.example.com
  #port: 9000 #only needed if the port differs from the standard HTTP/HTTPS ports
  accesskey: "accesskey"
  secretkey: "secret-accesskey"
  bucket: "bucket-name"
  #cacert: "path/to/ca-root"
elastic:
  host: "FQDN URI" # https://es.example.com
  #port: 9200 # only needed if the port differs from the standard HTTP/HTTPS ports
  user: "elastic-user"
  password: "elastic-password"
  #cacert: "path/to/ca-root"
  batchSize: 50 # how many documents to retrieve from Elasticsearch at a time, default 50 (should probably be at least 2000)
  filePrefix: "" # can be an empty string; useful if an index has been written to and you want to back up a new copy
db:
  host: "hostname or IP" #pg.example.com, 127.0.0.1
  #port: 5432 #only needed if the postgresql database listens on a different port
  user: "db-user"
  password: "db-password"
  database: "database-name"
  #cacert: "path/to/ca-root"
  #clientcert: "path/to/clientcert" #only needed if sslmode = verify-peer
  #clientkey: "path/to/clientkey" #only needed if sslmode = verify-peer
  #sslmode: "verify-peer"
mongo:
  host: "hostname or IP with portnuber" #example.com:portnumber, 127.0.0.1:27017
  user: "backup"
  password: "backup"
  authSource: "admin"
  replicaset: ""
  #tls: true
  #cacert: "path/to/ca-root" #optional
  #clientcert: "path/to/clientcert" # needed if tls=true


Issues

ES Backup: Compress backups

So they don't take up so much space in S3.

First compress, then encrypt: encryption randomises the bits, which makes compressing afterwards ineffective (see the illustrative pipeline after the acceptance criteria).

Acceptance criteria:

  • future-proof format: compression and decompression use a file format that we expect to last
  • expected: approximately the same compression level as gzip
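A minimal sketch of the compress-then-encrypt order, using the Python crypt4gh CLI from the key-pair section (file names are examples):

gzip -c index-dump.json | crypt4gh encrypt --recipient_pk public-key.pub.pem > index-dump.json.gz.c4gh
# decryption reverses the order:
crypt4gh decrypt --sk private-key.sec.pem < index-dump.json.gz.c4gh | gunzip > index-dump.json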

Replace AES encryption with asymmetric encryption

Using symmetric encryption means that the decryption key needs to live in the cluster, which might be a security issue.

We should replace it with asymmetric encryption so that only the public key needs to be available in the system.

Streaming elastic store

Currently, in the sda-backup tool, all elastic indices are dumped into a buffer before the logs are extracted.

DoD: code that does not load all the indices into memory in one go, but rather processes them in a streaming fashion, reading until it's done.

ES backup doesn't fail if it can't find the correct index.

If the index is called foo-123 and backup-svc is started with the --name flag set to "bar-*", no error is generated and the app exits cleanly.
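An illustrative reproduction of the behaviour described above (names taken from the example):

./backup-svc --action es_backup --name "bar-*"   # no index matches; app still exits cleanly
echo $?                                          # prints 0 today, but should be non-zero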

A/C:
Exit with an error if the expected index can't be found.

Description:
If the Elasticsearch API returns non-200 for the index-existence check, backup-svc should exit with a non-zero exit code so that we can get error messages from Kubernetes. There could be other cases where backup-svc exits with success (i.e. zero) even though the connection to Elasticsearch fails.

backup-svc should only return 0 after a successful backup.

If some case is very hard to test for, we don't have to write an integration test for that.
