
Duplicacy Autobackup


Duplicacy Autobackup is a Docker image to easily perform automated backups. It uses duplicacy under the hood, and therefore supports:

  • Multiple storage backends: S3, Backblaze B2, Hubic, Dropbox, SFTP...
  • Client-side encryption
  • Deduplication
  • Multi-versioning
  • ... and more generally, all the features that duplicacy has.

Usage

The following environment variables can be used to configure the backup strategy.

  • BACKUP_NAME: The name of your backup (should be unique, e.g. prod-db-backups)
  • BACKUP_ENCRYPTION_KEY: An optional passphrase to encrypt your backups with before they are stored remotely.
  • BACKUP_SCHEDULE: Cron-like string defining how often backups should be made (e.g. 0 2 * * * for every day at 2am). Note that this schedule is interpreted in the UTC timezone.
  • BACKUP_LOCATION: Duplicacy URI of where to store the backups.
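
Since BACKUP_SCHEDULE is interpreted in UTC, convert your local wall-clock time before writing the cron expression. A sketch using GNU date (the timezone and date here are arbitrary examples):

```shell
# A 02:00 backup in Paris (UTC+1 in winter) must be scheduled at 01:00 UTC.
date -u -d 'TZ="Europe/Paris" 2024-01-15 02:00' +%H:%M   # prints 01:00
# The corresponding BACKUP_SCHEDULE is therefore: 0 1 * * *
```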

Additionally, the directory you want to backup must be mounted to /data on the container.

You need to provide credentials for the storage provider of your choice using the following environment variables:

  • AWS S3: AWS_ACCESS_KEY_ID and AWS_SECRET_KEY
  • Backblaze B2: B2_ID and B2_KEY
  • Dropbox: DROPBOX_TOKEN
  • Azure: AZURE_KEY
  • Google Cloud Datastore: GCD_TOKEN
  • SSH/SFTP: SSH_PASSWORD or SSH_KEY_FILE*
  • Hubic: HUBIC_TOKEN_FILE*
  • Google Cloud Storage: GCS_TOKEN_FILE*
  • Onedrive: ONEDRIVE_TOKEN_FILE*
  • Onedrive Business: ONEDRIVE_BUSINESS_TOKEN_FILE*
  • Wasabi: WASABI_KEY and WASABI_SECRET

Environment variables marked with an asterisk point to files. Those files must be mounted into the container so that they can be accessed from inside it.
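
For file-based variables, mount the file and point the variable at the in-container path. A hypothetical SSH example (the host key path, mount point, and SFTP server are arbitrary choices for illustration, not defaults of the image):

```shell
docker run -d --name duplicacy-autobackup \
    -v /backup/source:/data \
    -v ~/.ssh/backup_key:/run/secrets/backup_key:ro \
    -e BACKUP_NAME='my-backups' \
    -e BACKUP_LOCATION='sftp://user@sftp.example.com/backups' \
    -e BACKUP_SCHEDULE='0 2 * * *' \
    -e SSH_KEY_FILE='/run/secrets/backup_key' \
    ghcr.io/christophetd/duplicacy-autobackup:v1.4.0
```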

If you want to execute an out-of-schedule backup, you can do so by running the script /app/duplicacy-autobackup.sh inside the container:

$ docker exec duplicacy-autobackup /app/duplicacy-autobackup.sh backup

Example

Back up /var/lib/mysql to the S3 bucket xtof-db-backups in the AWS region eu-west-1 every night at 2:00am, and encrypt the backups with the passphrase correct horse battery staple:

$ docker run -d --name duplicacy-autobackup \
    -v /var/lib/mysql:/data \
    -e BACKUP_NAME='prod-db-backups' \
    -e BACKUP_LOCATION='s3://eu-west-1@amazon.com/xtof-db-backups' \
    -e BACKUP_SCHEDULE='0 2 * * *' \
    -e BACKUP_ENCRYPTION_KEY='correct horse battery staple' \
    -e AWS_ACCESS_KEY_ID='AKIA...' \
    -e AWS_SECRET_KEY='...' \
    ghcr.io/christophetd/duplicacy-autobackup:v1.4.0

Viewing and restoring backups

Backups are useless unless you make sure they work. This section shows how to list the files and versions of a backup made with duplicacy-autobackup, and how to restore it.

  • Install Duplicacy: download the latest Duplicacy binary from its GitHub page, and put it in your PATH

  • cd to a directory where you'll restore your files, e.g. /tmp/restore

  • Run duplicacy init backup_name backup_location, where backup_name and backup_location correspond to the BACKUP_NAME and BACKUP_LOCATION environment variables of your setup.

    • If you used client-side encryption, add the -encrypt flag: duplicacy init -encrypt backup_name backup_location

    You will get a prompt asking for your storage provider's credentials, and, if applicable, your encryption key:

    Enter S3 Access Key ID: *****
    Enter S3 Secret Access Key: *************
    Enter storage password for s3://eu-west-1@amazon.com/xtof-db-backups:*******************
    The storage 's3://eu-west-1@amazon.com/xtof-db-backups' has already been initialized
    Compression level: 100
    Average chunk size: 4194304
    Maximum chunk size: 16777216
    Minimum chunk size: 1048576
    Chunk seed: fc7e56fb91f8f66b01ba033ec6f7b128bcb3420c66a31468a4f3541407d569bd
    /tmp/restore will be backed up to s3://eu-west-1@amazon.com/xtof-db-backups with id db-backups
    
  • To list the versions of your backups, run:

    $ duplicacy list
    Storage set to s3://eu-west-1@amazon.com/xtof-db-backups
    Enter storage password:*******************
    Snapshot db-backups revision 1 created at 2018-04-19 09:47 -hash
    Snapshot db-backups revision 2 created at 2018-04-19 09:48 
    Snapshot db-backups revision 3 created at 2018-04-19 09:49 
    
  • To view the files of a particular revision, run:

    $ duplicacy list -files -r 2  # 2 is the revision number
  • To restore into the current directory all files matching *.txt from revision 2 of the backup, run:

    $ duplicacy restore -r 2 '*.txt'
  • To restore the whole of revision 2 into the current directory, run:

    $ duplicacy restore -ignore-owner -r 2
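
Putting the steps above together, a typical restore session might look like this (the backup name reuses the earlier example; substitute your own BACKUP_LOCATION URI, and drop -encrypt if you did not set BACKUP_ENCRYPTION_KEY):

```shell
# Restore into a scratch directory; duplicacy prompts for the storage
# credentials and, if applicable, the encryption passphrase.
mkdir -p /tmp/restore && cd /tmp/restore
duplicacy init -encrypt prod-db-backups '<your BACKUP_LOCATION URI>'
duplicacy list                        # note the revision number you want
duplicacy restore -ignore-owner -r 2  # restore revision 2 into /tmp/restore
```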
    

More: see Duplicacy's documentation.

Other options

You can have duplicacy-autobackup run a script before and after the backup process by mounting scripts at /scripts/pre-backup.sh and /scripts/post-backup.sh. For instance, if you're backing up a MySQL database, the pre-backup script can run a mysqldump into /data/mydb.sql. If pre-backup.sh exits with a non-zero status code, the backup is skipped until the next scheduled run.
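
As a sketch, a pre-backup hook for the MySQL case might look like this (the database host, user, and password variable are hypothetical):

```shell
#!/bin/sh
# Pre-backup hook: dump all databases into /data so the dump is part
# of the snapshot. A non-zero exit aborts the current backup run.
set -e
mysqldump --all-databases \
    -h db -u backup -p"$MYSQL_BACKUP_PASSWORD" \
    > /data/mydb.sql
```

Mount the file with -v $(pwd)/pre-backup.sh:/scripts/pre-backup.sh when starting the container.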

Use the following environment variables if you want to customize duplicacy's behavior.

  • BACKUP_IMMEDIATELY (yes/no): indicates if a backup should be performed immediately after the container is started. Equivalent to launching the container and then running docker exec duplicacy-autobackup /app/duplicacy-autobackup.sh backup. By default, no.
  • DUPLICACY_INIT_OPTIONS: options passed to duplicacy init the first time a backup is made. By default, -encrypt if BACKUP_ENCRYPTION_KEY is not empty.
  • DUPLICACY_BACKUP_OPTIONS: options passed to duplicacy backup when a backup is performed. By default: -threads 4 -stats. If you are backing up from a hard drive (rather than an SSD), it is recommended to use -threads 1 -stats instead (see here for more details).
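
For instance, to back up a spinning hard drive and run a first backup as soon as the container starts, these variables might be combined as follows (the Backblaze B2 bucket and credentials are illustrative):

```shell
docker run -d --name duplicacy-autobackup \
    -v /mnt/hdd/photos:/data \
    -e BACKUP_NAME='photos' \
    -e BACKUP_LOCATION='b2://photos-backups' \
    -e BACKUP_SCHEDULE='0 3 * * *' \
    -e BACKUP_IMMEDIATELY='yes' \
    -e DUPLICACY_BACKUP_OPTIONS='-threads 1 -stats' \
    -e B2_ID='...' \
    -e B2_KEY='...' \
    ghcr.io/christophetd/duplicacy-autobackup:v1.4.0
```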

Pruning old backups

Duplicacy offers an option to prune old backups. By default, duplicacy-autobackup does not perform any pruning. However, you can set the environment variables DUPLICACY_PRUNE_OPTIONS and PRUNE_SCHEDULE to perform automatic pruning. As an example, setting:

DUPLICACY_PRUNE_OPTIONS='-keep 0:360 -keep 30:180 -keep 7:30'
PRUNE_SCHEDULE='0 0 * * *'

means that:

  • Every day at midnight, the pruning process runs
  • When the pruning process runs...
    • Any backup older than 360 days is deleted from the remote storage
    • Only 1 backup per 30 days is kept for backups between 180 and 360 days old
    • Only 1 backup per 7 days is kept for backups between 30 and 180 days old
    • All backups newer than 30 days are kept
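
From a directory initialized as in the restore section, the same retention policy can be previewed by hand before enabling it in the container (a sketch; -dry-run shows what would be deleted without deleting anything, and -all applies the policy to all snapshot IDs):

```shell
duplicacy prune -dry-run -all -keep 0:360 -keep 30:180 -keep 7:30
```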

See the documentation of the prune command for further details.

Choosing the Duplicacy version

When building the image yourself, you can choose the Duplicacy version to include via the DUPLICACY_VERSION build argument, e.g.:

docker build --build-arg DUPLICACY_VERSION=2.1.0 -t christophetd/duplicacy-autobackup .

Disclaimer

This project uses Duplicacy, which is free for personal use but requires purchasing a license for non-trial commercial use. See the detailed terms here.

Contact

Feel free to open an issue for any suggestion or bug. You can also tweet @christophetd.

duplicacy-autobackup's People

Contributors

christophetd, comictypx, stenioaraujo, xiazy


duplicacy-autobackup's Issues

A couple notes/possible improvements

Here are a couple suggestions for possible improvements or notes to put in the README based on my experience of getting this up and running.

  • The cron job timing will use the UTC timezone since docker doesn't sync with the system clock. There are some workarounds that can take in an environment variable like this but the easiest workaround for me was just to set the timing based on UTC. I am PST and I want to run at midnight, so I just used 0 7 * * *. May be worth putting a note about this in the README if you aren't interested in adding the environment variable option.
  • I was getting very poor backup speeds until I removed -threads 4 so that duplicacy would run single threaded. With 4 threads I was getting about 100kB/s at steady state and with only a single thread I'm getting closer to 1MB/s. This post has some good info about this. Again, if you don't want to modify that in the Dockerfile since 4 threads works well for your needs, it may be worth at least adding a note to the README.

Can't find duplicacy preferences file

I am trying to set up this container and getting the following error in the docker logs: "Failed to read the preference file from repository /data: open /data/.duplicacy/preferences: no such file or directory". It looks like this is preventing the backup from running.

URGENT: backup currently broken

Hi, unfortunately the pull request I submitted broke backups that occur on the cron schedule. Backups work initially, and then when the cron schedule triggers further backups, they do not work. I had not noticed because the logs didn't look conspicuous. But you should notice that backups don't include any new files after the first backup.

Sorry about this. I am patching it now and will submit another pull request.

is this project still alive?

Perhaps this project doesn't need lots of updates, but the latest commit was almost a year ago now. There's a PR to bump duplicacy version to 2.1.2 that's open for a couple of weeks without any activity... Just wondering since I was thinking of using it

Container writes to `/data`; needs write access.

This Docker container relies on the /data mountpoint being writable. This doesn't make sense, as in principle (and for safety reasons) duplicacy should only need read access to the data it is backing up, and should not modify that data. Mounting /data as read-only after .duplicacy is created causes this container to first claim that the directory is already initialised, then complain that it isn't initialised as [ ! -d .duplicacy ] fails (I do not know why).

This seems to be a design flaw of Duplicacy which assumes the user is happy to create a directory .duplicacy in the current working directory containing metadata about the duplicacy backup. Now what is the point of such a thing? If you're backing up, you shouldn't have to back up metadata associated with the backup. How do I backup the metadata of the metadata? Et cetera, ad infinitum.

I propose having this container change directory to a dedicated working directory, e.g. /wd, before beginning the backup. I don't know why the .duplicacy directory is needed at all. But if the user insists, then /wd can be mounted as a volume (persistent storage), which the user does not back up.

Local backup

Is it possible to backup to a mounted nfs share?

MissingRegion: could not find region configuration

Hi,

I'm getting this error when starting Duplicacy:

Failed to download the configuration file from the storage: MissingRegion: could not find region configuration

What could be the issue? What region is it talking about?

Environment variable misspelled

BACKUP_IMMEDIATLY should be BACKUP_IMMEDIATELY (there's a missing E). I understand that you might not want to change this now, but it would be good to fix it and just support the misspelling for a period of time.

Feature to exclude files/folders

Is it possible to exclude specific files/folders mounted in /data?

As I have limited online storage, I can only select a limited set of files/folders for remote backup. It would simplify things a lot if you could just exclude those.

Thanks a lot!
