
zrepl

zrepl is a one-stop ZFS backup & replication solution.

User Documentation

User Documentation can be found at zrepl.github.io.

Bug Reports

  1. If the issue is reproducible, enable debug logging, reproduce and capture the log.
  2. Open an issue on GitHub, with logs pasted as GitHub gists / inline.

Feature Requests

  1. Does your feature request require default values / some kind of configuration? If so, think of an expressive configuration example.
  2. Think of at least one use case that generalizes from your concrete application.
  3. Open an issue on GitHub with example conf & use case attached.
  4. Optional: Post a bounty on the issue, or contact Christian Schwarz for contract work.

The above does not apply if you already implemented everything. Check out the Coding Workflow section below for details.

Building, Releasing, Downstream-Packaging

This section provides an overview of the zrepl build & release process. Check out docs/installation/compile-from-source.rst for build-from-source instructions.

Overview

zrepl is written in Go and uses Go modules to manage dependencies. The documentation is written in reStructuredText using the Sphinx framework.

Install build dependencies using ./lazy.sh devsetup. lazy.sh uses python3-pip to fetch the build dependencies for the docs; you might want to use a venv. If you just want to install the Go dependencies, run ./lazy.sh godep.
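For example, a throwaway venv can be set up in one line before running the script (paths are illustrative):

python3 -m venv .venv && . .venv/bin/activate && ./lazy.sh devsetup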

The test suite is split into pure Go tests (make test-go) and platform tests that interact with ZFS and thus generally require root privileges (sudo make test-platform). Platform tests run on their own pool named zreplplatformtest, which is created on a file vdev in /tmp.

For a full code coverage profile, run make test-go COVER=1 && sudo make test-platform && make cover-merge. An HTML report can be generated using make cover-html.

Code generation is triggered by make generate. Generated code is committed to the source tree.

Build & Release Process

The Makefile caters to the needs of developers & CI, not distro packagers. It provides phony targets for

  • local development (building, running tests, etc)
  • building a release in Docker (used by the CI & release management)
  • building .deb and .rpm packages out of the release artifacts.

Build tooling & dependencies are documented as code in lazy.sh. Go dependencies are then fetched by the go command and pip dependencies are pinned through a requirements.txt.

We use CircleCI for continuous integration. There are two workflows:

  • ci runs for every commit / branch / tag pushed to GitHub. It is supposed to run fast (<5 min) and provide quick feedback to developers. It runs formatting checks, lints and tests on the most important OSes / architectures. Artifacts are published to minio.cschwarz.com (see GitHub Commit Status).

  • release runs

    • on manual triggers through the CircleCI API (in order to produce a release)
    • periodically on master

    Artifacts are published to minio.cschwarz.com (see GitHub Commit Status).

Releases are issued via Git tags + the GitHub Releases feature. The procedure to issue a release is as follows:

  • Issue the source release:
    • Git tag the release on the master branch.
    • Push the tag.
    • Run ./docs/publish.sh to re-build & push zrepl.github.io.
  • Issue the official binary release:
    • Run the release pipeline (triggered via CircleCI API)
    • Download the artifacts to the release manager's machine.
    • Create a GitHub release, edit the changelog, upload all the release artifacts, including .rpm and .deb files.
    • Issue the GitHub release.
    • Add the .rpm and .deb files to the official zrepl repos, publish those.

Official binary releases are not re-built when Go receives an update. If the Go update is critical to zrepl (e.g. a Go security fix that affects zrepl), we'd issue a new source release. The rationale: distros provide a mechanism for such rebuilds ($zrepl_source_release-$distro_package_revision), whereas GitHub Releases doesn't, which means we'd have to update the existing GitHub release's assets, and nobody would notice that (no RSS feed updates, etc.). Downstream packagers can read the changelog to determine whether they want to push that minor release into their distro or simply skip it.

Additional Notes to Distro Package Maintainers

  • Run the platform tests (Docs -> Usage -> Platform Tests) on a test system to validate that zrepl's abstractions on top of ZFS work with the system ZFS.
  • Ship a default config that adheres to your distro's hier and logging system.
  • Ship a service manager file and please try to upstream it to this repository.
    • dist/systemd contains a Systemd unit template.
  • Ship other material provided in ./dist, e.g. in /usr/share/zrepl/.
  • Have a look at the Makefile's ZREPL_VERSION variable and how it is passed to Go's ldflags. This is how zrepl version knows what version number to show. Your build system should set the ldflags appropriately and add a prefix or suffix that indicates that the given zrepl binary is a distro build, not an official one (see the sketch after this list).
  • Make sure you are informed about new zrepl versions, e.g. by subscribing to GitHub's release RSS feed.
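A minimal sketch of the ldflags mechanism, with hypothetical package and variable names (check the Makefile for the real ones):

package version

// zreplVersion is overridden at link time, e.g.:
//   go build -ldflags "-X github.com/zrepl/zrepl/version.zreplVersion=v0.6.1+mydistro1"
var zreplVersion = "unknown"

// Version is what the zrepl version subcommand would print.
func Version() string { return zreplVersion }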

Contributing Code

  • Open an issue when starting to hack on a new feature
  • Commits should reference the issue they are related to
  • Docs improvements not documenting new features do not require an issue.

Breaking Changes

Backward-incompatible changes must be documented in the git commit message and are listed in docs/changelog.rst.

Glossary & Naming Inconsistencies

In ZFS, dataset refers collectively to the objects filesystem, ZVOL, and snapshot.
However, we need a word that covers filesystems & ZVOLs but excludes snapshots, bookmarks, etc.

Toward the user, the following terminology is used:

  • filesystem: a ZFS filesystem or a ZVOL
  • filesystem version: a ZFS snapshot or a bookmark
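Purely illustrative Go declarations of that terminology (these are not zrepl's actual types):

package zfs

// Filesystem is a ZFS filesystem or a ZVOL, e.g. "zroot/var".
type Filesystem struct{ Path string }

// FilesystemVersion is a snapshot ("@name") or bookmark ("#name") of a Filesystem.
type FilesystemVersion struct {
    FS   Filesystem
    Name string // includes the "@" or "#" prefix
}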

Sadly, the zrepl implementation is inconsistent in its use of these words: variables and types are often named dataset when they in fact refer to a filesystem.

There will not be a big refactoring (an attempt was made, but it destroyed too much history without much gain).

However, new contributions & patches should fix naming without further notice in the commit message.

zrepl's Issues

Ensure in-memory log in cmd.Daemon is bounded

Practically, it probably won't be a problem. But still, it would be nice to assert that the in-memory buffered log entries per task do not exceed a threshold.

Search for this issue in the codebase.

ref #10

Ideas

  • Find sensible max size for in-memory logs
  • Serialize logs to JSON []byte, keep level in separate value
  • Keep sum of len([]byte) arrays until max size is reached
  • Discard log messages at the lowest level first (debug before info), then the oldest
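A minimal Go sketch of that eviction policy, with hypothetical types (this is not zrepl's actual logger API):

package logbuffer

type entry struct {
    level int    // numeric level; debug is the lowest
    msg   []byte // pre-serialized JSON
}

type boundedLog struct {
    maxBytes, curBytes int
    entries            []entry
}

func (b *boundedLog) add(e entry) {
    b.entries = append(b.entries, e)
    b.curBytes += len(e.msg)
    for b.curBytes > b.maxBytes && len(b.entries) > 0 {
        b.evict()
    }
}

// evict drops the oldest entry among those with the lowest level.
func (b *boundedLog) evict() {
    idx := 0
    for i, e := range b.entries {
        if e.level < b.entries[idx].level {
            idx = i
        }
    }
    b.curBytes -= len(b.entries[idx].msg)
    b.entries = append(b.entries[:idx], b.entries[idx+1:]...)
}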

Docs - detail any dataset properties which are overridden

The zrepl documentation should have a section which details any dataset properties that are overridden.

There appears to be at least one, as I noticed the replicated datasets are not mounted.

This will be important for disaster recovery if it is ever needed - what do I need to put back to normal?

include version information in build artifacts

Version information a la git describe --dirty

  • version subcommand
  • control version subcommand?
  • in documentation -> figure out how to do multi-version docs?
  • in an INFO log message on startup

formatting of non-string values in `human` log format

Example output (mind the total_rx field at EOL)

[INFO][hn1][pull][storage/backups/zrepl/pull/hn1/zroot/ROOT/default][storage/backups/zrepl/pull/hn1/zroot/ROOT/default => zroot/ROOT/default]: progress on receive operation total_rx="%!s(uint64=22101986)"
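The %!s(...) notation is fmt's error marker for a string verb applied to a non-string value. A minimal Go reproduction and the usual fix (this is not the actual formatter code):

package main

import "fmt"

func main() {
    var totalRx uint64 = 22101986
    fmt.Printf("total_rx=%s\n", totalRx) // prints total_rx=%!s(uint64=22101986)
    fmt.Printf("total_rx=%v\n", totalRx) // prints total_rx=22101986; %v handles any type
}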

Docs: describe PermitRootLogin in sshd_config(5)

The current recommendation is for zrepl to run as root (until the finer details of what needs to be set to allow it to run as an unprivileged user are discovered and clearly documented).

For remote replication, this is hindered by the fact that sshd does not allow root login by default. To work around this without opening a large security hole, the following should be added to /etc/ssh/sshd_config:
PermitRootLogin forced-commands-only

For more info, users should be directed to the sshd_config(5) man page:
https://man.freebsd.org/sshd_config

Suggest this should be added to both the installation doc page and the tutorial.

Tutorial - error in authorized_keys examples

The tutorial example for the authorized_keys file appears to be wrong. Rather than "zrepl stdinserver backups.example.com" I think it should be "zrepl stdinserver prod1.example.com"
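For context, the forced-command authorized_keys entry under discussion looks roughly like this (options abbreviated, key elided; the tutorial has the authoritative version):

command="zrepl stdinserver prod1.example.com",no-port-forwarding,no-pty ssh-ed25519 AAAA... prod1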

Cannot compile from source

From GNU/Linux Ubuntu 17.04, following instructions:

./lazy.sh devsetup

returns

Collecting pygobject==3.22.0 (from -r /home/stephane/code/golang/gopath/src/github.com/zrepl/zrepl/docs/requirements.txt (line 17))
  Could not find a version that satisfies the requirement pygobject==3.22.0 (from -r /home/stephane/code/golang/gopath/src/github.com/zrepl/zrepl/docs/requirements.txt (line 17)) (from versions: )
No matching distribution found for pygobject==3.22.0 (from -r /home/stephane/code/golang/gopath/src/github.com/zrepl/zrepl/docs/requirements.txt (line 17))

and pip3 search pygobject says:

pygi-treeview-dnd (0.1.0)  - Workaround that allows PyGObject programs to use the high level TreeView DnD API

Second replication begins if first replication is not finished

During the first replication of many gigabytes of data, I initially had the interval of the pull job set to 10m, and the first replication would not be finished by the time the second one was called to start. I checked the status many hours later and could see numerous ssh sessions running, which led me to believe multiple replication jobs were now running at once (which I don't think should ever happen). I expected that if another replication job was called to start before the previous had finished, the new job would just be cancelled entirely.

I did not look into the state of my replicated data, or if the replications were proceeding ok. It was purely the fact that multiple zrepl ssh sessions were running that led me to believe this was the behaviour.

create bookmarks when snapshotting

If source and puller diverge (replication lag) and the source still has a bookmark of the latest state at the puller, we don't have a conflict, just a gap in replication. It is better to resume replication in that case instead of just throwing Diverged errors.

  • Diffing & Replication logic support (was always supported)
  • Create Bookmarks
  • update docs warning about replication lag (still a thing, but mitigated by this feature)
  • Find sane default for pruning bookmarks (basically they cost nothing, just keep them around?)
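For reference, a bookmark is created from an existing snapshot like so (hypothetical dataset and snapshot names):

zfs bookmark zroot/var@zrepl_20170815_120000 zroot/var#zrepl_20170815_120000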

Property Replication

Right now, all zfs send invocations on the sending side are without -p, meaning we do not replicate any properties.

On the receiving side, as a safeguard, we override mountpoint in order to protect ourselves from a malicious sender that is trying to mount over some filesystem on the backup server.

This guards against a mostly theoretical threat, and I'm investigating better solutions, e.g. zfs receive -x all, see https://github.com/zrepl/zrepl/wiki/ZFS-Feature-Support-&-Wishlist#zfs-receive--x-all
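On OpenZFS versions that support zfs receive -o, such a receive-side override can be expressed roughly as follows (hypothetical dataset names; zrepl's actual mechanism may differ):

zfs send pool/fs@zrepl_snap | zfs receive -o mountpoint=none backuppool/zrepl/pool/fs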

Agenda

  • Find a way how property replication should work
    • Where to store properties on receiver?
    • Protect against malicious sender (zfs receive -x all)
    • Think about restore procedures
  • Implement it
  • Document it (refs #23 )

Tutorial - initial confusion if config examples are for prod1 or backups

When reading the tutorial, I was initially confused about which PC should have the pull_prod1 job defined. It took a while for me to realize the tutorial section title "Configure backups" was referring to the "backups" server (even though it has a box around it, it wasn't immediately obvious). Perhaps this PC could be re-titled "backup_server" to make it more obvious?

It probably wasn't helped by the typo in "Analysis" section, which says the pull job is defined on prod1 (I believe this should have been "backups").

logging: make tcp outlet fully asynchronous

currently, a slow TCP connection will block the log call for retry_interval

additionally, the dialing / name resolution timeout is not bounded by retry_interval -> if name resolution hangs, a log call can block for ~30s?
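A Go sketch of the fully asynchronous design, with hypothetical types (not the actual outlet code): a bounded channel decouples the log call from the network, and DialTimeout bounds both name resolution and connect:

package outlet

import (
    "net"
    "time"
)

type tcpOutlet struct {
    queue chan []byte // bounded; WriteEntry never blocks on the network
}

// WriteEntry enqueues a formatted entry, dropping it if the queue is full,
// so a slow or dead TCP peer can never stall the logging call site.
func (o *tcpOutlet) WriteEntry(entry []byte) {
    select {
    case o.queue <- entry:
    default: // queue full: drop instead of blocking
    }
}

func (o *tcpOutlet) loop(addr string, retryInterval time.Duration) {
    for entry := range o.queue {
        // DialTimeout bounds name resolution + connect, unlike a bare Dial.
        conn, err := net.DialTimeout("tcp", addr, retryInterval)
        if err != nil {
            time.Sleep(retryInterval) // the entry is dropped in this sketch
            continue
        }
        conn.Write(entry) // write errors also drop the entry here
        conn.Close()
    }
}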

DOCS - describe "interval" and "grid" parameters

I was initially confused about the interaction between the "interval" parameters and the "grid" pruning parameters. I think I won't be the only one asking these questions, so it should be added to your documentation site. Some questions I had:

  • What happens if I define an source interval of 10m, a source grid with 4x15m, and a pull grid with 3x20m?
  • What is the difference between the source interval and pull interval? What happens if the source interval is different from the pull interval? I assume it would be normal to have the source interval quite frequent (e.g. 10m), and much less frequent on the pull job (e.g. 24h)?
  • For the grid parameters, what is the difference between 1x24d vs 24x1d
  • What does the (keep=all) setting do in the grid parameter?
  • For the interval parameter, I don't seem to be able to set "1d"?
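For readers who land on this issue: a grid is a sequence of time buckets, each keeping a bounded number of snapshots. An illustrative spec (values made up; the docs have the authoritative syntax):

grid: 1x1h(keep=all) | 24x1h | 35x1d

This keeps every snapshot from the last hour, then one snapshot per hour for a day, then one per day for 35 days. By the same logic, 24x1d is 24 one-day buckets (one survivor per day), whereas 1x24d is a single 24-day bucket with a single survivor.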

zrepl status subcommand

It can be annoying trying to read verbose log files. It would be nice to have a command "zrepl status" which outputs the status of any currently running jobs (including progress details?) to the command line, and also a "zrepl list" which outputs the full suite of snapshots (and size details) available on the PC it is run on (whether they are source snapshots from this machine or pull snapshots from other machines).

Safeguard against misconfigured system time

  • Check if zrepl was down for more than X times the snapshot interval length
  • -> disable pruning in such a case (switch it to dry run?)
  • -> could detect such a case by searching for gaps in the snapshot list -> problem: gotta understand the pruning policy (fading induces gaps)
  • -> should work independently of pruning strategy
  • Optionally do a time check in zrepl, compare it to the system time, and see how far off we are?
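A tiny Go sketch of the proposed safeguard (hypothetical function, illustrative only):

package pruning

import "time"

// pruningSafe: if the newest snapshot is much older than the snapshot
// interval suggests, zrepl was down or the clock jumped; in that case the
// caller should switch pruning to a dry run.
func pruningSafe(newestSnap time.Time, interval time.Duration, slackFactor int) bool {
    return time.Since(newestSnap) < time.Duration(slackFactor)*interval
}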

docs: evaluate sphinx as alternative

  • readthedocs theme is better than customized docdock theme
  • evaluate cross referencing, but it can't be worse than hugo's
  • expressive, unified admonitions (docdock has a plethora of notice, panels, alert, etc.)
  • could reference go code easily using third party go domain

test connect subcommand

for a job, should test if it can connect to the other side, maybe show the remote version

  • API
  • subcommand

evaluate `zfs list -o createtxg,guid` availability and stability

FreeBSD

  • 11.X
  • 10.3
  • 9.X ? (not supported anymore)

ZoL

  • 0.6.5.9_4.10.9_1-1 (Arch Linux, ZFS released Feb 3 2017)
  • 0.6.4.2_3.16.39-1+deb8u2 (Debian Jessie, ZFS released June 26 2015)
  • 0.6.3_3.16.39-1+deb8u2 (Debian Jessie, ZFS released June 12 2014)
    ZFS: Loaded module v0.6.3-1.3, ZFS pool version 5000, ZFS filesystem version 5
  • 0.6.x ?
  • 0.5 ?

OS X

  • 1.6.1, Sierra (2017-02-10)

illumos based distros

  • ?

DOCS - does second replication begin if first replication is not finished?

I believe the documentation should explain the zrepl behaviour for what happens if the first replication is not finished at the time that the second replication is called to begin. This needs to cover the pull, push and local scenarios.

This is likely to occur during the first replication of many gigabytes of data, if the interval of the pull job is set to 10m: the first replication would not be finished by the time the second one is called to start. Given that this will happen to new users, it is important they are clear on the behaviour they can expect during first-time use.

ZFS channel program support

  • Feature detection
  • Use it in autosnapper (queue up snaps + bookmarks, then do all at once)
  • Use it in pruner (queue up destroys, fall back to individual destroys if atomically destroying all of them fails)
  • Use as replacement for complicated ZFS lists?

Error output when stopping zrepl 0.0.1

After updating to the 0.0.1 release, I am getting errors upon stopping zrepl:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x72a326]

goroutine 37 [running]:
github.com/zrepl/zrepl/logger.(*Logger).WithError(0xc4201aa080, 0x0, 0x0, 0x0)
        /wrkdirs/usr/ports/sysutils/zrepl/work/src/github.com/zrepl/zrepl/logger/logger.go:105 +0x26
github.com/zrepl/zrepl/cmd.(*ControlJob).JobStart(0xc4200f7060, 0xad5ec0, 0xc4201381b0)
        /wrkdirs/usr/ports/sysutils/zrepl/work/src/github.com/zrepl/zrepl/cmd/config_job_control.go:62 +0x3fc
github.com/zrepl/zrepl/cmd.(*Daemon).Loop.func1(0xad5ec0, 0xc4201381b0, 0xc4201b4000, 0xad2b80, 0xc4200f7060)
        /wrkdirs/usr/ports/sysutils/zrepl/work/src/github.com/zrepl/zrepl/cmd/daemon.go:82 +0x45
created by github.com/zrepl/zrepl/cmd.(*Daemon).Loop
        /wrkdirs/usr/ports/sysutils/zrepl/work/src/github.com/zrepl/zrepl/cmd/daemon.go:81 +0x367
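The trace suggests (*Logger).WithError was called with a nil error which the logger then dereferences. A hypothetical reconstruction of that failure mode (not the actual logger code):

package logger

type Logger struct{ fields map[string]interface{} }

func (l *Logger) WithField(field string, val interface{}) *Logger {
    // ...returns a child logger carrying the field
    return l
}

func (l *Logger) WithError(err error) *Logger {
    // err.Error() dereferences err; a nil error here yields exactly the
    // nil-pointer SIGSEGV shown in the trace above.
    return l.WithField("err", err.Error())
}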
