
bupstash's People

Contributors

andrewchambers, benkard, bket, gouchi, jirutka, klemensn, nh2, piegamesde, romainreignier, runfalk, sogaiu

bupstash's Issues

Option to disable encryption

I don't want an insecure default, but I do want to support disabling encryption. My intended use case is for places where you only care about access controls, and don't mind ignoring encryption at rest.

Allow the user to control retry

In various situations, bupstash is in a position to retry an upload:

  • When saving the output of a command with --exec, it is able to retry on failure.
  • When saving a directory and the directory was altered it is able to retry.

In both situations it might be good to allow the user to opt into this retry, and perhaps set an upper bound.
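
A minimal sketch of what an opt-in, bounded retry could look like. The helper name `with_retries` and the `max_retries` parameter are hypothetical and not part of bupstash; a value of 0 keeps the current no-retry behaviour.

```rust
use std::fmt::Debug;

/// Hypothetical helper: run `attempt` until it succeeds or until
/// `max_retries` additional attempts have been used up.
/// `max_retries == 0` means "never retry", so retrying stays opt-in.
fn with_retries<T, E: Debug>(
    max_retries: u32,
    mut attempt: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut retries_used = 0;
    loop {
        match attempt(retries_used) {
            Ok(value) => return Ok(value),
            // Still under the upper bound: report and try again.
            Err(err) if retries_used < max_retries => {
                eprintln!("attempt {} failed: {:?}, retrying", retries_used + 1, err);
                retries_used += 1;
            }
            // Bound reached: hand the last error back to the caller.
            Err(err) => return Err(err),
        }
    }
}
```

The closure receives the number of retries already used, so a caller could, for instance, log it or back off between attempts.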

Building on FreeBSD 12 (amd64) fails

Using the following environment:

  • FreeBSD 12.2-RELEASE-p2
  • rustc 1.49.0 (e1884a8e3 2020-12-29)

With the current master (b619813d5e18673bfe357d8ef3b777e75293be86), I get the following error when doing a cargo build:

# cargo build
   Compiling bupstash v0.6.2 (/root/src/bupstash)
error[E0433]: failed to resolve: could not find `PosixFadviseAdvice` in `fcntl`
   --> src/client.rs:825:41
    |
825 | ...                   nix::fcntl::PosixFadviseAdvice::POSIX_FADV_NOREUSE,
    |                                   ^^^^^^^^^^^^^^^^^^ could not find `PosixFadviseAdvice` in `fcntl`

error[E0425]: cannot find value `O_NOATIME` in crate `libc`
   --> src/client.rs:807:49
    |
807 | ...                   .custom_flags(libc::O_NOATIME)
    |                                           ^^^^^^^^^ not found in `libc`

error[E0425]: cannot find function `posix_fadvise` in module `nix::fcntl`
   --> src/client.rs:821:37
    |
821 |                         nix::fcntl::posix_fadvise(
    |                                     ^^^^^^^^^^^^^ not found in `nix::fcntl`
    |
help: consider importing this function
    |
1   | use libc::posix_fadvise;

I tried updating to nix 0.19, but that didn't fix anything. I looked at the code, which seems to be there for FreeBSD at least (https://docs.rs/nix/0.19.1/src/nix/fcntl.rs.html#583), but apparently something is off.
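
One way such platform-specific pieces are commonly handled in Rust is conditional compilation. The sketch below is only an illustration: `extra_open_flags` and `use_fadvise_hint` are hypothetical names, the flag value is a hard-coded stand-in for `libc::O_NOATIME` (assumed here for x86_64 and aarch64 Linux), and gating the fadvise hint to Linux only is a deliberately conservative assumption.

```rust
/// Hypothetical helper: extra open(2) flags used when reading backup sources.
/// O_NOATIME is Linux-specific, so it is only requested there.
#[cfg(target_os = "linux")]
fn extra_open_flags() -> i32 {
    // Stand-in for `libc::O_NOATIME`; real code should take the constant
    // from the libc crate instead of hard-coding it.
    0o1000000
}

/// On every other platform no extra flags are requested.
#[cfg(not(target_os = "linux"))]
fn extra_open_flags() -> i32 {
    0
}

/// Hypothetical helper: whether the optional posix_fadvise hint should be
/// issued at all. The hint is purely an optimisation, so skipping it on
/// platforms where it is unavailable does not change the backup contents.
fn use_fadvise_hint() -> bool {
    cfg!(target_os = "linux")
}
```

The result of `extra_open_flags()` could be passed to `std::os::unix::fs::OpenOptionsExt::custom_flags`, which accepts 0 as "no extra flags".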

More efficient pick

Currently, when we run get with --pick, bupstash knows the offset of the data in the chunk stream, but it does not know how to efficiently seek to that point.

I am hoping there is a way to save a path to the pick start in the index. Alternatively, we may need to add chunk offsets into the htree structure, though this increases complexity and may also remove some deduplication.

Do a backup of a remote directory to a local repository

If we want to backup a remote directory to a local repository, how might we make that work?

Suppose we have permission to log in to a remote machine, but the remote machine does not have permission to log in to our server. We should still be able to perform a 'put' of a remote directory.

This doesn't necessarily have to be a builtin command, but perhaps a script or tutorial that does the correct ssh invocations.

Push/Pull command

It might be nice to be able to sync items between repositories, for example to export data from a remote server. This is possible manually, but there may be something to gain from an automated command.

Enhanced FS smear prevention

We may be able to help users get more consistent snapshots:

  • When starting a put we record the current time.
  • When adding files check modtime is before the put start time.
    • If it is not, abort or retry.

The existing concurrent put logic detects cases when files have been resized or removed.

This logic is not totally foolproof, as it's possible to alter a file by rewriting it just after the stat but before the read. However, it does not seem expensive and should help.

One downside is that you might get too many retries if you try to snapshot a busy system.
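
The modtime check described above could be sketched roughly as follows, using only the standard library. The function name `unchanged_since_start` is hypothetical; the caller would record `put_start` once when the put begins and abort or retry when the function returns `false`.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical helper: returns Ok(true) when the file's modification time
/// is strictly earlier than the moment the put started.
fn unchanged_since_start(path: &Path, put_start: SystemTime) -> io::Result<bool> {
    // `modified()` can fail on platforms without mtime support, so the
    // error is propagated rather than treated as "unchanged".
    let modified = fs::metadata(path)?.modified()?;
    Ok(modified < put_start)
}
```

As noted above, this only narrows the window: a file rewritten between this check and the read would still go unnoticed.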

Language Localization

I think the plan is to wait until we have a stable project, then we can expand the user base by adding more translations. Translating too early might result in a lot of incorrect translations if we change things too much.

Support for backing up multiple files from different directories with single put

Assuming the following directory structure

dir
├── bar
│   └── zxc
└── foo
    ├── asd
    └── qwe

It would be nice if I could do bupstash put dir/bar/zxc dir/foo/asd to backup only a subset of the directory hierarchy under dir.

I know I could do something like this with bupstash put --exec name=stuff.tar tar -cvf - dir/bar/zxc dir/foo/asd, or with exclude patterns, but it feels a bit clunky.

How does it compare to Frost?

I'm having trouble finding documentation about the internal algorithm used. You talk about deduplication and remote storage, but you don't describe the deduplication algorithm or the policy used to split a file's data into chunks (and how you identify chunks).

I'm the author of Frost, and I'm interested in all the alternatives I've tested.

Add configurable robustness for block recovery

The current bupstash setup does not handle data corruption in the repository (e.g. through bit rot or storing the repository on media with a low MTBF). This could be addressed by adding an optional erasure code such as Reed-Solomon codes or a similar algorithm.

Since this is a trade-off between robustness and the increase in repository size due to the parity data, the user should be able to choose the acceptable amount of data corruption per block size. Since a user might use different repositories with different storage media and different existing robustness levels, the robustness value should be set per repository.

Bupstash should also include a subcommand to check and repair a remote repository. Since RAID and tape storage systems use the term "scrub" for this, I suggest using it as the subcommand name as well.

Option to set the bupstash binary path on remote host

Just like the --remote-path= option in borg.
It is very useful when bupstash lives at a non-conventional path, such as a cargo install in a user's home directory, or when non-traditional servers like a Synology NAS force you to call the binary by its package path.

Documentation on Robustness / Disaster Recovery

As an administrator, I care a lot about the robustness of a backup solution. Does it handle bitflips and bit rot? What happens if a remote repository is corrupted: how many blocks are affected, or do I have to throw away the complete backup? How do I notice bit rot? Is the remote chunk checked for consistency after uploading?

This topic should be added to the homepage documentation as well as to the manpage.

Building on ARM64 fails

Using the following environment:

  • rustc: 1.49.0 (e1884a8e3 2020-12-29)
  • os: Linux rpi4 5.4.83-1-MANJARO-ARM #1 SMP PREEMPT aarch64 GNU/Linux
  • commit: b3e2ee2 (current master)

cargo build --release fails with the following error:

   Compiling libc v0.2.80
   Compiling cc v1.0.65
   Compiling proc-macro2 v1.0.24
   Compiling autocfg v1.0.1
   Compiling unicode-xid v0.2.1
   Compiling syn v1.0.51
   Compiling memchr v2.3.4
   Compiling typenum v1.12.0
   Compiling version_check v0.9.2
   Compiling pkg-config v0.3.19
   Compiling lazy_static v1.4.0
   Compiling serde_derive v1.0.117
   Compiling serde v1.0.117
   Compiling bitflags v1.2.1
   Compiling cfg-if v1.0.0
   Compiling ryu v1.0.5
   Compiling unicode-width v0.1.8
   Compiling regex-syntax v0.6.21
   Compiling serde_json v1.0.59
   Compiling subtle v2.3.0
   Compiling nix v0.17.0
   Compiling anyhow v1.0.34
   Compiling linked-hash-map v0.5.3
   Compiling cfg-if v0.1.10
   Compiling fallible-streaming-iterator v0.1.9
   Compiling codemap v0.1.3
   Compiling arrayvec v0.5.2
   Compiling arrayref v0.3.6
   Compiling void v1.0.2
   Compiling constant_time_eq v0.1.5
   Compiling number_prefix v0.3.0
   Compiling termcolor v1.1.2
   Compiling itoa v0.4.6
   Compiling fallible-iterator v0.2.0
   Compiling smallvec v1.5.0
   Compiling shlex v0.1.1
   Compiling path-clean v0.1.0
   Compiling rangemap v0.1.8
   Compiling glob v0.3.0
   Compiling once_cell v1.5.2
   Compiling humantime v2.0.1
   Compiling num-traits v0.2.14
   Compiling crossbeam-utils v0.8.1
   Compiling num-integer v0.1.44
   Compiling generic-array v0.14.4
   Compiling thread_local v1.0.1
   Compiling bupstash v0.6.2 (/home/el/bupstash)
   Compiling lz4-sys v1.9.2
   Compiling libsqlite3-sys v0.18.0
   Compiling blake3 v0.3.7
   Compiling getopts v0.2.21
   Compiling lru-cache v0.1.2
   Compiling quote v1.0.7
   Compiling aho-corasick v0.7.15
   Compiling time v0.1.44
   Compiling terminal_size v0.1.15
   Compiling atty v0.2.14
   Compiling xattr v0.2.2
   Compiling filetime v0.2.13
   Compiling fs2 v0.4.3
   Compiling regex v1.4.2
   Compiling crossbeam-channel v0.5.0
   Compiling codemap-diagnostic v0.1.1
   Compiling tar v0.4.30
   Compiling digest v0.9.0
   Compiling crypto-mac v0.8.0
   Compiling console v0.13.0
   Compiling thiserror-impl v1.0.22
   Compiling lz4 v1.23.2
   Compiling indicatif v0.15.0
   Compiling thiserror v1.0.22
   Compiling serde_bare v0.3.0
   Compiling chrono v0.4.19
   Compiling rusqlite v0.23.1
error[E0308]: mismatched types
  --> src/base64.rs:15:13
   |
15 |             out_buf.as_mut_ptr() as *mut i8,
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u8`, found `i8`
   |
   = note: expected raw pointer `*mut u8`
              found raw pointer `*mut i8`

error[E0308]: mismatched types
  --> src/base64.rs:44:13
   |
44 |             data.as_ptr() as *const i8,
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u8`, found `i8`
   |
   = note: expected raw pointer `*const u8`
              found raw pointer `*const i8`

error[E0308]: mismatched types
  --> src/base64.rs:48:13
   |
48 |             std::ptr::null_mut::<*const i8>(),
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u8`, found `i8`
   |
   = note: expected raw pointer `*mut *const u8`
              found raw pointer `*mut *const i8`

error: aborting due to 3 previous errors

For more information about this error, try `rustc --explain E0308`.
error: could not compile `bupstash`

To learn more, run the command again with --verbose.
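
The three errors share one cause: C's `char` is unsigned on aarch64 Linux, so `c_char` is `u8` there rather than `i8`. A minimal sketch of a portable cast, assuming the surrounding FFI calls take `char *` arguments (the helper names are hypothetical, and this is not necessarily the fix applied upstream):

```rust
use std::os::raw::c_char;

/// Hypothetical helper: view a mutable byte buffer as the `char *` a C
/// function expects. `c_char` resolves to `i8` or `u8` per target, so the
/// same cast compiles on x86_64 and aarch64.
fn as_c_char_ptr_mut(buf: &mut [u8]) -> *mut c_char {
    buf.as_mut_ptr() as *mut c_char
}

/// Hypothetical helper: the read-only counterpart for `const char *`.
fn as_c_char_ptr(data: &[u8]) -> *const c_char {
    data.as_ptr() as *const c_char
}
```

The third call site would follow the same pattern, with `std::ptr::null_mut::<*const c_char>()` in place of the `i8` version.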

Ability to undo rm commands

Because we only start deleting data when we run a garbage collection, we may wish to allow the user to undo an rm if it was an accident.

Potential ideas:

bupstash list --removed
bupstash undo-rm id=... 

Should we compress higher level htree blocks?

Originally I did not, because it seemed like lists of hashes would not compress. After thinking more, it seems they may compress if they contain many repeated blocks (e.g. a 1 GB run of zeros). It might not be worth doing in practice and needs experiments/thought.

Backup propagation

My ideal backup setup would go straight to my external drive and also to a remote server to provide both local speed and also fault tolerance.

I can currently do this by making two different calls to bupstash, but there may be something to gain from a dedicated way to propagate backups. One potential way to implement this is via repository hooks which are able to forward backups via a dedicated sync command.

Replace deprecated failure trait

We should gradually migrate away from the deprecated failure crate and change the error handling to concrete error types where it makes sense.

Setup CI

We need CI for all supported platforms.

Pick assertion failure

[nix-shell:~/src/bupstash]$ ./target/release/bupstash get -q -k x.key -r ./repox/ --pick debug/build/libsqlite3-sys-ed02f75374eb67aa/stderr   id=1c8b3ecbc7df358cd57be2193c1b6d99
thread 'main' panicked at 'assertion failed: range.start < range.end', /home/ac/.cargo/registry/src/github.com-1ecc6299db9ec823/rangemap-0.1.7/src/map.rs:96:9
stack backtrace:
   0: std::panicking::begin_panic
   1: rangemap::set::RangeSet<T>::insert
   2: bupstash::get_main
   3: bupstash::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
bupstash serve: remote disconnected
Aborted (core dumped)
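
The backtrace shows rangemap asserting `range.start < range.end`, which means an empty range reached the insert. One possible guard is sketched below with a plain `Vec` standing in for the range set; `push_nonempty` is a hypothetical helper, and whether an empty range here comes from picking a zero-length file is an assumption.

```rust
use std::ops::Range;

/// Hypothetical helper: record a byte range to fetch, skipping empty ranges
/// so the `start < end` precondition of the underlying set is never violated.
/// Returns true when the range was recorded.
fn push_nonempty(ranges: &mut Vec<Range<u64>>, range: Range<u64>) -> bool {
    if range.start < range.end {
        ranges.push(range);
        true
    } else {
        // Nothing to fetch for an empty (or inverted) range.
        false
    }
}
```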

Documentation for stability policy

I just made a bunch of breaking but useful changes to the repository and protocol, and feel guilty about it.

Soon it will be time to commit to not breaking anything ever again. We should explain the plan in the documentation or give a timeline for stability.

Better user warning and/or concurrency for locked send-log

We currently use a single send-log in the ~/.cache directory if the user does not specify one.
We don't support using it concurrently, so if a user runs two 'put' operations at once, one command will appear to do nothing.

We should consider ways to allow the user to run as many 'put' commands as they want without having to manually specify a send-log. In the worst case, a progress message should say 'waiting for send-log' or something similar.

Handle hard links

Hardlinks pose a slight problem for bupstash as they don't play too well with our dedup view of the world... That being said, it might be possible to track them in our index. We should seriously consider how to best tackle them.

Ability to list items with differing or lost keys

Currently we only display and list items for the currently selected decryption key. We should at least provide the user with the ability to remove items for lost keys.

One solution would be to always list encrypted items and just let user queries filter them. If we do this, we must think carefully about ways to prevent accidental deletion; for example, an encrypted entry must not match older-than or newer-than queries.

Another potential UI:

bupstash list --show-encrypted
id=... keyid=... encrypted=true
...

Gracefully handle filesystem smear during put.

Normally we recommend the user use an fs snapshot or some other mechanism to cease filesystem activity during a directory put. The reality is that this isn't always possible, and a 'smeared' backup is probably better than no backup at all.

We can do a few things to gracefully handle files being edited while we are uploading them rather than a full abort.

  • If a file was truncated, we can fill in the missing data with zeros.
  • If a file grew, we can stop reading early.
  • If a file was deleted between listing and opening, we can skip it.
    ...
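
The truncated and grown cases in the list above can both be covered by copying exactly the length recorded when the file was listed. This is a sketch under that assumption; `copy_exact_len` and `expected_len` are hypothetical names, not bupstash internals.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: copy exactly `expected_len` bytes from `src` to `dst`.
/// Returns how many zero bytes had to be appended as padding.
fn copy_exact_len<R: Read, W: Write>(
    src: R,
    dst: &mut W,
    expected_len: u64,
) -> io::Result<u64> {
    // If the file grew, `take` stops reading at the expected length.
    let copied = io::copy(&mut src.take(expected_len), &mut *dst)?;
    // If the file was truncated, fill in the missing tail with zeros.
    let missing = expected_len - copied;
    io::copy(&mut io::repeat(0).take(missing), &mut *dst)?;
    Ok(missing)
}
```

A non-zero return value tells the caller the file shrank during the put, which it might want to report as a warning.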

Abort the connection early on io error during upload

Currently we are sending these errors as an Abort packet, but really we should disconnect as we cannot report the error promptly with an abort packet.

It is unlikely, but the abort packet might also deadlock, as the client is not reading at that time.

Logo/mascot

We need a logo and/or a mascot for the project.

Ability to inspect contents of and/or mount snapshots.

It would be nice to allow users to inspect or fetch only the single file they care about from within an automated backup.

Currently we do not store any form of indexing data for snapshots. We could introduce an optional index htree that accompanies the data htree of a backup.

The user could then download the smaller index, which would hold the checksums and stream offsets of data within a tar snapshot. We could then introduce a 'get between' protocol message to allow streaming the subset of the data we need.

This would allow the user to look within a snapshot without complicating the htree structure itself. The downsides are space, computational overhead and implementation complexity when making and storing backups.

Unable to build bupstash 0.6.2 with `cargo install bupstash` on macOS Mojave with Rust 1.49

This evening I tried to install bupstash using cargo install bupstash and it did not work correctly on macOS Mojave while using Rust 1.49. However, I was able to compile it successfully from source after doing a git clone from the GitHub repository!

Some info about my environment:

$ uname -a
Darwin HOSTNAME.local 18.7.0 Darwin Kernel Version 18.7.0: Mon Aug 31 20:53:32 PDT 2020; root:xnu-4903.278.44~1/RELEASE_X86_64 x86_64
$ rustc --version
rustc 1.49.0 (e1884a8e3 2020-12-29)

A cargo install bupstash did not succeed and left me with the following errors:

error[E0433]: failed to resolve: could not find `PosixFadviseAdvice` in `fcntl`
   --> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/bupstash-0.6.2/src/client.rs:825:41
    |
825 | ...                   nix::fcntl::PosixFadviseAdvice::POSIX_FADV_NOREUSE,
    |                                   ^^^^^^^^^^^^^^^^^^ could not find `PosixFadviseAdvice` in `fcntl`

error[E0425]: cannot find value `O_NOATIME` in crate `libc`
    --> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/bupstash-0.6.2/src/client.rs:807:49
     |
807  | ...                   .custom_flags(libc::O_NOATIME)
     |                                           ^^^^^^^^^ help: a constant with a similar name exists: `MNT_NOATIME`
     | 
    ::: /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.81/src/unix/bsd/apple/mod.rs:3083:1
     |
3083 | pub const MNT_NOATIME: ::c_int = 0x10000000;
     | -------------------------------------------- similarly named constant `MNT_NOATIME` defined here

error[E0425]: cannot find function `posix_fadvise` in module `nix::fcntl`
   --> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/bupstash-0.6.2/src/client.rs:821:37
    |
821 |                         nix::fcntl::posix_fadvise(
    |                                     ^^^^^^^^^^^^^ not found in `nix::fcntl`

error: aborting due to 3 previous errors

Please let me know if I can provide any more information or help test any new developments. I'd be thrilled to try something out for you!

Data recovery in the presence of corruption

  • We should provide a way to recover data despite bad chunks, notifying the user about corrupt files.
  • I think instead of a generic corrupt data message, the client could have an option to continue.

Ability to edit items

We may wish to let a user edit existing items, or perhaps only the tags of an existing item. Currently the user must rm and re-upload if they made a typo in a tag.

Put speed rolling/weighted average

Put progress uses the average over the whole put, not a rolling/weighted average.
This can make the progress speed indicator seem inaccurate.
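
One common choice for a weighted average is an exponential moving average, sketched below. The `RateEstimator` type and the smoothing factor `alpha` are hypothetical; bupstash's progress code may be structured differently.

```rust
/// Hypothetical exponentially weighted estimate of transfer speed.
struct RateEstimator {
    /// Weight of the newest sample, between 0.0 and 1.0.
    alpha: f64,
    /// Current estimate in bytes per second, None before the first sample.
    rate: Option<f64>,
}

impl RateEstimator {
    fn new(alpha: f64) -> Self {
        RateEstimator { alpha, rate: None }
    }

    /// Feed the bytes sent during the last `secs` seconds and get the
    /// updated estimate. Non-positive intervals leave the estimate unchanged.
    fn update(&mut self, bytes: u64, secs: f64) -> f64 {
        if secs <= 0.0 {
            return self.rate.unwrap_or(0.0);
        }
        let sample = bytes as f64 / secs;
        let rate = match self.rate {
            // The first sample seeds the estimate directly.
            None => sample,
            // Later samples are blended with the previous estimate.
            Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
        };
        self.rate = Some(rate);
        rate
    }
}
```

A larger `alpha` makes the indicator react faster to changes in speed, at the cost of a jumpier display.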
