andrewchambers / bupstash
Easy and efficient encrypted backups.
Home Page: https://bupstash.io
License: MIT License
I don't want an insecure default, but I do want to support disabling encryption. My intended use case is for places where you only care about access controls and don't mind forgoing encryption at rest.
In various situations, bupstash is in a position to retry an upload.
In these situations it might be good to allow the user to opt into retries, and perhaps set an upper bound on the number of attempts.
Using the following environment:
FreeBSD 12.2-RELEASE-p2
rustc 1.49.0 (e1884a8e3 2020-12-29)
With the current master (b619813d5e18673bfe357d8ef3b777e75293be86), I get the following error when running cargo build:
# cargo build
Compiling bupstash v0.6.2 (/root/src/bupstash)
error[E0433]: failed to resolve: could not find `PosixFadviseAdvice` in `fcntl`
--> src/client.rs:825:41
|
825 | ... nix::fcntl::PosixFadviseAdvice::POSIX_FADV_NOREUSE,
| ^^^^^^^^^^^^^^^^^^ could not find `PosixFadviseAdvice` in `fcntl`
error[E0425]: cannot find value `O_NOATIME` in crate `libc`
--> src/client.rs:807:49
|
807 | ... .custom_flags(libc::O_NOATIME)
| ^^^^^^^^^ not found in `libc`
error[E0425]: cannot find function `posix_fadvise` in module `nix::fcntl`
--> src/client.rs:821:37
|
821 | nix::fcntl::posix_fadvise(
| ^^^^^^^^^^^^^ not found in `nix::fcntl`
|
help: consider importing this function
|
1 | use libc::posix_fadvise;
I tried updating to nix 0.19, but that didn't fix anything. I looked at the nix code, which seems to support FreeBSD at least (https://docs.rs/nix/0.19.1/src/nix/fcntl.rs.html#583), but apparently something is off.
This mainly applies to packet too large errors.
Currently when we run get with --pick, bupstash knows the data chunk offset in the chunk stream, but it does not know how to seek to that point efficiently.
I am hoping there is a way to save a path to the pick start in the index; alternatively we may need to add chunk offsets into the htree structure, though this increases complexity and may also reduce deduplication.
If we want to backup a remote directory to a local repository, how might we make that work?
Suppose we have permission to log into a remote machine, but the remote machine does not have permission to log into our server; we should still be able to perform a 'put' of a remote directory.
This doesn't necessarily have to be a builtin command, but perhaps a script or tutorial that does the correct ssh invocations.
It might be nice to be able to sync items between repositories, for example to export data from a remote server. This is possible manually, but there may be something to gain from an automated command.
We may be able to help users get more consistent snapshots:
The existing concurrent put logic detects cases when files have been resized or removed.
This logic is not totally foolproof, as it's possible to alter a file by rewriting it just after the stat but before the read, but the check does not seem expensive and should help.
One downside is that you might get too many retries if you try to snapshot a busy system.
I think the plan is to wait until we have a stable project, then we can expand the user base by adding more translations. Translating too early might result in a lot of incorrect translations if we change things too much.
Assuming the following directory structure
dir
├── bar
│ └── zxc
└── foo
├── asd
└── qwe
It would be nice if I could do bupstash put dir/bar/zxc dir/foo/asd to back up only a subset of the directory hierarchy under dir.
I know I could do something like this with bupstash put --exec name=stuff.tar tar -cvf - dir/bar/zxc dir/foo/asd, or with exclude patterns, but it feels a bit clunky.
Specifically the following seem to occupy the same space:
Curious how your implementation seeks to improve these!
I've had trouble finding documentation about the internal algorithm used. You talk about deduplication and remote storage, but you don't describe the deduplication algorithm, the policy used to split file data into chunks, or how you identify chunks.
I'm the author of Frost and I'm interested in all the alternatives I've tested.
The current bupstash setup does not handle data corruption in the repository (e.g. through bitrot or storing the repository on mediums with a low MTBF). This could be done by adding an optional erasure code like Reed-Solomon-Codes or similar algorithms.
Since this is a trade-off between robustness and increased repository size due to the parity data, the user should be able to choose the acceptable amount of data corruption per block size. Since a user might use different repositories on different storage mediums with different existing robustness measures in place, the robustness value should be set on a per-repository level.
Bupstash should also include a subcommand to check and repair a remote repository. Since RAID and tape storage systems use the term "scrub" for this, I suggest using it as the subcommand name here as well.
Just like the --remote-path= option in borg.
It is very useful when working with non-conventional paths, such as a cargo install in a user's home directory, or when non-traditional servers like a Synology NAS force calling the binary via the package path.
When snapshotting with 0.5.0 over a slow network, I found it to be 50 percent faster than 0.6.0 for no obvious reason. We should investigate and explain this regression.
As an administrator, I care a lot about the robustness of a backup solution. Does it handle bitflips and bit rot? What happens if a remote repository is corrupted: how many blocks are affected, or must I throw away the complete backup? How do I notice bit rot? Is the remote chunk checked for consistency after uploading?
This topic should be added to the homepage documentation as well as to the manpage.
Using the following environment:
cargo build --release fails with the following error:
Compiling libc v0.2.80
Compiling cc v1.0.65
Compiling proc-macro2 v1.0.24
Compiling autocfg v1.0.1
Compiling unicode-xid v0.2.1
Compiling syn v1.0.51
Compiling memchr v2.3.4
Compiling typenum v1.12.0
Compiling version_check v0.9.2
Compiling pkg-config v0.3.19
Compiling lazy_static v1.4.0
Compiling serde_derive v1.0.117
Compiling serde v1.0.117
Compiling bitflags v1.2.1
Compiling cfg-if v1.0.0
Compiling ryu v1.0.5
Compiling unicode-width v0.1.8
Compiling regex-syntax v0.6.21
Compiling serde_json v1.0.59
Compiling subtle v2.3.0
Compiling nix v0.17.0
Compiling anyhow v1.0.34
Compiling linked-hash-map v0.5.3
Compiling cfg-if v0.1.10
Compiling fallible-streaming-iterator v0.1.9
Compiling codemap v0.1.3
Compiling arrayvec v0.5.2
Compiling arrayref v0.3.6
Compiling void v1.0.2
Compiling constant_time_eq v0.1.5
Compiling number_prefix v0.3.0
Compiling termcolor v1.1.2
Compiling itoa v0.4.6
Compiling fallible-iterator v0.2.0
Compiling smallvec v1.5.0
Compiling shlex v0.1.1
Compiling path-clean v0.1.0
Compiling rangemap v0.1.8
Compiling glob v0.3.0
Compiling once_cell v1.5.2
Compiling humantime v2.0.1
Compiling num-traits v0.2.14
Compiling crossbeam-utils v0.8.1
Compiling num-integer v0.1.44
Compiling generic-array v0.14.4
Compiling thread_local v1.0.1
Compiling bupstash v0.6.2 (/home/el/bupstash)
Compiling lz4-sys v1.9.2
Compiling libsqlite3-sys v0.18.0
Compiling blake3 v0.3.7
Compiling getopts v0.2.21
Compiling lru-cache v0.1.2
Compiling quote v1.0.7
Compiling aho-corasick v0.7.15
Compiling time v0.1.44
Compiling terminal_size v0.1.15
Compiling atty v0.2.14
Compiling xattr v0.2.2
Compiling filetime v0.2.13
Compiling fs2 v0.4.3
Compiling regex v1.4.2
Compiling crossbeam-channel v0.5.0
Compiling codemap-diagnostic v0.1.1
Compiling tar v0.4.30
Compiling digest v0.9.0
Compiling crypto-mac v0.8.0
Compiling console v0.13.0
Compiling thiserror-impl v1.0.22
Compiling lz4 v1.23.2
Compiling indicatif v0.15.0
Compiling thiserror v1.0.22
Compiling serde_bare v0.3.0
Compiling chrono v0.4.19
Compiling rusqlite v0.23.1
error[E0308]: mismatched types
--> src/base64.rs:15:13
|
15 | out_buf.as_mut_ptr() as *mut i8,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u8`, found `i8`
|
= note: expected raw pointer `*mut u8`
found raw pointer `*mut i8`
error[E0308]: mismatched types
--> src/base64.rs:44:13
|
44 | data.as_ptr() as *const i8,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u8`, found `i8`
|
= note: expected raw pointer `*const u8`
found raw pointer `*const i8`
error[E0308]: mismatched types
--> src/base64.rs:48:13
|
48 | std::ptr::null_mut::<*const i8>(),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u8`, found `i8`
|
= note: expected raw pointer `*mut *const u8`
found raw pointer `*mut *const i8`
error: aborting due to 3 previous errors
For more information about this error, try `rustc --explain E0308`.
error: could not compile `bupstash`
To learn more, run the command again with --verbose.
Should we support them in snapshots?
Because we only start deleting data when we run a garbage collection, we may wish to allow the user to undo an rm if it was an accident.
Potential ideas:
bupstash list --removed
bupstash undo-rm id=...
Originally I did not; this is because it seemed like the lists of hashes would not compress. After thinking more, it seems they may compress if they contain many repeated blocks (e.g. a 1GB block of zeros). It might not be worth doing in practice and needs experiments/thought.
My ideal backup setup would go straight to my external drive and also to a remote server to provide both local speed and also fault tolerance.
I can currently do this by making two different calls to bupstash, but there may be something to gain from a dedicated way to propagate backups. One potential way to implement this is via repository hooks which are able to forward backups via a dedicated sync command.
We should migrate bit by bit away from the deprecated failure crate and change the error handling to concrete error types where it makes sense.
We need CI for all supported platforms.
ForceCommand "bupstash serve --allow-put /home/backups/bupstash-repository"
It seems this should be a script, as it cannot have arguments. Double check that this is the case and fix the guide if so.
[nix-shell:~/src/bupstash]$ ./target/release/bupstash get -q -k x.key -r ./repox/ --pick debug/build/libsqlite3-sys-ed02f75374eb67aa/stderr id=1c8b3ecbc7df358cd57be2193c1b6d99
thread 'main' panicked at 'assertion failed: range.start < range.end', /home/ac/.cargo/registry/src/github.com-1ecc6299db9ec823/rangemap-0.1.7/src/map.rs:96:9
stack backtrace:
0: std::panicking::begin_panic
1: rangemap::set::RangeSet<T>::insert
2: bupstash::get_main
3: bupstash::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
bupstash serve: remote disconnected
Aborted (core dumped)
We might need small tweaks to get windows to work.
We should be able to perform most of the mark phase of GC without ever blocking other clients. We only need to block them for a final check that we have walked the complete item set.
I just made a bunch of breaking, but useful changes to the repository and protocol and feel guilty about it.
Soon it will be time to commit to not breaking anything ever again, we should explain the plan in the documentation or give a timeline for stability.
freebsd, openbsd, linux and macos are the bare minimum of what we should be testing and supporting.
A harmless edge case is that when compression is disabled, we also disable compression of the index blocks; this is probably not what we want.
We currently have a single send-log in the ~/.cache directory if the user does not specify it.
We don't support using this concurrently, so if a user runs two 'put' operations at once, they will see one command do nothing.
We should consider ways to allow the user to run as many 'put' commands as they want without having to manually specify a send-log. At worst, a progress message should say 'waiting for send-log' or something similar.
Hardlinks pose a slight problem for bupstash as they don't play too well with our dedup view of the world... That being said, it might be possible to track them in our index. We should seriously consider how to best tackle them.
Users need to be able to see what the key id is once they are able to list encrypted items.
Currently we only display and list items for the currently selected decryption key. We should at least provide the user with the ability to remove items for lost keys.
One potential solution is to always list encrypted items and let user queries filter them. If we do this, we must think carefully about ways to prevent accidental deletion; for example, an encrypted entry must not match older-than or newer-than queries.
Another potential UI:
bupstash list --show-encrypted
id=... keyid=... encrypted=true
...
Normally we recommend the user use an fs snapshot or some other mechanism to cease filesystem activity during a directory put. The reality is this isn't always the case, and a 'smeared' backup is probably better than no backup at all.
We can do a few things to gracefully handle files being edited while we are uploading them rather than a full abort.
Currently we are sending these errors as an Abort packet, but really we should disconnect, as we cannot report the error promptly with an abort packet.
It is unlikely, but the abort packet might also deadlock, as the client is not reading at that time.
Typically users won't need to interact with bupstash serve unless they want to create
It seems we are too fast for the progress library we use when we get lots of cache hits.
We need a logo and/or a mascot for the project.
It would be nice to allow users to inspect or fetch only the single file they care about from within an automated backup.
Currently we do not store any form of indexing data for snapshots. We could introduce an optional index htree that accompanies the data htree of a backup.
The user can then download the smaller index, which contains checksums and stream offsets of data within a tar snapshot. We could then introduce a 'get between' protocol message to allow streaming only the subset of the data we need.
This would allow the user to look within a snapshot without complicating the htree structure itself. The downsides are space, computational overhead and implementation complexity when making and storing backups.
Disconnection from the server will already be reported by the client, so the server should remain quiet in this situation.
We should enable PGO for both the rust code, and also the C code in the sqlite3 crate we depend on. We should also benchmark to prove what difference it makes.
This evening I tried to install bupstash using cargo install bupstash and it did not work correctly on macOS Mojave while using Rust 1.49. However, I was able to compile it successfully from source after doing a git clone from the GitHub repository!
Some info about my environment:
$ uname -a
Darwin HOSTNAME.local 18.7.0 Darwin Kernel Version 18.7.0: Mon Aug 31 20:53:32 PDT 2020; root:xnu-4903.278.44~1/RELEASE_X86_64 x86_64
$ rustc --version
rustc 1.49.0 (e1884a8e3 2020-12-29)
A cargo install bupstash did not succeed and left me with the following errors:
error[E0433]: failed to resolve: could not find `PosixFadviseAdvice` in `fcntl`
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/bupstash-0.6.2/src/client.rs:825:41
|
825 | ... nix::fcntl::PosixFadviseAdvice::POSIX_FADV_NOREUSE,
| ^^^^^^^^^^^^^^^^^^ could not find `PosixFadviseAdvice` in `fcntl`
error[E0425]: cannot find value `O_NOATIME` in crate `libc`
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/bupstash-0.6.2/src/client.rs:807:49
|
807 | ... .custom_flags(libc::O_NOATIME)
| ^^^^^^^^^ help: a constant with a similar name exists: `MNT_NOATIME`
|
::: /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.81/src/unix/bsd/apple/mod.rs:3083:1
|
3083 | pub const MNT_NOATIME: ::c_int = 0x10000000;
| -------------------------------------------- similarly named constant `MNT_NOATIME` defined here
error[E0425]: cannot find function `posix_fadvise` in module `nix::fcntl`
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/bupstash-0.6.2/src/client.rs:821:37
|
821 | nix::fcntl::posix_fadvise(
| ^^^^^^^^^^^^^ not found in `nix::fcntl`
error: aborting due to 3 previous errors
Please let me know if I can provide any more information or help test any new developments. I'd be thrilled to try something out for you!
We may wish to let a user edit existing items, or perhaps only the tags of an existing item. Currently the user must rm and re-upload if they made a typo in a tag.
Put progress uses the average over the whole put, not a rolling/weighted average.
This can make the progress speed indicator seem a bit inaccurate.
We can handle this in the same way we handle the serve command.
When putting from a fast disk to a slow disk with the dir backend, the speed stutters as it takes a long time to catch up syncing disk buffers.