
bscp's Introduction

Bscp – Secure and efficient copying of block devices

Please find the project website and documentation at

bscp's People

Contributors

art0int, countingpine, urjaman, vog, xtaran

bscp's Issues

bscp is slower than rsync

I'm copying a bunch of LVs from the local machine to a remote machine. My old script did the following:

  1. Get the name of the next LV
  2. Snapshot the LV
  3. Calculate the number of "chunks" in the LV (chunksize = 1G)
    3a) Use dd to read the chunk from the snapshot and write it to a RAM drive on the local system
    3b) SSH to the remote system and use dd to read the corresponding chunk from the remote LV into a RAM drive on the remote system
    3c) Depending on time of day and day of week, set the rate to either 1.5m or unlimited
    3d) Use rsync with that rate to update the remote copy of the "chunk"
    3e) Use dd to write the remote chunk back to the LV
    3f) Delete the local chunk copy
    3g) Loop back to 3a for the next chunk
  4. Delete the snapshot
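The chunk bookkeeping in step 3 can be sketched in Python, the language bscp itself is written in. This is a minimal sketch assuming "1G" means 1 GiB; the helper names `chunk_plan` and `dd_read_args` and the example paths are hypothetical and not taken from the original script.

```python
GIB = 1024 ** 3  # assumption: "chunksize = 1G" means 1 GiB


def chunk_plan(lv_size_bytes: int, chunk_size: int = GIB):
    """Yield (index, offset, length) for every chunk of an LV of the given size.

    Integer ceiling division avoids float rounding on multi-terabyte sizes;
    the last chunk is shorter when the size is not a multiple of chunk_size.
    """
    n_chunks = (lv_size_bytes + chunk_size - 1) // chunk_size
    for i in range(n_chunks):
        offset = i * chunk_size
        yield i, offset, min(chunk_size, lv_size_bytes - offset)


def dd_read_args(device: str, out_file: str, index: int, chunk_size: int = GIB):
    """Build a dd command line (hypothetical helper) that copies chunk `index`.

    With bs equal to the chunk size, skip=index selects the chunk and count=1
    copies one block; dd stops early at the end of the device.
    """
    return ["dd", f"if={device}", f"of={out_file}", f"bs={chunk_size}",
            f"skip={index}", "count=1"]
```

Note that `bs=1G` makes dd allocate a 1 GiB buffer per invocation, which fits the RAM-drive approach above but is worth keeping in mind on memory-constrained hosts.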

The "new" script seemed like a much better solution, since it would eliminate steps 3a, 3b, and 3e, which should all be somewhat "slow" compared to just reading and writing the changes. It simply replaced all of step 3 with bscp copying the snapshot to the remote.

The old script took approximately 36 hours to complete (max 36 hours, min 29.5 hours, avg 33.8 hours over 10 runs), and my hope was that bscp would reduce that to closer to 24 hours (or less, since it no longer sets a bwlimit during the sync copy). However, it is taking significantly longer: the current run has already taken over 72 hours and is still not complete. This isn't the first run; I just don't have the logs from the previous runs.
Can anyone advise whether they have seen bscp being (significantly) less efficient than rsync, or am I doing something wrong? Is there an alternative?

For reference, the maximum speed of the connection is 20 Mbps, and the total size of all LVs being copied is 5.4 TB.
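A quick back-of-the-envelope calculation helps frame these numbers. The sketch below assumes decimal units (1 Mbps = 10^6 bit/s, 1 TB = 10^12 bytes) and ignores protocol overhead and the time both sides spend reading their devices; the helper names are illustrative only.

```python
SECONDS_PER_HOUR = 3600


def transfer_seconds(n_bytes: float, link_bps: float) -> float:
    """Time to push n_bytes over a link of link_bps bits per second, ignoring overhead."""
    return n_bytes * 8 / link_bps


def bytes_per_window(link_bps: float, seconds: float) -> float:
    """Maximum payload in bytes that fits through the link in the given time."""
    return link_bps * seconds / 8


LINK_BPS = 20e6       # 20 Mbps, assuming 1 Mbps = 10**6 bit/s
TOTAL_BYTES = 5.4e12  # 5.4 TB, assuming 1 TB = 10**12 bytes

full_copy_hours = transfer_seconds(TOTAL_BYTES, LINK_BPS) / SECONDS_PER_HOUR
budget_24h = bytes_per_window(LINK_BPS, 24 * SECONDS_PER_HOUR)
print(f"full copy: {full_copy_hours:.0f} h")
print(f"24 h budget: {budget_24h / 1e9:.0f} GB ({budget_24h / TOTAL_BYTES:.0%} of the data)")
```

Under these assumptions a full copy would take about 600 hours (25 days), and a 24-hour window can carry at most about 216 GB, roughly 4% of the data, so any tool can only meet the 24-hour goal if very little of each LV changes between runs.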

Switch to a shorter GitHub repository name and project domain name

Switch from the pretty lengthy and confusing GitHub Pages-induced repository name:

bscp-tool.github.io

to the more obvious plain:

bscp

All existing GitHub clones using previous repository URIs should continue to work:

These should all redirect to the new repository URI:

This simplification means, however, that we can't use our current project domain anymore:

Keeping it would require maintaining a separate repository with that long name, as enforced by GitHub Pages. This goes against our design goal of keeping the code and documentation in one repository. One possibility would be to keep that second repository in sync via some CI triggers, but that would add yet another moving part to the system and might cause trouble in the future. Moreover, recreating that repository would mean we lose the redirects of the repository URIs that GitHub would otherwise generate automatically upon the renaming.

Luckily, there is a simpler solution to this messy situation: we can just use an entirely different project domain by adding a CNAME file to the repository, and be done with it. For this purpose, I'm donating the following subdomain of my private main domain to this project:

As this new project domain is even shorter than our previous GitHub subdomain, this will be a win-win situation.

For your attention: @advanced-schema @alkmim @art0int @countingpine @imbmf @jhcloos @koollman @leprelnx @urjaman @xtaran

Destination bigger than source

Hi Volker, in the past I used blocksync.py for these tasks, but it imposes a mandatory condition: the source must be identical in size to the target. While looking for an alternative, I found your utility on GitHub, and after testing it I saw that it is possible to copy to a bigger destination device. However, I don't know if this is safe, because I didn't find anything on the web about it.
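Before relying on a copy onto a larger device, it can help to check both sizes explicitly. A minimal sketch for local paths; `device_size` and `compare_sizes` are hypothetical helpers, not part of bscp or blocksync.py. Whatever the tool does internally, only the first size(source) bytes of the destination can end up as a copy of the source; anything beyond that point is not part of the copy.

```python
import os


def device_size(path: str) -> int:
    """Return the size in bytes of a regular file or block device.

    os.stat() reports st_size == 0 for block devices on Linux, so this seeks
    to the end of an open descriptor instead, which works for both.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def compare_sizes(src: str, dst: str) -> str:
    """Describe how the destination's size relates to the source's."""
    s, d = device_size(src), device_size(dst)
    if d == s:
        return "equal"
    if d > s:
        return f"destination larger by {d - s} bytes"
    return f"destination smaller by {s - d} bytes"
```

For example, `compare_sizes("/dev/vg/src", "/dev/vg/dst")` run as root reports how many trailing bytes of the destination fall outside the copied range.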

RuntimeError: Checksum mismatch after transfer

I don't know if the syntax is right.
I'm using:

./bscp /dev/sda2 10.32.0.255:/dev/sda2

And I get:

Traceback (most recent call last):
  File "./bscp", line 162, in <module>
    (in_total, out_total, size) = bscp(local_filename, remote_host, remote_filename, blocksize, hashname)
  File "./bscp", line 142, in bscp
    raise RuntimeError('Checksum mismatch after transfer')
RuntimeError: Checksum mismatch after transfer

No other output before that.
I have Python 2.7.15rc1 on Ubuntu 18.
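To find out whether the two devices really differ after such an error, one option is to hash the same number of bytes on both machines and compare the results by hand. A minimal sketch; `digest_prefix` is a hypothetical helper, not part of bscp. It would be run once locally on /dev/sda2 and once on 10.32.0.255 on its /dev/sda2, using the source's size as `length` on both sides.

```python
import hashlib


def digest_prefix(path: str, length: int, hashname: str = "sha1",
                  chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash the first `length` bytes of `path` and return the hex digest.

    Reads in chunks so multi-gigabyte devices need not fit in memory.
    Raises ValueError if the file or device is shorter than `length`.
    """
    h = hashlib.new(hashname)
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            block = f.read(min(chunk_size, remaining))
            if not block:
                raise ValueError(f"{path} is shorter than {length} bytes")
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()
```

Differing digests confirm that the contents really differ; matching digests mean the data agree at the time of the check.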

Lack of any progress information

As this is designed for copying block devices, it's highly likely that we are talking about multiple gigabytes of data, so there should really be some form of progress information.
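A progress display mostly needs bytes done, total size, and elapsed time. A minimal sketch of such a reporter; the `Progress` class is hypothetical and not part of bscp. It writes to stderr by default, on the assumption that stdout may already carry data between the two ends (as the `io.read` in the sanity check quoted in another issue suggests), so status text must not be mixed into it.

```python
import sys
import time


class Progress:
    """Minimal progress reporter for a long block-by-block copy (hypothetical helper)."""

    def __init__(self, total_bytes: int, stream=sys.stderr):
        self.total = total_bytes
        self.done = 0
        self.start = time.monotonic()
        self.stream = stream

    def update(self, n_bytes: int) -> str:
        """Record n_bytes more as processed and write a one-line status."""
        self.done += n_bytes
        elapsed = max(time.monotonic() - self.start, 1e-9)
        rate = self.done / elapsed  # average bytes per second so far
        pct = 100.0 * self.done / self.total if self.total else 100.0
        eta = (self.total - self.done) / rate if rate > 0 else float("inf")
        # "\r" returns to the start of the line so each update overwrites the last.
        line = (f"\r{pct:5.1f}% {self.done / 2**20:,.0f} MiB "
                f"{rate / 2**20:.1f} MiB/s ETA {eta:,.0f} s")
        self.stream.write(line)
        self.stream.flush()
        return line
```

A copy loop would call `update(len(block))` after each block it reads or sends.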

Remote script failed to execute properly

CentOS 7:

lvcreate -L 1M -n bscp-test staticnode13
lvcreate -L 2M -n bscp-test2 staticnode13
/tmp/bscp /dev/staticnode13/bscp-test localhost:/dev/staticnode13/bscp-test2 4194304 sha1
results in:
Traceback (most recent call last):
  File "/tmp/bscp", line 176, in <module>
    (in_total, out_total, size) = bscp(local_filename, remote_host, remote_filename, blocksize, hashname)
  File "/tmp/bscp", line 132, in bscp
    raise RuntimeError('Remote script failed to execute properly')
RuntimeError: Remote script failed to execute properly

It makes no difference whether I use LVM-backed, ZFS-backed, or file-backed raw devices, or whether I talk to sshd via localhost or an external host.

What exactly does "Remote script failed to execute properly" mean?

I understand that it's coming from

sanity_digest = hashlib.new(hashname, remote_filename).digest()
remote_digest = io.read(len(sanity_digest))
if remote_digest != sanity_digest:
    raise RuntimeError('Remote script failed to execute properly')

But I have no idea why. :-)
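The quoted check works like a handshake: the local side hashes the destination filename and expects the remote side to send that same digest back as its first bytes. A minimal local simulation, assuming that reading; `REMOTE_SCRIPT` and `handshake_ok` are hypothetical stand-ins, not bscp's actual remote script. In this simulation, anything else arriving on the remote side's stdout first, such as text printed by a login script, is enough to trigger the mismatch.

```python
import hashlib
import subprocess
import sys

# Hypothetical stand-in for the remote side (NOT bscp's real remote script):
# it reads a filename from stdin, optionally writes some noise first (as a
# chatty shell startup file on the remote host might), then writes the
# digest of the filename.
REMOTE_SCRIPT = r'''
import hashlib, sys
noise, hashname = sys.argv[1], sys.argv[2]
name = sys.stdin.buffer.read()
sys.stdout.buffer.write(noise.encode())
sys.stdout.buffer.write(hashlib.new(hashname, name).digest())
'''


def handshake_ok(remote_filename: str, hashname: str = "sha1", noise: str = "") -> bool:
    """Run the stand-in remote side and apply the same comparison as the quoted code."""
    sanity_digest = hashlib.new(hashname, remote_filename.encode()).digest()
    result = subprocess.run(
        [sys.executable, "-c", REMOTE_SCRIPT, noise, hashname],
        input=remote_filename.encode(),
        capture_output=True,
        check=True,
    )
    # Take exactly as many bytes as the expected digest, like io.read(len(...)).
    remote_digest = result.stdout[:len(sanity_digest)]
    return remote_digest == sanity_digest
```

Here `handshake_ok("/dev/staticnode13/bscp-test2")` succeeds, while passing `noise="Welcome\n"` makes it fail exactly like the check above.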

Can only work in one direction?

Why?...

Documentation says:

Usage
bscp SRC HOST:DEST [BLOCKSIZE] [HASH]

What if I need to copy a remote block device into a local file? This fails:

$ /path/to/bscp hostname:/dev/vg/vm-vmname.snap vmname.img
Usage:

    bscp SRC HOST:DEST [BLOCKSIZE] [HASH]

Is there any reason it can't work both ways?
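Supporting both directions would mainly require deciding which argument names the remote side. A minimal sketch of that decision, loosely following scp's convention that a colon before the first slash marks a host; `Endpoint`, `parse_endpoint`, and `direction` are hypothetical helpers, not part of bscp, and the actual pull transfer is not shown.

```python
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    host: Optional[str]  # None means a local path
    path: str


def parse_endpoint(arg: str) -> Endpoint:
    """Split an scp-style "HOST:PATH" argument; anything else is a local path.

    A colon only marks a host when it appears before the first slash, so
    local paths such as "./a:b" stay local.
    """
    colon = arg.find(":")
    slash = arg.find("/")
    if colon > 0 and (slash == -1 or colon < slash):
        return Endpoint(arg[:colon], arg[colon + 1:])
    return Endpoint(None, arg)


def direction(src: str, dst: str) -> str:
    """Report which side is remote; a two-way tool could dispatch on this."""
    s, d = parse_endpoint(src), parse_endpoint(dst)
    if s.host and d.host:
        raise ValueError("at most one side may be remote")
    if d.host:
        return "push"
    if s.host:
        return "pull"
    raise ValueError("one side must be HOST:PATH")
```

With this rule, `direction("hostname:/dev/vg/vm-vmname.snap", "vmname.img")` returns `"pull"`, while the documented form `bscp SRC HOST:DEST` is a `"push"`.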

Future of bscp

I'd like to hear opinions about the future development of the bscp tool.

Bscp was initially created to fill a gap at a time when Rsync performed really badly at one task: backing up single, large block devices over the network while transferring only changed blocks. This was useful for transferring disk images of virtual machines, even encrypted ones.
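"Transferring only changed blocks" can be illustrated with the general fixed-block technique: the destination hashes each block and sends only the digests, and the source sends back only the blocks whose digests differ. This is a minimal in-memory sketch of that technique, not bscp's actual code; the helper names are hypothetical, and devices and the SSH channel are replaced by plain bytes.

```python
import hashlib
from typing import Iterator, List, Tuple


def block_digests(data: bytes, blocksize: int, hashname: str = "sha1") -> List[bytes]:
    """Digest of each fixed-size block (the last block may be shorter)."""
    return [hashlib.new(hashname, data[i:i + blocksize]).digest()
            for i in range(0, len(data), blocksize)]


def changed_blocks(src: bytes, dst_digests: List[bytes], blocksize: int,
                   hashname: str = "sha1") -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, block) for each source block whose digest differs from the destination's."""
    for index, offset in enumerate(range(0, len(src), blocksize)):
        block = src[offset:offset + blocksize]
        digest = hashlib.new(hashname, block).digest()
        if index >= len(dst_digests) or digest != dst_digests[index]:
            yield offset, block


def apply_blocks(dst: bytearray, blocks) -> None:
    """Write the transferred blocks into the destination at their offsets."""
    for offset, block in blocks:
        dst[offset:offset + len(block)] = block
```

Only digests and differing blocks cross the network, but both sides still have to read their whole device on every run, which is why this approach suits large images where little changes between backups.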

Regarding #14, Rsync seems to have caught up in the meantime, so maybe this tool is no longer needed at all?

In case there are still users who have important use cases for Bscp: Would anyone step up to take maintainership of this project? I'm willing to create an organization on GitHub and to move Bscp to it, given that one or two people join in.

Otherwise, I'd propose to close and archive this project, and to put a notice into the README redirecting people to Rsync.

What do you think?
