
msrsync's Introduction

This project is not actively developed. Please have a look at the alternatives in the motivation section.

msrsync: maximize rsync bandwidth usage

msrsync (multi-stream rsync) is a python wrapper around rsync. It only depends on python >= 2.6 and rsync.

It splits the transfer into multiple buckets while the source is scanned and runs a configurable number of rsync processes in parallel, which should help maximize the usage of the available bandwidth. The main limitation is that it does not handle a remote source or target directory: both must be locally accessible (local disk, nfs/cifs/other mountpoint). I hope to address this in the near future.

Quick example

$ msrsync -p 4 /source /destination # you can also use -P/--progress and --stats options

This will copy the /source directory into the /destination directory (same behaviour as rsync regarding slash handling) using 4 rsync processes (with "-aS --numeric-ids" as the default options, which can be overridden with the --rsync option). msrsync will split the file and directory list into buckets of at most 1G or 1000 files (see the --size and --files options) before feeding them to the rsync processes in parallel using the --files-from option. As long as the source and the destination can cope with the parallel I/O (think big boring "enterprise grade" NAS), it should be faster than a single rsync.
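To make the bucket idea concrete, here is a minimal sketch of how a file list could be grouped under a size limit and a file-count limit. It is not msrsync's actual code: the names iter_buckets and crawl as written here are illustrative assumptions, and so is the choice to let a single oversized file occupy a bucket of its own.

```python
import os


def iter_buckets(entries, max_size=1024 ** 3, max_files=1000):
    """Group (size, path) pairs into buckets bounded by size and count.

    Illustrative only: msrsync's real bucket logic may differ.
    """
    bucket, bucket_size = [], 0
    for size, path in entries:
        # Close the current bucket if adding this entry would exceed a limit.
        if bucket and (len(bucket) >= max_files or bucket_size + size > max_size):
            yield bucket_size, bucket
            bucket, bucket_size = [], 0
        bucket.append(path)
        bucket_size += size
    if bucket:
        yield bucket_size, bucket


def crawl(root):
    """Yield (size, relative_path) for every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield os.lstat(full).st_size, os.path.relpath(full, root)
```

Because both functions are generators, buckets can be handed out while the source is still being scanned, for example with `for size, paths in iter_buckets(crawl("/source")): ...`.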

msrsync shares the same spirit as fpart (and its associated fpsync tool) by Ganaël Laplanche or parsync by Harry Mangalam. Those are two fantastic, much more complete tools used in the field to do real work. Please check them out, they might be what you're looking for.

You can also check fcp from the pcircle project. It looks very powerful. See the associated publication.

Motivation

Why write msrsync if tools like fpart, parsync or pftool exist? While reasonable, their dependencies can be a point of friction given the constraints we can have on a given system. When you're lucky, you can use your package manager to deal with the requirements (fpart seems to be well supported among various GNU/Linux distributions and FreeBSD: FreeBSD, Debian, Ubuntu, Archlinux, OBS), but more often than not, I found myself struggling with the sad state of the machine I'm working with.

That's why the only dependencies of msrsync are python >= 2.6 and rsync. Why python 2.6? I'm targeting RHEL6-like distributions as a minimum requirement here, so I'm stuck with python 2.6. I miss some cool features, but that's part of the project.

The devil is in the details. If you need a starting point to think about data migration, this overview by Jeff Layton is very informative: Moving Your Data – It’s Not Always Pleasant.

The "How to transfer large amounts of data via network" article by the parsync author is updated regularly and is also worth a read.

If you can read French, I co-wrote an article with Ganaël Laplanche about fpart: Parallélisez vos transferts de fichiers.

You might also be interested in this Intel whitepaper on data migration: Data Migration with Intel® Enterprise Edition for Lustre* Software, which mentions all of those tools (but not msrsync).

Requirements

python >= 2.6 and rsync

Installation

msrsync is a single python file: you just have to download it. Or, if you prefer, you can clone the repository and use the provided Makefile:

$ wget https://raw.githubusercontent.com/jbd/msrsync/master/msrsync && chmod +x msrsync

or

$ git clone https://github.com/jbd/msrsync && cd msrsync && sudo make install

Usage

$ msrsync --help
usage: msrsync [options] [--rsync "rsync-options-string"] SRCDIR [SRCDIR2...] DESTDIR
   or: msrsync --selftest

msrsync options:
    -p, --processes ...   number of rsync processes to use [1]
    -f, --files ...       limit buckets to <files> files number [1000]
    -s, --size ...        limit partitions to BYTES size (1024 suffixes: K, M, G, T, P, E, Z, Y) [1G]
    -b, --buckets ...     where to put the buckets files (default: auto temporary directory)
    -k, --keep            do not remove buckets directory at the end
    -j, --show            show bucket directory
    -P, --progress        show progress
    --stats               show additional stats
    -d, --dry-run         do not run rsync processes
    -v, --version         print version

rsync options:
    -r, --rsync ...       MUST be last option. rsync options as a quoted string ["-aS --numeric-ids"]. The "--from0 --files-from=... --quiet --verbose --stats --log-file=..." options will ALWAYS be added, no
                            matter what. Be aware that this will affect all rsync *from/filter files if you want to use them. See rsync(1) manpage for details.

self-test options:
    -t, --selftest        run the integrated unit and functional tests
    -e, --bench           run benchmarks
    -g, --benchshm        run benchmarks in /dev/shm or the directory in $SHM environment variable
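
The --size help text mentions 1024-based suffixes. As a rough illustration of what such a conversion involves, here is a small sketch; parse_size is a hypothetical helper, not the function msrsync uses, and accepting fractional values such as "1.5G" is an assumption.

```python
SUFFIXES = "KMGTPEZY"  # 1024-based, in the order listed by the --size help text


def parse_size(text):
    """Convert a string such as '1G' or '512' to a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        # K is 1024**1, M is 1024**2, and so on.
        exponent = SUFFIXES.index(text[-1]) + 1
        return int(float(text[:-1]) * 1024 ** exponent)
    return int(text)
```

With this reading, the default bucket size of 1G is 1024 ** 3 bytes.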

If you want to use specific options for the rsync processes, use the --rsync option.

$ msrsync -p4 --rsync "-a --numeric-ids --inplace" source destination

Some examples:

$ msrsync -p 8 /usr/share/doc/ /tmp/doc/
$ msrsync -P -p 8 /usr/share/doc/ /tmp/doc/
[33491/33491 entries] [602.1 M/602.1 M transferred] [3378 entries/s] [60.7 M/s bw] [monq 1] [jq 1]
$ msrsync -P -p 8 --stats /usr/share/doc/ /tmp/doc/
[33491/33491 entries] [602.1 M/602.1 M transferred] [3533 entries/s] [63.5 M/s bw] [monq 1] [jq 1]
Status: SUCCESS
Working directory: /home/jbdenis/Code/msrsync
Command line: ./msrsync -P -p 8 --stats /usr/share/doc/ /tmp/doc/
Total size: 602.1 M
Total entries: 33491
Buckets number: 34
Mean entries per bucket: 985
Mean size per bucket: 17.7 M
Entries per second: 3533
Speed: 63.5 M/s
Rsync workers: 8
Total rsync's processes (34) cumulative runtime: 73.0s
Crawl time: 0.4s (4.3% of total runtime)
Total time: 9.5s

Performance

You can launch a benchmark using the --bench option or make bench. It is only for testing purposes. The benchmarks compare the performance of vanilla rsync and msrsync using multiple options. Since I'm just creating a huge fake file tree with empty files, you won't see any msrsync benefit here unless you're trying with a very large number of files. They need to be run as root since I'm dropping the disk cache between runs.

$ sudo make bench # or sudo msrsync --bench
Benchmarks with 100000 entries (95% of files):
rsync -a --numeric-ids took 14.05 seconds (speedup x1)
msrsync --processes 1 --files 1000 --size 1G took 18.58 seconds (speedup x0.76)
msrsync --processes 2 --files 1000 --size 1G took 10.61 seconds (speedup x1.32)
msrsync --processes 4 --files 1000 --size 1G took 6.60 seconds (speedup x2.13)
msrsync --processes 8 --files 1000 --size 1G took 6.58 seconds (speedup x2.14)
msrsync --processes 16 --files 1000 --size 1G took 6.66 seconds (speedup x2.11)

Please test on real data instead =). There is also a --benchshm option that will perform the benchmark in /dev/shm.

Here is a real test on a big NAS box (not known for handling small files well) on a 1G network (which, as you'll see, is more than useless given the I/O overhead), using the decompressed linux 4.0.4 kernel source copied 21 times into different folders:

$ ls /mnt/nfs/linux-src/
0  1  10  11  12  13  14  15  16  17  18  19  2  20  3  4  5  6  7  8  9
$ du -s --apparent-size --bytes /mnt/nfs/linux-src
11688149821     /mnt/nfs/linux-src
$ du -s --apparent-size --human /mnt/nfs/linux-src
11G     /mnt/nfs/linux-src
$ find /mnt/nfs/linux-src -type f | wc -l
1027908
$ find /mnt/nfs/linux-src -type d | wc -l
66360

The source and the destination are on an nfs mount.

Let's run rsync and msrsync with various numbers of processes:

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ time rsync -a --numeric-ids /mnt/nfs/linux-src /mnt/nfs/dest

real    136m10.406s
user    1m54.939s
sys     7m31.188s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 1 /mnt/nfs/linux-src /mnt/nfs/dest

real    144m8.954s
user    2m20.426s
sys     8m4.127s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 2 /mnt/nfs/linux-src /mnt/nfs/dest

real    73m57.312s
user    2m27.543s
sys     7m56.484s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 4 /mnt/nfs/linux-src /mnt/nfs/dest

real    42m31.105s
user    2m24.196s
sys     7m46.568s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 8 /mnt/nfs/linux-src /mnt/nfs/dest

real    36m55.141s
user    2m27.149s
sys     7m40.392s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 16 /mnt/nfs/linux-src /mnt/nfs/dest

real    33m0.976s
user    2m35.848s
sys     7m40.623s

The rates are ridiculous due to the size of each file and the I/O overhead (nfs + network), but that's a real use case and we get a nice speedup without too much thinking: just use msrsync and you're good to go. That's exactly what I wanted. Here is a summary of the previous results:

Command          Time      Entries per second   Bandwidth (MBytes/s)   Speedup
rsync            136m10s   133                  1.36                   x1
msrsync -p 1     144m9s    126                  1.28                   x0.94
msrsync -p 2     73m57s    246                  2.51                   x1.84
msrsync -p 4     42m31s    428                  4.36                   x3.20
msrsync -p 8     36m55s    494                  5.03                   x3.68
msrsync -p 16    33m0s     552                  5.62                   x4.12

Astute readers will notice the slight overhead of msrsync over the equivalent rsync in the single process case. This overhead becomes negligible (though it still exists) as you increase the number of processes.
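
For readers who want to redo the arithmetic behind the summary, the sketch below recomputes the speedup and the entries-per-second figures from the "real" times of the runs above. Treating the number of entries as files plus directories (the two find counts) is an assumption on my part, as are the helper names.

```python
import re


def to_seconds(duration):
    """Convert a duration such as '136m10.406s' to seconds."""
    minutes, seconds = re.match(r"(\d+)m([\d.]+)s$", duration).groups()
    return int(minutes) * 60 + float(seconds)


def speedup(baseline, duration):
    """Ratio of the baseline wall-clock time to the given one."""
    return to_seconds(baseline) / to_seconds(duration)


# "real" times copied from the runs above.
RUNS = [
    ("rsync", "136m10.406s"),
    ("msrsync -p 1", "144m8.954s"),
    ("msrsync -p 2", "73m57.312s"),
    ("msrsync -p 4", "42m31.105s"),
    ("msrsync -p 8", "36m55.141s"),
    ("msrsync -p 16", "33m0.976s"),
]

# Files plus directories from the find commands (assumed entry count).
ENTRIES = 1027908 + 66360

for command, duration in RUNS:
    elapsed = to_seconds(duration)
    ratio = speedup(RUNS[0][1], duration)
    print("%-14s x%.2f  %d entries/s" % (command, ratio, ENTRIES / elapsed))
```

Small differences in the last digit compared to the summary come down to rounding.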

Notes

  • The rsync processes are always run with the --from0 --files-from=... --quiet --verbose --stats --log-file=... options, no matter what. --from0 option affects --exclude-from, --include-from, --files-from, and any merged files specified in a --filter rule.

  • This may seem obvious but if the source or the destination of the copy cannot handle parallel I/O well, you won't see any benefits (quite the opposite in fact) using msrsync.
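
To illustrate the first note: with --from0, the list given to --files-from is separated by NUL bytes instead of newlines, so names containing spaces or newlines survive intact. The sketch below writes such a list and builds a matching argument list. write_bucket and rsync_argv are hypothetical helpers, not msrsync functions, and nothing is executed.

```python
import os
import tempfile


def write_bucket(paths, directory=None):
    """Write paths NUL-separated, the format rsync reads with --from0."""
    fd, name = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        for path in paths:
            handle.write(path.encode("utf-8") + b"\0")
    return name


def rsync_argv(bucket_file, src, dst, options=("-aS", "--numeric-ids")):
    """Build an rsync argument list for one bucket; nothing is run here."""
    return (["rsync"] + list(options)
            + ["--from0", "--files-from=" + bucket_file, src, dst])
```

Keep in mind that any --exclude-from or --include-from file passed through --rsync would then have to be NUL-separated as well, as the note says.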

Development

I'm targeting python 2.6 without external dependencies besides rsync. The provided Makefile is just a helper around the embedded tests and coverage.py:

$ make help
Please use `make <target>' where <target> is one of
  clean         => clean all generated files
  cov           => coverage report using /usr/bin/python-coverage (use COVERAGE env to change that)
  covhtml       => coverage html report
  man           => build manpage
  test          => run embedded tests
  install       => install msrsync in /usr/bin (use DESTDIR env to change that)
  lint          => run pylint
  bench         => run benchmarks (linux only. Need root to drop buffer cache between run)
  benchshm      => run benchmarks using /dev/shm (linux only. Need root to drop buffer cache between run)

There is an integrated test suite (--selftest option, or make test). Since I'm using unittest from the python 2.6 standard library, I cannot capture the output of the tests (the buffer parameter of the TestResult object appeared in 2.7).

$ make test # or msrsync --selftest
test_get_human_size (__main__.TestHelpers)
convert bytes to human readable string ... ok
test_get_human_size2 (__main__.TestHelpers)
convert bytes to human readable string ... ok
test_human_size (__main__.TestHelpers)
convert human readable size to bytes ... ok
...
test simple msrsync synchronisation ... ok
test_msrsync_cli_2_processes (__main__.TestSyncCLI)
test simple msrsync synchronisation ... ok
test_msrsync_cli_4_processes (__main__.TestSyncCLI)
test simple msrsync synchronisation ... ok
test_msrsync_cli_8_processes (__main__.TestSyncCLI)
test simple msrsync synchronisation ... ok
test_simple_msrsync_cli (__main__.TestSyncCLI)
test simple msrsync synchronisation ... ok
test_simple_rsync (__main__.TestSyncCLI)
test simple rsync synchronisation ... ok

----------------------------------------------------------------------
Ran 29 tests in 3.320s

OK

msrsync's People

Contributors

jbd, madmax2012


msrsync's Issues

issue with rsync option --delete

It seems that passing the --delete option to rsync produces inconsistent backups when using p >= 2.

I used it to back up a large nfs directory and the copied directory was 39GB instead of 42.

I think that the --delete option deletes files handled by other rsync processes. Is this the case?

Allowing remote hosts in destination

I'd like to be able to specify user@host:/dir as a remote destination. The code appears to only allow local destinations. Is this just a matter of modifying _check_srcs_dest ?

Performance transfer to remote host

Hi,
I'm running 2 hosts with identical HW.
The NIC supports a bandwidth of >20Gbits/sec.
Please check the output of iperf below.

ld4465:~ # iperf2 -c 192.168.100.11
------------------------------------------------------------
Client connecting to 192.168.100.11, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  3] local 192.168.100.13 port 60636 connected with 192.168.100.11 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  25.1 GBytes  21.5 Gbits/sec

I used NFS mount to connect the target server to the source server.
Then I started msrsync with these parameters:
ld4464:/backup # msrsync -P -p 16 /backup/ML1/ /mnt/ML1/

When I check the number of rsync-processes on the source host, I get only 9!

ld4464:~ # ps -ef | grep "rsync -a"
root     73333 73230 87 14:46 pts/1    00:07:56 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpesZxD1 /backup/ML1 /mnt/ML1/
root     73334 73232 89 14:46 pts/1    00:08:10 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpfbXcjJ /backup/ML1 /mnt/ML1/
root     73335 73231 85 14:46 pts/1    00:07:48 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpvQTsDf /backup/ML1 /mnt/ML1/
root     73339 73334  0 14:46 pts/1    00:00:00 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpfbXcjJ /backup/ML1 /mnt/ML1/
root     73340 73333  0 14:46 pts/1    00:00:00 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpesZxD1 /backup/ML1 /mnt/ML1/
root     73341 73339 95 14:46 pts/1    00:08:42 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpfbXcjJ /backup/ML1 /mnt/ML1/
root     73342 73340 92 14:46 pts/1    00:08:26 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpesZxD1 /backup/ML1 /mnt/ML1/
root     73343 73335  0 14:46 pts/1    00:00:00 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpvQTsDf /backup/ML1 /mnt/ML1/
root     73344 73343 92 14:46 pts/1    00:08:26 /usr/bin/rsync -a --numeric-ids --from0 --files-from=/tmp/msrsync-tq0w9Z/tmpvQTsDf /backup/ML1 /mnt/ML1/

Why are only 9 processes running?
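
One way to read the listing above is to group the processes by their --files-from argument. A local rsync copy normally runs as three cooperating processes (sender, receiver and generator), so the nine PIDs appear to belong to three bucket transfers rather than nine. The helper below is only a sketch for that grouping; group_by_bucket is a made-up name and it assumes `ps -ef` style lines with the PID in the second column.

```python
import re
from collections import defaultdict


def group_by_bucket(ps_lines):
    """Map each --files-from value to the PIDs whose command line uses it."""
    groups = defaultdict(list)
    for line in ps_lines:
        match = re.search(r"--files-from=(\S+)", line)
        if match:
            pid = int(line.split()[1])  # second column of `ps -ef`
            groups[match.group(1)].append(pid)
    return dict(groups)
```

Applied to the nine lines above, this gives three bucket files with three PIDs each. It does not explain why only three transfers were active with -p 16.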

only 5 processes

I am testing capabilities of your app.
I made a folder inside a subfolder with ten 1GB files.

It works well with -p 2, 3, 4 and 5, but above 5 it just keeps using 5 threads. I can see that by refreshing the target folder.
Do you know why?

how to resume from the last termination of msrsync

Hi, Thanks for the tool. This is really helpful. I have a requirement to copy 2 TB of files from source to target nfs file system.
After copying about 1TB, ./msrsync failed due to inodes utilisation on the target file system, which is fixed now.

But when I restarted ./msrsync, it did not start the actual rsync; it has just been processing existing files for almost 1 day now. Is there a way to quickly resume the copy without verifying the existing files? Please let me know if my understanding is correct and how to overcome this.

./msrsync -p 32 --progress --stats
[12433616/12433616 entries] [1.2 T/1.2 T transferred] [226 entries/s] [23.6 M/s bw] [monq 0] [jq 0]

Does not show progress

Hello, I'm running with the following command and it doesn't show the progress.

msrsync -P -p 8 --rsync "-azP" $DIR1/ $DIR

--rsync --log-file being ignored?

Hi,
I think the always present --log-file option is overriding or ignoring the fact that log-file is set in the -r option.
Even with log-file set, the location seems to get put into /tmp.

I somewhat verified this by this hack:

if '--log-file' in options:
    rsync_cmd = "%s %s %s %s" % (RSYNC_EXE, options + ' --quiet --stats --verbose --from0', src + os.sep, dst)
else:
    rsync_cmd = "%s %s %s %s" % (RSYNC_EXE, options + ' --quiet --stats --verbose --from0 --log-file %s' % rsync_log, src + os.sep, dst)

Which seems to place it to the desired file. Is this by design?

Displaying progress

Hello!

I have just recently found your program and it helped me a lot to shift x TB of data from A to B.

However, I would like to ask you if you can implement another feature: progress display.
I mean not a forecast, but just to know how many bytes have been transferred, and how many are still pending.

THX

Missing SSH support

The main reason for me to use this tool was to speed up ssh transfers, but this doesn't seem to be supported?

$ time msrsync  -p8 --progress  --rsync ' -a -e "ssh -o Compression=no"'  [email protected]:/mnt/data/Bilder/ Bilder_syncdir/syncdir/
Source '[email protected]:/mnt/data/Bilder/' is not a directory

UnboundLocalError: local variable 'rsync_result' referenced before assignment

msrsync --stats -d -p 4 --rsync "-ravutg --exclude=lost+found" /source/ /destination/
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "./msrsync", line 824, in rsync_worker
rsync_mon_result = {"type": TYPE_RSYNC, "rsync_result": rsync_result, "size": bucket_size, "files_nr": bucket_files_nr, "jq_size": jobs_queue.qsize()}
UnboundLocalError: local variable 'rsync_result' referenced before assignment

msrsync with rsyncd

Hello,

Hope you are well and safe. I'd like to ask whether it's possible to use msrsync with the rsync daemon in order to record the changes in the source and speed up the decision times for what needs to be synced, or if this will confuse things for msrsync. Thanks.

Best regards,
G

-d flag causes error

repro: run with -d flag. There is no else for the line that checks if it is present, so it attempts to continue with the loop without running rsync, so it throws the error below:

Process Process-2:
Traceback (most recent call last):
File "/misc/local/python-2.7.11/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/misc/local/python-2.7.11/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/bin/msrsync", line 824, in rsync_worker
rsync_mon_result = {"type": TYPE_RSYNC, "rsync_result": rsync_result, "size": bucket_size, "files_nr": bucket_files_nr, "jq_size": jobs_queue.qsize()}
UnboundLocalError: local variable 'rsync_result' referenced before assignment
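
The pattern described in this report (a variable assigned only in the branch that runs rsync, then read unconditionally) can be sketched as follows, together with the usual remedy of assigning a default first. This is a stand-alone illustration, not a patch against msrsync: run_bucket and the run_rsync callable are hypothetical.

```python
def run_bucket(bucket, dry_run, run_rsync):
    """Process one bucket; run_rsync is a hypothetical callable."""
    # Assign a default first so the name exists even when rsync is skipped.
    rsync_result = None
    if not dry_run:
        rsync_result = run_rsync(bucket)
    # Without the default above, this line would raise UnboundLocalError
    # in dry-run mode.
    return {"rsync_result": rsync_result, "files_nr": len(bucket)}
```

Whatever consumes the result then has to accept None as "rsync was not run".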

msrsync fails with space in path

Works fine with no space in path.

[root@media ~]# pwd
/root
[root@media ~]# ls -l ./test/|wc -l
100
[root@media ~]# du -hs test/
400K    test/
[root@media ~]# msrsync -p2 test/ /root/test2/
[root@media ~]# echo $?
0
[root@media ~]# ls -l ./test2/|wc -l
100
[root@media ~]# du -hs test2/
404K    test2/

Fails with space in path, even when quoted.

[root@media ~]# msrsync -p2 test/ "/root/test 2/"
rsync  version 3.1.2  protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

rsync is a file transfer program capable of efficient remote update
via a fast differencing algorithm.

Usage: rsync [OPTION]... SRC [SRC]... DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST
  or   rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST
  or   rsync [OPTION]... [USER@]HOST:SRC [DEST]
  or   rsync [OPTION]... [USER@]HOST::SRC [DEST]
  or   rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
The ':' usages connect via remote shell, while '::' & 'rsync://' usages connect
to an rsync daemon, and require SRC or DEST to start with a module name.

Options
 -v, --verbose               increase verbosity
     --info=FLAGS            fine-grained informational verbosity
     --debug=FLAGS           fine-grained debug verbosity
[...]
 -4, --ipv4                  prefer IPv4
 -6, --ipv6                  prefer IPv6
     --version               print version number
(-h) --help                  show this help (-h is --help only if used alone)

Use "rsync --daemon --help" to see the daemon-mode command-line options.
Please see the rsync(1) and rsyncd.conf(5) man pages for full documentation.
See http://rsync.samba.org/ for updates, bug reports, and answers
rsync error: syntax or usage error (code 1) at options.c(2301) [client=3.1.2]
errors during rsync command (see '/tmp/msrsync-nhl2uc/0000/0000/tmp6jCVFu.log' rsync log file):
/usr/bin/rsync -aS --numeric-ids --quiet --verbose --stats --from0 --files-from=/tmp/msrsync-nhl2uc/0000/0000/tmp6jCVFu --log-file /tmp/msrsync-nhl2uc/0000/0000/tmp6jCVFu.log test /root/test 2/

msrsync error: somes files/attr were not transferred (see previous errors)

[root@media ~]# cat /tmp/msrsync-nhl2uc/0000/0000/tmp6jCVFu.log
cat: /tmp/msrsync-nhl2uc/0000/0000/tmp6jCVFu.log: No such file or directory

And with space in src path:

[root@media ~]# msrsync -p2 "/root/test 2/" "/root/test/"
rsync  version 3.1.2  protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

rsync is a file transfer program capable of efficient remote update
via a fast differencing algorithm.

Usage: rsync [OPTION]... SRC [SRC]... DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST
  or   rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST
  or   rsync [OPTION]... [USER@]HOST:SRC [DEST]
  or   rsync [OPTION]... [USER@]HOST::SRC [DEST]
  or   rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
The ':' usages connect via remote shell, while '::' & 'rsync://' usages connect
to an rsync daemon, and require SRC or DEST to start with a module name.

Options
 -v, --verbose               increase verbosity
     --info=FLAGS            fine-grained informational verbosity
     --debug=FLAGS           fine-grained debug verbosity
[...]
 -4, --ipv4                  prefer IPv4
 -6, --ipv6                  prefer IPv6
     --version               print version number
(-h) --help                  show this help (-h is --help only if used alone)

Use "rsync --daemon --help" to see the daemon-mode command-line options.
Please see the rsync(1) and rsyncd.conf(5) man pages for full documentation.
See http://rsync.samba.org/ for updates, bug reports, and answers
rsync error: syntax or usage error (code 1) at options.c(2301) [client=3.1.2]
errors during rsync command (see '/tmp/msrsync-4aGcsq/0000/0000/tmpePSZwr.log' rsync log file):
/usr/bin/rsync -aS --numeric-ids --quiet --verbose --stats --from0 --files-from=/tmp/msrsync-4aGcsq/0000/0000/tmpePSZwr --log-file /tmp/msrsync-4aGcsq/0000/0000/tmpePSZwr.log /root/test 2 /root/test/

msrsync error: somes files/attr were not transferred (see previous errors)
[root@media ~]#
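
The logged command lines above show the path with the space unquoted, which suggests the command is assembled as one string and split on whitespace somewhere along the way. That is an assumption based on the output, not a confirmed diagnosis. The sketch below shows the general effect and the usual alternative of keeping one list element per argument.

```python
import shlex

src, dst = "test", "/root/test 2/"

# Formatting everything into one string and splitting it loses the quoting:
as_string = "rsync -aS --numeric-ids %s %s" % (src, dst)
split_args = shlex.split(as_string)
print(split_args)  # the destination ends up as two separate arguments

# Keeping one list element per argument preserves the space.
as_list = ["rsync", "-aS", "--numeric-ids", src, dst]
print(as_list)
```

A list like as_list can be passed directly to subprocess.Popen without shell=True, so no quoting is needed at all.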

installation is impossible on Debian 12

I need a version that does not require Python 2.X but can automatically work with Python 3.X, as Debian 12 makes it impossible to install Python 2.7.18, the latest. Is this possible? If not, can you recommend a similar but more up-to-date tool?

--delete flag

Can you explain why a --delete flag would be catastrophic with msrsync? Ideally I'd like to be able to use msrsync to create a backup from one enterprise NAS to another, but without the delete flag, that's not really feasible.

On centos 8 stream it fails

This fails
msrsync -p32 --rsync "-avzh --numeric-ids" /var/lib/mysql/* .
Source '/var/lib/mysql/aria_log.00000001' is not a directory

but this works
rsync -avzh --numeric-ids /var/lib/mysql/* .
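
A possible reading of this report: the shell expands /var/lib/mysql/* before msrsync starts, so msrsync receives plain files as sources, while its usage line asks for SRCDIR arguments. The sketch below separates directories from other paths in such an expanded list; split_sources is a hypothetical helper, not part of msrsync.

```python
import os


def split_sources(paths):
    """Separate directory sources from everything else."""
    dirs = [p for p in paths if os.path.isdir(p)]
    others = [p for p in paths if not os.path.isdir(p)]
    return dirs, others
```

Passing the parent directory itself (/var/lib/mysql/ instead of /var/lib/mysql/*) avoids the expansion altogether.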

Python 3 support

Please make it compatible with Python3

sudo msrsync -p 8 -xa --progress --exclude /nfs / /nfs/k3s-wrkr02
/usr/bin/env: ‘python2’: No such file or directory

-r option not parsed?

Unless I'm greatly mistaken, the -r flag is not being processed into --rsync. For example, I had -r '-W --delete --inplace' in my command, but it did not throw an exception on the --delete, and looking at the destination, it was definitely using temp files rather than using --inplace.

Randomize the buckets?

Hi, I am syncing two file servers. The source file server is unraid, which is a JBOD consumer-type file server. The destination, where I'm running this script, is freenas. I've NFS-mounted the unraid share locally. Ideally I would like to spread the transfer across all JBOD drives, but the file positions are kind of sequential (that's the way unraid populates the drives), so if this script could randomize the buckets, or perhaps have threads start at offset positions, it could spread the I/O. Just a thought. Thanks.

file deletion

Hi,
I've tried adding the --delete option to the rsync params to enable a synchronization scenario and it doesn't work.

Is this the desired behaviour?

Keep Option

Can we get a little more clarification of the keep option? Does msrsync do a mirrored sync without that option specified? I'm a little unclear about exactly what keep does. By default rsync doesn't delete anything, so I'm wondering if that's the same approach msrsync is taking.

delete option

I ran msrsync to try to replace an existing rsync command and got the error message

error: Cannot use --delete option type with msrsync. It would lead to disaster :)

While I find the error message to be acceptable (in fact, this is the issue I'm trying to Google for solving in parallel rsyncs), it would be nice if the README of this repo discussed this limitation up front. It would have saved me from the time spent downloading and testing it.

-x option is not parsed?

[root@gw2 sbin]# msrsync -p 10 -f 100 -s 32M -P -d --stats -r "-axPHAX --inplace --numeric-ids --exclude=*\.cache*" / /srv/gw2/os-backup/
[106906/106906 entries] [128.0 T/128.0 T transferred] [18738 entries/s] [22.4 T/s bw] [monq 0] [jq 0]Uncaught exception:
Traceback (most recent call last):
  File "/usr/local/sbin/msrsync", line 1093, in msrsync
    for bucket_files_nr, bucket_size, bucket in buckets(src, options.files, options.s):
  File "/usr/local/sbin/msrsync", line 563, in buckets
    for fsize, rpath in crawl(path, relative=True):
  File "/usr/local/sbin/msrsync", line 539, in crawl
    size = os.lstat(fullpath).st_size
OSError: [Errno 2] No such file or directory: '/proc/10809/task/10809/fd/3'

Process Process-11:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process Process-8:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process Process-9:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
Process Process-12:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 869, in rsync_monitor_worker
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
    for result in consume_queue(monitor_queue):
  File "/usr/local/sbin/msrsync", line 674, in consume_queue
    item = jobs_queue.get()
  File "<string>", line 2, in get
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
EOFError
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
Process Process-5:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
Process Process-6:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe

Process Process-13:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 942, in messages_worker
    for result in consume_queue(G_MESSAGES_QUEUE):
  File "/usr/local/sbin/msrsync", line 674, in consume_queue
    item = jobs_queue.get()
  File "<string>", line 2, in get
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
EOFError
Process Process-7:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
Process Process-10:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/sbin/msrsync", line 831, in rsync_worker
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe
    jobs_queue.put(StopIteration)
  File "<string>", line 2, in put
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe

/proc should not be there.

using rsync exclude

Is it possible to use excludes to avoid syncing certain subfolders as part of the -r arguments?

SSH

Is there a way to pass an ssh source to the script?

msrsync --rsync "-avzp --remove-source-files -e ssh" [email protected]:/mnt/Data/Media/Movies /mnt/pools/A/A0/Data/Media

What are "monq" and "jq" in the output?

Hi,

Nice utility! I like the ease of it, and I think I prefer it over fpsync.

I may be missing it, but what are the monq and jq in the progress line? It's not clear in the documentation (for me).

SSH?

Is there a way to run this over SSH?
