martymac / fpart
Sort files and pack them into partitions
Home Page: https://www.fpart.org/
License: BSD 2-Clause "Simplified" License
Hi Ganael,
A customer of ours has asked if there is any way to get a log from fpart/fpsync that will show what files were transferred and their sizes. Is that possible?
Thank you, sir.
-T
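Not a built-in fpsync feature as far as I know, but one possible approach is to pass rsync's standard --out-format option through fpsync's -o, so that each worker prints the size and name of every file it transfers, and then post-process that output. A sketch (the paths and the exact option string are assumptions, not a documented recipe):

```shell
# The real run would look something like this (requires fpsync; shown
# as a comment since the option string is an assumption):
#   fpsync -n 4 -o "-lptgoD --out-format=%l %n" /src/dir/ /dst/dir/ 2>&1 | tee transfer.log
# where %l is the transferred file's length and %n its name (standard
# rsync escapes). Post-processing such "size name" lines is then trivial;
# demonstrated here on two made-up sample lines using a tab separator:
cd "$(mktemp -d)"
printf '1024\tdirA/file1\n2048\tdirA/file2\n' > transfer.log
awk -F'\t' '{ total += $1 } END { print total " bytes in " NR " file(s)" }' transfer.log
```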
Hi Ganael,
I was debugging some pfp code and was seeing some oddities in what was showing up on the remote end.
I'm using the latest fpart executed as a system() command inside of Perl.
(fpart v1.2.1
Copyright (c) 2011-2021 Ganael LAPLANCHE [email protected]
WWW: http://contribs.martymac.org
Build options: debug=no, fts=system)
but this goes back to at least version 1.00.
The command line is a bit more complex than usual, using the -L -W flags to wait for the file write to be completed.
fpart -v -L -W 'mv $FPART_PARTFILENAME /home/hjm/pfp/fpcache' -z -s 10485760 -o \
/home/hjm/pfp/fpcache/hold/f '/home/hjm/nacs/zeba' 2> \
/home/hjm/pfp/fpcache/fpart.log.15.12.20_2021-04-07 & echo "${!}" > \
/home/hjm/pfp/fpcache/FP_PIDFILE15.12.20_2021-04-07
but that shouldn't (?) make a difference in the output.
The first partition file (f.0) lists files relative to the target '/home/hjm/nacs', but the rest are fully qualified. The first 2 lines of several of the partition files are shown below; after the first one, they are all fully qualified.
$ head -2 f.*
==> f.0 <==
zeba/pct_hpc/dipimage
zeba/pct_hpc/diplib
==> f.1 <==
/home/hjm/nacs/zeba/pct_hpc/lib.Linuxa64/libdipio.so.original
/home/hjm/nacs/zeba/pct_hpc/lib.Linuxa64/libdml_mlv7_6.a
==> f.2 <==
/home/hjm/nacs/zeba/pct_hpc/standalone/image2pce.prj
/home/hjm/nacs/zeba/pct_hpc/standalone/image2pce.sh
==> f.3 <==
/home/hjm/nacs/zeba/pct_hpc/standalone/image2pce_mcr/lib.Linuxa64/libdml_mlv7_6.so
/home/hjm/nacs/zeba/pct_hpc/standalone/image2pce_mcr/.matlab/creation.timestamp
==> f.4 <==
/home/hjm/nacs/zeba/pct_hpc/old_lib.Linuxa64/lib.Linuxa64/libdipio.so
/home/hjm/nacs/zeba/pct_hpc/old_lib.Linuxa64/lib.Linuxa64/libdml_mlv7_6.so
This happens with other dirs as well:
$ head -2 f.*
==> f.0 <==
penner/xlab final files to run experiment/xlab install instructions.doc
penner/xlab_final_files_to_run_experiment/computer number sheet.xls
==> f.1 <==
/home/hjm/nacs/penner/dev_to_apache2.notes
/home/hjm/nacs/penner/fpart.done
==> f.2 <==
/home/hjm/nacs/penner/Django-1.0.2-final/docs/ref/models/index.txt
/home/hjm/nacs/penner/Django-1.0.2-final/docs/ref/models/instances.txt
HOWEVER, it does not happen with a more direct command, omitting the -W flag.
fpart -v -L -z -s 2485760 -o \
> /home/hjm/pfp/fpcache/f '/home/hjm/nacs/penner'
on the same dirs as above, so it seems to be related to the -W flag.
So far it seems to be only the 1st partition file that's affected, and all the lines in that file are affected in the same way.
Any ideas?
Hi Ganael,
I'm starting FPSync like this (notice I'm asking for 8 threads):
/bin/sh /usr/bin/fpsync -O -x .glusterfs -x .root -x .r00t -n 8 -vvv /images_ebs /images
But when I do ps -ef | grep fpsync, I see 24 rsync processes.
Is this expected behavior? What does the -n 8 really signify?
Thanks,
-Tennis
I have a folder of numbered images. I would like to sort them using a mathematical function; is that possible with this?
Hi,
I'm getting the following errors running fpsync
| 2020-07-21T04:46:39.467-05:00 | grep: write error
| 2020-07-21T04:46:39.467-05:00 | ls: write error: Broken pipe
| 2020-07-21T04:46:44.790-05:00 | => [QMGR] Starting job /tmp/fpsync/work/1595253731-3171/1989 (local)
| 2020-07-21T04:46:58.236-05:00 | <= [QMGR] Job 32760:local finished
| 2020-07-21T04:46:58.236-05:00 | grep: write error
| 2020-07-21T04:46:58.236-05:00 | ls: write error: Broken pipe
| 2020-07-21T04:47:03.790-05:00 | => [QMGR] Starting job /tmp/fpsync/work/1595253731-3171/1990 (local)
| 2020-07-21T04:47:29.255-05:00 | <= [QMGR] Job 14185:local finished
| 2020-07-21T04:47:29.255-05:00 | grep: write error
Background:
/usr/bin/fpsync -O -X /ebs/mvprd/.glusterfs -X ./.root -n 64 -vvv /ebs/ /efs/ef-wc-mhiv-l-dev1/vault_27MAY/
[ec2-user@ip-10-64-10-215 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.7G 76K 3.7G 1% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
/dev/xvda1 7.9G 2.7G 5.2G 34% /
10.64.128.181:/ 8.0E 5.5T 8.0E 1% /efs
/dev/xvdb1 3.0T 3.0T 5.8G 100% /ebs
[ec2-user@ip-10-64-10-215 ~]$ sudo du -sh /tmp
1.3G /tmp
Have you thought about a possible way to implement a percentage-complete indicator or an estimated-completion progress bar?
Is there a way to throttle the transfer rate for fpsync to limit it to 1.25 Gbps? Running 16 jobs using the following syntax limits the rate to 1.25 Gbps in a test dataset comprising 10 MB files:
fpsync -n 16 -s 37772160 /srcdir /targetdir
However, 16 jobs x 36 MiB (37772160 bytes) should transfer ~576 MiB/s (which is ~4.12 Gbps). How is fpsync restricting this to 1.25 Gbps?
Another test with a different dataset shows that the same command drives a ~6 Gbps transfer rate.
Any insights into how fpsync determines the transfer rate, and whether there is a better way to calculate the job size to throttle the overall transfer rate?
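One thing worth keeping in mind (an observation, not fpart documentation): the part size only bounds how much data each job handles, not how fast it moves, so the observed rate depends on file sizes and per-connection behavior. If a hard ceiling is the goal, rsync's own --bwlimit option, passed through fpsync's -o, caps each job and makes the aggregate predictable. A sketch with hypothetical numbers:

```shell
# Each of the 16 rsync jobs capped at 10 MB/s gives a predictable
# aggregate ceiling of jobs x per-job limit:
jobs=16
limit=10                                  # MB/s per rsync job
echo "aggregate ceiling: $(( jobs * limit )) MB/s"

# Applied to fpsync (requires fpsync/rsync; shown as a comment):
#   fpsync -n 16 -s 37772160 -o "-lptgoD --bwlimit=10m" /srcdir/ /targetdir/
```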
Sorry for writing a "support" question in an issue tracker, but I'm trying to speed up a remote backup that is currently run with the following command.
rsync -avz -e "ssh -i /path/to/backup.key -p 1234" --delete /mnt/data/ archive@$DESTINATION::location/data/
The archive user can only run the rsync command; e.g., the authorized_keys file looks like this:
command="rsync --config=/srv/archive/rsyncd.conf --server --daemon . /srv/archive/data",no-agent-forwarding,no-port-forwarding,no-user-rc,no-X11-forwarding,no-pty ssh-rsa .....
and then the rsyncd.conf has named sections, one for each server.
In the manual:
fpart -s 4724464025 -o music-parts /path/to/music ./*.mp3
Produce partitions of 4.4 GB, containing music files from /path/to/music as well as
MP3 files from current directory; with such a partition size, each partition content
will be ready to be burnt to a DVD. Files music-parts.0 to music-parts.n, are
generated as output.
However, it does not specify compatible tools for burning to a DVD or writing a burnable ISO image.
I tried mkisofs with -path-list, but it does not preserve the directory tree, so it fails when two or more files with the same filename are present in different directories.
I think you had some fpart-compatible tools in mind when you wrote that manual entry.
Would you mind recommending some of them? Thanks!
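One approach that may help here (an assumption, not something the fpart manual prescribes): both mkisofs and xorrisofs accept target=source pathspecs when -graft-points is given, and a partition file of absolute paths can be rewritten into that form so the directory tree is preserved. A sketch with hypothetical paths (BASE and the filenames are illustrative):

```shell
# Rewrite each absolute path "/BASE/rel/path" into the graft-point form
# "/rel/path=/BASE/rel/path" so the directory tree is preserved on the ISO.
cd "$(mktemp -d)"
BASE=/path/to/music
printf '%s\n' "$BASE/a/song1.mp3" "$BASE/b/song2.mp3" > music-parts.0   # demo input
sed "s|^$BASE/\(.*\)$|/\1=$BASE/\1|" music-parts.0 > graft.list
cat graft.list

# Then (requires xorrisofs or mkisofs; shown as a comment):
#   xorrisofs -o part0.iso -graft-points -path-list graft.list
```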
I've tried fpart with 3.4GB of data and I'm getting a 0 length file for the first partition:
$ fpart -s 26084354560 -o dev-parts /Users/ah/Documents/dev
Part #0: size = 0, 0 file(s)
Part #1: size = 26084354560, 115597 file(s)
Part #2: size = 26084354560, 55717 file(s)
Part #3: size = 26084354560, 88715 file(s)
Part #4: size = 3924425658, 13859 file(s)
$ ls -l
total 99472
-rw-r----- 1 alex.hunsley staff 0 Aug 19 16:23 dev-parts.0
-rw-r----- 1 alex.hunsley staff 14548982 Aug 19 16:23 dev-parts.1
-rw-r----- 1 alex.hunsley staff 12170472 Aug 19 16:23 dev-parts.2
-rw-r----- 1 alex.hunsley staff 20652071 Aug 19 16:23 dev-parts.3
-rw-r----- 1 alex.hunsley staff 1673407 Aug 19 16:23 dev-parts.4
Is that expected?
Hi Ganael,
The current output of the new '-S' option (skip files larger than partition size) is a bit noisy, i.e.:
S (1902116864): /home/hjm/Downloads/isos/lmde-4-cinnamon-32bit.iso
Could this be simplified to:
1902116864<tab>/home/hjm/Downloads/isos/lmde-4-cinnamon-32bit.iso
The '-v' output goes to STDERR, so it can be filtered out, and that leaves a nicely segregated file:
$ ~/bin/fpart-1.4.1 -vvv -L -S -s 200m \
-o ~/fpart/Fs/f ~/Downloads > chunk.exceptions
(lots of verbose STDERR output)
$ head -5 chunk.exceptions
S (327155712): /home/hjm/Downloads/RT-contents.ibd.most-of-rt-database.data
S (492830720): /home/hjm/Downloads/isos/debian-11.1.0-i386-netinst.iso
S (2009333760): /home/hjm/Downloads/isos/Fedora-Workstation-Live-x86_64-35-1.2.iso
S (3204448256): /home/hjm/Downloads/isos/kubuntu-20.04.3-desktop-amd64.iso
So rather than performing a couple of complex regex splits (OK, not very complex), all you need is a simple split on whitespace (usually the default in many languages).
The current format of the partition files is simply a list of fully qualified filename paths with no prefixes, so the simpler format suggested above is similar (just prefixed with the byte size of the file).
Thanks
Harry
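In the meantime, the current lines can be converted mechanically to the suggested size<tab>path form. A sketch using GNU sed (it assumes paths never contain '): ', which holds for the samples above):

```shell
# Turn "S (327155712): /path/file" into "327155712<TAB>/path/file".
# Assumes GNU sed (for \t in the replacement).
cd "$(mktemp -d)"
printf 'S (327155712): /home/hjm/Downloads/isos/lmde-4-cinnamon-32bit.iso\n' > chunk.exceptions  # demo line
sed -E 's/^S \(([0-9]+)\): /\1\t/' chunk.exceptions
```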
Hi!
I'm currently using rsync to transfer multiple TB per day in single files. They are each around 100GB, and the problem is that the destination only gets its full speed when using multiple connections (like what the application axel does for the web), so I found this project, which seems pretty interesting.
Is it somehow possible to sync single files over the day?
I'm trying to use fpsync in rsync mode to copy a MongoDB data dir that's several terabytes from one system to another. I want to use 4 parallel jobs, since we have found that is optimal. But I also want to use 4 file parts, so that all rsync processes get kicked off at the start of the sync. The reason for this is that the overall sync takes over 3 hours, but our 2FA ssh sessions between the 2 systems have a 1 hour TTL. We don't want new rsync calls to be happening throughout the sync, otherwise the operator will have to babysit it the entire time and re-enter their 2FA code when prompted.
This is what I tried:
fpsync -v -n 4 -o "-e \\\"ssh -q -o BatchMode=yes -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR\\\"" -O "-n|4" /data/mongodb/ t1m1r1:/data/mongodb/
but it fails with:
Option -n is incompatible with options -f, -s and -L.
Usage: fpart [OPTIONS] -n num | -f files | -s size [FILE or DIR...]
I messed around with modifying fpsync to not pass -f/-s/-L to fpart if -n was in the specified fpart options, but fpart/fpsync failed mysteriously.
Is there a way to make this work?
Thanks!
Hello,
I'm using fpart/fpsync for a large dataset migration, and I find it a very efficient tool.
To optimize the final delta migration I'd like to run several fpsync instances in parallel, and I'm trying to use GNU parallel, but I'm stuck on a problem: it seems that fpsync goes into the stopped state when it creates the first rsync child process.
Even running fpsync from the shell with "&", after a few seconds the process gets STOPPED.
If I run several concurrent fpsync from tmux there are no problems at all, but using GNU parallel would be the best way to achieve my goal.
I'm running latest fpart version 1.5.1 compiled from source on Ubuntu 20.04.
Thanks in advance for any hint.
Best regards
Luca
I am trying to sync files from a slow Windows NFS server. rsync was slow, but fpsync is even slower - even just collecting the list of files to be copied.
Using strace, I can see that as well as waiting for getdents, every file is being stat()'d. To demonstrate, here is a simple reproducer:
strace -f fpart -f 100 -L /etc 2>&1 | grep stat
But if you are using fpart -f without -s, I don't think stat() needs to be called at all.
There is a secondary issue: if you run fpsync with -f but without -s, fpsync still passes -s 4294967296 to fpart. But since fpsync is a shell script, that's easily hacked out.
When estimating the time remaining, fpsync appears to only consider whether the jobs were submitted, not how much longer the remaining jobs will take. Here's an example:
1687462151 <=== [QMGR] Done submitting jobs. Waiting for them to finish.
1687462190 <= [QMGR] Job 23067:248:local finished
1687462199 <= [QMGR] Job 27451:280:local finished
1687462200 <=== Parts done: 281/281 (100%), remaining: 0
1687462200 <=== Time elapsed: 1025s, remaining: ~0s (~3s/job)
1687462281 <=== Parts done: 281/281 (100%), remaining: 0
1687462281 <=== Time elapsed: 1106s, remaining: ~0s (~3s/job)
# 40 minutes pass, but status does not change at all aside from Time elapsed increasing
# during this time the final part is being processed. this part is 10x bigger than any other part due to containing a single huge file
1687464541 <= [QMGR] Job 5504:1:local finished
1687464541 <=== [QMGR] Queue processed
1687464541 <=== Parts done: 281/281 (100%), remaining: 0
1687464541 <=== Time elapsed: 3366s, remaining: ~0s (~11s/job)
1687464541 <=== Fpsync completed without error in 3366s.
1687464541 <=== End time: Thu Jun 22 20:09:01 UTC 2023
I'm intending to use fpsync, or fpart with a wrapper, to transfer a large number of small files (~9 TB) that have some directories with a fair number of hard links (think Linux repositories) to save on disk space. I tried this over the past weekend with fpsync, passing the -o "-lptgoDH" argument to it, and it ran really well with high throughput across multiple worker nodes vs vanilla rsync. However, I noted that the size on disk quickly exceeded 10TB after letting it run for an extended duration. A follow-up vanilla rsync (-avH) wound up clearing up the "duplicate" files, restoring any hard links in the process. I think the individual rsyncs are respecting the "-H" flag, but only for the data they are tasked to replicate.
I may need to do this again down the road so I'd like to leverage fpsync or fpart while still accounting for hard links across the filesystem.
Hello,
I am having issues excluding files or directories with fpsync on macOS, downloaded and installed using brew install fpart on macOS 13.6.1.
For example, I have a folder that looks like this:
I have tried the syntax mentioned in this issue #17 to try and exclude, for example, any folders/files that mention 'pasta' using
fpsync -n 4 -o "--progress" -O '-X /pasta' ~/Test /dest/path/
fpsync -n 4 -o "--progress" -O '-X ./pasta' ~/Test /dest/path/
fpsync -n 4 -o "--progress" -O '-X *pasta*' ~/Test /dest/path/
fpsync -n 4 -o "--progress" -O "-X '*pasta*'"~/Test /dest/path/
I have tried just about every variation I could think of: double quotes, single quotes, quoting the pattern within the portion that feeds options to fpart, using -x instead of -X with all those variations, trying the ./ approach to get it to recognize a path, and different wildcard characters, as it was stated those were supported as well. I have also tried excluding via -o (feeding --exclude='whatever' to rsync), but I get an rsync 'action not supported' error in the logs trying that. I confirmed that when using rsync itself, excludes work fine. Any help/guidance I could get would be appreciated.
Command:
fpsync -n 5 -f 1000 -o "-azh --info=progress2 --exclude 'lost+found'" /data2/ $USER@$REMOTE:/path/to/data2/
Error:
Please supply an absolute path for both src_dir/ and dst_dir/
Dear author! I want to sync a local directory to a remote directory via ssh using fpsync, my command is as follows:
fpsync -vv -n 8 -f 128 -t /data/public/alphafold2-2.3.1/tmp -o "-zlptgoD -v --numeric-ids -e \"ssh -i id_ecdsa -p 22 scx6001@ [email protected] \"" /data/public/alphafold2-2.3.1/ /home/bingxing2/public/alphafold2.3.2
The key file here is: id_ecdsa
Account name: scx6001@BSCC-N32-H
Domain name: ssh.cn-zhongwei-1.paracloud.com
source dir: /data/public/alphafold2-2.3.1/
destination dir: /home/bingxing2/public/alphafold2.3.2
After execution, the following error will appear:
Cannot create destination directory: /home/bingxing2/public/alphafold2.3.2
How can I solve this problem? Looking forward to your reply.
Hi, thanks for maintaining such a great tool.
I was wondering if it is possible to support remote url for the source directory?
e.g.
fpsync user@host:data/ /mnt/data/
I noticed that a remote url can be used in the dst_dir but not in the src_dir.
Since there are cases in which a remote server cannot directly access a local computer due to NAT or other firewall settings, this can make fpsync somewhat impractical to use when I want to pull data from remote servers.
In rsync, I believe remote urls are supported in both src_dir and dst_dir, as long as rsync is installed on both machines. Would it also be possible for fpsync?
e.g.
rsync user@host:data/ /mnt/data/
Hi,
I'm using macOS, and I downloaded fpart from the link "http://moo.nac.uci.edu/~hjm/parsync/utils/fpart".
After that, I ran the following commands:
$ chmod +x fpart
$ ./fpart
I get the following error:
sh: /usr/local/bin/fpart: cannot execute binary file
Even when I try to run it manually, I get the same thing.
It seems that fpart isn't supported on macOS (even though, according to https://github.com/martymac/fpart, it should work:
Fpart is primarily developed on FreeBSD.
It has been successfully tested on :
...
Mac OS X (10.6, 10.8)
)
What am I missing?
I'm trying to use sshpass for "reasons". This works out of the box:
SSHPASS=somepassword sshpass -e rsync -av src/ user@server:dst/
But when using fpsync it gets stuck in the ssh phase
SSHPASS=somepassword sshpass -e fpsync -n 4 -f 3 src/ server:dst/
Parts are generated successfully, 4 rsync processes are started and subsequently 4 ssh processes, but all of them are stuck in the T state (my guess is that the sshpass magic somehow stops operating and ssh is waiting for user input).
Is there any way to use fpsync inside sshpass?
Hi Ganael
Is there any way to fix job control warnings when running under cron where there is no tty? My output is filled with errors like these:
/usr/local/bin/fpsync: 1531: set: can't access tty; job control turned off
one for each partition, when set -m is called. fpsync still seems to work, but the output is hard to see amidst the error spam.
Thanks
James
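A possible workaround (a sketch, not an fpsync feature): filter the notice out of stderr while keeping other errors visible.

```shell
# Drop only the job-control notices, keep all other stderr lines.
filter_jc() { grep -v "job control turned off"; }

# Under cron, the filter would wrap fpsync's stderr (bash process
# substitution; shown as a comment):
#   fpsync ... 2> >(filter_jc >&2)

# Demonstration on sample stderr lines:
printf 'some real error\n/usr/local/bin/fpsync: 1531: set: cannot access tty; job control turned off\n' | filter_jc
```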
Trying to build fpart on RHEL6 and having issues:
Greetings!
I'm using parsyncfp, which uses fpart under the hood. The idea is to copy a (pretty huge) /home dir onto /mnt/new_home (which is on a different device, formatted with notably more inodes). There are (afaik) about ~200M files to enumerate, most of them quite small.
fpart -v -L -z -s 10485760 -o /mnt/new_home/zor/fpcache/f //home
In the process of troubleshooting/diagnosing what I deem pretty slow file enumeration (that runs contrary to everything I could read about fpart), I ran into a discrepancy between what fpart log says about my chunks (about 50K files per chunk) and the actual content of the chunk file (less than 1K files per chunk).
At this point I believe this to be some fundamental misunderstanding on my part about fpart and probably not a bug. Would you be kind enough to clarify what is going on?
[root:/mnt/new_home/zor/fpcache] [base] # tail fpart.log.00.57.52_2019-08-10
Filled part #162: size = 10487077, 46953 file(s)
Filled part #163: size = 10486100, 44556 file(s)
Filled part #164: size = 10485854, 46049 file(s)
Filled part #165: size = 10485967, 46284 file(s)
Filled part #166: size = 10487488, 46771 file(s)
Filled part #167: size = 10485843, 46619 file(s)
Filled part #168: size = 10486423, 48616 file(s)
Filled part #169: size = 10486659, 46067 file(s)
Filled part #170: size = 10485845, 44444 file(s)
Filled part #171: size = 10485861, 44997 file(s)
[root:/mnt/new_home/zor/fpcache] [base] # ls -l f.* | tr -s ' ' | sed 's/f\.//' | awk '// { print $9 " " $8 }' | sort -n | tail | awk '// { "wc -l < f." $1 | getline linecount; print "f." $1 ", " linecount " lines, written at " $2 } '
f.163, 1547 lines, written at 04:49
f.164, 638 lines, written at 04:52
f.165, 983 lines, written at 04:55
f.166, 937 lines, written at 04:58
f.167, 1062 lines, written at 05:01
f.168, 456 lines, written at 05:04
f.169, 715 lines, written at 05:07
f.170, 704 lines, written at 05:11
f.171, 695 lines, written at 05:14
f.172, 621 lines, written at 05:18
It looks like there have been several changes since the last minor release and I see that some commits were made to the RPM to mark it as "1.2.0" even though there hasn't been an equivalent git tag added. I'd like to submit some pull requests to the RPM spec file to reflect my own changes and then push new releases to the Fedora/EPEL builds I maintain but first I need a new release on here to be tagged.
If I generate partition files with the name 'part', I end up with files like this:
part.0
part.1
part.2
...
part.10
part.11
...
part.100
part.101
These filenames aren't 0-padded, so they don't sort into a logical order:
part.0
part.1
part.10
part.11
part.12
part.13
part.14
Could we have an option for generating 0-padded filenames please? i.e. part.000, part.001, etc.
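Until such an option exists, a one-shot rename can pad the suffixes after the fact. A sketch (run it once on unpadded names, since printf's %d would treat an already-padded leading zero as octal):

```shell
cd "$(mktemp -d)"                        # demo area
touch part.0 part.1 part.10 part.100     # sample unpadded partition files

for f in part.*; do
    new="$(printf 'part.%03d' "${f##*.}")"   # zero-pad the numeric suffix
    [ "$f" = "$new" ] || mv -- "$f" "$new"
done
ls -1     # part.000 part.001 part.010 part.100, one per line
```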
Hi,
I'm using fpsync to transfer >1000 files from A to B via ssh.
Some of the files are >1TB, in fact the file size could be up to 10TB.
Question:
How can I identify which file(s) are currently processed in order to verify the process?
THX
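One way to approximate this, based on the mechanics visible elsewhere in this thread rather than any documented fpsync feature: each running rsync carries a --files-from=/tmp/fpsync/parts/<job>/part.N argument (visible with ps), and those part files are NUL-separated because fpsync passes --from0. A sketch on a stand-in part file:

```shell
# Find the part file each worker is processing (shown as a comment;
# requires a running fpsync):
#   ps -ef | grep '[r]sync.*--files-from'
# Each match shows --files-from=/tmp/fpsync/parts/<job>/part.N.

# Part files are NUL-separated (fpsync passes --from0 to rsync), so a
# part's file list can be printed with tr; demo on a stand-in file:
cd "$(mktemp -d)"
printf 'dir/a\0dir/b\0' > part.demo
tr '\0' '\n' < part.demo
```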
Hi Ganael,
Sorry to bother you again.
We are passing this invocation to FPSync (below). The content after -O and before -n 32 is quoted on the command line. Notice the "Tool options:" line doesn't show any exclusions.
| 2020-07-27T11:39:07.755-05:00 | /usr/bin/fpsync -O -X /ebs/geisprd/.glusterfs -X ./.root -n 32 -f 500000 -vvvv /ebs/geisprd/ /efs/nv-wc-geis-prd/vault11/
| 2020-07-27T11:39:07.755-05:00 | 1595867947 =====> [9389] Syncing /ebs/geisprd/ => /efs/nv-wc-geis-prd/vault11/
| 2020-07-27T11:39:07.755-05:00 | 1595867947 ===> Job name: 1595867947-9389
| 2020-07-27T11:39:07.755-05:00 | 1595867947 ===> Start time: Mon Jul 27 16:39:07 UTC 2020
| 2020-07-27T11:39:07.755-05:00 | 1595867947 ===> Concurrent sync jobs: 32
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Workers: local
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Shared dir: /tmp/fpsync
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Temp dir: /tmp/fpsync
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Tool name: "rsync"
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Tool options: "-lptgoD -v --numeric-ids" <====
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Max files or directories per sync job: 500000
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Max bytes per sync job: 4294967296
I also confirmed there aren't any exclusions in the running jobs (via ps -ef):
root 1723 8811 0 Jul27 ? 00:00:00 /bin/sh /tmp/fpsync/work/1595854539-8731/3322
root 1725 1723 0 Jul27 ? 00:01:36 /usr/bin/rsync -lptgoD -v --numeric-ids -r --files-from=/tmp/fpsync/parts/1595854539-8731/part.3322 --from0 /ebs// /efs/nv-wc-xylem-prd/vault1//
root 1728 1725 0 Jul27 ? 00:00:24 /usr/bin/rsync -lptgoD -v --numeric-ids -r --files-from=/tmp/fpsync/parts/1595854539-8731/part.3322 --from0 /ebs// /efs/nv-wc-xylem-prd/vault1//
root 2292 31289 0 07:33 ? 00:00:07 /usr/bin/rsync -lptgoD -v --numeric-ids -r --files-from=/tmp/fpsync/parts/1595854539-8731/part.3805 --from0 /ebs// /efs/nv-wc-xylem-prd/vault1//
What am I doing wrong?
Thanks,
-Tennis
My system doesn't have mail, resulting in:
which: no mail in (/sbin:/bin:/usr/sbin:/usr/bin)
on every run. It would be good to only look for mail if it is going to be used, or to silence the message, as checks are also done later in the code :)
Is it possible to add multi-thread support to fpart, similar to the -n functionality of fpsync?
A little bit of context: I am dealing with a file system that has a large number of small files (1 TB across 7M+ files), and partitioning them with fpart takes about 63 minutes with the parameters I specified. Conversely, by running:
find /path/to/filesystem/ -mindepth 2 -maxdepth 2 -type d | parallel -j8 find {} -type f | split -dl 100000 - list
it takes 36 minutes to complete. Of course, the produced list is not sorted, but this is unimportant for my use case. Ideally, I would like to achieve the same results with fpart by leveraging more parallel jobs.
Can this be achieved?
I run the following command on debian 12 (same on another server):
fpsync -n 15 -o "-a" /Dashi/source/ [email protected]:/Mariana/source/
message:
Incompatible option(s) detected within toolopts (option -o)
fpart -V
fpart v1.5.1
Copyright (c) 2011-2022 Ganael LAPLANCHE [email protected]
WWW: http://contribs.martymac.org
Build options: debug=no, fts=system
rsync -V
rsync version 3.2.7 protocol version 31
Copyright (C) 1996-2022 by Andrew Tridgell, Wayne Davison, and others.
Web site: https://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, symlinks, symtimes, hardlinks, hardlink-specials,
hardlink-symlinks, IPv6, atimes, batchfiles, inplace, append, ACLs,
xattrs, optional secluded-args, iconv, prealloc, stop-at, no crtimes
Optimizations:
SIMD-roll, no asm-roll, openssl-crypto, no asm-MD5
Checksum list:
xxh128 xxh3 xxh64 (xxhash) md5 md4 sha1 none
Compress list:
zstd lz4 zlibx zlib none
Daemon auth list:
sha512 sha256 sha1 md5 md4
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
Is there any suggestion?
Thanks a lot.
This isn't a bug, but a seemingly confusing design.
In '-s' mode, a '0'-numbered partition file seems to always be created to hold files too big for the specified -s size, even if no files were too big. So if the 0-numbered partition file is empty, fpart managed to partition everything OK, and you have to ignore the partition 0 file.
Compare to '-n' mode (i.e. that number of partitions with data split as evenly as possible between them): partition file 0 is just a regular partition file containing file names. It being non-empty is NOT an error condition.
Am I understanding this correctly? Is there any way to make the 'overflow' partition file for '-s' mode not be the 0th partition file? E.g. a flag could be introduced that gives the overflow partition file a unique name.
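If the reading above is correct (an assumption, not confirmed behavior), a small post-run check can at least make the overflow bucket explicit:

```shell
cd "$(mktemp -d)"
: > part.0                               # stand-in for an empty overflow file

# Treat a non-empty part.0 as "some files exceeded -s"; an empty or
# missing part.0 means everything was packed normally.
if [ -s part.0 ]; then
    echo "oversized files present:"
    cat part.0
else
    echo "no oversized files; part.0 can be ignored"
fi
```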
Hi Ganael,
At the end of a copy, I notice you put how long (in seconds) the copy process took. That's really great.
Would it also be possible to log how much data was copied?
Thanks,
-Tennis
For example, suppose I run fpart -z against the following directory structure on disk, and I'm ignoring files called Thumbs.db:
directoryA/
(no files)
directoryB/
Thumbs.db
The partition files that result will have no mention of directoryB, only directoryA.
Shouldn't a folder containing only ignored files be considered an empty folder, rather than a non-existent folder? Otherwise the existence of ignored files actually changes the output of fpart -z, which doesn't seem to make sense.
If this is expected behaviour, any chance of adding a flag that means "regard directories with only ignored files as empty directories"?
I am trying to run fpsync -n ${threads} -v src/directory dest/directory on ~5GB of files. The command is executed in a bash script.
I see the output as:
1684539262 ===> Analyzing filesystem...
1684539265 <=== Fpart crawling finished
1684539266 <=== Parts done: 77/77 (100%), remaining: 0
1684539266 <=== Time elapsed: 4s, remaining: ~0s (~0s/job)
1684539266 <=== Fpsync completed without error in 4s.
While inspecting the target (destination) directory, it appears that only ~50MB is getting copied.
src/directory/sub-directory
src/directory/files
src/directory/sub-directory/files
Am I missing something over here?
Hi Ganael,
I'm using this invocation:
/usr/bin/fpsync -O '-x .glusterfs -x .root -x .r00t' -n 64 -f 50000 -vvv /ebs/geisprd/ /efs/nv-wc-geis-prd/vault11/
Both root and r00t are files; they are successfully excluded. But the hidden directory .glusterfs is not excluded: it is copied to the destination directory.
Is there something I'm doing wrong?
-Tennis
Is it possible to feed rclone's output (a file system listing) to fpart, so that it can be processed by fpart?
Would it be possible to create a Homebrew tap for the fpart package to facilitate deployment on macOS machines, or to allow that work to be done in a separate repo called homebrew-fpart so one can use the brew tap command? Thanks so much for your time!
I would like to split a directory "photos" into partitions, each of which should be 4.4GB, so that I can burn them to DVDs. The files inside it are organized in a directory hierarchy as follows:
......
./photos/2015-02-15-Meeting-With-Friends/
./photos/2015-02-15-Meeting-With-Friends/videos/
./photos/2015-02-18-Family-Trip/
.......
./photos/2020-04-26-Tim-Birthday/
./photos/2020-06-10-Anthony-Birthday/
./photos/2020-06-10-Anthony-Birthday/videos/
./photos/2020-06-10-Anthony-Birthday/memo/
......
How can I create partitions respecting the directory name, so that the partition content would be grouped by date according to the directory name?
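As far as I know fpart itself doesn't group by directory name, but one workaround is to run it once per dated directory, so a partition never mixes dates (at the cost of some wasted DVD space when a directory underfills a disc). A sketch with a hypothetical layout:

```shell
# Demo layout mirroring the question (hypothetical directory names):
cd "$(mktemp -d)"
mkdir -p photos/2015-02-18-Family-Trip photos/2020-04-26-Tim-Birthday

for d in photos/*/; do
    name=$(basename "$d")
    # Real invocation (requires fpart; shown as a comment):
    #   fpart -s 4724464025 -o "parts-$name" "$d"
    echo "would pack $d into parts-$name.*"
done
```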
Also, I am still unable to get mkisofs or xorrisofs to work with the partition files generated by fpart. Would you recommend some tips? Thanks!
Hi Ganael,
Is there any way to figure out how much data has been transferred while FPSync is running?
For example, if I start a transfer of a 3TB volume, how can I tell after 3 hours how much has actually been sent?
Possible?
Thank you,
-Tennis
Hello,
We are currently trying to use fpart on a mounted sshfs (part of libfuse) filesystem.
The filesystem contains about 11TB of data within over 10,000,000 directories across 15,000,000 files.
Running fpart produces parts until it hits a little over 1,000,000 files. After that, it seems that all the directories look empty or unreadable to fpart, as it just writes out the current level's directories, then one level up, and so on.
I further discovered that some directories in the middle of the run have their subdirectories listed as empty, which should have files in them.
As far as I can tell, it is inconsistent which directories this happens to.
The commandline used is:
fpart -f 1000000 -o debug_2.part -zz -vv <MOUNTPOINT>
This was tested with version 1.4.0 packaged in debian bullseye and version 1.4.1 built from the github repo.
Building with the option --enable-debug does not reveal anything more to me; the output just shows valid_file() iterating over the same files and directories (which later appear empty).
Running fpart on the original filesystem on the other server (ext4) produces complete partitions, as far as I can tell.
I'd be happy to do further debug work if you tell me what you need.
Cheers,
Valentin