martymac / fpart
Sort files and pack them into partitions
Home Page: https://www.fpart.org/
License: BSD 2-Clause "Simplified" License
Hi Ganael,
A customer of ours has asked if there is any way to get a log from fpart/fpsync that will show what files were transferred and their sizes. Is that possible?
Thank you, sir.
-T
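Not a built-in fpsync feature as far as I know, but one possible approach is to pass rsync's standard --out-format option through fpsync's -o, so that each worker prints the size and name of every file it transfers, and then post-process that output. A sketch (the paths and the exact option string are assumptions, not a documented recipe):

```shell
# The real run would look something like this (requires fpsync; shown
# as a comment since the option string is an assumption):
#   fpsync -n 4 -o "-lptgoD --out-format=%l %n" /src/dir/ /dst/dir/ 2>&1 | tee transfer.log
# where %l is the transferred file's length and %n its name (standard
# rsync escapes). Post-processing such "size name" lines is then trivial;
# demonstrated here on two made-up sample lines using a tab separator:
cd "$(mktemp -d)"
printf '1024\tdirA/file1\n2048\tdirA/file2\n' > transfer.log
awk -F'\t' '{ total += $1 } END { print total " bytes in " NR " file(s)" }' transfer.log
```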
Hi Ganael,
I was debugging some pfp code and was seeing some oddities in what was showing up on the remote end.
I'm using the latest fpart executed as a system() command inside of Perl.
(fpart v1.2.1
Copyright (c) 2011-2021 Ganael LAPLANCHE [email protected]
WWW: http://contribs.martymac.org
Build options: debug=no, fts=system)
but this goes back to at least version 1.00.
The command line is a bit more complex than usual, using the -L -W flags to wait for the file write to be completed.
fpart -v -L -W 'mv $FPART_PARTFILENAME /home/hjm/pfp/fpcache' -z -s 10485760 -o \
/home/hjm/pfp/fpcache/hold/f '/home/hjm/nacs/zeba' 2> \
/home/hjm/pfp/fpcache/fpart.log.15.12.20_2021-04-07 & echo "${!}" > \
/home/hjm/pfp/fpcache/FP_PIDFILE15.12.20_2021-04-07
but that shouldn't (?) make a difference in the output.
The first partition file (f.0) lists files relative to the target '/home/hjm/nacs', but the rest are fully qualified. The first 2 lines of several of the partition files are shown below; after the first one, they are all fully qualified.
$ head -2 f.*
==> f.0 <==
zeba/pct_hpc/dipimage
zeba/pct_hpc/diplib
==> f.1 <==
/home/hjm/nacs/zeba/pct_hpc/lib.Linuxa64/libdipio.so.original
/home/hjm/nacs/zeba/pct_hpc/lib.Linuxa64/libdml_mlv7_6.a
==> f.2 <==
/home/hjm/nacs/zeba/pct_hpc/standalone/image2pce.prj
/home/hjm/nacs/zeba/pct_hpc/standalone/image2pce.sh
==> f.3 <==
/home/hjm/nacs/zeba/pct_hpc/standalone/image2pce_mcr/lib.Linuxa64/libdml_mlv7_6.so
/home/hjm/nacs/zeba/pct_hpc/standalone/image2pce_mcr/.matlab/creation.timestamp
==> f.4 <==
/home/hjm/nacs/zeba/pct_hpc/old_lib.Linuxa64/lib.Linuxa64/libdipio.so
/home/hjm/nacs/zeba/pct_hpc/old_lib.Linuxa64/lib.Linuxa64/libdml_mlv7_6.so
This happens with other dirs as well:
$ head -2 f.*
==> f.0 <==
penner/xlab final files to run experiment/xlab install instructions.doc
penner/xlab_final_files_to_run_experiment/computer number sheet.xls
==> f.1 <==
/home/hjm/nacs/penner/dev_to_apache2.notes
/home/hjm/nacs/penner/fpart.done
==> f.2 <==
/home/hjm/nacs/penner/Django-1.0.2-final/docs/ref/models/index.txt
/home/hjm/nacs/penner/Django-1.0.2-final/docs/ref/models/instances.txt
HOWEVER, it does not happen with a more direct command, omitting the -W flag.
fpart -v -L -z -s 2485760 -o \
> /home/hjm/pfp/fpcache/f '/home/hjm/nacs/penner'
on the same dirs as above, so it seems to be related to the -W flag.
So far it seems to be only the 1st partition file that's affected, and all the lines in that file are affected in the same way.
Any ideas?
Hi Ganael,
I'm starting FPSync like this (notice I'm asking for 8 threads):
/bin/sh /usr/bin/fpsync -O -x .glusterfs -x .root -x .r00t -n 8 -vvv /images_ebs /images
But when I do ps -ef | grep fpsync, I see 24 rsync processes.
Is this expected behavior? What does the -n 8 really signify?
Thanks,
-Tennis
I have a folder of numbered images. I would like to sort them using a mathematical function; is that possible with this?
Hi,
I'm getting the following errors running fpsync
| 2020-07-21T04:46:39.467-05:00 | grep: write error
| 2020-07-21T04:46:39.467-05:00 | ls: write error: Broken pipe
| 2020-07-21T04:46:44.790-05:00 | => [QMGR] Starting job /tmp/fpsync/work/1595253731-3171/1989 (local)
| 2020-07-21T04:46:58.236-05:00 | <= [QMGR] Job 32760:local finished
| 2020-07-21T04:46:58.236-05:00 | grep: write error
| 2020-07-21T04:46:58.236-05:00 | ls: write error: Broken pipe
| 2020-07-21T04:47:03.790-05:00 | => [QMGR] Starting job /tmp/fpsync/work/1595253731-3171/1990 (local)
| 2020-07-21T04:47:29.255-05:00 | <= [QMGR] Job 14185:local finished
| 2020-07-21T04:47:29.255-05:00 | grep: write error
Background:
/usr/bin/fpsync -O -X /ebs/mvprd/.glusterfs -X ./.root -n 64 -vvv /ebs/ /efs/ef-wc-mhiv-l-dev1/vault_27MAY/
[ec2-user@ip-10-64-10-215 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.7G 76K 3.7G 1% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
/dev/xvda1 7.9G 2.7G 5.2G 34% /
10.64.128.181:/ 8.0E 5.5T 8.0E 1% /efs
/dev/xvdb1 3.0T 3.0T 5.8G 100% /ebs
[ec2-user@ip-10-64-10-215 ~]$ sudo du -sh /tmp
1.3G /tmp
Have you thought about a possible way to implement a percentage-complete indicator or an estimated-completion progress bar?
Is there a way to throttle the transfer rate for fpsync to limit it to 1.25 Gbps? Running 16 jobs using the following syntax limits the rate to 1.25 Gbps in a test dataset comprising 10 MB files:
fpsync -n 16 -s 37772160 /srcdir /targetdir
However, 16 jobs x 36 MiB (37772160 bytes) should transfer ~576 MiB/s (which is ~4.12 Gbps). How is fpsync restricting this to 1.25 Gbps?
Another test with a different dataset shows that the same command drives a ~6 Gbps transfer rate.
Any insights into how fpsync determines the transfer rate, and whether there is a better way to calculate the job size to throttle the overall transfer rate?
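One thing worth keeping in mind (an observation, not fpart documentation): the part size only bounds how much data each job handles, not how fast it moves, so the observed rate depends on file sizes and per-connection behavior. If a hard ceiling is the goal, rsync's own --bwlimit option, passed through fpsync's -o, caps each job and makes the aggregate predictable. A sketch with hypothetical numbers:

```shell
# Each of the 16 rsync jobs capped at 10 MB/s gives a predictable
# aggregate ceiling of jobs x per-job limit:
jobs=16
limit=10                                  # MB/s per rsync job
echo "aggregate ceiling: $(( jobs * limit )) MB/s"

# Applied to fpsync (requires fpsync/rsync; shown as a comment):
#   fpsync -n 16 -s 37772160 -o "-lptgoD --bwlimit=10m" /srcdir/ /targetdir/
```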
Sorry for writing a "support" question in an issue tracker, but I'm trying to speed up a remote backup that is currently run with the following command.
rsync -avz -e "ssh -i /path/to/backup.key -p 1234" --delete /mnt/data/ archive@$DESTINATION::location/data/
The archive user can only run the rsync command; e.g., the authorized_keys file looks like this:
command="rsync --config=/srv/archive/rsyncd.conf --server --daemon . /srv/archive/data",no-agent-forwarding,no-port-forwarding,no-user-rc,no-X11-forwarding,no-pty ssh-rsa .....
and then the rsyncd.conf has named sections, one for each server.
In the manual:
fpart -s 4724464025 -o music-parts /path/to/music ./*.mp3
Produce partitions of 4.4 GB, containing music files from /path/to/music as well as
MP3 files from current directory; with such a partition size, each partition content
will be ready to be burnt to a DVD. Files music-parts.0 to music-parts.n, are
generated as output.
However, it does not specify compatible tools for burning to a DVD or writing a burnable ISO image.
I tried mkisofs with -path-list, but it does not preserve the directory tree, so it fails when two or more files with the same filename are present in different directories.
I think you had some fpart-compatible tools in mind when you wrote that manual entry.
Would you mind recommending some of them? Thanks!
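One approach that may help here (an assumption, not something the fpart manual prescribes): both mkisofs and xorrisofs accept target=source pathspecs when -graft-points is given, and a partition file of absolute paths can be rewritten into that form so the directory tree is preserved. A sketch with hypothetical paths (BASE and the filenames are illustrative):

```shell
# Rewrite each absolute path "/BASE/rel/path" into the graft-point form
# "/rel/path=/BASE/rel/path" so the directory tree is preserved on the ISO.
cd "$(mktemp -d)"
BASE=/path/to/music
printf '%s\n' "$BASE/a/song1.mp3" "$BASE/b/song2.mp3" > music-parts.0   # demo input
sed "s|^$BASE/\(.*\)$|/\1=$BASE/\1|" music-parts.0 > graft.list
cat graft.list

# Then (requires xorrisofs or mkisofs; shown as a comment):
#   xorrisofs -o part0.iso -graft-points -path-list graft.list
```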
I've tried fpart with 3.4GB of data and I'm getting a 0 length file for the first partition:
$ fpart -s 26084354560 -o dev-parts /Users/ah/Documents/dev
Part #0: size = 0, 0 file(s)
Part #1: size = 26084354560, 115597 file(s)
Part #2: size = 26084354560, 55717 file(s)
Part #3: size = 26084354560, 88715 file(s)
Part #4: size = 3924425658, 13859 file(s)
$ ls -l
total 99472
-rw-r----- 1 alex.hunsley staff 0 Aug 19 16:23 dev-parts.0
-rw-r----- 1 alex.hunsley staff 14548982 Aug 19 16:23 dev-parts.1
-rw-r----- 1 alex.hunsley staff 12170472 Aug 19 16:23 dev-parts.2
-rw-r----- 1 alex.hunsley staff 20652071 Aug 19 16:23 dev-parts.3
-rw-r----- 1 alex.hunsley staff 1673407 Aug 19 16:23 dev-parts.4
Is that expected?
Hi Ganael,
The current output of the new '-S' option (skip files larger than partition size) is a bit noisy, i.e.:
S (1902116864): /home/hjm/Downloads/isos/lmde-4-cinnamon-32bit.iso
Could this be simplified to:
1902116864<tab>/home/hjm/Downloads/isos/lmde-4-cinnamon-32bit.iso
The '-v' output goes to STDERR, so it can be filtered out, and that leaves a nicely segregated file:
$ ~/bin/fpart-1.4.1 -vvv -L -S -s 200m \
-o ~/fpart/Fs/f ~/Downloads > chunk.exceptions
(lots of verbose STDERR output)
$ head -5 chunk.exceptions
S (327155712): /home/hjm/Downloads/RT-contents.ibd.most-of-rt-database.data
S (492830720): /home/hjm/Downloads/isos/debian-11.1.0-i386-netinst.iso
S (2009333760): /home/hjm/Downloads/isos/Fedora-Workstation-Live-x86_64-35-1.2.iso
S (3204448256): /home/hjm/Downloads/isos/kubuntu-20.04.3-desktop-amd64.iso
So rather than performing a couple of complex regex splits (OK, not very complex), all you need is a simple split on whitespace (usually the default in many languages).
The current format of the partition files is simply a list of fully qualified filename paths with no prefixes, so the simpler format suggested above is similar (just prefixed with the byte size of the file).
Thanks
Harry
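In the meantime, the current lines can be converted mechanically to the suggested size<tab>path form. A sketch using GNU sed (it assumes paths never contain '): ', which holds for the samples above):

```shell
# Turn "S (327155712): /path/file" into "327155712<TAB>/path/file".
# Assumes GNU sed (for \t in the replacement).
cd "$(mktemp -d)"
printf 'S (327155712): /home/hjm/Downloads/isos/lmde-4-cinnamon-32bit.iso\n' > chunk.exceptions  # demo line
sed -E 's/^S \(([0-9]+)\): /\1\t/' chunk.exceptions
```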
Hi!
I'm currently using rsync to transfer multiple TB per day in single files. They are each around 100GB, and the problem is that the destination only gets its full speed when using multiple connections (like what the application axel does for the web), so I found this project, which seems pretty interesting.
Is it somehow possible to sync single files over the day?
I'm trying to use fpsync in rsync mode to copy a MongoDB data dir that's several terabytes from one system to another. I want to use 4 parallel jobs, since we have found that is optimal. But I also want to use 4 file parts, so that all rsync processes get kicked off at the start of the sync. The reason for this is that the overall sync takes over 3 hours, but our 2FA ssh sessions between the 2 systems have a 1 hour TTL. We don't want new rsync calls to be happening throughout the sync, otherwise the operator will have to babysit it the entire time and re-enter their 2FA code when prompted.
This is what I tried:
fpsync -v -n 4 -o "-e \\\"ssh -q -o BatchMode=yes -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR\\\"" -O "-n|4" /data/mongodb/ t1m1r1:/data/mongodb/
but it fails with:
Option -n is incompatible with options -f, -s and -L.
Usage: fpart [OPTIONS] -n num | -f files | -s size [FILE or DIR...]
I messed around with modifying fpsync to not pass -f/-s/-L to fpart if -n was in the specified fpart options, but fpart/fpsync failed mysteriously.
Is there a way to make this work?
Thanks!
Hello,
I'm using fpart/fpsync for a large dataset migration, and I find it a very efficient tool.
To optimize the final delta migration I'd like to run several fpsync instances in parallel, and I'm trying to use GNU parallel, but I'm stuck on a problem: it seems that fpsync goes into the stopped state when it creates the first rsync child process.
Even running fpsync from the shell with "&", after a few seconds the process gets STOPPED.
If I run several concurrent fpsync from tmux there are no problems at all, but using GNU parallel would be the best way to achieve my goal.
I'm running latest fpart version 1.5.1 compiled from source on Ubuntu 20.04.
Thanks in advance for any hint.
Best regards
Luca
I am trying to sync files from a slow Windows NFS server. rsync was slow, but fpsync is even slower - even just collecting the list of files to be copied.
Using strace, I can see that as well as waiting for getdents, every file is being stat()'d. To demonstrate, here is a simple reproducer:
strace -f fpart -f 100 -L /etc 2>&1 | grep stat
But if you are using fpart -f without -s, I don't think stat() needs to be called at all.
There is a secondary issue: if you run fpsync with -f but without -s, fpsync still passes -s 4294967296 to fpart. But since fpsync is a shell script, that's easily hacked out.
When estimating the time remaining, fpsync appears to only consider whether the jobs were submitted, not how much longer the remaining jobs will take. Here's an example:
1687462151 <=== [QMGR] Done submitting jobs. Waiting for them to finish.
1687462190 <= [QMGR] Job 23067:248:local finished
1687462199 <= [QMGR] Job 27451:280:local finished
1687462200 <=== Parts done: 281/281 (100%), remaining: 0
1687462200 <=== Time elapsed: 1025s, remaining: ~0s (~3s/job)
1687462281 <=== Parts done: 281/281 (100%), remaining: 0
1687462281 <=== Time elapsed: 1106s, remaining: ~0s (~3s/job)
# 40 minutes pass, but status does not change at all aside from Time elapsed increasing
# during this time the final part is being processed. this part is 10x bigger than any other part due to containing a single huge file
1687464541 <= [QMGR] Job 5504:1:local finished
1687464541 <=== [QMGR] Queue processed
1687464541 <=== Parts done: 281/281 (100%), remaining: 0
1687464541 <=== Time elapsed: 3366s, remaining: ~0s (~11s/job)
1687464541 <=== Fpsync completed without error in 3366s.
1687464541 <=== End time: Thu Jun 22 20:09:01 UTC 2023
I'm intending to use fpsync, or fpart with a wrapper, to transfer a large number of small files (~9 TB) that have some directories with a fair number of hard links (think Linux repositories) to save on disk space. I tried this over the past weekend with fpsync, passing the -o "-lptgoDH" argument to it, and it ran really well with high throughput across multiple worker nodes vs vanilla rsync. However, I noted that the size on disk quickly exceeded 10TB after letting it run for an extended duration. A follow-up vanilla rsync (-avH) wound up clearing up the "duplicate" files, restoring any hard links in the process. I think the individual rsyncs are respecting the "-H" flag, but only for the data they are tasked to replicate.
I may need to do this again down the road so I'd like to leverage fpsync or fpart while still accounting for hard links across the filesystem.
Hello,
I am having issues excluding files or directories with fpsync on macOS, downloaded and installed using brew install fpart on macOS 13.6.1.
For example, I have a folder that looks like this:
I have tried the syntax mentioned in this issue #17 to try and exclude, for example, any folders/files that mention 'pasta' using
fpsync -n 4 -o "--progress" -O '-X /pasta' ~/Test /dest/path/
fpsync -n 4 -o "--progress" -O '-X ./pasta' ~/Test /dest/path/
fpsync -n 4 -o "--progress" -O '-X *pasta*' ~/Test /dest/path/
fpsync -n 4 -o "--progress" -O "-X '*pasta*'"~/Test /dest/path/
I have tried just about every variation I could think of: double quotes, single quotes, quoting the pattern within the portion that feeds options to fpart, using -x instead of -X with all those variations, trying the ./ approach to get it to recognize a path, and different wildcard characters, as it was stated those were supported as well. I have also tried excluding via -o (feeding --exclude='whatever' to rsync), but I get an rsync 'action not supported' error in the logs trying that. I confirmed that when using rsync itself, excludes work fine. Any help/guidance I could get would be appreciated.
Command:
fpsync -n 5 -f 1000 -o "-azh --info=progress2 --exclude 'lost+found'" /data2/ $USER@$REMOTE:/path/to/data2/
Error:
Please supply an absolute path for both src_dir/ and dst_dir/
Dear author! I want to sync a local directory to a remote directory via ssh using fpsync, my command is as follows:
fpsync -vv -n 8 -f 128 -t /data/public/alphafold2-2.3.1/tmp -o "-zlptgoD -v --numeric-ids -e \"ssh -i id_ecdsa -p 22 scx6001@ [email protected] \"" /data/public/alphafold2-2.3.1/ /home/bingxing2/public/alphafold2.3.2
The key file here is: id_ecdsa
Account name: scx6001@BSCC-N32-H
Domain name: ssh.cn-zhongwei-1.paracloud.com
source dir: /data/public/alphafold2-2.3.1/
destination dir: /home/bingxing2/public/alphafold2.3.2
After execution, the following error will appear:
Cannot create destination directory: /home/bingxing2/public/alphafold2.3.2
How can I solve this problem? Looking forward to your reply.
Hi, thanks for maintaining such a great tool.
I was wondering if it is possible to support remote url for the source directory?
e.g.
fpsync user@host:data/ /mnt/data/
I noticed that a remote url can be used in the dst_dir but not in the src_dir.
Since there are cases in which a remote server cannot directly access a local computer due to NAT or other firewall settings, this can make fpsync somewhat impractical to use when I want to pull data from remote servers.
In rsync, I believe remote urls are supported in both src_dir and dst_dir, as long as rsync is installed on both machines. Would it also be possible for fpsync?
e.g.
rsync user@host:data/ /mnt/data/
Hi,
I'm using macOS, and I downloaded fpart from the link "http://moo.nac.uci.edu/~hjm/parsync/utils/fpart".
After that, I ran the following commands:
$ chmod +x fpart
$ ./fpart
I get the following error:
sh: /usr/local/bin/fpart: cannot execute binary file
Even when I try to run it manually, I get the same thing.
It seems that fpart isn't supported on macOS (even though, according to https://github.com/martymac/fpart, it should work:
Fpart is primarily developed on FreeBSD.
It has been successfully tested on :
...
Mac OS X (10.6, 10.8)
)
What am I missing?
I'm trying to use sshpass for "reasons". This works out of the box:
SSHPASS=somepassword sshpass -e rsync -av src/ user@server:dst/
But when using fpsync it gets stuck in the ssh phase
SSHPASS=somepassword sshpass -e fpsync -n 4 -f 3 src/ server:dst/
Parts are generated successfully, 4 rsync processes are started and subsequently 4 ssh processes, but all of them are stuck in the T state (my guess is that the sshpass magic somehow stops operating and ssh is waiting for user input).
Is there any way to use fpsync inside sshpass?
Hi Ganael
Is there any way to fix job control warnings when running under cron where there is no tty? My output is filled with errors like these:
/usr/local/bin/fpsync: 1531: set: can't access tty; job control turned off
one for each partition, when set -m is called. fpsync still seems to work, but the output is hard to see amidst the error spam.
Thanks
James
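A possible workaround (a sketch, not an fpsync feature): filter the notice out of stderr while keeping other errors visible.

```shell
# Drop only the job-control notices, keep all other stderr lines.
filter_jc() { grep -v "job control turned off"; }

# Under cron, the filter would wrap fpsync's stderr (bash process
# substitution; shown as a comment):
#   fpsync ... 2> >(filter_jc >&2)

# Demonstration on sample stderr lines:
printf 'some real error\n/usr/local/bin/fpsync: 1531: set: cannot access tty; job control turned off\n' | filter_jc
```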
Trying to build fpart on RHEL6 and having issues:
Greetings!
I'm using parsyncfp, which uses fpart under the hood. The idea is to copy a (pretty huge) /home dir onto /mnt/new_home (which is on a different device, formatted with notably more inodes). There are (afaik) about ~200M files to enumerate, most of them quite small.
fpart -v -L -z -s 10485760 -o /mnt/new_home/zor/fpcache/f //home
In the process of troubleshooting/diagnosing what I deem pretty slow file enumeration (that runs contrary to everything I could read about fpart), I ran into a discrepancy between what fpart log says about my chunks (about 50K files per chunk) and the actual content of the chunk file (less than 1K files per chunk).
At this point I believe this to be some fundamental misunderstanding on my part about fpart and probably not a bug. Would you be kind enough to clarify what is going on?
[root:/mnt/new_home/zor/fpcache] [base] # tail fpart.log.00.57.52_2019-08-10
Filled part #162: size = 10487077, 46953 file(s)
Filled part #163: size = 10486100, 44556 file(s)
Filled part #164: size = 10485854, 46049 file(s)
Filled part #165: size = 10485967, 46284 file(s)
Filled part #166: size = 10487488, 46771 file(s)
Filled part #167: size = 10485843, 46619 file(s)
Filled part #168: size = 10486423, 48616 file(s)
Filled part #169: size = 10486659, 46067 file(s)
Filled part #170: size = 10485845, 44444 file(s)
Filled part #171: size = 10485861, 44997 file(s)
[root:/mnt/new_home/zor/fpcache] [base] # ls -l f.* | tr -s ' ' | sed 's/f\.//' | awk '// { print $9 " " $8 }' | sort -n | tail | awk '// { "wc -l < f." $1 | getline linecount; print "f." $1 ", " linecount " lines, written at " $2 } '
f.163, 1547 lines, written at 04:49
f.164, 638 lines, written at 04:52
f.165, 983 lines, written at 04:55
f.166, 937 lines, written at 04:58
f.167, 1062 lines, written at 05:01
f.168, 456 lines, written at 05:04
f.169, 715 lines, written at 05:07
f.170, 704 lines, written at 05:11
f.171, 695 lines, written at 05:14
f.172, 621 lines, written at 05:18
It looks like there have been several changes since the last minor release and I see that some commits were made to the RPM to mark it as "1.2.0" even though there hasn't been an equivalent git tag added. I'd like to submit some pull requests to the RPM spec file to reflect my own changes and then push new releases to the Fedora/EPEL builds I maintain but first I need a new release on here to be tagged.
If I generate partition files with the name 'part', I end up with files like this:
part.0
part.1
part.2
...
part.10
part.11
...
part.100
part.101
These filenames aren't 0-padded, so they don't sort into a logical order:
part.0
part.1
part.10
part.11
part.12
part.13
part.14
Could we have an option for generating 0-padded filenames please? i.e. part.000, part.001, etc.
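Until such an option exists, a one-shot rename can pad the suffixes after the fact. A sketch (run it once on unpadded names, since printf's %d would treat an already-padded leading zero as octal):

```shell
cd "$(mktemp -d)"                        # demo area
touch part.0 part.1 part.10 part.100     # sample unpadded partition files

for f in part.*; do
    new="$(printf 'part.%03d' "${f##*.}")"   # zero-pad the numeric suffix
    [ "$f" = "$new" ] || mv -- "$f" "$new"
done
ls -1     # part.000 part.001 part.010 part.100, one per line
```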
Hi,
I'm using fpsync to transfer >1000 files from A to B via ssh.
Some of the files are >1TB, in fact the file size could be up to 10TB.
Question:
How can I identify which file(s) are currently processed in order to verify the process?
THX
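One way to approximate this, based on the mechanics visible elsewhere in this thread rather than any documented fpsync feature: each running rsync carries a --files-from=/tmp/fpsync/parts/<job>/part.N argument (visible with ps), and those part files are NUL-separated because fpsync passes --from0. A sketch on a stand-in part file:

```shell
# Find the part file each worker is processing (shown as a comment;
# requires a running fpsync):
#   ps -ef | grep '[r]sync.*--files-from'
# Each match shows --files-from=/tmp/fpsync/parts/<job>/part.N.

# Part files are NUL-separated (fpsync passes --from0 to rsync), so a
# part's file list can be printed with tr; demo on a stand-in file:
cd "$(mktemp -d)"
printf 'dir/a\0dir/b\0' > part.demo
tr '\0' '\n' < part.demo
```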
Hi Ganael,
Sorry to bother you again.
We are passing this invocation to FPSync (below). The content after -O and before -n 32 is quoted on the command line. Notice the "Tool options:" line doesn't show any exclusions.
| 2020-07-27T11:39:07.755-05:00 | /usr/bin/fpsync -O -X /ebs/geisprd/.glusterfs -X ./.root -n 32 -f 500000 -vvvv /ebs/geisprd/ /efs/nv-wc-geis-prd/vault11/
| 2020-07-27T11:39:07.755-05:00 | 1595867947 =====> [9389] Syncing /ebs/geisprd/ => /efs/nv-wc-geis-prd/vault11/
| 2020-07-27T11:39:07.755-05:00 | 1595867947 ===> Job name: 1595867947-9389
| 2020-07-27T11:39:07.755-05:00 | 1595867947 ===> Start time: Mon Jul 27 16:39:07 UTC 2020
| 2020-07-27T11:39:07.755-05:00 | 1595867947 ===> Concurrent sync jobs: 32
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Workers: local
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Shared dir: /tmp/fpsync
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Temp dir: /tmp/fpsync
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Tool name: "rsync"
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Tool options: "-lptgoD -v --numeric-ids" <====
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Max files or directories per sync job: 500000
| 2020-07-27T11:39:08.005-05:00 | 1595867947 ===> Max bytes per sync job: 4294967296
I also confirmed there aren't any exclusions in the running jobs (via ps -ef):
root 1723 8811 0 Jul27 ? 00:00:00 /bin/sh /tmp/fpsync/work/1595854539-8731/3322
root 1725 1723 0 Jul27 ? 00:01:36 /usr/bin/rsync -lptgoD -v --numeric-ids -r --files-from=/tmp/fpsync/parts/1595854539-8731/part.3322 --from0 /ebs// /efs/nv-wc-xylem-prd/vault1//
root 1728 1725 0 Jul27 ? 00:00:24 /usr/bin/rsync -lptgoD -v --numeric-ids -r --files-from=/tmp/fpsync/parts/1595854539-8731/part.3322 --from0 /ebs// /efs/nv-wc-xylem-prd/vault1//
root 2292 31289 0 07:33 ? 00:00:07 /usr/bin/rsync -lptgoD -v --numeric-ids -r --files-from=/tmp/fpsync/parts/1595854539-8731/part.3805 --from0 /ebs// /efs/nv-wc-xylem-prd/vault1//
What am I doing wrong?
Thanks,
-Tennis
My system doesn't have mail, resulting in:
which: no mail in (/sbin:/bin:/usr/sbin:/usr/bin)
on every run. It would be good to only look for mail if it is going to be used, or to silence the message, as checks are also done later in the code :)
Is it possible to add multi-thread support to fpart, similar to the -n functionality of fpsync?
A little bit of context: I am dealing with a file system that has a large number of small files (1 TB across 7M+ files), and partitioning them with fpart takes about 63 minutes with the parameters I specified. Conversely, by running:
find /path/to/filesystem/ -mindepth 2 -maxdepth 2 -type d | parallel -j8 find {} -type f | split -dl 100000 - list
it takes 36 minutes to complete. Of course, the produced list is not sorted, but this is unimportant for my use case. Ideally, I would like to achieve the same results with fpart by leveraging more parallel jobs.
Can this be achieved?
I run the following command on debian 12 (same on another server):
fpsync -n 15 -o "-a" /Dashi/source/ [email protected]:/Mariana/source/
message:
Incompatible option(s) detected within toolopts (option -o)
fpart -V
fpart v1.5.1
Copyright (c) 2011-2022 Ganael LAPLANCHE [email protected]
WWW: http://contribs.martymac.org
Build options: debug=no, fts=system
rsync -V
rsync version 3.2.7 protocol version 31
Copyright (C) 1996-2022 by Andrew Tridgell, Wayne Davison, and others.
Web site: https://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, symlinks, symtimes, hardlinks, hardlink-specials,
hardlink-symlinks, IPv6, atimes, batchfiles, inplace, append, ACLs,
xattrs, optional secluded-args, iconv, prealloc, stop-at, no crtimes
Optimizations:
SIMD-roll, no asm-roll, openssl-crypto, no asm-MD5
Checksum list:
xxh128 xxh3 xxh64 (xxhash) md5 md4 sha1 none
Compress list:
zstd lz4 zlibx zlib none
Daemon auth list:
sha512 sha256 sha1 md5 md4
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
Is there any suggestion?
Thanks a lot.
This isn't a bug, but a seemingly confusing design.
In '-s' mode, a '0'-numbered partition file seems to always be created to hold files too big for the specified -s size, even if no files were too big. So if the 0-numbered partition file is empty, fpart managed to partition everything OK, and you have to ignore the partition 0 file.
Compare to '-n' mode (i.e. that number of partitions with data split as evenly as possible between them): partition file 0 is just a regular partition file containing file names. It being non-empty is NOT an error condition.
Am I understanding this correctly? Is there any way to make the 'overflow' partition file for '-s' mode not be the 0th partition file? E.g. a flag could be introduced that gives the overflow partition file a unique name.
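If the reading above is correct (an assumption, not confirmed behavior), a small post-run check can at least make the overflow bucket explicit:

```shell
cd "$(mktemp -d)"
: > part.0                               # stand-in for an empty overflow file

# Treat a non-empty part.0 as "some files exceeded -s"; an empty or
# missing part.0 means everything was packed normally.
if [ -s part.0 ]; then
    echo "oversized files present:"
    cat part.0
else
    echo "no oversized files; part.0 can be ignored"
fi
```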
Hi Ganael,
At the end of a copy, I notice you put how long (in seconds) the copy process took. That's really great.
Would it also be possible to log how much data was copied?
Thanks,
-Tennis
For example, suppose I run fpart -z against the following directory structure on disk, and I'm ignoring files called Thumbs.db:
directoryA/
(no files)
directoryB/
Thumbs.db
The partition files that result will have no mention of directoryB, only directoryA.
Shouldn't a folder containing only ignored files be considered an empty folder, rather than a non-existent folder? Otherwise the existence of ignored files actually changes the output of fpart -z, which doesn't seem to make sense.
If this is expected behaviour, any chance of adding a flag that means "regard directories with only ignored files as empty directories"?
I am trying to run fpsync -n ${threads} -v src/directory dest/directory on ~5GB of files. The command is executed in a bash script.
I see the output as:
1684539262 ===> Analyzing filesystem...
1684539265 <=== Fpart crawling finished
1684539266 <=== Parts done: 77/77 (100%), remaining: 0
1684539266 <=== Time elapsed: 4s, remaining: ~0s (~0s/job)
1684539266 <=== Fpsync completed without error in 4s.
While inspecting the target (destination) directory, it appears that only ~50MB is getting copied.
src/directory/sub-directory
src/directory/files
src/directory/sub-directory/files
Am I missing something over here?
Hi Ganael,
I'm using this invocation:
/usr/bin/fpsync -O '-x .glusterfs -x .root -x .r00t' -n 64 -f 50000 -vvv /ebs/geisprd/ /efs/nv-wc-geis-prd/vault11/
Both root and r00t are files; they are successfully excluded. But the hidden directory .glusterfs is not excluded: it is copied to the destination directory.
Is there something I'm doing wrong?
-Tennis
Is it possible to feed rclone's output (a file system listing) to fpart, so that it can be processed by fpart?
Would it be possible to create a Homebrew tap for the fpart package to facilitate deployment on macOS machines, or to allow that work to be done in a separate repo called homebrew-fpart so one can use the brew tap command? Thanks so much for your time!
I would like to split a directory "photos" into partitions, each of which should be 4.4GB, so that I can burn them to DVDs. The files inside it are organized in a directory hierarchy as follows:
......
./photos/2015-02-15-Meeting-With-Friends/
./photos/2015-02-15-Meeting-With-Friends/videos/
./photos/2015-02-18-Family-Trip/
.......
./photos/2020-04-26-Tim-Birthday/
./photos/2020-06-10-Anthony-Birthday/
./photos/2020-06-10-Anthony-Birthday/videos/
./photos/2020-06-10-Anthony-Birthday/memo/
......
How can I create partitions respecting the directory name, so that the partition content would be grouped by date according to the directory name?
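As far as I know fpart itself doesn't group by directory name, but one workaround is to run it once per dated directory, so a partition never mixes dates (at the cost of some wasted DVD space when a directory underfills a disc). A sketch with a hypothetical layout:

```shell
# Demo layout mirroring the question (hypothetical directory names):
cd "$(mktemp -d)"
mkdir -p photos/2015-02-18-Family-Trip photos/2020-04-26-Tim-Birthday

for d in photos/*/; do
    name=$(basename "$d")
    # Real invocation (requires fpart; shown as a comment):
    #   fpart -s 4724464025 -o "parts-$name" "$d"
    echo "would pack $d into parts-$name.*"
done
```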
Also, I am still unable to get mkisofs or xorrisofs to work with the partition files generated by fpart. Would you recommend some tips? Thanks!
Hi Ganael,
Is there any way to figure out how much data has been transferred while FPSync is running?
For example, if I start a transfer of a 3TB volume, how can I tell after 3 hours how much has actually been sent?
Possible?
Thank you,
-Tennis
Hello,
We are currently trying to use fpart on a mounted sshfs (part of libfuse) filesystem.
The filesystem contains about 11TB of data within over 10,000,000 directories across 15,000,000 files.
Running fpart produces parts until it hits a little over 1,000,000 files. After that, it seems that all the directories look empty or unreadable to fpart, as it just writes out the current level's directories, then one level up, and so on.
I further discovered that some directories in the middle of the run have their subdirectories listed as empty, which should have files in them.
As far as I can tell, it is inconsistent which directories this happens to.
The commandline used is:
fpart -f 1000000 -o debug_2.part -zz -vv <MOUNTPOINT>
This was tested with version 1.4.0 packaged in debian bullseye and version 1.4.1 built from the github repo.
Building with the option --enable-debug does not reveal anything more to me; the output just shows valid_file() iterating over the same files and directories (which later appear empty).
Running fpart on the original filesystem on the other server (ext4) produces complete partitions, as far as I can tell.
I'd be happy to do further debug work if you tell me what you need.
Cheers,
Valentin