boredzo / dd-parallel
A parallelized block copier for macOS that copies large quantities of data faster than the system dd.
Once in a blue moon, dd-parallel-posix writes a corrupted version of the input. This is a critical blocker.
A bunch of enhancements in one:
The SIGINFO progress report should include an estimate of when the copy will finish.
The estimate could be a range of two estimates: one based on the current momentary copy speed, and one based on the overall average copy speed.
The range of that estimate could also be used to adjust the precision with which the estimates are presented. If the range is a few minutes, the estimates could be given to the minute. If the range is less than a minute, seconds may be warranted.
If both estimates are within today, the day should be omitted.
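One way the presentation rules above might be sketched in C. Everything here (`struct eta_style`, `eta_seconds`, `eta_choose_style`, `eta_format`) is a hypothetical name for illustration, not something taken from dd-parallel. The sketch assumes the caller has already turned the momentary and average speeds into two absolute finish times.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: how a pair of estimates should be presented. */
struct eta_style {
    int show_date;     /* 0 when both estimates fall on today's date */
    int show_seconds;  /* 1 when the two estimates are under a minute apart */
};

/* Seconds left at the given speed, or a negative value if the speed is unusable. */
static double eta_seconds(uint64_t bytes_remaining, double bytes_per_sec) {
    if (bytes_per_sec <= 0.0) return -1.0;
    return (double)bytes_remaining / bytes_per_sec;
}

/* Pick the precision from the width of the range, and drop the date when the
 * later estimate (and therefore the earlier one) is still today. */
static struct eta_style eta_choose_style(time_t now, time_t early, time_t late) {
    assert(now <= early && early <= late);
    struct tm tm_now, tm_late;
    localtime_r(&now, &tm_now);
    localtime_r(&late, &tm_late);
    struct eta_style style;
    style.show_date = !(tm_now.tm_year == tm_late.tm_year &&
                        tm_now.tm_yday == tm_late.tm_yday);
    style.show_seconds = difftime(late, early) < 60.0;
    return style;
}

/* Format one estimate in the chosen style; returns the number of characters written. */
static size_t eta_format(char *buf, size_t size, time_t when, struct eta_style style) {
    static const char *const formats[2][2] = {
        { "%H:%M",          "%H:%M:%S" },
        { "%Y-%m-%d %H:%M", "%Y-%m-%d %H:%M:%S" },
    };
    struct tm tm_when;
    localtime_r(&when, &tm_when);
    return strftime(buf, size, formats[style.show_date][style.show_seconds], &tm_when);
}
```

A SIGINFO handler could then call `eta_format` once for each end of the range with the same style, so both estimates are shown at matching precision.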
dd-parallel/PRHMD5Context.m:32:38: Implicit conversion loses integer precision: 'const NSUInteger' (aka 'const unsigned long') to 'CC_LONG' (aka 'unsigned int')
This warning, which appears on LP64 architectures such as x86_64, is correct, and the code has been flagged with a FIXME comment:
//FIXME: CC_LONG is technically not equivalent to NSUInteger. This should update in chunks until numBytes is exhausted.
CC_MD5_Update(&_context, bytesPtr, numBytes);
numBytes is typed as NSUInteger (unsigned long), but CC_MD5_Update expects a CC_LONG, which (as the warning says) is defined as unsigned int.
Questionable type definition choices in CommonCrypto notwithstanding, the comment above describes the correct resolution to the warning. CC_MD5_Update should be called in a loop as long as numBytes is greater than UINT_MAX, shaving off UINT_MAX bytes at a time, and then one more time for whatever's left. (When numBytes is less than UINT_MAX to start with, which is 100% of the time, the loop will run zero times.)
This isn't urgent; since the chunk size used in main is 1 MiB, numBytes should generally be no more than that, nowhere near the limit of an unsigned int. Even so, it's good to fix things the right way.
Just changing the type of numBytes won't work; that just moves the problem to main, which passes the ssize_t (signed long) returned by write. The code in main is convoluted enough without introducing this loop there.
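A minimal sketch of the loop described above, written in plain C so it can be tried without CommonCrypto. The callback type `update32_fn`, the helper `update_in_chunks`, and the `tally` demo are all hypothetical; in the real wrapper the callback's role would be played by a call to CC_MD5_Update, and `max_chunk` would be UINT_MAX (it is a parameter here only so the loop can be exercised with small buffers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an update function whose length parameter is a 32-bit unsigned int. */
typedef void (*update32_fn)(void *ctx, const void *data, uint32_t len);

/* Feed num_bytes to `update` in pieces no larger than max_chunk.
 * The loop body runs zero times when num_bytes already fits in one call. */
static void update_in_chunks(update32_fn update, void *ctx,
                             const void *bytes, size_t num_bytes,
                             uint32_t max_chunk) {
    assert(max_chunk > 0);
    const unsigned char *cursor = bytes;
    while (num_bytes > max_chunk) {
        update(ctx, cursor, max_chunk);
        cursor += max_chunk;
        num_bytes -= max_chunk;
    }
    /* Whatever is left now fits in 32 bits, so the narrowing cast is safe. */
    update(ctx, cursor, (uint32_t)num_bytes);
}

/* Demo callback: counts calls and bytes instead of hashing. */
struct tally { size_t calls; size_t total; };

static void tally_update(void *ctx, const void *data, uint32_t len) {
    struct tally *t = ctx;
    (void)data;
    t->calls += 1;
    t->total += len;
}
```

Keeping the loop inside the hashing wrapper leaves main untouched, which matches the preference stated above.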
This thing would be extremely helpful on non-Mac platforms. A version that doesn't require GCD or Foundation or Objective-C would be handy.
(“Pure” POSIX might be a bit lofty; pthread_setname_np is extremely valuable.)
If the writer thread runs out of space, or otherwise encounters an error, it simply exits and may leave a lock dangling. We should report the failure and exit with the correct failure code rather than hanging.
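A rough sketch of one way the writer could surface a failure instead of exiting silently. The `copy_state` structure and both function names are assumptions made up for this example, not dd-parallel's actual data structures. The idea is that the writer records the errno value under the lock, wakes anyone waiting, and releases the lock before it returns.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical state shared between the reader and writer threads. */
struct copy_state {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int writer_errno;  /* 0 while healthy; an errno value once the writer has failed */
};

/* Write the whole buffer, retrying short writes and EINTR.
 * Returns 0 on success or the errno value of the failing write. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return errno;
        }
        if (n == 0) return EIO;  /* defensive: avoid spinning if nothing is accepted */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Record the failure and wake any waiter; the lock is released before returning,
 * so nothing is left dangling when the writer thread exits. */
static void writer_report_failure(struct copy_state *state, int err) {
    assert(err != 0);
    pthread_mutex_lock(&state->lock);
    state->writer_errno = err;
    pthread_cond_broadcast(&state->changed);
    pthread_mutex_unlock(&state->lock);
}
```

The main thread could then check `writer_errno` after joining the writer, print the message from `strerror`, and return a nonzero exit status.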
I'm not seeing a difference with vs. without F_RDAHEAD on a RAM disk (though it's possible I would if I were copying from a slower device), but F_NOCACHE makes a massive difference. It's the difference between 2 GB/sec and 1 MB/sec.
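For reference, both flags are set through fcntl on macOS. The helper below is a sketch with a made-up name (`apply_copy_hints`), and the choice to turn read-ahead on is an assumption, since the note above does not say which setting was measured. The `#ifdef` guards let the same file compile on platforms that lack these commands.

```c
#include <assert.h>
#include <fcntl.h>

/* Apply whichever of the macOS caching hints this platform defines.
 * Returns how many hints were applied, or -1 if an fcntl call failed. */
static int apply_copy_hints(int fd) {
    assert(fd >= 0);
    int applied = 0;
#ifdef F_NOCACHE
    /* Ask the kernel not to keep this file's data in the buffer cache. */
    if (fcntl(fd, F_NOCACHE, 1) == -1) return -1;
    applied++;
#endif
#ifdef F_RDAHEAD
    /* Turn read-ahead on; passing 0 instead would turn it off. */
    if (fcntl(fd, F_RDAHEAD, 1) == -1) return -1;
    applied++;
#endif
    return applied;
}
```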
There's not really any reason to use /dev/diskN when /dev/rdiskN is available AFAIK. The non-r disks have a kernel buffer in front of them that makes them way slower—both dd and dd-parallel get their speed limited to that buffer's throughput.
I think there are cases when rdiskN isn't available (I remember seeing a forum thread about such a situation when I was web-searching the differences between the two, though I didn't bookmark it or look more deeply into that problem). So we should probably check whether rdiskN is available and only switch to it when it is.
Also, we probably want to infer /dev when the user refers to a file named r?disk[0-9]+(s[0-9]+)? by name alone that doesn't exist in the CWD. Maybe with a confirmation prompt, just in case they are trying to copy from/to a file in the CWD and made a typo.
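The two checks discussed above could be sketched like this. All three function names are hypothetical. The sketch uses the POSIX regex API for the name pattern and only prefers the raw device when it actually exists; the confirmation prompt is left out.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if `name` is a bare disk device name such as "disk2" or "rdisk2s1". */
static int looks_like_disk_name(const char *name) {
    regex_t re;
    if (regcomp(&re, "^r?disk[0-9]+(s[0-9]+)?$", REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    int matched = regexec(&re, name, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/* Rewrite "/dev/diskN..." as "/dev/rdiskN..." into `out`.
 * Returns 1 if rewritten, 0 if the path is not a /dev/disk path or does not fit. */
static int raw_device_path(const char *path, char *out, size_t out_size) {
    static const char prefix[] = "/dev/disk";
    size_t prefix_len = sizeof prefix - 1;
    if (strncmp(path, prefix, prefix_len) != 0) return 0;
    int n = snprintf(out, out_size, "/dev/rdisk%s", path + prefix_len);
    return n > 0 && (size_t)n < out_size;
}

/* Prefer the raw device, but only when it is actually present. */
static const char *preferred_path(const char *path, char *scratch, size_t scratch_size) {
    assert(scratch_size > 0);
    if (raw_device_path(path, scratch, scratch_size) && access(scratch, F_OK) == 0)
        return scratch;
    return path;
}
```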
Tried a large file again and it got nowhere. sample says there's only a reader thread, no writer thread. Seems bad!
I tentatively blame #10.
% bin/dd-parallel configure /Volumes/RAM\ Disk/configure
Copied 0 bytes in 0 ms (overall avg 0 bytes/sec)
As much as this is like using a tank as a fly-swatter, dd-parallel should be able to copy any file, big or small.
It would be amazing if someone would port this to Linux-based OSs, including Intel OSs such as Linux Mint and ARM OSs such as Raspberry Pi OS. I'd love to plug two drives into the USB 3 ports on a Pi 4 and see how fast I can drive them.
--md5 option, at least, but I don't know where to go from a fault identified by it to an investigation, aside from swinging printf around in a dark cave and hoping you hit something.
The code doesn't quite build as-is just yet on Linux Mint.
macOS Monterey:
<sys/syslimits.h>
clock_gettime available through one of the headers we're already including
PTHREAD_ERRORCHECK_MUTEX_INITIALIZER
CLOCK_UPTIME_RAW
Linux Mint 20.1:
<limits.h>
<time.h> to get clock_gettime
strlcpy/strlcat: not available 🙁
SIGINFO: not available (ah, so this is where dd uses SIGUSR1)
PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
pthread_setname_np, but it looks like you have to define _GNU_SOURCE for it
CLOCK_MONOTONIC_RAW, which works like CLOCK_UPTIME_RAW does on macOS (which also has a clock called CLOCK_MONOTONIC_RAW, but macOS's counts time in which the machine is asleep whereas Linux's doesn't)
As of 1328c03, this actually is basically sequential. We can do better than this.
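Returning to the macOS and Linux differences listed above: one possible way to absorb them is a small compatibility layer. The `DDP_` macro names and the two helper functions are invented for this sketch and do not come from the dd-parallel source. Replacements for strlcpy/strlcat are not covered here.

```c
#define _GNU_SOURCE  /* glibc: needed for pthread_setname_np and the _NP initializer */
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

#ifdef __APPLE__
#include <sys/syslimits.h>
#define DDP_CLOCK      CLOCK_UPTIME_RAW
#define DDP_MUTEX_INIT PTHREAD_ERRORCHECK_MUTEX_INITIALIZER
#else
#include <limits.h>
#define DDP_CLOCK      CLOCK_MONOTONIC_RAW
#define DDP_MUTEX_INIT PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
#endif

/* Use SIGINFO where it exists; otherwise fall back to SIGUSR1. */
#ifdef SIGINFO
#define DDP_PROGRESS_SIGNAL SIGINFO
#else
#define DDP_PROGRESS_SIGNAL SIGUSR1
#endif

/* macOS names only the calling thread; glibc takes an explicit thread handle. */
static void ddp_name_current_thread(const char *name) {
#ifdef __APPLE__
    pthread_setname_np(name);
#else
    pthread_setname_np(pthread_self(), name);
#endif
}

/* Seconds from a clock that does not advance while the machine is asleep. */
static double ddp_now_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(DDP_CLOCK, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

The rest of the code would then refer only to the `DDP_` names, keeping the platform conditionals in one place.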