boredzo / dd-parallel

A parallelized block copier for macOS that copies large quantities of data faster than the system dd.

Objective-C 31.58% Roff 5.00% Makefile 1.12% C 62.30%

copying dd disk duplication block-copier

dd-parallel's Issues

dd-parallel-posix occasionally corrupts data

Once in a blue moon, dd-parallel-posix writes a corrupted version of the input. This is a critical blocker.

Better progress format

A bunch of enhancements in one:

- Provide the amount copied so far as the largest applicable unit (MB/GB/TB) in addition to bytes
- Comma-separate all large numbers
- Provide time so far as H:M:S
- Provide copying rate as largest applicable unit instead of bytes
- Provide copying rate for the last minute (or maybe x number of cycles) or so, in addition to overall average
  - (maybe also some kind of graph of copying rates by minute and/or second?)
- Estimate date and time of completion
  - Give a range, where one end is based on the overall average speed and the other is based on the last minute
  - Adjust the precision of the estimate based on the span of the range (e.g., if the range is within a minute, give estimates to the second; else, give estimates to the minute)
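
For illustration, the first few items might look something like this in plain C. This is only a sketch: the function names are hypothetical, the units are assumed to be decimal (1 MB = 1,000,000 bytes), and it is not what the Foundation-based implementation does.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes n with comma thousands separators, e.g. 1234567 -> "1,234,567".
 * Output is truncated if the buffer is too small. */
static void format_commas(uint64_t n, char *out, size_t outSize) {
    assert(out != NULL && outSize > 0);
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%" PRIu64, n);
    size_t o = 0;
    for (int i = 0; i < len && o + 2 < outSize; i++) {
        /* Insert a comma whenever a multiple of three digits remains. */
        if (i > 0 && (len - i) % 3 == 0) out[o++] = ',';
        out[o++] = digits[i];
    }
    out[o] = '\0';
}

/* Writes a byte count (or a bytes-per-second rate) using the largest
 * applicable unit. Assumption: decimal units, 1000 bytes per KB. */
static void format_unit(double bytes, char *out, size_t outSize) {
    static const char *const units[] = { "bytes", "KB", "MB", "GB", "TB" };
    size_t u = 0;
    while (bytes >= 1000.0 && u + 1 < sizeof units / sizeof units[0]) {
        bytes /= 1000.0;
        u++;
    }
    if (u == 0)
        snprintf(out, outSize, "%.0f %s", bytes, units[u]);
    else
        snprintf(out, outSize, "%.2f %s", bytes, units[u]);
}

/* Writes elapsed seconds as H:MM:SS. */
static void format_hms(uint64_t seconds, char *out, size_t outSize) {
    snprintf(out, outSize, "%" PRIu64 ":%02" PRIu64 ":%02" PRIu64,
             seconds / 3600, (seconds / 60) % 60, seconds % 60);
}
```

A progress line could then combine all three, printing the exact comma-separated byte count next to the rounded unit form.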

Estimate date/time of completion

The SIGINFO progress report should include an estimate of when the copy will finish.

The estimate could be a range of two estimates: one based on the current momentary copy speed, and one based on the overall average copy speed.

The range of that estimate could also be used to adjust the precision with which the estimates are presented. If the range is a few minutes, the estimates could be given to the minute. If the range is less than a minute, seconds may be warranted.

If both estimates are within today, the day should be omitted.
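
A rough sketch of that logic in C, assuming the caller already tracks the bytes remaining plus the overall and recent copy rates in bytes per second. The struct and function names are made up for this example, and the 60-second threshold is just the one suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef struct {
    time_t earliest;   /* the sooner of the two estimates */
    time_t latest;     /* the later of the two estimates */
    bool showSeconds;  /* true when the range spans less than a minute */
} eta_range;

/* Fills *out with a completion range built from two rates.
 * Returns false when either rate is unusable (zero or negative). */
static bool estimate_completion(time_t now, uint64_t bytesRemaining,
                                double overallRate, double recentRate,
                                eta_range *out) {
    assert(out != NULL);
    if (overallRate <= 0.0 || recentRate <= 0.0) return false;
    time_t a = now + (time_t)((double)bytesRemaining / overallRate);
    time_t b = now + (time_t)((double)bytesRemaining / recentRate);
    out->earliest = a < b ? a : b;
    out->latest = a < b ? b : a;
    out->showSeconds = difftime(out->latest, out->earliest) < 60.0;
    return true;
}

/* True when both timestamps fall on the same local calendar day. */
static bool same_local_day(time_t a, time_t b) {
    struct tm ta, tb;
    if (localtime_r(&a, &ta) == NULL || localtime_r(&b, &tb) == NULL)
        return false;
    return ta.tm_year == tb.tm_year && ta.tm_yday == tb.tm_yday;
}
```

The report would omit the day when `same_local_day(now, earliest)` and `same_local_day(now, latest)` are both true.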

Integer precision loss in call to CC_MD5_Update

dd-parallel/PRHMD5Context.m:32:38: Implicit conversion loses integer precision: 'const NSUInteger' (aka 'const unsigned long') to 'CC_LONG' (aka 'unsigned int')

This warning, which appears on LP64 architectures such as x86_64, is correct, and the code has been flagged with a FIXME comment:

		//FIXME: CC_LONG is technically not equivalent to NSUInteger. This should update in chunks until numBytes is exhausted.
		CC_MD5_Update(&_context, bytesPtr, numBytes);

numBytes is typed as NSUInteger (unsigned long), but CC_MD5_Update expects a CC_LONG, which (as the warning says) is defined as unsigned int.

Questionable type definition choices in CommonCrypto notwithstanding, the comment above describes the correct resolution to the warning. CC_MD5_Update should be called in a loop as long as numBytes is greater than UINT_MAX, shaving off UINT_MAX bytes at a time, and then one more time for whatever's left. (When numBytes is no greater than UINT_MAX to start with, which in practice is always, the loop will run zero times.)

This isn't urgent; since the chunk size used in main is 1 MiB, numBytes should generally be no more than that, nowhere near the limit of an unsigned int. Even so, it's good to fix things the right way.

Just changing the type of numBytes won't work; that just moves the problem to main, which passes the ssize_t (signed long) returned by write. The code in main is convoluted enough without introducing this loop there.
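
Roughly, the loop could be shaped like the sketch below. To keep it buildable without CommonCrypto, it takes a hypothetical callback with the same shape as CC_MD5_Update and a maxChunk parameter that would be UINT_MAX in the real code; `chunk_len`, `update_fn`, and the `tally` test double are all invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for CC_LONG, which the warning says is unsigned int. */
typedef unsigned int chunk_len;

/* Hypothetical callback shaped like CC_MD5_Update(ctx, data, len). */
typedef int (*update_fn)(void *ctx, const void *data, chunk_len len);

/* Feeds numBytes to update in pieces of at most maxChunk bytes.
 * In the real code maxChunk would be UINT_MAX. */
static void update_in_chunks(void *ctx, const void *bytes, size_t numBytes,
                             size_t maxChunk, update_fn update) {
    assert(maxChunk > 0 && maxChunk <= UINT_MAX);
    const unsigned char *p = bytes;
    /* Runs zero times when numBytes already fits in one call. */
    while (numBytes > maxChunk) {
        update(ctx, p, (chunk_len)maxChunk);
        p += maxChunk;
        numBytes -= maxChunk;
    }
    /* One more call for whatever is left. */
    update(ctx, p, (chunk_len)numBytes);
}

/* Test double that records how it was called instead of hashing. */
typedef struct {
    size_t calls;
    size_t total;
    chunk_len largest;
} tally;

static int tally_update(void *ctx, const void *data, chunk_len len) {
    tally *t = ctx;
    (void)data;
    t->calls++;
    t->total += len;
    if (len > t->largest) t->largest = len;
    return 1;
}
```

With a 10-byte buffer and maxChunk set to 4, the callback sees lengths 4, 4, and 2, so the narrowing cast never drops bits.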

Pure-POSIX port

This thing would be extremely helpful on non-Mac platforms. A version that doesn't require GCD or Foundation or Objective-C would be handy.

(“Pure” POSIX might be a bit lofty; pthread_setname_np is extremely valuable.)

Handle failures

If the writer thread runs out of space, or otherwise encounters an error, it simply exits and may leave a lock dangling. We should report the failure and exit with the correct failure code rather than hanging.
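
One possible building block, sketched here as an assumption rather than as the project's actual code: a write loop that hands the error back to its caller instead of exiting in place, so the writer thread gets a chance to release its lock and tell the main thread what went wrong. `write_all` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes to fd, retrying on short writes and EINTR.
 * Returns 0 on success or an errno value (e.g. ENOSPC) on failure.
 * *written always receives the number of bytes that made it out. */
static int write_all(int fd, const void *buf, size_t len, size_t *written) {
    assert(buf != NULL || len == 0);
    const unsigned char *p = buf;
    size_t done = 0;
    int err = 0;
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted, try again */
            err = errno;
            break;
        }
        if (n == 0) {
            /* Defensive: treat a zero-byte write as an I/O error
             * rather than spinning forever. */
            err = EIO;
            break;
        }
        done += (size_t)n;
    }
    if (written != NULL) *written = done;
    return err;
}
```

On a nonzero return, the writer thread would unlock whatever mutex it holds, record the error where the main thread can see it, and let main print the message (for example with strerror) and exit with a nonzero status.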

Sometimes the last mebibyte doesn't get copied on a single-core machine

If I run the POSIX version on a single-core VM running Debian, copying a 4 MiB file on a ramfs volume, about half the time I get this:

Screenshot of a VM running Debian in which dd-parallel has verbosely copied 3 mebibytes of a 4-mebibyte file.

cktest considers the copy intact, and catting the copy confirms that it includes mebibytes 0, 1, and 2, but not 3.

POSIX port should use F_RDAHEAD and F_NOCACHE when available

I'm not seeing a difference with vs. without F_RDAHEAD on a RAM disk (though it's possible I would if I were copying from a slower device), but F_NOCACHE makes a massive difference. It's the difference between 2 GB/sec and 1 MB/sec.
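
A minimal sketch of the "when available" part, using the preprocessor to skip whichever commands the platform's <fcntl.h> doesn't define. `apply_cache_hints` is a hypothetical name, and applying both hints to the same descriptor is an assumption; the real code might want F_RDAHEAD only on the source.

```c
#include <assert.h>
#include <fcntl.h>

/* Applies whichever caching hints this platform defines.
 * Returns the number of hints applied, or -1 if an fcntl call failed.
 * On platforms with neither F_RDAHEAD nor F_NOCACHE, this is a no-op
 * that returns 0. */
static int apply_cache_hints(int fd) {
    assert(fd >= 0);
    int applied = 0;
    (void)fd;  /* unused when neither hint exists */
#ifdef F_RDAHEAD
    /* Ask the kernel to read ahead of our sequential reads. */
    if (fcntl(fd, F_RDAHEAD, 1) == -1) return -1;
    applied++;
#endif
#ifdef F_NOCACHE
    /* Ask the kernel not to keep this file's data in the buffer cache. */
    if (fcntl(fd, F_NOCACHE, 1) == -1) return -1;
    applied++;
#endif
    return applied;
}
```

A failure here probably shouldn't abort the copy, since the hints only affect speed; logging it and carrying on seems reasonable.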

Quietly correct /dev/diskN paths to /dev/rdiskN

There's not really any reason to use /dev/diskN when /dev/rdiskN is available AFAIK. The non-r disks have a kernel buffer in front of them that makes them way slower—both dd and dd-parallel get their speed limited to that buffer's throughput.

I think there are cases when rdiskN isn't available (I remember seeing a forum thread about such a situation when I was web-searching the differences between the two, though I didn't bookmark it or look more deeply into that problem). So we should probably check whether rdiskN is available and only switch to it when it is.

Also, we probably want to infer /dev when the user refers to a file named r?disk[0-9]+(s[0-9]+)? by name alone that doesn't exist in the CWD. Maybe with a confirmation prompt, just in case they are trying to copy from/to a file in the CWD and made a typo.
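
Here's a sketch of the first half (the /dev/diskN to /dev/rdiskN correction with an availability check). The function names are hypothetical, the suffix check follows the [0-9]+(s[0-9]+)? pattern mentioned above, and using access() with F_OK is only one way to decide whether the raw device is "available". It doesn't attempt the bare-name inference or the confirmation prompt.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* True when s matches [0-9]+(s[0-9]+)? in full, e.g. "2" or "2s1". */
static bool is_disk_suffix(const char *s) {
    if (!isdigit((unsigned char)*s)) return false;
    while (isdigit((unsigned char)*s)) s++;
    if (*s == 's') {
        s++;
        if (!isdigit((unsigned char)*s)) return false;
        while (isdigit((unsigned char)*s)) s++;
    }
    return *s == '\0';
}

/* If path is /dev/diskN or /dev/diskNsM, writes the matching
 * /dev/rdisk... path to out and returns true. */
static bool raw_disk_candidate(const char *path, char *out, size_t outSize) {
    assert(path != NULL && out != NULL);
    static const char prefix[] = "/dev/disk";
    size_t prefixLen = sizeof prefix - 1;
    if (strncmp(path, prefix, prefixLen) != 0) return false;
    const char *suffix = path + prefixLen;
    if (!is_disk_suffix(suffix)) return false;
    int n = snprintf(out, outSize, "/dev/rdisk%s", suffix);
    return n > 0 && (size_t)n < outSize;
}

/* Returns the raw device path when one exists, else the original path. */
static const char *preferred_disk_path(const char *path, char *out,
                                       size_t outSize) {
    if (raw_disk_candidate(path, out, outSize) && access(out, F_OK) == 0)
        return out;
    return path;
}
```

Paths that already name an rdisk, or that aren't under /dev at all, pass through unchanged.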

Writer thread prematurely exiting

Tried a large file again and it got nowhere. sample says there's only a reader thread, no writer thread. Seems bad!

I tentatively blame #10.

POSIX port doesn't copy small files

% bin/dd-parallel configure /Volumes/RAM\ Disk/configure
Copied 0 bytes in 0 ms (overall avg 0 bytes/sec)

As much as this is like using a tank as a fly-swatter, dd-parallel should be able to copy any file, big or small.

Linux port

It would be amazing if someone would port this to Linux-based OSs, including Intel OSs such as Linux Mint and ARM OSs such as Raspberry Pi OS. I'd love to plug two drives into the USB 3 ports on a Pi 4 and see how fast I can drive them.

Easy:

clang exists on Linux
libdispatch (GCD) exists on Linux
Xcode doesn't, but it's not like this is a complex codebase—you could build this just fine with a simple Makefile

Moderate:

I have no idea how testing with TSan or otherwise testing against race conditions and other parallelism bugs works on Linux. You have the --md5 option, at least, but I don't know where to go from a fault identified by it to an investigation, aside from swinging printf around in a dark cave and hoping you hit something.
This is written in Objective-C, and while much of it doesn't really need to be, it needs to either use GNUstep Foundation, be ported to CF, or be rewritten in something else entirely.

Hard:

The current implementation-in-progress of #1 uses NSByteCountFormatter and NSDateComponentsFormatter. I don't know whether GNUstep Foundation has these APIs, or how complete they are, or what alternatives exist (aside from reinventing those particular wheels).
Performance optimization (including concurrency optimization). I have no idea how Linux devs profile without sample and Instruments. I assume there is a way, but I don't know how effective those tools are, in general or on this sort of highly-parallel code.

Linux compatibility

The code doesn't quite build as-is just yet on Linux Mint.

macOS Monterey:

defines limits in <sys/syslimits.h>
makes clock_gettime available through one of the headers we're already including
has PTHREAD_ERRORCHECK_MUTEX_INITIALIZER
has CLOCK_UPTIME_RAW

Linux Mint 20.1:

defines limits in <limits.h>
requires including <time.h> to get clock_gettime
doesn't have strlcpy/strlcat 🙁
doesn't have SIGINFO (ah, so this is where dd uses SIGUSR1)
has PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
has pthread_setname_np but it looks like you have to define _GNU_SOURCE for it
has CLOCK_MONOTONIC_RAW, which works like CLOCK_UPTIME_RAW does on macOS (macOS also has a clock called CLOCK_MONOTONIC_RAW, but it counts time during which the machine is asleep, whereas Linux's doesn't)
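
Most of these differences could probably be papered over with a small compatibility header along these lines. This is a sketch only: the DDP_ macro names and `uptime_seconds` are invented for the example, and it leaves strlcpy/strlcat alone.

```c
/* Must come before any system header so glibc exposes the _NP names. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif

#include <assert.h>
#include <limits.h>   /* Linux Mint wants this rather than <sys/syslimits.h> */
#include <pthread.h>
#include <signal.h>
#include <time.h>     /* needed explicitly on Linux for clock_gettime */

/* Progress-report signal: SIGINFO where it exists, else SIGUSR1. */
#ifdef SIGINFO
#define DDP_PROGRESS_SIGNAL SIGINFO
#else
#define DDP_PROGRESS_SIGNAL SIGUSR1
#endif

/* A clock that doesn't count time spent asleep. */
#if defined(CLOCK_UPTIME_RAW)
#define DDP_CLOCK CLOCK_UPTIME_RAW
#elif defined(CLOCK_MONOTONIC_RAW)
#define DDP_CLOCK CLOCK_MONOTONIC_RAW
#else
#define DDP_CLOCK CLOCK_MONOTONIC
#endif

/* Error-checking mutex initializer under whichever name is defined.
 * The last branch quietly loses error checking; that fallback is an
 * assumption of this sketch, not something the project does. */
#if defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER)
#define DDP_ERRORCHECK_MUTEX_INITIALIZER PTHREAD_ERRORCHECK_MUTEX_INITIALIZER
#elif defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP)
#define DDP_ERRORCHECK_MUTEX_INITIALIZER PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
#else
#define DDP_ERRORCHECK_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER
#endif

/* Reads the selected clock as seconds, for timing the copy. */
static double uptime_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(DDP_CLOCK, &ts);
    assert(rc == 0);  /* the clock was picked at compile time */
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

The rest of the code would then refer only to the DDP_ names and never to the platform-specific ones directly.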

Fix the parallelism

As of 1328c03, the copy is effectively sequential. We can do better than this.