dd-parallel's People

Contributors

boredzo

dd-parallel's Issues

Better progress format

A bunch of enhancements in one:

  • Provide the amount copied so far as the largest applicable unit (MB/GB/TB) in addition to bytes
  • Comma-separate all large numbers
  • Provide time so far as H:M:S
  • Provide copying rate as largest applicable unit instead of bytes
  • Provide the copying rate over roughly the last minute (or perhaps the last x cycles), in addition to the overall average
    • (maybe also some kind of graph of copying rates by minute and/or second?)
  • Estimate date and time of completion
    • Give a range, where one end is based on the overall average speed and the other is based on the last minute
    • Adjust the precision of the estimate based on the span of the range (e.g., if the range is within a minute, give estimates to the second; else, give estimates to the minute)
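
To make the first few bullets concrete, here is a minimal C sketch of the formatting helpers. The function names (format_grouped, format_bytes, format_hms) are made up for this example, and it assumes decimal units (1 MB = 1,000,000 bytes); the real progress code could just as well use binary units or a formatter class.

```c
#include <stdio.h>
#include <string.h>

/* Write n with comma thousands separators, e.g. 1234567 -> "1,234,567".
   Output is truncated (but still NUL-terminated) if out is too small. */
void format_grouped(unsigned long long n, char *out, size_t outlen) {
    char digits[32];
    snprintf(digits, sizeof digits, "%llu", n);
    size_t len = strlen(digits);
    size_t pos = 0;
    for (size_t i = 0; i < len && pos + 1 < outlen; i++) {
        /* A comma goes before every group of three remaining digits. */
        if (i > 0 && (len - i) % 3 == 0 && pos + 2 < outlen)
            out[pos++] = ',';
        out[pos++] = digits[i];
    }
    if (outlen > 0)
        out[pos] = '\0';
}

/* Write n using the largest applicable unit.
   Assumption: decimal units, two decimal places. */
void format_bytes(unsigned long long n, char *out, size_t outlen) {
    static const char *const units[] = { "bytes", "KB", "MB", "GB", "TB" };
    double value = (double)n;
    size_t unit = 0;
    while (value >= 1000.0 && unit + 1 < sizeof units / sizeof units[0]) {
        value /= 1000.0;
        unit++;
    }
    if (unit == 0)
        snprintf(out, outlen, "%llu bytes", n);
    else
        snprintf(out, outlen, "%.2f %s", value, units[unit]);
}

/* Write an elapsed time in seconds as H:MM:SS. */
void format_hms(unsigned long long seconds, char *out, size_t outlen) {
    snprintf(out, outlen, "%llu:%02llu:%02llu",
             seconds / 3600, (seconds / 60) % 60, seconds % 60);
}
```

A progress line could then combine the two byte formats, printing the grouped byte count alongside the unit-scaled one. The same format_bytes helper would serve for rates by appending "/sec".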

Estimate date/time of completion

The SIGINFO progress report should include an estimate of when the copy will finish.

The estimate could be a range of two estimates: one based on the current momentary copy speed, and one based on the overall average copy speed.

The range of that estimate could also be used to adjust the precision with which the estimates are presented. If the range is a few minutes, the estimates could be given to the minute. If the range is less than a minute, seconds may be warranted.

If both estimates are within today, the day should be omitted.
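
Here is one way the range and precision rules above might be sketched in C. Everything named here (struct eta_range, estimate_completion, format_eta) is hypothetical, and the sketch assumes time_t counts whole seconds, as it does on POSIX systems. The 60-second threshold is the "less than a minute" rule from the description; the exact output format is only a guess.

```c
#include <stdio.h>
#include <time.h>

struct eta_range {
    time_t earliest;
    time_t latest;
    int show_seconds;   /* nonzero when the two estimates are under a minute apart */
};

/* Fill *out from the two rates (bytes/sec). Returns 0 when either rate is
   not positive, because no meaningful estimate exists in that case. */
int estimate_completion(time_t now, unsigned long long bytes_left,
                        double overall_rate, double recent_rate,
                        struct eta_range *out) {
    if (overall_rate <= 0.0 || recent_rate <= 0.0)
        return 0;
    double a = (double)bytes_left / overall_rate;
    double b = (double)bytes_left / recent_rate;
    double lo = a < b ? a : b;
    double hi = a < b ? b : a;
    out->earliest = now + (time_t)lo;
    out->latest = now + (time_t)hi;
    out->show_seconds = (hi - lo) < 60.0;
    return 1;
}

static int same_day(const struct tm *a, const struct tm *b) {
    return a->tm_year == b->tm_year && a->tm_yday == b->tm_yday;
}

/* Render the range in local time, omitting the date when both ends are today. */
void format_eta(time_t now, const struct eta_range *r, char *out, size_t outlen) {
    struct tm tnow, tlo, thi;
    localtime_r(&now, &tnow);
    localtime_r(&r->earliest, &tlo);
    localtime_r(&r->latest, &thi);
    int today = same_day(&tnow, &tlo) && same_day(&tnow, &thi);
    const char *fmt;
    if (today)
        fmt = r->show_seconds ? "%H:%M:%S" : "%H:%M";
    else
        fmt = r->show_seconds ? "%Y-%m-%d %H:%M:%S" : "%Y-%m-%d %H:%M";
    char a[32], b[32];
    strftime(a, sizeof a, fmt, &tlo);
    strftime(b, sizeof b, fmt, &thi);
    snprintf(out, outlen, "%s to %s", a, b);
}
```

The SIGINFO handler would only need to supply the bytes remaining and the two rates. Which rate yields the earlier estimate does not matter, since the sketch sorts the two ends.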

Integer precision loss in call to CC_MD5_Update

dd-parallel/PRHMD5Context.m:32:38: Implicit conversion loses integer precision: 'const NSUInteger' (aka 'const unsigned long') to 'CC_LONG' (aka 'unsigned int')

This warning, which appears on LP64 architectures such as x86_64, is correct, and the code has been flagged with a FIXME comment:

		//FIXME: CC_LONG is technically not equivalent to NSUInteger. This should update in chunks until numBytes is exhausted.
		CC_MD5_Update(&_context, bytesPtr, numBytes);

numBytes is typed as NSUInteger (unsigned long), but CC_MD5_Update expects a CC_LONG, which (as the warning says) is defined as unsigned int.

Questionable type definition choices in CommonCrypto notwithstanding, the comment above describes the correct resolution to the warning. CC_MD5_Update should be called in a loop as long as numBytes is greater than UINT_MAX, shaving off UINT_MAX bytes at a time, and then one more time for whatever's left. (When numBytes is no greater than UINT_MAX to start with, which in practice is always, the loop runs zero times and only the final call happens.)

This isn't urgent; since the chunk size used in main is 1 MiB, numBytes should generally be no more than that, nowhere near the limit of an unsigned int. Even so, it's good to fix things the right way.

Just changing the type of numBytes won't work; that just moves the problem to main, which passes the ssize_t (signed long) returned by write. The code in main is convoluted enough without introducing this loop there.
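
For illustration, the chunking loop might look something like the following C sketch. To keep it buildable without CommonCrypto, it takes the update routine as a callback whose length parameter is an unsigned int, mirroring the CC_LONG parameter of CC_MD5_Update. The names update_in_chunks, chunk_update_fn, and counting_update are invented for this example, and the maxChunk parameter exists only so the loop can be exercised with small buffers; the real fix would presumably hard-code UINT_MAX.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical callback shaped like a hash update whose length is 32 bits
   wide. Any return value from the real routine is ignored in this sketch. */
typedef void (*chunk_update_fn)(void *ctx, const void *data, unsigned int len);

/* Feed numBytes bytes to update(), at most maxChunk bytes per call.
   Pass UINT_MAX as maxChunk to match the limit of an unsigned int length. */
void update_in_chunks(void *ctx, chunk_update_fn update,
                      const void *bytes, size_t numBytes, unsigned int maxChunk) {
    const unsigned char *p = bytes;
    if (maxChunk == 0)
        return;                       /* avoid looping forever on a zero chunk size */
    while (numBytes > maxChunk) {     /* runs zero times for ordinary buffers */
        update(ctx, p, maxChunk);
        p += maxChunk;
        numBytes -= maxChunk;
    }
    /* One final call for whatever is left, which now fits in an unsigned int. */
    update(ctx, p, (unsigned int)numBytes);
}

/* Demonstration stand-in for the hash update: records how it was called. */
struct chunk_counter {
    size_t calls;
    size_t total;
};

void counting_update(void *ctx, const void *data, unsigned int len) {
    struct chunk_counter *c = ctx;
    (void)data;
    c->calls++;
    c->total += len;
}
```

With a 10-byte buffer and a maxChunk of 4, counting_update is called three times (4, 4, and 2 bytes). Keeping this loop inside the MD5 wrapper means main would not need to change at all.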

Pure-POSIX port

This thing would be extremely helpful on non-Mac platforms. A version that doesn't require GCD or Foundation or Objective-C would be handy.

(“Pure” POSIX might be a bit lofty; pthread_setname_np is extremely valuable.)

Handle failures

If the writer thread runs out of space, or otherwise encounters an error, it simply exits and may leave a lock dangling. We should report the failure and exit with the correct failure code rather than hanging.
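
One possible shape for this, sketched in C with plain pthreads: the writer records its error under the lock and wakes anyone waiting before it exits, so nobody is left blocked. None of these names (struct copy_state, writer_finish, wait_for_writer, failing_writer) come from the actual code; they are assumptions for illustration, and the real synchronization in dd-parallel may be structured quite differently.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state between the main thread and the writer. */
struct copy_state {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int writer_done;
    int writer_errno;   /* 0 on success, otherwise the errno of the failed write */
};

/* Called by the writer on every exit path, success or failure, so the lock
   is always released and waiters are always woken. */
void writer_finish(struct copy_state *s, int err) {
    pthread_mutex_lock(&s->lock);
    s->writer_errno = err;
    s->writer_done = 1;
    pthread_cond_broadcast(&s->changed);
    pthread_mutex_unlock(&s->lock);
}

/* Block until the writer has finished; return 0 or the errno it reported. */
int wait_for_writer(struct copy_state *s) {
    pthread_mutex_lock(&s->lock);
    while (!s->writer_done)
        pthread_cond_wait(&s->changed, &s->lock);
    int err = s->writer_errno;
    pthread_mutex_unlock(&s->lock);
    return err;
}

/* Demonstration writer that fails as if the destination ran out of space. */
void *failing_writer(void *arg) {
    writer_finish(arg, ENOSPC);
    return NULL;
}
```

The main thread could then print strerror() of the returned value and exit with a nonzero status instead of hanging. Which exit code counts as "correct" is left open here.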

Quietly correct /dev/diskN paths to /dev/rdiskN

There's not really any reason to use /dev/diskN when /dev/rdiskN is available AFAIK. The non-r disks have a kernel buffer in front of them that makes them way slower—both dd and dd-parallel get their speed limited to that buffer's throughput.

I think there are cases when rdiskN isn't available (I remember seeing a forum thread about such a situation when I was web-searching the differences between the two, though I didn't bookmark it or look more deeply into that problem). So we should probably check whether rdiskN is available and only switch to it when it is.

Also, we probably want to infer /dev when the user gives a bare name matching r?disk[0-9]+(s[0-9]+)? and no file of that name exists in the CWD. Maybe with a confirmation prompt, just in case they are trying to copy from/to a file in the CWD and made a typo.
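
A rough C sketch of the path handling described above follows. The helper names (is_disk_name, raw_device_path, preferred_device_path) are invented for this example. It uses access() as the availability check, which is only one plausible way to test whether rdiskN exists, and it leaves out the confirmation prompt entirely.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if name matches r?disk[0-9]+(s[0-9]+)? in full, else 0. */
int is_disk_name(const char *name) {
    const char *p = name;
    if (*p == 'r')
        p++;
    if (strncmp(p, "disk", 4) != 0)
        return 0;
    p += 4;
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p == 's') {                  /* optional slice suffix, e.g. s1 */
        p++;
        if (!isdigit((unsigned char)*p))
            return 0;
        while (isdigit((unsigned char)*p))
            p++;
    }
    return *p == '\0';
}

/* If path is /dev/diskN (optionally with a slice suffix), write the
   /dev/rdiskN form to out and return 1. Return 0 if the path is not of
   that form, is already a raw device, or does not fit in out. */
int raw_device_path(const char *path, char *out, size_t outlen) {
    static const char prefix[] = "/dev/";
    size_t plen = sizeof prefix - 1;
    if (strncmp(path, prefix, plen) != 0)
        return 0;
    const char *name = path + plen;
    if (name[0] == 'r' || !is_disk_name(name))
        return 0;
    int n = snprintf(out, outlen, "/dev/r%s", name);
    return n > 0 && (size_t)n < outlen;
}

/* Switch to the raw device only when it actually exists; otherwise keep
   the path the user gave. */
const char *preferred_device_path(const char *path, char *buf, size_t buflen) {
    if (raw_device_path(path, buf, buflen) && access(buf, F_OK) == 0)
        return buf;
    return path;
}
```

The bare-name case could reuse is_disk_name: if the argument matches and access() says no such file exists in the CWD, prepend /dev/ and then ask for confirmation.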

POSIX port doesn't copy small files

% bin/dd-parallel configure /Volumes/RAM\ Disk/configure
Copied 0 bytes in 0 ms (overall avg 0 bytes/sec)

As much as this is like using a tank as a fly-swatter, dd-parallel should be able to copy any file, big or small.

Linux port

It would be amazing if someone would port this to Linux-based OSs, including Intel OSs such as Linux Mint and ARM OSs such as Raspberry Pi OS. I'd love to plug two drives into the USB 3 ports on a Pi 4 and see how fast I can drive them.

Easy:

  • clang exists on Linux
  • libdispatch (GCD) exists on Linux
  • Xcode doesn't, but it's not like this is a complex codebase—you could build this just fine with a simple Makefile

Moderate:

  • I have no idea how testing with TSan or otherwise testing against race conditions and other parallelism bugs works on Linux. You have the --md5 option, at least, but I don't know where to go from a fault identified by it to an investigation, aside from swinging printf around in a dark cave and hoping you hit something.
  • This is written in Objective-C, and while much of it doesn't really need to be, it needs to either use GNUstep Foundation, be ported to CF, or be rewritten in something else entirely.

Hard:

  • The current implementation-in-progress of #1 uses NSByteCountFormatter and NSDateComponentsFormatter. I don't know whether GNUstep Foundation has these APIs, or how complete they are, or what alternatives exist (aside from reinventing those particular wheels).
  • Performance optimization (including concurrency optimization). I have no idea how Linux devs profile without sample and Instruments. I assume there is a way, but I don't know how effective those tools are, in general or on this sort of highly-parallel code.

Linux compatibility

The code doesn't quite build as-is just yet on Linux Mint.

macOS Monterey:

  • defines limits in <sys/syslimits.h>
  • makes clock_gettime available through one of the headers we're already including
  • has PTHREAD_ERRORCHECK_MUTEX_INITIALIZER
  • has CLOCK_UPTIME_RAW

Linux Mint 20.1:

  • defines limits in <limits.h>
  • requires including <time.h> to get clock_gettime
  • doesn't have strlcpy/strlcat 🙁
  • doesn't have SIGINFO (ah, so this is where dd uses SIGUSR1)
  • has PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
  • has pthread_setname_np but it looks like you have to define _GNU_SOURCE for it
  • has CLOCK_MONOTONIC_RAW, which works like CLOCK_UPTIME_RAW does on macOS (which also has a clock called CLOCK_MONOTONIC_RAW, but macOS's counts time in which the machine is asleep whereas Linux's doesn't)
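
Several of these differences could be papered over with a small compatibility header, along the lines of the C sketch below. The macro and function names (PROGRESS_SIGNAL, PROGRESS_CLOCK, ERRORCHECK_MUTEX_INIT, progress_clock_seconds) are invented for this example. The final fallback to a plain mutex initializer is an assumption for platforms that offer neither error-checking initializer, and it does lose the error checking. strlcpy/strlcat are left out here.

```c
/* Linux: _GNU_SOURCE exposes pthread_setname_np and the _NP initializer.
   It must be defined before any system header is included. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <limits.h>     /* limits on Linux */
#ifdef __APPLE__
#include <sys/syslimits.h>
#endif
#include <pthread.h>
#include <signal.h>
#include <time.h>       /* needed explicitly on Linux for clock_gettime */

/* Progress reports: SIGINFO where it exists, SIGUSR1 elsewhere. */
#ifdef SIGINFO
#define PROGRESS_SIGNAL SIGINFO
#else
#define PROGRESS_SIGNAL SIGUSR1
#endif

/* A clock that does not count time spent asleep. */
#if defined(CLOCK_UPTIME_RAW)
#define PROGRESS_CLOCK CLOCK_UPTIME_RAW
#elif defined(CLOCK_MONOTONIC_RAW)
#define PROGRESS_CLOCK CLOCK_MONOTONIC_RAW
#else
#define PROGRESS_CLOCK CLOCK_MONOTONIC
#endif

/* Error-checking mutex initializer under whichever name is available. */
#if defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER)
#define ERRORCHECK_MUTEX_INIT PTHREAD_ERRORCHECK_MUTEX_INITIALIZER
#elif defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP)
#define ERRORCHECK_MUTEX_INIT PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
#else
#define ERRORCHECK_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER  /* no error checking */
#endif

/* Current reading of PROGRESS_CLOCK in seconds, or -1.0 on failure. */
double progress_clock_seconds(void) {
    struct timespec ts;
    if (clock_gettime(PROGRESS_CLOCK, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

The rest of the code would then refer only to the wrapper names, so the platform checks stay in one place.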
