
Comments (16)

Shnatsel commented on August 21, 2024

For comparison, GNU parallel is limited to 15Mb of RAM on my system on the same workload.

from parallel.

mmstick commented on August 21, 2024

I'll soon be landing a large set of changes that refactors a decent portion of the source code and adds a quiet mode. Once that is done, I'll look into improving the resource consumption of inputs. I may need to drop the feature of counting the total number of jobs and use an iterator for reading inputs from standard input.


mmstick commented on August 21, 2024

I spent a whole day trying to fix a bug that turned out to be caused by the beta and nightly compilers. Anyway, I've landed the changes that I meant to land yesterday, so now I'll begin working on handling standard input more efficiently.


Shnatsel commented on August 21, 2024

Ah, I should have warned you about the nightly compiler; I had already figured out that it's buggy with your parallel.

I've already tried the chars() iterator on BufReader in my Rust version of tr, and besides being an unstable language feature it's also very slow. lines() is probably a better idea.

Also, in 0.4.1 your parallel was 2x slower than GNU parallel when reading arguments from stdin.

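The lines()-based approach suggested above can be sketched as follows. This is a minimal illustration, not code from the project: the read_args helper and the in-memory input are hypothetical stand-ins (the real program would read from a locked stdin handle).

```rust
use std::io::BufRead;

// Hypothetical helper: collect newline-delimited arguments from any
// buffered reader using the stable BufRead::lines() iterator, which
// avoids the per-character cost of the unstable chars() iterator.
fn read_args<R: BufRead>(reader: R) -> Vec<String> {
    reader.lines().filter_map(|line| line.ok()).collect()
}

fn main() {
    // A byte slice stands in for io::stdin().lock() so the sketch
    // is self-contained.
    let input = "file-a\nfile-b\nfile-c\n";
    let args = read_args(input.as_bytes());
    assert_eq!(args, ["file-a", "file-b", "file-c"]);
}
```

Because lines() is lazy, the same helper can be turned into a streaming loop instead of collecting, which is what keeps memory bounded for large inputs.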

mmstick commented on August 21, 2024

Updating the issue to say that Nightly builds are now fixed, by eliminating the unsafe { mem::uninitialized::<Child>() } usage. Additionally, I have been working on fixing this issue, but it will take some time, as I want to implement it as efficiently as possible the first time, for example by avoiding Vec allocations.

I plan to solve the issue by buffering 64K worth of arguments at a time and writing them to an unprocessed file on disk in reverse newline-delimited order, then creating an iterator that buffers 64K worth of arguments at a time and truncates the unprocessed file after reading them. As arguments are completed, they will be written to a processed file. This should retain the ability to determine the total number of jobs and to fetch the Nth job, whether it is currently in memory, in the processed file, or in the unprocessed file, while keeping memory usage very low. Once everything is working, I'll benchmark the program with perf stat and time to measure memory consumption and cycles/time spent, and tune the buffer size to reduce the number of syscalls.

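The flush-on-overflow half of the plan above can be sketched roughly as below. This is an illustration only, not the project's implementation: the DiskBuffer name echoes the mechanism mentioned later in this thread, but its fields, methods, and the scratch-file path are invented here.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

const BUFFER_CAP: usize = 64 * 1024; // spill to disk after ~64K of arguments

// Illustrative sketch: accumulate newline-delimited arguments in a
// fixed-size in-memory buffer and spill them to an "unprocessed" file
// whenever the buffer would overflow, keeping resident memory bounded.
struct DiskBuffer {
    buf: Vec<u8>,
    file: BufWriter<File>,
}

impl DiskBuffer {
    fn new(path: &str) -> std::io::Result<Self> {
        Ok(DiskBuffer {
            buf: Vec::with_capacity(BUFFER_CAP),
            file: BufWriter::new(File::create(path)?),
        })
    }

    fn push(&mut self, arg: &str) -> std::io::Result<()> {
        // Spill first if this argument (plus its newline) would overflow.
        if self.buf.len() + arg.len() + 1 > BUFFER_CAP {
            self.flush()?;
        }
        self.buf.extend_from_slice(arg.as_bytes());
        self.buf.push(b'\n');
        Ok(())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        self.file.write_all(&self.buf)?;
        self.file.flush()?;
        self.buf.clear();
        Ok(())
    }
}

fn main() -> std::io::Result<()> {
    let path = "/tmp/parallel_unprocessed_demo.txt"; // hypothetical scratch file
    let mut db = DiskBuffer::new(path)?;
    db.push("job-1")?;
    db.push("job-2")?;
    db.flush()?;
    assert_eq!(std::fs::read_to_string(path)?, "job-1\njob-2\n");
    Ok(())
}
```

The reader side described in the comment (an iterator that refills from the unprocessed file and truncates it, plus a processed file for completed arguments) would layer on top of this same idea.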

mmstick commented on August 21, 2024

I'm pretty close to resolving this problem. The memory issue is fixed in my local changes now that inputs are buffered to and from disk as byte arrays. However, I have yet to resolve the OS Error 11 (EAGAIN) issue, which is caused by Rust failing to close child processes for some reason. I'm not sure how to ensure that child processes are closed, so I'm asking the community for help with this.


Shnatsel commented on August 21, 2024

Processes that you're done working with stick around as zombie processes: they have terminated, and all that remains of them is an entry in the process table holding the exit code. As soon as parallel reads the exit code, the process-table entry disappears.

This is done via the waitpid syscall, and I believe the appropriate function to call from Rust is https://doc.rust-lang.org/std/process/struct.Child.html#method.wait

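The reaping step described above can be sketched as follows; `true` is just a stand-in child command for the illustration.

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Until wait() is called, an exited child lingers as a zombie:
    // only its process-table entry and exit code remain.
    let mut child = Command::new("true").spawn()?;

    // Child::wait() performs the waitpid syscall, reaping the child
    // and returning its exit status.
    let status = child.wait()?;
    assert!(status.success());
    Ok(())
}
```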

mmstick commented on August 21, 2024

I was able to fix it by borrowing the Child process as a mutable reference and then borrowing the child's fields with the as_mut() method. Previously, I was not able to use the wait() method because it caused a borrow-checker conflict with the child's fields being borrowed.

It will still be a while before I push the fixes though. I'm in the middle of refactoring a large portion of the code I've written so far, which has caused some bugs that I'm having to track down.

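A minimal sketch of the borrow pattern described above, assuming the conflict came from moving the child's stdout out of the struct: borrowing it with as_mut() instead lets that borrow end before wait() needs `child` again. The echo command here is only an example workload.

```rust
use std::io::Read;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    let mut child = Command::new("echo")
        .arg("hello")
        .stdout(Stdio::piped())
        .spawn()?;

    let mut output = String::new();
    // Borrow the stdout handle mutably rather than moving it out of
    // `child`; the borrow ends with this block, so the later call to
    // wait() can take its own mutable borrow without a conflict.
    if let Some(stdout) = child.stdout.as_mut() {
        stdout.read_to_string(&mut output)?;
    }

    let status = child.wait()?;
    assert!(status.success());
    assert_eq!(output.trim(), "hello");
    Ok(())
}
```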

mmstick commented on August 21, 2024

The good news is that I just successfully processed 100,000 inputs (seq 1 100000) using only 13 MB, according to the maximum resident set size reported by time.


mmstick commented on August 21, 2024

Later today I'll have the changes landed for you to test out. It's going to be quite the update.

20 files changed, 5265 insertions(+), 612 deletions(-)

And some benchmarks:

Rust Parallel

    ~/D/parallel (master) $ seq 1 10000 | time -v target/release/parallel echo > /dev/null
        Command being timed: "target/release/parallel echo"
        User time (seconds): 0.48
        System time (seconds): 2.48
        Percent of CPU this job got: 59%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.93
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 12928
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 2198164
        Voluntary context switches: 73174
        Involuntary context switches: 36678
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

GNU Parallel

    ~/D/parallel (master) $ seq 1 10000 | time -v parallel echo > /dev/null
        Command being timed: "parallel echo"
        User time (seconds): 97.04
        System time (seconds): 29.17
        Percent of CPU this job got: 232%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:54.17
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 66848
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 15070207
        Voluntary context switches: 250452
        Involuntary context switches: 113320
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


mmstick commented on August 21, 2024

The new release has been made, so you can try it out to see if it's working as you like.


Shnatsel commented on August 21, 2024

It doesn't seem to leave a lot of zombie processes around anymore. Huzzah!

I've tried it on 10,000 files so far, and it's now significantly slower than GNU parallel on a simple cat workload. The command line is:

find '/folder/with/lots/of/text/files/' -type f | head -n 10000 | parallel -j 6 cat '{}' > /dev/null

Runtime and peak memory usage for each:

rust: 1:40, 36.8 MiB
rust, --no-shell: 1:36, 58.9 MiB
gnu: 0:31, 14.6 MiB

Additionally, the memory usage for Rust parallel grows over time while GNU parallel uses a fixed amount of memory.

For the record, the regular use case for this is piping all that stuff to grep instead of /dev/null to get aggregate statistics for the entire dataset.


Shnatsel commented on August 21, 2024

This issue is resolved by 0.5.0 release.


Shnatsel commented on August 21, 2024

Shall I open another issue for the lack of performance parity with GNU?


mmstick commented on August 21, 2024

It should be opened as a bug. I'm guessing that memory consumption rises because the standard output and error of each task are temporarily buffered in memory and only dropped after that process has had its turn being printed. The solution will be to modify the piping to use the DiskBuffer mechanism I created for inputs.


mmstick commented on August 21, 2024

I think you'll find with the latest version, 0.7.0, the issue of memory consumption has been thoroughly resolved.

