Comments (6)

ezrosent commented on July 18, 2024

I definitely think that line buffering on output was a big issue in the last benchmark. That's fixed in the latest commit; reading is still slower though.

from frawk.

ezrosent commented on July 18, 2024

Fascinating! Thanks for filing this issue. I think mawk, frawk, and gawk may all be buffering their file IO a bit differently. I can try to take a look at what mawk is doing and compare it with the Rust standard library.
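To make the buffering difference concrete, here is a minimal sketch using only the Rust standard library. It is not frawk's or mawk's actual code; `CountingSink`, `line_buffered_writes`, and `block_buffered_writes` are hypothetical names made up for illustration. The sink discards data and counts how often the underlying writer is hit, which stands in for write syscalls on a real file.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink: discards data but counts calls to `write`,
/// standing in for the write syscalls a real file or pipe would see.
struct CountingSink {
    writes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Write `n` short lines through a line-buffered writer and
/// return how many times the underlying sink was written to.
fn line_buffered_writes(n: usize) -> io::Result<usize> {
    let mut w = LineWriter::new(CountingSink { writes: 0 });
    for _ in 0..n {
        w.write_all(b"y\n")?;
    }
    w.flush()?;
    Ok(w.get_ref().writes)
}

/// Same workload through a 64 KiB block buffer.
fn block_buffered_writes(n: usize) -> io::Result<usize> {
    let mut w = BufWriter::with_capacity(64 * 1024, CountingSink { writes: 0 });
    for _ in 0..n {
        w.write_all(b"y\n")?;
    }
    w.flush()?;
    Ok(w.get_ref().writes)
}

fn main() -> io::Result<()> {
    let n = 100_000;
    println!("line-buffered:  {} writes", line_buffered_writes(n)?);
    println!("block-buffered: {} writes", block_buffered_writes(n)?);
    Ok(())
}
```

With short lines like the `yes` output used in this thread, the line-buffered writer reaches the sink roughly once per line, while the block-buffered one only does so when its buffer fills up.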


ghuls commented on July 18, 2024

I assume mawk might not do line buffering in this case.

My original code actually decompresses gzipped files and writes gzipped output via those *_cmd commands:
https://github.com/aertslab/single_cell_toolkit/blob/master/barcode_10x_scatac_fastqs.sh


ghuls commented on July 18, 2024

Could CommandReader be used for reading from pipes to solve this issue? https://docs.rs/grep-cli/latest/grep_cli/struct.CommandReader.html


ezrosent commented on July 18, 2024

Feel free to try things out on that latest commit: I don't notice any improvement (and wrapping in a BufRead doesn't seem to help either, unfortunately).


ghuls commented on July 18, 2024

It is probably not related to reading from a pipe; getline itself seems to be slow.
Reading from a premade file directly (with getline) instead of a piped file handle shows the same slowdown.

❯  time yes | head -n 100000000 | frawk '{ print $0 }' > /dev/null

real    0m8.219s
user    0m8.313s
sys     0m0.421s


❯  time yes | head -n 100000000 | frawk 'BEGIN { while ( (getline line1 < "/dev/stdin") > 0 ) { print line1 } }' > /dev/null

real    0m23.011s
user    0m23.014s
sys     0m0.491s


❯  time yes | head -n 100000000 | frawk 'BEGIN { while ( (getline line1 < "/dev/stdin") > 0 ) { print line1 > "/dev/null" } }'

real    0m26.739s
user    0m26.760s
sys     0m0.512s


❯  time yes | head -n 100000000 | frawk 'BEGIN { write_cmd = "cat > /dev/null"; while ( (getline line1 < "/dev/stdin") > 0 ) { print line1 | write_cmd } }'

real    0m26.507s
user    0m26.519s
sys     0m0.564s

# Create file first.
❯  yes | head -n 100000000 > 100000000.txt

❯  time frawk 'BEGIN { while ( (getline line1 < "100000000.txt") > 0 ) { print line1 } }' > /dev/null

real    0m23.025s
user    0m22.778s
sys     0m0.148s

Also, now that CommandReader is used, it should be relatively straightforward to handle compressed text files automatically, if requested, by constructing a CommandReader with the appropriate decompression tool.
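A rough sketch of that idea, assuming the decompression tool is picked from the file extension. Everything here is hypothetical: `decompress_cmd` is a made-up helper, the extension table is only an example, and the sketch uses `std::process::Command` from the standard library as a stand-in rather than `grep_cli::CommandReader`, so it says nothing about how frawk would wire this up.

```rust
use std::io::{self, BufRead, BufReader};
use std::path::Path;
use std::process::{Command, Stdio};

/// Hypothetical helper: map a file extension to a decompression
/// command line (tool, flags, path). Returns `None` for unknown
/// extensions so the caller can fall back to reading the file as-is.
fn decompress_cmd(path: &str) -> Option<Vec<String>> {
    let ext = Path::new(path).extension()?.to_str()?;
    // `-d` decompresses and `-c` writes to stdout for each of these tools.
    let tool: &[&str] = match ext {
        "gz" => &["gzip", "-dc"],
        "bz2" => &["bzip2", "-dc"],
        "xz" => &["xz", "-dc"],
        "zst" => &["zstd", "-dc"],
        _ => return None,
    };
    let mut argv: Vec<String> = tool.iter().map(|s| s.to_string()).collect();
    argv.push(path.to_string());
    Some(argv)
}

fn main() -> io::Result<()> {
    // Count lines in each file named on the command line,
    // decompressing through a child process when a tool is known.
    for path in std::env::args().skip(1) {
        match decompress_cmd(&path) {
            Some(argv) => {
                let mut child = Command::new(&argv[0])
                    .args(&argv[1..])
                    .stdout(Stdio::piped())
                    .spawn()?;
                let out = child.stdout.take().expect("stdout was piped");
                let mut lines = 0usize;
                for line in BufReader::new(out).lines() {
                    line?;
                    lines += 1;
                }
                child.wait()?;
                println!("{}: {} lines", path, lines);
            }
            None => println!("{}: no known decompressor", path),
        }
    }
    Ok(())
}
```

Keeping the extension lookup separate from the process spawning makes the mapping easy to check on its own, and the reader side stays the same whichever tool is chosen.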

