Comments (24)

ezrosent commented on July 18, 2024 1

Sure thing. I should be able to get to that in the next day or two. Thanks for bearing with me on this.

from frawk.

ezrosent commented on July 18, 2024 1

I have a theory about what might be going on. I should be able to test it out on my own later, but in the meantime could you confirm whether your CPU has AVX2 support (see "CPUs with AVX2" here)? I suspect that the runtime fallback to SSE2 may not be working. That would explain why you don't get SIGILL when compiling from source; I should be able to confirm later by reproducing the initial bug while compiling without AVX2.
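For readers following along, the kind of runtime fallback mentioned above can be sketched roughly as follows. This is a minimal illustration, not frawk's actual code: the function names `count_whitespace_scalar`, `count_whitespace_avx2`, and `count_whitespace` are hypothetical, and the AVX2 variant simply reuses the scalar logic with the target feature enabled. The only standard-library pieces relied on are the `is_x86_feature_detected!` macro and the `#[target_feature]` attribute.

```rust
// Hypothetical sketch of runtime CPU feature dispatch.
// None of these names come from frawk.

/// Portable fallback: count ASCII whitespace bytes one at a time.
fn count_whitespace_scalar(bytes: &[u8]) -> usize {
    bytes.iter().filter(|b| b.is_ascii_whitespace()).count()
}

/// Same logic, compiled with AVX2 enabled so the compiler is allowed to
/// emit AVX2 instructions. Calling this on a CPU without AVX2 is
/// undefined behavior and typically ends in SIGILL.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_whitespace_avx2(bytes: &[u8]) -> usize {
    bytes.iter().filter(|b| b.is_ascii_whitespace()).count()
}

/// Pick an implementation at runtime based on what the CPU reports.
fn count_whitespace(bytes: &[u8]) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound only because the check above succeeded.
            return unsafe { count_whitespace_avx2(bytes) };
        }
    }
    count_whitespace_scalar(bytes)
}

fn main() {
    let line = b"xxxx  yyyy 1111\n";
    println!("whitespace bytes: {}", count_whitespace(line));
}
```

If a dispatcher like this skipped the detection step, or if a dependency were compiled with AVX2 enabled unconditionally, a CPU without AVX2 would be expected to fail with an illegal-instruction error rather than produce wrong output.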

ezrosent commented on July 18, 2024 1

Okay, I have reproduced the initial issue: there is a bug in the non-AVX2 implementation of whitespace splitting. I'm going to focus on that bug first.

There's a further bug, though, which is why you got SIGILL. I just re-checked my code and there isn't anything obviously wrong with the runtime feature detection (that is, I'd have expected you to get the same buggy output when running the pre-built binary, not a crash). That issue could take longer to track down, and it's possibly an issue with one of frawk's dependencies.

ezrosent commented on July 18, 2024 1

Quick update: I've made progress on the fix_non_avx2 branch in fixing bugs and increasing test coverage, but I may not be able to merge until next week.

To your point on not being able to build from git directly, I suspect that is related to that "second problem" I mentioned above. I have some preliminary changes on the same branch that I think will help, but I'll only be able to verify once I get a QEMU setup without AVX2 (I'm afraid my last computer without AVX2 isn't functioning).

ezrosent commented on July 18, 2024 1

I had to take more time away than expected. I started up work again on the fix_non_avx2 branch this week. I think a number of things have been fixed, but I'm still seeing some AVX instructions getting executed. I'll spend some more time looking, but I may also merge the branch as is to fix your initial issue, depending on how much progress I make in the short term tracking down where the AVX instructions come from.

ezrosent commented on July 18, 2024 1

After some more reading, I think this is mostly a matter of passing the right compiler flags. I'll follow up and try and post binaries with minimal dependencies, but for now I'm fairly confident that building from source as you have been doing so far should work, once the aforementioned bug fixes have been merged.

Thanks again for your patience. I'll plan to close this issue out in the next few days and file other issues for follow-up items.

ezrosent commented on July 18, 2024

Thanks again for taking the time to create a small test case. I am having some trouble reproducing this at head (which I think should be at the same commit as 0.4.1 on crates.io).

$ cat > /tmp/fields.txt
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx  yyyyyyyyyyyyyyyyyyyyyyyyyyyy 111111
xxxxxxxxxxxxxxxxxxxxxxxxxxx    yyyyyyyyyyyyyyyyyyyyyyyy     222222
xxxxxxxxxxxxxxxxxxxxxxxxxxxx  yyyyyyyyyyyyyyyyyyyyyyyyyyyy 3333333
xxxxxxxxxxxxxxxxxxxxxxxxxx    yyyyyyyyyyyyyyyyyyyyyyyy     4444444

$ target/debug/frawk '{print NR ":" $3;}' /tmp/fields.txt
1:111111
2:222222
3:3333333
4:4444444
# same output for release builds as well

As for why you had trouble shrinking this case further: frawk determines field and line boundaries for the default field splitter in large batches, sometimes for multiple lines in parallel. Bugs in the code that does the splitting might only show up when you feed it multiple lines of sufficient length.

This is on a Linux machine. Note that I've only tested this on x86; I plan to get an M1 machine eventually but haven't gotten around to it, and some of the whitespace-splitting code is pretty architecture-specific.
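One way to check a batched, architecture-specific splitter like the one described above is to compare its output against a simple line-at-a-time reference. The sketch below is such a reference, written under assumptions: the function name `nth_fields` is hypothetical and is not part of frawk, and `split_ascii_whitespace` only approximates awk's default field splitting.

```rust
/// Hypothetical reference implementation (not frawk's code).
/// Returns "NR:field" for the `n`-th (1-based) whitespace-separated
/// field of each line, roughly like awk's `{ print NR ":" $n }`.
/// A missing field, or n == 0, yields an empty field here.
fn nth_fields(input: &str, n: usize) -> Vec<String> {
    input
        .lines()
        .enumerate()
        .map(|(i, line)| {
            // Runs of spaces and tabs are treated as one separator.
            let field = n
                .checked_sub(1)
                .and_then(|k| line.split_ascii_whitespace().nth(k))
                .unwrap_or("");
            format!("{}:{}", i + 1, field)
        })
        .collect()
}

fn main() {
    // Short stand-in data; the lines in the report are much longer.
    let input = "xxx  yyy 111111\nxx    yy     222222\n";
    for record in nth_fields(input, 3) {
        println!("{}", record);
    }
}
```

Because this version handles one line at a time, it cannot exhibit bugs that depend on batch size or line length, which is what makes it useful as a baseline.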

learnbyexample commented on July 18, 2024

Oh ok. Is there something I can do to provide further debug details? This is what I used to install frawk version 0.4.1:

sudo apt install rustc # version 1.47.0
cargo install frawk --no-default-features --features use_jemalloc,allow_avx2

I'm on Ubuntu 20.04.1 LTS, x86_64.

ezrosent commented on July 18, 2024

Gotcha. I confirmed that that feature combination still works for me. I'll see about setting up a VM tonight or tomorrow with your exact OS and rustc version (I use rustup.rs to get more recent versions of rustc/cargo). I'd honestly be pretty surprised if this was causing an issue, but it shouldn't be too difficult to rule it out.

ezrosent commented on July 18, 2024

Hmmm, yeah. I get the exact same output I did before running a fresh VM with Ubuntu 20.04 with rustc 1.47 (albeit without jemalloc, but again I doubt that's the issue here).

The only other thing I can think of is if I'm not pasting the data correctly. Maybe post a hash of the data file, or put it in a gist?

learnbyexample commented on July 18, 2024

I purged rustc and cargo and did a fresh installation using the commands given before. Still the same error.

$ md5sum fields.txt
418e730b6b83f9400ea813042f9003a0  fields.txt

# not sure if this will help
$ md5sum frawk
02efa88db6aea820be1635b98640fc61  frawk

If it is possible, could you release a generic Linux binary or .deb file? I don't know Rust, so not sure if I'm making some mistake.

ezrosent commented on July 18, 2024

I got a docker setup to reliably produce binaries for ubuntu 20.04, posted here. Let me know if this helps.

learnbyexample commented on July 18, 2024

I tried both of the 20.04 variations; I get this error for anything I try:

$ ./frawk 'BEGIN{print "hi";}'
Illegal instruction (core dumped)

Also, I don't remember the option, but I thought there was a way to create an x86_64-unknown-linux-musl binary which would work on most Linux distros (instead of doing it for Ubuntu 20, 18, etc). I think if you can contact Andrew Gallant (ripgrep author) or perhaps ask on https://www.reddit.com/r/rust/ you'll get better info.

Here's one thread I found: https://www.reddit.com/r/rust/comments/eaa8f7/building_compatible_linux_binaries/

ezrosent commented on July 18, 2024

Thanks for the tips! I haven't been using cross because it wasn't obvious to me how to configure external dependencies like LLVM (but I do want to go back and confirm when I have more time). I have updated the release with a statically-linked musl binary. Let me know if that helps, and thanks again!

learnbyexample commented on July 18, 2024

I'm still getting Illegal instruction (core dumped) with the musl binary. I have been wondering if there's some issue with my machine. It would be good if someone else can try and see if it works for them. Otherwise we might try various options and still not figure out the issue because of some machine specific problem.

learnbyexample commented on July 18, 2024

I checked /proc/cpuinfo and avx isn't present in the flags. I certainly hope this helps to narrow down the issues.
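A check like this can also be scripted. The following is a minimal sketch, assuming the usual Linux x86 layout of /proc/cpuinfo in which a line starting with `flags` lists the CPU features separated by spaces; the helper name `cpuinfo_has_flag` is hypothetical. It matches whole words, so `avx` does not match `avx2`.

```rust
use std::fs;

/// Hypothetical helper: returns true if `flag` appears as a whole word
/// on a `flags` line of /proc/cpuinfo-style text.
fn cpuinfo_has_flag(cpuinfo: &str, flag: &str) -> bool {
    cpuinfo
        .lines()
        .filter(|line| line.starts_with("flags"))
        .filter_map(|line| line.split(':').nth(1))
        .any(|flags| flags.split_whitespace().any(|f| f == flag))
}

fn main() {
    // /proc/cpuinfo only exists on Linux; elsewhere this prints a note.
    match fs::read_to_string("/proc/cpuinfo") {
        Ok(text) => {
            for flag in ["sse2", "avx", "avx2"].iter() {
                println!("{}: {}", flag, cpuinfo_has_flag(&text, flag));
            }
        }
        Err(err) => println!("could not read /proc/cpuinfo: {}", err),
    }
}
```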

ezrosent commented on July 18, 2024

Update: I have a fix tested for the initial bug on the fix_non_avx2 branch. Feel free to give it a spin. I'll probably merge that branch later this week once I've improved test coverage for the other specialized splitters (bytes, csv, tsv).

learnbyexample commented on July 18, 2024

I haven't been able to compile from the git repository directly (maybe I should have asked about that before). On the fix_non_avx2 branch, I tried commands like cargo build --release --no-default-features and cargo install --path . --no-default-features but I get signal: 4, SIGILL: illegal instruction.

I'll wait for the crate to be updated, that works for me.

learnbyexample commented on July 18, 2024

Cool, thanks for the update. Take your time though :)

I don't know Rust, otherwise I'd have tried to help fix issues. This is a cool project and I'm interested in writing a blog post/book about it. Sometimes I see users on Stack Overflow working with GBs of data; they'll certainly benefit from a faster tool.

ezrosent commented on July 18, 2024

Alright. I've bumped the version to 0.4.2; I think if you cargo install as you did initially, things should work. I've opened #62 to track the remaining work.

learnbyexample commented on July 18, 2024

I'm getting this error:

  feature `resolver` is required

  consider adding `cargo-features = ["resolver"]` to the manifest

ezrosent commented on July 18, 2024

Some brief searching around suggests that this requires a more recent version of cargo, possibly due to the recent version bump for the cranelift crates. I'll try to confirm in a bit in a VM.

ezrosent commented on July 18, 2024

I've confirmed this is due to the cranelift version bump. It's not clear to me if there are any workarounds easier than installing a more recent version of cargo via rustup.

learnbyexample commented on July 18, 2024

I installed the newer version using rustup and it works 👍
