Comments (5)

haesbaert commented on May 26, 2024

I've tested the kernel patch from axboe in axboe/liburing#665 (comment) and indeed it fixes the bug.

from eio.

haesbaert commented on May 26, 2024

Sorry, I missed the original message. Yes, it hangs for me, and rather quickly:

RUN 579
Testing `eio_linux'.
This run has ID `26FSRR2T'.

  ...           io          0   copy.

with

sam:eio: local i=0;while true; do echo RUN $i; i=$((i+1)); ./_build/default/lib_eio_linux/tests/test.exe;done
Linux sam 5.18.16-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 03 Aug 2022 11:25:04 +0000 x86_64 GNU/Linux

haesbaert commented on May 26, 2024

I reduced the hanging program as much as I could, to the code below. I think my initial impression is correct: somehow the EOF is never seen by the reader.

let test_copy () =
  Eio_linux.run ~queue_depth:10 @@ fun _stdenv ->
  Eio.Switch.run @@ fun sw ->
  let from_pipe, to_pipe = Eio_linux.pipe sw in
  let buffer = Cstruct.create 20 in
  (* Write one byte, then close the writer before any read is queued. *)
  Eio.Flow.copy (Eio.Flow.string_source "a") to_pipe;
  Eio.Flow.close to_pipe;
  (* Read until EOF; the hang occurs when End_of_file is never raised. *)
  let () =
    try
      while true do
        ignore (Eio.Flow.read from_pipe buffer)
      done
    with End_of_file -> ()
  in
  Eio.Flow.close from_pipe

strace here: https://gist.github.com/haesbaert/437fd9e30e4568cc3f5ba95f0387d63a

The writer is FD 6, which is actually closed during the hang (checked by looking at /proc/foo); FD 5 (the reader) is still open, and we are blocked in io_uring_wait_cqe().

The pattern I see is that if the close happens before a readv is queued, the readv sometimes never sees EOF. If the readv is submitted before the close, it always sees EOF. I've tested this on two machines with slightly different kernels (5.18 and 5.19), with both the released and the current code bases of eio and uring; the behaviour is the same.
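
For comparison, here is a minimal sketch of the same write/close/read ordering using plain blocking read(2) instead of io_uring. The helper name drain_after_close is hypothetical, and this is not the program from the gist. Under ordinary pipe semantics the reader gets the single byte and then a 0-byte read (EOF), which is the completion the io_uring readv sometimes never delivers in the failing case.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not from the gist): write one byte to a pipe,
 * close the write end, then read with blocking read(2) until EOF.
 * Returns the number of bytes read before EOF, or -1 on error. */
long drain_after_close(void)
{
    int fds[2];
    char buf[20];
    long total = 0;

    if (pipe(fds) != 0)
        return -1;

    /* Same order as the reduced test: write, close the writer, then read. */
    if (write(fds[1], "a", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);

    for (;;) {
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) {
            total += n;            /* data still buffered in the pipe */
        } else if (n == 0) {
            break;                 /* EOF: no writers remain */
        } else if (errno != EINTR) {
            close(fds[0]);
            return -1;             /* real read error */
        }
    }

    close(fds[0]);
    return total;
}
```

Calling drain_after_close() from a main() and looping it, as with the other reproducers, should never block, since the write end is already closed before the first read.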

My only theory for why you can't trigger the bug is that in your tests the writer/reader dance always ends with the close happening after the read is queued; the order really depends on which CQE comes back first in the Fiber.both() tests. The program above should always produce the bad ordering, and I can hang it in under 5 seconds.

Tomorrow I want to peek at the uring stats (dropped requests and whatnot). I'll also write the equivalent in C and try to trigger it.

At this point, it smells like a kernel bug though.

haesbaert commented on May 26, 2024

I can confirm the bug with a simple C program: https://gist.github.com/haesbaert/10d3e3bb5fa9171dfcf65e1f5b58e95c
The non-blocking version works. Build and run in a loop with:

cc -o uring_tests uring_tests.c -Wall -luring && while true;do date; ./uring_tests;done

talex5 commented on May 26, 2024

I can confirm the bug with a simple C program

OK, that hangs for me after a while too! Using Linux 5.19.9.
