Giter VIP home page Giter VIP logo

Comments (23)

plicease avatar plicease commented on July 18, 2024 1

Perhaps the FFI::Platypus closures are not thread safe? It could also be GDAL that is not thread safe.

It's probably only safe if the Perl is threaded and the os calling thread also has a Perl interpreter on it.

There is a test here:

https://github.com/PerlFFI/FFI-Platypus/blob/81d8c6910287aa61170f51f5cc261657f3ed1179/t/threads.t

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

It might help to use a state variable to declare $instance inside the new sub, rather than a package lexical as currently used.

Possibly related: #5

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

Although using a state var would require a reasonable amount of re-engineering now that I look.

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

It is not evident for me why a singleton in this case needs to be used, but for efficiency (maybe is it related to FFI implementaton?). The problem here is the use of new() in the BEGIN section that has side effects, not the singleton per se, anyway.
Eventually, I would consider the IO::Socket::SSL implementation case which has the same type of problems in initialization (needs to be done once in the main thread and never in other ones). In my code I experienced thread issues with HTTP::Tiny, but I solved by simply adopting a closure with a new() per thread even with SSL connections, for instance. Unfortunately that implementation needs to consider explicitly threads case if evailable at Config level.

from geo-gdal-ffi.

ajolma avatar ajolma commented on July 18, 2024

I guess the singleton is just for convenience, so that the user does not need to declare and initialize it.

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

Just for reference, I added the following patch to the Debian package for Geo::GDAL::FFI, currently available in unstable:

https://salsa.debian.org/perl-team/modules/packages/libgeo-gdal-ffi-perl/-/blob/debian/latest/debian/patches/threads-fix.patch

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

Sigh, the patch seems to mitigate the problem, but it does not solve the issue entirely. I just saw the same issue appearing again randomly in my code. It Is definitively an heisenbug. It disappears by not initializing the $instance in the FFI.pm module, but I'm not sure again if it is a true solution. I will try to dive into the problem better.

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

Are you able to provide test code to reproduce?

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

Are you able to provide test code to reproduce?

The code is public on the net in our official GitLab instance, but it is too complex for a test case. I could try to prepare something that forces the failure and possibly segfaults (it happens, too). Maybe reinstating the singleton in the BEGIN section could suffice.

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

Can you provide a link to the code?

Also, is Geo::GDAL::FFI loaded before or after Thread::Queue in your code?

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

I managed to write a simple script that fails with Perl 5.36 here:

snippet.pl

$ /tmp/snippet.pl 
thread in1: pushed 36.0724123035656
thread out1: popped 36.0724123035656
thread in2: pushed 56.6034915697859
thread out3: popped 56.6034915697859
thread in3: pushed 21.4827805831195
thread out5: popped 21.4827805831195
thread in4: pushed 58.0341478650556
thread out1: popped 58.0341478650556
thread in5: pushed 94.7460995362238
thread out4: popped 94.7460995362238
Scalars leaked: -3
Attempt to free unreferenced scalar: SV 0x55eee135c618, Perl interpreter: 0x55eee3f44de0 during global destruction.
Attempt to free unreferenced scalar: SV 0x55eee135c5d0, Perl interpreter: 0x55eee3f44de0 during global destruction.
Attempt to free unreferenced scalar: SV 0x55eee135c510, Perl interpreter: 0x55eee3f44de0 during global destruction.
Annullato

This is with the initialization of $instance in the BEGIN section, which is the code currently in the master branch. Changing the sleeping time or even repeating the run more times in a row, one can get only the Scalars leaked or even the other three messages.
Of course, order of the use clauses is not important. A constant is the number (3) of leaks. Typically the script crashes at the first join() in the main thread.

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

Windows gives a "free to wrong pool" error. gdb backtrace is below.

I wonder if this is something in FFI::Platypus?

(gdb) bt
#0  0x00007ffed33847ab in perl538!Perl_drand48_r () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#1  0x00007fff141e2ccb in ?? () from C:\perls\5.38.0.1_PDL\perl\vendor\lib\auto\FFI\Platypus\Platypus.xs.dll
#2  0x00007ffed3322fec in perl538!Perl_clear_defarray () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#3  0x00007ffed3377932 in perl538!Perl_runops_standard () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#4  0x00007ffed332d2ad in perl538!Perl_call_sv () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#5  0x00007ffed32da70b in perl538!Perl_newRV () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#6  0x00007ffed32da91d in perl538!Perl_sv_clear () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#7  0x00007ffed32db0a9 in perl538!Perl_sv_free2 () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#8  0x00007ffed32d889e in perl538!Perl_rcpv_copy () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#9  0x00007ffed32db341 in perl538!Perl_sv_free2 () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#10 0x00007ffed332f926 in perl_destruct () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#11 0x00007fff1d2d1693 in ?? () from C:\perls\5.38.0.1_PDL\perl\lib\auto\threads\threads.xs.dll
#12 0x00007fff1d2d2c20 in ?? () from C:\perls\5.38.0.1_PDL\perl\lib\auto\threads\threads.xs.dll
#13 0x00007ffed3322fec in perl538!Perl_clear_defarray () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#14 0x00007ffed3377932 in perl538!Perl_runops_standard () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#15 0x00007ffed333363a in perl_run () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#16 0x00007ffed3386e23 in perl538!RunPerl () from C:\perls\5.38.0.1_PDL\perl\bin\perl538.dll
#17 0x00007ff79d751340 in ?? ()
#18 0x00007ff79d751146 in ?? ()
#19 0x00007fff3d5f257d in KERNEL32!BaseThreadInitThunk () from C:\WINDOWS\System32\kernel32.dll
#20 0x00007fff3ebaaa78 in ntdll!RtlUserThreadStart () from C:\WINDOWS\SYSTEM32\ntdll.dll
#21 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

And FWIW, adding this line to the create sub gets things to work.

eval 'require Geo::GDAL::FFI';

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024
eval 'require Geo::GDAL::FFI';

Ah, that was exactly the magic illusion that moving the initialization out of the BEGIN block gave to me before. Unfortunately, it could reappear later on, when it hurts more :-) What is evident is that it does not relate directly to some initialization done for every Perl thread because the magic number (3) and the number of threads do not coincide. To be safe I would try to move the init phase only into the main thread.

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

Unfortunately, it could reappear later on, when it hurts more :-)

A pity.

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

I found that apparently after moving off the BEGIN block the new() sub, no errors appear after commenting out

#$ffi->load_custom_type('::StringPointer' => 'string_pointer');

and all related attaches. I guess the whole problem is due to some oddities in FFI::Platypus that are still obscure for me.
PS:

NOTE: As of version 0.61, this custom type is now deprecated since pointers to strings are supported in the [FFI::Platypus](https://metacpan.org/pod/FFI::Platypus) directly without custom types.

This module provides a [FFI::Platypus](https://metacpan.org/pod/FFI::Platypus) custom type for pointers to strings.

This is in FFI::Platypus::Type::StringPointer

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

Wow! Apparently I got it. It is enough to use string* instead of string_pointer (and remove the load_custom_type) to have the master working with the snippet code, even with the original BEGIN block. \o/

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

That's good news.

Would you be able to submit a new PR?

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

@plicease - FYI.

It's a longish thread but #58 (comment) and #58 (comment) have the key diagnoses.

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

Although I still get the free to wrong pool error using Strawberry Perl 5.38 and 5.32 after I replace string_pointer with string* and comment out the load_custom_type line.

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

Although I still get the free to wrong pool error using Strawberry Perl 5.38 and 5.32 after I replace string_pointer with string* and comment out the load_custom_type line.

Yep, the last step is understanding what's wrong with the SetErrorHandling() call. For instance, if you comment out that step and move it after the thread creation, the whole script works. The string_pointer use instead seems the cause of the reference leaking. I will do some other trials because thread issues are always challenging with timing.

from geo-gdal-ffi.

shawnlaffan avatar shawnlaffan commented on July 18, 2024

Commenting out the CPLPushErrorHandler avoids the crash.

CPLPushErrorHandler($instance->{CPLErrorHandler});

Perhaps the FFI::Platypus closures are not thread safe? It could also be GDAL that is not thread safe.

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

GDAL is mostly thread-safe. I already tried with CPLPushErrorHandler (only main thread) and CPLSetErrorHandler (all threads) and sticking of the closure (as strongly suggested in the doc): no results. Definitively I guess it can be called only and strictly after threads creation. See for instance, my fresh t/threads.t for a PR ;-)
here
Of course, that needs to be documented. That said, I'm still confused about why and how the old binding deals with that (probably @ajolma can enlight us on that). That's because I'm currently using the same code in production with the old binding in Debian 11 and it works like a charm.
And talking about that, this is the piece of code where I found problems with multi-threading.

from geo-gdal-ffi.

fpl avatar fpl commented on July 18, 2024

Ok, if it is acceptable, I'm going to close PR #69 and create two new PRs: one for removing the string_pointer use and the other for thread-safety that introduces the init() and modifies the new() into a per-thread method. All with appropriate changes to documentation, too.

EDIT: in the meantime, I verified that the string_pointer use is the cause of the SV leaks in multi-thread context (and possibly other cases), so that is also a prereq to restructure the code for thread safety.

from geo-gdal-ffi.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.