
Comments (7)

gllort commented on August 11, 2024

I would suggest upgrading to version 4.1.7 first; it includes a critical bug fix related to MPI tracing. Can you please try 4.1.7 and let us know whether the issue is fixed in the new version?

from extrae.

julianmorillo commented on August 11, 2024

Unfortunately, it is not fixed; it is the same segmentation fault. Just let me know if there is anything else I can try (perhaps along the lines of commenting out //Backend_Flush_pThread (pthread_self()); as I did for the PTHREAD test).

julianmorillo commented on August 11, 2024

Debugging the binary of the first MPI test with

jmorillo@arriesgado-6:~/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI$ gdb --args .libs/mpi_initfini_c_linked

I got the following:

(gdb) run
Starting program: /home/jmorillo/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI/.libs/mpi_initfini_c_linked
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
Welcome to Extrae 4.1.7
Extrae: Application has been linked or preloaded with Extrae, BUT neither EXTRAE_ON nor EXTRAE_CONFIG_FILE are set!
[Detaching after fork from child process 36899]
[New Thread 0x3ff4bff060 (LWP 36903)]
[New Thread 0x3ff41b6060 (LWP 36904)]

Thread 3 "mpi_initfini_c_" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ff41b6060 (LWP 36904)]
0x0000003ff7fe48a4 in do_lookup_x (undef_name=undef_name@entry=0x3ff7e36830 <__func__.6> "writev", new_hash=new_hash@entry=633298886, old_hash=old_hash@entry=0x3ff41b50b8, ref=0x0, result=result@entry=0x3ff41b50c8, scope=0x3ff7fff560, i=1, version=version@entry=0x0, flags=flags@entry=0, skip=skip@entry=0x3ff7fd72b0, type_class=type_class@entry=0, undef_map=undef_map@entry=0x3ff7fd72b0) at ./elf/dl-lookup.c:363
363     ./elf/dl-lookup.c: No such file or directory.

I hope this helps in understanding the problem. Printing a backtrace shows:

(gdb) bt
#0  0x000000155555d8a4 in do_lookup_x (undef_name=undef_name@entry=0x1555619830 <__func__.6> "writev", new_hash=new_hash@entry=633298886,
    old_hash=old_hash@entry=0x1558ba00b8, ref=0x0, result=result@entry=0x1558ba00c8, scope=0x1555578560, i=1, version=version@entry=0x0, flags=flags@entry=0,
    skip=skip@entry=0x155557d2b0, type_class=type_class@entry=0, undef_map=undef_map@entry=0x155557d2b0) at ./elf/dl-lookup.c:363
#1  0x000000155555e008 in _dl_lookup_symbol_x (undef_name=0x1555619830 <__func__.6> "writev", undef_map=0x155557d2b0, ref=0x1558ba0188, symbol_scope=<optimized out>,
    version=0x0, type_class=<optimized out>, flags=<optimized out>, skip_map=0x155557d2b0) at ./elf/dl-lookup.c:860
#2  0x00000015558acf40 in do_sym (handle=<optimized out>, name=0x1555619830 <__func__.6> "writev", who=0x15555c4356 <writev+84>, vers=vers@entry=0x0, flags=flags@entry=2)
    at ./elf/dl-sym.c:146
#3  0x00000015558ad118 in _dl_sym (handle=<optimized out>, name=<optimized out>, who=<optimized out>) at ./elf/dl-sym.c:195
#4  0x0000001555828bbc in dlsym_doit (a=a@entry=0x1558ba04b8) at ./dlfcn/dlsym.c:40
#5  0x00000015558ac86e in __GI__dl_catch_exception (exception=exception@entry=0x1558ba03f0, operate=0x1555828baa <dlsym_doit>, args=0x1558ba04b8)
    at ./elf/dl-error-skeleton.c:208
#6  0x00000015558ac8fc in __GI__dl_catch_error (objname=0x1558ba0458, errstring=0x1558ba0460, mallocedp=0x1558ba0457, operate=<optimized out>, args=<optimized out>)
    at ./elf/dl-error-skeleton.c:227
#7  0x0000001555828776 in _dlerror_run (operate=operate@entry=0x1555828baa <dlsym_doit>, args=args@entry=0x1558ba04b8) at ./dlfcn/dlerror.c:138
#8  0x0000001555828c1a in dlsym_implementation (dl_caller=<optimized out>, name=0x1555619830 <__func__.6> "writev", handle=0xffffffffffffffff) at ./dlfcn/dlsym.c:54
#9  ___dlsym (handle=handle@entry=0xffffffffffffffff, name=name@entry=0x1555619830 <__func__.6> "writev") at ./dlfcn/dlsym.c:68
#10 0x00000015555c4356 in writev (fd=<optimized out>, iov=0x1558ba0548, iovcnt=<optimized out>) at io_wrapper.c:1188
#11 0x0000001558adb74e in pmix_ptl_base_send_handler () from /opt/pmix/4.2.0/lib/libpmix.so.2
#12 0x0000001555db522c in ?? () from /home/jmorillo/arriesgado-jammy/extrae-4.1.7/src/tracer/.libs/libmpitrace-4.1.7.so

gllort commented on August 11, 2024

Can you please try reconfiguring Extrae with the flag "--disable-instrument-io"? The backtrace suggests there is a problem intercepting the "writev" system call. This option disables instrumentation of the whole family of I/O system calls, and this test will help us isolate the problem.
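For context on what "intercepting writev" means here: frames #8 to #10 of the backtrace show the writev wrapper in Extrae's io_wrapper.c calling dlsym with handle 0xffffffffffffffff, which is how gdb prints glibc's RTLD_NEXT ((void *) -1), to look up the real writev. The following is a rough sketch of that general interposition pattern, assuming a glibc system. It is not Extrae's actual code. The file name, the writev_calls counter, and the lazy first-call lookup are hypothetical choices made for this illustration.

```c
/* writev_wrap.c: hypothetical minimal writev interposer (NOT Extrae's code). */
#define _GNU_SOURCE            /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/* Signature of the libc function being shadowed. */
typedef ssize_t (*writev_fn)(int, const struct iovec *, int);

/* Calls seen by the wrapper; stands in for the events a tracer would emit. */
long writev_calls = 0;

ssize_t writev(int fd, const struct iovec *iov, int iovcnt)
{
    /* Resolved lazily on the first call, similar to the lookup visible
       inside writev in the backtrace. Not thread-safe as written: a real
       tracer must guard this, because the first caller may be a helper
       thread (in the trace above it was the PMIx send handler). */
    static writev_fn real_writev = NULL;

    if (real_writev == NULL) {
        /* RTLD_NEXT: find the next "writev" after this object in the
           library search order, i.e. the libc implementation. */
        real_writev = (writev_fn) dlsym(RTLD_NEXT, "writev");
        if (real_writev == NULL) {
            fprintf(stderr, "writev wrapper: dlsym failed: %s\n", dlerror());
            abort();
        }
    }

    writev_calls++;                          /* "enter" event would go here */
    ssize_t ret = real_writev(fd, iov, iovcnt);
    /* "exit" event would go here */
    return ret;
}
```

Built as a shared object (for example gcc -shared -fPIC -o libwrap.so writev_wrap.c -ldl) and loaded with LD_PRELOAD, dynamically bound writev calls in the target pass through the wrapper before reaching libc. Any fault inside the dlsym lookup therefore surfaces inside the wrapper, which matches the shape of the trace above.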

julianmorillo commented on August 11, 2024

Reconfiguring with "--disable-instrument-io" did not solve the issue (still a segmentation fault):

FAIL: mpi_initfini_c_linked_1proc.sh
====================================

Welcome to Extrae 4.1.7
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on /home/harald/aplic/extrae/3.3.0rc
Extrae: Generating intermediate files for Paraver traces.
Extrae: <counters> tag at <MPI> level will be ignored. This library does not support CPU HW counters.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/jmorillo/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

./trace-static.sh: line 9: 1822586 Segmentation fault      (core dumped) $*
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[54668,1],0]
  Exit code:    139
--------------------------------------------------------------------------

julianmorillo commented on August 11, 2024

The backtrace looks different, though...

jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI$ gdb --args .libs/mpi_initfini_c_linked
Reading symbols from .libs/mpi_initfini_c_linked...
(gdb) run
Starting program: /home/jmorillo/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI/.libs/mpi_initfini_c_linked
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
Welcome to Extrae 4.1.7
Extrae: Warning! EXTRAE_HOME has not been defined!.
Extrae: Generating intermediate files for Paraver traces.
Extrae: Intermediate files will be stored in /home/jmorillo/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI
Extrae: Tracing buffer can hold 500000 events
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

[Detaching after fork from child process 1823937]
[New Thread 0x3fed744080 (LWP 1823941)]
[New Thread 0x3feccfb080 (LWP 1823942)]

Thread 3 "mpi_initfini_c_" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3feccfb080 (LWP 1823942)]
0x0000003ff7fe4adc in do_lookup_x (undef_name=undef_name@entry=0x3fed57b7be "pmix_bfrops_base_unpack", new_hash=new_hash@entry=2551185065, old_hash=old_hash@entry=0x3feccfa148, ref=0x3fed570e88, result=result@entry=0x3feccfa158, scope=<optimized out>, i=24, version=version@entry=0x0, flags=flags@entry=5, skip=skip@entry=0x0, type_class=type_class@entry=1, undef_map=undef_map@entry=0x2aaab1a9d0) at ./elf/dl-lookup.c:431
431     ./elf/dl-lookup.c: No such file or directory.
(gdb) bt
#0  0x0000003ff7fe4adc in do_lookup_x (undef_name=undef_name@entry=0x3fed57b7be "pmix_bfrops_base_unpack", new_hash=new_hash@entry=2551185065,
    old_hash=old_hash@entry=0x3feccfa148, ref=0x3fed570e88, result=result@entry=0x3feccfa158, scope=<optimized out>, i=24, version=version@entry=0x0, flags=flags@entry=5,
    skip=skip@entry=0x0, type_class=type_class@entry=1, undef_map=undef_map@entry=0x2aaab1a9d0) at ./elf/dl-lookup.c:431
#1  0x0000003ff7fe5008 in _dl_lookup_symbol_x (undef_name=0x3fed57b7be "pmix_bfrops_base_unpack", undef_map=undef_map@entry=0x2aaab1a9d0, ref=ref@entry=0x3feccfa210,
    symbol_scope=<optimized out>, version=0x0, type_class=type_class@entry=1, flags=<optimized out>, skip_map=skip_map@entry=0x0) at ./elf/dl-lookup.c:860
#2  0x0000003ff7fe903c in _dl_fixup (l=0x2aaab1a9d0, reloc_arg=<optimized out>) at ./elf/dl-runtime.c:95
#3  0x0000003ff7fea88e in _dl_runtime_resolve () at ../sysdeps/riscv/dl-trampoline.S:61
Backtrace stopped: frame did not save the PC

julianmorillo commented on August 11, 2024

In case you get the chance, it is pretty easy to reproduce:

jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7$ module list
No Modulefiles Currently Loaded.
jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7$ ./configure --with-mpi=/apps/riscv/ubuntu/openmpi/4.1.5_gcc11.3.0/ --without-unwind --with-xml=/home/jmorillo/arriesgado-jammy/libxml2-v2.11.8-install --without-papi --enable-posix-clock --disable-instrument-io
jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7$ make
jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7$ make check
