Comments (7)
I would suggest upgrading to version 4.1.7 first, since it includes a critical bug fix related to MPI tracing. Can you please try 4.1.7 and let us know whether the issue is fixed in the new version?
from extrae.
Unfortunately, it is not fixed; I get the same segmentation fault. Just let me know if there is anything else I can try (perhaps something along the lines of commenting out the call, i.e. //Backend_Flush_pThread (pthread_self());
as I did for the PTHREAD test).
Debugging the binary of the first MPI test with
jmorillo@arriesgado-6:~/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI$ gdb --args .libs/mpi_initfini_c_linked
I obtained the following:
(gdb) run
Starting program: /home/jmorillo/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI/.libs/mpi_initfini_c_linked
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
Welcome to Extrae 4.1.7
Extrae: Application has been linked or preloaded with Extrae, BUT neither EXTRAE_ON nor EXTRAE_CONFIG_FILE are set!
[Detaching after fork from child process 36899]
[New Thread 0x3ff4bff060 (LWP 36903)]
[New Thread 0x3ff41b6060 (LWP 36904)]
Thread 3 "mpi_initfini_c_" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ff41b6060 (LWP 36904)]
0x0000003ff7fe48a4 in do_lookup_x (undef_name=undef_name@entry=0x3ff7e36830 <__func__.6> "writev", new_hash=new_hash@entry=633298886, old_hash=old_hash@entry=0x3ff41b50b8, ref=0x0, result=result@entry=0x3ff41b50c8, scope=0x3ff7fff560, i=1, version=version@entry=0x0, flags=flags@entry=0, skip=skip@entry=0x3ff7fd72b0, type_class=type_class@entry=0, undef_map=undef_map@entry=0x3ff7fd72b0) at ./elf/dl-lookup.c:363
363 ./elf/dl-lookup.c: No such file or directory.
I hope this helps in understanding what the problem is. Printing a backtrace shows:
(gdb) bt
#0 0x000000155555d8a4 in do_lookup_x (undef_name=undef_name@entry=0x1555619830 <__func__.6> "writev", new_hash=new_hash@entry=633298886,
old_hash=old_hash@entry=0x1558ba00b8, ref=0x0, result=result@entry=0x1558ba00c8, scope=0x1555578560, i=1, version=version@entry=0x0, flags=flags@entry=0,
skip=skip@entry=0x155557d2b0, type_class=type_class@entry=0, undef_map=undef_map@entry=0x155557d2b0) at ./elf/dl-lookup.c:363
#1 0x000000155555e008 in _dl_lookup_symbol_x (undef_name=0x1555619830 <__func__.6> "writev", undef_map=0x155557d2b0, ref=0x1558ba0188, symbol_scope=<optimized out>,
version=0x0, type_class=<optimized out>, flags=<optimized out>, skip_map=0x155557d2b0) at ./elf/dl-lookup.c:860
#2 0x00000015558acf40 in do_sym (handle=<optimized out>, name=0x1555619830 <__func__.6> "writev", who=0x15555c4356 <writev+84>, vers=vers@entry=0x0, flags=flags@entry=2)
at ./elf/dl-sym.c:146
#3 0x00000015558ad118 in _dl_sym (handle=<optimized out>, name=<optimized out>, who=<optimized out>) at ./elf/dl-sym.c:195
#4 0x0000001555828bbc in dlsym_doit (a=a@entry=0x1558ba04b8) at ./dlfcn/dlsym.c:40
#5 0x00000015558ac86e in __GI__dl_catch_exception (exception=exception@entry=0x1558ba03f0, operate=0x1555828baa <dlsym_doit>, args=0x1558ba04b8)
at ./elf/dl-error-skeleton.c:208
#6 0x00000015558ac8fc in __GI__dl_catch_error (objname=0x1558ba0458, errstring=0x1558ba0460, mallocedp=0x1558ba0457, operate=<optimized out>, args=<optimized out>)
at ./elf/dl-error-skeleton.c:227
#7 0x0000001555828776 in _dlerror_run (operate=operate@entry=0x1555828baa <dlsym_doit>, args=args@entry=0x1558ba04b8) at ./dlfcn/dlerror.c:138
#8 0x0000001555828c1a in dlsym_implementation (dl_caller=<optimized out>, name=0x1555619830 <__func__.6> "writev", handle=0xffffffffffffffff) at ./dlfcn/dlsym.c:54
#9 ___dlsym (handle=handle@entry=0xffffffffffffffff, name=name@entry=0x1555619830 <__func__.6> "writev") at ./dlfcn/dlsym.c:68
#10 0x00000015555c4356 in writev (fd=<optimized out>, iov=0x1558ba0548, iovcnt=<optimized out>) at io_wrapper.c:1188
#11 0x0000001558adb74e in pmix_ptl_base_send_handler () from /opt/pmix/4.2.0/lib/libpmix.so.2
#12 0x0000001555db522c in ?? () from /home/jmorillo/arriesgado-jammy/extrae-4.1.7/src/tracer/.libs/libmpitrace-4.1.7.so
Can you please try reconfiguring Extrae with the flag "--disable-instrument-io"? The backtrace suggests there is a problem intercepting the "writev" system call. This option disables the instrumentation of the whole family of I/O system calls, so this test will help us isolate the problem.
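For readers unfamiliar with this kind of interception: frames #9 and #10 of the backtrace show the writev wrapper (io_wrapper.c:1188) calling dlsym to locate the real writev before forwarding the call, and the crash happens inside that lookup. The sketch below illustrates the same "resolve the real symbol lazily, then forward" pattern in Python with ctypes. It is only an analogy, not Extrae's code: the names IoVec, traced_writev and _resolve_writev are hypothetical, and ctypes looks the symbol up through the default handle rather than through RTLD_NEXT as the C wrapper does.

```python
import ctypes
import os


class IoVec(ctypes.Structure):
    """Mirror of the C `struct iovec` declared in <sys/uio.h>."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


_real_writev = None  # resolved on first use, like a lazily initialised wrapper


def _resolve_writev():
    """Look up the real writev in the already-loaded C library (hypothetical helper)."""
    global _real_writev
    if _real_writev is None:
        libc = ctypes.CDLL(None, use_errno=True)  # handle for the running process
        fn = libc.writev                          # the symbol lookup happens here
        fn.argtypes = [ctypes.c_int, ctypes.POINTER(IoVec), ctypes.c_int]
        fn.restype = ctypes.c_ssize_t
        _real_writev = fn
    return _real_writev


def traced_writev(fd, chunks):
    """Forward a list of bytes objects to the real writev.

    A tracer would record an event before and after the forwarded call.
    """
    real = _resolve_writev()
    buffers = [ctypes.create_string_buffer(c, len(c)) for c in chunks]
    iov = (IoVec * len(buffers))()
    for i, buf in enumerate(buffers):
        iov[i].iov_base = ctypes.addressof(buf)
        iov[i].iov_len = len(buf)
    written = real(fd, iov, len(buffers))
    if written < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return written


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    traced_writev(write_end, [b"hello ", b"world"])
    print(os.read(read_end, 64))
```

If the symbol lookup itself faults, as in the backtrace, the wrapper never reaches the forwarded call, which is why disabling the I/O wrappers is a reasonable way to check whether they are the cause.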
Reconfiguring with "--disable-instrument-io" did not solve the issue (still a segmentation fault):
FAIL: mpi_initfini_c_linked_1proc.sh
====================================
Welcome to Extrae 4.1.7
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on /home/harald/aplic/extrae/3.3.0rc
Extrae: Generating intermediate files for Paraver traces.
Extrae: <counters> tag at <MPI> level will be ignored. This library does not support CPU HW counters.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/jmorillo/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads
./trace-static.sh: line 9: 1822586 Segmentation fault (core dumped) $*
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[54668,1],0]
Exit code: 139
--------------------------------------------------------------------------
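As a side note on the "Exit code: 139" line above: assuming the usual POSIX shell convention that a process killed by signal N is reported with status 128 + N, 139 corresponds to signal 11, i.e. SIGSEGV, which matches the "Segmentation fault" message printed by trace-static.sh. A minimal sketch of that arithmetic (describe_exit_code is a hypothetical helper name, not part of Extrae or Open MPI):

```python
import signal


def describe_exit_code(code):
    """Interpret an exit status using the shell's 128 + signal-number convention."""
    if code > 128:
        try:
            return "terminated by " + signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return "exited with status %d" % code


if __name__ == "__main__":
    print(describe_exit_code(139))  # terminated by SIGSEGV
```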
The backtrace looks different, though...
jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI$ gdb --args .libs/mpi_initfini_c_linked
Reading symbols from .libs/mpi_initfini_c_linked...
(gdb) run
Starting program: /home/jmorillo/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI/.libs/mpi_initfini_c_linked
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
Welcome to Extrae 4.1.7
Extrae: Warning! EXTRAE_HOME has not been defined!.
Extrae: Generating intermediate files for Paraver traces.
Extrae: Intermediate files will be stored in /home/jmorillo/arriesgado-jammy/extrae-4.1.7/tests/functional/tracer/MPI
Extrae: Tracing buffer can hold 500000 events
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads
[Detaching after fork from child process 1823937]
[New Thread 0x3fed744080 (LWP 1823941)]
[New Thread 0x3feccfb080 (LWP 1823942)]
Thread 3 "mpi_initfini_c_" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3feccfb080 (LWP 1823942)]
0x0000003ff7fe4adc in do_lookup_x (undef_name=undef_name@entry=0x3fed57b7be "pmix_bfrops_base_unpack", new_hash=new_hash@entry=2551185065, old_hash=old_hash@entry=0x3feccfa148, ref=0x3fed570e88, result=result@entry=0x3feccfa158, scope=<optimized out>, i=24, version=version@entry=0x0, flags=flags@entry=5, skip=skip@entry=0x0, type_class=type_class@entry=1, undef_map=undef_map@entry=0x2aaab1a9d0) at ./elf/dl-lookup.c:431
431 ./elf/dl-lookup.c: No such file or directory.
(gdb) bt
#0 0x0000003ff7fe4adc in do_lookup_x (undef_name=undef_name@entry=0x3fed57b7be "pmix_bfrops_base_unpack", new_hash=new_hash@entry=2551185065,
old_hash=old_hash@entry=0x3feccfa148, ref=0x3fed570e88, result=result@entry=0x3feccfa158, scope=<optimized out>, i=24, version=version@entry=0x0, flags=flags@entry=5,
skip=skip@entry=0x0, type_class=type_class@entry=1, undef_map=undef_map@entry=0x2aaab1a9d0) at ./elf/dl-lookup.c:431
#1 0x0000003ff7fe5008 in _dl_lookup_symbol_x (undef_name=0x3fed57b7be "pmix_bfrops_base_unpack", undef_map=undef_map@entry=0x2aaab1a9d0, ref=ref@entry=0x3feccfa210,
symbol_scope=<optimized out>, version=0x0, type_class=type_class@entry=1, flags=<optimized out>, skip_map=skip_map@entry=0x0) at ./elf/dl-lookup.c:860
#2 0x0000003ff7fe903c in _dl_fixup (l=0x2aaab1a9d0, reloc_arg=<optimized out>) at ./elf/dl-runtime.c:95
#3 0x0000003ff7fea88e in _dl_runtime_resolve () at ../sysdeps/riscv/dl-trampoline.S:61
Backtrace stopped: frame did not save the PC
In case you get the chance to try it, the problem is pretty easy to reproduce:
jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7$ module list
No Modulefiles Currently Loaded.
jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7$ ./configure --with-mpi=/apps/riscv/ubuntu/openmpi/4.1.5_gcc11.3.0/ --without-unwind --with-xml=/home/jmorillo/arriesgado-jammy/libxml2-v2.11.8-install --without-papi --enable-posix-clock --disable-instrument-io
jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7$ make
jmorillo@arriesgado-5:~/arriesgado-jammy/extrae-4.1.7$ make check