Comments (15)
I will open a bug upstream in GCC for P2.
The issue was fixed upstream for GCC14, see:
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111413#c2
- https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=15345980633c502f0486a2e40e96224f49134130
- gcc-mirror/gcc@1534598
The patch is huge, but ignoring the indentation changes it can be summarized as a single-line change, which may be better suited for a backport since it reduces the risk of patch conflicts.
from ctng-compilers-feedstock.
OK, sorry about that. I followed your "downstream issue" a bit, that's why I got to ipopt. If this happens purely with python+libgomp, then I'm more stumped (I thought it was something about setting/parsing the OMP_* options). I fail to imagine how the commit you referenced would touch the ABI, but perhaps that's the case. Might be interesting to rebuild python with gcc 13 to see if that changes anything?
from ctng-compilers-feedstock.
I found another issue that contains a segfault in libgomp's initialize_env() weechat/weechat#2009 , if I got it correctly it happens again with libgomp 13.2.0 , but with Fedora 39.
The issue here seems to happen even with PHP. I wonder if it happens in general when using dlopen
I tested with casadi, and the issue did not happen when using a simple C++ example (I tested https://github.com/casadi/casadi/blob/main/docs/examples/cplusplus/ipopt_nl.cpp).
from ctng-compilers-feedstock.
Ok, I think this is the combination of two different behaviours/problems:
- P1: constructors of a shared library opened by dlopen with RTLD_DEEPBIND from Python on conda-forge/Debian see environ==NULL
  - I am not sure why this happens, and whether it is expected behaviour or a bug
- P2: libgomp >= 13 segfaults if environ==NULL
  - I think this behaviour is a bug in libgomp, as environ==NULL is a valid state, for example caused by calling clearenv() on Linux.
P2 can be reproduced easily on libgomp >= 13 with this MWE:
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* On glibc, clearenv() leaves environ == NULL. */
    clearenv();
    /* libgomp's constructor runs inside dlopen(); with libgomp >= 13 the segfault happens here. */
    void *handle = dlopen("libgomp.so.1", RTLD_NOW);
    if (handle) {
        fprintf(stderr, "dlopen of libgomp.so.1 done correctly.\n");
        return EXIT_SUCCESS;
    }
    fprintf(stderr, "dlopen of libgomp.so.1 failed with error: %s.\n", dlerror());
    return EXIT_FAILURE;
}
to run:
gcc test_gomp_segfault.c -o test_gomp_segfault -ldl
./test_gomp_segfault
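For comparison, here is a minimal sketch of an environment walk that tolerates environ==NULL. It is not libgomp's code and not the upstream patch; `count_env_entries` is a hypothetical helper name, and the sketch assumes glibc, where clearenv() sets environ to NULL.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes clearenv() and setenv() on glibc */
#endif
#include <stddef.h>
#include <stdlib.h>

extern char **environ;

/* Hypothetical helper (not libgomp code): count the entries in the process
   environment, treating environ == NULL as an empty environment. */
size_t count_env_entries(void)
{
    size_t n = 0;

    /* Without this guard, the loop below dereferences a NULL pointer. */
    if (environ == NULL)
        return 0;

    for (char **env = environ; *env != NULL; env++)
        n++;

    return n;
}
```

Calling `count_env_entries()` after `clearenv()` returns 0 instead of crashing, which is the behaviour one would expect from a library constructor in that state.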
I will open a bug upstream in GCC for P2.
from ctng-compilers-feedstock.
libgomp <= 12 is used
Indeed, it seems that the problematic piece of code was only introduced in libgomp 13 : gcc-mirror/gcc@9f2fca5 .
from ctng-compilers-feedstock.
Based on the patch you found, it seems to have something to do with parsing OMP_* environment variables?
I saw that the ipopt-feedstock sets
# Environment variables needed by spral
# See https://github.com/ralna/spral#usage-at-a-glance
export OMP_CANCELLATION=TRUE
export OMP_PROC_BIND=TRUE
In particular, from the commit you linked that introduced the new facility for host vs. device, it seems to me that:
- The parsing for OMP_PROC_BIND changed substantially (as opposed to OMP_CANCELLATION)
- While things clearly shouldn't break, the code does warn for invalid values, so trying to rebuild the affected stack against libgomp 13.x would probably be a good idea.
- Out of all the test cases in that commit, none of them has a value of TRUE for OMP_PROC_BIND, but rather things like: "spread", "close", "spread,spread", "spread,close"
from ctng-compilers-feedstock.
I am not sure this is related to ipopt/spral. The environment in which this happens, reported in #114 (comment), is created with mamba create -n testsegfault libgomp python, and in that environment no OMP_* variables are defined.
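One way to double-check from C that no OMP_* variables are set is sketched below. The helper name `count_vars_with_prefix` is hypothetical and only for illustration. It takes the environment array as a parameter so that it can be tried on hand-written arrays as well as on `environ`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the entries of a NULL-terminated array of
   "NAME=value" strings whose name starts with `prefix`.
   A NULL array is treated as an empty environment. */
size_t count_vars_with_prefix(char *const *envp, const char *prefix)
{
    size_t n = 0;
    size_t len = strlen(prefix);

    if (envp == NULL)
        return 0;

    for (; *envp != NULL; envp++) {
        if (strncmp(*envp, prefix, len) == 0)
            n++;
    }
    return n;
}
```

Passing `environ` and `"OMP_"` gives the number of OMP_* variables visible to the current process; 0 would match the environment described above.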
While things clearly shouldn't break, the code does warn for invalid values, so trying to rebuild the affected stack against libgomp 13.x would probably be a good idea.
Just to understand, which stack? The problem occurs just by combining libgomp and python, and I do not think that python depends on libgomp .
from ctng-compilers-feedstock.
I found another issue that contains a segfault in libgomp's initialize_env() weechat/weechat#2009 , if I got it correctly it happens again with libgomp 13.2.0 , but with Fedora 39.
from ctng-compilers-feedstock.
I reproduced the issue on Debian and Ubuntu releases whose apt packages ship gomp 13, while earlier releases with gomp 12 all pass: https://github.com/traversaro/reproduce-python-gomp-deepbind-issue/actions/runs/6172933871. On the other hand, Fedora 38 has gomp 13.2.0 but does not reproduce the error, and neither does the latest Arch.
from ctng-compilers-feedstock.
I found another issue that contains a segfault in libgomp's initialize_env() weechat/weechat#2009 , if I got it correctly it happens again with libgomp 13.2.0 , but with Fedora 39.
The issue here seems to happen even with PHP. I wonder if it happens in general when using dlopen
from ctng-compilers-feedstock.
I found another issue that contains a segfault in libgomp's initialize_env() weechat/weechat#2009 , if I got it correctly it happens again with libgomp 13.2.0 , but with Fedora 39.
The issue here seems to happen even with PHP. I wonder if it happens in general when using dlopen
I tested with casadi, and the issue did not happen when using a simple C++ example (I tested https://github.com/casadi/casadi/blob/main/docs/examples/cplusplus/ipopt_nl.cpp).
Just to be sure I created a minimal C-based test, and indeed the issue does not appear to happen with that, see https://github.com/traversaro/reproduce-python-gomp-deepbind-issue/actions/runs/6174144211 and https://github.com/traversaro/reproduce-python-gomp-deepbind-issue/blob/main/test.c .
from ctng-compilers-feedstock.
I was able to reproduce the problem without libgomp, just with a manually coded shared lib, i.e. testso.c:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

extern char **environ; // the process environment, normally set up by libc

// Runs when the shared library is loaded, like libgomp's initialize_env().
static void __attribute__((constructor))
initialize_env (void)
{
  char **env;
  fprintf(stderr, "Print debug\n");
  env = environ;
  fprintf(stderr, "environ %p env %p\n", (void *) env, (void *) environ);
  // Deliberately no NULL check: this dereference crashes when environ is NULL.
  for (env = environ; *env != 0; env++)
    {
      fprintf(stderr, "%s\n", *env);
    }
}
gcc -shared -fPIC testso.c -o testso.so
(testsegfault) traversaro@IITICUBLAP257:~/test_ipopt_dir$ python -c "import ctypes; import os; ctypes._dlopen('./testso.so', os.RTLD_DEEPBIND)"
Print debug
environ (nil) env (nil)
Segmentation fault
While in normal use:
Trying to load with RTLD_LAZY|RTLD_DEEPBIND ./testso.so
Print debug
environ 0x7ffcf93cefc0 env 0x7ffcf93cefc0
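The program that produced the "normal use" output is not shown in this comment. A minimal sketch of a C loader that opens a library with the same flags could look like the following. The helper name `load_with_deepbind` is hypothetical, and this is not necessarily the code that was actually used.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1 /* RTLD_DEEPBIND is a glibc extension */
#endif
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: dlopen `path` with RTLD_LAZY | RTLD_DEEPBIND.
   Constructors of the library run inside dlopen().
   Returns 0 on success and -1 if the library could not be loaded. */
int load_with_deepbind(const char *path)
{
    fprintf(stderr, "Trying to load with RTLD_LAZY|RTLD_DEEPBIND %s\n", path);

    void *handle = dlopen(path, RTLD_LAZY | RTLD_DEEPBIND);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlclose(handle);
    return 0;
}
```

Calling `load_with_deepbind("./testso.so")` from a small `main` would run the constructor from testso.c in a plain C process. On glibc older than 2.34 the program needs to be linked with -ldl.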
For some reason the environ global variable is set to 0/NULL.
So perhaps we should move the issue to Python feedstock?
from ctng-compilers-feedstock.
I will open a bug upstream in GCC for P2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111413
from ctng-compilers-feedstock.
Great job!
The patch is huge, but ignoring the indentation changes it can be summarized as a single-line change
Proof of that statement, using Github's UI.
from ctng-compilers-feedstock.
P1: constructors of a shared library opened by dlopen with RTLD_DEEPBIND from Python on conda-forge/Debian see environ==NULL
* I am not sure why this happens, and whether it is expected behaviour or a bug
It turns out that this was also working fine with gomp <= 12 and does not work with gomp 13, so I opened an issue for that as well: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111556 . However, to be honest, I am not sure whether this is a problem in libgomp, in glibc, or simply in how ELF and the POSIX spec interact.
from ctng-compilers-feedstock.