Comments (12)

ikeydoherty commented on May 22, 2024

That's a fairly old bug report @amonakov, but thank you for bringing it up. I'm happy to do some benchmarking if you have a copy of the original test program, so we can test it against liblsi-intercept to see if it still holds. (I've not noticed any slowdowns here, tbf.)

If there is still a slowdown, then we'll just patch glibc and document that. The module only implements la_objsearch, so any slowdown would very much be a bug in the libc implementation, not in LSI.
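
For readers unfamiliar with the rtld-audit interface, an audit module that only hooks library lookup might look roughly like the sketch below. This is not LSI's actual source: it is a minimal pass-through, assuming only the la_version and la_objsearch entry points described in rtld-audit(7), and the file and library names in the comments are hypothetical.

```c
/* Hypothetical build line: cc -shared -fPIC -o libaudit-sketch.so audit.c */
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>

/* Mandatory handshake: the dynamic linker passes the audit interface
   version it supports; returning it unchanged accepts that version. */
unsigned int la_version(unsigned int version)
{
    return version;
}

/* Called when the dynamic linker is about to search for a library.
   Returning the name unchanged lets the search continue as usual;
   a real module could return a different path here instead. */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag)
{
    (void)cookie; /* unused in this pass-through sketch */
    (void)flag;   /* unused in this pass-through sketch */
    return (char *)name;
}
```

Such a library would be loaded the same way as in the timings in this thread, e.g. `env LD_AUDIT=./libaudit-sketch.so ./main`.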

from linux-steam-integration.

amonakov commented on May 22, 2024

The original test program is attached to the aforementioned bug report; here's a direct link to the attachment: https://sourceware.org/bugzilla/attachment.cgi?id=7044

Note that if your toolchain enables hardening by default (-z relro -z now) you won't see the slowdown because the test program won't use PLT (but games aren't usually compiled like that).

The reason I've brought it up is exactly because this Glibc bug remains unfixed.

If you prefer to patch Glibc on your end, what would be your recommendation to people packaging this on other distros?

ikeydoherty commented on May 22, 2024

I'm definitely seeing a minor regression with your test case here:

0.14user 0.00system 0:00.14elapsed 100%CPU (0avgtext+0avgdata 5904maxresident)k
0inputs+0outputs (0major+69minor)pagefaults 0swaps
time env LD_AUDIT=./libaudit.so ./main
1.15user 0.00system 0:01.16elapsed 99%CPU (0avgtext+0avgdata 9104maxresident)k
0inputs+0outputs (0major+158minor)pagefaults 0swaps

However, when I build your libaudit with the distro CFLAGS:

cc    -c -o main.o main.c
cc  -o main main.o -lm
cc    -c -o libaudit.o libaudit.c
cc "-g2 -O3 -pipe -fPIC -Wformat -Wformat-security -fno-omit-frame-pointer -fexceptions -D_FORTIFY_SOURCE=2 -fstack-protector --param ssp-buffer-size=32 -fasynchronous-unwind-tables -ftree-vectorize -feliminate-unused-debug-types -Wall -Wno-error -Wp,-D_REENTRANT" -shared -o libaudit.so libaudit.o
time ./main
0.03user 0.00system 0:00.03elapsed 100%CPU (0avgtext+0avgdata 5728maxresident)k
0inputs+0outputs (0major+67minor)pagefaults 0swaps
time env LD_AUDIT=./libaudit.so ./main
0.03user 0.00system 0:00.04elapsed 95%CPU (0avgtext+0avgdata 9296maxresident)k
0inputs+0outputs (0major+159minor)pagefaults 0swaps

Note that only libaudit is built with the distro CFLAGS; the main binary (standing in for a proprietary game) is not.
Also note that changing -O3 to the normalised package default of -O2 makes no difference.

If I reintroduce your -fno-builtin-sqrt flag, then the regression is back:

CFLAGS='-fno-builtin-sqrt' make
cc -fno-builtin-sqrt   -c -o main.o main.c
cc -fno-builtin-sqrt -o main main.o -lm
cc -fno-builtin-sqrt   -c -o libaudit.o libaudit.c
cc "-g2 -O2 -pipe -fPIC -Wformat -Wformat-security -fno-omit-frame-pointer -fexceptions -D_FORTIFY_SOURCE=2 -fstack-protector --param ssp-buffer-size=32 -fasynchronous-unwind-tables -ftree-vectorize -feliminate-unused-debug-types -Wall -Wno-error -Wp,-D_REENTRANT" -shared -o libaudit.so libaudit.o
time ./main
0.12user 0.00system 0:00.12elapsed 100%CPU (0avgtext+0avgdata 5904maxresident)k
0inputs+0outputs (0major+69minor)pagefaults 0swaps
time env LD_AUDIT=./libaudit.so ./main
1.09user 0.00system 0:01.09elapsed 99%CPU (0avgtext+0avgdata 8736maxresident)k
0inputs+0outputs (0major+154minor)pagefaults 0swaps

So I assume this is more about symbol resolution time. To check, I hacked the demo to make some gtk_ calls:

+ ./main

real	0m0.128s
user	0m0.113s
sys	0m0.010s
+ env LD_AUDIT=./libaudit.so ./main

real	0m0.150s
user	0m0.135s
sys	0m0.011s

Even building everything with hardening didn't make a significant difference after that.

Finally, after installing your patch, even with a hardened toolchain (which Solus uses by default), and having done tests with full relro on the main binary and audit lib, and finally replacing it with the LSI lib:

+ ./main

real	0m0.127s
user	0m0.112s
sys	0m0.011s
+ env LD_AUDIT=/usr/lib64/liblsi-intercept.so ./main

real	0m0.128s
user	0m0.113s
sys	0m0.010s

Basically, we need the rtld-audit interface, and we also need your patch. Given that LSI is aimed at distribution integrators, my hope is that they also integrate your patch (we can add this to Solus without issue). It seems your original patch thread died out; perhaps now is the time to upstream it so that all distributions benefit from it?

Distributions like Ubuntu are more willing to import an out-of-series patch to fix a bug when it has already landed in the upstream project's VCS. :)

ikeydoherty commented on May 22, 2024

Oh, and as a final metric, using your installed patches and your original test:

0.13user 0.00system 0:00.13elapsed 99%CPU (0avgtext+0avgdata 5936maxresident)k
0inputs+0outputs (0major+70minor)pagefaults 0swaps
time env LD_AUDIT=./libaudit.so ./main
0.12user 0.00system 0:00.12elapsed 99%CPU (0avgtext+0avgdata 9472maxresident)k
0inputs+0outputs (0major+163minor)pagefaults 0swaps

ikeydoherty commented on May 22, 2024

glibc patch import into Solus: https://dev.solus-project.com/R927:afa5b639e8a9b62618457a304d1e6fb42a9f2066

ikeydoherty commented on May 22, 2024

Thinking further on this, and correct me if I'm wrong, but the performance regression should only come from initial symbol resolution, thus affecting startup time and module load time, right? During the initial mapping.

Anyway, this further illustrates the need for a self-contained LSI bundle that is free from distro issues.

amonakov commented on May 22, 2024

> Thinking further on this, and correct me if I'm wrong, but the performance regression should only come from initial symbol resolution, thus affecting startup time and module load time, right? During the initial mapping.

No, of course not, please read main.c in the testcase (note that it deliberately calls the same function in a loop many times to highlight the issue) and the initial report. The issue is that every runtime call that goes via PLT gets slower, not just initial calls!

If only initial calls get slower, that's not a major issue for games in the first place.

ikeydoherty commented on May 22, 2024

Ah well that's not good at all. Just read properly through _dl_relocate_object, apologies, not awake that long. :)

OK so I'm going to document this issue within the README, just so integrators know the story. Obviously it would be fantastic if upstream accepts your patch (thank you for that!). FWIW LSI does allow you to turn off the intercept module, which may actually come in useful for those wanting to do benchmarks with and without the patch inside the games themselves.

FWIW I'm aware of the pressure on distributions when faced with integrating Steam, and it is becoming a heavy burden for them. This is why I'm looking to third party application systems with the view of building a specialised (ABI compatible) runtime containing a strict-mode LSI (and your glibc patch ofc!) that would effectively be a Solus-based runtime to provide the same Steam experience everywhere, even on distributions not supporting multilib.

In these third party systems we can ensure only our own libraries are used, and there is no more cross contamination, and distributions wouldn't have to worry about these issues anymore. :)

ikeydoherty commented on May 22, 2024

^ I've documented this in the README - if you feel it needs more clarification or details, please let me know :)

amonakov commented on May 22, 2024

Users with older AVX-capable CPUs, especially the famous Sandy Bridge generation (i5-2500 and such), should especially beware, since the penalty due to this issue is highest there. My test indicates roughly 420 extra cycles per call (this is very high!). Of those, I believe 140 are two AVX transition penalties of 70 cycles each; I didn't try to accurately analyze the rest.
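
The arithmetic behind such per-call figures can be sketched as below. The 420 and 70 cycle values come from the comment above; the clock rate and call count are not stated anywhere in this thread, so they are left as parameters, and both function names are hypothetical.

```c
/* Extra cycles per call implied by two timings of the same program.
   clock_hz and calls must come from the machine and test case being
   measured; neither value is given in this thread. */
double extra_cycles_per_call(double base_seconds, double audit_seconds,
                             double clock_hz, double calls)
{
    return (audit_seconds - base_seconds) * clock_hz / calls;
}

/* Cycles left unexplained after subtracting a number of fixed
   transition penalties from the measured per-call total. */
double unexplained_cycles(double total, double penalty, int transitions)
{
    return total - penalty * transitions;
}
```

With the figures quoted above, `unexplained_cycles(420.0, 70.0, 2)` leaves 280 cycles per call that the AVX transition penalty alone does not account for.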

ikeydoherty commented on May 22, 2024

Damn - very common CPU too.

ikeydoherty commented on May 22, 2024

Gonna close this now as the issue is documented, Solus is patched, and we're gonna provide a Snap with a patched glibc.
