Comments (7)

mitchblank commented on September 27, 2024

Have you considered using gcc's __builtin_expect() (i.e. the guts of the common likely()/unlikely() macros)? I've found that it helps the icache issues quite a bit, since the compiler will lay all of the hot-path code blocks together.

gcc is very aggressive about inlining static functions, so be sure to add the noinline attribute if you're sure you want a function out-of-line. With __builtin_expect() it's usually OK to just let it inline what it wants, although it's possible that 32-bit x86 will get better register allocation with a smaller function.
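As a rough illustration of the suggestion above, here is a minimal sketch of the usual likely()/unlikely() macros combined with an out-of-line slow path. The names alloc_fast, alloc_slow, and pool are hypothetical and are not taken from jemalloc; the bump buffer is only a stand-in for a real fast path and ignores alignment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Common definitions of likely()/unlikely() on top of the GCC/Clang
 * builtin; the !! normalizes any truthy value to 0 or 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical slow path, deliberately kept out of line. */
static __attribute__((noinline)) void *
alloc_slow(size_t size)
{
    return malloc(size);
}

/* Hypothetical bump buffer backing the fast path (alignment is ignored
 * here for brevity). */
static unsigned char pool[4096];
static size_t pool_used;

void *
alloc_fast(size_t size)
{
    /* Rare case: zero-sized or oversized request goes to the slow path.
     * The hint asks the compiler to lay this branch out away from the
     * main function body. */
    if (unlikely(size == 0 || size > sizeof(pool) - pool_used)) {
        return alloc_slow(size);
    }
    void *ret = &pool[pool_used];
    pool_used += size;
    return ret;
}
```

Whether the hint changes the generated code depends on the compiler version and optimization level, so it is worth comparing the assembly (for example with -O2 -S) before and after.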

from jemalloc.

jasone commented on September 27, 2024

Thanks, I will definitely keep likely()/unlikely() in mind for places where refactoring cold code into separate functions is unwieldy.

It definitely used to be the case (10+ years ago) that putting a static function implementation after the function that calls it would prevent inlining. However, I'm not sure whether this was due to a compiler limitation or a language compliance issue.

mitchblank commented on September 27, 2024

unlikely() still helps with an out-of-line function call. A bunch of instructions are always needed before and after the actual "call" to save registers, rearrange the arguments, etc. If you add the unlikely() annotation, the compiler will put all that stuff in a branch path after the main function body.

I am a bit of an evangelist for unlikely() -- I picked up the habit while doing some linux kernel work many moons ago. Now I use it constantly. I actually find it makes my code more readable by making the fast-paths clear to humans reading the code. The beautifully icache-optimized output is a side benefit, of course.

If you are going to split things into separate functions, then you might also want to consider using the (gcc >= 4.3) function attributes "hot" and "cold". I haven't used them myself but they can hint the optimizer and supposedly move the coldpath functions into their own text section. Also, if a function is pre-declared as "cold" any branches that lead to it being called will be treated just as if they had been marked unlikely(). Again, haven't tested it myself.
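A minimal sketch of the "hot" and "cold" function attributes described above, assuming GCC 4.3 or later (Clang also accepts both). The functions checked_div and report_error are hypothetical examples, not jemalloc code. GCC's documentation states that paths leading to calls of cold functions are marked as unlikely, so the branch below needs no explicit unlikely().

```c
#include <stdio.h>

/* Pre-declared as cold so the attribute is visible at the call site.
 * noinline keeps the error path out of the caller's body. */
static void report_error(const char *msg) __attribute__((cold, noinline));

/* Hypothetical hot function with a rare error branch. */
__attribute__((hot)) int
checked_div(int num, int den, int *out)
{
    if (den == 0) {
        /* Treated as an unlikely branch because the callee is cold. */
        report_error("division by zero");
        return -1;
    }
    *out = num / den;
    return 0;
}

/* Defined after the caller; the earlier declaration carries the
 * attributes. */
static void
report_error(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
}
```

Whether the cold function actually ends up in a separate text subsection depends on the target and the toolchain, so inspecting the object file is the only way to confirm it.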

I hadn't thought about ordering static functions so they appear after the caller (I always try to order my code to avoid having to pre-declare them). I suspect that trick does work on many compilers, although I'm sure it's not a language compliance issue. The compiler is always allowed to inline if it feels like it (in theory, with LTO, even if the function is in a different translation unit!). The only thing that can force an out-of-line copy to be emitted is taking the address of the function somewhere, but even then the compiler can inline individual calls to the function if it really wants to.
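To make the last point concrete, here is a small sketch with hypothetical names (add_one, get_callback, direct_use). Taking the address of add_one means an out-of-line copy must exist, while the direct call remains a candidate for inlining at the optimizer's discretion.

```c
/* Hypothetical helper small enough that most optimizers would inline it. */
static int
add_one(int x)
{
    return x + 1;
}

typedef int (*unary_fn)(int);

/* Address taken: the compiler must emit an out-of-line copy of add_one
 * so that this pointer refers to real code. */
unary_fn
get_callback(void)
{
    return add_one;
}

/* Direct call: the compiler is still free to inline add_one here, even
 * though an out-of-line copy also exists. */
int
direct_use(int x)
{
    return add_one(x);
}
```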

jasone commented on September 27, 2024

A lot of recent restructuring has significantly improved the fast path (both in terms of speed and code clarity), but likely()/unlikely() is still worth investigating before calling this "done".

thestinger commented on September 27, 2024

I tried conservatively applying unlikely() to some obvious cases like running under Valgrind, and it does result in a tiny improvement. I don't know if it would be a good idea to apply it to code paths that are infrequent or expensive, like the incremental garbage collection. It's too hard to measure the impact for an individual case (perhaps Callgrind?).

jasone commented on September 27, 2024

I did some quick measurements for #120 based on test/stress/benchmark, and saw speedups of 1.03X to 1.09X (OS X Mavericks on a Haswell-based laptop). That's pretty compelling, thanks!

I'm going to add likely()/unlikely() to prof-related functionality as well. There are probably some opportunities in the tcache code too, but I'm going to hold off on that because tcache is likely to get a major rework soon.

jasone commented on September 27, 2024

With the addition of 9c640bf (ended up optimizing the entire fast path, including tcache), the speedup for test/stress/benchmark is ~1.3X (OS X Mavericks, llvm). When running Ubuntu 13.10 (gcc 4.8.1) in a VM on the same machine, the speedup is much less, ~1.04X, but it's a clear improvement in any case.
