
jemalloc's Introduction

jemalloc is a general purpose malloc(3) implementation that emphasizes
fragmentation avoidance and scalable concurrency support.  jemalloc first came
into use as the FreeBSD libc allocator in 2005, and since then it has found its
way into numerous applications that rely on its predictable behavior.  In 2010
jemalloc development efforts broadened to include developer support features
such as heap profiling and extensive monitoring/tuning hooks.  Modern jemalloc
releases continue to be integrated back into FreeBSD, and therefore versatility
remains critical.  Ongoing development efforts trend toward making jemalloc
among the best allocators for a broad range of demanding applications, and
eliminating/mitigating weaknesses that have practical repercussions for real-world
applications.

The COPYING file contains copyright and licensing information.

The INSTALL file contains information on how to configure, build, and install
jemalloc.

The ChangeLog file contains a brief summary of changes for each release.

URL: https://jemalloc.net/

jemalloc's People

Contributors

azat, bmaurer, cmuellner, cpeterso, daverigby, davidtgoldblatt, deadalnix, devnexen, eggpi, georgthegreat, glandium, gnzlbg, guangli-dai, ilvokhin, interwq, jasone, jqian-aurora, lapenkov, rkmisra, ronawho, rustyx, svetlitski, tamird, thestinger, trasz, tyleretzel, wqfish, yinan1048576, yuslepukhin, zoulasc

jemalloc's Issues

Building 3.5.0 on i386 fails because of SSE2 issue

Hi, hope this isn't something completely obvious I'm missing. When building 3.5.0 on i386, it seems like HAVE_SSE2 gets defined when it shouldn't be, which causes this:

gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -std=gnu99 -fvisibility=hidden -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_UNIT_TEST -Itest/include -Itest/include -o test/src/math.unit.o test/src/math.c
In file included from test/include/test/jemalloc_test.h:58:0,
                 from test/src/math.c:2:
/usr/lib/gcc/i686-linux-gnu/4.8/include/emmintrin.h:31:3: error: #error "SSE2 instruction set not enabled"
 # error "SSE2 instruction set not enabled"
   ^

I looked at configure.ac and it doesn't seem like HAVE_SSE2 should be getting defined for i386, but I guess maybe it is somewhere. Here's a full build output from Launchpad:

https://launchpadlibrarian.net/165200991/buildlog_ubuntu-saucy-i386.jemalloc_3.5.0-3chl1~saucy1_FAILEDTOBUILD.txt.gz

This wasn't an issue in earlier releases, since they contain no references to SSE2 or to emmintrin.h anywhere that I can see.
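For illustration, the usual way to keep emmintrin.h out of non-SSE2 builds is to test the compiler's own predefined `__SSE2__` macro at the include site, rather than a configure-time flag, and to keep a scalar fallback. The helper below (`sum_u32()`) is hypothetical and is not jemalloc's test code:

```c
#include <stddef.h>
#include <stdint.h>

/* GCC and Clang define __SSE2__ only when SSE2 code generation is enabled
 * (x86-64 by default, or i386 with -msse2). Testing it here, instead of a
 * configure-time HAVE_SSE2, keeps emmintrin.h from being included when the
 * compiler would reject it with "SSE2 instruction set not enabled". */
#if defined(__SSE2__)
#  include <emmintrin.h>
#endif

/* Hypothetical helper: sum an array of 32-bit values (mod 2^32). */
static uint32_t sum_u32(const uint32_t *v, size_t n) {
    uint32_t total = 0;
    size_t i = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)(v + i));
        acc = _mm_add_epi32(acc, x);   /* four lanes at a time */
    }
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar path: handles the tail, or everything when SSE2 is absent. */
    for (; i < n; i++)
        total += v[i];
    return total;
}
```

On a typical i386 toolchain without `-msse2`, `__SSE2__` is undefined and only the scalar loop is compiled.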

Use arena to determine dss precedence for huge allocations.

Right now, huge allocations always use the default dss precedence setting to determine whether to use mmap() or sbrk(). Instead, consider using the dss precedence setting from the arena that would have serviced the allocation request, had it not exceeded the maximum arena size class. Absent this change, applications are faced with having to explicitly take jemalloc size classes into account when segregating dss/heap allocations.
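A minimal model of the proposed rule, assuming hypothetical `dss_prec_t` and `arena_t` types (the three values mirror the "disabled", "primary" and "secondary" settings of jemalloc's dss option): huge allocations consult the arena that would have served them and fall back to the global default only when no arena is known.

```c
#include <stddef.h>

/* Hypothetical model of the dss precedence setting. */
typedef enum {
    DSS_PREC_DISABLED,   /* never use sbrk() */
    DSS_PREC_PRIMARY,    /* try sbrk() before mmap() */
    DSS_PREC_SECONDARY   /* try mmap() before sbrk() */
} dss_prec_t;

typedef struct {
    dss_prec_t dss_prec; /* per-arena setting */
} arena_t;

/* Global default, used only when there is no arena to consult. */
static dss_prec_t dss_prec_default = DSS_PREC_SECONDARY;

/* Proposed behavior: a huge allocation inherits the precedence of the
 * arena that would have serviced it had it been small enough. */
static dss_prec_t huge_dss_prec(const arena_t *arena) {
    return arena != NULL ? arena->dss_prec : dss_prec_default;
}
```

With this rule an application that sets an arena to "primary" gets sbrk()-first behavior for every allocation routed to that arena, regardless of which size class the request falls into.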

Implement heap profiling on OS X.

Implement heap profiling on OS X. This will require synthesizing the heap map data in a format that pprof can understand, or alternatively implementing symbolification and emitting what pprof calls 'raw' heap profiles.

Experiment with better "clocks".

jemalloc has no true sense of time, but it effectively measures time in units of allocation events. In particular, the tcache code uses an event counter to drive its incremental garbage collection, but unfortunately this is only a weak proxy for what tcache really wants to know -- wall time elapsed since a previous event. Furthermore, the unused dirty page purging code is in dire need of time sense so that hysteresis can be incorporated into the purge rate.

Experiment with clocks that actually track wall time (one second granularity is likely good enough for intended purposes), determine how expensive they are to use, and develop strategies for using them where needed to the fullest extent possible. Data points: time(2) is a vsyscall on Linux 3.2, and gettimeofday() is a vsyscall on Linux 2.6.32. Assuming that clock overhead is manageably low across all operating systems of interest, use clocks to:

  • Introduce hysteresis into unused dirty page purging. This will require logic changes and will impact application-visible tuning parametrization, but internal data structures will need little modification beyond changes necessary to support a global limit on aggregate tcache size.
  • Introduce hysteresis into tcache flushing based on time rather than event count, and experiment with dynamically growing tcaches to match demand. (tcache is currently effective at amortizing mutex overhead, but falls far short in terms of handling large short-term swings in memory usage.)
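One way to express purge hysteresis with a one-second clock, as a sketch under stated assumptions: `purge_state_t` and `purge_should_run()` are hypothetical, and the caller passes the current time (for example from `time(NULL)`) so the decision can be tested deterministically. Purging starts only after the dirty-page count has stayed above the threshold for `hold_secs` seconds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical purge state with one-second clock granularity. */
typedef struct {
    size_t threshold;   /* dirty pages tolerated without purging */
    time_t hold_secs;   /* how long the excess must persist */
    bool over;          /* currently above threshold? */
    time_t over_since;  /* when the excess started (valid if over) */
} purge_state_t;

/* Returns true when the caller should purge. `now` would normally come
 * from time(NULL); passing it in keeps the logic testable. */
static bool purge_should_run(purge_state_t *ps, size_t ndirty, time_t now) {
    if (ndirty <= ps->threshold) {
        ps->over = false;            /* excess went away: reset the timer */
        return false;
    }
    if (!ps->over) {
        ps->over = true;             /* start timing the excess */
        ps->over_since = now;
        return false;
    }
    return now - ps->over_since >= ps->hold_secs;
}
```

A short spike followed by a prompt drop never triggers a purge, which is the hysteresis the issue asks for; the cost is one clock read per decision.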

Experiment with chunk map layout.

Each arena chunk header contains a map with one entry per page that the chunk manages. Right now the map is stored as a single array of arena_chunk_map_t elements, but many code paths access only one field (bits) within the arena_chunk_map_t. Experiment with breaking the map into separate arrays for each field, with an eye toward improving cache locality.
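The two layouts can be contrasted with a toy scan that reads only `bits`. The field set, the `NPAGES` value and the `MAP_LARGE` flag are hypothetical, not jemalloc's actual `arena_chunk_map_t`:

```c
#include <stddef.h>

#define NPAGES 512                 /* hypothetical pages per chunk */
#define MAP_LARGE ((size_t)0x2U)   /* hypothetical flag bit */

/* Current style (array of structs): one element per page. */
typedef struct {
    size_t bits;
    void *prof_ctx;
    void *rb_link[2];
} map_elm_aos_t;

/* Proposed style (struct of arrays): the hot `bits` field is contiguous. */
typedef struct {
    size_t bits[NPAGES];
    void *prof_ctx[NPAGES];
    void *rb_link[NPAGES][2];
} map_soa_t;

/* Both functions count pages flagged as large; only the memory they
 * touch differs. */
static size_t count_large_aos(const map_elm_aos_t *map) {
    size_t n = 0;
    for (size_t i = 0; i < NPAGES; i++)
        n += (map[i].bits & MAP_LARGE) != 0;
    return n;
}

static size_t count_large_soa(const map_soa_t *map) {
    size_t n = 0;
    for (size_t i = 0; i < NPAGES; i++)
        n += (map->bits[i] & MAP_LARGE) != 0;
    return n;
}
```

On an LP64 target each array-of-structs element here is 32 bytes, so the first scan pulls four times as many cache lines as the struct-of-arrays scan for the same information.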

compile error under mingw 4.8.1

jemalloc 3.4.1 does not compile under mingw32 4.8.1:

  • ffs() is not defined.
  • ffsl() is not defined.
  • printf("%z", size_t) is incorrect.
  • malloc_init_hard() is missing an InitializeCriticalSection(&init_lock) call before malloc_mutex_lock(&init_lock).
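A portable `ffs()`/`ffsl()` fallback is a common fix for the first two items. The sketch below uses GCC's `__builtin_ffsl()` where available (MinGW's GCC provides it) and a bit loop otherwise; the `portable_ffs*` names are hypothetical:

```c
/* Find first set: 1-based index of the least significant set bit, or 0
 * when x == 0 (the POSIX ffs() contract). */
static int portable_ffsl(long x) {
#if defined(__GNUC__)
    return __builtin_ffsl(x);          /* GCC/Clang builtin */
#else
    unsigned long v = (unsigned long)x;
    int i = 1;
    if (v == 0)
        return 0;
    while ((v & 1UL) == 0) {           /* shift until the low bit is set */
        v >>= 1;
        i++;
    }
    return i;
#endif
}

static int portable_ffs(int x) {
    /* Sign extension to long preserves the lowest set bit. */
    return portable_ffsl((long)x);
}
```

For the `%z` item, a conservative workaround is to cast and use a format the target C runtime supports, e.g. `printf("%lu", (unsigned long)sz)`, accepting truncation where `size_t` is wider than `unsigned long` (as on 64-bit Windows).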

Faster free-with-size API

Google proposed a C++1y extension to allow the delete operator to pass the size of the freed object to the allocator, when it is known: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3536.html. There's an upstream patch to implement this that Google has integrated into its own GCC branch: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00809.html.

It would be interesting for jemalloc to provide a similar API that avoids the need to look up the slab size of the freed object.
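To show why a size hint helps, here is a toy allocator with per-size-class free lists (all names hypothetical). `toy_sized_free()` computes the class arithmetically from the size the caller passes, where an unsized `free()` would first have to look the object's size up in allocator metadata:

```c
#include <stddef.h>
#include <stdlib.h>

#define QUANTUM  16   /* class spacing (hypothetical) */
#define NCLASSES 8    /* classes 16, 32, ..., 128 bytes */

typedef struct free_obj { struct free_obj *next; } free_obj_t;
static free_obj_t *free_lists[NCLASSES];

/* 1..16 -> class 0, 17..32 -> class 1, ... (size must be nonzero). */
static size_t size2class(size_t size) {
    return (size + QUANTUM - 1) / QUANTUM - 1;
}

static void *toy_malloc(size_t size) {
    if (size == 0 || size > NCLASSES * QUANTUM)
        return NULL;
    size_t c = size2class(size);
    free_obj_t *obj = free_lists[c];
    if (obj != NULL) {                 /* reuse a cached object */
        free_lists[c] = obj->next;
        return obj;
    }
    return malloc((c + 1) * QUANTUM);  /* backend for fresh objects */
}

/* Sized free: no metadata lookup. `size` must be the size that was
 * passed to toy_malloc(). */
static void toy_sized_free(void *ptr, size_t size) {
    if (ptr == NULL)
        return;
    size_t c = size2class(size);
    free_obj_t *obj = ptr;
    obj->next = free_lists[c];
    free_lists[c] = obj;
}
```

The trade-off is that a caller passing the wrong size silently corrupts the free lists, so a real implementation would likely want to validate the hint in debug builds.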

Stats improvements

Recording what we talked about over email:

(1) It would be helpful to have thread cache size statistics. You mentioned this might be part of the work.

(2) I mentioned that it's a bit difficult to figure out the amount of memory lost to fragmentation on a per-size basis. I'm wondering if there are ways to present the statistics that might be more helpful to application developers.
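As a sketch of one way to present per-size-class waste, assuming hypothetical per-bin counters (`bin_stats_t` is not jemalloc's actual stats layout), the unused bytes in a class are the free regions in its current runs:

```c
#include <stddef.h>

/* Hypothetical per-size-class counters an allocator could export. */
typedef struct {
    size_t reg_size;       /* bytes per region in this class */
    size_t nregs_per_run;  /* regions per run */
    size_t curruns;        /* runs currently allocated for this class */
    size_t curregs;        /* regions currently in use */
} bin_stats_t;

/* Bytes held in runs of this class but not handed out: free regions in
 * partially filled runs. (Rounding waste inside each region would also
 * require knowing the requested sizes.) */
static size_t bin_unused_bytes(const bin_stats_t *b) {
    size_t capacity = b->curruns * b->nregs_per_run;
    return (capacity - b->curregs) * b->reg_size;
}

/* Utilization as an integer percentage; an empty bin reports 100. */
static unsigned bin_util_pct(const bin_stats_t *b) {
    size_t capacity = b->curruns * b->nregs_per_run;
    if (capacity == 0)
        return 100;
    return (unsigned)(b->curregs * 100 / capacity);
}
```

For example, a 64-byte class with 3 runs of 64 regions and 150 regions in use holds 42 free regions (2688 bytes) and is 78% utilized.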

Compile problems with 3.5.0 and older gcc.

Hello,

We build a lot of our code on older machines with ancient GCC versions. The following problem showed up when I tried to upgrade jemalloc to 3.5.0:

ccache gcc -ggdb -isystem /data/plex-dependency-builder/output/pms-depends-linux-ubuntu-x86_64-release-08838f5/include -msse -fno-stack-protector -fvisibility=hidden -fPIC -DPIC -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ctl.pic.o src/ctl.c
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/atomic.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/chunk_dss.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/atomic.pic.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [src/chunk_dss.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/base.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/base.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/chunk.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/chunk.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/jemalloc.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/ckh.c:39:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/jemalloc.pic.o] Error 1
make: *** [src/ckh.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/bitmap.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/bitmap.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/ctl.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/ctl.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/chunk_mmap.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'

Seems like restrict is not defined for these systems. Should this be a config test or something similar?
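A compile-time fallback is one option. The sketch below picks a spelling of `restrict` the compiler accepts; `JE_RESTRICT` and `parse_ulong()` are hypothetical names. GCC in gnu89 mode (its default before GCC 5) does not treat `restrict` as a keyword, while the `__restrict__` extension is accepted in every mode:

```c
#include <stdlib.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define JE_RESTRICT restrict       /* C99 keyword */
#elif defined(__GNUC__)
#  define JE_RESTRICT __restrict__   /* GNU extension, any mode */
#else
#  define JE_RESTRICT                /* no aliasing hint available */
#endif

/* Hypothetical strtoul()-style wrapper: the same declaration shape that
 * fails with "expected ';', ',' or ')' before 'nptr'" when `restrict`
 * is not a keyword. */
static unsigned long parse_ulong(const char *JE_RESTRICT nptr,
    char **JE_RESTRICT endptr, int base) {
    return strtoul(nptr, endptr, base);
}
```

In an autoconf-based build, the stock `AC_C_RESTRICT` macro performs an equivalent check and defines `restrict` in the generated config header when needed, which would be one answer to the "config test" question.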

Add hash unit tests.

The hash function currently in use is MurmurHash3, for which validation tests exist. Make sure that the integrated code actually generates correct hashes.
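The requested check amounts to comparing the integrated hash against reference values. As a sketch, here is a self-contained MurmurHash3_x86_32 (function names are hypothetical; the algorithm and constants follow Austin Appleby's public-domain reference) with explicit little-endian block loads. A unit test can compare it against published vectors, such as the empty input hashing to 0 with seed 0 and to 0x514E28B7 with seed 1:

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

/* Final avalanche step. */
static uint32_t fmix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

static uint32_t murmur3_x86_32(const void *key, size_t len, uint32_t seed) {
    const uint8_t *data = key;
    const size_t nblocks = len / 4;
    const uint32_t c1 = 0xcc9e2d51U, c2 = 0x1b873593U;
    uint32_t h1 = seed;

    /* Body: 4-byte blocks, read little-endian regardless of host order. */
    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = data + 4 * i;
        uint32_t k1 = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
            (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;
        h1 ^= k1;
        h1 = rotl32(h1, 13);
        h1 = h1 * 5 + 0xe6546b64U;
    }

    /* Tail: remaining 0-3 bytes. */
    const uint8_t *tail = data + 4 * nblocks;
    uint32_t k1 = 0;
    switch (len & 3) {
    case 3: k1 ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k1 ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k1 ^= tail[0];
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;
        h1 ^= k1;
    }

    h1 ^= (uint32_t)len;
    return fmix32(h1);
}
```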

Embed web server.

Embed a minimal web server (naturally listening on port 7469 by default) to support the heap profiling end points that pprof uses. Consider also exposing all relevant mallctl() functionality.
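At a minimum, such a server needs to map request lines onto the pprof endpoints. The router below is a sketch: the `/pprof/heap`, `/pprof/symbol` and `/pprof/cmdline` paths follow the gperftools pprof script's defaults and should be treated as an assumption to verify, and `endpoint_t`/`route_request()` are hypothetical. Socket handling is deliberately left out:

```c
#include <stddef.h>
#include <string.h>

#define PROF_HTTP_PORT 7469   /* default port proposed in the issue */

typedef enum { EP_NOT_FOUND, EP_HEAP, EP_SYMBOL, EP_CMDLINE } endpoint_t;

/* Map a request line such as "GET /pprof/heap HTTP/1.1" to an endpoint.
 * Query strings (e.g. "?seconds=1") are ignored. */
static endpoint_t route_request(const char *line) {
    static const struct {
        const char *path;
        endpoint_t ep;
    } routes[] = {
        { "/pprof/heap", EP_HEAP },
        { "/pprof/symbol", EP_SYMBOL },
        { "/pprof/cmdline", EP_CMDLINE },
    };
    const char *path;

    if (strncmp(line, "GET ", 4) == 0)
        path = line + 4;
    else if (strncmp(line, "POST ", 5) == 0)
        path = line + 5;   /* accept POST too, e.g. for symbol lookups */
    else
        return EP_NOT_FOUND;

    size_t len = strcspn(path, " ?");  /* path ends at space or query */
    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
        if (strlen(routes[i].path) == len &&
            memcmp(path, routes[i].path, len) == 0)
            return routes[i].ep;
    }
    return EP_NOT_FOUND;
}
```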

Provide workaround for glibc floating point bug

As reported by David Abdurachmanov, some versions of glibc do not properly save/restore floating point state when calling into the allocator, which can cause state corruption as a side effect of dynamic lazy loading. Provide a workaround: make it possible to compile jemalloc with floating point support completely disabled. This will require that heap profiling is disabled, and the body of prof_sample_threshold_update() will have to be wrapped in #ifdef JEMALLOC_PROF.
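The proposed guard can be sketched as follows. The function name mirrors the issue, but the body is hypothetical: it derives a geometric sampling interval from a uniform random number, which is the kind of computation that needs `log()`. With `JEMALLOC_PROF` undefined, the function contains no floating point code at all:

```c
#include <stdint.h>
#ifdef JEMALLOC_PROF
#  include <math.h>   /* only needed (and linked) when profiling is on */
#endif

/* Returns the number of bytes until the next sample, or 0 when profiling
 * is compiled out. The formula is illustrative, not jemalloc's code. */
static uint64_t sample_threshold_update(uint64_t prn, unsigned lg_sample) {
#ifdef JEMALLOC_PROF
    /* Map the top 53 bits of prn to u in (0, 1]. */
    double u = (double)((prn >> 11) + 1) * (1.0 / 9007199254740992.0);
    double p = 1.0 / (double)((uint64_t)1 << lg_sample);
    return (uint64_t)(log(u) / log(1.0 - p)) + 1;
#else
    (void)prn;
    (void)lg_sample;
    return 0;   /* no floating point state is touched */
#endif
}
```

Building with `-DJEMALLOC_PROF` enables the sampling path and requires linking libm.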

Implement test utilities.

Implement a set of test utilities that will allow succinct tests that mimic common allocation patterns. For example:

  • High quality PRNGs.
  • Asynchronous message queues.
  • Statistical distribution generators:
    • I(p): Indicator.
    • Gamma(shape, beta, min, max): Gamma distribution.
    • Norm(mean, sd): Normal distribution.
    • Unif(min, max): Uniform distribution.
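Two of the listed generators, I(p) and Unif(min, max), can be built on a small PRNG. The sketch below uses splitmix64, a well-known non-cryptographic generator chosen here as an assumption rather than taken from the issue; the `prng_*` and `dist_*` names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t state; } prng_t;

/* splitmix64 step. */
static uint64_t prng_next(prng_t *r) {
    uint64_t z = (r->state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double prng_double(prng_t *r) {
    return (double)(prng_next(r) >> 11) * (1.0 / 9007199254740992.0);
}

/* I(p): true with probability p. */
static bool dist_indicator(prng_t *r, double p) {
    return prng_double(r) < p;
}

/* Unif(min, max): integer in [min, max]. Modulo reduction has a slight
 * bias unless (max - min + 1) divides 2^64. */
static uint64_t dist_unif(prng_t *r, uint64_t min, uint64_t max) {
    uint64_t span = max - min + 1;     /* 0 means the full 64-bit range */
    return span == 0 ? prng_next(r) : min + prng_next(r) % span;
}
```

Norm and Gamma can be layered on `prng_double()` (for example with the Box-Muller transform for Norm), at the cost of linking libm.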

Refactor prof_dump().

Refactor prof_dump() to use a two-pass algorithm, such that string formatting and write(2) calls happen outside the critical section.
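The two-pass idea can be sketched as: snapshot the counters while holding the lock, release it, then format. Everything here is hypothetical (`toy_prof_dump()`, the table layout, and an `atomic_flag` spinlock standing in for a mutex), and the output goes to a buffer instead of `write(2)` so it can be checked:

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define MAX_SITES 64  /* hypothetical capacity */

/* Hypothetical profile data: bytes currently allocated per call site. */
typedef struct {
    unsigned site_id;
    size_t bytes;
} prof_entry_t;

static prof_entry_t prof_table[MAX_SITES];
static size_t prof_nentries;
static atomic_flag prof_mtx = ATOMIC_FLAG_INIT;  /* stand-in for a mutex */

static void prof_lock(void) {
    while (atomic_flag_test_and_set(&prof_mtx))
        ;  /* spin */
}

static void prof_unlock(void) {
    atomic_flag_clear(&prof_mtx);
}

/* Returns the formatted length, truncating at outsz - 1 bytes. */
static size_t toy_prof_dump(char *out, size_t outsz) {
    prof_entry_t snap[MAX_SITES];
    size_t n, used = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    prof_lock();                       /* pass 1: copy raw counters */
    n = prof_nentries;
    memcpy(snap, prof_table, n * sizeof(snap[0]));
    prof_unlock();

    for (size_t i = 0; i < n; i++) {   /* pass 2: format, lock released */
        size_t room = outsz - used;
        int w = snprintf(out + used, room, "site %u: %zu bytes\n",
            snap[i].site_id, snap[i].bytes);
        if (w < 0)
            break;
        if ((size_t)w >= room) {       /* output truncated */
            used = outsz - 1;
            break;
        }
        used += (size_t)w;
    }
    return used;
}
```

The critical section shrinks to a `memcpy()`, at the cost of a stack (or heap) buffer large enough for one snapshot.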

Experiment with small run dirty page purging.

Experiment with small run dirty page purging -- tracking which pages within active small runs are unused and dirty. The motivation for this is to enable "medium" size classes in the [4 KiB .. 16 KiB) range that are not even multiples of 4 KiB, and thereby reduce worst-case internal fragmentation (a 4097-byte allocation currently consumes 8 KiB).

Much of the bookkeeping can be done under the protection of bin locks, but the ndirty counters in the chunk headers and arenas will probably need to be protected by atomic operations (currently protected by arena locks).
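The arithmetic behind the 4097-byte example: with page-granular (4 KiB) classes the request rounds up to 8 KiB and wastes 4095 bytes, whereas a hypothetical 1 KiB spacing in the medium range would round it to 5 KiB and waste 1023 bytes. A sketch of that computation:

```c
#include <stddef.h>

/* Round size up to a multiple of quantum (quantum must be a power of 2). */
static size_t round_up(size_t size, size_t quantum) {
    return (size + quantum - 1) & ~(quantum - 1);
}

/* Internal fragmentation for a request served at quantum granularity. */
static size_t waste(size_t size, size_t quantum) {
    return round_up(size, quantum) - size;
}
```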

Optimize arena_prof_ctx_set().

Refactor profiling code so that arena_prof_ctx_set() receives usize as an argument. This will allow the following:

if ((mapbits & CHUNK_MAP_LARGE) == 0) {

to be rewritten as:

if (usize <= SMALL_MAXCLASS) {

The latter avoids reading mapbits from the chunk header under normal circumstances (prof_promote is typically true).
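The difference between the two tests can be sketched as below. The constant values are hypothetical (the real `SMALL_MAXCLASS` depends on page size and build configuration), and the map bits are read through a pointer to stand in for the chunk-header load:

```c
#include <stdbool.h>
#include <stddef.h>

#define SMALL_MAXCLASS  ((size_t)3584)  /* hypothetical value */
#define CHUNK_MAP_LARGE ((size_t)0x2U)  /* hypothetical flag bit */

/* Before: classifying needs the page's map bits, i.e. a memory load from
 * the chunk header. */
static bool is_small_by_mapbits(const size_t *mapbits) {
    return (*mapbits & CHUNK_MAP_LARGE) == 0;
}

/* After: the caller already has usize, so the test is a compare against
 * a constant with no metadata access. */
static bool is_small_by_usize(size_t usize) {
    return usize <= SMALL_MAXCLASS;
}
```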

Reduce rtree memory usage.

Reduce rtree memory usage by storing booleans (1 byte each) rather than pointers. The rtree code is only used to record whether jemalloc manages a chunk of memory, so there's no need to store pointers in the rtree.
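A sketch of a two-level radix tree whose leaves hold one byte per chunk rather than one pointer, which on LP64 shrinks each leaf eightfold. To stay small, it assumes 4 MiB chunks and a 32-bit address range (addresses at or above 2^32 are rejected); `rtree_t`, `rtree_set()` and `rtree_get()` are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define LG_CHUNK 22                 /* 4 MiB chunks (assumption) */
#define KEY_BITS (32 - LG_CHUNK)    /* 32-bit address range only */
#define L1_BITS  (KEY_BITS / 2)
#define L2_BITS  (KEY_BITS - L1_BITS)

/* Level 1: pointers to leaves. Leaves: one byte per chunk. */
typedef struct {
    uint8_t *leaves[1u << L1_BITS];  /* NULL until first use */
} rtree_t;

/* Split addr into level indices; false if outside the covered range. */
static bool rtree_key(uintptr_t addr, size_t *i1, size_t *i2) {
    uint64_t a = (uint64_t)addr;
    if ((a >> 32) != 0)
        return false;
    uint64_t key = a >> LG_CHUNK;
    *i1 = (size_t)(key >> L2_BITS);
    *i2 = (size_t)(key & ((1u << L2_BITS) - 1));
    return true;
}

/* Record whether the chunk containing addr is managed.
 * Returns true on failure (out of range or out of memory). */
static bool rtree_set(rtree_t *t, uintptr_t addr, bool managed) {
    size_t i1, i2;
    if (!rtree_key(addr, &i1, &i2))
        return true;
    if (t->leaves[i1] == NULL) {
        t->leaves[i1] = calloc((size_t)1 << L2_BITS, sizeof(uint8_t));
        if (t->leaves[i1] == NULL)
            return true;
    }
    t->leaves[i1][i2] = managed ? 1 : 0;
    return false;
}

static bool rtree_get(const rtree_t *t, uintptr_t addr) {
    size_t i1, i2;
    if (!rtree_key(addr, &i1, &i2))
        return false;
    return t->leaves[i1] != NULL && t->leaves[i1][i2] != 0;
}
```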

Reduce code size for inlined fast path.

The entire fast path code for allocation/deallocation is inlined, thanks to always-inline attributes on all relevant functions. However, there's quite a bit of code in cold branches within those functions, and that likely has a negative impact on machine code layout and icache locality. Move cold code into helper functions.
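A sketch of the split, assuming GCC or Clang (`__builtin_expect` and the `noinline`/`cold` attributes); the cache type and function names are hypothetical. The inlined function keeps only the common case, and the refill logic lives in an out-of-line cold helper:

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(__GNUC__)
#  define LIKELY(x)     __builtin_expect(!!(x), 1)
#  define COLD_NOINLINE __attribute__((noinline, cold))
#else
#  define LIKELY(x)     (x)
#  define COLD_NOINLINE
#endif

#define CACHE_SLOTS 8

/* Hypothetical per-thread cache of same-sized objects. */
typedef struct {
    void *slots[CACHE_SLOTS];
    size_t avail;
} tiny_cache_t;

/* Cold path, kept out of line: refills half the cache from the backend
 * (malloc here) so it does not bloat every inlined call site. */
COLD_NOINLINE static void *cache_alloc_slow(tiny_cache_t *c, size_t size) {
    while (c->avail < CACHE_SLOTS / 2) {
        void *p = malloc(size);
        if (p == NULL)
            break;
        c->slots[c->avail++] = p;
    }
    return c->avail > 0 ? c->slots[--c->avail] : NULL;
}

/* Hot path: small enough that inlining it everywhere is cheap. */
static inline void *cache_alloc(tiny_cache_t *c, size_t size) {
    if (LIKELY(c->avail > 0))
        return c->slots[--c->avail];
    return cache_alloc_slow(c, size);
}
```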

Reverse the cc-silence default.

Reverse the cc-silence default (replace --enable-cc-silence with --disable-cc-silence), so that by default people won't see spurious warnings when building jemalloc.

hash_variant_verify fails on PPC64 Linux

Hi,

On PPC64 I see:

=== test/unit/hash ===
hash_variant_verify:test/unit/hash.c:133: Failed assertion: (computed) == (expected) --> 1645424702 != 2968878819: hash_variant_verify
test_hash_x86_32: fail
hash_variant_verify:test/unit/hash.c:133: Failed assertion: (computed) == (expected) --> 644358346 != 3018647082: hash_variant_verify
test_hash_x86_128: fail
hash_variant_verify:test/unit/hash.c:133: Failed assertion: (computed) == (expected) --> 3428985711 != 1669642857: hash_variant_verify
test_hash_x64_128: fail
--- pass: 0/3, skip: 0/3, fail: 3/3 ---

On PPC the tests won't even compile:

In file included from test/src/SFMT.c:132:0:
test/include/test/SFMT-alti.h:64:1: error: duplicate 'static'
static vector unsigned int vec_recursion(vector unsigned int a,
^
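One plausible explanation for the PPC64 hash failures, not confirmed by the report, is byte order: PPC64 Linux was traditionally big-endian, and a hash that loads input blocks in native order produces different values there than the little-endian reference values. The hypothetical helpers below show the difference between a native load and an explicit little-endian load:

```c
#include <stdint.h>
#include <string.h>

/* Native-order load: what a hash gets if it simply copies four bytes
 * into a uint32_t. The result differs between little- and big-endian
 * hosts. */
static uint32_t load_native32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Explicit little-endian load: the same value on every host. */
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
        (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* True when native loads already match little-endian order. */
static int host_is_little_endian(void) {
    const uint8_t probe[4] = { 1, 0, 0, 0 };
    return load_native32(probe) == 1;
}
```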

Experiment with cache index randomization.

Experiment with cache index randomization, for small and large objects. Consider that with every large object aligned at a page boundary, hardware caches tend to get uneven use (set associativity only goes so far). Perhaps the most critical performance issue here is that jemalloc puts all small run headers at page boundaries, so jemalloc metadata structures stress the cache very unevenly. The fix is to put run headers in varied locations throughout runs, and to vary page offsets for large objects (perhaps even incurring the cost of one extra page of memory). For more motivation, see e.g. this message. For more information on the problem and a range of possible solutions, see "Cache Index-Aware Memory Allocation", by Afek, Dice, and Morrison (ISMM 2011).
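The large-object half of the idea can be sketched as choosing a random cache-line-aligned offset within one extra page. The page and cache line sizes are assumptions, and both function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE      ((size_t)4096)  /* assumption: 4 KiB pages */
#define CACHELINE ((size_t)64)    /* assumption: 64-byte cache lines */

/* Cache-line-aligned offset in [0, PAGE), so large objects stop all
 * starting at page boundaries and mapping to the same cache sets.
 * `rnd` is any random value, e.g. from a cheap per-thread PRNG. */
static size_t random_page_offset(uint64_t rnd) {
    return (size_t)(rnd % (PAGE / CACHELINE)) * CACHELINE;
}

/* Given the page-aligned base of a run sized usize + PAGE, return the
 * pointer handed to the application. */
static void *randomize_large(void *run_base, uint64_t rnd) {
    return (char *)run_base + random_page_offset(rnd);
}
```

With these sizes there are 64 possible starting offsets per object, paid for with at most one extra page per large allocation.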

Add stress tests.

Add stress tests that simulate common/challenging allocation patterns. For example:

  • Multi-threaded producer/consumer. The producer allocates jobs, and the consumers process them (perhaps doing some additional allocation) before deallocating some/all of the job data.
  • Multi-threaded large data structure initialization, followed by a different mode of operation.
  • Imbalanced allocator usage (some threads much more active than others).
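A sketch of the first pattern using POSIX threads: one producer allocates jobs of varied sizes, and several consumers allocate scratch memory and free each job on a different thread from the one that allocated it. All names are hypothetical; a real stress test would vary sizes and lifetimes more aggressively:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define QCAP 64
#define MAX_CONSUMERS 16

/* Bounded job queue guarded by one mutex and two condition variables. */
typedef struct {
    void *jobs[QCAP];
    size_t head, tail, count, consumed;
    int done;                              /* producer has finished */
    pthread_mutex_t mtx;
    pthread_cond_t not_empty, not_full;
} job_queue_t;

static void *consumer_main(void *arg) {
    job_queue_t *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->mtx);
        while (q->count == 0 && !q->done)
            pthread_cond_wait(&q->not_empty, &q->mtx);
        if (q->count == 0) {               /* producer done, queue drained */
            pthread_mutex_unlock(&q->mtx);
            return NULL;
        }
        void *job = q->jobs[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        q->consumed++;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->mtx);

        void *scratch = malloc(128);       /* allocation while processing */
        free(scratch);
        free(job);                         /* cross-thread deallocation */
    }
}

/* Returns jobs consumed (njobs on success), 0 if no consumer started. */
static size_t run_producer_consumer(size_t njobs, size_t nconsumers) {
    job_queue_t q = { .done = 0 };
    pthread_t tids[MAX_CONSUMERS];
    size_t started = 0;

    if (nconsumers > MAX_CONSUMERS)
        nconsumers = MAX_CONSUMERS;
    pthread_mutex_init(&q.mtx, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);
    while (started < nconsumers &&
        pthread_create(&tids[started], NULL, consumer_main, &q) == 0)
        started++;

    for (size_t i = 0; started > 0 && i < njobs; i++) {
        void *job = malloc(64 + (i % 8) * 32);   /* varied sizes */
        pthread_mutex_lock(&q.mtx);
        while (q.count == QCAP)
            pthread_cond_wait(&q.not_full, &q.mtx);
        q.jobs[q.tail] = job;
        q.tail = (q.tail + 1) % QCAP;
        q.count++;
        pthread_cond_signal(&q.not_empty);
        pthread_mutex_unlock(&q.mtx);
    }
    pthread_mutex_lock(&q.mtx);
    q.done = 1;
    pthread_cond_broadcast(&q.not_empty);
    pthread_mutex_unlock(&q.mtx);

    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.mtx);
    return q.consumed;
}
```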

Remove a0malloc(), a0calloc() and a0free().

The a0malloc(), a0calloc() and a0free() functions exist only so that FreeBSD can avoid recursion during TLS variable allocation, but the *allocx() API provides this functionality via the MALLOCX_ARENA() flag. Verify that *allocx() is a viable replacement, and remove these obsolete functions.

Experiment with per CPU arenas.

Experiment with using just one arena per CPU, and using sched_getcpu(3) or equivalent as a heuristic when choosing an arena for allocation that cannot be satisfied by the thread-specific cache. The known challenges include:

  • sched_getcpu(3) is Linux-specific. FreeBSD probably has the necessary infrastructure to implement an equivalent feature, but it does not currently exist.
  • Because this strategy for arena selection cannot be implemented on all platforms, the existing arena selection strategy will have to remain as an option.
  • If an application runs fewer threads than there are CPUs, per-CPU arenas are wasteful. It may be important to transition from one arena selection strategy to another as the number of threads increases.
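A sketch of the selection logic with the fallback and the thread-count transition described above. On Linux with glibc, the `cpu` argument could come from `sched_getcpu(3)` (declared in `<sched.h>` with `_GNU_SOURCE`), and -1 means "unknown"; `choose_arena()` and its parameters are hypothetical:

```c
#include <stddef.h>

/* `thread_index` models the existing per-thread assignment, which stays
 * available as the fallback strategy. */
static size_t choose_arena(size_t narenas, int cpu, size_t nthreads,
    size_t ncpus, size_t thread_index) {
    /* Use per-CPU selection only once there are at least as many threads
     * as CPUs; with fewer threads most per-CPU arenas would sit idle. */
    if (cpu >= 0 && nthreads >= ncpus)
        return (size_t)cpu % narenas;
    return thread_index % narenas;
}
```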

Does anyone know why jemalloc uses more memory on ARM?

When I cross-compile it and use it on ARM, the test program takes about 9M of memory (as reported by ps), whereas with the system malloc it takes about 1.5M.
Does anyone know why? Is jemalloc not recommended for use on ARM?

Build fails with PHP '--enable-maintainer-zts'

When compiling PHP with the '--enable-maintainer-zts' option, which requires libpthread, linking with jemalloc fails.

Error:
configure: error: Your system seems to lack POSIX threads.

Refactor tsd.

Experiment with refactoring the tsd code such that there is only one TLS variable that contains all TLS data. The advantage is that it may be possible to reduce the number of TLS lookups, but that's only advantageous if it's practical to pass the pointer through the entire fast path.
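A sketch of the consolidated layout using C11 `_Thread_local`: one TLS variable holds all thread data, and fast-path helpers receive the pointer instead of doing their own lookups. The `tsd_t` fields and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consolidated thread-specific data. */
typedef struct {
    void *tcache;          /* per-thread cache (opaque here) */
    unsigned arena_ind;    /* arena bound to this thread */
    uint64_t allocated;    /* bytes allocated by this thread */
    uint64_t deallocated;  /* bytes deallocated by this thread */
} tsd_t;

static _Thread_local tsd_t tsd_tls;

/* The single TLS address computation on the fast path. */
static tsd_t *tsd_fetch(void) {
    return &tsd_tls;
}

/* Helpers take tsd explicitly rather than re-reading TLS. */
static void stats_note_alloc(tsd_t *tsd, size_t usize) {
    tsd->allocated += usize;
}

static void stats_note_dalloc(tsd_t *tsd, size_t usize) {
    tsd->deallocated += usize;
}
```

The benefit depends on the pointer actually being threaded through the call chain; if a helper re-fetches it, the lookup count is back where it started.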

Refactor to allow white box testing

Refactor the testing infrastructure to enable white box unit testing. For an example of how tests are currently thwarted, see Mike Hommey's tsd test. There are several related problems:

  • Tests may depend on special build configuration (e.g. no inlining so that internal functions can be overridden).
  • Tests may apply only to particular configurations, e.g. Valgrind integration.
  • Tests may apply only to particular platforms, e.g. allocation within TLS internals during thread destruction.
  • Running the full set of tests may require iteration over numerous configurations.
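One common seam for white box tests, sketched with hypothetical names: the internal function is called through a pointer that a test can redirect, which avoids needing a special no-inlining build just so the symbol can be overridden:

```c
#include <stddef.h>

/* Real implementation (pretend: returns the number of 4 KiB pages purged). */
static size_t pages_purge_impl(void *addr, size_t size) {
    (void)addr;
    return size / 4096;
}

/* Test seam: production code calls through this pointer. */
static size_t (*pages_purge_hook)(void *, size_t) = pages_purge_impl;

static size_t arena_purge(void *addr, size_t size) {
    return pages_purge_hook(addr, size);
}

/* Example test double a white-box test could install. */
static size_t fake_calls;
static size_t pages_purge_fake(void *addr, size_t size) {
    (void)addr;
    (void)size;
    fake_calls++;
    return 0;
}
```

The cost is an indirect call on that path, which is one reason to compile such seams in only for test builds.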

Port heap profiling to FreeBSD.

Port heap profiling to FreeBSD. This is very close to working already; the only issue is that the heap map data need to be synthesized in a format that pprof can digest.
