jemalloc / jemalloc
Home Page: http://jemalloc.net/
License: Other
jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. jemalloc first came into use as the FreeBSD libc allocator in 2005, and since then it has found its way into numerous applications that rely on its predictable behavior. In 2010 jemalloc development efforts broadened to include developer support features such as heap profiling and extensive monitoring/tuning hooks. Modern jemalloc releases continue to be integrated back into FreeBSD, and therefore versatility remains critical. Ongoing development efforts trend toward making jemalloc among the best allocators for a broad range of demanding applications, and eliminating/mitigating weaknesses that have practical repercussions for real world applications.
The COPYING file contains copyright and licensing information.
The INSTALL file contains information on how to configure, build, and install jemalloc.
The ChangeLog file contains a brief summary of changes for each release.
Remove the "opt.valgrind" mallctl, because it is not needed (jemalloc automatically detects whether it's running inside Valgrind).
Hi, hope this isn't something completely obvious I'm missing. When building 3.5.0 on i386, it seems like HAVE_SSE2 gets defined when it shouldn't be, which causes this:
```
gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -std=gnu99 -fvisibility=hidden -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_UNIT_TEST -Itest/include -Itest/include -o test/src/math.unit.o test/src/math.c
In file included from test/include/test/jemalloc_test.h:58:0,
                 from test/src/math.c:2:
/usr/lib/gcc/i686-linux-gnu/4.8/include/emmintrin.h:31:3: error: #error "SSE2 instruction set not enabled"
   ^
```
I looked at configure.ac and it doesn't seem like HAVE_SSE2 should be getting defined for i386, but I guess maybe it is somewhere. Here's a full build output from Launchpad:
This wasn't an issue on earlier releases; as far as I can see, they contain no references to SSE2 or to emmintrin.h.
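A minimal sketch of one way to keep such code buildable on i386, assuming the SSE2 path can be confined to a single helper: include <emmintrin.h> only when the compiler itself predefines `__SSE2__` (GCC and Clang do so under -msse2 and by default on x86-64), rather than relying on a configure-time HAVE_SSE2 guess. `sum_u32` and `SUM_USES_SSE2` are hypothetical names, not jemalloc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only pull in SSE2 intrinsics when the compiler has SSE2 enabled for this
 * translation unit; otherwise <emmintrin.h> fails with
 * "SSE2 instruction set not enabled" on i386. */
#if defined(__SSE2__)
#  include <emmintrin.h>
#  define SUM_USES_SSE2 1
#else
#  define SUM_USES_SSE2 0
#endif

/* Hypothetical helper: sum an array of 32-bit values (mod 2^32). */
static uint32_t sum_u32(const uint32_t *v, size_t n) {
    uint32_t total = 0;
    size_t i = 0;
#if SUM_USES_SSE2
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        /* Unaligned load of four lanes, lane-wise 32-bit add. */
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(v + i)));
    }
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail, or the whole array when SSE2 is unavailable. */
    for (; i < n; i++)
        total += v[i];
    return total;
}
```

Both paths produce the same result, so the build flags decide only speed, never correctness.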
Right now, huge allocations always use the default dss precedence setting to determine whether to use mmap() or sbrk(). Instead, consider using the dss precedence setting from the arena that would have serviced the allocation request, had it not exceeded the maximum arena size class. Absent this change, applications are faced with having to explicitly take jemalloc size classes into account when segregating dss/heap allocations.
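A sketch of the proposed selection logic, assuming a per-arena precedence with three settings (never use sbrk(), prefer it, or use it only as a fallback). The type and enumerator names are hypothetical stand-ins, not jemalloc's internal definitions; the point is only that a huge allocation consults the arena that would have serviced it rather than a global default.

```c
#include <assert.h>

/* Hypothetical mirror of a per-arena dss precedence setting. */
typedef enum {
    DSS_PREC_DISABLED,   /* never use sbrk() */
    DSS_PREC_PRIMARY,    /* try sbrk() first, fall back to mmap() */
    DSS_PREC_SECONDARY   /* try mmap() first, fall back to sbrk() */
} dss_prec_t;

typedef enum { SRC_NONE, SRC_DSS, SRC_MMAP } chunk_src_t;

/* Hypothetical arena: only the field this sketch needs. */
typedef struct {
    dss_prec_t dss_prec;
} arena_t;

/* Return the backend to try on the given 0-based attempt for a huge
 * allocation, using the precedence of the arena that would have serviced a
 * smaller request. SRC_NONE means no backends remain. */
static chunk_src_t huge_chunk_source(const arena_t *arena, int attempt) {
    switch (arena->dss_prec) {
    case DSS_PREC_DISABLED:
        return attempt == 0 ? SRC_MMAP : SRC_NONE;
    case DSS_PREC_PRIMARY:
        return attempt == 0 ? SRC_DSS : attempt == 1 ? SRC_MMAP : SRC_NONE;
    case DSS_PREC_SECONDARY:
        return attempt == 0 ? SRC_MMAP : attempt == 1 ? SRC_DSS : SRC_NONE;
    }
    return SRC_NONE;
}
```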
Add atomic.h support for powerpc 32/64 architectures. This is needed when building with old versions of gcc that lack the necessary intrinsics.
The heap profiling code substantially complicates the public API functions, to a degree that changes to these functions are an inherently high regression risk. Figure out how to better encapsulate the heap profiling complexity.
Implement heap profiling on OS X. This will require synthesizing the heap map data in a format that pprof can understand, or alternatively implementing symbolification and emitting what pprof calls 'raw' heap profiles.
jemalloc has no true sense of time, but it effectively measures time in units of allocation events. In particular, the tcache code uses an event counter to drive its incremental garbage collection, but unfortunately this is only a weak proxy for what tcache really wants to know -- wall time elapsed since a previous event. Furthermore, the unused dirty page purging code is in dire need of a sense of time so that hysteresis can be incorporated into the purge rate.
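A minimal sketch of wall-time-driven incremental GC for a thread cache, assuming one-second granularity from time(2) is adequate. The struct and function names are hypothetical. The current time is passed in, so a caller can read the clock once per slow-path event and tests can inject values.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-thread GC state driven by wall time instead of an
 * allocation-event counter. */
typedef struct {
    time_t   last_gc;    /* time of the previous incremental GC pass */
    unsigned interval_s; /* minimum seconds between passes */
} tcache_gc_state_t;

/* Returns true, and records `now`, if at least interval_s seconds have
 * elapsed since the last pass; returns false otherwise. */
static bool tcache_gc_due(tcache_gc_state_t *st, time_t now) {
    if (now - st->last_gc >= (time_t)st->interval_s) {
        st->last_gc = now;
        return true;
    }
    return false;
}
```

In use, `tcache_gc_due(&st, time(NULL))` would be called from the tcache slow path. Subtracting `time_t` values directly assumes an integral `time_t`, as on POSIX systems.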
Experiment with clocks that actually track wall time (one second granularity is likely good enough for intended purposes), determine how expensive they are to use, and develop strategies for using them where needed to the fullest extent possible. Data points: time(2) is a vsyscall on Linux 3.2, and gettimeofday() is a vsyscall on Linux 2.6.32. Assuming that clock overhead is manageably low across all operating systems of interest, use clocks to:
Each arena chunk header contains a map with one entry per page that the chunk manages. Right now the map is stored as a single array of arena_chunk_map_t elements, but many code paths access only one field (bits) within the arena_chunk_map_t. Experiment with breaking the map into separate arrays for each field, with an eye toward improving cache locality.
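To illustrate the layout change, here is a hypothetical map in both forms. The field names and the 256-page chunk are assumptions; only `bits` mirrors a field named in the text. In the array-of-structs form a scan over `bits` pulls every other field of each element into cache, while in the struct-of-arrays form it touches only a dense array of `size_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_NPAGES 256 /* hypothetical pages per chunk */

/* Array-of-structs: one element per page, all fields interleaved. */
typedef struct {
    size_t   bits;
    uint32_t other[4]; /* stand-in for the remaining map fields */
} map_elm_aos_t;

typedef struct {
    map_elm_aos_t map[MAP_NPAGES];
} chunk_aos_t;

/* Struct-of-arrays: each field in its own dense array. */
typedef struct {
    size_t   bits[MAP_NPAGES];
    uint32_t other[MAP_NPAGES][4];
} chunk_soa_t;

/* Count pages whose bits intersect `mask` (e.g. a hypothetical flag). */
static size_t count_flagged_aos(const chunk_aos_t *c, size_t mask) {
    size_t n = 0;
    for (size_t i = 0; i < MAP_NPAGES; i++)
        n += (c->map[i].bits & mask) != 0; /* stride: sizeof(map_elm_aos_t) */
    return n;
}

static size_t count_flagged_soa(const chunk_soa_t *c, size_t mask) {
    size_t n = 0;
    for (size_t i = 0; i < MAP_NPAGES; i++)
        n += (c->bits[i] & mask) != 0;     /* stride: sizeof(size_t) */
    return n;
}
```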
jemalloc 3.4.1 does not compile under mingw32 4.8.1:
- ffs() is not defined
- ffsl() is not defined
- printf("%z", size_t) is incorrect
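A sketch of portable replacements, assuming GCC or Clang (both provide `__builtin_ffs` and `__builtin_ffsl` on every target) with a bit-scanning loop for other compilers. The wrapper names are hypothetical. For the third item, "%z" is only a length modifier; `size_t` is printed with "%zu", although older mingw32 runtimes (msvcrt) may not implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns the 1-based index of the least significant set bit, or 0 if
 * x == 0, matching the semantics of ffsl(). */
static int portable_ffsl(long x) {
#if defined(__GNUC__)
    return __builtin_ffsl(x);
#else
    unsigned long ux = (unsigned long)x;
    int i;
    if (ux == 0)
        return 0;
    for (i = 1; (ux & 1UL) == 0; i++)
        ux >>= 1;
    return i;
#endif
}

static int portable_ffs(int x) {
#if defined(__GNUC__)
    return __builtin_ffs(x);
#else
    /* Sign extension does not move the lowest set bit. */
    return portable_ffsl((long)x);
#endif
}

/* size_t needs the complete "%zu" conversion, not "%z". */
static int format_size(char *buf, size_t len, size_t sz) {
    return snprintf(buf, len, "%zu", sz);
}
```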
Remove the "arenas.purge" mallctl, which was obsoleted by the "arena.<i>.purge" in 3.1.0.
The INSTALL doc mentions a build_lib target, but Makefile.in only has a build target. Not sure which way it should be.
Google proposed a C++1y extension to allow the delete operator to pass the size of the freed object to the allocator, when it is known: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3536.html. There's an upstream patch to implement this that Google has integrated into its own GCC branch: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00809.html.
It'd be interesting to have jemalloc provide a similar API that would avoid the need to look up the slab size of the freed object.
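The idea can be shown with a toy allocator, not jemalloc's design: when the caller passes the size to free, the size class is computed with arithmetic instead of being looked up from allocator metadata. All names, the 16-byte quantum, and the eight size classes are hypothetical. (Later jemalloc releases expose a sized deallocation function, sdallocx().)

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define QUANTUM  16
#define NCLASSES 8 /* classes of 16, 32, ..., 128 bytes */

typedef struct free_obj { struct free_obj *next; } free_obj_t;
static free_obj_t *free_lists[NCLASSES];

/* Sizes 1..16 map to class 0, 17..32 to class 1, and so on. */
static size_t size2index(size_t size) {
    return (size + QUANTUM - 1) / QUANTUM - 1;
}

static void *toy_malloc(size_t size) {
    assert(size > 0 && size <= QUANTUM * NCLASSES);
    size_t ind = size2index(size);
    free_obj_t *obj = free_lists[ind];
    if (obj != NULL) {
        free_lists[ind] = obj->next; /* reuse a cached object */
        return obj;
    }
    return malloc((ind + 1) * QUANTUM); /* backing store */
}

/* Sized free: the class comes from `size`, with no metadata lookup.
 * `size` must fall in the same class as the original request. */
static void toy_free_sized(void *ptr, size_t size) {
    size_t ind = size2index(size);
    free_obj_t *obj = ptr;
    obj->next = free_lists[ind];
    free_lists[ind] = obj;
}
```

Passing a size from a different class than the allocation would put the object on the wrong free list, which is why such an API has to require the original request size (or one in the same class).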
Experiment with heap profiling fast path optimizations that further reduce heap profiling overhead. PROF_ALLOC_PREP() has several branches that clever refactoring may obsolete.
Recording what we talked about over email:
(1) Would be helpful to have thread cache size statistics. You mentioned this might be a part of the work.
(2) I mentioned that it's a bit difficult to figure out the amount of memory lost to fragmentation on a per-size basis. I'm wondering if there are ways to present the statistics that might be more helpful to application developers.
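One possible presentation, sketched with hypothetical per-size-class counters: report, for each class, the bytes sitting in free slots of partially used runs, plus a utilization percentage. Field names are illustrative, and this view deliberately ignores rounding waste inside each region, because the counters assumed here do not record requested sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-size-class counters. */
typedef struct {
    size_t reg_size; /* bytes per region in this size class */
    size_t nregs;    /* regions per run */
    size_t curruns;  /* runs currently allocated for this class */
    size_t curregs;  /* regions currently in use */
} bin_stats_t;

/* Bytes held in runs of this class that back no live allocation. */
static size_t bin_unused_bytes(const bin_stats_t *b) {
    return (b->curruns * b->nregs - b->curregs) * b->reg_size;
}

/* Utilization in percent, rounded down; 100 means every slot is in use. */
static unsigned bin_utilization_pct(const bin_stats_t *b) {
    size_t slots = b->curruns * b->nregs;
    return slots == 0 ? 100u : (unsigned)(b->curregs * 100 / slots);
}
```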
Hello,
We build a lot of our code on older machines with ancient gcc's. The following problem showed up when I tried to upgrade jemalloc to 3.5.0:
```
ccache gcc -ggdb -isystem /data/plex-dependency-builder/output/pms-depends-linux-ubuntu-x86_64-release-08838f5/include -msse -fno-stack-protector -fvisibility=hidden -fPIC -DPIC -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ctl.pic.o src/ctl.c
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/atomic.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/chunk_dss.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/atomic.pic.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [src/chunk_dss.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/base.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/base.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/chunk.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/chunk.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/jemalloc.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/ckh.c:39:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/jemalloc.pic.o] Error 1
make: *** [src/ckh.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/bitmap.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/bitmap.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/ctl.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
make: *** [src/ctl.pic.o] Error 1
In file included from include/jemalloc/internal/jemalloc_internal.h:538,
                 from src/chunk_mmap.c:3:
include/jemalloc/internal/util.h:88: error: expected ';', ',' or ')' before 'nptr'
```
Seems like restrict is not defined for these systems. Should this be a config test or something similar?
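A minimal sketch of a compile-time fallback, assuming the failing builds use a pre-C99 dialect (the compile command above passes no -std=gnu99 flag). GCC accepts the `__restrict__` spelling in every mode, so the keyword can be mapped onto it, or dropped for other compilers. Autoconf's AC_C_RESTRICT macro performs a comparable probe at configure time. `parse_ulong` is a hypothetical function with a strtoumax()-style restrict-qualified `nptr` parameter.

```c
#include <assert.h>
#include <stddef.h>

/* `restrict` is a C99 keyword; C89/gnu89 compilers reject it. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#  if defined(__GNUC__)
#    define restrict __restrict__
#  else
#    define restrict /* unsupported: drop the qualifier */
#  endif
#endif

/* Parse leading decimal digits; *endptr receives the first unparsed char. */
static unsigned long parse_ulong(const char *restrict nptr,
                                 char **restrict endptr) {
    unsigned long v = 0;
    while (*nptr >= '0' && *nptr <= '9')
        v = v * 10 + (unsigned long)(*nptr++ - '0');
    if (endptr != NULL)
        *endptr = (char *)nptr;
    return v;
}
```

Dropping the qualifier is always safe, since `restrict` only grants optimization freedom and never changes the meaning of a correct program.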
The current hash function in use is MurmurHash3, for which validation tests exist. Make sure that the integrated code actually generates correct hashes.
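A sketch of such a validation, assuming the reference MurmurHash3_x86_32 algorithm and the SMHasher verification procedure: hash the keys {}, {0}, {0, 1}, ..., {0, ..., 254} with seed 256 - i, concatenate the 32-bit results little-endian, and hash that buffer with seed 0. SMHasher lists 0xB0F57EE3 as the expected result. Blocks are read and results written as explicit little-endian bytes, so the check gives the same answer on big-endian hosts. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

/* Read a 32-bit block as little-endian regardless of host byte order. */
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t fmix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

static uint32_t murmur3_x86_32(const void *key, size_t len, uint32_t seed) {
    const uint8_t *data = key;
    const size_t nblocks = len / 4;
    const uint32_t c1 = 0xcc9e2d51U, c2 = 0x1b873593U;
    uint32_t h1 = seed, k1;

    for (size_t i = 0; i < nblocks; i++) {
        k1 = load_le32(data + 4 * i);
        k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2;
        h1 ^= k1; h1 = rotl32(h1, 13); h1 = h1 * 5 + 0xe6546b64U;
    }

    const uint8_t *tail = data + 4 * nblocks;
    k1 = 0;
    switch (len & 3) {
    case 3: k1 ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k1 ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k1 ^= tail[0];
        k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h1 ^= k1;
    }

    h1 ^= (uint32_t)len;
    return fmix32(h1);
}

/* SMHasher-style verification value for the function above. */
static uint32_t murmur3_x86_32_verification(void) {
    uint8_t key[256], hashes[4 * 256];
    for (int i = 0; i < 256; i++) {
        key[i] = (uint8_t)i; /* key of length i holds 0..i-1 */
        uint32_t h = murmur3_x86_32(key, (size_t)i, (uint32_t)(256 - i));
        hashes[4 * i + 0] = (uint8_t)h;
        hashes[4 * i + 1] = (uint8_t)(h >> 8);
        hashes[4 * i + 2] = (uint8_t)(h >> 16);
        hashes[4 * i + 3] = (uint8_t)(h >> 24);
    }
    return murmur3_x86_32(hashes, sizeof(hashes), 0);
}
```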
Embed a minimal web server (naturally listening on port 7469 by default) to support the heap profiling end points that pprof uses. Consider also exposing all relevant mallctl() functionality.
As reported by David Abdurachmanov, some versions of glibc do not properly save/restore floating point state when calling into the allocator, which can cause state corruption as a side effect of dynamic lazy loading. Provide a workaround: make it possible to compile jemalloc with floating point support completely disabled. This will require that heap profiling is disabled, and the body of prof_sample_threshold_update() will have to be wrapped in #ifdef JEMALLOC_PROF.
Use gcov to compute code coverage for the test suite.
Implement a set of test utilities that will allow succinct tests that mimic common allocation patterns. For example:
Refactor prof_dump() to use a two-pass algorithm, such that string formatting and write(2) calls happen outside the critical section.
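A minimal sketch of the two-pass structure, with a hypothetical table of per-site byte counts standing in for the real profile data and a caller-supplied buffer standing in for the file that write(2) would target: pass one copies raw counters under the mutex, and pass two formats them after it is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SITES 64
typedef struct { unsigned long site_id; size_t curbytes; } prof_site_t;

static pthread_mutex_t prof_mtx = PTHREAD_MUTEX_INITIALIZER;
static prof_site_t prof_sites[MAX_SITES];
static size_t prof_nsites;

/* Returns the length of the formatted output (truncated to outlen - 1). */
static size_t prof_dump_two_pass(char *out, size_t outlen) {
    prof_site_t snap[MAX_SITES];
    size_t n;

    assert(outlen > 0);
    out[0] = '\0';

    /* Pass 1: snapshot under the lock; no formatting, no I/O. */
    pthread_mutex_lock(&prof_mtx);
    n = prof_nsites;
    memcpy(snap, prof_sites, n * sizeof(snap[0]));
    pthread_mutex_unlock(&prof_mtx);

    /* Pass 2: format the snapshot with the lock released. */
    size_t off = 0;
    for (size_t i = 0; i < n && off < outlen; i++) {
        int w = snprintf(out + off, outlen - off, "%lu: %zu\n",
                         snap[i].site_id, snap[i].curbytes);
        if (w < 0)
            break;
        off += (size_t)w;
    }
    return off < outlen ? off : outlen - 1;
}
```

The cost is a snapshot copy, here a fixed-size stack array; a real implementation would need to size or allocate it to match the table.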
Experiment with small run dirty page purging -- tracking which pages within active small runs are unused and dirty. The motivation for this is to enable "medium" size classes in the [4 KiB .. 16 KiB) range that are not even multiples of 4 KiB, and thereby reduce worst-case internal fragmentation (a 4097-byte allocation currently consumes 8 KiB).
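The arithmetic behind the 4097-byte example, plus the effect of a hypothetical set of medium classes spaced 1 KiB apart in [4 KiB, 16 KiB). The spacing is an assumption for illustration; with it, the waste for a 4097-byte request drops from about 50% to about 20% of the space consumed.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE ((size_t)4096)

/* Current behaviour: round up to a whole number of pages. */
static size_t page_round(size_t size) {
    return (size + PAGE - 1) & ~(PAGE - 1);
}

/* Hypothetical medium classes spaced 1 KiB apart (an assumption). */
#define MEDIUM_MIN     PAGE
#define MEDIUM_MAX     (4 * PAGE)
#define MEDIUM_SPACING ((size_t)1024)

static size_t medium_round(size_t size) {
    assert(size > MEDIUM_MIN && size < MEDIUM_MAX);
    return (size + MEDIUM_SPACING - 1) & ~(MEDIUM_SPACING - 1);
}

/* Wasted bytes as a percentage of the space consumed, rounded down. */
static unsigned waste_pct(size_t request, size_t usable) {
    return (unsigned)((usable - request) * 100 / usable);
}
```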
Much of the bookkeeping can be done under the protection of bin locks, but the ndirty counters in the chunk headers and arenas will probably need to be protected by atomic operations (currently protected by arena locks).
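A sketch of the counter change using C11 <stdatomic.h> as a stand-in for jemalloc's own atomic.h; the struct and function names are hypothetical. Relaxed ordering assumes the counters serve only as purge heuristics and do not guard other data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counters updated by code that holds only a bin lock. */
typedef struct { atomic_size_t ndirty; } chunk_hdr_t;
typedef struct { atomic_size_t ndirty; } arena_ctr_t;

static void pages_dirtied(chunk_hdr_t *chunk, arena_ctr_t *arena,
                          size_t npages) {
    atomic_fetch_add_explicit(&chunk->ndirty, npages, memory_order_relaxed);
    atomic_fetch_add_explicit(&arena->ndirty, npages, memory_order_relaxed);
}

static void pages_purged(chunk_hdr_t *chunk, arena_ctr_t *arena,
                         size_t npages) {
    atomic_fetch_sub_explicit(&chunk->ndirty, npages, memory_order_relaxed);
    atomic_fetch_sub_explicit(&arena->ndirty, npages, memory_order_relaxed);
}
```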
The *allocm() API has been obsoleted by the *allocx() API, and can be deleted.
Remove the --enable-dss configure option -- always build dss support on supported platforms.
The prof_accum test is unreliable because the compiler can cause failure if it inlines and/or optimizes tail recursion. Break the test into multiple compilation units to reliably thwart undesirable optimizations.
Refactor profiling code so that arena_prof_ctx_set() receives usize as an argument. This will allow the following:
if ((mapbits & CHUNK_MAP_LARGE) == 0) {
to be rewritten as:
if (usize <= SMALL_MAXCLASS) {
The latter avoids reading mapbits from the chunk header under normal circumstances (prof_promote is typically true).
Reduce rtree memory usage by storing booleans (1 byte each) rather than pointers. The rtree code is only used to record whether jemalloc manages a chunk of memory, so there's no need to store pointers in the rtree.
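A minimal sketch of a two-level radix tree whose leaves hold one byte per chunk rather than one pointer, so leaf arrays shrink by a factor of sizeof(void *). The parameters are assumptions: 4 MiB chunks and a 32-bit address space, leaving 10 key bits split 5/5. The names are hypothetical; `rtree_set` returns true on allocation failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define LG_CHUNK 22 /* hypothetical 4 MiB chunks */
#define LG_ADDR  32 /* hypothetical 32-bit address space */
#define L1_BITS  5
#define L2_BITS  5
_Static_assert(L1_BITS + L2_BITS == LG_ADDR - LG_CHUNK, "key bits");

typedef struct {
    uint8_t *leaves[1u << L1_BITS]; /* lazily allocated byte leaves */
} rtree_t;

static uintptr_t rtree_key(uintptr_t addr) {
    return (addr >> LG_CHUNK) & (((uintptr_t)1 << (L1_BITS + L2_BITS)) - 1);
}

static bool rtree_set(rtree_t *t, uintptr_t addr, bool managed) {
    uintptr_t key = rtree_key(addr);
    uintptr_t i1 = key >> L2_BITS, i2 = key & ((1u << L2_BITS) - 1);
    if (t->leaves[i1] == NULL) {
        t->leaves[i1] = calloc(1u << L2_BITS, sizeof(uint8_t));
        if (t->leaves[i1] == NULL)
            return true; /* out of memory */
    }
    t->leaves[i1][i2] = managed ? 1 : 0;
    return false;
}

static bool rtree_get(const rtree_t *t, uintptr_t addr) {
    uintptr_t key = rtree_key(addr);
    const uint8_t *leaf = t->leaves[key >> L2_BITS];
    return leaf != NULL && leaf[key & ((1u << L2_BITS) - 1)] != 0;
}
```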
The entire fast path code for allocation/deallocation is inlined, thanks to always-inline attributes on all relevant functions. However, there's quite a bit of code in cold branches within those functions, and that likely has a negative impact on machine code layout and icache locality. Move cold code into helper functions.
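A sketch of the pattern with a hypothetical thread cache: the always-inlined fast path keeps only the common branch, and the refill logic moves to an out-of-line helper marked with GCC's `noinline` and `cold` attributes so the compiler can place it away from hot code. The cache layout and sizes are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__GNUC__)
#  define LIKELY(x)     __builtin_expect(!!(x), 1)
#  define COLD          __attribute__((noinline, cold))
#  define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#  define LIKELY(x)     (x)
#  define COLD
#  define ALWAYS_INLINE inline
#endif

#define CACHE_MAX 8
typedef struct { void *objs[CACHE_MAX]; unsigned n; } tcache_t;

/* Cold path, kept out of line: refill half the cache from malloc(). */
static COLD void *tcache_alloc_slow(tcache_t *tc, size_t size) {
    while (tc->n < CACHE_MAX / 2) {
        void *p = malloc(size);
        if (p == NULL)
            break;
        tc->objs[tc->n++] = p;
    }
    return tc->n > 0 ? tc->objs[--tc->n] : NULL;
}

/* Hot path: one predicted branch and an array pop. */
static ALWAYS_INLINE void *tcache_alloc(tcache_t *tc, size_t size) {
    if (LIKELY(tc->n > 0))
        return tc->objs[--tc->n];
    return tcache_alloc_slow(tc, size);
}
```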
Reverse the cc-silence default (replace --enable-cc-silence with --disable-cc-silence), so that by default people won't see spurious warnings when building jemalloc.
Hi,
On PPC64 I see:
```
=== test/unit/hash ===
hash_variant_verify:test/unit/hash.c:133: Failed assertion: (computed) == (expected) --> 1645424702 != 2968878819: hash_variant_verify
test_hash_x86_32: fail
hash_variant_verify:test/unit/hash.c:133: Failed assertion: (computed) == (expected) --> 644358346 != 3018647082: hash_variant_verify
test_hash_x86_128: fail
hash_variant_verify:test/unit/hash.c:133: Failed assertion: (computed) == (expected) --> 3428985711 != 1669642857: hash_variant_verify
test_hash_x64_128: fail
--- pass: 0/3, skip: 0/3, fail: 3/3 ---
```
On PPC the tests won't even compile:
```
In file included from test/src/SFMT.c:132:0:
test/include/test/SFMT-alti.h:64:1: error: duplicate 'static'
 static vector unsigned int vec_recursion(vector unsigned int a,
 ^
```
clang on OS X Mavericks complains that sbrk() is deprecated; automatically disable sbrk() on OS X.
Experiment with cache index randomization, for small and large objects. Consider that with every large object aligned at a page boundary, hardware caches tend to get uneven use (set associativity only goes so far). Perhaps the most critical performance issue here is that jemalloc puts all small run headers at page boundaries, so jemalloc metadata structures stress the cache very unevenly. The fix is to put run headers in varied locations throughout runs, and to vary page offsets for large objects (perhaps even incur cost of one extra page of memory). For more motivation, see e.g. this message. For more information on the problem and a range of possible solutions, see "Cache Index-Aware Memory Allocation", by Afek, Dice, and Morrison (ISMM 2011).
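A sketch of the large-object part, assuming 4 KiB pages and 64-byte cache lines: each object starts at a random cache-line multiple within the first page of a run that is one page larger than needed. The random source is left to the caller, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE      ((size_t)4096)
#define CACHELINE ((size_t)64)

/* Offset in [0, PAGE - CACHELINE], always a multiple of CACHELINE. */
static size_t large_offset(uint64_t rnd) {
    return (size_t)(rnd % (PAGE / CACHELINE)) * CACHELINE;
}

/* Run size that leaves `size` usable bytes after any such offset: the
 * page-rounded size plus one extra page. */
static size_t large_run_size(size_t size) {
    size_t rounded = (size + PAGE - 1) & ~(PAGE - 1);
    return rounded + PAGE;
}
```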
Add stress tests that simulate common/challenging allocation patterns. For example:
The a0malloc(), a0calloc() and a0free() functions exist only so that FreeBSD can avoid recursion during TLS variable allocation, but the *allocx() API provides this functionality via the MALLOCX_ARENA() flag. Verify that *allocx() is a viable replacement, and remove these obsolete functions.
Do redzone validation when inserting into quarantine, but only if Valgrind isn't active (in which case the validation would be redundant).
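A minimal sketch of the check, assuming a layout of [redzone][object][redzone], a 16-byte redzone, and a 0xa5 fill pattern; all three are assumptions, as are the function names. The Valgrind flag is passed in rather than detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REDZONE_SIZE 16
#define REDZONE_BYTE 0xa5 /* hypothetical fill pattern */

/* `base` points at the leading redzone; true if both redzones are intact. */
static bool redzones_intact(const uint8_t *base, size_t usize) {
    const uint8_t *trail = base + REDZONE_SIZE + usize;
    for (size_t i = 0; i < REDZONE_SIZE; i++) {
        if (base[i] != REDZONE_BYTE || trail[i] != REDZONE_BYTE)
            return false;
    }
    return true;
}

/* Hypothetical quarantine insertion hook: skip the check under Valgrind,
 * where it would be redundant. */
static bool quarantine_check(const uint8_t *base, size_t usize,
                             bool in_valgrind) {
    if (in_valgrind)
        return true;
    return redzones_intact(base, usize);
}
```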
Experiment with using just one arena per CPU, and using sched_getcpu(3) or equivalent as a heuristic when choosing an arena for allocation that cannot be satisfied by the thread-specific cache. The known challenges include:
When I cross-compile it and use it on ARM, the test program takes about 9 MB of memory (according to ps); with the system malloc, ps reports about 1.5 MB.
Does anyone know why? Is jemalloc not recommended for use on ARM?
When compiling PHP with the '--enable-maintainer-zts' option, which requires libpthread, linking with jemalloc fails.
Error:
configure: error: Your system seems to lack POSIX threads.
Experiment with refactoring the tsd code such that there is only one TLS variable that contains all TLS data. The advantage is that it may be possible to reduce the number of TLS lookups, but that's only advantageous if it's practical to pass the pointer through the entire fast path.
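A sketch of the consolidated layout using C11 `_Thread_local`; the struct fields and function names are hypothetical. The fast path pays for one TLS access in `tsd_fetch()`, and callees receive the pointer instead of performing their own lookups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: every per-thread field in one struct. */
typedef struct {
    void    *tcache;          /* per-thread cache */
    unsigned arena_ind;       /* preferred arena */
    size_t   prof_accumbytes; /* bytes since the last profiling sample */
    int      initialized;
} tsd_t;

static _Thread_local tsd_t tsd_tls;

static tsd_t *tsd_fetch(void) {
    tsd_t *tsd = &tsd_tls; /* the single TLS access */
    if (!tsd->initialized) {
        tsd->arena_ind = 0;
        tsd->initialized = 1;
    }
    return tsd;
}

/* Callees take the pointer rather than touching TLS again. */
static void account_alloc(tsd_t *tsd, size_t usize) {
    tsd->prof_accumbytes += usize;
}
```

The benefit depends on the pointer actually being threaded through the whole fast path; any callee that re-fetches it gives back the saved lookup.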
Refactor the testing infrastructure to enable white box unit testing. For an example of how tests are currently thwarted, see Mike Hommey's tsd test. There are several related problems:
Port heap profiling to FreeBSD. This is very close to working already, the only issue being that the heap map data need to be synthesized in a format that pprof can digest.