Hardened allocator designed for modern systems. It has integration into Android's Bionic libc and can be used externally with musl and glibc as a dynamic library for use on other Linux-based platforms. It will gain more portability / integration over time.
One option to keep it simple is having 2 separate reserved regions and moving the table between them every time it grows, rather than growing it in place. That way, the security property of completely isolated metadata can be provided by not reusing memory used for large allocations for the hash table and vice versa.
These are purged of pages so they aren't hot, and they're memory protected so keeping them that way for as long as possible will generally improve security.
It will add an extra system call with synchronization when it fails, so it isn't necessarily a good idea at the smallest sizes if it almost always fails in practice. The Linux mmap heap uses best-fit and grows downwards which tends to eliminate common cases where this can actually work.
It's also unclear if in-place growth is a net positive or negative for security, similar to in-place shrinking. It's probably best to focus only on the performance aspects of this for now because there are too many variables when it comes to security. A large virtual memory quarantine could potentially turn avoiding in-place growth into a security feature so there may end up being an option to disable the optimizations.
PDEP allows selecting the nth unset bit efficiently (a couple cycles) so it's a fantastic way of implementing this. There's no clear way to do it at all efficiently elsewhere, which is why the current portable implementation only randomizes the search start index and then uses the ffs intrinsic.
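A sketch of both approaches mentioned above (function name is illustrative): with BMI2, PDEP deposits the single bit `1 << n` into the zero positions of the bitmap, and a trailing-zero count recovers its index, all in a few cycles. The portable fallback shown here is a plain linear scan, which is the kind of thing the randomized-start-plus-ffs approach avoids paying on every call.

```c
#include <stdint.h>

#ifdef __BMI2__
#include <immintrin.h>
/* index of the nth (0-based) zero bit: deposit 1<<n into the zero
   positions of the bitmap, then count trailing zeros */
static unsigned nth_unset_bit(uint64_t bitmap, unsigned n) {
    return __builtin_ctzll(_pdep_u64(1ULL << n, ~bitmap));
}
#else
/* portable fallback: linear scan over the zero bits */
static unsigned nth_unset_bit(uint64_t bitmap, unsigned n) {
    for (unsigned i = 0; i < 64; i++) {
        if (!(bitmap & (1ULL << i)) && n-- == 0)
            return i;
    }
    return 64; /* fewer than n+1 zero bits present */
}
#endif
```

For example, with the bitmap `0b1011` the zero bits are at indices 2, 4, 5, …, so `nth_unset_bit(0xb, 1)` yields 4 on either path.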
The current implementation will lead to only being able to provide size results for offsets into the initial page(s) of a large allocation, since with a hash table there isn't a way to find the closest previous match.
This would have a performance cost compared to the hash table for allocation, and would also make malloc_object_size significantly slower. It might make sense as a configuration option, or perhaps it's not worth doing at all.
Generating the tags is going to require careful thought. Adjacent allocations should be guaranteed to have different tags, which could be approached by dedicating a bit to whether the slot is at an odd or even index (wasting entropy), or by generating a new random tag until there's no collision, which requires either a way to fetch the tags from the shadow region or spending memory on metadata in the allocator.
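The retry-until-no-collision approach could look like the following sketch, assuming 4-bit tags and that the neighbors' tags are available by some means (shadow region or allocator metadata, as discussed above). `rand()` stands in for the ChaCha8-based CSPRNG, and a real implementation might also exclude any reserved tag values.

```c
#include <stdint.h>
#include <stdlib.h>

/* hypothetical sketch: pick a 4-bit tag distinct from both
   adjacent slots' tags by retrying on collision */
static uint8_t choose_tag(uint8_t prev_tag, uint8_t next_tag) {
    uint8_t tag;
    do {
        tag = rand() & 0xf; /* placeholder for the ChaCha8 CSPRNG */
    } while (tag == prev_tag || tag == next_tag);
    return tag;
}
```

With 16 possible tags and at most 2 excluded, each attempt succeeds with probability at least 7/8, so the expected number of retries is low.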
The existing CSPRNG (ChaCha8 keystream with a small cache) is very efficient and will make generating these tags low overhead, but using a couple more bytes on every allocation will further increase the pressure to optimize it, especially alongside further slab allocation randomization beyond the existing slot selection randomization.
The feature is currently too aggressive and assigns half of the slab address space to guard slabs by default. It would be better to choose a much more conservative proportion by default, such as a guard slab after every 9 usable slabs.
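As a sketch of the more conservative default, the decision could simply use 1-in-10 odds per slab slot instead of 1-in-2 (the function and the use of `rand()` as a stand-in for the allocator's CSPRNG are illustrative assumptions):

```c
#include <stdbool.h>
#include <stdlib.h>

/* hypothetical sketch: randomly make roughly 1 in 10 slab slots a
   guard slab, rather than the current 1 in 2 */
static bool is_guard_slab(void) {
    return rand() % 10 == 0; /* placeholder for the ChaCha8 CSPRNG */
}
```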
It would be good to cover up to at least 64k and perhaps higher to cover a much higher percentage of the common cases with the proper core allocation scheme.
Using libdivide is one option, or figuring out some way to avoid it altogether. Size classes and slab sizes aren't all powers of 2, for extremely important reasons, so it's non-trivial.
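The underlying trick, which libdivide automates, is replacing the runtime division by a constant with a precomputed multiply-and-shift reciprocal. A hypothetical sketch for computing a slot index from a slab offset, exact under the stated bounds (offset below 2^18, size class between 16 and 2^14, 48-bit shift):

```c
#include <stdint.h>

#define RECIP_SHIFT 48

/* precompute a round-up reciprocal for a given size class */
static uint64_t recip_gen(uint32_t size) {
    return ((UINT64_C(1) << RECIP_SHIFT) + size - 1) / size;
}

/* slot index = offset / size, without a hardware divide; exact for
   offset < 2^18 and 16 <= size < 2^14 with this shift */
static uint32_t slot_index(uint32_t offset, uint64_t recip) {
    return (uint32_t)(((uint64_t)offset * recip) >> RECIP_SHIFT);
}
```

The division in `recip_gen` happens once per size class at initialization; the hot path is a multiply and a shift, which is why this works for non-power-of-2 size classes where a simple right shift can't.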
It's nice not needing to grab any locks to return a size for slabs, but it's more important to check for memory corruption. It's a very rarely used API and has little use with this allocator design due to the tight bounds on internal fragmentation and the fact that it currently tracks sizes precisely for large allocations instead of rounding them at all.
It can easily return results for pointers within the first page of a memory allocation. It could go beyond that but it would require an extra hash table lookup for each extra page checked.
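The cost structure described above can be sketched as follows: resolving an interior pointer means probing page-aligned addresses backwards, one hash lookup per page covered. Everything here is a toy stand-in (the table, the lookup, the names) rather than the allocator's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define TABLE_SIZE 64 /* hypothetical tiny table for illustration */

struct entry { void *base; size_t size; };
static struct entry table[TABLE_SIZE];

/* toy open-addressing lookup keyed on a page-aligned base address */
static struct entry *hash_lookup(void *base) {
    size_t i = ((uintptr_t)base / PAGE_SIZE) % TABLE_SIZE;
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) % TABLE_SIZE) {
        if (table[i].base == base)
            return &table[i];
        if (table[i].base == NULL)
            return NULL;
    }
    return NULL;
}

/* resolve an interior pointer by probing backwards one page at a
   time: each extra page covered costs one more hash lookup */
static struct entry *find_allocation(void *ptr, unsigned max_pages) {
    uintptr_t page = (uintptr_t)ptr & ~(uintptr_t)(PAGE_SIZE - 1);
    for (unsigned i = 0; i < max_pages; i++) {
        struct entry *e = hash_lookup((void *)(page - (uintptr_t)i * PAGE_SIZE));
        if (e)
            return e;
    }
    return NULL;
}
```

With `max_pages` of 1 this is the cheap first-page-only behavior; every additional page of coverage adds one lookup to the worst case.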
It needs to move the surrounding guard pages with the allocation so it will be necessary to have extra mprotect calls as part of the implementation. It may end up being cheaper to allocate and copy for the smaller range of large allocations, so there could be a threshold determining when to start using mremap. It's important to leave the guard pages intact at the old location until the allocation has been remapped, to avoid having this reduce security compared to before.
This optimization may end up being lost for at least some ranges of allocations when the implementation is more sophisticated, but that's acceptable.
It's essentially a quarantine already (FIFO queue) and the reuse order doesn't have much impact on performance, so a randomized array would be a good fit.
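A randomized array quarantine can be very small: a freed pointer is swapped into a random slot and the evicted pointer is the one that actually proceeds to be freed. In this sketch the slot count is an assumption and `rand()` again stands in for the allocator's CSPRNG.

```c
#include <stddef.h>
#include <stdlib.h>

#define QUARANTINE_SLOTS 64 /* hypothetical quarantine size */

static void *quarantine[QUARANTINE_SLOTS];

/* swap the freed pointer with a random slot; the caller frees the
   evicted pointer (NULL until that slot has been filled once) */
static void *quarantine_swap(void *freed) {
    size_t i = (size_t)rand() % QUARANTINE_SLOTS; /* placeholder RNG */
    void *evicted = quarantine[i];
    quarantine[i] = freed;
    return evicted;
}
```

This randomizes the reuse order with O(1) work per free, unlike a FIFO queue which makes reuse order fully predictable.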
There are many ways to approach this, but for now it just needs to be a trivial implementation rather than trying to figure out and implement the best design choices for performance and security immediately.