lukaskalbertodt / stable-vec Goto Github PK

View Code? Open in Web Editor NEW

17.0 5.0 10.0 377 KB

A Vec-like collection which guarantees stable indices and features O(1) deletion of elements

Home Page: https://docs.rs/stable-vec

License: Apache License 2.0

Rust 100.00%

vec stable data-structures rust rust-library indices

stable-vec's Introduction

stable-vec

A Vec<T>-like collection which guarantees stable indices and features O(1) element removal at the cost of wasting some memory. It is semantically very similar to Vec<Option<T>>, but with a more optimized memory layout and a more convenient API. This data structure is very useful as a foundation to implement other data structures like graphs and polygon meshes. In those situations, stable-vec functions a bit like an arena memory allocator. This crate works in #![no_std] context (it needs the alloc crate, though).

This crate implements different strategies to store the information. As these strategies have slightly different performance characteristics, the user can choose which to use. The two main strategies are:

something similar to Vec<T> with a BitVec (used by default), and
something similar to Vec<Option<T>>.

Please refer to the documentation for more information. Example:

let mut sv = StableVec::new();
let star_idx = sv.push('★');
let heart_idx = sv.push('♥');
let lamda_idx = sv.push('λ');

// Deleting an element does not invalidate any other indices.
sv.remove(star_idx);
assert_eq!(sv[heart_idx], '♥');
assert_eq!(sv[lamda_idx], 'λ');

// You can insert into empty slots (again, without invalidating any indices)
sv.insert(star_idx, '☺');

// We can also reserve memory (create new empty slots) and insert into
// these new slots. All slots up to `sv.capacity()` can be accessed.
sv.reserve_for(15);
assert_eq!(sv.get(15), None);
sv.insert(15, '☮');

// The final state of the stable vec
assert_eq!(sv.get(0), Some(&'☺'));
assert_eq!(sv.get(1), Some(&'♥'));
assert_eq!(sv.get(2), Some(&'λ'));
assert_eq!(sv.get(3), None);
assert_eq!(sv.get(14), None);
assert_eq!(sv.get(15), Some(&'☮'));

Alternatives? What about `slab`?

The crate slab works very similar to stable-vec, but has way more downloads. Despite being very similar, there are a few differences which might be important for you:

slab reuses keys of deleted entries, while stable-vec does not automatically.
slab does a bit more management internally to quickly know which keys to reuse and where to insert. This might incur a tiny bit of overhead. Most notably: each entry in the underlying Vec in slab is at least size_of::<usize>() + 1 bytes large. If you're storing small elements, this might be a significant memory usage overhead.
slab has a fixed memory layout while stable-vec lets you choose between different layouts. These have different performance characteristics and you might want to choose the right one for your situation.
The API of stable-vec is a bit more low level.

License

Licensed under either of

Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

stable-vec's People

Contributors

Stargazers

Watchers

Forkers

mikeyhew farhanpatwary burjui gnzlbg luro02 srikwit icodein nilekurt ratmice shabbirhasan1

stable-vec's Issues

API additions

Things we probably want to add to the API:

Things we might or might not want to add, mostly because the semantics are not clear yet:

truncate(new_len)
clear() (but what does it do exactly?)
first(), first_mut(), last(), last_mut() (but those would be O(n)!)
swap() (invalidates two indices... uhm maybe?)

Layout experiments

There are some things we could be experimenting with. They might lead to a performance improvement, but this needs to be checked.

A Vec<T> core with user-specified nil-value: the OptionCore is useful when T is NonNull, i.e. the 0-bit pattern is never used. But sometimes other bit patterns are not used, which could be used to indicate an empty slot.
compact: bool: store a boolean that is true if the stable vec is still compact. In this case, we don't have to store any additional information for slots (i.e. not bit vec). This could be useful in situation where no element is removed as this saves memory and could be faster. But it might also be slower in general because it adds complexity.

Add `StableVecFacade::swap`

Swapping two elements. Users can't easily do that with mem::swap due to borrowck errors.

Confusion with `push` and `insert` methods

There seem to be two ways to add an element to a StableVec:

StableVec::push, which inserts an element in the next free slot? It is not entirely clear by the documentation what this is supposed to be?
StableVec::insert, which inserts an element at the specified position, but does panic if the index is out of bounds (which makes it quite inconvenient, because you would have to always call StableVec::reserve_for(index) and then call StableVec::insert, to prevent panics).

I am a bit confused about the behavior of those two methods.

I expect a function called push to push an element to the back of the vec (like Vec::push)
and I do not expect insert to panic if the index is out of bounds (I see this method more like a HashMap::<usize, T>::insert)

I would suggest adding the following methods:

StableVec::push_back(&mut self, value: T), appends an element to the back of the StableVec, ignoring free slots
StableVec::pop_back(&mut self) -> Option<T>, removes the last element from a StableVec

And changing the StableVec::insert method to not panic if the capacity is exhausted and instead resize the vector.

Use `needs_drop` once it's stabilized to speed up `drop()`

I'm not sure if the optimizer is smart enough, but needs_drop() would certainly help. Blocked by rust-lang/rust#41890.

Random notes: performance

I'm currently working on another performance-critical project that uses this crate. I'm inspecting generated assembly code and noted some possible improvements to this crate. Before I forget all of this, here is a random list of notes:

~~Using bitvec[index] is bad: because that crate uses the static TRUE/FALSE trick to in the [] operator we have a branch where there shouldn't be one. Use get instead or something like that~~
~~First indexing the bitvec and then the data vec does two bound checks. That's bad. Everywhere where we use has_element_at (and it returns true), we should use get_unchecked or so.~~
Setting or resetting a bit (done via & or | with a mask) is not optimized well. This should (probably be done) with the BTS and BTR instructions on x86. Have too look into this.

(This issue is rather a reminder to myself...)

Missing trait implementations

There are some traits missing for StableVec that could be implemented.

Important traits:

serdes Serialize and Deserialize could also be supported?

Convenience:

More and better tests and examples

Add `no_std` support with `alloc` crate

Rewrite `OptionCore` to avoid relying on unclear guarantees by `Vec`

OptionCore uses a Vec<Option<T>>. It maintains that all elements between len and cap are initialized to None. Some methods rely on this. Unfortunately, I don't think there are super clear promises given by the docs of Vec. There is this part:

Vec will not specifically overwrite any data that is removed from it, but also won’t specifically preserve it. Its uninitialized memory is scratch space that it may use however it wants. It will generally just do whatever is most efficient or otherwise easy to implement. Do not rely on removed data to be erased for security purposes. Even if you drop a Vec, its buffer may simply be reused by another allocation. Even if you zero a Vec’s memory first, that might not actually happen because the optimizer does not consider this a side-effect that must be preserved. There is one case which we will not break, however: using unsafe code to write to the excess capacity, and then increasing the length to match, is always valid.

Plus the documentation of set_len with the example. But this just suggests that writing to uninitialized memory and then immediately calling set_len is fine. No promises that any other method calls would preserve the memory after len.

So I think I would probably feel more comfortable just not using Vec but writing custom allocation and all of that.

Rename `exists()`

This is a bad name. It should be something like contains_index() or something like that...

Try out and offer different ways to store the "deleted" flag for elements

There are different ways to store whether or not an element is present or deleted:

(A) Vec<Option<T>>: the obvious, semantically "correct" version. Sadly, this can lead to a lot of wasted memory since Option<T> can be twice as large as T for some types (e.g. u32). However, for cases where Option<T> has the same size as T, it's pretty optimal, because we need no additional memory. Additionally, this works completely without unsafe code, so that's good.
(B) Vec<T> + BitVec + count: this is what this crate is currently using. In situations where Option<T> is larger than T, this is more or less the most space efficient. Iteration over the indices of existing elements is the fastest this way since that data is packed a lot closer than Vec<T>, we have a lot fewer cache misses and need to load fewer memory.

There are the following variants I can think of:

Combined Vec and BitVec plus count (variant of (B)): a simple optimization of the second version. The length, and probably also the capacity field of Vec and BitVec contain the same information and shouldn't be stored twice. This only improves the stack size of StablVec, but that's something.
Vec<T> with user specified dummy value (variant of (A)): this might be useful in a couple of situations. Option<T> can only get optimized if the dummy value of T is a binary 0.
compact: bool (variant of (B)): while the vector is still compact, we don't need to store any per element information. Since I expect stable vecs to be compact a lot of the time, this could be worth it.

All these variants need to be benchmarked and compared to one another. Then, it would be nice to let the user decide which one to use.

Add benchmarks to compare `StableVec` with `HashMap`

To better explain the performance advantages of this crate. Of course, a fast hash functions for integer keys should be used.

`IterMut::remove_current` does not drop the value

IterMut::remove_current is broken. I don't know what my brain thought when writing this, but it's not great.

This method was used by StableVec::retain. This was changed in 0.2.2 (#20). Next, I will remove remove_current as it's unlikely to stay in this form. And changing semantics about it would require a breaking change anyway. So rather remove it now before anything bad happens.