Giter VIP home page Giter VIP logo

stable-vec's Introduction

stable-vec

CI status of master Crates.io Version docs.rs

A Vec<T>-like collection which guarantees stable indices and features O(1) element removal at the cost of wasting some memory. It is semantically very similar to Vec<Option<T>>, but with a more optimized memory layout and a more convenient API. This data structure is very useful as a foundation to implement other data structures like graphs and polygon meshes. In those situations, stable-vec functions a bit like an arena memory allocator. This crate works in #![no_std] context (it needs the alloc crate, though).

This crate implements different strategies to store the information. As these strategies have slightly different performance characteristics, the user can choose which to use. The two main strategies are:

  • something similar to Vec<T> with a BitVec (used by default), and
  • something similar to Vec<Option<T>>.

Please refer to the documentation for more information. Example:

let mut sv = StableVec::new();
let star_idx = sv.push('★');
let heart_idx = sv.push('♥');
let lamda_idx = sv.push('λ');

// Deleting an element does not invalidate any other indices.
sv.remove(star_idx);
assert_eq!(sv[heart_idx], '♥');
assert_eq!(sv[lamda_idx], 'λ');

// You can insert into empty slots (again, without invalidating any indices)
sv.insert(star_idx, '☺');

// We can also reserve memory (create new empty slots) and insert into
// these new slots. All slots up to `sv.capacity()` can be accessed.
sv.reserve_for(15);
assert_eq!(sv.get(15), None);
sv.insert(15, '☮');

// The final state of the stable vec
assert_eq!(sv.get(0), Some(&'☺'));
assert_eq!(sv.get(1), Some(&'♥'));
assert_eq!(sv.get(2), Some(&'λ'));
assert_eq!(sv.get(3), None);
assert_eq!(sv.get(14), None);
assert_eq!(sv.get(15), Some(&'☮'));

Alternatives? What about slab?

The crate slab works very similar to stable-vec, but has way more downloads. Despite being very similar, there are a few differences which might be important for you:

  • slab reuses keys of deleted entries, while stable-vec does not automatically.
  • slab does a bit more management internally to quickly know which keys to reuse and where to insert. This might incur a tiny bit of overhead. Most notably: each entry in the underlying Vec in slab is at least size_of::<usize>() + 1 bytes large. If you're storing small elements, this might be a significant memory usage overhead.
  • slab has a fixed memory layout while stable-vec lets you choose between different layouts. These have different performance characteristics and you might want to choose the right one for your situation.
  • The API of stable-vec is a bit more low level.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

stable-vec's People

Contributors

burjui avatar farhanpatwary avatar lukaskalbertodt avatar ratmice avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

stable-vec's Issues

Rename `exists()`

This is a bad name. It should be something like contains_index() or something like that...

Random notes: performance

I'm currently working on another performance-critical project that uses this crate. I'm inspecting generated assembly code and noted some possible improvements to this crate. Before I forget all of this, here is a random list of notes:

  • Using bitvec[index] is bad: because that crate uses the static TRUE/FALSE trick to in the [] operator we have a branch where there shouldn't be one. Use get instead or something like that
  • First indexing the bitvec and then the data vec does two bound checks. That's bad. Everywhere where we use has_element_at (and it returns true), we should use get_unchecked or so.
  • Setting or resetting a bit (done via & or | with a mask) is not optimized well. This should (probably be done) with the BTS and BTR instructions on x86. Have too look into this.

(This issue is rather a reminder to myself...)

Try out and offer different ways to store the "deleted" flag for elements

There are different ways to store whether or not an element is present or deleted:

  • (A) Vec<Option<T>>: the obvious, semantically "correct" version. Sadly, this can lead to a lot of wasted memory since Option<T> can be twice as large as T for some types (e.g. u32). However, for cases where Option<T> has the same size as T, it's pretty optimal, because we need no additional memory. Additionally, this works completely without unsafe code, so that's good.
  • (B) Vec<T> + BitVec + count: this is what this crate is currently using. In situations where Option<T> is larger than T, this is more or less the most space efficient. Iteration over the indices of existing elements is the fastest this way since that data is packed a lot closer than Vec<T>, we have a lot fewer cache misses and need to load fewer memory.

There are the following variants I can think of:

  • Combined Vec and BitVec plus count (variant of (B)): a simple optimization of the second version. The length, and probably also the capacity field of Vec and BitVec contain the same information and shouldn't be stored twice. This only improves the stack size of StablVec, but that's something.
  • Vec<T> with user specified dummy value (variant of (A)): this might be useful in a couple of situations. Option<T> can only get optimized if the dummy value of T is a binary 0.
  • compact: bool (variant of (B)): while the vector is still compact, we don't need to store any per element information. Since I expect stable vecs to be compact a lot of the time, this could be worth it.

All these variants need to be benchmarked and compared to one another. Then, it would be nice to let the user decide which one to use.

Layout experiments

There are some things we could be experimenting with. They might lead to a performance improvement, but this needs to be checked.

  • A Vec<T> core with user-specified nil-value: the OptionCore is useful when T is NonNull, i.e. the 0-bit pattern is never used. But sometimes other bit patterns are not used, which could be used to indicate an empty slot.

  • compact: bool: store a boolean that is true if the stable vec is still compact. In this case, we don't have to store any additional information for slots (i.e. not bit vec). This could be useful in situation where no element is removed as this saves memory and could be faster. But it might also be slower in general because it adds complexity.

`IterMut::remove_current` does not drop the value

IterMut::remove_current is broken. I don't know what my brain thought when writing this, but it's not great.

This method was used by StableVec::retain. This was changed in 0.2.2 (#20). Next, I will remove remove_current as it's unlikely to stay in this form. And changing semantics about it would require a breaking change anyway. So rather remove it now before anything bad happens.

API additions

Things we probably want to add to the API:

  • a consuming iterator
  • insert_at
  • FromIterator impl
  • into_vec()
  • contains()
  • Debug impl
  • Extend impl
  • Default impl
  • retain
  • insert_into_hole(index, elem) -> Result<(), Elem>
  • io::Write impl for StableVec<u8>
  • From<Vec<T>> impl -> only possible with specialization. Instead from_vec was added.
  • Expose next_index or something like that to allow users to build iterators themselves

Things we might or might not want to add, mostly because the semantics are not clear yet:

  • truncate(new_len)
  • clear() (but what does it do exactly?)
  • first(), first_mut(), last(), last_mut() (but those would be O(n)!)
  • swap() (invalidates two indices... uhm maybe?)

Rewrite `OptionCore` to avoid relying on unclear guarantees by `Vec`

OptionCore uses a Vec<Option<T>>. It maintains that all elements between len and cap are initialized to None. Some methods rely on this. Unfortunately, I don't think there are super clear promises given by the docs of Vec. There is this part:

Vec will not specifically overwrite any data that is removed from it, but also won’t specifically preserve it. Its uninitialized memory is scratch space that it may use however it wants. It will generally just do whatever is most efficient or otherwise easy to implement. Do not rely on removed data to be erased for security purposes. Even if you drop a Vec, its buffer may simply be reused by another allocation. Even if you zero a Vec’s memory first, that might not actually happen because the optimizer does not consider this a side-effect that must be preserved. There is one case which we will not break, however: using unsafe code to write to the excess capacity, and then increasing the length to match, is always valid.

Plus the documentation of set_len with the example. But this just suggests that writing to uninitialized memory and then immediately calling set_len is fine. No promises that any other method calls would preserve the memory after len.

So I think I would probably feel more comfortable just not using Vec but writing custom allocation and all of that.

Missing trait implementations

There are some traits missing for StableVec that could be implemented.

Important traits:

  • Hash
  • Ord
  • Write
  • RefUnwindSafe
  • UnwindSafe
  • PartialOrd

serdes Serialize and Deserialize could also be supported?

Convenience:

  • From<BTreeMap<usize, T>>
  • Extend<&'a T>
  • From<&'_ str>
  • From<BinaryHeap<T>>
  • From<Box<[T]>>
  • From<CString>
  • From<String>
  • From<Cow<'a, [T]>>
  • PartialEq<&'_ [B; N]>
  • PartialEq<&'_ mut [B]>
  • PartialEq<[B; N]>

Confusion with `push` and `insert` methods

There seem to be two ways to add an element to a StableVec:

  1. StableVec::push, which inserts an element in the next free slot? It is not entirely clear by the documentation what this is supposed to be?

  2. StableVec::insert, which inserts an element at the specified position, but does panic if the index is out of bounds (which makes it quite inconvenient, because you would have to always call StableVec::reserve_for(index) and then call StableVec::insert, to prevent panics).

I am a bit confused about the behavior of those two methods.

  • I expect a function called push to push an element to the back of the vec (like Vec::push)
  • and I do not expect insert to panic if the index is out of bounds (I see this method more like a HashMap::<usize, T>::insert)

I would suggest adding the following methods:

  • StableVec::push_back(&mut self, value: T), appends an element to the back of the StableVec, ignoring free slots
  • StableVec::pop_back(&mut self) -> Option<T>, removes the last element from a StableVec

And changing the StableVec::insert method to not panic if the capacity is exhausted and instead resize the vector.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.