orium / rpds Goto Github PK
View Code? Open in Web Editor NEWRust persistent data structures
License: Mozilla Public License 2.0
Rust persistent data structures
License: Mozilla Public License 2.0
rpds/src/set/red_black_tree_set/mod.rs
Lines 81 to 92 in ed9e65e
error[E0599]: no method named `get` found for struct `RedBlackTreeSet` in the current scope
--> src/lib.rs:456:17
|
456 | set.get(x)
| ^^^ method not found in `RedBlackTreeSet<Val>`
For my use-case it'd be useful to have a get
that returned the given key, as if the set were a map where the keys are also the values, (or None
, some value indicating lack of value in the set).
I can wrap the set type(s) with a custom implementation to achieve the above, but worry about losing the traits/impls that the set type(s) already have (or having to forward, with whatever pain may come).
As #41 suggests, when dropping a large List, rust recursively calls Arc's drop function, which leads to stackoverflow. We can implement our custom Drop trait for List to prevent this, the following code seems to solve this problem:
impl<T, P> Drop for List<T, P>
where
P: SharedPointerKind,
{
fn drop(&mut self) {
let mut head = self.head.take();
while let Some(node) = head {
if let Ok(mut node) = SharedPointer::try_unwrap(node) {
head = node.next.take();
}
else {
break;
}
}
}
}
We will not need to use the iter_no_drop
as criterion offers something similar.
See #31
This should be stable by rust 1.27.
I have been hacking on a persistent red-black tree in C++ recently. I just wanted to say thank you for your Rust implementation. It's been incredibly useful in getting this working!
Empty data structures should need no heap allocation.
Vector
(See #28)HashTrieMap
See #24
We can test this with https://github.com/Windfisch/rust-assert-no-alloc
When wanting to update a single value in the HashMap, we just call insert
with the new value (and the same key that already was there). Of course, a wrapper could be written to only insert if the key already exists; this is simple to do for the user (although it might be a worthwhile addition to HashTrieMap's API as well, because it is a very common operation.)
However, when I want to update all of the values, I am out of luck: It is not possible to get a multable iterator to the values or (key, value)-pairs with the current API. Is this a restriction by the underlying data structure? Or is this something that could be improved? (Or is it already possible now and did I loverlook it?)
HashTrieSet
RedBlackTreeSet
Depends on #21.
I was experimenting with using a RedBlackTree
as a priority queue and ended up with:
let next_work = queue.first();
if let Some(k) = next_work {
queue.remove_mut(k);
}
It would be nice to have a pop_first
that removed the first item and returned it in one.
I was recently toying with the idea to port of https://github.com/usethesource/capsule which implemented https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf in Java.
Curious if you see the CHAMP option as a future state for rpds
?
Hi,
I was reviewing the code in cargo's resolver when I came across this line:
"We should switch to persistent hash maps if we can..."
and thought didn't I just see a persistent hash map crate.
So if/when this crate is ready, perhaps a pr thare?
Just my 2c.
I was trying to look up if new()
allocates on the heap, or if like std it only allocates when the first item is inserted. Then I wanted to know if clone()
allocates, I assume no, but a O(1)
allocation on clone would also be persistent. I found the Complexity section of each doc, but did not see an ancer.
Any chance for adding a Finger Tree data-structure to rpds? 😃🙏
(there are references for this being done—at least in partially persistent ways—as early as '86)
Is it possible to search RedBlackTreeMap by index?
like this
This should happen in rust 1.26: rust-lang/rust#47463
This is bad because it involves more heap allocations and also more efford when Vec
s need to expand.
To not lose the ability to configure the branching factor this is better done when rust-lang/rust#44580 is ready and stable. If we need this before const-generics stabilize then we can use the smallvec crate + some type aliases.
I'm actually really interested in hearing about the tradeoffs I should consider when deciding whether to use this crate, because I know that immutable structures can have some great benefits in specific cases. Would it be possible to get some benchmarks demonstrating how the structures implemented here can improve performance in those cases?
Simple benchmark that includes an in_list
reversal.
Attempting to implement a wrapper for HashTrieMap and got the following error:
SIGSEGV: invalid memory reference
Most of the relevant code is below. It compiles but fails when running the test. Any idea what a fix would be for this?
type ContainedHval = Box<HVal>;
#[derive(Clone,Debug)]
pub struct HDict(HashTrieMap<String, ContainedHval>);
//pub type HDict<'b> = HashTrieMap<String, Box<HVal<'b>>>;
impl HVal for HDict {
fn to_zinc(&self) -> String {
"Not implemented!".to_string()
}
}
impl Eq for HDict {}
impl PartialEq for HDict {
fn eq(&self, other: &Self) -> bool {
if self.0.size() != other.0.size() {
return false;
}
for (k,v) in self.0.iter() {
let tmp = match other.0.get(k) {
Some(ov) => **v == **ov,
None => false
};
if !tmp {
return false;
}
}
true
}
}
impl HDict {
pub fn new() -> HDict {
HDict(HashTrieMap::new())
}
fn has(&self, name: &String) -> HBool {
self.0.contains_key(name)
}
fn is_empty(&self) -> HBool {
self.0.is_empty()
}
fn missing(&self, name: &String) -> HBool {
!self.0.contains_key(name)
}
pub fn iter(&self) -> Iter<String, ContainedHval> {
self.0.iter()
}
fn insert(&mut self, name: &String, value: ContainedHval) -> Self {
HDict(self.0.insert(name.clone(),value))
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn create() {
let a = HDict::new();
let b = HDict::new().insert(&"Hello".to_owned(),Box::new(true));
//assert_ne!(a,a.clone().insert(&"Hello".to_owned(),Box::new(true)));
//assert_eq!(a,HDict::new());
///*
assert_eq!(
b.clone(),
HDict::new().insert(&"Hello".to_owned(),Box::new(true))
);
//*/
}
}
In #rust today someone was struggling to serialize rpds::Vector (logs). It would be great to have Serde impls for these data structures that can be enabled by a serde
cfg.
I was playing around this library when I encountered a stack overflow when allocating large lists (>100k elements).
Here is the diff in list tests which reproduces the error:
diff --git a/src/list/test.rs b/src/list/test.rs
index 7cbbcc0..a6315d5 100644
--- a/src/list/test.rs
+++ b/src/list/test.rs
@@ -10,7 +10,7 @@ mod iter {
#[test]
fn test_iter() {
- let limit = 1024;
+ let limit = 1024 * 1024;
let mut list = List::new();
let mut left = limit;
With the modification above, we get:
$ cargo test list::test::iter::test_iter
Finished dev [unoptimized + debuginfo] target(s) in 0.06s
Running target/debug/deps/rpds-eca271399800c9ae
running 2 tests
test list::test::iter::test_iter_size_hint ... ok
thread 'list::test::iter::test_iter' has overflowed its stack
fatal runtime error: stack overflow
Some GDB backtrace output (continues very long):
#0 0x00005555557a1c31 in <core::ptr::NonNull<T>>::as_ref (self=0x7fffeb111090)
at /checkout/src/libcore/ptr.rs:2915
#1 0x0000555555759343 in <alloc::sync::Arc<T>>::inner (self=0x7fffeb111090) at /checkout/src/liballoc/sync.rs:520
#2 0x000055555576fc63 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7fffeb111090)
at /checkout/src/liballoc/sync.rs:946
#3 0x00005555556071ce in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#4 0x0000555555609311 in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#5 0x00005555557619d9 in <alloc::sync::Arc<T>>::drop_slow (self=0x7fffeb1110d8)
at /checkout/src/liballoc/sync.rs:528
#6 0x0000555555770b28 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7fffeb1110d8)
at /checkout/src/liballoc/sync.rs:981
#7 0x00005555556040ce in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#8 0x0000555555607e2d in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#9 0x0000555555609346 in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#10 0x00005555557619d9 in <alloc::sync::Arc<T>>::drop_slow (self=0x7fffeb111118)
at /checkout/src/liballoc/sync.rs:528
#11 0x0000555555770b28 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7fffeb111118)
at /checkout/src/liballoc/sync.rs:981
#12 0x00005555556040ce in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#13 0x0000555555607e2d in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#14 0x0000555555609346 in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#15 0x00005555557619d9 in <alloc::sync::Arc<T>>::drop_slow (self=0x7fffeb111158)
at /checkout/src/liballoc/sync.rs:528
#16 0x0000555555770b28 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7fffeb111158)
at /checkout/src/liballoc/sync.rs:981
#17 0x00005555556040ce in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#18 0x0000555555607e2d in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#19 0x0000555555609346 in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#20 0x00005555557619d9 in <alloc::sync::Arc<T>>::drop_slow (self=0x7fffeb111198)
at /checkout/src/liballoc/sync.rs:528
#21 0x0000555555770b28 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7fffeb111198)
at /checkout/src/liballoc/sync.rs:981
#22 0x00005555556040ce in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#23 0x0000555555607e2d in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#24 0x0000555555609346 in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#25 0x00005555557619d9 in <alloc::sync::Arc<T>>::drop_slow (self=0x7fffeb1111d8)
at /checkout/src/liballoc/sync.rs:528
#26 0x0000555555770b28 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7fffeb1111d8)
at /checkout/src/liballoc/sync.rs:981
#27 0x00005555556040ce in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#28 0x0000555555607e2d in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#29 0x0000555555609346 in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#30 0x00005555557619d9 in <alloc::sync::Arc<T>>::drop_slow (self=0x7fffeb111218)
at /checkout/src/liballoc/sync.rs:528
#31 0x0000555555770b28 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7fffeb111218)
at /checkout/src/liballoc/sync.rs:981
#32 0x00005555556040ce in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#33 0x0000555555607e2d in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#34 0x0000555555609346 in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#35 0x00005555557619d9 in <alloc::sync::Arc<T>>::drop_slow (self=0x7fffeb111258)
at /checkout/src/liballoc/sync.rs:528
#36 0x0000555555770b28 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7fffeb111258)
at /checkout/src/liballoc/sync.rs:981
#37 0x00005555556040ce in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#38 0x0000555555607e2d in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#39 0x0000555555609346 in core::ptr::drop_in_place () at /checkout/src/libcore/ptr.rs:59
#40 0x00005555557619d9 in <alloc::sync::Arc<T>>::drop_slow (self=0x7fffeb111298)
at /checkout/src/liballoc/sync.rs:528
...
It seems that the internal routines in Rust for managing Arc
references work recursively, in which case this might not be easy to fix.
See 6af0922.
What is a good way to get the current memory usage of a rpds structure? I'm looking for something like Servo's malloc_size_of
. I can get the size of leaves but I'm missing the size of branches. Or is it trivial to calculate it? Any recommendation would be appreciated.
Is it possible to search RedBlackTreeMap like std's BTreeMap range function?
It would be nice to have these impls available.
List
Vector
Stack
Queue
HashTrieMap
RedBlackTreeMap
For set we have #20.
I would like to make rpds
an optional dependency while at the same time implementing some traits in a generic manner
(i.e. impl<T, P: archery::SharedPointerKind> for rpds::Vector<T, P>
).
I can't find a way to make this work without adding archery
as an optional dependency and then requiring users to enable both features.
List
Vector
Stack
Queue
HashTrieMap
HashTrieSet
RedBlackTreeMap
RedBlackTreeSet
Make sure to have nice examples (as the standard collections do).
I was using im
and recently moving to rpds
. im
as methods like skip
take
drop_first
...which create slices from existing vectors. I can't find these methods in rpds
so I have to use for... .iter()
and then copy elements into new vectors, and that would be slow due to lack of structural sharing.
Is there suggesting ways to make slices?
updated,
probably just #61 .
These can use {Rc,Arc}::make_mut()
that can speed up things a lot if there is not a lot of structural sharing.
List
Vector
Stack
Queue
HashTrieMap
HashTrieSet
RedBlackTreeMap
RedBlackTreeSet
README.md
It would be nice to either use Box
or Rc
as the underlying smart pointer type. Alas, until we have associated generic types, we can't be polymorphic over this, but perhaps in the interim code generation could be used to create concrete implementations of:
ArcList
RcList
BoxList
Then once associated generic types comes we could convert these to type aliases that refer to a common List
type.
From a quick look we might be able to do this with criterion (see #32). We can also remove the no_drop
thing.
When inserting values in bulk it should be possible to do better than calling insert/push/etc
repeatedly. Each insertion call requires that the tree gets traversed again but this could instead be avoided by reusing this traversal for each subsequent call.
I find myself doing this a lot:
match foo.peek() {
Some(head) => {
foo = foo.pop().unwrap();
}
}
As far as I can see, there isn't a single API for popping an item and returning it. Would it make sense to add one?
(I'm sure you can think of a better name if you do add this.)
The port to C++ I've made of the red-black tree is working perfectly with a range of tests. While I was porting, I ran across a operation in the Rust implementation that raised a question for me.
It's in reference to make_black
and make_red
. These two methods appear to be modifying the color of an existing node, rather than returning a copy of a node with a new color. At least that's how I'm reading the Rust code, but I'm by no means very comfortable in the language.
https://github.com/orium/rpds/blob/master/src/map/red_black_tree_map/mod.rs#L166
In my port I made Node be fully immutable, such that make_black and make_red return copies with only the color modified. I have not encountered any issue, but it occurs to me that perhaps there is an inefficiency that results by breaking some structure sharing in the underlying tree.
I was hoping you could perhaps comment on those methods and their semantics / role in the algorithm.
Best,
Noah
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.