dcjones / coitrees Goto Github PK
View Code? Open in Web Editor NEWA very fast interval tree data structure
License: MIT License
A very fast interval tree data structure
License: MIT License
I was using this library in a project and noticed a subtle potential bug: taking a Vec<Interval<()>>
and creating both BasicCOITree
and NeonCOITree
objects, and then iterating over their intervals with .iter()
produces different results. I have created a MRE (mre.rs
)
use coitrees::{BasicCOITree, Interval, IntervalTree, NeonCOITree};
fn main() {
let mut intervals: Vec<Interval<()>> = Vec::new();
intervals.push(Interval { first: 0, last: 10, metadata: () } );
intervals.push(Interval { first: 11, last: 14, metadata: () } );
let basic_trees: BasicCOITree<(), usize> = BasicCOITree::new(&intervals);
for interval in basic_trees.iter() {
println!("basic tree interval: {:?}", interval);
}
let neon_trees: NeonCOITree<(), usize> = NeonCOITree::new(&intervals);
for interval in neon_trees.iter() {
println!("neon tree interval: {:?}", interval);
}
}
which can be placed in the examples/
directory and run with cargo run --example mre
to see the behavior. The output I see is:
basic tree interval: Interval { first: 0, last: 10, metadata: () }
basic tree interval: Interval { first: 11, last: 14, metadata: () }
neon tree interval: Interval { first: -1, last: 11, metadata: () }
neon tree interval: Interval { first: 10, last: 15, metadata: () }
I believe the potential issue could be due to these lines (https://github.com/dcjones/coitrees/blob/main/src/neon.rs#L62-L81). Unfortunately, I do not know enough about targeting these specific architectures to know what is the right way to address this issue and create a pull request. Perhaps these lines are correct, but the iterator for NeonCOITree
needs to be adapted to output offset intervals.
I would like to add more discussions on coitrees in my bedtk manuscript. I want to make sure my following understanding is correct. cgranges encodes the interval tree in the in-order layout. coitrees reorders the array into the van Emde Boas (vEB) layout. It seems that in the implementation, you are keeping the index of the left and right children for efficient access. This increases the memory of the index by two integers per element. Brodal et al (2002) seems to give a method to compute the index of a child using a precomputed table of O(log N) space instead of O(N). Have you considered something along this line? Also, I haven't read the source code to generate the vEB layout. Do you use an auxiliary array? If so, that would increase the peak memory. Is that correct? Thanks.
I'd love to use coitrees in a project, and was stumped why I couldn't import Interval
. It looks to be caused by an older version (2.1) only being available on crates.io. Is there a plan to push the most recent 3.0 version there?
Thanks for creating coitrees, it looks great!
hi @dcjones, what do you think about the quick ideas?
Hi! I've been working on an interval tree crate, rust-lapper. Are you planning on turning this into a library / crate? If not, do you mind if I do?
Hello Team,
For arm CPUs, any possibilities to support it? nightly rust seems to support any CPU structure for SIMD operations.
Thanks,
Jianshu
Hello team,
cargo run --release --example bed-intersect -- test1.bed test2.bed > intersections.bed
error[E0599]: no method named about
found for struct Arg
in the current scope
--> examples/bed-intersect.rs:340:14
|
340 | .about("intervals to index")
| ^^^^^ method not found in Arg<'_>
error[E0599]: no method named about
found for struct Arg
in the current scope
--> examples/bed-intersect.rs:345:14
|
345 | .about("query intervals")
| ^^^^^ method not found in Arg<'_>
error[E0599]: no method named about
found for struct Arg
in the current scope
--> examples/bed-intersect.rs:352:14
|
352 | .about("use alternative search strategy that's faster if queries are sorted and tend to overlap"))
| ^^^^^ method not found in Arg<'_>
error[E0599]: no method named about
found for struct Arg
in the current scope
--> examples/bed-intersect.rs:356:14
|
356 | .about("load both interval sets into memory instead of streaming queries"))
| ^^^^^ method not found in Arg<'_>
error[E0599]: no method named about
found for struct Arg
in the current scope
--> examples/bed-intersect.rs:360:14
|
360 | .about("compute proportion of queries covered"))
| ^^^^^ method not found in Arg<'_>
For more information about this error, try rustc --explain E0599
.
error: could not compile coitrees
due to 5 previous errors
I have this error with the most recent rust v1.65
Thanks,
Jianshu
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.