Giter VIP home page Giter VIP logo

slice-group-by's Introduction

slice-group-by

slice-group-by crate slice-group-by documentation dependency status build & tests worflow License

An implementation of the groupBy Haskell function, providing tools for efficiently iterating over slice and str by groups defined by a function that specifies if two elements are in the same group.

Differences with Itertools::group_by

The Itertools::group_by method use a key to compare elements, this library works like, say, slice::sort_by, it uses a comparison function. It works on every Iterator type, slice-group-by work only with slice and str, which is the power of this library, it is fast thanks to data locality.

Also slice-group-by support multiple search algorithms (i.e. linear, binary and exponential search) and can return groups starting from the end.

Examples

Linear Searched Immutable Groups

You will only need to define a function that returns true if two elements are in the same group.

The LinearGroupBy iterator will always gives contiguous elements to the predicate function.

use slice_group_by::GroupBy;

let slice = &[1, 1, 1, 3, 3, 2, 2, 2];

let mut iter = slice.linear_group_by(|a, b| a == b);

assert_eq!(iter.next(), Some(&[1, 1, 1][..]));
assert_eq!(iter.next(), Some(&[3, 3][..]));
assert_eq!(iter.next(), Some(&[2, 2, 2][..]));
assert_eq!(iter.next(), None);

Linear Searched Immutable Str Groups

You will only need to define a function that returns true if two char are in the same group.

The LinearStrGroupBy iterator will always gives contiguous char to the predicate function.

use slice_group_by::StrGroupBy;

let string = "aaaabbbbb饰饰cccc";

let mut iter = string.linear_group_by(|a, b| a == b);

assert_eq!(iter.next(), Some("aaaa"));
assert_eq!(iter.next(), Some("bbbbb"));
assert_eq!(iter.next(), Some("饰饰"));
assert_eq!(iter.next(), Some("cccc"));
assert_eq!(iter.next(), None);

Binary Searched Mutable Groups

It is also possible to get mutable non overlapping groups of a slice.

The BinaryGroupBy/Mut and ExponentialGroupBy/Mut iterators will not necessarily gives contiguous elements to the predicate function. The predicate function should implement an order consistent with the sort order of the slice.

use slice_group_by::GroupByMut;

let slice = &mut [1, 1, 1, 2, 2, 2, 3, 3];

let mut iter = slice.binary_group_by_mut(|a, b| a == b);

assert_eq!(iter.next(), Some(&mut [1, 1, 1][..]));
assert_eq!(iter.next(), Some(&mut [2, 2, 2][..]));
assert_eq!(iter.next(), Some(&mut [3, 3][..]));
assert_eq!(iter.next(), None);

Exponential Searched Mutable Groups starting from the End

It is also possible to get mutable non overlapping groups of a slice even starting from end of it.

use slice_group_by::GroupByMut;

let slice = &mut [1, 1, 1, 2, 2, 2, 3, 3];

let mut iter = slice.exponential_group_by_mut(|a, b| a == b).rev();

assert_eq!(iter.next(), Some(&mut [3, 3][..]));
assert_eq!(iter.next(), Some(&mut [2, 2, 2][..]));
assert_eq!(iter.next(), Some(&mut [1, 1, 1][..]));
assert_eq!(iter.next(), None);

slice-group-by's People

Contributors

alexcrichton avatar kerollmops avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

slice-group-by's Issues

LinearGroupByKey could allow provided closure to borrow from input

Current:

fn linear_group_by_key<F, K>(&self, func: F) -> LinearGroupByKey<T, F>
    where F: FnMut(&T) -> K,
          K: PartialEq

Does not allow:

let strings = vec!["A".to_owned()];
strings.linear_group_by_key(|s| s.as_str());

In order for that to work, we need:

fn linear_group_by_key<'a, F, K>(&'a self, func: F) -> LinearGroupByKey<T, F>
    where F: FnMut(&'a T) -> K,
          K: PartialEq

Since nothing moves in memory it should be possible.

Lot of unsafe code could probably be replaced by safe code without perf. impact

Looking at the source code:
https://docs.rs/slice-group-by/0.2.6/src/slice_group_by/linear_group/linear_group_by.rs.html#131-136

It seems this crate makes extensive use of unsafe.
However, it also looks like this unsafe is unnecessary as it creates no performance improvement:

Representation with the two pointers is exactly the memory representation of a slice, so it looks like this:

pub struct LinearGroupBy<'a, T: 'a, P> {
    ptr: *const T,
    end: *const T,
    predicate: P,
    _phantom: marker::PhantomData<&'a T>,
}

could be replaced by this:

pub struct LinearGroupBy<'a, T: 'a, P> {
    elements_left: &'a [T],
    predicate: P,
}

while this:

                let mut i = 0;
                let mut ptr = self.ptr;

                // we use an unsafe block to avoid bounds checking here.
                // this is safe because the only thing we do here is to get
                // two elements at `ptr` and `ptr + 1`, bounds checking is done by hand.

                // we need to get *two* contiguous elements so we check that:
                //  - the first element is at the `end - 1` position because
                //  - the second one will be read from `ptr + 1` that must
                //    be lower or equal to `end`
                unsafe {
                    while ptr != self.end.sub(1) {
                        let a = &*ptr;
                        ptr = ptr.add(1);
                        let b = &*ptr;

                        i += 1;

                        if !(self.predicate)(a, b) {
                            let slice = $mkslice(self.ptr, i);
                            self.ptr = ptr;
                            return Some(slice)
                        }
                    }
                }

could be rewritten using for (i,s) in self.elements_left.windows(2).enumerate() and slice::split_at/slice::split_at_mut.

(and obviously this:

                // `i` is either `0` or the `slice length - 1` because either:
                //  - we have not entered the loop and so `i` is equal to `0`
                //    the slice length is necessarily `1` because we ensure it is not empty
                //  - we have entered the loop and we have not early returned
                //    so `i` is equal to the slice `length - 1`
                let slice = unsafe { $mkslice(self.ptr, i + 1) };
                self.ptr = self.end;
                Some(slice)

by this:

Some(std::mem::replace(&mut self.elements_left, &[]))

)

I don't think this change would create additional runtime cost, as most double-bounds-checking (or constant-bounds-checking in particular with the constants 2 and 1 here) usually gets optimized-out by the compiler, and I would feel much more comfortable using this crate, as I wouldn't like to introduce so much unsafe to our codebase for such a simple algorithm.

no-std

I think this crate could be no-std -- this would make it usable on WASM or other such environments

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.