contain-rs / bit-vec
A Vec of Bits
License: Apache License 2.0
I was looking for an efficient way to get a `&[u8]` from the `BitVec` to write the bits out to a file (when the memory layout is little-endian). I was thinking of adding an `as_bytes()` view, but I noticed that `to_bytes()` flips the bits within each byte, so the `as_bytes()` I want would be inconsistent with it. And `from_bytes()` does the same flip (which made the tests in the code I'm writing fail confusingly).
So
Hi!
People are asking for bit-level parsing in nom, and to use `BitVec` I would need to implement those traits to get the slicing syntax on bit streams.
I see in the code that `Index<usize>` is already implemented, but there's a big warning. Could you explain what it means? I can try to implement those, but I wonder if you would accept it (because of that warning).
`set` usually means "set to 1", `clear` means "set to 0", `toggle` means "set to negation", and from my experience `switch` is a good word for setting to a given boolean value.
These are ones that `Vec` has that would also make sense on `BitVec`. Unfortunately, the latter is much harder to implement.
I don't know how the function as it exists is currently optimised, but in principle you can much more simply use (without risk of over/underflow):

```rust
if bits == 0 {
    0
} else {
    (bits - 1) / B::bits() + 1
}
```

If `bits` were signed (and you had a signed right shift operator that was not UB), you could just use `((bits - 1) >> B::bitsbits()) + 1`, where `bitsbits()` is the number of bits required to store the index into a word (i.e. 5 for 32-bit words). This is probably not practical though.
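As a runnable sketch (plain Rust; the block width is passed explicitly here instead of coming from the `B: BitBlock` generic, purely for illustration):

```rust
/// Number of blocks needed to hold `bits` bits with `block_bits` bits per block.
/// Unlike the usual `(bits + block_bits - 1) / block_bits` form, this cannot
/// overflow when `bits` is near `usize::MAX`.
fn blocks_for_bits(bits: usize, block_bits: usize) -> usize {
    if bits == 0 {
        0
    } else {
        (bits - 1) / block_bits + 1
    }
}
```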
Hello, I've been using `BitVec` for both a single-threaded and a multi-threaded version of the Sieve of Eratosthenes algorithm. Please note that I'm a beginner in the Rust language, so I might be doing things wrong.
In the documentation I haven't found anything indicating that this structure supports concurrent access, and putting it in an `Arc<Mutex<BitVec>>` seems like a terrible idea when using multiple threads for computation, as it is then effectively the same as or worse than using a single thread.
Have I missed a part of the documentation where it says that concurrency is supported? If not, are there plans to support concurrency?
To me it would seem most logical to put the `BitBlock` under a `Mutex<BitBlock>`, but that is just my beginner's view on this matter. On the other hand, I could also hand out slices of the full vector to each thread, in which case I conceptually wouldn't even need a mutex to access the values safely; I'm not sure how this is done in Rust, though.
As a trivial example, suppose we set `let k = bv.len() / 4`. What would be the most efficient way to flip all the bits from index `k` to `3 * k`?
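One approach, sketched here on a plain `u32` buffer with LSB-first indexing (an assumption for illustration; this is not `BitVec`'s internal layout or API): XOR interior words with `!0` and the two boundary words with partial masks, so each block is touched once instead of flipping bit by bit.

```rust
/// Flip the bits in positions [start, end) of a packed bit buffer,
/// where bit `i` lives at bit `i % 32` of word `i / 32` (LSB-first).
fn flip_range(storage: &mut [u32], start: usize, end: usize) {
    const B: usize = 32;
    let first = start / B;
    let last = (end + B - 1) / B; // one past the last word touched
    for w in first..last {
        let lo = if w == first { start % B } else { 0 };
        let hi = if w + 1 == last && end % B != 0 { end % B } else { B };
        // Mask with ones in bit positions lo..hi of this word.
        let mask: u32 = if hi - lo == B {
            !0
        } else {
            ((1u32 << (hi - lo)) - 1) << lo
        };
        storage[w] ^= mask;
    }
}
```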
It appears that sometimes this crate does illegal memory access, and/or the test harness somehow does. When I run the tests with Valgrind, I get the following output:
/ tmp bit-vec valgrind ./target/debug/deps/bit_vec-3219dec6733cc80b master
==31860== Memcheck, a memory error detector
==31860== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31860== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==31860== Command: ./target/debug/deps/bit_vec-3219dec6733cc80b
==31860==
running 32 tests
==31860== Thread 5 tests::test_1_el:
==31860== Invalid read of size 8
==31860== at 0x1BB9F0: je_tcache_bin_flush_small (tcache.c:132)
==31860== by 0x1BB9F0: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000000 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x1BB813: tcache_destroy (tcache.c:109)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000000 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x1BB874: je_tcache_bin_flush_small (tcache.c:132)
==31860== by 0x1BB874: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000000 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x19C5CC: je_arena_mapbits_small_runind_get (arena.h:857)
==31860== by 0x19C5CC: arena_dalloc_bin_locked_impl (arena.c:2954)
==31860== by 0x19C5CC: je_arena_dalloc_bin_junked_locked (arena.c:2981)
==31860== by 0x1BBD94: je_tcache_bin_flush_small (tcache.c:137)
==31860== by 0x1BBD94: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000018 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x19C636: je_arena_ptr_small_binind_get (arena.h:1096)
==31860== by 0x19C636: arena_run_reg_dalloc (arena.c:307)
==31860== by 0x19C636: arena_dalloc_bin_locked_impl (arena.c:2963)
==31860== by 0x19C636: je_arena_dalloc_bin_junked_locked (arena.c:2981)
==31860== by 0x1BBD94: je_tcache_bin_flush_small (tcache.c:137)
==31860== by 0x1BBD94: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000018 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x19C6C7: je_bitmap_unset (bitmap.h:247)
==31860== by 0x19C6C7: arena_run_reg_dalloc (arena.c:323)
==31860== by 0x19C6C7: arena_dalloc_bin_locked_impl (arena.c:2963)
==31860== by 0x19C6C7: je_arena_dalloc_bin_junked_locked (arena.c:2981)
==31860== by 0x1BBD94: je_tcache_bin_flush_small (tcache.c:137)
==31860== by 0x1BBD94: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x15555555b556138 is not stack'd, malloc'd or (recently) free'd
==31860==
==31860== Can't extend stack to 0x402a138 during signal delivery for thread 5:
==31860== no stack segment
==31860==
==31860== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==31860== Access not within mapped region at address 0x402A138
==31860== at 0x19C6C7: je_bitmap_unset (bitmap.h:247)
==31860== by 0x19C6C7: arena_run_reg_dalloc (arena.c:323)
==31860== by 0x19C6C7: arena_dalloc_bin_locked_impl (arena.c:2963)
==31860== by 0x19C6C7: je_arena_dalloc_bin_junked_locked (arena.c:2981)
==31860== by 0x1BBD94: je_tcache_bin_flush_small (tcache.c:137)
==31860== by 0x1BBD94: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== If you believe this happened as a result of a stack
==31860== overflow in your program's main thread (unlikely but
==31860== possible), you can try to increase the size of the
==31860== main thread stack using the --main-stacksize= flag.
==31860== The main thread stack size used in this run was 8388608.
==31860== Invalid write of size 8
==31860== at 0x4A27660: _vgnU_freeres (vg_preloaded.c:59)
==31860== Address 0x402aff8 is on thread 5's stack
==31860==
==31860==
==31860== Process terminating with default action of signal 11 (SIGSEGV)
==31860== Access not within mapped region at address 0x402AFF8
==31860== at 0x4A27660: _vgnU_freeres (vg_preloaded.c:59)
==31860== If you believe this happened as a result of a stack
==31860== overflow in your program's main thread (unlikely but
==31860== possible), you can try to increase the size of the
==31860== main thread stack using the --main-stacksize= flag.
==31860==
==31860== HEAP SUMMARY:
==31860== in use at exit: 1,220 bytes in 7 blocks
==31860== total heap usage: 20 allocs, 13 frees, 3,444 bytes allocated
==31860==
==31860== LEAK SUMMARY:
==31860== definitely lost: 0 bytes in 0 blocks
==31860== indirectly lost: 0 bytes in 0 blocks
==31860== possibly lost: 1,152 bytes in 4 blocks
==31860== still reachable: 68 bytes in 3 blocks
==31860== suppressed: 0 bytes in 0 blocks
==31860== Rerun with --leak-check=full to see details of leaked memory
==31860==
==31860== For counts of detected and suppressed errors, rerun with: -v
==31860== ERROR SUMMARY: 8 errors from 7 contexts (suppressed: 0 from 0)
zsh: segmentation fault (core dumped) valgrind ./target/debug/deps/bit_vec-3219dec6733cc80b
Is anyone still maintaining this lib? Is there a better fork available? There are a lot of open PRs/issues that I'd like to see progress on. I'd be happy to take over if no one else is doing it.
This is related to #13, but I think it can be implemented before `IndexGet` by adding

```rust
impl BitVec {
    fn slice<'a, R: RangeArgument>(&'a self, offset: usize, len: usize) -> BitVecView<'a> { ... }
}
```

Thoughts?
I'm currently trying to implement a few bit-parallel algorithms in Rust that I planned to use the `BitVec` type for. I frequently have to use bitwise operations (`|`, `&`, `^`, `<<`, `>>`) on the bit vectors, and was a bit disappointed that only `&`, `|` and `^` are supported, and only as in-place operations via `union`, `difference` and `intersection`. Shifting is currently not supported at all.
Is there a specific reason why the bitwise traits for these operations from `std::ops` (`BitAnd`, `Shl`, etc.) are not implemented for this type? If not, would there be interest in a PR that implements them?
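For reference, a self-contained sketch of what such an impl could look like on the block storage (the `Bits` newtype here is hypothetical, standing in for `BitVec`'s storage and length fields):

```rust
use std::ops::BitAnd;

/// Hypothetical stand-in for BitVec's internals: packed blocks plus a length.
#[derive(Clone, Debug, PartialEq)]
struct Bits {
    storage: Vec<u32>,
    nbits: usize,
}

impl BitAnd for &Bits {
    type Output = Bits;

    fn bitand(self, rhs: &Bits) -> Bits {
        assert_eq!(self.nbits, rhs.nbits, "bit vectors must have equal length");
        Bits {
            // AND the vectors block-wise.
            storage: self
                .storage
                .iter()
                .zip(&rhs.storage)
                .map(|(a, b)| a & b)
                .collect(),
            nbits: self.nbits,
        }
    }
}
```

`Shl`/`Shr` would be more involved, since bits have to be carried across block boundaries.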
We could use `leading_zeros` or `trailing_zeros` to iterate over the indices of set or unset bits, avoiding lots of arithmetic/shifting. I could whip up a PR if it's decided it's worth adding.
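The idea, sketched on a single `u32` block (the names here are illustrative, not the crate's API):

```rust
/// Push the indices of the set bits of `word` (offset by `base`) into `out`,
/// visiting only the set bits rather than scanning every position.
fn set_bit_indices(mut word: u32, base: usize, out: &mut Vec<usize>) {
    while word != 0 {
        out.push(base + word.trailing_zeros() as usize);
        word &= word - 1; // clear the lowest set bit
    }
}
```

For unset bits, the same loop works on `!word` (masking off the tail past `nbits` in the last block).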
Since `Vec` is three `usize` values, we could have a stack-based variant of `BitVec` that could store up to 96 bits (on x86) or 192 bits (on x64) in the struct itself. This could potentially be much faster for short bit vectors since it doesn't require allocations. For example, my crate `img_hash` by default uses bit vectors of size 64.
With `const fn`, this could even work with the current user-defined block size design, using `mem::size_of()` to calculate at compile time how many blocks of the given type could fit in the space that `Vec` would take up:

```rust
enum BitVecStorage<B> {
    Heap(Vec<B>),
    Stack([B; mem::size_of::<Vec<B>>() / mem::size_of::<B>()]),
}

pub struct BitVec<B = u32> {
    storage: BitVecStorage<B>,
    nbits: usize,
}
```
With the exception of `B = u64` on an x86 system (as `3 * 32 % 64 != 0`), both variants will use essentially the same amount of space. And the stack storage can be transformed to heap storage very efficiently via `slice::into_vec(self: Box<[T]>)`.
The only downsides I can see are an extra byte for the enum discriminant, and system-dependent performance, since the point at which it switches to heap storage depends on the block type and the bitness of the system.
`clear` currently just zeros all the bits, but by convention it should set the len to 0. We should have a `clear_all` method that zeros the bits, in symmetry with `set_all`.
There seems to be no way to disable serde support on the recent bit-vec 0.6.0 release without also disabling std support. It's because of the `std = ["serde/std"]` line in Cargo.toml: if you enable `std`, you automatically also enable `serde`. This is a 0.5 → 0.6 regression. Either add a `std_no_deps` feature that is used inside the code and that `std` depends on, or just erase that line and expect users to enable the `std` feature in serde themselves.
These could just delegate to the current methods, but would be a bit less confusing for the user:

```rust
this.or(that); // looks like it shouldn't be modifying `this`
this |= that;
```
The README says:
> Documentation is available at https://contain-rs.github.io/bit-vec/bit_vec.
But that URL yields an HTTP 404 error.
may be

```rust
let mut bits = BitVec::from_bytes(&mut mmap[..]);
```

but now (can't be `mut`):

```rust
let mut bits = BitVec::from_bytes(&mmap[..]);
```
The implementation of `count_ones()` does not make sense; it's actually an infinite recursion. Or is this intended? If intended, what is it supposed to do? Is it not infinite, or am I misreading the code?
Lines 153 to 164 in d3585be
https://docs.rs/bit-vec/0.5.0/bit_vec/struct.BitVec.html#method.get
The function returns bits in MSB-first order; I need LSB-first. How do I do it?
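As a possible workaround, the index can be remapped before calling `get`. This assumes `from_bytes` stores each byte MSB-first (so bit 0 of the `BitVec` is bit 7 of byte 0); the helper name `lsb_index` is hypothetical:

```rust
/// Map an LSB-first bit index (bit 0 = least significant bit of byte 0)
/// to the MSB-first index used by `BitVec::from_bytes`.
fn lsb_index(i: usize) -> usize {
    (i / 8) * 8 + (7 - i % 8)
}
```

Then `bv.get(lsb_index(i))` reads the bits in LSB-first order within each byte.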
were exposed to permit a separation into two crates. While I think these are useful in general, a more principled approach is probably desirable.
Also, `set_len` and `storage_mut` were marked as `unsafe`, which I think is strictly unnecessary (bit-vec has no unsafe code), but I left it as a "here be dragons" reservation for future unsafety. We might not want to have it like that, though.
`BitVec` lets you manipulate individual bits, which is often useful, but it doesn't really help if you have a `[u8]` (especially a large one). Adding a sort of `BitVecView` type that could be created from a reference to a slice of integers, and would behave like a `BitVec` while using the original slice for storage, would enable this.
It's sometimes helpful to get a partition of a `BitVec` specified by a given range. Specifically, the following API might be useful:

```rust
/// Return a new `BitVec` over the given range. This operation takes
/// O(|range|) time.
pub fn get_range(&self, range: Range<usize>) -> Self;
```

This function will essentially copy `|range| / block_size` blocks, with some constant-time adjustments.
It's also helpful to have its iterator equivalent:

```rust
pub fn iter(&self, range: Range<usize>) -> Iter<B> {
    self.ensure_invariant();
    Iter { bit_vec: self, range }
}
```

Those two features would be useful in applications like MPC. Let me know if similar features are already implemented; otherwise I can submit a PR. Thanks!
I'd like to xor two bit vectors. The documentation suggests there is no such method. However, the latest release does contain the `xor` method; in other words, the documentation is outdated. Please rebuild and re-upload, thanks!
(And thanks for the nice crate! No additional dependencies, just what I like to see.)
Forgot
I'd like to publish a new version of mioco and it depends on the recent fix for usize. Could you push a new version to crates.io?
Not really a bug, but I wanted to bring this up. In the `BitVec` struct the default storage size is `u32`. I typically use it with `u8`, but I appreciate the use of `u32` or a larger storage size, since this allows register alignment. However, shouldn't it be `usize`, to account for the word size on 64-bit processors as well as 32-bit?
Too lazy right now
I noticed a regression in w3f/vdf@961caee which indicates `set_all` being slower than an allocation. Is it perhaps that the allocation is being optimized away, but the for loop in `set_all` has some overhead?
This feature is unstable in the nightly branch of Rust, but was forgotten when migrating from std::collections to this external repo.
Hi!
I don't know the relevance of this issue regarding `BitVec`s in particular, but is there a way to efficiently find a sequence of (un)set bits with a precise length? E.g. 15 set bits (and return the location of the first one).
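A naive sketch of the search, over a bool slice rather than `BitVec`'s API (a block-wise version could use shifts and ANDs to stay word-parallel; `find_run` is an illustrative name, not a crate function):

```rust
/// Return the start index of the first run of `n` consecutive `true` bits,
/// or `None` if no such run exists.
fn find_run(bits: &[bool], n: usize) -> Option<usize> {
    let mut run = 0;
    for (i, &b) in bits.iter().enumerate() {
        if b {
            run += 1;
            if run == n {
                return Some(i + 1 - n); // run ends at i, so it starts here
            }
        } else {
            run = 0;
        }
    }
    None
}
```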
Hello, I've been using `BitVec` extensively for a project and I noticed that `BitVec::append` doesn't work all the time. See below for a minimal example, where I would expect the output to be `[01, 00, 02, 03]`, but `BitVec` returns `[01, 00, 02, 00]`.
```rust
use bit_vec::BitVec;

#[test]
fn it_works() {
    // Create an 8-bit integer with value `1` encoded in BE.
    let mut a = BitVec::from_elem(8, false);
    a.set(7, true);

    // Create a 16-bit integer with value `2` encoded in BE.
    let mut b = BitVec::from_elem(16, false);
    b.set(14, true);

    // Create an 8-bit integer with value `3` encoded in BE.
    let mut c = BitVec::from_elem(8, false);
    c.set(6, true);
    c.set(7, true);

    // Merge all bit_vecs together.
    a.append(&mut b);
    a.append(&mut c);

    assert_eq!(&[01, 00, 02, 03][..], &*a.to_bytes());
}
```
I think `BitVec` could use `Cell<BitBlock>` to avoid taking `&mut self` whenever bits are the only thing being mutated.
This would change the signatures of:
- `storage`
- `storage_mut`
- `set`
- `set_all`
- `clear`
- `negate`
- `union`
- `intersect`
- `difference`
In general the `reverse_bits` function should be generalised to work on whatever word type the `BitVec` object is using, and then be applied to the accumulator directly instead of to each byte (assuming the bytes are reversed while assembling the accumulator value). Additionally, bit swapping should be cheaper than shifting. Consider:

```rust
fn reverse_bits<B: BitBlock>(block: B) -> B {
    // Swap adjacent bits, then pairs of bits, then nibbles, reversing the
    // bit order within each byte of the word. Each step must chain on the
    // previous result.
    let mut result = ((block >> 1) & (!B::zero() / 3)) | ((block & (!B::zero() / 3)) << 1);
    result = ((result >> 2) & (!B::zero() / 5)) | ((result & (!B::zero() / 5)) << 2);
    result = ((result >> 4) & (!B::zero() / 17)) | ((result & (!B::zero() / 17)) << 4);
    result
}
```

I'm assuming `!B::zero() / 3` etc. become compile-time constants.