contain-rs / bit-vec
A Vec of Bits
License: Apache License 2.0
I was looking for an efficient way to get a `&[u8]` from the `BitVec` to write the bits out to a file (when the memory layout is little-endian). I was thinking of adding an `as_bytes()` view, but I noticed that `to_bytes()` flips the bits within each byte, so the `as_bytes()` I want would be inconsistent with it. And `from_bytes()` does the same flip (which made the tests in the code I'm writing fail confusingly).
So
Hi!
People are asking for bit-level parsing in nom, and to use `BitVec` I would need to implement those traits to get the slicing syntax on bit streams.
I see in the code that `Index<usize>` is already implemented, but there's a big warning. Could you explain what it means? I can try to implement those, but I wonder if you would accept it (because of that warning).
`set` usually means "set to 1", `clear` means "set to 0", `toggle` means "set to negation", and from my experience `switch` is a good word for setting to a given boolean value.
These are ones that `Vec` has that would also make sense on `BitVec`. Unfortunately, the latter is much harder to implement.
I don't know how the function as it exists is currently optimised, but in principle you can much more simply use (without risk of over/underflow):

```rust
if bits == 0 {
    0
} else {
    (bits - 1) / B::bits() + 1
}
```

If `bits` were signed (and you had a signed right shift operator that was not UB), you could just use `((bits - 1) >> B::bitsbits()) + 1`, where `bitsbits()` is the number of bits required to store the index into a word (i.e. 5 for 32-bit words). This is probably not practical though.
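As a runnable sketch (plain Rust; the block width is passed explicitly here instead of coming from the `B: BitBlock` generic, purely for illustration):

```rust
/// Number of blocks needed to hold `bits` bits with `block_bits` bits per block.
/// Unlike the usual `(bits + block_bits - 1) / block_bits` form, this cannot
/// overflow when `bits` is near `usize::MAX`.
fn blocks_for_bits(bits: usize, block_bits: usize) -> usize {
    if bits == 0 {
        0
    } else {
        (bits - 1) / block_bits + 1
    }
}
```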
Hello, I've been using `BitVec` for both a single-threaded and a multi-threaded version of the Sieve of Eratosthenes algorithm. Please note that I'm a beginner in the Rust language, so I might be doing things wrong.
In the documentation I haven't found anything indicating that this structure supports concurrent access, and putting it in an `Arc<Mutex<BitVec>>` seems like a terrible idea when using multiple threads for computation, as it is then effectively the same as or worse than using a single thread.
Have I missed a part of the documentation where it says that concurrency is supported? If not, are there plans to support concurrency?
To me it would seem most logical to put the `BitBlock` under a `Mutex<BitBlock>`, but that is just my beginner's view on this matter. On the other hand, I could also hand out slices of the full vector to each thread, in which case I conceptually wouldn't even need a mutex to access the values safely; I'm not sure how this is done in Rust, though.
As a trivial example, suppose we set `let k = bv.len() / 4`. What would be the most efficient way to flip all the bits from index `k` to `3 * k`?
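One approach, sketched here on a plain `u32` buffer with LSB-first indexing (an assumption for illustration; this is not `BitVec`'s internal layout or API): XOR interior words with `!0` and the two boundary words with partial masks, so each block is touched once instead of flipping bit by bit.

```rust
/// Flip the bits in positions [start, end) of a packed bit buffer,
/// where bit `i` lives at bit `i % 32` of word `i / 32` (LSB-first).
fn flip_range(storage: &mut [u32], start: usize, end: usize) {
    const B: usize = 32;
    let first = start / B;
    let last = (end + B - 1) / B; // one past the last word touched
    for w in first..last {
        let lo = if w == first { start % B } else { 0 };
        let hi = if w + 1 == last && end % B != 0 { end % B } else { B };
        // Mask with ones in bit positions lo..hi of this word.
        let mask: u32 = if hi - lo == B {
            !0
        } else {
            ((1u32 << (hi - lo)) - 1) << lo
        };
        storage[w] ^= mask;
    }
}
```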
It appears that sometimes this crate does illegal memory access, and/or the test harness somehow does. When I run the tests with Valgrind, I get the following output:
/ tmp bit-vec valgrind ./target/debug/deps/bit_vec-3219dec6733cc80b master
==31860== Memcheck, a memory error detector
==31860== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31860== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==31860== Command: ./target/debug/deps/bit_vec-3219dec6733cc80b
==31860==
running 32 tests
==31860== Thread 5 tests::test_1_el:
==31860== Invalid read of size 8
==31860== at 0x1BB9F0: je_tcache_bin_flush_small (tcache.c:132)
==31860== by 0x1BB9F0: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000000 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x1BB813: tcache_destroy (tcache.c:109)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000000 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x1BB874: je_tcache_bin_flush_small (tcache.c:132)
==31860== by 0x1BB874: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000000 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x19C5CC: je_arena_mapbits_small_runind_get (arena.h:857)
==31860== by 0x19C5CC: arena_dalloc_bin_locked_impl (arena.c:2954)
==31860== by 0x19C5CC: je_arena_dalloc_bin_junked_locked (arena.c:2981)
==31860== by 0x1BBD94: je_tcache_bin_flush_small (tcache.c:137)
==31860== by 0x1BBD94: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000018 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x19C636: je_arena_ptr_small_binind_get (arena.h:1096)
==31860== by 0x19C636: arena_run_reg_dalloc (arena.c:307)
==31860== by 0x19C636: arena_dalloc_bin_locked_impl (arena.c:2963)
==31860== by 0x19C636: je_arena_dalloc_bin_junked_locked (arena.c:2981)
==31860== by 0x1BBD94: je_tcache_bin_flush_small (tcache.c:137)
==31860== by 0x1BBD94: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x6000018 is in a rwx anonymous segment
==31860==
==31860== Invalid read of size 8
==31860== at 0x19C6C7: je_bitmap_unset (bitmap.h:247)
==31860== by 0x19C6C7: arena_run_reg_dalloc (arena.c:323)
==31860== by 0x19C6C7: arena_dalloc_bin_locked_impl (arena.c:2963)
==31860== by 0x19C6C7: je_arena_dalloc_bin_junked_locked (arena.c:2981)
==31860== by 0x1BBD94: je_tcache_bin_flush_small (tcache.c:137)
==31860== by 0x1BBD94: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== Address 0x15555555b556138 is not stack'd, malloc'd or (recently) free'd
==31860==
==31860== Can't extend stack to 0x402a138 during signal delivery for thread 5:
==31860== no stack segment
==31860==
==31860== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==31860== Access not within mapped region at address 0x402A138
==31860== at 0x19C6C7: je_bitmap_unset (bitmap.h:247)
==31860== by 0x19C6C7: arena_run_reg_dalloc (arena.c:323)
==31860== by 0x19C6C7: arena_dalloc_bin_locked_impl (arena.c:2963)
==31860== by 0x19C6C7: je_arena_dalloc_bin_junked_locked (arena.c:2981)
==31860== by 0x1BBD94: je_tcache_bin_flush_small (tcache.c:137)
==31860== by 0x1BBD94: tcache_destroy (tcache.c:371)
==31860== by 0x1BCCC1: je_tcache_cleanup (tcache.c:410)
==31860== by 0x1BD854: je_tsd_cleanup (tsd.c:82)
==31860== by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==31860== by 0x524C1E4: start_thread (in /usr/lib/libpthread-2.26.so)
==31860== by 0x576FE7E: clone (in /usr/lib/libc-2.26.so)
==31860== If you believe this happened as a result of a stack
==31860== overflow in your program's main thread (unlikely but
==31860== possible), you can try to increase the size of the
==31860== main thread stack using the --main-stacksize= flag.
==31860== The main thread stack size used in this run was 8388608.
==31860== Invalid write of size 8
==31860== at 0x4A27660: _vgnU_freeres (vg_preloaded.c:59)
==31860== Address 0x402aff8 is on thread 5's stack
==31860==
==31860==
==31860== Process terminating with default action of signal 11 (SIGSEGV)
==31860== Access not within mapped region at address 0x402AFF8
==31860== at 0x4A27660: _vgnU_freeres (vg_preloaded.c:59)
==31860== If you believe this happened as a result of a stack
==31860== overflow in your program's main thread (unlikely but
==31860== possible), you can try to increase the size of the
==31860== main thread stack using the --main-stacksize= flag.
==31860==
==31860== HEAP SUMMARY:
==31860== in use at exit: 1,220 bytes in 7 blocks
==31860== total heap usage: 20 allocs, 13 frees, 3,444 bytes allocated
==31860==
==31860== LEAK SUMMARY:
==31860== definitely lost: 0 bytes in 0 blocks
==31860== indirectly lost: 0 bytes in 0 blocks
==31860== possibly lost: 1,152 bytes in 4 blocks
==31860== still reachable: 68 bytes in 3 blocks
==31860== suppressed: 0 bytes in 0 blocks
==31860== Rerun with --leak-check=full to see details of leaked memory
==31860==
==31860== For counts of detected and suppressed errors, rerun with: -v
==31860== ERROR SUMMARY: 8 errors from 7 contexts (suppressed: 0 from 0)
zsh: segmentation fault (core dumped) valgrind ./target/debug/deps/bit_vec-3219dec6733cc80b
Is anyone still maintaining this lib? Is there a better fork available? There are a lot of open PRs/issues that I'd like to see progress on. I'd be happy to take over if no one else is doing it.
This is related to #13, but I think it can be implemented before `IndexGet` by adding

```rust
impl BitVec {
    fn slice<'a, R: RangeArgument>(&'a self, offset: usize, len: usize) -> BitVecView<'a> { ... }
}
```

Thoughts?
I'm currently trying to implement a few bit-parallel algorithms in Rust that I planned to use the `BitVec` type for. I frequently have to use bitwise operations (`|`, `&`, `^`, `<<`, `>>`) on the bit vectors, and was a bit disappointed that only `&`, `|` and `^` are supported, and only as in-place operations via `union`, `difference` and `intersection`. Shifting is currently not supported at all.
Is there a specific reason why the bitwise traits for these operations from `std::ops` (`BitAnd`, `Shl`, etc.) are not implemented for this type? If not, would there be interest in a PR that implements them?
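For reference, a self-contained sketch of what such an impl could look like on the block storage (the `Bits` newtype here is hypothetical, standing in for `BitVec`'s storage and length fields):

```rust
use std::ops::BitAnd;

/// Hypothetical stand-in for BitVec's internals: packed blocks plus a length.
#[derive(Clone, Debug, PartialEq)]
struct Bits {
    storage: Vec<u32>,
    nbits: usize,
}

impl BitAnd for &Bits {
    type Output = Bits;

    fn bitand(self, rhs: &Bits) -> Bits {
        assert_eq!(self.nbits, rhs.nbits, "bit vectors must have equal length");
        Bits {
            // AND the vectors block-wise.
            storage: self
                .storage
                .iter()
                .zip(&rhs.storage)
                .map(|(a, b)| a & b)
                .collect(),
            nbits: self.nbits,
        }
    }
}
```

`Shl`/`Shr` would be more involved, since bits have to be carried across block boundaries.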
We could use `leading_zeros` or `trailing_zeros` to iterate over the indices of set or unset bits, avoiding lots of arithmetic/shifting. I could whip up a PR if it's decided it's worth adding.
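The idea, sketched on a single `u32` block (the names here are illustrative, not the crate's API):

```rust
/// Push the indices of the set bits of `word` (offset by `base`) into `out`,
/// visiting only the set bits rather than scanning every position.
fn set_bit_indices(mut word: u32, base: usize, out: &mut Vec<usize>) {
    while word != 0 {
        out.push(base + word.trailing_zeros() as usize);
        word &= word - 1; // clear the lowest set bit
    }
}
```

For unset bits, the same loop works on `!word` (masking off the tail past `nbits` in the last block).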
Since `Vec` is three `usize` values, we could have a stack-based variant of `BitVec` that could store up to 96 bits (on x86) or 192 bits (on x64) in the struct itself. This could potentially be much faster for short bit vectors since it doesn't require allocations. For example, my crate `img_hash` by default uses bit vectors of size 64.
With `const fn`, this could even work with the current user-defined block size design, using `mem::size_of()` to calculate at compile time how many blocks of the given type could fit in the space that `Vec` would take up:

```rust
enum BitVecStorage<B> {
    Heap(Vec<B>),
    Stack([B; mem::size_of::<Vec<B>>() / mem::size_of::<B>()]),
}

pub struct BitVec<B = u32> {
    storage: BitVecStorage<B>,
    nbits: usize,
}
```
With the exception of `B = u64` on an x86 system (as `3 * 32 % 64 != 0`), both variants will use essentially the same amount of space. And the stack storage can be transformed to heap storage very efficiently via `slice::into_vec(self: Box<[T]>)`.
The only downsides I can see are an extra byte for the enum discriminant, and system-dependent performance, since the point at which it switches to heap storage depends on the block type and the bitness of the system.
`clear` currently just zeros all the bits, but by convention it should set the len to 0. We should have a `clear_all` method that zeros the bits, in symmetry with `set_all`.
There seems to be no way to disable serde support on the recent bit-vec 0.6.0 release without also disabling std support. It's because of the `std = ["serde/std"]` line in Cargo.toml: if you enable `std`, you automatically also enable `serde`. This is a 0.5 → 0.6 regression. Either add a `std_no_deps` feature that is used inside the code and that `std` depends on, or just erase that line and expect users to enable the `std` feature in serde themselves.
These could just delegate to the current methods, but would be a bit less confusing for the user:

```rust
this.or(that); // looks like it shouldn't be modifying `this`
this |= that;
```
The README says:
> Documentation is available at https://contain-rs.github.io/bit-vec/bit_vec.
But that URL yields an HTTP 404 error.
may be

```rust
let mut bits = BitVec::from_bytes(&mut mmap[..]);
```

but now (can't be `mut`):

```rust
let mut bits = BitVec::from_bytes(&mmap[..]);
```
The implementation of `count_ones()` does not make sense; it's actually an infinite recursion. Or is this intended? If intended, what is it supposed to do? Is it not infinite, or am I misreading the code?
Lines 153 to 164 in d3585be
https://docs.rs/bit-vec/0.5.0/bit_vec/struct.BitVec.html#method.get
The function returns bits in MSB-first order; I need LSB-first. How do I do it?
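As a possible workaround, the index can be remapped before calling `get`. This assumes `from_bytes` stores each byte MSB-first (so bit 0 of the `BitVec` is bit 7 of byte 0); the helper name `lsb_index` is hypothetical:

```rust
/// Map an LSB-first bit index (bit 0 = least significant bit of byte 0)
/// to the MSB-first index used by `BitVec::from_bytes`.
fn lsb_index(i: usize) -> usize {
    (i / 8) * 8 + (7 - i % 8)
}
```

Then `bv.get(lsb_index(i))` reads the bits in LSB-first order within each byte.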
were exposed to permit a separation into two crates. While I think these are useful in general, a more principled approach is probably desirable.
Also, `set_len` and `storage_mut` were marked as `unsafe`, which I think is strictly unnecessary (bit-vec has no unsafe code), but I left it as a "here be dragons" reservation for future unsafety. We might not want to have it like that, though.
`BitVec` lets you manipulate individual bits, which is often useful, but it doesn't really help if you have a `[u8]` (especially a large one). Adding a sort of `BitVecView` type that could be created from a reference to a slice of integers, and would behave like a `BitVec` while using the original slice for storage, would enable this.
It's sometimes helpful to get a partition of a `BitVec` specified by a given range. Specifically, the following API might be useful:

```rust
/// Return a new `BitVec` over the given range. This operation takes
/// O(|range|) time.
pub fn get_range(&self, range: Range<usize>) -> Self;
```

This function will essentially copy `|range| / block_size` blocks, with some constant-time adjustments.
It's also helpful to have its iterator equivalent:

```rust
pub fn iter(&self, range: Range<usize>) -> Iter<B> {
    self.ensure_invariant();
    Iter { bit_vec: self, range }
}
```

Those two features would be useful in applications like MPC. Let me know if similar features are already implemented; otherwise I can submit a PR. Thanks!
I'd like to xor two bit vectors. The documentation suggests there is no such method. However, the latest release does contain the `xor` method; in other words, the documentation is outdated. Please rebuild and re-upload, thanks!
(And thanks for the nice crate! No additional dependencies, just what I like to see.)
Forgot
I'd like to publish a new version of mioco and it depends on the recent fix for usize. Could you push a new version to crates.io?
Not really a bug, but I wanted to bring this up. In the `BitVec` struct the default storage size is `u32`. I typically use it with `u8`, but I appreciate the use of `u32` or a larger storage size, since this allows register alignment. However, shouldn't it be `usize`, to account for the word size on 64-bit processors as well as 32-bit?
Too lazy right now
I noticed a regression in w3f/vdf@961caee which indicates `set_all` being slower than an allocation. Is it perhaps that the allocation is being optimized away, but the for loop in `set_all` has some overhead?
This feature is unstable in the nightly branch of Rust, but was forgotten when migrating from std::collections to this external repo.
Hi!
I don't know the relevance of this issue regarding `BitVec`s in particular, but is there a way to efficiently find a sequence of (un)set bits with a precise length? E.g. 15 set bits (and return the location of the first one).
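A naive sketch of the search, over a bool slice rather than `BitVec`'s API (a block-wise version could use shifts and ANDs to stay word-parallel; `find_run` is an illustrative name, not a crate function):

```rust
/// Return the start index of the first run of `n` consecutive `true` bits,
/// or `None` if no such run exists.
fn find_run(bits: &[bool], n: usize) -> Option<usize> {
    let mut run = 0;
    for (i, &b) in bits.iter().enumerate() {
        if b {
            run += 1;
            if run == n {
                return Some(i + 1 - n); // run ends at i, so it starts here
            }
        } else {
            run = 0;
        }
    }
    None
}
```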
Hello, I've been using `BitVec` extensively for a project and I noticed that `BitVec::append` doesn't work all the time. See below for a minimal example, where I would expect the output to be `[01, 00, 02, 03]`, but `BitVec` returns `[01, 00, 02, 00]`.
```rust
use bit_vec::BitVec;

#[test]
fn it_works() {
    // Create an 8-bit integer with value `1` encoded in BE.
    let mut a = BitVec::from_elem(8, false);
    a.set(7, true);

    // Create a 16-bit integer with value `2` encoded in BE.
    let mut b = BitVec::from_elem(16, false);
    b.set(14, true);

    // Create an 8-bit integer with value `3` encoded in BE.
    let mut c = BitVec::from_elem(8, false);
    c.set(6, true);
    c.set(7, true);

    // Merge all bit_vecs together.
    a.append(&mut b);
    a.append(&mut c);

    assert_eq!(&[01, 00, 02, 03][..], &*a.to_bytes());
}
```
I think `BitVec` could use `Cell<BitBlock>` to avoid taking `&mut self` whenever bits are the only thing being mutated.
This would change the signatures of:
- `storage`
- `storage_mut`
- `set`
- `set_all`
- `clear`
- `negate`
- `union`
- `intersect`
- `difference`
In general the `reverse_bits` function should be generalised to work on whatever word type the `BitVec` object is using, and then be applied to the accumulator directly instead of to each byte (assuming the bytes are reversed while assembling the accumulator value). Additionally, bit swapping should be cheaper than shifting. Consider:

```rust
fn reverse_bits<B: BitBlock>(block: B) -> B {
    // Swap adjacent bits, then pairs of bits, then nibbles, reversing the
    // bit order within each byte of the word. Each step must chain on the
    // previous result.
    let mut result = ((block >> 1) & (!B::zero() / 3)) | ((block & (!B::zero() / 3)) << 1);
    result = ((result >> 2) & (!B::zero() / 5)) | ((result & (!B::zero() / 5)) << 2);
    result = ((result >> 4) & (!B::zero() / 17)) | ((result & (!B::zero() / 17)) << 4);
    result
}
```

I'm assuming `!B::zero() / 3` etc. become compile-time constants.