Giter VIP home page Giter VIP logo

dynamic's People

Contributors

adamnovak avatar chrisbarber avatar ekg avatar jeizenga avatar karasikov avatar mr-c avatar nicolaprezza avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dynamic's Issues

Bug: Construction of list of rle_str with specified alphabet size fails (segmentation fault).

Hi, I wanted to use multiple of rle_str, but I get segmentation fault when constructing the vector. Here is MWE:

#include <dynamic.hpp>

using namespace std;
using namespace dyn;

int main() {
    vector<rle_str> strs(2,rle_str(5));
    return 0;
}

Reproduced the bug on both clang-8 and gcc-8.
When constructing without the specified alphabet size vector<rle_str> strs(2) everything works fine.
What is strange that this pointer is NULL which was assigned in node(const node& other) constructor.

Thank you for the effort of creating this library.
Jan

Stacktrace:

dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > >::node::operator= wt_string.hpp:305
dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > >::node::operator= wt_string.hpp:305
dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > >::node::node wt_string.hpp:301
dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > >::wt_string wt_string.hpp:32
dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > >::rle_string rle_string.hpp:28
std::_Construct<dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > >, dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > > const&> stl_construct.h:75
std::__uninitialized_fill_n<false>::__uninit_fill_n<dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > >*, unsigned long, dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > > > stl_uninitialized.h:210
std::uninitialized_fill_n<dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > >*, unsigned long, dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > > > stl_uninitialized.h:255
std::__uninitialized_fill_n_a<dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > >*, unsigned long, dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > >, dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > > > stl_uninitialized.h:366
std::vector<dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > >, std::allocator<dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > > > >::_M_fill_initialize stl_vector.h:1480
std::vector<dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > >, std::allocator<dyn::rle_string<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >, dyn::wt_string<dyn::succinct_bitvector<dyn::spsi<dyn::packed_vector, 8192u, 16u> > > > > >::vector stl_vector.h:430
main test_size.cpp:7
__libc_start_main 0x00007f4a41191830
_start 0x0000000000429099

Efficient batch construction/insert for rle_str

It would be useful (I think) to have a way to make a rle_str from a vector<int>, or to insert a whole vector<int> into an rle_str at some position.

Would breaking those operations out as batch operations make them any more efficient than just inserting each value one at a time? I don't know much about the internals, but I suspect a workload of inserts all at the very end, or all one after the other, might produce tree balancing problems.

Weird install behavior wants files in <prefix>/include/internal/*.hpp

The main dynamic.hpp wants to include internal files as #include <internal/whatever.hpp>. To satisfy this when installing DYNAMIC into a prefix, we have to dump DYNAMIC's internal directory straight into the root of the prefix (or tell downstream projects to use -I<prefix>/dynamic). Either solution causes trouble if any other e.g. internal/bwt.hpp exists in the project's include paths.

DYNAMIC ought to reference its internal includes via a path that has the library's name in it, to prevent filename conflicts and people scratching their heads at what package is responsible for a directory like /usr/local/include/internal.

Batch construction of wavelet trees

Orthogonal to #5, it would be nice to have (and probably easier to implement) batch construction of wavelet trees which is agnostic to the underlying bitvector implementation. While still making serial inserts (push back) to the individual bitvectors, acceleration is possible as follows:

Given a string of symbols to batch-create a new wavelet tree, first scan the string and encode all the symbols using the chosen alphabet encoder. Build an "empty" wavelet tree with nodes for each of the encoded symbols. For each node, precompute the symbols which belong to its partition, as well as whether those symbols belong to its left or right child. Now, the bitvectors of each node can be populated in parallel by iterating symbols in the string, and:

  • if symbol belongs to partition for this node
    • push back a 0 onto bitvector if symbol belongs to left child
    • push back a 1 onto bitvector if symbol belongs to right child
  • else, do nothing

Since the empty tree is pre-constructed, and the bitvectors are all independent, there is no synchronization to worry about, and all threads can read from one copy of the string in memory. Run one thread per node up to a max number of concurrent threads.

Conceptually it is that simple but there are a few things to optimize, e.g. you can wait to create a node and initiate its thread until coming across the first symbol belonging to it, so you don't actually have to pre-scan the string first.

This would also work for push_back_many or even insert_many, where you have threads per node that scan the whole vector of symbols to push back, or the pairs of (symbol, position) to insert, and only do the push back or insert if the symbol belongs to its partition.

Template Specializations in Header File

The dynamic.hpp header has some template specializations in it, which means that if it is included in multiple compilation units, it produces a multiple definitions linker error when you go to link them together (because the object code for the specializations gets generated twice). If you want to use template specializations like that, you need to put them in a .cpp file and generate an object file to hold them.

I think an alternative may be to make the specializations themselves still templates, somehow. Or to inline the specializations.

abort: packed_vector.hpp:465: void dyn::packed_vector::remove(uint64_t): Assertion `(size_ / int_per_word_ == words.size()

Hello,
ran into

$ obj-$(dpkg-architecture -qDEB_TARGET_GNU_TYPE)/benchmark -g 1000000 0.01
size = 1000000. P = 0.01
Benchmarking gap bitvector
insert ... done.
access ... done.
rank 0 ... done.
rank 1 ... done.
select 0 ... done.
select 1 ... done.
remove ... benchmark: /home/moeller/git/med-team/dynamic/dynamic-1.0~alpha.1/include/dynamic/internal/packed_vector.hpp:465: void dyn::packed_vector::remove(uint64_t): Assertion `(size_ / int_per_word_ == words.size() || !(words[size_ / int_per_word_] >> ((size_ % int_per_word_) * width_))) && "uninitialized non-zero values in the end of the vector"' failed.
Aborted (core dumped)

And peeking a boo in gdb:

Core was generated by `obj-x86_64-linux-gnu/benchmark -g 1000000 0.01'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa19f15e55b in __GI_abort () at abort.c:79
#2  0x00007fa19f15e42f in __assert_fail_base (fmt=0x7fa19f2c4b48 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x557aacd476a0 "(size_ / int_per_word_ == words.size() || !(words[size_ / int_per_word_] >> ((size_ % int_per_word_) * width_))) && \"uninitialized non-zero values in the end of the vector\"", 
    file=0x557aacd47058 "/home/moeller/git/med-team/dynamic/xxsds-dynamic-1.0~alpha.1/include/dynamic/internal/packed_vector.hpp", line=465, function=<optimized out>) at assert.c:92
#3  0x00007fa19f16d092 in __GI___assert_fail (
    assertion=0x557aacd476a0 "(size_ / int_per_word_ == words.size() || !(words[size_ / int_per_word_] >> ((size_ % int_per_word_) * width_))) && \"uninitialized non-zero values in the end of the vector\"", 
    file=0x557aacd47058 "/home/moeller/git/med-team/dynamic/xxsds-dynamic-1.0~alpha.1/include/dynamic/internal/packed_vector.hpp", line=465, function=0x557aacd477e0 "void dyn::packed_vector::remove(uint64_t)") at assert.c:101
#4  0x0000557aacd3867c in dyn::packed_vector::remove (this=0x557aae434630, i=<optimized out>) at ./include/dynamic/internal/packed_vector.hpp:981
#5  0x0000557aacd4373a in dyn::spsi<dyn::packed_vector, 256u, 16u>::node::remove (this=0x557aae4232c0, i=3) at ./include/dynamic/internal/packed_vector.hpp:591
#6  0x0000557aacd46a16 in dyn::spsi<dyn::packed_vector, 256u, 16u>::remove (this=0x7ffd24959ba0, i=3) at ./include/dynamic/internal/spsi.hpp:276
#7  dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >::delete1 (i=123, this=0x7ffd24959ba0) at ./include/dynamic/internal/gap_bitvector.hpp:253
#8  dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> >::remove (i=123, this=0x7ffd24959ba0) at ./include/dynamic/internal/gap_bitvector.hpp:197
#9  benchmark_bv<dyn::gap_bitvector<dyn::spsi<dyn::packed_vector, 256u, 16u> > > (size=1000000, p=<optimized out>) at ./benchmark.cpp:122
#10 0x0000557aacd373f5 in main (argc=<optimized out>, argv=0x7ffd24959d48) at ./benchmark.cpp:165

but found it to reproduce only every 2nd to 5th invocation. Spotted with version 1.0-alpha.1.
Anything you want me to do about it?

Cheers,
Steffen

SPSI incorrect insert

{
    dyn::succinct_spsi spsi;
    spsi.insert(0,0);
    spsi.remove(0);
    spsi.insert(0,0);
    spsi.insert(0,0);
    spsi.insert(2,1);
    spsi.insert(1,1);
    spsi.remove(3);

    for(size_t i = 0; i < spsi.size(); ++i)
        std::cout << spsi[i] << ",";
    std::cout << std::endl;

    spsi.insert(3,1);

    for(size_t i = 0; i < spsi.size(); ++i)
        std::cout << spsi[i] << ",";
    std::cout << std::endl;
}

outputs

0,1,0,
0,0,0,1,

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.