
integer_encoding_library's Introduction

Hello, folks!


integer_encoding_library's People

Contributors

maropu


integer_encoding_library's Issues

Can't mmap a large file with 'PROT_WRITE'

VSEncodingRest rewrites its own compressed lists as a buffering technique, so that it can make use of the padding area of each list. As a result, the compressed file must be mmap-ed with the 'PROT_WRITE' option.
However, this option can't be applied to a large file.
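The issue does not say which error mmap reports. The diagnostic sketch below maps a file with PROT_READ | PROT_WRITE and MAP_PRIVATE, and the helper name `map_writable_private` is hypothetical, not part of the library. With MAP_PRIVATE, writes into the padding areas stay in memory instead of reaching the file, and a read-only descriptor is enough. When the mapping fails, the helper prints the errno reason. Whether MAP_PRIVATE suits VSEncodingRest is an assumption. A private writable mapping of a very large file can still be refused, for example on a 32-bit build or when the system limits committed memory.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: map the whole file at `path` readable and writable.
// MAP_PRIVATE makes writes copy-on-write, so the file on disk never changes.
// On failure it prints the reason and returns nullptr.
void *map_writable_private(const char *path, std::size_t *length) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { std::perror("open"); return nullptr; }
    struct stat st;
    if (fstat(fd, &st) != 0) { std::perror("fstat"); close(fd); return nullptr; }
    // On a 32-bit build a large file may not fit in size_t at all.
    if (static_cast<std::uintmax_t>(st.st_size) > SIZE_MAX) {
        std::fprintf(stderr, "file too large to map in this address space\n");
        close(fd);
        return nullptr;
    }
    *length = static_cast<std::size_t>(st.st_size);
    void *p = mmap(nullptr, *length, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    int err = errno;  // save before close() can modify errno
    close(fd);        // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) {
        std::fprintf(stderr, "mmap(%zu bytes): %s\n", *length, std::strerror(err));
        return nullptr;
    }
    return p;
}
```

If the printed reason is ENOMEM or EINVAL for a large file, that points to address-space or memory-commit limits rather than file permissions.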

Problem building with GCC 4.6.2 and MacOS

When I try to build the software, the build fails:

$ make test
(...)

../include/misc/benchmarks.hpp: In function ‘uint64_t __get_file_size(FILE*)’:
../include/misc/benchmarks.hpp:179:43: error: request for member ‘__pos’ in ‘size’, which is of non-class type ‘fpos_t {aka long long int}’

$ uname -a
Darwin Daniels-MacBook-Air.local 11.2.0 Darwin Kernel Version 11.2.0: Tue Aug 9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64 x86_64

$ g++- --version
g++-4 (GCC) 4.6.2
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
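The error message shows that line 179 of benchmarks.hpp reads the `__pos` member of an `fpos_t`. That member exists only where `fpos_t` is a struct, as in glibc. On Mac OS X, `fpos_t` is a plain integer (`long long`), so the code does not compile.

One portable alternative, sketched below, seeks to the end of the file and reads the offset. This assumes the POSIX `fseeko`/`ftello` calls are available, which they are on Mac OS X and Linux. The name `get_file_size` is hypothetical, not the library's. On 32-bit Linux, compiling with `-D_FILE_OFFSET_BITS=64` is needed for files over 2 GiB.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <sys/types.h>

// Hypothetical replacement for __get_file_size that avoids the
// glibc-only fpos_t::__pos member. Returns UINT64_MAX on error and
// restores the stream's original position before returning.
uint64_t get_file_size(FILE *fp) {
    off_t saved = ftello(fp);
    if (saved < 0) return UINT64_MAX;
    if (fseeko(fp, 0, SEEK_END) != 0) return UINT64_MAX;
    off_t end = ftello(fp);
    fseeko(fp, saved, SEEK_SET);  // put the position back where it was
    if (end < 0) return UINT64_MAX;
    return static_cast<uint64_t>(end);
}
```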

Improve encoding processing

During encoding, the many memory allocations performed by std::vector lengthen encoding time. For better performance, the working space should be allocated only once and then reused.
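Here is a minimal sketch of the idea, assuming the encoder can be made stateful. The working buffer is kept as a member, so `clear()` discards its contents but keeps its allocation for the next call. `ReusableEncoder` and its copy-only `encode()` are hypothetical stand-ins, not the library's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoder wrapper: the scratch buffer lives as long as the
// object, so its heap storage is allocated once and reused across calls.
class ReusableEncoder {
public:
    // Stand-in "encoding" that only copies the input into the scratch
    // buffer; a real encoder would write compressed words here.
    const std::vector<uint32_t> &encode(const uint32_t *in, std::size_t n) {
        work_.clear();    // drops old contents, keeps the allocated capacity
        work_.reserve(n); // reallocates only when a larger list arrives
        for (std::size_t i = 0; i < n; ++i) work_.push_back(in[i]);
        return work_;
    }
    std::size_t capacity() const { return work_.capacity(); }

private:
    std::vector<uint32_t> work_;
};
```

Because the buffer is shared state, one instance should not be used from several threads at once. Keeping one instance per thread preserves the benefit.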

FIXME Items to be Modified

There are many FIXME items in the sources of this library. These items should be addressed in the future.

Newest Encoder/Decoder Implementation

A newer coder was proposed at CIKM'11 in the paper 'SIMD-based Decoding of Posting Lists'. We would like to add this coder to the library.

Remove overly large macro functions

The macro functions __vserest_fill_pad and __vserest_push_pad in VSEncodingRest are too large.
They should be rewritten as inline functions for readability.
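The bodies of __vserest_fill_pad and __vserest_push_pad are not shown here, so the sketch below uses a hypothetical macro, `PUSH_PAD_MACRO`, to illustrate the general refactoring. It replaces a multi-statement macro with an equivalent inline function, `push_pad`. Neither name comes from the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro in the style the issue describes: multi-statement,
// re-evaluates `val` and `n` on every iteration, and is opaque to debuggers.
#define PUSH_PAD_MACRO(buf, pos, val, n)      \
    do {                                      \
        for (int i_ = 0; i_ < (n); ++i_)      \
            (buf)[(pos)++] = (val);           \
    } while (0)

// Equivalent inline function: arguments are type-checked and evaluated
// once, and the compiler remains free to inline it.
inline void push_pad(uint32_t *buf, uint32_t &pos, uint32_t val, int n) {
    for (int i = 0; i < n; ++i)
        buf[pos++] = val;
}
```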

Missing documentation regarding test data files

If I read the code (encoders.cpp) correctly, you expect a flat binary data file containing a sequence of arrays of 32-bit integers, each preceded by its length stored as a 32-bit integer.

I suppose that users could generate their own data files, but the online documentation at http://integerencoding.isti.cnr.it/?page_id=8 suggests that at least one such data file is available. There is indeed a tar file available (gov2.sort.tar), but it extracts to a file called gov2.sort.Delta, which apparently is not in the right format.

It would be useful to have ready-made data files, or instructions on how to generate them, for realistic testing.
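Here is a sketch of a writer and reader for the layout described above. It assumes the reporter's reading of encoders.cpp is right, which the issue does not confirm. Under that reading, each list is a 32-bit length followed by that many 32-bit integers, the lists are stored back to back, and everything is in the host's native byte order. `write_lists` and `read_lists` are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// Assumed layout: [len][v0 .. v(len-1)] repeated, native-endian uint32_t.
bool write_lists(const char *path, const std::vector<std::vector<uint32_t>> &lists) {
    FILE *fp = std::fopen(path, "wb");
    if (!fp) return false;
    bool ok = true;
    for (const auto &list : lists) {
        uint32_t len = static_cast<uint32_t>(list.size());
        ok = ok && std::fwrite(&len, sizeof len, 1, fp) == 1;
        if (len) ok = ok && std::fwrite(list.data(), sizeof(uint32_t), len, fp) == len;
    }
    return std::fclose(fp) == 0 && ok;  // fclose runs even if a write failed
}

// Reads lists until end of file; stops early on a truncated list.
std::vector<std::vector<uint32_t>> read_lists(const char *path) {
    std::vector<std::vector<uint32_t>> lists;
    FILE *fp = std::fopen(path, "rb");
    if (!fp) return lists;
    uint32_t len;
    while (std::fread(&len, sizeof len, 1, fp) == 1) {
        std::vector<uint32_t> list(len);
        if (len && std::fread(list.data(), sizeof(uint32_t), len, fp) != len) break;
        lists.push_back(std::move(list));
    }
    std::fclose(fp);
    return lists;
}
```

A file produced this way can be compared byte for byte with gov2.sort.Delta to see whether the two formats differ, for example in byte order or in a missing length prefix.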

A correction to decodeArray(..) in every algorithm class

Taking Simple16 as an example, decodeArray is currently implemented as follows:

void decodeArray(uint32_t *in, uint32_t len, uint32_t *out, uint32_t nvalue) {
    uint32_t *end = out + nvalue;
    while (end > out) {
        (__simple16_unpack[*in >> (32 - SIMPLE16_LOGDESC)])(&out, &in);
    }
}

As I understand this function, len should give the number of compressed 32-bit words to decode, and nvalue should return the number of integers actually decompressed. So I suggest changing this part of the file as follows:

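void decodeArray(uint32_t *in, uint32_t len, uint32_t *out, uint32_t &nvalue) {
    uint32_t *end = in + len;   // one past the last compressed word
    uint32_t *start = out;      // first output slot, used to count decoded values
    while (end > in) {
        // each unpack call advances both in and out
        (__simple16_unpack[*in >> (32 - SIMPLE16_LOGDESC)])(&out, &in);
    }
    nvalue = out - start;       // number of integers actually decoded
}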
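Simple16's selector tables are not shown in the issue, so the sketch below swaps in the simpler, textbook Simple9 layout: a 4-bit selector and 28 data bits per word. It is not code from this library. It illustrates the proposed interface, where `decodeArray` consumes exactly `len` compressed words and reports through `nvalue` how far the output pointer moved. `simple9_encode` is a hypothetical helper added only so the decoder can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simple9 layouts: selector s packs kCount[s] values of kBits[s] bits
// into the low 28 bits of a word; the selector sits in the top 4 bits.
static const uint32_t kCount[9] = {28, 14, 9, 7, 5, 4, 3, 2, 1};
static const uint32_t kBits[9] = {1, 2, 3, 4, 5, 7, 9, 14, 28};

// Greedy encoder: uses the densest layout whose full group fits, so it
// never pads a word. Every value must be < 2^28.
std::vector<uint32_t> simple9_encode(const std::vector<uint32_t> &in) {
    std::vector<uint32_t> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        bool packed = false;
        for (uint32_t s = 0; s < 9 && !packed; ++s) {
            std::size_t n = kCount[s];
            if (n > in.size() - pos) continue;  // not enough values left
            uint32_t limit = 1u << kBits[s];
            bool fits = true;
            for (std::size_t i = 0; i < n && fits; ++i) fits = in[pos + i] < limit;
            if (!fits) continue;
            uint32_t word = s << 28;
            for (std::size_t i = 0; i < n; ++i) word |= in[pos + i] << (i * kBits[s]);
            out.push_back(word);
            pos += n;
            packed = true;
        }
        if (!packed) throw std::invalid_argument("value does not fit in 28 bits");
    }
    return out;
}

// Proposed interface: read exactly `len` words from `in`, report the count
// of decoded integers via `nvalue`. `out` must be large enough.
void decodeArray(const uint32_t *in, uint32_t len, uint32_t *out, uint32_t &nvalue) {
    const uint32_t *end = in + len;
    uint32_t *start = out;
    while (in < end) {
        uint32_t word = *in++;
        uint32_t s = word >> 28;
        if (s > 8) throw std::invalid_argument("corrupt selector");
        uint32_t mask = (1u << kBits[s]) - 1;
        for (uint32_t i = 0; i < kCount[s]; ++i) *out++ = (word >> (i * kBits[s])) & mask;
    }
    nvalue = static_cast<uint32_t>(out - start);
}
```

This encoder never leaves unused slots in a word, so `nvalue` equals the number of encoded values. An encoder that pads the final word would make the decoder emit extra trailing zeros, which is one reason a caller may still need the original count.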

Segmentation Fault when decode

When I try to decode the encoded file, I run:

./decoders 3 ../test ../output

Output:
Segmentation fault

If I omit the destination argument, it prints information about the decoding process, followed by "Segmentation fault".
