
daachorse's People

Contributors

kampersanda, kg86, vbkaisetsu


daachorse's Issues

Very slow deserialization

I found that, in practice, DoubleArrayAhoCorasick::<V>::deserialize_unchecked() can be extremely slow: it typically takes 4-5 s on my M1 machine (16 GB RAM).

Here are some stats:

heap_bytes=90855648 // .heap_bytes()
num_states=7440593  // .num_states()
2023-12-11T01:35:52.049237Z  INFO load_index{index_file="index.bin"}:load:deserialize
2023-12-11T01:35:56.495537Z  INFO ...
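The reported figures can be turned into a rough throughput number. The sketch below uses only the stats and timestamps above, and assumes that the second log line marks the end of deserialization and that both timestamps fall within the same minute (01:35). Under those assumptions the load takes about 4.45 s, which works out to roughly 12.2 bytes per state and about 20 MB/s.

```rust
// Figures copied from the issue report above.
const HEAP_BYTES: u64 = 90_855_648; // .heap_bytes()
const NUM_STATES: u64 = 7_440_593; // .num_states()

/// Elapsed seconds between the two log lines.
/// Assumes both timestamps are in the same minute (01:35), so only the
/// seconds fields need to be subtracted.
fn elapsed_secs() -> f64 {
    let start = 52.049237_f64; // seconds field of the "load:deserialize" line
    let end = 56.495537_f64; // seconds field of the following log line
    end - start
}

/// Average heap bytes per automaton state.
fn bytes_per_state() -> f64 {
    HEAP_BYTES as f64 / NUM_STATES as f64
}

/// Deserialization throughput in megabytes (10^6 bytes) per second.
fn throughput_mb_per_sec() -> f64 {
    HEAP_BYTES as f64 / elapsed_secs() / 1e6
}

fn main() {
    println!("elapsed:         {:.3} s", elapsed_secs());
    println!("bytes per state: {:.2}", bytes_per_state());
    println!("throughput:      {:.1} MB/s", throughput_mb_per_sec());
}
```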

Memory efficient management of outputs

Daachorse stores a value-length pair as each pattern's output (see https://github.com/daac-tools/daachorse/blob/main/src/lib.rs#L289-L292).

In the current implementation, 31 bits are assigned to the length and 32 bits to the value, with the remaining 1 bit used for a flag.
In many cases, however, this allocation is more generous than necessary.
For example, when the maximum length is 255, a single byte is sufficient to represent it.
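To make the saving concrete, here is a minimal sketch that computes how many whole bytes each field would need given known maxima. The function names are hypothetical and not part of daachorse; it assumes the flag borrows one extra bit from the value field and that values fit in a u32. With a maximum length of 255 and a maximum value of 2^23 - 1, each output shrinks from 8 bytes (31 + 32 + 1 bits) to 4 bytes.

```rust
/// Current layout: 31-bit length + 32-bit value + 1-bit flag = 64 bits.
const CURRENT_OUTPUT_BYTES: usize = (31 + 32 + 1) / 8;

/// Smallest number of whole bytes that can hold every integer in 0..=max.
fn bytes_needed(max: u64) -> usize {
    // Count significant bits; use at least 1 so that max == 0 still takes a byte.
    let bits = (64 - max.leading_zeros()).max(1) as usize;
    (bits + 7) / 8
}

/// Bytes per output when the length and the value-with-flag are each stored
/// in their own byte-aligned field (hypothetical helper for illustration).
fn compact_output_size(max_length: u64, max_value: u32) -> usize {
    // Reserve one extra bit next to the value for the flag.
    let value_with_flag_max = ((max_value as u64) << 1) | 1;
    bytes_needed(max_length) + bytes_needed(value_with_flag_max)
}

fn main() {
    let compact = compact_output_size(255, (1 << 23) - 1);
    println!("current: {} bytes, compact: {} bytes", CURRENT_OUTPUT_BYTES, compact);
}
```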

If we know the maximum length and value in advance, we can store these fields compactly in byte-aligned memory.
For example, if a length is represented in 1 byte and a value (together with its flag) in 3 bytes, we can interleave them in a byte array, outputs, as follows.

outputs[0] = length 1
outputs[1] = value 1
outputs[2] = value 1
outputs[3] = value 1
outputs[4] = length 2
outputs[5] = value 2
outputs[6] = value 2
outputs[7] = value 2
...
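A minimal sketch of that interleaved layout follows. It is an illustration, not daachorse's implementation: the Output struct, pack and unpack are hypothetical names, and it assumes values fit in 23 bits, the flag occupies the most significant bit of the 3-byte field, and that field is stored little-endian.

```rust
/// One pattern output: the pattern length, its value, and a 1-bit flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Output {
    length: u8, // assumes the maximum length is 255
    value: u32, // assumes the value fits in 23 bits
    flag: bool,
}

const RECORD_BYTES: usize = 4; // 1 byte length + 3 bytes value-with-flag
const VALUE_MAX: u32 = (1 << 23) - 1;

/// Writes outputs into a flat byte array laid out as
/// [length, v0, v1, v2, length, v0, v1, v2, ...].
fn pack(outputs: &[Output]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(outputs.len() * RECORD_BYTES);
    for o in outputs {
        assert!(o.value <= VALUE_MAX, "value does not fit in 23 bits");
        // Put the flag in bit 23, the top bit of the 24-bit field.
        let field = o.value | ((o.flag as u32) << 23);
        bytes.push(o.length);
        // Keep only the low 3 bytes of the little-endian encoding.
        bytes.extend_from_slice(&field.to_le_bytes()[..3]);
    }
    bytes
}

/// Reads the i-th output back from the packed byte array.
fn unpack(bytes: &[u8], i: usize) -> Output {
    let r = &bytes[i * RECORD_BYTES..(i + 1) * RECORD_BYTES];
    let field = u32::from_le_bytes([r[1], r[2], r[3], 0]);
    Output {
        length: r[0],
        value: field & VALUE_MAX,
        flag: field >> 23 != 0,
    }
}

fn main() {
    let outputs = [
        Output { length: 3, value: 0, flag: false },
        Output { length: 255, value: VALUE_MAX, flag: true },
    ];
    let bytes = pack(&outputs);
    for (i, o) in outputs.iter().enumerate() {
        assert_eq!(unpack(&bytes, i), *o);
    }
    println!("{:?}", bytes);
}
```

Because every record has the same fixed width, the i-th output is found by plain offset arithmetic, so no per-record index is needed.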
