Giter VIP home page Giter VIP logo

str's Introduction

Str

Str is a char array wrapper providing some frequently used operations in the most efficient way(supporting AVX512 SIMD optimization), including string comparisons and conversion to/from integers.

StrHash

StrHash is an adaptive open addressing hash table template taking Str as key and providing a find function in the most efficient way. It's adaptive in that it can extract features from the keys contained in the table and train its hashing parameters dynamically to distribute the keys for avoiding collision.

StrHash is actually a subclass of std::map, so user can use whatever funcitons it provides to modify the table, and then call doneModify to train the table and fastFind to find keys in the table. Note that doneModify is pretty slow so it's not efficient to modify the table frequently between fastFind. It's recommended that clear be called immediately after doneModify if only fastFind is needed afterwards, so some memory can be saved.

StrHash currently supports 7 hash functions and one of which can be selected using template parameter HashFunc:

  • 0: djb ver1(default)
  • 1: djb ver2
  • 2: sax
  • 3: fnv
  • 4: oat
  • 5: murmur
  • 6: int(for integer keys)

User can also add other hash functions himself.

StrHash is also suitable to have integers(such as uint32_t or uint64_t) as key for searching. Define StrHash<8, Value, NullV, 6> for uint64_t and StrHash<4, Value, NullV, 6> for uint32_t, see benchfindint.cc for detailed usage.

Benchmark

Tests show that StrHash is 7x faster than std::unordered_map and 3x faster than other open addressing hash table implementations such as tsl::hopscotch_map, tsl::robin_map, robin_hood::unordered_map and google::dense_hash_map.

Str's operator== and compare is 2x faster than strncmp/memcmp or those of std::string.

Str's fromi and toi is 10x faster than stoi/strtol/to_string/sprintf.

benchfindstr.cc tests the performance of multiple string search solutions using the same data set. The data set contains the KRX option issue codes of Feb 2019 that we are interested in and are to be inserted into the table, and the first 1000 option issue codes we received from the market data(which are mostly of Feb 2019 but some are of other months) and are to be searched in the table. In benchfindstr.cc:

  • bench_hash<0~5> compair the performance of different hash functions StrHash supports.
  • bench_hash vs other searching solutions shows how StrHash is faster than others.
  • bench_map vs bench_string_map and bench_bsearch vs bench_string_bsearch show how Str is faster than std::string.

benchfindint.cc tests the performance of multiple integer search solutions in similar way to benchfindstr.cc. The data set contains the SHFE instrument No of type uint64_t. Here bench_hash6 should be the most suitable method.

benchcmp.cc tests string comparison operations.

benchnum.cc tests conversions to/from integers.

str's People

Contributors

liam0205 avatar mengrao avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

str's Issues

benchmark compared with robin map

image
image

the benchmark contains William Ahern's perfect hash library (pf), unordered map(um), unordered map with string view key(umv), robin map (mr),
robin map with string view key(mrv), and your StrHash(mca).

first summarize timing of one execution and second for the loop of 1000 execution. generally your version of hashFunc(0) performs better but it's not 3x faster than robin map.

great work!

best string hash function

the best string hash function for hash map.
https://github.com/wangyi-fudan/wyhash

from benchmark your str win is because fixed size string(no heap alloc -> cache friendly) and opt string compare

I add my emhash7(git emhash) into your integer bench:

bench_hash 0 sum: 0 avg lat: 4.8297
bench_hash 1 sum: 0 avg lat: 4.69109
bench_hash 2 sum: 0 avg lat: 3.46806
bench_hash 3 sum: 0 avg lat: 3.35624
bench_hash 4 sum: 0 avg lat: 4.9344
bench_hash 5 sum: 0 avg lat: 1.73721
bench_hash 6 sum: 0 avg lat: 1.64723
std::__1::map<unsigned int, unsigned short>, sum: 2893000 avg lat: 4.69555
std::__1::unordered_map<unsigned int, unsigned short>, sum: 2893000 avg lat: 7.24729
tsl::robin_map<unsigned int, unsigned short, std::__1::hash, std::__1::equal_to, std::__1::allocator<std::__1::pair<unsigned int, unsigned short>>, true>, sum: 2893000 avg lat: 1.14586
tsl::hopscotch_map<unsigned int, unsigned short, std::__1::hash, std::__1::equal_to, std::__1::allocator<std::__1::pair<unsigned int, unsigned short>>, 10, true>, sum: 2893000 avg lat: 3.34826
robin_hood::detail::Table<true, 80, unsigned int, unsigned short, robin_hood::hash, std::__1::equal_to>, sum: 2893000 avg lat: 3.61231
emhash7::HashMap<unsigned int, unsigned short>, sum: 2893000 avg lat: 1.54262
dense_hash_set, sum: 2893000 avg lat: 7.42706
bench_bsearch sum: 2893000 avg lat: 4.78543

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.