luispedro / diskhash
Diskbased (persistent) hashtable
License: Other
Hi, thank you for this library.
I was trying to use the C++ API of this library with key type uint32_t
and the following value type:
struct temp_t {
std::uint32_t _cnt = 0;
std::uint32_t _cnt2 = 0;
};
But it fails to compile because of this static_assert:
static_assert(std::is_pod<T>::value,
"DiskHash only works for POD (plain old data) types that can be mempcy()ed around");
My struct is not POD
because its default member initializers make its default constructor non-trivial. But I believe it can still be memcpy()ed.
Maybe it's better to replace std::is_pod
with std::is_standard_layout
? What do you think? Also, std::is_pod
is deprecated in C++20.
Thanks for sharing the good work.
I'm looking at the C++ version:
I'd suggest accepting more key types: double, float, int, long, ...
Also, a method for clearing the hashmap would help us manage unit tests.
... ah, and a method to remove members is missing.
... and keys, once inserted, cannot be updated?
-- luiz
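Until more key types are supported natively, one workaround is to encode a fixed-size value as a string key. The encode_key helper below is hypothetical (it is not part of diskhash) and assumes keys are NUL-terminated strings with a maximum length chosen when the table is created; writing the bytes as hex digits avoids embedded NUL bytes that would silently truncate such a key.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper (not diskhash API): turns a fixed-size value such as
// int, long, float or double into a hex string usable as a string key.
template <typename T>
std::string encode_key(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "encode_key needs a trivially copyable type");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // raw bytes in host byte order
    static const char digits[] = "0123456789abcdef";
    std::string key;
    key.reserve(2 * sizeof(T));
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        key.push_back(digits[bytes[i] >> 4]);     // high nibble
        key.push_back(digits[bytes[i] & 0x0F]);   // low nibble
    }
    return key;  // always 2 * sizeof(T) characters, never contains '\0'
}
```

Under these assumptions a double becomes a 16-character key, so the table's maximum key length must be at least 2 * sizeof(T). Because the bytes follow host order, the keys are not portable between little- and big-endian machines, and 0.0 and -0.0 map to different keys.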
The project page at https://pypi.python.org/pypi/diskhash does not render properly; this is because you include README.md as the description, and PyPI expects this to be reStructuredText, not Markdown....
Any plan to also have Java bindings?
Could you implement a way to loop over the existing keys?
It seems that the package does not handle Python Unicode string keys correctly.
For example, when I insert a Unicode key using the following code:
import diskhash
tb = diskhash.StructHash("test.dht", 15, 'l', 'rw')
tb.insert("가", 1)
tb.lookup("가")
I get the correct value 1 in the same process where the (key, value) pair is inserted. But if I exit the Python shell, start a new Python shell, and execute the following code:
import diskhash
tb = diskhash.StructHash("test.dht", 15, 'l', 'r')
tb.lookup("가")
Nothing is returned from the code. I do not experience a similar problem if the key is an ASCII string.
My environment is:
ERROR: Failed building wheel for diskhash
Running setup.py clean for diskhash
Failed to build diskhash
Installing collected packages: diskhash
Running setup.py install for diskhash ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\user1\appdata\local\programs\python\python37-32\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\user1\AppData\Local\Temp\pip-install-n8q6wuhq\diskhash\setup.py'"'"'; __file__='"'"'C:\Users\user1\AppData\Local\Temp\pip-install-n8q6wuhq\diskhash\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\user1\AppData\Local\Temp\pip-record-nrcm6gyo\install-record.txt' --single-version-externally-managed --compile
cwd: C:\Users\user1\AppData\Local\Temp\pip-install-n8q6wuhq\diskhash
Complete output (25 lines):
running install
running build
running build_py
creating build
creating build\lib.win32-3.7
creating build\lib.win32-3.7\diskhash
copying python\diskhash\diskhash_version.py -> build\lib.win32-3.7\diskhash
copying python\diskhash\__init__.py -> build\lib.win32-3.7\diskhash
creating build\lib.win32-3.7\diskhash\tests
copying python\diskhash\tests\test_larger.py -> build\lib.win32-3.7\diskhash\tests
copying python\diskhash\tests\test_smoke.py -> build\lib.win32-3.7\diskhash\tests
copying python\diskhash\tests\__init__.py -> build\lib.win32-3.7\diskhash\tests
running build_ext
building 'diskhash._diskhash' extension
creating build\temp.win32-3.7
creating build\temp.win32-3.7\Release
creating build\temp.win32-3.7\Release\python
creating build\temp.win32-3.7\Release\python\diskhash
creating build\temp.win32-3.7\Release\src
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\user1\appdata\local\programs\python\python37-32\include -Ic:\users\user1\appdata\local\programs\python\python37-32\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\winrt" /Tcpython/diskhash/_diskhash.c /Fobuild\temp.win32-3.7\Release\python/diskhash/_diskhash.obj
_diskhash.c
python/diskhash/_diskhash.c(121): error C2078: too many initializers
python/diskhash/_diskhash.c(117): error C2078: too many initializers
python/diskhash/_diskhash.c(196): warning C4028: formal parameter 1 different from declaration
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\user1\appdata\local\programs\python\python37-32\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\user1\AppData\Local\Temp\pip-install-n8q6wuhq\diskhash\setup.py'"'"'; __file__='"'"'C:\Users\user1\AppData\Local\Temp\pip-install-n8q6wuhq\diskhash\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\user1\AppData\Local\Temp\pip-record-nrcm6gyo\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.
C:\Users\user1>
I see that the provided API only has dht_insert, but there is no dht_update.
If a key is already in the hashtable, its value cannot be updated.
What should I do?
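One possible workaround, assuming the table's lookup returns a pointer to the stored value and its insert refuses keys that already exist (both assumptions to check against the actual header), is to emulate an update as lookup-then-overwrite with insert as the fallback. ToyTable below is a hypothetical in-memory stand-in that only mimics that contract; it is not diskhash code.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in: insert refuses existing keys, lookup returns a
// pointer to the stored value or nullptr.
template <typename V>
class ToyTable {
public:
    // Returns true if inserted, false if the key was already present.
    bool insert(const std::string& key, const V& value) {
        return data_.emplace(key, value).second;
    }
    V* lookup(const std::string& key) {
        auto it = data_.find(key);
        return it == data_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, V> data_;
};

// Insert-or-update built only from lookup + insert.
template <typename Table, typename V>
void upsert(Table& table, const std::string& key, const V& value) {
    if (V* existing = table.lookup(key)) {
        *existing = value;          // key present: overwrite in place
    } else {
        table.insert(key, value);   // key absent: plain insert
    }
}
```

If the real table hands out pointers into a memory-mapped file, such a pointer may be invalidated when a later insert grows the file, so it should be written through immediately rather than stored.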
As can be seen at e.g. https://matrix.hackage.haskell.org/package/diskhash#GHC-7.8/diskhash-0.0.1.2, diskhash-0.0.1.2 fails to build with GHC 7.8:
Configuring component lib from diskhash-0.0.1.2...
Preprocessing library diskhash-0.0.1.2...
[1 of 1] Compiling Data.DiskHash ( haskell/Data/DiskHash.hs, /tmp/matrix-worker/1498509481/dist-newstyle/build/x86_64-linux/ghc-7.8.4/diskhash-0.0.1.2/build/Data/DiskHash.o )
haskell/Data/DiskHash.hs:63:34: Not in scope: ‘<$>’
haskell/Data/DiskHash.hs:67:34: Not in scope: ‘<$>’
haskell/Data/DiskHash.hs:94:69: Not in scope: ‘<$>’
haskell/Data/DiskHash.hs:138:27: Not in scope: ‘<$>’
haskell/Data/DiskHash.hs:149:26: Not in scope: ‘<$>’
haskell/Data/DiskHash.hs:165:30: Not in scope: ‘<$>’
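This is a result of inaccurate version bounds: <$> is only exported from the Prelude starting with base 4.8 (GHC 7.10), so it is not in scope with GHC 7.8 (base 4.7). Proper bounds would be, e.g.: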
build-depends: base > 4.8 && < 5, bytestring == 0.10.*
You can correct the version bounds yourself via
https://hackage.haskell.org/package/diskhash-0.0.1.2/diskhash.cabal/edit
In some benchmarks, MurmurHash performs better.
Also, a key-length parameter could be added so that keys of any data type can be used.
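Both suggestions fit together: a hash that takes an explicit byte length does not depend on NUL-terminated strings. Below is a sketch of MurmurHash3 in its x86_32 variant (Austin Appleby's public-domain design). It is not the hash diskhash currently uses, and whether it is faster for diskhash's workloads is exactly what the benchmarks mentioned above would have to confirm.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

inline std::uint32_t rotl32(std::uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

// MurmurHash3_x86_32: hashes `len` bytes starting at `data`.
std::uint32_t murmur3_32(const void* data, std::size_t len, std::uint32_t seed) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    const std::uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
    std::uint32_t h = seed;

    // Body: process 4-byte blocks (read in native byte order).
    const std::size_t nblocks = len / 4;
    for (std::size_t i = 0; i < nblocks; ++i) {
        std::uint32_t k;
        std::memcpy(&k, p + 4 * i, 4);
        k *= c1; k = rotl32(k, 15); k *= c2;
        h ^= k; h = rotl32(h, 13); h = h * 5 + 0xe6546b64u;
    }

    // Tail: the remaining 0-3 bytes.
    const unsigned char* tail = p + 4 * nblocks;
    std::uint32_t k1 = 0;
    switch (len & 3) {
        case 3: k1 ^= std::uint32_t(tail[2]) << 16;  // fall through
        case 2: k1 ^= std::uint32_t(tail[1]) << 8;   // fall through
        case 1: k1 ^= tail[0];
                k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h ^= k1;
    }

    // Finalization: mix in the length, then avalanche (fmix32).
    h ^= static_cast<std::uint32_t>(len);
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}
```

Because the blocks are read in native byte order, the hash values differ between little- and big-endian machines; for a persistent on-disk table that would matter if files are meant to move between such machines.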