cruppstahl / upscaledb

A very fast lightweight embedded database engine with a built-in query language.
Home Page: https://upscaledb.com
License: Apache License 2.0
upscaledb 2.2.1                          Fri Mar 10 21:33:03 CET 2017
(C) Christoph Rupp, [email protected]; http://www.upscaledb.com

This is the README file of upscaledb.

Contents:

1. About
2. Changes
3. Features
4. Known Issues/Bugs
5. Compiling
6. Testing and Example Code
7. API Documentation
8. Porting upscaledb
9. Migrating files from older versions
10. Licensing
11. Contact
12. Other Copyrights

1. About

upscaledb is a database engine written in C/C++. It is fast,
production-proven and easy to use. This release has a bunch of bug
fixes, performance improvements and a new API for bulk operations.

2. Changes

New Features
* Added a new API function for bulk operations (ups_db_bulk_operations
  in ups/upscaledb_int.h)

Bugfixes
* Fixed a compiler error related to inline assembly on gcc 4.8.x
* Fixed a bug where ups_cursor_overwrite would overwrite a
  transactional record instead of the (correct) btree record
* Fixed several bugs in the duplicate key consolidation
* issue #80: fixed streamvbyte compilation for c++11
* issue #79: fixed crc32 failure when reusing deleted blobs spanning
  multiple pages
* Fixed a bug when recovering duplicates that were inserted with one
  of the UPS_DUPLICATE_INSERT_* flags
* Minor improvements for the journalling performance
* Fixed compilation issues w/ gcc 6.2.1 (Thanks, Roel Brook)

Other Changes
* Performance improvements when appending keys at the end of the
  database
* The flags UPS_HINT_APPEND and UPS_HINT_PREPEND are now deprecated
* Removed the libuv dependency; switched to boost::asio instead
* Performance improvements when using many duplicate keys (with a
  duplicate table spanning multiple pages)
* Committed transactions are now batched before they are flushed to
  disk
* The integer compression codecs UPS_COMPRESSOR_UINT32_GROUPVARINT and
  UPS_COMPRESSOR_UINT32_STREAMVBYTE are now deprecated
* The integer compression codec UPS_COMPRESSOR_UINT32_MASKEDVBYTE is
  now a synonym for UPS_COMPRESSOR_UINT32_VARBYTE, but uses the
  MaskedVbyte library under the hood
* Added MinGW compiler support (thanks, topilski)

To see a list of all changes, look in the file ChangeLog.

3. Features

- Very fast sorted B+Tree with variable length keys
- Basic schema support for POD types (e.g. uint32, uint64, real32)
- Very fast analytical functions
- Can run as an in-memory database
- Multiple databases in one file
- Record number databases ("auto-increment")
- Duplicate keys
- Logging and recovery
- Unlimited number of parallel Transactions
- Transparent AES encryption
- Transparent CRC32 verification
- Various compression codecs for journal, keys and records using zlib,
  snappy, lzf
- Compression for uint32 keys
- Network access (remote databases) via TCP/Protocol Buffers
- Very fast bi-directional database cursors
- Configurable page size, cache size, key sizes etc.
- Runs on Linux, Unices, Microsoft Windows and other platforms
- Uses memory mapped I/O for fast disk access (but falls back to
  read/write if mmap is not available)
- Uses 64bit file pointers and supports huge files (> 2 GB)
- Easy to use and well-documented
- Open source and released under the APL 2.0 license
- Wrappers for C++, Java, .NET, Erlang, Python, Ada and others

4. Known Issues/Bugs

See https://github.com/cruppstahl/upscaledb/issues.

5. Compiling

5.1 Linux, MacOS and other Unix systems

To compile upscaledb, run ./configure, make, make install. Run
`./configure --help' for more options (e.g. static/dynamic library,
build with debugging symbols etc.).

5.2 Microsoft Visual Studio

A Solution file is provided for Microsoft Visual C++ in the "win32"
folder for MSVC 2013. All libraries can be downloaded precompiled from
the upscaledb webpage. To download Microsoft Visual Studio Express
Edition for free, go to
http://msdn.microsoft.com/vstudio/express/visualc/default.aspx.

5.3 Dependencies

On Ubuntu, the following packages are required:
- libdb-dev (optional)
- protobuf-compiler
- libprotobuf-dev
- libgoogle-perftools-dev
- libboost-system-dev
- libboost-filesystem-dev
- libboost-thread-dev
- libboost-dev

For Windows, precompiled dependencies are available here:
https://github.com/cruppstahl/upscaledb-alien

6. Testing and Example Code

Make automatically compiles several example programs in the directory
'samples'. To see upscaledb in action, just run 'samples/db1' or any
other sample (or 'win32/out/samples/db1/db1.exe' on Windows platforms).

7. API Documentation

The header files in 'include/ups' have extensive comments. Also, a
doxygen script is available; run 'make doc' to start doxygen. The
generated documentation is also available on the upscaledb web page.

8. Porting upscaledb

Porting upscaledb shouldn't be too difficult. All operating system
dependent functions are declared in '1os/*.h' and defined in
'1os/os_win32.cc' or '1os/os_posix.cc'. Other compiler- and OS-specific
macros are in 'include/ups/types.h'. Most likely, these are the only
files which have to be touched. Also see item 9) for important macros.

9. Migrating files from older versions

Usually, upscaledb releases are backwards compatible. There are some
exceptions, though; in that case tools are provided to migrate the
database. First, export your existing database with ups_export linked
against the old version (ups_export links statically and will NOT be
confused if your system has a newer version of upscaledb installed).
Then use the newest version of ups_import to import the data into a new
database. You can find ups_export and ups_import in the "tools"
subdirectory.

Example (ups_export of 2.1.2 was renamed to ups_export-2.1.2 for
clarity):

    ups_export-2.1.2 input.db | ups_import --stdin output.db

10. Licensing

upscaledb is released under the Apache Public License (APL) 2.0. See
the file COPYING for more information. For commercial use, licenses are
available. Visit http://upscaledb.com for more information.

11. Contact

Author of upscaledb is Christoph Rupp
Paul-Preuss-Str. 63
80995 Muenchen/Germany
email: [email protected]
web: http://www.upscaledb.com

12. Other Copyrights

The Google Protocol Buffers ("protobuf") library is Copyright 2008,
Google Inc. It has the following license:

    Copyright 2008, Google Inc.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Google Inc. nor the names of its contributors
      may be used to endorse or promote products derived from this
      software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
    FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
    COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
    (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
    SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
    HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
    STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
    ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
    OF THE POSSIBILITY OF SUCH DAMAGE.

Code generated by the Protocol Buffer compiler is owned by the owner of
the input file used when generating it. This code is not standalone and
requires a support library to be linked with it. This support library
is itself covered by the above license.
Hamsterdb 2.0.2
/bin/sh ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -I../include -I../3rdparty/aes -g -O2 -Wall -DHAM_LITTLE_ENDIAN -MT log.lo -MD -MP -MF .deps/log.Tpo -c -o log.lo log.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I.. -I../include -I../3rdparty/aes -g -O2 -Wall -DHAM_LITTLE_ENDIAN -MT log.lo -MD -MP -MF .deps/log.Tpo -c log.cc -fno-common -DPIC -o .libs/log.o
log.cc: In member function 'std::string Log::get_path()':
log.cc:410: error: '::basename' has not been declared
make[3]: *** [log.lo] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
Darwin wsantos 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr 9 19:32:15 PDT 2012; root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64
hamsterdb-2.0.2 ❯ libtool -V
Apple Inc. version cctools-822
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.9~22/src/configure --disable-checking --enable-werror --prefix=/Applications/Xcode.app/Contents/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.9~22/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00)
hamsterdb.cc:1191: error: no matching function for call to 'boost::detail::thread::scoped_lock<boost::mutex>::scoped_lock()'
/usr/include/boost/thread/detail/lock.hpp:68: note: candidates are: boost::detail::thread::scoped_lock<Mutex>::scoped_lock(Mutex&, bool) [with Mutex = boost::mutex]
/usr/include/boost/thread/detail/lock.hpp:64: note: boost::detail::thread::scoped_lock<boost::mutex>::scoped_lock(const boost::detail::thread::scoped_lock<boost::mutex>&)
/usr/include/boost/noncopyable.hpp: In member function 'boost::detail::thread::scoped_lock<boost::mutex>& boost::detail::thread::scoped_lock<boost::mutex>::operator=(const boost::detail::thread::scoped_lock<boost::mutex>&)':
/usr/include/boost/noncopyable.hpp:28: error: 'const boost::noncopyable_::noncopyable& boost::noncopyable_::noncopyable::operator=(const boost::noncopyable_::noncopyable&)' is private
/usr/include/boost/thread/detail/lock.hpp:64: error: within this context
/usr/include/boost/thread/detail/lock.hpp:64: error: non-static reference member 'boost::mutex& boost::detail::thread::scoped_lock<boost::mutex>::m_mutex', can't use default assignment operator
The variable "running" should use the data type "sig_atomic_t", shouldn't it?
This is an internal flag, and ups_cursor_find can fail if the flag is specified. Remove it from the header file, also from the wrappers!
If cache size is very small then file size grows a LOT:
./ham_bench --cache=1024 --stop-ops=10000
hamsterdb filesize 154468352
./ham_bench --stop-ops=10000
hamsterdb filesize 10534912 (10 times less!)
Hamsterdb 2.1.10 falls into endless recursion if I create an environment with the HAM_ENABLE_TRANSACTIONS flag, but the cursor itself has no associated transaction (and there are no active transactions around):
ham_cursor_create(&cursor, _dbi, NULL, 0);
ham_cursor_find(cursor, &key, &rec, HAM_FIND_GEQ_MATCH);
Dbx_kv.dll!hamsterdb::LocalDatabase::find_txn(hamsterdb::Context * context=0x000000000012f128, hamsterdb::Cursor * cursor=0x0000000003510060, ham_key_t * key=0x000000000012f468, ham_record_t * record=0x000000000012f428, unsigned int flags=24576) Line 438 C++
Dbx_kv.dll!hamsterdb::LocalDatabase::find_txn(hamsterdb::Context * context=0x000000000012f128, hamsterdb::Cursor * cursor=0x0000000003510060, ham_key_t * key=0x000000000012f468, ham_record_t * record=0x000000000012f428, unsigned int flags=24576) Line 438 C++
Dbx_kv.dll!hamsterdb::LocalDatabase::find_txn(hamsterdb::Context * context=0x000000000012f128, hamsterdb::Cursor * cursor=0x0000000003510060, ham_key_t * key=0x000000000012f468, ham_record_t * record=0x000000000012f428, unsigned int flags=24576) Line 438 C++
Dbx_kv.dll!hamsterdb::LocalDatabase::find_txn(hamsterdb::Context * context=0x000000000012f128, hamsterdb::Cursor * cursor=0x0000000003510060, ham_key_t * key=0x000000000012f468, ham_record_t * record=0x000000000012f428, unsigned int flags=24576) Line 438 C++
Dbx_kv.dll!hamsterdb::LocalDatabase::find_impl(hamsterdb::Context * context=0x000000000012f128, hamsterdb::Cursor * cursor=0x0000000003510060, ham_key_t * key=0x000000000012f468, ham_record_t * record=0x000000000012f428, unsigned int flags=24576) Line 1670 C++
Dbx_kv.dll!hamsterdb::LocalDatabase::find(hamsterdb::Cursor * cursor=0x0000000003510060, hamsterdb::Transaction * txn=0x0000000000000000, ham_key_t * key=0x000000000012f468, ham_record_t * record=0x000000000012f428, unsigned int flags=24576) Line 1135 C++
Dbx_kv.dll!ham_cursor_find(ham_cursor_t * hcursor=0x0000000003510060, ham_key_t * key=0x000000000012f468, ham_record_t * record=0x000000000012f428, unsigned int flags=24576) Line 1278 C++
Dbx_kv.dll!CDbxKV::EnumContactSettings(unsigned int contactID=0, DBCONTACTENUMSETTINGS * dbces=0x000000000012f6f8) Line 585 C++
A call to Cursor.Find with one of the search flags (e.g., HAM_FIND_GT_MATCH) should return both the matching key and value of the result. This works fine when accessing the database file directly. However, when accessing the database via the hamzilla server, the original search key is returned instead of the key corresponding to the result.
The following code gives an example of the error, and can be copy-pasted into 'CursorTest.cs' in the 'Unittests' project of the hamsterdb-dotnet project.
// Note: this is just the 'checkEqual' method from CursorTest.cs made into a static method
static void checkEqual(byte[] lhs, byte[] rhs)
{
    Assert.AreEqual(lhs.Length, rhs.Length);
    for (int i = 0; i < lhs.Length; i++)
        Assert.AreEqual(lhs[i], rhs[i]);
}

private static void Search()
{
    byte[] k1 = new byte[5]; k1[0] = 5;
    byte[] k2 = new byte[5]; k2[0] = 6;
    byte[] r1 = new byte[5]; r1[0] = 1;
    byte[] r2 = new byte[5]; r2[0] = 2;

    // direct file access
    var env = new Hamster.Environment();
    var db = new Database();
    env.Create("ntest.db");
    db = env.CreateDatabase(1, 0);
    Cursor c = new Cursor(db);
    db.Insert(k1, r1, Hamster.HamConst.HAM_OVERWRITE);
    db.Insert(k2, r2, Hamster.HamConst.HAM_OVERWRITE);
    byte[] krslt = k1;
    var f = c.Find(ref krslt, HamConst.HAM_FIND_GT_MATCH);
    checkEqual(k2, krslt); // WORKS FINE
    checkEqual(r2, f);
    db.Close();
    env.Close();

    // remote access
    env = new Hamster.Environment();
    db = new Database();
    env.Open("ham://localhost:8080/ntest.db");
    db = env.OpenDatabase(1, 0);
    c = new Cursor(db);
    db.Insert(k1, r1, Hamster.HamConst.HAM_OVERWRITE);
    db.Insert(k2, r2, Hamster.HamConst.HAM_OVERWRITE);
    krslt = k1;
    f = c.Find(ref krslt, HamConst.HAM_FIND_GT_MATCH);
    checkEqual(k2, krslt); // FAILS HERE (krslt is unchanged)
    checkEqual(r2, f);
    db.Close();
    env.Close();
}
The corresponding hamzilla.config file is as follows:
{
    /* global configuration settings */
    "global": {
        "port": 8080
    },

    /* list of hamsterdb Environments that are served */
    "environments": [
        {
            "url": "/ntest.db",
            "path": "./ntest.db",
            "flags": "",
            "databases": [
                {
                    "name": 1,
                    "flags": ""
                }
            ]
        }
    ]
}
In a specific sequence of inserts/deletes in various databases, a segfault can happen.
I have received a test case which reproduces the issue.
In attempting to build for osx and redhat/centos 6.4, I was unable to build without specifying ./configure LIBS="-lgnutls"
Error:
hamsterdb/unittests/main.cpp:230: undefined reference to `gnutls_global_deinit'
When trying to build for OS X (using Homebrew libs), the build fails with the following. It works fine on Linux... thoughts?
Undefined symbols for architecture x86_64:
"boost::system::system_category()", referenced from:
global constructors keyed to _ZN12_GLOBAL__N_12_1Ein ham_info.o
"boost::system::generic_category()", referenced from:
global constructors keyed to _ZN12_GLOBAL__N_12_1Ein ham_info.o
ld: symbol(s) not found for architecture x86_64
Was able to work around the issue with the following:
./configure LIBS="-lboost_system-mt -lgnutls"
KvSpeedTestApp.exe!hamsterdb::TransactionOperation::get_flags() Line 74 C++
KvSpeedTestApp.exe!hamsterdb::Cursor::move(ham_key_t * key, ham_record_t * record, unsigned int flags) Line 963 C++
KvSpeedTestApp.exe!hamsterdb::LocalDatabase::cursor_move(hamsterdb::Cursor * cursor, ham_key_t * key, ham_record_t * record, unsigned int flags) Line 1113 C++
KvSpeedTestApp.exe!ham_cursor_move(ham_cursor_t * hcursor, ham_key_t * key, ham_record_t * record, unsigned int flags) Line 1484 C++
KvSpeedTestApp.exe!HamDBIterator::GetNext() Line 49 C++
KvSpeedTestApp.exe!KvSpeedTest::PutGetSameKeyTest() Line 108 C++
KvSpeedTestApp.exe!KvSpeedTest::RunAll() Line 131 C++
KvSpeedTestApp.exe!main(int argc, const char * * argv) Line 30 C++
When opening a file, the mmap view is limited to 2^32 bytes.
What if the file is larger? Will this lead to problems?
Example: The time series is created sequentially by inserting (1,1) (2,2) (3,3) ...
Key, Value
1,1
2,2
3,3
4,4
5,5
6,6
The read cursor is put at the end via cursor.move_last() shortly after a new entry is inserted, so the read cursor sits at the end while a new entry is put "behind" the end. This seems to be the critical constellation, because without it the following does not happen!
Read cursor is at position 5.
Write cursor puts in (6,6).
Read cursor is positioned via cursor.move_last().
Now the read cursor is put again at position 6 (ok)
Now the read cursor is asked to find(key,record,HAM_FIND_LT_MATCH) with key = 6. The result is key=5 and record=5 (ok)
Now repeat the step backward in time: find(key,record,HAM_FIND_LT_MATCH) with key=5. The result is key=4 and record = 4 (ok)
Now ask for the step forward in time: find(key,record,HAM_FIND_GT_MATCH) with key=4. The result is key=4 and record=6 (?????)
None of this happens if the last entry is put at the end and the read cursor is not at the last position.
Build fails when google perftools (2.0) is installed:
Undefined symbols for architecture x86_64:
  "MallocExtension::instance()", referenced from:
      hamsterdb::Memory::get_global_metrics(ham_env_metrics_t*) in libhamsterdb.a(mem.o)
      hamsterdb::Memory::release_to_system() in libhamsterdb.a(mem.o)
  "tc_calloc", referenced from:
      HamsterdbFixture::callocTest() in hamsterdb.o
      hamsterdb::BtreeCursor::uncouple(unsigned int) in libhamsterdb.a(btree_cursor.o)
      hamsterdb::BtreeCursor::clone(hamsterdb::BtreeCursor*) in libhamsterdb.a(btree_cursor.o)
      hamsterdb::BtreeCursor::get_duplicate_table(hamsterdb::PDupeTable**, bool) in libhamsterdb.a(btree_cursor.o)
  "tc_free", referenced from:
      hamsterdb::ByteArray::clear() in db.o
      hamsterdb::ExtKeyCache::ExtKeyHelper::remove_if(hamsterdb::ExtKeyCache::ExtKey*) in extkeys.o
      hamsterdb::ExtKeyCache::remove(unsigned long) in extkeys.o
      HamsterdbFixture::callocTest() in hamsterdb.o
      hamsterdb::JournalFixture::compareJournal(hamsterdb::Journal*, hamsterdb::LogEntry*, unsigned int) in journal.o
      hamsterdb::JournalFixture::appendEraseTest() in journal.o
      hamsterdb::JournalFixture::appendPartialInsertTest() in journal.o
      ...
  "tc_malloc", referenced from:
      hamsterdb::ExtKeyCache::insert(unsigned long, unsigned int, unsigned char const*) in extkeys.o
      hamsterdb::Database::copy_key(ham_key_t const*, ham_key_t*) in misc.o
      hamsterdb::BtreeIndex::copy_key(hamsterdb::PBtreeKey const*, ham_key_t*) in libhamsterdb.a(btree.o)
      hamsterdb::ExtKeyCache::insert(unsigned long, unsigned int, unsigned char const*) in libhamsterdb.a(btree_insert.o)
      hamsterdb::Database::copy_key(ham_key_t const*, ham_key_t*) in libhamsterdb.a(btree_insert.o)
      hamsterdb::Database::copy_key(ham_key_t const*, ham_key_t*) in libhamsterdb.a(btree_cursor.o)
      hamsterdb::DiskDevice::read_page(hamsterdb::Page*) in libhamsterdb.a(env.o)
      ...
  "tc_realloc", referenced from:
      void* hamsterdb::Memory::reallocate(void*, unsigned long) in libhamsterdb.a(btree.o)
      void* hamsterdb::Memory::reallocate(void*, unsigned long) in libhamsterdb.a(txn_cursor.o)
      void* hamsterdb::Memory::reallocate(void*, unsigned long) in libhamsterdb.a(db.o)
      void* hamsterdb::Memory::reallocate(void*, unsigned long) in libhamsterdb.a(blob_manager_disk.o)
      void* hamsterdb::Memory::reallocate(void*, unsigned long) in libhamsterdb.a(blob_manager_inmem.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
btree: insert "aa"
txn: begin
txn: erase "aa"
txn: find "aa" with GEQ match
-> segfault
It seems the compilation fails if the protobuf-compiler and/or the protobuf libraries are missing.
make[3]: Entering directory `/home/joe/hamsterdb-2.1.0/src'
/bin/bash ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -I../include -I../include -g -O2 -Wall -DHAM_LITTLE_ENDIAN -fno-tree-vectorize -D_FILE_OFFSET_BITS=64 -MT blob.lo -MD -MP -MF .deps/blob.Tpo -c -o blob.lo blob.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I.. -I../include -I../include -g -O2 -Wall -DHAM_LITTLE_ENDIAN -fno-tree-vectorize -D_FILE_OFFSET_BITS=64 -MT blob.lo -MD -MP -MF .deps/blob.Tpo -c blob.cc -fPIC -DPIC -o .libs/blob.o
In file included from env.h:34:0,
from db.h:28,
from blob.cc:18:
protocol/protocol.h:23:25: fatal error: messages.pb.h: No such file or directory
compilation terminated.
make[3]: *** [blob.lo] Error 1
there are 2 locations in configure that do:
LIBS="-ltcmalloc_minimal $LIBS"
these appear to run even when --without-tcmalloc is specified
I'm not sure why, but consequently this is used when checking for protobuf; it makes the test segfault, so configure fails to detect the library:
configure:18216: g++ -o conftest -g -O2 conftest.cpp -lcrypto -ltcmalloc_minimal >&5
conftest.cpp:43:1: warning: "HAVE_GOOGLE_TCMALLOC_H" redefined
conftest.cpp:33:1: warning: this is the location of the previous definition
configure:18216: $? = 0
configure:18216: ./conftest
./configure: line 2107: 39775 Segmentation fault: 11 ./conftest$ac_exeext
configure:18216: $? = 139
configure: program exited with status 139
This was observed on OS X 10.8.4.
I would like to point out that identifiers like "HAM_HAMSTERDB_HPP__" and "HAM_TYPES_H__" do not fit the naming conventions of the C++ language standard (names containing a double underscore are reserved for the implementation).
Would you like to adjust your selection of unique names?
Hello,
I'm trying to compile version 2.1.1 on centos 6 + EPEL (for gperftools).
configure doesn't seem very happy about the tcmalloc libraries, because one of the checks fails:
checking google/tcmalloc.h usability... yes
checking google/tcmalloc.h presence... yes
checking for google/tcmalloc.h... yes
checking for tc_malloc in -ltcmalloc_minimal... no
But in some way, tc_malloc support is still enabled, because the build fails in samples:
Making all in samples
make[2]: Entering directory `/root/hamsterdb-2.1.1/samples'
CC db1.o
CCLD db1
../src/.libs/libhamsterdb.so: undefined reference to `tc_free'
../src/.libs/libhamsterdb.so: undefined reference to `tc_malloc'
../src/.libs/libhamsterdb.so: undefined reference to `tc_calloc'
../src/.libs/libhamsterdb.so: undefined reference to `tc_realloc'
../src/.libs/libhamsterdb.so: undefined reference to `MallocExtension::instance()'
collect2: ld returned 1 exit status
make[2]: *** [db1] Error 1
make[2]: Leaving directory `/root/hamsterdb-2.1.1/samples'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/hamsterdb-2.1.1'
make: *** [all] Error 2
gperftools are from EPEL. Synced from upstream svn r219.
If the email text includes a "<--" then the mail that is sent is truncated.
Journal::recover -> __abort_uncommitted_txns -> ham_txn_abort
-> env->get_txn_manager()->abort
crashes because env->m_txn_manager == 0
See Dmitriy's mail from Dec 17:
What I see is that Common.db uses bogus values for the record size:
average record size: 1086
minimum record size: 419
maximum record size: 419
If the minimum and maximum are both 419, then the average cannot be 1086. I think one of these values is off, and therefore the max. key size may be misleading as well. But this is just a wrong printout; it does not explain why you see unexpected results in your program.
With 2.0.3 I'm trying to store a key "FooBar" and then find it by "Foo"; however, ham_find seems to leave "Foo" in ham_key_t, failing to return the "FooBar" key.
Below is a more detailed description:
recKey << key << '\x1F' << suffix;
log_info ("cacheLocal, recKey: " << recKey << " (" << recKey.size() << ")");
ham_key_t hkey; memset (&hkey, 0, sizeof (ham_key_t));
hkey.data = (void*) recKey.data(); hkey.size = recKey.size();
ham_record_t record; memset (&record, 0, sizeof (ham_record_t));
record.data = (void*) data.data(); record.size = data.size();
ham_status_t st = ham_insert ((ham_db_t*) _db.get(), txn, &hkey, &record, HAM_OVERWRITE);
...
ham_key_t hkey; memset (&hkey, 0, sizeof (ham_key_t));
hkey.data = (void*) key.data(); hkey.size = key.size();
ham_record_t record; memset (&record, 0, sizeof (ham_record_t));
ham_status_t st = ham_find ((ham_db_t*) _db.get(), txn, &hkey, &record, HAM_FIND_GT_MATCH);
if (st == HAM_KEY_NOT_FOUND) return;
log_info ("getLocal, looking for " << key << " (" << key._buf << ", " << key.size() << "); found " << hkey.data << " (" << hkey.size << ")");
The output is:
cacheLocal, recKey: 9l1FJRJZADA6wtVfUUKC0SSvpFIF7zfK000at (38)
getLocal, looking for 9l1FJRJZADA6wtVfUUKC0SSvpFIF7zfK (0x2b68716fafe0, 32); found 0x2b68716fafe0 (32)
There is no key of size 32 in the database! Yet after ham_find (&hkey) I'm still seeing the same 32-byte prefix in hkey.
I find that include guards like "GRAPH_H" and "COMMON_H" are too short to be safe when the header files are reused (e.g. as part of an application programming interface).
configure fails early on with:
conftest.cpp:49:16: error: no type named 'thread_clock' in namespace 'boost::chrono'; did you mean 'steady_clock'?
boost::chrono::thread_clock d;
Looking elsewhere, it seems thread_clock is not available on OS X.
It doesn't seem like this is used anywhere else; commenting out the thread_clock test produces a working build.
If openssl libraries are installed but the header files are missing there's a compilation error:
CXX env_local.lo
In file included from device_disk.h:22:0,
from device_factory.h:17,
from env_local.cc:18:
aes.h:21:25: fatal error: openssl/evp.h: No such file or directory
compilation terminated.
make[3]: *** [env_local.lo] Error 1
make[3]: Leaving directory `/root/ryan/hamsterdb-2.1.2/src'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/root/ryan/hamsterdb-2.1.2/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/ryan/hamsterdb-2.1.2'
Workaround: install the openssl-devel package (Ubuntu: libssl-dev).
Hello,
Even if I set the -fpermissive flag, it still fails to compile.
make[3]: Entering directory `/home/randy/prg/hamsterdb-2.1.7/src/server'
CXX hamserver.lo
hamserver.cc: In function 'void hamsterdb::on_new_connection(uv_stream_t*, int)':
hamserver.cc:1316:71: error: invalid conversion from 'uv_buf_t (*)(uv_handle_t*, size_t) {aka uv_buf_t (*)(uv_handle_s*, unsigned int)}' to 'uv_alloc_cb {aka void (*)(uv_handle_s*, unsigned int, uv_buf_t*)}' [-fpermissive]
In file included from hamserver.h:25:0,
                 from hamserver.cc:25:
/usr/local/include/uv.h:645:15: error: initializing argument 2 of 'int uv_read_start(uv_stream_t*, uv_alloc_cb, uv_read_cb)' [-fpermissive]
hamserver.cc:1316:71: error: invalid conversion from 'void (*)(uv_stream_t*, ssize_t, uv_buf_t) {aka void (*)(uv_stream_s*, int, uv_buf_t)}' to 'uv_read_cb {aka void (*)(uv_stream_s*, int, const uv_buf_t*)}' [-fpermissive]
In file included from hamserver.h:25:0,
                 from hamserver.cc:25:
/usr/local/include/uv.h:645:15: error: initializing argument 3 of 'int uv_read_start(uv_stream_t*, uv_alloc_cb, uv_read_cb)' [-fpermissive]
hamserver.cc: In function 'ham_status_t ham_srv_init(ham_srv_config_t*, ham_srv_t**)':
hamserver.cc:1354:69: error: too few arguments to function 'int uv_ip4_addr(const char*, int, sockaddr_in*)'
In file included from hamserver.h:25:0,
                 from hamserver.cc:25:
/usr/local/include/uv.h:2052:15: note: declared here
hamserver.cc:1355:38: error: cannot convert 'sockaddr_in' to 'const sockaddr*' for argument '2' to 'int uv_tcp_bind(uv_tcp_t*, const sockaddr*, unsigned int)'
hamserver.cc:1365:52: error: invalid conversion from 'void (*)(uv_async_t*, int) {aka void (*)(uv_async_s*, int)}' to 'uv_async_cb {aka void (*)(uv_async_s*)}' [-fpermissive]
In file included from hamserver.h:25:0,
                 from hamserver.cc:25:
/usr/local/include/uv.h:1352:15: error: initializing argument 3 of 'int uv_async_init(uv_loop_t*, uv_async_t*, uv_async_cb)' [-fpermissive]
make[3]: *** [hamserver.lo] Error 1
I've built and installed libuv from source, but I get this error:
make[3]: Entering directory `/home/wouter/upscaledb/src/5server'
CXX upsserver.lo
upsserver.cc: In function 'void upscaledb::on_new_connection(uv_stream_t*, int)':
upsserver.cc:1994:71: error: invalid conversion from 'uv_buf_t (*)(uv_handle_t*, size_t) {aka uv_buf_t (*)(uv_handle_s*, long unsigned int)}' to 'uv_alloc_cb {aka void (*)(uv_handle_s*, long unsigned int, uv_buf_t*)}' [-fpermissive]
uv_read_start((uv_stream_t *)client, on_alloc_buffer, on_read_data);
^
In file included from upsserver.cc:26:0:
/usr/local/include/uv.h:465:15: error: initializing argument 2 of 'int uv_read_start(uv_stream_t*, uv_alloc_cb, uv_read_cb)' [-fpermissive]
UV_EXTERN int uv_read_start(uv_stream_t*,
^
upsserver.cc:1994:71: error: invalid conversion from 'void (*)(uv_stream_t*, ssize_t, uv_buf_t) {aka void (*)(uv_stream_s*, long int, uv_buf_t)}' to 'uv_read_cb {aka void (*)(uv_stream_s*, long int, const uv_buf_t*)}' [-fpermissive]
uv_read_start((uv_stream_t *)client, on_alloc_buffer, on_read_data);
^
In file included from upsserver.cc:26:0:
/usr/local/include/uv.h:465:15: error: initializing argument 3 of 'int uv_read_start(uv_stream_t*, uv_alloc_cb, uv_read_cb)' [-fpermissive]
UV_EXTERN int uv_read_start(uv_stream_t*,
^
upsserver.cc: In function 'ups_status_t ups_srv_init(ups_srv_config_t*, ups_srv_t**)':
upsserver.cc:2041:50: error: too few arguments to function 'int uv_ip4_addr(const char*, int, sockaddr_in*)'
bind_addr = uv_ip4_addr("0.0.0.0", config->port);
^
In file included from upsserver.cc:26:0:
/usr/local/include/uv.h:1358:15: note: declared here
UV_EXTERN int uv_ip4_addr(const char* ip, int port, struct sockaddr_in* addr);
^
upsserver.cc:2042:38: error: cannot convert 'sockaddr_in' to 'const sockaddr*' for argument '2' to 'int uv_tcp_bind(uv_tcp_t*, const sockaddr*, unsigned int)'
uv_tcp_bind(&srv->server, bind_addr);
^
upsserver.cc:2058:52: error: invalid conversion from 'void (*)(uv_async_t*, int) {aka void (*)(uv_async_s*, int)}' to 'uv_async_cb {aka void (*)(uv_async_s*)}' [-fpermissive]
uv_async_init(srv->loop, &srv->async, on_async_cb);
^
In file included from upsserver.cc:26:0:
/usr/local/include/uv.h:763:15: error: initializing argument 3 of 'int uv_async_init(uv_loop_t*, uv_async_t*, uv_async_cb)' [-fpermissive]
UV_EXTERN int uv_async_init(uv_loop_t*,
During recovery, the compare callback is required - but obviously it's not yet installed.
Hello Christoph.
I use hamsterdb in some tests and have a question.
The DB file does not shrink after I create a db, insert records, delete the db and close the Env.
If I run a loop of record inserts and deletes, the file grows continuously.
Any comments?
Thanks, Vladimir.
Hi Christoph,
I have been compiling hamsterdb in VS 2010 and I think there may be a possible bug in the wrapper. The issue arises when I try to access the exception object after hamsterdb throws because a key is not found. For example, in the SampleDb1 project I tried to put a breakpoint within the catch block (see below), and it seems to crash the application. Note this particular piece of code runs AFTER erasing all items. If I remove the breakpoint, the application completes successfully; however, the WriteLine within the catch is never executed.
try
{
    byte[] r = db.Find(key);
    if (r == null)
    {
        Console.WriteLine("r is null");
    }
    else
    {
        Console.WriteLine("r is not null");
    }
}
catch (DatabaseException e)
{
    if (e.ErrorCode != HamConst.HAM_KEY_NOT_FOUND)
    {
        Console.WriteLine("db.Find() returned error " + e);
        return;
    }
}
Please let me know if there’s anything I can do to assist or if you require further information.
Reason: a native function returning char * cannot be directly marshalled into a string!
./ham_bench --key=uint64 --recsize=10241024 --distribution=ascending --use-transactions=1
../ham_recover test-ham.db
3journal/journal.cc[137]: Changeset magic is invalid, skipping
3journal/journal.cc[137]: Changeset magic is invalid, skipping
3journal/journal.cc[651]: Changeset magic is invalid, skipping
1os/os_posix.cc[213]: File::pread() failed with short read (No such file or directory)
ham_env_open() returned error -18: System I/O error
Would you like to replace more #defines for constant values with enumerations, to stress their relationships?
Log file switching is basically disabled when creating temporary transactions. The same happens when using recovery (without transactions) - the changesets are always written to the same log.
On Linux 2.6.32.12 on armv5tel, using gcc version 4.2.3, I receive the following error while trying to 'make':
CXX env_local.lo
env_local.cc:102: error: integer constant is too large for 'long' type
env_local.cc:219: error: integer constant is too large for 'long' type
I can resolve this error by changing instances of '0xffffffffffffffff' to '0xffffffffffffffffULL'.
I have looked at a few source files of your current software and noticed that some checks for return codes are missing.
Would you like to add more error handling for return values from functions like the following?
The following loop writes many "insert" entries to the journal .jrn0:
ham_env_create(... HAM_ENABLE_TRANSACTIONS);
while (1) {
    ham_insert(db, NULL, key, record, 0);
}
The problem is that the ham_txn_begin/ham_txn_commit markers are not written to the journal, and therefore .jrn1 is never used.
This could be implemented in Journal::append_insert; if the txn pointer is a TEMPORARY one then the txn counters in the journal will be increased/decreased.
To reproduce: run ham_bench several times
This is caused by the I/O functions in os_win32.cc: they are not atomic. Solution: use a mutex.
see this thread:
https://groups.google.com/forum/#!topic/hamsterdb-user/klHj8j1bh5c
The code below can be pasted into dotnet\unittests\CursorTest.cs and run with either ReproduceBug(true) or ReproduceBug(false). The point at which each variation fails has been annotated within the code. I've tried to provide a minimal example of the error; however, the following things seem to be required to trigger it: 1) use transactions, 2) insert unequal-length arrays for the record values, and 3) use Cursor.Find followed by Cursor.Move to iterate over the contents of the DB.
By the way, the .NET project is missing the file Properties\AssemblyInfo.cs in the GitHub repo.
private void WithTransaction(Action<Transaction> f)
{
    var txn = env.Begin();
    try
    {
        f(txn);
        txn.Commit();
    }
    catch
    {
        txn.Abort();
        throw;
    }
}

private void ReproduceBug(bool bugVariation)
{
    env = new Hamster.Environment();
    env.Create("ntest.db", HamConst.HAM_ENABLE_TRANSACTIONS); // Note: not using transactions works fine
    var prm = new Parameter();
    prm.name = HamConst.HAM_PARAM_KEY_TYPE;
    prm.value = HamConst.HAM_TYPE_UINT64;
    var prms = new Parameter[] { prm };
    db = new Database();
    db = env.CreateDatabase(1, HamConst.HAM_ENABLE_DUPLICATE_KEYS, prms);
    var k1 = new byte[] { 128, 93, 150, 237, 49, 178, 92, 8 };
    var k2 = new byte[] { 0, 250, 234, 1, 199, 250, 128, 8 };
    var k3 = new byte[] { 128, 17, 181, 113, 1, 220, 132, 8 };
    // print keys (note they are in ascending order as UInt64)
    Console.WriteLine("{0}", BitConverter.ToUInt64(k1, 0));
    Console.WriteLine("{0}", BitConverter.ToUInt64(k2, 0));
    Console.WriteLine("{0}", BitConverter.ToUInt64(k3, 0));
    Console.WriteLine();
    var v1 = new byte[46228]; // Note: using equal size value byte arrays works fine!
    var v11 = new byte[446380];
    var v12 = new byte[525933];
    var v21 = new byte[334157];
    var v22 = new byte[120392];
    WithTransaction(txn => db.Insert(txn, k1, v1, Hamster.HamConst.HAM_DUPLICATE));
    WithTransaction(txn => db.Insert(txn, k2, v11, Hamster.HamConst.HAM_DUPLICATE));
    WithTransaction(txn => db.Insert(txn, k2, v12, Hamster.HamConst.HAM_DUPLICATE));
    WithTransaction(txn => db.Insert(txn, k3, v21, Hamster.HamConst.HAM_DUPLICATE));
    WithTransaction(txn => db.Insert(txn, k3, v22, Hamster.HamConst.HAM_DUPLICATE));
    WithTransaction(txn =>
    {
        using (var c = new Cursor(db, txn))
        {
            // Note: calling c.Move(HamConst.HAM_CURSOR_NEXT) instead works fine!
            if (bugVariation)
                c.Find(k1, HamConst.HAM_FIND_GEQ_MATCH);
            else
                c.Find(k1);
            var s1 = c.GetKey();
            Console.WriteLine("{0}", BitConverter.ToUInt64(s1, 0));
            c.Move(HamConst.HAM_CURSOR_NEXT);
            var s2 = c.GetKey();
            Console.WriteLine("{0}", BitConverter.ToUInt64(s2, 0));
            c.Move(HamConst.HAM_CURSOR_NEXT);
            var s3 = c.GetKey();
            Console.WriteLine("{0}", BitConverter.ToUInt64(s3, 0));
            c.Move(HamConst.HAM_CURSOR_NEXT);
            var s4 = c.GetKey();
            Console.WriteLine("{0}", BitConverter.ToUInt64(s4, 0));
            c.Move(HamConst.HAM_CURSOR_NEXT); // fails here when bugVariation == false
            var s5 = c.GetKey();
            Console.WriteLine("{0}", BitConverter.ToUInt64(s5, 0));
            checkEqual(k1, s1);
            checkEqual(k2, s2); // fails here when bugVariation == true
            checkEqual(k2, s3);
            checkEqual(k3, s4);
            checkEqual(k3, s5);
        }
    });
    env.Close();
    return;
}
Are there any plans to create a distributed version of upscaledb, e.g. on hdfs?
mem.cc:61:5: error: no type named 'malloc_trim' in the global namespace
::malloc_trim(os_get_granularity());
This function is missing on some platforms:
http://www.gnu.org/software/gnulib/manual/html_node/malloc_005ftrim.html
Reproduce with
./ham_bench --stop-ops=50000000 --recsize-fixed=37 --keysize-fixed --keysize=21 --distribution=ascending
I'm getting a "Data blob not found" error from ham_find. It makes no sense to me.
Database: http://glim.ru/personal/bugs/FUC.7z
... and HAM_PARAM_KEY_SIZE > 32
and HAM_KEY_TYPE = HAM_TYPE_BINARY or HAM_TYPE_CUSTOM
I have received a test case which reproduces the issue.
As reported by Michael Möllney; I'm able to reproduce this.
When compiling hamsterdb 2.1.11 on Ubuntu 15.10, configure fails with
checking whether the compiler supports GCC C++ ABI name demangling... yes
checking for Boost headers version >= 1.53.0... yes
checking for Boost's header version...
configure: error: invalid value: boost_major_version=
Boost library is installed (1.58).
I'm not sure where this is going wrong, but $boost_cv_lib_version contains an empty string after the test in configure (around line 17338).
Quick 'n dirty workaround is to manually assign "158" to boost_cv_lib_version after the test.
This allows hamsterdb to compile successfully.
/usr/include/boost/version.hpp:
// Boost version.hpp configuration header file ------------------------------//
// (C) Copyright John maddock 1999. Distributed under the Boost
// Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
// See http://www.boost.org/libs/config for documentation
//
// Caution: this is the only Boost header that is guaranteed
// to change with every Boost release. Including this header
// will cause a recompile every time a new Boost version is
// used.
//
// BOOST_VERSION % 100 is the patch level
// BOOST_VERSION / 100 % 1000 is the minor version
// BOOST_VERSION / 100000 is the major version
//
// BOOST_LIB_VERSION must be defined to be the same as BOOST_VERSION
// but as a string in the form "x_y[_z]" where x is the major version
// number, y is the minor version number, and z is the patch level if not 0.
// This is used by <config/auto_link.hpp> to select which library version to link to.