
iresearch-toolkit / iresearch


IResearch is a cross-platform, high-performance search analytics library written entirely in C++, with a focus on the pluggability of different ranking/similarity models

Home Page: https://iresearch-toolkit.github.io/iresearch/

License: Other

CMake 1.41% C++ 97.25% Shell 0.19% Python 1.10% SWIG 0.05%
search-engine relevant-search tf-idf bm25 ranking analytics

iresearch's Introduction

!!! THE PROJECT IS ARCHIVED AND NO LONGER MAINTAINED !!!

IResearch search engine

Version 1.3


Overview

The IResearch library is meant to be treated as a standalone index that is capable of both indexing and storing individual values verbatim. Indexed data is treated on a per-version/per-revision basis, i.e. an existing data version/revision is never modified, and updates/removals are treated as new versions/revisions of that data. This makes multi-threaded read/write operations on the index trivial. The index exposes its data processing functionality via a multi-threaded 'writer' interface that treats each document abstraction as a collection of fields to index and/or store. The index exposes its data retrieval functionality via a 'reader' interface that returns records from an index matching a specified query. The queries themselves are query trees built directly from the query building blocks available in the API. The querying infrastructure can order the result set by one or more ranking/scoring implementations. The ranking/scoring logic is plugin-based and lazily initialized at runtime as needed, which allows custom ranking/scoring logic to be added without even recompiling the IResearch library.
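The plugin-based, lazily initialized ranking described above can be pictured as a registry of named factories whose scorers are only created on first use. This is a minimal C++17 sketch, not the IResearch API: `scorer`, `scorer_registry` and `tfidf_like` are hypothetical names, and a registration performed during static initialization stands in for loading a real plugin (for example from a shared library).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical scorer interface (not the IResearch API).
struct scorer {
  virtual ~scorer() = default;
  virtual double score(double term_freq, double doc_freq) const = 0;
};

// Maps a scorer name to a factory; the factory runs only when the scorer is
// first requested, so unused scorers are never instantiated.
class scorer_registry {
 public:
  using factory = std::function<std::unique_ptr<scorer>()>;

  static scorer_registry& instance() {
    static scorer_registry registry;
    return registry;
  }

  bool add(const std::string& name, factory f) {
    return factories_.emplace(name, std::move(f)).second;
  }

  // Returns nullptr for unknown names; creates and caches the scorer on first use.
  const scorer* get(const std::string& name) {
    if (auto it = instances_.find(name); it != instances_.end()) {
      return it->second.get();
    }
    auto f = factories_.find(name);
    if (f == factories_.end()) return nullptr;
    std::unique_ptr<scorer> created = f->second();
    const scorer* raw = created.get();
    instances_.emplace(name, std::move(created));
    return raw;
  }

 private:
  std::map<std::string, factory> factories_;
  std::map<std::string, std::unique_ptr<scorer>> instances_;
};

// A toy TF-IDF-like scorer, as "plugin" code might provide it.
struct tfidf_like final : scorer {
  static inline int constructed = 0;  // counts instantiations to show laziness
  tfidf_like() { ++constructed; }
  double score(double tf, double df) const override { return tf / (1.0 + df); }
};

// Registration at static-initialization time: only the factory is stored here.
static const bool tfidf_registered = scorer_registry::instance().add(
    "tfidf_like", [] { return std::make_unique<tfidf_like>(); });
```

Callers look scorers up by name, so code that only knows the name "tfidf_like" never needs to be recompiled when a new scorer is registered.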

High level architecture and main concepts

Index

An index consists of multiple independent parts, called segments, plus index metadata. Index metadata stores information about the active index segments for a particular index version/revision. Each index segment is an index itself and consists of the following logical components:

  • segment metadata
  • field metadata
  • term dictionary
  • postings lists
  • list of deleted documents
  • stored values

Read/write access to these components is carried out via plugin-based formats. An index may contain segments created using different formats.
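To make the segment layout concrete, here is a minimal C++17 sketch of how a term lookup could combine several segments: each segment carries its own term dictionary, postings lists and deleted-documents list, and the index metadata names the segments that make up one version. The types `segment` and `index_meta` and the function `find_term` are hypothetical simplifications, not the IResearch format or API; segment metadata, field metadata and stored values are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using doc_id = std::uint32_t;

// Hypothetical, simplified segment: term dictionary -> postings lists,
// plus the list of deleted (masked) documents.
struct segment {
  std::map<std::string, std::vector<doc_id>> postings;  // term -> sorted doc ids
  std::set<doc_id> deleted;                             // masked documents
  std::size_t docs_count = 0;
};

// Hypothetical index metadata for one version: the segments active in it.
struct index_meta {
  std::vector<const segment*> segments;
};

// Returns (segment position, doc id) pairs for a term across all active
// segments, skipping documents listed as deleted in their segment.
std::vector<std::pair<std::size_t, doc_id>> find_term(const index_meta& meta,
                                                      const std::string& term) {
  std::vector<std::pair<std::size_t, doc_id>> hits;
  for (std::size_t i = 0; i < meta.segments.size(); ++i) {
    const segment& seg = *meta.segments[i];
    auto it = seg.postings.find(term);
    if (it == seg.postings.end()) continue;  // term absent from this segment
    for (doc_id d : it->second) {
      if (seg.deleted.count(d) == 0) hits.emplace_back(i, d);
    }
  }
  return hits;
}
```

Because deletions live in a per-segment list, removing a document never rewrites the segment's postings; it only changes which entries are skipped.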

Document

A database record is represented as an abstraction called a document. A document is actually a collection of indexed/stored fields. To be processed, each field must satisfy at least the IndexedField or the StoredField concept.

IndexedField concept

For type T to be IndexedField, the following conditions have to be satisfied for an object m of type T:

  • m.name(): the output type must be convertible to irs::string_ref. The value is used as the key name.
  • m.get_tokens(): the output type must be convertible to irs::token_stream*. The token stream is used to populate the field during the inversion procedure. If the value is nullptr, the field is treated as non-indexed.
  • m.index_features(): the output type must be implicitly convertible to irs::IndexFeatures. A set of features requested for evaluation during indexing, e.g. processing of positions and frequencies. The evaluated information can later be used during querying and scoring.
  • m.features(): the output type must be convertible to const irs::flags&. A set of user-supplied features to be associated with the field, e.g. a request to store field norms. The stored information can later be used during querying and scoring.

StoredField concept

For type T to be StoredField, the following conditions have to be satisfied for an object m of type T:

  • m.name(): the output type must be convertible to irs::string_ref. The value is used as the key name.
  • m.write(irs::data_output& out): the output type must be convertible to bool. Arbitrary data may be written to the stream denoted by out, to be retrieved later via the index_reader API. If nothing has been written but the returned value is true, the stored value is treated as a flag. If the returned value is false, nothing is stored, even if something has been written to the out stream.

Directory

A data storage abstraction that can store data either in memory or on the filesystem, depending on which implementation is instantiated. A directory stores at least all the currently in-use index data versions/revisions. If there are no active users of the directory, then at least the last data version/revision is stored. Unused data versions/revisions may be removed via the directory_cleaner. A single version/revision of the index is composed of one or more segments associated with that version/revision; a segment may be shared between several versions/revisions.
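The retention rule for the directory_cleaner can be pictured as a mark-and-sweep over file names: keep every file referenced by a version that is still in use, plus every file of the latest version, and delete the rest. This C++17 sketch uses a hypothetical in-memory directory (`memory_directory`) and a hypothetical `clean_directory` function; the real directory and directory_cleaner interfaces in IResearch differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory directory: file name -> contents.
struct memory_directory {
  std::map<std::string, std::string> files;
};

// Each version is described by the set of files (metadata and segments) it uses.
// Files of in-use versions and of the last version are kept; all others are removed.
std::size_t clean_directory(memory_directory& dir,
                            const std::vector<std::set<std::string>>& in_use_versions,
                            const std::set<std::string>& last_version) {
  std::set<std::string> keep(last_version);  // the last version always survives
  for (const auto& version : in_use_versions) {
    keep.insert(version.begin(), version.end());
  }

  std::size_t removed = 0;
  for (auto it = dir.files.begin(); it != dir.files.end();) {
    if (keep.count(it->first) != 0) {
      ++it;
    } else {
      it = dir.files.erase(it);
      ++removed;
    }
  }
  return removed;
}
```

A segment shared by two versions (seg_a in the usage below) survives as long as either version still references it.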

Writer

A single-instance-per-directory object that is used for indexing data. Data may be indexed on a per-document basis or sourced from another reader for trivial directory merge functionality. Each commit() of a writer produces a new version/revision of the view of the data in the corresponding directory. Additionally, the interface provides directory defragmentation capabilities that allow compacting multiple smaller segments into larger, more compact representations. A writer supports two-phase transactions via the begin()/commit()/rollback() methods.
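The begin()/commit()/rollback() contract can be illustrated with a toy writer that prepares the next immutable version in begin() and only publishes it in commit(). `toy_writer` is a hypothetical class written for this C++17 sketch: documents are plain strings, segments and flushing are not modeled, and the exact semantics of IResearch's writer (for example, what rollback() does with documents that were not yet prepared) may differ.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer showing two-phase commit over immutable versions.
class toy_writer {
 public:
  using version = std::vector<std::string>;  // documents visible in one version

  toy_writer() : committed_(std::make_shared<version>()) {}

  void insert(std::string doc) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.push_back(std::move(doc));
  }

  // Phase 1: prepare the next version without making it visible.
  bool begin() {
    std::lock_guard<std::mutex> lock(mutex_);
    return begin_locked();
  }

  // Phase 2: publish the prepared version (preparing one first if needed).
  void commit() {
    std::lock_guard<std::mutex> lock(mutex_);
    begin_locked();  // no-op if a version is already prepared
    committed_ = std::move(prepared_);  // moved-from prepared_ becomes empty
  }

  // Abort: drop the prepared version and any pending documents.
  void rollback() {
    std::lock_guard<std::mutex> lock(mutex_);
    prepared_.reset();
    pending_.clear();
  }

  // A published version is never modified, so readers can hold it safely.
  std::shared_ptr<const version> snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return committed_;
  }

 private:
  bool begin_locked() {
    if (prepared_) return false;  // a transaction is already prepared
    auto next = std::make_shared<version>(*committed_);
    next->insert(next->end(), pending_.begin(), pending_.end());
    pending_.clear();
    prepared_ = std::move(next);
    return true;
  }

  mutable std::mutex mutex_;
  std::vector<std::string> pending_;
  std::shared_ptr<version> prepared_;
  std::shared_ptr<const version> committed_;
};
```

In the usage below, a document inserted after begin() is not part of the prepared version and is discarded by rollback().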

Reader

A reusable/refreshable view of an index at a given point in time. Multiple readers can use the same directory and may point to different versions/revisions of data in the said directory.
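A point-in-time reader maps naturally onto a shared pointer to an immutable snapshot: the view stays valid for as long as it is held, and refreshing swaps in the newest snapshot. `snapshot`, `version_store` and `reader` are hypothetical names used only for this C++17 sketch, not the IResearch reader API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical immutable view of the data at one version.
struct snapshot {
  std::uint64_t generation;
  std::vector<int> docs;
};

// Hypothetical publication point that a writer would update on commit.
class version_store {
 public:
  version_store() : latest_(std::make_shared<snapshot>(snapshot{0, {}})) {}

  void publish(std::shared_ptr<const snapshot> s) {
    std::lock_guard<std::mutex> lock(mutex_);
    latest_ = std::move(s);
  }

  std::shared_ptr<const snapshot> latest() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return latest_;
  }

 private:
  mutable std::mutex mutex_;
  std::shared_ptr<const snapshot> latest_;
};

// A reusable view: unaffected by later commits until reopen() is called.
class reader {
 public:
  explicit reader(const version_store& store)
      : store_(store), view_(store.latest()) {}

  const snapshot& view() const { return *view_; }

  // Returns true if the reader moved to a newer version.
  bool reopen() {
    auto latest = store_.latest();
    if (latest == view_) return false;
    view_ = std::move(latest);
    return true;
  }

 private:
  const version_store& store_;
  std::shared_ptr<const snapshot> view_;
};
```

Several readers can share one store while pointing at different versions; an old snapshot is freed once its last reader moves on.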

Build prerequisites

CMake

v3.10 or later

Boost

v1.57.0 or later (headers only)

set environment

BOOST_ROOT=<path-to>/boost_1_57_0

Lz4

install (*nix)

make
make install

or point LZ4_ROOT at the source directory to build together with IResearch

install (win32)

If compiling IResearch with /MT, add add_definitions("/MTd") to the end of cmake_unofficial/CMakeLists.txt, since CMake ignores the command-line argument -DCMAKE_C_FLAGS=/MTd

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=<install-path> -DBUILD_STATIC_LIBS=on -G "Visual Studio 17" -Ax64 ../contrib/cmake_unofficial
cmake --build .
cmake --build . --target install

or point LZ4_ROOT at the source directory to build together with IResearch

set environment

LZ4_ROOT=<install-path>

win32 binaries also available in:

ICU

v53 or higher

install (*nix)

./configure --disable-samples --disable-tests --enable-static --srcdir="$(pwd)" --prefix=<install-path> --exec-prefix=<install-path>
make install

or point ICU_ROOT at the source directory to build together with IResearch, or install it via the distribution's package manager: libicu

install (win32)

look for link: "ICU4C Binaries"

set environment

ICU_ROOT=<path-to-icu>

Snowball

install (*nix)

The custom CMakeLists.txt is intended to be used with Snowball v2.0.0 and later versions. It has been tested to work at least with commit 53739a805cfa6c77ff8496dc711dc1c106d987c1.

git clone https://github.com/snowballstem/snowball.git
cd snowball
mkdir build && cd build
cmake -DENABLE_STATIC=OFF -DNO_SHARED=OFF -G "Unix Makefiles" ..
cmake --build .
cmake -DENABLE_STATIC=OFF -DNO_SHARED=ON -G "Unix Makefiles" ..
cmake --build .

or point SNOWBALL_ROOT at the source directory to build together with IResearch, or install it via the distribution's package manager: libstemmer

install (win32)

The custom CMakeLists.txt was based on revision 5137019d68befd633ce8b1cd48065f41e77ed43e; later versions may be used at your own risk of compilation failure.

git clone https://github.com/snowballstem/snowball.git
cd snowball
git reset --hard adc028f3ae646623bda2f99191fe9dc3287a909b
mkdir build && cd build
set PATH=%PATH%;<path-to>/build/Debug
cmake -DENABLE_STATIC=OFF -DNO_SHARED=OFF -G "Visual Studio 12" -Ax64 ..
cmake --build .
cmake -DENABLE_STATIC=OFF -DNO_SHARED=ON -G "Visual Studio 12" -Ax64 ..
cmake --build .

or point SNOWBALL_ROOT at the source directory to build together with IResearch

For static builds:

  1. in MSVC open: build/snowball.sln
  2. set: stemmer -> Properties -> Configuration Properties -> C/C++ -> Code Generation -> Runtime Library = /MTd
  3. BUILD -> Build Solution

set environment

SNOWBALL_ROOT=<path-to-snowball>

VelocyPack

point VPACK_ROOT at the source directory to build together with IResearch

GoogleTest

install (*nix)

mkdir build && cd build
cmake ..
make

or point GTEST_ROOT at the source directory to build together with IResearch

install (win32)

mkdir build && cd build
cmake -G "Visual Studio 12" -Ax64 -Dgtest_force_shared_crt=ON -DCMAKE_DEBUG_POSTFIX="" ..
cmake --build .
mv Debug ../lib

or point GTEST_ROOT at the source directory to build together with IResearch

set environment

GTEST_ROOT=<path-to-gtest>

Stopword list (for use with analysis::text_analyzer)

download any number of lists of stopwords, e.g. from: https://github.com/snowballstem/snowball-website/tree/master/algorithms/*/stop.txt https://code.google.com/p/stop-words/

install

  1. mkdir <path-to-stopword-lists>
  2. for each language (e.g. "c", "en", "es", "ru"), create a corresponding subdirectory (directory names have 2 letters, except for the default locale "c", which has 1 letter)
  3. place the files with stopwords (UTF-8 encoded, one word per line, any text after the first whitespace is ignored) in the directory corresponding to their language (multiple files per language are supported and will be interpreted as a single list)

set environment

IRESEARCH_TEXT_STOPWORD_PATH=<path-to-stopword-lists>

If the variable IRESEARCH_TEXT_STOPWORD_PATH is left unset, then the locale-specific stopword-list subdirectories are assumed to be located in the current working directory.
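The lookup rules above (root taken from IRESEARCH_TEXT_STOPWORD_PATH or the current working directory, one subdirectory per locale, all files merged, text after the first whitespace ignored) can be sketched in C++17 with std::filesystem. `stopword_root` and `load_stopwords` are hypothetical helpers, not functions exported by IResearch, and this sketch does not validate UTF-8.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Root of the stopword lists: the environment variable if set, else the CWD.
fs::path stopword_root() {
  const char* env = std::getenv("IRESEARCH_TEXT_STOPWORD_PATH");
  return env != nullptr ? fs::path(env) : fs::current_path();
}

// Reads every regular file in <root>/<locale> and merges them into one list.
std::set<std::string> load_stopwords(const fs::path& root, const std::string& locale) {
  std::set<std::string> words;
  const fs::path dir = root / locale;
  if (!fs::is_directory(dir)) return words;  // no list for this locale

  for (const auto& entry : fs::directory_iterator(dir)) {
    if (!entry.is_regular_file()) continue;
    std::ifstream in(entry.path());
    std::string line;
    while (std::getline(in, line)) {
      // Keep only the text before the first whitespace character.
      std::string word = line.substr(0, line.find_first_of(" \t\r\n\v\f"));
      if (!word.empty()) words.insert(word);
    }
  }
  return words;
}
```

Usage: `load_stopwords(stopword_root(), "en")` returns the merged English list, or an empty set if no "en" subdirectory exists.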

Build

git clone <IResearch code repository>/iresearch.git iresearch
cd iresearch
mkdir build && cd build

generate build file <*nix>:

cmake -DCMAKE_BUILD_TYPE=[Debug|Release|Coverage] -G "Unix Makefiles" ..
  1. if some libraries are not found by the build, set the needed environment variables (e.g. BOOST_ROOT, BOOST_LIBRARYDIR, LZ4_ROOT, OPENFST_ROOT, GTEST_ROOT)
  2. if ICU or Snowball from the distribution paths are not found, the following additional environment variables might be required: ICU_ROOT_SUFFIX=x86_64-linux-gnu SNOWBALL_ROOT_SUFFIX=x86_64-linux-gnu

generate build file (win32):

cmake -G "Visual Studio 12" -Ax64 ..

If some libraries are not found by the build, set the needed environment variables (e.g. BOOST_ROOT, BOOST_LIBRARYDIR, LZ4_ROOT, OPENFST_ROOT, GTEST_ROOT)

set Build Identifier for this build (optional)

echo "<build_identifier>" > BUILD_IDENTIFIER

build library:

cmake --build .

test library:

cmake --build . --target iresearch-check

install library:

cmake --build . --target install

code coverage:

cmake --build . --target iresearch-coverage

Pyresearch

There is a Python wrapper for IResearch. The wrapper gives access to the directory reader object. For a usage example, see /python/scripts

Build

To build Pyresearch, the SWIG generator must be available. Add -DUSE_PYRESEARCH=ON to the cmake command line to generate the Pyresearch targets.

Install

Run the target pyresearch-install

win32 install notes:

Some versions of the ICU installer appear to fail to make all ICU DLLs available through the PATH environment variable; manual adjustment may be needed.

(*nix) install notes:

The shared version of libiresearch is used. Install IResearch before running Pyresearch.

External 3rd party dependencies

External 3rd party dependencies must be made available to the IResearch library separately. They may either be installed through the distribution's package management system or built from source, with the appropriate environment variables set accordingly.

Boost

v1.57.0 or later (locale, system, thread); used for functionality not available in the STL (excluding functionality available in ICU)

Lz4

used for compression/decompression of byte/string data

ICU

used by analyzers for parsing, transforming and tokenising string data

Snowball

used by analyzers for computing word stems (i.e. roots) for more flexible matching; words from languages not supported by 'snowball' are matched verbatim

GoogleTest

used for writing tests for the IResearch library

VelocyPack

used for JSON serialization/deserialization

Stopword list

used by analysis::text_analyzer for filtering out noise words that should not impact text ranking, e.g. for 'en' these are usually 'a', 'the', etc. Download any number of stopword lists, e.g. from https://github.com/snowballstem/snowball-website/tree/master/algorithms/*/stop.txt or https://code.google.com/p/stop-words/, or create a custom language-specific list of stopwords. Place the files with stopwords (UTF-8 encoded, one word per line, any text after the first whitespace is ignored) in the directory corresponding to their language (multiple files per language are supported and will be interpreted as a single list).

Query filter building blocks

Filter Description
irs::by_edit_distance for filtering of values based on Levenshtein distance
irs::by_granular_range for faster filtering of numeric values within a given range, with the possibility of specifying open/closed ranges
irs::by_ngram_similarity for filtering of values based on NGram model
irs::by_phrase for word-position-sensitive filtering of values, with the possibility of skipping selected positions
irs::by_prefix for filtering of exact value prefixes
irs::by_range for filtering of values within a given range, with the possibility of specifying open/closed ranges
irs::by_same_position for term-insertion-order sensitive filtering of exact values
irs::by_term for filtering of exact values
irs::by_terms for filtering of exact values by a set of specified terms
irs::by_wildcard for filtering of values based on a matching pattern
irs::ByNestedFilter for filtering of documents based on a matching pattern on their sub-documents
irs::And boolean conjunction of multiple filters, influencing document ranks/scores as appropriate
irs::Or boolean disjunction of multiple filters, influencing document ranks/scores as appropriate (including "minimum match" functionality)
irs::Not boolean negation of multiple filters
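The boolean building blocks can be illustrated over sorted lists of document ids. This C++17 sketch shows only which documents match; it does not compute scores, and `intersect_all`, `union_min_match` and `complement` are hypothetical helpers rather than the irs::And, irs::Or and irs::Not implementations. The `min_match` parameter mirrors the "minimum match" behaviour described for irs::Or.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

using doc_id = std::uint32_t;
using doc_list = std::vector<doc_id>;  // sorted, unique document ids

// Conjunction (cf. irs::And): documents present in every input list.
doc_list intersect_all(const std::vector<doc_list>& lists) {
  if (lists.empty()) return {};
  doc_list result = lists.front();
  for (std::size_t i = 1; i < lists.size(); ++i) {
    doc_list next;
    std::set_intersection(result.begin(), result.end(), lists[i].begin(),
                          lists[i].end(), std::back_inserter(next));
    result.swap(next);
  }
  return result;
}

// Disjunction with "minimum match" (cf. irs::Or): documents present in at
// least min_match of the input lists.
doc_list union_min_match(const std::vector<doc_list>& lists, std::size_t min_match = 1) {
  std::map<doc_id, std::size_t> hits;  // ordered, so the result stays sorted
  for (const auto& list : lists) {
    for (doc_id d : list) ++hits[d];
  }
  doc_list result;
  for (const auto& [d, count] : hits) {
    if (count >= min_match) result.push_back(d);
  }
  return result;
}

// Negation (cf. irs::Not): documents in [0, docs_count) not in 'matched'.
doc_list complement(const doc_list& matched, doc_id docs_count) {
  doc_list result;
  for (doc_id d = 0; d < docs_count; ++d) {
    if (!std::binary_search(matched.begin(), matched.end(), d)) result.push_back(d);
  }
  return result;
}
```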

Supported compilers

  • GCC: 10+
  • MSVC: 2019+
  • Clang: 12+

License

Copyright (c) 2017-2023 ArangoDB GmbH

Copyright (c) 2016-2017 EMC Corporation

This software is provided under the Apache 2.0 Software license provided in the LICENSE.md file. Licensing information for third-party products used by IResearch search engine can be found in THIRD_PARTY_README.md

iresearch's People

Contributors

alexbakharew, diameter, dothebart, dronplane, elfringham, geenen124, gnusi, goedderz, iurii-i-popov, jsteemann, kvs85, maierlars, markuspf, mbkkt, mpoeter, nabatv, obiwahn, vasiliy-arangodb


iresearch's Issues

IRES-325: Use size_t where it possible

Jira issue originally created by user @gnusi:

Use size_t where possible

Remove ugly static casts where possible, e.g.:

  • docid and docmax
  • write_vint and size_t

Search for:
uint32_t(
static_cast<uint32_t>(

IRES-230: Jasmine Tests shell-foxx-manager-install-spec.js, shell-query-timecritical-spec.js, shell-foxx-repository-spec.js, shell-foxx-query-spec.js, shell-foxx-model-events-spec.js failed

Jira issue originally created by user belyaa2:

Jasmine test failed with multiple errors:

Main errors:

  • Manifest 'js/apps/_db/_system/unittest/broken/APP/manifest.json' does not provide required attribute 'name'
  • Manifest file 'js/apps/_db/_system/unittest/broken/APP/manifest.json' is invald: The App name can only contain a to z, A to Z, 0-9, '-' and ''
  • Manifest file 'js/apps/_db/_system/unittest/broken/APP/manifest.json' is invald: The version requires the format: .., all have to be integer numbers.
  • JavaScript exception in file 'js/apps/_db/_system/unittest/broken/APP/broken-controller.js' at 3,8: SyntaxError: Unexpected identifier
  • Cannot compute Foxx application routes: [ArangoError 3006: File: broken-controller.js syntax error in script SyntaxError: Unexpected identifier
  • Cannot compute Foxx application routes: [ArangoError 3007: Route has to start with /
  • Cannot compute Foxx application routes: [ArangoError 14: file not found: js/apps/_db/_system/unittest/broken/APP/illegal/file/name/]
  • Cannot compute Foxx application routes: [ArangoError 3005: failed to execute script File: broken-controller.js Error: Error: This is an error from the controller.]
  • Setup not possible for mount '/unittest/broken': Error: This is an error from the setup.
  • JavaScript exception in file 'js/apps/_db/_system/unittest/broken/APP/broken-exports.js' at 3,8: SyntaxError: Unexpected identifier
  • JavaScript exception in file 'js/apps/_db/_system/unittest/broken/APP/broken-setup.js' at 3,8: SyntaxError: Unexpected identifier
  • Setup not possible for mount '/unittest/broken': SyntaxError: Unexpected identifier
  • Setup not possible for mount '/unittest/broken'
  • Cannot compute Foxx application routes: [ArangoError 14: file not found: js/apps/_db/_system/unittest/broken/APP/does-not-exist.js]
  • Setup not possible for mount '/unittest/broken'
  • JavaScript exception in file './js/server/modules/org/arangodb/arango-statement.js' at 86,45: [ArangoError 1500: query killed (while executing)

Errors have logs like the following:
{quote}
Running Jasmine Tests: ./js/common/tests/shell-foxx-manager-install-spec.js, ./js/common/tests/shell-query-timecritical-spec.js, ./js/server/tests/shell-foxx-repository-spec.js, ./js/server/tests/shell-foxx-query-spec.js, ./js/server/tests/shell-foxx-model-events-spec.js

..2015-12-24T12:11:31Z [225] ERROR Manifest 'js/apps/_db/_system/unittest/broken/APP/manifest.json' does not provide required attribute 'name'
2015-12-24T12:11:31Z [225] ERROR Manifest 'js/apps/_db/_system/unittest/broken/APP/manifest.json' does not provide required attribute 'version'
2015-12-24T12:11:31Z [225] ERROR Manifest file 'js/apps/_db/_system/unittest/broken/APP/manifest.json' is invald: missing manifest attribute
2015-12-24T12:11:31Z [225] ERROR Error: 
2015-12-24T12:11:31Z [225] ERROR at checkManifest (./js/server/modules/org/arangodb/foxx/manager.js:259:13)
2015-12-24T12:11:31Z [225] ERROR at validateManifestFile (./js/server/modules/org/arangodb/foxx/manager.js:301:7)
2015-12-24T12:11:31Z [225] ERROR at appConfig (./js/server/modules/org/arangodb/foxx/manager.js:433:17)
2015-12-24T12:11:31Z [225] ERROR at createApp (./js/server/modules/org/arangodb/foxx/manager.js:447:18)
2015-12-24T12:11:31Z [225] ERROR at _scanFoxx (./js/server/modules/org/arangodb/foxx/manager.js:726:15)
2015-12-24T12:11:31Z [225] ERROR at db._executeTransaction.action (./js/server/modules/org/arangodb/foxx/manager.js:871:17)
2015-12-24T12:11:31Z [225] ERROR at [object ArangoDatabase].ArangoDatabase._executeTransaction (./js/server/modules/org/arangodb/arango-database.js:142:10)
2015-12-24T12:11:31Z [225] ERROR at _install (./js/server/modules/org/arangodb/foxx/manager.js:866:10)
2015-12-24T12:11:31Z [225] ERROR at Object.install (./js/server/modules/org/arangodb/foxx/manager.js:921:15)
2015-12-24T12:11:31Z [225] ERROR at Object. (./js/common/tests/shell-foxx-manager-install-spec.js:93:21)
2015-12-24T12:11:31Z [225] ERROR at attemptSync (./js/common/modules/jasmine/core.js:1510:12)
2015-12-24T12:11:31Z [225] ERROR at QueueRunner.run (./js/common/modules/jasmine/core.js:1498:9)
2015-12-24T12:11:31Z [225] ERROR at QueueRunner.execute (./js/common/modules/jasmine/core.js:1485:10)
2015-12-24T12:11:31Z [225] ERROR at Spec.Env.queueRunnerFactory (./js/common/modules/jasmine/core.js:518:35)
2015-12-24T12:11:31Z [225] ERROR at Spec.execute (./js/common/modules/jasmine/core.js:306:10)
2015-12-24T12:11:31Z [225] ERROR at Object. (./js/common/modules/jasmine/core.js:1708:37)
2015-12-24T12:11:31Z [225] ERROR at attemptAsync (./js/common/modules/jasmine/core.js:1520:12)
2015-12-24T12:11:31Z [225] ERROR at QueueRunner.run (./js/common/modules/jasmine/core.js:1496:16)
2015-12-24T12:11:31Z [225] ERROR at next (./js/common/modules/jasmine/core.js:1517:37)
2015-12-24T12:11:31Z [225] ERROR at complete (./js/common/modules/jasmine/core.js:333:9)
2015-12-24T12:11:31Z [225] ERROR at QueueRunner.clearStack (./js/common/modules/jasmine/core.js:506:9)
2015-12-24T12:11:31Z [225] ERROR at QueueRunner.run (./js/common/modules/jasmine/core.js:1505:12)
2015-12-24T12:11:31Z [225] ERROR at QueueRunner.execute (./js/common/modules/jasmine/core.js:1485:10)
2015-12-24T12:11:31Z [225] ERROR at Spec.Env.queueRunnerFactory (./js/common/modules/jasmine/core.js:518:35)
2015-12-24T12:11:31Z [225] ERROR at Spec.execute (./js/common/modules/jasmine/core.js:306:10)
2015-12-24T12:11:31Z [225] ERROR at Object. (./js/common/modules/jasmine/core.js:1708:37)
...
{quote}

The build was erroneously marked as successful (I suspect because of test crashes).

See ArangoDBUnitTestsShellServer 433 log for details.

IRES-349: Compact FST

Jira issue originally created by user @gnusi:

To reduce the memory footprint of the term dictionary index, we can use a compact representation of the FST.

IRES-133: Near realtime search

Jira issue originally created by user @gnusi:

At the moment, data becomes searchable only after a commit operation. It should also be possible to query pre-inverted in-memory data.

IRES-364: README.md issues

Jira issue originally created by user belyaa2:

README.md contains multiple issues:

specify gcc version

add the GCC dependency to the Build prerequisites section (point out that something like sudo yum groupinstall 'Development Tools' should be performed, because a plain gcc installation is missing g++, see http://unix.stackexchange.com/questions/140350/linux-g-command-not-found for details)

the target iresearch-check in the command cmake --build . --target iresearch-check does not work; please fix the corresponding issue or change the command to the one with the cobertura suffix, which works fine

remove bash command from examples because it does not work

add sudo command to all commands where installation of built packages is performed (lz4 and others)

change the link for the LZ4 paragraph name to https://github.com/lz4/lz4 because the current one is not correct

add more details about the lz4 installation, because the bare make commands do not reflect the real installation procedure

add git client installation as a prerequisite, because we need to clone the git repo

lz4 requires m4, so maybe such a step should be added (but m4 installation is part of the lz4 installation step)

path GTEST_ROOT=/var/lib/jenkins/tools/gtest-1.7.0/ should be changed like others (e.g. <path to ...> should be used instead of hard coded one)

git clone <iResearch code repository>/iresearch.git iresearch should be changed to actual github path

add a description of how to use <build_identifier> (whether it is mandatory and what should be passed)

LZ4_ROOT=/usr/local should be used because of install step

mkdir $GTEST_ROOT/lib and cp <path>/googletest-master/googletest/include/gtest/*.h <path>/googletest-master/googletest/include/ should be added to the current gtest installation procedure (for example: mkdir googletest/googletest/lib && cp googletest/googletest/build/libgtest* googletest/googletest/lib/ && cp googletest/googletest/include/gtest/*.h googletest/googletest/include/)

installation for win platform should be described as separate part of readme

the full build of the coverage version should be described separately (not as part of the main build procedure)

steps of build procedure should be numbered because current text is hard to read

description of tested environment should be added (OS at least)

change IResearch to IReSearch in name "EMC IResearch search engine"

specify which command, cmake -DENABLE_STATIC=OFF -DNO_SHARED=OFF -g "Unix Makefiles" .. or cmake -DENABLE_STATIC=OFF -DNO_SHARED=ON -g "Unix Makefiles" .., should be run for the snowball installation from sources (or describe the shared and static builds in separate sections)

specify the command/package to install libstemmer on CentOS

add some detailed info about gtest installation (specify package to download, extract and show what should be build, e.g. gtest or gmock)

perhaps, we need to build icu dependencies from sources, download package http://site.icu-project.org/download/58#TOC-ICU4C-Download and read docs from the package

describe the libicu installation process step by step and point out which files are needed (for CentOS there are no statically built files in the packages, so ICU has to be configured with the option --enable-static to build the necessary files)

Platform tested so far: Ubuntu 14.04.3, gcc 4.8.4

CentOS platform testing is in progress.

IRES-208: format::get_field_writer(...) takes a parameter

Jira issue originally created by user nabatv:

format::get_field_writer(...) takes a parameter

Find a way to remove the parameter requirement from the call.
The parameter is used when merging segments to advise that attribute values might change on calls to doc_itr->next(),
e.g. position attributes (offset/payload) must be refreshed after doc_itr->next()

IRES-250: make iresearch-coverage fails

Jira issue originally created by user @gnusi:

Execute command:
make iresearch-coverage

The following output is produced (tail only):
....
Writing data to coverage.info.cleaned
Summary coverage rate:
lines......: 65.1% (10285 of 15802 lines)
functions..: 80.7% (2632 of 3261 functions)
branches...: no data found
Reading data file coverage.info.cleaned
Found 142 entries.
Found common filename prefix "/home/sk/git/iresearch"
Writing .css and .png files.
Generating output.
Processing file build/core/CMakeFiles/iresearch.dir/iql/position.hh
genhtml: ERROR: cannot read /home/sk/git/iresearch/build/core/CMakeFiles/iresearch.dir/iql/position.hh
make[3]: *** [CMakeFiles/iresearch-coverage] Error 2
make[2]: *** [CMakeFiles/iresearch-coverage.dir/all] Error 2
make[1]: *** [CMakeFiles/iresearch-coverage.dir/rule] Error 2
make: *** [iresearch-coverage] Error 2

IRES-120: Setup Lz4 version checking into build procedure

Jira issue originally created by user belyaa2:

While setting up a Docker-enabled build environment, I installed the Lz4 library via the package manager for Ubuntu 14.04. The build fails because of an obsolete Lz4 version, but there are no warnings about the Lz4 version being incompatible with the IReSearch lib. Please add version checking to the configuration/build procedure.

The following version is OK (see lz4.h for r131):

/****************************************
*  Version
****************************************/
#define LZ4_VERSION_MAJOR    1    /** for breaking interface changes  **/
#define LZ4_VERSION_MINOR    7    /** for new (non-breaking) interface capabilities **/
#define LZ4_VERSION_RELEASE  1    /** for tweaks, bug-fixes, or development **/
#define LZ4_VERSION_NUMBER (LZ4_VERSION_MAJOR *100*100 + LZ4_VERSION_MINOR *100 + LZ4_VERSION_RELEASE)
int LZ4_versionNumber (void);
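The requested version check comes down to the arithmetic in LZ4_VERSION_NUMBER: 1.7.1 encodes as 1 * 100 * 100 + 7 * 100 + 1 = 10701. A minimal C++ sketch of that encoding and of a minimum-version comparison follows; `lz4_version_number` and `kMinLz4Version` are hypothetical names, and treating 1.7.1 as the minimum only reuses the version reported as working in this issue.

```cpp
#include <cassert>

// Same encoding as LZ4_VERSION_NUMBER in lz4.h:
// MAJOR * 100 * 100 + MINOR * 100 + RELEASE
constexpr int lz4_version_number(int major, int minor, int release) {
  return major * 100 * 100 + minor * 100 + release;
}

// Hypothetical minimum: the version reported as working above (1.7.1).
constexpr int kMinLz4Version = lz4_version_number(1, 7, 1);
static_assert(kMinLz4Version == 10701, "1.7.1 must encode as 10701");

// With <lz4.h> included, a build could reject older headers at compile time:
//   static_assert(LZ4_VERSION_NUMBER >= kMinLz4Version, "LZ4 1.7.1 or later is required");
// and compare LZ4_versionNumber() with the same constant at run time to catch
// an older shared library than the headers used for compilation.
```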

IRES-338: memory_index_test.profile_bulk_index_multithread_batched failed on mismatched indexed_docs_count

Jira issue originally created by user belyaa2:

{quote}
[ RUN ] memory_index_test.profile_bulk_index_multithread_batched
Path to timing log: /home/jenkins/workspace/IReSearchMemLeakChecks-static_fast/88/build/bin/iresearch-tests-s_2016_09_09_06_59_08_OaN2QV/memory_index_test/profile_bulk_index_multithread_batched/profile_bulk_index.log
/home/jenkins/workspace/IReSearchMemLeakChecks-static_fast/88/tests/index/index_tests.cpp:689: Failure
Value of: indexed_docs_count
Actual: 99998
Expected: parsed_docs_count
Which is: 100000
[ FAILED ] memory_index_test.profile_bulk_index_multithread_batched (2451501 ms)
{quote}

See IReSearchMemLeakChecks-static_fast 88 for full details.

IRES-295: index_writer::flush_all() may cause inconsistence of index_meta

Jira issue originally created by user @gnusi:

      SCOPED_LOCK(lock_);

      for (auto metaItr = meta.segments.begin(); metaItr != meta.segments.end();) {
        auto& seg_meta = metaItr->meta;
        document_mask docs_mask;

        read_document_mask(docs_mask, *dir_, seg_meta);

        // write docs_mask if masks added, if all docs are masked then remove segment altogether
        if (add_document_mask_modified_records(docs_mask, seg_meta)) {
          meta.gen_dirty = true;

          if (docs_mask.size() == seg_meta.docs_count) { // remove empty segments
            metaItr = meta.segments.erase(metaItr);
            continue;
          }

          write_document_mask(*dir_, seg_meta, docs_mask);

//!!!!!!!!!! in case of exception after this line, meta may become inconsistent !!!!!!!!!!

          metaItr->filename = std::move(write_segment_meta(*dir_, seg_meta)); // write with new mask
        }

        ++metaItr;
      }
    }

    // 'flushed' and 'writers' are filled in parallel above, differing only in scope
    assert(flushed.size() == segment_ctx.size());
    auto metaItr = flushed.begin();

    // write docs_mask if !empty(), if all docs are masked then remove segment altogether
    for (auto ctxItr = segment_ctx.begin(); ctxItr != segment_ctx.end(); ++ctxItr) {
      auto& seg_meta = metaItr->meta;
      auto& seg_ctx = *ctxItr;

      // if have a writer with potential update-replacement records then check if they were seen
      if (seg_ctx.writer) {
        add_document_mask_unused_updates(
          seg_ctx.docs_mask, seg_meta, seg_ctx.writer->docs_context()
        );
      }

      if (seg_ctx.docs_mask.size() == seg_meta.docs_count) { // remove empty segments
        metaItr = flushed.segments.erase(metaItr);
      } else {
        if (!seg_ctx.docs_mask.empty()) { // write non-empty document mask
          write_document_mask(*dir_, seg_meta, seg_ctx.docs_mask);
          metaItr->filename = std::move(write_segment_meta(*dir_, seg_meta)); // write with new mask
        }

        ++metaItr;
      }
    }
  }
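The comment in the excerpt points at a partial-update hazard: if writing segment metadata throws midway through the loop, some entries have already been modified. One general technique for a commit-or-rollback guarantee is to build the updated segment list separately and publish it with a non-throwing swap only after every throwing step has succeeded. The C++17 sketch below shows that technique on hypothetical `segment_entry`/`index_meta` types, with a caller-supplied `write_meta` callback standing in for write_segment_meta(); it is not the change that was made in IResearch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified segment and index metadata.
struct segment_entry {
  std::string filename;
  std::size_t docs_count = 0;
  std::size_t masked = 0;  // number of deleted documents
};

struct index_meta {
  std::vector<segment_entry> segments;
};

// Applies per-segment mask increments with a strong exception guarantee:
// 'meta' is only modified by the final swap, which cannot throw.
template <typename WriteMeta>
void apply_masks(index_meta& meta, const std::vector<std::size_t>& new_masks,
                 WriteMeta write_meta) {
  assert(new_masks.size() == meta.segments.size());
  std::vector<segment_entry> updated;
  updated.reserve(meta.segments.size());

  for (std::size_t i = 0; i < meta.segments.size(); ++i) {
    segment_entry seg = meta.segments[i];  // copy; the original stays intact
    seg.masked += new_masks[i];
    if (seg.masked >= seg.docs_count) continue;  // fully masked: drop segment
    if (new_masks[i] != 0) {
      seg.filename = write_meta(seg);  // may throw; 'meta' is still untouched
    }
    updated.push_back(std::move(seg));
  }

  meta.segments.swap(updated);  // publish all changes at once
}
```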
