Giter VIP home page Giter VIP logo

stn / search-benchmark-game Goto Github PK

View Code? Open in Web Editor NEW

This project forked from quickwit-oss/search-benchmark-game

0.0 0.0 0.0 8.91 MB

Search engine benchmark (Tantivy, Lucene, PISA, ...)

Home Page: https://tantivy-search.github.io/bench/

Shell 0.52% JavaScript 12.13% C++ 8.61% Python 5.17% Java 12.44% Go 7.49% Rust 40.38% Makefile 8.12% HTML 2.89% CMake 1.14% Dockerfile 0.48% SCSS 0.62%

search-benchmark-game's Introduction

This repository is forked from quickwit-oss/search-benchmark-game to test pysearchlite.

Welcome to Search Benchmark, the Game!

This repository is standardized benchmark for comparing the speed of various aspects of search engine technologies.

The results are available here.

This benchmark is both

  • for users to make it easy for users to compare different libraries
  • for library developers to identify optimization opportunities by comparing their implementation to other implementations.

Currently, the benchmark only includes Lucene and tantivy. It is reasonably simple to add another engine.

You are free to communicate about the results of this benchmark in a reasonable manner. For instance, twisting this benchmark in marketing material to claim that your search engine is 31x faster than Lucene, because your product was 31x on one of the test is not tolerated. If this happens, the benchmark will publicly host a wall of shame. Bullshit claims about performance are a plague in the database world.

The benchmark

Different search engine implementation are benched over different real-life tests. The corpus used is the English wikipedia. Stemming is disabled. Queries have been derived from the AOL query dataset (but do not contain any personal information).

Out of a random sample of query, we filtered queries that had at least two terms and yield at least 1 hit when searches as a phrase query.

For each of these query, we then run them as :

  • intersection
  • unions
  • phrase queries

with the following collection options :

  • COUNT only count documents, no need to score them
  • TOP 10 : Identify the 10 documents with the best BM25 score.
  • TOP 10 + COUNT: Identify the 10 documents with the best BM25 score, and count the matching documents.

We also reintroduced artificially a couple of term queries with different term frequencies.

All tests are run once in order to make sure that

  • all of the data is loaded and in page cache
  • Java's JIT already kicked in.

Test are run in a single thread. Out of 5 runs, we only retain the best score, so Garbage Collection likely does not matter.

Engine specific detail

Lucene

  • Query cache is disabled.
  • GC should not influence the results as we pick the best out of 5 runs.
  • JVM used was openjdk 10.0.1 2018-04-17

Tantivy

  • Tantivy returns slightly more results because its tokenizer handles apostrophes differently.
  • Tantivy and Lucene both use BM25 and should return almost identical scores.

Reproducing

These instructions will get you a copy of the project up and running on your local machine.

Prerequisites

The lucene benchmarks requires java and Gradle. This can be installed from the Gradle website. The tantivy benchmarks and benchmark driver code requires Cargo. This can be installed using rustup.

Installing

Clone this repo.

git clone [email protected]:stn/search-benchmark-game.git

Running

Checkout the Makefile for all available commands. You can adjust the ENGINES parameter for a different set of engines.

Run make corpus to download and unzip the corpus used in the benchmark.

make corpus

Run make index to create the indices for the engines.

make index

Run make bench to build the different project and run the benches. This command may take more than 30mn.

make bench

The results are outputted in a results.json file.

You can then check your results out by running:

make serve

And open the following in your browser: http://localhost:8000/

Adding another search engine

See CONTRIBUTE.md.

search-benchmark-game's People

Contributors

fulmicoton avatar jason-wolfe avatar stn avatar amallia avatar mosuka avatar lengyijun avatar pseitz avatar petr-tik avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.