sphinx-haystack

A SmartFile Open Source project. Read more about how SmartFile uses and contributes to Open Source software.

Introduction

This is a backend for haystack that implements support for Sphinx RT Indexes.

Sphinx RT indexes are real-time indexes managed via SphinxQL, a MySQL-compatible API.

Usage

You should install this backend using setup.py or pip.

$ pip install sphinx-haystack

-or-

$ python setup.py install

This backend uses MySQLdb to connect to Sphinx using its SQL emulation.

$ pip install MySQL-python

Database connection pooling from SQLAlchemy is used if it is available. Installing it is highly recommended.

$ pip install SQLAlchemy
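
The connection setup described above can be sketched roughly as follows. This is a minimal illustration and not the backend's actual code: the make_connector helper, the address 127.0.0.1, the port 9306 (use whatever your searchd 'listen' directive specifies for the MySQL protocol) and the pool size are all assumptions.

```python
def make_connector(host="127.0.0.1", port=9306, pool_size=5):
    """Return a zero-argument callable that yields a SphinxQL connection.

    Hypothetical helper: host, port and pool_size are illustrative defaults.
    """

    def connect():
        # Imported lazily so this module loads even without MySQL-python.
        import MySQLdb
        return MySQLdb.connect(host=host, port=port)

    try:
        from sqlalchemy.pool import QueuePool
    except ImportError:
        # SQLAlchemy is optional: fall back to a fresh connection per call.
        return connect

    # QueuePool opens connections on demand by calling connect().
    pool = QueuePool(connect, pool_size=pool_size, max_overflow=0)
    return pool.connect


get_connection = make_connector()
```

Calling get_connection() opens (or checks out) a connection. With pooling, calling close() on the returned object hands the connection back to the pool instead of closing the socket.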

Once installed, you must define a connection in your settings.py file:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'sphinx_haystack.sphinx_backend.SphinxEngine',
        # Name of the Sphinx real-time index.
        'INDEX_NAME': 'notes',
    },
}

You also need to add sphinx_haystack to your INSTALLED_APPS list in settings.py:

INSTALLED_APPS += ('sphinx_haystack', )

Sphinx Configuration

Traditional Sphinx indexes require a data source to be configured. You would then use the indexer command line program to build the index from the data source. Traditional indexes are not supported by this backend.

This backend works with Sphinx real-time indexes. This type of index starts out empty (or as a traditional index "converted" to a real-time index) and is populated using SQL INSERT/UPDATE/REPLACE INTO queries. Sphinx emulates a MySQL server so that you can issue these commands to it directly.
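
As a rough sketch of what populating a real-time index looks like from Python, the hypothetical build_replace helper below assembles a parameterized REPLACE INTO statement. It is not part of sphinx-haystack; the index and column names are taken from the Notes example and the values are made up for illustration.

```python
def build_replace(index, doc_id, values):
    """Build a parameterized REPLACE INTO statement for a real-time index.

    `values` maps column names (full-text fields or attributes) to values.
    The index and column names are inserted into the SQL text directly, so
    they must come from trusted configuration, never from user input.
    """
    names = sorted(values)
    columns = ["id"] + names
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "REPLACE INTO {0} ({1}) VALUES ({2})".format(
        index, ", ".join(columns), placeholders)
    params = [doc_id] + [values[name] for name in names]
    return sql, params


sql, params = build_replace(
    "notes", 1, {"title": "Hello", "text": "First note", "user_id": 7})
# sql    -> "REPLACE INTO notes (id, text, title, user_id) VALUES (%s, %s, %s, %s)"
# params -> [1, "First note", "Hello", 7]
```

The pair can be passed to cursor.execute(sql, params) on a MySQLdb connection to Sphinx. Note that the document ID is supplied explicitly, because Sphinx does not generate one.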

To configure a real-time index, add an index definition such as the following to /etc/sphinx.conf. This example is for the haystack example Notes application.

# realtime index example
#
# you can run INSERT, REPLACE, and DELETE on this index on the fly
# using MySQL protocol (see the 'listen' directive in the searchd section)
index notes
{
        # 'rt' index type must be specified to use RT index
        type                    = rt

        # index files path and file name, without extension
        # mandatory, path must be writable, extensions will be auto-appended
        path                    = /var/data/indexes/notes

        # RAM chunk size limit
        # RT index will keep at most this much data in RAM, then flush to disk
        # optional, default is 32M
        #
        # rt_mem_limit          = 512M

        # full-text field declaration
        # multi-value, mandatory
        rt_field                = title
        rt_field                = text

        # unsigned integer attribute declaration
        # multi-value (an arbitrary number of attributes is allowed), optional
        # declares an unsigned 32-bit attribute
        #rt_attr_uint           = gid

        # RT indexes currently support the following attribute types:
        # uint, bigint, float, timestamp, string
        #
        rt_attr_bigint          = user_id
        # rt_attr_float         = gpa
        rt_attr_timestamp       = pub_date
        # rt_attr_string        = author
}

Implementation Notes

Sphinx differs from other haystack backends in that it allows multiple "full text" columns. You can run full-text searches against any (or all) of these columns, and you can filter further on "attributes", which are columns that are stored but not full-text indexed.
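
To make the distinction concrete, here is a small sketch of a SphinxQL query that restricts the full-text match to chosen columns and filters on an attribute. The build_search helper is hypothetical and not part of sphinx-haystack; it relies on the @field and @(field1,field2) operators of Sphinx's extended query syntax, and it does not escape special characters in the user's query.

```python
def build_search(index, query, fields=None, filters=None):
    """Build a parameterized SphinxQL SELECT (illustrative helper).

    `fields` limits the full-text match to the named columns.
    `filters` maps attribute names to exact values to filter on.
    """
    if not fields:
        match = query  # no field operator: search every full-text column
    elif len(fields) == 1:
        match = "@{0} {1}".format(fields[0], query)
    else:
        match = "@({0}) {1}".format(",".join(fields), query)

    sql = "SELECT id FROM {0} WHERE MATCH(%s)".format(index)
    params = [match]
    for name in sorted(filters or {}):
        sql += " AND {0} = %s".format(name)
        params.append(filters[name])
    return sql, params


sql, params = build_search(
    "notes", "hello", fields=["title", "text"], filters={"user_id": 7})
# sql    -> "SELECT id FROM notes WHERE MATCH(%s) AND user_id = %s"
# params -> ["@(title,text) hello", 7]
```

As with any statement built this way, the index and attribute names go into the SQL text directly and should come from trusted configuration only.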

Additionally, Sphinx requires that you provide a unique ID for each document. Sphinx does NOT generate one for you. The ID must also be reproducible: if you later update a document, you need to reference it using the same ID that it was created with.

Sphinx-haystack handles this with a Document model. By default, each indexed document results in the creation of a Document instance, which lends its pk as the unique document ID that Sphinx requires. This model is also used as the search() method's result_class in place of the default haystack.models.SearchResult class, whose behavior it emulates.

The Document class uses a GenericForeignKey to reference the indexed object. This adds overhead but is the only way to guarantee a unique document ID when indexing multiple models into the same Sphinx index.
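
The idea behind this mapping can be illustrated without Django. The DocumentRegistry class below is a hypothetical in-memory stand-in, not the project's model: it hands out one integer ID per (model, primary key) pair and returns the same ID on every later lookup, which is the property a database-backed Document table provides.

```python
import itertools


class DocumentRegistry(object):
    """In-memory stand-in for a table of Document rows (illustration only)."""

    def __init__(self):
        self._ids = {}
        self._next_id = itertools.count(1)

    def id_for(self, model_label, object_pk):
        """Return the document ID for an object, allocating one if needed."""
        key = (model_label, object_pk)
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]


registry = DocumentRegistry()
note_doc_id = registry.id_for("notes.note", 1)
user_doc_id = registry.id_for("auth.user", 1)
```

Both objects have primary key 1 in their own tables, yet they receive different document IDs, so they can share one Sphinx index without colliding. Asking again for ("notes.note", 1) returns the original ID, which is what makes later updates possible.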
