Giter VIP home page Giter VIP logo

acts_as_xapian_gem's Introduction

acts_as_xapian_gem / acts_as_xapian

Introduction

Xapian is a full text search engine library which has Ruby bindings. acts_as_xapian adds support for it to Rails. It is an alternative to acts_as_solr, acts_as_ferret, Ultrasphinx, acts_as_indexed, acts_as_searchable or acts_as_tsearch.

acts_as_xapian is deployed in production on these websites.

A Quick Note

This gem was created directly from the acts_as_xapian plugin. There were very few changes, the majority of which were to make the gem handle installation better. If you’d like more information about the original plugin go here or if I’ve left something crucial out, send me a message via github.

Installation

Install Xapian with the ruby bindings on your box. For you OSX users, I’d recommend using Homebrew

Then install the gem

sudo gem install acts_as_xapian

Navigate to your project and generate the required files

script/generate acts_as_xapian

Migrate your database

rake db:migrate

Usage

Xapian is an offline indexing search library - only one process can have the Xapian database open for writing at once, and others that try meanwhile are unceremoniously kicked out. For this reason, acts_as_xapian does not support immediate writing to the database when your models change.

Instead, there is a ActsAsXapianJob model which stores which models need updating or deleting in the search index. A rake task ‘xapian:update_index’ then performs the updates since last change. You can run it on a cron job, or similar.

Here’s how to add indexing to your Rails app:

Put acts_as_xapian in your models that need search indexing. e.g.

acts_as_xapian :texts => [:name, :short_name],
                         :values => [[ :created_at, 0, "created_at", :date ]],
                         :terms => [[ :variety, 'V', "variety" ]]

Options must include:

  • :texts, an array of fields for indexing with full text search.

e.g. :texts => [ :title, :body ]
  • :values, things which have a range of values for sorting or collapsing. Specify an array quadruple of [ field, identifier, prefix, type ] where identifier is an arbitary numeric identifier for use in the Xapian database, prefix is the part to use in search queries that goes before the : , and type can be any of :string, :number or :date.

e.g. :values => [[ :created_at, 0, "created_at", :date ], [ :size, 1, "size", :string ]]
  • :terms, things which come with a prefix (before a ‘:’) in search queries. Specify an array triple of [ field, char, prefix ] where char is an arbitary single upper case char used in the Xapian database, just pick any single uppercase character, but use a different one for each prefix. prefix is the part to use in search queries that goes before the : . For example, if you were making Google and indexing to be able to later do a query like “site:www.whatdotheyknow.com”, then the prefix would be “site”.

e.g. :terms => [ [ :variety, 'V', "variety" ] ]

A ‘field’ is a symbol referring to either an attribute or a function which returns the text, date or number to index. Both ‘identifier’ and ‘char’ must be the same for the same prefix in different models.

Options may include:

  • :eager_load, added as an :include clause when looking up search results in database

  • :if, either an attribute or a function which if returns false means the object isn’t indexed

To build the index the first time, call:

rake xapian:rebuild_index

It puts the db in the development/test/production directory in your db directory. See the configuration section below if you want to change this.

Then from a cron job or a daemon, or by hand regularly call:

'rake xapian:update_index'

Querying

Testing indexing

If you just want to test indexing is working, you’ll find this rake task useful:

rake xapian:query q="moo"

You have a few more options here:

  • models - the models to query (ex: models=“User Company”). Omitting searches all xapian models

  • offset - the offset of the results

  • limit - the limiting number of results

  • sort_by_prefix - sort by the prefix specified in value field of the acts_as_xapian call

  • collapse_by_prefix - collapse the results based on best result for it’s prefix

Performing a query

To perform a query from code call ActsAsXapian::Search.new. This takes in turn:

  • model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent]

  • query_string - Google like syntax, see below

And then a hash of options:

  • :offset - Offset of first result (default 0)

  • :limit - Number of results per page

  • :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance

  • :sort_by_ascending - Default true (documents with higher values better/earlier), set to false for descending sort

  • :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group)

Google like query syntax is as described in Xapian::QueryParser Syntax Queries can include prefix:value parts, according to what you indexed in the acts_as_xapian part above. You can also say things like model:InfoRequestEvent to constrain by model in more complex ways than the :model parameter, or modelid:InfoRequestEvent-100 to only find one specific object.

Returns an ActsAsXapian::Search object. Useful methods are:

  • description - a techy one, to check how the query has been parsed

  • matches_estimated - a guesstimate at the total number of hits

  • spelling_correction - the corrected query string if there is a correction, otherwise nil

  • words_to_highlight - list of words for you to highlight, perhaps with TextHelper::highlight

  • results - an array of hashes each structured like:

{:model > YourModel, :weight => 3.92, :percent => 100%, :collapse_count => 0}
  • :model - your Rails model, this is what you most want!

  • :weight - relevancy measure

  • :percent - the weight as a %, 0 meaning the item did not match the query at all

  • :collapse_count - number of results with the same prefix, if you specified collapse_by_prefix

Finding similar models

To find models that are similar to a given set of models call ActsAsXapian::Similar.new. This takes:

  • model_classes - list of model classes to return models from within

  • models - list of models that you want to find related ones to

Returns an ActsAsXapian::Similar object. Has all methods from ActsAsXapian::Search above, except for words_to_highlight. In addition has:

  • important_terms - the terms extracted from the input models, that were used to search for output. You need the results methods to get the similar models.

Configuration

If you want to customise the configuration of acts_as_xapian, it will look for a file called ‘xapian.yml’ under RAILS_ROOT/config. As is familiar from the format of the database.yml file, separate :development, :test and :production sections are expected.

The following options are available:

  • base_db_path - specifies the directory, relative to RAILS_ROOT, in which acts_as_xapian stores its search index databases. Default is the xapiandbs directory within the db directory.

Performance

On development sites, acts_as_xapian automatically logs the time taken to do searches. The time displayed is for the Xapian parts of the query; the Rails database model lookups will be logged separately by ActiveRecord. Example:

Xapian query (0.00029s) Search: hello

To enable this, and other performance logging, on a production site, temporarily add this to the end of your config/environment.rb

ActiveRecord::Base.logger = Logger.new(STDOUT)

Support

Please ask any questions on the acts_as_xapian Google Group

The official home page and repository for acts_as_xapian are the acts_as_xapian github page

For more details about anything, see source code in lib/acts_as_xapian/*.rb

acts_as_xapian_gem's People

Contributors

robinhouston avatar workmad3 avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

Forkers

mysociety

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.