Notes about this fork

This project is a fork of https://github.com/lucidworks/query-autofiltering-component, but includes the following changes to the 5.x code base:

  • Resolved issues with Ivy dependencies.
  • Upgraded component to work with Solr 5.3.1 and above.
  • Added support for using a field whitelist.

The whitelist field definition feature was implemented to close a gap in the handling of dynamic fields. Querying the Luke admin handler would have been another implementation option, but a generic whitelist seemed more powerful, albeit a bit more verbose to configure. To use it, list the fields in the component definition:

  <searchComponent name="autofilter" class="org.apache.solr.handler.component.QueryAutoFilteringComponent" >
    <arr name="whitelistFields">
      <str>field1</str>
      <str>field2</str>
      <str>fieldN</str>
    </arr>
  </searchComponent>

query-autofiltering-component

A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata.

Introduction

The Query Autofiltering Component infers user intent by matching noun phrases in the query against the values of fields typically used for faceted navigation, and translating those matches into Solr filter or boost queries (depending on configuration settings), so that more precise user queries are met with more precise results. The algorithm uses a "longest contiguous phrase match" strategy, which lets it disambiguate queries where single terms are ambiguous but phrases are not. It works when the index contains structured information in String fields that are normally used for faceted navigation. It works across fields by building a map from search terms to index fields using the Lucene FieldCache (UninvertingReader in Solr 5.x and above). This lets users enter multi-term queries that combine attributes from several facet fields, as if they had searched and then navigated through several facet layers. To address the exact-match-only semantics of String fields, support for synonyms (including multi-term synonyms) and stemming was added.
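The longest-match idea can be illustrated with a small, self-contained Java sketch. It is not the component's source: the class LongestPhraseMatcher and its methods are hypothetical, and a plain HashMap filled by hand stands in for the term-to-field map the component builds from the FieldCache. At each query position it tries the longest known phrase first, so "red baron" wins over the single term "red".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not the component's code): greedy
// "longest contiguous phrase match" over a value -> field map.
final class LongestPhraseMatcher {
    // lower-cased facet value -> clauses such as brand:"Red Baron"
    private final Map<String, List<String>> clausesByValue = new HashMap<>();
    private int maxPhraseTokens = 1;

    // Registers a String-field value, e.g. ("brand", "Red Baron").
    void addFieldValue(String field, String value) {
        String key = value.toLowerCase(Locale.ROOT).trim();
        clausesByValue.computeIfAbsent(key, k -> new ArrayList<>())
                      .add(field + ":\"" + value + "\"");
        maxPhraseTokens = Math.max(maxPhraseTokens, key.split("\\s+").length);
    }

    // Scans left to right, trying the longest phrase first at each position;
    // tokens that match no field value are kept as free text.
    List<String> parse(String query) {
        List<String> result = new ArrayList<>();
        if (query.trim().isEmpty()) {
            return result;
        }
        String[] tokens = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            List<String> clauses = null;
            for (int len = Math.min(maxPhraseTokens, tokens.length - i); len > 0; len--) {
                String phrase = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                clauses = clausesByValue.get(phrase);
                if (clauses != null) {
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                result.add(tokens[i]); // no field value starts here
                i++;
            } else {
                // A value found in several fields becomes an OR of clauses.
                result.add(clauses.size() == 1 ? clauses.get(0)
                        : "(" + String.join(" OR ", clauses) + ")");
                i += matched;
            }
        }
        return result;
    }
}
```

With brand "Red Baron", color "red", and product_type "pizza" registered, parse("red baron pizza") yields brand:"Red Baron" and product_type:"pizza". Synonym and stemming support is deliberately left out of the sketch.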

Building from source

The build requires Apache Ant on the development machine. There are two versions of the component in this distribution, one for Solr 4.x installations and one for Solr 5.x, because Solr 5.0 changed the API for Lucene FieldCache access. The build was tested with Solr 4.10.3 and Solr 5.1, respectively.

After downloading the source code distribution, cd to the appropriate directory (solr4.x or solr5.x) and type: ant

If all goes well (BUILD SUCCESSFUL message), a jar file is created as dist/query-autofiltering-component-1.0.jar. Copy this jar file to [solr-home]/solr/lib for Solr 4.x or to [solr-home]/server/lib for Solr 5.x.

Note that in Solr 4.x there is an intermittent classpath issue that may cause the tests to fail with "fix your classpath to have tests-framework.jar before lucene-core.jar". If this happens, running the build again (ant clean test) should eventually succeed (YMMV for Solr < 4.10.3; I will update this README as issues with older 4.x versions are identified). This issue is not related to the Query Autofiltering code; it is caused by assertion failures in the Java ClassLoader layers, and it does not occur with the Solr 5.x build.

Configuration

solrconfig.xml snippet:

  <!-- test query auto filter -->
  <requestHandler name="/autofilter" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
    <arr name="first-components">
      <str>autofilter</str>
    </arr>
  </requestHandler>

  <searchComponent name="autofilter" class="org.apache.solr.handler.component.QueryAutoFilteringComponent" >
    <str name="synonyms">synonyms.txt</str>
  </searchComponent>
  
  <!-- Needed for Autofiltering in SolrCloud -->
  <searchComponent name="termsComp" class="org.apache.solr.handler.component.TermsComponent"/>
  
  <requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
      <arr name="components">
          <str>termsComp</str>
      </arr>
  </requestHandler>

Filter Query or Boost Query:

The Query Autofiltering component can be used in filter query or boost query mode. To use boost mode by default, add a "boostFactor" setting to the component configuration:

  <searchComponent name="autofilter" class="org.apache.solr.handler.component.QueryAutoFilteringComponent" >
    <str name="synonyms">synonyms.txt</str>
    <int name="boostFactor">100</int>
  </searchComponent>

To use boost query mode on demand, add an afb parameter to the query request, e.g. &afb=100.
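As a rough illustration of how the two modes and the afb override could interact, the following Java sketch resolves the boost factor and formats a matched clause. The class AutofilterMode is hypothetical, and emitting the match as a standard fq parameter in filter mode, or as a bq clause weighted with ^boostFactor in boost mode, is an assumption rather than a description of the component's actual query rewriting. Parameter values are shown unencoded for readability.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of choosing filter vs boost mode. Assumes filter mode
// emits an fq parameter and boost mode a bq clause weighted with ^N.
final class AutofilterMode {

    // A per-request afb parameter wins; otherwise fall back to the configured
    // boostFactor. No boost at all means filter mode.
    static Optional<Integer> resolveBoost(Integer configuredBoostFactor, Map<String, String> params) {
        String afb = params.get("afb");
        if (afb != null && !afb.isEmpty()) {
            return Optional.of(Integer.parseInt(afb));
        }
        return Optional.ofNullable(configuredBoostFactor);
    }

    // Returns "fq=<clause>" in filter mode or "bq=<clause>^<boost>" in boost mode.
    static String toParam(String clause, Optional<Integer> boost) {
        return boost.map(b -> "bq=" + clause + "^" + b)
                    .orElse("fq=" + clause);
    }
}
```

Keeping the configured boostFactor as a fallback mirrors the README's split between a default mode set in solrconfig.xml and an on-demand override per request.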

Sample Data

To show the Query Autofiltering Component in action, I created a sample data set for a hypothetical department store. The input data contains a number of fields, such as product_type, product_category, color, material, brand, style, and consumer_type.

To build the demo, download Solr 5 (or Solr 4 if that is what your production app is on) and put schema.xml and solrconfig.xml in the solr/collection1/conf directory for Solr 4 or in server/solr/configsets/basic_configs/conf for Solr 5. Better yet, copy basic_configs/conf to a new configset called query_autofilter_config_set and replace its schema.xml and solrconfig.xml.

Copy the jar file produced by "ant dist" (or simply "ant"), which is written to the dist/ folder as query-autofiltering-component-1.0.jar, into solr-webapp/webapp/WEB-INF/lib for Solr 4 or server/solr-webapp/webapp/WEB-INF/lib for Solr 5.

Start Solr (Solr 4: java -jar start.jar; Solr 5: ./bin/solr start), import the data file with the post tool (java -jar post.jar QueryAutofilteringData_1.xml), and start searching at localhost:8983/solr.

Note: with Solr 5 you need to create a new collection first, because Solr 5 ships with configuration sets rather than a pre-built collection (collection1) as in Solr 4.

To compare the behavior of the search engine with and without autofiltering, use the /autofilter handler for "with" and the default /select handler for "without".
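For a side-by-side comparison, the same query can be sent to both handlers. This minimal Java sketch only builds the two request URLs; the class name CompareUrls is hypothetical, the collection name is an assumption (use whatever collection you created, or collection1 on Solr 4), and wt=json simply asks Solr for a JSON response. The URLs can then be opened in a browser or fetched with any HTTP client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds request URLs for comparing /autofilter ("with")
// against /select ("without"). The collection name is supplied by the caller.
final class CompareUrls {
    static String url(String base, String collection, String handler, String query) {
        return base + "/" + collection + handler
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&wt=json";
    }
}
```

For example, url("http://localhost:8983/solr", "collection1", "/autofilter", "red baron pizza") and the same call with "/select" give the "with" and "without" requests for one query.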

Happy autofiltering!

High Level Design

Basic control and data flow:

  • Build synonym maps (finite state transducer).

Query parsing steps:

  • Alternative parsings of a phrase.
  • An alternative parsing must be able to use the complete phrase; partial mappings are rejected in favor of longer matches. For example, in "red baron pizza" the partial match "red" is rejected, while in "white linen shirts" the phrase "white linen" has two alternative, complete matches: brand:"white linen" and (color:white AND material:linen).
  • Handling of boolean terms in the user query.
  • Verb/adjective/preposition mapping.
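The "complete phrase" rule can be sketched as an exhaustive segmentation: enumerate every way of covering all tokens with known field values, and discard any segmentation that leaves a token unmatched. This Java sketch is illustrative only; the class AlternativeParsings and its methods are hypothetical, and combining the segments with AND is an assumption based on the (color:white and material:linen) example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: enumerate every parsing that covers the whole phrase
// with known field values; parsings leaving any token unmatched are dropped.
final class AlternativeParsings {
    // lower-cased facet value -> clauses such as brand:"White Linen"
    private final Map<String, List<String>> clausesByValue = new HashMap<>();

    void addFieldValue(String field, String value) {
        clausesByValue.computeIfAbsent(value.toLowerCase(Locale.ROOT).trim(), k -> new ArrayList<>())
                      .add(field + ":\"" + value + "\"");
    }

    List<String> parse(String phrase) {
        String[] tokens = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> out = new ArrayList<>();
        expand(tokens, 0, new ArrayList<>(), out);
        return out;
    }

    // Depth-first search over segmentations, trying longer spans first.
    private void expand(String[] tokens, int start, List<String> path, List<String> out) {
        if (start == tokens.length) { // every token consumed: a complete parsing
            out.add(path.size() == 1 ? path.get(0) : "(" + String.join(" AND ", path) + ")");
            return;
        }
        for (int end = tokens.length; end > start; end--) {
            String span = String.join(" ", Arrays.copyOfRange(tokens, start, end));
            for (String clause : clausesByValue.getOrDefault(span, List.of())) {
                path.add(clause);
                expand(tokens, end, path, out);
                path.remove(path.size() - 1); // backtrack
            }
        }
    }
}
```

For "red baron pizza", the branch that starts with color "red" dies at "baron", so only the brand:"Red Baron" AND product_type:"pizza" parsing survives. For "white linen shirts", both readings of "white linen" are returned, and choosing between them is left to later steps.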

Contributors

detnavillus, bkheadwire