Elastic Facets

A collection of facets and facet-related tools for ElasticSearch.

Installation

To install the plugin, run:

bin/plugin -install bleskes/elasticfacets/0.2

Change the version number if needed (see Versions).

Versions

ElasticFacets    ElasticSearch
master           0.19.9 -> 0.19.11
0.2              0.19.9 -> 0.19.11
0.1              0.19.8

Included facets

Faceted Date Histogram

ElasticSearch comes with a powerful built-in facet called Date Histogram. Using the date histogram facet, you can get a statistical analysis of a field over successive time intervals (week by week, hour by hour, etc.).

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "pub_per_week": {
      "date_histogram": {
        "key_field": "published",
        "value_field": "copies",
        "interval": "week"
      }
    }
  }
}

While this is very powerful, it is limited to numerical fields.

The Faceted Date Histogram combines the power of Date Histogram with *any* facet in ElasticSearch. With it, you can replace the value_field parameter with a complete facet definition of your choice. For example:

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "pub_per_week": {
      "faceted_date_histogram": {
        "field": "published",
        "interval": "week",
        "facet": {
          "terms": {
            "field": "username"
          }
        }
      }
    }
  }
}

This snippet uses the Terms facet to return the top usernames on a week by week basis.
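
If you build requests programmatically, the same body can be assembled by a small helper. The following is a minimal Python sketch, not part of the plugin: the function name faceted_date_histogram and its parameters are hypothetical, and it only produces the JSON body shown above.

```python
import json

def faceted_date_histogram(name, date_field, interval, inner_facet):
    """Build a search body that wraps `inner_facet` in a faceted_date_histogram.

    Hypothetical helper for illustration; it only mirrors the JSON shown above.
    """
    return {
        "query": {"match_all": {}},
        "facets": {
            name: {
                "faceted_date_histogram": {
                    "field": date_field,
                    "interval": interval,
                    "facet": inner_facet,
                }
            }
        },
    }

# Same request as the example above: top usernames per week.
body = faceted_date_histogram(
    "pub_per_week", "published", "week", {"terms": {"field": "username"}}
)
print(json.dumps(body, indent=2))
```

Any other facet definition can be passed as inner_facet in place of the terms facet.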

Hashed Strings Facet

** STILL UNDER DEVELOPMENT. Available in master only **

A drop-in replacement [1] for the standard terms facet, but with a lower memory footprint. It is useful when you want to facet on a string field with a very large number of distinct values. Normally, all of these strings are loaded into memory, which takes a lot of space. The Hashed Strings Facet loads only the hashes of these strings, which considerably reduces its memory footprint. The facet only loads the strings needed to actually return a response. Of course, nothing comes for free. The price you pay is slightly more IO and potential inaccuracies due to hash collisions. If the latter occur, you will see an appropriate WARN log message.

After installing the plugin you can call it as follows:

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "facet_name": {
      "hashed_terms": {
        "field": "username",
        "size": 10
      }
    }
  }
}

This will return the top 10 usernames in your index in exactly the same format the standard terms facet does.
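
The trade-off between memory and accuracy can be illustrated with a small Python sketch. This is not the plugin's implementation: the use of CRC32, the bits parameter, and the function name count_by_hash are assumptions made for the demonstration. It counts values by hash only, and reports the hashes where two different strings collided.

```python
from collections import Counter
import zlib

def count_by_hash(values, bits=32):
    """Count values by a truncated CRC32 hash.

    Returns (counts, collisions). CRC32 and `bits` are illustrative
    assumptions, not the hash used by the plugin.
    """
    mask = (1 << bits) - 1
    counts = Counter()
    seen = {}           # hash -> first string seen; kept only to detect collisions in this demo
    collisions = set()  # hashes shared by at least two different strings
    for value in values:
        h = zlib.crc32(value.encode("utf-8")) & mask
        counts[h] += 1
        if seen.setdefault(h, value) != value:
            collisions.add(h)
    return counts, collisions

# With a 1-bit hash there are only two buckets, so three distinct
# usernames are guaranteed to collide and their counts get merged.
counts, collisions = count_by_hash(["alice", "bob", "alice", "carol"], bits=1)
print(counts, collisions)
```

With a realistic hash width collisions become rare, but a collision still merges the counts of two different terms, which is the inaccuracy mentioned above.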

[1]

As the original string is no longer available at faceting time, these features of the standard terms facet are not supported:

  • Regex filtering.
  • Term Scripts (although there is some control over the output - see the extensions section).
  • Script Field.
  • Term Ordering - ordering terms alphabetically is impossible.

Extensions to the standard terms facet

In addition to the features offered by the terms facet, the Hashed Strings facet has a few extras:

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "facet_name": {
      "hashed_terms": {
        "field": "username",
        "size": 10,
        "fetch_size": 20,           # Controls the number of terms returned by every shard before aggregation.
        "output_script":            # Modifies the output via a script.
            "_source.username+' on '+_source.website"
      }
    }
  }
}
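
To see why fetch_size can matter, consider a simplified model of shard-level aggregation: every shard reports only its top fetch_size terms, and the results are merged afterwards. The Python sketch below is an illustration under that assumption, not the plugin's code; the function name merged_top_terms and the shard data are made up.

```python
from collections import Counter

def merged_top_terms(shard_counts, size, fetch_size):
    """Merge each shard's top `fetch_size` terms and return the global top `size`.

    Simplified model of per-shard aggregation, for illustration only.
    """
    total = Counter()
    for counts in shard_counts:
        for term, n in Counter(counts).most_common(fetch_size):
            total[term] += n
    return total.most_common(size)

# Hypothetical per-shard term counts.
shards = [
    {"alice": 5, "bob": 4, "carol": 3},
    {"dave": 6, "carol": 4, "bob": 1},
]

# "carol" is the most frequent term overall (7), but is never a shard's top term.
print(merged_top_terms(shards, size=1, fetch_size=1))  # [('dave', 6)]
print(merged_top_terms(shards, size=1, fetch_size=3))  # [('carol', 7)]
```

In this model, a larger fetch_size trades more data transferred per shard for more accurate merged counts.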

Other Goodies

Cache stats per field

** STILL UNDER DEVELOPMENT. Available in master only **

Facets in ElasticSearch are powered by the FieldCache, a component that loads values into memory so they can be counted. This can lead to high memory usage. ElasticSearch comes with a cache statistics endpoint from which you can get the current *total* cache size. This endpoint reports the cache size of each field stored in the cache, so you can find the source of the problem.

Usage:

curl -XGET 'http://localhost:9200/_cluster/nodes/cache/fields/stats'
curl -XGET 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/cache/fields/stats'

# simplified
curl -XGET 'http://localhost:9200/_nodes/cache/fields/stats'

Response:

{
  "cluster_name": "BoazMBP.local_buzzcapture_1.0",
  "nodes": {
    "node_id": {
      "timestamp": 1353134666971,
      "name": "Frost, Deacon",
      "transport_address": "inet[/192.168.1.107:9300]",
      "hostname": "something.com",
      "fields": {
        "publish_date": {
          "size": 180
        },
        "copies": {
          "size": 180
        }
      }
    }
  }
}
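
A response of this shape can be summarized to find the fields that use the most cache. The following Python sketch assumes the structure shown above; the function name fields_by_cache_size and the sample values are hypothetical, and the unit of size is whatever the endpoint reports.

```python
import json

def fields_by_cache_size(stats):
    """Sum per-field cache sizes across all nodes, largest first."""
    totals = {}
    for node in stats.get("nodes", {}).values():
        for field, info in node.get("fields", {}).items():
            totals[field] = totals.get(field, 0) + info["size"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical response, trimmed to the parts the function reads.
sample = json.loads("""
{
  "cluster_name": "example",
  "nodes": {
    "node_id": {
      "fields": {
        "publish_date": {"size": 180},
        "copies": {"size": 90}
      }
    }
  }
}
""")

print(fields_by_cache_size(sample))  # [('publish_date', 180), ('copies', 90)]
```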
