elasticsearch-index-termlist's Introduction

Elasticsearch Index Termlist Plugin

This plugin extends Elasticsearch with a term list capability. It lists the terms in a field of an index and can also report each term's frequency. Term lists can be generated from a single index or from all indexes.

Versions

Elasticsearch  Plugin   Release date
2.3.0          2.3.0.0  March 29, 2016
2.2.0          2.2.0.2  March 22, 2016
1.5.2          1.5.2.0  Jun 5, 2015
1.5.0          1.5.0.0  Apr 9, 2015
1.4.4          1.4.4.0  Mar 15, 2015
1.4.0          1.4.0.2  Feb 19, 2015
1.4.0          1.4.0.1  Jan 14, 2015
1.4.0          1.4.0.0  Nov 18, 2014
1.3.2          1.3.0.0  Aug 21, 2014
1.2.1          1.2.1.0  Jul 3, 2014

Installation

./bin/plugin -install index-termlist -url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-index-termlist/1.5.2.0/elasticsearch-index-termlist-1.5.2.0-plugin.zip

Do not forget to restart the node after installing.

Project docs

The Maven project site is available at GitHub.

Issues

All feedback is welcome! If you find issues, please post them at GitHub.

Introduction

Getting the list of all indexed terms is useful for various purposes, for example:

  • term statistics
  • building dictionaries
  • controlling the overall effects of analyzers on the indexed terms
  • automatic query building on indexed terms, e.g. for load tests
  • input to linguistic analysis tools
  • other post-processing of the indexed terms outside of Elasticsearch

Optionally, the term list can be narrowed down to a field name. The field name is the Lucene field name as found in the Lucene index.

Only terms of field names not starting with underscore are listed. Terms of internal fields like _uid, _all, or _type are always skipped.

Response

For each term, statistics are computed.

{
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "took": 384,
   "numdocs": 51279,
   "numterms": 100,
   "terms": [
      {
         "term": "aacr2",
         "totalfreq": 34699,
         "docfreq": 34697,
         "min": 1,
         "max": 2,
         "mean": 1.0000505458956723,
         "geomean": 1.0000399550985877,
         "sumofsquares": 34703,
         "sumoflogs": 1.3862943611198906,
         "sigma": 0.008475454987021664,
         "variance": 0.00007183333723703039
      }, ...

took - the time in milliseconds required to execute the request

numdocs - the number of documents examined

numterms - the number of terms returned

terms - the array of term infos

term - the name of the term

totalfreq - the total number of occurrences of this term

docfreq - the number of documents in which this term appears

min - the minimum number of occurrences of this term in a document

max - the maximum number of occurrences of this term in a document

mean - the mean of the term occurrences

geomean - the geometric mean of the term occurrences

sumofsquares - the sum of the squares of the term occurrences

sumoflogs - the sum of the logarithms of the term occurrences

variance - the variance of the term occurrences

sigma - the standard deviation, equal to sqrt(variance)
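
To make the relationships between these fields concrete, here is a minimal Python sketch that derives them from a list of per-document occurrence counts for one term. It is not the plugin's code. The function name term_statistics is hypothetical, and two choices are assumptions: the natural logarithm for sumoflogs (the sample response above is consistent with it, since exp(sumoflogs / docfreq) reproduces the sample geomean) and the population form of the variance (the README does not say which form the plugin uses).

```python
import math


def term_statistics(counts):
    """Summarize per-document occurrence counts of a single term.

    counts holds one positive integer per document that contains the term.
    """
    n = len(counts)
    total = sum(counts)
    mean = total / n
    sum_of_logs = sum(math.log(c) for c in counts)  # assumption: natural log
    # Assumption: population variance; the plugin may use the sample form.
    variance = sum((c - mean) ** 2 for c in counts) / n
    return {
        "totalfreq": total,
        "docfreq": n,
        "min": min(counts),
        "max": max(counts),
        "mean": mean,
        "geomean": math.exp(sum_of_logs / n),
        "sumofsquares": sum(c * c for c in counts),
        "sumoflogs": sum_of_logs,
        "variance": variance,
        "sigma": math.sqrt(variance),
    }


# Three documents: the term occurs once in two of them and twice in the third.
stats = term_statistics([1, 1, 2])
for name, value in stats.items():
    print(name, value)
```

With these definitions, sigma is always the square root of variance, and geomean can be recovered from sumoflogs and docfreq alone.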

Example

Consider the following example

curl -XDELETE 'http://localhost:9200/test/'
curl -XPUT 'http://localhost:9200/test/'
curl -XPUT 'http://localhost:9200/test/test/1' -d '{ "test": "Hello World" }'
curl -XPUT 'http://localhost:9200/test/test/2' -d '{ "test": "Hello Jörg Prante" }'
curl -XPUT 'http://localhost:9200/test/test/3' -d '{ "message": "elastic search" }'

Get term list of index test

curl -XGET 'http://localhost:9200/test/_termlist'

Get term list of index test of field message

curl -XGET 'http://localhost:9200/test/_termlist?field=message'

Get term list of index test with total frequencies but only the first three of the list

curl -XGET 'http://localhost:9200/test/_termlist?size=3'

Get term list of terms starting with hello in index test field test

curl -XGET 'http://localhost:9200/test/_termlist?field=test&term=hello'

Get a page of 100 terms from the sorted list of terms beginning with a in index books

curl -XGET 'http://localhost:9200/books/_termlist?term=a&sortbyterms&pretty&from=0&size=100' 
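
The requests above differ only in the index name and the query parameters. The following Python sketch assembles such URLs from their parts; it only builds the string and sends nothing. The helper name termlist_url and its signature are hypothetical. The parameter names (field, term, from, size, sortbyterms, pretty) are the ones used in the curl examples above.

```python
from urllib.parse import quote, urlencode


def termlist_url(base, index=None, params=None, flags=()):
    """Build a _termlist URL.

    params: dict of key=value query parameters, e.g. {"field": "test"}.
    flags:  valueless query parameters, e.g. ("sortbyterms", "pretty").
    """
    if index is None:
        path = "/_termlist"  # no index given: term list over all indexes
    else:
        path = "/" + quote(index, safe="") + "/_termlist"
    query = []
    if params:
        query.append(urlencode(params))
    query.extend(quote(flag, safe="") for flag in flags)
    url = base.rstrip("/") + path
    return url + "?" + "&".join(query) if query else url


print(termlist_url("http://localhost:9200", "test",
                   {"field": "test", "term": "hello"}))
print(termlist_url("http://localhost:9200", "books",
                   {"term": "a", "from": 0, "size": 100},
                   flags=("sortbyterms", "pretty")))
```

Passing the parameters as a dict sidesteps the fact that from is a reserved word in Python and cannot be used as a keyword argument.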

Get a page of 100 terms from the sorted list of terms in index books beginning with frodo, frod, fro, or fr, because backtracingcount is set to 3

curl -XGET 'http://localhost:9200/books/_termlist?term=frodo&sortbyterms&pretty&from=0&size=100&backtracingcount=3'
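
As a client-side illustration of the backtracing described above, this short Python sketch lists the prefixes that a given term and backtracingcount would cover. It follows the README's description only (the term itself plus versions shortened by one character at a time); the function name backtracing_prefixes is hypothetical, and the plugin's internal matching may differ.

```python
def backtracing_prefixes(term, backtracingcount):
    """Return the term followed by copies shortened by 1..backtracingcount characters."""
    prefixes = []
    for cut in range(backtracingcount + 1):
        prefix = term[: len(term) - cut]
        if not prefix:
            break  # stop once the term has been shortened to nothing
        prefixes.append(prefix)
    return prefixes


print(backtracing_prefixes("frodo", 3))  # ['frodo', 'frod', 'fro', 'fr']
```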

Caution

The term list is built internally as an unsorted, compact set of strings which is not streamed to the client. Be aware that if the index contains many unique terms, this procedure consumes a lot of heap memory and may lead to out-of-memory situations that render your Elasticsearch cluster unusable until it is restarted.

License

Elasticsearch Term List Plugin

Copyright (C) 2011-2015 Jörg Prante

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

elasticsearch-index-termlist's People

Contributors

benmccann, charvind, jprante, mschumann, q42jaap, spyk

elasticsearch-index-termlist's Issues

Readme invalid

The plugin command in the readme is not valid. It should be elasticsearch-index-termlist not elasticsearch-termlist

option to show terms of leading underscore fields (e.g., "_all")?

With an index with only "index": "not_analyzed" fields, it is confusing to not see the terms of _all in curl -XGET 'http://localhost:9200/_termlist' (i.e., output is: {"_shards":{"total":1,"successful":1,"failed":0},"terms":[]}), but I see this is intentional in the README.md. Can an option to show these be added?

No handler found for uri [/test/_termlist] and method [GET]

I followed the example in README, created the test index and then

curl -XGET 'http://localhost:9200/test/_termlist'
No handler found for uri [/test/_termlist] and method [GET]

Any hint on why I am getting this message?


{
   "status": 200,
   "name": "Dwith Scrhute",
   "version": {
      "number": "1.2.1",
      "build_hash": "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
      "build_timestamp": "2014-06-03T15:02:52Z",
      "build_snapshot": false,
      "lucene_version": "4.8"
   },
   "tagline": "You Know, for Search"
}

Support for elastic search 6.x ?

Is there any plan to support ES 6.x? I would like to write the code myself, but I have no idea how to design a plugin for ES. Any help?

ES 1.1.1

It seems that this plugin is not compatible with ES 1.1.1.
Once the plugin is installed, ES cannot start and reports the following error messages:
{1.1.1}: Initialization Failed ...

  • ExecutionError[java.lang.NoClassDefFoundError: org/elasticsearch/ElasticSearchException]
    NoClassDefFoundError[org/elasticsearch/ElasticSearchException]
    ClassNotFoundException[org.elasticsearch.ElasticSearchException]

Plugin breaks Elasticsearch 1.4.0

I just tried to upgrade my environment to the new version of ES and discovered that index-termlist no longer works:

elasticsearch-1.4.0$ ./bin/elasticsearch
[2014-11-12 22:28:00,212][INFO ][node                     ] [Madame Masque] version[1.4.0], pid[92545], build[bc94bd8/2014-11-05T14:26:12Z]
[2014-11-12 22:28:00,212][INFO ][node                     ] [Madame Masque] initializing ...
[2014-11-12 22:28:00,215][INFO ][plugins                  ] [Madame Masque] loaded [], sites []
[2014-11-12 22:28:02,074][INFO ][node                     ] [Madame Masque] initialized
[2014-11-12 22:28:02,075][INFO ][node                     ] [Madame Masque] starting ...
[2014-11-12 22:28:02,122][INFO ][transport                ] [Madame Masque] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.104:9300]}
[2014-11-12 22:28:02,145][INFO ][discovery                ] [Madame Masque] elasticsearch/8NI0XN43RHqhMyQubzu1RQ
[2014-11-12 22:28:05,918][INFO ][cluster.service          ] [Madame Masque] new_master [Madame Masque][8NI0XN43RHqhMyQubzu1RQ][globogym.local][inet[/192.168.1.104:9300]], reason: zen-disco-join (elected_as_master)
[2014-11-12 22:28:05,932][INFO ][http                     ] [Madame Masque] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.104:9200]}
[2014-11-12 22:28:05,932][INFO ][node                     ] [Madame Masque] started
[2014-11-12 22:28:05,936][INFO ][gateway                  ] [Madame Masque] recovered [0] indices into cluster_state
^C[2014-11-12 22:28:27,469][INFO ][node                     ] [Madame Masque] stopping ...
[2014-11-12 22:28:27,482][INFO ][node                     ] [Madame Masque] stopped
[2014-11-12 22:28:27,482][INFO ][node                     ] [Madame Masque] closing ...
[2014-11-12 22:28:27,486][INFO ][node                     ] [Madame Masque] closed

elasticsearch-1.4.0$ ./bin/plugin -install index-termlist -url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-index-termlist/1.3.0.0/elasticsearch-index-termlist-1.3.0.0-plugin.zip
-> Installing index-termlist...
Trying http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-index-termlist/1.3.0.0/elasticsearch-index-termlist-1.3.0.0-plugin.zip...
Downloading ................................DONE
Installed index-termlist into /Users/dblado/Downloads/test/elasticsearch-1.4.0/plugins/index-termlist

elasticsearch-1.4.0$ ./bin/elasticsearch
[2014-11-12 22:28:52,585][INFO ][node                     ] [Gargoyle] version[1.4.0], pid[92614], build[bc94bd8/2014-11-05T14:26:12Z]
[2014-11-12 22:28:52,586][INFO ][node                     ] [Gargoyle] initializing ...
[2014-11-12 22:28:52,595][INFO ][plugins                  ] [Gargoyle] loaded [index-termlist-1.3.0.0-c3c77f6], sites []
{1.4.0}: Initialization Failed ...
1) NoSuchMethodError[org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction.<init>(Lorg/elasticsearch/common/settings/Settings;Ljava/lang/String;Lorg/elasticsearch/threadpool/ThreadPool;Lorg/elasticsearch/cluster/ClusterService;Lorg/elasticsearch/transport/TransportService;)V]
2) NoSuchMethodError[org.elasticsearch.rest.BaseRestHandler.<init>(Lorg/elasticsearch/common/settings/Settings;Lorg/elasticsearch/client/Client;)V]

Any plans to fix the issue?

BroadcastShardOperationFailedException

After installing the plugin and restarting the cluster, I get this:
Version : 1.4.0.2 (http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-index-termlist/1.4.0.2/elasticsearch-index-termlist-1.4.0.2-plugin.zip)

curl -XGET 'http://my_elk_.cluster:9200/testpy/_termlist?totalfreqs&pretty'
{
"_shards" : {
"total" : 3,
"successful" : 0,
"failed" : 3,
"failures" : [ {
"index" : "testpy",
"shard" : 0,
"reason" : "BroadcastShardOperationFailedException[[testpy][0] ]; nested: SendRequestTransportException[[Node1][inet[/10.10.10.3:9300]][indices/termlist[s]]]; nested: NullPointerException; "
}, {
"index" : "testpy",
"shard" : 1,
"reason" : "BroadcastShardOperationFailedException[[testpy][1] ]; nested: SendRequestTransportException[[Node2][inet[/10.10.10.4:9300]][indices/termlist[s]]]; nested: NullPointerException; "
}, {
"index" : "testpy",
"shard" : 2,
"reason" : "BroadcastShardOperationFailedException[[testpy][2] ]; nested: SendRequestTransportException[[Node3][inet[/10.10.10.5:9300]][indices/termlist[s]]]; nested: NullPointerException; "
} ]
},
"total" : 0,
"terms" : [ ]
}

size parameter

When I set the size parameter to a value equal to or greater than total, I do not get back all results.

E.g. _termlist?field=autocomplete_object&term=kan&size=6

total: 4,
terms: [
{
name: “kandelaars”
},
{
name: “kandelabers”
},
{
name: “kantoorstempels”
}

]

E.g. _termlist?field=autocomplete_object&term=kan&size=4

total: 4,
terms: [
{
name: “kandelaars”
},

{
name: “kandelabers”
},
{
name: “kantoorstempels”
}
]

What am I missing?

Tf-idf

Hi,

Could you add the option to return the "tf-idf" (or an equivalent) per term, as you have done for the frequency?

ES 0.90.7

With ES 0.90.7, the REST action is not working, and gives an error like:
No handler found for uri [/_termlist] and method [GET]

Seems it's an incompatibility issue with 0.90.7:
Required changes: TransportTermlistAction -> acquireSearcher API has changed

es-plugin.properties is in src and not in src/main/resources.

Could not find plugin descriptor 'plugin-descriptor.properties' in plugin zip

elasticsearch/bin/plugin install jprante/elasticsearch-index-termlist
-> Installing jprante/elasticsearch-index-termlist...
Trying https://github.com/jprante/elasticsearch-index-termlist/archive/master.zip ...
Downloading ............................................................................................................................................DONE
Verifying https://github.com/jprante/elasticsearch-index-termlist/archive/master.zip checksums if available ...
NOTE: Unable to verify checksum for downloaded plugin (unable to find .sha1 or .md5 file to verify)
ERROR: Could not find plugin descriptor 'plugin-descriptor.properties' in plugin zip

Installation command does not work

Hi guys,

The installation command on the front page does not work for me. I am using 2.3.0. Error message is "failed to download from all possible locations"

./bin/plugin -install index-termlist -url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-index-termlist/1.5.2.0/elasticsearch-index-termlist-1.5.2.0-plugin.zip

BTW, is it possible to access the list with Java API?

Thanks,
Cody

condition termlist result

Is it possible to generate the term list conditioned on the result of a query? If not, is it feasible to add this capability to the plugin? Forgive me if these questions seem naive; I'm not familiar with the internals of Elasticsearch or with how plugins are implemented.
