o19s / elyzer
"Stop worrying about Elasticsearch analyzers", my therapist says
License: Apache License 2.0
Since the sole purpose of the tool is to analyze text, I think --text
isn't necessary; it should simply accept all unparsed arguments as the text:
elyzer --index myIndex --analyzer standard text to analyze
Currently, the text needs to be prefixed with --text and quoted:
elyzer --index myIndex --analyzer standard --text "text to analyze"
The following will fail or behave unexpectedly:
... --text text to analyze
=> fails with unrecognized arguments: to analyze
... --text text --text to --text analyze
=> only analyzes the last one, silently drops the first two

I am using ES version 1.7 and couldn't find a way to install it. Is 1.7 supported?
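The --text-less invocation requested earlier maps naturally onto an argparse positional with `nargs="+"`. A minimal sketch (the option set is simplified; this is not elyzer's actual parser):

```python
import argparse

# Hypothetical sketch of the requested interface: everything after the
# options is collected into the text to analyze, no --text flag needed.
parser = argparse.ArgumentParser(prog="elyzer")
parser.add_argument("--index", required=True)
parser.add_argument("--analyzer", required=True)
parser.add_argument("text", nargs="+", help="text to analyze")

# Simulates: elyzer --index myIndex --analyzer standard text to analyze
args = parser.parse_args(
    ["--index", "myIndex", "--analyzer", "standard", "text", "to", "analyze"])
text = " ".join(args.text)
```

With `nargs="+"` argparse also rejects an empty invocation, so the "silently drops" failure mode above can't occur.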
Hi, I got this error while running the command below. I am using Elasticsearch version 5.0.1.
Command
elyzer --es "http://localhost:9200" --index index_name --analyzer analyzer_name --text "some string"
Error:
CHAR_FILTER: special_char_filter
Traceback (most recent call last):
File "/usr/local/bin/elyzer", line 9, in <module>
load_entry_point('elyzer==0.2.2', 'console_scripts', 'elyzer')()
File "/usr/local/lib/python2.7/dist-packages/elyzer/__main__.py", line 30, in main
es=es))
File "/usr/local/lib/python2.7/dist-packages/elyzer/elyzer.py", line 65, in stepWise
char_filters=",".join(charFiltersInUse))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/indices.py", line 29, in analyze
'_analyze'), params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 329, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 106, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 105, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'request [/autocomplete_ahs_index/_analyze] contains unrecognized parameter: [char_filters] -> did you mean [char_filter]?')
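The error message itself points at the cause: ES 5.x renamed the _analyze API's plural parameters (`char_filters`, `token_filters`) to their singular forms. One way to stay compatible with both generations is to pick the parameter name from the server's major version; a sketch (not elyzer's actual fix):

```python
# Build the _analyze request body with a version-aware key: ES 5.x
# accepts "char_filter" where ES 2.x used "char_filters".
def analyze_body(text, char_filters, es_major_version):
    key = "char_filter" if es_major_version >= 5 else "char_filters"
    return {"text": text, key: char_filters}
```

The same switch would apply to the token-filter stage.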
Great work! It would also be nice if this supported Unicode text input (example: café). See the error below:
➜ elyzer git:(master) python __main__.py --es "http://localhost:9200" --index my_index --analyzer my_analyzer "café"
TOKENIZER: kuromoji_tokenizer
Traceback (most recent call last):
File "__main__.py", line 47, in <module>
main()
File "__main__.py", line 36, in main
es=es))
File "/Users/toiwa/Projects/Private/elyzer/elyzer/elyzer.py", line 72, in stepWise
analyzeResp = es.indices.analyze(index=indexName, body=body)
File "/Library/Python/2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/Library/Python/2.7/site-packages/elasticsearch/client/indices.py", line 32, in analyze
'_analyze'), params=params, body=body)
File "/Library/Python/2.7/site-packages/elasticsearch/transport.py", line 284, in perform_request
body = self.serializer.dumps(body)
File "/Library/Python/2.7/site-packages/elasticsearch/serializer.py", line 50, in dumps
raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: ({'text': 'caf\xc3\xa9', 'char_filter': [], 'tokenizer': u'kuromoji_tokenizer'}, UnicodeDecodeError('ascii', '"caf\xc3\xa9"', 4, 5, 'ordinal not in range(128)'))
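The traceback shows a Python 2 byte string (`'caf\xc3\xa9'`) reaching the JSON serializer. On Python 2, `sys.argv` yields byte strings, so decoding the input before building the request body would avoid the crash; a sketch under that assumption:

```python
import sys

def to_unicode(text, encoding=None):
    # Python 2's sys.argv gives byte strings in the terminal's encoding;
    # decode them so the JSON serializer can handle input like "café".
    if isinstance(text, bytes):
        return text.decode(encoding or sys.getfilesystemencoding() or "utf-8")
    return text
```

On Python 3 the input is already `str`, so the function is a no-op there.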
I would like to see a nice message telling me the analyzer was not found, instead of a Python exception.
Since providing an index and an analyzer is required, it would be nice to support environment variables so that defaults can be set in the current shell environment.
Some suggestions:
ELYZER_ROOT_URL=http://server:9123
=> same as --es http://server:9123
ELYZER_INDEX=myIndex
=> same as --index myIndex
ELYZER_ANALYZER=myAnalyzer
=> same as --analyzer myAnalyzer
The environment variables would be the fallback if the corresponding argument is not passed to elyzer, i.e. command-line arguments always win.
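The fallback rule is straightforward to express with argparse by sourcing defaults from the environment; explicit arguments then override them automatically. A sketch, assuming the variable names proposed above:

```python
import argparse
import os

def build_parser(env=None):
    # Environment variables act as fallbacks; explicit command-line
    # arguments always win because they replace the default.
    env = os.environ if env is None else env
    p = argparse.ArgumentParser(prog="elyzer")
    p.add_argument("--es", default=env.get("ELYZER_ROOT_URL", "http://localhost:9200"))
    p.add_argument("--index", default=env.get("ELYZER_INDEX"),
                   required="ELYZER_INDEX" not in env)
    p.add_argument("--analyzer", default=env.get("ELYZER_ANALYZER"),
                   required="ELYZER_ANALYZER" not in env)
    return p
```

Marking the options `required` only when no variable is set preserves today's behavior for users who don't export anything.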
Together with #6, this would make elyzer a quick and handy tool:
export ELYZER_ROOT_URL=http://server:9123
export ELYZER_INDEX=myIndex
export ELYZER_ANALYZER=myAnalyzer
$ elyzer analyze this text for me thank you
The two remaining shortcomings could be addressed by setting explain: true when calling the Analyze API (available since ES 2.2).
Currently this can only be used against an unsecured instance; could basic authentication please be incorporated?
{'error': {'root_cause': [{'type': 'security_exception', 'reason': 'missing authentication credentials for REST request [/my_index/_settings]', 'header': {'WWW-Authenticate': 'Basic realm="security" charset="UTF-8"'}}], 'type': 'security_exception', 'reason': 'missing authentication credentials for REST request [/my_index/_settings]', 'header': {'WWW-Authenticate': 'Basic realm="security" charset="UTF-8"'}}, 'status': 401}
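The elasticsearch-py client already accepts an `http_auth` argument on the `Elasticsearch` constructor, so this would mostly be a matter of adding credential flags (the `--user`/`--password` flags here are hypothetical) and passing them through. The Basic header itself is just base64 of `user:password`:

```python
import base64

def basic_auth_header(user, password):
    # The Authorization header that HTTP Basic auth sends:
    # "Basic " + base64("user:password").
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}
```

In practice you'd let the client build this itself, e.g. `Elasticsearch(hosts, http_auth=(user, password))`.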
It would be nice if some of the built-in analyzers were also supported. Mainly the following ones which don't require any specific configuration:
simple
whitespace
keyword
stop
standard
snowball
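Since these analyzers need no index-specific configuration, elyzer could fall back to them whenever the index settings don't define the requested name. A hypothetical sketch:

```python
# Built-in analyzers that work without index-specific configuration.
BUILT_IN = {"standard", "simple", "whitespace", "stop", "keyword", "snowball"}

def resolve_analyzer(index_settings, name):
    custom = index_settings.get("analysis", {}).get("analyzer", {})
    if name in custom:
        return custom[name]
    if name in BUILT_IN:
        # Treat the built-in as a single opaque step, since its
        # internal stages aren't described in the index settings.
        return {"type": name}
    raise KeyError(name)
```

Custom analyzers still win on a name clash, so existing behavior is unchanged.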
$ elyzer --es foo --index bar --analyzer my_analizer 'xxx'
Traceback (most recent call last):
File "/usr/bin/elyzer", line 9, in <module>
load_entry_point('elyzer==1.1.0', 'console_scripts', 'elyzer')()
File "/usr/lib/python3.4/site-packages/elyzer/__main__.py", line 42, in main
es=es))
File "/usr/lib/python3.4/site-packages/elyzer/elyzer.py", line 36, in getAnalyzer
normalizeAnalyzer(analyzer)
File "/usr/lib/python3.4/site-packages/elyzer/elyzer.py", line 19, in normalizeAnalyzer
if (isinstance(analyzer['char_filter'], str) or isinstance(analyzer['char_filter'], unicode)):
NameError: name 'unicode' is not defined
Is anyone else having the same problem?
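The cause is that Python 3 removed the `unicode` builtin, so the `isinstance(..., unicode)` check crashes there. A version-agnostic sketch of the same normalization (the function body approximates what the traceback shows, not elyzer's exact code):

```python
# Python 2 has both str and unicode; Python 3 only str.
try:
    STRING_TYPES = (str, unicode)  # Python 2
except NameError:
    STRING_TYPES = (str,)          # Python 3

def normalize_analyzer(analyzer):
    # Wrap a bare char_filter name in a list, as the original check intended.
    if isinstance(analyzer.get("char_filter"), STRING_TYPES):
        analyzer["char_filter"] = [analyzer["char_filter"]]
    return analyzer
```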
In order to show the state of the tokens at each step of the analysis process, elyzer performs an analyze query for each stage: first char filters, then tokenizer, and finally token filters.
The problem is that on the ES side, the analysis process changed when normalizers were introduced. ES now assumes that if a request contains no normalizer, no analyzer, and no tokenizer, but does contain a token filter or a char filter, then the analyze request should behave like a normalizer.
In the case shown here, elyzer will first perform a request for the html_strip character filter, and ES will think it is a normalizer request, hence the error, since html_strip is not a valid char_filter for normalizers.
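Concretely, the char-filter-only stage body looks roughly like the first dict below. One possible workaround (an assumption, not necessarily elyzer's actual fix) is to always include the passthrough `keyword` tokenizer, which emits the whole input as a single token, so ES runs a normal analysis chain and html_strip stays legal:

```python
# Roughly the per-stage body elyzer sends for the char-filter step;
# with no analyzer or tokenizer present, newer ES treats it as a
# normalizer request and rejects html_strip.
stage_body = {"text": "<b>hi</b>", "char_filter": ["html_strip"]}

# Possible workaround: add the passthrough "keyword" tokenizer so the
# request is unambiguously an analysis (not normalizer) request.
workaround = dict(stage_body, tokenizer="keyword")
```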
Hey, I'm getting some kind of import problem; maybe you can help?
I'm running Python 3.4.3 through pyenv.
Here's what I got after doing pip install elyzer:
elyzer --es "http://localhost:9200" --index companies --analyzer services_analyzer --text "coding"
Traceback (most recent call last):
File "/Users/alexchoi/.pyenv/versions/3.4.3/bin/elyzer", line 9, in <module>
load_entry_point('elyzer==0.1.1', 'console_scripts', 'elyzer')()
File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/pkg_resources/__init__.py", line 519, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2630, in load_entry_point
return ep.load()
File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2310, in load
return self.resolve()
File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2316, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/elyzer/__main__.py", line 3, in <module>
from elyzer import stepWise, getAnalyzer
ImportError: cannot import name 'stepWise'
I've had this idea ever since my first GET _analyze in Kibana dev tools: why don't we have the tokens visualized?
This could be useful for devs working with Kibana dev tools.
I've tried creating Kibana plugins over the weekend; it seems like no easy feat. If anyone has useful resources, that would be great.