
elyzer's Issues

Don't require --text for the text

Since the sole purpose of the tool is to analyze text, I think --text isn't necessary; it should simply accept all unparsed arguments as the text:
elyzer --index myIndex --analyzer standard text to analyze

Currently, the text needs to be prefixed with --text and quoted:
elyzer --index myIndex --analyzer standard --text "text to analyze"

The following will fail or behave unexpectedly:

  • ... --text text to analyze => fails with unrecognized arguments: to analyze
  • ... --text text --text to --text analyze => only analyzes the last one, silently dropping the first two
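A minimal argparse sketch of the proposed behaviour (the argument names are illustrative, not elyzer's actual parser): a positional argument with `nargs="+"` collects every remaining word as the text, so quoting becomes unnecessary.

```python
import argparse

# Sketch: collect all trailing words as the text to analyze,
# so no quoting or --text flag is needed.
parser = argparse.ArgumentParser(prog="elyzer")
parser.add_argument("--index", required=True)
parser.add_argument("--analyzer", required=True)
parser.add_argument("text", nargs="+", help="the text to analyze")

args = parser.parse_args(
    ["--index", "myIndex", "--analyzer", "standard", "text", "to", "analyze"])
text = " ".join(args.text)  # joined back into a single string for _analyze
```

With `nargs="+"` argparse also produces a clear error when no text is given at all, instead of silently dropping arguments.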

Unicode support

Great work! Also, it would be nice if this supported Unicode text input (example: café). See the error below:

➜  elyzer git:(master) python __main__.py --es "http://localhost:9200" --index my_index --analyzer my_analyzer "café"
TOKENIZER: kuromoji_tokenizer
Traceback (most recent call last):
  File "__main__.py", line 47, in <module>
    main()
  File "__main__.py", line 36, in main
    es=es))
  File "/Users/toiwa/Projects/Private/elyzer/elyzer/elyzer.py", line 72, in stepWise
    analyzeResp = es.indices.analyze(index=indexName, body=body)
  File "/Library/Python/2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/elasticsearch/client/indices.py", line 32, in analyze
    '_analyze'), params=params, body=body)
  File "/Library/Python/2.7/site-packages/elasticsearch/transport.py", line 284, in perform_request
    body = self.serializer.dumps(body)
  File "/Library/Python/2.7/site-packages/elasticsearch/serializer.py", line 50, in dumps
    raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: ({'text': 'caf\xc3\xa9', 'char_filter': [], 'tokenizer': u'kuromoji_tokenizer'}, UnicodeDecodeError('ascii', '"caf\xc3\xa9"', 4, 5, 'ordinal not in range(128)'))
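The traceback shows a Python 2 byte string (`'caf\xc3\xa9'`) being handed to the JSON serializer, which then trips over the ASCII default codec. A minimal sketch of a fix (a hypothetical helper, not elyzer's actual code): decode each raw argument to unicode before building the request body.

```python
def to_text(arg):
    # On Python 2, sys.argv entries are UTF-8 encoded bytes; decode them
    # to unicode so the JSON serializer never falls back to ASCII.
    # On Python 3 they are already str, so pass them through unchanged.
    if isinstance(arg, bytes):
        return arg.decode("utf-8")
    return arg

# Simulate the Python 2 case with the exact bytes from the traceback:
body = {"text": to_text(b"caf\xc3\xa9"), "tokenizer": "kuromoji_tokenizer"}
```

Once `body["text"]` is proper unicode, it serializes to JSON cleanly.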

No Support For Authentication

Currently this can only be used with an unsecured instance; could basic authentication please be incorporated?

{'error': {'root_cause': [{'type': 'security_exception', 'reason': 'missing authentication credentials for REST request [/my_index/_settings]', 'header': {'WWW-Authenticate': 'Basic realm="security" charset="UTF-8"'}}], 'type': 'security_exception', 'reason': 'missing authentication credentials for REST request [/my_index/_settings]', 'header': {'WWW-Authenticate': 'Basic realm="security" charset="UTF-8"'}}, 'status': 401}
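The `WWW-Authenticate: Basic` header in the 401 response shows the cluster expects HTTP Basic auth (older elasticsearch-py clients accept credentials via an `http_auth` argument to the client constructor; recent 8.x clients call it `basic_auth`). As a sketch of what is actually sent on the wire, with placeholder credentials:

```python
import base64

def basic_auth_header(user, password):
    # Build the HTTP Basic Authorization header that a secured
    # Elasticsearch instance expects. Credentials are placeholders.
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}

headers = basic_auth_header("user", "secret")
```

Passing credentials through to the underlying client would resolve the `security_exception` above.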

Visualize analysis

I've had this idea ever since my first GET _analyze on Kibana dev-tools - why don't we have the tokens visualized?

This could be useful for developers working with Kibana dev-tools.

I tried creating Kibana plugins over the weekend; it seems like no easy feat. If anyone has useful resources, that would be great.

Support environment variables

Since providing an index and an analyzer is required, it would be nice to support environment variables so that defaults can be set in the current shell environment.

Some suggestions:

  • ELYZER_ROOT_URL=http://server:9123 => same as --es http://server:9123
  • ELYZER_INDEX=myIndex => same as --index myIndex
  • ELYZER_ANALYZER=myAnalyzer => same as --analyzer myAnalyzer

The environment variables would be the fallback if the appropriate argument is not passed to elyzer. I.e. command line arguments "always win".

Together with #6 this would make a good fit to make this a quick and handy tool:

$ export ELYZER_ROOT_URL=http://server:9123
$ export ELYZER_INDEX=myIndex
$ export ELYZER_ANALYZER=myAnalyzer
$ elyzer analyze this text for me thank you
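A sketch of the proposed fallback behaviour with argparse; the `ELYZER_*` variable names follow the suggestion above and are not implemented in elyzer today.

```python
import argparse
import os

# Simulate `export ELYZER_INDEX=myIndex` in the shell:
os.environ.setdefault("ELYZER_INDEX", "myIndex")

# The env var becomes the argparse default, so an explicit
# command-line argument always wins over the environment.
parser = argparse.ArgumentParser(prog="elyzer")
parser.add_argument("--index", default=os.environ.get("ELYZER_INDEX"))

from_env = parser.parse_args([]).index                     # env fallback applies
from_flag = parser.parse_args(["--index", "other"]).index  # explicit flag wins
```

A required-argument check would then only fire when neither the flag nor the environment variable is set.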

Support for built-in analyzers

It would be nice if some of the built-in analyzers were also supported, mainly the following, which don't require any specific configuration:

  • simple
  • whitespace
  • keyword
  • stop
  • standard
  • snowball
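For reference, the `_analyze` API already accepts built-in analyzer names directly; a request body like this sketch (any name from the list above) needs no index-level settings at all:

```python
# A built-in analyzer can be named directly in the _analyze request body;
# no index-specific configuration is required.
body = {"analyzer": "whitespace", "text": "text to analyze"}
```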

NameError: name 'unicode' is not defined

$ elyzer --es foo --index bar --analyzer my_analizer 'xxx'
Traceback (most recent call last):
  File "/usr/bin/elyzer", line 9, in <module>
    load_entry_point('elyzer==1.1.0', 'console_scripts', 'elyzer')()
  File "/usr/lib/python3.4/site-packages/elyzer/__main__.py", line 42, in main
    es=es))
  File "/usr/lib/python3.4/site-packages/elyzer/elyzer.py", line 36, in getAnalyzer
    normalizeAnalyzer(analyzer)
  File "/usr/lib/python3.4/site-packages/elyzer/elyzer.py", line 19, in normalizeAnalyzer
    if (isinstance(analyzer['char_filter'], str) or isinstance(analyzer['char_filter'], unicode)):
NameError: name 'unicode' is not defined

Has anyone else run into this problem?
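Python 3 removed the `unicode` built-in (all `str` values are unicode there), so the `isinstance` check in the traceback only works on Python 2. A minimal compatibility shim would look like this:

```python
# Python 2 has both str and unicode; Python 3 removed the `unicode`
# built-in, so referencing it raises NameError there.
try:
    string_types = (str, unicode)  # Python 2
except NameError:
    string_types = (str,)          # Python 3

def is_string(value):
    return isinstance(value, string_types)
```

The `char_filter` check in `normalizeAnalyzer` could then call `is_string(...)` instead of naming `unicode` directly.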

Process fails since normalizers have been introduced

In order to show the state of the tokens at each step of the analysis process, elyzer performs an analyze query for each stage: first char filters, then tokenizer, and finally token filters.

The problem is that on the ES side, the analysis process has changed since normalizers were introduced. ES assumes that if the request contains no normalizer, no analyzer, and no tokenizer, but does contain either a token filter or a char_filter, then the analyze request should behave like a normalizer.

In the case shown here, elyzer will first perform a request for the html_strip character filter, and ES will treat it as a normalizer request, hence the error, since html_strip is not a valid char_filter for normalizers.

import problem

Hey, I'm getting some kind of import problem. Maybe you can help?

Running Python 3.4.3 through pyenv.

Here's what I got after doing pip install elyzer:

elyzer --es "http://localhost:9200" --index companies --analyzer services_analyzer --text "coding"
Traceback (most recent call last):
  File "/Users/alexchoi/.pyenv/versions/3.4.3/bin/elyzer", line 9, in <module>
    load_entry_point('elyzer==0.1.1', 'console_scripts', 'elyzer')()
  File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/pkg_resources/__init__.py", line 519, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2630, in load_entry_point
    return ep.load()
  File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2310, in load
    return self.resolve()
  File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/pkg_resources/__init__.py", line 2316, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/Users/alexchoi/.pyenv/versions/3.4.3/lib/python3.4/site-packages/elyzer/__main__.py", line 3, in <module>
    from elyzer import stepWise, getAnalyzer
ImportError: cannot import name 'stepWise'

Error

Hi, I got the error below while running this command. I am using Elasticsearch version 5.0.1.

Command

elyzer --es "http://localhost:9200" --index index_name --analyzer analyzer_name --text "some string"

Error:

CHAR_FILTER: special_char_filter
Traceback (most recent call last):
  File "/usr/local/bin/elyzer", line 9, in <module>
    load_entry_point('elyzer==0.2.2', 'console_scripts', 'elyzer')()
  File "/usr/local/lib/python2.7/dist-packages/elyzer/__main__.py", line 30, in main
    es=es))
  File "/usr/local/lib/python2.7/dist-packages/elyzer/elyzer.py", line 65, in stepWise
    char_filters=",".join(charFiltersInUse))
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/indices.py", line 29, in analyze
    '_analyze'), params=params, body=body)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 329, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 106, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'request [/autocomplete_ahs_index/_analyze] contains unrecognized parameter: [char_filters] -> did you mean [char_filter]?')
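The 400 response itself points at the fix: Elasticsearch 5.x accepts only the singular `char_filter` key, and expects it in the request body rather than as a `char_filters` query-string parameter. A sketch of the corrected request shape (the `keyword` tokenizer here is illustrative, per the normalizer workaround discussed above):

```python
# ES 5.x request shape: char filters go in the body under the singular
# key `char_filter`, not the pre-5.x `char_filters` query parameter.
body = {
    "text": "some string",
    "char_filter": ["special_char_filter"],
    "tokenizer": "keyword",
}
```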
