
LookupAnalyzerDisambiguator

Introduction

A tool for Turkish language processing that takes Turkish tokens as input (text tokenized into units such as words and punctuation marks) and outputs their disambiguated morphological analyses. Since Turkish is morphologically rich, morphological analysis is required for many tasks, including POS tagging and dependency parsing.

In our solution, we implement a simple morphological analyzer based on stem and suffix dictionaries. Using this simple morphological analyzer, all possible analyses of each token are generated. A neural network (specifically, a bidirectional character-based LSTM) is implemented with the DyNet library and trained to select the correct morphological analysis among all candidates, based on the context in which each word appears. The network architecture is similar to the one used in Shen et al.'s study.
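The lookup step can be sketched as follows. This is a minimal illustration only: the stems, suffixes, and tag strings below are toy examples chosen to mirror the responses shown later, not the actual dictionaries shipped with the repo.

```python
# Toy stem + suffix lookup: generate all candidate analyses of a token.
# The dictionaries below are illustrative; the real tool uses much larger
# stem and suffix lexicons.

STEMS = {
    "al": "al+Verb",      # "take"
    "alın": "alın+Noun",  # "forehead"
    "yazı": "yazı+Noun",  # "writing"
}

SUFFIXES = {
    "": ["+A3sg+Pnon+Nom"],    # bare noun reading
    "ın": ["+Pos+Imp+A2pl"],   # e.g. second-person-plural imperative
    "sı": ["+A3sg+P3sg+Nom"],  # third-person-singular possessive
}

def candidate_analyses(token):
    """Split the token at every position into a (stem, suffix) pair and
    emit one analysis string per pair found in both dictionaries."""
    analyses = []
    for i in range(1, len(token) + 1):
        stem, suffix = token[:i], token[i:]
        if stem in STEMS and suffix in SUFFIXES:
            for tags in SUFFIXES[suffix]:
                analyses.append(STEMS[stem] + tags)
    return analyses
```

For example, `candidate_analyses("alın")` yields both the noun reading (`alın+Noun+A3sg+Pnon+Nom`) and the verb reading (`al+Verb+Pos+Imp+A2pl`); the trained network then picks one of the candidates in context, as the example requests below show.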

Performance

Although we do not use a complex morphological analyzer, as most studies do, our results are competitive with state-of-the-art morphological disambiguators (96–97% accuracy).

We will report comprehensive evaluation results soon.

Usage

Just build a docker image using Dockerfile in the repo:

docker build --tag turkish-tagger .

Then start a Docker container from the image built in the previous step:

docker run -p 8081:8081 -d turkish-tagger

If everything goes well, the container starts serving a web application. You can then send a POST request to localhost:8081/analyze to morphologically analyze a Turkish sentence.

Example requests and responses

Request 1:

curl --request POST \
  --url http://localhost:8081/analyze \
  --header 'content-type: application/json' \
  --data '{
    "tokens" : [
        "alın",
        "yazısı"
    ]}'

Response 1:

[
    "alın+Noun+A3sg+Pnon+Nom",
    "yazı+Noun+A3sg+P3sg+Nom"
]

Request 2:

curl --request POST \
  --url http://localhost:8081/analyze \
  --header 'content-type: application/json' \
  --data '{
    "tokens" : [
        "gelirken",
        "ekmek",
        "alın",
        "."
     ]}'

Response 2:

[
    "gelir+Noun+A3sg+Pnon+Nom^DB+Verb+Zero^DB+Adverb+While",
    "ekmek+Noun+A3sg+Pnon+Nom",
    "al+Verb+Pos+Imp+A2pl",
    ".+Punc"
]
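The same requests can be sent from Python. Here is a minimal client sketch using only the standard library; it assumes the container from the Usage section is running on localhost:8081 (the function name `analyze` is our own, not part of the repo).

```python
import json
from urllib import request

def analyze(tokens, url="http://localhost:8081/analyze"):
    """POST a token list to the /analyze endpoint and return the
    disambiguated analyses (a list of strings, one per token)."""
    body = json.dumps({"tokens": tokens}).encode("utf-8")
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires the server to be up):
# analyze(["gelirken", "ekmek", "alın", "."])
```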

Notes

Please email me to ask for permission to use this tool. Also note that this is not a release version and may contain bugs. Every contribution is welcome.

My advisor and I are still working on this as part of my PhD thesis. Expect better accuracies soon :)

Contributors

erayyildiz

Issues

Parameters are not matching

Hi, I've tried to run turkish-tagger on Windows and Ubuntu. The first output was something like this:

e7d45b3b4fae652e5a88fd91a88e1f155520da4075de1fc71655c91028510f11

The Docker container exited after this output.

When I changed the run command by removing the -d flag and adding -it, I got the following error:

[dynet] random seed: 403473904
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
2019-07-16 08:07:56,997 - /usr/src/app/src/models.py - INFO - 112} - Loading Pre-Trained Model
Traceback (most recent call last):
  File "src/run.py", line 25, in <module>
    class APIHandler(BaseHandler):
  File "src/run.py", line 27, in APIHandler
    _morph_anlyzer = AnalysisScorerModel.create_from_existed_model("lookup_disambiguator_wo_suffix")
  File "/usr/src/app/src/models.py", line 326, in create_from_existed_model
    return AnalysisScorerModel(train_from_scratch=False, model_file_name=model_name)
  File "/usr/src/app/src/models.py", line 114, in __init__
    self.load_model(model_file_name, char_representation_len, word_lstm_rep_len)
  File "/usr/src/app/src/models.py", line 322, in load_model
    self.model.populate("resources/models/" + model_name + ".model")
  File "_dynet.pyx", line 1022, in _dynet.ParameterCollection.populate
  File "_dynet.pyx", line 1077, in _dynet.ParameterCollection.populate_from_textfile
RuntimeError: Dimensions of parameter /vanilla-lstm-builder/_0 looked up from file ({512,128}) do not match parameters to be populated ({512,64})

How can I solve this problem?
