
itu-turkish-nlp-pipeline-caller's Introduction

As I no longer have time to maintain this project, I am looking for collaborators to help maintain it. You can sign up by sending a pull request that fixes a bug or adds a feature.

ITU Turkish NLP Pipeline Caller


A Python 3 wrapper that simplifies calling the ITU Turkish NLP Pipeline API.

For details of the pipeline, please check the pipeline page and the sources below.

Gülşen Eryiğit. ITU Turkish NLP Web Service. EACL, 2014.

Gülşen Eryiğit, Joakim Nivre, and Kemal Oflazer. Dependency Parsing of Turkish. Computational Linguistics, 34(3), 2008.

Usage

To use the pipeline, you need an authentication token (details are on the API web page).

If you experience any problems, please contact me via the gitter chat room.

Setup

This repository is tested with Python 3.4, 3.5, and 3.6, but using the most recent version is recommended.

Recommended way

Install from PyPI by running pip3 install ITU-Turkish-NLP-Pipeline-Caller

Alternative way

Download the latest release, extract the archive, and run python3 ./setup.py install inside the extracted directory.

As a Command Line Tool

By default, the tool reads the token from the pipeline.token file located in the same directory as the tool.

Running pipeline_caller <filename> reads the input file and writes the result under ./output/output<system_time>

Select the pipeline tool with the -t option: pipeline_caller <filename> --tool <tool_name>. The default is "pipelineNoisy".

Force the encoding for I/O with the -e option: pipeline_caller <filename> -e <encoding>. The default is your system locale.

Switch the processing type with the -p option. Input text can be processed whole at once, sentence by sentence, or word by word. Some tools in the pipeline (isturkish, for example) currently require word-by-word processing. The default is whole at once. Example: pipeline_caller <filename> --tool isturkish -p word sends the input text to the isturkish tool word by word.

Change the output directory with the -o option: pipeline_caller <filename> -o <another_directory>. The default is "output".

Also pipeline_caller --help shows the help menu.
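The three processing types of the -p option can be pictured with a small sketch. This is not the tool's actual code: the function name split_units and the mode names are assumptions made for illustration, and only the delimiter class is taken from the DEFAULTS block of this README.

```python
import re

# Delimiter class as listed in the DEFAULTS block of this README.
SENTENCE_DELIMITERS = r"[\.\?:;!]"


def split_units(text, mode="whole"):
    """Return the pieces of text that would each be sent in one API call.

    Hypothetical helper: the real tool may split differently.
    """
    if mode == "whole":
        units = [text]
    elif mode == "sentence":
        # re.split drops the delimiter characters themselves.
        units = re.split(SENTENCE_DELIMITERS, text)
    elif mode == "word":
        # Whitespace split; punctuation stays attached to the word.
        units = text.split()
    else:
        raise ValueError("unknown mode: %r" % mode)
    # Discard empty pieces, such as the one after a final delimiter.
    return [unit.strip() for unit in units if unit.strip()]
```

For example, split_units("Merhaba. Nasilsin?", "sentence") yields two units, while the "whole" mode returns the text as a single unit.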

Using As a Module

import pipeline_caller

caller = pipeline_caller.PipelineCaller()

result = caller.call(<tool_name>, <text>, <api_token>)

Defaults (Optional)

Check the DEFAULTS block in the source code if you need to change one of these (generally, you don't):

api_url = "http://tools.nlp.itu.edu.tr/SimpleApi"

pipeline_encoding = 'UTF-8'

token_path = "pipeline.token" for command line tool

default_output_dir = "output"

default_enconding = locale.getpreferredencoding(False) default encoding in your OS, for I/O operations in command line tool

default_sentence_split_delimiter_class = "[\.\?:;!]" for command line tool, to separate sentences and process sentence by sentence

Special Thanks

Special thanks to Asst. Prof. Dr. Peter Schüller for his great suggestions!

Author, Copyright & License

This work was a part of a KnowLP research project.

Copyright 2015-2018 Maintainers:

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

itu-turkish-nlp-pipeline-caller's People

Contributors

0xferit, bitdeli-chef, jacobcward, ulgens


itu-turkish-nlp-pipeline-caller's Issues

Piping Calls

To obtain NER outputs, it's necessary to call some modules serially.

Example: --tool T1 T2 T3 would be useful.
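A minimal sketch of the requested behaviour, assuming that each tool's output is acceptable input for the next one (which this issue does not confirm). The name chain_tools and the call parameter are hypothetical; call stands for any function taking (tool_name, text), for instance a thin wrapper around PipelineCaller.call with the token already filled in.

```python
def chain_tools(call, tool_names, text):
    """Run the tools in order, feeding each output into the next tool.

    call is any callable of the form call(tool_name, text) -> str.
    """
    result = text
    for tool_name in tool_names:
        result = call(tool_name, result)
    return result


# Stand-in for the remote API so the sketch runs offline:
# it only records which tool handled the text.
def fake_call(tool_name, text):
    return "%s(%s)" % (tool_name, text)


chained = chain_tools(fake_call, ["T1", "T2", "T3"], "input")
```

With the real module, call could be defined as lambda tool, text: caller.call(tool, text, api_token).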

Parallel requests not permitted

I get this error message when I call 'pipelineNoisy':

Parallel requests not permitted

I can confirm that this is not caused by a change in the caller: the pipeline caller was working as expected before, and no changes were made on our side.

Word-by-word Processing

Some API tools (like isturkish) process only a single word per call, even if a whole sentence is sent. To process a whole sentence, we need to split it into words, call the API once per word, and merge the outputs, as in sentence-by-sentence processing.
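The split, call, and merge steps could look roughly like this. It is a sketch under assumptions: process_word_by_word is a hypothetical name, call is any function taking (tool_name, text), and joining the outputs with newlines is a guess at a sensible merge format.

```python
def process_word_by_word(call, tool_name, sentence):
    """Send each whitespace-separated word separately and merge the outputs."""
    outputs = [call(tool_name, word) for word in sentence.split()]
    # Assumed merge format: one output per line, in input order.
    return "\n".join(outputs)


# Offline stand-in for the API, used only to show the data flow.
def fake_call(tool_name, word):
    return "%s:%s" % (tool_name, word)


merged = process_word_by_word(fake_call, "isturkish", "bu bir deneme")
```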

Call for Collaborators

As I no longer have time to maintain this project, I am looking for volunteers to help maintain it. To sign up, send a pull request that fixes a bug, corrects something you find missing or wrong, or adds a feature.

Empty Sentence (split('.'))

After splitting, the last sentence is empty. This causes an unhandled empty-parameter error, and the error is printed into the output.

Increasing Verbosity

Processing sometimes hangs for minutes, and we cannot tell whether it is stuck or still running.
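One possible approach is to log a line before and after every request, so a long pause can be attributed to a specific call. The sketch below uses the standard logging module; call_with_progress and the logger name are hypothetical, and call is any function taking (tool_name, text).

```python
import logging
import time

logger = logging.getLogger("pipeline_caller_sketch")


def call_with_progress(call, tool_name, units):
    """Call the tool once per unit and log progress around each request."""
    results = []
    total = len(units)
    for index, unit in enumerate(units, start=1):
        started = time.monotonic()
        logger.info("request %d/%d sent (%d characters)", index, total, len(unit))
        results.append(call(tool_name, unit))
        elapsed = time.monotonic() - started
        logger.info("request %d/%d finished in %.1f s", index, total, elapsed)
    return results
```

Calling logging.basicConfig(level=logging.INFO) once at startup makes these messages visible on the console.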

Warnings

Some API tools need to be called word by word to return correct results. We need to add warnings about this to the argument parser.

Token file - Defensive Coding

Defensive coding is needed when parsing the token from the token file.

For example, the program accepts token but fails when the file contains token\n.

We need to strip whitespace, newlines, and similar characters.
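A minimal sketch of the defensive read described here. The function name read_token is hypothetical; the default path is the token_path value from the DEFAULTS block, and UTF-8 is assumed for the token file.

```python
def read_token(path="pipeline.token"):
    """Read the API token and remove surrounding whitespace and newlines."""
    with open(path, encoding="utf-8") as token_file:
        token = token_file.read().strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise ValueError("token file %r is empty" % path)
    return token
```

With this, a file containing token\n and a file containing token yield the same value.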

Typo

Typo in the help menu: "separate", not "seperate".

Setup.py

We need a setup script to install the module.
