openodia's Introduction

Python 3.9 | Code coverage | License: MIT

  • openodia is a Python package that provides various tools for the Odia language.
  • The short-term goal of this package is not to build state-of-the-art methods but to provide tools that work.

Install

  • Install Python 3.9 or later; newer versions should work.
  • The library is tested on Python 3.9.
pip install openodia
  • To build directly from source, clone the repo and run setup.py from inside it.
git clone https://github.com/soumendrak/openodia.git
cd openodia
python setup.py install

Usage and Documentation

For usage and further documentation please visit the Documentation page.

License

openodia's People

Contributors

a-parida12, dependabot[bot], soumendrak

openodia's Issues

Update the Release Action

The current release action is based on an official action implementation for semrel and is buggy when triggered.

Add dictionary corpus into the library

Is your feature request related to a problem? Please describe.
Add the English to Odia dictionary corpus into the library.

Describe the solution you'd like
The dictionary can be found in this Kaggle link.

Describe alternatives you've considered
No alternatives yet.

Additional context

  • We can further improve it by adding a parallel sentence-pair corpus from the MTEnglish2Odia repo.
  • The corpus can live in a corpus folder inside openodia, and we will import it in the __init__.py file.
  • Its import signature should look like:
from openodia import en2or_dictionary
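
One way the corpus could be loaded is sketched below. The JSON layout (a flat English-to-Odia mapping) and the `load_dictionary` helper are assumptions made for illustration; they are not part of openodia, and the Kaggle data may need a different parser.

```python
import json
from pathlib import Path


def load_dictionary(path):
    """Load an English-to-Odia mapping from a JSON file (assumed layout)."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Normalise keys so that lookups are case-insensitive.
    return {english.strip().lower(): odia for english, odia in data.items()}
```

In `__init__.py`, a name such as `en2or_dictionary` could then be bound to the result of calling this helper on a file inside the corpus folder; the exact file name is left open here.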

Update Code Cov CI Test

Warning dump from the CI job:

"Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16: codecov/[email protected]. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/."

"Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16: actions/checkout@v2, actions/setup-python@v2, actions/upload-artifact@v2. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/."

Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16: actions/checkout@v2, github/codeql-action/init@v1, github/codeql-action/autobuild@v1, github/codeql-action/analyze@v1. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/

Add Stemming to tokens

Is your feature request related to a problem? Please describe.

  • Stemming is the process of reducing inflected words to their root form.

Describe the solution you'd like

  • In Odia, many letters, suffixes, and prefixes can be removed to bring words to their root form and reduce variance.
  • A few of these are:
"ଉଛ",
"ଉଛି",
"ଉଥିଲା",
"ଉଥିବ",
"ଉଥିବି",
"ଅଛ",
"ଅଛନ୍ତି"
  • After word tokenization, strip these suffixes from the right end of each token.
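
A minimal sketch of this suffix stripping follows. Note that Python's `str.rstrip` removes a set of characters rather than a whole suffix, so the sketch uses `endswith` and slicing instead. The names `stem` and `stem_tokens` are hypothetical and not part of openodia, and the rule of leaving a token alone when it consists only of a suffix is an added assumption.

```python
# Suffixes taken from the list above; a real stemmer would need many more.
SUFFIXES = ("ଉଛ", "ଉଛି", "ଉଥିଲା", "ଉଥିବ", "ଉଥିବି", "ଅଛ", "ଅଛନ୍ତି")

# Try longer suffixes first so that a short suffix never shadows a longer one.
_ORDERED = sorted(SUFFIXES, key=len, reverse=True)


def stem(token):
    """Strip at most one known suffix from the end of a token."""
    for suffix in _ORDERED:
        # Leave the token unchanged if the suffix is the whole word.
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def stem_tokens(tokens):
    """Apply stem() to every token produced by the word tokenizer."""
    return [stem(token) for token in tokens]
```

Stripping only one suffix per token keeps the behaviour predictable; whether repeated stripping is desirable for Odia would need to be checked against real data.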

Reference

Word tokenizer is removing punctuations

Tokenizing words removes punctuation from the text.

Expected:

>>> from openodia import ud
>>> ud.word_tokenizer("ଭାରତୀୟ ସର୍ବୋଚ୍ଚ ନ୍ୟାୟାଳୟ, ଭାରତର ଉଚ୍ଚତମ ନ୍ୟାୟିକ ଅନୁଷ୍ଠାନ ଅଟେ ।")
[
"ଭାରତୀୟ", 
"ସର୍ବୋଚ୍ଚ", 
"ନ୍ୟାୟାଳୟ", 
",", 
"ଭାରତର", 
"ଉଚ୍ଚତମ", 
"ନ୍ୟାୟିକ", 
"ଅନୁଷ୍ଠାନ", 
"ଅଟେ",
"।"
]

However, the comma and purnnachheda (।) symbols are currently missing from the tokenized text:

Actual:

>>> from openodia import ud
>>> ud.word_tokenizer("ଭାରତୀୟ ସର୍ବୋଚ୍ଚ ନ୍ୟାୟାଳୟ, ଭାରତର ଉଚ୍ଚତମ ନ୍ୟାୟିକ ଅନୁଷ୍ଠାନ ଅଟେ ।")
[
"ଭାରତୀୟ", 
"ସର୍ବୋଚ୍ଚ", 
"ନ୍ୟାୟାଳୟ", 
"ଭାରତର", 
"ଉଚ୍ଚତମ", 
"ନ୍ୟାୟିକ", 
"ଅନୁଷ୍ଠାନ", 
"ଅଟେ"
]

Difference:

[
"ଭାରତୀୟ", 
"ସର୍ବୋଚ୍ଚ", 
"ନ୍ୟାୟାଳୟ", 
- ",", 
"ଭାରତର", 
"ଉଚ୍ଚତମ", 
"ନ୍ୟାୟିକ", 
"ଅନୁଷ୍ଠାନ", 
"ଅଟେ",
- "।"
]
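
One possible way to keep punctuation as separate tokens is sketched below. This is not openodia's implementation: the function name `tokenize_keep_punctuation` and the punctuation set are assumptions chosen for illustration.

```python
import re

# Punctuation to keep as separate tokens, including the Odia purnnachheda.
PUNCTUATION = ",.!?;:।"

# Match either a single punctuation mark or a run of characters that are
# neither whitespace nor punctuation.
_TOKEN_RE = re.compile(
    "[{p}]|[^\\s{p}]+".format(p=re.escape(PUNCTUATION))
)


def tokenize_keep_punctuation(text):
    """Split on whitespace and emit each punctuation mark as its own token."""
    return _TOKEN_RE.findall(text)
```

The pattern deliberately avoids `\w`, because Odia vowel signs are combining marks that `\w` does not match in Python, which would split words in the wrong places.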

Let the library hit offline dictionary first before hitting Google Translate API

Is your feature request related to a problem? Please describe.

  • Currently, translation calls the Google Translate API directly.

Describe the solution you'd like

  • We should reduce latency as much as possible by checking the offline dictionary first and calling the Google Translate API only when the word is not found there.
  • In this way, the dictionary acts as a cache.
  • This depends on #6.
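
The lookup order described above could be sketched as follows. `translate_with_fallback` is a hypothetical helper, and `online_translate` stands for any callable that wraps the Google Translate request; neither is an existing openodia function. Writing the online result back into the dictionary is an assumption about how the cache might behave.

```python
def translate_with_fallback(word, dictionary, online_translate):
    """Return (translation, source): try the local dictionary, then the API."""
    key = word.strip().lower()
    if key in dictionary:
        return dictionary[key], "offline"
    # Only reach for the network when the local lookup misses.
    translation = online_translate(word)
    # Remember the answer so that repeated lookups stay offline.
    dictionary[key] = translation
    return translation, "online"
```

Passing the online translator in as a parameter keeps the function testable without network access.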

Describe alternatives you've considered

  • Open for alternatives.

Error

I wrote Python code using the openodia library.
When I try to convert "Three Hundred and Twenty One" using other_lang_to_odia(), it returns only "Three Hundred and Twenty". "One" is dropped, and I have no idea why.
Any clues?
