Comments (4)
This is not the right way to open an issue.
You should:
- Specify your use case.
- Share your code snippet.
- Share an example input and the desired output.
from indictrans2.
I understand the need for clarity when opening an issue. Here are the details you requested:
- I am using the IndicTrans2 model to translate English to Odia.
- Example: For some translations, the model generates the character ଯ଼ in its output, which does not exist in the Odia language. ଯ with a dot under it is not an Odia character and should be replaced with ୟ.
input: Local media reports an airport fire vehicle rolled over while responding.
output: ସ୍ଥାନୀଯ଼ ଗଣମାଧ୍ଯ଼ମ ରିପୋର୍ଟ କରିଛି ଯେ ପ୍ରତିକ୍ରିଯ଼ା ଦେବା ସମଯ଼ରେ ଏକ ବିମାନ ବନ୍ଦର ଅଗ୍ନିଶମ ଗାଡି ଓଲଟି ପଡ଼ିଥିଲା।
input: British newspaper The Guardian suggested Deutsche Bank controlled roughly a third of the 1200 shell companies used to accomplish this.
output: ବ୍ରିଟିଶ ଖବରକାଗଜ ଦି ଗାର୍ଡିଆନ୍ ପରାମର୍ଶ ଦେଇଛି ଯେ ଡଏଚ୍ ବ୍ଯ଼ାଙ୍କ 1200 ଟି ନକଲି କମ୍ପାନୀ ମଧ୍ଯ଼ରୁ ପ୍ରାଯ଼ ଏକ ତୃତୀଯ଼ାଂଶକୁ ନିଯ଼ନ୍ତ୍ରଣ କରିଥିଲା।
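The two forms in the outputs above differ only at the code point level, so they can be told apart programmatically. The following is a minimal sketch using only the Python standard library; the helper name `find_ya_nukta` is hypothetical and not part of IndicTrans2 or the IndicNLP library.

```python
import unicodedata

# The unwanted sequence: ORIYA LETTER YA followed by ORIYA SIGN NUKTA.
YA_NUKTA = "\u0b2f\u0b3c"
# The expected single character: ORIYA LETTER YYA.
YYA = "\u0b5f"


def find_ya_nukta(text):
    """Return the start index of every YA + NUKTA sequence in text."""
    positions = []
    start = text.find(YA_NUKTA)
    while start != -1:
        positions.append(start)
        start = text.find(YA_NUKTA, start + len(YA_NUKTA))
    return positions


# Print the Unicode names so the difference is visible in a terminal.
print([unicodedata.name(ch) for ch in YA_NUKTA])
print(unicodedata.name(YYA))

# A short test string: one Odia letter followed by the unwanted sequence.
sample = "\u0b38" + YA_NUKTA
print(find_ya_nukta(sample))
```

Running a check like this over a batch of translations would show how often the sequence appears before deciding where to apply a fix.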
Please check the following commit; this has been resolved if you use our inference pipeline.
For a short-term fix, please make the necessary changes on your end.
For a permanent solution, the Unicode transliterator for Oriya in the IndicNLP library needs to be debugged, or a similar workaround can be implemented there as well.
Feel free to open a PR.
Thanks for the quick response.
I tried these changes, but they did not solve the issue for me because the replacement runs before the transliterator. It works when I apply the replacement after the transliterator.
Please find the code below:
if lang == "eng_Latn":
    for sent in sents:
        postprocessed_sents.append(self.en_detok.detokenize(sent.split(" ")))
else:
    for sent in sents:
        outstr = indic_detokenize.trivial_detokenize(
            self.xliterator.transliterate(sent, flores_codes[common_lang], flores_codes[lang]), flores_codes[lang]
        )
        # Oriya bug: indic-nlp-library produces ଯ଼ instead of ୟ when converting from Devanagari to Odia
        # TODO: Find out what's the issue with unicode transliterator for Oriya and fix it
        if lang_code == "ory":
            outstr = outstr.replace("ଯ଼", 'ୟ')
        postprocessed_sents.append(outstr)
return postprocessed_sents
Thanks once again for the solution.
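For anyone who cannot edit the pipeline code directly, the same replacement could be applied as a separate step on the final translated strings. This is a minimal sketch, not the project's implementation: the function name `fix_odia_yya` is hypothetical, and it assumes language tags of the form `ory_Orya`, with the language code before the underscore.

```python
YA_NUKTA = "\u0b2f\u0b3c"  # ORIYA LETTER YA + ORIYA SIGN NUKTA (unwanted)
YYA = "\u0b5f"             # ORIYA LETTER YYA (expected)


def fix_odia_yya(sentences, lang):
    """Replace YA + NUKTA with YYA in already-transliterated Odia output.

    `lang` is assumed to look like "ory_Orya"; other languages pass through
    unchanged.
    """
    lang_code = lang.split("_")[0]
    if lang_code != "ory":
        return list(sentences)
    return [sentence.replace(YA_NUKTA, YYA) for sentence in sentences]


# Usage: run this on the final output, after transliteration and detokenization.
translated = ["\u0b38" + YA_NUKTA]
print(fix_odia_yya(translated, "ory_Orya"))
```

Keeping the replacement after transliteration matters here because, as described above, the unwanted sequence is produced by the transliteration step itself.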