danielvarga / hunalign
Sentence aligner
License: GNU Lesser General Public License v3.0
Hi,
I'm trying to align an Arabic-English text corpus using the following command:
src/hunalign/hunalign -text -hand=hand_align_file dict.txt ar.txt en.txt
I don't have a dictionary, so dict.txt is an empty text file. When I run it, I get this error:
Reading dictionary...
20 hungarian sentences read.
0 english sentences read.
Sizes differing too much. Ignoring files to avoid a rare loop bug.
Am I missing something? Why aren't the English sentences read? Do I need to provide language codes when working with a language other than Hungarian?
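Before rerunning, it can help to check what each input file actually contains. The following is a minimal Python sketch, not part of hunalign. It assumes both files are UTF-8 and sentence-segmented with one sentence per line, and that lines consisting only of `<p>` are paragraph markers to skip. It counts the sentences in each file and reports the ratio of the larger count to the smaller one. A count of 0 suggests the file is empty, the path is wrong, or the content is not in the expected format. The function names `count_sentences` and `size_ratio` are hypothetical.

```python
import sys
from pathlib import Path


def count_sentences(path: str) -> int:
    """Count sentence lines, assuming one sentence per line in UTF-8.

    Blank lines and "<p>" paragraph-marker lines are not counted.
    """
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() treats \n, \r\n and bare \r as line breaks
    return sum(1 for line in text.splitlines() if line.strip() not in ("", "<p>"))


def size_ratio(n_src: int, n_tgt: int) -> float:
    """Larger count divided by smaller count; infinite if either side is empty."""
    small, large = sorted((n_src, n_tgt))
    return float("inf") if small == 0 else large / small


if __name__ == "__main__" and len(sys.argv) == 3:
    src_path, tgt_path = sys.argv[1], sys.argv[2]
    n_src, n_tgt = count_sentences(src_path), count_sentences(tgt_path)
    print(f"{src_path}: {n_src} sentences")
    print(f"{tgt_path}: {n_tgt} sentences")
    print(f"ratio: {size_ratio(n_src, n_tgt):.2f}")
```

Saved as, say, `check_sizes.py` (a hypothetical name), it would be run as `python check_sizes.py ar.txt en.txt`. If a file is not valid UTF-8, `read_text` raises `UnicodeDecodeError`, which is itself a useful hint.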
Hi,
I wasn't able to connect to the server as a guest to download the source code or the package.
Here are the steps I took:
Please help. Thanks!
I have two files structured like this:
The Author
The Book Name
Book I
The introduction text.
Chapter 1 The Beginning
The first sentence.
La autoro
La nomo di libro
Libro 1
La prefaca texto.
Chapitro I La Komenco
La unesma frazo.
The result of hunalign -text is:
The Author La autoro 0.266667
The Book Name ~~~ La nomo di libro 0
Book I -0.3
~~~ The introduction text. -0.3
0.3
Chapter 1 The Beginning Libro 1 10.7
0.3
The first sentence. La prefaca texto. ~~~ 0
Chapitro I La Komenco -0.3
0.3
La unesma frazo. -0.3
0.3
That is, "Chapter 1" is incorrectly matched to "Libro 1" with a very high score (10.7), which skews the whole alignment, apparently just because both sides contain the number "1".
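To look for alignments like this systematically, the `-text` output can be parsed and scanned for pairs whose only shared tokens are numbers. The sketch below is a hypothetical Python helper, not part of hunalign. It assumes the usual `-text` layout of three tab-separated columns (source, target, score), with merged sentences joined by `~~~`; the tabs appear to have been lost in the listing above. The `only_numbers_shared` check simply encodes the observation in this report and is not how hunalign computes its scores.

```python
import re
from typing import Iterable, List, NamedTuple, Set


class AlignedPair(NamedTuple):
    source: List[str]  # source sentences in this segment (empty for a 0-1 pair)
    target: List[str]  # target sentences in this segment (empty for a 1-0 pair)
    score: float


def split_side(side: str) -> List[str]:
    # Merged sentences are assumed to be joined with "~~~"
    return [s.strip() for s in side.split("~~~") if s.strip()]


def parse_text_output(lines: Iterable[str]) -> List[AlignedPair]:
    """Parse 'source<TAB>target<TAB>score' lines; raises ValueError on other layouts."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        source, target, score = line.split("\t")
        pairs.append(AlignedPair(split_side(source), split_side(target), float(score)))
    return pairs


def tokens(sentences: List[str]) -> Set[str]:
    # Lowercased word tokens; digits count as word characters for \w
    return set(re.findall(r"\w+", " ".join(sentences).lower()))


def only_numbers_shared(pairs: Iterable[AlignedPair]) -> List[AlignedPair]:
    """Heuristic: return pairs that share at least one token, all of them digits."""
    flagged = []
    for pair in pairs:
        shared = tokens(pair.source) & tokens(pair.target)
        if shared and all(tok.isdigit() for tok in shared):
            flagged.append(pair)
    return flagged
```

Given tab-separated output, the "Chapter 1 The Beginning" / "Libro 1" line would be flagged, because "1" is the only token the two sides have in common, while "The Author" / "La autoro" would not.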
Could you add the following two projects to your Tools section:
https://github.com/aoliverg/hunapertium - Dictionaries for hunalign created from Apertium's transfer dictionaries.
https://github.com/coezbek/hunalign-dict-muse - Dictionaries for hunalign generated from Facebook MUSE dictionaries.
Thanks!
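Both projects produce hunalign dictionary files from existing bilingual word lists. As a rough illustration of that kind of conversion (a hypothetical sketch, not code from either project), the following Python reads whitespace-separated `source target` pairs, as in MUSE-style lists, and emits one `target @ source` item per line. The item order is an assumption based on hunalign's README; swap the two fields if your version documents the opposite order.

```python
from typing import Iterable, Iterator, Tuple


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) from whitespace-separated lines.

    Lines that do not have exactly two fields (e.g. multi-word phrases) are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            yield parts[0], parts[1]


def to_hunalign_dict(pairs: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield dictionary items, assuming hunalign's 'target @ source' order."""
    seen = set()
    for source, target in pairs:
        if (source, target) in seen:
            continue  # drop duplicate entries
        seen.add((source, target))
        yield f"{target} @ {source}"
```

Writing the result is then a matter of joining the yielded items with newlines into a file passed as hunalign's dictionary argument.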
Sorry, I accidentally wrote this issue in the wrong repository.