qcri / arabicasrchallenge2016
License: Other
Scoring does not work with sclite. Here is the error log entry:
# cp data/test_mer80/test.stm exp/mer80/tri4/decode.si/score_10/ && sclite -O exp/mer80/tri4/decode.si/score_10 -o all spk -h exp/mer80/tri4/decode.si/score_10/test_mer80.ctm.updated ctm -r data/test_mer80/test.stm stm
# Started at Mon Mar 21 00:42:38 CDT 2016
#
align_ctm_to_stm: File identifiers do not match but continuing. ref file/channel '679b3bca-ee2e-4a00-9501-e4f662700ce7' '0', next hyp '01be8e7b-c179-42e3-8521-109c2c732334' '0'.
sclite: 2.10 TK Version 1.3
Begin alignment of Ref File: 'data/test_mer80/test.stm' and Hyp File: 'exp/mer80/tri4/decode.si/score_10/test_mer80.ctm.updated'
Performing alignments for file '679b3bca-ee2e-4a00-9501-e4f662700ce7'.
Add a line to remove the copyright information from the lexicon when copying it into Kaldi
run.sh might be re-run, so local/mgb_data_prep.sh deletes files from prior runs before doing its work.
However, it does not properly delete the reco2file_channel file. The line
rm -rf reco2file_channel
should instead be
rm -rf $dirtest/reco2file_channel
When a word starts with '{', sclite treats it as alternatives. For example:
240E6EF6-4F2C-41FF-A1B8-2BD434D41C2D 1323.52 1330.90
h*A AlglAm Al*y AgtSb <rAdp Al>mp >yn mbAd} Alqr|n Alkrym {w>mrhm $wrY bynhm} >yn {w$Arwhm bAl>mr}
This causes problems in scoring: 24 words are missing from the reference because of this bug.
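To see how many reference words are exposed to this behaviour, the transcripts can be scanned for tokens that begin with '{'. The following is a minimal sketch, not part of the recipe; the function name `brace_initial_tokens` is hypothetical, and the input is the transcript line quoted above.

```python
def brace_initial_tokens(transcript):
    """Return the whitespace-separated tokens that begin with '{'.

    In Buckwalter transliteration '{' is an ordinary letter, so these
    tokens are real words that sclite may misread as alternatives.
    """
    return [tok for tok in transcript.split() if tok.startswith("{")]


# Transcript line taken from the report above.
line = ("h*A AlglAm Al*y AgtSb <rAdp Al>mp >yn mbAd} Alqr|n Alkrym "
        "{w>mrhm $wrY bynhm} >yn {w$Arwhm bAl>mr}")

print(brace_initial_tokens(line))  # ['{w>mrhm', '{w$Arwhm']
```

Running the same check over every line of the reference gives a quick list of the words at risk, which can then be compared with the word count sclite reports.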
The script currently throws an error when generating the 'stm' file, which is used as the reference file when scoring with sclite.
The script is called from kaldi recipe/local/mgb_data_prep.sh. The error is due to the following line in xml2stm.py:
tokens = [e.childNodes[0].data for e in segment.getElementsByTagName('element')]
The line throws an exception when an element tag in the dev XML transcript is empty, for example:
<element id="5331FFF2_358E_445E_BA71_74B745387304_w1824_manual" type="word"/>
The xml file is at: emacs /data/sls/scratch/sameerk/mgb_arabic_data/xml/5331FFF2-358E-445E-BA71-74B745387304.xml
There are two fixes:
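Independently of the fixes referred to above, one possible way to guard the failing line is to skip elements that have no child nodes. This is a minimal sketch using `xml.dom.minidom`; the helper name `element_tokens` and the shortened `id` values in the sample XML are made up for illustration, and only the `segment`/`element` structure comes from the line quoted from xml2stm.py.

```python
from xml.dom import minidom


def element_tokens(segment):
    """Collect the text of each <element>, skipping empty ones."""
    tokens = []
    for e in segment.getElementsByTagName('element'):
        # An empty tag such as <element .../> has no child nodes, so
        # e.childNodes[0] would raise IndexError without this check.
        if e.childNodes:
            tokens.append(e.childNodes[0].data)
    return tokens


# Hypothetical sample segment with one empty element in the middle.
doc = minidom.parseString(
    '<segment>'
    '<element id="w1" type="word">AlglAm</element>'
    '<element id="w2" type="word"/>'
    '<element id="w3" type="word">Alkrym</element>'
    '</segment>'
)

print(element_tokens(doc.documentElement))  # ['AlglAm', 'Alkrym']
```

Whether an empty element should be dropped silently or reported is a design choice; dropping it keeps the generated stm line free of blank tokens.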
Currently, we have a problem with the sclite scoring option in score.sh: the ctm file is not ordered properly.
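A minimal sketch of re-ordering ctm lines is given below. It assumes the usual ctm layout (file id, channel, start time, duration, word) and assumes that the intended order is by file id, then channel, then numeric start time, with the reference stm ordered the same way; the function name `sort_ctm_lines` and the sample lines are hypothetical.

```python
def sort_ctm_lines(lines):
    """Sort ctm lines by file id, channel, then numeric start time.

    Blank lines and ';;' comment lines are dropped.
    """
    def key(line):
        fields = line.split()
        # fields: file id, channel, start time, duration, word, ...
        return (fields[0], fields[1], float(fields[2]))

    entries = [l for l in lines if l.strip() and not l.startswith(";;")]
    return sorted(entries, key=key)


# Hypothetical ctm lines, deliberately out of order.
ctm = [
    "fileB 0 2.00 0.50 w3",
    "fileA 0 10.00 0.50 w2",
    "fileA 0 2.50 0.50 w1",
]

for entry in sort_ctm_lines(ctm):
    print(entry)
```

The start time is compared as a number on purpose: a plain string sort would place 10.00 before 2.50.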
Currently, in our baseline script, the number of words in the reference is wrong when scoring with sclite. The problem is an interaction between Buckwalter transliteration and sclite. sclite supports utterance labels in angle brackets after the end time, so when the words following the end time begin with '<', it treats them as labels and drops them all.
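A minimal sketch for spotting the affected reference lines follows. It assumes the stm layout is file id, channel, speaker, begin time, end time, then the transcript; the function name `first_word_looks_like_label` and the sample lines are hypothetical.

```python
def first_word_looks_like_label(stm_line):
    """Return True if the first transcript word begins with '<'.

    Assumed stm layout: file id, channel, speaker, begin, end, transcript.
    Such a word sits where sclite expects an optional <label> field.
    """
    fields = stm_line.split()
    transcript = fields[5:]
    return bool(transcript) and transcript[0].startswith("<")


# Hypothetical stm lines using Buckwalter words from the report above.
risky = "fileA 0 spk1 0.00 1.50 <rAdp Al>mp"
safe = "fileA 0 spk1 1.50 3.00 h*A AlglAm"

print(first_word_looks_like_label(risky))  # True
print(first_word_looks_like_label(safe))   # False
```

Counting the lines for which this returns True gives an estimate of how many utterances lose words during scoring.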