arabicasrchallenge2016's People

Contributors

sameerkhurana10, yshalabi

arabicasrchallenge2016's Issues

Scoring errors when using sclite

Scoring with sclite is not working. Here is the error log entry:

# cp data/test_mer80/test.stm exp/mer80/tri4/decode.si/score_10/ && sclite -O exp/mer80/tri4/decode.si/score_10 -o all spk -h exp/mer80/tri4/decode.si/score_10/test_mer80.ctm.updated ctm -r data/test_mer80/test.stm stm 
# Started at Mon Mar 21 00:42:38 CDT 2016
#
align_ctm_to_stm: File identifiers do not match but continuing. ref file/channel '679b3bca-ee2e-4a00-9501-e4f662700ce7' '0', next hyp '01be8e7b-c179-42e3-8521-109c2c732334' '0'.
sclite: 2.10 TK Version 1.3
Begin alignment of Ref File: 'data/test_mer80/test.stm' and Hyp File: 'exp/mer80/tri4/decode.si/score_10/test_mer80.ctm.updated'
    Performing alignments for file '679b3bca-ee2e-4a00-9501-e4f662700ce7'.

local/mgb_data_prep.sh bug

run.sh might be re-run, so local/mgb_data_prep.sh deletes files from prior runs before doing its work.

It's not properly deleting the reco2file_channel file, because the path has no directory prefix.

rm -rf reco2file_channel

should instead be

rm -rf $dirtest/reco2file_channel

sclite Buckwalter bug

When a word starts with '{', sclite treats it as the start of a set of alternatives. For example:

240E6EF6-4F2C-41FF-A1B8-2BD434D41C2D 1323.52 1330.90
h*A AlglAm Al*y AgtSb <rAdp Al>mp >yn mbAd} Alqr|n Alkrym {w>mrhm $wrY bynhm} >yn {w$Arwhm bAl>mr}

This causes problems in scoring. There are 24 words missing from the reference because of this bug.
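
As a rough illustration, the affected words in a transcript can be listed with a few lines of Python. This is only a sketch: the helper name `words_seen_as_alternation` is hypothetical, and it flags nothing more than words beginning with '{', which is the case described above.

```python
def words_seen_as_alternation(transcript):
    """Return the words that begin with '{' (hypothetical helper)."""
    return [word for word in transcript.split() if word.startswith("{")]


# Transcript text from the example segment above.
line = ("h*A AlglAm Al*y AgtSb <rAdp Al>mp >yn mbAd} Alqr|n Alkrym "
        "{w>mrhm $wrY bynhm} >yn {w$Arwhm bAl>mr}")

print(words_seen_as_alternation(line))
```

Running a check like this over the whole reference could help confirm how many segments are affected before choosing a workaround.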

xml2stm.py

The script currently throws an error when generating the 'stm' file, which is used as the reference file when scoring with sclite.

The script is called from the kaldi recipe's local/mgb_data_prep.sh. The error is caused by the following line in xml2stm.py:
tokens = [e.childNodes[0].data for e in segment.getElementsByTagName('element')]

The line throws an exception when an element tag in the dev xml transcript is empty. For example:
<element id="5331FFF2_358E_445E_BA71_74B745387304_w1824_manual" type="word"/>
The xml file is at: /data/sls/scratch/sameerk/mgb_arabic_data/xml/5331FFF2-358E-445E-BA71-74B745387304.xml

There are two possible fixes:

  1. Handle this while generating the dev xmls
    Required effort: Just one line of code. I have already tested the change.
    Downside: We would need to generate the dev xmls again and upload them to the ftp server
  2. Put a check in the xml2stm.py script
    Required effort: Just a condition that checks for the empty string

@amali @yifan what do you guys think?
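
For reference, the second option could look roughly like the sketch below. It assumes the transcripts are parsed with `xml.dom.minidom`, as the `childNodes` and `getElementsByTagName` calls suggest. The function name `segment_tokens`, the `<segment>` wrapper tag, and the sample ids are made up for illustration.

```python
from xml.dom import minidom


def segment_tokens(segment):
    """Collect word tokens, skipping empty <element/> tags (hypothetical helper)."""
    tokens = []
    for e in segment.getElementsByTagName('element'):
        # An empty tag such as <element ... type="word"/> has no child nodes,
        # so e.childNodes[0] would raise an IndexError.
        if e.childNodes and e.childNodes[0].data.strip():
            tokens.append(e.childNodes[0].data)
    return tokens


# Made-up sample: one normal word element and one empty element.
sample = ('<segment>'
          '<element id="w1" type="word">AlglAm</element>'
          '<element id="w2" type="word"/>'
          '</segment>')
segment = minidom.parseString(sample).documentElement
print(segment_tokens(segment))
```

The check keeps the original list-building logic and only drops elements that carry no text, so segments without empty elements produce the same tokens as before.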

Fix SCLITE scoring issue

We currently have a problem with the sclite scoring option in score.sh: the ctm file is not ordered properly.
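
One way to approach the ordering problem is to sort the ctm lines by file, channel, and start time before scoring. The sketch below assumes the usual ctm field order (file, channel, start time, duration, word). The function name `sort_ctm` and the sample lines are hypothetical, and whether sorting alone resolves the issue has not been verified here.

```python
def sort_ctm(lines):
    """Sort ctm lines by file id, channel, and numeric start time (sketch)."""
    def key(line):
        fields = line.split()
        # Compare start times as floats so that 9.5 sorts before 10.0.
        return (fields[0], fields[1], float(fields[2]))

    return sorted((line for line in lines if line.strip()), key=key)


# Made-up ctm lines: file, channel, start, duration, word.
ctm = [
    "fileB 0 2.00 0.50 w4",
    "fileA 0 10.00 0.20 w3",
    "fileA 0 9.50 0.40 w2",
    "fileA 0 0.30 0.40 w1",
]
for line in sort_ctm(ctm):
    print(line)
```

The stm reference would need to list files in the same order for the two files to line up during alignment.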

sclite skips words in reference with '<' at the beginning of sentences

Currently, in our baseline script, the number of words in the reference is wrong when scoring with sclite. The problem is an interaction between Buckwalter transliteration and sclite. sclite supports utterance labels in angle brackets after the end time. When the words following the end time begin with '<', it treats them as labels and eats them all.
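
One possible workaround, not necessarily the one adopted in the recipe, is to replace the Buckwalter characters that collide with sclite syntax in both the reference and the hypothesis before scoring. In the sketch below, the placeholder strings and the name `make_sclite_safe` are assumptions. The placeholders must not occur anywhere in the data, and the same mapping must be applied to both files so that the alignment is unaffected.

```python
# Hypothetical placeholders for characters that sclite interprets specially.
SCLITE_SAFE = str.maketrans({
    "<": "0LT0",  # would otherwise start an utterance label
    ">": "0GT0",  # would otherwise end an utterance label
    "{": "0LB0",  # would otherwise start a set of alternatives
    "}": "0RB0",  # would otherwise end a set of alternatives
})


def make_sclite_safe(text):
    """Replace the colliding characters in transcript text (sketch)."""
    return text.translate(SCLITE_SAFE)


print(make_sclite_safe("<rAdp {w>mrhm bynhm}"))
```

Only the transcript words should be passed through this mapping, not the file id, channel, or time fields of the stm and ctm lines.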
