medilatin's Introduction

New TSD deadline: Apr. 3.

test

medilatin's People

Contributors

lilasaba, mihajlik, btarjan

Forkers

mihajlik

medilatin's Issues

Final changes

TODOs

  • rewrite caption names
  • recalculate weighted averages
  • add Romanian baseline grapheme results
  • spell out WER in the text (%), always as a percentage
  • merge the U+S sections
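
For the weighted-average and WER-as-percentage items above, a minimal sketch of how the recalculation could be done. It assumes that each test set is weighted by its number of reference words; the thread does not state the weighting, so treat that as an assumption. The function names and the numbers in the usage lines are purely illustrative, not results from the paper.

```python
def weighted_average_wer(results):
    """Average WER over several test sets.

    results: list of (wer_percent, n_reference_words) pairs.
    Assumption: weights are reference word counts per test set.
    """
    total_words = sum(n for _, n in results)
    if total_words == 0:
        raise ValueError("no reference words to weight by")
    return sum(wer * n for wer, n in results) / total_words


def format_wer(wer_percent):
    """Always report WER as a percentage with one decimal."""
    return f"{wer_percent:.1f}%"


# Illustrative numbers only (not taken from the paper).
example = [(20.0, 1000), (30.0, 3000)]
print(format_wer(weighted_average_wer(example)))
```

Weighting by word count keeps a small test set from dominating the average, which a plain mean of the per-set WERs would allow.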

Grapheme normalization statistics for the USG model

@balogandras, at the last meeting we agreed that, since several reviewers asked for examples of how diacritic graphemes are mapped to their closest plain form, we will not only add a few examples to the papers but also count how many such transformations took place.
I will do the counting, but when you have five minutes, could you send me the files containing the mappings?

Thanks!
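
Until the actual mapping files arrive, the counting could be prototyped roughly like this. This is only a sketch: it assumes the mapping is "strip the diacritic via Unicode decomposition", which may differ from the real USG mapping files, and `EXTRA_MAP` is a hypothetical table for letters that have no canonical decomposition.

```python
import unicodedata
from collections import Counter

# Hypothetical special cases: letters with no canonical decomposition
# (an assumption for illustration, not taken from the project files).
EXTRA_MAP = {"ł": "l", "đ": "d", "ø": "o", "ß": "ss"}


def normalize_grapheme(ch):
    """Map one character to its closest plain (diacritic-free) form."""
    if ch in EXTRA_MAP:
        return EXTRA_MAP[ch]
    decomposed = unicodedata.normalize("NFD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped or ch


def count_normalizations(text):
    """Return the normalized text and a Counter of (source, target) pairs."""
    counts = Counter()
    out = []
    # Compose first so that each accented letter is a single character.
    for ch in unicodedata.normalize("NFC", text.lower()):
        norm = normalize_grapheme(ch)
        if norm != ch:
            counts[(ch, norm)] += 1
        out.append(norm)
    return "".join(out), counts


normalized, counts = count_normalizations("Łódź árvíztűrő")
print(normalized)
for (src, dst), n in counts.most_common():
    print(f"{src} -> {dst}: {n}")
```

The per-pair counts give both the total number of transformations and ready-made examples for the paper.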

My comments on the paper before submission

Sorry in advance for the nitpicking; please don't be offended and take everything as constructive:

"In one hand": I think it should be "on"

"it is also has to be taken into account": the "is" is not needed

"Both dev and test sets were read out loud": I am not aware of any dev and test sets

"2.2 Speech data Training data": it should be mentioned here that this is AM training data

"Native speakers of Czech, Hungarian, Polish and Slovakian": are these adjectives correct after "of"? It should be followed by an n-dash, not a hyphen. In the " - " character sequences, the hyphens should be changed to n-dashes.

"One obvious way to handle this problem is create": to create

"Another way of circumvent": "to circumvent" or "of circumventing" (I think)

"Gauss Mixture Model": Gaussian

"KALDI": Kaldi

"As source language acoustic training data, Czech and Hungarian phoneme-based acoustic models were used": I don't understand this. What does it mean to use acoustic models as training data?

"source languages phonemes": "language"? Or, if you want the plural, "languages'"

"Table 1": I don't understand this. Does the bottom-right quarter of the table contain phonemes or graphemes? If graphemes, then the crossed-out o is wrong; there is no such letter in Hungarian. If phonemes, then why do they differ between Czech and Latin? (Sorry if this is a silly question, you are the expert.) Also, to me the centered "Table 1 ..." does not fit with the rest.

"Table 2": Why is the palatal vowel VP and not PV? "VNP": what is the N?

I'll keep reading and will send the other half once I've finished, but I'm sending this now in case you can deal with it in the meantime.

TSD paper proofreading

Hi Peter,

I have uploaded the version of the paper to be proofread. I think the pull-request approach works really well.

Substantive changes since the last communication:

  • SK results made explicit

TODOs (these no longer get in the way of proofreading; please let me know if you can help with any of them):

  • related work: I think the Tanja Schultz paper and the under-resourced survey paper should also be mentioned
  • mention the Romanian speech database with a reference (I definitely need the reference)
  • boldface best results in tables (Lili: I will do this today, it is just too fiddly to be worth waiting for)

Submission Review Results

ID# 97: UNIFIED SIMPLIFIED GRAPHEME ACOUSTIC MODELING FOR MEDIEVAL LATIN LVCSR

Review 1 (RID# 141)

Originality: 9

Significance: 8

Relevance: 9

Presentation: 6

Technical quality: 7

Overall rating: 8

Amount of rewriting required: 2

Main contributions:
The paper presents the efforts of digitizing medieval Latin charter data using a target language independent speech recognizer. An acoustic model for medieval Latin was built using speech data from different source languages of the Visegrad region. To build an acoustic model without source language speech data, two grapheme-based pronunciation modelling approaches are proposed. In the first approach, source-language phonemes were mapped to target-language phonemes using expert knowledge that was implemented as a set of context independent digraph mappings and context dependent rewrite rules. In the second approach, the four-language unified simplified grapheme acoustic model was used, where all special characters were mapped to their normalized form, and those graphemes that are non-native to Latin, and can be straightforwardly mapped to a native Latin grapheme or several such graphemes, were replaced. The experimental results show that the speech recognition systems based on both proposed modelling methods outperform the baseline system. Some unexpected results are reported as well that indicate that the four-language unified simplified grapheme acoustic model is able to generalize better than the other presented models.
Positive aspects:
The positive aspect of the presented work is that it represents an important contribution to the efforts of digitizing medieval Latin charter data. An important finding is also that the unified simplified grapheme acoustic model can yield better results than the models that are based on source-language phoneme to target-language phoneme mappings.
Negative aspects:
It is hard to understand from the paper how the differences between the four languages are actually reflected in the two grapheme-based pronunciation modelling approaches.
Reviewer's comments:
It would be useful if some explicit examples of pronunciations are given in the paper that would illustrate the difference between the two grapheme-based pronunciation modelling approaches.

Review 2 (RID# 197)

Originality: 5

Significance: 5

Relevance: 6

Presentation: 5

Technical quality: 6

Overall rating: 6

Amount of rewriting required: 1

Main contributions:
The authors describe 2 approaches for building a speech recognition component, namely the acoustic model, for a dictation scenario for the Latin language, spoken in the Visegrad region. They use existing databases with largely varying amounts of transcribed data (Czech, Hungarian, Polish and by far the most data for Hungarian). They provided results for mono-lingual baseline systems for these languages. Approach 1) they mapped the respective phonemes to a "Latin" phoneme set. Approach 2) they mapped the graphemes to a "unified" (simplified) "grapheme" set and built acoustic models with data from 3 or 4 languages respectively. Both approaches outperform the baseline mono-lingual models. They mentioned the relevance of the different amounts of training data per language.
Positive aspects:
The authors describe well the available databases and the building essentials for the acoustic model, and provide numerous results for the two investigated approaches, including reasoning. As the writing systems of these languages are based on Latin, they chose to train acoustic models using "graphemes" as the subword unit.
Negative aspects:
The database for Hungarian is much bigger than the other language databases, thus more robust models are to be expected. The 3 or 4 language acoustic models show more stabilized results, as expected. It would be interesting to verify that a "unified phoneme" approach using 3 or 4 languages as databases would give similar results to the "grapheme" approach. Clearly the "grapheme" approach would be easier to handle in real use, as no G2P converter would need to be trained.
Reviewer's comments:
The English should be slightly revised for typos.
