mfa-models's People

Contributors

mmcauliffe

mfa-models's Issues

new english_mfa model

This model's IPA transcription symbols seem to have changed quite a lot from those used by the previous english_ipa model. Is that intentional?

Also, how and why was this done?

Arabic acoustic model?

Hi team, just wondering what's happening with the Arabic acoustic model? I'm getting conflicting info and some dead links from the documentation. I recommended MFA to a student because I remembered it being in version 1, but now I can't find it. (Note: I haven't actually tried downloading anything today, since I'm on the wrong computer.)

  • The dictionary is listed, and the dictionary page says the dictionary "was used in training the Arabic MFA acoustic model".
  • The release page link for downloading the dictionary manually is dead though the plain dictionary link works.
  • The models folder on GitHub (https://github.com/MontrealCorpusTools/mfa-models/tree/main/acoustic) has Arabic in it, but I think it hasn't been updated to v2?

AssertionError: assert self.alignment_model_path.suffix == ".alimdl"

When running: mfa align /data /home/mfauser/mandarin_pinyin.txt /home/mfauser/mandarin.zip /home/mfauser/result

Please be aware that you are running an alpha version of MFA. If you would like to install a more stable version, please visit https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html#installing-older-versions-of-mfa
INFO Setting up corpus information...
INFO Found 1 speaker across 3 files, average number of utterances per speaker: 3.0
INFO Initializing multiprocessing jobs...
WARNING Number of jobs was specified as 3, but due to only having 1 speakers, MFA will only use 1 jobs. Use the --single_speaker flag if you would like to split utterances across jobs regardless of their speaker.
INFO Text already normalized.
INFO Features already generated.
ERROR There was an error in the run, please see the log.
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7f734941a390>>
Traceback (most recent call last):
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
    raise self.exception
  File "/env/bin/mfa", line 8, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align.py", line 122, in align_corpus_cli
    aligner.align()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 334, in align
    super().align()
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/alignment/base.py", line 358, in align
    assert self.alignment_model_path.suffix == ".alimdl"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError:

inconsistencies in Armenian dictionary

Hello,

On the MFA page for Armenian, it seems the dictionary is based on Armenian transliteration rather than on phonetic transcription.

  • In the consonants table, the system breaks up affricates. This creates non-existent symbols like sʰ. The example it cites is կից [k i t sʰ], but this word is actually [kit͡sʰ], with an aspirated affricate /t͡sʰ/. Paradoxically, the XPF page doesn't have an /sʰ/ symbol.
  • The examples seem to be transliterations rather than transcriptions, because they omit epenthetic schwas, which are unwritten but prescriptively present. For example, for the /ɡ/ symbol, a cited example is գլուխ [ɡ l u χ], but this word is actually [ɡəluχ], with an epenthetic schwa, as also reported on Wiktionary.

If you want, I can re-transcribe your dictionary file using a mix of Wiktionary + my native judgments.
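As a rough illustration of the affricate problem, the sketch below merges a stop followed by a fricative back into a single tie-barred affricate. The pair table is an assumption covering the Eastern Armenian affricates, the function name is hypothetical, and the merge is naive: it would also merge a genuine /t/ + /s/ sequence across a morpheme boundary, and it does nothing about the missing epenthetic schwas, which need word-level knowledge.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: stop + fricative pairs that should be one Eastern Armenian affricate.
AFFRICATES: Dict[Tuple[str, str], str] = {
    ("t", "s"): "t͡s",
    ("t", "sʰ"): "t͡sʰ",
    ("d", "z"): "d͡z",
    ("t", "ʃ"): "t͡ʃ",
    ("t", "ʃʰ"): "t͡ʃʰ",
    ("d", "ʒ"): "d͡ʒ",
}


def merge_affricates(phones: List[str]) -> List[str]:
    """Greedily merge adjacent stop + fricative pairs listed in AFFRICATES."""
    merged: List[str] = []
    i = 0
    while i < len(phones):
        pair = (phones[i], phones[i + 1]) if i + 1 < len(phones) else None
        if pair in AFFRICATES:
            merged.append(AFFRICATES[pair])
            i += 2  # both halves of the affricate are consumed
        else:
            merged.append(phones[i])
            i += 1
    return merged
```

Applied to the cited entry, "k i t sʰ".split() becomes three phones, with t and sʰ joined into a single aspirated affricate.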

Spanish MFA

Hi,

Thanks for open-sourcing the models! I am trying to use Spanish MFA on my dataset. The command I ran is: mfa validate my_data spanish_mfa.dict spanish_mfa.zip, where spanish_mfa.zip is downloaded from this link and spanish_mfa.dict is downloaded from this link.

However, I am getting the following error:
PronunciationAcousticMismatchError: There were phones in the dictionary that do not have acoustic models: a, b, c, d̪, e, f, i, j, k, l, m, n, o, p, r, s, tʃ, t̪, u, w, x, ç, ð, ŋ, ɟ, ɟʝ, ɡ, ɣ, ɲ, ɾ, ʃ, ʎ, ʝ, β, and θ

I didn't have any problem running MFA for English before, so I think I may have done something wrong for Spanish.

Thanks in advance!

Dictionary format is unclear.

Hello and thank you for your work.

I was working with russian_mfa.dict (downloaded via mfa model download dictionary russian_mfa), and its format is unclear. A typical line is ноутбуков 1 0.0 0.0 0.0 n̪ o ʊ d̪ b u k ə f. I understand that the word comes at the beginning of the line and its phonemes at the end, and that the first number is some probability, but I can't figure out what the other three numbers mean. I looked at the documentation here, but there is nothing about the format :(

This matters because the output of mfa g2p russian_mfa oov.txt oov_phonemes.txt has the format жбанков ('ʐ', 'b', 'a', 'n̪', 'k', 'ə', 'f'), and it's unclear how to merge the existing dictionary with the OOV words, because the formats are different.

Could you please explain what the format of russian_mfa.dict is, or where to read about it?
Best wishes
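A minimal sketch of reading such lines in Python, assuming (from my reading of the MFA documentation, not from this thread) that the four numeric columns are, in order: the pronunciation probability, the probability of silence after the word, a correction factor for silence before it, and a correction factor for no silence before it. The second helper converts a line of the G2P output shown above into a plain word<TAB>phones entry; whether MFA accepts plain entries mixed with probability-annotated ones in the same file is also an assumption worth checking. DictEntry, parse_dict_line and g2p_line_to_dict_line are hypothetical names.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DictEntry:
    word: str
    phones: List[str]
    # ASSUMED meaning of the optional numeric columns, in file order.
    prob: Optional[float] = None
    silence_after_prob: Optional[float] = None
    silence_before_correction: Optional[float] = None
    non_silence_before_correction: Optional[float] = None


def _is_float(token: str) -> bool:
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_dict_line(line: str) -> DictEntry:
    """Parse one dictionary line; accepts 0, 1 or 4 numeric columns."""
    tokens = line.split()  # handles both tabs and spaces
    if len(tokens) < 2:
        raise ValueError(f"expected a word and at least one phone: {line!r}")
    word, rest = tokens[0], tokens[1:]
    numbers: List[float] = []
    # Consume up to four leading numeric columns.
    while rest and len(numbers) < 4 and _is_float(rest[0]):
        numbers.append(float(rest.pop(0)))
    if len(numbers) not in (0, 1, 4) or not rest:
        raise ValueError(f"unrecognized dictionary line: {line!r}")
    padded: List[Optional[float]] = [None] * 4
    padded[: len(numbers)] = numbers
    return DictEntry(word, rest, *padded)


def g2p_line_to_dict_line(line: str) -> str:
    """Convert "word ('p1', 'p2', ...)" G2P output into "word<TAB>p1 p2 ..."."""
    word, phones_repr = line.strip().split(maxsplit=1)
    phones = ast.literal_eval(phones_repr)  # safely parses the tuple literal
    if isinstance(phones, str):  # a one-phone tuple written without a trailing comma
        phones = (phones,)
    return word + "\t" + " ".join(phones)
```

For example, parse_dict_line("ноутбуков 1 0.0 0.0 0.0 n̪ o ʊ d̪ b u k ə f") yields prob 1.0, three zero-valued columns and the nine phones.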

Incorrect <ni> sequences in French MFA model

It appears that all orthographic sequences of "ni" in the French MFA model's dictionary are transcribed as [ɲi] or [ɲ] (depending on whether the "i" corresponds to a semivowel or not). For instance, "niche" is [ɲiʃ] but should be [niʃ]. This is affecting performance in tagging.
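A quick heuristic for finding affected entries is to flag any word whose pronunciation contains ɲ while its spelling contains "ni" but not "gn" (the usual French spelling of ɲ). The function name is hypothetical, and the rule is only a filter: flagged entries still need manual review.

```python
from typing import Iterable, List, Tuple


def _is_number(token: str) -> bool:
    try:
        float(token)
    except ValueError:
        return False
    return True


def suspicious_ni_entries(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (word, pronunciation) pairs where "ni" spelling maps to ɲ without "gn"."""
    flagged: List[Tuple[str, str]] = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue
        word, rest = tokens[0], tokens[1:]
        # Drop leading numeric probability columns, if present.
        while rest and _is_number(rest[0]):
            rest = rest[1:]
        spelling = word.lower()
        if "ɲ" in rest and "ni" in spelling and "gn" not in spelling:
            flagged.append((word, " ".join(rest)))
    return flagged
```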

'KaldiProcessingError' Error constructing TableReader:

Hi,

I've tried just about every solution I can think of (reinstalling MFA, trying different machines and operating systems, updating packages, running different Python versions in the virtual environment, etc.), and I continue to get a 'KaldiProcessingError', which shows up in the align.2.1.log file. This happens with some, but not all, language models (currently with the pre-trained Greek model and when I attempt to train a Cantonese one). I'm sure I'm not the first one with this issue, so there must be an easy fix, but the 16-bit suggestion is not the problem and I'm completely out of ideas.
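One way to double-check the 16-bit suggestion across a whole corpus, rather than a few spot-checked files, is Python's standard wave module. This sketch assumes the corpus audio is WAV; non_16bit_wavs is a hypothetical name, and files the module cannot parse (e.g. 32-bit float WAV) are reported with a bit depth of None rather than skipped.

```python
import wave
from pathlib import Path
from typing import List, Optional, Tuple


def non_16bit_wavs(corpus_dir: str) -> List[Tuple[Path, Optional[int]]]:
    """List WAV files under corpus_dir that are not 16-bit PCM, with their bit depth."""
    problems: List[Tuple[Path, Optional[int]]] = []
    for path in sorted(Path(corpus_dir).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as w:
                width = w.getsampwidth()  # bytes per sample
        except (wave.Error, EOFError):
            problems.append((path, None))  # not readable as PCM WAV by this module
            continue
        if width != 2:
            problems.append((path, width * 8))
    return problems
```

An empty result means every WAV file the module could read is 16-bit PCM.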

Brazilian Portuguese Alignment Error

Hi, I'm trying to use the Portuguese (Brazil) MFA dictionary, the Portuguese (Brazil) MFA G2P model, and the Portuguese MFA acoustic model to align the Multilingual TEDx Portuguese dataset, and I got the error below:

dictionary phones: {'ʎ', 'a', 'õ', 'b', 'ɛ', 'j̃', 'ɟ', 'n', 'ɐ', 'j', 'w̃', 'k', 'w', 'o', 'ʒ', 'i', 'd', 's', 'tʃ', 'ɔ', 'm', 'ũ', 'ɡ', 'ɐ̃', 'ɾ', 'p', 't', 'dʒ', 'f', 'u', 'c', 'z', 'ẽ', 'ɲ', 'v', 'ĩ', 'l', 'x', 'ʃ', 'e'}
model phones: set()
There were phones in the dictionary that do not have acoustic models: a, b, c, d, dʒ, e, ẽ, f, i, ĩ, j, j̃, k, l, m, n, o, õ, p, s, t, tʃ, u, ũ, v, w, w̃, x, z, ɐ, ɐ̃, ɔ, ɛ, ɟ, ɡ, ɲ, ɾ, ʃ, ʎ, ʒ

Is there any idea how to fix this? Thank you!

Update:
Solved by upgrading MFA to v2.0.0rc5
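The empty "model phones: set()" suggests that the installed MFA could not read the phone inventory from the newer model at all, which fits the fix being an upgrade. The sketch below reproduces the comparison outside MFA so the two sets can be inspected directly. It assumes the model archive contains a member ending in meta.json with a top-level "phones" list; that layout is an assumption about MFA 2.x archives, not something this thread confirms, and both function names are hypothetical.

```python
import json
import zipfile
from typing import Iterable, Set


def _is_number(token: str) -> bool:
    try:
        float(token)
    except ValueError:
        return False
    return True


def model_phones(model_zip: str) -> Set[str]:
    """Phones listed in the model metadata (ASSUMED: a '*meta.json' member with 'phones')."""
    with zipfile.ZipFile(model_zip) as zf:
        for name in zf.namelist():
            if name.endswith("meta.json"):
                meta = json.loads(zf.read(name).decode("utf-8"))
                return set(meta.get("phones", []))
    return set()  # no metadata found: the same empty set as in the error above


def dictionary_phones(lines: Iterable[str]) -> Set[str]:
    """All phones used in a dictionary, skipping up to four leading numeric columns."""
    phones: Set[str] = set()
    for line in lines:
        tokens = line.split()[1:]  # drop the word itself
        skipped = 0
        while tokens and skipped < 4 and _is_number(tokens[0]):
            tokens = tokens[1:]
            skipped += 1
        phones.update(tokens)
    return phones
```

The set difference dictionary_phones(lines) - model_phones(path) then gives the phones MFA would report as lacking acoustic models; the same check applies to the Spanish report above.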

new acoustic models with IPA dictionaries

Hello,

I saw that new IPA dictionaries for many languages were added, which is great.

I was wondering if there were plans to train acoustic models in IPA as well.
Perhaps a multilingual model, starting with a few languages that share characteristics (e.g. English, French, Spanish)?

In fact, I just stumbled upon your blog post that talks about this. Very interesting, by the way!
https://mmcauliffe.medium.com/creating-english-ipa-dictionary-using-montreal-forced-aligner-2-0-242415dfee32

Request for more phonemic-style UK English dictionary and acoustic model

I recently posted in the discussion section (see below) relating to the phone set used by MFA and asking if anyone had developed a tool for returning to a more standard phonemic transcription after alignment. I'm still interested in getting to this, as it would be very useful for our project where variants will be differentiated using auditory and acoustic methods at a later stage in the research. Having now experimented a little more with MFA and become more familiar with the dictionary (english_uk_mfa.dict) and phonological rules for the model (english_mfa) I was using, I think that while a lot of what I'd like is possible to change using script after alignment, some of the issues are embedded in the dictionary and acoustic model.

Specifically, I've found that the rules for TH-alveolarization (allowing /θ/ -> /s/ and /ð/ -> /z/) and ING-variation are overgeneralized, so that it becomes very difficult to group together variants of these phones in order to study them together at a later stage of research. For example, the aligner often transcribes a speaker in one of my test files as using TH-alveolarization (e.g. 'there' as [zɛː], 'third' as [sɜː]). Auditorily, I think this speaker is actually producing dental fricatives, but the way these are treated by the aligner means that if someone later wanted to use the corpus to study TH-alveolarization, TH-stopping, or TH-fronting in our data, it would be very difficult to find all cases where these might occur. Similarly, we might study ING-variation later; in our data, auditorily identified variants include not only [ɪn] and [ɪŋ] but also [ɪŋk], yet the aligner currently transcribes most of these as [ɪn], making it harder to find potentially variable cases later in the research.

It occurs to me that it's likely that a previous MFA version included a model for UK English trained with a dictionary that used a less opinionated phone set, included fewer pronunciation variants in the dictionary, and did not use phonological rules that are difficult to reverse. If this is the case, I was wondering if this is something you'd be willing to share. I understand the motivation for the current more opinionated phone set and phonological rules, but when applying existing dictionaries and acoustic models to new data and non-standard varieties, I think a lot of people based in sociolinguistics/sociophonetics would find it very useful to have access to a dictionary and acoustic model that produce a more phonemic-style output. This is available for American English in the form of the ARPA dictionary, but not for other varieties.

Thanks in advance for your time and help with this.

Discussed in #29

Originally posted by praat-enthusiast February 28, 2024
I'm aware that recent versions of MFA IPA dictionaries follow the opinionated phone set laid out here, which produces a more allophonic transcription. However, I'm based in sociolinguistics and for the project I'm currently working on we would be quite interested to end up with more phonemic or broad phonetic transcription (essentially a version with all of the rules described here reversed).

I was wondering if anyone has a version of the IPA dictionary for UK English which doesn't have the rules described implemented, or has already created a script of some kind to get back to a more standard phonemic transcription after alignment? As I understand it, the current acoustic models have been trained with the dictionaries that use the opinionated phone set, and the allophonic detail in the models and dictionaries improves the alignment, so we would likely be aligning using this phone set and then trying to revert back to a phonemic transcription afterwards. If anyone has attempted this, or has access to a previous version of the dictionary which doesn't implement the new phone set, I'd really appreciate it if you'd be willing to share this with me! I believe that a more phonemic-like dictionary once existed for the US English IPA dictionary at least, as it appears to be mentioned here.

If no one has attempted this already, I'm planning to write a script that will reverse-engineer the rules and produce a more phonemic transcription - I'll share it here if this attempt is successful!
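A sketch of two pieces such a script might contain, under stated assumptions: broaden() collapses allophones using a mapping table whose entries are illustrative only (not a complete inverse of MFA's rules), and variable_context_tags() flags TH and ING contexts from spelling alone. Because TH-alveolarization and ING variants cannot be undone by a one-to-one phone mapping (an aligned [s] may be /s/ or /θ/), tagging by orthography recovers every candidate token regardless of which variant the aligner chose. Both function names and the mapping are hypothetical.

```python
from typing import Dict, List

# Illustrative allophone -> broad mappings; extend to match the phone set in use.
BROAD_MAP: Dict[str, str] = {"tʰ": "t", "pʰ": "p", "kʰ": "k", "ɫ": "l"}


def broaden(phones: List[str], mapping: Dict[str, str] = BROAD_MAP) -> List[str]:
    """Replace each aligned phone by its broad symbol, leaving unmapped phones as-is."""
    return [mapping.get(p, p) for p in phones]


def variable_context_tags(word: str) -> List[str]:
    """Flag words whose spelling suggests a TH or ING context, whatever the aligned variant."""
    spelling = word.lower()
    tags: List[str] = []
    if "th" in spelling:
        tags.append("TH")
    if spelling.endswith("ing"):
        tags.append("ING")
    return tags
```

The spelling test deliberately errs toward recall: words such as "Thomas" (TH) or "thing" (ING) are flagged too and can be filtered out during auditory coding.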

G2P mandarin_pinyin_g2p.zip ignores repeated tokens

Hi, I found that using mandarin_pinyin_g2p.zip to extract pinyin phonemes ignores repeated tokens. How can I avoid this?

Example: shi4 yi1 jia1 zhi4 yao4 gong1 si1 de5 duan3 qi1 gong1

Expected results: sh ii4 i1 j ia1 zh ii4 iao4 g o1 ng s ii1 d e5 d ua3 n q i1 g o1 ng

But I got the results: sh ii4 i1 j ia1 zh ii4 iao4 g o1 ng s ii1 d e5 d ua3 n q i1

Looking forward to your reply
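A possible workaround, assuming the G2P treats its input as a word list and emits one pronunciation per unique word (which would explain the missing final "gong1"), is to run it once over the unique syllables and then expand the original sequence token by token. expand_sequence is a hypothetical helper; the lexicon in the test is taken from the expected output above.

```python
from typing import Dict, List


def expand_sequence(tokens: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Map each token, repeats included, to its phones in the original order."""
    phones: List[str] = []
    for token in tokens:
        # A KeyError here means the G2P output did not cover this token.
        phones.extend(lexicon[token])
    return phones
```

Because the lookup is per token rather than per unique word, repeated syllables keep their position in the output.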

UA+RU dicts should have accents

Ukrainian and Russian have many words that are homographs and are disambiguated in speech using syllable stress, or (in text) using context or diacritics.

Example:

до́ма: [ˈdomə]
дома́: [dɐˈma]

This is represented in the MFA dict as:

дома	0.99	0.55	0.56	1.1	d̪ o m ə
дома	0.1	0.44	1.18	0.93	d̪ ɐ m a

It would make sense to include accent markers in dictionary entries for compatibility with TTS systems that use auto-accenting for disambiguation at runtime (which, as far as I'm aware, is all of them). Supplying accents would reduce the inherent ambiguity in the dictionary and eliminate the unnecessary reliance on probabilistic identification at MFA runtime for words that are homographs.

Like so:

до́ма	0.99	0.55	0.56	1.1	d̪ o m ə
дома́	0.1	0.44	1.18	0.93	d̪ ɐ m a

Or so:

до+ма	0.99	0.55	0.56	1.1	d̪ o m ə
дома+	0.1	0.44	1.18	0.93	d̪ ɐ m a

Caveat: this would require transcriptions to have accents, so an extra check would need to be added to the aligner code: ignore accents in the dictionary and fall back to probabilities (i.e., the current behaviour) if the transcription is not accented. It is also not entirely trivial for a third party to add accents back into the dictionary properly; ideally this would be done during dictionary generation, hence this issue.
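The fallback described in the caveat can be sketched as a lookup that uses the accented entry when the transcript word carries a stress mark and otherwise returns every pronunciation of the unaccented form, most probable first. This assumes stress is marked either with a combining acute accent (U+0301) or with a "+" placed right after the stressed vowel, as in the examples above, and it uses only the first numeric column (the pronunciation probability). AccentAwareLexicon is a hypothetical class; this is not how MFA's aligner currently works.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Tuple

COMBINING_ACUTE = "\u0301"


def canonical(word: str) -> str:
    """NFD-normalize and turn the '+' notation into a combining acute on the same vowel."""
    return unicodedata.normalize("NFD", word).replace("+", COMBINING_ACUTE)


def strip_accents(word: str) -> str:
    return word.replace(COMBINING_ACUTE, "")


class AccentAwareLexicon:
    def __init__(self, entries: List[Tuple[str, float, List[str]]]):
        self.exact: Dict[str, List[Tuple[float, List[str]]]] = defaultdict(list)
        self.by_plain: Dict[str, List[Tuple[float, List[str]]]] = defaultdict(list)
        for word, prob, phones in entries:
            key = canonical(word)
            self.exact[key].append((prob, phones))
            self.by_plain[strip_accents(key)].append((prob, phones))

    def lookup(self, word: str) -> List[List[str]]:
        """Accented input: exact match if known. Otherwise: all variants, most probable first."""
        key = canonical(word)
        if key != strip_accents(key) and key in self.exact:
            candidates = self.exact[key]
        else:
            candidates = self.by_plain.get(strip_accents(key), [])
        return [phones for _, phones in sorted(candidates, key=lambda c: -c[0])]
```

An accented word with an unknown stress position also falls back to the probability-ranked list, matching the current behaviour.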

English mfa dictionary and corresponding G2P model

Hi, I want to use the pretrained english_mfa acoustic model and dictionary for alignment. I also want to use the same dictionary for G2P (text to phonemes). Which G2P model corresponds to it, so that I can transform text into phonemes? I want to use it for TTS inference.

Thanks a lot!

Log likelihood score of the aligned segment to be a particular phone

Dear MFA dev,

I was wondering if it is possible to obtain probability scores from the alignment output.
That is, the log probability (log probability density) of the aligned segment being a particular phone.
I would like to use the method proposed by Yuan, J., & Liberman, M. (2009). Investigating /l/ variation in English through forced alignment. In Tenth Annual Conference of the International Speech Communication Association.

I looked around, and it seems that this is possible in Kaldi (https://sourceforge.net/p/kaldi/discussion/1355348/thread/3a866d2a/). Apparently the Penn Phonetics Lab Forced Aligner can also do it.

Many thanks,
Kevin
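If per-frame log-likelihoods of a segment under two competing phone models are available (e.g. exported from Kaldi), a duration-normalized log-likelihood ratio, LLR = (1/T) Σ_t [log p(x_t | A) − log p(x_t | B)], can be computed as below. This is a generic sketch, not necessarily the exact measure in Yuan & Liberman (2009); segment_llr is a hypothetical name, and the thread does not confirm that MFA exports such per-frame scores.

```python
from typing import Sequence


def segment_llr(loglik_a: Sequence[float], loglik_b: Sequence[float]) -> float:
    """Average per-frame log-likelihood ratio of model A over model B for one segment."""
    if len(loglik_a) != len(loglik_b) or not loglik_a:
        raise ValueError("need equal-length, non-empty per-frame score lists")
    total = sum(a - b for a, b in zip(loglik_a, loglik_b))
    return total / len(loglik_a)  # normalize by duration in frames
```

Positive values favour model A and negative values favour model B; dividing by the frame count keeps long and short segments comparable.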

Broken link to Japanese MFA dictionary v2.0.0

mfa version: 2.0.0rc4
Running mfa models download dictionary japanese_mfa on Ubuntu shows:

RemoteModelNotFoundError: Could not find a model named "japanese_mfa" for dictionary. Available: russian_mfa,
  mandarin_taiwan_mfa, mandarin_mfa, mandarin_erhua_mfa, mandarin_china_mfa, german_mfa, french_mfa, english_us_mfa,
  english_uk_mfa, english_nigeria_mfa, english_mfa, czech_mfa, swedish_mfa, portuguese_portugal_mfa, portuguese_mfa,
  portuguese_brazil_mfa, polish_mfa, korean_mfa, korean_jamo_mfa, vietnamese_mfa, vietnamese_hue_mfa,
  vietnamese_ho_chi_minh_city_mfa, vietnamese_hanoi_mfa, ukrainian_mfa, turkish_mfa, thai_mfa, swahili_mfa,
  spanish_spain_mfa, spanish_mfa, spanish_latin_america_mfa, hausa_mfa, croatian_mfa, bulgarian_mfa, vietnamese_cv,
  uzbek_cv, uyghur_cv, urdu_cv, ukrainian_cv, turkish_cv, thai_cv, tatar_cv, tamil_cv, swedish_cv, sorbian_upper_cv,
  russian_cv, romanian_cv, punjabi_cv, portuguese_cv, polish_cv, mandarin_pinyin, maltese_cv, kyrgyz_cv, kurmanji_cv,
  kazakh_cv, italian_cv, indonesian_cv, hungarian_cv, hindi_cv, guarani_cv, greek_cv, german_prosodylab,
  georgian_cv, french_prosodylab, english_us_arpa, dutch_cv, czech_cv, chuvash_cv, bulgarian_cv, belarusian_cv,
  basque_cv, bashkir_cv, armenian_cv, and abkhaz_cv. You can see all available models either on
  https://mfa-models.readthedocs.io/en/latest/ or https://github.com/MontrealCorpusTools/mfa-models/releases. If
  you're looking for a model from 1.0, please see
  https://github.com/MontrealCorpusTools/mfa-models/releases/tag/dictionary-archive-v1.0.

I tried to download the dictionary from the release page, but the page shows a 404.

Align transcript and speech (US + UK)

Hi All,
Thank you for this amazing repo; really nice work!
We want to align transcripts with speech (UK and US English). What is the correct way to do this?
If possible, we would prefer to use the ARPA phone set.

Thank you in advance!
@yochaiye
