montrealcorpustools / mfa-models
Collection of pretrained models for the Montreal Forced Aligner
License: Creative Commons Attribution 4.0 International
This model's IPA transcription symbols seem to have changed quite a lot from those used by the previous english_ipa. Is that intentional?
Also, how and why was this done?
Hi team, just wondering what's happening with the Arabic acoustic model? I'm getting conflicting info and some dead links from the documentation. I recommended MFA to a student because I remembered it being in version 1 but now I can't find it. (note, I haven't actually tried downloading anything today - I'm on the wrong computer)
When running: mfa align /data /home/mfauser/mandarin_pinyin.txt /home/mfauser/mandarin.zip /home/mfauser/result
Please be aware that you are running an alpha version of MFA. If you would like to install a more stable version, please visit https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html#installing-older-versions-of-mfa
INFO Setting up corpus information...
INFO Found 1 speaker across 3 files, average number of utterances per speaker: 3.0
INFO Initializing multiprocessing jobs...
WARNING Number of jobs was specified as 3, but due to only having 1 speakers, MFA will only use 1 jobs. Use the --single_speaker flag if you would like to split utterances across jobs regardless of their speaker.
INFO Text already normalized.
INFO Features already generated.
ERROR There was an error in the run, please see the log.
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7f734941a390>>
Traceback (most recent call last):
File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
raise self.exception
File "/env/bin/mfa", line 8, in <module>
sys.exit(mfa_cli())
^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/rich_click/rich_command.py", line 126, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align.py", line 122, in align_corpus_cli
aligner.align()
File "/env/lib/python3.11/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 334, in align
super().align()
File "/env/lib/python3.11/site-packages/montreal_forced_aligner/alignment/base.py", line 358, in align
assert self.alignment_model_path.suffix == ".alimdl"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError:
Hello,
Thanks for making the models available. I noticed the G2P for Tamil is missing. Broken link:
https://github.com/MontrealCorpusTools/mfa-models/releases/tag/g2p-tamil_mfa-v2.0.0
Hello,
On the MFA page for Armenian, it seems the dictionary is based on Armenian transliteration instead of transcription.
If you want, I can re-transcribe your dictionary file using a mix of Wiktionary + my native judgments.
Hi,
Thanks for open-sourcing the models! I am trying to use Spanish MFA on my dataset. The command I ran is: mfa validate my_data spanish_mfa.dict spanish_mfa.zip, where spanish_mfa.zip is downloaded from this link and spanish_mfa.dict is downloaded from this link.
However, I am getting the following error:
PronunciationAcousticMismatchError: There were phones in the dictionary that do not have acoustic models: a, b, c, d̪, e, f, i, j, k, l, m, n, o, p, r, s, tʃ, t̪, u, w, x, ç, ð, ŋ, ɟ, ɟʝ, ɡ, ɣ, ɲ, ɾ, ʃ, ʎ, ʝ, β, and θ
I didn't have any problem running MFA for English before, so I think I may have done something wrong for Spanish.
Thanks in advance!
Hello and thank you for your work.
I was working with russian_mfa.dict (downloaded via mfa model download dictionary russian_mfa) and its format seems unclear. A typical line is:
ноутбуков 1 0.0 0.0 0.0 n̪ o ʊ d̪ b u k ə f
I understand that the line starts with the word and ends with its phonemes, and that the first number is some probability, but I can't figure out what the other three numbers mean. I looked at the documentation here, but there is nothing about the format :(
It's important because the output of mfa g2p russian_mfa oov.txt oov_phonemes.txt has the following format:
жбанков ('ʐ', 'b', 'a', 'n̪', 'k', 'ə', 'f')
and it's unclear how to merge the existing dictionary with the OOV words, because the formats differ.
Could you please explain the format of russian_mfa.dict, or point me to where it is documented?
Best wishes
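Not an official answer, but a small parsing sketch may help until the format is documented. The assumption here (to be confirmed by the maintainers) is that the numbers after the word are a pronunciation probability followed by silence-related probabilities/correction factors, and that they are optional, so G2P output can be merged in as bare `word<TAB>phones` lines:

```python
def parse_dict_line(line):
    """Split an MFA-style dictionary line into (word, numbers, phones).

    Assumption (not confirmed by the MFA docs): the leading numbers are
    a pronunciation probability plus silence-related values; everything
    from the first non-numeric token onward is a phone.
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    numbers, phones = [], []
    for token in rest:
        if not phones:
            try:
                numbers.append(float(token))
                continue
            except ValueError:
                pass
        phones.append(token)
    return word, numbers, phones


def g2p_tuple_to_dict_line(word, phones):
    """Turn G2P output like ('ʐ', 'b', ...) into a bare dictionary line.

    The numeric fields are treated as optional and simply omitted.
    """
    return word + "\t" + " ".join(phones)
```

With this, merging is just appending `g2p_tuple_to_dict_line(word, phones)` lines to the existing dictionary file; whether omitting the numeric fields is acceptable to the aligner is itself an assumption worth verifying.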
It appears that all orthographic sequences of "ni" in the French MFA model's dictionary are transcribed as [ɲi] or [ɲ] (depending on whether the "i" corresponds to a semivowel or not). For instance, "niche" is [ɲiʃ] but should be [niʃ]. This is affecting performance in tagging.
Great to see the new pretrained acoustic models!
I noticed that the new models now accept IPA-like phonemes; I wonder whether alignment still works if I use pinyin instead of Chinese characters as the input labels with the pretrained v2.0 models.
Hi,
I've tried just about every solution I can think of (reinstalling MFA, trying different machines and OSes, updating packages, running different Python versions in the virtual env, etc.) and I continue to get a 'KaldiProcessingError', which shows up in the align.2.1.log file. This happens with some, but not all, models (currently with the pretrained Greek model, and when I attempt to train a Cantonese one). I'm certain I'm not the first one with this issue, so there must be an easy fix, but the 16-bit suggestion is not the problem and I'm completely out of ideas of things to try.
Hi, I'm trying to use the Portuguese (Brazil) MFA dictionary, the Portuguese (Brazil) MFA G2P model, and the Portuguese MFA acoustic model to align the Multilingual TEDx Portuguese dataset, and got the error below:
dictionary phones: {'ʎ', 'a', 'õ', 'b', 'ɛ', 'j̃', 'ɟ', 'n', 'ɐ', 'j', 'w̃', 'k', 'w', 'o', 'ʒ', 'i', 'd', 's', 'tʃ', 'ɔ', 'm', 'ũ', 'ɡ', 'ɐ̃', 'ɾ', 'p', 't', 'dʒ', 'f', 'u', 'c', 'z', 'ẽ', 'ɲ', 'v', 'ĩ', 'l', 'x', 'ʃ', 'e'}
model phones: set()
There were phones in the dictionary that do not have acoustic models: a, b, c, d, dʒ, e, ẽ, f, i, ĩ, j, j̃, k, l, m, n, o, õ, p, s, t, tʃ, u, ũ, v, w, w̃, x, z, ɐ, ɐ̃, ɔ, ɛ, ɟ, ɡ, ɲ, ɾ, ʃ, ʎ, ʒ
Is there any idea how to fix this? Thank you!
Update:
Solved by upgrading MFA to v2.0.0rc5
Hello,
I saw that new IPA dictionaries for many languages were added, which is great.
I was wondering if there were plans to train acoustic models in IPA as well.
Maybe a multilingual model, but starting with a small set of languages that share characteristics (e.g. English, French, Spanish)?
In fact, I just stumbled upon your blog post that talks about exactly that. Very interesting, by the way!
https://mmcauliffe.medium.com/creating-english-ipa-dictionary-using-montreal-forced-aligner-2-0-242415dfee32
I recently posted in the discussion section (see below) relating to the phone set used by MFA and asking if anyone had developed a tool for returning to a more standard phonemic transcription after alignment. I'm still interested in getting to this, as it would be very useful for our project where variants will be differentiated using auditory and acoustic methods at a later stage in the research. Having now experimented a little more with MFA and become more familiar with the dictionary (english_uk_mfa.dict) and phonological rules for the model (english_mfa) I was using, I think that while a lot of what I'd like is possible to change using script after alignment, some of the issues are embedded in the dictionary and acoustic model.
Specifically, I've found that the rules for TH-alveolarization (allowing /θ/ -> /s/ and /ð/ -> /z/) and ING-variation are overgeneralized, so that it becomes very difficult to group together variants of these phones in order to study them together at a later stage of research. As an example, a speaker in a test file that I've aligned is often transcribed as using TH-alveolarization by the aligner (e.g. 'there' as [zɛː], 'third' as [sɜː]). Auditorily I think that this speaker actually is producing dental fricatives, but the way that they are treated by the aligner means that if at a later stage someone wanted to use the corpus to consider TH-alveolarization, TH-stopping, or TH-fronting in our data, it would be very difficult to find all cases where this might occur. Similarly, we might consider ING-variation later, and in our data auditorily-identified variants include not only [ɪn] and [ɪŋ], but also [ɪŋk], but currently the aligner transcribes most of these as [ɪn], making it harder to find potential variable cases later in the research.
It occurs to me that it's likely that a previous MFA version included a model for UK English trained with a dictionary that used a less opinionated phone set, included fewer pronunciation variants in the dictionary, and did not use phonological rules that are difficult to reverse. If this is the case, I was wondering if this is something you'd be willing to share. I understand the motivation for the current more opinionated phone set and phonological rules, but when applying existing dictionaries and acoustic models to new data and non-standard varieties, I think a lot of people based in sociolinguistics/sociophonetics would find it very useful to have access to a dictionary and acoustic model that produce a more phonemic-style output. This is available for American English in the form of the ARPA dictionary, but not for other varieties.
Thanks in advance for your time and help with this.
Originally posted by praat-enthusiast February 28, 2024
I'm aware that recent versions of MFA IPA dictionaries follow the opinionated phone set laid out here, which produces a more allophonic transcription. However, I'm based in sociolinguistics and for the project I'm currently working on we would be quite interested to end up with more phonemic or broad phonetic transcription (essentially a version with all of the rules described here reversed).
I was wondering if anyone has a version of the IPA dictionary for UK English which doesn't have the rules described implemented, or has already created a script of some kind to get back to a more standard phonemic transcription after alignment? As I understand it, the current acoustic models have been trained with the dictionaries that use the opinionated phone set, and the allophonic detail in the models and dictionaries improves the alignment, so we would likely be aligning using this phone set and then trying to revert back to a phonemic transcription afterwards. If anyone has attempted this, or has access to a previous version of the dictionary which doesn't implement the new phone set, I'd really appreciate it if you'd be willing to share this with me! I believe that a more phonemic-like dictionary once existed for the US English IPA dictionary at least, as it appears to be mentioned here.
If no one has attempted this already, I'm planning to write a script that will reverse-engineer the rules and produce a more phonemic transcription - I'll share it here if this attempt is successful!
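As a starting point for such a script, here is a minimal sketch (my own approach, not anything shipped with MFA) that discards the aligner's variant choice and substitutes the word's canonical pronunciation from a separate phonemic dictionary. The assumption that the first listed pronunciation is the canonical one would need checking against the actual dictionary used:

```python
def phonemicize(word, surface_phones, phonemic_dict):
    """Map an aligned (allophonic) pronunciation back to a phonemic one.

    phonemic_dict maps a word to a list of pronunciations (each a list
    of phones); the first entry is assumed to be canonical. OOV words
    keep their surface transcription unchanged.
    """
    variants = phonemic_dict.get(word)
    if variants:
        return variants[0]
    return surface_phones
```

Note that this deliberately throws away within-word variant detail: when the phonemic and surface transcriptions differ in phone count, the phone boundaries from the alignment no longer map one-to-one, so boundaries would need to be re-derived or interpolated.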
Hi, I found that using mandarin_pinyin_g2p.zip to extract pinyin phonemes drops repeated tokens. How can I avoid this?
Example input: shi4 yi1 jia1 zhi4 yao4 gong1 si1 de5 duan3 qi1 gong1
Expected result: sh ii4 i1 j ia1 zh ii4 iao4 g o1 ng s ii1 d e5 d ua3 n q i1 g o1 ng
But I got: sh ii4 i1 j ia1 zh ii4 iao4 g o1 ng s ii1 d e5 d ua3 n q i1 (the phones for the final, repeated gong1 are missing)
Looking forward to your reply
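One workaround, assuming the deduplication happens because the G2P treats its input as a unique word list: run the G2P once over the deduplicated syllables, then re-expand the resulting syllable-to-phones mapping over the original token sequence so repeated syllables keep their pronunciation. A sketch:

```python
def expand_pronunciations(tokens, g2p_map):
    """Re-expand per-syllable pronunciations over the original sequence.

    tokens: the original (possibly repeating) pinyin syllable sequence.
    g2p_map: mapping from unique syllable to its phone string, as
    obtained by running the G2P once on the deduplicated token list.
    """
    return " ".join(g2p_map[token] for token in tokens)
```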
Ukrainian and Russian have many words that are homographs and are disambiguated in speech using syllable stress, or (in text) using context or diacritics.
Example:
до́ма: [ˈdomə]
дома́: [dɐˈma]
This is represented in the MFA dict as:
дома 0.99 0.55 0.56 1.1 d̪ o m ə
дома 0.1 0.44 1.18 0.93 d̪ ɐ m a
It would make sense to include accent markers in dict entries for compatibility with TTS systems that use auto-accenting for disambiguation at runtime - which is all of them, as far as I'm aware. Supplying accents would reduce the inherent ambiguity in the dict and eliminate the unnecessary reliance on probabilistic identification at MFA runtime, for words that are homographs.
Like so:
до́ма 0.99 0.55 0.56 1.1 d̪ o m ə
дома́ 0.1 0.44 1.18 0.93 d̪ ɐ m a
Or so:
до+ма 0.99 0.55 0.56 1.1 d̪ o m ə
дома+ 0.1 0.44 1.18 0.93 d̪ ɐ m a
Caveat: this would require transcriptions to have accents, so an extra check would need to be added to the aligner code: ignore accents in the dict and fall back to probabilities (i.e. the current behaviour) if the transcription is not accented. It is also not entirely trivial for a third party to add accents back into the dict properly; ideally this would be done during dict generation, hence this issue.
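The fallback check described above could be as simple as normalizing stress marks away before the dictionary lookup. A sketch handling both notations from the examples (the combining acute accent and the '+' marker):

```python
import unicodedata

def strip_stress(word):
    """Remove stress markers so an accented word can fall back to the
    unaccented dictionary entry: strips combining acute accents
    (U+0301) and the '+' marker used in the second notation above."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ("\u0301", "+"))
    return unicodedata.normalize("NFC", kept)
```

A dictionary keyed on `strip_stress(word)` would then serve both accented and unaccented transcriptions, with the accented form selecting the exact pronunciation when present.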
Hi, I want to use the pretrained english_mfa MFA acoustic model and dictionary for alignment. I also want to use the same dictionary for G2P (text to phonemes). What is the corresponding G2P model for transforming text into phonemes? I want to use it for TTS inference.
Thanks a lot!
Dear MFA dev,
I was wondering if it is possible to obtain probability scores from the alignment output.
I mean, the log probability (log probability density) of the aligned segment to be a particular phone.
I would like to use the method proposed by Yuan, J., & Liberman, M. (2009). Investigating /l/ variation in English through forced alignment. In Tenth Annual Conference of the International Speech Communication Association.
I looked around and it seems that this is possible in Kaldi (https://sourceforge.net/p/kaldi/discussion/1355348/thread/3a866d2a/). Apparently Penn Phonetics Lab Forced Aligner can also do it.
Many thanks,
Kevin
mfa version: 2.0.0rc4
Running mfa models download dictionary japanese_mfa on Ubuntu shows:
RemoteModelNotFoundError: Could not find a model named "japanese_mfa" for dictionary. Available: russian_mfa,
mandarin_taiwan_mfa, mandarin_mfa, mandarin_erhua_mfa, mandarin_china_mfa, german_mfa, french_mfa, english_us_mfa,
english_uk_mfa, english_nigeria_mfa, english_mfa, czech_mfa, swedish_mfa, portuguese_portugal_mfa, portuguese_mfa,
portuguese_brazil_mfa, polish_mfa, korean_mfa, korean_jamo_mfa, vietnamese_mfa, vietnamese_hue_mfa,
vietnamese_ho_chi_minh_city_mfa, vietnamese_hanoi_mfa, ukrainian_mfa, turkish_mfa, thai_mfa, swahili_mfa,
spanish_spain_mfa, spanish_mfa, spanish_latin_america_mfa, hausa_mfa, croatian_mfa, bulgarian_mfa, vietnamese_cv,
uzbek_cv, uyghur_cv, urdu_cv, ukrainian_cv, turkish_cv, thai_cv, tatar_cv, tamil_cv, swedish_cv, sorbian_upper_cv,
russian_cv, romanian_cv, punjabi_cv, portuguese_cv, polish_cv, mandarin_pinyin, maltese_cv, kyrgyz_cv, kurmanji_cv,
kazakh_cv, italian_cv, indonesian_cv, hungarian_cv, hindi_cv, guarani_cv, greek_cv, german_prosodylab,
georgian_cv, french_prosodylab, english_us_arpa, dutch_cv, czech_cv, chuvash_cv, bulgarian_cv, belarusian_cv,
basque_cv, bashkir_cv, armenian_cv, and abkhaz_cv. You can see all available models either on
https://mfa-models.readthedocs.io/en/latest/ or https://github.com/MontrealCorpusTools/mfa-models/releases. If
you're looking for a model from 1.0, please see
https://github.com/MontrealCorpusTools/mfa-models/releases/tag/dictionary-archive-v1.0.
I tried to download the dictionary from the releases page, but the page shows a 404.
Hi All,
Thank you for this amazing repo really nice work!
We wish to align transcripts and speech (English, UK + US). What is the correct way to do it?
If possible, we would prefer to use the ARPA phone set.
Thank you in advance!
@yochaiye