montrealcorpustools / mfa-models
Collection of pretrained models for the Montreal Forced Aligner
License: Creative Commons Attribution 4.0 International
This model's IPA transcription symbols seem to have changed quite a lot from those used by the previous english_ipa. Is that intentional?
Also, how and why was this done?
Hi team, just wondering what's happening with the Arabic acoustic model? I'm getting conflicting info and some dead links from the documentation. I recommended MFA to a student because I remembered it being in version 1 but now I can't find it. (note, I haven't actually tried downloading anything today - I'm on the wrong computer)
When running: mfa align /data /home/mfauser/mandarin_pinyin.txt /home/mfauser/mandarin.zip /home/mfauser/result
Please be aware that you are running an alpha version of MFA. If you would like to install a more stable version, please visit https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html#installing-older-versions-of-mfa
INFO Setting up corpus information...
INFO Found 1 speaker across 3 files, average number of utterances per speaker: 3.0
INFO Initializing multiprocessing jobs...
WARNING Number of jobs was specified as 3, but due to only having 1 speakers, MFA will only use 1 jobs. Use the --single_speaker flag if you would like to split utterances across jobs regardless of their speaker.
INFO Text already normalized.
INFO Features already generated.
ERROR There was an error in the run, please see the log.
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7f734941a390>>
Traceback (most recent call last):
File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
raise self.exception
File "/env/bin/mfa", line 8, in <module>
sys.exit(mfa_cli())
^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/rich_click/rich_command.py", line 126, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align.py", line 122, in align_corpus_cli
aligner.align()
File "/env/lib/python3.11/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 334, in align
super().align()
File "/env/lib/python3.11/site-packages/montreal_forced_aligner/alignment/base.py", line 358, in align
assert self.alignment_model_path.suffix == ".alimdl"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError:
Hello,
Thanks for making the models available. I noticed the G2P for Tamil is missing. Broken link:
https://github.com/MontrealCorpusTools/mfa-models/releases/tag/g2p-tamil_mfa-v2.0.0
Hello,
On the MFA page for Armenian, it seems the dictionary is based on Armenian transliteration instead of transcription.
If you want, I can re-transcribe your dictionary file using a mix of Wiktionary + my native judgments.
Hi,
Thanks for open-sourcing the models! I am trying to use Spanish MFA on my dataset. The command I ran is: mfa validate my_data spanish_mfa.dict spanish_mfa.zip, where spanish_mfa.zip is downloaded from this link and spanish_mfa.dict is downloaded from this link.
However, I am getting the following error:
PronunciationAcousticMismatchError: There were phones in the dictionary that do not have acoustic models: a, b, c, d̪, e, f, i, j, k, l, m, n, o, p, r, s, tʃ, t̪, u, w, x, ç, ð, ŋ, ɟ, ɟʝ, ɡ, ɣ, ɲ, ɾ, ʃ, ʎ, ʝ, β, and θ
I didn't have any problem running MFA for English before, so I think I may have done something wrong for Spanish.
Thanks in advance!
Hello and thank you for your work.
I was working with russian_mfa.dict (downloaded via mfa model download dictionary russian_mfa) and its format seems unclear. A typical line is:
ноутбуков 1 0.0 0.0 0.0 n̪ o ʊ d̪ b u k ə f
I understand that the line starts with the word and ends with its phonemes, and that the first number is some probability, but I can't figure out what the other three numbers mean. I looked at the documentation here, but there is nothing about the format :(
It's important because the output of mfa g2p russian_mfa oov.txt oov_phonemes.txt has the following format:
жбанков ('ʐ', 'b', 'a', 'n̪', 'k', 'ə', 'f')
and it's unclear how to merge the existing dictionary with the OOV words, because the formats differ.
Could you please explain the format of russian_mfa.dict, or point me to where it is documented?
Best wishes
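Not an official answer, but a small parsing sketch may help until the format is documented. The assumption here (to be confirmed by the maintainers) is that the numbers after the word are a pronunciation probability followed by silence-related probabilities/correction factors, and that they are optional, so G2P output can be merged in as bare `word<TAB>phones` lines:

```python
def parse_dict_line(line):
    """Split an MFA-style dictionary line into (word, numbers, phones).

    Assumption (not confirmed by the MFA docs): the leading numbers are
    a pronunciation probability plus silence-related values; everything
    from the first non-numeric token onward is a phone.
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    numbers, phones = [], []
    for token in rest:
        if not phones:
            try:
                numbers.append(float(token))
                continue
            except ValueError:
                pass
        phones.append(token)
    return word, numbers, phones


def g2p_tuple_to_dict_line(word, phones):
    """Turn G2P output like ('ʐ', 'b', ...) into a bare dictionary line.

    The numeric fields are treated as optional and simply omitted.
    """
    return word + "\t" + " ".join(phones)
```

With this, merging is just appending `g2p_tuple_to_dict_line(word, phones)` lines to the existing dictionary file; whether omitting the numeric fields is acceptable to the aligner is itself an assumption worth verifying.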
It appears that all orthographic sequences of "ni" in the French MFA model's dictionary are transcribed as [ɲi] or [ɲ] (depending on whether the "i" corresponds to a semivowel or not). For instance, "niche" is [ɲiʃ] but should be [niʃ]. This is affecting performance in tagging.
Great to see the new pretrained acoustic models!
I noticed that the new models now accept IPA-like phonemes; I wonder whether alignment still works if I use pinyin instead of Chinese characters as the input labels with the pretrained v2.0 models.
Hi,
I've tried just about every solution I can think of (reinstalling MFA, trying different machines and OSes, updating packages, running different Python versions in the virtual env, etc.) and I continue to get a 'KaldiProcessingError', which shows up in the align.2.1.log file. This happens with some, but not all, models (currently with the pretrained Greek model, and when I attempt to train a Cantonese one). I'm certain I'm not the first one with this issue, so there must be an easy fix, but the 16-bit suggestion is not the problem and I'm completely out of ideas of things to try.
Hi, I'm trying to use the Portuguese (Brazil) MFA dictionary, the Portuguese (Brazil) MFA G2P model, and the Portuguese MFA acoustic model to align the Multilingual TEDx Portuguese dataset, and got the error below:
dictionary phones: {'ʎ', 'a', 'õ', 'b', 'ɛ', 'j̃', 'ɟ', 'n', 'ɐ', 'j', 'w̃', 'k', 'w', 'o', 'ʒ', 'i', 'd', 's', 'tʃ', 'ɔ', 'm', 'ũ', 'ɡ', 'ɐ̃', 'ɾ', 'p', 't', 'dʒ', 'f', 'u', 'c', 'z', 'ẽ', 'ɲ', 'v', 'ĩ', 'l', 'x', 'ʃ', 'e'}
model phones: set()
There were phones in the dictionary that do not have acoustic models: a, b, c, d, dʒ, e, ẽ, f, i, ĩ, j, j̃, k, l, m, n, o, õ, p, s, t, tʃ, u, ũ, v, w, w̃, x, z, ɐ, ɐ̃, ɔ, ɛ, ɟ, ɡ, ɲ, ɾ, ʃ, ʎ, ʒ
Is there any idea how to fix this? Thank you!
Update:
Solved by upgrading MFA to v2.0.0rc5
Hello,
I saw that new IPA dictionaries for many languages were added, which is great.
I was wondering if there were plans to train acoustic models in IPA as well.
Maybe a multilingual model, but starting with a small set of languages that share characteristics (e.g. English, French, Spanish)?
In fact, I just stumbled upon your blog post that talks about exactly that. Very interesting, by the way!
https://mmcauliffe.medium.com/creating-english-ipa-dictionary-using-montreal-forced-aligner-2-0-242415dfee32
I recently posted in the discussion section (see below) relating to the phone set used by MFA and asking if anyone had developed a tool for returning to a more standard phonemic transcription after alignment. I'm still interested in getting to this, as it would be very useful for our project where variants will be differentiated using auditory and acoustic methods at a later stage in the research. Having now experimented a little more with MFA and become more familiar with the dictionary (english_uk_mfa.dict) and phonological rules for the model (english_mfa) I was using, I think that while a lot of what I'd like is possible to change using script after alignment, some of the issues are embedded in the dictionary and acoustic model.
Specifically, I've found that the rules for TH-alveolarization (allowing /θ/ -> /s/ and /ð/ -> /z/) and ING-variation are overgeneralized, so that it becomes very difficult to group together variants of these phones in order to study them together at a later stage of research. As an example, a speaker in a test file that I've aligned is often transcribed as using TH-alveolarization by the aligner (e.g. 'there' as [zɛː], 'third' as [sɜː]). Auditorily I think that this speaker actually is producing dental fricatives, but the way that they are treated by the aligner means that if at a later stage someone wanted to use the corpus to consider TH-alveolarization, TH-stopping, or TH-fronting in our data, it would be very difficult to find all cases where this might occur. Similarly, we might consider ING-variation later, and in our data auditorily-identified variants include not only [ɪn] and [ɪŋ], but also [ɪŋk], but currently the aligner transcribes most of these as [ɪn], making it harder to find potential variable cases later in the research.
It occurs to me that it's likely that a previous MFA version included a model for UK English trained with a dictionary that used a less opinionated phone set, included fewer pronunciation variants in the dictionary, and did not use phonological rules that are difficult to reverse. If this is the case, I was wondering if this is something you'd be willing to share. I understand the motivation for the current more opinionated phone set and phonological rules, but when applying existing dictionaries and acoustic models to new data and non-standard varieties, I think a lot of people based in sociolinguistics/sociophonetics would find it very useful to have access to a dictionary and acoustic model that produce a more phonemic-style output. This is available for American English in the form of the ARPA dictionary, but not for other varieties.
Thanks in advance for your time and help with this.
Originally posted by praat-enthusiast February 28, 2024
I'm aware that recent versions of MFA IPA dictionaries follow the opinionated phone set laid out here, which produces a more allophonic transcription. However, I'm based in sociolinguistics and for the project I'm currently working on we would be quite interested to end up with more phonemic or broad phonetic transcription (essentially a version with all of the rules described here reversed).
I was wondering if anyone has a version of the IPA dictionary for UK English which doesn't have the rules described implemented, or has already created a script of some kind to get back to a more standard phonemic transcription after alignment? As I understand it, the current acoustic models have been trained with the dictionaries that use the opinionated phone set, and the allophonic detail in the models and dictionaries improves the alignment, so we would likely be aligning using this phone set and then trying to revert back to a phonemic transcription afterwards. If anyone has attempted this, or has access to a previous version of the dictionary which doesn't implement the new phone set, I'd really appreciate it if you'd be willing to share this with me! I believe that a more phonemic-like dictionary once existed for the US English IPA dictionary at least, as it appears to be mentioned here.
If no one has attempted this already, I'm planning to write a script that will reverse-engineer the rules and produce a more phonemic transcription - I'll share it here if this attempt is successful!
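As a starting point for such a script, here is a minimal sketch (my own approach, not anything shipped with MFA) that discards the aligner's variant choice and substitutes the word's canonical pronunciation from a separate phonemic dictionary. The assumption that the first listed pronunciation is the canonical one would need checking against the actual dictionary used:

```python
def phonemicize(word, surface_phones, phonemic_dict):
    """Map an aligned (allophonic) pronunciation back to a phonemic one.

    phonemic_dict maps a word to a list of pronunciations (each a list
    of phones); the first entry is assumed to be canonical. OOV words
    keep their surface transcription unchanged.
    """
    variants = phonemic_dict.get(word)
    if variants:
        return variants[0]
    return surface_phones
```

Note that this deliberately throws away within-word variant detail: when the phonemic and surface transcriptions differ in phone count, the phone boundaries from the alignment no longer map one-to-one, so boundaries would need to be re-derived or interpolated.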
Hi, I found that using mandarin_pinyin_g2p.zip to extract pinyin phonemes drops repeated tokens. How can I avoid this?
Example input: shi4 yi1 jia1 zhi4 yao4 gong1 si1 de5 duan3 qi1 gong1
Expected result: sh ii4 i1 j ia1 zh ii4 iao4 g o1 ng s ii1 d e5 d ua3 n q i1 g o1 ng
But I got: sh ii4 i1 j ia1 zh ii4 iao4 g o1 ng s ii1 d e5 d ua3 n q i1 (the phones for the final, repeated gong1 are missing)
Looking forward to your reply
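One workaround, assuming the deduplication happens because the G2P treats its input as a unique word list: run the G2P once over the deduplicated syllables, then re-expand the resulting syllable-to-phones mapping over the original token sequence so repeated syllables keep their pronunciation. A sketch:

```python
def expand_pronunciations(tokens, g2p_map):
    """Re-expand per-syllable pronunciations over the original sequence.

    tokens: the original (possibly repeating) pinyin syllable sequence.
    g2p_map: mapping from unique syllable to its phone string, as
    obtained by running the G2P once on the deduplicated token list.
    """
    return " ".join(g2p_map[token] for token in tokens)
```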
Ukrainian and Russian have many words that are homographs and are disambiguated in speech using syllable stress, or (in text) using context or diacritics.
Example:
до́ма: [ˈdomə]
дома́: [dɐˈma]
This is represented in the MFA dict as:
дома 0.99 0.55 0.56 1.1 d̪ o m ə
дома 0.1 0.44 1.18 0.93 d̪ ɐ m a
It would make sense to include accent markers in dict entries for compatibility with TTS systems that use auto-accenting for disambiguation at runtime - which is all of them, as far as I'm aware. Supplying accents would reduce the inherent ambiguity in the dict and eliminate the unnecessary reliance on probabilistic identification at MFA runtime, for words that are homographs.
Like so:
до́ма 0.99 0.55 0.56 1.1 d̪ o m ə
дома́ 0.1 0.44 1.18 0.93 d̪ ɐ m a
Or so:
до+ма 0.99 0.55 0.56 1.1 d̪ o m ə
дома+ 0.1 0.44 1.18 0.93 d̪ ɐ m a
Caveat: this would require transcriptions to have accents, so an extra check would need to be added to the aligner code: ignore accents in the dict and fall back to probabilities (i.e. the current behaviour) if the transcription is not accented. It is also not entirely trivial for a third party to add accents back into the dict properly; ideally this would be done during dict generation, hence this issue.
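The fallback check described above could be as simple as normalizing stress marks away before the dictionary lookup. A sketch handling both notations from the examples (the combining acute accent and the '+' marker):

```python
import unicodedata

def strip_stress(word):
    """Remove stress markers so an accented word can fall back to the
    unaccented dictionary entry: strips combining acute accents
    (U+0301) and the '+' marker used in the second notation above."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ("\u0301", "+"))
    return unicodedata.normalize("NFC", kept)
```

A dictionary keyed on `strip_stress(word)` would then serve both accented and unaccented transcriptions, with the accented form selecting the exact pronunciation when present.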
Hi, I want to use the pretrained english_mfa MFA acoustic model and dictionary for alignment. I also want to use the same dictionary for G2P (text to phonemes). What is the corresponding G2P model for transforming text into phonemes? I want to use it for TTS inference.
Thanks a lot!
Dear MFA dev,
I was wondering if it is possible to obtain probability scores from the alignment output.
I mean, the log probability (log probability density) of the aligned segment to be a particular phone.
I would like to use the method proposed by Yuan, J., & Liberman, M. (2009). Investigating /l/ variation in English through forced alignment. In Tenth Annual Conference of the International Speech Communication Association.
I looked around and it seems that this is possible in Kaldi (https://sourceforge.net/p/kaldi/discussion/1355348/thread/3a866d2a/). Apparently Penn Phonetics Lab Forced Aligner can also do it.
Many thanks,
Kevin
mfa version: 2.0.0rc4
Running mfa models download dictionary japanese_mfa on Ubuntu shows:
RemoteModelNotFoundError: Could not find a model named "japanese_mfa" for dictionary. Available: russian_mfa,
mandarin_taiwan_mfa, mandarin_mfa, mandarin_erhua_mfa, mandarin_china_mfa, german_mfa, french_mfa, english_us_mfa,
english_uk_mfa, english_nigeria_mfa, english_mfa, czech_mfa, swedish_mfa, portuguese_portugal_mfa, portuguese_mfa,
portuguese_brazil_mfa, polish_mfa, korean_mfa, korean_jamo_mfa, vietnamese_mfa, vietnamese_hue_mfa,
vietnamese_ho_chi_minh_city_mfa, vietnamese_hanoi_mfa, ukrainian_mfa, turkish_mfa, thai_mfa, swahili_mfa,
spanish_spain_mfa, spanish_mfa, spanish_latin_america_mfa, hausa_mfa, croatian_mfa, bulgarian_mfa, vietnamese_cv,
uzbek_cv, uyghur_cv, urdu_cv, ukrainian_cv, turkish_cv, thai_cv, tatar_cv, tamil_cv, swedish_cv, sorbian_upper_cv,
russian_cv, romanian_cv, punjabi_cv, portuguese_cv, polish_cv, mandarin_pinyin, maltese_cv, kyrgyz_cv, kurmanji_cv,
kazakh_cv, italian_cv, indonesian_cv, hungarian_cv, hindi_cv, guarani_cv, greek_cv, german_prosodylab,
georgian_cv, french_prosodylab, english_us_arpa, dutch_cv, czech_cv, chuvash_cv, bulgarian_cv, belarusian_cv,
basque_cv, bashkir_cv, armenian_cv, and abkhaz_cv. You can see all available models either on
https://mfa-models.readthedocs.io/en/latest/ or https://github.com/MontrealCorpusTools/mfa-models/releases. If
you're looking for a model from 1.0, please see
https://github.com/MontrealCorpusTools/mfa-models/releases/tag/dictionary-archive-v1.0.
I tried to download the dictionary from the releases page, but the page shows a 404.
Hi All,
Thank you for this amazing repo really nice work!
We wish to align transcripts and speech (English, UK + US). What is the correct way to do it?
If possible, we would prefer to use the ARPA phone set.
Thank you in advance!
@yochaiye