Comments (6)
Hi,
Thanks for your comment.
Unfortunately, it does not have a feature to put boundaries between phonemes, but I think it can be implemented in the following way depending on your situation.
- If you have the transcription for the audio:
  You can convert your transcription into the (correct) phoneme sequence, then align the recognized output with it using edit distance. The shortest (minimum-cost) alignment path will show you the boundary of each word. This should be very simple to implement.
- If you do not have a transcription for the audio:
  Then recognizing word boundaries is very similar to a normal speech recognition task, because you need to know the underlying words. There are several ways to do it. If the vocabulary is limited, you can create a search graph (lattice) and search over it with the output phonemes. If you do not know the vocabulary, then you probably need to rely on WFST or neural network decoders, which are not very easy to implement.
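For the first case, here is a minimal Python sketch of the edit-distance alignment. It is a plain Levenshtein alignment with a backtrace, nothing allosaurus-specific. It assumes you already have a pronunciation for every word of your transcript in the same IPA phone set that allosaurus outputs; the `align` and `word_spans` names and the `words` input format are hypothetical. Each recognized phone is credited to the word of the reference phone it is aligned with, and extra (inserted) phones go to the preceding word. That is one arbitrary choice among several.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def align(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Levenshtein alignment of two phone lists.

    Returns (ref_index, hyp_index) pairs along one minimum-cost path;
    None on one side marks a deletion or an insertion.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitution
                             cost[i - 1][j] + 1,        # reference phone not recognized
                             cost[i][j - 1] + 1)        # extra recognized phone
    # Walk back from the bottom-right corner to recover one shortest path.
    path: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            path.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            path.append((i - 1, None))
            i -= 1
        else:
            path.append((None, j - 1))
            j -= 1
    path.reverse()
    return path


def word_spans(words: List[Tuple[str, List[str]]],
               hyp: List[str]) -> List[Tuple[str, Optional[Tuple[int, int]]]]:
    """For each (word, pronunciation), return the first and last index in hyp
    aligned to it, or None if none of its phones were recognized."""
    if not words:
        return []
    ref: List[str] = []
    owner: List[int] = []  # owner[k] = index of the word that ref[k] belongs to
    for w_idx, (_, phones) in enumerate(words):
        ref.extend(phones)
        owner.extend([w_idx] * len(phones))
    spans: List[Optional[List[int]]] = [None] * len(words)
    # Extra recognized phones are credited to the preceding word
    # (or to the first word if they come before it).
    current = 0
    for r, h in align(ref, hyp):
        if r is not None:
            current = owner[r]
        if h is None:
            continue
        span = spans[current]
        if span is None:
            spans[current] = [h, h]
        else:
            span[1] = h
    return [(w, (s[0], s[1]) if s else None) for (w, _), s in zip(words, spans)]
```

If the recognizer gives you a space-separated phone string, `output.split()` produces the `hyp` list expected here. If you also have per-phone timestamps, each index span maps directly to a time span for the word.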
Thanks!
Thank you for your suggestions.
I don't have transcripts and was wondering if there is an easy way to convert these continuous phoneme sequences to text?
Do you think that option 2 would work in my case?
I don't think there is an easy way to convert phoneme sequences to text, because there are lots of ambiguities in the phoneme outputs. You can implement a search graph with a pronunciation lexicon, but it might not give you very good results (because the outputs typically do not perfectly match your lexicon).
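A brute-force version of that lexicon search can be sketched as a dynamic program over the recognized phones: choose the segmentation into lexicon words that minimizes the total edit distance between each word's pronunciation and the phone span it covers. This is a simplified stand-in for a proper lattice or WFST search, not something allosaurus provides. The lexicon, the `slack` parameter (how many phones longer or shorter than its pronunciation a word's span may be), and the function names are assumptions for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j - 1] + (a[i - 1] != b[j - 1]),  # match / substitution
                         prev[j] + 1,                          # deletion
                         cur[j - 1] + 1)                       # insertion
        prev = cur
    return prev[-1]


def segment(hyp: List[str], lexicon: Dict[str, List[str]],
            slack: int = 1) -> Optional[Tuple[int, List[str]]]:
    """Split hyp into lexicon words with the lowest total edit distance.

    A word may cover a span of len(pron) - slack .. len(pron) + slack phones.
    Returns (total_cost, words), or None if hyp cannot be covered at all.
    """
    n = len(hyp)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[j] = lowest cost of explaining hyp[:j]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)  # (span start, word)
    best[0] = 0
    for j in range(1, n + 1):
        for word, pron in lexicon.items():
            lo = max(0, j - len(pron) - slack)
            hi = min(j - 1, j - len(pron) + slack)  # spans must be non-empty
            for i in range(lo, hi + 1):
                if best[i] == inf:
                    continue
                c = best[i] + edit_distance(pron, hyp[i:j])
                if c < best[j]:
                    best[j], back[j] = c, (i, word)
    if best[n] == inf:
        return None
    # Follow the back-pointers from the end to recover the chosen words.
    words: List[str] = []
    j = n
    while j > 0:
        i, word = back[j]
        words.append(word)
        j = i
    words.reverse()
    return int(best[n]), words
```

For example, with a lexicon containing "ice", "cream", "i", and "scream", the phones `aɪ s k ɹ i m` are explained at zero cost by both "ice cream" and "i scream"; the sketch simply keeps whichever it finds first. Without a language model to break such ties, this is the kind of ambiguity that makes phoneme-to-text conversion unreliable.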
If you want good word boundaries, I suggest you use an existing, good speech recognition model to recognize the words, then convert the word outputs to phonemes afterwards. That is probably an easier way to obtain high-quality boundaries than this tool.
I was actually looking to somehow use allosaurus for an ASR task.
Given the recognized phonemes from audio, I want to predict the true transcriptions. But I think that would be challenging due to ambiguities in the phoneme outputs, as you mentioned above.
In case you have some ideas, I would love to discuss. Otherwise, I can close this ticket.
Yeah, if you have a training set, you can probably use this tool to transcribe all the audio and then train another seq2seq model to map the phonemes to your transcriptions. This is probably the easiest way.
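Preparing the data for that approach could look like the minimal sketch below: run the recognizer over each training utterance, pair its phone sequence with the reference transcript, then build token vocabularies for a seq2seq model. The `recognize` argument is assumed to be a function mapping an audio path to a space-separated phone string; with allosaurus that would be something like `read_recognizer().recognize` from `allosaurus.app`, as shown in its README. The helper names and special tokens are hypothetical, and the seq2seq model itself (for example a Transformer in your framework of choice) is left out.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[List[str], List[str]]
SPECIALS = ("<pad>", "<s>", "</s>", "<unk>")  # hypothetical special tokens


def build_parallel_corpus(items: Iterable[Tuple[str, str]],
                          recognize: Callable[[str], str]) -> List[Pair]:
    """Turn (audio_path, transcript) pairs into (phone tokens, word tokens)."""
    corpus = []
    for audio_path, transcript in items:
        phones = recognize(audio_path).split()  # assumes space-separated phones
        words = transcript.lower().split()
        corpus.append((phones, words))
    return corpus


def build_vocab(seqs: Iterable[List[str]]) -> Dict[str, int]:
    """Map every token to an integer id; special tokens come first."""
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for seq in seqs:
        for tok in seq:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(seq: List[str], vocab: Dict[str, int], add_bos_eos: bool = False) -> List[int]:
    """Convert tokens to ids; unknown tokens map to <unk>."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in seq]
    if add_bos_eos:
        ids = [vocab["<s>"]] + ids + [vocab["</s>"]]
    return ids
```

Word-level targets are used here for simplicity; character-level targets are a common alternative when the training set is small, since they avoid out-of-vocabulary words at test time.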
Thank you for the suggestions. I am closing the ticket.
Related Issues (20)
- Prior.txt file path
- Optimizing for Latency
- support for python 3.10
- Not able to transcribe simple word what in English
- more model for recognition
- The timestamp of model 'interspeech21' is incorrect
- Unable to run interspeech21 model
- Feature normalization can cause NaN to appear
- Directory Name con not allowed on Windows
- NumPy requirement is less than 1.22 and latest is 1.19.5
- Difference in outputs of splitted v/s unsplitted audio file
- Wave error for given sample
- Any way to add new languages?
- UnicodeEncodeError: 'charmap' codec can't encode character '\u02d0' in position 28 when redirecting in WIndows
- Content of fine-tuning files?
- AttributeError: 'PosixPath' object has no attribute 'startswith'
- Fix setup.py
- Phone inventory always the default one even after specifying model eng2102 and lang eng
- Is there any way of getting arpabet phonetic transcription for hindi language?
- How long does it theoretically take for "allosaurus" to recognize phonemes?