aashishag / deepspeech-german
Automatic Speech Recognition (ASR) - German
License: Apache License 2.0
Hello,
at first: thank you very much for your effort to provide this highly automated train script.
Sadly I'm a real Windows guy and my Linux skills are very limited. I've installed Debian on Windows and tried to get this up and running, but it looks like this will not work on Linux on Windows (the installation of the prerequisites fails)... And setting up a whole new Linux system feels like a bit of overhead.
Would it be possible to provide a pre trained model for German language?
Thank you very much
Carl
For the v0.6.0 model, you mention the duration of TUDA-de dataset to be 184 hours and Voxforge dataset to be 57 hours.
Having checked the links for both, it seems that Voxforge has about 35 hours of data and TUDA-de seems to be about the same.
I'm using the links provided in the README:
Do I have the wrong links somehow?
Hello guys,
In your paper you say that this model was trained using DeepSpeech 0.5.0, but the Python requirements say deepspeech==0.2.0a8. How does this work then?
Hi Aashish,
Thanks for the repo. Great work.
I want to use the DeepSpeech pre-trained model (English), but I have 8 kHz audio files. One option is to upsample the audio files to 16 kHz and then use the pre-trained model (0.6.1).
Would you know of any other method where I can still use the 8 kHz audio files with the pre-trained DeepSpeech model?
Hello, I cannot use a Linux environment, so I want to run the training on Windows. But I am getting an error which I am not able to resolve:
Traceback (most recent call last):
File "E:/deepspeech-german-master/DeepSpeech/training/deepspeech_training/train.py", line 30, in <module>
from DeepSpeech.native_client.ctcdecode import ctc_beam_search_decoder, Scorer
File "E:\deepspeech-german-master\DeepSpeech\native_client\ctcdecode\__init__.py", line 3, in <module>
from . import swigwrapper # pylint: disable=import-self
ImportError: cannot import name 'swigwrapper' from 'DeepSpeech.native_client.ctcdecode' (E:\deepspeech-german-master\DeepSpeech\native_client\ctcdecode\__init__.py)
I ran the commands for the KenLM language model in a Cygwin terminal and it worked pretty well. But while training, I get this error. Is it Windows-specific? Please help!
Hi there,
Your German model is excellent and I want to use it for some research I am doing (your work will be appropriately cited :) ).
I need to train your german model for DS 0.4.1 (unfortunately I cannot use newer versions) and I am hitting an issue when it comes to bazel builds.
(deepspeech-german) clio@Trantor:~/Desktop/deepspeech-german/tensorflow$ bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie
INFO: Analysed 2 targets (0 packages loaded).
INFO: Found 2 targets...
ERROR: /home/clio/Desktop/deepspeech-german/tensorflow/native_client/BUILD:126:1: C++ compilation of rule '//native_client:generate_trie' failed (Exit 1)
native_client/ctcdecode/swigwrapper_wrap.cpp:174:11: fatal error: Python.h: No such file or directory
# include <Python.h>
^~~~~~~~~~
compilation terminated.
INFO: Elapsed time: 0.993s, Critical Path: 0.71s
INFO: 4 processes: 4 local.
FAILED: Build did NOT complete successfully
I followed your instructions and adapted everything to tf r1.12 and bazel 0.15.0 (the necessary versions for DS 0.4.1), but something about the git file disrupts the build.
I checked the tf versions
(deepspeech-german) clio@Trantor:~/Desktop/deepspeech-german$ python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
v1.12.0-0-ga6d8ffae09 1.12.0
and there is some semantic incompatibility although they point to the same version. Did you ever encounter a problem like this?
I tried to install deepspeech-german on Debian stretch and get:
On installing
pip3 install -r python_requirements.txt
I get
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/ds-ctcdecoder/
URL https://pypi.org/simple/ds-ctcdecoder/ -> 404
What now?
...and therefore results in:
AttributeError: 'Splitter' object has no attribute 'split_by_length_of_utterances'
A possible fix could be replacing it with split. That worked for me, at least.
The correction for deepspeech-german/pre-processing/prepare_data.py would be:
splits = splitter.split(proportions={'train': 0.7, 'dev': 0.15, 'test': 0.15}, separate_issuers=True)
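For intuition, the proportions argument simply partitions the corpus by fractions; ignoring the issuer grouping that separate_issuers=True adds, the behaviour is roughly this sketch (my own illustration, not audiomate's actual implementation):

```python
def proportional_split(items, proportions):
    """Split a list into named partitions by fractional proportions,
    e.g. {'train': 0.7, 'dev': 0.15, 'test': 0.15}. The last partition
    absorbs any rounding remainder so no item is dropped."""
    names = list(proportions)
    splits, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(items)  # remainder goes into the final split
        else:
            end = start + round(len(items) * proportions[name])
        splits[name] = items[start:end]
        start = end
    return splits
```

The real audiomate Splitter additionally keeps all utterances of one speaker (issuer) inside the same partition, which matters for honest test scores.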
Hello,
I'm trying to execute test_model.sh, but I keep running into version problems. The scripts called by test_model.sh keep trying to use a quite old version of ds_ctcdecoder, which is not installable on any of the platforms I've tried so far. Using a newer version of ds_ctcdecoder leads to one error after another.
Using DeepSpeech 0.6.0 instead of the default 0.5.0 also leads to errors.
Could you tell me on which platform it is possible to execute this code? I have tried Ubuntu 18.04, a tensorflow 1.13.1 docker container, and a deepspeech docker container by Mozilla so far, all to no avail.
Running the rest of the code in the README works fine using a conda environment with tensorflow-gpu 1.13.1.
The problems only occur when I try to run test_model.sh.
Thank you for your help!
Thanks for sharing your model.
I have some questions I would like to ask:
Hello,
I'm relatively new to DeepSpeech. I'm trying to replace the online speech recognition I currently use with DeepSpeech, but the results I'm getting at the moment are very far away from (for example) Azure Speech Recognition. Perhaps I'm feeding the data in the wrong format or something like that.
What I've learned so far is that there could be a language-dependent scorer file, but I can't find one for the 0.9 version. Can you provide that file? Or is it not required?
Can you provide some information about the required input? At the moment I'm using WAV data at 16000 samples per second, 16-bit sample size, and one channel... is that correct?
Thank you very much for your help
Carl
When executing the command
pip3 install -r python_requirements.txt
I get the following message:
ERROR: Cannot install -r python_requirements.txt (line 12), -r python_requirements.txt (line 4), -r python_requirements.txt (line 47), -r python_requirements.txt (line 54), -r python_requirements.txt (line 56), h5py==2.10.0, keras-preprocessing==1.1.2, librosa==0.7.2, numba==0.49.1, numpy==1.18.1, opt-einsum==3.3.0, resampy==0.2.2 and scipy==1.4.1 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested numpy==1.18.1
audiomate 6.0.0 depends on numpy==1.18.1
ds-ctcdecoder 0.9.3 depends on numpy>=1.14.5
h5py 2.10.0 depends on numpy>=1.7
keras-preprocessing 1.1.2 depends on numpy>=1.9.1
librosa 0.7.2 depends on numpy>=1.15.0
numba 0.49.1 depends on numpy>=1.15
opt-einsum 3.3.0 depends on numpy>=1.7
resampy 0.2.2 depends on numpy>=1.10
scikit-learn 0.24.0 depends on numpy>=1.13.3
scipy 1.4.1 depends on numpy>=1.13.3
tensorboard 2.4.0 depends on numpy>=1.12.0
tensorflow 2.4.0 depends on numpy~=1.19.2
To fix this you could try to:
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
So, audiomate 6.0.0 requires numpy==1.18.1, while tensorflow 2.4.0 depends on numpy~=1.19.2.
I tried installing
pip install audiomate tensorflow
and this succeeds, with tensorflow 2.3.2 being installed. But I don't know whether that would work with deepspeech-german.
Any suggestions for resolving this?
Hey,
first of all: Many thanks for all the efforts from everyone until now.
I'm not that good with all the command-line stuff, but I have powerful servers for rendering with Blender.
Would there be an option for a really simple set of instructions for creating / training the German model?
(Preferably a shell script to just start on a Linux server?)
Many thanks
Robin
Dear @AASHISHAG ,
thanks a lot for this awesome repository. I am currently trying to export the version 0.9.0 model to the OpenVINO toolkit.
So far, with the help of OpenVINO support, I was able to convert the TensorFlow model to the optimized format. It's running with a demo from the open_vino_model_zoo, but the results are far behind those I get with the Mozilla DeepSpeech example.
Part of the problem, I think, is the alpha and beta parameters for the language model. How do I find the best values? So far I have seen the best results by setting both to 0. There are values in the flags.txt file in your Google Drive, but using those there isn't any output at all.
I don't know if this is of any interest to you or this repository, but I would be very thankful if you could help me out or point me in the right direction.
Greetings
Orys
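On the alpha/beta question above: a plain grid search over a held-out dev set is the usual approach (DeepSpeech also ships an lm_optimizer.py for this). The skeleton below is generic; evaluate_wer is an assumed callable that you supply, which would set the weights (e.g. via Model.setScorerAlphaBeta in the deepspeech 0.9 Python API) and decode the dev set:

```python
import itertools

def grid_search_lm_weights(evaluate_wer, alphas, betas):
    """Evaluate every (alpha, beta) pair and return the pair with the
    lowest word error rate together with that WER.
    evaluate_wer(alpha, beta) -> float is provided by the caller."""
    best_pair, best_wer = None, float("inf")
    for alpha, beta in itertools.product(alphas, betas):
        wer = evaluate_wer(alpha, beta)
        if wer < best_wer:
            best_pair, best_wer = (alpha, beta), wer
    return best_pair, best_wer
```

Since both weights only affect beam rescoring, each evaluation can reuse the same acoustic-model logits, which keeps the search cheap compared to re-decoding from audio.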
While installing, when using pip3 install -r python_requirements.txt, I get:
ERROR: Cannot install -r python_requirements.txt (line 12), -r python_requirements.txt (line 20), -r python_requirements.txt (line 25), -r python_requirements.txt (line 26), -r python_requirements.txt (line 31), -r python_requirements.txt (line 34), -r python_requirements.txt (line 4), -r python_requirements.txt (line 45), -r python_requirements.txt (line 47), -r python_requirements.txt (line 48), -r python_requirements.txt (line 54), -r python_requirements.txt (line 56) and numpy==1.18.1 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested numpy==1.18.1
audiomate 6.0.0 depends on numpy==1.18.1
ds-ctcdecoder 0.9.3 depends on numpy>=1.14.5
h5py 2.10.0 depends on numpy>=1.7
keras-preprocessing 1.1.2 depends on numpy>=1.9.1
librosa 0.7.2 depends on numpy>=1.15.0
numba 0.49.1 depends on numpy>=1.15
opt-einsum 3.3.0 depends on numpy>=1.7
resampy 0.2.2 depends on numpy>=1.10
scikit-learn 0.24.0 depends on numpy>=1.13.3
scipy 1.4.1 depends on numpy>=1.13.3
tensorboard 2.4.0 depends on numpy>=1.12.0
tensorflow 2.4.0 depends on numpy~=1.19.2
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict
Hi,
I am simply looking for a pbmm and scorer file for German. It drives me crazy because I could find them in French (https://github.com/Common-Voice/commonvoice-fr/releases), and they work like a charm, but I can't find the German equivalent.
My poor computing power and network bandwidth (to download the CV datasets, for example) don't allow me to train my own model, and honestly, I don't want to (lazy like a good IT guy :)
Someone has probably uploaded those files somewhere already; any information would be appreciated!
Cheers :)
Hi, I want to use your pretrained model for German. I am not really sure how to use it on Windows. And is it going to work on a normal computer?
What would need to be changed so that punctuation and German capitalization (Groß- und Kleinschreibung) are integrated?
Please answer in German.
Have you already tried to improve the German model with both of these datasets?
I have seen that it was an option in pre-processing... I am very curious about whether this could improve the model, but I do not have the computing power and the in-depth model-development knowledge. Here is a Kaldi-based example where this dataset combination worked very well: https://github.com/uhh-lt/kaldi-tuda-de
Looking forward to hearing from you!
All the best
Fabian
Hi @AASHISHAG, and thanks for open-sourcing your work!
I'm wondering if you might have some performance stats (e.g. WER and CER) for more recent versions of the model; even just the most recent v0.9.0 would be good.
Thanks!
Building the language model with create_language_model.sh ended in an HTTP 404 error:
Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.deepspeech.native_client.v0.9.3.cpu/artifacts/public/native_client.tar.xz ...
Traceback (most recent call last):
File "DeepSpeech/util/taskcluster.py", line 12, in <module>
dsu_taskcluster.main()
File "/var/share/deepspeech-german/DeepSpeech/training/deepspeech_training/util/taskcluster.py", line 128, in main
maybe_download_tc(target_dir=args.target, tc_url=get_tc_url(args.arch, args.artifact, args.branch))
File "/var/share/deepspeech-german/DeepSpeech/training/deepspeech_training/util/taskcluster.py", line 58, in maybe_download_tc
_, headers = urllib.request.urlretrieve(tc_url, target_file, reporthook=(report_progress if progress else None))
File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
I need a hint on where to look for a solution.
The DeepSpeech project seems to be unmaintained now.
The fork Coqui, made by Mozillians from the original team, is maintained. It might be fully compatible. Is it possible to re-run the training with your AI power against this new project?
Could this be made into a dockerfile?
Hi,
first of all - thank you for your work. Are there any WER results for the newer models aviable, that you released after the initial paper?
Best Regards,
Oliver
Hi, during the past months there has been massive growth in the German-language dataset at Mozilla Common Voice. Currently there are 1035 hrs, so it might be worth updating the link in the readme.md?
Regarding transcribe.py:
During recognition, can the confidence of each recognition also be output, so that I can specifically review the utterances recognized with low confidence?
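In the deepspeech 0.9 Python API, Model.sttWithMetadata returns candidate transcripts, each carrying a confidence score (an unnormalized log-scale value, so useful thresholds are data-dependent, and the exact wiring here is an assumption on my part). A sketch for routing low-confidence utterances to manual review, given pairs already collected from that API:

```python
def split_by_confidence(results, threshold):
    """Partition (text, confidence) pairs into those at or above the
    threshold and those below it, so only the uncertain utterances
    need manual checking."""
    accepted = [r for r in results if r[1] >= threshold]
    needs_review = [r for r in results if r[1] < threshold]
    return accepted, needs_review
```

Because the score is unnormalized, a sensible threshold is usually picked empirically by inspecting the score distribution on a small labeled sample.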
When I follow all the steps in the README and get to python pre-processing/download_speech_corpus.py --tuda --cv --swc --voxforge --mailabs, I get this error:
Traceback (most recent call last):
File "pre-processing/download_speech_corpus.py", line 6, in <module>
import audiomate.corpus.io
File "/deepspeech-german/python-environments/lib/python3.7/site-packages/audiomate/corpus/io/__init__.py", line 7, in <module>
from .downloader import ArchiveDownloader
File "/deepspeech-german/python-environments/lib/python3.7/site-packages/audiomate/corpus/io/downloader.py", line 7, in <module>
from audiomate.utils import download
File "/deepspeech-german/python-environments/lib/python3.7/site-packages/audiomate/utils/download.py", line 11, in <module>
from pget.down import Downloader
File "/deepspeech-german/python-environments/lib/python3.7/site-packages/pget/__init__.py", line 1, in <module>
from down import Downloader
ModuleNotFoundError: No module named 'down'
Hi, DS EN 0.9 is available, which also changes some details of the model architecture:
https://github.com/mozilla/DeepSpeech/releases/tag/v0.9.0
There is also the Common Voice dataset from 2020-06-22, shipping with 750 hrs (690 hrs confirmed) of audio files for training:
https://commonvoice.mozilla.org/de/datasets
Do you think you could give updating it a try?
When loading the tflite version in deepspeech-tflite version 0.9.2 (and also 0.9.0), I get a segmentation fault (SIGSEGV). Can this be replicated?
from deepspeech import Model
model = 'output_graph.tflite'
ds = Model(model)
What could be the issue? I do not see any stack trace or useful information on how to find the source of the problem.
Following your guidelines, I encountered the following error when trying to build the language model:
./create_language_model.sh
Training package is not installed. See training documentation.
...
Traceback (most recent call last):
File "DeepSpeech/util/taskcluster.py", line 7, in <module>
from deepspeech_training.util import taskcluster as dsu_taskcluster
ImportError: No module named deepspeech_training.util
... but the requested file seems to be in the right place?!
I am using the 0.10.0-alpha.3 nuget [libdeepspeech.so], deepspeech-0.9.3-models.pbmm, and arctic_a0024.wav, and I am able to get the right result.
However, when I use output_graph_de.pbmm, which I understand from polygot is for the 0.7 version, I could not get a result.
Please advise.
Thank you
I think I found the reason why my Tuda results are much worse than the results in your paper: it's the random splitting. I switched to the splitting suggested by Tuda some time ago, for better comparability with some other papers.
As you write in your paper, each Tuda sample was recorded with 4 different microphones. If you split the data randomly, instead of using the suggested splitting, you will get a lot of test samples where a sample with the same transcription is already in the training dataset, only recorded with a different microphone.
Training with a random split resulted in a WER of 0.09, while training with the same parameters but with the correct splitting reaches only 0.42 WER.
I think this also has a strong influence on all of the other trainings with combined datasets that include Tuda.
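The leakage-free split described above amounts to grouping recordings by transcript before partitioning, so that all microphone takes of a sentence land on the same side. A sketch of the idea (my own illustration, not the Tuda tooling):

```python
import random
from collections import defaultdict

def split_without_transcript_leakage(samples, test_fraction, seed=0):
    """samples: iterable of (utterance_id, transcript) pairs.
    Recordings that share a transcript (e.g. the same Tuda sentence
    captured by four microphones) are kept together, so no transcript
    appears in both the train and test partitions."""
    groups = defaultdict(list)
    for uid, text in samples:
        groups[text].append(uid)
    keys = sorted(groups)                 # deterministic base order
    random.Random(seed).shuffle(keys)     # reproducible shuffle
    n_test = max(1, int(len(keys) * test_fraction))
    test = [u for k in keys[:n_test] for u in groups[k]]
    train = [u for k in keys[n_test:] for u in groups[k]]
    return train, test
```

The same grouping trick applies to speakers: splitting by issuer rather than by utterance prevents the analogous speaker-leakage inflation of test scores.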
Not entirely sure if I am doing things right, but: I installed with
pip3 install deepspeech==0.6.1
and then ran deepspeech with parameters and input audio. Result:
$ deepspeech --model ../model/0.6/output_graph.pb --lm ../model/0.6/lm.binary --trie ../model/0.6/trie --audio ../../test.wav
Loading model from file ../model/0.6/output_graph.pb
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
ERROR: Model provided has model identifier 'inpu', should be 'TFL3'
Error at reading model file ../model/0.6/output_graph.pb
Traceback (most recent call last):
File "/home/pi/deepspeech/deepspeech-6-venv/venv/bin/deepspeech", line 10, in <module>
sys.exit(main())
File "/home/pi/deepspeech/deepspeech-6-venv/venv/lib/python3.7/site-packages/deepspeech/client.py", line 113, in main
ds = Model(args.model, args.beam_width)
File "/home/pi/deepspeech/deepspeech-6-venv/venv/lib/python3.7/site-packages/deepspeech/__init__.py", line 42, in __init__
raise RuntimeError("CreateModel failed with error code {}".format(status))
RuntimeError: CreateModel failed with error code 12288
Not so sure what is going wrong here. Any advice?
Could you please add a license to your project?
I would be very glad if you choose one compatible with LGPL (I changed my project's license recently).
Hi, first of all thank you so much for providing this awesome project!
I've installed DeepSpeech 0.5.0 and ran this command:
deepspeech --model model-de/model_tuda+voxforge+mozilla.pb --alphabet model-de/alphabet.txt --lm model-de/lm.binary --trie model-de/trie --audio weihnachtsmann.wav
I generated the WAV file from an MP3 which I got from dict.cc via curl:
curl 'https://audio.dict.cc/speak.audio.v2.php?error_as_text=1&type=mp3&id=91424&lang=rec&lp=DEEN' > weihnachtsmann.mp3
then converted with ffmpeg via
ffmpeg -i weihnachtsmann.mp3 -acodec pcm_s16le -ac 1 -ar 16000 weihnachtsmann.wav
and the output is this: nach man
I've tried another WAV file for the same word:
curl 'https://audio.dict.cc/speak.audio.v2.php?error_as_text=1&type=mp3&id=655858&lang=de&lp=DEEN&voice=Hans' > weihnachtsmann.mp3
And the result was reinach zum.
I was expecting that the word is very simple, but the result is not really usable. Is the model not trained well enough? I read in the paper that the error rate was quite low, so I am wondering why this simple word cannot be recognized.
Maybe it was converted with the wrong ffmpeg settings?