
deepspeech-german's Introduction

Hi 👋🏻, I'm Aashish

A passionate Speech Recognition Engineer and Software Developer


deepspeech-german's People

Contributors

aashishag, andife, kaoh, leonheess, ocastx


deepspeech-german's Issues

Request for providing a pretrained model

Hello,

First of all: thank you very much for your effort to provide this highly automated training script.

Sadly I'm a real Windows guy and my Linux skills are very "limited". I've installed Debian on Windows and tried to get this up and running, but it looks like this will not work on Linux on Windows (the installation of prerequisites fails)... And setting up a whole new Linux system feels like a bit of overhead.

Would it be possible to provide a pre-trained model for the German language?

Thank you very much

Carl

About dataset duration for v0.6.0

For the v0.6.0 model, you mention the duration of the TUDA-de dataset to be 184 hours and that of the Voxforge dataset to be 57 hours.

Having checked the links for both, it seems that Voxforge has about 35 hours of data and TUDA-de seems to be about the same.
I'm using the links provided in the README:

  1. Voxforge: http://www.voxforge.org/home/forums/other-languages/german/open-speech-data-corpus-for-german
  2. TUDA-de: https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/acoustic-models.html

Do I have the wrong links somehow?

DeepSpeech model used

Hello guys,

In your paper you say that this model was trained using DeepSpeech 0.5.0, but the Python requirements say deepspeech==0.2.0a8.

How does this work then?

Using the pre-trained English DeepSpeech model with 8 kHz audio files

Hi Aashish,
Thanks for the repo. Great work.

I want to use the pre-trained DeepSpeech model (for English), but I have 8 kHz audio files. One option is to upsample the audio files to 16 kHz and then use the pre-trained model (0.6.1).

Would you know any other method where I can still use the 8 kHz audio files with the pre-trained DeepSpeech model?
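
For the upsampling route, here is a minimal sketch, assuming librosa and soundfile are installed (both appear in the repo's python_requirements.txt); the file names are placeholders:

# Upsample an 8 kHz recording to the 16 kHz mono, 16-bit PCM format that the
# pre-trained DeepSpeech models expect. File names are placeholders.
import librosa
import soundfile as sf

audio_8k, _ = librosa.load("input_8khz.wav", sr=8000, mono=True)
audio_16k = librosa.resample(audio_8k, orig_sr=8000, target_sr=16000)
sf.write("output_16khz.wav", audio_16k, 16000, subtype="PCM_16")

Note that upsampling cannot restore frequency content above 4 kHz, so accuracy with a 16 kHz model will usually stay below what genuine 16 kHz recordings achieve.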

Training on Windows

Hello, I cannot use a Linux environment, so I want to run the training on Windows. But I am getting an error which I am not able to resolve:
Traceback (most recent call last):
  File "E:/deepspeech-german-master/DeepSpeech/training/deepspeech_training/train.py", line 30, in <module>
    from DeepSpeech.native_client.ctcdecode import ctc_beam_search_decoder, Scorer
  File "E:\deepspeech-german-master\DeepSpeech\native_client\ctcdecode\__init__.py", line 3, in <module>
    from . import swigwrapper # pylint: disable=import-self
ImportError: cannot import name 'swigwrapper' from 'DeepSpeech.native_client.ctcdecode' (E:\deepspeech-german-master\DeepSpeech\native_client\ctcdecode\__init__.py)

I ran the commands for the KenLM language model in a Cygwin terminal and they worked pretty well. But while training, I am getting this error. Is it Windows-specific? Please help!

Trying to train the German model for DS 0.4.1

Hi there,

Your German model is excellent and I want to use it for some research I am doing (your work will be appropriately cited :) ).

I need to train your German model for DS 0.4.1 (unfortunately I cannot use newer versions) and I am hitting an issue when it comes to Bazel builds.

(deepspeech-german) clio@Trantor:~/Desktop/deepspeech-german/tensorflow$ bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie
INFO: Analysed 2 targets (0 packages loaded).
INFO: Found 2 targets...
ERROR: /home/clio/Desktop/deepspeech-german/tensorflow/native_client/BUILD:126:1: C++ compilation of rule '//native_client:generate_trie' failed (Exit 1)
native_client/ctcdecode/swigwrapper_wrap.cpp:174:11: fatal error: Python.h: No such file or directory
 # include <Python.h>
           ^~~~~~~~~~
compilation terminated.
INFO: Elapsed time: 0.993s, Critical Path: 0.71s
INFO: 4 processes: 4 local.
FAILED: Build did NOT complete successfully

I followed your instructions and adapted everything to TF r1.12 and Bazel 0.15.0 (the versions required for DS 0.4.1), but something about the git file disrupts the build.
I checked the TF versions:

(deepspeech-german) clio@Trantor:~/Desktop/deepspeech-german$ python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

v1.12.0-0-ga6d8ffae09 1.12.0

and there is some semantic incompatibility although they point to the same version. Did you ever encounter a problem like this?

split_by_length_of_utterances is no longer available in audiomate

...and therefore results in:
AttributeError: 'Splitter' object has no attribute 'split_by_length_of_utterances'

A possible fix could be replacing it with split. That worked for me, at least.

Correction for deepspeech-german/pre-processing/prepare_data.py would be:
splits = splitter.split(proportions={'train': 0.7, 'dev': 0.15, 'test': 0.15}, separate_issuers=True)
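
In context, a minimal sketch of that call, assuming audiomate's Splitter from audiomate.corpus.subset; the loading step and path are placeholders, and the surrounding code in prepare_data.py may differ:

import audiomate
from audiomate.corpus import subset

# Placeholder path; load whichever combined corpus prepare_data.py builds.
corpus = audiomate.Corpus.load('corpus_dir')

splitter = subset.Splitter(corpus)
splits = splitter.split(
    proportions={'train': 0.7, 'dev': 0.15, 'test': 0.15},
    separate_issuers=True,  # keep each speaker's utterances within a single split
)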

For which platform has the code been developed?

Hello,

I’m trying to execute test_model.sh but keep running into version problems. The scripts called by test_model.sh keep trying to use a quite old version of ds_ctcdecoder which is not installable on any of the platforms I’ve tried so far. Using a newer version of ds_ctcdecoder leads to one error after another.
Using DeepSpeech 0.6.0 instead of the default 0.5.0 also leads to errors.

Could you tell me on which platform it is possible to execute this code? I have tried Ubuntu 18.04, a tensorflow 1.13.1 docker container, and a deepspeech docker container by Mozilla so far, all to no avail.

Running the rest of the code in the README works fine using a conda environment with tensorflow-gpu 1.13.1.
The problems only occur when I try to run test_model.sh.

Thank you for your help!

Infos about new 0.7.4 model

Thanks for sharing your model.


I have some questions I would like to ask:

  1. Could you please give us some information about the model performance (test loss/WER, test datasets)?
  2. Would it be possible to upload the checkpoint files too?
  3. Did you train without augmentations (I couldn't see them in the flags.txt file) and is there a reason why?
  4. Why did you change your alphabet to include numbers this time? Did it improve performance?

Scorer file for 0.9 release

Hello,

I'm relatively new to DeepSpeech. I'm trying to replace the online speech recognition I currently use with DeepSpeech, but the results I'm getting so far are very far away from those of, for example, Azure Speech Recognition. Perhaps I'm feeding the data in the wrong format or something like that.

What I've learned so far is that there "could be" a language-dependent scorer file, but I can't find one for the 0.9 version. Can you provide that file? Or is it not required?

Can you provide some information about the required input? At the moment I'm using WAV data at 16000 samples per second, 16-bit sample size, and one channel... is that correct?
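
For what it's worth, that format (16 kHz, 16-bit PCM, mono) matches what the released models expect. A minimal usage sketch with the deepspeech 0.9 Python bindings, where the model, scorer, and audio file names are placeholders:

import wave
import numpy as np
from deepspeech import Model

ds = Model("output_graph_de.pbmm")           # placeholder German acoustic model
ds.enableExternalScorer("kenlm_de.scorer")   # placeholder German scorer

with wave.open("test_de.wav", "rb") as w:
    assert w.getframerate() == ds.sampleRate()  # released models expect 16000 Hz
    assert w.getsampwidth() == 2                # 16-bit samples
    assert w.getnchannels() == 1                # mono
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))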

Thank you very much for your help

Carl

Dependency conflict with numpy

When executing the command
pip3 install -r python_requirements.txt
I get the following message:

ERROR: Cannot install -r python_requirements.txt (line 12), -r python_requirements.txt (line 4), -r python_requirements.txt (line 47), -r python_requirements.txt (line 54), -r python_requirements.txt (line 56), h5py==2.10.0, keras-preprocessing==1.1.2, librosa==0.7.2, numba==0.49.1, numpy==1.18.1, opt-einsum==3.3.0, resampy==0.2.2 and scipy==1.4.1 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested numpy==1.18.1
audiomate 6.0.0 depends on numpy==1.18.1
ds-ctcdecoder 0.9.3 depends on numpy>=1.14.5
h5py 2.10.0 depends on numpy>=1.7
keras-preprocessing 1.1.2 depends on numpy>=1.9.1
librosa 0.7.2 depends on numpy>=1.15.0
numba 0.49.1 depends on numpy>=1.15
opt-einsum 3.3.0 depends on numpy>=1.7
resampy 0.2.2 depends on numpy>=1.10
scikit-learn 0.24.0 depends on numpy>=1.13.3
scipy 1.4.1 depends on numpy>=1.13.3
tensorboard 2.4.0 depends on numpy>=1.12.0
tensorflow 2.4.0 depends on numpy~=1.19.2

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

So, audiomate 6.0.0 requires numpy==1.18.1 and tensorflow 2.4.0 depends on numpy~=1.19.2.

I tried installing
pip install audiomate tensorflow
and this succeeds, with tensorflow 2.3.2 being installed. But I don't know whether this would work with deepspeech-german.

Any suggestions for resolving this ?

Newer models?

Hey,

first of all: Many thanks for all the efforts from everyone until now.

I'm not that good with all the command-line stuff, but I have powerful servers for rendering with Blender.
Would there be an option to get really simple instructions for creating/training the German model?
(Preferably a shell script to just start it on a Linux server?)

Many thanks
Robin

Best Hyperparameters for version 0.9.0

Dear @AASHISHAG ,

thanks a lot for this awesome repository. I am currently trying to export the version 0.9.0 model to the openVINO toolkit.

So far, with the help of the openVINO support, I was able to convert the tensorflow model to the optimized format. It's running with a demo from the open_vino_model_zoo, but the results are far behind those I get with the Mozilla DeepSpeech example.
Part of the problem, I think, is the alpha and beta parameters for the language model. How do I find the best values? So far I have seen the best results by setting both to 0. There are values in the flags.txt file in your Google Drive, but using those there isn't any output at all.
I don't know if this is of any interest to you or this repository, but I would be very thankful if you could help me out or point me in the right direction.
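
As a point of reference, a minimal sketch of how the scorer weights are set with the standard deepspeech 0.9 Python bindings; file names are placeholders and the value grids are only illustrative, not recommendations. Values that work there could serve as a starting point for the openVINO pipeline:

import wave
import numpy as np
from deepspeech import Model

ds = Model("output_graph_de.pbmm")           # placeholder model file
ds.enableExternalScorer("kenlm_de.scorer")   # placeholder scorer file

with wave.open("sample_de.wav", "rb") as w:  # placeholder test utterance
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

# Coarse grid search over the language-model weights.
for alpha in (0.5, 0.75, 1.0):
    for beta in (1.0, 1.85, 2.5):
        ds.setScorerAlphaBeta(alpha, beta)
        print(alpha, beta, ds.stt(audio))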

Greetings
Orys

ERROR: Cannot install -r python_requirements.txt

While installing with pip3 install -r python_requirements.txt I get:

ERROR: Cannot install -r python_requirements.txt (line 12), -r python_requirements.txt (line 20), -r python_requirements.txt (line 25), -r python_requirements.txt (line 26), -r python_requirements.txt (line 31), -r python_requirements.txt (line 34), -r python_requirements.txt (line 4), -r python_requirements.txt (line 45), -r python_requirements.txt (line 47), -r python_requirements.txt (line 48), -r python_requirements.txt (line 54), -r python_requirements.txt (line 56) and numpy==1.18.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested numpy==1.18.1
    audiomate 6.0.0 depends on numpy==1.18.1
    ds-ctcdecoder 0.9.3 depends on numpy>=1.14.5
    h5py 2.10.0 depends on numpy>=1.7
    keras-preprocessing 1.1.2 depends on numpy>=1.9.1
    librosa 0.7.2 depends on numpy>=1.15.0
    numba 0.49.1 depends on numpy>=1.15
    opt-einsum 3.3.0 depends on numpy>=1.7
    resampy 0.2.2 depends on numpy>=1.10
    scikit-learn 0.24.0 depends on numpy>=1.13.3
    scipy 1.4.1 depends on numpy>=1.13.3
    tensorboard 2.4.0 depends on numpy>=1.12.0
    tensorflow 2.4.0 depends on numpy~=1.19.2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

Pbmm and scorer file

Hi,

I'm simply looking for a .pbmm and scorer file for German. It drives me crazy because I could find them for French at https://github.com/Common-Voice/commonvoice-fr/releases, and they work like a charm, but I can't find the German equivalent.

My poor computing power and network bandwidth (to download the CV datasets, for example) don't allow me to train my own model, and honestly, I don't want to (lazy like a good IT guy :).

Someone probably uploaded those files somewhere already; any information would be appreciated!

Cheers :)

Loading pretrained model

Hi, I want to use your pre-trained model for German. I am not really sure how to use it on Windows. And is it going to work on a normal computer?

SWC + M-AILABS Corpora

Have you already tried to improve the German model with both of these datasets?
I have seen that it is an option in the pre-processing... I am very curious about whether this could improve the model, but I do not have the computing power and "in-depth model-development knowledge". Here is a Kaldi-based example where this dataset combination worked very well: https://github.com/uhh-lt/kaldi-tuda-de

Looking forward to hearing from you!

All the best
Fabian

Add WER from most recent model

Hi @AASHISHAG, and thanks for open-sourcing your work!

I'm wondering if you might have some performance stats (e.g. WER and CER) for more recent versions of the model; even just the most recent v0.9.0 would be good.

Thanks!

create_language_model.sh : HTTP Error 404 (downloading ds native client)

Building the language model with create_language_model.sh ends in an HTTP 404 error:

Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.deepspeech.native_client.v0.9.3.cpu/artifacts/public/native_client.tar.xz ...
Traceback (most recent call last):
  File "DeepSpeech/util/taskcluster.py", line 12, in <module>
    dsu_taskcluster.main()
  File "/var/share/deepspeech-german/DeepSpeech/training/deepspeech_training/util/taskcluster.py", line 128, in main
    maybe_download_tc(target_dir=args.target, tc_url=get_tc_url(args.arch, args.artifact, args.branch))
  File "/var/share/deepspeech-german/DeepSpeech/training/deepspeech_training/util/taskcluster.py", line 58, in maybe_download_tc
    _, headers = urllib.request.urlretrieve(tc_url, target_file, reporthook=(report_progress if progress else None))
  File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/eike/.pyenv/versions/3.7.9/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

I need a hint on where to head for a solution.

coqui

The DeepSpeech project no longer seems to be maintained.

The fork Coqui, run by Mozillians from the original team, is maintained and might be fully compatible. Would it be possible to re-run the training, with your compute power, against this new project?

WER results for newer models

Hi,
first of all, thank you for your work. Are there any WER results available for the newer models that you released after the initial paper?

Best Regards,
Oliver

Link to CommonVoice 1035h dataset

Hi, during the past months there has been massive growth in the German-language dataset at Mozilla Common Voice. Currently there are 1035 hours, so it might be worth updating the link in the readme.md?

Recognition quality

Regarding transcribe.py:
Can the recognition also report its confidence, so that I can specifically review the results recognized with low confidence?
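
A minimal sketch of one way to get a confidence value, assuming the standard deepspeech 0.9 Python bindings rather than transcribe.py itself (file names are placeholders; the confidence is roughly a sum of per-character logit values, so it is mainly useful for ranking results, not as a percentage):

import wave
import numpy as np
from deepspeech import Model

ds = Model("output_graph_de.pbmm")  # placeholder German model

with wave.open("aufnahme.wav", "rb") as w:  # placeholder recording
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

# sttWithMetadata returns candidate transcripts, each with a confidence score.
meta = ds.sttWithMetadata(audio, 1)
best = meta.transcripts[0]
text = "".join(token.text for token in best.tokens)
print(text, best.confidence)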

ModuleNotFoundError: No module named 'down'

When I follow all the steps in the README and get to python pre-processing/download_speech_corpus.py --tuda --cv --swc --voxforge --mailabs I get this error:

Traceback (most recent call last):
  File "pre-processing/download_speech_corpus.py", line 6, in <module>
    import audiomate.corpus.io
  File "/deepspeech-german/python-environments/lib/python3.7/site-packages/audiomate/corpus/io/__init__.py", line 7, in <module>
    from .downloader import ArchiveDownloader
  File "/deepspeech-german/python-environments/lib/python3.7/site-packages/audiomate/corpus/io/downloader.py", line 7, in <module>
    from audiomate.utils import download
  File "/deepspeech-german/python-environments/lib/python3.7/site-packages/audiomate/utils/download.py", line 11, in <module>
    from pget.down import Downloader
  File "/deepspeech-german/python-environments/lib/python3.7/site-packages/pget/__init__.py", line 1, in <module>
    from down import Downloader
ModuleNotFoundError: No module named 'down'

Loading output_graph.tflite with deepspeech-tflite results in a segmentation fault

When loading the TFLite model with deepspeech-tflite version 0.9.2 and also 0.9.0 I get a segmentation fault (SIGSEGV). Can this be replicated?

from deepspeech import Model

model = 'output_graph.tflite'
ds = Model(model)

What could be the issue? I do not see any stack trace or useful information on how to find the source of the problem.

ImportError: No module named deepspeech_training.util

Following your guidelines, I encountered the following error when trying to build the language model:
./create_language_model.sh

Training package is not installed. See training documentation.
...
Traceback (most recent call last):
  File "DeepSpeech/util/taskcluster.py", line 7, in <module>
    from deepspeech_training.util import taskcluster as dsu_taskcluster
ImportError: No module named deepspeech_training.util

... but the requested file seems to be in the right place?!

DeepSpeech German for .NETCore

I am using the 0.10.0-alpha.3 NuGet package [libdeepspeech.so], deepspeech-0.9.3-models.pbmm, and arctic_a0024.wav.

I am able to get the right result.

However, when I use output_graph_de.pbmm, which I understand from polyglot is for the 0.7 version, I could not get a result.

Please advise.

  • What should the output parameter be for the German.wav file?
  • Please provide a sample German.wav that will definitely work.
  • Please distribute the 0.10 version as a .pbmm.

Thank you

Your Tuda Results

I think I found the reason why my Tuda results are much worse than the results in your paper: it's the random splitting. I switched to the splitting suggested by Tuda some time ago, for better comparability with some other papers.

As you write in your paper, each Tuda sample was recorded with 4 different microphones. If you split the data randomly, instead of using the suggested splitting, you get a lot of test samples whose transcription is already in the training dataset, only recorded with a different microphone.

Training with the random split resulted in a WER of 0.09, while training with the same parameters but the correct splitting reaches only 0.42 WER.

I think this also has a strong influence on all the other training runs with combined datasets that include Tuda.

Issue using pre-trained German model

Not entirely sure if I am doing things right, but,

  • installed deepspeech v0.6.1 (pip3 install deepspeech==0.6.1)
  • downloaded pre-trained models for 0.6.1 provided by this repo
  • ran deepspeech with parameters & input audio

Result:

$ deepspeech --model ../model/0.6/output_graph.pb --lm ../model/0.6/lm.binary --trie ../model/0.6/trie --audio ../../test.wav
Loading model from file ../model/0.6/output_graph.pb
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
ERROR: Model provided has model identifier 'inpu', should be 'TFL3'

Error at reading model file ../model/0.6/output_graph.pb
Traceback (most recent call last):
  File "/home/pi/deepspeech/deepspeech-6-venv/venv/bin/deepspeech", line 10, in <module>
    sys.exit(main())
  File "/home/pi/deepspeech/deepspeech-6-venv/venv/lib/python3.7/site-packages/deepspeech/client.py", line 113, in main
    ds = Model(args.model, args.beam_width)
  File "/home/pi/deepspeech/deepspeech-6-venv/venv/lib/python3.7/site-packages/deepspeech/__init__.py", line 42, in __init__
    raise RuntimeError("CreateModel failed with error code {}".format(status))
RuntimeError: CreateModel failed with error code 12288

Not so sure what is going wrong here. Any advice?

Missing license

Could you please add a license to your project?

I would be very glad if you chose one compatible with my LGPL (I changed it recently).

Trying to recognize a German word

Hi, first of all thank you so much for providing this awesome project!

I've installed deepspeech 0.5.0 and ran this command:

deepspeech --model model-de/model_tuda+voxforge+mozilla.pb --alphabet model-de/alphabet.txt  --lm model-de/lm.binary --trie model-de/trie --audio weihnachtsmann.wav

I've generated the WAV file from an MP3 which I got from dict.cc via curl:

curl 'https://audio.dict.cc/speak.audio.v2.php?error_as_text=1&type=mp3&id=91424&lang=rec&lp=DEEN' > weihnachtsmann.mp3

then converted it with ffmpeg via

ffmpeg -i weihnachtsmann.mp3 -acodec pcm_s16le -ac 1 -ar 16000  weihnachtsmann.wav

and the output is this: nach man

I've tried another WAV for the same word:

curl 'https://audio.dict.cc/speak.audio.v2.php?error_as_text=1&type=mp3&id=655858&lang=de&lp=DEEN&voice=Hans' > weihnachtsmann.mp3

And the result was reinach zum.

I was expecting this to work since the word is very simple, but the result is not really usable. Is the model not trained well enough? I read in the paper that the error rate was quite low, so I'm wondering why this simple word cannot be recognized.

Maybe it was converted with the wrong ffmpeg settings?
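
If in doubt about the conversion, a quick way to check the resulting file's properties, using only the Python standard library (the file name is the one from the ffmpeg command above):

import wave

with wave.open("weihnachtsmann.wav", "rb") as w:
    # The 0.5 models expect 16000 Hz, 16-bit samples (width 2), mono.
    print(w.getframerate(), w.getsampwidth(), w.getnchannels())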
