Comments (26)

nanosonde commented on August 12, 2024

It would be really nice if you could provide a pretrained model that is compatible with DeepSpeech-0.6.0, which was released a few days ago. The authors state that models trained with older versions are not compatible.

I would do it on my own if I had enough computing power. 😔

LarsScha commented on August 12, 2024

A pretrained model compatible with DeepSpeech-0.6.0 would indeed be great.

AASHISHAG commented on August 12, 2024

@DanBmh : Thank you for putting up the numbers.

We split the training data into 10 sub-sets and then trained the model cumulatively over 10 cycles, adding one sub-set per cycle. Example:
cycle 1: subset 1
cycle 2: subset 1 + subset 2
cycle 3: subset 1 + subset 2 + subset 3
and so on ...

You might like to refer to Section 4.1, "Influence of Training Size", of our paper for details; it explains the training strategy.

We trained on Tuda-De (5 subsets) followed by Voxforge (5 subsets).

Regarding the WER numbers: they depend on the data splits. If you reshuffle the dataset and make a new train/dev/test split, you will surely get a different WER; the numbers differ from split to split. We are currently writing a paper to discuss these challenges and to train a more robust model.
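
For anyone trying to reproduce this schedule, here is a minimal sketch of the cumulative-subset idea; split_into_subsets and train_one_cycle are hypothetical placeholders, not code from this repository, and a real setup would call DeepSpeech's training script on the accumulated CSVs once per cycle.

```python
# Minimal sketch of the cumulative training schedule described above.
# split_into_subsets() and train_one_cycle() are hypothetical placeholders;
# in practice each cycle would write the accumulated samples to a CSV and
# run DeepSpeech's training script on it.

def split_into_subsets(samples, num_subsets=10):
    """Distribute the samples into num_subsets roughly equal subsets."""
    return [samples[i::num_subsets] for i in range(num_subsets)]

def cumulative_training(samples, train_one_cycle, num_subsets=10):
    accumulated = []
    for cycle, subset in enumerate(split_into_subsets(samples, num_subsets), start=1):
        accumulated.extend(subset)            # cycle N trains on subsets 1..N
        train_one_cycle(cycle, accumulated)   # e.g. train DeepSpeech on these samples
```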

DanBmh commented on August 12, 2024

Thanks for the code. I tested it, but for me it does not filter any additional files.
Note that I had already filtered some files beforehand using different metrics, such as file length and the characters-per-second rate. Without this prefiltering I get an error after some time; I think this is because of an invalid/empty file. Tested on the Tuda train + dev datasets.
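
For reference, a rough sketch of the kind of prefilter meant here; the thresholds are illustrative assumptions, not the exact values used above.

```python
# Rough sketch of the kind of prefilter described above: drop files that are
# unreadable, implausibly short/long, or whose transcript implies an
# unrealistic characters-per-second rate. Thresholds are assumptions.
import wave

MIN_SECONDS, MAX_SECONDS = 0.5, 30.0
MAX_CHARS_PER_SECOND = 25.0

def keep_sample(wav_path, transcript):
    try:
        with wave.open(wav_path, "rb") as w:
            duration = w.getnframes() / float(w.getframerate())
    except (wave.Error, EOFError, OSError):
        return False  # unreadable or empty file
    if not transcript or not (MIN_SECONDS <= duration <= MAX_SECONDS):
        return False
    return len(transcript) / duration <= MAX_CHARS_PER_SECOND
```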

AASHISHAG commented on August 12, 2024

@SantosSi @nanosonde @LarsScha @photoszzt : We have released v0.6.0. You can find the link in the ReadMe.

AASHISHAG commented on August 12, 2024

Added to the todo list.

You can still use v0.5.0, as there is no major model-architecture difference between these two versions, apart from performance and some added features.

SantosSi commented on August 12, 2024

Trying your v0.5.0 model with the stock deepspeech 0.6.0 provided through pip3, I get the following error message:

Not found: Op type not registered 'VariableV2' in binary running on MyMachine. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
Traceback (most recent call last):
  File "/home/user/.local/bin/deepspeech", line 10, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.7/site-packages/deepspeech/client.py", line 113, in main
    ds = Model(args.model, args.beam_width)
  File "/home/user/.local/lib/python3.7/site-packages/deepspeech/__init__.py", line 42, in __init__
RuntimeError: CreateModel failed with error code 12294

AASHISHAG commented on August 12, 2024

You should install v0.5.0 of deepspeech using pip3.

The v0.5.0 model won't be compatible with v0.6.0.
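
As a quick sanity check of which client is actually installed, a generic sketch (this only reads pip's package metadata; it is not a DeepSpeech-specific API):

```python
# Check that the installed deepspeech package matches the model's release line.
# This only inspects pip's package metadata; it is not part of the DeepSpeech API.
import pkg_resources

installed = pkg_resources.get_distribution("deepspeech").version
print("installed deepspeech:", installed)
if not installed.startswith("0.5."):
    print("warning: the v0.5.0 German model expects a 0.5.x client")
```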

SantosSi commented on August 12, 2024

Then I misunderstood your answer:

You call still use v0.5.0 as there is no major model architecture difference between these two versions

to @nanosonde when asking for a:

pretrained model that is compatible with DeepSpeech-0.6.0

You seem to be referring to the software release version, not the model version. That was not completely clear.

DanBmh commented on August 12, 2024

I made a fork using the new DeepSpeech version. See: https://github.com/DanBmh/deepspeech-german

But for now only the Voxforge dataset is working; I have some problems with Tuda in the validation step and have not tested the other datasets (common-voice, swc, mailabs) yet.

AASHISHAG commented on August 12, 2024

@DanBmh I didn't face this issue with v0.5.0, but I had read about this error before (which you have mentioned in your ReadMe), and most people were able to resolve it by setting ignore_longer_outputs_than_inputs=True in DeepSpeech.py.
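
For context, the flag belongs to TensorFlow 1.x's CTC loss, which DeepSpeech.py uses; below is a sketch of such a call, with placeholder variable names rather than the project's exact code.

```python
# ignore_longer_outputs_than_inputs is a parameter of TensorFlow 1.x's CTC loss.
# With it set to True, samples whose transcript is longer than the number of
# audio time steps are ignored by the loss instead of aborting training.
# Variable names below are placeholders, not DeepSpeech.py's exact code.
import tensorflow.compat.v1 as tf  # TF1-style API

def ctc_loss_with_skip(logits, sparse_labels, seq_lengths):
    return tf.nn.ctc_loss(
        labels=sparse_labels,         # tf.SparseTensor of character indices
        inputs=logits,                # [max_time, batch_size, num_classes]
        sequence_length=seq_lengths,  # valid time steps per sample
        ignore_longer_outputs_than_inputs=True,
    )
```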

DanBmh commented on August 12, 2024

I did that already. Training and testing for Tuda are working, but the validation after each epoch always breaks with a segmentation fault after some steps.

Flowr-es commented on August 12, 2024

@DanBmh thanks for the fork!
Would you also consider uploading a built model to your fork?

Otherwise I will try to build it myself over the weekend :-)

If I can choose, I would love to test DeepSpeech with this German model: Tuda-De+Voxforge

DanBmh commented on August 12, 2024

Now I got Tuda to work, but my results are much worse than in your paper.
With my best Tuda-only version I get WER: 0.41; with Tuda+Voxforge I only get WER: 0.65.

Did you train Tuda+Voxforge on the combined datasets, or first Tuda and then Voxforge?

@Mexxxo If I have good results I will try to upload them, but I don't think I will get them before the weekend. If you train it yourself, please post the scores you got.

DanBmh commented on August 12, 2024

I now tried your stepped training with 10 steps for Tuda (without Voxforge), but somehow it is not working for me.
The network is not learning much and gets worse after cycle 5 (WER 0.684).

Also, I found out that there are many dataset errors in Tuda, resulting in infinite loss when training from the English checkpoint. Did you encounter this too? I solved it by excluding those files, but I think there are still some files with errors.

photoszzt commented on August 12, 2024

Any update on a model compatible with DeepSpeech 0.6?

AASHISHAG commented on August 12, 2024

Also, I found out that there are many dataset errors in Tuda, resulting in infinite loss when training from the English checkpoint. Did you encounter this too? I solved it by excluding those files, but I think there are still some files with errors.

@DanBmh : I remember I did a check to find erroneous files for all the datasets:

  1. Used SoX to read all the files; this way I could remove the corrupted ones. (I don't have the code handy now.)
  2. Checked that each WAV's length is greater than its transcript's. (I can't recollect whether the code actually found any erroneous files.)
    Code: find_erroneous_files.py
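
A rough sketch of those two checks is below; the 20 ms step mirrors DeepSpeech's default feature_win_step, but both it and the CSV layout are assumptions here, not the original find_erroneous_files.py.

```python
# Rough sketch of the two checks described above: (1) can the WAV be read at
# all, and (2) does the audio yield at least as many output time steps as the
# transcript has characters? The 20 ms step mirrors DeepSpeech's default
# feature_win_step, but treat it as an assumption here.
import csv
import wave

FEATURE_STEP_S = 0.02  # 20 ms per output time step

def is_erroneous(wav_path, transcript):
    try:
        with wave.open(wav_path, "rb") as w:
            duration = w.getnframes() / float(w.getframerate())
    except (wave.Error, EOFError, OSError):
        return True                                           # check 1: corrupted file
    return int(duration / FEATURE_STEP_S) < len(transcript)   # check 2: audio too short

def find_erroneous_files(csv_path):
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if is_erroneous(row["wav_filename"], row["transcript"]):
                print(row["wav_filename"])
```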

AASHISHAG commented on August 12, 2024

@photoszzt : We are currently writing a paper to discuss the challenges and are training a more robust model using DeepSpeech.
We should be able to release the model, with new datasets and the updated Mozilla release, by June/July.

DanBmh commented on August 12, 2024

I have now uploaded a checkpoint of one of my models. I used the master branch of DeepSpeech; the version should be v0.7.0a2. It has a WER of 0.19, tested with the Tuda + CommonVoice dataset. You can find the model files here: https://github.com/DanBmh/deepspeech-german#language-model-and-checkpoints

@AASHISHAG I ran a test with your uploaded checkpoints and my test dataset. I only reached 0.68 WER with both datasets and 0.79 with Tuda only. From which training run are your checkpoints, the Tuda-De+Voxforge+Mozilla one? I know my test set is larger and the data is not the same, but shouldn't the difference be smaller? You can find the full results and the instructions I used here.

AASHISHAG commented on August 12, 2024

@DanBmh This could be because of two reasons:

  1. Dataset size: we used ~300h of data compared to the ~600h you used, so the model you trained learned better.
  2. The random data splits: we are aware of these issues and have submitted a paper on them. I will share it as soon as it's accepted.

(unknown user) commented on August 12, 2024

@AASHISHAG thanks for the new version! Do you have any performance data for the new model available yet?

AASHISHAG commented on August 12, 2024

@sebastiantilman : Unfortunately not, but this model should be more robust than the previous release, as it is trained on ~4 times as much data as the previous one.

tanujjain commented on August 12, 2024

Hi,

I tested the model performance (for version v0.6.0) on the test set of the Common Voice dataset, which amounted to about 18 hours of audio after data preparation. The WER I got was 31.65% and the CER was 16.05%.

The quality of the transcriptions is in general much better than with the previous version of the model (I haven't evaluated WER/CER on the old model version), but I still feel that the WER for this model is too high.

To prepare the dataset, I downloaded it from the Common Voice website and followed the instructions in your Readme.

@AASHISHAG Do the reported numbers make sense to you?
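
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using the jiwer package (one common choice in recent versions; not necessarily the tool used for the numbers above).

```python
# Minimal WER/CER computation over (reference, hypothesis) pairs using jiwer.
# jiwer is just one common choice; the numbers above may come from another tool.
import jiwer

references = ["das ist ein test", "guten morgen"]
hypotheses = ["das ist ein fest", "guten morgen"]

print("WER:", jiwer.wer(references, hypotheses))   # word error rate
print("CER:", jiwer.cer(references, hypotheses))   # character error rate (recent jiwer versions)
```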

erksch commented on August 12, 2024

@DanBmh That you have a 0.7 model is awesome! But I can't find the files on your fork. The links at the bottom do not seem to work. Can you link the .pbmm and .scorer files? :)
PS: Could you open your fork for issues?

DanBmh commented on August 12, 2024

@erksch Updated the links. Only the checkpoints are not yet uploaded again.

AASHISHAG commented on August 12, 2024

@ALL
We will release version v0.7.0 soon.

I am closing this issue since it relates to the old v0.6.0 release.
