
notebook's Introduction

ESPnet Notebooks

Demo

ASR (Speech recognition)

SE (Speech enhancement/separation)

SLU (Spoken language understanding)

TTS (Text-to-speech)

Other utilities

ESPnet-EZ

ASR (Speech recognition)

ST (Speech-to-text translation)

  • integrate_huggingface.ipynb: Integrating the weakly supervised OWSM model and a Hugging Face pre-trained language model with ESPnet-EZ on MuST-C-v2.
  • ST_finetune_owsm.ipynb: Fine-tuning the weakly supervised OWSM model with ESPnet-EZ on MuST-C-v2.

SLU (Spoken language understanding)

Course

CMU SpeechProcessing Spring2023

CMU SpeechRecognition Fall2022

CMU SpeechRecognition Fall2021

ESPnet1 (Legacy)

notebook's People

Contributors

amrzv, d-keqi, danberrebbi, ftshijt, hanseokhyeon, juice500ml, kan-bayashi, lichenda, masao-someki, shigekikarita, siddhu001, sw005320, syzygianinfern0, tayciryahmed, wwwehr


notebook's Issues

ASR training using ESPnet2 library calls

Hi - I am looking for an example notebook that trains an ASR model on a dataset such as TIMIT using ESPnet2 library calls. The data preparation needs to be done separately in Python (not with the recipes), producing 'sound' or 'npy' inputs rather than Kaldi-style files. Any pointers on the training part would be helpful. /Tirthankar

Here is my experiment, but it raises an error during epoch 1 of training.
timit_train_espnet2.md
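For context, ESPnet2's non-Kaldi front end reads a plain `wav.scp` file (utterance ID to audio path) and a `text` file (utterance ID to transcript). A minimal sketch of preparing such a data directory in pure Python — the utterance IDs, paths, and transcripts below are invented for illustration:

```python
import os

# Hypothetical utterance list: (utterance_id, wav_path, transcript).
utts = [
    ("spk1_utt1", "/data/timit/spk1_utt1.wav", "she had your dark suit"),
    ("spk1_utt2", "/data/timit/spk1_utt2.wav", "in greasy wash water"),
]

data_dir = "data/train"
os.makedirs(data_dir, exist_ok=True)

# wav.scp: one "<utt-id> <path>" per line; with the 'sound' audio format
# ESPnet2 loads these files directly instead of Kaldi ark/scp pairs.
with open(os.path.join(data_dir, "wav.scp"), "w") as f:
    for uid, path, _ in utts:
        f.write(f"{uid} {path}\n")

# text: one "<utt-id> <transcript>" per line.
with open(os.path.join(data_dir, "text"), "w") as f:
    for uid, _, trans in utts:
        f.write(f"{uid} {trans}\n")
```

The resulting directory can then be pointed at by the ESPnet2 training entry points in place of recipe-prepared data.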

Using spembs with VITS demo

I am using the demo of the pre-trained multi-speaker VITS given here, but I am not able to use speaker embeddings because text2speech.use_spembs is set to False and use_sids to True when the model is built from file. If I try to set the flags by inserting a line in the demo script or in the Text2Speech init function, I get:
AttributeError: can't set attribute 'use_spembs'

It seems that text2speech should be able to use either input to generate speech if the flags are set appropriately. Is there a way to change these variables so that I can use my own spembs?

https://github.com/espnet/notebook/blob/master/espnet2_tts_realtime_demo.ipynb

Train From Scratch

Sir, could you provide a tutorial that goes from the basics through training on a non-English dataset? Thank you.

TTS Multispeaker Model Demo on ESPnet2

Hi - To select my own reference speech (not speakers from the list of X-vectors), I would need to embed my own recording. How do I get this embedding, which is passed as one of the inputs (spembs) to the text2speech call? Thanks for any pointers or help. /Tirthankar

Issues During Installation

Hi,

ESPnet is really an impressive toolkit, and I am trying to run asr_library.ipynb on Colab. However, during installation I run into this problem:

tar: ./ubuntu16-featbin.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
cp: cannot stat 'featbin/*': No such file or directory

Then when I try to run run.sh, I get this problem:

steps/make_fbank_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: 8 / 8 failed, log is in exp/make_fbank/test/make_fbank_pitch_test.*.log

I also tried the installation code from this tutorial, after which run.sh ran successfully. But I then got ModuleNotFoundError: No module named 'espnet.utils.training' when running from espnet.utils.training.batchfy import make_batchset.

I would really appreciate it if anyone could help me with these issues!

colab fatal: reference is not a tree

In the notebook espnet2_new_task_tutorial_CMU_11751_18781_Fall2022.ipynb

!git clone --depth 5 -b 2022fall_new_task_tutorial https://github.com/espnet/espnet

# We use a specific commit just for reproducibility.
%cd /content/espnet
!git checkout 9cff98a78ceaa4d85843be0a50b369ec826b27f6

output:
Cloning into 'espnet'...
remote: Enumerating objects: 5496, done.
remote: Counting objects: 100% (5496/5496), done.
remote: Compressing objects: 100% (3863/3863), done.
remote: Total 5496 (delta 1794), reused 3294 (delta 981), pack-reused 0
Receiving objects: 100% (5496/5496), 6.83 MiB | 22.72 MiB/s, done.
Resolving deltas: 100% (1794/1794), done.
/content/espnet
fatal: reference is not a tree: 9cff98a78ceaa4d85843be0a50b369ec826b27f6
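A likely explanation: `--depth 5` keeps only the five newest commits of the cloned branch, so an older pinned commit is simply absent from the shallow clone, and `git checkout` fails with "reference is not a tree". A possible fix, assuming the commit still exists upstream, is to deepen the clone before checking out:

```shell
# The shallow clone lacks the pinned commit; fetch the full history first.
cd /content/espnet
git fetch --unshallow
git checkout 9cff98a78ceaa4d85843be0a50b369ec826b27f6
```

Alternatively, cloning without `--depth` avoids the problem at the cost of a larger download.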

ASR Recipe notebook throws error

Upon the execution of

import json
import torch
import argparse
from espnet.bin.asr_recog import get_parser
from espnet.nets.pytorch_backend.e2e_asr import E2E

root = "espnet/egs/an4/asr1"
model_dir = root + "/exp/train_nodev_pytorch_train_mtlalpha1.0/results"

# load model
with open(model_dir + "/model.json", "r") as f:
  idim, odim, conf = json.load(f)
model = E2E(idim, odim, argparse.Namespace(**conf))
model.load_state_dict(torch.load(model_dir + "/model.loss.best"))
model.cpu().eval()

# recognize speech
parser = get_parser()
args = parser.parse_args(["--beam-size", "2", "--ctc-weight", "1.0", "--result-label", "out.json", "--model", ""])
result = model.recognize(fbank, args, token_list)
s = "".join(conf["char_list"][y] for y in result[0]["yseq"]).replace("<eos>", "").replace("<space>", " ").replace("<blank>", "")

print("groundtruth:", info["output"][0]["text"])
print("prediction: ", s)

this error is thrown.

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-0ef82d76a99d> in <module>
     18 parser = get_parser()
     19 args = parser.parse_args(["--beam-size", "2", "--ctc-weight", "1.0", "--result-label", "out.json", "--model", ""])
---> 20 result = model.recognize(fbank, args, token_list)
     21 s = "".join(conf["char_list"][y] for y in result[0]["yseq"]).replace("<eos>", "").replace("<space>", " ").replace("<blank>", "")
     22 

NameError: name 'token_list' is not defined

Expected behavior: the cell should execute without throwing an exception.
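A guess at the fix: the snippet never defines `token_list`, and in ESPnet1 the token inventory is the `char_list` stored in `model.json`, so adding `token_list = conf["char_list"]` before the `recognize` call should resolve the NameError. A self-contained sketch with a stand-in `model.json` (the values are made up; the real file lives under the experiment directory):

```python
import json
import os
import tempfile

# Stand-in for exp/.../results/model.json, which stores [idim, odim, conf].
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "model.json"), "w") as f:
    json.dump([83, 30, {"char_list": ["<blank>", "<unk>", "<space>", "A", "<eos>"]}], f)

# Load it exactly as the notebook does.
with open(os.path.join(model_dir, "model.json"), "r") as f:
    idim, odim, conf = json.load(f)

# The variable the traceback says is missing: take it from the model config.
token_list = conf["char_list"]

# With the real model and features loaded as in the notebook, one would then call:
# result = model.recognize(fbank, args, token_list)
```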

AttributeError: module 'regex' has no attribute 'Pattern'

AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 from espnet2.bin.tts_inference import Text2Speech
      2 from espnet2.utils.types import str_or_none
      3
      4 text2speech = Text2Speech.from_pretrained(
      5     model_tag=str_or_none(tag),

9 frames
/usr/local/lib/python3.7/dist-packages/nltk/tokenize/casual.py in TweetTokenizer()
    366
    367     @property
--> 368     def PHONE_WORD_RE(self) -> regex.Pattern:
    369         """Secondary core TweetTokenizer regex"""
    370         # Compiles the regex for this and all future instantiations of TweetTokenizer.

AttributeError: module 'regex' has no attribute 'Pattern'
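This usually means the environment ships a `regex` wheel older than the `regex.Pattern` alias that nltk expects. Upgrading the package and restarting the Colab runtime is a plausible fix, though unverified against this exact notebook:

```shell
# Upgrade regex so that regex.Pattern exists, then restart the runtime
# before re-importing espnet2.
pip install --upgrade regex
```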

The issue is in st_demo.ipynb

Can you tell me about "git checkout c0466d9a356c1a33f671a546426d7bc33b5b17e8"? What is "c0466d9a356c1a33f671a546426d7bc33b5b17e8"?

vocoders' links in espnet2_tts_realtime_demo.ipynb

Hi~

I tried to run the demo for Japanese.

But I found that the vocoders
"jsut_multi_band_melgan.v2", "jsut_style_melgan.v1", and "jsut_hifigan.v1"
no longer work (on Colab).

The error messages are below:

Access denied with the following error:

Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses. 

You may still be able to access the file from the browser:

 https://drive.google.com/uc?id=1vdgqTu9YKyGMCn-G7H2fI6UBC_4_55XB 

FileNotFoundError                         Traceback (most recent call last)
<ipython-input> in <module>()
     18     # Only for VITS
     19     noise_scale=0.333,
---> 20     noise_scale_dur=0.333,
     21 )

4 frames
/usr/local/lib/python3.7/dist-packages/espnet2/bin/tts_inference.py in from_pretrained(model_tag, vocoder_tag, **kwargs)
    301             )
    302             vocoder_tag = vocoder_tag.replace("parallel_wavegan/", "")
--> 303             vocoder_file = download_pretrained_model(vocoder_tag)
    304             vocoder_config = Path(vocoder_file).parent / "config.yml"
    305             kwargs.update(vocoder_config=vocoder_config, vocoder_file=vocoder_file)

/usr/local/lib/python3.7/dist-packages/parallel_wavegan/utils/utils.py in download_pretrained_model(tag, download_dir)
    385                 f"https://drive.google.com/uc?id={id_}", output_path, quiet=False
    386             )
--> 387         with tarfile.open(output_path, "r:*") as tar:
    388             for member in tar.getmembers():
    389                 if member.isreg():

/usr/lib/python3.7/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
   1573                     saved_pos = fileobj.tell()
   1574                 try:
-> 1575                     return func(name, "r", fileobj, **kwargs)
   1576                 except (ReadError, CompressionError):
   1577                     if fileobj is not None:

/usr/lib/python3.7/tarfile.py in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
   1637
   1638         try:
-> 1639             fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
   1640         except OSError:
   1641             if fileobj is not None and mode == 'r':

/usr/lib/python3.7/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
    166             mode += 'b'
    167         if fileobj is None:
--> 168             fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    169         if filename is None:
    170             filename = getattr(fileobj, 'name', '')

FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/parallel_wavegan/jsut_hifigan.v1.tar.gz'
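The Drive "Access denied" message means gdown never downloaded the archive, so `tarfile.open` then fails on a file that was never written. A possible workaround, assuming you can fetch the tarball in a browser from the printed Drive link: place it in the cache directory named in the traceback and rerun `from_pretrained`.

```shell
# Put the manually downloaded vocoder archive where parallel_wavegan
# looks for it (path taken from the traceback above).
mkdir -p /root/.cache/parallel_wavegan
cp jsut_hifigan.v1.tar.gz /root/.cache/parallel_wavegan/
```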
