
notebook's Introduction

ESPnet Notebooks

Demo

ASR (Speech recognition)

SE (Speech enhancement/separation)

SLU (Spoken language understanding)

TTS (Text-to-speech)

Other utilities

ESPnet-EZ

ASR (Speech recognition)

ST (Speech-to-text translation)

  • integrate_huggingface.ipynb: Integrating the weakly supervised OWSM model and a Hugging Face pre-trained language model with ESPnet-EZ on MuST-C-v2.
  • ST_finetune_owsm.ipynb: Fine-tuning the weakly supervised OWSM model with ESPnet-EZ on MuST-C-v2.

SLU (Spoken language understanding)

Course

CMU SpeechProcessing Spring2023

CMU SpeechRecognition Fall2022

CMU SpeechRecognition Fall2021

ESPnet1 (Legacy)

notebook's People

Contributors

amrzv, d-keqi, danberrebbi, ftshijt, hanseokhyeon, juice500ml, kan-bayashi, lichenda, masao-someki, shigekikarita, siddhu001, sw005320, syzygianinfern0, tayciryahmed, wwwehr


notebook's Issues

ASR training using ESPnet2 library calls

Hi - I am looking for an example notebook that trains an ASR model on a dataset such as TIMIT using ESPnet2 library calls. The data preparation needs to be done separately in Python (not with the recipes), producing 'sound' or 'npy' inputs rather than Kaldi-style files. Any pointers on the training part would be helpful. /Tirthankar

Here is my experiment, but it raises an error during epoch 1 of training.
timit_train_espnet2.md
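For context, ESPnet2's non-Kaldi front end reads a plain `wav.scp` file (utterance ID to audio path) and a `text` file (utterance ID to transcript). A minimal sketch of preparing such a data directory in pure Python — the utterance IDs, paths, and transcripts below are invented for illustration:

```python
import os

# Hypothetical utterance list: (utterance_id, wav_path, transcript).
utts = [
    ("spk1_utt1", "/data/timit/spk1_utt1.wav", "she had your dark suit"),
    ("spk1_utt2", "/data/timit/spk1_utt2.wav", "in greasy wash water"),
]

data_dir = "data/train"
os.makedirs(data_dir, exist_ok=True)

# wav.scp: one "<utt-id> <path>" per line; with the 'sound' audio format
# ESPnet2 loads these files directly instead of Kaldi ark/scp pairs.
with open(os.path.join(data_dir, "wav.scp"), "w") as f:
    for uid, path, _ in utts:
        f.write(f"{uid} {path}\n")

# text: one "<utt-id> <transcript>" per line.
with open(os.path.join(data_dir, "text"), "w") as f:
    for uid, _, trans in utts:
        f.write(f"{uid} {trans}\n")
```

The resulting directory can then be pointed at by the ESPnet2 training entry points in place of recipe-prepared data.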

Using spembs with VITS demo

I am using the demo of the pre-trained multi-speaker VITS given here, but I am not able to use speaker embeddings because text2speech.use_spembs is set to False and use_sids to True when the model is built from file. If I try to set the flags by inserting a line in the demo script or in the Text2Speech init function, I get:
AttributeError: can't set attribute 'use_spembs'

It seems that text2speech should be able to use either input to generate speech if the flags are set appropriately. Is there a way to change these variables so that I can use my own spembs?

https://github.com/espnet/notebook/blob/master/espnet2_tts_realtime_demo.ipynb

Train From Scratch

Sir, could you provide a tutorial that goes from the basics through training on a non-English dataset? Thank you.

TTS Multispeaker Model Demo on ESPnet2

Hi - To select my own reference speech (not speakers from the list of X-vectors), I would need to embed my own recording. How do I get this embedding, which is passed as one of the inputs (spembs) to the text2speech call? Thanks for any pointers or help. /Tirthankar

Issues During Installation

Hi,

ESPnet is really an impressive toolkit, and I am trying to run asr_library.ipynb on Colab. However, during installation I run into this problem:

tar: ./ubuntu16-featbin.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
cp: cannot stat 'featbin/*': No such file or directory

Then when I try to run run.sh, I get this problem:

steps/make_fbank_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: 8 / 8 failed, log is in exp/make_fbank/test/make_fbank_pitch_test.*.log

I also tried the installation code from this tutorial, after which run.sh ran successfully. But I then got ModuleNotFoundError: No module named 'espnet.utils.training' when running from espnet.utils.training.batchfy import make_batchset.

I would really appreciate it if anyone could help me with these issues!

colab fatal: reference is not a tree

In the notebook espnet2_new_task_tutorial_CMU_11751_18781_Fall2022.ipynb

!git clone --depth 5 -b 2022fall_new_task_tutorial https://github.com/espnet/espnet

# We use a specific commit just for reproducibility.
%cd /content/espnet
!git checkout 9cff98a78ceaa4d85843be0a50b369ec826b27f6

output:
Cloning into 'espnet'...
remote: Enumerating objects: 5496, done.
remote: Counting objects: 100% (5496/5496), done.
remote: Compressing objects: 100% (3863/3863), done.
remote: Total 5496 (delta 1794), reused 3294 (delta 981), pack-reused 0
Receiving objects: 100% (5496/5496), 6.83 MiB | 22.72 MiB/s, done.
Resolving deltas: 100% (1794/1794), done.
/content/espnet
fatal: reference is not a tree: 9cff98a78ceaa4d85843be0a50b369ec826b27f6
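A likely explanation: `--depth 5` keeps only the five newest commits of the cloned branch, so an older pinned commit is simply absent from the shallow clone, and `git checkout` fails with "reference is not a tree". A possible fix, assuming the commit still exists upstream, is to deepen the clone before checking out:

```shell
# The shallow clone lacks the pinned commit; fetch the full history first.
cd /content/espnet
git fetch --unshallow
git checkout 9cff98a78ceaa4d85843be0a50b369ec826b27f6
```

Alternatively, cloning without `--depth` avoids the problem at the cost of a larger download.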

ASR Recipe notebook throws error

Upon the execution of

import json
import torch
import argparse
from espnet.bin.asr_recog import get_parser
from espnet.nets.pytorch_backend.e2e_asr import E2E

root = "espnet/egs/an4/asr1"
model_dir = root + "/exp/train_nodev_pytorch_train_mtlalpha1.0/results"

# load model
with open(model_dir + "/model.json", "r") as f:
  idim, odim, conf = json.load(f)
model = E2E(idim, odim, argparse.Namespace(**conf))
model.load_state_dict(torch.load(model_dir + "/model.loss.best"))
model.cpu().eval()

# recognize speech
parser = get_parser()
args = parser.parse_args(["--beam-size", "2", "--ctc-weight", "1.0", "--result-label", "out.json", "--model", ""])
result = model.recognize(fbank, args, token_list)
s = "".join(conf["char_list"][y] for y in result[0]["yseq"]).replace("<eos>", "").replace("<space>", " ").replace("<blank>", "")

print("groundtruth:", info["output"][0]["text"])
print("prediction: ", s)

this error is thrown.

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-0ef82d76a99d> in <module>
     18 parser = get_parser()
     19 args = parser.parse_args(["--beam-size", "2", "--ctc-weight", "1.0", "--result-label", "out.json", "--model", ""])
---> 20 result = model.recognize(fbank, args, token_list)
     21 s = "".join(conf["char_list"][y] for y in result[0]["yseq"]).replace("<eos>", "").replace("<space>", " ").replace("<blank>", "")
     22 

NameError: name 'token_list' is not defined

Expected behavior: the cell should execute without throwing an exception.
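A guess at the fix: the snippet never defines `token_list`, and in ESPnet1 the token inventory is the `char_list` stored in `model.json`, so adding `token_list = conf["char_list"]` before the `recognize` call should resolve the NameError. A self-contained sketch with a stand-in `model.json` (the values are made up; the real file lives under the experiment directory):

```python
import json
import os
import tempfile

# Stand-in for exp/.../results/model.json, which stores [idim, odim, conf].
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "model.json"), "w") as f:
    json.dump([83, 30, {"char_list": ["<blank>", "<unk>", "<space>", "A", "<eos>"]}], f)

# Load it exactly as the notebook does.
with open(os.path.join(model_dir, "model.json"), "r") as f:
    idim, odim, conf = json.load(f)

# The variable the traceback says is missing: take it from the model config.
token_list = conf["char_list"]

# With the real model and features loaded as in the notebook, one would then call:
# result = model.recognize(fbank, args, token_list)
```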

AttributeError: module 'regex' has no attribute 'Pattern'

AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 from espnet2.bin.tts_inference import Text2Speech
      2 from espnet2.utils.types import str_or_none
      3
      4 text2speech = Text2Speech.from_pretrained(
      5     model_tag=str_or_none(tag),

9 frames
/usr/local/lib/python3.7/dist-packages/nltk/tokenize/casual.py in TweetTokenizer()
    366
    367     @property
--> 368     def PHONE_WORD_RE(self) -> regex.Pattern:
    369         """Secondary core TweetTokenizer regex"""
    370         # Compiles the regex for this and all future instantiations of TweetTokenizer.

AttributeError: module 'regex' has no attribute 'Pattern'
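This usually means the environment ships a `regex` wheel older than the `regex.Pattern` alias that nltk expects. Upgrading the package and restarting the Colab runtime is a plausible fix, though unverified against this exact notebook:

```shell
# Upgrade regex so that regex.Pattern exists, then restart the runtime
# before re-importing espnet2.
pip install --upgrade regex
```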

The issue is in st_demo.ipynb

Can you tell me about "git checkout c0466d9a356c1a33f671a546426d7bc33b5b17e8"? What is "c0466d9a356c1a33f671a546426d7bc33b5b17e8"?

vocoders' links in espnet2_tts_realtime_demo.ipynb

Hi~

I tried to run the demo for Japanese.

But I found that the vocoders
"jsut_multi_band_melgan.v2", "jsut_style_melgan.v1", and "jsut_hifigan.v1"
no longer work (on Colab).

The error messages are below:

Access denied with the following error:

Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses. 

You may still be able to access the file from the browser:

 https://drive.google.com/uc?id=1vdgqTu9YKyGMCn-G7H2fI6UBC_4_55XB 

FileNotFoundError                         Traceback (most recent call last)
<ipython-input> in <module>()
     18     # Only for VITS
     19     noise_scale=0.333,
---> 20     noise_scale_dur=0.333,
     21 )

4 frames
/usr/local/lib/python3.7/dist-packages/espnet2/bin/tts_inference.py in from_pretrained(model_tag, vocoder_tag, **kwargs)
    301             )
    302             vocoder_tag = vocoder_tag.replace("parallel_wavegan/", "")
--> 303             vocoder_file = download_pretrained_model(vocoder_tag)
    304             vocoder_config = Path(vocoder_file).parent / "config.yml"
    305             kwargs.update(vocoder_config=vocoder_config, vocoder_file=vocoder_file)

/usr/local/lib/python3.7/dist-packages/parallel_wavegan/utils/utils.py in download_pretrained_model(tag, download_dir)
    385                 f"https://drive.google.com/uc?id={id_}", output_path, quiet=False
    386             )
--> 387         with tarfile.open(output_path, "r:*") as tar:
    388             for member in tar.getmembers():
    389                 if member.isreg():

/usr/lib/python3.7/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
   1573                     saved_pos = fileobj.tell()
   1574                 try:
-> 1575                     return func(name, "r", fileobj, **kwargs)
   1576                 except (ReadError, CompressionError):
   1577                     if fileobj is not None:

/usr/lib/python3.7/tarfile.py in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
   1637
   1638         try:
-> 1639             fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
   1640         except OSError:
   1641             if fileobj is not None and mode == 'r':

/usr/lib/python3.7/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
    166             mode += 'b'
    167         if fileobj is None:
--> 168             fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    169         if filename is None:
    170             filename = getattr(fileobj, 'name', '')

FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/parallel_wavegan/jsut_hifigan.v1.tar.gz'
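The Drive "Access denied" message means gdown never downloaded the archive, so `tarfile.open` then fails on a file that was never written. A possible workaround, assuming you can fetch the tarball in a browser from the printed Drive link: place it in the cache directory named in the traceback and rerun `from_pretrained`.

```shell
# Put the manually downloaded vocoder archive where parallel_wavegan
# looks for it (path taken from the traceback above).
mkdir -p /root/.cache/parallel_wavegan
cp jsut_hifigan.v1.tar.gz /root/.cache/parallel_wavegan/
```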
