
openspeech's People

Contributors

dudgns0908 · grazder · roytravel · sooftware · soyoungcho · upskyy · yongwookha


openspeech's Issues

Change `log_steps()` => `info()` in `OpenspeechModel`

🚀 Change log_steps() => info() in OpenspeechModel

  • Current implementation of OpenspeechModel logging
def log_steps(...):
    self.log(f"{stage}_wer", wer)
    self.log(f"{stage}_cer", cer)
    ...
  • Modified version
def info(self, dictionary: dict) -> None:
    r"""
    Logging information from dictionary.

    Args:
        dictionary (dict): dictionary containing the information to log.
    """
    for key, value in dictionary.items():
        self.log(key, value, prog_bar=True)
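Hypothetical usage from a training step (the keys below are illustrative, not a fixed API):

self.info({
    "train_loss": loss,
    "train_wer": wer,
    "train_cer": cer,
})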

Motivation

  • For scalability

Your contribution

  • implementation

Other data

โ“ Questions & Help

Details

Besides the three datasets currently supported, are there any guidelines for easily applying the open speech recognition models to other datasets?

Validation check: "'NoneType' object has no attribute 'transpose'"

Hello. When I start training with the data path, manifest file, vocab file, etc. specified, the following error keeps occurring:

[error screenshots]

This error occurs and I cannot figure out why, so I'm asking here...
For the vocab I am using the sp.model and kspon_subword_manifest.txt you uploaded; the GPU is an RTX 3090 and I'm on PyTorch 1.8.1.

Add LSTM Language Model

🚀 Add LSTM Language Model

  • Add LSTM Language Model
  • Add test file

Motivation

  • For decoding combinations with language model and acoustic model.
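A rough sketch of what such a model could look like in PyTorch (the class name and hyper-parameters here are illustrative, not the final openspeech API):

import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Predicts the next token given a prefix of token ids."""

    def __init__(self, vocab_size: int, hidden_dim: int = 512, num_layers: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, seq_len) token ids -> (batch, seq_len, vocab_size) log-probs
        outputs, _ = self.lstm(self.embedding(inputs))
        return self.fc(outputs).log_softmax(dim=-1)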

Your contribution

  • implementation
  • validation

test_conformer_lstm.py bug

Environment info

  • Platform: Windows 10
  • Python version: 3.7
  • PyTorch version (GPU?): PyTorch version : 1.8.1, CUDA version : 10.2
  • Using GPU in script?: GeForce RTX 2080 Ti

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): Conformer_lstm

The problem arises when using:

Testing started at 5:22 PM ...
C:\Users\cote\Anaconda3\envs\sc\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition with Anaconda plugin 2019.3.3\plugins\python-ce\helpers\pycharm\_jb_unittest_runner.py" --path C:/Users/cote/PycharmProjects/kospeech2/tests/test_conformer_lstm.py
Launching unittests with arguments python -m unittest C:/Users/cote/PycharmProjects/kospeech2/tests/test_conformer_lstm.py in C:\Users\cote\PycharmProjects\kospeech2\tests


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 56, in test_beam_search
    prediction = model(DUMMY_INPUTS, DUMMY_INPUT_LENGTHS)["predictions"]
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 109, in forward
    return super(ConformerLSTMModel, self).forward(inputs, input_lengths)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 125, in forward
    predictions = self.decoder(encoder_outputs, encoder_output_lengths)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\search\beam_search_lstm.py", line 78, in forward
    step_outputs, hidden_states, attn = self.forward_step(inputs, hidden_states, encoder_outputs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 140, in forward_step
    embedded = self.embedding(input_var)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 37, in test_forward
    outputs = model(DUMMY_INPUTS, DUMMY_INPUT_LENGTHS)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 109, in forward
    return super(ConformerLSTMModel, self).forward(inputs, input_lengths)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 130, in forward
    teacher_forcing_ratio=0.0,
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 220, in forward
    attn=attn,
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 140, in forward_step
    embedded = self.embedding(input_var)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 103, in test_test_step
    batch=(DUMMY_INPUTS, DUMMY_TARGETS, DUMMY_INPUT_LENGTHS, DUMMY_TARGET_LENGTHS), batch_idx=i
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 148, in test_step
    return super(ConformerLSTMModel, self).test_step(batch, batch_idx)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 225, in test_step
    teacher_forcing_ratio=0.0,
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 220, in forward
    attn=attn,
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 140, in forward_step
    embedded = self.embedding(input_var)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 71, in test_training_step
    batch=(DUMMY_INPUTS, DUMMY_TARGETS, DUMMY_INPUT_LENGTHS, DUMMY_TARGET_LENGTHS), batch_idx=i
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 122, in training_step
    return super(ConformerLSTMModel, self).training_step(batch, batch_idx)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 177, in training_step
    target_lengths=target_lengths,
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 105, in collect_outputs
    "learning_rate": self.get_lr(),
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_model.py", line 215, in get_lr
    for g in self.optimizer.param_groups:
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'ConformerLSTMModel' object has no attribute 'optimizer'


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 87, in test_validation_step
    batch=(DUMMY_INPUTS, DUMMY_TARGETS, DUMMY_INPUT_LENGTHS, DUMMY_TARGET_LENGTHS), batch_idx=i
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 135, in validation_step
    return super(ConformerLSTMModel, self).validation_step(batch, batch_idx)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 197, in validation_step
    teacher_forcing_ratio=0.0,
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 220, in forward
    attn=attn,
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 140, in forward_step
    embedded = self.embedding(input_var)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device


Assertion failed


Ran 5 tests in 24.698s

FAILED (errors=5)

Process finished with exit code 1


Grapheme label error

Information

The grapheme label value needs to be corrected.

import pandas as pd

def generate_grapheme_labels(grapheme_transcripts, vocab_path: str):
    vocab_list = list()
    vocab_freq = list()

    for grapheme_transcript in grapheme_transcripts:
        graphemes = grapheme_transcript.split()
        for grapheme in graphemes:
            if grapheme not in vocab_list:
                vocab_list.append(grapheme)
                vocab_freq.append(1)
            else:
                vocab_freq[vocab_list.index(grapheme)] += 1

    vocab_freq, vocab_list = zip(*sorted(zip(vocab_freq, vocab_list), reverse=True))
    vocab_dict = {
        'id': [0, 1, 2, 3],
        'grpm': ['<pad>', '<sos>', '<eos>', '<blank>'],
        'freq': [0, 0, 0, 0]
    }

    for idx, (grpm, freq) in enumerate(zip(vocab_list, vocab_freq)):
        vocab_dict['id'].append(idx + 3)  # <-- bug: ids 0-3 are already taken, so this should be idx + 4
        vocab_dict['grpm'].append(grpm)
        vocab_dict['freq'].append(freq)

    label_df = pd.DataFrame(vocab_dict)
    label_df.to_csv(vocab_path, encoding="utf-8", index=False)

Expected behavior

vocab_dict['id'].append(idx + 3) -> vocab_dict['id'].append(idx + 4)
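Since ids 0 through 3 are reserved for <pad>, <sos>, <eos>, and <blank>, the first grapheme must start at id 4; the corrected loop would be:

for idx, (grpm, freq) in enumerate(zip(vocab_list, vocab_freq)):
    vocab_dict['id'].append(idx + 4)  # ids 0-3 are taken by the special tokens
    vocab_dict['grpm'].append(grpm)
    vocab_dict['freq'].append(freq)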

Add metrics.py test file

🚀 Feature request

Add metrics.py test file for LibriSpeech, AISHELL-1, KsponSpeech
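A minimal sketch of what such a test could look like, using a plain Levenshtein-based WER rather than openspeech's own metric classes (all names here are illustrative):

import unittest

def _edit_distance(ref, hyp):
    # classic dynamic-programming Levenshtein distance over two token lists
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return _edit_distance(ref_words, hypothesis.split()) / len(ref_words)

class TestMetrics(unittest.TestCase):
    def test_wer(self):
        self.assertEqual(wer("a b c", "a b c"), 0.0)
        self.assertAlmostEqual(wer("a b c d", "a x c"), 0.5)  # 1 substitution + 1 deletion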

Your contribution

  • implementation

  • validation

PyPI package registration

🚀 Feature request

Register the package on PyPI.

Motivation

Register the package on PyPI.

Your contribution

Register the package on PyPI.

AttributeError: 'NoneType' object has no attribute 'sos_id'

โ“ Questions & Help

I'm a beginner, so please bear with me if some of these questions are basic..!
I was training with conformer-large, the model provided as-is by kospeech, but stopped because the loss went NaN, and decided to try building a speech recognizer with openspeech instead.

First, I ran the same command as the KsponSpeech training example in the README.
[screenshot: train_execute]

DATASET_PATH contains the KsponSpeech_01 ... 05 folders downloaded from AIHub,
MANIFEST_FILE_PATH points to the kspon_character_manifest.txt file received by email,
TEST_DATASET_PATH contains the eval_clean and eval_other folders downloaded from AIHub, and
TEST_MANIFEST_DIR holds the name of the folder containing kspon_character_manifest.txt, also received by email.

My first question is whether each of these paths points to the right files.

To list everything I did after cloning the repo: because of a syntax error
[screenshot: syntaxerror]
in openspeech/callbacks.py, I modified it to
[screenshot: callbacks]
and after re-running,
OSError: Character label file (csv format) doesn't exist ../../../aihub_labels.csv
occurred.
[screenshots: log1, log2, log3, log4]

Because of that, in ./openspeech/tokenizers/ksponspeech/character.py I set an absolute path:
[screenshot: character]

The kspon_character_labels.csv file is the same one I received after sending the AIHub approval screenshot.
After re-running, the result below came out.
[screenshot: last_log]

My next questions are whether anything in the steps above could cause an error,
whether the default aihub_labels.csv is in fact the kspon_character_labels.csv file I substituted,
and if not, what aihub_labels.csv is and how I can obtain it.

Finally, if there is no particular problem with the steps above, how should I handle the error about the sos_id attribute?

Sorry for the somewhat long post; I wrote down everything I touched in case one small thing has a big effect.
Looking forward to your answer!

Inference feature

First of all, thank you for developing this great project.

I would like to know how to test a trained model with an audio file as input (like kospeech's inference.py).

Re-documentation

🚀 Feature request

Re-documentation when releasing v0.2.1

Motivation

Several document updates happened.

Your contribution

How to run a test

Hello,
I've trained the deepspeech2 model for one epoch and would like to pause training briefly to run a test and check performance.

I want to run only the test using the model saved at the last checkpoint,
so, following the evaluation example you posted,

$ python ./openspeech_cli/hydra_eval.py \
    audio=melspectrogram \
    eval.model_name=deepspeech2 \
    eval.dataset_path=$DATASET_PATH \
    eval.checkpoint_path=$CHECKPOINT_PATH \
    eval.manifest_file_path=$MANIFEST_FILE_PATH

I filled in the paths and ran it, but got the error below.
[error screenshot]

Could it be because /openspeech/configs/eval.yaml contains

defaults:
  - audio: null
  - eval: default

? Also, if training is interrupted midway, is only a .ckpt saved, with no model.pt like in kospeech?

One more question: I'd like to resume training from a checkpoint; where should I look for that?

Version 0.2.1

🚀 Feature request

Releasing Version 0.2.1

  • Implement Transformer-transducer
  • Document update
  • Add language model (lstm, transformer)
  • Add language model training pipeline
  • Add string_to_label method to Vocabulary class

Does the openspeech.modules.conv2d_subsampling forward method make a mistake while calculating the output_lengths?

โ“ Questions & Help

def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:  # line 1
    outputs, input_lengths = super().forward(inputs, input_lengths)  # line 2
    output_lengths = input_lengths >> 2  # line 3
    output_lengths -= 1  # line 4
    return outputs, output_lengths  # line 5

I believe lines 3 and 4 of the forward method above are redundant and wrong:
in line 2, input_lengths has already been updated to the proper output_lengths.

Details

e.g.:
inputs.shape ==> [batch_size, time, features] == [2, 1000, 80]
input_lengths ==> [1000, 587]

then after line 2 executed
outputs.shape ==> [2, 249, num_out_features]
input_lengths ==> [249, 146]

the input_lengths have already been updated to the proper output_lengths,
but lines 3 and 4 then set output_lengths to input_lengths divided by 4 (minus one) again, which I believe leads to an error.
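If that is right, the fix (a sketch, assuming super().forward() already applies the 4x subsampling to the lengths) would be to return the parent's lengths directly:

def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    outputs, output_lengths = super().forward(inputs, input_lengths)
    return outputs, output_lengths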

Add Ensemble decoding

🚀 Feature request

  • Add Ensemble decoding

Motivation

To see better decoding results using multiple models
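One common approach (a sketch only; the model output key and call signature are assumptions, not the openspeech API) is to average per-model log-probabilities before picking tokens:

import torch

def ensemble_decode(models, inputs, input_lengths):
    # average log-probabilities across models, then greedy-decode
    log_probs = torch.stack([
        model(inputs, input_lengths)["logits"].log_softmax(dim=-1) for model in models
    ]).mean(dim=0)
    return log_probs.argmax(dim=-1)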

Your contribution

Fix minor typo

🚀 Feature request

Fix minor typo

Motivation

Fix minor typo

openspeech install error

Hello. When installing openspeech with pip, I keep getting an error that the package cannot be found, so I'm asking here.
[error screenshot]

vocab keyError

Hello. Since vocab_label.csv was not generated, I substituted the aihub_character_labels.csv file that KoSpeech used for this csv file. But then a KeyError occurred, so I'm asking here. Does running ./hydra_train.py in openspeech automatically generate the _label.csv file?
[error screenshot]

Mail could not be delivered to the email address

โ“ Questions & Help

[Screenshot 2021-07-30 9:34:13 AM]

I sent a request email from Gmail, but it says the address does not exist.

Details

Address not found
The mail was not delivered because the [email protected] address could not be found, or it cannot receive mail.

Could you check whether the email address has changed?

Add `resume_from_checkpoint`

🚀 Add resume_from_checkpoint

  • Add resume_from_checkpoint
  • Using pl.Trainer's resume_from_checkpoint param.
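A minimal sketch of the wiring, assuming the pl.Trainer API of PyTorch Lightning 1.x (the checkpoint-path config field below is hypothetical):

import pytorch_lightning as pl

trainer = pl.Trainer(
    resume_from_checkpoint=configs.trainer.resume_checkpoint_path,  # hypothetical config field
)
trainer.fit(model, data_module)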

Motivation

#52

Your contribution

Need help in training with hyper-parameters

I am interested in training this model which implements the below paper:
Conformer: Convolution-augmented Transformer for Speech Recognition

I believe example 1 with conformer-lstm would train this model. Is my assumption right?

Also, according to the paper there are 3 different models that vary in their hyper-parameters, and I am interested in the "s" version of the model, which has around 10M parameters. How can I select this version or provide the hyper-parameters?

[Screenshot from 2021-07-28 14-37-40]
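For reference, Table 1 of the paper lists Conformer (S) with 16 encoder layers, encoder dim 144, 4 attention heads, and conv kernel size 32 (about 10.3M parameters). If openspeech exposes these as Hydra overrides, selecting them might look roughly like this (the override field names are assumptions; check configs/model/):

python ./openspeech_cli/hydra_train.py \
    model=conformer_lstm \
    model.num_encoder_layers=16 \
    model.encoder_dim=144 \
    model.num_attention_heads=4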

Also, do you have any script for end-to-end inference (including language model)?

Add Evaluation Pipeline

🚀 Feature request

Add Evaluation Pipeline

Motivation

To view evaluation results only

Your contribution

Streaming models

โ“ Questions & Help

Do any of these models come with streaming/real-time inference?

Details


Add `string_to_label` method to `Vocabulary`

🚀 Add string_to_label method to Vocabulary

  • Usage
>>> vocab = LibriSpeechCharacterVocabulary(...)
>>> vocab.string_to_label('I have a dog')
'7 5 3  . . . 9'
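A minimal sketch of such a method for a character-level vocabulary (the token_to_id attribute name is an assumption, not the existing Vocabulary API):

def string_to_label(self, sentence: str) -> str:
    """Converts a sentence into a string of space-separated label ids."""
    return ' '.join(str(self.token_to_id[ch]) for ch in sentence)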

Motivation

  • For language model training

Your contribution

  • implementation
  • validation

Out of memory occurs when training with batch_size 2

โ“ Questions & Help

I trained with batch_size 2, but out of memory occurs.
I used two Tesla V100 32GB GPUs, and most of the total memory was being used.
Could you tell me what hardware and batch size you used for training?

Details

🚀 Add Transformer Language Model

🚀 Add Transformer Language Model

  • Add Transformer Language Model
  • Add test file

Motivation

  • For decoding combinations with transformer language model and acoustic model.
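A rough sketch of what such a model could look like (illustrative only; assumes PyTorch >= 1.9 for batch_first, and omits positional encoding for brevity):

import torch
import torch.nn as nn

class TransformerLanguageModel(nn.Module):
    """Causal (left-to-right) transformer language model sketch."""

    def __init__(self, vocab_size: int, d_model: int = 512, num_layers: int = 6, num_heads: int = 8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.fc = nn.Linear(d_model, vocab_size)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # causal mask: True entries are positions each step may NOT attend to
        seq_len = inputs.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=inputs.device), diagonal=1)
        hidden = self.encoder(self.embedding(inputs), mask=mask)
        return self.fc(hidden).log_softmax(dim=-1)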

Your contribution

  • implementation
  • validation

Add `from_config`, `from_pretrained`, change `__init__` to models [v0.3.0]

🚀 Add from_config, from_pretrained, change __init__ to models [v0.3.0]

  • from_config()
>>> from openspeech import Jasper5x3Config, JasperModel
>>> model = JasperModel.from_config(Jasper5x3Config)
  • from_pretrained()
>>> from openspeech import JasperModel
>>> model = JasperModel.from_pretrained('pretrain_model_path')
  • __init__()
>>> from openspeech import JasperModel
>>> model = JasperModel(...)
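A sketch of how the two classmethods might be wired (signatures are assumptions; from_pretrained leans on LightningModule.load_from_checkpoint, which models inherit):

@classmethod
def from_config(cls, configs) -> "OpenspeechModel":
    # build a randomly-initialized model straight from a dataclass config
    return cls(configs=configs)

@classmethod
def from_pretrained(cls, path: str) -> "OpenspeechModel":
    # restore weights and hyper-parameters from a Lightning checkpoint
    return cls.load_from_checkpoint(path)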

LR Scheduler doesn't work

Information

  • Model I am using (ListenAttendSpell, Transformer, Conformer ...): All
  • LR Scheduler doesn't work (all schedulers)

To reproduce

Skip

Expected behavior

Transducer beam search implementation

🚀 Feature request

Transducer beam search implementation

Motivation

Beam search is not currently implemented for the Transducer.

Your contribution

Training with Colab

โ“ Questions & Help

Details

Hello! Thank you for the great project.

First of all I am a newbie to this field and due to lack of hardware environment, I only have access to GPUs through Google colab.

Is training the Jasper model on KsponSpeech possible in Google Colab? I am concerned because Colab only gives continuous access to GPUs for 12 hours, and I can't tell whether that is enough time for training with this project.

Thank you! 😄

Error while running hydra train with the LAS model

Environment info

  • Platform: Ubuntu, Anaconda
  • Python version: 3.8.11
  • PyTorch version (GPU?): 1.9.0 (No GPU)
  • Using GPU in script?: No

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): I'm using ListenAttendSpell

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Use tensorboard as logger rather than wandb

In configuration.py, I modified it like below.

logger: str = field(
    default="tensorboard", ...
)
  2. Execute a script

python ./openspeech_cli/hydra_train.py \
    dataset=ksponspeech \
    dataset.dataset_path=$DATASET_PATH \
    dataset.manifest_file_path=$MANIFEST_FILE_PATH \
    dataset.test_dataset_path=$TEST_DATASET_PATH \
    dataset.test_manifest_dir=$TEST_MANIFEST_DIR \
    tokenizer=kspon_character \
    model=listen_attend_spell \
    audio=melspectrogram \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=cpu \
    criterion=cross_entropy

  3. Then, I can see an error

File: "(์ค‘๋žต)openspeech/models/openspeech_encoder_decoder_model.py", line 92, in collect_outputs
self.info({
File: "(์ค‘๋žต)openspeech/models/openspeech_model.py", line 82, in info
self.log(key, value, prog_bar=True)
File: "(์ค‘๋žต)/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 399, in log
apply_to_collection(
File: "(์ค‘๋žต)/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 100, in apply_to_collection
return function(data, *args, **kwargs)
File: "(์ค‘๋žต)/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 533, in __check_allowed
raise ValueError(f"'self.log({name}, {value})' was called, but '{type(v).name}' values cannot be logged")
ValueError:
self.log(val_cross_entropy_loss, None)' was called, but 'NoneType' values cannot be logged

Checking the code, I think that the cross_entropy_loss value is None, which causes the error.
In more detail,

file: openspeech/openspeech_encoder_decoder_model.py at main · openspeech-team/openspeech · GitHub

def collect_outputs(
            self,
            stage: str,
            logits: Tensor,
            encoder_logits: Tensor,
            encoder_output_lengths: Tensor,
            targets: Tensor,
            target_lengths: Tensor,
    ) -> OrderedDict:
        cross_entropy_loss, ctc_loss = None, None  # <-- cross_entropy_loss is initialized to None

.....
        elif get_class_name(self.criterion) == "LabelSmoothedCrossEntropyLoss" \
                or get_class_name(self.criterion) == "CrossEntropyLoss":
            loss = self.criterion(logits, targets[:, 1:])    # <-- only loss is assigned a value
        else:

.....
        self.info({
            f"{stage}_loss": loss,
            f"{stage}_cross_entropy_loss": cross_entropy_loss,  // <---------- cross_entropy_loss ๋Š” None
            f"{stage}_ctc_loss": ctc_loss,
            f"{stage}_wer": wer,
            f"{stage}_cer": cer,
        })
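A possible guard (a sketch, assuming some criterion configurations legitimately leave these values unset) would be to skip None entries inside info():

def info(self, dictionary: dict) -> None:
    for key, value in dictionary.items():
        if value is not None:  # NoneType values cannot be passed to self.log
            self.log(key, value, prog_bar=True)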

Expected behavior

No error

When running hydra_train, the following error occurs

/Volumes/MacOS/PycharmProjects/openspeech/venv/bin/python /Volumes/MacOS/PycharmProjects/openspeech/openspeech/openspeech_cli/hydra_train.py dataset=librispeech; dataset.dataset_download=True; dataset.dataset_path=$DATASET_PATH; dataset.manifest_file_path=$MANIFEST_FILE_PATH; tokenizer=libri_subword; model=conformer_lstm; audio=fbank; lr_scheduler=warmup_reduce_lr_on_plateau; trainer=cpu; criterion=cross_entropy
Traceback (most recent call last):
File "/Volumes/MacOS/PycharmProjects/openspeech/openspeech/openspeech_cli/hydra_train.py", line 60, in
hydra_main()
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/main.py", line 53, in decorated_main
config_name=config_name,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/utils.py", line 368, in _run_hydra
lambda: hydra.run(
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/utils.py", line 371, in
overrides=args.overrides,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 91, in run
run_mode=RunMode.RUN,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 568, in compose_config
from_shell=from_shell,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 150, in load_configuration
from_shell=from_shell,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 231, in _load_configuration_impl
parsed_overrides = parser.parse_overrides(overrides=overrides)
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/core/override_parser/overrides_parser.py", line 100, in parse_overrides
) from e.cause
hydra.errors.OverrideParseException: LexerNoViableAltException: dataset=librispeech;
^
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details

Process finished with exit code 1
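The caret in the error points at the trailing semicolon: Hydra overrides are whitespace-separated, so a likely cause (an assumption based on Hydra's override grammar, not a confirmed resolution) is the semicolons in the run command. Dropping them would give:

python openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=True \
    dataset.dataset_path=$DATASET_PATH \
    dataset.manifest_file_path=$MANIFEST_FILE_PATH \
    tokenizer=libri_subword \
    model=conformer_lstm \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=cpu \
    criterion=cross_entropy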

Details

Deepspeech2 training error

Environment info

  • Platform: Windows 10
  • Python version: 3.7
  • PyTorch version (GPU?): PyTorch version : 1.8.1, CUDA version : 10.2
  • Using GPU in script?: GeForce RTX 2080 Ti

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): Deepspeech2, listen_attend_spell_with_multi_head

command :

$ python ./openspeech_cli/hydra_train.py \
    dataset=ksponspeech \
    dataset.dataset_path=E:/KsponSpeech \
    dataset.manifest_file_path=E:/kspon_character_manifest.txt \  
    dataset.test_dataset_path=E:/test \
    dataset.test_manifest_dir=E:/KsponSpeech_scripts \
    vocab=kspon_character \
    model=deepspeech2 \
    audio=melspectrogram \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=ctc

The problem arises when using:

  | Name      | Type        | Params
------------------------------------------
0 | criterion | CTCLoss     | 0     
1 | encoder   | DeepSpeech2 | 90.2 M
------------------------------------------
90.2 M    Trainable params
0         Non-trainable params
90.2 M    Total params
360.606   Total estimated model params size (MB)
Global seed set to 1
Epoch 0:   1%|โ–         | 4254/310637 [1:02:37<75:09:50,  1.13it/s, loss=9.13, v_num=0]Traceback (most recent call last):
  File "C:/Users/cote/PycharmProjects/kospeech2/openspeech_cli/hydra_train.py", line 51, in hydra_main
    trainer.fit(model, data_module)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 458, in fit
    self._run(model)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 756, in _run
    self.dispatch()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 807, in run_stage
    return self.run_train()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 490, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 731, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 432, in optimizer_step
    using_lbfgs=is_lbfgs,
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\core\lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\core\optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\core\optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\optim\optimizer.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\optim\adam.py", line 66, in step
    loss = closure()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 726, in train_step_and_backward_closure
    split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 827, in training_step_and_backward
    self.backward(result, optimizer, opt_idx)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 865, in backward
    result.closure_loss, optimizer, opt_idx, should_accumulate, *args, **kwargs
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 309, in backward
    self.lightning_module, closure_loss, optimizer, optimizer_idx, should_accumulate, *args, **kwargs
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 79, in backward
    model.backward(closure_loss, optimizer, opt_idx)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\core\lightning.py", line 1275, in backward
    loss.backward(*args, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\autograd\__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

The above error occurred while training Deepspeech2, and the same error occurred in the listen_attend_spell_with_multi_head model.

Training and decoding speed

Hi

Thank you for the nice software. Could you please share the information:

  1. How long does training on LibriSpeech take, and on how many GPUs?
  2. How fast is the decoding (RTF)?

Apply documentation theme

🚀 Feature request

Motivation

Apply documentation theme

Your contribution

Apply documentation theme

Using conformer_transducer and rnnt_loss error

Environment info

  • Platform: Ubuntu 16.04
  • Python version: 3.7.10
  • PyTorch version (GPU?): PyTorch 1.7.0, CUDA 10.2
  • Using GPU in script?: GeForce RTX 2080 Ti

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): conformer_transducer

The problem arises when using:
CUDA_VISIBLE_DEVICES=8 python ./openspeech_cli/hydra_train.py dataset=libri dataset.dataset_path=/data/dataset/Libri/LibriSpeech dataset.dataset_download=False dataset.manifest_file_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset/libri_subword_manifest.txt vocab=libri_subword vocab.vocab_size=5000 vocab.vocab_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset model=conformer_transducer audio=fbank lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu trainer.batch_size=4 criterion=transducer

Error executing job with overrides: ['dataset=libri', 'dataset.dataset_path=/data/dataset/Libri/LibriSpeech', 'dataset.dataset_download=False', 'dataset.manifest_file_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset/libri_subword_manifest.txt', 'vocab=libri_subword', 'vocab.vocab_size=5000', 'vocab.vocab_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset', 'model=conformer_transducer', 'audio=fbank', 'lr_scheduler=warmup_reduce_lr_on_plateau', 'trainer=gpu', 'trainer.batch_size=4', 'criterion=transducer']
Traceback (most recent call last):
  File "./openspeech_cli/hydra_train.py", line 51, in hydra_main
    trainer.fit(model, data_module)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 442, in optimizer_step
    using_lbfgs=is_lbfgs,
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/optim/adam.py", line 66, in step
    loss = closure()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 733, in train_step_and_backward_closure
    split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/dp.py", line 98, in training_step
    return self.model(*args, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 77, in forward
    output = super().forward(*inputs, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 46, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/conformer_transducer/model.py", line 110, in training_step
    return super(ConformerTransducerModel, self).training_step(batch, batch_idx)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/openspeech_transducer_model.py", line 268, in training_step
    target_lengths=target_lengths,
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/openspeech_transducer_model.py", line 90, in collect_outputs
    target_lengths=target_lengths.int(),
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/criterion/transducer/transducer.py", line 96, in forward
    gather=self.gather,
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/warp_rnnt-0.4.0-py3.7-linux-x86_64.egg/warp_rnnt/__init__.py", line 74, in rnnt_loss
    index[:, :, :U-1, 1] = labels.unsqueeze(dim=1)
RuntimeError: The expanded size of the tensor (61) must match the existing size (62) at non-singleton dimension 2.  Target sizes: [4, 355, 61].  Tensor sizes: [4, 1, 62]

The Loss function:

def rnnt_loss(log_probs: torch.FloatTensor,
              labels: torch.IntTensor,
              frames_lengths: torch.IntTensor,
              labels_lengths: torch.IntTensor,
              average_frames: bool = False,
              reduction: Optional[AnyStr] = None,
              blank: int = 0,
              gather: bool = False) -> torch.Tensor:

    r"""
    The CUDA-Warp RNN-Transducer loss.

    Args:
        log_probs (torch.FloatTensor): Input tensor with shape (N, T, U, V)
            where N is the minibatch size, T is the maximum number of
            input frames, U is the maximum number of output labels and V is
            the vocabulary of labels (including the blank).
        labels (torch.IntTensor): Tensor with shape (N, U-1) representing the
            reference labels for all samples in the minibatch.
        frames_lengths (torch.IntTensor): Tensor with shape (N,) representing the
            number of frames for each sample in the minibatch.
        labels_lengths (torch.IntTensor): Tensor with shape (N,) representing the
            length of the transcription for each sample in the minibatch.
        average_frames (bool, optional): Specifies whether the loss of each
            sample should be divided by its number of frames.
            Default: False.
        reduction (string, optional): Specifies the type of reduction.
            Default: None.
        blank (int, optional): label used to represent the blank symbol.
            Default: 0.
        gather (bool, optional): Reduce memory consumption.
            Default: False.
    """

The shapes of log_probs and labels produced by the model are inconsistent with what the function requires.
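A shape-only sketch of a consistent call (dummy tensors; assumes warp_rnnt 0.4.0 and a CUDA device). For labels of length L, the joint output's U dimension must be L + 1, which is exactly what the traceback's [4, 355, 61] vs [4, 1, 62] mismatch violates:

import torch
from warp_rnnt import rnnt_loss

N, T, V = 4, 355, 5000                                                     # batch, frames, vocab (incl. blank)
labels = torch.randint(1, V, (N, 62), dtype=torch.int, device="cuda")      # (N, U-1), so U = 63
log_probs = torch.randn(N, T, labels.size(1) + 1, V, device="cuda").log_softmax(dim=-1)  # (N, T, U, V)
frames_lengths = torch.full((N,), T, dtype=torch.int, device="cuda")
labels_lengths = torch.full((N,), labels.size(1), dtype=torch.int, device="cuda")
loss = rnnt_loss(log_probs, labels, frames_lengths, labels_lengths, reduction="mean")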

Document update

🚀 Feature request

Document update

Motivation

Version 0.2 released.

Your contribution

Encoding error when parsing manifest file

Information

self.encoding = 'utf-8' if self.configs.tokenizer.unit == 'kspon_subword' else 'cp949'


def _parse_manifest_file(self):
    r"""
    Parsing manifest file.

    Returns:
        audio_paths (list): list of audio paths
        transcripts (list): list of transcripts of the audio
    """
    audio_paths = list()
    transcripts = list()

    with open(self.configs.dataset.manifest_file_path, encoding=self.encoding) as f:
        for idx, line in enumerate(f.readlines()):
            audio_path, korean_transcript, transcript = line.split('\t')
            transcript = transcript.replace('\n', '')

            audio_paths.append(audio_path)
            transcripts.append(transcript)

    return audio_paths, transcripts

Expected behavior

self.encoding = 'utf-8' if self.configs.tokenizer.unit == 'kspon_subword' else 'cp949'

--> self.encoding = 'cp949' if self.configs.tokenizer.unit == 'kspon_grapheme' else 'utf-8'

Version 0.3.0

🚀 Version 0.3.0

  • Vocabulary => Tokenizer class
  • Re-documentation
  • Re-factoring models directory

Motivation

To provide the modules used by Openspeech in the form of a library.

Your contribution

Development, design architecture

Downloading the preprocessed manifest file for the kspon data

First of all, thank you for sharing this great source code!

I have requested access permission to download the preprocessed manifest file for the kspon data,

and I have a few questions:

  1. Is there preprocessing code that generates the manifest?

  2. The manifest format is similar to the [transcripts.txt] file generated by the previous repo, kospeech; can the manifest be replaced with transcripts.txt?

  3. Training requires the test_manifest_dir and test_dataset_path paths; does that mean separate train/test datasets and train/test manifest text files are each needed? If not, I'm wondering whether code to split train/test is built in!

I would appreciate it if you could approve the access request :)
