
openspeech's People

Contributors

dudgns0908 · grazder · roytravel · sooftware · soyoungcho · upskyy · yongwookha


openspeech's Issues

Change `log_steps()` => `info()` in `OpenspeechModel`

🚀 Change log_steps() => info() in OpenspeechModel

  • Current implementation of OpenspeechModel logging
def log_steps(...):
    self.log(f"{stage}_wer", wer)
    self.log(f"{stage}_cer", cer)
    ...
  • Modified version
def info(self, dictionary: dict) -> None:
    r"""
    Logging information from dictionary.

    Args:
        dictionary (dict): dictionary containing the information to log.
    """
    for key, value in dictionary.items():
        self.log(key, value, prog_bar=True)
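Hypothetical usage from a training step (the keys below are illustrative, not a fixed API):

self.info({
    "train_loss": loss,
    "train_wer": wer,
    "train_cer": cer,
})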

Motivation

  • For scalability

Your contribution

  • implementation

Other data

โ“ Questions & Help

Details

Besides the three datasets currently supported, are there any guidelines for easily applying the open speech recognition models to other datasets?

Validation check: "'NoneType' object has no attribute 'transpose'"

Hello. When I start training with the data path, manifest file, vocab file, etc. specified, the following error keeps occurring:

[error screenshots]

This error occurs and I cannot figure out why, so I'm asking here...
For the vocab I am using the sp.model and kspon_subword_manifest.txt you uploaded; the GPU is an RTX 3090 and I'm on PyTorch 1.8.1.

Add LSTM Language Model

🚀 Add LSTM Language Model

  • Add LSTM Language Model
  • Add test file

Motivation

  • For decoding combinations with language model and acoustic model.
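A rough sketch of what such a model could look like in PyTorch (the class name and hyper-parameters here are illustrative, not the final openspeech API):

import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Predicts the next token given a prefix of token ids."""

    def __init__(self, vocab_size: int, hidden_dim: int = 512, num_layers: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, seq_len) token ids -> (batch, seq_len, vocab_size) log-probs
        outputs, _ = self.lstm(self.embedding(inputs))
        return self.fc(outputs).log_softmax(dim=-1)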

Your contribution

  • implementation
  • validation

test_conformer_lstm.py bug

Environment info

  • Platform: Windows 10
  • Python version: 3.7
  • PyTorch version (GPU?): PyTorch version : 1.8.1, CUDA version : 10.2
  • Using GPU in script?: GeForce RTX 2080 Ti

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): Conformer_lstm

The problem arises when using:

Testing started at 5:22 PM ...
C:\Users\cote\Anaconda3\envs\sc\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition with Anaconda plugin 2019.3.3\plugins\python-ce\helpers\pycharm\_jb_unittest_runner.py" --path C:/Users/cote/PycharmProjects/kospeech2/tests/test_conformer_lstm.py
Launching unittests with arguments python -m unittest C:/Users/cote/PycharmProjects/kospeech2/tests/test_conformer_lstm.py in C:\Users\cote\PycharmProjects\kospeech2\tests


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 56, in test_beam_search
    prediction = model(DUMMY_INPUTS, DUMMY_INPUT_LENGTHS)["predictions"]
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 109, in forward
    return super(ConformerLSTMModel, self).forward(inputs, input_lengths)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 125, in forward
    predictions = self.decoder(encoder_outputs, encoder_output_lengths)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\search\beam_search_lstm.py", line 78, in forward
    step_outputs, hidden_states, attn = self.forward_step(inputs, hidden_states, encoder_outputs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 140, in forward_step
    embedded = self.embedding(input_var)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 37, in test_forward
    outputs = model(DUMMY_INPUTS, DUMMY_INPUT_LENGTHS)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 109, in forward
    return super(ConformerLSTMModel, self).forward(inputs, input_lengths)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 130, in forward
    teacher_forcing_ratio=0.0,
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 220, in forward
    attn=attn,
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 140, in forward_step
    embedded = self.embedding(input_var)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 103, in test_test_step
    batch=(DUMMY_INPUTS, DUMMY_TARGETS, DUMMY_INPUT_LENGTHS, DUMMY_TARGET_LENGTHS), batch_idx=i
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 148, in test_step
    return super(ConformerLSTMModel, self).test_step(batch, batch_idx)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 225, in test_step
    teacher_forcing_ratio=0.0,
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 220, in forward
    attn=attn,
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 140, in forward_step
    embedded = self.embedding(input_var)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 71, in test_training_step
    batch=(DUMMY_INPUTS, DUMMY_TARGETS, DUMMY_INPUT_LENGTHS, DUMMY_TARGET_LENGTHS), batch_idx=i
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 122, in training_step
    return super(ConformerLSTMModel, self).training_step(batch, batch_idx)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 177, in training_step
    target_lengths=target_lengths,
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 105, in collect_outputs
    "learning_rate": self.get_lr(),
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_model.py", line 215, in get_lr
    for g in self.optimizer.param_groups:
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'ConformerLSTMModel' object has no attribute 'optimizer'


Error
Traceback (most recent call last):
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\cote\Anaconda3\Lib\unittest\case.py", line 628, in run
    testMethod()
  File "C:\Users\cote\PycharmProjects\kospeech2\tests\test_conformer_lstm.py", line 87, in test_validation_step
    batch=(DUMMY_INPUTS, DUMMY_TARGETS, DUMMY_INPUT_LENGTHS, DUMMY_TARGET_LENGTHS), batch_idx=i
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\conformer_lstm\model.py", line 135, in validation_step
    return super(ConformerLSTMModel, self).validation_step(batch, batch_idx)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\models\openspeech_encoder_decoder_model.py", line 197, in validation_step
    teacher_forcing_ratio=0.0,
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 220, in forward
    attn=attn,
  File "C:\Users\cote\PycharmProjects\kospeech2\openspeech\decoders\lstm_attention_decoder.py", line 140, in forward_step
    embedded = self.embedding(input_var)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device


Assertion failed


Ran 5 tests in 24.698s

FAILED (errors=5)

Process finished with exit code 1


Grapheme label error

Information

The grapheme label value needs to be corrected.

import pandas as pd

def generate_grapheme_labels(grapheme_transcripts, vocab_path: str):
    vocab_list = list()
    vocab_freq = list()

    for grapheme_transcript in grapheme_transcripts:
        graphemes = grapheme_transcript.split()
        for grapheme in graphemes:
            if grapheme not in vocab_list:
                vocab_list.append(grapheme)
                vocab_freq.append(1)
            else:
                vocab_freq[vocab_list.index(grapheme)] += 1

    vocab_freq, vocab_list = zip(*sorted(zip(vocab_freq, vocab_list), reverse=True))
    vocab_dict = {
        'id': [0, 1, 2, 3],
        'grpm': ['<pad>', '<sos>', '<eos>', '<blank>'],
        'freq': [0, 0, 0, 0]
    }

    for idx, (grpm, freq) in enumerate(zip(vocab_list, vocab_freq)):
        vocab_dict['id'].append(idx + 3)  # <-- bug: ids 0-3 are already taken, so this should be idx + 4
        vocab_dict['grpm'].append(grpm)
        vocab_dict['freq'].append(freq)

    label_df = pd.DataFrame(vocab_dict)
    label_df.to_csv(vocab_path, encoding="utf-8", index=False)

Expected behavior

vocab_dict['id'].append(idx + 3) -> vocab_dict['id'].append(idx + 4)
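Since ids 0 through 3 are reserved for <pad>, <sos>, <eos>, and <blank>, the first grapheme must start at id 4; the corrected loop would be:

for idx, (grpm, freq) in enumerate(zip(vocab_list, vocab_freq)):
    vocab_dict['id'].append(idx + 4)  # ids 0-3 are taken by the special tokens
    vocab_dict['grpm'].append(grpm)
    vocab_dict['freq'].append(freq)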

Add metrics.py test file

🚀 Feature request

Add metrics.py test file for LibriSpeech, AISHELL-1, KsponSpeech
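A minimal sketch of what such a test could look like, using a plain Levenshtein-based WER rather than openspeech's own metric classes (all names here are illustrative):

import unittest

def _edit_distance(ref, hyp):
    # classic dynamic-programming Levenshtein distance over two token lists
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return _edit_distance(ref_words, hypothesis.split()) / len(ref_words)

class TestMetrics(unittest.TestCase):
    def test_wer(self):
        self.assertEqual(wer("a b c", "a b c"), 0.0)
        self.assertAlmostEqual(wer("a b c d", "a x c"), 0.5)  # 1 substitution + 1 deletion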

Your contribution

  • implementation

  • validation

PyPI package registration

🚀 Feature request

Register the package on PyPI.

Motivation

Register the package on PyPI.

Your contribution

Register the package on PyPI.

AttributeError: 'NoneType' object has no attribute 'sos_id'

โ“ Questions & Help

I'm a beginner, so please bear with me if some of these questions are basic..!
I was training with conformer-large, the model provided as-is by kospeech, but stopped because the loss went NaN, and decided to try building a speech recognizer with openspeech instead.

First, I ran the same command as the KsponSpeech training example in the README.
[screenshot: train_execute]

DATASET_PATH contains the KsponSpeech_01 ... 05 folders downloaded from AIHub,
MANIFEST_FILE_PATH points to the kspon_character_manifest.txt file received by email,
TEST_DATASET_PATH contains the eval_clean and eval_other folders downloaded from AIHub, and
TEST_MANIFEST_DIR holds the name of the folder containing kspon_character_manifest.txt, also received by email.

My first question is whether each of these paths points to the right files.

To list everything I did after cloning the repo: because of a syntax error
[screenshot: syntaxerror]
in openspeech/callbacks.py, I modified it to
[screenshot: callbacks]
and after re-running,
OSError: Character label file (csv format) doesn't exist ../../../aihub_labels.csv
occurred.
[screenshots: log1, log2, log3, log4]

Because of that, in ./openspeech/tokenizers/ksponspeech/character.py I set an absolute path:
[screenshot: character]

The kspon_character_labels.csv file is the same one I received after sending the AIHub approval screenshot.
After re-running, the result below came out.
[screenshot: last_log]

My next questions are whether anything in the steps above could cause an error,
whether the default aihub_labels.csv is in fact the kspon_character_labels.csv file I substituted,
and if not, what aihub_labels.csv is and how I can obtain it.

Finally, if there is no particular problem with the steps above, how should I handle the error about the sos_id attribute?

Sorry for the somewhat long post; I wrote down everything I touched in case one small thing has a big effect.
Looking forward to your answer!

Inference feature

First of all, thank you for developing this great project.

I would like to know how to test a trained model with an audio file as input (like kospeech's inference.py).

Re-documentation

🚀 Feature request

Re-documentation when releasing v0.2.1

Motivation

Several document updates happened.

Your contribution

How to run a test

Hello,
I've trained the deepspeech2 model for one epoch and would like to pause training briefly to run a test and check performance.

I want to run only the test using the model saved at the last checkpoint,
so, following the evaluation example you posted,

$ python ./openspeech_cli/hydra_eval.py \
    audio=melspectrogram \
    eval.model_name=deepspeech2 \
    eval.dataset_path=$DATASET_PATH \
    eval.checkpoint_path=$CHECKPOINT_PATH \
    eval.manifest_file_path=$MANIFEST_FILE_PATH

I filled in the paths and ran it, but got the error below.
[error screenshot]

Could it be because /openspeech/configs/eval.yaml contains

defaults:
  - audio: null
  - eval: default

? Also, if training is interrupted midway, is only a .ckpt saved, with no model.pt like in kospeech?

One more question: I'd like to resume training from a checkpoint; where should I look for that?

Version 0.2.1

🚀 Feature request

Releasing Version 0.2.1

  • Implement Transformer-transducer
  • Document update
  • Add language model (lstm, transformer)
  • Add language model training pipeline
  • Add string_to_label method to Vocabulary class

Does the openspeech.modules.conv2d_subsampling forward method make a mistake while calculating the output_lengths?

โ“ Questions & Help

def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:  # line 1
    outputs, input_lengths = super().forward(inputs, input_lengths)  # line 2
    output_lengths = input_lengths >> 2  # line 3
    output_lengths -= 1  # line 4
    return outputs, output_lengths  # line 5

I believe lines 3 and 4 of the forward method above are redundant and wrong:
in line 2, input_lengths has already been updated to the proper output_lengths.

Details

e.g.:
inputs.shape ==> [batch_size, time, features] == [2, 1000, 80]
input_lengths ==> [1000, 587]

then after line 2 executed
outputs.shape ==> [2, 249, num_out_features]
input_lengths ==> [249, 146]

the input_lengths have already been updated to the proper output_lengths,
but lines 3 and 4 then set output_lengths to input_lengths divided by 4 (minus one) again, which I believe leads to an error.
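If that is right, the fix (a sketch, assuming super().forward() already applies the 4x subsampling to the lengths) would be to return the parent's lengths directly:

def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    outputs, output_lengths = super().forward(inputs, input_lengths)
    return outputs, output_lengths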

Add Ensemble decoding

🚀 Feature request

  • Add Ensemble decoding

Motivation

To see better decoding results using multiple models
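One common approach (a sketch only; the model output key and call signature are assumptions, not the openspeech API) is to average per-model log-probabilities before picking tokens:

import torch

def ensemble_decode(models, inputs, input_lengths):
    # average log-probabilities across models, then greedy-decode
    log_probs = torch.stack([
        model(inputs, input_lengths)["logits"].log_softmax(dim=-1) for model in models
    ]).mean(dim=0)
    return log_probs.argmax(dim=-1)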

Your contribution

Fix minor typo

🚀 Feature request

Fix minor typo

Motivation

Fix minor typo

openspeech install error

Hello. When installing openspeech with pip, I keep getting an error that the package cannot be found, so I'm asking here.
[error screenshot]

vocab keyError

Hello. Since vocab_label.csv was not generated, I substituted the aihub_character_labels.csv file that KoSpeech used for this csv file. But then a KeyError occurred, so I'm asking here. Does running ./hydra_train.py in openspeech automatically generate the _label.csv file?
[error screenshot]

Mail could not be delivered to the email address

โ“ Questions & Help

[Screenshot 2021-07-30 9:34:13 AM]

I sent a request email from Gmail, but it says the address does not exist.

Details

Address not found
The mail was not delivered because the [email protected] address could not be found, or it cannot receive mail.

Could you check whether the email address has changed?

Add `resume_from_checkpoint`

🚀 Add resume_from_checkpoint

  • Add resume_from_checkpoint
  • Using pl.Trainer's resume_from_checkpoint param.
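A minimal sketch of the wiring, assuming the pl.Trainer API of PyTorch Lightning 1.x (the checkpoint-path config field below is hypothetical):

import pytorch_lightning as pl

trainer = pl.Trainer(
    resume_from_checkpoint=configs.trainer.resume_checkpoint_path,  # hypothetical config field
)
trainer.fit(model, data_module)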

Motivation

#52

Your contribution

Need help in training with hyper-parameters

I am interested in training this model which implements the below paper:
Conformer: Convolution-augmented Transformer for Speech Recognition

I believe example 1 with conformer-lstm would train this model. Is my assumption right?

Also, according to the paper there are 3 different models that vary in their hyper-parameters, and I am interested in the "s" version of the model, which has around 10M parameters. How can I select this version or provide the hyper-parameters?

[Screenshot from 2021-07-28 14-37-40]
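For reference, Table 1 of the paper lists Conformer (S) with 16 encoder layers, encoder dim 144, 4 attention heads, and conv kernel size 32 (about 10.3M parameters). If openspeech exposes these as Hydra overrides, selecting them might look roughly like this (the override field names are assumptions; check configs/model/):

python ./openspeech_cli/hydra_train.py \
    model=conformer_lstm \
    model.num_encoder_layers=16 \
    model.encoder_dim=144 \
    model.num_attention_heads=4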

Also, do you have any script for end-to-end inference (including language model)?

Add Evaluation Pipeline

🚀 Feature request

Add Evaluation Pipeline

Motivation

To view evaluation results only

Your contribution

Streaming models

โ“ Questions & Help

Do any of these models come with streaming/real-time inference?

Details


Add `string_to_label` method to `Vocabulary`

🚀 Add string_to_label method to Vocabulary

  • Usage
>>> vocab = LibriSpeechCharacterVocabulary(...)
>>> vocab.string_to_label('I have a dog')
'7 5 3  . . . 9'
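A minimal sketch of such a method for a character-level vocabulary (the token_to_id attribute name is an assumption, not the existing Vocabulary API):

def string_to_label(self, sentence: str) -> str:
    """Converts a sentence into a string of space-separated label ids."""
    return ' '.join(str(self.token_to_id[ch]) for ch in sentence)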

Motivation

  • For language model training

Your contribution

  • implementation
  • validation

Out of memory occurs when training with batch_size 2

โ“ Questions & Help

I trained with batch_size 2, but out of memory occurs.
I used two Tesla V100 32GB GPUs, and most of the total memory was being used.
Could you tell me what hardware and batch size you used for training?

Details

🚀 Add Transformer Language Model

🚀 Add Transformer Language Model

  • Add Transformer Language Model
  • Add test file

Motivation

  • For decoding combinations with transformer language model and acoustic model.
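A rough sketch of what such a model could look like (illustrative only; assumes PyTorch >= 1.9 for batch_first, and omits positional encoding for brevity):

import torch
import torch.nn as nn

class TransformerLanguageModel(nn.Module):
    """Causal (left-to-right) transformer language model sketch."""

    def __init__(self, vocab_size: int, d_model: int = 512, num_layers: int = 6, num_heads: int = 8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.fc = nn.Linear(d_model, vocab_size)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # causal mask: True entries are positions each step may NOT attend to
        seq_len = inputs.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=inputs.device), diagonal=1)
        hidden = self.encoder(self.embedding(inputs), mask=mask)
        return self.fc(hidden).log_softmax(dim=-1)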

Your contribution

  • implementation
  • validation

Add `from_config`, `from_pretrained`, change `__init__` to models [v0.3.0]

🚀 Add from_config, from_pretrained, change __init__ to models [v0.3.0]

  • from_config()
>>> from openspeech import Jasper5x3Config, JasperModel
>>> model = JasperModel.from_config(Jasper5x3Config)
  • from_pretrained()
>>> from openspeech import JasperModel
>>> model = JasperModel.from_pretrained('pretrain_model_path')
  • __init__()
>>> from openspeech import JasperModel
>>> model = JasperModel(...)
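A sketch of how the two classmethods might be wired (signatures are assumptions; from_pretrained leans on LightningModule.load_from_checkpoint, which models inherit):

@classmethod
def from_config(cls, configs) -> "OpenspeechModel":
    # build a randomly-initialized model straight from a dataclass config
    return cls(configs=configs)

@classmethod
def from_pretrained(cls, path: str) -> "OpenspeechModel":
    # restore weights and hyper-parameters from a Lightning checkpoint
    return cls.load_from_checkpoint(path)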

LR Scheduler doesn't work

Information

  • Model I am using (ListenAttendSpell, Transformer, Conformer ...): All
  • LR Scheduler doesn't work (all schedulers)

To reproduce

Skip

Expected behavior

Transducer beam search implementation

🚀 Feature request

Transducer beam search implementation

Motivation

Beam search is not currently implemented for the Transducer.

Your contribution

Training with Colab

โ“ Questions & Help

Details

Hello! Thank you for the great project.

First of all I am a newbie to this field and due to lack of hardware environment, I only have access to GPUs through Google colab.

Is training the Jasper model on KsponSpeech possible in Google Colab? I am concerned because Colab only gives continuous access to GPUs for 12 hours, and I can't tell whether that is enough time for training with this project.

Thank you! 😄

Error while running hydra train with the LAS model

Environment info

  • Platform: Ubuntu, Anaconda
  • Python version: 3.8.11
  • PyTorch version (GPU?): 1.9.0 (No GPU)
  • Using GPU in script?: No

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): I'm using ListenAttendSpell

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Use tensorboard as logger rather than wandb

In configuration.py, I modified it like below.

logger: str = field(
    default="tensorboard", ...
)
  2. Execute a script

python ./openspeech_cli/hydra_train.py \
    dataset=ksponspeech \
    dataset.dataset_path=$DATASET_PATH \
    dataset.manifest_file_path=$MANIFEST_FILE_PATH \
    dataset.test_dataset_path=$TEST_DATASET_PATH \
    dataset.test_manifest_dir=$TEST_MANIFEST_DIR \
    tokenizer=kspon_character \
    model=listen_attend_spell \
    audio=melspectrogram \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=cpu \
    criterion=cross_entropy

  3. Then, I can see an error

File: "(์ค‘๋žต)openspeech/models/openspeech_encoder_decoder_model.py", line 92, in collect_outputs
self.info({
File: "(์ค‘๋žต)openspeech/models/openspeech_model.py", line 82, in info
self.log(key, value, prog_bar=True)
File: "(์ค‘๋žต)/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 399, in log
apply_to_collection(
File: "(์ค‘๋žต)/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 100, in apply_to_collection
return function(data, *args, **kwargs)
File: "(์ค‘๋žต)/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 533, in __check_allowed
raise ValueError(f"'self.log({name}, {value})' was called, but '{type(v).name}' values cannot be logged")
ValueError:
self.log(val_cross_entropy_loss, None)' was called, but 'NoneType' values cannot be logged

Checking the code, I think that the cross_entropy_loss value is None, which causes the error.
In more detail,

file: openspeech/openspeech_encoder_decoder_model.py at main · openspeech-team/openspeech · GitHub

def collect_outputs(
            self,
            stage: str,
            logits: Tensor,
            encoder_logits: Tensor,
            encoder_output_lengths: Tensor,
            targets: Tensor,
            target_lengths: Tensor,
    ) -> OrderedDict:
        cross_entropy_loss, ctc_loss = None, None  # <-- cross_entropy_loss is initialized to None

.....
        elif get_class_name(self.criterion) == "LabelSmoothedCrossEntropyLoss" \
                or get_class_name(self.criterion) == "CrossEntropyLoss":
            loss = self.criterion(logits, targets[:, 1:])    # <-- only loss is assigned a value
        else:

.....
        self.info({
            f"{stage}_loss": loss,
            f"{stage}_cross_entropy_loss": cross_entropy_loss,  // <---------- cross_entropy_loss ๋Š” None
            f"{stage}_ctc_loss": ctc_loss,
            f"{stage}_wer": wer,
            f"{stage}_cer": cer,
        })
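A possible guard (a sketch, assuming some criterion configurations legitimately leave these values unset) would be to skip None entries inside info():

def info(self, dictionary: dict) -> None:
    for key, value in dictionary.items():
        if value is not None:  # NoneType values cannot be passed to self.log
            self.log(key, value, prog_bar=True)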

Expected behavior

No error

When running hydra_train, the following error occurs

/Volumes/MacOS/PycharmProjects/openspeech/venv/bin/python /Volumes/MacOS/PycharmProjects/openspeech/openspeech/openspeech_cli/hydra_train.py dataset=librispeech; dataset.dataset_download=True; dataset.dataset_path=$DATASET_PATH; dataset.manifest_file_path=$MANIFEST_FILE_PATH; tokenizer=libri_subword; model=conformer_lstm; audio=fbank; lr_scheduler=warmup_reduce_lr_on_plateau; trainer=cpu; criterion=cross_entropy
Traceback (most recent call last):
File "/Volumes/MacOS/PycharmProjects/openspeech/openspeech/openspeech_cli/hydra_train.py", line 60, in
hydra_main()
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/main.py", line 53, in decorated_main
config_name=config_name,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/utils.py", line 368, in _run_hydra
lambda: hydra.run(
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/utils.py", line 371, in
overrides=args.overrides,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 91, in run
run_mode=RunMode.RUN,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 568, in compose_config
from_shell=from_shell,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 150, in load_configuration
from_shell=from_shell,
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 231, in _load_configuration_impl
parsed_overrides = parser.parse_overrides(overrides=overrides)
File "/Volumes/MacOS/PycharmProjects/openspeech/venv/lib/python3.7/site-packages/hydra/core/override_parser/overrides_parser.py", line 100, in parse_overrides
) from e.cause
hydra.errors.OverrideParseException: LexerNoViableAltException: dataset=librispeech;
^
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details

Process finished with exit code 1
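The caret in the error points at the trailing semicolon: Hydra overrides are whitespace-separated, so a likely cause (an assumption based on Hydra's override grammar, not a confirmed resolution) is the semicolons in the run command. Dropping them would give:

python openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=True \
    dataset.dataset_path=$DATASET_PATH \
    dataset.manifest_file_path=$MANIFEST_FILE_PATH \
    tokenizer=libri_subword \
    model=conformer_lstm \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=cpu \
    criterion=cross_entropy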

Details

Deepspeech2 training error

Environment info

  • Platform: Windows 10
  • Python version: 3.7
  • PyTorch version (GPU?): PyTorch version : 1.8.1, CUDA version : 10.2
  • Using GPU in script?: GeForce RTX 2080 Ti

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): Deepspeech2, listen_attend_spell_with_multi_head

command :

$ python ./openspeech_cli/hydra_train.py \
    dataset=ksponspeech \
    dataset.dataset_path=E:/KsponSpeech \
    dataset.manifest_file_path=E:/kspon_character_manifest.txt \  
    dataset.test_dataset_path=E:/test \
    dataset.test_manifest_dir=E:/KsponSpeech_scripts \
    vocab=kspon_character \
    model=deepspeech2 \
    audio=melspectrogram \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=ctc

The problem arises when using:

  | Name      | Type        | Params
------------------------------------------
0 | criterion | CTCLoss     | 0     
1 | encoder   | DeepSpeech2 | 90.2 M
------------------------------------------
90.2 M    Trainable params
0         Non-trainable params
90.2 M    Total params
360.606   Total estimated model params size (MB)
Global seed set to 1
Epoch 0:   1%|โ–         | 4254/310637 [1:02:37<75:09:50,  1.13it/s, loss=9.13, v_num=0]Traceback (most recent call last):
  File "C:/Users/cote/PycharmProjects/kospeech2/openspeech_cli/hydra_train.py", line 51, in hydra_main
    trainer.fit(model, data_module)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 458, in fit
    self._run(model)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 756, in _run
    self.dispatch()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 807, in run_stage
    return self.run_train()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 490, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 731, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 432, in optimizer_step
    using_lbfgs=is_lbfgs,
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\core\lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\core\optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\core\optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\optim\optimizer.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\optim\adam.py", line 66, in step
    loss = closure()
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 726, in train_step_and_backward_closure
    split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 827, in training_step_and_backward
    self.backward(result, optimizer, opt_idx)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 865, in backward
    result.closure_loss, optimizer, opt_idx, should_accumulate, *args, **kwargs
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 309, in backward
    self.lightning_module, closure_loss, optimizer, optimizer_idx, should_accumulate, *args, **kwargs
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 79, in backward
    model.backward(closure_loss, optimizer, opt_idx)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\pytorch_lightning\core\lightning.py", line 1275, in backward
    loss.backward(*args, **kwargs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\cote\Anaconda3\envs\sc\lib\site-packages\torch\autograd\__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

The above error occurred while training Deepspeech2, and the same error occurred in the listen_attend_spell_with_multi_head model.

Training and decoding speed

Hi

Thank you for the nice software. Could you please share the information:

  1. How long does training on LibriSpeech take, and on how many GPUs?
  2. How fast is the decoding (RTF)?

Apply documentation theme

🚀 Feature request

Motivation

Apply documentation theme

Your contribution

Apply documentation theme

Using conformer_transducer and rnnt_loss error

Environment info

  • Platform: Ubuntu 16.04
  • Python version: 3.7.10
  • PyTorch version (GPU?): PyTorch 1.7.0, CUDA 10.2
  • Using GPU in script?: GeForce RTX 2080 Ti

Information

Model I am using (ListenAttendSpell, Transformer, Conformer ...): conformer_transducer

The problem arises when using:
CUDA_VISIBLE_DEVICES=8 python ./openspeech_cli/hydra_train.py dataset=libri dataset.dataset_path=/data/dataset/Libri/LibriSpeech dataset.dataset_download=False dataset.manifest_file_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset/libri_subword_manifest.txt vocab=libri_subword vocab.vocab_size=5000 vocab.vocab_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset model=conformer_transducer audio=fbank lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu trainer.batch_size=4 criterion=transducer

Error executing job with overrides: ['dataset=libri', 'dataset.dataset_path=/data/dataset/Libri/LibriSpeech', 'dataset.dataset_download=False', 'dataset.manifest_file_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset/libri_subword_manifest.txt', 'vocab=libri_subword', 'vocab.vocab_size=5000', 'vocab.vocab_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset', 'model=conformer_transducer', 'audio=fbank', 'lr_scheduler=warmup_reduce_lr_on_plateau', 'trainer=gpu', 'trainer.batch_size=4', 'criterion=transducer']
Traceback (most recent call last):
  File "./openspeech_cli/hydra_train.py", line 51, in hydra_main
    trainer.fit(model, data_module)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 442, in optimizer_step
    using_lbfgs=is_lbfgs,
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/optim/adam.py", line 66, in step
    loss = closure()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 733, in train_step_and_backward_closure
    split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/dp.py", line 98, in training_step
    return self.model(*args, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 77, in forward
    output = super().forward(*inputs, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 46, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/conformer_transducer/model.py", line 110, in training_step
    return super(ConformerTransducerModel, self).training_step(batch, batch_idx)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/openspeech_transducer_model.py", line 268, in training_step
    target_lengths=target_lengths,
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/openspeech_transducer_model.py", line 90, in collect_outputs
    target_lengths=target_lengths.int(),
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/criterion/transducer/transducer.py", line 96, in forward
    gather=self.gather,
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/warp_rnnt-0.4.0-py3.7-linux-x86_64.egg/warp_rnnt/__init__.py", line 74, in rnnt_loss
    index[:, :, :U-1, 1] = labels.unsqueeze(dim=1)
RuntimeError: The expanded size of the tensor (61) must match the existing size (62) at non-singleton dimension 2.  Target sizes: [4, 355, 61].  Tensor sizes: [4, 1, 62]

The Loss function:

def rnnt_loss(log_probs: torch.FloatTensor,
              labels: torch.IntTensor,
              frames_lengths: torch.IntTensor,
              labels_lengths: torch.IntTensor,
              average_frames: bool = False,
              reduction: Optional[AnyStr] = None,
              blank: int = 0,
              gather: bool = False) -> torch.Tensor:

    r"""
    The CUDA-Warp RNN-Transducer loss.

    Args:
        log_probs (torch.FloatTensor): Input tensor with shape (N, T, U, V)
            where N is the minibatch size, T is the maximum number of
            input frames, U is the maximum number of output labels and V is
            the vocabulary of labels (including the blank).
        labels (torch.IntTensor): Tensor with shape (N, U-1) representing the
            reference labels for all samples in the minibatch.
        frames_lengths (torch.IntTensor): Tensor with shape (N,) representing the
            number of frames for each sample in the minibatch.
        labels_lengths (torch.IntTensor): Tensor with shape (N,) representing the
            length of the transcription for each sample in the minibatch.
        average_frames (bool, optional): Specifies whether the loss of each
            sample should be divided by its number of frames.
            Default: False.
        reduction (string, optional): Specifies the type of reduction.
            Default: None.
        blank (int, optional): label used to represent the blank symbol.
            Default: 0.
        gather (bool, optional): Reduce memory consumption.
            Default: False.
    """

The shapes of log_probs and labels produced by the model are inconsistent with what the function requires.
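A shape-only sketch of a consistent call (dummy tensors; assumes warp_rnnt 0.4.0 and a CUDA device). For labels of length L, the joint output's U dimension must be L + 1, which is exactly what the traceback's [4, 355, 61] vs [4, 1, 62] mismatch violates:

import torch
from warp_rnnt import rnnt_loss

N, T, V = 4, 355, 5000                                                     # batch, frames, vocab (incl. blank)
labels = torch.randint(1, V, (N, 62), dtype=torch.int, device="cuda")      # (N, U-1), so U = 63
log_probs = torch.randn(N, T, labels.size(1) + 1, V, device="cuda").log_softmax(dim=-1)  # (N, T, U, V)
frames_lengths = torch.full((N,), T, dtype=torch.int, device="cuda")
labels_lengths = torch.full((N,), labels.size(1), dtype=torch.int, device="cuda")
loss = rnnt_loss(log_probs, labels, frames_lengths, labels_lengths, reduction="mean")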

Document update

🚀 Feature request

Document update

Motivation

Version 0.2 released.

Your contribution

Encoding error when parsing manifest file

Information

self.encoding = 'utf-8' if self.configs.tokenizer.unit == 'kspon_subword' else 'cp949'


def _parse_manifest_file(self):
    r"""
    Parsing manifest file.

    Returns:
        audio_paths (list): list of audio paths
        transcripts (list): list of transcripts of the audio
    """
    audio_paths = list()
    transcripts = list()

    with open(self.configs.dataset.manifest_file_path, encoding=self.encoding) as f:
        for idx, line in enumerate(f.readlines()):
            audio_path, korean_transcript, transcript = line.split('\t')
            transcript = transcript.replace('\n', '')

            audio_paths.append(audio_path)
            transcripts.append(transcript)

    return audio_paths, transcripts

Expected behavior

self.encoding = 'utf-8' if self.configs.tokenizer.unit == 'kspon_subword' else 'cp949'

--> self.encoding = 'cp949' if self.configs.tokenizer.unit == 'kspon_grapheme' else 'utf-8'

Version 0.3.0

🚀 Version 0.3.0

  • Vocabulary => Tokenizer class
  • Re-documentation
  • Re-factoring models directory

Motivation

To provide the modules used by Openspeech in the form of a library.

Your contribution

Development, design architecture

Downloading the preprocessed manifest file for the kspon data

First of all, thank you for sharing this great source code!

I have requested access permission to download the preprocessed manifest file for the kspon data,

and I have a few questions:

  1. Is there preprocessing code that generates the manifest?

  2. The manifest format is similar to the [transcripts.txt] file generated by the previous repo, kospeech; can the manifest be replaced with transcripts.txt?

  3. Training requires the test_manifest_dir and test_dataset_path paths; does that mean separate train/test datasets and train/test manifest text files are each needed? If not, I'm wondering whether code to split train/test is built in!

I would appreciate it if you could approve the access request :)
