
SpecAugment

Implementation of SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Notes

  • The paper introduces three techniques for augmenting speech data in speech recognition.
  • They stem from the observation that spectrograms, which are often used as input, can be treated as images, so various image augmentation methods can be applied to them.
  • I find the idea interesting.
  • It covers three methods: time warping, frequency masking, and time masking.
  • Details are clearly explained in the paper.
  • While the first one, time warping, looks the most striking, Daniel, the first author, told me that the other two are actually much more important, so time warping can be dropped if necessary. (Thanks for the advice, Daniel!)
  • I found that implementing time warping in TensorFlow is tricky because the relevant functions rely on the static shape of the mel-spectrogram tensor, which is hard to obtain from the pre-defined graph.
  • I tested frequency / time masking on Tensor2tensor's LibriSpeech Clean Small task.
  • The paper used the LAS model, but I stick to the Transformer.
  • To measure the effect of SpecAugment, I also trained a base model without augmentation.
  • With 4 GPUs, training for 500K steps takes more than a week.
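The two masking techniques the notes call most important can be sketched in a few lines. This is a minimal NumPy illustration of the paper's policy (mask width drawn uniformly, one mask each), not the repo's actual Tensor2tensor implementation; the function names and the default `F`/`T` values are illustrative.

```python
import numpy as np

def freq_mask(spec, F=27, rng=None):
    """Zero out one random band of consecutive mel channels.

    spec: array of shape (time, mel). F: maximum mask width.
    """
    if rng is None:
        rng = np.random.default_rng()
    num_mel = spec.shape[1]
    f = int(rng.integers(0, F + 1))             # mask width f ~ U[0, F]
    f0 = int(rng.integers(0, num_mel - f + 1))  # mask start
    out = spec.copy()
    out[:, f0:f0 + f] = 0.0
    return out

def time_mask(spec, T=100, rng=None):
    """Zero out one random band of consecutive time steps."""
    if rng is None:
        rng = np.random.default_rng()
    num_steps = spec.shape[0]
    t = int(rng.integers(0, min(T, num_steps) + 1))
    t0 = int(rng.integers(0, num_steps - t + 1))
    out = spec.copy()
    out[t0:t0 + t, :] = 0.0
    return out

# Example: mask a dummy (time=200, mel=80) spectrogram.
spec = np.ones((200, 80))
masked = time_mask(freq_mask(spec))
```

In the actual training pipeline this would be applied on the fly to each example; since masking only zeroes entries, it never increases the total energy of the spectrogram.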

Requirements

  • TensorFlow==1.12.0
  • tensor2tensor==1.12.0

Script

echo "No specAugment"
# Set Paths
MODEL=transformer
HPARAMS=transformer_librispeech_v1

PROBLEM=librispeech_clean_small
DATA_DIR=data/no_spec
TMP_DIR=tmp
TRAIN_DIR=train/$PROBLEM

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=500000 \
  --eval_steps=3 \
  --local_eval_frequency=5000 \
  --worker_gpu=4

echo "specAugment"
# Set Paths
PROBLEM=librispeech_specaugment
DATA_DIR=data/spec
TMP_DIR=tmp
TRAIN_DIR=train/$PROBLEM
USER_DIR=USER_DIR

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train
t2t-trainer \
  --t2t_usr_dir=$USER_DIR \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=500000 \
  --eval_steps=3 \
  --local_eval_frequency=5000 \
  --worker_gpu=4

Results

Training loss

  • Augmentation appears to hurt training loss, which is understandable and expected: the model is trained on distorted inputs.

Word Error Rate (SpecAugment (top) vs. No augmentation (bottom))

  • The base model's curve looks noisy. Its WER hovers around 26%, which is poor.
  • The SpecAugment model's curve looks much cleaner. Its WER reached 20% after 500K steps of training, though I don't think that is good enough yet.

specaugment's People

Contributors

kyubyong


specaugment's Issues

After mel_banks computation, an error occurs in the time warp computation

Hi:
Thanks for contributing the SpecAugment code. After merging it into my own t2t code, the time warping part raised an error. After debugging, I found that the bug is in sparse_image_warp.py, in the function _get_grid_locations(image_height, image_width): the parameter image_height is NoneType rather than int32.
Have you met this problem?
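This is the static-shape problem mentioned in the notes above: in TF 1.x graph mode, a batched spectrogram's time dimension is often None at graph-construction time. One common workaround (a sketch, not the repo's fix) is to pad or crop every mel spectrogram to a fixed number of frames before warping, so the static shape is fully defined. Shown here with NumPy for clarity; the function name `fix_num_frames` and the frame count are illustrative.

```python
import numpy as np

def fix_num_frames(spec, num_frames=1000):
    """Pad with zeros, or crop, so spec always has `num_frames` time steps.

    spec: (time, mel) array. With a constant time dimension, graph-mode
    code that relies on static shapes (like _get_grid_locations in
    sparse_image_warp.py) no longer sees None.
    """
    t = spec.shape[0]
    if t >= num_frames:
        return spec[:num_frames]          # crop long utterances
    pad = np.zeros((num_frames - t, spec.shape[1]), dtype=spec.dtype)
    return np.concatenate([spec, pad], axis=0)  # zero-pad short ones
```

In a TF 1.x graph the same idea would be expressed with `tf.pad` followed by `tensor.set_shape(...)` so downstream ops see a concrete height.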

Can we finetune the pretrained model on specaugmented data?

I have trained a model whose performance was good. Can I finetune this pretrained model on SpecAugmented data to get better performance?

How did you get your results? Did you train your model on the original data and the SpecAugmented data respectively and compare the results?
