tensorflow_end2end_speech_recognition's Introduction

TensorFlow Implementation of End-to-End Speech Recognition

Requirements

  • TensorFlow >= 1.3.0
  • tqdm >= 4.14.0
  • python-Levenshtein >= 0.12.0
  • setproctitle >= 1.1.10
  • seaborn >= 0.7.1

Corpus

TIMIT

  • Phone (39, 48, 61 phones)
  • Character

LibriSpeech

  • Phone (under implementation)
  • Character
  • Word

CSJ

  • Phone (under implementation)
  • Japanese kana characters (about 150 classes)
  • Japanese kanji characters (about 3000 classes)

The following corpora will be added in the future.

  • Switchboard
  • WSJ
  • AMI

This repository doesn't include pre-processing code; pre-processing is based on a separate repo. If you want to run pre-processing, please refer to that repo.

Model

Encoder

  • BLSTM
  • LSTM
  • BGRU
  • GRU
  • VGG-BLSTM
  • VGG-LSTM
  • Multi-task BLSTM
    • You can attach an additional CTC layer to an arbitrary layer.
  • Multi-task LSTM
  • VGG

Connectionist Temporal Classification (CTC) [Graves+ 2006]

  • Greedy decoder
  • Beam Search decoder
  • Beam Search decoder w/ CharLM (under implementation)
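
As an illustration of what the greedy decoder does, here is a minimal pure-Python sketch of best-path CTC decoding: take the top class at each frame, collapse consecutive repeats, then drop blanks. It is not this repository's code. The function name `ctc_greedy_decode` is hypothetical, and the blank label is assumed to be the last class index, as in TensorFlow's CTC ops.

```python
def ctc_greedy_decode(frame_scores, blank=None):
    """Best-path CTC decoding for a single utterance.

    frame_scores: per-frame class scores, shape [time][num_classes].
    blank: index of the blank label (assumed: last class if not given).
    """
    if not frame_scores:
        return []
    if blank is None:
        blank = len(frame_scores[0]) - 1

    # 1) Pick the highest-scoring class at every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    # 2) Collapse consecutive repeats and 3) drop blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

A blank between two identical labels keeps both of them, so the frame-level path `0 0 <blank> 0 1 1 <blank>` decodes to `0 0 1`.
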
Options
  • Frame-stacking [Sak+ 2015]
  • Multi-GPUs training (synchronous)
  • Splicing
  • Down sampling (under implementation)
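
The frame-stacking and splicing options can be sketched as follows. This is a rough illustration rather than the repository's implementation: the function names, the default values (`num_stack=3`, `num_skip=3`, `context=1`), and the edge-padding strategy (repeating the boundary frame) are all assumptions. Frame stacking concatenates consecutive frames and advances by a stride, which lowers the frame rate; splicing concatenates each frame with its neighbors and keeps the frame rate unchanged.

```python
def stack_frames(frames, num_stack=3, num_skip=3):
    """Concatenate `num_stack` consecutive frames, advancing by `num_skip`.

    An incomplete final group is padded by repeating the last frame
    (an assumption made for this sketch).
    """
    stacked = []
    for start in range(0, len(frames), num_skip):
        group = frames[start:start + num_stack]
        while len(group) < num_stack:
            group = group + [frames[-1]]
        stacked.append([x for frame in group for x in frame])
    return stacked


def splice_frames(frames, context=1):
    """Concatenate each frame with `context` frames on either side.

    Edges are padded by repeating the first or last frame
    (an assumption made for this sketch).
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = [frames[min(max(t + d, 0), last)]
                  for d in range(-context, context + 1)]
        spliced.append([x for frame in window for x in frame])
    return spliced
```

With four one-dimensional frames `[[1], [2], [3], [4]]`, `stack_frames` yields two stacked frames (`[1, 2, 3]` and `[4, 4, 4]`), while `splice_frames` still yields four frames, each three times as wide.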

Attention Mechanism

Decoder
  • Greedy decoder
  • Beam search decoder (under implementation)
Attention type
  • Bahdanau's content-based attention
  • Bahdanau's normed content-based attention (under implementation)
  • Location-based attention
  • Hybrid attention
  • Luong's dot attention
  • Luong's scaled dot attention (under implementation)
  • Luong's general attention
  • Luong's concat attention
  • Baidu's attention (under implementation)
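
To make the difference between some of these attention types concrete, here is a small pure-Python sketch of three score functions: Luong's dot, Luong's general, and Bahdanau's content-based (additive) attention. It is not taken from this repository; all function and parameter names (`W`, `W_q`, `W_k`, `v`) are hypothetical, and a real model would learn the weights and use tensor operations.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    return [dot(row, v) for row in W]


def luong_dot_score(query, key):
    # score = query . key
    return dot(query, key)


def luong_general_score(query, key, W):
    # score = query^T W key
    return dot(query, matvec(W, key))


def bahdanau_score(query, key, W_q, W_k, v):
    # score = v^T tanh(W_q query + W_k key)
    hidden = [math.tanh(a + b)
              for a, b in zip(matvec(W_q, query), matvec(W_k, key))]
    return dot(v, hidden)


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, score_fn):
    """Return attention weights over `keys` and the weighted context vector."""
    weights = softmax([score_fn(query, k) for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
    return weights, context
```

For example, `attend(decoder_state, encoder_outputs, luong_dot_score)` gives one weight per encoder frame; the other score functions can be plugged in with `functools.partial` or a lambda that binds their weight matrices.
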
Options
  • Sharpening
  • Temperature regularization in the softmax layer (Output posteriors)
  • Joint CTC-Attention [Kim 2016]
  • Coverage (under implementation)
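
Three of the options above can be sketched in a few lines. The code below is an illustration under assumptions, not this repository's implementation: sharpening is taken to mean scaling the attention energies by a factor `beta > 1` before the softmax, temperature is taken to mean dividing the logits by `temperature`, and the joint CTC-attention objective is the usual interpolation `lam * L_ctc + (1 - lam) * L_att`. All names and default values are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature.

    temperature > 1 flattens the distribution; temperature < 1 sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sharpen(energies, beta=2.0):
    """Sharpened attention weights: softmax(beta * energies)."""
    return softmax_with_temperature(energies, temperature=1.0 / beta)


def joint_ctc_attention_loss(ctc_loss, attention_loss, lam=0.1):
    """Interpolate the two losses: lam * L_ctc + (1 - lam) * L_att."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * attention_loss
```

In this sketch, sharpening with `beta` is the same operation as a softmax with temperature `1 / beta`, which is why both options share one helper.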

Usage

Please refer to the docs for each corpus.

  • TIMIT
  • LibriSpeech
  • CSJ

License

MIT

Contact

[email protected]

Contributors

  • hirofumi0810
  • hlthu