E2E-Keyword-Spotting

Joint End to End Approaches to Improving Far-field Wake-up Keyword Detection

🔧 Dependencies and Installation

  1. Install dependent packages

    cd E2E-Keyword-Spotting
    pip install -r requirements.txt
  2. Or use conda

    cd E2E-Keyword-Spotting
    conda env create -f environment.yaml

๐Ÿข Dataset Preparation

How to Use

The dataset is Google Speech Commands, published on arXiv.

  • Data pre-processing (already done)
  1. According to the file, the dataset has already been split into three folders: train, test, and valid.
  2. The split Google Speech Commands dataset is saved in a Google Drive folder.
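
To sanity-check a downloaded copy of the split dataset, a minimal sketch like the following counts clips per split and per keyword. It assumes a layout of `<root>/<split>/<label>/*.wav`, which this README does not confirm; the function name `count_clips` and the example path are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Split folder names as given in this README.
SPLITS = ("train", "valid", "test")


def count_clips(root):
    """Count .wav files per label in each split folder.

    Assumes (not confirmed by the repository) that clips are stored as
    <root>/<split>/<label>/<clip>.wav.
    """
    root = Path(root)
    counts = {}
    for split in SPLITS:
        per_label = Counter()
        split_dir = root / split
        if split_dir.is_dir():
            for wav in sorted(split_dir.glob("*/*.wav")):
                per_label[wav.parent.name] += 1
        counts[split] = per_label
    return counts


if __name__ == "__main__":
    # Hypothetical location of the unpacked Google Drive download.
    for split, per_label in count_clips("data/speech_commands").items():
        print(split, sum(per_label.values()), "clips,", len(per_label), "labels")
```

A missing split folder simply yields an empty counter, so the same call also flags an incomplete download.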

💻 Train and Test

Training commands

  • Single GPU Training:

    python train.py
  • Distributed Training:

    CUDA_VISIBLE_DEVICES=0,1 python train.py
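
`CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable: it restricts which GPUs the process can see, and the visible devices are renumbered from 0 inside the process. How `train.py` distributes work across the visible GPUs is not shown in this README. The helper below is a hypothetical illustration (not part of the repository) of how the variable's value can be inspected from Python.

```python
import os


def visible_gpu_ids(environ=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (CUDA then exposes every GPU)
    and an empty list when it is set to an empty string (no GPU visible).
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [item.strip() for item in raw.split(",") if item.strip()]


# Same setting as the distributed training command above.
print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0,1"}))  # ['0', '1']
```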

Test commands

    python test.py

Neural Network Architectures

General Architecture

  1. Multi-head Attention
     • Encoder: GRU/LSTM
     • Attention heads: 8
     • GRU hidden nodes: 128/256/512
     • GRU layers: 1/2/3
     • Increasing the number of GRU hidden nodes improves performance much more than increasing the number of GRU layers.
  2. VGG19/VGG16/VGG13/VGG11 with/without batch normalization
  3. Deep Residual Neural Network ('resnet18', 'resnet34', 'resnet50')
  4. Wide Residual Networks ('wideresnet28_10') imported from the repository
  5. Dual Path Networks from arXiv
  6. Densely Connected Convolutional Networks from arXiv
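
To make the multi-head attention entry more concrete, here is a minimal pure-Python sketch of multi-head attention pooling over encoder outputs. It is an illustration under stated assumptions, not the code in `kws/models/attention.py`: each head is assumed to own one slice of the feature vector and one query vector, and the function name `multi_head_attention_pool` is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_pool(frames, queries):
    """Pool T encoder frames into one utterance vector.

    frames:  list of T feature vectors of size D (e.g. GRU outputs).
    queries: one query vector of size D // H per head (assumed design).
    Each head scores its slice of every frame with a scaled dot product,
    normalizes the scores over time, and returns a weighted sum.
    """
    num_heads = len(queries)
    dim = len(frames[0])
    if dim % num_heads != 0:
        raise ValueError("feature size must be divisible by the number of heads")
    head_dim = dim // num_heads
    pooled = []
    for h, query in enumerate(queries):
        lo, hi = h * head_dim, (h + 1) * head_dim
        chunks = [frame[lo:hi] for frame in frames]
        scores = [
            sum(q * x for q, x in zip(query, chunk)) / math.sqrt(head_dim)
            for chunk in chunks
        ]
        weights = softmax(scores)
        for d in range(head_dim):
            pooled.append(sum(w * chunk[d] for w, chunk in zip(weights, chunks)))
    return pooled


# 5 frames of 16 features, 8 heads as listed above (2 features per head).
frames = [[float(t + d) for d in range(16)] for t in range(5)]
queries = [[0.0, 0.0] for _ in range(8)]
utterance_vector = multi_head_attention_pool(frames, queries)
print(len(utterance_vector))  # 16
```

With all-zero queries every frame gets the same weight, so the result reduces to the mean over time; trained queries would instead emphasize the frames that carry the keyword.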

Result

Model Parameters

Best Accuracy Training Process

image

Best Loss Training Process

image

Files Description

├── kws
│   ├── metrics
│   │   ├── fnr_fpr.py
│   │   ├── __init__.py
│   ├── models
│   │   ├── attention.py
│   │   ├── crnn.py
│   │   ├── densenet.py
│   │   ├── dpn.py
│   │   ├── __init__.py
│   │   ├── resnet.py
│   │   ├── resnext.py
│   │   ├── treasure_net.py
│   │   ├── vgg.py
│   │   └── wideresnet.py
│   ├── transforms
│   ├── utils.py
├── config.py

  • ./kws/metrics : Evaluation metrics, defining the False Rejection Rate (FRR) and False Alarm Rate (FAR) for keyword spotting
  • ./kws/models : Different network architectures
  • ./config.py : Configuration of parameters and hyperparameters
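
As a rough illustration of the two metrics named above, the sketch below computes FRR and FAR from binary labels and predictions. It assumes FAR is measured per negative example (that is, as a false positive rate, in line with the file name `fnr_fpr.py`) rather than per hour of audio; the function `frr_far` is hypothetical and is not the repository's implementation.

```python
def frr_far(labels, predictions):
    """Return (FRR, FAR) for binary keyword detection.

    labels, predictions: equal-length sequences of 0/1, where 1 means
    "keyword present" (label) or "keyword detected" (prediction).
    FRR = missed keywords / all keyword examples (false negative rate).
    FAR = false detections / all non-keyword examples (false positive rate).
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    false_rejects = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    false_alarms = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    frr = false_rejects / positives if positives else 0.0
    far = false_alarms / negatives if negatives else 0.0
    return frr, far


# Toy data: 4 keyword clips (one missed), 6 non-keyword clips (one false alarm).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
frr, far = frr_far(labels, predictions)
print(frr)  # 0.25
```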
