
Emotion Recognition and Intent Detection through Speech

This project recognises emotions from speech. It also detects intent and extracts entities from the transcribed message. It uses NeMo as its base for speech recognition and Snips NLU for intent detection.

NeMo

NeMo (Neural Modules) is a toolkit for creating AI applications using neural modules - conceptual blocks of neural networks that take typed inputs and produce typed outputs. Such modules typically represent data layers, encoders, decoders, language models, loss functions, or methods of combining activations.

NeMo consists of:

  • NeMo Core: fundamental building blocks for all neural models and the type system.
  • NeMo collections: pre-built neural modules for particular domains such as automatic speech recognition (nemo_asr), natural language processing (nemo_nlp) and speech synthesis (nemo_tts).
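To make the idea of typed module composition concrete, here is a tiny Python analogue (illustrative only, not the actual NeMo API): each module consumes and produces typed objects, so modules only connect when their types line up.

from dataclasses import dataclass
from typing import List

@dataclass
class AudioBatch:          # typed output of a data layer
    signal: List[float]
    length: int

@dataclass
class Encoded:             # typed output of an encoder
    features: List[float]

class Encoder:
    def __call__(self, batch: AudioBatch) -> Encoded:
        # a real encoder would run a neural network here
        return Encoded(features=batch.signal)

class Decoder:
    def __call__(self, enc: Encoded) -> str:
        # a real decoder would turn features into a transcript
        return "placeholder transcript"

# modules chain because the output type of one matches the input type of the next
print(Decoder()(Encoder()(AudioBatch(signal=[0.1, 0.2], length=2))))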

Introduction

See this video for a quick walk-through.

Requirements

  1. Python 3.6 or 3.7
  2. PyTorch 1.4.* with GPU support
  3. (optional for best performance) NVIDIA APEX. Install from here: https://github.com/NVIDIA/apex
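A quick way to confirm your environment matches these requirements, using standard Python and PyTorch calls:

import sys
import torch

print(sys.version.split()[0])     # expect 3.6.x or 3.7.x
print(torch.__version__)          # expect a 1.4.* build
print(torch.cuda.is_available())  # True means PyTorch can see a GPU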

Getting started

The latest stable version of NeMo is 0.9.0, which is available via pip.

NVIDIA's NGC PyTorch container is recommended as it already includes all the requirements above.

  • Pull the Docker image: docker pull nvcr.io/nvidia/pytorch:19.11-py3
  • Run: docker run --runtime=nvidia -it -v <nemo_github_folder_path>:/NeMo --shm-size=8g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/pytorch:19.11-py3
pip install nemo-toolkit  # installs NeMo Core
pip install nemo-asr # installs NeMo ASR collection
pip install nemo-nlp # installs NeMo NLP collection
pip install nemo-tts # installs NeMo TTS collection
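A simple smoke test after installation is to import each installed package (import names assumed to match the 0.9.x pip packages above; omit any collection you did not install):

import nemo       # NeMo Core
import nemo_asr   # ASR collection
import nemo_nlp   # NLP collection
import nemo_tts   # TTS collection

print("NeMo imports OK")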

Documentation

NeMo documentation

See examples/start_here for the simplest example. The examples folder contains several more examples covering various NLP and ASR tasks.

Snips NLU

Snips NLU (Natural Language Understanding) is a Python library that allows you to extract structured information from sentences written in natural language. The NLU engine first detects the user's intention (the intent), then extracts the parameters (called slots) of the query. The developer can then use these to determine the appropriate action or response.

pip install snips-nlu  # installs snips-nlu
python -m snips_nlu download en  # installs language resource for snips
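With the language resources in place, training and querying an engine takes only a few lines. A minimal sketch, where dataset.json stands in for your own Snips-format training dataset:

import io
import json

from snips_nlu import SnipsNLUEngine
from snips_nlu.default_configs import CONFIG_EN

with io.open("dataset.json") as f:    # your Snips training dataset
    dataset = json.load(f)

engine = SnipsNLUEngine(config=CONFIG_EN)
engine.fit(dataset)                   # trains the intent and slot models

parsing = engine.parse("Turn the lights on in the kitchen")
print(json.dumps(parsing, indent=2))  # result contains the intent and its slots

# optionally persist the fitted engine for reuse
engine.persist("trained_engine")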

Running Application

  • Using a pretrained model:
  1. From the <nemo_git_root>/examples/applications/asr_service folder, run export FLASK_APP=asr_service.py and start the service: flask run --host=0.0.0.0 --port=6006
  2. Open recognize.html in any browser and upload a .wav file.

Note: the service will only work correctly with single-channel 16 kHz .wav files.
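If your recordings are not already single-channel 16 kHz, convert them first. A sketch using the pysox wrapper (the same sox tooling installed for the data script below):

import sox

# resample to 16 kHz and downmix to mono so the service accepts the file
tfm = sox.Transformer()
tfm.rate(16000)
tfm.channels(1)
tfm.build("input.wav", "input_16k_mono.wav")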

  • Training your own model on LibriSpeech data:
  1. Get data

This script downloads LibriSpeech and converts it into the format expected by nemo_asr:

# note that this script requires sox to be installed
# to install sox on Ubuntu, simply do: sudo apt-get install sox
# and then: pip install sox
# get_librispeech_data.py script is located under <nemo_git_repo_root>/scripts
python get_librispeech_data.py --data_root=data --data_set=dev_clean,train_clean_100
# To get all LibriSpeech data, do:
# python get_librispeech_data.py --data_root=data --data_set=ALL

After download and conversion, your data folder should contain two JSON manifest files:

  • dev_clean.json
  • train_clean_100.json
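Each of these is a JSON-lines manifest in the format nemo_asr expects: one JSON object per utterance, giving the audio path, its duration in seconds, and the reference transcript. An illustrative entry (paths and values here are made up):

{"audio_filepath": "data/dev_clean/1272-0001.wav", "duration": 3.2, "text": "example transcript"}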
  2. Move to the <nemo_git_root>/examples/asr/notebooks folder and run the ASR using Librispeech dataset.ipynb notebook.
  3. Follow steps 1 and 2 of the pretrained-model instructions above to see the application in action.
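To connect the two halves of the project, a client can post audio to the running ASR service and feed the transcript to the Snips engine persisted earlier. This is a sketch only: the /transcribe route and the "text" response field are hypothetical, so check asr_service.py for the real endpoint and JSON shape.

import requests
from snips_nlu import SnipsNLUEngine

# hypothetical route and field names -- inspect asr_service.py for the real ones
with open("input_16k_mono.wav", "rb") as f:
    resp = requests.post("http://localhost:6006/transcribe", files={"file": f})
transcript = resp.json()["text"]

# reuse the engine persisted in the Snips NLU section above
engine = SnipsNLUEngine.from_path("trained_engine")
print(engine.parse(transcript))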

Author

MiKueen

Citation

@misc{nemo2019,
    title={NeMo: a toolkit for building AI applications using Neural Modules},
    author={Oleksii Kuchaiev and Jason Li and Huyen Nguyen and Oleksii Hrinchuk and Ryan Leary and Boris Ginsburg and Samuel Kriman and Stanislav Beliaev and Vitaly Lavrukhin and Jack Cook and Patrice Castonguay and Mariya Popova and Jocelyn Huang and Jonathan M. Cohen},
    year={2019},
    eprint={1909.09577},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
