Speaker Change Detection

Implementation of the paper: https://arxiv.org/abs/1702.02285

The mechanism proposed here targets real-time speaker change detection in conversations. It first trains a text-independent neural network speaker classifier on in-domain speaker data.

The paper reports very high accuracy, close to 100%.
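This README does not describe the detection step itself. As a rough illustration only, and not the repository's or the paper's actual method, the sketch below assumes the classifier emits one speaker-probability vector per audio window and flags a change when the cosine distance between adjacent windows exceeds a threshold. The names `cosine_distance` and `detect_changes` and the threshold value are hypothetical; the repository trains a distance classifier (`3_train_distance_classifier.py`) rather than using a fixed threshold.

```python
import math


def cosine_distance(p, q):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def detect_changes(posteriors, threshold=0.5):
    """Return every index i where window i differs from window i - 1.

    `posteriors` is a list of per-window speaker-probability vectors
    (an assumption about the classifier output, not taken from the repo).
    `threshold` is a hypothetical value chosen for illustration.
    """
    return [
        i
        for i in range(1, len(posteriors))
        if cosine_distance(posteriors[i - 1], posteriors[i]) > threshold
    ]


# Two windows dominated by speaker 0, then two dominated by speaker 1.
windows = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
change_points = detect_changes(windows)
```

With these toy vectors the only large jump is between the second and third windows, so `change_points` contains the single index 2.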

Get Started

Because it takes a very long time to generate cache and inputs, I packaged them and uploaded them here:

Once downloaded, you should have these files:

  • /tmp/speaker-change-detection-data.pkl
  • /tmp/speaker-change-detection-norm.pkl
  • /tmp/speaker-change-detection/*.pkl
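Before starting the training scripts, it may help to confirm that these files are in place. The following is a minimal sketch that only checks the three paths listed above; `find_missing_cache_files` and its `base_dir` parameter are hypothetical helpers, not part of the repository.

```python
import glob
import os


def find_missing_cache_files(base_dir="/tmp"):
    """Return the expected cache paths that are missing under base_dir.

    base_dir is a hypothetical parameter so the check still works if the
    files were moved out of /tmp/.
    """
    missing = []
    for name in ("speaker-change-detection-data.pkl",
                 "speaker-change-detection-norm.pkl"):
        path = os.path.join(base_dir, name)
        if not os.path.isfile(path):
            missing.append(path)
    # The cache directory must contain at least one .pkl file.
    pattern = os.path.join(base_dir, "speaker-change-detection", "*.pkl")
    if not glob.glob(pattern):
        missing.append(pattern)
    return missing


if __name__ == "__main__":
    for path in find_missing_cache_files():
        print("missing:", path)
```

An empty list means all three locations were found. The sketch does not validate the contents of the pickle files.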

The final plots are generated as /tmp/distance_test_ID.png where ID is the id of the plot.

Make sure you have enough free space in /tmp/, since you might run out of disk space there. If that happens, change every /tmp/ reference in the codebase to a folder of your choice.
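One way to check the available space up front is Python's standard `shutil.disk_usage`. This is a small sketch; the helper names and the `required_gib` value are hypothetical, since this README does not state how much space the cache needs.

```python
import shutil


def free_gib(path="/tmp"):
    """Return the free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_enough_space(path, required_gib):
    """True if at least `required_gib` GiB are free at `path`.

    `required_gib` is a placeholder: pick a value that matches the size
    of the files you downloaded or plan to generate.
    """
    return free_gib(path) >= required_gib
```

For example, `has_enough_space("/tmp", 10.0)` asks whether 10 GiB are free, where 10 is an arbitrary illustrative figure.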

Now run these commands to reproduce the results:

git clone git@github.com:philipperemy/speaker-change-detection.git
cd speaker-change-detection
virtualenv -p python3.6 venv # will probably work with any Python 3 version
source venv/bin/activate
pip install -r requirements.txt
# download the cache and all the files specified above (you can re-generate them yourself if you wish).
cd ml/
export PYTHONPATH=..:$PYTHONPATH; python 1_generate_inputs.py
export PYTHONPATH=..:$PYTHONPATH; python 2_train_classifier.py
export PYTHONPATH=..:$PYTHONPATH; python 3_train_distance_classifier.py
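The three training steps can also be driven from a single Python script. The sketch below mirrors the shell commands above: it runs each script from `ml/` with the repository root prepended to `PYTHONPATH`. The helper names `build_env` and `run_pipeline` are hypothetical and not part of the repository.

```python
import os
import subprocess
import sys

# Script names taken from the commands above, in the order they are run.
STEPS = [
    "1_generate_inputs.py",
    "2_train_classifier.py",
    "3_train_distance_classifier.py",
]


def build_env(repo_root, base_env=None):
    """Copy the environment and prepend repo_root to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = repo_root + os.pathsep + existing
    else:
        env["PYTHONPATH"] = repo_root
    return env


def run_pipeline(repo_root):
    """Run each step inside <repo_root>/ml and stop at the first failure."""
    ml_dir = os.path.join(repo_root, "ml")
    env = build_env(os.path.abspath(repo_root))
    for script in STEPS:
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run([sys.executable, script], cwd=ml_dir, env=env, check=True)
```

Calling `run_pipeline(".")` from the repository root, inside the activated virtualenv, would run the same three steps in order.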

To regenerate only the VCTK cache, run:

cd audio/
export PYTHONPATH=..:$PYTHONPATH; python generate_all_cache.py

Contributions

Contributions are welcome! Some ways to improve this project:

  • Add a way to test any audio file and detect speaker changes in it.

Questions

  • Given any audio file, is it possible to test it and detect any speaker change? Yes, as long as it follows the same structure as the VCTK Corpus dataset.

  • Is there any way to test the trained model to detect speaker changes in our own audio files? Yes, it is possible, but it will be a bit difficult. You would probably have to choose a dataset and convert it to the VCTK format.
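As a starting point for such a conversion, the sketch below copies WAV files into a VCTK-style directory tree. It assumes the layout `wav48/<speaker_id>/<speaker_id>_<NNN>.wav` used by the VCTK Corpus release; whether this repository's loaders expect exactly that layout, or also need transcripts or a specific sample rate, is not stated here and should be checked against the code in `audio/`. Both function names are hypothetical.

```python
import os
import shutil


def vctk_style_path(root, speaker_id, utterance_index):
    """Build a VCTK-style path: <root>/wav48/<speaker>/<speaker>_<NNN>.wav."""
    filename = "{}_{:03d}.wav".format(speaker_id, utterance_index)
    return os.path.join(root, "wav48", speaker_id, filename)


def copy_into_vctk_layout(files_by_speaker, root):
    """Copy each speaker's WAV files into the assumed VCTK-style tree.

    `files_by_speaker` maps a speaker id (e.g. "p900", a made-up id) to a
    list of source WAV paths. Only file naming is handled; the audio is
    copied unchanged, with no resampling or format conversion.
    """
    written = []
    for speaker_id, wav_paths in sorted(files_by_speaker.items()):
        for index, src in enumerate(wav_paths, start=1):
            dst = vctk_style_path(root, speaker_id, index)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copyfile(src, dst)
            written.append(dst)
    return written
```

Utterances are numbered from 001 in the order given for each speaker.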
