
Awesome Speaker Diarization

Overview

This is a curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.

To add items to this page, simply send a pull request.

Publications

2018

2017

2015

2014

2013

2011

2010

2008

2006

Software

Framework

| Link | Language | Description |
| --- | --- | --- |
| SIDEKIT for diarization (s4d) | Python | An open-source extension of SIDEKIT for speaker diarization. |
| pyAudioAnalysis | Python | Python audio analysis library: feature extraction, classification, segmentation, and applications. |
| AaltoASR | Python & Perl | Speaker diarization scripts based on AaltoASR. |
| LIUM_SpkDiarization | Java | Software dedicated to speaker diarization (i.e. speaker segmentation and clustering). Written in Java; includes the most recent developments in the domain as of 2013. |
| kaldi-asr | Bash | Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation. |
| Alize LIA_SpkSeg | C++ | ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg is its set of tools for speaker diarization. |
| pyannote-audio | Python | Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding. |
| pyBK | Python | Speaker diarization using binary key speaker modelling. A computationally light solution that does not require external training data. |

Evaluation

| Link | Language | Description |
| --- | --- | --- |
| pyannote-metrics | Python | A toolkit for reproducible evaluation, diagnostics, and error analysis of speaker diarization systems. |
| SimpleDER | Python | A lightweight library to compute Diarization Error Rate (DER). |
| modified NIST md-eval.pl | Perl | From Mary Tai Knox |
| NIST md-eval-v21.pl | Perl | From jitendra |
| NIST md-eval-22.pl | Perl | From nryant |
| dscore | Python & Perl | Diarization scoring tools. |
| Sequence Match Accuracy | Python | Computes the matching accuracy of two label sequences using the Hungarian algorithm. |
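
As a rough illustration of what a sequence-matching metric like the last entry above measures, the sketch below scores two label sequences under the best one-to-one mapping between their label sets. It is not the code of the listed project: the function name `sequence_match_accuracy` is hypothetical, and for brevity it searches all label permutations by brute force in place of the Hungarian algorithm, so it is only practical for a handful of distinct speakers.

```python
from itertools import permutations


def sequence_match_accuracy(reference, hypothesis):
    """Accuracy of `hypothesis` against `reference` under the best
    one-to-one mapping between the two label sets (hypothetical helper)."""
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")

    # Distinct labels, in order of first appearance.
    ref_labels = list(dict.fromkeys(reference))
    hyp_labels = list(dict.fromkeys(hypothesis))

    # counts[(r, h)] = number of positions labelled r in the reference
    # and h in the hypothesis.
    counts = {}
    for r, h in zip(reference, hypothesis):
        counts[(r, h)] = counts.get((r, h), 0) + 1

    # Pad the smaller label set with a sentinel so that every permutation
    # describes a complete one-to-one mapping; sentinel pairs match nothing.
    unmatched = object()
    size = max(len(ref_labels), len(hyp_labels))
    ref_padded = ref_labels + [unmatched] * (size - len(ref_labels))
    hyp_padded = hyp_labels + [unmatched] * (size - len(hyp_labels))

    # Brute-force search over mappings (assumption: few distinct labels).
    best = 0
    for perm in permutations(ref_padded):
        matched = sum(counts.get((r, h), 0) for r, h in zip(perm, hyp_padded))
        best = max(best, matched)
    return best / len(reference)


if __name__ == "__main__":
    ref = ["A", "A", "B", "B", "C"]
    hyp = [1, 1, 2, 2, 2]
    print(sequence_match_accuracy(ref, hyp))
```

The label values themselves do not matter, only how consistently they co-occur. For realistic numbers of speakers, an assignment solver such as the Hungarian algorithm avoids the factorial cost of the permutation search.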

Clustering

| Link | Language | Description |
| --- | --- | --- |
| uis-rnn | Python & PyTorch | Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization. |
| SpectralCluster | Python | Spectral clustering with affinity matrix refinement operations. |
| sklearn.cluster | Python | scikit-learn clustering algorithms. |
| PLDA | Python | Probabilistic Linear Discriminant Analysis & classification, written in Python. |
| PLDA | C++ | Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis). |

Speaker embedding

| Link | Method | Language | Description |
| --- | --- | --- | --- |
| Speaker_Verification | d-vector | Python & TensorFlow | TensorFlow implementation of generalized end-to-end loss for speaker verification. |
| PyTorch_Speaker_Verification | d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration. |
| x-vector-kaldi-tf | x-vector | Python & TensorFlow & Perl | TensorFlow implementation of the x-vector topology on top of a Kaldi recipe. |
| kaldi-ivector | i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure. |
| voxceleb-ivector | i-vector | Perl | VoxCeleb1 i-vector based speaker recognition system. |

Other

| Link | Language | Description |
| --- | --- | --- |
| VB Diarization | Python | VB Diarization with Eigenvoice and HMM Priors. |

Datasets

| Audio | Diarization ground truth | Language | Pricing | Additional information |
| --- | --- | --- | --- | --- |
| 2000 NIST Speaker Recognition Evaluation | Disk-6 (Switchboard), Disk-8 (CALLHOME) | Multiple | $2400.00 | Evaluation Plan |
| 2003 NIST Rich Transcription Evaluation Data | Included with the audio | en, ar, zh | $2000.00 | Telephone speech, broadcast news |
| CALLHOME American English Speech | CALLHOME American English Transcripts | en | $1500.00 + $1000.00 | CH109 whitelist |
| The ICSI Meeting Corpus | Included with the audio | en | Free | License |
| The AMI Meeting Corpus | Included with the audio (needs to be processed) | Multiple | Free | License |
| Fisher English Training Speech Part 1 Speech | Fisher English Training Speech Part 1 Transcripts | en | $7000.00 + $1000.00 | |
| Fisher English Training Part 2, Speech | Fisher English Training Part 2, Transcripts | en | $7000.00 + $1000.00 | |

Leaderboards

Other learning materials

Tech blog

Video tutorials

Products

| Company | Product |
| --- | --- |
| Google | Google Cloud Speech-to-Text API |
| Amazon | Amazon Transcribe |
| IBM | Watson Speech To Text API |
| DeepAffects | Speaker Diarization API |

awesome-diarization's People

Contributors

hbredin, hedonistrh, josepatino, wq2012