Giter VIP home page Giter VIP logo

pyannote-audio's Introduction

Using pyannote.audio open-source toolkit in production? Make the most of it thanks to our consulting services.

pyannote.audio speaker diarization toolkit

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines, that can be further finetuned to your own data for even better performance.

TL;DR

  1. Install pyannote.audio with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Accept pyannote/speaker-diarization-3.1 user conditions
  4. Create access token at hf.co/settings/tokens.
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available)
import torch
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...

Highlights

Documentation

Benchmark

Out of the box, pyannote.audio speaker diarization pipeline v3.1 is expected to be much better (and faster) than v2.x. Those numbers are diarization error rates (in %):

Benchmark v2.1 v3.1 Premium
AISHELL-4 14.1 12.2 11.9
AliMeeting (channel 1) 27.4 24.4 22.5
AMI (IHM) 18.9 18.8 16.6
AMI (SDM) 27.1 22.4 20.9
AVA-AVD 66.3 50.0 39.8
CALLHOME (part 2) 31.6 28.4 22.2
DIHARD 3 (full) 26.9 21.7 17.2
Earnings21 17.0 9.4 9.0
Ego4D (dev.) 61.5 51.2 43.8
MSDWild 32.8 25.3 19.8
RAMC 22.5 22.2 18.4
REPERE (phase2) 8.2 7.8 7.6
VoxConverse (v0.3) 11.2 11.3 9.4

Diarization error rate (in %)

Citations

If you use pyannote.audio please use the following citations:

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Development

The commands below will setup pre-commit hooks and packages needed for developing the pyannote.audio library.

pip install -e .[dev,testing]
pre-commit install

Test

pytest

pyannote-audio's People

Contributors

hbredin avatar kamilakesbi avatar mogwai avatar frenchkrab avatar juanmc2005 avatar clement-pages avatar pkorshunov avatar yinruiqing avatar wesbz avatar flyingleafe avatar kan-cloud avatar clbarras avatar marvinlvn avatar mymoza avatar j-petiot avatar martinjbaker avatar hadware avatar dependabot[bot] avatar paullerner avatar julien-c avatar greggovit avatar entn-at avatar bghorvath avatar olvb avatar paulgb avatar philschmid avatar prashanthellina avatar wq2012 avatar rhenanbartels avatar sevkod avatar

Forkers

sanchit-gandhi

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.