WhisperPlus: Advancing Speech-to-Text Processing 🚀

🛠️ Installation

pip install whisperplus

🤗 Model Hub

You can find the models on HuggingFace Spaces or on the HuggingFace Model Hub.

🎙️ Usage

To use the whisperplus library, follow the steps below for different tasks:

🎵 YouTube URL to Audio

from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3

# Define the URL of the YouTube video that you want to convert to text.
url = "https://www.youtube.com/watch?v=di3rHkEZuUw"

# Download the video's audio track and convert it to an MP3 file.
audio_path = download_and_convert_to_mp3(url)

# Initialize the speech-to-text pipeline with the specified model.
pipeline = SpeechToTextPipeline(model_id="openai/whisper-large-v3")

# Run the pipeline on the audio file.
transcript = pipeline(
    audio_path=audio_path, model_id="openai/whisper-large-v3", language="english"
)

# Print the transcript of the audio.
print(transcript)
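
Downloading and transcribing a long video can take a while, so it may be convenient to cache the transcript on disk before reusing it. The following is a minimal sketch, assuming `transcript` is a plain string (the return type is not stated above). `save_transcript` and `load_transcript` are hypothetical helper names, not part of whisperplus.

```python
from pathlib import Path


def save_transcript(transcript: str, path: str) -> Path:
    """Write the transcript to a UTF-8 text file and return its path."""
    out = Path(path)
    # Create missing parent directories so nested paths work.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(transcript, encoding="utf-8")
    return out


def load_transcript(path: str) -> str:
    """Read a previously saved transcript back into a string."""
    return Path(path).read_text(encoding="utf-8")
```

With these helpers, a later session could call `load_transcript("transcript.txt")` instead of rerunning the pipeline.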

Summarization

from whisperplus.pipelines.summarization import TextSummarizationPipeline

# `transcript` is the text produced by the speech-to-text example above.
summarizer = TextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary = summarizer.summarize(transcript)
print(summary[0]["summary_text"])
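
`facebook/bart-large-cnn` accepts at most 1024 input tokens, so a long transcript may need to be split before it is summarized. The following is a rough sketch, assuming `transcript` is a string and using word count as a crude stand-in for token count. `chunk_words` is a hypothetical helper, and whether `TextSummarizationPipeline` already truncates or chunks its input is not stated here.

```python
from typing import List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    # Step through the word list in strides of max_words.
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk could then be passed to `summarizer.summarize(chunk)` in turn and the partial summaries joined. The 400-word default is an assumption chosen to stay well under the token limit, not a value taken from the library.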

Speaker Diarization

from whisperplus import (
    ASRDiarizationPipeline,
    download_and_convert_to_mp3,
    format_speech_to_dialogue,
)

audio_path = download_and_convert_to_mp3("https://www.youtube.com/watch?v=mRB14sFHw2E")

device = "cuda"  # or "cpu" / "mps", depending on the available hardware
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)

output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
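
The example hard-codes `device = "cuda"`, which will not work on a machine without an NVIDIA GPU. The following is a small sketch of choosing the device at runtime, assuming PyTorch is the backend. `pick_device` is a hypothetical helper, not part of whisperplus, and falling back to `"cpu"` when PyTorch cannot be imported is an assumption made for illustration.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch, default to the CPU label.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
```

The resulting `device` string can be passed to `ASRDiarizationPipeline.from_pretrained(..., device=device)` in place of the literal value.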

Contributing

pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files

📜 License

This project is licensed under the terms of the Apache License 2.0.

🤗 Acknowledgments

This project is based on the HuggingFace Transformers library.

🤗 Citation

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Contributors

kadirnar, cobanov, pre-commit-ci[bot]
