dub-inference's Introduction

DubMaster

DubMaster is on a mission to help content creators reach a wider audience. Current-generation automatic dubbing services are relatively expensive. Our solution is to leverage open source models to democratize the dubbing of content.

Technical Overview

DubMaster works in several steps to provide automatic dubbing services:

  1. Encoding: The encoding worker is responsible for processing the input video and audio files. It uses ffmpeg to shorten the video and extract the audio. The shortened video and audio are then uploaded to S3 for storage.

  2. Speaker Diarization: The speaker-diarization worker identifies different speakers in the audio file. This information is used later in the voice cloning process.

  3. Transcription and Translation: The audio file is transcribed using OpenAI's API. The transcribed text is then translated into the target language.

  4. Voice Cloning: The voice cloning process uses Eleven Labs' API to clone the voices of the identified speakers. The cloned voices are used to dub the translated text.

  5. Stitching and Combining: The final step is to stitch the dubbed audio segments together and combine them with the video file. The result is a video file with dubbed audio in the target language.
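The first and last steps above both rely on ffmpeg, so a minimal sketch of the two commands may help. This is an illustration only, not the project's actual code: the function names, the 16 kHz mono WAV format, and the stream mapping are all assumptions. The functions only build argument lists and never run ffmpeg themselves.

```python
from pathlib import Path
from typing import List


def extract_audio_cmd(video: Path, audio_out: Path) -> List[str]:
    """Build an ffmpeg command that writes the audio track of `video` as WAV.

    Hypothetical helper: the sample rate and channel count are assumptions.
    """
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", str(video),
        "-vn",                     # drop the video stream
        "-acodec", "pcm_s16le",    # 16-bit PCM
        "-ar", "16000",            # 16 kHz sample rate (assumed)
        "-ac", "1",                # mono (assumed)
        str(audio_out),
    ]


def mux_dubbed_audio_cmd(video: Path, dubbed_audio: Path, out: Path) -> List[str]:
    """Build an ffmpeg command that replaces the audio of `video` with `dubbed_audio`.

    Hypothetical helper: the video stream is copied without re-encoding.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-i", str(dubbed_audio),
        "-map", "0:v:0",           # video from the first input
        "-map", "1:a:0",           # audio from the second input
        "-c:v", "copy",            # keep the original video encoding
        "-shortest",               # stop at the shorter of the two streams
        str(out),
    ]
```

Either list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.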

Prerequisites

Please ensure you have these prerequisites before running DubMaster.

  • Temporal (temporal.io) for background jobs
  • A GPU for speaker diarization
  • OpenAI API key for transcription and translation
  • Eleven Labs API key for voice cloning
  • S3 for object storage
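A quick pre-flight check can catch missing credentials before any worker starts. The sketch below is an assumption about how the prerequisites might be supplied, not a description of what DubMaster actually reads: `OPENAI_API_KEY` and the `AWS_*` names are the defaults used by the OpenAI and AWS SDKs, while `ELEVENLABS_API_KEY` is a hypothetical name chosen for illustration.

```python
import os
import shutil
from typing import List, Mapping

# Assumed variable names; adjust to whatever the workers actually expect.
REQUIRED_ENV_VARS = (
    "OPENAI_API_KEY",         # transcription and translation
    "ELEVENLABS_API_KEY",     # voice cloning (hypothetical name)
    "AWS_ACCESS_KEY_ID",      # S3 object storage
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def ffmpeg_available() -> bool:
    """Report whether an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; an empty list means every assumed variable is set.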

Running Encoding Worker

Follow these steps to run the encoding worker:

  1. Navigate to the encoding worker directory of the project.
  2. Create a virtual environment and activate it:
    python3 -m venv venv
    source venv/bin/activate
    
  3. Install the required dependencies using pip:
    pip install -r requirements.txt
    
  4. Run the worker:
    python run_worker.py
    

Running Speaker-Diarization Worker

Follow these steps to run the speaker-diarization worker on a GPU server:

  1. Navigate to the speaker-diarization worker directory of the project.
  2. Run the worker:
    CUDA_VISIBLE_DEVICES=0 python run_worker.py
    
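`CUDA_VISIBLE_DEVICES=0` restricts the worker process to the first GPU on the machine. The sketch below shows how a worker could inspect that setting at startup for logging. The helper name is hypothetical and is not part of the project.

```python
import os
from typing import List, Mapping, Optional


def visible_gpu_ids(env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (CUDA then exposes every GPU)
    and an empty list when it is set to an empty string (no GPU exposed).
    IDs are kept as strings because CUDA also accepts GPU UUIDs here.
    """
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [part.strip() for part in raw.split(",") if part.strip()]
```

With the command shown above, `visible_gpu_ids()` inside the worker would return `["0"]`.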

Running Dub-API

Dub-API is a FastAPI server. You can find the repository here. Follow these steps to run it:

  1. Navigate to the root directory of the project.
  2. Install the required dependencies using pip:
    pip install -r requirements.txt
    
  3. Run the server using uvicorn:
    uvicorn main:app --reload
    
    The server will start running on http://127.0.0.1:8000.
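To confirm the server is reachable after startup, a small standard-library check can be used. This sketch assumes the FastAPI default of serving interactive docs at `/docs`; if the project disables or moves that route, pass a different URL. The function name is hypothetical.

```python
import http.client
import urllib.request


def is_server_up(url: str = "http://127.0.0.1:8000/docs", timeout: float = 2.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False
```

A return value of `False` means the server is not running, is still starting, or does not serve the requested path.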

Running Dub-Web

Dub-Web is a React frontend. You can find the repository here. Follow these steps to run it:

  1. Navigate to the dub-web directory of the project.
  2. Install the required dependencies using npm:
    npm install
    
  3. Run the frontend using npm:
    npm start
    
    The frontend will start running on http://localhost:3000.

Backlog

  • Downloading YouTube video
  • Speaker diarization
  • Transcription
  • Translation
  • Voice cloning
  • Video/audio mixing
  • Add back any background audio
  • Move transcription to a GPU worker
  • Move translation to an open source model
  • Move voice cloning to open source
  • Modify speaker's lips in the video to match the translated audio

dub-inference's People

Contributors

ilyasubkhankulov, mariusbld
