AI Audio Transcriber

A minimalistic application to generate transcriptions for audio built using Python

🚀 Demo

v.0.0.1

AITranscriber Snapshot

v.0.0.2 (Transcribing a Youtube Video Explaining Whisper)

AITranscriber Snapshot v2

v.0.0.2 (Transcribing an English Song - Thinkin About It)

AITranscriber Snapshot v2

v.0.0.3 (Transcribing a clip from Lex Fridman's podcast)

AITranscriber Snapshot v3

v.0.0.4 (Transcribing another clip from Lex Fridman's podcast)

AITranscriber Snapshot v4

📝 Basic Application WorkFlow

flowchart LR 
    U([Client])
    
    I{Choose\n Input Mode}
    U -----> I
    
    I1[YouTube Video URL] 
    I2[Upload Video File]
    I3[Upload Audio File]
    I ---> I1 & I2 & I3

    YTC{"Check if\n Audio is available?"}
    YTA("Download audio\n from YouTube")
    YTV("Download video\n from YouTube")
    
    I1 ---> YTC
    YTC --yes---> YTA
    YTC --no---> YTV

    VTA["Convert Video to Audio"]
    YTV ---> VTA
    I2 ---> VTA

    LA["Load Audio File"]
    YTA & VTA & I3---> LA
    
    M{"Choose\n Model Type"}
    U -----> M

    M1[(Ramanujan)]
    M2[(Bose)]
    M3[(Raman)]
    M4[(Kalam)]
    M ---> M1 & M2 & M3 & M4

    LM[Load Relevant Whisper Model]
    M1 & M2 & M3 & M4 --> LM

    GT("Generate Transcripts")
    LA & LM ---> GT

    O1(["Detected \n Language"])
    O2(["Complete \nSubtitle Text"])
    O3(["Subtitles \nwith Timestamps"])
    GT ---> O1 & O2 & O3

    OF(["Original\n Audio or Video"])
    D{{"Display to Client"}}
    I ---> OF
    O1 & O2 & OF ---> D

    DO{"Choose\n Output Option"}
    D1["SRT\n File"]
    D2["VTT\n File"]
    D3["Text\n File"]
    DP["Process Subtitle Object"]
    DN{{"Download Button"}}

    O3 ---> DP
    U ---> DO
    DO ---> D1 & D2 & D3 ---> DP ---> DN

    subgraph Result
        D
        DN
    end
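
The "Subtitles with Timestamps" to SRT/VTT step in the flowchart can be illustrated with a short sketch. It assumes each segment is a dict with `start`, `end` (in seconds) and `text` keys, which is the shape of the `segments` list Whisper's `transcribe()` returns; the function names here are hypothetical and are not taken from this repository.

```python
def format_timestamp(seconds, separator=","):
    """Convert seconds to HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def segments_to_vtt(segments):
    """Build WebVTT text: a WEBVTT header followed by unnumbered cues."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], separator=".")
        end = format_timestamp(seg["end"], separator=".")
        lines.extend([f"{start} --> {end}", seg["text"].strip(), ""])
    return "\n".join(lines)
```

The only difference between the two timestamp formats is the millisecond separator, so one helper serves both writers.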

🥊CI/CD

(Preferred Pipeline Using GitHub Actions for Docker Image)

Docker CI/CD

⚒️ Set-Up Instructions

  • Open your terminal / command prompt.

  • Clone the repository

    git clone https://github.com/smaranjitghose/AIAudioTranscriber.git
    
  • Change the directory to the cloned project

    cd AIAudioTranscriber
    

A. Without using Docker

  • Ensure you have a version of Python below 3.10 installed on your system, along with the virtualenv package

    which python
    
    pip install virtualenv
    
  • Create a new virtual environment

    python -m venv env
    
  • Activate the virtual environment

    • On Mac/Linux
      source env/bin/activate
      
    • On Windows
      env/Scripts/Activate.ps1 
      
  • Install ffmpeg on your local system

    • On Windows using Chocolatey
      choco install ffmpeg
      
    • On MacOS using Homebrew
      brew install ffmpeg 
      
    • On Debian/Ubuntu
      sudo apt update && sudo apt install ffmpeg
      
    • On Arch Linux
      sudo pacman -S ffmpeg 
      
  • Install the dependencies

    pip install -r requirements.txt
    
  • Download the model weights (This will take a few minutes since the total size of the models is in gigabytes)

    python get_model_weights.py
    
  • Run the Web application

    streamlit run Home.py
    

    Note:

    • If the app does not load by itself in your default browser, open a browser of your choice and navigate to http://localhost:8501
    • To stop the application, press CTRL + C in your terminal
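
The two prerequisites in this section (a Python version below 3.10 and ffmpeg on the PATH) can be checked before installing the dependencies. The following is a minimal sketch, not part of the repository; `check_prerequisites` is a hypothetical helper name, and its parameters exist only so the check can be exercised without changing the real environment.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The README asks for a Python version below 3.10.
    if tuple(version_info[:2]) >= (3, 10):
        found = f"{version_info[0]}.{version_info[1]}"
        problems.append(f"Python {found} found, but a version below 3.10 is expected")
    # shutil.which returns None when the executable is not on the PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on the PATH")
    return problems


for problem in check_prerequisites():
    print("WARNING:", problem)
```

Running it inside the activated virtual environment prints nothing when both requirements are met.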

B. Using Docker

  • Make sure you have Docker installed on your system. Refer to the documentation here if you need assistance setting up.
  • Build a docker image
    docker build -t aitranscriber:v0.0.4 .
    

    Note:

    • You may give any name instead of aitranscriber and any tag instead of v0.0.4
    • Depending on your system it takes a few minutes to successfully build the image
  • Once complete, check the docker image
    docker images
    
  • Create and run a Docker Container for the image
    docker run -p 8501:8501 aitranscriber:v0.0.4
    

    Note:

    • docker run -p <host_port>:8501 <image_name>:<tag_name>
    • In the above command, you can play around with which port of your host system you wish to map to the 8501 port of the container
    • If you used a different docker image name and/or different tag, make sure to update it in the command
  • Open your preferred Web Browser and navigate to http://localhost:8501

    Note:

    • If you used a different host port in the above command then navigate to that one, http://localhost:<host_port>
    • To stop the container, check the container name in the terminal: docker ps --all
    • Now use container name with the command: docker stop <container_name>
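
If the page does not load after starting the container, it can help to confirm that something is actually listening on the mapped host port. A minimal sketch using only the standard library follows; `is_port_open` is a hypothetical helper, and the default port 8501 is the one used in the commands above.

```python
import socket


def is_port_open(host="localhost", port=8501, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

For example, `is_port_open(port=8501)` returning `False` right after `docker run` suggests the port mapping or the container start-up should be checked with `docker ps --all`.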

🌏Deployment Options

  • Streamlit Cloud

  • HuggingFace Spaces

  • Fly

  • Railway

  • Render

  • Cyclic

  • Heroku

  • Digital Ocean

  • Google Cloud Run

    • Install Google Cloud CLI
    • Create an Account on Google Cloud
    • Create a New Project
    • Build and Push Docker Image to Google Container Registry
      gcloud builds submit --tag gcr.io/<ProjectName>/<AppName>  --project=<ProjectName>
      
    • Deploy the Docker Container
      gcloud run deploy --image gcr.io/<ProjectName>/<AppName> --platform managed --project=<ProjectName> --allow-unauthenticated
      
  • Amazon EC2 Instance

  • Azure App

(Using Google Colab/Kaggle as temporary MVP server)

  • pyngrok

    • Step 1: Install pyngrok in Google Colab

      ! pip install pyngrok
      
    • Step 2: Sign-up in ngrok and get Authentication Token

    • Step 3: Authenticate

         from pyngrok import ngrok
         ngrok.set_auth_token("xxx")
    • Step 4: Load the Streamlit App at port 8051, create a tunnel for it and reveal the public URL for the tunnel

         !nohup streamlit run Home.py --server.port 8051 &
         url = ngrok.connect(8051).public_url
         print(url)
    • Step 5: Share URL with client

  • localtunnel

    • Step 1: Install localtunnel

      npm install -g localtunnel
      
    • Step 2

      streamlit run Home.py & npx localtunnel --port 8501
      
    • Step 3: Share URL with client

(Using local server as temporary MVP server)

  • NGINX + Cloudflare/ngrok

🏗️ Future Work

  • Download and use audio from Youtube Video

  • Download and use online audio file

  • Use Session States and Caching for Better UX

  • Display the detected language properly (without using the shortcode)

  • Generate Dedicated SRT,VTT files for transcripts (in addition to txt)

  • Update Model options to honour the name of prominent Indian Scientists

  • Option to limit/increase input model file size

  • Functionality to check the validity of the URL provided for a YouTube Video

  • Add Custom Favicon File

  • Add Scrollable Text Area for Generated Transcripts

  • Containerize the Application with Docker

  • Troubleshoot Docker Container locally

  • Create Basic Workflow on GitHub Actions for Docker Image Build

  • Create Comprehensive Workflow on GitHub Actions for Docker Image Build

  • Resolve bug: Youtube video with multiple audios should download default audio.

    • Example: This clip from Huberman Lab is in English yet the script fetches the Spanish audio codec from YouTube
  • Test Application by spinning up its Container on Google Cloud Run

    • Push to a particular Docker Image Registry
    • Set TTL
    • Play around with system resources
    • Test with custom domain
  • Add Google Cloud's CI/CD to repo on push/pull requests

    • Use cloudbuild.yaml file
    • Update build time to 2 hours
  • Optimize Docker Image Size

  • Better CI/CD

  • Kubernetes Upgrade

  • Better GitHub Actions

More Features:

  • Burn transcripts to user-uploaded video

    ```python
    import os

    # input_video and subtitle are paths expected to be defined by the caller
    output_video = "final.mp4"
    os.system(f"ffmpeg -i {input_video} -vf subtitles={subtitle} {output_video}")
    ```
    
  • Summarize subtitles

  • Sentiment analysis on video summary

  • Batch transcript generation + summary + sentiment analysis

  • Dashboard for video review(s)
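
The ffmpeg burn-in command listed under More Features can also be run through `subprocess` with an argument list, which avoids shell quoting problems when file names contain spaces. This is a sketch under assumptions: `build_burn_command` and `burn_subtitles` are hypothetical names, and the paths are supplied by the caller.

```python
import subprocess


def build_burn_command(input_video, subtitle, output_video="final.mp4"):
    """Return the ffmpeg argument list that hard-codes subtitles into a video."""
    return [
        "ffmpeg",
        "-i", input_video,
        "-vf", f"subtitles={subtitle}",
        output_video,
    ]


def burn_subtitles(input_video, subtitle, output_video="final.mp4"):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_burn_command(input_video, subtitle, output_video), check=True)
    return output_video
```

Subtitle paths containing characters that are special to ffmpeg's filter syntax (such as colons or backslashes) may still need escaping inside the `subtitles=` value.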

Speaker Diarization: Only if Community requires

  • Incorporate Speaker Diarization for Podcast/Vlog/Conversational Clips
  • Test it with burning transcripts to user uploaded video
  • Test it with transcript summarization

More Aligned Subtitles: Only if Community requires

  • Word Level Timestamps for transcripts + Generate ASS Transcript File

  • Test it with burning transcripts to user uploaded video

  • Test it with previous speaker diarization

  • Test it with transcript summarization

  • Improve UI Natively in Streamlit

API Development: Only if Community requires

  • Build API for model inference in FastAPI to handle requests asynchronously (on a different branch perhaps)
  • Containerize the API with Docker
  • Troubleshoot Docker Container for API
  • Host the API on Google/AWS/Linode/Heroku
  • Perform basic CI/CD for API
  • Rehost Streamlit Application on a different service (Reduce it to client side for most operations)
  • Play around with pyScript

Front End Development: Only if Community requires

  • Build Basic React Front end
  • Connect React Front End to FastAPI
  • Add Loader Animation
  • Add Animations for model inference times
  • Handling Errors in Front End/API
  • Upload File Component
  • Download Button(s)
  • Feedback Form
  • Contact Page
  • About Page
  • Home Page
  • Stripe Integration
  • Improve Navbar UI
  • 404 Page
  • Footer UI
  • Scrollbar UI
  • SEO

CI/CD Pipeline (GitHub Actions)

  • SAST (Optional)
  • Kubernetes Smoke Test (Optional)
  • Using Super Linter for Linting (Optional)
  • Unit Tests (Optional)
  • Integration Test (Optional)

✏️ Note

  • To view the generated transcript file(s) in VS Code IDE install Subtitles Editor extension

  • To extensively edit/manipulate the generated transcript file(s) use the open source tool Subtitle Edit

  • For Streamlit Sharing, mentioning versions of the modules in requirements throws error at times

  • The large-v2 model outperforms all other versions of Whisper, especially in multilingual transcription. However, it takes about 10 times more VRAM than the base model and has a longer inference time

  • To quickly record audio from system microphone use this Python Script:

    • Prerequisites (the wave module ships with the Python standard library, so only pyaudio needs installing):

      pip install pyaudio
      
  • Whisper is unable to read an audio file from disk if the python-ffmpeg or ffmpeg Python packages are installed. It only works when the ffmpeg-python package is installed and the other two are not

    # Remove all ffmpeg related python packages
    pip uninstall python-ffmpeg ffmpeg ffmpeg-python
    # Install the appropriate package for ffmpeg
    pip install ffmpeg-python
    
    
  • Pixabay has a great collection of copyright free, no royalty songs that one can use for testing the application

  • Poor performance for Kannada or Telugu songs (often language recognition itself fails) with the base model. Example: Kantara movie's Varaha Roopam Song

AITranscriber Snapshot v2
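
For the microphone-recording note above, a minimal sketch of such a script is shown here. It is not the script the note refers to; the function names, the 16 kHz mono 16-bit settings and the output path are assumptions chosen for illustration. `pyaudio` is imported inside `record` so that the WAV-writing helper can be used on its own.

```python
import wave


def save_wav(path, frames, channels=1, sample_width=2, rate=16000):
    """Write raw PCM chunks to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"".join(frames))


def record(path, seconds=5, rate=16000, chunk=1024):
    """Record mono 16-bit audio from the default microphone into path."""
    import pyaudio  # imported lazily; requires `pip install pyaudio`

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=chunk)
    # Each read returns `chunk` frames; loop until `seconds` have elapsed.
    frames = [stream.read(chunk) for _ in range(int(rate / chunk * seconds))]
    stream.stop_stream()
    stream.close()
    sample_width = audio.get_sample_size(pyaudio.paInt16)
    audio.terminate()
    save_wav(path, frames, channels=1, sample_width=sample_width, rate=rate)
    return path
```

Calling `record("sample.wav", seconds=10)` would produce a file that can be uploaded through the "Upload Audio File" input mode.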

Docker Container and CI/CD

  • Exclude as many irrelevant files as possible with .dockerignore, such as README.MD, LICENSE, snapshots, notebooks, input, output, logs, etc.

  • Minimize the number of layers (Created by RUN, COPY and ADD)

  • Always combine RUN apt-get update with apt-get install in the same RUN statement. Using apt-get update alone in a RUN statement causes caching issues and subsequent apt-get install instructions fail.

  • Using RUN apt-get update && apt-get install -y ensures your Dockerfile installs the latest package versions with no further coding or manual intervention. This technique is known as “cache busting”.

  • In addition, when you clean up the apt cache by removing /var/lib/apt/lists it reduces the image size, since the apt cache is not stored in a layer.

  • Python Docker Image Info:

    • Image tags such as jessie/stretch/buster/bullseye are the codenames of different Debian Operating System Production releases.
    • bullseye is version 11, buster is version 10, and so on (as of 2022).
    • bookworm, trixie and forky are work-in-progress releases which may not be stable yet
    • -slim - only installs the minimal packages needed to run the particular tool.
  • A base image with Python <= 3.9 raises an issue with the backports.zoneinfo module, and pip fails

  • To build and test multi-architecture docker images locally,

    • Create a new buildx instance
      docker buildx create --use
      
    • Build a new docker image for multi-architecture support
       docker buildx build --platform linux/arm64,linux/amd64 -t aitranscriber:multi-architecture -f Dockerfile . 
      
  • Checking the Docker Image Build for multi-architecture is too time consuming for the current application and is disabled

🛡️ License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.

🙏 Acknowledgements
