
🎙👂 podwhisperer

A completely automated podcast audio transcription workflow with super accurate results!

Note: this project was presented on the AWS Bites podcast. Check out the full episode! 👈

This project uses:

  • OpenAI Whisper for super accurate transcription
  • Amazon Transcribe to add speaker identification
  • FFmpeg for audio transcoding to MP3
  • AWS Lambda for:
    • Merging the Whisper and Transcribe results
    • Substituting commonly 'misheard' words/proper nouns (both of these Lambda steps are sketched just after this list)
    • Creating a GitHub Pull Request against the podcast's website repository with the generated transcript (this is an optional step)
  • ...and Step Functions to orchestrate the whole process!
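
To make the Lambda steps above a little more concrete, here is a minimal TypeScript sketch of the merge and word-substitution ideas. It is not the project's actual code: the segment shapes and the substitution table are simplified assumptions for illustration only.

    // Illustrative sketch only -- not the real Lambda code from this repository.
    // Assumes simplified shapes for the Whisper and Transcribe outputs.
    interface WhisperSegment {
      start: number // seconds
      end: number
      text: string
    }

    interface SpeakerSegment {
      start: number
      end: number
      speakerLabel: string // e.g. 'spk_0' from Amazon Transcribe
    }

    // Assign each Whisper segment the speaker whose Transcribe segment overlaps it the most
    function mergeSpeakers(
      whisperSegments: WhisperSegment[],
      speakerSegments: SpeakerSegment[]
    ): Array<WhisperSegment & { speakerLabel: string }> {
      return whisperSegments.map((segment) => {
        let bestLabel = 'unknown'
        let bestOverlap = 0
        for (const speaker of speakerSegments) {
          const overlap =
            Math.min(segment.end, speaker.end) - Math.max(segment.start, speaker.start)
          if (overlap > bestOverlap) {
            bestOverlap = overlap
            bestLabel = speaker.speakerLabel
          }
        }
        return { ...segment, speakerLabel: bestLabel }
      })
    }

    // Substitute commonly 'misheard' words/proper nouns with a simple lookup table
    // (the entries below are hypothetical examples, not the project's real list)
    const substitutions: Record<string, string> = {
      'Owen': 'Eoin',
      'Lou Chano': 'Luciano'
    }

    function fixMisheardWords(text: string): string {
      return Object.entries(substitutions).reduce(
        (fixed, [wrong, right]) => fixed.replace(new RegExp(`\\b${wrong}\\b`, 'g'), right),
        text
      )
    }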

This project consists of a few components, each with its own CloudFormation stack:

  1. 👂 whisper-image, for creating an ECR container image repository where we store the SageMaker container to run the Whisper model
  2. 🪣 data-resources for shared data stores, namely an S3 Bucket
  3. 🧠 sagemaker-resources for the SageMaker model and IAM role
  4. 🎙 transcript-orchestration, for orchestration, custom transcript processing and creating the transcript pull request

This project uses AWS SAM with nested stacks to deploy all but the first of these components. The first component is special: we need to create the container image repository in Amazon ECR before we can push our custom Whisper container image to it, which in turn makes the image available to the SageMaker resources we create afterwards.

Prerequisites

You will need the following build tooling installed.

  • Node.js 16.x and NPM 8.x
  • Docker, or other tooling that can build a container image from a Dockerfile and push it to a repository.
  • AWS SAM, used to build and deploy most of the application
  • The AWS CLI
  • esbuild

By default, the target AWS account should have the SLIC Watch SAR application installed. It can be installed from the AWS Serverless Application Repository page in the AWS Console. SLIC Watch is used to create alarms and dashboards for the transcription application. If you want to skip this, just remove the single line referring to the SlicWatch-v2 macro from the relevant template, transcript-orchestration/template.yaml.

Getting Started

You can deploy this complete application to your own AWS account.

  1. Make sure to set the environment variables for the AWS profile, region and account ID:

    export AWS_PROFILE=xxx
    export AWS_DEFAULT_REGION=eu-central-1
    export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
  2. The first deployment step creates the ECR repository. We can use the AWS CLI to do this with CloudFormation:

    aws cloudformation deploy \
     --template ./whisper-image/template.yaml \
     --stack-name whisper-image \
     --tags file://./common-tags.json \
     --capabilities CAPABILITY_NAMED_IAM 

    We can now retrieve the repository URI from the CloudFormation outputs:

    REPOSITORY_URI=$(aws cloudformation describe-stacks --stack-name whisper-image --query "Stacks[0].Outputs[?ExportName=='whisper-model-image-repository-uri'].OutputValue" --output text)
  3. Next, we can build and push the Whisper container image:

    cd whisper-image
    
    # Build the container image
    docker build -t $REPOSITORY_URI .
    
    # Log in to ECR with Docker (requires AWS_ACCOUNT_ID and AWS_DEFAULT_REGION to be set, as in step 1)
    aws ecr get-login-password | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
    
    # Push the container image to ECR
    docker push $REPOSITORY_URI
    
    # leave directory before executing next step
    cd ..
  4. Now that our container image is present, we can deploy the rest of the application with AWS SAM.

    sam build --parallel
    sam deploy --guided --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_IAM  # It should be sufficient to accept all defaults when prompted

That's it! You can now test the transcription flow end to end. The process is triggered when you upload an audio file to the newly-created S3 bucket:

aws s3 cp sample-audio/sample1.mp3 s3://pod-transcription-${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}/audio/sample1.mp3

That S3 object upload emits an EventBridge event that triggers the transcription Step Function. You can watch its progress in the Step Functions Console.
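
If you prefer to check progress from a script rather than the console, the following sketch uses the AWS SDK for JavaScript v3 Step Functions client. It assumes @aws-sdk/client-sfn is installed and that your credentials and region are configured; the name filter used to find the state machine is a guess, so adjust it to match your deployed stack.

    // Sketch: list recent executions of the transcription state machine.
    // Assumptions: @aws-sdk/client-sfn is installed; AWS credentials and region
    // are configured; the state machine name contains 'ranscri' (adjust as needed).
    import {
      SFNClient,
      ListStateMachinesCommand,
      ListExecutionsCommand
    } from '@aws-sdk/client-sfn'

    const client = new SFNClient({})

    async function showRecentExecutions(): Promise<void> {
      const { stateMachines = [] } = await client.send(new ListStateMachinesCommand({}))
      const stateMachine = stateMachines.find((sm) => /ranscri/.test(sm.name ?? ''))
      if (!stateMachine) {
        throw new Error('No transcription state machine found in this account/region')
      }
      const { executions = [] } = await client.send(
        new ListExecutionsCommand({
          stateMachineArn: stateMachine.stateMachineArn,
          maxResults: 5
        })
      )
      for (const execution of executions) {
        console.log(`${execution.name}: ${execution.status} (started ${execution.startDate})`)
      }
    }

    showRecentExecutions().catch(console.error)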

Configuration

By default, the transcription workflow attempts to create a pull request against a static website's GitHub repository. The GitHub repository is configured in the Pull Request Lambda Function's environment variables. You can modify this configuration, and the pull request creation code, if you want PRs with new transcripts created against your own repository as part of the workflow. If you want to turn this feature off altogether, simply change the createPR default value in the Step Function's inputs from true to false.
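
For a rough idea of what the PR-creation step involves (this is not the project's actual Lambda code), a pull request adding a transcript file can be opened with the Octokit REST client along these lines; the owner, repository, branch and file path below are all placeholders:

    // Illustrative only -- not this repository's Lambda. Assumes @octokit/rest is
    // installed and a GitHub token with repo access is available in GITHUB_TOKEN.
    import { Octokit } from '@octokit/rest'

    async function createTranscriptPullRequest(transcriptJson: string, episodeSlug: string) {
      const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN })
      const owner = 'example-org'       // placeholder
      const repo = 'podcast-website'    // placeholder
      const base = 'main'
      const head = `transcript/${episodeSlug}`

      // Create a branch from the base branch
      const { data: baseRef } = await octokit.rest.git.getRef({ owner, repo, ref: `heads/${base}` })
      await octokit.rest.git.createRef({ owner, repo, ref: `refs/heads/${head}`, sha: baseRef.object.sha })

      // Commit the transcript file to the new branch
      await octokit.rest.repos.createOrUpdateFileContents({
        owner,
        repo,
        branch: head,
        path: `transcripts/${episodeSlug}.json`, // placeholder path
        message: `Add transcript for ${episodeSlug}`,
        content: Buffer.from(transcriptJson).toString('base64')
      })

      // Open the pull request
      await octokit.rest.pulls.create({ owner, repo, base, head, title: `Transcript for ${episodeSlug}` })
    }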

Step function architecture

To get a better feel for the process, the following diagram visualizes the Step Function definition:

Overview of the step function

Contributors

eoinsha, lmammino, nodomain, berkecanrizai, berkecanrizai-getir, timojde
