video-transcript-summarizer

Speech-To-Text Transcription and Summarization Web Application: Unlocking the Power of Whisper and Transformers

Language and Libraries

Python, NumPy, OpenCV, PyTorch, Docker, AWS, pandas

Problem statement

The goal of this project is to develop a web application that allows users to upload a video or provide a video link and automatically transcribe the video's audio into text, with the option to translate it to a different language. Additionally, the application will summarize the transcribed text to provide a brief overview of the video's content. The application will be built using the Flask framework and utilize the VideoToSubtitle, summarize, and video_downloader components. The project aims to make it easy for users to transcribe and understand the content of videos with minimal effort.

Solution Proposed

The proposed solution uses OpenAI's Whisper model to transcribe the video's audio into text, the PyTube library to download YouTube videos, and the BART-Large model to summarize the transcript. The application is built with the Flask framework and consists of three main components: a video downloader, a subtitle generator, and a summarizer.
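BART-Large accepts at most 1,024 input tokens, so the transcript of a longer video generally has to be split before it can be summarized. The sketch below shows one simple approach, assuming a word-based chunk size as a rough stand-in for the model's token limit; `chunk_words`, `summarize_long_text`, and the 600-word budget are hypothetical and not taken from the project's code.

```python
from typing import Callable, List

# Hypothetical word budget per chunk. BART-Large accepts at most 1024 tokens,
# and English text usually needs more than one token per word, so this value
# is chosen to stay below the limit. It is an assumption, not a tuned setting.
MAX_WORDS_PER_CHUNK = 600


def chunk_words(text: str, max_words: int = MAX_WORDS_PER_CHUNK) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_text(
    text: str,
    summarize_chunk: Callable[[str], str],
    max_words: int = MAX_WORDS_PER_CHUNK,
) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    return " ".join(summarize_chunk(chunk) for chunk in chunk_words(text, max_words))
```

With Hugging Face Transformers, `summarize_chunk` could wrap a summarization pipeline, for example `summarizer = pipeline("summarization", model="facebook/bart-large-cnn")` together with `lambda c: summarizer(c)[0]["summary_text"]`; the checkpoint name is an assumption. Counting tokens with the model's tokenizer instead of words would track the limit more precisely.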

Components

  • The video downloader component uses the PyTube library to download the video from YouTube, or accepts a video file uploaded by the user.

  • The subtitle generator component uses OpenAI's Whisper model to transcribe the video's audio, with an option to translate the transcript into a different language.

  • The summarizer component uses the BART-Large model to generate a summary of the transcribed text.
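Whisper's `transcribe()` returns a dictionary whose `segments` entry lists timed pieces of text, each with `start` and `end` times in seconds and a `text` field. A subtitle generator can turn those segments into an SRT file; the following is a minimal sketch under that assumption, and `format_timestamp` and `segments_to_srt` are illustrative names rather than the project's `VideoToSubtitle` API.

```python
from typing import Dict, Iterable


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict]) -> str:
    """Render Whisper-style segments (dicts with start, end, text) as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each SRT cue: sequence number, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

For example, `result = whisper.load_model("base").transcribe("video.mp4")` followed by `segments_to_srt(result["segments"])`; the model size and file name are placeholders. Passing `task="translate"` to `transcribe()` makes Whisper output English text instead of a transcript in the source language.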

The user interacts with the application through a web interface, where they can upload a video file or provide a YouTube video link. The application processes the video, generates a transcript, and provides a summary of the video's content. The transcript and summary are displayed on the web page, allowing the user to easily understand the video's content. This solution aims to make it simple and easy for users to transcribe, translate and summarize videos with minimal effort.
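Because the web interface accepts either an uploaded file or a YouTube link, the server has to decide which path a request takes before calling the downloader. The sketch below assumes only the standard `watch?v=` and `youtu.be` link shapes and a hypothetical extension whitelist; `youtube_video_id`, `is_allowed_upload`, and `ALLOWED_EXTENSIONS` are illustrative names, not the project's.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical whitelist of upload extensions; the project may accept others.
ALLOWED_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm", ".mp3", ".wav"}

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video id from a YouTube watch or youtu.be link, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    if host == "youtu.be":
        # Short links carry the id as the path, e.g. https://youtu.be/<id>
        video_id = parsed.path.lstrip("/")
        return video_id or None
    return None


def is_allowed_upload(filename: str) -> bool:
    """Accept an uploaded file only if its extension is on the whitelist."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

A Flask view could call `youtube_video_id()` on a submitted link and pass valid links on to the PyTube downloader, and call `is_allowed_upload()` on an uploaded file's name before saving it.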

Deployment

This project also uses CircleCI for continuous integration and deployment with Docker. Docker packages an application and its dependencies into a container that runs consistently across different environments.

CircleCI is configured to automatically build and test the application inside a Docker container after each code change is pushed to the source code repository. After successful testing, the application is then deployed to an Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance. This approach allows for easy and efficient updating and scaling of the application, as well as facilitating collaboration among the development team.

Together, CircleCI, Docker, and AWS EC2 give the project a streamlined deployment process: the running application stays up to date with the repository and runs in the same containerized environment it was tested in. The same image can also be deployed and tested on other environments and platforms.

How to run?

Step 1: Clone the repository

git clone <repository-url>

Step 2: Create a conda environment inside the repository directory

conda create -p env python=3.10 -y
conda activate ./env

Step 3: Install the requirements

pip install -r requirements.txt

Step 4: Run the application server

python app.py

Step 5: Open the application in a browser

http://localhost:5000

Run locally with Docker

  1. Check that the Dockerfile is present in the project directory.

  2. Build the Docker image:

docker build -t vsum .

  3. Run the Docker image:

docker run -d -p 8080:8080 vsum

👨‍💻 Tech Stack Used

  1. Python
  2. Flask
  3. PyTorch
  4. Docker
  5. Transformers

🌐 Infrastructure Required

  1. AWS EC2
  2. AWS ECR
  3. Circle CI

videosum is the main package folder.

Conclusion

One potential area for improvement in this project is the integration of more advanced natural language processing techniques, such as spell checking, to improve the accuracy of the transcriptions and summaries. Additionally, the application could be enhanced to support more languages for transcription and translation.

This application could be used in a wide range of real-world scenarios, such as in the entertainment industry for subtitle generation, in education for creating transcripts of lectures, and in business for creating summaries of meetings and presentations. It could also be useful for individuals who have difficulty hearing or understanding spoken language, such as people with hearing impairments.

Contributors

aravind-ineuron
