Démo Speech Text Chat

This project is a Flask web application that captures audio from the user's microphone and uses OpenAI's Whisper model for speech-to-text transcription.

This software is provided as is, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.

Features

  • Record audio directly in the web browser.
  • Send the audio data to a Flask server.
  • Use Whisper to transcribe the audio to text.
  • Display the transcription result on the web page.
  • Real-time transcription.
  • Working chatbot.
  • Streamed responses.
  • Text-to-speech responses.
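
The record, send, transcribe, and display steps listed above can be sketched as a single upload route. This is a minimal illustration under stated assumptions, not the project's actual code: the `/transcribe` route, the `audio` form field, the `.webm` suffix, and the `base` model size are all hypothetical. `whisper.load_model` and `model.transcribe` are the openai-whisper calls; any object with a compatible `transcribe` method can be passed in instead, which keeps the helper usable without Whisper installed.

```python
import os
import tempfile


def transcribe_file(path, model):
    """Return the stripped transcript the model produces for an audio file."""
    result = model.transcribe(path)  # openai-whisper returns a dict with a "text" key
    return result["text"].strip()


def create_app(model=None):
    """Build a Flask app exposing one hypothetical upload route."""
    # Imported lazily so transcribe_file works without Flask or Whisper installed.
    from flask import Flask, jsonify, request

    if model is None:
        import whisper
        model = whisper.load_model("base")  # assumed model size

    app = Flask(__name__)

    @app.route("/transcribe", methods=["POST"])  # hypothetical route name
    def transcribe():
        upload = request.files.get("audio")  # hypothetical form field name
        if upload is None:
            return jsonify(error="missing 'audio' file"), 400
        # Whisper reads audio from a path (through ffmpeg), so save the upload first.
        fd, path = tempfile.mkstemp(suffix=".webm")
        os.close(fd)
        try:
            upload.save(path)
            return jsonify(text=transcribe_file(path, model))
        finally:
            os.remove(path)

    return app
```

The browser side would then POST the recorded audio to this route and write the returned `text` field into the page.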

Todo

  • Add feedback loops to improve the quality of the transcript and of the questions/answers

  • Log conversations to central contexts in case further processing is needed

  • Real-time transcription improvements

  • Chatbot improvements (handling of the user's initial prompt)

  • Any other business (AOB)

Also:

  • Fix unsafe Werkzeug usage for production

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need to install before setting up the software:

  • python3
  • pip

Optional dependencies:

  • virtualenv
  • ffmpeg (for audio processing, used by both this project and openai-whisper)
  • Docker (for containerization)
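
A quick way to see which of the optional command-line tools are available is to look them up on `PATH`. The sketch below uses only the standard library; the executable names `ffmpeg` and `docker` are the usual ones but are assumed here, and the function name is illustrative.

```python
import shutil


def missing_tools(tools=("ffmpeg", "docker")):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing:", ", ".join(missing) if missing else "none")
```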

For audio output, pyttsx3 depends on one of the following synthesizers (see its Synthesizer support documentation):

  • sapi5 (Windows)
  • nsss (macOS)
  • espeak (Linux)
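
The platform-to-synthesizer mapping above can be written down explicitly. This is a sketch only: pyttsx3 already picks a default driver when `driverName` is omitted, so passing it is optional, and the helper names below are illustrative rather than taken from the project.

```python
import sys


def tts_driver_name(platform=None):
    """Map a sys.platform value to the pyttsx3 driver listed above."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return "sapi5"
    if platform == "darwin":
        return "nsss"
    return "espeak"  # Linux and other platforms


def speak(text):
    """Speak text aloud; requires pyttsx3 and the platform synthesizer."""
    import pyttsx3  # imported lazily because it is an optional dependency

    engine = pyttsx3.init(driverName=tts_driver_name())
    engine.say(text)
    engine.runAndWait()
```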

Installing

To get the app running:

  1. Clone the repository:
git clone https://github.com/spiderweak/demo-speech-text-chat
  2. Navigate to the project directory:
cd demo-speech-text-chat
  3. Create and activate a virtual environment (conda also works; avoid installing the dependencies directly into your system Python, which is not good practice):
python -m venv .venv
source .venv/bin/activate
  4. Install the required dependencies:
pip install -r requirements.txt

Running the Application

Run the application with:

python run.py

Access the web application at http://127.0.0.1:5000/.

If needed, you can run the application in headless mode and plug in your own frontend. Your frontend must still send its data to the correct backend routes.

python run.py --headless

Headless mode has not been extensively tested.
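
For a custom frontend talking to the headless backend, a client could look roughly like the sketch below. The route `/transcribe`, the raw-bytes request body, and the `audio/wav` content type are placeholders chosen for illustration; check the project's backend routes for the real paths and payload format, which may use a different transport entirely.

```python
import urllib.request


def build_upload_request(audio_bytes, base_url="http://127.0.0.1:5000",
                         route="/transcribe"):
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        base_url.rstrip("/") + route,  # route is a placeholder, not the project's
        data=audio_bytes,
        headers={"Content-Type": "audio/wav"},  # assumed payload format
        method="POST",
    )


def send(req, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the URL and headers before the backend is running.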

Docker Support

If you wish to use Docker for deployment:

  1. Build the Docker image:
docker build -t demo-speech-text-chat .
  2. Run the Docker container:
docker run -p 5000:5000 demo-speech-text-chat

As with the non-Dockerized version, you can run the container in headless mode:

docker run -p 5000:5000 demo-speech-text-chat --headless

The container does not copy chat models locally, but you can mount a dedicated folder if you want to use your own model. Remember to update the environment variables defined in the .env file accordingly:

docker run -p 5000:5000 -v $(pwd)/app/models:/app/models demo-speech-text-chat
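
On the application side, a mounted model folder is typically located through an environment variable. The sketch below shows the idea only: the variable name `MODEL_DIR` is hypothetical, so use whichever names the project's .env file actually defines; the default `/app/models` matches the mount target in the command above.

```python
import os
from pathlib import Path


def resolve_model_dir(env=None, variable="MODEL_DIR", default="/app/models"):
    """Return the model directory from the environment, or the default."""
    env = os.environ if env is None else env  # `variable` is a hypothetical name
    return Path(env.get(variable, default))


def list_models(model_dir):
    """List files in the mounted directory, or nothing if it is absent."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.iterdir() if p.is_file())
```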

Interfaces diagram

[Figure: Chatbot integration diagram]

Contributing

This project does not accept external contributions for now.

Authors

Antoine "Spiderweak" BERNARD

License

Unless part of the system is incompatible with it, consider this project licensed under CC BY-NC-SA. It is mostly used for research and teaching purposes.

Acknowledgments

Thanks to these projects, which make most of this project run:

  • OpenAI for the Whisper model and for the disclaimer in the opening statement of this README.
  • Flask
