A speech-to-speech translation macOS app powered by Whisper and Google Translate
The goal of this repo was to develop an application that tests Whisper's capabilities for real-time voice transcription. The Python package `speech_recognition` made this possible by splitting the audio input at natural pauses in speech, so each phrase can be processed on its own.
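To illustrate what splitting audio at pauses means, here is a minimal, standard-library sketch. It is not the code used by this app or by `speech_recognition`; the library's `Recognizer` handles this internally through settings such as `energy_threshold` and `pause_threshold`. The function name `split_on_pauses`, the per-frame energy list, and the default values are all assumptions made for illustration.

```python
from typing import List


def split_on_pauses(
    energies: List[float],
    energy_threshold: float = 300.0,
    pause_frames: int = 3,
) -> List[List[float]]:
    """Group per-frame energy values into phrases separated by pauses.

    A phrase ends once `pause_frames` consecutive frames fall below
    `energy_threshold`. Names and defaults are illustrative only.
    """
    phrases: List[List[float]] = []
    current: List[float] = []
    quiet_run = 0  # consecutive quiet frames seen inside the open phrase

    for energy in energies:
        if energy >= energy_threshold:
            # Loud frame: it belongs to the current phrase.
            current.append(energy)
            quiet_run = 0
        elif current:
            # Quiet frame inside an open phrase: keep it for now,
            # because a short gap should not end the phrase.
            current.append(energy)
            quiet_run += 1
            if quiet_run >= pause_frames:
                # The pause is long enough: drop the trailing quiet
                # frames and close the phrase.
                phrases.append(current[:-quiet_run])
                current = []
                quiet_run = 0

    if current:
        # Flush a phrase that was still open when the input ended.
        phrases.append(current[: len(current) - quiet_run])
    return phrases
```

Each returned phrase could then be handed to a transcriber as soon as it closes, which is what keeps the pipeline feeling close to real time.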
- Record, then efficiently transcribe, translate, and synthesize speech between 85 different languages on CPU-dependent hardware.
demo_downscale_3.mov
- Save up to 7 different recordings at a time.
saving_downscale_2.mov
- Run the application `voicebridge_1.0.0_darwin-x86_64`.
- Select your input and output languages.
- Press REC to start recording.
- Press again to stop recording.
- Press PLAY to play the current recording.
- Press SAVE to turn saving ON.
- When enabled, press presets 1-6 to store the current recording there.
- Press presets 1-6 when saving is OFF to play the saved recording.
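The SAVE toggle and preset buttons described above can be modeled as a small state machine. The sketch below is hypothetical: the class `PresetBank`, its attributes, and the default of six slots (taken from the "presets 1-6" wording) are assumptions, not the app's actual implementation.

```python
from typing import Dict, Optional


class PresetBank:
    """Hypothetical model of the SAVE toggle and the preset buttons."""

    def __init__(self, slots: int = 6) -> None:
        self.slots = slots
        self.saving = False  # SAVE starts OFF
        self.current: Optional[bytes] = None  # most recent recording
        self._stored: Dict[int, bytes] = {}

    def toggle_save(self) -> bool:
        """Flip saving ON or OFF and return the new state."""
        self.saving = not self.saving
        return self.saving

    def press_preset(self, slot: int) -> Optional[bytes]:
        """Store the current recording when saving is ON.

        When saving is OFF, return the recording stored in the slot
        (or None if the slot is empty) so the caller can play it.
        """
        if not 1 <= slot <= self.slots:
            raise ValueError(f"slot must be between 1 and {self.slots}")
        if self.saving:
            if self.current is not None:
                self._stored[slot] = self.current
            return None
        return self._stored.get(slot)
```

Keeping the mode in a single boolean means one button handler serves both storing and playback, which matches the workflow in the steps above.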
This application uses multithreading to record and transcribe audio simultaneously. The front end is implemented with the Python package `customtkinter`. Object-oriented design is demonstrated via the `Base` and `Decoder` objects. The app integrates with Google Translate APIs and invokes the offline package `openai-whisper`. It was compiled into a standalone application using `py2app`.
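A common way to record and transcribe at the same time is a producer-consumer pair of threads connected by a queue. The sketch below shows that general pattern with the standard library only; it is an assumption about the overall shape, not the app's actual code. The functions `record`, `transcribe`, and `run_pipeline` are hypothetical, and `decode` stands in for a real transcription call such as Whisper.

```python
import queue
import threading
from typing import Callable, Iterable, List

_STOP = object()  # sentinel telling the transcriber that recording ended


def record(chunks: Iterable[bytes], audio_queue: "queue.Queue") -> None:
    """Producer: push each captured audio chunk onto the queue."""
    for chunk in chunks:
        audio_queue.put(chunk)
    audio_queue.put(_STOP)


def transcribe(
    audio_queue: "queue.Queue",
    decode: Callable[[bytes], str],
    results: List[str],
) -> None:
    """Consumer: decode chunks as they arrive, until the sentinel shows up."""
    while True:
        chunk = audio_queue.get()
        if chunk is _STOP:
            break
        results.append(decode(chunk))


def run_pipeline(chunks: Iterable[bytes], decode: Callable[[bytes], str]) -> List[str]:
    """Run the recorder and transcriber threads side by side."""
    audio_queue: "queue.Queue" = queue.Queue()
    results: List[str] = []
    recorder = threading.Thread(target=record, args=(chunks, audio_queue))
    transcriber = threading.Thread(
        target=transcribe, args=(audio_queue, decode, results)
    )
    recorder.start()
    transcriber.start()
    recorder.join()
    transcriber.join()
    return results
```

Because the queue preserves insertion order and only one consumer reads from it, the transcribed phrases come out in the order they were spoken.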
Voicebridge requires an internet connection due to Google API use.
- To implement offline functionality, look into Argos Translate and pyttsx3
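As a starting point for the offline route, the sketch below wraps the two suggested packages. It is untested against this app and rests on several assumptions: both packages are installed with `pip`, the Argos Translate language package for the chosen pair has already been installed, and the helper names `translate_offline` and `speak_offline` are hypothetical. The third-party imports sit inside the functions so the module still loads when the packages are missing.

```python
def translate_offline(text: str, from_code: str, to_code: str) -> str:
    """Translate text locally with Argos Translate.

    Assumes the language package for (from_code, to_code), for example
    ("en", "es"), was installed beforehand.
    """
    import argostranslate.translate  # third-party: pip install argostranslate

    return argostranslate.translate.translate(text, from_code, to_code)


def speak_offline(text: str) -> None:
    """Speak text through the operating system's voices via pyttsx3."""
    import pyttsx3  # third-party: pip install pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()  # blocks until playback finishes
```

Since `pyttsx3` relies on the voices installed on the operating system, the set of languages it can speak well may be smaller than what the current online setup offers.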
- This software could be paired with voice chat applications to enable real-time translation and speech synthesis between users who speak different languages.
- This software could be deployed on small devices for people to use when traveling overseas.
  - This would be especially helpful if offline functionality is implemented.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.