ai-austin / gpt4all-voice-assistant


License: MIT License


gpt4all-voice-assistant's Introduction

GPT4ALL-Voice-Assistant

This is a 100% offline GPT4ALL Voice Assistant. Completely open source and privacy-friendly. Use any language model on GPT4ALL. Background process voice detection. Watch the full YouTube tutorial for the setup guide: https://youtu.be/6zAk0KHmiGw

Setup

I highly advise watching the YouTube tutorial before using this code. You will need to modify the OpenAI Whisper library to work offline; I walk through that in the video, along with setting up all the other dependencies so everything functions properly.

If you're planning to install it on an Arch-based distro, you need to install the espeak and python-espeak packages from the AUR. You can install them with the yay AUR helper by running:

yay -S espeak python-espeak

Improvements to consider adding to yours

Give a system prompt. These open-source models perform far better when you send a system prompt as specified in the GPT4ALL documentation: https://docs.gpt4all.io/gpt4all_python.html#introspection
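One lightweight way to act on this suggestion (a sketch only; the GPT4All Python bindings also offer chat_session(system_prompt=...) for the same purpose) is to prepend a system prompt to every transcribed prompt before calling model.generate(). The wording and the with_system_prompt helper below are hypothetical, not part of this repo:

```python
SYSTEM_PROMPT = (
    "You are a helpful voice assistant. Answer briefly and conversationally, "
    "since your replies are read aloud."
)

def with_system_prompt(user_text: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Prepend a system prompt to the transcribed user prompt."""
    return f"{system_prompt}\n\nUser: {user_text.strip()}\nAssistant:"

# In prompt_gpt() you would then call, for example:
#   output = model.generate(with_system_prompt(prompt_text), max_tokens=200)
print(with_system_prompt("what's the weather like?"))
```

The linked GPT4All documentation describes the bindings' own mechanism; this plain-string variant is just the simplest drop-in for the existing generate() call.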

gpt4all-voice-assistant's People

Contributors

ai-austin, mihajlopi


gpt4all-voice-assistant's Issues

WebUI

It would be amazing if you could add a web UI to this awesome AI voice assistant using one of the readily available Python libraries.
Thanks for creating and sharing this good work!

I need your help

Dear Austin, thank you for the YouTube video tutorial.
Nevertheless, I have been struggling for two days to get gpt4all_voice running offline on my PC (Win10), with no success.
The Python script runs without an error message, records the wake word into wake_detect.wav, and Jarvis tells me 'listening', but then nothing happens.
My voice question is not recorded into prompt.wav and the program hangs. It would be much appreciated if you could take a look at the code. I followed the steps in your video and have even solved the issue of the missing vocab.bpe. Still nothing. :( Here is my main.py

from os import system
import speech_recognition as sr
from playsound import playsound
from gpt4all import GPT4All
import sys
import whisper
import warnings
import time
import os

wake_word = 'jarvis'
model = GPT4All("/users/apache33/appdata/local/nomic.ai/GPT4All/gpt4all-falcon-newbpe-q4_0.gguf", allow_download=False)
r = sr.Recognizer()
tiny_model_path = os.path.expanduser('/users/apache33/.cache/whisper/tiny.pt')
base_model_path = os.path.expanduser('/users/apache33/.cache/whisper/base.pt')
tiny_model = whisper.load_model(tiny_model_path)
base_model = whisper.load_model(base_model_path)
listening_for_wake_word = True
source = sr.Microphone()
warnings.filterwarnings("ignore", category=UserWarning, module='whisper.transcribe')

if sys.platform != 'darwin':
    import pyttsx3
    engine = pyttsx3.init()

def speak(text):
    if sys.platform == 'darwin':
        ALLOWED_CHARS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,?!-_$:+-/ ")
        clean_text = ''.join(c for c in text if c in ALLOWED_CHARS)
        system(f"say '{clean_text}'")
    else:
        engine.say(text)
        engine.runAndWait()

def listen_for_wake_word(audio):
    global listening_for_wake_word
    with open("wake_detect.wav", "wb") as f:
        f.write(audio.get_wav_data())
    result = tiny_model.transcribe('wake_detect.wav')
    text_input = result['text']
    if wake_word in text_input.lower().strip():
        print("Wake word detected. Please speak your prompt to GPT4All.")
        speak('Listening')
        listening_for_wake_word = False

def prompt_gpt(audio):
    global listening_for_wake_word
    try:
        with open("prompt.wav", "wb") as f:
            f.write(audio.get_wav_data())
        result = base_model.transcribe('prompt.wav')
        prompt_text = result['text']
        if len(prompt_text.strip()) == 0:
            print("Empty prompt. Please speak again.")
            speak("Empty prompt. Please speak again.")
            listening_for_wake_word = True
        else:
            print('User: ' + prompt_text)
            output = model.generate(prompt_text, max_tokens=200)
            print('GPT4All: ', output)
            speak(output)
            print('\nSay', wake_word, 'to wake me up. \n')
            listening_for_wake_word = True
    except Exception as e:
        print("Prompt error: ", e)

def callback(recognizer, audio):
    global listening_for_wake_word
    if listening_for_wake_word:
        listen_for_wake_word(audio)
    else:
        prompt_gpt(audio)

def start_listening():
    with source as s:
        r.adjust_for_ambient_noise(s, duration=2)
    print('\nSay', wake_word, 'to wake me up. \n')
    r.listen_in_background(source, callback)
    while True:
        time.sleep(1)

if __name__ == '__main__':
    start_listening()

Windows users: program detects wake word but not voice prompt

We are currently diagnosing this in my Discord server. It seems to be an issue with the listen_in_background() function from SpeechRecognition, an issue with pyttsx3, or something caused by the two being used in parallel. The repo will be updated ASAP. If any Windows users find a solution, please feel free to share below while this issue is still open.
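One workaround worth trying while this is diagnosed (a sketch under stated assumptions, not a confirmed fix): listen_in_background() invokes its callback on a worker thread, and pyttsx3 is known to misbehave off the main thread on some platforms, so the callback can enqueue text and leave the actual speaking to the main thread. The speak_queue and speak_later names are hypothetical:

```python
import queue

# Filled by the recognizer's worker thread, drained by the main thread.
speak_queue: "queue.Queue[str]" = queue.Queue()

def speak_later(text: str) -> None:
    """Called from the recognizer's worker thread: just enqueue the text."""
    speak_queue.put(text)

def main_loop(speak) -> None:
    """Runs on the main thread; drains the queue and calls the real TTS there."""
    while True:
        try:
            text = speak_queue.get(timeout=1)
        except queue.Empty:
            continue  # nothing to say yet; keep polling
        speak(text)

# In the script above, callback() would call speak_later(output) instead of
# speak(output), and start_listening() would end with main_loop(speak)
# instead of the bare time.sleep(1) loop.
```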

Readme inaccurate

The Arch section has not worked for a while now: the python-espeak package demands python-distutils in its PKGBUILD file, but distutils was deprecated and has been fully removed as of Python 3.12.0; Arch and its derivatives are on Python 3.12.3 at the time of writing.
I have contacted the maintainer of the AUR PKGBUILD about this.

In addition, that AUR PKGBUILD was building python-espeak 0.5, while the current version (according to PyPI) is 0.6.3. It should be clarified how this affects the overall process of getting the voice assistant to run.

ERROR: No matching distribution found for whisper==1.9.2

I get an error when trying to install the requirements:

ERROR: No matching distribution found for whisper==1.9.2

It looks like the newest version on PyPI is 1.1.10, so I replaced the line with that in requirements.txt and now get no warnings, so it should be fine, but I will let you know if I run into any other issues. I know it's minor, but once confirmed this should be updated (and it's also useful to document for anyone else who hits the issue). Note that the whisper package on PyPI appears to be an unrelated project; OpenAI's Whisper is published as openai-whisper.

I had to make changes for not getting stuck at engine.runAndWait()

On a Windows computer, engine.runAndWait() did not continue after speaking. I moved engine = pyttsx3.init() in front of the current line 31, added engine = None and rate = None at the current lines 23/24, and now it works.

Additionally, getting ffmpeg to work required an installation that is described in the openai-whisper documentation.
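The reordering described in this issue can be sketched as a lazy, one-time initializer, so the engine is created exactly once before the listen loop starts. The get_engine() helper is hypothetical; the factory parameter stands in for pyttsx3.init so the pattern can be shown without importing pyttsx3 here:

```python
engine = None  # created on first use, per the fix described above
rate = None

def get_engine(factory):
    """Return the TTS engine, creating it on the first call only.

    `factory` is whatever constructs the engine (pyttsx3.init in the
    real script); repeat calls reuse the existing engine.
    """
    global engine
    if engine is None:
        engine = factory()
    return engine

# Usage in the real script (assumption):
#   import pyttsx3
#   engine = get_engine(pyttsx3.init)   # once, before listen_in_background()
```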

add requirements.txt

Add a file called requirements.txt with the following content:

gpt4all
openai-whisper
SpeechRecognition
playsound
PyAudio
soundfile
pyttsx3

It turns out you also need espeak installed on your system.
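As a quick sanity check that everything in that list is importable, note that several pip names differ from the module you actually import (SpeechRecognition -> speech_recognition, PyAudio -> pyaudio, and openai-whisper installs a module named whisper); the mapping below reflects those assumptions:

```python
import importlib.util

# pip package name -> module you actually import
REQUIRED = {
    "gpt4all": "gpt4all",
    "openai-whisper": "whisper",
    "SpeechRecognition": "speech_recognition",
    "playsound": "playsound",
    "PyAudio": "pyaudio",
    "soundfile": "soundfile",
    "pyttsx3": "pyttsx3",
}

def missing_packages() -> list:
    """Return the pip names whose import module cannot be found."""
    return [pip for pip, mod in REQUIRED.items()
            if importlib.util.find_spec(mod) is None]

if __name__ == "__main__":
    gone = missing_packages()
    if gone:
        print("Missing:", ", ".join(gone))
    else:
        print("All Python dependencies found (espeak must be checked separately).")
```

This only verifies the Python side; espeak (and ffmpeg for Whisper) are system packages and must be checked with your distro's package manager.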
