
VoiceAssistant_DeployRecord

Deployment notes for a simple AI voice assistant

Author: kkl

Project from: voice-assistant by linyiLYi

Original Markdown: English | 简体中文

Getting Started

Background: by studying Brother Lin's code, catch up with his pace (and eventually replace him), and get a first look at the complete ASR + LLM + TTS application pipeline.

Process

  • Clone the project to your local machine: git clone --recursive https://github.com/linyiLYi/voice-assistant

  • Environment setup. The original project targets macOS while the author works on Windows, so the setup commands differ slightly from the original project (mlx is replaced with openai-whisper, and say is replaced with pyttsx3).

# Environment setup
conda create -n VoiceAI python=3.11.5
conda activate VoiceAI
pip install -r requirements.txt # Note: comment out mlx first; mlx is a Mac-only library
pip install llama-cpp-python

# Install the audio processing tool
pip install pyaudio

# Install openai-whisper
pip install openai-whisper # Make sure ffmpeg is installed in your environment

# Install pyttsx3
pip install pyttsx3
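
As a quick sanity check after installation (not part of the original project; the file name and checks below are my own sketch), you can confirm that the key packages import and that ffmpeg is reachable on PATH:

# sanity_check.py - quick environment check (a sketch, not from the original project)
import shutil

import pyaudio                # audio recording/playback
import pyttsx3                # offline TTS, replacing macOS `say`
import whisper                # openai-whisper, replacing mlx
from llama_cpp import Llama   # llama-cpp-python, loads the gguf LLM

# whisper shells out to ffmpeg, so it must be on PATH
# (a pip-installed ffmpeg package is not enough)
ffmpeg_path = shutil.which("ffmpeg")
assert ffmpeg_path is not None, "ffmpeg not found on PATH"
print("All imports OK, ffmpeg found at:", ffmpeg_path)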

If conda reports errors while creating the virtual environment:

conda HTTP 429 error: CondaHTTPError: HTTP 429 TOO MANY REQUESTS for url <xxx>. Fix: https://blog.csdn.net/m0_67839004/article/details/137934614

Notes on installing openai-whisper:

  1. Install the ffmpeg dependency; tutorial: https://blog.csdn.net/m0_61497715/article/details/129817641
  2. If you hit FileNotFoundError: [WinError 2] The system cannot find the file specified, see: https://blog.csdn.net/zdm_0301/article/details/133854913
  3. If you hit UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 9: invalid start byte, do not install ffmpeg via pip, and check that ffmpeg is on the system PATH; the same error is reported in openai/whisper#302
  4. To run openai-whisper with CUDA acceleration, switch torch to the GPU build. The author updated the environment with pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118; other torch builds are available from the official torch website (a quick GPU check is sketched after this list)
  5. Remember to delete the whisper folder inside the original project!
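
For point 4, a minimal check (a sketch, not from the original project) to confirm the GPU build of torch is actually active before loading Whisper on CUDA:

import torch

# If this prints False, whisper.load_model(..., device="cuda") will fail;
# reinstall torch from the cu118 index URL given above.
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))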

Two test scripts are provided to conveniently test openai-whisper and pyttsx3.
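
The repository's test scripts are not reproduced here; a minimal pyttsx3 smoke test might look like the following (the file name and property values are assumptions, mirroring the settings used in main.py later on):

# test_pyttsx3.py - minimal pyttsx3 smoke test (a sketch; values are assumptions)
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 200)    # speech rate, same value used in main.py below
engine.setProperty("volume", 1.0)  # full volume
engine.say("语音合成测试, text to speech test")
engine.runAndWait()                # block until playback finishes
engine.stop()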

  • Get the models
  1. On Windows you only need to fetch the LLM model. Model files are stored in the models/ folder and are specified in the script via the MODEL_PATH variable. The gguf-format models from TheBloke and XeIaso are recommended; the 6B model uses less VRAM: yi-chat-6b.Q8_0.gguf or yi-34b-chat.Q8_0.gguf (a quick load check is sketched after the file-setup step below)

  2. The openai-whisper model can be downloaded online according to your needs ('tiny', 'base', 'medium', 'large', ...), as shown below

# Change the model name passed to load_model as needed; when the script runs it checks
# whether the model exists and downloads it automatically if it does not
import whisper

if __name__ == "__main__":
    file_path = "output.wav"
    model = whisper.load_model("base", device="cuda")
    result = model.transcribe(file_path, language="Chinese")
    print(result["text"])
  • File setup: create the models/ and recordings/ folders yourself.
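
Once the gguf model is in place under models/, a quick way to confirm it loads with llama-cpp-python (a sketch; n_ctx and the prompt are assumptions, and the original main.py drives the model through its own prompt template instead):

from llama_cpp import Llama

MODEL_PATH = "models/yi-chat-6b.Q8_0.gguf"  # same variable name as in main.py

# n_ctx is an assumed value; raise it if your prompts are longer
llm = Llama(model_path=MODEL_PATH, n_ctx=2048)
out = llm("你好,请用一句话介绍你自己。", max_tokens=64)
print(out["choices"][0]["text"])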

Running

On Windows, the original project's main.py needs a few changes:

# 1. New imports
import whisper
import pyttsx3
# 2. Change the model path; the 6B model is chosen here
# Model Configuration
# WHISP_PATH = "models/whisper-large-v3"
MODEL_PATH = "models/yi-chat-6b.Q8_0.gguf"  # Or models/yi-34b-chat.Q8_0.gguf
# 3. Replace the say command with pyttsx3
def text_to_speech(self, text):
        # Create a text-to-speech engine
        engine = pyttsx3.init()
        # Set the speech rate to 200
        engine.setProperty("rate", 200)
        # Set the volume
        engine.setProperty("volume", 1.0)
        try:
            if LANG == "CN":
                engine.say(text)
                # Wait for the speech output to finish
                engine.runAndWait()
            else:
                engine.say(text)
                # Wait for the speech output to finish
                engine.runAndWait()
        except Exception as e:
            print(f"Error in text-to-speech: {e}")

        engine.stop()
# 4. Load the openai-whisper model; there are two changes in total, see the original code for the exact locations
if __name__ == "__main__":
    if LANG == "CN":
        prompt_path = "prompts/example-cn.txt"
    else:
        prompt_path = "prompts/example-en.txt"
    
    # First change
    whisper_model = whisper.load_model("base", device="cuda")  # Load the Whisper model

    with open(prompt_path, "r", encoding="utf-8") as file:
        template = file.read().strip()  # {dialogue}
    prompt_template = PromptTemplate(template=template, input_variables=["dialogue"])

   # ...

    dialogue = ""
    try:
        while True:
            if voice_output_handler.tts_busy:  # Check if TTS is busy
                continue  # Skip to the next iteration if TTS is busy
            try:
                print("Listening...")
                record_audio()
                print("Transcribing...")
                time_ckpt = time.time()
                
                # user_input = whisper.transcribe("recordings/output.wav", path_or_hf_repo=WHISP_PATH)["text"]

                # Second change
                user_input = whisper_model.transcribe(
                    "recordings/output.wav", language="Chinese"
                )["text"]  # Run speech recognition on the recording

                print(
                    "%s: %s (Time %d ms)"
                    % ("Guest", user_input, (time.time() - time_ckpt) * 1000)
                )

            # ...
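
The record_audio() call above comes from the original project and is not shown here. Purely as an illustration of the pyaudio side (every parameter value below is an assumption, not taken from the original main.py), a minimal fixed-length recorder looks like this:

# Illustrative only: minimal fixed-length recorder with pyaudio.
# All parameter values (rate, chunk, duration) are assumptions,
# not the ones used by the original project's record_audio().
import wave
import pyaudio

def record_audio(path="recordings/output.wav", seconds=5, rate=16000, chunk=1024):
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=chunk)
    frames = [stream.read(chunk) for _ in range(int(rate / chunk * seconds))]
    stream.stop_stream()
    stream.close()
    sample_width = pa.get_sample_size(pyaudio.paInt16)
    pa.terminate()

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))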

Acknowledgments

Why does this darn project run so slowly, and why is the model so dumb? Thanks to Brother Lin (linyiLYi) for bringing me one step closer to having my own Jarvis! Heartfelt thanks also go to the authors of all the tutorials linked in this article and to everyone who shared solutions to the errors above!
