Airis: Local Vtuber AI

Airis-VtuberAI is an open-source attempt to recreate the popular Vtuber "Neuro Sama". The project uses no external APIs and can run entirely locally, without an internet connection or a large amount of VRAM.

The project can transcribe the user's voice, generate a response, and synthesize text-to-speech output with as little latency as reasonably possible while sacrificing as little quality as possible.
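The three stages described above (transcribe, generate, synthesize) can be pictured as a simple chain. The sketch below is only an illustration of that flow with per-stage timing; `transcribe`, `generate_response`, `synthesize`, and `run_turn` are hypothetical stand-ins and are not the functions used in this repository.

```python
import time
from typing import Dict, Tuple


def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for the speech-to-text stage."""
    return "hello airis"


def generate_response(prompt: str) -> str:
    """Hypothetical stand-in for the language model stage."""
    return f"You said: {prompt}"


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for the text-to-speech stage."""
    return text.encode("utf-8")


def run_turn(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run one conversational turn and record how long each stage took."""
    stages = (
        ("transcribe", transcribe),
        ("generate", generate_response),
        ("synthesize", synthesize),
    )
    timings: Dict[str, float] = {}
    value = audio
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)  # the output of one stage feeds the next
        timings[name] = time.perf_counter() - start
    return value, timings
```

Timing each stage separately makes it easier to see which one dominates the overall latency before swapping in a smaller or larger model.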

Features

  • Chat Mode
    • Allows the Vtuber AI to read and respond to chat messages
    • Interacts with OBS to include Subtitles and updated chat
    • Lower VRAM usage
  • Interview Mode
    • Allows the Vtuber AI to converse with the user with low latency
    • Includes fast transcription

Installation

First, clone this repository, and then clone the OpenVoice TTS repository inside it:

git clone https://github.com/neurokitti/AIRIS-VtuberAI.git
cd AIRIS-VtuberAI
git clone https://github.com/myshell-ai/OpenVoice.git

Next, create a .venv and install the requirements.txt (the one from this repo, not the one from the OpenVoice repo):

pip install -r requirements.txt

Next, install PyTorch. Then delete all the files (not the folders) in the OpenVoice folder and drag the files from the Vtuber project into the OpenVoice repository. Do not drag the system prompt files into the repo, though.
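If you would rather script the file shuffling than drag files by hand, the step above can be sketched roughly as follows. This is a minimal sketch under assumptions: the helper names are made up for illustration, and the `skip_marker` value `"system_prompt"` is only a guess at how the system prompt files are named, so check the actual file names before relying on it.

```python
import shutil
from pathlib import Path
from typing import List


def delete_top_level_files(folder: Path) -> List[str]:
    """Delete the files directly inside `folder`, leaving subfolders untouched."""
    removed = []
    for entry in sorted(folder.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed


def copy_project_files(
    project: Path, target: Path, skip_marker: str = "system_prompt"
) -> List[str]:
    """Copy top-level files from `project` into `target`.

    Files whose name contains `skip_marker` are left out. The marker is an
    assumption about how the system prompt files are named.
    """
    copied = []
    for entry in sorted(project.iterdir()):
        if entry.is_file() and skip_marker not in entry.name.lower():
            shutil.copy2(entry, target / entry.name)
            copied.append(entry.name)
    return copied
```

From the AIRIS-VtuberAI folder, calling `delete_top_level_files(Path("OpenVoice"))` followed by `copy_project_files(Path("."), Path("OpenVoice"))` would mirror the manual steps. Both helpers return the names they touched, so you can review what changed.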

Finally, install OBS WebSocket and set the WebSocket password to be the same as the one in the startup_scripts.py file.

Usage

To run this project, simply run the main file. To run interview mode, uncomment the main_interview() call (and comment out main_chat() so that only one mode runs).

from startup_scripts import main_chat, main_interview

if __name__ == "__main__":
    main_chat()  # runs chat mode: interacts with the chat but does not respond to you
    # main_interview()  # does not read chat; instead responds to anyone on the stream over the mic

You may also want to edit the project to better suit your needs. In that case, navigate to the startup_scripts.py file.

Finally, to run the project, run the main.py file with the mode you want uncommented.
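If commenting and uncommenting lines gets tedious, one possible alternative is to pick the mode with a command-line flag. The sketch below is not part of this repo: the `--mode` flag and `select_mode` helper are assumptions, and the two functions are stand-ins that only return a label. In a real main.py you would import `main_chat` and `main_interview` from startup_scripts instead.

```python
import argparse
from typing import Callable, Dict, List, Optional


def main_chat() -> str:
    """Stand-in for startup_scripts.main_chat (returns a label for the demo)."""
    return "chat"


def main_interview() -> str:
    """Stand-in for startup_scripts.main_interview (returns a label for the demo)."""
    return "interview"


# Map each mode name to the function that starts it.
MODES: Dict[str, Callable[[], str]] = {
    "chat": main_chat,
    "interview": main_interview,
}


def select_mode(argv: Optional[List[str]] = None) -> Callable[[], str]:
    """Parse a hypothetical --mode flag and return the matching start function."""
    parser = argparse.ArgumentParser(description="Start Airis in the chosen mode.")
    parser.add_argument("--mode", choices=sorted(MODES), default="chat")
    args = parser.parse_args(argv)
    return MODES[args.mode]
```

With this in place, `select_mode()()` at the bottom of main.py would start chat mode by default, and running the file with `--mode interview` would start interview mode.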

Benchmark

The metrics in this section cover the full project, including the overhead from running OBS and VTube Studio. All of these tests were run on a GPU and used the Phi-3 Mini 4K Instruct model from Microsoft.

NOTE: I haven't fully tested response time yet; for reference, it's between 1 and 2 seconds.
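For anyone who wants to fill in the numbers themselves, time to first token can be measured by timing a streaming generator. The sketch below shows one way to do that; `fake_token_stream` is a hypothetical stand-in for the language model's streaming output, and `measure_stream` is not a function from this repo.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_token_stream(delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming language model."""
    for token in ("Hello", ",", " chat", "!"):
        time.sleep(delay)  # pretend each token takes some time to generate
        yield token


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, str]:
    """Return (time to first token, total time, full text) for a token stream."""
    start = time.perf_counter()
    first = None
    parts = []
    for token in tokens:
        if first is None:
            # Record the delay until the very first token arrives.
            first = time.perf_counter() - start
        parts.append(token)
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("the stream produced no tokens")
    return first, total, "".join(parts)
```

Time to first token matters more than total time here, because speech synthesis can start as soon as the first words are available.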

Time to First Token: Interview Mode

| Whisper Model | Precision | Language Model | Quantization | Max. GPU memory | Response Time |
|---|---|---|---|---|---|
| tiny | int8_float16 | Phi-3-mini-4k-instruct | 4-bit | tbd | time tbd |
| tiny | int8_float16 | Phi-3-mini-4k-instruct | 8-bit | tbd | time tbd |
| tiny | int8_float16 | Phi-3-mini-4k-instruct | full | tbd | time tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | 4-bit | tbd | time tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | 8-bit | tbd | time tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | full | tbd | time tbd |

Executed with CUDA 12.1 on an NVIDIA RTX 4080 Laptop GPU with 12 GB of VRAM.

Time to First Token: Chat Mode

| Whisper Model | Precision | Language Model | Quantization | Max. GPU memory | Response Time |
|---|---|---|---|---|---|
| tiny | int8_float16 | Phi-3-mini-4k-instruct | 4-bit | tbd | time tbd |
| tiny | int8_float16 | Phi-3-mini-4k-instruct | 8-bit | tbd | time tbd |
| tiny | int8_float16 | Phi-3-mini-4k-instruct | full | tbd | time tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | 4-bit | tbd | time tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | 8-bit | tbd | time tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | full | tbd | time tbd |

Executed with CUDA 12.1 on an NVIDIA RTX 4080 Laptop GPU with 12 GB of VRAM.

Coming Soon

  • Better summary memory management
  • Manager UI

License

I don't know how to do a license, but all the projects used in this one use MIT, so I think you can do whatever you want because I don't care. Go nuts.

Credits

Join Our Community That Doesn't Exist

Discord Youtube

Contact

[email protected]

Contributors

neurokitti
