semperai / amica

542 stars · 12 watchers · 81 forks · 56.56 MB

Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.

Home Page: https://heyamica.com

License: MIT License

JavaScript 3.78% TypeScript 93.67% CSS 0.32% Shell 0.83% Rust 0.50% Dockerfile 0.51% HTML 0.38%
ai assistant-chat-bots computer-vision llm speech-recognition tts

amica's People

Contributors

andyccliao, aspie96, batoracli, danyyil-pelekhatyi, dreamjz, flukexp, illtellyoulater, inoueharutaka, kasumi-1, ke456-png, morphles, patricklearn, slowsynapse, snowyu, wilaz, yvonne-aizawa


amica's Issues

Basic character save/load

I know there is already a more advanced character-card issue; I might even look into that. But this is for very basic load/save. I actually have it almost done: select/load is working, and I still have to sort out saving and SQLite DB initialization, but I hope to open a PR in the coming couple of days. So this is a placeholder issue for that. Also, as this one is more involved than my previous changes, I expect some comments, especially with regards to translations, but hopefully you'll just be able to edit that stuff simply.

Wake word?

It would be great to be able to set a wake word, so that if you are talking to someone else, Amica doesn't get confused.

Use Mixamo animations?

For almost a week now I have been looking for somewhere to download free VRMA animations and couldn't really find any. I found some guides on how to convert other formats (Mixamo in particular) using, say, Blender. I still haven't gotten to it, but searching again, it seems the underlying library already supports Mixamo animations: https://github.com/josephrocca/ChatVRM-js and it's even in this repo:

export function loadMixamoAnimation(url: string, vrm: VRM) {
but it is not exposed in any way. As Mixamo is free and provides loads of animations, I think it would be of great benefit to allow loading those animations via the interface, or maybe by adding them to a server directory (like I did with a bunch of VRMs) to select from the menu. I'm so bummed out by the idle-only animation that I might look into it myself come the weekend or so.
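For reference, here is a rough sketch of how that helper could be wired up from the app side, assuming loadMixamoAnimation resolves to a THREE.AnimationClip already retargeted to the VRM (the import path and the return type are assumptions based on the ChatVRM-js code above):

import * as THREE from "three";
import { VRM } from "@pixiv/three-vrm";
import { loadMixamoAnimation } from "./loadMixamoAnimation"; // path is hypothetical

// Load a Mixamo FBX animation, retarget it onto the VRM, and start playing it.
async function playMixamoAnimation(vrm: VRM, url: string): Promise<THREE.AnimationMixer> {
  const clip = await loadMixamoAnimation(url, vrm);
  const mixer = new THREE.AnimationMixer(vrm.scene);
  mixer.clipAction(clip).play();
  return mixer; // the render loop must call mixer.update(deltaSeconds) each frame
}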

Support Text-Generation-WebUI

I mainly use exl2 models since they give better performance for local models, and Text-Generation-WebUI is the best option for running them locally.
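Text-Generation-WebUI can expose an OpenAI-compatible API when its API extension is enabled, so one low-effort path would be to point an OpenAI-style backend at it. A rough sketch (the port, path, and startup flag are assumptions about a default text-generation-webui setup, not Amica's existing config):

// Assumes text-generation-webui was started with its OpenAI-compatible API enabled
// (e.g. `--api`), which by default listens on http://127.0.0.1:5000/v1.
async function chatViaTextGenWebUI(messages: { role: string; content: string }[]) {
  const res = await fetch("http://127.0.0.1:5000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, max_tokens: 512, temperature: 0.7 }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}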

Customization options

I like some things about this interface [mainly the ease of the voice stuff, esp. VAD] (though after some digging it seems heavily related to some Japanese stuff, whatever though...), but the customization options are limited.

There seems to be a system prompt that is not changeable in the interface. It looks like it should be changeable via the .env.local variable NEXT_PUBLIC_SYSTEM_PROMPT, but somehow it seems to have no effect. Even if it did, there is a problem: there is some prompt crafting in the code that has 'Amica' hard-coded, so a name change would likely be funky anyway. I'm not sure what is preventing the variable from loading from .env.local, or maybe I'm doing something wrong... I will keep working on it, but a hard-coded name is not good.
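For anyone hitting the same thing: a likely cause is that NEXT_PUBLIC_* variables are inlined by Next.js at build time, so the dev server has to be restarted (or the app rebuilt) after editing .env.local. A minimal sketch of how the default could be read in config code (the fallback text is a placeholder, not the exact Amica source):

// Sketch only: NEXT_PUBLIC_* values are baked in when `npm run dev` / `npm run build`
// starts, so changes to .env.local require a restart to take effect.
export const systemPrompt: string =
  process.env.NEXT_PUBLIC_SYSTEM_PROMPT ??
  "You are Amica, a friendly 3D character."; // hypothetical fallback text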

Special characters - other languages

Hey! Very interesting work! Thanks!

Could you tell me how I can add support for other languages? It doesn't display Polish special characters during conversation, which leads to wrong pronunciation by the TTS model.

Input works, but the output from the model shows truncated text without the special characters.

getUserMedia

How do you use this on a headless server? The latest browsers (Firefox/Chrome) want an HTTPS connection, so instead of localhost I just enter the IP of the server on the LAN. Where do I install the certificates?

vad error {"message":"getUserMedia is not implemented in this browser"}

This is nice though!
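One way around this without touching the app is to terminate TLS in front of the dev server with a self-signed certificate and browse to the HTTPS port; the browser will then allow getUserMedia on a non-localhost origin once the certificate is trusted. A minimal sketch, assuming a cert/key pair already exists (generated with mkcert or openssl) and using the third-party http-proxy package:

// tls-proxy.ts - forward https://<lan-ip>:3443 to the Next.js dev server on :3000.
// Certificate paths and the port are examples.
import { createServer } from "https";
import { readFileSync } from "fs";
import httpProxy from "http-proxy"; // npm install http-proxy

const proxy = httpProxy.createProxyServer({ target: "http://localhost:3000" });

createServer(
  { key: readFileSync("./key.pem"), cert: readFileSync("./cert.pem") },
  (req, res) => proxy.web(req, res)
).listen(3443);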

Feat: Is it possible to auto-create a new folder during the installation of Amica?

Installer Used: amica_0.2.0_x64-setup.exe

Issue

Due to my negligence, I forgot to create a new folder during the installation of Amica and directly chose D:\Apps, so Amica was installed in D:\Apps\.

When I uninstalled it, the uninstaller started deleting the entire D:\Apps folder. Fortunately, I terminated the process before some of my important apps (like GoLand and Unreal Engine) were deleted; only some less-used apps and my Epic Games were removed. T_T

I'm used to Windows software installations that usually create a new folder automatically, such as GoLand and UE. When I install them, a new folder is created automatically:

D:\Apps -> D:\Apps\Goland 20xx-xx\
D:\Apps -> D:\Apps\UE_5.3\

Possible Solutions

After reading some Tauri docs, I came up with two possible solutions:

1. Customize the Windows installer

I think this is troublesome and might not be in the scope of Amica.

Because Tauri does not provide the relevant configuration, a custom NSIS script would be necessary. Here are some docs I found:

2. Tell users to create a new folder

Add a note in the Amica documentation to remind users to create a new folder during installation to prevent the unintentional deletion of important files during uninstallation.

What do you think about it? >_<

Potential errors in the system prompt

First of all, this is a cool project, thank you for making it open source!

The issue is about the system_prompt configuration variable: https://github.com/semperai/amica/blob/438c09ccd42b7f9afc8363afb14b00bc2e94731f/src/utils/config.ts#L37C112-L37C112

The default system prompt is a bit weird at times. I am not sending a patch because, while the changes are trivial, I am not certain of the intention of the authors.

There are five types of emotions: 'neutral' which indicates normal, 'happy' which indicates joy, 'angry' which indicates anger, 'sad' which indicates sadness, and 'relaxed' which indicates calmness.

While "normal" can be used as a noun, I believe "normality" would feel more natural in this context.

The prompt then uses the word "conversation" to refer to individual messages. "Message" would be more natural.

Please do not use polite language.

This is counterintuitive. Is there any reason for asking the model to not use polite language?

Colab

I think it would be nice to create a colab.

The following somewhat works:

# Install node
!dpkg --configure -a
!sudo apt-get update
!sudo apt-get install -y ca-certificates curl gnupg
!sudo mkdir -p /etc/apt/keyrings
!curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
!NODE_MAJOR=21 && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
!sudo apt-get update
!sudo apt-get install nodejs -y

# Clone repo
!git clone https://github.com/semperai/amica.git
%cd amica
!npm install

from google.colab.output import eval_js
print(eval_js("google.colab.kernel.proxyPort(3000)"))

import subprocess  # needed for Popen below
subprocess.Popen(["npm", "run", "dev"])

It only gives some errors because of the service worker, and loading the avatar takes some time, but the echo mode works fine in my tests.

How is it possible to add new SpeechT5 models?

How is it possible to change the text-to-speech model? Is it possible to use other .bin files, like VoxPopuli for the Italian language, or models trained by ourselves? I tried to add the voxpopuli.bin file to the public directory, but the app stops working with no info in debug.
The main difference is that the original model is 2 kilobytes while the VoxPopuli one is 500 megabytes.
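I can't speak for how Amica wires this up internally, but if the SpeechT5 backend goes through transformers.js, the model is normally specified when the text-to-speech pipeline is created rather than by dropping a raw .bin into public/. A hedged sketch (the model id and the speaker-embeddings URL are examples, not Amica's actual configuration):

import { pipeline } from "@xenova/transformers";

// Example: use a different SpeechT5 checkpoint hosted on the Hugging Face hub;
// a raw 500 MB .bin dropped into public/ will not be picked up by itself.
const tts = await pipeline("text-to-speech", "Xenova/speecht5_tts");
const result = await tts("Ciao, come stai?", {
  // SpeechT5 also needs speaker embeddings; this URL is an illustrative example.
  speaker_embeddings:
    "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin",
});
// result.audio is a Float32Array at result.sampling_rate Hz.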

Custom actions

We should add function calling. Since the model runs through Ollama, we can add Ollama functions so that Amica can search the web, extract info from news, turn lights on and off, etc.
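As an illustration (not tied to any particular Ollama API), one simple approach is to ask the model to emit a JSON action and dispatch it to registered handlers:

// Hypothetical action dispatcher: the model is prompted to answer either with plain
// text or with a JSON object like {"action": "search_web", "args": {"query": "..."}}.
type ActionHandler = (args: Record<string, unknown>) => Promise<string>;

const actions: Record<string, ActionHandler> = {
  search_web: async (args) => `results for ${args.query}`, // stub
  set_lights: async (args) => `lights set to ${args.state}`, // stub
};

async function handleModelOutput(output: string): Promise<string> {
  try {
    const parsed = JSON.parse(output);
    const handler = actions[parsed.action];
    if (handler) return await handler(parsed.args ?? {});
  } catch {
    // not JSON: fall through and treat it as a normal chat reply
  }
  return output;
}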

Design and create an extension API

This would open the door to further community involvement and could foster a dev ecosystem.
It could also provide a way to split up current features and implementations into modules.

Refactoring the current architecture could be costly, but in the long run it could pay off in many ways, such as not having to individually implement and maintain multiple LLM engine APIs.
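To make the idea concrete, the plugin surface could be as small as a handful of lifecycle hooks; everything below (names, hook set) is hypothetical:

// Hypothetical extension interface: each module registers for the events it cares about.
export interface AmicaExtension {
  name: string;
  onUserMessage?(text: string): Promise<string | void>;   // may rewrite the user input
  onAssistantMessage?(text: string): Promise<void>;        // observe replies (e.g. logging)
  onExpression?(expression: string): Promise<void>;        // react to expression changes
}

export class ExtensionHost {
  private extensions: AmicaExtension[] = [];
  register(ext: AmicaExtension) { this.extensions.push(ext); }
  async emitUserMessage(text: string): Promise<string> {
    for (const ext of this.extensions) {
      text = (await ext.onUserMessage?.(text)) ?? text;
    }
    return text;
  }
}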

However, it is worth considering the end purpose of this application and comparing it to others. Probably wouldn't want it to end up too similar to oobabooga.

Thoughts?

Support mid-phrase interruption

Requires:

  • detect when the user has been typing, to interrupt
  • detect when the user has been speaking long enough, to interrupt
  • stop the currently spoken stream (audio) or the chat response stream mid-phrase (see the sketch below)
  • optional: apply some sort of audio filter for a realistic pause in speech
  • detect how much of the phrase has been spoken, to include it in context
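The stopping part at least maps onto standard web APIs; a rough sketch, assuming the chat response is read as a fetch stream and the speech is played through an HTMLAudioElement:

// Hypothetical interruption handles: abort the in-flight completion stream and cut audio.
const controller = new AbortController();

async function streamChat(url: string, body: unknown, onToken: (t: string) => void) {
  const res = await fetch(url, {
    method: "POST",
    body: JSON.stringify(body),
    signal: controller.signal, // calling controller.abort() stops the stream mid-phrase
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onToken(decoder.decode(value));
  }
}

function interrupt(audio: HTMLAudioElement) {
  controller.abort(); // stop the text stream
  audio.pause();      // stop the spoken audio immediately
  audio.currentTime = 0;
}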

Autotranslate - Google Translate / Deepl API?

Most models have poor generation quality in languages other than English, so much so that intermediate translation between generations (human input ⇒ English, AI output ⇒ user language) would be a better approach. Can we hope to add this feature?
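A hedged sketch of what the translation hop could look like against DeepL's REST API (endpoint, header, and field names as I recall them for the free tier; verify against the current DeepL docs):

// Translate text with DeepL; DEEPL_API_KEY and the api-free host are assumptions.
async function translate(text: string, targetLang: string): Promise<string> {
  const res = await fetch("https://api-free.deepl.com/v2/translate", {
    method: "POST",
    headers: {
      "Authorization": `DeepL-Auth-Key ${process.env.DEEPL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text: [text], target_lang: targetLang }),
  });
  const data = await res.json();
  return data.translations[0].text;
}

// Usage: user input -> English before the LLM, reply -> user language afterwards, e.g.
// const english = await translate(userInput, "EN");
// const localized = await translate(llmReply, "PL");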

Support different prompting templates

Thank you for Amica, I'm having a lot of fun with it!

I tried using the Yi-34B models with llama.cpp, and generally it works, but the results aren't optimal. For example, the reply always ends with "<|im_end|>", which ends up visible in the conversation.

It would be great to have a way to not just change the system prompt, but to change the prompting template altogether. Here's ChatML, which Yi is trained to use:

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
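Support could be as simple as making the template a function of the message list; a sketch for ChatML specifically (the Message shape is an assumption):

type Message = { role: "system" | "user" | "assistant"; content: string };

// Build a ChatML prompt like the example above and stop generation on <|im_end|>.
function buildChatMLPrompt(messages: Message[]): { prompt: string; stop: string[] } {
  const prompt =
    messages.map((m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>`).join("\n") +
    "\n<|im_start|>assistant\n";
  return { prompt, stop: ["<|im_end|>"] };
}

Passing the stop sequence through to llama.cpp would also fix the visible "<|im_end|>" in replies.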

create docker file for web version

Hey, first off love the project.

I wonder if it is possible to build the web version as a Docker image. It would make deploying much easier.

If possible, also add a docker-compose file that contains the needed endpoints, such as Ollama, so setup is very easy.

build docker image and publish it.

Hello,
In response to the Dockerfile I made in #61:

Would it be possible to publish the docker image to Docker Hub or GitHub Packages?

I could write a GitHub Action that automatically builds the docker image and pushes it to GitHub Packages; I think this should be possible with Docker Hub too.

It could be written such that it runs on every commit (for a beta image) and on every release (every major update).

Build controllable animation system

We should be able to import all animations from Mixamo (and others!) and use a system similar to the expression system to play animations when appropriate.

[tauri] check for updated version

Look up heyamica.com/releases.json and compare the version against the JSON; show some indicator (maybe via the system tray?) which prompts users to download and links to the download page.
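A rough sketch of the check itself (the shape of releases.json is an assumption; this ignores the system-tray part):

// Hypothetical releases.json shape: { "version": "0.2.1", "url": "https://heyamica.com/..." }
async function checkForUpdate(currentVersion: string) {
  const res = await fetch("https://heyamica.com/releases.json");
  const latest: { version: string; url: string } = await res.json();
  if (latest.version !== currentVersion) {
    return latest; // caller shows an indicator linking to latest.url
  }
  return null;
}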

Design websocket protocol for 2 way interaction servers

Let's say we want to support home automation. Currently it is possible to support it by forking and adding some home-automation-specific code. However, with the wide variety of different home automation systems, it may make more sense to instead have a script which connects to amica, can say things as a user, and can read what she is saying (or other events).

The amica websocket server could use JSON-RPC; a minimal client sketch follows the lists below.

Supports receiving these:

  • register: {client_name} (registers the client)
  • messages (get message list)
  • set_messages
  • message: {store (store in message list), interrupt (whether to interrupt amica)} (send a message to amica from the client)
  • set_animation
  • set_expression
  • set_config: {key, value}

Support broadcasting these sorts of events:

  • message_received: {from (which client, the web interface, or amica/assistant)}
  • interruption (currently spoken stream interrupted by the user; include data to figure out where it was interrupted)
  • config_change: {key, value}

This may make it possible to do really cool things like:

  • add a parser for when the user sends a YouTube video link: run a script that downloads the video, provides it to amica somehow, and lets the user talk about it with amica
  • allow the user to discuss a GitHub repo with amica
  • connect amica to an IRC (or similar) chatroom
  • have amica be able to query the internet or perform other actions
  • connect amica to Minecraft to allow her to play with you
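A minimal sketch of a client for such a protocol, assuming the JSON-RPC-style message names above (the exact envelope and URL are still to be designed):

// Hypothetical home-automation client: registers itself, sends a message to amica,
// and listens for broadcast events like message_received or config_change.
const ws = new WebSocket("ws://localhost:3000/ws"); // URL is an example

ws.addEventListener("open", () => {
  ws.send(JSON.stringify({ jsonrpc: "2.0", id: 1, method: "register", params: { client_name: "home-automation" } }));
  ws.send(JSON.stringify({ jsonrpc: "2.0", id: 2, method: "message", params: { text: "The front door just opened.", store: true, interrupt: false } }));
});

ws.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.method === "message_received") {
    console.log("amica said:", msg.params);
  }
});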

Switch to using emojis for emotion system (and animations)

There have been persistent problems with getting the chatbot to use the correct [emotion] message format. An idea shared with me was to instead utilize emojis. These fail gracefully, and there should already be a lot of training data utilizing emojis. We can search for specific types and trigger expression changes, or, for some emojis, trigger animations. This should hopefully allow a non-finetuned model to trigger any of the gestures and expressions already exposed by the emoji system; we will need to import the animations to make this fully work.

A much larger set of expressions can be matched correctly (see https://emojipedia.org/smileys).

Some I find interesting from https://emojipedia.org/people:

  • thumbs up/down
  • waving
  • dancing
  • bowing
  • facepalming
  • juggling
  • lotus position

This is a pretty large change but I think it should provide a much better route.

We would also save some space in the prompt context.
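A sketch of the mapping layer (the emoji choices and expression/animation names are illustrative):

// Scan a reply for known emojis and translate them into expression or animation triggers.
const emojiExpressions: Record<string, string> = {
  "😊": "happy",
  "😠": "angry",
  "😢": "sad",
  "😌": "relaxed",
};

const emojiAnimations: Record<string, string> = {
  "👍": "thumbs_up",
  "👋": "wave",
  "💃": "dance",
  "🙇": "bow",
  "🤦": "facepalm",
};

function extractTriggers(reply: string) {
  const expressions = Object.keys(emojiExpressions).filter((e) => reply.includes(e));
  const animations = Object.keys(emojiAnimations).filter((e) => reply.includes(e));
  return {
    expressions: expressions.map((e) => emojiExpressions[e]),
    animations: animations.map((e) => emojiAnimations[e]),
    text: reply, // unmapped emojis simply stay in the text, so they fail gracefully
  };
}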

feat: add Character Card and Lorebooks to prompt

Character Card

The Character Card Specification provides guidelines for creating characters. These character cards contain information about a specific character, including their name, description, personality, scenario, first message, and example conversations. They can be embedded into PNG images.

Lorebooks

A lorebook is a series of defined keywords that, when activated, insert specific content into the prompt. It can be used to serve content to the AI about the character's backstory, setting, environment, etc., without needing to keep that in the character definition, where it would take up permanent token space.

Additionally, external tools like https://Chub.ai can be used to create and share character cards and lorebooks.
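For reference, the fields listed above map naturally onto a small record type; the field names below follow the usual card conventions but should be checked against the spec:

// Hedged sketch of a character card and lorebook as used when assembling the prompt.
interface CharacterCard {
  name: string;
  description: string;
  personality: string;
  scenario: string;
  first_mes: string;      // first message shown to the user
  mes_example: string;    // example conversations
}

interface LorebookEntry {
  keys: string[];    // keywords that activate the entry
  content: string;   // text inserted into the prompt when a key matches
}

// Only activated lorebook entries are inserted, so they cost no permanent token space.
function activeLore(entries: LorebookEntry[], recentText: string): string[] {
  return entries
    .filter((e) => e.keys.some((k) => recentText.toLowerCase().includes(k.toLowerCase())))
    .map((e) => e.content);
}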

Refs:

Support newer node versions

Thanks a lot for developing this project! The result is pretty impressive.

Currently, the package.json only specifies that Node 18 is supported. It would be nice if newer Node versions could be used. I just tried it with Node 21 (the "Echo" chat mode), and it seemed to work.

Create script which generates .pkg file

It would be nice to distribute an easy-to-install method for people who are not technical.

If a script could be run in amica repo which would:

  • build the frontend
  • take llama.cpp and the basic-openai-api Python repo, grab their dependencies, and package them
  • have some script that starts llama.cpp and basic-openai, then opens the browser (maybe in kiosk mode?)
  • maybe add this to the top bar so that it can be exited to reclaim memory

Then we can share the .pkg file on heyamica.com and make it easier for people to try this out.

Some questions:

  • can the install script curl openchat and show progress?
  • what about installing all of the python dependencies in a venv?

Provide a way to export / import characters

Imagine you can save:

  • system prompt
  • character model
  • any animations (default animation only for now)
  • background image
  • (or) background video
  • voice cloning file (not currently supported in ui)

As a tarball or equivalent. Then these could be loaded into the UI. There could be a list for selecting characters you have imported.
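A possible manifest for such a bundle (all names hypothetical):

// Hypothetical manifest.json stored alongside the assets inside the tarball.
interface CharacterBundleManifest {
  name: string;
  systemPrompt: string;
  model: string;            // e.g. "amica.vrm"
  animations: string[];     // default animation only for now
  background?: string;      // image or video file in the bundle
  voiceCloneFile?: string;  // not currently supported in the UI
}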

VRM selection is imo suboptimal.

[image] Admittedly I went somewhat overboard with downloading VRMs, but oh well. The thing is, I have no good way to generate previews, and as the screenshot shows, it's not great. There are a couple more issues. One, which I'll probably get to sometime, is that when a new character or animation is loaded, the character gets "reset" (position/orientation/zoom), which I find mildly annoying. But another thing, which IMO is worse, is that the damn menu covers my view and also blurs it, so I can't easily see what I'm selecting/flipping through. So IMO, when choosing a character, the view should not be centered and blurred, to allow easily cycling through and seeing the characters.

Model on online demo

Hi.

It's not clear to me what model is being used in the online demo, but it appears to be LLaMA 2; correct me if I am wrong.

If this is so, may I suggest considering switching to something like mistral-openorca, zephyr, dolphin2.2.1-mistral, openhermes2.5-mistral, openchat or neural-chat?

Unlike LLaMA (2), which has a proprietary license from Meta, these models are all under open source licenses and are based on Mistral, which is under the Apache 2.0 license; some of them may even outperform LLaMA 2.
