semperai / amica
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
Home Page: https://heyamica.com
License: MIT License
I know there is already an issue for the more advanced character cards, and I might even look into that, but this one is for very basic load/save. I actually have it almost done: select/load is working, and I still have to sort out saving and SQLite DB initialization. I hope to open a PR in the coming couple of days, so this is a placeholder issue for that. As this change is more involved than my previous ones, I expect some comments, especially with regard to translations, but hopefully you'll be able to simply edit that stuff.
It would be great to be able to set a wake word so that Amica doesn't get confused when you are talking to someone else.
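A minimal sketch of how a wake word could gate transcripts before they reach the chatbot. The function name, the default wake word, and the idea of filtering the speech-to-text output (rather than running a dedicated wake-word detector) are assumptions for illustration, not how Amica works today.

```python
import re
from typing import Optional

WAKE_WORD = "amica"  # assumed default; in practice this would be a user setting


def strip_wake_word(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the text following the wake word, or None if it was not spoken."""
    # Match the wake word as a whole word, ignoring case, plus trailing punctuation.
    pattern = rf"\b{re.escape(wake_word)}\b[\s,.:!?]*"
    match = re.search(pattern, transcript, flags=re.IGNORECASE)
    if match is None:
        return None  # speech was not addressed to the assistant, so ignore it
    return transcript[match.end():].strip()
```

With this shape, a transcript such as "Hey Amica, what time is it?" would be forwarded as "what time is it?", while speech that never mentions the wake word is dropped.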
For almost a week now I have been looking for somewhere to download free VRMA animations, and couldn't really find one. I found some guides on converting other formats (Mixamo in particular) using, say, Blender. I still haven't gotten to it, but searching again I see that the underlying library seems to already support Mixamo animations: https://github.com/josephrocca/ChatVRM-js and it's even in the repo, but not exposed in any way. As Mixamo is free and provides loads of animations, I think it would be of great benefit to allow loading those animations via the interface, or maybe by adding them to a server directory (like I did with a bunch of VRMs) to select from a menu. I'm so bummed out by the idle-only animation that I might look into it myself come the weekend or so.
I mainly use exl2 models since they allow better performance for local models, and the Text-Generation-WebUI is the best option for running them locally.
Needs more research
I like some things about this interface (mainly the ease of the voice stuff, especially VAD), though after some digging it seems to be heavily related to some Japanese project. The customization options, however, are limited.
There seems to be a system prompt that is not changeable in the interface. It looks like it should be changeable via the .env.local variable NEXT_PUBLIC_SYSTEM_PROMPT, but somehow that seems to have no effect. Even if it did, there is a problem: the code does some prompt crafting with 'Amica' hard-coded, so a name change would likely be funky anyway. I'm not sure what is preventing the variable from loading from .env.local, or maybe I'm doing something wrong. I will keep working on it, but a hard-coded name is not good.
100vw -> 100vh depending on orientation. It seems mobile orientation is locked to landscape.
Hey! Very interesting work! Thanks!
Could you tell me how I can add support for other languages? Polish special characters are not displayed during conversation, which leads to wrong pronunciation by the TTS model.
Input works, but the model's output is shown truncated, without the special characters.
Would be cool if users could just download an installer, download model when it loads, and start using amica without requiring additional setup.
requires a model downloading solution and #49
While this is still early, it would be really cool to support memgpt websocket server as a chatbot backend
How do you use this on a headless server? The latest browsers (Firefox/Chrome) want an HTTPS connection. Instead of localhost, I enter the IP of the server on the LAN. Where should the certificates be installed?
vad error {"message":"getUserMedia is not implemented in this browser"}
This is nice though!
The separate page for /share is unnecessary and causes conversation state to be lost.
Similarly to the "settings saved" dialog that shows. This would help people to know they have a misconfiguration.
Same idea as #6
Installer Used: amica_0.2.0_x64-setup.exe
Due to my negligence, I forgot to create a new folder during the installation of Amica and directly chose D:\Apps. Amica was installed in D:\Apps\.
When I uninstalled it, the uninstaller started deleting the entire D:\Apps folder. Fortunately, I terminated the process before some of my important apps (like GoLand and Unreal Engine) were deleted; only some less-used apps and my Epic Games were removed. T_T
I'm used to Windows installers, such as those for GoLand and UE, creating a new folder automatically. When I install them, a new folder is created:
D:\Apps -> D:\Apps\Goland 20xx-xx\
D:\Apps -> D:\Apps\UE_5.3\
After reading some Tauri docs, I came up with two possible solutions:
I think this is troublesome and it might not be in the scope of Amica.
Because Tauri does not provide relevant configurations, it is necessary to use a custom NSIS script. Here are some docs I found:
Add a note in the Amica documentation to remind users to create a new folder during installation to prevent the unintentional deletion of important files during uninstallation.
What do you think about it? >_<
First of all, this is a cool project, thank you for making it open source!
The issue is about the system_prompt configuration variable: https://github.com/semperai/amica/blob/438c09ccd42b7f9afc8363afb14b00bc2e94731f/src/utils/config.ts#L37C112-L37C112
The default system prompt is a bit weird at times. I am not sending a patch because, while the changes are trivial, I am not certain of the intention of the authors.
There are five types of emotions: 'neutral' which indicates normal, 'happy' which indicates joy, 'angry' which indicates anger, 'sad' which indicates sadness, and 'relaxed' which indicates calmness.
While "normal" can be used as a noun, I believe "normality" would feel more natural in this context.
The prompt then uses the word "conversation" to refer to individual messages. "Message" would be more natural.
Please do not use polite language.
This is counterintuitive. Is there any reason for asking the model to not use polite language?
Ideally this could be generic enough to contain any sort of multimedia
I think it would be nice to create a colab.
The following somewhat works:
# Install node
!dpkg --configure -a
!sudo apt-get update
!sudo apt-get install -y ca-certificates curl gnupg
!sudo mkdir -p /etc/apt/keyrings
!curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
!NODE_MAJOR=21 && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
!sudo apt-get update
!sudo apt-get install nodejs -y
# Clone repo
!git clone https://github.com/semperai/amica.git
%cd amica
!npm install
from google.colab.output import eval_js
print(eval_js("google.colab.kernel.proxyPort(3000)"))
import subprocess #I don't know if that is even needed
subprocess.Popen(["npm", "run", "dev"])
It only gives some error because of the service worker and loading the avatar takes some time, but the echo mode works fine in my tests.
How can I change the text-to-speech model? Is it possible to use another .bin, like voxpopuli for Italian, or one we trained ourselves? I tried adding the voxpopuli.bin file to the public directory, but the app stops working with no information in the debug output.
The main difference is that the original models are 2 KB while the voxpopuli one is 500 MB.
Using https://github.com/daswer123/xtts-api-server is one option to have XTTS support, I've looked at the code though and it relies on the local filesystem to share voice files between the client and server.
There is also this https://github.com/coqui-ai/xtts-streaming-server not sure how it would be used though.
We should add function calling. Since the model runs on Ollama, we can add Ollama functions so that Amica can search the web, extract info from news, turn lights on and off, etc.
This would open the door to further community involvement and could foster a dev ecosystem.
It could also provide a way to split up current features and implementations into modules.
Refactoring the current architecture could be costly, but in the long run it could pay off in many ways, such as not having to individually implement and maintain multiple LLM engine apis and such.
However, it is worth considering the end purpose of this application and comparing it to others. Probably wouldn't want it to end up too similar to oobabooga.
Thoughts?
So users can import images and vrms and have them visible in ui.
Requires:
Most models have poor generation quality in languages other than English, so much so that intermediate translation around each generation (human input → English, AI output → user language) would be a better approach. Can we hope for this feature to be added?
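The intermediate-translation idea can be sketched as a thin wrapper around whatever chat backend is in use. Everything here is hypothetical: `chat_via_english`, the `translate` callable and its `(text, source, target)` signature stand in for a real translation service, which this sketch deliberately does not name.

```python
from typing import Callable

# Hypothetical signatures: translate(text, source_lang, target_lang) -> text
Translate = Callable[[str, str, str], str]
Chat = Callable[[str], str]


def chat_via_english(user_text: str, user_lang: str,
                     translate: Translate, chat: Chat) -> str:
    """Translate the user's input to English, chat, then translate the reply back."""
    if user_lang == "en":
        return chat(user_text)  # no translation round trip needed
    english_in = translate(user_text, user_lang, "en")
    english_out = chat(english_in)
    return translate(english_out, "en", user_lang)
```

Passing the translator and chat backend in as callables keeps the pipeline independent of any particular provider, at the cost of two extra translation calls per turn.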
Thank you for Amica, I'm having a lot of fun with it!
I tried using the yi-34B models with llama.cpp with it, and generally it works, but the results aren't optimal. For example, the reply always ends with "<|im_end|>" which ends up visible in the conversation.
It would be great to have a way to not just change the system prompt, but to change the prompting template altogether. Here's ChatML, which Yi is trained to use:
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
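As a rough illustration of what a configurable template would need to do, the sketch below fills in the ChatML layout shown above and strips the trailing end marker from a reply. The function names are made up for this example, and stripping on the client side is only a fallback for backends that do not stop at the marker themselves.

```python
CHATML_END = "<|im_end|>"


def format_chatml(system_message: str, prompt: str) -> str:
    """Build a single-turn ChatML prompt that ends at the assistant header."""
    return (
        f"<|im_start|>system\n{system_message}{CHATML_END}\n"
        f"<|im_start|>user\n{prompt}{CHATML_END}\n"
        "<|im_start|>assistant\n"
    )


def clean_reply(reply: str) -> str:
    """Drop the end-of-turn marker and anything the model generated after it."""
    return reply.split(CHATML_END, 1)[0].strip()
```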
In the different implementations you missed KoboldAI's API.
Its interactive documentation can be found on our sample instance : https://koboldai-koboldcpp-tiefighter.hf.space/api
Python based example can be found here : https://github.com/henk717/KoboldAI/blob/united/api_example.py
Hey, first off love the project.
I wonder if it is possible to build the web version as a docker image. It would make deploying much easier.
If possible add a docker-compose file that also contains the needed endpoints such as ollama. So setup is very easy
This is related to microphone, probably from it not existing, or browser not giving access to it. Need to find way to reproduce.
Hello,
This is a follow-up to the Dockerfile I made (#61).
Would it be possible to publish the Docker image to either Docker Hub or GitHub Packages?
I could write a GitHub Action that automatically builds the Docker image and puts it on GitHub Packages. I think this could be possible with Docker Hub too.
It could be written so that it runs on every commit (for a beta image) and on every release (every major update).
We should be able to import all animations from mixamo (and others!) and use a similar system to the expression system to play animations when appropriate.
look up heyamica.com/releases.json and compare version with json, show some indicator (maybe with system tray?) which prompts users to download -> links to download page.
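The version comparison itself could look roughly like the sketch below. The layout of releases.json is not specified in this issue, so the top-level "version" key is an assumption, as are the function names; fetching the file and showing the indicator are left out.

```python
import json
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.2.0' or 'v0.2.0' into a tuple of ints so it compares numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def update_available(current: str, releases_json: str) -> bool:
    """Return True if the release listing advertises a newer version."""
    data = json.loads(releases_json)
    latest = data["version"]  # assumed key; the real file may be shaped differently
    return parse_version(latest) > parse_version(current)
```

Comparing tuples of integers rather than raw strings matters once a component reaches two digits: as strings, "0.10.0" would sort before "0.9.0".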
Let's say we want to support home automation, currently it is possible to support by forking and adding some home automation specific code. However, with the wide variety of different home automation systems, it may make more sense to instead have a script which connects to amica, can say things as a user, and can read what she is saying (or other events).
The amica websocket server could use JSON RPC.
Supports receiving these:
register: {client_name} (registers client)
messages (get message list)
set_messages
message: {store (store in message list), interrupt (whether to interrupt amica)} (send message to amica from client)
set_animation
set_expression
set_config: {key, value}
Support broadcasting these sorts of events:
message_received: {from (which client, web interface, or amica/assistant)}
interruption (currently spoken stream interrupted by user; include data to figure out where it was interrupted)
config_change: {key, value}
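To make the proposal concrete, here is a small sketch of how a client might encode these calls as JSON-RPC 2.0 messages. The method names come from the list above; the helper names, the `text` field on `message`, and the choice to model broadcasts as id-less notifications are assumptions, since no such server exists yet.

```python
import itertools
import json
from typing import Any, Dict, Optional

_request_ids = itertools.count(1)  # JSON-RPC requests carry a unique id


def rpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Encode a JSON-RPC 2.0 request that expects a response."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def rpc_notification(method: str, params: Dict[str, Any]) -> str:
    """Encode a JSON-RPC 2.0 notification (no id, so no response is expected)."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params})


# Client -> server: register, then send a message ("text" is a hypothetical field).
register = rpc_request("register", {"client_name": "home-automation"})
say = rpc_request("message", {"text": "The lights are on.", "store": True, "interrupt": False})

# Server -> clients: a broadcast event modeled as a notification.
event = rpc_notification("config_change", {"key": "tts_backend", "value": "none"})
```

Each string would be sent as one websocket text frame; the transport is omitted here so the sketch stays independent of any websocket library.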
This may make it possible to do really cool things like:
There have been persistent problems with getting the chatbot to use the correct format of [emotion] message. An idea shared with me was to instead utilize emojis. These fail gracefully, and there should already be a lot of training data utilizing emojis. We can search for specific types and trigger expression changes or, for some emojis, trigger animations. This should hopefully allow a non-finetuned model to trigger any of the gestures and expressions already exposed by the emoji system; we will need to import the animations to make this fully work.
A much larger amount of expressions can be grabbed correctly (see https://emojipedia.org/smileys )
Some I think interesting from https://emojipedia.org/people :
This is a pretty large change but I think it should provide a much better route.
We also should save some space in prompt context.
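A minimal sketch of the emoji idea, assuming a hand-written lookup table. The particular emoji-to-expression pairs and the function name are illustrative choices; the expression names are the ones the current system prompt already uses (happy, angry, sad, relaxed).

```python
from typing import List, Tuple

# Illustrative mapping only; escapes are used so the file stays ASCII-safe.
EMOJI_TO_EXPRESSION = {
    "\U0001F60A": "happy",    # smiling face with smiling eyes
    "\U0001F604": "happy",    # grinning face with smiling eyes
    "\U0001F620": "angry",    # angry face
    "\U0001F622": "sad",      # crying face
    "\U0001F60C": "relaxed",  # relieved face
}


def extract_expressions(text: str) -> Tuple[str, List[str]]:
    """Return the text with mapped emojis removed, plus the expressions they trigger."""
    expressions: List[str] = []
    kept: List[str] = []
    for char in text:
        expression = EMOJI_TO_EXPRESSION.get(char)
        if expression is not None:
            expressions.append(expression)
        else:
            kept.append(char)  # unknown emojis and normal text pass through untouched
    cleaned = " ".join("".join(kept).split())  # collapse gaps left by removed emojis
    return cleaned, expressions
```

This is where the "fails gracefully" property shows up: a reply with no recognized emoji comes back unchanged with an empty expression list, instead of leaking a malformed [emotion] tag into the conversation.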
The Character Card Specification provides guidelines for creating characters. These character cards contain information about a specific character, including their name, description, personality, scenario, first message, and example conversations. A card can be embedded into a PNG image.
A lorebook is a series of defined keywords that, when activated, insert specific content into the prompt. They can be used to serve content to the AI about the character's backstory, setting, environment, etc without needing to have it be in character definitions taking up permanent token space.
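The keyword-triggered behavior of a lorebook can be sketched in a few lines. This is a simplified model with assumed names (`active_lore`, `build_prompt`) and plain case-insensitive substring matching; real lorebook formats carry more fields, such as priorities and scan depth.

```python
from typing import Dict, List


def active_lore(lorebook: Dict[str, str], recent_text: str) -> List[str]:
    """Return the content of every entry whose keyword appears in the recent text."""
    lowered = recent_text.lower()
    return [content for keyword, content in lorebook.items() if keyword.lower() in lowered]


def build_prompt(system_prompt: str, lorebook: Dict[str, str], recent_text: str) -> str:
    """Append only the triggered lore entries, so unused lore costs no tokens."""
    return "\n".join([system_prompt, *active_lore(lorebook, recent_text)])
```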
Additionally, external tools like https://Chub.ai can be used to create and share character cards and lorebooks.
Refs:
Thanks a lot for developing this project! The result is pretty impressive.
Currently, the package.json only specifies that Node 18 is supported. It would be nice if newer Node versions could be used. I just tried it with Node 21 (the "Echo" chat mode), and it seemed to work.
Self signed cert and letsencrypt preferably
It would be nice to distribute an easy to install method for people that are not technical.
If a script could be run in amica repo which would:
Then we can share the .pkg file on heyamica.com and make it easier for people to try this out.
Some questions:
For the MacOS release (.dmg) v0.2.0, the microphone cannot be used, and there are many errors saying 'vad error {"message":"getUserMedia is not implemented in this browser"}'.
Imagine you can save:
As a tarball or equivalent. Then, these can be loaded into the ui. There could be a list for selecting characters you have imported.
Hi.
It's not clear to me what model is being used in the online demo, but it appears to be LLaMA 2, correct me if I am wrong.
If this is so, may I suggest considering switching to something like mistral-openorca, zephyr, dolphin2.2.1-mistral, openhermes2.5-mistral, openchat or neural-chat?
Unlike LLaMA (2), which has a proprietary license by Meta, these models are all under an open source license and are based on Mistral, which is under the Apache 2.0 license and some of them may even outperform LLaMA 2.
0.1.15 of ollama has vision support https://github.com/jmorganca/ollama/releases/tag/v0.1.15
Currently, there are 3 backend components: chat, text-to-speech (TTS), speech-to-text (STT). Also, a storage backend is needed and the UI could be considered as a special fifth backend component too.
IMO, all interfaces in the future are only auxiliary, and all interfaces should be able to be called by talking to the bot.