semperai / amica

542 stars · 12 watchers · 81 forks · 56.56 MB

Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.

Home Page: https://heyamica.com

License: MIT License

JavaScript 3.78% TypeScript 93.67% CSS 0.32% Shell 0.83% Rust 0.50% Dockerfile 0.51% HTML 0.38%
ai assistant-chat-bots computer-vision llm speech-recognition tts

amica's People

Contributors

andyccliao, aspie96, batoracli, danyyil-pelekhatyi, dreamjz, flukexp, illtellyoulater, inoueharutaka, kasumi-1, ke456-png, morphles, patricklearn, slowsynapse, snowyu, wilaz, yvonne-aizawa


amica's Issues

Basic character save/load

I know there is already a more advanced character-card issue; I might even look into that. But this is for very basic load/save. I actually have it almost done: select/load is working, and I still have to sort out saving and SQLite DB initialization, but I hope to open a PR in the coming couple of days. So this is a placeholder issue for that. Also, as this one is more involved than my previous changes, I expect some comments, especially with regards to translations, but hopefully you'll just be able to edit that stuff simply.

Wake word?

It would be great to be able to set a wake word, so that if you are talking to someone else, Amica doesn't get confused.

Use Mixamo animations?

For almost a week now I have been looking for somewhere to download free VRMA animations and couldn't really find any. I found some guides on how to convert other formats (Mixamo in particular) using, say, Blender. I still haven't gotten to it, but searching again, it seems the underlying library already supports Mixamo animations: https://github.com/josephrocca/ChatVRM-js and it's even in this repo:

export function loadMixamoAnimation(url: string, vrm: VRM) {
but it is not exposed in any way. As Mixamo is free and provides loads of animations, I think it would be of great benefit to allow loading those animations via the interface, or maybe by adding them to a server directory (like I did with a bunch of VRMs) to select from the menu. I'm so bummed out by the idle-only animation that I might look into it myself come the weekend or so.
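For reference, here is a rough sketch of how that helper could be wired up from the app side, assuming loadMixamoAnimation resolves to a THREE.AnimationClip already retargeted to the VRM (the import path and the return type are assumptions based on the ChatVRM-js code above):

import * as THREE from "three";
import { VRM } from "@pixiv/three-vrm";
import { loadMixamoAnimation } from "./loadMixamoAnimation"; // path is hypothetical

// Load a Mixamo FBX animation, retarget it onto the VRM, and start playing it.
async function playMixamoAnimation(vrm: VRM, url: string): Promise<THREE.AnimationMixer> {
  const clip = await loadMixamoAnimation(url, vrm);
  const mixer = new THREE.AnimationMixer(vrm.scene);
  mixer.clipAction(clip).play();
  return mixer; // the render loop must call mixer.update(deltaSeconds) each frame
}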

Support Text-Generation-WebUI

I mainly use exl2 models since they give better performance for local models, and Text-Generation-WebUI is the best option for running them locally.
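Text-Generation-WebUI can expose an OpenAI-compatible API when its API extension is enabled, so one low-effort path would be to point an OpenAI-style backend at it. A rough sketch (the port, path, and startup flag are assumptions about a default text-generation-webui setup, not Amica's existing config):

// Assumes text-generation-webui was started with its OpenAI-compatible API enabled
// (e.g. `--api`), which by default listens on http://127.0.0.1:5000/v1.
async function chatViaTextGenWebUI(messages: { role: string; content: string }[]) {
  const res = await fetch("http://127.0.0.1:5000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, max_tokens: 512, temperature: 0.7 }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}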

Customization options

I like some things about this interface [mainly the ease of the voice stuff, esp. VAD] (though after some digging it seems heavily related to some Japanese stuff, whatever though...), but the customization options are limited.

There seems to be a system prompt that is not changeable in the interface. It looks like it should be changeable via the .env.local variable NEXT_PUBLIC_SYSTEM_PROMPT, but somehow it seems to have no effect. Even if it did, there is a problem: there is some prompt crafting in the code that has 'Amica' hard-coded, so a name change would likely be funky anyway. I'm not sure what is preventing the variable from loading from .env.local, or maybe I'm doing something wrong... I will keep working on it, but a hard-coded name is not good.
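For anyone hitting the same thing: a likely cause is that NEXT_PUBLIC_* variables are inlined by Next.js at build time, so the dev server has to be restarted (or the app rebuilt) after editing .env.local. A minimal sketch of how the default could be read in config code (the fallback text is a placeholder, not the exact Amica source):

// Sketch only: NEXT_PUBLIC_* values are baked in when `npm run dev` / `npm run build`
// starts, so changes to .env.local require a restart to take effect.
export const systemPrompt: string =
  process.env.NEXT_PUBLIC_SYSTEM_PROMPT ??
  "You are Amica, a friendly 3D character."; // hypothetical fallback text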

Special characters - other languages

Hey! Very interesting work! Thanks!

Could you tell me how I can add support for other languages? It doesn't display Polish special characters during conversation, which leads to wrong pronunciation by the TTS model.

Input works, but the output from the model shows truncated text without the special characters.

getUserMedia

How do you use this on a headless server? The latest browsers (Firefox/Chrome) want an HTTPS connection, so instead of localhost I just enter the IP of the server on the LAN. Where do I install the certificates?

vad error {"message":"getUserMedia is not implemented in this browser"}

This is nice though!
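One way around this without touching the app is to terminate TLS in front of the dev server with a self-signed certificate and browse to the HTTPS port; the browser will then allow getUserMedia on a non-localhost origin once the certificate is trusted. A minimal sketch, assuming a cert/key pair already exists (generated with mkcert or openssl) and using the third-party http-proxy package:

// tls-proxy.ts - forward https://<lan-ip>:3443 to the Next.js dev server on :3000.
// Certificate paths and the port are examples.
import { createServer } from "https";
import { readFileSync } from "fs";
import httpProxy from "http-proxy"; // npm install http-proxy

const proxy = httpProxy.createProxyServer({ target: "http://localhost:3000" });

createServer(
  { key: readFileSync("./key.pem"), cert: readFileSync("./cert.pem") },
  (req, res) => proxy.web(req, res)
).listen(3443);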

Feat: Is it possible to auto-create a new folder during the installation of Amica?

Installer Used: amica_0.2.0_x64-setup.exe

Issue

Due to my negligence, I forgot to create a new folder during the installation of Amica and directly chose D:\Apps, so Amica was installed in D:\Apps\.

When I uninstalled it, the uninstaller started deleting the entire D:\Apps folder. Fortunately, I terminated the process before some of my important apps (like GoLand and Unreal Engine) were deleted; only some less-used apps and my Epic Games were removed. T_T

I'm used to Windows software installations that usually create a new folder automatically, such as GoLand and UE. When I install them, a new folder is created automatically:

D:\Apps -> D:\Apps\Goland 20xx-xx\
D:\Apps -> D:\Apps\UE_5.3\

Possible Solutions

After reading some Tauri docs, I came up with two possible solutions:

1. Customize the Windows installer

I think this is troublesome and might not be in the scope of Amica.

Because Tauri does not provide the relevant configuration, a custom NSIS script would be necessary. Here are some docs I found:

2. Tell users to create a new folder

Add a note in the Amica documentation to remind users to create a new folder during installation to prevent the unintentional deletion of important files during uninstallation.

What do you think about it? >_<

Potential errors in the system prompt

First of all, this is a cool project, thank you for making it open source!

The issue is about the system_prompt configuration variable: https://github.com/semperai/amica/blob/438c09ccd42b7f9afc8363afb14b00bc2e94731f/src/utils/config.ts#L37C112-L37C112

The default system prompt is a bit weird at times. I am not sending a patch because, while the changes are trivial, I am not certain of the intention of the authors.

There are five types of emotions: 'neutral' which indicates normal, 'happy' which indicates joy, 'angry' which indicates anger, 'sad' which indicates sadness, and 'relaxed' which indicates calmness.

While "normal" can be used as a noun, I believe "normality" would feel more natural in this context.

The prompt then uses the word "conversation" to refer to individual messages. "Message" would be more natural.

Please do not use polite language.

This is counterintuitive. Is there any reason for asking the model to not use polite language?

Colab

I think it would be nice to create a colab.

The following somewhat works:

# Install node
!dpkg --configure -a
!sudo apt-get update
!sudo apt-get install -y ca-certificates curl gnupg
!sudo mkdir -p /etc/apt/keyrings
!curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
!NODE_MAJOR=21 && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
!sudo apt-get update
!sudo apt-get install nodejs -y

# Clone repo
!git clone https://github.com/semperai/amica.git
%cd amica
!npm install

from google.colab.output import eval_js
print(eval_js("google.colab.kernel.proxyPort(3000)"))

import subprocess  # needed for Popen below
subprocess.Popen(["npm", "run", "dev"])

It only gives some errors because of the service worker, and loading the avatar takes some time, but the echo mode works fine in my tests.

How is it possible to add new SpeechT5 models?

How is it possible to change the text-to-speech model? Is it possible to use other .bin files, like VoxPopuli for the Italian language, or models trained by ourselves? I tried to add the voxpopuli.bin file to the public directory, but the app stops working with no info in debug.
The main difference is that the original model is 2 kilobytes while the VoxPopuli one is 500 megabytes.
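I can't speak for how Amica wires this up internally, but if the SpeechT5 backend goes through transformers.js, the model is normally specified when the text-to-speech pipeline is created rather than by dropping a raw .bin into public/. A hedged sketch (the model id and the speaker-embeddings URL are examples, not Amica's actual configuration):

import { pipeline } from "@xenova/transformers";

// Example: use a different SpeechT5 checkpoint hosted on the Hugging Face hub;
// a raw 500 MB .bin dropped into public/ will not be picked up by itself.
const tts = await pipeline("text-to-speech", "Xenova/speecht5_tts");
const result = await tts("Ciao, come stai?", {
  // SpeechT5 also needs speaker embeddings; this URL is an illustrative example.
  speaker_embeddings:
    "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin",
});
// result.audio is a Float32Array at result.sampling_rate Hz.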

Custom actions

We should add function calling. Since the model runs through Ollama, we can add Ollama functions so that Amica can search the web, extract info from news, turn lights on and off, etc.
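As an illustration (not tied to any particular Ollama API), one simple approach is to ask the model to emit a JSON action and dispatch it to registered handlers:

// Hypothetical action dispatcher: the model is prompted to answer either with plain
// text or with a JSON object like {"action": "search_web", "args": {"query": "..."}}.
type ActionHandler = (args: Record<string, unknown>) => Promise<string>;

const actions: Record<string, ActionHandler> = {
  search_web: async (args) => `results for ${args.query}`, // stub
  set_lights: async (args) => `lights set to ${args.state}`, // stub
};

async function handleModelOutput(output: string): Promise<string> {
  try {
    const parsed = JSON.parse(output);
    const handler = actions[parsed.action];
    if (handler) return await handler(parsed.args ?? {});
  } catch {
    // not JSON: fall through and treat it as a normal chat reply
  }
  return output;
}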

Design and create an extension API

This would open the door to further community involvement and could foster a dev ecosystem.
It could also provide a way to split up current features and implementations into modules.

Refactoring the current architecture could be costly, but in the long run it could pay off in many ways, such as not having to individually implement and maintain multiple LLM engine APIs.
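To make the idea concrete, the plugin surface could be as small as a handful of lifecycle hooks; everything below (names, hook set) is hypothetical:

// Hypothetical extension interface: each module registers for the events it cares about.
export interface AmicaExtension {
  name: string;
  onUserMessage?(text: string): Promise<string | void>;   // may rewrite the user input
  onAssistantMessage?(text: string): Promise<void>;        // observe replies (e.g. logging)
  onExpression?(expression: string): Promise<void>;        // react to expression changes
}

export class ExtensionHost {
  private extensions: AmicaExtension[] = [];
  register(ext: AmicaExtension) { this.extensions.push(ext); }
  async emitUserMessage(text: string): Promise<string> {
    for (const ext of this.extensions) {
      text = (await ext.onUserMessage?.(text)) ?? text;
    }
    return text;
  }
}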

However, it is worth considering the end purpose of this application and comparing it to others. Probably wouldn't want it to end up too similar to oobabooga.

Thoughts?

Support mid-phrase interruption

Requires:

  • detect when the user has been typing, to interrupt
  • detect when the user has been speaking long enough, to interrupt
  • stop the currently spoken stream (audio) or the chat response stream mid-phrase (see the sketch below)
  • optional: apply some sort of audio filter for a realistic pause in speech
  • detect how much of the phrase has been spoken, to include it in context
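The stopping part at least maps onto standard web APIs; a rough sketch, assuming the chat response is read as a fetch stream and the speech is played through an HTMLAudioElement:

// Hypothetical interruption handles: abort the in-flight completion stream and cut audio.
const controller = new AbortController();

async function streamChat(url: string, body: unknown, onToken: (t: string) => void) {
  const res = await fetch(url, {
    method: "POST",
    body: JSON.stringify(body),
    signal: controller.signal, // calling controller.abort() stops the stream mid-phrase
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onToken(decoder.decode(value));
  }
}

function interrupt(audio: HTMLAudioElement) {
  controller.abort(); // stop the text stream
  audio.pause();      // stop the spoken audio immediately
  audio.currentTime = 0;
}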

Autotranslate - Google Translate / Deepl API?

Most models have poor generation quality in languages other than English, so much so that intermediate translation between generations (human input ⇒ English, AI output ⇒ user language) would be a better approach. Can we hope to add this feature?
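A hedged sketch of what the translation hop could look like against DeepL's REST API (endpoint, header, and field names as I recall them for the free tier; verify against the current DeepL docs):

// Translate text with DeepL; DEEPL_API_KEY and the api-free host are assumptions.
async function translate(text: string, targetLang: string): Promise<string> {
  const res = await fetch("https://api-free.deepl.com/v2/translate", {
    method: "POST",
    headers: {
      "Authorization": `DeepL-Auth-Key ${process.env.DEEPL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text: [text], target_lang: targetLang }),
  });
  const data = await res.json();
  return data.translations[0].text;
}

// Usage: user input -> English before the LLM, reply -> user language afterwards, e.g.
// const english = await translate(userInput, "EN");
// const localized = await translate(llmReply, "PL");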

Support different prompting templates

Thank you for Amica, I'm having a lot of fun with it!

I tried using the Yi-34B models with llama.cpp, and generally it works, but the results aren't optimal. For example, the reply always ends with "<|im_end|>", which ends up visible in the conversation.

It would be great to have a way to not just change the system prompt, but to change the prompting template altogether. Here's ChatML, which Yi is trained to use:

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
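Support could be as simple as making the template a function of the message list; a sketch for ChatML specifically (the Message shape is an assumption):

type Message = { role: "system" | "user" | "assistant"; content: string };

// Build a ChatML prompt like the example above and stop generation on <|im_end|>.
function buildChatMLPrompt(messages: Message[]): { prompt: string; stop: string[] } {
  const prompt =
    messages.map((m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>`).join("\n") +
    "\n<|im_start|>assistant\n";
  return { prompt, stop: ["<|im_end|>"] };
}

Passing the stop sequence through to llama.cpp would also fix the visible "<|im_end|>" in replies.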

create docker file for web version

Hey, first off love the project.

I wonder if it is possible to build the web version as a Docker image. It would make deploying much easier.

If possible, also add a docker-compose file that contains the needed endpoints, such as Ollama, so setup is very easy.

build docker image and publish it.

Hello,
In response to the Dockerfile I made in #61:

Would it be possible to publish the docker image to Docker Hub or GitHub Packages?

I could write a GitHub Action that automatically builds the docker image and pushes it to GitHub Packages; I think this should be possible with Docker Hub too.

It could be written such that it runs on every commit (for a beta image) and on every release (every major update).

Build controllable animation system

We should be able to import all animations from Mixamo (and others!) and use a system similar to the expression system to play animations when appropriate.

[tauri] check for updated version

Look up heyamica.com/releases.json and compare the version against the JSON; show some indicator (maybe via the system tray?) which prompts users to download and links to the download page.
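A rough sketch of the check itself (the shape of releases.json is an assumption; this ignores the system-tray part):

// Hypothetical releases.json shape: { "version": "0.2.1", "url": "https://heyamica.com/..." }
async function checkForUpdate(currentVersion: string) {
  const res = await fetch("https://heyamica.com/releases.json");
  const latest: { version: string; url: string } = await res.json();
  if (latest.version !== currentVersion) {
    return latest; // caller shows an indicator linking to latest.url
  }
  return null;
}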

Design websocket protocol for 2 way interaction servers

Let's say we want to support home automation. Currently it is possible to support it by forking and adding some home-automation-specific code. However, with the wide variety of different home automation systems, it may make more sense to instead have a script which connects to amica, can say things as a user, and can read what she is saying (or other events).

The amica websocket server could use JSON-RPC; a minimal client sketch follows the lists below.

Supports receiving these:

  • register: {client_name} (registers the client)
  • messages (get message list)
  • set_messages
  • message: {store (store in message list), interrupt (whether to interrupt amica)} (send a message to amica from the client)
  • set_animation
  • set_expression
  • set_config: {key, value}

Support broadcasting these sorts of events:

  • message_received: {from (which client, the web interface, or amica/assistant)}
  • interruption (currently spoken stream interrupted by the user; include data to figure out where it was interrupted)
  • config_change: {key, value}

This may make it possible to do really cool things like:

  • add a parser for when the user sends a YouTube video link: run a script that downloads the video, provides it to amica somehow, and lets the user talk about it with amica
  • allow the user to discuss a GitHub repo with amica
  • connect amica to an IRC (or similar) chatroom
  • have amica be able to query the internet or perform other actions
  • connect amica to Minecraft to allow her to play with you
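A minimal sketch of a client for such a protocol, assuming the JSON-RPC-style message names above (the exact envelope and URL are still to be designed):

// Hypothetical home-automation client: registers itself, sends a message to amica,
// and listens for broadcast events like message_received or config_change.
const ws = new WebSocket("ws://localhost:3000/ws"); // URL is an example

ws.addEventListener("open", () => {
  ws.send(JSON.stringify({ jsonrpc: "2.0", id: 1, method: "register", params: { client_name: "home-automation" } }));
  ws.send(JSON.stringify({ jsonrpc: "2.0", id: 2, method: "message", params: { text: "The front door just opened.", store: true, interrupt: false } }));
});

ws.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.method === "message_received") {
    console.log("amica said:", msg.params);
  }
});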

Switch to using emojis for emotion system (and animations)

There have been persistent problems with getting the chatbot to use the correct [emotion] message format. An idea shared with me was to instead utilize emojis. These fail gracefully, and there should already be a lot of training data utilizing emojis. We can search for specific types and trigger expression changes, or, for some emojis, trigger animations. This should hopefully allow a non-finetuned model to trigger any of the gestures and expressions already exposed by the emoji system; we will need to import the animations to make this fully work.

A much larger set of expressions can be matched correctly (see https://emojipedia.org/smileys).

Some I find interesting from https://emojipedia.org/people:

  • thumbs up/down
  • waving
  • dancing
  • bowing
  • facepalming
  • juggling
  • lotus position

This is a pretty large change but I think it should provide a much better route.

We would also save some space in the prompt context.
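A sketch of the mapping layer (the emoji choices and expression/animation names are illustrative):

// Scan a reply for known emojis and translate them into expression or animation triggers.
const emojiExpressions: Record<string, string> = {
  "😊": "happy",
  "😠": "angry",
  "😢": "sad",
  "😌": "relaxed",
};

const emojiAnimations: Record<string, string> = {
  "👍": "thumbs_up",
  "👋": "wave",
  "💃": "dance",
  "🙇": "bow",
  "🤦": "facepalm",
};

function extractTriggers(reply: string) {
  const expressions = Object.keys(emojiExpressions).filter((e) => reply.includes(e));
  const animations = Object.keys(emojiAnimations).filter((e) => reply.includes(e));
  return {
    expressions: expressions.map((e) => emojiExpressions[e]),
    animations: animations.map((e) => emojiAnimations[e]),
    text: reply, // unmapped emojis simply stay in the text, so they fail gracefully
  };
}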

feat: add Character Card and Lorebooks to prompt

Character Card

The Character Card Specification provides guidelines for creating characters. These character cards contain information about a specific character, including their name, description, personality, scenario, first message, and example conversations. They can be embedded into PNG images.

Lorebooks

A lorebook is a series of defined keywords that, when activated, insert specific content into the prompt. It can be used to serve content to the AI about the character's backstory, setting, environment, etc., without needing to keep that in the character definition, where it would take up permanent token space.

Additionally, external tools like https://Chub.ai can be used to create and share character cards and lorebooks.
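For reference, the fields listed above map naturally onto a small record type; the field names below follow the usual card conventions but should be checked against the spec:

// Hedged sketch of a character card and lorebook as used when assembling the prompt.
interface CharacterCard {
  name: string;
  description: string;
  personality: string;
  scenario: string;
  first_mes: string;      // first message shown to the user
  mes_example: string;    // example conversations
}

interface LorebookEntry {
  keys: string[];    // keywords that activate the entry
  content: string;   // text inserted into the prompt when a key matches
}

// Only activated lorebook entries are inserted, so they cost no permanent token space.
function activeLore(entries: LorebookEntry[], recentText: string): string[] {
  return entries
    .filter((e) => e.keys.some((k) => recentText.toLowerCase().includes(k.toLowerCase())))
    .map((e) => e.content);
}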

Refs:

Support newer node versions

Thanks a lot for developing this project! The result is pretty impressive.

Currently, the package.json only specifies that Node 18 is supported. It would be nice if newer Node versions could be used. I just tried it with Node 21 (the "Echo" chat mode), and it seemed to work.

Create script which generates .pkg file

It would be nice to distribute an easy-to-install method for people who are not technical.

If a script could be run in amica repo which would:

  • build the frontend
  • take llama.cpp and the basic-openai-api Python repo, grab their dependencies, and package them
  • have some script that starts llama.cpp and basic-openai, then opens the browser (maybe in kiosk mode?)
  • maybe add this to the top bar so that it can be exited to reclaim memory

Then we can share the .pkg file on heyamica.com and make it easier for people to try this out.

Some questions:

  • can the install script curl openchat and show progress?
  • what about installing all of the python dependencies in a venv?

Provide a way to export / import characters

Imagine you can save:

  • system prompt
  • character model
  • any animations (default animation only for now)
  • background image
  • (or) background video
  • voice cloning file (not currently supported in ui)

As a tarball or equivalent. Then these could be loaded into the UI. There could be a list for selecting characters you have imported.
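A possible manifest for such a bundle (all names hypothetical):

// Hypothetical manifest.json stored alongside the assets inside the tarball.
interface CharacterBundleManifest {
  name: string;
  systemPrompt: string;
  model: string;            // e.g. "amica.vrm"
  animations: string[];     // default animation only for now
  background?: string;      // image or video file in the bundle
  voiceCloneFile?: string;  // not currently supported in the UI
}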

VRM selection is imo suboptimal.

[image] Admittedly I went somewhat overboard with downloading VRMs, but oh well. The thing is, I have no good way to generate previews, and as the screenshot shows, it's not great. There are a couple more issues. One, which I'll probably get to sometime, is that when a new character or animation is loaded, the character gets "reset" (position/orientation/zoom), which I find mildly annoying. But another thing, which IMO is worse, is that the damn menu covers my view and also blurs it, so I can't easily see what I'm selecting/flipping through. So IMO, when choosing a character, the view should not be centered and blurred, to allow easily cycling through and seeing the characters.

Model on online demo

Hi.

It's not clear to me what model is being used in the online demo, but it appears to be LLaMA 2; correct me if I am wrong.

If this is so, may I suggest considering switching to something like mistral-openorca, zephyr, dolphin2.2.1-mistral, openhermes2.5-mistral, openchat or neural-chat?

Unlike LLaMA (2), which has a proprietary license from Meta, these models are all under open source licenses and are based on Mistral, which is under the Apache 2.0 license; some of them may even outperform LLaMA 2.
