
AIlice


Quick Start · Demo · Development · Twitter · Reddit

🔥 Jun 22, 2024: We have entered the era of locally running JARVIS-like AI assistants! The latest open-source LLMs enable us to perform complex tasks locally! Click here to learn more.


AIlice is a fully autonomous, general-purpose AI agent. This project aims to create a standalone artificial intelligence assistant, similar to JARVIS, based on open-source LLMs. AIlice achieves this goal by building a "text computer" that uses a Large Language Model (LLM) as its core processor. Currently, AIlice demonstrates proficiency in a range of tasks, including thematic research, coding, system management, literature reviews, and complex hybrid tasks that go beyond these basic capabilities.

AIlice has reached near-perfect performance in everyday tasks using GPT-4 and is making strides towards practical application with the latest open-source models.

We will ultimately achieve self-evolution of AI agents. That is, AI agents will autonomously build their own feature expansions and new types of agents, unleashing LLM's knowledge and reasoning capabilities into the real world seamlessly.

To understand AIlice's present abilities, watch the following videos:

Document searching and downloading, financial data downloading and analysis with plotting, OpenGL texture rendering

Features

Key technical features of AIlice include:

  • In-depth research capabilities on specialized subjects.
  • The ability to read and analyze articles and scholarly works.
  • Advanced automation in programming and script execution, functioning as a comprehensive coder and an efficient system management tool, similar to an AI-powered operating system.
  • Voice interaction support.
  • Compatibility with open-source models and seamless integration with commercial models like GPT-4.
  • A more intuitive and flexible approach to user interaction, allowing for seamless conversation participation as an agent or the ability to intervene during task execution.
  • Support for multi-modal models.
  • A natural and highly fault-tolerant Interactive Agents Calling Tree architecture.
  • Flexible parsing of LLM outputs, enabling a broader range of function call mechanisms.
  • The capability to self-construct and dynamically load modules for interacting with the environment, providing endless possibilities for expanding features.

Quick Start

Quick Installation

Install and run AIlice with the following commands. Once AIlice is launched, use a browser to open the web page it provides, and a dialogue interface will appear. Issue commands to AIlice through the conversation to accomplish various tasks. For your first use, you can try the commands provided in the COOL things we can do section to get familiar quickly.

git clone https://github.com/myshell-ai/AIlice.git
cd AIlice
pip install -e .
ailice_web --modelID=oai:gpt-4o

COOL things we can do

Let's list some typical use cases. I frequently use these examples to test AIlice during development to ensure stable performance. However, even with these tests, the execution results are influenced by the chosen model, the code version, and even the testing time (GPT-4 may experience a decrease in performance under high load, and random factors can lead to different results across runs; sometimes the LLM performs very intelligently, other times it does not). Additionally, AIlice is an agent based on multi-agent cooperation, and as a user, you are also one of the "agents". Hence, when AIlice requires additional information, it will seek input from you, and the thoroughness of your details is crucial for her success. Furthermore, if the task execution falls short, you can guide her in the right direction, and she will rectify her approach.

The last point to note is that AIlice currently lacks a runtime control mechanism, so she might get stuck in a loop or run for an extended period. When using a commercial LLM, you need to monitor her operation closely.

  • "Please list the contents of the current directory."

  • "Find David Tong's QFT lecture notes and download them to the "physics" folder in the current directory. You may need to create the folder first."

  • "Deploy a straightforward website on this machine using the Flask framework. Ensure accessibility at 0.0.0.0:59001. The website should have a single page capable of displaying all images located in the 'images' directory." This one is particularly interesting. We know that drawing cannot be done in the docker environment, and all the file output we generate needs to be copied using the "docker cp" command to see it. But you can let AIlice solve this problem by itself: deploy a website in the container according to the above prompt(It is recommended to use ports between 59001 and 59200 that has been port mapped), the images in the directory will be automatically displayed on the web page. In this way, you can dynamically see the generated image content on the host. You can also try to let her iterate to produce more complex functions. If you don't see any images on the page, please check whether the "images" folder of the website is different from the "images" folder here (for example, it might be under "static/images").

  • "Please use python programming to solve the following tasks: obtain the price data of BTC-USDT for six months and draw it into a graph, and save it in the 'images' directory." If you successfully deployed the above website, you can now see the BTC price curve directly on the page.

  • "Find the process on port 59001 and terminate it." This will terminate the website service program that was just established.

  • "Please use cadquery to implement a cup." This is also a very interesting attempt. Cadquery is a python package that uses python programming for cad modeling. We try to use AIlice to automatically build 3D models! This can give us a glimpse of how mature geometric intuition can be in LLM's world view. Of course, after implementing multimodal support, we can enable AIlice to see the models she creates, allowing for further adjustments and establishing a highly effective feedback loop. This way, it might be possible to achieve truly usable language-controlled 3D modeling.

  • "Please search the internet for 100 tutorials in various branches of physics and download the PDF files you find to a folder named 'physics'. There is no need to verify the content of pdfs, we only need a rough collection for now." Utilizing AIlice to achieve automatic data set collection and construction is one of our ongoing objectives. Currently, the researcher employed for this functionality still has some deficiencies, but it is already capable of providing some intriguing results.

  • "Please conduct an investigation on open-source PDF OCR tools, with a focus on those capable of recognizing mathematical formulas and converting them into LaTeX code. Consolidate the findings into a report."

  • 1. Find the videos of Feynman's lectures on YouTube and download them to the Feynman/ subdir. You need to create the folder first. 2. Extract the audio from these videos and save it to Feynman/audio. 3. Convert these audio files to text and merge them into a text document. You need to first go to Hugging Face and find the page for whisper-large-v3, locate the example code, and refer to the sample code to get this done. 4. Find the answer to this question from the text files you just extracted: Why do we need antiparticles? This is a multi-step prompt-based task where you interact with AIlice step by step to complete the task. Naturally, there might be unexpected events along the way, so you'll need to maintain good communication with AIlice to resolve any issues you encounter (using the "Interrupt" button to interrupt AIlice at any time and give a prompt is a good option!). Finally, based on the content of the downloaded videos, you can ask AIlice a physics-related question. Once you receive the answer, you can look back and see how far you've come together.

  • 1. Use SDXL to generate an image of "a fat orange cat". You need to find the sample code on its Hugging Face page as a reference to complete the programming and image generation work. Save the image to the current directory and display it. 2. Now let's implement a single-page website. The function of the webpage is to convert the text description entered by the user into an image and display it. Refer to the text-to-image code from before. The website runs on 127.0.0.1:59102. Save the code to ./image_gen before you run it; you may need to create the folder first.

  • "Please write an ext-module. The function of the module is to obtain the content of related pages on the wiki through keywords." AIlice can construct external interaction modules (we call it ext-modules) on her own, thereby endowing her with unlimited extensibility. All it takes is a few prompts from you. Once the module is constructed, you can instruct AIlice by saying, "Please load the newly implemented wiki module and utilize it to query the entry on relativity."

Installation and Usage

Environment Configuration and Installation

Agents need to interact with many aspects of the surrounding environment, so their operating environment is often more complex than that of typical software. Installing the dependencies may take a while, but fortunately, this is largely automated.

To run AIlice, you need to ensure that Chrome is correctly installed. If you need to execute code in a secure virtual environment, you also need to install Docker.

If you want to run AIlice in a virtual machine, ensure Hyper-V is turned off (otherwise llama.cpp cannot be installed). In a VirtualBox environment, you can disable it as follows: disable PAE/NX and VT-x/AMD-V (Hyper-V) in the VirtualBox settings for the VM, set the Paravirtualization Interface to Default, and disable nested paging.

You can use the following commands to install AIlice (it is strongly recommended to use a tool such as conda to create a new virtual environment for AIlice, to avoid dependency conflicts):

git clone https://github.com/myshell-ai/AIlice.git
cd AIlice
pip install -e .

If you need voice dialogue, model fine-tuning, or PDF reading, use one of the following commands (installing too many features increases the likelihood of dependency conflicts, so it is recommended to install only the parts you need):

pip install -e .[huggingface]
pip install -e .[speech]
pip install -e .[finetuning]
pip install -e .[pdf-reading]

You can run AIlice now! Use the commands in Usage.

A default installation of AIlice will run slowly because it uses the CPU for the long-term memory module's inference. It is therefore strongly recommended to configure it to run on a GPU if your hardware allows.

If You Need to Frequently Use Google

By default, the Google module in AIlice is restricted, and repeated usage can lead to errors that take some time to resolve. This is an awkward reality of the AI era; traditional search engines only allow access to genuine users, and AI agents currently don't fall within the category of 'genuine users'. While we have alternative solutions, they all require configuring an API key, which sets a high barrier to entry for ordinary users. However, if you require frequent access to Google, I assume you'd be willing to endure the hassle of applying for Google's official API key (we are referring to the Custom Search JSON API, which requires you to specify searching the entire internet at creation time) for search tasks. For these users, please open config.json and use the following configuration:

{
    ...
    "services": {
        ...
        "google": {
          "cmd": "python3 -m ailice.modules.AGoogleAPI --addr=ipc:///tmp/AGoogle.ipc --api_key=YOUR_API_KEY --cse_id=YOUR_CSE_ID",
          "addr": "ipc:///tmp/AGoogle.ipc"
        },
        ...
    }
}

and install google-api-python-client:

pip install google-api-python-client

Then simply restart AIlice.
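
If you want to verify that your API key and CSE ID work before wiring them into AIlice, here is a minimal sanity-check sketch that calls the Custom Search JSON API directly (the query string and result handling below are purely illustrative):

# Requires: pip install google-api-python-client
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"   # the same values you put into config.json
CSE_ID = "YOUR_CSE_ID"

service = build("customsearch", "v1", developerKey=API_KEY)
results = service.cse().list(q="AIlice AI agent", cx=CSE_ID, num=3).execute()
for item in results.get("items", []):
    print(item["title"], "-", item["link"])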

Accelerate using GPU

The vector database is currently AIlice's sole long-term memory mechanism, accessed frequently during runtime. We currently use CPU as the default device for embedding vector calculations, which may result in high CPU usage and slower response times. If you have GPU support for llama.cpp, switching to the GPU version of llama-cpp-python is the optimal choice:

pip uninstall llama-cpp-python

For Nvidia CUDA GPUs:

CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python

For AMD ROCM GPUs:

CMAKE_ARGS="-DGGML_HIPBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

Use Vulkan:

CMAKE_ARGS="-DGGML_VULKAN=on" FORCE_CMAKE=1 pip install llama-cpp-python

Virtual Environment Settings for Code Execution

By default, code execution utilizes the local environment. To prevent potential AI errors leading to irreversible losses, it is recommended to install Docker, build a container, and modify AIlice's configuration file (AIlice will provide the configuration file location upon startup). Configure its code execution module (AScripter) to operate within a virtual environment.

docker build -t env4scripter .
docker run -d -p 127.0.0.1:59000-59200:59000-59200 --name scripter env4scripter

In my case, when AIlice starts it informs me that the configuration file is located at ~/.config/ailice/config.json, so I modify it as follows:

nano ~/.config/ailice/config.json

Modify "scripter" under "services":

{
    ...
    "services": {
        ...
        "scripter": {"cmd": "docker start scripter",
                     "addr": "tcp://127.0.0.1:59000"},
    }
}

The environment configuration is now complete.

Code Update

Due to the ongoing development status of AIlice, updating the code may cause incompatibility between the existing configuration file or Docker container and the new code. The most thorough solution for this scenario is to delete the configuration file (making sure to save any API keys beforehand) and the container, and then perform a complete reinstall. However, for most situations, you can address the issue by simply deleting the configuration file and updating the AIlice module within the container.

rm ~/.config/ailice/config.json
cd AIlice
docker cp ailice/__init__.py scripter:scripter/ailice/__init__.py
docker cp ailice/common/__init__.py scripter:scripter/ailice/common/__init__.py
docker cp ailice/common/ADataType.py scripter:scripter/ailice/common/ADataType.py
docker cp ailice/common/lightRPC.py scripter:scripter/ailice/common/lightRPC.py
docker cp ailice/modules/__init__.py scripter:scripter/ailice/modules/__init__.py
docker cp ailice/modules/AScripter.py scripter:scripter/ailice/modules/AScripter.py
docker cp ailice/modules/AScrollablePage.py scripter:scripter/ailice/modules/AScrollablePage.py
docker restart scripter

Usage

You can directly copy a command from the typical use cases below to run AIlice.

ailice_web --modelID=oai:gpt-4o
ailice_web --modelID=anthropic:claude-3-5-sonnet-20240620
ailice_web --modelID=oai:gpt-4-1106-preview --chatHistoryPath=./chat_history
ailice_web --modelID=anthropic:claude-3-opus-20240229 --prompt="researcher"
ailice_web --modelID=mistral:mistral-large-latest
ailice_web --modelID=deepseek:deepseek-chat
ailice_web --modelID=hf:Open-Orca/Mistral-7B-OpenOrca --quantization=8bit --contextWindowRatio=0.6
ailice_web --modelID=hf:NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO --quantization=4bit --contextWindowRatio=0.3
ailice_web --modelID=hf:Phind/Phind-CodeLlama-34B-v2 --prompt="coder-proxy" --quantization=4bit --contextWindowRatio=0.6
ailice_web --modelID=groq:llama3-70b-8192
ailice_web   #Use models configured individually for different agents under the agentModelConfig field in config.json.
ailice_web --modelID=openrouter:openrouter/auto
ailice_web --modelID=openrouter:mistralai/mixtral-8x22b-instruct
ailice_web --modelID=openrouter:qwen/qwen-2-72b-instruct
ailice_web --modelID=lm-studio:qwen2-72b --contextWindowRatio=0.5

It should be noted that the last use case requires you to configure the LLM inference service first; please refer to How to Add LLM Support. Inference frameworks such as LM Studio can support larger models on limited hardware resources and provide faster inference and faster AIlice startup, making them more suitable for ordinary users.

When you run it for the first time, you will be asked to enter the OpenAI API key. If you only want to use open-source LLMs, you do not need to enter it. You can also change the API key later by editing the config.json file. Please note that the first time you use an open-source LLM, downloading the model weights will take a long time; make sure you have enough time and disk space.

When you turn on the speechOn switch for the first time, you may need to wait a long time at startup. This is because the weights of the speech recognition and TTS models are being downloaded in the background.

As shown in the examples, you can use the agent through ailice_web, which provides a web dialogue interface. You can view the default value of each parameter by using

ailice_web --help

The default values for all command line arguments can be customized by modifying the corresponding parameters in config.json.

  • --modelID There are two modes for model configuration. In the first mode, the model is uniformly specified by modelID. In the second mode, different types of agents will run on different models. When this parameter is an empty string (unspecified), the second mode will be used automatically, i.e., the models configured individually for different agents under the agentModelConfig field in config.json will be used, for details please refer to Using Different Models in Different Agents. The currently supported models can be seen in config.json.
  • --quantization is the quantization option; you can choose 4bit or 8bit. The default is no quantization.
  • --maxMemory constrains the memory/VRAM capacity per device. It is unset by default; when set, the format looks like '{0: "23GiB", 1: "24GiB", "cpu": "64GiB"}'.
  • --prompt specifies the prompt to be executed, which is the type of agent. The default is 'main'; this agent will decide which agent type to call according to your needs. You can also specify a special type of agent and interact with it directly.
  • --temperature sets the temperature parameter of LLM reasoning, the default is zero.
  • --flashAttention2 is the switch to enable flash attention 2 to speed up inference. It may have a certain impact on output quality.
  • --contextWindowRatio is a user-specified coefficient that, in some cases, caps the length of the prompt constructed during inference as a fraction of the LLM's context window. The default value is 0.6.
  • --speechOn is the switch to enable voice conversation.
  • --ttsDevice specifies the computing device used by the text-to-speech model. The default is "cpu", you can set it to "cuda" if there is enough video memory.
  • --sttDevice specifies the computing device used by the speech-to-text model. The default is "cpu", you can set it to "cuda" if there is enough video memory.
  • --chatHistoryPath is used to specify the directory where chat history data is stored.
  • --certificate Certificate settings for the web interface. The simplest option is an empty string, which will use the HTTP protocol for the UI web page. Setting it to 'adhoc' will use a self-generated certificate, providing encryption for the data flow between the UI and server, but it requires dismissing browser security warnings. The most secure method is to apply for a certificate and set this parameter to '{"cert": "your_cert.pem", "key": "your_key.pem"}'.
  • --share creates a publicly shareable link for AIlice. (For security reasons, we have temporarily removed this feature. It will be re-enabled once more security measures are implemented in the UI. Please ensure that the services provided by app.py are not exposed to any untrusted networks.)

Module Configuration

The configuration file of AIlice is named config.json, and its location will be output to the command line when AIlice is started. In this section, we will introduce how to configure the external interaction modules through the configuration file.

In AIlice, we use the term "module" to specifically refer to components that provide functions for interacting with the external world. Each module runs as an independent process; they can run in different software or hardware environments from the core process, making AIlice capable of being distributed. We provide a series of basic module configurations in the configuration file required for AIlice's operation (such as vector database, search, browser, code execution, etc.). You can also add configurations for any third-party modules and provide their module runtime address and port after AIlice is up and running to enable automatic loading. Module configuration is very simple, consisting of only two items:

  "services": {
    ...
    "scripter": {"cmd": "python3 -m ailice.modules.AScripter --addr=tcp://127.0.0.1:59000",
	               "addr": "tcp://127.0.0.1:59000"},
    ...
  }

Among these, under "cmd" is a command line used to start the module's process. When AIlice starts, it automatically runs these commands to launch the modules. Users can specify any command, providing significant flexibility. You can start a module's process locally or utilize Docker to start a process in a virtual environment, or even start a remote process. Some modules have multiple implementations (such as Google/Storage), and you can configure here to switch to another implementation.

"addr" refers to the address and port number of the module process. Users might be confused by the fact that many modules in the default configuration have both "cmd" and "addr" containing addresses and port numbers, causing redundancy. This is because "cmd" can, in principle, contain any command (which may include addresses and port numbers, or none at all). Therefore, a separate "addr" item is necessary to inform AIlice how to access the module process.

Useful Tips

Interrupts. Interrupts are the second interaction mode supported by AIlice; they allow you to interrupt AIlice's agents at any time and provide prompts to correct errors or give guidance. In ailice_web, during AIlice's task execution, an interrupt button appears on the right side of the input box. Pressing it pauses AIlice's execution and waits for your prompt message. You can enter your prompt into the input box and press Enter to send the message to the agent currently executing the subtask. Proficient use of this feature requires a good understanding of AIlice's workings, especially the agent calling tree architecture. It also involves focusing more on the command line window rather than the dialogue interface during AIlice's task execution. Overall, this is a highly useful feature, especially with less powerful language models.

First use GPT-4 to successfully run some simple use cases, then restart AIlice with a less powerful (but cheaper/open-source) model to continue running new tasks based on the previous conversation history. This way, the history provided by GPT-4 serves as a successful example, offering valuable reference for other models and significantly increasing the chances of success.

Selection and Configuration of LLM

Guide to Choosing an LLM

Updated on Aug 23, 2024.

Currently, AIlice can handle more complex tasks using the locally run 72B open-source model (qwen-2-72b-instruct running on 4090x2), with performance approaching that of GPT-4 level models. Considering the low cost of open-source models, we highly recommend that users start with them. Moreover, running the LLM locally ensures absolute privacy protection, a rare quality in AI applications in our time. Click here to learn how to run this model locally. For users whose GPU conditions are insufficient to run large models, this is not a problem: you can use an online inference service (such as OpenRouter, mentioned below) to access these open-source models (though this sacrifices privacy). Although open-source models cannot yet fully rival commercial GPT-4 level models, you can make agents excel by leveraging different models according to their strengths and weaknesses. For details, please refer to Using Different Models in Different Agents.

claude-3-5-sonnet-20240620 provides the best performance.

gpt-4o and gpt-4-1106-preview also offer top-level performance. However, due to the agent's long running time and heavy token consumption, please use commercial models with caution. gpt-4o-mini works very well, and although it's not top-notch, its low price makes this model very attractive. gpt-4-turbo/gpt-3.5-turbo are surprisingly lazy, and we have never been able to find a stable prompt expression.

Among the open-source models, the ones that usually perform well include:

  • meta-llama-3.1-70B-Instruct
  • Qwen/Qwen2-72B-Instruct

meta-llama-3.1-405B-Instruct is nice, but too big to be practical on a PC.

For users whose hardware capabilities are insufficient to run open-source models locally and who are unable to obtain API keys for commercial models, they can try the following options:

  • openrouter This service can route your inference requests to various open-source or commercial models without the need to deploy open-source models locally or apply for API keys for various commercial models. It's a fantastic choice. AIlice automatically supports all models on OpenRouter. You can choose openrouter:openrouter/auto to let the auto router route for you, or you can specify any specific model configured in the config.json file. Thanks to @babybirdprd for recommending OpenRouter to me.

  • groq:llama3-70b-8192 Of course, AIlice also supports other models under Groq. One issue with running under Groq is that it's easy to exceed rate limits, so it can only be used for simple experiments.

The Most Outstanding Open-source Model

We will select the currently best-performing open-source model to provide a reference for users of open-source models.

  • The best among all models: qwen-2-72b-instruct. This is the first open-source model with practical value. It's a great advancement! It has reasoning capabilities close to GPT-4, though not quite there yet. With active user intervention through the interrupt feature, many more complex tasks can be successfully completed.

  • The second-best performing models: mixtral-8x22b-instruct and meta-llama/Meta-Llama-3-70B-Instruct. It's worth noting that the Llama3 series models seem to exhibit a significant performance drop after quantization, which reduces their practical value. You can use them with Groq.

If you find a better model, please let me know.

How to Add LLM Support

For advanced players, trying more models is inevitable. Fortunately, this is not difficult to achieve.

Using LLM through Inference Services

For openai/mistral/anthropic/groq models, you don't need to do anything. Just use a modelID consisting of the official model name appended to the "oai:"/"mistral:"/"anthropic:"/"groq:" prefix. If you need to use a model that is not included in AIlice's supported list, you can resolve this by adding an entry for the model in the config.json file. To add one, copy the entry of a similar model, change contextWindow to the actual value, and keep systemAsUser consistent with the similar model.

You can use any third-party inference server compatible with the OpenAI API to replace the built-in LLM inference functionality in AIlice. Just use the same configuration format as the openai models and modify the baseURL, apikey, contextWindow and other parameters (Actually, this is how AIlice supports Groq models).

For inference servers that do not support the OpenAI API, you can try using litellm to convert them into an OpenAI-compatible API (we have an example below).

It's important to note that AIlice's conversation records contain many SYSTEM messages, which is not a common use case for LLMs, so the level of support depends on the specific implementation of these inference servers. In this case, you can set the systemAsUser parameter to true to circumvent the issue. Although this might prevent the model from running AIlice at its optimal performance, it also allows us to be compatible with various efficient inference servers. For the average user, the benefits outweigh the drawbacks.
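
To make the effect of systemAsUser concrete, here is a minimal sketch of the idea (illustrative only, not AIlice's actual implementation): before a request is sent to an OpenAI-compatible server, system messages other than the initial system prompt are downgraded to user messages.

# Illustrative sketch: remap "system" messages (other than the leading system
# prompt) to the "user" role, for servers that handle system messages poorly.
def remap_system_messages(messages: list, system_as_user: bool) -> list:
    if not system_as_user:
        return messages
    remapped = []
    for i, msg in enumerate(messages):
        if msg["role"] == "system" and i > 0:
            remapped.append({"role": "user", "content": msg["content"]})
        else:
            remapped.append(msg)
    return remapped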

Example 1: ollama + litellm

We use Ollama as an example to explain how to add support for such services. First, we need to use Litellm to convert Ollama's interface into a format compatible with OpenAI.

pip install litellm
ollama pull mistral-openorca
litellm --model ollama/mistral-openorca --api_base http://localhost:11434 --temperature 0.0 --max_tokens 8192

Then, add support for this service in the config.json file (the location of this file will be prompted when AIlice is launched).

{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "oai": {
      ...
    },
    "ollama": {
      "modelWrapper": "AModelChatGPT",
      "apikey": "fake-key",
      "baseURL": "http://localhost:8000",
      "modelList": {
        "mistral-openorca": {
          "formatter": "AFormatterGPT",
          "contextWindow": 8192,
          "systemAsUser": false
        }
      }
    },
    ...
  },
  ...
}

Now we can run AIlice:

ailice_web --modelID=ollama:mistral-openorca

Example 2: LM Studio

In this example, we will use LM Studio to run the best open-source model I've seen so far, Qwen2-72B-Instruct-Q3_K_S.gguf, powering AIlice on a local machine.

Download model weights of Qwen2-72B-Instruct-Q3_K_S.gguf using LM Studio.

In LM Studio's "LocalServer" window, set n_gpu_layers to -1 if you want to use the GPU only. Adjust the 'Context Length' parameter on the left to 16384 (or a smaller value based on your available memory), and change the 'Context Overflow Policy' to 'Keep the system prompt and the first user message, truncate middle'.

Run the service. We assume the address of the service is "http://localhost:1234/v1/".

Then, we open config.json and make the following modifications:

{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "oai": {
      ...
    },
    "lm-studio": {
      "modelWrapper": "AModelChatGPT",
      "apikey": "fakekey",
      "baseURL": "http://localhost:1234/v1/",
      "modelList": {
        "qwen2-72b": {
          "formatter": "AFormatterGPT",
          "contextWindow": 32764,
          "systemAsUser": true
        }
      }
    },
    ...
  },
  ...
}

Finally, run AIlice. You can adjust the 'contextWindowRatio' parameter based on your available VRAM or memory space. The larger the parameter, the more VRAM space is required.

ailice_web --modelID=lm-studio:qwen2-72b --contextWindowRatio=0.5

Example 3: Add open source multimodal model support

Similar to what we did in the previous section, after we use LM Studio to download and run LLAVA, we modify the configuration file as follows:

{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "oai": {
      ...
    },
    "lm-studio": {
      "modelWrapper": "AModelChatGPT",
      "apikey": "fakekey",
      "baseURL": "http://localhost:1234/v1/",
      "modelList": {
        "llava-1.6-34b": {
          "formatter": "AFormatterGPTVision",
          "contextWindow": 4096,
          "systemAsUser": true
        }
      }
    },
    ...
  },
  ...
}

However, it should be noted that current open-source multimodal models are far from sufficient for agent tasks, so this example is intended for developers rather than users.

Open Source Models on Huggingface

For open-source models on Hugging Face, you only need to know the following information to add support for new models: the Hugging Face address of the model, the prompt format of the model, and the context window length. Usually one line of code is enough to add a new model, but occasionally you are unlucky and need about a dozen lines.

Here is the complete method of adding new LLM support:

Open config.json, you should add the config of new LLM into models.hf.modelList, which looks like the following:

{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "hf": {
      "modelWrapper": "AModelCausalLM",
      "modelList": {
        "meta-llama/Llama-2-13b-chat-hf": {
          "formatter": "AFormatterLLAMA2",
          "contextWindow": 4096,
          "systemAsUser": false
        },
        "meta-llama/Llama-2-70b-chat-hf": {
          "formatter": "AFormatterLLAMA2",
          "contextWindow": 4096,
          "systemAsUser": false
        },
        ...
      }
    },
  ...
  }
...
}
  • "formatter" is a class that defines LLM's prompt format. You can find their definitions in core/llm/AFormatter. You can read these codes to determine which format is required for the model you want to add. In case you don't find it, You need to write one yourself. Fortunately, Formatter is a very simple thing and can be completed in more than a dozen lines of code. I believe you will understand how to do it after reading a few Formatter source codes.

  • The context window is a property that the LLM of the Transformer architecture usually has. It determines the length of text that the model can process at one time. You need to set the context window of the new model to the "contextWindow" key.

  • "systemAsUser": We use the "system" role as the sender of the message returned by the function calls. However, not all LLMs have a clear definition of system role, and there is no guarantee that the LLM can adapt to this approach. So we need to use systemAsUser to set whether to put the text returned by the function calls in user messages. Try to set it to False first.

Everything is done! Use "hf:" as a prefix to the model name to form a modelID, and use the new model's modelID as the command-line parameter to start AIlice!

Using Different Models in Different Agents

AIlice has two operating modes. One mode uses a single LLM to drive all agents, while the other allows each type of agent to specify a corresponding LLM. The latter mode enables us to better combine the capabilities of open-source models and commercial models, achieving better performance at a lower cost. To use the second mode, you need to configure the agentModelConfig item in config.json first:

  "modelID": "",
  "agentModelConfig": {
    "DEFAULT": "openrouter:qwen/qwen-2-72b-instruct",
    "coder": "openrouter:deepseek/deepseek-coder"
  },

First, ensure that the default value for modelID is set to an empty string, then configure the corresponding LLM for each type of agent in agentModelConfig.

Finally, you can achieve the second operating mode by not specifying a modelID:

ailice_web

Development

Design

The basic principles when designing AIlice are:

  • Enriching the behavior of LLM with highly dynamic prompt construction mechanisms;
  • Separating different computational tasks as much as possible, using recursion and divide-and-conquer from traditional computing to solve complex problems.
  • Agents should be able to interact in both directions.

Let's briefly explain these fundamental principles.

Starting from the most obvious level, a highly dynamic prompt construction makes it less likely for an agent to fall into a loop. The influx of new variables from the external environment continuously impacts the LLM, helping it to avoid that pitfall. Furthermore, feeding the LLM with all the currently available information can greatly improve its output. For example, in automated programming, error messages from interpreters or command lines assist the LLM in continuously modifying the code until the correct result is achieved. Lastly, in dynamic prompt construction, new information in the prompts may also come from other agents, which acts as a form of linked inference computation, making the system's computational mechanisms more complex, varied, and capable of producing richer behaviors.

Separating computational tasks is, from a practical standpoint, due to our limited context window. We cannot expect to complete a complex task within a window of a few thousand tokens. If we can decompose a complex task so that each subtask is solved within limited resources, that would be an ideal outcome. In traditional computing models, we have always taken advantage of this, but in new computing centered around LLMs, this is not easy to achieve. The issue is that if one subtask fails, the entire task is at risk of failure. Recursion is even more challenging: how do you ensure that with each call, the LLM solves a part of the subproblem rather than passing the entire burden to the next level of the call? We have solved the first problem with the IACT architecture in AIlice, and the second problem is theoretically not difficult to solve, but it likely requires a smarter LLM.

The third principle is what everyone is currently working on: having multiple intelligent agents interact and cooperate to complete more complex tasks. The implementation of this principle actually addresses the aforementioned issue of subtask failure. Multi-agent collaboration is crucial for the fault tolerance of agents in operation. In fact, this may be one of the biggest differences between the new computational paradigm and traditional computing: traditional computing is precise and error-free, assigning subtasks only through unidirectional communication (function calls), whereas the new computational paradigm is error-prone and requires bidirectional communication between computing units to correct errors. This will be explained in detail in the following section on the IACT framework.

Computational Model: Interactive Agents Calling Tree

IACT Architecture Diagram. A user requirement to build a page for image collection and display is dynamically decomposed into two tasks: coder_spider and coder_website. When coder_spider encounters difficulties, it proactively seeks assistance from its caller, proxy_cat_gallery. Proxy_cat_gallery then creates another agent, researcher_api, and employs it to address the issue.

AIlice can be regarded as a computer powered by an LLM, and its features include:

  • Representing input, output, programs, and data in text form.

  • Using LLM as the processor.

  • Breaking down computational tasks through successive calls to basic computing units (analogous to functions in traditional computing), which are essentially various functional agents.

Therefore, user-input text commands are executed as a kind of program, decomposed into various "subprograms", and addressed by different agents, forming the fundamental architecture of AIlice. In the following, we will provide a detailed explanation of the nature of these basic computing units.

A natural idea is to let LLM solve certain problems (such as information retrieval, document understanding, etc.) through multi-round dialogues with external callers and peripheral modules in the simplest computational unit. We temporarily call this computational unit a "function". Then, by analogy with traditional computing, we allow functions to call each other, and finally add the concept of threads to implement multi-agent interaction. However, we can have a much simpler and more elegant computational model than this.

The key here is that the "function" that wraps LLM reasoning can actually be called, and return, multiple times. A "function" with coder functionality can pause work and return a query statement to its caller when it encounters unclear requirements during coding. If the caller is still unclear about the answer, it continues to ask the next higher-level caller. This process can even go all the way to the final user's chat window. When new information is added, the caller will reactivate the coder's execution process by passing in the supplementary information. It can be seen that this "function" is not a traditional function, but an object that can be called multiple times. The high intelligence of LLM makes this interesting property possible. You can also see it as agents strung together by calling relationships, where each agent can create and call more sub-agents, and can also dialogue with its caller to obtain supplementary information or report its progress. In AIlice, we call this computational unit "AProcessor" (essentially what we referred to as an agent). Its code is located in core/AProcessor.py.

Basic Computing Unit: Tai Chi Diagram of LLM and Interpreter

Next, we will elaborate on the structure inside AProcessor. The interior of AProcessor is a multi-round dialogue. The "program" that defines the function of AProcessor is a prompt generation mechanism, which generates the prompt for each round of dialogue from the dialogue history. The dialogue is one-to-many. After the external caller inputs the request, LLM will have multiple rounds of dialogue with the peripheral modules (we call them SYSTEM), LLM outputs function calls in various grammatical forms, and the system calls the peripheral modules to generate results and puts the results in the reply message. LLM finally gets the answer and responds to the external caller, ending this call. But because the dialogue history is still preserved, the caller can call in again to continue executing more tasks.
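
The call cycle described above can be summarized in a conceptual sketch (pseudo-Python for illustration only; the real implementation is in core/AProcessor.py):

# Conceptual sketch of one external call into an AProcessor.
def processor_call(request, history, llm, interpreter, build_prompt):
    history.append({"role": "caller", "content": request})
    while True:
        prompt = build_prompt(history)        # the "program": prompt construction from history
        reply = llm.generate(prompt)          # the LLM acts as the processor
        history.append({"role": "assistant", "content": reply})
        results = interpreter.execute(reply)  # run any function calls found in the reply
        if results is None:                   # no calls left: return the answer to the caller
            return reply
        history.append({"role": "system", "content": results})  # SYSTEM feeds results back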

The last part we want to introduce is the parsing module for LLM output. In fact, we regard the output text of LLM as a "script" of semi-natural language and semi-formal language, and use a simple interpreter to execute it. We can use regular expressions to express a carefully designed grammatical structure, parse it into a function call and execute it. Under this design, we can design more flexible function call grammar forms, such as a section with a certain fixed title (such as "UPDATE MEMORY"), which can also be directly parsed out and trigger the execution of an action. This implicit function call does not need to make LLM aware of its existence, but only needs to make it strictly follow a certain format convention. For the most hardcore possibility, we have left room. The interpreter here can not only use regular expressions for pattern matching, its Eval function is recursive. We don't know what this will be used for, but it seems not bad to leave a cool possibility, right? Therefore, inside AProcessor, the calculation is alternately completed by LLM and the interpreter, their outputs are each other's inputs, forming a cycle.

Agent Design: Implementing the Interpreter Framework

In AIlice, the interpreter is one of the most crucial components within an agent. We use the interpreter to map texts from the LLM output that match specific patterns to actions, including function calls, variable definitions and references, and any user-defined actions. Sometimes these actions directly interact with peripheral modules, affecting the external world; other times, they are used to modify the agent's internal state, thereby influencing its future prompts.

The basic structure of the interpreter is straightforward: a list of pattern-action pairs. Patterns are defined by regular expressions, and actions are specified by a Python function with type annotations. Given that syntactic structures can be nested, we refer to the overarching structure as the entry pattern. During runtime, the interpreter actively detects these entry patterns in the LLM output text. Upon detecting an entry pattern (and if the corresponding action returns data), it immediately terminates the LLM generation to execute the relevant action.
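
A minimal illustration of the pattern-action idea (hypothetical code; the function name LS and the regular expression are made up for this example and are not part of AIlice's actual action set):

import os
import re

def list_directory(path: str) -> str:
    return "\n".join(os.listdir(path))

# (entry pattern, action) pairs; named groups map onto the action's parameters.
ACTIONS = [
    (re.compile(r'!LS<!\|"(?P<path>[^"]+)"\|!>'), list_directory),
]

def interpret(llm_output: str):
    for pattern, action in ACTIONS:
        m = pattern.search(llm_output)
        if m:
            return action(**m.groupdict())   # execute and return data to feed back to the LLM
    return None

print(interpret('!LS<!|"."|!>'))             # lists the current directory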

The design of agents in AIlice encompasses two fundamental aspects: the logic for generating prompts based on dialogue history and the agent's internal state, and a set of pattern-action pairs. Essentially, the agent implements the interpreter framework with a set of pattern-action pairs; it becomes an integral part of the interpreter. The agent's internal state is one of the targets for the interpreter's actions, with changes to the agent's internal state influencing the direction of future prompts.

Generating prompts from dialogue history and the internal state is nearly a standardized process, although developers still have the freedom to choose entirely different generation logic. The primary challenge for developers is to create a system prompt template, which is pivotal for the agent and often demands the most effort to perfect. However, this task revolves entirely around crafting natural language prompts.

Scripting Language: From Text to Reality

AIlice utilizes a simple scripting language embedded within text to map the text-based capabilities of LLMs to the real world. This straightforward scripting language includes non-nested function calls and mechanisms for creating and referencing variables, as well as operations for concatenating text content. Its purpose is to enable LLMs to exert influence on the world more naturally: from smoother text manipulation abilities to simple function invocation mechanisms, and multimodal variable operation capabilities. Finally, it should be noted that agent designers always have the freedom to extend this scripting language with new syntax. What is introduced here is a minimal standard syntax structure.

The basic syntax is as follows:

Variable Definition: VAR_NAME := <!|"SOME_CONTENT"|!>

Function Calls/Variable References/Text Concatenation: !FUNC-NAME<!|"...", '...', VAR_NAME1, "Execute the following code: \n" + VAR_NAME2, ...|!>

The basic variable types are str/AImage/various multimodal types. The str type is consistent with Python's string syntax, supporting triple quotes and escape characters.

This constitutes the entirety of the embedded scripting language.
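
As an illustrative example of the syntax (the function name PYTHON here is hypothetical and stands in for whatever execution function an agent actually exposes), an LLM reply might define a code variable and then pass it to a function:

SNIPPET := <!|"print(2 + 2)"|!>
!PYTHON<!|"Execute the following code:\n" + SNIPPET|!>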

The variable definition mechanism introduces a way to extend the context window, allowing LLMs to record important content into variables to prevent forgetting. During system operation, various variables are automatically defined. For example, if a block of code wrapped in triple backticks is detected within a text message, a variable is automatically created to store the code, enabling the LLM to reference the variable to execute the code, thus avoiding the time and token costs associated with copying the code in full. Furthermore, some module functions may return data in multimodal types rather than text. In such cases, the system automatically defines these as variables of the corresponding multimodal type, allowing the LLM to reference them (the LLM might send them to another module for processing).

Multimodal: Collaboration of Rich Text and Variable Mechanisms

In the long run, LLMs are bound to evolve into multimodal models capable of seeing and hearing. Therefore, the exchanges between AIlice's agents should be in rich text, not just plain text. While Markdown provides some capability for marking up multimodal content, it is insufficient. Hence, we will need an extended version of Markdown in the future to include various embedded multimodal data such as videos and audio.

Let's take images as an example to illustrate the multimodal mechanism in AIlice. When agents receive text containing Markdown-marked images, the system automatically inputs them into a multimodal model to ensure the model can see these contents. Markdown typically uses paths or URLs for marking, so we have expanded the Markdown syntax to allow the use of variable names to reference multimodal content.

Another minor issue is how different agents with their own internal variable lists exchange multimodal variables. This is simple: the system automatically checks whether a message sent from one agent to another contains internal variable names. If it does, the variable content is passed along to the next agent.

Why do we go to the trouble of implementing an additional multimodal variable mechanism when marking multimodal content with paths and URLs is much more convenient? This is because marking multimodal content based on local file paths is only feasible when AIlice runs entirely in a local environment, which is not the design intent. AIlice is meant to be distributed, with the core and modules potentially running on different computers, and it might even load services running on the internet to provide certain computations. This makes returning complete multimodal data more attractive. Of course, these designs made for the future might be over-engineering, and if so, we will modify them in the future.

Self-Expansion: Growing Like a Tree

One of the goals of AIlice is to achieve introspection and self-expansion (which is why our logo features a butterfly with its reflection in the water). This would enable her to understand her own code and build new functionalities, including new external interaction modules (i.e. new functions) and new types of agents (APrompt class). As a result, the knowledge and capabilities of LLMs would be more thoroughly unleashed.

Implementing self-expansion involves two parts. On one hand, new modules and new types of agents (APrompt class) need to be dynamically loaded during runtime and naturally integrated into the computational system to participate in processing, which we refer to as dynamic loading. On the other hand, AIlice needs the ability to construct new modules and agent types.

The dynamic loading mechanism itself is of great significance: it represents a novel software update mechanism. We can allow AIlice to search for its own extension code on the internet, check the code for security, fix bugs and compatibility issues, and ultimately run the extension as part of itself. Therefore, AIlice developers only need to place their contributed code on the internet, without the need to merge into the main codebase or consider any other installation methods. The implementation of the dynamic loading mechanism is continuously improving. Its core lies in the extension packages providing some text describing their functions. During runtime, each agent in AIlice finds suitable functions or agent types to solve sub-problems for itself through semantic matching and other means.

Building new modules is a relatively simple task, as the interface constraints that modules need to meet are very straightforward. We can teach LLMs to construct new modules through an example. The more complex task is the self-construction of new agent types (APrompt class), which requires a good understanding of AIlice's overall architecture. The construction of system prompts is particularly delicate and is a challenging task even for humans. Therefore, we pin our hopes on more powerful LLMs in the future to achieve introspection, allowing AIlice to understand herself by reading her own source code (for something as complex as a program, the best way to introduce it is to present itself), thereby constructing better new agents.
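
As a rough illustration of what the core logic of such an ext-module might look like, here is a hypothetical sketch of a wiki lookup function built on the public MediaWiki API. It deliberately omits AIlice's actual module registration and RPC plumbing, which the real modules under ailice/modules demonstrate:

import requests

# Hypothetical core of a wiki ext-module: fetch intro extracts for a keyword.
def wiki_lookup(keyword: str) -> str:
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "generator": "search",
            "gsrsearch": keyword,
            "prop": "extracts",
            "exintro": 1,
            "explaintext": 1,
            "format": "json",
        },
        timeout=30,
    )
    pages = resp.json().get("query", {}).get("pages", {})
    return "\n\n".join(p.get("title", "") + ": " + p.get("extract", "") for p in pages.values())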

How Developers Should Get Started

  • For developing Agents, the main loop of AIlice is located in the AIliceMain.py or ui/app.py files. To further understand the construction of an agent, you need to read the code in the "prompts" folder; by reading this code you can understand how an agent's prompts are dynamically constructed.

  • For developers who want to understand the internal operation logic of AIlice, please read core/AProcessor.py and core/Interpreter.py. These two files contain approximately three hundred lines of code in total, but they contain the basic framework of AIlice.

Project Development Standards and Constraints

  • In this project, achieving the desired functionality of the AI Agent is the primary goal. The secondary goal is code clarity and simplicity. The implementation of the AI Agent is still an exploratory topic, so we aim to minimize rigid components in the software (such as architecture/interfaces imposing constraints on future development) and provide maximum flexibility for the application layer (e.g., prompt classes). Abstraction, deduplication, and decoupling are not immediate priorities.

  • When implementing a feature, always choose the best method rather than the most obvious one. The metric for "best" often includes traits such as trivializing the problem from a higher perspective, maintaining code clarity and simplicity, and ensuring that changes do not significantly increase overall complexity or limit the software's future possibilities.

  • Do not add comments unless absolutely necessary; strive to make the code clear enough to be self-explanatory. While this may not be an issue for developers who appreciate comments, in the AI era, we can generate detailed code explanations at any time, eliminating the need for unstructured, hard-to-maintain comments.

  • Follow the principle of Occam's razor when adding code; never add unnecessary lines.

  • Functions or methods in the core should not exceed 60 lines.

  • While there are no explicit coding style constraints, maintain consistency or similarity with the original code in terms of naming and case usage to avoid readability burdens.

AIlice aims to achieve multimodal and self-expanding features within a scale of less than 5000 lines, reaching its final form at the current stage. The pursuit of concise code is not only because succinct code often represents a better implementation, but also because it enables AI to develop introspective capabilities early on and facilitates better self-expansion. Please adhere to the above rules and approach each line of code with diligence.

Future Development Roadmap

AIlice's fundamental tasks are twofold: one is to fully unleash the capabilities of LLM based on text into the real world; the other is to explore better mechanisms for long-term memory and forming a coherent understanding of vast amounts of text. Our development efforts revolve around these two focal points.

If you are interested in the development of AIlice itself, you may consider the following directions:

  • Explore improved long-term memory mechanisms to enhance the capabilities of each Agent. We need a long-term memory mechanism that enables consistent understanding of large amounts of content and facilitates association. The most feasible option at the moment is to replace vector database with knowledge graph, which will greatly benefit the comprehension of long texts/codes and enable us to build genuine personal AI assistants.

  • Multimodal support. The support for the multimodal model has been completed, and the current development focus is shifting towards the multimodal support of peripheral modules. We need a module that operates computers based on screenshots and simulates mouse/keyboard actions.

  • Self-expanding support. Our goal is to enable language models to autonomously code and implement new peripheral modules/agent types and dynamically load them for immediate use. This capability will enable self-expansion, empowering the system to seamlessly integrate new functionalities. We've completed most of the functionality, but we still need to develop the capability to construct new types of agents.

  • Richer UI interface. We need to organize the agents' output into a tree structure in the dialogue window and dynamically update the output of all agents, and accept user input on the web interface and pass it to the scripter's standard input, which is especially needed when using sudo.

  • Develop Agents with various functionalities based on the current framework.

  • Explore the application of IACT architecture on complex tasks. By utilizing an interactive agents calling tree, we can break down large documents for improved reading comprehension, as well as decompose complex software engineering tasks into smaller modules, completing the entire project build and testing through iterations. This requires a series of intricate prompt designs and testing efforts, but it holds an exciting promise for the future. The IACT architecture significantly alleviates the resource constraints imposed by the context window, allowing us to dynamically adapt to more intricate tasks.

  • Build rich external interaction modules using self-expansion mechanisms! This will be accomplished in AIliceEVO.

In addition to the tasks mentioned above, we should also start actively contemplating the possibility of creating a smaller LLM that possesses lower knowledge content but higher reasoning abilities.


ailice's Issues

Error when trying to list current directory content

Hello,

My first try seems to generate an error:

ASSISTANT_AIlice:   !CALL<!|"coder-proxy","file_listing","./current_directory"|!>
ASSISTANT_file_listing:
SYSTEM_AIlice:  Agent file_listing returned:

ASSISTANT_AIlice:   !CALL<!|"coder-proxy","file_listing","./current_directory"|!>
ASSISTANT_file_listing:  SYSTEM_AIlice:  Error code: 400 - {'error': "'messages' array must only contain objects with a 'content' field that is not empty."}EXCEPTION: Error code: 400 - {'error': "'messages' array must only contain objects with a 'content' field that is not empty."}
Traceback (most recent call last):
  File "/home/benda/AIlice/AIlice/ailice/core/AInterpreter.py", line 131, in EvalEntries
    r = self.Eval(script)
        ^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/AInterpreter.py", line 107, in Eval
    return self.CallWithTextArgs(nodeType, paras)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/AInterpreter.py", line 86, in CallWithTextArgs
    return action['func'](**paras)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/AProcessor.py", line 148, in EvalCall
    resp = f"Agent {agentName} returned: {self.subProcessors[agentName](msg)}"
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/AProcessor.py", line 113, in __call__
    ret = self.llm.Generate(prompt, proc=partial(self.outputCB, "ASSISTANT_" + self.name), endchecker=self.interpreter.EndChecker, temperature = config.temperature)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/llm/AModelChatGPT.py", line 26, in Generate
    for chunk in self.client.chat.completions.create(model=self.modelName,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/resources/chat/completions.py", line 663, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1200, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 889, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 980, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': "'messages' array must only contain objects with a 'content' field that is not empty."}


ASSISTANT_AIlice:   To list the contents of the current directory, you can use the "ls" command in bash. Please note that this command requires the "coder-proxy" agent type to execute.
!CALL<!|"coder-proxy","file_listing","""
ls
"""|!>
ASSISTANT_file_listing:  SYSTEM_AIlice:  Error code: 400 - {'error': "'messages' array must only contain objects with a 'content' field that is not empty."}EXCEPTION: Error code: 400 - {'error': "'messages' array must only contain objects with a 'content' field that is not empty."}
Traceback (most recent call last):
  File "/home/benda/AIlice/AIlice/ailice/core/AInterpreter.py", line 131, in EvalEntries
    r = self.Eval(script)
        ^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/AInterpreter.py", line 107, in Eval
    return self.CallWithTextArgs(nodeType, paras)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/AInterpreter.py", line 86, in CallWithTextArgs
    return action['func'](**paras)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/AProcessor.py", line 148, in EvalCall
    resp = f"Agent {agentName} returned: {self.subProcessors[agentName](msg)}"
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/AProcessor.py", line 113, in __call__
    ret = self.llm.Generate(prompt, proc=partial(self.outputCB, "ASSISTANT_" + self.name), endchecker=self.interpreter.EndChecker, temperature = config.temperature)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benda/AIlice/AIlice/ailice/core/llm/AModelChatGPT.py", line 26, in Generate
    for chunk in self.client.chat.completions.create(model=self.modelName,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/resources/chat/completions.py", line 663, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1200, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 889, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 980, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': "'messages' array must only contain objects with a 'content' field that is not empty."}

Launched with the command line: python3 AIliceWeb.py --modelID=lm-studio:AbLoGa/Chat-Mistral-7b-openorca-q4-gguf

LM Studio log:
LM-Studio.log
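
For reference, the 400 error above is the OpenAI-compatible endpoint rejecting a message whose content is empty (the sub-agent returned an empty string). A possible client-side guard, shown only as a minimal sketch and not as AIlice's actual fix, is to drop or pad empty-content entries before sending the request:

```python
def sanitize_messages(messages: list[dict]) -> list[dict]:
    """Sketch: ensure every message has non-empty content before sending it to the API."""
    cleaned = []
    for m in messages:
        content = m.get("content") or ""
        if content.strip():
            cleaned.append(m)
        else:
            # keep the role structure but avoid the empty-content 400 error
            cleaned.append({**m, "content": "(empty message)"})
    return cleaned

# usage (illustrative):
# resp = client.chat.completions.create(model=model_name, messages=sanitize_messages(prompt), stream=True)
```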

Problems with ailice calling the model after installing it through lm studio

(AIlice) PS G:\AIlice> ailice_main --modelID=lm-studio:Nexesenex/MIstral-QUantized-70b_Miqu-1-70b-iMat.GGUF --prompt="main" --contextWindowRatio=0.5
Encountered an exception, AIlice is exiting: 'lm-studio'
  File "G:\AIlice\ailice\AIliceMain.py", line 123, in main
    mainLoop(**kwargs)
  File "G:\AIlice\ailice\AIliceMain.py", line 38, in mainLoop
    config.Initialize(modelID = modelID)
  File "G:\AIlice\ailice\common\AConfig.py", line 125, in Initialize
    needAPIKey = ("apikey" in self.models[modelType] and (self.models[modelType]["apikey"] is None))

firefunction-v2 infinite loop

I'm only posting this because the contents of the infinite loop make me think it should have worked. It seems like the function call is formed properly, so I don't know why it just repeats over and over again unless it's not seeing the output or something?

The task objective is to list the files in the current directory. !CALL<!|"researcher","file_researcher","List the files in the current directory."|!>

The task objective is to list the files in the current directory. !CALL<!|"researcher","file_researcher","List the files in the current directory."|!>

The task objective is to list the files in the current directory. !CALL<!|"researcher","file_researcher","List the files in the current directory."|!>

The task objective is to list the files in the current directory. !CALL<!|"researcher","file_researcher","List the files in the current directory."|!>

The task objective is to list the files in the current directory. !CALL<!|"researcher","file_researcher","List the files in the current directory."|!>

The task objective is to list the files in the current directory. !CALL<!|"researcher","file_researcher","List the files in the current directory."|!>

The task objective is to list the files in the current directory. !CALL<!|"researcher","file_researcher","List the files in the current directory."|!>

The task objective is to list the files in the current directory. !CALL<!|"researcher","file_researcher","List the files in the current directory."|!>

Errors running on mac

On Mac M1 Max OSX 14.2.1 with python 3.11.

when running with parameters:
ailice_main --modelID=hf:Open-Orca/Mistral-7B-OpenOrca --prompt="main" --quantization=8bit --contextWindowRatio=0.6
I'm getting an error because of the quantization parameter.
Error:
Encountered an exception, AIlice is exiting: No GPU found. A GPU is needed for quantization.

if not torch.cuda.is_available():
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")

This is expected because the M1 does not have CUDA.

Without the quantization parameter I get another error:
Encountered an exception, AIlice is exiting: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.

@stevenlu137 what are your thoughts on using an external locally running LLM via LM Studio or Ollama and implementing an AModelLocal.py?
They can run all the same Hugging Face models.
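
As a minimal sketch of how the hard CUDA requirement could be avoided on Apple Silicon (an illustration only, not AIlice's code; it simply skips quantization when CUDA is absent):

```python
import torch

def select_device(quantization):
    # Minimal sketch: pick an available backend instead of requiring CUDA outright.
    # On Apple Silicon, MPS is available but GPU quantization is not,
    # so fall back to full precision rather than raising.
    if torch.cuda.is_available():
        return "cuda", quantization
    if torch.backends.mps.is_available():
        return "mps", None   # skip quantization on MPS
    return "cpu", None       # skip quantization on CPU

device, quant = select_device("8bit")
print(device, quant)
```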

Small features / corrections

Dear,

Would it be possible to check :

  • Better error handling for config.json? Example: when you try to load something that doesn't exist.
  • Creation of the config.json file on the first launch of "AIliceWeb.py" with no args (or with the pip install)? On first install, I need to execute "--modelID=oai:gpt-4o" to get the message that the file needs to be updated (and created).
  • When using the Dockerfile, I get an error about AV missing (built with Docker for Windows).
  • I can't run it in a VM easily; the website is not accessible from outside the VM, and I need to edit the py file to add 'share=True' in 'Launch'. Could an argument be made available when launching AIlice?

Thanks !

Chrome?

Help me understand the Chrome dependency. Do I actually need to install the Chrome browser? (I use Brave.) Or is this just a headless python library that the agent is supposed to install? Because it will try to use the "Google" tool and hang.

Error: Connecting module browser FAILED

Hello,

I am facing a "Connecting module browser FAILED" issue. I am running this on a MacBook Pro M1, using conda and Python 3.11. Here are the details:

(AIlice) AK@MBP AIlice % ailice_web --modelID=groq:llama3-70b-8192
InitOpenRouterCfg() FAILED, skip this part and do not set it again.
********************** Initialize *****************************
config.json is located at /Users/AK/Library/Application Support/ailice
********************** End of Initialization *****************************
In order to simplify installation and usage, we have set local execution as the default behavior, which means AI has complete control over the local environment. To prevent irreversible losses due to potential AI errors, you may consider one of the following two methods: the first one, run AIlice in a virtual machine; the second one, install Docker, use the provided Dockerfile to build an image and container, and modify the relevant configurations in config.json. For detailed instructions, please refer to the documentation.
killing proc with PID 75993
killing proc with PID 75995
killing proc with PID 75996
killing proc with PID 75997
killing proc with PID 75999
storage started.
browser started.
arxiv started.
google started.
duckduckgo started.
scripter started.
computer started.
Connecting module browser FAILED. You can try running the module manually and observe its error messages. EXCEPTION: Resource temporarily unavailable
Connecting module browser FAILED. You can try running the module manually and observe its error messages. EXCEPTION: Resource temporarily unavailable
Connecting module browser FAILED. You can try running the module manually and observe its error messages. EXCEPTION: Resource temporarily unavailable
Connecting module browser FAILED. You can try running the module manually and observe its error messages. EXCEPTION: Resource temporarily unavailable
Connecting module browser FAILED. You can try running the module manually and observe its error messages. EXCEPTION: Resource temporarily unavailable
It seems that some peripheral module services failed to start. EXCEPTION: Resource temporarily unavailable
File "/Users/AK/AIlice/ailice/ui/app.py", line 59, in Init
clientPool.Init()
File "/Users/AK/AIlice/ailice/common/ARemoteAccessors.py", line 17, in Init
raise e
File "/Users/AK/AIlice/ailice/common/ARemoteAccessors.py", line 14, in Init
self.pool[cfg['addr']] = makeClient(cfg['addr'])
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/AK/AIlice/ailice/common/lightRPC.py", line 138, in makeClient
ret=ReceiveMsg(socket)
^^^^^^^^^^^^^^^^^^
File "/Users/AK/AIlice/ailice/common/lightRPC.py", line 28, in ReceiveMsg
return pickle.loads(conn.recv())
^^^^^^^^^^^
File "_zmq.py", line 1137, in zmq.backend.cython._zmq.Socket.recv
File "_zmq.py", line 1172, in zmq.backend.cython._zmq.Socket.recv
File "_zmq.py", line 1264, in zmq.backend.cython._zmq._recv_copy
File "_zmq.py", line 1259, in zmq.backend.cython._zmq._recv_copy
File "_zmq.py", line 152, in zmq.backend.cython._zmq._check_rc

Also I tried running this command as I saw it in another post:
python3 -m ailice.modules.AStorageVecDB --addr=ipc:///tmp/AIliceStorage.ipc", "addr": "ipc:///tmp/AIliceStorage.ipc
Output:
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/Users/AK/AIlice/ailice/modules/AStorageVecDB.py", line 117, in
main()
File "/Users/AK/AIlice/ailice/modules/AStorageVecDB.py", line 114, in main
makeServer(AStorageVecDB, dict(), args.addr, ["ModuleInfo", "Open", "Reset", "Store", "Query"]).Run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/AK/AIlice/ailice/common/lightRPC.py", line 83, in makeServer
return GenesisRPCServer(objCls,objArgs,url,APIList)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/AK/AIlice/ailice/common/lightRPC.py", line 39, in init
self.receiver.bind(url)
File "/opt/homebrew/Caskroom/miniconda/base/envs/AIlice/lib/python3.11/site-packages/zmq/sugar/socket.py", line 311, in bind
super().bind(addr)
File "_zmq.py", line 895, in zmq.backend.cython._zmq.Socket.bind
zmq.error.ZMQError: No such file or directory for ipc path "/tmp/AIliceStorage.ipc, addr: ipc:///tmp/AIliceStorage.ipc". (addr='ipc:///tmp/AIliceStorage.ipc, addr: ipc:///tmp/AIliceStorage.ipc')

Hope someone can advise on a solution. Thanks

anthropic only works through openrouter

I tried anthropic with null base URL, and I also tried the two URLs provided by anthropic. ("complete" and "messages"). I can paste the errors here but I assume you can already see them in your console, otherwise lmk and I'll post them.
It works through openrouter though.
I will submit a PR that adds deepinfra and fireworks to the config providers list.

Support for connecting to models running via Llamafile

Hey Steven!

Very cool project! I have been having some fun trying to get a multi-modal agent setup going using hosted models alongside open source models running locally on my 3090 setup. I have been surprised by how much is already possible!

I am not sure if you have heard of the Mozilla foundation's Llamafile project, but it builds on llama.cpp, bundles weights as executable programs, and allows for easier sharing/running of models locally. I was curious to get your thoughts on integrating the use of these? It seems like each Llamafile runs an OpenAI-compatible completions endpoint by default. I thought perhaps this might be useful to avoid having to convert models to support the API as you describe in the README. Would love to pick your brain on so much!

Unable to use OPENAI_API_KEY error incorrect but its accurate

This is a really cool project. I installed it all but get this:

127.0.0.1 - - [16/Jul/2024 20:33:26] "GET /new_chat HTTP/1.1" 200 -
127.0.0.1 - - [16/Jul/2024 20:33:26] "GET /static/User.JPG HTTP/1.1" 304 -
ASSISTANT_AIlice: 127.0.0.1 - - [16/Jul/2024 20:33:26] "POST /chat HTTP/1.1" 200 -
127.0.0.1 - - [16/Jul/2024 20:33:26] "GET /static/AIlice.png HTTP/1.1" 304 -
Exception in thread Thread-6:
Traceback (most recent call last):
File "C:\Users\MindExpander\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\MindExpander\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "Z:\APPZ\AG3NTZ\AIlice\ailice\core\AProcessor.py", line 117, in __call__
ret = self.llm.Generate(prompt, proc=partial(self.outputCB, "ASSISTANT_" + self.name), endchecker=self.interpreter.EndChecker, temperature = config.temperature)
File "Z:\APPZ\AG3NTZ\AIlice\ailice\core\llm\AModelChatGPT.py", line 26, in Generate
for chunk in self.client.chat.completions.create(model=self.modelName,
File "C:\Users\MindExpander\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\_utils\_utils.py", line 277, in wrapper
return func(*args, **kwargs)
File "C:\Users\MindExpander\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\resources\chat\completions.py", line 643, in create
return self._post(
File "C:\Users\MindExpander\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\_base_client.py", line 1266, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "C:\Users\MindExpander\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\_base_client.py", line 942, in request
return self._request(
File "C:\Users\MindExpander\AppData\Local\Programs\Python\Python310\lib\site-packages\openai\_base_client.py", line 1046, in _request
raise self._make_status_error_from_response(err.response) from None
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-proj-********************************************sh7R. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

I think it's something to do with the API key, but it's a newly created, working key, so I'm not sure what the issue is.

The code AIlice wrote always has invalid syntax.

ASSISTANT_api_requestor:  Apologies for the error in my previous response. Here's the corrected version of Python code:

```!PYTHON<!|
import requests

post_data = '{"token": "thisisatoken"}'
url = "http://****/get_cur_tasks"

response = requests.post(url, json=post_data)

if response.status_code == 200:
    result_json = response.json()
else:
    error_message = f"POST request failed with status code: {response.status_code}"
    result_json = {"error": error_message}

# Ensure result_json is stringified before assigning to a variable
result_json_str = str(result_json)

# Properly store the result as a string
!VAR<!|"post_response", result_json_str|!>;
```
SYSTEM_api_requestor:  PYTHON_RESULT=[  File "/tmp/tmpk8yktbgk", line 19
    !VAR<!|"post_response", result_json_str
    ^
SyntaxError: invalid syntax

Execution completed.

]

ASSISTANT_api_requestor:  I apologize for the mistake in my previous code. Here's the corrected Python snippet:

```!PYTHON<!|
import requests

post_data = '{"token": "thisisatoken"}'
url = "http://****/get_cur_tasks"

response = requests.post(url, json=post_data)

if response.status_code == 200:
    result_json = response.json()
else:
    error_message = f"POST request failed with status code: {response.status_code}"
    result_json = {"error": error_message}

# Stringify the JSON object and store it in a variable
result_json_str = str(result_json)

# Properly save the stringified JSON to a variable using the correct syntax
!VAR <!| "post_response", result_json_str |!>;
```
SYSTEM_api_requestor:  PYTHON_RESULT=[  File "/tmp/tmpzl7x9snh", line 19
    !VAR <!| "post_response", result_json_str 
    ^
SyntaxError: invalid syntax

Execution completed.

]

I think the !VAR<!| "post_response", result_json_str |!> is a function call meant for the agent's interpreter, but it looks like it is being executed as Python code instead.

openai.APIConnectionError: Connection error using qwen2-72b

Everything looks good at the start (screenshot attached).
This is the relevant part of my configuration file (screenshot attached).
It runs OK in the browser (screenshot attached).
As soon as I wrote "Hi", errors appeared in the console (screenshots attached).
I'm only using Qwen, not the OpenAI API.
Finally I see this error:
openai.APIConnectionError: Connection error.
I downloaded this model from LM Studio:
qwen2-72b-instruct-q4_k_m.gguf https://huggingface.co/Qwen/Qwen2-72B-Instruct-GGUF/resolve/main/qwen2-72b-instruct-q4_k_m.gguf

references to "undefined" in webUI

The agent appears to have assigned labels to various symbols correctly, since it references those symbols by name in the console log.
But in the web UI it will refer to those same symbols as "undefined".
I would post a screenshot but I assume you know what I mean already.

Agent use git?

Suggest feature from Aider: whenever agent makes a change, create a new git branch, make the change there on that branch, debug there, until it's ready to merge back to dev branch.
Furthermore, probably should make the test first on a new branch (against a stub function), merge it, and then write the function which must pass that test. Then merge that. Then write next test, etc.
Seems like the agent can do C++ this way. Write the header first, including a comment at the top describing what the class is meant to do, including the interface if available/specified. An LLM summary of the header can be made and then a graph node can be inserted with properties for text of the header itself, the summary of the header, and the embedding of the summary, along with the fully-qualified scope and name of the class, and the filename. And I suppose the commit hash. But what do you think?
Then each function should be ingested/written as its own separate piece of text, on its own graph node, with properties for dependencies. And whenever a function is changed, the agent should write the entire CPP file out on the fly, with all the other functions unchanged (stored separately before that point). All the member functions for that class should still be viewable to the coding agent, but the agent should only be actually changing one of those functions at a time IMO. Or at least, a small set of them discretely and with purpose. (Git can be useful for this). Each function should have its own summary created by an LLM with the header in context, and each summary should then have an embedding used for finding that function later (and its subgraph). Semantic search is not as useful for code, but summaries of code I think can be semantically useful. As well as fully-qualified names / scope.

But of course for context the agent doesn't just need to know the interface/implementation of the class it's working on, but also it needs to know the interfaces it has inherited from base classes. It needs to read the summaries from base class headers and methods, so it knows whether or not it's supposed to override certain methods, or use certain methods from higher up in the class hierarchy. This is why I think a graph is critical as you have pointed out also.
Each time a change in a git branch passes all the existing tests, and institutes a new test that passes critical inspection, and contains a new function that passes the new test as well, then a caller agent can consider merging it back into dev branch.

^^^^Partially I wrote this only to get my own thoughts out. This should be general enough that it works for all languages, I'm only using C++ to "think my way" through it.
^^^^Partially I wrote this because I think you are doing some of this already, in terms of tracking the functions and file pieces separately, and I want to understand faster how I can help, or where I'm wrong, or what I'm missing, and what I need to contribute to get this part working because I think this is critical.
Again, much respect for this project; I think you have the right design philosophy. I have played with a lot of agents and I really like this one, so thank you for your hard work. It is appreciated.
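
To make the proposal concrete, here is a minimal sketch of the branch-per-change loop described above, assuming a git repository and a pytest-based test script (illustrative only, not part of AIlice):

```python
# Minimal sketch of the branch-per-change workflow proposed above.
# Assumptions: a git repo in the current directory and a test suite runnable via "pytest".
import subprocess

def run(*cmd) -> bool:
    return subprocess.run(cmd, capture_output=True).returncode == 0

def apply_change_on_branch(branch: str, write_change, test_cmd=("pytest", "-q")) -> bool:
    run("git", "checkout", "-b", branch)          # work on an isolated branch
    write_change()                                # the agent edits files here
    run("git", "add", "-A")
    run("git", "commit", "-m", f"change on {branch}")
    if run(*test_cmd):                            # only merge if the tests actually pass
        run("git", "checkout", "dev")
        return run("git", "merge", "--no-ff", branch)
    run("git", "checkout", "dev")                 # leave the failing branch for debugging
    return False
```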

Agents looping at task execution

I have an issue with agents looping infinitely at the lower-level task execution. They can execute the task successfully, but instead of passing the result up, the LLM basically gets the same request again.
I'm using local model via OpenAI compatible API using text-generation-webui (oobabooga), so I suspect it might be related to the issue. I tried different models and tasks and the problem is always the same.

I've added logs (LLM + AIlice output) from very simple task (listing files in the current directory) using nous-hermes-2-mixtral-8x7b-dpo model.
Any hints how to resolve that would be appreciated.

AIlice log

storage started.
browser started.
arxiv started.
google started.
duckduckgo started.
scripter started.
USER: List all files in the current dir
ASSISTANT_AIlice: This is a task that requires file system access and should be handled by a coder-proxy agent. I will forward it to a coder-proxy agent: !CALL<!|"coder-proxy", "file_system_agent", "List all files in the current dir."|!>
ASSISTANT_file_system_agent: !BASH<!|ls|!>
SYSTEM_file_system_agent: BASH_RESULT=[ailice
ailice.egg-info
AIlice.png
Dockerfile
IACT.jpg
LICENSE
log
README.md
setup.py
venv

Execution completed.

]

ASSISTANT_file_system_agent: !BASH<!|ls|!>
SYSTEM_file_system_agent: BASH_RESULT=[ailice
ailice.egg-info
AIlice.png
Dockerfile
IACT.jpg
LICENSE
log
README.md
setup.py
venv

Execution completed.

]

ASSISTANT_file_system_agent: !BASH<!|ls|!>
SYSTEM_file_system_agent: BASH_RESULT=[ailice
ailice.egg-info
AIlice.png
Dockerfile
IACT.jpg
LICENSE
log
README.md
setup.py
venv

Execution completed.

]

LLM log 23:24:02-028478 INFO PROMPT= <|im_start|>system

Your responsibility is to communicate with users, understand their needs, and select an appropriate agent to forward the questions or tasks presented by the user. Afterward, you need to organize the results returned by the agent and reply to the user. The agent may ask the user some questions, and you will need to facilitate the communication between them.
When calling an agent, it is important to note that the working language of the agent is English. If the user is communicating in a language other than English, you will need to provide translation between the agent and the user.
If you're not sure an agent type is suitable for a task, create one and ask it first.
You can create and communicate with AI agents using textual function calls. These agents have the ability to solve different types of problems and have different abilities to interact with the external environment.
Always forward tasks to agents with appropriate capabilities. Especially don't do coding tasks yourself.
In order to reduce symbol conflicts, we use special symbols "<!|","|!>" as replacements for commonly used parentheses in function call syntax. Please pay special attention to the syntax when generating function call statements.
The agent can only see the information you send to it through "CALL" function, they can't see anything else you output. So, please provide agents with thorough task descriptions or additional information; don't cut corners.
"ext-modules" is a type of module that implements specific interfaces, defining a set of new function calls. Once loaded, you will be able to use the functions defined within it. For user requests to build or load ext-modules, please forward them to coder-proxy.

#Use the following function to create or communicate with an AI agent:
CALL<!|agentType: str, agentName: str, msg: str|!>

  • agentType: A string used to specify the type of AI agent. It can be of the following types:
    1. "researcher", Conduct an internet investigation on a particular topic or gather data. It also has the capability to execute simple scripts.
    2. "article-digest": literature (local files or URLs on internet) reading comprehension and related question answering.
    3. "coder-proxy", an excellent coder who also has access to bash and python interpreter, he can solve problems programmatically.
      You need to choose the appropriate agentType among the above types according to your needs.
  • agentName: The name of the AI agent instance. Create a new one if not found. Name should reflect responsibility. A new task can be assigned to an existing agent with a relevant name, as they often possess related experience.
  • msg: message need to be told to the agent. When you need to describe a new task in msg, it should be clear, complete, and self-sufficient.

Function calls are limited to positional parameters, please keep the order of parameters consistent with the function definition.
The function parameters cannot include context references. Please ensure that the parameters are comprehensive and do not rely on context.
Function calls need to be placed at the end of your output, and prefixed with "!" to trigger execution.
Sometimes the information returned by the agent is a bit messy, but you can still identify the parts that the user needs and sort it out.

Example:
USER: What's the weather like in New York today?
ASSISTANT: This is an information query task, and I will forward it to a researcher-type agent: !CALL<!|"researcher", "weather_agent", "What's the weather like in New York today?"|!>

End of general instructions.

Active Agents: []
Relevant Information:
None.
The "RELEVANT INFORMATION" part contains data that may be related to the current task, originating from your own history or the histories of other agents. Please refrain from attempting to invoke functions mentioned in the relevant information, as you may not necessarily have the permissions to do so.

<|im_end|>
<|im_start|>user
List all files in the current dir<|im_end|>
<|im_start|>assistant

Llama.generate: prefix-match hit

llama_print_timings: load time = 11984.62 ms
llama_print_timings: sample time = 0.78 ms / 70 runs ( 0.01 ms per token, 89858.79 tokens per second)
llama_print_timings: prompt eval time = 15172.52 ms / 889 tokens ( 17.07 ms per token, 58.59 tokens per second)
llama_print_timings: eval time = 3243.48 ms / 69 runs ( 47.01 ms per token, 21.27 tokens per second)
llama_print_timings: total time = 18493.18 ms / 958 tokens
Output generated in 18.65 seconds (3.70 tokens/s, 69 tokens, context 923, seed 425007066)
23:24:20-766744 INFO PROMPT=
<|im_start|>system

You are an smart and helpful AI agent that helps user complete tasks that require programming. You will complete the task by interacting with agents with coding capabilities (hereinafter referred to as "coder").
Your job is to communicate with the user to understand their needs, provide requirements for the coder, execute the returned code, extract possible running errors information and feed back to coder until the code runs correctly.
For programming tasks that are complex enough to consist of multiple modules, you can consider breaking down the task and assigning it to multiple agents to complete it.
Using function calls, you can create and interact with agents, reply to users, configure the program's running environment and execute programs.
Use special symbols "<!|","|!>" as replacements for parentheses in function call syntax. Please pay special attention to the syntax when generating function call statements.
Only positional parameter function calls are supported, please do not use keyword parameters and keep the order of parameters consistent with the function definition.
References are not supported in parameters of function call statements. Please use clear and straightforward expressions in function calls.
Functions with a single string parameter do not need to write quotation marks when passing parameters. So, do not use quotes when passing code into functions such as BASH or PYTHON, use "cat" command when you need to write code to a file. Avoid the extensive use of escape characters.
Function calls need to be placed at the end of your output, and prefixed with "!" to trigger execution.
You MUST call a function in your output.
Only the most recent rounds of historical conversations will be retained. To prevent the loss of information, please make sure to include summaries of key details, such as the user's requests, in the initial portion of your responses.
If you find that the environment variable A_IN_CONTAINER is predefined, you can install any necessary tools.

"ext-modules" is a type of module that implements specific interfaces, defining a set of new function calls. Once loaded, you will be able to use the functions defined within it.
Do not attempt to build ext-modules unless explicitly requested by the user. It adds complexity to the debugging process.
Please note that for ext-modules, you need to store the code in the current directory before running it, otherwise there will be an error that the python package cannot be found. You also need to run them as separate processes, or you will be stuck.

Your workflow follows these steps (The details of functions such as CALL/PYTHON/BASH/RESPOND will be given below, and for the sake of simplicity in this context, the term 'coder' is used to encompass both 'coder' and 'module-coder'):

  1. Receive user requirements, understand and engage with the user as needed to enhance understanding(use RESPOND).
  2. Choose or create a suitable coder agent, provide a clear and comprehensive description of the requirements to it(use CALL).
  3. Once coder returns the code, your responsibilities include:
    Initial verification of whether the solution provided by the coder meets the requirements. If not, make modification requests to the coder.
    Install the necessary dependencies following coder's instructions and execute the code (use BASH and PYTHON).
    In case of error, send detailed error information to coder for code improvement (use CALL), the message include error and problematic lines of code (for the python case, the code is saved into a temp file like "/tmp/tmp*.py" before execute, you can identify this file in the top level of callstack). Go back to the beginning of step 3 and iterate until success.
    During the iteration, if coder requires knowledge about specific libraries, query relevant knowledge through the "researcher" type agent (use CALL).
  4. Finally, return the execution result to the user (use RESPOND)."

Available Functions:
#Use this function to interact with an AI agent.
CALL<!|agentType: str, agentName: str, msg: str|!>
agentType: A string used to specify the type of AI agent. It can be of the following types:
1. "coder". An excellent coder, you need to leave any programming problems other than ext-modules building to him. It should be noted that he can only return the code to you, but does not have the authority to execute the code and configure the environment, this is where you need to help him.
2. "module-coder". The only agent capable of building ext-modules, and this is its sole responsibility.
3. "module-loader". An agent that can help you load and use ext-modules. Please pass in the address of the module when creating the agent, and then you can let it help you interact with the module. You can use this agent to assist in ext-module debugging.
4. "researcher". Suitable for technical problem search tasks such as library document or sample code search on the internet.
agentName: The name of the AI agent instance. Create a new one if not found. Name should reflect responsibility. A new task can be assigned to an existing agent with a relevant name, as they often possess related experience.
msg: message need to be told to the agent. The agent cannot see content other than msg. Please provide complete content in msg.

#Dialog with user. Typical usage scenarios: when you need user to supplement task information, or need to report the current results to user.
RESPOND<!|message: str|!>

#Execute bash script. A timeout error will occur for programs that have not been completed for a long time.
BASH<!|code: str|!>

#Execute python code. Please note that you need to copy the complete code here, and you must not use references.
PYTHON<!|code: str|!>

#Wait for some seconds. The unit of the "duration" parameter is seconds. Useful when waiting for script or command execution to complete.
WAIT<!|duration: int|!>

EXAMPLE:
!CALL<!|"coder", "clock_coder", "Build a program that can check CPU usage."|!>
!PYTHON<!|print('hello wolrd.')|!>

End of general instructions.

Active Agents: []

Relevant Information: None.
The "RELEVANT INFORMATION" part contains data that may be related to the current task, originating from your own history or the histories of other agents. Please refrain from attempting to invoke functions mentioned in the relevant information, as you may not necessarily have the permissions to do so.

<|im_end|>
<|im_start|>user
List all files in the current dir.<|im_end|>
<|im_start|>assistant

Llama.generate: prefix-match hit

llama_print_timings: load time = 11984.62 ms
llama_print_timings: sample time = 0.29 ms / 26 runs ( 0.01 ms per token, 89965.40 tokens per second)
llama_print_timings: prompt eval time = 21451.36 ms / 1497 tokens ( 14.33 ms per token, 69.79 tokens per second)
llama_print_timings: eval time = 1451.37 ms / 25 runs ( 58.05 ms per token, 17.23 tokens per second)
llama_print_timings: total time = 22934.28 ms / 1522 tokens
Output generated in 23.10 seconds (1.08 tokens/s, 25 tokens, context 1531, seed 194179588)
23:24:43-919787 INFO PROMPT=
<|im_start|>system
Function returned: {BASH_RESULT=[ailice
ailice.egg-info
AIlice.png
Dockerfile
IACT.jpg
LICENSE
log
README.md
setup.py
venv

Execution completed.

]
}<|im_end|>
<|im_start|>user
List all files in the current dir.<|im_end|>
<|im_start|>assistant
!BASH<!|ls|!><|im_end|>
<|im_start|>user
List all files in the current dir.<|im_end|>
<|im_start|>assistant

Llama.generate: prefix-match hit

llama_print_timings: load time = 11984.62 ms
llama_print_timings: sample time = 0.12 ms / 11 runs ( 0.01 ms per token, 90909.09 tokens per second)
llama_print_timings: prompt eval time = 3626.59 ms / 108 tokens ( 33.58 ms per token, 29.78 tokens per second)
llama_print_timings: eval time = 439.92 ms / 10 runs ( 43.99 ms per token, 22.73 tokens per second)
llama_print_timings: total time = 4078.59 ms / 118 tokens
Output generated in 4.25 seconds (2.35 tokens/s, 10 tokens, context 163, seed 1374320967)
23:24:48-588522 INFO PROMPT=
<|im_start|>system
Function returned: {BASH_RESULT=[ailice
ailice.egg-info
AIlice.png
Dockerfile
IACT.jpg
LICENSE
log
README.md
setup.py
venv

Execution completed.

]
}<|im_end|>
<|im_start|>user
List all files in the current dir.<|im_end|>
<|im_start|>assistant
!BASH<!|ls|!><|im_end|>
<|im_start|>assistant
!BASH<!|ls|!><|im_end|>
<|im_start|>user
List all files in the current dir.<|im_end|>
<|im_start|>assistant

Llama.generate: prefix-match hit

llama_print_timings: load time = 11984.62 ms
llama_print_timings: sample time = 0.12 ms / 11 runs ( 0.01 ms per token, 90909.09 tokens per second)
llama_print_timings: prompt eval time = 2768.10 ms / 32 tokens ( 86.50 ms per token, 11.56 tokens per second)
llama_print_timings: eval time = 479.88 ms / 10 runs ( 47.99 ms per token, 20.84 tokens per second)
llama_print_timings: total time = 3259.56 ms / 42 tokens
Output generated in 3.42 seconds (2.92 tokens/s, 10 tokens, context 190, seed 1927657688)
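
One generic mitigation for this pattern, shown purely as an illustrative sketch rather than AIlice's actual behavior, is to detect when an agent keeps emitting the exact same function call and stop re-executing it:

```python
# Illustrative sketch: detect an agent repeating the exact same function call and break the loop.
from collections import deque

class RepetitionGuard:
    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def should_stop(self, output: str) -> bool:
        self.recent.append(output.strip())
        return len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1

guard = RepetitionGuard()
for reply in ['!BASH<!|ls|!>', '!BASH<!|ls|!>', '!BASH<!|ls|!>']:
    if guard.should_stop(reply):
        print("same call repeated; return the last result to the caller instead of re-executing")
```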

Agent lies about tests / build

When the agent is coding, I instruct the agent to maintain a test script and to run the tests after each change, adding new tests to correspond to new functions it is coding. I have done this so far in Python and in Rust.
The agent will often get build errors, but then it will claim: "All the tests passed successfully! What would you like to do next?"
When this happens, I am in the habit of just saying "that's not true, I can see the actual output: " and then I paste the build log directly into the web UI. The agent then corrects itself, "I'm sorry you're right..." and goes off to fix it–but then it happens again (and again).

Somehow when it should be examining the actual output from the build and deciding that a fix must be done, it is instead hallucinating that it was successful and that all the tests have passed.
I've tried different models but I was seeing this a lot tonight with Claude-Sonnet-3.5 (through openrouter).

Congratulations on the quality of this agent BTW. I really like the design choices you made here. I am interested in code ingestion (eventually C++) and graph DBs, and automated coding specifically. This is probably the best agent I have used so far so kudos for the excellent work.

Speech.PrepareModel not downloading voice model

When starting AIlice (both main and web) with the --speechOn flag, I'm getting this error:

Command used:

$ ailice_main --modelID=lm-server:llava --prompt="main" --contextWindowRatio=0.5 --speechOn

Result (truncated):

...
The speech module is preparing speech recognition and TTS models, which may include the work of downloading weight data, so it may take a long time.
Encountered an exception, AIlice is exiting: An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the specified revision on the local disk. Please check your internet connection and try again.
  File "/home/user/ailice/AIliceMain.py", line 104, in main
    mainLoop(**kwargs)
  File "/home/user/ailice/AIliceMain.py", line 56, in mainLoop
    speech.PrepareModel()
  File "/home/user/ailice/common/lightRPC.py", line 102, in methodTemplate
    return self.RemoteCall(methodName,args,kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/ailice/common/lightRPC.py", line 129, in RemoteCall
    raise ret["exception"]```

I am running using LM Studio to run my model (trinity v1.2 7B for testing), and it works fine without speech.

offline huggingface

Because my region is restricted, I would like to ask how to load a Hugging Face model offline.

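If the weights have already been downloaded (or copied from an unrestricted machine), one common approach is to enable offline mode and point transformers at the local snapshot; a minimal sketch with an illustrative local path:

```python
# Sketch: loading a Hugging Face model without network access.
# The local_path below is illustrative; point it at an already-downloaded model directory.
import os
os.environ["HF_HUB_OFFLINE"] = "1"          # block all Hub network calls
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModelForCausalLM, AutoTokenizer

local_path = "/models/Mistral-7B-OpenOrca"  # hypothetical local directory with config + weights
tokenizer = AutoTokenizer.from_pretrained(local_path)
model = AutoModelForCausalLM.from_pretrained(local_path)
```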

Agent apparently rewrites entire code file and loses a bunch of work.

I noticed the agent was working on the code and the file got bigger, and bigger, say 3k, then 6k, then 9k.
...Then at some point the file was 2 or 3k again. Basically the agent just deleted everything it had done and rewrote the file.
I think that code files should not be thought of as "files" to the agent.
Instead, whether the agent writes the code, or ingests it, the agent should determine how to split up the file (into a series of functions for example) and should have a corresponding test(s) for each function, and should only make changes to that specific function.
The entire file if possible (or maybe the header in the case of C++) should still be in the context, but the only part actually being edited should be the 1 specific function, which should be stored in its own place. Then when the agent writes the file to disk, it "compiles" it from those chunks into the final text file that is written. The agent should not have the power to rewrite the entire file at once and re-save it. Otherwise that situation prevents it from systematically working through the problem set, and instead causes it to loop endlessly starting over at the beginning with a rewrite and losing work that it had already done and even debugged.
This happened tonight with Claude-Sonnet-3.5 BTW. But I think I've also seen it with GPT-4o, deepseek-coder and/or qwen2 72b instruct.
BTW -- you may already be doing this, I saw hints of something like this in the log. But I still saw the problem, so maybe I just need to understand the code better so I can help prevent this sort of issue from happening.
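
For what it's worth, here is a minimal sketch of the chunk-per-function idea proposed above (all names are hypothetical, and this is not how AIlice currently stores code):

```python
# Minimal sketch of "compile a source file from per-function chunks" as proposed above.
# All names are hypothetical; this is not AIlice's actual storage mechanism.
from dataclasses import dataclass, field

@dataclass
class SourceFile:
    header: str                                               # e.g. imports / class declaration
    functions: dict[str, str] = field(default_factory=dict)   # name -> full function text

    def update_function(self, name: str, text: str) -> None:
        # The agent may only replace one named chunk at a time,
        # so untouched functions can never be silently dropped.
        self.functions[name] = text

    def compile(self) -> str:
        # Reassemble the on-disk file from the header plus every stored chunk.
        return "\n\n".join([self.header, *self.functions.values()]) + "\n"

f = SourceFile(header="import math")
f.update_function("area", "def area(r):\n    return math.pi * r * r")
open("geometry.py", "w").write(f.compile())
```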

Recommended Python Version?

First off, thank you for creating this! I've been playing with other agent systems, and found AIlice to be quite interesting.

Is there a recommended python version I should be running AIlice in?

Currently running AIlice on a NUC running Ubuntu 23.10 (CPU only, 32GB RAM) on Python 3.11.7. Previously running on the same machine but with Python 3.10.6.

Not actually lightweight ;)

The tagline of AIlice is 'lightweight AI agent', and while the code itself is brilliantly minimal, the ailice virtual environment took up 5.8GB of space on my hard drive.

The biggest contributors to the size are the nvidia package at 2.8GB and torch at 1.6GB.

Would it be possible to not depend on these packages, or to make the dependencies optional?
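
One possible direction is to move the GPU stack into an optional extra; the sketch below is illustrative only, and the extra name and package list are assumptions rather than AIlice's actual setup.py:

```python
# Hypothetical sketch of making the heavy GPU stack an optional extra in setup.py.
from setuptools import setup, find_packages

setup(
    name="ailice",
    packages=find_packages(),
    install_requires=[
        # lightweight core dependencies only
    ],
    extras_require={
        # installed with: pip install -e ".[local]"
        "local": ["torch", "transformers", "accelerate", "bitsandbytes"],
    },
)
```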

Agent often assumes wrong OS

I'm using MacOS and sometimes I see the agent trying to "sudo apt-get install..."
When I notice that, it reminds me to add "You are on MacOS so don't use any linux-specific bash commands" to the prompt when I re-try.
Is this not already in the system prompt?
I will dig into the code but would also like to do a short call if possible to speed me up.

What happens if temperature is null?

Is it possible to just leave temperature null and go with however the model is already configured?
I have models already custom configured in Ollama with specific sampler settings and would like to test with those as they are, but I think the ailice code is setting my temperature to 0 and also doesn't let me set min_p.
I would like option to leave sampler settings null and go with the defaults for the model as configured in ollama.
Also I would like the option to individually set sampler settings for each individual model.
I'm not asking you to work on this, but just letting you know for visibility.
I might submit a PR myself if I get around to it.

Suggestion for claude

Saw in your Readme that you haven't tried it with claude. Would recommend trying openrouter to try out claude. And maybe adding instructions for it. That way swapping models is just a matter of changing one line. Since they have openai compatibility. https://openrouter.ai/docs#quick-start

Add ollama models fails

Something seems to be wrong when I attempt to add ollama models.

I got the following info:

We now start the vector database. Note that this may include downloading the model weights, so it may take some time.
Encountered an exception, AIlice is exiting: Failed to create llama_context
  File "/Users/ziyic/AIlice/ailice/AIliceWeb.py", line 225, in main
    mainLoop(**kwargs)
  File "/Users/ziyic/AIlice/ailice/AIliceWeb.py", line 60, in mainLoop
    msg = storage.Open("")
  File "/Users/ziyic/AIlice/ailice/common/lightRPC.py", line 106, in methodTemplate
    return self.RemoteCall(methodName,args,kwargs)
  File "/Users/ziyic/AIlice/ailice/common/lightRPC.py", line 129, in RemoteCall
    raise ret['exception']

And my config.json just like this

{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "ollama": {
      "modelWrapper": "AModelChatGPT",
      "apikey": "fake-key",
      "baseURL": "http://localhost:4000",
      "modelList": {
        "ollama/codestral:latest": {
          "formatter": "AFormatterGPT",
          "contextWindow": 8192,
          "systemAsUser": false
        }
      }
    }
  },
  "temperature": 8.0,
  "flashAttention2": false,
  "speechOn": false,
  "contextWindowRatio": 0.6,
  "services": {...}
}

Installation path

I would like to know the default installation path where the model weights are downloaded to.

Prompt is not a valid dict.

Well, it is hard for me to use the API from OpenAI, and my computer is not powerful enough to run these models locally.
So I wrote a file, core/llm/AModelQW.py, to access the Qwen model over the Internet. (I have a lot of free tokens to use, so I chose it.)

It works well when I use a command like "list the files in this folder".
But when I give a complex command like "find the *.ppt files on my PC, tell me where they are, and point out the files which may be study materials", it works fine for the first few steps. But after SYSTEM_system_command_agent returns the python script and instructions, the ASSISTANT_system_command_agent passes a tuple as the prompt. This causes an error because Qwen only accepts a dict.
So, is the tuple necessary for AIlice? Should I do something to remove it, or is it just a bug?

AModelQW.py

import os, json

from ailice.common.utils.ATextSpliter import sentences_split
from ailice.common.AConfig import config
from ailice.core.llm.AFormatter import AFormatterQW
from dashscope import Generation

class AModelQW():
    def __init__(self, modelName: str):
        
        self.tokenizer = None
        self.modelName = modelName
        # self.client = openai.OpenAI(api_key = config.openaiGPTKey)
        self.api_key = os.environ.get("QWEN_KEY")
        self.formatter = AFormatterQW(systemAsUser=True)
        # self.contextWindow = 6122
        self.contextWindow=6122*0.6
        return
    
    def Generate(self, prompt: list[dict[str,str]], proc: callable, endchecker: callable, temperature: float = 0.2) -> str:
        # print("======>this is a generate: ", prompt)
        proc(txt='', action='open')
        currentPosition = 0
        text = ""
        responses = Generation.call(model=Generation.Models.qwen_max,
                                    result_format='message',
                                    messages=prompt,
                                    incremental_output=True,
                                    stream=True)
        for chunk in responses:
            if chunk.status_code == 200:
                text += (chunk.output.choices[0]['message']['content'] or "")

                if endchecker(text):
                    break
                
                sentences = [x for x in sentences_split(text[currentPosition:])]
                if (2 &lt;= len(sentences)) and ("" != sentences[0].strip()):
                    proc(txt=sentences[0], action='append')
                    currentPosition += len(sentences[0])
            else:
                print(f"=====&gt;response error, chunk id {chunk.request_id}, chunk status_code {chunk.status_code}, chunk code {chunk.code}, chunk message {chunk.message}")
                print(prompt)
        proc(txt=text[currentPosition:], action='close')
        return text

prompt

([{'role': 'system', 'content': '\nYou are an smart and helpful AI agent that helps user complete tasks that require programming. You will complete the task by interacting with agents with coding capabilities (hereinafter referred to as "coder").\nYour job is to communicate with the user to understand their needs, provide requirements for the coder, execute the returned code, extract possible running errors information and feed back to coder until the code runs correctly.\nFor programming tasks that are complex enough to consist of multiple modules, you can consider breaking down the task and assigning it to multiple agents to complete it.\nUsing function calls, you can create and interact with agents, reply to users, configure the program\'s running environment and execute programs.\nUse special symbols "<!|","|!>" as replacements for parentheses in function call syntax. Please pay special attention to the syntax when generating function call statements.\nOnly positional parameter function calls are supported, please do not use keyword parameters and keep the order of parameters consistent with the function definition.\nReferences are not supported in parameters of function call statements. Please use clear and straightforward expressions in function calls.\nFunctions with a single string parameter do not need to write quotation marks when passing parameters. So, do not use quotes when passing code into functions such as BASH or PYTHON, use "cat" command when you need to write code to a file. Avoid the extensive use of escape characters.\nFunction calls need to be placed at the end of your output, and prefixed with "!" to trigger execution.\nYou MUST call a function in your output.\nOnly the most recent rounds of historical conversations will be retained. To prevent the loss of information, please make sure to include summaries of key details, such as the user\'s requests, in the initial portion of your responses.\nIf you find that the environment variable A_IN_CONTAINER is predefined, you can install any necessary tools.\n\n"ext-modules" is a type of module that implements specific interfaces, defining a set of new function calls. Once loaded, you will be able to use the functions defined within it.\nDo not attempt to build ext-modules unless explicitly requested by the user. It adds complexity to the debugging process.\nPlease note that for ext-modules, you need to store the code in the current directory before running it, otherwise there will be an error that the python package cannot be found. You also need to run them as separate processes, or you will be stuck.\n\nYour workflow follows these steps (The details of functions such as CALL/PYTHON/BASH/RESPOND will be given below, and for the sake of simplicity in this context, the term \'coder\' is used to encompass both \'coder\' and \'module-coder\'):\n1. Receive user requirements, understand and engage with the user as needed to enhance understanding(use RESPOND).\n2. Choose or create a suitable coder agent, provide a clear and comprehensive description of the requirements to it(use CALL).\n3. Once coder returns the code, your responsibilities include:\n    Initial verification of whether the solution provided by the coder meets the requirements. 
If not, make modification requests to the coder.\n    Install the necessary dependencies following coder\'s instructions and execute the code (use BASH and PYTHON).\n    In case of error, send detailed error information to coder for code improvement (use CALL), the message include error and problematic lines of code (for the python case, the code is saved into a temp file like "/tmp/tmp*.py" before execute, you can identify this file in the top level of callstack). Go back to the beginning of step 3 and iterate until success.\n    During the iteration, if coder requires knowledge about specific libraries, query relevant knowledge through the "researcher" type agent (use CALL).\n4. Finally, return the execution result to the user (use RESPOND)."\n\nAvailable Functions:\n#Use this function to interact with an AI agent.\nCALL<!|agentType: str, agentName: str, msg: str|!>\nagentType: A string used to specify the type of AI agent. It can be of the following types:\n    1. "coder". An excellent coder, you need to leave any programming problems other than ext-modules building to him. It should be noted that he can only return the code to you, but does not have the authority to execute the code and configure the environment, this is where you need to help him.\n    2. "module-coder". The only agent capable of building ext-modules, and this is its sole responsibility.\n    3. "module-loader". An agent that can help you load and use ext-modules. Please pass in the address of the module when creating the agent, and then you can let it help you interact with the module. You can use this agent to assist in ext-module debugging.\n    4. "researcher". Suitable for technical problem search tasks such as library document or sample code search on the internet.\nagentName: The name of the AI agent instance. Create a new one if not found. Please use a concise name that reflects the agent’s responsibilities.\nmsg: message need to be told to the agent. The agent cannot see content other than msg. Please provide complete content in msg.\n\n#Dialog with user. Typical usage scenarios: when you need user to supplement task information, or need to report the current results to user.\nRESPOND<!|message: str|!>\n\n#Execute bash script. A timeout error will occur for programs that have not been completed for a long time.\nBASH<!|code: str|!>\n\n#Execute python code. Please note that you need to copy the complete code here, and you must not use references.\nPYTHON<!|code: str|!>\n\n#Wait for some seconds. The unit of the "duration" parameter is seconds. Useful when waiting for script or command execution to complete. \nWAIT<!|duration: int|!>\n\nEXAMPLE:\n!CALL<!|"coder", "clock_coder", "Build a program that can check CPU usage."|!>\n!PYTHON<!|print(\'hello wolrd.\')|!>\n\n\nEnd of general instructions.\n\nActive Agents: [\'file_finder: agentType coder\']\n\nRelevant Information: !CALL<!|"coder", "file_finder", "Write a Python script to find all .ppt files on the system and list their directories, with emphasis on those likely to contain study materials"|!>\n\nBased on the user\'s request,\nThe "RELEVANT INFORMATION" part contains data that may be related to the current task, originating from your own history or the histories of other agents. 
Please refrain from attempting to invoke functions mentioned in the relevant information, as you may not necessarily have the permissions to do so.\n\n'}, {'role': 'user', 'content': 'Function returned: {Agent file_finder returned: To create a Python script that finds all PowerPoint files (with a .ppt extension) on the system and lists their directories, with emphasis on those likely to contain study materials, we\'ll use the `os` and `glob` modules. The "emphasis" part is subjective, but we can consider directories containing keywords like "study," "education," "class," etc.\n\n```python\nimport os\nimport glob\n\n# List of keywords to identify directories potentially containing study materials\nstudy_keywords = ["study", "education", "class", "lecture", "presentation"]\n\ndef find_ppt_files():\n    # Find all ppt files in the file system\n    all_ppt_files = []\n    for root, dirs, files in os.walk("/"):\n        ppt_files = [os.path.join(root, f) for f in files if f.endswith(".ppt")]\n        all_ppt_files.extend(ppt_files)\n\n    # Filter directories based on study keywords\n    study_material_directories = set()\n    for ppt_file_path in all_ppt_files:\n        directory = os.path.dirname(ppt_file_path)\n        for keyword in study_keywords:\n            if keyword.lower() in directory.lower():\n                study_material_directories.add(directory)\n                break  # Add the directory only once even if it matches multiple keywords\n\n    # Print all ppt files along with their directories\n    print("All PowerPoint (.ppt) files:")\n    for ppt_file in all_ppt_files:\n        print(f"- {ppt_file}")\n\n    print("\\nDirectories likely to contain study materials:")\n    for dir in study_material_directories:\n        print(f"- {dir}")\n\nif __name__ == "__main__":\n    find_ppt_files()\n```\n\n**Instructions:**\n1. Replace the starting directory (`"/"`) with the appropriate starting directory if you don\'t want to search the entire file system.\n2. Run this script as a standalone program; it will list all .ppt files and highlight directories that might contain study materials.\n\nPlease note that searching the entire file system could take a significant amount of time depending on its size and structure. This script does not check the content of the PowerPoint files, only the directory names.\n}'}], 2221)

Adding ollama support fails

I tried to add an ollama model to /home/user/.config/ailice/config.json as explained in the readme:

(env) user@debian-ai:~/AIlice$ ailice_main --modelID=ollama:llama2:latest --prompt="main"
config.json is located at /home/user/.config/ailice

In order to simplify installation and usage, we have set local execution as the default behavior, which means AI has complete control over the local environment. To prevent irreversible losses due to potential AI errors, you may consider one of the following two methods: the first one, run AIlice in a virtual machine; the second one, install Docker, use the provided Dockerfile to build an image and container, and modify the relevant configurations in config.json. For detailed instructions, please refer to the documentation.
killing proc with PID 27298
killing proc with PID 27299
killing proc with PID 27302
killing proc with PID 27303
killing proc with PID 27305
killing proc with PID 27308
killing proc with PID 27309
storage  started.
browser  started.
arxiv  started.
google  started.
duckduckgo  started.
scripter  started.
computer  started.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
We now start the vector database. Note that this may include downloading the model weights, so it may take some time.
Vector database has been started. returned msg: vector database has been switched to a non-persistent version. tokenizer: bert-base-uncased, model: nomic-ai/nomic-embed-text-v1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Encountered an exception, AIlice is exiting: 'llama2:latest'
  File "/home/user/AIlice/ailice/AIliceMain.py", line 126, in main
    mainLoop(**kwargs)
  File "/home/user/AIlice/ailice/AIliceMain.py", line 91, in mainLoop
    llmPool.Init([modelID])
  File "/home/user/AIlice/ailice/core/llm/ALLMPool.py", line 21, in Init
    self.pool[id] = MODEL_WRAPPER_MAP[config.models[modelType]["modelWrapper"]](modelType=modelType, modelName=modelName)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/AIlice/ailice/core/llm/AModelChatGPT.py", line 16, in __init__
    modelCfg = config.models[modelType]["modelList"][modelName]
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^

the config.json looks like this:

 {
  "maxMemory": {},
  "quantization": null,
  "models": {
    "hf": {...},
    "peft": {...},
    "oai": {...},
    "groq": {...},
    "mistral": {...},
    "ollama": {
      "modelWrapper": "AModelChatGPT",
      "apikey": "fakekey",
      "baseURL": "http://localhost:4000",
      "modelList": {
        "ollama/llama2:latest": {
          "contextWindow": 8192,
          "systemAsUser": false
        }
      }
    },
    "anthropic": {...}
  },
  "temperature": 0.0,
  "flashAttention2": false,
  "speechOn": false,
  "contextWindowRatio": 0.6,
  "services": {...}
}
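
Judging from the KeyError ('llama2:latest') and the lookup in AModelChatGPT.py (`config.models[modelType]["modelList"][modelName]`), the modelList key apparently has to match the part of `--modelID` after the first colon, i.e. `llama2:latest` rather than `ollama/llama2:latest`. A minimal sketch of an ollama section that would pass that lookup — the baseURL and context window are simply the values from the report above, and whether the backend additionally needs an `ollama/` prefix in the name it sends is a separate question:

```json
"ollama": {
  "modelWrapper": "AModelChatGPT",
  "apikey": "fakekey",
  "baseURL": "http://localhost:4000",
  "modelList": {
    "llama2:latest": {
      "contextWindow": 8192,
      "systemAsUser": false
    }
  }
}
```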

Cannot build Docker container using provided Dockerfile

Hey there,
when I build the image from the Dockerfile using
docker build -t env4scripter .
as explained in the documentation, I get the following error.

 > [10/10] RUN pip3 install pyzmq:
0.580 error: externally-managed-environment
0.580
0.580 × This environment is externally managed
0.580 ╰─> To install Python packages system-wide, try apt install
0.580     python3-xyz, where xyz is the package you are trying to
0.580     install.
0.580
0.580     If you wish to install a non-Debian-packaged Python package,
0.580     create a virtual environment using python3 -m venv path/to/venv.
0.580     Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
0.580     sure you have python3-full installed.
0.580
0.580     If you wish to install a non-Debian packaged Python application,
0.580     it may be easiest to use pipx install xyz, which will manage a
0.580     virtual environment for you. Make sure you have pipx installed.
0.580
0.580     See /usr/share/doc/python3.12/README.venv for more information.
0.580
0.580 note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
0.580 hint: See PEP 668 for the detailed specification.
------
Dockerfile:14
--------------------
  12 |     COPY ailice/modules/AScrollablePage.py /scripter/ailice/modules/AScrollablePage.py
  13 |
  14 | >>> RUN pip3 install pyzmq
  15 |
  16 |     EXPOSE 59000-59200
--------------------
ERROR: failed to solve: process "/bin/sh -c pip3 install pyzmq" did not complete successfully: exit code: 1

I managed to bypass this issue by creating a virtual environment.
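
For reference, the error message itself points to the two standard workarounds on Debian-based images with PEP 668 enabled. A minimal sketch of the relevant Dockerfile lines — the /scripter/venv path is illustrative, chosen only to match the /scripter paths shown in the build output above:

```dockerfile
# Option 1: install into a dedicated virtual environment inside the image
RUN python3 -m venv /scripter/venv
ENV PATH="/scripter/venv/bin:$PATH"
RUN pip3 install pyzmq

# Option 2: override the externally-managed-environment check (riskier)
# RUN pip3 install pyzmq --break-system-packages
```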

However, I encountered several issues even after creating a working Docker container.

First of all, several Python packages are missing now, like numpy.
Furthermore, after installing those missing libraries, access to the AIlice modules is not working either and needs to be fixed.
So far, I have not found a solution to this issue.

Thus, I'm not able to use the dockerized method right now.

module storage failed

PS C:\Users\ZIWIG\Downloads\ScraperAI\gpt4V-scraper\AIlice> ailice_web --modelID=groq:llama3-70b-8192
config.json is located at C:\Users\ZIWIG\AppData\Local\Steven Lu\ailice
In order to simplify installation and usage, we have set local execution as the default behavior, which means AI has complete control over the local environment. To prevent irreversible losses due to potential AI errors, you may consider one of the following two methods: the first one, run AIlice in a virtual machine; the second one, install Docker, use the provided Dockerfile to build an image and container, and modify the relevant configurations in config.json. For detailed instructions, please refer to the documentation.
storage started.
browser started.
arxiv started.
google started.
duckduckgo started.
scripter started.
computer started.
Connecting module storage FAILED. You can try running the module manually and observe its error messages.
Connecting module storage FAILED. You can try running the module manually and observe its error messages.
Connecting module storage FAILED. You can try running the module manually and observe its error messages.
Connecting module storage FAILED. You can try running the module manually and observe its error messages.
Connecting module storage FAILED. You can try running the module manually and observe its error messages.
It seems that some peripheral module services failed to start. EXCEPTION: Resource temporarily unavailable
File "C:\Users\ZIWIG\Downloads\ScraperAI\gpt4V-scraper\AIlice\ailice\AIliceWeb.py", line 47, in mainLoop
clientPool.Init()
File "C:\Users\ZIWIG\Downloads\ScraperAI\gpt4V-scraper\AIlice\ailice\common\ARemoteAccessors.py", line 17, in Init
raise e
File "C:\Users\ZIWIG\Downloads\ScraperAI\gpt4V-scraper\AIlice\ailice\common\ARemoteAccessors.py", line 14, in Init
self.pool[cfg['addr']] = makeClient(cfg['addr'])
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ZIWIG\Downloads\ScraperAI\gpt4V-scraper\AIlice\ailice\common\lightRPC.py", line 126, in makeClient
ret=ReceiveMsg(socket)
^^^^^^^^^^^^^^^^^^
File "C:\Users\ZIWIG\Downloads\ScraperAI\gpt4V-scraper\AIlice\ailice\common\lightRPC.py", line 27, in ReceiveMsg
return pickle.loads(conn.recv())
^^^^^^^^^^^
File "_zmq.py", line 1137, in zmq.backend.cython._zmq.Socket.recv
File "_zmq.py", line 1172, in zmq.backend.cython._zmq.Socket.recv
File "_zmq.py", line 1264, in zmq.backend.cython._zmq._recv_copy
File "_zmq.py", line 1259, in zmq.backend.cython._zmq._recv_copy
File "_zmq.py", line 152, in zmq.backend.cython._zmq._check_rc

can't run correctly

Connecting module browser FAILED. You can try running the module manually and observe its error messages.


AIlice start error

Installed through Docker.

ailice_web --modelID=hf:openchat/openchat_3.5 --prompt="main" --quantization=8bit --contextWindowRatio=0.6                                   

config.json is located at /home/hakuju/.config/ailice/config.json
In order to simplify installation and usage, we have set local execution as the default behavior, which means AI has complete control over the local environment. To prevent irreversible losses due to potential AI errors, you may consider one of the following two methods: the first one, run AIlice in a virtual machine; the second one, install Docker, use the provided Dockerfile to build an image and container, and modify the relevant configurations in config.json. For detailed instructions, please refer to the documentation.
The port range of the ext-modules has been changed from 2005-2016 to 59000-59200. If you are using an old version, startup failure will occur after updating the code. Please modify the port number in config.json and rebuild the docker image.
kill: (1198047): Operation not permitted
kill: (1212154): No such process
scripter  started.
Encountered an exception, AIlice is exiting: 'storage'
  File "/data/AIlice/ailice/AIliceWeb.py", line 96, in main
    mainLoop(**kwargs)
  File "/data/AIlice/ailice/AIliceWeb.py", line 42, in mainLoop
    clientPool.Init()
  File "/data/AIlice/ailice/common/ARemoteAccessors.py", line 10, in Init
    storage = config.services['storage']['addr']

config.json:

"maxMemory": {},
  "quantization": null,
  "openaiGPTKey": null,
  "temperature": 0.0,
  "flashAttention2": false,
  "speechOn": false,
  "contextWindowRatio": 0.6,
  "services": {
    "scripter": {
      "cmd": "docker start scripter",
      "addr": "tcp://127.0.0.1:59000"
    }
  }
}

cpu: arm64,
os: ubuntu 20.04
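
The traceback shows the startup code reading `config.services['storage']['addr']`, so the pasted config (which only defines a scripter service) apparently predates the current set of modules. A minimal sketch of the shape the services section seems to expect — the storage command and port here are placeholders, not verified values; regenerating config.json with the current version, or copying the full services block from a fresh install, is likely the safer fix:

```json
"services": {
  "storage": {
    "cmd": "<command that starts the storage module>",
    "addr": "tcp://127.0.0.1:59001"
  },
  "scripter": {
    "cmd": "docker start scripter",
    "addr": "tcp://127.0.0.1:59000"
  }
}
```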

Cannot install on Windows

I created a new, clean Python environment.
Then the following commands all work:

git clone https://github.com/myshell-ai/AIlice.git
cd AIlice
pip install -e .

But when I try the next command, pip install -e .[speech], it fails with these errors...
fail.txt

Any ideas? Thanks.

Graph DB?

Please give details of where/how I can help integrate a graph DB into this project. Pointers, bullets, etc.
I am interested in C++ ingestion: point it at a directory and take in all the files, chunk them up with each function in its own chunk, make summaries of all the chunks that contain exact scope-resolved names, make embeddings of the summaries, and extract all the entities/relationships into a graph.
I have made some progress on this on my own by parsing doxygen's XML output, since doxygen has already done the hard work of extracting all the entities/relationships... I would rather contribute here, since AIlice IMO is doing it right and I don't want to reinvent the wheel. (A rough sketch of the doxygen-based approach is included below.)
EDIT: I also want this to be a general solution that works equally for Rust, Python, etc.
Basically, I want some kind of structure where it is forced to write a test function first, then write the implementation of the actual function, then make sure it runs and the test passes -- systematically, piece by piece, until it's all done. I don't want it to step all over the work it did previously, nor do I want it to move on to coding the next feature until it has first ensured that the most recent change passes the tests; only then should it continue to the next piece. Even while debugging, it shouldn't lose the work it has already done, because otherwise it will burn a lot of money debugging something, only to delete that work later.
What does it do now, and where does it need to go, and is there any specific area I can pitch in? Details please to speed me up.
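
For what it's worth, here is a minimal sketch of the doxygen-XML route described above. Element names follow doxygen's XML output schema (compounddef / compoundname / memberdef / basecompoundref), networkx is just one convenient graph container, and none of this reflects existing AIlice code:

```python
# Sketch: turn doxygen XML output into a simple entity/relationship graph.
# Assumes doxygen was run with GENERATE_XML = YES.
import glob
import os
import xml.etree.ElementTree as ET

import networkx as nx  # one possible graph container, not an AIlice dependency


def ingest_doxygen_xml(xml_dir: str) -> nx.DiGraph:
    graph = nx.DiGraph()
    for path in glob.glob(os.path.join(xml_dir, "*.xml")):
        root = ET.parse(path).getroot()
        for compound in root.iter("compounddef"):
            cname_el = compound.find("compoundname")
            if cname_el is None:
                continue
            cname = cname_el.text
            graph.add_node(cname, kind=compound.get("kind"))
            # "contains" edges: compound -> each member (functions, variables, ...)
            for member in compound.iter("memberdef"):
                mname = member.findtext("name", default="")
                qualified = f"{cname}::{mname}"  # scope-resolved name, usable as the chunk/summary key
                graph.add_node(qualified, kind=member.get("kind"))
                graph.add_edge(cname, qualified, rel="contains")
            # "inherits" edges from base class references, when present
            for base in compound.iter("basecompoundref"):
                graph.add_edge(cname, base.text, rel="inherits")
    return graph


if __name__ == "__main__":
    g = ingest_doxygen_xml("doxygen/xml")  # path is illustrative
    print(g.number_of_nodes(), "entities,", g.number_of_edges(), "relationships")
```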

502 Network Error

(base) PS G:\AIlice> conda activate AIlice
(AIlice) PS G:\AIlice> ailice_web --modelID=lm-studio:TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q5_K_M.gguf --prompt="main" --contextWindowRatio=0.5
config.json is located at C:\Users\29099\AppData\Local\Steven Lu\ailice
In order to simplify installation and usage, we have set local execution as the default behavior, which means AI has complete control over the local environment. To prevent irreversible losses due to potential AI errors, you may consider one of the following two methods: the first one, run AIlice in a virtual machine; the second one, install Docker, use the provided Dockerfile to build an image and container, and modify the relevant configurations in config.json. For detailed instructions, please refer to the documentation.
storage  started.
browser  started.
arxiv  started.
google  started.
duckduckgo  started.
scripter  started.
files  started.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
We now start the vector database. Note that this may include downloading the model weights, so it may take some time.
Vector database has been started. returned msg: vector database has been switched to a non-persistent version. tokenizer: bert-base-uncased, model: nomic-ai/nomic-embed-text-v1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
ASSISTANT_AIlice:  Exception in thread Thread-6:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "G:\AIlice\ailice\core\AProcessor.py", line 83, in __call__
    ret = self.llm.Generate(prompt, proc=partial(self.outputCB, "ASSISTANT_" + self.name), endchecker=self.interpreter.EndChecker, temperature = config.temperature)
  File "G:\AIlice\ailice\core\llm\AModelChatGPT.py", line 26, in Generate
    for chunk in self.client.chat.completions.create(model=self.modelName,
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\_utils\_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\resources\chat\completions.py", line 663, in create
    return self._post(
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\_base_client.py", line 1200, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\_base_client.py", line 889, in request
    return self._request(
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\_base_client.py", line 965, in _request
    return self._retry_request(
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\_base_client.py", line 1013, in _retry_request
    return self._request(
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\_base_client.py", line 965, in _request
    return self._retry_request(
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\_base_client.py", line 1013, in _retry_request
    return self._request(
  File "C:\Users\29099\AppData\Roaming\Python\Python310\site-packages\openai\_base_client.py", line 980, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 502
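
A 502 from the OpenAI-compatible client usually just means the local backend was unreachable or crashed mid-request. Assuming LM Studio's local server is being used here, a quick sanity check of the endpoint that the corresponding baseURL in config.json should point to (port 1234 is LM Studio's default; adjust to whatever port the server actually listens on):

```bash
curl http://localhost:1234/v1/models
```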
