
auto-llama-cpp's Introduction

Note

If you are interested in locally running autonomous agents, also have a look at this project of mine. It's much cleaner, more stable and under more active development.

Auto-Llama-cpp: An Autonomous Llama Experiment

This is a fork of Auto-GPT with added support for running LLaMA models locally through llama.cpp. It's more of a proof of concept: it's sloooow, and most of the time you're either fighting the too-small context window or the model's answer isn't valid JSON. But sometimes it works, and then it's really quite magical what even such a small model comes up with. Just don't expect GPT-4 brilliance here.
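Under the hood, the agent's chat-completion calls are answered by a local model through llama-cpp-python instead of the OpenAI API. A minimal sketch of that idea (the model path, stop tokens, and sampling parameters below are placeholders, not the repo's exact values):

```python
from llama_cpp import Llama

# Load a local GGML model instead of talking to the OpenAI API.
# The path and context size are placeholders; adjust them to your setup.
llm = Llama(model_path="./models/ggml-vicuna-13b-4bit.bin", n_ctx=2048, embedding=True)

# A single self-prompting step: the agent's prompt goes straight to llama.cpp.
response = llm(
    "You are an autonomous agent. Respond only with valid JSON.",
    stop=["### Human:"],
    echo=False,
    temperature=0.7,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```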

Supported Models


Since this uses llama.cpp under the hood, it should work with all models llama.cpp supports. As of writing, these are:

  • LLaMA
  • Alpaca
  • GPT4All
  • Chinese LLaMA / Alpaca
  • Vigogne (French)
  • Vicuna
  • Koala

Model Performance (the experience so far)


Response Quality

So far I have tried

  • Vicuna-13b-4BIT
  • LLama-13B-4BIT

Overall, the Vicuna model performed much better than the original LLaMA model, both in answering in the required JSON format and in how much sense the answers make. I just couldn't get it to stop starting every answer with ### ASSISTANT. I am very curious to hear how well other models perform. The 7B models seemed to have problems grasping what's asked of them in the prompt, but I tried very little in this direction since the inference speed didn't seem to be much faster for me.
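One pragmatic workaround for the ### ASSISTANT prefix (not something this repo does, just a sketch of the idea) is to strip such role markers before the reply is handed to the JSON parser:

```python
def strip_role_prefix(reply: str) -> str:
    # Hypothetical helper: remove a leading "### ASSISTANT" role marker
    # so the JSON parser sees the opening '{' first.
    cleaned = reply.lstrip()
    for prefix in ("### ASSISTANT:", "### ASSISTANT"):
        if cleaned.startswith(prefix):
            return cleaned[len(prefix):].lstrip()
    return cleaned
```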

Inference Speed

The biggest problem at the moment is indeed inference speed. As the agent is self-prompting a lot, a few seconds of inference that are acceptable in a chatbot scenario become minutes and more. Testing things like different prompts is a pain under these conditions.

Discussion

Feel free to add your thoughts and experiences in the discussion area. What models did you try? How well did they work out for you?

Future Plans


  1. Add GPU Support via GPTQ
  2. Improve Prompts
  3. Remove external API support (this is supposed to be a completely self-contained agent)
  4. Add support for Open Assistant models


auto-llama-cpp's Issues

Error message when calling scripts/main.py

Duplicates

  • I have searched the existing issues

Steps to reproduce

I pulled the git repository, edited my copy of .env to suit my needs, and even recreated a run.bat from the original AutoGPT project, adding the check_requirements.py script to the scripts (why was it actually removed?).
Using a conda environment with Python 3.10.10
Pip successfully installed all requirements

Current behavior

Now calling
python scripts/main.py
results in:

Traceback (most recent call last):
  File "E:\LLama\Auto-Llama-cpp\scripts\main.py", line 3, in <module>
    import commands as cmd
  File "E:\LLama\Auto-Llama-cpp\scripts\commands.py", line 1, in <module>
    import browse
  File "E:\LLama\Auto-Llama-cpp\scripts\browse.py", line 4, in <module>
    from llm_utils import create_chat_completion
  File "E:\LLama\Auto-Llama-cpp\scripts\llm_utils.py", line 7
    def create_chat_completion(messages[0]["content"], model=None, temperature=cfg.temperature, max_tokens=0)->str:
                                       ^
SyntaxError: invalid syntax

What am I doing wrong?
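For context, the traceback points at a genuinely invalid signature: a Python parameter list can't contain a subscript expression like messages[0]["content"]. A syntactically valid version would take the messages list as a plain parameter, roughly like this (a sketch, not necessarily the project's intended fix; the defaults shown are assumptions):

```python
def create_chat_completion(messages, model=None, temperature=None, max_tokens=0) -> str:
    # 'messages' is the full list of chat messages; indexing into it
    # belongs in the function body, not in the parameter list.
    ...
```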

Expected behavior

It never executes as it should and doesn't seem to find my model.
I'm not sure if that is the problem or if it's even earlier.

Your prompt

There is no last_run_ai_settings.yaml, because it never executes.

What am I doing wrong?

Memory Error -- shapes (0,8192) and (5120,) not aligned: 8192 (dim 1) != 5120 (dim 0)

After thinking, I got the following error (on an Ubuntu 22.04 VM):

Using memory of type: LocalCache
| Thinking...
llama_print_timings: load time = 629.49 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 629.34 ms / 2 tokens ( 314.67 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 629.68 ms
Traceback (most recent call last):
  File "/data/Auto-Llama-cpp/scripts/main.py", line 331, in <module>
    assistant_reply = chat.chat_with_ai(
  File "/data/Auto-Llama-cpp/scripts/chat.py", line 77, in chat_with_ai
    relevant_memory = permanent_memory.get_relevant(str(full_message_history[-5:]), 10)
  File "/data/Auto-Llama-cpp/scripts/memory/local.py", line 105, in get_relevant
    scores = np.dot(self.data.embeddings, embedding)
  File "<__array_function__ internals>", line 5, in dot
ValueError: shapes (0,8192) and (5120,) not aligned: 8192 (dim 1) != 5120 (dim 0)
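The shapes in the error suggest the local memory cache was initialised with EMBED_DIM=8192 while the 13B model returns 5120-dimensional embeddings, so the dot product can't line up. A minimal reproduction of the mismatch (the implied fix, making EMBED_DIM match the model's embedding size, is an assumption about this setup):

```python
import numpy as np

# Cache matrix created with EMBED_DIM columns (8192 here) vs. a 5120-dim
# query embedding from the model: np.dot cannot align the shapes.
cache = np.zeros((0, 8192))
query = np.zeros(5120)
np.dot(cache, query)  # ValueError: shapes (0,8192) and (5120,) not aligned
```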

Running the app in Docker, but it cannot find the EMBED_DIM var.

Duplicates

  • I have searched the existing issues

Steps to reproduce

  1. Using the ggml-vicuna-13b-4bit.bin model
  2. Changed the .env file (from the default)

SMART_LLM_MODEL=./models/ggml-vicuna-13b-4bit.bin
FAST_LLM_MODEL=./models/ggml-vicuna-13b-4bit.bin
EMBED_DIM = 8192
  3. Running docker build -t foo/auto-llama .
  4. Running docker run -p80:3000 foo/auto-llama

Current behavior

docker run -p80:3000 foo/auto-llama
Traceback (most recent call last):
  File "/app/main.py", line 3, in <module>
    import commands as cmd
  File "/app/commands.py", line 1, in <module>
    import browse
  File "/app/browse.py", line 4, in <module>
    from llm_utils import create_chat_completion
  File "/app/llm_utils.py", line 4, in <module>
    cfg = Config()
          ^^^^^^^^
  File "/app/config.py", line 18, in __call__
    cls._instances[cls] = super(
                          ^^^^^^
  File "/app/config.py", line 69, in __init__
    self.EMBED_DIM = int(os.getenv("EMBED_DIM"))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
(base)

Expected behavior

Was hoping the app would run after the steps above.
I'm sure I'm misconfiguring the setup.
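For what it's worth, the int() TypeError just means EMBED_DIM is not visible inside the container; values edited in .env only reach the container if they are passed in (for example with docker run's --env-file option, as in the run command quoted in a later issue). A defensive default in the config would also avoid the crash (a sketch; the 5120 default is an assumption for 13B models):

```python
import os

# Fall back to a default instead of int(None) when EMBED_DIM is unset
# in the container environment. 5120 matches 13B LLaMA-family embeddings.
EMBED_DIM = int(os.getenv("EMBED_DIM", "5120"))
```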

Your prompt

# Paste your prompt here

Inference time slow: running llama.cpp in child processes doesn't use full CPU capacity

Duplicates

  • I have searched the existing issues

Steps to reproduce

npm start

./test-installation.sh

Current behavior

On a Mac Mini M1 with 8 threads, llama.cpp is way slower than expected.
It only uses 20-30% of the available CPU for each worker.
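If the workers are leaving cores idle, one thing worth checking (an assumption, not a confirmed fix for this issue) is whether the llama-cpp-python binding is given an explicit thread count:

```python
import multiprocessing
from llama_cpp import Llama

# Pass n_threads explicitly so each llama.cpp worker can use the
# available cores instead of whatever default the binding picks.
llm = Llama(
    model_path="./models/ggml-vicuna-13b-4bit.bin",  # placeholder path
    n_threads=multiprocessing.cpu_count(),
)
```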

Expected behavior

Should use 100% of resources for each thread.

Your prompt

N/A

Error Message after "thinking"

Duplicates

  • I have searched the existing issues

Steps to reproduce

You start the program by executing the main.py file.
Then you press "y" and Enter.
After a couple of minutes you will get an error message.
I am using the Vicuna 13B model.

Current behavior

I press Enter and this is the output after letting it "think".

AutoGPT INFO Error:
Traceback (most recent call last):
  File "C:\Users\alexr\Documents\Auto-Llama-cpp\scripts\main.py", line 79, in print_assistant_thoughts
    assistant_reply_json = fix_and_parse_json(assistant_reply)
  File "C:\Users\alexr\Documents\Auto-Llama-cpp\scripts\json_parser.py", line 52, in fix_and_parse_json
    brace_index = json_str.index("{")
ValueError: substring not found
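The crash happens because fix_and_parse_json calls str.index("{") on a reply that contains no JSON at all. A more tolerant extraction could look roughly like this (a sketch of the idea, not the repo's implementation):

```python
import json

def try_extract_json(reply: str):
    # find()/rfind() return -1 instead of raising, so a reply without
    # any JSON object can be reported gracefully instead of crashing.
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end == -1 or end < start:
        return None
    try:
        return json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
```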

Expected behavior

I think it should continue with the process.

Your prompt

# Paste your prompt here

How to run with CUDA

As newer versions of llama.cpp support GPU acceleration, how can we use that with this?

I am new here.

Dockerfile label error: image ends up dangling (easy fix for you if you want :D)

Duplicates

  • I have searched the existing issues

Steps to reproduce

=> => exporting layers 8.5s
=> => exporting manifest sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4 0.0s
=> => exporting config sha256:00a84a43fa0487344f94620038375e6d5607fc5f482c144bf8e938ceb7c76803 0.0s
=> => naming to dangling@sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4 0.0s
=> => unpacking to dangling@sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4

But something like the Dockerfile below perks it right up; the source image can be swapped for ROCm, Intel, ARM, etc. I have a CUDA GPU, so I played to my strong suit.

# Use an official CUDA runtime as a parent image
FROM nvidia/cuda:11.5.0-runtime-ubuntu20.04

# Install Python and any necessary dependencies
RUN apt-get update && apt-get install python3.11 python3-pip -y

# Set the working directory to /app
WORKDIR /app

# Copy the scripts and requirements.txt files into the container at /app
COPY scripts/ /app/scripts/
COPY requirements.txt /app/

# Install any necessary Python packages
RUN ls //requirements.txt|xargs -n 1 -P 3 pip install -r

# Set any necessary environment variables
ENV CUDA_VISIBLE_DEVICES=all

# Set the command to run when the container starts
CMD ["python3.11", "/bin/bash"]

Current behavior

Failure to store and run the image.

Expected behavior

Store and run the image.

Your prompt

# Paste your prompt here
Doesn't get that far.

json loads error Expecting value: line 1 column 1 (char 0)

Duplicates

  • I have searched the existing issues

Steps to reproduce

I have the same problem when running any model. I tried different versions of Vicuna, since the original 13B also gives the same problem. I'm running it as it comes; I just added the model to the .env file, and the prompt is the default one.

Llama.generate: prefix-match hit
| Thinking...
llama_print_timings: load time = 1466.45 ms
llama_print_timings: sample time = 30.25 ms / 31 runs ( 0.98 ms per run)
llama_print_timings: prompt eval time = 768713.51 ms / 987 tokens ( 778.84 ms per token)
llama_print_timings: eval time = 24162.74 ms / 30 runs ( 805.42 ms per run)
llama_print_timings: total time = 794693.56 ms
Assistent Reply If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json loads error Expecting value: line 1 column 1 (char 0)
Error:
Traceback (most recent call last):
  File "scripts/main.py", line 79, in print_assistant_thoughts
    assistant_reply_json = fix_and_parse_json(assistant_reply)
  File "/root/llama.cpp/Auto-Llama-cpp/scripts/json_parser.py", line 52, in fix_and_parse_json
    brace_index = json_str.index("{")
ValueError: substring not found
json If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json loads error Expecting value: line 1 column 1 (char 0)
NEXT ACTION: COMMAND = Error: ARGUMENTS = substring not found
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for Entrepreneur-GPT...
Input:

Current behavior

Default

Expected behavior

Default

Your prompt

Default prompt

I want to use other Hugging Face local models.

Duplicates

  • I have searched the existing issues

Summary

  1. If I want to use other Hugging Face local models, shall I modify this field? llm = Llama(model_path="ggml-vicuna-13b-4bit.bin", n_ctx=2048, embedding=True) But that repo has a lot of .bin shards; how do I load them (see the note after this list)? https://huggingface.co/chavinlo/gpt4-x-alpaca/tree/main
    pytorch_model-00001-of-00006.bin
    pytorch_model-00002-of-00006.bin
    pytorch_model-00003-of-00006.bin
    pytorch_model-00004-of-00006.bin
    pytorch_model-00005-of-00006.bin
    pytorch_model-00006-of-00006.bin
  2. May I ask whether I need to fill in my OpenAI key, since the configuration file .env.template has OPENAI_API_KEY=your-openai-api-key? I see that the code does not actually use the OpenAI key; will it be used to access OpenAI in later versions? Can I omit this parameter for now?
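A note on point 1: llama.cpp only loads models converted to its GGML format, so pointing model_path at raw Hugging Face pytorch_model-*.bin shards will not work; the shards would first need to be converted with llama.cpp's conversion scripts. Once you have a converted file, the loading call keeps the same shape (the path below is a placeholder):

```python
from llama_cpp import Llama

# model_path must point at a single GGML-converted model file,
# not at raw pytorch_model-*.bin shards from Hugging Face.
llm = Llama(model_path="./models/ggml-model-q4_0.bin", n_ctx=2048, embedding=True)
```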

Examples

No response

Motivation

No response

error in running docker build

Duplicates

  • I have searched the existing issues

Steps to reproduce

When I ran docker run -p80:3000 auto-llama1, I got the following error:

Welcome to Auto-Llama! Enter the name of your AI and its role below. Entering nothing will load defaults.
Name your AI: For example, 'Entrepreneur-GPT'
AI Name: Traceback (most recent call last):
  File "/app/main.py", line 313, in <module>
    prompt = construct_prompt()
             ^^^^^^^^^^^^^^^^^^
  File "/app/main.py", line 205, in construct_prompt
    config = prompt_user()
             ^^^^^^^^^^^^^
  File "/app/main.py", line 231, in prompt_user
    ai_name = utils.clean_input("AI Name: ")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/utils.py", line 3, in clean_input
    return input(prompt)
           ^^^^^^^^^^^^^
EOFError: EOF when reading a line

Any idea how to fix it?
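The EOFError comes from input() being called while the container has no interactive stdin; starting the container with docker run -it (as in the run command quoted in a later issue) usually avoids it. Alternatively, a defensive fallback in utils.clean_input could look like this (just a sketch, not the repo's code):

```python
def clean_input(prompt: str = "") -> str:
    # Fall back to an empty answer (which loads the defaults) when stdin
    # is not interactive, e.g. docker run without -i/-t.
    try:
        return input(prompt)
    except EOFError:
        return ""
```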

Current behavior

No response

Expected behavior

No response

Your prompt

# Paste your prompt here

I hope it supports petals.dev

Duplicates

  • I have searched the existing issues

Summary

I think it would be useful to add support for petals.dev.
I think it would run faster and work with bigger models like Llama 2 70B.

Examples

No response

Motivation

No response

Hard-coded file location of json.gbnf

Duplicates

  • I have searched the existing issues

Steps to reproduce

In scripts/llm_utils.py, the grammar is loaded with this line of code:

grammar = LlamaGrammar.from_file("/home/ruben/Code/Auto-Llama-cpp/grammars/json.gbnf")

This line should not read the file via an absolute path, but I am not sure what should be used instead.
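One way to avoid the machine-specific path (a sketch of the idea, not a confirmed fix in this repo) is to resolve json.gbnf relative to the script's own location:

```python
import os
from llama_cpp import LlamaGrammar

# Resolve the grammar file relative to this script instead of using an
# absolute path that only exists on one machine.
GRAMMAR_PATH = os.path.join(os.path.dirname(__file__), "..", "grammars", "json.gbnf")
grammar = LlamaGrammar.from_file(GRAMMAR_PATH)
```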

Current behavior

An error is printed and the program terminates. As I use Docker, I need to modify the Dockerfile, add the lines below, and rebuild the image:

RUN mkdir -p /home/ruben/Code/Auto-Llama-cpp
COPY grammars /home/ruben/Code/Auto-Llama-cpp/grammars

Expected behavior

The application should build with

docker build -t auto-llama .

And run with

docker run -it --env-file "./.env" -v "<MODEL_PATH>:/models" auto-llama

Your prompt

# Paste your prompt here

LLM call

Hi, I noticed that when calling the LLM in the code, only the first item in the messages list is passed as the prompt. Is this an error?

response = llm(messages[0]["content"], stop=["Q:", "### Human:"], echo=False, temperature=temperature, max_tokens=max_tokens)
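For comparison, a sketch of folding all messages into the prompt instead of only messages[0] (the role labels and joining format here are assumptions, not the repo's convention):

```python
def build_prompt(messages) -> str:
    # Include every message, prefixed with its role, rather than
    # passing only the first message's content to the model.
    return "\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)

response = llm(build_prompt(messages), stop=["Q:", "### Human:"],
               echo=False, temperature=temperature, max_tokens=max_tokens)
```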

cuBLAS implementation?

Duplicates

  • I have searched the existing issues

Summary

For llama.cpp, there's a flag called --gpu-layers N that basically offloads some layers to the GPU for processing.
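In the llama-cpp-python binding the same idea is exposed as n_gpu_layers, which only takes effect with a GPU-enabled (e.g. cuBLAS) build of the library. A sketch with placeholder values:

```python
from llama_cpp import Llama

# n_gpu_layers offloads part of the model to the GPU; the value depends
# on the model size and available VRAM (32 here is just a placeholder).
llm = Llama(
    model_path="./models/ggml-vicuna-13b-4bit.bin",  # placeholder path
    n_ctx=2048,
    n_gpu_layers=32,
)
```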

Examples

(image: screenshot from ooba)

Motivation

Since CPU is super slow, GPU would be nice.
