🦙 Baby Code Interpreter

An open-source, locally run Python code interpreter (in the spirit of OpenAI's GPT-4 Code Interpreter plugin, though not as capable, for now 🚀)

Baby Code is:

  • powered by Llama.cpp
  • extremely SIMPLE & 100% LOCAL
  • CROSS-PLATFORM.
(Demo video: baby-code-interpreter.mov)

Leveraging open-source Llama-based models and powered by llama.cpp, the service is exposed through a Flask server that receives user requests, processes them, and returns Python code.
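
The request flow is small enough to sketch. The snippet below is only a minimal illustration of the idea, not the project's actual code: a Flask route takes the user's prompt, forwards it to the llama.cpp example server's /completion endpoint, and returns whatever comes back. The route name, ports, and field names mirror the defaults described later in this README.

# Minimal sketch of the request flow (not the project's actual code).
import requests
from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # allow the bundled HTML/JS frontend to call the API

LLAMA_API = "http://127.0.0.1:8080"  # llama.cpp server (see --llama-api below)

@app.route("/completion", methods=["POST"])
def completion():
    user_prompt = request.get_json().get("prompt", "")
    # llama.cpp's example server accepts a JSON body with "prompt" and "n_predict"
    r = requests.post(f"{LLAMA_API}/completion",
                      json={"prompt": user_prompt, "n_predict": 512})
    return jsonify({"content": r.json().get("content", "")})

if __name__ == "__main__":
    app.run("127.0.0.1", port=8081)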

๐Ÿ—๏ธ Architecture (in a nutshell)

  • 🖥️ Backend: Python Flask (with CORS enabled so it can serve both the API and the HTML).
  • 🌐 Frontend: HTML/JS/CSS (the UI reflects personal taste and is open to changes).
  • ⚙️ Engine: llama.cpp (inference library for Llama/GGML models).
  • 🧠 Model: Llama-2 (or any other model compatible with llama.cpp).

🚀 Setup

  • Clone the repo:
git clone https://github.com/itsPreto/baby-code
  • Navigate to the project:
cd baby-code
  • Install the required libraries:
pip install -r requirements.txt

💾 Model Download

  • With everything installed, you just need a model.
  • The 7B Llama-2-based model TheBloke/llama2-7b-chat-codeCherryPop-qLoRA-GGML was fine-tuned and shared by a kind redditor (a download sketch follows this list).
  • You may also download any other model supported by llama.cpp (llama-cpp-python), of any parameter size of your choosing.
  • Keep in mind that the parameters might need to be tuned for your specific case.
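
For example, the GGML file can be fetched programmatically with the huggingface_hub client. This is only a hedged sketch: the filename below is an assumption, so check the actual file list on the model page before running it.

# Hedged sketch: download a GGML file from the Hugging Face Hub into ./models.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/llama2-7b-chat-codeCherryPop-qLoRA-GGML",
    filename="llama-2-7b-chat-codeCherryPop.ggmlv3.q4_0.bin",  # assumed filename
    local_dir="models",
)
print(f"Model saved to {path}")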

Building llama.cpp

Depending on your OS, you have a few options for compiling and building the llama.cpp library.

Please refer to the original llama.cpp build instructions.

🧠 Model Config

This project uses llama.cpp to load models for local inference on CPU or GPU. Once your model is downloaded, simply place it in the models/ folder and edit the llama.cpp launch config below:

if __name__ == '__main__':
    # Launch the llama.cpp example server with the chosen model:
    #   -m    path to the GGML model file
    #   -c    context size in tokens
    #   -ngl  number of layers to offload to the GPU
    subprocess.Popen(["./server", "-m", "models/code_cherry_Llama_q4_0.bin", "-c", "2048", "-ngl", "30"])

    # Give the model a few seconds to load, then start the Flask app
    time.sleep(5)
    app.run(args.host, port=args.port)
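
The fixed five-second pause simply gives the llama.cpp server time to load the model. If you want something more robust, a stdlib-only alternative is to poll until the port accepts connections; the host and port below are assumptions matching the defaults used in this README.

import socket
import time

# Hedged alternative to the fixed sleep: wait until the llama.cpp server
# (assumed to listen on 127.0.0.1:8080) accepts TCP connections.
def wait_for_server(host="127.0.0.1", port=8080, timeout=60):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)
    return False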

You may also want to adjust the baby-code.py server configuration at the top of the file:

parser = argparse.ArgumentParser(description="An example of using server.cpp with a similar API to OAI. It must be used together with server.cpp.")
parser.add_argument("--chat-prompt", type=str, 
                    help="the top prompt in chat completions(default: 'A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.\\n')", 
                    default='[INST] <<SYS>>\nYou are a helpful assistant, that communicates ONLY and strictly ONLY with python code.\nThe code must include any necessary function definitions and must also include commands\nto run these functions and print the results.\nThe goal is to have a complete, runnable Python script in each response.\nYour responses MUST absolutely follow the format below:\npython\n```\n{CODE}\n```\nNOTE: If you are not able to provide an answer despite all your efforts you may simply\nreply with that reason.\nOBS: DO NOT INCLUDE ANY EXPLANATION. ONLY CODE.\nThe [INST] block will always be a json in the following format:\n{\n"prompt": {the user request}\n}\n<</SYS>> [/INST]')
# ...parser.add_argument("--chat-prompt", type=str, help="the top prompt in chat completions(default: 'A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.\\n')", default='A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.\\n')
parser.add_argument("--user-name", type=str, help="USER name in chat completions(default: '\\nUSER: ')", default="\\nUSER: ")
parser.add_argument("--ai-name", type=str, help="ASSISTANT name in chat completions(default: '\\nASSISTANT: ')", default="\\nASSISTANT: ")
parser.add_argument("--system-name", type=str, help="SYSTEM name in chat completions(default: '\\nASSISTANT's RULE: ')", default="\\nASSISTANT's RULE: ")
parser.add_argument("--stop", type=str, help="the end of response in chat completions(default: '</s>')", default="</s>")
parser.add_argument("--llama-api", type=str, help="Set the address of server.cpp in llama.cpp(default: http://127.0.0.1:8080)", default='http://127.0.0.1:8080')
parser.add_argument("--api-key", type=str, help="Set the api key to allow only few user(default: NULL)", default="")
parser.add_argument("--host", type=str, help="Set the ip address to listen.(default: 127.0.0.1)", default='127.0.0.1')
parser.add_argument("--port", type=int, help="Set the port to listen.(default: 8081)", default=8081)

๐Ÿƒโ€โ™€๏ธ Run it

  • To start the backend, simply run:
python3 code-interpreter/baby-code.py 

The Flask server will start and listen on the host and port of your choosing (default 127.0.0.1:8081).

🌐 Endpoints

  • POST /completion: Given a prompt, it returns the predicted completion. (A usage sketch follows this endpoint list.)

    Options:

    temperature: Adjust the randomness of the generated text (default: 0.8).

    top_k: Limit the next token selection to the K most probable tokens (default: 40).

    top_p: Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P (default: 0.9).

    n_predict: Set the number of tokens to predict when generating text. Note: May exceed the set limit slightly if the last token is a partial multibyte character. When 0, no tokens will be generated but the prompt is evaluated into the cache. (default: 128, -1 = infinity).

    n_keep: Specify the number of tokens from the initial prompt to retain when the model resets its internal context. By default, this value is set to 0 (meaning no tokens are kept). Use -1 to retain all tokens from the initial prompt.

    stream: Receive each predicted token in real time instead of waiting for the completion to finish. To enable this, set to true.

    prompt: Provide a prompt. Internally, the prompt is compared with the previous one; any part that has already been evaluated is reused, and only the remaining part is evaluated. A space is inserted in front, as main.cpp does.

    stop: Specify a JSON array of stopping strings. These words will not be included in the completion, so make sure to add them to the prompt for the next iteration (default: []).

    tfs_z: Enable tail free sampling with parameter z (default: 1.0, 1.0 = disabled).

    typical_p: Enable locally typical sampling with parameter p (default: 1.0, 1.0 = disabled).

    repeat_penalty: Control the repetition of token sequences in the generated text (default: 1.1).

    repeat_last_n: Last n tokens to consider for penalizing repetition (default: 64, 0 = disabled, -1 = ctx-size).

    penalize_nl: Penalize newline tokens when applying the repeat penalty (default: true).

    presence_penalty: Repeat alpha presence penalty (default: 0.0, 0.0 = disabled).

    frequency_penalty: Repeat alpha frequency penalty (default: 0.0, 0.0 = disabled).

    mirostat: Enable Mirostat sampling, controlling perplexity during text generation (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).

    mirostat_tau: Set the Mirostat target entropy, parameter tau (default: 5.0).

    mirostat_eta: Set the Mirostat learning rate, parameter eta (default: 0.1).

    seed: Set the random number generator (RNG) seed (default: -1, -1 = random seed).

    ignore_eos: Ignore end of stream token and continue generating (default: false).

    logit_bias: Modify the likelihood of a token appearing in the generated text completion. For example, use "logit_bias": [[15043,1.0]] to increase the likelihood of the token 'Hello', or "logit_bias": [[15043,-1.0]] to decrease its likelihood. Setting the value to false, "logit_bias": [[15043,false]] ensures that the token Hello is never produced (default: []).

  • POST /tokenize: [NOT EXPOSED THROUGH baby-code.py yet] Tokenize a given text.

    Options:

    content: Set the text to tokenize.

    Note that the special BOS token is not added in front of the text, and a space character is not inserted automatically as it is for /completion.

  • POST /embedding: [NOT EXPOSED THROUGH baby-code.py yet] Generate embedding of a given text just as the embedding example does.

    Options:

    content: Set the text to process.
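
For reference, here is a hedged usage sketch in Python. The /completion call goes through the Flask server started above (assumed to be on the default 127.0.0.1:8081); /tokenize and /embedding are shown against the llama.cpp server directly (default 127.0.0.1:8080) since they are not proxied yet. Field names follow the llama.cpp example server and may change between versions.

import requests

# Hedged sketch; hosts/ports assume the defaults listed earlier.
BABY_CODE = "http://127.0.0.1:8081"   # Flask server (baby-code.py)
LLAMA_CPP = "http://127.0.0.1:8080"   # llama.cpp server (server.cpp)

# POST /completion with a few of the sampling options described above
resp = requests.post(f"{BABY_CODE}/completion", json={
    "prompt": "print the first ten prime numbers",
    "temperature": 0.8,
    "top_k": 40,
    "top_p": 0.9,
    "n_predict": 256,
    "stop": ["</s>"],
})
print(resp.json())

# POST /tokenize and /embedding directly against llama.cpp
# (the server may need to be started with the --embedding flag for embeddings)
tokens = requests.post(f"{LLAMA_CPP}/tokenize", json={"content": "Hello, world"}).json()
embedding = requests.post(f"{LLAMA_CPP}/embedding", json={"content": "Hello, world"}).json()
print(tokens, embedding)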

🤝 Contributing

Contributions to this project are welcome. Please create a fork of the repository, make your changes, and submit a pull request. I'll be creating a few issues for feature tracking soon!!

ALSO~~~ if anyone would like to start a Discord channel and help me manage it, that would be awesome (I'm not on it that much).

License

This project is licensed under the MIT License.
