🦙 Baby Code Interpreter

An open-source, locally run Python code interpreter (in the spirit of OpenAI's GPT-4 Code Interpreter plugin, though not as capable, for now 🚀)

Baby Code is:

  • powered by Llama.cpp
  • extremely SIMPLE & 100% LOCAL
  • CROSS-PLATFORM.
(Demo video: baby-code-interpreter.mov)

Leveraging open-source Llama-based models and powered by llama.cpp, the service is exposed through a Flask server that receives user requests, processes them, and returns Python code.
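
The request flow is small enough to sketch. The snippet below is only a minimal illustration of the idea, not the project's actual code: a Flask route takes the user's prompt, forwards it to the llama.cpp example server's /completion endpoint, and returns whatever comes back. The route name, ports, and field names mirror the defaults described later in this README.

# Minimal sketch of the request flow (not the project's actual code).
import requests
from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # allow the bundled HTML/JS frontend to call the API

LLAMA_API = "http://127.0.0.1:8080"  # llama.cpp server (see --llama-api below)

@app.route("/completion", methods=["POST"])
def completion():
    user_prompt = request.get_json().get("prompt", "")
    # llama.cpp's example server accepts a JSON body with "prompt" and "n_predict"
    r = requests.post(f"{LLAMA_API}/completion",
                      json={"prompt": user_prompt, "n_predict": 512})
    return jsonify({"content": r.json().get("content", "")})

if __name__ == "__main__":
    app.run("127.0.0.1", port=8081)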

๐Ÿ—๏ธ Architecture (in a nutshell)

  • 🖥️ Backend: Python Flask (with CORS enabled so it can serve both the API and the HTML).
  • 🌐 Frontend: HTML/JS/CSS (the UI reflects personal taste and is open to changes).
  • ⚙️ Engine: llama.cpp (inference library for Llama/GGML models).
  • 🧠 Model: Llama-2 (or any other model compatible with llama.cpp).

🚀 Setup

  • Clone the repo:
git clone https://github.com/itsPreto/baby-code
  • Navigate to the project:
cd baby-code
  • Install the required libraries:
pip install -r requirements.txt

💾 Model Download

  • With everything installed, you just need a model.
  • The 7B Llama-2-based model TheBloke/llama2-7b-chat-codeCherryPop-qLoRA-GGML was fine-tuned and shared by a kind redditor (a download sketch follows this list).
  • You may also download any other model supported by llama.cpp (llama-cpp-python), of any parameter size of your choosing.
  • Keep in mind that the parameters might need to be tuned for your specific case.
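
For example, the GGML file can be fetched programmatically with the huggingface_hub client. This is only a hedged sketch: the filename below is an assumption, so check the actual file list on the model page before running it.

# Hedged sketch: download a GGML file from the Hugging Face Hub into ./models.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/llama2-7b-chat-codeCherryPop-qLoRA-GGML",
    filename="llama-2-7b-chat-codeCherryPop.ggmlv3.q4_0.bin",  # assumed filename
    local_dir="models",
)
print(f"Model saved to {path}")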

Building llama.cpp

Depending on your OS, you have a few options for compiling and building the llama.cpp library.

Please refer to the original llama.cpp build instructions.

🧠 Model Config

This project uses llama.cpp to load models for local inference on CPU or GPU. Once your model is downloaded, simply place it in the models/ folder and edit the llama.cpp launch config below:

if __name__ == '__main__':
    # Launch the llama.cpp example server with the chosen model:
    #   -m    path to the GGML model file
    #   -c    context size in tokens
    #   -ngl  number of layers to offload to the GPU
    subprocess.Popen(["./server", "-m", "models/code_cherry_Llama_q4_0.bin", "-c", "2048", "-ngl", "30"])

    # Give the model a few seconds to load, then start the Flask app
    time.sleep(5)
    app.run(args.host, port=args.port)
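
The fixed five-second pause simply gives the llama.cpp server time to load the model. If you want something more robust, a stdlib-only alternative is to poll until the port accepts connections; the host and port below are assumptions matching the defaults used in this README.

import socket
import time

# Hedged alternative to the fixed sleep: wait until the llama.cpp server
# (assumed to listen on 127.0.0.1:8080) accepts TCP connections.
def wait_for_server(host="127.0.0.1", port=8080, timeout=60):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)
    return False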

You may also want to adjust the baby-code.py server configuration at the top of the file:

parser = argparse.ArgumentParser(description="An example of using server.cpp with a similar API to OAI. It must be used together with server.cpp.")
parser.add_argument("--chat-prompt", type=str, 
                    help="the top prompt in chat completions(default: 'A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.\\n')", 
                    default='[INST] <<SYS>>\nYou are a helpful assistant, that communicates ONLY and strictly ONLY with python code.\nThe code must include any necessary function definitions and must also include commands\nto run these functions and print the results.\nThe goal is to have a complete, runnable Python script in each response.\nYour responses MUST absolutely follow the format below:\npython\n```\n{CODE}\n```\nNOTE: If you are not able to provide an answer despite all your efforts you may simply\nreply with that reason.\nOBS: DO NOT INCLUDE ANY EXPLANATION. ONLY CODE.\nThe [INST] block will always be a json in the following format:\n{\n"prompt": {the user request}\n}\n<</SYS>> [/INST]')
# ...parser.add_argument("--chat-prompt", type=str, help="the top prompt in chat completions(default: 'A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.\\n')", default='A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.\\n')
parser.add_argument("--user-name", type=str, help="USER name in chat completions(default: '\\nUSER: ')", default="\\nUSER: ")
parser.add_argument("--ai-name", type=str, help="ASSISTANT name in chat completions(default: '\\nASSISTANT: ')", default="\\nASSISTANT: ")
parser.add_argument("--system-name", type=str, help="SYSTEM name in chat completions(default: '\\nASSISTANT's RULE: ')", default="\\nASSISTANT's RULE: ")
parser.add_argument("--stop", type=str, help="the end of response in chat completions(default: '</s>')", default="</s>")
parser.add_argument("--llama-api", type=str, help="Set the address of server.cpp in llama.cpp(default: http://127.0.0.1:8080)", default='http://127.0.0.1:8080')
parser.add_argument("--api-key", type=str, help="Set the api key to allow only few user(default: NULL)", default="")
parser.add_argument("--host", type=str, help="Set the ip address to listen.(default: 127.0.0.1)", default='127.0.0.1')
parser.add_argument("--port", type=int, help="Set the port to listen.(default: 8081)", default=8081)

๐Ÿƒโ€โ™€๏ธ Run it

  • To start the backend, simply run:
python3 code-interpreter/baby-code.py 

The Flask server will start and listen on the host and port of your choosing (default 127.0.0.1:8081).

🌐 Endpoints

  • POST /completion: Given a prompt, it returns the predicted completion. (A usage sketch follows this endpoint list.)

    Options:

    temperature: Adjust the randomness of the generated text (default: 0.8).

    top_k: Limit the next token selection to the K most probable tokens (default: 40).

    top_p: Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P (default: 0.9).

    n_predict: Set the number of tokens to predict when generating text. Note: May exceed the set limit slightly if the last token is a partial multibyte character. When 0, no tokens will be generated but the prompt is evaluated into the cache. (default: 128, -1 = infinity).

    n_keep: Specify the number of tokens from the initial prompt to retain when the model resets its internal context. By default, this value is set to 0 (meaning no tokens are kept). Use -1 to retain all tokens from the initial prompt.

    stream: Receive each predicted token in real time instead of waiting for the completion to finish. To enable this, set to true.

    prompt: Provide a prompt. Internally, the prompt is compared with the previous one; any part that has already been evaluated is reused, and only the remaining part is evaluated. A space is inserted in front, as main.cpp does.

    stop: Specify a JSON array of stopping strings. These words will not be included in the completion, so make sure to add them to the prompt for the next iteration (default: []).

    tfs_z: Enable tail free sampling with parameter z (default: 1.0, 1.0 = disabled).

    typical_p: Enable locally typical sampling with parameter p (default: 1.0, 1.0 = disabled).

    repeat_penalty: Control the repetition of token sequences in the generated text (default: 1.1).

    repeat_last_n: Last n tokens to consider for penalizing repetition (default: 64, 0 = disabled, -1 = ctx-size).

    penalize_nl: Penalize newline tokens when applying the repeat penalty (default: true).

    presence_penalty: Repeat alpha presence penalty (default: 0.0, 0.0 = disabled).

    frequency_penalty: Repeat alpha frequency penalty (default: 0.0, 0.0 = disabled).

    mirostat: Enable Mirostat sampling, controlling perplexity during text generation (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).

    mirostat_tau: Set the Mirostat target entropy, parameter tau (default: 5.0).

    mirostat_eta: Set the Mirostat learning rate, parameter eta (default: 0.1).

    seed: Set the random number generator (RNG) seed (default: -1, -1 = random seed).

    ignore_eos: Ignore end of stream token and continue generating (default: false).

    logit_bias: Modify the likelihood of a token appearing in the generated text completion. For example, use "logit_bias": [[15043,1.0]] to increase the likelihood of the token 'Hello', or "logit_bias": [[15043,-1.0]] to decrease its likelihood. Setting the value to false, "logit_bias": [[15043,false]] ensures that the token Hello is never produced (default: []).

  • POST /tokenize: [NOT EXPOSED THROUGH baby-code.py yet] Tokenize a given text.

    Options:

    content: Set the text to tokenize.

    Note that the special BOS token is not added in front of the text, and a space character is not inserted automatically as it is for /completion.

  • POST /embedding: [NOT EXPOSED THROUGH baby-code.py yet] Generate embedding of a given text just as the embedding example does.

    Options:

    content: Set the text to process.
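
For reference, here is a hedged usage sketch in Python. The /completion call goes through the Flask server started above (assumed to be on the default 127.0.0.1:8081); /tokenize and /embedding are shown against the llama.cpp server directly (default 127.0.0.1:8080) since they are not proxied yet. Field names follow the llama.cpp example server and may change between versions.

import requests

# Hedged sketch; hosts/ports assume the defaults listed earlier.
BABY_CODE = "http://127.0.0.1:8081"   # Flask server (baby-code.py)
LLAMA_CPP = "http://127.0.0.1:8080"   # llama.cpp server (server.cpp)

# POST /completion with a few of the sampling options described above
resp = requests.post(f"{BABY_CODE}/completion", json={
    "prompt": "print the first ten prime numbers",
    "temperature": 0.8,
    "top_k": 40,
    "top_p": 0.9,
    "n_predict": 256,
    "stop": ["</s>"],
})
print(resp.json())

# POST /tokenize and /embedding directly against llama.cpp
# (the server may need to be started with the --embedding flag for embeddings)
tokens = requests.post(f"{LLAMA_CPP}/tokenize", json={"content": "Hello, world"}).json()
embedding = requests.post(f"{LLAMA_CPP}/embedding", json={"content": "Hello, world"}).json()
print(tokens, embedding)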

🤝 Contributing

Contributions to this project are welcome. Please create a fork of the repository, make your changes, and submit a pull request. I'll be creating a few issues for feature tracking soon!!

ALSO~~~ if anyone would like to start a Discord channel and help me manage it, that would be awesome (I'm not on it that much).

License

This project is licensed under the MIT License.
