
Comments (5)

Anil-matcha commented on May 18, 2024

GPT4All doesn't support GPU acceleration. We will add support for models like Llama, which can use it.

bradsec commented on May 18, 2024

I was able to get GPU acceleration working with this Llama model (ggml-vic13b-q5_1.bin) using a manual workaround.

# Download the ggml-vic13b-q5_1.bin model and place it in privateGPT/server/models/
# Edit privateGPT.py: comment out the GPT4All model and add the Llama model
# Adjust n_gpu_layers based on your Nvidia GPU's VRAM (max is 40 for this model; offloading all 40 layers uses about 9GB VRAM)

import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

model_n_ctx = int(os.environ.get('MODEL_N_CTX', 1000))  # matches MODEL_N_CTX in .env

def load_model():
    filename = 'ggml-vic13b-q5_1.bin'  # name of the downloaded model file
    models_folder = 'models'  # folder inside the Flask app root
    file_path = f'{models_folder}/{filename}'
    if os.path.exists(file_path):
        global llm
        callbacks = [StreamingStdOutCallbackHandler()]
        # Previous GPT4All model, commented out:
        #llm = GPT4All(model=file_path, n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
        llm = LlamaCpp(model_path=file_path, n_ctx=model_n_ctx, n_gpu_layers=40, callbacks=callbacks, verbose=False)
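Once load_model() has run, a quick smoke test outside the Flask routes can confirm the model responds. This is a hypothetical check, not part of privateGPT, and it assumes the langchain version of that era, where the LLM object is directly callable with a prompt string:

llm = None  # guard so the check below works even if the model file is missing
load_model()
if llm is not None:
    # Tokens stream to stdout via StreamingStdOutCallbackHandler
    print(llm('Q: Name the planets in the solar system. A:'))
else:
    print('Model file not found in models/')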

# Edit privateGPT/server/.env as follows
PERSIST_DIRECTORY=db
MODEL_TYPE=LlamaCpp
MODEL_PATH=models/ggml-vic13b-q5_1.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000
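For reference, this is roughly how the server consumes those variables; privateGPT loads them with python-dotenv. A sketch, not the project's exact source:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory
persist_directory = os.environ.get('PERSIST_DIRECTORY')
model_type = os.environ.get('MODEL_TYPE')  # 'LlamaCpp' selects the llama.cpp backend
model_path = os.environ.get('MODEL_PATH')
embeddings_model_name = os.environ.get('EMBEDDINGS_MODEL_NAME')
model_n_ctx = int(os.environ.get('MODEL_N_CTX', 1000))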

# If using a conda environment, install the CUDA toolkit:
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit

# Remove and reinstall llama-cpp-python with the following environment variables set
# (Linux uses "export"; Windows uses "set")

pip uninstall llama-cpp-python
export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
export FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir
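Before relaunching the server, it is worth verifying that the rebuilt wheel actually picked up cuBLAS. A minimal check, assuming llama_cpp exposes llama_print_system_info (the binding of the llama.cpp C function of the same name):

import llama_cpp

# llama.cpp reports its compiled features; 'BLAS = 1' means a GPU-capable build
info = llama_cpp.llama_print_system_info().decode()
print(info)
print('GPU build' if 'BLAS = 1' in info else 'CPU-only build: check CMAKE_ARGS and reinstall')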

Run python privateGPT.py from the privateGPT/server/ directory.
You should see the following lines in the output as the model loads:

llama_model_load_internal: [cublas] offloading 40 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 9076 MB

jackyoung022 commented on May 18, 2024

Hi, thanks for the info.
But when I followed your steps on Windows, I got this error:
Could not load Llama model from path: D:/code/privateGPT/server/models/ggml-vic13b-q5_1.bin. Received error (type=value_error)
Any idea about this? Thanks.
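The pydantic value_error usually hides the underlying llama.cpp message. One way to surface it is to load the file directly with the llama-cpp-python bindings, bypassing langchain's wrapper; a hypothetical debugging step, reusing the path from the error:

# Loading outside langchain prints the real llama.cpp error
# (bad path, incompatible ggml file version, out of memory, ...)
from llama_cpp import Llama
llm = Llama(model_path='D:/code/privateGPT/server/models/ggml-vic13b-q5_1.bin', n_ctx=1000)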

MyraBaba commented on May 18, 2024

@bradsec

Hi,

I followed the instructions, but it looks like it's still using the CPU:

(venPrivateGPT) (base) alp2080@alp2080:~/data/dProjects/privateGPT/server$ python privateGPT.py
/data/dProjects/privateGPT/server/privateGPT.py:1: DeprecationWarning: 'flask.Markup' is deprecated and will be removed in Flask 2.4. Import 'markupsafe.Markup' instead.
from flask import Flask,jsonify, render_template, flash, redirect, url_for, Markup, request
llama.cpp: loading model from models/ggml-vic13b-q5_1.bin
llama_model_load_internal: format = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1000
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 9 (mostly Q5_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: mem required = 11359.05 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size = 781.25 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
LLM0 LlamaCpp
Params: {'model_path': 'models/ggml-vic13b-q5_1.bin', 'suffix': None, 'max_tokens': 256, 'temperature': 0.8, 'top_p': 0.95, 'logprobs': None, 'echo': False, 'stop_sequences': [], 'repeat_penalty': 1.1, 'top_k': 40}

 * Serving Flask app 'privateGPT'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.5.110:5000
Press CTRL+C to quit
Loading documents from source_documents
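Note the BLAS = 0 in the feature line of the log above: that indicates this llama-cpp-python build was compiled without cuBLAS, so every layer stays on the CPU regardless of n_gpu_layers. Rerunning the build check from the reinstall step should confirm it; after a successful CUDA rebuild the same line reports BLAS = 1 and the [cublas] offload messages appear:

import llama_cpp
# 'BLAS = 0' in this output matches the startup log above and means a CPU-only build
print(llama_cpp.llama_print_system_info().decode())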

Musty1 commented on May 18, 2024

I tried this as well and it looks like it's still using the CPU. Interesting. If anyone can suggest why it's not working with the GPU, please let me know.
