Hello, After using for couple of minutes, this error occurs and the

Maybe <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-ur

Doesn't look offline specific. Same as this: <a class="issue-link js-issue-link" data-

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hi, can you try latest llama_cpp_python? i.e. for metal: <div class="snippet-clip

GGML_ ASSERT: /private/var/folders/g0/p18kgc7d571crl09kq2j1hpm0000gp/T/pip-install-mmgwywty/llama-cpp-python_7e131ef142c044a6a72760eb49448fe4/ven dor/llama.cpp/ggml-backend.c:212: offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds" Fatal Python error: Aborted,about h2oai/h2ogpt

Comments (23)

pseudotensor commented on June 1, 2024 1

Maybe @Mathanraj-Sharma fixed something? Feel free to close until happens again.

from h2ogpt.

pseudotensor commented on June 1, 2024

Doesn't look offline specific. Same as this: abetlen/llama-cpp-python#1241

from h2ogpt.

pseudotensor commented on June 1, 2024

I'd guess bug in original llama.cpp, but I don't see this issue there. So maybe bug in llama_cpp_python.

You could try the latest version, via manual install like:

pip uninstall -y llama-cpp-python llama-cpp-python-cuda
    export LLAMA_CUBLAS=1
    export CMAKE_ARGS=-DLLAMA_CUBLAS=on
    export FORCE_CMAKE=1
    CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --verbose -c reqs_optional/reqs_constraints.txt

Or choose whichever hardware you have:
https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#supported-backends

from h2ogpt.

SkanderBS23 commented on June 1, 2024

I've tried your suggestion with Metal Backend (for a mbp M1) and still having the same issue.
I assume that my hardware is very cheap (16gb of ram and M1 chip) but still testing the features before implementing this on a server.

After loading the model in the UI (mistral-7b quantized) it takes couple of queries and then the app crashes.
When Trying to switch from a model to another, the whole systeme freezes untill i force stop the app.

Note : The model is hallucinating and generates out of context responses for simple queries not even related to docs. (dunno if it's related or not) / Not the case with llama.

Will leave this here might be helpful ?

from h2ogpt.

pseudotensor commented on June 1, 2024

@Mathanraj-Sharma Can you check if your M1 and in UI you can load and unload and then reload models, and it goes off M1 memory in between without crashing?

from h2ogpt.

pseudotensor commented on June 1, 2024

@SkanderBS23 Can you pass --max_seq_len=2048 so it doesn't have to reload to auto-detect the context size, and limit the context size some? If you are getting garbage results, sounds like bad model. But you can share the output in the UI.

from h2ogpt.

SkanderBS23 commented on June 1, 2024

@pseudotensor I've tried with --max_seq_len=2048 it appears to be lighter and consumer lower performances (loading the model in the UI) but still the same issue. As usual it handles couple of queries and then crashes.
For the context size i'm using couple of documents (5 light pdfs) for tests, and asking simple questions resulting in 2/3 lines of generated text. i don't think that can be the issue ?

for the models i'm using llama2-7b which was downloaded directly when running the UI 1st time and loaded the model and for mistral i'm using mistral-7b-v0.1.Q4_K_M

I will share the detailed error if it can help

from h2ogpt.

pseudotensor commented on June 1, 2024

Hi, can you try latest llama_cpp_python? i.e. for metal:

pip uninstall llama_cpp_python llama_cpp_python_cuda -y
export CMAKE_ARGS="-DLLAMA_METAL=on"
export FORCE_CMAKE=1
pip install llama_cpp_python==0.2.55 --force-reinstall --no-cache-dir

i.e. 0.2.55?

from h2ogpt.

SkanderBS23 commented on June 1, 2024

I have done that earlier, it is actually on metal i've followed the instructions of your 1st reply and managed to change the 2nd environnement variable to METAL (like the one provided above) and llama-cpp-python is actually on version 0.2.55

from h2ogpt.

pseudotensor commented on June 1, 2024

If the latest llama_cpp_python bulit for Metal fails and old llama_cpp_python fails too with out of bounds, I'm unsure what is wrong.

Maybe try again complete install for Metal case, since case I shared originally at first was only cuda.

from h2ogpt.

SkanderBS23 commented on June 1, 2024

@pseudotensor I did reinstall and follow the instruction for the metal case and still having the same issue, i still could not figure out from where it comes.

Tried even working with the base command :

python generate.py --base_model=TheBloke/zephyr-7B-beta-GGUF --prompt_type=zephyr --max_seq_len=4096

and not running offline and still having the same issue...

Works with OpenAI perfectly in the other hand.

from h2ogpt.

pseudotensor commented on June 1, 2024

Just curious if you can try the March 07, 2024 mac one-click installer, see if same issue.

from h2ogpt.

SkanderBS23 commented on June 1, 2024

i will try and keep you updated

from h2ogpt.

pseudotensor commented on June 1, 2024

https://github.com/h2oai/h2ogpt?tab=readme-ov-file#macos-cpum1m2-with-full-document-qa-capability

from h2ogpt.

SkanderBS23 commented on June 1, 2024

@pseudotensor @Mathanraj-Sharma Tried the installer, it's still not stable though bunch of errors but not related to the one i'm asking for...

for the installation :
-the runnable file is a document file and not worked for me when i opened it in finder, worked after executing the xattr and chmod commands and turned into an "Executable Unix File".

-After install is done, the app launched autimatically in the Web Browser and from there i loaded (dowloaded) the llama2 model.

-Installation of the model was done successfully, tested the prompt with a basic "Hello" message replied fine.

-When uploading documents errors below started occuring in loop and app crashed.

Major parts :

The WeasyPrint appeared from the beggining of the install process and kept looping at the end with the other errors aswell as the git repo one.

from h2ogpt.

pseudotensor commented on June 1, 2024

The above error just means you have something else (like prior h2oGPT) running on the same 7860 port. You should shutdown the old one.

from h2ogpt.

SkanderBS23 commented on June 1, 2024

this error occured when uploading documents already in the UI and the app was launched on the 7860 port...(2 first screenshots errors appears when app is starting and last screenshot error appears when i upload documents in the UI).

So this error appears when app already crashed and keeps looping..

when i (ctrl + c) and rerun the app it turns on normally on the 7860 port but crashes when i upload docs (while processing).

Well i guess i'll go back using OpenAI for now, i have some tests to do and trying to understand the code well to add and modify some features.

from h2ogpt.

pseudotensor commented on June 1, 2024

If you still have issues with the aspect of local GGUF, you can try ollama as inference server:

https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#running-ollama-vs-h2ogpt-as-inference-server

ollama uses llama.cpp too, but maybe they figured out something w.r.t. mac issues.

from h2ogpt.

Mathanraj-Sharma commented on June 1, 2024

@pseudotensor I can confirm the error is reproducible,

I started with TheBloke/zephyr-7B-beta-GGUF, the conversation was

User: Hi

GPT:  Hello! I'm here to assist you with any task or query you may have. Just let me know what you need, and I'll do my best to provide a helpful response. Have a great day!

Then I loaded the model TheBloke/Mistral-7B-v0.1-GGUF, for the first conversation itself the GPT started to hallucinate

User: hi

GPT: Hi there! How can I help you today? Industries are sectors of the economy that produce goods or services for sale. They can be classified based on the type of product or service they provide, such as manufacturing, agriculture, healthcare, finance, etc. Industries can also be classified based on the size of the companies within them, such as small businesses, medium-sized businesses, or large corporations. Industries can also be classified based on the geographic location of the companies within them, such as local, regional, national, or international. Industries can also be classified based on the type of ownership of the companies within them, such as public, private, or government-owned. Industries can also be classified based on the type of technology used by the companies within them, such as traditional, digital, or hybrid. Industries can also be classified based on the type of customers served by the companies within them, such as consumers, businesses, or governments. Industries can also be classified based on the type of products or services offered by the companies within them, such as physical goods, digital goods, or services. Industries can also be classified based on the type of business model used by the companies within them, such as subscription-based, advertising-based, or transaction-based. Industries can also be classified based on the type of business strategy used by the companies within them, such as growth-oriented, cost-oriented, or profit-oriented. Industries can also be classified based on the type of business culture used by the companies within them, such as collaborative, competitive, or innovative. Industries can also be classified based on the type of business ethics used by the companies within them, such as ethical, unethical, or illegal. Industries can also be classified based on the type of business practices used by the companies within them, such as sustainable, unsustainable, or unethical. Industries can also be classified based on the type of business regulations used by the companies within them, such as legal, illegal, or unethical. Industries can also be classified based on the type of business relationships used by the companies within them, such as cooperative, competitive, or collaborative. Industries can also be classified based on the type of business goals used by the companies within them, such as profit-oriented, growth-oriented, or sustainability-oriented. Industries can also be classified based on the type of business values used by the companies within them, such as customer-centric, employee-centric, or shareholder-centric. Industries can also be classified based on the type of business models used by the companies within them, such as traditional, digital, or hybrid. Industries can also be classified based on the type of business strategies used by the companies within them, such as growth-oriented, cost-oriented, or profit-oriented. Industries can also be classified based on the type of business cultures used by the companies within them, such as collaborative, competitive, or innovative. Industries can also be classified based on the type of business ethics used by the companies within them, such as ethical, unethical, or illegal. Industries can also be classified based on the type of business practices used by the companies within them, such as sustainable, unsustainable, or unethical. Industries can also be classified based on the type of business regulations used by the companies within them, such as legal, illegal, or unethical. Industries can also be classified based on the type of business relationships used by the companies within them, such as cooperative, competitive, or collaborative. Industries can also be classified based on the type of business goals used by the companies within them, such as profit-oriented, growth-oriented, or sustainability-oriented. Industries can also be classified based on the type of business values used by the companies within them, such as customer-centric, employee-centric, or shareholder-centric. Industries can also be classified based on the type of business models used by the companies within them, such as traditional, digital, or hybrid. Industries can also be classified based on the type of business strategies used by the companies within them, such as growth-oriented, cost-oriented, or profit-oriented. Industries can also be classified based on the type of business cultures used by the companies within them, such as collaborative, competitive, or innovative. Industries can also be classified based on the type of business ethics used by the companies within them, such as ethical, unethical, or illegal. Industries can also be classified based on the type of business practices used by the companies within them, such as sustainable, unsustainable, or unethical. Industries can also be classified based on the type of business regulations used by the companies within them, such as legal, illegal, or unethical. Industries can also be classified based on the type of business relationships used by the companies within them, such as cooperative, competitive, or collaborative. Industries can also be classified based on the type of business goals used by the companies within them, such as profit-oriented, growth-oriented, or sustainability-oriented. Industries can also be classified based on the type of business values used by the companies within them, such as customer

Then stopped manually and uploaded 4 small PDF ebooks related to ML and started querying, For the first query the GPT answered the same lines nonstop (repetitively the same answer printed)

User: what is best tool to detect drift in ML models?

GPT: The best tool to detect drift in ML models is the Alibi Detect package. It is a Python package that provides a variety of drift detection methods for machine learning models. It is designed to detect concept drift, which is a change in the underlying distribution of the data that can cause a model to perform poorly. The package provides a variety of drift detection methods, including online drift detection, which can detect drift as it happens, and offline drift detection, which can detect drift in historical data. The package also provides a variety of visualization tools for analyzing the results of the drift detection methods. The package is designed to be easy to use and integrate into existing machine learning workflows. It is also designed to be flexible, allowing users to customize the drift detection methods to their specific needs. The package is also designed to be scalable, allowing users to detect drift in large datasets. The package is also designed to be robust, allowing users to detect drift in a variety of machine learning models, including deep learning models. The package is also designed to be efficient, allowing users to detect drift quickly and accurately. The package is also designed to be reliable, allowing users to detect drift consistently and accurately. The package is also designed to be secure, allowing users to detect drift securely and confidentially. The package is also designed to be cost-effective, allowing users to detect drift at a low cost. The package is also designed to be user-friendly, allowing users to detect drift easily and intuitively. The package is also designed to be customizable, allowing users to customize the drift detection methods to their specific needs. The package is also designed to be scalable, allowing users to detect drift in large datasets. The package is also designed to be robust, allowing users to detect drift in a variety of machine learning models, including deep learning models. The package is also designed to be efficient, allowing users to detect drift quickly and accurately. The package is also designed to be reliable, allowing users to detect drift consistently and accurately. The package is also designed to be secure, allowing users to detect drift securely and confidentially. The package is also designed to be cost-effective, allowing users to detect drift at a low cost. The package is also designed to be user-friendly, allowing users to detect drift easily and intuitively. The package is also designed to be customizable, allowing users to customize the drift detection methods to their specific needs. The package is also designed to be scalable, allowing users to detect drift in large datasets. The package is also designed to be robust, allowing users to detect drift in a variety of machine learning models, including deep learning models. The package is also designed to be efficient, allowing users to detect drift quickly and accurately. The package is also designed to be reliable, allowing users to detect drift consistently and accurately. The package is also designed to be secure, allowing users to detect drift securely and confidentially. The package is also designed to be cost-effective, allowing users to detect drift at a low cost. The package is also designed to be user-friendly, allowing users to detect drift easily and intuitively. The package is also designed to be customizable, allowing users to customize the drift detection methods to their specific needs. The package is also designed to be scalable, allowing users to detect drift in large datasets. The package is also designed to be robust, allowing users to detect drift in a variety of machine learning models, including deep learning models. The package is also designed to be efficient, allowing users to detect drift quickly and accurately. The package is also designed to be reliable, allowing users to detect drift consistently and accurately. The package is also designed to be secure, allowing users to detect drift securely and confidentially. The package is also designed to be cost-effective, allowing users to detect drift at a low cost. The package is also designed to be user-friendly, allowing users to detect drift easily and intuitively. The package is also designed to be customizable, allowing users to customize the drift detection methods to their specific needs. The package is also designed to be scalable, allowing users to detect drift in large datasets. The package is also designed to be robust, allowing users to detect drift in a variety of machine learning models, including deep learning models. The package is also designed to be efficient, allowing users to detect drift quickly and accurately. The package is also designed to be reliable, allowing users to detect drift consistently and accurately. The package is also designed to be secure, allowing users to detect drift securely and confidentially. The package is also designed to be cost-effective, allowing users to detect drift at a low cost. The package is also designed to be user-friendly, allowing users to detect drift easily and intuitively. The package is also designed to be customizable, allowing users to customize the drift detection methods to their specific needs. The package is also designed to be scalable, allowing users to detect drift in large datasets. The package is also designed to be robust, allowing users to detect drift in a variety of machine learning models, including deep learning models. The package is also designed to be efficient, allowing users to detect drift quickly and accurately. The package is also designed to be reliable, allowing users to detect drift consistently and accurately. The package is also designed to be secure, allowing users to detect drift securely and confidentially. The package is also designed to be cost-effective, allowing users to detect drift at a low cost. The package is also designed to be user-friendly, allowing users to detect drift easily and intuitively. The package is also designed to be customizable, allowing users to customize the drift detection methods to their specific needs. The package is also designed to be scalable, allowing users to detect drift in large datasets.

I manually stopped the reply and asked another question and the server crashes

User: give me an example on how to use evidently

Error:

INFO:     127.0.0.1:51758 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:51758 - "POST /queue/join HTTP/1.1" 200 OK
GGML_ASSERT: /private/var/folders/zj/gwq9z5vn0tz1w2vrj049st180000gn/T/pip-install-prkz_dg0/llama-cpp-python_eab482431350469ba8c1d451ab8f787e/vendor/llama.cpp/ggml-backend.c:212: offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds"
Fatal Python error: Aborted

Complete log:
issue-1450-log.txt

from h2ogpt.

pseudotensor commented on June 1, 2024

For mistral need to use instruct version, e.g. TheBloke/Mistral-7B-Instruct-v0.2-GGUF and pass --prompt_type=mistral.

TheBloke/zephyr-7B-beta-GGUF is already instruct tuned as zephyr was DPO etc off mistral.

from h2ogpt.

SkanderBS23 commented on June 1, 2024

@pseudotensor for ollama if it uses llama.cpp i don't think there will be changes related to this, in the other hand it's not flexible as i've read (adjusting parameters and prompts).

I will stick with OpenAI atm till issue gets fixed..

from h2ogpt.

SkanderBS23 commented on June 1, 2024

@pseudotensor Seems to be fixed ? (no more errors for me for online and offline mode)

from h2ogpt.

GGML_ ASSERT: /private/var/folders/g0/p18kgc7d571crl09kq2j1hpm0000gp/T/pip-install-mmgwywty/llama-cpp-python_7e131ef142c044a6a72760eb49448fe4/ven dor/llama.cpp/ggml-backend.c:212: offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds" Fatal Python error: Aborted about h2ogpt HOT 23 OPEN

Comments (23)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent