
Comments (11)

ShishirPatil commented on July 16, 2024

@fire we have the mpt-ggml and the llama-ggml models up on Huggingface!
gorilla-llm/gorilla-7b-hf-v1-ggml
gorilla-llm/gorilla-mpt-7b-hf-v0-ggml

from gorilla.

fire commented on July 16, 2024

I am excited about a possible integration using ggml and mpt.

https://github.com/ggerganov/ggml/tree/master/examples/mpt

How much Gorilla-specific code would need to be ported from Python to C++?

How much of the functionality depends on finetuning the LLM?


ShishirPatil commented on July 16, 2024

Hey @fire, for a first cut we don't need any Gorilla-specific code or any finetuning. It would just be inference, and there is no change to the architecture of either llama or MPT, so the port should be pretty straightforward. The model weights are here: https://huggingface.co/gorilla-llm


fire commented on July 16, 2024

The links are returning 404 Not Found.


pranramesh commented on July 16, 2024

@ShishirPatil I believe the model here (https://huggingface.co/gorilla-llm/gorilla-7b-hf-v1-ggml) is a quantized version of the delta-weights model (if it is the model quantized with the llama script @fire posted above). I tried running inference and the results were poor, likely because the delta wasn't merged with the llama base weights first.

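To make the point above concrete: gorilla-7b-hf-delta-v1 stores only the difference from the llama base weights, so quantizing it directly yields an unusable model. Below is a minimal sketch of what "merging the delta" means, using plain Python floats in place of real tensor state dicts; the parameter names are made up for illustration, and in practice the merge is done on the HF checkpoints (e.g. with a delta-apply script) before converting to ggml.

```python
# Sketch: reconstructing finetuned weights from base + delta.
# Real checkpoints are dicts of tensors; plain floats stand in here.

def apply_delta(base_state, delta_state):
    """Finetuned weights are base weights plus the published delta."""
    assert base_state.keys() == delta_state.keys()
    return {name: base_state[name] + delta_state[name] for name in base_state}

# Toy example with hypothetical parameter names:
base = {"layer0.weight": 0.50, "layer0.bias": -0.10}
delta = {"layer0.weight": 0.02, "layer0.bias": 0.05}

merged = apply_delta(base, delta)
# merged["layer0.weight"] ~= 0.52, merged["layer0.bias"] ~= -0.05
```

Only the merged model should be fed to the ggml conversion and quantization steps; quantizing the raw delta quantizes a diff, not a model.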

fire commented on July 16, 2024

As of today which model should I be using? (weights)


fire commented on July 16, 2024

Can someone help me quantize? I'm currently using mobile internet.

# get the repo and build it
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j

# get the model from HuggingFace
# be sure to have git-lfs installed
git clone https://huggingface.co/gorilla-llm/gorilla-mpt-7b-hf-v0

# convert model to FP16
python3 ../examples/mpt/convert-h5-to-ggml.py ./gorilla-mpt-7b-hf-v0 1

# run inference using FP16 precision
./bin/mpt -m ./gorilla-mpt-7b-hf-v0/ggml-model-f16.bin -p "I would like to translate 'I feel very good today.' from English to Chinese." -t 8 -n 64

# quantize the model to 5-bits using Q5_0 quantization
./bin/mpt-quantize ./gorilla-mpt-7b-hf-v0/ggml-model-f16.bin ./gorilla-mpt-7b-hf-v0/ggml-model-q5_0.bin q5_0

# run inference using the Q5_0 quantized model
./bin/mpt -m ./gorilla-mpt-7b-hf-v0/ggml-model-q5_0.bin -p "I would like to translate 'I feel very good today.' from English to Chinese." -t 8 -n 64
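For intuition about what the quantize step above does: ggml compresses weights in small blocks, storing one floating-point scale per block plus low-bit integers. Here is a simplified sketch of symmetric 5-bit block quantization, in the spirit of Q5_0 but not ggml's actual bit-packed on-disk format (real ggml blocks hold 32 weights; a shorter list is used here for brevity):

```python
# Simplified 5-bit block quantization sketch (not ggml's real Q5_0 layout).

def quantize_block(weights):
    """Map floats to 5-bit signed ints in [-16, 15] plus one scale."""
    amax = max(abs(w) for w in weights)
    scale = amax / 15.0 if amax > 0 else 1.0
    q = [max(-16, min(15, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate floats from the stored integers."""
    return [scale * v for v in q]

weights = [0.5, -1.0, 0.25, 0.0]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
# each restored value is within one quantization step (scale) of the original
```

The upshot is that each weight costs roughly 5 bits plus a shared scale instead of 16 bits, which is why the q5_0 file is about a third the size of the f16 file, at a small accuracy cost.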


ShishirPatil commented on July 16, 2024

@fire good question re: models. gorilla-7b-hf-delta-v1 and gorilla-mpt-7b-hf-v0 are good models to get started with. The first is a diff against the llama base weights, and the second is MPT-based.

re: quantize. How do you want to access the quantized model? 👀


fire commented on July 16, 2024

I was expecting it to be published next to the others at https://huggingface.co/gorilla-llm, tagged ggml and q5, I think.

I've also written up instructions for llama:

gorilla-llm/gorilla-7b-hf-delta-v1

# get the repo and build it
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake ..
make -j

# get the model from HuggingFace
# be sure to have git-lfs installed
git clone https://huggingface.co/gorilla-llm/gorilla-7b-hf-delta-v1

# convert model to FP16
python3 ../convert.py ./gorilla-7b-hf-delta-v1

# run inference using FP16 precision
./bin/main -m ./gorilla-7b-hf-delta-v1/ggml-model-f16.bin -p "I would like to translate 'I feel very good today.' from English to Chinese." -t 8 -n 64

# quantize the model to 5-bits using Q5_0 quantization
./bin/quantize ./gorilla-7b-hf-delta-v1/ggml-model-f16.bin ./gorilla-7b-hf-delta-v1/ggml-model-q5_0.bin q5_0

# run inference using the Q5_0 quantized model
./bin/main -m ./gorilla-7b-hf-delta-v1/ggml-model-q5_0.bin -p "I would like to translate 'I feel very good today.' from English to Chinese." -t 8 -n 64

The llama model evaluates poorly. No idea why.


ShishirPatil commented on July 16, 2024

Yikes, I think they were private! Made them public. Let me know if it works! Also, feel free to raise a PR for updates to the README or anything else you want to put into the HF models repo!


CHIRU98 commented on July 16, 2024

Hi @ShishirPatil, "gorilla-llm/gorilla-7b-hf-v1-ggml" still seems to be private. Can you check once more? It's still returning a 401 Client Error.

