iamlemec / bert.cpp
This project forked from xyzhang626/embeddings.cpp
GGML implementation of BERT model with Python bindings and quantization.
License: MIT License
Thank you for your excellent work.
bge-m3 is distinguished by its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.
When I run the following command:
python convert-to-ggml.py './bge-m3' f16
Traceback (most recent call last)
FileNotFoundError: [Errno 2] No such file or directory: './bge-m3/vocab.txt'
Will you make some changes to convert-to-ggml.py to support the new model?
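For what it's worth, bge-m3 is XLM-RoBERTa based and ships a sentencepiece model rather than a WordPiece vocab.txt, which is why the script fails. A minimal sketch of a fallback check (the find_tokenizer helper and the exact file names are my assumptions about the layout, not the script's actual code):

```python
import os

def find_tokenizer(model_dir):
    # hypothetical helper: bge-m3 is XLM-RoBERTa based, so it ships a
    # sentencepiece model instead of the WordPiece vocab.txt expected here
    wordpiece = os.path.join(model_dir, "vocab.txt")
    if os.path.exists(wordpiece):
        return "wordpiece", wordpiece
    spm = os.path.join(model_dir, "sentencepiece.bpe.model")
    if os.path.exists(spm):
        return "sentencepiece", spm
    raise FileNotFoundError(f"no tokenizer files found in {model_dir}")
```

A converter branching on the returned kind could then emit the right tokenizer metadata into the GGUF.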
I seem to have converted the jina embeddings successfully with
python bert_cpp/convert.py jinaai/jina-embeddings-v2-base-code models/jina-f16.gguf
The only change to the original BERT seems to be ALiBi, as described in https://huggingface.co/jinaai/jina-embeddings-v2-base-code.
It would be nice if we could adapt ggml_alibi into this repo for jina embedding support.
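For reference, ALiBi just adds a static, head-specific linear bias to the attention scores, so it should map onto a precomputed matrix added before softmax. A rough Python sketch of that bias, assuming the symmetric |i - j| form that bidirectional encoders use and a power-of-two head count:

```python
def alibi_slopes(n_heads):
    # closed-form slopes from the ALiBi paper, valid for power-of-two head counts
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(n_heads, seq_len):
    # per-head bias added to attention scores; symmetric |i - j| for encoders
    slopes = alibi_slopes(n_heads)
    return [[[-s * abs(i - j) for j in range(seq_len)]
             for i in range(seq_len)] for s in slopes]
```

Since the bias depends only on positions, it can be built once per sequence length and reused across layers.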
In the CPU build, I experienced this issue both here and in the llama.cpp version.
GGML targets edge AI and resource-constrained devices, yet both on the CLI and through Python, even for a small text of say 64 tokens, the code runs very slowly when available RAM is 10 GB or less, but really fast with more RAM, say 15-20 GB. I used 6 threads; with fewer threads things slow down again.
Something seems wrong. Have you encountered this?
Hi there, I understand the GGML implementation treats BERT-like models as embedding models, so the output is pooled and normalised. But I am interested in using BERT as an MLM model, with the MaskedLM head on top of the output layer as below. Could you please advise or help in adapting bert.cpp to return MLM logits instead of pooled embeddings? So for an input of 8 tokens plus CLS and SEP, the output will be of shape 1 x 10 x 30522 (bs, seq_len, vocab_size).
Thanks in advance,
BertForMaskedLM(
(bert): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0-11): 12 x BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
)
(cls): BertOnlyMLMHead(
(predictions): BertLMPredictionHead(
(transform): BertPredictionHeadTransform(
(dense): Linear(in_features=768, out_features=768, bias=True)
(transform_act_fn): GELUActivation()
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
(decoder): Linear(in_features=768, out_features=30522, bias=True)
)
)
)
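Mechanically, returning MLM logits means skipping the pooling and applying the cls head to every token's hidden state. A shape-only numpy sketch, with random arrays standing in for the real encoder output and cls.predictions.decoder weights from the module dump above:

```python
import numpy as np

bs, seq_len, hidden, vocab = 1, 10, 768, 30522
rng = np.random.default_rng(0)

# stand-ins for the encoder output and the decoder projection weights
hidden_states = rng.standard_normal((bs, seq_len, hidden), dtype=np.float32)
w_dec = rng.standard_normal((vocab, hidden), dtype=np.float32)
b_dec = np.zeros(vocab, dtype=np.float32)

# (the real head also applies dense + GELU + LayerNorm before this projection)
logits = hidden_states @ w_dec.T + b_dec
print(logits.shape)  # (1, 10, 30522)
```

In bert.cpp terms this would mean keeping the full per-token hidden states and adding one extra matmul against the decoder weight instead of the pooling step.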
Conversion works fine. Running the below:
from bert_cpp import BertModel
mod = BertModel('models/bge-base-en-v1.5-f16.gguf')
batch = ["Hello, how are you"]
emb = mod.embed(batch)
throws
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-23-145b8ed4c48e> in <cell line: 1>()
----> 1 from bert_cpp import BertModel
2 mod = BertModel('models/bge-base-en-v1.5-f16.gguf')
3 batch = ["Hello, how are you"]
4 emb = mod.embed(batch)
3 frames
/content/bert.cpp/bert_cpp/utils.py in load_shared_library(lib_base_name)
58 raise RuntimeError(f'Failed to load shared library "{lib_path}": {e}')
59
---> 60 raise FileNotFoundError(
61 f'Shared library with base name "{lib_base_name}" not found'
62 )
FileNotFoundError: Shared library with base name "bert" not found
I know I am missing something basic, please advise.
I am trying to use llama.cpp as you suggested, since it's merged there for the same BAAI 1.5 embedding models. Could you please help me get started? I can't figure out the equivalent of the bert_tokenize part there.
Thanks
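In case it helps, llama.cpp handles tokenization internally, so there is no separate bert_tokenize step to call. Through the llama-cpp-python bindings an embedding call looks roughly like this (a sketch assuming that package's embedding API, not tested against your model):

```python
def embed_texts(model_path, texts):
    # llama-cpp-python loads GGUF models; embedding=True enables the
    # embedding pipeline, and tokenization happens inside embed()
    from llama_cpp import Llama
    llm = Llama(model_path=model_path, embedding=True)
    return [llm.embed(t) for t in texts]
```

The llama.cpp repo also ships an `embedding` CLI example that takes raw text directly.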
I wrote this API for Dify. Now that I'm finished coding it, I think it could be a helpful tool for others. It can be installed with pip and used in other projects. Thanks.
I am thinking about creating a DeBERTa version of this project. Initially I thought to use it as a backbone, because it's easier to modify than llama.cpp, but performance is really important for my case. The README mentions that the llama.cpp implementation is substantially faster; I am a beginner with ggml and llama.cpp and I don't understand why. Can someone explain?
Rolled back to the original code to correct it.
Tested with "gpt": it should be split into ["gp", "t"].
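For context, that expected split follows from greedy longest-match-first WordPiece. A toy re-implementation with a minimal vocabulary (the ["gp", "t"] above corresponds to ['gp', '##t'] before the continuation prefix is stripped):

```python
def wordpiece(word, vocab):
    # greedy longest-match-first WordPiece; continuation pieces carry "##"
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces

print(wordpiece("gpt", {"gp", "##t"}))  # -> ['gp', '##t']
```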
Hi, thanks for the great work. Hope this gets merged into llama.cpp, but until then, I'm able to get things to work on the command line. However, when running the Python example, I get this error:
FileNotFoundError: Shared library with base name "bert" not found
I think I'm missing a package? I did the pip install of requirements.txt, so I'm not sure what I'm getting wrong.
EDIT 1: Just noticed this has been merged into llama.cpp. For some reason I get an error when loading it into llama.cpp
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: bert.context_length
This gguf was converted using bert.cpp. Does the original model have to be converted through llama.cpp?
EDIT 2: I see there's an issue with the embeddings implementation in llama.cpp
Also tried converting the model using llama.cpp convert.py but get this error:
Loading model file /home/sravanth/vecsearch/UAE-Large-V1/model.safetensors
Traceback (most recent call last):
File "/home/sravanth/llama.cpp/convert.py", line 1483, in <module>
main()
File "/home/sravanth/llama.cpp/convert.py", line 1430, in main
params = Params.load(model_plus)
File "/home/sravanth/llama.cpp/convert.py", line 317, in load
params = Params.loadHFTransformerJson(model_plus.model, hf_config_path)
File "/home/sravanth/llama.cpp/convert.py", line 256, in loadHFTransformerJson
f_norm_eps = config["rms_norm_eps"],
KeyError: 'rms_norm_eps'
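The KeyError makes sense: llama.cpp's convert.py at that point read LLaMA-style config keys, while BERT-family configs name the norm epsilon layer_norm_eps. A sketch of the difference (the norm_eps helper is illustrative, not convert.py's actual code):

```python
def norm_eps(config):
    # LLaMA-family configs use rms_norm_eps; BERT-family use layer_norm_eps
    if "rms_norm_eps" in config:
        return config["rms_norm_eps"]
    return config["layer_norm_eps"]
```

So a BERT-aware conversion path has to look up the right key rather than assuming the LLaMA one.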
For a build with Metal following the README, I see:
(base) ➜ bert.cpp-future git:(master) make -C build
[ 50%] Built target ggml
[ 58%] Building CXX object src/CMakeFiles/bert.dir/bert.cpp.o
In file included from /Users/turbo/dev/bert.cpp-future/src/bert.cpp:9:
/Users/turbo/dev/bert.cpp-future/src/bert.h:186:22: warning: 'bert_tokenize' has C-linkage specified, but returns incomplete type 'bert_tokens' (aka 'vector<int>') which could be incompatible with C [-Wreturn-type-c-linkage]
BERT_API bert_tokens bert_tokenize(
^
/Users/turbo/dev/bert.cpp-future/src/bert.h:192:22: warning: 'bert_detokenize' has C-linkage specified, but returns user-defined type 'bert_string' (aka 'basic_string<char>') which is incompatible with C [-Wreturn-type-c-linkage]
BERT_API bert_string bert_detokenize(
^
/Users/turbo/dev/bert.cpp-future/src/bert.cpp:358:33: error: no matching function for call to 'min'
memcpy(output, str.c_str(), std::min(n_output, str.size()));
^~~~~~~~