skeskinen / bert.cpp

ggml implementation of BERT

License: MIT License
As of yet I haven't tried what happens with Chinese/Japanese characters in tokenization. Some special handling is required since these languages don't have spaces between words.
It should be relatively simple to copy an existing implementation:
Alternatively:
Replace the whole tokenizer with the huggingface rust implementation? It should probably be at least simplified a little bit, but I would be fine adding some rust code here if it doesn't complicate the build too much.
It appears Microsoft has a neat little BERT model for code search, basic comment generation, code translation, and simple refactoring. With faster inference in C++, perhaps someone could make a neat VSCode extension or CLI for it...
I have seen where I can set GGML_USE_CUBLAS, and I can follow the few #defines that activate the code, but the tensors all stay on the CPU. I'm not seeing where in bert.cpp the model or the inputs would be transferred to the GPU.
Is this just not functioning yet?
nothing.
First of all, thanks for your work! I have trouble loading the weights of the 'ggml-model-q4_0.bin' model (Windows, Visual Studio 2022).
In the second loop of
while (true)
{
    int32_t n_dims;
    int32_t length;
    int32_t ftype;
    fin.read(reinterpret_cast<char *>(&n_dims), sizeof(n_dims));
    fin.read(reinterpret_cast<char *>(&length), sizeof(length));
    fin.read(reinterpret_cast<char *>(&ftype), sizeof(ftype));
all 3 parameters give nonsense values.
Also, if I look at the output on my machine:
bert_load_from_file: loading model from 'D:/GitHub/LLM/bert.cpp/models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens = 512
bert_load_from_file: n_embd = 384
bert_load_from_file: n_intermediate = 1536
bert_load_from_file: n_head = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16 = 2
bert_load_from_file: ggml ctx size = 12.26 MB
bert_load_from_file:
The ggml ctx size is not the same as in your build example.
Did the data model change? I would be grateful if you could help me debug this.
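Nonsense values in that loop usually mean the read position has drifted, often because the file was produced by a different converter version. As a first check, the leading header fields can be dumped and compared against the loader's log. This is a minimal sketch; the layout assumed here (a little-endian magic number followed by the seven int32 hyperparameters that the log prints) is inferred from the bert_load_from_file output above, so verify it against the loader source:

```python
import struct

def read_ggml_header(path):
    # Assumed layout: int32 magic, then the seven hyperparameters that
    # bert_load_from_file prints, all little-endian. Verify against the
    # actual loader code before trusting the result.
    names = ["magic", "n_vocab", "n_max_tokens", "n_embd",
             "n_intermediate", "n_head", "n_layer", "f16"]
    with open(path, "rb") as f:
        raw = f.read(4 * len(names))
    return dict(zip(names, struct.unpack("<%di" % len(names), raw)))
```

If these values match the log but the per-tensor loop still reads garbage, the drift is somewhere in the vocabulary section between the header and the tensors.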
This is good work, but since ggml is being phased out, are there any plans to support GGUF?
How does the runtime performance compare with that of the regular version of all-MiniLM-L6-v2?
Forgive me for my ignorance; it seems 'chatglm-6b' is based on the GLM framework, which is a BERT-style model, but its model repo contains neither tokenizer.json nor vocab.txt, and
python models/convert-to-ggml.py
fails. How could I make it run with bert.cpp? Thanks!
(.venv) ➜ build git:(master) ✗ cmake .. -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Error at CMakeLists.txt:190 (add_subdirectory):
The source directory
/home/user/Tools/06_MachineLearning/BERT/bert.cpp/ggml
does not contain a CMakeLists.txt file.
CMake Error at CMakeLists.txt:201 (set_target_properties):
set_target_properties Can not find target to add properties to: ggml
-- Configuring incomplete, errors occurred!
See also "/home/user/Tools/06_MachineLearning/BERT/bert.cpp/build/CMakeFiles/CMakeOutput.log".
Forgive me for my ignorance, but I don't quite understand how BERT search and BERT Q&A differ. From what I can see, both use cosine similarity to find the matching embedding, just trained on different datasets. If that is the case, would taking the multi-qa-MiniLM-L6-cos-v1 model and converting it to GGML just work without any changes? If so, can it be added to the list of downloadable models?
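The premise above — that both semantic search and retrieval-style Q&A reduce to ranking passages by cosine similarity over embeddings — can be sketched as follows. This is a generic illustration, not code from this repo:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_emb, passage_embs):
    # Both use cases end at this ranking step; only the training data
    # behind the embedding model differs.
    scores = [cosine_similarity(query_emb, p) for p in passage_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

So a converted multi-qa model would plausibly plug in unchanged, provided its architecture matches what the converter expects.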
Sorry for the dumb question, but I noticed Windows support was added: #29.
I'm struggling to get a proper build for the native binaries. For example, if I run this step:
mkdir build
cd build
cmake .. -DBUILD_SHARED_LIBS=OFF -DCMAKE_BUILD_TYPE=Release
make
cd ..
This fails because Windows does not support pthreads. It was fixed by using the MinGW-w64 cmake:
cmake .. -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -G "MinGW Makefiles"
mingw32-make
This results in the following files:
libbert.dll.a
bin/libbert.dll
bin/libggml.dll
The rest of the files are generated files that aren't needed, so I've left them out.
Do you know why this is missing the server/client executables? Is it missing anything else necessary to make it run?
When I try to run the server example I get an error:
bert_load_from_file: loading model from 'models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens = 512
bert_load_from_file: n_embd = 384
bert_load_from_file: n_intermediate = 1536
bert_load_from_file: n_head = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16 = 2
bert_load_from_file: ggml ctx size = 12.26 MB
libc++abi: terminating due to uncaught exception of type std::length_error: basic_string
Running on a Mac M1.
We ran one FastAPI app on a machine on two different ports. The apps make socket connections to the bert.cpp socket server. Only the first app connects; when the second is started, it just hangs after the socket.connect call.
Is it possible to make multiple socket connections at the same time from the same machine?
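If a server's accept loop handles one connection at a time, a second client will hang until the first disconnects, which matches the symptom above. A thread-per-connection accept loop removes that limit; the sketch below is a generic echo server to show the pattern, not the bert.cpp server's actual code:

```python
import socket
import threading

def handle(conn):
    # One thread per client; echoes bytes back (placeholder for the
    # real work of tokenizing and embedding the received text).
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)

def start_server(host="127.0.0.1", port=0):
    # port=0 lets the OS pick a free port; a real server uses a fixed one.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:  # listening socket was closed
                return
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]
```

With this shape, two clients on the same machine can hold connections concurrently.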
Traceback (most recent call last):
File "/bert.cpp/examples/sample_client.py", line 69, in <module>
embedding = embed_text(text)
File "/bert.cpp/examples/sample_client.py", line 49, in embed_text
embedding = embeddings_from_local_server(text, sock)
File "/bert.cpp/examples/sample_client.py", line 16, in embeddings_from_local_server
sock.sendall(s.encode())
BrokenPipeError: [Errno 32] Broken pipe
It works with small texts, but fails when passing a larger paragraph. Can someone help with this?
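The model's log above reports n_max_tokens = 512, so one plausible cause is the server dropping the connection on over-long input, leaving the client to hit EPIPE on its next sendall. A client-side workaround is to split long paragraphs before sending. This word-based splitter only approximates the tokenizer's subword count, hence the conservative default well below 512:

```python
def chunk_text(text, max_words=200):
    # Word count underestimates subword token count, so keep a wide
    # safety margin below the model's n_max_tokens limit of 512.
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Each chunk is then embedded separately; the per-chunk vectors can be averaged or stored individually.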
I tried to reproduce your method, but the results from the implementation I tested differ significantly from yours; I'm not sure if this is due to the limited computing power of my computer.
Is it possible to retrofit a BERT classification model into this code?
Can you please provide some guidelines so that I can take care of it myself?
Thanks in advance.
In the conversion script, it ignores embeddings.position_ids:
if name in ['embeddings.position_ids', 'pooler.dense.weight', 'pooler.dense.bias']:
continue
why?
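A likely reason — an inference here, not a statement from the script's author — is that embeddings.position_ids is not a learned weight: in BERT checkpoints it is just the constant index buffer 0..n_max_tokens-1, which a loader can regenerate, and the pooler weights are unused when only sentence embeddings are produced. The regeneration amounts to:

```python
def make_position_ids(n_max_tokens=512):
    # The position_ids buffer in BERT checkpoints is arange(n_max_tokens);
    # nothing is lost by skipping it in conversion and rebuilding it here.
    return list(range(n_max_tokens))
```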
Is this repository ever going to be updated and/or worked on or has it been abandoned?
As mentioned in the title, https://github.com/mlc-ai/tokenizers-cpp
is a good implementation for tokenization.
Maybe people don't like another dependency, but it is worth it.
For example, a model like this:
https://huggingface.co/aloxatel/bert-base-mnli
If so, how would I do inference on it?
I am using Rust to compile bert.cpp and call bert_load_from_file as an extern. I am able to pass the path to the model successfully, but I get the following output:
bert_load_from_file: loading model from './models/ggml-model-f32.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens = 512
bert_load_from_file: n_embd = 384
bert_load_from_file: n_intermediate = 1536
bert_load_from_file: n_head = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16 = 0
bert_load_from_file: ggml ctx size = 86.10 MB
bert_load_from_file: tensor 'embeddings.word_embeddings.weight' has wrong shape in model file: got [2, 384], expected [384, 30522]
Tried this with all quantizations, using a model downloaded with python3 models/download-ggml.py download all-MiniLM-L6-v2 {Q}.
For the convenience of downstream applications, the API shouldn't have any C++ stuff.
There could be a separate util.h with a C++ API, as in ggml and llama.cpp.
Hi, I have been implementing bert.cpp but I am facing the following error:
OSError: dlopen(examples/../build/libbert.so, 0x0006): tried: 'examples/../build/libbert.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSexamples/../build/libbert.so' (no such file), '/usr/lib/examples/../build/libbert.so' (no such file, not in dyld cache), 'examples/../build/libbert.so' (no such file), '/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file), '/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file)
Exception ignored in: <function BertModel.del at 0x7f95006a8b80>
Traceback (most recent call last):
File "examples/sample_dylib.py", line 40, in del
self.lib.bert_free(self.ctx)
AttributeError: 'BertModel' object has no attribute 'lib'
From the error I understand that the file is missing. I have followed the same procedure for installing bert.cpp as mentioned. Can you please help me?
Thanks,
Jasheen shaik
For BERT, many models use # as the subword symbol, but not all.
Some popular BERT-based models define their own subword symbol.
For example, in e5 the symbol is ▁:
>>> a = '▁'
>>> a.encode('utf-8')
b'\xe2\x96\x81'
There have been a lot of changes in the way quantization works in ggml. Could you update the project to use the newer ggml tree and update the conversion script?
./main -m ./all-MiniLM-L6-v2/ggml-model-q4_0.bin
bert_load_from_file: invalid model file './MiniLM(bert.cpp)/all-MiniLM-L6-v2/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from './models/MiniLM(bert.cpp)/all-MiniLM-L6-v2/ggml-model-q4_0.bin'
Hi guys, why is it giving me this error?
Following #24, the WASM-compiled library does not work; neither f32 nor fresh q4_0 models work.
Here is what I am getting in the console: Uncaught (in promise) RuntimeError: Aborted(alignment fault).
The exact line that fails before the 'SAFE_HEAP_STORE_i64_8_8' call is ggml.c:4632, which has this content:
for (int i = 0; i < n_dims; i++) {
    result->ne[i] = ne[i];
}
[C/C++ DevTools Support (DWARF)] Loading debug symbols for wasm://wasm/017aede6...
index.html?_ijt=8885d1g3pefs1slvbkk1nbfo9s&_ij_reload=RELOAD_ON_SAVE:93 Writing model to filesystem... because: No such file or directory
bert.wasm.js:1415 bert_load_from_file: loading model from '/ggml-model-q4_0.bin' - please wait ...
bert.wasm.js:1415 bert_load_from_file: n_vocab = 30522
bert.wasm.js:1415 bert_load_from_file: n_max_tokens = 512
bert.wasm.js:1415 bert_load_from_file: n_embd = 384
bert.wasm.js:1415 bert_load_from_file: n_intermediate = 1536
bert.wasm.js:1415 bert_load_from_file: n_head = 12
bert.wasm.js:1415 bert_load_from_file: n_layer = 6
bert.wasm.js:1415 bert_load_from_file: f16 = 2
[C/C++ DevTools Support (DWARF)] Loaded debug symbols for wasm://wasm/017aede6, found 567 source file(s)
bert.wasm.js:1415 bert_load_from_file: ggml ctx size = 12.26 MB
bert.wasm.js:581 Aborted(alignment fault)
abort @ bert.wasm.js:581
alignfault @ bert.wasm.js:365
$SAFE_HEAP_STORE_i64_8_8 @ 017aede6:0x129fa4
$ggml_new_tensor_impl @ ggml.c:4632
$ggml_new_tensor @ ggml.c:4667
$ggml_new_tensor_2d @ ggml.c:4683
$bert_load_from_file @ bert.cpp:495
$embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) const @ emscripten.cpp:24
$embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) @ emscripten.cpp:22
$emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, void>::'unnamed'*) @ bind.h:416
(anonymous) @ bert.wasm.js:4079
Module.onRuntimeInitialized @ index.html?_ijt=8885d1g3pefs1slvbkk1nbfo9s&_ij_reload=RELOAD_ON_SAVE:98
await in Module.onRuntimeInitialized (async)
doRun @ bert.wasm.js:5620
run @ bert.wasm.js:5633
runCaller @ bert.wasm.js:5596
removeRunDependency @ bert.wasm.js:571
receiveInstance @ bert.wasm.js:706
receiveInstantiationResult @ bert.wasm.js:714
Promise.then (async)
instantiateArrayBuffer @ bert.wasm.js:666
instantiateAsync @ bert.wasm.js:688
createWasm @ bert.wasm.js:724
(anonymous) @ bert.wasm.js:5507
bert.wasm.js:584 Uncaught (in promise) RuntimeError: Aborted(alignment fault)
at abort (bert.wasm.js:584:10)
at alignfault (bert.wasm.js:365:2)
at SAFE_HEAP_STORE_i64_8_8 (017aede6:0x129fa4)
at ggml_new_tensor_impl (ggml.c:4632)
at ggml_new_tensor (ggml.c:4667)
at ggml_new_tensor_2d (ggml.c:4683)
at ::bert_load_from_file(const char *) (bert.cpp:495)
at embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) const (emscripten.cpp:24)
at embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) (emscripten.cpp:22)
at emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, void>::'unnamed'*) (bind.h:416)
Run main.exe with a q4_0 quantized model.
For q4_0, it hits an assert in ggml.c, in function static void ggml_compute_forward_soft_max_f32, line 9342: assert(sum > 0.0); sum is -nan(ind).
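A sum of -nan(ind) suggests the values feeding the softmax are already NaN (for example from corrupted q4_0 dequantization), because a max-subtracted softmax cannot otherwise produce a non-positive sum. The standard numerically stable formulation, shown here in Python for reference rather than as the ggml code itself, guarantees the denominator is at least 1 for finite inputs, since the max element contributes exp(0) = 1:

```python
import math

def stable_softmax(xs):
    # Subtract the max before exponentiating: exp() cannot overflow, and
    # the max element contributes exp(0) == 1, so the denominator is >= 1.
    # An assert(sum > 0.0) can then only fail on NaN/inf inputs.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

So the assert is the messenger; the place to look is upstream, at the tensor values produced by the quantized matmuls.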
When I run the build/bin/main example with a larger input I get a segfault:
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 271388624, available 260703040)
Segmentation fault
I can work around this by doing N *= 2; near the bottom of bert_load_from_file, but obviously that isn't the right solution. It seems a calculation is off somewhere (probably with mem_per_token).
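For scale: the log shows the pool short by only about 4% (271,388,624 needed vs 260,703,040 available), which is consistent with a mem_per_token calibrated on a short input and scaled linearly to a longer one. A hypothetical guard — an illustration of the sizing arithmetic, not what bert.cpp currently does — would add a safety margin on top of the measured per-token cost:

```python
def required_pool_size(mem_per_token, n_tokens, margin=1.1):
    # margin (hypothetical) absorbs per-token overhead that gets
    # underestimated when mem_per_token is calibrated on a short prompt.
    return int(mem_per_token * n_tokens * margin)
```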
Tried to convert https://huggingface.co/intfloat/e5-large-v2 to ggml at the current d9f04e609fb7f7e5fb3b20a77d4d685219971009 commit. However, running the converted f32, f16, q4_0, and q4_1 models shows the "not enough space in the context's memory pool" message. Maybe it is related to ggerganov/ggml#158?
Hi, thanks for this library! Trying to build under Linux, I'm getting this error:
~/code/bert.cpp/build$ cmake .. -DBERT_STATIC=ON -DBUILD_SHARED_LIBS=ON
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jjzazuet/code/bert.cpp/build
~/code/bert.cpp/build$ make
[ 25%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 50%] Linking C shared library libggml.so
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginT.o: relocation R_X86_64_32 against hidden symbol `__TMC_END__' can not be used when making a shared object
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:97: ggml/src/libggml.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:141: ggml/src/CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
Perhaps I'm missing something obvious, but any help or pointers are appreciated.
$ uname -a
Linux echoes 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux
$ cat /etc/debian_version
12.0
Thanks!