skeskinen / bert.cpp

ggml implementation of BERT

License: MIT License
As of yet I haven't tried what happens with Chinese/Japanese characters in tokenization. Some special handling is required since these languages don't have spaces between words.
It should be relatively simple to copy an existing implementation:
Alternatively:
Replace the whole tokenizer with the huggingface rust implementation? It should probably be at least simplified a little bit, but I would be fine adding some rust code here if it doesn't complicate the build too much.
It appears Microsoft has a neat little BERT model for code search, basic comment generation, code translation, and simple refactoring. With faster inference in C++, perhaps someone could make a neat VSCode extension or CLI for it...
I have seen where I can set GGML_USE_CUBLAS, and I can follow the few #defines that activate the code, but the tensors all stay on the CPU. I'm not seeing where in bert.cpp the model or the inputs would be transferred to the GPU.
Is this just not functioning yet?
nothing.
First of all, thanks for your work! I have trouble loading the weights of the 'ggml-model-q4_0.bin' model (Windows, Visual Studio 2022).
In the second loop of
while (true)
{
    int32_t n_dims;
    int32_t length;
    int32_t ftype;
    fin.read(reinterpret_cast<char *>(&n_dims), sizeof(n_dims));
    fin.read(reinterpret_cast<char *>(&length), sizeof(length));
    fin.read(reinterpret_cast<char *>(&ftype), sizeof(ftype));
all 3 parameters give nonsense values.
Also, if I look at the output on my machine:
bert_load_from_file: loading model from 'D:/GitHub/LLM/bert.cpp/models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens = 512
bert_load_from_file: n_embd = 384
bert_load_from_file: n_intermediate = 1536
bert_load_from_file: n_head = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16 = 2
bert_load_from_file: ggml ctx size = 12.26 MB
bert_load_from_file:
The ggml ctx size is not the same as in your build example.
Did the data model change? I would be grateful if you could help me debug this.
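Nonsense values in that loop usually mean the read position has drifted, often because the file was produced by a different converter version. As a first check, the leading header fields can be dumped and compared against the loader's log. This is a minimal sketch; the layout assumed here (a little-endian magic number followed by the seven int32 hyperparameters that the log prints) is inferred from the bert_load_from_file output above, so verify it against the loader source:

```python
import struct

def read_ggml_header(path):
    # Assumed layout: int32 magic, then the seven hyperparameters that
    # bert_load_from_file prints, all little-endian. Verify against the
    # actual loader code before trusting the result.
    names = ["magic", "n_vocab", "n_max_tokens", "n_embd",
             "n_intermediate", "n_head", "n_layer", "f16"]
    with open(path, "rb") as f:
        raw = f.read(4 * len(names))
    return dict(zip(names, struct.unpack("<%di" % len(names), raw)))
```

If these values match the log but the per-tensor loop still reads garbage, the drift is somewhere in the vocabulary section between the header and the tensors.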
This is good work, but since ggml is being phased out, are there any plans to support GGUF?
How does the runtime performance compare with that of the regular version of all-MiniLM-L6-v2?
Forgive me for my ignorance; it seems 'chatglm-6b' is based on the GLM framework, which is a BERT-style model, but its model repo contains neither tokenizer.json nor vocab.txt, and
python models/convert-to-ggml.py
fails. How could I make it run with bert.cpp? Thanks!
(.venv) ➜ build git:(master) ✗ cmake .. -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Error at CMakeLists.txt:190 (add_subdirectory):
The source directory
/home/user/Tools/06_MachineLearning/BERT/bert.cpp/ggml
does not contain a CMakeLists.txt file.
CMake Error at CMakeLists.txt:201 (set_target_properties):
set_target_properties Can not find target to add properties to: ggml
-- Configuring incomplete, errors occurred!
See also "/home/user/Tools/06_MachineLearning/BERT/bert.cpp/build/CMakeFiles/CMakeOutput.log".
Forgive me for my ignorance, but I don't quite understand how BERT search and BERT Q&A differ. From what I can see, both use cosine similarity to find the matching embedding, just trained on different datasets. If that is the case, would taking the multi-qa-MiniLM-L6-cos-v1 model and converting it to GGML just work without any changes? If so, can it be added to the list of downloadable models?
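The premise above — that both semantic search and retrieval-style Q&A reduce to ranking passages by cosine similarity over embeddings — can be sketched as follows. This is a generic illustration, not code from this repo:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_emb, passage_embs):
    # Both use cases end at this ranking step; only the training data
    # behind the embedding model differs.
    scores = [cosine_similarity(query_emb, p) for p in passage_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

So a converted multi-qa model would plausibly plug in unchanged, provided its architecture matches what the converter expects.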
Sorry for the dumb question, but I noticed Windows support was added: #29.
I'm struggling to get a proper build for the native binaries. For example, if I run this step:
mkdir build
cd build
cmake .. -DBUILD_SHARED_LIBS=OFF -DCMAKE_BUILD_TYPE=Release
make
cd ..
This fails because Windows does not support pthreads. It was fixed by using the MinGW-w64 cmake:
cmake .. -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -G "MinGW Makefiles"
mingw32-make
This results in the following files:
libbert.dll.a
bin/libbert.dll
bin/libggml.dll
The rest of the files are generated files that aren't needed, so I've left them out.
Do you know why this is missing the server/client executables? Is it missing anything else necessary to make it run?
When I try to run the server example I get an error:
bert_load_from_file: loading model from 'models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens = 512
bert_load_from_file: n_embd = 384
bert_load_from_file: n_intermediate = 1536
bert_load_from_file: n_head = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16 = 2
bert_load_from_file: ggml ctx size = 12.26 MB
libc++abi: terminating due to uncaught exception of type std::length_error: basic_string
Running on a Mac M1.
We ran one FastAPI app on a machine on two different ports. The apps make socket connections to the bert.cpp socket server. Only the first app connects; when the second is started, it just hangs after the socket.connect call.
Is it possible to make multiple socket connections at the same time from the same machine?
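If a server's accept loop handles one connection at a time, a second client will hang until the first disconnects, which matches the symptom above. A thread-per-connection accept loop removes that limit; the sketch below is a generic echo server to show the pattern, not the bert.cpp server's actual code:

```python
import socket
import threading

def handle(conn):
    # One thread per client; echoes bytes back (placeholder for the
    # real work of tokenizing and embedding the received text).
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)

def start_server(host="127.0.0.1", port=0):
    # port=0 lets the OS pick a free port; a real server uses a fixed one.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:  # listening socket was closed
                return
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]
```

With this shape, two clients on the same machine can hold connections concurrently.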
Traceback (most recent call last):
File "/bert.cpp/examples/sample_client.py", line 69, in <module>
embedding = embed_text(text)
File "/bert.cpp/examples/sample_client.py", line 49, in embed_text
embedding = embeddings_from_local_server(text, sock)
File "/bert.cpp/examples/sample_client.py", line 16, in embeddings_from_local_server
sock.sendall(s.encode())
BrokenPipeError: [Errno 32] Broken pipe
It works with small texts, but fails when passing a larger paragraph. Can someone help with this?
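The model's log above reports n_max_tokens = 512, so one plausible cause is the server dropping the connection on over-long input, leaving the client to hit EPIPE on its next sendall. A client-side workaround is to split long paragraphs before sending. This word-based splitter only approximates the tokenizer's subword count, hence the conservative default well below 512:

```python
def chunk_text(text, max_words=200):
    # Word count underestimates subword token count, so keep a wide
    # safety margin below the model's n_max_tokens limit of 512.
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Each chunk is then embedded separately; the per-chunk vectors can be averaged or stored individually.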
I tried to reproduce your method, but the results from the implementation I tested differ significantly from yours; I'm not sure if this is due to the limited computing power of my computer.
Is it possible to retrofit a BERT classification model into this code?
Can you please provide some guidelines so that I can take care of it myself?
Thanks in advance.
In the conversion script, it ignores embeddings.position_ids:
if name in ['embeddings.position_ids', 'pooler.dense.weight', 'pooler.dense.bias']:
continue
why?
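A likely reason — an inference here, not a statement from the script's author — is that embeddings.position_ids is not a learned weight: in BERT checkpoints it is just the constant index buffer 0..n_max_tokens-1, which a loader can regenerate, and the pooler weights are unused when only sentence embeddings are produced. The regeneration amounts to:

```python
def make_position_ids(n_max_tokens=512):
    # The position_ids buffer in BERT checkpoints is arange(n_max_tokens);
    # nothing is lost by skipping it in conversion and rebuilding it here.
    return list(range(n_max_tokens))
```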
Is this repository ever going to be updated and/or worked on or has it been abandoned?
As mentioned in the title, https://github.com/mlc-ai/tokenizers-cpp
is a good implementation for tokenization.
Maybe people don't like another dependency, but it is worth it.
For example, a model like this:
https://huggingface.co/aloxatel/bert-base-mnli
If so, how would I do inference on it?
I am using Rust to compile bert.cpp and call bert_load_from_file as an extern. I am able to pass the path to the model successfully, but I get the following output:
bert_load_from_file: loading model from './models/ggml-model-f32.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens = 512
bert_load_from_file: n_embd = 384
bert_load_from_file: n_intermediate = 1536
bert_load_from_file: n_head = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16 = 0
bert_load_from_file: ggml ctx size = 86.10 MB
bert_load_from_file: tensor 'embeddings.word_embeddings.weight' has wrong shape in model file: got [2, 384], expected [384, 30522]
Tried this with all quantizations, using a model downloaded with python3 models/download-ggml.py download all-MiniLM-L6-v2 {Q}.
For the convenience of downstream applications, the API shouldn't have any C++ stuff.
There could be a separate util.h with a C++ API, as in ggml and llama.cpp.
Hi, I have been implementing bert.cpp but I am facing the following error:
OSError: dlopen(examples/../build/libbert.so, 0x0006): tried: 'examples/../build/libbert.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSexamples/../build/libbert.so' (no such file), '/usr/lib/examples/../build/libbert.so' (no such file, not in dyld cache), 'examples/../build/libbert.so' (no such file), '/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file), '/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file)
Exception ignored in: <function BertModel.del at 0x7f95006a8b80>
Traceback (most recent call last):
File "examples/sample_dylib.py", line 40, in del
self.lib.bert_free(self.ctx)
AttributeError: 'BertModel' object has no attribute 'lib'
From the error I understand that the file is missing. I have followed the same procedure for installing bert.cpp as mentioned. Can you please help me?
Thanks,
Jasheen shaik
For BERT, many models use # as the subword symbol, but not all.
Some popular BERT-based models define their own subword symbol.
For example, in e5 the symbol is ▁:
>>> a = '▁'
>>> a.encode('utf-8')
b'\xe2\x96\x81'
There have been a lot of changes in the way quantization works in ggml. Could you update the project to use the newer ggml tree and update the conversion script?
./main -m ./all-MiniLM-L6-v2/ggml-model-q4_0.bin
bert_load_from_file: invalid model file './MiniLM(bert.cpp)/all-MiniLM-L6-v2/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from './models/MiniLM(bert.cpp)/all-MiniLM-L6-v2/ggml-model-q4_0.bin'
Hi guys, why is it giving me this error?
Following #24, the WASM-compiled library does not work; neither f32 nor fresh q4_0 models work.
Here is what I am getting in the console: Uncaught (in promise) RuntimeError: Aborted(alignment fault).
The exact line that fails before the 'SAFE_HEAP_STORE_i64_8_8' call is ggml.c:4632, which has this content:
for (int i = 0; i < n_dims; i++) {
    result->ne[i] = ne[i];
}
[C/C++ DevTools Support (DWARF)] Loading debug symbols for wasm://wasm/017aede6...
index.html?_ijt=8885d1g3pefs1slvbkk1nbfo9s&_ij_reload=RELOAD_ON_SAVE:93 Writing model to filesystem... because: No such file or directory
bert.wasm.js:1415 bert_load_from_file: loading model from '/ggml-model-q4_0.bin' - please wait ...
bert.wasm.js:1415 bert_load_from_file: n_vocab = 30522
bert.wasm.js:1415 bert_load_from_file: n_max_tokens = 512
bert.wasm.js:1415 bert_load_from_file: n_embd = 384
bert.wasm.js:1415 bert_load_from_file: n_intermediate = 1536
bert.wasm.js:1415 bert_load_from_file: n_head = 12
bert.wasm.js:1415 bert_load_from_file: n_layer = 6
bert.wasm.js:1415 bert_load_from_file: f16 = 2
[C/C++ DevTools Support (DWARF)] Loaded debug symbols for wasm://wasm/017aede6, found 567 source file(s)
bert.wasm.js:1415 bert_load_from_file: ggml ctx size = 12.26 MB
bert.wasm.js:581 Aborted(alignment fault)
abort @ bert.wasm.js:581
alignfault @ bert.wasm.js:365
$SAFE_HEAP_STORE_i64_8_8 @ 017aede6:0x129fa4
$ggml_new_tensor_impl @ ggml.c:4632
$ggml_new_tensor @ ggml.c:4667
$ggml_new_tensor_2d @ ggml.c:4683
$bert_load_from_file @ bert.cpp:495
$embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) const @ emscripten.cpp:24
$embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) @ emscripten.cpp:22
$emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, void>::'unnamed'*) @ bind.h:416
(anonymous) @ bert.wasm.js:4079
Module.onRuntimeInitialized @ index.html?_ijt=8885d1g3pefs1slvbkk1nbfo9s&_ij_reload=RELOAD_ON_SAVE:98
await in Module.onRuntimeInitialized (async)
doRun @ bert.wasm.js:5620
run @ bert.wasm.js:5633
runCaller @ bert.wasm.js:5596
removeRunDependency @ bert.wasm.js:571
receiveInstance @ bert.wasm.js:706
receiveInstantiationResult @ bert.wasm.js:714
Promise.then (async)
instantiateArrayBuffer @ bert.wasm.js:666
instantiateAsync @ bert.wasm.js:688
createWasm @ bert.wasm.js:724
(anonymous) @ bert.wasm.js:5507
bert.wasm.js:584 Uncaught (in promise) RuntimeError: Aborted(alignment fault)
at abort (bert.wasm.js:584:10)
at alignfault (bert.wasm.js:365:2)
at SAFE_HEAP_STORE_i64_8_8 (017aede6:0x129fa4)
at ggml_new_tensor_impl (ggml.c:4632)
at ggml_new_tensor (ggml.c:4667)
at ggml_new_tensor_2d (ggml.c:4683)
at ::bert_load_from_file(const char *) (bert.cpp:495)
at embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) const (emscripten.cpp:24)
at embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) (emscripten.cpp:22)
at emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, void>::'unnamed'*) (bind.h:416)
Run main.exe with a q4_0 quantized model.
For q4_0, it hits an assert in ggml.c, in function static void ggml_compute_forward_soft_max_f32, line 9342: assert(sum > 0.0); sum is -nan(ind).
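A sum of -nan(ind) suggests the values feeding the softmax are already NaN (for example from corrupted q4_0 dequantization), because a max-subtracted softmax cannot otherwise produce a non-positive sum. The standard numerically stable formulation, shown here in Python for reference rather than as the ggml code itself, guarantees the denominator is at least 1 for finite inputs, since the max element contributes exp(0) = 1:

```python
import math

def stable_softmax(xs):
    # Subtract the max before exponentiating: exp() cannot overflow, and
    # the max element contributes exp(0) == 1, so the denominator is >= 1.
    # An assert(sum > 0.0) can then only fail on NaN/inf inputs.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

So the assert is the messenger; the place to look is upstream, at the tensor values produced by the quantized matmuls.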
When I run the build/bin/main example with a larger input I get a segfault:
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 271388624, available 260703040)
Segmentation fault
I can work around this by doing N *= 2; near the bottom of bert_load_from_file, but obviously that isn't the right solution. It seems a calculation is off somewhere (probably with mem_per_token).
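For scale: the log shows the pool short by only about 4% (271,388,624 needed vs 260,703,040 available), which is consistent with a mem_per_token calibrated on a short input and scaled linearly to a longer one. A hypothetical guard — an illustration of the sizing arithmetic, not what bert.cpp currently does — would add a safety margin on top of the measured per-token cost:

```python
def required_pool_size(mem_per_token, n_tokens, margin=1.1):
    # margin (hypothetical) absorbs per-token overhead that gets
    # underestimated when mem_per_token is calibrated on a short prompt.
    return int(mem_per_token * n_tokens * margin)
```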
Tried to convert https://huggingface.co/intfloat/e5-large-v2 to ggml at the current d9f04e609fb7f7e5fb3b20a77d4d685219971009 commit. However, running the converted f32, f16, q4_0, and q4_1 models shows the "not enough space in the context's memory pool" message. Maybe it is related to ggerganov/ggml#158?
Hi, thanks for this library! Trying to build under Linux, I'm getting this error:
~/code/bert.cpp/build$ cmake .. -DBERT_STATIC=ON -DBUILD_SHARED_LIBS=ON
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jjzazuet/code/bert.cpp/build
~/code/bert.cpp/build$ make
[ 25%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 50%] Linking C shared library libggml.so
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginT.o: relocation R_X86_64_32 against hidden symbol `__TMC_END__' can not be used when making a shared object
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:97: ggml/src/libggml.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:141: ggml/src/CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
Perhaps I'm missing something obvious, but any help or pointers are appreciated.
$ uname -a
Linux echoes 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux
$ cat /etc/debian_version
12.0
Thanks!