
bert.cpp's People

Contributors

dranger003, hlhr202, lindeer, marclove, skeskinen


bert.cpp's Issues

implement do_handle_chinese_characters in tokenizing

As of yet I haven't tried what happens with Chinese/Japanese characters in tokenization. Some special handling is required since these languages don't have spaces between words.

It should be relatively simple to copy an existing implementation:

  1. Get inspiration from existing implementation like: https://github.com/huggingface/tokenizers/blob/ef5f50605ddf9f8caef1598c0e4853862b9707a7/tokenizers/src/normalizers/bert.rs#L98
  2. Implement that in bert.cpp -> bert_normalize_prompt (see the sketch after this list).
  3. Add some test cases with Asian languages to test_tokenizer.cpp, and get the expected results from the Python Transformers tokenizer.
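A minimal sketch of steps 1-2, assuming the normalization pass operates on UTF-32 codepoints (the function names are hypothetical, not existing bert.cpp code; the codepoint ranges mirror the HuggingFace normalizer linked above):

    #include <cstdint>
    #include <string>

    // Hypothetical helper mirroring is_chinese_char() from the HuggingFace
    // normalizer: true for the CJK codepoint ranges that should be split into
    // single-character "words".
    static bool is_cjk_char(uint32_t cp) {
        return (cp >= 0x4E00  && cp <= 0x9FFF)  ||
               (cp >= 0x3400  && cp <= 0x4DBF)  ||
               (cp >= 0x20000 && cp <= 0x2A6DF) ||
               (cp >= 0x2A700 && cp <= 0x2B73F) ||
               (cp >= 0x2B740 && cp <= 0x2B81F) ||
               (cp >= 0x2B820 && cp <= 0x2CEAF) ||
               (cp >= 0xF900  && cp <= 0xFAFF)  ||
               (cp >= 0x2F800 && cp <= 0x2FA1F);
    }

    // Sketch: surround every CJK codepoint with spaces so the later
    // whitespace split treats each character as its own word.
    static std::u32string handle_chinese_chars(const std::u32string & text) {
        std::u32string out;
        out.reserve(text.size());
        for (char32_t cp : text) {
            if (is_cjk_char((uint32_t) cp)) {
                out += U' ';
                out += cp;
                out += U' ';
            } else {
                out += cp;
            }
        }
        return out;
    }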

Alternatively:
Replace the whole tokenizer with the HuggingFace Rust implementation? It would probably need to be simplified at least a little, but I would be fine with adding some Rust code here if it doesn't complicate the build too much.

CodeBERT

It appears Microsoft has a neat little BERT model for code search, basic comment generation, code translation, and simple refactoring. With faster inference in C++, perhaps someone could make a neat VSCode extension or CLI for it...

https://github.com/microsoft/CodeBERT

Does this support CUDA?

I have seen where I can set GGML_USE_CUBLAS, and I can follow the few #defines that activate the code, but the tensors all stay on the CPU. I don't see anywhere in bert.cpp where the model or the inputs would be transferred to the GPU.

Is this just not functioning yet?

Issue reading ggml-model-q4_0.bin

First of all, thanks for your work! I have trouble loading the weights from the 'ggml-model-q4_0.bin' model (Windows, Visual Studio 2022).

In the second iteration of

    while (true)
    {
        int32_t n_dims;
        int32_t length;
        int32_t ftype;

        fin.read(reinterpret_cast<char *>(&n_dims), sizeof(n_dims));
        fin.read(reinterpret_cast<char *>(&length), sizeof(length));
        fin.read(reinterpret_cast<char *>(&ftype), sizeof(ftype));

all three parameters give nonsense values.
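(As a sketch, assuming the loop above is the tensor-reading loop in bert_load_from_file: garbage values on a later iteration often just mean the stream hit end-of-file or the file layout doesn't match what the reader expects, so a guard right after the reads makes the failure mode clearer.)

    fin.read(reinterpret_cast<char *>(&n_dims), sizeof(n_dims));
    if (fin.eof()) {
        break; // clean end of the tensor list
    }
    if (!fin || n_dims < 1 || n_dims > 4) {
        fprintf(stderr, "%s: corrupt or mismatched tensor header\n", __func__);
        return false;
    }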

Also, if I look at the output on my machine:

bert_load_from_file: loading model from 'D:/GitHub/LLM/bert.cpp/models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens   = 512
bert_load_from_file: n_embd  = 384
bert_load_from_file: n_intermediate  = 1536
bert_load_from_file: n_head  = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16     = 2
bert_load_from_file: ggml ctx size =  12.26 MB
bert_load_from_file:

ggml ctx size is not the same as in your build example

Did the model file format change? I would be grateful if you could help me debug this.

convert 'chatglm-6b' model failed

Forgive my ignorance; it seems 'chatglm-6b' is based on the GLM framework, which is a BERT-style model, but its model repo does not contain tokenizer.json or vocab.txt, so python models/convert-to-ggml.py fails. How could I make it run with bert.cpp? Thanks!

Cannot build " set_target_properties Can not find target to add properties to: ggml"

(.venv) ➜  build git:(master) ✗ cmake .. -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Error at CMakeLists.txt:190 (add_subdirectory):
  The source directory

    /home/user/Tools/06_MachineLearning/BERT/bert.cpp/ggml

  does not contain a CMakeLists.txt file.


CMake Error at CMakeLists.txt:201 (set_target_properties):
  set_target_properties Can not find target to add properties to: ggml


-- Configuring incomplete, errors occurred!
See also "/home/user/Tools/06_MachineLearning/BERT/bert.cpp/build/CMakeFiles/CMakeOutput.log".

Q&A example?

Forgive me for my ignorance, but I don't quite understand how BERT search and BERT Q&A differ. From what I can see, both use cosine similarity to find the matching embeddings; they are just trained on different datasets. If that is the case, would it mean that taking the multi-qa-MiniLM-L6-cos-v1 model and converting it to GGML would just work without any changes? If so, can it be added to the list of downloadable models?
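For what it's worth, the retrieval step itself is just cosine similarity over the embedding vectors, independent of which sentence-transformer produced them; a minimal sketch:

    #include <cmath>
    #include <vector>

    // Cosine similarity between two embedding vectors of equal length.
    static float cosine_similarity(const std::vector<float> & a,
                                   const std::vector<float> & b) {
        float dot = 0.0f, na = 0.0f, nb = 0.0f;
        for (size_t i = 0; i < a.size(); ++i) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-8f);
    }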

How do I properly build this on windows?

Sorry for the dumb question, but I noticed Windows support was added: #29.

I'm struggling to get a proper build for the native binaries. For example, if I run this step:

mkdir build
cd build
cmake .. -DBUILD_SHARED_LIBS=OFF -DCMAKE_BUILD_TYPE=Release
make
cd ..

This fails because Windows does not support pthread. It was fixed by using the MinGW-w64 CMake generator:

cmake .. -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -G "MinGW Makefiles"
mingw32-make

This results in the following files:

libbert.dll.a
bin/libbert.dll
bin/libggml.dll

The rest of the files are generated files that aren't needed, so I've left them out.

Do you know why the server/client executables are missing? Is anything else needed to make it run?

Error Running Server Example

When I try to run the server example I get an error

bert_load_from_file: loading model from 'models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens   = 512
bert_load_from_file: n_embd  = 384
bert_load_from_file: n_intermediate  = 1536
bert_load_from_file: n_head  = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16     = 2
bert_load_from_file: ggml ctx size =  12.26 MB
libc++abi: terminating due to uncaught exception of type std::length_error: basic_string

Running on a Mac M1.

BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "/bert.cpp/examples/sample_client.py", line 69, in <module>
    embedding = embed_text(text)
  File "/bert.cpp/examples/sample_client.py", line 49, in embed_text
    embedding = embeddings_from_local_server(text, sock)
  File "/bert.cpp/examples/sample_client.py", line 16, in embeddings_from_local_server
    sock.sendall(s.encode())
BrokenPipeError: [Errno 32] Broken pipe

It works with small texts but fails when passing a larger paragraph. Can someone help with this?

Classification Pipeline

Is it possible to retrofit a BERT classification model into this code?

Can you please provide some guidelines so that I can take care of it myself?

Thanks in advance.
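As a hedged sketch of what a classification head involves once an embedding comes out of bert.cpp: a fine-tuned classifier is the encoder plus a small linear layer and a softmax over labels. Exporting that head and applying it on the CPU could look roughly like the code below (the weight layout and names are assumptions, not existing bert.cpp functionality; note also that HuggingFace classification models usually pool via the [CLS] token rather than mean pooling, so the pooling strategy would need to match the fine-tuned model):

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Sketch: apply an exported linear classification head (W: n_labels x n_embd,
    // row-major; b: n_labels) to a pooled sentence embedding, then softmax.
    static std::vector<float> classify(const std::vector<float> & emb,
                                       const std::vector<float> & W,
                                       const std::vector<float> & b,
                                       int n_labels) {
        const int n_embd = (int) emb.size();
        std::vector<float> logits(n_labels, 0.0f);
        for (int i = 0; i < n_labels; ++i) {
            for (int j = 0; j < n_embd; ++j) {
                logits[i] += W[i * n_embd + j] * emb[j];
            }
            logits[i] += b[i];
        }
        float mx = *std::max_element(logits.begin(), logits.end());
        float sum = 0.0f;
        for (auto & v : logits) { v = std::exp(v - mx); sum += v; }
        for (auto & v : logits) { v /= sum; }
        return logits; // probabilities per label
    }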

Running `bert_load_from_file` from code results with ` 'embeddings.word_embeddings.weight' has wrong shape in model file`

I am using Rust to compile bert.cpp and call bert_load_from_file as an extern function. I am able to pass the path to the model successfully, but I get the following output:

bert_load_from_file: loading model from './models/ggml-model-f32.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens   = 512
bert_load_from_file: n_embd  = 384
bert_load_from_file: n_intermediate  = 1536
bert_load_from_file: n_head  = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16     = 0
bert_load_from_file: ggml ctx size =  86.10 MB
bert_load_from_file: tensor 'embeddings.word_embeddings.weight' has wrong shape in model file: got [2, 384], expected [384, 30522]

I tried this with all quantizations, using a model downloaded with python3 models/download-ggml.py download all-MiniLM-L6-v2 {Q}.

Provide C api in bert.h

For the convenience of downstream applications, the API shouldn't contain any C++ constructs.
There could be a separate util.h with a C++ API, like in ggml and llama.cpp.
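A rough sketch of what a pure-C surface in bert.h could look like (the exact function set here is an assumption; the point is opaque handles and plain C types only, with any C++ helpers moved to a separate header):

    #ifdef __cplusplus
    extern "C" {
    #endif

    struct bert_ctx;                    /* opaque handle, defined in bert.cpp */

    struct bert_ctx * bert_load_from_file(const char * fname);
    void              bert_free(struct bert_ctx * ctx);

    /* hypothetical: write the embedding for `text` into `out`
       (caller-allocated, bert_n_embd(ctx) floats) */
    void bert_encode(struct bert_ctx * ctx, int n_threads,
                     const char * text, float * out);
    int  bert_n_embd(struct bert_ctx * ctx);

    #ifdef __cplusplus
    }
    #endif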

throwing the error while running the model

Hi, I have been integrating bert.cpp but I am facing the following error:
OSError: dlopen(examples/../build/libbert.so, 0x0006): tried: 'examples/../build/libbert.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSexamples/../build/libbert.so' (no such file), '/usr/lib/examples/../build/libbert.so' (no such file, not in dyld cache), 'examples/../build/libbert.so' (no such file), '/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file), '/Users/compl-558jasheen/Downloads/bert.cpp-master/build/libbert.so' (no such file)
Exception ignored in: <function BertModel.__del__ at 0x7f95006a8b80>
Traceback (most recent call last):
File "examples/sample_dylib.py", line 40, in __del__
self.lib.bert_free(self.ctx)
AttributeError: 'BertModel' object has no attribute 'lib'

From the error I understand that the file is missing. I followed the same procedure for installing bert.cpp as mentioned; can you please help me?

Thanks
Jasheen shaik

How can I increase n_max_tokens?

After modifying the value of n_max_tokens in bert.cpp from "int32_t n_max_tokens = 512;" to "int32_t n_max_tokens = 10000;", I proceeded to rebuild the project. However, upon testing, the value of n_max_tokens remained unchanged at 512 despite the modification.
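A likely explanation (an assumption, but consistent with the bert_load_from_file output quoted in the other issues): n_max_tokens is stored in the converted model file's header and read back at load time, so the compiled-in default is simply overwritten. Something like the following hypothetical excerpt would sit in the loader; note too that the MiniLM/BERT checkpoints only ship 512 learned position embeddings, so raising the limit needs a different model, not just a bigger constant:

    // Hypothetical excerpt of the hparams-reading code in bert_load_from_file:
    // whatever the conversion script wrote into the header wins over the default.
    fin.read(reinterpret_cast<char *>(&hparams.n_vocab),      sizeof(hparams.n_vocab));
    fin.read(reinterpret_cast<char *>(&hparams.n_max_tokens), sizeof(hparams.n_max_tokens));
    fin.read(reinterpret_cast<char *>(&hparams.n_embd),       sizeof(hparams.n_embd));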

subword `#` should be an option.

For BERT, many models use # as the subword symbol, but not all.
Some popular BERT-based models define their own subword symbol.

For example, in e5 the symbol is '▁':

>>> a = '▁'
>>> a.encode('utf-8')
b'\xe2\x96\x81'
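A minimal sketch of making that marker configurable (the struct and field names are hypothetical, not the current bert.cpp API):

    #include <string>

    // Hypothetical tokenizer option: the marker string used for subword pieces.
    // "##" in classic BERT WordPiece vocabularies; some models use a different
    // string such as "\xe2\x96\x81" (U+2581, '▁') instead.
    struct bert_tokenizer_params {
        std::string subword_marker = "##";
    };

    // Wherever the tokenizer currently hard-codes "##" when matching pieces,
    // it would use params.subword_marker instead; the conversion script could
    // write the marker into the model file so it is picked up per model.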

update to current ggml

There have been a lot of changes in the way quantization works in ggml; could you update the project to use the newer ggml tree and update the conversion script?

Error load model

./main -m ./all-MiniLM-L6-v2/ggml-model-q4_0.bin
bert_load_from_file: invalid model file './MiniLM(bert.cpp)/all-MiniLM-L6-v2/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from './models/MiniLM(bert.cpp)/all-MiniLM-L6-v2/ggml-model-q4_0.bin'

Hi guys, why is it giving me this error?

[Regression] WASM alignment fault after ggml update

Following #24, the WASM-compiled library does not work. Neither f32 nor fresh q4_0 models work.

Here is what I am getting in the console: Uncaught (in promise) RuntimeError: Aborted(alignment fault). The exact line that fails before the 'SAFE_HEAP_STORE_i64_8_8' call is ggml.c:4632, which has this content:

    for (int i = 0; i < n_dims; i++) {
        result->ne[i] = ne[i];
    }
Chrome Console Output
[C/C++ DevTools Support (DWARF)] Loading debug symbols for wasm://wasm/017aede6...
index.html?_ijt=8885d1g3pefs1slvbkk1nbfo9s&_ij_reload=RELOAD_ON_SAVE:93 Writing model to filesystem... because:  No such file or directory
bert.wasm.js:1415 bert_load_from_file: loading model from '/ggml-model-q4_0.bin' - please wait ...
bert.wasm.js:1415 bert_load_from_file: n_vocab = 30522
bert.wasm.js:1415 bert_load_from_file: n_max_tokens   = 512
bert.wasm.js:1415 bert_load_from_file: n_embd  = 384
bert.wasm.js:1415 bert_load_from_file: n_intermediate  = 1536
bert.wasm.js:1415 bert_load_from_file: n_head  = 12
bert.wasm.js:1415 bert_load_from_file: n_layer = 6
bert.wasm.js:1415 bert_load_from_file: f16     = 2
[C/C++ DevTools Support (DWARF)] Loaded debug symbols for wasm://wasm/017aede6, found 567 source file(s)
bert.wasm.js:1415 bert_load_from_file: ggml ctx size =  12.26 MB
bert.wasm.js:581 Aborted(alignment fault)
abort @ bert.wasm.js:581
alignfault @ bert.wasm.js:365
$SAFE_HEAP_STORE_i64_8_8 @ 017aede6:0x129fa4
$ggml_new_tensor_impl @ ggml.c:4632
$ggml_new_tensor @ ggml.c:4667
$ggml_new_tensor_2d @ ggml.c:4683
$bert_load_from_file @ bert.cpp:495
$embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) const @ emscripten.cpp:24
$embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) @ emscripten.cpp:22
$emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, void>::'unnamed'*) @ bind.h:416
(anonymous) @ bert.wasm.js:4079
Module.onRuntimeInitialized @ index.html?_ijt=8885d1g3pefs1slvbkk1nbfo9s&_ij_reload=RELOAD_ON_SAVE:98
await in Module.onRuntimeInitialized (async)
doRun @ bert.wasm.js:5620
run @ bert.wasm.js:5633
runCaller @ bert.wasm.js:5596
removeRunDependency @ bert.wasm.js:571
receiveInstance @ bert.wasm.js:706
receiveInstantiationResult @ bert.wasm.js:714
Promise.then (async)
instantiateArrayBuffer @ bert.wasm.js:666
instantiateAsync @ bert.wasm.js:688
createWasm @ bert.wasm.js:724
(anonymous) @ bert.wasm.js:5507
bert.wasm.js:584 Uncaught (in promise) RuntimeError: Aborted(alignment fault)
    at abort (bert.wasm.js:584:10)
    at alignfault (bert.wasm.js:365:2)
    at SAFE_HEAP_STORE_i64_8_8 (017aede6:0x129fa4)
    at ggml_new_tensor_impl (ggml.c:4632)
    at ggml_new_tensor (ggml.c:4667)
    at ggml_new_tensor_2d (ggml.c:4683)
    at ::bert_load_from_file(const char *) (bert.cpp:495)
    at embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) const (emscripten.cpp:24)
    at embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) (emscripten.cpp:22)
    at emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, void>::'unnamed'*) (bind.h:416)
abort @ bert.wasm.js:584
alignfault @ bert.wasm.js:365
$SAFE_HEAP_STORE_i64_8_8 @ 017aede6:0x129fa4
$ggml_new_tensor_impl @ ggml.c:4632
$ggml_new_tensor @ ggml.c:4667
$ggml_new_tensor_2d @ ggml.c:4683
$bert_load_from_file @ bert.cpp:495
$embind_init_bert()::$_0::operator()(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) const @ emscripten.cpp:24
$embind_init_bert()::$_0::__invoke(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&) @ emscripten.cpp:22
$emscripten::internal::Invoker<void, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&>::invoke(void (*)(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const&), emscripten::internal::BindingType<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>, void>::'unnamed'*) @ bind.h:416
(anonymous) @ bert.wasm.js:4079
Module.onRuntimeInitialized @ index.html?_ijt=8885d1g3pefs1slvbkk1nbfo9s&_ij_reload=RELOAD_ON_SAVE:98
await in Module.onRuntimeInitialized (async)
doRun @ bert.wasm.js:5620
run @ bert.wasm.js:5633
runCaller @ bert.wasm.js:5596
removeRunDependency @ bert.wasm.js:571
receiveInstance @ bert.wasm.js:706
receiveInstantiationResult @ bert.wasm.js:714
Promise.then (async)
instantiateArrayBuffer @ bert.wasm.js:666
instantiateAsync @ bert.wasm.js:688
createWasm @ bert.wasm.js:724
(anonymous) @ bert.wasm.js:5507

Segfault on large inputs?

When I run the build/bin/main example with a larger input I get a segfault:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 271388624, available 260703040)
Segmentation fault

I can work around this by doing N *= 2; near the bottom of bert_load_from_file, but obviously that isn't the right solution. It seems a calculation is off somewhere (probably with mem_per_token).
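The pattern the ggml example projects use, as a sketch (variable names assumed, not the exact bert.cpp code): estimate mem_per_token from a short warm-up evaluation and size the compute buffer from the actual token count instead of a fixed constant:

    // Sketch: grow the evaluation buffer with the input length instead of
    // using a fixed size, so long inputs get a proportionally larger pool.
    size_t buf_size = 256u*1024*1024;   // default scratch size
    if (mem_per_token > 0 && mem_per_token*n_tokens > buf_size) {
        buf_size = size_t(1.2*mem_per_token*n_tokens);   // plus some headroom
    }
    // mem_per_token itself can be estimated after a warm-up eval, e.g.
    //   mem_per_token = ggml_used_mem(ctx0)/n_warmup_tokens;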

converter does not work with the current ggml

I tried to convert https://huggingface.co/intfloat/e5-large-v2 to ggml at the current commit d9f04e609fb7f7e5fb3b20a77d4d685219971009. However, running the converted f32, f16, q4_0, and q4_1 models shows the "not enough space in the context's memory pool" message. Maybe it is related to ggerganov/ggml#158?

Unable to build static library

Hi, thanks for this library! Trying to build under Linux, I'm getting this error:

~/code/bert.cpp/build$ cmake .. -DBERT_STATIC=ON -DBUILD_SHARED_LIBS=ON
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jjzazuet/code/bert.cpp/build
~/code/bert.cpp/build$ make
[ 25%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 50%] Linking C shared library libggml.so
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginT.o: relocation R_X86_64_32 against hidden symbol `__TMC_END__' can not be used when making a shared object
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:97: ggml/src/libggml.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:141: ggml/src/CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

Perhaps I'm missing something obvious, but any help or pointers are appreciated.

$ uname -a
Linux echoes 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux

$ cat /etc/debian_version 
12.0

Thanks!
