
bert.cpp's People

Contributors

dranger003, ggerganov, hlhr202, iamlemec, lindeer, marclove, skeskinen, snowyu, sroussey, xyzhang626


bert.cpp's Issues

Can BAAI/bge-m3 be supported?

Thank you for your excellent work.

bge-m3 is distinguished by its versatility: multi-functionality, multi-linguality, and multi-granularity.

When running the following command:
python convert-to-ggml.py './bge-m3' f16

Traceback (most recent call last)
FileNotFoundError: [Errno 2] No such file or directory: './bge-m3/vocab.txt'

Will you make some changes to convert-to-ggml.py to support the new model?
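
A note on the likely cause (my reading, not verified against the script): bge-m3 is XLM-RoBERTa-based and ships a SentencePiece tokenizer rather than a WordPiece vocab.txt, so the vocab.txt lookup in convert-to-ggml.py cannot succeed. This is easy to confirm from Python:

# Sketch: check which tokenizer family bge-m3 actually ships.
# Assumes the model is downloaded to ./bge-m3 and transformers is installed.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./bge-m3")
print(type(tok).__name__)     # an XLM-RoBERTa-style tokenizer, not BertTokenizer
print(tok.vocab_files_names)  # SentencePiece files; no vocab.txt entry

Supporting it would therefore mean teaching the converter to read a SentencePiece model, not just pointing it at a different vocab file.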

Memory usage and slowness question

In the CPU build, I experienced this issue both here and in the llama.cpp version.

GGML is built for edge AI, i.e. resource-constrained devices, but both in the CLI and from Python, even for a small text of, say, 64 tokens, the code runs very slowly when available RAM is 10 GB or less, yet really fast with more RAM, like 15-20 GB. I used 6 threads; with fewer threads things slow down again.

Something seems wrong. Have you encountered this?
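
For anyone trying to reproduce this, a minimal timing sketch using the Python binding shown further down this page (the model path and API follow that example; thread-count handling, if the binding exposes it, is not shown here):

# Minimal timing sketch; BertModel/embed follow the binding example below.
import time
from bert_cpp import BertModel

mod = BertModel('models/bge-base-en-v1.5-f16.gguf')
batch = ["a short test sentence of roughly sixty-four tokens ..."]

t0 = time.perf_counter()
emb = mod.embed(batch)
print(f"embed() took {time.perf_counter() - t0:.3f}s")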

BERT MLM Question

Hi there, I understand the GGML implementation focuses on BERT-like models as embedding models, so the output is pooled and normalised. But I am interested in using BERT as an MLM model, with the MaskedLM head on top of the output layer as shown below. Could you please advise or help in accommodating bert.cpp so that it returns MLM logits instead of pooled embeddings? For an input of 8 tokens plus [CLS] and [SEP], the output would be of shape 1 x 10 x 30522 (batch_size, seq_len, vocab_size). A reference snippet showing this output shape follows the module printout below.

Thanks in advance,

BertForMaskedLM(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (cls): BertOnlyMLMHead(
    (predictions): BertLMPredictionHead(
      (transform): BertPredictionHeadTransform(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (transform_act_fn): GELUActivation()
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      )
      (decoder): Linear(in_features=768, out_features=30522, bias=True)
    )
  )
)
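
For reference, here is the requested behavior as it looks in HF transformers; this is the upstream API, not bert.cpp, and bert.cpp would additionally need the cls head weights converted and the pooling/normalisation step skipped:

# Reference behavior in HF transformers: per-token vocabulary logits.
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tok("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, seq_len, 30522]); no pooling applied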

Python binding gives error

The conversion works fine, but running the following:

from bert_cpp import BertModel
mod = BertModel('models/bge-base-en-v1.5-f16.gguf')
batch = ["Hello, how are you"]
emb = mod.embed(batch)

throws:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-23-145b8ed4c48e> in <cell line: 1>()
----> 1 from bert_cpp import BertModel
      2 mod = BertModel('models/bge-base-en-v1.5-f16.gguf')
      3 batch = ["Hello, how are you"]
      4 emb = mod.embed(batch)

3 frames
/content/bert.cpp/bert_cpp/utils.py in load_shared_library(lib_base_name)
     58                 raise RuntimeError(f'Failed to load shared library "{lib_path}": {e}')
     59 
---> 60     raise FileNotFoundError(
     61         f'Shared library with base name "{lib_base_name}" not found'
     62     )

FileNotFoundError: Shared library with base name "bert" not found

I know I am missing something basic; please advise.
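
One thing worth checking (an assumption based on the traceback, not a confirmed fix): the binding loads a compiled libbert shared library, so verify that the C++ build actually produced one and that it sits where the loader searches:

# Sketch: confirm the shared library exists and is loadable.
# The build/ directory and libbert name are assumptions from a standard
# CMake layout; the extension is .so, .dylib, or .dll by platform.
import ctypes
import pathlib

candidates = list(pathlib.Path("build").rglob("libbert.*"))
print(candidates)
if candidates:
    ctypes.CDLL(str(candidates[0]))  # raises OSError if it cannot be loaded

If the list is empty, the library was never built; re-run the CMake build before using the binding.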

Using llama.cpp

I am trying to use llama.cpp since, as you suggested, support for the same BAAI 1.5 embedding models has been merged there. Could you please help me with how I should get started? I can't figure out the equivalent of the bert_tokenize part there.

Thanks
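
Not an authoritative answer, but one way to sidestep tokenization entirely is the llama-cpp-python wrapper, which tokenizes internally; llama.cpp also ships an embedding example binary for the CLI. A sketch (the model path is a placeholder):

# Sketch using llama-cpp-python; tokenization happens inside the library,
# so there is no separate bert_tokenize step to reimplement.
from llama_cpp import Llama

llm = Llama(model_path="models/bge-base-en-v1.5-f16.gguf", embedding=True)
emb = llm.embed("Hello, how are you")  # exact return shape depends on version
print(len(emb))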

Why llama.cpp runs substantially faster

I am thinking about creating a DeBERTa version of this project. Initially I thought to use it as a backbone, because it's easier to modify than llama.cpp, but performance is really important for my case. The README mentions that the llama.cpp implementation is substantially faster; I am a beginner with ggml and llama.cpp and I don't understand why. Can someone explain it?

Works great on command line, but unable to use via python

Hi, thanks for the great work. I hope this gets merged into llama.cpp, but until then I'm able to get things to work on the command line. However, when running the Python example, I get this error:

FileNotFoundError: Shared library with base name "bert" not found

I think I'm missing a package? I did the pip install requirements bit, so I'm not sure what I'm getting wrong.

EDIT 1: Just noticed this has been merged into llama.cpp. For some reason I get an error when loading the model into llama.cpp:

llama_model_load: error loading model: error loading model hyperparameters: key not found in model: bert.context_length

This GGUF was converted using bert.cpp. Does the original model have to be converted through llama.cpp?

EDIT 2: I see there's an issue with the embeddings implementation in llama.cpp

I also tried converting the model using llama.cpp's convert.py, but get this error (a possible explanation follows the traceback):

Loading model file /home/sravanth/vecsearch/UAE-Large-V1/model.safetensors
Traceback (most recent call last):
  File "/home/sravanth/llama.cpp/convert.py", line 1483, in <module>
    main()
  File "/home/sravanth/llama.cpp/convert.py", line 1430, in main
    params = Params.load(model_plus)
  File "/home/sravanth/llama.cpp/convert.py", line 317, in load
    params = Params.loadHFTransformerJson(model_plus.model, hf_config_path)
  File "/home/sravanth/llama.cpp/convert.py", line 256, in loadHFTransformerJson
    f_norm_eps        = config["rms_norm_eps"],
KeyError: 'rms_norm_eps'
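
A likely cause (my reading of the traceback, not verified): convert.py assumes a LLaMA-style config, while BERT-family models such as UAE-Large-V1 use layer_norm_eps rather than rms_norm_eps, so the key lookup fails. Easy to confirm:

# Sketch: BERT-style configs carry no rms_norm_eps key, hence the KeyError.
import json

with open("UAE-Large-V1/config.json") as f:
    config = json.load(f)

print("rms_norm_eps" in config)      # False for BERT-family models
print(config.get("layer_norm_eps"))  # typically 1e-12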

Compilation Error on macOS

Building with Metal according to the README, I see the following (a possible fix is suggested after the log):

(base) ➜  bert.cpp-future git:(master) make -C build
[ 50%] Built target ggml
[ 58%] Building CXX object src/CMakeFiles/bert.dir/bert.cpp.o
In file included from /Users/turbo/dev/bert.cpp-future/src/bert.cpp:9:
/Users/turbo/dev/bert.cpp-future/src/bert.h:186:22: warning: 'bert_tokenize' has C-linkage specified, but returns incomplete type 'bert_tokens' (aka 'vector<int>') which could be incompatible with C [-Wreturn-type-c-linkage]
BERT_API bert_tokens bert_tokenize(
                     ^
/Users/turbo/dev/bert.cpp-future/src/bert.h:192:22: warning: 'bert_detokenize' has C-linkage specified, but returns user-defined type 'bert_string' (aka 'basic_string<char>') which is incompatible with C [-Wreturn-type-c-linkage]
BERT_API bert_string bert_detokenize(
                     ^
/Users/turbo/dev/bert.cpp-future/src/bert.cpp:358:33: error: no matching function for call to 'min'
    memcpy(output, str.c_str(), std::min(n_output, str.size()));
                                ^~~~~~~~
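
A plausible fix, assuming n_output is a signed or 32-bit count while str.size() is size_t: std::min cannot deduce a single template type from mismatched arguments, so making the types match, e.g. std::min((size_t) n_output, str.size()) or std::min<size_t>(n_output, str.size()), should let it compile. This is inferred from the error message alone, not tested on macOS.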
