Comments (4)

zch-cc commented on August 18, 2024

I found a similar issue in #1957, and it looks like versions after 2.0.2 are not working. I tried every version after 2.0.2, and they all fail.

from text-generation-inference.

ErikKaum commented on August 18, 2024

Hi!

Thanks for reporting the issue 👍 Could you share a bit about how to reproduce this?
E.g., which model are you using, and what's the command to launch the Docker container?

zch-cc commented on August 18, 2024

Hi @ErikKaum,
Thanks for responding. The model is

deepseek-ai/deepseek-coder-6.7b-base

and you can use the official Docker container command to reproduce this:

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.1.1 --model-id $model

ErikKaum commented on August 18, 2024

Thank you.

Yeah, I'm able to reproduce this on my machine. A quick check also suggests that deepseek-ai/deepseek-coder-6.7b-base works with the transformers library, so this is most likely a bug on our end.

At the moment, I unfortunately don't have bandwidth to start debugging.

I'm seeing a lot of warnings like this:

2024-07-16T15:00:17.309591Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'õ' was expected to have ID '32000' but was given ID 'None'
2024-07-16T15:00:17.309615Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '÷' was expected to have ID '32001' but was given ID 'None'
2024-07-16T15:00:17.309618Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'Á' was expected to have ID '32002' but was given ID 'None'
2024-07-16T15:00:17.309621Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ý' was expected to have ID '32003' but was given ID 'None'
2024-07-16T15:00:17.309624Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'À' was expected to have ID '32004' but was given ID 'None'
2024-07-16T15:00:17.309626Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ÿ' was expected to have ID '32005' but was given ID 'None'
2024-07-16T15:00:17.309629Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ø' was expected to have ID '32006' but was given ID 'None'
2024-07-16T15:00:17.309631Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ú' was expected to have ID '32007' but was given ID 'None'
2024-07-16T15:00:17.309641Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'þ' was expected to have ID '32008' but was given ID 'None'
2024-07-16T15:00:17.309643Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ü' was expected to have ID '32009' but was given ID 'None'
2024-07-16T15:00:17.309646Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ù' was expected to have ID '32010' but was given ID 'None'
2024-07-16T15:00:17.309648Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'ö' was expected to have ID '32011' but was given ID 'None'
2024-07-16T15:00:17.309651Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token 'û' was expected to have ID '32012' but was given ID 'None'
2024-07-16T15:00:17.309653Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|begin▁of▁sentence|>' was expected to have ID '32013' but was given ID 'None'
2024-07-16T15:00:17.309656Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end▁of▁sentence|>' was expected to have ID '32014' but was given ID 'None'
2024-07-16T15:00:17.309658Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|fim▁hole|>' was expected to have ID '32015' but was given ID 'None'
2024-07-16T15:00:17.309661Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|fim▁begin|>' was expected to have ID '32016' but was given ID 'None'
2024-07-16T15:00:17.309664Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|fim▁end|>' was expected to have ID '32017' but was given ID 'None'
2024-07-16T15:00:17.309666Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<pad>' was expected to have ID '32018' but was given ID 'None'
2024-07-16T15:00:17.309669Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|User|>' was expected to have ID '32019' but was given ID 'None'
2024-07-16T15:00:17.309672Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|Assistant|>' was expected to have ID '32020' but was given ID 'None'
2024-07-16T15:00:17.309674Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|EOT|>' was expected to have ID '32021' but was given ID 'None'
2024-07-16T15:00:17.310063Z  INFO text_generation_router: router/src/main.rs:330: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205

which makes me think it might be a tokenization issue.
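
For anyone who wants to summarize these warnings rather than read them line by line, here is a minimal Python sketch. It assumes the warning text keeps the format quoted above; the `parse_token_warnings` helper and the abbreviated sample lines are only illustrative and are not part of text-generation-inference or tokenizers.

```python
import re
from typing import Dict, List

# Matches the message part of the tokenizers warnings quoted above.
WARNING_RE = re.compile(
    r"Token '(?P<token>.+?)' was expected to have ID '(?P<expected>\d+)' "
    r"but was given ID '(?P<given>[^']*)'"
)


def parse_token_warnings(lines: List[str]) -> Dict[int, str]:
    """Map each expected token ID to the token that was given ID 'None'."""
    missing: Dict[int, str] = {}
    for line in lines:
        match = WARNING_RE.search(line)
        if match and match.group("given") == "None":
            missing[int(match.group("expected"))] = match.group("token")
    return missing


# Abbreviated copies of three warnings from the log (timestamp and path dropped).
sample_log = [
    "WARN tokenizers::tokenizer::serialization: Warning: Token 'õ' was expected to have ID '32000' but was given ID 'None'",
    "WARN tokenizers::tokenizer::serialization: Warning: Token '<|begin▁of▁sentence|>' was expected to have ID '32013' but was given ID 'None'",
    "WARN tokenizers::tokenizer::serialization: Warning: Token '<|EOT|>' was expected to have ID '32021' but was given ID 'None'",
]

missing = parse_token_warnings(sample_log)
for token_id in sorted(missing):
    print(token_id, repr(missing[token_id]))
```

Run against the full log, this would list every token in the 32000 to 32021 range that the tokenizer failed to assign an ID to, which makes it easier to compare against the model's tokenizer files.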
