Comments (13)
@OlivierDehaene I hope you're ok with me pinging. I've been unable to find documentation on the best_of sampling strategy and why it would replace beam_search. Do you have links that explain this? Thank you for your help, and thanks to you and your team for TGI, it really is great!
from text-generation-inference.
Have you tried best_of?
Yes, and it returns only one candidate. I'd like to apply filters and a ranker (another model) on top of the generated candidates. So what I'm asking is: is there a way to get several candidates at once?
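For reference, TGI's /generate endpoint does expose a best_of parameter, and with details enabled the response can carry the extra sampled sequences. Here is a minimal sketch of building such a request and collecting all candidates; the field names (details.best_of_sequences, generated_text) follow my reading of the TGI OpenAPI schema and may differ between versions, so treat this as an assumption to verify:

```python
def build_payload(prompt, n_candidates):
    """Request several sampled candidates in a single /generate call."""
    return {
        "inputs": prompt,
        "parameters": {
            "best_of": n_candidates,  # sample n_candidates sequences server-side
            "do_sample": True,        # best_of requires sampling, not greedy decoding
            "max_new_tokens": 20,
            "details": True,          # ask for per-sequence details in the response
        },
    }

def extract_candidates(response):
    """Collect the top sequence plus any extra best_of sequences, if present."""
    texts = [response.get("generated_text", "")]
    details = response.get("details") or {}
    for seq in details.get("best_of_sequences", []):
        texts.append(seq.get("generated_text", ""))
    return texts
```

You would POST build_payload(...) to /generate with any HTTP client and feed the decoded JSON to extract_candidates. Note that best_of still returns only one generated_text at the top level; the remaining candidates, if the server exposes them, live under details.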
Hello, why does TGI not support beam_search? Is there a design reason behind this?
Can you delete the huggingface hub cache and re-try? I think this is because you have a corrupted weight file.
Could you also share your hf_transfer version?
pip freeze | grep hf_transfer
Thanks. Deleting the cache and re-trying fixed the above failure. However, it's strange that it failed the first time, because it downloaded the model on the fly and the cache was empty beforehand.
I have hf_transfer==0.1.2 installed.
I am running into a connection refused error when trying to run inference.
Steps:
The commands below succeed:
BUILD_EXTENSIONS=False make install
make run-bloom-560m
The command below returns an error: curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection refused
curl 127.0.0.1:8080/generate \
-X POST \
-d '{"inputs":"Testing API","parameters":{"max_new_tokens":9}}' \
-H 'Content-Type: application/json'
I also tried to run
make server-dev
make router-dev
The curl request fails after this as well.
The webserver is running on port 3000 by default.
Thanks, works now.
Also, I was trying to use the generate_stream API.
Example request:
curl 127.0.0.1:3000/generate_stream -X POST -d '{"inputs":"Amazon is","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
I only see the full generated text in the last result. With the stream API, isn't it supposed to return token by token?
Output:
data:{"token":{"id":267,"text":" a","logprob":-2.046875,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":10087,"text":" great","logprob":-2.234375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":4676,"text":" way","logprob":-1.9453125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":427,"text":" to","logprob":-0.16210938,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":2213,"text":" get","logprob":-2.84375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":267,"text":" a","logprob":-2.984375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":2084,"text":" new","logprob":-3.953125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":39222,"text":" iPhone","logprob":-2.515625,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":15,"text":",","logprob":-1.5078125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":530,"text":" and","logprob":-2.359375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":267,"text":" a","logprob":-3.046875,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":2084,"text":" new","logprob":-1.828125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":99607,"text":" iPad","logprob":-0.4609375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":15,"text":",","logprob":-1.3828125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":530,"text":" and","logprob":-1.65625,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":267,"text":" a","logprob":-1.140625,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":2084,"text":" new","logprob":-0.11328125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":10677,"text":" Mac","logprob":-1.6484375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":15,"text":",","logprob":-1.7734375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":530,"text":" and","logprob":-0.35351562,"special":false},"generated_text":" a great way to get a new iPhone, and a new iPad, and a new Mac, and","details":null}
Sorry, my bad. The text field has token-by-token information.
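To spell that out: the stream is server-sent events, each data: line carries one token's text, and only the final event fills in generated_text. A minimal sketch of reassembling the output client-side, assuming the event payloads look exactly like the ones above:

```python
import json

def accumulate_stream(lines):
    """Rebuild the generated text from `data:{...}` SSE lines by
    concatenating the per-token `text` fields. Returns the concatenated
    token texts and the server's final `generated_text` (None until the
    last event arrives)."""
    pieces = []
    final = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        event = json.loads(line[len("data:"):])
        pieces.append(event["token"]["text"])
        if event.get("generated_text") is not None:
            final = event["generated_text"]
    return "".join(pieces), final
```

In a real client you would feed this the lines of the streaming HTTP response as they arrive, printing each token's text for incremental display.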
@OlivierDehaene curious if streaming text generation is supported with beam search?
Looking at the code (https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/utils/tokens.py), it seems like only sampling and greedy are supported. Are there plans to add beam search?
No, text-generation-inference does not support beam search and might never support it.
No, text-generation-inference does not support beam search and might never support it.
Hey @OlivierDehaene, could you please tell us the reason behind this? For many (not that large) LMs this would be VERY useful. As of now, the only way to get candidates from LMs is to send multiple requests with different seeds, which loses caching and algorithmic optimizations.
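The multi-request workaround described above can at least be parallelized client-side: one sampled request per seed, fanned out concurrently. A sketch under the assumption that seed and do_sample are valid TGI generation parameters (the send callable is hypothetical; plug in your HTTP client of choice):

```python
from concurrent.futures import ThreadPoolExecutor

def payload_for_seed(prompt, seed):
    """One sampled /generate request per seed."""
    return {
        "inputs": prompt,
        "parameters": {"do_sample": True, "seed": seed, "max_new_tokens": 20},
    }

def fan_out(prompt, seeds, send):
    """Send one request per seed in parallel. `send` is any callable that
    posts a payload to /generate and returns the generated text."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        return list(pool.map(lambda s: send(payload_for_seed(prompt, s)), seeds))
```

This recovers multiple candidates for downstream filtering and ranking, though, as the comment notes, each request pays the full decoding cost rather than sharing work the way beam search would.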
Have you tried best_of?