Comments (13)
@OlivierDehaene I hope you're ok with me pinging. I've been unable to find documentation on the best_of sampling strategy and why it would replace beam_search. Do you have links that explain this? Thank you for your help, and thanks to you and your team for TGI, it really is great!
from text-generation-inference.
Have you tried best_of?
Yes, and it returns only one candidate. I'd like to apply filters and a ranker (another model) on top of the generated candidates. So what I'm asking is: is there a way to get several candidates at once?
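For reference, TGI's /generate endpoint does expose a best_of parameter, and with details enabled the response can carry the extra sampled sequences. Here is a minimal sketch of building such a request and collecting all candidates; the field names (details.best_of_sequences, generated_text) follow my reading of the TGI OpenAPI schema and may differ between versions, so treat this as an assumption to verify:

```python
def build_payload(prompt, n_candidates):
    """Request several sampled candidates in a single /generate call."""
    return {
        "inputs": prompt,
        "parameters": {
            "best_of": n_candidates,  # sample n_candidates sequences server-side
            "do_sample": True,        # best_of requires sampling, not greedy decoding
            "max_new_tokens": 20,
            "details": True,          # ask for per-sequence details in the response
        },
    }

def extract_candidates(response):
    """Collect the top sequence plus any extra best_of sequences, if present."""
    texts = [response.get("generated_text", "")]
    details = response.get("details") or {}
    for seq in details.get("best_of_sequences", []):
        texts.append(seq.get("generated_text", ""))
    return texts
```

You would POST build_payload(...) to /generate with any HTTP client and feed the decoded JSON to extract_candidates. Note that best_of still returns only one generated_text at the top level; the remaining candidates, if the server exposes them, live under details.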
Hello, why does TGI not support beam_search? Is there a design reason behind this?
Can you delete the huggingface hub cache and re-try? I think this is because you have a corrupted weight file.
Could you also share your hf_transfer version?
pip freeze | grep hf_transfer
Thanks. Deleting the cache and re-trying fixed the above failure. However, it's strange that it failed the first time, because it downloaded the model on the fly and the cache was empty beforehand.
I have hf_transfer==0.1.2 installed.
I am running into a connection refused error when trying to run inference.
Steps:
The commands below succeed:
BUILD_EXTENSIONS=False make install
make run-bloom-560m
The command below returns an error: curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection refused
curl 127.0.0.1:8080/generate \
-X POST \
-d '{"inputs":"Testing API","parameters":{"max_new_tokens":9}}' \
-H 'Content-Type: application/json'
I also tried to run
make server-dev
make router-dev
The curl request fails after this as well.
The webserver is running on port 3000 by default.
Thanks, works now.
Also, I was trying to use the generate_stream API.
Example request:
curl 127.0.0.1:3000/generate_stream -X POST -d '{"inputs":"Amazon is","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
I only see the full generated text in the last result. With the stream API, isn't it supposed to return token by token?
Output:
data:{"token":{"id":267,"text":" a","logprob":-2.046875,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":10087,"text":" great","logprob":-2.234375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":4676,"text":" way","logprob":-1.9453125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":427,"text":" to","logprob":-0.16210938,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":2213,"text":" get","logprob":-2.84375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":267,"text":" a","logprob":-2.984375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":2084,"text":" new","logprob":-3.953125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":39222,"text":" iPhone","logprob":-2.515625,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":15,"text":",","logprob":-1.5078125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":530,"text":" and","logprob":-2.359375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":267,"text":" a","logprob":-3.046875,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":2084,"text":" new","logprob":-1.828125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":99607,"text":" iPad","logprob":-0.4609375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":15,"text":",","logprob":-1.3828125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":530,"text":" and","logprob":-1.65625,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":267,"text":" a","logprob":-1.140625,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":2084,"text":" new","logprob":-0.11328125,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":10677,"text":" Mac","logprob":-1.6484375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":15,"text":",","logprob":-1.7734375,"special":false},"generated_text":null,"details":null}
data:{"token":{"id":530,"text":" and","logprob":-0.35351562,"special":false},"generated_text":" a great way to get a new iPhone, and a new iPad, and a new Mac, and","details":null}
Sorry, my bad. The text field has token-by-token information.
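To spell that out: the stream is server-sent events, each data: line carries one token's text, and only the final event fills in generated_text. A minimal sketch of reassembling the output client-side, assuming the event payloads look exactly like the ones above:

```python
import json

def accumulate_stream(lines):
    """Rebuild the generated text from `data:{...}` SSE lines by
    concatenating the per-token `text` fields. Returns the concatenated
    token texts and the server's final `generated_text` (None until the
    last event arrives)."""
    pieces = []
    final = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        event = json.loads(line[len("data:"):])
        pieces.append(event["token"]["text"])
        if event.get("generated_text") is not None:
            final = event["generated_text"]
    return "".join(pieces), final
```

In a real client you would feed this the lines of the streaming HTTP response as they arrive, printing each token's text for incremental display.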
@OlivierDehaene curious if streaming text generation is supported with beam search?
Looking at the code (https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/utils/tokens.py), it seems like only sampling and greedy are supported. Are there plans to add beam search?
No, text-generation-inference does not support beam search and might never support it.
No, text-generation-inference does not support beam search and might never support it.
Hey @OlivierDehaene, could you please tell us the reason behind this? For many (not that large) LMs this would be VERY useful. As of now, the only way to get candidates from LMs is to send multiple requests with different seeds, which loses caching and algorithmic optimizations.
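The multi-request workaround described above can at least be parallelized client-side: one sampled request per seed, fanned out concurrently. A sketch under the assumption that seed and do_sample are valid TGI generation parameters (the send callable is hypothetical; plug in your HTTP client of choice):

```python
from concurrent.futures import ThreadPoolExecutor

def payload_for_seed(prompt, seed):
    """One sampled /generate request per seed."""
    return {
        "inputs": prompt,
        "parameters": {"do_sample": True, "seed": seed, "max_new_tokens": 20},
    }

def fan_out(prompt, seeds, send):
    """Send one request per seed in parallel. `send` is any callable that
    posts a payload to /generate and returns the generated text."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        return list(pool.map(lambda s: send(payload_for_seed(prompt, s)), seeds))
```

This recovers multiple candidates for downstream filtering and ranking, though, as the comment notes, each request pays the full decoding cost rather than sharing work the way beam search would.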
Have you tried best_of?