Understanding why TorchInductor cannot speed-up huggingface transformer inference (gpt-fast issue, closed)

pytorch-labs commented on June 2, 2024

Understanding why TorchInductor cannot speed-up huggingface transformer inference

from gpt-fast.

Comments (4)

kxzxvbk commented on June 2, 2024 3

I think HF llama does not have a static KV cache, since its cache grows dynamically during generation. Here is the relevant code: https://github.com/huggingface/transformers/blob/38611086d293ea4a5809bcd7fadd8081d55cb74e/src/transformers/models/llama/modeling_llama.py#L1014C37-L1014C37
However, I have the same question: why does compile hardly accelerate the HF model? Is it because the model's input size differs at each generation step, which causes frequent recompilation?


ArthurZucker commented on June 2, 2024 2

Yes! Static KV cache is not supported but coming soon!
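
For readers following along, here is a rough sketch of the shape difference discussed above. It is not the HF or gpt-fast implementation: it only models tensor shapes as tuples, and every name in it (`dynamic_cache_shapes`, `static_cache_shapes`, the head count and head size) is made up for illustration. The assumption is that a growing cache presents a new sequence length at every decoding step, while a preallocated cache keeps one fixed shape and only changes the write position.

```python
# Illustrative model only: tracks tensor *shapes*, not real tensors.
# All function names and default sizes here are hypothetical.

def dynamic_cache_shapes(prompt_len, new_tokens, heads=8, head_dim=64):
    """Shapes of a KV cache that is extended by one position per decoding step."""
    shapes = []
    seq_len = prompt_len
    for _ in range(new_tokens):
        seq_len += 1  # appending the new key/value grows the sequence dimension
        shapes.append((1, heads, seq_len, head_dim))
    return shapes


def static_cache_shapes(max_seq_len, new_tokens, heads=8, head_dim=64):
    """Shapes of a preallocated cache that is written in place at each step."""
    # The buffer is allocated once; each step only changes which slot is written.
    return [(1, heads, max_seq_len, head_dim) for _ in range(new_tokens)]


dynamic = dynamic_cache_shapes(prompt_len=16, new_tokens=32)
static = static_cache_shapes(max_seq_len=64, new_tokens=32)

print(len(set(dynamic)))  # distinct shapes seen with a growing cache
print(len(set(static)))   # distinct shapes seen with a preallocated cache
```

The count of distinct shapes is at most an upper bound on shape-driven recompiles. Depending on the PyTorch version and settings, `torch.compile` may mark a changing dimension as dynamic after the first mismatch instead of recompiling at every step, so the real cost can show up as less specialized kernels rather than repeated compilation.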


learning-chip commented on June 2, 2024 1

This should solve the problem😄
huggingface/transformers#28075
huggingface/transformers#27931


yafehlis commented on June 2, 2024

@learning-chip @ArthurZucker
Hi both, I am comparing HF with gpt-fast as well and cannot get the same pass@1 score. With greedy decoding, I cannot get exactly the same predictions from the two APIs. I have submitted an issue (#94). Could you provide some pointers? I am stuck. Thanks, Yao Fehlis
