Improve `from_quantized` loading time (AutoGPTQ, closed)

philschmid commented on May 19, 2024

Hello, thank you for this awesome project! I ran through the Getting …

Comments (7)

PanQiWei commented on May 19, 2024

The warmup that runs every time the model is loaded helps Triton find the configuration that makes inference as fast as possible. But since it is essentially a grid search whose hyper-parameters are set manually (hard coded), it may have no effect on some cards. So I've now added a `warmup_triton` argument to the `from_quantized` API: set it to `False` to skip the warmup stage and load the model faster.
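The warmup described above is essentially a grid search over hand-picked kernel configurations. A minimal sketch of that idea, with a switch to skip it, might look like the following. It does not use Triton or AutoGPTQ: the `(block_size, num_warps)` candidates and the `autotune` and `load` helpers are hypothetical stand-ins, assuming a kernel can be built from a config and timed as a plain Python callable.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical kernel configuration: (block_size, num_warps).
Config = Tuple[int, int]
Kernel = Callable[[], object]

# Hand-picked candidates, mirroring the "hard coded" grid described above.
CANDIDATES: Sequence[Config] = [(32, 2), (64, 4), (128, 4), (128, 8)]
DEFAULT: Config = CANDIDATES[0]


def time_kernel(kernel: Kernel, repeats: int = 3) -> float:
    """Best wall-clock time of `kernel` over `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        best = min(best, time.perf_counter() - start)
    return best


def autotune(
    make_kernel: Callable[[Config], Kernel],
    candidates: Sequence[Config] = CANDIDATES,
    measure: Callable[[Kernel], float] = time_kernel,
) -> Config:
    """Grid search: measure every candidate and keep the fastest one."""
    timings: Dict[Config, float] = {
        cfg: measure(make_kernel(cfg)) for cfg in candidates
    }
    return min(timings, key=timings.__getitem__)


def load(make_kernel: Callable[[Config], Kernel], warmup: bool = True) -> Config:
    """Pick a kernel config at load time; warmup=False skips the search."""
    if not warmup:
        return DEFAULT  # fast load, but no tuning for this card
    return autotune(make_kernel)
```

In this sketch `warmup=False` simply falls back to the first candidate; how AutoGPTQ itself behaves when `warmup_triton` is off is not modeled here.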

I will also look for a way to let users cache the best Triton configuration to a file, so that the warmup only has to run once.
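Caching the tuned configuration, as proposed here, could be sketched as a small JSON file keyed by device, since the best configuration can differ from card to card. This is a hypothetical illustration, not an AutoGPTQ feature: `load_or_tune`, the file layout, and the device key (for example the string returned by `torch.cuda.get_device_name()`) are all assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Config = Tuple[int, int]  # hypothetical (block_size, num_warps)


def load_or_tune(cache_file: Path, device_key: str, tune: Callable[[], Config]) -> Config:
    """Return the cached config for `device_key`, running `tune` only on a miss."""
    cache: Dict[str, List[int]] = {}
    if cache_file.exists():
        cache = json.loads(cache_file.read_text())
    if device_key in cache:
        # JSON stores tuples as lists, so convert back on the way out.
        return (cache[device_key][0], cache[device_key][1])
    best = tune()  # the expensive warmup, run once per device
    cache[device_key] = list(best)
    cache_file.write_text(json.dumps(cache, indent=2))
    return best
```

On the second and later loads on the same card, `tune` is never called, so the warmup cost is paid only once.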

PanQiWei commented on May 19, 2024

Closing this issue, since the loading-time problem of `.from_quantized` has been fixed. Feel free to reopen it or open a new issue if you still run into a similar problem.

As for letting users cache the best Triton configuration to a file so that the warmup runs only once, I will add that to the backlog as future work.

PanQiWei commented on May 19, 2024

Hi, when Triton is used with `from_quantized`, an `auto_tune_warmup` step is executed. Maybe I should make the warmup optional for users.

philschmid commented on May 19, 2024

Is the warmup still needed after you have saved the model once?

larekrow commented on May 19, 2024

I'm interested in the answer to @philschmid's question as well. If so, what are the implications of skipping the warmup?

philschmid commented on May 19, 2024

Thank you, I will give it a try later this week!
