Comments (4)
LoftQ/Llama-2-7b-hf-4bit-64rank is quantized with the bitsandbytes method and has no discrepancy between true and fake quantization. The default alternating step T for Llama-2 is 5.
from loftq.
Thanks for pointing this out. This is because LoftQ/Llama-2-7b-hf-bit4-rank64 used a self-implemented NF4 quantization method, which is not exactly the same as the NF4 quantization in bitsandbytes. To fix it, please try LoftQ/Llama-2-7b-hf-4bit-64rank.
Moreover, our method is not constrained to a specific quantization method. Either the self-implemented one or the one in bitsandbytes can achieve on-par results.
from loftq.
Thank you for this clarification.
I understand your method is not limited to any particular quantization function. However, you still use bitsandbytes as the backend for memory-efficient fine-tuning. If you use a custom quantization (like the self-implemented NF4 quantization), doesn't that introduce a mismatch, because the quantization functions differ between fine-tuning and the custom LoRA initialization?
Say you obtain a perfect LoRA initialization as W = Q + AB, where Q = self_implemented_nf4(W). When you then use bitsandbytes to fine-tune, Q_new = bitsandbytes_nf4(Q), so W is no longer equal to Q_new + AB.
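The mismatch can be demonstrated with a small self-contained sketch (hypothetical code, not from the LoftQ repo): two toy NF4 quantizers with slightly different codebooks stand in for the self-implemented and bitsandbytes implementations, and the rank constraint on AB is ignored for brevity, so AB is simply the full residual W - Q.

```python
# Hypothetical sketch: re-quantizing Q with a *different* NF4 implementation
# changes it, so W = Q + AB no longer holds after the backend swap.
import numpy as np

def make_quantizer(codebook):
    codebook = np.asarray(codebook)
    def quantize(w):
        # Absmax-scale to [-1, 1], snap each value to its nearest codebook level.
        scale = np.abs(w).max()
        idx = np.abs(w[:, None] / scale - codebook[None, :]).argmin(axis=1)
        return codebook[idx] * scale
    return quantize

# Standard NF4 levels (from the QLoRA paper); the "self-implemented" variant
# rounds them to 2 decimals -- a stand-in for any implementation difference.
nf4_levels = [-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
              0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0]
self_nf4 = make_quantizer(np.round(np.array(nf4_levels), 2))  # hypothetical variant
bnb_nf4  = make_quantizer(nf4_levels)

rng = np.random.default_rng(0)
W = rng.standard_normal(256)

Q = self_nf4(W)       # LoftQ-style init: W = Q + AB exactly, by construction
AB = W - Q            # "perfect" residual (rank constraint ignored for brevity)

Q_new = bnb_nf4(Q)    # the backend re-quantizes Q with its own NF4 codebook
mismatch = np.abs(W - (Q_new + AB)).max()
print(f"max |W - (Q_new + AB)| = {mismatch:.4f}")  # nonzero when codebooks differ
```

If the two codebooks were identical, `Q_new` would equal `Q` (the levels are fixed points of the quantizer) and the mismatch would vanish, which is the point of the maintainer's suggestion to use the checkpoint quantized with bitsandbytes itself.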
from loftq.
In addition, may I ask what the default T is for Llama?
from loftq.
Related Issues (20)
- Can we use LoftQ to optimize vision foundation models like OWL-ViT v2 and Grounding Dino? HOT 1
- quantize_save.py script fails saving lora adapter with peft>=0.7.2 HOT 3
- Does it support Mixtral 8x7B? HOT 1
- loftQ can not use multi gpu to train HOT 9
- Is there any way for using LoftQ to GPTQ or AWQ model? HOT 2
- bugs for running python test_gsm8k.py when uses LoftQ for llama HOT 2
- A question from a novice. HOT 2
- The issue of not being able to download the LoftQ model from huggingface even when using an VPN HOT 1
- issues for running python test_gsm8k.py when uses LoftQ for llama
- Why are the full models, and not just adapters, pushed to hub? HOT 2
- Failing to converge when using some random seeds HOT 2
- Performance worsens versus QLoRA with TinyLlama
- Why are base weights on HF LoftQ models in 16-bit? HOT 2
- Error with shape HOT 2
- quick question about the Llama-3 results HOT 1
- [BUG]size mismatch for base_model.model.model.embed_tokens.weight
- Method fails on Gemma-7B model HOT 1
- Embedding layer HOT 1
- Cannot reproduce the result of LoftQ on gsm8k with llama2-7b
- About the test result on gsm8k