Comments (4)
LoftQ/Llama-2-7b-hf-4bit-64rank is quantized with the bitsandbytes method and has no discrepancy between true and fake quantization. The default alternating step T for Llama-2 is 5.
from loftq.
Thanks for pointing this out. This is because LoftQ/Llama-2-7b-hf-bit4-rank64 used a self-implemented NF4 quantization method, which is not exactly the same as the NF4 quantization in bitsandbytes. To fix it, please try LoftQ/Llama-2-7b-hf-4bit-64rank.
Moreover, our method is not constrained to a specific quantization method. Either the self-implemented one or the one in bitsandbytes can achieve on-par results.
from loftq.
Thank you for this clarification.
I understand your method is not limited to any particular quantization function. However, you still use bitsandbytes as the backend for memory-efficient fine-tuning. If you use a custom quantization (like the self-implemented NF4 quantization), doesn't that introduce a mismatch, because the quantization functions differ between fine-tuning and the custom LoRA initialization?
Say you obtain a perfect LoRA initialization as W = Q + AB, where Q = self_implemented_nf4(W). When you then use bitsandbytes to fine-tune, Q_new = bitsandbytes_nf4(Q), so W is no longer equal to Q_new + AB.
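The mismatch can be demonstrated with a small self-contained sketch (hypothetical code, not from the LoftQ repo): two toy NF4 quantizers with slightly different codebooks stand in for the self-implemented and bitsandbytes implementations, and the rank constraint on AB is ignored for brevity, so AB is simply the full residual W - Q.

```python
# Hypothetical sketch: re-quantizing Q with a *different* NF4 implementation
# changes it, so W = Q + AB no longer holds after the backend swap.
import numpy as np

def make_quantizer(codebook):
    codebook = np.asarray(codebook)
    def quantize(w):
        # Absmax-scale to [-1, 1], snap each value to its nearest codebook level.
        scale = np.abs(w).max()
        idx = np.abs(w[:, None] / scale - codebook[None, :]).argmin(axis=1)
        return codebook[idx] * scale
    return quantize

# Standard NF4 levels (from the QLoRA paper); the "self-implemented" variant
# rounds them to 2 decimals -- a stand-in for any implementation difference.
nf4_levels = [-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
              0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0]
self_nf4 = make_quantizer(np.round(np.array(nf4_levels), 2))  # hypothetical variant
bnb_nf4  = make_quantizer(nf4_levels)

rng = np.random.default_rng(0)
W = rng.standard_normal(256)

Q = self_nf4(W)       # LoftQ-style init: W = Q + AB exactly, by construction
AB = W - Q            # "perfect" residual (rank constraint ignored for brevity)

Q_new = bnb_nf4(Q)    # the backend re-quantizes Q with its own NF4 codebook
mismatch = np.abs(W - (Q_new + AB)).max()
print(f"max |W - (Q_new + AB)| = {mismatch:.4f}")  # nonzero when codebooks differ
```

If the two codebooks were identical, `Q_new` would equal `Q` (the levels are fixed points of the quantizer) and the mismatch would vanish, which is the point of the maintainer's suggestion to use the checkpoint quantized with bitsandbytes itself.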
from loftq.
In addition, may I ask what the default T is for Llama?
from loftq.
Related Issues (20)
- Can we use LoftQ to optimize vision foundation models like OWL-ViT v2 and Grounding Dino? HOT 1
- quantize_save.py script fails saving lora adapter with peft>=0.7.2 HOT 3
- Does it support Mixtral 8x7B? HOT 1
- loftQ can not use multi gpu to train HOT 9
- Is there any way for using LoftQ to GPTQ or AWQ model? HOT 2
- bugs for running python test_gsm8k.py when uses LoftQ for llama HOT 2
- A question from a novice. HOT 2
- The issue of not being able to download the LoftQ model from huggingface even when using an VPN HOT 1
- issues for running python test_gsm8k.py when uses LoftQ for llama
- Why are the full models, and not just adapters, pushed to hub? HOT 2
- Failing to converge when using some random seeds HOT 2
- Performance worsens versus QLoRA with TinyLlama
- Why are base weights on HF LoftQ models in 16-bit? HOT 2
- Error with shape HOT 2
- quick question about the Llama-3 results HOT 1
- [BUG]size mismatch for base_model.model.model.embed_tokens.weight
- Method fails on Gemma-7B model HOT 1
- Embedding layer HOT 1
- Cannot reproduce the result of LoftQ on gsm8k with llama2-7b
- About the test result on gsm8k