Comments (3)
Thanks for your interest in our work. Setting full_matrices=True or full_matrices=False will not affect the results. For square matrices the two are identical; for non-square matrices, the reduced SVD is the same as the full SVD with the zero singular vectors removed. In our work we actually use truncated SVD, which keeps only the first r singular vectors.
from loftq.
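The distinction above can be checked directly in NumPy. This is a minimal sketch (not from the LoftQ codebase): full and reduced SVD reconstruct the matrix identically, while truncated SVD keeps only the first r singular vectors to form a rank-r approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))  # non-square weight matrix

# Full SVD: U is 6x6 and includes singular vectors for the zero singular values.
U_f, S_f, Vt_f = np.linalg.svd(W, full_matrices=True)
# Reduced SVD: U is 6x4 -- the zero singular vectors are dropped.
U_r, S_r, Vt_r = np.linalg.svd(W, full_matrices=False)

# Both reconstruct W exactly (the extra columns of the full U multiply zeros).
W_full = U_f[:, :4] @ np.diag(S_f) @ Vt_f
W_red = U_r @ np.diag(S_r) @ Vt_r
assert np.allclose(W_full, W) and np.allclose(W_red, W)

# Truncated SVD: keep only the first r singular vectors (rank-r approximation).
r = 2
W_trunc = U_r[:, :r] @ np.diag(S_r[:r]) @ Vt_r[:r, :]
```

By the Eckart–Young theorem, `W_trunc` is the best rank-r approximation of `W` in Frobenius norm, which is why the truncated form is what matters for low-rank adapters.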
Thank you for your kind response. I have some additional questions regarding the LoftQ algorithm.
I am struggling to understand intuitively how repeatedly alternating quantization and SVD approximation leads to a progressively better initialization of the adapter weights.
If we rewrite LoftQ Algorithm 1 with an added error term, it looks as follows:
![Screenshot 2023-11-21, 8:23 PM](https://private-user-images.githubusercontent.com/54992207/284561588-5f068dd5-1192-447d-907f-ca3a5dd6a047.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIwMzUxNzAsIm5iZiI6MTcyMjAzNDg3MCwicGF0aCI6Ii81NDk5MjIwNy8yODQ1NjE1ODgtNWYwNjhkZDUtMTE5Mi00NDdkLTkwN2YtY2EzYTVkZDZhMDQ3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzI2VDIzMDExMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTMyZjFkOWM2OWVjODliNmRkZjEwYzE0OWMyMDE1NWFjYTllZmRjMTFkYTBjNTAxNmI2NmEwYThjZmRiM2M5MmUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.LLQt3oDFwqyJE4e244JOmII-mo-nHEXrZ0nK09QTLho)
As in Equation 3, when we approximate the difference between the pre-trained weight and the quantized weight with a rank-r SVD, a residual error term remains.
I have personally measured how this error term evolves over the iterations. The results show that, in all layers, this SVD error term decreases as the number of iteration steps increases.
In summary, as the steps of the LoftQ algorithm increase, the SVD approximation becomes more accurate, effectively minimizing the main objective stated in Eq. 6 of the paper. However, I am not entirely clear on why this error shrinks through the repetition of the two steps ((1) quantization, (2) SVD). Could you please explain this once more?
I conceptually understand how the initializations of the quantized weight and the adapter weights are jointly optimized, but it is not clear to me why this process minimizes the error.
I would greatly appreciate additional clarification on this, as it would help me deeply understand the core idea of this excellent paper.
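The alternating procedure under discussion can be sketched in a few lines. This is a minimal NumPy sketch, not the LoftQ implementation: it substitutes a plain uniform quantizer for the NF4 quantizer used in the paper, and records the Frobenius norm of the residual ||W - Q - AB|| at each step.

```python
import numpy as np

def quantize(W, bits=4):
    # Plain uniform quantizer -- an assumption standing in for NF4.
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale) * scale

def loftq_init(W, r=8, steps=5):
    """Alternate quantization and rank-r SVD, as in LoftQ Algorithm 1."""
    A = np.zeros((W.shape[0], r))
    B = np.zeros((r, W.shape[1]))
    errors = []
    for _ in range(steps):
        # Step 1: quantize the part of W not covered by the current adapter.
        Q = quantize(W - A @ B)
        # Step 2: rank-r SVD of the quantization residual W - Q.
        U, S, Vt = np.linalg.svd(W - Q, full_matrices=False)
        A = U[:, :r] * S[:r]
        B = Vt[:r, :]
        # Track the remaining error term ||W - Q - AB||_F.
        errors.append(np.linalg.norm(W - Q - A @ B))
    return Q, A, B, errors

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
Q, A, B, errors = loftq_init(W)
```

Each step re-quantizes around the current low-rank correction and then re-fits the low-rank term to the new residual, so each step is a coordinate-descent-style move on the objective; as the maintainer notes below, a monotone decrease of `errors` is not guaranteed.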
Hi @MarsJacobs, it is not guaranteed that the error decreases as the steps increase. The algorithm is a heuristic. For some models, such as certain layers in DeBERTa-v3-base, the error fluctuates as the steps increase.
Related Issues (20)
- Does it support Mixtral 8x7B?
- loftQ can not use multi gpu to train
- Is there any way for using LoftQ to GPTQ or AWQ model?
- bugs for running python test_gsm8k.py when uses LoftQ for llama
- A question from a novice.
- The issue of not being able to download the LoftQ model from huggingface even when using an VPN
- issues for running python test_gsm8k.py when uses LoftQ for llama
- Why are the full models, and not just adapters, pushed to hub?
- Failing to converge when using some random seeds
- Performance worsens versus QLoRA with TinyLlama
- Why are base weights on HF LoftQ models in 16-bit?
- Error with shape
- quick question about the Llama-3 results
- [BUG]size mismatch for base_model.model.model.embed_tokens.weight
- Method fails on Gemma-7B model
- Embedding layer
- Cannot reproduce the result of LoftQ on gsm8k with llama2-7b
- About the test result on gsm8k
- Number of iterations seems always set to 1 based on latest code
- The issue of rank not being able to change