Comments (4)
Hi @daegonYu,
This is a hyperparameter for tuning. Empirically, we observe that a lower temperature leads to better performance but may cause training instability under float16 precision for large models. A lower temperature allows the logits to vary over a wider range and thus gives more flexibility.
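For concreteness, here is a minimal sketch of how the temperature enters an InfoNCE loss with in-batch negatives. This is illustrative PyTorch code, not the actual E5 training code; the function name, batch setup, and use of in-batch negatives are assumptions:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb, doc_emb, temperature=0.01):
    """Illustrative InfoNCE with in-batch negatives.

    query_emb, doc_emb: (batch, dim) L2-normalized embeddings,
    where doc_emb[i] is the positive for query_emb[i].
    """
    # Cosine similarity matrix; values lie in [-1, 1].
    sim = query_emb @ doc_emb.T
    # Dividing by the temperature rescales the logits:
    # t = 0.01 stretches [-1, 1] to [-100, 100].
    logits = sim / temperature
    # Positive pairs sit on the diagonal.
    labels = torch.arange(query_emb.size(0), device=query_emb.device)
    return F.cross_entropy(logits, labels)

# Toy usage with random embeddings:
q = F.normalize(torch.randn(8, 768), dim=-1)
d = F.normalize(torch.randn(8, 768), dim=-1)
print(info_nce_loss(q, d, temperature=0.01))
```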
from unilm.
“A lower temperature allows the logits to vary in a wider range and thus has more flexibility.” This can be read as saying that the embeddings find it easier to learn more diverse representations. But the FAQ at https://huggingface.co/intfloat/multilingual-e5-base says:
3. Why does the cosine similarity scores distribute around 0.7 to 1.0?
This is a known and expected behavior as we use a low temperature 0.01 for InfoNCE contrastive loss.
For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores instead of the absolute values, so this should not be an issue.
If the logits can vary over a wider range, I would expect the cosine similarities to be distributed over a wide range as well, yet they are concentrated between 0.7 and 1.0. These two statements seem contradictory, which makes this hard to understand. Simply put, I wonder why lowering the temperature allows learning a wider range of logits.
from unilm.
The logits are calculated as cosine_similarity / t. Since cosine similarity lies in [-1, 1], the logits fall in [-100, 100] with t = 0.01, in [-50, 50] with t = 0.02, and so on.
However, this does not mean the learned cosine similarities will span a wider range. On the contrary, the cosine similarities tend to concentrate as the temperature becomes lower.
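A small numerical sketch makes both effects visible (the cosine values below are made up for illustration): a low temperature stretches the logit range, and at the same time it lets a modest cosine margin already saturate the softmax, so training has little pressure to spread the similarities further apart:

```python
import torch

# Hypothetical cosine similarities: one positive, two negatives.
cos = torch.tensor([0.9, 0.7, 0.5])

for t in (1.0, 0.02, 0.01):
    logits = cos / t  # t = 0.01 -> [90, 70, 50]
    probs = torch.softmax(logits, dim=0)
    print(f"t={t}: logits={logits.tolist()} probs={probs.tolist()}")

# t=1.0  -> probs ~ [0.40, 0.33, 0.27]: the positive barely wins,
#           so the loss keeps pushing similarities apart.
# t=0.01 -> probs ~ [1.0, 2e-9, 4e-18]: a 0.2 cosine margin is
#           already decisive, so the learned similarities can stay
#           concentrated (e.g. around 0.7 to 1.0).
```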
from unilm.