Comments (6)
Hi @KyleZheng1997
- I used temperature = 0.1
- I experimented with different parameters, and the same schedule used in this repo was optimal (0.996 to 1 with a cosine schedule)
- 65k, though it gave only a minor improvement over, for example, 4k
- the learning rate is the same as in this repo. For weight decay, keeping it fixed at 0.05 worked best in my experiments.
- no clipping
- 3 layers, with hidden dimension 2048 and output dimension 256
- I use synchronized BN in the projection head
- batch size is 1024 just like in DINO experiments
- same as in this repo
- symmetric version
Hope that helps :)
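For reference, the momentum schedule described above (0.996 increasing to 1 with a cosine schedule, matching this repo) can be sketched in plain Python. The function name and the step-based granularity are my own illustration, not something stated in the thread:

```python
import math

def ema_momentum(step, total_steps, base_m=0.996, final_m=1.0):
    """Cosine schedule that increases the EMA momentum from base_m at
    step 0 to final_m at the final step, as used for the key-encoder
    update in the MoCo-v2 runs discussed here (illustrative sketch)."""
    progress = step / total_steps
    return final_m - (final_m - base_m) * (math.cos(math.pi * progress) + 1) / 2

# The key encoder would then be updated as:
#   k_param = m * k_param + (1 - m) * q_param
```

At step 0 this returns 0.996 and it rises monotonically to 1.0 at the last step.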
from dino.
In case that helps here are the checkpoints at 300 epochs used in our paper:
BYOL:
https://dl.fbaipublicfiles.com/dino/byol_vitsmall16_300ep_pretrain.pth
SwAV:
https://dl.fbaipublicfiles.com/dino/swav_vitsmall16_300ep_pretrain.pth
MoCo-v2:
https://dl.fbaipublicfiles.com/dino/moco_vitsmall16_300ep_pretrain.pth
DINO:
https://dl.fbaipublicfiles.com/dino/dino_vitsmall16_300ep_pretrain.pth
Would you like to share some implementation details of mocov2 (transformer version)?
- temperature
- momentum coefficient (and did you cosine-increase m to 1?)
- number of negatives in the memory buffer
- learning rate and weight decay (and their corresponding schedules)
- global clip-norm value
- number of layers in the MLP head (and their dimensions)
- Did you use BN in the MLP head? If so, shuffle BN or sync BN?
- the batch size
- the details of the image augmentations (SimCLR / MoCo-v2 / BYOL style?)
- Is the reported MoCo-v2 result the original version or the symmetric version?
Sorry for so many questions; I would really appreciate it if you could answer them.
1- AdamW (exactly like DINO training)
2- GeLU (exactly like DINO training)
3- No activation after the final layer. The features are l2-normalized (as in the MoCo paper)
4- 72.7% top-1 with linear eval after 800 epochs (71.6% after 300 epochs); please see Tables 2 and 13 of our DINO paper. I have not reported top-5.
Hope that helps!
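To make the l2-normalized head output and the temperature of 0.1 (mentioned earlier in the thread) concrete, here is a minimal NumPy sketch of the resulting InfoNCE logits. The function names, shapes, and the cross-entropy note are my own illustrative assumptions, not details confirmed by the authors:

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Normalize each row to unit length (as in the MoCo paper).
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def moco_logits(q, k_pos, queue, temperature=0.1):
    """InfoNCE logits: one positive key per query plus K negatives
    drawn from the memory buffer. All features are l2-normalized,
    so each logit is a cosine similarity divided by the temperature."""
    q, k_pos, queue = l2_normalize(q), l2_normalize(k_pos), l2_normalize(queue)
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)  # (B, 1)
    l_neg = q @ queue.T                               # (B, K)
    return np.concatenate([l_pos, l_neg], axis=1) / temperature
```

Cross-entropy against target index 0 (the positive) then gives the contrastive loss; with identical query and key, the positive logit is exactly 1 / temperature.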
Hi @Trent-tangtao
Thanks for your kind words. To keep this codebase simple, we are not planning to release our re-implementations of BYOL, MoCo and SwAV. The repo is meant to be a simple implementation of DINO and related transfer tasks only, not a generic SSL library.
In addition, there are already official releases of these works, so I am not sure of the added value of releasing the re-implementations. I encourage you to take a look at https://github.com/facebookresearch/vissl, https://github.com/facebookresearch/swav or https://github.com/facebookresearch/moco.
I apologize for any inconvenience, and feel free to reach out if you have any questions.
Hi, thanks for your excellent work! I have the following questions regarding reproducing MoCoV2 results:
- Do you use AdamW for MoCo-v2 training? If so, what are the optimizer's parameters?
- What activation function do you use in the MLP head?
- Do you apply an activation after the final layer of the MLP head?
- What were the top-1 and top-5 accuracies at the end of MoCo-v2 training for ViT-S?