Comments (6)
Hi @KyleZheng1997
- I used temperature = 0.1
- I experimented with different parameters, and the same schedule used in this repo was optimal (0.996 to 1 with a cosine schedule)
- 65k, though it gave only a minor improvement over, for example, 4k
- the learning rate is the same as in this repo. For weight decay, keeping it fixed at 0.05 worked best in my experiments.
- no clipping
- 3 layers, with hidden dimension 2048 and output dimension 256
- I use synchronized BN in the projection head
- batch size is 1024 just like in DINO experiments
- same as in this repo
- symmetric version
Hope that helps :)
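For reference, the momentum schedule described above (0.996 increasing to 1 with a cosine schedule, matching this repo) can be sketched in plain Python. The function name and the step-based granularity are my own illustration, not something stated in the thread:

```python
import math

def ema_momentum(step, total_steps, base_m=0.996, final_m=1.0):
    """Cosine schedule that increases the EMA momentum from base_m at
    step 0 to final_m at the final step, as used for the key-encoder
    update in the MoCo-v2 runs discussed here (illustrative sketch)."""
    progress = step / total_steps
    return final_m - (final_m - base_m) * (math.cos(math.pi * progress) + 1) / 2

# The key encoder would then be updated as:
#   k_param = m * k_param + (1 - m) * q_param
```

At step 0 this returns 0.996 and it rises monotonically to 1.0 at the last step.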
from dino.
In case that helps here are the checkpoints at 300 epochs used in our paper:
BYOL:
https://dl.fbaipublicfiles.com/dino/byol_vitsmall16_300ep_pretrain.pth
SwAV:
https://dl.fbaipublicfiles.com/dino/swav_vitsmall16_300ep_pretrain.pth
MoCo-v2:
https://dl.fbaipublicfiles.com/dino/moco_vitsmall16_300ep_pretrain.pth
DINO:
https://dl.fbaipublicfiles.com/dino/dino_vitsmall16_300ep_pretrain.pth
Would you like to share some implementation details of mocov2 (transformer version)?
- temperature
- momentum coefficient (and did you cosine-increase m to 1?)
- number of negatives in the memory buffer
- learning rate and weight decay (and their corresponding schedules)
- global clip-norm value
- number of layers in the MLP head (and their dimensions)
- Did you use BN in the MLP head? If so, shuffle BN or sync BN?
- the batch size
- the details of the image augmentations (SimCLR / MoCo-v2 / BYOL style?)
- Is the reported MoCo-v2 result the original version or the symmetric version?
Sorry for so many questions; I would really appreciate it if you could answer them.
1- AdamW (exactly like DINO training)
2- GeLU (exactly like DINO training)
3- No activation after the final layer. The features are l2-normalized (as in the MoCo paper)
4- 72.7% top-1 with linear eval after 800 epochs (71.6% after 300 epochs); please see Tables 2 and 13 of our DINO paper. I have not reported top-5.
Hope that helps!
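To make the l2-normalized head output and the temperature of 0.1 (mentioned earlier in the thread) concrete, here is a minimal NumPy sketch of the resulting InfoNCE logits. The function names, shapes, and the cross-entropy note are my own illustrative assumptions, not details confirmed by the authors:

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Normalize each row to unit length (as in the MoCo paper).
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def moco_logits(q, k_pos, queue, temperature=0.1):
    """InfoNCE logits: one positive key per query plus K negatives
    drawn from the memory buffer. All features are l2-normalized,
    so each logit is a cosine similarity divided by the temperature."""
    q, k_pos, queue = l2_normalize(q), l2_normalize(k_pos), l2_normalize(queue)
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)  # (B, 1)
    l_neg = q @ queue.T                               # (B, K)
    return np.concatenate([l_pos, l_neg], axis=1) / temperature
```

Cross-entropy against target index 0 (the positive) then gives the contrastive loss; with identical query and key, the positive logit is exactly 1 / temperature.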
Hi @Trent-tangtao
Thanks for your kind words. To keep this codebase simple, we are not planning to release our re-implementations of BYOL, MoCo and SwAV. The repo is meant to be a simple implementation of DINO and related transfer tasks only, not a generic SSL library.
In addition, there are already official releases of these works, so I am not sure of the added value of releasing the re-implementations. I encourage you to take a look at https://github.com/facebookresearch/vissl, https://github.com/facebookresearch/swav or https://github.com/facebookresearch/moco.
I apologize for any inconvenience, and feel free to reach out if you have any questions.
Hi, thanks for your excellent work! I have the following questions regarding reproducing MoCoV2 results:
- Do you use AdamW for MoCo-v2 training? If so, what are the optimizer's parameters?
- What activation function do you use in the MLP head?
- Do you apply an activation after the final layer of the MLP head?
- What were the top-1 and top-5 accuracies at the end of MoCo-v2 training for ViT-S?