shiramir / dino-vit-features
Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".
Home Page: https://dino-vit-features.github.io
License: MIT License
Hi,
When running cosegmentation.py and part_cosegmentation.py, turning low_res_saliency_maps off leads to an indexing error. It appears that saliency_map is batched with shape 1xN, so something like:

if not low_res_saliency_maps:
    saliency_map = saliency_map[0]

is a sufficient fix.
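To make the shape mismatch concrete, here is a minimal NumPy reproduction (the array sizes are illustrative, taken from the traceback below):

```python
import numpy as np

# saliency_map comes back batched as shape (1, N) when
# low_res_saliency_maps is off; the boolean mask has shape (N,).
saliency_map = np.random.rand(1, 1705)      # batched: (1, N)
image_labels = np.zeros((1705, 1), dtype=int)
mask = image_labels[:, 0] == 0              # shape (N,)

# Indexing the batched array with the (N,)-mask raises IndexError.
try:
    saliency_map[mask]
except IndexError as e:
    print("fails:", e)

# Dropping the batch dimension first makes the shapes line up.
saliency_map = saliency_map[0]              # shape (N,)
print(saliency_map[mask].mean())
```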
Traceback (most recent call last):
File "cosegmentation.py", line 523, in <module>
seg_masks, pil_images = find_cosegmentation(
File "cosegmentation.py", line 257, in find_cosegmentation
label_saliency = saliency_map[image_labels[:, 0] == label].mean()
IndexError: boolean index did not match indexed array along dimension 0; dimension is 1 but corresponding boolean dimension is 1705
I found your method sensitive to the choice of parameters (thresh, elbow coefficient, etc.). Instead of tuning them manually and assessing the results qualitatively, is there a way to run a grid search and assess them quantitatively? For example, could I search on the training set, evaluate on the validation set, and use the landmark-regression results to select the best parameters? If so, could you upload your evaluation scripts so that I can do it this way? Thank you.
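In the meantime, the grid search itself is easy to sketch. Everything below is hypothetical: evaluate_score stands in for whatever evaluation you run on the training split, and the parameter values are made up:

```python
from itertools import product

# Hypothetical evaluation hook: the repo does not ship one, so this
# stands in for "run cosegmentation with these parameters and return
# a score (e.g. landmark-regression accuracy)". Toy surrogate here.
def evaluate_score(thresh, elbow):
    return -(thresh - 0.065) ** 2 - (elbow - 0.975) ** 2

grid = {
    "thresh": [0.055, 0.065, 0.075],   # made-up candidate values
    "elbow": [0.95, 0.975, 1.0],
}

# Exhaustively evaluate every parameter combination, keep the best.
best = max(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=lambda params: evaluate_score(**params),
)
print(best)  # {'thresh': 0.065, 'elbow': 0.975}
```

The selected parameters would then be frozen and reported on the validation split only.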
I performed self-supervised training of DINO on a pathology image dataset and analysed PCA as well as DINO-vit-feature.
At the same time, I also analysed the PCA in the model with DINO self-supervised learning in imagenet, and found that the colour images of the last 3dim of the PCA were more colourful in the model trained on pathology images, whereas the colour images in the model with DINO self-supervised learning in imagenet were were closer to the three primary colours of RGB and less colourful. What do you think these differences due to PCA represent?
I am currently trying to replicate the results using the code from @kampta under #8. I am getting PCK values which are around 6-10% lower than those reported in the paper, under the same parameters, using his notebook.
I would appreciate it if you could guide me on this or share the code for how you went about calculating the PCK.
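For reference, the standard PCK computation is short enough to sketch; whether the paper normalises by bounding-box size or image size is an assumption worth verifying:

```python
import numpy as np

def pck(pred_kps, gt_kps, bbox_size, alpha=0.1):
    """Percentage of Correct Keypoints: a prediction counts as correct
    if it lies within alpha * bbox_size of the ground truth. This is
    the common definition; the exact normaliser used in the paper
    (bbox vs. image size) is an assumption to check."""
    dists = np.linalg.norm(pred_kps - gt_kps, axis=1)
    return float((dists <= alpha * bbox_size).mean())

pred = np.array([[10.0, 10.0], [50.0, 52.0], [90.0, 120.0]])
gt = np.array([[12.0, 11.0], [50.0, 50.0], [60.0, 60.0]])
print(pck(pred, gt, bbox_size=100.0, alpha=0.1))  # 2 of 3 within 10 px
```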
Hi,
Is it possible to share the evaluation splits and code for fair comparison?
Thanks
Hello,
Is it possible to release the evaluation code for CUB that reproduces the results presented in the paper?
With the currently available implementation, I'm unfortunately not able to reproduce them; I get much worse results for NMI and ARI.
best regards
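When numbers disagree, it can help to rule out metric implementation differences first. Below is a self-contained ARI (Hubert & Arabie definition) to sanity-check against whatever library is in the evaluation pipeline; NMI can be cross-checked the same way:

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from pairwise contingency counts
    (standard Hubert & Arabie formulation)."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))   # contingency cells
    a = Counter(labels_true)                         # row sums
    b = Counter(labels_pred)                         # column sums
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (perfect)
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5 (anti-correlated)
```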
I have GPUs with 11GB of memory, and I get an out-of-memory error when I load more than three images
(when computing the ViT attention, attn = (q @ k.transpose(-2, -1)) * self.scale).
I could increase the stride or decrease the load size, but that would also degrade performance.
Since the code only processes a single image at a time, I would like to ask: can I run the program across multiple GPUs?
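Because each image is processed independently, one workaround (not part of the repo) is to shard the image list across GPUs and run one worker process per device. The sketch below shows only the round-robin sharding; the device names and the per-shard worker are hypothetical:

```python
def shard_round_robin(items, devices):
    """Assign items to devices round-robin; each shard can then be
    handled by its own worker process (e.g. one process per GPU)."""
    shards = {d: [] for d in devices}
    for i, item in enumerate(items):
        shards[devices[i % len(devices)]].append(item)
    return shards

images = [f"img_{i}.png" for i in range(7)]  # hypothetical file names
devices = ["cuda:0", "cuda:1"]               # hypothetical two-GPU machine
shards = shard_round_robin(images, devices)
print(shards["cuda:0"])  # ['img_0.png', 'img_2.png', 'img_4.png', 'img_6.png']
```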
In the examples, if you change the model_type to anything other than dino_vits8, the code crashes because of an assert in ViTExtractor.extract_saliency_maps. What needs to change to properly support other model types?
I am unable to replicate the results in Table 4 (Correspondence Evaluation on SPair-71k). Since in the case of PCK evaluation keypoints are provided for the source image, I find the closest point (according to the binned descriptor) in the second image within the "salient region". The numbers I get are close to zero, so there might be a mistake in my code. Are there any additional heuristics that you apply for this one-way correspondence?
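For what it's worth, the one-way lookup described above can be written as a plain nearest-neighbour search restricted to salient patches. This is only a sketch of the questioner's setup (cosine similarity on L2-normalised descriptors), not the authors' evaluation code:

```python
import numpy as np

def nearest_in_salient(src_desc, tgt_descs, salient_mask):
    """Return the index of the most similar target descriptor among
    salient patches only (one-way match, no extra heuristics such as
    mutual nearest neighbours)."""
    tgt = tgt_descs / np.linalg.norm(tgt_descs, axis=1, keepdims=True)
    src = src_desc / np.linalg.norm(src_desc)
    sims = tgt @ src                     # cosine similarities
    sims[~salient_mask] = -np.inf        # exclude non-salient patches
    return int(np.argmax(sims))

tgt_descs = np.eye(6, 8)                 # 6 toy patch descriptors, dim 8
src_desc = tgt_descs[3].copy()           # identical to target patch 3
mask = np.array([True, True, False, True, True, False])
idx = nearest_in_salient(src_desc, tgt_descs, mask)
print(idx)  # 3
```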
I wonder if using the previous KMeans instance at this point is intentional. I mean the part_algorithm was trained on normalized_all_fg_sampled_descriptors as opposed to common_part_algorithm, which is trained on normalized_all_common_sampled_descriptors.
dino-vit-features/part_cosegmentation.py
Line 273 in 4b023ec
Hi there, I tried the code and at the beginning everything seemed fine, but then I tried to use the extractor on my own images: more specifically, extracting features from high-resolution pictures and visualizing the PCA. I ran the code on 100 images of 800 x 800 with load_size=224 and stride=2 and it seemed fine, so does the code process the images separately?
How should I estimate the GPU memory required, and could the extractor be modified to run on multiple GPUs?
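A back-of-envelope estimate of the attention-map memory can guide these choices. The formula below is an assumption-laden sketch: a square token grid of side ((load_size - patch) // stride + 1) plus a CLS token, six heads, float32, and only one layer's q @ k^T map counted; actual usage will be higher:

```python
def attn_memory_gib(load_size, patch=8, stride=2, heads=6, bytes_per=4):
    """Rough size in GiB of one layer's attention map (q @ k^T).
    Ignores q, k, v themselves, other layers, and framework overhead,
    so treat this as a lower bound."""
    side = (load_size - patch) // stride + 1   # token grid side length
    n = side * side + 1                        # plus the CLS token
    return heads * n * n * bytes_per / 1024**3

print(f"{attn_memory_gib(224):.2f} GiB")  # ≈ 3.16 GiB at load_size=224, stride=2
```

At these settings a single image already consumes several GiB per layer's attention map, which is consistent with 800 x 800 inputs being infeasible without resizing.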
Is there information available for how to use DINO V2 for point correspondence?
Hi!
Thank you for this great repo.
I had the following issue while trying to run the different notebooks:
TypeError: interpolate_pos_encoding() missing 1 required positional argument: 'h'
Thanks!
dino-vit-features/extractor.py
Line 299 in 1779649
I am a bit confused. According to my understanding it should be x.permute(0, 2, 1, 3). The same goes for line 239.
Am I missing something?
PS:
dino-vit-features/extractor.py
Line 239 in 1779649
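Which permutation is right depends on the layout the downstream code expects; a quick shape check with illustrative (batch, tokens, heads, head_dim) dimensions (not taken from extractor.py) makes the two readings concrete:

```python
import numpy as np

# Illustrative layout: (batch, tokens, heads, head_dim)
x = np.zeros((1, 197, 6, 64))

# permute(0, 2, 1, 3) moves heads before tokens:
print(x.transpose(0, 2, 1, 3).shape)  # (1, 6, 197, 64)

# whereas permute(2, 0, 1, 3) puts heads first:
print(x.transpose(2, 0, 1, 3).shape)  # (6, 1, 197, 64)
```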
Hi, thanks for your great work! How much GPU memory is needed for inference?
dino-vit-features/extractor.py
Line 313 in 79c1289
Is there any reason why you chose these heads?
Hi, thanks for this great repo! Could you by chance point me in the direction of the "Supervised ViT" described in Figure 3?
Hi, why are the saliency maps not available for a patch size of 16?
Hi,
Thanks for your amazing work. The study is very interesting. You are using DINO as a feature extractor in your work, and I was wondering whether you tried MAE or a different method, and whether you got the same or similar results?
Thanks for your time,