shiramir / dino-vit-features
Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".
Home Page: https://dino-vit-features.github.io
License: MIT License
Hi,
When running cosegmentation.py and part_cosegmentation.py, turning low_res_saliency_maps off leads to an indexing error. It appears that saliency_map is batched with shape 1xN, so something like:

if not low_res_saliency_maps:
    saliency_map = saliency_map[0]

is a sufficient fix.
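To make the shape mismatch concrete, here is a minimal NumPy reproduction (the array sizes are illustrative, taken from the traceback below):

```python
import numpy as np

# saliency_map comes back batched as shape (1, N) when
# low_res_saliency_maps is off; the boolean mask has shape (N,).
saliency_map = np.random.rand(1, 1705)      # batched: (1, N)
image_labels = np.zeros((1705, 1), dtype=int)
mask = image_labels[:, 0] == 0              # shape (N,)

# Indexing the batched array with the (N,)-mask raises IndexError.
try:
    saliency_map[mask]
except IndexError as e:
    print("fails:", e)

# Dropping the batch dimension first makes the shapes line up.
saliency_map = saliency_map[0]              # shape (N,)
print(saliency_map[mask].mean())
```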
Traceback (most recent call last):
File "cosegmentation.py", line 523, in <module>
seg_masks, pil_images = find_cosegmentation(
File "cosegmentation.py", line 257, in find_cosegmentation
label_saliency = saliency_map[image_labels[:, 0] == label].mean()
IndexError: boolean index did not match indexed array along dimension 0; dimension is 1 but corresponding boolean dimension is 1705
I found your method sensitive to the choice of parameters (thresh, elbow coefficient, etc.). Instead of tuning them manually and assessing the results qualitatively, is there a way to run a grid search and assess them quantitatively? For example, could I search on the training set, evaluate on the validation set, and use the landmark-regression results to select the best parameters? If so, could you upload your evaluation scripts so that I can do it this way? Thank you.
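In the meantime, the grid search itself is easy to sketch. Everything below is hypothetical: evaluate_score stands in for whatever evaluation you run on the training split, and the parameter values are made up:

```python
from itertools import product

# Hypothetical evaluation hook: the repo does not ship one, so this
# stands in for "run cosegmentation with these parameters and return
# a score (e.g. landmark-regression accuracy)". Toy surrogate here.
def evaluate_score(thresh, elbow):
    return -(thresh - 0.065) ** 2 - (elbow - 0.975) ** 2

grid = {
    "thresh": [0.055, 0.065, 0.075],   # made-up candidate values
    "elbow": [0.95, 0.975, 1.0],
}

# Exhaustively evaluate every parameter combination, keep the best.
best = max(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=lambda params: evaluate_score(**params),
)
print(best)  # {'thresh': 0.065, 'elbow': 0.975}
```

The selected parameters would then be frozen and reported on the validation split only.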
I performed self-supervised training of DINO on a pathology image dataset and analysed PCA as well as DINO-vit-feature.
At the same time, I also analysed the PCA in the model with DINO self-supervised learning in imagenet, and found that the colour images of the last 3dim of the PCA were more colourful in the model trained on pathology images, whereas the colour images in the model with DINO self-supervised learning in imagenet were were closer to the three primary colours of RGB and less colourful. What do you think these differences due to PCA represent?
I am currently trying to replicate the results using the code from @kampta under #8. I am getting PCK values which are around 6-10% lower than those reported in the paper, under the same parameters, using his notebook.
I would appreciate it if you could guide me on this or share the code for how you went about calculating the PCK.
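For reference, the standard PCK computation is short enough to sketch; whether the paper normalises by bounding-box size or image size is an assumption worth verifying:

```python
import numpy as np

def pck(pred_kps, gt_kps, bbox_size, alpha=0.1):
    """Percentage of Correct Keypoints: a prediction counts as correct
    if it lies within alpha * bbox_size of the ground truth. This is
    the common definition; the exact normaliser used in the paper
    (bbox vs. image size) is an assumption to check."""
    dists = np.linalg.norm(pred_kps - gt_kps, axis=1)
    return float((dists <= alpha * bbox_size).mean())

pred = np.array([[10.0, 10.0], [50.0, 52.0], [90.0, 120.0]])
gt = np.array([[12.0, 11.0], [50.0, 50.0], [60.0, 60.0]])
print(pck(pred, gt, bbox_size=100.0, alpha=0.1))  # 2 of 3 within 10 px
```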
Hi,
Is it possible to share the evaluation splits and code for fair comparison?
Thanks
Hello,
Is it possible to release the evaluation code for CUB that reproduces the results presented in the paper?
With the currently available implementation, I'm unfortunately not able to reproduce them; I get much worse results for NMI and ARI.
best regards
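When numbers disagree, it can help to rule out metric implementation differences first. Below is a self-contained ARI (Hubert & Arabie definition) to sanity-check against whatever library is in the evaluation pipeline; NMI can be cross-checked the same way:

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from pairwise contingency counts
    (standard Hubert & Arabie formulation)."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))   # contingency cells
    a = Counter(labels_true)                         # row sums
    b = Counter(labels_pred)                         # column sums
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (perfect)
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5 (anti-correlated)
```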
I have GPUs with 11GB of memory, and I get an out-of-memory error when I load more than three images
(when computing the ViT attention, attn = (q @ k.transpose(-2, -1)) * self.scale).
I could increase the stride or decrease the load size, but that would also degrade performance.
Since the code only processes a single image at a time, I would like to ask: can I run the program across multiple GPUs?
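Because each image is processed independently, one workaround (not part of the repo) is to shard the image list across GPUs and run one worker process per device. The sketch below shows only the round-robin sharding; the device names and the per-shard worker are hypothetical:

```python
def shard_round_robin(items, devices):
    """Assign items to devices round-robin; each shard can then be
    handled by its own worker process (e.g. one process per GPU)."""
    shards = {d: [] for d in devices}
    for i, item in enumerate(items):
        shards[devices[i % len(devices)]].append(item)
    return shards

images = [f"img_{i}.png" for i in range(7)]  # hypothetical file names
devices = ["cuda:0", "cuda:1"]               # hypothetical two-GPU machine
shards = shard_round_robin(images, devices)
print(shards["cuda:0"])  # ['img_0.png', 'img_2.png', 'img_4.png', 'img_6.png']
```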
In the examples, if you change the model_type to anything other than dino_vits8, the code crashes because of an assert in ViTExtractor.extract_saliency_maps. What needs to change to properly support other model types?
I am unable to replicate the results in Table 4 (Correspondence Evaluation on SPair-71k). Since in the case of PCK evaluation keypoints are provided for the source image, I find the closest point (according to the binned descriptor) in the second image within the "salient region". The numbers I get are close to zero, so there might be a mistake in my code. Are there any additional heuristics that you apply for this one-way correspondence?
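For what it's worth, the one-way lookup described above can be written as a plain nearest-neighbour search restricted to salient patches. This is only a sketch of the questioner's setup (cosine similarity on L2-normalised descriptors), not the authors' evaluation code:

```python
import numpy as np

def nearest_in_salient(src_desc, tgt_descs, salient_mask):
    """Return the index of the most similar target descriptor among
    salient patches only (one-way match, no extra heuristics such as
    mutual nearest neighbours)."""
    tgt = tgt_descs / np.linalg.norm(tgt_descs, axis=1, keepdims=True)
    src = src_desc / np.linalg.norm(src_desc)
    sims = tgt @ src                     # cosine similarities
    sims[~salient_mask] = -np.inf        # exclude non-salient patches
    return int(np.argmax(sims))

tgt_descs = np.eye(6, 8)                 # 6 toy patch descriptors, dim 8
src_desc = tgt_descs[3].copy()           # identical to target patch 3
mask = np.array([True, True, False, True, True, False])
idx = nearest_in_salient(src_desc, tgt_descs, mask)
print(idx)  # 3
```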
I wonder if using the previous KMeans instance at this point is intentional. I mean the part_algorithm was trained on normalized_all_fg_sampled_descriptors as opposed to common_part_algorithm, which is trained on normalized_all_common_sampled_descriptors.
dino-vit-features/part_cosegmentation.py
Line 273 in 4b023ec
Hi there, I tried the code and at the beginning everything seemed fine, but then I tried to use the extractor on my own images: more specifically, extracting features from high-resolution pictures and visualizing the PCA. I ran the code on 100 images of 800 x 800 with load_size=224 and stride=2 and it seemed fine, so does the code process the images separately?
How should I estimate the GPU memory required, and could the extractor be modified to run on multiple GPUs?
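A back-of-envelope estimate of the attention-map memory can guide these choices. The formula below is an assumption-laden sketch: a square token grid of side ((load_size - patch) // stride + 1) plus a CLS token, six heads, float32, and only one layer's q @ k^T map counted; actual usage will be higher:

```python
def attn_memory_gib(load_size, patch=8, stride=2, heads=6, bytes_per=4):
    """Rough size in GiB of one layer's attention map (q @ k^T).
    Ignores q, k, v themselves, other layers, and framework overhead,
    so treat this as a lower bound."""
    side = (load_size - patch) // stride + 1   # token grid side length
    n = side * side + 1                        # plus the CLS token
    return heads * n * n * bytes_per / 1024**3

print(f"{attn_memory_gib(224):.2f} GiB")  # ≈ 3.16 GiB at load_size=224, stride=2
```

At these settings a single image already consumes several GiB per layer's attention map, which is consistent with 800 x 800 inputs being infeasible without resizing.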
Is there information available for how to use DINO V2 for point correspondence?
Hi!
Thank you for this great repo.
I had the following issue while trying to run the different notebooks:
TypeError: interpolate_pos_encoding() missing 1 required positional argument: 'h'
Thanks!
dino-vit-features/extractor.py
Line 299 in 1779649
I am a bit confused. According to my understanding it should be x.permute(0, 2, 1, 3). The same goes for line 239.
Am I missing something?
PS:
dino-vit-features/extractor.py
Line 239 in 1779649
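Which permutation is right depends on the layout the downstream code expects; a quick shape check with illustrative (batch, tokens, heads, head_dim) dimensions (not taken from extractor.py) makes the two readings concrete:

```python
import numpy as np

# Illustrative layout: (batch, tokens, heads, head_dim)
x = np.zeros((1, 197, 6, 64))

# permute(0, 2, 1, 3) moves heads before tokens:
print(x.transpose(0, 2, 1, 3).shape)  # (1, 6, 197, 64)

# whereas permute(2, 0, 1, 3) puts heads first:
print(x.transpose(2, 0, 1, 3).shape)  # (6, 1, 197, 64)
```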
Hi, thanks for your great work! How much GPU memory is needed for inference?
dino-vit-features/extractor.py
Line 313 in 79c1289
Is there any reason why you chose these heads?
Hi, thanks for this great repo! Could you by chance point me in the direction of the "Supervised ViT" described in Figure 3?
Hi, why are the saliency maps not available for a patch size of 16?
Hi,
Thanks for your amazing work. The study is very interesting. You are using DINO as a feature extractor in your work, and I was wondering whether you tried MAE or a different method, and whether you got the same or similar results?
Thanks for your time,