cvmi-lab / slotcon
(NeurIPS 2022) Self-Supervised Visual Representation Learning with Semantic Grouping
Home Page: https://wen-xin.info/slotcon/
License: Apache License 2.0
Hi Xin,
Thanks for the great and insightful work.
When I read the code, I am confused by the label generation for contrastive learning of slots.
As shown in https://github.com/CVMI-Lab/SlotCon/blob/main/models/slotcon.py#L186, slots with the same index are treated as positive pairs. However, these slots are generated by masked pooling over the features, so their indices may not be related to semantic classes. Maybe I have missed something.
Looking forward to your reply!
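For readers following this question, here is a toy sketch of the setup being described. It is a NumPy stand-in, not the repository's code, and it assumes that the slots of both views are pooled using soft assignments to one shared set of prototypes. The names `pool_slots`, `view1`, `view2`, and the temperature value are hypothetical. Under that assumption, slot k in either view is the feature pooled under prototype k, so a matching index ties two slots to the same prototype rather than to a fixed semantic class.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_slots(features, prototypes, temperature=0.1):
    """Masked (attention-weighted) pooling of pixel features into K slots.

    features:   (N, D) pixel features of one view
    prototypes: (K, D) prototypes shared by both views (assumption)
    returns:    (K, D) slots, where slot k is pooled under prototype k
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    assign = softmax(f @ p.T / temperature, axis=1)            # (N, K) soft masks
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-6)
    return weights.T @ features                                # (K, D)

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(4, 8))              # K=4 prototypes, D=8
view1 = rng.normal(size=(16, 8))                  # 16 "pixels" of view 1
view2 = view1 + 0.05 * rng.normal(size=(16, 8))   # lightly perturbed second view

slots1 = pool_slots(view1, prototypes)
slots2 = pool_slots(view2, prototypes)

# Slot-to-slot similarities; the positive for row k is column k,
# because both were pooled under the same prototype k.
logits = slots1 @ slots2.T
labels = np.arange(len(prototypes))
```

Whether this matches the intent of the linked line is for the author to confirm; the sketch only shows why an index can be meaningful without being a class label.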
Respected Sir,
While executing the code, I get the following error. It would be very helpful if you could help me resolve it.
Traceback (most recent call last):
  File "/transfer/detection/convert_pretrain_to_d2.py", line 19, in <module>
    input = sys.argv[1]
IndexError: list index out of range
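The traceback indicates that the script read `sys.argv[1]` while no command-line argument was supplied, so the list held only the script name. Here is a minimal sketch of a guard that fails with a usage message instead. The helper name `get_input_path` and the usage text are hypothetical, and any further arguments the real script expects are not shown in the traceback.

```python
import sys

def get_input_path(argv):
    """Return the first command-line argument, or exit with a usage hint.

    argv is expected to look like sys.argv: [script_name, arg1, ...].
    """
    if len(argv) < 2:
        # Hypothetical usage text; the real script may need more arguments.
        raise SystemExit("usage: python convert_pretrain_to_d2.py INPUT_CHECKPOINT ...")
    return argv[1]

# Example: with an argument present, the path is returned unchanged.
example_path = get_input_path(["convert_pretrain_to_d2.py", "checkpoint.pth"])
```

In the script itself this would be called as `get_input_path(sys.argv)`; passing the checkpoint path on the command line should avoid the `IndexError`.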
Hi @xwen99, great work on unsupervised local feature learning.
I am wondering how to evaluate SlotCon on unsupervised semantic segmentation benchmarks (i.e., Table 5 in the paper).
Do you have any plans to release the implementation?
Hi, Xin Wen!
I am trying to use SlotCon to process 3D data, but I have run into some problems. For example, the operation torchvision.ops.roi_align does not provide a 3D version. Do you have any suggestions for that?
Hi Xin Wen,
Thanks for your great work! Regarding SlotCon, I have two questions:
(1) I notice the prototypes are initialized with nn.Embedding. I am wondering how the trainable prototypes are guaranteed to be optimized into meaningful semantic groups via backpropagation. Since the loss functions do not explicitly enforce this, I am a little confused about how the prototypes are optimized.
(2) Have you tested how much the resize operation in the data augmentation matters? I mean, if you only crop, along with the other augmentations, and skip the resize operation, will the performance drop heavily?
Thanks for your reply!
Hello,
Thank you for your very interesting work! I'm currently trying to replicate your results with the provided codebase, and I was wondering whether you also tested a Vision Transformer architecture as the encoder. You compared against DINO in the paper, but I wanted to know whether you were able to obtain properties close to theirs (a kind of saliency map, with the attention map concentrated around the object of interest).
Thank you again for your response!