kainingying / ctvis
ICCV'2023 | CTVIS: Consistent Training for Online Video Instance Segmentation
License: MIT License
Hi @KainingYing ,
I am attempting to test a model that was trained using 8 GPUs on a single GPU. I followed the instructions for the registration process as you explained, and I appreciate that. Could you please provide guidance on how to perform model training with a single GPU?
Additionally, how can we modify the parameters that affect memory during the training of the model?
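One common way to move a multi-GPU recipe to a single GPU is the linear scaling rule: shrink the total batch size in proportion to the GPU count, scale the learning rate by the same factor, and lengthen the schedule accordingly. The sketch below only illustrates that arithmetic. `SOLVER.IMS_PER_BATCH`, `SOLVER.BASE_LR` and `SOLVER.MAX_ITER` are Detectron2's standard solver keys, but the starting values used here are placeholders, not the actual CTVIS defaults, and the authors have not confirmed that this is how they would do it.

```python
def scale_for_gpus(ims_per_batch, base_lr, max_iter, old_gpus, new_gpus):
    """Apply the linear scaling rule when changing the number of GPUs.

    The per-GPU batch size stays constant, so per-GPU memory use is unchanged.
    """
    new_batch = max(1, round(ims_per_batch * new_gpus / old_gpus))
    factor = new_batch / ims_per_batch  # actual batch-size ratio after rounding
    return {
        "SOLVER.IMS_PER_BATCH": new_batch,
        "SOLVER.BASE_LR": base_lr * factor,         # smaller batch -> smaller LR
        "SOLVER.MAX_ITER": round(max_iter / factor),  # same number of samples seen
    }


# Placeholder values (assumptions): 16 images per batch on 8 GPUs.
overrides = scale_for_gpus(ims_per_batch=16, base_lr=1e-4, max_iter=16000,
                           old_gpus=8, new_gpus=1)
for key, value in overrides.items():
    print(key, value)
```

Lowering `SOLVER.IMS_PER_BATCH` further reduces memory per GPU. If `train_ctvis.py` follows Detectron2's usual argument parser (an assumption), such overrides can be appended as `KEY VALUE` pairs after `--num-gpus 1 --config-file ...`.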
Hi, author!
I'm curious about the training scheme with the contrastive loss in your code.
In your paper, you said,
However, when I checked your source code, I found no code corresponding to that statement.
Rather, the code just samples the t-1 object queries, as shown in the figure below.
Is there anything I missed?
We have released the R50_YTVIS19 weights; you can download them here. You can evaluate this checkpoint on your own machine and should get the expected score of 55.1 AP.
However, some users (#3 (comment)) reported that inference does not match the performance (~55.1 AP) stated in the paper or the repo. We believe this is caused by a mismatch in the PyTorch version or the GPU model.
In this issue, we evaluate this checkpoint on different combinations of PyTorch (1.x, 2.x) and NVIDIA GPU (RTX 3060, 3090, 4090, A6000). We use Python 3.10 as the main environment.
| | RTX 3060 | RTX 3090 | RTX 4090 | A6000 |
|---|---|---|---|---|
| PyTorch 1.12.1 | 54.42576062 | - | 55.13484004 | |
| PyTorch 2.0.0 | 55.21045475 | 54.27014723 | 55.27668969 | 55.13366189 |
We find that both the GPU model and the PyTorch version can affect the AP. Surprisingly, under PyTorch 2.0.0 the RTX 3090 is about 1 point lower than the other GPUs.
It's normal for VIS to fluctuate during training, but it's very strange that it fluctuates so much during testing. We would be very grateful if someone could advise what is causing this.
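To make the "about 1 point" gap concrete, the short sketch below recomputes it from the PyTorch 2.0.0 row of the table above. Only that row is used because it is the only one with a value for every GPU; the helper name `gap_to_others` is made up for this illustration.

```python
from statistics import mean

# AP values copied from the PyTorch 2.0.0 row of the table.
ap_torch_2_0 = {
    "RTX 3060": 55.21045475,
    "RTX 3090": 54.27014723,
    "RTX 4090": 55.27668969,
    "A6000": 55.13366189,
}


def gap_to_others(scores, gpu):
    """Return mean AP of all other GPUs minus the AP of `gpu`."""
    others = [ap for name, ap in scores.items() if name != gpu]
    return mean(others) - scores[gpu]


for gpu in ap_torch_2_0:
    print(f"{gpu}: {gap_to_others(ap_torch_2_0, gpu):+.3f} AP below the mean of the others")
```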
How do we register our own dataset with CTVIS? The dataset registration from the Detectron2 Colab tutorial is not working.
Page URL: https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html
Hello Author,
After installing and building my environment and preparing the data, I am not able to run train_ctvis.py.
Traceback (most recent call last):
  File "/beegfs/work/ymarquardt/CTVIS/train_ctvis.py", line 43, in <module>
    from mask2former import add_maskformer2_config
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/__init__.py", line 3, in <module>
    from . import modeling
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/__init__.py", line 4, in <module>
    from .pixel_decoder.msdeformattn import MSDeformAttnPixelDecoder
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/msdeformattn.py", line 19, in <module>
    from .ops.modules import MSDeformAttn
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/modules/__init__.py", line 12, in <module>
    from .ms_deform_attn import MSDeformAttn
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/modules/ms_deform_attn.py", line 24, in <module>
    from ..functions import MSDeformAttnFunction
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/functions/__init__.py", line 12, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/functions/ms_deform_attn_func.py", line 22, in <module>
    import MultiScaleDeformableAttention as MSDA
ImportError: /home/ymarquardt/anaconda3/envs/CTVIS2/lib/python3.10/site-packages/MultiScaleDeformableAttention-1.0-py3.10-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl
srun: error: gpu06: task 0: Exited with exit code 1
I already ran python -m detectron2.utils.collect_env to check for inconsistent CUDA versions and got the following output:
`
sys.platform linux
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
numpy 1.26.2
detectron2 0.6 @/home/ymarquardt/detectron2/detectron2
Compiler GCC 11.2
CUDA compiler CUDA 11.6
detectron2 arch flags 8.0
DETECTRON2_ENV_MODULE
PyTorch 1.13.1+cu116 @/home/ymarquardt/anaconda3/envs/CTVIS/lib/python3.10/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA A100-PCIE-40GB (arch=8.0)
Driver version 535.104.05
CUDA_HOME /cluster/cuda/11.6
Pillow 8.2.0
torchvision 0.14.1+cu116 @/home/ymarquardt/anaconda3/envs/CTVIS/lib/python3.10/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.8.1
PyTorch built with:
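The undefined symbol in the traceback above can be read without any tooling, because C++ (Itanium ABI) mangled names store each namespace and class component as a length-prefixed string. The sketch below is a deliberately minimal splitter for simple nested names, not a full demangler, and the function name `nested_name_parts` is made up for this example. It shows that the missing symbol lives in the `c10` namespace, which belongs to PyTorch's core library. An undefined `c10` symbol usually suggests that the compiled extension was built against a different PyTorch build than the one imported at runtime; note that the traceback loads the extension from the `CTVIS2` environment while `collect_env` reports PyTorch from the `CTVIS` environment.

```python
import re


def nested_name_parts(mangled: str) -> list[str]:
    """Split the nested-name part of a simple Itanium-mangled symbol.

    Handles only plain `_ZN...E` names made of length-prefixed identifiers.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    # Skip member-function qualifiers (r = restrict, V = volatile, K = const).
    while pos < len(mangled) and mangled[pos] in "rVK":
        pos += 1
    parts = []
    while pos < len(mangled) and mangled[pos] != "E":
        match = re.match(r"\d+", mangled[pos:])
        if match is None:
            raise ValueError("unsupported construct in mangled name")
        length = int(match.group())
        start = pos + len(match.group())
        parts.append(mangled[start:start + length])
        pos = start + length
    return parts


symbol = "_ZNK3c107SymBool10guard_boolEPKcl"
print("::".join(nested_name_parts(symbol)))
```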
Hi,
could you share or explain how you implemented Multi-Reference IDOL, as well as IDOL with Mask2Former?
thanks!
When I try to use the provided checkpoint (CTVIS_R50_OVIS.pth), shape-mismatch errors arise... (In OVIS, the number of classes should be 26...)
Using that checkpoint with the configuration (CTVIS_R50.yaml), the CodaLab evaluation score is 0.09 AP.
After changing the number of classes from 25 to 40 in the configuration file, the score is 34.7 AP.
Could you re-upload a corrected checkpoint?
Hi authors,
Your project is pretty good! I have a question about how you perform contrastive learning on VPS: do you also apply it to stuff queries?
Thanks!
Hi,
Thank you for sharing your code. I have a question about your method for updating the memory bank with noise, and I would appreciate some help. Specifically, I note that you randomly replace an instance in the memory bank with another instance (i.e. noise). Is this mechanism designed only to help the model recover when a wrong ID assignment happens in a specific frame (i.e. the model can re-identify the correct instance in the next few frames), or can it also prevent wrong ID assignments from happening? If it is the latter, could you please explain in more detail why it prevents wrong ID assignments? I have thought about it but could not work it out clearly. Thank you for your help!
In browse_datasets.py, it seems that there isn't mask2former_video.data_video.datasets.ytvis. Instead, we can use ctvis.data.vis.ytvis. It's better to move it to the root directory before using it.
ctvis/modeling/cl_plugin/ct_cl_plugin.py, like this: random.sample(list(set(range(self.num_negatives + 1)) - set([anchor_query_id.item()])), self.num_negatives)) # noqa
CTVIS/mask2former/modeling/matcher.py", line 111, in memory_efficient_forward
cost_class = -out_prob[:, tgt_ids]
IndexError: tensors used as indices must be long, int, byte or bool tensors
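The `random.sample` expression quoted above for ct_cl_plugin.py picks negative indices from `range(num_negatives + 1)` while excluding the anchor's own index. The standalone sketch below reproduces just that sampling logic with plain integers so its behaviour is easy to check. The function name and the surrounding context are assumptions, since the rest of the plugin code is not shown here.

```python
import random


def sample_negative_ids(anchor_id: int, num_negatives: int) -> list[int]:
    """Sample `num_negatives` distinct indices, never returning `anchor_id`."""
    # Candidate pool: 0..num_negatives inclusive, minus the anchor itself.
    candidates = list(set(range(num_negatives + 1)) - {anchor_id})
    # random.sample draws without replacement, so the result has no duplicates.
    return random.sample(candidates, num_negatives)


negatives = sample_negative_ids(anchor_id=2, num_negatives=5)
print(negatives)
```

When the anchor index falls inside the range, the pool holds exactly `num_negatives` candidates, so every other index is returned and only the order is random.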
Thank you for sharing the code,
Could you also share the data used for partial training examples (1%, 5% etc) for ytvis21?
The training script would also be appreciated!
thanks!
Could you help?
Hi! Your work is excellent!
I found a bug when running the following command: "python train_ctvis.py --num-gpus 4 --config-file configs/ytvis_2021/CTVIS_R50.yaml"
CTVIS/mask2former/modeling/matcher.py", line 111, in memory_efficient_forward
cost_class = -out_prob[:, tgt_ids]
IndexError: tensors used as indices must be long, int, byte or bool tensors
Is that because of the environment? Thanks!
I want to add a new backbone to ctvis. How can I make the model train using the backbone I added?
Hi, could I ask how I can get the visualization of the results as shown in your paper?
Thanks
Hello,
I've been exploring CTVIS (Consistent Training for Online Video Instance Segmentation) and I'm interested in its real-time inference capabilities. I've noticed that the provided demo script, demo.py, supports video input, and I'd like to understand if CTVIS can be effectively used in real-time applications.
Real-Time Performance: Can CTVIS be used for real-time video instance segmentation? I'm curious about its performance and whether it can achieve low-latency results on live video streams.
Optimal Configurations: Are there specific configurations or settings that need to be adjusted to enhance real-time performance? If there are best practices or tips for real-time deployment, I'd appreciate guidance on that.
Hardware Considerations: Are there any hardware requirements or recommendations for achieving real-time performance with CTVIS, such as GPU specifications or other hardware considerations?
Implementation Guidance: If CTVIS can be used in real-time scenarios, could you provide some implementation guidance or code examples to demonstrate how to set up and run CTVIS for real-time video instance segmentation?
I'm eager to learn more about the potential of CTVIS in real-time applications, and any insights or guidance you can provide would be greatly appreciated.
Thank you for your time and assistance.
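Whether a model counts as real-time on a given machine can be checked empirically by timing per-frame processing. The sketch below is a generic measurement harness, not CTVIS code: `process_frame` is a hypothetical stand-in for whatever callable runs the model on one frame, and the warm-up count is an arbitrary choice.

```python
import time
from statistics import mean


def measure_fps(process_frame, frames, warmup=2):
    """Time `process_frame` on each frame and report average latency and FPS.

    The first `warmup` frames are excluded because initial calls are often slower.
    For GPU models, the callable should wait for the GPU to finish before
    returning, otherwise the measured time only covers kernel launches.
    """
    latencies = []
    for index, frame in enumerate(frames):
        start = time.perf_counter()
        process_frame(frame)
        elapsed = time.perf_counter() - start
        if index >= warmup:
            latencies.append(elapsed)
    if not latencies:
        raise ValueError("need more frames than warm-up iterations")
    avg = mean(latencies)
    return {"avg_latency_s": avg, "fps": 1.0 / avg}


# Stand-in workload (assumption): pretend each frame takes about 5 ms.
stats = measure_fps(lambda frame: time.sleep(0.005), range(10))
print(f"{stats['avg_latency_s'] * 1000:.1f} ms per frame, {stats['fps']:.1f} FPS")
```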
Hi! author!
Do you have any plans to share the model weights for the other benchmarks?