
ctvis's People

Contributors

aziily, kainingying


ctvis's Issues

Trying to test a model trained with 8 GPUs on a single GPU

Hi @KainingYing ,

I am attempting to test a model that was trained using 8 GPUs on a single GPU. I followed the instructions for the registration process as you explained, and I appreciate that. Could you please provide guidance on how to perform model training with a single GPU?

Additionally, how can we modify the parameters that affect GPU memory usage during training?
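If train_ctvis.py follows detectron2's default launcher (which Mask2Former-based projects usually do, though this is an assumption), single-GPU evaluation of a multi-GPU-trained checkpoint would look roughly like the following; the checkpoint path is a placeholder:

    python train_ctvis.py --num-gpus 1 --eval-only \
        --config-file configs/ytvis_2021/CTVIS_R50.yaml \
        MODEL.WEIGHTS path/to/checkpoint.pth

The --num-gpus, --eval-only and MODEL.WEIGHTS pieces are detectron2 conventions; batch-size and memory-related overrides are project-specific and are best confirmed by the authors.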

inconsistent contrastive training scheme with paper

Hi, author!
I'm curious about the training scheme with the contrastive loss in your code.
In your paper, you said,

"We use the MA embeddings of other instances in the memory bank as the major negative embeddings".

However, when I checked your source code, I could not find any code implementing the aforementioned statement.
Rather, the code just samples the t-1 object queries, as shown in the screenshots below.

[screenshots of the relevant sampling code attached to the original issue]

Is there anything I missed?
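For context, here is a minimal sketch of the scheme described in the paper, i.e. using the momentum-averaged (MA) embeddings of other instances in the memory bank as negatives. This is an illustration written for this issue, not the repository's implementation, and all names are placeholders:

    import torch
    import torch.nn.functional as F

    def contrastive_loss_with_memory(anchor, positive, memory_bank, anchor_id, tau=0.07):
        # anchor, positive: (D,) embeddings of the same instance at different frames
        # memory_bank: dict {instance_id: (D,) MA embedding}; assumed to hold >= 2 instances
        # Negatives come from the MA embeddings of *other* instances in the bank,
        # rather than from the t-1 object queries.
        negatives = torch.stack([v for k, v in memory_bank.items() if k != anchor_id])  # (N, D)
        anchor = F.normalize(anchor, dim=0)
        pos_sim = torch.dot(anchor, F.normalize(positive, dim=0)) / tau   # scalar
        neg_sim = F.normalize(negatives, dim=1) @ anchor / tau            # (N,)
        logits = torch.cat([pos_sim.unsqueeze(0), neg_sim]).unsqueeze(0)  # (1, 1 + N)
        target = torch.zeros(1, dtype=torch.long)                         # positive sits at index 0
        return F.cross_entropy(logits, target)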

[Inference Reproduce] The influence of GPU and Pytorch

We release the weights of R50_YTVIS19; you can download them here. You can evaluate this checkpoint on your own machine and expect a score of about 55.1 AP.

However, some users (#3 (comment)) reported that their inference results do not match the performance (~55.1 AP) reported in the paper or in this repo. We believe this is caused by a mismatch in the PyTorch version or the GPU model.

In this issue, we evaluate this checkpoint with different combinations of PyTorch (1.x, 2.x) and NVIDIA GPUs (RTX 3060, 3090, 4090, A6000). We use Python 3.10 as the main environment.

|                 | RTX 3060    | RTX 3090    | RTX 4090    | A6000       |
| --------------- | ----------- | ----------- | ----------- | ----------- |
| PyTorch 1.12.1  | 54.42576062 | -           | 55.13484004 |             |
| PyTorch 2.0.0   | 55.21045475 | 54.27014723 | 55.27668969 | 55.13366189 |

We find that both the GPU model and the PyTorch version can affect the AP. Surprisingly, the RTX 3090 is about 1 point lower than the others.

It's normal for VIS results to fluctuate across training runs, but it's very strange that they fluctuate this much at test time. We would be very grateful if someone could advise on what is causing this.
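A hedged suggestion for narrowing this down: forcing deterministic kernels can help isolate whether the spread comes from non-deterministic CUDA ops rather than from the PyTorch/GPU combination itself. These are generic PyTorch switches, not something specific to this repo, and some ops may fail or run slower under them:

    import torch

    # Generic PyTorch determinism settings (not CTVIS-specific).
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.use_deterministic_algorithms(True, warn_only=True)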

Undefined symbol ImportError

Hello Author,

After installing and building my environment and preparing the data, I am not able to run train_ctvis.py.

Traceback (most recent call last):
  File "/beegfs/work/ymarquardt/CTVIS/train_ctvis.py", line 43, in <module>
    from mask2former import add_maskformer2_config
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/__init__.py", line 3, in <module>
    from . import modeling
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/__init__.py", line 4, in <module>
    from .pixel_decoder.msdeformattn import MSDeformAttnPixelDecoder
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/msdeformattn.py", line 19, in <module>
    from .ops.modules import MSDeformAttn
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/modules/__init__.py", line 12, in <module>
    from .ms_deform_attn import MSDeformAttn
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/modules/ms_deform_attn.py", line 24, in <module>
    from ..functions import MSDeformAttnFunction
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/functions/__init__.py", line 12, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/functions/ms_deform_attn_func.py", line 22, in <module>
    import MultiScaleDeformableAttention as MSDA
ImportError: /home/ymarquardt/anaconda3/envs/CTVIS2/lib/python3.10/site-packages/MultiScaleDeformableAttention-1.0-py3.10-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl
srun: error: gpu06: task 0: Exited with exit code 1

I already ran python -m detectron2.utils.collect_env to check for inconsistent CUDA versions and got the following output:

sys.platform linux
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
numpy 1.26.2
detectron2 0.6 @/home/ymarquardt/detectron2/detectron2
Compiler GCC 11.2
CUDA compiler CUDA 11.6
detectron2 arch flags 8.0
DETECTRON2_ENV_MODULE
PyTorch 1.13.1+cu116 @/home/ymarquardt/anaconda3/envs/CTVIS/lib/python3.10/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA A100-PCIE-40GB (arch=8.0)
Driver version 535.104.05
CUDA_HOME /cluster/cuda/11.6
Pillow 8.2.0
torchvision 0.14.1+cu116 @/home/ymarquardt/anaconda3/envs/CTVIS/lib/python3.10/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.8.1


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.6
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
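One observation from the logs above: the failing extension lives under anaconda3/envs/CTVIS2, while collect_env reports PyTorch under anaconda3/envs/CTVIS. An undefined c10::SymBool symbol typically means the compiled op and the imported PyTorch come from different builds or environments; this is a hedged guess, not a confirmed diagnosis. A quick sanity check (standard Python/PyTorch introspection, nothing repo-specific):

    import importlib.util
    import torch

    print(torch.__version__, torch.version.cuda)  # the PyTorch actually being imported
    print(torch.__file__)                         # which conda env it comes from

    # Locate the compiled extension without importing it (importing would re-raise the error).
    spec = importlib.util.find_spec("MultiScaleDeformableAttention")
    print(spec.origin if spec else "extension not found")  # path of the compiled .so

If the two paths point to different environments (as the log suggests: envs/CTVIS2 vs envs/CTVIS), rebuilding or reinstalling the deformable-attention op inside the environment that holds this PyTorch should resolve the undefined-symbol error.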

IDOL experiments

Hi,

Could you share or explain how you implemented Multi-Reference IDOL, as well as IDOL with Mask2Former?

thanks!

checkpoint incompatible shapes

When I try to use the provided checkpoint (CTVIS_R50_OVIS.pth), shape-incompatibility errors arise (in OVIS, the number of classes should be 26...).
Using that checkpoint with the CTVIS_R50.yaml configuration, the CodaLab evaluation score is 0.09 AP.
After changing the number of classes from 25 to 40 in the configuration file, the score is 34.7 AP.
Could you re-upload the checkpoint with the correction?

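In Mask2Former-style configs the class count is usually controlled by MODEL.SEM_SEG_HEAD.NUM_CLASSES; assuming CTVIS follows that convention (not confirmed by the authors), the class-count workaround above can also be passed as a command-line override, with the config path kept as a placeholder:

    python train_ctvis.py --eval-only --config-file path/to/CTVIS_R50.yaml \
        MODEL.WEIGHTS CTVIS_R50_OVIS.pth MODEL.SEM_SEG_HEAD.NUM_CLASSES 40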

about contrastive learning on VPS

Hi authors,

Your project is pretty good! I have a question about how you perform contrastive learning on VPS (video panoptic segmentation): do you also apply it to the stuff queries?

Thanks!

A question on memory bank update with noise

Hi,

Thank you for sharing your code. I have a question about your method of updating the memory bank with noise, and I would appreciate any help. Specifically, I note that you randomly replace an instance in the memory bank with another instance (i.e. noise). Is this mechanism designed only to help the model recover when a wrong ID assignment happens in a specific frame (i.e. so the model can re-identify the correct instance in the next few frames), or can it also prevent wrong ID assignments from happening in the first place? If it is the latter, could you explain why it prevents wrong ID assignment? I have thought about it but have not fully understood it. Thank you for your help!
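To make the question concrete, here is a minimal sketch of the kind of noisy update being asked about; it is an illustration written for this issue, not the repository's implementation, and all names are placeholders:

    import random

    def update_memory_with_noise(memory_bank, new_embeds, noise_prob=0.1):
        # memory_bank / new_embeds: dicts {instance_id: (D,) embedding tensor}
        # With a small probability, the embedding written for an instance is
        # replaced by the embedding of a *different* instance ("noise"),
        # simulating the wrong ID assignments that occur at inference time.
        ids = list(new_embeds.keys())
        for inst_id in ids:
            embed = new_embeds[inst_id]
            if len(ids) > 1 and random.random() < noise_prob:
                other_id = random.choice([i for i in ids if i != inst_id])
                embed = new_embeds[other_id]  # inject a wrong (noisy) embedding
            memory_bank[inst_id] = embed      # a real implementation might use a momentum update here
        return memory_bank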

Some problems in browse_datasets and training

  1. In browse_datasets.py, it seems there is no mask2former_video.data_video.datasets.ytvis module. Instead, we can use ctvis.data.vis.ytvis. It's better to move the script to the root directory before using it.
  2. Training fails when the batch size is 1. We can modify line 296 in ctvis/modeling/cl_plugin/ct_cl_plugin.py like this: random.sample(list(set(range(self.num_negatives + 1)) - set([anchor_query_id.item()])), self.num_negatives)) # noqa

Some code issues

  1. The code cannot be trained; the error is:
    CTVIS/mask2former/modeling/matcher.py", line 111, in memory_efficient_forward
    cost_class = -out_prob[:, tgt_ids]
    IndexError: tensors used as indices must be long, int, byte or bool tensors
  2. I tested the authors' YTVIS19_R50 model and got 54.4 AP, which does not match the reported results: 55.1 AP in the paper and 55.2 AP in README.md.
  3. visualize_all_videos.py and demo.py cannot run; they import many modules that are not included in the repository.

A bug in your code

Hi! Your work is excellent!

I found a bug when running the following command: python train_ctvis.py --num-gpus 4 --config-file configs/ytvis_2021/CTVIS_R50.yaml

CTVIS/mask2former/modeling/matcher.py", line 111, in memory_efficient_forward
cost_class = -out_prob[:, tgt_ids]
IndexError: tensors used as indices must be long, int, byte or bool tensors

Is that because of the environment? Thanks!
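For what it's worth, this error usually means tgt_ids is a floating-point tensor at that point; whether that is the actual cause in this environment is an assumption. A minimal standalone reproduction with toy values, plus the cast that silences it:

    import torch

    out_prob = torch.rand(100, 41).softmax(-1)   # (num_queries, num_classes + 1), toy values
    tgt_ids = torch.tensor([3.0, 7.0, 12.0])     # float labels trigger the IndexError
    # cost_class = -out_prob[:, tgt_ids]         # IndexError: tensors used as indices must be long, ...
    cost_class = -out_prob[:, tgt_ids.long()]    # casting to long (int64) makes the indexing valid
    print(cost_class.shape)                      # torch.Size([100, 3])

In matcher.py this would correspond to casting tgt_ids (or the labels it is built from) to torch.long before the indexing at line 111.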

Adding new backbone

I want to add a new backbone to CTVIS. How can I make the model train with the backbone I added?
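CTVIS is built on Mask2Former/detectron2 (see the tracebacks in other issues), so assuming it builds its backbone through detectron2's BACKBONE_REGISTRY (an assumption, not confirmed by the authors), a new backbone can likely be added like this; ToyBackbone and its feature names are placeholders:

    import torch.nn as nn
    from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec

    @BACKBONE_REGISTRY.register()
    class ToyBackbone(Backbone):
        def __init__(self, cfg, input_shape):
            super().__init__()
            # A single conv stem, just to show the required interface.
            self.stem = nn.Conv2d(3, 256, kernel_size=4, stride=4)

        def forward(self, image):
            # Return a dict of named feature maps; the pixel decoder consumes these keys.
            return {"res2": self.stem(image)}

        def output_shape(self):
            return {"res2": ShapeSpec(channels=256, stride=4)}

Then point the config at it (MODEL.BACKBONE.NAME: "ToyBackbone") and make sure the module defining it is imported before the model is built, e.g. from the project's __init__.py.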

Real-Time Inference for CTVIS - Performance and Implementation Inquiry

Hello,

I've been exploring CTVIS (Consistent Training for Online Video Instance Segmentation) and I'm interested in its real-time inference capabilities. I've noticed that the provided demo script, demo.py, supports video input, and I'd like to understand if CTVIS can be effectively used in real-time applications.

Real-Time Performance: Can CTVIS be used for real-time video instance segmentation? I'm curious about its performance and whether it can achieve low-latency results on live video streams.

Optimal Configurations: Are there specific configurations or settings that need to be adjusted to enhance real-time performance? If there are best practices or tips for real-time deployment, I'd appreciate guidance on that.

Hardware Considerations: Are there any hardware requirements or recommendations for achieving real-time performance with CTVIS, such as GPU specifications or other hardware considerations?

Implementation Guidance: If CTVIS can be used in real-time scenarios, could you provide some implementation guidance or code examples to demonstrate how to set up and run CTVIS for real-time video instance segmentation?

I'm eager to learn more about the potential of CTVIS in real-time applications, and any insights or guidance you can provide would be greatly appreciated.

Thank you for your time and assistance.
