
ctvis's People

Contributors

aziily, kainingying


ctvis's Issues

Trying to test a model trained with 8 GPUs on a single GPU

Hi @KainingYing ,

I am attempting to test a model that was trained using 8 GPUs on a single GPU. I followed the instructions for the registration process as you explained, and I appreciate that. Could you please provide guidance on how to perform model training with a single GPU?

Additionally, how can we modify the parameters that affect GPU memory usage during training?
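If train_ctvis.py follows detectron2's default launcher (which Mask2Former-based projects usually do, though this is an assumption), single-GPU evaluation of a multi-GPU-trained checkpoint would look roughly like the following; the checkpoint path is a placeholder:

    python train_ctvis.py --num-gpus 1 --eval-only \
        --config-file configs/ytvis_2021/CTVIS_R50.yaml \
        MODEL.WEIGHTS path/to/checkpoint.pth

The --num-gpus, --eval-only and MODEL.WEIGHTS pieces are detectron2 conventions; batch-size and memory-related overrides are project-specific and are best confirmed by the authors.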

inconsistent contrastive training scheme with paper

Hi, author!
I'm curious about the training scheme with the contrastive loss in your code.
In your paper, you said,

"We use the MA embeddings of other instances in the memory bank as the major negative embeddings".

However, when I checked your source code, I could not find any code implementing the aforementioned statement.
Rather, the code just samples the t-1 object queries, as shown in the screenshots below.

[screenshots of the relevant sampling code attached to the original issue]

Is there anything I missed?
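For context, here is a minimal sketch of the scheme described in the paper, i.e. using the momentum-averaged (MA) embeddings of other instances in the memory bank as negatives. This is an illustration written for this issue, not the repository's implementation, and all names are placeholders:

    import torch
    import torch.nn.functional as F

    def contrastive_loss_with_memory(anchor, positive, memory_bank, anchor_id, tau=0.07):
        # anchor, positive: (D,) embeddings of the same instance at different frames
        # memory_bank: dict {instance_id: (D,) MA embedding}; assumed to hold >= 2 instances
        # Negatives come from the MA embeddings of *other* instances in the bank,
        # rather than from the t-1 object queries.
        negatives = torch.stack([v for k, v in memory_bank.items() if k != anchor_id])  # (N, D)
        anchor = F.normalize(anchor, dim=0)
        pos_sim = torch.dot(anchor, F.normalize(positive, dim=0)) / tau   # scalar
        neg_sim = F.normalize(negatives, dim=1) @ anchor / tau            # (N,)
        logits = torch.cat([pos_sim.unsqueeze(0), neg_sim]).unsqueeze(0)  # (1, 1 + N)
        target = torch.zeros(1, dtype=torch.long)                         # positive sits at index 0
        return F.cross_entropy(logits, target)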

[Inference Reproduce] The influence of GPU and Pytorch

We release the weights of R50_YTVIS19; you can download them here. You can evaluate this checkpoint on your own machine and expect a score of about 55.1 AP.

However, some users (#3 (comment)) reported that their inference results do not match the performance (~55.1 AP) reported in the paper or in this repo. We believe this is caused by a mismatch in the PyTorch version or the GPU model.

In this issue, we evaluate this checkpoint with different combinations of PyTorch (1.x, 2.x) and NVIDIA GPUs (RTX 3060, 3090, 4090, A6000). We use Python 3.10 as the main environment.

|                 | RTX 3060    | RTX 3090    | RTX 4090    | A6000       |
| --------------- | ----------- | ----------- | ----------- | ----------- |
| PyTorch 1.12.1  | 54.42576062 | -           | 55.13484004 |             |
| PyTorch 2.0.0   | 55.21045475 | 54.27014723 | 55.27668969 | 55.13366189 |

We find that both the GPU model and the PyTorch version can affect the AP. Surprisingly, the RTX 3090 is about 1 point lower than the others.

It's normal for VIS results to fluctuate across training runs, but it's very strange that they fluctuate this much at test time. We would be very grateful if someone could advise on what is causing this.
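A hedged suggestion for narrowing this down: forcing deterministic kernels can help isolate whether the spread comes from non-deterministic CUDA ops rather than from the PyTorch/GPU combination itself. These are generic PyTorch switches, not something specific to this repo, and some ops may fail or run slower under them:

    import torch

    # Generic PyTorch determinism settings (not CTVIS-specific).
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.use_deterministic_algorithms(True, warn_only=True)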

Undefined symbol ImportError

Hello Author,

After installing and building my environment and preparing the data, I am not able to run train_ctvis.py.

Traceback (most recent call last):
  File "/beegfs/work/ymarquardt/CTVIS/train_ctvis.py", line 43, in <module>
    from mask2former import add_maskformer2_config
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/__init__.py", line 3, in <module>
    from . import modeling
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/__init__.py", line 4, in <module>
    from .pixel_decoder.msdeformattn import MSDeformAttnPixelDecoder
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/msdeformattn.py", line 19, in <module>
    from .ops.modules import MSDeformAttn
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/modules/__init__.py", line 12, in <module>
    from .ms_deform_attn import MSDeformAttn
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/modules/ms_deform_attn.py", line 24, in <module>
    from ..functions import MSDeformAttnFunction
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/functions/__init__.py", line 12, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/beegfs/work/ymarquardt/CTVIS/mask2former/modeling/pixel_decoder/ops/functions/ms_deform_attn_func.py", line 22, in <module>
    import MultiScaleDeformableAttention as MSDA
ImportError: /home/ymarquardt/anaconda3/envs/CTVIS2/lib/python3.10/site-packages/MultiScaleDeformableAttention-1.0-py3.10-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl
srun: error: gpu06: task 0: Exited with exit code 1

I already ran python -m detectron2.utils.collect_env to check for inconsistent CUDA versions and got the following output:

sys.platform linux
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
numpy 1.26.2
detectron2 0.6 @/home/ymarquardt/detectron2/detectron2
Compiler GCC 11.2
CUDA compiler CUDA 11.6
detectron2 arch flags 8.0
DETECTRON2_ENV_MODULE
PyTorch 1.13.1+cu116 @/home/ymarquardt/anaconda3/envs/CTVIS/lib/python3.10/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA A100-PCIE-40GB (arch=8.0)
Driver version 535.104.05
CUDA_HOME /cluster/cuda/11.6
Pillow 8.2.0
torchvision 0.14.1+cu116 @/home/ymarquardt/anaconda3/envs/CTVIS/lib/python3.10/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.8.1


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.6
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
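One observation from the logs above: the failing extension lives under anaconda3/envs/CTVIS2, while collect_env reports PyTorch under anaconda3/envs/CTVIS. An undefined c10::SymBool symbol typically means the compiled op and the imported PyTorch come from different builds or environments; this is a hedged guess, not a confirmed diagnosis. A quick sanity check (standard Python/PyTorch introspection, nothing repo-specific):

    import importlib.util
    import torch

    print(torch.__version__, torch.version.cuda)  # the PyTorch actually being imported
    print(torch.__file__)                         # which conda env it comes from

    # Locate the compiled extension without importing it (importing would re-raise the error).
    spec = importlib.util.find_spec("MultiScaleDeformableAttention")
    print(spec.origin if spec else "extension not found")  # path of the compiled .so

If the two paths point to different environments (as the log suggests: envs/CTVIS2 vs envs/CTVIS), rebuilding or reinstalling the deformable-attention op inside the environment that holds this PyTorch should resolve the undefined-symbol error.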

IDOL experiments

Hi,

Could you share or explain how you implemented Multi-Reference IDOL, as well as IDOL with Mask2Former?

thanks!

checkpoint incompatible shapes

When I try to use the provided checkpoint (CTVIS_R50_OVIS.pth), shape-incompatibility errors arise (in OVIS, the number of classes should be 26...).
Using that checkpoint with the CTVIS_R50.yaml configuration, the CodaLab evaluation score is 0.09 AP.
After changing the number of classes from 25 to 40 in the configuration file, the score is 34.7 AP.
Could you re-upload the checkpoint with the correction?

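In Mask2Former-style configs the class count is usually controlled by MODEL.SEM_SEG_HEAD.NUM_CLASSES; assuming CTVIS follows that convention (not confirmed by the authors), the class-count workaround above can also be passed as a command-line override, with the config path kept as a placeholder:

    python train_ctvis.py --eval-only --config-file path/to/CTVIS_R50.yaml \
        MODEL.WEIGHTS CTVIS_R50_OVIS.pth MODEL.SEM_SEG_HEAD.NUM_CLASSES 40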

about contrastive learning on VPS

Hi authors,

Your project is pretty good! I have a question about how you perform contrastive learning on VPS (video panoptic segmentation): do you also apply it to the stuff queries?

Thanks!

A question on memory bank update with noise

Hi,

Thank you for sharing your code. I have a question about your method of updating the memory bank with noise, and I would appreciate any help. Specifically, I note that you randomly replace an instance in the memory bank with another instance (i.e. noise). Is this mechanism designed only to help the model recover when a wrong ID assignment happens in a specific frame (i.e. so the model can re-identify the correct instance in the next few frames), or can it also prevent wrong ID assignments from happening in the first place? If it is the latter, could you explain why it prevents wrong ID assignment? I have thought about it but have not fully understood it. Thank you for your help!
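To make the question concrete, here is a minimal sketch of the kind of noisy update being asked about; it is an illustration written for this issue, not the repository's implementation, and all names are placeholders:

    import random

    def update_memory_with_noise(memory_bank, new_embeds, noise_prob=0.1):
        # memory_bank / new_embeds: dicts {instance_id: (D,) embedding tensor}
        # With a small probability, the embedding written for an instance is
        # replaced by the embedding of a *different* instance ("noise"),
        # simulating the wrong ID assignments that occur at inference time.
        ids = list(new_embeds.keys())
        for inst_id in ids:
            embed = new_embeds[inst_id]
            if len(ids) > 1 and random.random() < noise_prob:
                other_id = random.choice([i for i in ids if i != inst_id])
                embed = new_embeds[other_id]  # inject a wrong (noisy) embedding
            memory_bank[inst_id] = embed      # a real implementation might use a momentum update here
        return memory_bank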

Some problems in browse_datasets and training

  1. In browse_datasets.py, it seems there is no mask2former_video.data_video.datasets.ytvis module. Instead, we can use ctvis.data.vis.ytvis. It's better to move the script to the root directory before using it.
  2. Training fails when the batch size is 1. We can modify line 296 in ctvis/modeling/cl_plugin/ct_cl_plugin.py like this: random.sample(list(set(range(self.num_negatives + 1)) - set([anchor_query_id.item()])), self.num_negatives)) # noqa

Some code issues

  1. The code cannot be trained; the error is:
    CTVIS/mask2former/modeling/matcher.py", line 111, in memory_efficient_forward
    cost_class = -out_prob[:, tgt_ids]
    IndexError: tensors used as indices must be long, int, byte or bool tensors
  2. I tested the authors' YTVIS19_R50 model and got 54.4 AP, which does not match the reported results: 55.1 AP in the paper and 55.2 AP in README.md.
  3. visualize_all_videos.py and demo.py cannot run; they import many modules that are not included in the repository.

A bug in your code

Hi! Your work is excellent!

I found a bug when running the following command: python train_ctvis.py --num-gpus 4 --config-file configs/ytvis_2021/CTVIS_R50.yaml

CTVIS/mask2former/modeling/matcher.py", line 111, in memory_efficient_forward
cost_class = -out_prob[:, tgt_ids]
IndexError: tensors used as indices must be long, int, byte or bool tensors

Is that because of the environment? Thanks!
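For what it's worth, this error usually means tgt_ids is a floating-point tensor at that point; whether that is the actual cause in this environment is an assumption. A minimal standalone reproduction with toy values, plus the cast that silences it:

    import torch

    out_prob = torch.rand(100, 41).softmax(-1)   # (num_queries, num_classes + 1), toy values
    tgt_ids = torch.tensor([3.0, 7.0, 12.0])     # float labels trigger the IndexError
    # cost_class = -out_prob[:, tgt_ids]         # IndexError: tensors used as indices must be long, ...
    cost_class = -out_prob[:, tgt_ids.long()]    # casting to long (int64) makes the indexing valid
    print(cost_class.shape)                      # torch.Size([100, 3])

In matcher.py this would correspond to casting tgt_ids (or the labels it is built from) to torch.long before the indexing at line 111.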

Adding new backbone

I want to add a new backbone to CTVIS. How can I make the model train with the backbone I added?
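CTVIS is built on Mask2Former/detectron2 (see the tracebacks in other issues), so assuming it builds its backbone through detectron2's BACKBONE_REGISTRY (an assumption, not confirmed by the authors), a new backbone can likely be added like this; ToyBackbone and its feature names are placeholders:

    import torch.nn as nn
    from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec

    @BACKBONE_REGISTRY.register()
    class ToyBackbone(Backbone):
        def __init__(self, cfg, input_shape):
            super().__init__()
            # A single conv stem, just to show the required interface.
            self.stem = nn.Conv2d(3, 256, kernel_size=4, stride=4)

        def forward(self, image):
            # Return a dict of named feature maps; the pixel decoder consumes these keys.
            return {"res2": self.stem(image)}

        def output_shape(self):
            return {"res2": ShapeSpec(channels=256, stride=4)}

Then point the config at it (MODEL.BACKBONE.NAME: "ToyBackbone") and make sure the module defining it is imported before the model is built, e.g. from the project's __init__.py.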

Real-Time Inference for CTVIS - Performance and Implementation Inquiry

Hello,

I've been exploring CTVIS (Consistent Training for Online Video Instance Segmentation) and I'm interested in its real-time inference capabilities. I've noticed that the provided demo script, demo.py, supports video input, and I'd like to understand if CTVIS can be effectively used in real-time applications.

Real-Time Performance: Can CTVIS be used for real-time video instance segmentation? I'm curious about its performance and whether it can achieve low-latency results on live video streams.

Optimal Configurations: Are there specific configurations or settings that need to be adjusted to enhance real-time performance? If there are best practices or tips for real-time deployment, I'd appreciate guidance on that.

Hardware Considerations: Are there any hardware requirements or recommendations for achieving real-time performance with CTVIS, such as GPU specifications or other hardware considerations?

Implementation Guidance: If CTVIS can be used in real-time scenarios, could you provide some implementation guidance or code examples to demonstrate how to set up and run CTVIS for real-time video instance segmentation?

I'm eager to learn more about the potential of CTVIS in real-time applications, and any insights or guidance you can provide would be greatly appreciated.

Thank you for your time and assistance.
