dvis_plus's People

Contributors

zhang-tao-whu


dvis_plus's Issues

demo.py: error: unrecognized arguments: --merge: command not found

Hi! I am new to open-vocabulary VIS and I am currently testing it out. I encountered some problems while running the visualisation mode of the code: it raised an unrecognised-argument error, and --merge seems to be the problem in this case. I am unsure what to do so that I can try this model.
I am trying to segment/detect person and cash using open-vocabulary models. Please let me know what I am doing wrong and what I should adjust...
[screenshot]
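
A minimal sketch of where this error comes from, assuming demo.py builds its CLI with argparse (the flag below is illustrative): parse_args() exits with "unrecognized arguments" for any flag the script never registered, while parse_known_args() collects the extras instead of erroring, so a flag like --merge only works if the script actually defines it.

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file")  # illustrative flag
    # parse_args() would exit here; parse_known_args() returns the extras.
    args, unknown = parser.parse_known_args(["--config-file", "cfg.yaml", "--merge"])
    print(unknown)  # ['--merge']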

eval: KeyError: 'video_id' / demo: AttributeError: 'list' object has no attribute 'to'

Hello, thank you for all the materials :)
I successfully trained my custom dataset (the original format is COCO, and I already used the coco2ytvis tools) with the DVIS_Plus model, and I made my own config file (.yaml).

I had 2 errors:

  1. When I run the evaluation code,
    !python train_net_video.py --num-gpus 1 --config-file /home/user/dvis2/DVIS_Plus/output_CTVIS_R50_YTVIS21_custom_0205/cfg2.yaml --eval-only MODEL.WEIGHTS /home/user/dvis2/DVIS_Plus/output_CTVIS_R50_YTVIS21_custom_0205/model_final.pth

File "/home/user/dvis2/DVIS_Plus/dvis_Plus/data_video/datasets/ytvis_api/ytvos.py", line 81, in createIndex vidToAnns[ann['video_id']].append(ann) **KeyError: 'video_id**
But my annotation file does already contain 'video_id'.

  2. When I run the demo,
    !python demo_video/demo_long_video.py --config-file /home/user/dvis2/DVIS_Plus/output_CTVIS_R50_YTVIS21_custom_0205/config.yaml --input /home/user/dvis2/DVIS_Plus/datasets/train_exd/merged_scenario --output /home/user/dvis2/DVIS_Plus/datasets/train_exd/visualization --opts MODEL.WEIGHTS /home/user/dvis2/DVIS_Plus/output_CTVIS_R50_YTVIS21_custom_0205/model_final.pth

File "demo_video/demo_long_video.py", line 124, in <module> predictions, visualized_output = demo.run_on_video(vid_frames, keep=False) File "/home/user/dvis2/DVIS_Plus/demo_video/predictor.py", line 161, in run_on_video predictions = self.predictor((frames, keep)) ... File "/home/user/anaconda3/envs/dvis2/lib/python3.8/site-packages/torch/jit/_trace.py", line 1112, in wrapper return fn(*args, **kwargs) File "/home/user/anaconda3/envs/dvis2/lib/python3.8/site-packages/detectron2/layers/wrappers.py", line 170, in move_device_like return src.to(dst.device) **AttributeError: 'list' object has no attribute 'to'**
But in def move_device_like(src: torch.Tensor, dst: torch.Tensor), src should already be a Tensor.
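
For the first error, a quick sanity check (a minimal sketch; the annotation path is an assumption) is to verify that every entry in the converted annotations actually carries the 'video_id' key that createIndex() indexes on:

    import json

    # Path to the coco2ytvis-converted annotation file (assumed).
    with open("annotations/instances_train.json") as f:
        data = json.load(f)

    missing = [a.get("id") for a in data.get("annotations", []) if "video_id" not in a]
    print(f"{len(missing)} annotations missing 'video_id':", missing[:10])

If that prints zero, the evaluator may be loading a different JSON than the one checked. For the second error, the traceback shows a plain Python list of frames reaching move_device_like(), which expects a tensor; that suggests the inputs are being packed differently than the demo's predictor expects, rather than a bug in detectron2 itself.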

infer

How can I make inference predictions on my own videos?

infer

demo:
--thing_classes carrot,carrots lantern,lanterns
--stuff_classes hay
————————————
Hello, first of all, thank you very much for open-sourcing this algorithm library; it is a very good project. Secondly, I have a few questions:
(1) For open-vocabulary video segmentation, my videos may contain things like code scanners and coils. Can I add my categories directly in the demo's --thing_classes?
(2) If I want to detect and segment an open factory scene and separate out obvious artifacts such as toolboxes and plastic film, can I run inference directly with your OV weights?
(3) Is direct video input not supported? If I want to run inference on a new video, do I need to split it into frames every time? Can't I input the entire video directly and then customize the output to a certain number of frames?
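
On question (3): the demo commands in this thread point --input at a directory, so one workaround is to split the video into frames first. A minimal sketch with OpenCV (the paths and naming scheme are assumptions, not part of the repo):

    import os
    import cv2

    def video_to_frames(video_path, out_dir):
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video (or a read error)
                break
            cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
            idx += 1
        cap.release()
        return idx

    # video_to_frames("factory.mp4", "datasets/factory_frames")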

Questions about results.json

Hi, I've successfully generated the results.json file using offline_vitl_ovis_534.pth. I need to do some further processing of the data, so I have some questions about the results.json file.

I found that there are 140 videos, and for each video there are 20 entries in the JSON file.

  1. What does the '20' mean here? Also, the scores and counts are different in each entry; why is that?
  2. Does results.json contain any information about object IDs?
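
A hedged reading of the standard YouTube-VIS/OVIS results format (this repo may differ): each entry is one ranked instance hypothesis carrying a video_id, category_id, confidence score, and per-frame RLE masks, so 20 entries per video usually means 20 predictions rather than 20 objects, and the object identity is the entry itself; there is no separate object-id field. A minimal inspection sketch, assuming pycocotools is installed and results.json is in the working directory:

    import json
    from pycocotools import mask as mask_utils

    with open("results.json") as f:
        results = json.load(f)

    entry = results[0]
    print(entry["video_id"], entry["category_id"], entry["score"])

    # Per-frame RLE masks; entries may be null on frames where the
    # instance is absent, so pick the first non-null one.
    rle = next(s for s in entry["segmentations"] if s)
    binary_mask = mask_utils.decode(rle)  # (H, W) uint8 array
    print(binary_mask.shape, binary_mask.sum())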

Thank you in advance!

VIPSeg Panoptic Segmentation Eval Error

I am trying to run VIPSeg evaluation as you have suggested:

python train_net_video.py \
  --num-gpus 1 \
  --config-file configs/dvis_Plus/VIPSeg/CTVIS_r50.yaml \
  --eval-only MODEL.WEIGHTS DVIS_Plus/mask2former/ctvis_r50_vipseg.pth

But I am getting the following error:

File "/home/guests/ege_oezsoy/MultiModalOperatingRoom/DVIS_Plus/train_net_video.py", line 260, in test
  results_i = inference_on_dataset(model, data_loader, evaluator)
File "/home/guests/ege_oezsoy/miniconda3/envs/MultiModalOperatingRoom/lib/python3.10/site-packages/detectron2/evaluation/evaluator.py", line 164, in inference_on_dataset
  evaluator.process(inputs, outputs)
File "/home/guests/ege_oezsoy/MultiModalOperatingRoom/DVIS_Plus/dvis_Plus/data_video/vps_eval.py", line 108, in process
  segments_infos = outputs['segments_infos']
KeyError: 'segments_infos'

Upon further inspection, it seems to me that some part of the code/model is not producing the output as panoptic segmentation, because the output has only the following keys: ['image_size', 'pred_scores', 'pred_labels', 'pred_masks'].

I wanted to ask if you have encountered a similar issue, and whether you have any suggestions on how to run inference/evaluation in panoptic-segmentation mode on the VIPSeg dataset. Maybe I am configuring something wrong?
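
A small diagnostic sketch, assuming the detectron2-style model and data_loader built in train_net_video.py: run a single batch and print the output keys before anything reaches the evaluator, to confirm whether the panoptic head ('segments_infos') is active for the loaded config and weights.

    import torch

    def inspect_output_keys(model, data_loader):
        # A panoptic-mode output should expose 'segments_infos';
        # the instance-mode keys are the ones listed above.
        model.eval()
        with torch.no_grad():
            inputs = next(iter(data_loader))
            outputs = model(inputs)
        print(sorted(outputs.keys()))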

Thank you for your help

Error while Evaluation

Hi, first of all, thank you for your work!

I want to reproduce the results of your paper in Table 1 (the best setting, i.e. DVIS++ with ViT-L on OVIS), so I downloaded offline_vitl_ovis_534.pth.

about the number of gpus or IMS_PER_BATCH

Hello, thank you for your great work and for sharing the code!

I noticed there is a $NUM_GPUS value overwriting the config's IMS_PER_BATCH in the command-line options. I am wondering what number of GPUs, or which IMS_PER_BATCH, you used to obtain the reported performance. Do you use the default values in the config file?
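
For reference, in detectron2 SOLVER.IMS_PER_BATCH is the total batch size summed over all GPUs, so the per-GPU load follows from both values; the numbers below are placeholders, not the authors' settings:

    ims_per_batch = 8  # SOLVER.IMS_PER_BATCH from the config (placeholder)
    num_gpus = 4       # value passed via --num-gpus (placeholder)

    # detectron2 splits the total batch evenly across the GPUs.
    per_gpu_batch = ims_per_batch // num_gpus
    print(per_gpu_batch)  # -> 2 clips per GPU per iteration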

I hope you can answer the question. Thank you very much!

error running

[screenshot]
There was an error while running; this is my code:
[screenshot]
I've downloaded the model.

Custom dataset

Dear Mr. Zhang,

Could you please give me some advice on introducing custom datasets for training?

I want to segment a video with 2 fish swimming next to each other. I tried inferring instances using the OVIS-pretrained ViT-L model and it worked well. I want to further improve the segmentation by training DVIS_Plus on a combination of OVIS and my custom dataset.
In particular, I have labeled "fish" instances in some frames of my video (about 800 frames with 2 instances each) and saved the annotation file in COCO .json format.

But I am confused about how I should register the dataset. I tried adding a function to register the custom dataset in builtin.py:

import os

from detectron2.data.datasets import register_coco_instances

# _PREDEFINED_SPLITS_COCO_CUSTOM maps dataset name -> (image_root, json_file),
# e.g. {"custom_train": ("custom/images", "custom/annotations/train.json")}.

def register_all_coco_custom(root):
    for key, (image_root, json_file) in _PREDEFINED_SPLITS_COCO_CUSTOM.items():
        # Assume pre-defined datasets live in `./datasets`
        register_coco_instances(
            key,
            {},
            os.path.join(root, json_file) if "://" not in json_file else json_file,
            os.path.join(root, image_root),
        )

This worked well when I tried a Detectron2 model separately before.
But now I get the following error: AttributeError: Attribute 'thing_dataset_id_to_contiguous_id' does not exist in the metadata of dataset 'custom_train'. Available keys are dict_keys(['name', 'json_file', 'image_root', 'evaluator_type']).

What additional information should I provide? How can I run training on both the custom and OVIS datasets? Also, does the custom dataset have to consist of consecutive frames, or are random frames also okay? Thank you very much in advance!
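
One hedged suggestion for the missing-metadata error (a minimal sketch; the category list here is an assumed example): the video pipeline reads thing_dataset_id_to_contiguous_id from the dataset metadata, and it can be set explicitly on the registered split via detectron2's MetadataCatalog:

    from detectron2.data import MetadataCatalog

    categories = [{"id": 1, "name": "fish"}]  # assumed custom category list

    MetadataCatalog.get("custom_train").set(
        thing_classes=[c["name"] for c in categories],
        # Map the dataset's category ids to contiguous 0-based ids.
        thing_dataset_id_to_contiguous_id={c["id"]: i for i, c in enumerate(categories)},
    )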

Download model weights baidu

Thanks for sharing your repo with the public.
My issue is not related to the code but to downloading the weights from the model zoo.
I was able to download the ResNet50 weights (Mask2Former (instance) | R50 | COCO | OVIS, YTVIS19&21), where a direct link was provided: https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/instance/maskformer2_R50_bs16_50ep/model_final_3c8ec9.pkl
Now I want to download the weights of the ViT-L model, but I was not able to start the download. Not knowing any Chinese, I have so far only downloaded the BaiduNetDisk installer. Is there a way to download the weights directly, without creating a Baidu user account or installing the app?

convert to onnx

Dear Mr. Zhang,
Thank you for your great work.

I want to convert the DVIS_Plus model to ONNX. Is there a deployment Python script?
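
A minimal, untested export sketch, assuming the model can be wrapped to accept a plain frame tensor; the wrapper, input shape, and opset below are assumptions, and video segmentation models with dict-style inputs/outputs usually need more export work than this:

    import torch

    class OnnxWrapper(torch.nn.Module):
        # Expose a plain-tensor interface, since torch.onnx.export cannot
        # trace detectron2-style list-of-dict inputs directly.
        def __init__(self, model):
            super().__init__()
            self.model = model

        def forward(self, frames):  # frames: (T, 3, H, W) float tensor (assumed)
            return self.model(frames)

    # model = ...  # build DVIS_Plus and load the weights first
    # dummy = torch.randn(1, 3, 480, 640)
    # torch.onnx.export(OnnxWrapper(model), (dummy,), "dvis_plus.onnx", opset_version=16)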

Temporal Refiner in Online Mode

First of all, congratulations on really great work.

I had a question about the temporal refiner in the online setting: would it not be possible to still use a temporal refiner when processing the current frame, so that its segmentation is improved by the past frames? It seems you are not using any temporal refinement in the online processing setting.
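
To make the question concrete, here is a rough sketch of the idea; all names are hypothetical, and this online-refiner mode is not something the repo ships: keep a rolling buffer of per-frame queries and run the refiner over the window ending at the current frame, emitting only the newest frame's result.

    from collections import deque

    class OnlineRefiner:
        def __init__(self, refiner, window=5):
            self.refiner = refiner             # temporal refiner module (hypothetical)
            self.buffer = deque(maxlen=window)

        def step(self, frame_queries):
            # Refine the most recent `window` frames jointly, but emit only
            # the current frame so the pipeline stays online.
            self.buffer.append(frame_queries)
            refined = self.refiner(list(self.buffer))
            return refined[-1]

The obvious cost is that each frame is processed up to `window` times, trading compute and latency for the temporal consistency the refiner provides.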

Many thanks in advance

VIPseg panoptic_gt_VIPSeg_test.json

Hi,
Thank you for open-sourcing your work! I downloaded your processed VIPSeg data and unzipped it, but found that there is no panoptic_gt_VIPSeg_test.json. How can I get this file?

Cannot load pretrained weight

Dear Mr. Zhang,
Firstly, I want to thank you for your efforts in developing and maintaining DVIS_Plus.

I'm writing to report an issue I encountered while trying to load model weights from the repository. Specifically, when I tried to retrain the Segmenter on the VSPW dataset, the weights for the pretrained Segmenter Mask2Former (panoptic) ViT-L seem to be problematic. I've successfully loaded the other weights provided in the repository, but when attempting to load this particular one, I receive the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/vspw/lib/python3.8/site-packages/torch/serialization.py", line 600, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/root/anaconda3/envs/vspw/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

I've tried downloading the file multiple times from both Baidu Pan and Hugging Face to rule out download corruption. Could you please check whether the model weight file for Mask2Former (panoptic) ViT-L is uploaded correctly, or whether there is a known issue with this file? Any guidance on how to resolve this would be greatly appreciated.
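
For what it's worth, checkpoints saved with torch.save on PyTorch >= 1.6 are zip archives, and "failed finding central directory" typically means a truncated or partially downloaded file. A quick integrity check (the filename is an assumption):

    import zipfile

    path = "mask2former_panoptic_vitl.pth"  # assumed local filename
    # False -> the file is not a readable zip, i.e. the download is corrupt.
    print(zipfile.is_zipfile(path))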

Can I get pretrained checkpoint files from another way?

Hi, first of all, thank you for your amazing work on this task.

I wonder whether I can get the pretrained checkpoint files some other way than Baidu drive (e.g. Google Drive)?
I cannot download from the Baidu link that you provided in the model zoo.

Thank you.
