dvis_plus's People

Contributors

zhang-tao-whu


dvis_plus's Issues

demo.py: error: unrecognized arguments: --merge: command not found

Hi! I am new to open-vocabulary VIS and I am currently testing it out. I encountered some problems while running the visualisation mode of the code: it raised an unrecognised-argument error, and --merge seems to be the problem in this case. I am unsure what to do so that I can try this model.
I am trying to segment/detect person and cash using open-vocabulary models. Please let me know what I am doing wrong and what I should adjust...
[screenshot]
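
A minimal sketch of where this error comes from, assuming demo.py builds its CLI with argparse (the flag below is illustrative): parse_args() exits with "unrecognized arguments" for any flag the script never registered, while parse_known_args() collects the extras instead of erroring, so a flag like --merge only works if the script actually defines it.

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file")  # illustrative flag
    # parse_args() would exit here; parse_known_args() returns the extras.
    args, unknown = parser.parse_known_args(["--config-file", "cfg.yaml", "--merge"])
    print(unknown)  # ['--merge']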

eval: KeyError: 'video_id' / demo: AttributeError: 'list' object has no attribute 'to'

Hello, thank you for all the materials :)
I successfully trained my custom dataset (the original format is COCO, and I already used the coco2ytvis tools) with the DVIS_Plus model, and I made my own config file (.yaml).

I had 2 errors:

  1. When I run the evaluation code,
    !python train_net_video.py --num-gpus 1 --config-file /home/user/dvis2/DVIS_Plus/output_CTVIS_R50_YTVIS21_custom_0205/cfg2.yaml --eval-only MODEL.WEIGHTS /home/user/dvis2/DVIS_Plus/output_CTVIS_R50_YTVIS21_custom_0205/model_final.pth

File "/home/user/dvis2/DVIS_Plus/dvis_Plus/data_video/datasets/ytvis_api/ytvos.py", line 81, in createIndex vidToAnns[ann['video_id']].append(ann) **KeyError: 'video_id**
But my annotation file does already contain 'video_id'.

  2. When I run the demo,
    !python demo_video/demo_long_video.py --config-file /home/user/dvis2/DVIS_Plus/output_CTVIS_R50_YTVIS21_custom_0205/config.yaml --input /home/user/dvis2/DVIS_Plus/datasets/train_exd/merged_scenario --output /home/user/dvis2/DVIS_Plus/datasets/train_exd/visualization --opts MODEL.WEIGHTS /home/user/dvis2/DVIS_Plus/output_CTVIS_R50_YTVIS21_custom_0205/model_final.pth

File "demo_video/demo_long_video.py", line 124, in <module> predictions, visualized_output = demo.run_on_video(vid_frames, keep=False) File "/home/user/dvis2/DVIS_Plus/demo_video/predictor.py", line 161, in run_on_video predictions = self.predictor((frames, keep)) ... File "/home/user/anaconda3/envs/dvis2/lib/python3.8/site-packages/torch/jit/_trace.py", line 1112, in wrapper return fn(*args, **kwargs) File "/home/user/anaconda3/envs/dvis2/lib/python3.8/site-packages/detectron2/layers/wrappers.py", line 170, in move_device_like return src.to(dst.device) **AttributeError: 'list' object has no attribute 'to'**
But in def move_device_like(src: torch.Tensor, dst: torch.Tensor), src should already be a Tensor.
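
For the first error, a quick sanity check (a minimal sketch; the annotation path is an assumption) is to verify that every entry in the converted annotations actually carries the 'video_id' key that createIndex() indexes on:

    import json

    # Path to the coco2ytvis-converted annotation file (assumed).
    with open("annotations/instances_train.json") as f:
        data = json.load(f)

    missing = [a.get("id") for a in data.get("annotations", []) if "video_id" not in a]
    print(f"{len(missing)} annotations missing 'video_id':", missing[:10])

If that prints zero, the evaluator may be loading a different JSON than the one checked. For the second error, the traceback shows a plain Python list of frames reaching move_device_like(), which expects a tensor; that suggests the inputs are being packed differently than the demo's predictor expects, rather than a bug in detectron2 itself.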

infer

How can I make inference predictions on my own videos?

infer

demo:
--thing_classes carrot,carrots lantern,lanterns
--stuff_classes hay
————————————
Hello, first of all, thank you very much for open-sourcing this algorithm library; it is a very good project. Secondly, I have a few questions:
(1) For open-vocabulary video segmentation, my videos may contain things like code scanners and coils. Can I add my categories directly in the demo's --thing_classes?
(2) If I want to detect and segment an open factory scene and separate out obvious artifacts such as toolboxes and plastic film, can I run inference directly with your OV weights?
(3) Is direct video input not supported? If I want to run inference on a new video, do I need to split it into frames every time? Can't I input the entire video directly and then customize the output to a certain number of frames?
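
On question (3): the demo commands in this thread point --input at a directory, so one workaround is to split the video into frames first. A minimal sketch with OpenCV (the paths and naming scheme are assumptions, not part of the repo):

    import os
    import cv2

    def video_to_frames(video_path, out_dir):
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video (or a read error)
                break
            cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
            idx += 1
        cap.release()
        return idx

    # video_to_frames("factory.mp4", "datasets/factory_frames")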

Questions about results.json

Hi, I've successfully generated the results.json file using offline_vitl_ovis_534.pth. I need to do some further processing of the data, so I have some questions about the results.json file.

I found that there are 140 videos, and for each video there are 20 entries in the JSON file.

  1. What does the '20' mean here? Also, the scores and counts are different in each entry; why is that?
  2. Does results.json contain any information about object IDs?
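
A hedged reading of the standard YouTube-VIS/OVIS results format (this repo may differ): each entry is one ranked instance hypothesis carrying a video_id, category_id, confidence score, and per-frame RLE masks, so 20 entries per video usually means 20 predictions rather than 20 objects, and the object identity is the entry itself; there is no separate object-id field. A minimal inspection sketch, assuming pycocotools is installed and results.json is in the working directory:

    import json
    from pycocotools import mask as mask_utils

    with open("results.json") as f:
        results = json.load(f)

    entry = results[0]
    print(entry["video_id"], entry["category_id"], entry["score"])

    # Per-frame RLE masks; entries may be null on frames where the
    # instance is absent, so pick the first non-null one.
    rle = next(s for s in entry["segmentations"] if s)
    binary_mask = mask_utils.decode(rle)  # (H, W) uint8 array
    print(binary_mask.shape, binary_mask.sum())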

Thank you in advance!

VIPSeg Panoptic Segmentation Eval Error

I am trying to run VIPSeg evaluation as you have suggested:

python train_net_video.py \
  --num-gpus 1 \
  --config-file configs/dvis_Plus/VIPSeg/CTVIS_r50.yaml \
  --eval-only MODEL.WEIGHTS DVIS_Plus/mask2former/ctvis_r50_vipseg.pth

But I am getting the following error:

File "/home/guests/ege_oezsoy/MultiModalOperatingRoom/DVIS_Plus/train_net_video.py", line 260, in test
  results_i = inference_on_dataset(model, data_loader, evaluator)
File "/home/guests/ege_oezsoy/miniconda3/envs/MultiModalOperatingRoom/lib/python3.10/site-packages/detectron2/evaluation/evaluator.py", line 164, in inference_on_dataset
  evaluator.process(inputs, outputs)
File "/home/guests/ege_oezsoy/MultiModalOperatingRoom/DVIS_Plus/dvis_Plus/data_video/vps_eval.py", line 108, in process
  segments_infos = outputs['segments_infos']
KeyError: 'segments_infos'

Upon further inspection, it seems to me that some part of the code/model is not producing the output as panoptic segmentation, because the output has only the following keys: ['image_size', 'pred_scores', 'pred_labels', 'pred_masks'].

I wanted to ask if you have encountered a similar issue, and whether you have any suggestions on how to run inference/evaluation in panoptic-segmentation mode on the VIPSeg dataset. Maybe I am configuring something wrong?
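
A small diagnostic sketch, assuming the detectron2-style model and data_loader built in train_net_video.py: run a single batch and print the output keys before anything reaches the evaluator, to confirm whether the panoptic head ('segments_infos') is active for the loaded config and weights.

    import torch

    def inspect_output_keys(model, data_loader):
        # A panoptic-mode output should expose 'segments_infos';
        # the instance-mode keys are the ones listed above.
        model.eval()
        with torch.no_grad():
            inputs = next(iter(data_loader))
            outputs = model(inputs)
        print(sorted(outputs.keys()))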

Thank you for your help

Error while Evaluation

Hi, first of all, thank you for your work!

I want to reproduce the results of your paper in Table 1 (the best setting, i.e. DVIS++ with ViT-L on OVIS), so I downloaded offline_vitl_ovis_534.pth.

about the number of gpus or IMS_PER_BATCH

Hello, thank you for your great work and for sharing the code!

I noticed there is a $NUM_GPUS value overwriting the config's IMS_PER_BATCH in the command-line options. I am wondering what number of GPUs, or which IMS_PER_BATCH, you used to obtain the reported performance. Do you use the default values in the config file?
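
For reference, in detectron2 SOLVER.IMS_PER_BATCH is the total batch size summed over all GPUs, so the per-GPU load follows from both values; the numbers below are placeholders, not the authors' settings:

    ims_per_batch = 8  # SOLVER.IMS_PER_BATCH from the config (placeholder)
    num_gpus = 4       # value passed via --num-gpus (placeholder)

    # detectron2 splits the total batch evenly across the GPUs.
    per_gpu_batch = ims_per_batch // num_gpus
    print(per_gpu_batch)  # -> 2 clips per GPU per iteration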

I hope you can answer the question. Thank you very much!

error running

[screenshot]
There was an error while running; this is my code:
[screenshot]
I've downloaded the model.

Custom dataset

Dear Mr. Zhang,

Could you please give me some advice on introducing custom datasets for training?

I want to segment a video with 2 fish swimming next to each other. I tried inferring instances using the OVIS-pretrained ViT-L model and it worked well. I want to further improve the segmentation by training DVIS_Plus on a combination of OVIS and my custom dataset.
In particular, I have labeled "fish" instances in some frames of my video (about 800 frames with 2 instances each) and saved the annotation file in COCO .json format.

But I am confused about how I should register the dataset. I tried adding a function to register the custom dataset in builtin.py:

import os

from detectron2.data.datasets import register_coco_instances

# _PREDEFINED_SPLITS_COCO_CUSTOM maps dataset name -> (image_root, json_file),
# e.g. {"custom_train": ("custom/images", "custom/annotations/train.json")}.

def register_all_coco_custom(root):
    for key, (image_root, json_file) in _PREDEFINED_SPLITS_COCO_CUSTOM.items():
        # Assume pre-defined datasets live in `./datasets`
        register_coco_instances(
            key,
            {},
            os.path.join(root, json_file) if "://" not in json_file else json_file,
            os.path.join(root, image_root),
        )

This worked well when I tried a Detectron2 model separately before.
But now I get the following error: AttributeError: Attribute 'thing_dataset_id_to_contiguous_id' does not exist in the metadata of dataset 'custom_train'. Available keys are dict_keys(['name', 'json_file', 'image_root', 'evaluator_type']).

What additional information should I provide? How can I run training on both the custom and OVIS datasets? Also, does the custom dataset have to consist of consecutive frames, or are random frames also okay? Thank you very much in advance!
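
One hedged suggestion for the missing-metadata error (a minimal sketch; the category list here is an assumed example): the video pipeline reads thing_dataset_id_to_contiguous_id from the dataset metadata, and it can be set explicitly on the registered split via detectron2's MetadataCatalog:

    from detectron2.data import MetadataCatalog

    categories = [{"id": 1, "name": "fish"}]  # assumed custom category list

    MetadataCatalog.get("custom_train").set(
        thing_classes=[c["name"] for c in categories],
        # Map the dataset's category ids to contiguous 0-based ids.
        thing_dataset_id_to_contiguous_id={c["id"]: i for i, c in enumerate(categories)},
    )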

Download model weights baidu

Thanks for sharing your repo with the public.
My issue is not related to the code but to downloading the weights from the model zoo.
I was able to download the ResNet50 weights (Mask2Former (instance) | R50 | COCO | OVIS, YTVIS19&21), where a direct link was provided: https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/instance/maskformer2_R50_bs16_50ep/model_final_3c8ec9.pkl
Now I want to download the weights of the ViT-L model, but I was not able to start the download. Not knowing any Chinese, I have so far only downloaded the BaiduNetDisk installer. Is there a way to download the weights directly, without creating a Baidu user account or installing the app?

convert to onnx

Dear Mr. Zhang,
Thank you for your great work.

I want to convert the DVIS_Plus model to ONNX. Is there a deployment Python script?
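
A minimal, untested export sketch, assuming the model can be wrapped to accept a plain frame tensor; the wrapper, input shape, and opset below are assumptions, and video segmentation models with dict-style inputs/outputs usually need more export work than this:

    import torch

    class OnnxWrapper(torch.nn.Module):
        # Expose a plain-tensor interface, since torch.onnx.export cannot
        # trace detectron2-style list-of-dict inputs directly.
        def __init__(self, model):
            super().__init__()
            self.model = model

        def forward(self, frames):  # frames: (T, 3, H, W) float tensor (assumed)
            return self.model(frames)

    # model = ...  # build DVIS_Plus and load the weights first
    # dummy = torch.randn(1, 3, 480, 640)
    # torch.onnx.export(OnnxWrapper(model), (dummy,), "dvis_plus.onnx", opset_version=16)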

Temporal Refiner in Online Mode

First of all, congratulations on really great work.

I had a question about the temporal refiner in the online setting: would it not be possible to still use a temporal refiner when processing the current frame, so that its segmentation is improved by the past frames? It seems you are not using any temporal refinement in the online processing setting.
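
To make the question concrete, here is a rough sketch of the idea; all names are hypothetical, and this online-refiner mode is not something the repo ships: keep a rolling buffer of per-frame queries and run the refiner over the window ending at the current frame, emitting only the newest frame's result.

    from collections import deque

    class OnlineRefiner:
        def __init__(self, refiner, window=5):
            self.refiner = refiner             # temporal refiner module (hypothetical)
            self.buffer = deque(maxlen=window)

        def step(self, frame_queries):
            # Refine the most recent `window` frames jointly, but emit only
            # the current frame so the pipeline stays online.
            self.buffer.append(frame_queries)
            refined = self.refiner(list(self.buffer))
            return refined[-1]

The obvious cost is that each frame is processed up to `window` times, trading compute and latency for the temporal consistency the refiner provides.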

Many thanks in advance

VIPseg panoptic_gt_VIPSeg_test.json

Hi,
Thank you for open-sourcing your work! I downloaded your processed VIPSeg data and unzipped it, but found that there is no panoptic_gt_VIPSeg_test.json. How can I get this file?

Cannot load pretrained weight

Dear Mr. Zhang,
Firstly, I want to thank you for your efforts in developing and maintaining DVIS_Plus.

I'm writing to report an issue I encountered while trying to load model weights from the repository. Specifically, when I tried to retrain the Segmenter on the VSPW dataset, the weights for the pretrained Segmenter Mask2Former (panoptic) ViT-L seem to be problematic. I've successfully loaded the other weights provided in the repository, but when attempting to load this particular one, I receive the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/vspw/lib/python3.8/site-packages/torch/serialization.py", line 600, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/root/anaconda3/envs/vspw/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

I've tried downloading the file multiple times from both Baidu Pan and Hugging Face to rule out download corruption. Could you please check whether the model weight file for Mask2Former (panoptic) ViT-L is uploaded correctly, or whether there is a known issue with this file? Any guidance on how to resolve this would be greatly appreciated.
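
For what it's worth, checkpoints saved with torch.save on PyTorch >= 1.6 are zip archives, and "failed finding central directory" typically means a truncated or partially downloaded file. A quick integrity check (the filename is an assumption):

    import zipfile

    path = "mask2former_panoptic_vitl.pth"  # assumed local filename
    # False -> the file is not a readable zip, i.e. the download is corrupt.
    print(zipfile.is_zipfile(path))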

Can I get pretrained checkpoint files from another way?

Hi, first of all, thank you for your amazing work on this task.

I wonder whether I can get the pretrained checkpoint files some other way than Baidu drive (e.g. Google Drive)?
I cannot download from the Baidu link that you provided in the model zoo.

Thank you.
