orrzohar / prob
[CVPR 2023] Official Pytorch code for PROB: Probabilistic Objectness for Open World Object Detection
License: Apache License 2.0
Hi again Orr,
I'm having another issue where I can't run the fine-tuning part of the tasks. So far I have trained tasks 1 and 2, but fine-tuning for task 2 gives me an error. It seems like the script only runs with the flag --freeze_prob_model. Do you know why this might be happening?
# train task 2
PY_ARGS=${@:1}
python -u main_open_world.py \
    --output_dir "${EXP_DIR}/t2" --dataset fathomnet --PREV_INTRODUCED_CLS 10 --CUR_INTRODUCED_CLS 2 \
    --train_set 'task2_train' --test_set 'all_test' --epochs 51 \
    --model_type 'prob' --obj_loss_coef 8e-4 --obj_temp 1.3 --freeze_prob_model \
    --wandb_name "${WANDB_NAME}_t2" \
    --exemplar_replay_selection --exemplar_replay_max_length 1743 --exemplar_replay_dir ${WANDB_NAME} \
    --exemplar_replay_prev_file "task1_train_ft.txt" --exemplar_replay_cur_file "task2_train_ft.txt" \
    --pretrain "${EXP_DIR}/t1/checkpoint0040.pth" --lr 2e-5 \
    ${PY_ARGS}

# fine tune task 2
PY_ARGS=${@:1}
python -u main_open_world.py \
    --output_dir "${EXP_DIR}/t2_ft" --dataset fathomnet --PREV_INTRODUCED_CLS 10 --CUR_INTRODUCED_CLS 2 \
    --train_set "${WANDB_NAME}/task2_train_ft" --test_set 'all_test' --epochs 111 --lr_drop 40 \
    --model_type 'prob' --obj_loss_coef 8e-4 --obj_temp 1.3 \
    --wandb_name "${WANDB_NAME}_t2_ft" \
    --pretrain "${EXP_DIR}/t2/checkpoint0050.pth" \
    ${PY_ARGS}
Dataset OWDetection
Number of datapoints: 1669
Root location: /home/sabrina/code/PROB/data/OWOD
[['test'], Compose(
<datasets.transforms.RandomResize object at 0x7f053807ef50>
Compose(
<datasets.transforms.ToTensor object at 0x7f053807ef80>
<datasets.transforms.Normalize object at 0x7f053807f0a0>
)
)]
Initialized from the pre-training model
<All keys matched successfully>
Start training from epoch 51 to 111
Traceback (most recent call last):
File "/home/sabrina/code/PROB/main_open_world.py", line 475, in <module>
main(args)
File "/home/sabrina/code/PROB/main_open_world.py", line 335, in main
train_stats = train_one_epoch(
File "/home/sabrina/code/PROB/engine.py", line 41, in train_one_epoch
prefetcher = data_prefetcher(data_loader, device, prefetch=True)
File "/home/sabrina/code/PROB/datasets/data_prefetcher.py", line 21, in __init__
self.preload()
File "/home/sabrina/code/PROB/datasets/data_prefetcher.py", line 25, in preload
self.next_samples, self.next_targets = next(self.loader)
File "/home/sabrina/mambaforge/envs/prob/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
data = self._next_data()
File "/home/sabrina/mambaforge/envs/prob/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
return self._process_data(data)
File "/home/sabrina/mambaforge/envs/prob/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/home/sabrina/mambaforge/envs/prob/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/sabrina/mambaforge/envs/prob/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/sabrina/mambaforge/envs/prob/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/sabrina/mambaforge/envs/prob/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/sabrina/code/PROB/datasets/torchvision_datasets/open_world.py", line 328, in __getitem__
img, target = self.transforms[-1](img, target)
File "/home/sabrina/code/PROB/datasets/transforms.py", line 275, in __call__
image, target = t(image, target)
File "/home/sabrina/code/PROB/datasets/transforms.py", line 233, in __call__
return self.transforms2(img, target)
File "/home/sabrina/code/PROB/datasets/transforms.py", line 275, in __call__
image, target = t(image, target)
File "/home/sabrina/code/PROB/datasets/transforms.py", line 207, in __call__
return resize(img, target, size, self.max_size)
File "/home/sabrina/code/PROB/datasets/transforms.py", line 125, in resize
scaled_boxes = boxes * torch.as_tensor([ratio_width, ratio_height, ratio_width, ratio_height])
RuntimeError: The size of tensor a (0) must match the size of tensor b (4) at non-singleton dimension 0
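The last frame fails while multiplying `boxes` by four scale ratios, and the message says `boxes` has size 0. That suggests at least one image in the fine-tuning list has no annotated boxes. The sketch below is one way to look for such images. It assumes the annotations are Pascal VOC style XML files with one `<object>` element per box; that layout and both function names are assumptions, not something shown in this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def has_objects(xml_text):
    """True if a VOC-style annotation contains at least one <object> element."""
    return ET.fromstring(xml_text).find("object") is not None


def ids_without_objects(annotation_dir, image_ids):
    """List ids whose annotation file is missing or contains no objects.

    annotation_dir and the "<id>.xml" naming are assumptions about the
    dataset layout; adjust them to match the actual folder structure.
    """
    bad = []
    for image_id in image_ids:
        xml_path = Path(annotation_dir) / f"{image_id}.xml"
        if not xml_path.is_file() or not has_objects(xml_path.read_text()):
            bad.append(image_id)
    return bad
```

Running `ids_without_objects` on the ids listed in the generated fine-tuning file would show whether empty images are present; removing them from the list is one possible workaround.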
Hi Orr,
I'm trying to run the code on my custom data. Now the training starts but it seems like it stops by the end of the first epoch when it's evaluating the results, but I'm not sure why this is the case. Do you have any ideas about what could be causing this issue?
What I'm running:
python -u main_open_world.py \
    --output_dir "${EXP_DIR}/t1" --dataset fathomnet --PREV_INTRODUCED_CLS 0 --CUR_INTRODUCED_CLS 10 \
    --train_set 'task1_train_ft' --test_set 'all_test' --epochs 1 \
    --model_type 'prob' --obj_loss_coef 8e-4 --obj_temp 1.3 \
    --wandb_name "${WANDB_NAME}_t1" --exemplar_replay_selection --exemplar_replay_max_length 850 \
    --exemplar_replay_dir ${WANDB_NAME} --exemplar_replay_cur_file "task1_train_ft.txt" \
    ${PY_ARGS}
The error:
Epoch: [0] Total time: 0:34:28 (2.0770 s / it)
Averaged stats: lr: 0.000200 class_error: 94.23 grad_norm: 79.76 loss: 17.0468 (22.1405) loss_bbox: 0.5209 (0.8674) loss_bbox_0: 0.5404 (0.8677) loss_bbox_1: 0.5110 (0.8647) loss_bbox_2: 0.4962 (0.8638) loss_bbox_3: 0.5127 (0.8650) loss_bbox_4: 0.5242 (0.8643) loss_ce: 0.9692 (1.1414) loss_ce_0: 0.9991 (1.1387) loss_ce_1: 0.9678 (1.1351) loss_ce_2: 0.9384 (1.1382) loss_ce_3: 0.9560 (1.1357) loss_ce_4: 0.9526 (1.1354) loss_giou: 1.2438 (1.5470) loss_giou_0: 1.2195 (1.5511) loss_giou_1: 1.2046 (1.5455) loss_giou_2: 1.2071 (1.5467) loss_giou_3: 1.1704 (1.5451) loss_giou_4: 1.1755 (1.5464) loss_obj_ll: 0.0963 (0.1336) loss_obj_ll_0: 0.1126 (0.1440) loss_obj_ll_1: 0.1067 (0.1403) loss_obj_ll_2: 0.1058 (0.1419) loss_obj_ll_3: 0.1018 (0.1404) loss_obj_ll_4: 0.1040 (0.1409) cardinality_error_unscaled: 2.4000 (3.4721) cardinality_error_0_unscaled: 2.5000 (3.4544) cardinality_error_1_unscaled: 2.6000 (3.4567) cardinality_error_2_unscaled: 2.6000 (3.4612) cardinality_error_3_unscaled: 2.6000 (3.4592) cardinality_error_4_unscaled: 2.5000 (3.4591) class_error_unscaled: 95.8333 (98.9971) loss_bbox_unscaled: 0.1042 (0.1735) loss_bbox_0_unscaled: 0.1081 (0.1735) loss_bbox_1_unscaled: 0.1022 (0.1729) loss_bbox_2_unscaled: 0.0992 (0.1728) loss_bbox_3_unscaled: 0.1025 (0.1730) loss_bbox_4_unscaled: 0.1048 (0.1729) loss_ce_unscaled: 0.4846 (0.5707) loss_ce_0_unscaled: 0.4996 (0.5693) loss_ce_1_unscaled: 0.4839 (0.5676) loss_ce_2_unscaled: 0.4692 (0.5691) loss_ce_3_unscaled: 0.4780 (0.5679) loss_ce_4_unscaled: 0.4763 (0.5677) loss_giou_unscaled: 0.6219 (0.7735) loss_giou_0_unscaled: 0.6098 (0.7756) loss_giou_1_unscaled: 0.6023 (0.7727) loss_giou_2_unscaled: 0.6036 (0.7733) loss_giou_3_unscaled: 0.5852 (0.7726) loss_giou_4_unscaled: 0.5877 (0.7732) loss_obj_ll_unscaled: 120.4147 (166.9937) loss_obj_ll_0_unscaled: 140.7218 (179.9712) loss_obj_ll_1_unscaled: 133.4184 (175.3988) loss_obj_ll_2_unscaled: 132.2273 (177.4261) loss_obj_ll_3_unscaled: 127.2509 (175.5592) loss_obj_ll_4_unscaled: 
129.9692 (176.1760)
testing data details
21
20
('Urchin', 'Fish', 'Sea star', 'Anemone', 'Sea cucumber', 'Sea pen', 'Sea fan', 'Worm', 'Crab', 'Gastropod')
('Urchin', 'Fish', 'Sea star', 'Anemone', 'Sea cucumber', 'Sea pen', 'Sea fan', 'Worm', 'Crab', 'Gastropod', 'Shrimp', 'Soft coral', 'Glass sponge', 'Feather star', 'Eel', 'Squat lobster', 'Barnacle', 'Stony coral', 'Black coral', 'Sea spider', 'unknown')
/home/sabrina/code/PROB/models/prob_deformable_detr.py:537: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
topk_boxes = topk_indexes // out_logits.shape[2]
Test: [ 0/167] eta: 0:02:11 time: 0.7896 data: 0.4684 max mem: 22891
/home/sabrina/code/PROB/models/prob_deformable_detr.py:537: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
topk_boxes = topk_indexes // out_logits.shape[2]
Test: [ 10/167] eta: 0:00:57 time: 0.3635 data: 0.0485 max mem: 22891
Test: [ 20/167] eta: 0:00:50 time: 0.3197 data: 0.0066 max mem: 22891
Test: [ 30/167] eta: 0:00:46 time: 0.3211 data: 0.0065 max mem: 22891
Test: [ 40/167] eta: 0:00:42 time: 0.3221 data: 0.0063 max mem: 22891
Test: [ 50/167] eta: 0:00:38 time: 0.3234 data: 0.0064 max mem: 22891
Test: [ 60/167] eta: 0:00:35 time: 0.3250 data: 0.0065 max mem: 22891
Test: [ 70/167] eta: 0:00:31 time: 0.3213 data: 0.0062 max mem: 22891
Test: [ 80/167] eta: 0:00:28 time: 0.3114 data: 0.0060 max mem: 22891
Test: [ 90/167] eta: 0:00:24 time: 0.3013 data: 0.0059 max mem: 22891
Test: [100/167] eta: 0:00:21 time: 0.3026 data: 0.0058 max mem: 22891
Test: [110/167] eta: 0:00:18 time: 0.3170 data: 0.0060 max mem: 22891
Test: [120/167] eta: 0:00:15 time: 0.3320 data: 0.0065 max mem: 22891
Test: [130/167] eta: 0:00:11 time: 0.3400 data: 0.0068 max mem: 22891
Test: [140/167] eta: 0:00:08 time: 0.3419 data: 0.0067 max mem: 22891
Test: [150/167] eta: 0:00:05 time: 0.3419 data: 0.0067 max mem: 22891
Test: [160/167] eta: 0:00:02 time: 0.3455 data: 0.0068 max mem: 22891
Test: [166/167] eta: 0:00:00 time: 0.3433 data: 0.0066 max mem: 22891
Test: Total time: 0:00:54 (0.3289 s / it)
Urchin has 5789 predictions.
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
Traceback (most recent call last):
File "/home/sabrina/code/PROB/main_open_world.py", line 475, in <module>
main(args)
File "/home/sabrina/code/PROB/main_open_world.py", line 343, in main
test_stats, coco_evaluator = evaluate(
File "/home/sabrina/mambaforge/envs/prob/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/sabrina/code/PROB/engine.py", line 151, in evaluate
coco_evaluator.accumulate()
File "/home/sabrina/code/PROB/datasets/open_world_eval.py", line 136, in accumulate
self.num_unk, self.tp_plus_fp_closed_set, self.fp_open_set = voc_eval(lines_by_class, \
File "/home/sabrina/code/PROB/datasets/open_world_eval.py", line 410, in voc_eval
R = class_recs[image_ids[d]]
KeyError: '3895_'
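The `KeyError` means a detection refers to the image id `'3895_'`, for which no ground-truth record was loaded. Two possible causes, neither confirmed in this thread: the image id contains whitespace and gets truncated if the evaluator splits detection lines on spaces (as Pascal VOC style evaluators commonly do), or the test image set lists ids that have no annotation. A small sketch for checking both; the function name and inputs are hypothetical.

```python
def check_image_ids(image_set_ids, annotated_ids):
    """Report ids that contain whitespace (or are blank) and ids without ground truth."""
    annotated = set(annotated_ids)
    # An id is safe for space-separated files only if it is a single token.
    with_whitespace = [i for i in image_set_ids if len(i.split()) != 1]
    missing = [i for i in image_set_ids if i not in annotated]
    return {"with_whitespace": with_whitespace, "missing_annotation": missing}
```

Here `image_set_ids` would be the lines of the test image set file and `annotated_ids` the annotation file names without extension.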
Hi Orr,
I managed to train task 1 and now I would like to get some predictions on a couple test images. I'm reading through issues trying to figure out what I have to do to run inference, but I'm still confused. How do I get predictions from an image?
I noticed that you do not use the top 5 pseudo-labels (which are not overlapping with the ground truth) to train the unknown detection.
Could you please tell me why? Because most of the OWOD methods such as OW-DETR adopt this strategy.
Best regards,
Yulin
Hi. Thanks for the amazing work!
I used your network on my custom dataset with 7 categories, without changing any other parameters. After training, known objects are detected well, but there are many unknown objects with high confidence, and even after NMS the results still do not look good. Here is my visualization:
By the way, while training on my data, the script also reports that many unknown objects are detected. Is this normal? Looking forward to your reply.
I didn't find the call of the class FullProbObjectnessHead in the code, where is the call of this class?
Hi,
Because our group only assigned me a 2080Ti, training took a long time: MOWODB task 1 took 43 hours.
Unfortunately, wandb crashed at the 35th epoch, so its curves also stop at the 35th epoch.
However, the program kept running without errors, the file "checkpoint0040.pth" was generated at the end, and training task 2 from this file runs smoothly.
Below are the wandb graphs and hyperparameters. The results are not very good, and I may need to tune the parameters to get as close to the original performance as possible.
K_AP50 is 52.476, U_R50 is 21.042
The download link for the pretrained weights is invalid. Please provide a new link, thanks!
There are a few minor installation problems. The first is when setting up the conda environment, torch packages could not be found. The second is trying to install multiscaledeformableattention before running the make.sh. These two result in errors. The third problem is running coco2voc.py, which will append to the original train.txt file, which is already up to date. It does not result in an error, but would double the train.txt file size.
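If train.txt has already been doubled by a second run of coco2voc.py, the repeated entries can be removed while keeping the original order. A minimal sketch; the helper name is hypothetical and it assumes one image id per line.

```python
def dedupe_lines(lines):
    """Drop blank and repeated entries, keeping the first occurrence of each id."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

One way to use it is to read train.txt with `open(path).readlines()`, pass the lines through `dedupe_lines`, and write the result back joined by newlines.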
Thanks for your great work. I am trying to integrate PROB into 3D domains, but I have run into a problem. After training, the cls reg branch produces values in [-3, 0] for the known categories and > 0 for the unknown, so after sigmoid and top-k selection all objects are classified as unknown. Could you give some insight on how to keep the model from classifying knowns as unknowns?
Hi, @orrzohar,
I'm facing the same issue as described in input #24. I followed the installation instructions and used four TITAN (12G) GPUs to run the code. I set the batch size to 2 and reduced the learning rate from 2e-4 to 8e-5, as well as the lr_backbone from 2e-5 to 8e-6.
However, even after 2 epochs, the class_error metrics remain quite high, ranging from 80 to 100. Is this normal? If not, what could be causing this problem?
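The learning rates quoted above are consistent with the linear scaling rule, where the learning rate is scaled in proportion to the effective batch size. A short sketch of that arithmetic; the reference effective batch size of 20 is an assumption chosen because it reproduces the 8e-5 and 8e-6 values, and the thread does not state it.

```python
def scale_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: learning rate proportional to effective batch size."""
    return base_lr * new_batch / base_batch


effective_batch = 4 * 2  # four GPUs with batch size 2 each (assuming batch size is per GPU)
reference_batch = 20     # ASSUMPTION: reference effective batch size, not given in this thread

lr = scale_lr(2e-4, reference_batch, effective_batch)
lr_backbone = scale_lr(2e-5, reference_batch, effective_batch)
print(lr, lr_backbone)
```

Linear scaling is a heuristic, so a high class_error in the first few epochs does not by itself show that the scaled values are wrong.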
Hello,
I am looking to work with your model and have been able to run and train the model, it would be really helpful if you could share the inference script that was used to get the results.
Hi @orrzohar
Thanks for your nice work and for sharing it! I want to reproduce your results with two 3090 GPUs. How should I set the hyperparameters, and which of them essentially need to be changed?
How to distinguish between mismatched backgrounds and unknown objects when penalizing matched Mahalanobis distances?
Hello, I would like to ask about my training progress on Task1. I have trained for several epochs, but the class_error has remained at 100.00. Is this normal?
Hello, I would like to know if two GPUs can be used for training? Will the results of training differ significantly?
'RuntimeError: Timed out initializing process group in store based barrier on rank: 2, for key: store_based_barrier_key:1 (world_size=3, worker_count, timeout=0:30:00)'
I would like to ask why this only occurs on rank 2. I can run one epoch normally, but every time I run the second epoch, an error occurs and one GPU stops running. Why is that? @orrzohar
When running the code, I get the following warnings. Will they have an impact?
UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
UserWarning: Arguments other than a weight enum or 'None' for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing 'weights = None'
Hi orr,
I'm sorry to bother you again. I encountered a problem when generating the "learned_owod_t2_ft.txt" file.
As shown in the first figure, it lists some blank or erroneous image ids. The count also comes to 1749 instead of the 1743 predefined in the config file.
However, when I generate "learned_owod_t2_ft.txt" with one GPU, everything is OK, as shown in the figure below.
I guess the problem may be the "dist.all_gather_object" function, which combines the output of all GPUs.
But I have no idea how to fix it.
Best regards,
harrylin
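One plausible explanation for the extra entries (an assumption, not confirmed in this thread) is that PyTorch's `DistributedSampler` pads the dataset by repeating samples so that every rank gets the same number of items, so results gathered across GPUs can contain duplicates. The sketch below merges per-rank results while dropping duplicates and blank ids. The `{image_id: score}` structure and the function name are hypothetical; the repository's gathered objects may look different.

```python
def merge_gathered(per_rank_scores):
    """Merge per-rank {image_id: score} dicts, skipping blank ids and duplicates."""
    merged = {}
    for rank_scores in per_rank_scores:
        for image_id, score in rank_scores.items():
            image_id = str(image_id).strip()
            # Keep the first score seen for each id; later repeats are ignored.
            if image_id and image_id not in merged:
                merged[image_id] = score
    return merged
```

Applied to the list filled by `dist.all_gather_object`, this would yield one entry per image before the top images are written to the exemplar file.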
I noticed while reading the code that you set 'loss_obj_ll' (in weight_dict) to 0.0008. How did you choose this value? @orrzohar
The paper states: "For class prediction, the learned objectness probability multiplies the classification probabilities to produce the final class predictions." This multiplication should be reflected in the code, but I could not find it.
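To make the quoted sentence concrete, here is a plain-Python sketch of multiplying class probabilities by an objectness probability. It assumes the objectness probability is obtained as exp(-T * energy) from the objectness head's output and that class probabilities come from a sigmoid; both are assumptions for illustration, and this is not the repository's code.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def final_class_scores(class_logits, obj_energy, temperature):
    """Scale per-class probabilities by an objectness probability exp(-T * energy)."""
    objectness = math.exp(-temperature * obj_energy)
    return [objectness * sigmoid(logit) for logit in class_logits]
```

Under this assumption, a larger energy gives a smaller objectness probability and therefore lower final scores for every class of that query.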
Hi, thanks for your great work!
During inference, I find that your model may assign multiple class labels to the same token. Is this reasonable?
Hi, I'm having a 'wandb' issue when running the 'run.sh' file. I found a way not to use wandb in your other answers, and I wonder whether not using wandb will have an impact on the experiment. Looking forward to hearing from you.
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.wandb.ai/graphql
wandb.errors.CommError: Permission denied, ask the project owner to grant you access
wandb: ERROR Internal wandb error: file data was not synced
Problem at: /home/workspace/prob/main_open_world.py 165 main
wandb.errors.MailboxError: transport failed
wandb: While tearing down the service manager. The following error has occured: [Errno 32] Broken pipe
Hi, I was wondering if you used a VoC Incremental Fine Tune Exemplar Set for each of the three settings and if it's in the repository?
Hello, thanks for your wonderful works!
When I run run.sh, I encounter this problem:
No such file or directory: './data/OWOD/ImageSets/TOWOD/PROB_V1/learned_owod_t4_ft.txt'
and
subprocess.CalledProcessError: Command '['./configs/M_OWOD_BENCHMARK.sh']' returned non-zero exit status 1.
How can I solve these problems?
I ran the .run_eval.sh script. After solving errors with shared memory and an integer error in open_world_eval.py, the prediction results were displayed in the terminal but not saved in the /exps/MOWODB/PROB/eval folder. Can someone help me with this?
How can I test an image and visualize the results? Could you provide a .py file like OWOD does?
Best,
Hello, I have trained 4 tasks on my own dataset. The mAP scores are increasing from Task 1 to Task 4, with Task 1 being 45.3 and Task 4 exceeding 80. I want to know how to set the number of epochs for each task. Should I train until over 80 for Task 1, which may require a large number of epochs? Another question is about the batch size. Due to batch size reasons, I set the learning rate to 8e-5. I noticed that you changed the learning rate to 2e-5 for Task 2, 3, and 4. How should I adjust it appropriately?
Hello. How do I train the "19+1", "15+5" and "10+10" settings?
Dear authors,
When I run python test.py, there is an error: Cuda out of memory. Could I use multiple GPUs to run this command?
A100(80G),Nothing else has changed.
{"K_AP50": 59.38904571533203, "K_P50": 21.074942637087915, "K_R50": 72.52758104006436, "U_AP50": 0.6464414000511169, "U_P50": 0.4288344914478119, "U_R50": 16.88679245283019, "epoch": 40}, "test_coco_eval_bbox": [14.671942710876465, 14.671942710876465, 78.46551513671875, 58.18337631225586, 64.30726623535156, 50.592430114746094, 29.676156997680664, 71.94124603271484, 56.22311782836914, 82.22350311279297, 27.28054428100586, 71.0342788696289, 22.341707229614258, 82.27958679199219, 71.79204559326172, 68.34331512451172, 49.77190017700195, 35.397483825683594, 71.02239227294922, 50.98625564575195, 83.90058135986328, 62.01821517944336, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6464414000511169], "epoch": 40, "n_parameters": 39742295}@orrzohar Thanks
How can I get the visualization results in Figure 3 of the paper?
Thanks!
Thanks for releasing the code. How long does your model need to train?
Hi @orrzohar. During inference, when I execute class PostProcess, the value of the output temperature is 0.005078125. In the paper, I see it is 1.3. Why is this?
Originally posted by @YH-2023 in #43 (comment)
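As a quick arithmetic check, the reported value 0.005078125 is exactly 1.3 / 256. That would be consistent with the `--obj_temp 1.3` setting being divided by 256 somewhere before PostProcess uses it. Reading 256 as the transformer's hidden dimension is an assumption here, not something this thread confirms.

```python
obj_temp = 1.3    # value passed as --obj_temp in the scripts quoted in this thread
hidden_dim = 256  # ASSUMPTION: the divisor is the model's hidden dimension

effective_temperature = obj_temp / hidden_dim
print(effective_temperature)
```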
I would like to know your evaluation criteria. When you predict 100 targets, does a prediction count as correct if it has an IoU > 0.5 with a GT box, and as incorrect otherwise? Is this understanding correct?
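For reference, Pascal VOC style evaluation adds one detail to this picture: each ground-truth box can be matched at most once, so a second detection on the same object counts as a false positive even if its IoU exceeds the threshold. The sketch below illustrates that rule for one class in one image. It is a generic illustration with hypothetical function names, not the repository's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(detections, gt_boxes, thresh=0.5):
    """Count true positives among (score, box) detections of one class in one image."""
    matched = set()
    tp = 0
    # Higher-scoring detections get the first chance to claim a ground-truth box.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou > thresh and best_idx not in matched:
            matched.add(best_idx)
            tp += 1
    return tp
```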
I am happy to help! Yes, these are very similar to the curves I got on my machine:
I actually looked into why this happens and this has to do with PROB learning a good representation of objectness very early on (which is why U-Recall initially jumps, if you plot U-Recall inside epoch 1 you will see it increase from ~0 to 19). Then, as training progresses, it starts declining as it starts making more known object predictions, and therefore less unknown object predictions (e.g., ~U-Recall@100 goes down to ~U-Recall@80).
I will update the readme with this hyper parameter setup & machine type for future users.
If you encounter any new issues - do not hesitate to reach out, Orr
It is indeed like this. We can see that U_R50 has decreased as training has gone on. I am quite puzzled: why not choose the model with the highest U_R50? @orrzohar
> As you can see, I got the same results as the @orrzohar show in the paper. I wonder how many cards you used with batch_size = 2. I think if you use a single card, the result may be worse than I got (I used four cards with batch_size = 3) @Rzx520 . By the way, what are your final results? Are they far from the authors' results?
I used four cards with batch_size = 3, and the result is:
{"train_lr": 1.999999999999943e-05, "train_class_error": 15.52755644357749, "train_grad_norm": 119.24543388206256, "train_loss": 5.189852057201781, "train_loss_bbox": 0.2700958194790585, "train_loss_bbox_0": 0.29624945830832017, "train_loss_bbox_1": 0.27978440371434526, "train_loss_bbox_2": 0.275065722955665, "train_loss_bbox_3": 0.27241891570675625, "train_loss_bbox_4": 0.27063051075218725, "train_loss_ce": 0.18834440561282928, "train_loss_ce_0": 0.27234036786085974, "train_loss_ce_1": 0.23321395799885028, "train_loss_ce_2": 0.20806531186409408, "train_loss_ce_3": 0.19453731594314128, "train_loss_ce_4": 0.18820172232765492, "train_loss_giou": 0.3351372324140976, "train_loss_giou_0": 0.3679243937037491, "train_loss_giou_1": 0.3483400315024699, "train_loss_giou_2": 0.34171414935044225, "train_loss_giou_3": 0.3379105142249501, "train_loss_giou_4": 0.3368650070453053, "train_loss_obj_ll": 0.02471167313379382, "train_loss_obj_ll_0": 0.034151954339996814, "train_loss_obj_ll_1": 0.03029250531194649, "train_loss_obj_ll_2": 0.0288731191750343, "train_loss_obj_ll_3": 0.028083207809715446, "train_loss_obj_ll_4": 0.026900355121292352, "train_cardinality_error_unscaled": 0.44506890101437985, "train_cardinality_error_0_unscaled": 0.6769398279525907, "train_cardinality_error_1_unscaled": 0.5726976196583499, "train_cardinality_error_2_unscaled": 0.4929900999093851, "train_cardinality_error_3_unscaled": 0.46150593285633223, "train_cardinality_error_4_unscaled": 0.45256225438417086, "train_class_error_unscaled": 15.52755644357749, "train_loss_bbox_unscaled": 0.054019163965779084, "train_loss_bbox_0_unscaled": 0.059249891647616536, "train_loss_bbox_1_unscaled": 0.055956880831476395, "train_loss_bbox_2_unscaled": 0.055013144572493046, "train_loss_bbox_3_unscaled": 0.054483783067331704, "train_loss_bbox_4_unscaled": 0.05412610215448962, "train_loss_ce_unscaled": 0.09417220280641464, "train_loss_ce_0_unscaled": 0.13617018393042987, "train_loss_ce_1_unscaled": 0.11660697899942514, 
"train_loss_ce_2_unscaled": 0.10403265593204704, "train_loss_ce_3_unscaled": 0.09726865797157064, "train_loss_ce_4_unscaled": 0.09410086116382746, "train_loss_giou_unscaled": 0.1675686162070488, "train_loss_giou_0_unscaled": 0.18396219685187454, "train_loss_giou_1_unscaled": 0.17417001575123495, "train_loss_giou_2_unscaled": 0.17085707467522113, "train_loss_giou_3_unscaled": 0.16895525711247505, "train_loss_giou_4_unscaled": 0.16843250352265265, "train_loss_obj_ll_unscaled": 30.889592197686543, "train_loss_obj_ll_0_unscaled": 42.68994404527915, "train_loss_obj_ll_1_unscaled": 37.86563257517548, "train_loss_obj_ll_2_unscaled": 36.09139981038161, "train_loss_obj_ll_3_unscaled": 35.10401065181873, "train_loss_obj_ll_4_unscaled": 33.62544476769816, "test_metrics": {"WI": 0.05356004827184098, "AOSA": 5220.0, "CK_AP50": 58.3890380859375, "CK_P50": 25.75118307055908, "CK_R50": 71.51227713815234, "K_AP50": 58.3890380859375, "K_P50": 25.75118307055908, "K_R50": 71.51227713815234, "U_AP50": 2.7862398624420166, "U_P50": 0.409358215516747, "U_R50": 16.530874785591767}, "test_coco_eval_bbox": [14.451444625854492, 14.451444625854492, 77.8148193359375, 57.15019607543945, 66.93928527832031, 49.282108306884766, 27.985671997070312, 70.54130554199219, 55.28901290893555, 82.7206039428711, 26.307403564453125, 65.15182495117188, 21.9127197265625, 77.91541290283203, 73.61457061767578, 67.8846206665039, 49.1287841796875, 36.78118896484375, 69.1879653930664, 53.060150146484375, 79.1402359008789, 59.972835540771484, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7862398624420166], "epoch": 40, "n_parameters": 39742295}
The authors' results are:
U-R: 19.4, K-AP: 59.5
Why can the authors' performance not be reached?
@Hatins @orrzohar
Originally posted by @Rzx520 in #26 (comment)
Hi @orrzohar, thanks for your great work. I have some questions about visualizing known and unknown objects.
My code for visualization is as follows:
import os

import matplotlib.pyplot as plt
import torch
from torchvision.ops.boxes import batched_nms
from tqdm import tqdm

import util.misc as utils  # MetricLogger / SmoothedValue; module path assumed from DETR-style repos
# plot_ori_image and plot_prediction are the poster's own plotting helpers (not shown in the post).


### You can choose confidence: the default value of confidence is 0.7
def filter_boxes(scores, boxes, confidence=0.7, apply_nms=True, iou=0.5):
    # Keep queries whose best class score exceeds the confidence threshold.
    keep = scores.max(-1).values > confidence
    scores, boxes = scores[keep], boxes[keep]
    if apply_nms:
        # batched_nms suppresses overlaps only among boxes that share a label.
        top_scores, labels = scores.max(-1)
        keep = batched_nms(boxes, top_scores, labels, iou)
        scores, boxes = scores[keep], boxes[keep]
    return scores, boxes


@torch.no_grad()
def viz(model, criterion, postprocessors, data_loader, base_ds, device, output_dir, args):
    dataset = args.dataset
    os.makedirs(output_dir, exist_ok=True)
    model.eval()
    criterion.eval()
    metric_logger = utils.MetricLogger(delimiter=" ")
    metric_logger.add_meter('class_error', utils.SmoothedValue(window_size=1, fmt='{value:.2f}'))
    use_topk = True
    num_obj = 20
    for batch_idx, (samples, targets) in enumerate(tqdm(data_loader)):
        # Only visualize the first 10 batches.
        if batch_idx >= 10:
            break
        samples = samples.to(device)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        top_k = len(targets[0]['boxes'])
        # outputs = model(samples)
        # indices = outputs['pred_logits'][0].softmax(-1)[..., 1].sort(descending=True)[1][:top_k]
        # predicted_boxes = torch.stack([outputs['pred_boxes'][0][i] for i in indices])
        # logits = torch.stack([outputs['pred_logits'][0][i] for i in indices])
        # scores_softmax = logits.softmax(-1)[:, :-1]
        # labels = scores_softmax.argmax(axis=1)
        # scores = scores_softmax.max(-1).values
        outputs = model(samples)
        # probas = outputs['pred_logits'].softmax(-1)[0, :, :-1].cpu()
        probas = outputs['pred_logits'].softmax(-1)[0, :, :].cpu()
        pred_objs = outputs['pred_obj'].softmax(-1)[0, :].cpu()  # computed but not used below
        predicted_boxes = outputs['pred_boxes'][0,].cpu()
        scores, predicted_boxes = filter_boxes(probas, predicted_boxes)
        labels = scores.argmax(axis=1)
        scores = scores.max(-1).values
        fig, ax = plt.subplots(1, 3, figsize=(10, 3), dpi=200)
        # Ori Picture
        plot_ori_image(
            samples.tensors[0:1],
            ax[0],
            plot_prob=False,
        )
        ax[0].set_title('Original Image')
        # Pred results
        # if not control the number of labels
        if not use_topk:
            plot_prediction(
                samples.tensors[0:1],
                scores[-num_obj:],
                predicted_boxes[-num_obj:],
                labels[-num_obj:],
                ax[1],
                plot_prob=False,
                dataset=dataset,
            )
        # if control the number of labels
        if use_topk:
            plot_prediction(
                samples.tensors[0:1],
                scores[-top_k:],
                predicted_boxes[-top_k:],
                labels[-top_k:],
                ax[1],
                plot_prob=False,
                dataset=dataset,
            )
        ax[1].set_title('Prediction (Ours)')
        # GT Results
        plot_prediction(
            samples.tensors[0:1],
            torch.ones(targets[0]['boxes'].shape[0]),
            targets[0]['boxes'],
            targets[0]['labels'],
            ax[2],
            plot_prob=False,
            dataset=dataset,
        )
        ax[2].set_title('GT')
        for i in range(3):
            ax[i].set_aspect('equal')
            ax[i].set_axis_off()
        plt.savefig(os.path.join(output_dir, f'img_{int(targets[0]["image_id"][0])}.jpg'))
Here are some visualization results:
When I set probas = outputs['pred_logits'].softmax(-1)[0, :, :-1].cpu(), only known objects are shown (because it only considers the known classes 0-79):
When I set probas = outputs['pred_logits'].softmax(-1)[0, :, :].cpu(), both known and unknown objects are shown (because it considers classes 0-80, i.e. the known classes plus unknown):
When I show the boxes of known and unknown objects in the picture, there are many overlapping boxes on known objects and many boxes that do not contain objects. Can you tell me how to modify the code to get reasonable results?
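Two things in the posted code may contribute to the overlapping boxes, though neither is confirmed in this thread. First, `batched_nms` suppresses overlaps only among boxes with the same label, so a known box and an unknown box on the same object both survive. Second, if `pred_boxes` follow the DETR convention of normalized (cx, cy, w, h), they need converting to (x1, y1, x2, y2) before NMS. The plain-Python sketch below shows the conversion and a class-agnostic greedy NMS; the function names are hypothetical.

```python
def cxcywh_to_xyxy(box):
    """Convert a (cx, cy, w, h) box to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over all boxes regardless of label; returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any higher-scoring kept box.
        if all(box_iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

With tensors, `torchvision.ops.nms` applied to all boxes at once gives the same class-agnostic behavior.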
Thanks for your great work.
In the file "prob_deformable_detr.py", there are two classes, ProbObjectnessHead and FullProbObjectnessHead, and the latter one (Full) is not used. But the full one seems more consistent with the paper, while ProbObjectnessHead is the one used even though it is very simple. Do these two classes actually play the same role? Can you explain this in more detail? Thanks.
Is it true that the larger the Pred_obj value output by the model, the lower the objectness, or did I make a mistake?
Hi Orr, everything was going smoothly until I got to task 4. At the end when generating the exemplar file for task 4, I got the following error:
Urchin has 1 predictions.
Fish has 0 predictions.
Sea star has 0 predictions.
Anemone has 0 predictions.
Sea cucumber has 0 predictions.
Sea pen has 0 predictions.
Sea fan has 19 predictions.
Worm has 0 predictions.
Crab has 0 predictions.
Gastropod has 0 predictions.
Shrimp has 0 predictions.
Soft coral has 0 predictions.
Glass sponge has 0 predictions.
Feather star has 0 predictions.
Eel has 1867 predictions.
Squat lobster has 11 predictions.
Barnacle has 0 predictions.
Stony coral has 0 predictions.
Black coral has 0 predictions.
Sea spider has 0 predictions.
unknown has 165102 predictions.
detection mAP50: 1.961324
detection mAP: 1.961324
---AP50---
Wilderness Impact: {0.1: {50: 0.0}, 0.2: {50: 0.0}, 0.3: {50: 0.0}, 0.4: {50: 0.0}, 0.5: {50: 0.0}, 0.6: {50: 0.0}, 0.7: {50: 0.0}, 0.8: {50: 0.0}, 0.9: {50: 0.0}}
avg_precision: {0.1: {50: 0.0}, 0.2: {50: 0.0}, 0.3: {50: 0.0}, 0.4: {50: 0.0}, 0.5: {50: 0.0}, 0.6: {50: 0.0}, 0.7: {50: 0.0}, 0.8: {50: 0.0}, 0.9: {50: 0.0}}
Absolute OSE (total_num_unk_det_as_known): {50: 0.0}
total_num_unk 0
AP50: ['9.1', '0.0', '0.0', '0.0', '0.0', '0.0', '9.1', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '21.2', '1.8', '0.0', '0.0', '0.0', '0.0', '0.0']
Precisions50: ['100.0', '0.0', '0.0', '0.0', '0.0', '0.0', '78.9', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '3.4', '9.1', '0.0', '0.0', '0.0', '0.0', '0.0']
Recall50: ['0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '3.2', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '86.5', '3.1', '0.0', '0.0', '0.0', '0.0', 'nan']
Prev class AP50: tensor(1.2987)
Prev class Precisions50: 12.781954887218046
Prev class Recall50: 0.23057421923603336
Current class AP50: tensor(3.8343)
Current class Precisions50: 2.086478063982081
Current class Recall50: 14.935247747747747
Known AP50: tensor(2.0594)
Known Precisions50: 9.573311840247257
Known Recall50: 4.6419762777895475
Unknown AP50: tensor(0.)
Unknown Precisions50: 0.0
Unknown Recall50: nan
Urchin 9.090909
Fish 0.000000
Sea star 0.000000
Anemone 0.000000
Sea cucumber 0.000000
Sea pen 0.000000
Sea fan 9.090909
Worm 0.000000
Crab 0.000000
Gastropod 0.000000
Shrimp 0.000000
Soft coral 0.000000
Glass sponge 0.000000
Feather star 0.000000
Eel 21.187799
Squat lobster 1.818182
Barnacle 0.000000
Stony coral 0.000000
Black coral 0.000000
Sea spider 0.000000
unknown 0.000000
[ExempReplay] [ 0/61] eta: 0:00:21 loss: 10.0000 (10.0000) time: 0.3489 data: 0.0000 max mem: 23363
[ExempReplay] [10/61] eta: 0:00:17 loss: 60.0000 (60.0000) time: 0.3373 data: 0.0000 max mem: 23363
[ExempReplay] [20/61] eta: 0:00:13 loss: 110.0000 (110.0000) time: 0.3296 data: 0.0000 max mem: 23363
[ExempReplay] [30/61] eta: 0:00:10 loss: 210.0000 (160.0000) time: 0.3209 data: 0.0000 max mem: 23363
[ExempReplay] [40/61] eta: 0:00:06 loss: 310.0000 (210.0000) time: 0.3191 data: 0.0000 max mem: 23363
[ExempReplay] [50/61] eta: 0:00:03 loss: 410.0000 (260.0000) time: 0.3209 data: 0.0000 max mem: 23363
[ExempReplay] [60/61] eta: 0:00:00 loss: 510.0000 (310.0000) time: 0.3133 data: 0.0000 max mem: 23363
[ExempReplay] Total time: 0:00:19 (0.3212 s / it)
found a total of 610 images
found a total of 610 images
only found 18 imgs in class 17
only found 4 imgs in class 18
only found 0 imgs in class 19
Traceback (most recent call last):
File "/home/sabrina/code/PROB/main_open_world.py", line 475, in <module>
main(args)
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
File "/home/sabrina/code/PROB/main_open_world.py", line 387, in main
create_ft_dataset(args, image_sorted_scores)
File "/home/sabrina/code/PROB/main_open_world.py", line 439, in create_ft_dataset
max_val = tmp.min()
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
How I ran it:
# train task 4
PY_ARGS=${@:1}
python -u main_open_world.py \
--output_dir "${EXP_DIR}/t4" --dataset fathomnet --PREV_INTRODUCED_CLS 14 --CUR_INTRODUCED_CLS 6\
--train_set 'task4_train' --test_set 'all_test' --epochs 191 \
--model_type 'prob' --obj_loss_coef 8e-4 --freeze_prob_model --obj_temp 1.3\
--wandb_name "${WANDB_NAME}_t4"\
--exemplar_replay_selection --exemplar_replay_max_length 2749 --exemplar_replay_dir ${WANDB_NAME}\
--exemplar_replay_prev_file "t3_ft.txt" --exemplar_replay_cur_file "t4_ft.txt"\
--num_inst_per_class 40\
--pretrain "${EXP_DIR}/t3_ft/checkpoint0180.pth" --lr 2e-5\
${PY_ARGS}
Have you come across this before?
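The log above reports "only found 0 imgs in class 19" just before `tmp.min()` fails on an empty tensor, so the crash may come from a class that has no candidate images. A minimal sketch of guarding that case in plain Python; the function names and the `{class: scores}` structure are hypothetical, and this is not the code of `create_ft_dataset`.

```python
def min_score_or_none(scores):
    """Return the smallest score, or None when a class has no candidate images."""
    return min(scores) if len(scores) > 0 else None


def per_class_min(scores_by_class):
    """Compute a per-class minimum, skipping classes that have no images."""
    result = {}
    for cls, scores in scores_by_class.items():
        value = min_score_or_none(scores)
        if value is None:
            print(f"skipping class {cls}: no images found")
            continue
        result[cls] = value
    return result
```

Skipping the class avoids the crash, but the evaluation table above also shows almost no predictions for most classes, which may be worth investigating separately.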
Hi, @orrzohar , thanks for your great work.
Here, I have some questions about your paper:
What is the meaning of "matched objects"? Does it refer to the pairs of output embeddings and their corresponding annotations (label and box supervision)?
How do I get the unknown objects' bounding boxes after training?