maskscoring_rcnn's Introduction

Mask Scoring R-CNN (MS R-CNN)

By Zhaojin Huang, Lichao Huang, Yongchao Gong, Chang Huang, Xinggang Wang.

CVPR 2019 Oral Paper, pdf

This project is based on maskrcnn-benchmark.

Introduction

Mask Scoring R-CNN contains a network block to learn the quality of the predicted instance masks. The proposed network block takes the instance feature and the corresponding predicted mask together to regress the mask IoU. The mask scoring strategy calibrates the misalignment between mask quality and mask score, and improves instance segmentation performance by prioritizing more accurate mask predictions during COCO AP evaluation. By extensive evaluations on the COCO dataset, Mask Scoring R-CNN brings consistent and noticeable gain with different models and different frameworks. The network of MS R-CNN is as follows:

[Figure: MS R-CNN network architecture]
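
As a rough illustration of the quantity the MaskIoU block is trained to regress, the sketch below computes the IoU between a binarized predicted mask and its ground-truth mask. It is a pure-Python toy under assumed inputs (nested lists of 0/1; the function name `mask_iou` is hypothetical), not the repository's implementation.

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks have no overlap to measure; 0.0 is returned by convention here.
    return inter / union if union else 0.0


# Toy 3x3 masks: the prediction is shifted one column left of the ground truth.
pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 1, 1],
      [0, 0, 0]]
print(mask_iou(pred, gt))
```

A detection with a confident class label can still have a low value here, which is the misalignment the introduction describes.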

Install

Check INSTALL.md for installation instructions.

Prepare Data

  mkdir -p datasets/coco
  ln -s /path_to_coco_dataset/annotations datasets/coco/annotations
  ln -s /path_to_coco_dataset/train2014 datasets/coco/train2014
  ln -s /path_to_coco_dataset/test2014 datasets/coco/test2014
  ln -s /path_to_coco_dataset/val2014 datasets/coco/val2014
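
A quick way to confirm the symlinks resolve before starting a long run is sketched below. It only checks the four directory names used in the commands above; the helper name `missing_coco_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names taken from the ln -s commands above.
EXPECTED = ("annotations", "train2014", "test2014", "val2014")


def missing_coco_dirs(root="datasets/coco"):
    """Return the expected sub-directories that do not resolve to a directory."""
    root = Path(root)
    # is_dir() follows symlinks, so a dangling link is reported as missing.
    return [name for name in EXPECTED if not (root / name).is_dir()]


missing = missing_coco_dirs()
print("missing:", missing or "none")
```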

Pretrained Models

  mkdir pretrained_models
  # The pretrained models will be downloaded when running the program.

My training log and pre-trained models can be found at link or link (pw: xm3f).

Running

Single GPU Training

  python tools/train_net.py --config-file "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
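
The overrides in the single-GPU command look like a linear rescaling of a larger-batch schedule. The sketch below reproduces them under an assumption: the reference values (16 images per batch, base LR 0.02, 90000 iterations, steps at 60000 and 80000) are not stated in this README and should be checked against the config file. The function name `scale_schedule` is hypothetical.

```python
def scale_schedule(ims_per_batch, ref_batch=16, ref_lr=0.02,
                   ref_max_iter=90000, ref_steps=(60000, 80000)):
    """Rescale LR and iteration counts linearly for a smaller total batch size.

    The ref_* defaults are assumed reference values, not taken from this README.
    """
    factor = ref_batch / ims_per_batch
    return {
        "SOLVER.BASE_LR": ref_lr / factor,                      # smaller batch, smaller LR
        "SOLVER.MAX_ITER": int(round(ref_max_iter * factor)),   # same number of epochs
        "SOLVER.STEPS": tuple(int(round(s * factor)) for s in ref_steps),
    }


print(scale_schedule(2))
```

With `ims_per_batch=2` the factor is 8, which matches the values passed on the command line above.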

Multi-GPU Training

  export NGPUS=8
  python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" 

Results

Network        | Method     | mAP (mask) | mAP (det)
ResNet-50 FPN  | Mask R-CNN | 34.2       | 37.8
ResNet-50 FPN  | MS R-CNN   | 35.6       | 37.9
ResNet-101 FPN | Mask R-CNN | 36.1       | 40.1
ResNet-101 FPN | MS R-CNN   | 37.4       | 40.1

Visualization

The left four images show good detection results with high classification scores but low mask quality; our method aims to solve this problem. The rightmost image shows a good mask with a high classification score, and our method retains the high score. As can be seen, the scores predicted by our model better reflect the actual mask quality.

Acknowledgment

The work was done during an internship at Horizon Robotics.

Citations

If you find MS R-CNN useful in your research, please consider citing:

@inproceedings{huang2019msrcnn,
    author = {Zhaojin Huang and Lichao Huang and Yongchao Gong and Chang Huang and Xinggang Wang},
    title = {{Mask Scoring R-CNN}},
    booktitle = {CVPR},
    year = {2019},
}   

License

maskscoring_rcnn is released under the MIT license. See LICENSE for additional details.

Thanks to the Third Party Libs

maskrcnn-benchmark
Pytorch

maskscoring_rcnn's People

Contributors

xinggangw, zjhuang22

maskscoring_rcnn's Issues

Can't reproduce paper results on COCO images

The pretrained network gives strange results for images even from COCO dataset (the predictions are almost random)

Screenshot 2019-03-24 at 18 35 06

Looks like the problem is the weights files.

What I've done:

  • Used predictor.py and Mask_R-CNN_demo.ipynb to generate and visualize masks
  • Used configs/e2e_ms_rcnn_R_50_FPN_1x.yaml and configs/e2e_ms_rcnn_R_101_FPN_1x.yaml
  • Downloaded pretrained models from here

Has anyone reproduced this on mmdetection?

Have you reproduced this on mmdetection? @zjhuang22
We mainly followed your code and modified mmdetection, but got a 1.5 drop in detection and no gain in instance segmentation compared with Mask R-CNN on mmdetection.
What are the differences between the data representations of the two frameworks? Are there any points that need attention?

AP (mask) is 35.4

Loading and preparing results...
DONE (t=3.93s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=44.62s).
Accumulating evaluation results...
DONE (t=5.38s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.354
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.380
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.161
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.379
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.517
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.298
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.450
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.269
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.505
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.619
2019-05-09 06:57:34,621 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.3759153459368749)
, ('AP50', 0.5908149513831737), ('AP75', 0.407623512259985), ('APs', 0.21592651568910282), ('APm', 0.40564621888303226),
 ('APl', 0.4964606429195723)])), ('segm', OrderedDict([('AP', 0.35404836208584806), ('AP50', 0.558382551808117), ('AP75'
, 0.379614103783724), ('APs', 0.1613041010105462), ('APm', 0.37909971606652115), ('APl', 0.5174932965096802)]))])

Hi, I used this repo to train on train2017 and test on val2017 (8 GPUs) with a ResNet-50 backbone. I did not change any other config, but the final AP (mask) is 35.4, which is 0.2 lower than the reported 35.6. Is this normal performance variation?

DataLoader worker (pid 2753) exited unexpectedly with exit code 127??

2019-03-12 15:03:22,550 maskrcnn_benchmark.trainer INFO: Start training
python: symbol lookup error: /home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
python: symbol lookup error: /home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
python: symbol lookup error: /home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
python: symbol lookup error: /home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "/home/king/githubToolkit/maskscoring_rcnn/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
idx, batch = self._get_batch()
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
return self.data_queue.get()
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/multiprocessing/queues.py", line 94, in get
res = self._recv_bytes()
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
File "/home/king/anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 274, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2753) exited unexpectedly with exit code 127. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.

Why does this occur? Is it because my GPU memory or system shared memory is limited, or did the ln commands introduce some mistake? Has anyone else met this problem?

RuntimeError: invalid argument 0

My GPUs are 2x RTX 2080
OS = Ubuntu 16.04.6
CUDA version = 9.0.176
Python = 3.6
PyTorch = 1.0.1

When I run the following script:

  export NGPUS=2
  python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 700

I get the following error:

Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 167, in main
test(cfg, model, args.distributed)
File "tools/train_net.py", line 104, in test
maskiou_on=cfg.MODEL.MASKIOU_ON
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/engine/inference.py", line 379, in inference
predictions = compute_on_dataset(model, data_loader, device)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/engine/inference.py", line 31, in compute_on_dataset
output = model(images)
File "/home/d/.conda/envs/ms/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 51, in forward
x, result, detector_losses = self.roi_heads(features, proposals, targets)
File "/home/d/.conda/envs/ms/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 43, in forward
loss_maskiou, detections = self.maskiou(roi_feature, detections, selected_mask, labels, maskiou_targets)
File "/home/d/.conda/envs/ms/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/modeling/roi_heads/maskiou_head/maskiou_head.py", line 41, in forward
x = self.feature_extractor(features, selected_mask)
File "/home/d/.conda/envs/ms/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/d/github/maskscoring_rcnn/maskrcnn_benchmark/modeling/roi_heads/maskiou_head/roi_maskiou_feature_extractors.py", line 39, in forward
x = torch.cat((x, mask_pool), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 47 and 250 in dimension 0 at /opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THC/generic/THCTensorMath.cu:83
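
The message reflects the general concatenation rule: every dimension except the one being concatenated must agree. Here the two inputs to `torch.cat((x, mask_pool), 1)` disagree in dimension 0 (47 versus 250). The sketch below restates that rule in plain Python so the mismatch can be checked without a GPU; the channel and spatial sizes in the example are illustrative assumptions, and `cat_shape` is a hypothetical helper, not a PyTorch function.

```python
def cat_shape(shapes, dim):
    """Return the shape a concatenation along `dim` would produce, or raise."""
    first = shapes[0]
    out = list(first)
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same number of dimensions")
        for d, (a, b) in enumerate(zip(first, shape)):
            # Only the concatenation dimension is allowed to differ.
            if d != dim and a != b:
                raise ValueError(
                    f"Sizes must match except in dimension {dim}. "
                    f"Got {a} and {b} in dimension {d}")
        out[dim] += shape[dim]
    return tuple(out)


# Matching first dimension: concatenation along channels succeeds.
print(cat_shape([(47, 256, 14, 14), (47, 1, 14, 14)], dim=1))

# Mismatched first dimension, as in the traceback: this raises.
try:
    cat_shape([(47, 256, 14, 14), (250, 1, 14, 14)], dim=1)
except ValueError as exc:
    print(exc)
```

Printing the two tensors' shapes just before the failing `torch.cat` call is one way to find where the counts diverge.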

how to test an image

I want to test on my own image and see the segmentation and bounding box results. Could you please tell me how to do this?

environment problem

I followed INSTALL.md to set up the environment. In the last step (installing PyTorch maskscoring_rcnn), I got errors like this:
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:15:17: error: switch quantity not an integer
switch (TYPE) {
^
/home//github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:16:44: error: could not convert ‘Double’ from ‘c10::ScalarType’ to ‘’
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__)
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:8:8: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
case enum_type: {
^
/home/github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:17:44: error: could not convert ‘Float’ from ‘c10::ScalarType’ to ‘’
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__)
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:8:8: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
case enum_type: {
^
/home/github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:71:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {

In file included from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/c10/core/Scalar.h:10:0,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/core/Type.h:8,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Type.h:2,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
from /hom/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
from /home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
from /home/github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/vision.h:3,
from /home/github/maskscoring_rcnn/maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp:2:
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/c10/core/ScalarType.h:122:28: note: ‘c10::toString’
static inline const char * toString(ScalarType t) {
^
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/c10/core/ScalarType.h:122:28: note: ‘c10::toString’
/home/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/c10/core/ScalarType.h:122:28: note: ‘c10::toString’
error: command 'gcc' failed with exit status 1
Can you please help me?

How do you get the final predicted maskiou?

In your code, the MaskIoU value is predicted by the network and trained with an L2 loss.
But the ground-truth MaskIoU lies in [0,1], so would it be better to apply a ReLU to the output of the Linear layer that predicts the IoU?
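
For reference, a ReLU only bounds the output from below, while a clamp bounds it on both sides of [0,1]. The sketch below contrasts the two on made-up raw predictions; it illustrates the question and makes no claim about what the repository does.

```python
def relu(x):
    """Zero out negative values; values above 1 pass through unchanged."""
    return max(0.0, x)


def clamp01(x):
    """Restrict a value to the closed interval [0, 1]."""
    return min(1.0, max(0.0, x))


# Hypothetical raw outputs of a linear regression layer.
raw = [-0.2, 0.35, 0.9, 1.15]
print([relu(v) for v in raw])
print([clamp01(v) for v in raw])
```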

UnboundLocalError

I used the script "python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "maskscoring_rcnn/configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 TEST.IMS_PER_BATCH 2" to run the code twice. On the second run I got an error: UnboundLocalError: local variable 'iteration' referenced before assignment.
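
One plausible cause, stated as an assumption rather than a confirmed diagnosis: the second run resumes from a checkpoint whose iteration count has already reached the maximum, so the data loader yields nothing and the loop variable is never assigned. The sketch below reproduces that mechanism with a stand-in `do_train` (hypothetical, modeled on the `for iteration, ... in enumerate(data_loader, start_iter)` line seen in the tracebacks elsewhere on this page).

```python
def do_train(data_loader, start_iter):
    """Stand-in training loop that uses the loop variable after the loop."""
    for iteration, batch in enumerate(data_loader, start_iter):
        pass  # a training step would go here
    # If the loader was empty, `iteration` was never bound.
    return iteration


try:
    do_train([], start_iter=90000)
except UnboundLocalError as exc:
    print("reproduced:", exc)
```

If this is the cause, starting from a fresh output directory (or removing the finished checkpoint) would avoid the empty loop.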

how to add other dataset?

Hi, I want to train on a different dataset. What should I modify?
Looking forward to your reply.

How can I just test one image?

I just want to test one image and show its mask result.
Can you tell me what I should do?
Maybe "tests/test_data_samplers.py", but I don't know how to run it.
Thanks!

train problem for AttributeError: 'list' object has no attribute 'resize'

loading annotations into memory...
Done (t=14.28s)
creating index...
index created!
loading annotations into memory...
Done (t=1.97s)
creating index...
index created!
2019-05-06 18:22:29,797 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "/cs/github/maskscoring_rcnn/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
return self._process_next_batch(batch)
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 85, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "/cs/github/maskscoring_rcnn/maskrcnn_benchmark/data/datasets/coco.py", line 36, in __getitem__
img, anno = super(COCODataset, self).__getitem__(idx)
File "/home/lj/.conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torchvision-0.2.3a0+7a4845a-py3.7.egg/torchvision/datasets/coco.py", line 114, in __getitem__
img, target = self.transforms(img, target)
File "/cs/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 15, in __call__
image, target = t(image, target)
File "/cs/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 58, in __call__
target = target.resize(image.size)
AttributeError: 'list' object has no attribute 'resize'

(with the COCO 2017 dataset)
Can anyone tell me how to fix it? Thanks!

Too many open files

I trained my model following your instructions, but training failed. Could you give me some suggestions?

The error message is as follows:

Traceback (most recent call last):
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 194, in DupFd
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 194, in DupFd
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/resource_sharer.py", line 149, in _serve
send(conn, destination_pid)
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/resource_sharer.py", line 50, in send
reduction.send_handle(conn, new_fd, pid)
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/multiprocessing/reduction.py", line 179, in send_handle
with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
File "/home/zhongqi/anaconda3/envs/maskscore/lib/python3.7/socket.py", line 463, in fromfd
nfd = dup(fd)
OSError: [Errno 24] Too many open files
Traceback (most recent cal

Process finished with exit code -1

Training stopped after only 1880 iterations. What might be the reason for this?

2019-04-22 02:19:55,592 maskrcnn_benchmark.trainer INFO: eta: 9 days, 1:55:33 iter: 1840 loss: 0.6163 (0.7650) loss_classifier: 0.3722 (0.4438) loss_box_reg: 0.1609 (0.1872) loss_mask: 0.0471 (0.0711) loss_maskiou: 0.0093 (0.0110) loss_objectness: 0.0095 (0.0252) loss_rpn_box_reg: 0.0180 (0.0267) time: 1.0529 (1.0924) data: 0.0552 (0.0508) lr: 0.002500 max mem: 7203
2019-04-22 02:20:16,294 maskrcnn_benchmark.trainer INFO: eta: 9 days, 1:47:48 iter: 1860 loss: 0.6061 (0.7632) loss_classifier: 0.3617 (0.4429) loss_box_reg: 0.1469 (0.1867) loss_mask: 0.0573 (0.0709) loss_maskiou: 0.0125 (0.0111) loss_objectness: 0.0080 (0.0250) loss_rpn_box_reg: 0.0149 (0.0266) time: 1.0085 (1.0918) data: 0.0435 (0.0507) lr: 0.002500 max mem: 7203
2019-04-22 02:20:37,496 maskrcnn_benchmark.trainer INFO: eta: 9 days, 1:43:25 iter: 1880 loss: 0.5906 (0.7615) loss_classifier: 0.3566 (0.4420) loss_box_reg: 0.1496 (0.1863) loss_mask: 0.0485 (0.0707) loss_maskiou: 0.0114 (0.0111) loss_objectness: 0.0094 (0.0248) loss_rpn_box_reg: 0.0148 (0.0265) time: 1.0790 (1.0915) data: 0.0493 (0.0507) lr: 0.002500 max mem: 7203

Process finished with exit code -1
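
"[Errno 24] Too many open files" means the process hit its file-descriptor limit. A minimal sketch for inspecting and raising the soft limit from Python is shown below, using the standard `resource` module (Unix only). The target of 4096 and the helper name `raise_open_file_limit` are arbitrary choices for illustration. Whether this resolves the report above is untested; reducing the number of data-loading workers or running `ulimit -n` in the shell are alternatives.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE towards `target`, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(raise_open_file_limit())
```

The call has to happen in the training process before the data loader starts its workers, since child processes inherit the limit.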

How can I make use of multiple GPUs on a single server?

I tried changing the WORLD_SIZE and CUDA_VISIBLE_DEVICES environment variables, but I got errors when initializing the process group:

os.environ['WORLD_SIZE'] = '2'
os.environ["CUDA_VISIBLE_DEVICES"] = "1,9"

num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
args.distributed = num_gpus > 1

Traceback (most recent call last):
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1741, in <module>
main()
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/opt/maskscoring_RCNN/train_net.py", line 178, in <module>
main()
File "/opt/maskscoring_RCNN/train_net.py", line 145, in main
backend="nccl", init_method="env://"
File "/root/anaconda3/lib/python3.6/site-packages/torch/distributed/deprecated/__init__.py", line 101, in init_process_group
group_name, rank)
RuntimeError: rank is not set but it is required for env:// init method at /opt/conda/conda-bld/pytorch_1549628766161/work/torch/lib/THD/process_group/General.cpp:20
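
The `env://` init method reads its settings from environment variables, and each process needs its own `RANK` in addition to `WORLD_SIZE`. When the script is started through `python -m torch.distributed.launch`, as in the Multi-GPU Training command, the launcher sets these per process. The sketch below is a small pre-flight check for the variables `env://` expects; the helper name is hypothetical.

```python
import os

# Variables the env:// initialization method expects to find.
REQUIRED = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")


def missing_env_for_env_init(environ=None):
    """Return the required variables that are absent from the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if name not in environ]


# The environment from the report above sets only these two variables.
print(missing_env_for_env_init({"WORLD_SIZE": "2", "CUDA_VISIBLE_DEVICES": "1,9"}))
# → ['MASTER_ADDR', 'MASTER_PORT', 'RANK']
```

Running a single process from an IDE with `WORLD_SIZE` set to 2 therefore leaves `RANK` undefined, which matches the error message.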

ImportError: cannot import name '_C'

Traceback (most recent call last):
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/tools/train_net.py", line 18, in <module>
from maskrcnn_benchmark.engine.inference import inference
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/maskrcnn_benchmark/engine/inference.py", line 20, in <module>
from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in <module>
from maskrcnn_benchmark.layers import nms as _box_nms
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
from .nms import nms
File "/media/jiurui/92466C31466C186D/svn_all_20181126/yuanbaoxi/maskrcnntemp/maskscoring_rcnn-master/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
from maskrcnn_benchmark import _C
ImportError: cannot import name '_C'

test error

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 84 and 366 in dimension 0 at /opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THC/generic/THCTensorMath.cu:83

evaluation problem

Hi, when you evaluate box AP, which score do you use: the classification score or the mask score? If the mask score were used to evaluate box AP, I would expect box AP to drop, but in your repo the box AP is nearly unchanged. So I think the reported box AP and mask AP come from two different scores. Do you run the test twice?

can not download pretrained model

When I execute the command:

python tools/train_net.py --config-file "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1

An error is returned:

2019-03-11 14:57:30,345 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from catalog://ImageNetPretrained/MSRA/R-50
2019-03-11 14:57:30,345 maskrcnn_benchmark.utils.checkpoint INFO: catalog://ImageNetPretrained/MSRA/R-50 points to https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
Downloading: "https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl" to pretrained_models/R-50.pkl
Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 53, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/media//ffd15abb-ef51-4903-a331-ef8327a5864a/DukTo/svn/object-detection/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 61, in load
checkpoint = self._load_file(f)
File "/media//ffd15abb-ef51-4903-a331-ef8327a5864a/DukTo/svn/object-detection/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 128, in _load_file
cached_f = cache_url(f, model_dir=self.cfg.MODEL.PRETRAINED_MODELS)
File "/media//ffd15abb-ef51-4903-a331-ef8327a5864a/DukTo/svn/object-detection/maskscoring_rcnn/maskrcnn_benchmark/utils/model_zoo.py", line 54, in cache_url
_download_url_to_file(url, cached_file, hash_prefix, progress=progress)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 88, in _download_url_to_file
u = urlopen(url)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/home//anaconda3/envs/maskrcnn_benchmark/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

How could I get the pretrained model some other way?
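
The log shows the checkpoint being saved to pretrained_models/R-50.pkl. Assuming the loader skips the download when that file already exists (an assumption about `cache_url` that should be verified in maskrcnn_benchmark/utils/model_zoo.py), a copy obtained by other means could be placed there manually. The sketch below only derives the expected local path and checks whether it is present; both helper names are hypothetical.

```python
import os
from urllib.parse import urlparse


def cached_path(url, model_dir="pretrained_models"):
    """Local path where a file downloaded from `url` would be stored."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(model_dir, filename)


def needs_download(url, model_dir="pretrained_models"):
    """True when no local copy exists at the expected path."""
    return not os.path.exists(cached_path(url, model_dir))


url = "https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl"
print(cached_path(url), needs_download(url))
```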

Cannot train because layers.nms.py

Hi, thanks for your work. I am trying this code, however I cannot train because of maskrcnn_benchmark\layers\nms.py. The code stops in maskrcnn_benchmark\structures\boxlist_ops.py at the line keep = _box_nms(boxes, score, nms_thresh), and no error is reported. My environment is gcc 5.4.0, PyTorch stable 1.0, Python 3.7, CUDA 8.0. How can I solve this problem? Thanks!

Am I mis-understanding the meaning of MaskIoU head here?

I checked the paper, and many thanks for the awesome work and for sharing the code!
Previously I thought "MaskIoU" meant that you train the mask with IoU directly (dice loss): you predict a mask for box i of class A, use the ground truth of box i of class A to compute the IoU, and use that as a loss to train your Mask R-CNN in a multi-task learning fashion.
Checking the paper more carefully, I realized you feed the predicted mask to a CNN to predict the IoU rather than computing it directly as in https://github.com/kevinzakka/pytorch-goodies/blob/master/losses.py#L54. I wonder if this approach is prone to overfitting. Have you compared the results with and without the "concatenation" to see how much the MaskIoU head actually relies on the RoI feature maps?

Problem about calculate "segmentation_mask_for_maskratio"

    for segmentation_mask, proposal in zip(segmentation_masks, proposals):
        cropped_mask = segmentation_mask.crop(proposal)
        scaled_mask = cropped_mask.resize((M, M))
        mask = scaled_mask.convert(mode="mask")
        masks.append(mask)
        if maskiou_on:
            x1 = int(proposal[0])
            y1 = int(proposal[1])
            x2 = int(proposal[2]) + 1
            y2 = int(proposal[3]) + 1
            for poly_ in segmentation_mask.polygons:
                poly = np.array(poly_, dtype=np.float32)
                x1 = np.minimum(x1, poly[0::2].min())
                x2 = np.maximum(x2, poly[0::2].max())
                y1 = np.minimum(y1, poly[1::2].min())
                y2 = np.maximum(y2, poly[1::2].max())
            img_h = segmentation_mask.size[1]
            img_w = segmentation_mask.size[0]
            x1 = np.maximum(x1, 0)
            x2 = np.minimum(x2, img_w-1)
            y1 = np.maximum(y1, 0)
            y2 = np.minimum(y2, img_h-1)
            segmentation_mask_for_maskratio =  segmentation_mask.crop([x1, y1, x2, y2])

Here is the code for segmentation_mask_for_maskratio. If I am right, segmentation_mask_for_maskratio is the segmentation covering the whole object. So why not calculate the area of segmentation_mask directly, instead of first cropping the region [x1, y1, x2, y2]? Thanks in advance!
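
To make the quoted snippet easier to follow, the sketch below isolates its box computation in plain Python: the crop region is the union of the proposal and the extents of all polygons, clipped to the image. It mirrors the arithmetic of the snippet with built-in `min`/`max` in place of NumPy; the function name `enclosing_box` and the example values are hypothetical.

```python
def enclosing_box(proposal, polygons, img_w, img_h):
    """Union of the proposal box and all polygon extents, clipped to the image."""
    x1, y1 = int(proposal[0]), int(proposal[1])
    x2, y2 = int(proposal[2]) + 1, int(proposal[3]) + 1
    for poly in polygons:
        # Polygons are flat lists: [x0, y0, x1, y1, ...].
        xs, ys = poly[0::2], poly[1::2]
        x1, x2 = min(x1, min(xs)), max(x2, max(xs))
        y1, y2 = min(y1, min(ys)), max(y2, max(ys))
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, img_w - 1), min(y2, img_h - 1)
    return [x1, y1, x2, y2]


# A proposal that covers only part of a larger object outline.
print(enclosing_box([10, 10, 20, 20], [[5, 12, 25, 12, 25, 30, 5, 30]], 100, 100))
# → [5, 10, 25, 30]
```

In this example the resulting box extends past the proposal on three sides, so the crop still contains the full object outline.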

How to use multi-gpus to run it?

Hello, I can only use one GPU with your code. What should I do if I want to use multiple GPUs? I noticed that you use "torch.distributed.deprecated.init_process_group()" to set it up, but there is not much material about how to use it. Can you tell me how to use it? Thank you very much!

AttributeError: 'list' object has no attribute 'resize'

When I use a single GPU to train the network, I get the error "AttributeError: 'list' object has no attribute 'resize'". Could you please tell me how to solve this problem? Thank you very much.

PyTorch version: 1.1.0.dev20190506
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.3 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 7.5.17
GPU models and configuration: GPU 0: GeForce GTX TITAN X
Nvidia driver version: 418.40.04
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.3
/usr/local/lib/libcudnn.so.5.1.10

Versions of relevant libraries:
[pip] numpy==1.16.3
[pip] torch==1.1.0.dev20190506
[pip] torchvision==0.2.3a0+d534785
[conda] blas 1.0 mkl
[conda] mkl 2019.3 199
[conda] mkl_fft 1.0.12 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch-nightly 1.1.0.dev20190506 py3.7_cuda9.0.176_cudnn7.5.1_0 pytorch
Pillow (6.0.0)
2019-05-08 22:23:42,841 maskrcnn_benchmark INFO: Loaded configuration file configs/e2e_ms_rcnn_R_50_FPN_1x.yaml
2019-05-08 22:23:42,842 maskrcnn_benchmark INFO:
.
.
.
.
2019-05-08 22:23:58,578 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
File "tools/train_net.py", line 172, in <module>
main()
File "tools/train_net.py", line 165, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 74, in train
arguments,
File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
return self._process_next_batch(batch)
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 85, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/datasets/coco.py", line 36, in __getitem__
img, anno = super(COCODataset, self).__getitem__(idx)
File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torchvision-0.2.3a0+d534785-py3.7.egg/torchvision/datasets/coco.py", line 114, in __getitem__
img, target = self.transforms(img, target)
File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 14, in __call__
image, target = t(image, target)
File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 58, in __call__
target = target.resize(image.size)
AttributeError: 'list' object has no attribute 'resize'
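
The last frames of the traceback show torchvision's COCO dataset calling `self.transforms(img, target)` with the raw annotation list, before the project's `COCODataset` has turned it into an object that has a `resize` method. The sketch below reproduces that pattern with hypothetical stand-in classes (`BaseDataset`, `ClashingDataset`, `SafeDataset`, `Target`; none of them come from torchvision or this repository). Storing the project's transform under a different attribute name, as `SafeDataset` does, is an assumed workaround rather than a confirmed fix; installing the torchvision version the project was developed against is another option.

```python
from types import SimpleNamespace


class Target:
    """Stand-in for a structured target that knows how to resize itself."""

    def __init__(self, annotations):
        self.annotations = annotations
        self.size = None

    def resize(self, size):
        self.size = size
        return self


def resize_transform(image, target):
    # Mirrors the failing call in the traceback: target.resize(image.size)
    return image, target.resize(image.size)


class BaseDataset:
    """Stand-in for a base dataset that applies self.transforms itself."""

    def __init__(self, samples, transforms=None):
        self.samples = samples
        self.transforms = transforms

    def __getitem__(self, idx):
        image, annotations = self.samples[idx]
        if self.transforms is not None:
            image, annotations = self.transforms(image, annotations)
        return image, annotations


class ClashingDataset(BaseDataset):
    def __init__(self, samples, transforms=None):
        super().__init__(samples)
        self.transforms = transforms  # same attribute name the base class uses

    def __getitem__(self, idx):
        # The base class now runs the transform on the raw list and fails.
        image, annotations = super().__getitem__(idx)
        return self.transforms(image, Target(annotations))


class SafeDataset(BaseDataset):
    def __init__(self, samples, transforms=None):
        super().__init__(samples)
        self._transforms = transforms  # private name, ignored by the base class

    def __getitem__(self, idx):
        image, annotations = super().__getitem__(idx)
        return self._transforms(image, Target(annotations))


samples = [(SimpleNamespace(size=(640, 480)), [{"bbox": [0, 0, 10, 10]}])]

try:
    ClashingDataset(samples, resize_transform)[0]
except AttributeError as err:
    print("clashing:", err)

image, target = SafeDataset(samples, resize_transform)[0]
print("safe:", target.size)
```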

pretrained ms_rcnn model

Thanks for the great work!
It seems the link provided in README is for ImageNet pretrained models? Could you please provide the R-50/R-101 MS-RCNN model that is used to produce your results in the paper?

build error/running build_ext

When I run setup.py build develop, I get this error:

copying build/lib.linux-x86_64-3.6/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so -> maskrcnn_benchmark
error: could not create 'maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so': No such file or directory

_pickle.UnpicklingError: invalid load key, '<'.

I am trying to reproduce your work. When I run train_net, I get the error below. By the way, I am using coco2017 rather than coco2014. Is the dataset selection related to this error?

Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 53, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 61, in load
checkpoint = self._load_file(f)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 133, in _load_file
return load_c2_format(self.cfg, f)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 155, in load_c2_format
return C2_FORMAT_LOADER[cfg.MODEL.BACKBONE.CONV_BODY](cfg, f)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 146, in load_resnet_c2_format
state_dict = _load_c2_pickled_weights(f)
File "/root/github/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 124, in _load_c2_pickled_weights
data = pickle.load(f, encoding="latin1")
_pickle.UnpicklingError: invalid load key, '<'.
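
An `invalid load key, '<'` error means the first byte `pickle` read was `<`, which is what an HTML or XML document (for example a saved error page) starts with. The helper below is a hypothetical diagnostic, not part of the repository: `sniff_checkpoint` reads the first bytes of a file and returns a rough guess. The labels and the `0x80` check (the opcode that binary pickle protocols 2 and later begin with) are heuristics, so an `unknown` result does not by itself mean the file is bad.

```python
import os


def sniff_checkpoint(path):
    """Return a rough guess at what kind of file sits at `path` (hypothetical helper)."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        head = f.read(16)
    # HTML/XML documents begin with '<', possibly after whitespace.
    if head.lstrip().startswith(b"<"):
        return "markup"
    # Pickle protocol 2 and later start with the PROTO opcode, byte 0x80.
    if head[:1] == b"\x80":
        return "binary-pickle"
    return "unknown"
```

Calling `sniff_checkpoint` on the cached weight file (the path is whatever `cfg.MODEL.WEIGHT` resolves to locally) gives a quick hint before retrying the download. Note that the traceback fails inside `checkpointer.load(cfg.MODEL.WEIGHT)`, so the error is raised while reading the weight file.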

KeyError: 'Non-existent config key: MODEL.PRETRAINED_MODELS'

Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 143, in main
cfg.merge_from_file(args.config_file)
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 213, in merge_from_file
self.merge_from_other_cfg(cfg)
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 217, in merge_from_other_cfg
_merge_a_into_b(cfg_other, self, self, [])
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 460, in _merge_a_into_b
_merge_a_into_b(v, b[k], root, key_list + [k])
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 473, in _merge_a_into_b
raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.PRETRAINED_MODELS'

[Errno 2] No such file or directory: 'datasets/coco/annotations/instances_train2014.json', but the file does exist.

loading annotations into memory...
Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 60, in train
start_iter=arguments["iteration"],
File "/home/yg/github/maskscoring_rcnn/maskrcnn_benchmark/data/build.py", line 149, in make_data_loader
datasets = build_dataset(dataset_list, transforms, DatasetCatalog, is_train)
File "/home/yg/github/maskscoring_rcnn/maskrcnn_benchmark/data/build.py", line 41, in build_dataset
dataset = factory(**args)
File "/home/yg/github/maskscoring_rcnn/maskrcnn_benchmark/data/datasets/coco.py", line 13, in __init__
super(COCODataset, self).__init__(root, ann_file)
File "/home/yg/anaconda3/envs/ms-rcnn/lib/python3.5/site-packages/torchvision/datasets/coco.py", line 97, in __init__
self.coco = COCO(annFile)
File "/home/yg/anaconda3/envs/ms-rcnn/lib/python3.5/site-packages/pycocotools-2.0-py3.5-linux-x86_64.egg/pycocotools/coco.py", line 84, in __init__
dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: '/home/yg/datasets/coco/annotations/instances_train2014.json'

It seems that the same problem occurred in maskrcnn-benchmark:
https://github.com/facebookresearch/maskrcnn-benchmark/issues/345
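
The path in the error, `/home/yg/datasets/coco/...`, sits outside the repository directory `/home/yg/github/maskscoring_rcnn` that appears in the earlier frames. That would be consistent with the relative `datasets/coco/...` path being resolved against a different working directory than the one where the symlinks were created. A small sketch of that resolution using only the standard library; `resolve_from` is a hypothetical helper, and `posixpath` is used so the result does not depend on the host OS:

```python
import posixpath


def resolve_from(cwd, relative_path):
    """Return the absolute path that `relative_path` maps to when run from `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, relative_path))


ann = "datasets/coco/annotations/instances_train2014.json"

# Run from the home directory: matches the path in the error message.
print(resolve_from("/home/yg", ann))

# Run from the repository root: the location the README's symlinks create.
print(resolve_from("/home/yg/github/maskscoring_rcnn", ann))
```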

error

2019-03-15 20:43:52,007 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from catalog://ImageNetPretrained/MSRA/R-50
2019-03-15 20:43:52,007 maskrcnn_benchmark.utils.checkpoint INFO: catalog://ImageNetPretrained/MSRA/R-50 points to https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
2019-03-15 20:43:52,008 maskrcnn_benchmark.utils.checkpoint INFO: url https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl cached in pretrained_models/R-50.pkl
Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 53, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 61, in load
checkpoint = self._load_file(f)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 133, in _load_file
return load_c2_format(self.cfg, f)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 155, in load_c2_format
return C2_FORMAT_LOADER[cfg.MODEL.BACKBONE.CONV_BODY](cfg, f)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 146, in load_resnet_c2_format
state_dict = _load_c2_pickled_weights(f)
File "/home/chase/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 124, in _load_c2_pickled_weights
data = pickle.load(f, encoding="latin1")
_pickle.UnpicklingError: invalid load key, '\x00'.

I encountered some problems during test

(maskrcnn) wuyi@nclab:~/github/maskscoring_rcnn/tools$ python test_net.py
Traceback (most recent call last):
File "test_net.py", line 12, in <module>
from maskrcnn_benchmark.engine.inference import inference
File "/media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/engine/inference.py", line 20, in <module>
from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
File "/media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in <module>
from maskrcnn_benchmark.layers import nms as _box_nms
File "/media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
from .nms import nms
File "/media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
from maskrcnn_benchmark import _C
ImportError: /media/dat1/users/master/2019/wuyi/github/maskscoring_rcnn/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _Z20ROIPool_forward_cudaRKN2at6TensorES2_fii

I'm confused. Can you help me with it?

I wonder which files you changed relative to facebook's mask-rcnn

Facebook's mask-rcnn has lots of files, and I just want to see the code for your algorithm. I already know how Mask R-CNN is implemented, so could you tell me directly where to find your network design and your loss function?

Thanks.

Confusion about the procedure of inference

Hi, nice work. I'm still a little confused about the inference procedure. In the original Mask R-CNN, NMS is applied to the proposals from the RPN, and the model uses the remaining proposals to produce the classification scores and box refinements. A second NMS is then applied to the detections from the cls-head and bbox-head. Finally, the remaining boxes are fed to the mask-head to get the final result.
However, according to your paper, the mask scores are used to refine the cls-head scores, which means the mask-head outputs must be computed first and then combined with the cls-head scores. How do the refined scores help? Do you use them to redo NMS on the detections from the cls-head and bbox-head, and then feed the result to the mask-head again to get new segmentation results? If so, do you run the mask-head twice to get more accurate results?
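
As described in the paper, the mask score is the classification score multiplied by the predicted MaskIoU, so rescoring changes how existing detections are ranked rather than producing new masks. A minimal sketch under that reading, with hypothetical names (`rescore`, the dictionary keys) and made-up numbers; it is not the repository's inference code:

```python
def rescore(detections):
    """Attach mask_score = cls_score * mask_iou and rank by it (illustrative only)."""
    rescored = [dict(d, mask_score=d["cls_score"] * d["mask_iou"]) for d in detections]
    return sorted(rescored, key=lambda d: d["mask_score"], reverse=True)


# Two detections that already passed box NMS and went through the mask head once.
detections = [
    {"id": "a", "cls_score": 0.9, "mask_iou": 0.5},  # confident class, poor mask
    {"id": "b", "cls_score": 0.8, "mask_iou": 0.9},  # slightly less confident, good mask
]

ranked = rescore(detections)
print([d["id"] for d in ranked])
```

In this sketch the mask head runs once and no second NMS is performed; only the ordering used when evaluating mask AP changes, which matches the README's description of prioritizing more accurate mask predictions during COCO AP evaluation.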

Error when using multiple GPUs

When I use your script to run it on multiple GPUs, this error occurs:
Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 140, in main
backend="nccl", init_method="env://"
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/distributed/deprecated/__init__.py", line 101, in init_process_group
group_name, rank)
RuntimeError: Address already in use at /opt/conda/conda-bld/pytorch-nightly_1552799380021/work/torch/lib/THD/process_group/General.cpp:20
Traceback (most recent call last):
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/distributed/launch.py", line 238, in <module>
main()
File "/home/chh/anaconda2/envs/maskrcnn_benchmark/lib/python3.6/site-packages/torch/distributed/launch.py", line 234, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/chh/anaconda2/envs/maskrcnn_benchmark/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--config-file', 'configs/e2e_ms_rcnn_R_50_FPN_1x.yaml']' returned non-zero exit status 1.
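
`Address already in use` from `init_process_group` usually indicates that the rendezvous port is taken, for instance by another training job on the same machine. `torch.distributed.launch` accepts a `--master_port` argument; the sketch below uses only the standard library to ask the OS for a currently free port that could be passed to it. `find_free_port` is a hypothetical helper, and the port could in principle be claimed by another process between this check and the launch.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0 (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


port = find_free_port()

# Print a launch command based on the README's multi-GPU example, with an explicit port.
print(
    "python -m torch.distributed.launch --nproc_per_node=$NGPUS "
    "--master_port={} tools/train_net.py "
    '--config-file "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml"'.format(port)
)
```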

_pickle.UnpicklingError: pickle data was truncated

When I run train_net, I get the error below. My dataset is coco2014.
Traceback (most recent call last):
File "tools/train_net.py", line 171, in <module>
main()
File "tools/train_net.py", line 164, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 53, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 61, in load
checkpoint = self._load_file(f)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/checkpoint.py", line 133, in _load_file
return load_c2_format(self.cfg, f)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 155, in load_c2_format
return C2_FORMAT_LOADER[cfg.MODEL.BACKBONE.CONV_BODY](cfg, f)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 146, in load_resnet_c2_format
state_dict = _load_c2_pickled_weights(f)
File "/home/huanran/anaconda3/envs/maskrcnn_benchmark/maskscoring_rcnn/maskrcnn_benchmark/utils/c2_model_loading.py", line 124, in _load_c2_pickled_weights
data = pickle.load(f, encoding="latin1")
_pickle.UnpicklingError: pickle data was truncated.
I don't know why.

About inference time

Thanks for your great work!
In your paper, speed and computation are described as "Our MaskIoU head has about 0.39G FLOPs while Mask head has about 0.53G FLOPs for each proposal."
As far as I know, there should be at least 10 proposals for the mask head in each image, so that is 3.9G and 5.3G FLOPs for the MaskIoU head and the mask head respectively. However, ResNet-18 is about 2G FLOPs, so why doesn't the MaskIoU head lead to slower inference? Thank you!

What are instances_minival and instances_valminusminival?

Hi! I am using coco2017 dataset and have changed paths in paths_catalog from 2014 to corresponding 2017 ones. When I run the project I got this error:

No such file or directory: 'datasets/coco/annotations/instances_valminusminival2017.json'

It seems there is no such json file in coco datasets. May I ask what are these minival json files? How can I reproduce one?
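
In common COCO 2014 usage, `minival` refers to a subset of roughly 5k images from `val2014` and `valminusminival` to the rest of `val2014`; that description comes from general usage, not from this repository. Given a list of image IDs for the subset, a COCO-format annotation dictionary can be split as sketched below. `split_coco` and its arguments are hypothetical, and the sketch cannot supply which image IDs belong in the subset.

```python
def split_coco(dataset, subset_image_ids):
    """Split a COCO-format dict into (subset, remainder) by image ID (hypothetical helper)."""
    ids = set(subset_image_ids)

    def pick(keep):
        # Copy every top-level key (info, licenses, categories, ...) except the two we filter.
        out = {k: v for k, v in dataset.items() if k not in ("images", "annotations")}
        out["images"] = [im for im in dataset["images"] if (im["id"] in ids) == keep]
        out["annotations"] = [
            a for a in dataset["annotations"] if (a["image_id"] in ids) == keep
        ]
        return out

    return pick(True), pick(False)


# Toy data in the COCO layout; real files would be read with json.load
# and the two results written back with json.dump.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 3},
        {"id": 12, "image_id": 3},
    ],
    "categories": [{"id": 1, "name": "person"}],
}
subset, remainder = split_coco(toy, [3])
print(len(subset["images"]), len(remainder["images"]))
```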
