
Comments (51)

ppwwyyxx commented on May 18, 2024 8

FYI I pushed my own implementation. On minival it gets 40.1 on bbox and 34.4 on segm, with only a ResNet101 backbone (no FPN). This is slightly better than the paper's number.

ppwwyyxx commented on May 18, 2024 6

Btw the authors have also released the code and models (in caffe2):
https://github.com/facebookresearch/Detectron . It also has slightly better performance than the paper. I've updated my implementation to the official configurations (learning rate schedule, etc) and got the same performance.

ppwwyyxx commented on May 18, 2024 5

The pretrained model seems to be based on a ResNet101-FPN backbone, if I'm not mistaken. The paper's reference numbers for this architecture are segm 35.4 (on minival) and bbox 38.2 (on test-dev). According to @csnemes2's evaluation, the numbers here are segm 26.7 and bbox 29.8. This is a large gap IMHO, not 3 to 4 percentage points.

michaelisc commented on May 18, 2024 4

@waleedka Could you post the hyperparameters you used to get the reported performance somewhere? The results when I train the model look much worse, and with a model as complex as Mask R-CNN, a hyperparameter search takes a lot of time. So it would be great to know which configuration you used for the provided checkpoint and the posted results.

s-bayer commented on May 18, 2024 4

Training is done now. Final results are 32.6 for bbox and 30.1 for segm on 8 GPUs with a ResNet101 initialized with ResNet50 weights. There are also checkpoints with slightly better numbers (33.0 and 30.4), but I will just use the last one.

Not sure where the higher values from @waleedka come from, but I'm fairly sure it's not just changes to the learning schedule, but also changes to the rest of the code, e.g. not freezing batchnorm.

I use a base learning rate of 0.005 (which is fairly aggressive) and a batch size of 2 for an effective batch size of 16.
My schedule is as follows (mini-epochs are 1000 steps each; add up the epochs across all stages to get the total):

    trainer = Trainer(model, dataset_train, dataset_val, config, augmentation)

    trainer.train('Training Stage 1: only network heads',
                  num_epochs=1, layers='heads')

    trainer.train('Training Stage 2: warmup for network except resnet-50',
                  num_epochs=1, layers='4+', learning_rate_factor=0.01)

    trainer.train('Training Stage 3: train network except resnet-50',
                  num_epochs=5, layers='4+')

    trainer.train('Training Stage 4: warmup for everything',
                  num_epochs=1, layers='all', learning_rate_factor=0.01)

    trainer.train('Training Stage 5: train everything',
                  num_epochs=100, layers='all')

    trainer.train('Training Stage 6: fine-tune / 10',
                  num_epochs=20, layers='all', learning_rate_factor=0.1)

    trainer.train('Training Stage 7: fine-tune / 100',
                  num_epochs=10, layers='all', learning_rate_factor=0.01)

The final fine-tuning / 100 barely changes results, so 1-2 epochs of that is probably enough.
You could also remove around 20 or maybe even 30 epochs of stage 5, if you need to save on training time.

The last three stages are intentionally fairly similar to the s1x-schedule of Detectron, which is from the authors of the paper, but my learning rate is different and I don't use linear warmup.

Here is the trainer-class in case you need it:

class Trainer:
    """Thin wrapper around model.train() that tracks the running epoch
    total, since model.train() expects a cumulative epoch count."""

    def __init__(self, model, dataset_train, dataset_val, config, augmentation):
        self.model = model
        self.dataset_train = dataset_train
        self.dataset_val = dataset_val
        self.config = config
        self.augmentation = augmentation
        self.total_epochs = 0

    def train(self, message, num_epochs, layers, learning_rate_factor=1.0):
        # model.train() interprets `epochs` as a running total, so accumulate.
        self.total_epochs += num_epochs
        print(message + f' for {num_epochs} mini-epochs.')
        self.model.train(self.dataset_train, self.dataset_val,
                         learning_rate=self.config.LEARNING_RATE * learning_rate_factor,
                         epochs=self.total_epochs,
                         layers=layers,
                         augmentation=self.augmentation)
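
The `augmentation` object isn't shown above; as a minimal sketch, assuming the repository's usual imgaug integration, it could be as simple as:

    import imgaug.augmenters as iaa

    # Illustrative choice: horizontal flipping is the only augmentation
    # the Mask R-CNN paper itself uses.
    augmentation = iaa.Fliplr(0.5)  # flip 50% of training images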

And here are the loss curves. [The train-loss and val-loss plots were attached as images and are not reproduced here.]

You can clearly see that the first learning rate decrease is essential, while the second one is not necessary.

ppwwyyxx commented on May 18, 2024 3

@waleedka It improves a little bit (within 1 mAP). If your model is not initialized with a pretrained resnet101, that is probably the main reason.

jmtatsch commented on May 18, 2024 3

@waleedka there are converted resnet101 weights for Keras here: https://gist.github.com/flyyufelix/65018873f8cb2bbe95f429c474aa1294
They load just fine when you set use_bias=False in conv1, conv_block & identity_block. Apparently that's OK because the bias should be in the BN layers anyway: KaimingHe/deep-residual-networks#10
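
For anyone trying this, a minimal sketch of loading the converted weights, assuming the repository's flat layout (model.py at top level), with `config` and `MODEL_DIR` set up as in coco.py; the weights path is a hypothetical placeholder:

    import model as modellib  # matterport Mask_RCNN's model.py

    # Hypothetical path to the converted Keras ResNet-101 weights
    # from the gist linked above.
    RESNET101_WEIGHTS = '/path/to/resnet101_weights_tf.h5'

    model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)
    # by_name=True loads only layers whose names match, so the FPN and
    # head layers keep their fresh initialization.
    model.load_weights(RESNET101_WEIGHTS, by_name=True)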

jmtatsch commented on May 18, 2024 3

@waleedka could you please chime in on exactly how you trained your model?

Pelups commented on May 18, 2024 3

I think the default settings don't contain enough epochs.
In model.py (train method), you can read:

epochs: Number of training epochs. Note that previous training epochs
                are considered to be done already, so this actually determines
                the epochs to train in total rather than in this particular
                call.

This means that during stage 3 of training, for instance, you are not training for 160 epochs, but for 160 minus the number of epochs of the previous stages. So stage 3 only trains for 40 epochs.
The default settings therefore train for only 160 epochs in total. With 1000 steps per epoch and 1 image per GPU, it means that you are training on only 160,000 images.
The official results are obtained with 160,000 steps * 2 images per GPU * 8 GPUs = 2,560,000 images.

So I think that you need to train the model longer.
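
Concretely, a sketch of coco.py's default three-stage schedule under this cumulative-epochs convention (stage boundaries as in the repository's coco.py, to the best of my reading):

    # `epochs` is a cumulative target, not a per-call count.
    model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE,
                epochs=40, layers='heads')   # runs epochs 1-40
    model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE,
                epochs=120, layers='4+')     # runs epochs 41-120
    model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE / 10,
                epochs=160, layers='all')    # runs epochs 121-160, i.e. only 40 more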

But it would be interesting to know which settings were used by @waleedka ;)

waleedka commented on May 18, 2024 2

@jiang1st Keras provides trained resnet50 weights, but not resnet101. I could find resnet101 weights from another source, do a quick mapping of weight names, and load them; I just didn't get around to doing so. The residual connections in resnet make this less of an issue, other than longer fine-tuning time, and for most people starting from the COCO weights is a faster path. Still, I do want to add 101 weights when I get a chance.

Regarding the learning rate, I'm using 0.001 with a batch size of 16 (2 images per GPU on 8 GPUs). Typically you'd want to use a higher LR with bigger batches, but my observations in a few tests led me to believe that the gradients from different GPUs are being summed (rather than averaged). This is a guess, as I haven't verified it by tracing the internal implementation code, but if it's true then it means you could use the same LR regardless of the batch size.
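
To make the arithmetic explicit, a sketch under the two assumptions (the linear-scaling rule is the standard prescription when gradients are averaged):

    # If multi-GPU gradients are averaged, the linear scaling rule says
    #     lr = base_lr * batch_size / base_batch_size.
    # If they are summed, the summed gradient already grows with the
    # batch, so the base LR can stay fixed.
    base_lr, base_batch = 0.001, 2   # hypothetical single-GPU reference
    batch = 16                       # 2 images/GPU * 8 GPUs

    lr_if_averaged = base_lr * batch / base_batch  # 0.008
    lr_if_summed = base_lr                         # 0.001, unchanged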

s-bayer commented on May 18, 2024 2

Any updates on this? I'm trying to use this repository as a baseline for my thesis, which is impossible if the results are so far off from the original paper.
I'm also currently running experiments and will report back if I find a way to make it work, but as said above: experimenting on this is quite expensive and takes a lot of time.
@waleedka, I would really appreciate it if you could post your exact training code, so that I can use it :)

ypflll commented on May 18, 2024 1

@ppwwyyxx I see in Kaiming He's paper, for ResNet-101-FPN backbone, AP is 35.4 on 5k minival and 35.7 on COCO test-dev.
@jiang1st I just tested on minival; AP is 0.261 for segm and 0.298 for bbox.

@waleedka Hi, since we can't reproduce the results, can you post your results on coco? Thanks.

waleedka commented on May 18, 2024

Evaluation code against MS COCO is included in the repository, both for bounding boxes and segmentation masks, so it should be easy to run (but it takes a long time).

We should publish more details, though. Thanks for bringing it up. Our implementation deviates a bit from the paper (as mentioned in the documentation), and optimizing for COCO was a 'nice to have' rather than being the main objective. We got pretty close to the reported numbers (within 3 to 4 percentage points) but that was with half the training steps compared to the paper. We'll try to add more details over the next few days.

larsoncs commented on May 18, 2024

I tested with the pre-trained COCO weights. After evaluating segmentation masks, I found the result is not good. Could you tell me the reason?

yanxp commented on May 18, 2024

I trained on the COCO dataset and found the result was not good. How should I fine-tune or set the parameters? Why is that?

ppwwyyxx commented on May 18, 2024

Out of curiosity, what's the score people have seen on COCO?

vijaykbg commented on May 18, 2024

I am getting the following results on the COCO minival set with the provided model (mask_rcnn_coco.h5) :
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.241
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.419
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.250
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.091
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.273
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.216
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.304
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.309
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.114
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.346
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.437

waleedka commented on May 18, 2024

@vijaykbg That's a lot lower than what I got. Is this for bounding boxes or segmentation masks? I'll try to generate and share our results next week.

larsoncs commented on May 18, 2024

@waleedka It's great work; looking forward to next week's results and model weights file!

vijaykbg commented on May 18, 2024

@waleedka Thanks for the info and the code. I evaluated segmentation masks ("segm") using the evaluation code coco.py.
It would be really helpful if you could share the updated models and your results soon.
Thanks in advance!

csnemes2 commented on May 18, 2024

I installed this just now, and without any tweaks I run

python3 coco.py evaluate --dataset=/home/csn/COCO/ --model=./mask_rcnn_coco.h5

Which probably translates to

   # TODO: evaluating on 500 images. Set to 0 to evaluate on all images.
   evaluate_coco(model, dataset_val, coco, "bbox", limit=500)

Which finally gives me the following precisions:

Evaluate annotation type bbox
DONE (t=1.09s).
Accumulating evaluation results...
DONE (t=0.34s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.298
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.466
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.332
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.133
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.350
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.435
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.248
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.344
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.351
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.146
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.407
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.508

csnemes2 commented on May 18, 2024

And for "segm":

Evaluate annotation type *segm*
DONE (t=1.20s).
Accumulating evaluation results...
DONE (t=0.34s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.267
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.453
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.283
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.106
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.313
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.410
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.225
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.309
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.316
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.119
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.365
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.466

Walid-Ahmed commented on May 18, 2024

@csnemes2
How can I actually run "evaluate" for segm?
Thanks

ypflll commented on May 18, 2024

@Walid-Ahmed
Do you mean this:
evaluate_coco(model, dataset_val, coco, "segm", limit=500)?
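
And to evaluate the full set rather than the first 500 images, the TODO comment in coco.py quoted earlier says limit=0 evaluates everything:

    # In coco.py's evaluate branch; limit=0 means all images.
    evaluate_coco(model, dataset_val, coco, "bbox", limit=0)
    evaluate_coco(model, dataset_val, coco, "segm", limit=0)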

ypflll commented on May 18, 2024

Evaluation is really time-consuming, and it seems that my GPU is not used when the configuration is like this:
Configurations:
BACKBONE_SHAPES [[256 256]
[128 128]
[ 64 64]
[ 32 32]
[ 16 16]]
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [ 0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
GPU_COUNT 1
IMAGES_PER_GPU 1
......

@waleedka How can I change it to evaluate on the GPU, not the CPU?

ypflll commented on May 18, 2024

I had mistakenly installed the CPU version of TensorFlow.
With the GPU version, it takes 10 times less time ^_^
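
A quick sanity check before starting a long evaluation, using the TF 1.x API this repository targets:

    import tensorflow as tf

    # True only if a CUDA-enabled TensorFlow build can see a GPU.
    print(tf.test.is_gpu_available())
    print(tf.test.gpu_device_name())  # e.g. '/device:GPU:0'; empty on CPU-only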

jiang1st commented on May 18, 2024

Has anyone reproduced the result published in the original paper? I tried the code, but the model converges slowly. The mAP is around 29%.
Update: I evaluated the model on minival5k.

jiang1st commented on May 18, 2024

@ppwwyyxx It seems @csnemes2 only evaluated the top 500 images.

csnemes2 commented on May 18, 2024

@ppwwyyxx True

Usually there is no big difference, hence the success/popularity of minival.

Anyway I started a new evaluation on the whole dataset, but it will take a while

waleedka commented on May 18, 2024

This took a bit longer than I hoped, but I just uploaded a new trained model.

https://github.com/matterport/Mask_RCNN/releases/tag/v2.0

These are the results on the COCO 5K minival. Bounding box AP is 34.7 (the paper reports 38.2 on test-dev but doesn't report minival, unless I missed it). For segmentation, the AP is 29.6 (the paper reports 35.4 on minival). Playing with hyperparameters might allow you to get higher accuracy, but I didn't have enough time to try.

Evaluate annotation type *bbox*
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.347
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.377
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.163
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.390
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.486
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.295
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.424
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.433
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.214
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.601

Evaluate annotation type *segm*
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.296
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.510
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.306
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.128
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.330
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.430
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.258
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.369
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.376
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.173
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.417
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.538

jiang1st commented on May 18, 2024

@waleedka Thank you for your efforts. Another thing I want to mention: why are you loading parameters from a resnet50-based pretrained model (see model.get_imagenet_weights()) when training a resnet101-based network?
One more question: when you set the learning rate to 0.001, what is the actual batch size you are using? 16 (two images per GPU on 8 GPUs)? Thanks.

waleedka commented on May 18, 2024

@ppwwyyxx That's awesome. Do you think your custom implementation of crop_and_resize is the main factor in your performance? I've seen other implementations that use the standard crop_and_resize and still get very good accuracy. So I'm curious how big a difference that contributed to your results.

waleedka commented on May 18, 2024

@ppwwyyxx That makes sense, thanks! I'll train on resnet101 weights and see how much improvement that gives me. And if I get the chance, I want to try your implementation of ROI Align and see if that gives it additional boost as well.

jmtatsch commented on May 18, 2024

@waleedka I trained with your latest training schedule on a 4-GPU system but got far worse results than your v2 weights. How exactly did you train your v2 weights: starting from ImageNet? The same as your published training schedule? Is the difference between a 4- and an 8-GPU system, and the resulting larger batch size, really that big?

jiang1st commented on May 18, 2024

@jmtatsch I tried with 8 GPUs and the default configuration, and didn't reproduce the reported result (far below 34.7). Not sure where the problem could be.

fanglw commented on May 18, 2024

@waleedka I tried your code with one GPU (8 GB RAM), but I can't reproduce your result (all settings at default except one image per GPU). Could you write out how you trained your model? Thanks. Below is my result.

Running COCO evaluation on 500 images.
Loading and preparing results...
DONE (t=0.03s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=4.27s).
Accumulating evaluation results...
DONE (t=1.15s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.230
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.424
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.222
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.112
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.282
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.341
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.217
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.307
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.314
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.136
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.357
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.460
Prediction time: 157.76884293556213. Average 0.31553768587112424/image
Total time: 188.39889979362488
Loading and preparing results...
DONE (t=0.03s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type segm
DONE (t=4.90s).
Accumulating evaluation results...
DONE (t=0.80s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.209
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.393
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.199
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.087
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.254
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.325
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.200
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.279
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.285
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.111
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.327
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.426
Prediction time: 151.28928136825562. Average 0.30257856273651124/image
Total time: 180.4482204914093

realwecan commented on May 18, 2024

I was wondering if we had any better pretrained models with this matterport implementation so far? Do we have something with similar performance to the implementation from @ppwwyyxx?

fanglw commented on May 18, 2024

@Pelups , thanks. I will try to train more epochs to see what I can get.

xpngzhng commented on May 18, 2024

@jmtatsch @jiang1st Are you still working on training this version of the Mask R-CNN implementation? I tried training using 1 GPU, 4 GPUs, and 8 GPUs; at the end I show the bbox prediction performance on the 2014 minival dataset. It seems that 4-GPU training gives the best performance. In training, I only modified GPU_COUNT in config.py.

According to the official implementation of mask rcnn : detectron, multiple GPU training needs nontrivial modification to the code. Learning rate should change according to the number of GPUs, and there should be a warm up at the beginning of the training. See https://github.com/facebookresearch/Detectron/blob/master/GETTING_STARTED.md
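
As a rough sketch of what those rules amount to (linear LR scaling with GPU count plus a short linear warmup; the constants are illustrative, not Detectron's exact config):

    def lr_at_step(step, base_lr=0.0025, base_gpus=1, gpus=4,
                   warmup_steps=500, warmup_factor=1.0 / 3):
        """Linear scaling rule with linear warmup, in the spirit of Detectron."""
        target = base_lr * gpus / base_gpus      # scale LR with GPU count
        if step < warmup_steps:                  # ramp up from warmup_factor * target
            alpha = step / warmup_steps
            return target * (warmup_factor * (1 - alpha) + alpha)
        return target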

I wonder if anyone has delved deep into multi-GPU training of this code. @waleedka

1 gpu
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.212
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.405
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.203
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.104
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.249
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.339
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.200
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.286
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.293
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.127
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.329
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.437
4 gpu
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.245
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.456
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.234
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.125
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.283
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.394
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.221
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.313
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.321
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.157
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.349
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.478
8 gpu
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.242
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.446
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.235
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.122
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.271
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.399
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.217
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.306
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.312
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.147
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.331
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.476

John1231983 commented on May 18, 2024

As @ppwwyyxx said, the problem may come from the pretrained model. So why not write the resnet in model.py using tf-slim, so that the official tf-slim pretrained model can be used? Is that possible?

John1231983 commented on May 18, 2024

What does a full pretrained model mean in comparison with a resnet-only pretrained model?

John1231983 commented on May 18, 2024

So my question is: can we rewrite the resnet part of model.py using tf-slim, in order to use the official pretrained model from the tf-slim team?

John1231983 commented on May 18, 2024

I see what you said. So we first convert the pretrained resnet101 from tf-slim to Keras, and then we can train with the full pretrained model.

jmtatsch commented on May 18, 2024

@XupingZHENG With the 101-layer weights and waleed's training schedule I got very mediocre results. I pretty much doubled the training and it's still running.

endluo commented on May 18, 2024

I trained on coco2017 and the loss is 0.6, but the results I get are all zeros. I don't know why...
This is my command: CUDA_VISIBLE_DEVICES=1 python3 coco.py evaluate --dataset=/home/swland/coco/ --model=/home/swland/coco/mask_rcnn_coco_last.h5 --year=2017

Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_MAX_DIM 512
IMAGE_META_SIZE 18
IMAGE_MIN_DIM 512
IMAGE_MIN_SCALE 1
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [512 512 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_mask_loss': 1.0, 'rpn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 6
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001

Running per image evaluation...
Evaluate annotation type bbox
DONE (t=6.00s).
Accumulating evaluation results...
DONE (t=1.08s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Prediction time: 315.4280889034271. Average 0.6308561778068542/image
Total time: 428.98461985588074

s-bayer commented on May 18, 2024

I adjusted the training schedule to something more reasonable and am slowly getting somewhere. Currently bbox-AP is at 31.4 and segm-AP at 29.1 with a Resnet-101 on minival2014.

Still not great, but way better than what I got with the training schedule posted with this repository.
My training is still running and I should have final results in 1-2 days and will keep you updated.

I will of course post my training schedule eventually, but would like to wait until my thesis is done (October 1). If someone needs it earlier, just message me.

michaelisc commented on May 18, 2024

@syrix I would actually be very interested. I've also updated my schedule and get better results now, but they are still quite a bit below the official checkpoint (though I'm using a ResNet50).

stark-xf commented on May 18, 2024

I trained on coco2017 using coco.py, but got mAP=0.203. That is a far worse result; could you tell me the reason?

Command: train
Model: /home/lixiaofeng/project/version5/Mask_RCNN-master/logs/coco20181012T1737/mask_rcnn_coco_0020.h5
Dataset: /home/lixiaofeng/dataset/coco
Year: 2017
Logs: /home/lixiaofeng/project/version5/Mask_RCNN-master/logs
Auto Download: False

Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 4
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 4
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 93
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001

SG-97 commented on May 18, 2024

I'm running the evaluation but it seems to be stuck at "index created!", then nothing. I've waited for 12+, evaluating on just 100 images, yet the results aren't ready. Is this an error of some sort, or should I just persevere?

ashwin-999 commented on May 18, 2024

(Quoting @waleedka's comment above: "This took a bit longer than I hoped, but I just uploaded a new trained model. https://github.com/matterport/Mask_RCNN/releases/tag/v2.0", with bbox AP 34.7 and segm AP 29.6 on the COCO 5K minival.)

Hi, do you have any notes on the hyperparameters, data augmentation, optimizer, etc.?

I am working with the code as-is, no changes made, and am trying to train on the COCO 2014 dataset.
"bbox" results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.142
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.235
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.147
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.048
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.166
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.248
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.157
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.192
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.193
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.051
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.192
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.354
Prediction time: 173.85491013526917. Average 0.3477098202705383/image
Total time: 182.46349167823792

I have read through many issues here and have picked up various suggestions - use Adam, data augmentation, use ResNet50, lower the learning rate, train all layers directly without stage-wise training - to mention a few.

I have a couple of specific questions regarding training on Coco dataset.

  1. How many epochs (passes in which the model sees the whole dataset once) do you train for?
  2. What are the best train/val (absolute) loss values? (so I know what to expect from the training logs)
    The typical train loss values I see at my end are something like:
    loss: 1.8633 - rpn_class_loss: 0.0268 - rpn_bbox_loss: 0.7644 - mrcnn_class_loss: 0.3761 - mrcnn_bbox_loss: 0.4218 - mrcnn_mask_loss: 0.2742
    wherein only rpn_class_loss is small; the other loss values are relatively larger (as seen here).

Finally, I am wondering if you have any suggestions from your experiments on the right direction to get the model to converge.

ashwin-999 commented on May 18, 2024

@stark-xf Hi, did you figure out how to get the model trained properly?
