Also, you need to set use_difficult=True and return_difficult=True, because Pascal VOC evaluation requires the difficulty annotations.
https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75#file-eval-faster-rcnn-py-L10
Did you get 70.3 mAP with eval_voc07.py, but 71.1 mAP with your script?
Thanks for reporting.
Is this a result reported by DetectionVOCEvaluator in train.py?
Which evaluation script did you use?
I wrote it myself, based on the training code.
https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75
@zori Why don't you use VOC07's metric?
Example code:
https://github.com/chainer/chainercv/blob/master/examples/detection/eval_voc07.py
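For context, the VOC2007 metric used by that script is the 11-point interpolated AP from the PASCAL VOC devkit: precision is sampled at recall thresholds 0.0, 0.1, ..., 1.0 and averaged. A minimal, self-contained sketch of that computation (not ChainerCV's implementation):

```python
def voc07_ap(recall, precision):
    """11-point interpolated AP as defined by the PASCAL VOC 2007 devkit.

    `recall` and `precision` are parallel sequences of operating points,
    e.g. one per detection when sweeping the score threshold.
    """
    ap = 0.0
    for i in range(11):                      # recall thresholds 0.0 .. 1.0
        t = i / 10.0
        # Interpolated precision: the best precision achieved at any
        # operating point whose recall reaches t (0 if none does).
        p = max((prec for rec, prec in zip(recall, precision) if rec >= t),
                default=0.0)
        ap += p / 11.0
    return ap

# Toy curve: perfect precision up to recall 0.5, nothing beyond.
print(voc07_ap([0.25, 0.5], [1.0, 1.0]))     # ≈ 0.545 (6 of 11 points reachable)
```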
Kudos for adding the ProgressHook with real-time fps! Using your script I got the following results, i.e. I reproduced the reported 70.3 mAP:
mAP: 0.703815
aeroplane: 0.728792
bicycle: 0.781839
bird: 0.689493
boat: 0.580245
bottle: 0.542269
bus: 0.762974
car: 0.800064
cat: 0.827440
chair: 0.530806
cow: 0.795550
diningtable: 0.673258
dog: 0.767966
horse: 0.797277
motorbike: 0.762030
person: 0.774788
pottedplant: 0.447154
sheep: 0.700003
sofa: 0.650613
train: 0.755741
tvmonitor: 0.707990
Using my script, modified as you suggested (use_07_metric=True and including difficult boxes; I had misunderstood what that would do and expected it to lower the mAP), https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75, I still get a lower mAP of 64.8, so I will try to spot the difference.
{'target/ap/aeroplane': 0.69227046387196767,
'target/ap/bicycle': 0.71394777425178191,
'target/ap/bird': 0.61498866519566731,
'target/ap/boat': 0.54880581204069478,
'target/ap/bottle': 0.48883048108296545,
'target/ap/bus': 0.70652445184562551,
'target/ap/car': 0.79961783751344018,
'target/ap/cat': 0.7926320482872764,
'target/ap/chair': 0.46797736195443013,
'target/ap/cow': 0.69721986728638619,
'target/ap/diningtable': 0.59980098919256719,
'target/ap/dog': 0.6963571015848693,
'target/ap/horse': 0.72264672006772912,
'target/ap/motorbike': 0.70002532901614711,
'target/ap/person': 0.70573380091904236,
'target/ap/pottedplant': 0.42114938608966307,
'target/ap/sheep': 0.61374799690307713,
'target/ap/sofa': 0.5651731051612594,
'target/ap/train': 0.75494337317327176,
'target/ap/tvmonitor': 0.67203969805997776,
'target/map': 0.64872161317489208}
Maybe it would be useful to add a link to the evaluation scripts in this section, https://github.com/chainer/chainercv/tree/master/examples/faster_rcnn#performance, so people don't start writing their own (possibly incorrect) evaluators.
I think the difference might come from the model using a different score_thresh for evaluation vs. visualization. Try model.use_preset('evaluate').
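The presets trade off differently: as far as I recall, 'visualize' uses a high score_thresh (around 0.7) so only confident boxes are drawn, while 'evaluate' lowers it (to about 0.05) so low-confidence detections still extend the precision-recall curve; check the FasterRCNN source for the exact values. A toy illustration of the effect:

```python
def filter_by_score(boxes, scores, score_thresh):
    """Keep only detections whose confidence reaches score_thresh."""
    return [(b, s) for b, s in zip(boxes, scores) if s >= score_thresh]

boxes = ['b0', 'b1', 'b2', 'b3']
scores = [0.95, 0.60, 0.30, 0.08]

# High threshold: clean visualizations, but the PR curve is cut short.
print(len(filter_by_score(boxes, scores, 0.7)))   # 1
# Low threshold: low-confidence detections survive and can still add recall.
print(len(filter_by_score(boxes, scores, 0.05)))  # 4
```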
Thanks for the help, I'll close that issue now.
Adding model.use_preset('evaluate') to my gist bumped the score to 71.1 mAP, which is even more than you reported! 🥇
{'target/ap/aeroplane': 0.72342105149596758,
'target/ap/bicycle': 0.77929621891057144,
'target/ap/bird': 0.72381003557002876,
'target/ap/boat': 0.58643679970722262,
'target/ap/bottle': 0.56510820347237467,
'target/ap/bus': 0.80929680938381243,
'target/ap/car': 0.79957630581849859,
'target/ap/cat': 0.82693536179619742,
'target/ap/chair': 0.53027815093665998,
'target/ap/cow': 0.78799813663608409,
'target/ap/diningtable': 0.67882123042487097,
'target/ap/dog': 0.79649319692517251,
'target/ap/horse': 0.79702688535698529,
'target/ap/motorbike': 0.76974911555409298,
'target/ap/person': 0.77391510272795971,
'target/ap/pottedplant': 0.44851681033208729,
'target/ap/sheep': 0.70565664017521923,
'target/ap/sofa': 0.64787553933910336,
'target/ap/train': 0.75494337317327176,
'target/ap/tvmonitor': 0.71522151624179597,
'target/map': 0.71101882419889884}
Yes, that's right. I was planning on figuring out why exactly. I've updated the gist to what I used for the evaluation.
I suspect that the batch size changed the output of the convolutions. Can you set it to 32 in your gist code?
Also, chainer.config.train = False does not appear in your gist code, but this is probably not the issue because there is no batch normalization.
EDIT: I realized that images are not batched when forwarded through the convolutions, even if they are passed to predict in batches. That means the batch size is not the cause of the difference. Instead, chainer.config.train = False is probably the cause: it changes the behavior of ProposalCreator. https://github.com/chainer/chainercv/blob/master/chainercv/links/model/faster_rcnn/utils/proposal_creator.py#L101
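The branch at the linked line keeps a different number of proposals before and after NMS depending on chainer.config.train. A schematic, pure-Python sketch of that logic (the counts 12000/2000 for training and 6000/300 for testing are the ProposalCreator defaults as I recall them; verify against the source):

```python
# Hypothetical stand-in for the train/test branch in ProposalCreator:
# many proposals are kept in train mode (ProposalTargetCreator needs
# candidates to sample from), far fewer in test mode for speed.
def proposal_counts(train,
                    n_train_pre_nms=12000, n_train_post_nms=2000,
                    n_test_pre_nms=6000, n_test_post_nms=300):
    if train:
        return n_train_pre_nms, n_train_post_nms
    return n_test_pre_nms, n_test_post_nms

print(proposal_counts(train=True))   # (12000, 2000)
print(proposal_counts(train=False))  # (6000, 300)
```

This is why leaving chainer.config.train at its default changes the detections: a different set of region proposals reaches the classification head.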
Using the ChainerCV evaluation script https://github.com/chainer/chainercv/blob/master/examples/detection/eval_voc07.py with my trained model, I got 0.703815 mAP.
Using all the settings you proposed, and changing to non-train mode, I reproduced exactly the result of the evaluation script (0.70381452150068202mAP) 🎉 https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75
# chainer.config.train = False, model.use_preset('evaluate'), use_difficult=True, return_difficult=True, use_07_metric=True
{'target/ap/aeroplane': 0.72879232069364264,
'target/ap/bicycle': 0.7818391153877351,
'target/ap/bird': 0.6894932410832586,
'target/ap/boat': 0.58024463343481369,
'target/ap/bottle': 0.54226872580937269,
'target/ap/bus': 0.76297390329751469,
'target/ap/car': 0.80006371632971629,
'target/ap/cat': 0.82743970304062942,
'target/ap/chair': 0.53080637265582875,
'target/ap/cow': 0.795549850859535,
'target/ap/diningtable': 0.67325826936140487,
'target/ap/dog': 0.76796552879063684,
'target/ap/horse': 0.79727656780461242,
'target/ap/motorbike': 0.76203001156652761,
'target/ap/person': 0.77478779928501851,
'target/ap/pottedplant': 0.44715398473512707,
'target/ap/sheep': 0.70000306866435569,
'target/ap/sofa': 0.65061283254689395,
'target/ap/train': 0.75574071803816101,
'target/ap/tvmonitor': 0.70799006662885811,
'target/map': 0.70381452150068202}
One thing I still wonder about regarding the VOC dataset: I thought that use_difficult=False as a VOCDetectionDataset option would give improved results (difficult bounding boxes would not be considered in the evaluation). But it actually yielded a slightly lower mAP (0.696):
# use_difficult=False, return_difficult=True, chainer.config.train = False, model.use_preset('evaluate'), use_07_metric=True
{'target/ap/aeroplane': 0.72868271753399505,
'target/ap/bicycle': 0.78164635163926977,
'target/ap/bird': 0.68639681313787682,
'target/ap/boat': 0.56111095740747918,
'target/ap/bottle': 0.54018898989758468,
'target/ap/bus': 0.76206103546449477,
'target/ap/car': 0.79554754529536775,
'target/ap/cat': 0.82671094720028693,
'target/ap/chair': 0.5142619273016299,
'target/ap/cow': 0.77286045805869263,
'target/ap/diningtable': 0.65993288409822526,
'target/ap/dog': 0.76555032538227552,
'target/ap/horse': 0.79347936557475029,
'target/ap/motorbike': 0.76010223958233081,
'target/ap/person': 0.77047906484839457,
'target/ap/pottedplant': 0.440254432579609,
'target/ap/sheep': 0.68870669608096324,
'target/ap/sofa': 0.63355841102101251,
'target/ap/train': 0.7551344575379958,
'target/ap/tvmonitor': 0.70322286913713505,
'target/map': 0.69699442443896842}
That means Faster R-CNN does slightly better on the VOC bounding boxes annotated as difficult than on the rest.
> One thing I still wonder about regarding the VOC dataset: I thought that use_difficult=False as a VOCDetectionDataset option would give improved results (difficult bounding boxes would not be considered in the evaluation).
This table shows how we compute precision. If use_difficult=False and a detector finds a difficult object, it is counted as a false positive; in this case, precision decreases.
| Prediction | use_difficult=False | use_difficult=True, return_difficult=False | use_difficult=True, return_difficult=True |
|---|---|---|---|
| Matched to easy GT | T | T | T |
| Matched to difficult GT | F | T | (don't care) |
| Not matched | F | F | F |
Also, this table shows how we compute recall. If return_difficult=False and a detector cannot find a difficult object, it is counted as a false negative; in this case, recall decreases.
| Ground truth | use_difficult=False | use_difficult=True, return_difficult=False | use_difficult=True, return_difficult=True |
|---|---|---|---|
| Easy and detected | P | P | P |
| Easy and not detected | N | N | N |
| Difficult and detected | - | P | (don't care) |
| Difficult and not detected | - | N | (don't care) |
In conclusion, use_difficult=True, return_difficult=True is the easiest setting.
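The two tables can be encoded as a small, hypothetical helper (not ChainerCV code) to make the scoring rules explicit:

```python
def prediction_label(match, use_difficult, return_difficult):
    """'T' (true positive), 'F' (false positive) or None (ignored),
    for a prediction given what it matched ('easy', 'difficult', None)."""
    if match == 'easy':
        return 'T'
    if match == 'difficult':
        if not use_difficult:
            return 'F'          # difficult GT is absent, so the match fails
        return None if return_difficult else 'T'
    return 'F'                  # unmatched prediction

def ground_truth_label(difficult, detected, use_difficult, return_difficult):
    """'P'/'N' for the recall computation, None if the box is excluded."""
    if difficult and (not use_difficult or return_difficult):
        return None             # removed from the dataset, or "don't care"
    return 'P' if detected else 'N'

# With use_difficult=True, return_difficult=True, difficult boxes are
# ignored on both sides, so they can neither hurt precision nor recall:
print(prediction_label('difficult', True, True))        # None
print(ground_truth_label(True, False, True, True))      # None
```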
Thank you, that really clarifies things!
> In conclusion, use_difficult=True, return_difficult=True is the easiest setting.
I think that's an important and non-trivial take-home message, as it might be difficult for people unfamiliar with the PASCAL evaluation to reach the correct conclusion. Would you consider adding it to the readme? https://github.com/chainer/chainercv/tree/master/examples/detection