
Comments (15)

Hakuyume commented on July 27, 2024

And you need to set use_difficult=True, return_difficult=True, because the Pascal VOC evaluation requires the difficulty information.
https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75#file-eval-faster-rcnn-py-L10

from chainercv.

yuyu2172 commented on July 27, 2024

Did you get 70.3 mAP with eval_voc07.py, but 71.1 mAP with your script?

yuyu2172 commented on July 27, 2024

Thanks for reporting.
Is this a result reported by DetectionVOCEvaluator in train.py?
Which evaluation script did you use?

zori commented on July 27, 2024

I wrote it myself, based on the training code.

https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75

Hakuyume commented on July 27, 2024

@zori Why don't you use VOC07's metric?

Hakuyume commented on July 27, 2024

Example code:
https://github.com/chainer/chainercv/blob/master/examples/detection/eval_voc07.py
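
The VOC07 metric referenced here is the 11-point interpolated average precision from the PASCAL VOC 2007 devkit. Below is a minimal NumPy sketch of just that interpolation step; the function name voc07_ap is mine, and chainercv's eval script performs the full box matching on top of this:

```python
import numpy as np

def voc07_ap(prec, rec):
    """11-point interpolated AP (PASCAL VOC 2007 metric).

    For each recall threshold t in {0.0, 0.1, ..., 1.0}, take the
    maximum precision over points whose recall is >= t, then average
    the 11 values.
    """
    prec = np.asarray(prec, dtype=float)
    rec = np.asarray(rec, dtype=float)
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = rec >= t
        p = prec[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap
```

A detector whose precision stays at 1.0 all the way to full recall scores 1.0; recall thresholds beyond the maximum achieved recall contribute zero precision, which is why truncating detections (e.g. with a high score threshold) lowers AP.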

zori commented on July 27, 2024

Kudos for adding the ProgressHook with real-time fps! Using your script I got the following results, i.e. reproduced the reported 70.3 mAP:

mAP: 0.703815
aeroplane: 0.728792
bicycle: 0.781839
bird: 0.689493
boat: 0.580245
bottle: 0.542269
bus: 0.762974
car: 0.800064
cat: 0.827440
chair: 0.530806
cow: 0.795550
diningtable: 0.673258
dog: 0.767966
horse: 0.797277
motorbike: 0.762030
person: 0.774788
pottedplant: 0.447154
sheep: 0.700003
sofa: 0.650613
train: 0.755741
tvmonitor: 0.707990

Using my script (https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75), modified as you suggested (use_07_metric=True and using the difficult flags; I had misunderstood what they do and expected a lower mAP with them), I still get a lower mAP of 64.8, so I will try to spot the difference.

{'target/ap/aeroplane': 0.69227046387196767,
'target/ap/bicycle': 0.71394777425178191,
'target/ap/bird': 0.61498866519566731,
'target/ap/boat': 0.54880581204069478,
'target/ap/bottle': 0.48883048108296545,
'target/ap/bus': 0.70652445184562551,
'target/ap/car': 0.79961783751344018,
'target/ap/cat': 0.7926320482872764,
'target/ap/chair': 0.46797736195443013,
'target/ap/cow': 0.69721986728638619,
'target/ap/diningtable': 0.59980098919256719,
'target/ap/dog': 0.6963571015848693,
'target/ap/horse': 0.72264672006772912,
'target/ap/motorbike': 0.70002532901614711,
'target/ap/person': 0.70573380091904236,
'target/ap/pottedplant': 0.42114938608966307,
'target/ap/sheep': 0.61374799690307713,
'target/ap/sofa': 0.5651731051612594,
'target/ap/train': 0.75494337317327176,
'target/ap/tvmonitor': 0.67203969805997776,
'target/map': 0.64872161317489208}

zori commented on July 27, 2024

Maybe it would be useful to add a link to the evaluation scripts in this section (https://github.com/chainer/chainercv/tree/master/examples/faster_rcnn#performance) so people don't start writing their own (incorrect) evaluators.

I think the difference might come from the model using a different score_thresh for evaluation versus visualization.

model.use_preset('evaluate')
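
For context, the preset mainly switches the score threshold used to filter detections. Below is a pure-Python sketch of the idea; the concrete values (nms_thresh=0.3 for both presets, score_thresh=0.7 for 'visualize' vs 0.05 for 'evaluate') are my reading of the chainercv FasterRCNN source at the time, so treat them as assumptions:

```python
# Sketch of FasterRCNN.use_preset; threshold values assumed from
# the chainercv source of this era, not guaranteed current.
PRESETS = {
    # 'visualize': keep only confident boxes, good for drawing.
    'visualize': {'nms_thresh': 0.3, 'score_thresh': 0.7},
    # 'evaluate': keep nearly everything; mAP integrates the full
    # precision-recall curve, so low-score detections still matter.
    'evaluate': {'nms_thresh': 0.3, 'score_thresh': 0.05},
}

def use_preset(preset):
    if preset not in PRESETS:
        raise ValueError("preset must be 'visualize' or 'evaluate'")
    return PRESETS[preset]
```

Evaluating with the 'visualize' threshold silently truncates the recall axis by discarding low-scoring true positives before matching, which is consistent with the lower mAP seen earlier in this thread.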

Thanks for the help; I'll close this issue now.

zori commented on July 27, 2024

Adding model.use_preset('evaluate') to my gist bumped the score to 71.1 mAP, which is even higher than what you reported! 🥇

{'target/ap/aeroplane': 0.72342105149596758,
'target/ap/bicycle': 0.77929621891057144,
'target/ap/bird': 0.72381003557002876,
'target/ap/boat': 0.58643679970722262,
'target/ap/bottle': 0.56510820347237467,
'target/ap/bus': 0.80929680938381243,
'target/ap/car': 0.79957630581849859,
'target/ap/cat': 0.82693536179619742,
'target/ap/chair': 0.53027815093665998,
'target/ap/cow': 0.78799813663608409,
'target/ap/diningtable': 0.67882123042487097,
'target/ap/dog': 0.79649319692517251,
'target/ap/horse': 0.79702688535698529,
'target/ap/motorbike': 0.76974911555409298,
'target/ap/person': 0.77391510272795971,
'target/ap/pottedplant': 0.44851681033208729,
'target/ap/sheep': 0.70565664017521923,
'target/ap/sofa': 0.64787553933910336,
'target/ap/train': 0.75494337317327176,
'target/ap/tvmonitor': 0.71522151624179597,
'target/map': 0.71101882419889884}

zori commented on July 27, 2024

Yes, that's right. I was planning on figuring out why exactly. I've updated the gist to what I used for the evaluation.

yuyu2172 commented on July 27, 2024

I suspect that the batch size changed the output of the convolutions.
Can you set it to 32 in your gist code?

Also, chainer.config.train = False is missing from your gist code, but this is probably not the issue because there is no batch normalization.

EDIT:
I realized that images are not batched when forwarded through the convolutions, even if they are passed to predict in batches. That means the batch size is not the cause of the difference.

Instead, chainer.config.train = False is probably the cause of the difference.
It changes the behavior of ProposalCreator. https://github.com/chainer/chainercv/blob/master/chainercv/links/model/faster_rcnn/utils/proposal_creator.py#L101
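
Concretely, that ProposalCreator branch selects different pre-/post-NMS proposal counts depending on chainer.config.train. A sketch under the usual Faster R-CNN defaults (12000/2000 for training, 6000/300 for testing); these numbers are my assumption from the paper and my reading of the chainercv source, so verify against the linked line:

```python
def proposal_counts(train,
                    n_train_pre_nms=12000, n_train_post_nms=2000,
                    n_test_pre_nms=6000, n_test_post_nms=300):
    # Mirrors the branch in ProposalCreator.__call__ that reads
    # chainer.config.train: train mode keeps many proposals for RPN
    # sampling, test mode keeps only the top few hundred after NMS.
    if train:
        return n_train_pre_nms, n_train_post_nms
    return n_test_pre_nms, n_test_post_nms
```

Since chainer.config.train defaults to True, forgetting to disable it at evaluation time runs the detector with the train-mode proposal counts, which is enough to shift the final mAP.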

zori commented on July 27, 2024

Using the chainerCV evaluation script https://github.com/chainer/chainercv/blob/master/examples/detection/eval_voc07.py with my trained model I got 0.703815 mAP.

Using all the settings you proposed, and changing to non-train mode, I reproduced exactly the result of the evaluation script (0.70381452150068202mAP) 🎉 https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75

# chainer.config.train = False, model.use_preset('evaluate'), use_difficult=True, return_difficult=True, use_07_metric=True

{'target/ap/aeroplane': 0.72879232069364264,
'target/ap/bicycle': 0.7818391153877351,
'target/ap/bird': 0.6894932410832586,
'target/ap/boat': 0.58024463343481369,
'target/ap/bottle': 0.54226872580937269,
'target/ap/bus': 0.76297390329751469,
'target/ap/car': 0.80006371632971629,
'target/ap/cat': 0.82743970304062942,
'target/ap/chair': 0.53080637265582875,
'target/ap/cow': 0.795549850859535,
'target/ap/diningtable': 0.67325826936140487,
'target/ap/dog': 0.76796552879063684,
'target/ap/horse': 0.79727656780461242,
'target/ap/motorbike': 0.76203001156652761,
'target/ap/person': 0.77478779928501851,
'target/ap/pottedplant': 0.44715398473512707,
'target/ap/sheep': 0.70000306866435569,
'target/ap/sofa': 0.65061283254689395,
'target/ap/train': 0.75574071803816101,
'target/ap/tvmonitor': 0.70799006662885811,
'target/map': 0.70381452150068202}
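
As a quick sanity check on the numbers above, target/map is just the unweighted mean of the 20 per-class APs:

```python
# Per-class APs copied from the evaluation output above.
aps = {
    'aeroplane': 0.72879232069364264, 'bicycle': 0.7818391153877351,
    'bird': 0.6894932410832586, 'boat': 0.58024463343481369,
    'bottle': 0.54226872580937269, 'bus': 0.76297390329751469,
    'car': 0.80006371632971629, 'cat': 0.82743970304062942,
    'chair': 0.53080637265582875, 'cow': 0.795549850859535,
    'diningtable': 0.67325826936140487, 'dog': 0.76796552879063684,
    'horse': 0.79727656780461242, 'motorbike': 0.76203001156652761,
    'person': 0.77478779928501851, 'pottedplant': 0.44715398473512707,
    'sheep': 0.70000306866435569, 'sofa': 0.65061283254689395,
    'train': 0.75574071803816101, 'tvmonitor': 0.70799006662885811,
}
mean_ap = sum(aps.values()) / len(aps)  # matches target/map above
```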

zori commented on July 27, 2024

One thing I still wonder about, related to the VOC dataset: I thought that use_difficult=False as a VOCDetectionDataset option would give improved results (difficult bounding boxes would not be considered in the evaluation). But it actually yielded a slightly lower mAP (0.696):

# use_difficult=False, return_difficult=True, chainer.config.train = False, model.use_preset('evaluate'), use_07_metric=True

{'target/ap/aeroplane': 0.72868271753399505,
'target/ap/bicycle': 0.78164635163926977,
'target/ap/bird': 0.68639681313787682,
'target/ap/boat': 0.56111095740747918,
'target/ap/bottle': 0.54018898989758468,
'target/ap/bus': 0.76206103546449477,
'target/ap/car': 0.79554754529536775,
'target/ap/cat': 0.82671094720028693,
'target/ap/chair': 0.5142619273016299,
'target/ap/cow': 0.77286045805869263,
'target/ap/diningtable': 0.65993288409822526,
'target/ap/dog': 0.76555032538227552,
'target/ap/horse': 0.79347936557475029,
'target/ap/motorbike': 0.76010223958233081,
'target/ap/person': 0.77047906484839457,
'target/ap/pottedplant': 0.440254432579609,
'target/ap/sheep': 0.68870669608096324,
'target/ap/sofa': 0.63355841102101251,
'target/ap/train': 0.7551344575379958,
'target/ap/tvmonitor': 0.70322286913713505,
'target/map': 0.69699442443896842}

That would mean Faster R-CNN does slightly better on the VOC bounding boxes annotated as difficult than on the rest.

Hakuyume commented on July 27, 2024

One thing I still wonder about, related to the VOC dataset: I thought that use_difficult=False as a VOCDetectionDataset option would give improved results (difficult bounding boxes would not be considered in the evaluation).

This table shows how we compute precision. If use_difficult=False and a detector finds a difficult object, it is counted as a false positive, so precision decreases.

Prediction              | use_difficult=False | use_difficult=True, return_difficult=False | use_difficult=True, return_difficult=True
Matched to easy GT      | T                   | T                                          | T
Matched to difficult GT | F                   | T                                          | (don't care)
Not matched             | F                   | F                                          | F

Also, this table shows how we compute recall. If return_difficult=False and a detector cannot find a difficult object, it is counted as a false negative, so recall decreases.

Ground truth               | use_difficult=False | use_difficult=True, return_difficult=False | use_difficult=True, return_difficult=True
Easy and detected          | P                   | P                                          | P
Easy and not detected      | N                   | N                                          | N
Difficult and detected     | -                   | P                                          | (don't care)
Difficult and not detected | -                   | N                                          | (don't care)
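
The two tables can be condensed into a small sketch of how a single detection and a single ground-truth box are scored under each setting (the function names are mine, not chainercv's):

```python
def score_prediction(match, use_difficult, return_difficult):
    """Return 'TP', 'FP', or 'ignore' for one detection.

    match is 'easy', 'difficult', or None (unmatched). With
    use_difficult=False, difficult boxes are absent from the ground
    truth, so a detection of one simply fails to match (FP).
    """
    if match == 'easy':
        return 'TP'
    if match == 'difficult':
        if not use_difficult:
            return 'FP'          # difficult GT was dropped -> no match
        return 'ignore' if return_difficult else 'TP'
    return 'FP'                  # matched nothing

def score_ground_truth(difficult, detected, use_difficult, return_difficult):
    """Return 'P' (counts toward recall), 'N' (missed), or 'ignore'."""
    if difficult:
        if not use_difficult:
            return 'ignore'      # box not in the dataset at all
        if return_difficult:
            return 'ignore'      # flagged, so the evaluator skips it
        return 'P' if detected else 'N'
    return 'P' if detected else 'N'
```

With use_difficult=True, return_difficult=True, difficult objects never count for or against the detector, matching the "don't care" behavior of the official devkit.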

In conclusion, use_difficult=True, return_difficult=True is the easiest setting.

zori commented on July 27, 2024

Thank you, that really clarifies things!

In conclusion, use_difficult=True, return_difficult=True is the easiest setting.

I think that's an important and non-trivial take-home message, as it might be difficult for people unfamiliar with the PASCAL evaluation to reach the correct conclusion. Would you consider adding it to the readme? https://github.com/chainer/chainercv/tree/master/examples/detection
