Also, you need to set use_difficult=True and return_difficult=True, because Pascal VOC evaluation requires the difficulty annotations.
https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75#file-eval-faster-rcnn-py-L10
Did you get 70.3 mAP with eval_voc07.py, but 71.1 mAP with your script?
Thanks for reporting.
Is this a result reported by DetectionVOCEvaluator in train.py?
Which evaluation script did you use?
I wrote it myself, based on the training code.
https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75
@zori Why don't you use VOC07's metric?
Example code:
https://github.com/chainer/chainercv/blob/master/examples/detection/eval_voc07.py
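For context, the VOC2007 metric used by that script is the 11-point interpolated AP from the PASCAL VOC devkit: precision is sampled at recall thresholds 0.0, 0.1, ..., 1.0 and averaged. A minimal, self-contained sketch of that computation (not ChainerCV's implementation):

```python
def voc07_ap(recall, precision):
    """11-point interpolated AP as defined by the PASCAL VOC 2007 devkit.

    `recall` and `precision` are parallel sequences of operating points,
    e.g. one per detection when sweeping the score threshold.
    """
    ap = 0.0
    for i in range(11):                      # recall thresholds 0.0 .. 1.0
        t = i / 10.0
        # Interpolated precision: the best precision achieved at any
        # operating point whose recall reaches t (0 if none does).
        p = max((prec for rec, prec in zip(recall, precision) if rec >= t),
                default=0.0)
        ap += p / 11.0
    return ap

# Toy curve: perfect precision up to recall 0.5, nothing beyond.
print(voc07_ap([0.25, 0.5], [1.0, 1.0]))     # ≈ 0.545 (6 of 11 points reachable)
```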
Kudos for adding the ProgressHook with real-time fps! Using your script I got the following results, i.e. I reproduced the reported 70.3 mAP:
mAP: 0.703815
aeroplane: 0.728792
bicycle: 0.781839
bird: 0.689493
boat: 0.580245
bottle: 0.542269
bus: 0.762974
car: 0.800064
cat: 0.827440
chair: 0.530806
cow: 0.795550
diningtable: 0.673258
dog: 0.767966
horse: 0.797277
motorbike: 0.762030
person: 0.774788
pottedplant: 0.447154
sheep: 0.700003
sofa: 0.650613
train: 0.755741
tvmonitor: 0.707990
Using my script, modified as you suggested (use_07_metric=True and including difficult boxes; I had misunderstood what that would do and expected it to lower the mAP), https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75, I still get a lower mAP of 64.8, so I will try to spot the difference.
{'target/ap/aeroplane': 0.69227046387196767,
'target/ap/bicycle': 0.71394777425178191,
'target/ap/bird': 0.61498866519566731,
'target/ap/boat': 0.54880581204069478,
'target/ap/bottle': 0.48883048108296545,
'target/ap/bus': 0.70652445184562551,
'target/ap/car': 0.79961783751344018,
'target/ap/cat': 0.7926320482872764,
'target/ap/chair': 0.46797736195443013,
'target/ap/cow': 0.69721986728638619,
'target/ap/diningtable': 0.59980098919256719,
'target/ap/dog': 0.6963571015848693,
'target/ap/horse': 0.72264672006772912,
'target/ap/motorbike': 0.70002532901614711,
'target/ap/person': 0.70573380091904236,
'target/ap/pottedplant': 0.42114938608966307,
'target/ap/sheep': 0.61374799690307713,
'target/ap/sofa': 0.5651731051612594,
'target/ap/train': 0.75494337317327176,
'target/ap/tvmonitor': 0.67203969805997776,
'target/map': 0.64872161317489208}
Maybe it would be useful to add a link to the evaluation scripts in this section, https://github.com/chainer/chainercv/tree/master/examples/faster_rcnn#performance, so people don't start writing their own (possibly incorrect) evaluators.
I think the difference might come from the model using a different score_thresh for evaluation vs. visualization. Try model.use_preset('evaluate').
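The presets trade off differently: as far as I recall, 'visualize' uses a high score_thresh (around 0.7) so only confident boxes are drawn, while 'evaluate' lowers it (to about 0.05) so low-confidence detections still extend the precision-recall curve; check the FasterRCNN source for the exact values. A toy illustration of the effect:

```python
def filter_by_score(boxes, scores, score_thresh):
    """Keep only detections whose confidence reaches score_thresh."""
    return [(b, s) for b, s in zip(boxes, scores) if s >= score_thresh]

boxes = ['b0', 'b1', 'b2', 'b3']
scores = [0.95, 0.60, 0.30, 0.08]

# High threshold: clean visualizations, but the PR curve is cut short.
print(len(filter_by_score(boxes, scores, 0.7)))   # 1
# Low threshold: low-confidence detections survive and can still add recall.
print(len(filter_by_score(boxes, scores, 0.05)))  # 4
```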
Thanks for the help, I'll close that issue now.
Adding model.use_preset('evaluate') to my gist bumped the score to 71.1 mAP, which is even more than you reported! 🥇
{'target/ap/aeroplane': 0.72342105149596758,
'target/ap/bicycle': 0.77929621891057144,
'target/ap/bird': 0.72381003557002876,
'target/ap/boat': 0.58643679970722262,
'target/ap/bottle': 0.56510820347237467,
'target/ap/bus': 0.80929680938381243,
'target/ap/car': 0.79957630581849859,
'target/ap/cat': 0.82693536179619742,
'target/ap/chair': 0.53027815093665998,
'target/ap/cow': 0.78799813663608409,
'target/ap/diningtable': 0.67882123042487097,
'target/ap/dog': 0.79649319692517251,
'target/ap/horse': 0.79702688535698529,
'target/ap/motorbike': 0.76974911555409298,
'target/ap/person': 0.77391510272795971,
'target/ap/pottedplant': 0.44851681033208729,
'target/ap/sheep': 0.70565664017521923,
'target/ap/sofa': 0.64787553933910336,
'target/ap/train': 0.75494337317327176,
'target/ap/tvmonitor': 0.71522151624179597,
'target/map': 0.71101882419889884}
Yes, that's right. I was planning on figuring out why exactly. I've updated the gist to what I used for the evaluation.
I suspect that the batch size changed the output of the convolutions. Can you set it to 32 in your gist code?
Also, chainer.config.train = False does not appear in your gist code, but this is probably not the issue because there is no batch normalization.
EDIT: I realized that images are not batched when forwarded through the convolutions, even if they are passed to predict in batches. That means the batch size is not the cause of the difference. Instead, chainer.config.train = False is probably the cause: it changes the behavior of ProposalCreator. https://github.com/chainer/chainercv/blob/master/chainercv/links/model/faster_rcnn/utils/proposal_creator.py#L101
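The branch at the linked line keeps a different number of proposals before and after NMS depending on chainer.config.train. A schematic, pure-Python sketch of that logic (the counts 12000/2000 for training and 6000/300 for testing are the ProposalCreator defaults as I recall them; verify against the source):

```python
# Hypothetical stand-in for the train/test branch in ProposalCreator:
# many proposals are kept in train mode (ProposalTargetCreator needs
# candidates to sample from), far fewer in test mode for speed.
def proposal_counts(train,
                    n_train_pre_nms=12000, n_train_post_nms=2000,
                    n_test_pre_nms=6000, n_test_post_nms=300):
    if train:
        return n_train_pre_nms, n_train_post_nms
    return n_test_pre_nms, n_test_post_nms

print(proposal_counts(train=True))   # (12000, 2000)
print(proposal_counts(train=False))  # (6000, 300)
```

This is why leaving chainer.config.train at its default changes the detections: a different set of region proposals reaches the classification head.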
Using the ChainerCV evaluation script https://github.com/chainer/chainercv/blob/master/examples/detection/eval_voc07.py with my trained model, I got 0.703815 mAP.
Using all the settings you proposed, and changing to non-train mode, I reproduced exactly the result of the evaluation script (0.70381452150068202mAP) 🎉 https://gist.github.com/zori/6a1e9cac10b4ffcf601407cddda5cd75
# chainer.config.train = False, model.use_preset('evaluate'), use_difficult=True, return_difficult=True, use_07_metric=True
{'target/ap/aeroplane': 0.72879232069364264,
'target/ap/bicycle': 0.7818391153877351,
'target/ap/bird': 0.6894932410832586,
'target/ap/boat': 0.58024463343481369,
'target/ap/bottle': 0.54226872580937269,
'target/ap/bus': 0.76297390329751469,
'target/ap/car': 0.80006371632971629,
'target/ap/cat': 0.82743970304062942,
'target/ap/chair': 0.53080637265582875,
'target/ap/cow': 0.795549850859535,
'target/ap/diningtable': 0.67325826936140487,
'target/ap/dog': 0.76796552879063684,
'target/ap/horse': 0.79727656780461242,
'target/ap/motorbike': 0.76203001156652761,
'target/ap/person': 0.77478779928501851,
'target/ap/pottedplant': 0.44715398473512707,
'target/ap/sheep': 0.70000306866435569,
'target/ap/sofa': 0.65061283254689395,
'target/ap/train': 0.75574071803816101,
'target/ap/tvmonitor': 0.70799006662885811,
'target/map': 0.70381452150068202}
One thing I still wonder about regarding the VOC dataset: I thought that use_difficult=False as a VOCDetectionDataset option would give improved results (difficult bounding boxes would not be considered in the evaluation). But it actually yielded a slightly lower mAP (0.696):
# use_difficult=False, return_difficult=True, chainer.config.train = False, model.use_preset('evaluate'), use_07_metric=True
{'target/ap/aeroplane': 0.72868271753399505,
'target/ap/bicycle': 0.78164635163926977,
'target/ap/bird': 0.68639681313787682,
'target/ap/boat': 0.56111095740747918,
'target/ap/bottle': 0.54018898989758468,
'target/ap/bus': 0.76206103546449477,
'target/ap/car': 0.79554754529536775,
'target/ap/cat': 0.82671094720028693,
'target/ap/chair': 0.5142619273016299,
'target/ap/cow': 0.77286045805869263,
'target/ap/diningtable': 0.65993288409822526,
'target/ap/dog': 0.76555032538227552,
'target/ap/horse': 0.79347936557475029,
'target/ap/motorbike': 0.76010223958233081,
'target/ap/person': 0.77047906484839457,
'target/ap/pottedplant': 0.440254432579609,
'target/ap/sheep': 0.68870669608096324,
'target/ap/sofa': 0.63355841102101251,
'target/ap/train': 0.7551344575379958,
'target/ap/tvmonitor': 0.70322286913713505,
'target/map': 0.69699442443896842}
That means Faster R-CNN does slightly better on the VOC bounding boxes annotated as difficult than on the rest.
> One thing I still wonder about regarding the VOC dataset: I thought that use_difficult=False as a VOCDetectionDataset option would give improved results (difficult bounding boxes would not be considered in the evaluation).
This table shows how we compute precision. If use_difficult=False and a detector finds a difficult object, it is counted as a false positive; in this case, precision decreases.
| Prediction | use_difficult=False | use_difficult=True, return_difficult=False | use_difficult=True, return_difficult=True |
|---|---|---|---|
| Matched to easy GT | T | T | T |
| Matched to difficult GT | F | T | (don't care) |
| Not matched | F | F | F |
Also, this table shows how we compute recall. If return_difficult=False and a detector cannot find a difficult object, it is counted as a false negative; in this case, recall decreases.
| Ground truth | use_difficult=False | use_difficult=True, return_difficult=False | use_difficult=True, return_difficult=True |
|---|---|---|---|
| Easy and detected | P | P | P |
| Easy and not detected | N | N | N |
| Difficult and detected | - | P | (don't care) |
| Difficult and not detected | - | N | (don't care) |
In conclusion, use_difficult=True, return_difficult=True is the easiest setting.
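The two tables can be encoded as a small, hypothetical helper (not ChainerCV code) to make the scoring rules explicit:

```python
def prediction_label(match, use_difficult, return_difficult):
    """'T' (true positive), 'F' (false positive) or None (ignored),
    for a prediction given what it matched ('easy', 'difficult', None)."""
    if match == 'easy':
        return 'T'
    if match == 'difficult':
        if not use_difficult:
            return 'F'          # difficult GT is absent, so the match fails
        return None if return_difficult else 'T'
    return 'F'                  # unmatched prediction

def ground_truth_label(difficult, detected, use_difficult, return_difficult):
    """'P'/'N' for the recall computation, None if the box is excluded."""
    if difficult and (not use_difficult or return_difficult):
        return None             # removed from the dataset, or "don't care"
    return 'P' if detected else 'N'

# With use_difficult=True, return_difficult=True, difficult boxes are
# ignored on both sides, so they can neither hurt precision nor recall:
print(prediction_label('difficult', True, True))        # None
print(ground_truth_label(True, False, True, True))      # None
```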
Thank you, that really clarifies things!
> In conclusion, use_difficult=True, return_difficult=True is the easiest setting.
I think that's an important and non-trivial take-home message, as it might be difficult for people unfamiliar with the PASCAL evaluation to reach the correct conclusion. Would you consider adding it to the readme? https://github.com/chainer/chainercv/tree/master/examples/detection