Comments (7)
YOLO is designed to do one-shot classification of all objects in an image without using an attention mechanism. This is probably because YOLO is designed to maximize real-time performance.
Cropping the predictions and post-processing them with another network is one solution; however, it is more efficient to crop out the features corresponding to a prediction. This is the region of interest (ROI) pooling approach used in Fast and Faster R-CNN. It also lets you train both the region proposal and the downstream tasks end-to-end.
I don't think Keras/TensorFlow has an official ROI pooling layer, but some implementations have been shared in this Keras issue thread. I haven't tried them myself yet, though.
from yad2k.
Not sure what you mean by "feature of each detected bounding box". Do you just mean the output coordinates? See test_yolo.py for how to run the model on a single image.
See the Keras FAQ on how to get a specific model layer's features: https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer
Thank you for your answer
It seems that I misunderstood the architecture of the framework. The features given from each layer are for the whole image and not for every box. What I will do is to crop the detected boxes and feed them to the network, to extract a feature from an intermediate layer for each of them. Thanks once again!
I was thinking about your suggestion to "crop out the features corresponding to a prediction".
Let's say that I want to extract features from layer 17, which produces an output of size (1, 104, 104, 128).
If the input image is of size width x height = 640x480, and the two detected boxes are
A=[xa,ya,wa,ha]=[10,5,35,105] and
B=[xb,yb,wb,hb]=[60,15,30,140],
then these coordinates, scaled to the feature map (x and width by 104/640, y and height by 104/480), correspond to
A'=[xa',ya',wa',ha']=[1.62, 1.08, 5.68, 22.75] and
B'=[xb',yb',wb',hb']=[9.75, 3.25, 4.87, 30.33].
Now we see that A' and B' are not directly comparable, because they have different widths and heights.
One solution would be to resize them to a common size (104x104, for example), but this would add noise, right?
So I think it would be better to feed the cropped boxes back into the network and take the output of the 17th layer.
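For anyone checking the arithmetic above, here is a minimal sketch of the box scaling in plain Python. The helper name `scale_box_to_feature_map` is made up for illustration and is not part of yad2k. It also assumes the image is stretched straight onto the 104x104 feature map, with no letterboxing or padding.

```python
def scale_box_to_feature_map(box, image_size, feature_size):
    """Map an [x, y, w, h] box from image pixels to feature-map cells.

    Hypothetical helper: assumes the image is stretched directly onto the
    feature map, so x/width and y/height each get one scale factor.
    """
    x, y, w, h = box
    image_w, image_h = image_size
    feat_w, feat_h = feature_size
    scale_x = feat_w / image_w  # 104 / 640 in the example above
    scale_y = feat_h / image_h  # 104 / 480 in the example above
    return [x * scale_x, y * scale_y, w * scale_x, h * scale_y]


# Boxes A and B from the comment above, on a 640x480 image.
box_a = scale_box_to_feature_map([10, 5, 35, 105], (640, 480), (104, 104))
box_b = scale_box_to_feature_map([60, 15, 30, 140], (640, 480), (104, 104))
print(box_a)
print(box_b)
```

The results are fractional and differ in size per box, which is exactly the problem raised above: the crops cannot be compared until they are brought to a common shape.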
+1, any temporary solution?
This is what an ROI pooling layer does: no matter what the size of the cropped object is, the feature vector will have the same (fixed) length.
https://deepsense.ai/region-of-interest-pooling-explained/
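To make the fixed-length point concrete, here is a rough sketch of ROI max pooling in plain Python on a single-channel feature map. It is not the yad2k or Keras implementation; the function name `roi_max_pool`, the rounding choices (floor the start, ceil the end), and the 2x2 output size are all assumptions chosen for illustration. A real layer would do the same thing per channel on a tensor.

```python
import math


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one ROI of a 2D feature map down to a fixed output size.

    feature_map: list of rows (single channel, for simplicity).
    roi: [x, y, w, h] in feature-map coordinates, possibly fractional.
    """
    height, width = len(feature_map), len(feature_map[0])
    x, y, w, h = roi
    # Quantize the ROI to whole cells and clamp it to the map,
    # keeping it at least one cell wide and tall.
    x0 = max(0, min(math.floor(x), width - 1))
    y0 = max(0, min(math.floor(y), height - 1))
    x1 = max(x0 + 1, min(math.ceil(x + w), width))
    y1 = max(y0 + 1, min(math.ceil(y + h), height))
    roi_h, roi_w = y1 - y0, x1 - x0
    out_h, out_w = output_size
    pooled = []
    for i in range(out_h):
        # Row range of bin i; bins may overlap when the ROI is tiny.
        ys = y0 + (i * roi_h) // out_h
        ye = max(ys + 1, y0 + ceil_div((i + 1) * roi_h, out_h))
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = max(xs + 1, x0 + ceil_div((j + 1) * roi_w, out_w))
            row.append(max(feature_map[r][c]
                           for r in range(ys, ye)
                           for c in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 6x8 feature map whose value encodes its position (row * 10 + column).
toy_map = [[r * 10 + c for c in range(8)] for r in range(6)]
small = roi_max_pool(toy_map, [1.6, 1.1, 2.7, 3.5])
large = roi_max_pool(toy_map, [0, 0, 8, 6])
print(small, large)
```

Both ROIs come out as 2x2 grids even though they cover different areas of the map, so the flattened vectors have the same length and can be compared directly.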
+1, any temporary solution?