I would like to save the features of each detected bounding box for later use. How can

How to save features of each detected bounding box? about yad2k HOT 7 CLOSED

allanzelener commented on July 16, 2024 1

How to save features of each detected bounding box?

from yad2k.

Comments (7)

allanzelener commented on July 16, 2024 2

YOLO is designed to do one-shot classification of all objects in an image without using an attention mechanism. This is probably because YOLO is designed to maximize real-time performance.

Cropping the predictions for post-processing with another network is one solution; however, it is more efficient to crop out the features corresponding to a prediction. This is the region of interest (ROI) pooling approach used in Fast and Faster R-CNN. It also lets you train both region proposal and downstream tasks end-to-end.

I don't think Keras/TensorFlow has an official ROI pooling layer, but there have been some implementations shared in this Keras issue thread. I haven't tried them myself yet, though.


allanzelener commented on July 16, 2024 1

Not sure what you mean by "feature of each detected bounding box". Do you just mean the output coordinates? See test_yolo.py for how to run the model on a single image.

See the Keras FAQ on how to get a specific model layer's features: https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer


linamede commented on July 16, 2024 1

Thank you for your answer

It seems that I misunderstood the architecture of the framework. The features given from each layer are for the whole image and not for every box. What I will do is to crop the detected boxes and feed them to the network, to extract a feature from an intermediate layer for each of them. Thanks once again!


linamede commented on July 16, 2024

I was thinking about your suggestion to 'crop out the features corresponding to a prediction'.
Let's say that I want to extract features from layer 17, which produces an output of size (1,104,104,128).
If the input image is of size width x height = 640x480, and two detected boxes are
A=[xa,ya,wa,ha]=[10,5,35,105] and
B=[xb,yb,wb,hb]=[60,15,30,140].
These coordinates, scaled to the feature map, correspond to
A'=[xa',ya',wa',ha']=[1.62, 1.08, 5.68, 22.75] and
B'=[xb',yb',wb',hb']=[9.75, 3.25, 4.87, 30.33].
Now we see that A' and B' are not comparable, because they have different widths and heights.
One solution would be to resize them to a common size (104x104 for example), but this would add noise, right?
So I think it would be better to feed the cropped boxes back into the network and take the output of the 17th layer.
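
For reference, the feature-map coordinates quoted above can be reproduced with a short sketch. It assumes the image is stretched to the network input without letterboxing, so each axis is scaled independently by map size / image size; the function name `to_feature_map` is hypothetical and not part of yad2k.

```python
def to_feature_map(box, image_size, map_size):
    """Scale an [x, y, w, h] box from image pixels to feature-map cells.

    Assumes a plain per-axis linear rescale (no letterbox padding).
    """
    x, y, w, h = box
    image_w, image_h = image_size
    map_w, map_h = map_size
    scale_x = map_w / image_w  # 104 / 640 for the example in this comment
    scale_y = map_h / image_h  # 104 / 480 for the example in this comment
    return [x * scale_x, y * scale_y, w * scale_x, h * scale_y]


image_size = (640, 480)  # width, height of the input image
map_size = (104, 104)    # width, height of the layer-17 feature map

box_a = to_feature_map([10, 5, 35, 105], image_size, map_size)
box_b = to_feature_map([60, 15, 30, 140], image_size, map_size)

print("A' =", box_a)
print("B' =", box_b)
```

The two scaled boxes cover different numbers of feature-map cells, which is exactly why the raw crops cannot be compared directly.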


pribadihcr commented on July 16, 2024

+1, any temporary solution?


cygerts commented on July 16, 2024

This is what an ROI pooling layer does: no matter what the size of the cropped object is, the feature vector will have the same (fixed) length.
https://deepsense.ai/region-of-interest-pooling-explained/
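
To make the fixed-length property concrete, here is a minimal pure-Python sketch of ROI max pooling. It is an illustration only, not the implementation from yad2k, Keras, or the linked article: the function `roi_max_pool`, the nested-list `[row][column][channel]` layout, the 7x7 output grid, and the synthetic feature map are all assumptions made for this example. The box is snapped to whole cells, split into a fixed grid of bins, and the maximum of each bin is taken per channel.

```python
import math


def roi_max_pool(feature_map, box, out_h=7, out_w=7):
    """Max-pool the region `box` of `feature_map` into out_h x out_w bins.

    feature_map: nested lists indexed [row][column][channel].
    box: (x, y, w, h) in feature-map cells; values may be fractional.
    Returns a flat list of out_h * out_w * channels values.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    x, y, w, h = box

    # Snap the box to whole cells, clip it to the map, keep at least one cell.
    x0 = min(max(math.floor(x), 0), width - 1)
    y0 = min(max(math.floor(y), 0), height - 1)
    x1 = min(max(math.ceil(x + w), x0 + 1), width)
    y1 = min(max(math.ceil(y + h), y0 + 1), height)
    roi_h, roi_w = y1 - y0, x1 - x0

    pooled = []
    for i in range(out_h):
        # Bin i spans rows [floor(i * roi_h / out_h), ceil((i + 1) * roi_h / out_h)).
        r0 = y0 + (i * roi_h) // out_h
        r1 = y0 + ((i + 1) * roi_h + out_h - 1) // out_h
        for j in range(out_w):
            c0 = x0 + (j * roi_w) // out_w
            c1 = x0 + ((j + 1) * roi_w + out_w - 1) // out_w
            for k in range(channels):
                pooled.append(max(feature_map[r][c][k]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Synthetic 104x104 map with 2 channels (the real layer has 128).
fmap = [[[(r * 104 + c) % 7, (r + c) % 5] for c in range(104)]
        for r in range(104)]

feat_a = roi_max_pool(fmap, (1.62, 1.08, 5.68, 22.75))  # box A' from the thread
feat_b = roi_max_pool(fmap, (9.75, 3.25, 4.87, 30.33))  # box B' from the thread
print(len(feat_a), len(feat_b))  # both are 7 * 7 * 2 = 98 values long
```

Because the bins overlap when the region is smaller than the output grid, even a box covering a single cell still yields a full-length vector. Production implementations typically run on tensors and may interpolate instead of snapping to cells.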


qiaohong-li commented on July 16, 2024

+1, any temporary solution?

