Comments (7)

rykov8 commented on September 4, 2024

@natlachaman Hi!
I'm not sure exactly how min_size and max_size correspond to the paper. It really seems that the authors changed the choice of default boxes in one of the architecture versions but didn't update the explanation in the paper. What is certain is that min_size and max_size are in pixels, both in my implementation and in the original one (the original Caffe implementation uses the same pixel values). To make a long story short, the paper describes one way of choosing priors, but the authors implement another one, which is what is ported here.
I am not sure whether SSD is able to detect such small objects. You probably need to change the architecture, e.g. take features from the conv3 block, to be able to deal with them (though whether that helps is an open question). Moreover, you can check the latest revision of the paper, where the authors introduce a data augmentation technique that turned out to be useful for small objects.
Actually, this port is a little outdated, because the authors keep trying different architectures and adding improvements. Fortunately, all of their current improvements can be added to my port quite easily if one wants to.
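
For reference, the rule described in the paper places the scales of the default boxes evenly between s_min = 0.2 and s_max = 0.9 of the input size. Below is a minimal sketch of that rule converted to pixels for a 300x300 input, so the result can be compared with the min_size values actually used in the model definition. The function names are made up for illustration and are not part of the port.

```python
def paper_scales(num_maps, s_min=0.2, s_max=0.9):
    """Scales s_k from the SSD paper, as fractions of the input size."""
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def scales_to_pixels(scales, input_size=300):
    """Convert relative scales to box sizes in pixels (hypothetical helper)."""
    return [round(s * input_size, 1) for s in scales]


scales = paper_scales(6)             # six prediction layers, as in SSD300
pixel_sizes = scales_to_pixels(scales)
print(pixel_sizes)
```

If the printed sizes differ from the min_size values in the code, that is the mismatch between paper and implementation discussed above.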

from ssd_keras.

natlachaman commented on September 4, 2024

@rykov8 Thanks for your quick reply!
It just seems odd to me: the ground truth coordinates are passed as relative coordinates, so they are independent of the image size, but then you need to tune min_size and max_size manually in pixels and scale them accordingly.
In any case, thanks for the clarification!
Also, I changed the model a bit. I shortened it and tried to feed the PriorBox layer with lower layers of the model. The thing is, the feature maps output by lower conv layers are much larger than the deeper ones, so I end up with a crazy number of prior boxes, which affects performance greatly.
I'll see what I can do.
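
To see how quickly the number of priors grows, here is a rough count. It assumes square feature maps and uses the layout commonly quoted for SSD300 in the paper (8732 boxes in total); the layer sizes in this port or in a modified model may differ, and the extra 75x75 map standing in for a conv3-level layer is purely hypothetical.

```python
def count_priors(feature_maps):
    """Total number of prior boxes.

    feature_maps: list of (side, boxes_per_cell) for square feature maps.
    """
    return sum(side * side * boxes for side, boxes in feature_maps)


# Layout quoted for SSD300 in the paper (assumption: your model may differ).
base = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

# Hypothetical: add a 75x75 map (conv3-level features) with 4 boxes per cell.
with_lower = [(75, 4)] + base

print(count_priors(base))        # 8732
print(count_priors(with_lower))  # 31232
```

A single lower-level map more than triples the total, because the count scales with the square of the feature map side.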

A bit off topic, but a very quick question: did you use PASCAL VOC2007 for the results you uploaded? I'm trying to reproduce your results but haven't succeeded so far. I checked the names of the image files I have and they don't seem to match, so I was wondering if I just got that part wrong. I'm getting the data from the official PASCAL site http://host.robots.ox.ac.uk:8080/pascal/VOC/voc2007/

Thanks a ton!

natlachaman

rykov8 commented on September 4, 2024

@natlachaman relative coordinates are good because you can resize the input image from, say, 640x480 to 300x300 without rescaling the bounding boxes. On the other hand, the input image to the net is always 300x300 (you may change it, but the architecture is designed for this input; the authors also have one for ~500x500 pictures and it is a bit different), so that is probably why it is OK to choose the sizes of priors in pixels. However, you are right, it would probably have been better to leave the sizes of priors as scales, but I followed the original implementation and didn't consider improvements of my own.
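
A small sketch of the first point, with made-up helper names and an arbitrary example box: the relative coordinates of a box are the same before and after the image is resized from 640x480 to 300x300, even though the pixel coordinates change.

```python
def to_relative(box, width, height):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to coordinates in [0, 1]."""
    xmin, ymin, xmax, ymax = box
    return (xmin / width, ymin / height, xmax / width, ymax / height)


def resize_box(box, old_size, new_size):
    """Scale a pixel box when the image goes from old_size to new_size (w, h)."""
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


box_px = (160, 120, 480, 360)                      # example box in a 640x480 image
rel_before = to_relative(box_px, 640, 480)
box_resized = resize_box(box_px, (640, 480), (300, 300))
rel_after = to_relative(box_resized, 300, 300)

print(rel_before)  # relative box in the original image
print(rel_after)   # same relative box after resizing
```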

As for your last question: what results do you mean? If you are speaking about the training example, I used my own small dataset, which is very different from PASCAL. If you are speaking about the weights, they are ported from the original Caffe implementation, but as stated in #7, I didn't check them on PASCAL.

natlachaman commented on September 4, 2024

@rykov8 Oh! I didn't know the image size had that effect on the network. In the paper they mention that they got better performance with larger images, but I didn't know they developed different architectures for different image sizes. Good to know!

As for the PASCAL question: I missed that (#7)! I thought you trained it on PASCAL VOC2007. In any case, the model you implemented follows the original implementation, so in theory it should work relatively fine. I used the same data format as you did for your own dataset and resized the images to 300x300, but I still get really strange behaviour: the error shoots up like crazy halfway through the first epoch and I can't figure out why.

Thanks for your time, always very helpful :)

rykov8 commented on September 4, 2024

@natlachaman I'm not sure the architectures differ a lot for different input sizes (the idea is the same for sure), but if I am right, the net for 500x500 images is a little deeper. Anyway, you can check their prototxt files to understand the architectures. Moreover, as I mentioned, in the third revision of the paper they slightly changed the architecture for 300x300 pictures.

As for the error, do you use Adam? It always helps me to throw away SGD and use Adam, because I don't have the magic skill of tuning the learning rate.

I also have a small question for you. Do you perhaps have an implementation of the MAP metric as it is computed in PASCAL? I am implementing it (because I failed to find an implementation, which is quite strange), but I'm too lazy to finish. If you have one, feel free to make a pull request or post a link to someone's implementation.
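
As a starting point, here is a minimal sketch of the 11-point interpolated average precision used in the PASCAL VOC2007 protocol. It only covers the last step: it assumes the recall and precision values for one class have already been computed from detections ranked by confidence. Matching detections to ground truth (IoU above 0.5, handling of "difficult" objects) is not shown, and the function name is made up for illustration.

```python
def voc07_average_precision(recall, precision):
    """11-point interpolated AP for one class.

    recall, precision: equal-length sequences, one entry per ranked detection.
    For each threshold t in {0, 0.1, ..., 1.0}, take the maximum precision
    among points whose recall is at least t, then average the 11 values.
    """
    ap = 0.0
    for i in range(11):
        t = i / 10.0
        candidates = [p for r, p in zip(recall, precision) if r >= t]
        ap += (max(candidates) if candidates else 0.0) / 11.0
    return ap


# Toy example: two detections, the first correct, the second not adding precision.
ap = voc07_average_precision([0.5, 1.0], [1.0, 0.5])
print(round(ap, 4))
```

MAP is then the mean of the per-class AP values over all classes.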

natlachaman commented on September 4, 2024

@rykov8 I usually use Adam or RMSprop, for the same reason. No magic powers so far, hehe.

As for your question: no, I don't. Implementing MAP is definitely on my list. I started working with SSD last week, on and off, so I was mainly focused on getting it to work on my dataset first. But I'll for sure make a pull request once I have MAP implemented (if I get further with SSD), or refer you to other work if I stumble upon something interesting.

Thanks again for your help!

rykov8 commented on September 4, 2024

@natlachaman you are welcome :)
