rykov8 / ssd_keras

Port of Single Shot MultiBox Detector to Keras

License: MIT License

Python 1.66% Jupyter Notebook 98.34%

computer-vision deep-learning keras-models object-detection

ssd_keras's Issues

FileNotFoundError: [Errno 2] No such file or directory: '../../frames/frame02579.png'

I am getting a FileNotFoundError in the generate function of SSD_Training.ipynb.

The stack trace is below:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/vorale/anaconda3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/vorale/anaconda3/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/vorale/anaconda3/lib/python3.5/site-packages/keras/engine/training.py", line 429, in data_generator_task
    generator_output = next(self._generator)
  File "/home/vorale/Downloads/ssd_keras-master/ssd_training2.py", line 186, in generate
    img = imread(img_path).astype('float32')
  File "/home/vorale/anaconda3/lib/python3.5/site-packages/scipy/misc/pilutil.py", line 154, in imread
    im = Image.open(name)
  File "/home/vorale/anaconda3/lib/python3.5/site-packages/PIL/Image.py", line 2280, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '../../frames/frame02579.png'

Is there any folder I haven't included?
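
The traceback shows the generator opening a path built from a prefix ('../../frames/') and a file name. A minimal sketch for listing which referenced images are missing before training starts follows; the helper name find_missing_images and the idea that the annotation keys are file names relative to that prefix are assumptions, not something the repo confirms.

```python
import os


def find_missing_images(path_prefix, image_names):
    """Return the names whose file does not exist under path_prefix."""
    missing = []
    for name in image_names:
        if not os.path.isfile(os.path.join(path_prefix, name)):
            missing.append(name)
    return missing


# Hypothetical usage, assuming the annotation dict is keyed by file name:
# with open('gt_pascal.pkl', 'rb') as f:
#     gt = pickle.load(f)
# print(find_missing_images('../../frames/', gt.keys()))
```

If every name comes back as missing, the image folder itself is probably absent or the prefix is wrong for the current working directory.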

Input images of varying size

In the examples, we hard-code in the image size. Is this uniformity a requirement of the algorithm, or is it possible for the algorithm to deal with images of varying size?

Can you explain the gt_pascal.pkl format?

Hi,

Can you explain the gt_pascal.pkl format, and how it is generated from the Pascal format? For example, from an annotation like this:

<?xml version="1.0" encoding="UTF-8" ?>
<annotations>
  <folder>/home/user/path-to-kitti-root/training/image/</folder>
  <filename>000000.png</filename>
  <size>
    <width>1224</width>
    <height>370</height>
    <depth>3</depth>
  </size>
  <object>
    <name>Pedestrian</name>
    <truncated>0</truncated>
    <occluded>0</occluded>
    <alpha>-0.20</alpha>
    <bndbox>
      <xmin>712.40</xmin>
      <ymin>143.00</ymin>
      <xmax>810.73</xmax>
      <ymax>307.92</ymax>
    </bndbox>
    <dimensions>
      <height>1.89</height>
      <width>0.48</width>
      <length>1.20</length>
    </dimensions>
    <location>
      <x>1.84</x>
      <y>1.47</y>
      <z>8.41</z>
    </location>
    <rotation_y>0.01</rotation_y>
    <property>-0.20,0.00,0</property>
  </object>
  <object>
  .
  .
  .
  </object>
  <object>
  .
  .
  .
  </object>
</annotations>
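
A minimal sketch of one possible conversion follows. It assumes (this is not confirmed here) that the pickle is a dict mapping each file name to an array with one row per object, laid out as [xmin, ymin, xmax, ymax] scaled to [0, 1] followed by a one-hot class vector. The function name annotation_to_array and the class_names argument are hypothetical.

```python
import xml.etree.ElementTree as ET

import numpy as np


def annotation_to_array(xml_text, class_names):
    """Convert one VOC-style annotation to rows of
    [xmin, ymin, xmax, ymax, one-hot class...], coordinates scaled to [0, 1]."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = float(size.find('width').text)
    height = float(size.find('height').text)
    rows = []
    for obj in root.findall('object'):
        name = obj.find('name').text
        if name not in class_names:
            continue  # skip classes we are not training on
        box = obj.find('bndbox')
        coords = [float(box.find('xmin').text) / width,
                  float(box.find('ymin').text) / height,
                  float(box.find('xmax').text) / width,
                  float(box.find('ymax').text) / height]
        one_hot = [0.0] * len(class_names)
        one_hot[class_names.index(name)] = 1.0
        rows.append(coords + one_hot)
    # reshape keeps a (0, 4 + num_classes) shape when no object matched
    return np.asarray(rows, dtype='float32').reshape(-1, 4 + len(class_names))
```

Collecting these arrays in a dict keyed by the annotation's filename and pickling that dict would then give a file in the assumed layout.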

[question] What is the purpose of the variances?

Hello @rykov8, sorry for bothering you again.
Looking at your ssd_training.py file, it seems that the variances do not take part in the training loss, or in any other part of the training pipeline. However, in your ssd_utils.py, the method detection_out does change the predictions by multiplying them by their respective variances.

        decode_bbox_center_x = mbox_loc[:, 0] * prior_width * variances[:, 0]
        decode_bbox_center_x += prior_center_x
        decode_bbox_center_y = mbox_loc[:, 1] * prior_width * variances[:, 1]
        decode_bbox_center_y += prior_center_y
        decode_bbox_width = np.exp(mbox_loc[:, 2] * variances[:, 2])
        decode_bbox_width *= prior_width
        decode_bbox_height = np.exp(mbox_loc[:, 3] * variances[:, 3])

I do understand that one has to decode the boxes, since they were encoded using the transformation described in equation 2 of the SSD paper (and in Faster R-CNN).

My main concern is that the variances explicitly change the values output by the CNN without being considered directly in the training procedure. Furthermore, I cannot find any reference to these variances in either the SSD or the Faster R-CNN paper. Maybe I am missing something in the papers or in the implementation; in that case, I would be very grateful if you could tell me whether I am making a mistake, or elaborate on the use of these variances.

Thank you very much.
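
One common reading is that the variances are fixed scaling factors: the regression targets are divided by them when ground truth boxes are encoded, and the predictions are multiplied by them when decoded, so the two steps cancel out. The sketch below illustrates that round trip for a single box in center-size form. The function names and the example variance values are assumptions, and whether the repo's encoding step does exactly this is not confirmed by the text above. Note that this sketch scales the y offset by the prior height.

```python
import numpy as np


def encode(box_cxcywh, prior_cxcywh, variances):
    """Encode a box relative to a prior, dividing each offset by its variance."""
    cx, cy, w, h = box_cxcywh
    pcx, pcy, pw, ph = prior_cxcywh
    return np.array([(cx - pcx) / pw / variances[0],
                     (cy - pcy) / ph / variances[1],
                     np.log(w / pw) / variances[2],
                     np.log(h / ph) / variances[3]])


def decode(loc, prior_cxcywh, variances):
    """Invert encode(): multiply each predicted offset by its variance."""
    pcx, pcy, pw, ph = prior_cxcywh
    return np.array([loc[0] * pw * variances[0] + pcx,
                     loc[1] * ph * variances[1] + pcy,
                     np.exp(loc[2] * variances[2]) * pw,
                     np.exp(loc[3] * variances[3]) * ph])


variances = [0.1, 0.1, 0.2, 0.2]   # example values, assumed
prior = [0.5, 0.5, 0.2, 0.4]
box = [0.55, 0.45, 0.25, 0.3]
target = encode(box, prior, variances)       # what the network would regress to
recovered = decode(target, prior, variances)
```

Under this reading the variances do enter training, but through the targets rather than through the loss function itself.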

What is prior_boxes_ssd300.pkl? Does anyone know?

After reading all the issues here, I still don't know what prior_boxes_ssd300.pkl is.

link to weights dead

Hello,
it seems the Mega link for downloading the weights is dead.
Are you planning on making a new one?

Thanks for your work!

Is it possible to use the Theano backend?

TensorFlow is a problem on Windows.
Is it possible to make the code work with the Theano backend?

Unable to find frame01884.png

Hi,
I want to run SSD_training.ipynb. I converted it to a .py file, and when I run it I get the following error. What am I missing?

Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 429, in data_generator_task
generator_output = next(self._generator)
File "SSD_training.py", line 204, in generate
img = imread(img_path).astype('float32')
File "/usr/local/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 154, in imread
im = Image.open(name)
File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 2312, in open
fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: u'../../frames/frame01884.png'

Traceback (most recent call last):
File "SSD_training.py", line 285, in
nb_worker=1)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1528, in fit_generator
str(generator_output))
ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None

[Question] Which parameters to change when increasing image size?

Hi,

I am trying to replicate the same model but for the 500x500 version in the paper.
Apart from the input image_shape, and the priors, what other parameters need to be changed?

I am getting an error like this when running fit_generator

InvalidArgumentError: Incompatible shapes: [16,7308,4] vs. [16,28461,4]
	 [[Node: sub_1 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](strided_slice_20, strided_slice_21)]]
	 [[Node: mul_11/_375 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_5128_mul_11", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I tried seeing if there is anything hard-coded in the SSD300 object or generators, but perhaps I do not see it.
Your advice would be appreciated! :)

For the new priors, I am using https://gist.github.com/codingPingjun/aa54be7993ca6b2d484cccf5a2c5c3d4 but with 500x500 size.

assign_boxes method may have a bug

Hi,

Thanks for the great work!

When I tried to run the training script, I ran into a shape mismatch at this line: https://github.com/rykov8/ssd_keras/blob/master/ssd_utils.py#L149

Should assignment[:, 5:-8][best_iou_mask] be assignment[:, 4:-8][best_iou_mask]?

Role of num_priors

Hi rykov8,

First off, great job! And thanks a lot for sharing this repo with us.

I'm trying to shorten the SSD network a bit to see if I can gain speed during training.
I see you set num_priors variable to either 3 or 6, and then use it to determine the number of filters nb_filter in the Conv2D layers responsible for the location and confidence multiboxes.
Now, in trying to make a shorter network I end up with a shape mismatch coming from the last merge() layer (prediction layer):
Exception: "concat" mode can only merge layers with matching output shapes except for the concat axis. Layer shapes: [(None, 247500, 4), (None, 247500, 2), (None, 337500, 8)]

That is, the shape of these three layers:

net['mbox_loc'] = Reshape((num_boxes, 4),
                              name='mbox_loc_final')(net['mbox_loc'])
net['mbox_conf'] = Reshape((num_boxes, num_classes),
                               name='mbox_conf_logits')(net['mbox_conf'])
net['mbox_priorbox'] = merge([net['conv1_2_mbox_priorbox'],
                                  net['conv2_2_mbox_priorbox']],
                                  mode='concat',
                                  concat_axis=1,
                                  name='mbox_priorbox')

I looked in the literature with no luck. Can you explain how to set this parameter? What does it depend on?
My input image shape is (300, 300, 3), just like in your example, and I'm using the same priors you pickled.

Thanks in advance!
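
For the final merge to work, the location, confidence, and prior box tensors must all describe the same number of boxes. As a sketch, that number is the sum over prediction layers of feature map height times width times num_priors. The layer list below is an assumption about the unmodified SSD300 layout (it reproduces the 7308 boxes that appear elsewhere in these issues); the helper name is hypothetical.

```python
def count_prior_boxes(feature_maps):
    """feature_maps: iterable of (height, width, num_priors) per prediction layer."""
    return sum(h * w * num_priors for h, w, num_priors in feature_maps)


# Assumed SSD300 layout: six prediction layers, 3 priors per cell on the
# first one and 6 on the rest.
ssd300_layers = [(38, 38, 3), (19, 19, 6), (10, 10, 6),
                 (5, 5, 6), (3, 3, 6), (1, 1, 6)]
print(count_prior_boxes(ssd300_layers))  # 7308
```

When prediction layers are removed or attached to different feature maps, num_boxes and the pickled priors have to be recomputed to match the new total.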

A question about PriorBox layer in ssd_layers.py

I want to ask about the PriorBox layer in your work. What do min_size and max_size mean? Maybe there is some relationship between these parameters and s_min and s_max in the paper. I really hope you can reply. Thank you! @rykov8

Error when checking model target

Hi, nice work!
But I get an error in the SSD notebook.
Please help ;)

Exception: Error when checking : expected input_2 to have shape (None, 300, 300, 3) but got array with shape (5, 3, 300, 300)

Python 2.7 (Anaconda)
Keras, Theano, and TensorFlow updated to the latest versions
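
The shapes in the message suggest the batch is channels-first, (N, 3, H, W), while the model expects channels-last, (N, H, W, 3). Which setting causes that in this particular setup is a guess (possibly the image_dim_ordering in the Keras config), but the array conversion itself can be sketched with NumPy:

```python
import numpy as np

# A channels-first batch, shaped like the one in the error message.
batch_channels_first = np.zeros((5, 3, 300, 300), dtype='float32')

# Move the channel axis to the end: (N, C, H, W) -> (N, H, W, C).
batch_channels_last = np.transpose(batch_channels_first, (0, 2, 3, 1))
print(batch_channels_last.shape)  # (5, 300, 300, 3)
```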

(Not an issue) Please specify the Keras version in the README

Hi @rykov8,
Thanks for your good work!
I think you should specify the Keras version in the README, because I have run into many issues caused by differing Keras versions.

ImportError: cannot import name GlobalAveragePooling2D

I got the following error

Using TensorFlow backend.
Traceback (most recent call last):
File "ssd.py", line 9, in
from keras.layers import GlobalAveragePooling2D
ImportError: cannot import name GlobalAveragePooling2D

some questions about ssd.py

In ssd.py, I found:

net['input'] = input_tensor
net['conv1_1'] = Convolution2D(64, 3, 3,
                               activation='relu',
                               border_mode='same',
                               name='conv1_1')(net['input'])

I know Convolution2D, but I do not understand what it means to follow the call with (net['input']), and I did not find this usage in the Keras documentation. Can you provide more details about this?
Thank you!
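
In the Keras functional API a layer instance is a callable object: constructing it configures the layer, and calling the result on a tensor applies it. The toy class below (not Keras code, just an illustration with a hypothetical Scale class) shows the same two-step pattern using Python's __call__ method.

```python
class Scale(object):
    """Toy stand-in for a layer: configured once, then called on an input."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return [self.factor * v for v in x]


layer = Scale(2)            # like Convolution2D(64, 3, 3, ...)
output = layer([1, 2, 3])   # like (...)(net['input'])
print(output)               # [2, 4, 6]

# The two steps can be chained on one line, as ssd.py does:
same_output = Scale(2)([1, 2, 3])
```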

[question]How to pre-train? What dataset should we use?

Hi rykov8,

Do you know how to pre-train the vgg part in ssd network? And what dataset do you use to do that?

Thanks a lot.

[question] The code is written with the TensorFlow API, not the pure Keras API. Can it only run on TensorFlow, not Theano?

E.g. ssd_training.py: a lot of the code uses the TensorFlow API directly rather than the pure Keras API.

Slow detection

Great work! Thanks a lot!

The detection takes around 2 seconds per image on a Mac using only the CPU.
That is quite different from the test performance reported in the paper.
Apart from hardware, is it possible that this is caused by Keras overhead?
Also, is it possible to shrink the network somehow?
Thank you.

pretrained VGG model

Hi, did you use a pretrained VGG model? If so, how did you subsample the parameters from fc6 and fc7?

Training_on_new_data

Hi @rykov8. Firstly, thanks for this Keras port of SSD. You are amazing :)
I have been trying to train the model for hand detection, so I basically have a single class, a hand. I set NUM_CLASSES=2, as you specified in other issues. Can you please tell me about the input format? My data currently has a 5-tuple (label, x0, y0, width, height) specifying the coordinates of each bounding box, and I have generated one for each hand image in my dataset. Do we represent our input through prior_boxes_ssd300.pkl and gt_pascal.pkl? How exactly do I do that? What is prior_boxes_ssd300.pkl for? It would be great if you could help me out. Thanks in advance.
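
As a rough sketch, a (label, x0, y0, width, height) tuple could be turned into one ground truth row as below. This assumes that x0, y0 is the top-left corner in pixels and that each row is laid out as normalized [xmin, ymin, xmax, ymax] followed by a one-hot vector over the object classes (background excluded). Both the layout and the helper name to_gt_row are assumptions.

```python
import numpy as np


def to_gt_row(label, x0, y0, box_w, box_h, img_w, img_h, num_object_classes):
    """Build [xmin, ymin, xmax, ymax, one-hot label], coordinates in [0, 1].

    label is a 0-based index over object classes (background excluded).
    """
    one_hot = np.zeros(num_object_classes, dtype='float32')
    one_hot[label] = 1.0
    corners = np.array([x0 / float(img_w),
                        y0 / float(img_h),
                        (x0 + box_w) / float(img_w),
                        (y0 + box_h) / float(img_h)], dtype='float32')
    return np.concatenate([corners, one_hot])


# One hand (class index 0) in a 300x300 image.
row = to_gt_row(0, 30, 60, 90, 120, 300, 300, num_object_classes=1)
```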

How to train my own dataset?

I just want to detect fish and capture them from an image. Each picture contains only one kind of fish, and my dataset has images of 7 kinds of fish. How do I train on my dataset so that I can capture the fish from an image? Thanks.

How can I see the output?

How can I execute the code in SSD.ipynb and view the output?

How to generate prior_boxes_ssd300.pkl

Hi, your implementation is really great!
But I have a question. If I want to detect some small objects and I want to detect from conv3, should I change prior_boxes_ssd300.pkl? I think this pkl is in the same format as the PriorBox class, but I don't know how to generate it. Can you give me some advice?
Thank you in advance.

Error when training on my own data

Hi @rykov8, I have run into an error.
When I try to train on my own image dataset (following the SSD_training notebook), I get the following error:

832/1264 [==================>...........] - ETA: 634s - loss: 2.3086Exception in thread Thread-12:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 404, in data_generator_task
generator_output = next(generator)
File "/home/optsai/文件/Object detection/test_training.py", line 197, in generate
img = jitter(img)
File "/home/optsai/文件/Object detection/test_training.py", line 102, in contrast
gs = self.grayscale(rgb).mean() * np.ones_like(rgb)
File "/home/optsai/文件/Object detection/test_training.py", line 86, in grayscale
return rgb.dot([0.299, 0.587, 0.114])
ValueError: shapes (300,300) and (3,) not aligned: 300 (dim 1) != 3 (dim 0)

Can you tell me how to fix it? This is my code:
train.txt
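
The shapes in the error, (300, 300) against (3,), suggest that one of the images was loaded as a 2-D grayscale array, so the RGB weighting in grayscale() has no channel axis to work on. A minimal sketch of a guard that could be applied right after loading each image follows; the helper name ensure_rgb is hypothetical.

```python
import numpy as np


def ensure_rgb(img):
    """Return an (H, W, 3) array; replicate the channel of a grayscale image."""
    if img.ndim == 2:
        return np.stack([img, img, img], axis=-1)
    if img.ndim == 3 and img.shape[2] == 4:
        return img[:, :, :3]  # drop an alpha channel
    return img


gray = np.zeros((300, 300), dtype='float32')
print(ensure_rgb(gray).shape)  # (300, 300, 3)
```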

`Filter must not be larger than the input` error

When I try the notebook, I get this error message:

ValueError: Filter must not be larger than the input: Filter: (3, 3) Input: (1, 1)

in this line:

    x = Convolution2D(24, 3, 3, border_mode='same',
                      name='pool6_mbox_loc')(net['pool6'])

It passes when I change it to:

    x = Convolution2D(24, 1, 1, border_mode='same',
                      name='pool6_mbox_loc')(net['pool6'])

but then loading the weights fails.

pool6 is indeed reshaped to (1, 1, 256), so what can I do to fit the (3, 3) convolution?

Simple Tutorial on Simple Data Set

To those who have got this to work: could anyone point to some simple tutorial code? Admittedly, I am getting lost in all the dependencies. I have not fully explored the code yet, but it's tough to even see what the inputs and outputs of the model are (images, ground truth bbox arrays, target class arrays, etc.). I am just looking for some simple starter code to begin exploring the true complexity of this amazing implementation.

Training Problem

Hi @rykov8, I get a training error when I train on my own image dataset.
1120/1133 [============================>.] - ETA: 15s - loss: 1.8311Exception in thread Thread-6:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 404, in data_generator_task
generator_output = next(generator)
File "/home/optsai/文件/Object detection/training(fine-tune).py", line 169, in generate
img = imread(img_path).astype('float32')
TypeError: float() argument must be a string or a number

Have you seen this error before?

.pkl files are written for Python 3 only

After messing around a bit, I was able to convert prior_boxes_ssd300.pkl to a Python2 compatible format. I'm submitting it here in case it is useful for anybody.
prior_boxes_ssd300_python2.pkl.zip

The problem with the .pkl files included in this repo is that they use Pickle Protocol 3, which is only supported on Python3, not Python2. Everything else seems to work with Python2, so it's unfortunate to leave Python2 out just because of that.
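
A minimal sketch of such a conversion, run under Python 3: load the file and write it back with protocol 2, the highest protocol Python 2 can read. The function name and file names are illustrative, and whether the contents then unpickle cleanly under Python 2 depends on what the file holds.

```python
import pickle


def resave_with_protocol_2(src_path, dst_path):
    """Run under Python 3: load a pickle and write it back with protocol 2."""
    with open(src_path, 'rb') as f:
        data = pickle.load(f)
    with open(dst_path, 'wb') as f:
        pickle.dump(data, f, protocol=2)


# Hypothetical usage:
# resave_with_protocol_2('prior_boxes_ssd300.pkl', 'prior_boxes_ssd300_python2.pkl')
```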

I also want to take the time to thank you for porting SSD to Keras! It makes it much easier to work with, compared to the original Caffe implementation.

BBoxUtility cannot NMS ground truth?

Hi! I recently noticed that the detection_out method on BBoxUtility doesn't seem to work with actual ground truth data, as far as I can tell. I believe it works on output that I have gotten from the SSD network, but it seems strange that it wouldn't support the actual ground truth; if the network were trained to perfection, its output should in theory match the ground truth (at least for some dataset).

The reason I care about this is that in my application, I want to debug my generator class, which is similar to the generator in SSD_training.ipynb. I want to run the generator and visualize its output, which is always a good way to make sure you're training the network on reasonable data. I can do that easily with the image data I'm feeding the network, but for the ground truth I have found no way to visualize it. The most obvious way would be to visualize it the same way the network's output is visualized in SSD_training.ipynb: run it through detection_out and then interpret the output visually. However, detection_out doesn't seem to produce anything reasonable when fed ground truth data.

Here's some sample code that shows what I mean:

from ssd_utils import BBoxUtility
import pickle
import numpy as np

NUM_CLASSES = 4

priors = pickle.load(open('prior_boxes_ssd300.pkl', 'rb'))
bbox_util = BBoxUtility(NUM_CLASSES, priors)

gt_pascal = pickle.load(open('gt_pascal.pkl', 'rb'))
gt = gt_pascal[u'frame03196.png']

y = bbox_util.assign_boxes(gt)

print(y.shape)

# Visualization of y
#import cv2
#cv2.imshow("Y", y.reshape((7308/12,16*12)).transpose())
#cv2.waitKey(0)

gt2 = bbox_util.detection_out(y.reshape(1,7308,16))

print(gt)
print(gt2) 
# Why are there no coordinates here?? class numbers and confidence seem okay
# but nowhere are actual coordinates of any box.

I've tried this with several different ground truth arrays, both from gt_pascal.pkl and also things I've constructed on my own. Always I get garbage positions of these boxes, with xmin = 1 and xmax = 0. As far as I can tell the class numbers and confidence (should always be 1 for the relevant boxes) seem fine, it's the coordinates that get messed up somehow.

Am I missing something, or is this a bug in BBoxUtility?

Problem testing on web Cam

Hi.
Thanks a lot for your SSD implementation. I tested your code on images and it worked fine.
But when I tried to apply it to a webcam, I changed vid_test.run('path/to/your/video.mkv') to vid_test.run(0), and also tried vid_test.run(). I got the following error:

python videotest_example.py
Using TensorFlow backend.
Traceback (most recent call last):
File "videotest_example.py", line 25, in
vid_test.run()
File "/Users/walidahmed/Desktop/Code/ssd_keras-master/testing_utils/videotest.py", line 87, in run
vidw = vid.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH)
AttributeError: 'module' object has no attribute 'cv'

My cv2.__version__ is '3.2.0-dev'.

Can you please advise?

Walid
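
OpenCV 3 dropped the cv2.cv submodule, and the capture property constants are exposed as cv2.CAP_PROP_FRAME_WIDTH and so on. A small sketch of a lookup that works with either layout follows; the helper name capture_property is hypothetical, and the module is passed in as an argument so the function does not depend on a particular OpenCV version being installed.

```python
def capture_property(cv2_module, name):
    """Look up a VideoCapture property id such as 'FRAME_WIDTH'.

    OpenCV 3 exposes cv2.CAP_PROP_FRAME_WIDTH; OpenCV 2.4 used
    cv2.cv.CV_CAP_PROP_FRAME_WIDTH.
    """
    new_name = 'CAP_PROP_' + name
    if hasattr(cv2_module, new_name):
        return getattr(cv2_module, new_name)
    return getattr(cv2_module.cv, 'CV_' + new_name)


# Hypothetical usage inside videotest.py:
# import cv2
# vidw = vid.get(capture_property(cv2, 'FRAME_WIDTH'))
```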

Multi-box loss dimensions

Hello @rykov8 ,

In the file ssd_utils.py in the assign_boxes method you mention:

assignment[:, -7:] are all 0. See loss for more details

I believe that assignment[:, -8:] counts the number of positive examples (examples assigned to a prior box); however, I have not found any use of the zeros in assignment[:, -7:-1] in the loss function.
Did I miss something regarding their use? Or should we change the loss function so that it contains only the counter at the end?
If so, we could probably replace the counter in y_true[:, :, -8] with the assigned probability values of the background class for the ground truth bounding boxes, in y_true[:, :, 4], right?

Thanks

multi label classification

Hi, I have a question about assigning boxes to the ground truth (gt).

In the matching step, the paper first matches each gt to the default box with the maximum IoU (one default box per gt).
Second, it matches any default box whose IoU with any gt is larger than 0.5. So, did you assign a default box to multiple classes, or just to the one class with the maximum IoU?

For example, suppose a default box B in an image has IoU 0.6 with an aeroplane and 0.7 with a person. Did you assign box B only to the person, or to both the aeroplane and the person?
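
One way to implement the two matching steps, sketched with NumPy, is shown below. In this sketch every default box ends up with at most one ground truth box (the one with the highest IoU), so box B in the example would go to the person only. This is an illustration of the paper's description, not a statement about what the repo's assign_boxes does; the function name match_priors is hypothetical.

```python
import numpy as np


def match_priors(iou, threshold=0.5):
    """iou: array of shape (num_gt, num_priors).

    Returns, for every prior, the index of its assigned ground truth box,
    or -1 when the prior stays background.
    """
    num_gt, num_priors = iou.shape
    assigned = np.full(num_priors, -1, dtype=int)
    # Second step in the paper: priors whose best overlap exceeds the threshold.
    best_gt = iou.argmax(axis=0)
    best_iou = iou.max(axis=0)
    above = best_iou > threshold
    assigned[above] = best_gt[above]
    # First step in the paper: every ground truth box keeps its best prior,
    # even when that overlap is below the threshold.
    assigned[iou.argmax(axis=1)] = np.arange(num_gt)
    return assigned


# Rows: aeroplane, person. Columns: three default boxes.
iou = np.array([[0.6, 0.9, 0.1],
                [0.7, 0.2, 0.3]])
matches = match_priors(iou)
```

Here the first default box overlaps both objects above the threshold and is assigned to the person (index 1), the second goes to the aeroplane, and the third stays background.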

wrong training result when changing training data

When I use VOC2007 data to train, everything is perfect.
But when I train on 'cat' images downloaded from the ImageNet website, the training result is wrong, even though I changed the number of classes to 1 and the other corresponding place to 2.

Could anyone give a hint?

[question]I have changed the network from ssd+vgg to ssd+resnet, do you know how to generate prior_boxes_ssd300.pkl?

As title.

evaluate model

Hi, I'm trying to use your framework on my data, with 512x512 inputs.
How should I evaluate my model?
I'm currently monitoring val_loss; should I monitor val_acc instead?
What does val_acc mean when there is a class and a location per prior box?
What does a val_acc of 0.11 mean?
The paper mentions mAP. How can I calculate it?

Thank you very much for the help!

hard negative mining

Hi,
In the last part of ssd_train.py, you add pos_conf_loss and neg_conf_loss.
When you calculate neg_conf_loss, you just select the top_k boxes from max_conf. However, I think max_conf can include positive boxes (matched to a gt), because you did not restrict max_conf to boxes where y_pred[:,:,-8] is 0 or y_pred[:,:,4] is 1. What do you think?

Also, I'm implementing SSD300 on Pascal VOC on my own. However, when I draw the confusion matrix for each epoch, most of the samples are biased toward the negative (background) class. Do you have any comments?

Unable to load model with 'th' image_dim_ordering

@rykov8

Hi, I am trying to load the model with 'th' image_dim_ordering and the 'tensorflow' backend, but it raises some errors. Is it possible to run the model with 'th' image_dim_ordering?

Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "./test.py", line 30, in <module>
    model = SSD300(input_shape, num_classes=NUM_CLASSES)
  File "/home/dummy/ssd_keras/ssd.py", line 77, in SSD300
    net['conv4_3_norm'] = Normalize(20, name='conv4_3_norm')(net['conv4_3'])
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 635, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 166, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/home/dummy/ssd_keras/ssd_layers.py", line 43, in call
    output *= self.gamma
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 814, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 987, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1613, in mul
    result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2242, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1617, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1568, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 38 and 512 for 'mul' (op: 'Mul') with input shapes: [?,512,38,38], [512].
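
The final error comes from multiplying a channels-first tensor of shape [?, 512, 38, 38] by a per-channel vector of shape [512]. Broadcasting aligns trailing axes, so the vector is matched against the last axis (38) instead of the channel axis. The NumPy sketch below reproduces the shapes and shows one way to line up the axes; applying the same idea inside the Normalize layer would be a modification of the repo's code, which is an assumption here.

```python
import numpy as np

features = np.ones((2, 512, 38, 38), dtype='float32')   # channels-first batch
gamma = np.full(512, 20.0, dtype='float32')             # one scale per channel

# features * gamma would fail: the trailing axis is 38, not 512.
# Reshaping gamma to (1, 512, 1, 1) aligns it with the channel axis.
scaled = features * gamma.reshape(1, -1, 1, 1)
print(scaled.shape)  # (2, 512, 38, 38)
```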

How to understand box encoding?

How should I understand the box encoding code? Can anyone point me to some papers or materials? Thanks.

Porting Weights

Thanks for the work you have done here. I agree that Caffe is painful to use.

A quick question. How did you port the weights from Caffe to Keras? Do you have code to do this?
The reason I ask is that I would like to port the coco trained weights from the original repo.

Reproducing results on PASCAL VOC2007

Hi, Great repo!
Have you tried reproducing the results on PASCAL VOC2007 reported in the original paper?
This information in the README would be very helpful!

can't load the weights from weights_SSD300.hdf5

When I run the SSD.py file, this line fails: model.load_weights('weights_SSD300.hdf5', by_name=True)
It gives the error:
ValueError: Dimension 0 in both shapes must be equal, but are 64 and 3 for 'Assign_4' (op: 'Assign') with input shapes: [64,300,3,3], [3,3,3,64].

It seems that weights_SSD300.hdf5 doesn't match your model. Could you please help me with this? Thank you very much.

question on training with other dataset

@lvaleriu

Hi, your code worked fine when I applied it to images, video, and a webcam.
I am trying to classify vehicles and pedestrians. I checked the file gt_pascal.pkl and read one of its values with:

import pickle
with open("gt_pascal.pkl", "rb") as f:  # pickle files must be opened in binary mode
    data = pickle.load(f)  # <type 'dict'>
print(data.get('frame05183.png'))

I have several questions about training, and I hope you can help me:

1. Where is frame05183.png stored?
2. To train with 3 classes, I believe I will have to edit gt_pascal.pkl, but where should I store my images?
3. What objects are you actually training to detect in SSD_training.ipynb?

Thanks a lot

min_size/max_size

Hi again!

I'm trying to use your implementation on a different problem than the PASCAL VOC dataset. In my case, I need to identify much smaller objects (ground truth boxes are 50x50 px in 768x1024 images).
From what I've seen so far, min_size and max_size determine the dimensions of the default boxes. Are these parameters in pixels, or what are they? The paper talks about scales, with values ranging from 0 to 1, and I'm not sure whether you implemented a different version that is conceptually the same, or whether I'm mixing up concepts.

Thanks in advance!
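
For reference, the paper defines the scale of the k-th prediction layer as s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), with s_min = 0.2 and s_max = 0.9, relative to the input size. Multiplying by the input size turns a scale into pixels. The sketch below does that for six layers and a 300-pixel input. Whether the repo's min_size and max_size are pixel values derived this way is an assumption, and its actual numbers may differ.

```python
def default_box_scales(num_layers, s_min=0.2, s_max=0.9):
    """Scales s_k from the SSD paper, evenly spaced between s_min and s_max."""
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


image_size = 300
sizes_px = [int(round(s * image_size)) for s in default_box_scales(6)]
print(sizes_px)  # [60, 102, 144, 186, 228, 270]
```

For objects of about 50 px in a 768x1024 image, the corresponding relative scale would be well below 0.2, so smaller scales or additional prediction layers would likely be needed.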

Modifying SSD to support multiple labels per bounding box output?

Hello, thank you so much for this Keras implementation of SSD!

I have successfully ported the caffe weights for SSD trained on the COCO dataset (300x300 input, 80+1 classes), and now I'm trying to utilize these weights to help retrain SSD on my specific problem.

I need SSD to output 200-some attributes instead of 81 object classes, and since one object can have multiple attributes, I need SSD to output class scores that don't sum to 1.

So I tried just re-training without any major changes (had to randomly initialize the weights of 6 layers that relied on COCO's 81 class output, but loaded the rest just fine), and my training loss was stuck at around 200.

I then realized this would never train because the class score outputs are normalized to sum to 1, so I changed the Activation function on the last layer of SSD from "softmax" to "sigmoid" (maybe I should use "tanh" instead?), and I'm currently training successfully I think, but I won't know for a while. The loss started at 32 and is now down to 7 after 8000 samples, and still decreasing nicely.

Anyways, I was just wondering about SSD's custom loss function, since I see it uses a softmax loss for conf_loss, in ssd_training.py. Should I change this to some other loss function? If so, which one?

Is this model's architecture different from that of the paper?

I found that the outputs are taken from conv4_3, fc7, conv6_2, conv7_2, conv8_2, and pool6,
but in the paper they are taken from conv4_3, fc7, conv9_2, conv10_2, and conv11_2,
and the number of default boxes is different: this model uses either 3 or 6, but the paper uses 4 or 6.
So does this model still have the same level of performance as that of the paper?
Thank you

Bug in random cropping?

In ssd_training.ipynb, the following code in the random_sized_crop function seems problematic:

if (x_rel < cx < x_rel + w_rel and y_rel < cy < y_rel + h_rel):
                    xmin = (box[0] - x) / w_rel
                    ymin = (box[1] - y) / h_rel
                    xmax = (box[2] - x) / w_rel
                    ymax = (box[3] - y) / h_rel

Since the coordinates in box[:] are relative coordinates, I think these lines should be:

if (x_rel < cx < x_rel + w_rel and y_rel < cy < y_rel + h_rel):
                    xmin = (box[0] - x_rel) / w_rel
                    ymin = (box[1] - y_rel) / h_rel
                    xmax = (box[2] - x_rel) / w_rel
                    ymax = (box[3] - y_rel) / h_rel
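
The proposed fix can be checked in isolation with a small sketch. It assumes that both the box and the crop window (x_rel, y_rel, w_rel, h_rel) are expressed as fractions of the original image; the helper name box_to_crop_coords and the clipping step are additions for illustration.

```python
def box_to_crop_coords(box, x_rel, y_rel, w_rel, h_rel):
    """Map a box given in relative image coordinates into the frame of a crop
    that is itself described in relative image coordinates."""
    def clip(value):
        # Keep partially visible boxes inside the crop, i.e. within [0, 1].
        return min(1.0, max(0.0, value))

    xmin = (box[0] - x_rel) / w_rel
    ymin = (box[1] - y_rel) / h_rel
    xmax = (box[2] - x_rel) / w_rel
    ymax = (box[3] - y_rel) / h_rel
    return [clip(xmin), clip(ymin), clip(xmax), clip(ymax)]


# A box that exactly fills the central crop maps to the whole crop.
print(box_to_crop_coords([0.25, 0.25, 0.75, 0.75], 0.25, 0.25, 0.5, 0.5))
# [0.0, 0.0, 1.0, 1.0]
```

Subtracting the pixel offsets x and y from relative coordinates would instead push nearly every box far outside [0, 1], which is consistent with the concern raised above.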