nqanh / affordance-net

AffordanceNet - Multiclass Instance Segmentation Framework - ICRA 2018

License: Other

affordance-net's Issues

How can I create several .sm files for one image with the Pascal VOC dataset?

In the images of your IIT-AFF dataset that contain several objects, each image has several .sm files, such as 0_1.sm, 0_2.sm and 0_3.sm. With Pascal VOC, I don't know how to create several .sm files for one image.

can't load training data

Dear nqanh:
I can't download the training data from Google. Could you please upload the data to Baidu Yun, or give me another address to download it from? Thanks!

Sorry that I have so many questions. Could you please teach me how to read a .sm file (the mask)?

Regarding cudnn version and caffe installation

@nqanh
Hi,
Do I need to install the official Caffe, or just your modified version? I have some compile problems with your cudnn_tanh_layer, so I want to double-check that you are using CUDA 8.0 and cuDNN 5, not 5.1.

src/caffe/layers/cudnn_relu_layer.cu(19): error: identifier "activ_desc_" is undefined

After fixing the cuDNN 5 error, this error occurred, and I cannot figure out why. Any suggestions?

how to measure the performance.

I do not know how to measure the performance. Could you tell me a file or a method?

How to train on my dataset

Hi, I have some questions about the dataset. I downloaded the data folder and found 4 .txt files inside the ImageSets/Main folder: 4train.txt, test.txt, train.txt, train_ORIGINAL_BACKUP (copy).txt. But in the Pascal VOC dataset, the ImageSets/Main folder contains trainval.txt, train.txt, val.txt and test.txt. So, how should I split my dataset?
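
One possible way to produce such a split, as a minimal sketch: it assumes the image set files are plain lists of image IDs, one per line, and that only train.txt and test.txt are needed (as the listing above suggests). The 80/20 ratio and the helper names `split_ids` and `write_split` are hypothetical, not taken from the repository.

```python
import random


def split_ids(image_ids, test_fraction=0.2, seed=0):
    """Shuffle image IDs reproducibly and split them into (train, test) lists."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def write_split(path, ids):
    """Write one image ID per line, the layout assumed for ImageSets/Main/*.txt."""
    with open(path, "w") as f:
        f.write("\n".join(ids) + "\n")


# Ten made-up image IDs stand in for a real dataset.
all_ids = ["%04d" % i for i in range(10)]
train_ids, test_ids = split_ids(all_ids)
# write_split("ImageSets/Main/train.txt", train_ids)
# write_split("ImageSets/Main/test.txt", test_ids)
```

Fixing the seed keeps the split stable between runs, so results from different training sessions stay comparable.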

Then, when I run the code, I hit this problem:

Traceback (most recent call last):
File "./tools/train_net.py", line 116, in <module>
max_iters=args.max_iters)
File "/shenlab/lab_stor4/shujun/affordance-net-master/tools/../lib/fast_rcnn/train.py", line 171, in train_net
model_paths = sw.train_model(max_iters)
File "/shenlab/lab_stor4/shujun/affordance-net-master/tools/../lib/fast_rcnn/train.py", line 110, in train_model
self.solver.step(1)
File "/shenlab/lab_stor4/shujun/affordance-net-master/tools/../lib/rpn/proposal_target_layer.py", line 310, in forward
gt_mask = mask_flipped_ims[gt_mask_ind]
IndexError: list index out of range

Have you met this problem before?

Low confidence class results

hi
I get low-confidence messages (message and related image below), and some classes, e.g. people, go undetected in many images.

Which classes was the provided pre-trained model trained on?

Current img:  tools_lots.jpg
Detection took 0.709s for 1 object proposals
No detected box with probality > thresh =  0.9 -- Choossing highest confidence bounding box.

Error: Number of labels must match number of predictions

When trying to train with the IIT-AFF dataset, I get an error regarding the number of labels:

I1211 15:56:06.012106  4756 net.cpp:106] Creating Layer mask_deconv3
I1211 15:56:06.012109  4756 net.cpp:454] mask_deconv3 <- pool5_2_conv6_relu
I1211 15:56:06.012112  4756 net.cpp:411] mask_deconv3 -> mask_deconv3
I1211 15:56:06.012537  4756 net.cpp:150] Setting up mask_deconv3
I1211 15:56:06.012545  4756 net.cpp:157] Top shape: 1 256 244 244 (15241216)
I1211 15:56:06.012548  4756 net.cpp:165] Memory required for data: 1637661548
I1211 15:56:06.012552  4756 layer_factory.hpp:77] Creating layer mask_score
I1211 15:56:06.012559  4756 net.cpp:106] Creating Layer mask_score
I1211 15:56:06.012562  4756 net.cpp:454] mask_score <- mask_deconv3
I1211 15:56:06.012567  4756 net.cpp:411] mask_score -> mask_score
I1211 15:56:06.012789  4756 net.cpp:150] Setting up mask_score
I1211 15:56:06.012795  4756 net.cpp:157] Top shape: 1 10 244 244 (595360)
I1211 15:56:06.012797  4756 net.cpp:165] Memory required for data: 1640042988
I1211 15:56:06.012801  4756 layer_factory.hpp:77] Creating layer loss_mask
I1211 15:56:06.012809  4756 net.cpp:106] Creating Layer loss_mask
I1211 15:56:06.012811  4756 net.cpp:454] loss_mask <- mask_score
I1211 15:56:06.012814  4756 net.cpp:454] loss_mask <- mask_targets
I1211 15:56:06.012817  4756 net.cpp:411] loss_mask -> loss_mask
I1211 15:56:06.012823  4756 layer_factory.hpp:77] Creating layer loss_mask
F1211 15:56:06.014055  4756 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (59536 vs. 0) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
*** Check failure stack trace: ***
./experiments/scripts/faster_rcnn_end2end.sh: line 65:  4756 Aborted                 ./tools/train_net.py --gpu ${GPU_ID} --solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt --weights data/imagenet_models/${NET}.v2.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml ${EXTRA_ARGS}

Any idea how to solve this?

Thanks
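
For reference, the numbers in the failed check can be reproduced by hand. The sketch below recomputes the label count that `SoftmaxWithLoss` expects from the `mask_score` shape in the log; the function name `expected_label_count` is made up for illustration. The right-hand side of the check is 0, which indicates that the `mask_targets` blob is empty when the net is set up.

```python
def expected_label_count(pred_shape, softmax_axis=1):
    """Labels expected for a prediction blob: product of all dims except the softmax axis."""
    outer = 1
    for d in pred_shape[:softmax_axis]:
        outer *= d
    inner = 1
    for d in pred_shape[softmax_axis + 1:]:
        inner *= d
    return outer * inner


pred_shape = (1, 10, 244, 244)  # "Top shape" of mask_score in the log above
label_count = 0                 # bottom[1]->count() reported by the failed check

needed = expected_label_count(pred_shape)  # 1 * 244 * 244
mismatch = needed != label_count
```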

About loss function

Hi!

I want to replace the softmax-based multinomial cross-entropy loss of the affordance detection branch with a sigmoid-based binary cross-entropy loss. How can I do that?

I tried changing the train.prototxt file as follows:
`layer {
name: "mask_score"
type: "Convolution"
bottom: "mask_deconv3" #
top: "mask_score"
param { lr_mult: 1.0 decay_mult: 1.0 }
param { lr_mult: 2.0 decay_mult: 0 }
convolution_param {
#num_output: 10 # 9 affordance classes + 1 background
#num_output: 1# output will be 1x1x14x14 --> for using SigmoidCrossEntropyLoss
num_output: 2# output will be 1x2x14x14 --> for using Softmax. Actually, binomial cross-entropy loss
#(sigmoid + cross entropy) = logistic regression = two classes softmax regression
kernel_size: 1 pad: 0
weight_filler {type: "gaussian" std: 0.01 } #weight_filler { type: "xavier" }
bias_filler { type: "constant" value: 0 }
}
}

layer {
name: "loss_mask"
type: "SoftmaxWithLoss"
#bottom: "mask_score_reshape"
bottom: "mask_score"
bottom: "mask_targets"
top: "loss_mask"
loss_weight: 3
loss_param {
ignore_label: -1
normalize: true
#normalize: false
}
propagate_down: true # backprop to prediction
propagate_down: false # don't backprop to labels
}`

and set base_lr = 1e-10 (a larger base_lr doesn't work). But the loss is very erratic: sometimes it is as large as 100 and sometimes as small as 6. I can't see a downward trend in the loss.
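
The comment in the prototxt above says that a two-class softmax loss is the same as a sigmoid cross-entropy loss. That claim can be checked numerically with a small sketch (plain Python; the function names are illustrative only). It holds when the sigmoid logit is the difference of the two softmax logits and the labels are restricted to 0 and 1. If `mask_targets` still holds the original affordance ids, it would fall outside that range for `num_output: 2`, though that is only a guess at the cause of the erratic loss.

```python
import math


def sigmoid_ce(z, y):
    """Binary cross-entropy of a single logit z against a label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def softmax_ce(logits, y):
    """Cross-entropy of a softmax over `logits` against the class index y."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[y]


z0, z1 = 0.3, 1.7  # arbitrary logits for class 0 and class 1
diffs = [abs(softmax_ce([z0, z1], y) - sigmoid_ce(z1 - z0, y)) for y in (0, 1)]
```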

AssertionError: Selective search data not found for pascal_voc

Hi,

I am trying to train the model with your code, but I ran into a problem.
I am using this command line:
./experiments/scripts/faster_rcnn_end2end.sh 1 VGG16 pascal_voc

And here is the error:
Set proposal method: selective_search
Appending horizontally-flipped training examples...
voc_2012_train gt roidb loaded from /media/MMVCNYLOCAL_2/MMVC_NY/Jin_Huang/affordance-net/data/cache/voc_2012_train_gt_roidb.pkl
Traceback (most recent call last):
  File "./tools/train_net.py", line 108, in <module>
    imdb, roidb = combined_roidb(args.imdb_name)
  File "./tools/train_net.py", line 73, in combined_roidb
    roidbs = [get_roidb(s) for s in imdb_names.split('+')]
  File "./tools/train_net.py", line 66, in get_roidb
    roidb = get_training_roidb(imdb)
  File "/media/MMVCNYLOCAL_2/MMVC_NY/Jin_Huang/affordance-net/tools/../lib/fast_rcnn/train.py", line 127, in get_training_roidb
    imdb.append_flipped_images()
  File "/media/MMVCNYLOCAL_2/MMVC_NY/Jin_Huang/affordance-net/tools/../lib/datasets/imdb.py", line 111, in append_flipped_images
    boxes = self.roidb[i]['boxes'].copy()
  File "/media/MMVCNYLOCAL_2/MMVC_NY/Jin_Huang/affordance-net/tools/../lib/datasets/imdb.py", line 67, in roidb
    self._roidb = self.roidb_handler()
  File "/media/MMVCNYLOCAL_2/MMVC_NY/Jin_Huang/affordance-net/tools/../lib/datasets/pascal_voc.py", line 145, in selective_search_roidb
    ss_roidb = self._load_selective_search_roidb(gt_roidb)
  File "/media/MMVCNYLOCAL_2/MMVC_NY/Jin_Huang/affordance-net/tools/../lib/datasets/pascal_voc.py", line 179, in _load_selective_search_roidb
    'Selective search data not found at: {}'.format(filename)
AssertionError: Selective search data not found at: /media/MMVCNYLOCAL_2/MMVC_NY/Jin_Huang/affordance-net/data/selective_search_data/voc_2012_train.mat

I also checked the shell script; there is a coco option, but when I use
./experiments/scripts/faster_rcnn_end2end.sh 1 VGG16 coco
it shows:
IOError: [Errno 2] No such file or directory: 'affordance-net/data/coco/annotations/instances_train2014.json'

I just downloaded the data as instructed in readme, but it seems like there is a dataset issue?
Do you know how I can solve the problem?

Thanks,

train my own dataset

Would you mind telling me how to produce the "instance_png" images?
Are there any requirements for instance_png, such as different objects needing different colours?

thanks

Add cudnn7 support

This Caffe version cannot be built with cuDNN 7. Does the project also work with the official Caffe repo? The official one builds successfully. What extra layers does this Caffe version add?

Camera parameters of IIT dataset

Hello, I'd like to use the IIT dataset.
But I need point clouds, so I'm trying to convert the depth images of the IIT dataset into point clouds. It seems that your dataset was collected using 4 types of cameras. Could you let me know the 4 intrinsic parameters (cx, cy, fx, fy) of each camera?
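
For context, this is the standard pinhole back-projection those four parameters feed into, as a minimal sketch. The intrinsics and the depth scale below are placeholders, not the IIT dataset's camera parameters, and `depth_to_points` is a hypothetical helper name.

```python
def depth_to_points(depth, fx, fy, cx, cy, depth_scale=1.0):
    """Back-project a depth image (list of rows) to 3D points with the pinhole model.

    Pixels with depth <= 0 are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):      # v: pixel row (image y)
        for u, d in enumerate(row):      # u: pixel column (image x)
            if d <= 0:
                continue
            z = d * depth_scale
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Placeholder intrinsics, NOT the IIT dataset values.
fx, fy, cx, cy = 500.0, 500.0, 1.0, 1.0
depth = [[0, 1000, 0],
         [1000, 1000, 1000],
         [0, 1000, 0]]  # made-up depth values, assumed to be in millimetres
cloud = depth_to_points(depth, fx, fy, cx, cy, depth_scale=0.001)
```

With the real intrinsics of each camera substituted in, the same function would apply per image; the millimetre-to-metre scale is also an assumption to verify against the dataset.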

how to use pretrained data?

How can I continue training from previously trained weights?

I tried
./experiments/scripts/faster_rcnn_end2end.sh 2 VGG16 pascal_voc --weights
vgg16_faster_rcnn_iter_190000.caffemodel

After training for 10,000 more iterations, this should give the equivalent of the 200000.caffemodel (the file is actually named 10000.caffemodel).

When I compared the 200000.caffemodel with this 10000.caffemodel (as mentioned above), the results were different.

Why are the results different? Am I making a mistake somewhere?

Error when running demo_img.py

hi
thanks for your code.
After compiling etc., I just reached the magic moment of running your demo_img.py.

But then, after crunching for about 30 seconds (or a bit more), I get the following errors:

~/affordance-net$ ./tools/demo_img.py
./tools/demo_img.py: line 4: $'\nSee README.md for installation instructions before running.\nDemo script to perform affordace detection from images\n': command not found
from: can't read /var/mail/fast_rcnn.config
from: can't read /var/mail/fast_rcnn.test
from: can't read /var/mail/fast_rcnn.nms_wrapper
from: can't read /var/mail/utils.timer
./tools/demo_img.py: line 19: CONF_THRESHOLD: command not found
./tools/demo_img.py: line 20: good_range: command not found
./tools/demo_img.py: line 23: syntax error near unexpected token `('
./tools/demo_img.py: line 23: `cwd = os.getcwd()'

Any help would be much appreciated. Thanks a lot
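
The `from: can't read /var/mail/...` lines suggest the file was executed by the shell rather than by Python: the shell ran its own `from` command for every `from ... import` line. Invoking the script as `python ./tools/demo_img.py` should avoid that. As a quick check, the sketch below tests whether a script starts with a Python shebang line; the helper names and the two temporary files are made up for illustration.

```python
import os
import tempfile


def has_python_shebang(path):
    """Return True if the first line of the file is a '#!' line mentioning python."""
    with open(path, "rb") as f:
        first = f.readline()
    return first.startswith(b"#!") and b"python" in first


def _write_temp(text):
    """Write `text` to a temporary .py file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return path


with_shebang = _write_temp("#!/usr/bin/env python\nprint('ok')\n")
without_shebang = _write_temp('"""Demo script."""\nprint("ok")\n')
results = (has_python_shebang(with_shebang), has_python_shebang(without_shebang))
os.remove(with_shebang)
os.remove(without_shebang)
```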

Question on memory error

Hi,

I often get memory errors on images. Do you think it is because my 980 Ti has "only" 6 GB of RAM?

For instance, I got the message below for the attached image:

Current idx:  0  /  1
Current img:  6893.jpg
F1006 07:28:27.337713 13705 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0)  out of memory

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 490:23: Message type "caffe.LayerParameter" has no field named "roi_alignment_param".

I am using the docker-caffe image, which already includes Caffe, so I did not do anything with caffe-affordance-net.
I only ran the demo, and I got this error:

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 490:23: Message type "caffe.LayerParameter" has no field named "roi_alignment_param".
F0531 09:04:05.949674 28 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /workspace/affordance-net/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt

What should I do?
Thanks.

I use docker-caffe with CUDA 8.0.

Multi-GPU (parallel) training

hi
I tried training for 2,000,000 iterations on 1 GPU, but it takes more than 50 days,
so I want to use multi-GPU training.

My computer has 4 GPUs (4 x GTX 1080 Ti).

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111 Driver Version: 384.111 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:05:00.0 On | N/A |
| 29% 47C P8 18W / 250W | 606MiB / 11170MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:06:00.0 Off | N/A |
| 53% 84C P2 218W / 250W | 10888MiB / 11172MiB | 95% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:09:00.0 Off | N/A |
| 48% 80C P2 167W / 250W | 10888MiB / 11172MiB | 33% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:0A:00.0 Off | N/A |
| 23% 31C P8 8W / 250W | 11MiB / 11172MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

I tried
./experiments/scripts/faster_rcnn_end2end.sh 0,1,2,3 VGG16 pascal_voc
./experiments/scripts/faster_rcnn_end2end.sh 0:3 VGG16 pascal_voc
./experiments/scripts/faster_rcnn_end2end.sh {0,1,2,3} VGG16 pascal_voc
but none of them work.

Can you tell me how to use multi-GPU training?
I would like to know the detailed steps.
Thank you.

ROS version problem

Hi. Thanks for the great project.
I'm on Ubuntu 18.04 with ROS Melodic.
Should I use this package with another Ubuntu version?

about pretrain model

Hi
I found that I couldn't get a good model without the pretrained model.
Nothing was detected at all when I didn't use the pretrained model.
This was with the same VGG16 backbone and after 50 thousand iterations.

So is it due to the hyperparameters?
When I switch the backbone to the PVAlite network, I see the same issue.
Thank you

A question about loss_mask layer

Hi, I found that the loss_mask layer has two bottoms. According to your paper, mask_score is 10 * 224 * 224, but how can I be sure that mask_targets is 224 * 224?
name: "loss_mask" type: "SoftmaxWithLoss" #bottom: "mask_score_reshape" bottom: "mask_score" bottom: "mask_targets" top: "loss_mask"

How to create several .sm files in an image with pascal voc datasets?

Hi nqanh, can you tell me how to create several .sm files for one image with the Pascal VOC dataset? Thanks!

Missing Affordance Masks

Hi,
I am trying to train the net on my own data. After 200,000 iterations the detection is really good (good bounding boxes and correct labels). However, demo_img.py shows no masks, only the background color. I checked the .sm files and also adjusted the number of affordances in the .prototxt. What else could be the mistake?

What is the difference between roi_alignment and roi_alignment2?

AffordanceNet architecture implementation

Hi, I have a few questions about the network implementation:

Did you start from an available Mask R-CNN implementation for AffordanceNet? Or did you compose the network yourself by connecting VGG and RPN and modifying the head?
Was the initial model pretrained on COCO or some other dataset?
The note in the README.md states that the available architecture is slightly different from the paper. Could you please list these differences?

Thank you very much

Error parsing text-format caffe.NetParameter

Hi,
I installed and built pycaffe, but there was an error when I tried to run demo_img.py.
Do you know how I can solve the problem?

Thanks,

Error message:
[libprotobuf ERROR google/protobuf/text_format.cc:288] Error parsing text-format caffe.NetParameter: 490:23: Message type "caffe.LayerParameter" has no field named "roi_alignment_param".
F1111 19:04:30.259788 23639 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/mmvc/affordance-net/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt

demo.py / Check failed: error == cudaSuccess (2 vs. 0) out of memory * Check failure stack trace: * Aborted (core dumped)

I am the one who asked the other question.
Now I have run into another difficulty:
I start 'demo.py' from your 'affordance net' code,

but there is an error like the one in the picture.

Have you seen this error? If so, how did you resolve it?

Thank you!

nvidia-smi

cf)
I started it with python demo.py --gpu 1

demo_img.py does not load images sequentially

I ran demo_img.py.

4 images are saved in the tools/img folder, but demo_img.py loads only 1 image.
Furthermore, after showing that 1 image the terminal stops responding, so I have to restart the terminal.
What step did I miss? Could you give me some tips for this problem?

Hi, how can I visualize the result like this, keeping the original picture?

How to set multiple GPUs

Good job!
But how do I set multiple GPUs?

Error running demo

Hi there

I am a beginner in deep learning and currently I am trying to run the demo after all the installation steps. Here's the error I got:

Please help! Thank you! :)

I get some errors while doing python demo_img.py

Hi, I am using Ubuntu 16.04 and I have successfully downloaded Caffe.
But I got this error, and I cannot find any description of it online.

AffordanceNet root folder: /home/sujong/affordance_net/affordance-net
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0907 16:15:43.276456 5336 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0907 16:15:43.276486 5336 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0907 16:15:43.276489 5336 _caffe.cpp:142] Net('/home/sujong/affordance_net/affordance-net/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt', 1, weights='/home/sujong/affordance_net/affordance-net/pretrained/AffordanceNet_200K.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 490:23: Message type "caffe.LayerParameter" has no field named "roi_alignment_param".
F0907 16:15:43.277683 5336 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/sujong/affordance_net/affordance-net/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt
*** Check failure stack trace: ***
[6] 5336 abort (core dumped) python demo_img.py

Can anyone please help me?

TypeError: slice indices must be integers or None or have an index method

When training the network with your script I get the following error:

Solving...
Traceback (most recent call last):
  File "./tools/train_net.py", line 118, in <module>
    max_iters=args.max_iters)
  File "/home/rdlm/affordance-net/tools/../lib/fast_rcnn/train.py", line 175, in train_net
    model_paths = sw.train_model(max_iters)
  File "/home/rdlm/affordance-net/tools/../lib/fast_rcnn/train.py", line 114, in train_model
    self.solver.step(1)
  File "/home/rdlm/affordance-net/tools/../lib/rpn/proposal_target_layer.py", line 106, in forward
    rois_per_image, self._num_classes) #bbox_targets_oris: original gt of rois
  File "/home/rdlm/affordance-net/tools/../lib/rpn/proposal_target_layer.py", line 606, in _sample_rois
    _get_bbox_regression_labels(bbox_target_data, num_classes)
  File "/home/rdlm/affordance-net/tools/../lib/rpn/proposal_target_layer.py", line 532, in _get_bbox_regression_labels
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:] #gan gia tri tai class tuong ung la bbox_target_data, con lai la so 0
TypeError: slice indices must be integers or None or have an __index__ method

These errors are probably caused by the version of NumPy I am using (v 13.3). I managed to solve these errors by modifying the `$AffordanceNet/lib/rpn/proposal_target_layer.py` file as follows:

vim $AffordanceNet/lib/rpn/proposal_target_layer.py
replace the for loop in line 532 with:

for ind in inds:
        cls = clss[ind]
        start = 4 * cls
        end = start + 4
        start = int(start)
        end = int(end)
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS

replace the if statement in line 579 with:

if fg_inds.size > 0:
        fg_inds = npr.choice(fg_inds, size=int(fg_rois_per_this_image), replace=False)

replace the if statement in line 590 with:

if bg_inds.size > 0:
        bg_inds = npr.choice(bg_inds, size=int(bg_rois_per_this_image), replace=False)

replace line 597 with:

    labels[int(fg_rois_per_this_image):] = 0

I get a new error, however, which I am not able to solve:

Solving...
Traceback (most recent call last):
  File "./tools/train_net.py", line 118, in <module>
    max_iters=args.max_iters)
  File "/home/rdlm/affordance-net/tools/../lib/fast_rcnn/train.py", line 175, in train_net
    model_paths = sw.train_model(max_iters)
  File "/home/rdlm/affordance-net/tools/../lib/fast_rcnn/train.py", line 114, in train_model
    self.solver.step(1)
  File "/home/rdlm/affordance-net/tools/../lib/rpn/proposal_target_layer.py", line 305, in forward
    roi_mask = -1 * np.ones((h, w), dtype=np.float32)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 192, in ones
    a = empty(shape, dtype, order)
TypeError: 'numpy.float64' object cannot be interpreted as an index
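
Both tracebacks look like the same class of problem: a float being used where an integer index or size is required. The sketch below reproduces the slice error with a plain Python list and shows the `int()` cast used in the fix above. It is an illustration only; `bbox_slice` is a made-up helper, and applying the same cast to `h` and `w` before `np.ones((h, w), ...)` is an assumption about what would resolve the second error.

```python
def bbox_slice(cls):
    """Return integer (start, end) column indices for class `cls`, 4 values per class."""
    start = int(4 * cls)
    return start, start + 4


row = list(range(12))  # stand-in for one row of bbox_targets (3 classes x 4 values)
cls = 1.0              # class label held as a float, as when read from a float array

try:
    row[4 * cls:4 * cls + 4]  # float slice bounds raise TypeError
    float_slice_ok = True
except TypeError:
    float_slice_ok = False

start, end = bbox_slice(cls)
picked = row[start:end]

# Same idea for array shapes: cast before building the shape tuple,
# e.g. np.ones((int(h), int(w)), dtype=np.float32); h and w as named in the traceback.
h, w = 7.0, 5.0
shape = (int(h), int(w))
```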

Color-coded affordance label

Hi, I'm running demo_img.py with the pretrained model and the result looks great. But I cannot find any information on the correspondence between colors and affordance labels. There are in total 12 + 1 (bg) colors in demo_img.py. Is there any way to get that information? Thanks!

background = [200, 222, 250]
c1 = [0,0,205]
c2 = [34,139,34] # cut
c3 = [192,192,128]
c4 = [165,42,42]
c5 = [128,64,128] # grasp
c6 = [204,102,0]
c7 = [184,134,11]
c8 = [0,153,153]
c9 = [0,134,141]
c10 = [184,0,141]
c11 = [184,134,0]
c12 = [184,134,223]

about results

Hi, do you know why the segmented result (the area of the target object) is always smaller than the ground truth? Someone told me the reason is that the threshold for the segmentation mask in the method is too large. But I don't know what this threshold is.

Caffe Installation

Hello. I'm trying to install caffe through the installation given here, but the build fails at various points.
Is there a specific CUDA and CUDNN version that I must use to build it successfully?

The build fails at /include/cudnn.hpp, layers/cudnn_sigmoid_layer.cu ...etc, with incompatible types/incorrect parameters etc.

I tried building with CUDA 10, cuDNN 8, and cuDNN 5 with CUDA 8, and also tried other variations, but 'make' fails every time. I followed the procedure given in the Caffe installation instructions exactly, yet with no success.

Thanks

SigmoidCrossEntropyLoss

Hi, I want to change the SoftmaxWithLoss layer for a SigmoidCrossEntropyLoss layer. I understand that SigmoidCrossEntropy doesn't accept a mask with multiple labels (e.g. 0, 1, 2 ...) but only a binary mask with values in {0,1}. To pass a single mask to the Sigmoid I put a python layer before it that selects the mask provided by the roi-data layer (as in Mask R-CNN). This layer is described below. My training prototxt file is also given below (note that I'm using 3 classes only).

The problem I encounter is that the mask loss is very large (around 40.000) and doesn't drop. Any help would be greatly appreciated.

PYTHON LAYER

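import caffe
import numpy as np
import yaml                        # needed by yaml.load() in setup()
from fast_rcnn.config import cfg   # needed for cfg.TRAIN.MASK_SIZE
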
class BinaryMaskLayer(caffe.Layer):
    def setup(self, bottom, top):
        layer_params = yaml.load(self.param_str_)
        self._num_classes = layer_params['num_classes']
        top[0].reshape(1, 1, cfg.TRAIN.MASK_SIZE, cfg.TRAIN.MASK_SIZE)
        top[1].reshape(1, 1, cfg.TRAIN.MASK_SIZE, cfg.TRAIN.MASK_SIZE)

    def forward(self, bottom, top):
        mask_score = bottom[0].data
        mask_targets = bottom[1].data
        label_for_mask = bottom[2].data

        # convert multilabel mask in binary mask
        for i in xrange(mask_targets.shape[0]):
            mask = mask_targets[i,...]
            # make all but the bounding box labeled values 0
            mask[ mask != label_for_mask[i] ] = 0
            # make other values 1
            mask[ mask == label_for_mask[i] ] = 1
            mask_targets[i,...] = mask
        label = int(label_for_mask[0])

        # choose 1 mask, e.g. mask of label 1
        mask_score = mask_score[:,label:label+1,:,:]

        # add dimension and subsequently swap the first and second dimension
        mask_targets = mask_targets[np.newaxis,:]
        mask_targets = np.swapaxes(mask_targets,0,1)

        top[0].reshape(*mask_score.shape)
        top[0].data[...] = mask_score
        top[1].reshape(*mask_targets.shape)
        top[1].data[...] = mask_targets

    def backward(self, top, propagate_down, bottom):
        """This layer does not propagate gradients."""
        pass

    def reshape(self, bottom, top):
        """Reshaping happens during the call to forward."""
        pass

TRAIN.PROTOTXT FILE

name: "VGG_ILSVRC_16_layers"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  top: 'seg_mask_inds' 
  top: 'flipped' 
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3"  # 2 obj categories + 1 background
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu1_2"
  type: "ReLU"
  bottom: "conv1_2"
  top: "conv1_2"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu2_1"
  type: "ReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv2_2"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu2_2"
  type: "ReLU"
  bottom: "conv2_2"
  top: "conv2_2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2_2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3_1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3_1"
  type: "ReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "conv3_2"
  type: "Convolution"
  bottom: "conv3_1"
  top: "conv3_2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3_2"
  type: "ReLU"
  bottom: "conv3_2"
  top: "conv3_2"
}
layer {
  name: "conv3_3"
  type: "Convolution"
  bottom: "conv3_2"
  top: "conv3_3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3_3"
  type: "ReLU"
  bottom: "conv3_3"
  top: "conv3_3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3_3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4_1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu4_1"
  type: "ReLU"
  bottom: "conv4_1"
  top: "conv4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu4_2"
  type: "ReLU"
  bottom: "conv4_2"
  top: "conv4_2"
}
layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "conv4_2"
  top: "conv4_3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu4_3"
  type: "ReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4_3"
  top: "pool4"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv5_1"
  type: "Convolution"
  bottom: "pool4"
  top: "conv5_1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu5_1"
  type: "ReLU"
  bottom: "conv5_1"
  top: "conv5_1"
}
layer {
  name: "conv5_2"
  type: "Convolution"
  bottom: "conv5_1"
  top: "conv5_2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu5_2"
  type: "ReLU"
  bottom: "conv5_2"
  top: "conv5_2"
}
layer {
  name: "conv5_3"
  type: "Convolution"
  bottom: "conv5_2"
  top: "conv5_3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu5_3"
  type: "ReLU"
  bottom: "conv5_3"
  top: "conv5_3"
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5_3"
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    #num_output: 24   
    num_output: 30 # 2(bg/fg) * 15(n_anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    #num_output: 48   # 4 * 12(anchors)
    num_output: 60   # 4 * 15(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } } 
}

layer {
  name: 'rpn-data'
  type: 'Python'
  bottom: 'rpn_cls_score'
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'
  top: 'rpn_labels'
  top: 'rpn_bbox_targets'
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    #param_str: "'feat_stride': 16 \n'scales': !!python/tuple [4, 8, 16, 32]"
    param_str: "'feat_stride': 16 \n'scales': !!python/tuple [2, 4, 8, 16, 32]" 
  }
}

layer {
  name: "rpn_loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "rpn_cls_score_reshape"
  bottom: "rpn_labels"
  propagate_down: 1
  propagate_down: 0
  top: "rpn_cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}

layer {
  name: "rpn_loss_bbox"
  type: "SmoothL1Loss"
  bottom: "rpn_bbox_pred"
  bottom: "rpn_bbox_targets"
  bottom: 'rpn_bbox_inside_weights'
  bottom: 'rpn_bbox_outside_weights'
  top: "rpn_loss_bbox"
  loss_weight: 1
  smooth_l1_loss_param { sigma: 3.0 }
}

#========= RoI Proposal ============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}

layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  #reshape_param { shape { dim: 0 dim: 24 dim: -1 dim: 0 } } 
  reshape_param { shape { dim: 0 dim: 30 dim: -1 dim: 0 } }
}

layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    #param_str: "'feat_stride': 16 \n'scales': !!python/tuple [4, 8, 16, 32]"
    param_str: "'feat_stride': 16 \n'scales': !!python/tuple [2, 4, 8, 16, 32]"
  }
}

layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  bottom: 'im_info' 
  bottom: 'seg_mask_inds' 
  bottom: 'flipped' 
  top: 'rois' 
  top: 'labels' 
  top: 'bbox_targets' 
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  top: 'mask_targets' 
  top: 'rois_pos'
  top: 'label_for_mask'
  python_param {
    module: 'rpn.proposal_target_layer_ppsigmoid'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 3"
  }
}

#========= RCNN ============

layer {
  name: "roi_pool5"
  #type: "ROIPooling"
  #type: "ROIAlignment2"
  type: "ROIAlignment"
  bottom: "conv5_3" #bottom[0]
  bottom: "rois" #bottom[1]
  top: "pool5"
  #roi_pooling_param {
  #roi_alignment2_param {
  roi_alignment_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output:3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 12 # = 4 * 3, i.e., box coordinate for each class
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "cls_score"
  bottom: "labels"
  propagate_down: 1
  propagate_down: 0
  top: "loss_cls"
  loss_weight: 3
}
layer {
  name: "loss_bbox"
  type: "SmoothL1Loss"
  bottom: "bbox_pred"
  bottom: "bbox_targets"
  bottom: "bbox_inside_weights"
  bottom: "bbox_outside_weights"
  top: "loss_bbox"
  loss_weight: 2
}

##############Mask branch####################################
layer {
  name: "roi_pool5_2"
  #type: "ROIPooling"
  #type: "ROIAlignment2"
  type: "ROIAlignment"
  bottom: "conv5_3"
  bottom: "rois_pos"
  top: "pool5_2"
  #roi_pooling_param {
  #roi_alignment2_param{
  roi_alignment_param{
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}

## Conv-Relu 1
layer {
  name: "pool5_2_conv"
  type: "Convolution"
  bottom: "pool5_2"
  top: "pool5_2_conv"
  param { lr_mult: 1.0 decay_mult: 1.0}
  param { lr_mult: 2.0 decay_mult: 0}
  convolution_param {
    num_output: 512
    kernel_size: 1 pad: 0 #kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 } #weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
  name: "pool5_2_conv_relu"
  type: "ReLU"
  bottom: "pool5_2_conv"
  top: "pool5_2_conv_relu"
}


## Conv-Relu 2
layer {
  name: "pool5_2_conv2"
  type: "Convolution"
  bottom: "pool5_2_conv_relu"
  top: "pool5_2_conv2"
  param { lr_mult: 1.0 decay_mult: 1.0}
  param { lr_mult: 2.0 decay_mult: 0}
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1#kernel_size: 1 pad: 0 #kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 } #weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
  name: "pool5_2_conv2_relu"
  type: "ReLU"
  bottom: "pool5_2_conv2"
  top: "pool5_2_conv2_relu"
}

# Deconv 1
layer { 
  name: "mask_deconv1"
  type: "Deconvolution"
  #bottom: "pool5_2_conv_relu"
  bottom: "pool5_2_conv2_relu"
  top: "mask_deconv1"
  param { lr_mult: 1 decay_mult: 1.0 }
  param { lr_mult: 2 decay_mult: 0}
  convolution_param {
    num_output: 256
    #pad: 1 stride: 2 kernel_size: 4 # 14x14
    #pad: 1 stride: 3 kernel_size: 6  # 22x22
    pad: 1 stride: 4 kernel_size: 8 # 30x30
    group: 256 #apply independently
    weight_filler { type: "bilinear" }
    #bias_filler { type: "constant" value: 1 }
  }
}


## Conv-Relu 3
layer {
  name: "pool5_2_conv3"
  type: "Convolution"
  bottom: "mask_deconv1"
  top: "pool5_2_conv3"
  param { lr_mult: 1.0 decay_mult: 1.0}
  param { lr_mult: 2.0 decay_mult: 0}
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1#kernel_size: 1 pad: 0 #kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 } #weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
  name: "pool5_2_conv3_relu"
  type: "ReLU"
  bottom: "pool5_2_conv3"
  top: "pool5_2_conv3_relu"
}


## Conv-Relu 4
layer {
  name: "pool5_2_conv4"
  type: "Convolution"
  bottom: "pool5_2_conv3_relu"
  top: "pool5_2_conv4"
  param { lr_mult: 1.0 decay_mult: 1.0}
  param { lr_mult: 2.0 decay_mult: 0}
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1#kernel_size: 1 pad: 0 #kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 } #weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
  name: "pool5_2_conv4_relu"
  type: "ReLU"
  bottom: "pool5_2_conv4"
  top: "pool5_2_conv4_relu"
}



# Deconv 2
layer {
  name: "mask_deconv2"
  type: "Deconvolution"
  bottom: "pool5_2_conv4_relu"
  top: "mask_deconv2"
  param { lr_mult: 1 decay_mult: 1.0 }
  param { lr_mult: 2 decay_mult: 0}
  convolution_param {
    num_output: 256
    #pad: 1 stride: 2 kernel_size: 4  # 28x28
    #pad: 1 stride: 8 kernel_size: 16 # 490x490 
    pad: 1 stride: 4 kernel_size: 8
    group: 256 #apply independently
    weight_filler { type: "bilinear" }
    #bias_filler { type: "constant" value: 1 }
  }
}


## Conv-Relu 5
layer {
  name: "pool5_2_conv5"
  type: "Convolution"
  bottom: "mask_deconv2"
  top: "pool5_2_conv5"
  param { lr_mult: 1.0 decay_mult: 1.0}
  param { lr_mult: 2.0 decay_mult: 0}
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1#kernel_size: 1 pad: 0 #kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 } #weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
  name: "pool5_2_conv5_relu"
  type: "ReLU"
  bottom: "pool5_2_conv5"
  top: "pool5_2_conv5_relu"
}


## Conv-Relu 6
layer {
  name: "pool5_2_conv6"
  type: "Convolution"
  bottom: "pool5_2_conv5_relu"
  top: "pool5_2_conv6"
  param { lr_mult: 1.0 decay_mult: 1.0}
  param { lr_mult: 2.0 decay_mult: 0}
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1#kernel_size: 1 pad: 0 #kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 } #weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
  name: "pool5_2_conv6_relu"
  type: "ReLU"
  bottom: "pool5_2_conv6"
  top: "pool5_2_conv6_relu"
}



# Deconv 3
layer {
  name: "mask_deconv3"
  type: "Deconvolution"
  bottom: "pool5_2_conv6_relu"
  top: "mask_deconv3"
  param { lr_mult: 1 decay_mult: 1.0 }
  param { lr_mult: 2 decay_mult: 0}
  convolution_param {
    num_output: 256
    pad: 1 stride: 2 kernel_size: 4  
    #pad: 1 stride: 8 kernel_size: 16 
    #pad: 1 stride: 4 kernel_size: 8
    group: 256 #apply independently
    weight_filler { type: "bilinear" }
    #bias_filler { type: "constant" value: 1 }
  }
}

layer {
  name: "mask_score"
  type: "Convolution"
  bottom: "mask_deconv3" #
  top: "mask_score"
  param { lr_mult: 1.0 decay_mult: 1.0 }
  param { lr_mult: 2.0 decay_mult: 0 }
  convolution_param {
    num_output: 3	# 2  classes + 1 background
    kernel_size: 1 pad: 0 
    weight_filler {type: "gaussian" std: 0.01 } #weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

layer {
  name: 'binary-mask'
  type: 'Python'
  bottom: 'mask_score'
  bottom: 'mask_targets' #from lib/rpn/proposal_target_layer.py roi-data
  bottom: 'label_for_mask' #from lib/rpn/proposal_target_layer.py roi-data
  top: 'mask_score2'
  top: 'binary_mask'
  python_param {
    module: 'rpn.binary_mask'
    layer: 'BinaryMaskLayer'
    param_str: "'num_classes': 3"
  }
}

layer {
  name: "loss_mask"
  type: "SigmoidCrossEntropyLoss"
  bottom: 'mask_score2'
  bottom: "binary_mask"
  top: "loss_mask"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
    #normalize: false
  }
  propagate_down: true  # backprop to prediction
  propagate_down: false # don't backprop to labels
}
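
The size comments on the mask deconvolution layers in the prototxt above can be checked with the usual transposed-convolution size formula, out = (in - 1) * stride + kernel - 2 * pad. This is a minimal sketch, assuming square inputs, no dilation, and that the 3x3 convolutions with pad 1 in between keep the size unchanged. The helper names are made up for illustration.

```python
def deconv_out(size, kernel, stride, pad):
    """Output side length of a Deconvolution layer (square input, no dilation)."""
    return (size - 1) * stride + kernel - 2 * pad


# (kernel, stride, pad) of mask_deconv1, mask_deconv2, mask_deconv3
# as they are set (uncommented) in the prototxt above.
MASK_DECONVS = [(8, 4, 1), (8, 4, 1), (4, 2, 1)]


def mask_sizes(pooled=7):
    """Side length after each deconvolution, starting from the 7x7 RoI feature."""
    sizes = []
    size = pooled
    for kernel, stride, pad in MASK_DECONVS:
        size = deconv_out(size, kernel, stride, pad)
        sizes.append(size)
    return sizes


print(mask_sizes())  # [30, 122, 244]
```

With the active settings the mask grows from 7x7 to 30x30, 122x122 and finally 244x244. The "# 14x14" and "# 22x22" notes on the commented-out alternatives for mask_deconv1 follow from the same formula.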

How to change number of classes

Hello,
I would like to train the net on my own data in pascal_voc format. Unfortunately, when I change the num_classes parameter in models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt (lines 13 and 528), I get the following error:

Check failed: bottom[0]->channels() == bottom[1]->channels() (44 vs. 12)

What do I have to do to train the network with my number of classes?
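
Both numbers in the failed check are multiples of 4 (44 = 4 * 11 and 12 = 4 * 3), which suggests that the two inputs of a SmoothL1Loss layer were sized for different class counts. Below is a sketch of the values that would have to agree, assuming num_classes includes the background class and that the box targets carry 4 coordinates per class, as the comment on bbox_pred in the prototxt above indicates. The function names are illustrative and not part of the repository.

```python
def expected_outputs(num_classes):
    """num_output values that follow from num_classes (background included)."""
    return {"cls_score": num_classes, "bbox_pred": 4 * num_classes}


def check_bbox_channels(bbox_pred_num_output, num_classes):
    """Mimic the failing check: bbox_pred channels vs. bbox_targets channels."""
    target_channels = 4 * num_classes  # assumed: 4 box coordinates per class
    if bbox_pred_num_output != target_channels:
        return "mismatch (%d vs. %d)" % (bbox_pred_num_output, target_channels)
    return "ok"


print(expected_outputs(3))         # {'cls_score': 3, 'bbox_pred': 12}
print(check_bbox_channels(44, 3))  # mismatch (44 vs. 12)
print(check_bbox_channels(12, 3))  # ok
```

Under these assumptions, changing num_classes in the param_str entries is not enough on its own: the num_output of cls_score and bbox_pred has to be changed to match, and the mask_score layer has to match the number of mask labels.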

way to calculate F score

Hi Dr. Nguyen,

This is really nice work! Thank you.
Would you mind providing the script used to compute the F score in the paper? I would like to evaluate performance on my own dataset. Or is it already in the repo and I overlooked it? It would be great if I could borrow the code from you.

Thank you very much.

I have a question of installing AffordanceNet with docker

Hi.

I want to run AffordanceNet in Docker.

I downloaded docker-caffe and cloned AffordanceNet with git clone.

docker-caffe already has Caffe installed, so I did not build anything there (no make and so on).

Then I went to AffordanceNet/caffe-affordance-net and edited Makefile.config to uncomment:
USE_CUDNN := 1
WITH_PYTHON_LAYER := 1

But I got this error:

./include/caffe/util/cudnn.hpp: In function 'void caffe::cudnn::createPoolingDesc(cudnnPoolingStruct**, caffe::PoolingParameter_PoolMethod, cudnnPoolingMode_t*, int, int, int, int, int, int)':
./include/caffe/util/cudnn.hpp:127:41: error: too few arguments to function 'cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)'
pad_h, pad_w, stride_h, stride_w));
^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro 'CUDNN_CHECK'
cudnnStatus_t status = condition;
^

Can you help me?

I don't know what causes the error.

I tried cuDNN v5.1, cuDNN v6, and building without cuDNN, but I got the same error each time.

I am using CUDA 8.0.

Thank you.

A question about .sm file

Hi, thanks for your code! I have a question about .sm files. I read convert_instance_png_to_sm.py: the image '0.png' has 3 objects, so it has 3 affordance masks: '0_1.png', '0_2.png', '0_3.png'. But the pascal_voc dataset does not split an image into several masks, so what should I do? Also, how should I deal with .sm files? I found these hints in pascal_voc.py:

if cfg.TRAIN.MASK_REG:
    ## need more processing here
    # 1. create seg_mask_save for this obj (mask size equals to image size)
    # 2. Convert to bool:
    #    seg_mask_save = seg_mask_save.astype(bool). #Note that in case multi label---> DO NOT convert to bool
    # 3. seg_mask_path = './data/cache/seg_mask_pascal2012_gt/' + str(index) + '_' + str(count) + '_segmask.sm'
    # 4. save into folder
    #    with open(seg_mask_path, 'wb') as f_seg_save:
    #        cPickle.dump(seg_mask_save, f_seg_save, cPickle.HIGHEST_PROTOCOL)
    # print ("=======================index:" + str(index))
    # print ("=======================ix:" + str(ix))
    # index has form: index = "2008_000008" --> has to parse into integer number
    index_t = index.strip()

Can you fix this part? Thanks!
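
The comments quoted above describe a .sm file as a pickled mask array with the same size as the image, saved once per object. Here is a minimal sketch of writing and reading such files, assuming Python 3 (pickle in place of cPickle) and NumPy. The function names, the toy instance map, the temporary folder and the choice to ignore labels 0 and 255 are assumptions for illustration; the file name follows the pattern in step 3.

```python
import os
import pickle
import tempfile

import numpy as np


def save_sm(mask, folder, index, count):
    """Pickle one per-object mask (H x W array) as '<index>_<count>_segmask.sm'."""
    path = os.path.join(folder, "%s_%d_segmask.sm" % (index, count))
    with open(path, "wb") as f:
        pickle.dump(mask, f, pickle.HIGHEST_PROTOCOL)
    return path


def load_sm(path):
    """Read a .sm file back into a NumPy array."""
    with open(path, "rb") as f:
        return pickle.load(f)


def split_instances(instance_map, ignore=(0, 255)):
    """Split an instance-id map into one image-sized mask per object."""
    ids = [i for i in np.unique(instance_map) if i not in ignore]
    return [(instance_map == i).astype(np.uint8) for i in ids]


# Toy 4x4 "image" with two objects (instance ids 1 and 2, 0 = background).
instance_map = np.array([[0, 1, 1, 0],
                         [0, 1, 1, 0],
                         [2, 2, 0, 0],
                         [2, 2, 0, 0]])
folder = tempfile.mkdtemp()
paths = [save_sm(mask, folder, "2008_000008", k + 1)
         for k, mask in enumerate(split_instances(instance_map))]
restored = load_sm(paths[0])
print(len(paths), restored.shape)  # 2 (4, 4)
```

For multi-label affordance masks the per-object array would hold the affordance label of each pixel instead of 0/1, in line with the note in step 2 about not converting to bool.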

Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type

Hi,
I am having the following problem when I run demo_img.py from the tools folder:

Creating layer proposal
F0311 17:06:25.735666 24859 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, ROIAlignment, ROIAlignment2, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
Aborted (core dumped)

The model is built successfully up to this layer, and then it fails.
Could it have anything to do with the fact that I kept the following

##cuDNN acceleration switch (uncomment to build with cuDNN).

USE_CUDNN := 1

commented out in Makefile.config before building caffe-affordance-net?
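
The list of known types in the log does not contain Python. Caffe reports this when it was built without Python layer support, which Makefile.config controls through WITH_PYTHON_LAYER := 1 rather than through USE_CUDNN. The small sketch below pulls the known types out of such a log line to confirm what is missing. The function name is illustrative, and the sample line is a shortened copy of the one in the report.

```python
import re


def known_layer_types(log_line):
    """Return the layer types listed after 'known types:' in a Caffe check failure."""
    match = re.search(r"known types:\s*([^)]*)\)", log_line)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# Shortened copy of the failure line from the report above.
line = ("Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: "
        "Python (known types: AbsVal, Accuracy, ROIAlignment, ROIPooling, ReLU, "
        "SmoothL1Loss, Softmax, WindowData)")
types = known_layer_types(line)
print("Python" in types)        # False
print("ROIAlignment" in types)  # True
```

If Python is absent from the list while ROIAlignment is present, the custom layers were compiled but the Python layer was not, so a rebuild with WITH_PYTHON_LAYER := 1 uncommented is the first thing to try.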

WARNING: filter_boxes() remove ALL proposal.

Hi,
I get the warning "WARNING: filter_boxes() remove ALL proposal." (proposal_layer.py line 191) during training. The box values are NaN, but I don't know exactly why. Notably, this occurs after a few hundred steps, not from the beginning. I checked my data, and all masks and annotations seem to be fine. Right before the warning appears, I get these RuntimeWarnings:

affordance-net/tools/../lib/fast_rcnn/bbox_transform.py:48: RuntimeWarning: overflow encountered in exp
  pred_w = np.exp(dw) * widths[:, np.newaxis]
affordance-net/tools/../lib/fast_rcnn/bbox_transform.py:48: RuntimeWarning: overflow encountered in multiply
  pred_w = np.exp(dw) * widths[:, np.newaxis]
affordance-net/tools/../lib/fast_rcnn/bbox_transform.py:49: RuntimeWarning: overflow encountered in exp
  pred_h = np.exp(dh) * heights[:, np.newaxis]
affordance-net/tools/../lib/fast_rcnn/bbox_transform.py:49: RuntimeWarning: overflow encountered in multiply
  pred_h = np.exp(dh) * heights[:, np.newaxis]

Do you know why this error could happen?
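
The overflow is raised by np.exp(dw) and np.exp(dh), so the predicted deltas have already become very large before the boxes turn into inf or NaN. One mitigation used in other detection code bases is to clamp dw and dh before exponentiating. The following is a sketch of that idea and not the repository's code: the clip value log(1000 / 16) and the function name are assumptions, and it handles a single class per box for brevity. It does not explain why the deltas diverge in the first place (possible causes include a learning rate that is too high or degenerate boxes in the annotations), and NaN deltas pass through unchanged.

```python
import numpy as np

DELTA_CLIP = np.log(1000.0 / 16.0)  # assumed upper bound for dw and dh


def bbox_transform_inv_clipped(boxes, deltas):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) boxes with clamped dw, dh."""
    boxes = boxes.astype(np.float64)
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    dx, dy = deltas[:, 0], deltas[:, 1]
    dw = np.minimum(deltas[:, 2], DELTA_CLIP)  # clamp before np.exp
    dh = np.minimum(deltas[:, 3], DELTA_CLIP)

    pred_ctr_x = dx * widths + ctr_x
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths
    pred_h = np.exp(dh) * heights

    return np.stack([pred_ctr_x - 0.5 * pred_w,
                     pred_ctr_y - 0.5 * pred_h,
                     pred_ctr_x + 0.5 * pred_w,
                     pred_ctr_y + 0.5 * pred_h], axis=1)


boxes = np.array([[0, 0, 15, 15]])
deltas = np.array([[0.0, 0.0, 1000.0, 1000.0]])  # would overflow np.exp unclamped
pred = bbox_transform_inv_clipped(boxes, deltas)
print(np.isfinite(pred).all())  # True
```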

Can't train pascal_voc

Hi, thanks for your code! When I run sh experiments/scripts/faster_rcnn_end2end.sh 1 VGG16 pascal_voc, I get this error:
Traceback (most recent call last):
  File "./tools/train_net.py", line 109, in <module>
    imdb, roidb = combined_roidb(args.imdb_name)
  File "./tools/train_net.py", line 74, in combined_roidb
    roidbs = [get_roidb(s) for s in imdb_names.split('+')]
  File "./tools/train_net.py", line 67, in get_roidb
    roidb = get_training_roidb(imdb)
  File "/tools/../lib/fast_rcnn/train.py", line 128, in get_training_roidb
    imdb.append_flipped_images()
  File "/tools/../lib/datasets/imdb.py", line 116, in append_flipped_images
    assert (boxes[:, 2] >= boxes[:, 0]).all()
AssertionError
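
The assertion sits in the horizontal flip step, which mirrors x1 and x2 around the image width and then expects x2 >= x1. It fails when an annotation has x1 > x2 or coordinates outside the image. In py-faster-rcnn-style loaders, which store boxes as unsigned 16-bit integers after subtracting 1 from the VOC coordinates, an xmin of 0 is a known way to get there because it wraps around to 65535. Whether that is the cause here is an assumption. Below is a sketch for finding such boxes before training, assuming 0-based (x1, y1, x2, y2) boxes in a signed integer array; the function names are illustrative.

```python
import numpy as np


def flip_boxes(boxes, width):
    """Mirror boxes horizontally, as the flip step in the traceback does."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2] - 1
    flipped[:, 2] = width - boxes[:, 0] - 1
    return flipped


def bad_box_rows(boxes, width, height):
    """Indices of boxes that are inverted or fall outside the image."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    bad = ((x1 < 0) | (y1 < 0) | (x2 >= width) | (y2 >= height)
           | (x2 < x1) | (y2 < y1))
    return np.flatnonzero(bad)


boxes = np.array([[10, 5, 50, 40],   # fine
                  [-1, 5, 50, 40],   # xmin of 0 after the "- 1" shift
                  [60, 5, 20, 40]],  # x1 > x2
                 dtype=np.int64)
print(bad_box_rows(boxes, width=100, height=80))  # [1 2]

# The same "- 1" in unsigned 16-bit arithmetic wraps around instead of going negative.
wrapped = np.array([0], dtype=np.uint16) - np.uint16(1)
print(int(wrapped[0]))  # 65535
```

Boxes flagged this way can be fixed in the annotation files or clamped when loading. Cached roidb files would then presumably need to be regenerated, since they may still hold the old boxes.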

training error

I ran ./experiments/scripts/faster_rcnn_end2end.sh 0 VGG16 pascal_voc.
If I just run it as is, I get an error message.

So I changed ITERS in faster_rcnn_end2end.sh from 2000000 to 500, and then it worked.
A large ITERS triggers the error.
A small ITERS does not, but the model does not learn well.
The error message is:

Floating point exception

One iteration takes almost 3 s. Is that normal? And is this a common error?

How to use caffe pretrained model

How can I use a previously trained model?

I tried, but I got an error message.

One Object with multiple affordances

Hi,
I want to give the same object multiple affordances that each cover the whole object. Is this possible?
I know it is not possible to simply write e.g. 1,2 in the mask .sm file, or to create two files with a 1 mask and a 2 mask, but is there any other way to do this?
