
fcn.tensorflow's People

Contributors

jakobu5, lababidi, shekkizh


fcn.tensorflow's Issues

[Solved] Problems with TensorFlow 1.0 and Windows

Hi there,

First, I wanted to say thanks for sharing! I'm working through the code to help with my own segmentation project and having something to work from is a big help.

Second, I came across a few issues (minor really) that I've figured out and wanted to share:

  • TensorFlow 1.0 replaced tf.pack() with tf.stack().
  • In TensorFlow 1.0, variables should be initialised using tf.global_variables_initializer()
  • On Windows, the path handling around os.path.splitext() should split on '\\' (or os.sep) rather than '/'. Otherwise the program can't find any files to pickle (the MITSceneParsing.pickle file ends up empty), which in turn means 0 records are found and the feed dict step fails. (See the sketch after this list.)
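
A minimal sketch of all three fixes together, assuming TensorFlow 1.0; the tensors and path below are only illustrative:

    import os
    import tensorflow as tf

    # 1) TensorFlow 1.0 renamed tf.pack() to tf.stack()
    a = tf.constant([1, 2])
    b = tf.constant([3, 4])
    stacked = tf.stack([a, b])  # was: tf.pack([a, b])

    # 2) Variables are now initialised with tf.global_variables_initializer()
    init_op = tf.global_variables_initializer()  # was: tf.initialize_all_variables()

    # 3) Split paths with os-aware helpers so Windows '\\' separators work
    name = os.path.splitext(os.path.basename(r"data\images\train_0001.png"))[0]

    with tf.Session() as sess:
        sess.run(init_op)
        print(sess.run(stacked), name)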

Like I said, pretty minor stuff, but I wanted to post in case anyone else had any issues.

Best regards,

Frazer

P.S. If you get an out of memory error, it's likely because you're trying to work with 20,000 images, which might be a bit too much. I deleted some of the training images and it worked.

FCN vs Deepmask

Hi, you have more experience in ML and computer vision so I just want to know your opinion on Image Segmentation.

What do you think will yield better results in a case of image segmentation, this implementation of FCN or Deepmask from Facebook? Can you also elaborate on why?

Right now I am still in the learning phase, so thank you for any insight.

ValueError: Cannot feed value of shape (1, 6000, 6000, 3, 1) for Tensor 'annotation:0', which has shape '(?, 6000, 6000, 1)'

Thanks for sharing your resource, Mr. Shekkizh. The annotations give me an error. I used Python 3.6 and TensorFlow 1.0. Is it caused by my working environment? I only changed the dataset.

setting up vgg initialized conv layers ...
Setting up summary op...
Setting up image reader...

Found pickle file!
40
8
Setting up dataset reader

Initializing Batch Dataset Reader...
{'resize': True, 'resize_size': 6000}
(40, 6000, 6000, 3)
(40, 6000, 6000, 3, 1)

Initializing Batch Dataset Reader...
{'resize': True, 'resize_size': 6000}
(8, 6000, 6000, 3)
(8, 6000, 6000, 3, 1)
Setting up Saver...

Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Yared/Desktop/Project/FCN.py', wdir='C:/Users/Yared/Desktop/Project')

File "C:\Users\Yared\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)

File "C:\Users\Yared\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Yared/Desktop/Project/FCN.py", line 225, in
tf.app.run()

File "C:\Users\Yared\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))

File "C:/Users/Yared/Desktop/Project/FCN.py", line 196, in main
sess.run(train_op, feed_dict=feed_dict)

File "C:\Users\Yared\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 767, in run
run_metadata_ptr)

File "C:\Users\Yared\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 944, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))

ValueError: Cannot feed value of shape (1, 6000, 6000, 3, 1) for Tensor 'annotation:0', which has shape '(?, 6000, 6000, 1)'
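
The extra axes suggest the annotations were read as 3-channel RGB images and then given an additional singleton dimension; a hedged sketch of collapsing such a batch to the (?, H, W, 1) shape the placeholder expects (the array is a stand-in, and keeping channel 0 assumes the label is encoded identically in all three channels):

    import numpy as np

    ann = np.zeros((1, 6000, 6000, 3, 1), dtype=np.int32)  # stand-in for the bad batch
    ann = ann.squeeze(axis=4)   # drop the trailing singleton -> (1, 6000, 6000, 3)
    ann = ann[..., :1]          # keep a single channel       -> (1, 6000, 6000, 1)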

How can I solve these problems?

runfile('C:/Users/PROCOMP-9/Desktop/FCN.tensorflow-master/FCN.tensorflow-master/FCN.py', wdir='C:/Users/PROCOMP-9/Desktop/FCN.tensorflow-master/FCN.tensorflow-master')
setting up vgg initialized conv layers ...
Setting up summary op...
Setting up image reader...
Found pickle file!
0
0
Setting up dataset reader
Initializing Batch Dataset Reader...
{'resize': True, 'resize_size': 224}
(0,)
(0,)
Initializing Batch Dataset Reader...
{'resize': True, 'resize_size': 224}
(0,)
(0,)
Setting up Saver...
****************** Epochs completed: 1******************

Traceback (most recent call last):

  File "<ipython-input-1-6062f5716837>", line 1, in <module>
    runfile('C:/Users/PROCOMP-9/Desktop/FCN.tensorflow-master/FCN.tensorflow-master/FCN.py', wdir='C:/Users/PROCOMP-9/Desktop/FCN.tensorflow-master/FCN.tensorflow-master')

  File "C:\Users\PROCOMP-9\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Users\PROCOMP-9\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/PROCOMP-9/Desktop/FCN.tensorflow-master/FCN.tensorflow-master/FCN.py", line 223, in <module>
    tf.app.run()

  File "C:\Users\PROCOMP-9\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))

  File "C:/Users/PROCOMP-9/Desktop/FCN.tensorflow-master/FCN.tensorflow-master/FCN.py", line 194, in main
    sess.run(train_op, feed_dict=feed_dict)

  File "C:\Users\PROCOMP-9\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 766, in run
    run_metadata_ptr)

  File "C:\Users\PROCOMP-9\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 943, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))

ValueError: Cannot feed value of shape (0,) for Tensor 'input_image:0', which has shape '(?, 224, 224, 3)'
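
The 0 counts in the log above mean the reader found no training or validation records (most likely the Windows path-splitting problem from the first issue), so an empty array gets fed to input_image. Note that the empty file list is cached in MITSceneParsing.pickle, so after fixing the path handling the stale pickle must be deleted; a hedged sketch, with the path taken from the repo's default data_dir layout:

    import os

    # Delete the stale cache so the file list is rebuilt on the next run:
    pickle_path = os.path.join("Data_zoo", "MIT_SceneParsing", "MITSceneParsing.pickle")
    if os.path.exists(pickle_path):
        os.remove(pickle_path)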

How to train on Pascal dataset?

Hello,

I was wondering what needs to be done to get this working with the Pascal dataset. I see that the output placeholder channel size is hardcoded to 1, and the Pascal annotations are RGB images, so that would need to be changed to 3, along with the number of classes and so on. I've tried changing those, but it gives me an error in the loss function line, and I'm having a hard time understanding how logits, which has shape (?, ?, ?, num_classes), can be compared with y_output, which has shape (?, width, height, channel).

Also, a separate question: do you know how to compute intersection over union for the output?

Thanks

Edit: I spent a little time looking around, and it looks like I need to figure out how to map the colour-mapped segmentation labels given in the dataset to a 0-20 integer-indexed version. (See the sketch below.)
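
A hedged sketch of both pieces (the palette lookup and a per-class IoU) in plain numpy; the palette array and function names are illustrative, not from this repo:

    import numpy as np

    def rgb_to_index(label_rgb, palette):
        # Map an (H, W, 3) colour-coded label to (H, W) integer class indices.
        # palette is a (num_classes, 3) array of the dataset's label colours.
        index = np.zeros(label_rgb.shape[:2], dtype=np.int32)
        for cls, colour in enumerate(palette):
            index[np.all(label_rgb == colour, axis=-1)] = cls
        return index

    def mean_iou(pred, gt, num_classes):
        # Intersection-over-union per class, averaged over classes present.
        ious = []
        for cls in range(num_classes):
            inter = np.logical_and(pred == cls, gt == cls).sum()
            union = np.logical_or(pred == cls, gt == cls).sum()
            if union > 0:
                ious.append(float(inter) / union)
        return np.mean(ious)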

Test with single image

How can we test it with a single image, without giving it a training label/mask, to predict the segmentation?
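
One possible recipe, sketched under the assumption that the graph is built exactly as FCN.py builds it (image, keep_probability and pred_annotation come from that file; the checkpoint directory and file names here are illustrative):

    import numpy as np
    import scipy.misc as misc
    import tensorflow as tf

    # After building the graph as in FCN.py and creating `saver = tf.train.Saver()`:
    with tf.Session() as sess:
        saver.restore(sess, tf.train.latest_checkpoint("logs/"))
        img = misc.imresize(misc.imread("my_image.jpg"), [224, 224])
        pred = sess.run(pred_annotation,
                        feed_dict={image: np.expand_dims(img, 0),
                                   keep_probability: 1.0})  # 1.0 disables dropout
        misc.imsave("pred.png", pred[0].squeeze().astype(np.uint8))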

errors with loss

I use TensorFlow 1.0 GPU on Windows and I got this error:

Traceback (most recent call last):
  File "FCN.py", line 221, in <module>
    tf.app.run()
  File "C:\Users\SEELE\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "FCN.py", line 152, in main
    loss = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(logits, tf.squeeze(annotation, squeeze_dims=[3]), name="entropy")))
  File "C:\Users\SEELE\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1684, in sparse_softmax_cross_entropy_with_logits
    labels, logits)
  File "C:\Users\SEELE\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1533, in _ensure_xent_args
    "named arguments (labels=..., logits=..., ...)" % name)
ValueError: Only call sparse_softmax_cross_entropy_with_logits with named arguments (labels=..., logits=..., ...)

How can I solve it?
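
TensorFlow 1.0 requires keyword arguments here (the positional order also changed, with labels first), so the fix is to name both arguments, exactly as a later issue in this list shows:

    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits,
        labels=tf.squeeze(annotation, squeeze_dims=[3]),
        name="entropy"))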

AttributeError: 'module' object has no attribute 'get_model_data'

/usr/bin/python2.7 /home/yared/Desktop/Project/FCN.py
setting up vgg initialized conv layers ...
Traceback (most recent call last):
  File "/home/yared/Desktop/Project/FCN.py", line 225, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "/home/yared/Desktop/Project/FCN.py", line 150, in main
    pred_annotation, logits = inference(image, keep_probability)
  File "/home/yared/Desktop/Project/FCN.py", line 75, in inference
    model_data = utils.get_model_data(FLAGS.model_dir, MODEL_URL)
AttributeError: 'module' object has no attribute 'get_model_data'

A possible bug when resizing annotations.

If resize == True, the image and annotation are resized to a given size.
However, when the annotation is resized, the default interpolation method (bilinear) is used, which produces incorrect labels near the edges of objects.
So I think the interpolation method should be 'nearest' when resizing annotations.

Thanks a lot!
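
A sketch of the suggested change, assuming annotations are resized with scipy.misc.imresize as elsewhere in this codebase (annotation and resize_size stand for the reader's variables):

    import scipy.misc as misc

    # Bilinear interpolation blends class indices near object edges (a border
    # between class 3 and class 7 can average to a non-existent "class 5"),
    # so label maps must be resized with nearest-neighbour interpolation:
    resize_image = misc.imresize(annotation, [resize_size, resize_size],
                                 interp='nearest')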

Test the trained model?

Dear all,
How I can test the trained model on a group of new images?
thanks for your help.

[SOLVED] Loss won't decrease, predictions are all the same

Hi! I would like to reproduce your results.
Just running the code like python FCN.py doesn't seem to do the job for me.
The default parameters are:

  • IMAGE_SIZE = 224 (changing it to 256 does not affect the results)
  • learning_rate = 1e-4
  • batch_size = 2

What I get is that training and validation loss start at about 400, and very quickly (200 iterations) decrease until they settle to about 3.

Step: 0, Train_loss:415.754
2016-10-13 12:19:13.407670 ---> Validation_loss: 395.876
Step: 10, Train_loss:28.7208
Step: 20, Train_loss:10.2944
Step: 30, Train_loss:5.06159
Step: 40, Train_loss:4.51668
Step: 50, Train_loss:4.17936
Step: 60, Train_loss:4.55051
Step: 70, Train_loss:4.98752
Step: 80, Train_loss:3.63942
Step: 90, Train_loss:3.56676
Step: 100, Train_loss:3.96641
Step: 110, Train_loss:3.72767
Step: 120, Train_loss:3.26587
Step: 130, Train_loss:3.89015
Step: 140, Train_loss:5.48371
Step: 150, Train_loss:4.27173
Step: 160, Train_loss:3.81378
Step: 170, Train_loss:3.58391
Step: 180, Train_loss:2.79207
Step: 190, Train_loss:4.10269
Step: 200, Train_loss:4.57686
Step: 210, Train_loss:4.00551
Step: 220, Train_loss:3.1667
Step: 230, Train_loss:3.7841
Step: 240, Train_loss:3.74983
Step: 250, Train_loss:3.03212
Step: 260, Train_loss:2.85248
Step: 270, Train_loss:3.64257
Step: 280, Train_loss:3.765
Step: 290, Train_loss:4.16679
Step: 300, Train_loss:4.0291
Step: 310, Train_loss:3.95092
Step: 320, Train_loss:3.38709
Step: 330, Train_loss:2.48646
Step: 340, Train_loss:2.98015
Step: 350, Train_loss:3.59501
Step: 360, Train_loss:3.80755
Step: 370, Train_loss:3.73314
Step: 380, Train_loss:3.40185
Step: 390, Train_loss:3.89394
Step: 400, Train_loss:3.80676
Step: 410, Train_loss:2.78324
Step: 420, Train_loss:3.14695
Step: 430, Train_loss:3.29019
Step: 440, Train_loss:3.16163
Step: 450, Train_loss:3.64598
Step: 460, Train_loss:2.74009
Step: 470, Train_loss:3.93917
Step: 480, Train_loss:3.815
Step: 490, Train_loss:3.83076
Step: 500, Train_loss:4.45192
2016-10-13 12:24:10.606606 ---> Validation_loss: 3.02666

I kept it running up to 35000 iterations, which should be about 3.5 epochs, but the loss won't decrease any further.
If I then visualize the model at 35000 iterations (Train_loss: 2.73392, Validation_loss: 3.51286) with python FCN.py --mode visualize, I always get the same prediction, whatever the input image:

[attached image: prediction output]

This is also, by the way, the same prediction I get with an earlier model (200 iterations).

Is there something I'm getting wrong?
Thank you

regarding patch wise training and convolutional training

Hi Sarath,

Thanks for sharing the code. I have a question regarding generating the training data set.

In the FCN paper, the authors discuss the patch wise training and fully convolutional training. What is the difference between these two?

Please refer to section 4.4 attached in the following.

It seems to me that the training mechanism is as follows: assume the original image is M×M, then iterate over the M×M pixels to extract N×N patches (where N<M). The iteration stride can be some number like N/3, to generate overlapping patches. Moreover, assume each single image corresponds to 20 patches; then we can put these 20 patches, or 60 patches (if we want to have 3 images), into a single mini-batch for training. Is this understanding right? It seems to me that this so-called fully convolutional training is the same as patch-wise training.

[attached screenshot: section 4.4 of the FCN paper]

Fusing layer

According to the paper, we should add a 1×1 convolutional layer on top of pool4 to get a score for each class, and use that score to fuse with the final layer of FCN-32s. Finally, we use a deconv layer to get the target image.
However, in your implementation, you convert the final layer of FCN-32s, using a deconv layer, to have the same shape as the pool4 layer. Then you directly fuse pool4 with that score.
I just want to know whether the order of these operations matters.
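
For reference, a hedged sketch of the paper's ordering (score pool4 with a 1×1 convolution first, then add the 2× upsampled coarse scores); the tensor names and layer helpers are illustrative, not this repo's utils:

    import tensorflow as tf

    def fuse_with_pool4(pool4, coarse_score, num_classes):
        # 1x1 conv turns pool4 features (N, H/16, W/16, 512) into class scores.
        score_pool4 = tf.layers.conv2d(pool4, num_classes, 1)
        # Upsample the coarse scores (N, H/32, W/32, C) by 2x onto the same grid.
        up_score = tf.layers.conv2d_transpose(coarse_score, num_classes, 4,
                                              strides=2, padding='same')
        # Element-wise sum fuses the two streams, as in FCN-16s.
        return score_pool4 + up_score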

How to test and demo?

Hi, I have completed the training, but I don't know how to test it or how to run a demo. Can you help me?

some problem about the result(output map)

@shekkizh When I run FCN.py to train the network on my laptop, the loss drops from 400 to 3, but the segmentation map is terrible; I can hardly see the shape of the segmentation in the output map. Should I continue to train the network, or are there other factors causing this?

Train the model on our own dataset !

Hello, I've been trying to train the model on my own dataset. I've converted my annotations to grayscale so they have the same format as the MIT dataset, but when I launch the training I get a training loss of nan for all iterations. Is there something I'm doing wrong, or something I didn't take into consideration?
Thank you

[[Node: entropy/entropy = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](entropy/Reshape, entropy/Reshape_1)]]

----- 7 198 197 196 196 197 196 196 201 197 197 198 190 166 133 130 131 131 132 132 132 132 132 133 133 133 134 129 114 104 96 69 17 10 10 28 124 133 132 132 132 132 132 131 131 131 131 131 130 131 133 132 132 132 133 133 132 132 132 132 132 132 132 132 132 132 132 133 134 133 133 133 133 133 133 133 133 133 133 127
[[Node: entropy/entropy = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](entropy/Reshape, entropy/Reshape_1)]]

This error stopped the training. Is it related to the softmax?

    loss = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                          labels=tf.squeeze(annotation, squeeze_dims=[3]),
                                                                          name="entropy")))

CUDA_ERROR_OUT_OF_MEMORY

Hi, @shekkizh ,

I got the following error:

root@milton-OptiPlex-9010:/data/code/FCN.tensorflow# python FCN.py 
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.7.5 locally
setting up vgg initialized conv layers ...
Setting up summary op...
Setting up image reader...
Found pickle file!
20210
2000
Setting up dataset reader
Initializing Batch Dataset Reader...
{'resize_size': 224, 'resize': True}
(20210, 224, 224, 3)
(20210, 224, 224, 1)
Initializing Batch Dataset Reader...
{'resize_size': 224, 'resize': True}
(2000, 224, 224, 3)
(2000, 224, 224, 1)
E tensorflow/core/common_runtime/direct_session.cc:135] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073709551615
Traceback (most recent call last):
  File "FCN.py", line 223, in <module>
    tf.app.run()
  File "/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "FCN.py", line 177, in main
    sess = tf.Session()
  File "/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1186, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 551, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/root/anaconda3/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

Could you suggest how to fix this error: "CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073709551615"?

Why are you padding on the sixth convolutional layer?!

Hi,

On the sixth convolutional layer, the code is:

    W6 = utils.weight_variable([7, 7, 512, 4096], name="W6")
    b6 = utils.bias_variable([4096], name="b6")
    conv6 = utils.conv2d_basic(pool5, W6, b6)

At this stage, pool5 has size [batch_size, 7, 7, 512]. Now, as far as I understand, you are using a filter of size 7 by 7, in order to make your feature map of size [batch_size, 1, 1, 4096]. However, if you look at the code of conv2d_basic(...), the code is:

    def conv2d_basic(x, W, bias):
        conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")
        return tf.nn.bias_add(conv, bias)

The problem here is that the output of conv6 actually remains [batch_size, 7, 7, 4096] because you're using padding, and I am not sure that this is what we want. If we look at the official code, we'll see the sixth convolutional layer coded as:

layer {
    name: "fc6"
    type: "Convolution
    bottom: "pool5"
    top: "fc6"
    param {
        lr_mult: 1
        decay_mult: 1
    }
    param {
        lr_mult: 2
        decay_mult: 0
    }
    convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 7
    stride: 1
    }
}

They aren't using padding in this layer, which means that conv6 is actually of size [batch_size, 1, 1, 4096]. Pretty sure this is the entire point of conv6.
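
For comparison, a hedged sketch of what a padding-free variant of that layer would look like (reusing the conv2d_basic shape conventions above; this is not the repo's actual code):

    import tensorflow as tf

    def conv2d_no_pad(x, W, bias):
        # With a 7x7 filter and VALID padding, a (N, 7, 7, 512) pool5 becomes
        # a (N, 1, 1, 4096) output, matching the Caffe layer quoted above.
        conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="VALID")
        return tf.nn.bias_add(conv, bias)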

Am I missing something, or was that part of the code a mistake on your part?

Anyway, cheers for the code. The cleanest TF implementation of FCN I have seen so far.

Line 152 in FCN.py

Sorry to bother you, Mr. Shekkizh. I just have a problem when I run this code; I get the following:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/framework/tensor_shape.py", line 547, in merge_with
self.assert_same_rank(other)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/framework/tensor_shape.py", line 593, in assert_same_rank
"Shapes %s and %s must have the same rank" % (self, other))
ValueError: Shapes (?, ?, ?, 151) and (?, ?) must have the same rank

And I find out that this exception comes from here
    loss = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(logits, tf.squeeze(annotation, squeeze_dims=[3]), name="entropy")))

I tested the shapes of logits and tf.squeeze(annotation, squeeze_dims=[3]); they are (?, ?, ?, 151) and (?, 224, 224) respectively.
So is something wrong here? I'm a little confused and hoping for your advice. Thanks a lot.

higher order of channel?

Is there any way to use this model for images with more than 3 channels, like 4 or 5 channels?

Line 47 BatchDatsetReader.py

Hi @shekkizh
First, thank you for sharing the code with us.
I have a question: the if statement on line 47 of BatchDatsetReader says:

if self.image_options.get("resize", False) and self.image_options["resize"]:
    resize_size = int(self.image_options["resize_size"])
    resize_image = misc.imresize(image,
                                  [resize_size, resize_size], interp='nearest')
else:
     resize_image = image

I think it should be:

if self.image_options.get("resize", True) and self.image_options["resize_size"]:
    resize_size = int(self.image_options["resize_size"])
    resize_image = misc.imresize(image,
                                [resize_size, resize_size], interp='nearest')
else:
    resize_image = image

Am I misinterpreting the parameters?

Thanks!
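
For what it's worth, a quick way to compare the two conditions (the dictionaries below are illustrative):

    image_options = {'resize': True, 'resize_size': 224}

    # Original: .get("resize", False) returns False when the key is absent,
    # so resizing stays off unless the caller asked for it.
    print(image_options.get("resize", False) and image_options["resize"])  # True
    print({}.get("resize", False))                                         # False

    # Proposed: .get("resize", True) would default to resizing and then
    # raise KeyError on a missing "resize_size".
    print({}.get("resize", True))                                          # True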

How is this executed?

Hi there!

This setup looks very useful to me! I am trying to run this on my own set of images and labels, but I don't see any instructions on how to execute the app. Could you perhaps shed some light on this?

Thanks in advance!
Lilly

The loss is Nan

I trained on my own dataset, which has only two classes, so I set NUM_CLASSES to 2, and the loss turned out to be nan. If I change NUM_CLASSES to 3 or 151 without changing my dataset, it works.
I'm very confused by this; please help me.

I have tried decreasing the learning rate to 1e-7 and 1e-8, but it didn't work.
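
One common cause, offered as a guess rather than a confirmed diagnosis: sparse softmax cross-entropy produces nan or undefined losses if any annotation pixel holds a value outside [0, NUM_CLASSES - 1] (for example 255 as an "ignore" colour), which setting NUM_CLASSES to 3 or 151 can mask. A quick numpy check (path illustrative):

    import numpy as np
    import scipy.misc as misc

    ann = misc.imread("annotations/training/sample.png")  # illustrative path
    print("label values present:", np.unique(ann))
    # For two classes, everything must be 0 or 1; remap anything else, e.g.:
    ann = (ann > 0).astype(np.uint8)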

Slight modification, large loss values?

Thanks for your work here.

I'm trying to train a slightly modified* version of this network.

In the figure under the 'Observations' heading on the main GitHub page for this repo (the image at logs/images/sparse_entropy.png), is the entropy on the y-axis the training or validation loss, with iterations on the x-axis? If so, I seem to be getting huge loss values in comparison. The plot shows an entropy of 4.5 initially, decreasing to around 3.5 after 2500 iterations, whereas I'm getting an initial loss of 300-600, with training loss hovering between 30 and 100 by iteration 2500, and validation loss somewhere between 70 and 100 at iteration 2500.

Are these reasonable values, or has something gone seriously wrong here?

(*Details of the modified version of the network:

  • I've reintroduced relu5_3 between conv5_3 and pool5
  • I'm loading the parameters for the first two fully connected layers of VGG-19 from vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat instead of initialising them randomly
  • Due to memory constraints, I'm holding all parameters fixed apart from the last fully connected layer of VGG-19 and the deconvolution/upsampling layers. )

Why Accuracy and Loss are not corresponding?

I have implemented the code and extended it to log statistics (accuracy and loss). I expected accuracy and loss to correspond in TensorBoard, but unfortunately the graphs do not agree (see attachment). Does anyone know why?
Thanks in Advance!
[attached TensorBoard graphs: accuracy and loss]

how to decrease space used for logging?

Dear all,
I have limited hard disk space on a shared server.
Is logging necessary at all? If not, how can I use a smaller space for logging?
How can I decrease the size needed for logging?
Please guide me step by step.
Any comments appreciated.
Thanks

Training loss remains nan

Hi,

I want to use the code to train on my own image data.
The images are 512×512 grayscale.
But when I train with them, the loss remains nan:

Setting up Saver...
Step: 0, Train_loss:nan
2017-03-17 13:55:04.919050 ---> Validation_loss: nan
Step: 10, Train_loss:nan
Step: 20, Train_loss:nan
Step: 30, Train_loss:nan
Step: 40, Train_loss:nan

In BatchDatsetReader I changed the way images are read, like this:

def _read_images(self):
    self.__channels = False
    self.images = np.array(
        [np.expand_dims(self._transform(filename['image']), axis=3) for filename in self.files])

But it did not work.

How can I solve this problem so I can train on my grayscale images?
And what is the meaning of NUM_OF_CLASSESS?
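
A hedged sketch of one workaround: instead of changing the reader's channel logic, replicate the single grey channel three times so images match the (H, W, 3) input the VGG-initialised layers expect (function name illustrative):

    import numpy as np

    def gray_to_rgb(image):
        # Turn an (H, W) grayscale array into (H, W, 3) by channel replication.
        if image.ndim == 2:
            image = np.stack([image] * 3, axis=-1)
        return image

    # NUM_OF_CLASSESS is the number of segmentation classes the net predicts;
    # annotation pixel values must lie in [0, NUM_OF_CLASSESS - 1].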

relu5_3

Hello,

Is there a particular reason why the inference continues from the "conv5_3" level of the vgg_net and not from "relu5_3"? I mean, the input to pool5 is conv5_3 and not relu5_3.

What am i missing?

Thanks!

GPU training Nan loss

Hi developers,

I encountered a weird issue. I tried the code on our Lab server, it worked well.

Because I recently added an external graphics card to my MacBook Pro, I wanted to try the model on it. The strange thing is, the graphics card in the lab server is identical to the one on my laptop (a 980 Ti), but on the MBP I got nan loss after a few steps:

Step: 880, Train_loss:4.60517
Step: 890, Train_loss:4.10488
Step: 900, Train_loss:3.88846
Step: 910, Train_loss:3.37081
Step: 920, Train_loss:2.04156
Step: 930, Train_loss:3.50961
Step: 940, Train_loss:nan
Step: 950, Train_loss:nan
Step: 960, Train_loss:nan
Step: 970, Train_loss:nan
Step: 980, Train_loss:nan
Step: 990, Train_loss:nan
Step: 1000, Train_loss:nan
2017-06-17 01:29:08.570643 ---> Validation_loss: nan
Step: 1010, Train_loss:nan
Step: 1020, Train_loss:nan

I googled for a while but did not find an answer. Do you know why? It might not be relevant to the code itself, but it would be good to hear any hints you have :)

Final convolution layer 5_3 and not 5_4?

Hello!

I've been reading up on the paper as well as reading your code to get a good grasp of how to do image segmentation using ConvNets. I was wondering why, in FCN.py line 86, you set the final conv layer to 5_3 instead of 5_4? Also, to clarify for FCN in general: we're changing the fully connected layers to convolution layers as well, which is what you're doing from lines 87 to 108, and then afterwards you start working backwards, "deconvolving" the image, which it appears you do three times?

Thanks.

Prediction output of size 224x224 as opposed to original dimensions

Hey again!

Sorry for opening another issue, but I am in the home stretch of getting this codebase suited to my data and almost everything is coming together.

So the output predictions are of size 224x224x1 and for my purposes I need to reconstruct the images back together with their geo-location intact. In order to do so however, I need the output predictions to be of size 256x256x1.

Do you know if there is a way to restore the predictions to those dimensions without changing the segmentation mapping or resolution?

Thank you again for both this awesome repo and for your help!!
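
A hedged sketch of one approach: since the prediction is a label map, it can be upsampled from 224×224 back to 256×256 with nearest-neighbour index mapping, which preserves the class ids exactly (no interpolation invents new values):

    import numpy as np

    pred = np.zeros((224, 224), dtype=np.uint8)  # stand-in for a real prediction
    idx = np.arange(256) * 224 // 256            # nearest source row/col per target pixel
    pred_256 = pred[idx][:, idx]                 # (256, 256), same class ids as pred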

The order of the fc layers

Do you know how to load the fully connected layers the way you load the conv layers?
Does the following still hold for fully connected layers?

matconvnet: weights are [width, height, in_channels, out_channels]

tensorflow: weights are [height, width, in_channels, out_channels]

The output of pool5 for an image of shape [1, 224, 224, 3] is [1, 7, 7, 512], and the fc6 weights are [7, 7, 512, 4096], so how do we rearrange the MatConvNet weights to fit TensorFlow's format?
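
A hedged sketch of the axis swap, extending the transpose the repo applies to conv weights to fc6 treated as a 7×7 convolution; the array here is random stand-in data:

    import numpy as np

    # MatConvNet stores [width, height, in_channels, out_channels];
    # TensorFlow expects [height, width, in_channels, out_channels].
    fc6_matconvnet = np.random.randn(7, 7, 512, 4096).astype(np.float32)
    fc6_tensorflow = np.transpose(fc6_matconvnet, (1, 0, 2, 3))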

How can I train arbitrary sized images?

Hello, thanks for your code firstly.

I found this problem was previously discussed in issue #18, sadly... I tried changing the code to image_options = {'resize': False} and changing IMAGE_SIZE to None in the image and annotation placeholders, but it always throws "ValueError: setting an array element with a sequence." So it still can't be solved?

But I really want a predicted image whose size is the same as the original (un-resized) input image. How can I solve this problem?

Thanks in advance!
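
A hedged sketch of the placeholder change; note that "setting an array element with a sequence" usually comes from calling np.array() on differently-sized images, which cannot be stacked into one batch, so arbitrary sizes also mean batch size 1 with images fed one at a time:

    import tensorflow as tf

    # A fully convolutional net accepts variable spatial dimensions:
    image = tf.placeholder(tf.float32, shape=[None, None, None, 3],
                           name="input_image")
    annotation = tf.placeholder(tf.int32, shape=[None, None, None, 1],
                                name="annotation")
    # Feed one un-resized image per step: feed_dict={image: img[None, ...], ...}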

Inference using large amount of GPU memory

Thanks so much for your wonderful work here. I have been able to modify this code to train some really accurate segmentation models. I am now trying to get one of them running on a Jetson TX1, but I am having some issues. I have TensorFlow 1.0.1 installed and running correctly on the TX1, but when I try to run the --visualize setting to just do inference, I run out of memory. I went back to my regular desktop, which has a Titan X Pascal, and did some tests using nvidia-smi to see how much memory was being used. It appears that even during inference it uses over 10 GB of GPU memory on my system.

Here is the output from sudo watch nvidia-smi while doing inference only:
Before running: 745MiB / 12183MiB
While Running: 11630MiB / 12183MiB

Do you have any idea why that would be happening? Any ideas on how to reduce this to 1.5 GB or less for inference? I can see why it would need that much memory for training, but I am not sure why it would do so for inference.
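
By default TensorFlow reserves most of the GPU's memory at session creation regardless of what the graph needs, so nvidia-smi overstates actual usage; a hedged sketch of capping the allocation (the fraction is illustrative):

    import tensorflow as tf

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True                     # allocate on demand
    config.gpu_options.per_process_gpu_memory_fraction = 0.12  # hard cap (~1.5 GB of 12 GB)
    sess = tf.Session(config=config)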

How to visualize the prediction?

I have run the command python FCN.py --mode=visualize, but I got a result far different from the author's. Here is my result:

[attached image: visualization result]

Does anyone know what I should do, and how to apply the whole model to my own dataset and test set?

Thanks.

The output are always an black picture

Hello!
I just ran your code for 10 minutes, and the loss quickly converged to 3. But when I go to TensorBoard, the prediction is always a black picture. Why does this happen? Is there anything wrong with the parameters?
Thanks!

training termination

Dear all,
I have two questions about the training process:
1. What is the termination criterion for learning? It seems to run for a very long time, which may not be needed for some problems.
2. How can I stop training manually and restore the last saved net (structure and weights) to use for testing? In other words, if I stop training manually (I don't know how), can I use the saved net for testing? (See the sketch below.)
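
On question 2: training can simply be interrupted (e.g. Ctrl-C) once a checkpoint has been written; a hedged sketch of restoring the latest checkpoint for testing, assuming the graph is rebuilt first and checkpoints live in the logs directory:

    import tensorflow as tf

    # After rebuilding the same graph used for training:
    saver = tf.train.Saver()
    with tf.Session() as sess:
        ckpt = tf.train.get_checkpoint_state("logs/")
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        # ...run the test/visualize ops here...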
