irolaina / fcrn-depthprediction

Deeper Depth Prediction with Fully Convolutional Residual Networks (FCRN)

License: BSD 2-Clause "Simplified" License

MATLAB 36.01% Python 63.99%

convolutional-residual-networks depth-maps depth-prediction matconvnet tensorflow

fcrn-depthprediction's Issues

TensorFlow script

Could you please make predict.py resize the input image the way the MATLAB code does?
Also, if I understand correctly, the network accepts images of at most 640x480?
Thanks.

Can this model be used in real time?

I want to use the model in real time.
However, the input image size is fixed at (228, 304).
Can you help me?
Thank you!

Can this model estimate real distance?

Can I estimate the real (metric) distance from the prediction? Can I get actual depth information in addition to the color visualization?
Thank you!

Difference between two pre-trained models

Hi all. There are two pre-trained models, one indoor and one outdoor. Is there much difference between them? I want to estimate depth for a set of images that contains both indoor and outdoor scenes. Which one should I use, and does it matter?

Question about the MatConvNet training code

Hello,
I don't know how to train a model with your method.
Could you share the original MatConvNet training code? I would like to know more details about your experiments, so if you are willing, please share the training source code. Thanks.

Some questions

Hello, I'm new to machine learning. I have read your article and learned a lot from it. I ran the code successfully and got the final results. However, I have some questions.
The core contribution of your paper is the proposed CNN architecture, right? In the experiments, the article evaluates the influence of architecture depth using the convolutional blocks of AlexNet, VGG-16 and ResNet-50. I tried replacing your NYU_ResNet-UpProj.mat with imagenet-resnet-50-dag.mat directly, but it produces errors. I later learned that the ResNet-50 architecture needs to be changed. Should the last fully connected layers be replaced with fully convolutional ones?
I look forward to your reply. Thanks a lot.

Error when running your TensorFlow code-predict.py

line 46, in setup .conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res3a_branch1')
ValueError: ('stride must be less than or equal to filter size', 'stride: [2x2] filter: [1x1]')

Max Depth and Depth Format

Hello.

I was just wondering what the maximum depth your model can reliably predict is. Also is your depth format linear or log scale?

Thanks

Padding side might be wrong

Hi,

I was trying to reimplement your awesome work in PyTorch this afternoon but ran into trouble with your fast up-projection implementation.

In your unpool_as_conv function, convB/C/D are computed after an asymmetric padding. However, after decomposing the unpooling and 5x5 convolution by hand into the interleaved sum of four smaller convolutions, I found that the paddings before convB/C/D should be [[0, 0], [0, 1], [1, 1], [0, 0]], [[0, 0], [1, 1], [0, 1], [0, 0]] and [[0, 0], [0, 1], [0, 1], [0, 0]] instead of [[0, 0], [1, 0], [1, 1], [0, 0]], [[0, 0], [1, 1], [1, 0], [0, 0]] and [[0, 0], [1, 0], [1, 0], [0, 0]]. Is it possible that the padding side in the code was set incorrectly?

Dropout during training

Hi, the value of dropout_rate used during training is not mentioned in your paper. Could you tell me which value you used?
Thank you.

Why is lr = 0.01 too large for me?

When I set the learning rate to 0.01, the network seems to learn nothing: the images in TensorBoard are all black. Besides, the initial loss sometimes becomes very large or NaN. I use the raw NYU images, about 95k in total. Can you give me some advice?
@iro-cp

MATLAB version

Hi, I found that the websave function does not exist in my MATLAB R2014a, so any .m file that calls websave fails with an error. How can I solve this problem?

questions about the number of input images

Hi,

thanks for your awesome work!

Could you please tell me how many images are used in the training stage?

I notice that many papers use only the labeled data (795 images) to train the model and 654 images to evaluate the metrics.

Is there a rule for the number of training images?

I am looking forward to your reply!

Training code

Can you share the training code so that we can train on our own datasets? We want to evaluate your method on more complex datasets.

Question about input image

Hello,

What kind of image can be used as a test image? I am trying to use your code with the latest release of TensorFlow. As a result, I got the following error:

Loading the model
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 1596, in
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/code/py/ddpwfcrn/predict.py", line 71, in
main('C:/code/py/ddpwfcrn/NYU_ResNet-UpProj.npy', 'C:/code/py/ddpwfcrn/img/test.png')
File "C:/code/py/ddpwfcrn/predict.py", line 66, in main
pred = predict(model_path, image_paths)
File "C:/code/py/ddpwfcrn/predict.py", line 47, in predict
pred = sess.run(net.get_output(), feed_dict={input_node: img})
File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 767, in run
run_metadata_ptr)
File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 944, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 228, 304, 4) for Tensor 'Placeholder:0', which has shape '(?, 228, 304, 3)'
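
The shape in the error, (1, 228, 304, 4), suggests that the PNG carries a fourth (alpha) channel, while the placeholder expects three. Below is a minimal sketch of dropping the extra channel before feeding the network, using only NumPy. The helper name `to_rgb_batch` is hypothetical and not part of the repository.

```python
import numpy as np

def to_rgb_batch(img):
    """Return a float32 array of shape (1, H, W, 3) from an H x W or H x W x C image."""
    img = np.asarray(img, dtype=np.float32)
    if img.ndim == 2:                  # grayscale: replicate into three channels
        img = np.stack([img] * 3, axis=-1)
    elif img.shape[-1] == 4:           # RGBA: discard the alpha channel
        img = img[..., :3]
    return np.expand_dims(img, axis=0)  # add the batch dimension

rgba = np.zeros((228, 304, 4), dtype=np.uint8)
print(to_rgb_batch(rgba).shape)
```

With Pillow, calling `Image.open(path).convert('RGB')` before resizing should have the same effect.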

My evaluation of your TF model cannot reach 81%

Hi @iro-cp,
I tested your model on the NYU labeled dataset (using the 654 test images and the TF model you provide):
http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat

Here is how I compute the errors (a1 corresponds to the delta in your paper):

    thresh = np.maximum((gt / pred), (pred / gt))
    a1 = (thresh < 1.25   ).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rmse = (gt - pred) ** 2
    rmse = np.sqrt(rmse.mean())

    rmse_log = (np.log(gt) - np.log(pred)) ** 2
    rmse_log = np.sqrt(rmse_log.mean())

    abs_rel = np.mean(np.abs(gt - pred) / gt)

    sq_rel = np.mean(((gt - pred)**2) / gt)

The result (77%) is not as good as the one in your paper.
Did you evaluate the TF model?
Maybe my evaluation script differs in some way. (The code that feeds images to the network is the same as in your released code.)

    img = Image.open(image_path)
    img_resize = img.resize([width,height], Image.ANTIALIAS)
    img_resize = np.array(img_resize).astype('float32')
    img_resize_expend = np.expand_dims(np.asarray(img_resize), axis = 0)
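
One thing worth checking in a script like this is whether pixels without valid ground truth are excluded before averaging, since the snippet above shows no mask. The following is a sketch of the same metrics with such a mask. The function name `depth_metrics` and the `min_depth` threshold are assumptions made for illustration; this is not the authors' evaluation protocol.

```python
import numpy as np

def depth_metrics(gt, pred, min_depth=1e-3):
    """Threshold accuracies, RMSE and relative error over valid pixels only."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    valid = (gt > min_depth) & (pred > min_depth)   # skip missing or non-positive depth
    gt, pred = gt[valid], pred[valid]
    ratio = np.maximum(gt / pred, pred / gt)
    return {
        "a1": float((ratio < 1.25).mean()),
        "a2": float((ratio < 1.25 ** 2).mean()),
        "a3": float((ratio < 1.25 ** 3).mean()),
        "rmse": float(np.sqrt(((gt - pred) ** 2).mean())),
        "abs_rel": float(np.mean(np.abs(gt - pred) / gt)),
    }

gt = np.array([[1.0, 2.0], [0.0, 4.0]])     # one pixel has no ground truth
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
print(depth_metrics(gt, pred))
```

Differences in resizing, cropping, or the resolution at which predictions are compared to the ground truth could also shift the numbers, so those are worth comparing as well.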

Your TensorFlow code may have bugs; it is very hard to train.

Hi,
I trained your network using the provided code and the hyper-parameters from the paper, but I find it very hard to train: even on the training set, the depth predictions are blurry.
I also tried fine-tuning your model (the .npy file), and the final result is also unsatisfactory. I have been confused for a long time.
I have asked several people, and all of them ran into the same problem.
Could you double-check your code, or release your training code, in either the TensorFlow or the MATLAB version?

LR during training

In the article you mention that the initial learning rate is 0.01 for all layers. I would like to know whether you used equal learning rates for all layers throughout training or only at the beginning. In other words, when you reduce the learning rate during training, do you reduce it by the same amount for every layer?

I can't reach 0.217

When training, how should I handle the ground truth (e.g. normalization)?

Hello, I am trying to reproduce your results on the NYU Depth dataset with TensorFlow, using your published test code directly to build the network. When training, I found that the network output is not between 0 and 1, so I don't know how to process the ground truth to compute the loss. Can you help me solve this problem?
Thank you.

How

How to process ground-truth depth when training?

Hi,
The paper says the original frames of size 640 × 480 pixels are down-sampled to 1/2 resolution and center-cropped to 304 × 228 pixels.
When I apply the same operation to the ground truth, the black border still seems to be present.
How should the ground-truth depth be processed?
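
A sketch of the down-sample-and-crop step described above, applied to a depth map. It uses plain subsampling so that invalid (zero) depths are not blended with valid ones, and it returns a validity mask so that any remaining border pixels can be ignored in the loss. The function name and the choice of subsampling are assumptions; the exact ground-truth pipeline used by the authors is not stated in this thread.

```python
import numpy as np

def downsample_and_crop(depth, out_h=228, out_w=304):
    """Halve the resolution by subsampling, then center-crop to out_h x out_w."""
    depth = np.asarray(depth, dtype=np.float32)
    half = depth[::2, ::2]                  # e.g. 480x640 -> 240x320, no interpolation
    top = (half.shape[0] - out_h) // 2      # rows trimmed above the crop
    left = (half.shape[1] - out_w) // 2     # columns trimmed left of the crop
    crop = half[top:top + out_h, left:left + out_w]
    valid = crop > 0                        # zero is treated as missing depth
    return crop, valid

depth = np.ones((480, 640), dtype=np.float32)
depth[:, :20] = 0.0                         # simulate a 20-pixel invalid border
crop, valid = downsample_and_crop(depth)
print(crop.shape, int((~valid).sum()))
```

In this toy input the border is wider than the cropped margin, so some invalid columns survive the crop; the mask is what keeps them out of the loss.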

Question about displaying the prediction

I used OpenCV to run the prediction in real time. However, the output is only black and white, without any color. What data type is pred[0,:,:,0]? How should I handle it with OpenCV?
Could you help me?
Thank you very much.

Your prediction errors are inconsistent with your paper. Why?

Hello, this is great work. But when I evaluate the predicted maps you provide, I cannot get the same errors
as in your paper. I tested with MATLAB.
Paper: rmse 0.573, rel 0.127
Your predicted maps: rmse 0.58912, rel 0.136271
Then I generated predicted maps with the pretrained model provided in this repository and wrote a Python script
to compute the errors. I still cannot reach the results in your paper.
Predicted maps from your ckpt model: rmse 0.587333, rel 0.141360
I am very confused by this. Could you explain it? Thanks!

Why does the colormap look wrong with OpenCV?

With matplotlib, it looks good.
However, it looks like this with OpenCV.

Can anyone tell me why? Thanks.

How to define the loss function

I use a simple L2 loss to train the model, but the loss does not converge.
I implemented it in TensorFlow.
Here is my code:

    with tf.name_scope('loss'):
        labels_mask = tf.to_float(input_labels > 0)
        output = net.get_output()
        loss = tf.reduce_sum(tf.square((output - input_labels) * labels_mask)) / tf.reduce_sum(labels_mask)
        tf.summary.image('output', output, max_outputs=3)
        tf.summary.scalar('loss', loss)

Is it correct? If there is something wrong, please let me know. Thanks.
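
For checking a formula like this on small arrays outside the graph, a NumPy mirror of the same masked L2 computation can help. This is only a sketch of the expression quoted above, with a hypothetical function name and a guard for the case where no pixel is valid; it says nothing about why training fails to converge.

```python
import numpy as np

def masked_l2(pred, labels):
    """Mean squared error over pixels whose label is greater than zero."""
    pred = np.asarray(pred, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    mask = (labels > 0).astype(np.float64)      # 1 where ground truth exists
    squared = np.square((pred - labels) * mask)
    n_valid = max(mask.sum(), 1.0)              # avoid dividing by zero
    return float(squared.sum() / n_valid)

pred = np.array([1.0, 2.0, 3.0])
labels = np.array([0.0, 2.0, 5.0])              # first pixel has no label
print(masked_l2(pred, labels))
```

Comparing this value with the graph's loss on the same batch is a quick way to confirm that the mask and the labels have the shapes and scale you expect.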

Reduce computational time

It takes more than a minute to compute a depth map. How can I reduce this time?

Release question

Dear sir, you said that you would provide the training code and even a Caffe or TensorFlow implementation. Since I am having some problems with my own implementation, I'd like to know whether the release will be delayed. Thanks.

Where can I find train/test split for NYUD v2 unlabeled data?

I downloaded the unlabeled dataset separately and can't find train/test split information anywhere.

Did you set a weight decay during training?

Hi Laina, did you set a weight decay for the convolution layers?

Hey. On which machine did you train the model? I am trying to run your pretrained model on a Titan X GPU machine with 16 GB of RAM, but my machine reboots, which seems to be due to excessive load.

Training images

I am trying to implement this paper in Caffe.
I am using the NYU v2 raw depth dataset, with about 12K-13K sampled depth images.
For training, did you use filtered depth images (cross-bilateral filter or colorization) or the raw depth images?
Colorization is a good filtering method, but it is too slow.

From depth map to grayscale image

Can someone explain to me in a few words how to convert the depth image to a black and white depth map, where white are the closest points and black are the farthest. (I don't quite grasp the colorized image. Is it linear or logarithmic, what is the max value etc.)
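
A minimal sketch of such a conversion, assuming the predicted depth is a linear quantity (larger values are farther away) stored as a 2-D float array. The function name and the `max_depth` parameter are hypothetical; by default the map is scaled by its own maximum, so the gray levels are relative to each image rather than to a fixed range.

```python
import numpy as np

def depth_to_gray(depth, max_depth=None):
    """Map depth linearly to 8-bit gray: 255 (white) is nearest, 0 (black) is farthest."""
    depth = np.asarray(depth, dtype=np.float32)
    if max_depth is None:
        max_depth = float(depth.max())          # per-image scaling
    if max_depth <= 0:
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = np.clip(depth / max_depth, 0.0, 1.0)
    return ((1.0 - scaled) * 255.0).astype(np.uint8)

depth = np.array([[0.0, 5.0], [10.0, 10.0]], dtype=np.float32)
print(depth_to_gray(depth))
```

Passing a fixed `max_depth` keeps the gray scale comparable across images. Note that a depth of exactly zero maps to white here, so pixels marking missing data may need separate handling.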

Question about testing an image

Hello, I have a small problem when testing an image with predict.py.
When I run the command "python predict.py NYU_ResNet-UpProj.npy (my own image)", I get an error like this:

How can I fix this problem?
Thank you!

Training details

Hello!

I am trying to reproduce your results on the NYU Depth dataset with PyTorch. I am fairly confident that my network structure, loss function, and data augmentation process are correct, but I am unable to reach a depth image quality similar to your TensorFlow outputs (see the attached images).

My guess is that the difference lies in the training process. I tried to follow your article, but a few details are unclear. You wrote that you gradually reduce the learning rate when you observe plateaus. How do you define a plateau, and what does "gradually" mean in this case?

To get the results below, I used SGD optimizer with 0.01 init LR and 0.9 momentum, and I halve the learning rate after every 7th epoch.
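
For comparison with the fixed every-7-epochs schedule, here is a sketch of one common way to turn "reduce on plateau" into code: lower the rate when the validation loss has not improved for a number of epochs. The class name, `patience`, `factor`, and `min_delta` are all assumptions chosen for illustration; the paper's actual criterion is exactly what this question asks about and is not known here.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=0.01, factor=0.5, patience=3, min_delta=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:   # meaningful improvement
            self.best = val_loss
            self.bad_epochs = 0
        else:                                       # no improvement this epoch
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

sched = ReduceOnPlateau(lr=0.01, factor=0.5, patience=2)
for loss in [1.0, 0.9, 0.9, 0.9]:
    lr = sched.step(loss)
print(lr)
```

PyTorch ships a scheduler built on the same idea, `torch.optim.lr_scheduler.ReduceLROnPlateau`, which could replace the hand-written class.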

some test image:

the output using your Tensorflow network:

the output using my Pytorch network:

code for training

I cannot find the code that is used to train your model, will you be making it available soon please? I would like to experiment with different data sets. :)

sth wrong

evaluateNYU()
loading data to workspace... done!
predicting...
Error using vl_nnconv
An input is not a numeric array (or GPU support not compiled).

Error in dagnn.Conv/forward (line 11)
outputs{1} = vl_nnconv(...

Error in dagnn.Layer/forwardAdvanced (line 85)
outputs = obj.forward(inputs, {net.params(par).value}) ;

Error in dagnn.DagNN/eval (line 88)
obj.layers(l).block.forwardAdvanced(obj.layers(l)) ;

Error in DepthMapPrediction (line 60)
net.eval(inputs) ;

Error in evaluateNYU (line 47)
predictions = DepthMapPrediction(testSet, net, netOpts);

How to do data augmentation

Hi, I am trying to reproduce your results on the NYU v2 dataset, but the Huber loss does not converge during training. I guess I need data augmentation, but I am unsure how to do it. I currently have 12k image pairs. To get 95k pairs, should I apply rotation, scaling, color transformation and flipping eight times
to each RGB-D image pair? Also, when computing the loss, should the ground-truth depth be at 640x480?
Thank you.
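
A sketch of how two of these augmentations can be applied consistently to an RGB-D pair: the flip is applied to both image and depth, while the color change touches only the RGB image. The function name, the flip probability, and the color gain range are illustrative placeholders, not values confirmed by the authors. Rotation and scaling are omitted because they need an image-resampling routine.

```python
import numpy as np

def augment_pair(rgb, depth, rng):
    """Randomly flip an RGB-D pair horizontally and jitter the RGB colors."""
    rgb = np.asarray(rgb, dtype=np.float32)
    depth = np.asarray(depth, dtype=np.float32)
    if rng.random() < 0.5:                      # geometric change: apply to both
        rgb = rgb[:, ::-1, :]
        depth = depth[:, ::-1]
    gain = rng.uniform(0.8, 1.2, size=3)        # photometric change: RGB only
    rgb = np.clip(rgb * gain, 0.0, 255.0).astype(np.float32)
    return np.ascontiguousarray(rgb), np.ascontiguousarray(depth)

rng = np.random.default_rng(0)
rgb = np.full((4, 6, 3), 100.0)
depth = np.arange(24, dtype=np.float32).reshape(4, 6)
aug_rgb, aug_depth = augment_pair(rgb, depth, rng)
print(aug_rgb.shape, aug_depth.shape)
```

Drawing a fresh random transform each time a pair is loaded avoids having to store a fixed number of augmented copies on disk.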

Is this a bug? The MATLAB version fails in net.eval(inputs)

I am testing the MATLAB version with NYU_ResNet-UpProj.mat (downloaded from the project's GitHub page). After loading an RGB image, the prediction step fails and net.eval(inputs) returns this error:

Error in dagnn.Conv/forward (line 11)
output{1} = vl_nnconv(...
Error in dagnn.Layer/forwardAdvanced (line 85)
outputs = obj.forward(inputs, {net.params(par).value}) ;
Error in dagnn.DagNN/eval (line 88)
obj.layers(l).block.forwardAdvanced(obj.layers(l)) ;
Error in DepthMapPrediction (line 65)
net.eval(inputs);

What is the problem? Please help. Thanks a lot!

Unpooling indices stored as tf.Variables

In network.py:

  def prepare_indices(self, before, row, col, after, dims ):
    x_0 = tf.Variable(x0.reshape([-1]), name = 'x_0')
    x_1 = tf.Variable(x1.reshape([-1]), name = 'x_1')
    x_2 = tf.Variable(x2.reshape([-1]), name = 'x_2')
    x_3 = tf.Variable(x3.reshape([-1]), name = 'x_3')

This is the wrong use of tf.Variable (variable nodes are used as a source for value that the net expects to change, typically weights). In this case, this is just reshaping the indices from np.meshgrid, so these values aren't weights, or anything like that. There could be a specific reason these are made as tf.Variable that I'm unaware of, but it seems these lines should be:

  x_0 = x0.reshape([-1])

Why this matters:
tf.Variable nodes typically store "trainable" values, which must be stored in checkpoints and loaded from weight files. Since these are four 4D-flattened-to-1D arrays and there is a set of them for each up-conversion, this is a lot of data being stored to disk, which must also be loaded from disk (and saved, when creating checkpoints). These are basically indices, so no change (learning) is expected, and I believe this saving and loading is needless.

Case in point, these variables seem to be the prime contributor to the long load time of the weights file. In predict.py:

  net.load(model_data_path, sess)

takes several minutes on my computer in the current state. Changing prepare_indices() as indicated above reduces the load time by orders of magnitude. However, making this change MIGHT make the new model incompatible with the current weights file, NYU_ResNet-UpProj.npy. (I am having trouble making the net work with this change, so more investigation is needed on my end, but I figured I would raise this issue in case others are available to work on resolving it.)

Since this is a non-functional change, I propose the authors try the following:

Remove the tf.Variable nodes as shown above (making them simple operations)
Retrain using identical meta-parameters as in the paper (if the starting weight values are still available)
Compare results pre- and post- change to ensure they generate the same output?

If the starting weights aren't available, I suppose a full retraining would just need to generate acceptable results.

Is it possible to share NYU Depth result?

Hello, is it possible to share your NYU depth result files? So that we don't have to rerun your code for benchmark and reference. Thanks!

The prediction is not good

I restored the model you provide and fine-tuned all layers except the ResNet-50 part.
I use the berHu loss with AdamOptimizer. The learning rate is 0.0001, dropping to 0.000001 after 10000 steps. Training runs for 20000 steps in total.
Here is my loss curve:

Here are predictions from my model:

As you can see, the loss value is small, but the predictions are not very good; they are blurry. Can you give me some advice? Thanks!
@iro-cp

Dig into the depth of the beginning

About the test images used in your paper

Hi @iro-cp, I can't find the test images on the official NYU website.
Can you provide the original test images used in your paper?
I think inference can only be validated by using the same test images, and it is important to agree on a common standard.

Tensorflow model for Make3D

Are you planning to release the model soon?

Question about Reverse Huber Loss

Hi @iro-cp, I'm taking a look at the paper and got confused by Section 3.2 (Loss Function).

Where c

Questions

c will be a scalar equal to 20% (1/5) of the maximum error over the batch, correct?
When I need to select between the L1 and the quadratic loss, more specifically in the condition |x| < c: does |x| need to be summed over all its elements in order to compare it with the scalar c?

Example

Considering the vector |x| with a batch of 3 elements as ...

My worst error will be 6, so c will be 1/5*6
c = 1.2

So summing all elements

|x| will be 9, so in this case the quadratic part of the reverse Huber would be selected.

Or are we completely wrong, and we should calculate B(x) for each element of |x| in the batch, then sum all those losses and divide by the batch size?
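
As I read Section 3.2, the second interpretation is the intended one: B(x) = |x| when |x| <= c and (x^2 + c^2) / (2c) otherwise, applied to each residual separately, with c set to 1/5 of the largest absolute residual in the batch. Below is a NumPy sketch under that reading. The function names are hypothetical, the residuals 1, 2 and 6 are illustrative values chosen only to give a maximum of 6, and averaging rather than summing the per-element losses is a choice made here.

```python
import numpy as np

def berhu_elementwise(pred, target):
    """Reverse Huber (berHu) value for every residual, with c = 0.2 * max |residual|."""
    err = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64))
    if err.size == 0:
        return err
    c = 0.2 * err.max()                 # one threshold for the whole batch
    if c == 0.0:
        return err                      # all residuals are zero
    quadratic = (err ** 2 + c ** 2) / (2.0 * c)
    return np.where(err <= c, err, quadratic)   # L1 below c, quadratic above

def berhu_loss(pred, target):
    """Mean of the per-element berHu values."""
    values = berhu_elementwise(pred, target)
    return float(values.mean()) if values.size else 0.0

pred = np.array([1.0, 2.0, 6.0])
target = np.zeros(3)
print(berhu_elementwise(pred, target), berhu_loss(pred, target))
```

With these residuals c is 1.2, so the first element falls in the L1 branch and the other two in the quadratic branch; the comparison with c is made per element, never on the sum.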

Can't converge when training with TensorFlow

I am trying to train this model using your TensorFlow code, but it does not converge.
I am using 'nyu_depth_v2_labeled.mat'. I use the L2 loss for convenience.
The raw depths are used. Invalid pixels (where depth is zero) are excluded from training. I have tried both freezing the ResNet-50 weights and leaving them trainable.

The code is as below:

# NetWork
graph = tf.Graph()
with graph.as_default():
    # Create a placeholder for the input image
    tf_train_dataset = tf.placeholder(tf.float32, shape=(None, img_height, img_width, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(None, depth_height, depth_width))

    # Model.
    # net = models.ResNet50UpProj({'data': tf_train_dataset}, batch_size, trainable=True)
    net_ResNet50 = models.ResNet50({'data': tf_train_dataset}, batch_size, trainable=False)
    layer1_BN = net_ResNet50.get_output()
    net = models.UpProj({'layer1_BN': layer1_BN}, batch_size, trainable=True)

    # Training computation.
    output = tf.squeeze(net.get_output(), squeeze_dims=[3])
    loss = tf.reduce_mean(tf.nn.l2_loss((output-tf_train_labels)*(tf_train_labels!=0)))

    # Optimizer.
    global_step = tf.Variable(0, trainable=False)
    starter_learning_rate = 10**-3
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                               200, 0.8, staircase=True)
    # optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    momentum = 0.9
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss, global_step=global_step)

    # Add a scalar summary for the snapshot loss.
    tf.summary.scalar('loss', loss)

    # Build the summary Tensor based on the TF collection of Summaries.
    summary = tf.summary.merge_all()

# Train
with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  saver = tf.train.Saver()

  # Instantiate a SummaryWriter to output summaries and the Graph.
  summary_writer = tf.summary.FileWriter(log_dir, session.graph)

  # Load the converted parameters
  print('Loading the model')
  net.load('NYU_ResNet-UpProj.txt',session)
  print("Initialized")

  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]

    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run([optimizer, loss, output], feed_dict=feed_dict)

    if (step % 10 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f" % accuracy(predictions, batch_labels))
      # Update the events file.
      summary_str = session.run(summary, feed_dict=feed_dict)
      summary_writer.add_summary(summary_str, step)
      summary_writer.flush()

  # print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

  saver.save(session, model_path)

  print('Done!!!')

The loss curve is as follows:

Fine-tuning result is not smooth

Hi @iro-cp, I want to fine-tune your NYU checkpoint model on my own dataset.

This is my fine-tuning loss (berHu, as in your paper); the fine-tuning code is similar to tensorflow-deeplab-resnet.
Fine-tuned layers: layer16x_ and ConvPred
Learning rate: 0.001
Batch size: 8
Optimizer: AdamOptimizer

TensorBoard shows that the depth result is not smooth in planar regions (such as the wall in the red circle).

Can you provide some advice?
Thanks very much for sharing this excellent work!

Fine-tuning ResNet-50 didn't give sharp results

Hi,
Thanks for sharing your excellent work. I am trying to fine-tune ResNet-50 on the NYU v2 labeled data (1449 pairs) in Caffe (intending to overfit the dataset).

Augmentations follow the paper, but the depth is in log space. Upsampling layers are initialized with Xavier. Batch size: 1 (it uses 3 GB of GPU memory), lr: 1e-6 (the gradients explode with 1e-5).

The prediction roughly resembles the ground-truth depth but loses details. Sample result:

When I decrease the learning rate, the loss does not decrease much and the resulting depth map gets blurrier rather than sharper. Do you have any ideas about this?

Sincerely

Make3D Preprocessing

Hi there,

Is there any preprocessing that needs to be done for Make3D training samples, apart from considering only depths below 70 m?

Thank You.