ry / tensorflow-resnet

ResNet model in TensorFlow

License: MIT License

Python 100.00%

tensorflow-resnet's Issues

New variables generated in moving_averages can't be assigned by caffe-model

The issue is caused by a variable named "scale1/scale1/moving_mean/biased", which is generated by the moving average in bn. This variable has no corresponding value in the Caffe model. I changed "vars_to_restore = tf.all_variables()" to "vars_to_restore = tf.trainable_variables()". It works, but the test results differ greatly between TensorFlow and Caffe, whether or not I add the function "_imagenet_preprocess". Even with the TensorFlow meta and ckpt files provided by the author, when I input the picture "cat.jpg", the Top-5 output is as follows:

This is obviously the wrong result. What is wrong?
In addition, after switching to "tf.trainable_variables()", an error occurs when I restore from the .meta and .ckpt files produced by convert.py:
NotFoundError (see above for traceback): Key scale1/moving_mean not found in checkpoint
Has anyone completed a successful conversion?

AttributeError: 'NoneType' object has no attribute 'group'

Hello, when I run convert.py, I get this error:

  File "convert.py", line 169, in parse_tf_varnames
    scale_num = int(m.group(1))
AttributeError: 'NoneType' object has no attribute 'group'

How can I fix it?
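
The traceback means the regular expression call just before line 169 matched nothing, so `m` is `None`. Below is a minimal sketch of a guarded version. The pattern `^scale(\d+)/` and the helper name `parse_scale_num` are assumptions for illustration, not the code in convert.py.

```python
import re

# Hypothetical pattern for variable names such as "scale2/block1/a/weights".
SCALE_RE = re.compile(r"^scale(\d+)/")


def parse_scale_num(var_name):
    """Return the scale number in var_name, or None if there is no scale prefix."""
    m = SCALE_RE.match(var_name)
    if m is None:
        # re.match returns None when nothing matches; calling m.group() on it
        # is what raises "'NoneType' object has no attribute 'group'".
        return None
    return int(m.group(1))


for name in ["scale2/block1/a/weights", "fc/weights"]:
    print(name, "->", parse_scale_num(name))
```

Printing the variable name that fails to match is usually the quickest way to see which naming scheme the parser did not anticipate.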

can't load the pre-trained model.

Every time I press run, I get a different "tensor name not found" error. For example:

NotFoundError (see above for traceback): Tensor name "scale5/block3/c/scale5/block3/c/moving_variance/biased" not found in checkpoint files ./data/tensorflow-resnet-pretrained-20160509/ResNet-L50.ckpt

NotFoundError (see above for traceback): Tensor name "scale4/block6/b/scale4/block6/b/moving_mean/local_step" not found in checkpoint files ./data/tensorflow-resnet-pretrained-20160509/ResNet-L50.ckpt
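
The missing names end in `/biased` and `/local_step`, which look like bookkeeping variables that a newer TensorFlow creates for moving averages and that the older checkpoint does not contain. One possible workaround, sketched here on plain name strings, is to restore only the variables that lack those suffixes. The helper name `restorable` is hypothetical; in TensorFlow the filtered variables would then be passed as `var_list` to `tf.train.Saver`.

```python
# Suffixes taken from the NotFoundError messages in this issue.
BOOKKEEPING_SUFFIXES = ("/biased", "/local_step")


def restorable(var_names):
    """Keep only names that are expected to exist in the older checkpoint."""
    return [n for n in var_names if not n.endswith(BOOKKEEPING_SUFFIXES)]


names = [
    "scale1/weights",
    "scale1/moving_mean",
    "scale1/scale1/moving_mean/biased",
    "scale4/block6/b/scale4/block6/b/moving_mean/local_step",
]
print(restorable(names))
```

The skipped variables still need to be initialized before the graph is run, since the checkpoint provides no values for them.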

bug in resnet.py

On line 283 of resnet.py, should it be
weight_decay=FC_WEIGHT_DECAY) ?

Also, it might be better to add a reuse flag to each variable_scope. For example, change
with tf.variable_scope('scale1'):
to
with tf.variable_scope('scale1', reuse = not is_training):

Current context was not created by the StreamExecutor cuda_driver

I get the error below when I run convert.py. Both Caffe and TensorFlow are using CPU mode. I am not sure why this error occurs. Any help is appreciated.

F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:334] current context was not created by the StreamExecutor cuda_driver API: 00000263E3664030; a CUDA runtime call was likely performed without using a StreamExecutor context

ERROR in inference()

First, when I use inference() in resnet.py, the assert fails. When I remove the assert and just assign the value instead, another error occurs: the shapes of shortcut and x are not equal, so they cannot be added together.
I don't know what happened.

ValueError: None values not supported. tf10.0

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
Traceback (most recent call last):
  File "train_cifar.py", line 311, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "train_cifar.py", line 307, in main
    train(is_training, logits, images, labels)
  File "/home/me/tensorflows/tensorflow-resnet/resnet_train.py", line 33, in train
    loss_ = loss(logits, labels)
  File "/home/me/tensorflows/tensorflow-resnet/resnet.py", line 148, in loss
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 545, in sparse_softmax_cross_entropy_with_logits
    logits = ops.convert_to_tensor(logits)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 621, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 180, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 163, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 346, in make_tensor_proto
    raise ValueError("None values not supported.")
ValueError: None values not supported.

About the slim/nets/resnet_v1.py

Hello, I have a question about the nets.resnet_v1.resnet_v1_101() function in TensorFlow. I inspected the structure of this model and found that the outputs of the last three layers are [1, 7, 7, 2048], [1, 7, 7, 1024], [1, 14, 14, 512]. However, the outputs of the last three layers of the normal ResNet should be [1, 7, 7, 2048], [1, 14, 14, 1024], [1, 28, 28, 512].
So my question is: if I use the small feature map [1, 7, 7, 1024] to build the class and box subnets, will it affect the model's results for objects of a certain size? The purpose of feature maps of different sizes is to detect objects at different scales.
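
The shapes reported here are consistent with the downsampling happening one block earlier than in the paper's layout. The sketch below assumes (this is a reading of the reported shapes, not code copied from slim) that slim applies each block's stride in the block's last unit, which amounts to per-block strides of [2, 2, 2, 1] instead of [1, 2, 2, 2] for a 224x224 input.

```python
def block_output_sizes(input_size, block_strides):
    """Spatial size of each block's output, given one stride per block."""
    sizes = []
    size = input_size
    for stride in block_strides:
        size = -(-size // stride)  # ceiling division, as with SAME padding
        sizes.append(size)
    return sizes


STEM_OUTPUT = 56  # 224 input after the stride-2 conv and the stride-2 max pool
CHANNELS = [256, 512, 1024, 2048]

paper_style = block_output_sizes(STEM_OUTPUT, [1, 2, 2, 2])
slim_style = block_output_sizes(STEM_OUTPUT, [2, 2, 2, 1])  # assumed layout

for name, sizes in [("paper", paper_style), ("slim", slim_style)]:
    print(name, [(s, s, c) for s, c in zip(sizes, CHANNELS)])
```

Under that assumption the final feature map is the same in both layouts; only the intermediate block outputs are smaller, which matters when those outputs feed a multi-scale detector.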

Does anyone successfully train this model?

Hi, I am trying to use this model on my own dataset, but the loss becomes NaN after a few steps (no more than 10). At the beginning the loss value seems right, but all of a sudden it goes to NaN in a single step, for example from 5.5 to NaN. I tried an extremely small learning rate, but the result was the same.

It seems that there is a division by zero. FYI, I am using tf-0.10-rc0. Any ideas?

Has anyone successfully loaded the pretrained model in code?

I have no idea how to do this.
I have tried:
saver1 = tf.train.import_meta_graph(pretrained_meta)
saver1.restore(sess, pretrained_ckpt)
but it tells me "At least two variables have the same name".
If you have loaded it successfully, please tell me your method. I would appreciate it.

Could you please tell me why there are two "return" statements on lines 25 and 26 of convert.py?

I don't understand, because I think a function should have only one "return" here: nothing after the first "return" will be executed.
Thank you.
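
A Python function may contain any number of `return` statements, but execution leaves the function at the first one it reaches. Two consecutive `return` lines therefore mean the second is dead code, possibly left over from debugging. The function below is a hypothetical illustration, not the code from convert.py.

```python
def load(path):
    return "first"   # execution leaves the function here
    return "second"  # unreachable: this line never runs


print(load("ignored"))
```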

This looks like an interesting start?

Are there some instructions on how to make it work?

inference_small function returns None

sudo python train_cifar.py

WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
Traceback (most recent call last):
  File "train_cifar.py", line 320, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "train_cifar.py", line 316, in main
    train(is_training, logits, images, labels)
  File "/Users/abc/work/tensorflow-resnet/resnet_train.py", line 33, in train
    loss = loss(logits, labels)
  File "/Users/abc/work/tensorflow-resnet/resnet.py", line 150, in loss
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1684, in sparse_softmax_cross_entropy_with_logits
    labels, logits)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1535, in _ensure_xent_args
    raise ValueError("Both labels and logits must be provided.")
ValueError: Both labels and logits must be provided.
abcdeMacBook-Pro:tensorflow-resnet abc$

caffemodel files are missing

Hi,

I usually use TensorFlow and am new to Caffe, so please be patient.

I am trying to extract the ResNet weights. I tried convert.py, but it seems the caffemodel files are missing. How do we get those?

Also, FYI, resnet.inference is missing the preprocess argument in its definition.

I look forward to a speedy resolution.

Does the project work?

Hi! Does this project work correctly?

Bug in ResNet forward.py

Thanks for the great work @ry .

I am trying to use your converted TF models. I just tried "forward.py" but got the following RunTime error:

RuntimeError: NodeDef mentions attr 'data_format' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]>; NodeDef: import/conv1/conv = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/device:CPU:0"](import/preprocess/centered_bgr, import/conv1/kernel)
^CWe've got an error while stopping in post-mortem: <type 'exceptions.KeyboardInterrupt'>

I tried all 3 TF models but got the same error. I checked the downloaded files and they were downloaded fully and correctly. Can you please help?

Thanks much!
Hamid

Cannot download the pretrained weights

I cannot download the pretrained weights via the torrent.

block function in resnet.py

I think that there is a small mistake when defining the 'bottleneck' structure in the block function .
In your code, three 1x1 conv layers ('a', 'b', 'c') are used to constitute the 'bottleneck' block.
I believe you forgot c['ksize'] = 3 when defining layer 'b'.

Inconsistencies with the original implementation

Correct me if I'm wrong: the model declared here is different from the original Caffe model from the author.

In this model the shortcut passes through a non-identity convolution layer, but the original model just uses the value from the previous layer.
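
For reference, the ResNet paper's common variant uses an identity shortcut when a block's input and output shapes match and a 1x1 projection only when the channel count or the spatial size changes. The sketch below encodes that rule; the function name `shortcut_kind` is hypothetical and it does not describe what this repository's resnet.py does.

```python
def shortcut_kind(in_channels, out_channels, stride):
    """Choose the shortcut type under the projection-on-mismatch rule."""
    if stride == 1 and in_channels == out_channels:
        return "identity"    # add the block input unchanged
    return "projection"      # 1x1 convolution using the block's stride


# Bottleneck blocks of a ResNet-50-style network: (in, out, stride).
blocks = [(64, 256, 1), (256, 256, 1), (256, 512, 2), (512, 512, 1)]
for in_ch, out_ch, stride in blocks:
    print(in_ch, out_ch, stride, shortcut_kind(in_ch, out_ch, stride))
```

Under this rule only the first block of each stage needs a convolution on the shortcut; the remaining blocks add the input directly.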

image classification using resnet pre-trained

hey, what command should I run to do a classification or object recognition on a pre-trained model given to us in the torrent file?

[jalal@scc-c01 MHRN]$ tree tensorflow-resnet-pretrained-20160509
tensorflow-resnet-pretrained-20160509
|-- ResNet-L101.ckpt
|-- ResNet-L101.meta
|-- ResNet-L152.ckpt
|-- ResNet-L152.meta
|-- ResNet-L50.ckpt
`-- ResNet-L50.meta

0 directories, 6 files
[jalal@scc-c01 MHRN]$ tree tensorflow-resnet
tensorflow-resnet
|-- LICENSE
|-- README.md
|-- __init__.py
|-- config.py
|-- convert.py
|-- data
|   |-- ResNet-101-deploy.prototxt
|   |-- ResNet-152-deploy.prototxt
|   |-- ResNet-50-deploy.prototxt
|   |-- ResNet_mean.binaryproto
|   |-- cat.jpg
|   `-- tensorflow-resnet-pretrained-20160509.tar.gz.torrent
|-- forward.py
|-- image_processing.py
|-- resnet.py
|-- resnet_train.py
|-- synset.py
|-- train_cifar.py
`-- train_imagenet.py

1 directory, 18 files

a CUDA runtime call was likely performed without using a StreamExecutor context

I got the following error, which seems to be caused by a conflict between Caffe and TensorFlow:

current context was not created by the StreamExecutor cuda_driver API: 0x2cc2820; a CUDA runtime call was likely performed without using a StreamExecutor context

However, I have not found a way to solve it.

Error in resnet_train.py

I have been running inference on a small number of images and then training. The code runs for only one step and then breaks with the following error:

step 0, loss = 1.13 (14.0 examples/sec; 0.642 sec/batch)
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master/wth.py', wdir='C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master')

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master/wth.py", line 76, in
image_tensor = sess.run(error)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
run_metadata_ptr)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 984, in _run
self._graph, fetches, feed_dict_string, feed_handles=feed_handles)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 410, in init
self._fetch_mapper = _FetchMapper.for_fetch(fetches)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 227, in for_fetch
(fetch, type(fetch)))

TypeError: Fetch argument None has invalid type <class 'NoneType'>

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Run call was cancelled

Output Node Names for the pre-trained data ?

I am trying to use Freeze_Graph.py in Tensorflow_serving on the pre-trained model listed on this page, which is in .ckpt and .meta format.

I have tried "Accuracy/predictions" as the output node name.

However, this is not correct. I expect the output node names were passed into whatever trained the model, but they are not listed in the Readme.md.

RGB or BGR

When I load the trained models in the torrent files, do they work with RGB or BGR images? In 'convert.py', the image is converted from RGB to BGR (with 'preprocess()' function) after being loaded by skimage.io.imread(). However, this routine is not found in 'forward.py' (only loading routine is there).
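
Caffe models conventionally expect BGR channel order with the dataset mean subtracted, which is what a `preprocess()` step like the one described would do. The sketch below shows the operation on a single pixel. The mean values are placeholders chosen for illustration (the real ones would come from the model's mean file, such as data/ResNet_mean.binaryproto), and the function name is hypothetical.

```python
# Placeholder per-channel means in BGR order (assumption for illustration only).
MEAN_BGR = (104.0, 117.0, 124.0)


def rgb_to_centered_bgr(pixel_rgb, mean_bgr=MEAN_BGR):
    """Reverse the channel order (RGB -> BGR) and subtract the per-channel mean."""
    r, g, b = pixel_rgb
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, mean_bgr))


print(rgb_to_centered_bgr((255.0, 128.0, 0.0)))
```

If the exported graph already contains a preprocessing node, applying this a second time outside the graph would swap the channels back and subtract the mean twice, so it is worth checking which side performs it.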

NameError: global name 'inference' is not defined

When running train_imagenet.py, there is an error:

File "train_imagenet.py", line 96, in main
logits = inference(images,
NameError: global name 'inference' is not defined

So is there anybody who can tell me where is 'inference' defined? Thank you.

typo in FC layer

Hello,

In resnet.py:283 you use FC_WEIGHT_STDDEV even for weight decay, which looks wrong to me. Probably just a copy-paste bug.

Saving Resnet graph code

Awesome work, looking forward to this. We get the graph saved as a metagraph but we can't really change much there easily after importing it. I was looking for a way to have the actual graph making code and then just restoring the weights by restoring the checkpoint file without the need for metagraph file. The reason is that I want to restore all weights to the full graph initially but then experiment with the graph by taking outputs at different layers of the resnet, which is only possible if I have the actual graph in code and I know the name of each layer. Do you think there's a way to get this code, ignoring other metadata like hyperparams. I think the info on this is there in convert.py and resnet.py but it's not clear enough to manually extract the desired code.

about queue_runners

When the program reaches "tf.train.start_queue_runners(sess=sess)", it fails with this error:

ERROR:tensorflow:Exception in QueueRunner: Cast string to int64 is not supported
[[Node: Cast_1 = CastDstT=DT_INT64, SrcT=DT_STRING, _device="/job:localhost/replica:0/task:0/cpu:0"]]
My label dtype is int64 and my image dtype is float32.
How can I fix this? @ry
Thanks.
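
The message suggests that the tensor being cast actually holds strings, for example labels read from a text file, even if the target dtype is int64. A minimal sketch of converting label text to integers before building tensors follows. The "path label" line format and the helper name are assumptions; inside a TensorFlow graph the equivalent conversion for string tensors is `tf.string_to_number` rather than `tf.cast`.

```python
def parse_label_line(line):
    """Split a 'path label' line and convert the label text to an int."""
    # rsplit from the right so that paths containing spaces stay intact.
    path, label_text = line.rsplit(None, 1)
    return path, int(label_text)


lines = ["images/cat.jpg 281", "images/dog 1.jpg 207"]
records = [parse_label_line(line) for line in lines]
print(records)
```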

Does it still run less accurately than the Caffe version?

how to convert checkpoint files to .pb format.

I want to use the pretrained residual network weights and fine-tune them on my own dataset. There is an example in the TensorFlow repository, examples/image_retraining/retrain.py, which loads a pretrained Inception model and relearns the last layer. In that example the model is loaded in .pb format. Can anyone share code to convert checkpoint files to .pb format?

Could you please upload a pre-trained model as .tfmodel file?

I've been getting puzzled about loading the .ckpt+.meta pre-trained model, and I really need a .tfmodel one. Could you please help me?
Thanks a loooooooooot!

how about imagenet txt

Hi,
Can you share the txt file for ImageNet?
I cannot download ImageNet, so could you give me the txt file, or even just a small part of it?
I want to run your train_imagenet.py, but it seems to need both the txt file and the jpg images.

Retrain resnet model on new data.

Hi,

I am trying to add a new fc layer on top of the avgpool resnet layer with a different number of outputs to suit my problem.
I do not want to only retrain the new fc but also the previous layers. So I need the gradients of the previous layers as well. Unfortunately this does not seem to work.
I have tried on a dummy net that I have created to save it (without the gradients -- so similar to the provided resnet meta and ckpt) and then load it and add a new fc layer and this worked without problems.

Here is a snapshot of my retraining code:

# Start the session:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=False))

# Gets data batches.
trainimages, trainlabels = dataAsTensors(is_training=True, batch_size=FLAGS.batch_size)

# In the default graph:
graph = tf.get_default_graph()
with graph.as_default():    

    # Data saver loading the graph meta only.
    dataSaver = tf.train.import_meta_graph('ResNet-L50.meta')

    for op in graph.get_operations():
        print op.name

    # Get both the 'avg_pool' and the 'images' operations.
    images = graph.get_tensor_by_name("images:0") 
    avgpool = graph.get_tensor_by_name('avg_pool:0')  

    # Define a new fc layer on top of the avg_pool layer 
    logits, _ = fc_num_outs(avgpool, FLAGS.num_classes, FLAGS.avgpool_size)    

    # Define the loss on top of the new fc and a placeholder for the labels 
    labelsVar = tf.placeholder(tf.int64, shape=(FLAGS.batch_size), name='labelsVar')
    loss_ = loss(logits, labelsVar)

    # Define the gradients and get the operation.
    global_step = tf.Variable(0, name='global_step', trainable=False)    
    ops = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
    train_op = ops.minimize(loss_, global_step=global_step)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord=coord)
    with sess.as_default():

        # Initialize all variables.
        sess.run(tf.initialize_all_variables())

        # Restore the RESNET checkpoint after initialization.
        dataSaver.restore(sess, "ResNet-L50.ckpt")

        for i in range(0, FLAGS.max_steps):
            # Feed the batch images and the labels.
            npImages = trainimages.eval()
            npLabels = trainlabels.eval()

            # Run 1 step of the gradient optimization.
            sess.run(train_op, {images: npImages, labelsVar: npLabels})
            print "Done running grad step.. ", i

            if (i % 100 == 0): # Save the checkpoint
                dataSaver.save(sess, 'resnet_retrained' + str(i) + '.ckpt')

    coord.request_stop()
    coord.join(threads)
    sess.close()

I am not sure why for the resnet model I get this error:

File "retrain.py", line 278, in main
retrain()
File "retrain.py", line 244, in retrain
train_op = ops.minimize(loss_, global_step=global_step)
File "tensorflow/python/training/optimizer.py", line 193, in minimize grad_loss=grad_loss)
File "tensorflow/python/training/optimizer.py", line 250, in compute_gradients colocate_gradients_with_ops=colocate_gradients_with_ops)
File "tensorflow/python/ops/gradients.py", line 467, in gradients out_grads[i] = control_flow_ops.ZerosLikeOutsideLoop(op, i)
File "tensorflow/python/ops/control_flow_ops.py", line 1047, in ZerosLikeOutsideLoop pred = op_ctxt.pred
AttributeError: 'NoneType' object has no attribute 'pred'

while for my own toy model the same code seems to work.

Thanks a lot.
Cheers,
Silvia

Error restoring the pre-trained resnet models.

Hi,

Great job providing the code and the pre-trained resnet models.
However, I have a problem loading/restoring the torrent models.

with tf.Session() as sess:
    dataSaver = tf.train.import_meta_graph('ResNet-L50.meta')
    dataSaver.restore(sess, 'ResNet-L50')

I get the error:

tensorflow.python.framework.errors.DataLossError: Unable to open table file ResNet-L50.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

Any suggestion/help is much appreciated.
Best,
Silvia

Bug in forward.py

Hello,
the following error occurred when I execute the forward.py file

Traceback (most recent call last):
File "forward.py", line 11, in
new_saver.restore(sess, [checkpoint_fn(layers)])
File "/home/msf/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1428, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/msf/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/msf/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 944, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1,) for Tensor u'save/Const:0', which has shape '()'

Could you please help me?

Thanks much!
Omid
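
The traceback shows the checkpoint path wrapped in a list, `[checkpoint_fn(layers)]`. A one-element list is fed as a tensor of shape (1,), while the saver's filename input is a scalar of shape (), which matches the ValueError. Passing the bare string is the likely fix. The body of `checkpoint_fn` below is an assumed stand-in, not the helper from forward.py.

```python
def checkpoint_fn(layers):
    # Assumed stand-in for the helper in forward.py.
    return "ResNet-L%d.ckpt" % layers


wrapped = [checkpoint_fn(50)]  # one-element list: fed as shape (1,)
bare = checkpoint_fn(50)       # plain string: fed as a scalar, shape ()

# The restore call would then read:
#     new_saver.restore(sess, checkpoint_fn(layers))
print(type(wrapped).__name__, type(bare).__name__, bare)
```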

Not able to find any peers for torrent

Is there any way you could provide these as a direct download link on S3 or something similar?

ImageNet example complete?

Hello,
I wanted to know if the ImageNet-Resnet example is complete as I would like to test it out. I saw in the Readme that it wasn't?

I also don't understand how https://github.com/ry/tensorflow-resnet/blob/master/train_imagenet.py#L94 works in loading all the images of the dataset. It seems to be returning a tensor of size of FLAGS.batch_size

Any update would be really appreciated.
Thank you,
Ankur

Question about mean subtract

How come the Caffe model uses the mean-subtracted image, while the TensorFlow model uses the original image?
caffe_model = load_caffe(img_p, layers)
vs.
o = sess.run(i, {images: img[np.newaxis, :]})

I don't see _imagenet_preprocess() being used anywhere.

I think you may have unintentionally deleted image net preprocessing in this commit
6b42dfa

modify inference_small function in resnet.py

The original inference_small function in resnet.py has no return value, so add the following code at its end:
logits=inference_small_config(x, c)
return logits
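
A Python function that ends without a `return` statement returns `None`, which would explain the "None values not supported" and "Both labels and logits must be provided" errors reported in other issues when the result is passed to the loss. The sketch below shows the difference with a stand-in for `inference_small_config`; the stand-in body and the `_broken`/`_fixed` names are hypothetical.

```python
def inference_small_config(x, c):
    # Stand-in for the real network builder; returns fake "logits".
    return [v * c["scale"] for v in x]


def inference_small_broken(x, c):
    inference_small_config(x, c)         # result is discarded, so None is returned


def inference_small_fixed(x, c):
    return inference_small_config(x, c)  # result is passed back to the caller


config = {"scale": 2}
print(inference_small_broken([1, 2], config))
print(inference_small_fixed([1, 2], config))
```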

'convert.py' does not work with current 'resnet.py'

Running convert.py outputs the error:

Traceback (most recent call last):
  File "convert.py", line 343, in <module>
    tf.app.run()
  File "/home/mifs/mttt2/.virtualenvs/tfr1.0/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "convert.py", line 339, in main
    convert(g, img, img_p, layers)
  File "convert.py", line 259, in convert
    bottleneck=True)
TypeError: inference() got an unexpected keyword argument 'preprocess'

The current version of resnet does not seem to have a preprocess option. Is this still required?

Validation error

if step > 1 and step % 100 == 0:
    _, top1_error_value = sess.run([val_op, top1_error], { is_training: False })
    print('Validation top1 error %.2f' % top1_error_value)

These are the last lines of resnet_train.
I am wondering whether this actually gives the user the so-called "test error", where the images come from the test set.

Can anyone answer my question?
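
Whether the printed number is a test error depends on which images are fed to the graph when `is_training` is False; the metric itself is only the fraction of examples whose top prediction is wrong. A minimal pure-Python sketch of top-1 error, using a hypothetical helper that works on plain lists rather than tensors:

```python
def top1_error(logits, labels):
    """Fraction of examples whose highest-scoring class differs from the label."""
    wrong = 0
    for scores, label in zip(logits, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted != label:
            wrong += 1
    return wrong / float(len(labels))


logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 0]
print(top1_error(logits, labels))
```

If the same input queue supplies both the training step and this evaluation, the number is an error on training batches evaluated in inference mode, not a held-out test error.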

cannot load .ckpt file

I have been trying to run your forward.py code, but I get the following error message:
W tensorflow/core/framework/op_kernel.cc:975] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ResNet-L50.ckpt in

"forward.py", line 11, in <module> new_saver.restore(sess, checkpoint_fn(layers))

How can I solve this issue? I am using TensorFlow 0.10 with Python 2.7.

Fully convolutional?

How exactly is ResNet fully convolutional?
In the original implementation, there is fc layer at the end...

It'd be cool to make it fully convolutional though :)
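
One way to see how the final layer could be made convolutional: a fully connected layer applied to pooled features computes the same arithmetic as a 1x1 convolution over a 1x1 feature map, and the 1x1 convolution also accepts larger maps. The pure-Python sketch below illustrates this equivalence with made-up weights; it is not code from this repository.

```python
def dense(features, weights, bias):
    """Fully connected layer: features [C], weights [C][K], bias [K]."""
    return [
        sum(f * weights[c][k] for c, f in enumerate(features)) + bias[k]
        for k in range(len(bias))
    ]


def conv1x1(feature_map, weights, bias):
    """1x1 convolution: [H][W][C] -> [H][W][K], same weights at every position."""
    return [[dense(pixel, weights, bias) for pixel in row] for row in feature_map]


weights = [[1.0, 0.0], [0.5, 2.0], [0.0, -1.0]]  # C=3 inputs, K=2 classes
bias = [0.1, 0.2]
pooled = [2.0, 4.0, 6.0]

fc_out = dense(pooled, weights, bias)
conv_out = conv1x1([[pooled]], weights, bias)    # 1x1 spatial map
print(fc_out, conv_out[0][0])
```

Applied to a larger feature map, `conv1x1` yields one class-score vector per spatial position, which is the behavior a fully convolutional variant would have.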