ry / tensorflow-resnet
ResNet model in TensorFlow
License: MIT License
The issue is caused by a variable named "scale1/scale1/moving_mean/biased", which is generated by the moving average in bn. This variable has no corresponding value in Caffe. I changed "vars_to_restore = tf.all_variables()" to "vars_to_restore = tf.trainable_variables()". It works, but the test results differ greatly between TensorFlow and Caffe, whether or not I add the function "_imagenet_preprocess". Even in Caffe, and with the TensorFlow meta and ckpt files provided by the author, when I input the picture "cat.jpg", the Top-5 output is as follows:
This is obviously the wrong result. What is wrong with it?
In addition, after switching to "tf.trainable_variables()", an error occurs when I restore from the .meta and .ckpt files produced by convert.py:
NotFoundError (see above for traceback): Key scale1/moving_mean not found in checkpoint
Does anyone have a successful conversion?
Hello, when I run convert.py, I get this error:
File "convert.py", line 169, in parse_tf_varnames
scale_num = int(m.group(1))
AttributeError: 'NoneType' object has no attribute 'group'
How can I fix it?
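The AttributeError above means the regular expression did not match the variable name, so `m` was `None` when `.group(1)` was called. A minimal sketch of guarding that case follows. The pattern and the `parse_scale_num` helper are illustrative assumptions, not the actual code in convert.py:

```python
import re

# Hypothetical pattern: variable names that start with "scale<N>/".
SCALE_RE = re.compile(r"^scale(\d+)/")


def parse_scale_num(var_name):
    """Return the scale number in var_name, or None if there is none."""
    m = SCALE_RE.match(var_name)
    if m is None:
        # No match: report it instead of calling .group() on None.
        return None
    return int(m.group(1))


names = ["scale2/block1/a/weights", "fc/weights"]
parsed = {name: parse_scale_num(name) for name in names}
print(parsed)
```

Printing the names that come back as `None` is one way to find out which variable the real pattern fails on.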
I get a different "tensor name not found" error every time I run it. For example:
NotFoundError (see above for traceback): Tensor name "scale5/block3/c/scale5/block3/c/moving_variance/biased" not found in checkpoint files ./data/tensorflow-resnet-pretrained-20160509/ResNet-L50.ckpt
NotFoundError (see above for traceback): Tensor name "scale4/block6/b/scale4/block6/b/moving_mean/local_step" not found in checkpoint files ./data/tensorflow-resnet-pretrained-20160509/ResNet-L50.ckpt
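The names in these errors end in "biased" and "local_step". These look like the helper variables that TensorFlow's zero-debiased moving-average update creates, which would explain why they are absent from the pretrained checkpoint. A minimal sketch of filtering them out by name before restoring follows. The suffix list and the `restorable_names` helper are assumptions for illustration:

```python
# Suffixes of helper variables assumed to be absent from the checkpoint.
HELPER_SUFFIXES = ("/biased", "/local_step")


def restorable_names(all_names):
    """Keep only names that do not end in a helper suffix."""
    return [n for n in all_names if not n.endswith(HELPER_SUFFIXES)]


all_names = [
    "scale1/weights",
    "scale1/moving_mean",
    "scale1/scale1/moving_mean/biased",
    "scale4/block6/b/scale4/block6/b/moving_mean/local_step",
]
print(restorable_names(all_names))
```

In TensorFlow 1.x, variables selected this way can be passed to `tf.train.Saver(var_list=...)` so that only those are restored.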
Line 283 of resnet.py: should it be
weight_decay=FC_WEIGHT_DECAY)
?
Also, it might be better for each variable_scope, for example:
with tf.variable_scope('scale1'):
to add a reuse flag at the end:
with tf.variable_scope('scale1', reuse=not is_training):
Hi,
I got the error below when I ran convert.py. Both Caffe and TensorFlow are using CPU mode. I am not sure why this error occurs. Any help is appreciated.
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:334] current context was not created by the StreamExecutor cuda_driver API: 00000263E3664030; a CUDA runtime call was likely performed without using a StreamExecutor context
First, when I use inference() in resnet.py, the assert fails. When I remove the assert and just use an assignment instead, another error occurs: the shapes of shortcut and x are not equal, so they cannot be added together.
I don't know what happened.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
Traceback (most recent call last):
File "train_cifar.py", line 311, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "train_cifar.py", line 307, in main
train(is_training, logits, images, labels)
File "/home/me/tensorflows/tensorflow-resnet/resnet_train.py", line 33, in train
loss_ = loss(logits, labels)
File "/home/me/tensorflows/tensorflow-resnet/resnet.py", line 148, in loss
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 545, in sparse_softmax_cross_entropy_with_logits
logits = ops.convert_to_tensor(logits)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 621, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 180, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 163, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 346, in make_tensor_proto
raise ValueError("None values not supported.")
ValueError: None values not supported.
Hello, I have a question about the function nets.resnet_v1.resnet_v1_101() in TensorFlow. Looking at the structure of this model, I find that the outputs of the last three layers are [1, 7, 7, 2048], [1, 7, 7, 1024], and [1, 14, 14, 512]. However, the outputs of the last three layers of a normal ResNet should be [1, 7, 7, 2048], [1, 14, 14, 1024], and [1, 28, 28, 512].
So my question is: if I use the small feature map [1, 7, 7, 1024] to build the class and box subnets, will it affect the model's results for objects of a certain size? The purpose of having feature maps of different sizes is to detect objects at different scales.
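The shapes quoted above can be checked with simple stride arithmetic. The sketch below assumes a 224x224 input and 'SAME'-style padding, so each output side is ceil(input / total stride). The channel and stride pairs are the conventional ResNet stage values and are used here as assumptions:

```python
import math


def spatial_size(input_size, total_stride):
    """Output side length for 'SAME'-style padding: ceil(input / stride)."""
    return int(math.ceil(input_size / float(total_stride)))


input_size = 224  # assumed input resolution
# (channels, cumulative stride) at which each stage output is read (assumed).
for channels, stride in [(512, 8), (1024, 16), (2048, 32)]:
    side = spatial_size(input_size, stride)
    print("channels=%d stride=%d -> %dx%d" % (channels, stride, side, side))
```

Under these assumptions a 7x7 map with 1024 channels corresponds to a total stride of 32 rather than 16, which suggests the tensor is being read after the stage's downsampling step has already been applied.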
Hi, I am trying to use this model on my own dataset, but the loss becomes NaN after a few steps (no more than 10). At the beginning the loss value seems right, but all of a sudden it goes to NaN in a single step, for example from 5.5 to NaN. I tried an extremely small learning rate, but the result was the same.
It seems that there is a division by zero somewhere. FYI, I am using tf-0.10-rc0. Any ideas?
Has anyone successfully loaded the pretrained model in code?
I have no idea how to do it.
I have tried:
saver1 = tf.train.import_meta_graph(pretrained_meta)
saver1.restore(sess, pretrained_ckpt)
but it tells me "At least two variables have the same name".
If you have loaded it successfully, please tell me your method. I would appreciate it.
I don't understand, because I think there must be a "return" in a function, and no code after the "return" will be executed.
Thank you.
Are there some instructions on how to make it work?
sudo python train_cifar.py
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
Traceback (most recent call last):
File "train_cifar.py", line 320, in
tf.app.run()
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(sys.argv[:1] + flags_passthrough))
File "train_cifar.py", line 316, in main
train(is_training, logits, images, labels)
File "/Users/abc/work/tensorflow-resnet/resnet_train.py", line 33, in train
loss = loss(logits, labels)
File "/Users/abc/work/tensorflow-resnet/resnet.py", line 150, in loss
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1684, in sparse_softmax_cross_entropy_with_logits
labels, logits)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1535, in _ensure_xent_args
raise ValueError("Both labels and logits must be provided.")
ValueError: Both labels and logits must be provided.
abcdeMacBook-Pro:tensorflow-resnet abc$
Hi,
I usually use TensorFlow and am new to Caffe, so please be patient.
I am trying to extract the ResNet weights, so I tried convert.py, but the caffemodel files seem to be missing. How do we get those?
Also, FYI, Resnet.inference is missing the preprocessing argument in its definition.
I look forward to a speedy resolution.
Hi! Does this project work correctly?
Thanks for the great work @ry .
I am trying to use your converted TF models. I just tried "forward.py" but got the following RuntimeError:
RuntimeError: NodeDef mentions attr 'data_format' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]>; NodeDef: import/conv1/conv = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/device:CPU:0"](import/preprocess/centered_bgr, import/conv1/kernel)
^CWe've got an error while stopping in post-mortem: <type 'exceptions.KeyboardInterrupt'>
I tried all 3 TF models but got the same error. I checked the downloaded files and they were downloaded fully and correctly. Can you please help?
Thanks much!
Hamid
I cannot download the pretrained weights via the torrent.
I think that there is a small mistake in the definition of the 'bottleneck' structure in the block function.
In your code, three 1x1 conv layers ('a', 'b', 'c') are used to make up the 'bottleneck' block.
I believe you forgot c['ksize'] = 3 when defining layer 'b'.
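To see how much the kernel size of layer 'b' matters, the following sketch counts convolution weights for a bottleneck with 1x1, 3x3, 1x1 kernels versus one with three 1x1 kernels. The channel widths (256, 64, 256) are example values chosen for illustration, and the helper names are hypothetical:

```python
def conv_weights(ksize, in_ch, out_ch):
    """Number of weights in a ksize x ksize convolution (no bias)."""
    return ksize * ksize * in_ch * out_ch


def bottleneck_weights(ksizes, in_ch=256, mid_ch=64, out_ch=256):
    """Total conv weights for layers 'a', 'b', 'c' with the given kernel sizes."""
    a, b, c = ksizes
    return (conv_weights(a, in_ch, mid_ch)
            + conv_weights(b, mid_ch, mid_ch)
            + conv_weights(c, mid_ch, out_ch))


print(bottleneck_weights((1, 3, 1)))  # 1x1, 3x3, 1x1
print(bottleneck_weights((1, 1, 1)))  # all 1x1
```

With these example widths the two variants differ by roughly a factor of two in weight count, so a mismatch like this would also show up as a shape error when loading weights converted from a model that uses a 3x3 middle layer.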
Correct me if I'm wrong: the model declared here is different from the author's original Caffe model.
The shortcut in this model goes through a non-identity convolution layer, but the original model just uses the value from the previous layer.
Hey, what command should I run to do classification or object recognition with the pre-trained model provided in the torrent file?
[jalal@scc-c01 MHRN]$ tree tensorflow-resnet-pretrained-20160509
tensorflow-resnet-pretrained-20160509
|-- ResNet-L101.ckpt
|-- ResNet-L101.meta
|-- ResNet-L152.ckpt
|-- ResNet-L152.meta
|-- ResNet-L50.ckpt
`-- ResNet-L50.meta
0 directories, 6 files
[jalal@scc-c01 MHRN]$ tree tensorflow-resnet
tensorflow-resnet
|-- LICENSE
|-- README.md
|-- __init__.py
|-- config.py
|-- convert.py
|-- data
| |-- ResNet-101-deploy.prototxt
| |-- ResNet-152-deploy.prototxt
| |-- ResNet-50-deploy.prototxt
| |-- ResNet_mean.binaryproto
| |-- cat.jpg
| `-- tensorflow-resnet-pretrained-20160509.tar.gz.torrent
|-- forward.py
|-- image_processing.py
|-- resnet.py
|-- resnet_train.py
|-- synset.py
|-- train_cifar.py
`-- train_imagenet.py
1 directory, 18 files
I got an error like this, which seems to be caused by a conflict between Caffe and TensorFlow:
current context was not created by the StreamExecutor cuda_driver API: 0x2cc2820; a CUDA runtime call was likely performed without using a StreamExecutor context
However, I haven't found a way to solve it.
I have been running inference with a small number of images and then training; the code only runs for one step and then breaks with the following error:
step 0, loss = 1.13 (14.0 examples/sec; 0.642 sec/batch)
Traceback (most recent call last):
File "", line 1, in
runfile('C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master/wth.py', wdir='C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master')
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master/wth.py", line 76, in
image_tensor = sess.run(error)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
run_metadata_ptr)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 984, in _run
self._graph, fetches, feed_dict_string, feed_handles=feed_handles)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 410, in init
self._fetch_mapper = _FetchMapper.for_fetch(fetches)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 227, in for_fetch
(fetch, type(fetch)))
TypeError: Fetch argument None has invalid type <class 'NoneType'>
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Run call was cancelled
I am trying to use Freeze_Graph.py in Tensorflow_serving on the pre-trained model listed on this page, which is in .ckpt and .meta format.
I have tried "Accuracy/predictions" as the output node name.
However, this is not correct. I expect the output node names were passed into whatever trained the model, but they aren't listed in the Readme.md.
When I load the trained models in the torrent files, do they work with RGB or BGR images? In 'convert.py', the image is converted from RGB to BGR (with 'preprocess()' function) after being loaded by skimage.io.imread(). However, this routine is not found in 'forward.py' (only loading routine is there).
When running train_imagenet.py, there is an error:
File "train_imagenet.py", line 96, in main
logits = inference(images,
NameError: global name 'inference' is not defined
So can anybody tell me where 'inference' is defined? Thank you.
Hello,
In resnet.py:283
you use FC_WEIGHT_STDDEV
even for weight decay, which looks wrong to me. Probably just a copy-paste bug.
Awesome work, looking forward to this. We get the graph saved as a metagraph, but we can't easily change much there after importing it. I was looking for a way to have the actual graph-building code and then restore the weights from the checkpoint file alone, without needing the metagraph file. The reason is that I want to restore all weights into the full graph initially, but then experiment with the graph by taking outputs at different layers of the ResNet, which is only possible if I have the actual graph in code and know the name of each layer. Do you think there's a way to get this code, ignoring other metadata like hyperparams? I think the information is there in convert.py and resnet.py, but it's not clear enough to manually extract the desired code.
When the program reaches "tf.train.start_queue_runners(sess=sess)", it fails with this error:
ERROR:tensorflow:Exception in QueueRunner: Cast string to int64 is not supported
[[Node: Cast_1 = CastDstT=DT_INT64, SrcT=DT_STRING, _device="/job:localhost/replica:0/task:0/cpu:0"]]
My label dtype is int64 and my image dtype is float32.
How can I fix this? @ry
Thanks.
I want to use the pretrained residual network weights and fine-tune them on my own dataset. There is an example in the TensorFlow examples, image_retraining/retrain.py, which loads a pretrained Inception model and retrains the last layer. There the model is loaded in .pb format. Can anyone share code to convert checkpoint files to .pb format?
I've been puzzled about how to load the .ckpt + .meta pre-trained model, and I really need a .tfmodel one. Could you please help me?
Thanks a lot!
Hi,
Can you share the txt file for ImageNet?
I cannot download ImageNet, so could you give me the txt file for it, or just a small part of it?
I want to run your train_imagenet.py, but it seems to need the txt file and the jpg files.
Hi,
I am trying to add a new fc layer on top of the avgpool resnet layer with a different number of outputs to suit my problem.
I do not want to only retrain the new fc but also the previous layers. So I need the gradients of the previous layers as well. Unfortunately this does not seem to work.
I have tried on a dummy net that I have created to save it (without the gradients -- so similar to the provided resnet meta and ckpt) and then load it and add a new fc layer and this worked without problems.
Here is a snapshot of my retraining code:
# Start the session:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=False))
# Gets data batches.
trainimages, trainlabels = dataAsTensors(is_training=True, batch_size=FLAGS.batch_size)
# In the default graph:
graph = tf.get_default_graph()
with graph.as_default():
    # Data saver loading the graph meta only.
    dataSaver = tf.train.import_meta_graph('ResNet-L50.meta')
    for op in graph.get_operations():
        print op.name
    # Get both the 'avg_pool' and the 'images' operations.
    images = graph.get_tensor_by_name("images:0")
    avgpool = graph.get_tensor_by_name('avg_pool:0')
    # Define a new fc layer on top of the avg_pool layer
    logits, _ = fc_num_outs(avgpool, FLAGS.num_classes, FLAGS.avgpool_size)
    # Define the loss on top of the new fc and a placeholder for the labels
    labelsVar = tf.placeholder(tf.int64, shape=(FLAGS.batch_size), name='labelsVar')
    loss_ = loss(logits, labelsVar)
    # Define the gradients and get the operation.
    global_step = tf.Variable(0, name='global_step', trainable=False)
    ops = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
    train_op = ops.minimize(loss_, global_step=global_step)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess, coord=coord)
with sess.as_default():
    # Initialize all variables.
    sess.run(tf.initialize_all_variables())
    # Restore the RESNET checkpoint after initialization.
    dataSaver.restore(sess, "ResNet-L50.ckpt")
    for i in range(0, FLAGS.max_steps):
        # Feed the batch images and the labels.
        npImages = trainimages.eval()
        npLabels = trainlabels.eval()
        # Run 1 step of the gradient optimization.
        sess.run(train_op, {images: npImages, labelsVar: npLabels})
        print "Done running grad step.. ", i
        if (i % 100 == 0):  # Save the checkpoint
            dataSaver.save(sess, 'resnet_retrained' + str(i) + '.ckpt')
coord.request_stop()
coord.join(threads)
sess.close()
I am not sure why I get this error for the ResNet model:
File "retrain.py", line 278, in main
retrain()
File "retrain.py", line 244, in retrain
train_op = ops.minimize(loss_, global_step=global_step)
File "tensorflow/python/training/optimizer.py", line 193, in minimize grad_loss=grad_loss)
File "tensorflow/python/training/optimizer.py", line 250, in compute_gradients colocate_gradients_with_ops=colocate_gradients_with_ops)
File "tensorflow/python/ops/gradients.py", line 467, in gradients out_grads[i] = control_flow_ops.ZerosLikeOutsideLoop(op, i)
File "tensorflow/python/ops/control_flow_ops.py", line 1047, in ZerosLikeOutsideLoop pred = op_ctxt.pred
AttributeError: 'NoneType' object has no attribute 'pred'
while for my own toy model the same code seems to work.
Thanks a lot.
Cheers,
Silvia
Hi,
Great job providing the code and the pre-trained resnet models.
However, I have a problem loading/restoring the torrent models.
with tf.Session() as sess:
    dataSaver = tf.train.import_meta_graph('ResNet-L50.meta')
    dataSaver.restore(sess, 'ResNet-L50')
I get the error:
tensorflow.python.framework.errors.DataLossError: Unable to open table file ResNet-L50.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Any suggestion/help is much appreciated.
Best,
Silvia
Hello,
the following error occurred when I execute the forward.py file
Traceback (most recent call last):
File "forward.py", line 11, in
new_saver.restore(sess, [checkpoint_fn(layers)])
File "/home/msf/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1428, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/msf/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/msf/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 944, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1,) for Tensor u'save/Const:0', which has shape '()'
Could you please help me?
Thanks much!
Omid
Hello,
I wanted to know if the ImageNet-Resnet example is complete as I would like to test it out. I saw in the Readme that it wasn't?
I also don't understand how https://github.com/ry/tensorflow-resnet/blob/master/train_imagenet.py#L94 works in loading all the images of the dataset. It seems to be returning a tensor of size of FLAGS.batch_size
Any update would be really appreciated.
Thank you,
Ankur
How come the Caffe model uses the mean-subtracted image, while the TensorFlow model uses the original image?
caffe_model = load_caffe(img_p, layers)
vs.
o = sess.run(i, {images: img[np.newaxis, :]})
I don't see _imagenet_preprocess() being used anywhere.
I think you may have unintentionally deleted the ImageNet preprocessing in this commit:
6b42dfa
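For reference, the kind of preprocessing being discussed (RGB to BGR, then mean subtraction) can be sketched on a single pixel in plain Python. The mean values below are placeholders, not the real ImageNet or ResNet_mean.binaryproto values, and `preprocess_pixel` is a hypothetical helper rather than the repository's `_imagenet_preprocess`:

```python
# Placeholder per-channel means in BGR order; NOT the real ImageNet values.
MEAN_BGR = (100.0, 110.0, 120.0)


def preprocess_pixel(rgb, mean_bgr=MEAN_BGR):
    """Convert one (R, G, B) pixel to mean-centered (B, G, R)."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(value - mean for value, mean in zip(bgr, mean_bgr))


print(preprocess_pixel((130.0, 120.0, 110.0)))
```

If one pipeline applies this step and the other does not, the two models see different inputs, so their outputs cannot be expected to match.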
The original inference_small function in resnet.py does not return a value! Add the following code at the end of it:
logits = inference_small_config(x, c)
return logits
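A Python function that ends without a `return` statement returns `None`, which would be consistent with the "None values not supported" and "Fetch argument None" errors reported in other issues here. A minimal sketch of the difference, using stand-in functions rather than the real network code:

```python
def build_logits(x):
    """Stand-in for the real network; just doubles each input value."""
    return [2 * v for v in x]


def inference_without_return(x):
    build_logits(x)  # result is computed but discarded


def inference_with_return(x):
    logits = build_logits(x)
    return logits


print(inference_without_return([1, 2]))  # None
print(inference_with_return([1, 2]))     # [2, 4]
```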
Running convert.py outputs the error:
Traceback (most recent call last):
File "convert.py", line 343, in <module>
tf.app.run()
File "/home/mifs/mttt2/.virtualenvs/tfr1.0/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "convert.py", line 339, in main
convert(g, img, img_p, layers)
File "convert.py", line 259, in convert
bottleneck=True)
TypeError: inference() got an unexpected keyword argument 'preprocess'
The current version of resnet does not seem to have a preprocess option. Is this still required?
if step > 1 and step % 100 == 0:
    _, top1_error_value = sess.run([val_op, top1_error], { is_training: False })
    print('Validation top1 error %.2f' % top1_error_value)
These are the last lines of resnet_train.
I am wondering whether this actually gives the user the so-called "test error", where the images come from the test set.
Can anyone answer my question?
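As a point of reference, top-1 error is just the fraction of examples whose highest-scoring class differs from the label. The sketch below computes it in plain Python; the function is a hypothetical illustration, not the `top1_error` op from resnet_train:

```python
def top1_error(logits_batch, labels):
    """Fraction of examples whose highest-scoring class differs from the label."""
    wrong = 0
    for logits, label in zip(logits_batch, labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        if predicted != label:
            wrong += 1
    return wrong / float(len(labels))


logits_batch = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 0]
print(top1_error(logits_batch, labels))
```

The metric itself does not depend on where the images come from. It is a test error only if the batch fed to it is drawn from a held-out set, so the answer depends on which data the input pipeline supplies.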
I have been trying to run your forward.py code, but I get the following error message:
W tensorflow/core/framework/op_kernel.cc:975] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ResNet-L50.ckpt
in
"forward.py", line 11, in <module> new_saver.restore(sess, checkpoint_fn(layers))
How can I solve this issue? I am using TensorFlow 0.10 with Python 2.7.
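"Failed to find any matching files" suggests the checkpoint path does not resolve from the current working directory. A minimal sketch of checking the expected files before restoring follows. The `checkpoint_fn` here is a hypothetical re-implementation (the real one in forward.py is not shown in this thread), and the directory name is taken from the paths quoted in other issues:

```python
import os


def checkpoint_fn(layers, data_dir="."):
    """Hypothetical helper: path of the checkpoint for a given depth."""
    return os.path.join(data_dir, "ResNet-L%d.ckpt" % layers)


def missing_files(layers, data_dir="."):
    """Return the expected .ckpt/.meta paths that do not exist on disk."""
    ckpt = checkpoint_fn(layers, data_dir)
    meta = os.path.splitext(ckpt)[0] + ".meta"
    return [p for p in (ckpt, meta) if not os.path.isfile(p)]


print(missing_files(50, data_dir="./data/tensorflow-resnet-pretrained-20160509"))
```

An empty list means both files were found. Otherwise the printed paths show where the script is actually looking.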
How exactly is ResNet fully convolutional?
In the original implementation, there is fc layer at the end...
It'd be cool to make it fully convolutional though :)