
finetune_alexnet_with_tensorflow's Introduction

Hi there 👋

I'm Frederik Kratzert, Researcher @ Google working in the Flood Forecasting Team. This is my private GitHub account, where I maintain my open source projects and publish code related to research articles.

🤓 Research 🤔

Most of my research is dedicated to solving problems in environmental sciences (mainly hydrology) with machine learning.


finetune_alexnet_with_tensorflow's People

Contributors

kratzert, wogong


finetune_alexnet_with_tensorflow's Issues

ZeroDivisionError: float division by zero

Hi,
I tried reproducing the example keeping it as simple as possible. I have train.txt as:
images/cat1.png 0
images/cat2.png 0
images/cat3.png 0
images/dog1.png 1
images/dog2.png 1
images/dog3.png 1
And test.txt as:
images/cat4.png 0
images/dog4.png 1
When I run finetune.py I get this error:
Traceback (most recent call last):
File "finetune.py", line 163, in
test_acc /= test_count
ZeroDivisionError: float division by zero
I tried debugging the error and found that val_batches_per_epoch is 0, so the body of the validation loop never executes.
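
A likely cause (my assumption from reading the code path, not verified): finetune.py computes the number of validation batches with a floor division, so with only two validation images and a batch size larger than two, zero batches are produced and test_count never increments. A minimal sketch of the arithmetic, assuming the default batch_size of 128:

import numpy as np

batch_size = 128     # assumed default in finetune.py
val_set_size = 2     # the two images in test.txt

# floor division yields zero full batches, so the validation loop body never runs
val_batches_per_epoch = int(np.floor(val_set_size / batch_size))
print(val_batches_per_epoch)   # 0 -> test_count stays 0 -> ZeroDivisionError

Using a batch size no larger than the validation set (or more validation images) should avoid the division by zero.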

New DataGenerator gets worse accuracy than old DataGenerator?

Using the new ImageDataGenerator (June 15 update), I get lower accuracy than the old ImageDataGenerator (using cv2).

I have changed the new ImageDataGenerator very slightly (tf.image.decode_png => tf.image.decode_jpeg, VGG_MEAN => IMAGENET_MEAN), but still get significant difference in accuracy. To benchmark the two datagenerators, I am loading in the BVLC AlexNet weights and validating on the ImageNet validation set without shuffle.

With the old cv2 ImageDataGenerator, I get 55.72% Top-1 Accuracy and 79.08% Top-5 Accuracy. With the new TF ImageDataGenerator, I get 48.58% Top-1 Accuracy and 73.21% Top-5 Accuracy.

I suspect this has to do with how TF loads in images as opposed to CV2. Here is what the two different image processing steps look like:

Old CV2 ImageDataGenerator:

def next_batch(self, batch_size):
        """
        This function gets the next n ( = batch_size) images from the path list
        and labels and loads the images into them into memory 
        """
        # Get next batch of image (path) and labels
        paths = self.images[self.pointer:self.pointer + batch_size]
        labels = self.labels[self.pointer:self.pointer + batch_size]
        
        #update pointer
        self.pointer += batch_size
        
        # Read images
        images = np.ndarray([batch_size, self.scale_size[0], self.scale_size[1], 3])
        for i in range(len(paths)):
            img = cv2.imread(paths[i])
            
            #flip image at random if flag is selected
            if self.horizontal_flip and np.random.random() < 0.5:
                img = cv2.flip(img, 1)
            
            #rescale image
            img = cv2.resize(img, (self.scale_size[0], self.scale_size[1]))
            img = img.astype(np.float32)
            
            #subtract mean, which is np.array([104., 117., 124.])
            img -= self.mean
                                                                 
            images[i] = img

        # Expand labels to one hot encoding
        one_hot_labels = np.zeros((batch_size, self.n_classes))
        for i in range(len(labels)):
            one_hot_labels[i][labels[i]] = 1

        #return array of images and labels
        return images, one_hot_labels

New TF ImageDataGenerator:

def _parse_function_inference(self, filename, label):
        """Input parser for samples of the validation/test set."""
        # convert label number into one-hot-encoding
        one_hot = tf.one_hot(label, self.num_classes)

        # load and preprocess the image
        img_string = tf.read_file(filename)
        img_decoded = tf.image.decode_jpeg(img_string, channels=3)
        img_resized = tf.image.resize_images(img_decoded, [227, 227])
        
        IMAGENET_MEAN = tf.constant([104., 117., 124.], dtype=tf.float32)
        img_float = tf.to_float(img_resized)
        # RGB -> BGR
        img_bgr = img_float[:, :, ::-1]
        img_centered = tf.subtract(img_bgr, IMAGENET_MEAN)

        return img_centered, one_hot

I believe I am using the CV2 Datagenerator correctly as I can successfully finetune and call train_generator.next_batch(batch_size) within the training and validation loops with no issue. I built off of the finetune.py file you provided for the new TF Datagenerator (I use cpu device cpu:0 and then run the datagenerator init ops), and then get batches within the training/validation loops with img_batch, label_batch = sess.run(next_batch).
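
To localize whether the gap comes from decoding/resizing rather than batching, a minimal side-by-side comparison of the two pipelines on a single image could help (a sketch, assuming TF 1.x and a hypothetical JPEG path; the mean subtraction is left out because it is identical in both):

import cv2
import numpy as np
import tensorflow as tf

path = 'some_image.jpg'   # hypothetical sample image

# old pipeline: cv2 reads BGR, resizes, keeps BGR
img_cv = cv2.resize(cv2.imread(path), (227, 227)).astype(np.float32)

# new pipeline: TF reads RGB, resizes, then flips the channels to BGR
img_tf = tf.image.resize_images(
    tf.image.decode_jpeg(tf.read_file(path), channels=3), [227, 227])
img_tf = tf.to_float(img_tf)[:, :, ::-1]

with tf.Session() as sess:
    diff = np.abs(sess.run(img_tf) - img_cv)
    print(diff.mean(), diff.max())   # non-zero values point at decode/resize differences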

Any advice? Thanks a lot for providing and maintaining this project. Aside from this issue, it was super easy to work with and modify!

Can I see the fc7 output?

Hello, I've never worked with TensorFlow before and I cannot really understand how to see some of the outputs. I am trying to print the values that fc7 outputs for every image in the testing set. Is there any way I can do this?
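
A minimal sketch of how this could be done, assuming the fc7 tensor is reachable as an attribute on the model object (if it is only a local variable in alexnet.py, either expose it there as self.fc7 or look the tensor up by name with tf.get_default_graph().get_tensor_by_name):

import numpy as np
import tensorflow as tf
from alexnet import AlexNet   # from this repository

x = tf.placeholder(tf.float32, [None, 227, 227, 3])
keep_prob = tf.placeholder(tf.float32)
model = AlexNet(x, keep_prob, 1000, [])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    model.load_initial_weights(sess)
    batch = np.zeros((1, 227, 227, 3), dtype=np.float32)   # replace with real test images
    fc7_values = sess.run(model.fc7, feed_dict={x: batch, keep_prob: 1.0})   # assumes self.fc7 exists
    print(fc7_values.shape)   # (1, 4096)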

Where is the bvlc_alexnet.npy ?

Hello, I am trying to use your code on my own data, and I created the txt files. By the way, I want to know the format of the txt: in my txt file, the first column is the photo path and the second is the label. Is this correct? Besides this, I ran the program and it told me: No such file or directory: 'bvlc_alexnet.npy'.
I guess this is a weight file that was trained beforehand, but I can't find it. What should I do?
Many thanks!
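
For what it's worth, the README links to a pretrained weights file, to my knowledge hosted on Michael Guerzhoy's AlexNet-for-TensorFlow page; a sketch to fetch it, assuming that URL is still live and you are on Python 3:

import urllib.request

url = 'http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/bvlc_alexnet.npy'   # assumed location
urllib.request.urlretrieve(url, 'bvlc_alexnet.npy')

The file then has to sit in the directory expected by alexnet.py (by default the working directory). And yes, a txt file with the image path in the first column and an integer label in the second, separated by a space, matches the format used in the other issues here.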

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input shape axis 0 must equal 4, got shape [5]

Hi, thanks for sharing.
I am using your code to finetune AlexNet to classify images as blurred or clear. When I run finetune.py with my 'train.txt' and 'val.txt', after dozens of training batches I get this error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Input shape axis 0 must equal 4, got shape [5]
[[Node: unstack_1 = UnpackT=DT_INT32, axis=0, num=4]]
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,227,227,3], [?,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

I use Python 3.4 and TF 1.4.0.

Converting one-hot labels for multiple float numbers

How can I properly build one-hot labels when each image has multiple float numbers as labels? I have tried several methods but none work.

one_hot_labels = np.zeros((batch_size, self.n_classes))
for i in range(len(labels)):
    for j in range(len(labels[i])):
        one_hot_labels[i][labels[j]] = 1
return images, one_hot_labels

Since I have multiple floating point numbers per image, I have to use another loop to iterate over their contents.

The error which I get is:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

I found a useful library for this, but it has floating point conversion errors.

I also tried manually converting the labels list to a numpy array and re-evaluating, but the problem still exists.
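
One possible fix (a sketch, assuming each label is a list of class indices that happen to be stored as floats): index with labels[i][j] rather than labels[j], and cast to int, since float indices trigger exactly the IndexError quoted above.

import numpy as np

def to_multi_hot(labels, batch_size, n_classes):
    """Multi-hot encode a batch where each label is a list of class indices (possibly floats)."""
    one_hot_labels = np.zeros((batch_size, n_classes))
    for i in range(len(labels)):
        for j in range(len(labels[i])):
            one_hot_labels[i][int(labels[i][j])] = 1   # use labels[i][j] and cast to int
    return one_hot_labels

print(to_multi_hot([[0.0, 2.0], [1.0]], batch_size=2, n_classes=3))

If the floats are not class indices but arbitrary target values (a regression setting), one-hot encoding is not the right representation in the first place.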

AttributeError: 'list' object has no attribute 'copy'

I ran finetune.py on my own dataset and the following error appears. Why?

Traceback (most recent call last):
File "finetune.py", line 109, in
horizontal_flip = True, shuffle = True)
File "/mnt/data/wxb/finetune_alexnet_with_tensorflow/datagenerator.py", line 33, in init
self.shuffle_data()
File "/mnt/data/wxb/finetune_alexnet_with_tensorflow/datagenerator.py", line 55, in shuffle_data
images = self.images.copy()
AttributeError: 'list' object has no attribute 'copy'
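
For what it's worth, list.copy() only exists in Python 3, so this error usually means the script is running under Python 2; a minimal sketch of a version-agnostic alternative:

# list.copy() is Python-3-only; list() (or copy.copy) works in both 2 and 3
labels = [0, 1, 1]
labels_copy = list(labels)   # equivalent to labels.copy() in Python 3
print(labels_copy)

In datagenerator.py's shuffle_data, replacing self.images.copy() with list(self.images) would be the analogous change; running the script with Python 3 also avoids the error.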

validation accuracy at 50% instead of 60%?

I get 50% accuracy on the ILSVRC2012 validation set without training, which falls short of the 60% boasted in the AlexNet paper (i.e 40.7% top-1 error rate).

Any idea what could be the problem? I'm thinking this could be due to the parameters of the local_response_normalisation layer: yours seem different than the caffe AlexNet here. For instance you have alpha=1e-5 instead of 1e-4. I've been playing with those but have yet to find a configuration that gets me to 60%.
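
One hedged note on the LRN point (my understanding of the two conventions, not a verified diagnosis of the accuracy gap): Caffe's LRN divides alpha by the local size n, while tf.nn.local_response_normalization does not, so the two parameter sets may in fact be equivalent:

# Caffe bvlc_alexnet prototxt: local_size=5, alpha=1e-4, beta=0.75
caffe_alpha, caffe_local_size, caffe_beta = 1e-4, 5, 0.75

# TF does not divide alpha by the window size internally, so divide it here
tf_depth_radius = (caffe_local_size - 1) // 2   # 2
tf_alpha = caffe_alpha / caffe_local_size       # 2e-05, the value used in alexnet.py
tf_beta = caffe_beta                            # 0.75
print(tf_depth_radius, tf_alpha, tf_beta)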

Error finetuning

Hello,

Thanks for the share. I successfully installed and validated it with Imagenet pictures.
I also created the train.txt and val.txt files.
But when I'm trying to finetune AlexNet, I get the following error :

WARNING:tensorflow:From C:/Users/Fabien/PycharmProjects/finetune_alexnet_with_tensorflow/finetune.py:98: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2.

2018-04-01 20:11:34.920700: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-04-01 20:11:36.529273 Start training...
2018-04-01 20:11:36.529273 Open Tensorboard at --logdir /tmp/finetune_alexnet/tensorboard
2018-04-01 20:11:36.529273 Epoch number: 1
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1\helpers\pydev\pydevd.py", line 1664, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1\helpers\pydev\pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1\helpers\pydev\pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/Fabien/PycharmProjects/finetune_alexnet_with_tensorflow/finetune.py", line 175, in <module>
    keep_prob: dropout_rate})
  File "C:\Users\Fabien\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
    run_metadata_ptr)
  File "C:\Users\Fabien\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1109, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "C:\Users\Fabien\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: b'C:\\Users\\Fabien\\PycharmProjects\\finetune_alexnet_with_tensorflow\\flowers_train\\class13\\image_1026.jpg'

And I can't find out where this comes from or how to solve it.
Thanks for helping me :)

Codes for generating .txt files?

Dear kratzert,
Thanks for the excellent work. I am trying to fine-tune with TensorFlow on my own dataset. I only have images and no generated .txt files for now.
Is there any code in the downloaded .zip file for generating the .txt files?
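
Not part of the repository, but a small sketch of how such a list file could be generated, assuming a hypothetical folder layout of data/<class_name>/<image> with one subfolder per class:

import os

data_dir = 'data'   # hypothetical root folder
classes = sorted(os.listdir(data_dir))

with open('train.txt', 'w') as f:
    for label, cls in enumerate(classes):
        class_dir = os.path.join(data_dir, cls)
        for fname in sorted(os.listdir(class_dir)):
            f.write('{} {}\n'.format(os.path.join(class_dir, fname), label))

Splitting the resulting list into train.txt and val.txt is then a matter of shuffling and slicing it.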

Maybe a bug in finetune.py

hi
I found that my validation accuracy was a constant value during 10 epochs, or even 40 epochs. I am very confused.
In the validation part I found:
sess.run(validation_init_op)
.....
img_batch, label_batch = sess.run(next_batch)

but next_batch is only defined from
iterator = Iterator.from_structure(tr_data.data.output_types,
tr_data.data.output_shapes)
next_batch = iterator.get_next()

So, is it possible there is a bug here, and should something like this be added:
iterator_val = Iterator.from_structure(val_data.data.output_types,
val_data.data.output_shapes)
next_batch_val = iterator_val.get_next()
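
A minimal sketch of the reinitializable-iterator pattern (my reading of finetune.py, so treat it as an assumption): a single iterator serves both datasets, and running the matching init op switches which dataset feeds next_batch, so a second iterator should not be needed:

import tensorflow as tf

tr_data = tf.data.Dataset.from_tensor_slices(tf.range(6))
val_data = tf.data.Dataset.from_tensor_slices(tf.range(100, 103))

iterator = tf.data.Iterator.from_structure(tr_data.output_types, tr_data.output_shapes)
next_batch = iterator.get_next()
training_init_op = iterator.make_initializer(tr_data)
validation_init_op = iterator.make_initializer(val_data)

with tf.Session() as sess:
    sess.run(validation_init_op)
    print(sess.run(next_batch))   # 100 -> the same next_batch now yields validation data

If the accuracy really stays constant, the cause is probably somewhere other than the iterator wiring.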

Lower accuracy than Caffe

I fine-tuned using Caffe and got 61% accuracy, whereas using this code in TensorFlow I got 56% accuracy. In both cases I trained only the fc8 layer. Can you explain why this is?

'utf-8' codec can't encode character '\udcc0' in position 2120: surrogates not allowed

Hi

I get this error message when executing finetune.py (shown below).
Environment: Win 7, Anaconda 4.3.0, Python 3.6.1
Do you have any ideas about what happened?

Traceback (most recent call last):

File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2898, in run_code
self.showtraceback()

File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 1826, in showtraceback
self._showtraceback(etype, value, stb)

File "C:\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 554, in _showtraceback
dh.parent_header, ident=topic)

File "C:\Anaconda3\lib\site-packages\jupyter_client\session.py", line 712, in send
to_send = self.serialize(msg, ident)

File "C:\Anaconda3\lib\site-packages\jupyter_client\session.py", line 607, in serialize
content = self.pack(content)

File "C:\Anaconda3\lib\site-packages\jupyter_client\session.py", line 103, in
ensure_ascii=False, allow_nan=False,

File "C:\Anaconda3\lib\site-packages\zmq\utils\jsonapi.py", line 43, in dumps
s = s.encode('utf8')

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc0' in position 2120: surrogates not allowed

where is the softmax?

It seems that

tf.nn.softmax_cross_entropy_with_logits

needs the outputs to be a softmax, right?
I'm just curious about that, no offence.
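
As far as I understand the op (a sketch to illustrate, not the author's wording): tf.nn.softmax_cross_entropy_with_logits applies the softmax itself, so feeding it the raw fc8 scores (the logits) is the intended usage:

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])   # raw, unscaled scores
labels = tf.constant([[1.0, 0.0, 0.0]])

loss_builtin = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
loss_manual = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits)), axis=1)

with tf.Session() as sess:
    print(sess.run([loss_builtin, loss_manual]))   # the two values match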

Finetuning crashed due to an OSError exception

I get this exception during training, but it doesn't always happen (it looks like a random issue):

2018-04-18 15:02:15.902533 Start validation Traceback (most recent call last): File "finetune.py", line 188, in <module> print("{} Start validation".format(datetime.now())) OSError: raw write() returned invalid length 90 (should have been between 0 and 45)

I'm using Windows 10 64 bits with Python 3.5.2 :: Anaconda 4.2.0 (64-bit) and Tensorflow 1.2.1.
My dataset is Dogs VS Cats.

Besides, I'd like to recover the already trained model, since it's very upsetting to get this crash after some epochs. As you indicated, I tried to change the load_initial_weights line to this line:
saver.restore(sess, "/path/to/checkpoint/model.ckpt")

But I have three ckpt files (index, data and meta). What should I do?
Thanks!
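
A minimal sketch of the restore, assuming the graph has been rebuilt exactly as in finetune.py before creating the Saver: the three files belong to a single checkpoint, and saver.restore takes their common prefix (or the value returned by tf.train.latest_checkpoint), not one of the individual files:

import tensorflow as tf
from alexnet import AlexNet

# rebuild the same graph as during training (assumed: 2 classes for Dogs vs Cats, fc8 retrained)
x = tf.placeholder(tf.float32, [None, 227, 227, 3])
keep_prob = tf.placeholder(tf.float32)
model = AlexNet(x, keep_prob, 2, ['fc8'])

saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('/tmp/finetune_alexnet/checkpoints')   # assumed directory
    saver.restore(sess, ckpt)   # ckpt is the common prefix of the .index/.data/.meta files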

slice operation

RGB -> BGR

    img_bgr = img_centered[:, :, ::-1]

Hi, I am not familiar with Python. Could you explain a bit about the slice operation on the array above, taken from your code?
Thanks!
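
A tiny sketch of what the slice does: the step of -1 on the last axis reverses the channel order, turning an RGB array into BGR, while the height and width axes (the first two colons) are left untouched:

import numpy as np

rgb = np.arange(12).reshape(2, 2, 3)   # toy 2x2 image with channels R, G, B
bgr = rgb[:, :, ::-1]                  # same pixels, channels now B, G, R
print(rgb[0, 0], bgr[0, 0])            # [0 1 2] -> [2 1 0]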

Cross Entropy and Accuracy are not matched

Hi,

For a two-class problem, I got the following result.
(screenshot of the results omitted)

The cross-entropy is very small, but I cannot get high accuracy.

I traced the code that computes the loss value (following code),
"loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=score, labels=y))"
and found that the score need to be " Unscaled log probabilities" (https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits)

So it seems that one needs to apply a log to "score" before feeding it to the softmax function?

Please correct me if I'm wrong, and many thanks.

I cannot see anything in tensorboard

I have successfully trained alexnet for dogs vs cats dataset but when I run tensorboard I can only visualize this page:

(screenshot omitted)

The rest of the pages are inactive:

(second screenshot omitted)

I have TensorFlow 1.3.0 and Python 3.6.3, and I am using Windows 10. Do you have any idea what could cause the problem?

I got the bvlc_alexnet.npy and have a question

I get the shapes like this:
conv1
0 conv1 (11, 11, 3, 96)
0 conv1 (96,)
conv2
1 conv2 (5, 5, 48, 256)
1 conv2 (256,)
conv3
2 conv3 (3, 3, 256, 384)
2 conv3 (384,)
conv4
3 conv4 (3, 3, 192, 384)
3 conv4 (384,)
conv5
4 conv5 (3, 3, 192, 256)
4 conv5 (256,)

How do I convert the shape of conv2 (5,5,48,256) to (5,5,96,256), conv4 (3,3,192,384) to (3,3,384,384), and conv5 (3,3,192,256) to (3,3,384,256)?
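
My understanding of those shapes (a sketch, not the author's explanation): they do not need converting. The original AlexNet splits conv2, conv4 and conv5 across two GPUs, so with groups=2 each filter only sees half of the input channels; the conv() helper in alexnet.py splits both the input and the weights and concatenates the two outputs, roughly like this:

import tensorflow as tf

x = tf.random_normal([1, 27, 27, 96])         # activations entering conv2
weights = tf.random_normal([5, 5, 48, 256])   # shape stored in bvlc_alexnet.npy

input_groups = tf.split(x, num_or_size_splits=2, axis=3)        # two 48-channel halves
weight_groups = tf.split(weights, num_or_size_splits=2, axis=3) # two sets of 128 filters
output_groups = [tf.nn.conv2d(i, k, strides=[1, 1, 1, 1], padding='SAME')
                 for i, k in zip(input_groups, weight_groups)]
conv2 = tf.concat(output_groups, axis=3)      # shape (1, 27, 27, 256)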

tensorboard cannot show the accuracy and loss curves

I followed your steps to run the code but ended up with a different result: the log files are saved correctly, but the scalar and histogram data can't be shown in TensorBoard, although the graph is visible. Please help me solve this problem, thanks in advance!

error occurs when running jupyter file on my linux server

I want to run the file validate_alexnet_on_imagenet.ipynb to test the implementation of AlexNet. When I run it on my Windows PC, it works well. But when I run it on my Linux server, which is much more powerful than my PC, it gets stuck for a while and outputs a ResourceExhaustedError, which I believe occurred on this line:

sess.run(tf.global_variables_initializer())

Full output is shown below:

---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1322     try:
-> 1323       return fn(*args)
   1324     except errors.OpError as e:

/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1301                                    feed_dict, fetch_list, target_list,
-> 1302                                    status, run_metadata)
   1303 

/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

ResourceExhaustedError: OOM when allocating tensor with shape[4096]
	 [[Node: fc6/biases/Initializer/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, _class=["loc:@fc6/biases"], dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc6/biases/Initializer/random_uniform/shape)]]

During handling of the above exception, another exception occurred:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-4-eb8e619c57ae> in <module>()
      2 
      3     # Initialize all variables
----> 4     sess.run(tf.global_variables_initializer())
      5 
      6     # Load the pretrained weights into the model

/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    887     try:
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:
    891         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1118     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1119       results = self._do_run(handle, final_targets, final_fetches,
-> 1120                              feed_dict_tensor, options, run_metadata)
   1121     else:
   1122       results = []

/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1315     if handle is None:
   1316       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317                            options, run_metadata)
   1318     else:
   1319       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1334         except KeyError:
   1335           pass
-> 1336       raise type(e)(node_def, op, message)
   1337 
   1338   def _extend_graph(self):

ResourceExhaustedError: OOM when allocating tensor with shape[4096]
	 [[Node: fc6/biases/Initializer/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, _class=["loc:@fc6/biases"], dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc6/biases/Initializer/random_uniform/shape)]]

Caused by op 'fc6/biases/Initializer/random_uniform/RandomUniform', defined at:
  File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-5d95ec9c311f>", line 9, in <module>
    model = AlexNet(x, keep_prob, 1000, [])
  File "/home1/grayson/finetune_alexnet_with_tensorflow/alexnet.py", line 56, in __init__
    self.create()
  File "/home1/grayson/finetune_alexnet_with_tensorflow/alexnet.py", line 82, in create
    fc6 = fc(flattened, 6*6*256, 4096, name='fc6')
  File "/home1/grayson/finetune_alexnet_with_tensorflow/alexnet.py", line 177, in fc
    biases = tf.get_variable('biases', [num_out], trainable=True)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1203, in get_variable
    constraint=constraint)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1092, in get_variable
    constraint=constraint)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
    constraint=constraint)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 805, in _get_single_variable
    constraint=constraint)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 213, in __init__
    constraint=constraint)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 303, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 779, in <lambda>
    shape.as_list(), dtype=dtype, partition_info=partition_info)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/init_ops.py", line 445, in __call__
    shape, -limit, limit, dtype, seed=self.seed)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/random_ops.py", line 240, in random_uniform
    shape, dtype, seed=seed1, seed2=seed2)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_random_ops.py", line 473, in _random_uniform
    name=name)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home1/grayson/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096]
	 [[Node: fc6/biases/Initializer/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, _class=["loc:@fc6/biases"], dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc6/biases/Initializer/random_uniform/shape)]]

Does anyone have an idea about it?
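
A hedged guess rather than a diagnosis: an OOM while allocating a tensor as small as shape [4096] usually means the GPU's memory is already occupied, for example by other jobs on the shared server. A sketch of two things that sometimes help (picking an idle GPU and letting TF grow its allocation on demand):

import os
import tensorflow as tf

os.environ['CUDA_VISIBLE_DEVICES'] = '1'   # hypothetical: pin an idle GPU (check nvidia-smi first)

config = tf.ConfigProto()
config.gpu_options.allow_growth = True     # allocate GPU memory on demand instead of all at once

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())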

Test the output model

Hi,
I finished training AlexNet, which produced some .meta files in the checkpoints folder. Now I want to predict on a picture with the new model parameters. I followed the validate_alexnet_on_imagenet.ipynb file step by step and changed "model = AlexNet(x, keep_prob, 1000, [])" to "model = AlexNet(x, keep_prob, 2, [])", since in my case num_classes is 2. The error message is as follows:

Dimension 1 in both shapes must be equal, but are 2 and 1000. Shapes are [4096,2] and [4096,1000]. for 'fc8_1/Assign' (op: 'Assign') with input shapes: [4096,2], [4096,1000].

I am not sure whether I need to change any other code in validate_alexnet_on_imagenet.ipynb.

Thank you
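
One way this could be approached (a sketch under my assumptions, not a confirmed recipe): keep num_classes=2 but restore the finetuned checkpoint instead of calling load_initial_weights, since load_initial_weights with an empty skip list tries to assign the original 1000-class fc8 weights, which is where the [4096,2] vs [4096,1000] mismatch appears:

import tensorflow as tf
from alexnet import AlexNet

x = tf.placeholder(tf.float32, [None, 227, 227, 3])
keep_prob = tf.placeholder(tf.float32)
model = AlexNet(x, keep_prob, 2, [])   # 2 classes, no call to load_initial_weights

saver = tf.train.Saver()
with tf.Session() as sess:
    # checkpoint directory is an assumption; point it at your own checkpoints folder
    saver.restore(sess, tf.train.latest_checkpoint('/tmp/finetune_alexnet/checkpoints'))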

AttributeError: 'list' object has no attribute 'copy'

I run finetune.py and get the error AttributeError: 'list' object has no attribute 'copy'.

`File "/usr/local/lib/python2.7/dist-packages/spyder/utils/site/sitecustomize.py", line 880, in runfile
execfile(filename, namespace)

File "/usr/local/lib/python2.7/dist-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
builtins.execfile(filename, *where)

File "/home/lan/tensorflow/finetune_alexnet_with_tensorflow-5d751d62eb4d7149f4e3fd465febf8f07d4cea9d/finetune.py", line 109, in
horizontal_flip = True, shuffle = True)

File "datagenerator.py", line 31, in init
self.shuffle_data()

File "datagenerator.py", line 53, in shuffle_data
images = self.images.copy()

AttributeError: 'list' object has no attribute 'copy'`

ValueError: Variable conv1/W already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

I am trying to implement a siamese network for the first time. I don't have any experience with variable sharing and I don't know why I get this error. Any help will be appreciated.

from __future__ import division, print_function, absolute_import

import tensorflow as tf

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

# Data loading and preprocessing
import tflearn.datasets.mnist as mnist
X, Y, testX, testY = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28, 1])
testX = testX.reshape([-1, 28, 28, 1])

def tower_network(reuse=False):
    network = tflearn.input_data(shape=(None, 28, 28, 1))
    network = tflearn.conv_2d(network, 32, 1, activation='relu', reuse=reuse, scope='conv1')
    network = tflearn.conv_2d(network, 64, 1, activation='relu', reuse=reuse, scope='conv2')
    network = tflearn.conv_2d(network, 128, 1, activation='relu', reuse=reuse, scope='conv3')
    network = tflearn.max_pool_2d(network, 2, strides=2)
    network = tflearn.fully_connected(network, 512, activation='relu', reuse=reuse, scope='fc1')
    network = tflearn.dropout(network, 0.5)
    return network

def similarity_network(net1, net2):
    num_classes = 2
    # merge net1 and net2 networks
    network = tflearn.merge([net1, net2], mode='concat', axis=1, name='Merge')
    # fully connected layers
    network = tflearn.fully_connected(network, 2048, activation='relu')
    network = tflearn.dropout(network, 0.5)
    network = tflearn.fully_connected(network, 2048, activation='relu')
    network = tflearn.dropout(network, 0.5)
    # softmax layer
    network = tflearn.fully_connected(network, num_classes, activation='softmax')
    return network

net1 = tower_network()
net2 = tower_network(reuse=True)

# similarity network
network = similarity_network(net1, net2)

# output layer
# network = tflearn.regression(network, optimizer='sgd', loss='hinge_loss', learning_rate=0.02)
network = tflearn.regression(network, optimizer='sgd', loss='categorical_crossentropy', learning_rate=0.02)

# Training
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit({'input': X}, {'target': Y}, n_epoch=20,
          validation_set=({'input': testX}, {'target': testY}),
          snapshot_step=100, show_metric=True, run_id='convnet_mnist')

About preprocessing

Hello there.
In the datagenerator I found the VGG mean constant; can you tell me what it is?
I see that you use it to center the data, but isn't the variance also required to equal one? How do you handle that?
I really do not know how to do the preprocessing, maybe I am wrong.
Hoping for a response, thank you!

Change the code to make regression

Dear kratzert,
First, thanks for your amazing work !
I would like to use your code to solve a regression problem; do you have some time to help me?
Best.
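
Not the author's method, but a sketch of one way the training script could be adapted for regression (hypothetical choices: a single output unit and a mean-squared-error loss in place of the softmax cross-entropy):

import tensorflow as tf
from alexnet import AlexNet

x = tf.placeholder(tf.float32, [None, 227, 227, 3])
y = tf.placeholder(tf.float32, [None, 1])      # continuous target instead of one-hot labels
keep_prob = tf.placeholder(tf.float32)

model = AlexNet(x, keep_prob, 1, ['fc8'])      # one output unit, fc8 trained from scratch
score = model.fc8

loss = tf.reduce_mean(tf.squared_difference(score, y))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

The accuracy computation and the one-hot encoding in the data pipeline would have to be dropped or replaced accordingly.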

Question about "padding" in pool and conv function

Hi, this is not an issue but a question:

Looking at your code, I want to ask why you pass padding = 'VALID' when you call the pooling function, while in your definition of that function the default is padding = 'SAME'.

The same thing applies to conv.

Thanks for the huge amount of work.
I hope you can resolve my doubt.

alexnet architecture matches?

Hi, I did some calculations and checked the original paper:
the first convolution filter is [11,11,3,96] and there is no max pool after it,
the second convolution filter is [5,5,96,256], and then we get a max pool,
but in your code

        conv1 = conv(self.X, 11, 11, 96, 4, 4, padding='VALID', name='conv1')
        norm1 = lrn(conv1, 2, 2e-05, 0.75, name='norm1')
        pool1 = max_pool(norm1, 3, 3, 2, 2, padding='VALID', name='pool1')
        
        # 2nd Layer: Conv (w ReLu)  -> Lrn -> Pool with 2 groups
        conv2 = conv(pool1, 5, 5, 256, 1, 1, groups=2, name='conv2')
        norm2 = lrn(conv2, 2, 2e-05, 0.75, name='norm2')
        pool2 = max_pool(norm2, 3, 3, 2, 2, padding='VALID', name='pool2')

I don't get it; this is not the same as the figure in the paper. Can you explain?

Tensorboard

I cannot obtain the test accuracy from TensorBoard. Can you provide an example of embedding visualization, please?

Data augmentation problem

Hi,
I want to add data augmentation in datagenerator.py, following your comment in def _parse_function_train(self, filename, label):

        """
    Dataaugmentation comes here.
    """

Say I want to flip_up_down and flip_left_right, and crop from the 4 corners; in this way one input image could produce 6 output images.

So my question is: does the _parse_function_train() function still work with data.map(self._parse_function_train, ...)? If not, how do I add my data augmentation to the code?

thanks!


My point is that in the _parse_function_train() function, one input image could produce 6 output images. Is that OK?
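
As far as I know, a function passed to Dataset.map has to return exactly one element per input, so producing several augmented copies per image is usually done with flat_map instead, returning a small Dataset of the variants. A rough sketch (not the repository's code, and simplified: no one-hot conversion or mean subtraction):

import tensorflow as tf

def _augment(filename, label):
    img = tf.image.decode_png(tf.read_file(filename), channels=3)
    img = tf.image.resize_images(img, [227, 227])
    variants = tf.stack([img,
                         tf.image.flip_left_right(img),
                         tf.image.flip_up_down(img)])   # the four corner crops would be added here
    labels = tf.tile(tf.expand_dims(label, 0), [3])
    return tf.data.Dataset.from_tensor_slices((variants, labels))

# data = data.flat_map(_augment)   # instead of data.map(self._parse_function_train, ...)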

Image Rescale Error

I get the following error from datagenerator.py:
error: (-215) ssize.width > 0 && ssize.height > 0 in function resize

finetune_alexnet_with_tensorflow-master/datagenerator.py", line 150, in next_batch img = cv2.resize(img, (self.scale_size[0], self.scale_size[0]))

I debugged the code as follows:

img = cv2.imread(paths[i])
print(img)
if img is None:
    print(paths[i] + " : fail to read")
else:
    print("Image is read")
exit()

The error suggests that the image is not read, while in a separate program I verified that the image can be read:

import numpy as np
import cv2

img = cv2.imread('images/ld2.png')
cv2.imwrite('messigray.png',img) 

I followed this solution but it didn't work.
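
For what it's worth, cv2.imread returns None silently when a path cannot be opened, and cv2.resize then fails with exactly this (-215) assertion; a small sketch that checks every entry of the list file often pinpoints the offending line (paths with trailing whitespace, or paths relative to a different working directory, are common culprits):

import os

with open('train.txt') as f:
    for line_no, line in enumerate(f, 1):
        path = line.split(' ')[0].strip()
        if not os.path.isfile(path):
            print('line {}: missing file {!r}'.format(line_no, path))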

Is this suitable for large scale dataset?

Hi! Thanks for your wonderful program!
In the "_parse_function_train" function, do you load the whole training dataset?
I want to apply your code to a video dataset; should I load the whole training dataset in this function?
Thanks!

train_layers only works for fc7 and fc8

Thanks for your contribution!
When I reuse your code and try to train fc6, fc7 and fc8, the result is:
Cross entropy = nan
If I only train fc7 and fc8, it works well.
I have also tested earlier layers, like conv1, conv2, ..., and nan still appears.
Do you have any suggestions to help me fix this?

Error in notebook: "ValueError: Variable conv1/weights already exists, disallowed. Did you mean to set reuse=True in VarScope"

Hi, I am having some trouble running the third cell of the notebook. Error is:

ValueErrorTraceback (most recent call last)
<ipython-input-7-f7a1b7dd0c14> in <module>()
      7 
      8 #create model with default config ( == no skip_layer and 1000 units in the last layer)
----> 9 model = AlexNet(x, keep_prob, 1000, [])
     10 
     11 #define activation of last layer as score

/mnt/ilcompf6d0/user/txiao/DockerFiles/finetune_alexnet_with_tensorflow/alexnet.py in __init__(self, x, keep_prob, num_classes, skip_layer, weights_path)
     39 
     40     # Call the create function to build the computational graph of AlexNet
---> 41     self.create()
     42 
     43   def create(self):

/mnt/ilcompf6d0/user/txiao/DockerFiles/finetune_alexnet_with_tensorflow/alexnet.py in create(self)
     44 
     45     # 1st Layer: Conv (w ReLu) -> Pool -> Lrn
---> 46     conv1 = conv(self.X, 11, 11, 96, 4, 4, padding = 'VALID', name = 'conv1')
     47     pool1 = max_pool(conv1, 3, 3, 2, 2, padding = 'VALID', name = 'pool1')
     48     norm1 = lrn(pool1, 2, 2e-05, 0.75, name = 'norm1')

/mnt/ilcompf6d0/user/txiao/DockerFiles/finetune_alexnet_with_tensorflow/alexnet.py in conv(x, filter_height, filter_width, num_filters, stride_y, stride_x, name, padding, groups)
    131   with tf.variable_scope(name) as scope:
    132     # Create tf variables for the weights and biases of the conv layer
--> 133     weights = tf.get_variable('weights', shape = [filter_height, filter_width, input_channels/groups, num_filters])
    134     biases = tf.get_variable('biases', shape = [num_filters])
    135 

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter)
   1047       collections=collections, caching_device=caching_device,
   1048       partitioner=partitioner, validate_shape=validate_shape,
-> 1049       use_resource=use_resource, custom_getter=custom_getter)
   1050 get_variable_or_local_docstring = (
   1051     """%s

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter)
    946           collections=collections, caching_device=caching_device,
    947           partitioner=partitioner, validate_shape=validate_shape,
--> 948           use_resource=use_resource, custom_getter=custom_getter)
    949 
    950   def _get_partitioned_variable(self,

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter)
    354           reuse=reuse, trainable=trainable, collections=collections,
    355           caching_device=caching_device, partitioner=partitioner,
--> 356           validate_shape=validate_shape, use_resource=use_resource)
    357 
    358   def _get_partitioned_variable(

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource)
    339           trainable=trainable, collections=collections,
    340           caching_device=caching_device, validate_shape=validate_shape,
--> 341           use_resource=use_resource)
    342 
    343     if custom_getter is not None:

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource)
    651                          " Did you mean to set reuse=True in VarScope? "
    652                          "Originally defined at:\n\n%s" % (
--> 653                              name, "".join(traceback.format_list(tb))))
    654       found_var = self._vars[name]
    655       if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Variable conv1/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "alexnet.py", line 133, in conv
    weights = tf.get_variable('weights', shape = [filter_height, filter_width, input_channels/groups, num_filters])
  File "alexnet.py", line 46, in create
    conv1 = conv(self.X, 11, 11, 96, 4, 4, padding = 'VALID', name = 'conv1')
  File "alexnet.py", line 41, in __init__
    self.create()

This error was in both the May 22 commit (TF 1.0) and the latest commit, TF 1.12rc0. I am on TF 1.1.0.

Cheers!
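
For what it's worth, this usually happens when the model-building cell is executed a second time in the same kernel, so conv1/weights already exists in the default graph; resetting the graph before rebuilding is one way around it (a sketch, not a statement about your setup):

import tensorflow as tf
from alexnet import AlexNet

tf.reset_default_graph()   # discard variables created by a previous run of the cell

x = tf.placeholder(tf.float32, [1, 227, 227, 3])
keep_prob = tf.placeholder(tf.float32)
model = AlexNet(x, keep_prob, 1000, [])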

Multiple columns in training data

Can you suggest a code snippet for multiple columns in the training file? For example, train.txt contains one int column; what if I have more than one floating point column?

Thanks
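
A small sketch of how the list file could be read with several float columns (a hypothetical change to the text-file parsing in datagenerator.py, keeping the first column as the image path):

img_paths, labels = [], []
with open('train.txt') as f:
    for line in f:
        items = line.split()
        img_paths.append(items[0])
        labels.append([float(v) for v in items[1:]])   # every remaining column as a float

The one-hot conversion downstream would then have to be replaced by whatever encoding fits those float targets.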

Image converting RGB to BGR

Hello,

Can you tell me why you are converting RGB images to BGR in datagenerator.py? If I keep the images as RGB, what will happen?

img_bgr = img_centered[:, :, ::-1]

Principal Component Analysis

Hi, the AlexNet paper lists PCA as a data augmentation method; I'm wondering whether this is implemented in your model, as I can't seem to find it.

Classifying with Checkpoint

Hey, so I finished fine-tuning my model, but I haven't really used TensorFlow much before. I understand how to load the images I want to test on, but how do I use the checkpoint file to classify them?

Also, any thoughts on parameters for this dataset? I have 330 training images of Benign and Malignant, along with validation and testing sets of 40 images. I changed the learning rate to 0.005 and the batch size to 40; any other thoughts? I'm only training the last two layers, and I'm not sure whether this is the right move.
Thanks!!

How about using tf.estimator for training and prediction?

Hello, Kratzert. Your blog and code are very nice; I am new to TensorFlow and I learned a lot from them.
Did you try the tf.estimator class? I found that you defined modes for training and inference in datagenerator.py; I referred to the TensorFlow tutorials and found that tf.estimator can implement the same functionality. Why did you prefer not to use this class? Maybe it has some disadvantages for summary operations?
:)

Nan in summary histogram for: fc8/biases_0

I run finetune.py but get an error. Could someone please tell me what the problem is and how to modify the code to solve it? Thank you.

The error is the following:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: fc8/biases_0

about train/val set

Hello, I ran into trouble importing the train/val set.
I want to know how to do this, as I can't overcome the problem on my own.
I am working on a project with a deadline, so I want to know how to operate the net.

something about dropout when testing...

I'm studying deep learning, and as I understand it, when the dropout keep rate is less than 1 (say 0.5), it is traditional at test time to keep all the units and multiply their outputs by 0.5 in order to make a good inference. However, I'm not sure if it's my misunderstanding or just a bug, but it seems that this mechanism has not been implemented; I hope you can check it.
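
To my understanding (worth double-checking), tf.nn.dropout implements the "inverted" variant: it already scales the surviving activations by 1/keep_prob at training time, so at test time it is enough to feed keep_prob = 1.0 and no extra multiplication by 0.5 is required. A tiny sketch:

import tensorflow as tf

x = tf.ones([1, 4])
dropped = tf.nn.dropout(x, keep_prob=0.5)

with tf.Session() as sess:
    print(sess.run(dropped))   # kept units show up as 2.0 (scaled by 1/0.5), dropped ones as 0.0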

Confusion Matrix

Hello,

Thanks for the code. Actually, This is not an issue, It's just a question :)

I want to know how the classification model is performing, and I need a confusion matrix in order to understand that. TensorFlow has a function for it, tf.confusion_matrix(labels, predictions, num_classes=None), but I am confused about what the labels and predictions values should be.

Thanks.
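
A small sketch of how the arguments are usually filled in (my reading of the API, with variable names assumed from finetune.py): labels are the true class indices and predictions are the argmax of the network scores, both plain integer vectors of the same length:

import tensorflow as tf

y_true = tf.constant([0, 1, 1, 0])   # e.g. tf.argmax(y, 1) for the one-hot labels
y_pred = tf.constant([0, 1, 0, 0])   # e.g. tf.argmax(score, 1)
cm = tf.confusion_matrix(labels=y_true, predictions=y_pred, num_classes=2)

with tf.Session() as sess:
    print(sess.run(cm))   # rows: true class, columns: predicted class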
