I've tried getting this to work but got stuck at the training step. No matter how few samples are in the dataset, I run out of memory, even though this runs on a GTX 1070 with 8 GB of VRAM.
Was there a change that causes it to use more memory? Is there something to tweak?
I know this is pretty much provided 'as-is', but I'd be happy about any pointers.
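For what it's worth, my understanding is that per-step GPU memory is driven mostly by the batch size and the model's activation sizes, not by how many samples are in the dataset — which would explain why shrinking the dataset doesn't help. Here's a rough back-of-the-envelope sketch; the feature-map shapes are illustrative placeholders, not read from the actual model:

```python
# Rough estimate of per-step activation memory for a conv net.
# Key point: it scales with batch size, not dataset size.

def activation_bytes(batch_size, feature_maps, dtype_bytes=4):
    """Bytes needed to hold one forward pass of activations (float32)."""
    total_elems = sum(h * w * c for (h, w, c) in feature_maps)
    return batch_size * total_elems * dtype_bytes

# A few made-up intermediate feature maps (H, W, channels) --
# NOT the real Xception shapes, just plausible magnitudes.
maps = [(149, 149, 32), (74, 74, 128), (37, 37, 256), (19, 19, 728)]

mem_bs32 = activation_bytes(32, maps)
mem_bs8 = activation_bytes(8, maps)

# Quartering the batch size cuts activation memory by the same factor.
assert mem_bs32 == 4 * mem_bs8
print(mem_bs32 / 2**20, "MiB of activations at batch size 32")
```

So if the batch size passed to `fit` went up between versions (or the default changed), that alone could explain the regression; dropping it would be the first thing I'd try.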
2017-10-13 00:34:32.073550: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\bfc_allocator.cc:696] 1 Chunks of size 149082112 totalling 142.18MiB
2017-10-13 00:34:32.073739: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\bfc_allocator.cc:696] 5 Chunks of size 198066176 totalling 944.45MiB
2017-10-13 00:34:32.073941: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\bfc_allocator.cc:696] 1 Chunks of size 198066432 totalling 188.89MiB
2017-10-13 00:34:32.074106: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\bfc_allocator.cc:696] 1 Chunks of size 396130304 totalling 377.78MiB
2017-10-13 00:34:32.074279: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\bfc_allocator.cc:700] Sum Total of in-use chunks: 6.35GiB
2017-10-13 00:34:32.074451: I c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\bfc_allocator.cc:702] Stats:
Limit: 6814913823
InUse: 6814913024
MaxInUse: 6814913536
NumAllocs: 2329
MaxAllocSize: 1624768512
2017-10-13 00:34:32.074716: W c:\l\work\tensorflow-1.1.0\tensorflow\core\common_runtime\bfc_allocator.cc:277] **************************************xx***************************x********************************
2017-10-13 00:34:32.074799: W c:\l\work\tensorflow-1.1.0\tensorflow\core\framework\op_kernel.cc:1152] Resource exhausted: OOM when allocating tensor with shape[728]
[[Node: block12_sepconv1_bn/moments/sufficient_statistics/mean_ss = Sum[T=DT_FLOAT, Tidx=DT_INT32, keep_dims=false, _device="/job:localhost/replica:0/task:0/gpu:0"](block12_sepconv1_bn/moments/sufficient_statistics/Sub, block12_sepconv1_bn/moments/sufficient_statistics/mean_ss/reduction_indices)]]
[[Node: block14_sepconv2_bn/moments/sufficient_statistics/Gather/_187 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9496_block14_sepconv2_bn/moments/sufficient_statistics/Gather", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'block12_sepconv1_bn/moments/sufficient_statistics/mean_ss', defined at:
File "model_xception.py", line 226, in <module>
model = Xception(include_top=True,weights=None)
File "model_xception.py", line 166, in Xception
x = BatchNormalization(name=prefix + '_sepconv1_bn')(x)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\keras\engine\topology.py", line 596, in __call__
output = self.call(inputs, **kwargs)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\keras\layers\normalization.py", line 177, in call
epsilon=self.epsilon)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\keras\backend\tensorflow_backend.py", line 1650, in normalize_batch_in_training
shift=None, name=None, keep_dims=False)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\tensorflow\python\ops\nn_impl.py", line 642, in moments
y, axes, shift=shift, keep_dims=keep_dims, name=name)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\tensorflow\python\ops\nn_impl.py", line 564, in sufficient_statistics
m_ss = math_ops.reduce_sum(m_ss, axes, keep_dims=keep_dims, name="mean_ss")
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1236, in reduce_sum
name=name)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2656, in _sum
keep_dims=keep_dims, name=name)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
op_def=op_def)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\windo\AppData\Local\conda\conda\envs\santosnet\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
self._traceback = _extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[728]
[[Node: block12_sepconv1_bn/moments/sufficient_statistics/mean_ss = Sum[T=DT_FLOAT, Tidx=DT_INT32, keep_dims=false, _device="/job:localhost/replica:0/task:0/gpu:0"](block12_sepconv1_bn/moments/sufficient_statistics/Sub, block12_sepconv1_bn/moments/sufficient_statistics/mean_ss/reduction_indices)]]
[[Node: block14_sepconv2_bn/moments/sufficient_statistics/Gather/_187 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9496_block14_sepconv2_bn/moments/sufficient_statistics/Gather", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
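Reading the allocator stats in the log: the Limit of 6814913823 bytes works out to the same 6.35 GiB reported as "Sum Total of in-use chunks", and InUse is within a few hundred bytes of the Limit — so the pool is genuinely exhausted, not fragmented. A quick sanity check:

```python
# Sanity-check the bfc_allocator stats from the log above.
limit = 6814913823   # "Limit" in bytes
in_use = 6814913024  # "InUse" in bytes

gib = limit / 2**30
headroom = limit - in_use

print(f"Limit:    {gib:.2f} GiB")     # matches the 6.35 GiB in the log
print(f"Headroom: {headroom} bytes")  # only 799 bytes free
```

The limit sitting below the card's full 8 GB is, as far as I know, expected: the display/driver and TensorFlow's own reserved fraction take some VRAM before the allocator ever sees it.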