The code fails with the following GPU out-of-memory error before training starts:
2022-02-18 13:55:36.786045: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 11 Chunks of size 86400000 totalling 906.37MiB
2022-02-18 13:55:36.786143: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 156301824 totalling 149.06MiB
2022-02-18 13:55:36.786242: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 345600000 totalling 988.77MiB
2022-02-18 13:55:36.786339: I tensorflow/core/common_runtime/bfc_allocator.cc:1078] Sum Total of in-use chunks: 2.04GiB
2022-02-18 13:55:36.786447: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] total_region_allocated_bytes_: 2258003456 memory_limit_: 2258003559 available bytes: 103 curr_region_allocation_bytes_: 4516007424
2022-02-18 13:55:36.786613: I tensorflow/core/common_runtime/bfc_allocator.cc:1086] Stats:
Limit: 2258003559
InUse: 2188879360
MaxInUse: 2188879616
NumAllocs: 181
MaxAllocSize: 345600000
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2022-02-18 13:55:36.799434: W tensorflow/core/common_runtime/bfc_allocator.cc:474] ******************__*****************************************************************************xxx
2022-02-18 13:55:36.799597: W tensorflow/core/framework/op_kernel.cc:1733] RESOURCE_EXHAUSTED: failed to allocate memory
ERROR - hyperopt - Failed after 0:00:38!
Traceback (most recent calls WITHOUT Sacred internals):
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1377, in _do_call
return fn(*args)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[optimize/Adam_2/update/_92]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent calls WITHOUT Sacred internals):
File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 660, in main
results = train_model(model, data_train, data_val, endpoints_total_val, lr_val, prior_val)
File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 254, in train_model
train_step_ae.run(feed_dict=f_dic)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 2755, in run
_run_using_default_session(self, feed_dict, self.graph, session)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 5804, in _run_using_default_session
session.run(operation, feed_dict)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 967, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1190, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1396, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:
Detected at node 'optimize/gradients_2/zeros_10' defined at (most recent call last):
File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
self.run_commandline()
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
return self.run(
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
run()
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
self.result = self.main_function(*args)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
result = wrapped(*args, **kwargs)
File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
self.optimize
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
setattr(self, attribute, function(self))
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
Node: 'optimize/gradients_2/zeros_10'
Detected at node 'optimize/gradients_2/zeros_10' defined at (most recent call last):
File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
self.run_commandline()
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
return self.run(
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
run()
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
self.result = self.main_function(*args)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
result = wrapped(*args, **kwargs)
File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
self.optimize
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
setattr(self, attribute, function(self))
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
Node: 'optimize/gradients_2/zeros_10'
2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[optimize/Adam_2/update/_92]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.
Original stack trace for 'optimize/gradients_2/zeros_10':
File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
self.run_commandline()
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
return self.run(
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
run()
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
self.result = self.main_function(*args)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
result = wrapped(*args, **kwargs)
File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
self.optimize
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
setattr(self, attribute, function(self))
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\training\optimizer.py", line 477, in minimize
grads_and_vars = self.compute_gradients(
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\training\optimizer.py", line 603, in compute_gradients
grads = gradients.gradients(
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 165, in gradients
return gradients_util._GradientsHelper(
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 671, in _GradientsHelper
out_grads[i] = control_flow_state.ZerosLike(op, i)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\control_flow_state.py", line 835, in ZerosLike
return _ZerosLikeV1(op, index)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\control_flow_state.py", line 801, in _ZerosLikeV1
return array_ops.zeros(zeros_shape, dtype=val.dtype)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\dispatch.py", line 1082, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2927, in wrapped
tensor = fun(*args, **kwargs)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2988, in zeros
output = fill(shape, constant(zero, dtype=dtype), name=name)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\dispatch.py", line 1082, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 238, in fill
result = gen_array_ops.fill(dims, value, name=name)
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3508, in fill
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 740, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 3776, in _create_op_internal
ret = Operation(
File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 2175, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)
0%| | 0/250 [00:31<?, ?it/s]
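
As a rough sanity check on the numbers in this log, the sketch below recomputes the size of the tensor that failed to allocate and compares it with the memory the allocator had left. It assumes that "type float" in the OOM message means 32-bit floats (4 bytes per element); the constants are copied from the `Stats:` block and the OOM message. The helper `oom_run_options` is a hypothetical name and only wraps the `report_tensor_allocations_upon_oom` option that the hint in the log refers to. It is untested against this project and imports TensorFlow lazily so the arithmetic can run without it.

```python
# Numbers copied from the allocator stats and the OOM message in the log.
LIMIT_BYTES = 2258003559      # "Limit"
IN_USE_BYTES = 2188879360     # "InUse"
FAILED_SHAPE = (43200, 2000)  # shape[43200,2000] in the OOM message
BYTES_PER_FLOAT32 = 4         # assumption: "type float" is 32-bit


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


def oom_run_options():
    """Build RunOptions that ask TensorFlow to list live tensors on OOM.

    Hypothetical helper. It would be passed to Session.run, for example
        sess.run(train_step_ae, feed_dict=f_dic, options=oom_run_options())
    where `sess` stands for the active session. Operation.run() itself
    does not take an `options` argument.
    """
    import tensorflow as tf  # imported lazily; not needed for the arithmetic
    return tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)


requested = tensor_bytes(FAILED_SHAPE)
headroom = LIMIT_BYTES - IN_USE_BYTES

print(f"requested: {requested} bytes ({requested / 2**20:.2f} MiB)")
print(f"headroom:  {headroom} bytes ({headroom / 2**20:.2f} MiB)")
print(f"fits: {requested <= headroom}")
```

Under that assumption the failed request is 345600000 bytes, which matches `MaxAllocSize` and the three 345600000-byte chunks already in use, while `Limit` minus `InUse` leaves only about 66 MiB. If the first dimension (43200) scales with the batch size, which the log does not confirm, a smaller batch would shrink this allocation proportionally.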