Comments (8)
Hi @oscarbg , this is something that we are actively looking into. As you noticed, tensorflow-directml's memory usage is very high at the moment, which is a problem when training with many batches. We will update this issue once we release a package that addresses these crashes.
from tensorflow-directml.
It fails similarly on Vega:
>> AI-Benchmark-v.0.1.2
>> Let the AI Games begin..
* TF Version: 1.15.3
* Platform: Windows-10-10.0.19564-SP0
* CPU: N/A
* CPU RAM: 32 GB
* GPU/0: N/A
* GPU RAM: N/A GB
* CUDA Version: 11.0
* CUDA Build: V11.0.167
The benchmark is running...
The tests might take up to 20 minutes
Please don't interrupt the script
1/19. MobileNet-V2
1.1 - inference | batch=50, size=224x224: 106 ± 33 ms
1.2 - training | batch=50, size=224x224: 10541 ± 176 ms
2/19. Inception-V3
2.1 - inference | batch=20, size=346x346: 1238 ± 29 ms
2020-06-28 23:35:55.556323: F tensorflow/core/common_runtime/dml/dml_command_recorder.cc:150] Check failed: (((HRESULT)((((HRESULT)0x8007000EL)))) >= 0) == true (0 vs. 1)
EDIT: On WSL2 it fails even earlier on Vega. After
1.1 - inference | batch=50, size=224x224: 136 ± 22 ms
it crashes and terminates the WSL2 process.
Thanks @PatriceVignola!
Good to know the devs are aware and working on it.
Hey @oscarbg , we just released tensorflow-directml 1.15.3.dev200911 with many improvements to the memory allocator. You can try it out and tell us how it goes!
Also, since we have now open-sourced our fork, new tensorflow-directml issues should be opened over here.
Hi @PatriceVignola,
Thanks for the update! The new build works very well, and memory usage is good now.
The only remaining issue seems to be closing the performance gap with CUDA.
On a Titan V with DirectML I get:
Device Inference Score: 6468
Device Training Score: 5271
Device AI Score: 11739
On CUDA I got:
Device Inference Score: 15245
Device Training Score: 15619
Device AI Score: 30864
so DirectML currently takes roughly a 2x-3x performance hit versus CUDA.
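For what it's worth, the "2x-3x" figure follows directly from the scores quoted above:

```python
# Ratio of the CUDA scores to the DirectML scores reported above.
dml = {"inference": 6468, "training": 5271, "ai": 11739}
cuda = {"inference": 15245, "training": 15619, "ai": 30864}

ratios = {k: cuda[k] / dml[k] for k in dml}
for k, r in ratios.items():
    print(f"{k}: {r:.2f}x")  # inference: 2.36x, training: 2.96x, ai: 2.63x
```

So the gap is about 2.4x for inference and about 3x for training on this hardware.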
Posting the full benchmark run on the Titan V with 460.15 drivers:
>>> from ai_benchmark import AIBenchmark
>>> results = AIBenchmark().run()
>> AI-Benchmark-v.0.1.2
>> Let the AI Games begin..
* TF Version: 1.15.3
* Platform: Windows-10-10.0.20180-SP0
* CPU: N/A
* CPU RAM: 32 GB
* GPU/0: N/A
* GPU RAM: N/A GB
* CUDA Version: N/A
* CUDA Build: N/A
The benchmark is running...
The tests might take up to 20 minutes
Please don't interrupt the script
1/19. MobileNet-V2
1.1 - inference | batch=50, size=224x224: 56.2 ± 7.3 ms
1.2 - training | batch=50, size=224x224: 1268 ± 10 ms
2/19. Inception-V3
2.1 - inference | batch=20, size=346x346: 87.4 ± 5.0 ms
2.2 - training | batch=20, size=346x346: 447 ± 7 ms
3/19. Inception-V4
3.1 - inference | batch=10, size=346x346: 89.6 ± 4.8 ms
3.2 - training | batch=10, size=346x346: 412 ± 6 ms
4/19. Inception-ResNet-V2
4.1 - inference | batch=10, size=346x346: 89.4 ± 1.8 ms
4.2 - training | batch=8, size=346x346: 370 ± 5 ms
5/19. ResNet-V2-50
5.1 - inference | batch=10, size=346x346: 68.5 ± 2.6 ms
5.2 - training | batch=10, size=346x346: 276 ± 5 ms
6/19. ResNet-V2-152
6.1 - inference | batch=10, size=256x256: 109 ± 4 ms
6.2 - training | batch=10, size=256x256: 403 ± 8 ms
7/19. VGG-16
7.1 - inference | batch=20, size=224x224: 112 ± 2 ms
7.2 - training | batch=2, size=224x224: 86.8 ± 1.9 ms
8/19. SRCNN 9-5-5
8.1 - inference | batch=10, size=512x512: 131 ± 3 ms
8.2 - inference | batch=1, size=1536x1536: 117 ± 4 ms
8.3 - training | batch=10, size=512x512: 719 ± 13 ms
9/19. VGG-19 Super-Res
9.1 - inference | batch=10, size=256x256: 151 ± 3 ms
9.2 - inference | batch=1, size=1024x1024: 242 ± 4 ms
9.3 - training | batch=10, size=224x224: 843 ± 9 ms
10/19. ResNet-SRGAN
10.1 - inference | batch=10, size=512x512: 176 ± 6 ms
10.2 - inference | batch=1, size=1536x1536: 159 ± 5 ms
10.3 - training | batch=5, size=512x512: 479 ± 8 ms
11/19. ResNet-DPED
11.1 - inference | batch=10, size=256x256: 203 ± 2 ms
11.2 - inference | batch=1, size=1024x1024: 329 ± 5 ms
11.3 - training | batch=15, size=128x128: 484 ± 5 ms
12/19. U-Net
12.1 - inference | batch=4, size=512x512: 493 ± 7 ms
12.2 - inference | batch=1, size=1024x1024: 550 ± 16 ms
12.3 - training | batch=4, size=256x256: 488 ± 12 ms
13/19. Nvidia-SPADE
13.1 - inference | batch=5, size=128x128: 233 ± 6 ms
13.2 - training | batch=1, size=128x128: 556 ± 6 ms
14/19. ICNet
14.1 - inference | batch=5, size=1024x1536: 349 ± 4 ms
14.2 - training | batch=10, size=1024x1536: 1506 ± 7 ms
15/19. PSPNet
15.1 - inference | batch=5, size=720x720: 1086 ± 10 ms
15.2 - training | batch=1, size=512x512: 398 ± 7 ms
16/19. DeepLab
16.1 - inference | batch=2, size=512x512: 672 ± 4 ms
16.2 - training | batch=1, size=384x384: 474 ± 4 ms
17/19. Pixel-RNN
17.1 - inference | batch=50, size=64x64: 989 ± 7 ms
17.2 - training | batch=10, size=64x64: 2643 ± 7 ms
18/19. LSTM-Sentiment
18.1 - inference | batch=100, size=1024x300: 681 ± 13 ms
18.2 - training | batch=10, size=1024x300: 1388 ± 10 ms
19/19. GNMT-Translation
19.1 - inference | batch=1, size=1x20: 335 ± 5 ms
Device Inference Score: 6468
Device Training Score: 5271
Device AI Score: 11739
For more information and results, please visit http://ai-benchmark.com/alpha
How do I run a single model using ai-benchmarks?
> How do I run a single model using ai-benchmarks?
I don't think it's possible without modifying the AI Benchmark scripts. You could (after pip-installing the package, for example) modify the loop in run_tests (ai_benchmark/utils.py) to skip the models you're not interested in.
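As a sketch of what that filtering could look like (the internals of run_tests vary between versions, so the helper and test list below are hypothetical; the model names are taken from the benchmark's console output):

```python
# Hypothetical sketch: keep only the benchmark tests you care about.
# Inside run_tests (ai_benchmark/utils.py) the tests are iterated in a
# loop; filtering the list before that loop lets you run a single model.

def select_tests(tests, wanted):
    """Return only the tests whose name contains `wanted` (case-insensitive)."""
    return [t for t in tests if wanted.lower() in t.lower()]

all_tests = ["MobileNet-V2", "Inception-V3", "Inception-V4", "U-Net", "LSTM-Sentiment"]
print(select_tests(all_tests, "u-net"))  # ['U-Net']
```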
My benchmark run fails after the 8th test...
>> AI-Benchmark-v.0.1.2
>> Let the AI Games begin..
* TF Version: 1.15.5
* Platform: Windows-10-10.0.22000-SP0
* CPU: N/A
* CPU RAM: 7 GB
The benchmark is running...
The tests might take up to 20 minutes
Please don't interrupt the script
1/19. MobileNet-V2
1.1 - inference | batch=50, size=224x224: 132 ± 2 ms
1.2 - training | batch=50, size=224x224: 693 ± 1 ms
2/19. Inception-V3
2.1 - inference | batch=20, size=346x346: 150 ± 2 ms
2.2 - training | batch=20, size=346x346: 483 ± 2 ms
3/19. Inception-V4
3.1 - inference | batch=10, size=346x346: 162 ± 2 ms
3.2 - training | batch=10, size=346x346: 555 ± 11 ms
4/19. Inception-ResNet-V2
4.1 - inference | batch=10, size=346x346: 182 ± 2 ms
4.2 - training | batch=8, size=346x346: 514 ± 2 ms
5/19. ResNet-V2-50
5.1 - inference | batch=10, size=346x346: 80.4 ± 2.9 ms
5.2 - training | batch=10, size=346x346: 266 ± 1 ms
6/19. ResNet-V2-152
6.1 - inference | batch=10, size=256x256: 117 ± 2 ms
6.2 - training | batch=10, size=256x256: 498 ± 3 ms
7/19. VGG-16
7.1 - inference | batch=20, size=224x224: 116 ± 1 ms
7.2 - training | batch=2, size=224x224: 96.9 ± 1.5 ms
8/19. SRCNN 9-5-5
8.1 - inference | batch=10, size=512x512: 203 ± 4 ms
8.2 - inference | batch=1, size=1536x1536: 183 ± 5 ms
Traceback (most recent call last):
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10,64,512,512] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
[[{{node gradients/generator/Relu_grad/ReluGrad}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ai-test.py", line 3, in <module>
b.run()
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\ai_benchmark\__init__.py", line 64, in run
use_CPU=self.use_CPU, precision=precision, _type="full", start_dir=self.cwd)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\ai_benchmark\utils.py", line 635, in run_tests
sess.run(train_step, feed_dict={input_: data, target_: target})
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
run_metadata_ptr)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
run_metadata)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10,64,512,512] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
[[node gradients/generator/Relu_grad/ReluGrad (defined at C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Original stack trace for 'gradients/generator/Relu_grad/ReluGrad':
File "ai-test.py", line 3, in <module>
b.run()
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\ai_benchmark\__init__.py", line 64, in run
use_CPU=self.use_CPU, precision=precision, _type="full", start_dir=self.cwd)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\ai_benchmark\utils.py", line 615, in run_tests
subTest.optimizer, subTest.learning_rate, testInfo.tf_ver_2)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\ai_benchmark\utils.py", line 202, in constructOptimizer
train_step = optimizer.minimize(loss_)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\training\optimizer.py", line 403, in minimize
grad_loss=grad_loss)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\training\optimizer.py", line 512, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\gradients_util.py", line 679, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\gradients_util.py", line 350, in _MaybeCompile
return grad_fn() # Exit early
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\gradients_util.py", line 679, in <lambda>
lambda: grad_fn(op, *out_grads))
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\nn_grad.py", line 415, in _ReluGrad
return gen_nn_ops.relu_grad(grad, op.outputs[0])
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 11732, in relu_grad
"ReluGrad", gradients=gradients, features=features, name=name)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3371, in create_op
attrs, op_def, compute_device)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3440, in _create_op_internal
op_def=op_def)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1762, in __init__
self._traceback = tf_stack.extract_stack()
...which was originally created as op 'generator/Relu', defined at:
File "ai-test.py", line 3, in <module>
b.run()
[elided 0 identical lines from previous traceback]
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\ai_benchmark\__init__.py", line 64, in run
use_CPU=self.use_CPU, precision=precision, _type="full", start_dir=self.cwd)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\ai_benchmark\utils.py", line 557, in run_tests
input_, output_, train_vars_ = getModelSrc(test, testInfo, sess)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\ai_benchmark\utils.py", line 241, in getModelSrc
tf.train.import_meta_graph(test.model_src, clear_devices=True)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\training\saver.py", line 1453, in import_meta_graph
**kwargs)[0]
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\training\saver.py", line 1477, in _import_meta_graph_with_return_elements
**kwargs))
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\importer.py", line 517, in _import_graph_def_internal
_ProcessNewOps(graph)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\importer.py", line 243, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3575, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3575, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3465, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "C:\Users\alexv\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1762, in __init__
self._traceback = tf_stack.extract_stack()
What should I do?
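(As the hint in the traceback suggests, you can make TensorFlow dump the live allocations when an OOM happens by setting report_tensor_allocations_upon_oom in RunOptions. A minimal TF1-style sketch, shown on a toy graph rather than the benchmark itself:)

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# The RunOptions flag from the OOM hint above: if an allocation fails,
# TensorFlow reports the tensors currently allocated on the device.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

x = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as sess:
    # Pass the options to every sess.run you want instrumented.
    result = sess.run(x * 2.0, options=run_options)
print(result)  # [2. 4. 6.]
```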