Comments (14)
Hi @PasaOpasen,
Q: What does None mean in the tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)?
A: It means no valid inference completed within max_latency_ms. That might be because the inference latency is too long, or because the input data are not valid. You can try increasing max_latency_ms, or share the model so that I can take a look.
Q: Are input_spec and output_names necessary?
A: If you provide sample_input_data_path, or there are no dynamic input shapes, these two arguments are not necessary. If you have inputs with dynamic shapes, like [batches, 3, height, width], you need to provide input_spec. batches, height, and width should be set to integers that are realistic for your actual inference scenario.
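As an alternative to spelling out input_spec, a sample input file can be supplied. The sketch below (an illustration, not OLive's code: the input name "input" and the dimensions 1, 640, 640 are assumptions you should replace with your model's real input name and realistic sizes) shows how a .npz file for a model with a dynamic [batches, 3, height, width] input could be built with NumPy and then passed as sample_input_data_path:

```python
import numpy as np

# Pick concrete values for the dynamic dims (assumed here; use realistic ones).
batches, height, width = 1, 640, 640

# "input" is a hypothetical input name; use the real name from your ONNX model.
sample = {"input": np.random.rand(batches, 3, height, width).astype(np.float32)}

# Save in .npz format so the file can be passed as sample_input_data_path.
np.savez("input.npz", **sample)

loaded = np.load("input.npz")
print(loaded["input"].shape)  # (1, 3, 640, 640)
```

The keys of the .npz archive would need to match the model's input names, since that is how the arrays are mapped to inputs at session creation time.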
Hi @shonigs, I tried with the model in the notebook tutorials and no issues appeared. I am not sure whether the issue is related to your ONNX model. Could you please share the model you used? Thanks.
Hi, I am getting the same error: KeyError: 'throughput'.
The complete error log is:
ERROR conda.cli.main_run:execute(41): `conda run olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cpu` failed. (See above for error)
2022-08-03 12:54:12,827 - olive.__main__ - WARNING - OLive will call "olive setup" to setup environment first
2022-08-03 12:54:13,474 - olive.optimization_config - INFO - Checking the model file...
2022-08-03 12:54:14,821 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider']
2022-08-03 13:06:48,111 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-08-03 13:44:02,303 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
Traceback (most recent call last):
File "/home/axio/miniconda3/envs/oonxoptimizer/bin/olive", line 8, in <module>
sys.exit(main())
File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/__main__.py", line 438, in main
options.func(options)
File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/__main__.py", line 322, in model_opt
optimize(opt_config)
File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 36, in optimize
olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 59, in parse_tuning_result
best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 59, in <lambda>
best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
KeyError: 'throughput'
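A minimal reconstruction of why this traceback ends in KeyError (this is an illustrative sketch, not OLive's actual code path): when every tuning combo fails, the per-combo result dicts never receive a "throughput" entry, so the max() over them raises as soon as its key function touches the first dict.

```python
# Hypothetical tuning results after all combos failed: the dicts carry a
# test name but no measured "throughput" value.
tuning_results = [
    {"test_name": "combo_0"},  # failed run: no throughput recorded
    {"test_name": "combo_1"},
]

try:
    best = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'throughput'
```

So the error itself is only a symptom; the root cause is whatever made the "Optimization failed for tuning combo" lines appear earlier in the log.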
I am executing the optimization with conda run -n onnxoptimizer olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cpu >& log.txt
The above error message is the contents of log.txt (see final part of the execution command above).
Please find my ONNX model here: https://get.hidrive.com/2qErePEy (Link valid until August 10, 2022)
Hi @kbraun-axio,
I think this error happened because max_latency_ms is too small for CPU inference.
You can increase max_latency_ms, or change the execution provider from cpu to cuda.
Here is the test result on my local machine with the command olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cuda >& log_olive.txt
Hi @leqiao-1,
Thanks for your reply and the log output.
I will increase the max_latency_ms and try running the optimization again. I will post the results here.
Unfortunately, our inference machine does not have a Nvidia GPU (we only use a Nvidia GPU in our training server). Therefore, I cannot set the execution provider to CUDA.
Hi @leqiao-1,
Today, I tried to run the optimization again. This time, I increased the max_latency_ms to 10,000. However, I got the same error.
I attached the output log and the olive_opt_results folder (without the optimized model because it is too large) for you.
Do you think max_latency_ms of 10,000 is still not enough?
Inference with ONNX Runtime on the same ONNX model that I am trying to optimize takes about 7.5 seconds.
log_olive.txt
olive_opt_result.zip
Hi @kbraun-axio,
The latency depends on the machine. On my side, the inference takes about 400 ms on CPU. If you want, you can still increase max_latency_ms. However, even if it works, the throughput optimization may take a long time to run, since the latency is so high.
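Since the viable max_latency_ms depends on the machine, it can help to measure the model's baseline latency before tuning. Below is a self-contained sketch of such a measurement; the workload is a stand-in function, and with onnxruntime installed you would instead time something like `lambda: session.run(None, sample_input)` (the helper name and parameters here are my own, not part of OLive):

```python
import time

def measure_latency_ms(run_once, warmup=3, runs=20):
    """Time a callable; return (average ms, ~95th-percentile ms)."""
    for _ in range(warmup):          # warm caches / JIT before measuring
        run_once()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[min(int(len(samples) * 0.95), len(samples) - 1)]
    return sum(samples) / len(samples), p95

# Stand-in workload; replace with your real inference call.
avg_ms, p95_ms = measure_latency_ms(lambda: sum(i * i for i in range(10000)))
print(f"avg={avg_ms:.2f} ms  p95={p95_ms:.2f} ms")
```

Comparing the measured p95 against your chosen max_latency_ms (the thread uses --max_latency_percentile 0.95) tells you up front whether any tuning combo can pass.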
Okay, thank you. The machine on which we want to run the inference is a 6-core AMD CPU with 8 GB RAM from 2012. It runs in a manufacturing / shop-floor environment; they do not have the newest hardware. But maybe it would be better to use a more powerful machine, like an NVIDIA Jetson device, which supports CUDA.
Besides that, I noticed the optimization takes a lot of RAM. Watching the processes with htop showed a memory consumption of up to 12 GB for the Python process running OLive. But the machine only has 8 GB RAM, so Ubuntu started to use swap memory from the hard disk, which is very slow. Is that intended, or is 8 GB RAM too little for OLive?
Hi @kbraun-axio,
Are you using the onnxruntime-gpu package with --providers_list cpu? I can reproduce the memory consumption issue that way.
If so, it is probably because, when checking the model input info with ORT inference sessions, OLive tries to create a session with CUDA. I think it's a bug in OLive, and we will fix it. As a workaround, you can uninstall the onnxruntime-gpu package and install the CPU version.
If not, please let me know your onnxruntime package version from pip list. I will check whether I can run into the same issue.
Hi @leqiao-1,
Yes, I was running the GPU package with --providers_list cpu. My colleague uninstalled it and installed the default (CPU) package. Now the memory consumption is in the normal range. Thanks for the hint.
But the other issue, the KeyError: 'throughput', persists even with the CPU package and even if we set max_latency_ms to higher values. Maybe it fails because the system is too old; it is from 2012.
Hi @kbraun-axio
It might be possible, since the inference latency is very high on your side.
I have the same issue, with this log:
2022-12-26 23:59:42,091 - olive.optimization_config - INFO - Checking the model file...
2022-12-26 23:59:42,547 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider', 'DnnlExecutionProvider']
2022-12-26 23:59:52,402 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-12-26 23:59:56,936 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'DnnlExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
j:\aprbot\tmp\Optimize_ONNX_Models_Throughput_with_OLive.ipynb Cell 9 in <cell line: 27>()
      1 opt_config = OptimizationConfig(
      3     model_path = "./craft.onnx",
    (...)
     24     test_num = 200
     25 )
---> 27 result = optimize(opt_config)
File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:36, in optimize(optimization_config)
32 quantization_optimize(optimization_config)
34 tuning_results = tune_onnx_model(optimization_config)
---> 36 olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
38 result_json_path = os.path.join(optimization_config.result_path, "olive_result.json")
40 with open(result_json_path, 'w') as f:
File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:59, in parse_tuning_result(optimization_config, *tuning_results)
57 def parse_tuning_result(optimization_config, *tuning_results):
58 if optimization_config.throughput_tuning_enabled:
---> 59 best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
60 else:
61 best_test_name = min(tuning_results, key=lambda x: x["latency_ms"]["avg"])["test_name"]
File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:59, in parse_tuning_result.<locals>.<lambda>(x)
57 def parse_tuning_result(optimization_config, *tuning_results):
58 if optimization_config.throughput_tuning_enabled:
---> 59 best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
60 else:
61 best_test_name = min(tuning_results, key=lambda x: x["latency_ms"]["avg"])["test_name"]
KeyError: 'throughput'
I am running:
opt_config = OptimizationConfig(
model_path = "./model.onnx",
sample_input_data_path="./input.npz",
result_path = "olive_opt_latency_result",
throughput_tuning_enabled=True,
openmp_enabled=False,
max_latency_percentile = 0.95,
max_latency_ms = 1000000,
threads_num = 1,
min_duration_sec=10000,
providers_list = ["cpu", "dnnl"],
inter_thread_num_list = [1],
intra_thread_num_list=[1],
execution_mode_list = ["sequential"],
ort_opt_level_list=['all'],
concurrency_num=4,
warmup_num = 20,
test_num = 200
)
result = optimize(opt_config)
The model is huge and inference takes over 15 seconds, but what am I doing wrong? What does None mean in the tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)? What other parameters should I set? Are input_spec and output_names really necessary? What shape should I write in input_spec if the model has a dynamic input like [batches, 3, height, width]?
@leqiao-1 Thank you for the fast response!
Could you please try anything with this model: https://github.com/PasaOpasen/_olive_craft ?
I tried several configurations but nothing changed. Its inference takes about 15 seconds with 2 cores, and the optimization runs far too long with a large test_num or warmup_num and gives almost no output.
Also, the optimization uses 6-8 cores with concurrency_num=1, all my 12 cores with concurrency_num=2, and all my 16 GB of memory with concurrency_num>2.
If you have any further concerns or questions, please reopen this issue.