Comments (14)

leqiao-1 commented on July 16, 2024

Hi @PasaOpasen,
Q: What does None mean in the tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)?
A: It means that no valid inference run completed within max_latency_ms. That might be because the inference latency is too long, or because the input data are not valid. You can try to increase max_latency_ms, or share the model so that I can have a look.

Q: Are input_spec and output_names necessary?
A: If you provide sample_input_data_path, or there are no dynamic input shapes, these two arguments are not necessary. If you have inputs with dynamic shapes, like [batches, 3, height, width], you need to provide an input spec; batches, height, and width should be set to ints with values that occur in a real inference scenario. (See the sketch below.)
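
For illustration, here is a minimal sketch of how pinning a dynamic shape could look. The argument spellings inputs_spec / output_names and the names "input" / "output" are assumptions for this example; check them against the OLive version you are using:

from olive.optimization_config import OptimizationConfig

# Hypothetical example: replace "input"/"output" with your model's real
# input/output names, and 1/640 with the batch size and image size you
# actually expect at inference time.
opt_config = OptimizationConfig(
    model_path="./model.onnx",
    # dynamic shape [batches, 3, height, width] pinned to concrete ints
    inputs_spec={"input": [1, 3, 640, 640]},
    output_names=["output"],
)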

leqiao-1 commented on July 16, 2024

Hi @shonigs, I tried with the model in the notebook tutorials and no issues appeared. I am not sure if the issue is related to your ONNX model. Could you please share the model you used? Thanks.

kbraun-axio commented on July 16, 2024

Hi, I am getting the same error: KeyError: 'throughput'.

The complete error log is:

ERROR conda.cli.main_run:execute(41): `conda run olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cpu` failed. (See above for error)
2022-08-03 12:54:12,827 - olive.__main__ - WARNING - OLive will call "olive setup" to setup environment first
2022-08-03 12:54:13,474 - olive.optimization_config - INFO - Checking the model file...
2022-08-03 12:54:14,821 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider']
2022-08-03 13:06:48,111 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-08-03 13:44:02,303 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
Traceback (most recent call last):
  File "/home/axio/miniconda3/envs/oonxoptimizer/bin/olive", line 8, in <module>
    sys.exit(main())
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/__main__.py", line 438, in main
    options.func(options)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/__main__.py", line 322, in model_opt
    optimize(opt_config)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 36, in optimize
    olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 59, in parse_tuning_result
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 59, in <lambda>
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
KeyError: 'throughput'

I am executing the optimization with conda run -n onnxoptimizer olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cpu >& log.txt

The above error message is the contents of log.txt (see final part of the execution command above).

Please find my ONNX model here: https://get.hidrive.com/2qErePEy (Link valid until August 10, 2022)

leqiao-1 commented on July 16, 2024

Hi @kbraun-axio,
I think this error happened because max_latency_ms is too small for CPU inference.
You can increase max_latency_ms, or change the execution provider from cpu to cuda.
Here is the test result on my local machine with the command olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cuda >& log_olive.txt

log_olive.txt

kbraun-axio commented on July 16, 2024

Hi @leqiao-1,
Thanks for your reply and the log output.
I will increase the max_latency_ms and try running the optimization again. I will post the results here.
Unfortunately, our inference machine does not have an Nvidia GPU (we only use one in our training server). Therefore, I cannot set the execution provider to CUDA.

kbraun-axio commented on July 16, 2024

Hi @leqiao-1,

Today, I tried to run the optimization again. This time, I increased max_latency_ms to 10,000. However, I got the same error.
I attached the output log and the olive_opt_results folder (without the optimized model because it is too large) for you.

Do you think a max_latency_ms of 10,000 is still not enough?

Inference with ONNX Runtime and the same ONNX model that I am trying to optimize takes about 7.5 seconds.

log_olive.txt
olive_opt_result.zip

leqiao-1 commented on July 16, 2024

Hi @kbraun-axio,
The latency depends on the machine. On my side, the inference takes about 400 ms on CPU. If you want, you can try increasing max_latency_ms further. However, even if it works, the throughput optimization may take a long time to run, since the latency is so long.
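
If it helps to sanity-check the numbers, here is a minimal sketch (plain ONNX Runtime, independent of OLive) for measuring baseline latency before choosing max_latency_ms. It assumes a single float32 input and substitutes 1 for any dynamic dimension:

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("onnx-object-detection-model.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
# Replace symbolic/dynamic dimensions with concrete values for your use case.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
x = np.random.rand(*shape).astype(np.float32)

sess.run(None, {inp.name: x})  # warm-up run
start = time.perf_counter()
runs = 10
for _ in range(runs):
    sess.run(None, {inp.name: x})
print(f"average latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")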

kbraun-axio commented on July 16, 2024

Okay, thank you. The machine on which we want to run the inference has a 6-core AMD CPU with 8 GB RAM, from 2012. It runs in a manufacturing / shop floor environment; they do not have the newest hardware. But maybe it would be better to use a more powerful machine, like an Nvidia Jetson device, which supports CUDA.

Besides that, I realized the optimization uses a lot of RAM. Watching the processes with htop showed a memory consumption of up to 12 GB for the Python process running OLive. But the machine only has 8 GB RAM, so Ubuntu started to use swap memory from the hard disk, which is very slow. Is that intended, or is 8 GB RAM too little for OLive?

leqiao-1 commented on July 16, 2024

Hi @kbraun-axio,
Are you using the onnxruntime GPU package with --providers_list cpu? I can reproduce the memory consumption issue this way.

If so, it is probably because OLive, when checking the model input info with an ORT inference session, tries to create the session with CUDA. I think it's a bug in OLive, and we will fix it. As a workaround, you can uninstall the onnxruntime GPU package and install the CPU version, as shown below.
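
For reference, the swap is (standard PyPI package names):

pip uninstall onnxruntime-gpu
pip install onnxruntime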

If not, please let me know your onnxruntime package version with pip list, and I will check whether I can reproduce the same issue.

kbraun-axio commented on July 16, 2024

Hi @leqiao-1,
Yes, I was running the GPU package with --providers_list cpu. My colleague uninstalled the package and installed the default (CPU) package. Now the memory consumption is in the normal range. Thanks for your hint.

But the other issue, the KeyError: 'throughput', persists even with the CPU package and even if we set max_latency_ms to higher values. Maybe it fails because the system is too old; it is from 2012.

leqiao-1 commented on July 16, 2024

Hi @kbraun-axio,
That might well be the case, since the inference latency is very high on your side.

PasaOpasen commented on July 16, 2024

I have the same issue, with this log:

2022-12-26 23:59:42,091 - olive.optimization_config - INFO - Checking the model file...
2022-12-26 23:59:42,547 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider', 'DnnlExecutionProvider']
2022-12-26 23:59:52,402 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-12-26 23:59:56,936 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'DnnlExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
j:\aprbot\tmp\Optimize_ONNX_Models_Throughput_with_OLive.ipynb Cell 9 in <cell line: 27>()
      [1](vscode-notebook-cell:/j%3A/tmp/Optimize_ONNX_Models_Throughput_with_OLive.ipynb#X12sZmlsZQ%3D%3D?line=0) opt_config = OptimizationConfig(
      [2](vscode-notebook-cell:/j%3A/tmp/Optimize_ONNX_Models_Throughput_with_OLive.ipynb#X12sZmlsZQ%3D%3D?line=1) 
      [3](vscode-notebook-cell:/j%3A/tmp/Optimize_ONNX_Models_Throughput_with_OLive.ipynb#X12sZmlsZQ%3D%3D?line=2)     model_path = "./craft.onnx",
   (...)
     [24](vscode-notebook-cell:/j%3A/tmp/Optimize_ONNX_Models_Throughput_with_OLive.ipynb#X12sZmlsZQ%3D%3D?line=23)     test_num = 200
     [25](vscode-notebook-cell:/j%3A/tmp/Optimize_ONNX_Models_Throughput_with_OLive.ipynb#X12sZmlsZQ%3D%3D?line=24) )
---> [27](vscode-notebook-cell:/j%3A/tmp/Optimize_ONNX_Models_Throughput_with_OLive.ipynb#X12sZmlsZQ%3D%3D?line=26) result = optimize(opt_config)

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:36, in optimize(optimization_config)
     32     quantization_optimize(optimization_config)
     34 tuning_results = tune_onnx_model(optimization_config)
---> 36 olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
     38 result_json_path = os.path.join(optimization_config.result_path, "olive_result.json")
     40 with open(result_json_path, 'w') as f:

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:59, in parse_tuning_result(optimization_config, *tuning_results)
     57 def parse_tuning_result(optimization_config, *tuning_results):
     58     if optimization_config.throughput_tuning_enabled:
---> 59         best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
     60     else:
     61         best_test_name = min(tuning_results, key=lambda x: x["latency_ms"]["avg"])["test_name"]

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:59, in parse_tuning_result.<locals>.<lambda>(x)
     57 def parse_tuning_result(optimization_config, *tuning_results):
     58     if optimization_config.throughput_tuning_enabled:
---> 59         best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
     60     else:
     61         best_test_name = min(tuning_results, key=lambda x: x["latency_ms"]["avg"])["test_name"]

KeyError: 'throughput'

I am running it with:

# Imports inferred from the traceback above; check them against your OLive version.
from olive.optimization_config import OptimizationConfig
from olive.optimize import optimize

opt_config = OptimizationConfig(
    model_path="./model.onnx",
    sample_input_data_path="./input.npz",
    result_path="olive_opt_latency_result",

    throughput_tuning_enabled=True,
    openmp_enabled=False,
    max_latency_percentile=0.95,
    max_latency_ms=1000000,
    threads_num=1,
    min_duration_sec=10000,

    providers_list=["cpu", "dnnl"],
    inter_thread_num_list=[1],
    intra_thread_num_list=[1],
    execution_mode_list=["sequential"],
    ort_opt_level_list=["all"],

    concurrency_num=4,

    warmup_num=20,
    test_num=200,
)

result = optimize(opt_config)

The model is huge and its inference takes over 15 seconds, but what am I doing wrong? What does None mean in the tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)? What other params should I set?

Are input_spec and output_names really necessary? What shape should I write in the input spec if the model has a dynamic input like [batches, 3, height, width]?

PasaOpasen commented on July 16, 2024

@leqiao-1 Thank you for the fast response!

Can you please try to do anything with this model: https://github.com/PasaOpasen/_olive_craft ?

I tried several configurations but nothing changed. Its inference takes about 15 seconds with 2 cores, and the optimization runs too long with a large test_num or warmup_num and gives almost no output.

Also, the optimization uses 6-8 cores with concurrency_num=1, all my 12 cores with concurrency_num=2, and all my 16 GB of memory with concurrency_num>2.

leqiao-1 commented on July 16, 2024

If you have any further concerns or questions, please reopen this issue.
