
olive's Introduction

Olive

Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation. Given a model and target hardware, Olive composes the most suitable optimization techniques to output the most efficient model(s) for inference on the cloud or at the edge, while taking constraints such as accuracy and latency into consideration.

Since every ML accelerator vendor implements its own acceleration toolchain to make the most of its hardware, hardware-aware optimizations are fragmented. With Olive, we can:

Reduce engineering effort for optimizing models for cloud and edge: Developers are required to learn and utilize multiple hardware vendor-specific toolchains in order to prepare and optimize their trained model for deployment. Olive aims to simplify the experience by aggregating and automating optimization techniques for the desired hardware targets.

Build up a unified optimization framework: Given that no single optimization technique serves all scenarios well, Olive enables an extensible framework that allows the industry to easily plug in their optimization innovations. Olive can efficiently compose and tune integrated techniques to offer a ready-to-use E2E optimization solution.
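
For example (a minimal sketch; the config file name is illustrative), a workflow described in a JSON configuration can be launched from Python:

# Minimal sketch: run an Olive workflow defined in a JSON config (path is illustrative).
from olive.workflows import run as olive_run

olive_run("my_workflow_config.json")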

News

Get Started and Resources

Installation

We recommend installing Olive in a virtual environment or a conda environment. Olive is installed using pip.

Create a virtual/conda environment with the desired version of Python and activate it.

You will need to install a build of onnxruntime. You can install the desired build separately, but public versions of onnxruntime can also be installed as extra dependencies during Olive installation.
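
For example (commands are illustrative; use whichever environment manager you prefer):

python3 -m venv olive-env
source olive-env/bin/activate    # on Windows: olive-env\Scripts\activate
python -m pip install --upgrade pip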

Install with pip

Olive is available for installation from PyPI.

pip install olive-ai

With onnxruntime (Default CPU):

pip install olive-ai[cpu]

With onnxruntime-gpu:

pip install olive-ai[gpu]

With onnxruntime-directml:

pip install olive-ai[directml]

Optional Dependencies

Olive has optional dependencies that can be installed to enable additional features. Please refer to Olive package config for the list of extras and their dependencies.


Contributing

We welcome your contributions to Olive. Please refer to CONTRIBUTING.md.

License

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

olive's People

Contributors

adrianlizarraga, amrutha95, apsonawane, dabh, dependabot[bot], devang-ml, emmaningms, gaugarg-nv, guotuofeng, harishsk, jambayk, jcwchen, jstoecker, justinchuby, lainey1570, leqiao-1, liuziyue, mreyesgomez, natke, patricevignola, samuel100, shaahji, sheng-xiao, sophies927, taka152, trajepl, wangyems, xiaoyu-work, yuwenzho, zhangxiang1993


olive's Issues

I don't get the expected result

What happened?

I followed the instructions in examples\directml\stable_diffusion, and I also updated the NVIDIA driver, but its running speed is much worse than that of https://github.com/AUTOMATIC1111/stable-diffusion-webui.
It seems that it is running on my integrated graphics card.

Version?

OS: win11 22h2, os build 22621.1778
python: 3.10.11
nvidia driver: 532.03 (rtx3070ti laptop)
intel graphic driver: 31.0.101.4338

[Bug]: ModuleNotFoundError: No module named 'models.gpt2'

What happened?

Having the following dir structure
.
|-- models
| |-- model.py
|-- olive_optimize
| |-- config.json
| |-- main.py
| `-- user_script.py
|-- trained_models
| |-- model.model

# main.py
from olive.workflows import run as olive_run
olive_run("olive_optimize/config.json")

results in the following error:

Traceback (most recent call last):
  File "/home/ismail/src/bvs_train/btr/pytorch/latency_optimization/olive_optimize/main.py", line 1, in <module>
    from olive.workflows import run as olive_run
  File "/home/ismail/.local/lib/python3.11/site-packages/olive/workflows/__init__.py", line 5, in <module>
    from olive.workflows.run.run import run
  File "/home/ismail/.local/lib/python3.11/site-packages/olive/workflows/run/run.py", line 16, in <module>
    from olive.passes import Pass
  File "/home/ismail/.local/lib/python3.11/site-packages/olive/passes/__init__.py", line 6, in <module>
    from olive.passes.onnx import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ismail/.local/lib/python3.11/site-packages/olive/passes/onnx/__init__.py", line 9, in <module>
    from olive.passes.onnx.insert_beam_search import InsertBeamSearch
  File "/home/ismail/.local/lib/python3.11/site-packages/olive/passes/onnx/insert_beam_search.py", line 10, in <module>
    from onnxruntime.transformers.convert_generation import get_shared_initializers
  File "/home/ismail/.local/lib/python3.11/site-packages/onnxruntime/transformers/convert_generation.py", line 75, in <module>
    from models.gpt2.convert_to_onnx import main as convert_gpt2_to_onnx  # noqa: E402
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'models.gpt2'

config.json

{
    "input_model":{
        "type": "PyTorchModel",
        "config":{
            "model_path": "trained_models/model.model",
            "io_config":{
                "input_names": ["input"],
                "output_names": ["output"],
                "dynamic_axes": {
                    "input": {"0":"batch", "3": "yaxis"}, 
                    "output": {"0":"batch", "1":"width"}
                }
            }
        }
    },
    "data_root": "somedataroot",
    "systems":{
        "local_system":{
            "type": "LocalSystem",
            "config":{
                "accelerators":["cpu"]
            }
        }
    },
    "evaluators":{
        "custom_evaluator":{
            "metrics":[{
                "name": "latency",
                "type":"custom",
                "sub_types":[{
                    "name": "latency_custom", 
                    "priority": 1, 
                    "higher_is_better" : false
                    }],
                "user_config":{
                    "user_script":"olive_optimize/user_script.py",
                    "batch_size": 1,
                    "dataloader_func": "create_dataloader",
                    "evaluate_func": "evaluate_latency"
                }
            }]
        }
    },
    "engine":{
        "clean_cache":true,
        "cache_dir": ".cache",
        "host": "local_system",
        "target": "local_system",
        "evaluator": "custom_evaluator"
    },
    "passes":{
        "onnx_conversion":{
            "type": "OnnxConversion",
            "config":{
                "target_opset":20
            }
        },
        "onnx_quantization": {
            "type": "OnnxQuantization",
            "config":{
                "weight_type":"QUInt8"
            }
        }
    },
    "pass_flows":[
        ["onnx_conversion", "onnx_quantization"]
    ]
} 

Version?

0.3.1

Can't reproduce optimization "Optimize_ONNX_Models_Latency_with_OLive.ipynb"

C:\Users\user\PycharmProjects\olive\venv\Scripts\python.exe C:/Users/user/PycharmProjects/olive/main.py
2022-03-29 12:19:23,240 - olive.optimization_config - INFO - Checking the model file...
2022-03-29 12:19:23,294 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider', 'DnnlExecutionProvider']
2022-03-29 12:19:28,608 - olive.optimization_config - INFO - Checking the model file...
2022-03-29 12:19:28,663 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider', 'DnnlExecutionProvider']
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\PycharmProjects\olive\main.py", line 24, in <module>
    result = optimize(opt_config)
  File "C:\Users\user\PycharmProjects\olive\venv\lib\site-packages\olive\optimize.py", line 24, in optimize
    pretuning_inference_result = get_benchmark(optimization_config)
  File "C:\Users\user\PycharmProjects\olive\venv\lib\site-packages\olive\optimization\tuning_process.py", line 189, in get_benchmark
    manager = Manager()
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 57, in Manager
    m.start()
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\managers.py", line 579, in start
    self._process.start()
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
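
As the error message itself suggests, the entry-point script needs the standard main-module guard on spawn-based platforms such as Windows. A minimal sketch, assuming the notebook code was moved into a main.py and using an OptimizationConfig like the ones shown elsewhere on this page (arguments are illustrative):

# main.py — minimal sketch of the guard suggested by the RuntimeError above
from olive.optimization_config import OptimizationConfig
from olive.optimize import optimize

if __name__ == "__main__":
    # Illustrative config; the real arguments come from the notebook being reproduced.
    opt_config = OptimizationConfig(model_path="model.onnx", result_path="opt_result")
    result = optimize(opt_config)
    print(result)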

Converting models from Huggingface?

Hello,

I am trying to find a way to convert models trained using Huggingface.

Using Python 3.8.6, PyTorch 1.9.0.

Step 1: Save the model in torch.

(venv) sergey_mkrtchyan browse_reader (master) $ python
Python 3.8.6 (v3.8.6:db455296be, Sep 23 2020, 13:31:39)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from transformers import RobertaForSequenceClassification
>>> model = RobertaForSequenceClassification.from_pretrained('/Users/sergey_mkrtchyan/workspace/mrc/browse_models/tanda_roberta_large_asnq_orig/')
Some weights of the model checkpoint at /Users/sergey_mkrtchyan/workspace/mrc/browse_models/tanda_roberta_large_asnq_orig/ were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
>>> torch.save(model, '/Users/sergey_mkrtchyan/workspace/mrc/browse_models/tanda_roberta_large_asnq_pt/tanda.pt')
>>>

Step 2: Convert the model using OLive's ONNX Converter Image

sergey_mkrtchyan OLive (master) $ docker run -v /Users/sergey_mkrtchyan/workspace/mrc/browse_models:/mnt/ onnx-converter --model /mnt/tanda_roberta_large_asnq_pt/tanda.pt --output_onnx_path /mnt/tanda_roberta_large_asnq_pt/tanda.onnx --model_type pytorch --model_input_shapes "[(1,7),(1,7)]"
WARNING:root:scikit-learn version 0.24.2 is not supported. Minimum required version: 0.17. Maximum required version: 0.19.2. Disabling scikit-learn conversion API.

-------------
Model Conversion

Conversion error occurred. Abort.

-------------
MODEL CONVERSION SUMMARY (.json file generated at /mnt/tanda_roberta_large_asnq_pt/output.json )

{'conversion_status': 'FAILED',
 'correctness_verified': 'FAILED',
 'error_message': "No module named 'transformers'",
 'input_folder': '',
 'output_onnx_path': ''}
Traceback (most recent call last):
  File "src/onnx_converter.py", line 348, in <module>
    main()
  File "src/onnx_converter.py", line 312, in main
    raise e
  File "src/onnx_converter.py", line 302, in main
    convert_models(args)
  File "src/onnx_converter.py", line 276, in convert_models
    converter(args)
  File "src/onnx_converter.py", line 179, in pytorch2onnx
    model = torch.load(args.model, map_location="cpu")
  File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 875, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'transformers'
sergey_mkrtchyan OLive (master) $

It seems like the model somehow preserves information about the transformers package it was trained with. Is there any way to get rid of this?

Note that directly loading the pytorch_model.bin file results in another exception.

sergey_mkrtchyan OLive (master) $ docker run -v /Users/sergey_mkrtchyan/workspace/mrc/browse_models:/mnt/ onnx-converter --model /mnt/tanda_roberta_large_asnq_pt/pytorch_model.bin --output_onnx_path tanda_roberta_large_asnq_amazon/tanda.onnx --model_type pytorch --model_input_shapes "[(1,7),(1,7)]"
WARNING:root:scikit-learn version 0.24.2 is not supported. Minimum required version: 0.17. Maximum required version: 0.19.2. Disabling scikit-learn conversion API.

-------------
Model Conversion

Conversion error occurred. Abort.

-------------
MODEL CONVERSION SUMMARY (.json file generated at tanda_roberta_large_asnq_amazon/output.json )

{'conversion_status': 'FAILED',
 'correctness_verified': 'FAILED',
 'error_message': "'collections.OrderedDict' object has no attribute "
                  "'training'",
 'input_folder': '',
 'output_onnx_path': ''}
Traceback (most recent call last):
  File "src/onnx_converter.py", line 348, in <module>
    main()
  File "src/onnx_converter.py", line 312, in main
    raise e
  File "src/onnx_converter.py", line 302, in main
    convert_models(args)
  File "src/onnx_converter.py", line 276, in convert_models
    converter(args)
  File "src/onnx_converter.py", line 182, in pytorch2onnx
    torch.onnx.export(model, dummy_model_input, args.output_onnx_path)
  File "/usr/local/lib/python3.6/site-packages/torch/onnx/__init__.py", line 280, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/usr/local/lib/python3.6/site-packages/torch/onnx/utils.py", line 94, in export
    use_external_data_format=use_external_data_format)
  File "/usr/local/lib/python3.6/site-packages/torch/onnx/utils.py", line 674, in _export
    with select_model_mode_for_export(model, training):
  File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/torch/onnx/utils.py", line 38, in select_model_mode_for_export
    is_originally_training = model.training
AttributeError: 'collections.OrderedDict' object has no attribute 'training'
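
For what it's worth, one possible way to sidestep both failures above (the pickled reference to transformers and the bare state dict) is to export to ONNX directly with torch.onnx.export instead of going through torch.save and the converter image. A rough sketch only, with an illustrative checkpoint path and the (1, 7) dummy shapes used above:

# Sketch only: export the Hugging Face model straight to ONNX instead of pickling it.
import torch
from transformers import RobertaForSequenceClassification

model = RobertaForSequenceClassification.from_pretrained("path/to/tanda_roberta_large_asnq_orig")  # illustrative path
model.eval()

# Dummy inputs matching the (1, 7) shapes used above.
input_ids = torch.ones(1, 7, dtype=torch.long)
attention_mask = torch.ones(1, 7, dtype=torch.long)

torch.onnx.export(
    model,
    (input_ids, attention_mask),
    "tanda.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    opset_version=13,
)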

[Bug]: Whisper optimization does not work

What happened?

I tried to repeat the steps described in https://github.com/microsoft/Olive/tree/main/examples/whisper, but unfortunately it doesn't work.

python3 -m olive.workflows.run --config whisper_cpu_int8.json

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/olive/workflows/run/__main__.py", line 16, in <module>
    run(**vars(args))
  File "/usr/local/lib/python3.10/dist-packages/olive/workflows/run/run.py", line 129, in run
    input_model = config.input_model.create_model()
  File "/usr/local/lib/python3.10/dist-packages/olive/model.py", line 167, in create_model
    return REGISTRY[self.type.lower()](**self.config)
  File "/usr/local/lib/python3.10/dist-packages/olive/model.py", line 558, in __init__
    self.hf_config = validate_config(hf_config, HFConfig) if hf_config else None
  File "/usr/local/lib/python3.10/dist-packages/olive/common/config_utils.py", line 242, in validate_config
    config = instance_class(**config)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 2 validation errors for HFConfig
components -> 0 -> io_config
  value is not a valid dict (type=type_error.dict)
components -> 1 -> io_config
  value is not a valid dict (type=type_error.dict)

Version?

0.2.1

[Bug]: To define root models, use `pydantic.RootModel` rather than a field called '__root__'

What happened?

When trying to run python -m olive.workflows.run --config resnet_static_ptq_cpu.json --setup, I got this error:

/miniconda3/envs/olive_env/lib/python3.11/site-packages/pydantic/_internal/_config.py:269: UserWarning: Valid config keys have changed in V2:
• 'json_dumps' has been removed
• 'json_loads' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "/miniconda3/envs/olive_env/lib/python3.11/site-packages/olive/workflows/__init__.py", line 5, in <module>
    from olive.workflows.run.run import run
  File "/miniconda3/envs/olive_env/lib/python3.11/site-packages/olive/workflows/run/run.py", line 16, in <module>
    from olive.passes import Pass
  File "/miniconda3/envs/olive_env/lib/python3.11/site-packages/olive/passes/__init__.py", line 5, in <module>
    from olive.passes.olive_pass import FullPassConfig, Pass
  File "/miniconda3/envs/olive_env/lib/python3.11/site-packages/olive/passes/olive_pass.py", line 13, in <module>
    from olive.common.config_utils import ConfigBase, validate_config
  File "/miniconda3/envs/olive_env/lib/python3.11/site-packages/olive/common/config_utils.py", line 118, in <module>
    class ConfigListBase(ConfigBase):
  File "/miniconda3/envs/olive_env/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 98, in __new__
    private_attributes = inspect_namespace(
                         ^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/olive_env/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 291, in inspect_namespace
    raise TypeError("To define root models, use `pydantic.RootModel` rather than a field called '__root__'")
TypeError: To define root models, use `pydantic.RootModel` rather than a field called '__root__'

I understand that this is a version mismatch. How can I fix it?
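
Given the related pydantic 2.x report further down this page (the demo Diffusion issue), a likely workaround is to pin pydantic to a 1.x release in the environment before running Olive, e.g.:

pip install "pydantic<2"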

Version?

Name Version Build Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
alembic 1.11.1 pypi_0 pypi
annotated-types 0.5.0 pypi_0 pypi
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.05.30 h06a4308_0
certifi 2023.7.22 pypi_0 pypi
charset-normalizer 3.2.0 pypi_0 pypi
cmaes 0.10.0 pypi_0 pypi
cmake 3.27.0 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
colorlog 6.7.0 pypi_0 pypi
filelock 3.12.2 pypi_0 pypi
flatbuffers 23.5.26 pypi_0 pypi
fsspec 2023.6.0 pypi_0 pypi
greenlet 2.0.2 pypi_0 pypi
huggingface-hub 0.16.4 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
lit 16.0.6 pypi_0 pypi
mako 1.2.4 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.1 pypi_0 pypi
numpy 1.23.4 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
olive-ai 0.2.1 pypi_0 pypi
onnx 1.14.0 pypi_0 pypi
onnxruntime 1.15.0 pypi_0 pypi
openssl 3.0.9 h7f8727e_0
optuna 3.2.0 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 10.0.0 pypi_0 pypi
pip 23.2.1 py311h06a4308_0
protobuf 4.23.4 pypi_0 pypi
pydantic 2.1.1 pypi_0 pypi
pydantic-core 2.4.0 pypi_0 pypi
python 3.11.4 h955ad1f_0
python-dateutil 2.8.2 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2023.6.3 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
safetensors 0.3.1 pypi_0 pypi
setuptools 68.0.0 py311h06a4308_0
six 1.16.0 pypi_0 pypi
sqlalchemy 2.0.19 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
sympy 1.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.3 pypi_0 pypi
torch 2.0.1 pypi_0 pypi
torchmetrics 0.10.0 pypi_0 pypi
torchvision 0.15.2 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
transformers 4.31.0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.7.1 pypi_0 pypi
tzdata 2023.3 pypi_0 pypi
urllib3 2.0.4 pypi_0 pypi
wheel 0.38.4 py311h06a4308_0
xz 5.4.2 h5eee18b_0
zlib 1.2.13 h5eee18b_0

[Bug]: Optimizer Fails on AMD Radeon Optimizing RX 6500 XT Unet on Stable Diffusion

What happened?

I'm getting this error trying to run the stable diffusion example:

Optimizing vae_decoder
[2023-08-24 18:28:07,493] [INFO] [footprint.py:168:get_pareto_frontier] pareto frontier points: 3_OrtTransformersOptimization-2-b1a42f24996a64c47333128369a9eb21-gpu-dml {'latency-avg': 3116.94146}
[2023-08-24 18:28:07,494] [INFO] [engine.py:475:run_search] Output all 1 models
[2023-08-24 18:28:07,494] [INFO] [engine.py:318:run] No packaging config provided, skip packaging artifacts
Unoptimized Model : C:\Users\Admin\pypro\Olive\examples\directml\stable_diffusion\cache\models\2_OnnxConversion-bed9f6d6008fff0395cbb86a8e7378c9-53e9f77e1a62ba2aa899ddb5369d16c1-gpu-dml\model.onnx
Optimized Model   : C:\Users\Admin\pypro\Olive\examples\directml\stable_diffusion\cache\models\3_OrtTransformersOptimization-2-b1a42f24996a64c47333128369a9eb21-gpu-dml\model.onnx

Optimizing unet
[2023-08-24 18:28:17,346] [ERROR] [engine.py:763:_run_passes] Evaluation failed: [ONNXRuntimeError] : 1 : FAIL : D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(896)\onnxruntime_pybind11_state.pyd!00007FFC766F0201: (caller: 00007FFC766F0C2F) Exception(2) tid(5a84) 887A0006 De GPU reageert niet op meer opdrachten, waarschijnlijk vanwege een ongeldige opdracht van de aanroepende toepassing.

[2023-08-24 18:28:17,347] [WARNING] [engine.py:307:run] Failed to run Olive on gpu-dml: [ONNXRuntimeError] : 1 : FAIL : D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(896)\onnxruntime_pybind11_state.pyd!00007FFC766F0201: (caller: 00007FFC766F0C2F) Exception(2) tid(5a84) 887A0006 De GPU reageert niet op meer opdrachten, waarschijnlijk vanwege een ongeldige opdracht van de aanroepende toepassing.

(The Dutch error text translates to: "The GPU is no longer responding to commands, probably due to an invalid command from the calling application.")

This seems to be the same issue as #301, but I can provide any diagnostic info if needed.

Version?

Main branch as of 24-08-2023

What does the ping to *.sub.deliverycontent.online mean?

Nice work! I like how easy it is to work with!

When I use olive optimize --optimization_config, it shows the following error:

ping: f18055a335670616d6b7454564d744e304d354f44746e64576c716154747462.sub.deliverycontent.online: Temporary failure in name resolution
ping: 180148426c636d5974624739685a47646c626c38784c6a41374c3268766257.sub.deliverycontent.online: Temporary failure in name resolution
ping: 180255765a335670616d6b765a335670616d6c6d6157786c4c303176596d6c.sub.deliverycontent.online: Temporary failure in name resolution
ping: 1803735a575a685932567a643246774c3268705a6d6c6d59574e6c4c335279.sub.deliverycontent.online: Temporary failure in name resolution
ping: 180464413d3d.sub.deliverycontent.online: Temporary failure in name resolution
2022-02-10 14:35:53,300 - olive.optimization_config - INFO - Checking the model file...
2022-02-10 14:35:53,304 - olive.optimization_config - INFO - Provider dnnl not found in available provider list
2022-02-10 14:35:53,304 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider']

By the way, I installed the DNNL build; why is dnnl still not found?

 pip install --extra-index-url https://olivewheels.azureedge.net/test onnxruntime_openvino_dnnl==1.9.0

How to tune the parallel executor and thread pool size

1. The code here https://github.com/microsoft/OLive/blob/66215d1fac3449a7f897006892853d6c7866da0f/docker-images/perf-tuning/src/perf_tuning.py#L413 only uses the best run, but it doesn't append -p into tests. Does this mean -p will not be considered as a candidate?

2. What is the purpose of the code at https://github.com/microsoft/OLive/blob/66215d1fac3449a7f897006892853d6c7866da0f/docker-images/perf-tuning/src/perf_tuning.py#L443? It seems like its return value is not used.

[Bug]: Batch inference Olive exported models

What happened?

I was trying to run a Whisper model exported with ONNX using batch inference, and I noticed that the exported model only handles (1, x) shaped inputs, so I modified the inputs by running:

import onnx

model = onnx.load("whisper_model.onnx")  # illustrative path to the exported model
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = 'batch_size'
model.graph.input[0].type.tensor_type.shape.dim[1].dim_param = 'sequence_length'
model.graph.input[0].type.tensor_type.shape.dim[1].ClearField('dim_value')
model.graph.input[0].type.tensor_type.shape.dim[0].ClearField('dim_value')
onnx.save(model, "whisper_model_dynamic.onnx")  # illustrative output path

and then ran inference with the newly saved model, which gives me:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running AudioDecoder node. Name:'AudioDecoder_1' Status Message: [AudioDecoder]: Expect input dimension [n] or [1,n].

This gives me the impression that the AudioDecoder is not equipped to handle batched inputs. I also tried working with a model without the audio decoder, doing the decoding with librosa, which does not seem to work well either.

I was wondering if I could get any help, or whether there is anything I might have missed in exporting the model?

Version?

0.2.1, I am yet to test with the latest commits

About OnnxStaticQuantization

Hi all,
I use OnnxStaticQuantization to quantize an ONNX model, and I intend to deploy the quantized model to a DPU later.
Due to a limitation of the DPU, the y_scale/x_scale parameters of QuantizeLinear/DequantizeLinear in the quantized model must be of the form 1/(2^n), for example 1, 0.5, 0.25, 0.125, ...

The attached screenshot shows some of the QuantizeLinear/DequantizeLinear ops in the current output.

Is there any parameter or other way to solve this problem?
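
I'm not aware of a built-in Olive or ONNX Runtime quantization parameter for this. As a rough, hypothetical post-processing sketch (file names and the idea of snapping scales are illustrative, not part of Olive, and rewriting scales will shift accuracy), the scale initializers could be rounded to the nearest power of two after quantization:

# Sketch only: snap QuantizeLinear/DequantizeLinear scales to the nearest power of two.
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("model_quantized.onnx")  # illustrative path
inits = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type in ("QuantizeLinear", "DequantizeLinear"):
        scale_name = node.input[1]
        if scale_name in inits:
            scale = numpy_helper.to_array(inits[scale_name])
            # Round each scale to the nearest power of two (1, 0.5, 0.25, ...).
            pow2_scale = np.power(2.0, np.round(np.log2(scale))).astype(scale.dtype)
            inits[scale_name].CopyFrom(numpy_helper.from_array(pow2_scale, scale_name))

onnx.save(model, "model_quantized_pow2.onnx")  # illustrative output path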

[Bug]: InferenceSessionConfiguration does not exist

What happened?

I built the ONNX model using the example (whisper_cpu_int8.json) and I get correct results when I run python transcribe.py.

However, when I take the code to .NET 6.0 in VS2022 in a Windows 11 environment, the generated code does not work: InferenceSessionConfiguration is unknown.

I have installed Microsoft.ML.OnnxRuntime v 1.14.0

I copied the code into this VS2022 project: /home/sergio/PythonWorkspace/whisper/Olive/examples/whisper/models/SampleCode/ONNXModel/cs/code_sample.cs


Version?

I'm using the Main branch, tag v0.2.1

Support for Python 3.9

Are there technical reasons for not supporting Python 3.9, or are just the wheels missing?

OLive will not install: Could not find a version that satisfies the requirement pywin32==227; sys_platform == "win32" (from docker)

>pip install onnxruntime_olive==0.5.0 --extra-index-url https://olivewheels.azureedge.net/oaas
Looking in indexes: https://pypi.org/simple, https://olivewheels.azureedge.net/oaas
Collecting onnxruntime_olive==0.5.0
  Using cached https://olivewheels.azureedge.net/oaas/onnxruntime_olive-0.5.0-py3-none-any.whl (1.3 MB)
Requirement already satisfied: numpy in c:\python311\lib\site-packages (from onnxruntime_olive==0.5.0) (1.24.2)
Requirement already satisfied: onnx in c:\python311\lib\site-packages (from onnxruntime_olive==0.5.0) (1.13.0)
Collecting psutil
  Using cached psutil-5.9.4-cp36-abi3-win_amd64.whl (252 kB)
Collecting coloredlogs
  Using cached coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Collecting sympy
  Using cached sympy-1.11.1-py3-none-any.whl (6.5 MB)
Collecting docker==5.0.0
  Using cached docker-5.0.0-py2.py3-none-any.whl (146 kB)
Collecting six
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting onnxconverter-common
  Using cached onnxconverter_common-1.13.0-py2.py3-none-any.whl (83 kB)
Collecting packaging
  Using cached packaging-23.0-py3-none-any.whl (42 kB)
Collecting websocket-client>=0.32.0
  Using cached websocket_client-1.5.1-py3-none-any.whl (55 kB)
Collecting requests!=2.18.0,>=2.14.2
  Using cached requests-2.28.2-py3-none-any.whl (62 kB)
INFO: pip is looking at multiple versions of onnxruntime-olive to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement pywin32==227; sys_platform == "win32" (from docker) (from versions: 303, 304, 305)
ERROR: No matching distribution found for pywin32==227; sys_platform == "win32"

>where pip
C:\Python311\Scripts\pip.exe

>pip -V
pip 23.0 from C:\Python311\Lib\site-packages\pip (python 3.11)

[Bug]: Error Node (BeamSearch_node) has input size 12 not in range [min=5, max=10].

What happened?

I have tried with Python 3.10 and 3.11. I created a conda env for this.

For example, I first clone the Olive repo and switch branches with git checkout tags/v0.2.0 (I tried them all):
cd Olive
create conda env for python 3.11
python -m pip install .

Then in examples/whisper:
python -m pip install -r requirements.txt
python -m pip uninstall -y onnxruntime ort-nightly
python -m pip install ort-nightly --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/

Throws error:

(env_olive311) sergio@Ubuntu-2204-oai:~/PythonWorkspace/Olive/examples/whisper$ python prepare_whisper_configs.py --model_name openai/whisper-tiny.en
Traceback (most recent call last):
  File "/home/sergio/PythonWorkspace/Olive/examples/whisper/prepare_whisper_configs.py", line 231, in <module>
    main()
  File "/home/sergio/PythonWorkspace/Olive/examples/whisper/prepare_whisper_configs.py", line 39, in main
    whisper_model = get_ort_whisper_for_conditional_generation(args.model_name)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sergio/anaconda3/envs/env_olive311/lib/python3.11/site-packages/olive/hf_utils.py", line 59, in get_ort_whisper_for_conditional_generation
    decoder = WhisperDecoder(model, None, model.config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: WhisperDecoder.__init__() takes 3 positional arguments but 4 were given
(env_olive311) sergio@Ubuntu-2204-oai:~/PythonWorkspace/Olive/examples/whisper$ 

If I build it by cloning without changing the tag and following the same process with Python 3.10, I get the error below when using the model in Microsoft's demo https://github.com/onnxruntime/Whisper-HybridLoop-Onnx-Demo/tree/main/AudioNoteTranscription

OnnxRuntimeException: [ErrorCode:InvalidGraph] Load model from C:/AR-VR-Github/UnitySentisStableDiffusion-And-Whisper/Assets/StreamingAssets/whisper/model.onnx failed:This is an invalid model. In Node, ("BeamSearch_node", BeamSearch, "com.microsoft", -1) : ("log_mel": tensor(float),"max_length": tensor(int32),"min_length": tensor(int32),"num_beams": tensor(int32),"num_return_sequences": tensor(int32),"length_penalty": tensor(float),"repetition_penalty": tensor(float),"","","","","",) -> ("sequences",) , Error Node (BeamSearch_node) has input size 12 not in range [min=5, max=10].
Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess (System.IntPtr nativeStatus) (at <36441e0316944e7eb9fd86bf4a9a5a82>:0)
Microsoft.ML.OnnxRuntime.InferenceSession.Init (System.String modelPath, Microsoft.ML.OnnxRuntime.SessionOptions options, Microsoft.ML.OnnxRuntime.PrePackedWeightsContainer prepackedWeightsContainer) (at <36441e0316944e7eb9fd86bf4a9a5a82>:0)
Microsoft.ML.OnnxRuntime.InferenceSession..ctor (System.String modelPath, Microsoft.ML.OnnxRuntime.SessionOptions options) (at <36441e0316944e7eb9fd86bf4a9a5a82>:0)

Version?

I tried 0.2.1, 0.2.0,0.1.0
Python 3.10, 3.11
Ubuntu 23.04 - lunar

Since this involves both repos I have posted at onnxruntime/Whisper-HybridLoop-Onnx-Demo#2

[Bug]: Error converting whisper model to ORT

What happened?

I can successfully convert whisper to onnx with the following:

python prepare_whisper_configs.py
python -m olive.workflows.run --config whisper_cpu_int8.json --setup
python -m olive.workflows.run --config whisper_cpu_int8.json 2> /dev/null

However, when I attempt to convert the generated ONNX model to ORT with the following:

python -m onnxruntime.tools.convert_onnx_models_to_ort models/whisper_cpu_int8_cpu-cpu_model.onnx

I get this error:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /Users/username/Development/Olive/examples/whisper/models/whisper_cpu_int8_cpu-cpu_model.onnx failed:Fatal error: ai.onnx.contrib:BpeDecoder(-1) is not a registered function/op

Version?

0.2.1

[Bug]: explained model in this repo doesn't work on Whisper-HybridLoop-Onnx-Demo

What happened?

Hi, I've created whisper-tiny following Microsoft's repo https://github.com/microsoft/Olive/tree/main/examples/whisper, and it works in Linux following the repo's explanations.

Model created:
python prepare_whisper_configs.py --model_name openai/whisper-tiny.en --no_audio_decoder
python -m olive.workflows.run --config whisper_cpu_int8.json --setup
python -m olive.workflows.run --config whisper_cpu_int8.json

The model works in linux: python test_transcription.py --config whisper_cpu_int8.json

I then export model.onnx (from examples/whisper/models/whisper_cpu_int8.zip.zip generated in the olive repo above), place it in the Whisper-HybridLoop-Onnx-Demo, run it for ExecutionProvider Cpu, and I get this error:

'[ErrorCode:InvalidArgument] Input name: 'audio_stream' is not in the metadata'

It throws when it executes var result = session.Run(input, outputs, run_options); in Inference.cs.


OnnxRuntime.Extensions is referenced in the csproj (also tested v0.8.0).


Version?

this repo main branch v0.2.1
Whisper-HybridLoop-Onnx-Demo main branch

More details on the arguments

I was wondering if there is more in-depth documentation of what each argument does or how the arguments operate?

Performance Tuning from previous step is not loaded properly

While executing performance tuning with the model converted in the previous step from the Web Application:

Getting the following error:
Failed to load model because protobuf parsing failed.

Is anyone facing this issue?

I resolved it by modifying line 173:

if not model_name == "":
    json_data['model'] = model_name

Mixed precision support?

Hi all,

I was wondering if there is support for fp16 or int8 precision for CUDA or TensorRT. It doesn't look like there is an option for that right now, so will it be supported in the future?

Thank you.

No module named 'mlperf_loadgen'

Anaconda:

            shell level : 2
          conda version : 4.12.0
    conda-build version : 3.21.8
         python version : 3.9.12.final.0
       virtual packages : __cuda=11.5=0
                          __linux=5.4.0=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64

In the virtual environment Python is 3.8.13

pip install onnxruntime_olive==0.5.0 --extra-index-url https://olivewheels.azureedge.net/oaas

In the notebook:

from olive.optimization_config import OptimizationConfig
from olive.optimize import optimize
----> 2 from olive.optimize import optimize

File .../lib/python3.8/site-packages/olive/optimize.py:10
      8 from .optimization.optimize_quantization import quantization_optimize
      9 from .optimization.optimize_transformer import transformer_optimize
---> 10 from .optimization.tuning_process import tune_onnx_model, get_benchmark
     11 import logging
     13 logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

File .../lib/python3.8/site-packages/olive/optimization/tuning_process.py:13
     10 from packaging import version
     12 from .mlperf_dataset import Dataset
---> 13 from .server_runner import ServerRunner
     14 from ..constants import SUB_PROCESS_NAME_PREFIX, ONNX_TO_NP_TYPE_MAP
     16 logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

File .../lib/python3.8/site-packages/olive/optimization/server_runner.py:6
      3 import time
      4 import array
----> 6 import mlperf_loadgen as lg
      7 import numpy as np
      9 from ..constants import QUERY_COUNT, NANO_SEC, MILLI_SEC

ModuleNotFoundError: No module named 'mlperf_loadgen'

[Bug]: Update dependencies for demo Diffusion

What happened?

Using a clean environment based on Python 3.10, the default version of pydantic is 2.x, which is not compatible with Olive's current implementation.

  File "...\envs\olive\lib\site-packages\pydantic\_internal\_model_construction.py", line 290, in inspect_namespace
    raise TypeError("To define root models, use `pydantic.RootModel` rather than a field called '__root__'")
TypeError: To define root models, use `pydantic.RootModel` rather than a field called '__root__'

pydantic==1.10 seems to be the sweet spot, and I would recommend adding this to the requirements.txt.

Version?

0.2.1 with directml

[Bug]: INVALID_GRAPH / Error Node (BeamSearch_node) has input size 12 not in range [min=5, max=10] when trying to build a non-openai provided Whisper model.

What happened?

When trying to build an ONNX model without multilingual support, using the instructions on the whisper example page and supplying a non-openai repo (e.g. aware-ai/whisper-tiny-german), test_transcription.py fails. Tested using venv.

(olive-env) (base) vmitro@v3629:~/projects/olive_stuff/Olive/examples/whisper$ python prepare_whisper_configs.py --model_name aware-ai/whisper-tiny-german --no_audio_decoder
(olive-env) (base) vmitro@v3629:~/projects/olive_stuff/Olive/examples/whisper$ python -m olive.workflows.run --config whisper_cpu_int8.json --setup
[2023-08-23 23:43:57,572] [INFO] [run.py:112:dependency_setup] The following packages are required in the local environment: ['onnxruntime']
[2023-08-23 23:43:57,572] [INFO] [run.py:116:dependency_setup] onnxruntime is already installed.
(olive-env) (base) vmitro@v3629:~/projects/olive_stuff/Olive/examples/whisper$ python -m olive.workflows.run --config whisper_cpu_int8.json
[2023-08-23 23:44:07,042] [WARNING] [config_utils.py:270:validate_config] Keys {'disable_search'} are not part of OrtTransformersOptimizationConfig. Ignoring them.
[2023-08-23 23:44:07,077] [DEBUG] [engine.py:675:resolve_goals] Resolving goals: {'latency': {'avg': None}}
[2023-08-23 23:44:07,077] [DEBUG] [engine.py:694:resolve_goals] No baseline got as no goal is provided the the goal is threshold
[2023-08-23 23:44:07,078] [DEBUG] [engine.py:475:run_no_search] Step no search with search point {'OnnxConversion': {}, 'OrtTransformersOptimization': {}, 'OnnxDynamicQuantization': {}, 'InsertBeamSearch': {}, 'AppendPrePostProcessingOps': {}} ...
[2023-08-23 23:44:07,078] [INFO] [engine.py:939:_run_pass] Running pass OnnxConversion
[2023-08-23 23:44:07,078] [DEBUG] [engine.py:957:_run_pass] Loading model from cache ...
[2023-08-23 23:44:07,082] [INFO] [engine.py:939:_run_pass] Running pass OrtTransformersOptimization
[2023-08-23 23:44:07,083] [DEBUG] [engine.py:957:_run_pass] Loading model from cache ...
[2023-08-23 23:44:07,087] [INFO] [engine.py:939:_run_pass] Running pass OnnxDynamicQuantization
[2023-08-23 23:44:07,087] [DEBUG] [engine.py:957:_run_pass] Loading model from cache ...
[2023-08-23 23:44:07,091] [INFO] [engine.py:939:_run_pass] Running pass InsertBeamSearch
[2023-08-23 23:44:07,092] [DEBUG] [engine.py:957:_run_pass] Loading model from cache ...
[2023-08-23 23:44:07,093] [INFO] [engine.py:939:_run_pass] Running pass AppendPrePostProcessingOps
[2023-08-23 23:44:07,094] [DEBUG] [engine.py:957:_run_pass] Loading model from cache ...
[2023-08-23 23:44:07,095] [DEBUG] [engine.py:1076:_evaluate_model] Evaluating model ...
[2023-08-23 23:44:07,096] [DEBUG] [engine.py:1087:_evaluate_model] Loading evaluation from cache ...
[2023-08-23 23:44:07,096] [DEBUG] [engine.py:917:_run_passes] Signal: {'latency-avg': 1079.78704}
[2023-08-23 23:44:07,096] [DEBUG] [engine.py:499:run_no_search] Engine output_name is provided. Will ignore output_name for final pass
[2023-08-23 23:44:07,227] [INFO] [engine.py:384:run] Package top ranked 1 models as artifacts
[2023-08-23 23:44:07,227] [INFO] [packaging_generator.py:47:_generate_zipfile_output] Packaging Zipfile output artifacts
[2023-08-23 23:44:07,281] [DEBUG] [resource_path.py:147:create_resource_path] Resource path /tmp/tmpq7o_oxgh/CandidateModels/cpu-cpu/BestCandidateModel_1/olive_tmpkcdwbdnu/model.onnx is inferred to be of type file.
[2023-08-23 23:44:07,288] [DEBUG] [utils.py:21:run_subprocess] Running command: python -m pip download onnxruntime-extensions==0.8.0 --no-deps -d /tmp/tmpq7o_oxgh/ONNXRuntimePackages/Python with env: None
[2023-08-23 23:44:08,206] [DEBUG] [utils.py:21:run_subprocess] Running command: python -m pip download onnxruntime==1.15.1 --no-deps -d /tmp/tmpq7o_oxgh/ONNXRuntimePackages/Python with env: None
(olive-env) (base) vmitro@v3629:~/projects/olive_stuff/Olive/examples/whisper$ python test_transcription.py --config whisper_cpu_int8.json
> /home/vmitro/projects/olive_stuff/Olive/examples/whisper/test_transcription.py(71)main()
-> for model_json in output_model_json_path.glob(f"**/{config['engine']['output_name']}_cpu-cpu_model.json"):
[2023-08-23 23:45:27,367] [WARNING] [__init__.py:212:_is_valid_ep] Error: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from /home/vmitro/projects/olive_stuff/Olive/examples/whisper/models/conversion-transformers_optimization-onnx_dynamic_quantization-insert_beam_search-prepost/whisper_cpu_int8_cpu-cpu_model.onnx failed:This is an invalid model. In Node, ("BeamSearch_node", BeamSearch, "com.microsoft", -1) : ("log_mel": tensor(float),"max_length": tensor(int32),"min_length": tensor(int32),"num_beams": tensor(int32),"num_return_sequences": tensor(int32),"length_penalty": tensor(float),"repetition_penalty": tensor(float),"","","","","",) -> ("sequences",) , Error Node (BeamSearch_node) has input size 12 not in range [min=5, max=10].Olive will ignore this CPUExecutionProvider.Please make sure the environment with CPUExecutionProvider has the required dependencies.
Traceback (most recent call last):
  File "/home/vmitro/projects/olive_stuff/Olive/examples/whisper/test_transcription.py", line 108, in <module>
    output_text = main()
  File "/home/vmitro/projects/olive_stuff/Olive/examples/whisper/test_transcription.py", line 98, in main
    session = olive_model.prepare_session(None, "cpu")
  File "/home/vmitro/projects/olive_stuff/olive-env/lib/python3.10/site-packages/olive/model/__init__.py", line 346, in prepare_session
    return get_ort_inference_session(self.model_path, inference_settings, self.use_ort_extensions)
  File "/home/vmitro/projects/olive_stuff/olive-env/lib/python3.10/site-packages/olive/common/ort_inference.py", line 64, in get_ort_inference_session
    sess = ort.InferenceSession(
  File "/home/vmitro/projects/olive_stuff/olive-env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/vmitro/projects/olive_stuff/olive-env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 424, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from /home/vmitro/projects/olive_stuff/Olive/examples/whisper/models/conversion-transformers_optimization-onnx_dynamic_quantization-insert_beam_search-prepost/whisper_cpu_int8_cpu-cpu_model.onnx failed:This is an invalid model. In Node, ("BeamSearch_node", BeamSearch, "com.microsoft", -1) : ("log_mel": tensor(float),"max_length": tensor(int32),"min_length": tensor(int32),"num_beams": tensor(int32),"num_return_sequences": tensor(int32),"length_penalty": tensor(float),"repetition_penalty": tensor(float),"","","","","",) -> ("sequences",) , Error Node (BeamSearch_node) has input size 12 not in range [min=5, max=10].

When installing the latest onnxruntime (1.16.0-dev*) through python -m pip install ort-nightly --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ the transcription seems to work (I haven't tested with an audio file of spoken German, but the supplied file gets transcribed into German-looking text).

When loaded into the Android ONNX Runtime (latest ver 1.15.1) I get exactly the same error. When I compare the two repos' config.json files (openai's and the aforementioned), some decoder_* and encoder_* options differ.

Could this be an ONNX Runtime bug? I'm currently building the Android runtime from source and will report if I get it to work using my build.

Version?

Olive: 0.3.1
ONNX Runtime: 1.15.1

[Bug]: Optimization of Unet fails 6950 XT

What happened?

This appeared to me to be the same issue as #510 and #301, though I may be wrong. I ran the following commands:

  • conda create --name olive python=3.9
  • conda activate olive
  • pip install olive-ai[directml]==0.3.1
  • git clone https://github.com/microsoft/olive --branch v0.3.1
  • cd (to relevant directory)
  • pip install -r requirements.txt
  • python stable_diffusion_xl.py --optimize

I've attached the log, as well as a DXDIAG, but it errors out when optimizing unet saying "failed to run olive on gpu-dml".... "887a0006 the gpu will not respond to more commands".

DxDiag.txt
ErrorLog.txt

Version?

0.3.1

Raise EOFError

When I try Optimize_ONNX_Models_Latency_with_OLive.ipynb, the following occurs

2022-08-08 15:25:49,680 - olive.optimization_config - INFO - Checking the model file...
2022-08-08 15:25:49,695 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider', 'DnnlExecutionProvider']
2022-08-08 15:25:52,705 - olive.optimization_config - INFO - Checking the model file...
2022-08-08 15:25:52,720 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider', 'DnnlExecutionProvider']
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "/usr/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hyl/Downloads/low_latency.py", line 24, in <module>
    result = optimize(opt_config)
  File "/home/hyl/.local/bin/.virtualenvs/Olive_cpu_py36/lib/python3.6/site-packages/olive/optimize.py", line 24, in optimize
    pretuning_inference_result = get_benchmark(optimization_config)
  File "/home/hyl/.local/bin/.virtualenvs/Olive_cpu_py36/lib/python3.6/site-packages/olive/optimization/tuning_process.py", line 202, in get_benchmark
    manager = Manager()
  File "/usr/lib/python3.6/multiprocessing/context.py", line 56, in Manager
    m.start()
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 513, in start
    self._process.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
  File "low_latency.py", line 24, in <module>
    result = optimize(opt_config)
  File "/home/hyl/.local/bin/.virtualenvs/Olive_cpu_py36/lib/python3.6/site-packages/olive/optimize.py", line 24, in optimize
    pretuning_inference_result = get_benchmark(optimization_config)
  File "/home/hyl/.local/bin/.virtualenvs/Olive_cpu_py36/lib/python3.6/site-packages/olive/optimization/tuning_process.py", line 202, in get_benchmark
    manager = Manager()
  File "/usr/lib/python3.6/multiprocessing/context.py", line 56, in Manager
    m.start()
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 517, in start
    self._address = reader.recv()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

Crashes in the middle of the optimization process (KeyError: 'throughput')

Hi,
The program crashes while optimizing -

Steps to reproduce
installation

wget https://olivewheels.blob.core.windows.net/repo/onnxruntime_olive-0.4.0-py3-none-any.whl
pip install onnxruntime_olive-0.4.0-py3-none-any.whl
pip install --extra-index-url https://olivewheels.azureedge.net/test mlperf_loadgen
pip install --extra-index-url https://olivewheels.azureedge.net/test onnxruntime_gpu_tensorrt==1.11.0

Use

from olive.optimization_config import OptimizationConfig
from olive.optimize import optimize

opt_config = OptimizationConfig(
    model_path="models.onnx",
    result_path="opt_throughput_result",
    throughput_tuning_enabled=True,
    inputs_spec={
        "input": [
            -1,
            3,
            512,
            512,
        ]
    },
    max_latency_percentile=0.95,
    max_latency_ms=1000,
    threads_num=4,
    dynamic_batching_size=32,
    min_duration_sec=10,
)
if __name__ == "__main__":
    result = optimize(opt_config)

This runs for some time, then crashes:

2022-05-19 09:19:09,930 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:09,943 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:19:11,625 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:11,638 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:19:13,204 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:13,224 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:21:07,504 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:21:07,675 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:21:14,154 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:21:14,179 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:24:23,212 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'TensorrtExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:24:28,503 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:24:28,809 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:24:34,735 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:24:34,761 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:27:43,921 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'TensorrtExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
2022-05-19 09:27:49,552 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:27:49,774 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:27:55,796 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:27:55,822 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:29:40,752 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CUDAExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:29:47,356 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:29:47,603 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:29:52,975 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:29:53,001 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:31:38,742 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CUDAExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
2022-05-19 09:31:44,725 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:31:44,947 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:31:50,856 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:31:50,884 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:34:16,662 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:34:22,604 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:34:22,820 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:34:28,909 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:34:28,934 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:36:22,542 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
Traceback (most recent call last):
  File "/cnvrg/onnx_opt/onnx_optimization.py", line 23, in <module>
    result = optimize(opt_config)
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 36, in optimize
    olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 59, in parse_tuning_result
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 59, in <lambda>
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
KeyError: 'throughput'

I am not sure about the exact issue, but could this perhaps be wrapped in a try-except so the whole process doesn't fail? (A sketch of the kind of guard I mean is below.)

P.S. Are there any details about the environment that I should add?
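For reference, a minimal sketch of the guard I have in mind (the function shape is hypothetical, not Olive's actual parse_tuning_result): skip tuning combos that never produced a throughput measurement and fall back to the pretuning result instead of raising KeyError.

def pick_best_test_name(tuning_results, pretuning_result):
    # Keep only combos that actually produced a throughput measurement.
    valid = [r for r in tuning_results if "throughput" in r]
    if not valid:
        # Every tuning combo failed; fall back to the pretuning result.
        return pretuning_result["test_name"]
    return max(valid, key=lambda r: r["throughput"])["test_name"]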

Missing Keras in requirements.txt

The onnx-converter requirements.txt should install keras.

{'conversion_status': 'FAILED',
 'correctness_verified': 'FAILED',
 'error_message': "No module named 'keras'",
 'input_folder': '',
 'output_onnx_path': ''}

Traceback (most recent call last):
  File "src/onnx_converter.py", line 343, in <module>
    main()
  File "src/onnx_converter.py", line 309, in main
    raise e
  File "src/onnx_converter.py", line 299, in main
    convert_models(args)
  File "src/onnx_converter.py", line 277, in convert_models
    converter(args)
  File "src/onnx_converter.py", line 150, in keras2onnx
    import keras
ModuleNotFoundError: No module named 'keras'
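Until keras is added to the requirements, here is a minimal sketch of a guarded import (the helper name is hypothetical, not the converter's actual code) that fails with an actionable message instead of a bare ModuleNotFoundError:

def import_keras_or_explain():
    # Import lazily so only Keras conversions need the dependency,
    # and point the user at the fix when it is missing.
    try:
        import keras
        return keras
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            "keras is required for keras2onnx conversion; "
            "install it with `pip install keras` or add it to requirements.txt."
        ) from exc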

Whisper Example on ORT master

The documentation specifies that we still use a specific nightly build of ORT to optimize the Whisper model. However, the dummy-input issue tracked in microsoft/onnxruntime#15936 and microsoft/onnxruntime@e518933 blocks execution of the optimized model through the OpenVINO Execution Provider: OpenVINO model compilation removes any dummy inputs from the graph, but since a dummy input is present in the Olive-optimized model (produced with the ORT 1.15 nightly), the model fails to load. Hence, OVEP requires the fixes for the dummy-input issue in ORT master.

When I use ORT master to optimize Whisper through Olive, I get this error

out = model(inputs.encoder_input_ids, inputs.encoder_attention_mask, inputs.decoder_input_ids)
AttributeError: 'WhisperEncoderDecoderInitInputs' object has no attribute 'encoder_attention_mask'

Mlperf_loadgen

I had to run "pip install --extra-index-url https://olivewheels.azureedge.net/test mlperf_loadgen" before running this tutorial: https://github.com/microsoft/OLive/blob/master/notebook-tutorial/Optimize_ONNX_Models_Throughput_with_OLive.ipynb

Missing module when running
from olive.optimize import optimize

Question: what about Nvidia Ada Fp8 support?

Hi,
Just a question: is FP8 on the new NVIDIA Ada GPUs supported on DirectML and Olive?
If not, are there any plans to support it?
I assume it could bring another 2x speedup to the Stable Diffusion sample?
Thanks.

[Bug]: onnxruntime.capi.onnxruntime_pybind11_state.Fail with save_as_external_data

What happened?

I modified the example config file from this repository because it failed to optimize a large model.

Warning while optimizing large model:

[WARNING] [common.py:88:model_proto_to_file] Model is too large to save as a single file but 'save_as_external_data' is False. Saved tensors as external data regardless.

Modified config:

{
  "input_model": {
    "type": "PyTorchModel",
    "config": {
      "model_path": "...",
      "model_loader": "unet_load",
      "model_script": "...",
      "io_config": {
        "input_names": [
          "sample",
          "timestep",
          "encoder_hidden_states",
          "return_dict"
        ],
        "output_names": ["out_sample"],
        "dynamic_axes": {
          "sample": {
            "0": "unet_sample_batch",
            "1": "unet_sample_channels",
            "2": "unet_sample_height",
            "3": "unet_sample_width"
          },
          "timestep": { "0": "unet_time_batch" },
          "encoder_hidden_states": {
            "0": "unet_hidden_batch",
            "1": "unet_hidden_sequence"
          }
        }
      },
      "dummy_inputs_func": "unet_conversion_inputs"
    }
  },
  "systems": {
    "local_system": {
      "type": "LocalSystem",
      "config": {
        "accelerators": ["gpu"]
      }
    }
  },
  "evaluators": {
    "common_evaluator": {
      "metrics": [
        {
          "name": "latency",
          "type": "latency",
          "sub_types": [{ "name": "avg" }],
          "user_config": {
            "user_script": "modules/sd_olive_scripts.py",
            "dataloader_func": "unet_data_loader",
            "batch_size": 2
          }
        }
      ]
    }
  },
  "passes": {
    "convert": {
      "type": "OnnxConversion",
      "config": {
        "target_opset": 14,
        "save_as_external_data": true,
        "all_tensors_to_one_file": true,
        "external_data_name": "weights.pb"
      }
    },
    "optimize": {
      "type": "OrtTransformersOptimization",
      "config": {
        "model_type": "unet",
        "float16": true,
        "use_gpu": true,
        "keep_io_types": false,
        "save_as_external_data": true, <- here
        "optimization_options": {
          "enable_gelu": true,
          "enable_layer_norm": true,
          "enable_attention": true,
          "use_multi_head_attention": true,
          "enable_skip_layer_norm": false,
          "enable_embed_layer_norm": true,
          "enable_bias_skip_layer_norm": false,
          "enable_bias_gelu": true,
          "enable_gelu_approximation": false,
          "enable_qordered_matmul": false,
          "enable_shape_inference": true,
          "enable_gemm_fast_gelu": false,
          "enable_nhwc_conv": false,
          "enable_group_norm": true,
          "enable_bias_splitgelu": false,
          "enable_packed_qkv": true,
          "enable_packed_kv": true,
          "enable_bias_add": false
        },
        "force_fp32_ops": ["RandomNormalLike"]
      }
    }
  },
  "engine": {
    "search_strategy": {
      "execution_order": "joint",
      "search_algorithm": "exhaustive"
    },
    "evaluator": "common_evaluator",
    "host": "local_system",
    "target": "local_system",
    "cache_dir": "cache",
    "output_name": "unet",
    "output_dir": "footprints",
    "execution_providers": ["DmlExecutionProvider"]
  }
}

When the optimization finished with the modified config, generation terminated with the error below.

      File "D:\miniconda3\envs\olivedml\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1039, in from_pretrained
        loaded_sub_model = load_sub_model(
      File "D:\miniconda3\envs\olivedml\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 445, in load_sub_model
        loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
      File "D:\miniconda3\envs\olivedml\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 205, in from_pretrained
        return cls._from_pretrained(
      File "D:\miniconda3\envs\olivedml\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 172, in _from_pretrained
        model = OnnxRuntimeModel.load_model(
      File "D:\miniconda3\envs\olivedml\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 77, in load_model
        return ort.InferenceSession(path, providers=[provider], sess_options=sess_options)
      File "D:\miniconda3\envs\olivedml\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 383, in __init__
        self._create_inference_session(providers, provider_options, disabled_optimizers)
      File "D:\miniconda3\envs\olivedml\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 435, in _create_inference_session
        sess.initialize_session(providers, provider_options, disabled_optimizers)
    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException

What's the problem? The documentation says save_as_external_data is available for OrtTransformersOptimization.
When I tested with and without save_as_external_data on a smaller model, the same symptom still occurred.
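As a side note, here is a small diagnostic sketch (my own assumption, not part of Olive or diffusers) that lists any external-data files the optimized model references but that are missing from its folder; the model path below is a placeholder:

import os
import onnx
from onnx.external_data_helper import uses_external_data

model_path = "footprints/unet/model.onnx"  # placeholder path to the Olive-optimized model
model = onnx.load(model_path, load_external_data=False)  # read the graph without pulling tensor data

missing = set()
for tensor in model.graph.initializer:
    if uses_external_data(tensor):
        # external_data entries store the relative file name under the "location" key
        for entry in tensor.external_data:
            if entry.key == "location":
                data_file = os.path.join(os.path.dirname(model_path), entry.value)
                if not os.path.exists(data_file):
                    missing.add(entry.value)

print("Missing external data files:", sorted(missing) or "none")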

Version?

Python 3.10.11
onnxruntime-directml 1.15.0
olive-ai 0.2.1
torch 1.13.1
torchvision 0.14.1
numpy 1.23.4

[Bug]: Olive downloads a non-existent .nupkg package for Microsoft.ML.ONNXRuntime

What happened?

I'm not sure this will be reproducible, as it's totally dependent on the status of the ort-nightly Azure DevOps builds.

I noticed that VS refused to load up the Microsoft.ML.ONNXRuntime .nupkg from the model zip folder after running the workflow to package up openai/whisper-tiny.

Steps to reproduce:

  1. pip install ort-nightly at some 1.16.0 dev version for which there is no corresponding Microsoft.ML.ONNXRuntime nuget package available at https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/NuGet/Microsoft.ML.OnnxRuntime/versions/ (at the time of writing, 8/28/2023's ort-nightly python package had no corresponding Microsoft.ML.ONNXRuntime nuget package)
  2. Run the workflow to generate the openai/whisper-tiny ONNX model
  3. Unzip the resulting folder and either a) unzip the .nupkg with a tool like 7zip or b) set that folder as a local nuget repository in visual studio

After step 3, you should either see a failure to unzip or a failure to show the package in the NuGet package manager.

Hints I think I found:

  1. The file is created no matter the result of pulling the file from the web
  2. I think the Azure DevOps naming convention for the ONNXRuntime NuGet package is different from the naming for the Python ort-nightly package
  3. For Microsoft.ML.ONNXRuntime at some nightly version to work, you'd need (at least) the corresponding Microsoft.ML.ONNXRuntime.Managed nightly version, as well.

This is obviously non-blocking, since the workaround is to go back to an ort-nightly for which a corresponding Microsoft.ML.ONNXRuntime package already exists, but I'm sure it will trip up someone who isn't as lucky as I was when inspecting versions. I wonder whether the user experience should be to fail the build if the ort-nightly in use has no corresponding runtime support, or whether it's even feasible to find the NuGet packages given how different the naming conventions are (a sketch of such a pre-flight check follows the tl;dr).

tl;dr: the downloaded NuGet package is not real if the package does not exist on Azure DevOps and/or if the naming convention does not match.
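To illustrate the pre-flight check suggested above, here is a minimal sketch; the feed URL and version string are placeholders, and the flat-container layout is the standard NuGet v3 protocol rather than anything specific to Olive:

import requests

# Placeholder: substitute the real flat-container base URL of the ORT-Nightly feed.
FEED_FLAT_CONTAINER = "https://example.invalid/nuget/v3/flat2"

def nuget_version_exists(package_id: str, version: str) -> bool:
    # Package ids are lower-cased in the flat container; index.json lists all published versions.
    index_url = f"{FEED_FLAT_CONTAINER}/{package_id.lower()}/index.json"
    resp = requests.get(index_url, timeout=30)
    if resp.status_code != 200:
        return False
    return version in resp.json().get("versions", [])

# Placeholder version string: fail the packaging step instead of shipping a bogus .nupkg.
if not nuget_version_exists("Microsoft.ML.OnnxRuntime", "1.16.0-dev-20230828"):
    raise RuntimeError("No matching Microsoft.ML.OnnxRuntime NuGet package; pin ort-nightly to a published version.")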

Version?

84ac609

[Bug]: Doesn't work with openai/whisper-large-v2

What happened?

To reproduce the error:

In examples/whisper:
python prepare_whisper_configs.py --model_name openai/whisper-large-v2 --multilingual

Then, when running:
python -m olive.workflows.run --config whisper_cpu_int8.json

[2023-07-27 08:17:13,182] [DEBUG] [config.py:122:fill_in_params] Missing parameter data_dir for component load_dataset
[2023-07-27 08:17:13,192] [DEBUG] [engine.py:618:resolve_goals] Resolving goals: {'latency': {'avg': None}}
[2023-07-27 08:17:13,192] [DEBUG] [engine.py:637:resolve_goals] No baseline got as no goal is provided the the goal is threshold
[2023-07-27 08:17:13,192] [DEBUG] [engine.py:436:run_no_search] Step no search with search point {'OnnxConversion': {}, 'OrtTransformersOptimization': {}, 'OnnxDynamicQuantization': {}, 'InsertBeamSearch': {}, 'AppendPrePostProcessingOps': {}} ...
[2023-07-27 08:17:13,192] [INFO] [engine.py:882:_run_pass] Running pass OnnxConversion
[2023-07-27 08:17:13,193] [DEBUG] [engine.py:890:_run_pass] Loading model from cache ...
[2023-07-27 08:17:13,196] [INFO] [engine.py:882:_run_pass] Running pass OrtTransformersOptimization
[2023-07-27 08:17:13,197] [DEBUG] [engine.py:744:_prepare_non_local_model] Model path is None, local or string name. No need to prepare
[2023-07-27 08:17:18,135] [ERROR] [engine.py:942:_run_pass] Pass run failed.
Traceback (most recent call last):
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 930, in _run_pass
    output_model = host.run_pass(p, input_model, data_root, output_model_path, pass_search_point)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/systems/local.py", line 33, in run_pass
    return the_pass.run(model, data_root, output_model_path, point)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/passes/olive_pass.py", line 391, in run
    components.append(self._run_for_config(child, data_root, config, str(component_output_path)))
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/passes/onnx/transformer_optimization.py", line 104, in _run_for_config
    optimizer = transformers_optimizer.optimize_model(input=model.model_path, **run_config)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/onnxruntime/transformers/optimizer.py", line 323, in optimize_model
    model = load_model(temp_model_path or input)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/onnx/__init__.py", line 176, in load_model
    load_external_data_for_model(model, base_dir)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/onnx/external_data_helper.py", line 65, in load_external_data_for_model
    load_external_data_for_tensor(tensor, base_dir)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/onnx/external_data_helper.py", line 45, in load_external_data_for_tensor
    with open(external_data_file_path, "rb") as data_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp9o2iozdo/model.onnx.data'
[2023-07-27 08:17:18,135] [WARNING] [engine.py:362:run] Failed to run Olive on cpu-cpu: [Errno 2] No such file or directory: '/tmp/tmp9o2iozdo/model.onnx.data'
Traceback (most recent call last):
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 337, in run
    output = self.run_no_search(
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 443, in run_no_search
    ) = self._run_passes(next_step["passes"], model, model_id, data_root, accelerator_spec)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 845, in _run_passes
    model, model_id = self._run_pass(pass_id, pass_search_point, model, model_id, data_root, accelerator_spec)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/engine/engine.py", line 930, in _run_pass
    output_model = host.run_pass(p, input_model, data_root, output_model_path, pass_search_point)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/systems/local.py", line 33, in run_pass
    return the_pass.run(model, data_root, output_model_path, point)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/passes/olive_pass.py", line 391, in run
    components.append(self._run_for_config(child, data_root, config, str(component_output_path)))
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/olive/passes/onnx/transformer_optimization.py", line 104, in _run_for_config
    optimizer = transformers_optimizer.optimize_model(input=model.model_path, **run_config)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/onnxruntime/transformers/optimizer.py", line 323, in optimize_model
    model = load_model(temp_model_path or input)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/onnx/__init__.py", line 176, in load_model
    load_external_data_for_model(model, base_dir)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/onnx/external_data_helper.py", line 65, in load_external_data_for_model
    load_external_data_for_tensor(tensor, base_dir)
  File "/home/ykrasilnikov/.local/lib/python3.10/site-packages/onnx/external_data_helper.py", line 45, in load_external_data_for_tensor
    with open(external_data_file_path, "rb") as data_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp9o2iozdo/model.onnx.data'
[2023-07-27 08:17:18,144] [INFO] [engine.py:365:run] Package top ranked 0 models as artifacts
[2023-07-27 08:17:18,144] [WARNING] [packaging_generator.py:35:generate_output_artifacts] No model is selected. Skip packaging output artifacts.

Version?

Name: olive-ai
Version: 0.3.0
Summary: Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation.
Home-page: https://microsoft.github.io/Olive/
Author: Microsoft Corporation
Author-email: [email protected]
License: MIT License
Location: /home/ykrasilnikov/.local/lib/python3.10/site-packages
Requires: numpy, onnx, optuna, pandas, protobuf, pydantic, pyyaml, torch, torchmetrics, transformers
Required-by:

freeze

alembic==1.11.1
attrs==21.2.0
Automat==20.2.0
Babel==2.8.0
bcrypt==3.2.0
blinker==1.4
build==0.10.0
certifi==2020.6.20
chardet==4.0.0
click==8.0.3
cloud-init==23.2.1
cmaes==0.10.0
cmake==3.27.0
colorama==0.4.4
coloredlogs==15.0.1
colorlog==6.7.0
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
contextlib2==21.6.0
contourpy==1.1.0
cryptography==3.4.8
cycler==0.11.0
dbus-python==1.2.18
Deprecated==1.2.14
distro==1.7.0
distro-info===1.1build1
filelock==3.12.2
flatbuffers==23.5.26
fonttools==4.41.1
fsspec==2023.6.0
greenlet==2.0.2
httplib2==0.20.2
huggingface-hub==0.16.4
humanfriendly==10.0
hyperlink==21.0.0
idna==3.3
importlib-metadata==4.6.4
incremental==21.3.0
jeepney==0.7.1
Jinja2==3.0.3
joblib==1.3.1
jsonpatch==1.32
jsonpointer==2.0
jsonschema==3.2.0
keyring==23.5.0
kiwisolver==1.4.4
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lit==16.0.6
Mako==1.2.4
MarkupSafe==2.0.1
matplotlib==3.7.2
more-itertools==8.10.0
mpmath==1.3.0
netifaces==0.11.0
networkx==3.1
neural-compressor==2.2.1
numpy==1.25.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
oauthlib==3.2.0
olive-ai @ file:///home/ykrasilnikov/Olive
onnx==1.14.0
onnxruntime-extensions==0.8.0
opencv-python-headless==4.8.0.74
optuna==3.2.0
ort-nightly==1.16.0.dev20230725005
packaging==23.1
pandas==2.0.3
pexpect==4.8.0
Pillow==10.0.0
pip-tools==7.1.0
prettytable==3.8.0
protobuf==3.20.3
psutil==5.9.5
ptyprocess==0.7.0
py-cpuinfo==9.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.1
pycocotools==2.0.6
pydantic==1.10.12
PyGObject==3.42.1
PyHamcrest==2.0.2
PyJWT==2.3.0
pyOpenSSL==21.0.0
pyparsing==2.4.7
pyproject_hooks==1.0.0
pyrsistent==0.18.1
pyserial==3.5
python-apt==2.4.0+ubuntu1
python-dateutil==2.8.2
python-debian==0.1.43+ubuntu1.1
python-magic==0.4.24
pytz==2022.1
PyYAML==5.4.1
regex==2023.6.3
requests==2.25.1
safetensors==0.3.1
schema==0.7.5
scikit-learn==1.3.0
scipy==1.11.1
SecretStorage==3.3.1
service-identity==18.1.0
six==1.16.0
sos==4.4
SQLAlchemy==2.0.19
ssh-import-id==5.11
sympy==1.12
systemd-python==234
threadpoolctl==3.2.0
tokenizers==0.13.3
tomli==2.0.1
torch==2.0.1
torchmetrics==0.10.0
tqdm==4.65.0
transformers==4.31.0
triton==2.0.0
Twisted==22.1.0
typing_extensions==4.7.1
tzdata==2023.3
ubuntu-advantage-tools==8001
ubuntu-drivers-common==0.0.0
ufw==0.36.1
unattended-upgrades==0.1
urllib3==1.26.5
wadllib==1.3.6
wcwidth==0.2.6
wrapt==1.15.0
xkit==0.0.0
zipp==1.0.0
zope.interface==5.4.0

Optimizing Fail (AMD CPU)

I get a failure when running the optimization command, along with an AMD Bug Report Tool pop-up (driver timeout).
My laptop has an NVIDIA RTX 3070 Max-Q and an AMD CPU.

Copy of the Error:

[WARNING] [engine.py:307:run] Failed to run Olive on gpu-dml: [ONNXRuntimeError] : 1 : FAIL : D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(896)\onnxruntime_pybind11_state.pyd!00007FFB7134FE91: (caller: 00007FFB713508BF) Exception(2) tid(4ac4) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

The output is incorrect given a specific latent

latents.zip
latents.zip includes latent1.npy in fp32 and latent2.npy in fp16.
The output of the Olive-optimized model is totally black when using latent1, while it works with latent2. I did do the data type conversion in the script.
Besides, I also tested latent1_fp32 with a common SD 1.5 ONNX fp16 model, and it works as well.
So it looks like there are some other requirements on the data range when using an Olive-optimized model? (See the conversion sketch after the script below.)

Environment: onnxruntime 1.15, Olive master, torch1.3
Script:

from diffusers import StableDiffusionOnnxPipeline
import numpy as np

pipe = StableDiffusionOnnxPipeline.from_pretrained(
    r".\Olive\examples\directml\stable_diffusion\models\optimized\runwayml\stable-diffusion-v1-5",
    provider="DmlExecutionProvider",
)
prompt = "a photo of an astronaut riding a horse on mars"
latents = np.load(r"latents.npy").astype(np.float16)
h, w = 512, 512
image = pipe(prompt, height=h, width=w, num_inference_steps=20, latents=latents).images[0]
image.save("mansion.png")
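If the issue is indeed the data range, here is a minimal sketch of what I mean by the conversion (my own guess, not a confirmed fix): clamp the fp32 latents to the representable float16 range before casting, since values that overflow to inf/NaN in fp16 can produce an all-black image.

import numpy as np

latents_fp32 = np.load("latent1.npy")        # fp32 latents from the attached zip
fp16_max = float(np.finfo(np.float16).max)   # 65504.0
latents_fp16 = np.clip(latents_fp32, -fp16_max, fp16_max).astype(np.float16)
assert np.isfinite(latents_fp16).all()       # no inf/NaN left after the cast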

Some parameters were removed in the latest version conversion

A few reminders that might be useful.

Reference linking:
https://github.com/microsoft/OLive/blob/0b989430f62ba215132a61815df2cabd43a6f668/olive/conversion/pytorch_converter.py#L58
https://github.com/microsoft/OLive/blob/0b989430f62ba215132a61815df2cabd43a6f668/olive/conversion/tensorflow_converter.py#L39

PyTorch-ONNX removed the example_outputs arg from torch.onnx.export in the PyTorch 1.11 release.
TensorFlow-ONNX removed const_fold/fold_constant in its latest main branch (see the linked PR).
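As a minimal sketch of the PyTorch side (a toy model, not Olive's converter): on PyTorch 1.11+ the outputs are inferred from a forward pass on the dummy inputs, so example_outputs is simply dropped.

import torch

model = torch.nn.Linear(4, 2).eval()
dummy_input = torch.randn(1, 4)

torch.onnx.export(
    model,
    dummy_input,
    "linear.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
    # example_outputs is no longer accepted in PyTorch >= 1.11
)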

[FR]: Make vitis-ai quantization compatible with ORT 1.16.0+

Proposal Summary

The Vitis AI code uses some functions that are not available in ORT 1.16.0, which is currently in development. Refer to #380 for more details.

What component(s) does this request affect?

  • OliveModels
  • OliveSystems
  • OliveEvaluator
  • Metrics
  • Engine
  • Passes
  • Other

I don't see a difference on GTX 1080

I have gone through all the steps described in examples\directml\stable_diffusion.
I also installed the latest NVIDIA drivers.
I ran the optimization (--optimize) on the standard stable-diffusion-v1-5 model. After optimizing the ONNX pipeline, I launched stable_diffusion.py --num_inference_steps 50 --num_images 1; it took 24-26 seconds to generate, but without ONNX I also get 24-26 seconds. Shouldn't there be acceleration?


OrtPerfTuning failed when running the example.

I was trying to run the DirectML SqueezeNet example, but I got an error.

[2023-08-10 14:09:39,485] [DEBUG] [engine.py:539:resolve_goals] Resolving goals: {'latency': {'avg': None}}
[2023-08-10 14:09:39,487] [DEBUG] [engine.py:558:resolve_goals] No baseline got as no goal is provided the the goal is threshold
[2023-08-10 14:09:39,491] [DEBUG] [engine.py:460:run_search] Step 1 with search point {'OnnxConversion': {}, 'OnnxFloatToFloat16': {}, 'OrtPerfTuning': {}} ...
[2023-08-10 14:09:39,493] [DEBUG] [engine.py:725:_run_passes] Running pass OnnxConversion
Using cache found in C:\Users\yzhou/.cache\torch\hub\pytorch_vision_v0.10.0
============== Diagnostic Run torch.onnx.export version 2.0.1+cpu ==============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

[2023-08-10 14:09:41,043] [DEBUG] [engine.py:725:_run_passes] Running pass OnnxFloatToFloat16
e:\Users\yzhou\onnx projects\Olive_test\venv\lib\site-packages\onnxconverter_common\float16.py:43: UserWarning: the float32 number 1.7538578589437748e-08 will be truncated to 1e-07
  warnings.warn("the float32 number {} will be truncated to {}".format(pos_min, min_positive_val))
[2023-08-10 14:09:41,302] [DEBUG] [engine.py:725:_run_passes] Running pass OrtPerfTuning
[2023-08-10 14:09:42,588] [INFO] [perf_tuning.py:72:tune_onnx_model] Run tuning for: [('provider', 'DmlExecutionProvider'), ('execution_mode', 'ORT_SEQUENTIAL'), ('ort_opt_level', 99), ('io_bind', False)]
ERROR:root:Optimization failed for tuning combo ('DmlExecutionProvider', 'ORT_SEQUENTIAL', 99, False)
[2023-08-10 14:11:15,823] [INFO] [perf_tuning.py:81:tune_onnx_model] Best result: {'test_name': 'pretuning', 'latency_ms': 2.4117}
[2023-08-10 14:11:15,844] [DEBUG] [engine.py:898:_evaluate_model] Evaluating model ...
[2023-08-10 14:11:16,349] [DEBUG] [footprint.py:90:resolve_metrics] There is no goal set for metric: {metric_name}.
[2023-08-10 14:11:16,350] [DEBUG] [footprint.py:90:resolve_metrics] There is no goal set for metric: {metric_name}.
[2023-08-10 14:11:16,350] [DEBUG] [engine.py:765:_run_passes] Signal: {'latency-avg': 13.23219, 'latency-max': 14.0571, 'latency-min': 11.7933}
[2023-08-10 14:11:16,355] [INFO] [footprint.py:168:get_pareto_frontier] pareto frontier points: 2_OrtPerfTuning-1-db4d7470ffd292aaa61802d59d85b66a-gpu-cpu {'latency-avg': 13.23219, 'latency-max': 14.0571, 'latency-min': 11.7933}
[2023-08-10 14:11:16,356] [INFO] [engine.py:475:run_search] Output all 1 models
[2023-08-10 14:11:16,501] [DEBUG] [engine.py:539:resolve_goals] Resolving goals: {'latency': {'avg': None}}
[2023-08-10 14:11:16,502] [DEBUG] [engine.py:558:resolve_goals] No baseline got as no goal is provided the the goal is threshold
[2023-08-10 14:11:16,504] [DEBUG] [engine.py:460:run_search] Step 1 with search point {'OnnxConversion': {}, 'OnnxFloatToFloat16': {}, 'OrtPerfTuning': {}} ...
[2023-08-10 14:11:16,504] [DEBUG] [engine.py:725:_run_passes] Running pass OnnxConversion
[2023-08-10 14:11:16,507] [DEBUG] [engine.py:789:_run_pass] Loading model from cache ...
[2023-08-10 14:11:16,510] [DEBUG] [engine.py:725:_run_passes] Running pass OnnxFloatToFloat16
[2023-08-10 14:11:16,513] [DEBUG] [engine.py:789:_run_pass] Loading model from cache ...
[2023-08-10 14:11:16,527] [DEBUG] [engine.py:725:_run_passes] Running pass OrtPerfTuning
[2023-08-10 14:11:16,539] [DEBUG] [engine.py:789:_run_pass] Loading model from cache ...
[2023-08-10 14:11:16,542] [DEBUG] [engine.py:898:_evaluate_model] Evaluating model ...
[2023-08-10 14:11:16,550] [DEBUG] [engine.py:902:_evaluate_model] Loading evaluation from cache ...
[2023-08-10 14:11:16,551] [DEBUG] [footprint.py:90:resolve_metrics] There is no goal set for metric: {metric_name}.
[2023-08-10 14:11:16,552] [DEBUG] [footprint.py:90:resolve_metrics] There is no goal set for metric: {metric_name}.
[2023-08-10 14:11:16,552] [DEBUG] [engine.py:765:_run_passes] Signal: {'latency-avg': 13.23219, 'latency-max': 14.0571, 'latency-min': 11.7933}
[2023-08-10 14:11:16,557] [INFO] [footprint.py:168:get_pareto_frontier] pareto frontier points: 2_OrtPerfTuning-1-db4d7470ffd292aaa61802d59d85b66a-gpu-cpu {'latency-avg': 13.23219, 'latency-max': 14.0571, 'latency-min': 11.7933}
[2023-08-10 14:11:16,557] [INFO] [engine.py:475:run_search] Output all 1 models
[2023-08-10 14:11:16,559] [INFO] [engine.py:318:run] No packaging config provided, skip packaging artifacts

And it only generated two models. I think it should generate three; there should be one more for OrtPerfTuning.

No change to the code. Does anybody have the same issue?
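One thing worth checking (a suggestion on my side, not from the logs): whether the onnxruntime build in the virtual environment actually exposes the DirectML provider, since a missing provider makes every DmlExecutionProvider tuning combo fail.

import onnxruntime as ort

print(ort.__version__)
# Expect 'DmlExecutionProvider' in this list when onnxruntime-directml is installed.
print(ort.get_available_providers())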
