Comments (4)
Resolved.
from onnxruntime_backend.
@pranavsharma Thanks for implementing this, but could arena shrinkage be smarter? Currently the user has to specify where the model lives (e.g. cpu:0), but Triton actually knows the model's placement better than the user does. I suggest the user just set this value to 0 or 1 and let Triton translate it to cpu:0, gpu:0, gpu:1, etc.
@zeruniverse We would like to mirror the configs from the ORT framework here as much as possible. This has a few advantages.
- It allows us to point to existing documentation that is being continually enriched.
- Users can search for the config in https://github.com/Microsoft/onnxruntime/issues or find performance recommendations, and when a config is mentioned they won't have to figure out how to translate the ORT framework string into an ORT backend config.
- It avoids an unnecessary translation layer between ORT backend configs and ORT framework invocations, which makes ORT framework issues easy to reproduce: you can paste the exact config when filing an issue.
When you configure Triton to run the ORT backend on CPU, you have to set the instance kind to CPU anyway. I realize you then have to repeat it in this string, but that minor inconvenience is a trade-off against the advantages listed above.
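As a sketch of what mirroring the ORT framework string looks like in a model's config.pbtxt (the parameter key `memory.enable_memory_arena_shrinkage` and the `cpu:0` device string follow the ORT run-option naming; treat the exact key and value format as assumptions that may differ across backend versions):

```
# Hypothetical excerpt of a config.pbtxt for a CPU model.
# The instance kind and the device string both say "CPU",
# which is the repetition discussed above.
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "cpu:0" }
}
```

Because the value is the same string the ORT framework accepts, it can be pasted verbatim into an onnxruntime issue report.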
@pranavsharma Thanks for your reply! Yes, I understand that for CPU the user can just specify cpu:0. The problem arises when there are multiple GPUs (say 2) and each GPU hosts one instance of model X. In that case, instance 1 should have the value gpu:0 and instance 2 should have gpu:1. I don't know whether the user can configure this, since I assume parameter settings apply to all instances.
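To illustrate the concern, a hypothetical config.pbtxt for a model with one instance per GPU is sketched below. The `parameters` block is model-wide, so there is no obvious way to give instance 1 the value gpu:0 and instance 2 the value gpu:1; whether a semicolon-separated device list such as "gpu:0;gpu:1" is accepted here is an assumption based on the ORT run-option format, not something confirmed in this thread:

```
# Hypothetical: one instance of model X on each of two GPUs.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
# This single parameters block applies to ALL instances,
# which is exactly the limitation being raised.
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "gpu:0;gpu:1" }
}
```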