Comments (4)
Resolved.
from onnxruntime_backend.
@pranavsharma Thanks for implementing this, but could arena shrinkage be smarter? Currently the user has to specify where the model lives (e.g. cpu:0), but Triton actually knows the model's placement better than the user does. I suggest the user just set this value to 0 or 1 and let Triton translate it to cpu:0, gpu:0, gpu:1, etc.
@zeruniverse We would like to mirror the configs from the ORT framework here as much as possible. This has a few advantages.
- It allows us to point to existing documentation that is being continually enriched.
- Users can search for the config in https://github.com/Microsoft/onnxruntime/issues or find performance recommendations, and when a config is mentioned they won't have to figure out how to translate the ORT framework string into an ORT backend config.
- It avoids an unnecessary translation layer between ORT backend configs and ORT framework invocations, which makes ORT framework issues easy to reproduce: you can paste the exact config when filing an issue.
When you configure Triton to run the ORT backend on CPU, you have to set the instance kind to CPU anyway. I realize you then have to repeat it in this string, but that minor inconvenience is a trade-off against the advantages listed above.
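As a sketch of what mirroring the ORT framework string looks like in a model's config.pbtxt (the parameter key `memory.enable_memory_arena_shrinkage` and the `cpu:0` device string follow the ORT run-option naming; treat the exact key and value format as assumptions that may differ across backend versions):

```
# Hypothetical excerpt of a config.pbtxt for a CPU model.
# The instance kind and the device string both say "CPU",
# which is the repetition discussed above.
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "cpu:0" }
}
```

Because the value is the same string the ORT framework accepts, it can be pasted verbatim into an onnxruntime issue report.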
@pranavsharma Thanks for your reply! Yes, I understand that for CPU the user can just specify cpu:0. The problem arises when there are multiple GPUs (say 2) and each GPU hosts one instance of model X. In that case, instance 1 should have the value gpu:0 and instance 2 should have gpu:1. I don't know whether the user can configure this, since I assume parameter settings apply to all instances.
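To illustrate the concern, a hypothetical config.pbtxt for a model with one instance per GPU is sketched below. The `parameters` block is model-wide, so there is no obvious way to give instance 1 the value gpu:0 and instance 2 the value gpu:1; whether a semicolon-separated device list such as "gpu:0;gpu:1" is accepted here is an assumption based on the ORT run-option format, not something confirmed in this thread:

```
# Hypothetical: one instance of model X on each of two GPUs.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
# This single parameters block applies to ALL instances,
# which is exactly the limitation being raised.
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "gpu:0;gpu:1" }
}
```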