pykeio / ort
Fast ML inference & training for Rust with ONNX Runtime
Home Page: https://ort.pyke.io/
License: Apache License 2.0
Hello,
Thank you again for building these bindings.
I am working on integrating ONNX support for a project I have been working on (rust-bert). I have most existing pipelines working (from classification to text generation), but I am observing a severe performance degradation compared to the Libtorch backend using tch bindings.
Most of the pipeline logic is written using tch tensors, and I was hoping to reuse most of this logic for ONNX models. I suspect the performance hit comes from the conversion between `tch::Tensor` and `ort::tensor::InputTensor`.
The conversion I currently use generally follows these steps:
1. tch to ort:
1. tch::Tensor to `Vec`
2. `Vec` to `ndarray::ArrayD`
3. `ndarray::ArrayD` to `InputTensor`
The actual implementation looks like this:
let mut vec = vec![T::ZERO; num_elem];
tch_tensor.f_to_kind(T::KIND)?.f_copy_data(&mut vec, num_elem)?;
let shape: Vec<usize> = tch_tensor.size().iter().map(|s| *s as usize).collect();
let array = ndarray::ArrayD::from_shape_vec(ndarray::IxDyn(&shape), vec)?;
let input_tensor = InputTensor::from_array(array);
2. ort to tch:
1. Extract array from `DynOrtTensor`
2. Convert array to slice
3. Create tensor from slice
The actual implementation looks like this:
let array = dyn_ort_tensor.try_extract::<f32>()?.view().to_owned();
let shape: Vec<i64> = array.shape().iter().map(|s| *s as i64).collect();
// `as_slice` returns an `Option` (`None` if the array is not contiguous)
let slice = array.as_slice().unwrap();
let tensor = Tensor::f_of_slice(slice)?;
tensor.f_reshape(shape.as_slice())
This involves a lot of copying and memory allocation (especially given the intermediate slice representation). I was hoping to convert from tch tensors to ort tensors, ideally without copying (creating the target from the source's data pointer), or at least without going through the intermediate slices.
I have tried a few things on the tch side, including converting a Tensor directly into an ndarray without the intermediate slice, but this still copies the data, and I am unsure whether there is a better way of doing so.
impl<T: Element + Copy> TryInto<ndarray::ArrayD<T>> for &Tensor {
    type Error = TchError;

    fn try_into(self) -> Result<ndarray::ArrayD<T>, Self::Error> {
        let num_elem = self.numel();
        let shape: Vec<usize> = self.size().iter().map(|s| *s as usize).collect();
        let array = unsafe {
            let mut array = ndarray::ArrayD::uninit(ndarray::IxDyn(&shape));
            at_copy_data(
                self.to_kind(T::KIND).as_mut_ptr(),
                array.as_mut_ptr() as *const c_void,
                num_elem,
                T::KIND.elt_size_in_bytes(),
            );
            array.assume_init()
        };
        Ok(array)
    }
}
I understand you may not be fully familiar with the tch project - any hints on the way forward would be appreciated.
For information, the ONNX implementation I am working on is on guillaume-be/rust-bert#346
Thank you!
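The zero-copy idea above amounts to borrowing the source buffer and attaching a shape to it, instead of copying elements into a new container. The sketch below shows that pattern with the standard library only; `TensorView` is a hypothetical type, not part of tch, ndarray or ort (in ndarray, `ArrayView::from_shape` plays a similar role), and it assumes the source data is contiguous and row-major.

```rust
/// A borrowed, row-major view over tensor data: it stores a reference to the
/// caller's buffer plus a shape, so constructing it copies no elements.
/// Hypothetical type for illustration only.
pub struct TensorView<'a, T> {
    data: &'a [T],
    shape: Vec<usize>,
}

impl<'a, T> TensorView<'a, T> {
    /// Wraps `data` using `i64` dims (the form tch reports sizes in).
    /// Returns `None` for a negative dim or an element-count mismatch.
    pub fn new(data: &'a [T], dims: &[i64]) -> Option<Self> {
        let shape: Vec<usize> = dims
            .iter()
            .map(|&d| usize::try_from(d).ok())
            .collect::<Option<_>>()?;
        if shape.iter().product::<usize>() != data.len() {
            return None;
        }
        Some(TensorView { data, shape })
    }

    pub fn shape(&self) -> &[usize] {
        &self.shape
    }

    /// Pointer to the first element; identical to the source buffer's pointer.
    pub fn as_ptr(&self) -> *const T {
        self.data.as_ptr()
    }

    /// Row-major lookup: the last index varies fastest.
    pub fn get(&self, index: &[usize]) -> Option<&T> {
        if index.len() != self.shape.len() {
            return None;
        }
        let mut offset = 0;
        for (&i, &dim) in index.iter().zip(&self.shape) {
            if i >= dim {
                return None;
            }
            offset = offset * dim + i;
        }
        self.data.get(offset)
    }
}
```

Because the view only borrows, the borrow checker keeps the source buffer alive for as long as the view exists.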
I'm trying to run the code below:
use ort::{
    tensor::{DynOrtTensor, FromArray, InputTensor, OrtOwnedTensor},
    Environment, LoggingLevel, OrtResult, SessionBuilder,
};
use polars::{datatypes::Float32Type, prelude::*};
use ndarray::IxDyn;

fn main() -> OrtResult<()> {
    // Read the dataframe using Polars
    let dataframe = CsvReader::from_path("random_df.csv")
        .unwrap()
        .has_header(false)
        .finish()
        .unwrap()
        .to_ndarray::<Float32Type>()
        .unwrap();

    // Create the environment
    let environment = Environment::builder()
        .with_name("random_df_environment")
        .with_log_level(LoggingLevel::Warning)
        .build()?
        .into_arc();

    // Create the session
    let session = SessionBuilder::new(&environment)?
        .with_model_from_file("random_df.onnx")?;

    let input = vec![InputTensor::from_array(dataframe.into_dyn())];
    let outputs: Vec<DynOrtTensor<ndarray::Dim<ndarray::IxDynImpl>>> = session.run(input).unwrap();

    let scores = &outputs[0];
    let scores: OrtOwnedTensor<'_, i64, IxDyn> = scores.try_extract()?;
    let scores = scores.view();
    println!("{:}", scores);
    Ok(())
}
But I'm getting the error: PointerShouldBeNull("CastTypeInfoToTensorInfo")
I searched for it but didn't find anything. Could anyone help?
Hey,
first of all: thanks for the nice repo!
With multiple CUDA, DirectML or OpenVINO devices, how do I select the one I want to use?
Thanks for any help :)
CODE:-
let session = SessionBuilder::new(&environment).unwrap()
    .with_optimization_level(GraphOptimizationLevel::Level1).unwrap()
    .with_intra_threads(1).unwrap()
    .with_execution_providers([
        ExecutionProvider::DirectML(DirectMLExecutionProviderOptions { device_id: 0 })
    ]).unwrap()
    .with_model_downloaded(ImageClassification::ResNet(ort::download::vision::ResNet::V2(ort::download::vision::ResNetV2::ResNet50)))
    .expect("Could not download model from file");
CMD:-
$env:RUST_LOG = 'ort=debug';$env:ORT_STRATEGY = 'system'; $env:ORT_LIB_LOCATION = 'C:\Sandbox\rust-workspace\rust-ort\runtime';cargo run
"C:\Sandbox\rust-workspace\rust-ort\runtime" contains
CMD:-
$env:RUST_LOG = 'ort=debug';$env:ORT_STRATEGY = 'system'; $env:ORT_LIB_LOCATION = 'C:\Sandbox\rust-workspace\rust-ort\runtime\1.15.1';cargo run
"C:\Sandbox\rust-workspace\rust-ort\runtime\1.15.1" contains
The NuGet packages were downloaded from here:
https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.DirectML/1.15.0
https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.DirectML/1.15.1
ERROR:-
Finished dev [unoptimized + debuginfo] target(s) in 5.18s
Running target\debug\rust-ort.exe
error: process didn't exit successfully: target\debug\rust-ort.exe
(exit code: 0xc0000138, STATUS_ORDINAL_NOT_FOUND)
I'm using the v2 branch for this, but below is what is currently needed to get cudaHostRegister pinned buffers working.
// cudaError_t is an enum with #[repr(u32)]
#[link(name = "cudart", kind = "dylib")]
extern "C" {
    pub fn cudaHostRegister(ptr: *mut ::std::os::raw::c_void, size: usize, flags: ::std::os::raw::c_uint) -> cudaError_t;
    pub fn cudaHostUnregister(ptr: *mut ::std::os::raw::c_void) -> cudaError_t;
}

let mut data1 = vec![0_u8; 16 * 1536 * 2048];
unsafe { cudaHostRegister(data1.as_mut_ptr() as _, data1.len(), cudaHostRegisterDefault) };
Then wrap the buffer in an OrtValue, but Value::from_array can't be used because it clones the data every time.
// o_mem and o_value wrap calls to CreateMemoryInfo and CreateTensorWithDataAsOrtValue
let shape = vec![16_i64, 1, 1536, 2048];
let mem_ptr = o_mem(ort::AllocationDevice::CPU, 0, ort::AllocatorType::Device, ort::MemType::CPUInput);
let input_tensor = unsafe { Value::from_raw(o_value(&mut data1, &shape, mem_ptr), session.inner()) };
bind.bind_input("images", input_tensor).unwrap();
bind.run().unwrap();
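The `size` passed to `cudaHostRegister` must cover the whole tensor, i.e. the product of the shape times the element size. `data1.len()` works above because the elements are `u8`; for other element types the byte count differs. A small helper for that arithmetic (hypothetical, standard library only) could look like:

```rust
/// Bytes occupied by a dense tensor with the given `i64` shape (the form the
/// ONNX Runtime C API uses) and element size. Returns `None` if a dimension
/// is negative or the multiplication overflows.
pub fn tensor_byte_len(shape: &[i64], elem_size: usize) -> Option<usize> {
    shape.iter().try_fold(elem_size, |acc, &dim| {
        let dim = usize::try_from(dim).ok()?;
        acc.checked_mul(dim)
    })
}
```

For the `[16, 1, 1536, 2048]` `u8` buffer above this gives 50,331,648 bytes, the "50MB" figure quoted below.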
Using 50MB input buffers, the pinned buffer saves 1ms (1.95%), and avoiding the extra copy from ort::from_array saves 19ms (27%). The model is a YOLOv8m with a custom first layer for debayering and resizing, running on a Quadro RTX 4000.
nvprof: with vs. without cudaHostRegister, 100 iterations
Time       Name
643.54ms   [CUDA memcpy HtoD]   TensorRT, with
747.56ms   [CUDA memcpy HtoD]   TensorRT, without
659.65ms   [CUDA memcpy HtoD]   CUDA, with
760.43ms   [CUDA memcpy HtoD]   CUDA, without
nsys analyze reports PAGED async transfers without cudaHostRegister.
Criterion results: pinned vs. ort::from_raw() vs. standard ort::from_array()
forward_mymodel_onnx_cuda_pinned
time: [80.781 ms 80.847 ms 80.910 ms]
forward_mymodel_onnx_cuda_ort_fromraw
time: [81.675 ms 81.856 ms 82.093 ms]
forward_mymodel_onnx_cuda_ort_fromarray
time: [100.94 ms 101.06 ms 101.20 ms]
forward_mymodel_onnx_trt_pinned
time: [49.893 ms 49.950 ms 50.007 ms]
forward_mymodel_onnx_trt_ort_fromraw
time: [50.793 ms 50.943 ms 51.175 ms]
forward_mymodel_onnx_trt_ort_fromarray
time: [69.574 ms 69.701 ms 69.833 ms]
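The quoted savings can be checked against the Criterion medians. The helper below is a plain arithmetic sketch: the "1ms or 1.95%" and "19ms or 27%" figures match the TensorRT rows (pinned vs. from_raw, and from_raw vs. from_array), while the same from_array-to-from_raw comparison on the CUDA rows works out to about 19%.

```rust
/// Absolute saving (ms) and relative saving (fraction of `baseline_ms`)
/// when going from `baseline_ms` to `new_ms`.
pub fn saving(baseline_ms: f64, new_ms: f64) -> (f64, f64) {
    let abs = baseline_ms - new_ms;
    (abs, abs / baseline_ms)
}
```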
In super-resolution tasks, fully convolutional networks are often used so that they can adapt to different resolutions.
How should such a network be supported?
For example, this network accepts N * 3 * 142 * 142 inputs.
I still get an error after removing the height and width constraints.
pub fn new(runtime: &Arc<Environment>, models: &Path) -> OrtResult<Self> {
    let mut session = make_session(runtime, models)?;
    match session.inputs.get_mut(0) {
        Some(s) => cancel_dimension(s, &[2, 3]),
        None => panic!("model has no inputs"),
    };
    Ok(Self { session })
}

pub fn make_session(runtime: &Arc<Environment>, model: &Path) -> OrtResult<Session> {
    let build = SessionBuilder::new(runtime)?
        .with_execution_providers(&[ExecutionProvider::cuda(), ExecutionProvider::cpu()])?
        .with_model_from_file(model)?;
    Ok(build)
}

pub fn cancel_dimension(input: &mut Input, dimensions: &[usize]) {
    for dim in dimensions {
        if let Some(s) = input.dimensions.get_mut(*dim) {
            *s = None;
        }
    }
}
I get this error:
[Input { name: "input", input_type: Float32, dimensions: [None, Some(3), None, None] }]
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
Got invalid dimensions for input:
input for the following indices
index: 2 Got: 650 Expected: 142
index: 3 Got: 926 Expected: 142
Please fix either the inputs or the model.
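One likely reading of this error: `session.inputs` appears to be a Rust-side copy of the model metadata, so editing it does not change the graph, and ONNX Runtime still validates inputs against the shape declared in the model file. If so, the height and width would need to be made dynamic in the model itself (for example when exporting it). The sketch below models that validation with the standard library only; `check_shape` and `ShapeError` are hypothetical, and `None` marks a dynamic dimension.

```rust
#[derive(Debug, PartialEq)]
pub enum ShapeError {
    /// The input has a different number of dimensions than declared.
    Rank { got: usize, expected: usize },
    /// Mismatching `(index, got, expected)` triples.
    Dims(Vec<(usize, usize, usize)>),
}

/// Checks `actual` against `declared`; `None` accepts any size.
pub fn check_shape(declared: &[Option<usize>], actual: &[usize]) -> Result<(), ShapeError> {
    if declared.len() != actual.len() {
        return Err(ShapeError::Rank { got: actual.len(), expected: declared.len() });
    }
    let bad: Vec<(usize, usize, usize)> = declared
        .iter()
        .zip(actual)
        .enumerate()
        .filter_map(|(i, (d, &a))| match d {
            Some(e) if *e != a => Some((i, a, *e)),
            _ => None,
        })
        .collect();
    if bad.is_empty() { Ok(()) } else { Err(ShapeError::Dims(bad)) }
}
```

With an assumed declared shape of `[None, Some(3), Some(142), Some(142)]`, a 650 × 926 input reports exactly the two indices in the error above.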
Hello.
I am encountering problems while building ort for x86 on Windows.
error[E0277]: unsafe extern "stdcall" fn(*const OrtCustomOp, usize) -> OrtCustomOpInputOutputCharacteristic
doesn't implement `Debug`
--> C:\Users\beqap.cargo\registry\src\github.com-1ecc6299db9ec823\ort-1.15.1\src\sys.rs:2965:2
|
2952 | #[derive(Debug, Copy, Clone)]
| ----- in this derive macro expansion
...
2965 | pub GetInputCharacteristic: ::std::option::Option<_system!(unsafe fn(op: *const OrtCustomOp, index: size_t) -> OrtCustomOpInputOutputCharacteristic...
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `unsafe extern "stdcall" fn(*const OrtCustomOp, usize) -> OrtCustomOpInputOutputCharacteristic` cannot be formatted using `{:?}` because it doesn't implement `Debug`
|
= help: the trait `Debug` is not implemented for `unsafe extern "stdcall" fn(*const OrtCustomOp, usize) -> OrtCustomOpInputOutputCharacteristic`
= note: this error originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)
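`session.run` accepts `&[InputTensor]`.
I read in an image using the `image` crate and got a `DynamicImage`. How do I convert it to an `InputTensor`?
use image::RgbImage;
use ndarray::{Array, Axis};
use ort::tensor::InputTensor;

pub fn make_input_tensor(image: &RgbImage) -> InputTensor {
    let (width, height) = image.dimensions();
    let (width, height) = (width as usize, height as usize);
    let channels = 3;
    // `RgbImage` stores pixels row by row with interleaved channels, i.e. HWC
    let shape = (height, width, channels);
    let array = Array::from_shape_vec(shape, image.as_raw().to_vec()).unwrap();
    // HWC -> CHW, then insert a batch axis at the front to get NCHW
    let nchw = array.permuted_axes([2, 0, 1]).insert_axis(Axis(0));
    // `InputTensor` expects an `IxDyn` array (a `Dim<[Ix; 3]>` array does not compile);
    // `as_standard_layout` makes the permuted data contiguous again
    InputTensor::Uint8Tensor(nchw.as_standard_layout().into_owned().into_dyn())
}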
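The layout change itself is just an index reordering. This standard-library-only sketch (hypothetical helper) copies an interleaved HWC buffer, the layout `RgbImage::as_raw` returns, into planar CHW order, which is NCHW with a batch size of 1:

```rust
/// Reorders interleaved HWC pixels into planar CHW (NCHW with N = 1).
/// Returns `None` if the buffer length does not match the dimensions.
pub fn hwc_to_nchw<T: Copy>(hwc: &[T], height: usize, width: usize, channels: usize) -> Option<Vec<T>> {
    if hwc.len() != height * width * channels {
        return None;
    }
    let mut out = Vec::with_capacity(hwc.len());
    // Each channel becomes one contiguous plane, scanned row by row.
    for c in 0..channels {
        for y in 0..height {
            for x in 0..width {
                out.push(hwc[(y * width + x) * channels + c]);
            }
        }
    }
    Some(out)
}
```

For a 1 × 2 RGB image `[r0, g0, b0, r1, g1, b1]` the result is `[r0, r1, g0, g1, b0, b1]`.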
Steps to reproduce:
1. cargo new --lib
2. In Cargo.toml, add the dependency ort = { version = "1.15.0", default-features = false, features = ["load-dynamic"] }
3. cargo b
Compiling ort v1.15.0
error[E0308]: mismatched types
--> /Users/tushar/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-1.15.0/src/execution_providers.rs:614:22
|
614 | num_of_threads: options.num_threads,
| ^^^^^^^^^^^^^^^^^^^ expected `u64`, found `usize`
For more information about this error, try `rustc --explain E0308`.
error: could not compile `ort` (lib) due to previous error
I am on a MacBook Air M1 with macOS 13.4.1.
Line 557 in 561870a
#[derive(Debug,Clone)]
pub struct Session {
//...
}
Right now running a session on a model that has a scalar input (0 dimension array) fails. I think these are rare, but one example is the silero-vad ONNX model which takes sample rate as a scalar input. Here's a minimum reproducible example:
use std::sync::Arc;

use ndarray::{arr0, Array};
use ort::{
    tensor::{DynOrtTensor, FromArray, InputTensor, OrtOwnedTensor},
    Environment, ExecutionProvider, GraphOptimizationLevel, OrtResult, SessionBuilder,
};

fn main() -> OrtResult<()> {
    let environment = Arc::new(
        Environment::builder()
            .with_name("silero-vad")
            .with_execution_providers([ExecutionProvider::cpu()])
            .build()?,
    );
    let session = SessionBuilder::new(&environment)?
        .with_optimization_level(GraphOptimizationLevel::Level1)?
        .with_intra_threads(1)?
        .with_model_from_file("./silero-vad.onnx")?;

    let inputs = vec![
        InputTensor::from_array(Array::<f32, _>::zeros([1, 512]).into_dyn()),
        // 0-dim (scalar) input: the sample rate
        InputTensor::from_array(arr0::<i64>(16000).into_dyn()),
        InputTensor::from_array(Array::<f32, _>::zeros([2, 1, 64]).into_dyn()),
        InputTensor::from_array(Array::<f32, _>::zeros([2, 1, 64]).into_dyn()),
    ];
    let result: Vec<DynOrtTensor<ndarray::Dim<ndarray::IxDynImpl>>> = session.run(inputs).unwrap();
    let vad: OrtOwnedTensor<f32, _> = result[0].try_extract().unwrap();
    println!("VAD: {:?}", vad);
    Ok(())
}
Running this will result in a runtime error:
thread 'main' panicked at 'assertion failed: `(left != right)`
left: `0`,
right: `0`', /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.13.3/src/session.rs:627:5
If I remove the dimension assertion at that line, it runs correctly:
VAD: OrtOwnedTensor { data: TensorPtr { ptr: TensorPointerHolder { tensor_ptr: 0x55c22ac452d0 }, array_view: [[0.041475803]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), dynamic ndim=2 } }
I'm not really sure what other effects removing that assertion would have here, but I'm happy to open a PR
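For context on whether the assertion is needed: a scalar has rank 0, yet it still holds exactly one element, because the product over an empty shape is 1. A quick standard-library check of that arithmetic (hypothetical helper):

```rust
/// Number of elements implied by a tensor shape. The product of an empty
/// shape is 1, so a 0-dimensional (scalar) tensor holds one value.
pub fn element_count(shape: &[usize]) -> usize {
    shape.iter().product()
}
```

So, as far as element counts go, nothing requires the rank to be non-zero; whether other code relies on that assertion is a separate question.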
Lines 624 to 632 in e4376dc
NuGet Gallery provides more pre-compiled runtimes, which can be obtained by unpacking the .nupkg file:
C headers are in /build/native/include/
runtimes are in /runtimes/
Microsoft.ML.OnnxRuntime: runtime for all platforms.
Microsoft.ML.OnnxRuntime.Gpu: Windows/Linux x64, with built-in CUDA and TensorRT support.
Microsoft.ML.OnnxRuntime.DirectML: Windows (all architectures), with built-in DirectML support.
Hello,
I got a strange result: the output of the softmax function does not sum to 1.
To demonstrate this, I have created a minimal reproducible example, which you can find in the following repo:
https://github.com/hobincar/minimal_reproducible_example_for_ort/tree/main
use std::path::Path;
use std::sync::Arc;
use std::vec::Vec;

use ndarray::{arr1, CowArray, Dim, IxDynImpl};
use ort::{
    tensor::OrtOwnedTensor,
    Environment, ExecutionProvider, GraphOptimizationLevel, OrtResult, SessionBuilder, Value,
};

fn func() -> OrtResult<OrtOwnedTensor<'static, f32, Dim<IxDynImpl>>> {
    let environment = Arc::new(
        Environment::builder()
            .with_execution_providers([ExecutionProvider::CPU(Default::default())])
            .build()?,
    );
    let session = SessionBuilder::new(&environment)?
        .with_optimization_level(GraphOptimizationLevel::Disable)?
        .with_model_from_file(Path::new("softmax.onnx"))?;

    let input = CowArray::from(arr1(&[1f32, 2f32, 3f32, 4f32])).into_dyn();
    let output: Vec<Value> = session.run(vec![
        Value::from_array(session.allocator(), &input)?,
    ])?;
    let output: OrtOwnedTensor<f32, _> = output[0].try_extract()?;
    println!("[1] output: {:?}", output);
    let output = Ok(output);
    println!("[2] output: {:?}", output);
    output
}

fn main() {
    let output = func();
    println!("[3] output: {:?}", output);
}
The ONNX model used is simple and consists solely of a softmax operation.
Initially, the output appears correct ([0.032058604, 0.08714432, 0.23688284, 0.6439143]), but unexpectedly, the values become incorrect after being returned by the function ([0.0, 0.0, 3.124826e-32, 6.1224e-41]).
[1] output: OrtOwnedTensor { data: TensorPtr { ptr: 0xaaab0b67e7d0, array_view: [0.032058604, 0.08714432, 0.23688284, 0.6439143], shape=[4], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } }
[2] output: Ok(OrtOwnedTensor { data: TensorPtr { ptr: 0xaaab0b67e7d0, array_view: [0.032058604, 0.08714432, 0.23688284, 0.6439143], shape=[4], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } })
[3] output: Ok(OrtOwnedTensor { data: TensorPtr { ptr: 0xaaab0b67e7d0, array_view: [0.0, 0.0, 3.124826e-32, 6.1224e-41], shape=[4], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } })
Am I overlooking something, or could this be a bug?
Thanks in advance.
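A reference softmax confirms that the values printed at [1] and [2] are the correct result for [1, 2, 3, 4]. One plausible explanation for [3], not confirmed here, is that the tensor declared `'static` still points into memory owned by the session, which is dropped when `func` returns; returning owned data (as this sketch does with a `Vec`) avoids that.

```rust
/// Numerically stable softmax: subtract the maximum before exponentiating.
/// Returns an owned `Vec`, so the result does not borrow from anything.
pub fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}
```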
I wasn't able to generate macOS bindings for the v1.14.0-beta.0 release. I would greatly appreciate it if someone with a macOS machine could open a PR with regenerated bindings for x64 and ARM64! 😃
libclang is required; build with cargo build --features generate-bindings --target x86_64-apple-darwin, and again with --target aarch64-apple-darwin.
As the title says.
I only have a CPU for ONNX.
If you drop an OrtDynTensor or OrtOwnedTensor after you drop the session (and possibly the environment), the program segfaults.
error[E0308]: mismatched types
--> /home/<redacted>/.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.3/src/execution_providers.rs:182:113
|
182 | let status = ortsys![unsafe UpdateCUDAProviderOptions(cuda_options, key_ptrs.as_ptr(), value_ptrs.as_ptr(), keys.len())];
| ^^^^^^^^^^ expected `u64`, found `usize`
|
::: /home/<redacted>/.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.3/src/lib.rs:130:18
|
130 | unsafe { $crate::ort().$method.unwrap()($($n),+) }
| ------------------------------ arguments to this function are incorrect
|
help: you can convert a `usize` to a `u64` and panic if the converted value doesn't fit
|
182 | let status = ortsys![unsafe UpdateCUDAProviderOptions(cuda_options, key_ptrs.as_ptr(), value_ptrs.as_ptr(), keys.len().try_into().unwrap())];
| ++++++++++++++++++++
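The compiler's suggestion converts the length explicitly. A likely cause, not verified here, is that the bundled bindings map C `size_t` to `u64` rather than `usize`. Where that mismatch exists, a checked conversion like this hypothetical helper fails loudly instead of truncating:

```rust
/// Converts a Rust length to the `u64` a binding may expect for `size_t`,
/// panicking instead of silently truncating if it cannot fit.
pub fn len_as_size_t(len: usize) -> u64 {
    u64::try_from(len).expect("length does not fit in u64")
}
```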
Session inputs/outputs should probably be reworked. The way they work now is a bit messy for my liking, and there is currently no clear path for supporting important features like IOBinding or non-tensor types.
A few things I think should be addressed:
1. IOBinding (#15)
2. sequence<T> and map<K, V> types via Vec<T> and HashMap<K, V> (#30)
3. Vec<T> with a given shape (since some applications don't really need the input/output to be a complex tensor, e.g. Silero VAD)
Originally posted by stexa April 6, 2023
Hey!
For use in audio processing, and for performance in general, it would be really nice if the inputs and outputs could be set and allocated before the actual run() call, somewhat like what is suggested here:
nbigaouette/onnxruntime-rs#41
This could be an additional function as well. Might this be something you would be interested in?
And thank you, I am very happy that someone is still working on an onnxruntime wrapper in Rust; the crate works out of the box like a charm for me :)
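The request amounts to a "bind once, run many" pattern: allocate input and output buffers up front and reuse them on every run, so the hot loop does not allocate. The standard-library sketch below is hypothetical; `BoundRun` is not an ort type, and the `model` closure stands in for actual inference.

```rust
/// Hypothetical wrapper that owns preallocated input/output buffers.
pub struct BoundRun<F: Fn(&[f32], &mut [f32])> {
    input: Vec<f32>,
    output: Vec<f32>,
    model: F,
}

impl<F: Fn(&[f32], &mut [f32])> BoundRun<F> {
    pub fn new(input_len: usize, output_len: usize, model: F) -> Self {
        BoundRun { input: vec![0.0; input_len], output: vec![0.0; output_len], model }
    }

    /// Mutable access to the bound input, e.g. to copy in the next audio frame.
    pub fn input_mut(&mut self) -> &mut [f32] {
        &mut self.input
    }

    /// Runs the model, writing into the preallocated output buffer.
    pub fn run(&mut self) -> &[f32] {
        (self.model)(&self.input, &mut self.output);
        &self.output
    }
}
```

Each call to `run` returns the same output buffer, so no allocation happens between runs.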
Originally posted by dzhao January 30, 2023
Hi, do you plan to support IOBinding for CUDA/TensorRT?
https://stackoverflow.com/questions/70740287/onnxruntime-inference-is-way-slower-than-pytorch-on-gpu
This seems like a critical feature for GPU serving.
Hi, thanks for this project! I really like the idea of being able to use Rust for ML!
I'm also relatively new to Rust, so I'm not sure if I'm on the wrong track.
I'd like to change an ONNX model dynamically. For that, I have a struct that holds both an onnx_session_builder (SessionBuilder) and an onnx_session (Session). However, when I create a Session from the SessionBuilder, the SessionBuilder gets moved: with_model_downloaded, with_model_from_file, etc. take self.
I've tried cloning, Rc, RefCell, and even Box, as well as combinations of them, with no success.
Is there a way to make a copy of a SessionBuilder so I can recreate and replace a Session?
Edit:
I was able to make it work by using the crate as a local dependency and adding #[derive(Clone)] on SessionBuilder. Are there any reasons not to do this that I'm missing?
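Since the builder methods consume `self`, a common Rust pattern is to keep a cloneable description of the builder settings and rebuild from a clone whenever the model changes. The sketch below uses hypothetical standard-library types (`SessionConfig`, `Session`, `ModelHolder`), not ort's API. As a general caveat, if a builder wraps a raw native pointer that it releases on drop, a derived `Clone` would copy the pointer and both copies would release it, which is one reason such a derive can be unsafe.

```rust
/// Hypothetical, cloneable description of how to build a session.
#[derive(Clone, Debug, PartialEq)]
pub struct SessionConfig {
    pub intra_threads: usize,
    pub optimization_level: u8,
}

/// Stand-in for a built session.
#[derive(Debug)]
pub struct Session {
    pub config: SessionConfig,
    pub model_path: String,
}

impl SessionConfig {
    /// Consumes the config, mirroring builder methods that take `self`.
    pub fn build(self, model_path: &str) -> Session {
        Session { config: self, model_path: model_path.to_string() }
    }
}

/// Keeps the config so the session can be rebuilt for a new model.
pub struct ModelHolder {
    config: SessionConfig,
    pub session: Session,
}

impl ModelHolder {
    pub fn new(config: SessionConfig, model_path: &str) -> Self {
        let session = config.clone().build(model_path);
        ModelHolder { config, session }
    }

    /// Replaces the current session with one built from `model_path`.
    pub fn reload(&mut self, model_path: &str) {
        self.session = self.config.clone().build(model_path);
    }
}
```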
Hi,
The signature of EnvBuilder::with_global_thread_pool
is as follows:
pub fn with_global_thread_pool(self, options: Vec<(String, String)>) -> EnvBuilder
What values is the options argument expected to contain?
Best
Musharraf
Since ONNX Runtime supports, and provides pre-built binaries for, 32-bit Windows, it is logical to support this build target for ort as well.
Currently, when trying to build using the following command:
cargo build --target i686-pc-windows-msvc
I get the following output:
error: failed to run custom build command for `ort v1.14.1`
Caused by:
process didn't exit successfully: `D:\projects\blindpandas\libtashkeel\target\debug\build\ort-84e4d33039989d14\build-script-build` (exit code: 101)
--- stdout
[ort] strategy: "unknown"
--- stderr
thread 'main' panicked at 'unsupported target architecture: x86', C:\Users\user\.cargo\registry\src\github.com-1ecc6299db9ec823\ort-1.14.1\build.rs:385:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
Please fix this, as it is sometimes necessary to provide wheels for 32-bit Python versions, or to add inference to a 32-bit executable such as the NVDA screen reader.
Best
Musharraf
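The panic comes from the build script not recognising `x86`. Build scripts can read the target architecture from Cargo's `CARGO_CFG_TARGET_ARCH` environment variable. A minimal sketch of a mapping that includes 32-bit x86 might look like the following; the archive suffixes (`onnxruntime-win-x86`, `-x64`, `-arm64`) are an assumption to verify against the ONNX Runtime release page, and the helper is hypothetical.

```rust
/// Maps `CARGO_CFG_TARGET_ARCH` (e.g. from
/// `std::env::var("CARGO_CFG_TARGET_ARCH")` in build.rs) to the assumed
/// architecture suffix of ONNX Runtime's prebuilt Windows archives.
pub fn ort_windows_arch(target_arch: &str) -> Option<&'static str> {
    match target_arch {
        "x86" => Some("x86"),
        "x86_64" => Some("x64"),
        "aarch64" => Some("arm64"),
        _ => None,
    }
}
```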
The GPT example does not compile:
$ cargo run --example gpt
Compiling ort v1.15.2 (/Users/hecmay/Desktop/libauto-rs/ort)
error[E0599]: no method named `with_model_downloaded` found for struct `SessionBuilder` in the current scope
--> examples/gpt.rs:30:4
|
27 | let session = SessionBuilder::new(&environment)?
| ___________________-
28 | | .with_optimization_level(GraphOptimizationLevel::Level1)?
29 | | .with_intra_threads(1)?
30 | | .with_model_downloaded(GPT2::GPT2LmHead)?;
| | -^^^^^^^^^^^^^^^^^^^^^ method not found in `SessionBuilder`
| |_________|
|
For more information about this error, try `rustc --explain E0599`.
error: could not compile `ort` (example "gpt") due to previous error
I added an old version of ort as a dependency using cargo add ort@1.13.3. When running cargo build, it shows that the version being built is 1.15.2.
Reproduce:
cargo new test1
cd test1
cargo add ort@1.13.3 # This shows Adding ort v1.13.3 to dependencies.
cargo build # This shows Compiling ort v1.15.2
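A likely explanation: Cargo treats a bare version such as `ort = "1.13.3"` as a caret requirement (`^1.13.3`, i.e. `>=1.13.3, <2.0.0`), so a fresh resolve picks the newest compatible release, 1.15.2. Writing `ort = "=1.13.3"` in Cargo.toml requests exactly that version. The simplified model below ignores pre-releases and the special rules for `0.x` versions.

```rust
/// Simplified caret matching for a bare requirement with major >= 1:
/// same major version, and not older than the requirement.
pub fn caret_matches(req: (u64, u64, u64), version: (u64, u64, u64)) -> bool {
    // Tuples compare lexicographically: (major, minor, patch).
    req.0 >= 1 && version.0 == req.0 && version >= req
}
```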
I'd be willing to help with the testing for this
Hi
I am trying to use your onnxruntime wrapper in my library. Ideally, I'd like to build ort on Windows with the compile strategy and static linking.
I am running the build from the Developer Command Prompt for VS 2019 as admin.
I am running these build and test commands:
cargo build --features "directml" --features "prefer-compile-strategy" --features "compile-static"
cargo test --features "directml" --features "prefer-compile-strategy" --features "compile-static"
First, it panics if clang is not installed:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: NotFound, message: "program not found" }', build.rs:495:138
But judging from the build script comments, I think this is a small bug, and your intention was to check whether clang and ninja are installed here:
// if we can use ninja on windows, great! let's use it!
// note that ninja + clang on windows is a total shitstorm so it's disabled for now
if Command::new("ninja").arg("--version").status().unwrap().success() && !Command::new("clang-cl").arg("--version").status().unwrap().success() {
...
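The `NotFound` panic is consistent with calling `.status().unwrap()` on a program that is not installed: spawning fails and `unwrap` panics. A non-panicking availability check could look like this sketch (the helper name is hypothetical), used as `tool_available("ninja") && !tool_available("clang-cl")`:

```rust
use std::process::Command;

/// Returns true if `program --version` can be spawned and exits successfully.
/// A missing program makes spawning fail, which maps to `false` here
/// instead of a panic.
pub fn tool_available(program: &str) -> bool {
    Command::new(program)
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}
```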
So I modified it to use ninja as the cmake_generator, and it fails with: fatal: No names found, cannot describe anything.
Full trace:
error: failed to run custom build command for `ort v1.13.2 (D:\RustProjects\ort)`
Caused by:
process didn't exit successfully: `D:\RustProjects\ort\target\debug\build\ort-53c3b3cbcbab5082\build-script-build` (exit code: 101)
--- stdout
[ort] strategy: "unknown"
cargo:rerun-if-env-changed=ORT_STRATEGY
Python 3.9.13
[ort] assuming C/C++ compilers are available
cargo:rerun-if-changed=D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\protoc-3.11.2-win32.zip
cargo:warning="C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.34.31933\\bin\\Hostx64\\x64"
--- stderr
2023-01-11 13:57:01,642 build [DEBUG] - Command line arguments:
--build --update --parallel --skip_tests --skip_submodule_sync --config Debug --disable_rtti --disable_memleak_checker --enable_msvc_static_runtime --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=0 --cmake_generator=Ninja --build_dir=build
2023-01-11 13:57:01,917 build [INFO] - Build started
2023-01-11 13:57:01,917 build [INFO] - Generating CMake build tree
2023-01-11 13:57:01,917 util.run [INFO] - Running subprocess in 'build\Debug'
'D:\Program Files\CMake\bin\cmake.EXE' 'D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\cmake' -Donnxruntime_RUN_ONNX_TESTS=OFF -Donnxruntime_GENERATE_TEST_REPORTS=ON '-DPython_EXECUTABLE=C:\Users\evil_unicorn\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\python.exe' '-DPYTHON_EXECUTABLE=C:\Users\evil_unicorn\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\python.exe' -Donnxruntime_USE_MIMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=OFF -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_JAVA=OFF -Donnxruntime_BUILD_NODEJS=OFF -Donnxruntime_BUILD_OBJC=OFF -Donnxruntime_BUILD_SHARED_LIB=OFF -Donnxruntime_BUILD_APPLE_FRAMEWORK=OFF -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_NNAPI_BUILTIN=OFF -Donnxruntime_USE_RKNPU=OFF -Donnxruntime_USE_LLVM=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_VITISAI=OFF -Donnxruntime_USE_TENSORRT=OFF -Donnxruntime_USE_TENSORRT_BUILTIN_PARSER=OFF -Donnxruntime_TENSORRT_PLACEHOLDER_BUILDER=OFF -Donnxruntime_USE_TVM=OFF -Donnxruntime_TVM_CUDA_RUNTIME=OFF -Donnxruntime_TVM_USE_HASH=OFF -Donnxruntime_USE_MIGRAPHX=OFF -Donnxruntime_CROSS_COMPILING=OFF -Donnxruntime_DISABLE_CONTRIB_OPS=OFF -Donnxruntime_DISABLE_ML_OPS=OFF -Donnxruntime_DISABLE_RTTI=ON -Donnxruntime_DISABLE_EXCEPTIONS=OFF -Donnxruntime_MINIMAL_BUILD=OFF -Donnxruntime_EXTENDED_MINIMAL_BUILD=OFF -Donnxruntime_MINIMAL_BUILD_CUSTOM_OPS=OFF -Donnxruntime_REDUCED_OPS_BUILD=OFF -Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF -Donnxruntime_USE_DML=OFF -Donnxruntime_USE_WINML=OFF -Donnxruntime_BUILD_MS_EXPERIMENTAL_OPS=OFF -Donnxruntime_USE_TELEMETRY=OFF -Donnxruntime_ENABLE_LTO=OFF -Donnxruntime_USE_ACL=OFF -Donnxruntime_USE_ACL_1902=OFF -Donnxruntime_USE_ACL_1905=OFF -Donnxruntime_USE_ACL_1908=OFF -Donnxruntime_USE_ACL_2002=OFF -Donnxruntime_USE_ARMNN=OFF -Donnxruntime_ARMNN_RELU_USE_CPU=ON -Donnxruntime_ARMNN_BN_USE_CPU=ON -Donnxruntime_ENABLE_NVTX_PROFILE=OFF 
-Donnxruntime_ENABLE_TRAINING=OFF -Donnxruntime_ENABLE_TRAINING_OPS=OFF -Donnxruntime_ENABLE_TRAINING_TORCH_INTEROP=OFF -Donnxruntime_ENABLE_TRAINING_ON_DEVICE=OFF -Donnxruntime_ENABLE_CPU_FP16_OPS=OFF -Donnxruntime_USE_NCCL=OFF -Donnxruntime_BUILD_BENCHMARKS=OFF -Donnxruntime_USE_ROCM=OFF -DOnnxruntime_GCOV_COVERAGE=OFF -Donnxruntime_USE_MPI=ON -Donnxruntime_ENABLE_MEMORY_PROFILE=OFF -Donnxruntime_ENABLE_CUDA_LINE_NUMBER_INFO=OFF -Donnxruntime_BUILD_WEBASSEMBLY=OFF -Donnxruntime_BUILD_WEBASSEMBLY_STATIC_LIB=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_EXCEPTION_CATCHING=ON -Donnxruntime_ENABLE_WEBASSEMBLY_EXCEPTION_THROWING=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_THREADS=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_DEBUG_INFO=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_PROFILING=OFF -Donnxruntime_ENABLE_EAGER_MODE=OFF -Donnxruntime_ENABLE_LAZY_TENSOR=OFF -Donnxruntime_ENABLE_EXTERNAL_CUSTOM_OP_SCHEMAS=OFF -Donnxruntime_ENABLE_CUDA_PROFILING=OFF -Donnxruntime_ENABLE_ROCM_PROFILING=OFF -Donnxruntime_USE_XNNPACK=OFF -Donnxruntime_USE_CANN=OFF -Donnxruntime_BUILD_UNIT_TESTS=0 -Donnxruntime_DEV_MODE=ON '-DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded$<$<CONFIG:Debug>:Debug>' -DONNX_USE_MSVC_STATIC_RUNTIME=ON -Dprotobuf_MSVC_STATIC_RUNTIME=ON -Dgtest_force_shared_crt=OFF -Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF -G Ninja -Donnxruntime_ENABLE_MEMLEAK_CHECKER=OFF -DCMAKE_BUILD_TYPE=Debug
Patch found: C:/Program Files/Git/usr/bin/patch.exe
Use protobuf from submodule
Use date from submodule
Use mp11 from submodule
Use json from submodule
Use re2 from submodule
Use cpuinfo from submodule
Generated: D:/RustProjects/ort/target/debug/build/ort-976a8a4574b33aaa/out/onnxruntime/build/Debug/external/onnx/onnx/onnx-ml.proto
Generated: D:/RustProjects/ort/target/debug/build/ort-976a8a4574b33aaa/out/onnxruntime/build/Debug/external/onnx/onnx/onnx-operators-ml.proto
Generated: D:/RustProjects/ort/target/debug/build/ort-976a8a4574b33aaa/out/onnxruntime/build/Debug/external/onnx/onnx/onnx-data.proto
Use flatbuffers from submodule
fatal: No names found, cannot describe anything.
CMake Warning (dev) at D:/Program Files/CMake/share/cmake-3.25/Modules/FetchContent.cmake:1279 (message):
The DOWNLOAD_EXTRACT_TIMESTAMP option was not given and policy CMP0135 is
not set. The policy's OLD behavior will be used. When using a URL
download, the timestamps of extracted files should preferably be that of
the time of extraction, otherwise code that depends on the extracted
contents might not be rebuilt if the URL changes. The OLD behavior
preserves the timestamps from the archive instead, but this is usually not
what you want. Update your project to the NEW behavior or specify the
DOWNLOAD_EXTRACT_TIMESTAMP option with a value of true to avoid this
robustness issue.
Call Stack (most recent call first):
external/abseil-cpp.cmake:20 (FetchContent_Declare)
onnxruntime_common.cmake:112 (include)
CMakeLists.txt:2054 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
2023-01-11 13:57:06,106 util.run [DEBUG] - Subprocess completed. Return code: 0
2023-01-11 13:57:06,106 build [INFO] - Building targets for Debug configuration
2023-01-11 13:57:06,107 util.run [INFO] - Running subprocess in 'D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime'
'D:\Program Files\CMake\bin\cmake.EXE' --build 'build\Debug' --config Debug -- -j6
Traceback (most recent call last):
File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\ci_build\build.py", line 2812, in <module>
sys.exit(main())
File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\ci_build\build.py", line 2727, in main
build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\ci_build\build.py", line 1349, in build_targets
run_subprocess(cmd_args, env=env)
File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\ci_build\build.py", line 740, in run_subprocess
return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\python\util\run.py", line 49, in run
completed_process = subprocess.run(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['D:\\Program Files\\CMake\\bin\\cmake.EXE', '--build', 'build\\Debug', '--config', 'Debug', '--', '-j6']' returned non-zero exit status 1.
thread 'main' panicked at 'failed to build ONNX Runtime', build.rs:518:13
stack backtrace:
0: std::panicking::begin_panic_handler
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library\std\src\panicking.rs:575
1: core::panicking::panic_fmt
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library\core\src\panicking.rs:65
2: build_script_build::prepare_libort_dir
at .\build.rs:518
3: build_script_build::main
at .\build.rs:597
4: core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943\library\core\src\ops\function.rs:251
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
warning: build failed, waiting for other jobs to finish...
Process finished with exit code 101
When using the Visual Studio generator, the build passes just fine, but when building tests I get a ton of error LNK2001: unresolved external symbol errors, mostly for names containing Dbg and protobuf, though there are others as well.
All builds I tried were done after a clean.
I need help because I am out of ideas.
Hello,
I was running inference on a model (centerface) using the TensorRT execution provider. It did work, since TensorRT generated the engine files, but ExecutionProvider::tensorrt().is_available() returns false.
Environment:
Tested on nvcr.io/nvidia/tensorrt:22.12-py3 docker image.
ort version: 1.14.3
tensorrt version: 8.5.1
cuda version: 11.8
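Here's a snippet of the program I used to perform inference:
use std::sync::Arc;
use ndarray::Array;
use ndarray_rand::{rand_distr::Standard, RandomExt};
use ort::{
    tensor::{DynOrtTensor, InputTensor},
    Environment, ExecutionProvider, GraphOptimizationLevel, OrtResult, SessionBuilder,
};
fn main() -> OrtResult<()> {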
tracing_subscriber::fmt::init();
let environment = Arc::new(
Environment::builder()
.with_name("centerface")
.with_log_level(ort::LoggingLevel::Error)
.with_execution_providers([ExecutionProvider::tensorrt()
.with("trt_engine_cache_enable", "1")
.with("trt_engine_cache_path", "./cache")])
.build()?,
);
let session = SessionBuilder::new(&environment)?
.with_optimization_level(GraphOptimizationLevel::Level3)?
.with_intra_threads(1)?
.with_model_from_file("centerface.onnx")?;
let input = Array::<f32, _>::random((10, 3, 32, 32), Standard);
let _outputs: Vec<DynOrtTensor<ndarray::Dim<ndarray::IxDynImpl>>> =
session.run([InputTensor::from_array(input.into_dyn())])?;
println!("{:?}", ExecutionProvider::tensorrt().is_available());
Ok(())
}
Hi,
The title says it all. Android is an officially supported target for both onnxruntime and Rust.
Microsoft provides pre-built artifacts for Android on Maven.
Moreover, onnxruntime provides Android-specific execution providers; see Execution Providers.
Best
Musharraf
The `Value` type is defined like this:
pub enum Value<'v> {
RustOwned {
ptr: *mut sys::OrtValue,
array: DynArrayRef<'v>,
memory_info: MemoryInfo
},
CppOwned {
ptr: *mut sys::OrtValue,
session: Arc<SessionPointerHolder>
}
}
Enum variants and their fields are public, so downstream code can:
1. construct a `Value` from an arbitrary dangling pointer, or
2. create a `Value` using the API, and then change the contained pointer to point wherever.
None of those things require unsafe code, so this API is unsound. It could be fixed by wrapping the enum in another struct type, as a private field.
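A minimal sketch of the suggested fix, using hypothetical stand-in types rather than ort's real `Value` (the `*mut f32` payload and the `from_slice`/`as_slice` methods are illustrative assumptions): the enum becomes a private field of a public struct, so safe downstream code can neither build a variant from a dangling pointer nor overwrite the pointer afterwards.

```rust
mod api {
    use std::marker::PhantomData;

    // Private representation: code outside `api` cannot name these variants,
    // so it cannot construct them or overwrite their pointers.
    #[allow(dead_code)] // `CppOwned` only mirrors the shape of the original enum.
    enum ValueInner {
        RustOwned { ptr: *mut f32, len: usize },
        CppOwned { ptr: *mut f32, len: usize },
    }

    /// Public wrapper; its fields are private, so safe code can only build it via `from_slice`.
    pub struct Value<'v> {
        inner: ValueInner,
        // Ties the `Value` to the buffer it points into, like the `'v` in the original enum.
        _borrow: PhantomData<&'v mut [f32]>,
    }

    impl<'v> Value<'v> {
        /// Hypothetical constructor: the pointer always comes from a live, mutably borrowed slice.
        pub fn from_slice(data: &'v mut [f32]) -> Value<'v> {
            Value {
                inner: ValueInner::RustOwned { ptr: data.as_mut_ptr(), len: data.len() },
                _borrow: PhantomData,
            }
        }

        /// Read-only view of the data; no setter for the pointer is exposed.
        pub fn as_slice(&self) -> &[f32] {
            let (ptr, len) = match &self.inner {
                ValueInner::RustOwned { ptr, len } | ValueInner::CppOwned { ptr, len } => (*ptr, *len),
            };
            // SAFETY: `ptr`/`len` can only originate from `from_slice`, and `'v`
            // keeps that slice borrowed for as long as this `Value` exists.
            unsafe { std::slice::from_raw_parts(ptr as *const f32, len) }
        }
    }
}

fn main() {
    let mut buf = vec![1.0_f32, 2.0, 3.0];
    let value = api::Value::from_slice(&mut buf);
    // `api::ValueInner::RustOwned { .. }` or `value.inner = ...` would not compile here.
    println!("{:?}", value.as_slice());
}
```

The `PhantomData` field keeps the borrow-tracking role of the original `'v` lifetime, while privacy (not `unsafe`) is what closes the hole.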
onnxruntime supports training (see https://github.com/microsoft/onnxruntime-training-examples).
Is this possible with ORT?
If so, could someone give an example of how this might work?
My end goal is to see whether I can fine-tune models such as LLaMA using Rust, without jumping into Python.
When enabling the `load-dynamic` feature, the compiler complains about mismatched types (rustc 1.68.2):
error[E0308]: mismatched types
--> /../.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.4/src/execution_providers.rs:204:121
|
204 | let status = ortsys![unsafe UpdateTensorRTProviderOptions(tensorrt_options, key_ptrs.as_ptr(), value_ptrs.as_ptr(), keys.len())];
| ^^^^^^^^^^ expected `u64`, found `usize`
|
::: /../.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.4/src/lib.rs:130:18
|
130 | unsafe { $crate::ort().$method.unwrap()($($n),+) }
| ------------------------------ arguments to this function are incorrect
|
help: you can convert a `usize` to a `u64` and panic if the converted value doesn't fit
|
204 | let status = ortsys![unsafe UpdateTensorRTProviderOptions(tensorrt_options, key_ptrs.as_ptr(), value_ptrs.as_ptr(), keys.len().try_into().unwrap())];
| ++++++++++++++++++++
error[E0308]: mismatched types
--> /../.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.4/src/execution_providers.rs:258:21
|
258 | gpu_mem_limit: usize::MAX,
| ^^^^^^^^^^ expected `u64`, found `usize`
For more information about this error, try `rustc --explain E0308`.
error: could not compile `ort` due to 2 previous errors
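Both errors come from a `size_t` that the bindings expose as `u64` while ort passes a `usize`. The proper fix presumably belongs in the generated bindings themselves; the sketch below only illustrates the checked conversion the compiler suggests, with `update_provider_options` as a hypothetical stand-in for the real FFI call.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a bindgen-generated FFI function whose C `size_t`
/// parameter was emitted as `u64` instead of `usize`.
fn update_provider_options(keys: &[&str], values: &[&str], num_keys: u64) -> u64 {
    assert_eq!(keys.len(), values.len());
    num_keys
}

/// Convert a Rust length to the binding's integer type, panicking instead of
/// silently truncating if it ever does not fit (what `try_into().unwrap()` does).
fn to_ffi_len(len: usize) -> u64 {
    u64::try_from(len).expect("length does not fit in u64")
}

fn main() {
    let keys = ["trt_engine_cache_enable", "trt_engine_cache_path"];
    let values = ["1", "./cache"];
    let n = update_provider_options(&keys, &values, to_ffi_len(keys.len()));
    println!("passed {} key/value pairs", n);

    // `usize::MAX` fits in `u64` on both 32- and 64-bit targets, so this succeeds.
    let gpu_mem_limit: u64 = to_ffi_len(usize::MAX);
    println!("gpu_mem_limit = {}", gpu_mem_limit);
}
```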
Hi,
In my Rust code, there is a large difference in inference time between a `tch` model and an `onnx` session: the `tch` model is much faster than the ONNX session, and I wonder why.
In Python there was little difference between the two, but in Rust the gap is large.
I have no idea what causes this, and I really want to solve this problem.
Thank you,
ONNX Runtime supports `wasm` targets via its ONNX Runtime Web bindings.
According to this page, you can build a static lib of onnxruntime for `wasm` targets, which you can then bundle with your C++ WebAssembly project.
From what I understand, the `wasm` lib provides the same C API that the regular onnxruntime provides.
Supporting `wasm` targets would make it easy to deploy models built with `ort` to web browsers and to `wasm` runtimes such as Wasmtime and Wasmer.
Since wasm runtimes are cross-platform, users could bundle their models along with pre- and post-processing code in a single, universal executable module that runs in browsers or on any wasm runtime.
Hello,
I am trying to work with the `IoBindings` that were recently added, and I am facing a few issues. I could not find documentation or examples in the crate illustrating how this would work, so I am attempting to reproduce a minimal example written in Python with onnxruntime: the gist can be found here. I am attaching the tiny ONNX model file (net.zip) to this issue, but it can be recreated by running the notebook linked above.
Here are the current issues I am facing:
1. `IoBindings` is not public (see Line 19 in cb21def and its `(crate)` visibility modifier), but am I missing something on how the `IoBinding` should be created?
2. The `Drop` implementation for the `IoBindings` does not include the bound inputs and outputs. I believe `bind_input` needs to take ownership of its value in the current implementation:
pub fn bind_input<'a, 'b: 'a, 'c: 'b, S: AsRef<str> + Clone + Debug>(&'a mut self, name: S, ort_value: Value<'b>) -> OrtResult<()> {
[...]
}
Failed to get tensor type and shape: the ort_value must contain a constructed tensor or sparse tensor
The small Rust binary I am using for testing is included below for reference. This would require adding the tch dependency to access libtorch, please let me know if you have any issues doing so:
use anyhow;
use ndarray::{ArrayD, CowArray};
use ort::{AllocationDevice, AllocatorType, Environment, ExecutionProvider, GraphOptimizationLevel, IoBinding, MemoryInfo, MemType, SessionBuilder, Value};
use ort::tensor::OrtOwnedTensor;
fn main() -> anyhow::Result<()> {
tracing_subscriber::fmt::init();
let environment = Environment::builder()
.with_name("test")
.with_execution_providers([ExecutionProvider::CUDA(Default::default())])
.build()?
.into_arc();
let session = SessionBuilder::new(&environment)?
.with_optimization_level(GraphOptimizationLevel::Level1)?
.with_intra_threads(1)?.with_model_from_file("path/to/net.onnx")?;
let input_tensor = tch::Tensor::arange(16*2, (tch::Kind::Float, tch::Device::cuda_if_available())).view([16,2]);
// First option: ndarray
let input_array: ArrayD<f32> = input_tensor.as_ref().try_into()?;
let input_cow_array = CowArray::from(&input_array);
let output_array: OrtOwnedTensor<f32, _> = session.run(vec![Value::from_array(session.allocator(), &input_cow_array)?])?[0].try_extract()?;
println!("{:?}", output_array);
// Second option: IO Bindings
let mut io_bindings = IoBinding::new(&session)?;
let value = Value::from_array(session.allocator(), &input_cow_array)?;
let _ = io_bindings.bind_input("some_input", value)?;
let output_mem_info = MemoryInfo::new(AllocationDevice::CPU,0,AllocatorType::Device, MemType::Default)?;
let _ = io_bindings.bind_output("some_output", output_mem_info)?;
let outputs = io_bindings.outputs()?;
for (output_name, output_value) in outputs {
let output_array: OrtOwnedTensor<f32, _> = output_value.try_extract()?;
println!("{output_name}: {output_array:?}");
}
Ok(())
}
I have also tried extracting the values from the output memory info as follows:
for (_, output_value) in outputs {
let output_tensor = unsafe {
Tensor::from_blob(output_value.ptr() as *const u8, &[16, 5], &[5, 1], Kind::Float, Device::Cpu)
};
output_tensor.print();
}
but the values from the tensor are incorrect, so I guess the memory is not read from the right location.
Finally, I think it would eventually be great to be able to do something similar to the Python interface: create two torch tensors (input and placeholder output), register pointers to these tensors in the IO bindings, and have `session.run()` populate the output tensor. This would probably require allowing raw pointers to be passed to the IO bindings; maybe a "dangerous" module could be created for such use cases.
Thank you
Hello,
I am in the final stages of integrating ONNX support into a project via the `ort` bindings. Everything works well on my machine, and it works on the CI when pinning the dependency to `ort = {version="=1.14.3", optional = true, default-features = false, features = ["half"]}`. I downloaded the ONNX Runtime library manually and set my `ORT_DYLIB_PATH` to the downloaded `.dll`.
For the CI, I would like the download to happen automatically (since maintaining the download path per OS/version in GitHub Actions is a bit of a hassle). This was working fine with version `1.14.3`, as can be seen in this CI run. I cannot get the same behaviour with `1.14.6`; see for example this run.
Could you please help me understand how I should set my features/CI to allow a migration from `1.14.3` to `1.14.6`? For information, my actions file for ONNX tests is available here, and this is my Cargo.toml.
The session's `run` function creates many small `Vec`s, each of which performs a memory allocation.
As `run` may be used in the hot path of real-time software, where memory allocation is problematic, it would be nice to avoid them. It looks like these allocations could be done once at initialization time instead.
Some Rust-related reference: https://nnethercote.github.io/perf-book/heap-allocations.html#vec
If avoiding `Vec` creation is too difficult, the `smallvec` crate may be worth considering. With `SmallVec` we can specify an inline ("stack") capacity: if the vector grows beyond it, it allocates on the heap; otherwise it stays on the stack.
https://nnethercote.github.io/perf-book/heap-allocations.html#short-vecs
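One way to move these allocations to initialization time, sketched with std only: keep the scratch vectors in a long-lived struct and `clear()` them on each call, which keeps their capacity. `RunScratch` and `prepare` are hypothetical names, not part of ort, and the pointer types only stand in for whatever `run` actually passes to the C API.

```rust
use std::ptr;

/// Hypothetical per-session scratch space for the temporary arrays a `run` call
/// hands to the C API (input-value pointers and output-value slots).
struct RunScratch {
    input_ptrs: Vec<*const f32>,
    output_slots: Vec<*mut f32>,
}

impl RunScratch {
    /// Allocate once, when the session is created and its I/O counts are known.
    fn new(num_inputs: usize, num_outputs: usize) -> Self {
        RunScratch {
            input_ptrs: Vec::with_capacity(num_inputs),
            output_slots: Vec::with_capacity(num_outputs),
        }
    }

    /// Called on every `run`. `clear()` keeps the existing capacity, so nothing is
    /// reallocated as long as the counts stay within what `new` reserved.
    fn prepare(&mut self, inputs: &[&[f32]], num_outputs: usize) {
        self.input_ptrs.clear();
        self.input_ptrs.extend(inputs.iter().map(|buf| buf.as_ptr()));
        self.output_slots.clear();
        self.output_slots.resize(num_outputs, ptr::null_mut());
    }
}

fn main() {
    let a = [1.0_f32, 2.0];
    let b = [3.0_f32];
    let mut scratch = RunScratch::new(2, 1);
    let heap_before = scratch.input_ptrs.as_ptr();

    for _ in 0..3 {
        scratch.prepare(&[&a[..], &b[..]], 1);
    }

    // Same heap buffer as before the loop: no reallocation happened inside it.
    assert_eq!(heap_before, scratch.input_ptrs.as_ptr());
    println!("inputs: {}, outputs: {}", scratch.input_ptrs.len(), scratch.output_slots.len());
}
```

Compared with `SmallVec`, this avoids heap allocation entirely after setup rather than only for small counts, at the cost of making the scratch space part of the session's state.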
Hi,
The title says it all. Under `x86_64-pc-windows-msvc` targets everything works perfectly, but the build fails under `i686-pc-windows-msvc` targets.
Checking ort v1.15.0
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:88:56
|
88 | let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
| ---- ^^^^^^^^^^^^^ expected "C" fn, found "stdcall" fn
| |
| arguments to this enum variant are incorrect
|
= note: expected fn pointer `unsafe extern "C" fn(_, OrtLoggingLevel, _, _, _, _)`
found fn item `extern "stdcall" fn(_, OrtLoggingLevel, _, _, _, _) {custom_logger}`
= note: when the arguments and return types match, functions can be coerced to function pointers
help: the type constructed contains `extern "stdcall" fn(*mut c_void, OrtLoggingLevel, *const i8, *const i8, *const i8, *const i8) {custom_logger}` due to the type of the argument passed
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:88:51
|
88 | let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
| ^^^^^-------------^
| |
| this argument influences the type of `Some`
note: tuple variant defined here
--> C:\Users\user\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\core\src\option.rs:572:5
|
572 | Some(#[stable(feature = "rust1", since = "1.0.0")] T),
| ^^^^
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:99:56
|
99 | let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
| ---- ^^^^^^^^^^^^^ expected "C" fn, found "stdcall" fn
| |
| arguments to this enum variant are incorrect
|
= note: expected fn pointer `unsafe extern "C" fn(_, OrtLoggingLevel, _, _, _, _)`
found fn item `extern "stdcall" fn(_, OrtLoggingLevel, _, _, _, _) {custom_logger}`
= note: when the arguments and return types match, functions can be coerced to function pointers
help: the type constructed contains `extern "stdcall" fn(*mut c_void, OrtLoggingLevel, *const i8, *const i8, *const i8, *const i8) {custom_logger}` due to the type of the argument passed
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:99:51
|
99 | let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
| ^^^^^-------------^
| |
| this argument influences the type of `Some`
note: tuple variant defined here
--> C:\Users\user\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\core\src\option.rs:572:5
|
572 | Some(#[stable(feature = "rust1", since = "1.0.0")] T),
| ^^^^
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:203:57
|
203 | let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
| ---- ^^^^^^^^^^^^^ expected "C" fn, found "stdcall" fn
| |
| arguments to this enum variant are incorrect
|
= note: expected fn pointer `unsafe extern "C" fn(_, OrtLoggingLevel, _, _, _, _)`
found fn item `extern "stdcall" fn(_, OrtLoggingLevel, _, _, _, _) {custom_logger}`
= note: when the arguments and return types match, functions can be coerced to function pointers
help: the type constructed contains `extern "stdcall" fn(*mut c_void, OrtLoggingLevel, *const i8, *const i8, *const i8, *const i8) {custom_logger}` due to the type of the argument passed
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:203:52
|
203 | let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
| ^^^^^-------------^
| |
| this argument influences the type of `Some`
note: tuple variant defined here
--> C:\Users\user\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\core\src\option.rs:572:5
|
572 | Some(#[stable(feature = "rust1", since = "1.0.0")] T),
| ^^^^
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:778:20
|
778 | extract_io_count(f, session_ptr)
| ---------------- ^ expected "stdcall" fn, found "C" fn
| |
| arguments to this function are incorrect
|
= note: expected fn pointer `unsafe extern "stdcall" fn(_, _) -> _`
found fn pointer `unsafe extern "C" fn(_, _) -> _`
note: function defined here
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:786:5
|
786 | fn extract_io_count(
| ^^^^^^^^^^^^^^^^
787 | f: extern_system_fn! { unsafe fn(*const sys::OrtSession, *mut usize) -> *mut sys::OrtStatus },
| ---------------------------------------------------------------------------------------------
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:783:20
|
783 | extract_io_count(f, session_ptr)
| ---------------- ^ expected "stdcall" fn, found "C" fn
| |
| arguments to this function are incorrect
|
= note: expected fn pointer `unsafe extern "stdcall" fn(_, _) -> _`
found fn pointer `unsafe extern "C" fn(_, _) -> _`
note: function defined here
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:786:5
|
786 | fn extract_io_count(
| ^^^^^^^^^^^^^^^^
787 | f: extern_system_fn! { unsafe fn(*const sys::OrtSession, *mut usize) -> *mut sys::OrtStatus },
| ---------------------------------------------------------------------------------------------
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:802:19
|
802 | extract_io_name(f, session_ptr, allocator_ptr, i)
| --------------- ^ expected "stdcall" fn, found "C" fn
| |
| arguments to this function are incorrect
|
= note: expected fn pointer `unsafe extern "stdcall" fn(_, _, _, _) -> _`
found fn pointer `unsafe extern "C" fn(_, _, _, _) -> _`
note: function defined here
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:816:5
|
816 | fn extract_io_name(
| ^^^^^^^^^^^^^^^
817 | / f: extern_system_fn! { unsafe fn(
818 | | *const sys::OrtSession,
819 | | size_t,
820 | | *mut sys::OrtAllocator,
821 | | *mut *mut c_char,
822 | | ) -> *mut sys::OrtStatus },
| |__________________________________-
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:807:19
|
807 | extract_io_name(f, session_ptr, allocator_ptr, i)
| --------------- ^ expected "stdcall" fn, found "C" fn
| |
| arguments to this function are incorrect
|
= note: expected fn pointer `unsafe extern "stdcall" fn(_, _, _, _) -> _`
found fn pointer `unsafe extern "C" fn(_, _, _, _) -> _`
note: function defined here
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:816:5
|
816 | fn extract_io_name(
| ^^^^^^^^^^^^^^^
817 | / f: extern_system_fn! { unsafe fn(
818 | | *const sys::OrtSession,
819 | | size_t,
820 | | *mut sys::OrtAllocator,
821 | | *mut *mut c_char,
822 | | ) -> *mut sys::OrtStatus },
| |__________________________________-
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:839:45
|
839 | let (input_type, dimensions) = extract_io(f, session_ptr, i as _)?;
| ---------- ^ expected "stdcall" fn, found "C" fn
| |
| arguments to this function are incorrect
|
= note: expected fn pointer `unsafe extern "stdcall" fn(_, _, _) -> _`
found fn pointer `unsafe extern "C" fn(_, _, _) -> _`
note: function defined here
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:858:5
|
858 | fn extract_io(
| ^^^^^^^^^^
859 | / f: extern_system_fn! { unsafe fn(
860 | | *const sys::OrtSession,
861 | | size_t,
862 | | *mut *mut sys::OrtTypeInfo,
863 | | ) -> *mut sys::OrtStatus },
| |__________________________________-
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:850:46
|
850 | let (output_type, dimensions) = extract_io(f, session_ptr, i as _)?;
| ---------- ^ expected "stdcall" fn, found "C" fn
| |
| arguments to this function are incorrect
|
= note: expected fn pointer `unsafe extern "stdcall" fn(_, _, _) -> _`
found fn pointer `unsafe extern "C" fn(_, _, _) -> _`
note: function defined here
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:858:5
|
858 | fn extract_io(
| ^^^^^^^^^^
859 | / f: extern_system_fn! { unsafe fn(
860 | | *const sys::OrtSession,
861 | | size_t,
862 | | *mut *mut sys::OrtTypeInfo,
863 | | ) -> *mut sys::OrtStatus },
| |__________________________________-
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\lib.rs:97:87
|
45 | ... ($(#[$meta])* unsafe extern "stdcall" fn $($tt)*);
| ------ expected due to this
...
97 | ...em_fn! { unsafe fn () -> *const ffi::c_char } = (*base).GetVersionString.unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected "stdcall" fn, found "C" fn
|
= note: expected fn pointer `unsafe extern "stdcall" fn() -> _`
found fn pointer `unsafe extern "C" fn() -> _`
error[E0308]: mismatched types
--> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\lib.rs:112:78
|
45 | ($(#[$meta:meta])* unsafe fn $($tt:tt)*) => ($(#[$meta])* unsafe extern "stdcall" fn $($tt)*);
| ------ expected due to this
...
112 | let get_api: extern_system_fn! { unsafe fn(u32) -> *const sys::OrtApi } = (*base).GetApi.unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^ expected "stdcall" fn, found "C" fn
|
= note: expected fn pointer `unsafe extern "stdcall" fn(_) -> _`
found fn pointer `unsafe extern "C" fn(_) -> _`
For more information about this error, try `rustc --explain E0308`.
error: could not compile `ort` (lib) due to 11 previous errors
warning: build failed, waiting for other jobs to finish...
Best
Musharraf
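All eleven errors are the same ABI mismatch: on 32-bit Windows, `stdcall` and `C` are different calling conventions, so a pointer typed `extern "C"` cannot hold an `extern "stdcall"` function (on x86_64 the two coincide, which is why that target builds). The sketch below uses a hypothetical callback type with `i32` standing in for `OrtLoggingLevel`, and shows that declaring both sides with Rust's `extern "system"` ABI (which is `stdcall` on 32-bit x86 Windows and the C ABI elsewhere) keeps them consistent on every target. Whether ort's fix belongs in its bindings or its `extern_system_fn!` macro is not settled here.

```rust
use std::ffi::c_void;
use std::os::raw::c_char;
use std::ptr;
use std::sync::atomic::{AtomicI32, Ordering};

/// Hypothetical mirror of a C callback type declared with `stdcall` on Windows.
/// Using "system" on both the pointer type and the function keeps them in sync
/// on both `i686-pc-windows-msvc` and `x86_64-pc-windows-msvc`.
type LoggingFunction = Option<
    unsafe extern "system" fn(*mut c_void, i32, *const c_char, *const c_char, *const c_char, *const c_char),
>;

static LAST_SEVERITY: AtomicI32 = AtomicI32::new(-1);

/// The callback is defined with the same ABI string as the pointer type above.
extern "system" fn custom_logger(
    _param: *mut c_void,
    severity: i32,
    _category: *const c_char,
    _log_id: *const c_char,
    _code_location: *const c_char,
    _message: *const c_char,
) {
    // A real logger would read the C strings with `CStr::from_ptr`; this sketch only records the level.
    LAST_SEVERITY.store(severity, Ordering::SeqCst);
}

fn main() {
    // Compiles on every target because both sides use "system"; mixing "C" and
    // "stdcall" is what produces the E0308 errors above on i686.
    let logging_function: LoggingFunction = Some(custom_logger);
    if let Some(f) = logging_function {
        unsafe { f(ptr::null_mut(), 2, ptr::null(), ptr::null(), ptr::null(), ptr::null()) };
    }
    println!("last severity: {}", LAST_SEVERITY.load(Ordering::SeqCst));
}
```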
This is the log:
2023-07-22T14:57:03.018264Z INFO apply_execution_providers: ort::execution_providers: CUDA execution provider registered successfully
2023-07-22T14:57:03.053798Z ERROR apply_execution_providers: ort::execution_providers: CUDA execution provider registration failed: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1106 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\Project\yolov8_onnx_rust\target\debug\onnxruntime_providers_cuda.dll"
2023-07-22T14:57:03.054196Z INFO apply_execution_providers: ort::execution_providers: TensorRT execution provider registered successfully
2023-07-22T14:57:03.063023Z ERROR apply_execution_providers: ort::execution_providers: TensorRT execution provider registration failed: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1106 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\Project\yolov8_onnx_rust\target\debug\onnxruntime_providers_tensorrt.dll"
I use Windows 11 (build 22621.1992).
The program uses a DLL that it downloads automatically.
Thank you for creating this crate.
This error occurs during inference with the TensorRT EP, both in my own code and in the example code.
2023-04-15T15:47:24.013752Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Flush-to-zero and denormal-as-zero are off
2023-04-15T15:47:24.014425Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Creating and using per session threadpools since use_per_session_threads_ is true
2023-04-15T15:47:24.015015Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Dynamic block base set to 0
2023-04-15T15:47:24.202936Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Initializing session.
2023-04-15T15:47:24.203637Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Creating BFCArena for Cuda with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-04-15T15:47:24.204585Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Creating BFCArena for CudaPinned with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-04-15T15:47:24.205679Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Creating BFCArena for CUDA_CPU with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-04-15T15:47:24.206431Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Allocator already registered for OrtMemoryInfo:[name:Cuda id:0 OrtMemType:0 OrtAllocatorType:1 Device:[DeviceType:1 MemoryType:0 DeviceId:0]]. Ignoring allocator from CUDAExecutionProvider
2023-04-15T15:47:24.207074Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Allocator already registered for OrtMemoryInfo:[name:CudaPinned id:0 OrtMemType:-1 OrtAllocatorType:1 Device:[DeviceType:0 MemoryType:1 DeviceId:0]]. Ignoring allocator from CUDAExecutionProvider
2023-04-15T15:47:24.207709Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Allocator already registered for OrtMemoryInfo:[name:CUDA_CPU id:0 OrtMemType:-2 OrtAllocatorType:1 Device:[DeviceType:0 MemoryType:0 DeviceId:0]]. Ignoring allocator from CUDAExecutionProvider
2023-04-15T15:47:24.211005Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Total shared scalar initializer count: 8
2023-04-15T15:47:24.215380Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Total fused reshape node count: 0
2023-04-15T15:47:24.217219Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Total shared scalar initializer count: 0
2023-04-15T15:47:24.218852Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Total fused reshape node count: 0
2023-04-15T15:47:24.219866Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: [TensorRT EP] Model name is yolov5s_half.onnx
2023-04-15T15:47:25.570221Z INFO run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: [2023-04-15 15:47:25 WARNING] hDebInfo\_deps\onnx_tensorrt-src\onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
error: process didn't exit successfully: `target\debug\yolov5_onnx.exe -m .\yolov5s_half.onnx -i .\bus.jpg --half` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
2023-04-15T16:28:10.434694Z DEBUG ort::environment: Environment not yet initialized, creating a new one
2023-04-15T16:28:10.457540Z DEBUG ort::environment: Environment created env_ptr="0x22ad8786b70"
2023-04-15T16:28:10.458777Z INFO download_to{self=SessionBuilder { env: "GPT-2", allocator: Device, memory_type: Default } url="https://github.com/onnx/models/raw/main/text/machine_comprehension/gpt-2/model/gpt2-lm-head-10.onnx" download_dir="I:\\ort_test\\ort"}: ort::session: Model already exists, skipping download model_filepath="I:\\ort_test\\ort\\gpt2-lm-head-10.onnx"
2023-04-15T16:28:10.459654Z INFO apply_execution_providers: ort::execution_providers: TensorRT execution provider registered successfully
2023-04-15T16:28:10.652673Z INFO apply_execution_providers: ort::execution_providers: TensorRT execution provider registered successfully
2023-04-15T16:28:10.653144Z INFO apply_execution_providers: ort::execution_providers: TensorRT execution provider registered successfully
2023-04-15T16:28:17.079889Z INFO ort: [2023-04-15 16:28:17 WARNING] hDebInfo\_deps\onnx_tensorrt-src\onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
error: process didn't exit successfully: `target\debug\examples\gpt.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
I use the pre-compiled onnxruntime v1.14.1 downloaded from GitHub.
Environment:
It seems that the lifetime on `InMemorySession` is not necessary.
import torch
linear = torch.nn.Linear(2, 4)
x = torch.tensor([0.1, 0.2])
y = linear(x)
print(y)
torch.onnx.export(
linear,
(torch.rand(1, 2),),
'linear.onnx',
input_names=["x"],
output_names=['y'],
dynamic_axes={'x': {0: 'B'}, 'y': {0: 'B'}}
)
use std::sync::Arc;
use ndarray::{ArrayD, CowArray};
use ort::{tensor::OrtOwnedTensor, Environment, ExecutionProvider, SessionBuilder, Value};
fn main() {
let environment = Arc::new(
Environment::builder()
.with_execution_providers([ExecutionProvider::CPU(Default::default())])
.build()
.unwrap()
);
let mb = std::fs::read("linear.onnx").unwrap();
let session = SessionBuilder::new(&environment).unwrap().with_model_from_memory(&mb).unwrap();
drop(mb);
let x: ArrayD<f32> = ndarray::arr2(&[[0.1, 0.2]]).into_dyn();
let x = CowArray::from(x);
let outputs = session.run(vec![Value::from_array(session.allocator(), &x).unwrap()]).unwrap();
let y: OrtOwnedTensor<f32, _> = outputs[0].try_extract().unwrap();
dbg!(y.view().clone().into_dyn());
}
diff --git a/src/session.rs b/src/session.rs
index 2ef923f..c633613 100644
--- a/src/session.rs
+++ b/src/session.rs
@@ -481,7 +481,7 @@ impl SessionBuilder {
}
/// Load an ONNX graph from memory and commit the session.
- pub fn with_model_from_memory(self, model_bytes: &[u8]) -> OrtResult<InMemorySession<'_>> {
+ pub fn with_model_from_memory(self, model_bytes: &[u8]) -> OrtResult<Session> {
let mut session_ptr: *mut sys::OrtSession = std::ptr::null_mut();
let env_ptr: *const sys::OrtEnv = self.env.ptr();
@@ -533,7 +533,8 @@ impl SessionBuilder {
inputs,
outputs
};
- Ok(InMemorySession { session, phantom: PhantomData })
+ // Ok(InMemorySession { session, phantom: PhantomData })
+ Ok(session)
}
}
Hi there,
Thanks a lot for this amazing package. I wanted to ask: how do I learn to make something like this? Learning how to write Rust code is one thing, but could you recommend something specific for learning how to create wrappers like this?
Thanks a lot.
Cheers,
Fra
I have a model I've converted to ONNX format that is larger than 2GB. This results in several files rather than a single model.onnx. I'm grappling with an efficient approach to loading the model.
For other models I will:
use once_cell::sync::Lazy;
use std::{fs::File, io::Read, path::PathBuf};
pub static ENCODER_MODEL: Lazy<Vec<u8>> = Lazy::new(|| {
let model_path = PathBuf::from(&*GLOBAL_AI_INCLUDE_ROOT).join("encoder/model/model.onnx");
let mut file = File::open(model_path).unwrap();
let mut buffer = Vec::new();
file.read_to_end(&mut buffer).unwrap();
buffer
});
and then do something like:
pub struct SentenceEncoder<'s> {
session: InMemorySession<'s>,
}
impl SentenceEncoder<'_> {
pub fn new() -> SystemSyncResult<Self> {
let session = SessionBuilder::new(&ENVIRONMENT)?
.with_optimization_level(GraphOptimizationLevel::Level1)?
.with_model_from_memory(&*ENCODER_MODEL)?;
Ok(Self { session })
}
//more code....
}
I then have a dedicated thread pool that spawns a specified number of threads and reuses the sessions across calls so that inference can run in parallel. Callers send requests over a crossbeam channel to a watcher that prioritizes the incoming calls and dispatches them to a processing thread. The thread picks up the request and, based on its type, selects the appropriate session to run.
With most models the above strategy works great, and all sessions share the ENCODER_MODEL memory. This particular T5 model (a pretrained t5-large grammar synthesis model), however, ends up split into several files when I convert it to ONNX (model.onnx, onnx__MatMul_XXXX, shared.weights) because of the size limit of Google's Protocol Buffers serialization format. So the actual model.onnx file is very small, and the rest of the model's contents live in the other files.
Is there a way to pass this model into ort from memory, similar to what I'm doing with the other models? Is there something I'm overlooking here (I'm new to this crate)? And does the Microsoft runtime share the model representation in memory internally between sessions loaded from the same model file, so that multiple calls to with_model_from_file() won't result in unnecessary memory allocation? Or am I misunderstanding something?
Any help on this would be appreciated. Admittedly, I haven't finished implementing this T5 model in Rust yet, so my questions are a bit premature.
Thanks again for the work on this crate!
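The thread-pool design described in this issue can be sketched with the standard library alone. This is a hypothetical minimal version, not the poster's code: `std::sync::mpsc` stands in for crossbeam, a `BinaryHeap` plays the prioritizing watcher, and `FakeSession`, `Kind`, `Request`, and `run_pool` are illustrative names with a placeholder in place of a real ort session.

```rust
use std::collections::BinaryHeap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical request types; each worker owns one session per kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    Encode,
    Grammar,
}

/// Derived `Ord` compares `priority` first, so the max-heap pops the most urgent request.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Request {
    priority: u8,
    id: u32,
    kind: Kind,
}

/// Placeholder for an ort session built once per worker and reused.
struct FakeSession {
    kind: Kind,
}

impl FakeSession {
    fn run(&self, id: u32) -> String {
        format!("{:?} handled request {}", self.kind, id)
    }
}

fn run_pool(requests: Vec<Request>, workers: usize) -> Vec<(u32, String)> {
    let (job_tx, job_rx) = mpsc::channel::<Request>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<(u32, String)>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || {
                // Sessions are created once per worker, indexed by `Kind as usize`.
                let sessions = [
                    FakeSession { kind: Kind::Encode },
                    FakeSession { kind: Kind::Grammar },
                ];
                loop {
                    // The lock guard is dropped at the end of this statement,
                    // so inference below runs without holding the queue lock.
                    let job = job_rx.lock().unwrap().recv();
                    let Ok(req) = job else { break }; // queue closed: shut down
                    let session = &sessions[req.kind as usize];
                    res_tx.send((req.id, session.run(req.id))).unwrap();
                }
            })
        })
        .collect();
    drop(res_tx); // only the workers' clones remain

    // The watcher: drain the backlog in priority order onto the shared queue.
    let mut backlog: BinaryHeap<Request> = requests.into_iter().collect();
    while let Some(req) = backlog.pop() {
        job_tx.send(req).unwrap();
    }
    drop(job_tx); // lets idle workers exit once the queue is empty

    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec<(u32, String)> = res_rx.into_iter().collect();
    results.sort();
    results
}

fn main() {
    let requests = vec![
        Request { priority: 1, id: 1, kind: Kind::Encode },
        Request { priority: 9, id: 2, kind: Kind::Grammar },
        Request { priority: 5, id: 3, kind: Kind::Encode },
    ];
    for (id, output) in run_pool(requests, 2) {
        println!("{id}: {output}");
    }
}
```

Keeping one session per model kind inside each worker mirrors the "reuse sessions across calls" idea; whether several sessions over the same model bytes share weights internally is exactly the open question in this issue.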
Originally posted by dzhao December 30, 2022
Hi, first of all, great crate! It's much cleaner and more comprehensive.
I am ready to use it in one of my critical projects.
One thing, though: we serve on CPUs, and our CPUs would benefit greatly from the MKL library.
ONNX Runtime has a DnnlExecutionProvider that enables this support, but your crate doesn't seem to provide it.
I also checked the ONNX Runtime library downloaded on my Mac and didn't see the DNNL library, so it looks like it only uses the default MLAS CPU execution provider.
Is it possible to add this option?
Thanks so much for the work!
ort = { version = "1.15.2", features = ["load-dynamic", "tensorrt"] }
Environment::builder()
.with_execution_providers([ExecutionProvider::TensorRT(Default::default())])
.build()?
.into_arc();
ort::execution_providers: An error occurred when attempting to register `TensorrtExecutionProvider`: key/value cannot be empty
I spent some time poking around the onnxruntime and ort code. It looks like maybe onnxruntime is not getting the provider options somehow?
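One hypothesis consistent with this error message, not confirmed here: the wrapper may be forwarding options that were never set as empty strings, which ONNX Runtime then rejects. A defensive pattern is to build the key/value list only from options that actually have values. The sketch below is hypothetical (`TrtOptions` and `to_key_values` are illustrative names, not ort's API); the key names are taken from ONNX Runtime's TensorRT execution provider documentation as I understand it and should be checked against the ONNX Runtime version in use.

```rust
/// Hypothetical subset of TensorRT EP options; `None` means "left unset by the user".
#[derive(Debug, Default)]
struct TrtOptions {
    device_id: Option<u32>,
    max_workspace_size: Option<u64>,
    engine_cache_path: Option<String>,
}

impl TrtOptions {
    /// Build key/value pairs for only the options that were actually set,
    /// so no key or value string passed to ONNX Runtime is ever empty.
    fn to_key_values(&self) -> Vec<(&'static str, String)> {
        let mut kv = Vec::new();
        if let Some(id) = self.device_id {
            kv.push(("device_id", id.to_string()));
        }
        if let Some(size) = self.max_workspace_size {
            kv.push(("trt_max_workspace_size", size.to_string()));
        }
        // Treat an empty path the same as "not set" instead of forwarding "".
        if let Some(path) = self.engine_cache_path.as_deref().filter(|p| !p.is_empty()) {
            kv.push(("trt_engine_cache_path", path.to_string()));
        }
        kv
    }
}

fn main() {
    let opts = TrtOptions {
        device_id: Some(0),
        max_workspace_size: Some(1 << 30),
        engine_cache_path: Some(String::new()),
    };
    for (key, value) in opts.to_key_values() {
        println!("{key} = {value}");
    }
}
```

With `Default::default()`, as in the snippet above, this produces an empty list, so nothing empty would reach ONNX Runtime.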
Hi! Thanks for all your work!
I gave this crate a try, and while it's easy to use, sadly I'm getting different (and thus wrong) outputs from my models.
Here is a gist for rust with ort: https://gist.github.com/LoipesMas/2d342b8087dbae4af31d8af2752e84de
Here is a gist for python with onnxruntime: https://gist.github.com/LoipesMas/d7258a3d009e9b06c3684d77e341251b
(These use the SqueezeNet models from here, but I originally ran into this issue with a different, YOLOv7-based model.)
rust+ort:
[src/main.rs:29] &input.shape() = [
1,
3,
224,
224,
]
[src/main.rs:30] input.slice(s![0, .., 100, 100]) = [0.92156863, 0.9529412, 0.99607843], shape=[3], strides=[1], layout=CFcf (0xf), const ndim=1
[src/main.rs:31] input.slice(s![0, .., 180, 50]) = [0.9882353, 0.96862745, 0.95686275], shape=[3], strides=[1], layout=CFcf (0xf), const ndim=1
[src/main.rs:48] max_score = Some(
(
794,
0.06649929,
),
)
[src/main.rs:49] scores.slice(s![0, 322, .., ..]) = [[1.08950435e-5]], shape=[1, 1], strides=[0, 0], layout=CFcf (0xf), const ndim=2
python:
frame.shape=(1, 3, 224, 224)
frame[0, :, 100, 100]=array([0.92156863, 0.9529412 , 0.99607843], dtype=float32)
frame[0, :, 180, 50]=array([0.9882353 , 0.96862745, 0.95686275], dtype=float32)
max_score=0.11967179
np.where(scores >= max_score)=(array([0]), array([669]), array([0]), array([0]))
scores[0][322]=array([[4.9559356e-05]], dtype=float32)
The input shapes and values are the same (I'm almost positive), but the outputs are not even close (e.g. different argmax indices, and values that sometimes differ by an order of magnitude).
I'm not sure whether the issue is on my side (how I load the data, how I use this crate, or something else) or on the crate's side.
Since ort is based on onnxruntime-rs, this issue might be relevant (although probably not very helpful). Maybe those small errors add up somehow? No idea.
Thanks in advance!
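One thing worth ruling out, offered as a guess rather than a diagnosis: the logical shape `[1, 3, 224, 224]` can match while the underlying memory is still in HWC order, for example when the tensor is a permuted, non-contiguous ndarray view. Logical indexing such as `input.slice(s![0, .., 100, 100])` would still print the expected values, but if the crate version in use hands the array's raw memory to ONNX Runtime without first copying it into standard layout, the model would see scrambled pixels. With ndarray, `is_standard_layout()` and `as_standard_layout()` are the relevant checks. The std-only helpers below are hypothetical: one copies an interleaved RGB image into a contiguous NCHW buffer scaled to [0, 1], the other finds the argmax for comparison with Python's output.

```rust
/// Copy an interleaved RGB image (height x width x 3, u8) into a contiguous
/// NCHW f32 buffer of shape [1, 3, height, width], scaled to [0, 1].
fn hwc_u8_to_nchw_f32(pixels: &[u8], height: usize, width: usize) -> Vec<f32> {
    assert_eq!(pixels.len(), height * width * 3, "expected an RGB buffer");
    let mut out = vec![0.0f32; 3 * height * width];
    for y in 0..height {
        for x in 0..width {
            for c in 0..3 {
                // Source index in HWC order; destination index in CHW order.
                let src = (y * width + x) * 3 + c;
                let dst = c * height * width + y * width + x;
                out[dst] = pixels[src] as f32 / 255.0;
            }
        }
    }
    out
}

/// Index and value of the largest score (first one wins on ties).
fn argmax(scores: &[f32]) -> Option<(usize, f32)> {
    scores.iter().copied().enumerate().fold(None, |best, (i, v)| match best {
        Some((_, bv)) if bv >= v => best,
        _ => Some((i, v)),
    })
}

fn main() {
    // A 2x2 RGB image with pixel values 0..12, stored row by row as R, G, B.
    let pixels: Vec<u8> = (0..12).collect();
    let tensor = hwc_u8_to_nchw_f32(&pixels, 2, 2);
    // The red plane comes first in memory, then green, then blue.
    println!("{:?}", &tensor[..4]);
    println!("{:?}", argmax(&[0.1, 0.7, 0.2]));
}
```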
Hi, thanks for a great project!
I am currently developing a runtime for an RNN-based LLM: I convert the original model to ONNX and try to run it (or parts of it) using ONNX Runtime's hardware acceleration capabilities.
My main target is mobile, where running models without NNAPI on modern SoCs can mean a ~12x reduction in efficiency and performance.
Would it be hard to get NNAPI support working, and to provide a downloadable build that includes it? 👀
I'd certainly be willing to help with testing!
Hello. First of all, thank you for sharing this great repo!
My ONNX model takes a boolean array as input, but converting a boolean ndarray to an InputTensor doesn't seem to work (the same conversion with an f64 ndarray works fine).
I noticed that there is a PR (#58) that tries to fix this, but I also saw another PR (#41) showing that you are working on the next release.
My question is: will this be fixed in the next release? I tried to figure it out myself, but it was hard because the code structure seems to have changed.
Thank you!
I've noticed that it's a common practice in Rust to leave expensive features off by default. For example, many packages don't include their derive macros as default features because they increase compile times.
I think it might be good to remove fetch-models from the default features to match this convention.
What do you think? Any objections?
Hello 👋
Is there a way to link the ONNX Runtime library statically (.a, .lib) into the output binary? This would simplify deployment!
You can find static builds of the ONNX runtime here: https://github.com/supertone-inc/onnxruntime-build/tree/main
Cheers,
Raphael
Hey there,
When compiling for Windows 1809 with DirectML, using this Cargo dependency:
[dependencies.ort]
version = "1.15.2"
features = ["download-binaries", "directml"]
I get the following error:
error: linking with `link.exe` failed: exit code: 1120
...
= note: libort-823b36f3ee700ecc.rlib(ort-823b36f3ee700ecc.ort.b1d99b0d4efe1b77-cgu.11.rcgu.o) : error LNK2019: unresolved external symbol OrtSessionOptionsAppendExecutionProvider_DML referenced in function _ZN3ort19execution_providers17ExecutionProvider5apply17h70344b19a4319081E
Could the reason be that DirectML is not installed on Windows 1809? I tried to install it with
nuget install Microsoft.AI.DirectML -Version 1.12.1
Is there another, better way? I cannot upgrade the Windows version at the moment.
Thanks!
./program: error while loading shared libraries: libonnxruntime.so.1.15.1: cannot open shared object file: No such file or directory
Cargo.toml
[package]
name = "antivirus"
version = "0.1.0"
edition = "2021"
[profile.dev]
rpath = true
[profile.release]
rpath = true
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
ort = "1.15.2"
ndarray = "0.15.6"
Output of ls in program/target/release:
antivirus
antivirus.d
build
deps
examples
incremental
libonnxruntime.so
libonnxruntime.so.1.15.1
Is there any way to fix this so I can share the binary and run it without cargo run?
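A commonly used workaround on Linux, offered as a sketch rather than a confirmed fix for this crate: add a `build.rs` that asks the linker to embed `$ORIGIN` as a runtime library search path, then ship `libonnxruntime.so.1.15.1` in the same directory as the binary. As far as I know, the `rpath = true` profile setting only covers Rust dylib dependencies, not native libraries like this one. The `rpath_directive` helper is an illustrative name that only exists to make the emitted line easy to check.

```rust
// build.rs (sketch): embed `$ORIGIN` in the binary's runtime search path so the
// dynamic loader also looks in the executable's own directory for shared libraries.
fn rpath_directive() -> &'static str {
    // `$ORIGIN` is expanded by the Linux dynamic loader at run time, not by a shell;
    // `rustc-link-arg-bins` applies the linker argument to binary targets only.
    "cargo:rustc-link-arg-bins=-Wl,-rpath,$ORIGIN"
}

fn main() {
    println!("{}", rpath_directive());
}
```

After rebuilding, `readelf -d target/release/antivirus` should list `$ORIGIN` under RPATH or RUNPATH; copy the binary together with `libonnxruntime.so.1.15.1` to the target machine.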