pykeio / ort

Fast ML inference & training for Rust with ONNX Runtime

Home Page: https://ort.pyke.io/

License: Apache License 2.0

Rust 99.26% Python 0.74%

ai inference machine-learning onnxruntime rust onnx

ort's Issues

Conversion to/from `tch` Tensors

Hello,

Thank you again for building these bindings.

I am working on integrating ONNX support for a project I have been working on (rust-bert). I have most existing pipelines working (from classification to text generation), but I am observing a severe performance degradation compared to the Libtorch backend using tch bindings.

Most of the pipeline logic is written using tch tensors and I was hoping to be able to re-use most of this logic for ONNX models. I suspect the performance hit comes from the conversion between tch::Tensor and ort::tensor::InputTensor.

The conversion I am currently using generally follows these steps:

1. tch to ort:

1. tch::Tensor to `Vec`
2. `Vec` to `ndarray::ArrayD`
3. `ndarray::ArrayD` to `InputTensor`

The actual implementation looks like

let mut vec = vec![T::ZERO; num_elem];
tch_tensor.f_to_kind(T::KIND)?.f_copy_data(&mut vec, num_elem)?;
let shape: Vec<usize> = tch_tensor.size().iter().map(|s| *s as usize).collect();
let array = ndarray::ArrayD::from_shape_vec(ndarray::IxDyn(&shape), vec)?;
let input_tensor = InputTensor::from_array(array);

2. ort to tch:

1. Extract array from `DynOrtTensor`
2. Convert array to slice
3. Create tensor from slice

The actual implementation looks like

let array = dyn_ort_tensor.try_extract::<f32>()?.view().to_owned();
let shape: Vec<i64> = array.shape().iter().map(|s| *s as i64).collect();
let slice = array.as_slice().unwrap();
let tensor = Tensor::f_of_slice(slice)?;
tensor.f_reshape(shape)

This involves a lot of copying and memory allocation (especially given the intermediate slice representation). I was hoping to convert tch tensors to ort tensors ideally without a copy (creating the target from the data pointer of the source), or at least without going through the intermediate slices.
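To make the difference between the copying path and a zero-copy path concrete, here is a minimal sketch using only the standard library. `TensorView`, `view_of`, and `copy_of` are hypothetical names for illustration; neither tch nor ort is claimed to expose them, and a real bridge would also have to agree on dtype, contiguity, and lifetime.

```rust
/// Hypothetical borrowed tensor: shape metadata plus a slice into someone else's buffer.
#[derive(Debug)]
pub struct TensorView<'a, T> {
    pub shape: Vec<usize>,
    pub data: &'a [T],
}

/// Builds a view without copying the elements.
/// Returns None if the shape does not match the buffer length.
pub fn view_of<'a, T>(data: &'a [T], shape: &[usize]) -> Option<TensorView<'a, T>> {
    let expected: usize = shape.iter().product();
    if expected != data.len() {
        return None;
    }
    Some(TensorView { shape: shape.to_vec(), data })
}

/// The copying path, for comparison: allocates a new buffer of the same size.
pub fn copy_of<T: Clone>(data: &[T]) -> Vec<T> {
    data.to_vec()
}
```

The view keeps pointing at the source allocation, so only the small shape vector is allocated; the copying path allocates and fills a second buffer on every conversion.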

I have tried a few things on the tch side, including creating a Tensor from a ndarray skipping the slice creation, but this still copies data over and I am unsure if there would be a better way of doing so.

impl<T: Element + Copy> TryInto<ndarray::ArrayD<T>> for &Tensor {
    type Error = TchError;

    fn try_into(self) -> Result<ndarray::ArrayD<T>, Self::Error> {
        let num_elem = self.numel();
        let shape: Vec<usize> = self.size().iter().map(|s| *s as usize).collect();
        let array = unsafe {
            let mut array = ndarray::ArrayD::uninit(ndarray::IxDyn(&shape));
            at_copy_data(
                self.to_kind(T::KIND).as_mut_ptr(),
                array.as_mut_ptr() as *const c_void,
                num_elem,
                T::KIND.elt_size_in_bytes(),
            );
            array.assume_init()
        };
        Ok(array)
    }
}

I understand you may not be fully familiar with the tch project - any hints on the way forward would be appreciated.

For information, the ONNX implementation I am working on is on guillaume-be/rust-bert#346

Thank you!

Support for sequence/map types

I'm trying to execute the code below

use ort::{
    tensor::{
        InputTensor,
        DynOrtTensor,
        FromArray,
        OrtOwnedTensor
    },
    Environment,
    LoggingLevel,
    SessionBuilder,
    OrtResult
};
use polars::{
    datatypes::Float32Type,
    prelude::*
};
use ndarray::IxDyn;


fn main () -> OrtResult<()> {

    // Reading the dataframe using Polars
    let dataframe = CsvReader::from_path("random_df.csv")
        .unwrap()
        .has_header(false)
        .finish()
        .unwrap()
        .to_ndarray::<Float32Type>()
        .unwrap();

    // Creating the environment
    let environment = Environment::builder()
	    .with_name("random_df_environment")
	    .with_log_level(LoggingLevel::Warning)
	    .build()?
	    .into_arc();

    // Creating the session
    let session = SessionBuilder::new(&environment)?
	    .with_model_from_file("random_df.onnx")?;

    let input = vec![
        InputTensor::from_array(dataframe.into_dyn())
    ];

    let outputs: Vec<DynOrtTensor<ndarray::Dim<ndarray::IxDynImpl>>> = session
        .run(input)
        .unwrap();
    let scores = &outputs[0];
    let scores: OrtOwnedTensor<'_, i64, IxDyn> = scores.try_extract()?;
    let scores = scores.view();
    println!("{:}", scores);
    Ok(())
}

But I'm getting the error PointerShouldBeNull("CastTypeInfoToTensorInfo").
I searched for it but didn't find anything. Could anyone help?

Multiple GPUs/ExecutionDevices: How to select one for inference?

Hey,

first of all: thanks for the nice repo!

When there are multiple CUDA, DirectML, or OpenVINO devices, how do I select the one I want to use?

Thanks for any help :)

Running DirectML execution provider with onnxruntime.dll version 1.15.0 and 1.15.1 fails

CODE:-

let session = SessionBuilder::new(&environment).unwrap()
    .with_optimization_level(GraphOptimizationLevel::Level1).unwrap()
    .with_intra_threads(1).unwrap()
    .with_execution_providers([
        ExecutionProvider::DirectML(DirectMLExecutionProviderOptions{device_id : 0})
    ]).unwrap()
    .with_model_downloaded(ImageClassification::ResNet(ort::download::vision::ResNet::V2(ort::download::vision::ResNetV2::ResNet50)))
    .expect("Could not download model from file");

CMD:-
$env:RUST_LOG = 'ort=debug';$env:ORT_STRATEGY = 'system'; $env:ORT_LIB_LOCATION = 'C:\Sandbox\rust-workspace\rust-ort\runtime';cargo run

"C:\Sandbox\rust-workspace\rust-ort\runtime" contains

"onnxruntime.dll" version 1.15.0
"onnxruntime.lib"

CMD:-
$env:RUST_LOG = 'ort=debug';$env:ORT_STRATEGY = 'system'; $env:ORT_LIB_LOCATION = 'C:\Sandbox\rust-workspace\rust-ort\runtime\1.15.1';cargo run

"C:\Sandbox\rust-workspace\rust-ort\runtime\1.15.1" contains

"onnxruntime.dll" version 1.15.1
"onnxruntime.lib"

nuget was downloaded from here
https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.DirectML/1.15.0
https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.DirectML/1.15.1

ERROR:-
Finished dev [unoptimized + debuginfo] target(s) in 5.18s
Running target\debug\rust-ort.exe
error: process didn't exit successfully: target\debug\rust-ort.exe (exit code: 0xc0000138, STATUS_ORDINAL_NOT_FOUND)

HowTo - Notes to use pinned host buffers for cuda and tensorrt

I'm using the v2 branch for this, but the below is what is currently needed to get cudaHostRegister pinned buffers working.

Link in CUDA runtime

// cudaError_t is enum #[repr(u32)]
#[link(name = "cudart", kind = "dylib")]
extern "C" {
    pub fn cudaHostRegister(ptr: *mut ::std::os::raw::c_void, size: usize, flags: ::std::os::raw::c_uint) -> cudaError_t;
    pub fn cudaHostUnregister(ptr: *mut ::std::os::raw::c_void) -> cudaError_t;
}

Alloc your buffer and register it with cuda

let mut data1 = vec![0_u8; 16*1536*2048];
unsafe { cudaHostRegister(data1.as_mut_ptr() as _, data1.len(), cudaHostRegisterDefault) };

Create the OrtValue - but can't use Value::from_array as it clones the data every time

// o_mem and o_value wrap calls to CreateMemoryInfo and CreateTensorWithDataAsOrtValue
let shape = vec![16_i64, 1, 1536, 2048];
let mem_ptr = o_mem(ort::AllocationDevice::CPU, 0, ort::AllocatorType::Device, ort::MemType::CPUInput);
let input_tensor = unsafe { Value::from_raw(o_value(&mut data1, &shape, mem_ptr), session.inner()) };
bind.bind_input("images", input_tensor).unwrap();
bind.run().unwrap();

Performance difference

Using 50MB input buffers. PINNED buffer saves 1ms or 1.95%. Avoiding extra copy from ort::from_array saves 19ms or 27%. Model is a yolov8m with custom starting layer for debayering and resize. Running on Quadro RTX 4000.

nvprof - compare with/without cudaHostRegister - 100 iterations
    Time   Name
643.54ms   [CUDA memcpy HtoD] TensorRT with
747.56ms   [CUDA memcpy HtoD] TensorRT without
659.65ms   [CUDA memcpy HtoD] CUDA with
760.43ms   [CUDA memcpy HtoD] CUDA without

nvsys analyze reports on PAGED async transfers without cudaHostRegister

Criterion results - pinned vs ort::from_raw() vs standard ort::from_array()

forward_mymodel_onnx_cuda_pinned
                        time:   [80.781 ms 80.847 ms 80.910 ms]
forward_mymodel_onnx_cuda_ort_fromraw
                        time:   [81.675 ms 81.856 ms 82.093 ms]
forward_mymodel_onnx_cuda_ort_fromarray
                        time:   [100.94 ms 101.06 ms 101.20 ms]

forward_mymodel_onnx_trt_pinned
                        time:   [49.893 ms 49.950 ms 50.007 ms]
forward_mymodel_onnx_trt_ort_fromraw
                        time:   [50.793 ms 50.943 ms 51.175 ms]
forward_mymodel_onnx_trt_ort_fromarray
                        time:   [69.574 ms 69.701 ms 69.833 ms]
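As a quick check of the headline figures, the savings can be recomputed from the Criterion medians above. The numbers quoted in the text (1 ms or 1.95%, 19 ms or 27%) appear to match the TensorRT runs; that mapping is my reading of the data, not something the post states. `saving` is a hypothetical helper.

```rust
/// Returns (absolute saving in ms, relative saving in percent)
/// of `improved_ms` measured against `baseline_ms`.
pub fn saving(baseline_ms: f64, improved_ms: f64) -> (f64, f64) {
    let abs = baseline_ms - improved_ms;
    (abs, abs / baseline_ms * 100.0)
}
```

With the TensorRT medians, `saving(50.943, 49.950)` gives roughly 0.99 ms (about 1.95%) for pinning, and `saving(69.701, 50.943)` gives roughly 18.8 ms (about 26.9%) for avoiding the extra copy. With the CUDA medians, `saving(101.06, 81.856)` gives roughly 19.2 ms (about 19.0%).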

How to support dynamic input size?

In super-resolution tasks, pure convolution is often used in order to adapt to different resolutions.

How should such a network be supported?

For example, this network accepts N * 3 * 142 * 142:

noise3_model.zip

I still get an error after removing the height and width constraints.

pub fn new(runtime: &Arc<Environment>, models: &Path) -> OrtResult<Self> {
    let mut session = make_session(runtime, models)?;
    match session.inputs.get_mut(0) {
        Some(s) => cancel_dimension(s, &[2, 3]),
        None => {panic!("")}
    };
    Ok(Self { session })
}
pub fn make_session(runtime: &Arc<Environment>, model: &Path) -> OrtResult<Session> {
    let build = SessionBuilder::new(&runtime)?
        .with_execution_providers(&[ExecutionProvider::cuda(), ExecutionProvider::cpu()])?
        .with_model_from_file(model)?;
    Ok(build)
}
pub fn cancel_dimension(input: &mut Input, dimensions: &[usize]) {
    for dim in dimensions {
        match input.dimensions.get_mut(*dim) {
            Some(s) => *s = None,
            None => {}
        }
    }
}

got error:

[Input { name: "input", input_type: Float32, dimensions: [None, Some(3), None, None] }]
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: 
    Got invalid dimensions for input: 
        input for the following indices
        index: 2 Got: 650 Expected: 142
        index: 3 Got: 926 Expected: 142
    Please fix either the inputs or the model.
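The error still reports `Expected: 142` even though the Rust-side metadata prints `None` for those dimensions. One plausible reading (an assumption, not confirmed in this thread) is that the check runs against the shape stored in the model file, so the model itself would need to be exported with dynamic axes. The sketch below only illustrates how a fixed dimension differs from a dynamic one during such a check; `mismatched_dims` is a hypothetical helper, not ort or ONNX Runtime code, and it does not check rank.

```rust
/// Compares a declared input shape with an actual one.
/// `None` marks a dynamic dimension that accepts any size.
/// Returns (index, got, expected) for every fixed dimension that differs.
pub fn mismatched_dims(declared: &[Option<u32>], actual: &[usize]) -> Vec<(usize, usize, u32)> {
    declared
        .iter()
        .zip(actual)
        .enumerate()
        .filter_map(|(i, (d, &got))| match d {
            Some(expected) if *expected as usize != got => Some((i, got, *expected)),
            _ => None,
        })
        .collect()
}
```

With the declared shape `[None, Some(3), Some(142), Some(142)]`, an input of `[1, 3, 650, 926]` is rejected at indices 2 and 3, as in the message above. With `[None, Some(3), None, None]` the same input passes.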

problems building on x86 on windows

Hello.

I am encountering problems while building ort for x86 on windows.

error[E0277]: unsafe extern "stdcall" fn(*const OrtCustomOp, usize) -> OrtCustomOpInputOutputCharacteristic doesn't implement `Debug`
--> C:\Users\beqap\.cargo\registry\src\github.com-1ecc6299db9ec823\ort-1.15.1\src\sys.rs:2965:2
|
2952 | #[derive(Debug, Copy, Clone)]
| ----- in this derive macro expansion
...
2965 | pub GetInputCharacteristic: ::std::option::Option<_system!(unsafe fn(op: *const OrtCustomOp, index: size_t) -> OrtCustomOpInputOutputCharacteristic...
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `unsafe extern "stdcall" fn(*const OrtCustomOp, usize) -> OrtCustomOpInputOutputCharacteristic` cannot be formatted using `{:?}` because it doesn't implement `Debug`
|
= help: the trait `Debug` is not implemented for `unsafe extern "stdcall" fn(*const OrtCustomOp, usize) -> OrtCustomOpInputOutputCharacteristic`
= note: this error originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)

How to convert DynamicImage -> InputTensor?

session.run accepts &[InputTensor].

I read an image using the image crate and got a DynamicImage. How do I convert it to an InputTensor?

pub fn make_input_tensor(image: &RgbImage) -> InputTensor {
    let (width, height) = image.dimensions();
    let (width, height) = (width as usize, height as usize);
    let channels = 3;
    let shape = (width, height, channels);
    let array = Array::from_shape_vec(shape, image.as_raw().to_vec()).unwrap();
    // TODO: WHC -> NCHW,  transpose(0,2,1).unsqueeze(1)?
    // Error: expected IxDyn, found Dim<[Ix; 3]>
    InputTensor::Uint8Tensor(array)
}
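The layout change asked about in the TODO can be sketched with the standard library alone. This assumes the raw RGB8 buffer is row-major and interleaved (height x width x 3), which is how I understand the image crate to store pixels, and that the model wants unnormalized `f32` values in `[1, 3, height, width]` order; both are assumptions to verify against the actual model. `hwc_to_nchw` is a hypothetical helper.

```rust
/// Converts an interleaved RGB8 buffer (row-major, height x width x 3)
/// into a planar f32 buffer laid out as [1, 3, height, width].
/// Returns None if the buffer length does not match the given size.
pub fn hwc_to_nchw(raw: &[u8], width: usize, height: usize) -> Option<(Vec<f32>, [usize; 4])> {
    const CHANNELS: usize = 3;
    if raw.len() != width * height * CHANNELS {
        return None;
    }
    let mut out = vec![0.0f32; raw.len()];
    for y in 0..height {
        for x in 0..width {
            for c in 0..CHANNELS {
                // Source index: pixels are interleaved as R, G, B per position.
                let src = (y * width + x) * CHANNELS + c;
                // Destination index: one full plane per channel.
                let dst = c * height * width + y * width + x;
                out[dst] = raw[src] as f32;
            }
        }
    }
    Some((out, [1, CHANNELS, height, width]))
}
```

The returned flat vector and shape can then be handed to whatever array type the session input expects, for example a dynamically dimensioned array built from a shape and a `Vec`.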

Compilation Fails on release 1.15.0

Steps to recreate:

1. Create a new lib package using cargo new --lib
2. In Cargo.toml add dependency ort = { version = "1.15.0", default-features = false, features = ["load-dynamic"] }
3. Build the lib using cargo b.

I am getting the following error:

   Compiling ort v1.15.0
error[E0308]: mismatched types
   --> /Users/tushar/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-1.15.0/src/execution_providers.rs:614:22
    |
614 |                     num_of_threads: options.num_threads,
    |                                     ^^^^^^^^^^^^^^^^^^^ expected `u64`, found `usize`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `ort` (lib) due to previous error

I am on a MacBook Air M1 with macOS v13.4.1

Does "Session" implement clone?

ort/src/session.rs

Line 557 in 561870a

pub struct Session {

#[derive(Debug,Clone)]
pub struct Session {
//...
}

Support for scalar inputs

Right now running a session on a model that has a scalar input (0 dimension array) fails. I think these are rare, but one example is the silero-vad ONNX model which takes sample rate as a scalar input. Here's a minimum reproducible example:

use std::sync::Arc;

use ndarray::{arr0, Array};
use ort::{
    tensor::{DynOrtTensor, FromArray, InputTensor, OrtOwnedTensor},
    Environment, ExecutionProvider, GraphOptimizationLevel, OrtResult, SessionBuilder,
};

fn main() -> OrtResult<()> {
    let environment = Arc::new(
        Environment::builder()
            .with_name("silero-vad")
            .with_execution_providers([ExecutionProvider::cpu()])
            .build()?,
    );

    let session = SessionBuilder::new(&environment)?
        .with_optimization_level(GraphOptimizationLevel::Level1)?
        .with_intra_threads(1)?
        .with_model_from_file("./silero-vad.onnx")?;
    let inputs = vec![
        InputTensor::from_array(Array::<f32, _>::zeros([1, 512]).into_dyn()),
        // 0-dim input //
        InputTensor::from_array(arr0::<i64>(16000).into_dyn()),
        InputTensor::from_array(Array::<f32, _>::zeros([2, 1, 64]).into_dyn()),
        InputTensor::from_array(Array::<f32, _>::zeros([2, 1, 64]).into_dyn()),
    ];

    let result: Vec<DynOrtTensor<ndarray::Dim<ndarray::IxDynImpl>>> = session.run(inputs).unwrap();
    let vad: OrtOwnedTensor<f32, _> = result[0].try_extract().unwrap();
    println!("VAD: {:?}", vad);

    Ok(())
}

Running this will result in a runtime error:

thread 'main' panicked at 'assertion failed: `(left != right)`
  left: `0`,
 right: `0`', /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.13.3/src/session.rs:627:5

After removing the dimension assertion at that line, it runs correctly:

VAD: OrtOwnedTensor { data: TensorPtr { ptr: TensorPointerHolder { tensor_ptr: 0x55c22ac452d0 }, array_view: [[0.041475803]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), dynamic ndim=2 } }

I'm not really sure what other effects removing that assertion would have here, but I'm happy to open a PR

ort/src/session.rs

Lines 624 to 632 in e4376dc

unsafe fn get_tensor_dimensions(tensor_info_ptr: *const sys::OrtTensorTypeAndShapeInfo) -> OrtResult<Vec<i64>> {
    let mut num_dims = 0;
    ortsys![GetDimensionsCount(tensor_info_ptr, &mut num_dims) -> OrtError::GetDimensionsCount];
    assert_ne!(num_dims, 0);
    let mut node_dims: Vec<i64> = vec![0; num_dims as _];
    ortsys![GetDimensions(tensor_info_ptr, node_dims.as_mut_ptr(), num_dims) -> OrtError::GetDimensions];
    Ok(node_dims)
}
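A scalar is a tensor with an empty shape that still holds exactly one element, which is why a dimension count of zero can be valid. The sketch below shows that arithmetic in isolation; `element_count` is a hypothetical helper and not part of ort.

```rust
use std::convert::TryFrom;

/// Number of elements described by a shape.
/// An empty shape is a scalar and holds one element (the empty product is 1).
/// Returns None for negative (symbolic) dimensions or on overflow.
pub fn element_count(dims: &[i64]) -> Option<usize> {
    dims.iter().try_fold(1usize, |acc, &d| {
        let d = usize::try_from(d).ok()?;
        acc.checked_mul(d)
    })
}
```

For the inputs in the example above, `element_count(&[])` is `Some(1)` for the sample-rate scalar, `element_count(&[1, 512])` is `Some(512)`, and `element_count(&[2, 1, 64])` is `Some(128)`.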

Provide prebuilt ONNX Runtime binaries with more execution providers support

NuGet Gallery provides more pre-compiled runtimes, which can be obtained by unpacking nupkg file.

c headers in /build/native/include/
runtimes in /runtimes/

Microsoft.ML.OnnxRuntime
All platform runtime.
Microsoft.ML.OnnxRuntime.Gpu
Windows/Linux x64, built-in CUDA TensorRT support.
Microsoft.ML.OnnxRuntime.DirectML
Windows all architecture, built-in DirectML support.

Output goes wrong after being returned by a function

Hello,
I got a weird result that the output from the softmax function does not sum up to 1.

To address this, I have created a minimal reproducible example demonstrating the bug. You can find it at the following repo:
https://github.com/hobincar/minimal_reproducible_example_for_ort/tree/main

use std::path::Path;
use std::sync::Arc;
use std::vec::Vec;

use ndarray::{CowArray, Dim, IxDynImpl, arr1};
use ort::{
    tensor::OrtOwnedTensor,
    Environment, ExecutionProvider, GraphOptimizationLevel, OrtResult, SessionBuilder, Value
};


fn func() -> OrtResult<OrtOwnedTensor<'static, f32, Dim<IxDynImpl>>> {
    let environment = Arc::new(
        Environment::builder()
            .with_execution_providers([ExecutionProvider::CPU(Default::default())])
            .build()?
    );
    let session = SessionBuilder::new(&environment)?
        .with_optimization_level(GraphOptimizationLevel::Disable)?
        .with_model_from_file(Path::new(&String::from("softmax.onnx")))?;

    let input = CowArray::from(arr1(&[1f32, 2f32, 3f32, 4f32])).into_dyn();

    let output: Vec<Value> = session.run(vec![
        Value::from_array(session.allocator(), &input)?,
    ])?;
    let output: OrtOwnedTensor<f32, _> = output[0].try_extract()?;
    println!("[1] output: {:?}", output);

    let output = Ok(output);
    println!("[2] output: {:?}", output);

    output
}


fn main() {
    let output = func();

    println!("[3] output: {:?}", output);
}

The onnx model utilized is simple, consisting solely of a softmax operation:

Initially, the output appears correct ([0.032058604, 0.08714432, 0.23688284, 0.6439143]), but unexpectedly, the values become incorrect after being returned by a function ([0.0, 0.0, 3.124826e-32, 6.1224e-41]).

[1] output: OrtOwnedTensor { data: TensorPtr { ptr: 0xaaab0b67e7d0, array_view: [0.032058604, 0.08714432, 0.23688284, 0.6439143], shape=[4], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } }
[2] output: Ok(OrtOwnedTensor { data: TensorPtr { ptr: 0xaaab0b67e7d0, array_view: [0.032058604, 0.08714432, 0.23688284, 0.6439143], shape=[4], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } })
[3] output: Ok(OrtOwnedTensor { data: TensorPtr { ptr: 0xaaab0b67e7d0, array_view: [0.0, 0.0, 3.124826e-32, 6.1224e-41], shape=[4], strides=[1], layout=CFcf (0xf), dynamic ndim=1 } })

Am I overlooking something, or could this be a bug?
Thanks in advance.

v1.14.0-beta.0 macOS bindings

I wasn't able to generate macOS bindings for the v1.14.0-beta.0 release, I would greatly appreciate it if someone with a macOS machine could open a PR for regenerated bindings for x64 and ARM64! 😃

libclang is required; build with cargo build --features generate-bindings --target x86_64-apple-darwin & aarch64-apple-darwin

add support for tvm and openvino

As the title says.
I only have a CPU to run ONNX on.

Segfault when dropping ORT tensors after session

If you drop a DynOrtTensor or OrtOwnedTensor after you drop the session (and possibly the environment), the program segfaults.
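A general Rust pattern for avoiding this class of problem is to either tie the output's lifetime to the session or copy the data out before the session goes away. The sketch below uses a stand-in type; `FakeSession` and `run_and_own` are hypothetical, and this is not a description of how ort handles or fixes the issue.

```rust
/// Stand-in for a session that owns the memory its outputs live in.
pub struct FakeSession {
    output: Vec<f32>,
}

impl FakeSession {
    pub fn new() -> Self {
        FakeSession { output: vec![0.1, 0.2, 0.3, 0.4] }
    }

    /// Borrowed output: the compiler ties it to `&self`,
    /// so it cannot outlive the session.
    pub fn run(&self) -> &[f32] {
        &self.output
    }
}

/// Copies the output into an owned Vec before the session is dropped.
pub fn run_and_own() -> Vec<f32> {
    let session = FakeSession::new();
    let owned = session.run().to_vec();
    // `session` is dropped when this function returns;
    // `owned` stays valid because it has its own allocation.
    owned
}
```

Returning `session.run()` directly from `run_and_own` would be rejected by the borrow checker, which is the compile-time counterpart of the runtime segfault described here.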

CUDA Provider Compile err on linux/aarch64: u64 != usize

error[E0308]: mismatched types
   --> /home/<redacted>/.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.3/src/execution_providers.rs:182:113
    |
182 |                 let status = ortsys![unsafe UpdateCUDAProviderOptions(cuda_options, key_ptrs.as_ptr(), value_ptrs.as_ptr(), keys.len())];
    |                                                                                                                             ^^^^^^^^^^ expected `u64`, found `usize`
    |
   ::: /home/<redacted>/.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.3/src/lib.rs:130:18
    |
130 |         unsafe { $crate::ort().$method.unwrap()($($n),+) }
    |                  ------------------------------ arguments to this function are incorrect
    |
help: you can convert a `usize` to a `u64` and panic if the converted value doesn't fit
    |
182 |                 let status = ortsys![unsafe UpdateCUDAProviderOptions(cuda_options, key_ptrs.as_ptr(), value_ptrs.as_ptr(), keys.len().try_into().unwrap())];
    |                                                                                                                                       ++++++++++++++++++++
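The compiler's suggestion relies on `try_into`, which has the useful property of compiling whether the binding expects `u64` or `usize`, since every integer type converts into itself. A minimal sketch of that idea, with the hypothetical helper `len_as`:

```rust
use std::convert::TryFrom;

/// Converts a Rust length into whichever integer type a binding expects.
/// Returns None if the value does not fit in the target type.
pub fn len_as<T: TryFrom<usize>>(len: usize) -> Option<T> {
    T::try_from(len).ok()
}
```

`len_as::<u64>(keys.len())` and `len_as::<usize>(keys.len())` both succeed for any realistic key count, so the same call site can serve targets where the generated bindings disagree on the parameter type.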

IO overhaul

Session inputs/outputs should probably be reworked. The way they work now is a bit messy for my liking, and there is currently no clear path for supporting important features like IOBinding or non-tensor types.

A few things I think should be addressed:

Allow for preallocating inputs/outputs to improve performance (#37)
Implement IOBinding (#15)
Support for sequence<T> and map<K, V> types via Vec<T> and HashMap<K, V> (#30)
Support converting tensors to/from Vec<T> with a given shape (since some applications don't really need the input/output to be a complex tensor, i.e. Silero VAD)
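To illustrate the last two points, here is one way a value type covering tensors, sequences, and maps could look. This is a sketch under assumptions: `OnnxValue` and its methods are hypothetical, the element type is fixed to `f32`, and the map is fixed to `i64` keys for brevity. It is not the design ort adopted.

```rust
use std::collections::HashMap;

/// Hypothetical value type covering the three kinds named above.
#[derive(Debug, Clone, PartialEq)]
pub enum OnnxValue {
    Tensor { shape: Vec<usize>, data: Vec<f32> },
    Sequence(Vec<OnnxValue>),
    Map(HashMap<i64, f32>),
}

impl OnnxValue {
    /// Builds a tensor from a flat Vec and a shape, checking that they agree.
    pub fn from_vec(shape: Vec<usize>, data: Vec<f32>) -> Option<Self> {
        if shape.iter().product::<usize>() == data.len() {
            Some(OnnxValue::Tensor { shape, data })
        } else {
            None
        }
    }

    /// Returns the flat tensor data, or None when the value is a sequence or map.
    pub fn as_tensor(&self) -> Option<&[f32]> {
        match self {
            OnnxValue::Tensor { data, .. } => Some(data.as_slice()),
            _ => None,
        }
    }
}
```

Extracting a tensor from a non-tensor value yields `None` here instead of an error surfacing from the native layer.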

Allocate outside of run function: Discussed in #37

Originally posted by stexa April 6, 2023
Hey!
For use in audio processing, and for performance in general, it would be really nice if the inputs and outputs could be set and allocated before the actual run() call, somewhat like what is suggested here:
nbigaouette/onnxruntime-rs#41

This could be an additional function as well. Might this be something you would be interested in as well?

And thank you, I am very happy that someone is still working on an onnxruntime wrapper in Rust. The crate works out of the box like a charm for me :)
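The request above amounts to letting the caller own the output buffer and reuse it across calls. A minimal sketch of that calling convention, using a stand-in model; `Doubler` and `run_into` are hypothetical and do not correspond to any ort API.

```rust
/// Hypothetical model stand-in: doubles every input element.
pub struct Doubler;

impl Doubler {
    /// Writes into a caller-provided buffer instead of
    /// allocating a new output on each call.
    pub fn run_into(&self, input: &[f32], output: &mut [f32]) -> Result<(), String> {
        if input.len() != output.len() {
            return Err(format!(
                "length mismatch: {} vs {}",
                input.len(),
                output.len()
            ));
        }
        for (o, i) in output.iter_mut().zip(input) {
            *o = *i * 2.0;
        }
        Ok(())
    }
}
```

In an audio callback, the output `Vec` would be allocated once during setup and passed to `run_into` for every block, so the processing loop itself performs no allocation.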

IOBinding: Discussed in #15

Originally posted by dzhao January 30, 2023
Hi, do you plan to support IOBinding for CUDA/TensorRT?
https://stackoverflow.com/questions/70740287/onnxruntime-inference-is-way-slower-than-pytorch-on-gpu

This seems a critical feature for gpu serving.

How to clone a `SessionBuilder`?

Hi, thanks for this project! I really like the idea of being able to use Rust for ML!

I'm also relatively new to Rust, so I'm not sure if I'm on the wrong track.

I'd like to dynamically change an ONNX model. For that, I have a struct that will include both an onnx_session_builder (SessionBuilder) and an onnx_session (Session). However, when I create a Session from the SessionBuilder, the SessionBuilder gets moved - with_model_downloaded, with_model_from_file, etc. takes self.

I've tried cloning, Rc, RefCell, or even Box and a combination of them with no success.

Is there a way to make a copy of a SessionBuilder so I can recreate and replace a Session?

Edit:

I was able to make it work by putting the crate as a local dependency and adding #[derive(Clone)] on SessionBuilder. Are there any reasons not to do this that I'm missing?
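Whether a derived `Clone` is safe depends on what the builder holds. If it contains only plain configuration, cloning is harmless. If it owns a native handle that is released on drop, a derived `Clone` would duplicate the pointer and could release it twice. Which of these applies to ort's `SessionBuilder` is not stated in this thread, so treat the following as a sketch of the safe case only; `BuilderConfig` and `BuiltSession` are hypothetical types.

```rust
/// Hypothetical builder holding only plain configuration,
/// so deriving Clone is cheap and safe.
#[derive(Debug, Clone, PartialEq)]
pub struct BuilderConfig {
    pub intra_threads: usize,
    pub optimization_level: u8,
}

/// Hypothetical result of building a session from a model path.
#[derive(Debug, PartialEq)]
pub struct BuiltSession {
    pub config: BuilderConfig,
    pub model_path: String,
}

impl BuilderConfig {
    /// Consumes the builder, mirroring methods that take `self`.
    pub fn with_model_from_file(self, path: &str) -> BuiltSession {
        BuiltSession {
            config: self,
            model_path: path.to_string(),
        }
    }
}
```

Keeping a template value and calling `template.clone().with_model_from_file(...)` each time a model changes gives a fresh session without losing the stored configuration.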

Document EnvBuilder::with_global_thread_pool

Hi,

The signature of EnvBuilder::with_global_thread_pool is as follows:

pub fn with_global_thread_pool(self, options: Vec<(String, String)>) -> EnvBuilder

What values is the options argument expected to contain?

Best
Musharraf

Build fails for target i686-pc-windows-msvc

Since onnxRuntime supports, and provides pre-built binaries for Windows 32-bit, it is logical to support this build target for ort as well.

Currently, when trying to build using the following command:

cargo build --target i686-pc-windows-msvc

I get the following output:

error: failed to run custom build command for `ort v1.14.1`

Caused by:
  process didn't exit successfully: `D:\projects\blindpandas\libtashkeel\target\debug\build\ort-84e4d33039989d14\build-script-build` (exit code: 101)
  --- stdout
  [ort] strategy: "unknown"

  --- stderr
  thread 'main' panicked at 'unsupported target architecture: x86', C:\Users\user\.cargo\registry\src\github.com-1ecc6299db9ec823\ort-1.14.1\build.rs:385:9
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...

Please fix this, as it is sometimes necessary to provide wheels for 32-bit Python versions, or to add inference to a 32-bit executable such as the NVDA screen reader.

Best
Musharraf

method not found in `SessionBuilder`

The GPT example is not runnable:

$ cargo run --example gpt
   Compiling ort v1.15.2 (/Users/hecmay/Desktop/libauto-rs/ort)
error[E0599]: no method named `with_model_downloaded` found for struct `SessionBuilder` in the current scope
  --> examples/gpt.rs:30:4
   |
27 |       let session = SessionBuilder::new(&environment)?
   |  ___________________-
28 | |         .with_optimization_level(GraphOptimizationLevel::Level1)?
29 | |         .with_intra_threads(1)?
30 | |         .with_model_downloaded(GPT2::GPT2LmHead)?;
   | |         -^^^^^^^^^^^^^^^^^^^^^ method not found in `SessionBuilder`
   | |_________|
   | 

For more information about this error, try `rustc --explain E0599`.
error: could not compile `ort` (example "gpt") due to previous error

ort version does not match what is specified in Cargo.toml

I added an old version of ort as a dependency by using cargo add ort@1.13.3. When running cargo build, it shows that the version being built is 1.15.2.

Reproduce:

cargo new test1
cd test1
cargo add ort@1.13.3 # This shows Adding ort v1.13.3 to dependencies.
cargo build # This shows Compiling ort v1.15.2

rocm support

I'd be willing to help with the testing for this

Windows `compile` build fails.

I am trying to use your onnxruntime wrapper in my library. Ideally I'd like to build ort on windows with compile strategy and static linking.
I am running build from Developer Command Prompt for VS 2019 as admin.

I am running build and test command

cargo build --features "directml" --features "prefer-compile-strategy" --features "compile-static"
cargo test --features "directml" --features "prefer-compile-strategy" --features "compile-static"

First it panics if clang is not installed

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: NotFound, message: "program not found" }', build.rs:495:138

But from the build script comments, I think this is a small bug and your intention was instead to check whether clang and ninja are installed here:

// if we can use ninja on windows, great! let's use it!
// note that ninja + clang on windows is a total shitstorm so it's disabled for now
if Command::new("ninja").arg("--version").status().unwrap().success() && !Command::new("clang-cl").arg("--version").status().unwrap().success() {
...
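The panic comes from calling `unwrap()` on the result of `status()`, which is an `Err` when the program cannot be found at all. A minimal sketch of a probe that treats a missing executable as "not available" instead; `tool_available` is a hypothetical helper, not code from the ort build script.

```rust
use std::process::{Command, Stdio};

/// Returns true only if `program --version` can be spawned and exits successfully.
/// A missing executable yields false instead of a panic.
pub fn tool_available(program: &str) -> bool {
    Command::new(program)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

The condition quoted above could then read `tool_available("ninja") && !tool_available("clang-cl")` and would no longer abort the build when clang-cl is absent.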

So I just modify it to use ninja as cmake_generator and it fails with: fatal: No names found, cannot describe anything.

Full trace:

error: failed to run custom build command for `ort v1.13.2 (D:\RustProjects\ort)`
Caused by:
  process didn't exit successfully: `D:\RustProjects\ort\target\debug\build\ort-53c3b3cbcbab5082\build-script-build` (exit code: 101)
  --- stdout
  [ort] strategy: "unknown"
  cargo:rerun-if-env-changed=ORT_STRATEGY
  Python 3.9.13
  [ort] assuming C/C++ compilers are available
  cargo:rerun-if-changed=D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\protoc-3.11.2-win32.zip
  cargo:warning="C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.34.31933\\bin\\Hostx64\\x64"
  --- stderr
  2023-01-11 13:57:01,642 build [DEBUG] - Command line arguments:
    --build --update --parallel --skip_tests --skip_submodule_sync --config Debug --disable_rtti --disable_memleak_checker --enable_msvc_static_runtime --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=0 --cmake_generator=Ninja --build_dir=build
  2023-01-11 13:57:01,917 build [INFO] - Build started
  2023-01-11 13:57:01,917 build [INFO] - Generating CMake build tree
  2023-01-11 13:57:01,917 util.run [INFO] - Running subprocess in 'build\Debug'
    'D:\Program Files\CMake\bin\cmake.EXE' 'D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\cmake' -Donnxruntime_RUN_ONNX_TESTS=OFF -Donnxruntime_GENERATE_TEST_REPORTS=ON '-DPython_EXECUTABLE=C:\Users\evil_unicorn\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\python.exe' '-DPYTHON_EXECUTABLE=C:\Users\evil_unicorn\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\python.exe' -Donnxruntime_USE_MIMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=OFF -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_JAVA=OFF -Donnxruntime_BUILD_NODEJS=OFF -Donnxruntime_BUILD_OBJC=OFF -Donnxruntime_BUILD_SHARED_LIB=OFF -Donnxruntime_BUILD_APPLE_FRAMEWORK=OFF -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_NNAPI_BUILTIN=OFF -Donnxruntime_USE_RKNPU=OFF -Donnxruntime_USE_LLVM=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_VITISAI=OFF -Donnxruntime_USE_TENSORRT=OFF -Donnxruntime_USE_TENSORRT_BUILTIN_PARSER=OFF -Donnxruntime_TENSORRT_PLACEHOLDER_BUILDER=OFF -Donnxruntime_USE_TVM=OFF -Donnxruntime_TVM_CUDA_RUNTIME=OFF -Donnxruntime_TVM_USE_HASH=OFF -Donnxruntime_USE_MIGRAPHX=OFF -Donnxruntime_CROSS_COMPILING=OFF -Donnxruntime_DISABLE_CONTRIB_OPS=OFF -Donnxruntime_DISABLE_ML_OPS=OFF -Donnxruntime_DISABLE_RTTI=ON -Donnxruntime_DISABLE_EXCEPTIONS=OFF -Donnxruntime_MINIMAL_BUILD=OFF -Donnxruntime_EXTENDED_MINIMAL_BUILD=OFF -Donnxruntime_MINIMAL_BUILD_CUSTOM_OPS=OFF -Donnxruntime_REDUCED_OPS_BUILD=OFF -Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF -Donnxruntime_USE_DML=OFF -Donnxruntime_USE_WINML=OFF -Donnxruntime_BUILD_MS_EXPERIMENTAL_OPS=OFF -Donnxruntime_USE_TELEMETRY=OFF -Donnxruntime_ENABLE_LTO=OFF -Donnxruntime_USE_ACL=OFF -Donnxruntime_USE_ACL_1902=OFF -Donnxruntime_USE_ACL_1905=OFF -Donnxruntime_USE_ACL_1908=OFF -Donnxruntime_USE_ACL_2002=OFF -Donnxruntime_USE_ARMNN=OFF -Donnxruntime_ARMNN_RELU_USE_CPU=ON -Donnxruntime_ARMNN_BN_USE_CPU=ON -Donnxruntime_ENABLE_NVTX_PROFILE=OFF 
-Donnxruntime_ENABLE_TRAINING=OFF -Donnxruntime_ENABLE_TRAINING_OPS=OFF -Donnxruntime_ENABLE_TRAINING_TORCH_INTEROP=OFF -Donnxruntime_ENABLE_TRAINING_ON_DEVICE=OFF -Donnxruntime_ENABLE_CPU_FP16_OPS=OFF -Donnxruntime_USE_NCCL=OFF -Donnxruntime_BUILD_BENCHMARKS=OFF -Donnxruntime_USE_ROCM=OFF -DOnnxruntime_GCOV_COVERAGE=OFF -Donnxruntime_USE_MPI=ON -Donnxruntime_ENABLE_MEMORY_PROFILE=OFF -Donnxruntime_ENABLE_CUDA_LINE_NUMBER_INFO=OFF -Donnxruntime_BUILD_WEBASSEMBLY=OFF -Donnxruntime_BUILD_WEBASSEMBLY_STATIC_LIB=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_EXCEPTION_CATCHING=ON -Donnxruntime_ENABLE_WEBASSEMBLY_EXCEPTION_THROWING=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_THREADS=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_DEBUG_INFO=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_PROFILING=OFF -Donnxruntime_ENABLE_EAGER_MODE=OFF -Donnxruntime_ENABLE_LAZY_TENSOR=OFF -Donnxruntime_ENABLE_EXTERNAL_CUSTOM_OP_SCHEMAS=OFF -Donnxruntime_ENABLE_CUDA_PROFILING=OFF -Donnxruntime_ENABLE_ROCM_PROFILING=OFF -Donnxruntime_USE_XNNPACK=OFF -Donnxruntime_USE_CANN=OFF -Donnxruntime_BUILD_UNIT_TESTS=0 -Donnxruntime_DEV_MODE=ON '-DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded$<$<CONFIG:Debug>:Debug>' -DONNX_USE_MSVC_STATIC_RUNTIME=ON -Dprotobuf_MSVC_STATIC_RUNTIME=ON -Dgtest_force_shared_crt=OFF -Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF -G Ninja -Donnxruntime_ENABLE_MEMLEAK_CHECKER=OFF -DCMAKE_BUILD_TYPE=Debug
  Patch found: C:/Program Files/Git/usr/bin/patch.exe
  Use protobuf from submodule
  Use date from submodule
  Use mp11 from submodule
  Use json from submodule
  Use re2 from submodule
  Use cpuinfo from submodule
  Generated: D:/RustProjects/ort/target/debug/build/ort-976a8a4574b33aaa/out/onnxruntime/build/Debug/external/onnx/onnx/onnx-ml.proto
  Generated: D:/RustProjects/ort/target/debug/build/ort-976a8a4574b33aaa/out/onnxruntime/build/Debug/external/onnx/onnx/onnx-operators-ml.proto
  Generated: D:/RustProjects/ort/target/debug/build/ort-976a8a4574b33aaa/out/onnxruntime/build/Debug/external/onnx/onnx/onnx-data.proto
  Use flatbuffers from submodule
  fatal: No names found, cannot describe anything.
  CMake Warning (dev) at D:/Program Files/CMake/share/cmake-3.25/Modules/FetchContent.cmake:1279 (message):
    The DOWNLOAD_EXTRACT_TIMESTAMP option was not given and policy CMP0135 is
    not set.  The policy's OLD behavior will be used.  When using a URL
    download, the timestamps of extracted files should preferably be that of
    the time of extraction, otherwise code that depends on the extracted
    contents might not be rebuilt if the URL changes.  The OLD behavior
    preserves the timestamps from the archive instead, but this is usually not
    what you want.  Update your project to the NEW behavior or specify the
    DOWNLOAD_EXTRACT_TIMESTAMP option with a value of true to avoid this
    robustness issue.
  Call Stack (most recent call first):
    external/abseil-cpp.cmake:20 (FetchContent_Declare)
    onnxruntime_common.cmake:112 (include)
    CMakeLists.txt:2054 (include)
  This warning is for project developers.  Use -Wno-dev to suppress it.
  2023-01-11 13:57:06,106 util.run [DEBUG] - Subprocess completed. Return code: 0
  2023-01-11 13:57:06,106 build [INFO] - Building targets for Debug configuration
  2023-01-11 13:57:06,107 util.run [INFO] - Running subprocess in 'D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime'
    'D:\Program Files\CMake\bin\cmake.EXE' --build 'build\Debug' --config Debug -- -j6
  Traceback (most recent call last):
    File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\ci_build\build.py", line 2812, in <module>
      sys.exit(main())
    File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\ci_build\build.py", line 2727, in main
      build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
    File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\ci_build\build.py", line 1349, in build_targets
      run_subprocess(cmd_args, env=env)
    File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\ci_build\build.py", line 740, in run_subprocess
      return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
    File "D:\RustProjects\ort\target\debug\build\ort-976a8a4574b33aaa\out\onnxruntime\tools\python\util\run.py", line 49, in run
      completed_process = subprocess.run(
    File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 528, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['D:\\Program Files\\CMake\\bin\\cmake.EXE', '--build', 'build\\Debug', '--config', 'Debug', '--', '-j6']' returned non-zero exit status 1.
  thread 'main' panicked at 'failed to build ONNX Runtime', build.rs:518:13
  stack backtrace:
     0: std::panicking::begin_panic_handler
               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library\std\src\panicking.rs:575
     1: core::panicking::panic_fmt
               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library\core\src\panicking.rs:65
     2: build_script_build::prepare_libort_dir
               at .\build.rs:518
     3: build_script_build::main
               at .\build.rs:597
     4: core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943\library\core\src\ops\function.rs:251
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
warning: build failed, waiting for other jobs to finish...
Process finished with exit code 101

When I use the Visual Studio generator instead, the build passes just fine, but building the tests fails with a large number of error LNK2001: unresolved external symbol errors. Most of the symbol names involve Dbg and protobuf, but there are others as well.

Every build I tried was done after a clean.

I need help because I am out of ideas.

Tensorrt execution provider is_available() returns false despite it being used

Hello,
I was performing inference on a model (centerface) using the TensorRT execution provider. It did work, since TensorRT generated the engine files, but ExecutionProvider::tensorrt().is_available() still returns false.

Environment:
Tested on nvcr.io/nvidia/tensorrt:22.12-py3 docker image.

ort version: 1.14.3
tensorrt version: 8.5.1
cuda version: 11.8

Here's a snippet of the program I used to perform inference:

fn main() -> OrtResult<()> {
    tracing_subscriber::fmt::init();

    let environment = Arc::new(
        Environment::builder()
            .with_name("centerface")
            .with_log_level(ort::LoggingLevel::Error)
            .with_execution_providers([ExecutionProvider::tensorrt()
                .with("trt_engine_cache_enable", "1")
                .with("trt_engine_cache_path", "./cache")])
            .build()?,
    );

    let session = SessionBuilder::new(&environment)?
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        .with_intra_threads(1)?
        .with_model_from_file("centerface.onnx")?;

    let input = Array::<f32, _>::random((10, 3, 32, 32), Standard);
    let _outputs: Vec<DynOrtTensor<ndarray::Dim<ndarray::IxDynImpl>>> =
        session.run([InputTensor::from_array(input.into_dyn())])?;

    println!("{:?}", ExecutionProvider::tensorrt().is_available());
    Ok(())
}

Android Support

Hi,

The title says it all. Android is an officially supported target for onnxruntime and Rust.

Microsoft provides pre-built artifacts for Android on Maven.

Moreover, onnxruntime provides Android-specific execution providers. See Execution Providers

Best
Musharraf

`ort::value::Value` is unsound since it exposes raw pointers as public fields

The Value type is defined like this:

pub enum Value<'v> {
	RustOwned {
		ptr: *mut sys::OrtValue,
		array: DynArrayRef<'v>,
		memory_info: MemoryInfo
	},
	CppOwned {
		ptr: *mut sys::OrtValue,
		session: Arc<SessionPointerHolder>
	}
}

Enum variants and their fields are public, so downstream code can:

construct a Value from an arbitrary dangling pointer, or
construct a valid Value using the API, and then change the contained pointer to point anywhere

Neither of these requires unsafe code, so this API is unsound. It could be fixed by wrapping the enum in a struct and making the enum a private field of that struct.
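
The fix suggested above can be sketched as follows. This is a minimal illustration under stated assumptions, not ort's actual code: RawValue is a hypothetical stand-in for sys::OrtValue, and ValueInner, from_raw and data are hypothetical names chosen for the example.

```rust
// Hypothetical stand-in for `sys::OrtValue`; not ort's real type.
pub struct RawValue {
    pub data: i32,
}

// The enum is private to this module, so downstream code cannot name its
// variants or reach the raw pointer stored inside them.
enum ValueInner {
    CppOwned { ptr: *mut RawValue },
}

// Public wrapper whose only field is private.
pub struct Value {
    inner: ValueInner,
}

impl Value {
    /// Takes ownership of `ptr`.
    ///
    /// # Safety
    /// `ptr` must come from `Box::into_raw` and must not be used afterwards.
    pub unsafe fn from_raw(ptr: *mut RawValue) -> Self {
        Value { inner: ValueInner::CppOwned { ptr } }
    }

    /// Safe accessor: the pointer can only have been set through the
    /// `unsafe` constructor above, so its validity is the caller's contract.
    pub fn data(&self) -> i32 {
        match &self.inner {
            ValueInner::CppOwned { ptr } => unsafe { (**ptr).data },
        }
    }
}

impl Drop for Value {
    fn drop(&mut self) {
        match &self.inner {
            // Reclaim the allocation handed over in `from_raw`.
            ValueInner::CppOwned { ptr } => unsafe { drop(Box::from_raw(*ptr)) },
        }
    }
}
```

With this shape, building a Value from an arbitrary pointer requires an unsafe block at the call site, and safe code has no way to overwrite the pointer afterwards.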

Is it possible to use ORT for training?

onnxruntime supports training; see https://github.com/microsoft/onnxruntime-training-examples

Is this possible with ORT?

If so could someone give an example of how this might work?

My end goal is to see if I can fine-tune models such as LLaMA using Rust, without jumping into Python.

build error: load-dynamic feature can not compile

When the load-dynamic feature is enabled, the compiler complains about mismatched types (rustc 1.68.2):

error[E0308]: mismatched types
   --> /../.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.4/src/execution_providers.rs:204:121
    |
204 |                 let status = ortsys![unsafe UpdateTensorRTProviderOptions(tensorrt_options, key_ptrs.as_ptr(), value_ptrs.as_ptr(), keys.len())];
    |                                                                                                                                     ^^^^^^^^^^ expected `u64`, found `usize`
    |
   ::: /../.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.4/src/lib.rs:130:18
    |
130 |         unsafe { $crate::ort().$method.unwrap()($($n),+) }
    |                  ------------------------------ arguments to this function are incorrect
    |
help: you can convert a `usize` to a `u64` and panic if the converted value doesn't fit
    |
204 |                 let status = ortsys![unsafe UpdateTensorRTProviderOptions(tensorrt_options, key_ptrs.as_ptr(), value_ptrs.as_ptr(), keys.len().try_into().unwrap())];
    |                                                                                                                                               ++++++++++++++++++++

error[E0308]: mismatched types
   --> /../.cargo/registry/src/github.com-1ecc6299db9ec823/ort-1.14.4/src/execution_providers.rs:258:21
    |
258 |                     gpu_mem_limit: usize::MAX,
    |                                    ^^^^^^^^^^ expected `u64`, found `usize`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `ort` due to 2 previous errors
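
The compiler's hint boils down to converting the length explicitly before it crosses the FFI boundary. A minimal sketch of that conversion, assuming a hypothetical helper named len_as_u64 (it is not part of ort):

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts a Rust length to the `u64` that the
/// generated binding expects, panicking if the value does not fit.
pub fn len_as_u64(len: usize) -> u64 {
    u64::try_from(len).expect("usize value does not fit in u64")
}
```

The same helper would cover both reported sites: the keys.len() argument and the usize::MAX value used for gpu_mem_limit.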

The difference in inference time between tch and onnx

Hi,

In my Rust code, there is a large difference in inference time between the tch model and the ONNX session.
The tch model is much faster than the ONNX session, and I wonder why.

In Python there was little difference between the two, but in Rust the gap is large.
I have no idea what causes this, and I would really like to solve the problem.

thank you,

Support for `wasm32-unknown-unknown`

What

ONNXRuntime supports wasm targets via its onnxruntime for web bindings.

According to this page you can build a static lib of onnxruntime for wasm targets, which you can then bundle with your C++ WebAssembly project.

From what I understand, the wasm lib provides the same C API that regular onnxruntime provides.

Why

Supporting wasm targets will make it easy to deploy models built with ort to web browsers and wasm runtimes such as WasmTime and Wasmer.

Since wasm-runtimes are cross-platform, users can bundle their models along with pre and post processing code in a single, universal, executable module that they can then run on browsers or on any wasm-runtime.

Issues using IoBindings

Hello,

I am trying to work with the IoBindings that were recently added, and I am facing a few issues. I could not find documentation or examples in the crate illustrating how this would work, so I am attempting to reproduce a minimal example written in Python using onnxruntime: the gist can be found here. I am attaching the tiny onnx model file (net.zip) to this issue, but it can be created again by running the notebook linked above.

Here are the current issues I am facing:

It seems that the constructor for IoBindings is not public:

ort/src/io_binding.rs

Line 19 in cb21def

pub(crate) fn new(session: &'s Session) -> OrtResult<Self> {

I am working off a local copy in which I removed the (crate) visibility modifier, but am I missing something about how the IoBinding should be created?
The Drop implementation for the IoBindings does not include bound input and output. I believe bind_input needs to take ownership of its value in the current implementation:

pub fn bind_input<'a, 'b: 'a, 'c: 'b, S: AsRef<str> + Clone + Debug>(&'a mut self, name: S, ort_value: Value<'b>) -> OrtResult<()> {
    [...]
}

Even with these local changes I am still unable to run the model with IoBindings. The output does not seem to be properly constructed or populated, and I get the error: Failed to get tensor type and shape: the ort_value must contain a constructed tensor or sparse tensor

The small Rust binary I am using for testing is included below for reference. This would require adding the tch dependency to access libtorch, please let me know if you have any issues doing so:

use anyhow;
use ndarray::{ArrayD, CowArray};
use ort::{AllocationDevice, AllocatorType, Environment, ExecutionProvider, GraphOptimizationLevel, IoBinding, MemoryInfo, MemType, SessionBuilder, Value};
use ort::tensor::OrtOwnedTensor;

fn main() -> anyhow::Result<()> {
    tracing_subscriber::fmt::init();

    let environment = Environment::builder()
        .with_name("test")
        .with_execution_providers([ExecutionProvider::CUDA(Default::default())])
        .build()?
        .into_arc();

    let session = SessionBuilder::new(&environment)?
        .with_optimization_level(GraphOptimizationLevel::Level1)?
        .with_intra_threads(1)?.with_model_from_file("path/to/net.onnx")?;

    let input_tensor = tch::Tensor::arange(16*2, (tch::Kind::Float, tch::Device::cuda_if_available())).view([16,2]);

    // First option: ndarray
    let input_array: ArrayD<f32> = input_tensor.as_ref().try_into()?;
    let input_cow_array = CowArray::from(&input_array);
    let output_array: OrtOwnedTensor<f32, _> = session.run(vec![Value::from_array(session.allocator(), &input_cow_array)?])?[0].try_extract()?;
    println!("{:?}", output_array);

    // Second option: IO Bindings
    let mut io_bindings = IoBinding::new(&session)?;

    let value = Value::from_array(session.allocator(), &input_cow_array)?;
    let _ = io_bindings.bind_input("some_input", value)?;
    let output_mem_info = MemoryInfo::new(AllocationDevice::CPU,0,AllocatorType::Device, MemType::Default)?;
    let _ = io_bindings.bind_output("some_output", output_mem_info)?;

    let outputs = io_bindings.outputs()?;

    for (output_name, output_value) in outputs {
        let output_array: OrtOwnedTensor<f32, _> = output_value.try_extract()?;
        println!("{output_name}: {output_array:?}");
    }

    Ok(())
}

I have also tried extracting the values from the output memory info as follows:

for (_, output_value) in outputs {
    let output_tensor = unsafe {
        Tensor::from_blob(output_value.ptr() as *const u8, &[16, 5], &[5, 1], Kind::Float, Device::Cpu)
    };
    output_tensor.print();
}

but the values from the tensor are incorrect, so I guess the memory is not read from the right location.

Finally, I think it would eventually be great to support something similar to the Python interface, where we can create two torch tensors (an input and a placeholder output), register pointers to these tensors in the IO bindings, and have session.run() populate the output tensor. This would probably require allowing raw pointers to be passed to the IO bindings; perhaps a "dangerous" module could be created to allow such a use case.

Thank you

Setting up `ort` as a dependency for CI pipeline

Hello,

I am in the final stages of integrating ONNX support for a project via the ort bindings. I have everything working well on my machine, and working on the CI when pinning the dependency to ort = {version="=1.14.3", optional = true, default-features = false, features = ["half"]}. I downloaded the ONNXRuntime library and set up my ORT_DYLIB_PATH to the .dll downloaded manually.

For the CI I would like the download to happen automatically (since maintaining the download path per OS/version in the GitHub Actions is a bit of a hassle). This was working fine with version 1.14.3, as can be seen in this CI run. I cannot get the same behaviour with 1.14.6; see for example this run.

Could you please help me understand how I should set my features/CI to allow a migration from 1.14.3 to 1.14.6? For information, my actions file for ONNX tests is available here, and this is my Cargo.toml

Avoid unnecessary memory allocation in `run`

The session run function creates many small Vecs, each of which performs a memory allocation.
Since run may be used in a "hot path" of real-time software where memory allocation is problematic, it would be nice to avoid them. It looks like these allocations could be done at initialization time instead.
Some rust related reference: https://nnethercote.github.io/perf-book/heap-allocations.html#vec

If avoiding Vec creation is too difficult, the smallvec crate may be something to consider. With SmallVec we can specify an inline ("stack") capacity: if the vector grows beyond that capacity it allocates on the heap, otherwise it stays on the stack.
https://nnethercote.github.io/perf-book/heap-allocations.html#short-vecs
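
The first suggestion (allocate at initialization time, reuse in run) can be sketched with the standard library alone. This is only an illustration under stated assumptions: Runner, its scratch field and its run method are hypothetical and are not ort's Session API.

```rust
/// Hypothetical runner that owns its scratch buffer.
pub struct Runner {
    // Allocated once in `new`, then reused by every `run` call.
    scratch: Vec<usize>,
}

impl Runner {
    pub fn new(max_inputs: usize) -> Self {
        Runner { scratch: Vec::with_capacity(max_inputs) }
    }

    /// Stand-in for a hot-path `run`: records the length of each input in
    /// the reusable buffer and returns the total element count.
    pub fn run(&mut self, inputs: &[&[f32]]) -> usize {
        // `clear` drops the old contents but keeps the capacity, so no
        // allocation happens as long as `inputs.len()` fits the capacity.
        self.scratch.clear();
        self.scratch.extend(inputs.iter().map(|input| input.len()));
        self.scratch.iter().sum()
    }

    /// Exposed only so callers can observe that the buffer is reused.
    pub fn scratch_capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```

The trade-off is that run now takes &mut self; a SmallVec-based variant would keep a shared-reference signature at the cost of an extra dependency.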

Ort v1.15 fails to build for i686-pc-windows-msvc targets

Hi,

The title says it all. Under x86_64-pc-windows-msvc targets everything works perfectly, but the build fails under i686-pc-windows-msvc targets.

These are the errors I get:

    Checking ort v1.15.0
error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:88:56
    |
88  |         let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
    |                                                         ---- ^^^^^^^^^^^^^ expected "C" fn, found "stdcall" fn
    |                                                         |
    |                                                         arguments to this enum variant are incorrect
    |
    = note: expected fn pointer `unsafe extern "C" fn(_, OrtLoggingLevel, _, _, _, _)`
                  found fn item `extern "stdcall" fn(_, OrtLoggingLevel, _, _, _, _) {custom_logger}`
    = note: when the arguments and return types match, functions can be coerced to function pointers
help: the type constructed contains `extern "stdcall" fn(*mut c_void, OrtLoggingLevel, *const i8, *const i8, *const i8, *const i8) {custom_logger}` due to the type of the argument passed
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:88:51
    |
88  |         let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
    |                                                         ^^^^^-------------^
    |                                                              |
    |                                                              this argument influences the type of `Some`
note: tuple variant defined here
   --> C:\Users\user\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\core\src\option.rs:572:5
    |
572 |     Some(#[stable(feature = "rust1", since = "1.0.0")] T),
    |     ^^^^

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:99:56
    |
99  |         let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
    |                                                         ---- ^^^^^^^^^^^^^ expected "C" fn, found "stdcall" fn
    |                                                         |
    |                                                         arguments to this enum variant are incorrect
    |
    = note: expected fn pointer `unsafe extern "C" fn(_, OrtLoggingLevel, _, _, _, _)`
                  found fn item `extern "stdcall" fn(_, OrtLoggingLevel, _, _, _, _) {custom_logger}`
    = note: when the arguments and return types match, functions can be coerced to function pointers
help: the type constructed contains `extern "stdcall" fn(*mut c_void, OrtLoggingLevel, *const i8, *const i8, *const i8, *const i8) {custom_logger}` due to the type of the argument passed
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:99:51
    |
99  |         let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
    |                                                         ^^^^^-------------^
    |                                                              |
    |                                                              this argument influences the type of `Some`
note: tuple variant defined here
   --> C:\Users\user\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\core\src\option.rs:572:5
    |
572 |     Some(#[stable(feature = "rust1", since = "1.0.0")] T),
    |     ^^^^

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:203:57
    |
203 |             let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
    |                                                             ---- ^^^^^^^^^^^^^ expected "C" fn, found "stdcall" fn
    |                                                             |
    |                                                             arguments to this enum variant are incorrect
    |
    = note: expected fn pointer `unsafe extern "C" fn(_, OrtLoggingLevel, _, _, _, _)`
                  found fn item `extern "stdcall" fn(_, OrtLoggingLevel, _, _, _, _) {custom_logger}`
    = note: when the arguments and return types match, functions can be coerced to function pointers
help: the type constructed contains `extern "stdcall" fn(*mut c_void, OrtLoggingLevel, *const i8, *const i8, *const i8, *const i8) {custom_logger}` due to the type of the argument passed
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\environment.rs:203:52
    |
203 |             let logging_function: sys::OrtLoggingFunction = Some(custom_logger);
    |                                                             ^^^^^-------------^
    |                                                                  |
    |                                                                  this argument influences the type of `Some`
note: tuple variant defined here
   --> C:\Users\user\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\core\src\option.rs:572:5
    |
572 |     Some(#[stable(feature = "rust1", since = "1.0.0")] T),
    |     ^^^^

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:778:20
    |
778 |         extract_io_count(f, session_ptr)
    |         ---------------- ^ expected "stdcall" fn, found "C" fn
    |         |
    |         arguments to this function are incorrect
    |
    = note: expected fn pointer `unsafe extern "stdcall" fn(_, _) -> _`
               found fn pointer `unsafe extern "C" fn(_, _) -> _`
note: function defined here
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:786:5
    |
786 |     fn extract_io_count(
    |        ^^^^^^^^^^^^^^^^
787 |         f: extern_system_fn! { unsafe fn(*const sys::OrtSession, *mut usize) -> *mut sys::OrtStatus },
    |         ---------------------------------------------------------------------------------------------

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:783:20
    |
783 |         extract_io_count(f, session_ptr)
    |         ---------------- ^ expected "stdcall" fn, found "C" fn
    |         |
    |         arguments to this function are incorrect
    |
    = note: expected fn pointer `unsafe extern "stdcall" fn(_, _) -> _`
               found fn pointer `unsafe extern "C" fn(_, _) -> _`
note: function defined here
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:786:5
    |
786 |     fn extract_io_count(
    |        ^^^^^^^^^^^^^^^^
787 |         f: extern_system_fn! { unsafe fn(*const sys::OrtSession, *mut usize) -> *mut sys::OrtStatus },
    |         ---------------------------------------------------------------------------------------------

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:802:19
    |
802 |         extract_io_name(f, session_ptr, allocator_ptr, i)
    |         --------------- ^ expected "stdcall" fn, found "C" fn
    |         |
    |         arguments to this function are incorrect
    |
    = note: expected fn pointer `unsafe extern "stdcall" fn(_, _, _, _) -> _`
               found fn pointer `unsafe extern "C" fn(_, _, _, _) -> _`
note: function defined here
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:816:5
    |
816 |       fn extract_io_name(
    |          ^^^^^^^^^^^^^^^
817 | /         f: extern_system_fn! { unsafe fn(
818 | |             *const sys::OrtSession,
819 | |             size_t,
820 | |             *mut sys::OrtAllocator,
821 | |             *mut *mut c_char,
822 | |         ) -> *mut sys::OrtStatus },
    | |__________________________________-

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:807:19
    |
807 |         extract_io_name(f, session_ptr, allocator_ptr, i)
    |         --------------- ^ expected "stdcall" fn, found "C" fn
    |         |
    |         arguments to this function are incorrect
    |
    = note: expected fn pointer `unsafe extern "stdcall" fn(_, _, _, _) -> _`
               found fn pointer `unsafe extern "C" fn(_, _, _, _) -> _`
note: function defined here
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:816:5
    |
816 |       fn extract_io_name(
    |          ^^^^^^^^^^^^^^^
817 | /         f: extern_system_fn! { unsafe fn(
818 | |             *const sys::OrtSession,
819 | |             size_t,
820 | |             *mut sys::OrtAllocator,
821 | |             *mut *mut c_char,
822 | |         ) -> *mut sys::OrtStatus },
    | |__________________________________-

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:839:45
    |
839 |         let (input_type, dimensions) = extract_io(f, session_ptr, i as _)?;
    |                                        ---------- ^ expected "stdcall" fn, found "C" fn
    |                                        |
    |                                        arguments to this function are incorrect
    |
    = note: expected fn pointer `unsafe extern "stdcall" fn(_, _, _) -> _`
               found fn pointer `unsafe extern "C" fn(_, _, _) -> _`
note: function defined here
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:858:5
    |
858 |       fn extract_io(
    |          ^^^^^^^^^^
859 | /         f: extern_system_fn! { unsafe fn(
860 | |             *const sys::OrtSession,
861 | |             size_t,
862 | |             *mut *mut sys::OrtTypeInfo,
863 | |         ) -> *mut sys::OrtStatus },
    | |__________________________________-

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:850:46
    |
850 |         let (output_type, dimensions) = extract_io(f, session_ptr, i as _)?;
    |                                         ---------- ^ expected "stdcall" fn, found "C" fn
    |                                         |
    |                                         arguments to this function are incorrect
    |
    = note: expected fn pointer `unsafe extern "stdcall" fn(_, _, _) -> _`
               found fn pointer `unsafe extern "C" fn(_, _, _) -> _`
note: function defined here
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\session.rs:858:5
    |
858 |       fn extract_io(
    |          ^^^^^^^^^^
859 | /         f: extern_system_fn! { unsafe fn(
860 | |             *const sys::OrtSession,
861 | |             size_t,
862 | |             *mut *mut sys::OrtTypeInfo,
863 | |         ) -> *mut sys::OrtStatus },
    | |__________________________________-

error[E0308]: mismatched types
  --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\lib.rs:97:87
   |
45 | ... ($(#[$meta])* unsafe extern "stdcall" fn $($tt)*);
   |                   ------ expected due to this
...
97 | ...em_fn! { unsafe fn () -> *const ffi::c_char } = (*base).GetVersionString.unwrap();
   |                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected "stdcall" fn, found "C" fn
   |
   = note: expected fn pointer `unsafe extern "stdcall" fn() -> _`
              found fn pointer `unsafe extern "C" fn() -> _`

error[E0308]: mismatched types
   --> C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\ort-1.15.0\src\lib.rs:112:78
    |
45  |     ($(#[$meta:meta])* unsafe fn $($tt:tt)*) => ($(#[$meta])* unsafe extern "stdcall" fn $($tt)*);
    |                                                               ------ expected due to this
...
112 |             let get_api: extern_system_fn! { unsafe fn(u32) -> *const sys::OrtApi } = (*base).GetApi.unwrap();
    |                                                                                       ^^^^^^^^^^^^^^^^^^^^^^^ expected "stdcall" fn, found "C" fn
    |
    = note: expected fn pointer `unsafe extern "stdcall" fn(_) -> _`
               found fn pointer `unsafe extern "C" fn(_) -> _`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `ort` (lib) due to 11 previous errors
warning: build failed, waiting for other jobs to finish...
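
All eleven errors have the same shape: a function pointer type declared with one ABI string receives a value declared with another. In Rust, extern "system" resolves to "stdcall" on 32-bit x86 Windows and to "C" on most other targets, which is why the mismatch only surfaces on i686-pc-windows-msvc. A minimal sketch of keeping the two sides consistent, using hypothetical stand-ins (LoggingFunction and custom_logger here are simplified and are not ort's real definitions):

```rust
// Hypothetical stand-in for a binding's callback type, declared as "C".
pub type LoggingFunction = Option<unsafe extern "C" fn(i32) -> i32>;

// The callback is declared with the same ABI string as the pointer type,
// so the assignment below type-checks on every target.
pub unsafe extern "C" fn custom_logger(level: i32) -> i32 {
    level + 1
}

pub fn logging_function() -> LoggingFunction {
    let f: unsafe extern "C" fn(i32) -> i32 = custom_logger;
    Some(f)
}
```

Whichever ABI the generated bindings use, the hand-written callbacks and helper signatures need to spell the same one.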

Best
Musharraf

Unable to use hardware acceleration on Windows

This is the log:

2023-07-22T14:57:03.018264Z  INFO apply_execution_providers: ort::execution_providers: CUDA execution provider registered successfully
2023-07-22T14:57:03.053798Z ERROR apply_execution_providers: ort::execution_providers: CUDA execution provider registration failed: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1106 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\Project\yolov8_onnx_rust\target\debug\onnxruntime_providers_cuda.dll"

2023-07-22T14:57:03.054196Z  INFO apply_execution_providers: ort::execution_providers: TensorRT execution provider registered successfully
2023-07-22T14:57:03.063023Z ERROR apply_execution_providers: ort::execution_providers: TensorRT execution provider registration failed: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1106 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\Project\yolov8_onnx_rust\target\debug\onnxruntime_providers_tensorrt.dll"

I am using Windows 11, build 22621.1992.
It uses the DLL that is automatically downloaded by the programme.

Process exit with exit code 0xc0000005 when using TensorRT execution provider

Thank you for creating this crate.
This error occurred during inference with the TensorRT EP, both in my code and in the example code.

This is the log printed by my code:

2023-04-15T15:47:24.013752Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Flush-to-zero and denormal-as-zero are off
2023-04-15T15:47:24.014425Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Creating and using per session threadpools since use_per_session_threads_ is true
2023-04-15T15:47:24.015015Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Dynamic block base set to 0      
2023-04-15T15:47:24.202936Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Initializing session.
2023-04-15T15:47:24.203637Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Creating BFCArena for Cuda with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-04-15T15:47:24.204585Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Creating BFCArena for CudaPinned with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-04-15T15:47:24.205679Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Creating BFCArena for CUDA_CPU with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-04-15T15:47:24.206431Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Allocator already registered for OrtMemoryInfo:[name:Cuda id:0 OrtMemType:0 OrtAllocatorType:1 Device:[DeviceType:1 MemoryType:0 DeviceId:0]]. Ignoring allocator from CUDAExecutionProvider
2023-04-15T15:47:24.207074Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Allocator already registered for OrtMemoryInfo:[name:CudaPinned id:0 OrtMemType:-1 OrtAllocatorType:1 Device:[DeviceType:0 MemoryType:1 DeviceId:0]]. Ignoring allocator from CUDAExecutionProvider
2023-04-15T15:47:24.207709Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Allocator already registered for OrtMemoryInfo:[name:CUDA_CPU id:0 OrtMemType:-2 OrtAllocatorType:1 Device:[DeviceType:0 MemoryType:0 DeviceId:0]]. Ignoring allocator from CUDAExecutionProvider
2023-04-15T15:47:24.211005Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Total shared scalar initializer count: 8
2023-04-15T15:47:24.215380Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Total fused reshape node count: 0
2023-04-15T15:47:24.217219Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Total shared scalar initializer count: 0
2023-04-15T15:47:24.218852Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: Total fused reshape node count: 0
2023-04-15T15:47:24.219866Z DEBUG run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: [TensorRT EP] Model name is yolov5s_half.onnx
2023-04-15T15:47:25.570221Z  INFO run{args=Args { model_name: ".\\yolov5s_half.onnx", img_name: ".\\bus.jpg", device: AUTO, opt_level: 1, half: true, conf_thresh: 0.2, score_thresh: 0.2, nms_thresh: 0.45, benchmark: false } warm_up=false}:init{model_file=".\\yolov5s_half.onnx" device=AUTO opt_level=1}: ort: [2023-04-15 15:47:25 WARNING] hDebInfo\_deps\onnx_tensorrt-src\onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
error: process didn't exit successfully: `target\debug\yolov5_onnx.exe -m .\yolov5s_half.onnx -i .\bus.jpg --half` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

And this is the log of the example code

2023-04-15T16:28:10.434694Z DEBUG ort::environment: Environment not yet initialized, creating a new one
2023-04-15T16:28:10.457540Z DEBUG ort::environment: Environment created env_ptr="0x22ad8786b70"
2023-04-15T16:28:10.458777Z  INFO download_to{self=SessionBuilder { env: "GPT-2", allocator: Device, memory_type: Default } url="https://github.com/onnx/models/raw/main/text/machine_comprehension/gpt-2/model/gpt2-lm-head-10.onnx" download_dir="I:\\ort_test\\ort"}: ort::session: Model already exists, skipping download model_filepath="I:\\ort_test\\ort\\gpt2-lm-head-10.onnx"
2023-04-15T16:28:10.459654Z  INFO apply_execution_providers: ort::execution_providers: TensorRT execution provider registered successfully
2023-04-15T16:28:10.652673Z  INFO apply_execution_providers: ort::execution_providers: TensorRT execution provider registered successfully
2023-04-15T16:28:10.653144Z  INFO apply_execution_providers: ort::execution_providers: TensorRT execution provider registered successfully
2023-04-15T16:28:17.079889Z  INFO ort: [2023-04-15 16:28:17 WARNING] hDebInfo\_deps\onnx_tensorrt-src\onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
error: process didn't exit successfully: `target\debug\examples\gpt.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

I use the pre-compiled onnxruntime v1.14.1 downloaded from GitHub.
The environment is:

Windows 10 22H2
CUDA 11.6
rustup 1.25.1 and cargo 1.65.0
stable-x86_64-pc-windows-msvc unchanged - rustc 1.65.0 (897e37553 2022-11-02)
I am unable to find any solution to this problem; the only possibly relevant information comes from this issue,
and the error log does not seem to provide any information about the cause.

Do we really need `ort::session::InMemorySession`?

It seems that the lifetime is not necessary.

import torch

linear = torch.nn.Linear(2, 4)
x = torch.tensor([0.1, 0.2])
y = linear(x)
print(y)

torch.onnx.export(
    linear,
    (torch.rand(1, 2),),
    'linear.onnx',
    input_names=["x"],
    output_names=['y'],
    dynamic_axes={'x': {0: 'B'}, 'y': {0: 'B'}}
)

use std::sync::Arc;

use ndarray::{ArrayD, CowArray};
use ort::{tensor::OrtOwnedTensor, Environment, ExecutionProvider, SessionBuilder, Value};

fn main() {
	let environment = Arc::new(
		Environment::builder()
			.with_execution_providers([ExecutionProvider::CPU(Default::default())])
			.build()
			.unwrap()
	);

	let mb = std::fs::read("linear.onnx").unwrap();
	let session = SessionBuilder::new(&environment).unwrap().with_model_from_memory(&mb).unwrap();
	drop(mb);

	let x: ArrayD<f32> = ndarray::arr2(&[[0.1, 0.2]]).into_dyn();
	let x = CowArray::from(x);
	let outputs = session.run(vec![Value::from_array(session.allocator(), &x).unwrap()]).unwrap();
	let y: OrtOwnedTensor<f32, _> = outputs[0].try_extract().unwrap();
	dbg!(y.view().clone().into_dyn());
}

diff --git a/src/session.rs b/src/session.rs
index 2ef923f..c633613 100644
--- a/src/session.rs
+++ b/src/session.rs
@@ -481,7 +481,7 @@ impl SessionBuilder {
        }
 
        /// Load an ONNX graph from memory and commit the session.
-       pub fn with_model_from_memory(self, model_bytes: &[u8]) -> OrtResult<InMemorySession<'_>> {
+       pub fn with_model_from_memory(self, model_bytes: &[u8]) -> OrtResult<Session> {
                let mut session_ptr: *mut sys::OrtSession = std::ptr::null_mut();
 
                let env_ptr: *const sys::OrtEnv = self.env.ptr();
@@ -533,7 +533,8 @@ impl SessionBuilder {
                        inputs,
                        outputs
                };
-               Ok(InMemorySession { session, phantom: PhantomData })
+               // Ok(InMemorySession { session, phantom: PhantomData })
+               Ok(session)
        }
 }
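To make the effect of the diff concrete, here is a minimal, std-only sketch of what the `PhantomData` lifetime does at the type level. The `Session` and `InMemorySession` types below are hypothetical stand-ins, not the real `ort` types, and `load_borrowed` / `load_owned` are made-up names. The sketch only shows what the borrow checker enforces; whether ONNX Runtime actually needs the model bytes to outlive the session is a separate question that it does not settle.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for an owned session handle (not the real `ort::Session`).
struct Session {
    model_len: usize,
}

/// Hypothetical stand-in for `InMemorySession<'a>`. The `PhantomData` field stores
/// nothing at runtime, but makes the borrow checker treat the wrapper as if it
/// still borrowed the model bytes.
struct InMemorySession<'a> {
    session: Session,
    phantom: PhantomData<&'a [u8]>,
}

/// Variant with the lifetime: the returned value cannot outlive `model_bytes`.
fn load_borrowed(model_bytes: &[u8]) -> InMemorySession<'_> {
    InMemorySession {
        session: Session { model_len: model_bytes.len() },
        phantom: PhantomData,
    }
}

/// Variant matching the diff: the returned value is independent of `model_bytes`,
/// so the caller may drop the buffer right after loading.
fn load_owned(model_bytes: &[u8]) -> Session {
    Session { model_len: model_bytes.len() }
}
```

With `load_owned`, a `drop(mb)` right after loading compiles, as in the Rust example above. With `load_borrowed`, the same `drop` is rejected if the session is used afterwards.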

How do I learn to create something like this?

Hi there,

Thanks a lot for this amazing package. I meant to ask: how do I learn to make something like this? Learning how to write Rust code is one thing, but could you recommend something specific for learning how to create wrappers like these?

Thanks a lot.

Cheers,

Fra

Loading large T5 Models > 2GB efficiently

I have a model I've converted to ONNX format that is > 2GB. This results in a number of model.onnx files. I'm grappling with finding an efficient approach to loading the model.

For other models I will:

pub static ENCODER_MODEL: Lazy<Vec<u8>> = Lazy::new(|| {
    let model_path = PathBuf::from(&*GLOBAL_AI_INCLUDE_ROOT).join("encoder/model/model.onnx");
    let mut file = File::open(model_path).unwrap();
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer).unwrap();
    buffer
});

and then do something like:

pub struct SentenceEncoder<'s> {
    session: InMemorySession<'s>,
}

impl SentenceEncoder<'_> {
    pub fn new() -> SystemSyncResult<Self> {
        let session = SessionBuilder::new(&ENVIRONMENT)?
            .with_optimization_level(GraphOptimizationLevel::Level1)?
            .with_model_from_memory(&*ENCODER_MODEL)?;

        Ok(Self { session })
    }

    //more code....

}

I then have a dedicated thread pool that starts a specified number of threads and reuses the sessions across calls, so that we can run inference in parallel. Callers issue requests over a crossbeam channel to a watcher that prioritizes the incoming requests and dispatches them to a processing thread. The thread picks up the request and, based on the inference request type, selects the appropriate session to run.
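The dispatch scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's code: `std::sync::mpsc` is swapped in for crossbeam to keep the sketch dependency-free, and `Kind`, `Request`, `run_on_session`, `prioritize` and `dispatch` are hypothetical names. `run_on_session` stands in for running inference on a long-lived session.

```rust
use std::collections::BinaryHeap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical request kinds; each would map to one long-lived session.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    Encode,
    Grammar,
}

/// A request as the watcher sees it. The derived `Ord` compares `priority`
/// first, so a max-heap pops the most urgent request.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Request {
    priority: u8,
    id: u32,
    kind: Kind,
}

/// Stand-in for running inference on the session that matches `req.kind`.
fn run_on_session(req: &Request) -> String {
    match req.kind {
        Kind::Encode => format!("encoded #{}", req.id),
        Kind::Grammar => format!("corrected #{}", req.id),
    }
}

/// Watcher step: order everything currently queued, highest priority first.
fn prioritize(pending: Vec<Request>) -> Vec<Request> {
    let mut heap: BinaryHeap<Request> = pending.into_iter().collect();
    let mut ordered = Vec::with_capacity(heap.len());
    while let Some(req) = heap.pop() {
        ordered.push(req);
    }
    ordered
}

/// Send prioritized requests to `workers` threads and collect (id, output) pairs.
fn dispatch(pending: Vec<Request>, workers: usize) -> Vec<(u32, String)> {
    let (job_tx, job_rx) = mpsc::channel::<Request>();
    let (out_tx, out_rx) = mpsc::channel::<(u32, String)>();
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let out_tx = out_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while running inference.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(req) => out_tx.send((req.id, run_on_session(&req))).unwrap(),
                    Err(_) => break, // job channel closed: no more work
                }
            })
        })
        .collect();

    for req in prioritize(pending) {
        job_tx.send(req).unwrap();
    }
    drop(job_tx); // closing the job channel lets the workers exit
    drop(out_tx); // only the workers' clones remain
    for handle in handles {
        handle.join().unwrap();
    }

    let mut results: Vec<_> = out_rx.iter().collect();
    results.sort(); // completion order is nondeterministic, so sort by id
    results
}
```

In a real pool each worker would own (or share) its sessions and stay alive between batches; here the workers exit once the job channel closes, which keeps the example short.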

With most models the above strategy works great, and all sessions share the ENCODER_MODEL memory. When I convert this particular T5 model (a t5-large grammar synthesis pretrained model) to ONNX format, I end up with a bunch of files (model.onnx, onnx__MatMul_XXXX, shared.weights) due to the size limits of Google's Protocol Buffers serialization format. So the actual model.onnx file is very small, and the rest of the model contents are located in the other files.

Is there a way I can pass the model into ort from memory similar to how I'm doing with other models? Additionally, is there something I'm overlooking here(I'm new to using this crate)? Does the Microsoft runtime efficiently share the model representation in memory internally for sessions loaded with the same model file, so multiple calls to with_model_from_file() won't result in unnecessary memory allocation? Am I misunderstanding something here?

Any help on this would be appreciated. Admittedly I haven't finished the implementation of this T5 model in rust yet, so I'm a bit preemptive in my questions.

Thanks again for the work on this crate!

DnnlExecutionProvider

Discussed in #2

Originally posted by dzhao December 30, 2022
Hi, first of all, great crate!! Much cleaner and more comprehensive.
I am ready to use it in one of my critical projects.
One thing, though: we are using CPUs for serving, and our CPUs would benefit greatly from the MKL library.
ONNX Runtime has a DnnlExecutionProvider that enables MKL support, but your crate doesn't seem to provide it.
I also checked the downloaded ONNX Runtime library on my Mac and I don't see the dnnl lib, so it looks like it is just using the normal MLAS CPU EP.
Is it possible to add this option?
Thanks so much for the work!

TensorRT provider failing with default options

ort = { version = "1.15.2", features = ["load-dynamic", "tensorrt"] }

Environment::builder()
   .with_execution_providers([ExecutionProvider::TensorRT(Default::default())])
   .build()?
   .into_arc();

ort::execution_providers: An error occurred when attempting to register `TensorrtExecutionProvider`: key/value cannot be empty

I spent some time poking around the onnxruntime and ort code. It looks like maybe onnxruntime is not getting the provider options somehow?

Different output values than in python

Hi! Thanks for all your work!
I gave this crate a try, and while it's easy to use, sadly I'm getting different (and thus wrong) outputs from my models.

Here is a gist for rust with ort: https://gist.github.com/LoipesMas/2d342b8087dbae4af31d8af2752e84de
Here is a gist for python with onnxruntime: https://gist.github.com/LoipesMas/d7258a3d009e9b06c3684d77e341251b
(Those are using the squeezenet models from here, but I originally ran into this issue with a different, yolov7-based model.)

Example outputs

rust+ort:

[src/main.rs:29] &input.shape() = [
    1,
    3,
    224,
    224,
]
[src/main.rs:30] input.slice(s![0, .., 100, 100]) = [0.92156863, 0.9529412, 0.99607843], shape=[3], strides=[1], layout=CFcf (0xf), const ndim=1
[src/main.rs:31] input.slice(s![0, .., 180, 50]) = [0.9882353, 0.96862745, 0.95686275], shape=[3], strides=[1], layout=CFcf (0xf), const ndim=1
[src/main.rs:48] max_score = Some(
    (
        794,
        0.06649929,
    ),
)
[src/main.rs:49] scores.slice(s![0, 322, .., ..]) = [[1.08950435e-5]], shape=[1, 1], strides=[0, 0], layout=CFcf (0xf), const ndim=2

python:

frame.shape=(1, 3, 224, 224)
frame[0, :, 100, 100]=array([0.92156863, 0.9529412 , 0.99607843], dtype=float32)
frame[0, :, 180, 50]=array([0.9882353 , 0.96862745, 0.95686275], dtype=float32)
max_score=0.11967179
np.where(scores >= max_score)=(array([0]), array([669]), array([0]), array([0]))
scores[0][322]=array([[4.9559356e-05]], dtype=float32)

Input shapes and values are the same (I'm almost positive), but outputs are not even close (e.g. different indexes of max-value, different values (sometimes an order of magnitude different))
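One way to quantify a mismatch like this is to flatten both outputs and compare them element by element. The helpers below are a small std-only sketch; `argmax` and `max_abs_diff` are hypothetical names, and getting the Python output into Rust (for example via a dumped file) is left out. They do not diagnose the cause, they only separate "small floating-point drift" from "completely different result".

```rust
/// Index and value of the largest element, or `None` for an empty slice.
/// On ties the first occurrence wins.
fn argmax(scores: &[f32]) -> Option<(usize, f32)> {
    scores
        .iter()
        .copied()
        .enumerate()
        .fold(None, |best, (i, v)| match best {
            Some((_, b)) if b >= v => best,
            _ => Some((i, v)),
        })
}

/// Largest absolute element-wise difference, or `None` if the lengths differ.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            .fold(0.0, f32::max),
    )
}
```

For the numbers quoted above, the argmax indices already disagree (794 vs. 669), which points to something larger than accumulated rounding error.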

I'm not sure if that's an issue on my side (how I load data, how I use this crate or something else) or if it's on crate's side.
Since ort is based on onnxruntime-rs, this issue might be relevant (although probably not very helpful). Maybe those small errors add up somehow? No idea.

Thanks in advance!

NN-API

Hi, thanks for a great project!

I am currently developing a runtime for an RNN-based LLM by converting the original model to ONNX and trying to potentially run it (or parts of it) using the ONNX hardware acceleration capabilities.

My main target is mobile, and running models there without NNAPI on modern SoCs can mean a ~12x reduction in efficiency and performance.

Would it be hard to get the NNAPI running + get the downloadable build with it too? 👀

Will be certainly willing to help with testing!

Support for boolean input tensor from ndarray

Hello. First of all, thank you for sharing this great repo!

My ONNX model uses a boolean array as input, but converting a boolean ndarray to InputTensor does not seem to work (I tried with an f64 ndarray and it works fine).

I see that there is a PR (#58) which tries to fix it, but I also saw another PR (#41) which shows that you are working on the next release version.

My question is: will it be fixed in the next release? I tried to figure it out myself, but it was hard because the code structure seems to have changed.

Thank you!

FR: Remove `fetch-models` from default features

I've noticed that it's a common practice in Rust to leave expensive features off by default. For example, many packages don't include their derive macros as default features because they increase compile times.

I think it might be good to remove fetch-models from the default features to match this convention.

What do you think? Any objections?

Static linking of ONNX runtime

Hello 👋

Is there a way to link the ONNX runtime library statically (.a, .lib) into the outputted binary? This would simplify deployment!

You can find static builds of the ONNX runtime here: https://github.com/supertone-inc/onnxruntime-build/tree/main

Cheers,
Raphael

Direct ML Compile Error

Hey there,

When compiling for Windows 1809 using DirectML with Cargo:

[dependencies.ort]
version =  "1.15.2"
features = ["download-binaries", "directml"]

I get the following error:

error: linking with `link.exe` failed: exit code: 1120
...
  = note: libort-823b36f3ee700ecc.rlib(ort-823b36f3ee700ecc.ort.b1d99b0d4efe1b77-cgu.11.rcgu.o) : error LNK2019: unresolved external symbol OrtSessionOptionsAppendExecutionProvider_DML referenced in function _ZN3ort19execution_providers17ExecutionProvider5apply17h70344b19a4319081E

Could the reason be that DirectML is not installed on Windows 1809? I tried to install it with

nuget install Microsoft.AI.DirectML -Version 1.12.1

Is there another, better way? I cannot upgrade the Windows version at the moment.

Thanks!

Binary built with `cargo build --release` fails: error while loading shared libraries: libonnxruntime.so.1.15.1: cannot open shared object file: No such file or directory

./program: error while loading shared libraries: libonnxruntime.so.1.15.1: cannot open shared object file: No such file or directory

Cargo.toml

[package]
name = "antivirus"
version = "0.1.0"
edition = "2021"

[profile.dev]
rpath = true

[profile.release]
rpath = true

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
ort = "1.15.2"
ndarray = "0.15.6"

program/target/release ls:

antivirus
antivirus.d
build
deps
examples
incremental
libonnxruntime.so
libonnxruntime.so.1.15.1

Is there any way I can fix this so I can share the binary without cargo run?
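One possible approach, sketched under assumptions: on a Linux target, a build script can ask the linker to embed an rpath of `$ORIGIN`, so the dynamic loader searches for `libonnxruntime.so.1.15.1` in the directory that contains the executable. The helper name `rpath_directive` is hypothetical, and this has not been checked against how `ort`'s own build script links the library. `cargo:rustc-link-arg` itself is a standard Cargo build-script directive.

```rust
// build.rs (sketch). Assumes a Linux target and that `libonnxruntime.so.1.15.1`
// is shipped in the same directory as the executable.

/// Cargo directive that asks the linker to embed an rpath of `$ORIGIN`,
/// i.e. "look for shared libraries next to the executable".
fn rpath_directive() -> String {
    String::from("cargo:rustc-link-arg=-Wl,-rpath,$ORIGIN")
}

// In an actual build script, `main` would print the directive so Cargo picks it up:
//
// fn main() {
//     println!("{}", rpath_directive());
// }
```

The binary and the `.so` files from `target/release` would then be distributed together in one directory. Without touching the build, setting `LD_LIBRARY_PATH` to the directory holding the library before launching the program is an alternative.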

unsafe fn get_tensor_dimensions(tensor_info_ptr: *const sys::OrtTensorTypeAndShapeInfo) -> OrtResult<Vec<i64>> {
	let mut num_dims = 0;
	ortsys![GetDimensionsCount(tensor_info_ptr, &mut num_dims) -> OrtError::GetDimensionsCount];
	assert_ne!(num_dims, 0);

	let mut node_dims: Vec<i64> = vec![0; num_dims as _];
	ortsys![GetDimensions(tensor_info_ptr, node_dims.as_mut_ptr(), num_dims) -> OrtError::GetDimensions];
	Ok(node_dims)
}
