What happened? I was trying to get a Whisper model exported with O


[Bug]: Batch inference Olive exported models (olive, 6 comments, closed)

microsoft commented on September 17, 2024

from olive.

Comments (6)

xiaoyu-work commented on September 17, 2024

Hi @Rishit-dagli, what error did you get without using the audio decoder? Also, in the latest commit we added detailed code for the custom components of the Whisper model. Feel free to adapt it to meet your requirements.

Rishit-dagli commented on September 17, 2024

@xiaoyu-work Ah, yes, missed that:

This is what I see without using audio decoder:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running BeamSearch node. Name:'BeamSearch_node' Status Message: Input 'attention_mask' is expected to have same shape as input_ids

Also, in the latest commit we added detailed code for the custom components of the Whisper model. Feel free to adapt it to meet your requirements.

The model exported with the latest commit still expects (1, x)-shaped inputs, judging from the message below. Do you think I might have missed something?

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: audio_stream for the following indices

wenbingl commented on September 17, 2024

@Rishit-dagli, right now, the audio decoder in onnxruntime-extensions only supports 1-D audio input. Can you share more details about how you create the N-dimensional batched audio inputs? We may consider adding support in the next release.

Rishit-dagli commented on September 17, 2024

Right now, the audio decoder in onnxruntime-extensions only supports 1-D audio input

Ah, I see!

I also tried without the audio decoder, loading the data with:

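import librosa
import numpy as np
audio_blob, _ = librosa.load(path)  # 1-D float32 waveform; resampled to 22050 Hz unless sr is given
audio_blob = np.expand_dims(audio_blob, axis=0)  # add a leading batch axis: shape (1, x)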

This gives me a (1, x)-shaped array, which I then rearrange into an (x//b, b)-shaped array and pad. Without the audio decoder the model does run inference, but the outputs follow this pattern:

first batch transcribed well
transcribed well !!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

After the first batch it starts returning exclamation marks and I have not been able to understand what causes this.

wenbingl commented on September 17, 2024

In that case, I don't think batched input will solve your problem. Check the OpenAI example (https://github.com/openai/whisper/blob/main/whisper/transcribe.py) to see how long-file transcription is handled.
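
For readers following along, here is a rough sketch of the fixed-window idea, assuming 16 kHz mono audio and Whisper's 30-second input window. It is a simplification: the linked transcribe.py advances through the file using predicted timestamps rather than a fixed stride, and the helper name `split_into_windows` is made up for illustration. It uses synthetic audio so it runs without any file on disk.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 30     # Whisper's fixed input window length
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_windows(audio: np.ndarray, chunk_samples: int = CHUNK_SAMPLES) -> np.ndarray:
    """Split 1-D audio into consecutive windows, zero-padding the last one.

    Hypothetical helper for illustration; not part of Olive or Whisper.
    """
    if audio.ndim != 1:
        raise ValueError("expected 1-D mono audio")
    # Ceiling division, with at least one window even for empty input.
    n_chunks = max(1, -(-len(audio) // chunk_samples))
    padded = np.zeros(n_chunks * chunk_samples, dtype=audio.dtype)
    padded[: len(audio)] = audio
    # Row-major reshape keeps each row a contiguous stretch of time.
    return padded.reshape(n_chunks, chunk_samples)


# Synthetic stand-in for a 70-second recording.
rng = np.random.default_rng(0)
audio = rng.standard_normal(70 * SAMPLE_RATE).astype(np.float32)
windows = split_into_windows(audio)
print(windows.shape)  # (3, 480000)
```

Each row can then be fed to the model one window at a time. Whether rows can also be stacked into a single batch depends on how the model was exported, which this sketch does not address.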

jambayk commented on September 17, 2024

Closing this issue since it is not an Olive issue and has also turned stale.
