Comments (6)

xiaoyu-work commented on September 17, 2024

Hi @Rishit-dagli, what error did you get without using the audio decoder? Also, in the latest commit we added detailed code for the custom components of the Whisper model. Feel free to update them to meet your requirements.

from olive.

Rishit-dagli commented on September 17, 2024

@xiaoyu-work Ah, yes, I missed that.

This is what I see without using the audio decoder:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running BeamSearch node. Name:'BeamSearch_node' Status Message: Input 'attention_mask' is expected to have same shape as input_ids

Also, in the latest commit we added detailed code for the custom components of the Whisper model. Feel free to update them to meet your requirements.

The model exported with the latest commit still expects inputs of shape (1, x), judging from the message below. Do you think I might have missed something?

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: audio_stream for the following indices

wenbingl commented on September 17, 2024

@Rishit-dagli, right now the audio decoder in onnxruntime-extensions only supports 1-D audio input. Can you share more details about how you create the N-dimensional batched audio inputs? We may consider adding support for them in the next release.

Rishit-dagli commented on September 17, 2024

Right now, the audio decoder in onnxruntime-extensions only supports 1-D audio input

Ah, I see!

I also tried without the audio decoder, loading the data with:

import librosa
import numpy as np
audio_blob, _ = librosa.load(path)  # 1-D float array of shape (x,)
audio_blob = np.expand_dims(audio_blob, axis=0)  # shape (1, x)

This gives me an array of shape (1, x), which I then rearrange into an array of shape (x//b, b) and pad. Running the model without the audio decoder now seems to perform inference, but the outputs follow this pattern:

first batch transcribed well
transcribed well !!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

After the first batch it starts returning exclamation marks and I have not been able to understand what causes this.
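For readers following along, the rearrangement described above can be sketched roughly as below. This is only an illustration of the array layout, not the code used in this thread: the helper name `to_padded_batch` is hypothetical, the padding is assumed to be zero-padding, and the row count is rounded up (rather than x//b) so that the tail of the signal is kept.

```python
import numpy as np


def to_padded_batch(audio: np.ndarray, b: int) -> np.ndarray:
    """Split a 1-D signal into rows of length b, zero-padding the last row.

    Hypothetical helper: the name and the zero-padding choice are assumptions.
    """
    audio = np.asarray(audio).reshape(-1)  # accept shape (x,) or (1, x)
    n_rows = -(-audio.shape[0] // b)       # ceiling division
    padded = np.zeros(n_rows * b, dtype=audio.dtype)
    padded[: audio.shape[0]] = audio       # copy samples, leave the tail as zeros
    return padded.reshape(n_rows, b)


batch = to_padded_batch(np.arange(10, dtype=np.float32), b=4)
print(batch.shape)  # (3, 4)
```

Each row here is a consecutive slice of the same recording, so the last row is the only one that contains padding.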

wenbingl commented on September 17, 2024

In that case, I don't think batched input will solve your problem. Check the OpenAI example (https://github.com/openai/whisper/blob/main/whisper/transcribe.py) to see how long-file transcription is handled.
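As a rough illustration of the idea of processing a long file window by window, here is a simplified sketch. It is not what the linked transcribe.py does: that script advances through the audio using the predicted timestamps, whereas this sketch uses fixed, non-overlapping 30-second windows at 16 kHz. `transcribe_long` and the `transcribe_chunk` callable are hypothetical names; `transcribe_chunk` stands in for whatever runs the exported model on a single 1-D window.

```python
from typing import Callable, List

import numpy as np

SAMPLE_RATE = 16000   # Whisper operates on 16 kHz audio
CHUNK_SECONDS = 30    # Whisper's fixed input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def transcribe_long(audio: np.ndarray,
                    transcribe_chunk: Callable[[np.ndarray], str]) -> str:
    """Run transcribe_chunk on consecutive 30 s windows and join the text.

    transcribe_chunk is a hypothetical callable supplied by the caller.
    """
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    pieces: List[str] = []
    for start in range(0, audio.shape[0], N_SAMPLES):
        window = audio[start:start + N_SAMPLES]
        if window.shape[0] < N_SAMPLES:
            # Zero-pad the final window to the full 30 s.
            window = np.pad(window, (0, N_SAMPLES - window.shape[0]))
        pieces.append(transcribe_chunk(window))
    return " ".join(p.strip() for p in pieces if p.strip())


# Stand-in for a real model call, used only to show the windowing.
fake = lambda w: f"[{w.shape[0]} samples]"
print(transcribe_long(np.zeros(N_SAMPLES + 1, dtype=np.float32), fake))
# [480000 samples] [480000 samples]
```

Fixed windows can cut a word in half at a boundary, which is one reason the reference implementation seeks by timestamp instead.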

jambayk commented on September 17, 2024

Closing this issue since it is not an Olive issue and has also turned stale.
