Hello, first of all, thank you for the awesome and very well-written

Inference Issue about passt (open)

kkoutini commented on May 28, 2024

Inference Issue


Comments (2)

Jerry2001 commented on May 28, 2024

Also, for FSD50K, when sound files of different lengths end up in the same batch, should I zero-pad them to the same length before passing them to the model?
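
For reference, zero-padding variable-length clips into one batch could look like the minimal NumPy sketch below. It assumes mono 1-D waveforms at a common sample rate and pads at the end; pad_batch is a hypothetical helper, not part of passt.

```python
import numpy as np


def pad_batch(waveforms, target_len=None):
    """Stack 1-D waveforms into a (batch, target_len) array, zero-padding at the end.

    Hypothetical helper (not part of passt). If target_len is None, every clip
    is padded to the length of the longest clip; longer clips are truncated.
    """
    if target_len is None:
        target_len = max(len(w) for w in waveforms)
    batch = np.zeros((len(waveforms), target_len), dtype=np.float32)
    lengths = []
    for i, w in enumerate(waveforms):
        n = min(len(w), target_len)
        batch[i, :n] = w[:n]  # copy the signal; the remainder stays zero
        lengths.append(n)
    return batch, np.array(lengths)


# Usage: two clips of different length become one zero-padded batch.
batch, lengths = pad_batch([np.ones(3), np.ones(5)])
print(batch.shape, lengths)
```

Returning the original lengths alongside the batch is optional, but it makes it easy to ignore the padded tail later if needed.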


kkoutini commented on May 28, 2024

Hi! Thank you for your interest!
The problem is that the model you've loaded was trained on 10-second clips (AudioSet and cropped FSD50K), and the audio file you're processing is longer than 10 seconds (around 13.5 seconds, judging from the error). Therefore, there are not enough trained time positional encodings to cover the full 13.5 seconds.
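
To make the mismatch concrete: the number of time positions a patch-based spectrogram transformer needs grows linearly with clip duration. The sketch below assumes 100 spectrogram frames per second and 16-frame patches with a time stride of 10; these are illustrative assumptions, so check the actual model configuration before relying on the numbers.

```python
def num_time_positions(duration_s, frames_per_s=100, patch_size=16, patch_stride=10):
    """Number of patch positions along time for a clip of duration_s seconds.

    All default parameters are illustrative assumptions, not values read from passt.
    """
    n_frames = int(round(duration_s * frames_per_s))
    if n_frames < patch_size:
        return 0  # too short for even one full patch (no padding assumed)
    return (n_frames - patch_size) // patch_stride + 1


trained = num_time_positions(10.0)  # positions learned from 10-second training clips
needed = num_time_positions(13.5)   # positions a 13.5-second clip would need
print(trained, needed, needed > trained)
```

Under these assumptions, a 10-second clip maps to 99 time positions and a 13.5-second clip to 134, so 35 positions would have no trained encoding.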
The get_scene_embeddings function handles this case by checking whether the audio is longer than the maximum length the model can handle (see here).
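
A minimal sketch of such a length check, assuming (hypothetically) that the learned time positional encoding is stored as an array whose last axis indexes time positions: inputs that need more positions than were trained are rejected, and shorter inputs use only the first n_time positions. select_time_pos_embed is a hypothetical name, not passt's API.

```python
import numpy as np


def select_time_pos_embed(time_pos_embed, n_time):
    """Return the first n_time learned time positional encodings.

    Assumes (hypothetically) that the last axis of time_pos_embed indexes time
    positions, e.g. shape (1, dim, 1, max_time). Raises if the input is longer
    than anything seen in training.
    """
    max_time = time_pos_embed.shape[-1]
    if n_time > max_time:
        raise ValueError(
            f"input needs {n_time} time positions but only {max_time} were trained; "
            "split the audio into shorter windows"
        )
    return time_pos_embed[..., :n_time]  # shorter inputs: crop to match


pos = np.random.default_rng(0).normal(size=(1, 8, 1, 99))
print(select_time_pos_embed(pos, 50).shape)
```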
This is only a problem for inputs longer than 10 seconds; the model can handle shorter clips by cropping the time positional encodings to match the input (see here). If you use batched inputs, you can pad the shorter clips. If you run inference one clip at a time, the only constraint is having enough time positional encodings to cover the whole input. One possible workaround is to take (overlapping) 10-second windows and average the resulting embeddings; this is done here.
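
A minimal sketch of the overlapping-window workaround: cut the waveform into 10-second windows, embed each one, and average the results. embed_fn is a hypothetical stand-in for the model's per-clip embedding call, and the 32 kHz sample rate and 5-second hop are assumptions, not values taken from get_scene_embeddings.

```python
import numpy as np


def windowed_embedding(waveform, embed_fn, sr=32000, win_s=10.0, hop_s=5.0):
    """Average embed_fn over overlapping fixed-length windows of a 1-D waveform.

    embed_fn is a hypothetical callable mapping a window to a 1-D embedding.
    Clips no longer than one window are embedded directly.
    """
    win = int(win_s * sr)
    hop = int(hop_s * sr)
    if len(waveform) <= win:
        return embed_fn(waveform)
    starts = list(range(0, len(waveform) - win + 1, hop))
    if starts[-1] + win < len(waveform):
        starts.append(len(waveform) - win)  # extra window so the tail is covered
    embeddings = np.stack([embed_fn(waveform[s:s + win]) for s in starts])
    return embeddings.mean(axis=0)


# Usage with a dummy embedding (mean amplitude, window length) on a 13.5 s clip.
def dummy_embed(w):
    return np.array([w.mean(), float(len(w))])


clip = np.ones(int(13.5 * 32000), dtype=np.float32)
print(windowed_embedding(clip, dummy_embed))
```

Every window passed to embed_fn is exactly 10 seconds long, so none of them needs more time positional encodings than the model was trained with.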

During training, I crop the raw waveforms and zero-pad them here.
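
A minimal NumPy sketch of cropping or zero-padding a raw waveform to a fixed training length. Whether the crop offset is random or fixed is not stated in the thread, so the random offset below is an assumption; crop_or_pad is a hypothetical name.

```python
import numpy as np


def crop_or_pad(waveform, target_len, rng=None):
    """Return a 1-D waveform of exactly target_len samples.

    Longer clips are cropped at a random offset (an assumption);
    shorter clips are zero-padded at the end.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(waveform)
    if n > target_len:
        start = int(rng.integers(0, n - target_len + 1))
        return waveform[start:start + target_len]
    if n < target_len:
        return np.pad(waveform, (0, target_len - n))  # default constant mode pads with zeros
    return waveform


rng = np.random.default_rng(0)
print(crop_or_pad(np.arange(10.0), 4, rng))
print(crop_or_pad(np.ones(3), 5))
```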

I hope this helps.

