Comments (2)
Also, for FSD50K, when sound files in the same batch have different lengths, should I pad them with zeros to the same length before passing them to the model?
from passt.
Hi! Thank you for your interest!
The problem is that the model you've loaded was trained on 10-second clips (AudioSet and cropped FSD50K), and the audio file you're processing is longer than 10 seconds (it must be around 13.5 seconds, judging from the error). As a result, there are not enough trained time positional encodings to cover the 13.5 seconds.
The get_scene_embeddings function takes care of this by checking whether the audio is longer than the largest length the model can handle.
This is only a problem for inputs longer than 10 seconds; the model can handle shorter clips by cropping the time positional encodings to match the input. If you use batched inputs, you can pad the shorter clips. If you're running inference one clip at a time, the only constraint is to have enough time positional encodings to cover the whole input. One possible workaround is to take (overlapping) windows of 10 seconds and average the resulting embeddings; the repository already implements this.
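As a rough illustration of the windowing workaround, here is a minimal sketch in NumPy. It is not the repository's implementation: the function name `windowed_embedding`, the `embed_fn` callable (a stand-in for the actual model call), the 5-second hop, and the 32 kHz default sample rate are all assumptions made for the example.

```python
import numpy as np

def windowed_embedding(audio, embed_fn, sample_rate=32000,
                       window_s=10.0, hop_s=5.0):
    """Average embeddings over overlapping fixed-length windows.

    audio:    1-D array of samples.
    embed_fn: hypothetical callable mapping a 1-D window of samples to a
              1-D embedding (stand-in for the real model call).
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)

    # Short clips fit within the model's limit and need no windowing.
    if len(audio) <= window:
        return embed_fn(audio)

    starts = list(range(0, len(audio) - window + 1, hop))
    # Add one last window aligned with the end so the tail is not dropped.
    if starts[-1] + window < len(audio):
        starts.append(len(audio) - window)

    embeddings = [embed_fn(audio[s:s + window]) for s in starts]
    return np.mean(embeddings, axis=0)
```

Every window passed to `embed_fn` is at most `window_s` seconds long, so the model never sees more frames than it has time positional encodings for. A smaller hop gives more overlap at the cost of more forward passes.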
During training, I crop the raw waveforms and pad them with zeros.
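For batched inference, the same crop-and-pad idea might look like the sketch below. The helper names `pad_or_crop` and `make_batch` are hypothetical, and cropping from the start of the clip is an assumption; the training code may pick the crop position differently.

```python
import numpy as np

def pad_or_crop(waveform, target_len):
    """Return `waveform` cropped or zero-padded to exactly `target_len` samples."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if len(waveform) >= target_len:
        # Assumption: keep the beginning of the clip when cropping.
        return waveform[:target_len]
    padded = np.zeros(target_len, dtype=np.float32)
    padded[:len(waveform)] = waveform  # zeros fill the tail
    return padded

def make_batch(waveforms, max_len):
    """Stack variable-length clips into one (batch, samples) array.

    The batch length is the longest clip, capped at `max_len`
    (e.g. 10 seconds' worth of samples for a model trained on 10-second clips).
    """
    target_len = min(max(len(w) for w in waveforms), max_len)
    return np.stack([pad_or_crop(w, target_len) for w in waveforms])
```

For example, `make_batch(clips, max_len=10 * sample_rate)` would yield an array whose rows all have the same length and never exceed the 10-second limit.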
I hope this helps.