Hi,
Thank you for your questions. I will reply to them below.
- zeroshot_asp_full.ckpt and zeroshot_asp_held_out.ckpt:
The full checkpoint is trained on the complete AudioSet. We trained the held-out model by excluding several classes from AudioSet for another experiment in our paper. So the full checkpoint is usually the better one, since it saw all classes during training.
- (other, vocals, bass, drums) same output:
If you use the "inference" mode, take a look at the "test_key" variable in "config.py". You need to prepare a mixture audio (the one you want to separate) and a set of query audios (indicating the source); there are two inference variables in config.py for these that you should fill in. After that, set "test_key" to a single name, e.g. ["violin"]; it only labels the source you want to separate. Note that inference mode can separate only one source at a time, though you could change the code a little to support separating multiple sources in one run (I will mark that as a feature request).
If you are using the "test" mode (i.e. the MUSDB mode), you don't need to specify a query, but you do need to set testavg_path and testset_path. That mode separates the mixture into drums, bass, vocals, and other.
So if you are in "inference" mode but set test_key to four keys, you will get the same output four times, because there is only one query. The name in test_key does not select the source to separate; it is just a label.
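To make the single-source behavior concrete, here is a hypothetical sketch of an inference setup in config.py. Only "test_key", the existence of two inference path variables, and the one-source-per-run rule come from the thread; the path variable names below are placeholders, so check your local config.py for the real ones.

```python
# Hypothetical config.py fragment for "inference" mode.
# The two path variable names are placeholders (assumptions), not the
# repo's real names; test_key and the mode name come from the thread.

mode = "inference"

# The mixture you want to separate, and a folder of query audio(s)
# for the ONE source you are targeting (placeholder names):
inference_mixture_path = "my_mixture.wav"
inference_query_path = "my_violin_queries/"

# Exactly one entry: inference separates a single source per run,
# and the key is only a label for the output, not a source selector.
test_key = ["violin"]
```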
- only 10 seconds?
Actually no. The separation model supports any length for both the query and the mixture; we usually cut them into small pieces, process them one by one, and concatenate the results. The 10-second limit only appears in the sound event detection system at training time, because we consider that a sufficient length for audio classification. You can change it to another length as needed; it does not matter much unless you make it much longer or shorter (like 1 sec or 100 sec).
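The cut-into-pieces-and-concatenate strategy described above can be sketched as follows. Plain lists stand in for audio arrays, and an identity function stands in for the real separation model (both are assumptions for illustration only):

```python
# Sketch of the chunk-and-concatenate strategy for long audio.
# separate_chunk is a stand-in for the real model (an assumption);
# real code would run the separation network on each segment.

def separate_chunk(chunk):
    # Placeholder for the separation model applied to one segment.
    return chunk

def separate_long_audio(samples, chunk_len):
    """Split a long signal into fixed-length pieces, process each,
    and concatenate the results back together."""
    out = []
    for start in range(0, len(samples), chunk_len):
        out.extend(separate_chunk(samples[start:start + chunk_len]))
    return out

mixture = list(range(25))                 # stand-in for a waveform
result = separate_long_audio(mixture, chunk_len=10)
assert len(result) == len(mixture)        # no samples lost at the edges
```

With the identity stand-in the output equals the input; with a real model each chunk would be replaced by its separated version before concatenation.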
- Last question: what would you recommend setting in the config for the best possible quality?
In our experience, the best query is one taken from the mixture audio itself. For example, suppose your mixture has a lead violin and contains a 1-2 second solo violin passage; you can extract that passage as the query. This usually works best because its timbre and acoustic feel are the closest possible match to the mixture (they are originally the same recording). If you don't have such a solo part, another choice is to collect as many other violin samples as possible (like 50 pieces). This is what we do when testing on MUSDB: we collect 100 samples from its training set to construct the vocals, bass, drums, and other latent queries.
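The many-samples idea above amounts to pooling per-clip latent vectors into one query embedding. The sketch below uses a dummy embed function as a stand-in for the real query encoder, and simple mean pooling as an assumed aggregation; the actual repo may pool differently:

```python
# Sketch of building one latent query from many sample clips.
# embed() is a dummy stand-in for the real query encoder (an
# assumption), and mean pooling is an assumed aggregation choice.

def embed(clip):
    # Dummy 2-dim "embedding": mean and max of the clip values.
    return [sum(clip) / len(clip), max(clip)]

def average_query(clips):
    """Mean-pool per-clip embeddings into a single latent query vector."""
    vecs = [embed(c) for c in clips]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

clips = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]   # stand-ins for query audio
query = average_query(clips)                   # one vector for the source
```

Averaging over many clips smooths out per-recording quirks, which is why collecting more query samples tends to help when no in-mixture solo is available.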
Hope the above information clarifies your questions.
Thanks!!
from zero_shot_audio_source_separation.