RetroCirce commented on May 29, 2024

Hi,
Thank you for your question. I will reply to your questions below.

  1. zeroshot_asp_full.ckpt and zeroshot_asp_held_out.ckpt:

The full ckpt is the checkpoint we trained on the full AudioSet. We trained a second model with several AudioSet classes held out, for another experiment in our paper. So the full ckpt is usually the better one, because it saw all the classes during training.

  2. (other, vocals, bass, drums) same output:

If you use the "inference" mode, take a look at the "test_key" variable in "config.py". You need to prepare a mixture audio (the one you want to separate) and a set of query audios (examples of the source you want to extract); there are two inference variables in config.py for these that you should fill in. After that, change "test_key" to a single name such as ["violin"]. It is just a label for the source you want to separate. Note that the inference mode can only separate one source at a time. You could change the code a little to support separating multiple sources at once (I will mark this as a feature request).

If you are using the "test" mode (i.e. musdb mode), you don't need to provide a query, but you do need to set testavg_path and testset_path. In this mode the model separates the mixture into drums, bass, vocals and other.

So if you are using the "inference" mode but set test_key to four keys, you will get the same output four times, because you only have one query. The names in test_key do not select the source to separate; they are just names.
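To illustrate why four keys give four identical outputs, here is a toy sketch. It is not the repository's code: `separate` and `run_inference` are hypothetical stand-ins, and the only point is that the output depends on the mixture and the single query, never on the key name.

```python
import numpy as np

def separate(mixture, query_embedding):
    # Hypothetical stand-in for the real model: the result depends only on
    # the mixture and the query embedding.
    gain = float(np.tanh(np.abs(query_embedding).mean()))
    return mixture * gain

def run_inference(mixture, query_embedding, test_key):
    # Assumption for this sketch: one query is shared by every key, so the
    # key name only labels the output.
    return {name: separate(mixture, query_embedding) for name in test_key}

mixture = np.linspace(-1.0, 1.0, 8)
query = np.array([0.2, 0.4])

outputs = run_inference(mixture, query, ["vocals", "bass", "drums", "other"])
identical = all(np.array_equal(outputs["vocals"], out) for out in outputs.values())
print(identical)
```

With one query, using a single key such as ["violin"] gives the same information without the duplicate files.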

  3. only 10 seconds?

Actually no, the separation model supports any length of query and mixture. Usually we cut them into small pieces, process the pieces one by one, and concatenate the results. The 10-second limit only applies to the sound event detection system during training, because we think this length is long enough for audio classification. You can change it to another length if you need to. It does not affect the results much unless you make it much shorter or longer (like 1 sec or 100 sec).
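The cut-and-concatenate idea can be sketched roughly as follows. This is a minimal illustration under assumptions, not the repository's implementation: `separate_in_chunks` and `separate_fn` are hypothetical names, the audio is a 1-D NumPy array, and the last piece is zero-padded and then trimmed back.

```python
import numpy as np

def separate_in_chunks(mixture, separate_fn, chunk_len):
    """Run separate_fn on consecutive fixed-length pieces and join the results."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    pieces = []
    for start in range(0, len(mixture), chunk_len):
        chunk = mixture[start:start + chunk_len]
        pad = chunk_len - len(chunk)
        if pad > 0:
            # Zero-pad the last piece so every call sees the same length.
            chunk = np.pad(chunk, (0, pad))
        separated = separate_fn(chunk)
        # Drop the padded tail again so the output length matches the input.
        pieces.append(separated[:chunk_len - pad])
    if not pieces:
        return np.zeros(0, dtype=mixture.dtype)
    return np.concatenate(pieces)

# Toy usage: a fake "model" that halves the signal.
mixture = np.arange(25, dtype=np.float32)
result = separate_in_chunks(mixture, lambda x: 0.5 * x, chunk_len=10)
print(result.shape)
```

A real pipeline might overlap the pieces and cross-fade them to avoid clicks at the boundaries; the sketch leaves that out.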

  4. last question: what would you recommend setting up in configs for the best quality possible?

We think one of the best possible queries is one taken from the mixture audio itself. For example, say you have a mixture with a violin lead and you notice about 1-2 sec of solo violin in it. You can extract that part as the query. This usually works best because the query has the closest timbre and acoustic feel to the mixture (they come from the same recording). If you don't have such a solo part, another choice is to collect as many other violin samples as possible (like 50 pieces). This is what we do when testing on musDB, where we collect 100 samples from its training set to construct the vocal, bass, drum, and other latent queries.
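As a rough sketch of building one latent query from many samples, the snippet below averages per-clip embeddings. Averaging is an assumption on my part (the reply only says the samples are used to construct the latent query), and `build_latent_query` and `embed_fn` are hypothetical names, not functions from the repository.

```python
import numpy as np

def build_latent_query(query_clips, embed_fn):
    """Combine the embeddings of several query clips into one latent query."""
    if len(query_clips) == 0:
        raise ValueError("need at least one query clip")
    # One embedding vector per clip, stacked into shape (num_clips, dim).
    embeddings = np.stack([embed_fn(clip) for clip in query_clips])
    # Assumption: the latent query is the mean embedding over all clips.
    return embeddings.mean(axis=0)

# Toy embedding: (mean, standard deviation) of the waveform.
def toy_embed(clip):
    return np.array([clip.mean(), clip.std()])

clips = [np.ones(100), np.zeros(100), np.full(100, 2.0)]
latent_query = build_latent_query(clips, toy_embed)
print(latent_query.shape)
```

In practice `embed_fn` would be the model's query encoder, and the clips would be the 50 to 100 collected samples of the target source.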

I hope the information above answers your questions.

Thanks!!

from zero_shot_audio_source_separation.
