text-to-audio-esc's Introduction

Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification

Francesca Ronchini1, Luca Comanducci1, and Fabio Antonacci1

1 Dipartimento di Elettronica, Informazione e Bioingegneria - Politecnico di Milano

arXiv

Abstract

In the past few years, text-to-audio models have emerged as a significant advancement in automatic audio generation. Although they represent impressive technological progress, the effectiveness of their use in the development of audio applications remains uncertain. This paper aims to investigate these aspects, specifically focusing on the task of classification of environmental sounds. This study analyzes the performance of two different environmental classification systems when data generated from text-to-audio models is used for training. Two cases are considered: a) when the training dataset is augmented by data coming from two different text-to-audio models; and b) when the training dataset consists solely of synthetic audio generated. In both cases, the performance of the classification task is tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, whereas the performance of the models drops when relying on only generated audio.

Install & Usage

For generating the data, we used AudioLDM2 and AudioGen.

Installing AudioLDM2

Please refer to the AudioLDM2 GitHub repo and follow the installation instructions. For this study, we used the official checkpoints available in Hugging Face 🧨 Diffusers and the audioldm checkpoint.

Once AudioLDM2 is installed, you can generate the audio files by running the script audio_generation/class_generation_audioldm.py. Before running the script, you need to specify, in audio_generation/class_generation_audioldm.py, the path to the output folder, the audio class to generate, the prompt used to generate the files, and the number of files to generate.

After that, you can run the script with the command:

cd audio_generation
python class_generation_audioldm.py
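
The contents of class_generation_audioldm.py are not reproduced here. As a rough illustration only, the four settings listed above (output folder, class, prompt, number of files) could drive the Diffusers AudioLDM2 pipeline as in the sketch below. The function names, the file naming scheme, the `cvssp/audioldm2` checkpoint id, the 5-second clip length, and the number of inference steps are all assumptions, not the values used in the study.

```python
from pathlib import Path


def output_paths(out_dir, audio_class, n_files):
    """Return the target .wav paths for one class (hypothetical naming scheme)."""
    class_dir = Path(out_dir) / audio_class
    return [class_dir / f"{audio_class}_{i:04d}.wav" for i in range(n_files)]


def generate_with_audioldm2(out_dir, audio_class, prompt, n_files, seconds=5.0):
    """Generate n_files clips for one class with the Diffusers AudioLDM2 pipeline."""
    # Heavy imports stay local so output_paths() works without them installed.
    import torch
    import scipy.io.wavfile
    from diffusers import AudioLDM2Pipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumed checkpoint id; swap in the checkpoint you actually installed.
    pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2").to(device)

    for path in output_paths(out_dir, audio_class, n_files):
        path.parent.mkdir(parents=True, exist_ok=True)
        # The pipeline returns a batch of waveforms; keep the first one.
        audio = pipe(prompt, num_inference_steps=200, audio_length_in_s=seconds).audios[0]
        # AudioLDM2 produces 16 kHz audio.
        scipy.io.wavfile.write(str(path), rate=16000, data=audio)
```

A call such as `generate_with_audioldm2("generated/audioldm2", "dog", "a dog barking", 10)` would then write ten files under `generated/audioldm2/dog/`.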

Installing AudioGen

Please refer to the AudioGen GitHub repo and follow the installation instructions.

Once AudioGen is installed, you can generate the audio files by running the script audio_generation/class_generation_audiogen.py. Before running the script, you need to specify, in audio_generation/class_generation_audiogen.py, the path to the output folder, the audio class to generate, the prompt used to generate the files, and the number of files to generate. Then run:

cd audio_generation
python class_generation_audiogen.py
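
The contents of class_generation_audiogen.py are likewise not shown here. A minimal sketch of the same four settings applied to AudioGen through the audiocraft package might look as follows. The function names, the batching helper, the `facebook/audiogen-medium` checkpoint id, the clip duration, and the file naming are assumptions made for illustration.

```python
from pathlib import Path


def batch_sizes(n_files, batch_size):
    """Split n_files into consecutive batch sizes, with a smaller last batch if needed."""
    full, rest = divmod(n_files, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def generate_with_audiogen(out_dir, audio_class, prompt, n_files,
                           seconds=5.0, batch_size=4):
    """Generate n_files clips for one class with AudioGen (audiocraft)."""
    # Heavy imports stay local so batch_sizes() works without audiocraft installed.
    from audiocraft.models import AudioGen
    from audiocraft.data.audio import audio_write

    # Assumed checkpoint id; use the one from your own installation.
    model = AudioGen.get_pretrained("facebook/audiogen-medium")
    model.set_generation_params(duration=seconds)

    class_dir = Path(out_dir) / audio_class
    class_dir.mkdir(parents=True, exist_ok=True)

    index = 0
    for size in batch_sizes(n_files, batch_size):
        # One waveform is generated per description in the list.
        wavs = model.generate([prompt] * size)
        for wav in wavs:
            # audio_write appends the file extension itself.
            audio_write(str(class_dir / f"{audio_class}_{index:04d}"),
                        wav.cpu(), model.sample_rate, strategy="loudness")
            index += 1
```

Batching repeats of the same prompt is only a convenience for GPU throughput; generating one file per call works equally well.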

Run the code

When all the data have been generated, you can reproduce the experiments.
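
Before moving on, it can help to confirm that every class actually received files. The sketch below assumes a hypothetical layout with one subfolder per class containing .wav files; adapt it if your output folders are organized differently.

```python
from collections import Counter
from pathlib import Path


def count_clips_per_class(data_dir):
    """Count .wav files in each immediate subfolder of data_dir."""
    counts = Counter()
    for wav in Path(data_dir).glob("*/*.wav"):
        counts[wav.parent.name] += 1
    return dict(counts)


def missing_classes(data_dir, expected_classes, min_clips=1):
    """Return the expected classes that have fewer than min_clips files."""
    counts = count_clips_per_class(data_dir)
    return sorted(c for c in expected_classes if counts.get(c, 0) < min_clips)
```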

First, install the required packages by running the following command in your terminal:

pip install -r requirements.txt

Once the packages are installed, specify which dataset to use by following the instructions in the config/default.yaml file.

After all the parameters have been defined, you can run the code with the following command:

python main.py
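
For orientation, the two training setups described in the abstract (real data augmented with generated audio, and generated audio only) can be pictured as two ways of assembling the training file list. This is a sketch under stated assumptions: the function name and mode strings are hypothetical and do not correspond to keys in config/default.yaml or to code in main.py.

```python
import random


def build_training_set(real_files, synthetic_files, mode, seed=0):
    """Assemble the training file list for one of the two setups.

    mode="augmented":      real training data plus generated audio
    mode="synthetic_only": generated audio only
    Evaluation is done on real data in both cases, so test files never enter here.
    """
    if mode == "augmented":
        files = list(real_files) + list(synthetic_files)
    elif mode == "synthetic_only":
        files = list(synthetic_files)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Shuffle reproducibly so real and generated clips are interleaved.
    random.Random(seed).shuffle(files)
    return files
```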

Link to additional material

Additional material and audio samples are available on the companion website.

Additional information

For more details: "Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification", Francesca Ronchini, Luca Comanducci, and Fabio Antonacci - arXiv, 2024.

If you use code or comments from this work, please cite our paper:

@article{ronchini2024synthesizing,
  title={Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification},
  author={Ronchini, Francesca and Comanducci, Luca and Antonacci, Fabio},
  journal={arXiv preprint arXiv:2403.17864},
  year={2024}
}
