Python Audio-Loading Benchmark

The aim of this repository is to evaluate the loading performance of various audio I/O packages interfaced from Python.

This is relevant for machine learning models, which today often process raw (time domain) audio and assemble batches on the fly. It is therefore important to load the audio as fast as possible. At the same time, a library should ideally support a variety of uncompressed and compressed audio formats and be capable of loading only chunks of audio (seeking). The latter is especially important for models that cannot easily work with samples of variable length (convnets).
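To illustrate what loading only a chunk (seeking) means in practice, here is a minimal sketch that uses Python's standard-library wave module rather than any of the benchmarked packages. The helper names write_sine_wav and read_excerpt are hypothetical, and the sketch assumes mono 16-bit PCM WAV input.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_sine_wav(path, seconds=2.0, rate=8000, freq=440.0):
    """Write a mono 16-bit PCM WAV file containing a sine tone (demo input)."""
    n_frames = int(seconds * rate)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def read_excerpt(path, start_s, duration_s):
    """Read only the frames in [start_s, start_s + duration_s) of a mono 16-bit WAV."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        wav.setpos(int(start_s * rate))  # seek to the first wanted frame
        raw = wav.readframes(int(duration_s * rate))
    # WAV PCM data is little-endian; unpack into a list of Python ints.
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


demo_path = Path(tempfile.mkdtemp()) / "tone.wav"
write_sine_wav(demo_path)
excerpt = read_excerpt(demo_path, start_s=0.5, duration_s=0.25)
print(len(excerpt))  # 2000 samples = 0.25 s at 8 kHz
```

For uncompressed PCM, seeking is a cheap offset computation. Compressed formats generally need decoder support for it, which is why seeking support differs between libraries.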

Tested Libraries

| Library | Short-Name/Code | Out Type | Supported codecs | Excerpts/Seeking |
|---|---|---|---|---|
| scipy.io.wavfile | scipy | Numpy | PCM (only 16 bit) | |
| scipy.io.wavfile memmap | scipy_mmap | Numpy | PCM (only 16 bit) | |
| soundfile (libsndfile) | soundfile | Numpy | PCM, Ogg, Flac | |
| pydub | pydub | Python Array | PCM, MP3, OGG or other FFMPEG/libav supported codec | |
| aubio | aubio | Numpy Array | PCM, MP3, OGG or other avconv supported codec | |
| audioread (libmad) | ar_mad | Numpy Array | FFMPEG | |
| audioread (gstreamer) | ar_gstreamer | Numpy Array | all of FFMPEG | |
| audioread (FFMPEG) | ar_ffmpeg | Numpy Array | all of FFMPEG | |
| librosa | librosa | Numpy Array | relies on audioread | |
| tensorflow 1.13 contrib.ffmpeg | tf_decode | Tensorflow Tensor | All codecs supported by FFMPEG | |
| torchaudio | torchaudio | PyTorch Tensor | all codecs supported by Sox | |

Not tested

Results

The benchmark loads a number of single-channel audio files of different lengths (between 1 and 151 seconds) and measures the time until the audio is converted to a tensor. Depending on the target tensor type (numpy, pytorch or tensorflow), a different set of libraries is compared. For example, when a library's output type is numpy and the target tensor type is tensorflow, the loading time includes the conversion to the target tensor. Furthermore, multiprocessing was disabled for data loaders that support it.
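The measurement described above can be sketched roughly as follows. This is not the repository's benchmark code: load_wav_stdlib and time_loader are hypothetical names, the loader uses the standard-library wave module as a stand-in for a benchmarked package, and list() stands in for the conversion to a target tensor type.

```python
import array
import statistics
import sys
import tempfile
import time
import wave
from pathlib import Path


def load_wav_stdlib(path):
    """Hypothetical loader: decode a 16-bit PCM WAV into an array of ints."""
    with wave.open(str(path), "rb") as wav:
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def time_loader(load_fn, paths, to_target=lambda x: x, repeats=3):
    """Return mean wall-clock seconds per load, including the target conversion."""
    timings = []
    for path in paths:
        for _ in range(repeats):
            start = time.perf_counter()
            to_target(load_fn(path))  # conversion happens inside the timed region
            timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Demo input: one second of silence, mono, 16 bit, 16 kHz.
demo_path = Path(tempfile.mkdtemp()) / "silence.wav"
with wave.open(str(demo_path), "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(bytes(2 * 16000))

mean_seconds = time_loader(load_wav_stdlib, [demo_path], to_target=list)
print(f"{mean_seconds:.6f} s per load")
```

Keeping the conversion inside the timed region is what makes libraries with different output types comparable for a given target type.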

All results shown below depict loading time in seconds for WAV and MP3 files.

Load to Numpy Tensor

Load to PyTorch Tensor

Load to Tensorflow Tensor

Getting metadata information

In addition to loading the file, one might also be interested in extracting metadata. To benchmark this, we asked each library to provide, for every file, the sampling rate, number of channels, number of samples, and duration. Each value is requested in a separate, consecutive call, which means a library is not allowed to open the file once and extract all metadata together. Note that we have excluded pydub from the metadata benchmark results, as it was significantly slower than the other tools.
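The "consecutive calls" constraint can be pictured with a small sketch in which every metadata query reopens the file. It uses the standard-library wave module as a stand-in for a benchmarked package; the get_* function names are hypothetical and not taken from the repository.

```python
import tempfile
import wave
from pathlib import Path


def get_samplerate(path):
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def get_channels(path):
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels()


def get_samples(path):
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes()


def get_duration(path):
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demo input: three seconds of silence, mono, 16 bit, 22.05 kHz.
demo_path = Path(tempfile.mkdtemp()) / "meta.wav"
with wave.open(str(demo_path), "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(22050)
    wav.writeframes(bytes(2 * 3 * 22050))

# Four independent calls: the file is opened and its header parsed each time.
info = {
    "samplerate": get_samplerate(demo_path),
    "channels": get_channels(demo_path),
    "samples": get_samples(demo_path),
    "duration": get_duration(demo_path),
}
print(info)
```

Under this scheme, a library with a high per-open cost pays that cost four times per file.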

Running the Benchmark

Installation using Docker

Build the Docker image by running the command below; it installs the package requirements for all audio libraries.

    docker build -t audio_benchmark .

Generate sample data

To test the loading speed, we generate random (noise) audio of different durations and encode it as 16-bit PCM WAV, CBR MP3, or MP4. The data is generated by a shell script: run generate_audio.sh to create the data in the folder AUDIO.
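The repository generates its data with generate_audio.sh, whose contents are not shown here. As a rough Python illustration of the same idea, the sketch below writes noise files of a few durations using only the standard library. It covers 16-bit PCM WAV only, since MP3 and MP4 would require an external encoder. The function name generate_noise_wav, the durations, the sample rate, and the file naming are all assumptions.

```python
import array
import random
import sys
import tempfile
import wave
from pathlib import Path


def generate_noise_wav(path, seconds, rate=8000, seed=0):
    """Write `seconds` of uniform white noise as a mono 16-bit PCM WAV file."""
    rng = random.Random(seed)  # seeded so repeated runs give identical files
    samples = array.array(
        "h", (rng.randint(-32768, 32767) for _ in range(seconds * rate))
    )
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(samples.tobytes())


# Hypothetical output layout: one file per duration inside an AUDIO folder.
out_dir = Path(tempfile.mkdtemp()) / "AUDIO"
out_dir.mkdir()
for seconds in [1, 2, 3]:
    generate_noise_wav(out_dir / f"{seconds}.wav", seconds)

generated = sorted(p.name for p in out_dir.iterdir())
print(generated)
```

Noise is a convenient test signal here because its content does not matter for timing, only its length and encoding do.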

Start Benchmark

Mount the data directory into the Docker container and run run.sh inside the container. Afterwards, run plot.py to visualize the results.

Authors

@faroit, @hagenw

Contribution

We encourage interested users to contribute to this repository in the issue section and via pull requests. Notifications of new tools and new versions of existing packages are particularly welcome. Since benchmark results depend on the machine they are run on, I (@faroit) will rerun the benchmark on our server.
