mjhydri / beatnet

BeatNet is a state-of-the-art real-time and offline joint music beat, downbeat, tempo, and meter tracking system using a CRNN and particle filtering (implementation of the ISMIR 2021 paper).

License: Creative Commons Attribution 4.0 International

Python 100.00%

beatnet real-time real-time-beat-tracker real-time-downbeat-tracker real-time-tempo dnn-beat-tracking particle-filtering pytorch meter-detection crnn-network

beatnet's Introduction

Notice

BeatNet is the state-of-the-art AI-based Python library for joint music beat, downbeat, tempo, and meter tracking. This repo includes the BeatNet neural structure along with the efficient two-stage cascade particle filtering algorithm that is proposed in the paper. It offers four distinct working modes, as follows:

Streaming mode: This mode captures streaming audio directly from the microphone.
Real-time mode: In this mode, audio files are read and processed in real-time, yielding immediate results.
Online mode: Similar to Real-time mode, Online mode employs the same causal algorithm for track processing. However, rather than reading the files in real-time, it reads them faster, while still producing identical outcomes to the real-time mode.
Offline mode: Infers beats and downbeats in an offline (non-causal) fashion.

To gain a better understanding of each mode, please refer to the Usage examples provided in this document.
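One way to picture the difference between Real-time and Online modes is a single causal loop that is either paced to wall-clock time or not. The sketch below is not BeatNet's code: `run_causal`, `is_peak`, and the 20 ms hop are hypothetical names and values chosen only for illustration.

```python
import time

def run_causal(frames, detect, hop_seconds=0.02, realtime=False, sleep=time.sleep):
    """Process frames one at a time; `detect` only ever sees past and current frames."""
    results = []
    history = []
    for frame in frames:
        history.append(frame)
        results.append(detect(history))  # causal: no look-ahead
        if realtime:
            sleep(hop_seconds)  # pace the loop to wall-clock time
    return results

def is_peak(history):
    # Toy detector: flags a frame that is louder than its predecessor
    return len(history) > 1 and history[-1] > history[-2]

frames = [0.1, 0.9, 0.2, 0.8, 0.3]
online = run_causal(frames, is_peak)  # as fast as possible
paced = run_causal(frames, is_peak, realtime=True, sleep=lambda s: None)  # pacing stubbed out
```

Because pacing only changes when each frame is consumed and not what the detector sees, both calls return the same list.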

This repository contains the user package and the source code of the Monte Carlo particle filtering inference model of the "BeatNet" online joint beat/downbeat/tempo/meter tracking system. The arXiv version of the original ISMIR 2021 paper:

In addition to the proposed online inference, we added madmom's DBN beat/downbeat inference model for offline use. Note that the offline model still utilizes BeatNet's neural network rather than madmom's, which leads to better performance and significantly faster results.

Note: All models are trained using PyTorch and are included in the models folder. To receive the training script and the dataset data/feature handlers, send me an email at mheydari [at] ur.rochester.edu

System Input:

Raw audio waveform object or directory.

By using the audio directory as the system input, the system automatically resamples the audio file to 22050 Hz. However, in the case of using an audio object as the input, make sure that the audio sample rate is equal to 22050 Hz.
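When an array is passed instead of a file path, the caller has to bring it to 22050 Hz. The sketch below is dependency-free and only illustrative: `to_mono` and `resample_linear` are hypothetical helpers, and linear interpolation is a stand-in technique. In practice a proper resampler (for example `librosa.load(path, sr=22050)` or `librosa.resample`) is likely to give better quality.

```python
TARGET_SR = 22050  # sample rate the text above says the system expects

def to_mono(frames):
    """Average the channels of each frame: [[L, R], ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]

def resample_linear(samples, orig_sr, target_sr=TARGET_SR):
    """Resample a mono signal by linear interpolation (no anti-alias filter)."""
    if orig_sr == target_sr or len(samples) < 2:
        return list(samples)
    n_out = int(round(len(samples) * target_sr / orig_sr))
    out = []
    for k in range(n_out):
        pos = k * orig_sr / target_sr        # fractional index into the input
        i = min(int(pos), len(samples) - 2)  # left neighbour, kept in range
        frac = min(pos - i, 1.0)             # clamp at the last input sample
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out

# Made-up stereo signal: 16000 frames, i.e. one second at 16000 Hz
stereo = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]] * 4000
mono = resample_linear(to_mono(stereo), orig_sr=16000)
```

The result has 22050 samples, one second at the target rate, and could then be converted to a NumPy array before being handed to the estimator.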

System Output:

A NumPy array of shape (num_beats, 2) whose columns hold the beats and the downbeats, respectively.
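A minimal sketch of reading such an array follows. It assumes that the first column holds times in seconds and that a value of 1 in the second column marks a downbeat; that encoding is an assumption, not something stated above, so it is worth checking against real output. `split_beats` and the example rows are hypothetical.

```python
def split_beats(output):
    """Split rows of [time_seconds, label] into beat times and downbeat times."""
    beats = [float(t) for t, _ in output]
    # Assumption: label 1 marks a downbeat, any other label a regular beat
    downbeats = [float(t) for t, label in output if int(label) == 1]
    return beats, downbeats

# Hypothetical output rows, made up for illustration
example = [[0.50, 1], [1.00, 2], [1.50, 2], [2.00, 2], [2.50, 1], [3.00, 2]]
beats, downbeats = split_beats(example)
```

The same function works on a 2D NumPy array, since iterating over it yields one row at a time.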

Input Parameters:

model: A scalar in the range [1,3] that selects which pre-trained CRNN model to use.

mode: A string that determines the working mode: 'stream', 'realtime', 'online', or 'offline'.

inference_model: A string that chooses the inference approach: 'PF' (Particle Filtering) for causal inference, or 'DBN' (Dynamic Bayesian Network) for non-causal use.

plot: A list of strings naming what to plot. It can include 'activations', 'beat_particles', and 'downbeat_particles'. Note that to speed up plotting, the previous plots are updated rather than redrawn for every frame. However, to preserve real-time results, it is recommended not to plot at all, or to keep as few plots open as possible.

thread: Decides whether the inference runs on the main thread or on a separate thread.

device: The device to run on: 'cuda' or 'cpu' (default).

Installation command:

Approach #1: Installing binaries from the pypi website:

pip install BeatNet

Approach #2: Installing directly from the Git repository:

pip install git+https://github.com/mjhydri/BeatNet

Note: Before installing BeatNet, make sure the Librosa and Madmom packages are installed. Also, PyAudio is a Python binding for PortAudio that handles audio streaming. If PyAudio is not installed on your machine, either install it through pip (macOS and Linux) or download an appropriate version for your machine (Windows) from here. Then navigate to the file location on the command line and use the following command to install the wheel file locally:

pip install <PyAudio_file_name.whl>

Usage example 1 (Streaming mode):

from BeatNet.BeatNet import BeatNet

estimator = BeatNet(1, mode='stream', inference_model='PF', plot=[], thread=False)

Output = estimator.process()

*In streaming use cases, feed the system with input that is as loud as possible to get the best streaming performance, given that all models are trained on datasets containing mastered songs.

Usage example 2 (Realtime mode):

from BeatNet.BeatNet import BeatNet

estimator = BeatNet(1, mode='realtime', inference_model='PF', plot=['beat_particles'], thread=False)

Output = estimator.process("audio file directory")

Usage example 3 (Online mode):

from BeatNet.BeatNet import BeatNet

estimator = BeatNet(1, mode='online', inference_model='PF', plot=['activations'], thread=False)

Output = estimator.process("audio file directory")

Usage example 4 (Offline mode):

from BeatNet.BeatNet import BeatNet

estimator = BeatNet(1, mode='offline', inference_model='DBN', plot=[], thread=False)

Output = estimator.process("audio file directory")

Video Tutorial:

1: In this tutorial, we explain the BeatNet mechanism.

Video Demos:

To demonstrate the performance of the system at different levels of beat/downbeat tracking difficulty, here are three video demos:

1: Song Difficulty: Easy

2: Song difficulty: Medium

3: Song difficulty: Veteran

Acknowledgements:

For the input feature extraction and the raw state space generation, the Librosa and Madmom libraries are utilized, respectively. Many thanks for their great work. This work has been partially supported by the National Science Foundation grants 1846184 and DGE-1922591.

arXiv 2108.03576

Cite:

@inproceedings{heydari2021beatnet,
  title={BeatNet: CRNN and Particle Filtering for Online Joint Beat Downbeat and Meter Tracking},
  author={Heydari, Mojtaba and Cwitkowitz, Frank and Duan, Zhiyao},
  booktitle={22nd International Society for Music Information Retrieval Conference, ISMIR},
  year={2021}
}

@inproceedings{heydari2021don,
  title={Don’t look back: An online beat tracking method using RNN and enhanced particle filtering},
  author={Heydari, Mojtaba and Duan, Zhiyao},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={236--240},
  year={2021},
  organization={IEEE}
}

beatnet's Issues

Confused about the result.

Sorry, I am not clear about the meaning of the second column. What do 1. and 2. refer to?

thanks

Help install BeatNet

Hi guys, I'm trying to install this but can't get it running.

This is the step:

OS: Debian AMD64
Python: 3.8

Step:

Create virtualenv with 'virtualenv env'
Activate the virtualenv 'source env/bin/activate'
Install cython 'pip install cython'
Install beatnet 'pip install beatnet' => Success
Create test.py

# test.py content
from BeatNet.BeatNet import BeatNet

Ran python test.py and got this error:

python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from BeatNet.BeatNet import BeatNet
  File "/home/user1/Documents/python/beatnet2/env/lib/python3.8/site-packages/BeatNet/BeatNet.py", line 8, in <module>
    from madmom.features import DBNDownBeatTrackingProcessor
  File "/home/user1/Documents/python/beatnet2/env/lib/python3.8/site-packages/madmom/__init__.py", line 24, in <module>
    from . import audio, evaluation, features, io, ml, models, processors, utils
  File "/home/user1/Documents/python/beatnet2/env/lib/python3.8/site-packages/madmom/audio/__init__.py", line 27, in <module>
    from . import comb_filters, filters, signal, spectrogram, stft
  File "madmom/audio/comb_filters.pyx", line 1, in init madmom.audio.comb_filters
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Thank you.

Invalid device string: 'cuda:cpu' error: bug in model.py

Hey, thanks for sharing your work. It is very appreciated!

After a clean installation I wasn't able to run the example code provided. I tried digging around the code to see if I could fix it for you.

The error I kept getting was

RuntimeError: Invalid device string: 'cuda:cpu'

This code is the problem:

I changed it to this:

and I got it working

Here is the code.

    def change_device(self, device=None):
        """
        Change the device and load the model onto the new device.

        Parameters
        ----------
        device : string or None, optional (default None)
          Device to load model onto
        """
        if device is None:
            # If the function is called without a device, use the current device
            device = self.device
        elif not torch.cuda.is_available():
            device = torch.device('cpu')
        else:
            # Create the appropriate device object
            device = torch.device(f'cuda:{device}')

        # Change device field
        self.device = device
        # Load the transcription model onto the device
        self.to(self.device)

I could clean the code a little and open a pull request with this change, if you are open to that. In any case, maybe this issue will help others to make your great project work on their computer.

Cheers.
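For anyone hitting the same error, one way to avoid ever building a string like 'cuda:cpu' is to normalize the device argument before it reaches `torch.device`. This is only a sketch: `resolve_device` is a hypothetical helper, not part of BeatNet, and the CUDA check is passed in as a flag (in practice it would be `torch.cuda.is_available()`) so the example runs without PyTorch.

```python
def resolve_device(device, cuda_available):
    """Return a device string that torch.device() would accept."""
    if device is None or not cuda_available:
        return "cpu"                  # default, and fallback when no GPU is present
    if isinstance(device, int):
        return f"cuda:{device}"       # bare GPU index, e.g. 0 -> "cuda:0"
    device = str(device).lower()
    if device == "cpu" or device.startswith("cuda"):
        return device                 # already well-formed, leave untouched
    return f"cuda:{device}"           # e.g. "1" -> "cuda:1"

on_gpu_box = resolve_device("cpu", cuda_available=True)
explicit_gpu = resolve_device("cuda:0", cuda_available=True)
by_index = resolve_device(0, cuda_available=True)
no_gpu = resolve_device("cuda", cuda_available=False)
```

The key point is that 'cpu' and anything already starting with 'cuda' pass through unchanged, so the prefix is only added to bare indices.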

Not able to import beatnet

I have audio data, how do I call this?

I don't see any examples of how to handle a 2D numpy array of amplitudes. Everything requires me to write it to a file first, which seems suboptimal.
How do I "read" the output, i.e. what's in it? I watched the video, and while it's very cool, that's not something I can actually understand. I'm assuming that there'd be something that says 0 secs -> x seconds, 120 bpm, 4/4 time signature, or something along those lines.

Add CoreML model conversion script & tutorial

Is it possible to convert the BeatNet algorithm to CoreML to use it on iOS devices? And are there any plans to add a script for CoreML conversion? I would love to see BeatNet work on-device as there really isn't any good beat tracking system for mobile devices yet.

llvmlite error during installation

error: legacy-install-failure

× Encountered error while trying to install package.
╰─> llvmlite

I don't know what's wrong with this package

M1 Mac Support?

I've been wondering if anyone has got this awesome looking package working on M1 - I cannot for the life of me figure out which versions of numba and llvm to use. Brew doesn't let you install a version of llvm compatible with the versions of numba specified.

I wonder if I should checkout the repo, change some of the version requirements and cross my fingers 😂

Alternatively is this possible inside docker - will portaudio work inside a docker container?

Low-quality audio makes detection worse

Hey, thanks for sharing. It is great work.

I found that when I use 16000 Hz audio I get worse results than with 22050 Hz (audio from the same music).
Inputs are all automatically resampled to 22050 Hz.
How can I do better when I only have low-quality audio at 16000 Hz?

Numpy > 1.20 deprecation error

I'm getting

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:

with numpy version 1.24.4
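A commonly used stopgap for this error is to put the removed aliases back before importing the library that still uses them. The sketch below is a workaround under assumptions, not a fix: `restore_builtin_aliases` is a hypothetical helper, and pinning an older NumPy or patching the dependency is the cleaner route. A stand-in object is used so the example runs without NumPy installed.

```python
import builtins
from types import SimpleNamespace

def restore_builtin_aliases(np_module, names=("float", "int", "bool", "object", "complex")):
    """Re-add aliases such as np.float that newer NumPy releases removed."""
    for name in names:
        if not hasattr(np_module, name):
            # The removed aliases pointed at the Python builtins of the same name
            setattr(np_module, name, getattr(builtins, name))

# Stand-in for the numpy module, for demonstration only
fake_np = SimpleNamespace(int=int)
restore_builtin_aliases(fake_np)

# Intended real use (assumption: run this before the failing import):
#   import numpy as np
#   restore_builtin_aliases(np)
#   from BeatNet.BeatNet import BeatNet
```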

Not speeding up inference on using CUDA/GPU

hi @mjhydri / @karen-pal / @rlleshi ,

I am able to run the inference on GPU. It is successfully running but the speed is not faster than on CPU. It is slower at times as well.

Config/Setup-
GPU - V100 16GB

Code used -

from BeatNet.BeatNet import BeatNet

estimator = BeatNet(1, mode='online', inference_model='PF', plot=['activations'], thread=False, device='cuda')

Output = estimator.process("audio file directory")

Also tried -

from BeatNet.BeatNet import BeatNet

estimator = BeatNet(1, mode='online', inference_model='PF', plot=['activations'], thread=False, device='cuda:0')

Output = estimator.process("audio file directory")

Note that I checked whether torch is recognizing the GPU or not. It is recognizing the GPU. Also, GPU memory consumption varies from 950 MB - 1350 MB during inference.

Incompatible with Spleeter?

I was able to get it working but only by installing BeatNet after calling Spleeter, which makes it a little weird to work with. I have an open StackOverflow question on this, but I was wondering if you could resolve it with dependency management of some kind.

https://stackoverflow.com/questions/75838650/spleeter-and-beatnet-incompatible-numpy-numba-libraries-any-solutions

Issues regarding the particle filtering model

Hi,

Many thanks for this cool work!
I have two questions regarding the particle filtering (PF) model:

1. The model does not produce the same results for the same activation function of the same track. In the attached jupyter notebook (pf-repeat-issue.ipynb), I run PF on the same activation function five times and get different results.
2. The model does not work on an "ideal activation function". In pf-groundtruth-issue.ipynb, I generate an ideal activation function from beat annotations, which only has peaks at beat positions. But the PF gives very low Recall for it.

The notebooks are shared via google drive: https://drive.google.com/drive/folders/1_H8u847bVnUP7Lfome8WuO98FNaU4Jew?usp=sharing

Are these issues expected because of the sampling process of PF? Or, is there any way we may avoid/alleviate these issues? Also, is there any idea regarding the variance/std of the PF performance under different conditions (e.g., genres)?
Just want to make sure I didn't use your model wrong. Thank you!
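On the first point, run-to-run variation is what one would expect from any sampling-based filter whose random generator is not seeded. The sketch below illustrates this with systematic resampling, a common particle-filter resampling scheme; it is a generic example, not necessarily the scheme BeatNet uses, and `systematic_resample` and the weights are made up for illustration.

```python
import random

def systematic_resample(weights, rng):
    """Pick len(weights) particle indices using one random offset and evenly spaced pointers."""
    n = len(weights)
    step = sum(weights) / n
    start = rng.random() * step        # the only random draw
    indices = []
    i = 0
    cumulative = weights[0]
    for k in range(n):
        target = start + k * step
        # Advance to the particle whose cumulative weight covers the pointer
        while cumulative < target and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices

weights = [0.1, 0.1, 0.7, 0.1]
run_a = systematic_resample(weights, random.Random(42))
run_b = systematic_resample(weights, random.Random(42))  # same seed, same result
run_c = systematic_resample(weights, random.Random())    # unseeded, may differ per run
```

With a fixed seed the two runs agree exactly, while the unseeded run can change between executions. Whether BeatNet exposes a way to seed its generator is not something this sketch assumes.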

How to get bpm state space value?

Running offline mode on audio file produces beat and downbeat array. But I can't access any intermediate state space result. I want to get a scalar bpm value or bpm posterior.
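Until the internal state space is exposed, a scalar tempo can at least be derived from the returned array by post-processing. The sketch below does that and nothing more: it does not read any posterior. It assumes times in seconds in the first column and a label of 1 for downbeats in the second (an assumption); `estimate_tempo_and_meter` and the example rows are hypothetical.

```python
from statistics import median

def estimate_tempo_and_meter(output):
    """Return (bpm, beats_per_bar) from rows of [time_seconds, label]."""
    times = [float(t) for t, _ in output]
    if len(times) < 2:
        return None, None
    intervals = [b - a for a, b in zip(times, times[1:])]
    bpm = 60.0 / median(intervals)  # median is robust to a few missed beats
    # Assumption: label 1 marks a downbeat
    downbeat_idx = [i for i, (_, label) in enumerate(output) if int(label) == 1]
    bar_lengths = [b - a for a, b in zip(downbeat_idx, downbeat_idx[1:])]
    beats_per_bar = median(bar_lengths) if bar_lengths else None
    return bpm, beats_per_bar

# Made-up output: a beat every 0.5 s, a downbeat on every fourth beat
example = [[0.5 * i, 1 if i % 4 == 0 else 2] for i in range(1, 13)]
bpm, beats_per_bar = estimate_tempo_and_meter(example)
```

Using the median rather than the mean keeps a single missed or doubled beat from skewing the estimate.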

beatnet train script

Hello, I have sent an email. I would greatly appreciate it if I could receive your response. Thank you.

Restrictive Numba dependency makes Numpy type hints non-descriptive

Would it be possible now to update the dependency of Numba to a later version that supports numpy 1.23? Without it, type hints of ndarray is limited to just that, ndarray, when IDEs infer variable types. The later versions allow inferring ndarray[shape, type] such as ndarray[(Any, 3), float] which is very useful for code readability.

I'm aware of the restriction from librosa that was commented on back in 2021, so I'm just wondering whether their misleading (fake) support was fixed?

Feel free to close this if it's still not possible. Thanks.

Make pyaudio optional?

Noticed that pyaudio is a required installation even when not using audio streaming, because it's imported in the top-level BeatNet.py module. Could this be relaxed to an optional dependency?
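One common pattern for this is to attempt the import once and only fail when streaming is actually requested. The sketch below shows the idea; `optional_import` and `open_stream` are hypothetical helpers, not BeatNet functions.

```python
import importlib

def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

def open_stream(audio_backend):
    """Fail late: complain only when streaming is actually used."""
    if audio_backend is None:
        raise RuntimeError("Streaming mode needs PyAudio; install it to use mode='stream'.")
    return audio_backend.PyAudio()

json_mod = optional_import("json")                            # installed: module returned
missing = optional_import("module_that_does_not_exist_xyz")   # absent: None

# Intended use at module level (assumption):
#   pyaudio = optional_import("pyaudio")
```

File-based modes would then never touch `open_stream`, so they would work without PyAudio installed.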

Can't install on Ubuntu 20.04.1 - the version of numba required by the package can't be found by pip (0.54.1)

Hey there!
Here's the error when trying to install with pip, either from pip install beatnet or by running pip install .:

ERROR: Ignored the following versions that require a different python version: 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 0.53.0 Requires-Python >=3.6,<3.10; 0.53.0rc1.post1 Requires-Python >=3.6,<3.10; 0.53.0rc2 Requires-Python >=3.6,<3.10; 0.53.0rc3 Requires-Python >=3.6,<3.10; 0.53.1 Requires-Python >=3.6,<3.10; 0.54.0 Requires-Python >=3.7,<3.10; 0.54.0rc2 Requires-Python >=3.7,<3.10; 0.54.0rc3 Requires-Python >=3.7,<3.10; 0.54.1 Requires-Python >=3.7,<3.10
ERROR: Could not find a version that satisfies the requirement numba==0.54.1 (from beatnet) (from versions: 0.1, 0.2, 0.3, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.12.2, 0.13.0, 0.13.2, 0.13.3, 0.13.4, 0.14.0, 0.15.1, 0.16.0, 0.17.0, 0.18.1, 0.18.2, 0.19.1, 0.19.2, 0.20.0, 0.21.0, 0.22.0, 0.22.1, 0.23.0, 0.23.1, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.1, 0.29.0, 0.30.0, 0.30.1, 0.31.0, 0.32.0, 0.33.0, 0.34.0, 0.35.0, 0.36.1, 0.36.2, 0.37.0, 0.38.0, 0.38.1, 0.39.0, 0.40.0, 0.40.1, 0.41.0, 0.42.0, 0.42.1, 0.43.0, 0.43.1, 0.44.0, 0.44.1, 0.45.0, 0.45.1, 0.46.0, 0.47.0, 0.48.0, 0.49.0, 0.49.1rc1, 0.49.1, 0.50.0rc1, 0.50.0, 0.50.1, 0.51.0rc1, 0.51.0, 0.51.1, 0.51.2, 0.52.0rc2, 0.55.0rc1, 0.55.0, 0.55.1, 0.55.2, 0.56.0rc1, 0.56.0, 0.56.2, 0.56.3, 0.56.4, 0.57.0rc1, 0.57.0, 0.57.1rc1, 0.57.1, 0.58.0rc1, 0.58.0rc2, 0.58.0, 0.58.1, 0.59.0rc1)
ERROR: No matching distribution found for numba==0.54.1

I tried changing the numba version in setup.py to 0.55.0 (which is the closest version that is available) and that makes it install but it crashes with a numpy error then. Should I try to install 0.55.0 and move on with debugging numpy?

Thanks!
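The first error line explains why pip cannot see 0.54.1: that release declares Requires-Python >=3.7,<3.10, so it is filtered out on newer interpreters. A small check along these lines can confirm it before installing; `python_supported` is a hypothetical helper, and the bounds are copied from the pip output above.

```python
import sys

# From the pip error above: numba 0.54.1 declares Requires-Python >=3.7,<3.10
NUMBA_0_54_1_PYTHON = ((3, 7), (3, 10))

def python_supported(version=None, bounds=NUMBA_0_54_1_PYTHON):
    """True if (major, minor) lies in [lower, upper)."""
    major_minor = tuple((version or sys.version_info)[:2])
    lower, upper = bounds
    return lower <= major_minor < upper

ok_38 = python_supported((3, 8))
ok_310 = python_supported((3, 10))
current = python_supported()  # checks the running interpreter
```

If the running interpreter is outside that range, creating the virtual environment with an interpreter inside it is likely simpler than changing the pinned numba version.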

Unusual licensing choice

Hi Moji,

I wanted to point out that your license choice is a little unusual. Usually Creative Commons licenses aren't used for software.

Per CC themselves: "We recommend against using Creative Commons licenses for software"
https://creativecommons.org/faq/#can-i-apply-a-creative-commons-license-to-software

Given you've chosen CC-BY, have you considered using a more typical permissive attribution based license such as MIT, BSD-3-Clause, or Apache-2.0?

PyAudio Input overflowed

Hello,

I can run with below modification.

def activation_extractor_stream(self):
    # TODO:
    ''' Streaming window
    Given the training input window's origin set to center, this streaming data formation causes 0.084 (s) delay compared to the trained model that needs to be fixed.
    '''
    with torch.no_grad():
        hop = self.stream.read(self.log_spec_hop_length, exception_on_overflow=False)

numpy error!

ValueError: numpy.ndarray size changed, may indicate binary incompatibility.
Expected 96 from C header, got 88 from PyObject!

Can you please provide the annotations you used to validate the model? I am not able to reproduce the results on the GTZAN dataset.

Can you provide Rock Corpus Annotation and Ballroom Annotations also?

which numpy version should i use?

I used numpy==1.20.3, but it reported: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject. I tried to upgrade numpy, but then pip said: numba 0.54.1 requires numpy<1.21,>=1.17, but you have numpy 1.23.1 which is incompatible. And at import time it reported: ImportError: Numba needs NumPy 1.20 or less

what is the difference of the three feature extraction model?

Hi, great work!
I noticed the 'model' parameter.

I tested the three models on the same WAV file, but they give different results.

from BeatNet.BeatNet import BeatNet
import numpy as np


def get_bpm(inp):
    # Average the gaps between consecutive beat times (first column), then convert to BPM
    begin = inp[0][0]
    durations = []
    for line in inp[1:]:
        durations.append(line[0] - begin)
        begin = line[0]
    return 60 / np.mean(durations)


# Run each of the three pre-trained models on the same file
for i in range(1, 4):
    estimator = BeatNet(i, mode='offline', inference_model='PF', thread=False)
    Output = estimator.process("bpm_tes1.wav")
    # print(Output)
    print(i, get_bpm(Output))

why BeatNet SOTA model?

In "Temporal convolutional networks for musical audio beat tracking" (2019), the published F-measures on Ballroom and GTZAN are all better than BeatNet's, so why does BeatNet claim to be SOTA?
Is there something I misunderstand?

Could you please provide a complete code of training?

I want to do further training on my dataset, but there is currently no training code in the library.