MuseBERT

The code repository of the paper:

Z. Wang and G. Xia, MuseBERT: Pre-training of Music Representation for Music Understanding and Controllable Generation, ISMIR 2021.

The workflow of MuseBERT

In this initial commit, we open-source the core of MuseBERT: its complete workflow, which is used in both pre-training and fine-tuning. The workflow consists of four steps:

- Stochastic mapping: $X^\text{base}$ is a note matrix of size $(L, 3)$ consisting of $L$ notes, each with three attributes: onset, pitch, and duration. It is stochastically converted to more detailed (factorized) note attributes $X^\text{fac}$ of size $(L, 7)$, together with a stack of relation matrices $R_\mathcal{S}$ of size $(4, L, L)$.
- Data corruption: corruption is applied in a BERT-like fashion: (1) note attributes are masked, replaced, or kept at random; (2) relation matrices are recomputed (because some attributes have been replaced) and then masked at random.
- Reconstruction: the MuseBERT model reconstructs the input $X^\text{fac}$; the masked attributes are fed as model input, and the corrupted relations are fed as generalized relative positional encoding.
- Deterministic mapping: the reconstructed factorized data is decoded back to $X^\text{base}$.
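
The attribute part of the data-corruption step can be sketched in NumPy as below. This is a minimal sketch, not the code in note_attribute_corrupter.py: the selection rate and the 80/10/10 mask/replace/keep split are borrowed from the original BERT recipe and are assumptions here, as are the hypothetical function name `corrupt_attributes`, the toy vocabulary sizes, and the convention of using each column's vocabulary size as its mask id. The function also returns the replaced-only intermediate (replacements applied, nothing masked yet).

```python
import numpy as np


def corrupt_attributes(x_fac, vocab_sizes, mask_ids, rng,
                       p_select=0.15, p_mask=0.8, p_replace=0.1):
    """BERT-style corruption of an (L, n_attr) integer attribute matrix.

    Hypothetical helper. Each entry is selected with probability p_select;
    a selected entry is masked with probability p_mask, replaced by a random
    in-vocabulary value with probability p_replace, and otherwise kept.
    """
    shape = x_fac.shape
    selected = rng.random(shape) < p_select
    u = rng.random(shape)
    to_mask = selected & (u < p_mask)
    to_replace = selected & (u >= p_mask) & (u < p_mask + p_replace)

    # Random values, drawn per column from [0, vocab_sizes[c]).
    random_vals = rng.integers(0, vocab_sizes, size=shape)
    # Replaced-only intermediate: replacements applied, nothing masked yet.
    x_star = np.where(to_replace, random_vals, x_fac)
    # Final corrupted matrix: masks applied on top of the replacements.
    x_corrupt = np.where(to_mask, np.broadcast_to(mask_ids, shape), x_star)
    return x_corrupt, x_star, selected


rng = np.random.default_rng(0)
vocab = np.array([4, 8, 12, 5, 8, 32, 16])  # hypothetical sizes of the 7 attributes
x_fac = rng.integers(0, vocab, size=(20, 7))
x_corrupt, x_star, selected = corrupt_attributes(x_fac, vocab, vocab, rng)
```

In this sketch the loss would be computed on every `selected` entry, including those that were kept unchanged, as in BERT.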

There are several implementation details worth mentioning:

- MuseBERT does not use absolute positional encoding.
- In practice, relation matrices are not recomputed during data corruption. Instead, we first compute the relation matrices from a replaced-only $X^*$ (an intermediate product in which attributes have been replaced but not yet masked) and then apply mask corruption by generating a stack of symmetric masks.
- We propose a novel Generalized Relative Positional Encoding (GRPE), introduced in Eqs. (10) and (11) of our paper. A direct implementation is too costly in time complexity, so we implement GRPE efficiently:
  - We apply the distributive law to both Eq. (10) and Eq. (11).
  - Since the embedding of the same relation (e.g., $<$) is used in many places, we compute it only once and avoid expanding the relation-matrix embeddings into a tensor of size $(L, L, \text{emb\_dim})$.
- The loss is applied to corrupted tokens only. We compute it so that each segment in a batch is weighted equally, regardless of how many corrupted tokens it contains. For example, with batch size 2, $N_1$ corrupted tokens in sample 1, and $N_2$ corrupted tokens in sample 2, the weighting is $[0.5/N_1] \times N_1 + [0.5/N_2] \times N_2$: each sample contributes a total weight of 0.5, shared equally among its corrupted tokens.
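
The note on computing relations from the replaced-only $X^*$ and then masking them symmetrically can be sketched as follows. This is an illustrative sketch under stated assumptions, not the repository's code: the integer relation ids (`LT`, `EQ`, `GT`, `MASKED`), the helper names, the choice of four attribute columns, and leaving the diagonal unmasked are all hypothetical.

```python
import numpy as np

LT, EQ, GT, MASKED = 0, 1, 2, 3  # hypothetical relation ids for <, =, >, [MASK]


def relation_matrices(x_star, rel_cols):
    """Stack of pairwise relation matrices, one per chosen attribute column.

    R[s, i, j] says whether x_star[i, c] is <, =, or > x_star[j, c],
    with c = rel_cols[s].
    """
    vals = x_star[:, rel_cols].T                   # (S, L)
    diff = vals[:, :, None] - vals[:, None, :]     # (S, L, L)
    return np.where(diff < 0, LT, np.where(diff == 0, EQ, GT))


def symmetric_masks(n_rel, length, p, rng):
    """Boolean masks of shape (n_rel, L, L) with M[s, i, j] == M[s, j, i].

    Entries above the diagonal are sampled with probability p and mirrored
    below it; the diagonal is never masked in this sketch.
    """
    upper = np.triu(rng.random((n_rel, length, length)) < p, k=1)
    return upper | upper.transpose(0, 2, 1)


rng = np.random.default_rng(0)
x_star = rng.integers(0, 16, size=(6, 7))             # toy replaced-only X*
R = relation_matrices(x_star, rel_cols=[0, 1, 2, 3])  # 4 hypothetical columns
M = symmetric_masks(*R.shape[:2], p=0.3, rng=rng)
R_corrupt = np.where(M, MASKED, R)
```

Masking $(i, j)$ and $(j, i)$ together matters because $R[i, j]$ is $<$ exactly when $R[j, i]$ is $>$; masking only one of the two would let the model read the hidden relation off its mirror entry.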
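
The efficiency trick for GRPE can be illustrated with a single-head NumPy sketch. We assume here that GRPE takes the relation-aware self-attention form of Shaw et al. (2018), generalized to several relation matrices: for each matrix $s$, a learned embedding of the relation between notes $i$ and $j$ is added to the key and to the value. The exact form is given by Eqs. (10) and (11) in the paper, and the names `grpe_attention`, `emb_k`, and `emb_v` are hypothetical, not taken from transformer.py. By the distributive law, the query-key score splits into a content term plus one term per relation matrix, and each query only needs to be multiplied with the small embedding table once; on the value side, attention weights are first summed per relation id and then multiplied with the table. No $(L, L, d)$ tensor is built.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def grpe_attention(x, rel, w_q, w_k, w_v, emb_k, emb_v):
    """Single-head relation-aware attention without (L, L, d) tensors.

    x: (L, d_model) token features; rel: (S, L, L) integer relation ids;
    emb_k, emb_v: (S, n_rel, d) embedding tables, one per relation matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # each (L, d)
    logits = q @ k.T                               # content term, (L, L)
    for s in range(rel.shape[0]):
        # q_i . emb_k[s, r] for every relation id r, computed once: (L, n_rel)
        q_rel = q @ emb_k[s].T
        # Look up the entry for r = rel[s, i, j] instead of expanding keys.
        logits += np.take_along_axis(q_rel, rel[s], axis=1)
    alpha = softmax(logits / np.sqrt(q.shape[-1]))  # (L, L)

    out = alpha @ v                                # content term, (L, d)
    n_rel = emb_v.shape[1]
    for s in range(rel.shape[0]):
        # One-hot of relation ids is (L, L, n_rel) with small n_rel, not (L, L, d).
        onehot = (rel[s][..., None] == np.arange(n_rel)).astype(alpha.dtype)
        # Total attention weight each query puts on each relation id: (L, n_rel)
        w = np.einsum("ij,ijr->ir", alpha, onehot)
        out += w @ emb_v[s]                        # mix the n_rel embeddings
    return out


rng = np.random.default_rng(0)
L, d_model, d, S, n_rel = 5, 8, 4, 4, 3            # toy sizes
x = rng.normal(size=(L, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d)) for _ in range(3))
rel = rng.integers(0, n_rel, size=(S, L, L))
emb_k, emb_v = rng.normal(size=(S, n_rel, d)), rng.normal(size=(S, n_rel, d))
out = grpe_attention(x, rel, w_q, w_k, w_v, emb_k, emb_v)   # (L, d)
```

A naive version would gather `emb_k[s, rel[s]]` into an $(L, L, d)$ tensor per relation matrix; under the assumed form, the sketch gives the same result while only ever multiplying by $(n_\text{rel}, d)$ tables.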
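
The segment-balanced loss weighting described in the last implementation note can be written as a short NumPy function. This sketch mirrors the described weighting but is not the repository's code; `segment_balanced_loss` is a hypothetical name, and giving zero weight to a sample with no corrupted tokens is an assumption.

```python
import numpy as np


def segment_balanced_loss(token_losses, corrupted):
    """Weighted loss in which every segment of the batch counts equally.

    token_losses: (B, L) per-token losses; corrupted: (B, L) boolean map of
    corrupted tokens. Each of the N_b corrupted tokens of sample b gets weight
    1 / (B * N_b); clean tokens get weight 0.
    """
    batch_size = token_losses.shape[0]
    n_corrupt = corrupted.sum(axis=1, keepdims=True)            # (B, 1)
    # np.maximum guards against division by zero for samples with no
    # corrupted tokens; such samples then contribute nothing.
    weights = corrupted / (batch_size * np.maximum(n_corrupt, 1))
    return float((weights * token_losses).sum())


token_losses = np.array([[1.0, 2.0, 3.0, 4.0],
                         [10.0, 0.0, 0.0, 0.0]])
corrupted = np.array([[True, True, True, False],
                      [True, False, False, False]])
loss = segment_balanced_loss(token_losses, corrupted)
```

Here the per-sample averages over corrupted tokens are 2 and 10, so the loss is their mean, 6. A plain average over all four corrupted tokens would instead give (1 + 2 + 3 + 10) / 4 = 4, letting the sample with more corrupted tokens dominate.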

Files

A brief introduction of what is in each file:

- note_attribute_repr.py: conversion between $X^\text{base}$ and $X^\text{fac}$.
- note_attribute_corrupter.py: computation of relation matrices and data corruption.
- transformer.py: implementation of GRPE and a modified Transformer encoder.
- musebert_model.py: implementation of the MuseBERT model.
- curriculum_preset: preset hyperparameters for the MuseBERT workflow, model architecture, and training.
- curricula.py: handling of hyperparameter presets for pre-training and for different types of fine-tuning.
- dataset.py, utils.py, train.py, etc.

Training

python train.py

Thoughts & Future Plan

As stated in our paper, we see our model as a powerful controllable music generator and analyzer. However, it is still at a preliminary stage. We plan to update the code with more downstream fine-tuning tasks and a more complete illustration of the entire MuseBERT methodology.

The paper is written in a rigorous manner, with clear definitions and theorem proofs, which is rare in music generation studies. We believe this is the proper way to present our work, because our model adds music control at the token-wise (lexeme) level of music, and it establishes two isomorphisms: one between a BERT problem and a form of generalized positional encoding, and one between a BERT problem and a constraint solver.

Feedback and discussion on this paper are welcome. Please contact me at [email protected].
