MuseBERT

The code repository of the paper:

Z. Wang and G. Xia, MuseBERT: Pre-training of Music Representation for Music Understanding and Controllable Generation, ISMIR 2021.

The workflow of MuseBERT

In this initial commit, we open-source the core of MuseBERT: its entire workflow, which is used in both pre-training and fine-tuning. The workflow contains four steps:

  1. Stochastic mapping: $X^\text{base}$ is a note matrix of size $(L, 3)$ consisting of $L$ notes and $3$ attributes of onset, pitch and duration. It is stochastically converted to more detailed note attributes $X^{\text{fac}}$ of size $(L, 7)$, together with a stack of relation matrices $R_\mathcal{S}$ of size $(4, L, L)$.
  2. Data corruption: Data corruption is applied in a BERT-like fashion: 1) note attributes are masked, replaced, or kept at random; 2) relation matrices are recomputed (because some attributes have been replaced) and then masked at random.
  3. Reconstruction: The MuseBERT model is applied to reconstruct the input $X^\text{fac}$, where the masked attributes are fed as model input and the corrupted relations are fed as generalized relative positional encoding.
  4. Deterministic mapping: The reconstructed factorized data is decoded back to $X^\text{base}$.
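Steps 1 and 2 can be illustrated with a minimal, self-contained sketch. It is not the repository's code: the function names, the `MASK` id, the toy vocabulary size, the 15% corruption rate with an 80/10/10 mask/replace/keep split (borrowed from the original BERT recipe), and the use of a single three-way comparison on one attribute column are all assumptions made for illustration.

```python
import random

MASK = -1              # hypothetical id for a masked attribute or relation
LT, EQ, GT = 0, 1, 2   # hypothetical relation ids

def corrupt_attributes(x_fac, vocab_size, p_corrupt=0.15, rng=random):
    """BERT-style corruption of an (L, 7) attribute matrix.

    Each entry is picked with probability p_corrupt; a picked entry is
    masked (80%), replaced by a random value (10%), or kept (10%).
    Returns X* (replacements only), the model input, and the picked flags.
    """
    x_star = [row[:] for row in x_fac]   # replaced-only intermediate X*
    x_in = [row[:] for row in x_fac]     # what the model actually sees
    picked = [[False] * len(row) for row in x_fac]
    for i, row in enumerate(x_fac):
        for j in range(len(row)):
            if rng.random() >= p_corrupt:
                continue
            picked[i][j] = True
            u = rng.random()
            if u < 0.8:
                x_in[i][j] = MASK
            elif u < 0.9:
                new_value = rng.randrange(vocab_size)
                x_star[i][j] = new_value
                x_in[i][j] = new_value
            # otherwise the entry is kept unchanged
    return x_star, x_in, picked

def relation_matrix(values):
    """R[i][j] encodes whether values[i] is <, = or > values[j]."""
    def rel(a, b):
        return LT if a < b else (EQ if a == b else GT)
    return [[rel(a, b) for b in values] for a in values]

def mask_symmetric(rel, p_mask=0.15, rng=random):
    """Mask R[i][j] and R[j][i] together, so the mask is symmetric."""
    n = len(rel)
    out = [row[:] for row in rel]
    for i in range(n):
        for j in range(i, n):
            if rng.random() < p_mask:
                out[i][j] = MASK
                out[j][i] = MASK
    return out

rng = random.Random(0)
x_fac = [[rng.randrange(8) for _ in range(7)] for _ in range(5)]  # toy L = 5
x_star, x_in, picked = corrupt_attributes(x_fac, vocab_size=8, rng=rng)
rel = relation_matrix([row[0] for row in x_star])  # relations on column 0 of X*
rel_masked = mask_symmetric(rel, rng=rng)
```

Computing the relations from the replaced-only matrix and masking them afterwards keeps the relations consistent with the replaced values while still hiding part of them from the model.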

Figure: the MuseBERT workflow.

There are several implementation details worth mentioning:

  1. MuseBERT does not use absolute positional encoding.
  2. In practice, the relation matrices are not recomputed after corruption. We first compute the relation matrices from a replaced-only $X^*$ (an intermediate product) and then apply mask corruption by generating a stack of symmetric masks.
  3. We propose a novel Generalized Relative Positional Encoding (GRPE), introduced in eq. (10) & (11) of our paper. A direct implementation is too expensive in time complexity, so we implement GRPE in an efficient way:
    • We apply the distributive law to both eq. (10) & (11).
    • Since the embedding of the same relation (e.g., $<$) is used in many places, we compute it only once and avoid expanding the relation matrix embeddings into $(L, L, \text{emb_dim})$.
  4. Loss is applied to corrupted tokens only. In our implementation, each segment in a batch contributes equally to the loss, regardless of how many corrupted tokens it contains. For example, with a batch size of 2, $N_1$ corrupted tokens in sample 1, and $N_2$ corrupted tokens in sample 2, each corrupted token in sample $i$ is weighted by $0.5 / N_i$, so the weights sum to $(0.5 / N_1) \cdot N_1 + (0.5 / N_2) \cdot N_2 = 1$.
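The saving described in implementation detail 3 (GRPE) can be sketched in a few lines. The sketch assumes a key-side relative encoding of the form score_ij = q_i · (k_j + a[r_ij]), in the style of Shaw et al.'s relative position representations; the exact form of eq. (10) & (11) is given in the paper and may differ. All function and variable names are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scores_naive(q, k, rel, rel_emb):
    """score[i][j] = q_i . (k_j + a[rel[i][j]]), via an explicit (L, L, d) tensor."""
    n = len(q)
    expanded = [[[kj + aj for kj, aj in zip(k[j], rel_emb[rel[i][j]])]
                 for j in range(n)] for i in range(n)]
    return [[dot(q[i], expanded[i][j]) for j in range(n)] for i in range(n)]

def scores_distributed(q, k, rel, rel_emb):
    """Same scores, using q_i . (k_j + a_r) = q_i . k_j + q_i . a_r.

    q_i . a_r is computed once per relation value r, giving a small
    (L, n_rel) table that is then looked up with rel[i][j].
    """
    n = len(q)
    q_rel = [[dot(qi, a) for a in rel_emb] for qi in q]   # shape (L, n_rel)
    return [[dot(q[i], k[j]) + q_rel[i][rel[i][j]] for j in range(n)]
            for i in range(n)]

rng = random.Random(0)
seq_len, d, n_rel = 6, 4, 3   # toy sizes
q = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(seq_len)]
k = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(seq_len)]
rel_emb = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_rel)]
rel = [[rng.randrange(n_rel) for _ in range(seq_len)] for _ in range(seq_len)]

naive = scores_naive(q, k, rel, rel_emb)
fast = scores_distributed(q, k, rel, rel_emb)
max_diff = max(abs(naive[i][j] - fast[i][j])
               for i in range(seq_len) for j in range(seq_len))
```

Both functions give the same scores up to floating-point rounding, but the second one never materializes an array of size $(L, L, \text{emb_dim})$: the relation embeddings stay in a table with one row per relation value.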

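The per-segment weighting in implementation detail 4 can be written out as a small sketch. The function name and the toy numbers are hypothetical, and skipping a segment that has no corrupted token is an assumption rather than something stated above.

```python
def segment_balanced_loss(token_losses, corrupted):
    """Average per-token losses so that every segment counts equally.

    token_losses[b][t] is the loss of token t in segment b;
    corrupted[b][t] is True if that token was corrupted.
    Each corrupted token in segment b gets weight 1 / (bs * N_b).
    """
    bs = len(token_losses)
    total = 0.0
    for losses, flags in zip(token_losses, corrupted):
        n = sum(flags)
        if n == 0:
            continue  # assumption: a segment without corrupted tokens adds nothing
        total += sum(l for l, f in zip(losses, flags) if f) / (bs * n)
    return total

token_losses = [[2.0, 4.0, 9.0], [1.0, 9.0, 9.0]]
corrupted = [[True, True, False], [True, False, False]]
loss = segment_balanced_loss(token_losses, corrupted)  # 0.5 * 3.0 + 0.5 * 1.0
```

With these numbers the segment-balanced loss is 2.0, whereas a plain mean over the three corrupted tokens would be 7/3, because the first segment has twice as many corrupted tokens as the second.
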
Files

A brief introduction of what is in each file:

  • note_attribute_repr.py: converting between $X^\text{base}$ and $X^\text{fac}$.
  • note_attribute_corrupter.py: computing relation matrices and data corruption.
  • transformer.py: implementation of GRPE and a modified Transformer encoder.
  • musebert_model.py: implementation of the MuseBERT model.
  • curriculum_preset: preset hyperparameters of MuseBERT workflow, model architecture and training.
  • curricula.py: handling of hyperparameters for different presets for pre-training and different types of fine-tuning.
  • dataset.py, utils.py, train.py etc.

Training

python train.py

Thoughts & Future Plan

As stated in our paper, we see our model as a powerful controllable music generator and analyzer. However, it is still in its preliminary stage. We plan to update the code with more downstream fine-tuning tasks and a more complete illustration of the entire MuseBERT methodology.

The paper is written in a rigorous manner, with clear definitions and theorem proving, which is rare in music generation studies. Nevertheless, we believe this is the proper way to present our work: our model adds music control at the token-wise/lexeme level of music, and it establishes an isomorphism between a BERT problem and a form of generalized positional encoding, and between a BERT problem and a constraint solver.

Questions and discussion about this paper are welcome. Please contact me at [email protected].
