langevin-music's Introduction

Generating Music by Langevin Dynamics

We will introduce a new generative model for music composition, applying Langevin dynamics to a gradient-based score matching algorithm based on Song and Ermon, 2019. Unlike implicit models such as GANs, this approach learns a true, explicit distribution of the input data.

Annealed Langevin dynamics demo
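
To make the sampling procedure concrete, here is a minimal NumPy sketch of annealed Langevin dynamics in the style of Song and Ermon, 2019. It is an illustration under stated assumptions, not this project's implementation: `annealed_langevin`, `toy_score`, `MU`, and all hyperparameter values are hypothetical, and the analytic score of a perturbed Gaussian stands in for a trained score network.

```python
import numpy as np


def annealed_langevin(score_fn, x0, sigmas, eps=2e-5, steps_per_level=100, rng=None):
    """Run Langevin dynamics at a sequence of decreasing noise levels.

    score_fn(x, sigma) should approximate the gradient of the log-density of
    the data after perturbation with Gaussian noise of scale sigma.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for sigma in sigmas:  # largest noise level first
        # The step size shrinks together with the noise level.
        alpha = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x


# Stand-in for a learned score network (an assumption for this demo): the
# exact score of N(MU, I) perturbed by N(0, sigma^2 I), i.e. N(MU, (1 + sigma^2) I).
MU = np.array([3.0, -2.0])


def toy_score(x, sigma):
    return -(x - MU) / (1.0 + sigma ** 2)


sigmas = np.geomspace(10.0, 0.01, num=10)  # illustrative noise schedule
rng = np.random.default_rng(0)
x0 = rng.uniform(-8.0, 8.0, size=(2000, 2))  # arbitrary starting points
samples = annealed_langevin(toy_score, x0, sigmas, rng=rng)
print(samples.mean(axis=0), samples.std(axis=0))
```

The large early noise levels let the chain move quickly across the space, while the later, smaller levels refine the samples toward the unperturbed distribution. In the toy setting the sample mean should land close to `MU`.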

Previous work has seen success in modeling continuous input manifolds, such as high-quality image inpainting and conditional sampling from MNIST, CIFAR-10, and other datasets. However, it is an open question whether this algorithm can be adjusted to perform well on discrete domains, such as music scores.

We hope that Langevin dynamics and score matching can combine the controllability of Markov chain Monte Carlo with the global view and fast convergence of stochastic gradient descent to generate high-quality, structured compositions.

Problem

DeepBach is a simple, controllable autoregressive model for Bach chorale generation, and these features make it easy to train and use. Learning Bach chorales is an interesting task because the music is highly structured (often following various "rules"), consistent, and often complex.

Bach chorale example

However, there are many instances where DeepBach is unable to capture long-term structure. Some casual listeners have remarked that the compositions "sound good but go nowhere". This could be due to a combination of vanishing LSTM gradients and Gibbs sampling getting stuck in 1-optimal local minima.
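
As a toy illustration of what a 1-optimal local minimum looks like, the sketch below defines a hypothetical energy over binary sequences (not DeepBach's model) in which no single-site change improves the all-zeros sequence, even though the all-ones sequence has lower energy. The functions `energy` and `is_one_optimal` are made up for this example. A real Gibbs sampler is stochastic and can escape such states, but it does so rarely when the intermediate states are unlikely.

```python
def energy(x):
    """Toy energy: number of adjacent disagreements, with a bonus of -1
    for the all-ones sequence (the global minimum)."""
    disagreements = sum(a != b for a, b in zip(x, x[1:]))
    bonus = -1 if all(x) else 0
    return disagreements + bonus


def is_one_optimal(x):
    """Return True if no single-site flip lowers the energy."""
    base = energy(x)
    for i in range(len(x)):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if energy(flipped) < base:
            return False
    return True


zeros = (0,) * 8
ones = (1,) * 8
print(is_one_optimal(zeros), energy(zeros), energy(ones))  # True 0 -1
```

Reaching the better sequence from all zeros requires passing through higher-energy states or changing many sites at once, which is the kind of move a site-by-site sampler rarely makes.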

We believe that, by applying enough tricks, it should be possible to produce a model that strongly avoids these local minima while retaining controllability.

Approach

It was seen in Welling and Teh, 2011 that directing traditional MCMC algorithms with learned supervision can greatly accelerate their convergence. This is what motivates us to augment DeepBach's approach with score matching.

It's interesting to analyze other approaches that people have tried in the past:

  • Generative adversarial networks: Although GANs achieve very promising results in modeling latent distributions of images, it's difficult to train them on sequence tasks (discrete tokens), as gradients need to propagate from the discriminator to the generator (Yu et al., 2016).
  • Transformers: Transformers have been applied to the task of music generation and achieved state-of-the-art results on at least one dataset (Huang et al., 2018). However, transformers are computationally expensive, so they're not easily controllable through masking and iterative MCMC-like algorithms.
  • Markov random fields: MRFs have been used for generative models to optimize an energy function, notably for bitmap image generation in ConvChain. This lends credence to MCMC for discrete probabilistic modeling. However, as previously mentioned, MCMC on its own doesn't learn global structure. Also, the alternative approach of gradient ascent is impractical due to adversarial perturbations.

We think that score matching and Langevin dynamics, by adding graded noise to the distribution of data, have the potential to perform well on generative sequence modeling tasks such as music composition, while maintaining the controllability of models like DeepBach.
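
For reference, the "graded noise" idea corresponds to the denoising score matching objective of Song and Ermon, 2019, restated here as a sketch. The symbols are assumptions about notation rather than anything fixed by this project: $s_\theta$ is the score network, $\sigma_1 > \dots > \sigma_L$ are the noise levels, and $x$ is a continuous representation of the data. How a discrete score should be embedded as such an $x$ is the open question this project explores.

```latex
% Denoising score matching at a single noise level \sigma
\ell(\theta; \sigma) =
  \frac{1}{2}\,
  \mathbb{E}_{p_{\mathrm{data}}(x)}\,
  \mathbb{E}_{\tilde{x} \sim \mathcal{N}(x, \sigma^2 I)}
  \left[
    \left\lVert s_\theta(\tilde{x}, \sigma) + \frac{\tilde{x} - x}{\sigma^2} \right\rVert_2^2
  \right]

% Combined objective over all noise levels, weighted by \lambda(\sigma) = \sigma^2
\mathcal{L}\left(\theta; \{\sigma_i\}_{i=1}^{L}\right) =
  \frac{1}{L} \sum_{i=1}^{L} \lambda(\sigma_i)\, \ell(\theta; \sigma_i)
```

Minimizing $\ell(\theta; \sigma)$ trains $s_\theta(\cdot, \sigma)$ to point from a noisy sample back toward the clean data, which is the gradient of the log-density of the noise-perturbed distribution.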

Evaluation

This project will be successful if we can implement a score matching algorithm for music generation and evaluate its feasibility. In the best case, score matching can be used to improve long-term patterns and interpretability. However, due to the complexity of the algorithm, results are unclear, and we may need various tricks or innovations to obtain convergence.

Our goal, then, is to determine the tractability and performance of a score-matching approach in the discrete domain, which we think is very exciting.

langevin-music's Issues

Corrections to paper

Hi there, I'm the author of BachBot. Your work is very exciting and I'm happy to see it!

I wanted to point out some corrections to https://www.ekzhang.com/assets/pdf/Generative_Music_Modeling.pdf. In particular, the first reference to BachBot (bottom of p1) is misattributed to DeepBach. Furthermore, the two encodings are quite similar (Gaetan and I were both working on this in 2016 and BachBot was open source the entire time), but the music encoding format (discretization to sixteenths, representing note durations using ties across frames, inclusion of fermatas) actually originates from BachBot (see the citation on https://arxiv.org/pdf/1612.01010.pdf page 2 across the fold). DeepBach improves this encoding (among many other things) by adding the key signature to the encoding metadata.
