Music Style Transfer

The aim is to build an artificial neural network that changes the style of a piece of music, e.g. The Beatles' "Hey Jude" -> a jazz version of "Hey Jude".

Demo

https://hyunlee103.tistory.com/80

Dataset

Because the audio quality we obtained from source separation was poor, we needed separately recorded sources for each instrument. We used MUSDB18 (https://github.com/sigsep/sigsep-mus-db) for this.

Requirements

  • CUDA 10.0
  • python 3.6.10
  • pytorch 1.7.1
  • numpy 1.19.2
  • opencv-python 4.5.1

Usage

You can choose between the time domain and the frequency domain.

python main.py --data_dir 'your datapath'

Implementation models

We tried three models: one in the frequency domain and two in the time domain.

1. Frequency domain - CycleGAN

We applied CycleGAN, which performs very well at style transfer in the image domain, directly to mel-spectrograms. Restoring the spectrogram to a waveform severely degraded the audio quality. Moreover, the audio converted by CycleGAN differed little from the original. We attribute this to two factors: the cycle-consistency loss pulls the output back toward the input, and the L1 loss only measures pixel-wise differences, whereas the spectrogram must change structurally before the style can change. We therefore tried waveforms instead of spectrograms and MelGAN instead of CycleGAN.
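The pull "back to self" can be illustrated with a toy sketch. This is not the project's training code: the generators, the 3x4 patch, and all function names below are hypothetical stand-ins, and a real CycleGAN also has adversarial terms that are omitted here. It only shows that generators which change nothing already drive the L1 cycle term to zero, while a structural change that is not exactly undone is penalised value by value.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally shaped 2-D grids."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def cycle_loss(x, g_ab, g_ba):
    """L1 cycle-consistency term: x -> g_ab -> g_ba should return to x."""
    return l1_loss(g_ba(g_ab(x)), x)


def identity(spec):
    """A 'generator' that leaves its input untouched."""
    return [row[:] for row in spec]


def shift_up(spec):
    """Toy structural change: rotate the frequency bins by one position."""
    return spec[-1:] + spec[:-1]


# Hypothetical 3-bin x 4-frame patch standing in for a mel-spectrogram.
rock_patch = [
    [0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.0, 1.0, 0.0],
]

# Doing nothing satisfies the cycle constraint perfectly.
loss_identity = cycle_loss(rock_patch, identity, identity)
# A structural change that is not inverted is penalised pixel by pixel.
loss_shift = cycle_loss(rock_patch, shift_up, identity)
```

In this sketch `loss_identity` is zero and `loss_shift` is positive, so unless the adversarial term outweighs it, the cycle term gives the generators no incentive to restructure the spectrogram.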

2. Time domain - MelGAN

MelGAN (here, MelGAN-VC; see Reference) uses a siamese network to impose a structural loss between the generator's input space and its output space. Because the model is designed for spectrograms, we concatenated segments of the one-dimensional input waveform along a second axis to form a two-dimensional wave. This lets MelGAN be applied to the waveform, and we also expected an effect similar to dilation. The results of this model were not satisfactory, so we decided to try an autoencoder rather than a generative model.
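One way to read the folding step is sketched below. It assumes the waveform is cut into fixed-length segments that are stacked as rows; the function name `fold_waveform` and the `width` parameter are hypothetical, and the project may have arranged the axes differently.

```python
def fold_waveform(samples, width):
    """Stack consecutive `width`-sample segments as rows of a 2-D grid.

    Trailing samples that do not fill a whole row are dropped.
    """
    n_rows = len(samples) // width
    return [samples[r * width:(r + 1) * width] for r in range(n_rows)]


# Stand-in waveform: each value is its own sample index, so neighbours are easy to read.
waveform = list(range(12))
grid = fold_waveform(waveform, width=4)

# Vertical neighbours in the grid are `width` samples apart in time, so a
# 2-D kernel sees samples t and t + width together (a dilation-like effect).
vertical_gap = grid[1][0] - grid[0][0]
```

Under this layout, a small 2-D convolution kernel covers samples that are `width` steps apart in the original signal, which is the dilation-like effect mentioned above.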

3. Time domain - Autoencoder

We tried Universal Music Translation (https://github.com/facebookresearch/music-translation) to transfer rock into jazz piano. While the paper translates between musical instruments such as violin, cello, and piano, we tried to transfer whole rock tracks into jazz piano. Because the model is based on WaveNet, training and inference are very expensive.
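A rough back-of-the-envelope sketch of why a WaveNet-style decoder is costly follows. The kernel size, dilation schedule, sample rate, and clip length are illustrative assumptions, not values taken from this project's configuration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def sequential_steps(sample_rate_hz, seconds):
    """Naive autoregressive generation needs one forward pass per output sample."""
    return sample_rate_hz * seconds


# Assumed configuration: dilations 1, 2, ..., 512 repeated 3 times, kernel size 2.
dilations = [2 ** i for i in range(10)] * 3
rf = receptive_field(kernel_size=2, dilations=dilations)

# Assumed 16 kHz audio and a 10-second clip.
steps = sequential_steps(sample_rate_hz=16_000, seconds=10)
```

With these assumed values, 30 layers cover only about 3,000 samples of context, and generating a 10-second clip sample by sample takes 160,000 sequential passes through the network.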

Limits and Future Studies

Differences between image and audio data limit how far prior computer vision research can be applied. Because WaveNet is so expensive, it is difficult to increase the resolution of the results, and a real-time service seems a long way off. Future work should identify the data characteristics that determine musical style and build models that take them into account. We also need low-cost models to make high-resolution, real-time conversion feasible.

Reference

  • MUSDB18 dataset (Rafii et al., 2017)
  • Music Source Separation Using Stacked Hourglass Networks (Park et al., ISMIR 2018)
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (Zhu et al., 2017)
  • WaveNet: A Generative Model for Raw Audio (DeepMind, 2016)
  • Meta-Learning Extractors for Music Source Separation (Samuel et al., 2020)
  • MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms (Marco Pasini, 2020)
  • A Universal Music Translation Network (Noam Mor et al., 2018)

Citation

@misc{musdb18,
 author       = {Rafii, Zafar and
              Liutkus, Antoine and
              Fabian-Robert St{\"o}ter and
              Mimilakis, Stylianos Ioannis and
              Bittner, Rachel},
 title        = {The {MUSDB18} corpus for music separation},
 month        = dec,
 year         = 2017,
 doi          = {10.5281/zenodo.1117372},
 url          = {https://doi.org/10.5281/zenodo.1117372} 
}

Contributors

hyunlee103, koo616, sanghyung-jung
