sbenthall / deeptune

Music style transfer

License: MIT License

Python 0.27% Jupyter Notebook 99.73%

deeptune's Introduction

deeptune

Musical style transfer

Python Dependencies

Run pip install -r requirements.txt

Acknowledgements

This work builds on the Stanford Music Information Retrieval course materials.

deeptune's Issues

Loss of signal in song roundtrip -- try keeping in amplitude space

There is some loss of signal in the roundtrip of a song from mp3 -> wav -> numpy -> wav (-> mp3).

One possibility is that this is being lost in the conversion from amplitude to decibel space.
Another is that it's being lost in the short-time Fourier transform step.

Experiment with

reducing the amount of preprocessing and seeing which steps are causing the information loss
checking which "purer" forms are still exportable to numpy and feasible to import into the CNN
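One way to narrow this down is to measure the error of each step separately on a synthetic signal. The sketch below is not the project's preprocessing code: it uses a hand-rolled NumPy STFT, and the window size, hop size, and test tone are all assumed values. It compares three round trips: STFT alone, magnitude to dB and back, and resynthesis from magnitudes with the phase thrown away.

```python
import numpy as np

N_FFT, HOP = 1024, 256  # illustrative values, not taken from the notebooks


def stft(signal):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(signal) - N_FFT) // HOP
    frames = np.stack([signal[i * HOP:i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec):
    """Inverse STFT by overlap-add, normalised by the summed squared window."""
    window = np.hanning(N_FFT)
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * window
    length = N_FFT + HOP * (len(frames) - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += window ** 2
    return out / np.maximum(norm, 1e-12)


def rel_error(reference, estimate):
    """Relative L2 error, skipping the first and last window (partial overlap)."""
    a, b = reference[N_FFT:-N_FFT], estimate[N_FFT:-N_FFT]
    return np.linalg.norm(a - b) / np.linalg.norm(a)


# Synthetic one-second test signal: two tones plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(22050) / 22050
x = (np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
     + 0.01 * rng.normal(size=t.size))

spec = stft(x)
x_stft = istft(spec)
x = x[:len(x_stft)]  # the framing drops a few trailing samples

mag = np.abs(spec)                               # phase is discarded here
db = 20 * np.log10(np.maximum(mag, 1e-10))       # amplitude -> decibels
mag_back = 10 ** (db / 20)                       # decibels -> amplitude

err_stft = rel_error(x, x_stft)                              # STFT only
err_db = np.linalg.norm(mag - mag_back) / np.linalg.norm(mag)  # dB only
err_no_phase = rel_error(x, istft(mag_back))                 # magnitude only, zero phase

print(err_stft, err_db, err_no_phase)
```

In this setup the first two errors are at floating-point level and the third is large, which suggests checking whether the pipeline keeps the phase before blaming the dB conversion or the STFT itself.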

arbitrary sound transform class

Develop class and implement for an arbitrary sound transformation (such as adding noise).

Test using conversion to mp3
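A possible shape for such a class, as a rough sketch only: the names `SoundTransform`, `AddNoise`, and `Compose` are hypothetical, and samples are assumed to be a 1-D NumPy array of floats. An mp3 round-trip transform could subclass the same base, but it needs an external encoder and is not shown.

```python
import numpy as np


class SoundTransform:
    """Base class: maps a 1-D array of samples to a new array of samples."""

    def apply(self, samples):
        raise NotImplementedError

    def __call__(self, samples):
        return self.apply(np.asarray(samples, dtype=float))


class AddNoise(SoundTransform):
    """Adds Gaussian noise with the given standard deviation."""

    def __init__(self, stddev=0.01, seed=None):
        self.stddev = stddev
        self.rng = np.random.default_rng(seed)

    def apply(self, samples):
        return samples + self.rng.normal(0.0, self.stddev, size=samples.shape)


class Compose(SoundTransform):
    """Applies several transforms in order."""

    def __init__(self, *transforms):
        self.transforms = transforms

    def apply(self, samples):
        for transform in self.transforms:
            samples = transform(samples)
        return samples


# Usage on a synthetic one-second tone.
tone = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)
noisy = AddNoise(stddev=0.05, seed=0)(tone)
twice = Compose(AddNoise(0.05, seed=1), AddNoise(0.05, seed=2))(tone)
```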

explore use of multiple targets in style extractor

The style extractor, currently cargo-culted from the tutorial, looks like it supports multiple content and style targets.

This might be a preferable way of dealing with the fragment matching problems than lining them up 1-by-1.

Look into this? See #16

Utility for transforming mp3 into array data

Don't do all this in the notebook!
Script out transforming mp3 into array data.

Song recognizing neural net in tensorflow

Building on #9 ....

Once the data is available, try a few different neural network architectures and pick the one with best performance.

Improve ANN architecture to capture better features

The current neural network architecture is probably not encoding features in a way that is interesting or useful for style transfer.

Put some more thought into this and try again.

better output path for transfers

Transferred fragments are being dumped into the same directory as the raw fragments.

This muddies the training/testing/product data categories.

This has to be better configured with distinct paths for each.
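One lightweight option is a single object that owns every data path, so scripts never build paths by hand. This is only a sketch: the class name `DataPaths` and the directory names are made up for illustration and do not reflect the current repository layout.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataPaths:
    """Keeps training, testing, and product data in separate directories."""

    root: Path

    @property
    def training(self):
        return self.root / "training"

    @property
    def testing(self):
        return self.root / "testing"

    @property
    def product(self):
        # Output of style transfers, kept apart from the raw fragments.
        return self.root / "product"

    def ensure(self):
        """Create any directories that do not exist yet."""
        for path in (self.training, self.testing, self.product):
            path.mkdir(parents=True, exist_ok=True)


paths = DataPaths(Path("data"))
```

Calling `paths.ensure()` once at the start of a script would create the three directories; transfer code would then write only to `paths.product`.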

(Background) Adapt Stanford MIR neural net example

Build familiarity with Keras and/or Tensorflow API.

improve tooling around listening to style transfer output (for testing)

Improve tooling and workflow for listening to style transfer output.

This will make it easier to test for improvements.

(Currently it's a lot of brittle command line work).

Improve audio quality in audio -> array -> audio round trip

A lot of audio quality is lost in the round trip between audio, array representation, and audio.

This is as of these changes:

7e24fb9

Find out whether there is a way to fix this. Consult an expert.

Fragments as a Class

The song fragments ("chunks") are being handled in an ad hoc way.

It would be better to encapsulate these as a class.
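A minimal sketch of what that class might look like, assuming a fragment is a slice of raw samples plus enough metadata to put it back in place. The field names and the `split` helper are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(eq=False)
class Fragment:
    """One chunk of a song, with the metadata needed to reassemble it."""

    samples: np.ndarray
    sample_rate: int
    song: str
    index: int  # position of this fragment within the song

    @property
    def duration(self):
        """Length of the fragment in seconds."""
        return len(self.samples) / self.sample_rate

    @classmethod
    def split(cls, samples, sample_rate, song, seconds=1.0):
        """Cut a song into equal-length fragments, dropping any remainder."""
        size = int(round(seconds * sample_rate))
        count = len(samples) // size
        return [cls(samples[i * size:(i + 1) * size], sample_rate, song, i)
                for i in range(count)]


# Usage: a little over three seconds of silence becomes three fragments.
song = np.zeros(22050 * 3 + 100)
fragments = Fragment.split(song, sample_rate=22050, song="demo")
```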

Try increasing the sampling rate

Try increasing the sampling rate for preprocessing the sound data, and using smaller fragments (0.5 seconds).
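Before running the experiment it may help to work out what the change does to the array shape. The arithmetic below assumes a centred STFT with `n_fft=2048` and `hop_length=512`, which are the common librosa defaults; the actual values used in the notebooks are not stated here, so treat them as placeholders.

```python
def spectrogram_shape(sample_rate, seconds, n_fft=2048, hop_length=512):
    """Return (frequency bins, time frames) for a centred STFT of one fragment."""
    n_samples = int(round(sample_rate * seconds))
    n_bins = n_fft // 2 + 1
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames


def resolution(sample_rate, n_fft=2048, hop_length=512):
    """Return (Hz per frequency bin, milliseconds per time frame)."""
    return sample_rate / n_fft, 1000 * hop_length / sample_rate


print(spectrogram_shape(22050, 1.0))   # (1025, 44)
print(spectrogram_shape(44100, 0.5))   # (1025, 44)
print(spectrogram_shape(44100, 1.0))   # (1025, 87)
print(resolution(22050), resolution(44100))
```

Under these assumptions, doubling the sampling rate while halving the fragment length leaves the array shape unchanged: each frame covers half as much time and each bin covers twice as many Hz.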

using song recognition model, do style transfers on pairs of fragments

Using the song recognition model from #10, and given two fragments, perform a style transfer from the second fragment to the first.

training/test data from steve morrell corpus

For training of neural network models, I need training/test data for music.

Goal:

Start with a directory of MP3s*
Break each MP3 into fragments -- parameter: length of time
Convert each MP3 fragment into an array of data
Use the arrays of data as X, with the song/file title as Y, the category label.
Export to file
Import into tensorflow

* For training, can use genre-diverse, openly available music, such as the Steve Morrell corpus.
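The middle steps of that goal (fragmenting, labelling, exporting) can be sketched with NumPy alone. MP3 decoding is left out because it depends on which audio library is used, so the example starts from already-decoded sample arrays. The function names and the `.npz` layout are assumptions for illustration.

```python
import io

import numpy as np


def fragment_array(samples, fragment_len):
    """Reshape a 1-D signal into rows of fragment_len samples, dropping the tail."""
    count = len(samples) // fragment_len
    return samples[:count * fragment_len].reshape(count, fragment_len)


def build_dataset(songs, fragment_len):
    """songs maps title -> 1-D array. Returns X, integer labels y, and titles."""
    titles = sorted(songs)
    x_parts, y_parts = [], []
    for label, title in enumerate(titles):
        fragments = fragment_array(songs[title], fragment_len)
        x_parts.append(fragments)
        y_parts.append(np.full(len(fragments), label))
    return np.concatenate(x_parts), np.concatenate(y_parts), titles


# Stand-in "songs"; real data would come from decoded MP3s.
songs = {
    "song_a": np.sin(np.linspace(0, 50, 3200)),
    "song_b": np.cos(np.linspace(0, 50, 2500)),
}
X, y, titles = build_dataset(songs, fragment_len=1000)

# Export and re-import. The buffer is an in-memory stand-in for a file
# path such as "dataset.npz".
buffer = io.BytesIO()
np.savez(buffer, X=X, y=y, titles=np.array(titles))
buffer.seek(0)
loaded = np.load(buffer)
```

The loaded `X` and `y` arrays are plain NumPy arrays, which is the form most training APIs accept as input.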

Style transfer for two whole songs with trained model

Building on #12

Somehow (lots of free parameters here) do a style transfer of two whole songs, reconstructing the new merged song from the fragments.

Since the style transfer from #12 is between two fragments, this raises the question of how to match fragments from one song to fragments of the other.

Can do something messy to start with, such as looping the style song. Using 15 second beat loops from the Yrevocnu Organ could be appropriate here.
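The looping idea amounts to repeating the style fragments until the content fragments run out. A tiny sketch, with a hypothetical function name and strings standing in for real fragments:

```python
from itertools import cycle


def pair_with_looped_style(content_fragments, style_fragments):
    """Pair every content fragment with a style fragment, looping the style song."""
    if not style_fragments:
        raise ValueError("need at least one style fragment")
    return list(zip(content_fragments, cycle(style_fragments)))


pairs = pair_with_looped_style(["c0", "c1", "c2", "c3", "c4"], ["s0", "s1"])
print(pairs)
```

Each pair could then be handed to the fragment-level transfer, and the results kept in content order for reassembly.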

Genre test in scripts

Scaffold the genre test from Stanford MIR into scripts

Reconstruct song (in mp3 form) from fragment series

The neural network operates on 'fragments' of a song: a numerical array representing a ~1 second clip.

The plan for music style transfer (for now) is to operate on these fragments.
But we need a way to reconstruct a listenable song from these fragments after the transfer.

Build the song reconstruction steps.
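As a starting point, and assuming each fragment has already been converted back to a 1-D array of samples, reconstruction could be a concatenation with an optional linear crossfade to soften the seams. The crossfade is an addition for illustration, not something the current code does.

```python
import numpy as np


def join_fragments(fragments, crossfade=0):
    """Concatenate fragments in order, optionally overlapping by `crossfade` samples."""
    out = np.asarray(fragments[0], dtype=float).copy()
    fade_in = np.linspace(0.0, 1.0, crossfade) if crossfade else None
    for fragment in fragments[1:]:
        fragment = np.asarray(fragment, dtype=float)
        if crossfade:
            # Blend the tail of what we have with the head of the next fragment.
            out[-crossfade:] = (out[-crossfade:] * (1.0 - fade_in)
                                + fragment[:crossfade] * fade_in)
            out = np.concatenate([out, fragment[crossfade:]])
        else:
            out = np.concatenate([out, fragment])
    return out


# Three constant fragments: a correct crossfade should leave the level unchanged.
pieces = [np.ones(100), np.ones(100), np.ones(100)]
plain = join_fragments(pieces)
blended = join_fragments(pieces, crossfade=10)
```

Note that each crossfade shortens the output by the overlap length, so timing drifts slightly unless the fragments were cut with matching overlap.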

learn tensorflow

https://github.com/Hvass-Labs/TensorFlow-Tutorials

Solve array data to mp3/wav link.

Need to figure out how array data to mp3/wav works.
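For the wav half of this, the standard library's `wave` module is enough. The sketch below assumes mono audio stored as floats in [-1, 1] and writes 16-bit PCM; the helper names are hypothetical. Encoding to mp3 needs an external encoder and is not covered.

```python
import io
import wave

import numpy as np


def write_wav(target, samples, sample_rate=22050):
    """Write float samples in [-1, 1] as 16-bit mono PCM to a path or file object."""
    clipped = np.clip(np.asarray(samples, dtype=float), -1.0, 1.0)
    pcm = np.round(clipped * 32767).astype("<i2")
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # two bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


def read_wav(source):
    """Read 16-bit mono PCM back into floats in [-1, 1]."""
    with wave.open(source, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())
    return np.frombuffer(raw, dtype="<i2").astype(float) / 32767, sample_rate


# Round trip through an in-memory buffer (a file path works the same way).
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)
buffer = io.BytesIO()
write_wav(buffer, tone)
buffer.seek(0)
restored, rate = read_wav(buffer)
```

The only loss in this step is 16-bit quantisation, which is far smaller than the audible degradation described in the round-trip issues.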

3D convolutional network for harmonic features.

The current method of preprocessing the music data is to split it into equal-time chunks, convert into an array of dB at each frequency at each time step (~1024 x 44 array, where the x axis is frequencies), and then run these 2D arrays through a convolutional neural network.

The timbre of a musical note is a function of the amplitude at the lowest frequency of the note as well as its harmonic frequencies. The convolutional features are not yet accounting for timbre.

Try reshaping the input data into a 3D array such that each frequency is adjacent in the z axis to its nearest harmonic frequencies (is this possible?). Then try using a 3D convolutional layer and see if it alters performance.
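On the "is this possible?" question: with linearly spaced frequency bins, bin k sits at k times the bin width, so its h-th harmonic lies at bin h*k, and the reshaping reduces to index arithmetic. The sketch below is one assumed way to do it, not a tested design; the function name, the number of harmonics, and the fill value for harmonics above the top bin are all choices made for illustration.

```python
import numpy as np


def stack_harmonics(spectrogram, n_harmonics=3, fill=0.0):
    """Turn a (bins, frames) array into (bins, frames, n_harmonics).

    Layer h-1 holds, for every bin k, the value found at bin h*k, so a note's
    fundamental and its harmonics line up along the last axis. Harmonics that
    fall above the highest bin are set to `fill`; for dB data a floor value
    may be a better choice than 0.
    """
    n_bins, n_frames = spectrogram.shape
    layers = []
    for h in range(1, n_harmonics + 1):
        source = np.arange(n_bins) * h
        valid = source < n_bins
        layer = np.full((n_bins, n_frames), fill, dtype=float)
        layer[valid] = spectrogram[source[valid]]
        layers.append(layer)
    return np.stack(layers, axis=-1)


# Toy spectrogram whose value equals its bin index, to make the mapping visible.
toy = np.arange(8, dtype=float)[:, None] * np.ones((1, 2))
stacked = stack_harmonics(toy, n_harmonics=3)
print(stacked[2, 0])  # bin 2 with its harmonics at bins 4 and 6
```

The first layer is the original spectrogram, so a 3D convolution over this array sees everything the 2D network saw plus the aligned harmonics.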