sbenthall / deeptune
Music style transfer
License: MIT License
Transferred fragments are being dumped into the same directory as the raw fragments.
This muddies the training/testing/product data categories.
This needs to be configured properly, with a distinct path for each.
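One way to keep the categories apart is a small path configuration object. This is only a sketch: the `DataPaths` class and the directory names below are hypothetical and not taken from the repository.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataPaths:
    """Hypothetical layout: one directory per data category under a common root."""

    root: Path

    @property
    def raw_fragments(self) -> Path:
        return self.root / "fragments" / "raw"

    @property
    def transferred_fragments(self) -> Path:
        return self.root / "fragments" / "transferred"

    @property
    def training(self) -> Path:
        return self.root / "training"

    @property
    def testing(self) -> Path:
        return self.root / "testing"

    def all_dirs(self):
        """Return every category directory, in a fixed order."""
        return [
            self.raw_fragments,
            self.transferred_fragments,
            self.training,
            self.testing,
        ]

    def ensure(self) -> None:
        """Create all category directories if they do not exist yet."""
        for directory in self.all_dirs():
            directory.mkdir(parents=True, exist_ok=True)
```

Scripts would then write style transfer output to `paths.transferred_fragments` only, so raw fragments are never mixed with generated ones.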
A lot of audio quality is lost in the round trip between audio, array representation, and audio.
This is as of these changes:
Find out whether there is a way to fix this. Consult an expert.
Don't do all this in the notebook!
Script out transforming mp3 into array data.
The style extractor, currently cargo-culted from the tutorial, looks like it supports multiple content and style targets.
This might be a preferable way of dealing with the fragment matching problems than lining them up 1-by-1.
Look into this? See #16
Using the song recognition model from #10, and given two fragments, perform a style transfer from the second fragment to the first.
Scaffold the genre test from Stanford MIR into scripts
Building on #12
Somehow (lots of free parameters here) do a style transfer of two whole songs, reconstructing the new merged song from the fragments.
Since the style transfer from #12 is between two fragments, this raises a question of how to match fragments from one song to fragments of the other.
Can do something messy to start with, such as looping the style song. Using 15 second beat loops from the Yrevocnu Organ could be appropriate here.
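The "loop the style song" idea could be sketched as below. The function name `pair_fragments` is hypothetical, and the fragments are treated as opaque objects in playback order.

```python
from itertools import cycle


def pair_fragments(content_fragments, style_fragments):
    """Pair each content fragment with a style fragment, looping the style song.

    The style fragments repeat from the start whenever they run out, so the
    result always has one pair per content fragment.
    """
    if not style_fragments:
        raise ValueError("need at least one style fragment")
    return list(zip(content_fragments, cycle(style_fragments)))
```

For example, five content fragments and a two-fragment style loop give the style sequence 0, 1, 0, 1, 0.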
Build familiarity with Keras and/or Tensorflow API.
For training of neural network models, I need training/test data for music.
Goal:
There is some loss of signal in the roundtrip of a song from mp3 -> wav -> numpy -> wav (-> mp3).
One possibility is that this is being lost in the conversion from amplitude to decibel spaces.
Another is that it's being lost in the short-time Fourier transform step.
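The decibel hypothesis can be checked in isolation with a round trip on magnitudes alone. The sketch below uses random data as a stand-in for a real spectrogram. The -80 dB floor and the helper names are assumptions, not the project's actual settings.

```python
import numpy as np


def amplitude_to_db(mag, floor_db=-80.0):
    """Convert magnitudes to dB relative to the peak, clipping at floor_db."""
    ref = np.max(mag)
    db = 20.0 * np.log10(np.maximum(mag, 1e-10) / ref)
    return np.maximum(db, floor_db), ref


def db_to_amplitude(db, ref):
    """Invert amplitude_to_db; values that were clipped come back at the floor."""
    return ref * 10.0 ** (db / 20.0)


rng = np.random.default_rng(0)
mag = np.abs(rng.normal(size=(1025, 44)))  # stand-in for an STFT magnitude array
db, ref = amplitude_to_db(mag)
restored = db_to_amplitude(db, ref)

# The only error this step can introduce is from clipping at the floor.
max_error = np.max(np.abs(restored - mag))
```

If the error measured this way is tiny on real data, the decibel conversion is probably not the main source of loss. Attention then shifts to the transform step, for instance to whether phase is discarded when only magnitudes are kept.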
Experiment with
Building on #9 ....
Once the data is available, try a few different neural network architectures and pick the one with best performance.
The song fragments ("chunks") are being handled in an ad hoc way.
It would be better to encapsulate these as a class.
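A minimal sketch of such a class follows. The field names and the `split_song` helper are hypothetical, and the samples are assumed to be a 1-D NumPy array.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class Fragment:
    """One chunk of a song, with enough metadata to put it back in place."""

    song_id: str
    index: int          # position of this fragment within the song
    sample_rate: int    # samples per second
    samples: np.ndarray

    @property
    def duration(self) -> float:
        """Length of the fragment in seconds."""
        return len(self.samples) / self.sample_rate


def split_song(song_id, samples, sample_rate, seconds=1.0):
    """Cut a song into equal-length fragments, dropping any trailing partial one."""
    size = int(sample_rate * seconds)
    count = len(samples) // size
    return [
        Fragment(song_id, i, sample_rate, samples[i * size:(i + 1) * size])
        for i in range(count)
    ]
```

Carrying `song_id` and `index` with each fragment keeps the ordering information needed later for matching and reconstruction.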
Need to figure out how converting array data to mp3/wav works.
The neural network operates on 'fragments' of a song: a numerical array representing a ~1 second clip.
The plan for music style transfer (for now) is to operate on these fragments.
But we need a way to reconstruct a listenable song from these fragments after the transfer.
Build the song reconstruction steps.
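One possible reconstruction step, assuming each fragment has already been converted back to a 1-D array of audio samples, is to join them in order with an optional linear crossfade at each seam. The `reconstruct` function is a hypothetical sketch, not the project's method.

```python
import numpy as np


def reconstruct(fragments, crossfade=0):
    """Join 1-D sample arrays in order.

    If crossfade > 0, the last `crossfade` samples of the running output are
    blended linearly with the first `crossfade` samples of the next fragment,
    which shortens the result by `crossfade` samples per seam.
    """
    if not fragments:
        return np.zeros(0)
    out = np.asarray(fragments[0], dtype=float)
    for frag in fragments[1:]:
        frag = np.asarray(frag, dtype=float)
        if crossfade > 0:
            fade_in = np.linspace(0.0, 1.0, crossfade)
            blended = out[-crossfade:] * (1.0 - fade_in) + frag[:crossfade] * fade_in
            out = np.concatenate([out[:-crossfade], blended, frag[crossfade:]])
        else:
            out = np.concatenate([out, frag])
    return out
```

Plain concatenation (`crossfade=0`) preserves timing exactly but may click at the seams. Crossfading hides the clicks but only preserves timing if the fragments were cut with matching overlap.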
Improve tooling and workflow for listening to style transfer output.
This will make it easier to test for improvements.
(Currently it's a lot of brittle command line work).
Try increasing the sampling rate for preprocessing the sound data, and using smaller fragments (.5 seconds)
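Before running the experiment it may help to see how the input array shape depends on these choices. The sketch below assumes a centered (padded) STFT with a 2048-sample window and a 512-sample hop. These parameters are assumptions, chosen because they reproduce the ~1024 x 44 array size mentioned in these issues for a 1 second clip at 22050 Hz.

```python
def spectrogram_shape(sample_rate, seconds, n_fft=2048, hop=512):
    """Return (frequency bins, time frames) for a centered STFT of one fragment."""
    n_samples = int(sample_rate * seconds)
    n_bins = n_fft // 2 + 1          # non-negative frequencies of a real signal
    n_frames = 1 + n_samples // hop  # frame count when the signal is padded
    return n_bins, n_frames


baseline = spectrogram_shape(22050, 1.0)   # 1 second at 22050 Hz
proposed = spectrogram_shape(44100, 0.5)   # 0.5 seconds at 44100 Hz
```

Under these assumptions both settings give the same array shape, so the network input size would not change. With a fixed `n_fft`, though, each bin would cover twice the frequency range and each frame half the time.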
The current neural network architecture is probably not encoding features in an interesting way that's useful for style transfer.
Put some more thought into this and try again.
The current method of preprocessing the music data is to split it into equal-time chunks, convert into an array of dB at each frequency at each time step (~1024 x 44 array, where the x axis is frequencies), and then run these 2D arrays through a convolutional neural network.
The timbre of a musical note is a function of the amplitude at the note's lowest (fundamental) frequency as well as at its harmonic frequencies. The convolutional features are not yet accounting for timbre.
Try reshaping the input data into a 3D array such that each frequency is adjacent along the z axis to its nearest harmonic frequencies (is this possible?). Then try using a 3D convolutional layer and see if it alters performance.
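For a spectrogram with linearly spaced frequency bins, one way the reshaping might work is to stack integer multiples of each bin index, since the h-th harmonic of bin k falls on bin h*k. The function name and the choice of four harmonics below are assumptions for illustration.

```python
import numpy as np


def stack_harmonics(spec, n_harmonics=4):
    """Rearrange a (bins, frames) spectrogram into (harmonics, base bins, frames).

    out[h, k, t] holds spec[(h + 1) * k, t], so moving along the first axis
    steps from a base frequency to its successive harmonics. Only base bins
    whose highest requested harmonic still fits in the spectrogram are kept.
    """
    n_bins, _ = spec.shape
    n_base = (n_bins - 1) // n_harmonics + 1
    base = np.arange(n_base)
    multiples = np.arange(1, n_harmonics + 1)
    idx = multiples[:, None] * base[None, :]   # shape (n_harmonics, n_base)
    return spec[idx, :]
```

The trade-off is that only the lowest `n_bins / n_harmonics` base frequencies can be represented, and this exact-multiple trick does not carry over directly to log-spaced frequency axes.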
Develop class and implement for an arbitrary sound transformation (such as adding noise).
Test using conversion to mp3
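A sketch of what such a class could look like, using additive Gaussian noise as the concrete case. The class names and the `apply` interface are hypothetical. An mp3 round trip could be another subclass, but it is left out here because it depends on an external encoder.

```python
from abc import ABC, abstractmethod

import numpy as np


class SoundTransformation(ABC):
    """Base class: a transformation maps an array of samples to a new array."""

    @abstractmethod
    def apply(self, samples: np.ndarray) -> np.ndarray:
        """Return a transformed copy of the samples."""

    def __call__(self, samples):
        return self.apply(np.asarray(samples, dtype=float))


class AddNoise(SoundTransformation):
    """Add zero-mean Gaussian noise with the given standard deviation."""

    def __init__(self, stddev=0.01, seed=None):
        self.stddev = stddev
        self.rng = np.random.default_rng(seed)

    def apply(self, samples):
        noise = self.rng.normal(0.0, self.stddev, size=samples.shape)
        return samples + noise
```

Usage would be along the lines of `noisy = AddNoise(stddev=0.05, seed=0)(samples)`, and any other transformation with the same interface could be dropped in.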