wavenet's Introduction

WaveNet Keras implementation

This repository contains a basic implementation of WaveNet, as described in the paper published by DeepMind: Oord, Aaron van den, et al. "WaveNet: A generative model for raw audio." arXiv preprint arXiv:1609.03499 (2016).

Installation instructions

The code has only been tested and verified with Python 3.6. Assuming you have an installation of pipenv for Python 3, you may clone the project, navigate to the root folder and run:

make install

This will most likely take care of the dependencies, unless you're using Windows.

Reproducibility: Running the examples

In the examples folder you will find a small sample of data, downloaded from the LJ Speech Dataset. The dataset originally contains about 24 hours of speech, but I selected just a few files to create a small proof of concept, since I ran the training on my laptop and training such a complex architecture on a huge dataset was not viable for me. I used 50 files for training and 6 for validation.

Training

To train the network with the small amount of data provided in the package, navigate to the examples directory and run:

pipenv run python train_small.py

Feel free to tweak the parameters and add more data if your computational resources allow it (e.g. use AWS spot instances with GPUs). For example, posts around the internet report using 1000-2000 epochs. I used 20, because an order of magnitude more would take days to train. The filter size should probably also be larger (e.g. 64), and there should be more residual blocks (but keep in mind that in the paper the dilation rate doubles with every layer from 1 up to 512 and then starts over from 1, so the exponent cycles modulo 10).
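As a rough guide when adding residual blocks, the following sketch computes a dilation schedule that follows the paper's example (1, 2, 4, ..., 512, repeated) and the resulting receptive field. The function names and the `cycle_length` and `kernel_size` parameters are illustrative assumptions and are not taken from this repository's code.

```python
def dilation_schedule(n_layers, cycle_length=10):
    """Dilation per layer: doubles every layer, resets every `cycle_length` layers."""
    return [2 ** (i % cycle_length) for i in range(n_layers)]


def receptive_field(dilations, kernel_size=2):
    """Input samples that can influence one output of stacked dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical configuration: 20 layers, i.e. two full cycles of dilations.
dilations = dilation_schedule(n_layers=20)
print(dilations[:10])
print(receptive_field(dilations))
```

With a kernel size of 2, each full cycle of ten layers adds 1023 samples to the receptive field, which is why deeper stacks matter more for long-range context than wider ones.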

In the figure below, you can see a plot of the training loss, using the default parameters currently in wavenet.examples.train_small. The model is clearly far from saturation.

[Figure: Training Loss]

Generating sound

With the small network that I trained, the generated WAV file sounds like plain noise. However, if you'd like to generate your own WAV file, tweak the parameters accordingly (e.g. point to your own model) and run:

pipenv run python generate_small.py

wavenet's People

Contributors

peustr

wavenet's Issues

Causal convolution

Hello,

The code is very well written and explained, thanks for it.
I see the padding is set to 'same' in the model, but shouldn't it be 'causal',
since WaveNet uses causal convolutions?
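To illustrate the difference raised in this issue, here is a minimal pure-Python sketch of a dilated 1D convolution with both padding modes. It is not the repository's code: `dilated_conv1d` is a hypothetical helper, and its 'same' branch assumes the extra padding goes on the right when the total is odd. In Keras, the corresponding option is `Conv1D(..., padding="causal")`.

```python
def dilated_conv1d(signal, kernel, dilation=1, padding="causal"):
    """Dilated 1D cross-correlation whose output has the same length as the input."""
    k = len(kernel)
    span = (k - 1) * dilation  # total number of zeros to pad
    if padding == "causal":
        left = span            # pad only on the left: no access to future samples
    elif padding == "same":
        left = span // 2       # pad on both sides: the output can see the future
    else:
        raise ValueError("padding must be 'causal' or 'same'")
    right = span - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(signal))
    ]


# A single impulse at t = 3; any non-zero output before t = 3 is a leak from the future.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
causal = dilated_conv1d(impulse, [1.0, 1.0], dilation=2, padding="causal")
same = dilated_conv1d(impulse, [1.0, 1.0], dilation=2, padding="same")
print(causal)
print(same)
```

With 'causal' padding the output stays zero before the impulse, while with 'same' padding the output at t = 2 already responds to the sample at t = 3. For an autoregressive model this would let the network peek at the value it is supposed to predict.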

Are we extracting a constant value (in the same index within "audio") as the target y-value?

y_cur = audio[receptive_field_size]

Hi, I am a grad student at Georgia Tech, trying to learn WaveNet through your implementation. Thanks for building this. I have a question though: why is this line extracting y_cur like this? If you always extract "receptive_field_size" index from "audio," we would be extracting the same y-value for every audio segment in X. I was wondering if this is the intention?
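For reference, one common way to build (input, target) pairs for next-sample prediction is to slide a window over the audio and take the sample right after each window as the target. The sketch below is an assumption about the intended behaviour, not the repository's actual code; `make_training_pairs` and `stride` are hypothetical names. Whether the quoted line is a bug depends on whether `audio` is shifted or re-sliced between iterations, which the snippet alone does not show.

```python
def make_training_pairs(audio, receptive_field_size, stride=1):
    """Pair each window of `receptive_field_size` samples with the sample that follows it."""
    X, y = [], []
    for start in range(0, len(audio) - receptive_field_size, stride):
        end = start + receptive_field_size
        X.append(audio[start:end])  # the context window
        y.append(audio[end])        # the next sample, which moves with the window
    return X, y


# Toy "audio" whose sample values equal their indices, to make the targets easy to read.
audio = list(range(10))
X, y = make_training_pairs(audio, receptive_field_size=4)
print(X[0], y[0])
print(X[-1], y[-1])
```

Here the target index advances together with the window, so each segment in X is paired with a different y-value.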
