RapAI

Data Mining & Analytics (INFO 254 / DATA 144), Fall 2019 final project.

Training a language model to generate rap lyrics. Work in progress!!

Contents

model.py: defines the Keras models

train_lm.py: training script; run it to start training

Preprocess-Rap.ipynb: Jupyter notebook that pre-processes the data and generates the train/val/test splits

data/rap_max100_10: train/val/test split of the rap lyrics dataset, containing 20 artists with 100 songs from each artist.

Usage

Requirements

Training

python train_lm.py --data data/rap_max100_10 --name my_model --epoch 20 --batch_size 256 --lstm_size 100 --emb_trainable --domain_vocab

Running train_lm.py trains the model for the specified number of epochs and saves the output files in the model directory, which is named after the --name argument.

Required parameters:

--data: path to data folder

--name: model will be saved under this name

Optional parameters:

--batch_size: batch size for training (default: 128)

--epoch: number of epochs to train (default: 10)

--lstm_size: size of LSTM cells (default: 100)

--emb_trainable: flag that enables updating the pre-trained word embeddings during training (default: False if not specified)

--vocab_size: vocabulary size (default: 50000)

--domain_vocab: if specified, the model will use up to --domain_vocab_n domain-specific words

--domain_vocab_n: number of the most frequent domain-specific words not covered by GloVe to include in the model. Ignored if --domain_vocab is not specified. (default: 1000)

--n_train: number of training samples to read (default: will read all if not specified)

--n_val: number of validation samples to read (default: will read all if not specified)
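
The flags above could be declared with Python's standard argparse module roughly as follows. This is only a sketch built from the parameter list in this README: the function name build_parser and the help strings are assumptions, and the actual train_lm.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(description="Train a rap-lyrics language model")

    # Required parameters
    parser.add_argument("--data", required=True, help="path to data folder")
    parser.add_argument("--name", required=True, help="model will be saved under this name")

    # Optional parameters, with the defaults listed in this README
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--epoch", type=int, default=10)
    parser.add_argument("--lstm_size", type=int, default=100)
    parser.add_argument("--vocab_size", type=int, default=50000)
    parser.add_argument("--domain_vocab_n", type=int, default=1000)

    # Boolean flags: False unless given on the command line
    parser.add_argument("--emb_trainable", action="store_true")
    parser.add_argument("--domain_vocab", action="store_true")

    # None is used here to mean "read all samples" (an assumption of this sketch)
    parser.add_argument("--n_train", type=int, default=None)
    parser.add_argument("--n_val", type=int, default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--data", "data/rap_max100_10", "--name", "my_model", "--epoch", "20", "--domain_vocab"]
)
```

With action="store_true", leaving out --emb_trainable or --domain_vocab yields False, which matches the "default: False if not specified" behavior described above.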

Evaluation Metrics

accuracy: percentage of correct token predictions on the test set. Higher is better.

top5 accuracy: percentage of test-set tokens for which the correct token is among the model's top 5 predictions. Higher is better.

top10 accuracy: percentage of test-set tokens for which the correct token is among the model's top 10 predictions. Higher is better.

perplexity: perplexity of the model measured on the test set, calculated as exp(cross_entropy_loss). Perplexity is a measure of surprisal: it measures how well the model predicts the test set. Lower perplexity = less surprisal = better.
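
To make these definitions concrete, here is a minimal pure-Python sketch of top-k accuracy and perplexity computed from per-token probability distributions. The function names and the toy 3-word vocabulary are assumptions made for illustration; this is not the project's evaluation code, which presumably relies on Keras metrics. The accuracy functions return a fraction, so multiply by 100 for a percentage.

```python
import math


def top_k_accuracy(probs, targets, k=1):
    """Fraction of positions whose true token is among the k most probable tokens."""
    hits = 0
    for row, target in zip(probs, targets):
        # Token ids sorted by descending predicted probability
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


def perplexity(probs, targets):
    """exp of the mean cross-entropy (negative log-probability of the true token)."""
    nll = [-math.log(row[target]) for row, target in zip(probs, targets)]
    return math.exp(sum(nll) / len(nll))


# Toy example: 3 predictions over a hypothetical 3-word vocabulary
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
targets = [0, 1, 1]

top1 = top_k_accuracy(probs, targets, k=1)  # 2 of 3 positions are correct
top2 = top_k_accuracy(probs, targets, k=2)  # all 3 true tokens are in the top 2
ppl = perplexity(probs, targets)
```

A model that assigned probability 1 to every true token would reach the minimum perplexity of 1, while a uniform guess over a vocabulary of V words gives a perplexity of V.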

Data Pre-Processing

Results
