
template-scala-parallel-dl4j-rntn's Introduction

Template description.

This template is based on the deeplearning4j RNTN example. Its goal is to show how to integrate the deeplearning4j library with PredictionIO.

The Recursive Neural Tensor Network (RNTN) algorithm is a supervised learning algorithm used to predict the sentiment of sentences.

As of today, the deeplearning4j RNTN implementation does not work properly (e.g. training does not finish). A corresponding issue has been filed against the deeplearning4j library.

Installation.

Follow the installation guide for PredictionIO.

After installation, start all PredictionIO vendors and check the pio status:

pio-start-all
pio status

This template depends on deeplearning4j 0.0.3.3.3.alpha1-SNAPSHOT. To install it, run:

git clone git@github.com:deeplearning4j/deeplearning4j.git
cd deeplearning4j
chmod a+x setup.sh
./setup.sh

Copy this template to your local directory with:

pio template get ts335793/template-scala-parallel-dl4j-rntn <TemplateName>

Build, train, deploy.

You can build the template, train it, and deploy it by typing:

pio build
pio train -- --executor-memory=4GB --driver-memory=4GB
pio deploy -- --executor-memory=4GB --driver-memory=4GB

These memory options are used to avoid problems with the Java garbage collector. If such problems still appear, increase the executor memory and the driver memory.

Attention!

  • The pio train command will not terminate, because the deeplearning4j RNTN fit function does not work properly.

Importing training data.

You can import example training data from Kaggle. It is a collection of Rotten Tomatoes movie reviews with sentiment labels.

To use this data, create a new app:

pio app new <ApplicationName> # prints out ApplicationAccessKey and ApplicationId

Set appId in engine.json to ApplicationId and import the data with:

python data/import_eventserver.py --access_key <ApplicationAccessKey> --file train.tsv
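To make the import step more concrete, here is a minimal sketch of how rows of the Kaggle file could be turned into PredictionIO events. It is not the code of data/import_eventserver.py. The column names (PhraseId, SentenceId, Phrase, Sentiment) follow the Kaggle train.tsv header, while the event name, the entity type, the property names, and the two sample rows are assumptions made up for illustration.

```python
import csv
import io


def rows_to_events(tsv_text):
    """Turn Kaggle-style TSV text into a list of event dictionaries."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    events = []
    for row in reader:
        events.append({
            "event": "$set",         # assumed event name
            "entityType": "phrase",  # assumed entity type
            "entityId": row["PhraseId"],
            "properties": {          # assumed property names
                "phrase": row["Phrase"],
                "sentiment": int(row["Sentiment"]),
            },
        })
    return events


# Two made-up rows in the layout of train.tsv (not real Kaggle data).
sample = (
    "PhraseId\tSentenceId\tPhrase\tSentiment\n"
    "1\t1\ta good movie\t3\n"
    "2\t1\ta dull plot\t1\n"
)
events = rows_to_events(sample)
print(events[0])
```

Each dictionary has the general shape of an event accepted by the PredictionIO event server; the real script may use different names, so check it before relying on this layout.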

You can look up your application ID and access key at any time with:

pio app list

Sending requests to server.

To send a query, run the following in the template directory:

python data/send_query_interactive.py

and type the phrase whose sentiment you want predicted. The result is a list of predicted sentiments, one for each sentence in the phrase.
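A query can also be sent without the interactive script. The sketch below is not the code of data/send_query_interactive.py. It assumes the engine is deployed at PredictionIO's default address (http://localhost:8000/queries.json) and that the query JSON has a single "phrase" field, mirroring query.phrase in the serving code; the helper names build_request and send_query are made up for illustration.

```python
import json
import urllib.request

# Default address of a deployed PredictionIO engine; adjust if yours differs.
ENGINE_URL = "http://localhost:8000/queries.json"


def build_request(phrase, url=ENGINE_URL):
    """Build the HTTP POST request for a single query (no network access)."""
    body = json.dumps({"phrase": phrase}).encode("utf-8")  # assumed query shape
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(phrase, url=ENGINE_URL):
    """Send the query to a running engine and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(phrase, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once `pio deploy` is running:
#   print(send_query("The plot was thin. The acting was superb."))
```

Judging by PredictedResult in the serving code, the reply should carry a "sentiment" list, but that is an inference rather than a documented response format.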

Algorithm overview.

First, Word2Vec is trained; it creates a mapping from words to vectors.

val (vocabCache, weightLookupTable) = {
  val result = new SparkWord2Vec().train(data.phrases)
  (result.getFirst, result.getSecond)
}
val word2vec = new Word2Vec.Builder()
  .lookupTable(weightLookupTable)
  .vocabCache(vocabCache)
  .build()

The resulting Word2Vec model is passed to the RNTN builder.

val rntn = new RNTN.Builder()
  .setActivationFunction(ap.activationFunction)
  .setAdagradResetFrequency(ap.adagradResetFrequency)
  .setCombineClassification(ap.combineClassification)
  .setFeatureVectors(word2vec)
  .setRandomFeatureVectors(ap.randomFutureVectors)
  .setRng(new DefaultRandom())
  .setUseTensors(ap.useTensors)
  .build()
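For readers new to RNTNs, the following toy sketch shows the composition step described in the RNTN literature: two child vectors are concatenated and combined through a tensor term plus a linear term, and the result is squashed by an activation function. It is not deeplearning4j's implementation. The names compose and encode, the nested-tuple tree format, the use of tanh, and all numbers are assumptions for illustration; the sentiment classification layer is left out.

```python
import math


def compose(left, right, W, V, bias):
    """Combine two child vectors of size d into one parent vector of size d."""
    child = left + right  # list concatenation [left; right], length 2d
    size = len(child)
    parent = []
    for k in range(len(left)):
        # Tensor term: child^T * V[k] * child, one 2d x 2d slice per output unit.
        tensor_term = sum(
            child[i] * V[k][i][j] * child[j]
            for i in range(size)
            for j in range(size)
        )
        # Linear term of a standard recursive network: row k of W times child.
        linear_term = sum(W[k][i] * child[i] for i in range(size))
        parent.append(math.tanh(tensor_term + linear_term + bias[k]))
    return parent


def encode(tree, word_vectors, W, V, bias):
    """Encode a tree bottom-up; a tree is a word or a (left, right) pair."""
    if isinstance(tree, str):
        return word_vectors[tree]
    left, right = tree
    return compose(
        encode(left, word_vectors, W, V, bias),
        encode(right, word_vectors, W, V, bias),
        W, V, bias,
    )


# Toy parameters with d = 1 (made up, not trained).
word_vectors = {"not": [-1.0], "good": [1.0]}
W = [[0.5, 0.5]]
V = [[[0.0, 1.0], [0.0, 0.0]]]
bias = [0.0]
print(encode(("not", "good"), word_vectors, W, V, bias))
```

In this toy setup the linear term cancels out and only the tensor term, which multiplies the two children together, determines the parent vector. That interaction between children is what the tensor adds over a plain recursive network.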

Each phrase from the training set is converted to a list of trees with TreeVectorizer.

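val listsOfTrees = data.labeledPhrases.mapPartitions(labeledPhrases => {
  // TreeVectorizer is slow, so one instance is created per partition and reused for every phrase in it
  val treeVectorizer = new TreeVectorizer()
  labeledPhrases.map(
    x => treeVectorizer.getTreesWithLabels(x.phrase, x.sentiment.toString, data.labels))
})
// Merge the per-phrase lists of trees into a single list
val listOfTrees = listsOfTrees.reduce(_ ++ _)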

The RNTN is then fitted to those trees.

rntn.fit(listOfTrees)

Finally, the model is saved.

new Model(
  rntn = rntn,
  labels = data.labels
)

Serving overview.

A list of trees is created for the sentences in the query.

val trees = new TreeVectorizer().getTreesWithLabels(query.phrase, model.labels)

The sentiment of each sentence is predicted.

val sentiment = model.rntn.predict(trees)

The result is returned.

PredictedResult(sentiment = sentiment.toList)

template-scala-parallel-dl4j-rntn's People

Contributors

tsteczniewski, dszeto, k4hoo
