titanic

Example code for solving the Titanic prediction problem on Kaggle (https://www.kaggle.com/c/titanic-gettingStarted). The example uses the Weka data mining libraries for classification and prediction. Note that it targets Weka version 3.6.9.

Data Cleanup/Initialization

Before we begin, we have to clean up the data files provided by Kaggle (these cleanup steps have already been applied to the committed files). The first step is to remove the nested quotation marks ('""') from the files. I did this with a plain search-and-replace operation in my editor.
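For anyone who prefers to script this step, here is a minimal sketch of the same search-and-replace in Java. The class name QuoteCleaner and its method are hypothetical and not part of this project, and the sample name in the comment is made up. The sketch assumes that every pair of adjacent double quotes in the file should simply be deleted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: removes doubled quotation marks ("") from CSV lines. */
final class QuoteCleaner {

    /**
     * Deletes every pair of adjacent double quotes. The single quotes that
     * enclose a field survive, e.g.
     *   "Doe, Mrs. John (Jane ""Jenny"" Roe)"  becomes  "Doe, Mrs. John (Jane Jenny Roe)"
     */
    static String stripNestedQuotes(String line) {
        return line.replace("\"\"", "");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: QuoteCleaner <input.csv> <output.csv>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);

        // Clean the file line by line and write the result to a new file,
        // so the original download stays untouched.
        List<String> cleaned = new ArrayList<>();
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            cleaned.add(stripNestedQuotes(line));
        }
        Files.write(out, cleaned, StandardCharsets.UTF_8);
    }
}
```

Writing to a separate output file makes it easy to diff the result against the original before converting it.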

The next step is to convert the CSV files into the ARFF format. Unlike CSV, ARFF declares the type of each column explicitly. To perform this conversion, you can use the CSVLoader from the Weka libraries:

java -cp lib/weka.jar weka.core.converters.CSVLoader test.csv > test.arff
java -cp lib/weka.jar weka.core.converters.CSVLoader train.csv > train.arff

Once we have created the ARFF files, we need to clean them up a little. First, declare every free-text column as type string rather than nominal. Then make sure that the nominal values are listed in the same order in both files (VERY IMPORTANT!). Here is what the header section of the ARFF file should look like:

@attribute survived {0,1}
@attribute pclass numeric
@attribute name string
@attribute sex {male,female}
@attribute age numeric
@attribute sibsp numeric
@attribute parch numeric
@attribute ticket string
@attribute fare numeric
@attribute cabin string
@attribute embarked {Q,S,C}
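The order matters because Weka stores a nominal value as an index into the declared list, so the same index can point to a different label if the two files disagree. The sketch below compares the nominal declarations of two ARFF files. It is not part of this project: the class name ArffHeaderCheck is hypothetical, and it assumes attribute names contain no spaces and that each declaration sits on one line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reports nominal attributes declared differently in two ARFF files. */
final class ArffHeaderCheck {

    // Matches lines such as: @attribute embarked {Q,S,C}
    private static final Pattern NOMINAL =
            Pattern.compile("(?i)^\\s*@attribute\\s+(\\S+)\\s+\\{(.*)\\}\\s*$");

    /** Maps each nominal attribute name to its value list, in declaration order. */
    static Map<String, String> nominalAttributes(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = NOMINAL.matcher(line);
            if (m.matches()) {
                // Normalise spacing so "{Q, S, C}" and "{Q,S,C}" compare equal.
                String values = m.group(2).replaceAll("\\s*,\\s*", ",").trim();
                result.put(m.group(1), values);
            }
        }
        return result;
    }

    /** Lists attributes present in both files whose value lists differ (order included). */
    static List<String> mismatches(List<String> trainLines, List<String> testLines) {
        Map<String, String> train = nominalAttributes(trainLines);
        Map<String, String> test = nominalAttributes(testLines);
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> e : train.entrySet()) {
            String other = test.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                problems.add(e.getKey() + ": {" + e.getValue() + "} vs {" + other + "}");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ArffHeaderCheck <train.arff> <test.arff>");
            return;
        }
        List<String> train = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> test = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        for (String problem : mismatches(train, test)) {
            System.out.println("MISMATCH " + problem);
        }
    }
}
```

An empty result means every nominal attribute shared by the two files is declared identically.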

Training, Predicting, and Verifying the data

Now that we have cleaned up our data, we are ready to run the code. I have included the Eclipse project files so that anyone can import this project into Eclipse and go. I have also included an Ant build file that compiles and runs everything. If you use neither Eclipse nor Ant, you will have to compile and run the classes yourself.

Training

To train the classifier, execute the 'titanic.weka.Train' class or run 'ant train' in a terminal. This will load the training data, create and train a Classifier, and write the Classifier to disk.

Predicting

To create a prediction, execute the 'titanic.weka.Predict' class or run 'ant predict'. This will load the test data, read the trained Classifier from disk, and write the predictions to 'predict.csv'. This CSV file is in a suitable format to submit to Kaggle.

Verifying

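To verify our predictions, execute the 'titanic.weka.Verify' class or run 'ant verify'. This will load our prediction results, read the trained Classifier from disk, and then evaluate the classification performance. Since the labels being checked were presumably produced by the same Classifier, full agreement is expected here; treat it as a consistency check rather than a measure of accuracy on unseen data. You will see output similar to this: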

Correctly Classified Instances         418              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1     
Mean absolute error                      0.1409
Root mean squared error                  0.1986
Relative absolute error                 30.3515 %
Root relative squared error             41.2246 %
Total Number of Instances              418     
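To make the first three lines of this summary easier to read, here is a small sketch of how the share of correctly classified instances and the kappa statistic can be computed from two label arrays. It uses the standard definition of Cohen's kappa and assumes that this is what the Weka output reports. The class name AgreementStats is hypothetical, and the code does not reproduce the error measures further down, which depend on the predicted class probabilities.

```java
/** Hypothetical helper: agreement statistics between actual and predicted class labels. */
final class AgreementStats {

    /** Fraction of positions where the two label arrays agree. */
    static double accuracy(int[] actual, int[] predicted) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    /**
     * Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
     * Labels must lie in 0..numClasses-1. The result is NaN when chance agreement is 1,
     * for example when both arrays contain a single class only.
     */
    static double kappa(int[] actual, int[] predicted, int numClasses) {
        int n = actual.length;
        long[] actualCounts = new long[numClasses];
        long[] predictedCounts = new long[numClasses];
        for (int i = 0; i < n; i++) {
            actualCounts[actual[i]]++;
            predictedCounts[predicted[i]]++;
        }
        // Agreement expected by chance, from the marginal class frequencies.
        double chance = 0.0;
        for (int c = 0; c < numClasses; c++) {
            chance += (double) actualCounts[c] * predictedCounts[c] / ((double) n * n);
        }
        double observed = accuracy(actual, predicted);
        return (observed - chance) / (1.0 - chance);
    }

    public static void main(String[] args) {
        // Made-up labels for illustration; not taken from the Titanic data.
        int[] actual = {0, 0, 1, 1};
        int[] predicted = {0, 1, 1, 1};
        System.out.println("accuracy = " + accuracy(actual, predicted));
        System.out.println("kappa    = " + kappa(actual, predicted, 2));
    }
}
```

When the two arrays are identical and both classes occur, accuracy and kappa both come out as 1, which matches the first three lines of the summary above.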

Contributors

birchsport
