
Comparison of Supervised Machine Learning Algorithms

Course: COGS 118A - Supervised Machine Learning Algorithms (Winter 2020)

Data Sets

The following data sets were taken from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml):

Final Project Report

Write a report of more than 1,000 words (excluding references) with the main sections: a) Abstract, b) Introduction, c) Methods, d) Experiments, e) Conclusions, and f) References. The basic requirement for the final project is a two-class classification problem.

Train your classifiers using the setting described in the empirical study by Caruana and Niculescu-Mizil (not all metrics are needed). You are expected to reproduce results consistent with the paper, although some small variation is normal. When evaluating the algorithms, you do not need to use all the metrics reported in the paper; a single metric, e.g. classification accuracy, is sufficient. Please report the cross-validated classification results together with the corresponding learned hyper-parameters.

If you compute accuracy and follow the basic requirement of 3 classifiers and 3 datasets, you are looking at 3 trials/repeats × 3 classifiers × 3 datasets × 3 partitions (20/80, 50/50, 80/20). Each time, report the best accuracy under the chosen hyper-parameter. Because accuracy is averaged over the three trials/repeats to rank-order the classifiers, you will report 3 classifiers × 3 datasets × 3 partitions (20/80, 50/50, 80/20) × 3 accuracies (train, validation, test). When debugging, first check whether you can push the training accuracy high enough to overfit the data; this is a sanity check that your implementation is correct. The hyper-parameter heatmaps are details that do not need to be compared too carefully: the hyper-parameter search is internal, and the final conclusion about the classifiers is based on the best hyper-parameter obtained each time.

Pseudo code

For i in three different datasets:
    For j in three types of partitions (20/80, 50/50, 80/20):
        For t in three trials/repeats (a fresh shuffle/random split of partition type j each time):
            For c in three different classifiers:
                cross-validate
                find the optimal hyper-parameter
                train using the hyper-parameter above
                obtain the training and validation accuracy/error
                test
                obtain the testing accuracy
        compute the averaged accuracy (training, validation, and testing) for each classifier c over the three trials/repeats
        rank order the classifiers
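
A minimal Python sketch of this loop, using scikit-learn's GridSearchCV for the internal hyper-parameter search. The dataset here (load_breast_cancer) is only a placeholder standing in for the three UCI data sets, the parameter grids mirror the Methodology section below, and the classifier loop is moved outside the trial loop so that averaging over trials is straightforward:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder data sets: substitute the three UCI data sets actually used.
datasets = {
    "breast_cancer": load_breast_cancer(return_X_y=True),
    # "dataset_2": (X2, y2), "dataset_3": (X3, y3), ...
}

# (estimator, hyper-parameter grid) pairs; grids follow the Methodology section.
classifiers = {
    "linear_svm": (SVC(kernel="linear"), {"C": [0.1, 1, 10, 100, 1000]}),
    "knn": (KNeighborsClassifier(weights="distance"), {"n_neighbors": [1, 2, 3]}),
    "decision_tree": (DecisionTreeClassifier(), {"max_depth": [1, 2, 3, 4, 5]}),
}

partitions = [0.2, 0.5, 0.8]  # fraction of each data set used for training
n_trials = 3
results = {}

for data_name, (X, y) in datasets.items():
    for train_frac in partitions:
        for clf_name, (estimator, grid) in classifiers.items():
            accs = []
            for trial in range(n_trials):
                # Fresh random split for every trial (the "shuffling" step).
                X_tr, X_te, y_tr, y_te = train_test_split(
                    X, y, train_size=train_frac, random_state=trial, stratify=y)
                # Internal hyper-parameter search via cross-validation on the training split.
                search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
                search.fit(X_tr, y_tr)
                best = search.best_estimator_
                accs.append((best.score(X_tr, y_tr),   # training accuracy
                             search.best_score_,       # cross-validated (validation) accuracy
                             best.score(X_te, y_te)))  # test accuracy
            # Average (train, validation, test) accuracy over the trials/repeats.
            results[(data_name, train_frac, clf_name)] = np.mean(accs, axis=0)

# Rank-order the classifiers by averaged test accuracy within each (dataset, partition) cell.
for (data_name, frac, clf_name), (tr, va, te) in sorted(
        results.items(), key=lambda kv: -kv[1][2]):
    print(f"{data_name} {frac:.0%} train  {clf_name}: "
          f"train={tr:.3f}  val={va:.3f}  test={te:.3f}")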

Chosen Classifiers

  • Linear SVM with scikit-learn
    [figure: Linear SVM]
  • KNN with scikit-learn
    [figure: KNN]
  • Decision Tree with scikit-learn
    [figure: Decision Tree]

Methodology

This section summarizes the hyper-parameters searched for each learning algorithm; a sketch of the corresponding scikit-learn setup follows the list.

  • SVM: I used a linear kernel with scikit-learn. The regularization parameter C was searched over 0.1, 1, 10, 100, 1000.

  • K-NN: Uses KNeighborsClassifier from sklearn inside a grid search with 5-fold cross-validation over up to 3 neighbors, with neighbors weighted by Euclidean distance.

  • Decision Tree: Uses DecisionTreeClassifier from sklearn with GridSearchCV (10-fold cross-validation) over maximum depths up to 5.
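
A sketch of the three grid searches described above. The parameter grids come from this section; the exact estimator classes, the scorer, and the number of cross-validation folds for the SVM are assumptions:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Linear SVM: linear kernel, C searched over the values listed above
# (5-fold cross-validation is an assumption here).
svm_search = GridSearchCV(
    SVC(kernel="linear"),
    {"C": [0.1, 1, 10, 100, 1000]},
    cv=5, scoring="accuracy")

# K-NN: up to 3 neighbors, distance-weighted votes with the Euclidean metric,
# 5-fold cross-validation.
knn_search = GridSearchCV(
    KNeighborsClassifier(weights="distance", metric="euclidean"),
    {"n_neighbors": [1, 2, 3]},
    cv=5, scoring="accuracy")

# Decision tree: maximum depth up to 5, 10-fold cross-validation.
tree_search = GridSearchCV(
    DecisionTreeClassifier(),
    {"max_depth": [1, 2, 3, 4, 5]},
    cv=10, scoring="accuracy")

# Each search is fit on the training split only, e.g. svm_search.fit(X_train, y_train);
# the selected model is then available as svm_search.best_estimator_.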

Useful Resources

Acknowledgements

