machine_learning's Issues

Data visualization(s) based on suitable visualization techniques including a principal component analysis (PCA).

Touch upon the following subjects, using visualizations where it appears sensible. Keep in mind the ACCENT principles and Tufte’s guidelines when you visualize the data.

  • Are there issues with outliers in the data?
  • Do the attributes appear to be normally distributed?
  • Are the variables correlated?
  • Does the primary machine learning modeling aim appear to be feasible based on your visualizations?
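A minimal sketch of how these checks might look in Python with pandas and matplotlib; the file name data.csv and the choice of plotting the first attribute are placeholders for your own data set:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file name; replace with your own data set.
df = pd.read_csv("data.csv").select_dtypes(include="number")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Outliers: box plots of each attribute.
df.boxplot(ax=axes[0])
axes[0].set_title("Outliers")

# Normality: histogram of one attribute (repeat for the others).
df.iloc[:, 0].hist(ax=axes[1], bins=30)
axes[1].set_title("Distribution of " + df.columns[0])

# Correlation: matrix of pairwise correlations shown as an image.
im = axes[2].imshow(df.corr(), vmin=-1, vmax=1, cmap="coolwarm")
axes[2].set_title("Correlation matrix")
fig.colorbar(im, ax=axes[2])

plt.tight_layout()
plt.show()
```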

There are three aspects that need to be described when you carry out the PCA for the report:

  • The amount of variation explained as a function of the number of PCA components included,
  • the principal directions of the considered PCA components (either find a way to plot them or interpret them in terms of the features),
  • the data projected onto the considered principal components.

If your attributes have very different scales, it may be relevant to standardize the data prior to the PCA.
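A minimal sketch of how the three aspects might be obtained with scikit-learn, including standardization; the file name data.csv is a placeholder for your own data set:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical file name; replace with your own data set.
X = pd.read_csv("data.csv").select_dtypes(include=[np.number]).to_numpy()

# Standardize so attributes with large scales do not dominate the PCA.
X_std = StandardScaler().fit_transform(X)

pca = PCA()
Z = pca.fit_transform(X_std)  # data projected onto the principal components

# Variance explained as a function of the number of components included.
cum_var = np.cumsum(pca.explained_variance_ratio_)
print("cumulative variance explained:", np.round(cum_var, 3))

# Principal directions: each row of components_ is a direction expressed
# in terms of the original (standardized) attributes.
print("first principal direction:", np.round(pca.components_[0], 3))
```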

A description of your data set

  • What the problem of interest is (i.e. what is your data about),
  • Where you obtained the data,
  • What has previously been done with the data (i.e., if available, go through some of the original source papers, read what they did with the data, and summarize their results).
  • What the primary machine learning modeling aim is for the data, i.e. which attributes you feel are relevant when carrying out classification, regression, clustering, association mining, and anomaly detection in the later reports, and what you hope to accomplish using these techniques. For instance, which attribute do you wish to explain in the regression based on which other attributes? Which class label will you predict based on which other attributes in the classification task? If you need to transform the data to admit these tasks, explain roughly how you might do this (but don’t transform the data now!).

A detailed explanation of the attributes of the data

  • Describe whether the attributes are discrete/continuous and nominal/ordinal/interval/ratio,
  • Give an account of whether there are data issues (i.e. missing values or corrupted data) and, if so, describe them,
  • Describe the basic summary statistics of the attributes.

If your data set contains many similar attributes, you may restrict yourself to describing a few representative features (apply common sense).
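A minimal sketch of how the attribute types, data issues, and summary statistics might be inspected with pandas; the file name data.csv is a placeholder for your own data set:

```python
import pandas as pd

# Hypothetical file name; replace with your own data set.
df = pd.read_csv("data.csv")

# Attribute types: discrete vs. continuous can be read off the dtypes,
# but the nominal/ordinal/interval/ratio classification is a judgement call.
print(df.dtypes)

# Data issues: count missing values per attribute.
print(df.isna().sum())

# Basic summary statistics (mean, std, min, quartiles, max) per attribute.
print(df.describe())
```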

Regression

  • Explain which regression problem you have chosen to solve.
  • Apply linear regression with forward selection and consider whether transforming or combining attributes may be useful. For linear regression, plotting the residual error vs. the attributes can give some insight into whether including a transformation of a variable can improve the model, i.e. potentially describe parts of the residuals.
  • Explain how a new data observation is predicted according to the estimated model, i.e. what the effects of the selected attributes are in terms of predicting the data. (Note that interpreting the magnitude of the estimated coefficients in general requires that each attribute be normalized prior to the analysis.)
  • Fit an artificial neural network (ANN) model to the data.
  • Statistically evaluate whether there is a significant performance difference between the fitted ANN and linear regression models based on the same cross-validation splits (i.e., use a paired t-test). In addition, compare whether the performance of your models is better than simply predicting the output to be the average of the training data output.
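A minimal sketch of the statistical comparison, assuming a file data.csv with a regression target in a column named "target" (both are placeholders), 10-fold cross-validation, and an MLP as the ANN; forward selection is omitted here for brevity:

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical file and target column; replace with your own regression problem.
df = pd.read_csv("data.csv")
y = df["target"].to_numpy()
X = df.drop(columns=["target"]).to_numpy()

# The same splits are reused for every model so the per-fold errors are paired.
splits = list(KFold(n_splits=10, shuffle=True, random_state=0).split(X))

def cv_errors(model):
    """Mean squared error on each held-out fold."""
    errors = []
    for train, test in splits:
        model.fit(X[train], y[train])
        errors.append(np.mean((y[test] - model.predict(X[test])) ** 2))
    return np.array(errors)

lin = make_pipeline(StandardScaler(), LinearRegression())
ann = make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000))

lin_errors = cv_errors(lin)
ann_errors = cv_errors(ann)

# Baseline: predict the average of the training data output.
base_errors = np.array([np.mean((y[test] - y[train].mean()) ** 2) for train, test in splits])

# Paired t-test on the per-fold errors from the same cross-validation splits.
t, p = stats.ttest_rel(lin_errors, ann_errors)
print("linear vs. ANN: t =", t, " p =", p)
print("mean errors - linear:", lin_errors.mean(), " ANN:", ann_errors.mean(), " baseline:", base_errors.mean())
```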

Classification

  • Explain which classification problem you have chosen to solve.
  • Apply at least three of the following methods: Decision Trees, Logistic/Multinomial Regression, K-Nearest Neighbors (KNN), Naïve Bayes and Artificial Neural Networks (ANN). (Use cross-validation to select relevant parameters in an inner cross-validation loop and give in a table the performance results for the methods evaluated on the same cross-validation splits on the outer cross-validation loop, i.e. you should use two levels of cross-validation).
  • For the models you are able to interpret, explain how a new data observation is classified.
    (If you have multiple fitted models (i.e., one for each cross-validation split), either focus on one of these fitted models or consider fitting one model, using the optimal parameter setting estimated by cross-validation, to all the data.)
  • Statistically compare the performance of the two best performing models (i.e., use a paired t-test). In addition, compare whether the performance of your models is better than simply predicting all outputs to be the largest class in the training data.
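A minimal sketch of the two-level cross-validation and the comparison against the largest-class baseline, assuming a file data.csv with a class label in a column named "label", KNN and logistic regression as two of the chosen methods, and the parameter grids shown; all of these are placeholders for your own choices:

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Hypothetical file and label column; replace with your own classification problem.
df = pd.read_csv("data.csv")
y = df["label"].to_numpy()
X = df.drop(columns=["label"]).to_numpy()

# Outer folds are shared by all methods so their error rates are paired.
outer = list(StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y))

def outer_cv_error(estimator, param_grid):
    """Two-level cross-validation: GridSearchCV is the inner loop selecting
    parameters, the outer loop measures the error rate on held-out folds."""
    errors = []
    for train, test in outer:
        inner = GridSearchCV(estimator, param_grid, cv=5)
        inner.fit(X[train], y[train])
        errors.append(np.mean(inner.predict(X[test]) != y[test]))
    return np.array(errors)

knn = Pipeline([("scale", StandardScaler()), ("clf", KNeighborsClassifier())])
logreg = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])

knn_err = outer_cv_error(knn, {"clf__n_neighbors": [1, 3, 5, 7, 9]})
log_err = outer_cv_error(logreg, {"clf__C": [0.01, 0.1, 1, 10]})

# Baseline: always predict the largest class of each training fold.
base_err = np.array([np.mean(y[test] != pd.Series(y[train]).mode()[0]) for train, test in outer])

# Paired t-test on the per-fold error rates of the two best models.
t, p = stats.ttest_rel(knn_err, log_err)
print("KNN:", knn_err.mean(), " LogReg:", log_err.mean(), " baseline:", base_err.mean())
print("paired t-test: t =", t, " p =", p)
```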
