Classification

This repository includes different scripts to perform clustering and classification analyses.

SVM_classification :octocat:

Suite of MATLAB scripts to separate two subpopulations in a 2D data set using a Support Vector Machine (SVM). Despite the variable names, the analysis can be applied to the output of t-SNE or any other dimensionality reduction method.

Inputs

  1. Import the following four variables into MATLAB's workspace: tsne1, tsne2, classes and names. Running whos('tsne1','tsne2','classes','names') confirms that they are present.
  2. Additional inputs are required depending on the training subset.

tSNE_SVM_supervised.m

The training data set is picked manually by specifying the classes of the two subpopulations; see the details at the beginning of the script. The optimal separating surface is then computed using a linear kernel. See the documentation to use an alternative kernel.
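To illustrate what a linear-kernel boundary between two hand-picked subpopulations looks like, here is a minimal sketch. The repository's scripts are MATLAB; this sketch is Python and does not reproduce their solver. It swaps in plain batch subgradient descent on the regularized hinge loss, and every name and data point in it is hypothetical.

```python
def train_linear_svm(points, labels, lr=0.1, reg=0.01, epochs=500):
    """Fit (w, b) for the boundary w.x + b = 0 with a soft-margin linear SVM.

    Uses batch subgradient descent on the regularized hinge loss
    (an assumption of this sketch, not the solver used by the MATLAB scripts).
    points: list of (x, y) tuples; labels: list of +1 / -1.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        # Gradient of the L2 regularization term.
        grad_w = [reg * w[0], reg * w[1]]
        grad_b = 0.0
        for (x, y), t in zip(points, labels):
            margin = t * (w[0] * x + w[1] * y + b)
            if margin < 1.0:  # only margin violators contribute to the hinge loss
                grad_w[0] -= t * x / n
                grad_w[1] -= t * y / n
                grad_b -= t / n
        w[0] -= lr * grad_w[0]
        w[1] -= lr * grad_w[1]
        b -= lr * grad_b
    return w, b


def predict(w, b, point):
    """Return +1 or -1 depending on the side of the boundary."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


# Hypothetical 2D embedding with two manually chosen classes.
train_points = [(0, 0), (1, 0), (0, 1), (4, 4), (5, 4), (4, 5)]
train_labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(train_points, train_labels)
```

The returned `w` and `b` define the straight line that a scatter plot would overlay on the two subpopulations.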

tSNE_SVM_bootstrapping.m

This approach combines SVM with bootstrapping to compute the optimal boundary separating the two subpopulations. Instead of using the whole data set, the training data set is sampled from the two subpopulations and the optimal surface is computed using a quadratic kernel. This process is repeated one hundred times and all the resulting surfaces are combined using a least-squares ellipse fit. This script depends on fitellipse.m and plotellipse.m from MATLAB Central.
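The resampling loop described above can be sketched as follows. This is Python rather than the repository's MATLAB, and it covers only the bootstrap part: the per-round model fit is passed in as a callable. Sampling with replacement, the default sample size, and the stand-in `centroid_midpoint` fit (used here instead of a quadratic-kernel SVM) are all assumptions made for illustration.

```python
import random


def bootstrap_rounds(group_a, group_b, fit, n_rounds=100, sample_size=None, seed=0):
    """Repeatedly resample both subpopulations and fit a model on each sample.

    Sampling is done with replacement (an assumption of this sketch).
    Returns the list of per-round fit results, to be combined afterwards.
    """
    rng = random.Random(seed)  # fixed seed so the rounds are repeatable
    results = []
    for _ in range(n_rounds):
        k_a = sample_size or len(group_a)
        k_b = sample_size or len(group_b)
        sample_a = rng.choices(group_a, k=k_a)
        sample_b = rng.choices(group_b, k=k_b)
        results.append(fit(sample_a, sample_b))
    return results


def centroid_midpoint(sample_a, sample_b):
    """Hypothetical stand-in for the SVM fit: midpoint between class centroids."""
    def centroid(pts):
        return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
    ca, cb = centroid(sample_a), centroid(sample_b)
    return ((ca[0] + cb[0]) / 2, (ca[1] + cb[1]) / 2)


# Hypothetical data for the two subpopulations.
group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
group_b = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)]
rounds = bootstrap_rounds(group_a, group_b, centroid_midpoint)
```

In the workflow described above, the one hundred per-round boundaries would then be combined by the ellipse fit.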

Outputs

The script outputs a scatter plot with the optimal boundary surface separating the two subpopulations.

GMM_clustering :octocat:

This script performs a clustering analysis using a Gaussian Mixture Model (GMM).

Constrained mixture estimation for analysis and robust classification of clinical time series. Bioinformatics (2009).

Note! Consider extending it to 3D. See how here

Dependencies

The main script, gmm_clustering.m, computes the GMM with different numbers of modes using the gmdistribution function from MATLAB's Statistics and Machine Learning Toolbox. To interact with the model, the custom user interface files gui.fig and gui.m are required.

Inputs

  1. Data is read from a table in a CSV or Excel file with a header describing the variable names and the following columns: (i) a unique identifier with the sample names, (ii) the class name to which each sample belongs, (iii) the X coordinates and (iv) the Y coordinates. Find an example here
  2. K, the maximum number of modes. This parameter can be an integer or any of the following strings: 'one', 'two', 'three', ... 'ten', which set K to that multiple of the actual number of classes in the dataset. Note that strings are enclosed in single quotes. By default this parameter is set to 'two'.
  3. seed, the seed of the random number generator. Fixing the seed makes the results repeatable. By default it is set to zero. Use any other value to generate alternative solutions.
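The way K and seed are interpreted can be sketched as below. This is Python, not the repository's MATLAB. The function name `resolve_max_modes` is hypothetical, and the reading that 'two' means twice the number of distinct classes is an assumption drawn from the description above.

```python
import random

# Number words accepted for K, in order: index + 1 is the multiplier.
_WORDS = ["one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten"]


def resolve_max_modes(k, class_labels):
    """Turn the K input into a concrete maximum number of modes.

    An integer is used as-is; a number word is read as a multiple of the
    number of distinct classes (assumed interpretation of the description).
    """
    n_classes = len(set(class_labels))
    if isinstance(k, str):
        if k not in _WORDS:
            raise ValueError("K must be an integer or one of: " + ", ".join(_WORDS))
        return (_WORDS.index(k) + 1) * n_classes
    if isinstance(k, int) and not isinstance(k, bool) and k >= 1:
        return k
    raise ValueError("K must be a positive integer or a number word")


# Hypothetical class column with three distinct classes.
labels = ["A", "B", "A", "B", "C"]
default_k = resolve_max_modes("two", labels)   # 2 x 3 classes
explicit_k = resolve_max_modes(4, labels)      # integer taken literally

# Fixing the seed makes random draws repeatable; a different seed gives
# an alternative sequence.
draws_first = [random.Random(0).random() for _ in range(3)]
draws_again = [random.Random(0).random() for _ in range(3)]
```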

Outputs

The main script computes the data density of the dataset and all GMMs with 1 to K Gaussian components in the mixture. The results of each model are plotted as a heatmap overlaid with the GMM (white dots indicate the means of the Gaussian modes) and as a 3D plot. In addition, the Akaike information criterion (AIC) is computed and its evolution across all the models is plotted. A trend line is plotted when a stable level is reached.
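For reference, the AIC of a model is 2p - 2 ln L, where p is the number of free parameters and L is the maximized likelihood. The sketch below, in Python rather than MATLAB, shows how it trades fit against complexity for 2D mixtures. It assumes full covariance matrices, so a k-component model has 6k - 1 parameters. The negative log-likelihood values are made up for illustration, and choosing the minimum AIC is a common convention, not necessarily what the script's trend line does.

```python
def gmm_param_count(k, dim=2):
    """Free parameters of a k-component GMM with full covariances."""
    means = k * dim
    covariances = k * dim * (dim + 1) // 2  # symmetric matrices
    weights = k - 1                         # mixing weights sum to one
    return means + covariances + weights


def aic(neg_log_likelihood, k, dim=2):
    """Akaike information criterion: 2p - 2 ln L, with -ln L given directly."""
    return 2.0 * neg_log_likelihood + 2 * gmm_param_count(k, dim)


def lowest_aic(nll_by_k, dim=2):
    """Return (k with the smallest AIC, dict of all AIC scores)."""
    scores = {k: aic(nll, k, dim) for k, nll in nll_by_k.items()}
    return min(scores, key=scores.get), scores


# Hypothetical negative log-likelihoods for models with 1 to 4 modes.
example_nll = {1: 520.0, 2: 450.0, 3: 446.0, 4: 445.0}
chosen_k, aic_scores = lowest_aic(example_nll)
```

With these made-up values the likelihood keeps improving slightly beyond two modes, but the parameter penalty outweighs the gain, which is the kind of plateau the AIC plot is meant to reveal.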

Clustering using GMM

After all models are computed, the GUI opens and the GMM for any given number of modes can be plotted together with the classification results obtained with that GMM. Use the slider to choose the number of modes and compare the clustering results with the original input data set. For a given data set, click Export to report the clustering results: which data points belong to each mode, and the modes' hyperparameters.
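A minimal sketch of how samples can be assigned to modes once a GMM is chosen: each point goes to the component with the largest weighted density, which is the component with the highest posterior probability. This is Python rather than the repository's MATLAB. The function names, mixture parameters and sample names are hypothetical, and the exported format is an assumption.

```python
import math


def gaussian_pdf_2d(point, mean, cov):
    """Density of a 2D Gaussian with full covariance [[sxx, sxy], [sxy, syy]]."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    # Quadratic form d^T inv(cov) d, using the closed-form 2x2 inverse.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def assign_modes(points, weights, means, covs):
    """Index of the most probable mode for each point."""
    assignments = []
    for p in points:
        scores = [w * gaussian_pdf_2d(p, m, c)
                  for w, m, c in zip(weights, means, covs)]
        assignments.append(max(range(len(scores)), key=scores.__getitem__))
    return assignments


def group_by_mode(names, assignments):
    """Map each mode index to the names of the samples assigned to it."""
    groups = {}
    for name, mode in zip(names, assignments):
        groups.setdefault(mode, []).append(name)
    return groups


# Hypothetical two-mode mixture and three named samples.
weights = [0.5, 0.5]
means = [(0.0, 0.0), (5.0, 5.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 1.0))]
names = ["s1", "s2", "s3"]
points = [(0.1, -0.2), (4.8, 5.3), (1.0, 1.0)]
modes = assign_modes(points, weights, means, covs)
exported = group_by_mode(names, modes)
```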
