Noise Detection and Removal on Cancer Dataset

This is the repository for my Scientific Initiation project. The goal of this work is to assess how noise affects the performance of machine learning predictive models, specifically on cancer-related genomic data.

How it works

The script takes a sequence of datasets and optionally balances the classes of the instances (all datasets used have only two classes). The balanced and unbalanced experiments are run separately.
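The balancing strategy is not described here, so the following is only a minimal sketch that assumes random undersampling of the majority class. The helper name `balance_classes` and the toy data frame are hypothetical and do not come from this repository.

```r
# Hypothetical helper: balance a two-class data frame by randomly
# undersampling the majority class (an assumption; the repository may
# use a different strategy).
balance_classes <- function(data, class_col) {
  # Row indices grouped by class label.
  idx_by_class <- split(seq_len(nrow(data)), data[[class_col]], drop = TRUE)
  n_min <- min(lengths(idx_by_class))
  # Keep n_min randomly chosen rows from every class.
  keep <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), n_min)]
  }), use.names = FALSE)
  data[sort(keep), , drop = FALSE]
}

set.seed(101010)
toy <- data.frame(
  gene_a = rnorm(30),
  class = factor(rep(c("tumor", "normal"), times = c(20, 10)))
)
balanced <- balance_classes(toy, "class")
table(balanced$class)  # both classes now have 10 instances
```

Undersampling discards data, which can matter for small genomic datasets; oversampling the minority class would be the obvious alternative.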

Then, artificial noise is injected into the dataset at a 0.2 ratio using a random scheme: each instance has a fixed 0.2 probability of having its class label flipped.
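The noise injection step can be sketched as follows. This is an illustration under the assumption that labels are stored as a two-level factor; `inject_label_noise` is a hypothetical name, not a function from this repository.

```r
# Hypothetical helper: flip each instance's label independently with
# probability `ratio`. Assumes a factor with exactly two levels.
inject_label_noise <- function(labels, ratio = 0.2) {
  stopifnot(is.factor(labels), nlevels(labels) == 2)
  flip <- runif(length(labels)) < ratio
  codes <- as.integer(labels)
  codes[flip] <- 3L - codes[flip]  # level 1 becomes 2, level 2 becomes 1
  list(
    labels = factor(levels(labels)[codes], levels = levels(labels)),
    flipped = which(flip)  # indices of the corrupted instances
  )
}

set.seed(101010)
clean <- factor(rep(c("normal", "tumor"), each = 50))
noisy <- inject_label_noise(clean, ratio = 0.2)
mean(noisy$labels != clean)  # observed noise rate, close to 0.2 on average
```

Because every instance is flipped independently, the observed noise rate varies around 0.2 rather than matching it exactly.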

The corrupted dataset is fed into each noise filter, one at a time, with the expectation that the filter removes at least the majority of the artificially injected noise, along with some noisy instances originally present in the dataset.
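One way to check that expectation is to compare the indices a filter removed with the indices where noise was injected. The sketch below uses a hypothetical helper, `filter_detection_stats`, and made-up toy indices. With NoiseFiltersR, the removed indices should be available in the `remIdx` element of the object a filter returns, but treat that as an assumption to verify against the package documentation.

```r
# Hypothetical helper: how much of the injected noise did a filter remove,
# and how many of its removals were actually injected noise?
filter_detection_stats <- function(removed_idx, injected_idx) {
  hits <- length(intersect(removed_idx, injected_idx))
  c(
    recall = if (length(injected_idx) > 0) hits / length(injected_idx) else NA_real_,
    precision = if (length(removed_idx) > 0) hits / length(removed_idx) else NA_real_
  )
}

# Toy indices for illustration only (not results from the experiment).
injected <- c(3, 8, 15, 21, 42)
removed <- c(3, 8, 15, 30)
filter_detection_stats(removed, injected)
```

A precision below 1 is not necessarily a fault of the filter, since some removed instances may be noise that was already in the original dataset.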

To evaluate each noise filter, different predictive models are fit on the original dataset, the corrupted dataset, and the filtered dataset, producing three accuracies that are then compared.
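The three-way comparison can be illustrated with a small sketch. It assumes synthetic two-dimensional data, a single hold-out split instead of cross validation, and an idealized stand-in filter that removes exactly the flipped instances. None of this is the repository's actual pipeline; only `knn()` from the 'class' package is a real API here.

```r
library(class)  # provides knn(), one of the classifier packages listed in this README

set.seed(101010)
n <- 200
# Two well-separated synthetic classes, 100 instances each.
x <- rbind(matrix(rnorm(n, mean = 0), ncol = 2),
           matrix(rnorm(n, mean = 3), ncol = 2))
y <- factor(rep(c("normal", "tumor"), each = n / 2))

test_idx <- sample(nrow(x), 50)
train_idx <- setdiff(seq_len(nrow(x)), test_idx)

# Corrupt 20% of the training labels (a fixed count, to keep the example compact).
y_noisy <- y
flipped <- sample(train_idx, round(0.2 * length(train_idx)))
y_noisy[flipped] <- ifelse(y[flipped] == "normal", "tumor", "normal")

# Stand-in for a noise filter: removes exactly the flipped instances.
kept_idx <- setdiff(train_idx, flipped)

# Train on the given rows and labels, score on the clean hold-out set.
knn_accuracy <- function(rows, labels) {
  pred <- knn(train = x[rows, , drop = FALSE],
              test = x[test_idx, , drop = FALSE],
              cl = labels[rows], k = 5)
  mean(as.character(pred) == as.character(y[test_idx]))
}

accuracies <- c(
  original  = knn_accuracy(train_idx, y),
  corrupted = knn_accuracy(train_idx, y_noisy),
  filtered  = knn_accuracy(kept_idx, y_noisy)
)
accuracies
```

A useful filter should bring the filtered accuracy back toward the original one; a real filter will rarely match the idealized stand-in used here.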

Technical details:

The experiment is reproducible because the random seed is fixed to the arbitrarily chosen value '101010', which means that all script runs should yield the same results. However, your results may not match those found in the 'results' subdirectory, as those were collected across several script changes and, most importantly, separate script runs.
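As a small illustration of what fixing the seed guarantees, the following sketch resets the generator to the same value twice and draws the same random numbers both times. The variable names are only for illustration.

```r
set.seed(101010)
first_run <- sample(100, 5)

set.seed(101010)
second_run <- sample(100, 5)

identical(first_run, second_run)  # TRUE: same seed, same draws
```

The guarantee only holds when the sequence of random calls is the same, which is one reason results collected across script changes can differ.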

If you want to change experiment parameters such as the datasets, noise filters, classifiers, noise injection ratio, or number of cross-validation folds, edit the './src/config.R' file by hand. More specific configuration must be changed in the 'main.R' file; this shouldn't be done unless you're sure of what you're doing, and is at your own risk.

How to run:

The whole experiment may take days or weeks to complete, and running it isn't recommended unless you need some metadata that is not already available in the 'result' subdirectory.

Just open an R session and type:

source('main.R')

The results are automatically appended to the 'NoiseResults.dat' output file. If it does not exist, it will be created. Make sure that the R session has all the permissions it needs on your machine.
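The append-or-create behavior can be sketched with base R as follows. The helper `append_results`, the column names, and the values are hypothetical; the actual layout of 'NoiseResults.dat' is not described here.

```r
# Hypothetical helper: append one row of results, writing the header only
# when the file is being created.
append_results <- function(row, path) {
  is_new <- !file.exists(path)
  write.table(row, file = path, append = !is_new, sep = "\t",
              col.names = is_new, row.names = FALSE, quote = FALSE)
}

# A temporary file is used so the example does not touch real results.
out <- tempfile(fileext = ".dat")

# Toy rows for illustration only (not results from the experiment).
append_results(data.frame(filter = "AENN", classifier = "knn", accuracy = 0.90), out)
append_results(data.frame(filter = "HARF", classifier = "knn", accuracy = 0.85), out)

read.table(out, header = TRUE, sep = "\t")
```

Writing the header only on creation keeps the file readable as a single table no matter how many runs append to it.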

Filters used:

You can find all these filters, along with all relevant information about them, at https://CRAN.R-project.org/package=NoiseFiltersR.

  • AENN ('All-k Edited Nearest Neighbors')
  • INFFC ('Iterative Noise Filter based on the Fusion of Classifiers')
  • HARF ('High Agreement Random Forest')
  • ENG ('Editing with Neighbor Graphs')

Classifiers used (R package):

  • k-Nearest Neighbors ('class')
  • Random Forest ('randomForest')
  • Support Vector Machine with Linear Kernel ('e1071')

Contributors:

  • felsiq
