bin712-atelier-grp-12's Introduction

Data cleaning

  • import
  • identify categorical data
    • convert categorical to int

Class

  {'Class 1' : 1, 'Class 2':2, 'Class 0':0, 'Class -1':-1, 'Class -2':-2}
  • identify missing data
    • analyse and replace
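The class mapping above could be applied with pandas roughly as follows. This is a minimal sketch: the column name `Class` and the sample rows are assumptions made for illustration, not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is assumed to have a "Class" column.
df = pd.DataFrame({"Class": ["Class 1", "Class -2", "Class 0", "Class 2", "Class -1"]})

# Mapping taken from the README.
class_map = {"Class 1": 1, "Class 2": 2, "Class 0": 0, "Class -1": -1, "Class -2": -2}

# Series.map yields NaN for any label missing from the dictionary,
# so an unexpected label shows up as a missing value instead of passing silently.
df["Class"] = df["Class"].map(class_map)

# Number of labels that were not covered by the mapping.
unmapped = int(df["Class"].isna().sum())
```

Checking `unmapped` right after the conversion is a cheap way to confirm that every label was covered.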

Data with missing values

  • vsurf_V (18 values)
  • vsurf_S (13 values)
  • vsurf_R (7 values)
  • ASA+ (2 values)
  • a_heavy (1 value)
  • ASA- (1 value)
  • a_IC (1 value)
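Counts like the ones listed above can be obtained with `isna().sum()`. The sketch below uses a small invented frame; only the column names are borrowed from the list, and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column names come from the README, values are invented.
df = pd.DataFrame({
    "vsurf_V": [1.0, np.nan, 3.0, np.nan],
    "a_heavy": [10.0, 12.0, np.nan, 11.0],
    "rgyr": [2.1, 2.4, 2.2, 2.3],
})

# Count missing values per column, keep only columns that have any,
# and list the most affected columns first.
missing = df.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```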

*Before replacing the missing data, outliers were first identified and then replaced or removed.

  • identify outliers
    • analyse and replace/remove

Using the error bar of the mean for each feature, the following features were identified as having outliers. Features with smaller error margins were not considered to have outliers.

  • CASA- (10 values out of bounds)
  • DCASA (10 values out of bounds)
  • pmi (14 values out of bounds)
  • pmi2 (11 values out of bounds)
  • pmi3 (10 values out of bounds)
  • vsurf_R (1 value below lower_boundary)

A histogram of the data distribution was plotted for each feature. One feature stood out as a case where the outlier should be removed rather than replaced:

  • vsurf_R

For the remaining features, every value above the upper boundary was replaced with the upper boundary itself.

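For each feature (shown here for pmi):

  # data1 is the pandas DataFrame holding the dataset
  Q1 = data1['pmi'].quantile(0.25)
  Q3 = data1['pmi'].quantile(0.75)
  IQR = Q3 - Q1
  upper_boundary = Q3 + 1.5*IQR
  
  condition = data1['pmi'] > upper_boundary
  condition.sum() # 14 values are outside the upper_boundary
  # cap only the values above the boundary; .loc avoids chained assignment
  data1.loc[condition, 'pmi'] = upper_boundary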
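To repeat this step for several features, the same logic can be wrapped in a small helper. This is only a sketch: the function name `cap_upper_outliers`, its parameter `k`, and the sample data are hypothetical and not part of the project.

```python
import pandas as pd

def cap_upper_outliers(df, columns, k=1.5):
    """Replace values above Q3 + k*IQR with that boundary, in place.

    Returns a dict with the number of capped values per column.
    """
    counts = {}
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        upper = q3 + k * (q3 - q1)
        mask = df[col] > upper
        counts[col] = int(mask.sum())
        df.loc[mask, col] = upper
    return counts

# Hypothetical data: one obvious outlier in "pmi", none in "pmi2".
data = pd.DataFrame({
    "pmi": [1.0, 2.0, 3.0, 4.0, 100.0],
    "pmi2": [5.0, 6.0, 7.0, 8.0, 9.0],
})
counts = cap_upper_outliers(data, ["pmi", "pmi2"])
```

Returning the counts makes it easy to compare against the number of out-of-bounds values reported for each feature.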

Choosing best attributes using entropy

Once the data was cleaned, the 10 best attributes identified were:

     Feature
    0      ASA+
    1      ASA-
    2     CASA+
    3      DASA
    4    h_logP
    5    h_logS
    6      npr2
    7      rgyr
    8  std_dim1
    9  std_dim2
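The README does not say how entropy was used to rank the attributes. One common approach is information gain: the entropy of the class minus its weighted entropy after splitting on a feature. The sketch below assumes that approach and bins each continuous feature into equal-width intervals. The function names, the bin count, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    probs = pd.Series(labels).value_counts(normalize=True).to_numpy()
    return float(-(probs * np.log2(probs)).sum())

def information_gain(feature, target, bins=4):
    """Entropy of the target minus its weighted entropy within
    equal-width bins of the feature (binning is an assumption here)."""
    binned = pd.cut(feature, bins=bins, labels=False)
    conditional = sum(
        len(group) / len(target) * entropy(group)
        for _, group in target.groupby(binned)
    )
    return entropy(target) - conditional

# Hypothetical toy data: "informative" separates the classes, "noise" does not.
df = pd.DataFrame({
    "informative": [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2],
    "noise": [0.5, 1.5, 0.5, 1.5, 0.5, 1.5, 0.5, 1.5],
    "Class": [0, 0, 0, 0, 1, 1, 1, 1],
})

features = ["informative", "noise"]
scores = {col: information_gain(df[col], df["Class"]) for col in features}

# Features ordered from highest to lowest information gain.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Taking the first ten entries of `ranked` on the full dataset would give a top-10 list in the same spirit as the table above, though the exact result depends on the binning choice.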

bin712-atelier-grp-12's People

Contributors

carofo73
