
Solution-for-HackerEarth-Machine-Learning-challenge (26th place)

HackerEarth Machine Learning challenge: Of Genomes And Genetics.

Why did I take part in this competition?

Because:

  • I wanted to practice some techniques for tabular data that I found on Kaggle, such as feature engineering and ensembling.
  • I wanted to take part in a competition that was still ongoing. It made me feel like I was taking part in the Olympics, where many competitors compete against each other.

Time to complete

When I found this competition, it had only 1 week left before closing, so I concentrated on EDA and feature engineering. In this post, I will share how I did the feature engineering.

Train and Test

When performing the Label Encoding below, you must encode train and test together, so that every category receives the same code in both sets (reference from)
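A minimal sketch of encoding train and test together, assuming both are pandas DataFrames. The toy values, the helper name `fit_shared_mapping`, and the `_le` column suffix are illustrative assumptions, not taken from the original solution.

```python
import pandas as pd

# Toy stand-ins for the real train and test frames (values are made up).
train = pd.DataFrame({"Place of birth": ["Institute", "Home", "Institute"]})
test = pd.DataFrame({"Place of birth": ["Home", "Not available"]})


def fit_shared_mapping(train_col, test_col):
    """Build one category -> integer mapping from train and test combined."""
    combined = pd.concat([train_col, test_col], ignore_index=True)
    categories = sorted(combined.dropna().unique())
    return {value: code for code, value in enumerate(categories)}


mapping = fit_shared_mapping(train["Place of birth"], test["Place of birth"])

# Apply the same mapping to both frames; missing values stay NaN.
train["Place of birth_le"] = train["Place of birth"].map(mapping)
test["Place of birth_le"] = test["Place of birth"].map(mapping)
```

Because the mapping is built from the combined column, a category that only appears in test (here "Not available") still receives a code instead of becoming a missing value.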

Feature Engineering Techniques

Label Encode ( Categorical features )

  • Features have 2 values:
    • Genes in mother's side
    • Birth defects
    • History of anomalies in previous pregnancies
    • Assisted conception IVF/ART
    • H/O serious maternal illness
    • Folic acid details (peri-conceptional)
    • Place of birth
    • Heart Rate (rates/min
    • Respiratory Rate (breaths/min)
    • Follow-up
    • Inherited from father
    • Maternal gene
    • Status
    • Paternal gene
    I chose the encoded integers according to what the feature values mean (their quantity or order). For example, for the Follow-up feature: High --> 2, Low --> 1
  • Features have more than 2 values:
    • Same approach as before, but these features also contain extra values such as -, Not available, Not applicable, ..., so I had to label those too.
    • For some text features, such as Location of Institute, Institute Name, Family Name, and Father's name, I extracted new features and then encoded them:
      • Location of Institute, for example: 125 PARKER HILL AV\nJAMAICA PLAIN, MA 02120\n(42.329611374844326, -71.10616871232227). From it I created these features:
        1. JAMAICA PLAIN : district
        2. MA 02120 : POST CODE
        3. 42.329611374844326 : Latitude
        4. -71.10616871232227 : Longitude
      • Then hash-encoded: district, POST CODE, Family Name, Father's name
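The encoding steps above can be sketched as follows. This is only an illustration under stated assumptions: the address is assumed to contain real newlines in the layout shown in the example, the regular expression, the helper names `parse_location` and `hash_encode`, the bucket count, and the output column names are all hypothetical, and the family names are placeholders.

```python
import hashlib
import re

import pandas as pd

# Toy frame: the address is the example from the text, the names are placeholders.
address = ("125 PARKER HILL AV\nJAMAICA PLAIN, MA 02120\n"
           "(42.329611374844326, -71.10616871232227)")
df = pd.DataFrame({
    "Follow-up": ["High", "Low", None],
    "Family Name": ["Doe", "Roe", "Doe"],
    "Location of Institute": [address, address, None],
})

# Ordinal encoding: integers follow the meaning of the values (High > Low).
df["follow_up_enc"] = df["Follow-up"].map({"Low": 1, "High": 2})

# Assumed layout: street line, "district, STATE ZIP" line, "(lat, lon)" line.
LOCATION_RE = re.compile(
    r"\n(?P<district>[^,\n]+),\s*(?P<post_code>[A-Z]{2}\s+\d{5})\s*\n"
    r"\(\s*(?P<latitude>-?\d+(?:\.\d+)?),\s*(?P<longitude>-?\d+(?:\.\d+)?)\s*\)"
)


def parse_location(text):
    """Split one address string into district, post code, latitude, longitude."""
    match = LOCATION_RE.search(text) if isinstance(text, str) else None
    if match is None:
        return {"district": None, "post_code": None,
                "latitude": None, "longitude": None}
    return {
        "district": match["district"].strip(),
        "post_code": match["post_code"],
        "latitude": float(match["latitude"]),
        "longitude": float(match["longitude"]),
    }


parsed = pd.DataFrame(
    [parse_location(text) for text in df["Location of Institute"]],
    index=df.index,
)
df = pd.concat([df, parsed], axis=1)


def hash_encode(value, n_buckets=1000):
    """Map a string to a stable bucket id in [0, n_buckets); -1 for missing."""
    if not isinstance(value, str):
        return -1
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


for col in ["district", "post_code", "Family Name"]:
    df[f"{col}_hash"] = df[col].map(hash_encode)
```

A digest from `hashlib` is used here instead of Python's built-in `hash()` because the built-in string hash changes between interpreter runs, which would make the encoded values unreproducible.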

Transforming

  • Log transform some numerical features: 'Patient Age', 'Blood cell count (mcL)', "Mother's age", "Father's age", 'White Blood cell count (thousand per microliter)'
  • Interaction (ratio): create ratio columns, for example:
      df['patient_per_mom'] = df['Patient Age'] / df["Mother's age"]
      df['patient_per_dad'] = df['Patient Age'] / df["Father's age"]
      df['age_per_bcc'] = df['Patient Age'] / df['Blood cell count (mcL)']
      df['age_per_wbcc'] = df['Patient Age'] / df['White Blood cell count (thousand per microliter)']
      df['wbcc_per_bcc'] = df['White Blood cell count (thousand per microliter)'] / df['Blood cell count (mcL)']
  • Coordinate features:
      lat = df["latitude"]
      lon = df["longtitude"]
      df["x_dimen"] = np.cos(lat) * np.cos(lon)
      df["y_dimen"] = np.cos(lat) * np.sin(lon)
      df["z_dimen"] = np.sin(lat)
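A hedged sketch of the three transforms on a toy frame. Two choices here are assumptions rather than the original code: `np.log1p` is used for the log transform so that a value of 0 stays finite, and the coordinates are converted with `np.radians` first, because NumPy's trigonometric functions expect radians while latitude and longitude are normally stored in degrees. Column names follow the ones used in the text.

```python
import numpy as np
import pandas as pd

# Toy rows; only the column names come from the text.
df = pd.DataFrame({
    "Patient Age": [2.0, 10.0],
    "Mother's age": [30.0, 40.0],
    "latitude": [42.329611374844326, 0.0],
    "longtitude": [-71.10616871232227, 90.0],
})

# Log transform: log1p(x) = log(1 + x), which is defined for x = 0.
for col in ["Patient Age", "Mother's age"]:
    df[f"log_{col}"] = np.log1p(df[col])

# Ratio feature: a zero denominator becomes NaN instead of producing inf.
df["patient_per_mom"] = df["Patient Age"] / df["Mother's age"].replace(0, np.nan)

# Coordinate features: project (lat, lon) in degrees onto the unit sphere.
lat = np.radians(df["latitude"])
lon = np.radians(df["longtitude"])
df["x_dimen"] = np.cos(lat) * np.cos(lon)
df["y_dimen"] = np.cos(lat) * np.sin(lon)
df["z_dimen"] = np.sin(lat)
```

With the conversion to radians, each (x, y, z) row lies on the unit sphere, so points that are close on the map are also close in the new feature space, including across the ±180° longitude boundary.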

Create IS_NULL flags for some impactful features
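A minimal sketch of missing-value indicator columns. Which features received a flag is not stated here, so the two columns below, the helper name `add_is_null_flags`, and the `IS_NULL_` prefix are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows with missing values (made up).
df = pd.DataFrame({
    "Patient Age": [2.0, np.nan, 7.0],
    "Maternal gene": ["Yes", None, "No"],
})


def add_is_null_flags(frame, columns):
    """Return a copy with one 0/1 column per feature marking missing values."""
    out = frame.copy()
    for col in columns:
        out[f"IS_NULL_{col}"] = out[col].isna().astype(int)
    return out


flagged = add_is_null_flags(df, ["Patient Age", "Maternal gene"])
```

The flags should be created before any imputation, otherwise the information about which rows were missing is lost.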

Target Encoding
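The exact target-encoding scheme is not described here, so the following is only one common variant, shown under assumptions: a binary (0/1) target, out-of-fold means so that a row never sees its own label, and additive smoothing toward the global mean. The function name `target_encode_oof` and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def target_encode_oof(categories, target, n_splits=2, smoothing=1.0, seed=0):
    """Out-of-fold, smoothed mean of `target` for each category value."""
    categories = pd.Series(categories).reset_index(drop=True)
    target = pd.Series(target, dtype=float).reset_index(drop=True)
    encoded = pd.Series(np.nan, index=categories.index)

    # Assign every row to one of n_splits folds at random.
    rng = np.random.default_rng(seed)
    fold_ids = rng.permutation(len(categories)) % n_splits

    for fold in range(n_splits):
        fit_mask = fold_ids != fold           # rows used to compute the means
        prior = target[fit_mask].mean()       # global mean on the fitting rows
        stats = target[fit_mask].groupby(categories[fit_mask]).agg(["sum", "count"])
        smoothed = (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)
        # Categories unseen in the fitting rows fall back to the prior.
        values = categories[~fit_mask].map(smoothed).fillna(prior)
        encoded[~fit_mask] = values.to_numpy()
    return encoded


# Toy usage with made-up values.
place = ["Home", "Home", "Institute", "Institute", "Home", "Institute"]
label = [1, 0, 1, 1, 0, 1]
encoded = target_encode_oof(place, label)
```

For a multi-class target, one option is to repeat this once per class with a 0/1 indicator for that class as the target.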

Using SMOTE to deal with the imbalanced dataset
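To illustrate the idea behind SMOTE, here is a small NumPy sketch: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. It is a simplified illustration, not the implementation used in the solution, and the function name `smote_sketch` and its parameters are hypothetical. It assumes all features are already numeric.

```python
import numpy as np


def smote_sketch(X_minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    X = np.asarray(X_minority, dtype=float)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances; a point is never its own neighbour.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]   # k nearest per row

    base = rng.integers(0, len(X), size=n_new)      # starting samples
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                    # position along the segment
    return X[base] + gap * (X[chosen] - X[base])


# Toy usage: four minority points on the unit square, five synthetic rows.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_new=5)
```

Oversampling of this kind should be applied to the training folds only, so that validation scores are measured on the original class distribution.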

Model

Because time was limited, I chose AutoML for the model. This is the source code for my solution.
