8200bio_challenge's Introduction

🥈🏆 Awarded 2nd place in the challenge 🏆🥈

Our team: Roi Peleg, Lilach Mor, Omer Rugi

About the challenge:

The 3rd 8200Bio_Data_Challenge event, held in collaboration with DermaDetect!

Given a small batch of tabular medical data and a Decision Tree model, the task was to improve the model's accuracy while keeping the features readable, so that the model can still be presented as a "Tree" that a doctor can make sense of.

Our preprocessing steps:

Label encoding - applied to all the discrete data.

OneHot encoding - applied to the labels (after label encoding them).

MinMaxScaler - applied to the numeric data.

Zeros & Ones - replaced Boolean values with 0 and 1.
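The four encoding and scaling steps above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and pandas; the toy DataFrame and its column names (`skin_type`, `age`, `itching`, `diagnosis`) are hypothetical and are not taken from the challenge data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "skin_type": ["dry", "oily", "dry", "normal"],            # discrete feature
    "age": [23.0, 51.0, 37.0, 64.0],                          # numeric feature
    "itching": [True, False, True, True],                     # Boolean feature
    "diagnosis": ["eczema", "acne", "eczema", "psoriasis"],   # label
})

# Label encoding: map each category of a discrete feature to an integer.
df["skin_type"] = LabelEncoder().fit_transform(df["skin_type"])

# Min-max scaling: rescale numeric features to the [0, 1] range.
df[["age"]] = MinMaxScaler().fit_transform(df[["age"]])

# Zeros & Ones: Booleans become 0 and 1.
df["itching"] = df["itching"].astype(int)

# Labels: label-encode first, then one-hot encode the integer codes.
label_codes = LabelEncoder().fit_transform(df["diagnosis"])
labels_onehot = pd.get_dummies(label_codes).to_numpy(dtype=int)
```

In a real pipeline the encoders and the scaler would be fitted on the train set only and then reused on the test set, so that both are transformed consistently.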

Data Completion - the features ‘location_covrage’ and ‘pain.pain_type’ had missing data. We tried both RandomForest and the KNN imputer to fill it in: the samples that contained the value served as the train set, and the samples with the missing value served as the test set. RandomForest performed better, so it was selected.
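One possible shape of the RandomForest-based completion is sketched below. It assumes scikit-learn, that the column being completed is categorical, and that the remaining feature columns have no missing values; the helper `impute_with_forest` and the toy columns `f1` and `f2` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def impute_with_forest(df, target_col, random_state=0):
    """Fill missing values of target_col with a forest trained on the complete rows."""
    feature_cols = [c for c in df.columns if c != target_col]
    known = df[target_col].notna()
    if known.all():
        return df.copy()
    # Rows where the value is present act as the train set.
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(df.loc[known, feature_cols], df.loc[known, target_col])
    # Rows where the value is missing act as the test set to predict.
    filled = df.copy()
    filled.loc[~known, target_col] = model.predict(df.loc[~known, feature_cols])
    return filled

# Hypothetical toy data: the last two rows are missing the target value.
toy = pd.DataFrame({
    "f1": [0, 0, 1, 1, 0, 1],
    "f2": [1, 1, 0, 0, 1, 0],
    "pain.pain_type": [0.0, 0.0, 1.0, 1.0, np.nan, np.nan],
})
completed = impute_with_forest(toy, "pain.pain_type")
```

The KNN alternative mentioned above is available in scikit-learn as `sklearn.impute.KNNImputer`, which fills each missing value from the nearest complete neighbours instead of from a trained classifier.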

Feature/Dimension reduction - using a Decision Tree model, we found the 50 most important features and stored them in a list (rf_selected_features). From then on, every call to the preprocessing dropped the features that are not in rf_selected_features, in both train and test.
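A minimal sketch of importance-based feature selection, assuming scikit-learn's `DecisionTreeClassifier` and its `feature_importances_` attribute. The helper `select_top_features` and the synthetic data are hypothetical; only the name `rf_selected_features` comes from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def select_top_features(X, y, k=50, random_state=0):
    """Return the names of the k features with the highest tree importance."""
    model = DecisionTreeClassifier(random_state=random_state).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]  # most important first
    return [X.columns[i] for i in order[:k]]

# Synthetic example: only the feature "f3" determines the label.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"f{i}" for i in range(5)])
y = (X["f3"] > 0).astype(int)

rf_selected_features = select_top_features(X, y, k=2)

# The same column list is then applied to both train and test data.
X_reduced = X[rf_selected_features]
```

Computing the list once and reusing it keeps the train and test sets on the same columns, which is what lets every later preprocessing call stay consistent.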

Data Generator - creates new data based on the statistics and distribution of the given samples.
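The exact sampling scheme is not described here, so the following is only one plausible sketch: for each class, draw new rows from an independent normal distribution per feature, using that class's mean and standard deviation. The function `generate_samples` and the toy arrays are hypothetical, and the per-feature normal assumption is ours.

```python
import numpy as np

def generate_samples(X, y, n_per_class, seed=0):
    """Draw synthetic rows per class from per-feature normal distributions."""
    rng = np.random.default_rng(seed)
    new_X, new_y = [], []
    for label in np.unique(y):
        rows = X[y == label]
        mean, std = rows.mean(axis=0), rows.std(axis=0)
        # Assumption: features are treated as independent and roughly Gaussian.
        new_X.append(rng.normal(mean, std, size=(n_per_class, X.shape[1])))
        new_y.append(np.full(n_per_class, label))
    return np.vstack(new_X), np.concatenate(new_y)

# Hypothetical toy train set with two classes and two features.
X_train = np.array([[0.1, 0.9], [0.2, 0.8], [0.8, 0.1], [0.9, 0.2]])
y_train = np.array([0, 0, 1, 1])
X_new, y_new = generate_samples(X_train, y_train, n_per_class=50)
```

Sampling features independently ignores correlations between them, and it suits numeric features better than encoded categorical ones, which would need sampling from their observed category frequencies instead.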

When evaluating the model:

Train/test split - split evenly across the classes, so that the train set will “see” all the possible labels.
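One common way to get such a split is stratification, sketched here with scikit-learn's `train_test_split` and its `stratify` parameter. This keeps the class proportions the same in train and test; whether the team used this call or a custom split is an assumption, and the toy arrays are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: three classes with 10, 6 and 4 samples.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 6 + [2] * 4)

# stratify=y preserves the class proportions in both parts of the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
```

With a small data set, a plain random split can leave a rare class out of the train set entirely, which stratification avoids.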

Generating data - used statistical sampling of the train data to generate more samples, so the model could be trained better.

OneHot encoding - applied to the labels when training the model.

Note: we also tried:

  1. Encoding all the discrete data as OneHot vectors.
  2. Using GANs to generate data.

Neither gave good results.
