The 8200bio_challenge's intro from omerugi

8200bio_challenge's Introduction

🥈🏆 Awarded 2nd place in the challenge 🏆🥈

Our team: Roi Peleg, Lilach Mor, Omer Rugi

About the challenge:

8200Bio_Data_Challenge 3rd event in collaboration with DermaDetect!

Given a small batch of tabular medical data and a Decision tree model, try to improve the accuracy of the model while keeping the features readable so it could be presented as a "Tree" and a doctor could make sense out of it.

In our preprocessing we did:

Label encoding - For all the discrete data.

OneHot encoding – on the labels (after label encoding them).

MinMaxScalar – on the numeric data.

Zeros & Ones – Replace the values of Booleans.

Data Completion – In the features ‘location_covrage’ and ‘pain.pain_type’ there was missing data, we tried RandomForest and KNN imputer to fill in the missing data. The samples that contained the data = train, the samples with missing data = test. The RandomForest was selected, performed better.

Feature/Demention reduction – using a Decision Tree model we found the 50 most important features, after running it we had a list of the most important features (rf_selected_features). After finding those, every call to the preprocessing cleaned the unnecessary features based on the rf_selected_features (both in train and in test).

Data Generator - creating new data based on the given sample's statistics and distribution.

When evaluating the model :

Split test train – did it in an equal manner between the classes, so the train will “see” all the possible labels.

Generating data – used statistic sampling of the train data to generate more samples to train the model better.

OneHot Encoding – used it on the labels when training the model.

** Note: we tried –

Change all the discrete data as OneHot vectors but it didn’t work.
Use GAN’s to generate data

Didn’t give good results.

Recommend Projects

omerugi / 8200bio_challenge Goto Github PK

8200bio_challenge's Introduction

🥈🏆 Awarded 2nd place in the challenge 🏆🥈

About the challenge:

In our preprocessing we did:

When evaluating the model :

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent