A repository for the Kaggle OSIC fibrosis challenge - Kaggle page
The challenge is to predict a patient's FVC progression given the patient's details (as in train.csv), an initial FVC measurement, and CT scans.
Every patient's FVC is fitted with an exponentially decaying function: FVC(t) = I * e^(-kt), where I is the initial FVC, t is time, and k is the fitted exponential coefficient.
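As a minimal sketch of the decay function above (the helper name make_exponent_function and the numeric values are illustrative assumptions, not code or data from this repository):

```python
import math

def make_exponent_function(initial_fvc, k):
    """Return a function computing FVC(t) = initial_fvc * exp(-k * t)."""
    def fvc(t):
        return initial_fvc * math.exp(-k * t)
    return fvc

# Illustrative values only; real values come from the patient data.
fvc = make_exponent_function(initial_fvc=3000.0, k=0.002)
print(fvc(0))   # at t = 0 the function returns the initial FVC
print(fvc(52))  # a later time gives a lower FVC when k > 0
```

A positive k means the FVC declines over time, and k = 0 means it stays at the initial value.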
The model is a SqueezeNet mockup. It tries to predict the exponential coefficient from the patient's CT scans.
It does so by predicting k for every scan and then averaging the k values.
predict.exponent_generator does just that: it iterates through the dataset images partitioned per patient, predicts k for each image, averages the k values per patient, and yields the exponent function together with the patient ID.
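The per-patient averaging can be sketched roughly as follows. This is not the repository's predict.exponent_generator: the function name, its arguments, and the input layout (a list of per-scan predictions plus a dictionary of initial FVC values) are assumptions made for illustration.

```python
import math
from collections import defaultdict

def exponent_generator_sketch(scan_predictions, initial_fvc_by_patient):
    """Yield (patient_id, fvc_function), with k averaged over each patient's scans.

    scan_predictions: iterable of (patient_id, predicted_k), one entry per CT image.
    initial_fvc_by_patient: dict mapping patient_id to the initial FVC measurement.
    """
    # Group the per-image predictions by patient.
    ks_by_patient = defaultdict(list)
    for patient_id, k in scan_predictions:
        ks_by_patient[patient_id].append(k)

    for patient_id, ks in ks_by_patient.items():
        mean_k = sum(ks) / len(ks)
        initial_fvc = initial_fvc_by_patient[patient_id]

        # Bind the values as defaults so each closure keeps its own patient's numbers.
        def fvc(t, _i=initial_fvc, _k=mean_k):
            return _i * math.exp(-_k * t)

        yield patient_id, fvc

# Illustrative values only.
predictions = [("p1", 0.001), ("p1", 0.003), ("p2", 0.0)]
for patient_id, fvc in exponent_generator_sketch(predictions, {"p1": 3000.0, "p2": 2500.0}):
    print(patient_id, fvc(10))
```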
The exponent functions for training are generated by taking a training patient's FVC history and fitting a linear function to the log of the FVC:
ln(FVC(t)) = ln(I) - kt
Before fitting the function, outlier FVC measurements were removed using Cook's distance. The fit then yields the coefficient k, and an exponent function is built from k and the initial FVC measurement.
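The fitting procedure above could look roughly like the sketch below. It is an illustration, not the repository's code: the function names are hypothetical, the cutoff of 4/n for Cook's distance is a common rule of thumb assumed here (the actual threshold used is not stated), and the sketch expects measurements at three or more distinct times.

```python
import math

def fit_line(ts, ys):
    """Ordinary least squares for y = a + b * t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sxx
    return y_mean - b * t_mean, b

def cooks_distances(ts, ys):
    """Cook's distance of every point for the simple linear fit y = a + b * t."""
    n, p = len(ts), 2  # p = number of fitted parameters (intercept and slope)
    a, b = fit_line(ts, ys)
    t_mean = sum(ts) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    residuals = [y - (a + b * t) for t, y in zip(ts, ys)]
    mse = sum(r * r for r in residuals) / (n - p)
    if mse < 1e-12:  # numerically perfect fit: nothing to flag
        return [0.0] * n
    distances = []
    for t, r in zip(ts, residuals):
        h = 1.0 / n + (t - t_mean) ** 2 / sxx  # leverage of this point
        distances.append(r * r / (p * mse) * h / (1.0 - h) ** 2)
    return distances

def fit_decay_coefficient(ts, fvcs, threshold=None):
    """Fit ln(FVC) = ln(I) - k * t after dropping high-influence points; returns k."""
    log_fvc = [math.log(v) for v in fvcs]
    if threshold is None:
        threshold = 4.0 / len(ts)  # assumed rule-of-thumb cutoff
    distances = cooks_distances(ts, log_fvc)
    kept = [(t, y) for t, y, d in zip(ts, log_fvc, distances) if d <= threshold]
    _, slope = fit_line([t for t, _ in kept], [y for _, y in kept])
    return -slope

# Illustrative data: exact exponential decay with one corrupted last measurement.
ts = [0, 10, 20, 30, 40, 50]
fvcs = [3000.0 * math.exp(-0.01 * t) for t in ts]
fvcs[-1] *= 0.5
print(fit_decay_coefficient(ts, fvcs))
```

In this example only the corrupted measurement exceeds the cutoff, so the refit on the remaining points recovers a k close to 0.01.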
The CT scans and the ground-truth k for each scan were uploaded to a GCS bucket (gs://osic_fibrosis) after conversion to TFRecord format.
To generate prediction confidence, a quantile regression model was fitted to the tabular data together with the CNN model prediction, as an ensemble.
The quantile regression model essentially predicts the 25th and 75th percentiles and yields their difference as the confidence in the model's predictions.
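Quantile regression is commonly trained by minimizing the pinball (quantile) loss. Whether this repository uses exactly this loss is an assumption. The sketch below shows that loss and the confidence computed as the gap between the two predicted percentiles; the function names and numbers are illustrative.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for a single prediction at quantile level q in (0, 1)."""
    error = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * error, (q - 1.0) * error)

def confidence_from_quantiles(pred_q25, pred_q75):
    """Confidence as the width between the 75th and 25th percentile predictions."""
    return pred_q75 - pred_q25

# Illustrative values only.
print(pinball_loss(10.0, 8.0, 0.75))             # 1.5: under-prediction is penalized more at q = 0.75
print(pinball_loss(8.0, 10.0, 0.75))             # 0.5: over-prediction is penalized less
print(confidence_from_quantiles(2400.0, 2650.0)) # 250.0
```

A wider gap between the two percentiles corresponds to a less certain prediction.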
The model was first trained on 256x256 images to make experimentation efficient.
It was then enlarged by adding 2 Conv layers that operate on 512x512 images in front of the previously trained model; they downsample the image while still picking up some signal from the high-resolution pixels.
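The arithmetic behind such a downsampling front end can be sketched with the standard convolution output-size formula. The kernel sizes, strides, and padding below are assumptions chosen for illustration; the actual settings of the two added layers are not stated here.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 512
size = conv_output_size(size, kernel=3, stride=2, padding=1)  # strided conv halves the resolution
size = conv_output_size(size, kernel=3, stride=1, padding=1)  # second conv keeps the resolution
print(size)  # 256, the input size the previously trained model expects
```

With these assumed settings, every 256x256 output position is computed from a small window of the 512x512 input, which is one way the high-resolution pixels can still contribute signal.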
The conda yml environment files for both CPU and GPU can be found in the environments folder.