My solution is pretty simple:
- put the data in the `data` folder
- run `solution.ipynb`
The data for the competition can be found here.
This is longitudinal data: each patient is observed over a variable length of time.
The idea is simple:
- Sort the data for each patient in ascending order (lowest months first)
- For each variable, impute missing values using (in this order):
  - linear interpolation
  - forward fill
  - backward fill
- For the remaining data* (where there are still NA):
  - apply standardization
  - run iterative imputation with a Huber regressor
  - inverse transform to get predictions in the original scale
- Apply rounding and clipping
* There will still be NA after the per-variable imputation, because some patients have either all NA in a specific variable or only one row.
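
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook. It assumes pandas and scikit-learn, with scikit-learn's `IterativeImputer` standing in for the iterative imputation step. The function names, the column arguments (`id_col`, `time_col`, `value_cols`), the rounding precision and the clipping bounds are all hypothetical and would need to match the actual data.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import HuberRegressor
from sklearn.preprocessing import StandardScaler


def fill_within_patient(df, id_col, time_col, value_cols):
    """Sort each patient's rows by time, then interpolate, forward fill, backward fill."""
    out = df.sort_values([id_col, time_col]).reset_index(drop=True)
    # Assumption: interpolation treats visits as equally spaced (pandas "linear" method).
    out[value_cols] = out.groupby(id_col)[value_cols].transform(
        lambda s: s.interpolate(method="linear").ffill().bfill()
    )
    return out


def impute_remaining(df, value_cols, random_state=0):
    """Standardize, run iterative imputation with a Huber regressor, undo the scaling."""
    scaler = StandardScaler()  # ignores NaN when fitting and keeps NaN when transforming
    scaled = scaler.fit_transform(df[value_cols])
    imputer = IterativeImputer(estimator=HuberRegressor(), random_state=random_state)
    imputed = imputer.fit_transform(scaled)
    out = df.copy()
    out[value_cols] = scaler.inverse_transform(imputed)
    return out


def round_and_clip(df, bounds, decimals=0):
    """Round each column, then clip it to caller-supplied (lower, upper) bounds."""
    out = df.copy()
    for col, (lower, upper) in bounds.items():
        out[col] = out[col].round(decimals).clip(lower, upper)
    return out
```

A patient whose values are all NA for one variable passes through `fill_within_patient` unchanged, which is why the second stage borrows information from the other variables and the other patients. Note that `IterativeImputer` drops columns that are NA for every row by default, so this sketch assumes each variable is observed at least once somewhere in the data.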