Right now, when trying to run training of the DL models, DataLoader multiprocessing throws an error on my machine (MacBook Pro with M1 Pro, where Python's default multiprocessing start method is `spawn`, not `fork`).
python -m icu_benchmarks.run train \
-c configs/hirid/Classification/LSTM.gin \
-l logs/random_search/24h_multiclass/LSTM/run \
-t Phenotyping_APACHEGroup \
--num-class 15 \
--maxlen 288 \
-rs True \
-lr 3e-4 1e-4 3e-5 1e-5 \
-sd 1111 2222 3333 \
--hidden 32 64 128 256 \
--do 0.0 0.1 0.2 0.3 0.4 \
--depth 1 2 3
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
2022-08-30 11:16:08,290 - INFO: Model will be trained using CPU Hardware. This should be considerably slower
Traceback (most recent call last):
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/hendrikschmidt/projects/thesis/YAIB/icu_benchmarks/run.py", line 426, in <module>
main()
File "/Users/hendrikschmidt/projects/thesis/YAIB/icu_benchmarks/run.py", line 406, in main
train_with_gin(model_dir=log_dir_seed,
File "/Users/hendrikschmidt/projects/thesis/YAIB/icu_benchmarks/models/train.py", line 46, in train_with_gin
train_common(model_dir, overwrite, load_weights)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/gin/config.py", line 1531, in gin_wrapper
utils.augment_exception_message_and_reraise(e, err_str)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
raise proxy.with_traceback(exception.__traceback__) from None
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/gin/config.py", line 1508, in gin_wrapper
return fn(*new_args, **new_kwargs)
File "/Users/hendrikschmidt/projects/thesis/YAIB/icu_benchmarks/models/train.py", line 88, in train_common
model.train(dataset, val_dataset, weight)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/gin/config.py", line 1531, in gin_wrapper
utils.augment_exception_message_and_reraise(e, err_str)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
raise proxy.with_traceback(exception.__traceback__) from None
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/gin/config.py", line 1508, in gin_wrapper
return fn(*new_args, **new_kwargs)
File "/Users/hendrikschmidt/projects/thesis/YAIB/icu_benchmarks/models/wrappers.py", line 179, in train
train_loss, train_metric_results = self._do_training(train_loader, weight, metrics)
File "/Users/hendrikschmidt/projects/thesis/YAIB/icu_benchmarks/models/wrappers.py", line 134, in _do_training
for t, elem in tqdm(enumerate(train_loader)):
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 355, in __iter__
return self._get_iterator()
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 301, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 914, in __init__
w.start()
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/hendrikschmidt/opt/anaconda3/envs/icu-benchmark/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove'
In call to configurable 'train' (<function DLWrapper.train at 0x7fcc5bec0790>)
In call to configurable 'train_common' (<function train_common at 0x7fcc3b5000d0>)
Closing remaining open files:/Users/hendrikschmidt/projects/thesis/data/hirid_preprocessed/ml_stage/ml_stage_12h.h5...done/Users/hendrikschmidt/projects/thesis/data/hirid_preprocessed/ml_stage/ml_stage_12h.h5...done
The issue is likely the way the H5 file is opened in the dataset/dataloader: with the `spawn` start method, worker processes must pickle the dataset, and the open h5py file handle cannot be pickled (hence the `Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove'` error). A possible solution — opening the H5 file lazily inside each worker instead of in `__init__` — is described here: pytorch/pytorch#11929 (comment).
Ideally, training would work on all platforms, not only Linux, to facilitate development speed.