This repository contains code for the ICLR'24 paper: Multimodal Patient Representation Learning with Missing Modalities and Labels.

Requirements:
- python==3.8.18
- torch==2.0.1
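
The pinned versions above can be checked before running anything else. The snippet below is a minimal sketch, not part of the repository: the helper names `installed_versions` and `mismatches` are hypothetical, and it assumes `torch` was installed as a regular package so that its metadata is visible to `importlib.metadata`.

```python
import sys
from importlib import metadata

# Pins copied from the requirements listed above.
PINS = {"python": "3.8.18", "torch": "2.0.1"}


def installed_versions():
    """Return the versions found in the current environment (None if missing)."""
    found = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    try:
        # Local build tags such as "+cu118" are dropped before comparing.
        found["torch"] = metadata.version("torch").split("+")[0]
    except metadata.PackageNotFoundError:
        found["torch"] = None
    return found


def mismatches(pins, found):
    """Return {name: (expected, actual)} for every pin that is not met exactly."""
    return {
        name: (expected, found.get(name))
        for name, expected in pins.items()
        if found.get(name) != expected
    }


if __name__ == "__main__":
    # Hypothetical usage: print one line per pin that does not match.
    for name, (expected, actual) in mismatches(PINS, installed_versions()).items():
        print(f"{name}: expected {expected}, found {actual}")
```

An exact match is stricter than most setups need; a nearby patch release may well work, but that is untested here.
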

Structure:
- `src/`: Source code for MUSE
  - `preprocess/`: Scripts for data preprocessing
  - `dataset/`: Data, Dataset, Tokenizer, Vocabulary, and collate_fn
  - `core/`: Core implementation for the MUSE method
  - `metrics.py`: Metrics for model evaluation
  - `helper.py`: Helper class for model training, evaluation, and inference
  - `utils.py`: Utility functions

Follow these steps to reproduce the results:
- Edit the path in `src/utils.py` to your local path.
- Obtain the eICU and MIMIC-IV datasets and place them under `{raw_data_path}`.
- Run the following notebooks and scripts under `src/preprocess` in the specified order to prepare the data:
  - eICU:
    - Run `parse_eicu_remote.ipynb`
    - Run `preprocess_eicu.py`
    - Run `build_vocab_eicu.py`
    - Run `data_split_eicu.py`
  - MIMIC-IV:
    - Run `parse_mimic4_remote.ipynb`
    - Run `preprocess_mimic4.py`
    - Run `build_vocab_mimic4.py`
    - Run `data_split_mimic4.py`
  - Run `get_code_embeddings.py`
- Execute `run.py` under `src/core` to train the model:

      python run.py \
          --dataset [mimic4/eicu] \
          --task [mortality/readmission] \
          --official_run
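
The preprocessing order and the training command above can be collected into one driver script. The following is a rough sketch under several assumptions that the steps do not confirm: the `.py` scripts take no arguments, the notebooks can be executed headlessly with `jupyter nbconvert`, and `get_code_embeddings.py` runs once after both datasets are prepared. The function names are hypothetical; only the file names and the `--dataset`, `--task`, and `--official_run` flags come from the steps themselves.

```python
import itertools
import subprocess

# File names and their order are taken from the steps above.
PREPROCESS_STEPS = {
    "eicu": [
        "parse_eicu_remote.ipynb",
        "preprocess_eicu.py",
        "build_vocab_eicu.py",
        "data_split_eicu.py",
    ],
    "mimic4": [
        "parse_mimic4_remote.ipynb",
        "preprocess_mimic4.py",
        "build_vocab_mimic4.py",
        "data_split_mimic4.py",
    ],
}
FINAL_STEP = "get_code_embeddings.py"


def step_command(filename):
    """Build the command for one preprocessing file.

    Assumption: scripts take no arguments, and notebooks are executed
    headlessly with nbconvert instead of being opened interactively.
    """
    if filename.endswith(".ipynb"):
        return ["jupyter", "nbconvert", "--to", "notebook",
                "--execute", "--inplace", filename]
    return ["python", filename]


def preprocess_commands(datasets=("eicu", "mimic4")):
    """Per-dataset steps in the listed order, then the shared final step."""
    commands = [step_command(name)
                for dataset in datasets
                for name in PREPROCESS_STEPS[dataset]]
    commands.append(step_command(FINAL_STEP))
    return commands


def train_commands(datasets=("mimic4", "eicu"),
                   tasks=("mortality", "readmission")):
    """One run.py invocation per dataset/task combination."""
    return [
        ["python", "run.py", "--dataset", dataset, "--task", task, "--official_run"]
        for dataset, task in itertools.product(datasets, tasks)
    ]


def run_all(commands, cwd, dry_run=True):
    """Print each command; execute it in `cwd` only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all(preprocess_commands(), cwd="src/preprocess")
    run_all(train_commands(), cwd="src/core")
```

The sketch defaults to a dry run that only prints the commands, so the plan can be inspected before anything touches the data; pass `dry_run=False` to execute them.
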

Citation:

    @inproceedings{wu2024multimodal,
      title={Multimodal Patient Representation Learning with Missing Modalities and Labels},
      author={Zhenbang Wu and Anant Dadu and Nicholas Tustison and Brian Avants and Mike Nalls and Jimeng Sun and Faraz Faghri},
      booktitle={The Twelfth International Conference on Learning Representations},
      year={2024},
      url={https://openreview.net/forum?id=Je5SHCKpPa}
    }