This folder includes the ConvAE architecture implementation and an example dataset to learn patient representations from their EHRs as described in [1].
Requires Python 3.6+.
Download the convae_architecture folder:
$ git clone http://github.com/landiisotta/convae_architecture
The full list of required Python packages is available in requirements.txt. All the dependencies can be installed with:
$ pip install -r requirements.txt
Examples of randomly generated input EHRs (train and test) can be found in the data_example folder.
Outputs will be stored in ./data_example/encodings and include ConvAE latent representations, EHR sequences organized in subsequences of desired length, and best model weights.
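To illustrate what "EHR sequences organized in subsequences of desired length" can look like, here is a minimal sketch that chunks one patient's token sequence into fixed-length pieces. The function name split_into_subsequences, the non-overlapping chunking, and the use of 0 as a right-padding token are assumptions made for illustration; the code in this repository may organize subsequences differently.

```python
from typing import List


def split_into_subsequences(seq: List[int], length: int = 32, pad: int = 0) -> List[List[int]]:
    """Chunk a token sequence into fixed-length pieces.

    Hypothetical helper: the last chunk is right-padded with `pad`
    (assumed here to be 0) so every chunk has exactly `length` tokens.
    """
    chunks = []
    for start in range(0, len(seq), length):
        chunk = seq[start:start + length]
        # Pad only the final, shorter chunk.
        chunk = chunk + [pad] * (length - len(chunk))
        chunks.append(chunk)
    return chunks


# A made-up patient with 70 coded events (token ids 1..70).
example = list(range(1, 71))
subseqs = split_into_subsequences(example)
print(len(subseqs), [len(s) for s in subseqs])
```

With a subsequence length of 32, the 70-token example yields three chunks, the last of which holds 6 real tokens followed by padding.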
To train the model:
sh learn_patient_representations.sh
To test the representations learned on the test set:
sh learn-patient-representations.sh test
Synthetic data include:
200 patients, 50:50 split for train and test;
vocabulary size = 200;
min sequence length = 3;
max sequence length = 100;
embedding dimension = 100;
subsequence length = 32.
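For readers who want to produce comparable toy data, the sketch below draws random EHR sequences using the patient count, vocabulary size, and sequence-length bounds listed above. It is not the script used to create data_example: the function name make_synthetic_ehrs, the patient ID format, the uniform sampling, and the convention that token ids start at 1 are all assumptions.

```python
import random


def make_synthetic_ehrs(n_patients=200, vocab_size=200, min_len=3, max_len=100, seed=0):
    """Return {patient_id: [token ids]} with uniformly random content.

    Hypothetical generator: token ids are drawn from 1..vocab_size
    (assumption: 0 is left free, e.g. for padding).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    ehrs = {}
    for pid in range(n_patients):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        ehrs[f"P{pid:03d}"] = [rng.randint(1, vocab_size) for _ in range(length)]
    return ehrs


ehrs = make_synthetic_ehrs()

# 50:50 split into train and test patients.
ids = sorted(ehrs)
train_ids, test_ids = ids[:len(ids) // 2], ids[len(ids) // 2:]
print(len(train_ids), len(test_ids))
```

Changing the seed or the keyword arguments gives a different cohort with the same structure.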
Model parameters can be modified in utils.py.
This example randomly initializes the embedding matrix.
[1] Landi, I., Glicksberg, B. S., Lee, H. C., Cherng, S., Landi, G., Danieletto, M., Dudley, J. T., Furlanello, C., & Miotto, R. Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digit. Med. 3, 96 (2020). https://doi.org/10.1038/s41746-020-0301-z