Giter VIP home page Giter VIP logo

patient_trajectory_prediction's Introduction

Step 0 : Python environment

  • All of these scripts were ran with Python 3.7.
  • PyTorch version 1.5.0
  • Cuda version 10.2
  • GPU Quadro P6000

Step 1 : Cleaning data from MIMIC III's NoteEvents.csv (data cleaning)

1.1 Move to data_cleaning folder.

1.2 Run noteEvents_preproc.py (with MIMIC III's NOTEEVENTS.csv as input) - it takes about 4 hours to finish, and generates a preprocessed text (output.csv).

1.3 Run MIMIC_smart_splitter.py (with output.csv as input) : splits the preprocessed text into files of 50 Mb without cutting any note - it should take about 1 hour.

1.4 At this step, we have a new folder called "data" which contains two folders. The first one (chunkssmall) contains all files and the other one is empty.

Step 2 : CUI Recognizer with QuickUMLS (concept extraction)

2.1 Install QuickUMLS, see : https://github.com/Georgetown-IR-Lab/QuickUMLS - at the end, you should have a QuickUMLS folder, as follow :

Alt text

2.2 Put the "data" folder generated in step 1.3, and the installed "QuickUMLS" folder in concept_annotation folder.

2.3 Once you're in concept_annotation folder, run quickUMLS_getCUI.py (if your machine is able to run about 25-30 threads, this process takes between hours to 3 days to finish, depending on the chosen parameters).

Parameters are:

  • --t : Float which is QuickUMLS Threshold, should be between 0 and 1 (default --t=0.9).
  • --TUI : String which represents the TUI List filter, either "Alpha" or "Beta" (default --TUI=Beta).

Note: Both TUI lists are available in TUI_Lists.pdf file in the root of the repository.

2.4 Concatenate the multiple outputs to make one final file. For that, move to "data/outputchunkssmall" and run the 4th and last command mentioned in : useful_commands.txt

2.5 Run quickumls_processing.py with the concatenated output as input (output of the previous step).

A new file is generated, the data is ready for Deep Learning !

Step 3 : Deep Learning (PyTorch scripts)

Step 3.1 : Data preparation

1.1 Put the data file in "PyTorch_scripts/(any_target_task)/".

1.2 Run 01_data_preparation.py (or 01_data_prep_mortality.py / 01_data_prep_readmission.py depending on the chosen task).

Parameters are:

  • --admissions_file : path to the MIMIC III's ADMISSIONS.csv file.
  • --diagnoses_file : path to the MIMIC III's DIAGNOSES_ICD.csv file.
  • --notes_file : path to the data file.

1.3 A npz file (two for mortality_prediction) is generated, your data is ready for training!

Step 3.2A : Diagnoses prediction

Option 1 - FFN: Run 02_FFN_diagprediction.py (K-Fold Crossvalidation)

Optional arguments are:

  • --withCCS : add --withCCS=1 if you want to add CCS feature in the input (CCS one-hot concatenated with CUI one-hot)
  • --hiddenDimSize : size of the hidden layer
  • --batchSize : size of batches
  • --nEpochs : number of epochs
  • --lr : learning rate
  • --dropOut : drop-out probability in the last layer

Option 2 - RNN (NOT k-fold crossvalidation script because too long and heavy): train by running 02_GRU_train_GPU.py for GRU (or 02_LSTM_train_GPU.py for LSTM)

Then, test by running 03_GRU_test.py for GRU (or 03_LSTM_test.py for LSTM).

Optional arguments for both RNN models are:

  • --withCCS : add --withCCS=1 if you want to add CCS feature in the input (CCS one-hot concatenated with CUI one-hot) [only training script]
  • --hiddenDimSize : size of the hidden layer [both training script and testing script]
  • --batchSize : size of batches [both training script and testing script]
  • --nEpochs : number of epochs [only training script]
  • --lr : learning rate [only training script]
  • --dropOut : drop-out probability in the last layer [both training script and testing script]

Step 3.2B : Mortality prediction

In mortality_prediction folder, you can train and test a model (K-Fold Crossvalidation) whose architecture is :

Option 1) Fully-connected (02_FFN_mortality.py)

Option 2) RNN with Gated Recurrent Unit cells (02_GRU_mortality.py)

Optional arguments for both models are:

  • --withCCS : add --withCCS=1 if you want to add CCS feature in the input (CCS one-hot concatenated with CUI one-hot)
  • --hiddenDimSize : size of the hidden layer
  • --batchSize : size of batches
  • --nEpochs : number of epochs
  • --lr : learning rate
  • --dropOut : drop-out probability in the last layer

Step 3.2C : Readmission prediction

For readmission prediction, it is mainly the same method and arguments as mortality_prediction, but in readmission_prediction folder.

https://github.com/JamilProg/script_preproc_MIMIC/blob/master/README.md

patient_trajectory_prediction's People

Contributors

jamilprog avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.