
mimic_sepsis's Introduction

Sepsis Cohort from MIMIC III

This repo provides code for generating a sepsis cohort from the MIMIC-III dataset. Our main goal is to facilitate reproducibility of results in the literature.

This is a pure-Python implementation based on a corrected version (by the first contributor below) of the original Matlab repo accompanying "The AI Clinician" paper (Komorowski et al.):

https://github.com/matthieukomorowski/AI_Clinician

Core updates and modifications to the above repo include:

  • Pure-Python re-implementation;
  • Numerous bug fixes;
  • Descriptions added to the item IDs (essential for clarifying what is what);
  • A point-by-point check against the original code to ensure identical data generation before imputation;
  • Removal of the original imputation steps, which were not reproducible;
  • KNN imputation added to produce higher-quality data.
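The KNN imputation mentioned above can be sketched with scikit-learn's `KNNImputer`. This is an illustrative toy example, not the repo's exact configuration; the neighbor count and feature columns here are assumptions.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix of vitals with missing entries (NaN); columns are hypothetical features
X = np.array([
    [80.0, 120.0, np.nan],
    [82.0, np.nan, 37.1],
    [79.0, 118.0, 36.9],
    [81.0, 121.0, 37.0],
])

# Each missing value is filled in from the k nearest rows,
# with distances computed over the mutually observed features
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)

assert not np.isnan(X_imputed).any()  # no missing values remain
```

The actual dependency versions used by the repo are pinned in requirements.txt.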

LICENSE

Microsoft Open Source Code of Conduct


Contributing

This code has been developed as part of the RL4H initiative at MSR Montreal. Most of the core work was done by the contributors listed below.


This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Requirements

We recommend the Anaconda distribution for Python dependencies. Beyond the standard-library os and argparse modules, all other libraries used in this code are listed in requirements.txt.

How to use

1) MIMIC-III Database

You first need to set up and configure the MIMIC-III database. Details are provided here:

https://mimic.physionet.org/

The MIMIC database is publicly available; however, accessing MIMIC requires additional steps which are explained at the hosting webpage.

We chose to use a PostgreSQL server to manage the database (hence the psycopg2 library in requirements.txt). Other options and formats are available; see the MIMIC repository for examples and alternatives.

After downloading the data, setting up the SQL database, and performing all the steps from the PhysioNet link above, you should be able to use this codebase without much additional setup.
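For a PostgreSQL setup, the connection settings typically live in the preamble of the extraction scripts. A minimal sketch, assuming a local server; the database name, user, and port here are hypothetical and should be adjusted to your install:

```python
# Hypothetical connection settings for a local MIMIC-III PostgreSQL install;
# preprocess.py expects something equivalent in its preamble.
db_config = {
    "dbname": "mimic",
    "user": "postgres",
    "host": "localhost",
    "port": 5432,
}

# psycopg2 (see requirements.txt) would consume these settings, e.g.:
#   conn = psycopg2.connect(**db_config)
#   cur = conn.cursor()
#   cur.execute("SET search_path TO mimiciii;")  # schema used by the standard MIMIC build
dsn = " ".join(f"{k}={v}" for k, v in db_config.items())
print(dsn)
```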

2) Run preprocess.py

This script accesses the MIMIC database and extracts sub-tables for use in defining the final septic patient cohort in the next step.

There are 43 tables in the MIMIC-III database: 26 are unique and the other 17 are partitions of chartevents that should not be queried directly (see https://mit-lcp.github.io/mimic-schema-spy/ for further guidance).

Ultimately, we create 15 sub-tables when extracting from the database. These sub-tables are stored in a processed_files/ subfolder, which the script creates if it does not already exist.

The preamble of this script is likely the only part that needs editing: point it at your MIMIC database connection and choose where to save the intermediate files.

Depending on I/O speed and network connectivity (assuming the MIMIC database is hosted on a server), this script can take several hours to run to completion.
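The output layout can be sketched as follows; the file name and contents are hypothetical stand-ins, since the real sub-tables come from SQL queries against MIMIC:

```python
import os
import tempfile

import pandas as pd

# Hypothetical output location; preprocess.py writes its sub-tables somewhere like this
out_dir = os.path.join(tempfile.mkdtemp(), "processed_files")
os.makedirs(out_dir, exist_ok=True)  # create the subfolder if it doesn't already exist

# Stand-in for one extracted sub-table
demog = pd.DataFrame({"icustay_id": [200001, 200002], "age": [64.2, 71.8]})
demog.to_csv(os.path.join(out_dir, "demog.csv"), index=False)

print(sorted(os.listdir(out_dir)))  # ['demog.csv']
```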

3) Run sepsis_cohort.py

Using the Sepsis-3 criteria, this script takes the preprocessed intermediate tables produced in the prior step and defines a cohort of septic patients. This cohort definition was specifically designed for sequential decision-making purposes, yet the code does not partition temporally spaced observations into individual data points. Instead, it populates a table of patients who develop sepsis at some point during their ICU stay, including all observations from 24 hours before until 48 hours after the presumed onset of sepsis. Further preprocessing is required to represent this data in MDP format; an example of this can be found at https://github.com/MLforHealth/rl_representations/.

External files required: Reflabs.tsv, Refvitals.tsv, sample_and_hold.csv (all saved in the ReferenceFiles/ sub-folder)

The final cohort table is saved in .csv format at a user-specified location, with z-normalized columns. The user can also choose to save an unnormalized copy of the same table.
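Column-wise z-normalization amounts to subtracting each column's mean and dividing by its standard deviation. A toy sketch with hypothetical column names (the real columns are the extracted vitals and labs):

```python
import pandas as pd

# Toy cohort table; column names are illustrative
cohort = pd.DataFrame({"HR": [80.0, 95.0, 110.0], "SysBP": [120.0, 100.0, 140.0]})

# z-normalize each column: zero mean, unit standard deviation
normalized = (cohort - cohort.mean()) / cohort.std()

assert (normalized.mean().abs() < 1e-9).all()
```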

Note: The size of the cohort will depend on which version of MIMIC-III is used. The original cohort from the 2018 Nature Medicine publication was built using MIMIC-III v1.3.

Again, depending on system characteristics, this script may take 2-3 hours to run to completion.

mimic_sepsis's People

Contributors

dependabot[bot], fatemi, hyoshioka0128, microsoftopensource, twkillian


mimic_sepsis's Issues

input_4hourly and max_dose_vaso: should they be part of the next step's observation?

I am just seeking clarification.

I'm looking at sepsis_final_data_withTimes.csv and sepsis_cohort.py.

If I understand correctly, action is derived from input_4hourly and max_dose_vaso. We can say the action decision causes input_4hourly and max_dose_vaso, not the other way round.

Therefore input_4hourly and max_dose_vaso must not be considered part of the observation for the current step, but perhaps for the next step.

Is this correct?

In that case I may want to shift these two columns one step forward in time, to allow them to be used in observations.
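The shift described above could be done per trajectory in pandas. A hedged sketch: the column names follow the question, but grouping by `icustay_id` is an assumption about the table layout.

```python
import pandas as pd

# Toy trajectory data; two trajectories keyed by icustay_id (assumed grouping column)
df = pd.DataFrame({
    "icustay_id": [1, 1, 1, 2, 2],
    "input_4hourly": [0.0, 2.0, 1.0, 0.5, 0.0],
    "max_dose_vaso": [0.0, 0.1, 0.2, 0.0, 0.05],
})

# Shift the action columns one step forward within each trajectory, so the
# observation at step t carries the action taken at step t-1; the first
# step of each trajectory becomes NaN
action_cols = ["input_4hourly", "max_dose_vaso"]
df[action_cols] = df.groupby("icustay_id")[action_cols].shift(1)

print(df)
```

How (or whether) to fill the resulting NaNs at the first step of each trajectory is a modeling choice.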

sepsis_cohort.py: KeyError: 'MechVent'

$ make create-user mimic datadir=data_mimic/mimic-iii-clinical-database-1.4/
----------------
-- Check data --
----------------
# ... checks succeeded

$ python preprocess.py
0
10000
20000
30000
40000
50000
60000
70000
80000
90000


$ python sepsis_cohort.py
Using TensorFlow backend.
Loading processed files created from database using "preprocess.py"
Filling-in missing ICUSTAY IDs in bacterio
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:31:17
Filling-in missing ICUSTAY IDs in bacterio - 2
0% [###########################   ] 100% | ETA: 00:00:23
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:03:43
Filling-in missing ICUSTAY IDs in ABx
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:26
Full ICU -- Finding presumed onset of infection according to sepsis3 guidelines
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:02:08
Full ICU -- Number of preliminary, presumed septic trajectories:  25367
Full ICU -- Replacing item_ids with column numbers from reference tables
 Full ICU --  Making an array with all unique charttime (1 per row) and all items in columns.
Traceback (most recent call last):
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'MechVent'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sepsis_cohort.py", line 400, in <module>
    col = temp3.loc[ii, 'MechVent']
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/indexing.py", line 1762, in __getitem__
    return self._getitem_tuple(key)
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/indexing.py", line 1272, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/indexing.py", line 1389, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/indexing.py", line 1965, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/indexing.py", line 625, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/generic.py", line 3529, in xs
    return self[key]
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/robin/miniconda3/envs/tfgpu1/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'MechVent'

ubuntu 18.04
Python 3.6.7
Dataset: mimic-iii-clinical-database-1.4

Also, thanks for sharing this awesome resource!
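The traceback above fails at `temp3.loc[ii, 'MechVent']` because the expected column is missing from the intermediate table. A hedged defensive sketch (not the repo's actual code; `require_columns` is a hypothetical helper) that would surface the problem earlier with a clearer message:

```python
import pandas as pd

def require_columns(df, cols):
    """Fail fast with an explicit message if expected columns are missing."""
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}; "
                       f"check that preprocess.py completed successfully.")

# Stand-in for the intermediate table built by sepsis_cohort.py
temp3 = pd.DataFrame({"MechVent": [0, 1], "HR": [80, 95]})
require_columns(temp3, ["MechVent", "HR"])  # passes silently when columns exist
```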

RuntimeError numpy

Hello,

When running the "sepsis_cohort.py", I got:

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

even though I installed the same versions of the packages in requirements.txt. Do you have any idea what the error was about?

Thank you.
