curate-mimic - A repository demonstrating how to curate the MIMIC III database with Natural Language Processing
-
Preliminary steps:
- Obtain access to the MIMIC III database (ask your PI).
- Obtain a UMLS account and API key.
- Install Docker and docker-compose.
-
Setup cTAKES containers:
git clone git@github.com:Machine-Learning-for-Medical-Language/ctakes-rest-package.git
cd ctakes-rest-package
export umls_api_key=<api key from above>
docker-compose up -d --scale ctakes=N
# This starts N containers -- each requires around 4 GB RAM.
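As a rough aid for choosing N, the sketch below divides the host's memory by the roughly 4 GB per container mentioned above. The function name `max_ctakes_containers` and the 2 GB reserve for the rest of the system are illustrative assumptions, not part of this repository.

```python
def max_ctakes_containers(total_ram_gb: float,
                          per_container_gb: float = 4.0,
                          reserved_gb: float = 2.0) -> int:
    """Estimate how many cTAKES containers fit in memory.

    per_container_gb follows the ~4 GB figure from this README;
    reserved_gb is an assumed allowance for the OS and other processes.
    """
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb < per_container_gb:
        # Not enough memory for even one container.
        return 0
    # Floor division: only whole containers count.
    return int(usable_gb // per_container_gb)
```

For example, `max_ctakes_containers(32)` gives a value that can be passed as N to `--scale ctakes=N`. Treat the result as an upper bound and lower it if the machine runs other memory-heavy services.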
-
Run the Python script to process the data (run it with -h for detailed documentation of the options):
python process_mimic.py --input-path <path to NOTEEVENTS.csv file> --output-format <json|mongo|xmi|fhir> --output_dir <directory to write files if output format is file-based>
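To check the input file before a full run, a minimal sketch like the following can iterate over the notes. It is not the logic of process_mimic.py. It assumes the standard MIMIC III NOTEEVENTS.csv header with `ROW_ID` and `TEXT` columns, and the helper name `iter_notes` is hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_notes(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (ROW_ID, TEXT) pairs from an open NOTEEVENTS-style CSV.

    Assumes a header row containing ROW_ID and TEXT columns.
    """
    # Note text can be long; raise the per-field limit above the csv default.
    csv.field_size_limit(2**31 - 1)
    reader = csv.DictReader(handle)
    for row in reader:
        yield row["ROW_ID"], row["TEXT"]


# Tiny in-memory sample with a multi-line note, mimicking the assumed layout.
sample = (
    'ROW_ID,SUBJECT_ID,CATEGORY,TEXT\n'
    '1,10,"Discharge summary","Line one\nLine two"\n'
    '2,11,"Nursing","Stable."\n'
)
notes = list(iter_notes(io.StringIO(sample)))
```

With a real file, open it as `open(path, newline="")` so the csv module handles the line breaks embedded in quoted note text.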
-
For MongoDB usage, set up a few indices for faster querying: a. TODO