clinical-fusion

Combining structured and unstructured data for predictive models: a deep learning approach

This repository contains the source code for the paper Combining structured and unstructured data for predictive models: a deep learning approach. In the paper, we propose two frameworks, Fusion-CNN and Fusion-LSTM, that combine sequential clinical notes and temporal signals for patient outcome prediction. Experiments on in-hospital mortality prediction, long length of stay prediction, and 30-day readmission prediction on the MIMIC-III dataset empirically show the effectiveness of the proposed models. Combining structured and unstructured data leads to a significant performance improvement.

Framework

Fusion-CNN

Fusion-CNN is based on document embeddings, convolutional layers, and max-pooling layers. The final patient representation is the concatenation of the latent representations of the sequential clinical notes and the temporal signals with the static information vector. This representation is then passed to output layers to make predictions.
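The final fusion step can be illustrated with a minimal pure-Python sketch. It is not the repository's PyTorch model: the document embeddings and convolutional layers are left out, and the toy matrices below simply stand in for their outputs. The function names and all numbers are made up for illustration.

```python
from typing import List, Sequence


def max_pool_over_time(steps: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum across time steps, giving one value per feature."""
    if not steps:
        raise ValueError("need at least one time step")
    return [max(column) for column in zip(*steps)]


def fuse(note_repr: Sequence[float],
         signal_repr: Sequence[float],
         static_vec: Sequence[float]) -> List[float]:
    """Concatenate the three parts into one patient representation."""
    return list(note_repr) + list(signal_repr) + list(static_vec)


# Toy stand-ins for the per-step feature maps (hypothetical values).
note_features = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]]     # 3 notes x 2 features
signal_features = [[1.0, -1.0, 0.0], [0.5, 2.0, -0.5]]   # 2 time steps x 3 features
static_vec = [1.0, 0.0]                                  # static information

patient_repr = fuse(
    max_pool_over_time(note_features),
    max_pool_over_time(signal_features),
    static_vec,
)
print(patient_repr)  # [0.4, 0.9, 1.0, 2.0, 0.0, 1.0, 0.0]
```

Max-pooling over time collapses a variable number of notes or time steps into a fixed-length vector, which is what makes the concatenation well defined for every patient.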

Fusion-LSTM

Fusion-LSTM is based on document embeddings, LSTM layers, and max-pooling layers. The final patient representation is the concatenation of the latent representations of the sequential clinical notes and the temporal signals with the static information vector. This representation is then passed to output layers to make predictions.

Requirements

Dataset

The MIMIC-III database analyzed in the study is available from the PhysioNet repository. Here are some steps to prepare the dataset:

Software

  • Python: 3.6.10
  • Gensim: 3.8.0
  • NLTK: 3.4.5
  • Numpy: 1.14.2
  • Pandas: 0.25.3
  • Scikit-learn: 0.20.1
  • Tqdm: 4.42.1
  • PyTorch: 1.4.0
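
One way to check an environment against the versions listed above is sketched below. The helper `find_mismatches` is hypothetical and not part of the repository. The keys are the usual pip distribution names for these libraries, and the `installed` mapping is assumed to be filled in by the reader, for example from `pip freeze` output.

```python
from typing import Dict, List, Tuple

# Versions copied from the list above, keyed by pip distribution name.
PINNED: Dict[str, str] = {
    "gensim": "3.8.0",
    "nltk": "3.4.5",
    "numpy": "1.14.2",
    "pandas": "0.25.3",
    "scikit-learn": "0.20.1",
    "tqdm": "4.42.1",
    "torch": "1.4.0",
}


def find_mismatches(installed: Dict[str, str],
                    pinned: Dict[str, str] = PINNED) -> List[Tuple[str, str, str]]:
    """Return (package, wanted, found) for every package that differs or is absent."""
    mismatches = []
    for name, wanted in sorted(pinned.items()):
        found = installed.get(name, "missing")
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches


# Hypothetical environment: numpy is newer than listed and torch is not installed.
example_env = dict(PINNED, numpy="1.16.0")
del example_env["torch"]
for name, wanted, found in find_mismatches(example_env):
    print(f"{name}: listed {wanted}, found {found}")
```

The comparison is an exact string match, so a local build suffix in an installed version string would be reported as a mismatch.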

Preprocessing

$ python 00_define_cohort.py # define patient cohort and collect labels
$ python 01_get_signals.py # extract temporal signals (vital signs and laboratory tests)
$ python 02_extract_notes.py --firstday # extract first day clinical notes
$ python 03_merge_ids.py # merge admission IDs
$ python 04_statistics.py # run statistics
$ python 05_preprocess.py # run preprocessing
$ python 06_doc2vec.py --phase train # train doc2vec model
$ python 06_doc2vec.py --phase infer # infer doc2vec vectors
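
The steps above run in order, and later steps presumably depend on the output of earlier ones. A small driver that stops at the first failure could look like the sketch below. The script names and flags are taken from the commands above. The driver itself (`build_commands`, `run_all`) is hypothetical and assumes it is started from the directory that contains the scripts.

```python
import subprocess
from typing import List

# Script names and arguments exactly as listed in the preprocessing commands.
PREPROCESSING_STEPS: List[List[str]] = [
    ["00_define_cohort.py"],
    ["01_get_signals.py"],
    ["02_extract_notes.py", "--firstday"],
    ["03_merge_ids.py"],
    ["04_statistics.py"],
    ["05_preprocess.py"],
    ["06_doc2vec.py", "--phase", "train"],
    ["06_doc2vec.py", "--phase", "infer"],
]


def build_commands(python: str = "python") -> List[List[str]]:
    """Prefix every step with the interpreter to form full argument lists."""
    return [[python] + step for step in PREPROCESSING_STEPS]


def run_all(python: str = "python") -> None:
    """Run each step in order; check=True raises CalledProcessError on failure."""
    for command in build_commands(python):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_all()` executes the eight commands in sequence, while `build_commands()` alone can be used to inspect them without running anything.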

Run

Baselines

Baselines (i.e., logistic regression and random forest) are implemented using scikit-learn. To run:

$ python baselines.py --model [model] --task [task] --inputs [inputs]
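
To run the baselines over several combinations, the command above can be expanded into a grid, as in the following sketch. The accepted values for `--model`, `--task`, and `--inputs` are not listed here, so the strings in the example are placeholders and not real option values. `baseline_commands` is a hypothetical helper.

```python
import itertools
from typing import Iterable, List


def baseline_commands(models: Iterable[str],
                      tasks: Iterable[str],
                      inputs: Iterable[str]) -> List[List[str]]:
    """Build one baselines.py argument list per (model, task, inputs) combination."""
    return [
        ["python", "baselines.py", "--model", m, "--task", t, "--inputs", i]
        for m, t, i in itertools.product(models, tasks, inputs)
    ]


# Placeholder values only; substitute the options that baselines.py accepts.
grid = baseline_commands(["MODEL_A", "MODEL_B"], ["TASK_A"], ["INPUTS_A", "INPUTS_B"])
for command in grid:
    print(" ".join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` to execute it.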

Deep models

Fusion-CNN and Fusion-LSTM are implemented using PyTorch. To run:

$ python main.py --model [model] --task [task] --inputs [inputs] # train Fusion-CNN or Fusion-LSTM
$ python main.py --model [model] --task [task] --inputs [inputs] --phase test --resume # evaluate
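
Training and evaluation use the same three options, and evaluation adds only `--phase test --resume`. The sketch below builds both commands from one set of values so they cannot drift apart. `train_and_eval_commands` is a hypothetical helper, and the example values are placeholders because the accepted options are not listed here.

```python
from typing import List, Tuple


def train_and_eval_commands(model: str, task: str, inputs: str) -> Tuple[List[str], List[str]]:
    """Return the training command and the matching evaluation command."""
    base = ["python", "main.py", "--model", model, "--task", task, "--inputs", inputs]
    train = list(base)
    evaluate = base + ["--phase", "test", "--resume"]
    return train, evaluate


# Placeholder values only; substitute the options that main.py accepts.
train_cmd, eval_cmd = train_and_eval_commands("MODEL_A", "TASK_A", "INPUTS_A")
print(" ".join(train_cmd))
print(" ".join(eval_cmd))
```

Running the two commands in this order with `subprocess.run(..., check=True)` trains a model and then evaluates it.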

Contributors

onlyzdd, bravezdd, vasudev-sharma
