clinicalxlnet's Introduction

Clinical XLNet

This repo hosts the pretrained and finetuned weights for Clinical XLNet, along with the relevant scripts.

Requirements

torch
argparse
copy
tqdm
matplotlib
numpy
pandas
time
sklearn

Pretrained Clinical XLNet Weights

Two pretrained Clinical XLNet checkpoints are available for download: one pretrained only on nursing notes and one pretrained on discharge summaries.

PMV and Mortality Prediction using Clinical XLNet

Sample scripts for running prediction are listed below. You can also modify the label to run your own downstream prediction task. Finetuned weights are provided for both the PMV task and the mortality task.

Using Finetuned weights for Mortality or PMV Prediction

python train.py \
  --data_dir DATA_FILE \
  --config_path CONFIG \
  --model_path MORTALITY/PMV_MODEL_PATH \
  --save_meta_finetune_path SAVE_PATH \
  --prediction_label Mortality/PMV \
  --Batch_Size_Meta 4 \
  --Learning_Rate_Meta 1e-5 \
  --Training_Epoch_Meta 4 \
  --Batch_Size_Finetune 128 \
  --Learning_Rate_Finetune 2e-5 \
  --Training_Epoch_Finetune 30 \
  --saving_notes_embed_batch_size 32 \
  --skip_meta_finetuned 

Training your own mortality or PMV prediction model from pretrained Clinical XLNet

python train.py \
  --data_dir DATA_FILE \
  --config_path CONFIG \
  --model_path PRETRAIN_MODEL_PATH \
  --save_meta_finetune_path SAVE_PATH \
  --prediction_label Mortality/PMV \
  --Batch_Size_Meta 4 \
  --Learning_Rate_Meta 1e-5 \
  --Training_Epoch_Meta 4 \
  --Batch_Size_Finetune 128 \
  --Learning_Rate_Finetune 2e-5 \
  --Training_Epoch_Finetune 30 \
  --saving_notes_embed_batch_size 32 

The script reads train.csv, val.csv, and test.csv from the DATA_FILE folder.

The AUROC and AUPRC results are printed when the run finishes.
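
For readers unfamiliar with the two metrics, here is a minimal sketch of how AUROC and AUPRC can be computed with sklearn, which is already listed in the requirements. The labels and scores are toy values rather than model output, and `average_precision_score` is only one common estimator of AUPRC; it is an assumption that train.py computes the metric the same way.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

# Toy binary labels and predicted probabilities (illustrative only,
# not results from Clinical XLNet).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUROC: probability that a random positive is scored above a random negative.
auroc = roc_auc_score(y_true, y_score)

# AUPRC, estimated here as average precision (an assumption about the estimator).
auprc = average_precision_score(y_true, y_score)

print(f"AUROC: {auroc:.4f}  AUPRC: {auprc:.4f}")
```

AUPRC is usually the more informative of the two when the positive class is rare, because it is not affected by the number of true negatives.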

Datasets

We use MIMIC-III. Please complete the CITI training program in order to use it. To use your own notes dataset, further pretraining is recommended.

File system expected:

-data
   -train.csv
   -val.csv
   -test.csv
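
A quick way to confirm this layout before launching a run is sketched below. The helper name `missing_split_files` is hypothetical and not part of this repo; it checks only that the three files exist and makes no assumption about their columns.

```python
from pathlib import Path

# The three split files the training script expects in the data folder.
EXPECTED_SPLITS = ("train.csv", "val.csv", "test.csv")


def missing_split_files(data_dir):
    """Return the names of expected split files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SPLITS if not (root / name).is_file()]


# Example: missing_split_files("data") returns [] when all three files exist.
```

Pass the same folder to this check that you pass to `--data_dir`.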

Pretraining your own Clinical XLNet

We provide a notebook tutorial to pretrain your own Clinical XLNet.

Preprocessing and cohort curation

We provide a notebook for preprocessing clinical notes and curating the PMV cohort on MIMIC-III. Cohort curation has two parts: an R script generates the general mechanical ventilation cohort, and this notebook generates the specific cohort. See the paper for the detailed cohort curation process.

Contact

Please contact [email protected] for help or submit an issue.

Citation

Please cite arxiv:

@article{clinicalxlnet,
author = {Kexin Huang and Abhishek Singh and Sitong Chen and Edward Moseley and Chin-ying Deng and Naomi George and Charlotta Lindvall},
title = {Clinical XLNet: Modeling Sequential Clinical Notes and Predicting Prolonged Mechanical Ventilation},
year = {2019},
journal = {arXiv:1912.11975},
}

clinicalxlnet's People

Contributors

akwok-dfci, kexinhuang12345

clinicalxlnet's Issues

None of the Links are working

Hi,

None of the links for the pretrained model weights or for the finetuned weights for the PMV and mortality tasks are working.

Could you please update the links?

Thanks

Pooja Goyal

Missing scripts to create `icustay_detail.csv` and `ventdurations`

Hi,

Thanks a lot for this repository and for making everything public. I am trying to reproduce the dataset you created to run some experiments. I have tried running the R code in ./cohort_curation but it requires some user-derived csv's.

Would it be possible to upload the scripts that created these files?

Many thanks,

Felix

Could you also share the script about running other models?

Hi, I have read your paper. It's brilliant work, and I appreciate the scripts and models you open-sourced. I'd like to know more about how you generated the document embeddings when using other models like clinical-BERT: did you treat a whole note as a "sentence" and cut the words that exceed the maximum token number? Thank you.
