covid19-forecasting

Simulation software for forecasting demand at various stages of hospitalization

PI: Michael C. Hughes

Jump to: Usage - Modeling - Installation

Updates

As of April 2021, we've moved a few polished efforts to their own stand-alone repositories:

Our mechanistic hospitalized patient trajectory model : ACED-HMM

Code is here: https://github.com/tufts-ml/aced-hmm-hospitalized-patient-trajectory-model

See our paper:

Gian Marco Visani, Alexandra Hope Lee, Cuong Nguyen, David M. Kent, John B. Wong, Joshua T. Cohen, and Michael C. Hughes. Approximate Bayesian Computation for an Explicit-Duration Hidden Markov Model of COVID-19 Hospital Trajectories. Technical Report, 2021. https://www.michaelchughes.com/papers/VisaniEtAl_arXiv_2021.pdf

Our latent variable models for single-site future counts

Code is here: https://github.com/tufts-ml/single-hospital-count-forecasting/

See our paper:

Alexandra Hope Lee, Panagiotis Lymperopoulos, Joshua T. Cohen, John B. Wong, and Michael C. Hughes. Forecasting COVID-19 counts at a single hospital: A Hierarchical Bayesian approach. In ICLR 2021 Workshop on Machine Learning for Preventing and Combating Pandemics, 2021. PDF URL: https://www.michaelchughes.com/papers/LeeEtAl_ICLRWorkshopMLPreventingCombatingPandemics_2021.pdf

We will continue to use this repo for a few earlier-stage efforts.

Usage

Getting Started

Here's a very simple example that will run our probabilistic progression model (with dummy initial conditions and dummy parameters) to forecast ahead for 120 days. (Requires that you have already installed this project's conda environment; see Installation.)

$ conda activate semimarkov_forecaster
$ python run_forecast.py \
    --config_file workflows/example_simple/params.json \
    --output_file /tmp/results-101.csv \
    --random_seed 101

Expected output:

----------------------------------------
Loaded SemiMarkovModel from config_file:
----------------------------------------
State #0 Presenting
    prob. 0.100 recover
    prob. 0.900 advance to state InGeneralWard
State #1 InGeneralWard
    prob. 0.100 recover
    prob. 0.900 advance to state OffVentInICU
State #2 OffVentInICU
    prob. 0.100 recover
    prob. 0.900 advance to state OnVentInICU
State #3 OnVentInICU
    prob. 0.100 recover
    prob. 0.900 advance to state TERMINAL
random_seed=101 <<<
----------------------------------------
Simulating for 120 timesteps with seed 101
----------------------------------------
100%|██████████████████████████████████████████████████████████████████████████████| 120/120 [00:00<00:00, 148.03it/s]
----------------------------------------
Writing results to /tmp/results-101.csv
----------------------------------------

This will write a CSV file to /tmp/results-101.csv, with a column for each census count and a row for each day.

See an example output in example_output/

Using Snakemake workflows for reproducibility

Run the following, which will install all necessary python packages in a separate environment, and then run a single simple simulation with results saved to file results.csv

$ cd /path/to/covid19-forecasting/workflows/simple_example
$ snakemake --use-conda --cores 1 run_simple_example_simulation

Using Snakemake workflows on the Tufts HPC cluster

If you are in the hugheslab group and have access to the HPC cluster, you can run the same workflow there.

PREREQUISITE bashrc settings:

export PATH="/cluster/tufts/hugheslab/miniconda2/bin:$PATH"

Then login to the HPC system and do:

$ conda activate semimarkov_forecaster
$ pushd /cluster/tufts/hugheslab/code/covid19-forecasting/workflows/simple_example/
$ snakemake --cores 1 run_simple_example_simulation # Do NOT use '--use-conda' here, you already have the environment

Installation

1. Install Anaconda

Follow the instructions here: https://conda.io/projects/conda/en/latest/user-guide/install/index.html

2. Install Snakemake

$ conda install -c bioconda -c conda-forge snakemake-minimal

Having trouble? See the full install instructions: https://snakemake.readthedocs.io/en/stable/getting_started/installation.html

3. Install semimarkov_forecaster conda environment

Use the project's included YAML file to specify all packages needed: semimarkov_forecaster.yml

conda env create -f semimarkov_forecaster.yml

Modeling

We have developed a probabilistic "semi-Markov" model to simulate individual patient trajectories through the major stages or levels of care within the hospital (present with symptoms, general ward, ICU, ICU with mechanical ventilation). When entering a stage, the patient first draws a new health status (recovering or declining), and then based on this status samples a “dwell time” duration (number of days to remain at current care stage) from a status-specific distribution. After the dwell time expires, recovering patients improve and leave the model, while declining patients progress to the next stage.

At each timestep, a patient can be described by:

  • a binary health state ('Recovering' or 'Declining')
  • an ordinal location state (e.g. 'Presenting', 'InGeneralWard', 'OffVentInICU', 'OnVentInICU')
  • the time left before transition to the next location state

Every parameter governing these distributions can be specified by the user, and all are readily estimated from local data or the literature (e.g. by counting the fraction of ventilator patients who recover).

We take an initial population, and run the model forward for a desired number of days.

By reading parameters in from a plain text file, the model makes its assumptions easy to communicate and invites modifications.
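The trajectory logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's PatientTrajectory.py: the dwell-time distributions, the `simulate_patient` and `census_counts` helpers, and the parameter layout are all assumptions made for the example. Only the state names and the 0.1 recovery probability come from the example output shown in Getting Started.

```python
import random

# State names taken from the example output; everything else is hypothetical.
STATES = ["Presenting", "InGeneralWard", "OffVentInICU", "OnVentInICU"]
PROBA_RECOVERING = {s: 0.1 for s in STATES}  # dummy value, as in the example run
# Assumed dwell-time pmfs (days -> probability), one per health status.
DWELL_PMF = {
    "Recovering": {1: 0.5, 2: 0.3, 3: 0.2},
    "Declining": {1: 0.2, 2: 0.3, 3: 0.5},
}


def simulate_patient(rng, start_state=0):
    """Return the list of (location, health status, dwell days) one patient visits."""
    path = []
    state = start_state
    while state < len(STATES):
        name = STATES[state]
        # Step 1: draw a health status on entering the stage.
        status = "Recovering" if rng.random() < PROBA_RECOVERING[name] else "Declining"
        # Step 2: draw a dwell time from the status-specific distribution.
        pmf = DWELL_PMF[status]
        dwell = rng.choices(list(pmf.keys()), weights=list(pmf.values()), k=1)[0]
        path.append((name, status, dwell))
        if status == "Recovering":
            break  # recovering patients leave the model once the dwell time expires
        state += 1  # declining patients progress (past the last stage means TERMINAL)
    return path


def census_counts(paths, n_days):
    """Count how many patients occupy each stage on each day (all admitted on day 0)."""
    counts = [{s: 0 for s in STATES} for _ in range(n_days)]
    for path in paths:
        day = 0
        for name, _status, dwell in path:
            for d in range(day, min(day + dwell, n_days)):
                counts[d][name] += 1
            day += dwell
    return counts


rng = random.Random(101)
paths = [simulate_patient(rng) for _ in range(100)]
counts = census_counts(paths, n_days=10)
print(counts[0])
```

Here every patient is admitted on day 0 for simplicity; a real run would also need an initial population spread across stages and new arrivals over time.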

covid19-forecasting's People

Contributors

gvisani, michaelchughes, panlybero

covid19-forecasting's Issues

IHME and curve-fitting for admissions

  • Adding predictions made with IHME forecasts for Massachusetts, accounting for TMC's 3% market share. (done)
  • Look further into curve-fitting for admissions data. Create a working framework with simple model.

Towards an ABC pipeline for parameter learning

We want to add the ability to learn parameters.

  1. Prototyping and debugging
  • Create SimplePatientTrajectory.py, a simpler version of PatientTrajectory.py (use same probabilities no matter if recovering or declining, etc)
  • Simulate a toy dataset with known "ground truth" parameters and known initial conditions
  • Record census counts for this simulation as "data"
  • Select a plausible set of priors for each progression probability (just pick vague ones)
  • See if we can use ABC MCMC to estimate posteriors over true parameters (testing over the amount of recorded data: 5 days worth, 25 days worth, 100 days worth, etc)
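The prototyping steps above can be tried end to end on a toy problem. The sketch below uses plain ABC rejection sampling, which is simpler than the ABC MCMC mentioned in the list, and a deliberately tiny simulator with a single recovery probability. The simulator, the summary statistic (mean daily count), the tolerance `eps`, and all function names are assumptions for illustration.

```python
import random


def simulate_recoveries(p_recover, n_days, n_patients_per_day, rng):
    """Toy simulator: each day, each patient recovers with probability p_recover."""
    return [
        sum(rng.random() < p_recover for _ in range(n_patients_per_day))
        for _ in range(n_days)
    ]


def mean(xs):
    return sum(xs) / len(xs)


def abc_rejection(observed, n_patients_per_day, n_draws, eps, rng):
    """Keep prior draws whose simulated summary statistic lands within eps of the data."""
    obs_stat = mean(observed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # vague Uniform(0, 1) prior on the recovery probability
        sim = simulate_recoveries(p, len(observed), n_patients_per_day, rng)
        if abs(mean(sim) - obs_stat) < eps:
            accepted.append(p)
    return accepted


rng = random.Random(0)
TRUE_P = 0.3  # known "ground truth" parameter for the toy dataset
observed = simulate_recoveries(TRUE_P, n_days=25, n_patients_per_day=50, rng=rng)
posterior = abc_rejection(observed, 50, n_draws=2000, eps=1.0, rng=rng)
print(len(posterior), mean(posterior))
```

Changing `n_days` to 5 or 100 is one way to see how the spread of accepted samples shrinks as more days of data are recorded.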

County data

Look at NYT dataset for county level data.
Interested in:

  • CHIME parameters
  • Data to evaluate our models (hospitalizations, infections, deaths, ICU counts, etc.)

Visualization Tasks from 4/7 Meeting

  • Update aesthetics (pretty-print strings, update dimensions of plots, layout)
  • Add checkboxes to Dash app to show/hide plots based on name
  • Add series for actual data to compare with predictions/forecast
  • Add column for actual date information rather than generic time step
  • Allow for ability to update visualization based upon parameter change (proof of concept: recovery probability)

TODOs for updating the model

Personalizing risk probabilities

@rathp Want to update PatientTrajectory.py to include "age" or other covariates.

Look over research literature, see if you can find age-specific or condition-specific estimates of:

  • who is presenting at hospital with symptoms
  • who is progressing to each stage
  • how long they are staying at each stage (e.g. is recovery longer for older patients?)

Improving the model

@hzhzhzhzhzhzhz Want to make the Markov model more realistic.

Current: GeneralWard + decline > ICU + decline > Vent + recover > LEAVE MODEL

Ideal: General + decline > ICU + decline > Vent + recover > ICU + recover > General + recover

First steps:

What (if anything) would we change in the Markov model spec here:
https://github.com/tufts-ml/covid19-forecasting/blob/master/params_simple_example.json
or the code in PatientTrajectory.py

Integrate pipeline with SLURM cluster on Tufts HPC

Todos:

  • Write script that takes in a folder full of results-{random_seed}.csv files, and produces a summary of percentiles
    • Percentiles: 1, 2.5, 5, 10, 50, 90, 95, 97.5, 99
  • Snakemake for SLURM
  • Run a 50 trial test
  • Run a 1000 trial
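A minimal sketch of the percentile summary script, using only the standard library. It assumes every column in each results-{random_seed}.csv file is a numeric census count and that row i in every file refers to the same timestep. The function names, the linear-interpolation percentile rule, and the `{column}_{percentile}` naming of output columns are assumptions for this example.

```python
import csv
import glob
import os

PERCENTILES = [1, 2.5, 5, 10, 50, 90, 95, 97.5, 99]


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a pre-sorted list, with q in [0, 100]."""
    if len(sorted_vals) == 1:
        return float(sorted_vals[0])
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def load_trials(results_dir, pattern="results-*.csv"):
    """Read every matching CSV into a list of rows (one list of dicts per trial)."""
    trials = []
    for path in sorted(glob.glob(os.path.join(results_dir, pattern))):
        with open(path, newline="") as f:
            trials.append(list(csv.DictReader(f)))
    return trials


def summarize_trials(trials):
    """Return (header, rows): one row per timestep, one column per count/percentile pair."""
    if not trials:
        raise ValueError("no trials to summarize")
    columns = list(trials[0][0].keys())
    n_steps = min(len(t) for t in trials)
    header = [f"{c}_{q:g}" for c in columns for q in PERCENTILES]
    rows = []
    for t in range(n_steps):
        row = []
        for c in columns:
            vals = sorted(float(trial[t][c]) for trial in trials)
            row.extend(percentile(vals, q) for q in PERCENTILES)
        rows.append(row)
    return header, rows


# Tiny in-memory demo: 3 trials, 2 timesteps, one census column.
demo_trials = [
    [{"n_InICU": "1"}, {"n_InICU": "10"}],
    [{"n_InICU": "2"}, {"n_InICU": "20"}],
    [{"n_InICU": "3"}, {"n_InICU": "30"}],
]
header, rows = summarize_trials(demo_trials)
print(header[4], rows[0][4], rows[1][4])
```

For a real folder of results, `summarize_trials(load_trials("/path/to/results"))` would produce the same structure, which can then be written out with `csv.writer`.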

Visualization TODOs from 4/10 Meeting

Gabe

  • Add ability in Dash app to switch between scenarios

Diana

  • Add ability for html plots to print out different scenarios

@amosca01

  • Add functionality for jumping to specific date ranges

TBD

  • Plot actual clinical data as an additional time series

@amosca01

  • Create workflow to pull in actual clinical data from Tufts

TODO revise PatientTrajectory class to include admit_type : "referral" or "new"

Summary

This change came from conversation with TMC stakeholders Tues 04/07

Background

Tufts Medical is part of a big hospital system called Wellforce.

Currently, numbers at TMC are lower than other area hospitals.

Expectation is that this will change as other area hospitals "fill up" and refer patients to TMC

TMC will then have a "mixture" of two types of patients:

  • Those who come in directly from home (what our model already assumed, will call these admit_type="new")
  • Those who come in from other hospitals, having already been hospitalized there (admit_type="referral")

We want to quickly revise the model so that we can specify progression probabilities by admit_type (collaborators speculate that referred patients might have different demographic characteristics and different chances of recovering).

Proposed changes to params.json

For each essential parameter (e.g. proba_Recovering_given_{state}, and pmf_timesteps_{status}_{state}), we now accept TWO forms of probability:

  • Form 1: just value (as before)

"proba_Recovering_given_InICU": 0.5

  • Form 2: a dict of probability values keyed by admit_type

"proba_Recovering_given_InICU": {"new": 0.5, "referral": 0.1}

Key idea for expansion: we have a predefined order of patient attributes (e.g. ['age', 'admit_type', 'has_diabetes', 'has_heart_disease'])
When defining parameters for a patient, we try each key in order.
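One way the two accepted forms and the ordered attribute lookup could work is sketched below. The `lookup_param` helper and the patient dict are hypothetical, not existing code in PatientTrajectory.py. The sketch only handles scalar probabilities: a parameter whose plain value is itself a dict (such as a pmf) would need an extra marker to tell the two forms apart.

```python
# Predefined order of patient attributes, as proposed in the issue text.
ATTRIBUTE_ORDER = ["age", "admit_type", "has_diabetes", "has_heart_disease"]


def lookup_param(params, name, patient):
    """Resolve a parameter given as a plain value (Form 1) or a dict keyed by attribute value (Form 2)."""
    value = params[name]
    if not isinstance(value, dict):
        return value  # Form 1: same value for every patient
    # Form 2: try each attribute in order and use the first one whose value is a key.
    for attr in ATTRIBUTE_ORDER:
        key = patient.get(attr)
        if key in value:
            return value[key]
    raise KeyError(f"no entry of {name!r} matches patient attributes {patient!r}")


params = {
    "proba_Recovering_given_InICU": {"new": 0.5, "referral": 0.1},
    "proba_Recovering_given_OnVentInICU": 0.2,  # made-up value for the demo
}
referred = {"admit_type": "referral"}
print(lookup_param(params, "proba_Recovering_given_InICU", referred))
print(lookup_param(params, "proba_Recovering_given_OnVentInICU", referred))
```

Raising an error when nothing matches is a design choice for the sketch; falling back to a default entry would be an equally reasonable alternative.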

Proposed changes to PatientTrajectory.py

Add attribute: "admit_type"

Inside simulate trajectory, sample from the most specific attribute-driven distribution you can

Workflow for deploying a self-hosted visualization

@deastman @amosca01

As you work out a plausible workflow for deploying our visualizations to a self-hosted instance of Plotly, use this issue to sketch out the plan.

First, what do we need from the IT folks

  • A
  • B

Then make a list of features we need in code:

  • A
  • B

Then, make a list of steps we need to take to deploy a forecast each time it is created (on our planned twice per week dev cycle)

  • Update snakemake file for that day's workflow
  • B
  • C

ADD functionality so terminal state can happen at other stages

@cuong450 @gvisani can we add this?

Health state is currently either declining or recovering

Would like to expand this to declining, recovering, or terminal.
Basically, each time we draw the patient's health state at a stage, this lets us allow for the possibility that they die.

As soon as terminal state is drawn, patient leaves the model, so no "duration" here.

Please comment below with your proposal:

  • How should parameters JSON file change?
    --- Need to somehow specify the probability of the terminal state from each declining state. We'll assume that if a patient is on the recovery path, the terminal chance is zero. Probably declining in the general ward has a very low chance, declining in ICU off vent has a low chance, and declining in ICU on vent has some chance.

  • How should the PatientTrajectory.py logic change?

  • Anything else needed?

TODOs from 04/07 meeting

  • Update aesthetics (pretty-print strings, update dimensions of plots, layout)
  • Add checkboxes to Dash app to show/hide plots based on name
  • Add series for actual data to compare with predictions/forecast
  • Add column for actual date information rather than generic time step
  • Allow for ability to update visualization based upon parameter change (proof of concept: recovery probability)
  • Figure out how to deploy Plotly Dash app

Incorporating a comparison to IHME forecasts

Hi @cuong450 @gvisani I've been looking a bit this morning to see what it would take to compare our ABC numbers to the forecasts produced by IHME and published here: http://www.healthdata.org/covid/data-downloads

Process

I downloaded the IHME forecasts that were released on Jan 15, 2021, thinking this was closest to our "testing" period of 01/11 - 02/11 of this year.

I wrote a little script that extracts the relevant rows for a specific state and target date range.

The raw fields available are

  • allbed_mean : Mean covid beds needed by day
  • ICUbed_mean : Mean ICU covid beds needed by day
  • InvVen_mean : Mean invasive ventilation needed by day

My script transforms the raw available fields into our standardized format as:

  • n_InICU = "ICUbed_mean"
  • n_OnVentInICU = "InvVen_mean" (Mean invasive ventilation needed by day)
  • n_OffVentInICU = "ICUbed_mean" - "InvVen_mean"
  • n_InGeneralWard = "allbed_mean" - "ICUbed_mean"
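The field mapping above amounts to a small row-level transform. The sketch below assumes each IHME row is available as a dict of strings (for example from `csv.DictReader`) and that the general-ward formula refers to the allbed_mean and ICUbed_mean fields. The function name and the demo numbers are made up for illustration.

```python
def ihme_row_to_standard(row):
    """Convert one row of raw IHME fields into the standardized census columns."""
    allbed = float(row["allbed_mean"])
    icu = float(row["ICUbed_mean"])
    vent = float(row["InvVen_mean"])
    return {
        "n_InICU": icu,
        "n_OnVentInICU": vent,
        "n_OffVentInICU": icu - vent,     # ICU patients not on invasive ventilation
        "n_InGeneralWard": allbed - icu,  # all covid beds minus ICU beds
    }


# Made-up numbers, only to show the arithmetic.
demo = ihme_row_to_standard(
    {"allbed_mean": "100.0", "ICUbed_mean": "30.0", "InvVen_mean": "10.0"}
)
print(demo)
```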

Below I show some raw results.

Basically, the overall hospitalization numbers look way off for IHME (something like 50% error in MA). I'm wondering if this is just because they don't adapt carefully to these numbers, or if there's a mistake in my preprocessing.

Massachusetts : IHME seems to overshoot by 50% in general ward, by 11% in ICU

Our raw data

date      n_InGeneralWard  n_InICU
20210110  1428.0           459.0
20210111  1378.0           451.0
20210112  1346.0           451.0
20210113  1446.0           461.0
20210114  1431.0           454.0
20210115  1458.0           451.0

IHME predictions

date      n_InGeneralWard_mean  n_InICU_mean
20210111  2094.4515             489.1111
20210112  2123.4055             495.8101
20210113  2150.8419             502.0627
20210114  2176.7907             507.8526
20210115  2201.2932             513.1421
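As a quick check on the Massachusetts overshoot figures, the two tables above can be compared over their five overlapping dates (20210111 to 20210115). The issue does not say how the percentages were computed, so the definition used here (ratio of summed predictions to summed observations, minus one) is an assumption.

```python
# Values copied from the Massachusetts tables above, dates 20210111 through 20210115.
ours = {
    "n_InGeneralWard": [1378.0, 1346.0, 1446.0, 1431.0, 1458.0],
    "n_InICU": [451.0, 451.0, 461.0, 454.0, 451.0],
}
ihme = {
    "n_InGeneralWard": [2094.4515, 2123.4055, 2150.8419, 2176.7907, 2201.2932],
    "n_InICU": [489.1111, 495.8101, 502.0627, 507.8526, 513.1421],
}


def relative_overshoot(predicted, actual):
    """Fraction by which the summed predictions exceed the summed observations."""
    return sum(predicted) / sum(actual) - 1.0


overshoot = {k: relative_overshoot(ihme[k], ours[k]) for k in ours}
for name, frac in overshoot.items():
    print(f"{name}: {100 * frac:.1f}% overshoot")
```

Under this definition the overshoot comes out at roughly 52% for the general ward and roughly 11% for the ICU, in line with the figures quoted above.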

California : IHME seems to overshoot by 25% in general ward, by 40% in ICU

Our raw data

date      n_InGeneralWard  n_InICU  n_TERMINAL  n_TERMINAL_5daysSmoothed
20210108  16863.0          4905.0   493.0       540
20210109  16738.0          4939.0   695.0       501
20210110  16649.0          4965.0   468.0       480
20210111  16833.0          4971.0   264.0       476
20210112  16766.0          4962.0   548.0       484
20210113  16530.0          4929.0   589.0       518
20210114  16206.0          4878.0   552.0       599
20210115  16038.0          4833.0   637.0       576

IHME predictions

date      n_InGeneralWard_mean  n_InICU_mean  n_Terminal_mean
20210108  18640.4770            6222.6998     540.0000
20210109  19083.4146            6361.3600     601.0000
20210110  19523.9250            6499.3644     297.0000
20210111  19960.9465            6638.2647     256.0000
20210112  20394.6927            6777.7344     454.1800
20210113  20824.5822            6921.0634     463.7011
20210114  21249.5539            7062.7601     473.5935
20210115  21670.3658            7202.1602     483.7611

TODOs from Fri 4/3 meeting

Input from CHIME/IHME models of presenting patients (@panlybero )

  • Specify as input files: "results/chime_output_1.csv" (columns: timestep, n_Presenting)
  • Can use random seeds for this too:
    • chime_1.csv
    • chime_2.csv

Running on Tufts HPC cluster ( @gvisani )

  • Given a concrete set of params (json file), we want to be able to kick off 1000 trials with specified random seeds, and save those to an output directory with a specified name

Reporting results (@panlybero )

  • Write script that consumes folder full of results files, and produces summary statistics at each day for each stage
    ---- INPUT: results-1.csv, results-2.csv, .... results-101.csv (columns: n_InICU, n_OnVentInICU, ...)
    ---- OUTPUT: summary.csv (columns: n_InICU_2.5, n_InICU_5, n_InICU_50) (expand each column in INPUT into one column per percentile)

Visualization (@diana)

  • Write script that converts summary.csv of a simulation to set of plots
  • Visualization to explore how parameters matter (slider where adjust mean time ICU for recovery)
  • Update aesthetics (pretty-print strings, update dimensions of plots, layout)
  • Add checkboxes to Dash app to show/hide plots based on name
  • Add series for actual data to compare with predictions/forecast
  • Add column for actual date information rather than generic time step
  • Allow for ability to update visualization based upon parameter change (proof of concept: recovery probability)

DATASETS standardized format

Proposal: Each dataset is defined by a folder under "datasets/" in the repo, with following standardized structure

datasets/<nickname>/ where a nickname could be something like "sitename-startdate-stopdate" (e.g. MA-20200301-20200901)

  • daily_admissions.csv : CSV file with columns

date, n_InGeneralWard, n_OffVentInICU, n_OnVentInICU

  • daily_counts.csv : CSV file with columns

date, n_InGeneralWard, n_OffVentInICU, n_OnVentInICU

  • config.json : Configuration related to this dataset

Towards baseline probabilistic predictions

We have two basic (and very related) prediction tasks:

Task 1: Admission

Given admission counts from the past W timesteps, predict next admission count

That is, we want to learn a distribution
$$
p( a_t | a_{t-W:t-1} )
$$
We could also condition on external features $x$ (e.g. the forecasted state-wide admissions from some trustworthy model).

Task 2: Census

Given census counts from the past W timesteps, predict next census count

$$
p( c_t | c_{t-W:t-1} )
$$
Of course, we could also condition on other features (e.g. also on admissions, or on census counts from another stage). But let's focus on this for now.

Learning goals

If $t$ is today's date, we want to make a probabilistic forecast for 1 week ahead: $a_{t+1}, a_{t+2}, \ldots, a_{t+7}$

Model strategy

For either task, let's try to get each one of these working before we move on to the next step:

  1. Consider a Bayesian linear regression model with Normal likelihood

Use Stan or PyMC3 (or roll your own, if needed)

  2. Allow for missing data in the context

That is, we want to be able to deal with the fact that some windows $W$ will not have all observed values.

  3. Consider a Bayesian generalized linear model with an integer-sample-space likelihood

We are predicting counts. We should use something like Poisson, Negative Binomial, etc.
Using Gaussian is naive and wastes probability mass on non-integer values.

  4. Look at fancier models (e.g. Gaussian process, linear model with learned basis function, etc.)
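To make the first and third strategies concrete for Task 1, one possible way to write them down is given here. The weights $w_0, w_1, \ldots, w_W$, the noise variance $\sigma^2$, the prior scale $\tau$, and the $\log(1 + a)$ feature transform are assumptions introduced for this sketch; they are not choices stated in this issue.

Normal likelihood (Bayesian linear regression on the window):
$$
a_t \mid a_{t-W:t-1} \sim \mathcal{N}\Big( w_0 + \sum_{k=1}^{W} w_k \, a_{t-k}, \; \sigma^2 \Big), \qquad w_k \sim \mathcal{N}(0, \tau^2)
$$
Integer-valued likelihood (Poisson GLM with log link):
$$
a_t \mid a_{t-W:t-1} \sim \text{Poisson}\Big( \exp\big( w_0 + \sum_{k=1}^{W} w_k \log(1 + a_{t-k}) \big) \Big), \qquad w_k \sim \mathcal{N}(0, \tau^2)
$$
The same forms apply to Task 2 with $c_t$ in place of $a_t$. A 1-week-ahead forecast can then be built by sampling $a_{t+1}$, appending it to the window, and repeating up to $a_{t+7}$.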

Problems we'll need to solve along the way

Hyperparameter selection:

  • How to select likelihood hypers (variance, window size)?
  • How to select prior hypers?
