IHME and curve-fitting for admissions

Adding predictions made with IHME forecasts for Massachusetts, accounting for TMC's 3% market share. (done)
Look further into curve-fitting for admissions data. Create a working framework with simple model.

Towards an ABC pipeline for parameter learning

We want to build the ability to learn parameters from data.

Prototyping and debugging

Create SimplePatientTrajectory.py, a simpler version of PatientTrajectory.py (use same probabilities no matter if recovering or declining, etc)
Simulate a toy dataset with known "ground truth" parameters and known initial conditions
Record census counts for this simulation as "data"
Select a plausible set of priors for each progression probability (just pick vague ones)
See if we can use ABC MCMC to estimate posteriors over true parameters (testing over the amount of recorded data: 5 days worth, 25 days worth, 100 days worth, etc)
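
A minimal sketch of the prototyping loop above, assuming a toy one-ward simulator in place of SimplePatientTrajectory.py. The names `simulate_census` and `abc_mcmc`, the single parameter `p_discharge`, the Uniform(0, 1) prior, and the mean-absolute-difference distance are all illustrative assumptions, not the project's code. With a uniform prior and a symmetric random-walk proposal, ABC MCMC accepts a proposal whenever it lies in the prior's support and its simulated census is within `epsilon` of the recorded "data".

```python
import numpy as np

def simulate_census(p_discharge, n_days, n_admit_per_day, rng):
    """Toy stand-in simulator: one ward, fixed daily admissions, binomial discharges."""
    census = np.zeros(n_days, dtype=int)
    n_in_ward = 0
    for t in range(n_days):
        n_in_ward += n_admit_per_day
        n_in_ward -= rng.binomial(n_in_ward, p_discharge)
        census[t] = n_in_ward
    return census

def distance(simulated, observed):
    """Mean absolute difference between two census series."""
    return np.mean(np.abs(simulated - observed))

def abc_mcmc(observed, n_iters, epsilon, step, n_admit_per_day, seed=0, max_init_tries=10000):
    """ABC MCMC for p_discharge under a Uniform(0, 1) prior (assumed, vague)."""
    rng = np.random.default_rng(seed)
    n_days = len(observed)
    # Initialize by ABC rejection so the chain starts inside the acceptance region.
    for _ in range(max_init_tries):
        current = rng.uniform(0.0, 1.0)
        sim = simulate_census(current, n_days, n_admit_per_day, rng)
        if distance(sim, observed) <= epsilon:
            break
    else:
        raise RuntimeError("no prior draw was accepted; try a larger epsilon")
    samples = np.empty(n_iters)
    for i in range(n_iters):
        proposal = current + step * rng.normal()  # symmetric random-walk proposal
        if 0.0 < proposal < 1.0:  # uniform prior: reject anything outside its support
            sim = simulate_census(proposal, n_days, n_admit_per_day, rng)
            if distance(sim, observed) <= epsilon:
                current = proposal
        samples[i] = current
    return samples
```

To test over the amount of recorded data, one could generate `observed` with a known `p_discharge` for 5, 25, and 100 days and compare how tightly the returned samples concentrate around the ground truth.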

Add to output CSV file

admitted to each stage (Presenting, General, ICU, onVent)

County data

Look at NYT dataset for county level data.
Interested in:

CHIME parameters
Data to evaluate our models (hospitalizations, infections, deaths, ICU counts, etc.)

Visualization Tasks from 4/7 Meeting

Update aesthetics (pretty-print strings, update dimensions of plots, layout)
Add checkboxes to Dash app to show/hide plots based on name
Add series for actual data to compare with predictions/forecast
Add column for actual date information rather than generic time step
Allow for ability to update visualization based upon parameter change (proof of concept: recovery probability)

Personalizing risk probabilities

@rathp Want to update PatientTrajectory.py to include "age" or other covariates.

Look over research literature, see if you can find age-specific or condition-specific estimates of:

who is presenting at hospital with symptoms
who is progressing to each stage
how long they are staying at each stage (e.g. is recovery longer for older patients?)

Improving the model

@hzhzhzhzhzhzhz Want to make the Markov model more realistic.

Current: GeneralWard + decline > ICU + decline > Vent + recover > LEAVE MODEL

Ideal: General + decline > ICU + decline > Vent + recover > ICU + recover > General + recover

First steps:

What (if anything) would we change in the Markov model spec here:
https://github.com/tufts-ml/covid19-forecasting/blob/master/params_simple_example.json
or the code in PatientTrajectory.py

Integrate pipeline with SLURM cluster on Tufts HPC

Todos:

Write script that takes in a folder full of results-{random_seed}.csv files, and produces a summary of percentiles
- Percentiles: 1, 2.5, 5, 10, 50, 90, 95, 97.5, 99
Snakemake for SLURM
Run a 50 trial test
Run a 1000 trial

Visualization TODOs from 4/10 Meeting

Gabe

Add ability in Dash app to switch between scenarios

Diana

Add ability for html plots to print out different scenarios

@amosca01

Add functionality for jumping to specific date ranges

TBD

Plot actual clinical data as an additional time series

@amosca01

Create workflow to pull in actual clinical data from Tufts

TODO revise PatientTrajectory class to include admit_type : "referral" or "new"

Summary

This change came from conversation with TMC stakeholders Tues 04/07

Background

Tufts Medical is part of a large hospital system called Wellforce.

Currently, numbers at TMC are lower than other area hospitals.

Expectation is that this will change as other area hospitals "fill up" and refer patients to TMC

TMC will then have a "mixture" of two types of patients:

Those who come in directly from home (what our model already assumed, will call these admit_type="new")
Those who come in from other hospitals, having already been hospitalized there (admit_type="referral")

We want to quickly revise the model so that we can specify progression probabilities by admit_type (collaborators speculate that referred patients might have different demographic characteristics and different chances of recovering).

Proposed changes to params.json

For each essential parameter (e.g. proba_Recovering_given_{state}, and pmf_timesteps_{status}_{state}), we now accept TWO forms of probability:

Form 1: just value (as before)

"proba_Recovering_given_InICU": 0.5

Form 2: a dict of probability values keyed by admit_type

"proba_Recovering_given_InICU": {"new": 0.5, "referral": 0.1}

Key idea for expansion: we have a predefined order of patient attributes (e.g. ['age', 'admit_type', 'has_diabetes', 'has_heart_disease'])
When defining parameters for a patient, we try each key in order.
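
A minimal sketch of how the two forms could be resolved, under one possible reading of "try each key in order": walk the predefined attribute order and return the first entry whose key matches the patient's value for that attribute. The function name `resolve_param` and the patient-as-dict representation are hypothetical, not part of the existing code.

```python
ATTRIBUTE_ORDER = ("age", "admit_type", "has_diabetes", "has_heart_disease")

def resolve_param(params, name, patient, attribute_order=ATTRIBUTE_ORDER):
    """Return the value of parameter `name` for this patient.

    Form 1: params[name] is a plain value, returned as-is.
    Form 2: params[name] is a dict keyed by an attribute value (e.g. admit_type);
            attributes are tried in the predefined order (assumed interpretation).
    """
    value = params[name]
    if not isinstance(value, dict):
        return value  # Form 1: same value for every patient
    for attr in attribute_order:
        if attr not in patient:
            continue
        key = str(patient[attr])  # JSON object keys are always strings
        if key in value:
            return value[key]
    raise KeyError(f"no entry of {name!r} matches patient attributes {patient!r}")
```

For example, with the Form 2 entry above, a patient `{"admit_type": "referral"}` resolves `proba_Recovering_given_InICU` to 0.1, while any parameter still written in Form 1 resolves to its single value regardless of the patient.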

Proposed changes to PatientTrajectory.py

Add attribute: "admit_type"

Inside simulate trajectory, sample from the most specific attribute-driven distribution you can

Workflow for deploying a self-hosted visualization

@deastman @amosca01

As you work out a plausible workflow for deploying our visualizations to a self-hosted instance of plotly, use this issue to sketch it out.

First, what do we need from the IT folks

A
B

Then make a list of features we need in code:

A
B

Then, make a list of steps we need to take to deploy a forecast each time it is created (on our planned twice per week dev cycle)

Update snakemake file for that day's workflow
B
C

ADD functionality so terminal state can happen at other stages

@cuong450 @gvisani can we add this?

Health state is currently either declining or recovering

Would like to expand this to declining, recovering, or terminal.
Basically, each time we draw the patient's health state at a stage, this lets us allow for the possibility that they die.

As soon as terminal state is drawn, patient leaves the model, so no "duration" here.

Please comment below with your proposal:

How should parameters JSON file change?
--- Need to somehow specify the probability of the terminal state from each declining state... we'll assume that if the patient is on the recovery path, the terminal chance is zero. Probably declining in the general ward has a very low chance, declining in ICU off vent has a low chance, and declining in ICU on vent has some chance.
How should the PatientTrajectory.py logic change?
Anything else needed?
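
One possible starting point for the discussion, as a sketch only: draw recovering vs. not recovering first (as today), then, only for patients who are not recovering, draw terminal vs. declining. The function name `draw_health_state` and the argument `proba_terminal_given_declining` are hypothetical; the actual parameter names in the JSON file are still to be decided.

```python
import numpy as np

def draw_health_state(proba_recovering, proba_terminal_given_declining, rng):
    """Draw one of 'Recovering', 'Terminal', 'Declining' for the current stage.

    Patients on the recovery path never become terminal (assumption stated in the issue).
    A 'Terminal' draw means the patient leaves the model immediately, with no duration.
    """
    if rng.random() < proba_recovering:
        return "Recovering"
    if rng.random() < proba_terminal_given_declining:
        return "Terminal"
    return "Declining"
```

Under this scheme the overall chance of a terminal draw at a stage is (1 - proba_recovering) * proba_terminal_given_declining, so setting the second probability to 0 recovers the current two-state behavior.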

TODOs from 04/07 meeting

@michaelchughes Recruit new team members to help with viz dashboard deployment and other features
@deastman

Update aesthetics (pretty-print strings, update dimensions of plots, layout)
Add checkboxes to Dash app to show/hide plots based on name
Add series for actual data to compare with predictions/forecast
Add column for actual date information rather than generic time step
Allow for ability to update visualization based upon parameter change (proof of concept: recovery probability)
Figure out how to deploy Plotly Dash app

@panlybero Try out a pipeline that uses CHIME to provide input numbers, in a probabilistic way (specify distributions over CHIME inputs, run using samples from these distributions)
@gvisani Read up on the IHME model and come up with prelim strategies for a "curve fitting" approach
Paper: https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full.pdf
Current MA forecasts: https://covid19.healthdata.org/united-states-of-america/massachusetts

Incorporating a comparison to IHME forecasts

Hi @cuong450 @gvisani I've been looking a bit this morning to see what it would take to compare our ABC numbers to the forecasts produced by IHME and published here: http://www.healthdata.org/covid/data-downloads

Process

I downloaded the IHME forecasts that were released on Jan 15, 2021, thinking this was closest to our "testing" period of 01/11 - 02/11 of this year.

I wrote a little script that extracts the relevant rows for a specific state and target date range.

The raw fields available are

allbed_mean : Mean covid beds needed by day
ICUbed_mean : Mean ICU covid beds needed by day
InvVen_mean : Mean invasive ventilation needed by day

My script transforms the raw available fields into our standardized format as:

n_InICU = "ICUbed_mean"
n_OnVentInICU = "InvVen_mean" (Mean invasive ventilation needed by day)
n_OffVentInICU = "ICUbed_mean" - "InvVen_mean"
n_InGeneralWard = "allbed_mean" - "ICUbed_mean"
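
The mapping above can be sketched in pandas as follows. This is not the script described in this issue, only an illustration; the function name `standardize_ihme` and the presence of a `date` column in the raw frame are assumptions.

```python
import pandas as pd

def standardize_ihme(raw_df):
    """Map raw IHME fields to the standardized census column names."""
    out = pd.DataFrame({"date": raw_df["date"]})  # 'date' column is assumed to exist
    out["n_InICU"] = raw_df["ICUbed_mean"]
    out["n_OnVentInICU"] = raw_df["InvVen_mean"]
    # ICU patients not on invasive ventilation
    out["n_OffVentInICU"] = raw_df["ICUbed_mean"] - raw_df["InvVen_mean"]
    # All covid beds minus ICU beds leaves the general ward
    out["n_InGeneralWard"] = raw_df["allbed_mean"] - raw_df["ICUbed_mean"]
    return out
```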

Below I show some raw results.

Basically, the overall hospitalization numbers look way off for IHME (something like 50% error in MA). I'm wondering if this is just because they don't adapt carefully to these numbers, or if there's a mistake in my preprocessing.

Massachusetts : IHME seems to overshoot by 50% in general ward, by 11% in ICU

Our raw data

date      n_InGeneralWard  n_InICU
20210110  1428.0           459.0
20210111  1378.0           451.0
20210112  1346.0           451.0
20210113  1446.0           461.0
20210114  1431.0           454.0
20210115  1458.0           451.0

IHME predictions

date      n_InGeneralWard_mean  n_InICU_mean
20210111  2094.4515             489.1111
20210112  2123.4055             495.8101
20210113  2150.8419             502.0627
20210114  2176.7907             507.8526
20210115  2201.2932             513.1421
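
As a sanity check on the headline percentages, here is a small sketch that computes the mean relative overshoot over the five dates the two Massachusetts tables share (20210111 through 20210115). The numbers are copied from the tables above; the helper name `mean_relative_overshoot` is just for illustration.

```python
import numpy as np

# Observed counts for 20210111..20210115 (from "Our raw data")
obs_general = np.array([1378.0, 1346.0, 1446.0, 1431.0, 1458.0])
obs_icu = np.array([451.0, 451.0, 461.0, 454.0, 451.0])

# IHME means for the same dates (from "IHME predictions")
ihme_general = np.array([2094.4515, 2123.4055, 2150.8419, 2176.7907, 2201.2932])
ihme_icu = np.array([489.1111, 495.8101, 502.0627, 507.8526, 513.1421])

def mean_relative_overshoot(predicted, observed):
    """Average of (predicted / observed - 1) across days."""
    return float(np.mean(predicted / observed - 1.0))

gw_overshoot = mean_relative_overshoot(ihme_general, obs_general)
icu_overshoot = mean_relative_overshoot(ihme_icu, obs_icu)
print(f"general ward: {100 * gw_overshoot:.1f}%  ICU: {100 * icu_overshoot:.1f}%")
```

This comes out at roughly 52% for the general ward and 11% for the ICU, in line with the summary line for Massachusetts.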

California : IHME seems to overshoot by 25% in general ward, by 40% in ICU

Our raw data

date      n_InGeneralWard  n_InICU  n_TERMINAL  n_TERMINAL_5daysSmoothed
20210108  16863.0          4905.0   493.0       540
20210109  16738.0          4939.0   695.0       501
20210110  16649.0          4965.0   468.0       480
20210111  16833.0          4971.0   264.0       476
20210112  16766.0          4962.0   548.0       484
20210113  16530.0          4929.0   589.0       518
20210114  16206.0          4878.0   552.0       599
20210115  16038.0          4833.0   637.0       576

IHME predictions

date      n_InGeneralWard_mean  n_InICU_mean  n_Terminal_mean
20210108  18640.4770            6222.6998     540.0000
20210109  19083.4146            6361.3600     601.0000
20210110  19523.9250            6499.3644     297.0000
20210111  19960.9465            6638.2647     256.0000
20210112  20394.6927            6777.7344     454.1800
20210113  20824.5822            6921.0634     463.7011
20210114  21249.5539            7062.7601     473.5935
20210115  21670.3658            7202.1602     483.7611

TODOs from Fri 4/3 meeting

Input from CHIME/IHME models of presenting patients (@panlybero )

Specify as input files: "results/chime_output_1.csv" (columns: timestep, n_Presenting)
Can use random seeds for this too:
- chime_1.csv
- chime_2.csv

Running on Tufts HPC cluster ( @gvisani )

Given a concrete set of params (json file), we want to be able to kick off 1000 trials with specified random seeds and save those to an output directory with a specified name

Reporting results (@panlybero )

Write script that consumes folder full of results files, and produces summary statistics at each day for each stage
---- INPUT: results-1.csv, results-2.csv, .... results-101.csv (columns: n_InICU, n_OnVentInICU, ...)
---- OUTPUT: summary.csv (columns: n_InICU_2.5, n_InICU_5, n_InICU_50, ...) [explode each column in INPUT into one column per percentile]
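
A minimal sketch of such a summary script, assuming every results file has the same number of rows (one per day) and the same columns. The function names, the optional `timestep` column, and the default percentile list are assumptions for illustration.

```python
import glob
import os

import numpy as np
import pandas as pd

PERCENTILES = [1, 2.5, 5, 10, 50, 90, 95, 97.5, 99]

def summarize_frames(frames, percentiles=PERCENTILES):
    """Explode each count column into one column per percentile, computed across trials."""
    summary = {}
    if "timestep" in frames[0].columns:  # carried through unchanged if present (assumed name)
        summary["timestep"] = frames[0]["timestep"].to_numpy()
    count_columns = [c for c in frames[0].columns if c != "timestep"]
    for col in count_columns:
        # stacked has shape (n_trials, n_days)
        stacked = np.stack([f[col].to_numpy() for f in frames], axis=0)
        for pct in percentiles:
            summary[f"{col}_{pct:g}"] = np.percentile(stacked, pct, axis=0)
    return pd.DataFrame(summary)

def summarize_results(results_dir, output_path, percentiles=PERCENTILES):
    """Read every results-*.csv in results_dir and write the percentile summary CSV."""
    paths = sorted(glob.glob(os.path.join(results_dir, "results-*.csv")))
    if not paths:
        raise FileNotFoundError(f"no results-*.csv files found in {results_dir}")
    frames = [pd.read_csv(p) for p in paths]
    summary = summarize_frames(frames, percentiles)
    summary.to_csv(output_path, index=False)
    return summary
```

The `:g` format keeps the column names compact, so the 2.5th percentile of n_InICU becomes `n_InICU_2.5` and the median becomes `n_InICU_50`.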

Visualization (@diana)

Write script that converts summary.csv of a simulation to set of plots
Visualization to explore how parameters matter (e.g. a slider that adjusts the mean time in ICU for recovery)

Update aesthetics (pretty-print strings, update dimensions of plots, layout)
Add checkboxes to Dash app to show/hide plots based on name
Add series for actual data to compare with predictions/forecast
Add column for actual date information rather than generic time step
Allow for ability to update visualization based upon parameter change (proof of concept: recovery probability)

DATASETS standardized format

Proposal: Each dataset is defined by a folder under "datasets/" in the repo, with following standardized structure

datasets/<nickname>/ where a nickname could be something like "sitename-startdate-stopdate" (e.g. MA-20200301-20200901)

daily_admissions.csv : CSV file with columns

date, n_InGeneralWard, n_OffVentInICU, n_OnVentInICU

daily_counts.csv : CSV file with columns

date, n_InGeneralWard, n_OffVentInICU, n_OnVentInICU

config.json : Configuration related to this dataset

Towards baseline probabilistic predictions

We have two basic (and very related) prediction tasks:

Task 1: Admission

Given admission counts from the past W timesteps, predict next admission count

That is, we want to learn a distribution
$$
p( a_t | a_{t-W:t-1} )
$$
We could also condition on external features $x$ (e.g. the forecasted state-wide admissions from some trustworthy model).

Task 2: Census

Given census counts from the past W timesteps, predict next census count

$$
p( c_t | c_{t-W:t-1} )
$$
Of course, we could also condition on other features (e.g. also on admissions, or on census counts from another stage). But let's focus on this for now.
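
Both tasks share the same data layout: each training example pairs a window of the previous W values with the next value. A minimal sketch of that construction, with the hypothetical helper name `make_windows`:

```python
import numpy as np

def make_windows(series, W):
    """Turn a 1-D count series into (context, target) pairs.

    contexts[i] holds series[i : i + W]; targets[i] is series[i + W].
    """
    series = np.asarray(series, dtype=float)
    if len(series) <= W:
        raise ValueError("series must be longer than the window size W")
    contexts = np.stack([series[i:i + W] for i in range(len(series) - W)])
    targets = series[W:]
    return contexts, targets
```

For a series of length T this yields T - W examples. The same function works for admission counts $a_t$ and census counts $c_t$.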

Learning goals

If $t$ is today's date, we want to make a probabilistic forecast for 1 week ahead: $a_{t+1}, a_{t+2}, \ldots, a_{t+7}$

Model strategy

For either task, let's try to get each one of these working before we move on to the next step:

Consider a Bayesian linear regression model with Normal likelihood

Use Stan or PyMC3 (or roll your own, if needed)

Allow for missing data in the context

That is, we want to be able to deal with the fact that some windows $W$ will not have all observed values.

Consider a Bayesian generalized linear model with an integer-sample-space likelihood

We are predicting counts. We should use something like Poisson, Negative Binomial, etc.
Using a Gaussian likelihood is naive: it wastes probability mass on non-integer (and negative) values.
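
To make the count-likelihood idea concrete, here is a small NumPy-only sketch of a Poisson regression with a log link and a Normal(0, prior_var) prior on the weights. It only finds the MAP point estimate with Newton's method; a full posterior would come from Stan or PyMC3 as suggested above. The function names and the choice of prior are assumptions for illustration.

```python
import numpy as np

def fit_poisson_map(X, y, prior_var=10.0, n_iters=25):
    """MAP weights for y ~ Poisson(exp(X @ w)) with w ~ Normal(0, prior_var * I)."""
    n_features = X.shape[1]
    # Least squares on log counts gives a stable starting point for Newton's method.
    w = np.linalg.lstsq(X, np.log(y + 1.0), rcond=None)[0]
    for _ in range(n_iters):
        rate = np.exp(X @ w)
        grad = X.T @ (y - rate) - w / prior_var  # gradient of the log posterior
        hess = -(X.T * rate) @ X - np.eye(n_features) / prior_var  # negative definite
        w = w - np.linalg.solve(hess, grad)
    return w

def sample_predictive(w, x_new, n_samples, rng):
    """Draw integer-valued predictions for one feature vector (plug-in, ignores weight uncertainty)."""
    return rng.poisson(np.exp(x_new @ w), size=n_samples)
```

Every draw from `sample_predictive` is a non-negative integer, which is the property a Gaussian likelihood lacks. A Negative Binomial likelihood would be the natural next step if the counts turn out to be overdispersed relative to Poisson.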

Look at fancier models (e.g. Gaussian process, linear model with learned basis function, etc.)

Problems we'll need to solve along the way

Hyperparameter selection:

How to select likelihood hypers (variance, window size)?
How to select prior hypers?

tufts-ml / covid19-forecasting

covid19-forecasting's Introduction

covid19-forecasting

Updates

Our mechanistic hospitalized patient trajectory model : ACED-HMM

Our latent variable models for single-site future counts

Usage

Getting Started

Using Snakemake workflows for reproducibility

Using Snakemake workflows on the Tufts HPC cluster

Installation

1. Install Anaconda

2. Install Snakemake

3. Install semimarkov_forecaster conda environment

Modeling
