IMPORTANT: This repository contains synthetic data. These synthetic data have been created from datasets with personal information, but with real persons replaced by 'fake persons'. That has been done in such a way that the analysis of the synthetic data gives approximately the same results as analysis of the original data. If real persons seem to recognise themselves in the synthetic data, then that is pure coincidence: we do not publish personal data. Statistical properties of the synthetic data other than the results in this repository do not necessarily reflect the statistical properties of the original data. These synthetic data may not be used outside this repository.

Introduction

This repository contains the code used for data analysis and simulations leading to projections of COVID-19 ICU and hospital admissions, carried out by the Dutch National Institute for Public Health and the Environment (RIVM) on 6 January 2021. It is supplementary material to the publication "Projecting COVID-19 intensive care admissions in the Netherlands for policy advice: February 2020 to January 2021", by Klinkenberg et al. (https://doi.org/10.1101/2023.06.30.23291989).

The code was originally run on the most recent surveillance data, which contain privacy-sensitive information at the individual patient level. To make the code in this repository run, synthetic datasets have been created that produce approximately the same parameter estimates. To allow the exact same simulations to be run, the original parameter estimates are provided as well.

How to use the code?

Save the repository in your local environment and open it as an R Project in RStudio. Open the file "R/00_masterscript_20210106.R" and run it from top to bottom. The masterscript carries out all analyses, defines all simulation functions by sourcing code files elsewhere in the repository, runs all simulations, and plots the results. The code files themselves contain (brief) comments explaining what is done. The masterscript consists of the following steps:

Block 1: load libraries

The repository was created with the following versions of R and packages:

platform       x86_64-redhat-linux-gnu     
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          3.0                         
year           2023                        
month          04                          
day            21                          
svn rev        84292                       
language       R                           
version.string R version 4.3.0 (2023-04-21)
nickname       Already Tomorrow 

  deSolve lubridate   forcats   stringr     dplyr     purrr     readr 
   "1.35"   "1.9.2"   "1.0.0"   "1.5.0"   "1.1.2"   "1.0.1"   "2.1.4" 
    tidyr    tibble   ggplot2 tidyverse     stats  graphics grDevices 
  "1.3.0"   "3.2.1"   "3.4.2"   "2.0.0"   "4.3.0"   "4.3.0"   "4.3.0" 
    utils  datasets   methods      base 
  "4.3.0"   "4.3.0"   "4.3.0"   "4.3.0"

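To compare a local installation against the versions listed above, something like the following could be used. This is a minimal sketch and not part of the repository: the helper name `compare_versions` is hypothetical, and only a subset of the listed packages is included for brevity.

```r
# Versions copied from the listing above (subset, for illustration).
expected <- c(deSolve = "1.35", lubridate = "1.9.2", dplyr = "1.1.2",
              ggplot2 = "3.4.2", tidyverse = "2.0.0")

# Hypothetical helper: report the installed and expected version per package.
# Packages that are not installed get NA in the 'installed' column.
compare_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(
    package   = names(expected),
    expected  = unname(expected),
    installed = unname(installed),
    matches   = !is.na(installed) & unname(installed) == unname(expected),
    stringsAsFactors = FALSE
  )
}

print(compare_versions(expected))
```

A mismatch does not necessarily mean the code will fail; the table only shows where a local setup differs from the one used to create the repository.
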
Block 2: analyse data

Define ANALYSISDATE and DELAYSTARTDATE. The latter determines which data to include to calculate the reporting delay distribution.
Read population data (size and age distribution)
Read and analyse the NICE hospital data for all probabilities and length-of-stay distributions. Warning messages (NaNs produced) can be ignored, as the final fits look good.
Calculate the reporting delay distribution from the NICE data
Read and analyse the OSIRIS notification data for the symptom-to-hospital distributions, and define incubation period distribution based on literature
Analyse serological data to estimate hospitalisation probabilities and age-dependent infectivity/susceptibility, and define generation interval distribution
Read the contact matrices for all sets of control measures
Save the results (the repository contains the saved file)
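
As an illustration of the reporting delay step, the sketch below computes an empirical delay distribution from admission and report dates, using only records from DELAYSTARTDATE onwards. It rests on several assumptions: the column names `admission_date` and `report_date`, the toy records, the date value, and the function `delay_distribution` are all hypothetical, and the repository's own calculation may differ (for instance by correcting for delays that have not yet been observed).

```r
# Toy records; column names are hypothetical, not the NICE data format.
records <- data.frame(
  admission_date = as.Date(c("2020-12-01", "2020-12-10", "2020-12-11",
                             "2020-12-12", "2020-12-12", "2020-12-15")),
  report_date    = as.Date(c("2020-12-05", "2020-12-10", "2020-12-12",
                             "2020-12-12", "2020-12-14", "2020-12-16"))
)
DELAYSTARTDATE <- as.Date("2020-12-08")  # illustrative value only

# Hypothetical helper: proportion of admissions reported after 0, 1, 2, ... days,
# assuming "data to include" means admissions on or after start_date.
delay_distribution <- function(records, start_date) {
  recent <- records[records$admission_date >= start_date, ]
  delays <- as.integer(recent$report_date - recent$admission_date)
  counts <- table(factor(delays, levels = 0:max(delays)))
  probs <- as.numeric(counts) / sum(counts)
  names(probs) <- names(counts)
  probs
}

delay_probs <- delay_distribution(records, DELAYSTARTDATE)
print(delay_probs)
```
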

Block 3: define additional functions

Functions to process simulation options and parameters to correct population and infectivity
Functions to process simulation options and parameters for delays and probabilities
Functions for the simulations themselves, with different purposes (estimation, simulation). There are two sets of simulation functions: first, 'engine' functions that do the actual simulations and take numerical input for all parameters; second, functions to optimise the likelihood or simulate scenarios, with more readable input parameters (e.g. names of matrices, named options). The file starts with function definitions that convert some of the readable input of the second set of functions to the numerical input required by the first set.
Some additional functions required for parameter sampling and plotting simulated output

Block 4: load newest incidence data

The ANALYSISDATE is redefined and the newest data are imported. This makes it possible to fit the model and run simulations without re-analysing all datasets in step 1. Not needed here, but used during code updates.
Save everything just before fitting to incidence data (the repository contains the saved file)

Block 5: load original results (only relevant when using synthetic data, not relevant when using original data as in the original code)

Replace parameter estimates from synthetic data with original parameter estimates from actual data. Keep synthetic data in memory.
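
The idea of this block can be sketched as follows: the estimates are overwritten while the data object is left untouched. All object names, list elements, and numbers here are hypothetical placeholders; the repository's actual objects are organised differently.

```r
# Hypothetical results object as it might look after analysing synthetic data.
synthetic_results <- list(
  data = data.frame(id = 1:3, age = c(34, 67, 81)),
  estimates = list(prob_icu = 0.21, mean_stay = 11.8)
)

# Hypothetical original estimates (placeholder numbers, not real results).
original_estimates <- list(prob_icu = 0.20, mean_stay = 12.5)

# Replace only the estimates; the synthetic data stay in memory unchanged.
results <- synthetic_results
results$estimates <- modifyList(results$estimates, original_estimates)

str(results$estimates)
```
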

Block 6: optimise likelihood to estimate stepwise constant transmissibilities and initial state (step 3 of parameter estimation)

The function logLik_optimise() estimates the initial state y_0 and the stepwise constant transmissibilities, given the changepoints of these constant transmissibilities, by fitting the simulated daily ICU admissions to the reported daily ICU admissions. The changepoints are indicated in the input vector 'periodgroups', which contains indicator variables specifying which stepwise transmissibility is used with which contact matrix. The corresponding contact matrices are given in 'contactcontrol', with corresponding transition times in 'endtimes'. The function logLik_optimise() is used to assess the likelihood per set of changepoints and matching transmissibilities. Many plausible changepoint sets are evaluated manually and an optimal set is selected based on AIC, calculated as 2*negative_log_likelihood + 2*nr_of_changepoints. When comparing two fits, the fit with fewer parameters is preferred if its AIC is no more than two points higher. The negative_log_likelihoods of three fits (sets of changepoints) are given below, with the final model choice. Not all evaluated changepoint sets are listed here.
Save the optim result (the repository contains the saved file)
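
The comparison rule for two fits can be illustrated with a small calculation. This is a sketch only: the function names are hypothetical, the numbers are made up and are not results from the repository, and the AIC expression follows the one given in the text.

```r
# AIC as defined in the text above.
aic <- function(negative_log_likelihood, nr_of_changepoints) {
  2 * negative_log_likelihood + 2 * nr_of_changepoints
}

# Hypothetical helper: returns "a" or "b" for the preferred fit.
# The fit with fewer changepoints wins if its AIC is at most
# 'tolerance' points higher than that of the other fit.
compare_two_fits <- function(nll_a, k_a, nll_b, k_b, tolerance = 2) {
  aic_a <- aic(nll_a, k_a)
  aic_b <- aic(nll_b, k_b)
  if (k_a == k_b) {
    return(if (aic_a <= aic_b) "a" else "b")
  }
  if (k_a < k_b) {
    if (aic_a <= aic_b + tolerance) "a" else "b"
  } else {
    if (aic_b <= aic_a + tolerance) "b" else "a"
  }
}

# Made-up example: fit a has 8 changepoints (AIC 2019), fit b has 9 (AIC 2018).
# Fit a is simpler and only 1 point higher, so it is preferred.
print(compare_two_fits(1001.5, 8, 1000.0, 9))

# Made-up example: here the simpler fit is 8 points higher, so fit b is preferred.
print(compare_two_fits(1005.0, 8, 1000.0, 9))
```
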

Block 7: run the simulations

Sample 200 parameter sets of initial values and stepwise transmissibilities given the result of the previous block
Run the simulations for different scenarios, given by the contact matrices. Median contact matrices (those used for fitting the simulated to the observed ICU admissions) are used up to 14 days before ANALYSISDATE. The uncertainty about recent and future contact patterns is larger than the uncertainty about the past, so 200 different contact matrix samples are used after ANALYSISDATE - 14.
Save the simulations (the repository contains the saved file of the first 20 runs)
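
The switch from median to sampled contact matrices could look roughly like the sketch below. It makes several assumptions: the 2x2 matrices, the way the samples are generated, the date value, and the function `contact_matrix_for` are hypothetical stand-ins for the repository's own objects.

```r
ANALYSISDATE <- as.Date("2021-01-06")  # illustrative value
n_runs <- 200

# Toy 2x2 contact matrix; the matrices in the repository are age-structured.
median_matrix <- matrix(c(5, 2, 2, 3), nrow = 2)

# Hypothetical samples: the median matrix scaled by a random factor per run.
set.seed(1)
sampled_matrices <- lapply(seq_len(n_runs), function(i) {
  median_matrix * exp(rnorm(1, sd = 0.1))
})

# Median matrix up to and including ANALYSISDATE - 14,
# run-specific sampled matrix afterwards.
contact_matrix_for <- function(date, run) {
  if (date <= ANALYSISDATE - 14) median_matrix else sampled_matrices[[run]]
}

print(contact_matrix_for(as.Date("2020-12-01"), run = 5))
print(contact_matrix_for(as.Date("2021-01-10"), run = 5))
```

With this setup all runs share the same contact pattern for the fitted period, so differences between runs before ANALYSISDATE - 14 come only from the sampled initial values and transmissibilities.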

Block 8: plotting the results

Produce plots like those used to present results to policy makers.
