Giter VIP home page Giter VIP logo

covid19_epi_model's Introduction

COVID19 Population-Level Epidemiologic Models and Analyses

Note: This README is updated, but not every day, so some things may be out of date

This project focuses on population-level analyses of COVID19 time series. Our spatial units of interest are at the US county-level.

This repository contains all scripts used to produce this analysis. Data are linked and/or provided for download in agreement with the data licenses pertaining to each source.

Status: Refreshing data through July, adding counties in Southern US.

License: This project is licensed under the terms of the MIT license.

Cite: McGough et al. 2020 pre-print.

Project Data Scientists

  • Sarah McGough
  • Ryan Gan

Overall Aim

The overall aim of this project is to understand COVID19 mortality at a population-level. Herein, we developed a generalized additive model that models the general response of daily COVID19 deaths over time, within specific counties, county-level constructs of sociodemographic characteristics, and Google mobility.

The two specific aims are as follows:

Specific Aim 1:

Understand and describe the relationship between COVID deaths, and if any constructs of population-level characteristics increase the rate of COVID deaths.

Approach:

Population-level time series evaluating constructs of sociodemographic factors such as prevalence smoking, prevalence obesity, median household income, etc. and their association with rate of daily COVID events, accounting for trends in time.

Specific Aim 2:

What is the impact of social distancing initiatives at the population-level on COVID deaths?

Approach

In the same time-series model used above, we use a penalized distributed lag approach to model the association between percent change in retail and recreation mobility using Google mobility data.

Folder Structure:

data: Where created datasets are stored using data located in source_data.

data/source_data/: Contains scrapped data at different spatial and temporal resolutions.

  • nyt-us-counties: New York Times Daily county time series of confirmed COVID cases and deaths. CSV not provided in accordance with licensing/redistributing restrictions, but is downloaded when script is executed. Link to source github.

  • rwj-county_ranking-2020: Robert Wood Johnson (RWJ) sociodemographic variables. Used with permission from RWJ. CSV not provided in accordance with licensing/redistributing restrictions, but is downloaded when script is executed. Link to website.

  • google-mobility: Google mobility data. Used with permission. Note, we cannot house raw Google mobility data in our repo (CSV not provided), but is downloaded when script is executed. Link to source.

  • cb_2018_us_county_5m: Tiger/Line shapefile for US census used for various spatial calculations and maps. Link to US Census 2018 boundary file

  • county_pop_density.csv: Created using RWJ county population and county areas using TIGER shapefile.

  • county_fips.csv: Mapping between county names and 5-digit character FIPS code.

  • cdc_cancer.csv: CDC cancer incidence. Link to CDC Cancer Statistics

  • cms_comorbidity.csv: CMS data on prevalence of acute myocardial infarction, hypertension, and chronic obstructive pulmonary disease. Link to CMS Mapping Medicare Disparities

references: Contains manuscripts and references used for the project.

  • methods Folder for methods mostly around generalized additive models (GAMs) and distributed lag models. Note we do not include these in the public repo to avoid copywrite issues, but many of the references are publically available.

  • data_dictionary Contains data dictionary for Robert Wood Johnson (RWJ) variables. We have also added our own key for names used in analysis and their index in the original RWJ data dictionary; it can be used to map between the two.

results: Contains results from scripts. Script of origin denoted by 2 digit character (e.g. 04-result_figure came from 04 script).

scripts: Contains scripts for data collection, models, and results. Note, we will add/modify 00-run.R script to make sure appropriate packages are installed.

  • 00-run.R: Runs all subsequent scripts. Can be executed from the top Level of the directory using the following command in the command line: Rscript './scripts/00-run.R'.

  • 01-collect_data.R: Collects all data necessary for analysis.

  • 02-make_config_file.R: Creates a config.yaml file used in subsequent scripts.

  • 03-make_nyt_time_series_county.R: Builds a master time series of all COVID19 cases and deaths, mobility, and RWJ variables for counties in the US.

  • 04-county_principle_components_analysis.R: Runs principle components analysis for study counties, and sensitivity analysis that excludes New York City.

  • 05-analysis_dataframe.R: Joins in principle components to nyt time series and limits only to counties of interest for analysis.

  • 06-descriptive_figure.R: Creates Figure 1 plot of study map and time series of retail and recreation mobility and COVID19 deaths.

  • 07-main_gam_boot.R: GAM and corresponding bootstrap for primary results of principle component and lagged retail and recreation mobility. Saves large R list objects to be used in subsequent script. Note, that the number of iterations is based on the available compute cores. If at least 20 cores available it will run 10k iterations, else it will only run 100. User should be aware that this process can take awhile.

  • 08-boot_results.R: Creates results and figures for manuscript. For PC1, we used bias corrected percentiles of bootstraps, where DLNMs we used percentile method without bias corrected as fits looked fine. Also includes versions of plots showing all bootstrap iterations as another visualization of uncertainty.

  • 09-sensitivity_analyses.R: Runs sensitivity analyses excluding NYC, limiting temporal trend, and trying another approach to model lagged relationship. Note we use the mgcv bayesian posterior intervals for 95%CI as coverage is generally comparable to boot methods. Note, this is not always the case for time series models, hence our bootstrap approach above.

  • 10-dlnm_twostage.R: Sensitivity analysis using two-stage distributed lags. Runs dlnms for each county/location separately and pools results in meta-analysis. Note, that principle components cannot be included in these types of models since there is no variation within a given county, so dlnm estimates only.

  • config.yaml: Configure file generated in 02. Contains study end date, study county fips, and RWJ variables to evaluate in principle components analysis.

  • results/00-session_info.txt: Summary of time it takes to run from start to finish and R session info (R version and packages used).

covid19_epi_model's People

Contributors

ryangan avatar sarahhbellum avatar

Stargazers

 avatar Owain  gaunders avatar  avatar  avatar Jean-Paul R. Soucy avatar Sahir Bhatnagar avatar Ronan Perry avatar Allan Just avatar Devin Incerti avatar Navdeep Pal avatar  avatar

Watchers

James Cloos avatar  avatar  avatar

Forkers

restartus

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.