midas-network / covid-19

2019 novel coronavirus repository



MIDAS 2019 Novel Coronavirus Repository

This GitHub repository contains historical data only. We have transitioned it to an online data catalog on the MIDAS website. We are not accepting GitHub pull requests or other contributions to this repository. If you'd like to point us to datasets, software, or other useful information, please let us know by contacting the MIDAS Coordination Center.

Introduction

This repository serves as a central platform to share computable information (in CSV format) relevant for modeling of the COVID-19 outbreak. The MIDAS Coordination Center (MCC) has created and will maintain it in collaboration with the broader modeling community. Community members are encouraged to contribute resources to the repository and thus support the overall COVID-19 research effort. See Information for Contributors for guidance on how to contribute material. Contact [email protected] for any questions or ideas for improvements, or to send/request any material to be included.

MIDAS 2019 Novel Coronavirus Mailing List

The MIDAS Coordination Center maintains a dedicated mailing list for updates and news about COVID-19 modeling research. To join the mailing list, complete the online request form.

Community contributions

We strongly encourage community members to contribute to the COVID-19 repository. Community contributions are acknowledged here.

Name	Affiliation	Contribution
Matt Biggerstaff	Centers for Disease Control	Parameter estimates
Cécile Viboud	Fogarty International Center	Parameter estimates
John Drake	University of Georgia	Data resources used by the CEID Coronavirus Working Group
Sang Woo Park	Princeton University	Line listing on South Korean data sets
Caitlin Rivers	Johns Hopkins University	Parameter estimates
Matthew Malishev	Emory University	Parameter estimates
Srini Venkatramanan	University of Virginia Biocomplexity Institute & Initiative	Surveillance and imported cases dashboards
Matteo Chinazzi	MOBS Lab	Parameter estimates
Shi Chen	University of North Carolina	Parameter estimates
Mauricio Santillana	Harvard University	Parameter estimates and data associated
Kaiyuan Sun	National Institutes of Health	Data resources
Shifu Chen	HaploX Biotechnology	Software resources

Data

Data Catalog

All data published in the repository can be found here. Data are uploaded by the MCC and by community members. Sets of related data files are presented as "collections". For example, a collection can be a set of related outbreak situation updates from a country, a set of time-stamped backup files, or another set of related files. Each collection has its own metadata, data guide, and location dictionary that maps geographic locations listed in the collection to international standards. Given the amount of information available globally, we will concentrate our efforts on listing computable datasets created by community members, instead of country-specific situation updates.

Parameter Estimates

Parameter estimates are stored in one CSV file with estimates for epidemiological parameters relevant for COVID-19 modeling. Estimates are extracted from a variety of sources including preliminary model reports, pre-prints, and peer-reviewed publications. For each parameter, metadata are also extracted. Parameter estimates are extracted by a team of curators from the MIDAS Coordination Center and community members. Parameter information is reviewed by corresponding authors before being posted. New parameter estimates can also be added by appending to the CSV file (see Information for Contributors).
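A minimal Python sketch of reading such a file, assuming columns named id, abbreviation, country, value, lower_bound, and upper_bound, and assuming that missing entries are written as "Unspecified". The sample rows are made up for illustration and are not taken from the repository; the real file has more columns.

```python
import csv
import io

# Made-up sample in the assumed layout; not real repository data.
SAMPLE = """id,abbreviation,country,value,lower_bound,upper_bound
example1,R0,China,2.5,2.0,3.1
example2,R0,Unspecified,Unspecified,Unspecified,Unspecified
example3,CFR,China,0.02,0.01,0.03
"""


def to_float(text):
    """Return a float, or None for the assumed 'Unspecified' placeholder."""
    return None if text == "Unspecified" else float(text)


def load_estimates(handle, abbreviation):
    """Yield rows for one parameter that have a usable point estimate."""
    for row in csv.DictReader(handle):
        if row["abbreviation"] != abbreviation:
            continue
        value = to_float(row["value"])
        if value is None:
            continue  # skip rows without a point estimate
        yield {
            "id": row["id"],
            "country": row["country"],
            "value": value,
            "lower_bound": to_float(row["lower_bound"]),
            "upper_bound": to_float(row["upper_bound"]),
        }


r0_rows = list(load_estimates(io.StringIO(SAMPLE), "R0"))
print(r0_rows)
```

To read an actual file, pass an open file handle instead of the in-memory sample.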

Software Tools

Software tools for data-processing, modeling, and visualizations will be included in this section together with relevant metadata. One CSV catalog file includes all tools, each on a separate row. New tools can be added by appending to the CSV file or by submitting an issue.

Documents

All documents relevant to COVID-19 are posted in this section. Documents are mostly organized by country or by topic. Pre-prints and peer-reviewed manuscripts are posted as links to external webpages only, while all other documents are also stored in this repository as collections in the documents folder. Add new documents by submitting an issue with the document information. Given the large number of COVID-19 papers being published, we will continue to update COVID-19 modeling papers only.

Information for Users

Many people have contributed to the creation of this repository and of its content. Please cite the creators of data or other content listed in the respective metadata if you use any of their contributions. Also cite this repository as the source for those contributions as per the following suggested citation: "Creators of object, name of object, Retrieved from: MIDAS 2019 Novel Coronavirus GitHub Repository, URL. Accessed date"

Contact Information

For any questions or comments related to this repository, submit an issue or contact the MIDAS Coordination Center.

MIDAS Coordination Center
University of Pittsburgh
A737 Public Health
130 DeSoto Street
Pittsburgh PA 15261
United States
Tel: +1 412-624-7693
Email: [email protected]


covid-19's Issues

Mention of historic influenza?

Source : https://en.wikipedia.org/wiki/Template:Notable_flu_pandemics

For R0 : https://en.wikipedia.org/wiki/Template:Notable_flu_pandemics#cite_note-4

Biggerstaff, Matthew; Cauchemez, Simon; Reed, Carrie; Gambhir, Manoj; Finelli, Lyn (2014-09-04). "Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: a systematic review of the literature". BMC Infectious Diseases. 14 (1): 480. doi:10.1186/1471-2334-14-480. ISSN 1471-2334. PMC 4169819. PMID 25186370.

visualisation for parameter estimates

Amazing work aggregating the parameter estimates, but for the casual viewer it would be nice to be able to /see/ these values. Perhaps something like this figure could be added to the README.

The code for this figure should be easy to adapt to the other parameters as well if you are interested.

library(dplyr)
library(ggplot2)

# Read the parameter estimates and keep the columns needed for the plot.
x <- read.csv("estimates.csv",
              stringsAsFactors = FALSE,
              header = TRUE) %>%
    select(id,
           peer_review,
           name,
           abbreviation,
           units,
           country,
           value,
           lower_bound,
           upper_bound,
           title_publication) %>%
    # Keep R0 estimates that report a value, both bounds, and a country.
    filter(abbreviation == "R0") %>%
    filter(lower_bound != "Unspecified",
           upper_bound != "Unspecified",
           value != "Unspecified",
           country != "Unspecified") %>%
    # Convert the estimate and its bounds to numbers for plotting.
    mutate(value = as.numeric(value),
           lower_bound = as.numeric(lower_bound),
           upper_bound = as.numeric(upper_bound))

# Order the estimate identifiers by increasing point estimate.
id_order <- x$id[order(x$value)]

nice_theme <- theme(
    panel.background = element_blank(),
    panel.grid.minor.y = element_blank(),
    axis.line = element_line(colour = "black"),
    axis.title = element_text(size = 22),
    axis.text = element_text(size = 16),
    plot.title = element_text(size = 32),
    plot.subtitle = element_text(size = 22),
    legend.background = element_rect(colour = "black"),
    legend.title = element_text(size = 22),
    legend.text = element_text(size = 16),
    legend.key = element_rect(fill = "white")
    )

plot_df <- x
plot_df$plot_id <- factor(plot_df$id, levels = id_order)
plot_df$plot_peer_review <- sapply(plot_df$peer_review, is.na)

ggplot(plot_df,
       aes(x = plot_id,
           y = value,
           ymin = lower_bound,
           ymax = upper_bound,
           colour = country,
           shape = plot_peer_review)) +
  geom_pointrange() +
    geom_hline(yintercept = 1,
               linetype = "dashed") +
    labs(x = "Estimate Identifier",
         y = "Estimate",
         title = "R-naught",
         subtitle = "Basic Reproduction Number",
         colour = "Country",
         shape = "Peer Review\nStatus") +
    coord_flip() +
    nice_theme


## scale_factor <- 2
## ggsave("demo.png",
##        height = scale_factor * 14.8,
##        width = scale_factor * 10.5,
##        units = "cm")


format change for Ontario Situation Updates breaks csv

As of 2020/03/25 17:30, commas are being used in the status table on the Ontario Ministry of Health's page, which breaks the CSV format (e.g., Cases_in_Ontario_2020-03-26.csv).

String quoting of CSV fields could be added to address this
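A minimal sketch of the quoting idea in Python, using the standard csv module. The table rows are made up for illustration; they are not the Ontario figures.

```python
import csv
import io

# Hypothetical status-table rows; one value contains a thousands separator.
rows = [
    ["Status", "Number"],
    ["Negative", "1,234"],
    ["Confirmed positive", "56"],
]

buffer = io.StringIO()
# QUOTE_MINIMAL wraps a field in double quotes only when it contains
# the delimiter, a quote character, or a line break.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()

# Reading the text back recovers the original fields, comma included.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(csv_text)
```

Any CSV reader that honors double-quoted fields will then treat "1,234" as a single value rather than two columns.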

date formatting issue in parameter estimates, estimates.csv

covid141,NA,medRxiv,ascertainment rate,NA,proportion,Japan,Unspecified,country,Unspecified,2020-28-02,Unspecified,0.44,confidence level

The date is formatted as YDM rather than YMD like all other dates (2020-28-02 instead of 2020-02-28).
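A check along these lines could flag such entries automatically. This is a sketch, not part of the repository's tooling, and the function name is hypothetical. Note that it can only detect a swap when the day is greater than 12; a value such as 2020-03-02 is a valid date either way.

```python
from datetime import date


def check_iso_date(text):
    """Classify a string as 'ok' (valid YYYY-MM-DD), 'swapped'
    (valid only when read as YYYY-DD-MM), or 'invalid'."""
    try:
        date.fromisoformat(text)
        return "ok"
    except ValueError:
        pass
    parts = text.split("-")
    if len(parts) == 3:
        year, first, second = parts
        try:
            # Retry with the last two fields exchanged.
            date(int(year), int(second), int(first))
            return "swapped"
        except ValueError:
            pass
    return "invalid"


for candidate in ["2020-02-28", "2020-28-02", "Unspecified"]:
    print(candidate, check_iso_date(candidate))
```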

Korean updates today

https://www.cdc.go.kr/board/board.es?mid=a20501000000&bid=0015&list_no=365794&act=view#

https://www.cdc.go.kr/board/board.es?mid=a20501000000&bid=0015&list_no=365805&act=view#

Routine checks

Regular/Daily Briefings:

China: http://www.nhc.gov.cn/yjb/s2907/new_list.shtml
Hubei Province: http://wjw.hubei.gov.cn/fbjd/dtyw/
Hong Kong: https://www.dh.gov.hk/english/press/press.html and https://www.dh.gov.hk/tc_chi/press/press.html
Thai MOPH: https://ddc.moph.go.th/viralpneumonia/eng/news.php (a Thai version is also available)
Macau: https://news.gov.mo/search?5
WHO sitreps: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports

22 Jan 2020 - Wuhan local health commission announced that all briefings would go through Hubei Province rather than the local commission.

Periodic:

WHO: https://www.who.int/csr/don/archive/disease/novel_coronavirus/en/
WHO general DON: https://www.who.int/csr/don/en/
CDC: https://www.cdc.gov/media/dpk/diseases-and-conditions/coronavirus/coronavirus-2020.html
WHO guidance: https://www.who.int/health-topics/coronavirus
promedmail.org

Singapore COVID-19 cases in clusters view

Hey folks,
I'm collecting the official news provided by the Singapore government. I did some text analysis and generated a cluster view of Singapore cases. I'm willing to contribute, learn more, and help generate graph-structured data during COVID-19. Please find the repo here.

https://github.com/lushl9301/Statistics-of-Singapore-COVID-19-Cases

I'm willing to adjust my data format in order to contribute to the data pool.

Thanks and Regards

Documentation/Conventions

Dates on files are publication dates
Date format: DD Mon YYYY - e.g., 22 Jan 2020
Use Google translate for machine translation.
Publish in English and original language.
Include a text file of PDF case reports and briefings when possible.
Files rather than links for briefings, publications etc. Include origin link to facilitate checking for updates.
Try to replace special characters in file names, i.e., write out "beta" etc.
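Two of these conventions can be sketched in Python. The function names and the small Greek-letter table are assumptions made for illustration; the conventions themselves only specify the date format and the idea of writing out special characters.

```python
from datetime import date

# Explicit English month abbreviations, so the result does not depend
# on the locale of the machine running the script.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical, deliberately short mapping of special characters.
GREEK_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma"}


def format_publication_date(d):
    """Format a date as DD Mon YYYY, e.g. 22 Jan 2020."""
    return f"{d.day:02d} {MONTHS[d.month - 1]} {d.year}"


def plain_file_name(name):
    """Write out known special characters in a file name."""
    return "".join(GREEK_NAMES.get(ch, ch) for ch in name)


print(format_publication_date(date(2020, 1, 22)))
print(plain_file_name("β_estimate.csv"))
```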

include data on testing

There is a collection of sources for data on testing at the following page:

https://ourworldindata.org/coronavirus-testing-source-data

If this sort of data could be included, it would be useful for understanding the observation process.

a small suggestion

On https://github.com/midas-network/COVID-19/blob/master/software_tools/software_catalog.csv,
maybe the column "URL_original" should be moved to be the 3rd column. Just my 2 cents. Feel free to close this without any change.

Activity status ?

Most of the content of parameter_estimates/2019_novel_coronavirus is from 2020-02-1X, mid-February. Did you move the effort elsewhere? @LucieContamin

License of the provided information?

Weird value of "death" variable in data/cases/global/

I noticed some weird values of the "death" variable in the line listing data. For example, the 27th case in Japan died on 02/13. The death variable is 1 in the files 2020_02_18_1800EST_linelist_NIHFogarty.csv and 2020_02_19_1800EST_linelist_NIHFogarty.csv, but becomes 1581552000 in all following files, from 2/20 to 3/16. About 50 cases have similar issues, each with a different strange number. Interestingly, based on my partial observation, the number is consistent for each case across files.

Is this weird number indicating a death? My guess is that this input is recorded as the death date but then transformed into an integer accidentally.
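The guess can be checked by reading the number as a Unix timestamp, that is, seconds since 1970-01-01 UTC. A short Python sketch, assuming the value is in seconds and in UTC (the function name is hypothetical):

```python
from datetime import datetime, timezone


def decode_epoch_seconds(value):
    """Interpret an integer as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


decoded = decode_epoch_seconds(1581552000)
print(decoded.isoformat())  # 2020-02-13T00:00:00+00:00
```

The result is midnight UTC on 13 Feb 2020, which matches the reported death date of 02/13 for this case and is consistent with a date having been converted to an integer.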

Metadata link broken

nCoV-2019 Situation Reports from Johns Hopkins University Center for Health Security (metadata)

The metadata link is leading to 404 error

Estimates: Cumulative case counts

The first entry in the estimates readme.md is 'Cumulative case counts'. My understanding is that these are total infections (including undetected ones).

First, these estimates have an 'expiration date' and a date on which the estimate was made. Also, what is being estimated varies widely across the papers. It would be better to have something like 'reporting rate' or 'detection rate', which is more useful for model builders.

How do we make live data available ?

I'm running an hourly-updated system producing future-prediction coefficients. The output is JSON format.

An example showing it in action is here: https://cryptinc.com/covid19/covid19_predictor.html

It would benefit from someone with JavaScript and Charting/Mapping skills turning that data into an interactive tool to help people see what is in their immediate future. So far, it's proving to be accurate to within 3% when looking forward a few days, with good accuracy on mid term predictions as well.

A dozen curated global sources feed the back ends.

20200927 Data is the Same as 20200926 Data

20200927 Data is the Same as 20200926 Data.
Both 9/26 and 9/27 have exactly the same data, ending at 9/25/20.
9/27 is missing new data for 9/26/20.
Please review.
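Repeats like this could be caught by comparing content hashes of consecutive snapshots. A minimal Python sketch, in which the snapshot names and contents are hypothetical placeholders rather than the actual repository files:

```python
import hashlib


def digest(content):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(content).hexdigest()


def find_repeated_snapshots(snapshots):
    """Return pairs of consecutive snapshot names (in sorted order)
    whose contents are byte-for-byte identical."""
    names = sorted(snapshots)
    return [
        (earlier, later)
        for earlier, later in zip(names, names[1:])
        if digest(snapshots[earlier]) == digest(snapshots[later])
    ]


# Hypothetical in-memory snapshots keyed by date stamp.
snapshots = {
    "20200925": b"date,cases\n9/24/20,10\n",
    "20200926": b"date,cases\n9/24/20,10\n9/25/20,12\n",
    "20200927": b"date,cases\n9/24/20,10\n9/25/20,12\n",
}
repeats = find_repeated_snapshots(snapshots)
print(repeats)
```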

Please add this paper on "the role of absolute humidity on transmission rates of the COVID-19" to your repository

https://www.medrxiv.org/content/10.1101/2020.02.12.20022467v1

Thank you!

Why don't you put Hong Kong in the tie-in section in China?

A couple papers with parameter estimates for "time from symptom onset to hospitalization"

The following papers contain parameter estimates for days from symptom onset to hospitalization that could be added to the dataset here:

I'm new to GitHub, so apologies in advance if this is not the right channel or format to raise this.

Dispersion rate for Johns Hopkins under wrong category

The dispersion in the incubation period is listed under the "Dispersion" category, which I believe is meant to show dispersion in transmission.

Any updates for cases/global data?

Would be good!
Cheers!

Question: Oxford & Fogarty line listings

The Oxford set has the most line listings but is not current. Is there another source where a newer version can be found, or is it no longer being maintained?
The NIH Fogarty line listings contain 1400 rows. Is there an explanation somewhere of what this subset of cases consists of?

Thank you very much

Hey guys, thanks for the resource and may I contribute a model overview

I've been doing my best tracking online modeling efforts here:

https://github.com/vejmelkam/covid19-models

Maybe it would be useful as a reference, or just copy it here if interested. So far I was unfortunately unsuccessful in getting any help mapping modeling efforts in different languages (other than English, Czech, Slovak that is).

incubation period estimates need metadata about what the "value" is

As it is, it's not clear what the value for any of the estimates represents (I was looking in particular at the incubation period): whether it is the mean or the median of a distribution, or something else.
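To illustrate why the distinction matters: for a right-skewed distribution the mean and the median can differ noticeably. The Python sketch below assumes a log-normal incubation period, which is one common modeling choice but not necessarily the one used by any given paper, and the parameter values are made up.

```python
import math


def lognormal_median(mu, sigma):
    """Median of a log-normal distribution with log-scale parameters mu, sigma."""
    return math.exp(mu)


def lognormal_mean(mu, sigma):
    """Mean of the same distribution: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2)


# Hypothetical parameters: a median of 5 days with sigma = 0.5 on the log scale.
mu, sigma = math.log(5.0), 0.5
median_days = lognormal_median(mu, sigma)
mean_days = lognormal_mean(mu, sigma)
print(f"median = {median_days:.2f} days, mean = {mean_days:.2f} days")
```

With these made-up parameters the mean exceeds the median by more than half a day, so an unlabeled "value" of 5 could describe two different distributions.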

Source of parameters and citations

General question:

Are these parameters being sourced on a pull request basis or is someone actively reading through the literature to get the estimates?

Because over the past few days, I've seen a few China CDC reports which are very comprehensive. It would be useful to include them if they're not already there. For some of these estimates, having an 'N' (number of patients) will be useful and somewhat essential when we try to use the most reliable one.

Finally, I am more comfortable with seeing 'XYZ et al.' as a way of citing them rather than the University (since a lot of the research is collaborative).

some updates on NSSAC dashboards

On https://github.com/midas-network/COVID-19/blob/master/software_tools/software_catalog.csv
can you help change creator_contact for both of NSSAC dashboards to [email protected]?

If you don't mind, please update the version for our surveillance dashboard to 0.8.6.