
covid-19-data's Introduction

COVID-19 Dataset by Our World in Data


📢 Find our data on COVID-19 and its documentation in public/data!


Project structure

The project contains two independent directories:

  • public/data: Contains the final datasets. This is for people interested in consuming the data and understanding all the caveats about it and its metrics.
  • scripts: Contains all the code and intermediate files to produce the final dataset. This is for people interested in contributing to the project or better understanding our internal technical processes.

Documentation

If you are interested in the final dataset file, refer to this document. If you want to learn more about our processes, refer to our technical documentation.

Contribute

Thanks for considering contributing to this project! A good place to start is our contribution guideline.

covid-19-data's People

Contributors

3dgiordano, aywi, bdaniel88, bnjmacdonald, breck7, camappel, cgiattino, covid19owid, damiantaranto, danielgavrilov, davidc8, dependabot[bot], edomt, fqj1994, gugod, hannahritchie, jozhuatwx, kokes, lucasrodes, marcelgerber, marigold, menoua, minyoh, nipund, owidbot, petchumpriwan, phuclygia, shmugoh, valentinmouret, waddey


covid-19-data's Issues

Negative new cases

I would like to know what negative figures in the new cases column mean. Is it a mistake, or does it mean something else?
As an example, the new cases figure for Spain on April 19th is -1430.
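
One way a negative value can arise, offered here as an assumption rather than a confirmed explanation for the Spain figure, is that daily new cases are computed as the difference between consecutive cumulative totals. If a reporting authority revises its cumulative total downward, that difference becomes negative. A minimal sketch with made-up totals:

```python
def daily_from_cumulative(totals):
    """Return day-over-day differences of a cumulative series."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


# Hypothetical cumulative totals; the third value is revised downward.
totals = [1000, 1500, 1400, 1900]
new_cases = daily_from_cumulative(totals)
print(new_cases)  # [500, -100, 500]
```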

Adding Active Cases

There are numerous stats on cases and deaths, but why not active cases? Active cases data would be useful for showing which countries are improving or have passed their peak.

Spain 4/30/2020

It looks like the Spain data for 4/30/2020 is missing. Is it available?

Serbia data mismatch on 4/13/2020

I am using your data on https://mda-covid-19.appspot.com/ and I spotted a mismatch on 4/13/2020 and after in your full_data.csv file. According to your file:

date        location  new_cases  new_deaths  total_cases  total_deaths
2020-04-13  Serbia    250        6           3630         80
2020-04-14  Serbia    0          0           3630         80
2020-04-15  Serbia    424        4           4054         84
2020-04-16  Serbia    819        15          4873         99

According to the Serbian Government, the data should be:

date        location  new_cases  new_deaths  total_cases  total_deaths
2020-04-13  Serbia    250        6           3630         80
2020-04-14  Serbia    424        5           4054         85
2020-04-15  Serbia    411        9           4465         94
2020-04-16  Serbia    408        5           4873         99

Please fix.
Thank you very much
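
A quick way to pin down exactly where the two tables above disagree is to compare them field by field. The sketch below copies the rows from this report; the names `owid`, `government`, and `mismatches` are illustrative only and are not part of the project's scripts.

```python
FIELDS = ("new_cases", "new_deaths", "total_cases", "total_deaths")

# Rows copied from the two tables in this report, keyed by date.
owid = {
    "2020-04-13": (250, 6, 3630, 80),
    "2020-04-14": (0, 0, 3630, 80),
    "2020-04-15": (424, 4, 4054, 84),
    "2020-04-16": (819, 15, 4873, 99),
}
government = {
    "2020-04-13": (250, 6, 3630, 80),
    "2020-04-14": (424, 5, 4054, 85),
    "2020-04-15": (411, 9, 4465, 94),
    "2020-04-16": (408, 5, 4873, 99),
}


def mismatches(left, right):
    """Map each shared date to the names of the fields whose values differ."""
    result = {}
    for date in sorted(left.keys() & right.keys()):
        differing = [
            name
            for name, a, b in zip(FIELDS, left[date], right[date])
            if a != b
        ]
        if differing:
            result[date] = differing
    return result


for date, fields in mismatches(owid, government).items():
    print(date, fields)
```

On these rows the totals agree again on 2020-04-16, and only the daily figures differ on that date.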

Request for daily or moving average of Number of COVID-19 tests per confirmed case

The data for the (excellent) Number of COVID-19 tests per confirmed case is averaged since the "beginning of records" (#45).

As time goes on, if a country has an outbreak in cases, this average will take a long time to move (#46). Whilst this data is valuable, another version of the data that shows a daily or 7-day moving average over the data has the opportunity to inform debate and represent the situation in a timely and responsive manner.

Would it be possible to have a graph showing this daily or 7-day average? Or indeed extend the existing graph with a second slider to select the max days to average over?
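
To illustrate the difference between the two measures, here is a minimal sketch comparing a since-the-beginning average with a trailing 7-day ratio. The daily figures are hypothetical, and the function names are illustrative rather than anything used by the charts.

```python
def cumulative_ratio(tests, cases):
    """Tests per confirmed case averaged since the first record."""
    return sum(tests) / sum(cases)


def rolling_ratio(tests, cases, window=7):
    """Tests per confirmed case over each trailing `window` days.

    Returns one value per full window; None when the window has no cases.
    """
    ratios = []
    for end in range(window, len(tests) + 1):
        window_tests = sum(tests[end - window:end])
        window_cases = sum(cases[end - window:end])
        ratios.append(window_tests / window_cases if window_cases else None)
    return ratios


# Hypothetical daily figures: a quiet week followed by an outbreak.
daily_tests = [1000] * 14
daily_cases = [10] * 7 + [100] * 7

print(cumulative_ratio(daily_tests, daily_cases))   # about 18.2
print(rolling_ratio(daily_tests, daily_cases)[-1])  # 10.0
```

With these numbers the rolling ratio falls from 100 to 10 once the outbreak fills the window, while the cumulative average only reaches about 18.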

I appreciate this is the data repo so not the right place for this suggestion. If you could direct me to the correct place for these suggestions I'd be grateful. Thank you very much for your hard work, excellent resource and your consideration of this suggestion.

make US states data available

The full covid-19 dataset is aggregated at the country level. I would like to access the data at the state level in the U.S. as well. Can these data please be made available for download either as additional cases in the full dataset, or as a separate file?

Chronological data?

Hi, It would be super useful to have historical data over time in addition to the current cumulative number plus daily change. Not sure if you have that data or not but it would be very powerful. Thanks!

Vietnam changed its testing data location

On 12 April, the website on coronavirus by Vietnamese MoH was updated, making the number of tests no longer appear when first loading.
Actually they are still providing this number, just not very obvious. We'll have to click the "Infographic Việt Nam" (red button) lower in the page to see the testing number. As of morning 15 April, this has increased to 132,771 people tested.
The source of the image is here (it's also by the MoH, not sure why they post it on a second website)

Population of Guernsey

Not sure why this value is blank in the data. Gugu suggests the value of 67,052 from the CIA world factbook.

I found a value of 97,857 for Jersey.

The first case and death in the input are not reflected in new_deaths and new_cases in full_data.csv

Examples:

  • Philippines, 2020-02-02

From deaths.csv: 1 new death

Entry in full_data.csv:

date        location     new_cases  new_deaths  total_cases  total_deaths
2020-02-02  Philippines  1          (blank)     2            1

  • Singapore, 2020-01-24

From cases.csv: 1 new case

Entry in full_data.csv:

date        location   new_cases  new_deaths  total_cases  total_deaths
2020-01-24  Singapore  (blank)    (blank)     1            (blank)

  • Vietnam, 2020-01-24

From cases.csv: 2 new cases

Entry in full_data.csv:

date        location  new_cases  new_deaths  total_cases  total_deaths
2020-01-24  Vietnam   (blank)    (blank)     2            (blank)
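
The behaviour expected in the examples above can be sketched as follows: when daily counts are derived from cumulative totals, the first observation is compared against an implicit zero instead of being left blank. This is an illustration under that assumption, not the project's actual script, and the values after the first one are hypothetical.

```python
def new_from_cumulative(totals):
    """Derive daily counts from cumulative totals.

    The first day is compared against an implicit zero, so a series that
    starts at 2 yields a first daily value of 2 rather than a blank.
    """
    previous = 0
    daily = []
    for total in totals:
        daily.append(total - previous)
        previous = total
    return daily


# First value matches the Vietnam example; later values are hypothetical.
print(new_from_cumulative([2, 2, 5]))  # [2, 0, 3]
```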

Portugal Cases Tested Source

Hi @edomt ,
I am from Portugal and I am building a dashboard in PowerBI about this pandemic. I am interested in showing a chart of total cases vs. total tests in Portugal, but I couldn't find that information in the GitHub source I am using (DSSG-PT). I found that your CSV file "covid-testing-latest-data-source-details.csv" contains tests for Portugal, and it cites the same source from which I am retrieving the data for Portugal: "Portugal - cases tested 2020-04-21 https://github.com/dssg-pt/covid19pt-data/blob/master/data.csv". However, I couldn't find that information in that CSV. Could you please help me?

Best regards,
Fábio Barnabé

Germany has delta 0 on 2020-04-03 in ECDC source data and on OWID

Hi,

first of all, thanks for your work and making everything available to the public!

For 2020-04-03, ECDC reports new cases and new deaths for Germany as 0,
on https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/. This may
or may not be a problem. At least for the following day (2020-04-04), they report
such a delta that it's 79696 total cases, 1017 total deaths on OWID and, currently,
on their website.
This corresponds to WHO Situation Report 74, which is for 2020-04-03, the day before!
For 2020-04-04, the WHO has Situation Report 75,
which has 85778 total cases, 1158 total deaths, which is exactly
what the German Robert Koch Institut (RKI) is (currently) reporting
(on their website).

For other locations, ECDC and WHO SR 75 are exactly in line
(Italy, Spain, Switzerland, Turkey, Belgium, Netherlands, possibly more..)
or are largely similar (UK off by 4 total cases, total deaths exact in line);
ok, some seem to differ (France, China, ...). But Germany has numbers
exactly as in SR 74. So, effectively a delay of 1 day was introduced.

This should not be due to RKI data being updated too late.
(Website linked above says data from 2020-04-04 00:00 (CEST),
updated on the web page at 10:10 (CEST, so 9:10 CET).
WHO and ECDC both say their data is from 10:00 CET,
so this should be in time.)
But somehow the official RKI data about Germany get into WHO reports,
but not into ECDC (timely), it seems.

Is this something that ought to be fixed? (I haven't tried to contact ECDC.)
Or is this something expected / everything ok?

Thanks,
Fabian

Incorrect UK daily testing numbers

The UK daily testing numbers cannot simply be derived from the cumulative totals (by finding the difference from the previous day's total), since the total incorporates revisions that may not just apply to the previous day.

For example, on 25 April the notes say:

The difference between the cumulative numbers from today and yesterday for people tested is 50,499 higher than the daily increase figure. Cumulative testing figures include 50,499 retrospective reports of people who tested negative between 31 January and 24 April.

The page from which the testing numbers are collected (https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public#number-of-cases-and-deaths), includes daily totals, so they can be directly added to the data in this repository.

I have been maintaining a repository of the raw HTML for the source page, collected every day (see coronavirus-covid-19-number-of-cases-in-uk-*.html in https://github.com/tomwhite/covid-19-uk-data/tree/master/data/raw), which can be used to get the corrected numbers for daily people tested.

I pulled out the correct figures for the dates that need fixing here:

Date                  DailyPeopleTested
2020-04-08            12959
2020-04-10            13543
2020-04-13            10745
2020-04-20            14106
2020-04-25            23115
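
The arithmetic behind the 25 April note can be made concrete as follows. The daily and retrospective figures come from the note and the table above; the cumulative total for 24 April is a made-up placeholder, since only the difference matters.

```python
# Figures from the 25 April note and the table of corrected values.
daily_people_tested = 23_115
retrospective_reports = 50_499

# Hypothetical cumulative total for 24 April, used only to make the
# arithmetic concrete.
cumulative_24_april = 500_000
cumulative_25_april = (
    cumulative_24_april + daily_people_tested + retrospective_reports
)

# Differencing the cumulative totals overstates the daily figure by
# exactly the number of retrospective reports.
naive_daily = cumulative_25_april - cumulative_24_april
print(naive_daily)                        # 73614
print(naive_daily - daily_people_tested)  # 50499
```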

Historical data explanation request

I pulled the history of the updates to the data file public/data/owid-covid-data.csv and compiled them all into one large file so I can compare changes in counts for a given date as newer reports come in (like I did for the NYC data - https://www.linkedin.com/feed/update/urn:li:activity:6658517451480805376/). The results are in the file public/data/history.csv in my fork of your repository (https://github.com/hjstein/covid-19-data).

In looking at the United States data, I noticed that the reports all have the same number for each date. For example, the data shows 30,613 new_cases for 4/8 in every update from 4/16 through the present. I'm surprised by this because when I did the same for the NYC data, I found the reported new cases for 4/8 kept increasing as newer reports came in (data and analysis available at https://github.com/hjstein/coronavirus-data). So, I would have thought if the counts for NYC for 4/8 aren't fully known until about 3 weeks later, then the totals here for the USA should be getting revised in later reports as well.

So, my question is, how is the new_cases count being calculated, and where is the data coming from?
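
For readers who want to run the same kind of comparison, here is a minimal sketch of detecting revisions across report snapshots. The data layout (`snapshots` mapping a report date to per-date counts) is an assumption for illustration and is not the format of history.csv; only the 30,613 figure comes from this report, and the other values are hypothetical.

```python
def revised_dates(snapshots):
    """Return the data dates whose value changed between report snapshots.

    `snapshots` maps a report date to {data date: new_cases}.
    """
    seen = {}
    revised = set()
    for report_date in sorted(snapshots):
        for date, value in snapshots[report_date].items():
            if date in seen and seen[date] != value:
                revised.add(date)
            seen[date] = value
    return sorted(revised)


snapshots = {
    # 2020-04-08 stays at 30,613; the 2020-04-15 values are hypothetical.
    "2020-04-16": {"2020-04-08": 30613, "2020-04-15": 100},
    "2020-04-17": {"2020-04-08": 30613, "2020-04-15": 120},
}
print(revised_dates(snapshots))  # ['2020-04-15']
```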

Thanks.

Clarify data range for Number of COVID-19 tests per confirmed case

Regarding #45, I think it would reduce confusion to add a footnote saying that the data is "always averaged from the beginning of records", or some wording like that. When you select a smaller range of time, the title says, for example:

Number of COVID-19 tests per confirmed case, May 6, 2020 to May 7, 2020

But the data is actually an average from a larger range.

Thank you again for this great resource and for your consideration of this proposed change.

P.S. I just realised this is the data repo, so not the right place for this suggestion. If you could direct me to the correct place for these suggestions I'd be grateful. Thank you.

broken link

The link to "Data on COVID-19 maintained by Our World in Data" in the title seems to be broken.

Recovered and active cases

It would be useful to have a graph of people who are currently infected. There are stats on deaths and cases, but unfortunately no data on those who have recovered.
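
If recovery counts were available, the number of currently infected people could be estimated with simple arithmetic. The sketch below assumes a `total_recovered` figure, which this report notes is not in the data, and uses hypothetical values.

```python
def active_cases(total_cases, total_deaths, total_recovered):
    """Estimate active cases as confirmed cases minus deaths and recoveries."""
    return total_cases - total_deaths - total_recovered


# Hypothetical figures; total_recovered is not part of the dataset.
print(active_cases(total_cases=1000, total_deaths=50, total_recovered=300))  # 650
```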

Data on chart not showing properly for single dates data like Singapore

Hi,

I noticed that in the testing charts, for countries where you only have data for certain dates (for example, Singapore on 7 April 2020), the country is only shown on the chart if you adjust the end date to on or before that date. If the end date is left at the current date (14 April as of today), Singapore will not appear on the chart. I noticed the same for a few other countries, but I do not recall which ones now.

Weather?

Hi,

Can you add monthly average temperature and humidity for each country for the given month?

Many thanks,

CFR?

Hello,

Where can I get CFR figures per geographic region?

Many thanks,
Dan.
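
Where a ready-made figure is not available, the case fatality rate can be computed from the `total_deaths` and `total_cases` columns as confirmed deaths divided by confirmed cases. A minimal sketch, using the Serbia totals for 2020-04-16 quoted in an earlier issue on this page; the function name is illustrative.

```python
def case_fatality_rate(total_deaths, total_cases):
    """Return confirmed deaths as a percentage of confirmed cases.

    Returns None when there are no confirmed cases.
    """
    if not total_cases:
        return None
    return 100 * total_deaths / total_cases


# Serbia, 2020-04-16: 99 total deaths, 4873 total cases.
print(round(case_fatality_rate(99, 4873), 2))  # 2.03
```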

Columns instead of Rows for reporting historical data by County, State

Can you modify the script to report the cases in columns instead of rows for historical data? It makes it easier to work with for data extraction. Transposing them in Excel or other software is extremely time-consuming for data analytics and this will save a considerable amount of time when working with the dataset with other tools.
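
Until such an option exists, the reshaping can be done outside Excel in a few lines. The sketch below pivots long-format rows into one row per location and one column per date; the input layout and the sample values are hypothetical.

```python
def long_to_wide(rows):
    """Pivot (date, location, value) rows into one row per location.

    The result is a list of lists: a header row followed by one row per
    location, with an empty string where a date has no value.
    """
    dates = sorted({date for date, _, _ in rows})
    table = {}
    for date, location, value in rows:
        table.setdefault(location, {})[date] = value
    header = ["location"] + dates
    body = [
        [location] + [table[location].get(date, "") for date in dates]
        for location in sorted(table)
    ]
    return [header] + body


# Hypothetical long-format rows.
rows = [
    ("2020-04-13", "A", 1),
    ("2020-04-14", "A", 2),
    ("2020-04-13", "B", 3),
]
for line in long_to_wide(rows):
    print(line)
```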

Obj definition

Hi
Thanks for the work. Could you help by defining your keys?

An example:

"iso_code": "ITA",
"location": "Italy",
"date": "2020-01-05",
"total_cases": 0,
"new_cases": 0,
"total_deaths": 0,
"new_deaths": 0,
"total_cases_per_million": 0,
"new_cases_per_million": 0,
"total_deaths_per_million": 0,
"new_deaths_per_million": 0,
"total_tests": null,
"new_tests": null,
"total_tests_per_thousand": null,
"new_tests_per_thousand": null,
"tests_units": "",
"population": 60461828,
"population_density": 205.859,
"median_age": 47.9,
"aged_65_older": 23.021,
"aged_70_older": 16.24,
"gdp_per_capita": 35220.084,
"extreme_poverty": 2,
"cvd_death_rate": 113.15100000000001,
"diabetes_prevalence": 4.78,
"female_smokers": 19.8,
"male_smokers": 27.8,
"handwashing_facilities": null,
"hospital_beds_per_100k": 3.18

What do the following mean?

tests_units

"extreme_poverty": 2, (what is 2?)

"handwashing_facilities": null, What is it? What do you mean?

"hospital_beds_per_100k": 3.18 This can be dangerous data if we don't say what beds for what hospital division

"gdp_per_capita": 35220.084, what is the value, million, thousand?

"cvd_death_rate": 113.15100000000001, what value is this rate?

Basically we'd need some definition on all the keys to better understand and work with this data

Thanks a lot

Lebanon Data Offset by One

Lebanon data is offset by one day in OWID. The first case was reported on Feb 21, not Feb 22. This shifts the plots and specifically affects the current day, which shows the data from the previous day.

Test Counts?

I was under the impression from your web site that the dataset included a Test Count column by date for each location. I can't find test counts anywhere in the downloaded data. Can you direct me to it please?

ISO3 in testing data

Hi, thanks so much for your work, it's an excellent comprehensive resource.

The testing data (csv) only lists countries by name - is it possible for you to add ISO3 codes to this file? It would make joining with other data much simpler for users!

Thanks :-)
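
As a stopgap, users can attach codes themselves by looking up each country name in a name-to-ISO3 mapping, for example one built from the `location` and `iso_code` fields of the main dataset. In the sketch below the `location` key for testing rows is an assumption, and the sample rows are hypothetical.

```python
def add_iso_codes(testing_rows, iso_by_location):
    """Return testing rows with an `iso_code` key looked up by country name.

    Names without a match get None so they can be reviewed by hand.
    """
    return [
        {**row, "iso_code": iso_by_location.get(row["location"])}
        for row in testing_rows
    ]


# Lookup built from (location, iso_code) pairs.
iso_by_location = {"Italy": "ITA"}

# Hypothetical testing rows; the second name has no match on purpose.
testing_rows = [
    {"location": "Italy", "total_tests": 1000},
    {"location": "Atlantis", "total_tests": 5},
]
for row in add_iso_codes(testing_rows, iso_by_location):
    print(row)
```

Returning None for unmatched names keeps spelling differences between files visible instead of silently dropping rows.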

Spain missing 1 day vs. other countries

Spain data is missing 1 day vs. all other countries. The issue started about 3 days ago, was briefly fixed, and is now back. Currently, for example, all countries have data up until April 30, while Spain only has data up until April 29. This is an issue for dashboards that rely on OWID data.

/covid-19-data/blob/master/public/data/testing/covid-testing.xlsx

Hi,

I'm trying to import this Excel file using the raw URL (https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/testing/covid-testing.xlsx) in PowerBI Desktop, and it reports a data file format error.

If I try to import the file locally I get the same error, but if I use "Save as" to save the file as a new file with the same data, it works.

It seems that the file has an invalid format. Could you please save the file again and re-upload it?

Maintain selected countries across graphs

Feature:

I suggest implementing the following:

  • 1. The set of selected countries is kept across all graphs.
    • This is the default.
  • 2. Allow a split mode for users who want a separate selection per graph.
    • Disabled by default.
    • Consider whether and when to implement this (less useful and less important than no. 1).

Implementation:

This could store the state in cookies or in URL redirects/parameters.
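
The URL-parameter option could look roughly like the sketch below, which round-trips a country selection through a query string. The parameter name `country`, the `~` separator, and the example.org address are all hypothetical choices for illustration, not the site's actual scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def url_with_countries(base_url, countries):
    """Append the selected countries to `base_url` as one query parameter."""
    return f"{base_url}?{urlencode({'country': '~'.join(countries)})}"


def countries_from_url(url):
    """Recover the selected countries from a URL built by `url_with_countries`."""
    values = parse_qs(urlsplit(url).query).get("country", [""])
    return [code for code in values[0].split("~") if code]


# Hypothetical chart address and selection.
url = url_with_countries("https://example.org/chart", ["ITA", "ESP"])
print(countries_from_url(url))  # ['ITA', 'ESP']
```

Carrying the selection in the URL means it survives navigation between graphs and can be shared as a link, which cookies alone would not allow.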
