covid-tracking-data's Introduction

As of March 7, 2021 we are no longer collecting new data. Learn about available federal data.


COVID Tracking Data

This repository contains archives of a variety of data: backups of COVID Tracking Project data and archives of government data.


COVID Tracking Project Backups

Do not use this repository to download or display COVID Tracking Project data. Use the COVID Tracking API instead.

This repository is updated hourly with CSV backups of data from the COVID Tracking API - see link for details on each field.

For information about the project and how this data is collected, see the COVID Tracking Project website and Twitter account.

COVID Data Archives

This repository also contains other archives of government COVID data that are not COVID Tracking Project datasets. See the data/ directory for a list of archived files and sources.

covid-tracking-data's People

Contributors

actions-user, carllcchen, dependabot[bot], gilmourj, gitjeff05, hmhoffman, jalbertbowden, jasonlcrane, jesseandersonumd, joshzarrabi, julia326, kevee, muamichali, schwartzadev, smike, space-buzzer, the-daniel-lin, theomichel, zachlipton

covid-tracking-data's Issues

importing state 'AS' has different header order than other states.

=IMPORTDATA("http://covidtracking.com/api/states/daily.csv?state=AS") gives:
date | state | positive | negative | hash | dateChecked | total | totalTestResults | posNeg | fips | deathIncrease | hospitalizedIncrease | negativeIncrease | positiveIncrease | totalTestResultsIncrease | pending | hospitalizedCurrently | hospitalizedCumulative | inIcuCurrently | inIcuCumulative | onVentilatorCurrently | onVentilatorCumulative | recovered | death | hospitalized |  

All other states give this header order:
=IMPORTDATA("http://covidtracking.com/api/states/daily.csv?state=AK") gives:
date | state | positive | negative | pending | hospitalizedCurrently | hospitalizedCumulative | inIcuCurrently | inIcuCumulative | onVentilatorCurrently | onVentilatorCumulative | recovered | hash | dateChecked | death | hospitalized | total | totalTestResults | posNeg | fips | deathIncrease | hospitalizedIncrease | negativeIncrease | positiveIncrease | totalTestResultsIncrease |  
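
Until the ordering is consistent, one possible workaround is to read columns by name rather than by position. The following is a minimal sketch using only Python's standard library; the two CSV strings are made-up, abbreviated stand-ins for the headers above (not real API output), and `rows_by_name` is a hypothetical helper name.

```python
import csv
import io

def rows_by_name(csv_text):
    """Parse CSV text into dicts keyed by header name, so column order is irrelevant."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical, abbreviated samples: same fields, different column order.
as_csv = "date,state,positive,negative,hash,death\n20200401,AS,0,3,abc,0\n"
ak_csv = "date,state,positive,negative,death,hash\n20200401,AK,133,4470,3,def\n"

for text in (as_csv, ak_csv):
    for row in rows_by_name(text):
        # Each field is looked up by name, whatever position it had in the file.
        print(row["state"], row["positive"], row["death"])
```

This does not help inside a spreadsheet using IMPORTDATA, where columns are addressed by position, but it may be useful for scripted imports.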

Field descriptions

I apologize if they're around somewhere but I haven't been able to find any field descriptions. Most of them are easy enough to figure out but I'm not sure what hash, pui, or pum are.

MA death data seems odd

Hi,

MA death data

2020-03-22 -> 5
2020-03-21 -> 51

My understanding is that all numbers are cumulative, so the MA numbers can't be right.
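
A check like this can be scripted. The sketch below assumes the series is available as (date, value) pairs with ISO-formatted date strings; `find_decreases` is a hypothetical helper, and the only data used is the two MA values quoted above.

```python
def find_decreases(series):
    """Return (earlier_date, later_date, earlier_value, later_value) for every
    step where a supposedly cumulative series goes down.

    `series` is a list of (date_string, value) pairs in any order; ISO dates
    (YYYY-MM-DD) sort correctly as plain strings.
    """
    ordered = sorted(series)
    return [
        (d0, d1, v0, v1)
        for (d0, v0), (d1, v1) in zip(ordered, ordered[1:])
        if v1 < v0
    ]

# The two MA death values reported in this issue.
ma_deaths = [("2020-03-22", 5), ("2020-03-21", 51)]
print(find_decreases(ma_deaths))
```

An empty result would mean the series never decreases; here the single step from 51 down to 5 is reported.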

100% positive tests

It seems there are a few (but not an insignificant number of) dates on which the number of positive cases in a state equals the number of total tests. This is likely an error, especially for the more recent dates. If you have access to the original webpages from which this data was collected, I would be happy to extract the correct info.

Looking at dates more recent than April 1st, here is a list of states and dates with such anomalous data:
AL ['2020-04-12', '2020-04-11', '2020-04-10', '2020-04-07']
AZ ['2020-04-05']
CA ['2020-04-13', '2020-04-06']
CO ['2020-04-11']
CT ['2020-04-08']
DE ['2020-04-12', '2020-04-08', '2020-04-06']
HI ['2020-04-07', '2020-04-03']
IA ['2020-04-14']
KS ['2020-04-08']
KY ['2020-04-11', '2020-04-09']
ME ['2020-04-14', '2020-04-12', '2020-04-11', '2020-04-10', '2020-04-09', '2020-04-08', '2020-04-07', '2020-04-06', '2020-04-05', '2020-04-04', '2020-04-03', '2020-04-02']
NJ ['2020-04-13']
NM ['2020-04-11']
OK ['2020-04-13', '2020-04-12', '2020-04-08']
OR ['2020-04-10', '2020-04-04']
MI ['2020-04-09', '2020-04-08']
MS ['2020-04-12', '2020-04-11', '2020-04-10', '2020-04-09', '2020-04-08', '2020-04-07', '2020-04-03']
MO ['2020-04-13']
RI ['2020-04-13', '2020-04-09']
SC ['2020-04-13', '2020-04-06', '2020-04-03']
UT ['2020-04-02']
VT ['2020-04-10']
WA ['2020-04-14', '2020-04-13', '2020-04-12', '2020-04-11', '2020-04-10', '2020-04-09', '2020-04-08', '2020-04-07']
WY ['2020-04-14', '2020-04-12']

Code snippet for the output above:

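import numpy as np
import pandas as pd

# `file` is the path to the states daily CSV; `abbrev_list` is the list of state abbreviations.
df = pd.read_csv(file, index_col=0, parse_dates=True)
cut_off = pd.Timestamp(year=2020, month=4, day=1)
for st in abbrev_list:
    st_data = df[df['state'] == st]
    # Positions where the daily increase in positives equals the daily increase in total test results
    ind = np.where(st_data['positiveIncrease'] == st_data['totalTestResultsIncrease'])[0]
    dates = [str(d.date()) for d in st_data.index[ind] if d > cut_off]
    if len(dates) > 0:
        print(st, dates)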

Columns where Current > Cumulative

In the US historic-data:

  • Every entry in the onVentilatorCumulative is much smaller than the corresponding entry in onVentilatorCurrently.
  • inIcuCumulative < inIcuCurrently for 3/26 - 5/28.
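
Rows like these can be flagged automatically. The following is a rough sketch, assuming each row is a dict of strings as produced by a CSV reader; `current_exceeds_cumulative` is a hypothetical helper and the sample rows are invented for illustration, not taken from the dataset.

```python
def current_exceeds_cumulative(rows, current_key, cumulative_key):
    """Return the rows whose 'current' count is larger than the 'cumulative' one.

    Rows where either value is missing (None or empty string) are skipped.
    """
    flagged = []
    for row in rows:
        cur, cum = row.get(current_key), row.get(cumulative_key)
        if cur in (None, "") or cum in (None, ""):
            continue
        if int(cur) > int(cum):
            flagged.append(row)
    return flagged

# Made-up rows, for illustration only.
sample = [
    {"date": "20200401", "inIcuCurrently": "120", "inIcuCumulative": "80"},
    {"date": "20200601", "inIcuCurrently": "90", "inIcuCumulative": "400"},
    {"date": "20200315", "inIcuCurrently": "", "inIcuCumulative": "10"},
]
for row in current_exceeds_cumulative(sample, "inIcuCurrently", "inIcuCumulative"):
    print(row["date"])
```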

What happened to US county tracking data?

I found this project, which used to contain county data but now links to a nonexistent county API download. Is county data being brought back, or is it deprecated? I noticed there is no mention of counties on the API homepage.

Suggestion: automate this with GitHub Actions

Is this update script automated at the moment or is it being run manually? From the commit history it looks like it might be manual.

I've run several projects similar to this using automated scripts running in Circle CI or (more recently) GitHub Actions. I would be happy to submit a pull request to configure an hourly or daily scheduled task in Actions if that would be useful.

Data not updating

Hi,

Is there a reason the Google Docs spreadsheet contains data as of the 29th, but the data in this repo is only as of the 27th? Ideally we'd love to be using the most current source, but the GitHub repo is the easiest to work with from an automation perspective. So I'm wondering whether we need to switch to the Google Docs spreadsheet, or whether there is a plan to make sure the GitHub repo stays current.

Thanks.

IN screenshots get confused by a dialog

On the Indiana site, it pops up a dialog asking you to subscribe to a newsletter. This confuses the screenshots, and the very bottom of the content gets cut off as a result. Currently there are no numbers we use there, but it would still be helpful to have this.

Add a date column with date in a format readable by spreadsheets?

After importing any of the spreadsheets into Excel or Google Sheets, it's impossible for either program to treat the date column as a date, because it's in a format that neither can recognize: 20200331. Nor can they recognize 200331.

This makes it impossible to use a spreadsheet to produce graphs with easily-readable and understandable date labels like "3/31" without converting all 1430 rows of the date column by hand.
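
As a stopgap for anyone preprocessing the CSV before it reaches a spreadsheet, the compact value can be converted to an ISO date, which spreadsheets generally recognize. This is a minimal sketch with Python's standard library; `to_iso_date` is a hypothetical helper name.

```python
from datetime import datetime

def to_iso_date(compact):
    """Convert a compact date such as 20200331 (int or str) to '2020-03-31'."""
    return datetime.strptime(str(compact), "%Y%m%d").date().isoformat()

print(to_iso_date(20200331))  # 2020-03-31
```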

Data Dictionary?

Is there a data dictionary file describing variables? Wondering if I missed it somewhere or whether it might be in the works? Thanks.

JSON API does not return JSON

What is returned is actually not valid JSON - it is a LIST of JSON objects, which makes it messy for a cross-platform (Windows/Linux) app to parse it using pd.read_json().

On Windows it works easily; on Linux it doesn't, and needs to be tweaked somehow...

It would be good if the JSON API actually returned valid JSON, for instance like this:

{ "data":
[ {..}, {..} the list of jsons.. ]
}
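
In the meantime, one platform-independent option may be to parse the response text with Python's standard `json` module, which accepts a top-level array, and then wrap the result if a `data` key is wanted. This is only a sketch: the `payload` string is a made-up stand-in for the API response, not real output.

```python
import json

# Made-up stand-in for the API response body: a top-level array of objects.
payload = '[{"state": "AK", "positive": 1}, {"state": "AS", "positive": 0}]'

records = json.loads(payload)   # parses to a Python list of dicts
wrapped = {"data": records}     # the shape suggested above

print(len(records), wrapped["data"][0]["state"])
```

The resulting list of dicts can then be passed to other tools as needed.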

Florida did not update for 2 days ...

Florida did not update for two days. I see now that it has caught up, except that the data for yesterday is zero and today's value is higher than the highest ever. I guess these are cumulative values; does anyone know the values for yesterday? [This is about states_daily_4pm_et.csv.]

URL error

When I try to run backup_to_s3.py, I get the following error. Does anybody know why?

\backup_to_s3.py", line 147, in save_url_image_to_path
raise ValueError(f'Could not retrieve URL: {data_url}')

ValueError: Could not retrieve URL: https://www.cdc.gov/covid-data-tracker/

Thank you.

License the data under a Creative Commons Zero license

Right now this data is being licensed under the Apache license (same as the rest of the COVID19Tracking website). While the Apache license is great for software, it doesn't really make any sense for a data set (for example, there are no patent rights associated with data) and is likely incompatible with other licenses that are used for data, thus limiting people's ability to combine this data with other datasets. I would suggest licensing this specific repo under a Creative Commons license such as CC Attribution, CC0, or CC Attribution-ShareAlike. You could even dual license it under both Apache and a CC license if you wanted to (allowing the reuser to choose one license or the other).

Screenshot for states with multiple pages or press releases

Guam, among other states, issues individual press releases for all of their numbers, which means the single screenshot we're currently taking of their homepage carries little to no information for double-checking our data, aside from occasional numbers in titles.

We should probably set up a process for individual states where we screenshot n + 1 pages (homepage + detail page). This situation is fairly common, even on dashboards, and sometimes requires a click interaction to trigger, because ArcGIS dashboards do not use the DOM History API as they should to update the URL on separate tabs.

We also have situations where a central hub links to individual press releases whose URLs can't be predicted (they are non-sequential) and which therefore require a click on the topmost item (assuming that item is a press release relevant to COVID, which may not always be the case).

UTC Dates

First off thanks for doing this.

Have you given any thought to using UTC instead of EST for the dates? Is there a limitation in the data sources that would prevent that? There are a lot of interesting data sources and examples showing up, and unfortunately many are going in different directions for things like time zones and geo granularity.

ICU and hospitalization tracking

Hey all,

I hope this is the right repo to open this issue, but I was wondering if it was at all possible to aggregate ICU or hospital usage in this data source? We are currently integrating this data into our model at neherlab.org/covid19 and would love to have access to this info.

Thanks!

Current and Cumulative data doesn't make sense?

Can the Current and Cumulative data for inICU and onVentilator be reviewed/revised? Cumulative numbers in particular seem suspect ... shouldn't cumulative numbers always either increase or stay the same?

Indiana Hospitalizations

They showed up only recently (May 8th). They have not changed in 8 days, while all other numbers are changing. I looked at the most recent screenshot, and I don't see where hospitalized is coming from for Indiana. The data looks like it's in error to me.

Columns in states_daily_4pm_et.csv deleted and changed order

Was it intentional to change the order of many columns in states_daily_4pm_et.csv and delete 8 columns?

The 8 deleted columns are: dateModified, checkTimeEt, commercialScore, negativeRegularScore, negativeScore, positiveScore, score, grade.

See the attached for a complete list of the columns from 6/1 and from 6/2.

My spreadsheets weren't using the deleted columns, but the tables in them will require some reworking to accommodate the reordering, if that was intentional.
