covid19tracking / covid-tracking-data

License: Apache License 2.0

Python 45.27% JavaScript 32.04% Jupyter Notebook 22.69%

covid-tracking-data's Introduction

As of March 7, 2021 we are no longer collecting new data. Learn about available federal data.

COVID Tracking Data

This repository contains archives of a variety of data: backups of COVID Tracking Project data and archives of government data.

COVID Tracking Project Backups

Do not use this repository to download or display COVID Tracking Project data. Use the COVID Tracking API instead.

This repository is updated hourly with CSV backups of data from the COVID Tracking API; see the API page for details on each field.

For information about the project and how this data is collected, see the COVID Tracking Project website and Twitter account.

COVID Data Archives

This repository also contains other archives of government COVID data that are not COVID Tracking Project datasets. See the data/ directory for a list of archived files and sources.

covid-tracking-data's Issues

Why can totalTestResultsIncrease be negative?

I opened us_states_covid19_daily.csv and was confused by the negative values in totalTestResultsIncrease.
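One possible explanation, sketched here under the assumption that the `...Increase` columns are computed as the difference from the previous day's cumulative value: if a state revises its cumulative total downward, the computed increase for that day is negative. The numbers below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical cumulative totals; the third day is a downward revision.
daily = pd.DataFrame(
    {
        "date": [20200401, 20200402, 20200403, 20200404],
        "totalTestResults": [1000, 1200, 1150, 1400],
    }
)

# Assumption: the increase is the difference from the previous day's total.
daily["totalTestResultsIncrease"] = (
    daily["totalTestResults"].diff().fillna(0).astype(int)
)

# Days on which the computed increase is negative.
negative_days = daily.loc[daily["totalTestResultsIncrease"] < 0, "date"].tolist()
print(negative_days)
```

Filtering on `totalTestResultsIncrease < 0` in the real file would list the affected state/date pairs.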

[TX] Secondary screenshot is an excel download

We'd like to add this as the secondary 'screenshot' for Texas.

URL: https://www.dshs.texas.gov/coronavirus/TexasCOVID-19HospitalizationsOverTimebyTSA.xlsx

There are several empty CSVs with an API redirect message

Importing state 'AS' has a different header order than other states.

Field descriptions

I apologize if they're around somewhere but I haven't been able to find any field descriptions. Most of them are easy enough to figure out but I'm not sure what hash, pui, or pum are.

us_daily.csv is empty

"us_daily.csv" contains a single line:

Redirecting to /api/v1/us/daily.csv

Did I miss an announcement about its demise? It's still listed here:
https://github.com/COVID19Tracking/covid-tracking-data

TN secondary screenshot is always blank

http://covid-tracking-project-data.s3-website.us-east-1.amazonaws.com/state_screenshots/TN/

[PA] Secondary screenshot is a tab on the dashboard

Secondary screenshot for PA should be the dashboard (link: https://experience.arcgis.com/experience/cfb3803eb93d42f7ab1c2cfccca78bf7) but it requires switching to the "Hospital Preparedness" tab

MA death data seems odd

Hi,

MA death data

2020-03-22 -> 5
2020-03-21 -> 51

My understanding is that all numbers are cumulative. So MA can't be right
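If the numbers are cumulative, a quick way to spot entries like this is to look for any day whose value is lower than the previous day's. A minimal sketch, using only the two MA values quoted above; the helper name `find_cumulative_drops` is made up for this example:

```python
import pandas as pd


def find_cumulative_drops(series: pd.Series) -> pd.Series:
    """Return the entries that are lower than the preceding entry."""
    ordered = series.sort_index()  # ISO date strings sort chronologically
    return ordered[ordered.diff() < 0]


# The two MA death values reported in this issue.
ma_deaths = pd.Series({"2020-03-22": 5, "2020-03-21": 51})

drops = find_cumulative_drops(ma_deaths)
print(drops)
```

The check flags 2020-03-22 because it is lower than the day before; it cannot tell which of the two values is the wrong one.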

100% positive tests

Seems like there are a few (but not insignificant) dates when the number of positive cases in a state is equal to the number of total tests. This is likely an error, especially for the more recent dates. If you have access to the original webpages from which this data was collected, I would be happy to extract the correct info.

Looking at dates more recent than April 1st, here is a list of states and dates with such anomalous data:
AL ['2020-04-12', '2020-04-11', '2020-04-10', '2020-04-07']
AZ ['2020-04-05']
CA ['2020-04-13', '2020-04-06']
CO ['2020-04-11']
CT ['2020-04-08']
DE ['2020-04-12', '2020-04-08', '2020-04-06']
HI ['2020-04-07', '2020-04-03']
IA ['2020-04-14']
KS ['2020-04-08']
KY ['2020-04-11', '2020-04-09']
ME ['2020-04-14', '2020-04-12', '2020-04-11', '2020-04-10', '2020-04-09', '2020-04-08', '2020-04-07', '2020-04-06', '2020-04-05', '2020-04-04', '2020-04-03', '2020-04-02']
NJ ['2020-04-13']
NM ['2020-04-11']
OK ['2020-04-13', '2020-04-12', '2020-04-08']
OR ['2020-04-10', '2020-04-04']
MI ['2020-04-09', '2020-04-08']
MS ['2020-04-12', '2020-04-11', '2020-04-10', '2020-04-09', '2020-04-08', '2020-04-07', '2020-04-03']
MO ['2020-04-13']
RI ['2020-04-13', '2020-04-09']
SC ['2020-04-13', '2020-04-06', '2020-04-03']
UT ['2020-04-02']
VT ['2020-04-10']
WA ['2020-04-14', '2020-04-13', '2020-04-12', '2020-04-11', '2020-04-10', '2020-04-09', '2020-04-08', '2020-04-07']
WY ['2020-04-14', '2020-04-12']

Code snippet for the output above:

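import numpy as np
import pandas as pd

# `file` is the path to the states daily CSV; `abbrev_list` is the list of state abbreviations.
df = pd.read_csv(file, index_col=0, parse_dates=True)
# Use a Timestamp so the comparison with the datetime index is well defined.
cut_off = pd.Timestamp(2020, 4, 1)
for st in abbrev_list:
    st_data = df[df['state'] == st]
    # Positions where the daily positives equal the daily total test results.
    ind = np.where(st_data['positiveIncrease'] == st_data['totalTestResultsIncrease'])[0]
    dates = [str(d.date()) for d in st_data.index[ind] if d > cut_off]
    if dates:
        print(st, dates)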

Columns where Current > Cumulative

In the US historic-data:

Every entry in the onVentilatorCumulative is much smaller than the corresponding entry in onVentilatorCurrently.
inIcuCumulative < inIcuCurrently for 3/26 - 5/28.
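A sketch of how such rows could be listed, assuming the file is loaded into a pandas DataFrame with the column names mentioned above. The values here are hypothetical placeholders, not real data:

```python
import pandas as pd

# Hypothetical rows; only the column names come from the report above.
us = pd.DataFrame(
    {
        "date": [20200326, 20200327],
        "inIcuCurrently": [1300, 1500],
        "inIcuCumulative": [90, 2000],
    }
)

# Rows where the current count exceeds the cumulative count.
suspect = us[us["inIcuCurrently"] > us["inIcuCumulative"]]
suspect_dates = suspect["date"].tolist()
print(suspect_dates)
```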

[OR] Primary source screenshots obscured by lightbox

The primary source screenshots for OR are obscured by a lightbox. Is there any workaround for that?

Front conversations

What happened to US county tracking data?

I found this project, which used to contain county data and now links to a nonexistent county API download. Are counties being brought back, or are they deprecated? I noticed there is no mention of counties on the API homepage.

WA numbers not updated for the last 4 days

As the title says: the WA Dept. of Health data is updated as of 11:50pm the prior day, but the COVID Tracking Project's data is stale.

Suggestion: automate this with GitHub Actions

Is this update script automated at the moment or is it being run manually? From the commit history it looks like it might be manual.

I've run several projects similar to this using automated scripts running in Circle CI or (more recently) GitHub Actions. I would be happy to submit a pull request to configure hourly / daily scheduled task in Actions if that would be useful.

Data not updating

Hi,

Is there a reason the Google Docs spreadsheet contains data as of the 29th, but the data in this repo is only as of the 27th? Ideally we'd love to be using the most current source, but the GitHub repo is the easiest to work with from an automation perspective, so I'm just wondering whether we need to switch to the Google Docs spreadsheet, or whether there is a plan to make sure the GitHub repo stays current.

Thanks.

IN screenshots get confused by a dialog

On the Indiana site, it pops up a dialog asking you to subscribe to a newsletter. This confuses the screenshots, and the very bottom of the content gets cut off as a result. Currently there are no numbers we use there, but it would still be helpful to have this.

[AL] Secondary screenshot should capture tab 11 of the dashboard

Please add a click on the number 11 id="ember353" before capturing the secondary screenshot.

WA screenshot capturing new dashboard is empty

https://covidtracking.com/screenshots/WA/WA-20200507-122500.png

(reported via COVID19Tracking/issues#383)

Add a date column with date in a format readable by spreadsheets?

After importing any of the spreadsheets into Excel or Google Sheets, it's impossible for either Excel or Google Sheets to treat the date column like a date, because it's in a format that neither of those programs can recognize: 20200331 . Neither can they recognize 200331.

This makes it impossible to use a spreadsheet to produce graphs with easily-readable and understandable date labels like "3/31" without converting all 1430 rows of the date column by hand.
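Until such a column exists, one workaround is to convert the column before importing. A minimal sketch with pandas, assuming the column is named `date` and holds values like 20200331; the output column name `date_iso` is made up for this example:

```python
import pandas as pd

# Two sample values in the YYYYMMDD format described above.
df = pd.DataFrame({"date": [20200331, 20200401]})

# Parse the compact form explicitly, then write it out as YYYY-MM-DD.
df["date_iso"] = pd.to_datetime(
    df["date"].astype(str), format="%Y%m%d"
).dt.strftime("%Y-%m-%d")

print(df["date_iso"].tolist())
```

Writing the result back with `df.to_csv(...)` gives a file whose `date_iso` column uses the YYYY-MM-DD form, which spreadsheet programs generally recognize as a date.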

Latest commit `0a155b9` has several "empty" .csv's - API redirect

They refer you to the API, rather than containing the data that is meant to be mirrored. This affects the following files:

data/counties.csv
data/states_current.csv
data/states_daily_4pm_et.csv
data/states_info.csv
data/us_current.csv
data/us_daily.csv
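Anyone mirroring these files could guard against this before using them. A minimal sketch, assuming the affected files contain only a single "Redirecting to ..." line as reported for us_daily.csv; the helper name `is_redirect_stub` and the sample files are made up for this example:

```python
from pathlib import Path
import tempfile


def is_redirect_stub(path: Path) -> bool:
    """True if the file's only content is a single 'Redirecting to ...' line."""
    text = path.read_text(encoding="utf-8").strip()
    return text.startswith("Redirecting to ") and "\n" not in text


with tempfile.TemporaryDirectory() as tmp:
    # A stub like the one reported, and a small well-formed CSV for comparison.
    stub = Path(tmp) / "us_daily.csv"
    stub.write_text("Redirecting to /api/v1/us/daily.csv\n", encoding="utf-8")
    real = Path(tmp) / "ok.csv"
    real.write_text("date,positive\n20200331,10\n", encoding="utf-8")

    stub_result = is_redirect_stub(stub)
    real_result = is_redirect_stub(real)

print(stub_result, real_result)
```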

PA Screenshots capture dashboard while it is loading

Data Dictionary?

Is there a data dictionary file describing variables? Wondering if I missed it somewhere or whether it might be in the works? Thanks.

JSON API does not return JSON

What is returned is actually not valid JSON: it is a LIST of JSON objects, which makes it messy for a cross-platform (Windows/Linux) app to parse with pd.read_json().

On Windows it works easily; on Linux it doesn't and needs to be tweaked somehow...

It would be good if the JSON API actually returned valid JSON, for instance like this:

{ "data":
[ {..}, {..} the list of jsons.. ]
}
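For what it's worth, a top-level array is itself valid JSON (RFC 8259 allows any JSON value at the top level), so the list form can be parsed as is. One platform-independent approach, shown here only as a sketch, is to parse the text with the standard `json` module and build the DataFrame from the resulting list. The payload below is a made-up stand-in for an API response:

```python
import json

import pandas as pd

# Hypothetical payload shaped like a list of records.
payload = '[{"state": "NY", "positive": 10}, {"state": "WA", "positive": 7}]'

records = json.loads(payload)  # a top-level JSON array becomes a Python list
df = pd.DataFrame(records)     # one row per object, one column per key

print(df)
```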

[NC] North Carolina tertiary screenshot needs to toggle the "Adult ICU Patients" button before screenshot

We need the second chart on the page to be toggled to "Adult ICU patients" before capture. The URL to the tableau workbook is in the States tab as the tertiary screenshot for NC and I've posted a screenshot of the element inspector because I'm not sure which part is the part needed!

There are no screenshots for 4/8 in S3 for any state

http://covid-data-archive.s3-website.us-east-2.amazonaws.com/state_screenshots/

Florida did not update for 2 days ...

Florida did not update for two days. I see now that it has caught up, except that the data for yesterday is zero and today's value is higher than the highest ever. I guess these are cumulative values; does anyone know the values for yesterday? [This is about states_daily_4pm_et.csv]

URL error

When I try to run the backup_to_s3.py, I'm getting the error. Does anybody know why?

\backup_to_s3.py", line 147, in save_url_image_to_path
raise ValueError(f'Could not retrieve URL: {data_url}')

ValueError: Could not retrieve URL: https://www.cdc.gov/covid-data-tracker/

Thank you.

[SC] primary source screenshot doesn't capture embedded dashboard

The screenshots for the SC primary source never capture the embedded arcgis dashboard at the top with the case counts because it takes forever to load. See for instance https://covidtracking.com/screenshots/SC/SC-20200914-062847.png Is there any way it could have a 30-second timer on it or something so that the dashboard loads before the screenshot is taken?

Issues with NY data

Hello, wanted to report 2 issues related to NY data:

Hospitalizations and deaths for NY on 3/23 are the same as for 3/22, looks like it did not update
I'm getting an empty array from https://covidtracking.com/api/states/daily?state=NY

Washington Daily Total Test Result not consistent with the official dashboard

On the government dashboard, the total test number is available:

but the covid19tracking dataset has a total test result equal to the positive test result, leading to a positive rate of 1!

Could you please take a look into this?

License the data under a Creative Commons Zero license

Right now this data is being licensed under the Apache license (same as the rest of the COVID19Tracking website). While the Apache license is great for software, it doesn't really make any sense for a data set (for example, there are no patent rights associated with data) and is likely incompatible with other licenses that are used for data, thus limiting people's ability to combine this data with other datasets. I would suggest licensing this specific repo under a Creative Commons license such as CC Attribution, CC0, or CC Attribution-ShareAlike. You could even dual license it under both Apache and a CC license if you wanted to (allowing the reuser to choose one license or the other).

Searchability

Screenshots api endpoint not working (404)

State Website Screenshots - https://covidtracking.com/api/v1/screenshots.json (link from https://covidtracking.com/api) yields:

{"code":404,"error":"404 - NOT FOUND","pathname":"/v1/screenshots.json"}

Screenshot for states with multiple pages or press releases

Guam, among other states, issues individual press releases for all of its numbers, which means the single screenshot we currently take of its homepage carries little to no information for double-checking our data, aside from occasional numbers in titles.

We should probably set up a process for individual states where we screenshot n + 1 pages (homepage + detail page). This is fairly common, even on dashboards, and sometimes requires a click interaction to trigger, because ArcGIS dashboards do not use the DOM History API as they should to update the URL on separate tabs.

We also have situations where a central hub links to individual press releases whose URLs can't be predicted (they are non-sequential) and which therefore require a click on the topmost item (assuming that item is a press release relevant to COVID, which may not always be the case).

[IN] Need secondary IN screenshot for hospitalization data

We need screenshots for IN's hospitalization/ICU/recovery data:

https://www.regenstrief.org/covid-dashboard/

[VA] Need to replace old dashboard link with a new link for screenshots

On 8/27, VA switched to using a different dashboard link to update their data, but our screenshots still capture the old un-updated dashboard.

We should switch our primary screenshots to be capturing: https://www.vdh.virginia.gov/coronavirus/covid-19-in-virginia/ NOT the public tableau one.

Data error in MD data started presenting about a week ago

In the data for MD here: https://covidtracking.com/data/state/maryland#historical

(and via the API) there's a data error on Mar 16 and Mar 17 that is causing a strange loop in my graph based on the data here: https://flyingsymbols.github.io/arewebeatingcovid19/

March 21 MA death data is incorrect.

It is not 51.

UTC Dates

First off thanks for doing this.

Have you given any thought to using UTC instead of EST for the dates? Is there a limitation in the data sources that would prevent that? There are a lot of interesting data sources and examples showing up, and unfortunately many are going in different directions for things like time zones and geo granularity.

ICU and hospitalization tracking

Hey all,

I hope this is the right repo to open this issue, but I was wondering if it was at all possible to aggregate ICU or hospital usage in this data source? We are currently integrating this data into our model at neherlab.org/covid19 and would love to have access to this info.

Thanks!

[WA] Secondary screenshot is sometimes blank

https://covid-tracking-project-data.s3.us-east-1.amazonaws.com/state_screenshots/WA/WA-secondary-20200724-063157.png

IA Screenshots are sometimes missing dashboard

Your historical screenshots for Iowa from Tuesday, June 2, at 6:33pm, as well as those from Wed., June 3, contain NO DATA.

For references here are links to empty screenshots
https://covidtracking.com/screenshots/IA/IA-20200602-183322.png
https://covidtracking.com/screenshots/IA/IA-20200603-183421.png

Front conversations

Current and Cumulative data doesn't make sense?

Can the Current and Cumulative data for inICU and onVentilator be reviewed/revised? Cumulative numbers in particular seem suspect ... shouldn't cumulative numbers always either increase or stay the same?

API for CSV not pulling latest data...

I could be mistaken...but when I pull data from:

https://api.covidtracking.com/v1/states/daily.csv
https://api.covidtracking.com/v1/states/current.csv

It is oddly still showing data from 20200819 rather than 20200820.

The data looks fresh on:
https://covidtracking.com/data/charts/all-metrics-per-state

[OK] Oklahoma Screenshots are blank

https://covid-tracking-project-data.s3.us-east-1.amazonaws.com/state_screenshots/OK/OK-20200724-062307.png

[KS] Kansas tertiary screenshot needs to click on "Testing Rates" button before capture

Before taking this screenshot, we need to click on Testing Rates button

<div class="tab-button-zone-text" data-test-id="tab-button-zone-text" style="font-family: Nunito Sans; color: rgb(255, 255, 255); font-size: 11pt; font-style: normal; font-weight: normal; text-decoration-line: none;">Testing Rates</div>

[MI] Add missing March / April screenshots

A librarian in MI has sent in supplemental screenshots of MI's dashboard. Have forwarded email with attachments.

Indiana Hospitalizations

They showed up only recently, (May 8th). They have not changed in 8 days, while all other numbers are changing. I looked at the most recent screen-shot, and I don't see where hospitalized is coming from for Indiana. The data looks like it's in error to me.

Columns in states_daily_4pm_et.csv deleted and changed order

Was it intentional to change the order of many columns in states_daily_4pm_et.csv and delete 8 columns?

The 8 deleted columns are: dateModified, checkTimeEt, commercialScore, negativeRegularScore, negativeScore, positiveScore, score, grade.

See the attached for a complete list of the columns from 6/1 and from 6/2.

My spreadsheets weren't using the deleted columns, but the tables in them will require some reworking to accommodate the reordering, if that was intentional.
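For scripted consumers, reading columns by name rather than by position avoids breakage when the order changes, and comparing header sets shows which columns were dropped. A minimal sketch with the standard `csv` module; the two tiny CSV snippets are made up and stand in for the 6/1 and 6/2 files:

```python
import csv
import io

# Hypothetical before/after snippets: reordered columns and one removed column.
old_csv = "date,state,positive,grade\n20200601,AK,1,A\n"
new_csv = "state,date,positive\nAK,20200602,2\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


old_rows = read_rows(old_csv)
new_rows = read_rows(new_csv)

# Columns present in the old file but missing from the new one.
removed = sorted(set(old_rows[0]) - set(new_rows[0]))

# Selecting by name yields the same layout regardless of column order.
wanted = ["date", "state", "positive"]
table = [[row[name] for name in wanted] for row in old_rows + new_rows]

print(removed)
print(table)
```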