Comments (11)
Hi @danielcs88 - thanks for commenting.
This R package is about data cleaning. @epiforecasts has another package, EpiNow2, which can do the R_t calculations. I am not familiar with the guts of these calculations, but I use EpiNow2 to generate municipality-by-municipality estimates for 60 areas in Lithuania. On a five-year-old Mac mini that currently takes a few hours, using half the CPU and only going back about 6 weeks. If you have a machine you can run R on, I'm happy to share my (rather clunky) workflow, which feeds into http://projects.martin-nielsen.ca/Graphs/COVID19-Lithuania-Municipalities.html (I should clean and share this workflow anyway...)
All to say that I think that, if my PR is approved here, we will stop leaning on your Python processing of the Colombia data.
from covidregionaldata.
The key part here is in that directory (i.e., rolling-averages). Since we're not pulling data from this folder, we are fine.
Thanks. Honestly, it's been three months since I've looked at this and I would trust your judgement on this. We can close the PR without merging.
Just seeing this now @RichardMN, been too busy with school. I stopped running it in July because the calculations for the reproduction number would fail repeatedly. Since I didn't write the code for that, I couldn't find a way to fix it, but my code to source the data and shape it into the same format as the NY Times data still seems to be running fine. I just ran it, and although it is painfully slow (because of the API used to download the data), it still updates and formats the data fine.
I didn't try to run the Rt code though. From what I remember, it would take at least an hour to run.
Thank you for flagging this and solving for Colombia @RichardMN !
I had a quick look at the India data source and it's definitely stopped updating altogether with no plans to return. I can't immediately find an obvious alternative either.
On that though I am quite keen to implement #406, to source subnational data from the Google API when we don't have a direct source (or as a backup if a direct source breaks). For our own use case, I think that maintaining continuity is really important, even if there's a bit less visibility on how Google source that data. So personally I would prioritise doing this first before re-instating direct sources for currently broken countries - although obviously both would be good! I will have a look at this today, help vastly appreciated as always!
(Re Rt calculations: we publish updating subnational Rt estimates here, with an accompanying data repo, in case it's helpful to see estimates / other R code.)
Kath
I've done some looking at Colombia. We were relying on @danielcs88's code, which uses Python to grab a massive case list and aggregate it. This appears to have stopped working in late July.
Going further upstream, the Colombian government is making this case list available through a Socrata API. This claims that an API key is required but (thankfully) it appears that we can make simple queries to narrow the data returned.
The following reprex lets us get cases (by diagnosis date) down to level 2.
library(RSocrata)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
cases <- read.socrata("https://www.datos.gov.co/resource/gt2j-8ykr.json?$select=departamento_nom,ciudad_municipio_nom,fecha_diagnostico")
cases_aggregate <- cases %>%
  rename(department = departamento_nom,
         municipality = ciudad_municipio_nom,
         date = fecha_diagnostico) %>%
  mutate(date = as_date(dmy_hms(date))) %>%
  group_by(date, department, municipality) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(date)
Created on 2021-11-02 by the reprex package (v2.0.1)
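As a minimal sketch of how the "simple queries to narrow the data returned" could be extended: the helper below only builds a query URL and makes no network call. `build_soql_url()` is a hypothetical helper, not part of RSocrata or covidregionaldata, and the `$where` filter on `departamento_nom` (with an illustrative department value) is an assumption that has not been tested against the live endpoint.

```r
# Hypothetical helper: builds a Socrata SoQL query URL from a base resource
# and named query parameters. It only assembles a string; nothing is fetched.
build_soql_url <- function(resource, ...) {
  params <- list(...)
  if (length(params) == 0) {
    return(resource)
  }
  # Percent-encode only the values; parameter names such as $select stay literal.
  encoded <- vapply(
    params,
    function(value) utils::URLencode(value, reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0(resource, "?", query)
}

resource <- "https://www.datos.gov.co/resource/gt2j-8ykr.json"

# Assumption: the endpoint accepts a $where filter alongside the $select used
# in the reprex above. The department value is illustrative only.
url <- build_soql_url(
  resource,
  `$select` = "departamento_nom,ciudad_municipio_nom,fecha_diagnostico",
  `$where` = "departamento_nom = 'ANTIOQUIA'"
)
print(url)
```

If the filter behaves as assumed, the resulting string could be passed to `read.socrata()` in place of the URL in the reprex, which would cut the download to a single department.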
Making a checklist of countries to fix:
- USA - turns out not to be an issue
- Colombia - fix in #433
- Cuba
- India - we have not found a replacement source
- Netherlands - fix in #446
Our data source for India stopped updating on 31 October. I have not found a replacement yet.
@RichardMN hopefully your PR gets approved! I wish I were proficient in R. I understand it in a basic sense, but not well enough to be productive. If anything, from what I have read and seen, R is better suited to what I use Python for, which is mostly data analysis.
This issue has been flagged as stale due to lack of activity
Hi @RichardMN, thanks for your continued attention to these issues.
I just reviewed your PR #431 (clean diff against current master here) and it seems to change the data source from raw data to pre-processed data (the rolling-averages folder), which doesn't seem like something we want.
I visited the NYT repo and it actually doesn't look like the data source we're currently using is going away. It's still updating daily. I did read the announcement in their README and I think it's worded in a confusing way:
UPDATE: The county-level data for cases and deaths that includes seven-day averages and per 100,000 counts is now available in year-based files here. The us-counties.csv file in that directory containing county data since the beginning of the pandemic has been archived and will not be updated.
The key part here is in that directory (i.e., rolling-averages). Since we're not pulling data from this folder, we are fine.
Tagging this in here that #446 is closed but there seems to be a (new?) data source at RIVM which looks as though it may give us hospitalization data separately: https://data.rivm.nl/covid-19/COVID-19_ziekenhuisopnames.csv
Metadata for that at https://data.rivm.nl/meta/srv/dut/catalog.search#/metadata/4f4ad069-8f24-4fe8-b2a7-533ef27a899f
I think that working through to figure out what that data is and whether/how to integrate it back in with the other streams we have can wait for 0.9.4