Checking my regular graph generation run I see that some of my graphs are not being up

Upstream data changes break our regional code - Colombia, Cuba, India, United States about covidregionaldata HOT 11 CLOSED

RichardMN commented on July 20, 2024 1

Upstream data changes break our regional code - Colombia, Cuba, India, United States

from covidregionaldata.

Comments (11)

RichardMN commented on July 20, 2024 2

Hi @danielcs88 - thanks for commenting.

This package for R is about doing data cleaning. @epiforecasts has another package called EpiNow2 which can do the R_t calculations. I am not familiar with the guts of these calculations but I use EpiNow2 to generate municipality-by-municipality calculations for 60 areas in Lithuania. On a five-year-old Mac mini that currently takes a few hours, using half the CPU and only going back about 6 weeks. If you have a machine you can run R on, I'm happy to share my (rather clunky) workflow, which feeds into http://projects.martin-nielsen.ca/Graphs/COVID19-Lithuania-Municipalities.html (I should clean and share this workflow anyway...)

All to say that I think that if my PR is approved here we will stop leaning on your Python processing of the Colombia data.

from covidregionaldata.

RichardMN commented on July 20, 2024 2

The key part here is in that directory (i.e., rolling-averages). Since we're not pulling data from this folder, we are fine.

Thanks. Honestly, it's been three months since I've looked at this and I would trust your judgement on this. We can close the PR without merging.

from covidregionaldata.

danielcs88 commented on July 20, 2024 1

Just seeing this now @RichardMN, been too busy with school. I stopped running it in July because the calculations for reproductive number would fail repeatedly. Since I didn't write the code for that, I couldn't find a way to fix it, but my code to source the data and shape it into the same format as the NY Times data seems to be still running fine. Just ran it right now, and although painfully slow (the API to download the data), it still updates and formats the data fine.

I didn't try to run the Rt code though, from what I remember it would take at least an hour to run.

from covidregionaldata.

kathsherratt commented on July 20, 2024 1

Thank you for flagging this and solving for Colombia @RichardMN !

I had a quick look at the India data source and it's definitely stopped updating altogether with no plans to return. I can't immediately find an obvious alternative either.

On that though I am quite keen to implement #406, to source subnational data from the Google API when we don't have a direct source (or as a backup if a direct source breaks). For our own use case, I think that maintaining continuity is really important, even if there's a bit less visibility on how Google source that data. So personally I would prioritise doing this first before re-instating direct sources for currently broken countries - although obviously both would be good! I will have a look at this today, help vastly appreciated as always!
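A minimal sketch of the "backup if a direct source breaks" idea, in base R. Everything named here (`fetch_with_fallback`, `direct_source`, `backup_source`) is hypothetical and is not part of covidregionaldata or of #406; the stand-in functions only simulate a broken upstream file and a working backup.

```r
# Hypothetical helper: try each source in order, return the first that works.
fetch_with_fallback <- function(sources) {
  for (name in names(sources)) {
    # A source that errors is treated the same as one that returns nothing.
    result <- tryCatch(sources[[name]](), error = function(e) NULL)
    if (!is.null(result) && nrow(result) > 0) {
      attr(result, "source") <- name  # record which source supplied the data
      return(result)
    }
  }
  stop("All data sources failed")
}

# Stand-in sources (assumptions for illustration only).
direct_source <- function() stop("upstream file no longer available")
backup_source <- function() data.frame(region = "A", cases = 10L)

regional <- fetch_with_fallback(
  list(direct = direct_source, backup = backup_source)
)
attr(regional, "source")
```

Recording which source was used keeps some visibility on where each figure came from, which matters if the backup is less transparent than the direct source.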

(Re Rt calculations: we publish updating subnational Rt estimates here, with accompanying data repo, in case it's helpful to see estimates / other R code.)

Kath

from covidregionaldata.

RichardMN commented on July 20, 2024

I've done some looking at Colombia. We were relying on @danielcs88's code, which uses Python to grab a massive case list and aggregate it. This appears to have stopped working in late July.

Going further upstream, the Colombian government is making this case list available through a Socrata API. It claims that an API key is required but (thankfully) it appears that we can make simple queries to narrow the data returned.

The following reprex lets us get cases (by diagnosis date) down to level 2.

library(RSocrata)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

cases <- read.socrata("https://www.datos.gov.co/resource/gt2j-8ykr.json?$select=departamento_nom,ciudad_municipio_nom,fecha_diagnostico")

cases_aggregate <- cases %>%
  rename(department = departamento_nom,
         municipality = ciudad_municipio_nom,
         date = fecha_diagnostico) %>%
  mutate(date = as_date(dmy_hms(date))) %>%
  group_by(date, department, municipality) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(date)

Created on 2021-11-02 by the reprex package (v2.0.1)
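For anyone who wants to check the aggregation step without hitting the API, here is a rough base R sketch on a toy line list. The four rows are made up, and the day/month/year timestamp format is an assumption taken from the `dmy_hms()` call in the reprex, not from the API documentation.

```r
# Toy line list standing in for the API response (all rows are made up).
cases <- data.frame(
  departamento_nom = c("ANTIOQUIA", "ANTIOQUIA", "ANTIOQUIA", "VALLE"),
  ciudad_municipio_nom = c("MEDELLIN", "MEDELLIN", "BELLO", "CALI"),
  fecha_diagnostico = c("1/3/2021 0:00:00", "1/3/2021 0:00:00",
                        "2/3/2021 0:00:00", "1/3/2021 0:00:00"),
  stringsAsFactors = FALSE
)

# One row per case: rename the columns and keep only the date part
# of the (assumed) day/month/year timestamp.
line_list <- data.frame(
  date = as.Date(cases$fecha_diagnostico, format = "%d/%m/%Y"),
  department = cases$departamento_nom,
  municipality = cases$ciudad_municipio_nom,
  count = 1L,
  stringsAsFactors = FALSE
)

# Summing the per-row 1s gives cases per date, department and municipality.
cases_aggregate <- aggregate(
  count ~ date + department + municipality,
  data = line_list,
  FUN = sum
)
cases_aggregate <- cases_aggregate[order(cases_aggregate$date), ]
```

The two MEDELLIN rows share a diagnosis date, so they collapse into a single row with a count of 2, while the other two rows stay as they are.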

from covidregionaldata.

RichardMN commented on July 20, 2024

Making a checklist of countries to fix:

from covidregionaldata.

RichardMN commented on July 20, 2024

Our data source for India stopped updating on 31 October. I have not found a replacement yet.

from covidregionaldata.

danielcs88 commented on July 20, 2024

@RichardMN hopefully your PR gets approved! I wish I had proficiency in R, I understand it in a basic sense but nothing in a productive sense. If anything from what I have read and seen, R is better for what I use Python for, which is mostly data analysis.

from covidregionaldata.

github-actions commented on July 20, 2024

This issue has been flagged as stale due to lack of activity

from covidregionaldata.

Bisaloo commented on July 20, 2024

Hi @RichardMN, thanks for your continued attention to these issues.

I just reviewed your PR #431 (clean diff against current master here) and it seems to change the data source from raw data to pre-processed data (the rolling-averages folder), which doesn't seem like something we want.

I visited the NYT repo and it actually doesn't look like the data source we're currently using is going away. It's still updating daily. I did read the announcement in their README and I think it's worded in a confusing way:

UPDATE: The county-level data for cases and deaths that includes seven-day averages and per 100,000 counts is now available in year-based files here. The us-counties.csv file in that directory containing county data since the beginning of the pandemic has been archived and will not be updated.

The key part here is in that directory (i.e., rolling-averages). Since we're not pulling data from this folder, we are fine.

from covidregionaldata.

RichardMN commented on July 20, 2024

Tagging this in here that #446 is closed but there seems to be a (new?) data source at RIVM which looks as though it may give us hospitalization data separately: https://data.rivm.nl/covid-19/COVID-19_ziekenhuisopnames.csv

Metadata for that at https://data.rivm.nl/meta/srv/dut/catalog.search#/metadata/4f4ad069-8f24-4fe8-b2a7-533ef27a899f

I think that working out what that data is, and whether/how to integrate it back in with the other streams we have, can wait for 0.9.4.

from covidregionaldata.
