
Comments (11)

RichardMN commented on July 20, 2024

Hi @danielcs88 - thanks for commenting.

This R package focuses on data cleaning. @epiforecasts has another package called EpiNow2 which can do the R_t calculations. I am not familiar with the guts of these calculations, but I use EpiNow2 to generate municipality-by-municipality estimates for 60 areas in Lithuania. On a five-year-old Mac mini that currently takes a few hours, using half the CPU and only going back about 6 weeks. If you have a machine you can run R on, I'm happy to share my (rather clunky) workflow, which feeds into http://projects.martin-nielsen.ca/Graphs/COVID19-Lithuania-Municipalities.html (I should clean and share this workflow anyway...)

All to say that I think that if my PR is approved here we will stop leaning on your Python processing of the Colombia data.

from covidregionaldata.

RichardMN commented on July 20, 2024

"The key part here is in that directory (i.e., rolling-averages). Since we're not pulling data from this folder, we are fine."

Thanks. Honestly, it's been three months since I've looked at this and I would trust your judgement on this. We can close the PR without merging.


danielcs88 commented on July 20, 2024

Just seeing this now @RichardMN, been too busy with school. I stopped running it in July because the calculations for reproductive number would fail repeatedly. Since I didn't write the code for that, I couldn't find a way to fix it, but my code to source the data and shape it into the same format as the NY Times data seems to be still running fine. Just ran it right now, and although painfully slow (the API to download the data), it still updates and formats the data fine.

I didn't try to run the Rt code though; from what I remember, it would take at least an hour to run.


kathsherratt commented on July 20, 2024

Thank you for flagging this and solving for Colombia @RichardMN !

I had a quick look at the India data source and it's definitely stopped updating altogether with no plans to return. I can't immediately find an obvious alternative either.

On that though I am quite keen to implement #406, to source subnational data from the Google API when we don't have a direct source (or as a backup if a direct source breaks). For our own use case, I think that maintaining continuity is really important, even if there's a bit less visibility on how Google source that data. So personally I would prioritise doing this first before re-instating direct sources for currently broken countries - although obviously both would be good! I will have a look at this today, help vastly appreciated as always!
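As an illustration of the "backup if a direct source breaks" idea, here is a minimal sketch in R. It is not how covidregionaldata implements #406; `fetch_with_fallback` and both toy source functions are hypothetical names made up for this example, and the real data sources are replaced by stand-ins so it runs offline.

```r
# Hypothetical helper: try the direct source first and switch to a backup
# source only if the direct one raises an error.
fetch_with_fallback <- function(primary, backup) {
  tryCatch(
    list(data = primary(), source = "primary"),
    error = function(e) {
      message("Primary source failed (", conditionMessage(e), "); using backup.")
      list(data = backup(), source = "backup")
    }
  )
}

# Toy stand-ins for a broken direct source and a working backup source.
broken_direct_source <- function() stop("source stopped updating")
backup_source <- function() data.frame(region = "A", cases = 1L)

result <- fetch_with_fallback(broken_direct_source, backup_source)
print(result$source)
```

Returning the source label alongside the data is one way to keep some visibility on where each series came from when the backup is used.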

(Re Rt calculations: we publish updating subnational Rt estimates here, with accompanying data repo, in case it's helpful to see estimates / other R code.)

Kath


RichardMN commented on July 20, 2024

I've done some looking at Colombia. We were relying on @danielcs88's code, which uses Python to grab a massive case list and aggregate it. This appears to have stopped working in late July.

Going further upstream, the Colombian government is making this case list available through a Socrata API. The documentation claims that an API key is required but (thankfully) it appears that we can make simple queries without one to narrow the data returned.

The following reprex lets us get cases (by diagnosis date) down to level 2.

library(RSocrata)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

cases <- read.socrata("https://www.datos.gov.co/resource/gt2j-8ykr.json?$select=departamento_nom,ciudad_municipio_nom,fecha_diagnostico")

cases_aggregate <- cases %>%
  rename(department = departamento_nom,
         municipality = ciudad_municipio_nom,
         date = fecha_diagnostico) %>%
  mutate(date = as_date(dmy_hms(date))) %>%
  group_by(date, department, municipality) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(date)

Created on 2021-11-02 by the reprex package (v2.0.1)
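Since the reprex downloads one row per case and counts locally, a possible refinement is to ask the server to do the counting with SoQL's `$group` and `count(*)`. The sketch below only builds the query URL, so it runs offline; it has not been tested against the live endpoint. The column names come from the reprex above, while `build_count_url` and the `casos` alias are hypothetical names chosen for this example. The explicit `$order` is an assumption about what paged requests need once rows are grouped.

```r
# Build a SoQL query URL that asks the Socrata server to count cases per
# diagnosis date, department and municipality instead of returning every case.
build_count_url <- function(base = "https://www.datos.gov.co/resource/gt2j-8ykr.json") {
  cols <- c("fecha_diagnostico", "departamento_nom", "ciudad_municipio_nom")
  col_list <- paste(cols, collapse = ",")
  query <- paste0(
    "$select=", col_list, ",count(*) AS casos",
    "&$group=", col_list,
    # Order by the grouped columns so that paged requests return a stable order.
    "&$order=", col_list
  )
  # Percent-encode the spaces; characters such as `$`, `,` and `*` are left as-is.
  utils::URLencode(paste0(base, "?", query))
}

url <- build_count_url()
print(url)

# Not run here: fetching needs network access and the RSocrata package.
# counts <- RSocrata::read.socrata(url)
```

The date column would still need parsing with `dmy_hms()` as in the reprex, since grouping happens on the raw text value.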


RichardMN commented on July 20, 2024

Making a checklist of countries to fix:

  • USA - turns out not to be an issue
  • Colombia - fix in #433
  • Cuba
  • India - we have not found a replacement source
  • Netherlands - fix in #446


RichardMN commented on July 20, 2024

Our data source for India stopped updating on 31 October. I have not found a replacement yet.


danielcs88 commented on July 20, 2024

@RichardMN hopefully your PR gets approved! I wish I had proficiency in R; I understand it in a basic sense but can't use it productively. If anything, from what I have read and seen, R is better for what I use Python for, which is mostly data analysis.


github-actions commented on July 20, 2024

This issue has been flagged as stale due to lack of activity


Bisaloo commented on July 20, 2024

Hi @RichardMN, thanks for your continued attention to these issues.

I just reviewed your PR #431 (clean diff against current master here) and it seems to change the data source from raw data to pre-processed data (rolling-average folder), which doesn't seem like something we want.

I visited the NYT repo and it actually doesn't look like the data source we're currently using is going away. It's still updating daily. I did read the announcement in their README and I think it's worded in a confusing way:

UPDATE: The county-level data for cases and deaths that includes seven-day averages and per 100,000 counts is now available in year-based files here. The us-counties.csv file in that directory containing county data since the beginning of the pandemic has been archived and will not be updated.

The key part here is in that directory (i.e., rolling-averages). Since we're not pulling data from this folder, we are fine.


RichardMN commented on July 20, 2024

Noting here that #446 is closed, but there seems to be a (new?) data source at RIVM which looks as though it may give us hospitalization data separately: https://data.rivm.nl/covid-19/COVID-19_ziekenhuisopnames.csv

Metadata for that at https://data.rivm.nl/meta/srv/dut/catalog.search#/metadata/4f4ad069-8f24-4fe8-b2a7-533ef27a899f

I think that figuring out what that data is, and whether/how to integrate it back in with the other streams we have, can wait for 0.9.4.

