psteinb / covid19-curve-your-city



Extrapolation of COVID19 case numbers

License: BSD 3-Clause "New" or "Revised" License

R 64.83% Makefile 6.78% Shell 28.39%

covid19-curve-your-city's People

Contributors

veroniquelisi tkphd gerbsen

covid19-curve-your-city's Issues

visualization for statistics nerds

Add a section at the bottom of the page that visualizes several goodness-of-fit measures:

chi2 statistic along the predicted values
chi2 statistic by segment since the beginning of the fit procedure
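A minimal sketch of the two proposed measures, assuming Pearson's chi-square with Poisson-like variance (the predicted count stands in for the variance of each point). The function names are hypothetical and not part of the repository:

```python
from itertools import accumulate


def pearson_chi2_terms(observed, predicted):
    """Per-point Pearson contributions (o - e)^2 / e.

    Assumes Poisson-like counts, i.e. the variance of each observation is
    approximated by its predicted value, so predictions must be positive.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    terms = []
    for o, e in zip(observed, predicted):
        if e <= 0:
            raise ValueError("predicted values must be positive")
        terms.append((o - e) ** 2 / e)
    return terms


def cumulative_chi2(observed, predicted):
    """Chi-square of the segment from the start of the fit up to each day."""
    return list(accumulate(pearson_chi2_terms(observed, predicted)))
```

Plotting the per-point terms against the date shows where the model deviates; the cumulative curve shows whether the mismatch builds up steadily or comes from a few days.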

SIR-X, a more sophisticated model

try to apply
http://rocs.hu-berlin.de/corona/docs/forecast/model/
to data for Dresden
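Assuming the linked page describes the SIR-X model of Maier and Brockmann (2020), its equations for the population fractions S, I, R and X (quarantined/confirmed) are dS/dt = -αSI - κ0·S, dI/dt = αSI - βI - κ0·I - κI, dR/dt = βI + κ0·S and dX/dt = (κ + κ0)·I. A minimal forward-Euler sketch; the function names are hypothetical, and the parameter values would have to come from a fit to the Dresden data:

```python
def sirx_step(state, alpha, beta, kappa0, kappa, dt):
    """One forward-Euler step of the SIR-X equations (population fractions)."""
    s, i, r, x = state
    ds = -alpha * s * i - kappa0 * s
    di = alpha * s * i - beta * i - kappa0 * i - kappa * i
    dr = beta * i + kappa0 * s
    dx = (kappa + kappa0) * i
    return (s + dt * ds, i + dt * di, r + dt * dr, x + dt * dx)


def simulate_sirx(i0, alpha, beta, kappa0, kappa, days, steps_per_day=10):
    """Start from S = 1 - i0, I = i0, R = X = 0 and return the state after each day."""
    dt = 1.0 / steps_per_day
    state = (1.0 - i0, i0, 0.0, 0.0)
    trajectory = [state]
    for _day in range(days):
        for _step in range(steps_per_day):
            state = sirx_step(state, alpha, beta, kappa0, kappa, dt)
        trajectory.append(state)
    return trajectory
```

X multiplied by the population size is what would be compared with the confirmed case counts; fitting α, β, κ and κ0 to the data is not shown here.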

testing documentation for untrained users or automation: missing libs lubridate, nls2, …

As a newbie to R, I tried to produce the plots myself …

Result on the 2nd try, after already repeating the installation for lubridate …

********************************************************
Note: As of version 1.0.0, cowplot does not change the
  default ggplot2 theme anymore. To recover the previous
  behavior, execute:
  theme_set(theme_cowplot())
********************************************************

Error in library(nls2) : there is no package called ‘nls2’
Execution halted

Not sure it's a big problem anywhere else, but here I got errors due to missing libraries after following the advice in the README, so I repeated the install procedure for those as well.

Should those be added to the list of dependencies, or is that specific to my distro (Debian flavor, R installed via sudo apt install r-base r-base-dev, version: "R scripting front-end version 3.6.3 (2020-02-29)")?
The cowplot note above suggests there might be something to do as well. What should be done?
Also, I exited the R console with quit(save="default",status=0,runLast=TRUE). It may be worth mentioning in the documented example how this is supposed to be done.
After that, 3 files are updated, not 2 as mentioned in the 4th step of the example (the extra one is residuals_plus7.png).
How about moving the PNG graphics to a separate directory?
The documented filename for the source data has changed.
Maybe the source data for SMS/RKI could be pulled in automatically as well.
Is checking dependencies explicitly (maybe via available.packages or package.dependencies) faster than just running the script?

And the installation took quite a while, which is probably why the plots are not generated by GitHub Actions (I saw you are still updating them manually; that's why I was trying).
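On the question of checking dependencies explicitly: scanning the scripts for library()/require() calls is nearly instant compared with installing packages or running the fit. A minimal sketch with hypothetical helper names; the list of installed packages is assumed to come from elsewhere (e.g. R's installed.packages()), and packages used only as pkg::fun or loaded dynamically would be missed:

```python
import re

# Matches library(pkg), library("pkg"), require('pkg') and requireNamespace("pkg")
_PKG_CALL = re.compile(
    r"""\b(?:library|require|requireNamespace)\(\s*["']?([A-Za-z][A-Za-z0-9.]*)["']?"""
)


def r_dependencies(source):
    """Sorted, de-duplicated package names loaded in an R script."""
    return sorted(set(_PKG_CALL.findall(source)))


def missing_packages(source, installed):
    """Packages referenced in `source` that are not in `installed`."""
    have = set(installed)
    return [pkg for pkg in r_dependencies(source) if pkg not in have]
```

Run over all R files in the repository, r_dependencies would also give a list to copy into the README's install.packages(...) line.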

Possibility to specify output filename

When (automatically) generating output for multiple cities/regions, it would be extremely helpful to be able to specify an output file name. This would also help to keep the files over time.
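The actual script is R (it uses optparse); the following Python argparse sketch only illustrates the idea. The flags --label and -o/--output are hypothetical, and when no output is given the name is derived from the region label and the current date, so daily runs do not overwrite each other:

```python
import argparse
import datetime
import re


def default_output_name(label, day):
    """Derive e.g. 'saxony_germany_2020-03-24.png' from a region label and a date."""
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    return f"{slug}_{day.isoformat()}.png"


def build_parser():
    parser = argparse.ArgumentParser(description="Fit and plot case numbers (sketch)")
    parser.add_argument("-i", "--input", required=True, help="input CSV file")
    parser.add_argument("--label", default="Dresden", help="region label (hypothetical flag)")
    parser.add_argument("-o", "--output", default=None,
                        help="output image file; derived from label and date if omitted")
    return parser


def resolve_output(args, today=None):
    """Use the explicit --output value, otherwise fall back to a dated default name."""
    return args.output or default_output_name(args.label, today or datetime.date.today())
```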

Your dataset in CoronaWhy Dataverse

Hello,

Your dataset was added to the CoronaWhy (https://www.coronawhy.org/) Data Lake on Dataverse as part of a common COVID-19 dataframe https://datasets.coronawhy.org/dataset.xhtml?persistentId=doi:10.5072/FK2/KUZ08M.

Would you be willing to help with the maintenance of your dataset in Dataverse, e.g. adding the relevant metadata and keeping the dataset up to date? That will help make the dataset findable and accessible for the medical science community.

Can R0 be obtained from our data?

https://www.medrxiv.org/content/10.1101/2020.01.27.20018952v1.full.pdf
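One simple estimate, which is not necessarily the method of the linked preprint: in an SIR model with an exponentially distributed infectious period of mean T, the early growth rate r and R0 are related by R0 = 1 + r·T. A minimal sketch, assuming equally spaced daily counts in the exponential phase (cumulative counts grow at the same rate there) and an externally supplied value for T; the function names are hypothetical:

```python
import math


def growth_rate(cases):
    """Least-squares slope of log(cases) against the day index, per day.

    Assumes strictly positive, equally spaced daily counts in an
    exponential growth phase.
    """
    n = len(cases)
    if n < 2 or any(c <= 0 for c in cases):
        raise ValueError("need at least two positive counts")
    ys = [math.log(c) for c in cases]
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


def r0_from_growth(r, infectious_period):
    """R0 = 1 + r * T for SIR with an exponential infectious period of mean T (days)."""
    return 1.0 + r * infectious_period
```

The result depends strongly on the assumed T and on the model structure (with a latent period, as in SEIR, the relation changes), so it should be read as a rough indication at best.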

data by the Saxon Ministry

Found yet another data source that includes cases for Dresden:
https://www.coronavirus.sachsen.de/infektionsfaelle-in-sachsen-4151.html
These numbers appear to be ahead of dresden.de by 2 cases on Mar 24. This could be because cases on dresden.de are time-stamped at noon (12:00), while those by the State of Saxony are time-stamped at 4:30 pm.

workflow actions will be disabled

@psteinb
… after 60 days, so you may wish to re-enable them manually if you want them to continue.

Obtain data automatically from dresden.de

If I saw correctly, the plot on dresden.de pulls the data from a server to visualize it.

Using the Firefox dev tools, I see this:

Request URL: https://services.arcgis.com/ORpvigFPJUhb8RDF/arcgis/rest/services/corona_DD_sicht2/FeatureServer/0/query?f=json&where=Fallzahl%20IS%20NOT%20NULL&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&resultOffset=0&resultRecordCount=2000&cacheHint=true
Request Method: GET
Remote Address: 52.222.150.203:443
Status Code: 304
Version: HTTP/2
Referrer Policy: no-referrer-when-downgrade

with the following parameters in the query string:

Parameter | Value
-- | --
f | "json"
where | "Fallzahl IS NOT NULL"
returnGeometry | "false"
spatialRel | "esriSpatialRelIntersects"
outFields | "*"
resultOffset | "0"
resultRecordCount | "2000"
cacheHint | "true"
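The query string above can be rebuilt from the parameter table with the Python standard library. A minimal sketch, assuming the endpoint observed in March 2020 still answers the same way (it may well have changed since); BASE_URL, PARAMS, query_url and fetch_payload are hypothetical names:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = ("https://services.arcgis.com/ORpvigFPJUhb8RDF/arcgis/rest/services/"
            "corona_DD_sicht2/FeatureServer/0/query")

PARAMS = {
    "f": "json",
    "where": "Fallzahl IS NOT NULL",
    "returnGeometry": "false",
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",
    "resultOffset": "0",
    "resultRecordCount": "2000",
    "cacheHint": "true",
}


def query_url(base=BASE_URL, params=None):
    """Encode spaces as %20 and keep '*' literal, as in the observed request."""
    return base + "?" + urlencode(params or PARAMS, safe="*", quote_via=quote)


def fetch_payload(url=None, timeout=30):
    """Download and decode the JSON response (network access required)."""
    with urlopen(url or query_url(), timeout=timeout) as response:
        return json.load(response)
```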

The response payload is this:

{"objectIdFieldName":"ObjectId",
 "uniqueIdField":{"name":"ObjectId","isSystemMaintained":true},
 "globalIdFieldName":"",
 "fields":[
  {"name":"Datum","type":"esriFieldTypeString","alias":"Datum","sqlType":"sqlTypeNVarchar","length":2147483647,"domain":null,"defaultValue":null},
  {"name":"Fallzahl","type":"esriFieldTypeInteger","alias":"Fallzahl","sqlType":"sqlTypeInteger","domain":null,"defaultValue":null},
  {"name":"ObjectId","type":"esriFieldTypeOID","alias":"ObjectId","sqlType":"sqlTypeInteger","domain":null,"defaultValue":null},
  {"name":"Sterbefall","type":"esriFieldTypeInteger","alias":"Sterbefall","sqlType":"sqlTypeOther","domain":null,"defaultValue":null},
  {"name":"Genesungsfall","type":"esriFieldTypeInteger","alias":"Genesungsfall","sqlType":"sqlTypeOther","domain":null,"defaultValue":null},
  {"name":"Anzeige_Indikator","type":"esriFieldTypeString","alias":"Anzeige_Indikator","sqlType":"sqlTypeOther","length":10,"domain":null,"defaultValue":null}
 ],
 "features":[
  {"attributes":{"Datum":"7.03.20","Fallzahl":2,"ObjectId":1,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"08.03.20","Fallzahl":2,"ObjectId":2,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"09.03.20","Fallzahl":2,"ObjectId":3,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"10.03.20","Fallzahl":5,"ObjectId":4,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"11.03.20","Fallzahl":5,"ObjectId":5,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"12.03.20","Fallzahl":5,"ObjectId":6,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"13.03.20","Fallzahl":12,"ObjectId":7,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"14.03.20","Fallzahl":18,"ObjectId":8,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"15.03.20","Fallzahl":18,"ObjectId":9,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"16.03.20","Fallzahl":25,"ObjectId":10,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"17.03.20","Fallzahl":35,"ObjectId":11,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"18.03.20","Fallzahl":50,"ObjectId":12,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"19.03.20","Fallzahl":60,"ObjectId":13,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"20.03.20","Fallzahl":97,"ObjectId":14,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"21.03.20","Fallzahl":115,"ObjectId":15,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},
  {"attributes":{"Datum":"22.03.20","Fallzahl":139,"ObjectId":16,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":"x"}}
 ]}

As we can see, the city of Dresden provides 2 more categories: "Sterbefall" (deaths) and "Genesungsfall" (recoveries).

Ideally, I need a script that automatically pulls this data and converts it to CSV for Dresden!
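A minimal sketch of such a conversion, assuming the payload structure shown above. The first Datum value is "7.03.20" without a leading zero, so the parser splits on dots instead of relying on a fixed-width date format. The output columns date, diagnosed, dead and recovered are hypothetical names, not necessarily those of the repository's CSV files; missing values are written as empty cells rather than zeros:

```python
import csv
import datetime
import io


def parse_datum(text):
    """Parse 'DD.MM.YY' (day possibly unpadded, e.g. '7.03.20') into a date."""
    day, month, year = (int(part) for part in text.strip().split("."))
    return datetime.date(2000 + year, month, day)


def features_to_csv(payload):
    """Convert the ArcGIS 'features' list into CSV text sorted by date."""
    rows = []
    for feature in payload["features"]:
        attrs = feature["attributes"]
        rows.append({
            "date": parse_datum(attrs["Datum"]).isoformat(),
            "diagnosed": attrs["Fallzahl"],
            "dead": attrs.get("Sterbefall"),          # None becomes an empty cell
            "recovered": attrs.get("Genesungsfall"),
        })
    rows.sort(key=lambda row: row["date"])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["date", "diagnosed", "dead", "recovered"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```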

Problem running script

If I run #22 (new data included) with the following command:

Rscript exponential.R -i data/SMS/de_sachsen_sms.csv --deLabel Sachsen --enLabel "Saxony, Germany"

I get this error. I tried this with an older branch too and got the same result. :/

34  33 3681.00980    3681.00376 5417.64846 2416.10308 2020-04-04
35  34 4186.31319    4186.30526 6202.68133 2729.33440 2020-04-05
36  35 4760.98112    4760.97087 7101.46773 3083.17403 2020-04-06
[1] "tomorrow"
  day diagnosed diagnosed_sir      upr      lwr       date
1  28  1934.832      1934.831 2754.023 1313.436 2020-03-30
[1] "1 week from now"
[1] day           diagnosed     diagnosed_sir upr           lwr
[6] date
<0 rows> (or 0-length row.names)
Warning message:
Ignoring unknown parameters: linewidth
[1] "plotting linear and log scale"
Error: Aesthetics must be either length 1 or the same as the data (1): x and y
Backtrace:
     █
  1. └─cowplot::plot_grid(...)
  2.   └─cowplot::align_plots(...)
  3.     └─base::lapply(...)
  4.       └─cowplot:::FUN(X[[i]], ...)
  5.         ├─cowplot::as_gtable(x)
  6.         └─cowplot:::as_gtable.default(x)
  7.           ├─cowplot::as_grob(plot)
  8.           └─cowplot:::as_grob.ggplot(plot)
  9.             └─ggplot2::ggplotGrob(plot)
 10.               ├─ggplot2::ggplot_gtable(ggplot_build(x))
 11.               ├─ggplot2::ggplot_build(x)
 12.               └─ggplot2:::ggplot_build.ggplot(x)
 13.                 └─ggplot2:::by_layer(function(l, d) l$compute_aesthetics(d, plot))
 14.                   └─ggplot2:::f(l = layers[[i]], d = data[[i]])
 15.
Execution halted

github actions zero out Dresden data

@vv01f I see that the automated script that pulls the data for Dresden apparently zeros out all columns except diagnosed. Compare 6cd762d and 6419daa. The diff for the latter is intriguing! It changed all rows dating back to Mar 12. This might be due to changes on dresden.de; I am not sure. Could you please have a look?
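One way to catch such regressions before they are committed is to compare the freshly generated CSV with the previous version and flag cells that used to be non-zero but are now 0. A minimal sketch, assuming both files have a date column; the function name is hypothetical, and the workflow would still need to decide whether to abort or just warn:

```python
import csv
import io


def zeroed_cells(old_csv, new_csv, key="date"):
    """Return (date, column) pairs that were non-zero before and are '0' now."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}
    problems = []
    for day in sorted(old.keys() & new.keys()):
        for column, old_value in old[day].items():
            if column == key:
                continue
            # Empty or "0" before is not suspicious; a real value turning into "0" is.
            if old_value not in ("", "0") and new[day].get(column) == "0":
                problems.append((day, column))
    return problems
```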

Write github action to update plots daily

As @gerbsen would like to have the fit results as CSV, or the plot figure directly, it would be great to have the fit run daily (until further notice). As I hardly find the time these days, any help would be appreciated.

introduce goodness of fit

Expand the core script to produce a goodness-of-fit figure or plot. Chi2/ndf should do.
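A minimal sketch of chi2/ndf, assuming per-point uncertainties sigma are available (for Poisson-like counts, sigma ≈ sqrt(predicted) is a common choice) and that ndf is the number of data points minus the number of fitted parameters, e.g. 2 for a fit of the form a·exp(b·t). Values near 1 suggest the residuals are consistent with the assumed uncertainties; the function name is hypothetical:

```python
def reduced_chi2(observed, predicted, sigma, n_params):
    """chi2 / ndf with ndf = number of points - number of fitted parameters."""
    if not (len(observed) == len(predicted) == len(sigma)):
        raise ValueError("inputs must have equal length")
    if any(s <= 0 for s in sigma):
        raise ValueError("uncertainties must be positive")
    ndf = len(observed) - n_params
    if ndf <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = sum(((o - e) / s) ** 2 for o, e, s in zip(observed, predicted, sigma))
    return chi2 / ndf
```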

Installation instructions fail on OS X

Under OS X

install.packages(c("ggplot2","dplyr","readr","optparse", "cowplot", "nls2"))

fails due to missing X11.

Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
  dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 6): Library not loaded: /opt/X11/lib/libSM.6.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/3.6/Resources/modules/R_X11.so
  Reason: image not found

Check and validate npgeo-corona-npgeo-de.hub.arcgis.com

I've been pointed to this website multiple times:
https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0/data

I found that the numbers here don't match those described in #6, which I have used by default so far. For example, go to the display by county:
https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/917fc37a709542548cc3be077a786c17_0
Downloading the spreadsheet gives a CSV file with 36 columns!

$ csvstat -n RKI_Corona_Landkreise.csv
  1: OBJECTID
  2: ADE
  3: GF
  4: BSG
  5: RS
  6: AGS
  7: SDV_RS
  8: GEN
  9: BEZ
 10: IBZ
 11: BEM
 12: NBD
 13: SN_L
 14: SN_R
 15: SN_K
 16: SN_V1
 17: SN_V2
 18: SN_G
 19: FK_S3
 20: NUTS
 21: RS_0
 22: AGS_0
 23: WSK
 24: EWZ
 25: KFL
 26: DEBKG_ID
 27: Shape__Area
 28: Shape__Length
 29: death_rate
 30: cases
 31: deaths
 32: cases_per_100k
 33: cases_per_population
 34: BL
 35: BL_ID
 36: county

The metadata for this table is a bit lengthy and hard to digest:
https://www.arcgis.com/sharing/rest/content/items/917fc37a709542548cc3be077a786c17/info/metadata/metadata.xml?format=default&output=html

Digging through this I could extract:

$ csvgrep -c36 -m 'Dresden' RKI_Corona_Landkreise.csv|csvcut -c1,8,9,30-36 |csvlook
| OBJECTID  | GEN     | BEZ              | cases | deaths | cases_per_100k | cases_per_population | BL      | BL_ID | county     |
| --------- | ------- | ---------------- | ----- | ------ | -------------- | -------------------- | ------- | ----- | ---------- |
|       357 | Dresden | Kreisfreie Stadt |    99 |  False |        17.849… |               0.018… | Sachsen |    14 | SK Dresden |

So today (Mar 23, 2020) this would be 99 cases, which lies between the numbers for Mar 20 (97) and Mar 21 (115). So for me this means:

the number is inconsistent with dresden.de (#6)
it could be 2 days old
a date column is missing in the data!
I am wondering where to get the historic data.
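Since the table has no date column, one workaround is to record the retrieval date with each daily download and build the history ourselves. This stamps the download date, not the reporting date, so it would not resolve the possible lag noted above. A minimal sketch, assuming the column names county, cases and deaths listed by csvstat; the function name is hypothetical, and the real download may need explicit encoding handling:

```python
import csv
import datetime
import io


def county_snapshot(rki_csv_text, county="SK Dresden", retrieved=None):
    """Return one county's row from RKI_Corona_Landkreise.csv, stamped with the retrieval date."""
    retrieved = retrieved or datetime.date.today()
    for row in csv.DictReader(io.StringIO(rki_csv_text)):
        if row["county"] == county:
            return {
                "retrieved": retrieved.isoformat(),
                "county": row["county"],
                "cases": int(row["cases"]),
                # kept as raw text: the export showed "False" here for Dresden
                "deaths": row["deaths"],
            }
    raise KeyError(f"county {county!r} not found")
```

Appending each day's snapshot to a history CSV would then give a time series indexed by retrieval date.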

42  41 117.801075964 205.698715293 62.541687037 2020-04-12
[1] "tomorrow, day 35"
  day   ydata      upr      lwr       date
1  35 41.8148 70.09736 22.91919 2020-04-06
[1] "1 week from now, day 41"
  day    ydata      upr      lwr       date
1  41 117.8011 205.6987 62.54169 2020-04-12
Warning message:
Ignoring unknown parameters: linewidth
[1] "plotting linear and log scale"
Error: Unknown graphics device ''
Backtrace:
    █
 1. └─ggplot2::ggsave(output_name, mycanvas, width = 12, height = 6.5)
 2.   └─ggplot2:::plot_dev(device, filename, dpi = dpi)
In addition: Warning messages:
1: In self$trans$transform(x) : NaNs produced
2: Transformation introduced infinite values in continuous y-axis
3: In self$trans$transform(x) : NaNs produced
4: Transformation introduced infinite values in continuous y-axis
5: In self$trans$transform(x) : NaNs produced
6: Transformation introduced infinite values in continuous y-axis
7: In self$trans$transform(x) : NaNs produced
8: Transformation introduced infinite values in continuous y-axis
9: Transformation introduced infinite values in continuous y-axis
10: Removed 11 row(s) containing missing values (geom_path).
Execution halted
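As far as I can tell, ggplot2::ggsave picks the graphics device from the file extension when no device is given, so Unknown graphics device '' points to an output_name without an extension. A minimal sketch of a guard that appends a default extension before saving (shown in Python; the helper name and the set of accepted extensions are assumptions):

```python
import os

# Assumed set of accepted extensions; adjust to the devices actually available.
SUPPORTED_EXTENSIONS = {".png", ".pdf", ".svg", ".jpeg", ".jpg"}


def check_output_name(name, default_ext=".png"):
    """Return a file name with a supported extension, appending default_ext if none is given."""
    _root, ext = os.path.splitext(name)
    if ext == "":
        return name + default_ext
    if ext.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported output extension {ext!r} in {name!r}")
    return name
```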

Incorrect scaling of de_plus7.png since Apr 11

The figure de_plus7.png (linear) shows "-1000" on Apr 11 and "-2000" on Apr 12.

crowdsourced COVID19 cases

Numbers obtained by crowdsourcing performed at Risklayer:

https://docs.google.com/spreadsheets/d/1wg-s4_Lz2Stil6spQEYFdZaBEp8nWW26gVyfHqvcl8s/edit#gid=0