global-policy-lab / gpl-covid
Repo for code and small datasets related to Global Policy Lab's COVID-19 policy analysis. Read and share the accompanying article here:
Home Page: https://rdcu.be/b4Iyo
Do we sum? Do we take the max? See the convo on Slack. From @estherrolf:
[I'm gonna use US-specific words, so state = adm1] Specifically,
(a) for non-popweighted variables, how do we aggregate state-level contributions from county or city policies with differing policy_intensities? Like if a city has a policy with intensity 1 and a state-level policy has intensity 0.2, is the state-level (non-popweighted) intensity meant to be max(0.2, 1) = 1?
(b) for popweighted values, are we counting state-level contributions as min(sum_over_policies{percent_state_pop_effected * policy_intensity}, 1)? If we do this, we would, I think, have to account for all the overlap in percent_state_pop_effected and somehow also the intensities of the policies that overlap; this seems prone to errors, and it's likely that we're not doing it the same way for each country.
Basically I'm asking for us to come to a consensus on how to use policy_intensity (I know you guys have all thought more about this than me, but as the person trying to implement the merge, there are a lot of complications arising). It's possible there's a straightforward way to use it; it's also possible that it's not worth the trouble to account for fractional intensities (which maybe avoids encoding our judgments on what fraction to give?).
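As a concrete strawman for that consensus, here is a minimal sketch of the two rules being discussed: (a) max over overlapping intensities for non-popweighted variables, and (b) a capped pop-share-weighted sum for popweighted ones. Column names (pct_pop_affected, etc.) are illustrative, not the repo's actual schema, and the double-counting caveat from (b) is flagged in the comments:

```r
# Sketch of the two candidate aggregation rules; not a settled convention.
library(dplyr)

policies <- tibble::tribble(
  ~adm1_name, ~policy,          ~policy_intensity, ~pct_pop_affected,
  "Wyoming",  "school_closure", 1.0,               0.10,  # city-level policy
  "Wyoming",  "school_closure", 0.2,               1.00   # state-level policy
)

aggregated <- policies %>%
  group_by(adm1_name, policy) %>%
  summarise(
    # (a) non-popweighted: max(0.2, 1) = 1
    intensity = max(policy_intensity),
    # (b) popweighted: min(sum(share * intensity), 1); note this double
    # counts whenever the affected populations overlap, which is exactly
    # the concern raised above.
    intensity_popwt = min(sum(pct_pop_affected * policy_intensity), 1),
    .groups = "drop"
  )
```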
The code to generate each of the following figures should be in the run script and referenced in the README:
We should pull from cutoff_dates to know when to stop ED Fig 2; otherwise the figure and the source data will be different every day as new data becomes available.
We should also add a no-download flag so that tests can avoid downloading new data, in case new data alters previously-reported values that occur before the cutoff date.
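A minimal sketch of both ideas, assuming cutoff_dates.csv carries an end_date column (the actual schema may differ) and using a GPL_NO_DOWNLOAD environment variable as a stand-in for the proposed flag:

```r
library(readr)

# Read the shared cutoff so the figure stops at a fixed date across reruns.
cutoffs <- read_csv("codes/data/cutoff_dates.csv")
end_date <- as.Date(max(cutoffs$end_date))  # hypothetical column name

# Proposed no-download switch for tests (name is a placeholder).
no_download <- Sys.getenv("GPL_NO_DOWNLOAD", unset = "FALSE") == "TRUE"
if (!no_download) {
  # ... download fresh data here ...
}

# Then truncate the ED Fig 2 source data to the cutoff:
# fig2_data <- dplyr::filter(fig2_data, date <= end_date)
```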
Most of the figure/table code is not running right now. I think it just requires a few tweaks to variable names and (for Fig A2) reading from the already-downloaded JHU data rather than trying to access the now-broken old JHU links.
I think Fig 4 is still working, but not the extra script that calculates data for the paper related to Figure 4.
Warning messages from running codes/models/run_all_CB_simulations.R
Warning messages:
1: In chol.default(mat, pivot = TRUE, tol = tol) :
the matrix is either rank-deficient or indefinite
2: In sqrt(diag(z$STATS[[lhs]]$clustervcv)) : NaNs produced
3: In sqrt(diag(z$STATS[[lhs]]$robustvcv)) : NaNs produced
4: In chol.default(mat, pivot = TRUE, tol = tol) :
the matrix is either rank-deficient or indefinite
5: In sqrt(diag(z$STATS[[lhs]]$clustervcv)) : NaNs produced
6: In sqrt(diag(z$STATS[[lhs]]$robustvcv)) : NaNs produced
7: In compute_bootstrap_replications(full_data = mydata, policy_variables_to_use = policy_variables_to_use, :
Negative eigenvalues set to zero in clustered variance matrix. See felm(...,psdef=FALSE)
Error in eigen(sigma, symmetric = TRUE) :
infinite or missing values in 'x'
Calls: source ... withVisible -> <Anonymous> -> map -> .f -> <Anonymous> -> eigen
In addition: Warning messages:
1: In chol.default(mat, pivot = TRUE, tol = tol) :
the matrix is either rank-deficient or indefinite
2: In sqrt(diag(z$STATS[[lhs]]$clustervcv)) : NaNs produced
3: In chol.default(mat, pivot = TRUE, tol = tol) :
the matrix is either rank-deficient or indefinite
4: In sqrt(diag(z$STATS[[lhs]]$clustervcv)) : NaNs produced
5: In compute_bootstrap_replications(full_data = mydata, policy_variables_to_use = policy_variables_to_use, :
Negative eigenvalues set to zero in clustered variance matrix. See felm(...,psdef=FALSE)
Execution halted
A region subject to somepolicy should not also have the "extra treatment" of somepolicy_opt when other regions in the super-region get this treatment, since this optional policy doesn't actually effect any new policy treatment for the region that already had somepolicy. The same goes for popwt versions of somepolicy: we should make sure that any sub-regions (that compose the pop-weight of the region) that are encompassed in somepolicy_popwt are not also aggregated in somepolicy_opt_popwt during any of the days those policies are in place, and vice versa.
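One way to enforce this is an assertion pass over the merged panel. A toy sketch, with somepolicy as the placeholder name from the discussion above and made-up data:

```r
library(dplyr)

df <- tibble::tibble(
  adm1_name            = c("A", "A", "B"),
  date                 = as.Date("2020-03-20") + 0:2,
  somepolicy           = c(1, 1, 0),
  somepolicy_opt       = c(0, 0, 1),
  somepolicy_popwt     = c(0.6, 0.6, 0),
  somepolicy_opt_popwt = c(0, 0, 0.3)
)

# Mandatory and optional versions should be mutually exclusive per region-day.
stopifnot(nrow(filter(df, somepolicy > 0, somepolicy_opt > 0)) == 0)

# The popwt shares should never sum past 1: sub-regions counted in
# somepolicy_popwt must not also be aggregated into somepolicy_opt_popwt.
stopifnot(all(df$somepolicy_popwt + df$somepolicy_opt_popwt <= 1 + 1e-8))
```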
@kendonB if I don't do this soon yell at me
I started to fix a few lines, but I'm getting to the end and it looks like there's a bunch of national cases in there, which are causing a mis-matched merge with population data, and it's failing an assertion at the end. @estherrolf can you take a look?
Using this issue to track TODOs for getting new data and/or creating tables/figures to respond to reviewers. When you submit a PR, add something like the following to your PR (e.g. "updates policy data for #21"). When it is merged, I'll check off the appropriate checkbox in this issue. For all updates, make sure that any new scripts or manual downloads have been added to the README and that these scripts have been added to run.
We are now using the data_sources.gsheet file as the up-to-date version of all manually aggregated policy data. Each country's pipeline should be updated with the following steps:
- [ ] Export the policy data to data/raw/[country]/[country_code]_policy_data_sources.csv and data/interim/[country]/[country_code]_policy_data_sources_other.csv. It's possible but not necessary to include downloads to data/raw in these scripts, but the processed output, formatted to match data_sources.gsheet, should go in interim. Make sure your code pulls the latest data, without filtering to the cutoff date (this will be done in the regression step). Push a PR with both any code updates AND data updates through at least 3/24.
- [ ] Merge the policy data with the epi data. This can either occur in the same script as the epi-data-pulling script or as a separate step. But make sure this pulls from data/raw/[country]/[country_code]_policy_data_sources.csv, data/interim/[country]/[country_code]_policy_data_sources_other.csv (if necessary), any of the epi data, and any auxiliary data like population, and saves data/processed/[adm_level]/[country_code]_processed.csv. Make sure that the output has lat and lon columns (see the schema-check sketch after this list).
- [ ] Apply the cutoff dates when generating the reg_data.csv files. This could either be done with an easy-to-change set of country-specific variables at the top of the script or (preferably) a csv located at codes/data/regression_cutoff_dates.csv. (@jeanettelt)
- [ ] Generate results/tables/[country_code]_results_table.rtf for the Appendix table. (@sannanphan)
- [ ] Update codes/data/cutoff_dates.csv (see #29). (@hdruckenmiller)
Separately: the date column is missing in data/processed/adm2/CHN_processed.csv.
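A quick schema check covering both the lat/lon requirement and the missing date column; a sketch, assuming readr is available:

```r
library(readr)

check_processed <- function(path, required = c("date", "lat", "lon")) {
  df <- read_csv(path, n_max = 5)
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop(sprintf("%s is missing columns: %s",
                 path, paste(missing_cols, collapse = ", ")))
  }
  invisible(TRUE)
}

# Currently fails for CHN because `date` is missing:
check_processed("data/processed/adm2/CHN_processed.csv")
```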
"data downloads" vs. "analysis", to make it easier to re-run one of these parts.
Looks like something changed about the French website we were scraping for cases... so now it downloads a blank file with no datestamp. We'll want to update the code to handle this (not sure if it's temporary or a permanent change).
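Until we know whether the change is permanent, a guard that fails loudly on an empty download might help. A sketch with a placeholder URL and path, not the ones the scraper actually uses:

```r
dest <- tempfile(fileext = ".csv")
res <- try(download.file("https://example.fr/cases.csv", dest, quiet = TRUE),
           silent = TRUE)
if (inherits(res, "try-error") || !file.exists(dest) || file.size(dest) == 0) {
  warning("FRA case download is empty or failed; keeping last saved file")
  # fall back to the most recent local fr-sars-cov-2-* file here
}
```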
Fig 1 code downloads US state shapes at runtime, and this sometimes causes errors in the CI pipeline. We can instead pull from the adm1.shp file and drop our dependency on r-tigris.
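A sketch of the swap, assuming the repo's adm1 shapefile lives at the path below (unverified) and has an adm0_name column:

```r
library(sf)
library(dplyr)

adm1 <- st_read("data/interim/adm/adm1/adm1.shp", quiet = TRUE)
usa_states <- adm1 %>% filter(adm0_name == "USA")  # no runtime download needed
```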
Is there a stochastic component of the CI bounds that we can set a seed for?
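If the bounds come from bootstrap resampling, a set.seed() call before the replications should pin them down. A toy sketch (the repo's actual bootstrap entry point may differ):

```r
set.seed(42)                      # fix the stochastic component
x <- rnorm(100)                   # stand-in for the real estimates
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
ci <- quantile(boot_means, c(0.025, 0.975))
```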
A reminder to change the title, update the tree, and make any other needed changes.
For Appendix Fig A1: find out why the line *graph combine hist_usa qn_usa, rows(1) xsize(10) saving(figures/appendix/error_dist/error_usa.gph, replace) was commented out in codes/models/alt_growth_rates/USA_adm1.do, preventing the graphs from being combined into Appendix Fig A1.
Right now format_infected.do will treat April like the Y2K bug, so we should fix that up before April comes around. Also, we should have a setting where it just "runs until the latest date that's downloaded": right now it stops at the 18th and requires the fr-sars-cov-2-YYYYMMDD bulk download to be from before that date, with daily data downloaded for each date between that bulk download and end_sample.
After #17 is merged, I think the repo is fully cleaned up and we can tag that version as our "medRxiv" version, so that figures are replicable. And then we can start pulling in newer data across all the scripts.
@jeanettelt @sannanphan can one of you be in charge of getting format_infected.do ready for that? Thanks!
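For the "runs until the latest date that's downloaded" setting, one option is to derive end_sample from the filenames on disk rather than hard-coding the 18th. An R sketch with an assumed directory (the real fix would live in the Stata pipeline):

```r
files <- list.files("data/raw/france", pattern = "^fr-sars-cov-2-\\d{8}")
dates <- as.Date(sub("^fr-sars-cov-2-(\\d{8}).*$", "\\1", files), "%Y%m%d")
end_sample <- max(dates)  # latest date actually downloaded
```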
Right now FRA is using slightly different structure/definitions for policy variables, so @jeanettelt @sannanphan @peiley and I need to put our heads together to sort this out.
We're getting one county with null imputed values, which is raising an error during one of @kendonB's checks (thanks for adding those!)
Browse[1]> usa_county_data_standardised[is.na(usa_county_data_standardised$cum_confirmed_cases_imputed),]
# A tibble: 3 x 13
date adm0_name adm1_name adm2_name population cum_confirmed_c… cum_confirmed_c… active_cases active_cases_im… cum_deaths
<date> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2020-04-01 USA Wyoming Weston 7208 NA NA NA NA NA
2 2020-04-02 USA Wyoming Weston 7208 NA NA NA NA NA
3 2020-04-03 USA Wyoming Weston 7208 NA NA NA NA NA
# … with 3 more variables: cum_deaths_imputed <dbl>, cum_recoveries <dbl>, cum_recoveries_imputed <dbl>
See the comment in #72. It looks like FRA_coefs.csv, FRA_preds.csv, and FRA_reg_data.csv currently have merge conflicts in them.
The second-to-last line breaks on a variable rename because there are 2 adm0 columns... we want to fix this and figure out whether there might be any other cause (maybe things aren't merging correctly?).
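A sketch of a guard for the duplicate-column case, using a toy frame in place of the real merged data:

```r
# Toy frame mimicking the bad merge, with two adm0_name columns.
merged <- setNames(data.frame("FRA", "FRA", 42),
                   c("adm0_name", "adm0_name", "value"))

dup <- unique(names(merged)[duplicated(names(merged))])
if (length(dup) > 0) {
  warning("Duplicate columns after merge: ", paste(dup, collapse = ", "))
  merged <- merged[, !duplicated(names(merged))]  # keep the first of each
}
```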