Giter VIP home page Giter VIP logo

augsynth's People

Contributors

clarkest avatar ebenmichael avatar gegrimm avatar sideye avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

augsynth's Issues

conformal_inf

Hello,
I am trying to run the conformal_inf function and I get the following error:
Error in conformal_inf(asyn, alpha = 0.05, type = "iid", q = 1, ns = 1000, :
could not find function "conformal_inf"

A Problem with the Command Multisynth

Hello,
Thanks for the great work on this package.
I am using the command multisynth to conduct an event study. The treatment status plot is like this:
Rplot01

My command is:
out_syn <- multisynth(gdp_pc ~ D,
unit = county_id,
time = year,
data = data,
n_leads = 5)
and it reports the error as Xc[[j]][, (dj - ndim + 1):dj, drop = F] subscripts out of limits.

Cannot reproduce the (average effect) Plot

Hi, Eli:
I'm trying to reproduce the results of your staggered adoption example. Everything just goes well except I cannot produce the plot of the average treatment effect. With command like
plot(ppool_syn_summ, level = "Average")
the augsynth package can still produce the following plot some as plot(ppool_syn_summ)
My augsynth version is 0.2.0

Error in mask[j, ] : subscript out of bounds

Hi there
I am using the package for a monthly database, and I want to compile the 'multisynth' function. I am getting this error:
"Error in mask[j, ] : subscript out of bounds"
Recently, I used the function for a yearly database, and I don't have any issues. Could it be from the 'date' variable?
I wondered if you have any thoughts on what that error is related to?

Thanks

Parenthesis causes different CI's

Hello Ben, I am getting different confidence intervals depending on whether or not I insert a parenthesis on the outcome of interest--is this an error? The estimate remains the same regardless.

summary(augsynth::augsynth(gsa_mi ~ treatment, state, year, 2017, data=nphi_gsa, weightfunc="ASCM"))

One outcome and one treatment time found. Running single_augsynth.

Call:
single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
t_int = t_int, data = data, weightfunc = "ASCM")

Average ATT Estimate (p Value for Joint Null): 0.137 ( 0.505 )
L2 Imbalance: 0.050
Percent improvement from uniform weights: 72.1%

Avg Estimated Bias: 0.010

Inference type: Conformal inference

Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
2016 -0.006 NA NA NA
2018 0.137 -0.137 0.412 0.5
There were 50 or more warnings (use warnings() to see the first 50)

But this differs from the following specification, where I add a parenthesis to ...augsynth((gsa_mi)~...

summary(augsynth::augsynth((gsa_mi) ~ treatment, state, year, 2017, data=nphi_gsa, weightfunc="ASCM"))

Multiple outcomes and one treatment time found. Running augsynth_multiout.

Call:
augsynth_multiout(form = form, unit = !!enquo(unit), time = !!enquo(time),
t_int = t_int, data = data, weightfunc = "ASCM")

Overall L2 Imbalance (Scaled):0.050 (0.279)

Average ATT Estimate:
Outcome Estimate Std.Error Pre.RMSE
1 gsa_mi 0.1372524 0.08831002 0.02253471

There were 20 warnings (use warnings() to see them)

I should add that I only have one post-intervention observation (2018), and 5 pre-intervention observations (2008-2016, bianually). The intervention happened in 2017. My interpretation is that the big CI is due to the few pre-intervention observations.

A problem with augsynth() command

Hello,

thank you for all the great work on this package. I ran into a issue with a command augsynth() and I have no luck solving it. When I try to run:

asyn <- augsynth(Master_data_df_balanced$gdppc ~ Master_data_df_balanced$treated, Master_data_df_balanced$country_no, Master_data_df_balanced$year, 2014, Master_data_df_balanced)

I am getting error message:

Error in UseMethod("group_by") :
no applicable method for 'group_by' applied to an object of class "c('double', 'numeric')"

Frankly, I do not understand where the problem lies. This is how my dataframe looks like:
image

I am still learning how to work with R and I am sorry if this is trivial question. Any suggestions are appreciated.
Thank you!

Control Units error

Hello,

I am running a model with multiple treated units. I get an error on this line because the treated_unit has multiple elements.

control_units <- data %>% filter(!!unit != treated_unit) %>% distinct(!!unit) %>% pull(!!unit)

I fixed it by changing the control_units definition to:

control_units <- data %>% filter(!(!!unit %in% treated_unit)) %>% distinct(!!unit) %>% pull(!!unit)

Confidence interval that includes the value of zero

Hello everyone, again I want to ask for your help.

I am estimating with multysynth but my confidence interval on the graph includes the value of 0, is this enough proof that the intervention has no significant effect on treated units despite the synthetic control fitting very well and despite that there is a noticeable trend of impact after treatment?

In the package examples I saw that for augsynth there were ways to improve the confidence intervals, changing parameters in the main function or in the summary. Are there also options to improve confidence intervals in multysynth?

Thank you very much

Confidence Intervals are asymmetric and capped to zero.

library(tidyverse)
library(augsynth)

dat <- read_rds(url("https://github.com/davidnathanlang/augsynth_issue/blob/main/asymetric_confidence_interval_reprex.rds?raw=true")) 
asynth <- augsynth::augsynth(
  outcome ~ treat,
  unit = state,
  time = centered_week,
  data = dat ,
  progfunc = "None", fixedeff = FALSE)
#> One outcome and one treatment time found. Running single_augsynth.

(sum<-summary(asynth)) 
#> 
#> Call:
#> single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time), 
#>     t_int = t_int, data = data, progfunc = "None", fixedeff = FALSE)
#> 
#> Average ATT Estimate (p Value for Joint Null):  0.526   ( 0.954 )
#> L2 Imbalance: 0.424
#> Percent improvement from uniform weights: 93.9%
#> 
#> Avg Estimated Bias: NA
#> 
#> Inference type: Conformal inference
#> 
#>  Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
#>     0   -0.139             -2.743              2.464   0.500
#>     1   -0.427             -3.031              2.177   0.278
#>     2   -0.674             -3.278              1.930   0.389
#>     3   -0.686             -3.290              1.918   0.444
#>     4   -0.585             -3.189              2.019   0.611
#>     5   -0.573             -3.177              2.031   0.778
#>     6   -0.434             -3.038              2.170   0.833
#>     7    0.100             -2.504              2.704   0.944
#>     8    0.324             -2.280              2.927   0.500
#>     9    0.505             -2.099              3.109   0.333
#>    10    1.394             -1.210              3.998   0.111
#>    11    1.803             -0.801              4.407   0.167
#>    12    2.124             -0.480              4.728   0.167
#>    13    2.402             -0.202              5.006   0.167
#>    14    2.756              0.000              5.360   0.222  # Assymetric Confidence Interval
sum$att %>%
  filter(Time >= 0) %>%
  mutate(left_interval = Estimate - lower_bound, right_interval = upper_bound-Estimate) %>%
  filter(left_interval-right_interval>0.00001)
#>    Time Estimate lower_bound upper_bound     p_val left_interval right_interval
#> 14   14 2.756339           0    5.360227 0.2222222      2.756339       2.603888

Created on 2021-09-21 by the reprex package (v2.0.0)

Problem with Multisynth

Hi,
So I am running multisynth for one of my research projects and it is giving the following error:

Error in Xc[[j]][, (dj - ndim + 1):dj, drop = F] :
subscript out of bounds

There are no missing values in my outcome variable.

Thanks in advance.

Error in if (n_leads > d)

Hello,

I am trying to run the multisynth as described in the staggered part using this command:
ppool_syn <- multisynth(lgdp ~ trt, group, year, analysis_df, n_leads=10)

I got the following error -> "Error in if (n_leads > d)" which seems to be raised by that file R/multi_synth_qp.R

I also have an issue with the size of my file (15k~ rows), I got the error Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105"
I have to filter on only one year to get the previous error.

Best,

augsynth error w/ Basque

I'm trying to run the sample code in the vignette using the Basque data. I've copied the code exactly and all packages are installed. When I run
syn <- augsynth(gdpcap ~ treatment, regionno, year, 1975, basque, progfunc="None", weightfunc="SCM")

I get the following error:

Error in enc2utf8(col_names(col_labels, sep = sep)) :
argument is not a character vector
I'm having difficulty dissecting this error message to see what it refers to.

Weighted ATT

Hi, is there a way to construct weighted ATT in the traditional augsynth environment. Thanks

Questions about multisynth plot

Hello everyone.

I have two questions, I would really appreciate if someone could help me.

I am estimating a staggered synthetic control. The multisynth plot function "plot(ppool_syn_time_summ) for example" allows me to plot the difference between the observed value of the treated units and the estimates of their counterfactuals, that is, it plots the treatment effect (if I'm not right please tell me). But in addition to obtaining that graph, I want to graph the two terms, that is, I want to obtain a graph in which two time series are presented, one for the observed value of the average treated units and another for the estimates of the average counterfactual. How could he do it with the augsynth package? Or how could I get the values ​​of these two series so I can build the graph?

On the other hand, I would like to know if there is any way to modify the aesthetics of the graphs of the multisynth plot function. I want to change colors, size of the lines, aesthetics of the graphic environment, among others. Is there a way to edit augsynth plots in ggplot2?

I am attentive, thank you very much.

Issue in running multisynth()

Hi Eli Ben-Michael,

Thanks for proposing a synthetic controls method for staggered adoption. I adapted the function: ppool_syn <- multisynth(lnppexpend ~ cbr, State, year, analysis_df, n_leads = 10) based on my dataset, which is a balance panel with staggered treatment time. But I encounter this error:

Error in cbind(wide$X, wide$y) :
number of rows of matrices must match (see arg 2)

I have no idea how to address this error. Any solution or suggestion are appreciated. Thanks.

Alpha in Conformal CI calculation

Hello!

For some reason there is not possible to change level of alpha in conformal CI calculation. Passing alpha directly to summary changes titles for upper and lower bound given the level of alpha, but not the CI-is themselves. Running conformal_inf returns following error: Error in do.call(fit_augsynth_internal, c(list(wide = new_wide_data, synth_data = new_synth_data, :
object 'fit_augsynth_internal' not found

Best,
H

Error in running multisynth: "Error in rowMeans(residuals[, 1:tj], na.rm = TRUE) : 'x' must be an array of at least two dimensions"

Hello,
first of all thank you very much for your great work: augsynth is a great package!
I am trying to estimate a multisynth (ppool_syn <- multisynth(y ~ treat,id,time,data, n_leads = 3)model with only a reduced number of leads. However, when I run it generates the following error: Error in rowMeans(residuals[, 1:tj], na.rm = TRUE) : 'x' must be an array of at least two dimensions. Do you know what may be causing this problem and if there is any possibility for me to overcome it?

Thanks a lot in advance for your help

Best regards

Estimates are not invariant to sort order

Running the example code generates distinct results based on the way the data is fed into the augsynth. It generates point estimates that are pretty different. Here's an example using the code from your vignette.

library(augsynth)
library(dplyr)
data(kansas)
(
syn1 <- augsynth(lngdpcapita ~ treated, fips, year_qtr,   # Vignette Example  Estimate -0.029
                 kansas,
                 progfunc = "None", scm = T)
)
(syn2 <- augsynth(lngdpcapita ~ treated, fips, year_qtr,  #Descending Estimate -0.008
                 kansas %>% arrange(fips,desc(year_qtr)),
                 progfunc = "None", scm = T)
)

Error in the staggered adoption code

Hello everyone,

I am developing an impact evaluation research and I was interested in the staggered adoption methodology developed by you. At the moment, I'm trying to replicate the code in the "augsynth: Estimating treatment effects with staggered adoption" section, and I'm getting the same error Arturo mentioned last month.

The code I run is the following:

with a choice of nu

ppool_syn <- multisynth(lnppexpend ~ cbr, State, year,
nu = 0.5, analysis_df)

The mistake is:
Error in mutate():
! Problem while computing trt_time = replace_na(trt_time, Inf).
Caused by error in stop_vctrs():
! Can't convert from replace to data due to loss of precision.
*Locations: 1

Could you please tell me how to solve this problem?

Thank you very much, greetings.

Adding auxiliary covariates causes fatal error

Adding auxiliary covariates is causing R to abort when calling multisynth. I have tried using both time-constant and time-varying covariates, ensured there is no missing data in the matching covariate, all to no avail. This is the format I am using:

ppool.syn<- multisynth(y ~ x | c1, unit, weeknumber, 
                              data, n_lags = 12, n_leads = 16)

When I do not include the c1 covariate, the code runs as expected. Thanks!

No confidence intervals of average ATT for conformal inference type

Hi!

I was wondering what was the reason behind there not being a confidence interval over the average ATT for the conformal inference type? Also, why doesn't the jackknife plus have a pvalue estimate for the average ATT?

Are there any plans to include the CI for conformal and the pvalue for jackkinfe in the near future?

Thanks!

Error: cannot allocate vector of size 42.9 Gb

I have been trying to implement augsynth on my monthly-panel dataset, which has a staggard adoption of the treatment. The dataset size is as follows :

Unbalanced Panel: n = 3609, T = 1-277, N = 45560

When I try to run the model, with the code :
ppool_syn <- multisynth(outcome ~ treatment_status , project, month, data = data, fixedeff = T )

i get the error :

cannot allocate vector of size 42.9 Gb.

Is there a limit to the size of the dataset ? Or am I missing something ?

I have not provided too much details of the project but will be happy to share more info if required. Thank you.

UPDATE:

I tried with a smaller data sample :
Unbalanced Panel: n = 1811, T = 1-129, N = 13230

It seems to be able to run, but now it indicates that the problem is non-convex:
Error in KKT matrix LDL factorization when computing nonzero elements. The problem seems to be non-convexerror in osqp_setup: KKT matrix factorization

p-value of the Average ATT Estimate

Hi @ebenmichael and thank you for this package!

I'm getting some results that seem inconsistent.

syn_Ridge = augsynth(lnComposite_HPI ~ tr, loc.f, Date.f, d_O, progfunc = "Ridge", scm = T)
a = summary(syn_Ridge, inf_type = "conformal") 
plot(syn_Ridge, inf_type = "conformal")

summary reports:

Average ATT Estimate (p Value for Joint Null):  -0.111   ( 0.411 )
L2 Imbalance: 0.180
Percent improvement from uniform weights: 88.1%
Avg Estimated Bias: 0.184
Inference type: Conformal inference
 Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
  149   -0.040             -0.073             -0.007   0.027
  150   -0.049             -0.082             -0.016   0.027
  151   -0.062             -0.095             -0.029   0.013
...
  189   -0.120             -0.200             -0.021   0.027
  190   -0.143             -0.223             -0.044   0.020

The corresponding plot is:
image

The plot matches nicely w/ the individual year ATTs reported by summary.
The Average ATT Estimate (-0.111) is equal to the average of the individual annual ATTs.
mean(a$att$Estimate[149:190])
However, the p-value of the Average ATT Estimate (0.411) is so much bigger than all the individual annual ATT p-values.
The biggest individual annual p-value is .09, the average is .019.
We would expect the pv of the Average ATT Estimate to not be so much bigger than the individual annual ATT p-values
Why is this? Is there some kind option I need to use? Or is there some kind of bug?

Already treated people as control group

Hi
According to your paper, the package is using not-treated and no-yet-treated individuals as controls.
Is there any way in the package to use already-treated people as controls, too, resembling a TWFE control group?

Thanks - Mo

Codes for result plots and tables?

Hello,

I wanted to ask if you will add codes to get a results plot showing outcome trajectories of the treated and synthetic control units (rather than the gap) as well as summary tables for V and W weights and for balance checks between the treated unit and the synthetic control, in the style of the standard "synth" package with the "path.plot" and "synth.tables" commands.

Thank you for the great package!

m of n bootstrap

First, thanks for developing the package. I'm using the multisynth routine to analyze a large number (2600) of treated units with staggered adoption. I drop some suspect observations in the panel. Not long after I submit the estimation command, R crashes. After digging into the code a bit, I suspect the error occurs when the problem is sent to osqp. When I keep the complete panel, R does not crash, but the problem is too large for my machine (144gb memory). This latter issue provides an informative error and is not an issue with the package. I wonder if there is a check that can be implemented before an incomplete panel is passed to osqp to prevent the crash?

Because of the size of the problem, I have been estimating the model on subsets of the data and assessing the stability of the estimates. It is computationally infeasible to apply the jackknife standard errors as you refer to in the paper and have coded in the package. I am implementing an m-out-of-n bootstrap routine (https://stats.stackexchange.com/questions/33948/subsample-bootstrapping) for my case and would be happy to share. Are you open to pull requests?

Thank you again,

Jude

Add in time-based cross validation

Let's replace the current glmnet cv method with time-based cross validation. This should

  1. Iterate over a set of hyper-parameters
  2. Hold out a time period (or group of time periods), fit augsynth with held out time period, and get prediction error on that held out time period
  3. Return a vector of hyper-parameters and the squared error across all held out time periods

Minor error in print function for augsynth

When running the print function for an augsynth object, I receive the following warning message:

Warning message:
In colMeans(augsynth$data$y[augsynth$data$trt == 1, , drop = F]) -  :
  longer object length is not a multiple of shorter object length

I believe the error is caused by the following lines of code:

augsynth/R/augsynth.R

Lines 188 to 190 in 180d182

## print att estimates
att_post <- colMeans(augsynth$data$y[augsynth$data$trt == 1,,drop=F]) -
predict(augsynth)

This warning message is isolated to the print function (I think). The only issue that arises from this warning is that there is a discrepancy between the Average ATT Estimate generated from the print function and the the Average ATT Estimate generated from the summary function.

To replicate this issue, one can run the code for the augsynth vignette.

Combine into one entry point for the API

There are two different estimators based on the data type:

  1. augsynth: for cases where there is one treated unit or multiple treated units with one treatment time.
  2. multisynth: for cases where there are multiple treated units across multiple distinct treatment times.

We should try to combine these functions into one augsynth function that checks whether there are multiple treatment times or not and then acts accordingly, maybe just by calling one of those two functions. Both of these functions create two different classes, so we might want to create a parent class and have these two classes as child classes.

One hing that would be nice to get rid of is the need to specify the treatment time t_int.

Jackknife+

I'm having a little trouble understanding appendix B of your paper.

For Jackknife+:
LOO residuals: image
Estimated post-treatment outcome (w/o period "t"): image

PI: image
I'm confused b/c the following quantity varies w/ dropped-period "t": image
Which dropped period "t" is used?
Or is this quantity averaged across all t in 1:T0?

Update:
for each t in 1:T0, compute x_t = image
Then use this to compute the order-stat for x_1 to x_T0: image
Then is this a typo?:image

Bug when time_cohort = T

I get a subscript out of bounds error when I restrict to time_cohort = T:

ppool_syn <- multisynth(y ~ treat, zip, t, 
    fit.data, lambda=0.4, n_leads=13, n_lags=18, time_cohort=T)

Error in donors[[j]]: subscript out of bounds
Traceback:

1. multisynth(lops ~ ssd, zip, t, fit.data, lambda = 0.4, n_leads = 13, 
 .     n_lags = 18, time_cohort = T)
2. multisynth_formatted(wide = wide, relative = T, n_leads = n_leads, 
 .     n_lags = n_lags, nu = nu, lambda = lambda, V = V, force = force, 
 .     n_factors = n_factors, scm = scm, time_cohort = time_cohort, 
 .     time_w = F, lambda_t = 0, fit_resids = TRUE, eps_abs = eps_abs, 
 .     eps_rel = eps_rel, verbose = verbose, long_df = long_df, 
 .     how_match = how_match, ...)   # at line 90-98 of file <text>
3. multisynth_qp(X = bal_mat, trt = wide$trt, mask = wide$mask, 
 .     Z = wide$Z[, !colnames(wide$Z) %in% wide$match_covariates, 
 .         drop = F], n_leads = n_leads, n_lags = n_lags, relative = relative, 
 .     nu = 0, lambda = lambda, V = V, time_cohort = time_cohort, 
 .     donors = donors, eps_rel = eps_rel, eps_abs = eps_abs, verbose = verbose)   # at line 263-278 of file <text>
4. lapply(1:nrow(mask), function(j) X[[j]][donors[[j]], mask[j, 
 .     ] == 1, drop = F])   # at line 541-542 of file <text>
5. FUN(X[[i]], ...)

The error occurs in multisynth_qp, in the following block:

    ## handle X differently if it is a list
    if(typeof(X) == "list") {
        x_t <- lapply(1:J, function(j) colSums(X[[j]][which_t[[j]], mask[j,]==1, drop=F]))
        
        # Xc contains pre-treatment data for valid donor units
        Xc <- lapply(1:nrow(mask),
                 function(j) X[[j]][donors[[j]], mask[j,]==1, drop=F])
        
        # std dev of outcomes for first treatment time
        sdx <- sd(X[[1]][is.finite(trt)])
    } else {
        x_t <- lapply(1:J, function(j) colSums(X[which_t[[j]], mask[j,]==1, drop=F]))        
        
        # Xc contains pre-treatment data for valid donor units
        Xc <- lapply(1:nrow(mask),
                 function(j) X[donors[[j]], mask[j,]==1, drop=F])

        # std dev of outcomes
        sdx <- sd(X[is.finite(trt)])
    }

If I print the dimension of mask and donors, I get (21, 38) and (17,), respectively. So obviously, there will be no data for j=18,..,21 in donors, and this is why we get an indexing error. If I look more closely at mask, there are four rows with NAs (in just the last 6 columns). When I set time_cohort=F, the dimensions are (1580, 38) and (1580,), and we don't have this issue, however, the problem is too large in this case.

Negative weights

Hello everyone,

I'm replicating the code from the "the vignette for staggered adoption" section and I'm having a problem because I don't understand the weights. I extract the weights from the synthetic control with "ppool_syn2$weights", but some weights are negative. I add all the weights for each treated unit and the sum if gives a result of 1 for all the treated units. I compare these weights with figure B.9 of the annex to the paper "SYNTHETIC CONTROLS WITH STAGGERED ADOPTION" and I do not obtain the same results. In the paper it is indicated that the weights must be non-negative but I do get negative weights. Could someone please explain to me why this happens?

Adding titles and labels to plots

Is there an easy way to modify the title and axes labels of plots created by plot()?

library(magrittr)
library(dplyr)
library(augsynth)
data(kansas)

kansas %>% 
  select(year, qtr, year_qtr, state, treated, gdp, lngdpcapita) %>% 
  filter(state == "Kansas" & year_qtr >= 2012 & year_qtr < 2013) 

syn <- augsynth(lngdpcapita ~ treated, fips, year_qtr, kansas,
                progfunc = "None", scm = T)


plot(syn, inf = T, cv = F, title = "Effect Kansas of tax cut on GDP")

Error in conformal_inf(augsynth, ...) : 
  unused argument (title = "Effect Kansas of tax cut on GDP")

I see that the plot.summary function allows optional arguments. Should I explicitly create the summary object first and pass that to the plot.summary function?

Not taking the treatment variable

Hi everyone,

Mi name is Juan and I'm trying to do an application of the augsynth package in R (R-studio) for an application of the staggered synthetic control method, but I always receive as result that, although the program gives me some results, as you can see in the PDF file of the URL im leaving behind, I think the code is not taking my treatment variable well (which I call the same you called in your application: cbr), as you will see in the graphs at the end of the PDF. I'll leave you behind a repository that I created with the info required to see my problem.

Thank you very much for your help.

https://github.com/juancamilo1020/augsynth_application

`#install.packages("devtools")
#devtools::install_github("ebenmichael/augsynth")
#install.packages('ggrepel')
library(magrittr)
library(dplyr)
library(augsynth)
library(tidyverse)
library(haven)
library(ggrepel)
datos <- read_dta(file="/Users/juancamiloperdomo/OneDrive/CNC/ADS-SGR/Base/Stata/Base_des_mun.dta")

datos %>%
filter(#!State %in% c("DC", "WI"),
AÑO >= 2007, AÑO <= 2019) %>%
mutate(prim_año_ejec2 = ifelse(is.na(prim_año_ejec2),
Inf, prim_año_ejec2),
cbr = 1 * (AÑO >= prim_año_ejec2)) -> analysis_df2
ppool_syn <- multisynth(form = Muertos_eventos_emergencia ~ cbr, unit = codmpio, time = AÑO, data = analysis_df2)
print(ppool_syn$nu)
ppool_syn
ppool_syn_summ <- summary(ppool_syn)
ppool_syn_summ
plot(ppool_syn_summ)`

Issues with `replace_na()` in tidyr v1.2.0

Hi everyone!

We just noticed that a change in the newly released version of tidyr (v1.2.0) causes a problem with augsynth().

Specifically, in the new version of tidyr, the replace_na() function no longer allows the type of data to change (more info on this change can be found here). Therefore, replacing NA's of the integer column trt_time with Inf through the line: mutate(trt_time = replace_na(trt_time, Inf)) makes the code crash as replace_na() can't no longer convert from double to integer.

Best.
Arturo

"caught segfault"

I'm having issues with the R Session aborting when I try to run augsynth with my data. This is start of the console message that I see when running in base R:

One outcome and one treatment time found. Running single_augsynth.
Error in KKT matrix LDL factorization when computing the nonzero elements. The problem seems to be non-convexERROR in osqp_setup: KKT matrix factorization.
The problem seems to be non-convex.

 *** caught segfault ***
address 0x8, cause 'memory not mapped'

Traceback:
 1: osqpSolve(private$.work)
...

This has been tested on Mac OS 10.15.7 and on Windows 10. I can run the example augsynth function with the included Kansas data without issue. I have included a reprex that references our data here. Thank you.

library(dplyr)
library(augsynth)

# Get our data
url <- "https://github.com/williamlief/synth_vax/blob/main/data/daily_data_2021-07-04.rds?raw=true"
dat <- readRDS(url(url, method="libcurl"))

dat <- dat %>% 
  select(people_fully_vaccinated_per_hundred, 
         state, centered_time) %>% 
  mutate(centered_time = as.numeric(centered_time), 
         treat = as.numeric(state == "OH" & centered_time >= 0),
         state = as.numeric(factor(state)))

# R aborts here with segfault:
syn <- augsynth(people_fully_vaccinated_per_hundred ~ treat, 
         unit = state, time = centered_time, data = dat)

predict() command has two different modes

It appears as if predict( acsm, att=TRUE ) gives a different output than predict( acsm, att=FALSE ). Is that by design?

> p1 = predict( syn, att=TRUE )
> class( p1 )
[1] "numeric"
> p2 = predict( syn, att=FALSE )
> class( p2 )
[1] "matrix"

Examples of how to use `cov_agg`?

By default, multisynth() and others use the mean of the covariates in the pre-treatment period to construct the synthetic control. If I want to take the average of the covariates every 3 years (for example) in the pre-treatment period, I guess I should use the argument cov_agg but I didn't find any examples in the docs or in the vignettes and the description of this argument is not very detailed.

Can you give some examples of how to use cov_agg?

Thanks for this package

Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

When I was trying to fit a model with large units (e.g., over 100,000 units and around 30 time periods), the function failed adn reported:

Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

I think this might because of very large dense matric created during process and hopefully this method could be solved in the future.

Thank you!

Incomplete panel data causes segfault and R to abort

We've found that incomplete panel data causes a segfault and R to abort. This is related to #53 where missing values in the panel cause a segfault. Here we have determined that a panel data with missing rows (e.g. the entire row is absent) will also cause a segfault. Included here is a reproducible example of the behavior. I've also added an example of a simple function that could be added to the top of relevant augsynth and multisynth function calls so that an error is returned instead of crashing R.

library(augsynth)
library(dplyr)

kansas_2 <- kansas %>% 
  tidylog::filter(!(fips == 2 & year_qtr == 1995.25)) # drop an arbitrary row creating incomplete panel data


# # This will cause a segfault and R to abort
# syn <- augsynth(lngdpcapita ~ treated, fips, year_qtr, kansas_2,
#                 progfunc = "None", scm = T)

# Example function to check for completeness - could be expanded to check for missing 
# values or other issues. Catching and returning errors will make use much easier 
# than allowing the R session to abort.
check_data <- function(data, unit, time) {
  
  # Check whether there are omitted rows
  full_data <- data %>% tidyr::expand({{unit}}, {{time}})
  
  if(nrow(data) != nrow(full_data)) stop("There are missing rows in the input data set. Panel must be balanced.")
  
}

check_data(kansas, fips, year_qtr) # silent when no issue detected
check_data(kansas_2, fips, year_qtr)

Non-Convex Problem

Hi there
I have balanced panel data, contains 445 individuals and t = 10.
I am getting Non-convexity error with multisynth function:
"Error in KKT matrix LDL factorization when computing the nonzero elements. the problem seems to be non-convex."

I wondered if you have any idea what is going wrong.

Missing column name error

Hi,

I am trying to use multisynth as follows:
out <- multisynth(outcome ~ crb, unit = 'name_id',time = 'time_id', data = dat)

And am getting this error:

Error: Join columns must be present in data.
x Problem with `name_id`.

name_id is most definitely in the data frame. Subsequently, I also get

In max(trt_time) : no non-missing arguments, returning NA

My panel looks like this:
missings_new

Many thanks for your help!

Covariates and Sort Order produce distinct treatment estimates

I am getting distinct ATT estimates for synthetic control depending on sort order of unit names. I wanted to verify that this is still expected behavior. I thought this may have been an issue with cross-validation in ridge regression. I am still getting distinct results on vanilla synthetic control. See reprex below (Thanks @williamlief for the example).

library(augsynth)
library(rlang)

syn_wrap <- function(unit) {
  augsynth(lngdpcapita ~ treated | lngdpcapita + log(revstatecapita) +
             log(revlocalcapita) + log(avgwklywagecapita) +
             estabscapita + emplvlcapita,
           {{unit}}, year_qtr, kansas,
           progfunc = "none", scm = T,fixedeff=F)
  
} 

set.seed(123)

syn_fips <- syn_wrap(fips)
syn_abb <- syn_wrap(abb) 
syn_state <- syn_wrap(state)

# syn_fips and syn_state are equal, syn_abb is different
syn_fips # -0.032
syn_abb # -0.005 (SIC)
syn_state # -0.0032

# Visualize estimates
library(patchwork)
library(ggplot2)

p1 <- plot(syn_fips) + labs(title = "Unit = fips")
p2 <- plot(syn_abb) + labs(title = "Unit = abb")
p3 <- plot(syn_state) + labs(title = "Unit = state")

# Checking source data 
distinct_kansas <- kansas %>% select(fips, abb, state) %>% 
  distinct()

# Its possible state sort order matters? fips and state sort the same way
# but abb sorts differently. 
wrap_plots(p1, p2, p3, nrow = 3) 

Confirm Jackknife+ is working as intended?

I wanted to make sure jackknife+ is working as intended? have an example where I was able to generate effect estimates that were outside the associated 95% confidence interval for a given week, specifically weeks 7 and 8 below. I understand that jackknife+ can construct asymmetric confidence intervals but it seems bizarre to have an estimate that doesn't fit inside it's own confidence intervals. Is this the intended behavior? @williamlief

library(tidyverse)
library(augsynth)
library(patchwork)
dat<-read_rds(url("https://github.com/davidnathanlang/augsynth_issue/raw/main/simple_dat.rds"))
asynth <- augsynth::augsynth(
  outcome ~ treat,
  unit = state,
  time = centered_week,
  data = dat  ,
  progfunc = "Ridge", fixedeff = TRUE)
#> One outcome and one treatment time found. Running single_augsynth.


conformal<-plot(asynth,inf_type="conformal") +labs(title="Conformal Inference")
jackknife_plus<-plot(asynth,inf_type="jackknife+") +labs(title="Jackknife+ Inference")

conformal+jackknife_plus # Huge difference in Standard Errors

(sum_model<-summary(asynth,inf_type = "jackknife+"))
#> 
#> Call:
#> single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time), 
#>     t_int = t_int, data = data, progfunc = "Ridge", fixedeff = TRUE)
#> 
#> Average ATT Estimate:  7.527 
#> L2 Imbalance: 0.000
#> Percent improvement from uniform weights: 100%
#> 
#> Avg Estimated Bias: -10.045
#> 
#> Inference type: Jackknife+ over time periods
#> 
#>  Time Estimate 95% CI Lower Bound 95% CI Upper Bound
#>     0    2.003             -2.947              2.284
#>     1    3.776             -4.061              3.927
#>     2    5.736             -4.147              5.897
#>     3    6.424             -4.500              6.491
#>     4    7.851             -4.502              7.932
#>     5    9.160             -4.505              9.188
#>     6    8.977             -4.769              9.069
#>     7   10.772             -4.825             10.594
#>     8    9.316             -5.306              9.153
#>     9   10.291             -5.067             10.387
#>    10    8.962             -5.535              9.165
#>    11    8.838             -5.691              9.072
#>    12    8.566             -5.828              8.792
#>    13    8.106             -5.959              8.352
#>    14    7.245             -6.327              7.612
#>    15    6.946             -6.513              7.398
#>    16    6.478             -6.890              6.982
#>    17    6.040             -7.167              6.614
sum_model$att %>% as_tibble() %>% filter(Estimate>upper_bound) # Weeks 7 and 8 Have point estimates that are outside of their confidence interval
#> # A tibble: 2 x 4
#>    Time Estimate lower_bound upper_bound
#>   <dbl>    <dbl>       <dbl>       <dbl>
#> 1     7    10.8        -4.82       10.6 
#> 2     8     9.32       -5.31        9.15

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] patchwork_1.1.1 augsynth_0.2.0  forcats_0.5.1   stringr_1.4.0  
#>  [5] dplyr_1.0.7     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3    
#>  [9] tibble_3.1.2    ggplot2_3.3.5   tidyverse_1.3.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.7        lubridate_1.7.10  lattice_0.20-44   assertthat_0.2.1 
#>  [5] digest_0.6.27     utf8_1.2.1        R6_2.5.0          cellranger_1.1.0 
#>  [9] backports_1.2.1   reprex_2.0.0      evaluate_0.14     httr_1.4.2       
#> [13] highr_0.9         pillar_1.6.2      rlang_0.4.11      readxl_1.3.1     
#> [17] rstudioapi_0.13   Matrix_1.3-3      rmarkdown_2.9     styler_1.5.1     
#> [21] labeling_0.4.2    osqp_0.6.0.3      munsell_0.5.0     broom_0.7.9      
#> [25] compiler_4.1.0    modelr_0.1.8      xfun_0.24         pkgconfig_2.0.3  
#> [29] htmltools_0.5.1.1 tidyselect_1.1.1  fansi_0.5.0       crayon_1.4.1     
#> [33] dbplyr_2.1.1      withr_2.4.2       grid_4.1.0        jsonlite_1.7.2   
#> [37] gtable_0.3.0      lifecycle_1.0.0   DBI_1.1.1         magrittr_2.0.1   
#> [41] scales_1.1.1      cli_3.0.1         stringi_1.6.2     farver_2.1.0     
#> [45] fs_1.5.0          xml2_1.3.2        ellipsis_0.3.2    generics_0.1.0   
#> [49] vctrs_0.3.8       Formula_1.2-4     tools_4.1.0       glue_1.4.2       
#> [53] hms_1.1.0         yaml_2.2.1        colorspace_2.0-2  rvest_1.0.0      
#> [57] LowRankQP_1.0.4   knitr_1.33        haven_2.4.1

Created on 2021-09-17 by the reprex package (v2.0.0)

Error in augsynth command

Hello, I am having an issue to run the augsynth package in my data. Is there anyway to fix this? Thank you very much. The error is as follows:

library(augsynth)
options(error=utils::recover)
syn <- augsynth(Y ~ Treatment | cov1 + cov2, region, year, 1996, data, progfunc="None", weightfunc="SCM")
Error in X[trt == 0, , drop = F] : (subscript) logical subscript too long

Enter a frame number, or 0 to exit

1: augsynth(propM ~ Codelaw | inc + UR, STATEID, YEAR, 1996, asynthcali, progfunc = "None"
2: do.call(format_synth, wide)
3: (function (X, trt, y)
{
synth_data <- list()
synth_data$Z0 <- t(X[trt == 0, ,
4: t(X[trt == 0, , drop = F])

Selection: 4
Called from: t(X[trt == 0, , drop = F])
Browse[1]>

Submit augsynth to CRAN

Hi! Just wanted to ask if there's a specific reason why augsynth is not yet submitted to CRAN. Do you need a hand?
Just ran a devtools::check() on the project and found 1 error, 2 warnings, and 2 notes that when fixed would be practically good to go. Congrats on this great resource.

Error: Each row of output must be identified by a unique combination of keys.

Hi Eli, I'm seeing the below error and don't know if there's a workaround.
Earlier while running a different equation I was seeing "Keys are shared for 192 rows".
I've checked my df, and as you can see the following sample rows are different.
I even tried to delete rows, but still kept getting the error. Grateful for your assistance.

image
image

ppool_syn <- multisynth(spend_all ~ treat_post, statefips, time,

  •                     nu = 0.5, data, n_leads = 10)
    

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 1098 rows:

  • 1099, 1648
  • 1282, 1831
  • 1465, 2014
  • 1100, 1649
  • 1283, 1832
  • 1466, 2015
  • 1101, 1650
  • 1284, 1833
  • 1467, 2016
  • 1102, 1651
  • 1285, 1834
  • 1468, 2017
  • 1103, 1652
  • 1286, 1835
  • 1469, 2018
  • 1104, 1653
  • 1287, 1836
  • 1470, 2019
  • 1105, 1654
  • 1288, 1837
  • 1471, 2020
  • 1106, 1655
  • 1289, 1838
  • 1472, 2021
  • 1107, 1656
  • 1290, 1839
  • 1473, 2022
  • 1108, 1657
  • 1291, 1840
  • 1474, 2023
  • 1109, 1658
  • 1292, 1841
  • 1475, 2024
  • 1110, 1659
  • 1293, 1842
  • 1476, 2025
  • 1111, 1660
  • 1294, 1843
  • 1477, 2026
  • 1112, 1661
  • 1295, 1844
  • 1478, 2027
  • 1113, 1662
  • 1296, 1845
  • 1479, 2028
  • 1114, 1663
  • 1297, 1846
  • 1480, 2029
  • 1115, 1664
  • 1298, 1847
  • 1481, 2030
  • 1116, 1665
  • 1299, 1848
  • 1482, 2031
  • 1117, 1666
  • 1300, 1849
  • 1483, 2032
  • 1118, 1667
  • 1301, 1850
  • 1484, 2033
  • 1119, 1668
  • 1302, 1851
  • 1485, 2034
  • 1120, 1669
  • 1303, 1852
  • 1486, 2035
  • 1121, 1670
  • 1304, 1853
  • 1487

Issue with matching prior to weighting

Just trying out this package, so wanted to run a simple example where I match on total population prior to weighting. I run the following command:

spool_syn2 <- multisynth(outcome~ legal | educ_median + familyincome + nwshare + percap_MD | totpop, county_id, year, AHA, 
                        nu = 0, n_leads = 4, n_lags = 15, time_cohort = TRUE)

However, I get the following error:

Error in get_knn_donors(trt, Z, donors, k) : 
  Number of neighbors for knn not selected, please choose k.

Is my syntax incorrect? Or is it likely that my data is structured incorrectly?

Optional arguments are a mess

augsynth takes two sets of optional arguments:

  • opts_out: list of optional arguments for the outcome model
  • opts_weights: list of optional arguments for the weighting

opts_weights is never used, so let's get rid of it and replace opts_out with ... optional arguments that turtle all the way down to the actual outcome model (or fit_ridgeaug_formatted).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.