
Comments (16)

nahiggins commented on September 2, 2024

...just an additional thought here:

we could throw in a little check at each stage of the algorithm that looks at
how big our total sample is, so we know when to stop.

could do something like:

(1) generate a "round" variable each time a treatment-control pair is
identified (so the first treatment-control pairs in AK, AL, etc. would all
have "round" = 1)

(2) continue the algorithm for some number of rounds, generating a map each
time

(3) calculate the total number of expected letters to be sent out after
each round (call it EN)

(4) stop the algorithm when EN > 150,000

That way we don't end up playing around too much at the end. And if we
wanted to look at alternative samples, we would have an easy set of
alternatives, where the first alternative is the total sample when EN first
becomes > 150,000, the second alternative is the total sample minus the
last round, etc.
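
Concretely, that loop might look something like this in R (a rough sketch only; the toy pairsByRound list below is a stand-in for whatever ranked set of pairs the real algorithm produces):

pairsByRound <- list(  ## toy stand-in: element r holds the treatment-control pairs identified in round r
  data.frame(pair = c("AK-1", "AL-1"), n1 = c(23, 900), n2 = c(23, 1000)),
  data.frame(pair = c("AK-2", "AL-2"), n1 = c(6, 300), n2 = c(6, 500))
)

EN <- 0              ## running total of expected letters
keptRounds <- list()
for (r in seq_along(pairsByRound)) {
  thisRound <- pairsByRound[[r]]
  ## One county per pair gets treated, each equally likely, so a pair's expected
  ## contribution to the mailing is the average of its two county sizes.
  EN <- EN + sum((thisRound$n1 + thisRound$n2) / 2)
  keptRounds[[r]] <- thisRound
  if (EN > 150000) break   ## stop once EN exceeds 150,000
}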

Does that make sense?

N

On Mon, Jan 4, 2016 at 2:10 PM, Jake Bowers [email protected] wrote:

This is the plan to improve our design.

Avoid overfull states

Try 1/4 of the counties in each state in the treatment group to avoid
overly filling states.

Avoid cross-pair adjacency of treated to control

Choose the best pair as the one closest in size and largest in number of
farmers. Then choose the next pair that is not adjacent but is closest in
size and largest in farmers, and so on.

  • Pair choice to control cross-pair adjacency will be iterative and
    greedy.
    • Pair choice: 1000 versus 900 is better than 8 versus 7, so we need
      some way to favor pairs of larger counties; a proportional difference
      measure would be better.
    • Pair choice: avoid treated-to-control adjacency across state
      lines and within states.



jwbowers commented on September 2, 2024

I'll keep working on this tomorrow. I'm working to include this idea about keeping track of n. Right now, the algorithm is state specific under the idea that we do not want to have a particularly small number of counties in any one state. The total n process would probably require a more global approach: we can still pair within state, but we would do the sorting and choosing of pairs at a national level.
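
For concreteness, here is a minimal sketch of what that more global step could look like (illustrative only, not the actual code: candidatePairs, its columns, and isAdjacent are made-up stand-ins). The idea is to pool the within-state candidate pairs, sort them nationally by score, and greedily keep non-adjacent pairs until the expected n is reached.

chooseGlobally <- function(candidatePairs, isAdjacent, targetN = 150000) {
  ## candidatePairs: one row per within-state candidate pair, with columns
  ## fips1, fips2, n1, n2, and score (lower = better).
  ## isAdjacent(a, b): logical vector saying whether county a touches each county in b.
  candidatePairs <- candidatePairs[order(candidatePairs$score), ]  ## best first, across all states
  kept <- candidatePairs[0, ]
  EN <- 0
  for (i in seq_len(nrow(candidatePairs))) {
    cand <- candidatePairs[i, ]
    keptFips <- c(kept$fips1, kept$fips2)
    ## Skip candidates that touch any county already in the design.
    if (length(keptFips) == 0 ||
        !(any(isAdjacent(cand$fips1, keptFips)) || any(isAdjacent(cand$fips2, keptFips)))) {
      kept <- rbind(kept, cand)
      EN <- EN + (cand$n1 + cand$n2) / 2   ## expected letters with one treated county per pair
      if (EN > targetN) break
    }
  }
  kept
}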


nahiggins commented on September 2, 2024

thanks Jake! What time do you think you'll be working? Just want to know
if we can mesh up on this.


jakesbst commented on September 2, 2024

I'm working now. The recursive algorithm is a bit tricky.



nahiggins commented on September 2, 2024

ok. lmk how/if I can help. available all day to consult on this.


jwbowers commented on September 2, 2024

One thing to work on: decide how best to rank pairs. Right now I'm ranking on sizeDiffs/avgN and breaking ties on avgN; before, I was ranking on sizeDiffs and then breaking ties on avgN. Here is the code from the chooseBestPairs function:

## Old version: sort by absolute difference in size, then by avgN (larger pairs first)
## pairsInOrder <- cbind(sizeDiffs, avgN)[order(sizeDiffs, -1 * avgN, decreasing = FALSE), ]
## Current version: sort by proportional difference in size (sizeDiffs/avgN), then by avgN (larger pairs first)
pairsInOrder <- cbind(sizeDiffs, avgN, sizeDiffs / avgN)[order(sizeDiffs / avgN, -1 * avgN, decreasing = FALSE), ]

Here are the pairs for AK.

            sizeDiffs  avgN  sizeDiffs/avgN
02100-02180         0  23.0       0.0000000
02060-02188         0   6.0       0.0000000
02164-02270         0   4.0       0.0000000
02013-02275         1   8.5       0.1176471
02050-02280         2  16.0       0.1250000
02068-02198         4  27.0       0.1481481
02110-02150        21  96.5       0.2176166
02130-02220        11  49.5       0.2222222
02016-02185         2   6.0       0.3333333
02090-02122       256 550.0       0.4654545
02070-02105         9  16.5       0.5454545

I'm attaching a CSV file with all of the pairs (named .txt to force GitHub to upload it) so you can play around if you have time, or just play with the AK data above.

pairchars.txt: https://github.com/sbstusa/spilloverdesign/files/78482/pairchars.txt
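
As a self-contained illustration, the AK ordering above can be reproduced directly from those numbers (the pair labels and column names just follow the snippet and table):

akPairs <- data.frame(
  pair      = c("02100-02180", "02060-02188", "02164-02270", "02013-02275",
                "02050-02280", "02068-02198", "02110-02150", "02130-02220",
                "02016-02185", "02090-02122", "02070-02105"),
  sizeDiffs = c(0, 0, 0, 1, 2, 4, 21, 11, 2, 256, 9),
  avgN      = c(23, 6, 4, 8.5, 16, 27, 96.5, 49.5, 6, 550, 16.5)
)
akPairs$ratio <- akPairs$sizeDiffs / akPairs$avgN
## Rank on the proportional difference, breaking ties in favor of larger pairs.
akPairs[order(akPairs$ratio, -akPairs$avgN), ]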


nahiggins commented on September 2, 2024

Two ideas for you to react to:

Idea 1:

Penalty function:

penalty <- function(x1, x2) {
  ## Higher scores are better: reward pairs of large counties via log(x1) + log(x2),
  ## and penalize the size gap via the log of the absolute difference.
  if ((x1 - x2) != 0) {
    log(x1) + log(x2) - log(abs(x1 - x2))
  } else {
    log(x1) + log(x2)
  }
}

Using this function, higher scores are better. As you can see, it favors large counties. Under this penalty function, a (500,1000) pairing scores slightly better (6.91) than a (300,500) pairing (6.62), but a (400,500) pairing scores better than both of those (7.60). A (25,25) pairing would score a 6.44, i.e. worse than all of these, simply because the counties are so small. Given that it's hard to imagine detecting spillovers caused by a mailing to 25 people, this seems reasonable. Well-matched counties start to get competitive, so to speak, pretty quickly, however: a (50,50) pairing scores 7.82, i.e. better than a (400,500) pairing, but a (400,500) pairing outscores a (45,50).

So we could fiddle with the scale, but this seems to have the properties we are looking for.
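
For reference, plugging the numbers quoted above into the function:

penalty(500, 1000)   ## 6.91
penalty(300, 500)    ## 6.62
penalty(400, 500)    ## 7.60
penalty(25, 25)      ## 6.44
penalty(50, 50)      ## 7.82
penalty(45, 50)      ## 6.11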

Idea 2:

Quantiles:

Create size quantiles (we could use deciles to keep each quantile small). Make pairs randomly from within quantiles. Sample first from the larger quantiles, then work down.
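
Here is a rough sketch of that idea, assuming a counties data frame with a FIPS code and a size column n (all names here are illustrative, not code from the repo):

pairWithinDeciles <- function(counties) {
  ## Bin counties into size deciles (unique() guards against duplicated breakpoints).
  brks <- unique(quantile(counties$n, probs = seq(0, 1, by = 0.1)))
  counties$decile <- cut(counties$n, breaks = brks, include.lowest = TRUE, labels = FALSE)
  pairs <- list()
  for (d in sort(unique(counties$decile), decreasing = TRUE)) {   ## largest counties first
    inDecile <- counties[counties$decile == d, ]
    inDecile <- inDecile[sample(nrow(inDecile)), ]                ## random order within the decile
    nPairs <- floor(nrow(inDecile) / 2)
    if (nPairs > 0) {
      pairs[[length(pairs) + 1]] <- data.frame(
        fips1 = inDecile$fips[seq(1, 2 * nPairs, by = 2)],
        fips2 = inDecile$fips[seq(2, 2 * nPairs, by = 2)],
        decile = d
      )
    }
  }
  do.call(rbind, pairs)
}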

What do you think about these ideas?

N


jwbowers commented on September 2, 2024

I like the first idea. I'll implement it and we can see how the map looks.

As of now you can see the new assignment mechanism at https://sbstusa.github.io/spilloverdesign/saturationDesign.html

I think it looks pretty good. I ended up restricting any adjacency between pairs, rather than just treated-to-control adjacency, because I'd like to keep pair choice a function of fixed characteristics of counties rather than add any new randomness into the design phase; such randomness would make standard errors and tests more difficult later.

The only thing I haven't done yet is to restrict the cross border treated-to-control adjacency.


nahiggins commented on September 2, 2024

Way cool. Let me know if I can help any other way. Otherwise I'll just be
ready to take the output and run w/ it as soon as you're done.

(What about the idea of using "rounds" so that we can easily add/eliminate
observations? Just wondering if that was easy / if you thought it made
sense)


jwbowers commented on September 2, 2024

I'm not sure that the rounds idea works with the code as written. I'm closing in on a solution to the cross-state problem now.


nahiggins commented on September 2, 2024

ok!


jwbowers commented on September 2, 2024

What do you think? Should we close this issue? It seems like the pairs at least have no more adjacency across or within states. Some of the pairs may differ a lot in size (in absolute terms), but we chose the ones that perform best on the penalty() function until we ran out of candidates (because of adjacency problems, or because we hit the budget of no more than 1/4 of counties assigned to treatment).

You can see the final map here: https://sbstusa.github.io/spilloverdesign/saturationDesign.html#final-map

With 50% assigned to treatment, we get about 137,363 farmers. If we assign at about .55, then we get to 150,000.


nahiggins commented on September 2, 2024

Wow! That certainly looks quite good!

I can't see a better way, can you? I think we've done it! (Well, you've done it, anyway!)


jwbowers commented on September 2, 2024

I think this was a joint effort even if I did more typing. The experimentDat.csv file is in the main GitHub repository, and I also just updated it on Google Drive (it is not a sheet yet, so it's not easy to read straight from R, but a double-click would fix that, depending on your workflow).

In general I don't put data files and binary files (pdf, png, jpg, doc, xls) into GitHub, because GitHub has some size limits and it basically stores copies of those files rather than just maintaining differences. This time, however, our files are small and CSV is a text format, so I'm comfortable having them on GitHub.

Ok. I'll close this issue now.


nahiggins commented on September 2, 2024

Thanks!

I'll just go ahead and download the new experimentDat.csv file from the
"master" branch!

Quick question, in case you know of a fast way to do this: how do I limit a character variable to a certain number of characters? It seems like it should be really straightforward, but I haven't gotten it to work yet. I need to cut off some of the longer names to fit a 30-character limit.

Regular expressions were my first thought on how to do this. Ideas?

N


nahiggins commented on September 2, 2024

doh! substr.
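
For the record, base R handles this directly:

countyName <- "Some Very Long County Name That Needs To Be Truncated"
substr(countyName, 1, 30)    ## keep only the first 30 characters
## strtrim(countyName, 30) gives the same result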

