
alderman_machine's Introduction

  • I'm @JohnCRuf, a pre-doctoral researcher under Jonathan Dingel at UChicago Booth and an ex-engineer.
  • 👀 I'm interested in economic research, particularly Political Economy, Urban Economics, and Innovation and Productivity from an empirical IO perspective (although I do enjoy a good causal inference paper).
  • 📫 How to reach me: Email me at [email protected] or [email protected]

alderman_machine's People

Contributors

jdingel, johncruf, tmalthouse


alderman_machine's Issues

Address 11/15 RP Seminar comments

Intro:

  • The audience did not like the 'misallocation' wording because I do not present an optimal spending benchmark to compare against.
  • Alternatives include 'political favoritism' and 'politically influenced welfare weighting'. The point is fair in my opinion, but 'politically influenced welfare weighting' is a mouthful.

Data Description:

  • Audience wants to see spending per capita figures in the data
  • People want to see maps and distributions of spending categories as well
  • Junbiao in particular had some nice comments here; he framed his thinking in terms of 'rates of return' on projects. I have no real objection to this framing, but it may be worth discussing if I dig into the spending categories more.
  • Audience: It would be good to consider state-level investment (I don't have the time to get that data, so I'll set this aside).
  • Audience: It may be good to go into the somewhat gory details of data processing for applications; it shows drive and problem-solving skills.
  • Lots of curiosity about potential border design. One audience member mentioned Bordeu's JMP!

Bernie Stone Case Study:

The audience went wild for this, with audible gasps at the reveal.

  • My first comment is to put this case study on the front page of the paper, as it's "catchy"
  • Basically, everyone thought this was an extremely impressive part of the paper and that it made the case that this was a worthwhile and interesting project, regardless of the results

Results

  • Need to be clearer about the sample
  • One recommendation for a TWFE approach where I weight the treatment by the amount of support; this would increase my sample size fivefold because I could use every precinct
  • One recommendation for using a synthetic control design
  • One recommendation for DiD placebo tests
  • Multiple questions on the best way to interpret the treatment effect.
  • A lot of questions and recommendations afterward; overall, people were very supportive and fairly impressed. I wasn't expecting people to be impressed - I thought they would be bored by how local the setting is.

Why not a joint test of significance?

Had an idea while interpreting an issue for work.

Why can't I do a joint test between the top and bottom models? In how many specifications does the joint test pass?
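One way this could look, as a minimal sketch: stack the top and bottom samples, estimate both treatment effects in a single regression, and run a joint Wald test. The data frame `df` and its columns (spending, treated, group) are placeholders, not names from the repo.

```r
# Minimal sketch of a joint test across the top/bottom models, assuming a
# stacked data frame `df` with placeholder columns: spending (outcome),
# treated (0/1), group ("top"/"bottom"). Fitting both slopes in one
# regression makes the joint hypothesis a standard Wald test.
library(car)

m <- lm(spending ~ group + treated:group, data = df)

# H0: the treatment effect is zero in both groups simultaneously.
# Coefficient names follow the factor levels in the data.
linearHypothesis(m, c("treated:groupbottom = 0", "treated:grouptop = 0"))
```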

Gather 311 Complaint Data

Gather complaint data and match it to political data using voting-precinct maps. Make sure to include duplicates.
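A rough sketch of the spatial match with sf; file paths and coordinate column names are placeholders.

```r
# Sketch: point-in-polygon join of 311 complaints onto voting precincts.
library(sf)
library(dplyr)

complaints <- read.csv("311_complaints.csv") |>        # placeholder path
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

precincts <- st_read("voting_precincts.shp") |>        # placeholder path
  st_transform(4326)

# st_join keeps one row per complaint, so duplicate complaints survive the
# merge as required
matched <- st_join(complaints, precincts, join = st_within)
```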

Merge Thomas Malthouse's work into this repo

This should be done via a pull request so I can review the work. It needs to follow the task-based format of this repo.

Key questions

  1. Why is he getting a significant sorting effect in elections? It could be a composition-based effect from merging general and runoff elections. If the composition selectively removes general elections when a runoff exists, you have a mechanism for inducing a significant discontinuity.
  2. Is it possible to merge contracting data with the menu data? If so, how?

Data Description Improvements

I want a few new figures:

  • Maps of spending categories: what areas get the most spending on streets, parks, sidewalks and alleys, policing, and misc? (A plotting sketch follows this list.)
  • Take the 50 wards for each cycle from 2003-2022. That is 5 cycles total: (04-07, 08-11, 12-15, 16-19, 19-22). For each of the 50 wards in these five periods, look at the distribution of the fraction of the total budget spent on the top X incumbent-supporting precincts in the previous general election. This gives 250 numbers. What does this look like? If it is bimodal, we have a strong case for bringing in Dixit-Londregan results.
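As referenced above, a sketch of the category maps. The shapefile path and the long `ward_spend` table (ward, category, amount) are placeholders.

```r
# Sketch: choropleth of ward spending, faceted by spending category.
library(sf)
library(dplyr)
library(ggplot2)

wards <- st_read("ward_boundaries.shp")   # placeholder path

wards |>
  left_join(ward_spend, by = "ward") |>   # ward_spend: ward, category, amount
  ggplot() +
  geom_sf(aes(fill = amount)) +
  facet_wrap(~ category) +
  scale_fill_viridis_c(name = "Spending ($)")
```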

Data processing for 2021 and 2022 PDFs do not work

The multi-line conjoin step in the 2016-2022 menu processing script does not align with the formatting of the 2021 and 2022 files. A separate script is needed for 2021 and 2022 that can handle their multi-line row format.
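One plausible shape for that script, assuming the PDF text has already been extracted to lines and that every true record starts with a detectable token (the dollar-amount pattern below is a placeholder; the real pattern has to come from the 2021/2022 layout):

```r
# Hypothetical sketch: treat any line that does not start a new record as a
# continuation of the previous line, then paste the pieces back together.
lines <- readLines("menu_2021_raw.txt")                   # placeholder path

starts_record <- grepl("^\\$[0-9,]+\\.[0-9]{2}", lines)   # placeholder pattern
row_id <- cumsum(starts_record)                           # 0 = preamble lines

rows <- tapply(lines, row_id, paste, collapse = " ")
```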

Custom DiD Estimator?

I need to consider using a custom-built estimator for this "close elections DiD" thing.

Under the close election assumption, I have treatment randomization over wards. However, because the "treatment" is having a "tied" alderman replaced, the dosage correlates with net votes. Furthermore, under the close election assumption, precinct-level net votes don't significantly change between treated and untreated.

Thus, I effectively see the "counterfactual dosage" each precinct would have gotten if its alderman had been reelected. I'm effectively matching on this "dosage" and then running a DiD, but this gives lower power than if I could use all 50 precincts available in each ward.

Traditional DiD estimators (e.g., https://bcallaway11.github.io/posts/five-minute-did-continuous-treatment) assume that you can't see this "counterfactual dosage," but in this case, we can.
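Under that logic, one minimal sketch is to put the observed dosage directly into the regression: interact the post-period treated-ward indicator with each precinct's dosage, so every precinct enters. `panel` and all column names below are placeholders, and this is not the Callaway-style continuous-treatment estimator; it leans entirely on the close-election claim that dosage is as good as observed for untreated wards.

```r
# Sketch: dosage-interacted DiD using every precinct. Placeholder columns:
# spending, post (0/1), treated_ward (0/1), dosage (net votes for the ousted
# incumbent), precinct, year, ward.
library(fixest)

est <- feols(
  spending ~ post:treated_ward:dosage | precinct + year,
  data = panel, cluster = ~ ward
)
summary(est)
```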

Racial Redistricting Study

Wards tend to be majority White, Hispanic, or Black, and racial balancing of wards seems to be expected. In the 2015 redistricting, identify majority-race census tracts that get redistricted (mostly) into wards where they are a minority (preferably controlled by an alderman who does not share their race). Do their menu funds decrease? Do their city services decrease?

First, count the number of such occurrences in the 2015 redistricting, by racial category: White -> Hispanic, Black -> White, etc.

Then, run a diff-in-diff study to see what happens.

E.g., a Black census tract gets redistricted to a Hispanic-controlled ward; what happens before/after relative to adjacent tracts that remained in the original ward?
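The counting step is simple once each tract carries its own majority race and its new ward's majority race. `tracts` and its columns are placeholders.

```r
# Sketch: count 2015 redistricting moves into a different-majority ward,
# broken out by racial category.
library(dplyr)

tracts |>
  filter(redistricted_2015,
         tract_majority_race != new_ward_majority_race) |>
  count(tract_majority_race, new_ward_majority_race, name = "n_tracts")
```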

Verify that spending totals align. If they don't align, why?

The PDFs contain accurate expenditure totals, but some of the totals I obtain by summing the line items are much lower. For example, in the collected dataset, the 49th ward's total expenditures are $365,839.80; in the 2005 PDF, the total is $1,320,000.00. What gives?
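A reconciliation table would make the gaps easy to scan. This assumes the PDF totals get transcribed into a `pdf_totals` table; all names are placeholders.

```r
# Sketch: compare summed line items against the PDF totals by ward and year.
library(dplyr)

items |>
  group_by(ward, year) |>
  summarise(summed = sum(amount), .groups = "drop") |>
  left_join(pdf_totals, by = c("ward", "year")) |>
  mutate(gap = pdf_total - summed) |>
  arrange(desc(abs(gap)))
```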

DiD Spending Model: Close Elections

There are 9 close runoffs in 2019 and another 9 in 2015, defining "close" as a <10% vote-share gap. I need to run a diff-in-diff on this, and probably an RDD to boot. The goal is to determine whether the precincts that supported an incumbent in a given election experience a drop in spending when that incumbent is booted out of office.
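A minimal version of the DiD, sketched with fixest under placeholder names: `close_panel` restricted to the close-runoff wards, `supported_incumbent` flagging precincts that backed the ousted incumbent, and `post` flagging years after the runoff.

```r
# Sketch: two-way fixed effects DiD on the close-runoff sample.
library(fixest)

did <- feols(
  spending ~ supported_incumbent:post | precinct + year,
  data = close_panel, cluster = ~ ward
)
summary(did)
```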

Writing improvements

  • Be clearer about how I construct samples
  • Be clearer about "fraction of observed spending" concept
  • Remove references to misallocation; talk about political favoritism instead

Replace current theoretical model with something with a little more panache

The current model is very basic. It would be advisable to create a more interesting model that better reflects the situation at hand.

E.g., citizens have some probability of having a concern, and utility depends on how many concerns are addressed and how much rent the politician extracts. The politician can spend time addressing concerns (to get reelected) or extracting rents. Political experience lowers the DWL of rent extraction and increases productivity in addressing concerns. Over time, the politician gets so good at extracting rents that they address only a minimal number of concerns.

Because a new politician's initial strategy is to extract maximal rents at maximum inefficiency, people choose the devil they know instead of the devil they don't.

R Merge Asserts

There's a dangerous amount of merging going on in the repo without appropriate asserts. Need to figure that out.
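One pattern that would work here, sketched with placeholder table names (`spend`, `lookup`): assert uniqueness on the many-to-one side before the join and assert the row count after it. Newer dplyr (>= 1.1) can also enforce this directly via `relationship = "many-to-one"`.

```r
library(dplyr)

stopifnot(!anyDuplicated(lookup$precinct_id))  # right side must be unique

n_before <- nrow(spend)
merged <- left_join(spend, lookup, by = "precinct_id")
stopifnot(nrow(merged) == n_before)            # the join must not fan out
```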

Gather spatial data using Census and Google APIs

Develop an R or Python script that takes the location information from the menu data, processes it so that the census and/or Google location APIs can handle it, and obtains geographic coordinates. Then match this data with Chicago Political Map data, especially voting precinct data.
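For the Census side, a minimal sketch hitting the public US Census geocoding endpoint; batching, rate limiting, and error handling are omitted, and the example address is only illustrative.

```r
# Sketch: query the free Census geocoder for one cleaned address and return
# (lon, lat), or NAs if there is no match.
library(httr)
library(jsonlite)

geocode_census <- function(address) {
  resp <- GET(
    "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress",
    query = list(address   = address,
                 benchmark = "Public_AR_Current",
                 format    = "json")
  )
  matches <- fromJSON(content(resp, as = "text"))$result$addressMatches
  if (length(matches) == 0) return(c(lon = NA, lat = NA))
  c(lon = matches$coordinates$x[1], lat = matches$coordinates$y[1])
}

geocode_census("6430 N California Ave, Chicago, IL")  # illustrative address
```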

Fix missing locations in 2016-2022 data

Many off-menu expenditures from 2016-2022 seem to have no location attached to them (because the location is not present in the underlying data). I need to edit the menu_money_data_cleaner.R script to fix them.

DiD Spending Model: Retirement Due to Corruption

So, the close elections DiD estimator found robust null results.
This could be because close elections necessarily drive aldermen toward median-voter-style outcomes.
In a typical city, we'd be SOL, with no way to boot out entrenched incumbents exogenously.
Luckily, this is all happening in Chicago, so we can use investigation-forced retirements as potential treatments.

The idea is to take the population of "entrenched" incumbents (i.e., those who are unchallenged or win by extremely large margins) and compare them to the sample of aldermen forced into retirement by corruption allegations/investigations.

Quantifying geolocation error

I've observed roughly 24 "obvious errors" from my work on matching df_with_2_ands.csv to the 2003-2011 precinct map; an error rate of 25/1500 is roughly 1.7%. Not bad, but not great. Obvious errors are projects that span > 2 wards. Only 3 project ids in the dataset are from 2011 on.

We need a task that quantifies these errors across years and across wards. We need to show that this error rate is actually random.
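A sketch of what that task could compute, with `matched` and all column names as placeholders: flag project ids matched to more than 2 wards, tabulate the rate by year, and run a crude chi-square test of whether errors are spread randomly across wards.

```r
library(dplyr)

flags <- matched |>
  group_by(project_id, year) |>
  summarise(n_wards = n_distinct(ward), .groups = "drop") |>
  mutate(obvious_error = n_wards > 2)

# error rate by year
flags |> group_by(year) |> summarise(error_rate = mean(obvious_error))

# crude randomness check: is the error flag independent of the ward?
by_ward <- matched |>
  distinct(project_id, ward) |>
  left_join(flags, by = "project_id")
chisq.test(table(by_ward$ward, by_ward$obvious_error))
```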

Bernie Stone Geocoding Improvements

[Screenshot 20230812_151654: geomatched menu spending by year]

As seen here, we do a very good job of geomatching every year except 2005, where a $700k entry labeled "parks" is the culprit.

[Screenshot 20230812_153013: 2009 and 2010 geocoding gaps]

However, there is no reason we can't bump up the 2009 and 2010 numbers by manually fixing whatever small typos are causing the geocoding script to miss ~$500k.

Re-run BLP and discrete choice models using voting precinct data

Remove the old, drastically underpowered BLP models and replace them with modestly underpowered BLP models that exploit the voting precinct data.

There are 46 runoffs in the dataset. At 40 precincts per ward-year, that means a BLP model with ~1,800 observations rather than ~200: no longer drastically underpowered, now merely modestly underpowered.

Bernie Stone Case Study

“Well, I grew up in the 50th Ward and you know, God bless [the late former Ald.] Bernie Stone, may he rest in peace, but I remember crossing California going west, every street was resurfaced almost every year,” Ramirez-Rosa says. “They always had brand new lighting and then east of California, where he would lose the precincts consistently, I mean the streets were in shambles. Many people felt he was spending the bulk of the menu money west of California, where he was getting the bulk of the vote.” - Alderman Carlos Ramirez-Rosa

Let's test this explicitly. With the new data, I have Bernie Stone's menu allocations for his last six years in office. Divide the 50th ward into a grid: does funding drop discontinuously across California? (A sketch of the test follows.)
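A sketch of the grid test with sf. `ward50` (the ward polygon) and `menu_points` (geocoded expenditures with an `amount` column) are placeholders, and the longitude used for California Ave is a guess to be verified against a map before use.

```r
# Sketch: overlay a grid on the 50th ward, sum menu spending per cell, and
# compare cells east vs. west of California Ave.
library(sf)
library(dplyr)

grid <- st_sf(geometry = st_make_grid(ward50, cellsize = 0.005)) |>
  mutate(cell = row_number(),
         lon  = st_coordinates(st_centroid(geometry))[, "X"])

cell_totals <- st_join(menu_points, grid) |>
  st_drop_geometry() |>
  group_by(cell) |>
  summarise(total = sum(amount), .groups = "drop")

california_lon <- -87.695   # placeholder; check against the street grid

comparison <- grid |>
  st_drop_geometry() |>
  left_join(cell_totals, by = "cell") |>
  mutate(total = coalesce(total, 0),
         west  = lon < california_lon)

t.test(total ~ west, data = comparison)
```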

Grab Campaign Contribution Data

After addressing #38, if I find a positive effect, it could be due to an intertemporal tradeoff between current election odds and the next election. The basic theory would be that giving to current supporters induces campaign contributions, of which individual contributions matter most. Thus aldermen face a tradeoff -- they can give to their supporters to grow their war chest and roll it over to the next election, or they can target the median voter now to secure this year's win.

This would rationalize the (hypothetical) finding that swapping out secure aldermen has an effect while swapping out aldermen in competitive elections does not. You only care about tomorrow when you're confident you'll win today.

Motivating National Data

This won't be interesting unless we can tie it to some broader national trends. For the DiD portion at least, we can tie it to the "community input" movement.

Is there any research showing that the use of community input has been rising or gaining importance recently?

Try out WIP - Chicago Geolocation API

Sean MacMullen's comment is here:

Should you need to do more geocoding at some point, I've had relative success using the Chicago street center lines data set to find the coordinates of intersecting streets.

Dataset: https://data.cityofchicago.org/Transportation/Street-Center-Lines/6imu-meau
WIP API implementation by @kollerbud: https://github.com/smacmullan/chicago-participatory-urbanism/blob/main/chicago_participatory_urbanism/geocoder_api.py

It would be a good idea to update the current location code to use this API.
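The linked WIP implementation is in Python, so as an interim step the same street-center-lines idea can be sketched in R with sf: intersect the two named streets' center lines to recover the intersection point. The street-name column and the example street names are placeholders to check against the actual dataset schema.

```r
# Sketch: recover an intersection's coordinates from the Chicago street
# center lines dataset.
library(sf)

centerlines <- st_read("street_center_lines.geojson")   # placeholder path

intersection_point <- function(street_a, street_b) {
  a <- st_union(centerlines[centerlines$street_name == street_a, ])
  b <- st_union(centerlines[centerlines$street_name == street_b, ])
  st_intersection(a, b)
}

intersection_point("W DEVON AVE", "N CALIFORNIA AVE")   # illustrative names
```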
