
2020_benfords's Introduction

First-digit visualization of vote counts in selected counties/cities in the 2020 presidential election.

Jupyter notebooks to analyze various precincts/wards for the 2020 election. Each notebook has either a source URL for the dataset or a link to the spreadsheet that was downloaded and parsed.

Benford's Law, also called the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.
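Here is a minimal Python sketch of the expected first-digit shares described above, P(d) = log10(1 + 1/d) for d = 1..9. The function name is illustrative and is not taken from the repo's notebooks.

```python
import math

def benford_first_digit_probs():
    """Expected share of each leading digit 1-9 under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()
for d, p in probs.items():
    print(f"{d}: {p:.1%}")
```

Printed as percentages, digit 1 comes out at about 30.1% and digit 9 at about 4.6%. These match the figures quoted above.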

Plots of the first digits of vote counts in various precincts and wards for selected counties/cities.

Fulton County, GA: [first-digit plot]

Miami-Dade, FL: [first-digit plot]

Milwaukee, WI: [first-digit plot]

Chicago, IL: [first-digit plot]

Allegheny, PA: [first-digit plot]


2020_benfords's Issues

Data availability

Maybe I completely missed it, but what exactly are the source links for the data?

Paper suggests that even second-digit analysis cannot be used

Please refer to chapter 2 in the following paper:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.697.5592&rep=rep1&type=pdf

The paper suggests precinct results in previous elections in a number of countries do not seem to follow the second-digit Benford distribution.

Let me try to outline why this does not hold for second digits either. If you have precincts in cities designed so that the votes for a certain candidate follow a chi-squared distribution with an expected value of 5000 and a certain deviation, then the most likely result is 5000 (2nd digit: 0). The second most likely results are 4999 and 5001 (2nd digits: 9 and 0). The third most likely results are 4998 and 5002 (2nd digits: 9 and 0). Etc. (Edit: I got this wrong the first time.)

On the other hand, for a Benford distribution, the most likely result is 1. The second most likely result is 2. The third most likely result is 3. Etc.

Hence, using second digits does not fix the problem with planned precinct sizes. We can perhaps see from the example how Benford's Law will only work if the expected value of the distribution is 0. With rational planning of precinct sizes inside cities, that won't happen. Countryside precincts are more likely to follow the Benford pattern, as the number of votes in each precinct will be more "organically" determined and less planned.

It thus seems that the methodology cannot be applied inside cities.
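The second-digit argument above can be checked with a quick simulation. This sketch substitutes a normal distribution (mean 5000, standard deviation 150, both assumed) for the bell-shaped spread the comment describes. It tallies second digits and prints them next to Benford's second-digit expectation, which is the sum over first digits k of log10(1 + 1/(10k + d)). The data is synthetic and does not come from any precinct file.

```python
import math
import random
from collections import Counter

def benford_second_digit_probs():
    """Benford's expected share for second digits 0-9: sum over first digits k of log10(1 + 1/(10k + d))."""
    return {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)}

def second_digit(n):
    """Second significant digit of a positive integer >= 10."""
    return int(str(n)[1])

# Hypothetical precinct counts clustered around 5000 (normal stand-in, sd = 150).
rng = random.Random(0)
counts = [max(10, round(rng.gauss(5000, 150))) for _ in range(10_000)]

observed = Counter(second_digit(c) for c in counts)
expected = benford_second_digit_probs()
for d in range(10):
    print(d, f"observed {observed[d] / len(counts):.1%}", f"Benford {expected[d]:.1%}")
```

With a spread this tight, 9 and 0 come out as the two most common second digits, as the comment predicts. Benford's second-digit distribution is much flatter, falling from roughly 12% for 0 to about 8.5% for 9.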

Does anybody have a dataset known to be fraudulent?

I want to experiment with converting data to base 8 or base 16, multiplying, etc., and was wondering if anybody knows of a dataset that was known to be manipulated, so I can tell whether the fraud gets obscured or holds.

Registered voters & number of ballots

Very cool project. Could you also plot the distribution of leading digits in the number of registered voters and the number of ballots per voting district?

In cities where the votes are heavily skewed toward one candidate or another, the distribution of leading digits of votes for that candidate should be highly correlated with the leading digit of the number of ballots. Would the number of registered voters and the number of ballots follow Benford's law?

Why is the Biden Election Day vote data nearly Gaussian?

The weirdest thing to me about all of this is that the Election Day vote distribution for Biden is almost perfectly Normal, with a slight right skew.

[Screenshot 2020-11-08 at 9:44:15 PM: Biden Election Day vote distribution]

Whereas the Trump data, being heavy-tailed, just looks more like real-world data to me.

[Screenshot 2020-11-08 at 9:44:26 PM: Trump vote distribution]

Any thoughts?
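"Nearly Gaussian" and "heavy-tailed" can be measured rather than judged by eye. This sketch computes sample skewness and excess kurtosis, both of which are about 0 for a normal distribution. The data behind the screenshots is not on this page, so the two input lists are synthetic placeholders: one normal sample and one log-normal sample.

```python
import random

def skew_and_excess_kurtosis(xs):
    """Sample skewness and excess kurtosis (moment estimators); both are ~0 for normal data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

rng = random.Random(1)
bell = [rng.gauss(300, 50) for _ in range(20_000)]          # hypothetical near-normal counts
heavy = [rng.lognormvariate(4, 1) for _ in range(20_000)]   # hypothetical heavy-tailed counts

bell_stats = skew_and_excess_kurtosis(bell)
heavy_stats = skew_and_excess_kurtosis(heavy)
print("near-normal: skew %.2f, excess kurtosis %.2f" % bell_stats)
print("heavy-tailed: skew %.2f, excess kurtosis %.2f" % heavy_stats)
```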

2016 and 2012 comparisons

It would be interesting to see these same districts compared to the 2016 results and the 2012 results (where an incumbent was running).

Looking at vote counts instead of first digits shows why this is not evidence of fraud

I suggest you plot histograms of the vote counts per precinct alongside the histograms of the first digits. You will see immediately that these are not evidence of voter fraud, or even examples of data that should obey Benford's Law.

Instead, what you will see is that counties like Allegheny were chosen, where Biden almost always got more than 200 votes per precinct, and Trump did not. So, it superficially looks like Biden has a "shortage" of the digit 1. But, in fact, this normal distribution should not be expected to obey Benford's Law, even approximately.

[Plot: Benford law, 2020 Election, Allegheny]
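The point about clustered counts can be reproduced without any election data. This sketch uses assumed parameters to compare the share of leading 1s in two hypothetical sets of precinct totals. The first set is clustered around 450. The second is spread log-uniformly over four orders of magnitude.

```python
import math
import random
from collections import Counter

def first_digit_shares(counts):
    """Share of positive counts starting with each digit 1-9."""
    tally = Counter(int(str(c)[0]) for c in counts if c > 0)
    total = sum(tally.values())
    return {d: tally[d] / total for d in range(1, 10)}

rng = random.Random(2)
# Hypothetical: one candidate's precinct totals clustered around 450 (sd 120), floored at 1.
clustered = [max(1, round(rng.gauss(450, 120))) for _ in range(5_000)]
# Hypothetical: totals spread log-uniformly from 10 to 100,000.
spread = [round(10 ** rng.uniform(1, 5)) for _ in range(5_000)]

clustered_share = first_digit_shares(clustered)[1]
spread_share = first_digit_shares(spread)[1]
print(f"clustered, share of leading 1: {clustered_share:.1%}")
print(f"spread,    share of leading 1: {spread_share:.1%}")
print(f"Benford expectation:           {math.log10(2):.1%}")
```

The clustered totals almost never start with 1, because only values between 100 and 199 would. The widely spread totals land near Benford's 30.1%.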

Plans for expansion?

Are there plans of expanding this project to include every county/division/voting unit in the country? I think there's value in it and would hash out some of the concerns in Issue #5.

I'd also suggest including documentation about where the data can be obtained.

If there's interest in carrying out a similar analysis in R I could devote some time.

Strongest Evidence there was election fraud.

If this is too far off base from the data, delete it. I came here searching for the truth, and I'm convinced there was voter fraud, a claim I do not levy lightly. I am not an attorney, and I recommend everyone read the entire docket below.

Source:
PA Courts - I would recommend everyone read all of this.
https://www.courtlistener.com/docket/18618673/donald-j-trump-for-president-inc-v-boockvar/

Issue:
One Trump claim concerned the right to cure ballots, alleging that curing happened illegally under the laws of the state of PA.

Proof:
The opposition to the case has filed DEMOCRATIC VOTERS affidavits CONFIRMING Trump's claim votes were systematically interfered with at scale throughout PA. I had to read this 5 times. This is likely one of the dumbest things I have seen in the court of law.

Source:
https://www.courtlistener.com/docket/18618673/donald-j-trump-for-president-inc-v-boockvar/
Via re 30 MOTION to Intervene filed by Joseph Ayeni, Black Political Empowerment Project, Common Cause Pennsylvania, Lucia Gajda, Stephanie Higgins, Meril Lara, League of Women Voters of Pennsylvania, Ricardo Morales, NAACP Pennsylvania State Conference, Natalie Price, Tim Stevens, Taylor Stover. They have submitted affidavits against Trump's motion stating they actually broke the law and even named specific parties like the DNC.
See the exhibits to docket entry 31, Nov 10, 2020.

Research suggests Benford is unreliable in Election Fraud Detection

Benford's law analysis of election data needs to use the second digit

Benford's first-digit analysis is intended to be used on data with several orders of magnitude, and hundreds of votes per precinct over hundreds of counties is not sufficient. For detecting voter fraud, you need to use the second-digit analysis.

The data presented in this project do not properly apply the law and are misleading.

https://repository.library.georgetown.edu/bitstream/handle/10822/557850/Brown_georgetown_0076M_11716.pdf
https://www.cambridge.org/core/journals/political-analysis/article/benfords-law-and-the-detection-of-election-fraud/3B1D64E822371C461AF3C61CE91AAF6D
https://en.wikipedia.org/wiki/Benford%27s_law#Election_data
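Here is a sketch of the second-digit test the issue recommends. It is written as a chi-square goodness-of-fit against Benford's second-digit distribution, with 9 degrees of freedom and 16.919 as the 5% critical value. The inputs are synthetic. An earlier issue in this list argues that clustered precinct sizes can break the second-digit test too, and the "clustered" sample illustrates that.

```python
import math
import random
from collections import Counter

# Expected second-digit shares under Benford's law (second digit 0-9).
BENFORD_2ND = {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)}
CHI2_CRIT_DF9_05 = 16.919  # chi-square critical value, 9 degrees of freedom, alpha = 0.05

def second_digit_chi2(counts):
    """Chi-square statistic of observed second digits vs Benford; counts below 10 are skipped."""
    digits = [int(str(c)[1]) for c in counts if c >= 10]
    n = len(digits)
    observed = Counter(digits)
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in BENFORD_2ND.items())

rng = random.Random(3)
spread = [round(10 ** rng.uniform(2, 6)) for _ in range(5_000)]        # hypothetical wide spread
clustered = [max(10, round(rng.gauss(450, 15))) for _ in range(5_000)]  # hypothetical tight cluster

spread_stat = second_digit_chi2(spread)
clustered_stat = second_digit_chi2(clustered)
print(f"spread: {spread_stat:.1f}, clustered: {clustered_stat:.1f}, 5% critical: {CHI2_CRIT_DF9_05}")
```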

Failure to account for external factors

  • Covid caused the mail-in voting rate to rise.
  • Counties counted votes at different rates because of the surge in mail-in ballots, compared with the usual rate for ballots cast on an electronic system.
  • Mail-in voting was discouraged by the Republican candidate. As a result, one side was more likely to vote in person, using such a system at a polling location.

Don't get me wrong; this looks well written. However, it could do with a PR adding such notices. I'd be happy to contribute one to the README.

Data from multiple providers?

Has there been any comparison of results for the race using data from different providers? E.g., compare NYT Edison vs. Clarity Elections for Georgia or Fulton County.

For what it's worth, I created a simple project to download data from NYT, perhaps it could one day be an option for this repo's analysis:

https://github.com/tomdotcash/election_data

Understanding the plot

Can someone help me to understand the plots? I understand Benford law but what does the frequency mean in the plots? I know it's the frequency of some numbers/data of the vote but what exactly are these numbers? Where do they come from? Thanks!
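The usual approach, which these plots presumably follow (the notebooks themselves are the authority), treats each bar as the share of precincts or wards whose vote count for a candidate starts with that digit. Here is a minimal sketch with made-up counts:

```python
from collections import Counter

def leading_digit_frequencies(vote_counts):
    """Share of precincts whose vote count starts with each digit 1-9.

    Zero counts have no leading digit, so they are dropped before dividing.
    """
    digits = [int(str(v)[0]) for v in vote_counts if v > 0]
    tally = Counter(digits)
    return {d: tally[d] / len(digits) for d in range(1, 10)}

# Hypothetical per-precinct vote counts for one candidate.
counts = [132, 1874, 256, 97, 310, 1450, 18, 402, 0]
freqs = leading_digit_frequencies(counts)
print(freqs)
# -> {1: 0.5, 2: 0.125, 3: 0.125, 4: 0.125, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.125}
```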

Code to mass process the data

I combined the toolkit I've been building for the last few days with your data and some of the processing code. Not sure if this is something you'd be interested in, but I can put together a PR to add the script to the repo. You can take a look here; it's not as pretty as your code, but it is tested and works. Results have been compared with other Benford Python libraries. Here's the link: https://github.com/FraudAnalysis/Benford2020/blob/main/Analysis.ipynb
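One option for batch processing many counties is to run a first-digit chi-square test per dataset. Datasets above the 5% critical value (15.507 with 8 degrees of freedom) get flagged. This is an independent sketch, not the code in the linked notebook, and the dataset names and values are hypothetical. As other issues here point out, a flag only means "not Benford-shaped". Clustered precinct sizes produce that result without any fraud.

```python
import math
import random
from collections import Counter

BENFORD_1ST = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CHI2_CRIT_DF8_05 = 15.507  # chi-square critical value, 8 degrees of freedom, alpha = 0.05

def first_digit_chi2(counts):
    """Chi-square statistic of observed leading digits vs Benford; zero counts are skipped."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    n = len(digits)
    observed = Counter(digits)
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in BENFORD_1ST.items())

def batch_report(datasets):
    """Return {name: (chi2, exceeds_critical_value)} for each dataset of per-precinct counts."""
    report = {}
    for name, counts in datasets.items():
        stat = first_digit_chi2(counts)
        report[name] = (stat, stat > CHI2_CRIT_DF8_05)
    return report

rng = random.Random(4)
datasets = {  # hypothetical stand-ins for per-county precinct totals
    "wide_spread": [round(10 ** rng.uniform(1, 5)) for _ in range(3_000)],
    "clustered": [max(1, round(rng.gauss(600, 150))) for _ in range(3_000)],
}
report = batch_report(datasets)
for name, (stat, flagged) in report.items():
    print(f"{name}: chi2 = {stat:.1f}, exceeds 5% critical value: {flagged}")
```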

Milwaukee ward sizes are small and there is a highly preferred candidate

The disappearance of Benford's law in Milwaukee is a function of voter preference alone. If one candidate has between 60% and 80% average chance of receiving a vote, then the sizes of the wards in Milwaukee are too small to accommodate Benford's law. See further details with my simulations here https://rpubs.com/frycast/687633

Edit: Not just too small, but too concentrated. They do not span many orders of magnitude.

Edit 2: The thread below becomes distracted by an effort to look into election data anomalies that are not directly related to this issue. My intention here is not to develop a fraud detection tool, but to highlight the major flaws in the one being used, which is currently being touted by various news sources as evidence of fraud. So far, this issue is still open, and it should be resolved by at least adding some comments to the README clarifying that the pattern observed in Milwaukee can arise in election data in the absence of fraud. Hopefully the owner of this popular repository, and the people involved here in this thread, are all interested in acting in good faith, and will focus on resolving the issue.
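This is a rough, independent sketch of the kind of simulation described, not the linked RPubs code. Ward sizes are drawn from Normal(750, 250), loosely echoing the ~755-vote average ward cited in a later issue. The favoured candidate's share is drawn from Uniform(0.6, 0.8). All of these parameters are assumptions.

```python
import random
from collections import Counter

def simulate_wards(n_wards, rng):
    """Hypothetical wards: size ~ Normal(750, 250) floored at 50; winner's share ~ Uniform(0.6, 0.8)."""
    votes = []
    for _ in range(n_wards):
        size = max(50, round(rng.gauss(750, 250)))
        p = rng.uniform(0.6, 0.8)
        # Binomial draw: each voter in the ward picks the favoured candidate with probability p.
        votes.append(sum(rng.random() < p for _ in range(size)))
    return votes

rng = random.Random(6)
winner_votes = simulate_wards(600, rng)
tally = Counter(int(str(v)[0]) for v in winner_votes if v > 0)
shares = {d: tally[d] / sum(tally.values()) for d in range(1, 10)}
print({d: f"{s:.1%}" for d, s in shares.items()})
```

Under these assumptions the winner's totals cluster in the 400s and 500s and rarely start with 1, even though the model contains no fraud at all.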

Analyze same data with different base values

Given that, with Benford's law:

  • Clarity should (?) improve as the spread of values increases
  • The law still holds for any given base

It would be interesting to cross-reference the same input data sets using bases other than 10.

something like

np.log(1 + 1/digit) / np.log(X) * N

where X is a base taken from a range, say [2..16], digit runs from 1 to X - 1, and N is the number of values.

So if you generated a range of graphs across the same data with different bases and eyeballed the results, that might give stronger confirmation that the data is abnormal, if it's also abnormal in the majority of numeric bases.

Alternatively, it may show that what appears to be an anomaly is not as bad as it looks. Maybe?

This is super interesting, you got me reading up on stats all over again :) Thanks.
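Here is a stdlib-only sketch of the cross-base idea. NumPy has no `np.logX`, so this sketch uses `math.log(x, base)` for log base X. The leading digit comes from repeated integer division. The sketch measures "abnormal" as the largest absolute gap between observed and expected shares. That metric is an assumption, since the issue does not specify one. The input is synthetic log-uniform data.

```python
import math
import random
from collections import Counter

def leading_digit(n, base):
    """Most significant digit of a positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n

def benford_probs(base):
    """Benford's expected share for leading digits 1..base-1: log_base(1 + 1/d)."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

def max_abs_deviation(counts, base):
    """Largest gap between observed and Benford leading-digit shares in this base."""
    tally = Counter(leading_digit(c, base) for c in counts if c > 0)
    n = sum(tally.values())
    return max(abs(tally[d] / n - p) for d, p in benford_probs(base).items())

rng = random.Random(5)
data = [round(10 ** rng.uniform(1, 6)) for _ in range(20_000)]  # hypothetical wide-spread counts
deviations = {base: max_abs_deviation(data, base) for base in (2, 8, 10, 16)}
for base, dev in deviations.items():
    print(f"base {base:>2}: max |observed - Benford| = {dev:.3f}")
```

Base 2 is degenerate: every positive integer starts with 1 in binary, so that base carries no information and always matches.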

Time series analysis?

Via Twitter: someone on 4chan, of all places, scraped time series from NYT and analyzed GOP/Dem vote share drift in vote count deltas over time. That is, with each update, what's the percentage of the update for Biden vs Trump (or at least that's how I understand it). Outside the turbulent period where in-person votes are counted, mail-in tended to drift slightly towards GOP over time in uncontested areas (both GOP and Dem), which they explain away as rural votes taking longer to arrive, and steeply towards Dem in contested ones.

Given where this comes from, take it with a massive grain of salt; I can't vouch for the veracity of the data. I'm attaching the CSV if anyone wants to verify it or take a look.

2020_election_time_series.zip

Not linking the Twitter thread here, as it launches into several conspiracies which we here can neither confirm nor deny.
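The "share of each update" idea can be sketched as follows, assuming the time series is a list of cumulative (Dem, GOP) totals in timestamp order. This page does not describe the column layout of the attached CSV, so the tuple format and numbers are hypothetical.

```python
def update_shares(snapshots):
    """Dem share of the new two-party votes in each update of cumulative (dem, gop) totals.

    Updates that add no two-party votes are skipped. Corrections that reduce a running
    total can push the share outside [0, 1].
    """
    shares = []
    for (d0, g0), (d1, g1) in zip(snapshots, snapshots[1:]):
        dd, dg = d1 - d0, g1 - g0
        if dd + dg > 0:
            shares.append(dd / (dd + dg))
    return shares

# Hypothetical cumulative counts for one county, in timestamp order.
snapshots = [(1000, 900), (1600, 1300), (1600, 1300), (2500, 1400)]
print(update_shares(snapshots))  # -> [0.6, 0.9]
```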

The average ward in Milwaukee has 750 votes, how would Biden have 100-200 in 30% of wards?

This repo's use of Benford's Law is so misleading that it discredits other claims of fraud more generally.

  • If you pull the data from Milwaukee city, the average ward has 755 votes. Biden wins an average of 595 votes per ward. Obviously if this is true, his first-digit distribution is going to be skewed towards the 4, 5, 6 range.
  • Only 20.5% of wards had over 1,000 votes, and 2.1% of wards had between 100 and 200 votes. These are the only wards where Biden would even have a chance to reach 1___ votes.
  • It's laughably easy to produce these kinds of anomalies with political data. 65.6% of main-party candidates in the 2018 House elections had a vote total starting with 1. Massive fraud? No, it's because the average congressional district had 264 thousand votes, and in most races one or both of the candidates had 100,000-something votes.

2020 Milwaukee Data
2018 House Election Data

If the size of the district you're looking at is the same across many different races, the results will skew towards something completely different from Benford's Law, absent any fraud whatsoever. Thus, these results from Milwaukee or any other place provide no evidence of election fraud.

Breakdown by vote type

I am doing my own similar analysis on this, and I've noticed that if you break results down by vote type, by far the most non-conforming dataset is the Joe Biden ELECTION DAY results, rather than absentee or mail-in results. Please add a breakdown by vote type as well as by total. I have uploaded my datasets and code here: https://github.com/snex/election_results_benford
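Here is a minimal sketch of a per-vote-type breakdown, assuming rows of (candidate, vote type, votes per precinct). The row format and values are hypothetical and are not taken from the linked repository.

```python
from collections import Counter, defaultdict

def first_digit_by_group(rows):
    """rows: iterable of (candidate, vote_type, votes). Returns {(candidate, vote_type): {digit: share}}."""
    tallies = defaultdict(Counter)
    for candidate, vote_type, votes in rows:
        if votes > 0:  # zero counts have no leading digit
            tallies[(candidate, vote_type)][int(str(votes)[0])] += 1
    result = {}
    for key, tally in tallies.items():
        n = sum(tally.values())
        result[key] = {d: tally[d] / n for d in range(1, 10)}
    return result

rows = [  # hypothetical per-precinct rows
    ("Biden", "Election Day", 312), ("Biden", "Election Day", 287),
    ("Biden", "Absentee", 145), ("Biden", "Absentee", 1020),
    ("Trump", "Election Day", 198), ("Trump", "Election Day", 0),
]
result = first_digit_by_group(rows)
for key, shares in sorted(result.items()):
    print(key, {d: s for d, s in shares.items() if s})
```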
