
crowd-counting-consortium's Introduction

Crowd Counting Consortium Dataset

This repository is home to a compiled and augmented version of the Crowd Counting Consortium's data on political protest events in the United States. We use a broad definition of "protest," so the dataset includes protests, rallies, demonstrations, marches, strikes, and similar actions.

  • The latest version of the full dataset is stored in CSV format in this repository in two separate files, one covering 2017-2020 and the other covering 2021-present. The two files have identical column names and formats, so they can be merged into a single large file covering the project's entire scope if desired. The easiest way to access them is to read them directly into your statistical software. If you prefer to download them to a local directory, you'll need to:

    1. Open this page in your browser;
    2. Right-click on the "View raw" link in the center of the lower tile; and
    3. Choose a name and local destination for the file.
  • The data dictionary describes the columns in those files.

  • The coding guidelines describe the processes we use to find and encode information about relevant events.
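
Because the two files share column names and formats, merging them amounts to writing one header followed by the rows of each file. The following is a minimal sketch using only Python's standard library, not the project's own tooling; the function name `merge_csv_files` and the file names in the usage note are hypothetical.

```python
import csv


def merge_csv_files(paths, out_path, encoding="utf-8"):
    """Concatenate CSV files that share an identical header row.

    Returns the number of data rows written to out_path.
    """
    header = None
    n_rows = 0
    with open(out_path, "w", newline="", encoding=encoding) as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding=encoding) as src:
                reader = csv.reader(src)
                file_header = next(reader)
                if header is None:
                    # First file: remember its header and write it once.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    # Refuse to merge files whose columns differ.
                    raise ValueError(f"{path} has different columns")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

For example, `merge_csv_files(["first_period.csv", "second_period.csv"], "merged.csv")` would write a single file, where both input names stand in for wherever you saved the two downloads. If a file is not valid UTF-8, pass a different `encoding`.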

If you use these data, please cite "Crowd Counting Consortium" as the source.

If you have questions about the dataset, please email Jay Ulfelder, Ph.D., at [email protected].

General Information

We strive to update this data set weekly, on Wednesdays not later than 4 PM Eastern time, with exceptions around holidays and ends of months. Please note that, while the raw data are updated on a rolling basis, there is some lag between the appearance of a news story or social-media post about an event (or the submission of a form to CCC) and the addition of a complete record to CCC's Google Sheets. CCC strives to keep that interval as short as possible, but the project operates on a shoestring budget, so periods of higher protest activity can make for longer delays.

The compiled data set serves as the input to the Crowd Counting Consortium (CCC) Data Dashboard, which interactively maps the events and plots summaries of them. Both the dashboard and the compiled data set are maintained by the Nonviolent Action Lab, a research program within the Carr Center for Human Rights Policy at the Harvard Kennedy School. The Crowd Counting Consortium is co-directed by faculty at Harvard University and the University of Connecticut.

The CCC collects these data in the public interest and to further scholarly research. The CCC is not formally affiliated with any other efforts to collect data on demonstrations, though it collaborated with Count Love from 2017 until early 2021. Anyone who wishes to conduct research using the data is fully responsible for any necessary contact with their own Institutional Review Boards.

We recognize that no large-scale data set on political crowds will ever capture all relevant events with 100% accuracy. Even so, we aspire to make this record as complete and error-free as possible. If you see a record that you believe needs correcting, or if you are aware of a relevant event that is not included, please do not open an issue here to report it. Instead, please submit a record or correction via the (anonymous) Google Form on the original CCC website.

If you have suggestions on how to improve this repository or the compiled version of the data it hosts, please submit a ticket via the "Issues" button above (or just click here).

Academic Research Using CCC Data

Curious how these data are getting used? Here, in reverse chronological order, are peer-reviewed published studies that have used CCC's work. If you are aware of any published articles or monographs that cite the data that aren't listed here, please let us know.

  • Amory Gethin and Vincent Pons. "Social Movements and Public Opinion in the United States." National Bureau of Economic Research Working Paper No. 32342 (April 2024) link

  • Neal Caren. "Right-Wing Protest in the United States, 2017 to 2022." Socius: Sociological Research for a Dynamic World (July 5, 2023). link

  • Cassy Dorff, Grace Adcox, and Amanda Konet. "Data innovations on protests in the United States." Journal of Peace Research Vol. 60, No. 1 (2023): 172-189. link

  • Daniel Karell, Andrew Linke, Edward C. Holland, and Edward Hendrickson. "Hard-Right Social Media and Civil Unrest." American Sociological Review Vol. 88, No. 2 (2023) link

  • Jeremy Pressman, Erica Chenoweth, Tommy Leung, L. Nathan Perkins, and Jay Ulfelder. "Protests Under Trump, 2017-2021." Mobilization Vol. 27, No. 1 (2022): 13-26. link

  • Yuko Sato and Jake Haselswerdt. "Protest and state policy agendas: Marches and gun policy after Parkland." Policy Studies Journal Vol. 50, No. 4 (2022): 877-895. link

  • Joan C. Timoneda and Erik Wibbels. "Spikes and Variance: Using Google Trends to Detect and Forecast Protests." Political Analysis Vol. 30, No. 1 (2022): 1-18. link

  • Peter J. Phillips and Gabriela Pohl. "Crowd counting: a behavioural economics perspective." Quality & Quantity (2021): 1-18. link

  • Jeremy Pressman and Austin Choi-Fitzpatrick. "Covid19 and protest repertoires in the United States: an initial description of limited change," Social Movement Studies Vol. 20, No. 6 (2021): 766-773. link

  • Anton M. Sobolev, Keith Chen, Jungseock Joo, and Zachary C. Steinert-Threlkeld. "News and Geolocated Social Media Accurately Measure Protest Size Variation." American Political Science Review Vol. 114, no. 4 (2020): 1343-51. link

  • Nicholas S. Miras. "Polls and Elections: Resistance Is Not Futile: Anti-Trump Protest and Senators' Opposition to President Trump in the 115th Congress." Presidential Studies Quarterly Vol. 49, no. 4 (2019): 932-958. link

  • Kenneth T. Andrews, Neal Caren, and Alyssa Browne. "Protesting Trump." Mobilization Vol. 23, no. 4 (2018): 393-400. link

And here are some working papers we've seen that use the dataset. Again, please let us know if you're aware of others that should be listed.

  • Larreboure, Magdalena and Felipe González, "The Impact of the Women's March on the U.S. House Election" (November 22, 2021) link

  • Ebbinghaus, Mathis, Nathan Bailey, and Jacob Rubel. "Defended or defunded? Local and state policy outcomes of the 2020 Black Lives Matter protests." (November 19, 2021) link

Journalism Using CCC Data

Journalists also use CCC data, often to document trends or provide context in stories on specific events. Here are some examples.

  • "Riot police and over 2,000 arrests: a look at 2 weeks of campus protests", The Washington Post (May 3, 2024) link

  • "Wo US-Studierende für Palästina demonstrieren", Zeit Online (April 26, 2024) link

  • "The growing pro-Palestinian movement, visualized", The Wall Street Journal (April 24, 2024) link

  • "The new movement for Palestine", Hadas Thier, Hammer & Hope (Spring 2024) link

  • "Some young Black voters undecided about Biden over lack of support for Palestinians", USA Today (December 17, 2023) link

  • "Tens of thousands have joined pro-Palestinian protests across the United States. Experts say they are growing", PBS News Hour (December 15, 2023) link

  • "'Largest pro-Palestinian Mobilization in U.S. History' | More Than 1 Million Americans Participated in Protests Since Hamas-Israel War Began on Oct 7", Haaretz (December 5, 2023) link

  • "Pro-Palestinian marches are far more frequent than pro-Israeli ones. How U.S. reaction to the Israel-Hamas war has changed", Los Angeles Times (November 21, 2023) link

  • "Protesters say they wanted Congress's attention. Police saw a threat", Washington Post (November 16, 2023) link

  • "Will protests over the Israel-Hamas war shift U.S. policy?", Good Authority (October 27, 2023) link

  • "LGBTQ community celebrates pride in the face of online and offline attacks", Reuters (June 11, 2023) link

  • "Montgomery police to patrol Drag Story Hours after Proud Boys protest", Washington Post (February 21, 2023) link

  • "Drag Story Hour protest in NYC caps a year of anti-drag attacks", NBC Out (December 30, 2022) link

  • "2021 was supposed to be the 'worst year' for LGBTQ rights - then came 2022", NBC Out (December 29, 2022) link

  • "Mass Shooting at Gay Nightclub in Colorado Follows Surge of Right-Wing Rhetoric and Threats Targeting LGBTQ Community", The Americano (November 22, 2022) link

  • "Hateful rhetoric, demonstrations targeting LGBTQ+ community on the rise, experts say", WWMT.com (November 21, 2022) link

  • "Clear spike in anti-trans rhetoric sets stage for violence like Colorado Springs shooting, experts say", The Colorado Sun (November 20, 2022) link

  • "Club Q shooting follows year of bomb threats, drag protests, anti-trans bills", The Washington Post (November 20, 2022) link

  • "Where Is the Anti-Biden Tea Party?", The New York Times (August 24, 2021) link

  • "BLM and Floyd protests were largely peaceful, data confirms", Christian Science Monitor (July 8, 2021) link

  • "Protests for Black lives are still happening", Vox (July 16, 2020) link

  • "Black Lives Matter may be the largest movement in U.S. history", The New York Times (July 3, 2020) link

  • "Maps: How Protests Evolved in the Wake of George Floyd's Killing", The Wall Street Journal (June 12, 2020) link

CCC Blog

We occasionally publish short pieces of analysis and data visualization on our project blog, which you can find here.


crowd-counting-consortium's Issues

Add unique identifier

It would be nice to add a unique identifier... the main thing is to have one, of any kind, but personally I'd just go with an integer that increments with each event that's added. It doesn't really matter if the order means anything for entries already added. Additionally, and optionally, you could add a new column to record "DateAdded" to the database, i.e., when the data is moved from the google sheet into the csv.
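
One way the suggestion could look in practice is sketched below, assuming each event is held as a dictionary. The column names `EventID` and `DateAdded` and the function `add_ids` are hypothetical; nothing here describes how the dataset is actually compiled.

```python
from datetime import date


def add_ids(rows, start=1, date_added=None):
    """Return copies of the row dicts with 'EventID' and 'DateAdded' fields.

    IDs are consecutive integers beginning at `start`; the input rows
    are left unmodified.
    """
    stamp = (date_added or date.today()).isoformat()
    return [
        {"EventID": event_id, **row, "DateAdded": stamp}
        for event_id, row in enumerate(rows, start=start)
    ]
```

To keep identifiers stable across updates, only newly added rows would be passed in, with `start` set to one more than the largest ID already assigned.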

problems parsing ccc_compiled.csv

I'm getting problems parsing the CSV. I don't have time to investigate much, but I wanted to let you know.

> prot <- read_csv("~/crowd-counting-consortium/ccc_compiled.csv")
Parsed with column specification:
cols(
  .default = col_character(),
  Date = col_date(format = ""),
  County = col_logical(),
  Online = col_double(),
  ClaimType = col_double(),
  EstimateLow = col_double(),
  EstimateHigh = col_double(),
  EstimateText = col_logical(),
  EstimateCat = col_double(),
  ReportedArrests = col_double(),
  ReportedParticipantInjuries = col_double(),
  ReportedPoliceInjuries = col_double(),
  ReportedPropertyDamage = col_double(),
  Final = col_double(),
  lat = col_double(),
  lon = col_double()
)
See spec(...) for full column specifications.
Warning: 41534 parsing failures.
 row          col           expected    actual                                           file
1237 EstimateText 1/0/T/F/TRUE/FALSE hundreds  '~/crowd-counting-consortium/ccc_compiled.csv'
1245 EstimateText 1/0/T/F/TRUE/FALSE dozens    '~/crowd-counting-consortium/ccc_compiled.csv'
1258 EstimateText 1/0/T/F/TRUE/FALSE thousands '~/crowd-counting-consortium/ccc_compiled.csv'
1259 EstimateText 1/0/T/F/TRUE/FALSE hundreds  '~/crowd-counting-consortium/ccc_compiled.csv'
1260 EstimateText 1/0/T/F/TRUE/FALSE na        '~/crowd-counting-consortium/ccc_compiled.csv'
.... ............ .................. ......... ..............................................
See problems(...) for more details.


> problems(prot)
# A tibble: 41,534 x 5
     row col         expected          actual        file                                   
   <int> <chr>       <chr>             <chr>         <chr>                                  
 1  1237 EstimateTe… 1/0/T/F/TRUE/FAL… hundreds      '~/crowd-counting-consortium/ccc_compi…
 2  1245 EstimateTe… 1/0/T/F/TRUE/FAL… dozens        '~/crowd-counting-consortium/ccc_compi…
 3  1258 EstimateTe… 1/0/T/F/TRUE/FAL… thousands     '~/crowd-counting-consortium/ccc_compi…
 4  1259 EstimateTe… 1/0/T/F/TRUE/FAL… hundreds      '~/crowd-counting-consortium/ccc_compi…
 5  1260 EstimateTe… 1/0/T/F/TRUE/FAL… na            '~/crowd-counting-consortium/ccc_compi…
 6  1261 EstimateTe… 1/0/T/F/TRUE/FAL… hundreds      '~/crowd-counting-consortium/ccc_compi…
 7  1262 EstimateTe… 1/0/T/F/TRUE/FAL… several hund… '~/crowd-counting-consortium/ccc_compi…
 8  1265 EstimateTe… 1/0/T/F/TRUE/FAL… nearly 300    '~/crowd-counting-consortium/ccc_compi…
 9  1266 EstimateTe… 1/0/T/F/TRUE/FAL… na            '~/crowd-counting-consortium/ccc_compi…
10  1267 EstimateTe… 1/0/T/F/TRUE/FAL… 650 or so     '~/crowd-counting-consortium/ccc_compi…
# … with 41,524 more rows
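
The failures above look consistent with the reader guessing that EstimateText is a logical column because its early rows are empty, then meeting free text such as "hundreds" further down. Declaring column types explicitly avoids that guess; in readr this is usually done with the `col_types` argument. The sketch below shows the same idea in Python with the standard library: every field stays a string unless it is named as numeric. The function `read_events` and the choice of numeric columns are assumptions made for illustration.

```python
import csv

# Assumed subset of numeric columns, taken from the column spec above.
NUMERIC_COLUMNS = ("EstimateLow", "EstimateHigh", "lat", "lon")


def read_events(path, numeric_columns=NUMERIC_COLUMNS, encoding="utf-8"):
    """Read a CSV into dicts, converting only the named columns to float."""
    events = []
    with open(path, newline="", encoding=encoding) as fh:
        for row in csv.DictReader(fh):
            for col in numeric_columns:
                if col not in row:
                    continue
                value = row[col]
                # Treat empty cells and the literal "NA" as missing values.
                row[col] = None if value in (None, "", "NA") else float(value)
            events.append(row)
    return events
```

Free-text columns such as EstimateText are never coerced, so values like "nearly 300" survive as written.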

Strange encoding for certain rows in `ccc_compiled.csv`

On row 890 of ccc_compiled.csv, the raw entry is

2017-01-29,"Fort Lauderdale","FL","Fort Lauderdale�Hollywood International Airport",0,"demonstration",NA,NA,"General protest",NA,NA,"end the ""Muslim ban""; immigration; anti-Trump",1,"immigration; presidency; religion",NA,200,200,200,2,"0",0,"0",0,"0",0,"0",0,NA,NA,NA,NA,NA,"http://www.nbcmiami.com/news/local/-Protest-at-Miami-International-Airport-Following-Trump-Travel-Restrictions-412087023.html",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"26.122439","-80.137317","Fort Lauderdale","Broward County","FL","12011"

In R, the questionable character in "Fort Lauderdale�Hollywood International Airport" is interpreted as "\x96", and after a cursory search on StackOverflow this doesn't seem to be a valid Unicode character. This also occurs for the location_detail column at rows 932, 1050, 1086, 1168, 1230, 1722, 1775, 1799, and 446 other rows, and possibly for other columns as well.

In R, basic string manipulation on rows containing characters like these fails; e.g., tolower() fails with the error "Error in tolower(.) : invalid multibyte string 1". Some discussion of that error can be found here, although it didn't help me much.

Could I ask what encoding the CSV is in, and if there are any recommended strategies for working around them?
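
For context, byte 0x96 is an en dash in the Windows-1252 code page, which would fit a name like "Fort Lauderdale–Hollywood International Airport". Assuming the stray bytes do come from Windows-1252 text (an assumption, not something confirmed by the maintainers), one workaround is to decode each line as UTF-8 and fall back to Windows-1252 when that fails. The function name `decode_line` is hypothetical.

```python
def decode_line(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Windows-1252 for legacy bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 maps 0x96 to an en dash. A few bytes (e.g. 0x81) are
        # undefined in cp1252, so replace those instead of raising.
        return raw.decode("cp1252", errors="replace")
```

Calling `decode_line(b"Fort Lauderdale\x96Hollywood")` returns the name with a proper en dash, after which ordinary string functions work on it.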

Encoding is variably UTF-8 or ISO-8859-1

Hi! Great work on this dataset - unfortunately, reading the files with Python fails because the encodings vary in some of the records. Luckily they all seem to be either UTF-8 or ISO-8859-1 for the moment. You can fix them with this script:

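#!/usr/bin/env python3
"""Re-encode every CSV in the current directory as UTF-8, line by line."""

import logging
from pathlib import Path

import chardet

for path in Path(".").glob("*.csv"):
    tmp = path.with_suffix(".csv.tmp")
    # Read raw bytes; write UTF-8 explicitly and keep line endings unchanged.
    with open(path, "rb") as infh, open(tmp, "w", encoding="utf-8", newline="") as outfh:
        for idx, raw in enumerate(infh, start=1):
            try:
                line = raw.decode("utf-8")
            except UnicodeDecodeError:
                # chardet may return None when it cannot guess; fall back to ISO-8859-1.
                encoding = chardet.detect(raw)["encoding"] or "ISO-8859-1"
                logging.warning(
                    "Line %d of %s is not UTF-8, probably %s; re-encoding it",
                    idx,
                    path,
                    encoding,
                )
                line = raw.decode(encoding)
            outfh.write(line)
    # Overwrite the original file with the re-encoded copy.
    tmp.replace(path)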
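
To check whether a file needs this treatment, or to confirm that a conversion worked, it can help to list the lines that fail to decode as UTF-8. This is a small sketch using only the standard library; the function name `non_utf8_lines` is hypothetical.

```python
def non_utf8_lines(path):
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    with open(path, "rb") as fh:
        for number, raw in enumerate(fh, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(number)
    return bad
```

An empty list means every line in the file decodes cleanly as UTF-8.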

Instead of `Final` -> `Version`?

Instead of a binary variable, you could call it Version and then give yourself the freedom to revise past data if necessary. 0 could still mean un-finalized, 1 still indicates "intended to be final" and then > 1 indicates revisions. If you do ever revise data, you should also add a DateRevised column to make it easier for users of the data to see when data was revised.
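
A rough sketch of how such a scheme might behave, assuming each record is a dictionary with the proposed `Version` and `DateRevised` fields. Both fields and the function `revise` are hypothetical and only illustrate the suggestion.

```python
from datetime import date


def revise(record, changes, today=None):
    """Return a revised copy of a finalized record.

    Version semantics follow the proposal: 0 = not finalized,
    1 = intended to be final, greater than 1 = revised afterwards.
    """
    version = int(record.get("Version", 0))
    if version < 1:
        raise ValueError("only finalized records (Version >= 1) can be revised")
    revised = {**record, **changes}
    revised["Version"] = version + 1
    revised["DateRevised"] = (today or date.today()).isoformat()
    return revised
```

Users could then filter on `Version > 1` to find records that changed after they were first finalized.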

capitalize column names for consistency?

Best to have either none or all variable names capitalized.

lat. Latitude of locality in which the event took place, as resolved by Google Maps Geocoding API.

lon. Longitude of locality in which the event took place, as resolved by Google Maps Geocoding API.

locality. Name of the locality in which the event took place, as resolved by Google Maps Geocoding API.

county. Name of the county in which that locality sits, as resolved by the Google Maps Geocoding API.

state. Postal abbreviation of the state or territory in which that locality sits, as resolved by the Google Maps Geocoding API.
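
Until the names are made consistent upstream, users can normalize them on their side after loading. The sketch below converts CamelCase names to lowercase snake_case, which is just one possible convention; the function name `to_snake_case` is hypothetical.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a CamelCase column name to lowercase snake_case."""
    # Insert an underscore wherever a lowercase letter or digit is
    # followed by an uppercase letter, then lowercase the result.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()
```

With this, `EstimateLow` becomes `estimate_low`, while names that are already lowercase, such as `lat`, are unchanged.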
