Giter VIP home page Giter VIP logo

crows-pairs's Introduction

CrowS-Pairs

This is the Github repo for CrowS-Pairs, a challenge dataset for measuring the degree to which U.S. stereotypical biases present in the masked language models (MLMs). The associated paper is to be published as part of The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020).

Data reliability: Please note that recent work by Blodgett et al. (2021) has found significant issues with noise and reliability of the data in CrowS-Pairs. The problems are significant enough that CrowS-Pairs may not be a good indicator of the presence of social biases in LMs. Please refer to the Blodgett et al. paper for details.

The Dataset

The dataset along with its annotations is in crows_pairs_anonymized.csv. It consists of 1,508 examples covering nine types of biases: race/color, gender/gender identity, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status.

Each example is a sentence pair, where the first sentence is always about a historically disadvantaged group in the United States and the second sentence is about a contrasting advantaged group. The first sentence can demonstrate or violate a stereotype. The other sentence is a minimal edit of the first sentence: The only words that change between them are those that identify the group. Each example has the following information:

  • sent_more: The sentence which is more stereotypical.
  • sent_less: The sentence which is less stereotypical.
  • stereo_antistereo: The stereotypical direction of the pair. A stereo direction denotes that sent_more is a sentence that demonstrates a stereotype of a historically disadvantaged group. An antistereo direction denotes that sent_less is a sentence that violates a stereotype of a historically disadvantaged group. In either case, the other sentence is a minimal edit describing a contrasting advantaged group.
  • bias_type: The type of biases present in the example.
  • annotations: The annotations of bias types from crowdworkers.
  • anon_writer: The anonymized id of the writer.
  • anon_annotators: The anonymized ids of the annotators.

Quantifying Stereotypical Biases in MLMs

Requirement

Use Python 3 (we use Python 3.7) and install the required packages.

pip install -r requirements.txt

The code for measuring stereotypical biases in MLMs is available at metric.py. You can run the code using the following command:

python metric.py 
	--input_file data/crows_pairs_anonymized.csv 
	--lm_model [mlm_name] 
	--output_file [output_filename]

For mlm_name, the code supports bert, roberta, and albert.

The --output_file will store the sentence scores for each example. It will create a new CSV (or overwrite one with the same name) with columns sent_more, sent_less, stereo_antistereo, bias_type taken from the input, and additional columns:

  • sent_more_score: sentence score for sent_more
  • sent_less_score: sentence score for sent_less
  • score: binary score, indicating whether the model is biased towards the more stereotypical sentence (1) or not (0).

Please refer to the paper on how we calculate the sentence score.

Note that, if you use a newer version of transformers (3.x.x), you might obtain different scores than the one reported in our paper.

License

CrowS-Pairs is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. It is created using prompts taken from the ROCStories corpora and the fiction part of MNLI. Please refer to their papers for more details.

Reference

If you use CrowS-Pairs or our metric in your work, please cite it directly:

@inproceedings{nangia2020crows,
    title = "{CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models}",
    author = "Nangia, Nikita  and
      Vania, Clara  and
      Bhalerao, Rasika  and
      Bowman, Samuel R.",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics"
}

crows-pairs's People

Contributors

rasikabh avatar claravania avatar woollysocks avatar antmarakis avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.