
bias-aware-prf

On the Orthogonality of Bias and Utility in Ad hoc Retrieval

This repository contains the code and resources for our bias-aware query expansion method, which reduces existing gender biases in the retrieved set of documents. The main focus of this approach is on controlling bias in pseudo-relevance feedback. Our work shows that it is possible to effectively revise a user query so that it leads to a less biased ranked list of documents. Based on our experiments, we find that a less biased revised query can maintain utility while reducing bias at the same time. We believe this work lays the foundation for treating fairness and utility as two cooperating measures rather than competing aspects.

To revise the initial query so that utility is maintained while bias is significantly reduced, we re-rank the list of documents retrieved by BM25 using the interpolation formula:

Rel_debiased(d) = (1 - λ) · Rel(d) - λ · Bias(d)
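The re-ranking step can be sketched as follows. This is a minimal illustration of the interpolation formula above; the function names and the (doc_id, relevance, bias) data layout are assumptions for the example, not the repository's actual API.

```python
def debiased_score(rel, bias, lam):
    """Interpolate relevance and bias: (1 - lam) * rel - lam * bias."""
    return (1.0 - lam) * rel - lam * bias

def rerank(docs, lam=0.5):
    """docs: list of (doc_id, rel_score, bias_score) tuples.
    Returns doc ids ordered by the debiased score, best first."""
    scored = [(doc_id, debiased_score(rel, bias, lam))
              for doc_id, rel, bias in docs]
    return [doc_id for doc_id, _ in sorted(scored, key=lambda x: -x[1])]

# Example: a relevant but heavily biased document can drop below a
# slightly less relevant, less biased one once bias is penalised.
docs = [("d1", 12.0, 4.0), ("d2", 10.0, 0.5)]
print(rerank(docs, lam=0.5))
```

With λ = 0, the ranking reduces to the original BM25 ordering; larger λ values trade retrieval score for lower bias.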

We evaluate our approach by measuring the gender bias in the retrieved lists for our bias-aware expansion method against the plain BM25 and RM3 expansion baselines. The associated run files for each method can be found in the results/runs directory. The run files for our bias-aware expansion method are available for different values of λ, which determines how sensitive the method is to document bias. In addition, the original queries and the bias-aware expanded queries for Robust04, GOV2, CW09, and CW12 are available in the results/queries directory.

We selected the interpolation coefficient λ from [0, 1] in increments of 0.1. To explore whether bias is actually reduced systematically in the retrieved list of documents, we measure the degree of bias using two different approaches: the ARaB methods and the LIWC toolkit. Our results for λ = 0.5 are provided in Table 1; the complete set of results for all λ values is available in Table 2 in the results directory. The results show that regardless of the metric used to measure the bias of the retrieved ranked list, bias decreases significantly on all three bias metrics. The decrease is consistent across all metrics and always statistically significant.

Table 1: Bias measurements using ARaB (TF and Boolean variants) and LIWC-based metrics, at λ = 0.5 (lower is less biased).

| Method | RB04 ARaB-TF | RB04 ARaB-Bool | RB04 LIWC | GOV2 ARaB-TF | GOV2 ARaB-Bool | GOV2 LIWC | CW09 ARaB-TF | CW09 ARaB-Bool | CW09 LIWC | CW12 ARaB-TF | CW12 ARaB-Bool | CW12 LIWC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BM25 (Run) | 0.61 | 0.35 | 0.48 | 0.33 | 0.14 | 0.07 | 0.23 | 0.08 | 0.05 | 0.40 | 0.14 | 0.19 |
| PRF (Run) | 0.61 | 0.34 | 0.45 | 0.39 | 0.11 | 0.07 | 0.22 | 0.07 | 0.07 | 0.42 | 0.10 | 0.20 |
| Our Approach (Run) | 0.43 | 0.27 | 0.34 | 0.18 | 0.07 | 0.05 | 0.14 | 0.06 | 0.04 | 0.23 | 0.05 | 0.13 |
| Decrease in Bias (%) | 29.5 | 20.6 | 24.4 | 53.8 | 36.4 | 28.6 | 36.4 | 14.3 | 42.9 | 45.2 | 50.0 | 35.0 |

Usage

To obtain a less biased reformulated set of queries, given the initial queries and their corresponding relevant documents, follow these steps:
  1. Use the documents_calculate_bias.py script to calculate the bias level of the documents in the given collection.

  2. Use the interpolation.py script to interpolate the retrieval score (given by BM25) with the bias score of each document and re-rank the documents. (In our experiments, we selected λ in the range [0, 1] with 0.1 increments.)

  3. Use the Anserini toolkit to perform pseudo-relevance feedback and expand the queries based on the top 10 documents of each query. To obtain a less biased expansion, we added a function called customised_RM3 to Anserini's SimpleSearcher class that expands each query based on the given initial query and the re-ranked list of documents, which is less biased than the original run. The changes are made in the SimpleSearcher and RM3ReRanker classes of the Anserini fork included in this repository. Finally, the searcher returns a list of documents retrieved with the expanded queries, which can be found here.
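Conceptually, step 3 performs RM3-style expansion over the re-ranked (debiased) feedback documents. The sketch below illustrates the idea with a simplified term-frequency weighting; the function name, parameters, and weighting scheme are illustrative assumptions, since the actual customised_RM3 implementation lives in the forked Anserini Java code.

```python
from collections import Counter

def expand_query(query_terms, top_docs, n_expansion=10, alpha=0.5):
    """Mix original query terms with the most frequent terms from the
    feedback documents; alpha is the weight kept on the original query."""
    feedback = Counter()
    for doc in top_docs:
        feedback.update(doc.lower().split())
    total = sum(feedback.values()) or 1
    # Uniform weight over the original query terms.
    weights = Counter({t: alpha / len(query_terms) for t in query_terms})
    # Add expansion terms weighted by their relative frequency
    # in the (already debiased) feedback documents.
    for term, tf in feedback.most_common(n_expansion):
        weights[term] += (1 - alpha) * tf / total
    return weights.most_common()
```

Because the feedback documents were re-ranked with the bias penalty first, the expansion terms are drawn from a less biased pool than in standard RM3.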

To evaluate the bias-aware expanded queries and calculate the level of gender bias in the retrieved documents of each run file:
  1. Use the following command inside the anserini directory to evaluate the retrieval performance of the bias-aware expanded queries (the paths contain spaces, so they must be quoted):
tools/eval/trec_eval.9.0.4/trec_eval -m map -m P.30 "results/queries/Original Queries/RB04/RB04_qrels.txt" "results/runs/Bias-aware PRF/RB04/retrieved_list_unbiased_lambda_0.5.txt"
  2. Use the runs_calculate_bias.py and retrieved_list_calculate_bias.py scripts to calculate the TF and Boolean variants of the ARaB metric introduced in "Do Neural Ranking Models Intensify Gender Bias?". In addition, the code for one other metric, LIWC, is included in the src/LIWC directory. The LIWC lexicon is proprietary, so it is not included in this repository; the lexicon data can be purchased from liwc.net.
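The TF-style document bias score used in the steps above can be sketched as a signed, length-normalised count of gendered terms. The tiny word lists below are illustrative placeholders, not the lexicons used in the paper, and the function name is an assumption for the example.

```python
# Illustrative gendered word lists (the real metrics use much larger lexicons).
FEMALE = {"she", "her", "woman", "women"}
MALE = {"he", "him", "man", "men"}

def tf_bias(text):
    """Signed TF-style bias: positive = male-leaning,
    negative = female-leaning, normalised by document length."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    male = sum(t in MALE for t in tokens)
    female = sum(t in FEMALE for t in tokens)
    return (male - female) / len(tokens)
```

The Boolean variant would replace the raw counts with 0/1 indicators per gendered term, and LIWC scores the document against its own proprietary category lexicon instead.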

