On-Measuring-and-Mitigating-Biased-Inferences-of-Word-Embeddings (AAAI 2020)

This branch contains the code for ELMo-based debiasing. For BERT-based debiasing, check out the bert_debias branch.

To cite our work:

@misc{dev2019measuring,
  title={On Measuring and Mitigating Biased Inferences of Word Embeddings},
  author={Sunipa Dev and Tao Li and Jeff Phillips and Vivek Srikumar},
  year={2019},
  eprint={1908.09369},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Preprocessing

Place glove.840B.300d.txt at ./data/glove.840B.300d.txt and the SNLI data txt files in ./data/nli_bias/, then run preprocessing:

python3 preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 48 --dir ./data/nli_bias/ --output snli
python3 get_pretrain_vecs.py --glove ./data/glove.840B.300d.txt --dict ./data/nli_bias/snli.word.dict \
	--output ./data/snli.glove
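
Before launching preprocessing, it can help to confirm that the inputs are where the commands above expect them. The following is a minimal sketch, not part of the repository: the helper name missing_inputs is hypothetical, and since the README does not list the SNLI file names, it only checks that the directory contains at least one .txt file.

```python
from pathlib import Path


def missing_inputs(glove_path, snli_dir):
    """Return a list of problems; an empty list means the inputs look ready.

    Hypothetical helper: it only checks the two locations named in the README.
    """
    problems = []
    glove = Path(glove_path)
    if not glove.is_file():
        problems.append(f"GloVe file not found: {glove}")
    snli = Path(snli_dir)
    if not snli.is_dir():
        problems.append(f"SNLI directory not found: {snli}")
    elif not any(snli.glob("*.txt")):
        # The exact SNLI file names are not stated, so only check for any .txt file.
        problems.append(f"no .txt files in: {snli}")
    return problems


# Example: report anything missing for the paths used in the commands above.
for problem in missing_inputs("./data/glove.840B.300d.txt", "./data/nli_bias/"):
    print(problem)
```

An empty result only means the files are present; it says nothing about whether their contents are in the format preprocess.py expects.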

For unlabeled data (i.e., the bias-probing data), preprocess as follows:

DATA_NAME=occupation_gender_templates.stitch
python3 preprocess_unlabeled.py --glove ./data/glove.840B.300d.txt --batch_size 48 --dir ./data/nli_bias/ \
	--sent1 ${DATA_NAME}.sent1.txt --sent2 ${DATA_NAME}.sent2.txt \
	--vocab ./data/nli_bias/snli.word.dict --vocab_all ./data/nli_bias/snli.allword.dict \
	--output ${DATA_NAME}
python3 get_pretrain_vecs.py --glove ./data/glove.840B.300d.txt --dict ./data/nli_bias/${DATA_NAME}.word.dict \
	--output ./data/${DATA_NAME}.glove

Training

To train a baseline RNN model on SNLI, use the following:

MODEL=./models/baseline
python3 -u train.py --gpuid 0 --dir data/nli_bias/ \
	--train_data snli-train.hdf5 --val_data snli-val.hdf5 \
	--word_vecs snli.glove.hdf5 --dict snli.word.dict \
	--encoder rnn --use_elmo_post 0 \
	--save_file $MODEL | tee $MODEL.log.txt
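
The training command pipes its output through tee so that it appears on screen and is also saved to $MODEL.log.txt. Two details are easy to miss: tee cannot create the log if the ./models directory does not exist, and the exit status of a plain shell pipeline is that of tee, not of train.py. The sketch below shows one way to get the same mirroring from Python while keeping the exit code; run_and_log is a hypothetical helper, not something the repository provides.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd, copying its stdout to the console and to log_path.

    Hypothetical helper. Like `cmd | tee log`, only stdout is captured;
    stderr still goes straight to the console. Returns cmd's exit code.
    """
    log_path = Path(log_path)
    # Create the log's parent directory (e.g. ./models) if it is missing.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w") as log:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()


# Usage, mirroring the command above (left commented out so nothing is launched here):
# model = "./models/baseline"
# code = run_and_log(
#     ["python3", "-u", "train.py", "--gpuid", "0", "--dir", "data/nli_bias/",
#      "--train_data", "snli-train.hdf5", "--val_data", "snli-val.hdf5",
#      "--word_vecs", "snli.glove.hdf5", "--dict", "snli.word.dict",
#      "--encoder", "rnn", "--use_elmo_post", "0", "--save_file", model],
#     model + ".log.txt",
# )
```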

Evaluation

To evaluate a trained model on the SNLI test set, use:

MODEL=./models/baseline
python3 -u eval.py --gpuid 0 --dir data/nli_bias/ \
	--data snli-test.hdf5 \
	--word_vecs snli.glove.hdf5 --dict snli.word.dict \
	--encoder rnn --use_elmo_post 0 \
	--load_file $MODEL

To evaluate on unlabeled data, use:

DATA_NAME=occupation_gender_templates.stitch
MODEL=./models/baseline
python3 -u predict_unlabeled.py --gpuid 0 --dir data/nli_bias/ \
	--data ${DATA_NAME}.hdf5 --res ${DATA_NAME}.sent1.txt,${DATA_NAME}.sent2.txt,${DATA_NAME}.x_pair.txt \
	--word_vecs ${DATA_NAME}.glove.hdf5 --dict ${DATA_NAME}.word.dict \
	--encoder rnn --use_elmo_post 0 \
	--pred_output models/${DATA_NAME}.pred.txt \
	--load_file $MODEL
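
Every file name in the command above is derived from DATA_NAME, so switching to another probing set means changing one variable. As an illustration, the sketch below rebuilds the same argument list in Python; the function predict_unlabeled_args is hypothetical, and it uses only the flags and naming pattern that appear in the command above.

```python
def predict_unlabeled_args(data_name, model, data_dir="data/nli_bias/", gpuid=0):
    """Build the predict_unlabeled.py argument list shown in the README.

    Hypothetical helper: every file name follows the ${DATA_NAME}.* pattern
    from the command above; no other flags are assumed.
    """
    # --res takes three comma-separated files: sent1, sent2 and x_pair.
    res = ",".join(f"{data_name}.{part}.txt" for part in ("sent1", "sent2", "x_pair"))
    return [
        "python3", "-u", "predict_unlabeled.py",
        "--gpuid", str(gpuid), "--dir", data_dir,
        "--data", f"{data_name}.hdf5", "--res", res,
        "--word_vecs", f"{data_name}.glove.hdf5",
        "--dict", f"{data_name}.word.dict",
        "--encoder", "rnn", "--use_elmo_post", "0",
        "--pred_output", f"models/{data_name}.pred.txt",
        "--load_file", model,
    ]


args = predict_unlabeled_args("occupation_gender_templates.stitch", "./models/baseline")
print(" ".join(args))
```

The resulting list can be passed to subprocess.run(args) from the repository root, assuming the preprocessing steps for that DATA_NAME have already been run.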
