Giter VIP home page Giter VIP logo

gn_glove's Introduction

Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, Kai-Wei Chang

Abstract

Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that reflect social constructs. To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. Our approach aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Based on the proposed method, we generate a Gender-Neutral variant of GloVe (GN-GloVe). Quantitative and qualitative experiments demonstrate that GN-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.


Updates:

The coreference model in this paper is based on end2end 2017 version.

Please note the embeddings trained by us didn't do lowercase.

Please modify the debias.sh to your own datapath (line 18).


01/14/2019

We also train the GN-GloVe using 1-billion training data. It can be downloaded here. There are 142527 tokens in this embedding corpus.


In Table 1 and 3, it should be "Hard-GloVe". And on Page 5, it should be "OntoNotes".

Our pretrained word embeddings (using wikidump dataset; 322636 tokens in the embeddings) can be found here. And GloVe trained on this Wikidump dataset can be found here.

The seed words we use in our paper is under wordlist.

The SemBias dataset can be found under SemBias. The last 40 instances are the "subset" we used in our paper.

You can run the code using "debias.sh" (Please change the corresponding parameters).

License

See the LICENSE file.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.