Grounding-Embeddings

(This is just a rough summary, will be updated more thoroughly.)

Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning

Distributional word representation methods exploit word co-occurrences to build compact vector encodings of words. While these representations enjoy widespread use in modern natural language processing, it is unclear whether they accurately encode all necessary facets of conceptual meaning. In this paper, we evaluate how well these representations can predict perceptual and conceptual features of concrete concepts, drawing on two semantic norm datasets sourced from human participants. We find that several standard word representations fail to encode many salient perceptual features of concepts, and show that these deficits correlate with word-word similarity prediction errors. Our analyses provide motivation for grounded and embodied language learning approaches, which may help to remedy these deficits.

Link to paper.

Setup

You'll need NLTK, scikit-learn (sklearn), numpy, matplotlib, tqdm, and gensim.

Data

To automatically retrieve all of the required data except CSLB, run:

bash setup.sh

Directory

The main directory has subgraphs, where we keep our code, data outputs, and intermediates. The folders cslb, mcrae, glove, and word2vec are empty but should store the data mentioned above.

Our code is most compatible with Python 3.

The script feature_fit.py computes feature fit scores for words, as described in our paper. Note that the GloVe inputs are in word2vec format. These files are identical to the downloaded GloVe files except that each begins with an extra header line giving the number of vectors and their dimension. For example, glove.6B.300d.w2v.txt begins with the line "400000 300", and glove.840B.300d.w2v.txt begins with the line "2196017 300".
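As a rough illustration of the header described above, the following sketch prepends a "<count> <dimension>" line to a GloVe-style text file. It is not part of this repository: the function name add_word2vec_header and the toy file names are hypothetical, and the sketch assumes one space-separated vector per line and takes the dimension from the first line only.

```python
import os
import tempfile


def add_word2vec_header(glove_path, out_path):
    """Copy a GloVe text file to out_path, prepending a "<count> <dim>" line.

    Hypothetical helper, not taken from this repository. Assumes every line
    is "<token> <v1> ... <vN>" separated by single spaces.
    """
    n_vectors = 0
    dim = None
    # First pass: count the lines and read the dimension off the first line.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if dim is None:
                dim = len(line.rstrip("\n").split(" ")) - 1
            n_vectors += 1
    if dim is None:
        raise ValueError(f"{glove_path} is empty")
    # Second pass: write the header, then copy the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write(f"{n_vectors} {dim}\n")
        for line in src:
            dst.write(line)
    return n_vectors, dim


# Toy demonstration on a two-word, three-dimensional file.
with tempfile.TemporaryDirectory() as tmp:
    toy_in = os.path.join(tmp, "toy.txt")
    toy_out = os.path.join(tmp, "toy.w2v.txt")
    with open(toy_in, "w", encoding="utf-8") as f:
        f.write("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
    counts = add_word2vec_header(toy_in, toy_out)
    with open(toy_out, encoding="utf-8") as f:
        header = f.readline().strip()
```

A file written this way should be loadable with gensim's KeyedVectors.load_word2vec_format(path, binary=False), though whether feature_fit.py loads its inputs that way is an assumption here.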

Contributors

hans, lucy3
