Attract-Repel

Nikola Mrkšić, University of Cambridge ([email protected])

This repository contains the code and data for the Attract-Repel method presented in Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints (Mrkšić et al., TACL 2017).

Available Word Vector Spaces

The bilingual word vector spaces for English + 51 languages (link to list of languages with respective language codes) are available here. The four-lingual EN-DE-IT-RU vector space which achieves state-of-the-art performance on Multilingual SimLex-999 can be downloaded here.

The five baseline bilingual vector spaces used in the paper (Tables 7 and 8) are available here.

Hebrew and Croatian SimLex-999 datasets are available here. The two datasets are also included in this repository.

The Italian and German Wizard-of-Oz (WOZ) dialogue state tracking datasets are available here.

The large morphologically specialised vectors (SGNS-LARGE) for English, Italian and German, presented in Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules (Vulić et al., ACL 2017), are available here.

Configuring the Tool

The Attract-Repel tool reads all the experiment config parameters from the experiment_parameters.cfg file in the root directory. An alternative config file can be provided as the first (and only) argument to attract-repel.py.

The config file specifies:

  • the location of the initial word vectors (distributional_vectors);
  • the sets of linguistic constraints to be injected into the vector space (antonyms and synonyms);
  • whether to print SimLex scores after each epoch (print_simlex) and whether to log SimLex/WS-353 scores to file (log_scores_over_time).

The config file also specifies the hyperparameters of the attract-repel procedure (set to their default values in config/experiment_parameters.cfg).
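As a rough illustration of how the options listed above might be read, the sketch below parses a config with Python's standard configparser module. The section names ([data], [experiment]), the file paths and the helper read_settings are assumptions made for this example only; consult config/experiment_parameters.cfg for the actual layout used by the tool.

```python
import configparser

# Hypothetical config contents: the section names and paths below are
# illustrative assumptions, not copied from the repository.
SAMPLE_CONFIG = """
[data]
distributional_vectors = vectors/initial.txt
antonyms = constraints/antonyms.txt
synonyms = constraints/synonyms.txt

[experiment]
print_simlex = True
log_scores_over_time = False
"""


def read_settings(config_text):
    """Parse config text and return the documented options as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        # Location of the initial word vectors.
        "distributional_vectors": parser.get("data", "distributional_vectors"),
        # Constraint files to be injected into the vector space.
        "antonyms": parser.get("data", "antonyms"),
        "synonyms": parser.get("data", "synonyms"),
        # Evaluation and logging switches, parsed as booleans.
        "print_simlex": parser.getboolean("experiment", "print_simlex"),
        "log_scores_over_time": parser.getboolean(
            "experiment", "log_scores_over_time"
        ),
    }


settings = read_settings(SAMPLE_CONFIG)
for name, value in settings.items():
    print(name, "=", value)
```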

The evaluation directory contains the SimLex-999 dataset (Hill et al., 2014), its multilingual variant (Leviant and Reichart, 2015), the SimVerb dataset (Gerz et al., 2016), and mono- and multilingual WS-353 datasets (Finkelstein et al., 2002; Leviant and Reichart, 2015). It also contains the Hebrew and Croatian SimLex-999 datasets collected in our work.

Running Experiments

python code/attract-repel.py config/experiment_parameters.cfg

Running the experiment loads the word vectors specified in the config file and fits them to the provided linguistic constraints. The procedure writes the updated word vectors to the results directory as results/final_vectors.txt (one word vector per line). An alternative write path can be specified in the config file (output_filepath).
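Once the run has finished, the output file can be inspected with a few lines of Python. The sketch below assumes each line holds a word followed by whitespace-separated float components; that layout is an assumption (the text above only states one word vector per line), and the helpers load_vectors and cosine are hypothetical names introduced for this example.

```python
import io
import math


def load_vectors(lines):
    """Read 'word v1 v2 ... vn' lines into a dict of word -> list of floats.

    The per-line layout is an assumption; adjust the parsing if the
    actual output file differs.
    """
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Tiny in-memory stand-in for results/final_vectors.txt (made-up values).
sample = io.StringIO("cat 1.0 0.0\ndog 1.0 0.0\ncar 0.0 1.0\n")
vectors = load_vectors(sample)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["car"]))
```

To read a real output file, pass an open file handle instead of the in-memory sample, for example `open("results/final_vectors.txt", encoding="utf-8")`.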

References

The TACL paper which introduces the Attract-Repel procedure, the cross-lingual vector spaces, Hebrew and Croatian SimLex-999 datasets and Italian and German Dialogue State Tracking corpora:

@Article{Mrksic:2017,
  author    = {Nikola Mrk\v{s}i\'c and Ivan Vuli\'{c} and Diarmuid {\'O S\'eaghdha} and Ira Leviant and Roi Reichart and Milica Ga\v{s}i\'c and Anna Korhonen and Steve Young},
  title     = {Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints},
  journal   = {Transactions of the Association for Computational Linguistics},
  volume    = {5},
  year      = {2017},
  pages     = {309--324},
}

The ACL paper which uses the Attract-Repel procedure and simple language-specific rules to induce high-quality vector spaces which model morphological phenomena:

@inproceedings{Vulic:2017,
  author    = {Vuli\'{c}, Ivan and Mrk\v{s}i\'{c}, Nikola and Reichart, Roi and {\'O S\'eaghdha}, Diarmuid and Young, Steve and Korhonen, Anna},
  title     = {Morph-fitting: {F}ine-Tuning Word Vector Spaces with Simple Language-Specific Rules},
  booktitle = {Proceedings of ACL},
  year      = {2017},
  pages     = {56--68},
}

If you are using PPDB 2.0 (Pavlick et al., 2015) or WordNet (Miller, 1995) constraints, please cite these papers. If you are using BabelNet constraints, please cite (Navigli and Ponzetto, 2012).
