Word Embedding Revisited: Explicit Matrix Factorization

Introduction

  1. We provide MATLAB code to train skip-gram with negative sampling (SGNS), the model originally implemented in the well-known NLP toolkit word2vec (https://code.google.com/p/word2vec/).
  2. Our objective function is equivalent to that of SGNS, but we optimize it with a different algorithm.
  3. You can treat our code (w2vsbd.m) as a simple MATLAB implementation of SGNS.
  4. Moreover, we provide a supervised explicit matrix factorization (w2vsbdsup.m) that improves performance through supervision.
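
For readers unfamiliar with the SGNS objective mentioned above, the following Python sketch evaluates it on a small co-occurrence table. It uses the standard count-based formulation, in which each observed pair contributes a positive term and the expected number of negative samples contributes a negative term. This is not the code in w2vsbd.m; the function names, the dictionary layout of `counts`, and the use of the plain unigram distribution for negative samples are assumptions made for illustration.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sgns_objective(counts, W, C, k=5):
    """Evaluate the SGNS objective (to be maximized) over all pairs.

    counts: dict mapping (word_index, context_index) -> co-occurrence count
    W, C:   word and context vectors, each a list of lists of floats
    k:      number of negative samples per observed pair (assumed value)
    """
    total = sum(counts.values())
    word_freq = [0.0] * len(W)
    ctx_freq = [0.0] * len(C)
    for (w, c), n in counts.items():
        word_freq[w] += n
        ctx_freq[c] += n

    objective = 0.0
    for w in range(len(W)):
        for c in range(len(C)):
            score = sum(a * b for a, b in zip(W[w], C[c]))
            observed = counts.get((w, c), 0)
            # Expected negative samples, assuming contexts are drawn
            # from the plain unigram distribution ctx_freq / total.
            expected_neg = k * word_freq[w] * ctx_freq[c] / total
            objective += observed * log_sigmoid(score)
            objective += expected_neg * log_sigmoid(-score)
    return objective
```

With all vectors set to zero, every score is 0 and the objective reduces to -(1 + k) * total * log 2, which is a convenient sanity check for any reimplementation.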

Usage

  1. Download the dataset 'enwik9.zip' from http://cs.fit.edu/~mmahoney/compression/textdata.html
  2. Decompress 'enwik9.zip' to get 'enwik9', then put it in the folder './data/'
  3. Run run_emf.m in MATLAB to obtain the results of the first experiment in our paper
  4. Run run_semf.m in MATLAB to obtain the results of the second experiment in our paper
  5. Refer to our paper (https://etali.github.io/papers/EMF-IJCAI2015.pdf) and the code for details
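
Step 2 above can be scripted. The sketch below is an optional Python helper, not part of this repository: it assumes 'enwik9.zip' has already been downloaded to the working directory and that the archive contains a single member named 'enwik9'. The function name `prepare_corpus` is hypothetical.

```python
import zipfile
from pathlib import Path


def prepare_corpus(zip_path, data_dir="./data", member="enwik9"):
    """Extract `member` from `zip_path` into `data_dir` if it is not there yet.

    Returns the path of the extracted corpus file.
    """
    data_dir = Path(data_dir)
    target = data_dir / member
    if target.exists():
        # Nothing to do: the corpus was extracted earlier.
        return target
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extract(member, path=str(data_dir))
    return target
```

Calling `prepare_corpus("enwik9.zip")` should leave the corpus at './data/enwik9', after which run_emf.m and run_semf.m can be run in MATLAB as described in steps 3 and 4.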

Our Experimental Environment (Requirements)

  1. Red Hat Enterprise Linux Server release 6.2 (64-bit)
  2. Perl 5.10
  3. GCC 4.4.5
  4. MATLAB R2011a

Details

We use word2vec (https://code.google.com/p/word2vec/) to generate the co-occurrence matrix, and our algorithm depends only on that matrix. Our algorithm is a batch-mode alternating minimization, which is not as scalable as the algorithm in word2vec; however, it performs as well as the skip-gram negative sampling (SGNS) implementation provided by word2vec. The word2vec.c code used in our project, in which we altered several snippets, is provided under the folder emf/word2vec/.
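
To make the notion of a co-occurrence matrix concrete, here is a minimal Python sketch that counts (word, context) pairs from a token sequence. It is only an illustration: the fixed symmetric window, the function name `cooccurrence_counts`, and the sparse dictionary representation are assumptions, and the modified word2vec.c under emf/word2vec/ may count pairs differently.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count (word, context) pairs within a fixed symmetric window.

    tokens: list of word strings in corpus order
    window: number of context positions considered on each side (assumed)
    Returns a Counter mapping (word, context) -> count, i.e. a sparse
    representation of the co-occurrence matrix.
    """
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                counts[(word, tokens[j])] += 1
    return counts
```

A batch algorithm such as the one described above would then operate on these counts alone, without revisiting the raw text.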

Authors

Yitan Li, Linli Xu, Fei Tian, Liang Jiang, Xiaowei Zhong, Enhong Chen

University of Science and Technology of China
