pytorch-skipgram

A PyTorch implementation of the word2vec Skip-gram with Negative Sampling (SGNS) algorithm, based on the paper Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013).
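For one (center, context) pair, the paper's SGNS objective maximizes log σ(v'_context · v_center) plus one term log σ(−v'_neg · v_center) for each of k negative words. The negative words are drawn from the noise distribution P_n(w) ∝ U(w)^{3/4}, which is the unigram frequency raised to the 3/4 power. The plain-Python sketch below computes the negated objective as a loss, so it can be checked without PyTorch. It illustrates the technique from the paper and is not the repository's implementation. All function names are hypothetical.

```python
import math
import random

# Hypothetical helpers illustrating SGNS; not the repository's code.


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to `power` (3/4 in the paper), normalized to sum to 1."""
    weights = [c ** power for c in counts]
    total = sum(weights)
    return [w / total for w in weights]


def sample_negatives(probs, k, rng=random):
    """Draw k word indices (with replacement) from the noise distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=k)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(v_center, v_context, neg_vectors):
    """Negated SGNS objective for one (center, context) pair.

    v_center:    input vector of the center word
    v_context:   output vector of the observed context word
    neg_vectors: output vectors of the k sampled negative words
    """
    loss = -log_sigmoid(dot(v_context, v_center))
    for v_neg in neg_vectors:
        loss -= log_sigmoid(-dot(v_neg, v_center))
    return loss
```

A PyTorch version would typically compute the same loss in batches, using two embedding tables (input and output vectors) and torch.nn.functional.logsigmoid.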

Dependencies

  • Python 3.6
  • PyTorch 0.4+

Usage

Run main.py.

Initialize the dataset and model.

```python
# initialize the dataset and the model
word2vec = Word2Vec(data_path='text8',
                    vocabulary_size=50000,
                    embedding_size=300)

# the whole corpus encoded as word indices
print(word2vec.data[:10])

# word_count is a list of [word, count] pairs, e.g. [['word', word_count], ...];
# a word's position in this list is its index
print(word2vec.word_count[:10])

# index to word
print(word2vec.index2word[34])

# word to index
print(word2vec.word2index['hello'])
```
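The sketch below shows one plausible way to build the structures printed above. It assumes the common convention from TensorFlow's word2vec_basic tutorial: keep the vocabulary_size − 1 most frequent words and map every other word to a shared 'UNK' entry at index 0. The repository's actual preprocessing may differ, and build_dataset is a hypothetical helper.

```python
from collections import Counter


def build_dataset(words, vocabulary_size):
    """Hypothetical helper: encode a token list as word indices.

    Assumption: the vocabulary_size - 1 most frequent words are kept and all
    other words share the 'UNK' entry at index 0.
    """
    # word_count: [[word, count], ...]; list position == word index
    word_count = [['UNK', 0]]
    word_count.extend([w, c] for w, c in Counter(words).most_common(vocabulary_size - 1))
    word2index = {w: i for i, (w, _) in enumerate(word_count)}
    index2word = [w for w, _ in word_count]

    data = []
    for w in words:
        idx = word2index.get(w, 0)  # unknown words map to UNK (index 0)
        if idx == 0:
            word_count[0][1] += 1   # count how many tokens became UNK
        data.append(idx)
    return data, word_count, word2index, index2word
```

Counter.most_common breaks ties by first occurrence, so index assignment is deterministic for a given corpus.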

Train the model and get the word vectors.

```python
# train the model
word2vec.train(train_steps=200000,
               skip_window=1,
               num_skips=2,
               num_neg=20,
               output_dir='out/run-1')

# save the vectors to a text file
word2vec.save_vector_txt(path_dir='out/run-1')

# get the list of vectors, indexed by word index
vector = word2vec.get_list_vector()
print(vector[123])
print(vector[word2vec.word2index['hello']])

# get the top-k most similar words
sim_list = word2vec.most_similar('one', top_k=8)
print(sim_list)

# load a pre-trained model checkpoint
word2vec.load_model('out/run-1/model_step200000.pt')
```
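The sketch below shows how skip_window and num_skips are assumed to shape the training pairs. It follows the TensorFlow word2vec tutorial conventions that these parameter names appear to come from. skip_window is the number of words taken on each side of the center word, and num_skips is how many (center, context) pairs are drawn per center word. With the settings above (skip_window=1, num_skips=2), every center word is paired with both of its immediate neighbours. skipgram_pairs is a hypothetical helper and is not part of the repository.

```python
import random


def skipgram_pairs(data, skip_window, num_skips, rng=random):
    """Hypothetical helper: build (center, context) index pairs.

    For every position with a full window, sample num_skips distinct
    context positions from the skip_window words on either side.
    """
    if num_skips > 2 * skip_window:
        raise ValueError("num_skips cannot exceed 2 * skip_window")
    pairs = []
    for i in range(skip_window, len(data) - skip_window):
        # positions inside the window, excluding the center word itself
        context = [j for j in range(i - skip_window, i + skip_window + 1) if j != i]
        for j in rng.sample(context, num_skips):
            pairs.append((data[i], data[j]))
    return pairs
```

num_neg would then be the number of negative words sampled for each pair (k in the paper), though this reading of the parameter is an assumption.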

Evaluate

The evaluation follows the eval-word-vectors repository, for example:

```shell
python eval/wordsim.py vector.txt eval/data/EN-MTurk-287.txt
python eval/wordsim.py vector.txt eval/data/EN-MC-30.txt
```
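As far as we understand eval-word-vectors, wordsim.py scores a vector file in three steps. It computes the cosine similarity of each word pair in the dataset, skips pairs that contain out-of-vocabulary words, and reports Spearman's rank correlation with the human ratings. The sketch below reproduces that computation in plain Python on an in-memory dict of vectors. Parsing vector.txt is omitted because its exact format is not documented here. wordsim_score is a hypothetical helper.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def wordsim_score(vectors, triples):
    """Hypothetical helper. vectors: dict word -> list of floats;
    triples: (word1, word2, human_score). Returns (rho, pairs_used)."""
    predicted, gold = [], []
    for w1, w2, score in triples:
        if w1 in vectors and w2 in vectors:  # skip out-of-vocabulary pairs
            predicted.append(cosine(vectors[w1], vectors[w2]))
            gold.append(score)
    return spearman(predicted, gold), len(gold)
```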

Contributors

  • blackredscarf
