dist-based-model4bwe

PyTorch implementation of "A Distribution-based Model to Learn Bilingual Word Embeddings" (Cao et al., COLING 2016).

conda install pytorch -c pytorch -y
conda install tqdm
python main.py --src en:data/ukWaC/tokenized.mini.txt.xz --trg it:data/itWaC/tokenized.mini.txt.xz -o vectors.txt --batch-size 1024 --cuda -v
# CPU: python main.py --src en:data/ukWaC/tokenized.mini.txt.xz --trg it:data/itWaC/tokenized.mini.txt.xz -o vectors.txt --batch-size 1024 -v
2018-02-20 10:19:02,241/Corpus[INFO]: Read from data/ukWaC/tokenized.mini.txt.xz
2018-02-20 10:19:02,452/Corpus[INFO]: Done.
2018-02-20 10:19:02,453/Corpus[INFO]: Read from data/itWaC/tokenized.mini.txt.xz
2018-02-20 10:19:02,700/Corpus[INFO]: Done.
2018-02-20 10:19:12,779/MAIN[INFO]: window size: 2
2018-02-20 10:19:12,779/MAIN[INFO]: learning rate: 0.01
2018-02-20 10:19:12,779/MAIN[INFO]: batch size: 1024
256it [00:17, 14.96it/s]
[1] loss = 8331.1028 (8096.2456/234.8572), time = 17.11
2018-02-20 10:19:30,112/MAIN[INFO]: Save embeddings to vectors.txt
256it [00:16, 15.26it/s]
[2] loss = 8242.8091 (8047.9243/194.8848), time = 16.78
2018-02-20 10:19:49,061/MAIN[INFO]: Save embeddings to vectors.txt
256it [00:16, 15.23it/s]
[3] loss = 8174.0800 (7979.4080/194.6720), time = 16.82
2018-02-20 10:20:07,977/MAIN[INFO]: Save embeddings to vectors.txt
256it [00:16, 15.34it/s]
[4] loss = 8144.2930 (7949.8464/194.4466), time = 16.69
2018-02-20 10:20:26,840/MAIN[INFO]: Save embeddings to vectors.txt
256it [00:16, 15.46it/s]
[5] loss = 8101.0505 (7906.8492/194.2012), time = 16.56
2018-02-20 10:20:45,467/MAIN[INFO]: Save embeddings to vectors.txt
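Each epoch line in the log above reports a total loss followed by two components in parentheses, then the elapsed time. The following is a minimal sketch for extracting those numbers from a captured log, assuming the line format shown above. The helper name `parse_epoch_losses` is hypothetical and not part of this repository, and the log itself does not say what the two components represent.

```python
import re

# Matches lines such as "[1] loss = 8331.1028 (8096.2456/234.8572), time = 17.11"
EPOCH_LINE = re.compile(
    r"\[(\d+)\] loss = ([\d.]+) \(([\d.]+)/([\d.]+)\), time = ([\d.]+)"
)


def parse_epoch_losses(log_text):
    """Return a list of (epoch, total, part_a, part_b, seconds) tuples.

    Hypothetical helper: lines that do not match the epoch format
    (tqdm progress bars, logger messages) are skipped.
    """
    rows = []
    for line in log_text.splitlines():
        match = EPOCH_LINE.search(line)
        if match:
            epoch, total, part_a, part_b, seconds = match.groups()
            rows.append(
                (int(epoch), float(total), float(part_a), float(part_b), float(seconds))
            )
    return rows


log = """256it [00:17, 14.96it/s]
[1] loss = 8331.1028 (8096.2456/234.8572), time = 17.11
256it [00:16, 15.26it/s]
[2] loss = 8242.8091 (8047.9243/194.8848), time = 16.78
"""

rows = parse_epoch_losses(log)
for epoch, total, part_a, part_b, seconds in rows:
    # Print the reported total next to the sum of the two components.
    print(epoch, total, round(part_a + part_b, 4), seconds)
```

In the run shown above, the reported total agrees with the sum of the two components up to rounding, so tracking either component separately may help show which part of the objective is still decreasing.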
from gensim.models.keyedvectors import KeyedVectors
model = KeyedVectors.load_word2vec_format('vectors.txt', binary=False)
model.most_similar('en:cat')
> [('en:folk-tales', 0.5345577597618103), ('en:inject', 0.5104585886001587), ('it:mariotti', 0.4928727149963379), ('it:ebbero', 0.48998019099235535), ('it:funzionano', 0.4896196126937866), ('it:trash', 0.47888505458831787), ('en:feudal', 0.47887367010116577), ('en:creeps', 0.47296014428138733), ('it:abbronzati', 0.4703264832496643), ('en:staging', 0.4651801586151123)]

Umm...
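Both languages share one vector space, and every word carries a language prefix (`en:` or `it:`), so the neighbour list mixes English and Italian entries. The following is a minimal sketch for keeping only the neighbours from one language, assuming the `lang:word` naming seen in the output above. `filter_by_language` is a hypothetical helper, not part of gensim or this repository; the sample list is copied from the first entries of the output above.

```python
def filter_by_language(neighbours, lang):
    """Keep only (word, score) pairs whose word starts with the '<lang>:' prefix."""
    prefix = lang + ":"
    return [(word, score) for word, score in neighbours if word.startswith(prefix)]


# First four neighbours of 'en:cat' from the output above.
neighbours = [
    ("en:folk-tales", 0.5345577597618103),
    ("en:inject", 0.5104585886001587),
    ("it:mariotti", 0.4928727149963379),
    ("it:ebbero", 0.48998019099235535),
]

italian = filter_by_language(neighbours, "it")
print(italian)
```

With a loaded model, the same helper could be applied to a longer candidate list, for example `filter_by_language(model.most_similar('en:cat', topn=50), 'it')`, so that enough entries from the target language remain after filtering.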

Contributors

notani
