poincare_glove's People

Contributors

alex-tifrea


poincare_glove's Issues

Question about "Cartesian product of p balls"

Dear Alex,

I am following your work on hyperbolic embeddings.
In the paper, I got lost when I reached this concept:

[figure from the paper omitted]

My questions are:
1) What does (D^n)^p mean?
2) How is that embedding obtained? For example, for the 50x2D configuration, do we have to train 50 independent 2D Poincare embeddings and then combine them?
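Regarding 1), a sketch of how the product metric on (D^n)^p is usually defined: a point lives in p independent n-dimensional Poincare balls, and the overall distance is the l2 norm of the p per-ball geodesic distances, so the balls are trained jointly rather than as separate runs. The code below is an illustrative sketch under that assumption, not the repository's implementation:

```python
import numpy as np

def poincare_dist(u, v):
    # Geodesic distance between two points inside a single Poincare ball D^n.
    sq = np.sum((u - v) ** 2)
    alpha = 1 - np.sum(u ** 2)
    beta = 1 - np.sum(v ** 2)
    return np.arccosh(1 + 2 * sq / (alpha * beta))

def product_dist(u, v, p=50, n=2):
    # Distance in the Cartesian product (D^n)^p: interpret a (p*n)-dim vector
    # as p independent n-dim ball coordinates and combine per-ball distances
    # with the l2 norm (the standard product metric).
    u = u.reshape(p, n)
    v = v.reshape(p, n)
    per_ball = [poincare_dist(u[i], v[i]) for i in range(p)]
    return np.sqrt(np.sum(np.square(per_ball)))
```

With a single nonzero component the product distance reduces to the ordinary 2D Poincare distance, which is a quick sanity check of the construction.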

question about the comment in glove.py

Hi, I have a question about a comment in glove.py.
Line 265 of glove.py says: "vocab is indexed from 0; for co-occ we use 1-based indexing".
What does this mean?
Thanks.
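As a guess at what the comment means: the vocabulary maps words to 0-based ids, while the records in GloVe's cooccurrence.bin refer to words with 1-based ids, so an offset of one must be applied when translating between the two. A hypothetical illustration (the variable names are not from the codebase):

```python
# 0-based vocabulary, as built from the vocab file.
vocab = {"the": 0, "of": 1, "and": 2}

# A co-occurrence record as read from cooccurrence.bin: (word1, word2, count),
# where word ids are 1-based.
cooc_record = (1, 3, 42.0)

w1, w2, count = cooc_record
id2word = {i: w for w, i in vocab.items()}
# Subtract 1 to map the 1-based co-occurrence ids back to 0-based vocab ids.
pair = (id2word[w1 - 1], id2word[w2 - 1])
```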

Missing pretrained model 50x2D dist-sq with init trick


Hi Alex,

Thanks a lot for your open-source resources! I found that the model that achieved the best performance in the hypernymy evaluation, 50x2D Poincare GloVe with h(x) = x^2 and the init trick (190k), is missing from the released pre-trained vectors. Would it be possible for you to make the trained vectors for this configuration public? Thanks a lot!

A Question about the Norm of the embedding


Dear Alex,

I am curious why the norms of the word embeddings are close to 0.5 instead of 1. Most of the volume of the Poincare ball lies close to the boundary, so I would expect the norms to be close to 1. I am a little confused.

Regards,
Shaoteng
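One way to put the observed norms in perspective: the Poincare distance from the origin to a point of Euclidean norm r is 2 artanh(r), so a norm of 0.5 already corresponds to a hyperbolic distance of about 1.1 from the origin, while a norm of 0.999 would correspond to about 7.6. Where the trained norms end up depends on the objective, not only on where the ball's volume concentrates. A small sketch of the conversion:

```python
import math

def dist_from_origin(norm):
    # Poincare distance from the origin to a point whose Euclidean norm
    # is `norm` (must satisfy 0 <= norm < 1).
    return 2 * math.atanh(norm)
```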

TypeError: C variable gensim.models.word2vec_inner.scopy has wrong signature

I tried to run the training, but it did not start because of the following error:

Running python3 glove_code/scripts/glove_main.py --train --root=.. --coocc_file=data/demo/cooccurrence.bin --vocab_file=data/demo/vocab.txt --size=100 --workers=20 --chunksize=1000 --epochs=50 --lr=0.05 --restrict_vocab=200000 --coocc_func log --bias --train_log_filename ../train_logs/tmp_a283c
Redirecting output to ../train_logs/train_glove_ep50_size100_lr0.05_vocab200000_vanilla_OPTadagrad_COOCCFUNClog_bias
Traceback (most recent call last):
  File "glove_code/scripts/glove_main.py", line 4, in <module>
    import gensim
  File "/home/usr/Work/poincare_glove/poincare_glove/gensim/__init__.py", line 5, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils  # noqa:F401
  File "/home/usr/Work/poincare_glove/poincare_glove/gensim/models/__init__.py", line 15, in <module>
    from .doc2vec import Doc2Vec  # noqa:F401
  File "/home/usr/Work/poincare_glove/poincare_glove/gensim/models/doc2vec.py", line 78, in <module>
    from gensim.models.doc2vec_inner import train_document_dbow, train_document_dm, train_document_dm_concat
  File "gensim/models/doc2vec_inner.pyx", line 1, in init gensim.models.doc2vec_inner
    #!/usr/bin/env cython
TypeError: C variable gensim.models.word2vec_inner.scopy has wrong signature (expected __pyx_t_6gensim_6models_14word2vec_inner_scopy_ptr, got __pyx_t_14poincare_glove_6gensim_6models_14word2vec_inner_scopy_ptr)
Evaluating model from ../models/glove/glove_baseline/glove_ep50_size100_lr0.05_vocab200000_vanilla_OPTadagrad_COOCCFUNClog_bias
(evaluation fails with the same traceback and TypeError as above)

When I compiled the Cython files, there were also some warnings:

/home/usr/anaconda3/envs/glove/lib/python3.6/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /home/usr/Work/poincare_glove/poincare_glove/glove_code/src/glove_inner.pxd
tree = Parsing.p_module(s, pxd, full_module_name)
warning: src/glove_inner.pyx:582:16: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.

Could you please let me know how to resolve this issue?
Thanks!

Segmentation Fault (core dumped)

During Poincare embeddings training, the program often exits with a segmentation fault (core dumped) message.

[1073746064, 1073746480] Cannot compute Poincare distance between points. Points need to be inside the unit ball, but their squared norm is -nan and 0.999980.
zsh: segmentation fault (core dumped)  python3 glove_code/scripts/glove_main_es.py --train --root=..   --size=100

I'm using Ubuntu 18.04 with Python 3.6.5.

Could you please let me know how to resolve this issue? Thanks!
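The "-nan" squared norm in the log suggests that a point escaped the unit ball (or became NaN) during optimization. A common safeguard in Riemannian SGD implementations is to project updated points back inside the ball after each step; the sketch below is such a safeguard in general, not necessarily what this codebase does (the function name and epsilon are illustrative):

```python
import numpy as np

EPS = 1e-5  # margin kept between points and the ball's boundary

def project_to_ball(x, eps=EPS):
    # Clip a point back into the open unit ball so Poincare distances
    # stay well-defined; reset the point if its norm is NaN/inf.
    norm = np.linalg.norm(x)
    if not np.isfinite(norm):
        return np.zeros_like(x)
    if norm >= 1.0 - eps:
        return x * (1.0 - eps) / norm
    return x
```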

About Hypernymy score

Hi, I am following your paper Poincare GloVe: Hyperbolic Word Embeddings. In the paper, you describe experiments on hypernymy detection and the hypernymy score, but I am not able to find the corresponding code for this part. Did you upload it? Thanks a lot!
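For reference, a widely used entailment score for Poincare embeddings is the one from Nickel & Kiela (2017), which combines distance with the norm difference (more generic words lie closer to the origin). The sketch below implements that heuristic; it is not necessarily the exact score used in the Poincare GloVe paper:

```python
import numpy as np

def poincare_dist(u, v):
    # Geodesic distance in the Poincare ball.
    sq = np.sum((u - v) ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))))

def is_a_score(u, v, alpha=1000.0):
    # Score for "u is-a v" (Nickel & Kiela 2017): large when v is more
    # generic than u (smaller norm) and the two are close in the ball.
    return -(1 + alpha * (np.linalg.norm(v) - np.linalg.norm(u))) * poincare_dist(u, v)
```

The asymmetry is the point: swapping the arguments flips which word is treated as the hypernym.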

ModuleNotFoundError: No module named 'glove_code.src.glove_inner'

I tried to run the training for Poincare GloVe vectors, but the training did not start because of the following error:


./run_glove.sh --train --root /poincare --coocc_file /glove/cooccurrence_full_vocab_20190221.shuf.bin --vocab_file /glove/vocab_orig.txt --epochs 10 --workers 16 --restrict_vocab 6000000 --lr 0.01 --poincare 1 --bias --size 100 --dist_func cosh-dist-sq --no_eval

Training the model and preparing to save it to ../models/glove/geometric_emb/glove_ep10_size100_lr0.01_vocab6000000_poincare_OPTradagrad_COOCCFUNClog_DISTFUNCcosh-dist-sq_bias
Running python3 glove_code/scripts/glove_main.py --train --root=.. --coocc_file=/glove/cooccurrence_full_vocab_20190221.shuf.bin --vocab_file=/glove/vocab_orig.txt --size=100 --workers=16 --chunksize= --epochs=10 --lr=0.01 --restrict_vocab=6000000 --poincare=1 --coocc_func log --dist_func cosh-dist-sq --bias --train_log_filename ../train_logs/tmp_35607
Redirecting output to ../train_logs/train_glove_ep10_size100_lr0.01_vocab6000000_poincare_OPTradagrad_COOCCFUNClog_DISTFUNCcosh-dist-sq_bias
/usr/local/lib/python3.6/dist-packages/smart_open/ssh.py:34: UserWarning: paramiko missing, opening SSH/SCP/SFTP paths will be disabled. pip install paramiko to suppress
warnings.warn('paramiko missing, opening SSH/SCP/SFTP paths will be disabled. pip install paramiko to suppress')
Traceback (most recent call last):
  File "glove_code/scripts/glove_main.py", line 6, in <module>
    from glove_code.src.glove import Glove, NNConfig, InitializationConfig
  File "/poincare/poincare_glove/glove_code/src/glove.py", line 5, in <module>
    from glove_code.src.glove_inner import read_all
ModuleNotFoundError: No module named 'glove_code.src.glove_inner'


Before running this I installed the required dependencies (pip3 install Cython nltk annoy) and ran the setup script (python3 setup.py develop).

I additionally tried to import the Cython module glove_code.src.glove_inner directly in the python3 REPL, and then I got the following error:


Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport
>>> pyximport.install()
(None, <pyximport.pyximport.PyxImporter object at 0x7f4f09454eb8>)
>>> from glove_code.src.glove_inner import read_all
/usr/local/lib/python3.6/dist-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /poincare/poincare_glove/glove_code/src/glove_inner.pyx
tree = Parsing.p_module(s, pxd, full_module_name)

Error compiling Cython file:
...
cdef extern from "vfast_invsqrt.h":
double fast_inv_sqrt(float number) nogil

REAL = np.float32

cdef scopy_ptr scopy=<scopy_ptr>PyCObject_AsVoidPtr(fblas.scopy._cpointer) # y = x
^

glove_code/src/glove_inner.pyx:28:5: 'scopy_ptr' is not a type identifier

Error compiling Cython file:
...
double fast_inv_sqrt(float number) nogil

REAL = np.float32

cdef scopy_ptr scopy=<scopy_ptr>PyCObject_AsVoidPtr(fblas.scopy._cpointer) # y = x
cdef saxpy_ptr saxpy=<saxpy_ptr>PyCObject_AsVoidPtr(fblas.saxpy._cpointer) # y += alpha * x
^

glove_code/src/glove_inner.pyx:29:5: 'saxpy_ptr' is not a type identifier

Error compiling Cython file:
...

Traceback (most recent call last):
File "/usr/lib/python3.6/distutils/unixccompiler.py", line 118, in _compile
extra_postargs)
File "/usr/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/usr/lib/python3.6/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/usr/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'x86_64-linux-gnu-gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 215, in load_module
inplace=build_inplace, language_level=language_level)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 191, in build_module
reload_support=pyxargs.reload_support)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyxbuild.py", line 102, in pyx_to_dll
dist.run_commands()
File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.6/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/usr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/usr/local/lib/python3.6/dist-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "/usr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/usr/lib/python3.6/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/usr/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 462, in load_module
language_level=self.language_level)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 231, in load_module
raise exc.with_traceback(tb)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 215, in load_module
inplace=build_inplace, language_level=language_level)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 191, in build_module
reload_support=pyxargs.reload_support)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyxbuild.py", line 102, in pyx_to_dll
dist.run_commands()
File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.6/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/usr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/usr/local/lib/python3.6/dist-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "/usr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/usr/lib/python3.6/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/usr/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
ImportError: Building module glove_code.src.glove_inner failed: ["distutils.errors.CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1\n"]


There were many similar errors during the import.

Did I perhaps miss an installation step?

Time taken to train word embeddings

Hi @alex-tifrea ,

I'm training word embeddings on the RCV1 corpus. I generated the vocab file (7MB) and co-occurrence file (7.5GB) for the RCV1 dataset using the GloVe code. I run the following command to train:
./run_glove.sh --train --root .. -coocc_file ../poincare_glove2/GloVe/cooccurrence.bin --vocab_file ../poincare_glove2/GloVe/vocab.txt --epochs 50 --workers 20 --restrict_vocab 200000 --lr 0.01 --poincare 1 --bias --size 100 --dist_func cosh-dist-sq

The script uses only a single CPU core and does not write anything to the logs/* files.

Could anyone provide pointers on what I am doing wrong, or on whether something else is the problem?

Thanks

A problem from an interesting evaluation

Hi, I tried your pretrained vectors in my experiment. I believe this kind of embedding is suitable for datasets with hierarchical structure and can preserve the hierarchy information. To evaluate this, I designed an experiment, but got a confusing result.

In the experiment, I chose a small dataset of 1000 words and found their parent words in WordNet. Then I calculated the distance between each child word and its parent word, and compared it to the distances between the child word and the other words in WordNet. Surprisingly, I found that the distance between a child word and its parent word is relatively large, whereas I believed it should be very small. I ranked the child-parent distance among the distances to all words, and found it only lands in the middle.

Do you know the reason for this result?

Thanks a lot!
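One way to make the described ranking concrete: compute the 1-based rank of the true parent among candidate words ordered by Poincare distance to the child. Note that raw distance ignores the norm (generality) component of the embedding, which hypernymy scores typically add on top, so a middling rank under pure distance is not by itself surprising. This is an illustrative sketch, not the original evaluation code:

```python
import numpy as np

def poincare_dist(u, v):
    # Geodesic distance in the Poincare ball.
    sq = np.sum((u - v) ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))))

def parent_rank(child, parent, candidates):
    # 1-based rank of the true parent among candidate vectors,
    # ordered by Poincare distance to the child (rank 1 = closest).
    d_parent = poincare_dist(child, parent)
    dists = [poincare_dist(child, c) for c in candidates]
    return 1 + sum(d < d_parent for d in dists)
```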
