alex-tifrea / poincare_glove
Implementation of the "Poincare Glove: Hyperbolic word embeddings" paper
License: GNU Lesser General Public License v2.1
I am encountering this problem when running ./run_glove.h....
Hi, I have a question about the comment in glove.py.
The comment on line 265 of glove.py says: "vocab is indexed from 0; for co-occ we use 1-based indexing".
What does it mean?
Thanks.
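One possible reading of that comment, sketched under an assumption: the co-occurrence file follows the record layout written by the standard GloVe cooccur tool (two 32-bit word IDs that start at 1, followed by a 64-bit count), while the in-memory vocab is a 0-based list. The names RECORD and read_cooccurrences are hypothetical and do not come from glove.py.

```python
import struct

# Assumed record layout: int32 word1, int32 word2, float64 count (16 bytes).
RECORD = struct.Struct("<iid")


def read_cooccurrences(raw: bytes):
    """Yield (row, col, count) with 0-based indices from 1-based records."""
    for word1, word2, count in RECORD.iter_unpack(raw):
        # Word IDs in the file start at 1; list positions in the vocab start at 0.
        yield word1 - 1, word2 - 1, count


# Two hand-made records standing in for the contents of a cooccurrence.bin file.
raw = RECORD.pack(1, 2, 3.5) + RECORD.pack(2, 1, 3.5)
for row, col, count in read_cooccurrences(raw):
    print(row, col, count)
```

Under this reading, the word with ID k in the co-occurrence file is the entry at position k - 1 in the vocab.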
TODO: change commented example
Hi Alex,
Thanks a lot for your open-source resources! The model that obtained the best performance in the hypernymy evaluation, 50x2D Poincare GloVe, h(x) = x^2, init trick (190k), is missing from the released pre-trained vectors. Would it be possible for you to make the trained vectors for this configuration public? Thanks a lot!
I get the error "cannot import name 'WordEmbCheckpointSaver' from 'gensim.models.callbacks'". Which gensim version are you using? The latest gensim does not seem to have 'WordEmbCheckpointSaver' in callbacks.py.
TODO: change commented example
Dear Alex,
I am curious why the norms of the word embeddings are close to 0.5 instead of 1. Most of the volume of the Poincare ball lies near its boundary, so I would expect the norms to be close to 1. I am a little confused.
Regards,
Shaoteng
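For context on how a Euclidean norm relates to hyperbolic distance, here is a minimal sketch assuming the unit Poincare ball with curvature -1, where the distance from the origin to a point with Euclidean norm r is 2 * artanh(r). The function name hyperbolic_radius is hypothetical; this only illustrates the geometry and says nothing about why the trained vectors end up at a particular norm.

```python
import math


def hyperbolic_radius(norm: float) -> float:
    """Poincare distance from the origin to a point with the given Euclidean norm."""
    if not 0.0 <= norm < 1.0:
        raise ValueError("norm must lie in [0, 1)")
    return 2.0 * math.atanh(norm)


# The hyperbolic radius grows without bound as the Euclidean norm approaches 1.
for norm in (0.1, 0.5, 0.9, 0.99):
    print(f"norm={norm:<5} hyperbolic radius={hyperbolic_radius(norm):.3f}")
```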
I tried to run the training but the training did not start because of the following error:
Running python3 glove_code/scripts/glove_main.py --train --root=.. --coocc_file=data/demo/cooccurrence.bin --vocab_file=data/demo/vocab.txt --size=100 --workers=20 --chunksize=1000 --epochs=50 --lr=0.05 --restrict_vocab=200000 --coocc_func log --bias --train_log_filename ../train_logs/tmp_a283c
Redirecting output to ../train_logs/train_glove_ep50_size100_lr0.05_vocab200000_vanilla_OPTadagrad_COOCCFUNClog_bias
Traceback (most recent call last):
File "glove_code/scripts/glove_main.py", line 4, in <module>
import gensim
File "/home/usr/Work/poincare_glove/poincare_glove/gensim/__init__.py", line 5, in <module>
from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils # noqa:F401
File "/home/usr/Work/poincare_glove/poincare_glove/gensim/models/__init__.py", line 15, in <module>
from .doc2vec import Doc2Vec # noqa:F401
File "/home/usr/Work/poincare_glove/poincare_glove/gensim/models/doc2vec.py", line 78, in <module>
from gensim.models.doc2vec_inner import train_document_dbow, train_document_dm, train_document_dm_concat
File "gensim/models/doc2vec_inner.pyx", line 1, in init gensim.models.doc2vec_inner
#!/usr/bin/env cython
TypeError: C variable gensim.models.word2vec_inner.scopy has wrong signature (expected __pyx_t_6gensim_6models_14word2vec_inner_scopy_ptr, got __pyx_t_14poincare_glove_6gensim_6models_14word2vec_inner_scopy_ptr)
Evaluating model from ../models/glove/glove_baseline/glove_ep50_size100_lr0.05_vocab200000_vanilla_OPTadagrad_COOCCFUNClog_bias
Traceback (most recent call last):
File "glove_code/scripts/glove_main.py", line 4, in <module>
import gensim
File "/home/usr/Work/poincare_glove/poincare_glove/gensim/__init__.py", line 5, in <module>
from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils # noqa:F401
File "/home/usr/Work/poincare_glove/poincare_glove/gensim/models/__init__.py", line 15, in <module>
from .doc2vec import Doc2Vec # noqa:F401
File "/home/usr/Work/poincare_glove/poincare_glove/gensim/models/doc2vec.py", line 78, in <module>
from gensim.models.doc2vec_inner import train_document_dbow, train_document_dm, train_document_dm_concat
File "gensim/models/doc2vec_inner.pyx", line 1, in init gensim.models.doc2vec_inner
#!/usr/bin/env cython
TypeError: C variable gensim.models.word2vec_inner.scopy has wrong signature (expected __pyx_t_6gensim_6models_14word2vec_inner_scopy_ptr, got __pyx_t_14poincare_glove_6gensim_6models_14word2vec_inner_scopy_ptr)
When I compiled the Cython files, there were also some errors:
/home/usr/anaconda3/envs/glove/lib/python3.6/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /home/usr/Work/poincare_glove/poincare_glove/glove_code/src/glove_inner.pxd
tree = Parsing.p_module(s, pxd, full_module_name)
warning: src/glove_inner.pyx:582:16: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
Could you please let me know how to resolve this issue?
Thanks!
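The expected and actual type names in the TypeError differ only by a poincare_glove prefix, which may mean the compiled extensions were built under a different package path than the one Python imports them from. As a first diagnostic, and only a sketch, the snippet below reports which file a top-level module name resolves to without importing it. The helper name module_origin is hypothetical.

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Shows whether "gensim" points at the copy bundled with the repository
    # or at a separately installed package.
    print("gensim resolves to:", module_origin("gensim"))
```

If the printed path is not the gensim directory inside the poincare_glove checkout, the two copies are probably being mixed.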
During Poincare embeddings training, the program often exits with a segmentation fault (core dumped) message.
[1073746064, 1073746480] Cannot compute Poincare distance between points. Points need to be inside the unit ball, but their squared norm is -nan and 0.999980.
zsh: segmentation fault (core dumped) python3 glove_code/scripts/glove_main_es.py --train --root=.. --size=100
I'm using Ubuntu 18.04 with Python 3.6.5.
Could you please let me know how to resolve this issue? Thanks!
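The log message reports a NaN squared norm and a squared norm very close to 1, both of which make the Poincare distance undefined or numerically unstable. The sketch below is not the repository's code. It shows the distance on the unit ball with an explicit validity check, plus the common technique of rescaling points back inside the ball. The function names and the eps value are assumptions chosen for illustration.

```python
import math


def project_into_ball(x, eps=1e-5):
    """Rescale x so its Euclidean norm is at most 1 - eps."""
    norm = math.sqrt(sum(a * a for a in x))
    max_norm = 1.0 - eps
    if norm <= max_norm:
        return list(x)
    # Note: a NaN input stays NaN here; projection cannot repair it.
    return [a * max_norm / norm for a in x]


def poincare_distance(u, v):
    """Distance on the unit Poincare ball; raises if a point is not strictly inside."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    for sq in (sq_u, sq_v):
        if math.isnan(sq) or sq >= 1.0:
            raise ValueError(f"point outside the unit ball (squared norm {sq})")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# A point that drifted outside the ball is pulled back before measuring distance.
drifted = project_into_ball([2.0, 0.0])
print(poincare_distance(drifted, [0.0, 0.0]))
```

A NaN norm usually originates earlier, for example from a diverging update, so a lower learning rate may also be worth trying.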
Dear Alex,
I found that in the pre-trained embedding models you provided, the "poincare_glove_50x2D_cosh-dist-sq.txt" and the "poincare_glove_100D_cosh-dist-sq.txt" are exactly the same. Why?
Hi, I am following your paper "Poincare Glove: Hyperbolic Word Embeddings". The paper describes the hypernym detection and hypernymy score experiments, but I cannot find the corresponding code. Did you upload this part? Thanks a lot!
I tried to run the training for Poincare GloVe vectors, but the training did not start because of the following error:
./run_glove.sh --train --root /poincare --coocc_file /glove/cooccurrence_full_vocab_20190221.shuf.bin --vocab_file /glove/vocab_orig.txt --epochs 10 --workers 16 --restrict_vocab 6000000 --lr 0.01 --poincare 1 --bias --size 100 --dist_func cosh-dist-sq --no_eval
Training the model and preparing to save it to ../models/glove/geometric_emb/glove_ep10_size100_lr0.01_vocab6000000_poincare_OPTradagrad_COOCCFUNClog_DISTFUNCcosh-dist-sq_bias
Running python3 glove_code/scripts/glove_main.py --train --root=.. --coocc_file=/glove/cooccurrence_full_vocab_20190221.shuf.bin --vocab_file=/glove/vocab_orig.txt --size=100 --workers=16 --chunksize= --epochs=10 --lr=0.01 --restrict_vocab=6000000 --poincare=1 --coocc_func log --dist_func cosh-dist-sq --bias --train_log_filename ../train_logs/tmp_35607
Redirecting output to ../train_logs/train_glove_ep10_size100_lr0.01_vocab6000000_poincare_OPTradagrad_COOCCFUNClog_DISTFUNCcosh-dist-sq_bias
/usr/local/lib/python3.6/dist-packages/smart_open/ssh.py:34: UserWarning: paramiko missing, opening SSH/SCP/SFTP paths will be disabled. `pip install paramiko` to suppress
warnings.warn('paramiko missing, opening SSH/SCP/SFTP paths will be disabled. `pip install paramiko` to suppress')
Traceback (most recent call last):
File "glove_code/scripts/glove_main.py", line 6, in <module>
from glove_code.src.glove import Glove, NNConfig, InitializationConfig
File "/poincare/poincare_glove/glove_code/src/glove.py", line 5, in <module>
from glove_code.src.glove_inner import read_all
ModuleNotFoundError: No module named 'glove_code.src.glove_inner'
Before running this I installed the required dependencies (pip3 install Cython nltk annoy) and ran the setup script (python3 setup.py develop).
I additionally tried to import the Cython module glove_code.src.glove_inner directly in the python3 REPL, and then I got the following error:
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport
>>> pyximport.install()
(None, <pyximport.pyximport.PyxImporter object at 0x7f4f09454eb8>)
>>> from glove_code.src.glove_inner import read_all
/usr/local/lib/python3.6/dist-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /poincare/poincare_glove/glove_code/src/glove_inner.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Error compiling Cython file:
...
cdef extern from "vfast_invsqrt.h":
double fast_inv_sqrt(float number) nogil
REAL = np.float32
cdef scopy_ptr scopy=<scopy_ptr>PyCObject_AsVoidPtr(fblas.scopy._cpointer) # y = x
^
glove_code/src/glove_inner.pyx:28:5: 'scopy_ptr' is not a type identifier
Error compiling Cython file:
...
double fast_inv_sqrt(float number) nogil
REAL = np.float32
cdef scopy_ptr scopy=<scopy_ptr>PyCObject_AsVoidPtr(fblas.scopy._cpointer) # y = x
cdef saxpy_ptr saxpy=<saxpy_ptr>PyCObject_AsVoidPtr(fblas.saxpy._cpointer) # y += alpha * x
^
glove_code/src/glove_inner.pyx:29:5: 'saxpy_ptr' is not a type identifier
Error compiling Cython file:
...
Traceback (most recent call last):
File "/usr/lib/python3.6/distutils/unixccompiler.py", line 118, in _compile
extra_postargs)
File "/usr/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/usr/lib/python3.6/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/usr/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'x86_64-linux-gnu-gcc' failed with exit status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 215, in load_module
inplace=build_inplace, language_level=language_level)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 191, in build_module
reload_support=pyxargs.reload_support)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyxbuild.py", line 102, in pyx_to_dll
dist.run_commands()
File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.6/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/usr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/usr/local/lib/python3.6/dist-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "/usr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/usr/lib/python3.6/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/usr/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 462, in load_module
language_level=self.language_level)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 231, in load_module
raise exc.with_traceback(tb)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 215, in load_module
inplace=build_inplace, language_level=language_level)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyximport.py", line 191, in build_module
reload_support=pyxargs.reload_support)
File "/usr/local/lib/python3.6/dist-packages/pyximport/pyxbuild.py", line 102, in pyx_to_dll
dist.run_commands()
File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.6/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/usr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/usr/local/lib/python3.6/dist-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "/usr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/usr/lib/python3.6/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/usr/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
ImportError: Building module glove_code.src.glove_inner failed: ["distutils.errors.CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1\n"]
There were many similar errors during the import.
Did I perhaps forget to do some installation step?
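The ModuleNotFoundError suggests that no compiled glove_inner extension was found next to glove_inner.pyx. One way to check this, as a sketch only, is to look for a file with one of the interpreter's extension-module suffixes. The helper name find_built_extensions is hypothetical, and the glove_code/src directory is taken from the traceback above.

```python
import importlib.machinery
from pathlib import Path


def find_built_extensions(package_dir, stem):
    """List compiled extension files named <stem><suffix> inside package_dir."""
    directory = Path(package_dir)
    return sorted(
        str(directory / (stem + suffix))
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
        if (directory / (stem + suffix)).is_file()
    )


if __name__ == "__main__":
    # Adjust the directory to the local checkout.
    print(find_built_extensions("glove_code/src", "glove_inner"))
```

An empty list would indicate that the build step did not produce the extension for this interpreter.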
Hi @alex-tifrea ,
I'm training word embeddings on the RCV1 corpus. I generated the vocab (7 MB) and co-occurrence file (7.5 GB) for the RCV1 dataset using the GloVe code. I run the following command to train:
./run_glove.sh --train --root .. -coocc_file ../poincare_glove2/GloVe/cooccurrence.bin --vocab_file ../poincare_glove2/GloVe/vocab.txt --epochs 50 --workers 20 --restrict_vocab 200000 --lr 0.01 --poincare 1 --bias --size 100 --dist_func cosh-dist-sq
The script uses only a single CPU core and does not write anything to the logs/* files.
Could anyone give me pointers on what I am doing wrong, or on whether something else is wrong?
Thanks
Hi, I tried your pretrained embeddings in my experiment. I believe this kind of embedding suits datasets with a hierarchical structure and preserves the hierarchy information. To test this, I designed an experiment, but got a confusing result.
In the experiment, I chose a small dataset of 1000 words and looked up their parent words in WordNet. Then I computed the distance between each child word and its parent, and compared it with the distances between the child word and the other words in WordNet. Surprisingly, the distance between a child word and its parent is relatively large, whereas I expected it to be very small. When I rank the parent's distance among the distances from the child to all words, it falls only around the middle.
Do you know the reason for this result?
Thanks a lot!
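The ranking procedure described above can be sketched as follows. This is an illustration with made-up 2D vectors, not the experiment's actual code or data. It assumes the Poincare distance on the unit ball, and the names poincare_distance and parent_rank are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Distance on the unit Poincare ball (points must have norm < 1)."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def parent_rank(child, parent, vectors, distance=poincare_distance):
    """1-based rank of the parent among all other words, ordered by distance to the child."""
    d_parent = distance(vectors[child], vectors[parent])
    closer = sum(
        1
        for word, vec in vectors.items()
        if word not in (child, parent) and distance(vectors[child], vec) < d_parent
    )
    return closer + 1


# Toy vectors, invented for illustration only.
vectors = {
    "dog": (0.50, 0.00),
    "puppy": (0.52, 0.00),
    "animal": (0.30, 0.00),
    "car": (-0.50, 0.20),
}
print(parent_rank("dog", "animal", vectors))
```

A rank of 1 means the parent is the nearest word to the child. Note that a pure distance ranking treats near-synonyms and siblings as competitors of the parent, so it measures similarity rather than the direction of the hierarchy.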