
word2vec's Introduction

NOT MAINTAINED

  • I have not used this code in a long time
  • No issues or PRs can be created
  • Latest release doesn't work with newer versions of numpy
  • I recommend moving to a native alternative in TensorFlow or PyTorch

word2vec


Python interface to Google word2vec.

Training is done using the original C code; other functionality is pure Python with NumPy.

Installation

pip install word2vec

Compilation

Installation requires compiling the original C code with gcc.

You can override the compilation flags if needed:

WORD2VEC_CFLAGS='-march=corei7' pip install word2vec

Windows: There is some basic support for Windows, based on this win32 port.

Usage

Example notebook: word2vec

The default functionality from word2vec is available with the following commands:

  • word2vec
  • word2phrase
  • word2vec-distance
  • word2vec-word-analogy
  • word2vec-compute-accuracy
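For readers wondering what the distance and analogy tools compute: word2vec-distance ranks words by cosine similarity to a query word, and word2vec-word-analogy ranks words by cosine similarity to the normalized vector b - a + c built from unit-length inputs. The NumPy sketch below reproduces that arithmetic on an in-memory matrix. It is not this package's API: the functions `normalize_rows`, `nearest`, `distance` and `analogy` are hypothetical, and the sketch assumes a vocabulary list aligned with the rows of a `(n_words, dim)` array, like the `vectors` attribute shown in one of the issues below.

```python
import numpy as np


def normalize_rows(vectors):
    # Scale every row to unit L2 norm so dot products become cosine similarities
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0, 1.0, norms)


def nearest(vocab, vectors, query_vec, exclude=(), topn=5):
    """Return up to `topn` (word, cosine) pairs closest to `query_vec`."""
    unit = normalize_rows(vectors)
    q = query_vec / (np.linalg.norm(query_vec) or 1.0)
    sims = unit @ q
    results = []
    for idx in np.argsort(-sims):  # highest similarity first
        if vocab[idx] in exclude:
            continue
        results.append((vocab[idx], float(sims[idx])))
        if len(results) == topn:
            break
    return results


def distance(vocab, vectors, word, topn=5):
    # Analogue of word2vec-distance: neighbours of one word, excluding the word itself
    idx = vocab.index(word)
    return nearest(vocab, vectors, vectors[idx], exclude={word}, topn=topn)


def analogy(vocab, vectors, a, b, c, topn=5):
    # Analogue of word2vec-word-analogy: "a is to b as c is to ?" via b - a + c
    unit = normalize_rows(vectors)
    ia, ib, ic = (vocab.index(w) for w in (a, b, c))
    target = unit[ib] - unit[ia] + unit[ic]
    return nearest(vocab, vectors, target, exclude={a, b, c}, topn=topn)
```

With a real model you would normalize the matrix once and reuse it rather than recomputing it on every query, as this sketch does for simplicity.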

Experimental functionality on doc2vec can be found in this example: doc2vec

word2vec's People

Contributors

abolger, alfioemanuelefresta, danielfrg, dr-costas, fabmue, hsmtkk, iamalbert, mariarigaki, mayjs, pranjalv123, sakurai-youhei, stephenbalaban, tianhuil, timgates42, tjwei, tyler-thetyrant, ynjxsjmh


word2vec's Issues

Not working

Hi, I am using Mac OS X El Capitan and I am having difficulty running the code from http://nbviewer.jupyter.org/github/danielfrg/word2vec/blob/master/examples/word2vec.ipynb. In particular, this is what I get:

$ python
Python 2.7.10 (default, Oct 23 2015, 18:05:06) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import word2vec
>>> word2vec.word2phrase('text8', 'text-phrases', verbose=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "word2vec/scripts_interface.py", line 110, in word2phrase
    run_cmd(command, verbose=verbose)
  File "word2vec/scripts_interface.py", line 142, in run_cmd
    stderr=subprocess.PIPE)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Does anyone know what's going on?

How to set the word2vec parameter to the best?

Hi,
I have a question: how should the parameters be set for optimal results when training word vectors with word2vec? For example, window, iter, alpha, min-count.

Looking forward to your advice or answers.
Best regards,
Thank you very much!

Getting the following errors while installing the word2vec library

ERROR: Command errored out with exit status 1:
command: 'D:\Anaconda\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec\setup.py'"'"'; file='"'"'C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\manoj\AppData\Local\Temp\pip-wheel-oiecgwem'
cwd: C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec
Complete output (33 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\word2vec
copying word2vec\io.py -> build\lib.win-amd64-3.7\word2vec
copying word2vec\scripts_interface.py -> build\lib.win-amd64-3.7\word2vec
copying word2vec\utils.py -> build\lib.win-amd64-3.7\word2vec
copying word2vec\wordclusters.py -> build\lib.win-amd64-3.7\word2vec
copying word2vec\wordvectors.py -> build\lib.win-amd64-3.7\word2vec
copying word2vec\_version.py -> build\lib.win-amd64-3.7\word2vec
copying word2vec\__init__.py -> build\lib.win-amd64-3.7\word2vec
creating build\lib.win-amd64-3.7\word2vec\tests
copying word2vec\tests\test_word2vec.py -> build\lib.win-amd64-3.7\word2vec\tests
copying word2vec\tests\__init__.py -> build\lib.win-amd64-3.7\word2vec\tests
UPDATING build\lib.win-amd64-3.7\word2vec/_version.py
set build\lib.win-amd64-3.7\word2vec/_version.py to '0.10.6'
running build_ext
building 'word2vec.word2vec_noop' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\word2vec
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -ID:\Anaconda\include -ID:\Anaconda\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /Tcword2vec/word2vec_noop.c /Fobuild\temp.win-amd64-3.7\Release\word2vec/word2vec_noop.obj word2vec_noop.c
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:D:\Anaconda\libs /LIBPATH:D:\Anaconda\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" /EXPORT:PyInit_word2vec_noop build\temp.win-amd64-3.7\Release\word2vec/word2vec_noop.obj /OUT:build\lib.win-amd64-3.7\word2vec\word2vec_noop.cp37-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.7\Release\word2vec\word2vec_noop.cp37-win_amd64.lib
Creating library build\temp.win-amd64-3.7\Release\word2vec\word2vec_noop.cp37-win_amd64.lib and object build\temp.win-amd64-3.7\Release\word2vec\word2vec_noop.cp37-win_amd64.exp
Generating code
Finished generating code
installing to build\bdist.win-amd64\wheel
running install
error: [WinError 2] The system cannot find the file specified
Compilation command: gcc C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec\word2vec\src\win32/word2vec.c -o Scripts\word2vec.exe -O2 -Wall -funroll-loops

ERROR: Failed building wheel for word2vec
Running setup.py clean for word2vec
Failed to build word2vec
Installing collected packages: word2vec
Running setup.py install for word2vec ... error
ERROR: Command errored out with exit status 1:
command: 'D:\Anaconda\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec\setup.py'"'"'; file='"'"'C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\manoj\AppData\Local\Temp\pip-record-skgqfgah\install-record.txt' --single-version-externally-managed --compile --install-headers 'D:\Anaconda\Include\word2vec'
cwd: C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec
Complete output (3 lines):
running install
error: [WinError 2] The system cannot find the file specified
Compilation command: gcc C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec\word2vec\src\win32/word2vec.c -o Scripts\word2vec.exe -O2 -Wall -funroll-loops
----------------------------------------
ERROR: Command errored out with exit status 1: 'D:\Anaconda\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec\setup.py'"'"'; file='"'"'C:\Users\manoj\AppData\Local\Temp\pip-install-0w9h1x7a\word2vec\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\manoj\AppData\Local\Temp\pip-record-skgqfgah\install-record.txt' --single-version-externally-managed --compile --install-headers 'D:\Anaconda\Include\word2vec' Check the logs for full command output.
(base) PS C:\windows\system32>

non-breaking space (\xa0) treated as a character

It turns out that the input files I have contain some \xa0 characters, which are simply non-breaking spaces. For some reason the program treats this character as a word and creates a vector for it, so after running word2vec.word2vec('...', '...', binary=0), the word vectors file contains the line:

  -0.297636 -0.038046 0.405622 ... -0.068306 0.909337 0.405136 

where the first character is the non-breaking space, the second is the space and then 100 floats. However when I try to read it into memory with model.load() the spaces are stripped, -0.297636 is taken as the word and I get an error that the vector has only 99 numbers.

Do you have any idea how to fix this? Of course I can try to edit my input data, but there's a hell of a lot of it, and I can't guarantee that the users of my software will provide cleaned data in the future.

Thanks!
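A likely explanation, assuming the loader splits lines with Python's default whitespace handling: `str.split()` and `str.strip()` called without arguments treat U+00A0 as whitespace, so the non-breaking-space "word" vanishes, the first float is read as the word, and only 99 values remain. Below is a minimal sketch of two workarounds: normalizing the training text before calling word2vec, and parsing text-format vectors by splitting only on ASCII spaces, counting from the right. Both helpers (`normalize_spaces`, `parse_vector_line`) are hypothetical and not part of the package.

```python
def normalize_spaces(text):
    # Replace non-breaking spaces (U+00A0) with ordinary spaces before training,
    # so the C tool never sees U+00A0 as a token of its own
    return text.replace("\u00a0", " ")


def parse_vector_line(line, dim):
    """Split one line of a text-format vectors file into (word, floats).

    Only ASCII spaces are treated as separators. str.split() with no
    argument would also split on U+00A0, turning the first float into
    the "word" and leaving dim - 1 values.
    """
    # Drop the newline and the trailing space written after the last value
    body = line.rstrip("\n").rstrip(" ")
    # Take exactly `dim` fields from the right; whatever remains is the word
    word, *values = body.rsplit(" ", dim)
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    return word, [float(v) for v in values]
```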

[Errno 2] No such file or directory: 'word2vec'

  • On macOS 10.13.6 I've tried Python 2.7, 3.5, 3.6 and 3.7. I can execute the word2vec command in the terminal and get a .bin output file, but it doesn't work when run from Python code
  • I cloned source code from git but got the same exception.

Example:

import word2vec as wv

if __name__ == '__main__':
    wv.word2vec('../data/dict.txt.big.txt', '/Users/porridge/Downloads/Word2Vec.bin', size=300, verbose=True)

Exception messages such as:

FileNotFoundError: [Errno 2] No such file or directory: 'word2vec'
FileNotFoundError: [Errno 2] No such file or directory: 'word2phrase'
FileNotFoundError: [Errno 2] No such file or directory: 'word2clusters'

depending on which function I call.

Error Can't import word2vec

When importing word2vec in Python 2.7 I get an error; can anyone kindly help?

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import word2vec
  File "word2vec.py", line 14, in <module>
    model = word2vec.Word2Vec(sentences, size=100, window=4, min_count=1, workers=4)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 432, in __init__
    self.train(sentences)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 690, in train
    raise RuntimeError("you must first build vocabulary before training the model")
RuntimeError: you must first build vocabulary before training the model

a question

Hi, I want to ask a question about word2vec details. Reading the code, I see that the rows of the vector matrix are sorted by count. Are the column vectors corresponding to words the same as the row vectors corresponding to words? If not, how do you get the row vector corresponding to a word? Thank you.

word2phrase

Is there an example of how to use word2phrase, preferably from within a Python script? By the way, I use Anaconda.

Encoding issues when attempting to load word2vec models

Hello,

I'm attempting to load a word2vec model pre-trained on the Google News corpus, but I get encoding errors:

import word2vec

model = word2vec.load('/Users/grant/Downloads/GoogleNews-vectors-negative300.bin', kind='bin')

Traceback (most recent call last):
  File "w2v_exps.py", line 3, in <module>
    model = word2vec.load('/Users/grant/Downloads/GoogleNews-vectors-negative300.bin', kind='bin')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/word2vec/io.py", line 18, in load
    return word2vec.WordVectors.from_binary(fname, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/word2vec/wordvectors.py", line 171, in from_binary
    vocab[i] = word.decode(encoding)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

I assume that the pre-trained google news vectors found at https://code.google.com/p/word2vec/ would be utf-8 encoded, but I guess not? Is this project not intended to work with those files?

Word2vec import error in Python

Kindly help me to correct this error

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import word2vec
  File "word2vec.py", line 14, in <module>
    model = word2vec.Word2Vec(sentences, size=100, window=4, min_count=1, workers=4)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 432, in __init__
    self.train(sentences)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 690, in train
    raise RuntimeError("you must first build vocabulary before training the model")
RuntimeError: you must first build vocabulary before training the model

Encoding issues when attempting to load word2vec models for arabic language

Hello,

I'm attempting to load a word2vec model pre-trained on my own Arabic corpora, which are encoded in UTF-8, but I get encoding errors:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-8-17f0d481c41a> in <module>()
----> 1 model = word2vec.load('/home/adel/Desktop/final.bin')

~/anaconda3/lib/python3.6/site-packages/word2vec/io.py in load(fname, kind, *args, **kwargs)
     16             raise Exception('Could not identify kind')
     17     if kind == 'bin':
---> 18         return word2vec.WordVectors.from_binary(fname, *args, **kwargs)
     19     elif kind == 'txt':
     20         return word2vec.WordVectors.from_text(fname, *args, **kwargs)

~/anaconda3/lib/python3.6/site-packages/word2vec/wordvectors.py in from_binary(cls, fname, vocabUnicodeSize, desired_vocab, encoding, newLines)
    200                 include = desired_vocab is None or word in desired_vocab
    201                 if include:
--> 202                     vocab[i] = word.decode(encoding)
    203 
    204                 # read vector

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd8 in position 97: unexpected end of data

Decoding error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/andrey/keras/lib/python3.6/site-packages/word2vec/io.py", line 18, in load
    return word2vec.WordVectors.from_binary(fname, *args, **kwargs)
  File "/Users/andrey/keras/lib/python3.6/site-packages/word2vec/wordvectors.py", line 171, in from_binary
    vocab[i] = word.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 64: unexpected end of data
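The "unexpected end of data" errors above are what Python raises when a UTF-8 byte string ends in the middle of a multibyte character; 0xd8 (Arabic) and 0xd0 (Cyrillic) are both lead bytes of two-byte sequences. One plausible cause, though it is an assumption here, is a vocabulary entry cut off at a fixed byte length. The sketch below shows a tolerant decoder: the hypothetical helper `decode_word` silently drops an incomplete trailing character and replaces any other invalid byte (such as the leading 0x80 in the GoogleNews reports) with U+FFFD, instead of aborting the whole load. It is not the package's own code.

```python
import codecs


def decode_word(raw, encoding="utf-8"):
    """Decode one vocabulary entry without aborting the whole load."""
    # errors="replace" turns invalid bytes in the middle into U+FFFD
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    # final=False keeps an incomplete trailing multibyte sequence in the
    # decoder's buffer instead of raising, so a word truncated mid-character
    # simply loses its last partial character
    return decoder.decode(raw, final=False)
```

The trade-off is that affected words come back slightly altered, so two truncated entries can collapse to the same string; `errors="strict"` remains the safer choice when the file is known to be clean.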

Wrong documentation about word2vec training

In the provided docstrings (scripts_interface.py) the following is wrong:

def word2vec(train, output, size=100, window=5, sample='1e-3', hs=0,
             negative=5, threads=12, iter_=5, min_count=5, alpha=0.025,
             debug=2, binary=1, cbow=1, save_vocab=None, read_vocab=None,
             verbose=False):

    cbow
        Use the continuous back of words model; default is 1 (skip-gram model)

When you run the word2vec program standalone you get this:

WORD VECTOR estimation toolkit v 0.1c

Options:
Parameters for training:
[...]
-cbow
Use the continuous bag of words model; default is 1 (use 0 for skip-gram model)
[...]

So the provided docstring should be changed to:

Use the continuous bag of words model; default is 1 (cbow model)

not able to install

I tried both the Windows installer and the pip method to install it, on both Python 3.3 and Python 2.7, and both gave me errors. The Windows installer said no Python directory was found, and with the pip method on both 3.3 and 2.7 I got WindowsError: [Error 2], with information about the 'subprocess.call' function failing to find a file. My system is Windows 7; I tried both 32-bit and 64-bit.

word2vec install on Windows 8.1 (and Windows XP) 32-bit

Install on Windows XP and Windows 8.1 caused the same error. Tried both:

pip install -U word2vec, and
"python setup.py install" using the zip'ed download

Anaconda 2.1 32-bit.

C:\Documents and Settings\dinesh>pip install -U word2vec
Collecting word2vec
  Downloading word2vec-0.6.7.tar.gz
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "c:\docume~1\dinesh\locals~1\temp\pip-build-d9dqjn\word2vec\setup.py", line 17, in <module>
        subprocess.call(['make', '-C', 'word2vec-c'])
      File "C:\Anaconda\lib\subprocess.py", line 522, in call
        return Popen(*popenargs, **kwargs).wait()
      File "C:\Anaconda\lib\subprocess.py", line 710, in __init__
        errread, errwrite)
      File "C:\Anaconda\lib\subprocess.py", line 958, in _execute_child
        startupinfo)
    WindowsError: [Error 2] The system cannot find the file specified
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "c:\docume~1\dinesh\locals~1\temp\pip-build-d9dqjn\word2vec\setup.py", line 17, in <module>
        subprocess.call(['make', '-C', 'word2vec-c'])
      File "C:\Anaconda\lib\subprocess.py", line 522, in call
        return Popen(*popenargs, **kwargs).wait()
      File "C:\Anaconda\lib\subprocess.py", line 710, in __init__
        errread, errwrite)
      File "C:\Anaconda\lib\subprocess.py", line 958, in _execute_child
        startupinfo)
    WindowsError: [Error 2] The system cannot find the file specified

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in c:\docume~1\dinesh\locals~1\temp\pip-build-d9dqjn\word2vec
C:\Documents and Settings\dinesh\My Documents\Downloads\word2vec-master>python setup.py install
Traceback (most recent call last):
  File "setup.py", line 17, in <module>
    return_code = subprocess.call(['make', '-C', 'word2vec-c'])
  File "C:\Anaconda\lib\subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "C:\Anaconda\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Anaconda\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

Can I get the tokenizer?

I want to get the tokenizer to use in my research.

Or, at least get tokenized result for the training corpus..

Absolute vs relative file paths

Hi,

I'm struggling to get this wrapper lib to work. It seems that if I provide relative paths to disk resources, it works just fine. But when I use absolute ones, it just doesn't.

Is this somehow expected? To be fair, that's not a big deal: one can always use os.path.realpath(), but I was wondering whether you could tell me more about why that's happening.

Thanks,
Michele.

OSError: [Errno 2] No such file or directory

When trying the word2vec.word2phrase example, a system error occurred.
After a short search of the source, I found out that it actually calls word2phrase as a command. I tried the word2phrase command in the shell, and there is no such command, nor a word2vec command.

I installed it with pip on Mac (10.10.5) and there was no error:

pip show word2vec
Name: word2vec
Version: v0.8.1
Location: /Library/Python/2.7/site-packages
Requires: numpy

Could you please tell me where I can find this command? Should I append some path to $PATH?
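The tracebacks in these reports (`run_cmd` calling `subprocess` in scripts_interface.py) suggest that the Python functions launch the compiled binaries by name, so `OSError`/`FileNotFoundError: [Errno 2]` most likely means the binary is not on `PATH`. A minimal diagnostic sketch using only the standard library follows; `find_missing_commands` is a hypothetical helper, and the command list is taken from the README and the error messages above, so it may need adjusting for a particular install.

```python
import shutil

# Command names taken from the README and the error messages in these issues;
# adjust the tuple if your install names the binaries differently
COMMANDS = ("word2vec", "word2phrase", "word2clusters")


def find_missing_commands(commands=COMMANDS):
    """Return the commands that cannot be resolved on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing_commands()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All word2vec binaries were found on PATH.")
```

If a binary exists but is reported missing, adding its directory to `PATH` in the environment that launches Python is the usual fix.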

Installing word2vec on python3.4 doesn't work

Hey,

I tried to install word2vec on python3.4 and got the following:

      File "/tmp/pip-build-9693mw7d/word2vec/setup.py", line 29
        print ' '.join(command)
                ^
    SyntaxError: invalid syntax

I looked into your setup.py and it seems like you already fixed the python3 compatibility, but I think you didn't upload the release to pip. I cloned the repo and installed the local package and it works, but just FYI.

Thanks,

Arwin

Failing to load GoogleNews-vectors-negative300.bin

Hello,
while trying to load the pre-trained model from google news dataset, I'm getting the following error (both python3 and python2):

Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import word2vec
>>> word2vec.load('GoogleNews-vectors-negative300.bin')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/venvs/word2vec_python2/local/lib/python2.7/site-packages/word2vec/io.py", line 18, in load
    return word2vec.WordVectors.from_binary(fname, *args, **kwargs)
  File "~/venvs/word2vec_python2/local/lib/python2.7/site-packages/word2vec/wordvectors.py", line 202, in from_binary
    vocab[i] = word.decode(encoding)
  File "~/venvs/word2vec_python2/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

The error looks very similar to #51

pip install not iterable

I pip installed word2vec successfully, but running pip install a second or third time throws errors. Also, pip uninstall does not work. The library is still installed, and I can successfully import it. I think something may be messed up with the file paths in the library?

>>> import word2vec
>>> word2vec.__file__
'/usr/local/lib/python2.7/dist-packages/word2vec/__init__.pyc'
>>> 
$ pip freeze | grep word2vec
$
$ pip uninstall word2vec
Cannot uninstall requirement word2vec, not installed
Storing complete log in /tmp/tmpOUYFdB
$

Here's the traceback in /tmp/tmpOUYFdB:

Exception information:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pip-1.3.1-py2.7.egg/pip/basecommand.py", line 139, in main
    status = self.run(options, args)
  File "/usr/local/lib/python2.7/dist-packages/pip-1.3.1-py2.7.egg/pip/commands/uninstall.py", line 54, in run
    requirement_set.uninstall(auto_confirm=options.yes)
  File "/usr/local/lib/python2.7/dist-packages/pip-1.3.1-py2.7.egg/pip/req.py", line 899, in uninstall
    req.uninstall(auto_confirm=auto_confirm)
  File "/usr/local/lib/python2.7/dist-packages/pip-1.3.1-py2.7.egg/pip/req.py", line 417, in uninstall
    raise UninstallationError("Cannot uninstall requirement %s, not installed" % (self.name,))
UninstallationError: Cannot uninstall requirement word2vec, not installed

Cannot install via pip on Ubuntu 12.04 Precise

When running:

➜  word2vec git:(master) pip install word2vec

I get back:

Collecting word2vec
  Using cached word2vec-0.8.1.tar.gz
    Complete output from command python setup.py egg_info:
    /tmp/cc89ypZ6.s: Assembler messages:
    /tmp/cc89ypZ6.s:2867: Error: no such instruction: `vfmadd312ss (%r12),%xmm0,%xmm11'
    /tmp/cc89ypZ6.s:2886: Error: no such instruction: `vfmadd312ss 4(%r12),%xmm0,%xmm1'
    [many, many more]
    /tmp/cc89ypZ6.s:5426: Error: no such instruction: `vfmadd312ss (%rdx,%r9,4),%xmm0,%xmm2'
    /tmp/cc89ypZ6.s:5429: Error: no such instruction: `vfmadd312ss (%rdx,%rdi,4),%xmm0,%xmm15'
    gcc word2vec-c/word2vec.c -o bin/word2vec -lm -pthread -O3 -Wall -march=native -funroll-loops -Wno-unused-result

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-xYMfux/word2vec

And the installation fails.

➜  word2vec git:(master) uname -a
Linux uf8bc12856564545249ca 3.13.0-66-generic #108~precise1-Ubuntu SMP Thu Oct 8 10:07:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Let me know if you need more information about my environment. If the entire Error: no such instruction stuff is useful to you I can provide it.

Thanks for building this. I've used it on OSX and I like it!

word2vec.word2phrase() problem in python 3

Hello,
I am using a Turkish dataset. (https://github.com/ahmetax/derlemtr/blob/master/buyuk_veri/hurriyet_noktasiz_2010_01.txt.rar)

The word2vec.word2phrase(train=fin, output=fout, verbose=True) call returns immediately without doing anything and without any errors.
The problem might be related to special Turkish characters (utf-8).
I am using Python 3.5.1 on Ubuntu 16.04
word2vec.word2clusters() and word2vec.word2vec() run with no problems.
How can we solve that problem? (I can create phrase file by using a revised version of word2phrase.py from https://github.com/travisbrady/word2phrase )
Thank you.

Note: There is no problem when I use text8.
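For those who, like the reporter, would rather build phrases in pure Python, here is a minimal sketch of a one-pass phrase detector working on already-tokenized Python strings, so non-ASCII tokens such as Turkish words are handled like any other string. The scoring rule, joining two adjacent words with "_" when (count(a b) - min_count) / count(a) / count(b) * total_words exceeds a threshold, and the defaults min_count=5 and threshold=100 reflect our reading of the original word2phrase.c and should be treated as assumptions. The function name `word2phrase_py` is hypothetical and unrelated to the package's `word2phrase`.

```python
from collections import Counter


def word2phrase_py(sentences, min_count=5, threshold=100.0):
    """Join high-scoring adjacent word pairs with "_" (single pass)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_words = sum(unigrams.values())

    def score(a, b):
        pa, pb = unigrams[a], unigrams[b]
        if pa < min_count or pb < min_count:
            return 0.0  # rare words never start or end a phrase
        return (bigrams[(a, b)] - min_count) / pa / pb * total_words

    out = []
    for tokens in sentences:
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and score(tokens[i], tokens[i + 1]) > threshold:
                merged.append(tokens[i] + "_" + tokens[i + 1])
                i += 2  # the second word cannot also begin the next phrase
            else:
                merged.append(tokens[i])
                i += 1
        out.append(merged)
    return out
```

As with the C tool, running the function a second time on its own output can join phrases into longer ones.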

Some words not in model GoogleNews-vectors-negative300.bin

When I use word2vec to access the pre-trained model GoogleNews-vectors-negative300.bin, some of the words are reported as not being in the model. I've had the same problem on a 16GB Mac running OS X 10.10.2 and on a large Linux machine. Here is a session on Linux:

$ python
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Aug 21 2014, 18:22:21) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import word2vec
>>> w = word2vec
>>> m = w.load('GoogleNews-vectors-negative300.bin')
>>> m.vectors.shape
(3000000, 300)
>>> m['dog']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "word2vec/wordvectors.py", line 45, in __getitem__
    return self.get_vector(word)
  File "word2vec/wordvectors.py", line 54, in get_vector
    idx = self.ix(word)
  File "word2vec/wordvectors.py", line 40, in ix
    raise KeyError('Word not in vocabulary')
KeyError: u'Word not in vocabulary'
>>> m['cat']
array([ 0.03357537,  0.05204856,  0.04530652,  0.05636346, -0.02170937,
       -0.00815787, -0.0528576 ,  0.02413651,  0.06094805,  0.02588944,
        ...
        0.03559798,  0.06715073,  0.00525879, -0.04476715,  0.03249664])
>>> 

AttributeError: module 'word2vec' has no attribute 'word2vec'

I ran into a problem “AttributeError: module 'word2vec' has no attribute 'word2vec'”

anaconda 4.3 on Centos 7

How can I solve it? Thanks.

>>> import word2vec
>>> word2vec.word2vec('chatCorpus.txt', 'corpusWord2Vec.bin', size=50,verbose=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'word2vec' has no attribute 'word2vec'

[WinError 2] The system cannot find the file specified

Hi,

I keep receiving this error message while installing on python 3.6:

Installing collected packages: word2vec
 Running setup.py install for word2vec ... error
   Complete output from command c:\users\annaz\appdata\local\programs\python\python36-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\annaz\\AppData\\Local\\Temp\\pip-req-build-jp3b8ddr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\annaz\AppData\Local\Temp\pip-record-wpoq3fy5\install-record.txt --single-version-externally-managed --compile:
   running install
   Compilation command: gcc C:\Users\annaz\AppData\Local\Temp\pip-req-build-jp3b8ddr\word2vec\c\win32/word2vec.c -o Scripts\word2vec.exe -O2 -Wall -funroll-loops
   error: [WinError 2] The system cannot find the file specified

   ----------------------------------------
Command "c:\users\annaz\appdata\local\programs\python\python36-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\annaz\\AppData\\Local\\Temp\\pip-req-build-jp3b8ddr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\annaz\AppData\Local\Temp\pip-record-wpoq3fy5\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\annaz\AppData\Local\Temp\pip-req-build-jp3b8ddr\

What am I missing while installing it?

word2vec install on ubuntu 11.04 a success but lots of warnings

word2vec installed successfully on ubuntu 11.04 using "pip install -U word2vec".

There were quite a few warnings and the output is provided as information.

ubuntu@ubuntu-VirtualBox:~$ pip install -U word2vec
Collecting word2vec
  Downloading word2vec-0.6.7.tar.gz
    make: Entering directory `/tmp/pip-build-EDXakj/word2vec/word2vec-c'
    gcc word2vec.c -o ../bin/word2vec -lm -pthread -O2 -Wall -funroll-loops
    word2vec.c: In function ‘TrainModelThread’:
    word2vec.c:398:36: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
       unsigned long long next_random = (long long)id;
                                        ^
    word2vec.c:408:50: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
       fseek(fi, file_size / (long long)num_threads * (long long)id, SEEK_SET);
                                                      ^
    word2vec.c: In function ‘ReadVocab’:
    word2vec.c:343:11: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
         fscanf(fin, "%lld%c", &vocab[a].cn, &c);
               ^
    gcc word2phrase.c -o ../bin/word2phrase -lm -pthread -O2 -Wall -funroll-loops
    gcc distance.c -o ../bin/w2v-distance -lm -pthread -O2 -Wall -funroll-loops
    distance.c: In function ‘main’:
    distance.c:45:9: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
       fscanf(f, "%lld", &words);
             ^
    distance.c:46:9: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
       fscanf(f, "%lld", &size);
             ^
    distance.c:54:11: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
         fscanf(f, "%s%c", &vocab[b * max_w], &ch);
               ^
    distance.c:55:37: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
         for (a = 0; a < size; a++) fread(&M[a + b * size], sizeof(float), 1, f);
                                         ^
    gcc word-analogy.c -o ../bin/w2v-word-analogy -lm -pthread -O2 -Wall -funroll-loops
    word-analogy.c: In function ‘main’:
    word-analogy.c:44:9: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
       fscanf(f, "%lld", &words);
             ^
    word-analogy.c:45:9: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
       fscanf(f, "%lld", &size);
             ^
    word-analogy.c:53:11: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
         fscanf(f, "%s%c", &vocab[b * max_w], &ch);
               ^
    word-analogy.c:54:37: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
         for (a = 0; a < size; a++) fread(&M[a + b * size], sizeof(float), 1, f);
                                         ^
    gcc compute-accuracy.c -o ../bin/w2v-compute-accuracy -lm -pthread -O2 -Wall -funroll-loops
    compute-accuracy.c: In function ‘main’:
    compute-accuracy.c:46:9: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
       fscanf(f, "%lld", &words);
             ^
    compute-accuracy.c:48:9: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
       fscanf(f, "%lld", &size);
             ^
    compute-accuracy.c:56:11: warning: ignoring return value of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
         fscanf(f, "%s%c", &vocab[b * max_w], &ch);
               ^
    compute-accuracy.c:58:37: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
         for (a = 0; a < size; a++) fread(&M[a + b * size], sizeof(float), 1, f);
                                         ^
    compute-accuracy.c:69:10: warning: ignoring return value of ‘scanf’, declared with attribute warn_unused_result [-Wunused-result]
         scanf("%s", st1);
              ^
    compute-accuracy.c:78:12: warning: ignoring return value of ‘scanf’, declared with attribute warn_unused_result [-Wunused-result]
           scanf("%s", st1);
                ^
    compute-accuracy.c:86:10: warning: ignoring return value of ‘scanf’, declared with attribute warn_unused_result [-Wunused-result]
         scanf("%s", st2);
              ^
    compute-accuracy.c:88:10: warning: ignoring return value of ‘scanf’, declared with attribute warn_unused_result [-Wunused-result]
         scanf("%s", st3);
              ^
    compute-accuracy.c:90:10: warning: ignoring return value of ‘scanf’, declared with attribute warn_unused_result [-Wunused-result]
         scanf("%s", st4);
              ^
    make: Leaving directory `/tmp/pip-build-EDXakj/word2vec/word2vec-c'
Requirement already up-to-date: numpy>=1.7.1 in ./anaconda/lib/python2.7/site-packages (from word2vec)
Installing collected packages: word2vec
  Running setup.py install for word2vec
Successfully installed word2vec-0.6.7

Dependency on Cython

In relation to #61, installation seems to depend on Cython now.

How to reproduce the issue

  1. Spin up a clean environment with Docker: docker run -it python:3 bash.
  2. Run pip install word2vec, as described in the README.

Expected behavior

The installation of word2vec should succeed.

Actual behavior

The installation of word2vec fails because Cython, which setup.py imports, is not installed.

root@586fc4889a4a:/# pip install word2vec
Collecting word2vec
  Downloading https://files.pythonhosted.org/packages/ce/51/5e2782b204015c8aef0ac830297c2f2735143ec90f592b9b3b909bb89757/word2vec-0.10.2.tar.gz (60kB)
     |████████████████████████████████| 61kB 38kB/s
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8jl_l2_x/word2vec/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8jl_l2_x/word2vec/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
         cwd: /tmp/pip-install-8jl_l2_x/word2vec/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-8jl_l2_x/word2vec/setup.py", line 4, in <module>
        from Cython.Build import cythonize
    ModuleNotFoundError: No module named 'Cython'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 19.2.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
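The traceback shows setup.py importing Cython.Build at module level (line 4), so the build fails before pip can even collect the package metadata. One common pattern, sketched here under stated assumptions and not taken from this project's actual setup.py, is to import Cython lazily and fall back to pre-generated .c files when it is missing. The function name choose_sources is hypothetical, and the fallback assumes the maintainer ships a .c file next to each .pyx in the sdist. The other usual fix is to list Cython under the build-system requires of a PEP 518 pyproject.toml so pip installs it before running setup.py.

```python
import importlib.util
import os


def choose_sources(pyx_sources):
    """Pick the extension sources a setup.py should compile.

    Returns (sources, use_cython). If Cython is importable, the .pyx files are
    returned and the caller should pass them through Cython.Build.cythonize.
    Otherwise the pre-generated .c file with the same base name is returned
    (hypothetical: those files would have to be shipped in the sdist).
    """
    use_cython = importlib.util.find_spec("Cython") is not None
    if use_cython:
        return list(pyx_sources), True
    return [os.path.splitext(p)[0] + ".c" for p in pyx_sources], False
```

In setup.py one would then build an Extension from the returned sources and call cythonize only when use_cython is true, so a missing Cython no longer breaks `pip install`.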

Heroku install error

remote: Traceback (most recent call last):
remote: File "<string>", line 1, in <module>
remote: File "/tmp/pip-build-n8d7a1uo/word2vec/setup.py", line 4, in <module>
remote: from Cython.Build import cythonize
remote: ModuleNotFoundError: No module named 'Cython'

Is the module different from gensim.models.?

Hi,
when I install the module, I get the following error:

 ERROR: Command errored out with exit status 1:
     command: 'd:\python36\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ggca1\\AppData\\Local\\Temp\\pip-install-2wl0oxut\\word2vec\\setup.py'"'"'; __file__='"'"'C:\\Users\\ggca1\\AppData\\Local\\Temp\\pip-install-2wl0oxut\\word2vec\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\ggca1\AppData\Local\Temp\pip-record-7e3e1b2c\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\python36\Include\word2vec'
         cwd: C:\Users\ggca1\AppData\Local\Temp\pip-install-2wl0oxut\word2vec\
    Complete output (3 lines):
    running install
    Compilation command: gcc C:\Users\ggca1\AppData\Local\Temp\pip-install-2wl0oxut\word2vec\word2vec\src\win32/word2vec.c -o Scripts\word2vec.exe -O2 -Wall -funroll-loops
    error: [WinError 2] The system cannot find the file specified.
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'd:\python36\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ggca1\\AppData\\Local\\Temp\\pip-install-2wl0oxut\\word2vec\\setup.py'"'"'; __file__='"'"'C:\\Users\\ggca1\\AppData\\Local\\Temp\\pip-install-2wl0oxut\\word2vec\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\ggca1\AppData\Local\Temp\pip-record-7e3e1b2c\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\python36\Include\word2vec' Check the logs for full command output.

So I switched to another module, gensim.models.Word2Vec, instead.
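The log shows setup.py invoking gcc directly, and [WinError 2] is what Windows reports when the program being launched cannot be found, so the most likely cause is that gcc is not installed or not on PATH. A quick preflight check before running pip install could look like the sketch below; the function name check_compiler is hypothetical.

```python
import shutil
import subprocess


def check_compiler(name="gcc"):
    """Return the first line of `<name> --version`, or None if `name` is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else path


if __name__ == "__main__":
    version = check_compiler()
    if version is None:
        print("gcc was not found on PATH; install a gcc toolchain "
              "(e.g. MinGW-w64 on Windows) and make sure it is on PATH.")
    else:
        print("Found compiler:", version)
```

If this prints that gcc was not found, the word2vec build will fail the same way regardless of the Python version.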

unable to call word2vec cmdline tool from the python library on Mac OS X

In [2]: import word2vec; word2vec.word2vec('text8.txt', 'text8.bin', size=300, verbose=True)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-869655b24b66> in <module>()
----> 1 import word2vec; word2vec.word2vec('text8.txt', 'text8.bin', size=300, verbose=True)

/Library/Python/2.7/site-packages/word2vec/scripts_interface.py
---> 29     proc = subprocess.Popen(process, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

This leaves you with no clue about what is going on.

After some digging, I found that it attempts to run:

['/Library/Python/2.7/site-packages/word2vec/../bin/word2vec', '-train', 'text8.txt', '-output', 'text8.bin', '-size', '300', '-window', '5', '-sample', '0', '-hs', '1', '-negative', '0', '-threads', '4', '-min-count', '5', '-alpha', '0.025', '-debug', '2', '-binary', '1', '-cbow', '0']

But pip on Mac OS X installs the binaries to
/Library/Frameworks/Python.framework/Versions/2.7/bin/
while the library itself is in
/Library/Python/2.7/site-packages/word2vec/

The problem is with
realpath = os.path.dirname(os.path.realpath(__file__))
which leads to the wrong path.

If the command-line tool cannot be run,
please at least display a meaningful error
explaining what happened and how to fix it.
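A minimal sketch of a more robust lookup, assuming the wrapper keeps its current ../bin layout as the first guess: check that location, then fall back to PATH (where pip put the scripts on Mac OS X), and otherwise raise an error that says what was searched. The function name find_binary and the error wording are hypothetical, not this package's API.

```python
import os
import shutil


def find_binary(name, package_dir):
    """Locate a bundled command-line tool such as 'word2vec'.

    Looks first in <package_dir>/../bin (the layout used in the command above),
    then on PATH, and raises a descriptive error if neither has it.
    """
    candidate = os.path.abspath(os.path.join(package_dir, os.pardir, "bin", name))
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    on_path = shutil.which(name)
    if on_path is not None:
        return on_path
    raise FileNotFoundError(
        f"Could not find the '{name}' executable: it is not in {candidate} "
        f"and not on PATH. Check where pip installed the package's scripts "
        f"(for example the framework's bin/ directory on Mac OS X) and add "
        f"that directory to PATH."
    )
```

The wrapper would then use the returned path as the first element of the command list it passes to subprocess.Popen, so a missing binary produces this message instead of a bare OSError.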

word2vec only trains on 875 words from 32863

I have a dataset of reviews from Yelp and Amazon. However, when I train word2vec on it, I get the following output:
Starting training using file data.text
Vocab size: 875
Words in train file: 32863
data.text looks something like this, with no newlines:
hello world hello world
Thanks
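As far as I can tell from the original C code, "Vocab size" is the number of distinct words that occur at least min-count times (default 5, as in the -min-count 5 flag in the Mac OS X command above), while "Words in train file" is the total number of tokens belonging to those words. So 875 would be the number of distinct frequent words, not the number of words trained on. A simplified sketch that computes both figures the same way; it ignores the </s> end-of-line token the C code adds, and the function name is hypothetical:

```python
from collections import Counter


def vocab_stats(text, min_count=5):
    """Return (vocab_size, train_words) the way word2vec reports them (simplified).

    vocab_size:  number of distinct words seen at least `min_count` times.
    train_words: total number of tokens belonging to those kept words.
    """
    counts = Counter(text.split())
    kept = {word: count for word, count in counts.items() if count >= min_count}
    return len(kept), sum(kept.values())


print(vocab_stats("hello world " * 10 + "rare"))  # (2, 20): "rare" is pruned
```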

Unable to install word2vec on Linux

I installed Cython before installing word2vec.
numpy version: 1.10.4
gcc version: 4.8.4

This is the error I get while installing word2vec:

$ sudo pip install word2vec 
[sudo] password for sharad: 
The directory '/home/sharad/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/sharad/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting word2vec
  Downloading word2vec-0.9.0.tar.gz (49kB)
    100% |████████████████████████████████| 53kB 5.3kB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-qm2BCb/word2vec/setup.py", line 101, in <module>
        ext_modules=cythonize("word2vec/word2vec_noop.pyx"),
      File "/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 758, in cythonize
        aliases=aliases)
      File "/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 651, in create_extension_list
        for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
      File "/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 103, in nonempty
        raise ValueError(error_msg)
    ValueError: 'word2vec/word2vec_noop.pyx' doesn't match any files

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-qm2BCb/word2vec

Could you please suggest a solution?
Thanks
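The ValueError says cythonize found no file matching word2vec/word2vec_noop.pyx, which suggests the file was not included in the word2vec-0.9.0 source tarball, rather than something missing on your machine. To check, you could download that tarball from PyPI and list its .pyx files; a minimal sketch (the function name is hypothetical):

```python
import tarfile


def sdist_members(sdist_path, suffix=".pyx"):
    """List files in a source tarball whose names end with `suffix`."""
    with tarfile.open(sdist_path, "r:*") as tar:
        return sorted(m.name for m in tar.getmembers() if m.name.endswith(suffix))
```

For example, `sdist_members("word2vec-0.9.0.tar.gz")` returning an empty list would confirm the .pyx file was never packaged.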

install in Docker Container (ubuntu:16.10)

In my Docker container (ubuntu:16.10)

I tried to install it with
pip install word2vec

but I got this error:
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-lczJ96/word2vec/

I had to install Cython to fix it:
pip install Cython


python version: 2.7
pip version: 9.0.1 (for python 2.7)

Setup for Mac OS 10.9.5 does not work: fatal error: 'malloc.h' file not found

When building from the cloned directory, running
"python setup.py make install" results in the following:

>>gcc word2vec-c/word2vec.c -o bin/word2vec -lm -pthread -O3 -Wall -march=native -funroll-loops -Wno-unused-result -I/usr/include/malloc
>>gcc word2vec-c/word2phrase.c -o bin/word2phrase -lm -pthread -O3 -Wall -march=native -funroll-loops -Wno-unused-result -I/usr/include/malloc
>>gcc word2vec-c/distance.c -o bin/word2vec-distance -lm -pthread -O3 -Wall -march=native -funroll-loops -Wno-unused-result -I/usr/include/malloc
word2vec-c/distance.c:18:10: fatal error: 'malloc.h' file not found

The Google Code archive reports the same issue for the original source code, caused by where Mac OS puts the "malloc" and "stdlib" headers. I will submit a patch once I get it working.
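On Mac OS the malloc family is declared in <stdlib.h>, and <malloc.h> is not available in the usual include location, so a common workaround (an assumption here, not the patch mentioned above) is to swap that include in the C sources before compiling. A minimal sketch that does this for every .c file in a directory, such as the word2vec-c directory from the log; the function name is hypothetical:

```python
import re
from pathlib import Path

MALLOC_INCLUDE = re.compile(r"#include\s*<malloc\.h>")


def patch_malloc_include(src_dir):
    """Replace '#include <malloc.h>' with '#include <stdlib.h>' in every .c file.

    Returns the sorted names of the files that were changed.
    """
    patched = []
    for path in Path(src_dir).glob("*.c"):
        text = path.read_text(encoding="utf-8", errors="surrogateescape")
        new_text = MALLOC_INCLUDE.sub("#include <stdlib.h>", text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8", errors="surrogateescape")
            patched.append(path.name)
    return sorted(patched)
```

Running `patch_malloc_include("word2vec-c")` and then rerunning the build is the intended usage; a duplicate <stdlib.h> include is harmless.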

It seems that word2vec-doc2vec does not work yet

I used the following command to train document vectors:
word2vec-doc2vec -train ../data/paper.seg -output ../data/model/paper.vec.bin -size 100 -window 5 -alpha 0.025 -sample 1e-3 -hs 0 -negative 5 -min-count 5 -cbow 1 -threads 4 -iter 20 -sentence-vectors 1 -binary 1
The training file looks like this:
_*2599671 公路 工程造价 管理信息系统
_*2599672 湖南 电视产业 电视台 管理体制改革 集团化 网络建设 竞争机制
_*2599673 超高功率电弧炉 强化用氧技术 最佳氧耗值 电炉炼钢 南京钢铁集团有限公司 脱碳 氧气 节能
_*2599674 retran程序 乏燃料贮存水池 热工水力安全分析 核电厂安全 点池模型
_*2599675 北方地区 住宅建筑 电采暖 用电负荷 辐射供暖系统
_*2599676 汽车工程 自动变速器 自动控制系统 换挡智能化 换挡控制器
_*2599677 船舶 强度 直接计算板
_*2599678 微柱高效液相色谱法 测定 茶多酚 茶叶
_*2599679 出版单位 出版体制改革 现代企业制度 国有资产管理
_*2599680 车间作业调度 遗传算法 领域搜索 收敛性 稳定性

However, when I loaded the trained model with model.load(...) and looked up model["_*0"] as the tutorial does, I got a "Word not in vocabulary" exception.

Eventually I found that the model file contains no document vectors, only word vectors.

Please help. Thanks.
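Two things may be worth checking. First, the lookup key: the tutorial's _*0 only exists if the training file has a line labelled _*0, while the labels shown above start at _*2599671. Second, each label occurs exactly once, and with -min-count 5 the C tool normally prunes words seen fewer than five times; whether the -sentence-vectors code path exempts labels is something I cannot confirm, so retraining with -min-count 1 may be a cheap experiment. Assuming you can get the vocabulary as a list of strings (with this package the loaded model appears to expose it as model.vocab, but treat that as an assumption), a hypothetical helper can count entries that look like document labels:

```python
def summarize_labels(vocab, prefix="_*"):
    """Count vocabulary entries that look like document labels such as '_*2599671'."""
    labels = [word for word in vocab if word.startswith(prefix)]
    return {"total": len(vocab), "doc_labels": len(labels), "examples": labels[:5]}
```

A doc_labels count of 0 would confirm that no label survived into the saved vocabulary.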

hi

Hi, I have a question about feasibility. As you know, the components of the vectors are features that indicate some type of context. I want to extract these features. My language is Chinese, but I don't know how to start. Can you give me an idea? Thank you.

How can I use google's word2vec?

I want to use your interface with Google's pre-trained word2vec vectors instead of the sample data.
So instead of:
model = word2vec.load('../data/text8.bin')
it would be:
model = word2vec.load('word2vec-google-news-300')
Is that possible?
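'word2vec-google-news-300' is the dataset name used by gensim's downloader, not a file path. The pre-trained GoogleNews vectors are distributed as GoogleNews-vectors-negative300.bin.gz in the original C tool's binary format, the same format text8.bin is written in with -binary 1, so passing a local copy of that file may work; I have not verified it with this package. For reference, a standalone reader sketch for that format (the function name is hypothetical). The full file holds 3 million 300-dimensional vectors, about 3.6 GB as float32, hence the limit argument:

```python
import gzip

import numpy as np


def read_word2vec_bin(path, limit=None):
    """Read vectors saved in the original word2vec C binary format.

    Layout: an ASCII header line "<vocab_size> <dim>", then for each word its
    UTF-8 bytes, one space, and <dim> little-endian float32 values, usually
    followed by a newline. Reads at most `limit` words if given.
    """
    opener = gzip.open if path.endswith(".gz") else open
    words, vectors = [], []
    with opener(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        if limit is not None:
            vocab_size = min(vocab_size, limit)
        for _ in range(vocab_size):
            chars = bytearray()
            while True:
                ch = f.read(1)
                if ch in (b" ", b""):
                    break
                if ch != b"\n":  # skip the newline left after the previous vector
                    chars.extend(ch)
            words.append(chars.decode("utf-8", errors="replace"))
            vectors.append(np.frombuffer(f.read(4 * dim), dtype="<f4"))
    if not vectors:
        return words, np.empty((0, dim), dtype="<f4")
    return words, np.vstack(vectors)
```

For example, `read_word2vec_bin("GoogleNews-vectors-negative300.bin.gz", limit=100000)` would load only the first 100,000 entries.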

unable to install word2vec on Ubuntu 18.04.1

I tried to install word2vec with and without the '-U' flag, using both pip and pip3, and keep getting the same error:

$ pip install word2vec
Collecting word2vec
  Using cached https://files.pythonhosted.org/packages/98/9c/0cc6019be231950235517c29d2d6a2fca76dfa75ad4162ccce22fb1b4364/word2vec-0.9.4.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-zG4Ed2/word2vec/setup.py", line 23, in <module>
        from Cython.Build import cythonize
    ImportError: No module named Cython.Build
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-zG4Ed2/word2vec/

Environment:

$ uname -a
Linux E7440 5.0.0-23-generic #24~18.04.1-Ubuntu SMP Mon Jul 29 16:12:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ python --version
Python 2.7.15+

$ python3 --version
Python 3.6.8
