The mittens from roamanalytics

how to initialize vectors for words in corpus not in GloVe

Not so much an issue as a question.

Do you have any suggestions on the "best" way to initialize vectors for words in my small corpus that don't appear in GloVe? I'm assuming that I need to do this so that mittens can "retrofit" them. But I suspect that a simple np.random.rand(100) is probably not the best way to go.

Any suggestions would be much appreciated.
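
One possible approach, sketched here as an illustration rather than as what mittens itself does or requires: draw the missing vectors from a normal distribution whose mean and standard deviation match the pretrained vectors, so new words start on the same scale as the known ones (np.random.rand, by contrast, yields only values in [0, 1)). The names `fill_missing_vectors`, `vocab`, and `pretrained` are hypothetical.

```python
import numpy as np

def fill_missing_vectors(vocab, pretrained, dim, seed=0):
    """Return a dict holding a vector for every word in `vocab`.

    Words found in `pretrained` keep their vector; the others get a random
    vector drawn to match the overall mean and spread of the known vectors.
    """
    rng = np.random.RandomState(seed)
    known = np.array([pretrained[w] for w in vocab if w in pretrained])
    # Fall back to a small-variance default if no vocab word is covered.
    mean, std = (known.mean(), known.std()) if len(known) else (0.0, 0.1)
    filled = {}
    for w in vocab:
        if w in pretrained:
            filled[w] = np.asarray(pretrained[w], dtype=float)
        else:
            filled[w] = rng.normal(loc=mean, scale=std, size=dim)
    return filled

# Toy data: "zyzzyva" is missing from the pretrained dictionary.
vocab = ["cat", "dog", "zyzzyva"]
pretrained = {"cat": np.array([0.1, -0.2, 0.3]),
              "dog": np.array([0.0, 0.4, -0.1])}
vectors = fill_missing_vectors(vocab, pretrained, dim=3)
```

Whether this beats a plain random initialization for retrofitting is an empirical question; the sketch only shows the mechanics.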

Cannot run mittens with tensorflow 2.1

After installing tensorflow 2.1, I cannot run GloVe any more - the "fit" function gives the following errors:

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2020-03-01 17:54:01.126392: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-01 17:54:01.127263: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.127355: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.127418: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.128038: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.128072: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.128111: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.128134: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.128168: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.128184: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.129233: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-03-01 17:54:01.129268: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "/home/richard/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1367, in _do_call
return fn(*args)
File "/home/richard/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1352, in _run_fn
target_list, run_metadata)
File "/home/richard/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1445, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(4, 1), b.shape=(1, 4), m=4, n=4, k=1
[[{{node Tensordot_1/MatMul}}]]
[[Sum/_5]]
(1) Internal: Blas GEMM launch failed : a.shape=(4, 1), b.shape=(1, 4), m=4, n=4, k=1
[[{{node Tensordot_1/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/richard/Documents/AI/XCS224U/similarity_methods.py", line 28, in
embeddings = glove_model.fit(cooccurrence)
File "/home/richard/anaconda3/lib/python3.7/site-packages/mittens/mittens_base.py", line 240, in fit
X, fixed_initialization=fixed_initialization)
File "/home/richard/anaconda3/lib/python3.7/site-packages/mittens/mittens_base.py", line 84, in fit
fixed_initialization=fixed_initialization)
File "/home/richard/anaconda3/lib/python3.7/site-packages/mittens/tf_mittens.py", line 83, in _fit
self.log_coincidence: log_coincidence})
File "/home/richard/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 960, in run
run_metadata_ptr)
File "/home/richard/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1183, in _run
feed_dict_tensor, options, run_metadata)
File "/home/richard/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1361, in _do_run
run_metadata)
File "/home/richard/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1386, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(4, 1), b.shape=(1, 4), m=4, n=4, k=1
[[node Tensordot_1/MatMul (defined at /anaconda3/lib/python3.7/site-packages/mittens/tf_mittens.py:151) ]]
[[Sum/_5]]
(1) Internal: Blas GEMM launch failed : a.shape=(4, 1), b.shape=(1, 4), m=4, n=4, k=1
[[node Tensordot_1/MatMul (defined at /anaconda3/lib/python3.7/site-packages/mittens/tf_mittens.py:151) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'Tensordot_1/MatMul':
File "/Documents/AI/XCS224U/similarity_methods.py", line 28, in
embeddings = glove_model.fit(cooccurrence)
File "/anaconda3/lib/python3.7/site-packages/mittens/mittens_base.py", line 240, in fit
X, fixed_initialization=fixed_initialization)
File "/anaconda3/lib/python3.7/site-packages/mittens/mittens_base.py", line 84, in fit
fixed_initialization=fixed_initialization)
File "/anaconda3/lib/python3.7/site-packages/mittens/tf_mittens.py", line 56, in _fit
self._build_graph(vocab, initial_embedding_dict)
File "/anaconda3/lib/python3.7/site-packages/mittens/tf_mittens.py", line 151, in _build_graph
tf.tensordot(self.bw, tf.transpose(self.ones), axes=1) +
File "/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 4106, in tensordot
ab_matmul = matmul(a_reshape, b_reshape)
File "/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2798, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 5626, in mat_mul
name=name)
File "/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 742, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3322, in _create_op_internal
op_def=op_def)
File "/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1756, in init
self._traceback = tf_stack.extract_stack()

how to use mittens for more than 20k vocab ?

I am reading through the code to understand what changes would allow mittens to handle a vocabulary larger than 20k. Could you describe the approach, or point to the part of the code that would need to be patched to remove that limitation? Any other explanation would also be helpful.

Suggestion on how to generate the cooccurrence matrix

Not an issue, but I was wondering: do you have any suggestion for a library to generate the matrix (and possibly the vocab), or for a tutorial?

I have a fairly large corpus of roughly 4k short documents, and I cannot use the script from the original project since I'm on Windows.
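
For a corpus of this size, a window-based co-occurrence count can be written in plain Python and NumPy, which also runs on Windows. The following is a minimal sketch, not the original project's script: `build_cooccurrence` is a hypothetical helper, tokenization is naive whitespace splitting, and the 1/distance weighting is an assumption modeled on the distance weighting used by GloVe.

```python
from collections import Counter
import numpy as np

def build_cooccurrence(docs, window=2, min_count=1):
    """Build a symmetric word-word co-occurrence matrix from tokenized docs."""
    counts = Counter(tok for doc in docs for tok in doc)
    vocab = sorted(w for w, c in counts.items() if c >= min_count)
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)), dtype=np.float64)
    for doc in docs:
        ids = [index[t] for t in doc if t in index]
        for pos, i in enumerate(ids):
            # Look only to the right; add both directions to keep X symmetric.
            for offset in range(1, window + 1):
                if pos + offset >= len(ids):
                    break
                j = ids[pos + offset]
                weight = 1.0 / offset  # closer words contribute more
                X[i, j] += weight
                X[j, i] += weight
    return X, vocab

# Toy corpus with naive whitespace tokenization.
docs = [d.lower().split() for d in ["the cat sat", "the dog sat"]]
X, vocab = build_cooccurrence(docs, window=2)
```

The returned `X` is a dense array whose row order matches `vocab`, so the two can be kept together when fitting. Raising `min_count` is one way to keep the vocabulary, and therefore the dense matrix, small.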

Scale Glove implementation to million vocab size

Hi, thank you for making a TF implementation of GloVe available. Is there any plan to scale the implementation to handle tens of billions of tokens or a vocabulary of a million words? Thanks!

Is there a way to batch train GLoVe models?

I have a gigantic corpus of text to train on, which leads to memory issues.
I am wondering whether there is a way to train GloVe and/or Mittens models in batches, similar to partial_fit in some scikit-learn models.

TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions

When trying to run fit() with a mittens model, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-814206757e5a> in <module>()
      2     cooccur_matrix,
      3     vocab=vocab,
----> 4     initial_embedding_dict=vector_dict
      5 )

~/anaconda3/envs/mittens/lib/python3.6/site-packages/mittens/mittens_base.py in fit(self, X, vocab, initial_embedding_dict, fixed_initialization)
     78             X, vocab, initial_embedding_dict
     79         )
---> 80         weights, log_coincidence = self._initialize(X)
     81         return self._fit(X, weights, log_coincidence,
     82                          vocab=vocab,

~/anaconda3/envs/mittens/lib/python3.6/site-packages/mittens/mittens_base.py in _initialize(self, coincidence)
    144         bounded = np.minimum(coincidence, self.xmax)
    145         weights = (bounded / float(self.xmax)) ** self.alpha
--> 146         log_coincidence = log_of_array_ignoring_zeros(coincidence)
    147         return weights, log_coincidence
    148 

~/anaconda3/envs/mittens/lib/python3.6/site-packages/mittens/mittens_base.py in log_of_array_ignoring_zeros(M)
    258     log_M = M.copy()
    259     mask = log_M > 0
--> 260     log_M[mask] = np.log(log_M[mask])
    261     return log_M
    262 

TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions

It appears that the indexing can't occur with a 2-dimensional array, but the input to this method is the cooccurrence matrix which has to be 2D, correct?

Note: I'm using numpy 1.14.5

how to use sparse matrix with mittens

I stored the co-occurrence matrix in MatrixMarket format and read it into Python with mmread(). Do I have to convert it to a dense matrix (which is impossible because of memory limits), or can Mittens handle this format directly?

Make Mittens Deterministic

At present, mittens is not reproducible because of calls to np.random.seed(None) in the function for generating random matrices. This is a nuisance for testing or research reproducibility. I'm still not 100% sure I'm going to use mittens in my current project, but if I do, I will send a pull request to fix this.
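
As an illustration of the kind of change being proposed, a random-matrix helper can take an explicit seed and use a local generator instead of reseeding the global NumPy state. This is a sketch only: `random_matrix` and its value range are hypothetical stand-ins, not the library's actual function.

```python
import numpy as np

def random_matrix(m, n, seed=None):
    """Hypothetical helper: uniform values in [-0.5, 0.5), reproducible when seeded."""
    # A local RandomState leaves the global np.random state untouched.
    rng = np.random.RandomState(seed)
    return rng.uniform(-0.5, 0.5, size=(m, n))

a = random_matrix(3, 2, seed=42)
b = random_matrix(3, 2, seed=42)  # same seed, same matrix
```

With `seed=None` the behaviour stays non-deterministic, so existing callers would not be affected.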

process finished with exit code 137 (interrupted by signal 9 sigkill)

Hi,
I am trying to run a GloVe model on a Hyper-V virtual machine.
My matrix is composed of ~17K different tokens.
I ran the following code:
glove_model = GloVe(n=100, max_iter=10)
embeddings = glove_model.fit(cooccurrence)

and I got this error:
process finished with exit code 137 (interrupted by signal: 9 sigkill)

Can someone explain how I can fix it?
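
Exit code 137 is 128 + 9, meaning the process was killed with SIGKILL; on Linux this is commonly the kernel's out-of-memory killer, although that is an assumption about this particular machine. A rough estimate of the dense-matrix footprint helps check whether memory is the likely cause. The helper name `dense_matrix_gib` is made up for this sketch.

```python
def dense_matrix_gib(vocab_size, bytes_per_value=8):
    """Size in GiB of one dense vocab_size x vocab_size matrix."""
    return vocab_size ** 2 * bytes_per_value / 2 ** 30

one_float64 = dense_matrix_gib(17_000)                     # about 2.15 GiB
one_float32 = dense_matrix_gib(17_000, bytes_per_value=4)  # about 1.08 GiB
```

The tracebacks in other issues on this page show `_initialize` building `weights` and `log_coincidence` arrays with the same shape as the input, so several matrices of this size can be alive at once. If the virtual machine has only a few GiB of RAM, giving it more memory or reducing the vocabulary are the obvious things to try.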

Training epochs loss

I fine-tuned mittens using the Stanford GloVe embeddings on my review dataset. After I prepared my co-occurrence matrix, the vocabulary size was 43,933. Given the capacity of my computer, I therefore fine-tuned in two parts:

used 22000 of the initial vocab as a first pass to fine-tune the embeddings, and
used the remaining vocab data in a second pass.

The strange thing I observe is that in the first pass the error over 1000 iterations fell from approximately 91000 to approximately 30000, but in the second pass over 1000 iterations the error ranged between 95 and approximately 0.79.

I am confused by this behaviour because both passes had almost the same amount of data. I would like to know why this is happening.

Is this good or bad? If it is bad, how can I fix it?

Inconsistent results - Mittens VS standard Glove

I am using mittens with a pre-built cooccurrence matrix of domains with the hopes of clustering certain domains that are thematically related, close to each other. Using the non-vectorized glove implementation from https://github.com/stanfordnlp/GloVe, I get very strong results. The current initialization is:

glove_model = Glove(no_components=50, learning_rate=0.03)
glove_model.fit(coo_matrix(matrix, dtype=float), epochs=50, no_threads=64, verbose=True)

Finding the nearest domains to nintendo using cosine distance yields good results.

find_nearest(glove_model, "nintendo", 10)

[('game', 0.955347329117499),
('zavvi', 0.9382098190168783),
('eurogamer', 0.9296358002057901),
('playstation', 0.9290108695965159),
('gamespot', 0.9241452666014682),
('gamesradar', 0.9210470827690169),
('365games', 0.9193152241566838),
('ign', 0.9178656620515147),
('ea', 0.912055674280889),
('forbiddenplanet', 0.9118661547211797)]

Given these results, I wanted to use mittens for two reasons: take advantage of the vectorized implementation for speed, and harness the ability to extend glove into a retrofitted model. However, when I used a basic mittens (without retrofitting existing embeddings), the results come out quite poor, even when the same hyperparameters are used.

glove_mittens_50_50 = GloVe(n=50, max_iter=50, learning_rate=0.03)
cooccurance = np.array(matrix.todense()) # was sparse matrix for original glove
glove_mittens_trained_50_50 = glove_mittens_50_50.fit(cooccurance)

I built a pd dataframe with the resulting numpy matrix and incorporated the domains as the index before writing a function that would calculate the cosine distance in the same way that the original glove model does.

find_nearest(mittens_glove_df_50_50, "nintendo", 10)

[('hmrc', 0.9992567),
('anglingdirect', 0.999141),
('axa', 0.99907136),
('greatist', 0.99906415),
('techadvisor', 0.99906313),
('victorianplumbing', 0.99903136),
('dell', 0.9990228),
('imore', 0.99899846),
('carpetright', 0.99899185)]

As you can see, the results are not at all as expected. Furthermore, while the original glove model will have converged and not change much (only very slightly) by increasing the number of iterations, the vectorized glove in this package will.

find_nearest(mittens_glove_df_50_100, "github", 10)  # 100 iterations

[('yammer', 0.9993163),
('twitch', 0.9992425),
('axs', 0.9992203),
('rottentomatoes', 0.99920493),
('travelsupermarket', 0.99919695),
('lbc', 0.99919665),
('motors', 0.99918556),
('goodreads', 0.9991843),
('deezer', 0.9991767),
('nationalexpress', 0.99917376)]

Is there a reason why this is the case? Am I doing anything wrong, or is there anything else you'd like me to try?

Thanks.

TypeError: exponent must be an integer

When trying to build the mittens model, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-814206757e5a> in <module>()
      2     cooccur_matrix,
      3     vocab=vocab,
----> 4     initial_embedding_dict=vector_dict
      5 )

~/anaconda3/envs/mittens/lib/python3.6/site-packages/mittens/mittens_base.py in fit(self, X, vocab, initial_embedding_dict, fixed_initialization)
     78             X, vocab, initial_embedding_dict
     79         )
---> 80         weights, log_coincidence = self._initialize(X)
     81         return self._fit(X, weights, log_coincidence,
     82                          vocab=vocab,

~/anaconda3/envs/mittens/lib/python3.6/site-packages/mittens/mittens_base.py in _initialize(self, coincidence)
    143         self.n_words = coincidence.shape[0]
    144         bounded = np.minimum(coincidence, self.xmax)
--> 145         weights = (bounded / float(self.xmax)) ** self.alpha
    146         log_coincidence = log_of_array_ignoring_zeros(coincidence)
    147         return weights, log_coincidence

~/anaconda3/envs/mittens/lib/python3.6/site-packages/numpy/matrixlib/defmatrix.py in __pow__(self, other)
    320 
    321     def __pow__(self, other):
--> 322         return matrix_power(self, other)
    323 
    324     def __ipow__(self, other):

~/anaconda3/envs/mittens/lib/python3.6/site-packages/numpy/matrixlib/defmatrix.py in matrix_power(M, n)
    139         raise ValueError("input must be a square array")
    140     if not issubdtype(type(n), N.integer):
--> 141         raise TypeError("exponent must be an integer")
    142 
    143     from numpy.linalg import inv

TypeError: exponent must be an integer

I was able to identify that the error is because self.alpha is set to 0.75. If I set that to 1.0 I do not get the error (though I will obviously not be weighting things as intended).

Note: I am using numpy 1.14.5
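
The `numpy/matrixlib/defmatrix.py` frames in the traceback suggest that the co-occurrence input was a `numpy.matrix` (which is what `.todense()` on a SciPy sparse matrix returns), for which `**` means matrix power rather than an elementwise power. A minimal sketch of the likely workaround, assuming that diagnosis is right, is to convert to a plain `ndarray` before calling `fit`. The toy values below are made up; `xmax` and `alpha` mirror the names in the traceback.

```python
import numpy as np

as_matrix = np.matrix([[4.0, 0.0], [1.0, 16.0]])  # e.g. the result of .todense()
as_array = np.asarray(as_matrix)                   # plain ndarray, same values

xmax, alpha = 100.0, 0.75
# On an ndarray, ** is elementwise, so a fractional alpha is fine.
weights = (np.minimum(as_array, xmax) / xmax) ** alpha

# Boolean-mask assignment, as in log_of_array_ignoring_zeros, also works.
log_counts = as_array.copy()
mask = log_counts > 0
log_counts[mask] = np.log(log_counts[mask])
```

The same conversion may also resolve the boolean-indexing TypeError reported in the other issue from the same environment, since indexing a `numpy.matrix` with a mask yields a 2-dimensional result.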

Memory Error

I get a memory error while converting corpus.matrix (the co-occurrence matrix) to a numpy array, because my data is quite large.

Is it necessary to convert the co-occurrence matrix to a numpy array? Can we not work with a sparse matrix?

What other solutions can you suggest?

Save mittens object in the tensorflow implementation.

If I try saving the trained model (GloVe object) with pickle, it fails because I used the tensorflow implementation. How should I save it?

glove = GloVe(max_iter=self.max_iter, n=self.embedding_dim, learning_rate=self.eta)
G = glove.fit(data)
trained_model = glove

with open(model_path, "w") as f:
    pickle.dump(self.trained_model, f)

File "glove_vectorizer.py", line 107, in save_model
with open(model_path, "w") as f:
_pickle.PicklingError: Can't pickle <class 'module'>: attribute lookup module on builtins failed
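
One workaround, assuming only the learned vectors are needed later and not the model object itself, is to save the array returned by `fit` together with the vocabulary. The sketch below uses a random stand-in for that array and a hypothetical `vocab` list; the file names are arbitrary.

```python
import os
import tempfile
import numpy as np

# Stand-in for the array returned by glove.fit(data): one row per word.
G = np.random.RandomState(0).normal(size=(5, 4)).astype(np.float32)
vocab = ["a", "b", "c", "d", "e"]  # hypothetical vocabulary, same row order

out_dir = tempfile.mkdtemp()
np.save(os.path.join(out_dir, "embeddings.npy"), G)
with open(os.path.join(out_dir, "vocab.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(vocab))

# Reload later without needing TensorFlow or the model object.
restored = np.load(os.path.join(out_dir, "embeddings.npy"))
with open(os.path.join(out_dir, "vocab.txt"), encoding="utf-8") as f:
    restored_vocab = f.read().split("\n")
```

Separately, note that pickle writes bytes, so a file opened for `pickle.dump` needs mode "wb" rather than "w".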

what's meaning of the embeddings from glove_model.fit?

This is my first contact with GloVe. I built the co-occurrence matrix and trained on it with your code, but I don't know how to use what I get back. What is the meaning of the embeddings? Can you recommend a tutorial or give me some explanation, or tell me the next step I should take?
I have a vocabulary of 7180 words, so my co-occurrence matrix is 7180 * 7180. The embeddings matrix I get is 7180 * 100. What does the 100 mean?
glove_model = GloVe(max_iter=1000)
embeddings = glove_model.fit(cooccurrence)
output:
array([[ 0.5545428 ,  0.23376928, -0.07426096, ...,  0.990664  , -0.6490942 ,  0.6620429 ],
       [ 0.8841677 ,  0.51804036,  0.04785374, ...,  0.68058044, -0.90760165,  0.509221  ],
       [ 0.20097731, -0.14931226, -0.3834525 , ...,  0.46705124, -0.2532921 ,  0.036834  ],
       ...,
       [-0.11915646, -0.028824  , -0.05225999, ..., -0.14990021,  0.05760989, -0.12905821],
       [-0.14854796, -0.02987392,  0.02080684, ..., -0.09068809,  0.1080381 , -0.09017138],
       [-0.10357033, -0.08430145, -0.03921192, ..., -0.1640319 ,  0.05499419, -0.09780643]], dtype=float32)
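
For orientation: the output has one row per vocabulary word, in the same order as the rows of the co-occurrence matrix, and the 100 columns are the dimensions of each word's vector (the `n` parameter seen in other issues on this page, e.g. `GloVe(n=100)`). A common next step is to compare words by the cosine similarity of their rows. The sketch below uses a tiny made-up vocabulary and matrix; `nearest` is a hypothetical helper, not part of mittens.

```python
import numpy as np

def nearest(word, vocab, embeddings, k=3):
    """Return the k words whose vectors have the highest cosine similarity to `word`."""
    # Normalize rows to unit length so a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(word)]
    order = np.argsort(-sims)  # indices from most to least similar
    return [(vocab[i], float(sims[i])) for i in order if vocab[i] != word][:k]

# Toy 3-word vocabulary with 3-dimensional vectors.
vocab = ["king", "queen", "apple"]
embeddings = np.array([[1.0, 0.9, 0.0],
                       [0.9, 1.0, 0.1],
                       [0.0, 0.1, 1.0]])
neighbors = nearest("king", vocab, embeddings, k=2)
```

With real embeddings, `vocab` would be the list of 7180 words in the order used to build the co-occurrence matrix.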

typeError with tf_mittens.py, line 168

I am trying to use mittens to fit embeddings for a target domain, but I get the following error:

new_embed = mittens_model.fit(
... comatrix,
... vocab=id2word_cooc,
... initial_embedding_dict= old_embed)

Traceback (most recent call last):
File "", line 4, in
File "/home/clin/env/local/lib/python2.7/site-packages/mittens/mittens_base.py", line 84, in fit
fixed_initialization=fixed_initialization)
File "/home/clin/env/local/lib/python2.7/site-packages/mittens/tf_mittens.py", line 61, in _fit
self.cost = self._get_cost_function()
File "/home/clin/env/local/lib/python2.7/site-packages/mittens/tf_mittens.py", line 168, in _get_cost_function
if self.mittens > 0:
File "/home/clin/env/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 542, in nonzero
raise TypeError("Using a tf.Tensor as a Python bool is not allowed. "
TypeError: Using a tf.Tensor as a Python bool is not allowed. Use if t is not None: instead of if t: to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
