benfred / implicit
Fast Python Collaborative Filtering for Implicit Feedback Datasets
Home Page: https://benfred.github.io/implicit/
License: MIT License
Hi Ben,
I am trying to modify your code to work with the regular ALS matrix factorization algorithm (for sparse matrices).
This code seems to be working for now.
However, could you please take a look and verify the correctness of the proposed changes?
def least_squares(Cui, X, Y, regularization, num_threads=0):
    users, factors = X.shape
    E = np.eye(factors)
    for u in range(users):
        A = np.zeros(shape=(factors, factors))
        b = np.zeros(factors)
        # confidence = 1, Pu = confidence
        for i, confidence in nonzeros(Cui, u):
            factor = Y[i]
            A += np.outer(factor, factor)
            b += 1 * factor * confidence
        A += regularization * E
        X[u] = np.linalg.solve(A, b)
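For comparison, the per-user step in the loop above has the shape of a regularized least-squares solve restricted to the items the user has rated: (Yu^T Yu + regularization * I) x = Yu^T r. The following is only a vectorized sketch of that one step under that reading; `solve_user`, `item_ids` and `ratings` are names invented here and are not part of the library.

```python
import numpy as np

def solve_user(Y, item_ids, ratings, regularization):
    """Regularized least squares for one user over the rated items only."""
    Yu = Y[item_ids]                                   # factors of the rated items
    A = Yu.T @ Yu + regularization * np.eye(Y.shape[1])
    b = Yu.T @ ratings                                 # sum of factor * rating
    return np.linalg.solve(A, b)
```

Comparing its output with the loop version on a small random matrix is a cheap way to check that the two agree.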
Hey, thanks for putting together this package. I'm encountering a C compiling-related error when I try to install it on Ubuntu. I've checked that gcc is installed. Any thoughts on what might be going on?
Thanks,
Collecting implicit==0.2.6 (from -r requirements.txt (line 67))
Downloading implicit-0.2.6.tar.gz (260kB)
100% || 266kB 2.6MB/s
Complete output from command python setup.py egg_info:
Error compiling Cython file:
------------------------------------------------------------
...
from cython.parallel import parallel, prange
from libc.stdlib cimport malloc, free
from libc.string cimport memcpy
# requires scipy v0.16
cimport scipy.linalg.cython_lapack as cython_lapack
^
------------------------------------------------------------
implicit/_als.pyx:9:8: 'scipy/linalg/cython_lapack.pxd' not found
Error compiling Cython file:
------------------------------------------------------------
...
from libc.stdlib cimport malloc, free
from libc.string cimport memcpy
# requires scipy v0.16
cimport scipy.linalg.cython_lapack as cython_lapack
cimport scipy.linalg.cython_blas as cython_blas
^
------------------------------------------------------------
implicit/_als.pyx:10:8: 'scipy/linalg/cython_blas.pxd' not found
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void axpy(int * n, floating * da, floating * dx, int * incx, floating * dy,
int * incy) nogil:
if floating is double:
cython_blas.daxpy(n, da, dx, incx, dy, incy)
else:
cython_blas.saxpy(n, da, dx, incx, dy, incy)
^
------------------------------------------------------------
implicit/_als.pyx:18:19: cimported module has no attribute 'saxpy'
Error compiling Cython file:
------------------------------------------------------------
...
# lapack/blas wrappers for cython fused types
cdef inline void axpy(int * n, floating * da, floating * dx, int * incx, floating * dy,
int * incy) nogil:
if floating is double:
cython_blas.daxpy(n, da, dx, incx, dy, incy)
^
------------------------------------------------------------
implicit/_als.pyx:16:19: cimported module has no attribute 'daxpy'
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void symv(char *uplo, int *n, floating *alpha, floating *a, int *lda, floating *x,
int *incx, floating *beta, floating *y, int *incy) nogil:
if floating is double:
cython_blas.dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
else:
cython_blas.ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
^
------------------------------------------------------------
implicit/_als.pyx:25:19: cimported module has no attribute 'ssymv'
Error compiling Cython file:
------------------------------------------------------------
...
cython_blas.saxpy(n, da, dx, incx, dy, incy)
cdef inline void symv(char *uplo, int *n, floating *alpha, floating *a, int *lda, floating *x,
int *incx, floating *beta, floating *y, int *incy) nogil:
if floating is double:
cython_blas.dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
^
------------------------------------------------------------
implicit/_als.pyx:23:19: cimported module has no attribute 'dsymv'
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline floating dot(int *n, floating *sx, int *incx, floating *sy, int *incy) nogil:
if floating is double:
return cython_blas.ddot(n, sx, incx, sy, incy)
else:
return cython_blas.sdot(n, sx, incx, sy, incy)
^
------------------------------------------------------------
implicit/_als.pyx:31:26: cimported module has no attribute 'sdot'
Error compiling Cython file:
------------------------------------------------------------
...
else:
cython_blas.ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
cdef inline floating dot(int *n, floating *sx, int *incx, floating *sy, int *incy) nogil:
if floating is double:
return cython_blas.ddot(n, sx, incx, sy, incy)
^
------------------------------------------------------------
implicit/_als.pyx:29:26: cimported module has no attribute 'ddot'
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void scal(int *n, floating *sa, floating *sx, int *incx) nogil:
if floating is double:
cython_blas.dscal(n, sa, sx, incx)
else:
cython_blas.sscal(n, sa, sx, incx)
^
------------------------------------------------------------
implicit/_als.pyx:37:19: cimported module has no attribute 'sscal'
Error compiling Cython file:
------------------------------------------------------------
...
else:
return cython_blas.sdot(n, sx, incx, sy, incy)
cdef inline void scal(int *n, floating *sa, floating *sx, int *incx) nogil:
if floating is double:
cython_blas.dscal(n, sa, sx, incx)
^
------------------------------------------------------------
implicit/_als.pyx:35:19: cimported module has no attribute 'dscal'
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void posv(char * u, int * n, int * nrhs, floating * a, int * lda, floating * b,
int * ldb, int * info) nogil:
if floating is double:
cython_lapack.dposv(u, n, nrhs, a, lda, b, ldb, info)
else:
cython_lapack.sposv(u, n, nrhs, a, lda, b, ldb, info)
^
------------------------------------------------------------
implicit/_als.pyx:44:21: cimported module has no attribute 'sposv'
Error compiling Cython file:
------------------------------------------------------------
...
cython_blas.sscal(n, sa, sx, incx)
cdef inline void posv(char * u, int * n, int * nrhs, floating * a, int * lda, floating * b,
int * ldb, int * info) nogil:
if floating is double:
cython_lapack.dposv(u, n, nrhs, a, lda, b, ldb, info)
^
------------------------------------------------------------
implicit/_als.pyx:42:21: cimported module has no attribute 'dposv'
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void gesv(int * n, int * nrhs, floating * a, int * lda, int * piv, floating * b,
int * ldb, int * info) nogil:
if floating is double:
cython_lapack.dgesv(n, nrhs, a, lda, piv, b, ldb, info)
else:
cython_lapack.sgesv(n, nrhs, a, lda, piv, b, ldb, info)
^
------------------------------------------------------------
implicit/_als.pyx:51:21: cimported module has no attribute 'sgesv'
Error compiling Cython file:
------------------------------------------------------------
...
cython_lapack.sposv(u, n, nrhs, a, lda, b, ldb, info)
cdef inline void gesv(int * n, int * nrhs, floating * a, int * lda, int * piv, floating * b,
int * ldb, int * info) nogil:
if floating is double:
cython_lapack.dgesv(n, nrhs, a, lda, piv, b, ldb, info)
^
------------------------------------------------------------
implicit/_als.pyx:49:21: cimported module has no attribute 'dgesv'
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void axpy(int * n, floating * da, floating * dx, int * incx, floating * dy,
int * incy) nogil:
if floating is double:
cython_blas.daxpy(n, da, dx, incx, dy, incy)
else:
cython_blas.saxpy(n, da, dx, incx, dy, incy)
^
------------------------------------------------------------
implicit/_als.pyx:18:25: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
# lapack/blas wrappers for cython fused types
cdef inline void axpy(int * n, floating * da, floating * dx, int * incx, floating * dy,
int * incy) nogil:
if floating is double:
cython_blas.daxpy(n, da, dx, incx, dy, incy)
^
------------------------------------------------------------
implicit/_als.pyx:16:25: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void symv(char *uplo, int *n, floating *alpha, floating *a, int *lda, floating *x,
int *incx, floating *beta, floating *y, int *incy) nogil:
if floating is double:
cython_blas.dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
else:
cython_blas.ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
^
------------------------------------------------------------
implicit/_als.pyx:25:25: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
cython_blas.saxpy(n, da, dx, incx, dy, incy)
cdef inline void symv(char *uplo, int *n, floating *alpha, floating *a, int *lda, floating *x,
int *incx, floating *beta, floating *y, int *incy) nogil:
if floating is double:
cython_blas.dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
^
------------------------------------------------------------
implicit/_als.pyx:23:25: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline floating dot(int *n, floating *sx, int *incx, floating *sy, int *incy) nogil:
if floating is double:
return cython_blas.ddot(n, sx, incx, sy, incy)
else:
return cython_blas.sdot(n, sx, incx, sy, incy)
^
------------------------------------------------------------
implicit/_als.pyx:31:31: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
else:
cython_blas.ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
cdef inline floating dot(int *n, floating *sx, int *incx, floating *sy, int *incy) nogil:
if floating is double:
return cython_blas.ddot(n, sx, incx, sy, incy)
^
------------------------------------------------------------
implicit/_als.pyx:29:31: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void scal(int *n, floating *sa, floating *sx, int *incx) nogil:
if floating is double:
cython_blas.dscal(n, sa, sx, incx)
else:
cython_blas.sscal(n, sa, sx, incx)
^
------------------------------------------------------------
implicit/_als.pyx:37:25: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
else:
return cython_blas.sdot(n, sx, incx, sy, incy)
cdef inline void scal(int *n, floating *sa, floating *sx, int *incx) nogil:
if floating is double:
cython_blas.dscal(n, sa, sx, incx)
^
------------------------------------------------------------
implicit/_als.pyx:35:25: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void posv(char * u, int * n, int * nrhs, floating * a, int * lda, floating * b,
int * ldb, int * info) nogil:
if floating is double:
cython_lapack.dposv(u, n, nrhs, a, lda, b, ldb, info)
else:
cython_lapack.sposv(u, n, nrhs, a, lda, b, ldb, info)
^
------------------------------------------------------------
implicit/_als.pyx:44:27: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
cython_blas.sscal(n, sa, sx, incx)
cdef inline void posv(char * u, int * n, int * nrhs, floating * a, int * lda, floating * b,
int * ldb, int * info) nogil:
if floating is double:
cython_lapack.dposv(u, n, nrhs, a, lda, b, ldb, info)
^
------------------------------------------------------------
implicit/_als.pyx:42:27: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void gesv(int * n, int * nrhs, floating * a, int * lda, int * piv, floating * b,
int * ldb, int * info) nogil:
if floating is double:
cython_lapack.dgesv(n, nrhs, a, lda, piv, b, ldb, info)
else:
cython_lapack.sgesv(n, nrhs, a, lda, piv, b, ldb, info)
^
------------------------------------------------------------
implicit/_als.pyx:51:27: Calling gil-requiring function not allowed without gil
Error compiling Cython file:
------------------------------------------------------------
...
cython_lapack.sposv(u, n, nrhs, a, lda, b, ldb, info)
cdef inline void gesv(int * n, int * nrhs, floating * a, int * lda, int * piv, floating * b,
int * ldb, int * info) nogil:
if floating is double:
cython_lapack.dgesv(n, nrhs, a, lda, piv, b, ldb, info)
^
------------------------------------------------------------
implicit/_als.pyx:49:27: Calling gil-requiring function not allowed without gil
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-5s7y1gnq/implicit/setup.py", line 111, in <module>
ext_modules=define_extensions(use_cython),
File "/tmp/pip-build-5s7y1gnq/implicit/setup.py", line 47, in define_extensions
return cythonize(modules)
File "/home/rof/.pyenv/versions/3.5.4/lib/python3.5/site-packages/Cython/Build/Dependencies.py", line 1039, in cythonize
cythonize_one(*args)
File "/home/rof/.pyenv/versions/3.5.4/lib/python3.5/site-packages/Cython/Build/Dependencies.py", line 1161, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: implicit/_als.pyx
implicit/_als.pyx: cannot find cimported module 'scipy.linalg.cython_lapack'
implicit/_als.pyx: cannot find cimported module 'scipy.linalg.cython_blas'
Compiling implicit/_als.pyx because it depends on /home/rof/.pyenv/versions/3.5.4/lib/python3.5/site-packages/Cython/Includes/libc/string.pxd.
Compiling implicit/_nearest_neighbours.pyx because it depends on /home/rof/.pyenv/versions/3.5.4/lib/python3.5/site-packages/Cython/Includes/libcpp/vector.pxd.
[1/2] Cythonizing implicit/_als.pyx
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5s7y1gnq/implicit/
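The first two errors ('scipy/linalg/cython_lapack.pxd' not found) suggest that Cython could not see a suitable SciPy when setup.py ran, and the later "no attribute" and GIL errors look like consequences of that. The comment in the quoted source says SciPy 0.16 is required. A minimal pre-install check, as a sketch: the helper names `leading_version` and `scipy_ok_for_build` are made up for illustration, and `importlib.metadata` needs Python 3.8 or newer.

```python
import re
import importlib.metadata

MIN_SCIPY = (0, 16)  # from the "# requires scipy v0.16" comment in _als.pyx

def leading_version(text):
    """Turn '0.19.1' into (0, 19); only the leading major.minor is used."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return int(match.group(1)), int(match.group(2))

def scipy_ok_for_build():
    """True if SciPy is installed in this environment and is at least 0.16."""
    try:
        installed = importlib.metadata.version("scipy")
    except importlib.metadata.PackageNotFoundError:
        return False
    return leading_version(installed) >= MIN_SCIPY

print(scipy_ok_for_build())
```

If this prints False in the environment where pip runs, installing or upgrading SciPy before implicit may help; that has not been verified against the reporter's setup.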
The als.recommend function takes an int for userid and a matrix for user_items. I pass item_user_data.T, which is a coo_matrix, but a coo_matrix cannot be indexed by row, so the call fails for me:
In [46]: ratings
Out[46]:
<635810x14744082 sparse matrix of type '<class 'numpy.float64'>'
with 115307196 stored elements in COOrdinate format>
In [47]: model.recommend(5, ratings.T)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-46-2d2962f49703> in <module>()
----> 1 model.recommend(5, ratings.T)
/Users/ml/lib/python3.5/site-packages/implicit/als.py in recommend(self, userid, user_items, N)
87
88 # calcualte the top N items, removing the users own liked items from the results
---> 89 liked = set(user_items[userid].indices)
90 count = N + len(liked)
91 if count < len(scores):
TypeError: 'coo_matrix' object does not support indexing
If not a coo_matrix what should I pass to the recommend function?
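The traceback shows `user_items[userid].indices`, which is row indexing followed by the `.indices` attribute, both of which a CSR matrix provides. Converting with `tocsr()` before the call is therefore a likely fix. A small sketch with toy data (the values are made up):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy item-by-user matrix in COO format: 3 items, 2 users.
item_users = coo_matrix(
    (np.array([1.0, 2.0, 3.0]), (np.array([0, 1, 2]), np.array([0, 0, 1])))
)

# Transpose to user-by-item, then convert to CSR so that rows can be indexed.
user_items = item_users.T.tocsr()
liked = set(user_items[0].indices)  # item indices user 0 has interacted with
```

With the data from the report, that would be `model.recommend(5, ratings.T.tocsr())`.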
OS: Windows 10
Python Version: 3.5.2
Cython Version: 0.26
scipy Version: 0.19.1
pip install implicit as well as python setup.py install result in errors:
gcc: error: /O2: No such file or directory
gcc: error: /openmp: No such file or directory
I thought openmp was only relevant for OSX. Any suggestions?
Is it a good idea to stop ALS early based on some criterion (RMSE, etc.) measured on a validation dataset? The paper uses probe datasets as the validation set and stops iterating once the RMSE is less than 1e-9.
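One common way to phrase such a rule is to stop when the validation RMSE has stopped improving, rather than when it falls below an absolute threshold. The sketch below is a generic early-stopping check, not something the library provides; `should_stop`, `patience` and `min_delta` are hypothetical names.

```python
def should_stop(val_rmse_history, patience=3, min_delta=1e-4):
    """True once the last `patience` checks failed to beat the earlier best by min_delta."""
    if len(val_rmse_history) <= patience:
        return False                       # not enough history yet
    best_before = min(val_rmse_history[:-patience])
    recent_best = min(val_rmse_history[-patience:])
    return recent_best > best_before - min_delta
```

The caller would append the validation RMSE after each iteration (or every few iterations) and break out of the training loop when the function returns True.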
I'm trying to install the package from sources with the following command:
python setup.py install
However, when I import implicit
I get the following error:
In [1]: import implicit
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-9eae7608d57c> in <module>()
----> 1 import implicit
/home/agrigorev/soft/implicit/implicit/__init__.py in <module>()
----> 1 from .implicit import alternating_least_squares
2
3 __version__ = '0.1.5'
4
5 __all__ = [alternating_least_squares, __version__]
/home/agrigorev/soft/implicit/implicit/implicit.py in <module>()
4 import os
5 import logging
----> 6 from . import _implicit
7
8 log = logging.getLogger("implicit")
ImportError: cannot import name '_implicit'
I assume this indicates that there was a problem with building the cython source.
What is the correct way of installing the library from sources? I'm running anaconda3 with python 3.5 under ubuntu.
Ben, thank you for a wonderful blog post on CUDA programming! I was playing with the latest version of implicit. While the package does build, I am running into this error: module 'implicit.cuda' has no attribute 'CuCSRMatrix'
It looks like this is due to the implicit.cuda.CuCSRMatrix call (and other related calls) at https://github.com/benfred/implicit/blob/master/implicit/als.py#L171. Changing the import statement to from implicit.cuda import _cuda and switching all of those calls to _cuda.CuCSRMatrix fixes the issue.
Note that https://github.com/benfred/implicit/blob/master/implicit/cuda/__init__.py#L3 doesn't seem to be doing what it's expected to do.
Hi,
Great package. Thanks for sharing it.
I just want to note that Cython is a requirement for this package, and I had problems installing it until I discovered that.
thanks.
Imri
It seems that the code expects a sparse (e.g. csr) input containing the confidences on each of the UI pairs. However, looking at the Hu et al. 2008 paper, it seems like there should still be non-zero confidence for items that are unobserved (r_ui = 0 in the paper's notation), such as from c_ui = 1 + alpha * r_ui. But, doing this would make the confidence matrix dense.
I see that in computing the updates for the latent factor matrices, there is a (C - I) term and a multiplication by P, which transform back to the sparse space, but then shouldn't the input matrix be dense to accommodate the subtraction? (Ideally it isn't, but I'm trying to understand the implementation as written).
Thanks a lot for the explanation!
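The usual way around the dense matrix is the identity Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y: the Y^T Y term is shared by all users, and C^u - I is zero wherever r_ui = 0, so only observed items contribute to the second term. A Cython snippet quoted in another report on this page ("Since we've already added in YtY, we subtract 1 from confidence") is consistent with this reading. The sketch below only checks the identity numerically on toy data; all names and values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, factors, alpha = 6, 3, 40.0
Y = rng.normal(size=(n_items, factors))
r_u = np.array([0.0, 2.0, 0.0, 0.0, 1.0, 0.0])  # one user's raw counts, mostly zero
c_u = 1.0 + alpha * r_u                          # dense confidences, 1 where unobserved

dense = Y.T @ np.diag(c_u) @ Y                   # touches every item

YtY = Y.T @ Y                                    # computed once for all users
observed = np.flatnonzero(r_u)
Yo = Y[observed]
sparse = YtY + (Yo.T * (c_u[observed] - 1.0)) @ Yo  # touches observed items only

print(np.allclose(dense, sparse))
```

So the input can stay sparse: the unobserved confidence of 1 is carried by the shared Y^T Y term rather than stored.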
pip install implicit initially fails to find GCC, since the clang-based gcc is located at /usr/bin/gcc.
After adding /usr/bin/gcc to setup.py I get:
gcc: error: implicit/_implicit.c: No such file or directory
I also tried installing gcc via Homebrew and installing from source:
python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to implicit.egg-info/requires.txt
writing implicit.egg-info/PKG-INFO
writing top-level names to implicit.egg-info/top_level.txt
writing dependency_links to implicit.egg-info/dependency_links.txt
reading manifest file 'implicit.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'implicit/*.c'
writing manifest file 'implicit.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.5-x86_64/egg
running install_lib
running build_py
running build_ext
building 'implicit._implicit' extension
gcc-6 -fno-strict-aliasing -I/Users/seanlaw/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/seanlaw/anaconda/include/python2.7 -c implicit/_implicit.c -o build/temp.macosx-10.5-x86_64-2.7/implicit/_implicit.o -fopenmp -ffast-math
gcc-6: error: implicit/_implicit.c: No such file or directory
gcc-6: fatal error: no input files
compilation terminated.
An exception has occurred, use %tb to see the full traceback.
SystemExit: error: command 'gcc-6' failed with exit status 1
I don't know if there were recent changes, but it looks like implicit/_implicit.c is missing.
I've been running the movielens example, trying to use approximate_als.NMSLibAlternatingLeastSquares instead of AlternatingLeastSquares.
The recommend function returns indices whose meaning I can't decipher.
I think there might be a bug.
It fails when I try to use movie_lookup[movie].
I'm having trouble running the lastfm.py example:
asegrenev@vw:~/Downloads/implicit$ python3 examples/lastfm.py
Traceback (most recent call last):
File "examples/lastfm.py", line 24, in <module>
from implicit import alternating_least_squares
File "/home/asegrenev/anaconda3/lib/python3.5/site-packages/implicit-0.1.7-py3.5-linux-x86_64.egg/implicit/__init__.py", line 1, in <module>
from .implicit import alternating_least_squares
File "/home/asegrenev/anaconda3/lib/python3.5/site-packages/implicit-0.1.7-py3.5-linux-x86_64.egg/implicit/implicit.py", line 6, in <module>
from . import _implicit
ImportError: /home/asegrenev/anaconda3/lib/python3.5/site-packages/implicit-0.1.7-py3.5-linux-x86_64.egg/implicit/_implicit.cpython-35m-x86_64-linux-gnu.so: undefined symbol: GOMP_parallel
On Ubuntu 16.04, when I try to import the bm25_weight function I get the following error:
import implicit
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/implicit/__init__.py", line 3, in <module>
from . import nearest_neighbours
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/implicit/nearest_neighbours.py", line 7, in <module>
from ._nearest_neighbours import all_pairs_knn
ImportError: /home/ubuntu/anaconda3/lib/python3.6/site-packages/implicit/_nearest_neighbours.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZdlPvm
I am attempting to build this library, but I keep getting this error saying that no suitable method was found for posv:
Downloading/unpacking implicit
Downloading implicit-0.1.7.tar.gz (161kB): 161kB downloaded
Running setup.py (path:/tmp/pip_build_skylion/implicit/setup.py) egg_info for package implicit
Error compiling Cython file:
------------------------------------------------------------
...
# Since we've already added in YtY, we subtract 1 from confidence
for j in range(factors):
temp = (confidence - 1) * Y[i, j]
axpy(&factors, &temp, &Y[i, 0], &one, A + j * factors, &one)
posv("U", &factors, &one, A, &factors, b, &factors, &err);
------------------------------------------------------------
implicit/_implicit.pyx:97:20: no suitable method found
Error compiling Cython file:
---------------------------------
I am really wondering what could be the issue. I can look into trying to get this to build later, but I have tried on both bash for Windows and Windows with the proper libraries installed. I am wondering if this is an issue with newer versions of scipy or cython. Here is my scipy config info.
blas_info:
libraries = ['blas']
library_dirs = ['/usr/lib']
language = f77
lapack_info:
libraries = ['lapack']
library_dirs = ['/usr/lib']
language = f77
atlas_threads_info:
NOT AVAILABLE
blas_opt_info:
libraries = ['blas']
library_dirs = ['/usr/lib']
language = f77
define_macros = [('NO_ATLAS_INFO', 1)]
openblas_info:
NOT AVAILABLE
atlas_blas_threads_info:
NOT AVAILABLE
lapack_opt_info:
libraries = ['lapack', 'blas']
library_dirs = ['/usr/lib']
language = f77
define_macros = [('NO_ATLAS_INFO', 1)]
I was able to compile the package from source; however, I am getting an error when executing least_squares (in the C++ extension; the pure Python version works):
File "build/bdist.freebsd-11.0-RELEASE-p1-amd64/egg/implicit/als.py", line 48, in alternating_least_squares
File "implicit/_als.pyx", line 60, in implicit._als.least_squares (implicit/_als.cpp:3561)
ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
Here is the corresponding line from _als.pyx:
cdef double[:] data = Cui.data
Can you please advise how to fix this?
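The message says the extension expects a `double` buffer while the matrix holds 32-bit floats, so casting the input to float64 before fitting is a plausible workaround for this version. A sketch with toy data (the matrix contents are made up):

```python
import numpy as np
from scipy.sparse import csr_matrix

Cui = csr_matrix(np.array([[0, 2], [3, 0]], dtype=np.float32))  # toy float32 input
Cui = Cui.astype(np.float64)  # matches the `double[:]` buffer declared in the extension
print(Cui.data.dtype)
```

The cast roughly doubles the memory used by the stored values, which may matter for very large matrices.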
I tried the following code to get a per-iteration log:
model = AlternatingLeastSquares(factors=20,
                                regularization=0.1,
                                calculate_training_loss=True,
                                iterations=150,
                                num_threads=4)
Strangely, no iteration loss was logged. I checked the source in als.py but could not work out why.
The model runs fine; there is just no log output to check.
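One thing worth checking: a traceback elsewhere on this page shows the library creating its logger with `logging.getLogger("implicit")`, and Python's logging module only shows WARNING and above unless it is configured. Assuming the loss is reported through that logger at a lower level (an assumption, not confirmed from the source), enabling debug output before fitting may make it visible:

```python
import logging

# Show all messages, including DEBUG, with a timestamp and the logger name.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
# Make sure the library's own logger is not filtered at a higher level.
logging.getLogger("implicit").setLevel(logging.DEBUG)
```

This has to run before `model.fit(...)` is called.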
Hi Ben,
First let me thank you so much for this amazing software! I really appreciate the time and effort that went into it. I'm excited to try the changes that permit quick recommendations but I'm getting the following error when I try to run the lastfm.py example:
Traceback (most recent call last):
File "lastfm.py", line 27, in <module>
from implicit.approximate_als import (AnnoyAlternatingLeastSquares, NMSLibAlternatingLeastSquares,
ImportError: No module named approximate_als
PyCharm also says it cannot find reference 'approximate_als' in __init__.py.
Note that the line numbers may be off by a few, as I added:
import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"
in light of a related OpenBLAS warning I received.
I'm on Xubuntu that I installed a few days ago (not a VM).
I can comment out the approximate_als import and the subsequent reference to it, and the code then runs fine (giving a quick test with bm25). I tried to figure out a solution and it's probably really obvious but I haven't figured it out.
Probably unhelpful stuff I did to try to fix it: I tried adding "from . import approximate_als" to the top of __init__.py and "approximate_als" to "__all__ =", but that didn't work. I also tried blanking out __init__.py, but that didn't work. I did a pip install nmslib and pip install annoy after combing through your blog, but that didn't help.
Thanks for any thoughts you might have!
It would be a super cool feature to include side features.
I've been using this library for building a recommender system where items already consumed ('liked') shouldn't be hidden, so I had to download the library and recompile it to not filter them out.
How does it sound to expose a named parameter for disabling that filtering? I could implement it...
I don't know if I understand this correctly, but since
related = model.similar_items(itemid, K)
returns the K most similar item IDs,
how can I get the K most similar user IDs for a given user ID?
Looking forward to your reply.
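One way to approach this, assuming the fitted model exposes a `user_factors` array of shape (n_users, factors) as another post on this page suggests, is to rank users by cosine similarity of their factor vectors. The function below is a sketch of that idea and not a library method; `similar_users` is a name chosen here.

```python
import numpy as np

def similar_users(user_factors, userid, K=10):
    """Return up to K (user index, cosine similarity) pairs, most similar first."""
    norms = np.linalg.norm(user_factors, axis=1)
    norms[norms == 0] = 1e-10                     # avoid dividing by zero
    scores = user_factors @ user_factors[userid] / (norms * norms[userid])
    K = min(K, len(scores))
    top = np.argpartition(scores, -K)[-K:]        # unordered top K
    top = top[np.argsort(scores[top])[::-1]]      # order those K by score
    return [(int(u), float(scores[u])) for u in top]
```

The query user is included in the result with a similarity of 1, so asking for K + 1 and dropping that entry gives K other users.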
The logic in setup.py falls back to '-march=native' if Anaconda is not detected. This creates a problem for our Jenkins build. Can we update the build commands to allow custom CPU types?
movieId | name | similar_movieId | score
26462 | Bad Boys (1983) | 113812 | 0.926960830278
Hi Ben,
Thanks for the awesome library; I am using it to create a recommender system.
I am trying to implement my own version of returning the items liked by the user, so may I ask what the recalculate_user parameter of recommend() in als.py does?
Thank you in advance!
Meiyi
pip install implicit
Successfully installed implicit-0.1.7
import implicit
implicit.__version__
'0.1.5'
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
I can see that the first entry of user_factors is for a new user.
But if I know which user ID I want, how can I retrieve the factors for that user? How do I match each user to its factors?
ImportError Traceback (most recent call last)
in ()
----> 1 import implicit
/Users/yanan.chen/anaconda/lib/python2.7/site-packages/implicit/__init__.py in <module>()
1 from .als import alternating_least_squares
2
----> 3 from . import nearest_neighbours
4 from . import als
5
/Users/yanan.chen/anaconda/lib/python2.7/site-packages/implicit/nearest_neighbours.py in <module>()
5 from scipy.sparse import coo_matrix, csr_matrix
6
----> 7 from ._nearest_neighbours import all_pairs_knn
8 from .recommender_base import RecommenderBase
9 from .utils import nonzeros
ImportError: dlopen(/Users/yanan.chen/anaconda/lib/python2.7/site-packages/implicit/_nearest_neighbours.so, 2): Symbol not found: __ZdlPvm
Referenced from: /Users/yanan.chen/anaconda/lib/python2.7/site-packages/implicit/_nearest_neighbours.so
Expected in: flat namespace
in /Users/yanan.chen/anaconda/lib/python2.7/site-packages/implicit/_nearest_neighbours.so
When I run the attached input, I get the following error:
Traceback (most recent call last):
File "/Users/username/Desktop/Recommendation/Implementation.py", line 206, in
collaborative_filter(formatted, result)
File "/Users/username/Desktop/Recommendation/Implementation.py", line 80, in collaborative_filter
df, plays = read_data(input_filename)
File "/Users/username/Desktop/Recommendation/Implementation.py", line 25, in read_data
data['user'].cat.codes.copy())))
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/coo.py", line 182, in __init__
self._check()
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/coo.py", line 240, in _check
raise ValueError('negative row index found')
ValueError: negative row index found
From what I can tell, the input is correctly formatted with 3 columns separated by tabs. Thank you for your time!
faulty_input.txt
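Without the attached file the cause cannot be confirmed, but one thing that produces exactly this error is a missing value in a categorical column: pandas encodes missing categories as -1 in `.cat.codes`, and by default it also parses strings such as "NA" or "null" as missing when reading a file. The sketch below reproduces the -1 on made-up data and shows one possible cleanup.

```python
import pandas as pd

# Made-up rows; the second one has a missing artist.
data = pd.DataFrame({
    "user": ["u1", "u2", "u1"],
    "artist": ["a", None, "b"],
    "plays": [3, 1, 2],
})

codes = data["artist"].astype("category").cat.codes
print(codes.tolist())  # the missing artist is encoded as -1

# One option: drop incomplete rows before building the sparse matrix.
clean = data.dropna(subset=["user", "artist"])
clean_codes = clean["artist"].astype("category").cat.codes
```

Checking `data.isnull().sum()` right after reading the file would show whether this applies here.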
Hi Ben,
I'm using implicit to predict a top-7 list of recommendations using a sparse matrix of aggregated customer purchases composed of 7101 customer purchases from 24 products.
I'm a little confused by the output of .recommend, which produces a list of N tuples:
[(845, 1.0136324354312989), (1150, 1.0028331824506354), (51, 1.0027650376439357), (2411, 1.0024685562873292), (1810, 1.0019960930254448), (1211, 1.0018685279069661), (775, 1.0018545578136604)]
Now I would have expected the first value in the tuple to be an index to the product list, but I suspect that I'm looking at the indices for the latent factor vectors? If you could give me a steer on how to extract the product identities, it would be very much appreciated.
Kind regards,
Michael.
`
import pandas as pd
import scipy.sparse as sparse
import numpy as np
import implicit
# import data and add header rows
data = pd.read_csv('D:\santander\\train_sample_small.csv', names=['cust_id', 'product', 'rating'])
# transform dataset to sum by activity
grouped_data = data.groupby(['cust_id', 'product']).sum().reset_index()
grouped_data.head()
# Only get customers where purchase totals were positive
grouped_purchased = grouped_data.query('rating > 0')
print(grouped_purchased.head())
# Get our unique customers
customers = list(np.sort(grouped_purchased.cust_id.unique()))
# Get our unique products that were purchased
products = list(grouped_purchased['product'].unique())
# All of our purchases
rating = list(grouped_purchased.rating)
# Get the associated row/column indices
rows = grouped_purchased['cust_id'].astype('category', categories=customers).cat.codes
cols = grouped_purchased['product'].astype('category', categories=products).cat.codes
# create sparse matrix from data
purchases_sparse = sparse.csr_matrix((rating, (rows, cols)), shape=(len(customers), len(products)), dtype=np.float64)
# Build, fit model and recommend top 7 products for first user
model = implicit.als.AlternatingLeastSquares(factors=50, regularization=0.1, iterations=50)
model.fit(item_users=purchases_sparse)
recom = model.recommend(userid=0, user_items=purchases_sparse.T, N=7)`
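A possible explanation, offered as a guess from the script above: indices such as 845 and 2411 are larger than the 24 products, which would be consistent with the two axes being swapped. `purchases_sparse` is customers by products, yet it is passed as `item_users`, so the model may be treating customers as items. Assuming the API of this version (`fit(item_users)` and `recommend(userid, user_items)`), passing `purchases_sparse.T` to `fit` and `purchases_sparse` to `recommend` should yield column indices that can be looked up in `products`. The lookup itself is sketched below with made-up stand-ins for both lists.

```python
# Hypothetical stand-ins for the `products` list and the output of `recommend`.
products = ["savings", "mortgage", "credit_card"]
recommendations = [(2, 1.01), (0, 0.98)]

# Each tuple is (column index, score); the index selects from `products`
# because the columns were built from the category codes of that list.
named = [(products[idx], score) for idx, score in recommendations]
print(named)
```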
I don't see much about this situation in the literature - but suppose we hypothesize that our confidence in user's implicit feedback should decrease as time passes (ie we are more confident that they're interested in an item they recently interacted with than one they interacted with days/weeks ago). Any advice/thoughts on how to approach this? Am currently applying a time decay function directly to the R_ui matrix, but if you have more experience with this setting I'd be happy to hear about it.
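For concreteness, one simple form of such a decay is exponential with a half-life, applied to each interaction before the counts are summed into R_ui. This is a sketch of that one choice, not a recommendation from the library; `decayed_weight` and `half_life_days` are hypothetical names, and the half-life would need tuning on held-out data.

```python
import math

def decayed_weight(count, age_days, half_life_days=30.0):
    """Scale an interaction count so that it halves every `half_life_days`."""
    return count * math.exp(-math.log(2.0) * age_days / half_life_days)
```

With a 30-day half-life, an interaction from 30 days ago counts half as much as one from today, and one from 60 days ago a quarter as much.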
Non issue, it was 32-bit python, after installing 64-bit, it works fine
File "pandas_libs\parsers.pyx", line 894, in pandas._libs.parsers.TextReader.read
File "pandas_libs\parsers.pyx", line 944, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas_libs\parsers.pyx", line 2228, in pandas._libs.parsers._concatenate_chunks
MemoryError
Hey Ben, looks like when I install via pip I'm not getting your recent changes to support 32-bit factorization -- the dtype argument is not part of the alternating_least_squares method signature. (Among other things, this means I can't download and run the tests.)
error: command 'cl.exe' failed: No such file or directory
Thanks for the wonderful lib.
I have a question: what if I have multiple user features for an item, such as user rating, play time, and play count?
How do I combine these features into a single input value for this algorithm?
Currently, if you ask for more recommendations than are available, you get an out-of-bounds error.
Solution: return as many as possible instead.
I can fix that and make a PR.
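A minimal sketch of the "return as many as possible" behaviour, assuming the top-N selection is done with NumPy's `argpartition` over a score vector. The helper `top_n` is hypothetical and is not the library's code; it only shows where the clamp would go.

```python
import numpy as np


def top_n(scores, n):
    """Return indices of the n highest scores, best first; fewer if n is too large."""
    scores = np.asarray(scores)
    n = min(n, scores.shape[0])  # never ask for more items than exist
    if n <= 0:
        return np.empty(0, dtype=np.intp)
    # argpartition needs kth within bounds, which the clamp above guarantees
    candidates = np.argpartition(scores, -n)[-n:]
    return candidates[np.argsort(scores[candidates])[::-1]]


scores = np.array([0.2, 0.9, 0.5])
print(top_n(scores, 2))
print(top_n(scores, 10))  # only three items exist, so three are returned
```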
Computing similar artists is great, but wouldn't making customised recommendations for each user, based on his or her listening history, be better?
Every time I try to use the example script lastfm.py, I get this error:
python implicit_argparser.py --input=usersha1-artmbid-artname-plays.tsv
WARNING:root:Annoy isn't installed
Traceback (most recent call last):
File "implicit_argparser.py", line 155, in
cg=args.cg)
File "implicit_argparser.py", line 97, in calculate_similar_artists
model.fit(plays)
File "/Users/Kakadu/anaconda/lib/python3.6/site-packages/implicit/annoy_als.py", line 78, in fit
self.cosine_index = annoy.AnnoyIndex(self.item_factors.shape[1], 'angular')
NameError: name 'annoy' is not defined
Where can I get Annoy? P.S. I use macOS.
By the way, I used model=als by default, and even then I get the Annoy error shown in the traceback. I'd be glad to see a reasonable reply.
Hi Ben, thanks for the package. I have a dataset of 14M+ users and 1M+ items. The model fitting takes around 70 mins, but getting recommendations for all users would take north of 10 hours. Is there a way to expedite this, maybe by parallelizing it? Any suggestions?
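One approach to the question above is to score users in blocks with a single matrix product per block, instead of calling a per-user method in a Python loop. The sketch below assumes the fitted user and item factors are available as dense NumPy arrays; `recommend_batch` and `batch_size` are hypothetical names, and the sketch does not remove items a user has already interacted with.

```python
import numpy as np


def recommend_batch(user_factors, item_factors, n=10, batch_size=256):
    """Yield (start_index, top_items) for consecutive blocks of users."""
    n = min(n, item_factors.shape[0])
    for start in range(0, user_factors.shape[0], batch_size):
        block = user_factors[start:start + batch_size]
        scores = block @ item_factors.T  # shape (block size, n_items)
        top = np.argpartition(scores, -n, axis=1)[:, -n:]
        # order each row's n candidates by descending score
        top_scores = np.take_along_axis(scores, top, axis=1)
        order = np.argsort(top_scores, axis=1)[:, ::-1]
        yield start, np.take_along_axis(top, order, axis=1)


rng = np.random.default_rng(0)
user_factors = rng.normal(size=(5, 4))
item_factors = rng.normal(size=(7, 4))

results = [top for _, top in recommend_batch(user_factors, item_factors, n=3, batch_size=2)]
all_top = np.vstack(results)
```

Each block materialises a dense `batch_size x n_items` score matrix, so with around a million items the batch size has to be chosen to fit in memory. Because the blocks are independent, they could also be handed to separate worker processes.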
Hi @benfred,
Can you add to the official library an option for the recommend method to accept an additional list of items that should not be included in the recommendations? That way we would still get the top N items (excluding the items from the given list), instead of filtering afterwards, which prunes the recommendations and leaves fewer than the requested number.
Thanks!
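The request above amounts to masking the unwanted items before the top-N selection rather than after it. A rough sketch of that ordering, using a hypothetical helper `top_n_excluding` (not part of the library):

```python
import numpy as np


def top_n_excluding(scores, n, exclude=()):
    """Pick the n best-scoring items after removing the excluded ones."""
    scores = np.array(scores, dtype=np.float64)  # copy, so the caller's array is untouched
    exclude = np.unique(np.asarray(list(exclude), dtype=np.intp))
    scores[exclude] = -np.inf  # excluded items can never win
    n = min(n, scores.shape[0] - exclude.shape[0])
    if n <= 0:
        return np.empty(0, dtype=np.intp)
    best = np.argpartition(scores, -n)[-n:]
    return best[np.argsort(scores[best])[::-1]]


scores = [0.1, 0.8, 0.5, 0.9]
print(top_n_excluding(scores, 2, exclude=[3]))
print(top_n_excluding(scores, 10, exclude=[1, 3]))
```

Masking first means the caller still receives N results whenever at least N non-excluded items exist.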
Are the models pickle-able? If so, close this question; if not, consider it a feature request!
Hi Ben. Thanks for the library and especially for the great posts. I'm wondering what the procedure for loss calculation is. I checked the code here, but didn't understand the exact algorithm. I suppose it is a kind of approximation of the loss from the paper, isn't it? I think it is almost infeasible (very computationally expensive) to calculate the exact loss from the paper, because it would require calculating the prediction for every user-item pair in order to account for the loss on unobserved items. Or did I miss some trick?
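There is a standard trick that makes the exact loss cheap, sketched below under stated assumptions. If every unobserved pair has confidence 1 and preference 0, its squared error is just the squared prediction, and the sum of squared predictions over all pairs equals the sum over users of x_u^T (Y^T Y) x_u. Only the observed pairs then need a per-entry correction. The function `implicit_loss` is a hypothetical illustration, not necessarily what the library does; it assumes the sparse matrix stores the confidence of each observed pair directly.

```python
import numpy as np
import scipy.sparse as sparse


def implicit_loss(Cui, X, Y, regularization):
    """Exact weighted squared loss without building the dense prediction matrix."""
    Cui = sparse.coo_matrix(Cui)
    YtY = Y.T @ Y
    # squared predictions summed over ALL user-item pairs: sum_u x_u^T (Y^T Y) x_u
    loss = np.sum((X @ YtY) * X)
    # predictions for the observed pairs only
    s = np.einsum("ij,ij->i", X[Cui.row], Y[Cui.col])
    # swap each observed pair's s^2 term for its true term c * (1 - s)^2
    loss += np.sum(Cui.data * (1.0 - s) ** 2 - s ** 2)
    loss += regularization * (np.sum(X ** 2) + np.sum(Y ** 2))
    return loss


rng = np.random.default_rng(0)
dense = np.array([[3.0, 0.0, 0.0], [0.0, 2.0, 5.0]])
X = rng.normal(size=(2, 4))
Y = rng.normal(size=(3, 4))
fast = implicit_loss(sparse.csr_matrix(dense), X, Y, regularization=0.1)
```

The cost is proportional to the number of observed entries plus a few small dense products, rather than to users times items.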
Hi, thanks for this nice package!
AlternatingLeastSquares.recommend has a parameter called filter_items, which, apart from the source code, does not have any documentation. The same parameter is found in RecommenderBase, AnnoyAlternatingLeastSquares, and ItemItemRecommender.
Before reading the source, I thought it was a whitelist of items that the recommendation should select from (which suited my use case), but as it turns out, it is a blacklist. So, I have two suggestions I'd like to hear your thoughts on:
1. Deprecate filter_items, and make a more descriptively named parameter such as skip_items, ignore_items, or item_blacklist.
2. Document it, e.g. filter_items: A list of items that should not be recommended.
I would think the second one is a no-brainer - of course it should be documented. The first one, however, is a bit more bold. Thoughts?
As another remark (should I move it to another issue?), the line if filter_items: such as here doesn't work with numpy arrays. I would suggest moving to if filter_items is not None: instead.
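The remark about numpy arrays can be reproduced in a few lines: the truth value of an array with more than one element is ambiguous, so a bare `if filter_items:` raises `ValueError`. The helper `should_filter` below is a hypothetical illustration of the suggested check, extended with a length test so that empty inputs are also skipped.

```python
import numpy as np


def should_filter(filter_items):
    """Return True when the caller supplied a non-empty collection."""
    return filter_items is not None and len(filter_items) > 0


filter_items = np.array([3, 7])

try:
    if filter_items:  # ambiguous for arrays with more than one element
        pass
    truth_test_failed = False
except ValueError:
    truth_test_failed = True

print(truth_test_failed)
print(should_filter(filter_items), should_filter(None), should_filter([]))
```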
Hi Ben,
I was testing the new version,
but it seems the import of nearest_neighbours has failed.
vagrant@deep-learning:~/implicit/tests$ python als_test.py
.
----------------------------------------------------------------------
Ran 1 test in 3.561s
OK
vagrant@deep-learning:~/implicit/tests$ python knn_test.py
E
======================================================================
ERROR: testNearestNeighbours (__main__.NearestNeighboursTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "knn_test.py", line 20, in testNearestNeighbours
counts = implicit.nearest_neighbours.tfidf_weight(counts).tocsr()
AttributeError: 'module' object has no attribute 'nearest_neighbours'
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (errors=1)
vagrant@deep-learning:~/implicit/tests$
Is there any way this can take advantage of AWS Lambda for a big input file and output?
New data and new users are generated every day. How can I do incremental training?
pip install implicit fails on Windows 7.
After first fixing the issue of not finding the VS C++ compiler (by installing the appropriate VS build tools), the setup now reports "failed building wheel" for this package.
Additionally, this message appears:
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe
/c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\XXX\AppData\Local\Continuum\
Anaconda3\include -IC:\Users\XXX\AppData\Local\Continuum\Anaconda3\include
"-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program
Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86
)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\in
clude\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /Tcimplicit\
_implicit.c /Fobuild\temp.win-amd64-3.5\Release\implicit\_implicit.obj -Wno-unus
ed-function -O3 -fopenmp -ffast-math
cl : Command line error D8021 : invalid numeric argument '/Wno-unused-function'
I tried checking for the CFLAGS in a previously mentioned issue (that one was on Mac OSX though):
C:\Users\XXX\implicit>python
Python 3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1
900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sysconfig
>>> print(sysconfig.get_config_var("CFLAGS"))
None
Any idea what is going on here?
requirements.txt should be updated. implicit-0.2.6 could not be installed with the current requirement Cython==0.22.0.
In many cases, ranking the items is an easier problem than solving the matrix.
That can be implemented by optimising precision-recall instead of the RMSE.
It would be another cool feature.
Every time I run model.fit() I obtain different values in model.user_factors and model.item_factors. How can I get reproducible results after model fitting?
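ALS starts from randomly initialised factor matrices, so two runs that draw different starting points generally end with different factors. The usual route to reproducibility is to fix the seed used for that initialisation. The sketch below uses a hypothetical `init_factors` function to show the effect; it is not the library's initialiser. For a library that draws from NumPy's global random state, calling `np.random.seed(...)` before `fit` would be the analogous step, though whether that applies to a given version of implicit is an assumption, and multithreaded arithmetic can still introduce small floating-point differences.

```python
import numpy as np


def init_factors(n_rows, n_factors, seed=None):
    """Hypothetical initialiser: small random factors from a seeded generator."""
    rng = np.random.RandomState(seed)
    return rng.rand(n_rows, n_factors) * 0.01


a = init_factors(4, 3, seed=42)
b = init_factors(4, 3, seed=42)
c = init_factors(4, 3, seed=7)

print(np.array_equal(a, b))  # same seed, same starting point
print(np.array_equal(a, c))  # different seed, different starting point
```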