tohtsky / myfm

A Python/C++ implementation of Bayesian Factorization Machines

License: MIT License

C++ 68.94% Python 30.88% CMake 0.13% Shell 0.05%

gibbs-sampling-algorithm ordinal-regression factorization-machines factorization-machine gibbs-sampler bayesian-inference regression-models collaborative-filtering


myfm's Issues

Problem with mapper dependency.

The mapper library doesn't seem to export a DefaultMapper type:

ImportError                               Traceback (most recent call last)
<ipython-input-9-46b0ef8584af> in <module>
      6 import pandas as pd
      7 from scipy import sparse as sps
----> 8 from mapper import DefaultMapper
      9 # read movielens 1m data.
     10 from myfm.utils.benchmark_data import MovieLens1MDataManager

ImportError: cannot import name 'DefaultMapper' from 'mapper'
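One way to narrow this down is to check which file Python actually resolves for the name "mapper". The sketch below uses only the standard library; the helper name locate_module is made up for illustration. If the printed path points into site-packages rather than the notebook's directory, an unrelated installed package is probably shadowing the module the notebook expects (that diagnosis is an assumption, not something the report confirms).

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module name resolves to, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Shows where "mapper" would be imported from in the current environment.
print(locate_module("mapper"))
```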

Differences in ml100k, ml1m and documentation.

Both notebooks (ml100k and ml1m) loosely follow the docs, but all three differ from one another. Is one example more up to date than the others, or are they intentionally different? Which notebook should readers step through?

Inference at test time?

How should the FM be used to make predictions? For example, say I train this model on 1000 user-movie pairs. I want to make a prediction for an unseen user, i.e. a vector of predicted ratings for all possible movies. However, in the examples the same users appear in both training and testing: for user A, the model trains on 80% of the known movie ratings and then tries to predict the remaining 20%. How should we call the model when we want to predict 80% of the ratings for an unseen user, i.e. one not in the training set?

In other words, I would like to take a vector of length n in which m ratings are known and infer the remaining n-m. Would I have to include the m known ratings in the training set?
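To make the question concrete, here is a minimal pure-Python sketch of the standard second-order factorization machine prediction rule, not myfm's implementation. All parameter values and feature names ("user=A", "item=Y", "rated=X") are invented for illustration. It assumes that a feature never seen in training simply contributes nothing, and that a user's known ratings can be encoded as extra "rated=..." features, so an unseen user can still be scored from those features without having a trained user ID.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Second-order FM score for a sparse feature dict x (name -> value).

    Features without learned parameters (e.g. an unseen user ID) add nothing.
    """
    linear = sum(w.get(name, 0.0) * value for name, value in x.items())
    pairwise = 0.0
    for (f, xf), (g, xg) in combinations(x.items(), 2):
        vf, vg = V.get(f), V.get(g)
        if vf is None or vg is None:
            continue  # at least one feature was never seen in training
        pairwise += sum(a * b for a, b in zip(vf, vg)) * xf * xg
    return w0 + linear + pairwise


# Toy parameters (assumed values, standing in for a trained model).
w0 = 3.0
w = {"user=A": 0.5, "item=X": -0.2, "item=Y": 0.4, "rated=X": 0.1}
V = {
    "user=A": [0.3, 0.2],
    "item=X": [0.1, 0.4],
    "item=Y": [0.5, -1.0],
    "rated=X": [1.0, 0.0],
}

# Unseen user Z with no history: only the item terms are used.
cold = fm_predict({"user=Z": 1.0, "item=Y": 1.0}, w0, w, V)

# Unseen user Z who is known to have rated X: the history feature now interacts with item Y.
warm = fm_predict({"user=Z": 1.0, "item=Y": 1.0, "rated=X": 1.0}, w0, w, V)
print(cold, warm)
```

Under these assumptions, the m known ratings do not have to be in the training set as long as they enter the model as features whose parameters were learned from other users. With user-ID one-hot features alone, an unseen user has nothing learned to draw on.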

Bug in myfm.MyFMOrderedProbit() model

Hello Tomoki Ohtsuki,
I was following your ml-100k-extended example notebook, but ran into problems with the myfm.MyFMOrderedProbit() model when use_date=False. Fitting works fine; the problem arises during prediction. I tried to attach a screenshot of the error, but in case it did not come through, the message is "ValueError: Relation blocks have inconsistent mapper size with case_size". In your notebook, when use_date=False, X is set to None and X_rel=test_blocks, and the error appears to be triggered by X being None. With the myfm.MyFMRegressor() model, however, everything works as expected.
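The wording of the error suggests an invariant: every relation block's row-index array must have one entry per prediction case, and when X is None the number of cases has to be derived from the blocks themselves. The sketch below illustrates that invariant in plain Python. The Block class and infer_case_size function are hypothetical stand-ins and are not myfm's classes or its actual check.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Block:
    """Hypothetical stand-in for a relation block (not myfm.RelationBlock)."""
    original_to_block: List[int]  # one entry per case, pointing at a block row
    n_block_rows: int             # number of distinct rows stored in the block


def infer_case_size(x_rows: Optional[int], blocks: Sequence[Block]) -> int:
    """Return the common number of cases, or raise if the inputs disagree."""
    sizes = set()
    if x_rows is not None:
        sizes.add(x_rows)
    for block in blocks:
        if any(i < 0 or i >= block.n_block_rows for i in block.original_to_block):
            raise ValueError("index points outside the block")
        sizes.add(len(block.original_to_block))
    if len(sizes) != 1:
        raise ValueError(f"inconsistent case sizes: {sorted(sizes)}")
    return sizes.pop()


# With X absent (None), the case count comes from the blocks alone.
blocks = [Block([0, 1, 1, 0], 2), Block([2, 0, 1, 1], 3)]
print(infer_case_size(None, blocks))
```

A check like this, run on test_blocks before calling predict, would show whether the inputs themselves are inconsistent or whether the problem lies in how the X=None case is handled.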

Thanks in advance for your help!

TypeError: 'CategoryValueToSparseEncoder' object is not subscriptable

Hi, when I run 'python ml-100k-regression.py 1', I get the following traceback:
df_train.shape = (80000, 4), df_test.shape = (20000, 4)
Traceback (most recent call last):
  File "ml-100k-regression.py", line 233, in <module>
    target.append(RelationBlock(user_map, augment_user_id(unique_users)))
  File "ml-100k-regression.py", line 189, in augment_user_id
    col.append(movie_to_internal[mid])
TypeError: 'CategoryValueToSparseEncoder' object is not subscriptable

Could you give me some guidance on this bug?
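The traceback indicates that the script indexes movie_to_internal like a dict while the object it received is an encoder that does not support square-bracket access. The sketch below reproduces that class of error with a made-up Encoder class. The method name index_of is hypothetical and is not claimed to exist on CategoryValueToSparseEncoder. A likely cause, though only an assumption here, is a mismatch between the script and the installed library version.

```python
class Encoder:
    """Hypothetical encoder with no __getitem__, so obj[key] raises TypeError."""

    def __init__(self, values):
        self._index = {value: i for i, value in enumerate(values)}

    def index_of(self, value):  # hypothetical lookup method
        return self._index[value]


movie_to_internal = Encoder([10, 20, 30])

try:
    movie_to_internal[20]  # dict-style access, as on line 189 of the script
except TypeError as exc:
    message = str(exc)
    print(message)

# Going through the object's own lookup method works.
print(movie_to_internal.index_of(20))
```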

Segmentation fault (core dumped)

import myfm
from sklearn.feature_extraction import DictVectorizer
import numpy as np
train = [
    {"user": "1", "item": "5", "age": 19},
    {"user": "2", "item": "43", "age": 33},
    {"user": "3", "item": "20", "age": 55},
    {"user": "4", "item": "10", "age": 20},
]
v = DictVectorizer()

X = v.fit_transform(train)

# Note that X is a sparse matrix
print(X.toarray())

# The target variable to be classified.
y = np.asarray([0, 1, 1, 0])
fm = myfm.MyFMClassifier(rank=4)
print("fit")
fm.fit(X, y)
print("fit done")

# It also supports prediction for new unseen items.
fm.predict_proba(v.transform([{"user": "1", "item": "10", "age": 24}]))

$ python test.py 
[[19.  0.  0.  0.  1.  1.  0.  0.  0.]
 [33.  0.  0.  1.  0.  0.  1.  0.  0.]
 [55.  0.  1.  0.  0.  0.  0.  1.  0.]
 [20.  1.  0.  0.  0.  0.  0.  0.  1.]]
fit
  0%|          | 0/100 [00:00<?, ?it/s]Segmentation fault (core dumped)
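A segmentation fault inside a compiled extension leaves no Python traceback by default. As a hedged suggestion for collecting more detail, the standard library's faulthandler module can dump the Python stack at the moment of the crash. The log file name below is arbitrary.

```python
import faulthandler

# Keep the file object open for the lifetime of the process; faulthandler writes to it on a crash.
_crash_log = open("myfm_crash.log", "w")
faulthandler.enable(file=_crash_log, all_threads=True)

# ... then run the failing code, e.g. the fm.fit(X, y) call from the script above.
```

Running the unmodified script as `python -X faulthandler test.py` has the same effect and prints the stack to stderr. The resulting dump shows which Python call was active when the crash occurred, which is useful to attach to the report.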

multicore training?

Could we reconstruct the matrix (same dimensions) with the feature embedding ?
