pywFM

pywFM is a Python wrapper for Steffen Rendle's libFM. libFM is a Factorization Machine library:

Factorization machines (FM) are a generic approach that allows to mimic most factorization models by feature engineering. This way, factorization machines combine the generality of feature engineering with the superiority of factorization models in estimating interactions between categorical variables of large domain. libFM is a software implementation for factorization machines that features stochastic gradient descent (SGD) and alternating least squares (ALS) optimization as well as Bayesian inference using Markov Chain Monte Carlo (MCMC).

For more information regarding Factorization machines and libFM, read Steffen Rendle's paper: Factorization Machines with libFM, in ACM Trans. Intell. Syst. Technol., 3(3), May. 2012

Don't forget to acknowledge libFM (i.e. cite the paper Factorization Machines with libFM) if you publish results produced with this software.

Motivation

While using Python implementations of Factorization Machines, I felt that the current implementations (pyFM and fastFM) had many flaws. Then I thought, why reinvent the wheel? Why not use the original libFM?

Sure, it's not Python native yada yada ... But at least we have a bulletproof, battle-tested implementation that we can guide ourselves with.

Installing

First you have to clone and compile the libFM repository and set an environment variable pointing to the libFM bin folder:

git clone https://github.com/srendle/libfm /home/libfm
cd /home/libfm/
# taking advantage of a bug to allow us to save model #ShameShame
git reset --hard 91f8504a15120ef6815d6e10cc7dee42eebaab0f
make all
export LIBFM_PATH=/home/libfm/bin/

Make sure you compile the source from the libfm repository at this specific commit, since pywFM needs the save_model option. Beware that the installers and source code on libfm.org both predate this commit. I know this is extremely hacky, but a later fix restricted the save_model option to SGD and ALS. I don't know exactly why, because it was working well before.

If you use Jupyter take a look at the following issue for some extra notes on getting libfm to work.
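
As a sanity check before creating `pywFM.FM`, the lookup can be reproduced in Python. This is a minimal sketch, not part of pywFM: `resolve_libfm_binary` is a hypothetical helper, and the executable name `libFM` assumes a Unix-like build. Setting `os.environ` from inside a notebook is one way to make the variable visible when the notebook process did not inherit it from the shell.

```python
import os


def resolve_libfm_binary(libfm_path=None):
    """Return the full path to the libFM executable, or raise OSError."""
    # Fall back to the environment variable that pywFM reads.
    if libfm_path is None:
        libfm_path = os.environ.get("LIBFM_PATH")
    if not libfm_path:
        raise OSError("LIBFM_PATH is not set")
    # os.path.join tolerates a missing trailing slash in LIBFM_PATH.
    binary = os.path.join(libfm_path, "libFM")
    if not os.path.isfile(binary):
        raise OSError("no libFM binary found at %s" % binary)
    return binary


# In a notebook, the variable can be set for the current process before
# pywFM.FM(...) is created, for example:
# os.environ["LIBFM_PATH"] = "/home/libfm/bin/"
```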

Then, install pywFM using pip:

pip install pywFM

Binary installers for the latest released version are available at the Python package index.

Dependencies

  • numpy
  • scipy
  • sklearn
  • pandas

Example

Very simple example taken from Steffen Rendle's paper: Factorization Machines with libFM.

import pywFM
import numpy as np
import pandas as pd

features = np.matrix([
#     Users  |     Movies     |    Movie Ratings   | Time | Last Movies Rated
#    A  B  C | TI  NH  SW  ST | TI   NH   SW   ST  |      | TI  NH  SW  ST
    [1, 0, 0,  1,  0,  0,  0,   0.3, 0.3, 0.3, 0,     13,   0,  0,  0,  0 ],
    [1, 0, 0,  0,  1,  0,  0,   0.3, 0.3, 0.3, 0,     14,   1,  0,  0,  0 ],
    [1, 0, 0,  0,  0,  1,  0,   0.3, 0.3, 0.3, 0,     16,   0,  1,  0,  0 ],
    [0, 1, 0,  0,  0,  1,  0,   0,   0,   0.5, 0.5,   5,    0,  0,  0,  0 ],
    [0, 1, 0,  0,  0,  0,  1,   0,   0,   0.5, 0.5,   8,    0,  0,  1,  0 ],
    [0, 0, 1,  1,  0,  0,  0,   0.5, 0,   0.5, 0,     9,    0,  0,  0,  0 ],
    [0, 0, 1,  0,  0,  1,  0,   0.5, 0,   0.5, 0,     12,   1,  0,  0,  0 ]
])
target = [5, 3, 1, 4, 5, 1, 5]

fm = pywFM.FM(task='regression', num_iter=5)

# split features and target for train/test
# first 5 are train, last 2 are test
model = fm.run(features[:5], target[:5], features[5:], target[5:])
print(model.predictions)
# you can also get the model weights
print(model.weights)

You can also use a numpy array, an sklearn-compatible sparse matrix (scipy.sparse), or even a pandas DataFrame as features input.

Prediction on new data

The current approach is to pass the new data as x_test and y_test in the run method call. libFM uses the test targets only to report statistics about its predictions; they are not used when training the model. You can therefore set y_test to dummy values and collect the predictions from model.predictions (disregard the reported prediction statistics, since those will be wrong). For more information, check the libFM manual.
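
The dummy-target workaround can be wrapped in a small helper. This is a sketch only: `predict_new_data` is a hypothetical name, not a pywFM function, and it assumes `x_new` exposes a `shape` attribute (as numpy arrays, sparse matrices, and DataFrames do). Note that every call retrains the model, because pywFM has no separate predict step.

```python
def predict_new_data(fm, x_train, y_train, x_new):
    """Train via fm.run and return predictions for the unlabeled rows in x_new."""
    # libFM requires test targets, so pass placeholders of the right length.
    dummy_targets = [0] * x_new.shape[0]
    model = fm.run(x_train, y_train, x_new, dummy_targets)
    # Only the predictions are meaningful here; any evaluation statistics
    # computed against the placeholder targets should be ignored.
    return model.predictions
```

With the example above, `predict_new_data(fm, features[:5], target[:5], features[5:])` would return the predictions for the last two rows.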

Running against a new dataset using something like a predict method is not supported yet. Pending feature request: #7

Feel free to PR that change ;)

Usage

FM: Class that wraps libFM parameters. For more information read libFM manual
Parameters
----------
task : string, MANDATORY
    regression: for regression
    classification: for binary classification
num_iter: int, optional
    Number of iterations
    Defaults to 100
init_stdev : double, optional
    Standard deviation for initialization of 2-way factors
    Defaults to 0.1
k0 : bool, optional
    Use bias.
    Defaults to True
k1 : bool, optional
    Use 1-way interactions.
    Defaults to True
k2 : int, optional
    Dimensionality of 2-way interactions.
    Defaults to 8
learning_method: string, optional
    sgd: parameter learning with SGD
    sgda: parameter learning with adaptive SGD
    als: parameter learning with ALS
    mcmc: parameter learning with MCMC
    Defaults to 'mcmc'
learn_rate: double, optional
    Learning rate for SGD
    Defaults to 0.1
r0_regularization: int, optional
    bias regularization for SGD and ALS
    Defaults to 0
r1_regularization: int, optional
    1-way regularization for SGD and ALS
    Defaults to 0
r2_regularization: int, optional
    2-way regularization for SGD and ALS
    Defaults to 0
rlog: bool, optional
    Enable/disable rlog output
    Defaults to True.
verbose: bool, optional
    How much info to print
    Defaults to False.
seed: int, optional
    seed used to reproduce the results
    Defaults to None.
silent: bool, optional
    Completely silences all libFM output
    Defaults to False.
temp_path: string, optional
    Sets path for libFM temporary files. Useful when dealing with large data.
    Defaults to None (default mkstemp behaviour)

FM.run: run factorization machine model against train and test data

Parameters
----------
x_train : {array-like, matrix}, shape = [n_train, n_features]
    Training data
y_train : numpy array of shape [n_train]
    Target values
x_test: {array-like, matrix}, shape = [n_test, n_features]
    Testing data
y_test : numpy array of shape [n_test]
    Testing target values
x_validation_set: optional, {array-like, matrix}, shape = [n_train, n_features]
    Validation data (only for SGDA)
y_validation_set: optional, numpy array of shape [n_train]
    Validation target data (only for SGDA)

Return
-------
Returns `namedtuple` with the following properties:

predictions: array [n_samples of x_test]
   Predicted target values per element in x_test.
global_bias: float
    If k0 is True, returns the model's global bias w0
weights: array [n_features]
    If k1 is True, returns the model's weights for each feature Wj
pairwise_interactions: numpy matrix [n_features x k2]
    Matrix with pairwise interactions Vj,f
rlog: pandas dataframe [nrow = num_iter]
    `pandas` DataFrame with measurements about each iteration
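
The returned parameters are enough to evaluate the standard second-order factorization machine equation by hand. The sketch below is an illustration, not pywFM code: `fm_predict` is a hypothetical helper, and it assumes `weights` has one entry per feature and `pairwise_interactions` has shape [n_features, k2]. libFM may post-process the raw score (for example, bounding regression outputs or mapping classification scores to probabilities), and MCMC predictions may be aggregated over iterations, so the result may not match `model.predictions` exactly.

```python
import numpy as np


def fm_predict(x, global_bias, weights, pairwise_interactions):
    """Evaluate the second-order FM equation for one feature vector x."""
    x = np.asarray(x, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    v = np.asarray(pairwise_interactions, dtype=float)  # [n_features, k2]
    # Treat a missing bias as zero.
    w0 = 0.0 if global_bias is None else float(global_bias)
    linear = float(w @ x)
    # Pairwise term via the O(n_features * k2) identity:
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [(sum_j v_jf x_j)^2 - sum_j v_jf^2 x_j^2]
    s = v.T @ x
    s_sq = (v ** 2).T @ (x ** 2)
    pairwise = 0.5 * float(np.sum(s ** 2 - s_sq))
    return w0 + linear + pairwise
```

For the README example, `fm_predict(features[5], model.global_bias, model.weights, model.pairwise_interactions)` would score the first test row.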

Docker

This repository includes Dockerfiles for development and for running pywFM.

  • Run pywFM examples (Dockerfile): if you are only interested in running the examples, you can use the pre-built image available on Docker Hub:
# to run examples/simple.py (the one in this README).
docker run --rm -v "$(pwd)":/home/pywfm -w /home/pywfm -ti jfloff/pywfm python examples/simple.py
  • Development of pywFM (Dockerfile): useful if you want to make changes to the repo. Dockerfile defaults to bash.
# to build image
docker build --rm=true -t jfloff/pywfm-dev .
# to run image
docker run --rm -v "$(pwd)":/home/pywfm-dev -w /home/pywfm-dev -ti jfloff/pywfm-dev

Future work

  • Improve the save_model / load_model so we can have a more defined init-fit-predict cycle (perhaps we could inherit from sklearn.BaseEstimator)
  • Can we contribute to libFM repo so save_model is enabled for all learning methods (namely MCMC)?
  • Look into a shared-library solution to reduce I/O overhead

I'm no factorization machine expert, so this library was just an effort to have libFM as fast as possible in Python. Feel free to suggest features, enhancements; to point out issues; and of course, to post PRs.

License

MIT (see LICENSE.txt file)

pywfm's Issues

Support dense matrices or convert to sparse instead of raising TypeError

When I try to fit als.FMClassifier with a dense numpy matrix, I get:

  File "/home/mack/anaconda2/lib/python2.7/site-packages/fastFM/als.py", line 175, in fit
    order="F")
  File "/home/mack/anaconda2/lib/python2.7/site-packages/fastFM/validation.py", line 29, in wrapper
    raise TypeError('A dense matrix was passed in, but sparse'
TypeError: A dense matrix was passed in, but sparsedata is required.

Any reason not to support dense matrices? I'm wondering if it's simply because the type of data an FM is well-suited for cannot generally be represented by a dense matrix. Seems like if the user can fit it into memory, then why not support it? If sparse matrices are absolutely required, is it reasonable to simply convert to a sparse matrix instead of raising the TypeError?

problem of running in jupyter notebook

I followed the instructions on GitHub and can successfully run the test code.
However, I get the following error when running it in my Jupyter notebook:


OSError Traceback (most recent call last)
in ()
16 target = [5, 3, 1, 4, 5, 1, 5]
17
---> 18 fm = pywFM.FM(task='regression', num_iter=5)
19
20 # split features and target for train/test

/usr/local/lib/python2.7/dist-packages/pywFM/__init__.pyc in __init__(self, task, num_iter, init_stdev, k0, k1, k2, learning_method, learn_rate, r0_regularization, r1_regularization, r2_regularization, rlog, verbose, seed, silent, temp_path)

OSError: LIBFM_PATH is not set. Please install libFM and set the path variable (https://github.com/jfloff/pywFM#installing).

Actually, I have already set LIBFM_PATH in my ~/.bashrc file like this:
export LIBFM_PATH=$HOME/local/libfm/bin
I don't know why the Jupyter notebook cannot find this path.

Python3?

Hi,

Does pywFM support Python3? I tried to run the example and got this error

model = fm.run(features[:5], target[:5], features[5:], target[5:])
/bin/sh: /usr/local/lib/python3.5/site-packages/pywFM/libfm/bin/libFM: cannot execute binary file
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-85e95dc4c2fd> in <module>()
----> 1 model = fm.run(features[:5], target[:5], features[5:], target[5:])

/usr/local/lib/python3.5/site-packages/pywFM/__init__.py in run(self, x_train, y_train, x_test, y_test)
    184         # parses rlog into
    185         import pandas as pd
--> 186         rlog = pd.read_csv(rlog_path, sep='\t')
    187
    188         # removes temporary output file after using

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    496                     skip_blank_lines=skip_blank_lines)
    497
--> 498         return _read(filepath_or_buffer, kwds)
    499
    500     parser_f.__name__ = name

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    273
    274     # Create the parser.
--> 275     parser = TextFileReader(filepath_or_buffer, **kwds)
    276
    277     if (nrows is not None) and (chunksize is not None):

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    588             self.options['has_index_names'] = kwds['has_index_names']
    589
--> 590         self._make_engine(self.engine)
    591
    592     def _get_options_with_defaults(self, engine):

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    729     def _make_engine(self, engine='c'):
    730         if engine == 'c':
--> 731             self._engine = CParserWrapper(self.f, **self.options)
    732         else:
    733             if engine == 'python':

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1101         kwds['allow_leading_cols'] = self.index_col is not False
   1102
-> 1103         self._reader = _parser.TextReader(src, **kwds)
   1104
   1105         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)()

ValueError: No columns to parse from file

Model cannot be saved when k2 = 0

When the latent dimension is 0, libFM still performs training, but pywFM can't save the model.

Code:

fm = pywFM.FM(task='classification', num_iter=5, k2=0, rlog=False)

Error:

Traceback (most recent call last):
  File "fm_assist.py", line 34, in <module>
    model = fm.run(X_train, df_train['outcome'], X_test, df_test['outcome'])
  File "/Users/jilljenn/code/TF-recomm/venv/lib/python3.6/site-packages/pywFM/__init__.py", line 222, in run
    pairwise_interactions.append([float(x) for x in line.split(' ')])
  File "/Users/jilljenn/code/TF-recomm/venv/lib/python3.6/site-packages/pywFM/__init__.py", line 222, in <listcomp>
    pairwise_interactions.append([float(x) for x in line.split(' ')])
ValueError: could not convert string to float: 
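
The traceback points at `line.split(' ')`, which yields an empty string for a blank line, and `float('')` then fails. A more tolerant parse could look like the sketch below. This is a hypothetical helper (`parse_factor_rows`), not the code pywFM ships, and it assumes that blank lines in the saved model can simply be skipped when k2 is 0.

```python
def parse_factor_rows(lines):
    """Parse whitespace-separated float rows, skipping blank lines."""
    rows = []
    for line in lines:
        # str.split() with no argument drops empty tokens, unlike split(' '),
        # which returns [''] for an empty line.
        tokens = line.split()
        if tokens:
            rows.append([float(t) for t in tokens])
    return rows
```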

C assertions causing core dump without any useful message

I'm getting the following with 0.2.9 (fresh pip install today):

python: ffm_als_mcmc.c:172: sparse_fit: Assertion `(sizeof (*w_0) == sizeof (float) ? __finitef (*w_0) : sizeof (*w_0) == sizeof (double) ? __finite (*w_0) : __finitel (*w_0)) && "w_0 not finite"' failed.
[1]    30860 abort (core dumped)  python classify.py --nsplits 1 -c fm --basic-descriptive --date  --subways ...

I'm using the als.FMClassifier via a wrapper class for one-vs-rest classification:

        from fastFM import als
        class FMClassifier(als.FMClassification):
            def fit(self, X, y, *args):
                y = y.copy()
                y[y == 0] = -1
                return super(FMClassifier, self).fit(X, y, *args)

            def predict_proba(self, X):
                probs = super(FMClassifier, self).predict_proba(X)
                return np.tile(probs, 2).reshape(2, probs.shape[0]).T

        from sklearn.multiclass import OneVsRestClassifier
        return OneVsRestClassifier(
            FMClassifier(n_iter=500, random_state=42))

The data I was using to fit the model is attached. I tried to get a smaller dataset, but I was having trouble producing it when I cut things out.

X_train_sparse_fit_issue.npy.zip
y_train_sparse_fit_issue.npy.zip

Running error with __init__.py

Hi,
I am trying the example code but got an error message related to __init__.py (see below). Is there anything I can do to avoid the error? Thanks!


EmptyDataError Traceback (most recent call last)
in ()
20 # split features and target for train/test
21 # first 5 are train, last 2 are test
---> 22 model = fm.run(features[:5], target[:5], features[5:], target[5:])
23 print(model.predictions)
24 # you can also get the model weights

F:\Anaconda\lib\site-packages\pywFM\__init__.py in run(self, x_train, y_train, x_test, y_test, x_validation_set, y_validation_set)
228 # parses rlog into
229 import pandas as pd
--> 230 rlog = pd.read_csv(rlog_path, sep='\t')
231 os.close(rlog_fd)
232 os.remove(rlog_path)

an Error in Pandas

Hi.
I tried your sample code, then an error occurred in
"model = fm.run(features[:5], target[:5] , features[5:], target[5:])".
Then, it says
"EmptyDataError: No columns to parse from file".

Could you tell me how to solve this problem??

Cannot produce Test(ll) results locally

Hi,

I've been testing the pywFM package, and my question is about how model.predictions relates to the information produced in the output.

My specific example: if I run libFM with a train and test dataset, I can see in the output that Test(ll) drops to 0.515385. If I take the test predictions and score them against the test labels, I get a log loss of 8.134375875846, where I should get 0.515385.

For clarity, please see the thread I started on Kaggle, which also lets you download the data and reproduce the error.

Full example code: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/forums/t/19319/help-with-libfm/110652#post110652

Problem with -save_model on Windows Running toy example

If I run the toy example:

 import pywFM
 import numpy as np
 import pandas as pd

 features = np.matrix([
 #     Users  |     Movies     |    Movie Ratings   | Time | Last Movies Rated
 #    A  B  C | TI  NH  SW  ST | TI   NH   SW   ST  |      | TI  NH  SW  ST
    [1, 0, 0,  1,  0,  0,  0,   0.3, 0.3, 0.3, 0,     13,   0,  0,  0,  0 ],
    [1, 0, 0,  0,  1,  0,  0,   0.3, 0.3, 0.3, 0,     14,   1,  0,  0,  0 ],
    [1, 0, 0,  0,  0,  1,  0,   0.3, 0.3, 0.3, 0,     16,   0,  1,  0,  0 ],
    [0, 1, 0,  0,  0,  1,  0,   0,   0,   0.5, 0.5,   5,    0,  0,  0,  0 ],
    [0, 1, 0,  0,  0,  0,  1,   0,   0,   0.5, 0.5,   8,    0,  0,  1,  0 ],
    [0, 0, 1,  1,  0,  0,  0,   0.5, 0,   0.5, 0,     9,    0,  0,  0,  0 ],
    [0, 0, 1,  0,  0,  1,  0,   0.5, 0,   0.5, 0,     12,   1,  0,  0,  0 ]
 ])
 target = [5, 3, 1, 4, 5, 1, 5]

 fm = pywFM.FM(task='regression', num_iter=5)

 model = fm.run(features[:5], target[:5], features[5:], target[5:])

I got this error

ERROR: the parameter save_model does not exist
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pywFM\__init__.py", line 231, in run
    rlog = pd.read_csv(rlog_path, sep='\t')
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 275, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 590, in __init__
    self._make_engine(self.engine)
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 731, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1103, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas\parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5030)
ValueError: No columns to parse from file

So it looks like -save_model doesn't work. My temporary solution is to change lines 162 and 163 in __init__.py from:

                "-verbosity %d" % self.__verbose,
                "-save_model %s" % model_path]

to:

                "-verbosity %d" % self.__verbose]
                #"-save_model %s" % model_path]

The model starts working, you can use model prediction, but the model isn't saved.

FM.run in Example code fails on Windows

model = fm.run(features[:5], target[:5], features[5:], target[5:]) line in https://github.com/jfloff/pywFM example fails with the following output. Same error happens with both libFM compiled from sources and using binaries http://www.libfm.org/libfm-1.40.windows.zip.

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
20 # split features and target for train/test
21 # first 5 are train, last 2 are test
---> 22 model = fm.run(features[:5], target[:5], features[5:], target[5:])
23 print(model.predictions)
24 # you can also get the model weights

C:\Miniconda2\lib\site-packages\pywFM\__init__.pyc in run(self, x_train, y_train, x_test, y_test, x_validation_set, y_validation_set)
228 # parses rlog into
229 import pandas as pd
--> 230 rlog = pd.read_csv(rlog_path, sep='\t')
231 os.close(rlog_fd)
232 os.remove(rlog_path)

C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
527 skip_blank_lines=skip_blank_lines)
528
--> 529 return _read(filepath_or_buffer, kwds)
530
531 parser_f.__name__ = name

C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
293
294 # Create the parser.
--> 295 parser = TextFileReader(filepath_or_buffer, **kwds)
296
297 if (nrows is not None) and (chunksize is not None):

C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
610 self.options['has_index_names'] = kwds['has_index_names']
611
--> 612 self._make_engine(self.engine)
613
614 def _get_options_with_defaults(self, engine):

C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
745 def _make_engine(self, engine='c'):
746 if engine == 'c':
--> 747 self._engine = CParserWrapper(self.f, **self.options)
748 else:
749 if engine == 'python':

C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, src, **kwds)
1117 kwds['allow_leading_cols'] = self.index_col is not False
1118
-> 1119 self._reader = _parser.TextReader(src, **kwds)
1120
1121 # XXX

pandas\parser.pyx in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5030)()

ValueError: No columns to parse from file

Path not set

Hi, I am trying to use this wrapper and I am getting the error:

OSError                                   Traceback (most recent call last)
<ipython-input-1-6bd4815fa3b6> in <module>()
     16 target = [5, 3, 1, 4, 5, 1, 5]
     17 
---> 18 fm = pywFM.FM(task='regression', num_iter=5)
     19 
     20 # split features and target for train/test

/usr/local/lib/python3.5/dist-packages/pywFM/__init__.py in __init__(self, task, num_iter, init_stdev, k0, k1, k2, learning_method, learn_rate, r0_regularization, r1_regularization, r2_regularization, rlog, verbose, silent, temp_path)
    103         self.__libfm_path = os.environ.get('LIBFM_PATH')
    104         if self.__libfm_path is None:
--> 105             raise OSError("`LIBFM_PATH` is not set. Please install libFM and set the path variable (https://github.com/jfloff/pywFM#installing).")
    106 
    107     def run(self, x_train, y_train, x_test, y_test, x_validation_set=None, y_validation_set=None):

OSError: `LIBFM_PATH` is not set. Please install libFM and set the path variable (https://github.com/jfloff/pywFM#installing).

I followed the install instructions exactly, specifically making sure I did the export path part correctly. What might be going wrong here?

Testing target values

Hello

In order to run the model one needs to add a vector y of target values for the testing dataset also. I don't understand the requirement, why does it need the test values? Are they used when training the model?


Parameters

x_train : {array-like, matrix}, shape = [n_train, n_features]
Training data
y_train : numpy array of shape [n_train]
Target values
x_test: {array-like, matrix}, shape = [n_test, n_features]
Testing data
y_test : numpy array of shape [n_test]
Testing target values


My idea is to predict the values of a test set from which I do not have the target values, how should I proceed?

problem of regression results

I ran your example code as follows

import pywFM
import numpy as np
import pandas as pd

features = np.matrix([

[1, 0, 0,  1,  0,  0,  0,   0.3, 0.3, 0.3, 0,     13,   0,  0,  0,  0 ],
[1, 0, 0,  0,  1,  0,  0,   0.3, 0.3, 0.3, 0,     14,   1,  0,  0,  0 ],
[1, 0, 0,  0,  0,  1,  0,   0.3, 0.3, 0.3, 0,     16,   0,  1,  0,  0 ],
[0, 1, 0,  0,  0,  1,  0,   0,   0,   0.5, 0.5,   5,    0,  0,  0,  0 ],
[0, 1, 0,  0,  0,  0,  1,   0,   0,   0.5, 0.5,   8,    0,  0,  1,  0 ],
[0, 0, 1,  1,  0,  0,  0,   0.5, 0,   0.5, 0,     9,    0,  0,  0,  0 ],
[0, 0, 1,  0,  0,  1,  0,   0.5, 0,   0.5, 0,     12,   1,  0,  0,  0 ]

])
target = [5, 3, 1, 4, 5, 1, 5]

fm = pywFM.FM(task='regression', num_iter=1000, verbose=True)

model = fm.run(features[:5], target[:5], features[5:], target[5:])
print(model.predictions)
print(model.weights)

and the predictions are not so good, like this:
[3.71942, 3.4779]
[-0.36734, -1.25636, 1.04973, -2.0381, -2.07228, 0.0822247, -0.202202, -1.26609, -2.40143, -0.568957, -2.13888, -1.41459, 0.36015, 0.787539, -0.303377]
Actually, the ground truth is [1, 5]. I tried running with even more iterations and the result is still not good. What results do you get with your implementation?

thx

Problem on Windows Running toy example

Hi,
I installed pywFM in an Anaconda environment using pip, then set the LIBFM_PATH environment variable. Then I ran the program:

import pywFM
import numpy as np
import pandas as pd

features = np.matrix([
#     Users  |     Movies     |    Movie Ratings   | Time | Last Movies Rated
#    A  B  C | TI  NH  SW  ST | TI   NH   SW   ST  |      | TI  NH  SW  ST
    [1, 0, 0,  1,  0,  0,  0,   0.3, 0.3, 0.3, 0,     13,   0,  0,  0,  0 ],
    [1, 0, 0,  0,  1,  0,  0,   0.3, 0.3, 0.3, 0,     14,   1,  0,  0,  0 ],
    [1, 0, 0,  0,  0,  1,  0,   0.3, 0.3, 0.3, 0,     16,   0,  1,  0,  0 ],
    [0, 1, 0,  0,  0,  1,  0,   0,   0,   0.5, 0.5,   5,    0,  0,  0,  0 ],
    [0, 1, 0,  0,  0,  0,  1,   0,   0,   0.5, 0.5,   8,    0,  0,  1,  0 ],
    [0, 0, 1,  1,  0,  0,  0,   0.5, 0,   0.5, 0,     9,    0,  0,  0,  0 ],
    [0, 0, 1,  0,  0,  1,  0,   0.5, 0,   0.5, 0,     12,   1,  0,  0,  0 ]
])
target = [5, 3, 1, 4, 5, 1, 5]

fm = pywFM.FM(task='regression', num_iter=5)

# split features and target for train/test
# first 5 are train, last 2 are test
model = fm.run(features[:5], target[:5], features[5:], target[5:])

But the last line gives me an error:

"C:\Users\[...]\Anaconda\Lib\site-packages\pywFMlibFM" non è riconosciuto come comando interno o esterno, 
 un programma eseguibile o un file batch. (This means it is not recognized as an internal or external command, an executable program, or a batch file.)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pywFM\__init__.py", line 230, in run
    rlog = pd.read_csv(rlog_path, sep='\t')
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 275, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 590, in __init__
    self._make_engine(self.engine)
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 731, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1103, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas\parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5030)
ValueError: No columns to parse from file

Could you please point out where the error is? Thank you

Global bias is None, even when k0 is True

I was trying to calculate the prediction value myself with the weight output by model and always got the wrong answer.
Then I realized that the bias value is always None, even if I set k0=True manually.

fm = pywFM.FM(task='regression', num_iter=5, k2=2, k0=True)

I don't know whether the wrong answer came from the absence of bias, but it seems strange that the global_bias is always None.

How can I fix it?
Or, is there any possibility I can calculate the prediction myself?

Predict for new data.

Say I trained the model with

fm.run(train_x, train_y, val_x, val_y)

How do I run prediction for another dataset?

pred_y = fm.run(test_x)

The run method expects y_test as input, which doesn't make sense for this use case:
run(self, x_train, y_train, x_test, y_test, x_validation_set=None, y_validation_set=None, meta=None)
