Giter VIP home page Giter VIP logo

irspack's Introduction

irspack - Implicit recommender systems for practitioners

Python pypi GitHub license Build Read the Docs codecov

Docs

irspack is a Python package for recommender systems based on implicit feedback, designed to be used by practitioners.

Some of its features include:

  • Efficient parameter tuning enabled by C++/Eigen implementations of core recommender algorithms and optuna.
    • In particular, if an early stopping scheme is available, optuna can prune out unpromising trial based on the intermediate validation scores.
  • Various utility functions, including
    • ID/index mapping utilities
    • Fast, multithreaded argsort for batch recommendation retrieval
    • Efficient and configurable evaluation of recommender system performance

Installation & Optional Dependencies

In most cases, you can install the pre-build binaries via

pip install irspack

The binaries have been compiled to use AVX instruction. If you want to use AVX2/AVX512 or your environment does not support AVX (like Rosetta 2 on Apple M1), install it from source like

CFLAGS="-march=native" pip install git+https://github.com/tohtsky/irspack.git

In that case, you must have a decent version of C++ compiler (with C++11 support). If it doesn't work, feel free to make an issue!

Optional Dependencies

I have also prepared a wrapper class (BPRFMRecommender) to train/optimize BPR/warp loss Matrix factorization implemented in lightfm. To use it you have to install lightfm separately, e.g. by

pip install lightfm

If you want to use Mult-VAE, you'll need the following additional (pip-installable) packages:

Basic Usage

Step 1. Train a recommender

To begin with, we represent the user/item interaction as a scipy.sparse matrix. Then we can feed it into recommender classes:

import numpy as np
import scipy.sparse as sps
from irspack import IALSRecommender, df_to_sparse
from irspack.dataset import MovieLens100KDataManager

df = MovieLens100KDataManager().read_interaction()

# Convert pandas.Dataframe into scipy's sparse matrix.
# The i'th row of `X_interaction` corresponds to `unique_user_id[i]`
# and j'th column of `X_interaction` corresponds to `unique_movie_id[j]`.
X_interaction, unique_user_id, unique_movie_id = df_to_sparse(
  df, 'userId', 'movieId'
)

recommender = IALSRecommender(X_interaction)
recommender.learn()

# for user 0 (whose userId is unique_user_id[0]),
# compute the masked score (i.e., already seen items have the score of negative infinity)
# of items.
recommender.get_score_remove_seen([0])

Step 2. Evaluation on a validation set

To evaluate the performance of a recommenderm we have to split the dataset to train and validation sets:

from irspack.split import rowwise_train_test_split
from irspack.evaluation import Evaluator

# Random split
X_train, X_val = rowwise_train_test_split(
    X_interaction, test_ratio=0.2, random_state=0
)

evaluator = Evaluator(ground_truth=X_val)

recommender = IALSRecommender(X_train)
recommender.learn()
evaluator.get_score(recommender)

This will print something like

{
    'appeared_item': 435.0,
    'entropy': 5.160409123151053,
    'gini_index': 0.9198367595008214,
    'hit': 0.40084835630965004,
    'map': 0.013890322881619916,
    'n_items': 1682.0,
    'ndcg': 0.07867240014767263,
    'precision': 0.06797454931071051,
    'recall': 0.03327028758587522,
    'total_user': 943.0,
    'valid_user': 943.0
}

Step 3. Hyperparameter optimization

Now that we can evaluate the recommenders' performance against the validation set, we can use optuna-backed hyperparameter optimization.

best_params, trial_dfs  = IALSRecommender.tune(X_train, evaluator, n_trials=20)

# maximal ndcg around 0.43 ~ 0.45
trial_dfs["ndcg@10"].max()

Of course, we have to hold-out another interaction set for test, and measure the performance of tuned recommender against the test set.

See examples/ for more complete examples.

TODOs

  • more benchmark dataset

irspack's People

Contributors

k-tahiro avatar random1992 avatar tnakae avatar tohtsky avatar wararaki718 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

irspack's Issues

AttributeError: 'DataFrame' object has no attribute 'ndcg'

Thank you for building this great project.

I found a AttributeError: 'DataFrame' object has no attribute 'ndcg'. Could you give some hint for this. Maybe the version caused such an issue.

[I 2021-04-15 22:32:57,587] Trial 19 finished with value: -0.1701716280
513936 and parameters: {'top_k': 15, 'normalize_weight': True}. Best is trial 13
 with value: -0.38695462976440437.
>>>
>>> # maximal ndcg around 0.38 ~ 0.39
>>> trial_dfs.ndcg.max()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    trial_dfs.ndcg.max()
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line
 5460, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'ndcg'

Add more examples

Our examples are limited to Movielens (because it's handy), but it is clearly not the most "typical" implicit feedback data.
We clearly need more examples with

  • different datasets
  • reproducing recent papers

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.