
microsoft / recommenders


Best Practices on Recommendation Systems

Home Page: https://recommenders-team.github.io/recommenders/intro.html

License: MIT License

Jupyter Notebook 0.40% Python 97.73% Dockerfile 0.56% C++ 0.47% Scala 0.84%
machine-learning recommender ranking deep-learning python jupyter-notebook recommendation-algorithm rating operationalization kubernetes azure microsoft recommendation-system recommendation-engine recommendation data-science tutorial artificial-intelligence

recommenders's Introduction

Recommenders


What's New (October 2023)

We are pleased to announce that this repository (formerly known as Microsoft Recommenders, https://github.com/microsoft/recommenders) has joined the Linux Foundation of AI and Data (LF AI & Data)! The new organization, recommenders-team, reflects this change.

We hope this move makes it easy for anyone to contribute! Our objective continues to be building an ecosystem and a community to sustain open source innovations and collaborations in recommendation systems.

To access the repository, go to https://github.com/recommenders-team/recommenders instead of https://github.com/microsoft/recommenders. The old URL still redirects to the new one, but we recommend that you update your bookmarks.

Introduction

The objective of Recommenders is to assist researchers, developers and enthusiasts in prototyping, experimenting with and bringing to production a range of classic and state-of-the-art recommendation systems.

Recommenders is a project under the Linux Foundation of AI and Data.

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

  • Prepare Data: Preparing and loading data for each recommendation algorithm.
  • Model: Building models using various classical and deep learning recommendation algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
  • Evaluate: Evaluating algorithms with offline metrics.
  • Model Select and Optimize: Tuning and optimizing hyperparameters for recommendation models.
  • Operationalize: Operationalizing models in a production environment on Azure.

Several utilities are provided in recommenders to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the Recommenders documentation.

For a more detailed overview of the repository, please see the documents on the wiki page.

Getting Started

We recommend conda for environment management, and VS Code for development. To install the recommenders package and run an example notebook on Linux/WSL:

# 1. Install gcc if it is not installed already. On Ubuntu, this can be done with the command
# sudo apt install gcc

# 2. Create and activate a new conda environment
conda create -n <environment_name> python=3.9
conda activate <environment_name>

# 3. Install the core recommenders package. It can run all the CPU notebooks.
pip install recommenders

# 4. Create a Jupyter kernel
python -m ipykernel install --user --name <environment_name> --display-name <kernel_name>

# 5. Clone this repo within VS Code or using the command line:
git clone https://github.com/recommenders-team/recommenders.git

# 6. Within VS Code:
#   a. Open a notebook, e.g., examples/00_quick_start/sar_movielens.ipynb;
#   b. Select Jupyter kernel <kernel_name>;
#   c. Run the notebook.

For more information about setup on other platforms (e.g., Windows and macOS) and different configurations (e.g., GPU, Spark and experimental features), see the Setup Guide.

In addition to the core package, several extras are also provided, including:

  • [gpu]: Needed for running GPU models.
  • [spark]: Needed for running Spark models.
  • [dev]: Needed for developing the repository.
  • [all]: Includes [gpu], [spark] and [dev].
  • [experimental]: Models that are not thoroughly tested and/or may require additional steps in installation.

Algorithms

The table below lists the recommendation algorithms currently available in the repository. Notebooks are linked under the Example column as Quick start, showcasing an easy-to-run example of the algorithm, or as Deep dive, explaining the math and implementation of the algorithm in detail.

| Algorithm | Type | Description | Example |
|---|---|---|---|
| Alternating Least Squares (ALS) | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized for scalability and distributed computing capability. It works in the PySpark environment. | Quick start / Deep dive |
| Attentive Asynchronous Singular Value Decomposition (A2SVD)* | Collaborative Filtering | Sequential-based algorithm that aims to capture both long- and short-term user preferences using an attention mechanism. It works in the CPU/GPU environment. | Quick start |
| Cornac/Bayesian Personalized Ranking (BPR) | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback. It works in the CPU environment. | Deep dive |
| Cornac/Bilateral Variational Autoencoder (BiVAE) | Collaborative Filtering | Generative model for dyadic data (e.g., user-item interactions). It works in the CPU/GPU environment. | Deep dive |
| Convolutional Sequence Embedding Recommendation (Caser) | Collaborative Filtering | Algorithm based on convolutions that aims to capture both the user's general preferences and sequential patterns. It works in the CPU/GPU environment. | Quick start |
| Deep Knowledge-Aware Network (DKN)* | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings for providing news or article recommendations. It works in the CPU/GPU environment. | Quick start / Deep dive |
| Extreme Deep Factorization Machine (xDeepFM)* | Collaborative Filtering | Deep learning based algorithm for implicit and explicit feedback with user/item features. It works in the CPU/GPU environment. | Quick start |
| FastAI Embedding Dot Bias (FAST) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items. It works in the CPU/GPU environment. | Quick start |
| LightFM/Factorization Machine | Collaborative Filtering | Factorization machine algorithm for both implicit and explicit feedback. It works in the CPU environment. | Quick start |
| LightGBM/Gradient Boosting Tree* | Content-Based Filtering | Gradient boosting tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments. | Quick start in CPU / Deep dive in PySpark |
| LightGCN | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment. | Deep dive |
| GeoIMC* | Collaborative Filtering | Matrix completion algorithm that takes user and item features into account using Riemannian conjugate gradients optimization and following a geometric approach. It works in the CPU environment. | Quick start |
| GRU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long- and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment. | Quick start |
| Multinomial VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | Deep dive |
| Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment. | Quick start |
| Neural Recommendation with Attentive Multi-View Learning (NAML)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with attentive multi-view learning. It works in the CPU/GPU environment. | Quick start |
| Neural Collaborative Filtering (NCF) | Collaborative Filtering | Deep learning algorithm with enhanced performance for user/item implicit feedback. It works in the CPU/GPU environment. | Quick start / Deep dive |
| Neural Recommendation with Personalized Attention (NPA)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with a personalized attention network. It works in the CPU/GPU environment. | Quick start |
| Neural Recommendation with Multi-Head Self-Attention (NRMS)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with multi-head self-attention. It works in the CPU/GPU environment. | Quick start |
| Next Item Recommendation (NextItNet) | Collaborative Filtering | Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns. It considers both user/item interactions and features. It works in the CPU/GPU environment. | Quick start |
| Restricted Boltzmann Machines (RBM) | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit user/item feedback. It works in the CPU/GPU environment. | Quick start / Deep dive |
| Riemannian Low-rank Matrix Completion (RLRMC)* | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption to predict user/item interactions. It works in the CPU environment. | Quick start |
| Simple Algorithm for Recommendation (SAR)* | Collaborative Filtering | Similarity-based algorithm for implicit user/item feedback. It works in the CPU environment. | Quick start / Deep dive |
| Self-Attentive Sequential Recommendation (SASRec) | Collaborative Filtering | Transformer-based algorithm for sequential recommendation. It works in the CPU/GPU environment. | Quick start |
| Short-term and Long-term Preference Integrated Recommender (SLi-Rec)* | Collaborative Filtering | Sequential-based algorithm that aims to capture both long- and short-term user preferences using an attention mechanism, a time-aware controller and a content-aware controller. It works in the CPU/GPU environment. | Quick start |
| Multi-Interest-Aware Sequential User Modeling (SUM)* | Collaborative Filtering | An enhanced memory network-based sequential user model that aims to capture users' multiple interests. It works in the CPU/GPU environment. | Quick start |
| Sequential Recommendation Via Personalized Transformer (SSEPT) | Collaborative Filtering | Transformer-based algorithm for sequential recommendation with user embedding. It works in the CPU/GPU environment. | Quick start |
| Standard VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | Deep dive |
| Surprise/Singular Value Decomposition (SVD) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in small datasets. It works in the CPU/GPU environment. | Deep dive |
| Term Frequency - Inverse Document Frequency (TF-IDF) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets. It works in the CPU environment. | Quick start |
| Vowpal Wabbit (VW)* | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features/context are constantly changing. It uses the CPU for online learning. | Deep dive |
| Wide and Deep | Collaborative Filtering | Deep learning algorithm that can memorize feature interactions and generalize user features. It works in the CPU/GPU environment. | Quick start |
| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Collaborative Filtering | Quick and memory-efficient algorithm to predict labels with user/item features. It works in the CPU/GPU environment. | Deep dive |

NOTE: * indicates algorithms invented/contributed by Microsoft.
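The ALS row above refers to the Spark implementation. As a rough, single-node illustration of the alternating idea only (not Spark MLlib's code; `als_explicit`, `rmse_observed` and their parameters are hypothetical), the NumPy sketch below fixes the item factors and solves a small ridge regression per user over that user's observed ratings, then does the same for items, and repeats.

```python
import numpy as np


def als_explicit(ratings, observed, rank=2, reg=0.01, n_iters=30, seed=0):
    """Factorize ratings ~= U @ V.T using only cells where `observed` is True.

    Hypothetical helper for illustration: alternates two regularized
    least-squares solves (one per user row, one per item row).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)

    for _ in range(n_iters):
        # Fix V and solve (V_u^T V_u + reg * I) u = V_u^T r_u for every user.
        for u in range(n_users):
            cols = observed[u]
            Vu = V[cols]
            U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ ratings[u, cols])
        # Fix U and solve the symmetric problem for every item.
        for i in range(n_items):
            rows = observed[:, i]
            Ui = U[rows]
            V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ ratings[rows, i])
    return U, V


def rmse_observed(ratings, observed, U, V):
    """Root mean squared error over the observed cells only."""
    err = (ratings - U @ V.T)[observed]
    return float(np.sqrt(np.mean(err ** 2)))
```

Because each per-user (or per-item) solve is independent once the other factor matrix is fixed, the solves can be spread across workers, which is what makes ALS a good fit for Spark.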

Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.

| Algorithm | Type | Description | Example |
|---|---|---|---|
| SARplus* | Collaborative Filtering | Optimized implementation of SAR for Spark | Quick start |

Algorithm Comparison

We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training and test sets at a 75/25 ratio using a stratified split. A recommendation model is trained using each of the collaborative filtering algorithms below, with empirical parameter values reported in the literature. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. The table below shows the results on MovieLens 100k, running the algorithms for 15 epochs.

| Algorithm | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R² | Explained Variance |
|---|---|---|---|---|---|---|---|---|
| ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
| BiVAE | 0.146126 | 0.475077 | 0.411771 | 0.219145 | N/A | N/A | N/A | N/A |
| BPR | 0.132478 | 0.441997 | 0.388229 | 0.212522 | N/A | N/A | N/A | N/A |
| FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 |
| LightGCN | 0.088526 | 0.419846 | 0.379626 | 0.144336 | N/A | N/A | N/A | N/A |
| NCF | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A |
| SAR | 0.110591 | 0.382461 | 0.330753 | 0.176385 | 1.253805 | 1.048484 | -0.569363 | 0.030474 |
| SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |
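The benchmark splits each user's ratings 75/25 and reports top-10 ranking metrics. As a rough plain-Python illustration of those two steps (not the package's own splitter or evaluation code; `stratified_split` and `precision_recall_at_k` are hypothetical names), the sketch below keeps about 75% of every user's items for training and averages precision@k and recall@k over users with at least one test item. Precision@k divides the hits by k, while recall@k divides them by the number of relevant items.

```python
import math
import random
from collections import defaultdict


def stratified_split(interactions, ratio=0.75, seed=42):
    """Split (user, item) pairs per user so each user keeps ~`ratio` of items in train."""
    items_by_user = defaultdict(list)
    for user, item in interactions:
        items_by_user[user].append(item)

    rng = random.Random(seed)
    train, test = [], []
    for user, items in items_by_user.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        cut = math.ceil(len(shuffled) * ratio)  # round up so small users keep training data
        train.extend((user, item) for item in shuffled[:cut])
        test.extend((user, item) for item in shuffled[cut:])
    return train, test


def precision_recall_at_k(recommended, relevant, k=10):
    """Mean precision@k and recall@k.

    recommended: dict user -> ranked list of item ids
    relevant:    dict user -> set of held-out (test) item ids
    """
    users = [u for u, items in relevant.items() if items]
    if not users:
        return 0.0, 0.0
    precision_sum = recall_sum = 0.0
    for user in users:
        top_k = recommended.get(user, [])[:k]
        hits = len(set(top_k) & relevant[user])
        precision_sum += hits / k
        recall_sum += hits / len(relevant[user])
    return precision_sum / len(users), recall_sum / len(users)
```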

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

This project adheres to Microsoft's Open Source Code of Conduct in order to foster a welcoming and inspiring community for all.

Build Status

These are the nightly builds, which run the asynchronous tests. main is our principal branch and staging is our development branch. We use pytest for testing the Python utilities in recommenders and the Recommenders notebook executor for the notebooks.

For more information about the testing pipelines, please see the test documentation.

AzureML Nightly Build Status

The nightly build tests are run daily on AzureML.

| Build Type | Branch | Status | Branch | Status |
|---|---|---|---|---|
| Linux CPU | main | azureml-cpu-nightly | staging | azureml-cpu-nightly |
| Linux GPU | main | azureml-gpu-nightly | staging | azureml-gpu-nightly |
| Linux Spark | main | azureml-spark-nightly | staging | azureml-spark-nightly |

References

  • D. Li, J. Lian, L. Zhang, K. Ren, D. Lu, T. Wu, X. Xie, "Recommender Systems: Frontiers and Practices", Springer, Beijing, 2024. Available on this link.
  • A. Argyriou, M. González-Fierro, and L. Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", WWW 2020: International World Wide Web Conference Taipei, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692
  • S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967
  • L. Zhang, T. Wu, X. Xie, A. Argyriou, M. González-Fierro and J. Lian, "Building Production-Ready Recommendation System at Scale", ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019), 2019.

recommenders's People

Contributors

aeroabir, almudenasanz, anargyri, angusrtaylor, bethz, dciborow, eisber, evgeniachroni, gramhagen, heatherbshapiro, jingyanwangms, jreynolds01, leavingseason, loomlike, maxkazmsft, miguelgfierro, motefly, nicolashug, nikhilrj, niwilso, pradnyeshjoshi, roalexan, simonyansenzhao, simonzhaoms, tqtg, wesszumino, wutaomsft, yanzhangads, yjw1029, yueguoguo


recommenders's Issues

Add environment installation to smoke and integration tests

As of now, the smoke and integration tests run on a pre-created environment.

We have to test the full pipeline. This means that, for each environment (Python, Spark and GPU), we create the conda file, install the environment, execute the tests and then remove the environment.

TODO:

  • Change names of builds in VSTS
  • Add different environments for unit tests
  • Create and delete the environment after smoke and integration tests
  • Modify README and link statuses
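The per-environment pipeline described in this issue (create, install, test, remove) can be sketched as a small Python driver. This is a hypothetical sketch: the environment names, the `.yaml` file names and the test paths are placeholders, it assumes the conda files have already been generated, and by default it only collects the commands (`dry_run=True`) instead of running them. The `conda env create`, `conda run` and `conda env remove` subcommands are standard conda CLI commands.

```python
import subprocess

# Placeholder environment names and conda files (assumptions, not the repo's real files).
ENVIRONMENTS = {
    "reco_python": "reco_python.yaml",
    "reco_pyspark": "reco_pyspark.yaml",
    "reco_gpu": "reco_gpu.yaml",
}
TEST_DIRS = ["tests/smoke", "tests/integration"]


def pipeline_commands(env_name, conda_file, test_dirs=TEST_DIRS):
    """Commands to build a fresh environment, run the tests in it, then delete it."""
    create = ["conda", "env", "create", "-n", env_name, "-f", conda_file]
    tests = [["conda", "run", "-n", env_name, "pytest", path] for path in test_dirs]
    remove = ["conda", "env", "remove", "-n", env_name]
    return [create, *tests, remove]


def run_pipeline(environments=ENVIRONMENTS, dry_run=True):
    """Run (or, with dry_run, only collect) each pipeline; always attempt the cleanup."""
    executed = []
    for env_name, conda_file in environments.items():
        *steps, remove = pipeline_commands(env_name, conda_file)
        try:
            for cmd in steps:
                executed.append(cmd)
                if not dry_run:
                    subprocess.run(cmd, check=True)  # stop this environment on failure
        finally:
            executed.append(remove)
            if not dry_run:
                subprocess.run(remove, check=False)  # best-effort removal
    return executed
```

Putting the removal in a `finally` block means a failing test run still tears its environment down, so the next nightly run starts from a clean state.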

Add test data sets

Tests on certain modules, such as evaluation and split, should be performed on a sufficiently large dataset (e.g., Netflix or MovieLens-1M).

Define and create master PR strategy

  • Consider what additional smoke tests / other tests we need in place
  • Timers on integration tests
  • Consider tests on larger datasets
  • Test bench on cosmosdb - consider the scaling of each algorithm

Clean up notebooks in git

  • Find a way to reduce git changes caused by changes in the kernel and in the cell run history.
  • Make sure notebooks are following correct templates, review templates

Consider using nbdiff
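One way to reduce notebook noise in git is to strip execution state before committing. The sketch below uses only the standard library and the nbformat JSON layout, where code cells carry `outputs` and `execution_count`; the function names `strip_notebook` and `strip_notebook_file` are hypothetical. Dropping `kernelspec` means Jupyter will ask for a kernel when the notebook is opened, so whether to keep that step is a team decision.

```python
import json


def strip_notebook(nb):
    """Return a copy of a notebook dict with outputs and run counters cleared."""
    nb = json.loads(json.dumps(nb))  # cheap deep copy for JSON-like data
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Drop kernel details that change whenever someone switches kernels.
    nb.get("metadata", {}).pop("kernelspec", None)
    return nb


def strip_notebook_file(path):
    """Rewrite an .ipynb file in place with outputs and counters removed."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```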

DSVM PySpark version and library mismatch

As the DSVM was upgraded to Spark 2.3, the supporting libraries and environment don't work with Spark 2.2, which is required for Airship (note Recommenders). The quick fix is to upgrade the virtual environment to Spark 2.3. Going forward, we should decide whether to keep upgrading Spark versions as the DSVM upgrades them, or to anchor to a specific Spark version with standalone DSVM libs.

Rename "utilities" folder to "reco_utils"

As per Tao, it would be less confusing to use a name like rec_utils instead of utilities.

from utilities.recommender.sar.sar_singlenode import SARSingleNodeReference
from utilities.dataset.url_utils import maybe_download
from utilities.dataset.python_splitters import python_random_split
from utilities.evaluation.python_evaluation import PythonRatingEvaluation, PythonRankingEvaluation

Review docstrings in reco_utils

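Some files don't have complete docstrings. For example, the Args section below documents only rating_true and rating_pred and omits the column parameters: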

class SparkRatingEvaluation:
    """Spark Rating Evaluator"""

    def __init__(
        self,
        rating_true,
        rating_pred,
        col_user=DEFAULT_USER_COL,
        col_item=DEFAULT_ITEM_COL,
        col_rating=DEFAULT_RATING_COL,
        col_prediction=PREDICTION_COL,
    ):
        """Initializer.
        Args:
            rating_true (spark.DataFrame): True labels.
            rating_pred (spark.DataFrame): Predicted labels.
        """
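A lightweight check can flag parameters that are missing from a docstring's Args section. The sketch below is hypothetical (`undocumented_params` and `SparkRatingEvaluationStub` are not part of the repo) and assumes the `name (type): description` convention used in the snippet above; the stub reproduces that snippet's signature with placeholder default column names.

```python
import inspect


def undocumented_params(func):
    """Return parameter names of `func` that have no `name (type):` entry in its docstring."""
    doc = inspect.getdoc(func) or ""
    params = [n for n in inspect.signature(func).parameters if n not in ("self", "cls")]
    return [name for name in params if f"{name} (" not in doc]


class SparkRatingEvaluationStub:
    """Stand-in mirroring the signature above; default column names are placeholders."""

    def __init__(
        self,
        rating_true,
        rating_pred,
        col_user="userID",
        col_item="itemID",
        col_rating="rating",
        col_prediction="prediction",
    ):
        """Initializer.

        Args:
            rating_true (spark.DataFrame): True labels.
            rating_pred (spark.DataFrame): Predicted labels.
        """
        self.rating_true = rating_true
        self.rating_pred = rating_pred
```

Running `undocumented_params` over every public function in a module (for example from a unit test) would turn this review into an automated check.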

Recalculate / update SAR user-item affinity matrix or item-item similarity matrix

One suggestion for the SAR implementation: SAR currently seems to compute both the user-item affinity matrix and the item-item similarity matrix in the same function, fit(). It would be good to compute them separately in case we want to recalculate (update) only one of the matrices. Even better would be an update function for individual user or item records that recalculates only the matrix cells related to that user or item.
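To illustrate the proposed split, here is a hypothetical NumPy sketch (not the repo's SAR code) in which the affinity matrix, the item-item similarity matrix and the final scores come from separate functions, plus an incremental update that touches a single affinity cell. It uses dense matrices and binarized co-occurrence for readability; the actual implementation also handles time decay, thresholds and sparse storage.

```python
import numpy as np


def compute_user_affinity(users, items, weights, n_users, n_items):
    """User-item affinity: sum of (possibly time-decayed) weights per (user, item)."""
    affinity = np.zeros((n_users, n_items))
    np.add.at(affinity, (users, items), weights)  # accumulates repeated pairs
    return affinity


def compute_item_similarity(affinity, similarity_type="jaccard"):
    """Item-item similarity from binarized co-occurrence counts."""
    seen = (affinity > 0).astype(float)
    cooccurrence = seen.T @ seen    # [i, j] = users who interacted with both i and j
    counts = np.diag(cooccurrence)  # [i] = users who interacted with i
    if similarity_type == "cooccurrence":
        return cooccurrence
    if similarity_type == "jaccard":
        denom = counts[:, None] + counts[None, :] - cooccurrence
    elif similarity_type == "lift":
        denom = counts[:, None] * counts[None, :]
    else:
        raise ValueError(f"unknown similarity_type: {similarity_type}")
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, cooccurrence / denom, 0.0)


def update_user_affinity(affinity, user, item, weight):
    """Add one new interaction in place; only a single cell changes."""
    affinity[user, item] += weight
    return affinity


def score_items(affinity, similarity):
    """SAR-style scores: each user's affinity row times the item similarity matrix."""
    return affinity @ similarity
```

With this split, a new interaction only requires `update_user_affinity` and a re-score for that user, while the more expensive similarity matrix can be refreshed on its own schedule.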

SAR unit test configs

The SAR unit test configs for the single-node and the PySpark unit tests are the same, but right now they are duplicated in each test file. They should be defined once in conftest.py.
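pytest makes fixtures defined in a `conftest.py` available to every test module in that directory and its subdirectories, without any import. A minimal sketch, reusing the `sar_settings` and `header` values that appear in the failure log later on this page; the file location `tests/unit/conftest.py` and the constant names `SAR_SETTINGS` and `HEADER` are assumptions.

```python
# tests/unit/conftest.py (sketch)
import pytest

# Values copied from the sar_settings / header fixtures shown in the test log.
SAR_SETTINGS = {
    "ATOL": 1e-8,
    "FILE_DIR": "http://recodatasets.blob.core.windows.net/sarunittest/",
    "TEST_USER_ID": "0003000098E85347",
}
HEADER = {
    "col_user": "UserId",
    "col_item": "MovieId",
    "col_rating": "Rating",
    "col_timestamp": "Timestamp",
}


@pytest.fixture
def sar_settings():
    # Function-scoped (the default): every test gets a fresh copy it may mutate.
    return dict(SAR_SETTINGS)


@pytest.fixture
def header():
    return dict(HEADER)
```

Both test_sar_singlenode.py and the PySpark test file can then request `sar_settings` and `header` as test arguments.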

Review problem with Jupyter kernel in the unit tests

When creating a new CI/CD system, we had to register a Jupyter kernel:

python -m ipykernel install --user --name py36 --display-name "Python (py36)"

python -m ipykernel install --user --name recommender --display-name "Python (recommender)"

We need to review this issue.

SAR has no predict method

The predict method should compute the SAR score for each given (user, item) pair and write it into a column called prediction. This is needed in order to use existing MLlib libraries.
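A minimal sketch of such a method as a standalone function, assuming dense NumPy matrices and user/item ids that have already been mapped to integer indices (the test log shows the model keeps `user_map_dict` and `item_map_dict` for that mapping). The name `sar_predict` and its arguments are hypothetical; the score of a pair is the user's affinity row multiplied by the item's similarity column.

```python
import numpy as np
import pandas as pd


def sar_predict(affinity, similarity, pairs, col_user="UserId", col_item="MovieId"):
    """Return a copy of `pairs` with a `prediction` column holding SAR scores.

    Assumes user and item ids in `pairs` are already integer indices into the
    rows of `affinity` and the columns of `similarity`.
    """
    users = pairs[col_user].to_numpy()
    items = pairs[col_item].to_numpy()
    # score(u, i) = sum_j affinity[u, j] * similarity[j, i], one value per row
    scores = (affinity[users] * similarity[:, items].T).sum(axis=1)
    result = pairs.copy()
    result["prediction"] = scores
    return result
```

Scoring only the requested pairs avoids materializing the full user-by-item score matrix, which matters when a rating evaluator needs predictions for a test set rather than top-k lists.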

BUG: Python tests don't pass if we use only the Python environment

When using the reco_bare environment and running pytest -m "not notebooks and not spark" tests/unit/ on a Mac, I get:

(reco_bare) MININT-JFKQCE5:Recommenders miguel$ pytest -m "not notebooks and not spark" tests/unit/
================== test session starts ===================
platform darwin -- Python 3.6.0, pytest-3.6.4, py-1.6.0, pluggy-0.7.1
rootdir: /Users/miguel/MS/code/Recommenders, inifile:
plugins: pylint-0.11.0, datafiles-2.0, cov-2.6.0
collected 39 items / 3 errors / 3 deselected             

========================= ERRORS =========================
____ ERROR collecting tests/unit/test_sar_pyspark.py _____
ImportError while importing test module '/Users/miguel/MS/code/Recommenders/tests/unit/test_sar_pyspark.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/unit/test_sar_pyspark.py:7: in <module>
    from reco_utils.recommender.sar.sar_pyspark import SARpySparkReference
reco_utils/recommender/sar/sar_pyspark.py:13: in <module>
    import pyspark.sql.functions as F
E   ModuleNotFoundError: No module named 'pyspark'
__ ERROR collecting tests/unit/test_spark_evaluation.py __
ImportError while importing test module '/Users/miguel/MS/code/Recommenders/tests/unit/test_spark_evaluation.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/unit/test_spark_evaluation.py:7: in <module>
    from reco_utils.evaluation.spark_evaluation import (
reco_utils/evaluation/spark_evaluation.py:5: in <module>
    from pyspark.mllib.evaluation import RegressionMetrics, RankingMetrics
E   ModuleNotFoundError: No module named 'pyspark'
___ ERROR collecting tests/unit/test_spark_splitter.py ___
ImportError while importing test module '/Users/miguel/MS/code/Recommenders/tests/unit/test_spark_splitter.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/unit/test_spark_splitter.py:10: in <module>
    from reco_utils.dataset.spark_splitters import spark_chrono_split, spark_random_split
reco_utils/dataset/spark_splitters.py:6: in <module>
    from pyspark.sql import Window
E   ModuleNotFoundError: No module named 'pyspark'
!!!!!!!! Interrupted: 3 errors during collection !!!!!!!!!
========= 3 deselected, 3 error in 2.02 seconds ==========
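These errors happen while pytest imports the test modules, before the `-m "not notebooks and not spark"` filter deselects anything, so the module-level PySpark imports fail first. One common remedy, sketched here as an assumption about how the test modules could be changed, is to check for PySpark with `importlib.util.find_spec` and mark the whole module as skipped when it is missing; `pytest.importorskip("pyspark")` is a shorter alternative.

```python
# Top of tests/unit/test_spark_splitter.py (sketch)
import importlib.util

import pytest

HAS_PYSPARK = importlib.util.find_spec("pyspark") is not None

# Module-level mark: every test in this file is skipped when PySpark is absent.
pytestmark = pytest.mark.skipif(not HAS_PYSPARK, reason="pyspark is not installed")

if HAS_PYSPARK:
    # Imported only when available, so collection no longer raises
    # ModuleNotFoundError. Any reco_utils module that imports pyspark at load
    # time (e.g. the spark splitters) would need to move under this guard too.
    from pyspark.sql import Window  # noqa: F401
```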

Debug issue with pysar test and tolerance

With a tolerance of 1e-8, the tests pass on some machines but not on others.
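`np.allclose(a, b, atol=...)` passes when `|a - b| <= atol + rtol * |b|` holds elementwise, with `rtol` defaulting to 1e-5. Plugging in the scores from the test_userpred[3-cooccurrence-count] failure in the log below shows relative gaps just under 1e-3, far larger than float64 rounding, which suggests a systematic difference between machines rather than a precision issue that a different ATOL would fix. The check is a sketch; `max_relative_gap` is a hypothetical helper.

```python
import numpy as np

# Expected vs. computed top-10 scores, copied from the cooccurrence failure in the log.
true_scores = np.array([40.96870941, 40.37760085, 19.55002941, 18.10756063, 13.24775154,
                        12.67358812, 12.49898911, 12.0359004, 10.91842008, 10.91185623])
test_scores = np.array([41.00239015, 40.41649126, 19.5650067, 18.12114858, 13.26051135,
                        12.6742369, 12.50043289, 12.047493, 10.92893636, 10.92236618])


def max_relative_gap(expected, actual):
    """Largest |expected - actual| / |actual| over all elements."""
    return float(np.max(np.abs(expected - actual) / np.abs(actual)))


gap = max_relative_gap(true_scores, test_scores)
strict = np.allclose(true_scores, test_scores, atol=1e-8)            # the test's setting
loose = np.allclose(true_scores, test_scores, rtol=1e-3, atol=1e-8)  # would mask the gap
print(f"max relative gap = {gap:.2e}, strict={strict}, loose={loose}")
```

Loosening `rtol` to 1e-3 would make this test pass, but it would also hide the underlying discrepancy, so it is worth finding its source before changing the tolerance.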

We got this error on Prometheus when testing test_sar_singlenode.py:

(py36) miguel@prometheus:~/repos/Recommenders$ pytest tests/unit/test_sar_singlenode.py 
=================================================================================== test session starts ====================================================================================
platform linux -- Python 3.6.5, pytest-3.6.4, py-1.7.0, pluggy-0.7.1
rootdir: /home/miguel/repos/Recommenders, inifile:
collected 15 items                                                                                                                                                                         

tests/unit/test_sar_singlenode.py ...........FFFF                                                                                                                                    [100%]

========================================================================================= FAILURES =========================================================================================
____________________________________________________________________________________ test_user_affinity ____________________________________________________________________________________

demo_usage_data =                  UserId    MovieId     Timestamp  Rating  exponential  rating_exponential
0      0003000098E85347  DQF...076
11837  00030000822E3BAE  DAF-00448  1.416292e+09       1     0.009076            0.009076

[11838 rows x 6 columns]
sar_settings = {'ATOL': 1e-08, 'FILE_DIR': 'http://recodatasets.blob.core.windows.net/sarunittest/', 'TEST_USER_ID': '0003000098E85347'}
header = {'col_item': 'MovieId', 'col_rating': 'Rating', 'col_timestamp': 'Timestamp', 'col_user': 'UserId'}

    def test_user_affinity(demo_usage_data, sar_settings, header):
        time_now = demo_usage_data[header["col_timestamp"]].max()
        model = SARSingleNodeReference(
            remove_seen=True,
            similarity_type="cooccurrence",
            timedecay_formula=True,
            time_decay_coefficient=30,
            time_now=time_now,
            **header
        )
        _apply_sar_hash_index(model, demo_usage_data, None, header)
        model.fit(demo_usage_data)
    
        true_user_affinity, items = load_affinity(sar_settings["FILE_DIR"] + "user_aff.csv")
        user_index = model.user_map_dict[sar_settings["TEST_USER_ID"]]
        test_user_affinity = np.reshape(
            np.array(
                _rearrange_to_test(
                    model.user_affinity, None, items, None, model.item_map_dict
                )[user_index,].todense()
            ),
            -1,
        )
>       assert np.allclose(
            true_user_affinity.astype(test_user_affinity.dtype),
            test_user_affinity,
            atol=sar_settings["ATOL"],
        )
E       AssertionError: assert False
E        +  where False = <function allclose at 0x7f6110e1d730>(array([0.        , 0.        , 0.        , 0.        , 0.        ,\n       0.        , 0.        , 0.        , 0.      ...       , 0.        , 0.        ,\n       0.        , 0.        , 0.15181286, 1.        , 0.        ,\n       0.        ]), array([0.        , 0.        , 0.        , 0.        , 0.        ,\n       0.        , 0.        , 0.        , 0.      ...       , 0.        , 0.        ,\n       0.        , 0.        , 0.15195908, 1.        , 0.        ,\n       0.        ]), atol=1e-08)
E        +    where <function allclose at 0x7f6110e1d730> = np.allclose
E        +    and   array([0.        , 0.        , 0.        , 0.        , 0.        ,\n       0.        , 0.        , 0.        , 0.      ...       , 0.        , 0.        ,\n       0.        , 0.        , 0.15181286, 1.        , 0.        ,\n       0.        ]) = <built-in method astype of numpy.ndarray object at 0x7f60fc6adee0>(dtype('float64'))
E        +      where <built-in method astype of numpy.ndarray object at 0x7f60fc6adee0> = array(['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',\n       '0', '0.0221122254449968', '0', '0', '0..., '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',\n       '0', '0.151812861826336', '1', '0', '0'], dtype='<U18').astype
E        +      and   dtype('float64') = array([0.        , 0.        , 0.        , 0.        , 0.        ,\n       0.        , 0.        , 0.        , 0.      ...       , 0.        , 0.        ,\n       0.        , 0.        , 0.15195908, 1.        , 0.        ,\n       0.        ]).dtype

tests/unit/test_sar_singlenode.py:201: AssertionError
___________________________________________________________________________ test_userpred[3-cooccurrence-count] ____________________________________________________________________________

threshold = 3, similarity_type = 'cooccurrence', file = 'count', header = {'col_item': 'MovieId', 'col_rating': 'Rating', 'col_timestamp': 'Timestamp', 'col_user': 'UserId'}
sar_settings = {'ATOL': 1e-08, 'FILE_DIR': 'http://recodatasets.blob.core.windows.net/sarunittest/', 'TEST_USER_ID': '0003000098E85347'}
demo_usage_data =                  UserId    MovieId     Timestamp  Rating  exponential  rating_exponential
0      0003000098E85347  DQF...076
11837  00030000822E3BAE  DAF-00448  1.416292e+09       1     0.009076            0.009076

[11838 rows x 6 columns]

    @pytest.mark.parametrize(
        "threshold,similarity_type,file",
        [(3, "cooccurrence", "count"), (3, "jaccard", "jac"), (3, "lift", "lift")],
    )
    def test_userpred(
        threshold, similarity_type, file, header, sar_settings, demo_usage_data
    ):
        time_now = demo_usage_data[header["col_timestamp"]].max()
        model = SARSingleNodeReference(
            remove_seen=True,
            similarity_type=similarity_type,
            timedecay_formula=True,
            time_decay_coefficient=30,
            time_now=time_now,
            threshold=threshold,
            **header
        )
        _apply_sar_hash_index(model, demo_usage_data, None, header)
        model.fit(demo_usage_data)
    
        true_items, true_scores = load_userpred(
            sar_settings["FILE_DIR"]
            + "userpred_"
            + file
            + str(threshold)
            + "_userid_only.csv"
        )
        test_results = model.recommend_k_items(
            demo_usage_data[
                demo_usage_data[header["col_user"]] == sar_settings["TEST_USER_ID"]
            ],
            top_k=10,
        )
        test_items = list(test_results[header["col_item"]])
        test_scores = np.array(test_results["prediction"])
        assert true_items == test_items
>       assert np.allclose(true_scores, test_scores, atol=sar_settings["ATOL"])
E       assert False
E        +  where False = <function allclose at 0x7f6110e1d730>(array([40.96870941, 40.37760085, 19.55002941, 18.10756063, 13.24775154,\n       12.67358812, 12.49898911, 12.0359004 , 10.91842008, 10.91185623]), array([41.00239015, 40.41649126, 19.5650067 , 18.12114858, 13.26051135,\n       12.6742369 , 12.50043289, 12.047493  , 10.92893636, 10.92236618]), atol=1e-08)
E        +    where <function allclose at 0x7f6110e1d730> = np.allclose

tests/unit/test_sar_singlenode.py:245: AssertionError
_______________________________________________________________________________ test_userpred[3-jaccard-jac] _______________________________________________________________________________

threshold = 3, similarity_type = 'jaccard', file = 'jac', header = {'col_item': 'MovieId', 'col_rating': 'Rating', 'col_timestamp': 'Timestamp', 'col_user': 'UserId'}
sar_settings = {'ATOL': 1e-08, 'FILE_DIR': 'http://recodatasets.blob.core.windows.net/sarunittest/', 'TEST_USER_ID': '0003000098E85347'}
demo_usage_data =                  UserId    MovieId     Timestamp  Rating  exponential  rating_exponential
0      0003000098E85347  DQF...076
11837  00030000822E3BAE  DAF-00448  1.416292e+09       1     0.009076            0.009076

[11838 rows x 6 columns]

    @pytest.mark.parametrize(
        "threshold,similarity_type,file",
        [(3, "cooccurrence", "count"), (3, "jaccard", "jac"), (3, "lift", "lift")],
    )
    def test_userpred(
        threshold, similarity_type, file, header, sar_settings, demo_usage_data
    ):
        time_now = demo_usage_data[header["col_timestamp"]].max()
        model = SARSingleNodeReference(
            remove_seen=True,
            similarity_type=similarity_type,
            timedecay_formula=True,
            time_decay_coefficient=30,
            time_now=time_now,
            threshold=threshold,
            **header
        )
        _apply_sar_hash_index(model, demo_usage_data, None, header)
        model.fit(demo_usage_data)
    
        true_items, true_scores = load_userpred(
            sar_settings["FILE_DIR"]
            + "userpred_"
            + file
            + str(threshold)
            + "_userid_only.csv"
        )
        test_results = model.recommend_k_items(
            demo_usage_data[
                demo_usage_data[header["col_user"]] == sar_settings["TEST_USER_ID"]
            ],
            top_k=10,
        )
        test_items = list(test_results[header["col_item"]])
        test_scores = np.array(test_results["prediction"])
        assert true_items == test_items
>       assert np.allclose(true_scores, test_scores, atol=sar_settings["ATOL"])
E       assert False
E        +  where False = <function allclose at 0x7f6110e1d730>(array([0.0616357 , 0.04918001, 0.04247487, 0.04009872, 0.03847229,\n       0.03839772, 0.03251167, 0.02474822, 0.02432458, 0.0224889 ]), array([0.06163639, 0.04921205, 0.04247624, 0.04011545, 0.03848885,\n       0.03843471, 0.0325135 , 0.02477206, 0.02432508, 0.02249099]), atol=1e-08)
E        +    where <function allclose at 0x7f6110e1d730> = np.allclose

tests/unit/test_sar_singlenode.py:245: AssertionError
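The test asserts on the bare boolean returned by `np.allclose`, so pytest can only report `assert False`. One option for diagnosing such failures is `np.testing.assert_allclose`. It raises an `AssertionError` whose message reports the number of mismatched elements and the maximum absolute and relative differences. The sketch below applies it to the jaccard scores printed above. The default `rtol` of `assert_allclose` is `1e-07`, so `rtol=1e-05` is passed explicitly to approximate the tolerances of the `np.allclose` call in the test. This is an illustration, not a change the test itself makes.

```python
import numpy as np

# Scores from the test_userpred[3-jaccard-jac] failure above (rounded as printed).
true_scores = np.array([0.0616357, 0.04918001, 0.04247487, 0.04009872, 0.03847229,
                        0.03839772, 0.03251167, 0.02474822, 0.02432458, 0.0224889])
test_scores = np.array([0.06163639, 0.04921205, 0.04247624, 0.04011545, 0.03848885,
                        0.03843471, 0.0325135, 0.02477206, 0.02432508, 0.02249099])

try:
    # Same atol as sar_settings["ATOL"]; rtol set to np.allclose's default
    # because assert_allclose would otherwise use rtol=1e-07.
    np.testing.assert_allclose(test_scores, true_scores, rtol=1e-05, atol=1e-08)
    report = "scores match"
except AssertionError as err:
    # The message lists mismatched elements and max absolute/relative differences.
    report = str(err)

print(report)
```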
________________________________________________________________________________ test_userpred[3-lift-lift] ________________________________________________________________________________

threshold = 3, similarity_type = 'lift', file = 'lift', header = {'col_item': 'MovieId', 'col_rating': 'Rating', 'col_timestamp': 'Timestamp', 'col_user': 'UserId'}
sar_settings = {'ATOL': 1e-08, 'FILE_DIR': 'http://recodatasets.blob.core.windows.net/sarunittest/', 'TEST_USER_ID': '0003000098E85347'}
demo_usage_data =                  UserId    MovieId     Timestamp  Rating  exponential  rating_exponential
0      0003000098E85347  DQF...076
11837  00030000822E3BAE  DAF-00448  1.416292e+09       1     0.009076            0.009076

[11838 rows x 6 columns]

    @pytest.mark.parametrize(
        "threshold,similarity_type,file",
        [(3, "cooccurrence", "count"), (3, "jaccard", "jac"), (3, "lift", "lift")],
    )
    def test_userpred(
        threshold, similarity_type, file, header, sar_settings, demo_usage_data
    ):
        time_now = demo_usage_data[header["col_timestamp"]].max()
        model = SARSingleNodeReference(
            remove_seen=True,
            similarity_type=similarity_type,
            timedecay_formula=True,
            time_decay_coefficient=30,
            time_now=time_now,
            threshold=threshold,
            **header
        )
        _apply_sar_hash_index(model, demo_usage_data, None, header)
        model.fit(demo_usage_data)
    
        true_items, true_scores = load_userpred(
            sar_settings["FILE_DIR"]
            + "userpred_"
            + file
            + str(threshold)
            + "_userid_only.csv"
        )
        test_results = model.recommend_k_items(
            demo_usage_data[
                demo_usage_data[header["col_user"]] == sar_settings["TEST_USER_ID"]
            ],
            top_k=10,
        )
        test_items = list(test_results[header["col_item"]])
        test_scores = np.array(test_results["prediction"])
        assert true_items == test_items
>       assert np.allclose(true_scores, test_scores, atol=sar_settings["ATOL"])
E       assert False
E        +  where False = <function allclose at 0x7f6110e1d730>(array([0.00134902, 0.00084695, 0.00072497, 0.00072133, 0.00066855,\n       0.0006003 , 0.00045299, 0.00045202, 0.00041803, 0.00034772]), array([0.00134902, 0.00084696, 0.00072513, 0.00072134, 0.00066871,\n       0.00060031, 0.00045312, 0.00045204, 0.00041804, 0.00034806]), atol=1e-08)
E        +    where <function allclose at 0x7f6110e1d730> = np.allclose

tests/unit/test_sar_singlenode.py:245: AssertionError
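In all three parametrizations the preceding `true_items == test_items` assertion passes, and only the score comparison fails. The score scales differ by orders of magnitude: tens for cooccurrence, hundredths for jaccard, and around `1e-3` to `1e-4` for lift. Relative differences are therefore easier to compare than absolute ones. The sketch below computes the largest relative difference `|true - test| / |true|` per similarity type from the printed arrays. The first set is assumed to be the cooccurrence case because of the parametrization order. In each case the maximum comes out at roughly `1e-3`, well above the roughly `1e-5` relative tolerance implied by the default `rtol`. The inputs are rounded log values, so treat the figures as approximate.

```python
import numpy as np

# (true_scores, test_scores) per similarity type, copied from the three
# np.allclose failures in the log (rounded to 8 decimals as NumPy printed them).
# The "cooccurrence" label for the first pair is inferred from parametrization order.
cases = {
    "cooccurrence": (
        [40.96870941, 40.37760085, 19.55002941, 18.10756063, 13.24775154,
         12.67358812, 12.49898911, 12.0359004, 10.91842008, 10.91185623],
        [41.00239015, 40.41649126, 19.5650067, 18.12114858, 13.26051135,
         12.6742369, 12.50043289, 12.047493, 10.92893636, 10.92236618],
    ),
    "jaccard": (
        [0.0616357, 0.04918001, 0.04247487, 0.04009872, 0.03847229,
         0.03839772, 0.03251167, 0.02474822, 0.02432458, 0.0224889],
        [0.06163639, 0.04921205, 0.04247624, 0.04011545, 0.03848885,
         0.03843471, 0.0325135, 0.02477206, 0.02432508, 0.02249099],
    ),
    "lift": (
        [0.00134902, 0.00084695, 0.00072497, 0.00072133, 0.00066855,
         0.0006003, 0.00045299, 0.00045202, 0.00041803, 0.00034772],
        [0.00134902, 0.00084696, 0.00072513, 0.00072134, 0.00066871,
         0.00060031, 0.00045312, 0.00045204, 0.00041804, 0.00034806],
    ),
}

max_rel_diff = {}
for name, (true_vals, test_vals) in cases.items():
    true_arr, test_arr = np.array(true_vals), np.array(test_vals)
    # Relative difference with respect to the stored reference scores.
    rel = np.abs(true_arr - test_arr) / np.abs(true_arr)
    max_rel_diff[name] = rel.max()
    print(f"{name:>12}: max relative difference = {rel.max():.2e}")
```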
