crm-ltr's Introduction

Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products

This repository contains the code and the Commercial Dataset needed to reproduce the results presented in the paper "Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products".

Note: Updates to the repository coming soon.

Commercial Dataset: E-commerce dataset for LTR

The dataset and its description can be found here.

Neural network architecture of S-CNN

We selected a simple yet powerful CNN model proposed by Severyn et al. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.723.6492&rep=rep1&type=pdf] for the empirical evaluation of our CRM approach. We refer to this model as S-CNN in the paper. The figure below, taken from the paper of Severyn et al., depicts the architecture of the neural network. The Keras implementation was adapted from https://github.com/gvishal/rank_text_cnn.

Deep learning architecture for reranking short text pairs

Evaluation

  • For evaluation we use trec_eval, the standard tool used by the TREC community for evaluating ad-hoc retrieval tasks. The latest version of this tool can be found here.
  • evaluation_metrics.py: contains the function get_trec_eval_metrics(), which returns evaluation results computed by trec_eval.
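As an illustration of how a wrapper around trec_eval can work, here is a minimal sketch. It is not the code in evaluation_metrics.py: the function names `run_trec_eval` and `parse_trec_eval_output` and the default binary path are assumptions made for this example. It relies on trec_eval's usual invocation (`trec_eval qrels_file results_file`) and on its summary output of three whitespace-separated columns: measure name, query id (`all` for the aggregate), and value.

```python
import subprocess


def parse_trec_eval_output(text):
    """Parse trec_eval summary text into a {measure: value} dict.

    Only aggregate rows (query id "all") with a numeric value are kept;
    rows such as "runid all STANDARD" are skipped.
    """
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        name, query_id, value = parts
        if query_id != "all":
            continue  # per-query rows (printed with -q) are ignored here
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # non-numeric value, e.g. the run id
    return metrics


def run_trec_eval(qrels_path, run_path, trec_eval_bin="trec_eval"):
    """Run trec_eval on a qrels file and a run file, return parsed metrics.

    Hypothetical helper: trec_eval_bin must point to a compiled trec_eval
    binary. Uses subprocess.run(capture_output=...), so Python 3.7+.
    """
    result = subprocess.run(
        [trec_eval_bin, qrels_path, run_path],
        capture_output=True,
        text=True,
        check=True,  # raise if trec_eval exits with an error
    )
    return parse_trec_eval_output(result.stdout)
```

A call such as `run_trec_eval("qrels.txt", "run.txt")` (file names are placeholders) would then return a dictionary keyed by trec_eval measure names such as `map` or `P_10`.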
Reproducing the results

    First, install Jupyter Notebook using the following command (for details, click here):
     pip3 install jupyter 

    If you are not familiar with running notebooks, click here.

    Download the trec_eval tool and the Mercateo Dataset.

    • CRM_Training
      • crm_training_clicks.ipynb: Run this Jupyter notebook to train the CRM model from AtB click logs [reproduces the results of Table 2 of the paper].
      • crm_training_orders.ipynb: Run this Jupyter notebook to train the CRM model from order logs [reproduces the results of Table 3 of the paper].
      • crm_model.py: Keras implementation of the CNN for short text pairs with a counterfactual risk minimization (CRM) loss function.
    • Cross_Entropy_Training
      • cross_entropy_training_clicks.ipynb: Run this Jupyter notebook to train the CNN model with cross-entropy loss [reproduces the results of Table 2 of the paper].
      • cross_entropy_training_orders.ipynb: Run this Jupyter notebook to train the CNN model with cross-entropy loss [reproduces the results of Table 3 of the paper].
      • model_cross_entropy.py: Keras implementation of the CNN for short text pairs with cross-entropy loss.
    • LambdaMART Training
      • Download the binary file of the RankLib tool from here.
      • We used the latest binary, 'RankLib-2.1-patched.jar', for our experiments.
      • Train the LambdaMART model for graded order labels [Table 3] by running this command:
         java -jar RankLib-2.1-patched.jar \
           -train LambdaMART_files/New_Graded_Order_TrainFile.csv \
           -test LambdaMART_files/New_Graded_Order_TestFile.csv \
           -validate LambdaMART_files/New_Graded_Order_DevFile.csv \
           -ranker 6 -metric2t NDCG@10 -metric2T NDCG@10 \
           -save Model_LMART_Graded_Orders.txt
      • To evaluate the saved model on the other metrics {NDCG@5, P@5, P@10, RR, MAP}, run this command, setting -metric2T to the metric of interest:
         java -jar RankLib-2.1-patched.jar -load Model_LMART_Graded_Orders.txt \
           -test LambdaMART_files/New_Graded_Order_DevFile.csv -metric2T NDCG@5
    • Effect of DNN architecture
      • To reproduce the results in Table 4 of the paper, refer to MatchZoo.
      • MatchZoo has comprehensive documentation on the dependencies and on how to run the models.
      • For a fair comparison with the S-CNN model, we modified the models in MatchZoo by adding a fully connected layer before the last layer. This layer lets the models use the dense features.
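The CRM loss referenced under CRM_Training is not spelled out in this README, so the following is only a generic sketch of the idea behind counterfactual risk minimization, not the loss implemented in crm_model.py. It shows the standard inverse propensity scoring (IPS) estimator and its self-normalized variant (SNIPS) from the CRM literature. All names (`importance_weights`, `ips_risk`, `snips_risk`) and the toy numbers are assumptions for illustration; refer to the paper for the exact objective used in the experiments.

```python
def importance_weights(new_probs, logging_probs, clip=None):
    """Ratio of new-policy to logging-policy probabilities per logged action.

    clip (optional) caps each weight, a common variance-reduction heuristic.
    """
    weights = [p_new / p_log for p_new, p_log in zip(new_probs, logging_probs)]
    if clip is not None:
        weights = [min(w, clip) for w in weights]
    return weights


def ips_risk(losses, new_probs, logging_probs, clip=None):
    """IPS estimate of the new policy's risk: mean of loss * weight."""
    weights = importance_weights(new_probs, logging_probs, clip)
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


def snips_risk(losses, new_probs, logging_probs):
    """Self-normalized IPS: weighted loss divided by the sum of weights."""
    weights = importance_weights(new_probs, logging_probs)
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


# Toy logged data (made up): loss observed for each logged action, the
# probability the new policy assigns to that action, and the probability
# the logging policy assigned to it when the data was collected.
losses = [1.0, 0.0, 1.0, 0.0]
new_probs = [0.2, 0.8, 0.1, 0.9]
logging_probs = [0.5, 0.4, 0.25, 0.9]

ips = ips_risk(losses, new_probs, logging_probs)
snips = snips_risk(losses, new_probs, logging_probs)
```

In a neural model, the new-policy probabilities would come from the network's output, and an estimate like this would be minimized with respect to the network weights instead of a cross-entropy loss on the logged labels.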

    Dependencies

    • Python 2.7 or higher
    • numpy
    • keras
    • trec_eval

    For more information about the dataset and the model, please refer to the paper. For any questions or bugs, you can report an issue here.
