
LLMRank

LLMRank aims to investigate the capacity of LLMs to act as ranking models for recommender systems. [paper]

Yupeng Hou†, Junjie Zhang†, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian McAuley, Wayne Xin Zhao. Large Language Models are Zero-Shot Rankers for Recommender Systems. ECIR 2024.

🛍️ LLMs as Zero-Shot Rankers

We use LLMs as ranking models in an instruction-following paradigm. For each user, we first construct two natural language patterns that contain sequential interaction histories and retrieved candidate items, respectively. Then these patterns are filled into a natural language template as the final instruction. In this way, LLMs are expected to understand the instructions and output the ranking results as the instruction suggests.
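The two-pattern construction above can be sketched as follows. This is a minimal illustration, not the repo's exact template: the function name `build_instruction` and the wording of the patterns are assumptions for demonstration.

```python
def build_instruction(history, candidates):
    """Compose the final instruction from the two natural-language patterns.

    A hypothetical sketch of the paradigm described above; the exact template
    used in the paper/repo may differ. `history` and `candidates` are lists
    of item titles.
    """
    # Pattern 1: the user's sequential interaction history
    history_pattern = "I've interacted with the following items in order: " + ", ".join(
        f"{i}. {t}" for i, t in enumerate(history, 1)
    )
    # Pattern 2: the retrieved candidate items to be ranked
    candidate_pattern = f"There are {len(candidates)} candidate items: " + ", ".join(
        f"{i}. {t}" for i, t in enumerate(candidates, 1)
    )
    # Final template: both patterns followed by the ranking instruction
    return (
        f"{history_pattern}\n{candidate_pattern}\n"
        "Please rank the candidate items by how likely I am to interact "
        "with them next, and output the ranked list."
    )
```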

🚀 Quick Start

  1. Write your own OpenAI API keys into llmrank/openai_api.yaml.
  2. Unzip dataset files.
    cd llmrank/dataset/ml-1m/; unzip ml-1m.inter.zip
    cd llmrank/dataset/Games/; unzip Games.inter.zip
    For data preparation details, please refer to [data-preparation].
  3. Install dependencies.
    pip install -r requirements.txt
  4. Evaluate ChatGPT's zero-shot ranking abilities on the ML-1M dataset.
    cd llmrank/
    python evaluate.py -m Rank

🔍 Key Findings

Please click the links below each "Observation" to find the code and scripts to reproduce the results.

Observation 1. LLMs struggle to perceive the order of user histories, but can be triggered to perceive it

LLMs can utilize historical behaviors for personalized ranking, but struggle to perceive the order of the given sequential interaction histories.

By employing specifically designed promptings, such as recency-focused prompting and in-context learning, LLMs can be triggered to perceive the order of historical user behaviors, leading to improved ranking performance.
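The idea of recency-focused prompting can be sketched as a small wrapper that appends a reminder about the most recent interaction. The function name and phrasing below are illustrative assumptions, not the paper's exact prompt:

```python
def recency_focused(instruction, history):
    """Append a recency reminder to a base instruction.

    A sketch of the 'recency-focused prompting' idea described above;
    the paper's exact wording may differ. `history` is an ordered list
    of item titles, most recent last.
    """
    # Explicitly point the LLM at the most recent interaction so it
    # attends to the order of the history rather than treating it as a set.
    return instruction + f"\nNote that my most recently interacted item is {history[-1]}."
```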

Code is here -> [reproduction scripts]

Observation 2. Biases exist in using LLMs to rank

LLMs suffer from position bias and popularity bias while ranking, which can be alleviated by specially designed prompting or bootstrapping strategies.
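One way to sketch the bootstrapping idea is to rank the same candidate set several times under shuffled orders and merge the rankings. The aggregation below uses a simple Borda count, which is an assumption for illustration; the repo's actual merging strategy may differ.

```python
import random

def bootstrap_rank(candidates, rank_fn, rounds=3, seed=0):
    """Rank `candidates` several times under shuffled orders and merge
    the resulting rankings with a Borda count.

    A hypothetical sketch of bootstrapping against position bias;
    `rank_fn` stands in for one LLM ranking call that takes an ordered
    candidate list and returns the same items in ranked order.
    """
    rng = random.Random(seed)
    scores = {c: 0 for c in candidates}
    for _ in range(rounds):
        # Shuffle so each round presents the candidates in a different order
        shuffled = candidates[:]
        rng.shuffle(shuffled)
        ranking = rank_fn(shuffled)
        # Borda count: earlier positions earn more points
        for pos, item in enumerate(ranking):
            scores[item] += len(candidates) - pos
    return sorted(candidates, key=lambda c: scores[c], reverse=True)
```

With a deterministic `rank_fn` (e.g. alphabetical sorting standing in for the LLM), the merged ranking simply recovers that order regardless of how the inputs were shuffled.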

Code is here -> [reproduction scripts]

Observation 3. Promising zero-shot ranking abilities

LLMs have promising zero-shot ranking abilities, especially on candidates retrieved by multiple candidate generation models with different practical strategies.

Code is here -> [reproduction scripts]

🌟 Acknowledgement

Please cite the following paper if you find our code helpful.

@inproceedings{hou2024llmrank,
  title={Large Language Models are Zero-Shot Rankers for Recommender Systems},
  author={Yupeng Hou and Junjie Zhang and Zihan Lin and Hongyu Lu and Ruobing Xie and Julian McAuley and Wayne Xin Zhao},
  booktitle={{ECIR}},
  year={2024}
}

The experiments are conducted using the open-source recommendation library RecBole.

We use the released pre-trained models of UniSRec and VQ-Rec in our zero-shot recommendation benchmarks.

Thanks @neubig for the amazing implementation of asynchronous dispatching of OpenAI API calls. [code]


llmrank's Issues

Reproduction

Following the commands given in the repo, I set async_dispatch to False because of network problems on my side. After reproducing, none of the metrics seem to reach the values reported in the paper. Could this be caused by the async_dispatch setting?

About asynchronous API calls

When calling the API asynchronously with asyncio, I get "Error communicating with OpenAI"; when calling the API serially instead, i.e., with async_dispatch set to False, results are returned normally.

Error in the data preprocessing step

When generating *.feat1CLS with python dataset/unisrec_auxiliary_files_process.py, I get:
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/zhangjunjie/bert-base-uncased'.
Is there a pre-trained model that needs to be downloaded here?

About the evaluation details of the baselines

As I understand it, for the zero-shot part of this table, 200 users are randomly sampled as the test set, with 1 positive item plus 19 sampled negatives per user.

My question is about the metrics of the "full" baseline models:

  1. Are they also computed on 200 randomly sampled users?
  2. Are they also computed in the 1 + 19 setting?

Also, Section 3.2 says: "The top-3 best items retrieved by each candidate generation model will be merged into a candidate set containing a total of 21 items." I'd like to ask:
3. If the top-3 items retrieved by different methods overlap, how are the duplicates handled?

Thanks for your reply!

A question

user_token2id = dataset.field2token_id['user_id']
item_token2id = dataset.field2token_id['item_id']

Hello, I'm a bit confused and unsure here.
"field" probably refers to user and item; in field2token_id, what does token_id refer to?

In user_token2id and item_token2id, what does id refer to?

This probably involves internal vs. external IDs: how does one convert among the item id, the token, and the item's natural-language name?

Thanks for your work; I look forward to your reply.

How is the bootstrap strategy used at inference time?

Hello, I see from the code that you implement the bootstrap strategy by duplicating the same sample three times, shuffling, and computing the results. How is this method used at inference time? For example, with three candidate orderings, the LLM produces three ranking results; how are these three shuffled outputs combined into one final ranking? I didn't see this stated explicitly in the paper. Or is it out of scope, i.e., you only do evaluation, and this process differs from actual inference?
I hope to get your answer, thanks!

Inconsistent reproduction results for SASRec on Games

test result: OrderedDict([('recall@1', 0.04), ('recall@5', 0.235), ('recall@10', 0.46), ('recall@20', 1.0), ('recall@50', 1.0), ('ndcg@1', 0.04), ('ndcg@5', 0.1323), ('ndcg@10', 0.2035), ('ndcg@20', 0.3383), ('ndcg@50', 0.3383)])
Hello, the above is the result I obtained using the provided SASRec pre-trained model and the data in the repo; it is far below the paper's results. What could be going wrong? Is it a setting somewhere in the yaml file? I ran this on the random dataset. What is the cause?

How was ml-1m.random constructed?

Hello, I tried to look up some users and their items from ml-1m.random in the file ml-1m.inter, but I cannot find the corresponding entries. Does this file include randomly sampled negative pairs?
