Papers worth reading and related awesome resources on the matching task. Matching appears in many tasks, such as natural language inference (NLI), question answering (QA), recommender systems (RecSys), information retrieval (IR) and advertising. This repository also covers many related research fields, including approximate nearest neighbor (ANN) search, text matching algorithms, click-through rate (CTR) prediction, learning to rank (LTR) and so on.
Suggestions for new papers, repositories and other resources are welcome!
Since I am Chinese, I mainly focus on Chinese resources. Recommendations of excellent resources in English or other languages are welcome!
- Enhanced-RCNN: An Efficient Method for Learning Sentence Similarity. Shuang Peng, Hengbin Cui, Niantao Xie, Sujian Li, Jiaxing Zhang, Xiaolong Li. (WWW 2020) [paper]
- Match^2: A Matching over Matching Model for Similar Question Identification. Zizhen Wang, Yixing Fan, Jiafeng Guo, Liu Yang, Ruqing Zhang, Yanyan Lan, Xueqi Cheng, Hui Jiang, Xiaozhao Wang. (SIGIR 2020) [paper]
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding. Yuyu Zhang, Ping Nie, Xiubo Geng, Arun Ramamurthy, Le Song, Daxin Jiang. (SIGIR 2020) [paper]
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. Omar Khattab, Matei Zaharia. (SIGIR 2020) [paper][code]
- Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring. Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston. (ICLR 2020) [paper][unofficial code]
- Pre-training Tasks for Embedding-based Large-scale Retrieval. Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar. (ICLR 2020) [paper]
- CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks. Dong-Kyu Chae, Jinsoo Kang, Sangwook Kim, Jungtae Lee. (CIKM 2018) [paper][code]
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall. Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Pipei Huang, Huan Zhao, Guoliang Kang, Qiwei Chen, Wei Li, Dik Lun Lee. (CIKM 2019) [paper] - MIND
- SDM: Sequential Deep Matching Model for Online Large-scale Recommender System. Fuyu Lv, Taiwei Jin, Changlong Yu, Fei Sun, Quan Lin, Keping Yang, Wilfred Ng. (CIKM 2019) [paper][code]
- Learning Robust Models for e-Commerce Product Search. Thanh V. Nguyen, Nikhil Rao, Karthik Subbian. (ACL 2020) [paper] - QUARTS
- Internal and Contextual Attention Network for Cold-start Multi-channel Matching in Recommendation. Ruobing Xie, Zhijie Qiu, Jun Rao, Yi Liu, Bo Zhang, Leyu Lin. (IJCAI 2020) [paper] - ICAN
- Deep Retrieval: An End-to-End Learnable Structure Model for Large-Scale Recommendations. Weihao Gao, Xiangjun Fan, Jiankai Sun, Kai Jia, Wenzhi Xiao, Chong Wang, Xiaobing Liu. (CoRR 2020) [paper]
- Deep Session Interest Network for Click-Through Rate Prediction. Yufei Feng, Fuyu Lv, Weichen Shen, Menghan Wang, Fei Sun, Yu Zhu, Keping Yang. (IJCAI 2019) [paper][code] - DSIN
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba. Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, Wenwu Ou. (DLP-KDD 2019) [paper] - BST
- Deep Match to Rank Model for Personalized Click-Through Rate Prediction. Ze Lyu, Yu Dong, Chengfu Huo, Weijun Ren. (AAAI 2020) [paper][code][blog] - DMR
- Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction. Qi Pi, Xiaoqiang Zhu, Guorui Zhou, Yujing Zhang, Zhe Wang, Lejian Ren, Ying Fan, Kun Gai. (CoRR 2020) [paper] - SIM
- GateNet: Gating-Enhanced Deep Network for Click-Through Rate Prediction. Tongwen Huang, Qingyun She, Zhiqiang Wang, Junlin Zhang. (CoRR 2020) [paper]
- Deep Feedback Network for Recommendation. Ruobing Xie, Cheng Ling, Yalong Wang, Rui Wang, Feng Xia, Leyu Lin. (IJCAI 2020) [paper][code] - DFN
- Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction. Weinan Xu, Hengxu He, Minshi Tan, Yunming Li, Jun Lang, Dongbai Guo. (SIGIR 2020) [paper][code] - DHAN
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction. Wentao Ouyang, Xiuwu Zhang, Lei Zhao, Jinmei Luo, Yu Zhang, Heng Zou, Zhaojie Liu, Yanlong Du. (CIKM 2020) [paper][blog]
- Operation-aware Neural Networks for User Response Prediction. Yi Yang, Baile Xu, Furao Shen, Jian Zhao. (Neural Networks Volume 121, January 2020) [paper] - ONN NFFM
- IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models. Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, Dell Zhang. (SIGIR 2017) [paper][code]
- Detecting Near-Duplicates for Web Crawling. Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma. (WWW 2007) [paper] - Simhash
- Product Quantization for Nearest Neighbor Search. Hervé Jégou, Matthijs Douze, Cordelia Schmid. (TPAMI 2011) [paper] - PQ
- Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. Yu. A. Malkov, D. A. Yashunin. (TPAMI 2020) [paper] - HNSW
- ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. Martin Aumüller, Erik Bernhardsson, Alexander Faithfull. (Information Systems 2019) [paper][code]
- Embedding-based Retrieval in Facebook Search. Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, Linjun Yang. (KDD 2020) [paper]
- Accelerating Large-Scale Inference with Anisotropic Vector Quantization. Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, Sanjiv Kumar. (ICML 2020) [paper][code] - ScaNN
- Real-time Attention Based Look-alike Model for Recommender System. Yudan Liu, Kaikai Ge, Xu Zhang, Leyu Lin. (KDD 2019) [paper] - RALM
- Applying Deep Learning To Airbnb Search. Malay Haldar, Mustafa Abdool, Prashant Ramanathan, Tao Xu, Shulin Yang, Huizhong Duan, Qing Zhang, Nick Barrow-Williams, Bradley C. Turnbull, Brendan M. Collins, Thomas Legrand. (KDD 2019) [paper]
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search. Miao Fan, Jiacheng Guo, Shuai Zhu, Shuo Miao, Mingming Sun, Ping Li. (KDD 2019) [paper]
- Deep Learning for Matching in Search and Recommendation. Jun Xu, Xiangnan He, Hang Li. (SIGIR 2018) [slides][paper]
- A Survey on Knowledge Graph-Based Recommender Systems. Qingyu Guo, Fuzhen Zhuang, Chuan Qin, Hengshu Zhu, Xing Xie, Hui Xiong, Qing He. (CoRR 2020) [paper]
- Graph Learning Approaches to Recommender Systems: A Review. Shoujin Wang, Liang Hu, Yan Wang, Xiangnan He, Quan Z. Sheng, Mehmet A. Orgun, Longbing Cao, Nan Wang, Francesco Ricci, Philip S. Yu. (CoRR 2020) [paper]
- Adversarial Machine Learning in Recommender Systems: State of the art and Challenges. Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra. (CoRR 2020) [paper]
- A Comparison of Supervised Learning to Match Methods for Product Search. Fatemeh Sarvi, Nikos Voskarides, Lois Mooiman, Sebastian Schelter, Maarten de Rijke. (SIGIR 2020) [paper][code]
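Several entries above deal with compact fingerprints for near-duplicate detection, notably the Simhash paper by Manku et al. The following is a minimal sketch of the Simhash idea, not the paper's production setup: it assumes whitespace tokenization, a weight of 1 per token, and a 64-bit token hash taken from the low bits of MD5. All of these are illustrative choices. Each token votes +1 or -1 on every bit position, and the fingerprint keeps the bits with a positive total.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Simhash fingerprint of `text` (assumption: whitespace tokens, weight 1)."""
    votes = [0] * bits
    mask = (1 << bits) - 1
    for token in text.split():
        # Assumption: use the low `bits` bits of the MD5 digest as the token hash.
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) & mask
        for i in range(bits):
            # A set bit votes +1 for position i, a cleared bit votes -1.
            votes[i] += 1 if (h >> i) & 1 else -1
    # Keep bit i in the fingerprint only when its total vote is positive.
    fingerprint = 0
    for i in range(bits):
        if votes[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = simhash("the quick brown fox jumps over the lazy dog")
    b = simhash("the quick brown fox jumped over the lazy dog")
    print(hamming(a, b))
```

Two documents count as near-duplicates when their fingerprints differ in only a few bits. The paper reports that a threshold of 3 bits works well for 64-bit fingerprints.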
- Baidu / Familia - A Toolkit for Industrial Topic Modeling
- chihming / competitive-recsys
- Coder-Yu / RecQ
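- DA-southampton / NLP_ability - A structured overview of the knowledge NLP engineers need to build up
- DA-southampton / Tech_Aarticle - Articles explaining how deep learning models are used in production at major companies
- guoday / ctrNet-tool - A CTR prediction toolkit including FM, FFM, NFFM and more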
- jrzaurin / pytorch-widedeep
- lanwuwei / SPM_toolkit
- NTMC-Community / MatchZoo
- NTMC-Community / MatchZoo-py
- mengfeizhang820 / Paperlist-for-Recommender-Systems
- mJackie / RecSys
- pengming617 / text_matching
- RediSearch / RediSearch - Fulltext Search and Secondary Index module for Redis
- shenweichen / DeepMatch
- shenweichen / DeepCTR
- shenweichen / DeepCTR-Torch
- shenweichen / GraphEmbedding
- ShuaichiLi / Chinese-sentence-similarity-task - A collection of Chinese question/sentence similarity competitions and their solutions
- THUNLP / NeuIRPapers - Must-read Papers on Neural Information Retrieval
- THUNLP / OpenMatch
- THUwangcy / ReChorus [video] - "Chorus" of recommendation models: a PyTorch framework for Top-K recommendation with implicit feedback
- wangle1218 / deep_text_matching - Implementations of several deep text matching (text similarity) models in Keras
- wzhe06 / Reco-papers
- zhaogaofeng611 / TextMatch - Chinese semantic similarity matching models in PyTorch
- ZiyaoGeng / Recommender-System-with-TF2.0 - Reproductions of recommender system papers in TensorFlow 2.0
- aaalgo / KGraph - A Library for Approximate Nearest Neighbor Search
- erikbern / ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python
- facebookresearch / Faiss - A library for efficient similarity search and clustering of dense vectors
- FALCONN-LIB / FALCONN - LSH-based FAst Lookups of Cosine and Other Nearest Neighbors
- google-research / ScaNN - a method for efficient vector similarity search at scale
- Jina AI / Jina - An easier way to build neural search in the cloud
- kayzhu / LSHash - A fast Python implementation of LSH
- leonsim / simhash - A Python Implementation of Simhash Algorithm
- Microsoft / SPTAG - A distributed approximate nearest neighborhood search (ANN) library
- milvus-io / Milvus - An open source vector similarity search engine
- pixelogik / NearPy - Python framework for fast ANN search in large, high-dimensional datasets
- primetang / pyflann - Python bindings for FLANN
- Spotify / Annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
- Vearch / Vearch - A distributed system for efficient similarity search of embedding vectors
- wangzhegeek / DSSM-Lookalike
- yanyiwu / simhash - A C++ Implementation of Simhash for Chinese
- matsui528 / Rii - IVFPQ-based fast and memory efficient ANN search method with a subset-search functionality
- mukul5sharma / SearchEngine - A simple search engine using BM25 ranking algorithm
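The last repository above ranks documents with BM25. The following is a minimal Okapi BM25 scorer. It is not that repository's code and assumes:

- documents arrive already tokenized and the corpus is non-empty;
- the typical defaults k1 = 1.2 and b = 0.75 apply;
- the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)), as used by Lucene, is the IDF.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for `query` (assumes pre-tokenized, non-empty corpus)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant (assumption, Lucene-style).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [["faiss", "vector", "search"],
            ["bm25", "ranking", "search"],
            ["cooking", "recipes"]]
    print(bm25_scores(["bm25", "search"], docs))
```

The second document matches both query terms and scores highest. The third matches none and scores 0.0.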
- Adversarial NLI: A New Benchmark for Natural Language Understanding. Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela. (ACL 2020) [paper][data][blog]
- OCNLI: Original Chinese Natural Language Inference. Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, Lawrence S. Moss. (Findings of EMNLP 2020) [paper][data]
- MIND: A Large-scale Dataset for News Recommendation. Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, Ming Zhou. (ACL 2020) [paper][data]
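- EE_NovRain / FTRL, the Online Learning Algorithm Widely Used at Major Companies, Explained in Detail
- Johnson0722 / CTR Prediction Algorithms: FM, FFM, DeepFM and Their Practice
- 良口三 / Article Embeddings in Recommender Systems from the Perspective of Triplet Loss
- 辛俊波 / Why Offline AUC Rises While Online CTR/CPM Falls, and How to Fix It
- 七便士 / The Road to Reproducing SDM (Sequential Deep Matching Model)
- 深度传送门 / Must-read Papers on Industrial Deep Recommender Systems and CTR Prediction
- 朱亚东 / A Survey of Learning to Rank
- 阿里 / How Does Shenma Search Improve Search Freshness?
- 阿里 / A CTR Prediction Model from the Perspective of Lifelong Interest Modeling: Search-based Interest Model
- 阿里 / The Evolution of Alimama's Deep Tree Matching, Version 3.0: TDM->JTM->BSAT
- coolhok / faiss-learning Study Notes
- Chenny / Traditional Text Matching Algorithms Explained (with Code)
- cmathx / Latest Advances in Relevance Optimization for Retrieval and Pre-ranking in Search and Recommendation (2020)
- CNU小学生 / The Theory Behind the HNSW Algorithm, Explained in One Article
- 策略算法工程师之路 / Query Spelling Correction Algorithms
- 叉烧 / ACL 2020 | Large Gains in Online Search Results! Amazon Proposes an Adversarial Query-Doc Relevance Model
- Dezhi Ye / A Walkthrough of the Paper "Embedding-based Retrieval in Facebook Search"
- 邓邓最棒 / Top-k Similarity over Massive Text Collections: A First Look at the Faiss Library
- 丁香园 / DXY's Exploration and Practice on Semantic Matching Tasks
- 丁香园 / Query Expansion Techniques in Search
- Giant / Which k-NN Algorithm Is Best? The Principles and Usage of KDTree, Annoy and HNSW
- 花椒 / Applying Intelligent Recommendation Algorithms to Live Streaming
- 京东 / An In-depth Look at the Evolution of JD's Personalized Recommender System
- 科学空间 / From EMD and WMD to WRD: Computing Similarity Between Sequences of Text Vectors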
- liqima / Faiss wiki in Chinese
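- Merria28 / Similarity Detection: Choosing HNSW Parameters
- 每天都要机器学习 / Vector-based Deep Semantic Retrieval of Similar Texts? You Need BERT and Faiss
- 美团 / Applying Deep Learning to Ranking on the Meituan-Dianping Recommendation Platform
- 美团 / MT-BERT in Practice for Text Retrieval Tasks
- 平安寿险PAI / AAAI 2020 | A Transformer-based Semantic Matching Model for Dialogue Response Selection
- 清雨影 / A Killer Tool for Top-N Recommendation: The History of Speeding Up RankNet (with a PyTorch Implementation)
- Tree / Evaluation Metrics for Ranking Algorithms: CG, DCG and NDCG
- 腾讯 / Shendun (神盾) Recommendation: A Summary of Applying MAB Algorithms
- 腾讯 / A Summary of Retrieval Algorithm Practice at Tencent's Mobile News Portal
- 腾讯 / How Can Personalized Recommendation Match Users' Tastes? Here Is How WeChat Top Stories Does It
- 腾讯 / A Detailed Look at the Multi-model Content Strategy and Retrieval of WeChat Top Stories
- 腾讯 / A Long-form Deep Dive into Content Understanding and Recommendation in WeChat Top Stories
- 王鸿伟 / DNNs Can Model High-order Feature Interactions, So Why Do Models Like Wide&Deep and DeepFM Still Need an Explicit Wide Part?
- 吴海波 / Ramblings on AUC, the Machine Learning Evaluation Metric
- Yong Yuan / Image Retrieval: Vector Indexing
- 夕小瑶 / Latest Advances in Deep Text Matching in 2020: I Want Both Accuracy and Speed!
- 一小撮人 / Faiss - Getting Started
- 一小撮人 / Faiss - Faster Search, Lower Memory, Run on GPUs
- 一小撮人 / Faiss - Clustering, PCA, Quantization
- 一小撮人 / Faiss - Guidelines to Choose an Index
- 一小撮人 / Faiss - Basic Indexes
- 一小撮人 / Faiss - Binary Indexes, Composite Indexes
- 一小撮人 / Faiss - FAQ Summary
- 一小撮人 / Get to Know Annoy in One Article!
- 知乎 / Applying Query Understanding and Semantic Retrieval in Zhihu Search
- 字节跳动 / Understand the Principles of Toutiao's Recommendation Algorithm in 3 Minutes (with Video and Slides)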
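Several articles above discuss ranking metrics such as CG, DCG and NDCG. The following is a minimal NDCG@k sketch. It assumes graded relevance labels listed in the order the system ranked the items, and it uses the exponential gain (2^rel - 1) / log2(i + 1) at 1-based position i, which is common in learning to rank. Another widespread variant uses the raw label rel as the gain, so check which one a given benchmark expects.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """DCG@k with exponential gain; position i (1-based) is discounted by log2(i + 1)."""
    # enumerate is 0-based, so position i + 1 gets discount log2(i + 2).
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    print(ndcg_at_k([3, 2, 1], 3))  # already ideally ordered
    print(ndcg_at_k([1, 3, 2], 3))  # most relevant item ranked second
```

Normalizing by the ideal ordering keeps scores in [0, 1], so results from queries with different numbers of relevant items can be averaged.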