This is the repository for our paper "Is Interpretable Machine Learning Effective at Feature Selection for Neural Learning-to-Rank?", published at ECIR 2024.
```bibtex
@inproceedings{lyu2024interpretable,
  title={Is Interpretable Machine Learning Effective at Feature Selection for Neural Learning-to-Rank?},
  author={Lyu, Lijun and Roy, Nirmal and Oosterhuis, Harrie and Anand, Avishek},
  booktitle={European Conference on Information Retrieval},
  pages={384--402},
  year={2024},
  organization={Springer}
}
```
We experimented with the MQ2008, Web30k, and Yahoo! datasets; the code is compatible with Istella as well. After downloading a dataset under your project directory, preprocess it and save it as an `h5py` object (to reduce memory usage) with `preprocess.py`:
```shell
python preprocess.py --DIRECTORY your-project-dir --Dataset MQ2008 --Fold Fold1 --task h5
```
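For context, LETOR-style datasets such as MQ2008 store one query-document pair per line: a relevance label, a `qid`, and `index:value` feature pairs, optionally followed by a `#` comment. A minimal parser sketch for this format (independent of `preprocess.py`, which may handle it differently) could look like:

```python
def parse_letor_line(line):
    """Parse one LETOR-format line: '<label> qid:<id> 1:<v1> 2:<v2> ... # comment'."""
    # Drop the optional trailing comment, then split into whitespace tokens.
    line = line.split("#", 1)[0].strip()
    tokens = line.split()
    label = int(tokens[0])                      # graded relevance label
    qid = tokens[1].split(":", 1)[1]            # query identifier
    # Remaining tokens are feature_index:value pairs.
    features = {int(k): float(v)
                for k, v in (t.split(":", 1) for t in tokens[2:])}
    return label, qid, features

# Example line in MQ2008 style (the feature values here are made up):
label, qid, feats = parse_letor_line("2 qid:10 1:0.03 2:0.50 3:0.12 # doc=GX001")
print(label, qid, feats[2])  # 2 10 0.5
```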
We use `hydra` to manage hyperparameters. The choice of dataset, feature selection method, and training mode (`pointwise`, `pairwise`, `listwise`) is defined under `config`. For training a model, see `train.yaml`; for testing and explanation, see `explain.yaml`. All feature selection methods are under `models`. To train a model with L2X over multiple hyperparameter settings:
```shell
HYDRA_FULL_ERROR=1 python train_framework.py --multirun models=l2x datasets=MQ2008 train_mode=listwise gpus=1 train_mode.train_loader.batch_size=400 random_seed=0,1,2,3,4 models.model.hparams.feature_ratio=0.1,0.2 trainer.max_epochs=20
```
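With `--multirun`, hydra launches one job per combination of the comma-separated sweep values, i.e. a Cartesian product: the command above, with 5 random seeds and 2 feature ratios, starts 10 runs. The expansion can be illustrated in plain Python (independent of hydra itself):

```python
from itertools import product

# Sweep values taken from the command above.
random_seeds = [0, 1, 2, 3, 4]
feature_ratios = [0.1, 0.2]

# hydra --multirun runs the Cartesian product of all swept parameters.
jobs = [{"random_seed": s, "feature_ratio": r}
        for s, r in product(random_seeds, feature_ratios)]

print(len(jobs))  # 10
print(jobs[0])    # {'random_seed': 0, 'feature_ratio': 0.1}
```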