Giter VIP home page Giter VIP logo

task-recommendation's Introduction

Recommending Tasks based on Search Queries and Missions

This repository provides resources developed within the article "Recommending Tasks based on Search Queries and Missions".

Datasets

The datasets constructed in our work are placed under the directory data/datasets/.

Corpus of Search Queries and Missions

This corpus, presented in Section 5.1 of the article, is in data/datasets/corpus_of_missions.tsv.

  • The format is the same as in the original corpus on which ours is based. This is, each row in the file is in the format:
<UserID>	<Query>	<TimeStamp>	<ClickRank>	<ClickDomain>	<MissionID>	<Comment>

Corpus of Procedural Search Missions

Our corpus of procedural missions, described in Section 5.2 of the article, is in the JSON file data/datasets/corpus_of_procedural_missions.json. The content is of the shape:

{
    <mission ID>: {
                      "all_queries": {<query ID>: <query>},
	              "best_queries": {<query ID>: <query>}
	              }
}

where:

  • <mission ID> is the mission ID (by construction, prefixed by the corresponding userID in the corpus of search queries and missions);
  • "best_queries" field holds only the best queries found per search mission; and
  • "all_queries" field contains all the queries in the search mission, including the best one(s).

Test Collection

The test collection built in Section 5.4 of the article is placed under the directory data/datasets/qrels/. Each file is in the usual format of an input qrels file (i.e. the file with relevance judgements) for the trec_eval command.

  • qrels-QB.tsv is the basic collection, which is used to evaluate query-based task recommendation (Section 6.2 of the article). Its format is:
<query ID>	Q0	<task ID>	<relevance>
  • qrels-MB.tsv is used to evaluate a mission-based task recommendation runfile. The collection is obtained as explained in the last paragraph of Section 5.4 of the article. Its format is:
<mission ID>	Q0	<task ID>	<relevance>
  • qrels-QB-restricted.tsv is used to evaluate the query-based task recommendation runfile in order to be comparable with a mission-based runfile. It's obtained after restricting the original collection to contain only a single best query per mission, randomly selected. Additionally, its query IDs are replaced by the respective mission IDs. Then, its format is:
<mission ID>	Q0	<task ID>	<relevance>

Features for Learning to Recommend Tasks

The file data/features/features-QB.json contains the features for each query-task instance, used for learning to recommend tasks based on search queries.

Analogously, data/features/features-MB.json contains the features for each query-task instance, used for learning to recommend tasks based on search missions.

Each of these feature sets has this format:

{
  "QID:<query ID>|TASKID:<task ID>": {
    "features": {
      <feature>: <feature value>,
    },
    "properties": {
      "task_id": <task ID>,
      "query_id": <query ID>,
      "task_title": <task title>,
      "query": <query>
    },
    "target": <relevance>
  },
  ...
}

Task Recommendation

The output task recommendation runfiles are placed under the directory output/. Each file is in the usual format of an input runfile for the trec_eval command. Its format is:

<query ID>	Q0	<task ID>	<rank position>	<score>	<run ID>
  • 1-baselines/ contains the outputs from the text-based baselines (Section 6.1 of the article);
  • 2-query_based/ contains the outputs from our query-based task recommendation approach (Section 6.2 of the article); and
  • 3-mission_based/ contains the outputs from our mission-based task recommendation approach (Section 6.3 of the article).

trec_eval is the standard tool used by the TREC community for evaluating an retrieval method. It calculates and prints various evaluation measures, evaluating the results file against the file of relevance judgements.

In order to evaluate a <runfile> against a <qrels file> in terms of the metrics used in the article, one can run the following:

$ path/to/trec_eval -c -m ndcg_cut.10 -m P.10 -m map <qrels file> <runfile>

For example, below it's shown how to call for the evaluation of the several methods and configurations to obtain the results from Tables 4, 5, and 6 in the article:

# Query-based: BM25-Title
path/to/trec_eval -c -m ndcg_cut.10 -m P.10 -m map data/datasets/qrels/qrels-QB.tsv output/1-baselines/BM25/results-BM25-title.trec.txt

# Query-based: configurations by some feature subsets
path/to/trec_eval -c -m ndcg_cut.10 -m P.10 -m map data/datasets/qrels/qrels-QB.tsv output/2-query_based/results-<LTR configuration>.trec.txt

# Query-based: restricted to be comparable with any mission-based runfile
path/to/trec_eval -c -m ndcg_cut.10 -m P.10 -m map data/datasets/qrels/qrels-QB-restricted.tsv output/2-query_based/results-LTR-restricted.trec.txt 

# Mission-based: configurations by method family and aggregator function
path/to/trec_eval -c -m ndcg_cut.10 -m P.10 -m map data/datasets/qrels/qrels-MB.tsv output/3-mission_based/by_<method family>/aggr_<aggregator function>/results.trec.txt

task-recommendation's People

Contributors

dariogarigliotti avatar

Watchers

Johannes Bjerva avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.