
WANDS - Wayfair ANnotation Dataset


About The Project

WANDS is a Wayfair product search relevance dataset that is published as a companion to the paper from ECIR 2022:

WANDS: Dataset for Product Search Relevance Assessment
Yan Chen, Shujian Liu, Zheng Liu, Weiyi Sun, Linas Baltrunas and Benjamin Schroeder

The dataset allows objective benchmarking and evaluation of search engines on an e-commerce dataset. Key features of this dataset include:

  1. 42,994 candidate products
  2. 480 queries
  3. 233,448 (query,product) relevance judgements

Please refer to the paper for more details.
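As an illustration of how the relevance judgements can be used for benchmarking, here is a minimal sketch of NDCG@k for a single query, built on the three label values listed under Dataset Details. The numeric gain assigned to each label (`GAINS`) and the function names are assumptions made for this example; they are not taken from the paper, which should be consulted for the evaluation setup the authors actually use.

```python
import math

# Assumed gain per label. This mapping is illustrative only.
GAINS = {"Exact": 1.0, "Partial": 0.5, "Irrelevant": 0.0}


def dcg(gains):
    """Discounted cumulative gain of a list of gains given in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_labels, all_labels, k=10):
    """NDCG@k for one query.

    ranked_labels: labels of the products a search engine returned, best first.
    all_labels: every judged label for the query, used to build the ideal ranking.
    """
    actual = [GAINS[label] for label in ranked_labels[:k]]
    ideal = sorted((GAINS[label] for label in all_labels), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # A query with no relevant judged product gets a score of 0.
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Averaging `ndcg_at_k` over all queries gives one number per search engine, which is one common way to compare systems on a judged dataset.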

Getting Started

To get a local copy up and running, follow the step below.

Installation

Clone the repo

git clone https://github.com/wayfair/WANDS.git

Dataset Details

The data is stored in the dataset folder in three files:

  1. product.csv - Stores all candidate products, columns include:
    a. product_id - ID of a product
    b. product_name - String of product name
    c. product_class - Category which product falls under
    d. category_hierarchy - Parent categories of product, delimited by /
    e. product_description - String description of product
    f. product_features - | delimited string of attribute:value pairs which describe the product
    g. rating_count - Number of user ratings for product
    h. average_rating - Average rating the product received
    i. review_count - Number of user reviews for product

  2. query.csv - Stores search queries, columns include:
    a. query_id - Unique ID for each query
    b. query - Query string
    c. query_class - Category the query falls under

  3. label.csv - Stores annotated (query, product) relevance judgements, columns include:
    a. id - Unique ID for each annotation
    b. query_id - ID of the query this annotation is for
    c. product_id - ID of the product this annotation applies to
    d. label - Relevance label, one of 'Exact', 'Partial', or 'Irrelevant'
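The two delimited columns in product.csv can be unpacked with a few lines of Python. The sketch below assumes only the delimiters described above (`|` between pairs and `:` between attribute and value in product_features, `/` between levels in category_hierarchy). The function names and the sample strings in the usage note are hypothetical and do not come from the dataset.

```python
def parse_features(raw):
    """Split a product_features string into a list of (attribute, value) pairs.

    A list is returned rather than a dict so that a repeated attribute
    is not silently overwritten.
    """
    if not isinstance(raw, str):
        return []  # e.g. a missing value loaded as None or NaN
    pairs = []
    for item in raw.split("|"):
        if ":" not in item:
            continue  # skip empty or malformed fragments
        # Split on the first colon only, so values may contain colons.
        attribute, value = item.split(":", 1)
        pairs.append((attribute.strip(), value.strip()))
    return pairs


def parse_hierarchy(raw):
    """Split a category_hierarchy string into category names, outermost first."""
    if not isinstance(raw, str):
        return []
    return [part.strip() for part in raw.split("/") if part.strip()]
```

For example, `parse_features("color:blue|material:wood")` yields two pairs, and `parse_hierarchy("Furniture / Bedroom Furniture / Beds")` yields three category names.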

Sample Notebook

The sample notebook read_dataset.ipynb shows how to read the data from the three CSV files.
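Independently of the notebook, the following standard-library sketch shows one way to read the three files and attach the query text and product name to each annotation. It is not a reproduction of read_dataset.ipynb. The tab delimiter is an assumption that should be checked against the actual files, and the helper names and the tiny in-memory rows are made up for illustration.

```python
import csv
import io


def read_rows(handle, delimiter="\t"):
    """Read an open CSV file handle into a list of dicts keyed by column name.

    The default delimiter is an assumption; pass "," if the files are
    comma-separated.
    """
    return list(csv.DictReader(handle, delimiter=delimiter))


def join_labels(products, queries, labels):
    """Attach the query string and product name to each annotation row."""
    product_by_id = {row["product_id"]: row for row in products}
    query_by_id = {row["query_id"]: row for row in queries}
    joined = []
    for row in labels:
        joined.append({
            "query": query_by_id[row["query_id"]]["query"],
            "product_name": product_by_id[row["product_id"]]["product_name"],
            "label": row["label"],
        })
    return joined


# Made-up rows standing in for the real files, using the documented column names.
products = read_rows(io.StringIO("product_id\tproduct_name\n0\texample bed frame\n"))
queries = read_rows(io.StringIO("query_id\tquery\tquery_class\n0\tbed frame\tBeds\n"))
labels = read_rows(io.StringIO("id\tquery_id\tproduct_id\tlabel\n0\t0\t0\tExact\n"))
joined = join_labels(products, queries, labels)
```

With the repository cloned, the same helpers can be pointed at the real data, for example `read_rows(open("dataset/product.csv", newline="", encoding="utf-8"))`.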

Annotation Guidelines

We released annotation guidelines as a supplement to the dataset.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. For detailed contributing guidelines, please see CONTRIBUTING.md.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

For questions or feedback, please reach out to [email protected] or the first author of the referenced paper.

Project Link: https://github.com/wayfair/WANDS

Citation

Please cite this paper if you are building on top of or using this dataset:

@InProceedings{wands,
  title = {WANDS: Dataset for Product Search Relevance Assessment},
  author = {Chen, Yan and Liu, Shujian and Liu, Zheng and Sun, Weiyi and Baltrunas, Linas and Schroeder, Benjamin},
  booktitle = {Proceedings of the 44th European Conference on Information Retrieval},
  year = {2022},
  numpages = {12}
}
