Giter VIP home page Giter VIP logo

wsdm-20's Introduction

Comparative Web Search Questions

This repository contains the data and code for reproducing results of the paper:

@InProceedings{bondarenko:2020a,
  author =                {Alexander Bondarenko and Pavel Braslavski and Michael V{\"o}lske and Rami Aly and Maik Fr{\"o}be and Alexander Panchenko and Chris Biemann and Benno Stein and Matthias Hagen},
  booktitle =             {13th ACM International Conference on Web Search and Data Mining (WSDM 2020)},
  editor =                {James Caverlee and Xia (Ben) Hu and Mounia Lalmas and Wei Wang},
  ids =                   {stein:2020a},
  month =                 feb,
  pages =                 {52-60},
  publisher =             {ACM},
  site =                  {Houston, USA},
  title =                 {{Comparative Web Search Questions}},
  url =                   {https://dl.acm.org/doi/abs/10.1145/3336191.3371848},
  year =                  2020
}

[Paper Link]

Code structure

Notebooks:

The two last notebooks will use predictions produced by CNN and BERT as decision probabilities.

Classification with Neural Networks

Repository for neural models implemented for the classification of comparative questions in a binary and multi-label setting.

System Requirement

The system was tested on Debian/Ubuntu Linux with a GTX 1080TI and TITAN X.

CNN

To train the CNN model contained in this repository, the following repository has been used: https://github.com/uhh-lt/BlurbGenreCollection-HMC

Example for training model:

python3 main.py --mode train --classifier cnn --use_cv --use_early_stop --num_filters 100 
--learning_rate 0.0001 --lang COMPQ_BINARY_YAN --sequence_length 15

Example for testing model:

python3 run_cnn.py --input_path path_to_test_data --cnn_path /checkpoints/your_cnn_model.h5 --vocabulary_path your_vocabulary_cnn --threshold 0.5

BERT

BERT is based on the pre-trained models from https://github.com/huggingface/pytorch-pretrained-BERT with commit 98dc30b21e3df6528d0dd17f0910ffea12bc0f33

  1. Install requirements.

Please check out the version used for this particular publication.

git clone -n <repo_name> 
git checkout <commit_sha>

commit_sha = 98dc30b21e3df6528d0dd17f0910ffea12bc0f33

  1. Please ensure that the data directory specified when training/testing the model contain a file train-binary.tsv and test-binary.tsv for training and testing respectively.

execute run_bert.py from examples subfolder

Example for training model:

python3 run_bert.py   --task_name COMPQ   --do_train --do_lower_case --data_dir path_to_train_data/   --bert_model bert-base-multilingual-uncased --max_seq_length 128   --train_batch_size 32   --learning_rate 2e-5   --num_train_epochs 3.0 --output_dir models/finetuned_bert_model

Example for testing model:

python3 run_bert.py   --task_name COMPQ --do_eval --do_lower_case --data_dir path_to_test_data --bert_model your_bert_model 
--max_seq_length 128 --train_batch_size 32 --learning_rate 2e-5 --num_train_epochs 3.0 --output_dir path_to_model --models/finetuned_bert_model

With: --data_dir The path to the input data. Again, please name the files in the input data train-binary.tsv and test-binary.tsv.

--output_dir this is where you have to put in the model and config file.

After running, the output should also already be created in the output directory.

Data

Annotated English question queries are available from: [Data Link] Pre-trained models for Russian are available from: [Data Link]

wsdm-20's People

Contributors

alebondarenko avatar raldir avatar mam10eks avatar

Watchers

James Cloos avatar  avatar  avatar  avatar

Forkers

mam10eks

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.