DAZER

The TensorFlow implementation of our ACL 2018 paper:
A Deep Relevance Model for Zero-Shot Document Filtering, Chenliang Li, Wei Zhou, Feng Ji, Yu Duan, Haiqing Chen. Paper URL: http://aclweb.org/anthology/P18-1214

Requirements

  • Python 3.5
  • Tensorflow 1.2
  • Numpy
  • Traitlets

Guide To Use

Prepare your dataset: first, prepare your own data. See Data Preparation.

Configure: then, configure the model through the config file. Configurable parameters are listed under Configurations below.

See the example: sample.config

In addition, you need to change the zero-shot label settings in get_label.py.

(Make sure get_label.py and model.py are in the same directory.)

Training: pass the config file, training data, and validation data as

python model.py config-file \
    --train \
    --train_file <path to training data> \
    --validation_file <path to validation data> \
    --checkpoint_dir <directory to store/load model checkpoints> \
    --load_model <True|False>

where --load_model controls whether to continue training an existing model (True) or start a new one (False).

See example: sample-train.sh

Testing: pass the config file and testing data as

python model.py config-file \
    --test \
    --test_file <path to testing data> \
    --test_size <number of testing samples> \
    --checkpoint_dir <directory to load the trained model from> \
    --output_score_file <file to write document scores to>

Relevance scores will be output to output_score_file, one score per line, in the same order as test_file.
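Because scores come back one per line in input order, recovering which score belongs to which document is a simple zip. A minimal sketch (the helper name is illustrative, not part of the repo):

```python
def pair_scores(test_lines, score_lines):
    """Pair each test line (seed_words \t document) with its relevance score.

    Scores are written one per line in the same order as the test file,
    so zipping the two files recovers the correspondence.
    """
    results = []
    for line, score in zip(test_lines, score_lines):
        seed_words, document = line.rstrip("\n").split("\t")
        results.append((seed_words, document, float(score)))
    return results

# Example with two in-memory "files":
# pair_scores(["1,2\t3,4,5\n"], ["0.87\n"]) -> [("1,2", "3,4,5", 0.87)]
```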

Data Preparation

All seed words and documents must be mapped into sequences of integer term ids. Term ids start at 1.

Training Data Format

Each training sample is a tuple of (seed words, positive document, negative document):

seed_words \t positive_document \t negative_document

Example: 334,453,768 \t 123,435,657,878,6,556 \t 443,554,534,3,67,8,12,2,7,9
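As a sanity check, a line in this format can be produced and parsed with a few lines of Python (the helper names here are illustrative, not part of the repo):

```python
def make_train_line(seed_ids, pos_ids, neg_ids):
    """Serialize one training sample: ids joined by commas, fields by tabs."""
    fields = (seed_ids, pos_ids, neg_ids)
    return "\t".join(",".join(str(i) for i in ids) for ids in fields)

def parse_train_line(line):
    """Inverse: recover the three integer-id lists from one line."""
    return [[int(i) for i in field.split(",")]
            for field in line.rstrip("\n").split("\t")]

line = make_train_line([334, 453, 768], [123, 435, 657], [443, 554, 534])
assert parse_train_line(line) == [[334, 453, 768],
                                  [123, 435, 657],
                                  [443, 554, 534]]
```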

Testing Data Format

Each testing sample is a tuple of (seed words, document)

seed_words \t document

Example: 334,453,768 \t 123,435,657,878,6,556

Validation Data Format

The format is the same as the training data format.

Label Dict File Format

Each line is a tuple of (label_name, seed_words)

label_name/seed_words

Example: alt.atheism/atheist christian atheism god islamic
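A label line splits on the first '/' only, since label names such as alt.atheism contain no slash but seed words are space-separated. A small parsing sketch (the function name is illustrative):

```python
def parse_label_line(line):
    """Split 'label_name/seed_words' into (label, list of seed words)."""
    label, seeds = line.rstrip("\n").split("/", 1)  # split on the first '/' only
    return label, seeds.split()

assert parse_label_line("alt.atheism/atheist christian atheism god islamic") == \
    ("alt.atheism", ["atheist", "christian", "atheism", "god", "islamic"])
```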

Word2id File Format

Each line is a tuple of (word, id)

word id

Example: world 123

Embedding File Format

Each line is a tuple of (id, embedding)

id embedding

Example: 1 0.3 0.4 0.5 0.6 -0.4 -0.2
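The word2id and embedding files above are both whitespace-separated, so loading them reduces to a split per line. A hedged sketch (helper names are illustrative, not the repo's loaders):

```python
def load_word2id(lines):
    """Parse 'word id' lines into a dict {word: int id}."""
    mapping = {}
    for line in lines:
        word, idx = line.split()
        mapping[word] = int(idx)
    return mapping

def load_embeddings(lines):
    """Parse 'id v1 v2 ...' lines into a dict {int id: list of floats}."""
    table = {}
    for line in lines:
        parts = line.split()
        table[int(parts[0])] = [float(v) for v in parts[1:]]
    return table

# Using the examples from this section:
# load_word2id(["world 123"]) -> {"world": 123}
# load_embeddings(["1 0.3 0.4 0.5 0.6 -0.4 -0.2"])[1] -> [0.3, 0.4, 0.5, 0.6, -0.4, -0.2]
```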

Configurations

Model Configurations

  • BaseNN.embedding_size: embedding dimension of word
  • BaseNN.max_q_len: max query length
  • BaseNN.max_d_len: max document length
  • DataGenerator.max_q_len: max query length. Should be the same as BaseNN.max_q_len
  • DataGenerator.max_d_len: max document length. Should be the same as BaseNN.max_d_len
  • BaseNN.vocabulary_size: vocabulary size
  • DataGenerator.vocabulary_size: vocabulary size
  • BaseNN.batch_size: batch size
  • BaseNN.max_epochs: max number of epochs to train
  • BaseNN.eval_frequency: evaluate the model on the validation set every this many epochs
  • BaseNN.checkpoint_steps: save a model checkpoint every this many epochs

Data

  • DAZER.emb_in: path of initial embeddings file
  • DAZER.label_dict_path: path of label dict file
  • DAZER.word2id_path: path of word2id file

Training Parameters

  • DAZER.epsilon: epsilon for Adam Optimizer
  • DAZER.embedding_size: embedding dimension of word
  • DAZER.vocabulary_size: vocabulary size of the dataset
  • DAZER.kernal_width: width of the convolution kernel
  • DAZER.kernal_num: number of kernels
  • DAZER.regular_term: weight of the L2 loss
  • DAZER.maxpooling_num: K of the K-max pooling
  • DAZER.decoder_mlp1_num: number of hidden units of the first MLP in the relevance aggregation part
  • DAZER.decoder_mlp2_num: number of hidden units of the second MLP in the relevance aggregation part
  • DAZER.model_learning_rate: learning rate for the model (as opposed to the adversarial classifier)
  • DAZER.adv_learning_rate: learning rate for the adversarial classifier
  • DAZER.train_class_num: number of classes at training time
  • DAZER.adv_term: weight of the adversarial loss when updating the model's parameters
  • DAZER.zsl_num: number of zero-shot labels
  • DAZER.zsl_type: type of zero-shot label setting (you may have multiple zero-shot settings with the same number of zero-shot labels; this indicates which setting to use for the experiment, see get_label.py for details)
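These parameters are set with traitlets-style `c.Class.trait = value` assignments in the config file, as the `c.DAZER.train_class_num = 18` setting mentioned in the issues below illustrates. A sketch of such a fragment follows; every value here is a placeholder, not the paper's setting — sample.config in the repo is the authoritative example:

```python
# Illustrative traitlets config fragment; values are placeholders only.
c = get_config()

c.BaseNN.embedding_size = 300
c.BaseNN.max_q_len = 10
c.BaseNN.max_d_len = 500
c.BaseNN.batch_size = 16

c.DAZER.kernal_num = 50
c.DAZER.regular_term = 0.01
c.DAZER.train_class_num = 18
```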

dazer's People

Contributors: lichenliang-whu

dazer's Issues

An end-to-end working example is appreciated

I just finished reading the paper and it's a great one! Very clearly written with solid experimental results.

It would greatly help people try out your model if you could provide an end-to-end working example starting from publicly available word embeddings and datasets. The current code requires the user to follow a specific data format, and it takes time to convert the data before feeding it to the model.

training data format

What are positive_document and negative_document in the training data format when training with the 20 Newsgroups dataset?

Unable to reproduce MAP numbers

Hello,

Thank you for the great work. Below are the steps I follow to run the code, where I assume task = space.

  1. Use https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html to formulate training data and ignore training records corresponding to categories = ['sci.space'] and ['comp.graphics']. This way, training_data_size = 10,134
  2. Use https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html to get the val/test data. This way, testing_data_size = 7,532
  3. Set c.DAZER.train_class_num = 18 in sample.config. Rest of settings remain same.
  4. Run sample-train.sh and sample-test.sh
  5. Relevance score file is produced.
  6. For the testing dataset, ignore documents corresponding to ['comp.graphics'], mark the documents as 1 for category ['sci.space'], and mark the documents as 0 for the rest of the categories.
  7. Use https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html to calculate AP score for task = space where y_true is binary and y_score = relevance scores.

Following the above steps, I get MAP ~ 0.050, which is far from the reported number. Could you please let me know how you calculated the MAP scores? Additionally, please let me know if any of the above steps are incorrect. Thanks.
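For reference, the metric computed in step 7 (scikit-learn's average_precision_score over binary labels and relevance scores) follows the standard AP definition and can be reproduced in plain Python. This is a sketch of that definition, not code from the repo, and it matches sklearn only when all scores are distinct (ties are thresholded differently):

```python
def average_precision(y_true, y_score):
    """AP = sum over ranks of (recall delta) * precision, scores descending.

    Matches sklearn.metrics.average_precision_score for binary labels
    with distinct scores; y_true must contain at least one positive.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(y_true)
    ap, tp, prev_recall = 0.0, 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        tp += label
        precision = tp / rank
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# average_precision([1, 0, 1], [0.9, 0.8, 0.7]) -> 5/6 ~ 0.8333
```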

The Data

I would like to know which public dataset you are using, or could you send me the data you used? I hope to run the program correctly. Thank you.
My email: [email protected]
