meituan-dianping / asap

ASAP: A Chinese Review Dataset Towards Aspect Category Sentiment Analysis and Rating Prediction

License: Apache License 2.0

asap's Introduction

ASAP-NAACL2021

General Introduction

This repository contains the data for the NAACL 2021 paper "ASAP: A Chinese Review Dataset Towards Aspect Category Sentiment Analysis and Rating Prediction".

ASAP is a large-scale Chinese restaurant review dataset for Aspect Category Sentiment Analysis (ACSA) and review Rating Prediction (RP).

ASAP includes 46,730 genuine user reviews from the Dianping App, a leading Online-to-Offline (O2O) e-commerce platform. Besides a 5-star rating, each review is manually annotated with its sentiment polarity towards 18 pre-defined aspect categories, such as food, service, and environment. We randomly split the dataset into a training set (36,850 reviews), a validation set (4,940 reviews), and a test set (4,940 reviews).

Data Example

Read File

import pandas as pd

# file_path is the path to one of the dataset CSV files; the first row is the header.
data = pd.read_csv(file_path, header=0)

Data Label

The sentiment polarity for each aspect category is labeled as 1 (Positive), 0 (Neutral), -1 (Negative), or -2 (Not-Mentioned).

The star rating ranges from 1 to 5.
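
As a minimal sketch of how these codes might be decoded, the snippet below maps the four polarity values to their names and drops aspects marked Not-Mentioned. The aspect names in the sample row are hypothetical placeholders, not the dataset's actual column headers.

```python
# Polarity codes as documented in this section.
POLARITY = {1: "Positive", 0: "Neutral", -1: "Negative", -2: "Not-Mentioned"}


def mentioned_aspects(row):
    """Return {aspect: polarity name} for the aspects a review mentions.

    `row` maps aspect column names to integer polarity codes.
    """
    return {
        aspect: POLARITY[code]
        for aspect, code in row.items()
        if code != -2  # skip aspects the review does not mention
    }


# Hypothetical row: the keys are illustrative, not real column names.
sample_row = {"aspect_food": 1, "aspect_service": -1, "aspect_parking": -2}
decoded = mentioned_aspects(sample_row)
print(decoded)
```

The same function could be applied to each row of the loaded DataFrame once the real aspect columns are known.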

Citation

Please cite the following paper if you find it useful in your work.

@inproceedings{bu-etal-2021-asap,
    title = "{ASAP}: A {C}hinese Review Dataset Towards Aspect Category Sentiment Analysis and Rating Prediction",
    author = "Bu, Jiahao  and
      Ren, Lei  and
      Zheng, Shuang  and
      Yang, Yang  and
      Wang, Jingang  and
      Zhang, Fuzheng  and
      Wu, Wei",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-main.167",
    pages = "2069--2079"
}

Contact

Jiahao Bu: [email protected]

Lei Ren: [email protected]

Jingang Wang: [email protected]

asap's Issues

A question about the number of labels

Thanks for your extremely great work.

I have a question about the number of labels C mentioned in Section 4.2.

In your dataset, there seem to be 4 labels for each aspect category (1 - positive, 0 - neutral, -1 - negative, -2 - not mentioned), but in the ACSA part it looks like C is equal to 3 (positive, neutral, negative):

C is the number of labels (i.e, 3 in our task).

So I wonder why the "not mentioned" label was dropped.

I have noticed that the gate function p_i is used to ensure that only the mentioned aspect categories participate in the calculation of the loss function.
However, the gate function p_i does not seem to be a trainable parameter, so when the trained model is used for prediction, how does the model know which aspect categories are mentioned in a review?
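
To make the question concrete, here is a minimal sketch of how I understand the gated loss. This is my own reading of the gate, not the authors' code: I assume p_i is 1 when the gold label is not -2 and 0 otherwise, and that the class order is [positive, neutral, negative]. The function name is hypothetical.

```python
import math


def gated_acsa_loss(probs, labels):
    """Sum the cross-entropy over mentioned aspect categories only.

    probs:  per-aspect distributions over [positive, neutral, negative]
    labels: dataset codes (1, 0, -1, or -2 for not mentioned)
    """
    class_index = {1: 0, 0: 1, -1: 2}  # assumed class order
    total = 0.0
    for dist, label in zip(probs, labels):
        gate = 0 if label == -2 else 1  # p_i: read from the gold label, not learned
        if gate:
            total += -math.log(dist[class_index[label]])
    return total


probs = [[0.7, 0.2, 0.1], [0.5, 0.2, 0.3], [0.05, 0.05, 0.9]]
loss_all = gated_acsa_loss(probs, [1, 1, -1])     # every aspect mentioned
loss_gated = gated_acsa_loss(probs, [1, -2, -1])  # second aspect not mentioned
print(loss_all, loss_gated)
```

Under this reading the gate needs the gold label, which is exactly what is unavailable at prediction time.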

for example, for an input review R written by a user,
the output of the ACSA part might be:

\hat{y}_1 = [0.7, 0.2, 0.1]
\hat{y}_2 = [0.5, 0.2, 0.3]
...
\hat{y}_{18} = [0.05, 0.05, 0.9]

(Please correct me if I have misunderstood something)

In this case, we know that the user has positive sentiment towards the first and second aspect categories and negative sentiment towards the 18th, but how do we know which aspect categories are actually mentioned in this review?
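
As a small illustration of the concern, the sketch below decodes the example outputs with a plain argmax. The class order [positive, neutral, negative] is my assumption. With only three classes, every aspect category receives some sentiment label, so the output alone cannot mark a category as not mentioned.

```python
CLASSES = ["positive", "neutral", "negative"]  # assumed output order


def decode(dist):
    """Return the class name with the highest probability."""
    best = max(range(len(dist)), key=lambda i: dist[i])
    return CLASSES[best]


# Example outputs from above, keyed by aspect-category index.
outputs = {
    1: [0.7, 0.2, 0.1],
    2: [0.5, 0.2, 0.3],
    18: [0.05, 0.05, 0.9],
}
predictions = {index: decode(dist) for index, dist in outputs.items()}
print(predictions)
```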

Can you make your experimental model publicly available for reproduction?

Your work is extremely valuable!

If possible, could you make your experimental model publicly available for replication?

Thanks in advance!