asap's Introduction

ASAP-NAACL2021

General Introduction

This repository contains the data of the NAACL 2021 paper: ASAP: A Chinese Review Dataset Towards Aspect Category Sentiment Analysis and Rating Prediction

ASAP is a large-scale Chinese restaurant review dataset for Aspect category Sentiment Analysis (ACSA) and review rating Prediction (RP).

ASAP includes 46,730 genuine user reviews from the Dianping App, a leading Online-to-Offline (O2O) e-commerce platform. Besides a 5-star scale rating, each review is manually annotated with its sentiment polarity towards 18 pre-defined aspect categories, including food, service, environment and so on. We randomly split the dataset into a training set (36,850 reviews), a validation set (4,940) and a test set (4,940).
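The split sizes above add up to the full dataset (36,850 + 4,940 + 4,940 = 46,730). As a minimal sketch of what a random split with these sizes could look like, the snippet below shuffles review indices and cuts them into three disjoint parts. The function name `split_indices` and the `seed` value are assumptions made for illustration; the authors' actual splitting procedure is not described here.

```python
import random

def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into three disjoint splits."""
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must sum to the dataset size")
    indices = list(range(n_total))
    # The seed is an assumption, used only to make the sketch reproducible.
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

# Sizes taken from the dataset description above.
train_idx, val_idx, test_idx = split_indices(46730, 36850, 4940, 4940)
print(len(train_idx), len(val_idx), len(test_idx))  # 36850 4940 4940
```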

Read File

import pandas as pd

# file_path: path to one of the dataset CSV files; the first row is the header
data = pd.read_csv(file_path, header=0)

Data Label

The sentiment polarity towards each aspect category is labeled as 1 (Positive), 0 (Neutral), -1 (Negative), or -2 (Not-Mentioned).

The star rating ranges from 1 to 5.
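The label codes can be decoded with a small lookup table. The sketch below is an illustration only: the row is a plain dictionary (for instance one record of the loaded table), and the column names `aspect_a`, `aspect_b`, `aspect_c` and `star` are placeholders, not the dataset's real headers.

```python
# Label codes as documented in this README.
POLARITY_NAMES = {1: "Positive", 0: "Neutral", -1: "Negative", -2: "Not-Mentioned"}
NOT_MENTIONED = -2

def decode_aspects(row, aspect_columns):
    """Return {aspect: polarity name} for the aspects a review actually mentions."""
    decoded = {}
    for col in aspect_columns:
        code = int(row[col])
        if code not in POLARITY_NAMES:
            raise ValueError(f"unexpected label {code!r} in column {col!r}")
        if code != NOT_MENTIONED:
            decoded[col] = POLARITY_NAMES[code]
    return decoded

# Hypothetical row: these column names are placeholders for illustration.
example_row = {"star": 4, "aspect_a": 1, "aspect_b": -2, "aspect_c": -1}
mentioned = decode_aspects(example_row, ["aspect_a", "aspect_b", "aspect_c"])
print(mentioned)  # {'aspect_a': 'Positive', 'aspect_c': 'Negative'}
```

Aspects labeled -2 are dropped from the result, so the returned dictionary lists only the categories the review mentions.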

Citation

Please cite the following paper if you find this dataset useful in your work.

@inproceedings{bu-etal-2021-asap,
    title = "{ASAP}: A {C}hinese Review Dataset Towards Aspect Category Sentiment Analysis and Rating Prediction",
    author = "Bu, Jiahao  and
      Ren, Lei  and
      Zheng, Shuang  and
      Yang, Yang  and
      Wang, Jingang  and
      Zhang, Fuzheng  and
      Wu, Wei",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-main.167",
    pages = "2069--2079"
}

Contact

Jiahao Bu

Lei Ren

Jingang Wang

asap's Issues

A question about the number of labels

Thanks for your extremely great work.

I have a question about the number of labels C mentioned in Section 4.2.

In your dataset, there seem to be 4 labels for each aspect category (1: positive, 0: neutral, -1: negative, -2: not mentioned), but in the ACSA part, C appears to be 3 (positive, neutral, negative):

"C is the number of labels (i.e., 3 in our task)."

So I wonder why the "not mentioned" label was dropped.

I noticed that the gate function p_i ensures that only the mentioned aspect categories take part in the calculation of the loss function. However, p_i does not seem to be a trainable parameter, so when the trained model is used for prediction, how does it know which aspect categories are mentioned in a review?

For example, for an input review R written by a user, the output of the ACSA part might be:

\hat{y}_1 = [0.7, 0.2, 0.1]
\hat{y}_2 = [0.5, 0.2, 0.3]
...
\hat{y}_{18} = [0.05, 0.05, 0.9]

(Please correct me if I have misunderstood something.)

In this case, we can tell that the user feels positive about the first and second aspect categories and negative about the 18th, but how do we know which aspect categories are actually mentioned in this review?
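The gating described in this issue can be sketched as follows. This is an illustration of the general idea, not the paper's implementation: it assumes the gate p_i is 1 when the gold label of aspect i is anything other than -2 and 0 otherwise, and that the three output probabilities are ordered [positive, neutral, negative] as in the example above. The names `gated_cross_entropy` and `CLASS_INDEX` are hypothetical.

```python
import math

NOT_MENTIONED = -2
# Assumed mapping from dataset label to position in a 3-way output vector,
# following the [positive, neutral, negative] order used in the issue's example.
CLASS_INDEX = {1: 0, 0: 1, -1: 2}

def gated_cross_entropy(gold_labels, predicted_probs):
    """Sum cross-entropy over aspects, skipping those labeled Not-Mentioned."""
    total = 0.0
    for label, probs in zip(gold_labels, predicted_probs):
        # p_i: fixed by the gold annotation, not learned by the model.
        gate = 0 if label == NOT_MENTIONED else 1
        if gate:
            total += -math.log(probs[CLASS_INDEX[label]])
    return total

gold = [1, -2, -1]  # the second aspect is not mentioned in this toy review
pred = [[0.7, 0.2, 0.1], [0.5, 0.2, 0.3], [0.05, 0.05, 0.9]]
loss = gated_cross_entropy(gold, pred)
print(loss)
```

Under these assumptions the gate is computed from the gold labels, so the prediction for an unmentioned aspect never affects the loss. The sketch also makes the question concrete: at inference time no gold label exists from which to compute the gate.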

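A question about the BERT model in the paper

The BERT model in this paper appears to use a single-stage approach to aspect-level sentiment classification: there are 18 aspects in total, each with 3 classes.
But shouldn't there be one more class to indicate whether the review contains any information about that aspect? I assume a cross-entropy loss is used when classifying the sentiment for each aspect. If so, doesn't that implicitly assume that the review covers every aspect?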
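One way to read the suggestion in this issue is a 4-way output per aspect, with an extra class for "not mentioned". The sketch below only shows how the dataset's label codes could be mapped to and from such class indices. The class order and the names `LABEL_TO_CLASS` and `predict_label` are assumptions for illustration and are not taken from the paper.

```python
# Assumed class order for a hypothetical 4-way output; not taken from the paper.
LABEL_TO_CLASS = {-2: 0, -1: 1, 0: 2, 1: 3}
CLASS_TO_LABEL = {cls: label for label, cls in LABEL_TO_CLASS.items()}

def predict_label(probs):
    """Return the dataset label code with the highest probability in a 4-way output."""
    if len(probs) != len(LABEL_TO_CLASS):
        raise ValueError("expected one probability per class")
    best_class = max(range(len(probs)), key=lambda i: probs[i])
    return CLASS_TO_LABEL[best_class]

print(predict_label([0.6, 0.1, 0.1, 0.2]))  # -2 (not mentioned)
print(predict_label([0.1, 0.1, 0.1, 0.7]))  # 1 (positive)
```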
