fani-lab / lady Goto Github PK

View Code? Open in Web Editor NEW

3.0 1.0 3.0 332.62 MB

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

License: Other

Python 86.54% JavaScript 0.52% CSS 4.18% Dockerfile 0.55% TypeScript 8.20%

aspect-detection review-analysis

lady's People

Contributors

Stargazers

Watchers

Forkers

impedaka lillliant farinamhz

lady's Issues

Setup and Quickstart

Hey @arfaamr,

This is an issue page to log your progress in setting up and running the project.
Feel free to reach out if you have any concerns or questions.

@hosseinfani

Needing for update in OCTIS library

Hi @hosseinfani,
OCTIS has been updated in its original repository, and I think the problem with their scikit version requirement has been resolved as I checked out their updates.

Hence, we'll need to update our OCTIS repo in Fani-lab.

@3ripleM

Integration of the new modules of data augmentation with the pipeline

In this step,
We will add the modules of Back-translation, Word-alignment, and Similarity-comparison to the pipeline to use them for augmenting the data, determine which language is better, which reviews should be added to the original English dataset, and also find the related aspect labels of each augmented or new review.

Edit: In this step, we will add the back-translated reviews to the original dataset and retrain the model.

Adding Word-alignment module for the back-translation task

In this issue, we are going to provide a module that gets two datasets of the same length (each of them contains a list of texts) as the input and gives the alignments list (these alignments will be between every two texts) as the output.

Alignments of the output for the given two texts will be a list of tuples, and each tuple contains an index of a token in the first text and an index of a token in the second text that aligns with text 1.

Negative perplexity in gensim model

Classification baseline for aspect term extraction

Hi @Lillliant,

To update you first on what we are doing;
We are adding new baselines to LADy for checking the results of back-translation on different aspect term models.

As you may know already, we have two kinds of models for aspect extraction;

The first one is aspect category detection models, which are the ones that find the aspect categories, like food, service, anecdotes/miscellaneous, etc.
The second types of model are aspect term detection which tries to find the aspect terms used in the reviews. For example, they may find sushi, price, menu, etc.

Our focus is on the second ones that try to find the aspect terms. However, not all the aspect term models can be helpful in this task as we are trying to find the aspects, whether they are latent or explicit.
In fact, aspect term models which try to find the index of the words as the aspect in the review (span-based ones), or the models that tag the words and say whether each word is aspect or not, are not helpful because if the word is removed due to different reasons like the social background and it is latent, these models can not find them anymore and they fall short in case of latent aspect detection.

So, we were using topic models at first that provided us with some top words in the same topic of the review based on the model's vocabulary.

Now we are searching for a model which is initially for the task of aspect term extraction, not the topic modeling, that can somehow classify the reviews and find the aspect term label even if it is not presented explicitly in the review.

Let's say we have a review describing the menu of the restaurant. So, in this case, the model should be able to do sth like classification and say the aspect term is the menu. Actually, we don't want it to say which word of the review is an aspect term and, in fact, give us a word that may be the aspect term.

We would appreciate it if you help us with this task and see if any of the recent models (preferably after 2019) can do this and if their code or library is available, or we can send them an email to ask for the code even if it is not available.

I have made an Excel sheet in teams files in the LADy channel and put some recent works there so that you can check them, and we can complete the file together and find the suitable one.

batch execution of nllb translation

nllb can translate at one a list of source sentences, esp, in the presence of gpu seems it gives speedup. We need to break the set of input reviews into batches and call nllb to balance speed and memory size.

Attention Based Aspect Extraction (ABAE)

We need to study this baseline
An Unsupervised Neural Attention Model for Aspect Extraction

Toy dataset as a test case for the pipeline

Aggregation and results

In this step, we want to aggregate the results for each of the datasets and topics in one file.

2019-ACL-Context-aware Embedding for Targeted Aspect-based Sentiment Analysis

Context-aware Embedding for Targeted Aspect-based Sentiment Analysis

This issue is for the summary of the paper above.

Distribution of aspect terms and aspect categories in datasets

@farinamhz @Lillliant
As part of stats on datasets, we need to show the distribution of aspects in each dataset, which is probably long-tail or imbalanced.

and distribution of words in semevals

For an example codebase, you can look at this function that produce stats on dataset of teams:

https://github.com/fani-lab/OpeNTF/blob/45aa32b1e32edc906d926c7f841a4ec089f34d18/src/cmn/team.py#L210

Adding HAST as a supervised baseline

We are going to add HAST as a supervised baseline to our experiment for data augmentation in aspect detection task.
It has been published in IJCAI 2018 as "Aspect Term Extraction with History Attention and Selective Transformation".

LADy's Roadmap for Extension (Epic)

@farinamhz @Lillliant
Here is the overall plan for LADy's next iteration of development. Please create an issue for each sub development and link the issue page here:

0-Web Demo
Demo application: #32

running the existing version and adding to the codeline
design and features modifications
testing
hosting and deployment #52

1- Baselines

new state-of-the-art aspect detection baseline (preferably generation-based ones): #63

1.1. Supervised aspect term detection:

aspect term detection 2: Exploiting {BERT} for End-to-End Aspect-based Sentiment Analysis => needs integration

1.2. Supervised aspect category detection: ??

Classification baseline for aspect term extraction --> (#42)

1.3. Unsupervised

Unsupervised aspect category detection 1: Embarrassingly Simple Unsupervised Aspect Extraction => needs integration

1.3.1. Unsupervised Topic-modeling-based:

Neural Topic Modeling --> #30
CTM, ETM, BerTopic
Non-neural Topic Modeling: #12
LDA,
BTM: #5
Random: #9

2- Training Strategy

Augmentation
Back-translation module: #24
Word-alignment module: #26
Semantic comparison module: #27
Labelling the new results of back-translation: #29
Using the modules: #28
Normal

3- Dataset

4- Research Paper

Find one example that resurfaces an aspect

5- Aggregation of results

sub results from the aggregation file (i.e., from agg file to the tabular in paper)

6- Resource paper

testing the existing work and readme: #46
main flow diagram: #48
general paper writeup (to be continued)
integration of the aspect detection models
web app snapshots, deployment, and paper writeup: #32 and #52
running the experiments again to upload the results and resolving problem with sharepoint

7- Languages

With new languages (preferably low-resource ones): #70

8- Translator

Adding a new translator model: #66

Labeling the aspects for back-translated reviews

By using the word alignments and similarity comparison between the original and back-translated reviews, we want to find the new aspect labels for the new (back-translated) reviews.

We discard the new reviews which have a similarity score below a threshold with the original reviews (the threshold is 0.5 for now)
We add the labels which are aligned with the original labels as the labels of similar-semantic new reviews.

2021-ACM WWW-Latent Target-Opinion as Prior for Document-Level Sentiment Classification

Latent Target-Opinion as Prior for Document-Level Sentiment Classification: A Variational Approach from Fine-Grained Perspective

I chose this paper because they stated in the subject that they have focused on latent aspect-opinion pairs. However, as I understood and will explain in summary, they have a different explanation for the word "latent" in their work which helped me to get the idea of bolding the meaning and examples of latent in our work while we write the documents.

Dockerize and fix installation on linux

Based on this issue on spacy repository I had to add typing_extensions to our libraries in order to setup the project

Issue Page: Pydantic issubclass error for python 3.8 and 3.9

You should add this line in the requirement.txt
typing_extensions==4.4.0

Gif image/video for illustrating the pipeline

We need sth like this to present our pipeline: https://www.linkedin.com/feed/update/urn:li:activity:7092542279298527232/

Setup and Quickstart

@Lillliant
This is an issue page to log your progress in setting up and running the project. Please let us know if you have any concerns or questions about this.

Adding a neural topic model baseline

Here we are going to add the CTM baseline to the pipeline as a neural topic model.

Aspect based sentiment analysis + Running Bert and Cat library

Hi @arfaamr,
As I mentioned earlier, our plan is to first run two libraries, Bert and Cat, which can be found in the fani-lab repository. We will aim to understand how they work with their respective datasets by reading their papers.

Here are the steps (which I will update later on):

Run the libraries with their proposed datasets (clone their repository, not the one in fani-lab).
Run them with LADy datasets.
Compare the results and document our understanding.

As a suggestion, we can use Poetry as the package manager. We'll fork the repository on our computers and update the packages using Poetry. This approach is beneficial because @farinamhz intends to integrate these libraries into the LADy project. Currently, it's not possible as they are not in library format. By managing them using Poetry, we can easily convert them into libraries with just a few tweaks.

@hosseinfani can you please verify these steps?
@arfaamr If you have any questions about these steps, please feel free to ask here.

Training Language Models

One approach to personalized review analysis would be training personalized language models. So, I opened this issue for your to log your learning progress about language models, and conditional language models, ....

a server for the web app

Adding a new tanslation model to the pipeline

In this phase, we plan to integrate an additional translation model to determine if the observed improvements are consistent across various translators or if there's potential for further enhancement.

Evaluating results of translation

Hi @Lillliant,

We have implemented and got the results of back-translation on two datasets so far, the first one is Semeval-Restaurant-2016, and the other one is Semeval-Restaurant-2015.

Now we want to evaluate the translation and back-translation results based on specific metrics used in this area.
These are some examples of metrics that are more important: exact match, rouge, and bleu. However, you can search and let me know if any other metrics have been used more lately.

You can find the results of the back-translation for
Semeval-2016 in: data/augmentation/back-translation
and Semeval-2015 in output/augmentation/back-translation-Semeval-15

D represents the original dataset in English, D.L represents the translated dataset, and D_L represents the back-translated dataset.
Now we compare D with D.L, then D.L with D_L, and finally D with D_L to find the values for those metrics.

All the texts or reviews you want to compare, whether in original, translated, or back-translated datasets, can be found in the column "sentences".

Please find the values for metrics in these two datasets for these languages: L in {fra, arb, due, spa, and zho}, which are French, Arabic, German, Spanish, and Chinese.

Feel free to let me know if you have any concerns or questions about this task.

@hosseinfani

New baseline for Aspect-Based Sentiment Analysis

Hey @Lillliant,
I found a paper titled "Generative Aspect-Based Sentiment Analysis with Contrastive Learning and Expressive Structure" from EMNLP'22. It focuses on a generation-based method using T5 for aspect and sentiment detection and seems relevant to our case.
Could you review the paper to make sure if it provides both sentiment and aspect term information, making it a potential baseline for us? The code is also accessible at the following link.

Override the list of languages in git

@farinamhz
You can follow this link to fix the Brainfuck language for XML files in this repo:
https://github.com/github/linguist/blob/master/docs/overrides.md
https://github.com/github/linguist/blob/master/docs/troubleshooting.md

See if we can ignore files in data folder for language detection. Otherwise, change it to xml for semeval files.

Evaluation on the new pipeline with RND, LDA, and BTM as the aspect models

Check the existing readme and codeline

Hey @Lillliant, @DelaramRajaei and @impedaka,

Our codeline and readme have been recently updated. I kindly request you review the instructions in the readme for the installation process and obtain the latest version from the main branch. Please let me know if you encounter any issues with the code or readme or have any suggestions for improving the readme.

Feel free to raise any problems or questions you may have here.

@hosseinfani

Adding Twitter Reviews Dataset

In this task, we aim to incorporate an additional unsolicited dataset, specifically from Twitter, into LADy.

The dataset has been sourced from a research work titled "Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification," published in ACL 2014, which can be accessed via the following link: ACL 2014 Paper.

Loader for XML-based Semeval Datasets

Currently, the code has a loader for a csv file (not sure if it's from semeval). We need to add a loader for the official semeval datasets

Datasets for aspect extraction

Hello @DeepKaran1,
This issue page is for your first task on this project.
As we discussed, we have a pipeline in which we store the aos of each review from the datasets. Now, we have a dataset loader for the Semeval dataset (Version 2016 for the restaurant domain) in our code line. However, since we only have one dataset now, we need to add more datasets and test the pipeline for better evaluation.
As the first step, please research the current datasets that are popular in aspect extraction, especially in recent works, and report your findings. Then we can discuss them and decide which ones should be added to our code line.

Feel free to ask me if you have any questions regarding this task.

@hosseinfani, I would appreciate your comments on this task.

BTM Baseline

We need to add btm topic modeling for latent aspect detection

Adding CAt to the pipeline

We need to add CAt work into our pipeline and use it as an aspect model option when running the pipeline.
The implementation needs to be completed for dataset reading and preprocessing. However, we will add it to the code and also want to pass an embedding from the pre-trained models like the sentence-transformers library instead of training word2vec.

2018-ACL (Workshop on Economics and NLP)-Implicit and Explicit Aspect Extraction in Financial Microblogs

Implicit and Explicit Aspect Extraction in Financial Microblogs

I chose this paper because they stated in the subject of the paper that they have focused on implicit aspects.

Aspect Sentiment Triplet Extraction Baseline

Hey @arfaamr,
I hope you are doing well.
As we want to add a new baseline in LADy that is suitable for both aspect term and sentiment extraction, I have chosen Enhanced Multi-Channel Graph Convolutional Network for Aspect Sentiment Triplet Extraction paper accepted in ACL 2022.
You can check it out and run it using their repo at this link: https://github.com/CCChenhao997/EMCGCN-ASTE

Let me know if you have any questions or need help on this task.

@hosseinfani

Meta-based Self-training and Re-weighting for Aspect-based Sentiment Analysis

I read this paper (Not completely, all except the methodology)

It is not a good one to write the summary because it is not related to my work on an unsupervised method for latent aspect detection.
However, as a record of reading this paper, the proposed method in this paper is a self-supervised one for multi-task to mitigate the problem of insufficient and imbalanced data in the Aspect-based Sentiment Analysis (ABSA). It consists of three models (teacher model, student model, and meta-weighter). The teacher model helps the student model to train with generating the labels and the meta-weighter will try to find the optimum weight for each label. There is an example of the ABSA task below:

Input
Opinion: "The restaurant is crowded but with efficient and accurate service."

Output

Aspect extraction: restaurant
Opinion extraction: crowded
Aspect-level sentiment classification: negative

Aspect extraction: service
Opinion extraction: efficient
Aspect-level sentiment classification: positive

Adding translation model in the pipeline for the back-translation task

In this issue, we are going to provide a suitable model or API for translating the reviews from English to Language L and then back-translate the reviews from L to English.

2020-ACL- CAt: Embarrassingly Simple Unsupervised Aspect Extraction

This issue page is for a summary of the paper named "Embarrassingly Simple Unsupervised Aspect Extraction," published in ACL 2020, and it can be accessed using this link.

pipeline progress flow

we need such a thing to show the flow

LocLDA Baseline

We need to study this work. I think the current code is based on this paper but they also released their code.
An Unsupervised Aspect-Sentiment Model for Online Reviews

Semantic comparison of the augmented review with the original review

In this part, we will add a semantic comparison function for comparing the results of back-translation technique in a way that we

Add the augmented reviews to the dataset if their aspect is semantically similar to the original review's aspect, and
Discard the augmented review if the aspect's semantics differs from the aspect in the original review.

Setup and Quickstart

This is an issue page to log the progress for setup and quickstart.
@DeepKaran1 @hosseinfani

User-based Toy Review Dataset

To validate and unit test any proposed method for user-based review analysis, we need to build a tiny-scale test-case dataset.

Web Application for LADy

Hi @impedaka,

Welcome to LADy :)

This is an issue page to log your progress in developing the web app.

Please let us know if you have any concerns or questions.

@hosseinfani

Latent Aspect Detection from Online Unsolicited Customer Reviews

Main problem

The main problem of this paper is to detect the latent aspects in online unsolicited customer reviews which are not mentioned clearly. Aspects are defined as features of products and services, and customers will give their opinion about them as a review. These hidden aspects are not mentioned directly due to the social background of the author and readers.

Existing work

Existing methods to detect aspects in reviews can be divided into three categories based on the level of human supervision of the method:

Rule-based: in this method association rule mining approach is being used to match the aspects with the words.
Disadvantage: this method is not scalable when the number of combinations in reviews increases.
Supervised: in this method, we use the supervised machine learning method on a labeled dataset in which all the aspects are clearly shown with human effort.
Disadvantage: Human effort for annotating the labels in this method is time-consuming with a high cost, and also leads to bias.
Unsupervised: this method is not under human supervision.
Disadvantage: even in this method they still assume that aspects are clearly shown in the review, so it misses out on the hidden aspects.

Inputs

A collection of unsolicited customer reviews with no human supervision

Outputs

Latent aspects in the reviews

Example

The girl [staff as a hidden aspect] at the front desk was really nice
Were given table far from river [management as a hidden aspect]

Proposed Method

We have a generative process for generating the reviews in the following steps:

We pick an aspect with a high probability out of all aspect's probabilities in the Dirichlet distribution
We pick related words with high probabilities to make the review from the Dirichlet distribution
Coherence score for finding the optimum number of aspects
Resnik similarity score to calculate the inter-word semantic similarities

Experimental Setup

Dataset

Training: A dataset scraped from Google reviews of restaurants across North America (PxP)
Evaluation: SemEval with removing the explicitly shown aspects

Preprocessing
Removing numerical, non-English words, stop-words, emojis, and punctuations from reviews

Metrics

mean reciprocal rank (MRR)
recall
nDCG
success @5

Baselines

Random: a simple method for choosing the hidden aspect randomly
locLDA: unsupervised method to find the explicit aspects and unable to find the hidden ones
CMLA: a supervised model to extract both aspect and opinion using attention mechanisms
HAST: a supervised model to extract explicit aspects with attention block using bi-directional LSTMs
OTE-MTL: supervised multitasking learning framework to extract both aspects and opinions and parse sentiment dependencies between them
PxP: a proposed model of the paper, an unsupervised model that assumes that a review may have a hidden aspect

Results

The main contribution of this paper is to propose an unsupervised model for detecting the latent aspects of noisy and short unsolicited customer reviews. Results show that this unsupervised modeling of aspects as hidden variables leads to more accurate detection in comparison to baselines that detect the aspects which are clearly shown. Besides, the proposed unsupervised method has better results on MRR score in comparison to state-of-the-art supervised methods such as CMLA.

Code
https://github.com/MohammadForouhesh/latent-aspect-detection

Presentation
There is no available presentation for this paper

Random Baseline

We need to add a random baseline which randomly assigns a word in dictionary as an aspect to a review.
This helps us to know how distant a baseline is from random

Semeval-2014-Restaurant dataset

We need to add the loader for Semeval-R-14 to add the dataset to the pipeline

Literature Review on Aspect and Sentiment Extraction

I've recently conducted a literature review on Aspect and Sentiment Extraction and discovered several interesting papers on this topic that I'd like to share here.
Contributions from others are also appreciated.

2010-ACL-An Unsupervised Aspect-Sentiment Model for Online Reviews

An Unsupervised Aspect-Sentiment Model for Online Reviews

Main problem

The primary purpose of this paper is to detect aspects and find out the sentiment in online reviews. The proposed method pays attention to the domain and language as well as the impact of aspects on sentiment polarity.

Existing work

Existing methods that will be discussed are divided into three categories;

Aspect

The first studies in aspect detention were based on classic information extraction that uses frequently occurring noun phrases. However, there are several cases where aspects have low frequency, for instance, the type of different dishes of a restaurant, or even some cases where aspects can be described without noun phrases, for example, the character and atmosphere of a place. In the cases mentioned, those classic methods are not helpful.
Also, there is another approach that is being used which is LDA-similar topic models, which are not suited for the task of aspect detection based on the problem that they are only able to find global topics rather than rateable aspects which are related to the review. For this problem, a developed model named MG-LDA attempts to capture both layers of topics (global and local) where the local topics match rateable aspects.

Sentiment

Recently most methods have been based on a manually constructed lexicon of terms that are strongly positive or negative regardless of context. However, they are not helpful because sentiment is usually expressed with words whose polarities are domain specific.
Another approach is bootstrapping using a seed group of terms with known polarity to get the polarity of domain-specific terms. For instance, there is a method named Turney, which uses only a pair of adjectives which is good and poor, to recognize other terms’ polarity based on mutual information.
Besides, Hatzivassiloglou and McKeown presented an approach that determines the polarity of adjectives in a large corpus regardless of the task. In the beginning, nodes are adjectives, and edges are weighted based on the occurrence of these nodes as adjectives near each other in conjunction or disjunction. Then, the graph will be split into two parts with a heuristic function, and the part which contains the adjective with higher frequency will be labeled as positive and the other as negative.

Combined

Aspect affects the polarity of opinions. For instance, cheap as an opinion for staff has different polarity compared to cheap as an opinion for food.
There is work that proposes an LDA model with positive and negative sentiments as two additional topics. Also, there is another work that uses a seed set of positive and negative adjectives and iteratively propagates sentiment polarity through conjunction relations.
This work is also working on combined models. There is a local LDA that acts as a document with each sentence and that will output the aspects as a result. Also, they create a seed set of highly relevant positive and negative adjectives that are related to the aspects, and they use morphological negation indicators for this purpose.

Inputs

A collection of online customer review sentences.

Outputs

Aspects in the reviews with their sentiment

Proposed Method

Although a knowledge-rich approach might ignore some adjectives, authors focus on adjectives of the parts of speech as the sentiment indicators. This is because of the reason that adjectives can convey different sentiments depending on the aspect being discussed. For example, the adjective ‘warm’ is very positive in the Staff aspect but slightly negative in the General Food aspect.
In this work, a local LDA has been applied to the restaurant reviews dataset by assuming that each review sentence is a separate document to extract low frequent aspects. To be more specific, they proposed first to identify aspects using topic models and then identify aspect-specific opinion words by considering adjectives only.

Experimental Setup

Dataset

Training: A public dataset containing over 50k restaurant reviews from Citysearch New York. (http://newyork.citysearch.com/)
Also, they collected 1086 reviews for four leading netbook computers from Amazon.com to demonstrate the domain independence of the system.
Evaluation: Annotated dataset, which is a subset of 3400 sentences from the Citysearch corpus. (Manually labeled for aspect and sentiment)
There were six manually defined aspect labels: 1. Food & Drink, 2. Service, 3. Price, 4. Atmosphere, 5. Anecdotes, and 6. Miscellaneous.
They used only sentences with a single label for evaluation.
Sentiments for each sentence are positive, negative, neutral, or mixed.

Metrics

Kendall’s tau coefficient (tk)
Kendall’s distance (Dk)
They have been used to compare rankings and look at the number of pairs of ranked items that agree or disagree with the ordering in the gold standard.
The value of tk is in [ -1 (perfect disagreement), 1 (perfect agreement)] with 0 indicating an almost random ranking.
The value of Dk is in [0 (perfect agreement), 1 (perfect disagreement)].

Baselines

LDA propsed by Blei et al., 2003: Authors have used standard implementation of Latent Dirichlet Allocation (LDA)
Cluster validation procedure used by Levine and Domany, 2001, Lange et al., 2004, and Niu et al., 2007: Here they have a cluster corresponding to each aspect, and we label each sentence as belonging to the cluster of the most probable aspect.
Ganu et al., 2009: For evaluating and showing the quality of automatically inferred aspects in this work, they compared the output to the sentence-level manual annotation of Ganu's work.
Titov and Mc-Donald, 2008a: LocLDA had similar performance on a domain with
similar characteristics of their work and overcomes their issue with global topics and many-to-one mapping of topics to aspects.
Fahrni and Klenner: They have used a seed set of 128 positive and 88 negative adjectives independent of domain and target from Fahrni and Klenner's work for evaluation.

Results

The experiments show that a fully unsupervised approach has great results in the tasks of aspect detection and sentiment analysis. The aspects are inferred from the data and are more representative than manually derived ones. For example, the service aspect is important separately from staff, while the manually derived aspects grouped service under the staff group. Also, sentiments regarding their aspects are better showing polarity like the word “cheap,” which is discussed before.

Code
There is no code available for this paper.
There is just a standard implementation of LDA that they used on: [GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation] (https://gibbslda.sourceforge.net/)

Presentation
There is no available presentation for this paper.

fani-lab / lady Goto Github PK

lady's People

Contributors

Stargazers

Watchers

Forkers

lady's Issues

Main problem

Existing work

Inputs

Outputs

Example

Proposed Method

Experimental Setup

Baselines

Results

Main problem

Existing work

Inputs

Outputs

Proposed Method

Experimental Setup

Baselines

Results

Recommend Projects

Recommend Topics

Recommend Org