ktoufiquee / a-comparative-analysis-of-noise-reduction-methods-in-sentiment-analysis-on-noisy-bangla-texts

Noise Identification, Noise Reduction, and Sentiment Analysis on Noisy Bangla Texts

back-translation bangla bangla-dataset bengali cost-sensitive-learning mask-language-modeling noise-reduction paraphrase sentiment-analysis spelling-correction

A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts

This repository contains the code and datasets used in the paper "A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts", accepted at the 9th Workshop on Noisy and User-generated Text (W-NUT), co-located with EACL 2024.

Paper Link: https://www.arxiv.org/abs/2401.14360

Models

The code used to train and test the models is available in this repository in the NC-SentNoB Codes folder.

Datasets

The NC-SentNoB dataset is available in this repository in the NC-SentNoB Dataset folder.
It is also available on: [HuggingFace] | [Paperswithcode] | [Kaggle]

The SentNoB dataset with back-translation applied is available in the Back-Translated Data folder. The 1,000 ground-truth samples used to evaluate the denoising methods are in the file 1000 Ground Truth.xlsx.

Training & Evaluation

We used seven pretrained transformer models for sentiment analysis, and SVM, BiLSTM, and BanglaBERT for noise identification.
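The benchmark rows SVM (C), SVM (W), and SVM (C+W) appear to denote character-level, word-level, and combined features. This README does not define those labels, so that reading is an assumption. A minimal stdlib sketch of such a feature extractor follows; the function names and n-gram ranges are hypothetical and chosen only for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count character n-grams (assumed 'C' features), padding the text with spaces."""
    padded = f" {text} "
    feats = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats["c:" + padded[i:i + n]] += 1
    return feats


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> Counter:
    """Count whitespace-token n-grams (assumed 'W' features)."""
    tokens = text.split()
    feats = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["w:" + " ".join(tokens[i:i + n])] += 1
    return feats


def combined_features(text: str) -> Counter:
    """Union of character and word features (assumed 'C+W' setting)."""
    # The "c:" and "w:" prefixes keep the two feature spaces from colliding.
    return char_ngrams(text) + word_ngrams(text)


if __name__ == "__main__":
    print(combined_features("খুব ভালো"))
```

Python slices strings by code point, so character n-grams can separate Bangla vowel signs from their base consonants; a grapheme-aware split would avoid that. In a full pipeline these counts would typically be TF-IDF weighted and passed to a linear SVM (for example scikit-learn's `LinearSVC`), but that pairing is likewise an assumption and not something this README states.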

Benchmarks

  • Noise Identification
    Model                 Precision  Recall  F1-Score
    SVM (C)               0.76       0.45    0.57
    SVM (W)               0.64       0.38    0.48
    SVM (C+W)             0.75       0.45    0.56
    Bi-LSTM               0.36       0.18    0.24
    Bangla-BERT-base      0.73       0.54    0.62
  • Sentiment Analysis on Noisy Text
    Model                 Precision  Recall  F1-Score
    Bangla-BERT-Base      0.72       0.72    0.72
    BanglaBERT            0.75       0.75    0.75
    BanglaBERT Large      0.74       0.74    0.74
    BanglaBERT Generator  0.72       0.72    0.72
    sahajBERT             0.72       0.72    0.72
    Bangla-Electra        0.68       0.68    0.68
    MuRIL                 0.73       0.73    0.73
  • Sentiment Analysis after Noise Reduction
    Model                 Precision  Recall  F1-Score
    Bangla-BERT-Base      0.69       0.69    0.69
    BanglaBERT            0.72       0.72    0.72
    BanglaBERT Large      0.73       0.73    0.73
    BanglaBERT Generator  0.70       0.70    0.70
    sahajBERT             0.70       0.70    0.70
    Bangla-Electra        0.66       0.66    0.66
    MuRIL                 0.71       0.71    0.71
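The benchmark numbers can be cross-checked with a few lines of Python. The sketch below copies the reported scores, checks that each noise-identification F1 agrees with the harmonic mean of its precision and recall, and computes each sentiment model's F1 change after noise reduction. It assumes F1 was computed as that harmonic mean; averaged scores (e.g. macro F1) need not satisfy this exactly. The helper names are hypothetical.

```python
# Scores copied from the Benchmarks tables: (precision, recall, F1).
NOISE_ID = {
    "SVM (C)": (0.76, 0.45, 0.57),
    "SVM (W)": (0.64, 0.38, 0.48),
    "SVM (C+W)": (0.75, 0.45, 0.56),
    "Bi-LSTM": (0.36, 0.18, 0.24),
    "Bangla-BERT-base": (0.73, 0.54, 0.62),
}

# Sentiment-analysis F1 on noisy text and after noise reduction.
F1_NOISY = {
    "Bangla-BERT-Base": 0.72, "BanglaBERT": 0.75, "BanglaBERT Large": 0.74,
    "BanglaBERT Generator": 0.72, "sahajBERT": 0.72, "Bangla-Electra": 0.68,
    "MuRIL": 0.73,
}
F1_DENOISED = {
    "Bangla-BERT-Base": 0.69, "BanglaBERT": 0.72, "BanglaBERT Large": 0.73,
    "BanglaBERT Generator": 0.70, "sahajBERT": 0.70, "Bangla-Electra": 0.66,
    "MuRIL": 0.71,
}


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def consistent_rows(table: dict, tol: float = 0.005) -> list:
    """Models whose reported F1 matches harmonic_f1 within rounding tolerance."""
    return [m for m, (p, r, f1) in table.items()
            if abs(harmonic_f1(p, r) - f1) <= tol]


def f1_deltas(before: dict, after: dict) -> dict:
    """Per-model F1 change (after minus before), rounded to two decimals."""
    return {m: round(after[m] - before[m], 2) for m in before}


if __name__ == "__main__":
    print(consistent_rows(NOISE_ID))
    for model, delta in f1_deltas(F1_NOISY, F1_DENOISED).items():
        print(f"{model}: {delta:+.2f}")
```

Run as a script, it lists all five noise-identification models as consistent. It also shows that F1 fell for every sentiment model after noise reduction, by between 0.01 (BanglaBERT Large) and 0.03 (Bangla-BERT-Base and BanglaBERT).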

Future Research Directions

  1. Develop robust noise reduction models for Bangla texts.
  2. Investigate and develop noise-specific reduction techniques.
  3. Leverage LLMs for noise reduction.
  4. Compare the sentiment analysis performance of LLMs in both settings: with and without noise.
  5. Implement character-level NMT models for back-translation to reduce noise.

License

The contents of this repository are restricted to non-commercial research purposes under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Citation

If you use any of the datasets, models or code modules, please cite the following paper:

@misc{elahi2024comparative,
      title={A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts}, 
      author={Kazi Toufique Elahi and Tasnuva Binte Rahman and Shakil Shahriar and Samir Sarker and Md. Tanvir Rouf Shawon and G. M. Shahariar},
      year={2024},
      eprint={2401.14360},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contributors

ktoufiquee, shahariar-shibli