
This project forked from fadhloun-y/5th-place-partial-solution-for-the-zindi-ai4d-icompass-social-media-sentiment-analysis-for-tunisian-


This challenge aims to classify sentiment in the Tunisian Arabizi dialect



5TH PLACE Partial Solution for the Zindi AI4D iCompass Social Media Sentiment Analysis for Tunisian Arabizi


Objective Of the Challenge

The objective of this challenge is to classify a given sentence as positive, negative, or neutral in sentiment. For messages conveying both positive and negative sentiment, the stronger of the two should be chosen. The prediction should reflect how an average user would perceive the text.

Quick Introduction

The dataset has three labels (Negative, Positive, Neutral) and is highly imbalanced:

-- 54% Positive samples

-- 42% Negative samples

-- 4% Neutral samples

Our Approach

We encountered two main problems with the dataset: the annotation and the imbalance of the text classes. We came to believe that even if we added external data and trained our own language model, we would not achieve great results. We therefore pursued two different approaches: text augmentation (not included in this solution) and turning the problem into a binary task that keeps only the positive and negative texts. Our models combine transformers with recurrent neural networks (RobertaXLM-LSTM and Bert Multilingual Cased-LSTM), inspired by the iCompass paper "Learning Word Representations for Tunisian Sentiment Analysis".
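The binary reformulation described above can be sketched as a simple filtering step. This is a minimal illustration, not the repository's code: the function name `to_binary_task` and the label encoding (-1 negative, 0 neutral, 1 positive, borrowed from the submission format) are assumptions.

```python
from typing import List, Tuple

# Assumed label encoding, taken from the submission format:
# -1 negative, 0 neutral, 1 positive.
NEGATIVE, NEUTRAL, POSITIVE = -1, 0, 1


def to_binary_task(samples: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Drop neutral samples and map the remaining labels to {0, 1} targets."""
    binary = []
    for text, label in samples:
        if label == NEUTRAL:
            continue  # neutral texts are excluded from the binary task
        binary.append((text, 1 if label == POSITIVE else 0))
    return binary


if __name__ == "__main__":
    # Placeholder texts, not real competition data.
    samples = [("text a", POSITIVE), ("text b", NEUTRAL), ("text c", NEGATIVE)]
    print(to_binary_task(samples))
```

With only about 4% neutral samples, dropping that class removes little data while letting the model train with a single output.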

Instructions to run the code

The code can be run on Google Colab.

Environment Setup

The repository includes a requirements file that you can install in your own virtual environment.

Data Setup

Download data from the competition website and save it to the ./data/ directory.

Training Phase

To start training the models, run ./train.sh. Training the two architectures takes about 4 hours.

Testing Phase

To start testing the models, run ./test.sh. It creates test files containing the ID of each text sample and its prediction if classifiermode is twolabels, or the Negative, Neutral, and Positive predictions if classifiermode is threelabels.

Blending Phase

./inference.sh can be run after the training and testing phases. It creates the best submission with IDs and labels: (-1 Negative, 1 Positive) if classifiermode is twolabels, or (-1 Negative, 0 Neutral, 1 Positive) if classifiermode is threelabels.
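As a rough sketch of how model outputs could be turned into the submission labels described above, the two modes might look like this. The function names, the `>=` comparison, and the (negative, neutral, positive) ordering of the three-class scores are assumptions for illustration, not taken from the repository's scripts.

```python
from typing import Sequence


def two_label_prediction(prob_positive: float, threshold: float = 0.5) -> int:
    """Map a positive-class probability to -1 (Negative) or 1 (Positive)."""
    return 1 if prob_positive >= threshold else -1


def three_label_prediction(scores: Sequence[float]) -> int:
    """Pick the label with the highest score.

    Assumes scores are ordered (negative, neutral, positive).
    """
    labels = (-1, 0, 1)
    best_index = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best_index]


if __name__ == "__main__":
    print(two_label_prediction(0.7))
    print(three_label_prediction([0.2, 0.5, 0.3]))
```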

Single Best Model

The best single model was BertMultilingualCased combined with an LSTM, with test-set predictions averaged over 5 folds, which achieved 83.52% accuracy.

Hyperparameters

-- ArchitectureName : Bert-LSTM

-- LossFunction : BCEWithLogitsLoss

-- SequenceMaxLength : 64

-- BatchSize : 32

-- LearningRate : 3e-5

-- NumClasses : 2

-- Threshold : 0.5
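The loss listed above most likely refers to PyTorch's `BCEWithLogitsLoss`, which applies a sigmoid and binary cross-entropy in one numerically stable step. The pure-Python sketch below shows the per-sample computation and how the 0.5 threshold applies to the resulting probability. The function names are illustrative, and the sketch does not reproduce the repository's training code.

```python
import math


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit for a target in [0, 1].

    Uses the stable form max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def predict(logit: float, threshold: float = 0.5) -> int:
    """Return 1 (positive) if the sigmoid probability reaches the threshold."""
    return 1 if sigmoid(logit) >= threshold else 0


if __name__ == "__main__":
    print(bce_with_logits(2.0, 1.0), predict(2.0))
```

Working on logits rather than probabilities avoids taking the logarithm of values that round to 0 or 1 for confident predictions.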

Blending

The final model was an average of all model checkpoints, which achieved 0.8382 on the private leaderboard.

Architectures : BertMultilingualCased-LSTM, RobertaXLM-LSTM, and BertMultilingualUncased trained on augmented data.

SequenceMaxLength : 64, 128
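Averaging checkpoints, as in the blend above, can be sketched as an unweighted mean of per-sample probabilities. The function name `blend_predictions` and the equal weighting are assumptions; the repository's inference script may combine predictions differently.

```python
from typing import List, Sequence


def blend_predictions(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average positive-class probabilities across models, sample by sample.

    per_model_probs[m][i] is model m's probability for test sample i.
    """
    if not per_model_probs:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(per_model_probs[0])
    if any(len(probs) != n_samples for probs in per_model_probs):
        raise ValueError("all models must predict the same number of samples")
    n_models = len(per_model_probs)
    return [
        sum(probs[i] for probs in per_model_probs) / n_models
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Placeholder probabilities from two hypothetical checkpoints.
    print(blend_predictions([[0.25, 0.75], [0.75, 0.25]]))
```

The blended probabilities can then be thresholded in the same way as a single model's output.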

What didn't Work

-- Token clustering. We had around 300k tokens for only 70,000 samples in the training set. Many words had the same meaning but different spellings, so the idea was to cluster words written with the same characters in the same order, and then map every word in the training samples that belongs to a cluster to a single token.

-- Pseudo-labeling.
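One possible reading of the token-clustering idea in the first item is to collapse runs of repeated characters so that elongated spellings share a key. The sketch below is only an interpretation: the functions `canonical_key` and `build_token_map`, the lowercasing step, and the example words are assumptions, and the authors' actual clustering rule is not described in detail.

```python
from itertools import groupby
from typing import Dict, Iterable


def canonical_key(word: str) -> str:
    """Collapse consecutive repeated characters, keeping character order."""
    return "".join(char for char, _ in groupby(word.lower()))


def build_token_map(vocabulary: Iterable[str]) -> Dict[str, str]:
    """Map every word to the shared key of its spelling cluster."""
    return {word: canonical_key(word) for word in vocabulary}


if __name__ == "__main__":
    # Illustrative spellings, not taken from the competition data.
    print(build_token_map(["nice", "niiice", "Nicee"]))
```

A rule this coarse also merges distinct words that differ only by a doubled letter, which may be one reason the approach did not help.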

Contributors

fadhloun-y
