
ammarnasr / arabic-polarization-twitter



License: MIT License



Twitter Polarization Analysis Toolkit

This repository contains tools and scripts for analyzing Arabic Twitter discussions about the civil war in Sudan. It includes data collection methods, preprocessing steps, machine learning models for tweet classification, and analysis tools for understanding public opinion and geopolitical narratives.

Repository Contents

The repository is organized as follows:

../
├── data/
│   ├── cleaned_data.csv
│   ├── combined_reports_with_preds_final.parquet
│   └── data.xlsx
├── embeddings/
│   └── labelled_embeddings.parquet
├── LICENSE
├── logs/
│   └── stratified_logs.pth
├── models/
│   ├── clf_anti peace.pth
│   ├── clf_Pro peace,.pth
│   ├── clf_RSF.pth
│   └── clf_SAF.pth
├── notebooks/
│   ├── 01_data_preprocessing.ipynb
│   ├── 02_model_training.ipynb
│   ├── 03_evaluation.ipynb
│   └── 04_analysis_visualization.ipynb
├── plots/
│   ├── binary_labels_counts_per_month.html
│   ├── binary_labels_counts_per_month_seaborn.png
│   └── labels_counts_percentage.html
└── reports/
    ├── Report 1.xlsx
    ├── Report 2.xlsx
    ├── Report 3.xlsx
    ├── Report 4.xlsx
    └── Report 5.xlsx

Data

The data folder contains the following files:

  • data.xlsx: The labelled dataset of 900 tweets.
  • cleaned_data.csv: The cleaned dataset after preprocessing.
  • combined_reports_with_preds_final.parquet: The final dataset, produced by merging the reports with the model predictions. The source reports (Report 1 to Report 5) are in the reports folder.
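The README does not list the preprocessing steps that turn the raw tweets into cleaned_data.csv. As a rough illustration only, the sketch below applies common Arabic tweet normalizations: dropping URLs and @mentions, keeping hashtag words without the `#`, stripping diacritics and tatweel, and mapping alef variants to bare alef. Every rule here, and the `clean_tweet` name, is an assumption; the actual steps live in 01_data_preprocessing.ipynb and may differ.

```python
import re

# Hypothetical cleaning rules for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")  # \w is Unicode-aware, so Arabic handles match too
DIACRITICS_RE = re.compile(r"[\u0617-\u061A\u064B-\u0652]")  # short vowels, shadda, sukun
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / below
TATWEEL = "\u0640"  # decorative elongation character
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize one Arabic tweet (an assumed pipeline, not the repository's)."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep hashtag words but drop the '#' sign and '_' word separators.
    text = text.replace("#", " ").replace("_", " ")
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS_RE.sub("\u0627", text)  # map variants to bare alef
    return WHITESPACE_RE.sub(" ", text).strip()
```

With pandas, this could be applied to a text column via `df["text"].map(clean_tweet)`, where the column name `text` is hypothetical.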

Embeddings

The embeddings folder contains the following files:

  • labelled_embeddings.parquet: OpenAI GPT-3 embeddings for the labelled dataset, stored as a dataframe in which each embedding is paired with its corresponding tweet text. These embeddings are used to train the classification models, and the 02_model_training notebook contains the code that retrieves them from the OpenAI API.
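A dataframe that pairs each tweet with its embedding can be built by sending the texts to the embeddings API in batches and zipping the returned vectors back onto the inputs. In the sketch below, `embed_texts` and its `embed_fn` argument are hypothetical: `embed_fn` stands in for the real API call (with the `openai` Python package it could wrap `client.embeddings.create(model=..., input=batch)`; the model used in the notebook is not stated here). The batch size is likewise an assumption.

```python
from typing import Callable, Dict, List, Sequence

Embedding = List[float]


def embed_texts(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Embedding]],
    batch_size: int = 100,
) -> List[Dict[str, object]]:
    """Return one {"text", "embedding"} row per input text, in input order.

    embed_fn is a hypothetical stand-in for an embeddings API call; it must
    return exactly one vector per input string, in the same order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    rows: List[Dict[str, object]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors = embed_fn(batch)
        if len(vectors) != len(batch):
            raise ValueError("embed_fn returned a different number of vectors")
        rows.extend({"text": t, "embedding": v} for t, v in zip(batch, vectors))
    return rows
```

The resulting rows could then be turned into a dataframe with `pandas.DataFrame(rows)` and saved with `DataFrame.to_parquet`.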

Logs

The logs folder contains the following files:

  • stratified_logs.pth: Logs from the stratified k-fold cross-validation of the classification models. For each fold, they record the loss, accuracy, ROC-AUC, and F1 score on both the training and validation sets.
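Stratified k-fold cross-validation assigns samples to folds so that every validation fold keeps roughly the same class balance as the full dataset, which matters for a small labelled set of 900 tweets. The stdlib sketch below shows the idea together with a binary F1 score; scikit-learn's `StratifiedKFold` and `f1_score` would normally be used instead, and nothing here claims to reproduce the notebook's exact split. The function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_folds(
    labels: Sequence[int], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, val) pairs that preserve class ratios."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    offset = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal each class round-robin across folds.
        for j, idx in enumerate(idxs):
            fold_of[idx] = (offset + j) % k
        offset += len(idxs)  # continue the round-robin so fold sizes stay balanced
    folds = []
    for f in range(k):
        val = [i for i in range(len(labels)) if fold_of[i] == f]
        train = [i for i in range(len(labels)) if fold_of[i] != f]
        folds.append((train, val))
    return folds


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```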

Models

The models folder contains the following files:

  • A few trained classification models. The four models included here are examples for illustration only; they are not the final models used in the analysis. The final models can be found and used here: Trained Models
  • The models, results, and an inference tool are available in this Streamlit app: Tweet Classification App
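The four file names (anti peace, Pro peace, RSF, SAF) suggest one independent binary classifier per label applied to a tweet's embedding, so a single tweet can be flagged for several labels at once. Purely for illustration, the sketch below assumes each classifier reduces to a linear layer followed by a sigmoid; the real architecture and the format of the .pth files are not documented here, and `predict_labels`, `LinearHead`, and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation of one trained classifier: (weights, bias).
LinearHead = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(
    embedding: Sequence[float],
    heads: Dict[str, LinearHead],
    threshold: float = 0.5,
) -> Dict[str, Tuple[float, bool]]:
    """Score one tweet embedding with every binary classifier independently.

    Returns {label: (probability, predicted_positive)}. Because each label has
    its own classifier, a tweet can be positive for several labels at once.
    """
    results: Dict[str, Tuple[float, bool]] = {}
    for label, (weights, bias) in heads.items():
        if len(weights) != len(embedding):
            raise ValueError(f"dimension mismatch for classifier {label!r}")
        z = sum(w * x for w, x in zip(weights, embedding)) + bias
        p = sigmoid(z)
        results[label] = (p, p >= threshold)
    return results
```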

Notebooks

The notebooks folder contains the following files:

  • 01_data_preprocessing.ipynb: The notebook for the data preprocessing steps.
  • 02_model_training.ipynb: The notebook for the model training steps.
  • 03_evaluation.ipynb: The notebook for the evaluation of the models.
  • 04_analysis_visualization.ipynb: The notebook for the analysis and visualization of the results.

Plots

The plots folder contains the following files:

  • binary_labels_counts_per_month.html: Interactive plot of the binary label counts per month; it can be viewed in a browser.
  • labels_counts_percentage.html: Interactive plot of the label counts as percentages; it can be viewed in a browser.
  • binary_labels_counts_per_month_seaborn.png: Static seaborn version of the binary label counts per month.
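The data behind these plots amounts to counting positive predictions per label for each month, plus each label's share of the total. The sketch below assumes each record is a (tweet date, predicted label) pair; the actual column layout of combined_reports_with_preds_final.parquet is not documented here, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Sequence, Tuple

# Each record: (tweet date, label predicted positive for that tweet). Hypothetical layout.
Record = Tuple[date, str]


def counts_per_month(records: Sequence[Record]) -> Dict[str, Counter]:
    """Count positive predictions per label for each calendar month ("YYYY-MM")."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for day, label in records:
        table[day.strftime("%Y-%m")][label] += 1
    return dict(sorted(table.items()))  # chronological order


def label_percentages(records: Sequence[Record]) -> Dict[str, float]:
    """Share of all positive predictions that each label accounts for, in percent."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```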

The plots can also be viewed on Streamlit (figures: labels_counts_percentage, binary_labels_counts_per_month, binary_labels_counts_per_month_seaborn).
