BERT Sentiment Analysis Turkish

Sentiment analysis of Turkish tweets is implemented with 3 different feature extraction techniques and a simple multilayer perceptron (MLP). These feature extraction techniques, compared under Results, are SentiTurkNet, Word2Vec, and BERT.
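The overall shape of this pipeline (one fixed-size feature vector per tweet, fed to an MLP with three output classes) can be sketched roughly as follows. This is only an illustration, not the code from the notebooks: the toy embedding table, the averaging of token vectors, the layer sizes, and all function names are assumptions made for the example, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical toy embedding table (an assumption for this sketch);
# a real setup would load Word2Vec or BERT vectors instead.
TOY_EMBEDDINGS = {
    "film": [0.2, -0.1, 0.4],
    "harika": [0.9, 0.3, -0.2],  # "great"
    "berbat": [-0.8, 0.1, 0.5],  # "terrible"
}
EMBED_DIM = 3
LABELS = ["positive", "neutral", "negative"]


def extract_features(tweet):
    """Average the vectors of known tokens into one fixed-size feature vector."""
    tokens = tweet.lower().split()
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dense(inputs, weights, biases):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases; a trained model would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_predict(features, hidden, output):
    """One hidden ReLU layer followed by a softmax over the three classes."""
    activations = [max(0.0, v) for v in dense(features, *hidden)]
    return softmax(dense(activations, *output))


rng = random.Random(0)
hidden_layer = init_layer(4, EMBED_DIM, rng)
output_layer = init_layer(len(LABELS), 4, rng)

probs = mlp_predict(extract_features("Film harika"), hidden_layer, output_layer)
print(dict(zip(LABELS, probs)))  # untrained weights, so the values are arbitrary
```

Swapping the feature extractor while keeping the classifier fixed is what makes the three techniques directly comparable.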

For more NLP content like this, please subscribe to my blog: https://akoksal.com/

And check out this post for a detailed explanation and better models with Keras: https://akoksal.com/articles/understand-tweets-better-with-BERT-sentiment-analysis

Note that version 3.0.2 of the transformers library is used in these notebooks. Please install this version: pip install transformers==3.0.2
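If it helps, the pin can be checked before opening the notebooks. The snippet below is a small sketch using only the standard library (Python 3.8+); the helper names `installed_version` and `check_pin` are made up for this example and are not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_VERSION = "3.0.2"  # the version stated in this README


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package, required):
    """Describe whether the installed version matches the required one."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed; run: pip install {package}=={required}"
    if found != required:
        return f"{package} {found} found, but the notebooks expect {required}"
    return f"{package} {required} is installed"


print(check_pin("transformers", REQUIRED_VERSION))
```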

Dataset

Due to license problems with Twitter datasets, I had to remove the BOUN Twitter Data and the tweets collected with TweetScraper. I can share only the Tweet IDs for the BOUN Twitter Data, which are available for download.

I also shared dummy JSON data for the BOUN Twitter Data and the TweetScraper data in this repo to show the required data format. Please DO NOT train your model or run the analysis with this dummy data, as it would fail.

Notebooks

Notebooks are self-explanatory. You can check out the PyIstanbul Notebooks folder for the 3 different feature extraction techniques.

The BERT Features with Keras notebook has a custom loss, Dropout, and more controllable features with Keras, which result in better scores: 68% macro-averaged recall.
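For readers unfamiliar with the metric, macro-averaged recall is the unweighted mean of the per-class recalls, so each class counts equally regardless of how many tweets it has. A minimal sketch follows; the function name and the toy labels are invented for illustration and are not taken from the notebooks.

```python
def macro_recall(y_true, y_pred, labels):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in labels:
        # Predictions made for the examples whose true class is `label`.
        predictions = [p for t, p in zip(y_true, y_pred) if t == label]
        if not predictions:
            continue  # skip classes with no true examples
        hits = sum(1 for p in predictions if p == label)
        recalls.append(hits / len(predictions))
    return sum(recalls) / len(recalls)


LABELS = ["positive", "neutral", "negative"]

# Toy data: positive recall 1/2, neutral recall 2/3, negative recall 1/1.
y_true = ["positive", "positive", "neutral", "neutral", "neutral", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "negative"]

print(f"macro recall: {macro_recall(y_true, y_pred, LABELS):.3f}")
```

Unlike plain accuracy, this score cannot be inflated by predicting the majority class for every tweet.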

Results

| Model | Positive Recall | Neutral Recall | Negative Recall | Average Recall (Macro) |
|---|---|---|---|---|
| SentiTurkNet | 0.04 | 0.94 | 0.09 | 0.36 |
| Word2Vec | 0.37 | 0.69 | 0.47 | 0.51 |
| BERT | 0.53 | 0.76 | 0.67 | 0.65 |
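The macro average column is the plain mean of the three per-class recalls. The short check below recomputes it from the numbers in the table; the variable and function names are made up for this example.

```python
# (positive, neutral, negative, reported macro average), copied from the table.
RESULTS = {
    "SentiTurkNet": (0.04, 0.94, 0.09, 0.36),
    "Word2Vec": (0.37, 0.69, 0.47, 0.51),
    "BERT": (0.53, 0.76, 0.67, 0.65),
}


def macro_average(recalls):
    """Unweighted mean of the per-class recalls."""
    return sum(recalls) / len(recalls)


for model, (*per_class, reported) in RESULTS.items():
    computed = round(macro_average(per_class), 2)
    print(f"{model}: computed {computed:.2f}, reported {reported:.2f}")
```

The SentiTurkNet row shows why the per-class columns matter: its neutral recall is high, but its positive and negative recalls are close to zero.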

Analysis

In addition, 3 different topics with major incidents are analyzed. The BERT model's predictions show a correlation between these incidents and Twitter sentiment.

1. Netflix

Protests on Twitter after a new Turkish series with LGBT content was released on Netflix.

Tweet

Scores

2. Cappy

Two different incidents on Twitter about unidentified objects found in Cappy juice.

Tweet 1

Tweet 2

Scores

3. Berkcan Guven

Berkcan Guven drew major criticism on Twitter after he released a video with an underage celebrity. He removed the video after 7 hours, by which time it already had more than 700k views.

Video

Scores

Citation

For SoTA and more detailed BERT-based models and the dataset, you can check out the BounTi repository.

You can cite the following paper if you use our work:

@INPROCEEDINGS{BounTi,
  author={Köksal, Abdullatif and Özgür, Arzucan},
  booktitle={2021 29th Signal Processing and Communications Applications Conference (SIU)},
  title={Twitter Dataset and Evaluation of Transformers for Turkish Sentiment Analysis},
  year={2021}
}
