SentiNEL: Sentiment Analysis from Tweets

SentiNEL is a system for sentiment analysis of tweets, developed for SemEval-2015 Task 10, Subtask A: Contextual Polarity Disambiguation. Given a message containing a marked instance of a word or phrase, SentiNEL determines whether that instance is positive, negative or neutral in that context.

SentiNEL is inspired by the IOA system. The main differences are that SentiNEL extracts more features for training (e.g. character 3-, 4- and 5-grams, hashtags, higher-dimensional Word2Vec vectors and more lexicons) and trains an L2-regularized logistic regression SVM classifier with C = 0.5. The code is based on the Webis system, which only addresses SemEval-2015 Task 10, Subtask B (the message-level task); we modified the code and adapted it to the term level.

The system is scored by computing the F1-score for predicting positive and negative phrases. Compared with the IOA system, SentiNEL improves the F1-score from 83.90 to 88.15 on Tweet2013-test and from 84.18 to 84.73 on Sms2013-test.
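To make the character n-gram features mentioned above concrete, here is a minimal sketch of how character 3-, 4- and 5-grams could be pulled from a marked phrase. The class name `CharNGrams` and the method `extract` are hypothetical and not taken from the SentiNEL code base; the real feature extractor may normalize or weight the n-grams differently.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: CharNGrams is a hypothetical helper, not part of SentiNEL.
final class CharNGrams {

    /** Returns every contiguous character n-gram of length n, in order of appearance. */
    static List<String> extract(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<String>();
        // Slide a window of width n over the text; short texts yield no n-grams.
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        String phrase = "truly dream";
        // The README lists 3-, 4- and 5-grams among the extracted features.
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + "-grams: " + extract(phrase, n));
        }
    }
}
```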

Keywords: Sentiment analysis, Machine Learning, Data Mining, NLP

Architectural Overview

SentiNEL consists of four steps:

  • Word2Vec pre-training: trains Word2Vec vectors for all words that appear at least 3 times in the dataset
  • Feature extraction: extracts features from the training dataset
  • Training: trains the SVM classifier with the extracted features
  • Evaluation: evaluates the trained SVM classifier on the test dataset
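The frequency threshold in the first step can be sketched as follows. This is an assumption-laden illustration: `VocabularyFilter` and `frequentWords` are hypothetical names, and the lowercasing and whitespace tokenization are stand-ins for whatever tokenizer SentiNEL actually uses. Only the "at least 3 times" rule comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: VocabularyFilter is a hypothetical helper, not part of SentiNEL.
final class VocabularyFilter {

    /** Returns the words that occur at least minCount times across all messages. */
    static Set<String> frequentWords(Iterable<String> messages, int minCount) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String message : messages) {
            // Assumption: lowercase and split on whitespace; the real tokenizer may differ.
            for (String token : message.toLowerCase().trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                Integer seen = counts.get(token);
                counts.put(token, seen == null ? 1 : seen + 1);
            }
        }
        // Keep only words that reach the threshold (3 in the step described above).
        Set<String> vocabulary = new TreeSet<String>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                vocabulary.add(entry.getKey());
            }
        }
        return vocabulary;
    }
}
```

Calling `VocabularyFilter.frequentWords(messages, 3)` would give the set of words for which vectors are trained under this reading of the step.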

Corpus description

The corpus is taken from the SemEval-2015 Task 10 dataset. The following table shows the number of positive, negative and neutral instances in each dataset we collected.

Corpus            Positive        Negative        Neutral        Total
Tweet2013-train   4484 (62.5%)    2329 (32.5%)    356 (5.0%)     7169
Tweet2013-dev     506 (58.0%)     326 (37.4%)     40 (4.6%)      872
Tweet2013-test    2132 (62.6%)    1156 (34.0%)    116 (3.4%)     3404
Sms2013-test      1071 (45.9%)    1103 (47.3%)    159 (6.8%)     2333
Tweet2014-test    3568 (66.5%)    1606 (29.9%)    190 (3.5%)     5364
Sms2014-test      710 (45.3%)     747 (47.6%)     111 (7.1%)     1568

Requirements

  • Java 7+
  • Maven 3+

Setting Up

git clone https://github.com/MultimediaSemantics/sentinel
cd sentinel
mvn clean
mvn compile

Train

mvn exec:java -Dexec.args="train train_file [save_features_file]"
train                   selects train mode
train_file              the input file for training
save_features_file      optional; the file in which to save the extracted features. By default, SentiNEL saves them in arff/Trained-Features.arff

Example

mvn exec:java -Dexec.args="train train"

Extracts features from the training dataset resources/tweets/train.txt and saves them in arff/Trained-Features.arff.

mvn exec:java -Dexec.args="train train model1"

Extracts features from the training dataset resources/tweets/train.txt and saves them in arff/Trained-Features-model1.arff.
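The two examples suggest a naming convention for train mode: the `train_file` argument maps to a file under resources/tweets/, and the optional third argument becomes a suffix of the feature file name. The sketch below only restates that convention as inferred from the examples; `SentinelPaths` and its methods are hypothetical and do not describe how SentiNEL resolves paths internally.

```java
// Illustrative only: SentinelPaths is a hypothetical helper that mirrors the
// file names shown in the train examples; it is not SentiNEL's own code.
final class SentinelPaths {

    /** "train" -> "resources/tweets/train.txt" (assumed from the examples). */
    static String trainingFile(String trainFileArg) {
        return "resources/tweets/" + trainFileArg + ".txt";
    }

    /**
     * No suffix -> "arff/Trained-Features.arff";
     * "model1"  -> "arff/Trained-Features-model1.arff" (assumed from the examples).
     */
    static String featuresFile(String suffixArg) {
        if (suffixArg == null || suffixArg.isEmpty()) {
            return "arff/Trained-Features.arff";
        }
        return "arff/Trained-Features-" + suffixArg + ".arff";
    }

    public static void main(String[] args) {
        System.out.println(trainingFile("train"));
        System.out.println(featuresFile(null));
        System.out.println(featuresFile("model1"));
    }
}
```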

Evaluation

mvn exec:java -Dexec.args="eval test_file [saved_features_file]"
eval                    selects test mode
test_file               the input file for testing
saved_features_file     optional; the file containing the extracted features. By default, SentiNEL trains the SVM classifier with arff/Trained-Features.arff

Example

mvn exec:java -Dexec.args="eval Tweet2013-test"

Trains the SVM classifier with the extracted features in arff/Trained-Features.arff, then evaluates it on the test dataset resources/tweets/Tweet2013-test.txt.

mvn exec:java -Dexec.args="eval Sms2013-test Trained-Features-model1"

Trains the SVM classifier with the extracted features in arff/Trained-Features-model1.arff, then evaluates it on the test dataset resources/tweets/Sms2013-test.txt.
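For readers who want to reproduce the score by hand, the following sketch computes an F1-score over positive and negative phrases. It assumes the score is the mean of the positive-class F1 and the negative-class F1, with neutral predictions counting only as errors, as is customary for the SemEval sentiment tasks; this README does not spell out the exact formula. `PosNegF1` and its methods are hypothetical names.

```java
import java.util.List;

// Illustrative only: PosNegF1 is a hypothetical helper, not SentiNEL's evaluator.
final class PosNegF1 {

    /** F1 of a single label, computed from parallel gold and predicted label lists. */
    static double f1(List<String> gold, List<String> predicted, String label) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted must have the same size");
        }
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.size(); i++) {
            boolean isGold = gold.get(i).equals(label);
            boolean isPredicted = predicted.get(i).equals(label);
            if (isGold && isPredicted) {
                tp++;
            } else if (isPredicted) {
                fp++;
            } else if (isGold) {
                fn++;
            }
        }
        if (tp == 0) {
            return 0.0; // avoids division by zero when the label is never matched
        }
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    /** Assumed metric: mean of the positive and negative F1 (neutral is not averaged in). */
    static double averagePosNegF1(List<String> gold, List<String> predicted) {
        return (f1(gold, predicted, "positive") + f1(gold, predicted, "negative")) / 2.0;
    }
}
```

Multiplying the result by 100 gives a value on the same scale as the scores quoted in the introduction.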

Output

"I drove a Lincoln and it's a truly dream"
Lincoln -> positive

SentiNEL writes its output to the output/ folder. The file result.txt contains the sentiment predictions, and error_analysis.txt contains the incorrect predictions.

Team

  • Yonghui Feng
  • Ahmed Abdelli
  • Giuseppe Rizzo
  • Raphael Troncy
