This project is aimed at detecting sarcasm in news headlines. I have also developed a web app where a user can type a news headline (or any other text) and check whether it is sarcastic or not.
- Setting up your environment
- Project Motivation
- File Descriptions
- Instructions
- Results
- Conclusion
- Licensing, Authors, and Acknowledgements
Python 3.* is required to run this project. If you are running Anaconda, you'll also need the following extra libraries:
- Pandas
- Numpy
- Matplotlib
- Plotly
- Wordcloud
- NLTK
- Keras
Install the above dependencies using `pip install <dependency>`.
Recent advances in natural language sentence generation research have seen increasing interest in measuring negativity and positivity from the sentiment of words or phrases. However, the accuracy and robustness of results are often affected by untruthful sentiments that are sarcastic in nature, and this is often left untreated. Sarcasm detection is an important process that can help filter out noisy data (i.e., sarcastic sentences) from the training inputs used for natural language sentence generation.
All my work is in the notebook. The data folder contains two files, both of which are required by the notebook.
To run the web app:
- Go to the app directory: `cd app`
- Run the run.py file: `python run.py`
- Open your browser and go to http://localhost:3001
- Test normal headlines from here: https://www.sciencedaily.com/news/computers_math/artificial_intelligence/
- Test sarcastic headlines from here: https://bestlifeonline.com/funniest-newspaper-headlines-of-all-time/
Since I'm using Plotly, which uses iframes for visualizations, you won't be able to see them in the notebook on GitHub. Please download and open the HTML file in Firefox or Chrome.
I'm using accuracy to compare the models.
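Accuracy here is simply the fraction of predictions that match the true labels. A minimal sketch using scikit-learn (an assumption for illustration; the notebook may compute it differently), with toy labels:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0]  # toy ground truth: 1 = sarcastic, 0 = normal
y_pred = [1, 0, 0, 1, 0]  # one mistake out of five predictions

# Fraction of matching entries: 4 correct out of 5
print(accuracy_score(y_true, y_pred))  # → 0.8
```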
- I tried three different models but was still not able to breach the 90% mark. The deep learning approach seems promising, but we are short of the data needed for more advanced deep learning models; even a basic LSTM started overfitting within just 10 epochs. So one thing that would really help is more data.
- Among the traditional machine learning methods, Naive Bayes works better than Random Forest in my case. Naive Bayes is a good algorithm for text classification. When dealing with text, it's very common to treat each unique word as a feature, and since the typical person's vocabulary is many thousands of words, this makes for a large number of features. The relative simplicity of the algorithm and its independent-features assumption make Naive Bayes a strong performer for classifying text.
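The word-as-feature setup described above can be sketched in a few lines with scikit-learn (an illustrative assumption; the headlines below are made up, not from the project's dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy headlines (hypothetical examples): 0 = normal, 1 = sarcastic
headlines = [
    "scientists discover new planet in nearby system",
    "local man heroically eats entire pizza alone",
    "stock market closes higher after fed announcement",
    "area dog unsure why everyone keeps leaving house",
]
labels = [0, 1, 0, 1]

# CountVectorizer turns each unique word into a feature;
# MultinomialNB then treats those word counts as independent features.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(headlines, labels)

print(model.predict(["man eats entire pizza"]))
```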
- For the webapp, I'm going with Naive Bayes for the following reasons:
- It requires less model training time
- Naive Bayes model size is low and quite constant with respect to the data
- Naive Bayes can quickly adapt to changes in data, whereas we would have to rebuild the Random Forest every time
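The last point can be illustrated with scikit-learn's incremental `partial_fit` (again an assumption for illustration, not necessarily how the web app's model is trained): a Naive Bayes model can absorb a new batch of labelled headlines in place, with no full retrain.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# HashingVectorizer has no fitted vocabulary, so it suits streaming updates;
# alternate_sign=False keeps the counts non-negative, as MultinomialNB requires.
vectorizer = HashingVectorizer(n_features=2**10, alternate_sign=False)
clf = MultinomialNB()

# Initial batch (hypothetical headlines): 0 = normal, 1 = sarcastic
batch1 = ["markets rally on good news", "man wins argument with cat"]
clf.partial_fit(vectorizer.transform(batch1), [0, 1], classes=[0, 1])

# Later, new labelled data arrives: update the same model in place
batch2 = ["new vaccine trial shows promise", "toddler negotiates bedtime extension"]
clf.partial_fit(vectorizer.transform(batch2), [0, 1])

print(clf.predict(vectorizer.transform(["man wins argument with cat"])))
```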