
Performing Sentiment Analysis on 35 Trump Campaign Speeches and Generating a Wordcloud

Kaggle provides a number of great datasets. I found one in which a user had posted 35 of Trump's campaign speeches from 2019 and 2020. Given how close the election was and how divisive many people consider Trump to be, I thought it would be interesting to perform NLP and sentiment analysis on the speeches.

Source: https://www.kaggle.com/christianlillelund/donald-trumps-rallies

Steps Taken to Prepare the Data

The Kaggle dataset provides 35 text documents, each containing one speech that Trump gave in a U.S. city.

I read in the individual speeches, converted each speech into one large, concatenated string, and stored them in a DataFrame:

[Image: DataFrame of the speeches]
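The loading step might look something like the sketch below. It assumes the speeches were downloaded as one `.txt` file per speech into a local folder; the function name `load_speeches` and the column names `file` and `speech` are hypothetical, not taken from the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_speeches(folder: str) -> pd.DataFrame:
    """Read every .txt file in `folder` into one DataFrame row per speech.

    Assumes one speech per file, as in the Kaggle dataset. The column
    names "file" and "speech" are illustrative only.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        # Collapse newlines and repeated whitespace so each speech
        # becomes one large, concatenated string.
        text = " ".join(path.read_text(encoding="utf-8").split())
        rows.append({"file": path.name, "speech": text})
    return pd.DataFrame(rows, columns=["file", "speech"])
```

Sorting the file paths keeps the row order stable between runs, which makes later joins on the index predictable.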

I then cleaned the text by removing punctuation:

[Image: the speeches after punctuation removal]
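A minimal sketch of punctuation removal using Python's built-in `str.translate`. This is one common technique, not necessarily the one used in this project, and `remove_punctuation` is a hypothetical name. Note that `string.punctuation` covers only ASCII punctuation, so curly quotes or long dashes in the transcripts would survive this step.

```python
import string

# Translation table that maps every ASCII punctuation character to None.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Strip the ASCII punctuation characters in string.punctuation from `text`."""
    return text.translate(_PUNCT_TABLE)
```

Applied to a DataFrame column (named `speech` here purely for illustration), this would be `df["speech"] = df["speech"].map(remove_punctuation)`.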

Now the text is clean enough to perform the analysis.

Steps Taken to Perform the Sentiment Analysis

The analysis was performed using NLTK (Natural Language Toolkit), a popular Python library for analyzing human language data. NLTK includes a popular model called VADER (Valence Aware Dictionary and Sentiment Reasoner), which uses a lexical approach to analyze text: words in its dictionary are rated by their semantic orientation as positive or negative (and by how strongly), and VADER combines these ratings into an overall assessment of how positive or negative the text is.
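To illustrate the lexical idea only, here is a toy scorer. It is not VADER and not NLTK's API: the five-word `LEXICON` and its valences are made up, and `toy_sentiment` simply reports the share of positive, neutral, and negative words, so the three shares add up to 100%. It also shows the misspelling weakness listed under Cons: a word like "graet" is not in the dictionary, so it counts as neutral.

```python
import re

# Hypothetical mini-lexicon with made-up valences, for illustration only.
# VADER's real lexicon is far larger and rates words from -4 to +4.
LEXICON = {"great": 3.0, "win": 2.5, "love": 3.0, "bad": -2.5, "terrible": -3.0}


def toy_sentiment(text: str, lexicon: dict = LEXICON) -> dict:
    """Share of positive, neutral and negative words in `text` (toy, not VADER).

    Words missing from the lexicon count as neutral, so a misspelling such
    as "graet" is silently treated as neutral.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"pos": 0.0, "neu": 0.0, "neg": 0.0}
    pos = sum(1 for w in words if lexicon.get(w, 0.0) > 0)
    neg = sum(1 for w in words if lexicon.get(w, 0.0) < 0)
    n = len(words)
    # Every word is exactly one of positive/negative/neutral, so shares sum to 1.
    return {"pos": pos / n, "neu": (n - pos - neg) / n, "neg": neg / n}
```

For example, `toy_sentiment("We will win, a great win!")` gives half positive and half neutral words, while `toy_sentiment("a graet day")` finds nothing positive at all.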

Using the VADER model has some pros and cons:

Pros:

  • Easy-to-understand approach that is quick to implement

Cons:

  • Misspellings and grammatical mistakes can cause the analysis to overlook words or usage

For more information, here is a great article on the pros and cons of the VADER model:

https://www.codeproject.com/Articles/5269447/Pros-and-Cons-of-NLTK-Sentiment-Analysis-with-VADE

Lastly, I generated sentiment scores for each speech, appended them to lists, and stored the scores in a new DataFrame, which I then combined with the original DataFrame:

[Image: the original DataFrame combined with the sentiment scores]
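The scoring and combining step could be sketched as follows. The helper `add_sentiment_columns` is hypothetical; it accepts any object with a `polarity_scores(text)` method, which NLTK's `SentimentIntensityAnalyzer` provides (returning `neg`, `neu`, `pos`, and `compound` keys), so the logic can be exercised without downloading the VADER lexicon. Rather than appending to separate lists, it collects one score dict per speech and uses `pd.concat` to attach the scores as new columns.

```python
import pandas as pd


def add_sentiment_columns(df: pd.DataFrame, analyzer, text_col: str = "speech") -> pd.DataFrame:
    """Score each speech and return `df` with the scores joined as new columns.

    `analyzer` is any object with a `polarity_scores(text) -> dict` method,
    such as NLTK's SentimentIntensityAnalyzer. `text_col` is an assumed name.
    """
    # One dict of scores per speech.
    scores = [analyzer.polarity_scores(text) for text in df[text_col]]
    # Turn the dicts into a DataFrame aligned with df's index, then combine.
    scores_df = pd.DataFrame(scores, index=df.index)
    return pd.concat([df, scores_df], axis=1)
```

With NLTK installed, usage might be `nltk.download("vader_lexicon")` followed by `df = add_sentiment_columns(df, SentimentIntensityAnalyzer())`, importing the class from `nltk.sentiment.vader`.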

Conclusion:

To my surprise, the speeches were largely rated as "neutral", with little variation in the scoring:

[Image: sentiment scores for the speeches]

The VADER model scores the positivity, negativity, and neutrality of each speech by assigning each a proportion, and the three proportions add up to 100%. Averaged across all 35 speeches, the scores were:

  • 17.9% positive
  • 72.4% neutral
  • 9.7% negative
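The averages above, and how much the scores vary from speech to speech, could be computed with a short pandas summary like the one below. It assumes the per-speech scores live in columns named `pos`, `neu`, and `neg` (the keys that VADER's `polarity_scores` returns); `summarize_scores` is a hypothetical name.

```python
import pandas as pd


def summarize_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and standard deviation of the pos/neu/neg shares across speeches.

    Returns a small table with rows "mean" and "std" and one column per share.
    """
    return df[["pos", "neu", "neg"]].agg(["mean", "std"])
```

Because each speech's three shares sum to 100% (up to rounding), the three means do as well; a small standard deviation in each column would put a number on the "not much variation" observation.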

The plot below also shows that the scores varied little among the 35 speeches:

[Image: plot of the sentiment scores across the 35 speeches]

Visualizing Trump's Campaign Speeches with a WordCloud

The WordCloud shows that Trump's most commonly used words are ones we often see in the media, such as "win", "China" and "great".
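A word cloud sizes each word by how often it appears, so the ranking behind it can be checked with a plain frequency count. Below is a sketch using `collections.Counter`; the short `STOPWORDS` set is a hypothetical stand-in for a fuller list such as `wordcloud.STOPWORDS` or NLTK's stopwords corpus, and `word_frequencies` is an illustrative name. The resulting counts could also be passed to the `wordcloud` package's `WordCloud.generate_from_frequencies` method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration; a real run
# would use a fuller list such as wordcloud.STOPWORDS.
STOPWORDS = {"the", "and", "a", "to", "of", "we", "i", "you", "it", "is", "they"}


def word_frequencies(text: str, stopwords: set = STOPWORDS) -> Counter:
    """Count lowercase word tokens in `text`, skipping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)
```

Running it over all speeches joined into one string and calling `.most_common(20)` would list the words a word cloud would draw largest.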

The WordCloud also suggests that Trump's campaign speeches favor short words. This appears to be consistent with the findings of Professor Roderick Hart, as described in an interesting article from dictionary.com:

"As President, Trump receives a lot of attention for his use of words—especially since he continues to hold campaign rallies. In Trump and Us: What He Says and Why People Listen, professor Roderick Hart argues that Trump “turned the rant into an art form,” and relies on using both a lot of words and short words to convey his point."

[Image: WordCloud of the campaign speeches]
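To put a number on the "short words" observation instead of judging it from the cloud, one could compute the average word length of each speech and compare it with other texts. A minimal sketch; `mean_word_length` is a hypothetical helper, and counting apostrophes as part of a word is an arbitrary choice.

```python
import re


def mean_word_length(text: str) -> float:
    """Average number of characters (letters and apostrophes) per word in `text`."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)
```

Mapping it over a column of speeches gives one value per speech that can be compared against, say, other politicians' transcripts.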

Contributors

davidtstill
