jfall-sentiment's Introduction

Sentiment Analysis of Social Media Posts with Apache Spark

This repository contains the sample source code and presentation used in the ignite session I gave at JFall 2015. I also wrote a blog post on the subject, which you can find here.

Presentation

The presentation (as PDF) can be found here.

Spark Hello World

A small runnable example of how to do a word-count analysis is shown in HelloSparkWorld.java.
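To give an idea of the shape of a word count without needing a Spark installation, here is a minimal plain-Java sketch. It is not the contents of HelloSparkWorld.java: the class name WordCountSketch and the method countWords are made up for illustration, and Java streams stand in for Spark's distributed operations. The two steps (split lines into words, then count per word) are the same idea a Spark word count typically follows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java illustration only; hypothetical class, not part of the repository.
class WordCountSketch {

    // Splits each line into lower-case words and counts occurrences per word.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // one line becomes many words (split on non-word characters)
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // a line starting with punctuation yields an empty first token
                .filter(word -> !word.isEmpty())
                // group identical words and count them, sorted by word
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Hello Spark world", "hello world");
        countWords(lines).forEach((word, count) ->
                System.out.println(word + ": " + count));
    }
}
```

With the two sample lines in main, "hello" and "world" are each counted twice and "spark" once.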

Running the analysis

Downloading the data

The 5GB dataset can be downloaded with your favorite torrent client using this link.

You should end up with a RC_2015-01.bz2 file around 5GB in size.

The application.properties file has the default input set to /tmp/RC_2015-01.bz2. If you downloaded the file to a different location, please change the properties file accordingly.

Configuration

The application has two config settings that you need to set if their defaults are incorrect. Both are contained in application.properties.

The input property should point to the RC_2015-01.bz2 file you just downloaded. The output property should point to an empty directory. The application will create the full directory path if possible.
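As a rough sketch of how two such settings could be read with a fallback to defaults, the snippet below uses java.util.Properties. The property names input and output and the default input path come from this README. The class ConfigSketch and the default output path /tmp/sentiment-output are assumptions for illustration only, and the repository's own loading code may look different.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; not the repository's actual configuration code.
class ConfigSketch {
    // Default input path as stated in this README.
    static final String DEFAULT_INPUT = "/tmp/RC_2015-01.bz2";
    // Assumed default output directory; the README does not name the real one.
    static final String DEFAULT_OUTPUT = "/tmp/sentiment-output";

    // Reads key=value pairs in the standard .properties format.
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the configured value, or the default when the key is absent.
    static String input(Properties props) {
        return props.getProperty("input", DEFAULT_INPUT);
    }

    static String output(Properties props) {
        return props.getProperty("output", DEFAULT_OUTPUT);
    }

    public static void main(String[] args) {
        // Only "input" is overridden here, so "output" falls back to the default.
        Properties props = load(new StringReader("input=/data/RC_2015-01.bz2\n"));
        System.out.println("input  = " + input(props));
        System.out.println("output = " + output(props));
    }
}
```

In a real run the Reader would wrap the application.properties file rather than an in-memory string.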

Running the Analysis

You can run the analysis by running the Main class. It should start a Spark context and begin an analysis run. You can then connect to http://localhost:4040/ to see the progress. Keep in mind that this process takes quite some time: more than one hour on my machine.

First, it reads all the JSON, parses it into internal comment structures, and analyses these. The resulting data is stored in a temporary object store location. Storing it isn't strictly needed, but since this first part takes by far the most time, it's done for convenience: running new reduce operations on the stored dataset takes a lot less time than going through the entire deserialization again.

The object file is then used to do the count and sentiment reductions, which are then written to their corresponding files.
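The idea of analysing once and then running several cheap reductions over the stored result can be sketched in plain Java. Everything here is hypothetical: the AnalysedComment class, its group and sentiment fields, and the choice of a mean as the sentiment reduction are assumptions, since this README does not describe the internal comment structure or the exact reductions. An in-memory list stands in for the stored object file.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch; the real application runs its reductions on Spark.
class ReductionSketch {

    // Assumed shape of an already-analysed comment.
    static final class AnalysedComment {
        final String group;      // assumed grouping key
        final double sentiment;  // assumed sentiment score

        AnalysedComment(String group, double sentiment) {
            this.group = group;
            this.sentiment = sentiment;
        }

        String group() { return group; }
        double sentiment() { return sentiment; }
    }

    // Count reduction: number of comments per group.
    static Map<String, Long> countByGroup(List<AnalysedComment> comments) {
        return comments.stream().collect(Collectors.groupingBy(
                AnalysedComment::group, TreeMap::new, Collectors.counting()));
    }

    // Sentiment reduction: mean sentiment per group (the mean is an assumption).
    static Map<String, Double> meanSentimentByGroup(List<AnalysedComment> comments) {
        return comments.stream().collect(Collectors.groupingBy(
                AnalysedComment::group, TreeMap::new,
                Collectors.averagingDouble(AnalysedComment::sentiment)));
    }

    public static void main(String[] args) {
        // The expensive parse-and-analyse step happens once; both reductions reuse it.
        List<AnalysedComment> analysed = List.of(
                new AnalysedComment("a", 1.0),
                new AnalysedComment("a", 0.5),
                new AnalysedComment("b", -1.0));
        System.out.println("counts: " + countByGroup(analysed));
        System.out.println("mean sentiment: " + meanSentimentByGroup(analysed));
    }
}
```

Adding another reduction later only means another pass over the stored analysed data, not over the raw JSON.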

Contributors

nielsutrecht