TwitterFeelings

TwitterFeelings is a Twitter sentiment analysis engine. It streams live statuses from Twitter and categorizes them according to their mood (positive or negative). It is written in Elixir, runs on the Erlang VM, and uses Redis for storage.

It is partly based on the approach described in the Stanford paper "Twitter Sentiment Classification using Distant Supervision".

TwitterFeelings is composed of three parts:

  • corpus builder: uses the Twitter Search API to build a very large set of tweets that the sentiment analyzer learns from.
  • learning: from a corpus of tweets, computes each word's probability of appearing in a positive tweet and in a negative tweet.
  • streaming machine: uses the Twitter Streaming API to get live tweets on a specific topic, and updates Redis mood counters accordingly.

Setup

  • install Elixir, Erlang and Redis
  • install the dependencies with mix deps.get
  • build the script with mix escript.build
  • declare the following environment variables:
    • TWITTER_CONSUMER_KEY
    • TWITTER_CONSUMER_SECRET
    • TWITTER_ACCESS_TOKEN
    • TWITTER_ACCESS_TOKEN_SECRET
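
A quick pre-flight check can catch a missing credential before a long corpus build starts. The sketch below is written in Python purely for illustration (the project itself is in Elixir), and the function name missing_variables is hypothetical, not part of TwitterFeelings.

```python
import os

# The four variables listed in the Setup section.
REQUIRED = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
]

def missing_variables(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Twitter credentials are declared.")
```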

Corpus Builder

This application runs thousands of queries against the Twitter Search API to build a large corpus of tweets that can be analyzed later. Since we need to know whether each retrieved status is positive or negative, we use the Stanford approach (see the paper mentioned in the introduction) to categorize Twitter statuses according to the smileys they contain.

Each tweet retrieved is:

  • filtered (we don't keep ambiguous tweets containing both positive and negative smileys)
  • normalized (downcased, stripped of URLs/usernames/accents/smileys/short words/...)
  • stored in a dedicated Redis set
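
The filtering and normalization steps above can be sketched roughly as follows. This is an illustration in Python, not the project's Elixir code: the smiley lists, the three-letter minimum word length and the function names is_biased and normalize are all assumptions, and storage in Redis is left out.

```python
import re
import unicodedata

# Hypothetical smiley lists; the project's actual lists are not given in this README.
POSITIVE_SMILEYS = (":)", ":-)", ":D", "=)")
NEGATIVE_SMILEYS = (":(", ":-(", "=(")

def is_biased(text):
    """True when a tweet contains both positive and negative smileys."""
    has_positive = any(s in text for s in POSITIVE_SMILEYS)
    has_negative = any(s in text for s in NEGATIVE_SMILEYS)
    return has_positive and has_negative

def normalize(text, min_len=3):
    """Downcase and strip URLs, usernames, smileys, accents and short words."""
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)          # usernames
    # Smileys are removed before downcasing because ":D" is case-sensitive.
    for smiley in POSITIVE_SMILEYS + NEGATIVE_SMILEYS:
        text = text.replace(smiley, " ")
    text = text.lower()
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    # Keep alphabetic words only; min_len is an assumed cutoff for "short words".
    words = re.findall(r"[a-z]+", text)
    return " ".join(w for w in words if len(w) >= min_len)

if __name__ == "__main__":
    tweet = "J'adore l'été à Paris :) @bob http://t.co/abc"
    if not is_biased(tweet):
        print(normalize(tweet))  # adore ete paris
```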

To build a large corpus of French positive and negative tweets, run the following commands:

./twitter_feelings build-corpus --lang fr --mood positive
./twitter_feelings build-corpus --lang fr --mood negative

This takes hours, since the application has to respect Twitter rate limits (450 queries per 15 minutes). Each query fetches 100 tweets at once.
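
To see why a build takes hours, the rate limit can be turned into a best-case throughput. The Python sketch below uses only the figures quoted in this README and assumes, optimistically, that every query returns a full batch of 100 new tweets; the function names are hypothetical.

```python
QUERIES_PER_WINDOW = 450  # rate limit quoted in this README
WINDOW_MINUTES = 15
TWEETS_PER_QUERY = 100

def max_tweets_per_hour():
    """Best-case number of tweets fetched per hour under the rate limit."""
    windows_per_hour = 60 // WINDOW_MINUTES
    return QUERIES_PER_WINDOW * TWEETS_PER_QUERY * windows_per_hour

def min_hours_for(target_tweets):
    """Lower bound on the hours needed to fetch target_tweets."""
    return target_tweets / max_tweets_per_hour()

if __name__ == "__main__":
    print(max_tweets_per_hour())             # 180000
    print(round(min_hours_for(500_000), 2))  # 2.78
```

In practice the build is slower than this bound, because queries do not always return 100 usable tweets and some tweets are dropped by the filter.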

Learning

Live tweets are categorized using a naive Bayes classifier. The great benefit of this method is that much of the work can be precomputed, which makes the live computation very fast. The purpose of this module is to precompute, from an existing corpus of tweets, the probability of each unigram (i.e. word) appearing in positive or negative messages.
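
As a rough illustration of this precomputation, here is a minimal naive Bayes sketch in Python (not the project's Elixir code). It assumes tweets are already normalized and space-separated, uses add-one (Laplace) smoothing and equal priors for both moods, and ignores words never seen during learning. None of these choices are confirmed by this README, and the names learn and classify are hypothetical.

```python
import math
from collections import Counter

def learn(positive_tweets, negative_tweets):
    """Precompute a smoothed log-probability for each word, per mood."""
    counts = {
        "positive": Counter(w for t in positive_tweets for w in t.split()),
        "negative": Counter(w for t in negative_tweets for w in t.split()),
    }
    vocabulary = set(counts["positive"]) | set(counts["negative"])
    model = {}
    for mood, counter in counts.items():
        # Add-one smoothing (an assumption): no word gets probability zero.
        denominator = sum(counter.values()) + len(vocabulary)
        model[mood] = {
            w: math.log((counter[w] + 1) / denominator) for w in vocabulary
        }
    return model

def classify(model, tweet):
    """Sum the precomputed log-probabilities; unknown words are skipped."""
    scores = {
        mood: sum(probs[w] for w in tweet.split() if w in probs)
        for mood, probs in model.items()
    }
    return max(scores, key=scores.get)

if __name__ == "__main__":
    model = learn(["super genial", "super content"],
                  ["triste nul", "nul horrible"])
    print(classify(model, "super"))       # positive
    print(classify(model, "nul triste"))  # negative
```

All of the logarithms are computed once in learn, so classifying a live tweet reduces to a few dictionary lookups and additions.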

Once both corpora have been built (negative and positive moods for a single language), run the learning module with this command:

./twitter_feelings learn --lang fr

On my computer (4 GHz i7), it takes about 10 minutes to learn from 2 × 500,000 tweets.

Streaming Machine

TODO

Contributors

  • cblavier