Description
Twitter users often associate and socialize with other users who share similar interests. This project classifies the Tweets of these users with a trained Latent Dirichlet Allocation (LDA) topic model to automate the discovery of those similarities.
Prerequisites
To use Python 3, the beta version of the Pattern library must be manually installed using:
pip3 install git+https://github.com/pattern3/pattern.git
Otherwise, use Python 2.7, since the released Pattern package is not currently compatible with Python versions above 2.7.
If you install Pattern3 manually, remove the pattern entry from requirements.txt before installing the remaining packages.
Installing
Download:
git clone https://github.com/kenneth-orton/twitter_LDA_topic_modeling.git
Run linux_setup.sh:
./linux_setup.sh
Install the Python packages using pip (preferably inside a virtual environment):
pip install -r requirements.txt
Process
- Get user and follower ids by location - twitter_user_grabber.py
- Download Tweets for each user - get_community_tweets.py
- Create an LDA model from a corpus of documents - create_LDA_model.py
- Generate topic probability distributions for Tweet documents - tweets_on_LDA.py
- Calculate distances between Tweet documents and graph them - plot_distances.py
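The last step of the process compares users by the distance between their topic probability distributions. The sketch below illustrates the idea with the Hellinger distance, which is one common choice for comparing probability distributions; the metric actually used by plot_distances.py may differ. The user names, the topic distributions, and the helper functions `hellinger` and `pairwise_distances` are hypothetical and exist only for this illustration.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    that share no probability mass.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    # Sum of squared differences between the square roots of each probability.
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def pairwise_distances(topic_dists):
    """Return {(user_a, user_b): distance} for every unordered pair of users."""
    return {
        (a, b): hellinger(topic_dists[a], topic_dists[b])
        for a, b in combinations(sorted(topic_dists), 2)
    }


# Hypothetical topic distributions over three topics, one per user.
# In the real pipeline these would come from running Tweets through the LDA model.
topic_dists = {
    "user_a": [0.7, 0.2, 0.1],
    "user_b": [0.6, 0.3, 0.1],
    "user_c": [0.1, 0.1, 0.8],
}

distances = pairwise_distances(topic_dists)

# Print the pairs from most similar to least similar.
for (a, b), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a} - {b}: {d:.3f}")
```

Users whose Tweets concentrate on the same topics end up with small distances (here user_a and user_b), while users with different dominant topics end up far apart.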