In this sample, twitter4j.properties contains only dummy data. Replace its values with your own credentials, and make sure the .jar and the .properties file are in the same directory.
-
(java) Connect to the public Twitter API using your own keys and secrets
-
(java) Given a Twitter handle, retrieve the list of its followers' handles
-
(java) Twitter API key resource control: manages concurrency and locking to maximize rate limit utilization
-
(java) Executor thread pool for submitting concurrent tasks
-
(python) Randomly sample accounts and write the result as a CSV file to the 01SamplingAccount folder
-
(python) Read through the full account list and write it as a CSV file to the 01FullAccount folder
-
(python) Compose a Gnip query rule from the accounts of interest while staying within the rule limitations
-
(python) Create a historical job that can be sent to Gnip
-
(python) Generate CSV files grouped by rule tag
-
(spark) Generate JSON/CSV files grouped by rule tag (parallelized to speed up processing)
spark-submit --master "local[*]" --executor-memory 2G --total-executor-cores 20 06GNIPDataGroupByRuleTag-Spark.py > 06GNIPDataGroupByRuleTag-Spark.log 2>&1
-
(spark) Generate JSON/CSV files filtered by influencee account (parallelized to speed up processing)
-
(spark) Spark GraphX analysis of the social network of randomly sampled followers
-
(java8) CountTweets
export MAVEN_OPTS="-ea"
mvn exec:java@0002 -Dexec.args="./output/collect-follower-day4/modelpress.followers.json Scanner"
-
(java8) CountTweetsParaller: uses a parallel stream to parse JSON objects
export MAVEN_OPTS="-ea"
mvn exec:java@0003 -Dexec.args="./output/collect-follower-day4/modelpress.followers.json Parallels"
-
(node v6) Mapbox visualization of followers' home locations
- (java8) Utility class that lists directories and files recursively using streams. To handle checked exceptions inside the stream chain, Throwables.propagate(e) from the Google Guava library is used.
mvn exec:java@0004
- Adding GitBook Integration (experimental)
Java (Stream, Concurrency, Twitter API)
Python (Data Processing)
Spark (Data Processing)
Node V6 (Mapbox Visualization)
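
For the random sampling item above (the one that writes to the 01SamplingAccount folder), here is a minimal sketch of how such a step might look. It is not the script from this repo: the function name `sample_accounts`, the file name `sample.csv`, and the single `handle` column are all assumptions made for illustration.

```python
import csv
import random
from pathlib import Path


def sample_accounts(accounts, k, out_dir="01SamplingAccount", seed=None):
    """Randomly pick up to k accounts and write them to a CSV file."""
    rng = random.Random(seed)  # a seeded generator keeps runs reproducible
    # Sampling without replacement; never ask for more items than exist.
    picked = rng.sample(list(accounts), min(k, len(accounts)))
    out_path = Path(out_dir) / "sample.csv"  # hypothetical file name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["handle"])  # hypothetical single-column layout
        writer.writerows([handle] for handle in picked)
    return out_path
```

Passing a fixed `seed` makes the sample repeatable, which helps when later steps need to be re-run against the same set of accounts.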
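
For the Gnip query rule item above, one way to stay within a rule length limit is to pack `from:` clauses greedily and start a new rule whenever the next clause would not fit. The sketch below assumes a character limit is the constraint that matters; the default of 2048, the function name `compose_rules`, and the tag naming scheme are assumptions, so check the limits that apply to your Gnip product before relying on it.

```python
def compose_rules(handles, max_chars=2048, tag_prefix="accounts"):
    """Pack `from:<handle>` clauses into OR-joined rules of at most max_chars.

    max_chars is an assumed limit. A single clause longer than max_chars
    still gets a rule of its own, since it cannot be split.
    """
    joiner = " OR "
    groups, clauses, length = [], [], 0
    for handle in handles:
        clause = "from:" + handle
        # Cost of appending this clause, including the joiner if needed.
        extra = len(clause) + (len(joiner) if clauses else 0)
        if clauses and length + extra > max_chars:
            groups.append(clauses)  # current rule is full, start a new one
            clauses, length, extra = [], 0, len(clause)
        clauses.append(clause)
        length += extra
    if clauses:
        groups.append(clauses)
    # Each rule carries a tag so matched tweets can be grouped later.
    return [
        {"value": joiner.join(group), "tag": f"{tag_prefix}-{i:03d}"}
        for i, group in enumerate(groups, start=1)
    ]
```

The tag on each rule is what makes the later "group by rule tag" steps possible, so it is worth choosing tags that map back to the account batches.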
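
For the "CSV files grouped by rule tag" item above, the grouping can be sketched as follows. This assumes each input line is one JSON activity that lists its matched rules under `gnip.matching_rules`, and that `id`, `postedTime`, and `body` are the fields worth keeping; the function name `group_by_rule_tag` and the output layout (one `<tag>.csv` per tag) are illustrative choices, not necessarily what the scripts in this repo do.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def group_by_rule_tag(lines, out_dir):
    """Write one CSV per rule tag from an iterable of JSON activity strings."""
    by_tag = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        activity = json.loads(line)
        # Assumed layout: activity["gnip"]["matching_rules"] is a list of
        # {"tag": ..., "value": ...}; records without it are ignored.
        for rule in activity.get("gnip", {}).get("matching_rules", []):
            by_tag[rule.get("tag") or "untagged"].append(activity)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for tag, activities in by_tag.items():
        # The tag is used as a file name, so it must not contain path separators.
        with (out / f"{tag}.csv").open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "postedTime", "body"])  # assumed columns
            for a in activities:
                writer.writerow([a.get("id"), a.get("postedTime"), a.get("body")])
    return sorted(by_tag)
```

An activity that matched several rules is written once per tag. This version holds everything in memory, which is fine for small files; the Spark variant listed above is the better fit for large ones.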