[x] Setting up Apache Maven for Java project - User Interface and MapReduce functions
[x] Setting up GitHub repository workflow
[x] Setting up GitHub Actions for automation
[x] Creating a web crawler in Python using the Tweepy library to fetch Twitter data based on specific hashtags
[] Create a User Interface
[x] Create an HDFS cluster for MapReduce functionality and program Hadoop MapReduce in Java
[x] Set up Hadoop Core and create the Job Tracker and Task Trackers for the project
[x] Implement MapReduce on HDFS using Java to count how often significant words from the data dictionary occur in tweet text
[x] Configure Apache Maven with the MapReduce code and install the Apache Hadoop JAR dependency
[x] Configure MapReduce code in GitHub Actions for automation
[x] Automate the Big Data pipeline up to the MapReduce stage using GitHub Actions
[] Use data ingestion tools like Flume to send data from the crawler to HDFS in real time
[x] Write a program in Java to run MapReduce on the JSON file extracted by the crawler to find the frequency of significant words - Textual Analysis
[] Data Classification - create a multi-class data dictionary for sentiment analysis - currently for words (in the future, we might extend it to phrases and sentences for improved accuracy)
[x] Data Prediction - Using the KNN algorithm in Python to find the relation between tweets and their sentiments.
[x] Data Visualization - Using the Python matplotlib library to implement visualization.
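
The word-frequency step in the checklist above can be illustrated with a small map/reduce-style sketch. The project implements this step in Java on Hadoop; the Python below only mirrors the logic and is not the project's code. The names `SIGNIFICANT_WORDS`, `map_tweet`, `reduce_counts`, and `word_frequencies` are hypothetical, the dictionary contents are made up, and the `"text"` field in the sample JSON is an assumption about the crawler output.

```python
import json
import re
from collections import defaultdict

# Hypothetical data dictionary of "significant" words (an assumption;
# the project's real dictionary is not shown in this README).
SIGNIFICANT_WORDS = {"good", "bad", "happy", "sad"}


def map_tweet(text):
    """Map step: emit a (word, 1) pair for each significant word in one tweet."""
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in SIGNIFICANT_WORDS:
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_frequencies(tweets):
    """Run the map step over every tweet, then reduce all emitted pairs."""
    pairs = (pair for text in tweets for pair in map_tweet(text))
    return reduce_counts(pairs)


if __name__ == "__main__":
    # Assumed JSON shape: a list of objects with a "text" field.
    sample = json.loads(
        '[{"text": "Good day, good mood"}, {"text": "Sad news, bad day"}]'
    )
    print(word_frequencies(item["text"] for item in sample))
```

On Hadoop the map and reduce steps run as separate distributed tasks with a shuffle between them; here they are chained in a single process only to show the data flow.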
- pom.xml - Set up Apache Maven
- helloworld.java - Basic Java project setup
- maven.yml - Set up GitHub Actions
- crawler.py - Web crawler in Python to extract Twitter data based on specific hashtags.
- info.csv - Data file created as an output of the crawler and to be sent to the HDFS core for processing
- MapReduce functionalities in Java
- Convolutional Neural Networks
- Decision Tree
- SVM
- Pre-Processing
- Random Forests
- Naive Bayes
- XGBoost
- matplotlib.py - Data Visualization using matplotlib in Python
- Hadoop Setup
This is an open-source project, open to everyone.
Follow these contribution guidelines.
MIT License, copyright Storms In Brewing (2019-2020)