real_time_social_media_mining's Introduction

DevOps pipeline for Real Time Social/Web Mining

Workflow

[x] Setting up Apache Maven for Java project - User Interface and MapReduce functions

[x] Setting up GitHub repository workflow

[x] Setting up GitHub Actions for automation

[x] Creating a web crawler in Python using the Tweepy library to fetch Twitter data matching given search parameters (for example, hashtags).

[] Create a User Interface

[x] Create an HDFS cluster for MapReduce functionality and program Hadoop MapReduce in Java

[x] Set up Hadoop Core and create Job Tracker and Task Trackers for the project

[x] Implement MapReduce on HDFS using Java to count how often the significant words from the data dictionary appear in tweet text

[x] Configure Apache Maven with MapReduce codes and install Apache Hadoop Jar dependency

[x] Configure MapReduce code in GitHub Actions for automation

[x] Automate the Big Data pipeline up to the MapReduce stage using GitHub Actions

[] Use data ingestion tools like Flume to send data from the crawler to HDFS in real time

[x] Write a Java program that runs MapReduce on the JSON file extracted by the crawler to find the frequency of significant words - Textual Analysis

[] Data Classification - create a multi-class data dictionary for sentiment analysis - currently for words (in the future, it may be extended to phrases and sentences for improved accuracy)

[x] Data Prediction - Using the KNN algorithm in Python to find the relation between tweets and their sentiments.

[x] Data Visualization - Using the Python matplotlib library to implement visualization.
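
The word-frequency step in the checklist above can be sketched in a few lines. The project implements it in Java on Hadoop; the Python below only mimics the map, shuffle, and reduce phases in memory to show the logic. The `SIGNIFICANT_WORDS` set, the sample tweets, and all function names are hypothetical and not taken from the repository.

```python
import re
from collections import defaultdict

# Hypothetical data dictionary of significant words (assumption for illustration).
SIGNIFICANT_WORDS = {"good", "bad", "happy", "sad"}


def map_phase(tweet):
    """Emit (word, 1) for every significant word found in one tweet."""
    for word in re.findall(r"[a-z']+", tweet.lower()):
        if word in SIGNIFICANT_WORDS:
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(values) for word, values in grouped.items()}


def word_frequencies(tweets):
    """Run map, shuffle, and reduce over a list of tweet strings."""
    pairs = (pair for tweet in tweets for pair in map_phase(tweet))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Made-up sample tweets, not real crawler output.
    sample = ["Good day, good mood", "Sad news today", "So happy and good"]
    print(word_frequencies(sample))
```

On a real cluster the map and reduce functions run on separate nodes and the shuffle is handled by the framework; only the per-record logic carries over from this sketch.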

Important Source files and dependencies

  1. pom.xml - Setup Apache Maven

  2. helloworld.java - Basic Java project setup

  3. maven.yml - setup GitHub Actions

  4. crawler.py - Web crawler in Python to extract Twitter data based on specific hashtags.

  5. info.csv - data file created as output by the crawler, to be sent to the HDFS core for processing

  6. MapReduce functionalities in Java

  7. Sentiment Analysis in Python
     • Convolutional Neural Networks
     • Decision Tree
     • SVM
     • Pre-Processing
     • Random Forests
     • Naive Bayes
     • XGBoost

  8. matplotlib.py - Data visualization using matplotlib in Python

  9. Hadoop Setup
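
As a rough illustration of the Python sentiment step, here is a minimal k-nearest-neighbours classifier over bag-of-words counts, the algorithm named in the workflow. It uses only the standard library; the training tweets, labels, and function names are made up for the example, and the repository's own feature extraction and models may differ.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a tweet into a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def distance(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def knn_predict(train, text, k=3):
    """Return the majority label among the k training tweets nearest to text."""
    query = vectorize(text)
    nearest = sorted(train, key=lambda item: distance(vectorize(item[0]), query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled tweets (assumption for illustration only).
TRAIN = [
    ("i love this phone", "positive"),
    ("love the great camera", "positive"),
    ("what a great day", "positive"),
    ("i hate this phone", "negative"),
    ("terrible battery i hate it", "negative"),
    ("what a terrible day", "negative"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "i love this great camera"))
```

With an odd `k` and two classes the vote cannot tie; for more sentiment classes a tie-breaking rule would be needed.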

How to Contribute

This is an open-source project, and contributions from everyone are welcome.

Follow these contribution guidelines.

License

MIT License. Copyright (c) 2019-2020 Storms In Brewing.

real_time_social_media_mining's People

Contributors

krannaut, megharawat3, nishkarshraj
