Twitter Data Analysis

What Is Twitter Data?

Twitter data is the information collected about a Tweet: who posted it, the access point it was posted from, what the post contains, and how other users view or interact with it. The definition sounds somewhat vague largely because of the massive amount of data that can be collected from a single Tweet.

With this information, you can learn your audience's demographics, the total clicks on your profile, or how many people saw your Tweet. This is just the tip of the iceberg, but understanding the data shows you how your content is used and what patterns it follows.

How is the data collected?

Twitter is the major source of data for our challenge. We were provided with pre-downloaded data on topics related to economic hardship. The data comes in two parts.

1. The first is around 100MB of raw Twitter data in JSON format. This data was collected using the following keywords: [‘inflation’, ‘fuelprice’, ‘fuelpricehike’, ‘fuelprices’, ‘fuelshortage’, ‘foodprice’, ‘oilprice’, ‘oilprices’, ‘cookingoilprice’, ‘unemployment’, ‘unemploymentrate’, ‘economiccrisis’, ‘economichardship’]

2. The second is around 300MB in the same format, but collected using the original keywords plus country-specific geocodes, e.g. ‘0.0263,37.9062,530km’ for Kenya.
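A geocode such as the Kenya example above can be split into its parts before it is used as a filter. The sketch below assumes the string follows a "latitude,longitude,radius" layout; `parse_geocode` is a hypothetical helper, not a function from this repository.

```python
def parse_geocode(geocode):
    """Split a 'latitude,longitude,radius' string into its three parts.

    Assumption: the first two fields are decimal degrees and the third
    is a radius with its unit attached (e.g. '530km').
    """
    latitude, longitude, radius = (part.strip() for part in geocode.split(","))
    return float(latitude), float(longitude), radius


# The Kenya geocode quoted in the data description.
kenya = parse_geocode("0.0263,37.9062,530km")
print(kenya)
```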

Extracting tweets from Twitter raw JSON file

To load the data from the JSON file, we first need to install the required libraries. We then load the Twitter data into a pandas DataFrame using Python functions such as find_status_count(), find_hashtags(), and find_retweeted_text(). Using these functions, we append the values extracted from every tweet to a list and, at the end, save the extracted data as a CSV file.
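The extraction flow described above might look roughly like the following. This is a minimal sketch, not the project's actual code: it assumes the dump holds one tweet per line and that each tweet uses the `user.statuses_count`, `entities.hashtags` and `retweeted_status.text` fields of the Twitter v1.1 JSON layout. The function bodies are guesses at what the named functions do, and `extract_rows` and `write_csv` are hypothetical helpers. To stay self-contained it writes the CSV with the standard library; the same `rows` list could be passed to `pandas.DataFrame(rows)` instead.

```python
import csv
import io
import json

COLUMNS = ["status_count", "hashtags", "retweeted_text"]


def find_status_count(tweet):
    """Number of statuses posted by the tweet's author (0 if missing)."""
    return tweet.get("user", {}).get("statuses_count", 0)


def find_hashtags(tweet):
    """Hashtag texts attached to the tweet, without the leading '#'."""
    tags = tweet.get("entities", {}).get("hashtags", [])
    return [tag["text"] for tag in tags]


def find_retweeted_text(tweet):
    """Text of the original tweet if this one is a retweet, else None."""
    original = tweet.get("retweeted_status")
    return original.get("text") if original else None


def extract_rows(lines):
    """Turn an iterable of JSON strings (one tweet each) into flat rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        tweet = json.loads(line)
        rows.append({
            "status_count": find_status_count(tweet),
            "hashtags": ",".join(find_hashtags(tweet)),
            "retweeted_text": find_retweeted_text(tweet),
        })
    return rows


def write_csv(rows, handle):
    """Write the extracted rows to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Two made-up tweets standing in for lines of the raw dump.
sample = [
    json.dumps({"user": {"statuses_count": 120},
                "entities": {"hashtags": [{"text": "inflation"},
                                          {"text": "fuelprice"}]}}),
    json.dumps({"user": {"statuses_count": 7},
                "entities": {"hashtags": []},
                "retweeted_status": {"text": "Fuel prices are up again"}}),
]
rows = extract_rows(sample)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

With a real dump, `extract_rows` would take an open file object in place of `sample`, and `write_csv` would take a file opened with `newline=""`.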


Project Structure

  • images/: the folder where all snapshots for the project are stored.
  • models/: the folder where script logs are stored.
  • data/: the folder where the dataset files are stored.
  • .github/: the folder where GitHub Actions and unit tests are integrated.
  • .vscode/: the folder where local paths are stored.
  • notebooks/: the folder containing a Jupyter notebook for preprocessing the data.
  • scripts/: the folder where modules are stored.
  • tests/: the folder containing unit tests for the scripts.

Root folder

  • requirements.txt: a text file listing the project's dependencies.
  • .travis.yml: a configuration file for running the unit tests on Travis CI.
  • setup.py: a configuration file for installing the scripts as a package.
  • README.md: Markdown text with a brief explanation of the project and the repository structure.

Installation guide

Conda Environment

conda create --name mlenv python=3.8
conda activate mlenv

Then clone the repository and install the package inside the activated environment:

git clone https://github.com/Ammon21/Twitter-Data-Analysis.git
cd Twitter-Data-Analysis
python setup.py install

Contributors

10acad, abuton, ammon21, jbkwizera, kiiru-anastasia, yabebalfantaye
