

Data-Wrangling---WeRateDogs-Twitter-Archive

Gathering, cleaning, and analysing Twitter Archive Data.

Introduction

In this project I will gather, assess, clean and analyse data from the WeRateDogs Twitter archive. WeRateDogs is a Twitter account with over 4 million followers that rates people's dogs with humorous captions. Its rating system is unusual: the denominator is almost always 10, while the numerator is almost always greater than 10 (e.g. 11/10, 12/10, 13/10) because "they're good dogs Brent".
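Ratings in this format can be pulled out of tweet text with a regular expression. The sketch below is illustrative only and is not necessarily the approach taken in wrangle_act.ipynb; `extract_rating` and `normalised_rating` are hypothetical helper names. It assumes the first fraction in a tweet is the rating, which can misfire on tweets that also mention dates or phrases such as "24/7".

```python
import re
from typing import Optional, Tuple

# Matches ratings such as "12/10" and decimal numerators such as "13.5/10".
# Assumption: the first fraction found in the text is the rating.
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")


def extract_rating(text: str) -> Optional[Tuple[float, int]]:
    """Return (numerator, denominator) for the first fraction in text, or None."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def normalised_rating(text: str) -> Optional[float]:
    """Return numerator / denominator, or None if no usable rating is found."""
    rating = extract_rating(text)
    if rating is None or rating[1] == 0:  # guard against division by zero
        return None
    numerator, denominator = rating
    return numerator / denominator


# Hypothetical tweet texts, not taken from the archive.
print(extract_rating("Meet Rex. He is a very good boy. 12/10"))
print(normalised_rating("Meet Rex. He is a very good boy. 12/10"))
```

Dividing by the denominator gives a comparable score (12/10 becomes 1.2), which helps when handling the rare tweets whose denominator is not 10.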

Project Overview

The project will follow these steps:

  1. Gathering
  2. Assessment
  3. Cleaning
  4. Storing Cleaned Data
  5. Analysis and Visualisation of Data
  6. Reporting

Project Environment

I used Python 3.8.8 for this project, with Jupyter Notebook as the IDE. The main packages are pandas, numpy, requests, os, json, tweepy and matplotlib.pyplot. I also used additional packages, such as seaborn and skimage, for further analysis and visualisation.

Data

  • 'twitter-archive-enhanced.csv': the enhanced archive data provided by Udacity, which contains extra columns for dog stages (doggo, floofer, pupper and puppo).
  • 'image-predictions.tsv': image predictions compiled by running the tweets' images through a neural network that classifies dog breeds.
  • 'tweet-json.txt': a JSON text file of tweet data gathered from Twitter's API.
  • 'twitter_archive_master.csv': a master dataset created by merging the three datasets above after assessment and cleaning; it is used for analysis and visualisation.
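The merge that produces 'twitter_archive_master.csv' can be sketched with pandas. This is a minimal sketch, not the project's actual code: the helpers `load_tweet_json` and `build_master` are hypothetical, and the column names (`tweet_id` in the CSV and TSV; `id`, `retweet_count` and `favorite_count` in each JSON object; `p1` for the top prediction) are assumptions about the files' layout. It also assumes 'tweet-json.txt' holds one JSON object per line. Hypothetical toy rows stand in for the real files.

```python
import json
from typing import Iterable

import pandas as pd


def load_tweet_json(lines: Iterable[str]) -> pd.DataFrame:
    """Parse line-delimited tweet JSON, keeping the id and engagement counts."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        tweet = json.loads(line)
        # Assumed field names; "id" is renamed so all tables share "tweet_id".
        records.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return pd.DataFrame(records)


def build_master(archive: pd.DataFrame,
                 predictions: pd.DataFrame,
                 counts: pd.DataFrame) -> pd.DataFrame:
    """Inner-join the three sources on tweet_id."""
    master = archive.merge(predictions, on="tweet_id", how="inner")
    return master.merge(counts, on="tweet_id", how="inner")


# Hypothetical toy rows standing in for the real files.
archive = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "rating_numerator": [12, 13, 11],
    "rating_denominator": [10, 10, 10],
})
predictions = pd.DataFrame({"tweet_id": [1, 3], "p1": ["golden_retriever", "pug"]})
counts = load_tweet_json([
    '{"id": 1, "retweet_count": 5, "favorite_count": 20}',
    '',
    '{"id": 3, "retweet_count": 2, "favorite_count": 9}',
])

master = build_master(archive, predictions, counts)
print(master)
```

With the real files, the inputs would instead come from `pd.read_csv("twitter-archive-enhanced.csv")`, `pd.read_csv("image-predictions.tsv", sep="\t")` and `load_tweet_json` applied to the lines of 'tweet-json.txt', and `master.to_csv("twitter_archive_master.csv", index=False)` would store the result. Inner joins keep only tweets present in all three sources (tweet 2 above is dropped because it has no prediction); a left join on the archive would keep every tweet at the cost of missing values.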

Reports

  • wrangle_act.ipynb: this is where all the code for wrangling and analysis is executed.
  • wrangle_act.html: this is the HTML version of the Jupyter notebook.
  • wrangle_report: this is a summary of the steps I took to assess, clean and store the data.
  • act_report.pdf: this includes insights and visualisations from the analysis.

Author

Gorata Malose
LinkedIn: Gorata Malose

License

This project is licensed under the MIT License and it was submitted by Gorata Malose as part of Udacity's Data Analyst Nanodegree programme.
