
ammarshaikh123 / projects-on-data-cleaning-and-manipulation


This repository contains projects I have worked on for Data Cleaning and Manipulation in Python.

Jupyter Notebook 99.74% Python 0.23% CSS 0.03%
data-science data-analysis data-mining feature-engineering feature-selection data-cleaning data-preprocessing machine-learning data-visualization imputation


Data-Cleaning-and-Manipulation

This repo contains my projects on Data Cleaning and Manipulation. Each project covers a different topic; you can find a short description of each one below.

1) A New Era of Data Analysis in Baseball

In this notebook, we wrangle, analyze, and visualize Statcast data to compare two baseball players, Aaron Judge and Giancarlo Stanton. We use data visualizations such as scatter plots, KDE plots, and 2D histograms, and write Python functions (with 'def') to build the 2D histograms.
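A reusable 2D-histogram helper of the kind described above might look like the sketch below. It assumes Statcast-style pitch-location columns named 'plate_x' and 'plate_z' and only computes the bin counts with NumPy, so the same function can be called once per player and the results plotted side by side. The function name is hypothetical, not taken from the notebook.

```python
import numpy as np
import pandas as pd


def zone_histogram(df: pd.DataFrame, bins: int = 5):
    """Count pitches in each cell of a bins x bins grid over pitch location.

    Assumes Statcast-style columns 'plate_x' (horizontal position) and
    'plate_z' (vertical position); rows missing either value are dropped.
    """
    loc = df[["plate_x", "plate_z"]].dropna()
    counts, x_edges, z_edges = np.histogram2d(
        loc["plate_x"].to_numpy(), loc["plate_z"].to_numpy(), bins=bins
    )
    return counts, x_edges, z_edges
```

To draw it, pass the edges and the transposed counts to matplotlib, e.g. `plt.pcolormesh(x_edges, z_edges, counts.T)`; the transpose is needed because `np.histogram2d` puts the x bins along the first axis.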

2) A Visual History of Nobel Prize Winners

A very interesting project I worked on. Here I analyzed past Nobel Prize winners and tried to draw insights such as how many men and women won the prize, how many of the winners were from the USA, how many won the prize more than once, and how dominant particular countries and genders were among the winners. To answer these questions we used data manipulation techniques such as group by and value counts, along with data visualization techniques such as line plots and lmplots.
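The questions listed above map directly onto `value_counts` and `groupby`. The sketch below assumes one row per prize with columns 'full_name', 'sex', 'year' and 'birth_country', and assumes the USA is recorded as "United States of America"; both the column names and that label are assumptions about the dataset, and the function is hypothetical.

```python
import pandas as pd


def nobel_summary(nobel: pd.DataFrame) -> dict:
    """Answer the project's questions with value_counts and groupby.

    Assumes one row per prize awarded, with columns 'full_name', 'sex',
    'year' and 'birth_country'.
    """
    sex_counts = nobel["sex"].value_counts()
    # share of prizes won by laureates born in the USA (label is assumed)
    usa_share = (nobel["birth_country"] == "United States of America").mean()
    wins = nobel["full_name"].value_counts()
    repeat_winners = sorted(wins[wins > 1].index)
    # group prizes by decade, e.g. 1903 -> 1900
    decade = (nobel["year"] // 10) * 10
    female_share = (nobel["sex"] == "Female").groupby(decade).mean()
    return {
        "sex_counts": sex_counts,
        "usa_share": usa_share,
        "repeat_winners": repeat_winners,
        "female_share_by_decade": female_share,
    }
```

Calling `.plot()` on `female_share_by_decade` gives the kind of line plot mentioned in the description.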

3) AB Testing with Cookie Cats

In this project we performed AB testing on a popular mobile game called Cookie Cats. As players progress through the game, they occasionally encounter gates that force them to wait a non-trivial amount of time or make an in-app purchase to progress. Using AB testing, we analyzed whether it is better to place the gate at level 30 or at level 40 by comparing several performance metrics.
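One common way to compare the two gate versions is bootstrapping the difference in retention. The sketch below assumes columns 'version' (with values 'gate_30' and 'gate_40') and a boolean retention column such as 'retention_1'; the column names and the function are assumptions. It resamples each version separately, which is one bootstrap variant and not necessarily the exact resampling used in the notebook.

```python
import numpy as np
import pandas as pd


def bootstrap_retention_diff(df, col="retention_1", n_boot=1000, seed=0):
    """Bootstrap the retention difference between the two gate versions.

    Assumes columns 'version' ('gate_30' / 'gate_40') and a boolean
    retention column. Each group is resampled separately, with replacement.
    Returns the bootstrap differences (gate_30 minus gate_40) and the share
    of resamples in which gate_30 retained more players.
    """
    rng = np.random.default_rng(seed)
    a = df.loc[df["version"] == "gate_30", col].to_numpy(dtype=float)
    b = df.loc[df["version"] == "gate_40", col].to_numpy(dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # resample each group to its own size, with replacement
        diffs[i] = rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
    return diffs, float((diffs > 0).mean())
```

A share close to 1.0 means that keeping the gate at level 30 gave higher retention in nearly every resample.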

4) Dr. Semmelweis and the Discovery of Handwashing

In this project we analyze Dr. Semmelweis's famous discovery of handwashing and how it helped reduce the number of deaths among women giving birth. We used visualizations and various mathematical functions in order to measure how much the discovery reduced the death rate.
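The core calculation compares the monthly proportion of deaths before and after handwashing was introduced. The sketch below assumes monthly data with 'date', 'births' and 'deaths' columns; the default start date of 1847-06-01 is an assumed parameter that you can change, and the function itself is hypothetical.

```python
import pandas as pd


def handwashing_effect(monthly: pd.DataFrame, start: str = "1847-06-01") -> float:
    """Change in the mean monthly proportion of deaths after handwashing began.

    Assumes columns 'date', 'births' and 'deaths'. The start date is a
    parameter; the default is an assumption, not a sourced fact.
    """
    dates = pd.to_datetime(monthly["date"])
    proportion = monthly["deaths"] / monthly["births"]
    cutoff = pd.Timestamp(start)
    before = proportion[dates < cutoff].mean()
    after = proportion[dates >= cutoff].mean()
    # negative values mean fewer deaths per birth after the cutoff
    return after - before
```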

5) Exploring 67 years of LEGO

One of the first and most basic projects I did: I used simple data manipulation techniques to analyze the colors of LEGO bricks and the different LEGO sets built over the years.
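The "simple data manipulation" here is mostly grouping and counting. As a sketch, assuming a colors table with 'name' and 'is_trans' ('t'/'f') columns and a sets table with 'set_num' and 'year' columns (column names assumed, function hypothetical):

```python
import pandas as pd


def lego_summary(colors: pd.DataFrame, sets: pd.DataFrame):
    """Count transparent vs. opaque colors and LEGO sets released per year.

    Assumes a colors table with 'name' and 'is_trans' ('t'/'f') columns and
    a sets table with 'set_num' and 'year' columns.
    """
    transparency = colors.groupby("is_trans")["name"].count()
    # nunique guards against a set number appearing twice in the same year
    sets_per_year = sets.groupby("year")["set_num"].nunique()
    return transparency, sets_per_year
```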

6) Exploring the Bitcoin Cryptocurrency Market

In this project we explore data on the Bitcoin and wider cryptocurrency market. We look at the market capitalization of the top cryptocurrencies and analyze how volatile the market is. We also looked at the values these cryptocurrencies started from and how quickly their prices plunged.
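Both questions, which coins dominate and which moved the most, can be answered with `nlargest` and `nsmallest`. The sketch assumes a snapshot table with 'id', 'market_cap_usd' and 'percent_change_24h' columns; these names and the function are assumptions, not the notebook's code.

```python
import pandas as pd


def market_overview(cap: pd.DataFrame, top_n: int = 10):
    """Top coins' share of total market cap and the biggest 24h losers.

    Assumes columns 'id', 'market_cap_usd' and 'percent_change_24h'.
    Coins without a market cap are dropped before computing shares.
    """
    cap = cap.dropna(subset=["market_cap_usd"])
    total = cap["market_cap_usd"].sum()
    top = cap.nlargest(top_n, "market_cap_usd").set_index("id")
    share_pct = top["market_cap_usd"] / total * 100
    # the most negative 24h changes give a quick view of volatility
    losers = cap.nsmallest(top_n, "percent_change_24h").set_index("id")
    return share_pct, losers["percent_change_24h"]
```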

7) Exploring the evolution of Linux

In this notebook, we analyzed the evolution of a very famous open-source project: the Linux kernel. The Linux kernel is at the heart of Linux distributions such as Debian, Ubuntu, and CentOS.

We gain some first insights into the development effort by:

  1. Identifying the top 10 contributors, and
  2. Visualizing the commits over the years.
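Both steps above reduce to counting rows of a commit log. This sketch assumes one row per commit, with an 'author' column and a 'timestamp' column in Unix epoch seconds; the layout and function name are assumptions.

```python
import pandas as pd


def contributor_stats(git_log: pd.DataFrame, top_n: int = 10):
    """Top contributors by commit count and number of commits per year.

    Assumes one row per commit with an 'author' column and a 'timestamp'
    column holding Unix epoch seconds.
    """
    top_authors = git_log["author"].value_counts().head(top_n)
    years = pd.to_datetime(git_log["timestamp"], unit="s").dt.year
    commits_per_year = years.value_counts().sort_index()
    return top_authors, commits_per_year
```

Calling `commits_per_year.plot(kind="bar")` produces the commits-over-the-years chart. Real logs may contain implausible timestamps, so it can be worth filtering out years that fall outside the project's lifetime before plotting.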

8) Exploring the Ames, Iowa dataset

In this project we explored the breath alcohol tests conducted in Ames, Iowa. We analyzed at what time of day and in which months the tests were mostly conducted, and tried to find out whether there was any pattern in when the tests took place.

9) Which Debts Are Worth the Bank's Effort?

In this project I analyzed the different recovery procedures the bank uses for various loan categories. Using statistical tests and exploratory graphical analysis, I tried to find out whether the money spent on running these procedures actually pays off for the bank.
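The README does not say which statistical tests were used. As an illustration only, the sketch below uses a permutation test to ask whether debts just above an expected-recovery threshold, where the recovery strategy would change, bring in more money than debts just below it. The column names, the $1000 threshold, the $100 window and the function are all assumptions.

```python
import numpy as np
import pandas as pd


def threshold_permutation_test(df, threshold=1000.0, window=100.0,
                               n_perm=2000, seed=0):
    """Permutation test on actual recovery just above vs. just below a threshold.

    Assumes columns 'expected_recovery_amount' and 'actual_recovery_amount'.
    Only debts within +/- `window` of `threshold` are compared, so the two
    groups differ mainly in which side of the threshold they fall on.
    Returns the observed difference in means (above minus below) and a
    two-sided p-value.
    """
    near = df[(df["expected_recovery_amount"] - threshold).abs() <= window]
    above_mask = (near["expected_recovery_amount"] >= threshold).to_numpy()
    values = near["actual_recovery_amount"].to_numpy(dtype=float)
    observed = values[above_mask].mean() - values[~above_mask].mean()

    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_perm):
        # shuffle the group labels and recompute the difference in means
        shuffled = rng.permutation(above_mask)
        diff = values[shuffled].mean() - values[~shuffled].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # add-one correction keeps the p-value strictly positive
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

A small p-value together with a positive difference would suggest that the higher-cost strategy above the threshold recovers more money. Whether that extra recovery covers the strategy's cost is a separate comparison.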

10) TV, Halftime Shows, and the Big Game

A very interesting project. I analyzed all the performers who have played at the Super Bowl halftime show and looked at parameters such as bands or singers who have performed more than once and the number of songs performed during halftime. I also looked at how Super Bowl viewership has evolved and how likely viewers are to stay until the end of the game.
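Finding repeat performers and their song counts is a groupby plus a filter. The sketch assumes one row per musician per Super Bowl, with columns 'super_bowl', 'musician' and 'num_songs' (NaN where unknown); these names and the function are assumptions.

```python
import pandas as pd


def repeat_performers(musicians: pd.DataFrame) -> pd.DataFrame:
    """Musicians with more than one halftime appearance, and their song counts.

    Assumes one row per musician per Super Bowl, with columns 'super_bowl',
    'musician' and 'num_songs' (NaN where the count is unknown).
    """
    appearances = musicians.groupby("musician")["super_bowl"].nunique()
    repeat = appearances[appearances > 1].index
    # sum() skips NaN, so unknown song counts contribute nothing
    songs = (musicians[musicians["musician"].isin(repeat)]
             .groupby("musician")["num_songs"].sum())
    return pd.DataFrame({"appearances": appearances[repeat], "total_songs": songs})
```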

11) The GitHub History of the Scala Language

Scala is an open-source project. Open-source projects have the advantage that their entire development histories (who made changes, what was changed, code reviews, etc.) are publicly available.

In this project I read, cleaned, and visualized the real-world project repository of Scala, which spans data from a version control system (Git) as well as a project hosting site (GitHub). We found out who had the most influence on its development and who the experts were.
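Identifying the experts on a file means joining pull requests to the files they touched. The sketch assumes a pulls table with 'pid' and 'user' columns and a pull-files table with 'pid' and 'file' columns; the table layout and function name are assumptions.

```python
import pandas as pd


def file_experts(pulls: pd.DataFrame, pull_files: pd.DataFrame,
                 path: str, top_n: int = 3) -> pd.Series:
    """Users who submitted the most pull requests touching a given file.

    Assumes a pulls table with 'pid' and 'user' columns and a pull_files
    table with 'pid' and 'file' columns, linked by the pull request id.
    """
    merged = pulls.merge(pull_files, on="pid")
    touching = merged[merged["file"] == path]
    # count each pull request once per user, even if it lists the file twice
    return (touching.drop_duplicates(["pid", "user"])["user"]
            .value_counts().head(top_n))
```

Overall influence can be measured in the same spirit with `pulls["user"].value_counts()`.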

12) Suicide Analytics

In this project I used data from the Government of India website on suicide rates and analyzed different parameters, which I have explained in my blog here. I used web scraping as well as data manipulation to bring out the insights for this project.
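The README does not name the scraping tools or the page layout. As a minimal sketch that assumes the data sits in a plain HTML table, the code below uses only the standard library's `html.parser` to pull out the cell text row by row and then cleans comma-formatted counts. In practice `pandas.read_html` or BeautifulSoup are common alternatives. The class and helper names are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str):
    """Return the table's rows as lists of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_number(text: str) -> int:
    """Turn a scraped count like '1,234' into an int."""
    return int(text.replace(",", ""))
```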

13) World Cup History

One of the projects I thoroughly enjoyed working on. I used web scraping to get raw data from a cricket stats website, then formatted and analyzed that data and gathered some really cool insights, which you can read here.
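Formatting scraped cricket data usually means turning scorecard strings into numbers. This sketch assumes batting scores arrive as strings that follow the usual scorecard convention, where a trailing '*' marks a not-out innings and entries such as 'DNB' (did not bat) carry no runs. The exact format of the site used is not stated, and the function names are hypothetical.

```python
import re

SCORE = re.compile(r"^(\d+)(\*?)$")


def parse_score(raw: str):
    """Split a scraped batting score into (runs, not_out).

    '89*' -> (89, True), '45' -> (45, False); anything else, such as
    'DNB' (did not bat) or '-', -> (None, False).
    """
    match = SCORE.match(raw.strip())
    if match is None:
        return None, False
    return int(match.group(1)), match.group(2) == "*"


def batting_average(raw_scores):
    """Total runs divided by the number of dismissals (None if never out)."""
    parsed = [parse_score(s) for s in raw_scores]
    runs = sum(r for r, _ in parsed if r is not None)
    outs = sum(1 for r, not_out in parsed if r is not None and not not_out)
    return runs / outs if outs else None
```

For example, `batting_average(["89*", "45", "DNB", "10"])` counts 144 runs over 2 dismissals, giving 72.0.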

A few snapshots of the visualizations I created in the projects mentioned above.

