Giter VIP home page Giter VIP logo

preppin-python's Introduction

Preppin' Data Challenges Solutions

This repository contains my solutions to the Preppin' Data challenges.

Preppin' Data is a weekly challenge for data analysts and data scientists to hone their data processing skills using Tableau. Each challenge consists of a scenario and a dataset, and the objective is to clean, transform, and analyze the data to generate the desired output.

I have decided to solve these challenges using Python and SQL, and I have uploaded my solutions to this repository. My solutions include Python scripts, Jupyter notebooks, and SQL scripts that demonstrate how to manipulate and analyze the given datasets to generate the desired outputs.

Each challenge has its own folder, and the folder contains the challenge description, the input dataset, and the output dataset. The Jupyter notebooks are named after the challenge number, and the SQL scripts are named using the format "YYYY_WX.sql" (ie: 2021_W1.sql). The output datasets are named using the format "YYYY_WX.csv".

I hope that my solutions will be helpful to those who are looking to improve their data processing skills using Python and SQL, and I welcome any feedback or suggestions on how to improve my solutions.

Requirements

To run the Python scripts and Jupyter notebooks, you need to have Python 3 installed along with the following libraries:

  • pandas
  • numpy
  • datetime

To run the SQL scripts, you need to have a SQL client installed, such as MySQL Workbench, Microsoft SQL Server Management Studio, or SQLiteStudio.

Usage

To run the Python scripts and Jupyter notebooks, simply navigate to the corresponding challenge folder and run the script using the Python interpreter or the Jupyter notebook server.

To run the SQL scripts, open the script in your SQL client and execute the script against the database of your choice.

2021 Solution

Challenge Python
Week 01 ๐Ÿ
Week 02 ๐Ÿ
Week 03 ๐Ÿ
Week 04 ๐Ÿ

2019 Solution

Challenge Python
Week 01
Week 02 ๐Ÿ
Week 03 ๐Ÿ
Week 04 ๐Ÿ
Week 05 ๐Ÿ
Week 06 ๐Ÿ
Week 07 ๐Ÿ
Week 08 ๐Ÿ
Week 09 ๐Ÿ
Week 10 ๐Ÿ
Week 11 ๐Ÿ
Week 12 ๐Ÿ
Week 13 ๐Ÿ
Week 14 ๐Ÿ

Disclaimer

The solutions provided in this repository are my own and may not be the most efficient or optimal solutions to the challenges. They are intended to showcase how to solve the challenges using Python and SQL and should not be used for production purposes without proper testing and validation. The datasets used in the challenges are the property of Preppin' Data and are used under fair use for educational purposes only.

๐Ÿ Python Snippets

Read all Excel tabs and concat as one dataframe

all_tabs = pd.read_excel('folder\\filename.xlsx', sheet_name=None)

DataFrame Transformations

Unioning dataframes together with concat

import pandas as pd
df_total = pd.concat([df1,df2,df3])

Joining tables

output = pd.merge(left=df_pivot, right=target, on=['Quarter', 'Store'], how='inner')

Pivoting columns to rows

df2 = df2.melt(id_vars=None, value_vars=['Animal Ingredients', 'E Numbers'], var_name='pivot_name', value_name='pivot_value')

Replacing null values with zero, blank, previous or preceeding values

import pandas as pd

# replace nulls with zeroes
df['Column with nulls'] = df['Column with nulls'].fillna(0)

# replace nulls with empty string (blank)
df['Column with nulls'] = df['Column with nulls'].fillna('')

# replace nulls with previous non-null value
df['Column with nulls'] = df['Column with nulls'].fillna(method='ffill')

# replace nulls with next non-null value
df['Column with nulls'] = df['Column with nulls'].fillna(method='bfill')

Aggregrating data

Create aggregrated columns grouped by other columns

import pandas as pd
df = df.groupby(['Col1','Col2']).agg(col3_min=('Col3','min'),col3_max=('Col3','max'),col3_sum=('Col3','sum')).reset_index()

Ranking column | Example:

output['Rank'] = output.groupby('group_col')['order_by_col'].rank(method='dense', ascending=False)

Data Clean-up

Rename single column

import pandas as pd
df.rename( columns={'Col1':'Col1_New_Name'}, inplace=True )

Split a column into multiple columns based on the specified separator

# split the pivot_value column into multiple columns based on the comma separator
df_split = df['pivot_value'].str.split(', ', expand=True)

# concatenate the resulting dataframes with the original dataframe
df_concat = pd.concat([df, df_split], axis=1)

preppin-python's People

Contributors

cuonglam0105 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.