Giter VIP home page Giter VIP logo

preppin-data's Introduction

Preppin' Data with Python ๐Ÿ, R ๐Ÿดโ€โ˜ ๏ธ, SQL โ„๏ธ, Alteryx โฌ†๏ธ, and Tableau Prep โœจ

Solving Preppin' Data Challenges with Python, and now Alteryx and Tableau Prep.

Status GitHub Issues GitHub Pull Requests License

Twitter ๐Ÿ’ฌย ย ย |ย ย ย LinkedIn ๐Ÿ‘”ย ย ย |ย ย ย GitHub :octocat:ย ย ย |ย ย ย Website ๐Ÿ”—

๐Ÿ…ฐ๏ธ About

At the start of 2021 I wanted to improve my data prep skills with Python. Now in 2022 I'm doing the same for Alteryx and Tableau Prep.

Preppin' Data is a weekly data prep challenge built around Tableau Prep but the challenges apply to other tools and even coding languages, so ideal practice if you're looking to improve your data prep skills.

From participating, I'm much more confident in Python and use Python in projects such as working out the life expectancy of chess pieces and finding the resale value of Pokemon trading cards.

Below are my solutions and Python code snippets I regularly use in these challenges.

โœ… 2023 Solutions

Challenge Python R SQL
Week 01 ๐Ÿ ๐Ÿดโ€โ˜ ๏ธ โ„๏ธ
Week 02 ๐Ÿ ๐Ÿดโ€โ˜ ๏ธ โ„๏ธ
Week 03 ๐Ÿ ๐Ÿดโ€โ˜ ๏ธ โ„๏ธ
Week 04 ๐Ÿ ๐Ÿดโ€โ˜ ๏ธ โ„๏ธ

โœ… 2022 Solutions

Challenge Python Alteryx Tableau Prep
Week 01 ๐Ÿ โฌ†๏ธ โœจ
Week 02 ๐Ÿ โฌ†๏ธ โœจ
Week 03 ๐Ÿ โฌ†๏ธ โœจ
Week 04 ๐Ÿ โฌ†๏ธ โœจ

โœ… 2021 Solutions

Challenge Solution Challenge Solution Challenge Solution Challenge Solution
Week 01 Python Week 14 Python Week 27 Python Week 40 Python
Week 02 Python Week 15 Python Week 28 Python Week 41 Python
Week 03 Python Week 16 Python Week 29 Python Week 42 Python
Week 04 Python Week 17 Python Week 30 Python Week 43 Python
Week 05 Python Week 18 Python Week 31 Python Week 44 Python
Week 06 Python Week 19 Python Week 32 Python Week 45 Python
Week 07 Python Week 20 Python Week 33 Python Week 46 Python
Week 08 Python Week 21 Python Week 34 Python Week 47 Python
Week 09 Python Week 22 Python Week 35 Python Week 48 Python
Week 10 Python Week 23 Python Week 36 Python Week 49 Python
Week 11 Python Week 24 Python Week 37 Python Week 50 Python
Week 12 Python Week 25 Python Week 38 Python Week 51 Python
Week 13 Python Week 26 Python Week 39 Python Week 52 Python

๐Ÿ Python Snippets

Reading Files

Reading csv files | Example: W05 2021

import pandas as pd

df = pd.read_csv('folder\\filename.csv')

Reading Excel files | Example: W04 2021

import pandas as pd

df = pd.read_excel('folder\\filename.xlsx', engine='openpyxl', sheet_name = 'Sheet1')

Reading and aggregrating multiple Excel tabs | Example: W21 2021

import pandas as pd

# Read all Excel tabs and concat as one dateframe
all_tabs = pd.read_excel('folder\\filename.xlsx', sheet_name=None)

# Bring all the sheets together
all_dfs = []
for tab_name, df in all_tabs.items():
    df['sheet_name'] = tab_name
    all_dfs.append(df)
    combined_df = pd.concat(all_dfs, ignore_index=True)

Skipping rows and columns in an Excel tab | Example: W48 2021

import pandas as pd

df = pd.read_excel('folder\\filename.xlsx', engine='openpyxl', sheet_name='Sheet1',nrows= 3,skiprows = range(1,7), usecols = "B:D")

Writing Files

Writing csv files with utf-8 encoding | Example: W10 2021

import pandas as pd

df.to_csv('folder\\filename.csv', encoding='utf-8-sig', index=False)

Writing Excel files | Example: W14 2021

import pandas as pd

with pd.ExcelWriter('folder\\filename.xlsx') as writer:  
    df_1.to_excel(writer, sheet_name='Sheet1', index=False)
    df_2.to_excel(writer, sheet_name='Sheet2', index=False)
    df_3.to_excel(writer, sheet_name='Sheet3', index=False)

DataFrame Transformations

Unioning dataframes together with concat

import pandas as pd

df_total = pd.concat([df1,df2,df3])

Replacing null values with zero, blank, previous or preceeding values

import pandas as pd

# replace nulls with zeroes
df['Column with nulls'] = df['Column with nulls'].fillna(0)

# replace nulls with empty string (blank)
df['Column with nulls'] = df['Column with nulls'].fillna('')

# replace nulls with previous non-null value
df['Column with nulls'] = df['Column with nulls'].fillna(method='ffill')

# replace nulls with next non-null value
df['Column with nulls'] = df['Column with nulls'].fillna(method='bfill')

Aggregrating data

Create aggregrated columns grouped by other columns

import pandas as pd

df = df.groupby(['Col1','Col2']).agg(col3_min=('Col3','min'),col3_max=('Col3','max'),col3_sum=('Col3','sum')).reset_index()

Data Clean-up

Rename single column

import pandas as pd

df.rename( columns={'Col1':'Col1_New_Name'}, inplace=True )

preppin-data's People

Contributors

wjsutton avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.