
Instacart-Python-Analysis

A Python-based prescriptive analysis of the Instacart database for the Career Foundry Data Analysis course using Jupyter Notebook.

Project Overview

Instacart, an online grocery store that operates through an app, is considering a targeted marketing strategy. The stakeholders want to know more about their sales patterns, especially the variety of customers in their database and those customers' purchasing behaviours.

They have requested an initial exploratory analysis of some of their data in order to derive insights and suggest strategies for better segmentation based on the provided criteria.

To address this objective and the key questions below, the Instacart data sets were imported into Jupyter Notebook and analysed using Python.

Key Questions

  • What are the busiest days of the week and hours of the day?
  • Are there particular times of the day when people spend the most money?
  • Can the data be clustered into simpler price range groupings to help direct their efforts?
  • Are there certain types of products that are more popular than others?
  • What’s the distribution among users with regard to their brand loyalty?
  • Are there differences in ordering habits based on a customer’s loyalty status?
  • Are there differences in ordering habits based on a customer’s region?
  • Is there a connection between age and family status in terms of ordering habits?
  • What different classifications does the demographic information suggest?
  • What differences can you find in ordering habits of different customer profiles?
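As an illustration of how the first few questions could be approached in pandas, here is a minimal sketch. It uses a tiny made-up data frame, not the project data. The column names (`order_id`, `order_dow`, `order_hour_of_day`, `prices`) and the price thresholds are assumptions chosen for the example and are not taken from the project notebooks.

```python
import pandas as pd

# Toy stand-in for the merged data set; column names are assumed.
df = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 4],
    "order_dow": [0, 0, 1, 0, 0, 6],
    "order_hour_of_day": [10, 10, 14, 10, 10, 9],
    "prices": [3.5, 12.0, 7.2, 20.0, 4.9, 15.5],
})

# Busiest day: count distinct orders (not product rows) per day of week.
orders_per_day = df.groupby("order_dow")["order_id"].nunique()
busiest_day = orders_per_day.idxmax()

# Busiest hour, counted the same way.
orders_per_hour = df.groupby("order_hour_of_day")["order_id"].nunique()
busiest_hour = orders_per_hour.idxmax()

# Simple price range grouping; the bin edges are hypothetical.
df["price_range"] = pd.cut(
    df["prices"],
    bins=[0, 5, 15, float("inf")],
    labels=["low", "mid", "high"],
)
```

Counting distinct order IDs rather than rows matters once orders are merged with products, because each order then appears once per product it contains.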

Data Analysis Tasks

  • Data cleaning, wrangling and consistency checks
  • Consider data ethics when dealing with customer information
  • Merge all data sets into a single data set (over 32M records)
  • Conduct exploratory data analysis
  • Derive new variables and create flags (grouping and aggregating the data) to help with the analysis
  • Create visualisations communicating insights for stakeholders
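A flag of the kind mentioned above can be derived by aggregating per customer and mapping the result to a label. The sketch below is one possible way to build a loyalty flag. The column names (`user_id`, `order_number`), the labels and the thresholds are all hypothetical and are not the values used in the project.

```python
import pandas as pd

# Toy orders table; column names are assumed.
orders = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "order_number": [1, 2, 3, 1, 2, 1],
})

# Aggregate: highest order number per customer, broadcast back to every row.
orders["max_order"] = orders.groupby("user_id")["order_number"].transform("max")


def loyalty_label(max_order: int) -> str:
    """Map a customer's order count to a label (thresholds are illustrative)."""
    if max_order >= 3:
        return "loyal"
    if max_order >= 2:
        return "regular"
    return "new"


# Flag: one label per row, identical for all rows of the same customer.
orders["loyalty_flag"] = orders["max_order"].map(loyalty_label)
```

Using `transform` keeps the result aligned with the original rows, so the flag can be added as a column without a separate merge.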

Tools

The full analysis was carried out using the following Python libraries in Jupyter Notebook:

  • pandas | numpy | os | matplotlib.pyplot | seaborn | scipy | plotly.express

The results were then exported or copied into Excel.

Data

The Instacart project brief and original data sets were provided by Career Foundry.

The following CSV files are in the folder 02_Data/02_1_Original_Data

The two larger CSV files are available to download

Project Findings

The Instacart Population Flow Diagram

The grey boxes in the first row represent the four original data sets that were eventually merged into the final data set, Orders_products_all.
The second row of coloured boxes represents the data sets after the cleaning process.
The final row represents the data sets after each of the merges.
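One common way to check each merge step in such a flow is to use the `indicator` option of pandas `merge`, which records whether each row was found in the left table, the right table or both. The sketch below assumes a shared key named `order_id` and uses made-up rows. It is not the project's own merge code.

```python
import pandas as pd

# Toy tables standing in for two of the cleaned data sets; names are assumed.
orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [10, 10, 11]})
order_products = pd.DataFrame({
    "order_id": [1, 1, 2, 4],
    "product_id": [100, 101, 100, 102],
})

# An outer merge keeps unmatched rows from both sides; the indicator adds a
# "_merge" column with the values "both", "left_only" or "right_only".
merged = orders.merge(order_products, on="order_id", how="outer", indicator=True)
match_counts = merged["_merge"].value_counts()

# Keep only fully matched rows for the next step and drop the helper column.
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
```

Inspecting `match_counts` before filtering shows how many records would be lost at each merge, which is the information a population flow diagram summarises.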

Report and Datasets Sent to Client

Also available in PDF format

The two client data files, Analysis_all_customers and Analysis_active_customers, would have been in folder 05_Sent_to_Client but are too large (over 8 GB). They can be downloaded along with the interactive treemap visualisations:

Project Recommendations

Recommendations_Page1.png Recommendations_Page2.png

Elsa Ekevall's Projects

gameco-excel-analysis

Descriptive analysis using Microsoft Excel to interrogate the VGChartz data set for the Career Foundry Data Analysis Immersion course.

instacart-python-analysis

A Python-based prescriptive analysis of the Instacart database for the Career Foundry Data Analysis Immersion course using Jupyter Notebook.

lc-rhein-main-hackathon

GitHub repository for members participating in the March 2023 Deutsche Bahn Bikesharing Data hackathon

rockbuster-sql-analysis

An SQL-based descriptive analysis of the Rockbuster database for the CareerFoundry Data Analysis Immersion course using PostgreSQL.

uk-household-food-shopping

Advanced analysis and dashboard design: a machine learning analysis using Python and Tableau to analyse data on income and food spending.
