This project forked from ibm/analyze-customer-data-spark-pixiedust

An introductory IBM Developer Code Pattern on how to use PixieDust to visualize customer data

Home Page: https://developer.ibm.com/patterns/analyze-historical-shopping-data-spark-pixiedust-jupyter-notebook/

License: Apache License 2.0

Analyze customer data using Jupyter notebooks, Apache Spark, and PixieDust

In this code pattern, historical shopping data is analyzed with Spark and PixieDust. The data is loaded, cleaned, and then analyzed by creating various charts and maps.

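When you have completed this code pattern, you will understand how to load customer data into a Jupyter notebook, transform it with Apache Spark, and create charts and maps with PixieDust.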

The intended audience is anyone interested in quickly analyzing data in a Jupyter notebook.

Flow

  1. Log in to IBM Watson Studio
  2. Load the provided notebook into Watson Studio
  3. Load the customer data in the notebook
  4. Transform the data with Apache Spark
  5. Create charts and maps with PixieDust

About the data

  • x19_income_select.csv: Household income statistics for many categories of income, including wages, interest, social security, public assistance, and retirement. Compiled at the zip code geography level by the United States Census Bureau. Available as a data set on Watson Studio.
  • customers_orders1_opt.csv: Fictitious customer demographics and sales data. Published by IBM. Available as a data set on Watson Studio.

Included Components

  • IBM Watson Studio: a suite of tools and a collaborative environment for data scientists, developers and domain experts
  • PixieDust: an open-source Python package that provides support for JavaScript/Node.js code.

Steps

  1. Create a project
  2. Create a notebook
  3. Load customer data in the notebook
  4. Transform the data with Apache Spark
  5. Create charts and maps with PixieDust

1. Create a project and add the Spark services

  • Log in to IBM Watson Studio. Once in, you'll land on the dashboard.

  • Create a new project by clicking + New project and choosing Data Science:

  • Enter a name for the project and click Create.

  • NOTE: When you create a project in Watson Studio, a free-tier Object Storage service and a Watson Machine Learning service are created in your IBM Cloud account. Select the Free storage type to avoid fees.

  • After the project is created, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs; we'll use them to associate the project with external assets (data sets and notebooks) and with IBM Cloud services.

2. Create a notebook

3. Load customer data in the notebook

  • Run the cells one at a time. Select the first cell and press the (►) Run button to start stepping through the notebook.

  • Load the data set customers_orders1_opt.csv into the notebook.
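The loading step can be sketched as follows. This is a minimal sketch, assuming a SparkSession named `spark` already exists in the notebook and that the CSV file is reachable at a local path; both are assumptions, and the provided notebook may load the file differently. `load_customers` is a hypothetical helper name.

```python
def load_customers(spark, path="customers_orders1_opt.csv"):
    """Read the customer CSV into a Spark DataFrame.

    `spark` is assumed to be an existing SparkSession; `path` is a
    placeholder for wherever the file is stored in your project.
    """
    # header=True takes column names from the first row. inferSchema=False
    # keeps every column as a string, so numeric columns such as AGE still
    # need to be converted in the transform step.
    return spark.read.csv(path, header=True, inferSchema=False)
```

In a notebook cell this could be called as `df = load_customers(spark)`, followed by `df.printSchema()` to check the column names and types.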

4. Transform the data with Apache Spark

Before analyzing the data, it needs to be cleaned and formatted. This can be done with a few pyspark commands:

  • Select only the columns you are interested in with df.select()

  • Use a user-defined function (UDF) to convert the AGE column to a numeric data type, so you can run calculations on customer age.

  • Use a second UDF to derive each customer's gender from the salutation, and rename the GenderCode column to GENDER.
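The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's exact code: the function names are hypothetical, the salutation values ("Mr.", "Mrs.", and so on) are guesses at what the GenderCode column contains, and the selected columns are limited to the two named in the text.

```python
def parse_age(value):
    """Return the age as an int, or None when it cannot be parsed."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return None


def derive_gender(salutation):
    """Map a salutation to a gender label (the value lists are assumptions)."""
    if salutation is None:
        return "unknown"
    normalized = salutation.strip().rstrip(".").lower()
    if normalized in ("mr", "master"):
        return "male"
    if normalized in ("mrs", "miss", "ms"):
        return "female"
    return "unknown"


def clean_customers(df, keep=("AGE", "GenderCode")):
    """Select columns, convert AGE to integer, and replace GenderCode with GENDER."""
    # Imported here so the two helpers above can be tried without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import IntegerType, StringType

    age_udf = udf(parse_age, IntegerType())
    gender_udf = udf(derive_gender, StringType())

    return (
        df.select(*keep)                                    # keep only the columns of interest
          .withColumn("AGE", age_udf(col("AGE")))           # string -> integer
          .withColumn("GENDER", gender_udf(col("GenderCode")))
          .drop("GenderCode")                               # GENDER replaces GenderCode
    )
```

Keeping the conversion logic in plain Python functions makes it easy to check them on a few sample values before wrapping them as UDFs.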

5. Create charts and maps with PixieDust

The data can now be explored with PixieDust:

  • With display() explore the data in a table.

  • Then click the chart button in the PixieDust output to create one of the charts in the list.

  • Drag and drop the variables you want to display into the Keys and Values fields. Select the aggregation from the drop-down menu and click OK.

  • From the menu to the right of the chart, select the renderer you want to use; each renderer visualizes the data in a different way. Other options include clustering by a variable, changing the size and orientation of the chart, and displaying a legend.

  • Below are two examples created in the notebook: a bar chart and a map.

Figure: histogram created in the notebook

Figure: map created in the notebook
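To make the Keys, Values, and aggregation options more concrete, here is a plain-Python sketch of the kind of grouping a bar chart performs. It is not PixieDust code: the `aggregate` function, the sample rows, and the aggregation names are illustrative assumptions.

```python
from collections import defaultdict


def aggregate(rows, key, value, agg="AVG"):
    """Group rows by `key` and aggregate `value` within each group."""
    groups = defaultdict(list)
    for row in rows:
        # Skip rows where the value is missing so they don't distort the result.
        if row.get(value) is not None:
            groups[row[key]].append(row[value])

    functions = {
        "SUM": sum,
        "AVG": lambda values: sum(values) / len(values),
        "MIN": min,
        "MAX": max,
        "COUNT": len,
    }
    return {group: functions[agg](values) for group, values in groups.items()}


# Hypothetical sample data: GENDER plays the role of the key, AGE of the value.
rows = [
    {"GENDER": "male", "AGE": 30},
    {"GENDER": "female", "AGE": 40},
    {"GENDER": "male", "AGE": 50},
]
average_age_by_gender = aggregate(rows, key="GENDER", value="AGE", agg="AVG")
```

Each entry in the result corresponds to one bar: the group name goes on the key axis and the aggregated number gives the bar height.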

Related links

Learn more

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ

Contributors

stevemar, ptitzler, margrietgroenendijk, margriet, stevemart, ljbennett62, tqtran7, imgbotapp
