
Microsoft Fabric's Spark Data Exploration

This repository provides a code snippet for quick data exploration with Spark in Microsoft Fabric. The code shows how to load data from a Lakehouse table, compute summary statistics, aggregate values by date, and plot the results. Follow the instructions below to set up and run it.

Setup

  1. Read more about Microsoft Fabric and Apache Spark here:

  2. Clone this repository:

    git clone https://github.com/gbengaayelab/ApacheSparkExplorationFabric.git

Code

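Open a Fabric notebook (for example, the default "Notebook 1") and run the following code. It relies on the `spark` session and the `display` function that Fabric notebooks provide.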

# Spark SQL functions/types for the transformations, pandas and matplotlib for plotting.
# `spark` and `display` are provided by the Fabric notebook session.
import pandas as pd
import matplotlib.pyplot as plt
import pyspark.sql.functions as F
import pyspark.sql.types as T

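# Load up to 1,000 rows from the 'sales' table in the lakehouse_1_lab Lakehouse
df = spark.sql("SELECT * FROM lakehouse_1_lab.sales LIMIT 1000")

# List the column names
print(df.columns)

# Basic statistics per column: count, mean, stddev, min, max
display(df.describe())

# The describe() statistics plus the 25%, 50% and 75% percentiles
display(df.summary())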

# Convert the 'OrderDate' column to a date type (F.to_date already returns DateType, so no extra cast is needed)
df = df.withColumn('OrderDate', F.to_date('OrderDate'))

# Group the data by 'OrderDate' and calculate the sum of 'TaxAmount' and 'UnitPrice'
grouped_df = df.groupBy('OrderDate').agg(F.sum('TaxAmount').alias('TaxAmount'), F.sum('UnitPrice').alias('Price'))

# Sort by date (Spark does not guarantee the row order of a groupBy result),
# then convert to a pandas DataFrame for plotting
pandas_df = grouped_df.orderBy('OrderDate').toPandas()

# Plotting the data
plt.plot(pandas_df['OrderDate'], pandas_df['TaxAmount'], label='Tax')
plt.plot(pandas_df['OrderDate'], pandas_df['Price'], label='Price')
plt.legend()
plt.title('Price and Tax Amount Relationship to Order Date')
plt.ylabel('TaxAmount and Price')
plt.xlabel('Order Date')
plt.show()

pandas_df.plot(x='OrderDate', y='TaxAmount', label='Tax', kind='line')
plt.legend()
plt.title('Tax Amount Relationship to Order Date')
plt.ylabel('TaxAmount')
plt.xlabel('Order Date')
plt.show()

pandas_df.plot(x='OrderDate', y='Price', label='Price', kind='line')
plt.legend()
plt.title('Price Relationship to Order Date')
plt.ylabel('Price')
plt.xlabel('Order Date')
plt.show()
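
The groupBy/agg step produces one row per OrderDate holding the summed TaxAmount and UnitPrice, and because Spark does not guarantee the row order of an aggregation result, the rows need to be sorted by date before a line chart will connect the points chronologically. The stdlib-only sketch below mirrors that logic on plain Python rows so it can be checked without a Spark session. It assumes every row has non-null numeric values; the helper name `daily_totals` and the sample rows are hypothetical and not part of this repository.

```python
from collections import defaultdict
from datetime import date


def daily_totals(rows):
    """Sum TaxAmount and UnitPrice per OrderDate, sorted by date.

    Hypothetical helper mirroring
    df.groupBy('OrderDate').agg(F.sum('TaxAmount'), F.sum('UnitPrice'))
    followed by orderBy('OrderDate'). Assumes each row is a dict with
    non-null 'OrderDate', 'TaxAmount' and 'UnitPrice' values.
    """
    totals = defaultdict(lambda: {"TaxAmount": 0.0, "Price": 0.0})
    for row in rows:
        bucket = totals[row["OrderDate"]]
        bucket["TaxAmount"] += row["TaxAmount"]
        bucket["Price"] += row["UnitPrice"]
    # Sort by date so a line plot connects the points chronologically
    return [
        {"OrderDate": day, **sums}
        for day, sums in sorted(totals.items(), key=lambda item: item[0])
    ]


# Usage with hypothetical, deliberately unordered sample rows
rows = [
    {"OrderDate": date(2019, 7, 2), "TaxAmount": 2.0, "UnitPrice": 25.0},
    {"OrderDate": date(2019, 7, 1), "TaxAmount": 1.5, "UnitPrice": 20.0},
    {"OrderDate": date(2019, 7, 2), "TaxAmount": 0.5, "UnitPrice": 5.0},
]
for total in daily_totals(rows):
    print(total)
```

Keeping the sort explicit, rather than relying on whatever order the aggregation happens to return, is what makes the resulting line charts deterministic.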

Usage

  1. Ensure you have set up Microsoft Fabric and have the necessary credentials.

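  2. Replace lakehouse_1_lab.sales in the code with your own Lakehouse and table name or, better still, create a Fabric Lakehouse named lakehouse_1_lab that contains a sales table. The code expects the table to have OrderDate, TaxAmount and UnitPrice columns.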

  3. Run the code to perform data exploration and generate visualizations based on your data.
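
When pointing the code at your own table, it can help to confirm up front that the OrderDate, TaxAmount and UnitPrice columns exist, since the later steps fail otherwise. The sketch below is one way to do that; the function name `missing_columns` and the sample column list are hypothetical, not part of this repository. In the notebook you would call it as `missing_columns(df.columns)` right after loading the table. The comparison is case-insensitive on the assumption that the session uses Spark's default `spark.sql.caseSensitive = false`.

```python
REQUIRED_COLUMNS = ("OrderDate", "TaxAmount", "UnitPrice")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`.

    Hypothetical pre-flight check. The comparison is case-insensitive,
    matching Spark's default column resolution.
    """
    present = {name.lower() for name in columns}
    return [name for name in required if name.lower() not in present]


# Example with a hypothetical column list; in the notebook, pass df.columns
missing = missing_columns(["SalesOrderNumber", "OrderDate", "UnitPrice"])
if missing:
    print(f"Table is missing required columns: {missing}")
```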

Feedback and Support

If you encounter any issues or have any feedback, please open an issue in this repository.

For general support and questions, see the official Microsoft Learn training material on using Apache Spark in Microsoft Fabric: https://learn.microsoft.com/en-us/training/modules/use-apache-spark-work-files-lakehouse/4-dataframe.


Happy exploring and analyzing data with Microsoft Fabric's Spark!



Contributors

gbengaayelab
