
mcw-big-data-analytics-and-visualization's Introduction

Big data analytics and visualization

This workshop is archived and no longer being maintained. Content is read-only.

Margie's Travel (MT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves and provide added value to their corporate customers.

They are looking to pilot a web app that their internal customer service agents can use to provide additional valuable information to the traveler during the flight booking process. They want to enable their agents to enter in the flight information and produce a prediction as to whether the departing flight will encounter a 15-minute or longer delay, considering the weather forecast for the departure hour.

November 2021

Target audience

  • Application developers
  • Data scientists
  • Data engineers
  • Data architects

Abstracts

Workshop

In this workshop, you will deploy a web app using Machine Learning Services to predict travel delays given flight delay data and weather conditions. You will plan a bulk data import operation, then prepare the data by cleaning and manipulating it for testing and training your machine learning model.

At the end of this workshop, you will be better able to build a complete machine learning model in Azure Databricks for predicting if an upcoming flight will experience delays. In addition, you will learn to store the trained model in Azure Machine Learning Model Management, then deploy to Docker containers for scalable on-demand predictions, use Azure Data Factory (ADF) for data movement and operationalizing ML scoring, summarize data with Azure Databricks and Spark SQL, and visualize batch predictions on a map using Power BI.

Whiteboard design session

In this whiteboard design session, you will work with a group to design a solution for ingesting and preparing historic flight delay and weather data and creating, training, and deploying a machine learning model that can predict flight delays.

At the end of this whiteboard design session, you will have learned how to include a web application that obtains weather forecasts from a 3rd party, collects flight information from end-users, and sends that information to the deployed machine learning model for scoring. Part of the exercise will include providing visualizations of historic flight delays and orchestrating the collection and batch scoring of historic and new flight delay data.

Hands-on lab

This hands-on lab is designed to provide exposure to many of Microsoft's transformative line-of-business applications built using Microsoft big data and advanced analytics.

By the end of the lab, you will be able to show an end-to-end solution, leveraging many of these technologies but not necessarily doing work in every component possible.

Azure services and related products

  • Azure Databricks
  • Azure Data Factory (ADF)
  • Azure Storage
  • Power BI Desktop
  • Azure App Service (optional)

Related references

Help & Support

We welcome feedback and comments from Microsoft SMEs & learning partners who deliver MCWs.

Having trouble?

  • First, verify you have followed all written lab instructions (including the Before the Hands-on lab document).
  • Next, submit an issue with a detailed description of the problem.
  • Do not submit pull requests. Our content authors will make all changes and submit pull requests for approval.

If you are planning to present a workshop, review and test the materials early! We recommend at least two weeks prior.

Please allow 5 - 10 business days for review and resolution of issues.

mcw-big-data-analytics-and-visualization's People

Contributors

abhishekpathania01, aszego, codingbandit, daronyondem, dawnmariedesjardins, emilysaeli, feaselkl, harshil1712, hopero929, joelhulen, kant, kylebunting, mathieu-benoit, microsoftopensource, mrfalafel, msftgits, mwasham, nansravn, saimachi, timahenning, waltermyersiii, zoinertejada


mcw-big-data-analytics-and-visualization's Issues

Screen shot differences between portal and guide

Not a big one but when I follow the guide there are a few instances where the actual Portal Screen does not match the guide images. Not sure if this is due to location or user differences.

1. When adding the container to the blob store, my portal does not show "Containers"; it shows "Browse Blobs" (see image attached).

image

2. When creating the Experimentation Service, the screen and instructions say "AI + Cognitive Services", whereas I am seeing "AI and Machine Learning".

az ml account modelmanagement set - fails

The error appears to occur when running az ml account modelmanagement set -n bigdataholmlexpModelMgmt -g big-data-hol

            {
              "code": "BadRequest",
              "message": "SubDeployment: OperationId=D8F9B9524A432C4D, ProvisioningState=Failed, StatusCode=BadRequest, StatusMessage={\n  \"code\": \"BadRequest\",\n  \"message\": \"An error has occurred in subscription df8428d4-bc25-4601-b458-1c8533ceec0b, resourceGroup: big-data-hol-azureml-fee3b request: OrchestratorProfile has unknown orchestrator version: 1.7.7.\"\n}\n"
            },

I believe this is due to the Kubernetes version 1.7.7 no longer being valid.

c:\HOL\FlightDelays>az ml account modelmanagement set -n bigdataholmlexpModelMgmt -g big-data-hol
The behavior of this command has been altered by the following extension: azure-cli-ml
{
  "created_on": "2018-10-29T18:53:59.357547Z",
  "description": "",
  "id": "/subscriptions/df8428d4-bc25-4601-b458-1c8533ceec0b/resourceGroups/big-data-hol/providers/Microsoft.MachineLearningModelManagement/accounts/bigdataholmlexpModelMgmt",
  "location": "westcentralus",
  "model_management_swagger_location": "https://westcentralus.modelmanagement.azureml.net/api/subscriptions/df8428d4-bc25-4601-b458-1c8533ceec0b/resourceGroups/big-data-hol/accounts/bigdataholmlexpModelMgmt/swagger.json?api-version=2017-09-01-preview",
  "modified_on": "2018-10-29T18:53:59.357547Z",
  "name": "bigdataholmlexpModelMgmt",
  "resource_group": "big-data-hol",
  "sku": {
    "capacity": 1,
    "name": "DevTest"
  },
  "subscription": "df8428d4-bc25-4601-b458-1c8533ceec0b",
  "tags": null,
  "type": "Microsoft.MachineLearningModelManagement/accounts"
}

c:\HOL\FlightDelays>az ml env list
The behavior of this command has been altered by the following extension: azure-cli-ml
[
  {
    "Cluster Name": "flightdelays",
    "Cluster Size": 2,
    "Created On": "2018-10-29T22:21:32.883Z",
    "Current Mode": "cluster",
    "Location": "westcentralus",
    "Provisioning Errors": [
      {
        "error": {
          "code": "BadRequest",
          "details": [
            {
              "code": "OK",
              "message": "SubDeployment: OperationId=C9CA9D729418060B, ProvisioningState=Succeeded, StatusCode=OK, StatusMessage=\n"
            },
            {
              "code": "OK",
              "message": "SubDeployment: OperationId=929E883E9C6C3F4E, ProvisioningState=Succeeded, StatusCode=OK, StatusMessage=\n"
            },
            {
              "code": "OK",
              "message": "SubDeployment: OperationId=FC5C74D11979371E, ProvisioningState=Succeeded, StatusCode=OK, StatusMessage=\n"
            },
            {
              "code": "OK",
              "message": "SubDeployment: OperationId=048F9B1A3F6D131E, ProvisioningState=Succeeded, StatusCode=OK, StatusMessage=\n"
            },
            {
              "code": "BadRequest",
              "message": "SubDeployment: OperationId=D8F9B9524A432C4D, ProvisioningState=Failed, StatusCode=BadRequest, StatusMessage={\n  \"code\": \"BadRequest\",\n  \"message\": \"An error has occurred in subscription df8428d4-bc25-4601-b458-1c8533ceec0b, resourceGroup: big-data-hol-azureml-fee3b request: OrchestratorProfile has unknown orchestrator version: 1.7.7.\"\n}\n"
            },
            {
              "code": "OK",
              "message": "SubDeployment: OperationId=AC4B60FE0F7564A7, ProvisioningState=Succeeded, StatusCode=OK, StatusMessage=\n"
            },
            {
              "code": "OK",
              "message": "SubDeployment: OperationId=3FE3F9847F8B6FFB, ProvisioningState=Succeeded, StatusCode=OK, StatusMessage=\n"
            },
            {
              "code": "Conflict",
              "message": "SubDeployment: OperationId=08586607550878056481, ProvisioningState=Failed, StatusCode=Conflict, StatusMessage=Template output evaluation skipped: at least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.\n"
            }
          ],
          "message": "Deployment failed with one or more errors. Please look at the inner error details"
        }
      }
    ],
    "Provisioning State": "Failed",
    "Resource Group": "big-data-hol",
    "Subscription": "df8428d4-bc25-4601-b458-1c8533ceec0b"
  }
]
c:\HOL\FlightDelays>az ml env set -n flightdelays -g big-data-hol
The behavior of this command has been altered by the following extension: azure-cli-ml
{
    "Azure-cli-ml Version": null,
    "Error": "Resource with group big-data-hol and name flightdelays cannot be set, as its provisioning state is Failed. Provisioning state succeeded is required."
}
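
When `az ml env list` returns a long provisioning report like the one above, the failing step is easy to miss among the successful sub-deployments. The following is a minimal sketch, assuming the JSON shape shown in this issue ("Provisioning Errors", then "error", then "details"). The function name `failed_subdeployments` and the trimmed `SAMPLE` data are hypothetical and are not part of the Azure CLI.

```python
import json


def failed_subdeployments(env_list_json):
    """Return (cluster name, code, message) for every non-OK detail entry.

    Assumes the output shape of `az ml env list` shown in this issue.
    """
    failures = []
    for env in json.loads(env_list_json):
        for item in env.get("Provisioning Errors") or []:
            details = item.get("error", {}).get("details") or []
            for detail in details:
                if detail.get("code") != "OK":
                    failures.append((
                        env.get("Cluster Name"),
                        detail.get("code"),
                        detail.get("message", "").strip(),
                    ))
    return failures


# Trimmed sample based on the output above (messages shortened for readability).
SAMPLE = json.dumps([
    {
        "Cluster Name": "flightdelays",
        "Provisioning Errors": [
            {"error": {"code": "BadRequest", "details": [
                {"code": "OK", "message": "SubDeployment: ProvisioningState=Succeeded\n"},
                {"code": "BadRequest",
                 "message": "OrchestratorProfile has unknown orchestrator version: 1.7.7.\n"},
                {"code": "Conflict", "message": "Template output evaluation skipped\n"},
            ]}}
        ],
        "Provisioning State": "Failed",
    }
])

for cluster, code, message in failed_subdeployments(SAMPLE):
    print(f"{cluster}: {code}: {message}")
```

Filtering on the per-entry "code" field keeps the sketch independent of the nested, escaped JSON inside each "message" string.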

ML environment setup issues

Everyone at my table ran into the same problem when setting up the machine learning environment for the Machine Learning Workbench.
The solution we eventually found was to go to the Azure Portal and, under "Resource Providers" for your subscription, register the following providers:

Microsoft.ContainerInstance
Microsoft.ContainerRegistry
Microsoft.ContainerService
Microsoft.MachineLearningServices

Note: This happened to users with only 1 Azure subscription as well as to the ones with more than 1 subscription
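
The same registration can probably be scripted rather than clicked through in the portal. Below is a minimal sketch that builds one `az provider register --namespace ...` command per provider listed above. The helper names `registration_commands` and `register_all` are hypothetical, and the script only prints the commands unless `dry_run=False` is passed, in which case it assumes the Azure CLI is installed and logged in.

```python
import subprocess

# Provider namespaces taken from the list above.
REQUIRED_PROVIDERS = [
    "Microsoft.ContainerInstance",
    "Microsoft.ContainerRegistry",
    "Microsoft.ContainerService",
    "Microsoft.MachineLearningServices",
]


def registration_commands(namespaces):
    """Build one `az provider register` command (as an argv list) per namespace."""
    return [["az", "provider", "register", "--namespace", ns] for ns in namespaces]


def register_all(namespaces, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for command in registration_commands(namespaces):
        print(" ".join(command))
        if not dry_run:
            # On Windows, `az` is a .cmd wrapper, so shell=True may be needed here.
            subprocess.run(command, check=True)


register_all(REQUIRED_PROVIDERS)  # dry run: prints the four commands only
```

Registration is asynchronous on the Azure side, so a provider may take a few minutes to become usable after the command returns.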

Abstracts

Abstracts on the ReadMe page do not match the abstracts used in the WDS PPT, trainer/student guides, or the HOLs.

Customer Objections

WDS Trainer guide asks and answers 8 customer objections - PPT only has 7

Exercise 8: Deploy intelligent web app - Issue in the GUI (Using Existing RG)

Hi @joelhulen

In Exercise 8: Deploy intelligent web app, Task 2: Deploy web app from GitHub, when we click the provided URL (https://github.com/Microsoft/MCW-Big-data-and-visualization/blob/master/Hands-on%20lab/lab-files/BigDataTravel/README.md) to deploy the web app, it redirects to the Azure portal and opens the GUI shown in the image below:
image

In this GUI, we have two options for Resource Group - 1. We can select Existing Resource Group 2. We can create a new Resource Group.

Issue: Even when we select an existing Resource Group, it still asks for permission to create a new Resource Group.

Because we're using a shared subscription pool for this workshop, we can't give users subscription-level permission to create a new Resource Group. We're providing them with a pre-created Resource Group, which works fine for the rest of the lab. The only issue is deploying this web app, because it asks for subscription-level permission to create a new resource group.

Could you please take a look and see if this can be fixed (so that selecting an existing Resource Group works as expected)?

I can provide you one lab environment, if needed.

Thanks in Advance!
Abhishek Pathania

"CredSSP encryption oracle remediation" error when RDP to a Windows VM in Azure

I followed the lab steps to create the DSVM and encountered the error below when connecting to the DSVM over RDP.

"CredSSP encryption oracle remediation" error when RDP to a Windows VM in Azure

The below KB describes the issue and workaround.
https://support.microsoft.com/en-us/help/4295591/credssp-encryption-oracle-remediation-error-when-to-rdp-to-azure-vm

I followed the workaround options and am still unable to RDP to the DSVM.

We may need to address this issue prior to the workshop, or we will not be able to proceed with the lab.

broken links in CONTRIBUTING.md

hi,

there are some broken links in CONTRIBUTING.md file. Below are 3 broken links which require fixing:

  • Before you create a new issue, please do a search in open issues to see if the issue or feature request has already been filed.
  • Be sure to scan through the most popular feature requests.
  • If you are interested in writing code to fix issues, please see How to Contribute in the wiki.

Hands on lab - step-by-step

When I hover over the images I'm not seeing the alt-text titles - only links popping up at the bottom of the screen.

Step 13 may fail

This step starts successfully but when you run "az ml env show..." command you may get an error due to an unregistered resource provider Microsoft.ContainerService.

To resolve this issue in my subscription I had to run the command "az provider register --namespace Microsoft.ContainerService"
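
Before rerunning the step, it may help to confirm that the provider has finished registering. The sketch below assumes the Azure CLI is installed and logged in, and uses `az provider show` with a `--query registrationState` filter. The helper names `show_state_command`, `is_registered`, and `provider_is_registered` are hypothetical.

```python
import subprocess


def show_state_command(namespace):
    """Build the argv list that asks the Azure CLI for a provider's registration state."""
    return ["az", "provider", "show", "--namespace", namespace,
            "--query", "registrationState", "--output", "tsv"]


def is_registered(cli_output):
    """Interpret the CLI output; only the state 'Registered' counts as ready."""
    return cli_output.strip().lower() == "registered"


def provider_is_registered(namespace):
    """Run the Azure CLI and report whether the provider is registered."""
    # On Windows, `az` is a .cmd wrapper, so shell=True may be needed here.
    result = subprocess.run(show_state_command(namespace),
                            capture_output=True, text=True, check=True)
    return is_registered(result.stdout)


# Example (requires a logged-in Azure CLI):
# print(provider_is_registered("Microsoft.ContainerService"))
```

A state such as "Registering" means the request was accepted but has not completed yet, so the check is worth repeating after a short wait.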

Possible security vulnerability

Please review & fix in scheduled test/fix.

Remediation
Upgrade microsoft.identitymodel.clients.activedirectory to version 5.2.0 or later. For example:

Always verify the validity and compatibility of suggestions with your codebase.

Details
CVE-2019-1258
More information
moderate severity
Vulnerable versions: < 5.2.0
Patched version: 5.2.0
An elevation of privilege vulnerability exists in Azure Active Directory Authentication Library On-Behalf-Of flow, in the way the library caches tokens, aka 'Azure Active Directory Authentication Library Elevation of Privilege Vulnerability'.
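
The version range in the advisory above (vulnerable below 5.2.0, patched at 5.2.0) can be checked mechanically. This is a minimal sketch that assumes purely numeric dotted version strings; pre-release suffixes such as "-preview" are not handled, and the helper names `parse_version` and `is_vulnerable` are hypothetical.

```python
def parse_version(text):
    """Turn '5.1' or '5.1.0' into a 3-part tuple such as (5, 1, 0)."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions so '5.2' equals '5.2.0'
    return tuple(parts)


PATCHED_VERSION = parse_version("5.2.0")  # patched version named in the advisory


def is_vulnerable(installed):
    """Return True when the installed version is below the patched version."""
    return parse_version(installed) < PATCHED_VERSION


for version in ["4.5.1", "5.1.0", "5.2.0", "5.10.0"]:
    print(version, "vulnerable" if is_vulnerable(version) else "ok")
```

Comparing integer tuples rather than raw strings avoids ranking "5.10.0" below "5.2.0".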

Issue in Exercise 4 Task 2: Trigger workflow is failing.

In Exercise 4 Task 2: Trigger Workflow, I tried out the lab and found that the trigger fails with the error message: Activity BatchScore failed: Databricks execution failed with error message: . Run page url: https://eastus2.azuredatabricks.net/?o=2487023948884853#job/1/run/1. This URL points to the Exercise 4 notebook, but according to the lab guide we don't need to run this notebook here.

Can someone please look into this on an urgent basis? Thanks
Abhishek Pathania

Ex 7 - error while deploying the ARM Template

While deploying the ARM template in Exercise 7, we got an error: Deployment Error - The current api-version doesn't support RBAC users. Please use a newer version instead. Looking closer at the deployment details, we could see that it comes from Microsoft.Web/sites/sourcecontrols.

ML API Key

In Exercise 8, task 2 (deploying the application) the notes refer to "Finally, enter the ML API Primary key (we got that from Azure databricks Notebook #3, remember?) and Weather API information." ... I am struggling to find where this key actually came from?

This is in preparation for running a workshop next week. While not critical or a show stopper the revised content of the workshop makes it much more likely attendees will get all the way to the end and this point will be reached.

Lab VM is still used in the hands-on lab step-by-step guide, but the steps to create it are missing

  1. I raised an issue about the Lab VM earlier, and the author replied that the Lab VM had been removed in the latest update. However, the Lab VM is still used in Exercise 2, Task 2.
  • Here are some screenshots:
  1. Exercise 2 Task 2: Install and configure Azure Data Factory Integration Runtime on the Lab VM
    image
  2. Task 3: Configure Azure Data Factory Step 13.
    image

This is confusing. Can anyone look into this ASAP? We have a workshop scheduled next week. Thanks

Verify links in HOLs

Folder and document names have been updated. Please check your HOL documents for links that use folder names in their path and make sure they are still valid and working.

May 2019 - Scheduled content update

Hello,
This workshop is scheduled for a content update. Please review the workshop and current open issues and give update suggestions.

Please include field request to add Azure Data Lakes as a storage solution to the workshop.

Thanks,
Dawnmarie

Update HTML

May update QC'd and merged, ready for updated HTML

Exercise 2: 03 - Deploy as Web Service deprecation warnings

Cmd 43: DeprecationWarning: ContainerImage class has been deprecated and will be removed in a future release. Please migrate to using Environments. https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments

Cmd 45: DeprecationWarning: deploy_from_model has been deprecated and will be removed in a future release. Please migrate to using Environments. https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments

Download link is broken in HTML lab guide in Exercise1 Task 2

Exercise 1 Task 2: Open Azure Databricks and complete lab notebook
Step 1. Download: BigDatavVis.dbc. In the HTML guide, this link is broken; I got a "blob not found" error.

Could anyone please look into this and fix it ASAP? We have a workshop scheduled tomorrow.

Thanks,
Abhishek Pathania

Fake name issue

Adventure Works Travel is not on the list of approved fictitious names. Adventure Works Cycles is, but it doesn't make sense to have a cycle company in the example without a bunch of changes to the travel company example.

Ex 4 - error while running the Azure Data Factory pipeline (no access to storage account)

While running the Azure Data Factory pipeline in Exercise 4 in the Notebook 3 for this cell:

dfDelays = spark.read.csv("wasbs://" + containerName + "@" + accountName + ".blob.core.windows.net/FlightsAndWeather/*/*/FlightsAndWeather.csv",
                    schema=data_schema,
                    sep=",",
                    header=True)

We got this error:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container sparkcontainer in account storagex6uezrh5wfar2.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.

This happens even though we have properly set up the Azure Blob storage output step. Is there anything missing in this lab to make this work?

Notes:

  • I needed to update Notebook 3 by replacing STORAGE-ACCOUNT-NAME with my own accountName. The lab does not explicitly say to do so, so the lab instructions need to be updated in that regard.
  • If I change the default access level policy of my blob storage container to public, it works. That is a temporary workaround but not an ideal solution.
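
The error text says no credentials were found in the configuration, so one common alternative to making the container public is to give Spark the storage account key. The sketch below is an assumption about a possible fix, not the lab's confirmed solution. It builds the `wasbs://` URL used in the failing cell and the Hadoop setting name `fs.azure.account.key.<account>.blob.core.windows.net`. The helper names and the account name `mystorageaccount` are hypothetical.

```python
def wasbs_url(container, account, path):
    """Build a wasbs:// URL for a path inside a blob container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_setting(account):
    """Name of the Spark/Hadoop setting that holds the storage account access key."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


# Hypothetical values for illustration only.
container_name = "sparkcontainer"
account_name = "mystorageaccount"

print(wasbs_url(container_name, account_name,
                "FlightsAndWeather/*/*/FlightsAndWeather.csv"))
print(account_key_setting(account_name))

# In a Databricks notebook, where `spark` already exists, the key could then be
# supplied before reading (the key value is a placeholder):
# spark.conf.set(account_key_setting(account_name), "<storage-account-access-key>")
```

Keeping the access key in a secret store rather than in the notebook text is generally preferable.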

Syntax error in Exercise 2: 01 Data Preparation Cmd 57: f.floor not existing

Hi,
In Exercise 2: 01 Data Preparation Cmd 57, the first line of code refers to an undefined name:
df = dfWeather.withColumn('Hour', f.floor(dfWeather['Time']/100)) throws a NameError: name 'f' is not defined. Instead, it must be:
df = dfWeather.withColumn('Hour', F.floor(dfWeather['Time']/100)). Either change Cmd 57 line 1 as shown, or change Cmd 3 line 5 to: from pyspark.sql import functions as f
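
Python names are case-sensitive, so the alias used in the import and the alias used in the call must match exactly. The sketch below illustrates the same `floor(Time / 100)` calculation without Spark: `math` stands in for `pyspark.sql.functions`, and it assumes (the issue does not state this) that `Time` holds HHMM-style integers such as 1435. The function name `hour_from_hhmm` is hypothetical.

```python
import math as F  # stand-in for `from pyspark.sql import functions as F`


def hour_from_hhmm(time_value):
    """Return the hour part of an HHMM-style integer, e.g. 1435 gives 14."""
    # The alias is upper-case F here, so a lower-case f.floor(...) would raise NameError.
    return F.floor(time_value / 100)


for sample in [5, 959, 1435, 2359]:
    print(sample, hour_from_hhmm(sample))
```

Whichever alias the notebook settles on, using it consistently in every cell avoids this class of error.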

Databricks Notebook Exercise 5 01 Deploy for Batch Scoring fails with Py4JJavaError error

Hi, when I try to run the Databricks Notebook "01 Deploy for Batch Scoring" in Exercise 5, the Cmd 15 fails with a Py4JJavaError error:

(1) Spark Jobs
Job 293 View(Stages: 0/0, 1 skipped)

org.apache.spark.SparkException: Job aborted.

Py4JJavaError Traceback (most recent call last)
in ()
----> 1 prediction.write.mode("overwrite").saveAsTable("scoredflights")

/databricks/spark/python/pyspark/sql/readwriter.py in saveAsTable(self, name, format, mode, partitionBy, **options)
773 if format is not None:
774 self.format(format)
--> 775 self._jwrite.saveAsTable(name)
776
777 @since(1.4)

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):

Before the HOL

Hello,
Please check formatting. I added the TOC to the document but the links inside the table aren't working, not sure what I did wrong.

Content update scheduled

This workshop is scheduled for an update. Please review the contents of this repo and comment on this issue with suggested updates. All suggestions must be added to this issue by EOD 11/9/18.

Databricks cluster version not available.

The directions specify creating a Databricks 3.4 cluster, specifically because of the Spark version installed on the Data Science VM; however, the lowest cluster version we can create is 3.5 LTS.

The documentation should be updated to specify a higher version of the Databricks cluster and, if necessary, to update the DSVM to a matching higher version of Spark.

Flight datasets names

In the step by step guide, step 9 says "Change the Table Name to "flight_weather_with_airport_code" and select the checkmark for First row is header. Select Create Table."

My guess is the "flight_weather_with_airport_code" should be "flight_delays_with_airport_code" as per the screen shot below it.

image
