
mldotnet-real-time-data-streaming-workshop's Introduction


Introduction

Working with real-time data streams and deriving real-time insights using custom machine learning models have become increasingly important for many organizations. There are numerous real-time data platforms currently available (e.g. Kafka, Hadoop, Spark), but the one we will focus on in this workshop is Azure Stream Analytics. In addition to diving into Azure Stream Analytics, we will explore the open-source, cross-platform library ML.NET, which we will use to build our custom machine learning models, and look at an alternative solution using Azure Machine Learning Service.

Getting Started

Instructions to set up prerequisites

  1. Download the .NET Core SDK
    1. Go to the following page to download the SDK
    2. Select the correct tab for your operating system (e.g. Windows, Linux or Mac)
    3. Click on the Build Apps download option
    4. Open the installer once the download is complete and follow provided instructions

  2. Install VS Code
    1. Go to the following page to download VS Code
    2. Select the correct installation for your operating system (e.g. Windows, Linux or Mac)
    3. Open the installer once the download is complete and follow provided instructions
    4. Open VS Code once the installation is complete

  3. Install the C# Extension
    1. In VS Code, select View -> Extensions
    2. Search for C#
    3. Click Install

  4. Install the Azure Functions Extension
    1. In VS Code, select View -> Extensions
    2. Search for Azure Functions
    3. Click Install

  5. Install the ML.NET CLI
    1. In VS Code, select Terminal -> New Terminal to open a new terminal window
    2. In the terminal, enter dotnet tool install -g mlnet and hit enter

  6. Clone the repository
    1. In VS Code, select Terminal -> New Terminal to open a new terminal window
    2. In the terminal, enter cd C:\ and hit enter
    3. In the terminal, enter git clone https://github.com/aslotte/mldotnet-real-time-data-streaming-workshop.git and hit enter to clone the repository to the C: drive.
      Note: Feel free to clone the repository elsewhere; just make sure to adjust the path in the instructions that follow. The repository is also available on the provided USB memory sticks, in case the internet bandwidth is not sufficient.

  7. Download the data
    1. There are three (3) ways to get the data we will be working with. Please choose the one most convenient for you:
      1. Copy the data from the provided USB memory sticks (copy the .zip file and extract it on your local computer)
      2. Download the data from here
      3. Download the data from Kaggle (requires free account)

  8. Create a free Azure subscription
    1. Go to Azure to create a free trial account
    2. Enter your contact information and click Next
    3. Fill in your credit card information and click Next.
      Note that this is only used to verify your identity; you will not be charged.
    4. Check the checkbox to agree to terms and conditions and click Sign-up

  9. Create an Outlook e-mail account
    1. Go to Outlook to create a free Outlook account
    2. Follow the provided instructions

  10. Download Azure Storage Explorer (required for part 3)
    1. Download Azure Storage Explorer. Make sure to select the correct OS.
    2. Open the installer
    3. Follow the provided instructions

    Note to macOS users: If the web site downloads an .exe file even after you select the macOS option, please download the macOS version from here.
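
The terminal-based steps above (installing the ML.NET CLI and cloning the repository) can be collected into a single script. The following is a minimal sketch, assuming a POSIX shell; the `require_tool` helper and the `RUN_SETUP` switch are hypothetical names introduced only for illustration, and the clone lands in the current directory rather than on the C: drive.

```shell
#!/bin/sh
# Hypothetical helper (not part of the workshop): report whether a tool is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# RUN_SETUP is an assumed opt-in switch: nothing is installed or cloned unless it is set to 1.
if [ "${RUN_SETUP:-0}" = "1" ]; then
    # Step 5: install the ML.NET CLI as a global .NET tool.
    require_tool dotnet && dotnet tool install -g mlnet
    # Step 6: clone the workshop repository into the current directory.
    require_tool git && git clone https://github.com/aslotte/mldotnet-real-time-data-streaming-workshop.git
fi
```

Running the script with `RUN_SETUP=1` performs both steps; without it, the script only defines the helper, which is handy for checking that `dotnet` and `git` are installed before starting.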

Problem Outline

For a financial institution, detecting fraud is imperative to ensure safe and continuous operations for the institution and its customers.

In this workshop we will look at detecting fraudulent transactions in real time. We will train our model on publicly available data from Kaggle and integrate this custom machine learning model into a real-time data pipeline, supported by Azure Stream Analytics.

Outline of Learning Objectives

  • Part 1: Machine Learning in .NET
    • Introduction to Machine Learning and ML.NET
    • Explore the data with Jupyter Notebooks and Pandas
    • Train a machine learning model using ML.NET
    • Train a machine learning model using AutoML CLI
  • Part 2: Setting up a real-time data streaming pipeline
    • Introduction to Stream Processing and Azure Stream Analytics
    • Introduction to Azure Resource Management (ARM) Templates
  • Part 3: ML.NET + Azure DevOps = MLOps
    • Introduction to MLOps
    • Set up a CI/CD pipeline for model training
  • Part 4: ML.NET + Jupyter
    • Introduction to ML.NET in Jupyter Notebooks
    • Train a machine learning model using ML.NET and Jupyter Notebooks
  • Part 5: Machine Learning in Azure
    • Introduction to Azure Machine Learning Service
    • Train a machine learning model using Azure ML Visual Interface
    • Train a machine learning model using Azure AutoML
    • Train a machine learning model using Jupyter Notebooks and Scikit Learn
  • Part 6: Consume ONNX Model from Jupyter Notebook in ML.NET
    • Consume, in ML.NET, an exported ONNX model that was trained with Scikit Learn (Python)

Reminder: Remove your resource group once you have finished the workshop, to avoid incurring additional costs.
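
If the Azure CLI is installed, the cleanup in the reminder above can be done from a terminal. This is a hedged sketch: the resource group name `my-workshop-rg` is a placeholder, not a name the workshop defines, and the hypothetical `delete_group_cmd` helper only prints the command so that it can be reviewed before anything is deleted.

```shell
#!/bin/sh
# Hypothetical helper: print the Azure CLI command that deletes a resource group.
# $1 is the resource group name. Nothing is deleted by this function.
delete_group_cmd() {
    printf 'az group delete --name %s --yes --no-wait\n' "$1"
}

# "my-workshop-rg" is a placeholder; substitute the group you created for the workshop.
delete_group_cmd "my-workshop-rg"
```

Copy the printed command into the terminal to run it. `--yes` skips the confirmation prompt and `--no-wait` returns without waiting for the deletion to complete, so double-check the name first.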

Solution Architecture

A Real-Time Data Pipeline with ML.NET

[Figure: Real-Time Data Pipeline with ML.NET]

A Real-Time Data Pipeline with Azure Machine Learning Studio

[Figure: Real-Time Data Pipeline with Azure ML]

Assumptions

This workshop is currently valid for ML.NET v1.4.0.

Additional Resources

mldotnet-real-time-data-streaming-workshop's People

Contributors

aslotte, lastlink, marcelo-barbosa, patleong, triplee78, waleedershad

mldotnet-real-time-data-streaming-workshop's Issues

Workshop | Add agenda

Description

Add an agenda for the 3-hour workshop, including which topics we will cover and when, as well as breaks.

Feedback from workshop

  • Part 1: ML.NET
    • For repo on memory stick, fix data.csv issue and test locally

  • General
    • Advise not to start off with the finished solution

Workshop | Create ppt presentation

Description

Combine the ML.NET and Azure Stream Analytics presentations into one workshop presentation that lays a good foundation for the hands-on work.
