In this code pattern, we will create a live dashboard view of data, analyze the data and predict accordingly using IBM Streaming Analytics and Watson Studio.
Every retailer tries to sell a product to a customer, but not all customers are willing to accept the offer. We gather a customer-related dataset and, based on it, predict whether a customer will buy the product if offered. All of this happens in real time on a dashboard, helping the retailer understand which types of customers to target to maximize sales of their product.
We take the use case of a bank selling personal loans to its customers and predict whether a customer will accept a loan offered to them. We code the machine learning model in a Jupyter notebook in Watson Studio and deploy the model to Watson Machine Learning. We then design a Streams Flow in Watson Studio whose input node brings in data from sources such as REST API calls, stream events from a Kafka broker, IBM Event Streams, an MQTT broker, or the Watson IoT device platform. This data is streamed to the next node, the Python model invoked from Watson Machine Learning. The predictions and the features influencing them are emitted as output, which is stored in Cloud Object Storage as a CSV file. A Streaming Analytics instance associated with the flow starts running as soon as the flow is deployed, and live data and predictions can be monitored on the IBM Streaming Analytics dashboard in real time.
When you have completed this code pattern, you will understand how to:
- Deal with real-time data in IBM Streaming Analytics.
- Design custom Streams Flows to build your own live streaming service.
- Read any data in a Streams Flow with the available nodes.
- Create a model and deploy it to Watson Machine Learning.
- Create a REST API with Python and deploy it to the Cloud Foundry service. Calling this API returns JSON with random attribute values from the source dataset, thus simulating real-time data.
- Create a Watson Studio instance and a Watson Machine Learning instance in IBM Cloud.
- Create a new Jupyter notebook in Watson Studio and execute the cells to train, test, evaluate, and deploy the model to Watson Machine Learning.
- Once the real-time data source and the machine learning model are ready, build the stream flow: create a new Streams Flow in Watson Studio.
- Build a flow with the REST API as input, data processing by the deployed Watson Machine Learning model, and the output saved to a CSV file in Cloud Object Storage.
- Launch the Streaming Analytics dashboard and visualize the data in real time.
- Clone the repo
- Deploy API
- Create Watson Services
- Run the Jupyter Notebook and Deploy the ML Model
- Create IBM Streaming Analytics service
- Create the Streams Flow in Watson Studio
- Visualize the Streams Dashboard
Clone the `live-streaming-of-IoT-data-using-streaming-analytics` repo locally. In a terminal, run:
$ git clone https://github.com/IBM/live-streaming-of-IoT-data-using-streaming-analytics
We’ll be using the file `Data/training-testing-data.xlsx` and the folder `flask-API`.
In order to simulate real-time incoming data, we create an API and deploy it to Cloud Foundry.
NOTE: IBM Streaming Analytics supports the following input sources: stream events from a Kafka broker, IBM Event Streams, an MQTT broker, and the Watson IoT device platform. If you are familiar with any of these, you can skip this step and create your own input block.
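The repo's `flask-API` app serves exactly this purpose. As an illustration of the idea, here is a minimal sketch of the payload generation (attribute names follow the bank-loan dataset; the actual app wraps a function like this in a Flask GET route, and the exact field names and value ranges used in the repo may differ):

```python
import json
import random

def generate_record():
    """Build one synthetic customer record with random attribute values,
    mimicking a row of the bank-loan dataset."""
    return {
        "Age": random.randint(21, 65),
        "Income": random.randint(10, 220),             # annual income, in $000s
        "CCAvg": round(random.uniform(0.0, 10.0), 2),  # avg credit card spend
        "Mortgage": random.choice([0, random.randint(75, 600)]),
        "SecuritiesAccount": random.randint(0, 1),
    }

# A GET handler would simply serialize one fresh record per call:
print(json.dumps(generate_record()))
```

Each GET request then returns a different random record, which is what makes the endpoint behave like a live data source.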
- Create a Cloud Foundry service with the Python runtime and follow the steps.
- You can give any app name; in our case we have given the app name as `my-api`.
- From the cloned repo, go to the `flask-api` directory.
$ cd flask-api/
- Make sure you have installed the IBM Cloud CLI before you proceed.
- Log in to your IBM Cloud account, and select an API endpoint.
$ ibmcloud login
NOTE: If you have a federated user ID, instead use the following command to log in with your single sign-on ID.
$ ibmcloud login --sso
- Target a Cloud Foundry org and space:
$ ibmcloud target --cf
- From within the `flask-api` directory, push your app to IBM Cloud.
$ ibmcloud cf push <YOUR_APP_NAME>
Example: As our app name is `my-api`, we use the following command:
$ ibmcloud cf push my-api
- You will see output on your terminal as shown; verify that the state is `running`:
Invoking 'cf push'...
Pushing from manifest to org [email protected] / space dev as [email protected]...
...
Waiting for app to start...
...
state since cpu memory disk details
#0 running 2019-09-17T06:22:59Z 19.5% 103.4M of 512M 343.4M of 1G
- Once the API is deployed and running, you can test it.
- Go to IBM Cloud Resources and select the deployed API, `my-api` in our case.
- Inside the `my-api` dashboard, right-click on **Visit App URL** and copy the link address.
Example link address: https://my-api-xx-yy.eu-gb.mybluemix.net/
NOTE: This API link is important; save it in a notepad, since it will be used in step 6.
- To test the API, use any REST API client such as Postman.
- Make a GET request to the earlier copied link (https://my-api-xx-yy.eu-gb.mybluemix.net) as shown.
- A JSON body is returned in response, containing the source data that can be sent to the model to get predictions.
At this point you have successfully deployed an API.
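Instead of Postman, you can also hit the endpoint from Python. A sketch using only the standard library (the URL shown is a placeholder; your deployed app's hostname and exact payload fields will differ):

```python
import json
from urllib.request import urlopen

def fetch_record(api_url):
    """GET one simulated customer record from the deployed API."""
    with urlopen(api_url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (replace with your own Visit App URL):
# record = fetch_record("https://my-api-xx-yy.eu-gb.mybluemix.net/")
# print(record)
```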
We will be using Watson Studio's Jupyter Notebook to build the model and deploy it to the Watson Machine Learning service. A Watson Studio service also requires a Cloud Object Storage service, so we will create that as well.
- Create a Cloud Object Storage service.
- That's it! Your storage is created at this point.
- Create a Watson Machine Learning service.
- Once the service is created, on the landing page click **Service credentials** in the left panel, then click **New credential** and create credentials for the service as shown.
- Click **Add** to generate credentials.
- The newly created credentials can be viewed by clicking **View credentials**; copy the credentials as shown.
NOTE: Copy the credentials into a notepad, as they will be required in step 4.
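For reference, the credentials are a JSON object; pasted into Python they look roughly like the placeholder below. The exact set of keys depends on your service instance and region, so treat every field here as a stand-in and always use the values copied from your own **Service credentials** page:

```python
# Placeholder Watson Machine Learning credentials; every value here is fake.
# Replace each field with the values from your own Service credentials page.
wml_credentials = {
    "apikey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "instance_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "url": "https://eu-gb.ml.cloud.ibm.com",
}
```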
- Create a Watson Studio service.
- Then click on **Get Started**.
- In Watson Studio, click **Create a project > Create an empty project** and name it `Streaming Analytics Demo`.
- Once the project is created, click **Add to project** in the top-right corner and select **Notebook** from the options.
- In the New Notebook page, click **From URL**, enter a name and the URL https://github.com/IBM/live-streaming-of-IoT-data-using-streaming-analytics/blob/master/notebook/Python_Predictive_Model.ipynb, and click **Create Notebook** as shown.
At this point the Watson services are all set up. Now it's time to code!
In this section we build a Naive Bayes model for predicting whether a customer will accept a personal loan. The dataset is taken from Kaggle (https://www.kaggle.com/itsmesunil/bank-loan-modelling).
- Open the Jupyter Notebook; under **Files** click **browse** and load the `training-testing-data.xlsx` dataset from the `Data` directory that was cloned earlier.
- You will now see `training-testing-data.xlsx` in the right-side panel. Click on the cell shown in the image below and insert the pandas DataFrame for the dataset as shown.
- Now you will see the credentials and the DataFrame object in the selected cell. Replace the two lines as shown.
- Rename `df_data_0` to `data` and add the parameter `'Data'` (the sheet name) to the `read_excel` method as shown.
data = pd.read_excel(body, 'Data')
data.head()
- Insert your Watson Machine Learning credentials, which you copied in step 3.2, into the third cell as shown.
- Run the notebook by selecting **Cell > Run All** as shown.
At this point you have successfully created an API and deployed a predictive model. Now we create the Streams Flow.
- Create a Streaming Analytics service.
- This service will be consumed in the next step.
- Back in the Watson Studio project that you created, click **Add to project** again in the top-right corner and select **Streams flow** from the options.
- Enter the name `Predictive analytics stream flow`, select **From file**, and upload the `predictive_stream.stp` file from the `stream` directory that you cloned. Finally, select the Streaming Analytics service that you created in step 5.
NOTE: If you don't see the Streaming Analytics service listed, you can associate it by clicking the provided link. Check out TROUBLESHOOTING.md for more.
- Before you start the streams flow you need to set a couple of things. In the streams flow dashboard, click **Edit the streams flow** as shown.
- In the Streams canvas, select the first block, named **Simulating Real-time Data**, to view its properties. In the URL field, enter the API URL that you saved in step 2.1.
- Now click on the second block, named **Python Model**, to view its properties. Select the Python model deployed earlier.
NOTE: If you don't see the deployed model, you need to add the Watson Machine Learning service to your project manually. Check out TROUBLESHOOTING.md for more.
- Finally, click **Save and run** to build and deploy the streams flow to your IBM Watson Streaming Analytics service.
- The build and deploy will take approximately 5-10 minutes, so be patient.
- You can see the real-time data flowing in the Streams Flow.
NOTE (optional): If you are interested in understanding the building blocks of the streams flow in detail, refer to DETAILED.md, which covers the streams flow in depth.
- Once the status is `running`, you can visualize the incoming data and the predicted data in IBM Watson Streaming Analytics.
- Go to IBM Cloud Resources and, under Services, select the Streaming Analytics service.
- Click **Launch** to open the Streams dashboard.
- You can see the streams flow that we built in Watson Studio.
- For the use case we have considered, we will be monitoring the predictions for customers who will take a personal loan and the factors influencing the predictions.
- Based on the dataset, we have found that the following attributes affect the prediction:
  - Income
  - CCAvg
  - Mortgage
  - SecuritiesAccount
- Hence we will add widgets to monitor these attributes along with the Prediction attribute.
- To add a widget, hover your cursor over the arrow from Python Model to Debug and click **Create Dashboard View** as shown.
- In Create Data Visualization View, enter the view name `Monitoring Data` and click OK.
- You can now see the Monitoring Data table in your dashboard. Click the **Create Bar Graph** button in the table as shown.
- Enter the chart name `Predictor Importance`, then click on the **Categories** tab and select SecuritiesAccount, Mortgage, Income & CCAvg.
- You can now see the Predictor Importance bar graph in the dashboard.
- Similarly, you can create a line graph for the same attributes as shown.
- You can now see the Predictor Importance line graph in the dashboard.
- Now add the Predictions bar graph and line graph in a similar way as shown.
- Finally, you can see the rich dashboard with the predictor importance attributes and the predictions.
Conclusion: Data is growing fast in volume, variety, and complexity. Every day, we create 2.5 quintillion bytes of data! Traditional analytic solutions are not able to fully unlock the potential value of that data.
In a streams flow, you can access and analyze massive amounts of changing data as it is created. Regardless of whether the data is structured or unstructured, you can leverage data at scale to drive real-time analytics for up-to-the-minute business decisions.
- Commonly faced challenges are listed in TROUBLESHOOTING.md.
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.