In this code pattern, we will create a live dashboard view of data, analyze the data and predict accordingly using IBM Streaming Analytics and Watson Studio.
Every retailer tries to sell a product to a customer, but not all customers are willing to accept the offer. We gather a customer-related dataset and, based on it, predict whether a customer will buy the product if offered. All of this happens in real time on a dashboard, helping the retailer understand which types of customers to target to maximize sales of their product.
We take the use case of a bank selling personal loans to its customers and predict whether a customer will accept a loan offered to them. We code the machine learning model in a Jupyter notebook in Watson Studio and deploy the model to Watson Machine Learning. We then design a Streams Flow in Watson Studio whose input node brings in data from sources such as REST API calls, stream events from a Kafka broker, IBM Event Streams, an MQTT broker, or the Watson IoT device platform. This data is streamed to the next node, the Python model invoked from Watson Machine Learning. The predictions and the features influencing them are emitted as output, which is stored in Cloud Object Storage as a CSV file. A Streaming Analytics instance associated with the flow starts running as soon as the flow is deployed, and live data and predictions can be monitored on the IBM Streaming Analytics dashboard in real time.
When you have completed this code pattern, you will understand how to:
- Deal with real-time data in IBM Streaming Analytics.
- Design custom Streams Flows to build your own live streaming service.
- Read any data in a Streams Flow with the available nodes.
- Create a model and deploy it to Watson Machine Learning.
- Create a REST API with Python and deploy it to the Cloud Foundry service. Calling this API returns JSON with random attribute values from the source dataset, thus simulating real-time data.
- Create a Watson Studio instance and a Watson Machine Learning instance in IBM Cloud.
- Create a new Jupyter notebook in Watson Studio and execute the cells to train, test, evaluate, and deploy the model to Watson Machine Learning.
- Once the real-time data source and the machine learning model are ready, build the stream flow: create a new Streams Flow in Watson Studio.
- Build a flow with the REST API as input, data processing by the deployed Watson Machine Learning model, and the output saved to a CSV file in Cloud Object Storage.
- Launch the Streaming Analytics dashboard and visualize the data in real time.
- Clone the repo
- Deploy API
- Create Watson Services
- Run the Jupyter Notebook and Deploy the ML Model
- Create IBM Streaming Analytics service
- Create the Streams Flow in Watson Studio
- Visualize the Streams Dashboard
Clone the `live-streaming-of-IoT-data-using-streaming-analytics` repo locally. In a terminal, run:
$ git clone https://github.com/IBM/live-streaming-of-IoT-data-using-streaming-analytics
We’ll be using the file `Data/training-testing-data.xlsx` and the folder `flask-API`.
In order to simulate real-time incoming data, we create an API and deploy it to Cloud Foundry.
NOTE: IBM Streaming Analytics supports the following input sources: stream events from a Kafka broker, IBM Event Streams, an MQTT broker, and the Watson IoT device platform. If you are familiar with any of these, you can skip this step and create your own input block.
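The repo's `flask-API` app serves exactly this purpose. As an illustration of the idea, here is a minimal sketch of the payload generation (attribute names follow the bank-loan dataset; the actual app wraps a function like this in a Flask GET route, and the exact field names and value ranges used in the repo may differ):

```python
import json
import random

def generate_record():
    """Build one synthetic customer record with random attribute values,
    mimicking a row of the bank-loan dataset."""
    return {
        "Age": random.randint(21, 65),
        "Income": random.randint(10, 220),             # annual income, in $000s
        "CCAvg": round(random.uniform(0.0, 10.0), 2),  # avg credit card spend
        "Mortgage": random.choice([0, random.randint(75, 600)]),
        "SecuritiesAccount": random.randint(0, 1),
    }

# A GET handler would simply serialize one fresh record per call:
print(json.dumps(generate_record()))
```

Each GET request then returns a different random record, which is what makes the endpoint behave like a live data source.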
- Create a Cloud Foundry service with the Python runtime and follow the steps.
- You can give any app name; in our case we have given the app name as `my-api`.
- From the cloned repo, go to the `flask-api` directory.
$ cd flask-api/
- Make sure you have installed the IBM Cloud CLI before you proceed.
- Log in to your IBM Cloud account, and select an API endpoint.
$ ibmcloud login
NOTE: If you have a federated user ID, instead use the following command to log in with your single sign-on ID.
$ ibmcloud login --sso
- Target a Cloud Foundry org and space:
$ ibmcloud target --cf
- From within the `flask-api` directory, push your app to IBM Cloud.
$ ibmcloud cf push <YOUR_APP_NAME>
Example: As our app name is `my-api`, we use the following command:
$ ibmcloud cf push my-api
- You will see output on your terminal as shown; verify that the state is `running`:
Invoking 'cf push'...
Pushing from manifest to org [email protected] / space dev as [email protected]...
...
Waiting for app to start...
...
state since cpu memory disk details
#0 running 2019-09-17T06:22:59Z 19.5% 103.4M of 512M 343.4M of 1G
- Once the API is deployed and running, you can test it.
- Go to IBM Cloud Resources and select the deployed API, `my-api` in our case.
- Inside the `my-api` dashboard, right-click on **Visit App URL** and copy the link address.
Example link address: https://my-api-xx-yy.eu-gb.mybluemix.net/
NOTE: This API link is important; save it in a notepad, since it will be used in step 6.
- To test the API, use any REST API client such as Postman.
- Make a GET request to the earlier copied link (https://my-api-xx-yy.eu-gb.mybluemix.net) as shown.
- A JSON body is returned in response, containing the source data that can be sent to the model to get predictions.
At this point you have successfully deployed an API.
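Instead of Postman, you can also hit the endpoint from Python. A sketch using only the standard library (the URL shown is a placeholder; your deployed app's hostname and exact payload fields will differ):

```python
import json
from urllib.request import urlopen

def fetch_record(api_url):
    """GET one simulated customer record from the deployed API."""
    with urlopen(api_url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (replace with your own Visit App URL):
# record = fetch_record("https://my-api-xx-yy.eu-gb.mybluemix.net/")
# print(record)
```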
We will be using Watson Studio's Jupyter Notebook to build the model and deploy it to the Watson Machine Learning service. A Watson Studio service also requires a Cloud Object Storage service, so we will create that as well.
- Create a Cloud Object Storage service.
- That's it! Your storage is created at this point.
- Create a Watson Machine Learning service.
- Once the service is created, on the landing page click **Service credentials** in the left panel, then click **New credential** and create credentials for the service as shown.
- Click **Add** to generate credentials.
- The newly created credentials can be viewed by clicking **View credentials**; copy the credentials as shown.
NOTE: Copy the credentials into a notepad, as they will be required in step 4.
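For reference, the credentials are a JSON object; pasted into Python they look roughly like the placeholder below. The exact set of keys depends on your service instance and region, so treat every field here as a stand-in and always use the values copied from your own **Service credentials** page:

```python
# Placeholder Watson Machine Learning credentials; every value here is fake.
# Replace each field with the values from your own Service credentials page.
wml_credentials = {
    "apikey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "instance_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "url": "https://eu-gb.ml.cloud.ibm.com",
}
```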
- Create a Watson Studio service.
- Then click on **Get Started**.
- In Watson Studio, click **Create a project > Create an empty project** and name it `Streaming Analytics Demo`.
- Once the project is created, click **Add to project** in the top-right corner and select **Notebook** from the options.
- In the New Notebook page, click **From URL**, enter a name and the URL https://github.com/IBM/live-streaming-of-IoT-data-using-streaming-analytics/blob/master/notebook/Python_Predictive_Model.ipynb, and click **Create Notebook** as shown.
At this point the Watson services are all set up. Now it's time to code!
In this section we build a Naive Bayes model for predicting whether a customer will accept a personal loan. The dataset is taken from Kaggle (https://www.kaggle.com/itsmesunil/bank-loan-modelling).
- Open the Jupyter Notebook; under **Files** click **browse** and load the `training-testing-data.xlsx` dataset from the `Data` directory that was cloned earlier.
- You will now see `training-testing-data.xlsx` in the right-side panel. Click on the cell shown in the image below and insert the pandas DataFrame for the dataset as shown.
- Now you will see the credentials and the DataFrame object in the selected cell. Replace the two lines as shown.
- Rename `df_data_0` to `data` and add the parameter `'Data'` (the sheet name) to the `read_excel` method as shown.
data = pd.read_excel(body, 'Data')
data.head()
- Insert your Watson Machine Learning credentials, which you copied in step 3.2, into the third cell as shown.
- Run the notebook by selecting **Cell > Run All** as shown.
At this point you have successfully created an API and deployed a predictive model. Now we create the Streams Flow.
- Create a Streaming Analytics service.
- This service will be consumed in the next step.
- Back in the Watson Studio project that you created, click **Add to project** again in the top-right corner and select **Streams flow** from the options.
- Enter the name `Predictive analytics stream flow`, select **From file**, and upload the `predictive_stream.stp` file from the `stream` directory that you cloned. Finally, select the Streaming Analytics service that you created in step 5.
NOTE: If you don't see the Streaming Analytics service listed, you can associate it by clicking the provided link. Check out TROUBLESHOOTING.md for more.
- Before you start the streams flow you need to set a couple of things. In the streams flow dashboard, click **Edit the streams flow** as shown.
- In the Streams canvas, select the first block, named **Simulating Real-time Data**, to view its properties. In the URL field, enter the API URL that you saved in step 2.1.
- Now click on the second block, named **Python Model**, to view its properties. Select the Python model deployed earlier.
NOTE: If you don't see the deployed model, you need to add the Watson Machine Learning service to your project manually. Check out TROUBLESHOOTING.md for more.
- Finally, click **Save and run** to build and deploy the streams flow to your IBM Watson Streaming Analytics service.
- The build and deploy will take approximately 5-10 minutes, so be patient.
- You can see the real-time data flowing in the Streams Flow.
NOTE (optional): If you are interested in understanding the building blocks of the streams flow in detail, refer to DETAILED.md, which covers the streams flow in depth.
- Once the status is `running`, you can visualize the incoming data and the predicted data in IBM Watson Streaming Analytics.
- Go to IBM Cloud Resources and, under Services, select the Streaming Analytics service.
- Click **Launch** to open the Streams dashboard.
- You can see the streams flow that we built in Watson Studio.
- For the use case we have considered, we will be monitoring the predictions for customers who will take a personal loan and the factors influencing the predictions.
- Based on the dataset, we have found that the following attributes affect the prediction:
  - Income
  - CCAvg
  - Mortgage
  - SecuritiesAccount
- Hence we will add widgets to monitor these attributes along with the Prediction attribute.
- To add a widget, hover your cursor over the arrow from Python Model to Debug and click **Create Dashboard View** as shown.
- In Create Data Visualization View, enter the view name `Monitoring Data` and click OK.
- You can now see the Monitoring Data table in your dashboard. Click the **Create Bar Graph** button in the table as shown.
- Enter the chart name `Predictor Importance`, then click on the **Categories** tab and select SecuritiesAccount, Mortgage, Income & CCAvg.
- You can now see the Predictor Importance bar graph in the dashboard.
- Similarly, you can create a line graph for the same attributes as shown.
- You can now see the Predictor Importance line graph in the dashboard.
- Now add the Predictions bar graph and line graph in a similar way as shown.
- Finally, you can see the rich dashboard with the predictor importance attributes and the predictions.
Conclusion: Data is growing fast in volume, variety, and complexity. Every day, we create 2.5 quintillion bytes of data! Traditional analytic solutions are not able to fully unlock the potential value of that data.
In a streams flow, you can access and analyze massive amounts of changing data as it is created. Regardless of whether the data is structured or unstructured, you can leverage data at scale to drive real-time analytics for up-to-the-minute business decisions.
- Commonly faced challenges are listed in TROUBLESHOOTING.md.
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.