This project is part of the Udacity Azure ML Nanodegree. Its goal is to determine the next set of potential customers to market to, working with the Bank Marketing dataset. To accomplish this we build a model and then create consumable pipelines: we use Azure AutoML to select the best model and the most suitable data pre-processing steps, deploy the model, and consume it through its API endpoints. The project also includes creating a fully operational machine learning pipeline, publishing it, and consuming it using the Azure Python SDK.
The architectural diagram explains the flow of the project in a pictorial representation. We start by creating the Bank Marketing dataset from a CSV file. We then create an Azure AutoML model using ML Studio and deploy the best model from the run with authentication enabled. Next, we enable Application Insights to maintain the logs of our deployed model. We then use Swagger to document the model's API structure, and thereafter consume the endpoints. We can also use the Apache Benchmark tool to benchmark the response, though this is not mandatory. Lastly, we publish an ML pipeline using the Azure Python SDK.
Upload and register the Bank Marketing dataset in Azure ML Studio under Datasets.
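As a sketch of the same step in code, the snippet below registers the dataset with the SDK; the CSV URL and the dataset name "bankmarketing" are assumptions, so substitute your own values.

```python
# Minimal sketch: register the Bank Marketing CSV as a tabular dataset.
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()

csv_url = "https://<your-storage>/bankmarketing_train.csv"  # placeholder URL
dataset = Dataset.Tabular.from_delimited_files(path=csv_url)
dataset.register(workspace=ws, name="bankmarketing",
                 description="UCI Bank Marketing dataset")
```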
Then, using the Azure AutoML service, create the AutoML model that will be deployed later. Use an existing compute cluster, or create a new one; increase the minimum number of nodes available to 1 and run the experiment.
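The same experiment can also be configured through the SDK. The following is a minimal sketch, assuming the registered "bankmarketing" dataset from above and a compute cluster named "cpu-cluster"; the label column, timeout, and experiment name are illustrative.

```python
# Minimal sketch of an AutoML classification run on the registered dataset.
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="bankmarketing")
compute_target = ComputeTarget(workspace=ws, name="cpu-cluster")  # assumed cluster name

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=dataset,
    label_column_name="y",          # target column of the Bank Marketing dataset
    compute_target=compute_target,
    experiment_timeout_minutes=30,  # illustrative timeout
)

run = Experiment(ws, "bankmarketing-automl").submit(automl_config)
run.wait_for_completion(show_output=True)
```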
After the experiment completes, AutoML summarizes the run with the best value of the primary metric, in our case accuracy, and the best model, in our case the VotingEnsemble model. As the name suggests, VotingEnsemble is a second-level model built on top of the other, simpler models; it runs last and gives a marginally better result than the individual models. Deploy this model from ML Studio. Remember to choose ACI (Azure Container Instance) as the hosting option and to enable authentication.
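For reference, here is a minimal sketch of the equivalent deployment through the SDK; the model name, service name, entry script, and curated environment are assumptions, since Studio handles these details for you when you deploy from the UI.

```python
# Minimal sketch: deploy a registered model to ACI with authentication enabled.
from azureml.core import Workspace, Environment
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="bankmarketing-model")  # assumed registered model name

env = Environment.get(ws, name="AzureML-AutoML")  # assumed curated environment
inference_config = InferenceConfig(entry_script="score.py", environment=env)

deploy_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,  # key-based authentication, as chosen in the Studio UI
)

service = Model.deploy(ws, "bankmarketing-service", [model],
                       inference_config, deploy_config)
service.wait_for_deployment(show_output=True)
```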
Write the code to enable Application Insights in the logs.py file and run it. Observe the logs it prints for the deployed model.
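A minimal sketch of what logs.py contains, assuming the service was deployed under the name "bankmarketing-service" (substitute your deployment's name):

```python
# Minimal sketch of logs.py: enable Application Insights and print the logs.
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(workspace=ws, name="bankmarketing-service")  # assumed name

# Turn on Application Insights for the deployed web service
service.update(enable_app_insights=True)

# Print the service logs to verify the deployment is healthy
print(service.get_logs())
```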
You can see that Application Insights, which was earlier set to false, is now enabled.
Next, we use Swagger, a documentation and API structuring tool. For this we need Docker installed on the system, since the Swagger UI runs as a downloaded container. Once Docker is installed, run the serve.py and swagger.sh files as shown. Swagger will then be hosted locally on the port you configured in those files.
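For illustration, serve.py can be as small as the sketch below: a local HTTP server that exposes the model's swagger.json (downloaded from the deployed service) with CORS headers so the Swagger UI container can fetch it. The port is an assumption; use whatever your swagger.sh expects.

```python
# Minimal sketch of a serve.py-style helper: serve the current directory
# (containing swagger.json) over HTTP with CORS enabled. Port 8000 is assumed.
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Allow the Swagger UI running on another port to fetch swagger.json
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), CORSRequestHandler).serve_forever()
```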
Since Swagger is now running locally, we can see our model's name as well as the API methods.
Here you can see the model name and the API methods, GET and POST.
The endpoint.py script sends a request to the deployed model via the POST API method and receives a response in JSON form, which it displays. We have to set the scoring URI and the authentication key first.
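A minimal sketch of endpoint.py, assuming a key-authenticated ACI endpoint; the URI, key, and feature payload below are placeholders:

```python
# Minimal sketch of endpoint.py: POST a JSON payload to the scoring endpoint.
import json
import requests

scoring_uri = "http://<your-aci-endpoint>/score"  # placeholder scoring URI
key = "<your-primary-key>"                        # placeholder auth key

data = {
    "data": [
        {
            "age": 35,
            "job": "technician",
            # ... the remaining Bank Marketing features go here ...
        }
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())
```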
The pipeline created through the Azure Python SDK is now running in the portal.
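A minimal sketch of how such a pipeline is built and submitted, reusing the automl_config from the earlier sketch; the step and experiment names are illustrative:

```python
# Minimal sketch: wrap the AutoML run in a pipeline step and submit it.
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import AutoMLStep

ws = Workspace.from_config()

automl_step = AutoMLStep(
    name="automl_module",         # illustrative step name
    automl_config=automl_config,  # the AutoMLConfig defined earlier
    allow_reuse=True,
)

pipeline = Pipeline(workspace=ws, steps=[automl_step])
pipeline_run = Experiment(ws, "bankmarketing-pipeline").submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)
```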
This is the published pipeline's endpoint.
The published pipeline's status is now ACTIVE and its REST endpoint is also available.
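A minimal sketch of publishing the pipeline and triggering it through its REST endpoint, using interactive login for the token; the pipeline and experiment names are illustrative:

```python
# Minimal sketch: publish the pipeline, then trigger it via its REST endpoint.
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",                   # illustrative pipeline name
    description="Bank Marketing AutoML pipeline",
    version="1.0",
)

# Retrieve an AAD token and POST to the published pipeline's REST endpoint
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    published_pipeline.endpoint,
    headers=auth_header,
    json={"ExperimentName": "bankmarketing-pipeline"},
)
print(response.json().get("Id"))  # id of the newly triggered pipeline run
```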
The pipeline runs have been completed.
Click here to see the screencast.