In principal Azure Batch can often speed up your long running for loops by several orders of magnitude, but converting your code to run in Azure Batch requires a good bit of configuration and there are a great variety of ways to do it. This package aims to simplify that process dramatically, and institutes some best practices too.
For example, the following code contains a nested for loop with work which can be spread across multiple workers and orchestrated by Azure Batch:
import numpy as np
POWER = 3
SIZE = (10,)
SEEDS = (1, 12, 123, 1234)
out = np.zeros((len(SEEDS),))
for i, seed in enumerate(SEEDS):
np.random.seed(seed)
tmp = np.random.uniform(size=SIZE)
out[i] = sum(np.power(tmp, POWER))
print(sum(out))
However, to leverage azure batch, we'll need to:
- Set up an Azure Batch instance.
- Bundle the code which does the actual work into it's own script.
- Tell Azure Batch about each of the individual bits of work that need to
be done (i.e. the
for i, seed in enumerate(SEEDS)
part) - Collect the results of each task
- Aggregate the intermediate results to produce our final result (i.e the
sum(out)
part)
This module aims to make this process as smooth as possible, and will take some opinions on how to do so in order to reduce the amount of research you need to do and code you need to write to get your job up and running with Azure Batch. Specifically:
- The Azure Batch instance will be set up using the
Azure command line utility
az
. This makes setting up the Azure Batch Instance fast and repeatable, and allows us to authenticate without having to store credentials, which is a security best practice. - The code to be executed by azure batch will be bundled into a docker image. Using docker ensures that our code can be tested locally and then run in Azure Batch in the exact same computing environment without having to write custom scripts to configure the VMs which will run our code.
Azure batch is responsible for (1) loading our code into a computing environment, (2) loading the data that our code requires into the file system of that environment, (3) executing our code, (4) collecting the data produced by our code.
Therefor, we will need to:
- Bundle our code
- Specify where our code will read in required data and write results
- Run our code in Azure Batch
- Collect the results
- Shut down the Azure Batch instance
This module contains constants that for the contract between the controller which tells Azure Batch about the individual tasks that need to be completed, and the worker which executes an individual task.
# ./constants.py
GLOBAL_CONFIG_FILE = "config.pickle"
TASK_INPUTS_FILE = "inputs.pickle"
TASK_OUTPUTS_FILE = "outputs.pickle"
LOCAL_INPUTS_PATTERN = "task_{}_inputs.pickle"
LOCAL_OUTPUTS_PATTERN = "task_{}_outputs.pickle"
Aside: In this example we'll be passing python pickle files between the
controller and worker, because both the worker and the controller are
written in python. While the SuperBatch helper scripts are written in Python,
this package can be used to automate work which gets done in any language.
Docker images exist for a great variety of languages, and using a worker
from another language stack simply requires changing the files written and
read from python .pickle
s to something more language agnostic, such as
csv, yaml, json, or feather (to name a few).
First, we'll bundle our worker into a python script which is responsible for running a single task. Specifically, it reads in the global and iteration specific configuration, does the work, and writes the results to file in the local computing environment.
# ./worker.py
import numpy as np
import joblib
from constants import GLOBAL_CONFIG_FILE, TASK_INPUTS_FILE, TASK_OUTPUTS_FILE
# Read the designated global config and iteration parameter files
global_config = joblib.load(GLOBAL_CONFIG_FILE)
parameters = joblib.load(TASK_INPUTS_FILE)
# Do the actual work
np.random.seed(parameters["seed"])
out = sum( np.power(np.random.uniform(size=global_config["size"]), global_config["power"]))
# Write the results to the designated output file
joblib.dump(out, TASK_OUTPUTS_FILE)
Next, we need to bundle this code so that it can be executed by Azure Batch. We'll use docker to bundle the code and it's dependencies, which requires writing a docker file like the following:
# ./Dockerfile
FROM python:3.7
RUN pip install --upgrade pip \
&& pip install numpy joblib
COPY worker.py .
COPY constants.py .
In the above docker file, we explicitly installed two packages (numpy and
joblib), but if your code requires more packages and you know your code
runs locally, you can call pip freeze
from the command line and copy the
results of that call to a file called requirements.txt
. Then simply copy
the requirements file into the docker image and install the exact versions
of your requirements into the docker image like so:
# ./Dockerfile
FROM python:3.7
COPY requirements.txt .
RUN pip install --upgrade pip \
&& pip install -r requirements.txt
COPY worker.py .
COPY constants.py .
To create a docker image locally, navigate to the project directory and call:
docker build -t myusername/sum-of-powers:v1 .
At this point, the docker image needs to be uploaded to a registry that is
accessible to Azure Batch. If you own that user name (myusername
) at
Docker Hub and are logged in, you can push your code to a
publicly available registry like so:
docker push myusername/sum-of-powers:v1
However, if you wish to keep your code private, you'll need to use a
private registry such as Azure Container Registry which can be created at
the command line via az acr create
or via the web portal.
Once your private Azure Container Registry has been created, you can build, tag, and upload your image like so:
# build the image locally
docker build -t sum-of-powers:v1 .
# login to Azure and the container registry
az login
az acr login --name myownprivateregistry
# tag the local image
docker tag sum-of-powers:v1 myownprivateregistry.azurecr.io/sum-of-powers:v1
# push the image to the private registry
docker push myownprivateregistry.azurecr.io/sum-of-powers:v1
We need to tell azure batch about our tasks, run the tasks, wait for their
completion, download the results. The following script leverages a helper
provided by super_batch
, which can be installed via:
pip install git+https://github.com/jdthorpe/batch-config
Finally, while the following is one of the longer scripts you'll need to write, it can be written by adding the boiler plate code to you're original code containing the for loop containing the tasks that are to be deligated to Azure Batch.
You'll most likely just need to update it with your batch configuration
preferences (e.g. vm size, node counts, etc), the name of your docker
image, and then replace the body of the for
loop from your original code
with the boiler plate code in the for loop below:
# ./controller.py
import os
import datetime
import pathlib
import joblib
import super_batch
from constants import (
GLOBAL_CONFIG_FILE,
TASK_INPUTS_FILE,
TASK_OUTPUTS_FILE,
LOCAL_INPUTS_PATTERN,
LOCAL_OUTPUTS_PATTERN,
)
# CONSTANTS:
# used to generate unique task names:
_TIMESTAMP = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
# a local directory where temporary files will be stored:
BATCH_DIRECTORY = os.path.expanduser("~/temp/super-batch-test")
pathlib.Path(BATCH_DIRECTORY).mkdir(parents=True, exist_ok=True)
# The `$name` of our created resources:
NAME = "superbatchtest"
# INSTANTIATE THE BATCH HELPER CLIENT:
batch_client = super_batch.client(
POOL_ID=NAME,
JOB_ID=NAME + _TIMESTAMP,
POOL_VM_SIZE="STANDARD_A1_v2",
POOL_NODE_COUNT=0,
POOL_LOW_PRIORITY_NODE_COUNT=2,
DELETE_POOL_WHEN_DONE=False,
BLOB_CONTAINER_NAME=NAME,
BATCH_DIRECTORY=BATCH_DIRECTORY,
DOCKER_IMAGE="myusername/sum-of-powers:v1",
COMMAND_LINE="python /worker.py",
)
# BUILD THE GLOBAL PARAMETER RESOURCE
global_parameters = {"power": 3, "size": (10,)}
joblib.dump( global_parameters, os.path.join(BATCH_DIRECTORY, GLOBAL_CONFIG_FILE))
global_parameters_resource = batch_client.build_resource_file(
GLOBAL_CONFIG_FILE, GLOBAL_CONFIG_FILE
)
# BUILD THE BATCH TASKS
SEEDS = (1, 12, 123, 1234)
for i, seed in enumerate(SEEDS):
# CREATE THE ITERATION PAREMTERS RESOURCE
param_file = LOCAL_INPUTS_PATTERN.format(i)
joblib.dump({"seed": seed}, os.path.join(BATCH_DIRECTORY, param_file))
input_resource = batch_client.build_resource_file(param_file, TASK_INPUTS_FILE)
# CREATE AN OUTPUT RESOURCE
output_resource = batch_client.build_output_file(
TASK_OUTPUTS_FILE, LOCAL_OUTPUTS_PATTERN.format(i)
)
# CREATE A TASK
batch_client.add_task(
[input_resource, global_parameters_resource], [output_resource]
)
# RUN THE BATCH JOB
batch_client.run()
# AGGREGATE INTERMEDIATE STATISTICS
out = [None] * len(SEEDS)
for i in range(len(SEEDS)):
fpath = os.path.join(BATCH_DIRECTORY, LOCAL_OUTPUTS_PATTERN.format(i))
out[i] = joblib.load(fpath)
print(sum(out))
Using Azure batch requires an azure account, and we'll demonstrate how to run this module using the azure command line tool.
After logging into the console with az login
(and potentially setting the default
subscription with az account set -s <subscription>
), you'll need to create an azure
resource group into which the batch account is created. In addition, the
azure batch service requires a storage account which is used to keep track of
details of the jobs and tasks.
Although the resource group, storage account and batch account could have different names, for sake of exposition, we'll give them all the same name and locate them in the US West 2 region, like so:
Best Practice Tip: Use a dedicated resource group for your Azure Batch resources. This ensures that you can delete all the azure resources when you are done with a single command (az rg delete or via the portal) in order to avoid unnecessary charges to your Azure subscription when you have finished with your batch jobs.
Examples are for included for the bash, cmd, and powershell:
# parameters
$name = "azurebatchtest"
$location = "westus2"
# create the resources
az group create -l $location -n $name
az storage account create -n $name -g $name
az batch account create -l $location -n $name -g $name --storage-account $name
# parameters
name="azurebatchtest"
location="westus2"
# create the resources
az group create -l $location -n $name
az storage account create -n $name -g $name
az batch account create -l $location -n $name -g $name --storage-account $name
REM parameters
set name=azurebatchtest
set location=westus2
REM create the resources
az group create -l %location% -n %name%
az storage account create -n %name% -g %name%
az batch account create -l %location% -n %name% -g %name% --storage-account %name%
(Aside: since we're using the name
for parameter for the resource group,
storage account and batch account, it must consist of 3-24 lower case,
letters and be unique across all of azure)
In order to run our tasks in Azure batch, we need credentials for each of
the azure resources (e.g. azure batch, storage, and optionally the azure
container registry). The strategy employed by this package is to log in to
the Azure CLI (az login
) and export the necessary credentials as local
variables in the terminal session.
After exporting these credentials, when the controller is executed in the same terminal session, super batch will read in credentials from local environment.
This implements the security best practice of least privilege (as our code
only has the permissions to run a batch job and nothing more) and
compartmentalization (as the credentials only handled by az
and are never
stored outside the terminal session).
Again, examples are for included for the bash, cmd, and powershell:
$env:BATCH_ACCOUNT_NAME = $name
$env:BATCH_ACCOUNT_KEY = (az batch account keys list -n $name -g $name --query primary) -replace '"',''
$env:BATCH_ACCOUNT_ENDPOINT = (az batch account show -n $name -g $name --query accountEndpoint) -replace '"',''
$env:STORAGE_ACCOUNT_KEY = (az storage account keys list -n $name --query [0].value) -replace '"',''
$env:STORAGE_ACCOUNT_CONNECTION_STRING= (az storage account show-connection-string --name $name --query connectionString) -replace '"',''
# Query the required parameters
export BATCH_ACCOUNT_NAME=$name
export BATCH_ACCOUNT_KEY=$(az batch account keys list -n $name -g $name --query primary)
export BATCH_ACCOUNT_ENDPOINT=$(az batch account show -n $name -g $name --query accountEndpoint)
export STORAGE_ACCOUNT_KEY=$(az storage account keys list -n $name --query [0].value)
export STORAGE_ACCOUNT_CONNECTION_STRING=$(az storage account show-connection-string --name $name --query connectionString)
# clean up the quotes
BATCH_ACCOUNT_KEY=$(sed -e 's/^"//' -e 's/"$//' <<<"$BATCH_ACCOUNT_KEY")
BATCH_ACCOUNT_ENDPOINT=$(sed -e 's/^"//' -e 's/"$//' <<<"$BATCH_ACCOUNT_ENDPOINT")
STORAGE_ACCOUNT_KEY=$(sed -e 's/^"//' -e 's/"$//' <<<"$STORAGE_ACCOUNT_KEY")
STORAGE_ACCOUNT_CONNECTION_STRING=$(sed -e 's/^"//' -e 's/"$//' <<<"$STORAGE_ACCOUNT_CONNECTION_STRING")
REM Query the required parameters
set BATCH_ACCOUNT_NAME=%name%
for /f %i in ('az batch account keys list -n %name% -g %name% --query primary') do @set BATCH_ACCOUNT_KEY=%i
for /f %i in ('az storage account keys list -n %name% --query [0].value') do @set STORAGE_ACCOUNT_KEY=%i
for /f %i in ('az batch account show -n %name% -g %rgname% --query accountEndpoint') do @set BATCH_ACCOUNT_ENDPOINT=%i
for /f %i in ('az storage account show-connection-string --name $name --query connectionString') do @set STORAGE_ACCOUNT_CONNECTION_STRING=%i
REM clean up the quotes
set BATCH_ACCOUNT_KEY=%BATCH_ACCOUNT_KEY:"=%
set BATCH_ACCOUNT_ENDPOINT=%BATCH_ACCOUNT_ENDPOINT:"=%
set STORAGE_ACCOUNT_KEY=%STORAGE_ACCOUNT_KEY:"=%
set STORAGE_ACCOUNT_CONNECTION_STRING=%STORAGE_ACCOUNT_CONNECTION_STRING:"=%
*Note, if used within a .bat
file, replace %i
with %%i
above.
If using a private Azure Container Registry, you'll also need to export the following credentials:
$AZURE_CR_NAME = "MyOwnPrivateRegistry"
# only required once:
az acr update -n $AZURE_CR_NAME --admin-enabled true
$REGISTRY_SERVER = (az acr show -n $AZURE_CR_NAME --query loginServer) -replace '"',''
$REGISTRY_USERNAME = (az acr credential show -n $AZURE_CR_NAME --query username) -replace '"',''
$REGISTRY_PASSWORD = (az acr credential show -n $AZURE_CR_NAME --query passwords[0].value) -replace '"',''
export AZURE_CR_NAME="MyOwnPrivateRegistry"
# only required once:
az acr update -n %AZURE_CR_NAME% --admin-enabled true
export REGISTRY_SERVER=$(az acr show -n %AZURE_CR_NAME% --query loginServer)
export REGISTRY_USERNAME=$(az acr credential show -n %AZURE_CR_NAME% --query username)
export REGISTRY_PASSWORD=$(az acr credential show -n %AZURE_CR_NAME% --query passwords[0].value)
set AZURE_CR_NAME=MyOwnPrivateRegistry
REM Only required once:
az acr update -n %AZURE_CR_NAME% --admin-enabled true
for /f %i in ('az acr show -n %AZURE_CR_NAME% --query loginServer') do @set REGISTRY_SERVER=%i
for /f %i in ('az acr credential show -n %AZURE_CR_NAME% --query username') do @set REGISTRY_USERNAME=%i
for /f %i in ('az acr credential show -n %AZURE_CR_NAME% --query passwords[0].value') do @set REGISTRY_PASSWORD=%i
REGISTRY_SERVER=$(sed -e 's/^"//' -e 's/"$//' <<<"$REGISTRY_SERVER")
REGISTRY_USERNAME=$(sed -e 's/^"//' -e 's/"$//' <<<"$REGISTRY_USERNAME")
REGISTRY_PASSWORD=$(sed -e 's/^"//' -e 's/"$//' <<<"$REGISTRY_PASSWORD")
In the following Python script, a Batch configuration is created and the batch job is executed with Azure Batch. Note that the Batch Account and Storage Account details can be provided directly to the BatchConfig, with default values taken from the system environment.
python controller.py
Note that the pool configuration will only be used to create a new pool if
no pool with the id specified by the POOL_ID
exists. If a pool with that id already
exists, these parameters are ignored and will not update the pool
configuration. Changing pool attrributes such as type or quantity of nodes
can be done through the Azure Portal, Azure
Batch Explorer or the Azure
CLI.
Note that if there is an error in your worker code, you can update your
worker by incrementing the version portion of the tag (e.g v1
to v2
),
and then rebuild, publish your docker image (Step x), and updating the
DOCKER_IMAGE
name in your controller.py script.
In order to prevent unexpected charges, the resource group, including all the resources it contains, such as the storge account and batch pools, can be removed with the following command:
az group delete -n $name
az group delete -n %name%