aws-samples / amazon-sagemaker-endpoint-deployment-of-fastai-model-with-torchserve

Deploy FastAI Trained PyTorch Model in TorchServe and Host in Amazon SageMaker Inference Endpoint

Home Page: https://aws.amazon.com/blogs/opensource/deploy-fast-ai-trained-pytorch-model-in-torchserve-and-host-in-amazon-sagemaker-inference-endpoint/

License: MIT No Attribution

Languages: Dockerfile 2.90%, Shell 0.39%, Python 11.74%, Jupyter Notebook 84.97%
Topics: fastai, torchserve, sagemaker-deployment, deep-learning, self-driving-car, artificial-intelligence, pytorch

amazon-sagemaker-endpoint-deployment-of-fastai-model-with-torchserve's Introduction

Deploy FastAI Trained PyTorch Model in TorchServe and Host in Amazon SageMaker Inference Endpoint

Introduction

Over the past few years, FastAI has become one of the most cutting-edge open-source deep learning frameworks and the go-to choice for many machine learning use cases built on PyTorch. It has not only democratized deep learning and made it approachable to a general audience, but has also set an example of how scientific software should be engineered, especially in Python. Currently, however, deploying a FastAI model to a production environment often involves setting up and self-maintaining a customized inference solution (e.g. with Flask), which is time-consuming and distracting, since issues such as security, load balancing, and service orchestration all have to be managed and maintained.

Recently, AWS developed TorchServe in partnership with Facebook as a flexible and easy-to-use open-source tool for serving PyTorch models. It removes the heavy lifting of deploying and serving PyTorch models with Kubernetes, and AWS and Facebook will maintain and continue contributing to TorchServe along with the broader PyTorch community. With TorchServe, many features come out of the box, and it provides full flexibility for deploying trained PyTorch models at scale, so that a trained model can go to production with only a few extra lines of code.

Meanwhile, an Amazon SageMaker endpoint is a fully managed service that allows users to make real-time inferences via a REST API, saving Data Scientists and Machine Learning Engineers from managing their own server instances, load balancing, fault tolerance, auto-scaling, model monitoring, and more. Amazon SageMaker endpoints offer different types of instances suited to different tasks, including instances with GPU(s), which support industry-grade machine learning inference and graphics-intensive applications while remaining cost-effective.

In this repository we demonstrate how to deploy a FastAI-trained PyTorch model in TorchServe eager mode and host it in an Amazon SageMaker inference endpoint.

Getting Started with a FastAI Model

In this section we train a FastAI model that can solve a real-world problem with performance meeting the use-case specification. As an example, we focus on a scene segmentation use case from self-driving cars.

Installation

The first step is to install the FastAI package, as covered in its GitHub repository.

If you're using Anaconda then run:

conda install -c fastai -c pytorch -c anaconda fastai gh anaconda

...or if you're using Miniconda then run:

conda install -c fastai -c pytorch fastai

For other installation options, please refer to the FastAI documentation.

Modelling

The following materials are based on the FastAI course: "Practical Deep Learning for Coders".

First, import fastai.vision modules and download the sample data CAMVID_TINY, by:

from fastai.vision.all import *
path = untar_data(URLs.CAMVID_TINY)

Secondly, define helper functions to calculate segmentation performance and to read in the segmentation mask for each training image.

Note: it's tempting to define one-line Python lambda functions to pass to fastai; however, this will cause serialization issues when we want to export the FastAI model. Therefore we avoid anonymous Python functions during the FastAI modeling steps.

def acc_camvid(inp, targ, void_code=0):
    targ = targ.squeeze(1)
    mask = targ != void_code
    return (inp.argmax(dim=1)[mask] == targ[mask]).float().mean()

def get_y(o, path=path):
    return path / "labels" / f"{o.stem}_P{o.suffix}"

Thirdly, we set up the DataLoaders, which define the modelling path, training image path, batch size, mask path, mask codes, etc. In this example we also record the image size and number of classes from the data. In a real-world problem these values may be known a priori and should be defined when constructing the dataset.

dls = SegmentationDataLoaders.from_label_func(
    path,
    bs=8,
    fnames=get_image_files(path / "images"),
    label_func=get_y,
    codes=np.loadtxt(path / "codes.txt", dtype=str),
)
dls.one_batch()[0].shape[-2:], get_c(dls)
>>> (torch.Size([96, 128]), 32)

Next, set up a U-Net learner with a Residual Neural Network (ResNet) backbone, and trigger the FastAI training process.

learn = unet_learner(dls, resnet50, metrics=acc_camvid)
learn.fine_tune(20)
>>>
epoch	train_loss	valid_loss	acc_camvid	time
0	3.901105	2.671725	0.419333	00:04
epoch	train_loss	valid_loss	acc_camvid	time
0	1.732219	1.766196	0.589736	00:03
1	1.536345	1.550913	0.612496	00:02
2	1.416585	1.170476	0.650690	00:02
3	1.300092	1.087747	0.665566	00:02
4	1.334166	1.228493	0.649878	00:03
5	1.269190	1.047625	0.711870	00:02
6	1.243131	0.969567	0.719976	00:03
7	1.164861	0.988767	0.700076	00:03
8	1.103572	0.791861	0.787799	00:02
9	1.026181	0.721673	0.806758	00:02
10	0.949283	0.650206	0.815247	00:03
11	0.882919	0.696920	0.812805	00:03
12	0.823694	0.635109	0.824582	00:03
13	0.766428	0.631013	0.832627	00:02
14	0.715637	0.591066	0.839386	00:03
15	0.669535	0.601648	0.836554	00:03
16	0.628947	0.598065	0.840095	00:03
17	0.593876	0.578633	0.841116	00:02
18	0.563728	0.582522	0.841409	00:03
19	0.539064	0.580864	0.842272	00:02

Finally, we export the FastAI model for use in the following sections of this tutorial.

learn.export("./fastai_unet.pkl")

For more details about the modeling process, refer to notebook/01_U-net_Modelling.ipynb [link].

PyTorch Transfer Modeling from FastAI

In this section we build a pure PyTorch model and transfer the model weights from FastAI. The following materials are inspired by "Practical-Deep-Learning-for-Coders-2.0" by Zachary Mueller et al.

Export Model Weights from FastAI

First, restore the FastAI learner from the pickle exported in the last section, and save its model weights with PyTorch.

from fastai.vision.all import *
import torch

# dummy stubs so the exported learner can be unpickled without the original training code
def acc_camvid(*_): pass
def get_y(*_): pass

learn = load_learner("/home/ubuntu/.fastai/data/camvid_tiny/fastai_unet.pkl")
torch.save(learn.model.state_dict(), "fasti_unet_weights.pth")

It's also straightforward to obtain the FastAI prediction on a sample image.

"2013.04 - 'Streetview of a small neighborhood', with residential buildings, Amsterdam city photo by Fons Heijnsbroek, The Netherlands" by Amsterdam free photos & pictures of the Dutch city is marked under CC0 1.0. To view the terms, visit https://creativecommons.org/licenses/cc0/1.0/

sample_image

image_path = "street_view_of_a_small_neighborhood.png"
pred_fastai = learn.predict(image_path)
pred_fastai[0].numpy()
>>>
array([[26, 26, 26, ...,  4,  4,  4],
       [26, 26, 26, ...,  4,  4,  4],
       [26, 26, 26, ...,  4,  4,  4],
       ...,
       [17, 17, 17, ..., 30, 30, 30],
       [17, 17, 17, ..., 30, 30, 30],
       [17, 17, 17, ..., 30, 30, 30]])

PyTorch Model from FastAI Source Code

Next, we need to define the model in pure PyTorch. In a Jupyter notebook, one can inspect the FastAI source code by prepending ?? to a function name. Here we look into unet_learner and DynamicUnet:

>> ??unet_learner
>> ??DynamicUnet

Each of these commands pops up a window at the bottom of the browser:

fastai_source_code

After investigating, the PyTorch model can be defined as:

from fastai.vision.all import *
from fastai.vision.learner import _default_meta
from fastai.vision.models.unet import _get_sz_change_idxs, UnetBlock, ResizeToOrig


class DynamicUnetDIY(SequentialEx):
    "Create a U-Net from a given architecture."

    def __init__(
        self,
        arch=resnet50,
        n_classes=32,
        img_size=(96, 128),
        blur=False,
        blur_final=True,
        y_range=None,
        last_cross=True,
        bottle=False,
        init=nn.init.kaiming_normal_,
        norm_type=None,
        self_attention=None,
        act_cls=defaults.activation,
        n_in=3,
        cut=None,
        **kwargs
    ):
        meta = model_meta.get(arch, _default_meta)
        encoder = create_body(
            arch, n_in, pretrained=False, cut=ifnone(cut, meta["cut"])
        )
        imsize = img_size

        sizes = model_sizes(encoder, size=imsize)
        sz_chg_idxs = list(reversed(_get_sz_change_idxs(sizes)))
        self.sfs = hook_outputs([encoder[i] for i in sz_chg_idxs], detach=False)
        x = dummy_eval(encoder, imsize).detach()

        ni = sizes[-1][1]
        middle_conv = nn.Sequential(
            ConvLayer(ni, ni * 2, act_cls=act_cls, norm_type=norm_type, **kwargs),
            ConvLayer(ni * 2, ni, act_cls=act_cls, norm_type=norm_type, **kwargs),
        ).eval()
        x = middle_conv(x)
        layers = [encoder, BatchNorm(ni), nn.ReLU(), middle_conv]

        for i, idx in enumerate(sz_chg_idxs):
            not_final = i != len(sz_chg_idxs) - 1
            up_in_c, x_in_c = int(x.shape[1]), int(sizes[idx][1])
            do_blur = blur and (not_final or blur_final)
            sa = self_attention and (i == len(sz_chg_idxs) - 3)
            unet_block = UnetBlock(
                up_in_c,
                x_in_c,
                self.sfs[i],
                final_div=not_final,
                blur=do_blur,
                self_attention=sa,
                act_cls=act_cls,
                init=init,
                norm_type=norm_type,
                **kwargs
            ).eval()
            layers.append(unet_block)
            x = unet_block(x)

        ni = x.shape[1]
        if imsize != sizes[0][-2:]:
            layers.append(PixelShuffle_ICNR(ni, act_cls=act_cls, norm_type=norm_type))
        layers.append(ResizeToOrig())
        if last_cross:
            layers.append(MergeLayer(dense=True))
            ni += in_channels(encoder)
            layers.append(
                ResBlock(
                    1,
                    ni,
                    ni // 2 if bottle else ni,
                    act_cls=act_cls,
                    norm_type=norm_type,
                    **kwargs
                )
            )
        layers += [
            ConvLayer(ni, n_classes, ks=1, act_cls=None, norm_type=norm_type, **kwargs)
        ]
        apply_init(nn.Sequential(layers[3], layers[-2]), init)
        # apply_init(nn.Sequential(layers[2]), init)
        if y_range is not None:
            layers.append(SigmoidRange(*y_range))
        super().__init__(*layers)

    def __del__(self):
        if hasattr(self, "sfs"):
            self.sfs.remove()

Also check the inheritance hierarchy of the FastAI-defined class SequentialEx:

SequentialEx.mro()
>>> [fastai.layers.SequentialEx,
 fastai.torch_core.Module,
 torch.nn.modules.module.Module,
 object]

Here we can see that SequentialEx derives from PyTorch's torch.nn.Module; therefore DynamicUnetDIY is a PyTorch model.

Note: the parameters arch, n_classes, img_size, etc. must be consistent with the training process. If other parameters were customized during training, they must be reflected here as well. Also, in create_body we set pretrained=False because we are transferring the weights from FastAI, so there is no need to download the ImageNet weights from PyTorch again.

Weights Transfer

Now initialize the PyTorch model, load the saved weights, and transfer them into the model.

model_torch_rep = DynamicUnetDIY()
state = torch.load("fasti_unet_weights.pth")
model_torch_rep.load_state_dict(state)
model_torch_rep.eval();

If we take one sample image, transform it, and pass it to model_torch_rep, we get a prediction identical to FastAI's.

from torchvision import transforms
from PIL import Image
import numpy as np

image_path = "street_view_of_a_small_neighborhood.png"

image = Image.open(image_path).convert("RGB")
image_tfm = transforms.Compose(
    [
        transforms.Resize((96, 128)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

x = image_tfm(image).unsqueeze_(0)

# inference on CPU
raw_out = model_torch_rep(x)
raw_out.shape
>>> torch.Size([1, 32, 96, 128])

pred_res = raw_out[0].argmax(dim=0).numpy().astype(np.uint8)
pred_res
>>>
array([[26, 26, 26, ...,  4,  4,  4],
       [26, 26, 26, ...,  4,  4,  4],
       [26, 26, 26, ...,  4,  4,  4],
       ...,
       [17, 17, 17, ..., 30, 30, 30],
       [17, 17, 17, ..., 30, 30, 30],
       [17, 17, 17, ..., 30, 30, 30]], dtype=uint8)

np.all(pred_fastai[0].numpy() == pred_res)
>>> True

Here we can see the difference: the FastAI export fastai_unet.pkl packages all the steps, including the data transformations and image dimension alignment, whereas fasti_unet_weights.pth contains only the weights, so we have to re-define the data transformation procedures ourselves and make sure they are consistent with the training step.

Note: in image_tfm make sure the image size and normalization statistics are consistent with the training step. In our example here, the size is 96x128 and normalization uses the ImageNet statistics, which is FastAI's default. If other transformations were applied during training, they may need to be added here as well.

For more details about the PyTorch weights transferring process, please refer to notebook/02_Inference_in_pytorch.ipynb [link].

Deployment to TorchServe

In this section we deploy the PyTorch model to TorchServe. For installation, please refer to TorchServe Github Repository.

Overall, there are mainly 3 steps to use TorchServe:

  1. Archive the model into *.mar.
  2. Start the torchserve.
  3. Call the API and get the response.

In order to archive the model, at least 3 files are needed in our case:

  1. PyTorch model weights fasti_unet_weights.pth.
  2. PyTorch model definition model.py, which is identical to DynamicUnetDIY definition described in the last section.
  3. TorchServe custom handler.

Custom Handler

As shown in /deployment/handler.py, the TorchServe handler accepts data and context. In our example, we define a helper Python class with four instance methods: initialize, preprocess, inference, and postprocess.

initialize

Here we work out whether a GPU is available, then locate the serialized model weights file, and finally instantiate the PyTorch model and put it in evaluation mode.

    def initialize(self, ctx):
        """
        load eager mode state_dict based model
        """
        properties = ctx.system_properties
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id"))
            if torch.cuda.is_available()
            else "cpu"
        )
        model_dir = properties.get("model_dir")

        manifest = ctx.manifest
        logger.error(manifest)
        serialized_file = manifest["model"]["serializedFile"]
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model definition file")

        logger.debug(model_pt_path)

        from model import DynamicUnetDIY

        state_dict = torch.load(model_pt_path, map_location=self.device)
        self.model = DynamicUnetDIY()
        self.model.load_state_dict(state_dict)
        self.model.to(self.device)
        self.model.eval()

        logger.debug("Model file {0} loaded successfully".format(model_pt_path))
        self.initialized = True

preprocess

As described in the previous section, we re-define the image transform steps and apply them to the inference data.

    def preprocess(self, data):
        """
        Scales and normalizes a PIL image for an U-net model
        """
        image = data[0].get("data")
        if image is None:
            image = data[0].get("body")

        image_transform = transforms.Compose(
            [
                transforms.Resize((96, 128)),
                transforms.ToTensor(),
                transforms.Normalize(
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                ),
            ]
        )
        image = Image.open(io.BytesIO(image)).convert(
            "RGB"
        )
        image = image_transform(image).unsqueeze_(0)
        return image

inference

Now move the preprocessed tensor to the GPU if one is available, and pass it through the model.

    def inference(self, img):
        """
        Predict the segmentation mask of an image using the trained deep learning model.
        """
        self.model.eval()
        inputs = Variable(img).to(self.device)
        outputs = self.model.forward(inputs)
        logging.debug(outputs.shape)
        return outputs

postprocess

Here the raw inference output is moved off the GPU if necessary, reduced to a class-index mask with argmax, and Base64-encoded before being returned to the API caller.

    def postprocess(self, inference_output):

        if torch.cuda.is_available():
            inference_output = inference_output[0].argmax(dim=0).cpu()
        else:
            inference_output = inference_output[0].argmax(dim=0)

        return [
            {
                "base64_prediction": base64.b64encode(
                    inference_output.numpy().astype(np.uint8)
                ).decode("utf-8")
            }
        ]
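
TorchServe invokes these methods through the handler's entry point. The snippet below is a minimal sketch of how such an entry point typically looks for a class-based custom handler; the class name ModelHandler and the module-level handle function are illustrative assumptions and may differ in detail from the actual deployment/handler.py.

# a minimal sketch, not necessarily identical to deployment/handler.py
class ModelHandler:
    def __init__(self):
        self.initialized = False

    # initialize, preprocess, inference and postprocess are defined as shown above
    ...

_service = ModelHandler()

def handle(data, context):
    # TorchServe calls this function for every request
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    image = _service.preprocess(data)
    output = _service.inference(image)
    return _service.postprocess(output)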

Now we are ready to set up and launch TorchServe.

TorchServe in Action

Step 1: Archive the PyTorch model

>>> torch-model-archiver --model-name fastunet --version 1.0 --model-file deployment/model.py --serialized-file model_store/fasti_unet_weights.pth --export-path model_store --handler deployment/handler.py -f

Step 2: Serve the Model

>>> torchserve --start --ncs --model-store model_store --models fastunet.mar

Step 3: Call the API and get the response (here we use HTTPie). For a complete response, see sample/sample_output.txt.

>>> time http POST http://127.0.0.1:8080/predictions/fastunet/ @sample/street_view_of_a_small_neighborhood.png

HTTP/1.1 200
Cache-Control: no-cache; no-store, must-revalidate, private
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Pragma: no-cache
connection: keep-alive
content-length: 131101
x-request-id: 96c25cb1-99c2-459e-9165-aa5ef9e3a439

{
  "base64_prediction": "GhoaGhoaGhoaGhoaGhoaGhoaGh...ERERERERERERERERERERER"
}

real    0m0.979s
user    0m0.280s
sys     0m0.039s

The first call has longer latency because the model weights are loaded in initialize, but this is mitigated from the second call onward. For more details about TorchServe setup and usage, please refer to notebook/03_TorchServe.ipynb [link].

Deployment to Amazon SageMaker Inference Endpoint

In this section we deploy the FastAI-trained scene segmentation PyTorch model with TorchServe in an Amazon SageMaker endpoint using a customized Docker image, running on an ml.g4dn.xlarge instance. For more details about Amazon EC2 G4 instances, please refer to the AWS documentation.

Getting Started with Amazon SageMaker Endpoint

There are 4 steps to set up a SageMaker endpoint with TorchServe:

  1. Build a customized Docker image and push it to Amazon Elastic Container Registry (ECR). The Dockerfile is provided at the root of this repository and sets up the FastAI and TorchServe dependencies.
  2. Compress the *.mar archive into *.tar.gz and upload it to Amazon Simple Storage Service (S3).
  3. Create a SageMaker model using the Docker image from step 1 and the compressed model archive from step 2.
  4. Create the SageMaker endpoint using the model from step 3.

The details of these steps are described in notebook/04_SageMaker.ipynb [link]. Once ready, we can invoke the SageMaker endpoint with an image in real time.
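
For illustration, steps 2 to 4 can be sketched in Python roughly as follows. This is a hedged outline rather than the notebook's exact code: the bucket, role ARN, ECR image URI, and resource names are placeholders you would replace with your own.

import tarfile

import boto3
import sagemaker

# Step 2: compress the TorchServe archive and upload it to S3
with tarfile.open("fastunet.tar.gz", "w:gz") as tar:
    tar.add("model_store/fastunet.mar", arcname="fastunet.mar")

sess = sagemaker.Session()
model_data = sess.upload_data("fastunet.tar.gz", bucket=sess.default_bucket(), key_prefix="torchserve")

# Steps 3 and 4: create the SageMaker model, endpoint configuration, and endpoint
sm = boto3.client("sagemaker")
role = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"        # placeholder
image_uri = "<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:latest"   # placeholder ECR image from step 1

sm.create_model(
    ModelName="fastunet",
    ExecutionRoleArn=role,
    PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data},
)
sm.create_endpoint_config(
    EndpointConfigName="fastunet-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "fastunet",
        "InstanceType": "ml.g4dn.xlarge",
        "InitialInstanceCount": 1,
    }],
)
sm.create_endpoint(EndpointName="fastunet", EndpointConfigName="fastunet-config")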

Real-time Inference with Python SDK

Read a sample image.

file_name = "street_view_of_a_small_neighborhood.png"

with open(file_name, 'rb') as f:
    payload = f.read()

Invoke the SageMaker endpoint with the image and obtain the response from the API.

import boto3
import json

# endpoint_name is the name chosen when the SageMaker endpoint was created
client = boto3.client("runtime.sagemaker")
response = client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="application/x-image", Body=payload
)
response = json.loads(response["Body"].read())

Decode the response and visualize the predicted Scene Segmentation mask.

import base64

import matplotlib.pyplot as plt
import numpy as np

pred_decoded_byte = base64.decodebytes(bytes(response["base64_prediction"], encoding="utf-8"))
pred_decoded = np.reshape(
    np.frombuffer(pred_decoded_byte, dtype=np.uint8), (96, 128)
)
plt.imshow(pred_decoded)
plt.axis("off")
plt.show()

sample_prediction_response

What's Next

With an inference endpoint up and running, one could leverage its full power by exploring other features that are important for a machine learning product, including auto-scaling, model monitoring with Human-in-the-Loop (HITL) using Amazon Augmented AI (A2I), and incremental modeling iteration.
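
As a concrete example of the first item, an endpoint's production variant can be registered with the Application Auto Scaling service. The sketch below uses placeholder endpoint and variant names, and the target value is only illustrative; consult the SageMaker auto-scaling documentation for the policy that fits your workload.

import boto3

autoscaling = boto3.client("application-autoscaling")

# the endpoint production variant is the scalable resource (names are placeholders)
resource_id = "endpoint/fastunet/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="fastunet-invocations-target",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # illustrative invocations-per-instance target
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)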

Clean Up

Make sure that you delete the following resources to prevent any additional charges; a minimal boto3 sketch of the SageMaker deletions follows the list:

  1. Amazon SageMaker endpoint.
  2. Amazon SageMaker endpoint configuration.
  3. Amazon SageMaker model.
  4. Amazon Elastic Container Registry (ECR).
  5. Amazon Simple Storage Service (S3) Buckets.
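
The SageMaker resources (items 1 to 3) can be deleted with boto3 as sketched below; the names are placeholders and must match whatever you used when creating the resources. The ECR repository and S3 objects are removed separately through their own services or the console.

import boto3

sm = boto3.client("sagemaker")

# names are placeholders; use the names chosen at creation time
sm.delete_endpoint(EndpointName="fastunet")
sm.delete_endpoint_config(EndpointConfigName="fastunet-config")
sm.delete_model(ModelName="fastunet")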

Conclusion

This repository presented an end-to-end demonstration of deploying a FastAI-trained PyTorch model with TorchServe in eager mode and hosting it in an Amazon SageMaker endpoint. You can use this repository as a template to deploy your own FastAI models. This approach eliminates the self-maintenance effort of building and managing a customized inference server, which helps you speed up the process of taking a cutting-edge deep learning model from training to real-world application at scale.

If you have questions, please create an issue or submit a pull request on the GitHub repository.


amazon-sagemaker-endpoint-deployment-of-fastai-model-with-torchserve's People

Contributors

amazon-auto, sunbc0120


amazon-sagemaker-endpoint-deployment-of-fastai-model-with-torchserve's Issues

Fix Typos

  • an Unet
  • a sample image
  • a SageMaker

ModelError Parameter model_name is required

Having successfully deployed the model and endpoint, at client.invoke_endpoint I receive the error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
  "code": 400,
  "type": "BadRequestException",
  "message": "Parameter model_name is required."
}

Appears to be pytorch/serve#631 with an update to the Dockerfile required (add captum)

http: command not found

The example request using http fails on a SageMaker notebook since http: command not found.

Proposed alternative:

curl -X POST http://127.0.0.1:8080/predictions/fastunet -T sample/street_view_of_a_small_neighborhood.png

Training notebook fails with newer fastai version

The notebook 01_U-net_Modelling.ipynb requires updating for fastai version 2.3.1, as it fails at the cell:

learn = unet_learner(dls, resnet50, metrics=acc_camvid)
learn.fine_tune(20)

with error:
TypeError: no implementation found for 'torch.Tensor.__getitem__' on types that implement __torch_function__: [TensorImage, TensorMask]

ModelError Parameter model_name is required

batch transform error:

2022-08-30T09:01:17.792:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=50, BatchStrategy=MULTI_RECORD
2022-08-30T09:01:17.883:[sagemaker logs]: st-s3/trainingPlatform/model/ba0ba70ebb2c48f69c61240a199f7a24/inference/dataset/21f9bb9e2a39407691bdb18e04e1b672/202208170843490.jpg: ClientError: 400
2022-08-30T09:01:17.883:[sagemaker logs]: st-s3/trainingPlatform/model/ba0ba70ebb2c48f69c61240a199f7a24/inference/dataset/21f9bb9e2a39407691bdb18e04e1b672/202208170843490.jpg:
2022-08-30T09:01:17.883:[sagemaker logs]: st-s3/trainingPlatform/model/ba0ba70ebb2c48f69c61240a199f7a24/inference/dataset/21f9bb9e2a39407691bdb18e04e1b672/202208170843490.jpg: Message:
2022-08-30T09:01:17.883:[sagemaker logs]: st-s3/trainingPlatform/model/ba0ba70ebb2c48f69c61240a199f7a24/inference/dataset/21f9bb9e2a39407691bdb18e04e1b672/202208170843490.jpg: {
2022-08-30T09:01:17.883:[sagemaker logs]: st-s3/trainingPlatform/model/ba0ba70ebb2c48f69c61240a199f7a24/inference/dataset/21f9bb9e2a39407691bdb18e04e1b672/202208170843490.jpg: "code": 400,
2022-08-30T09:01:17.884:[sagemaker logs]: st-s3/trainingPlatform/model/ba0ba70ebb2c48f69c61240a199f7a24/inference/dataset/21f9bb9e2a39407691bdb18e04e1b672/202208170843490.jpg: "type": "BadRequestException",
2022-08-30T09:01:17.884:[sagemaker logs]: st-s3/trainingPlatform/model/ba0ba70ebb2c48f69c61240a199f7a24/inference/dataset/21f9bb9e2a39407691bdb18e04e1b672/202208170843490.jpg: "message": "Parameter model_name is required."
2022-08-30T09:01:17.884:[sagemaker logs]: st-s3/trainingPlatform/model/ba0ba70ebb2c48f69c61240a199f7a24/inference/dataset/21f9bb9e2a39407691bdb18e04e1b672/202208170843490.jpg: }

My Dockerfile is:

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04

# NCCL_VERSION=2.4.7, CUDNN_VERSION=7.6.2.24
LABEL maintainer="Amazon AI"
LABEL dlc_major_version="1"
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

# Add arguments to achieve the version, python and url
ARG PYTHON=python3
ARG PYTHON_VERSION=3.7.3
ARG OPEN_MPI_VERSION=4.0.1
ARG TS_VERSION="0.3.1"
ARG PT_INFERENCE_URL=https://aws-pytorch-binaries.s3-us-west-2.amazonaws.com/r1.6.0_inference/20200727-223446/b0251e7e070e57f34ee08ac59ab4710081b41918/gpu/torch-1.6.0-cp36-cp36m-manylinux1_x86_64.whl
ARG PT_VISION_URL=https://torchvision-build.s3.amazonaws.com/1.6.0/gpu/torchvision-0.7.0-cp36-cp36m-linux_x86_64.whl

# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
ENV LD_LIBRARY_PATH /opt/conda/lib/:$LD_LIBRARY_PATH
ENV PATH /opt/conda/bin:$PATH
ENV SAGEMAKER_SERVING_MODULE sagemaker_pytorch_serving_container.serving:main
ENV TEMP=/home/model-server/tmp

RUN apt-get update \
 && apt-get install -y --no-install-recommends software-properties-common \
 && add-apt-repository ppa:openjdk-r/ppa \
 && apt-get update \
 && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
    build-essential \
    ca-certificates \
    cmake \
    curl \
    emacs \
    git \
    jq \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libgomp1 \
    libibverbs-dev \
    libnuma1 \
    libnuma-dev \
    libsm6 \
    libxext6 \
    libxrender-dev \
    openjdk-11-jdk \
    vim \
    wget \
    unzip \
    zlib1g-dev

# docker-library/openjdk#261 https://github.com/docker-library/openjdk/pull/263/files
RUN keytool -importkeystore -srckeystore /etc/ssl/certs/java/cacerts -destkeystore /etc/ssl/certs/java/cacerts.jks -deststoretype JKS -srcstorepass changeit -deststorepass changeit -noprompt; \
    mv /etc/ssl/certs/java/cacerts.jks /etc/ssl/certs/java/cacerts; \
    /var/lib/dpkg/info/ca-certificates-java.postinst configure;

RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
 && gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
 && cd openmpi-$OPEN_MPI_VERSION \
 && ./configure --prefix=/home/.openmpi \
 && make all install \
 && cd .. \
 && rm openmpi-$OPEN_MPI_VERSION.tar.gz \
 && rm -rf openmpi-$OPEN_MPI_VERSION

ENV PATH="$PATH:/home/.openmpi/bin"
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/.openmpi/lib/"

# Install OpenSSH. Allow OpenSSH to talk to containers without asking for confirmation
RUN apt-get install -y --no-install-recommends \
    openssh-client \
    openssh-server \
 && mkdir -p /var/run/sshd \
 && cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new \
 && echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new \
 && mv /etc/ssh/ssh_config.new /etc/ssh/ssh_configs

RUN curl -L -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh \
 && chmod +x ~/miniconda.sh \
 && ~/miniconda.sh -b -p /opt/conda \
 && rm ~/miniconda.sh
RUN /opt/conda/bin/conda update conda \
 && /opt/conda/bin/conda install -c conda-forge \
    python=$PYTHON_VERSION \
 && /opt/conda/bin/conda install -y \
    cython==0.29.12 \
    ipython==7.7.0 \
    mkl-include==2019.4 \
    mkl==2019.4 \
    numpy==1.19.1 \
    scipy==1.3.0 \
    typing==3.6.4 \
 && /opt/conda/bin/conda clean -ya

RUN conda install -c \
    pytorch magma-cuda101 \
 && conda install -c \
    conda-forge \
    opencv==4.0.1 \
 && conda install -y \
    scikit-learn==0.21.2 \
    pandas==0.25.0 \
    h5py==2.9.0 \
    requests==2.22.0 \
 && conda clean -ya \
 && /opt/conda/bin/conda config --set ssl_verify False \
 && pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \
 && ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
 && pip install packaging==20.4 \
    enum-compat==0.0.3 \
    ruamel-yaml

# Uninstall and re-install torch and torchvision from the PyTorch website
RUN pip install --no-cache-dir -U https://pypi.tuna.tsinghua.edu.cn/packages/5d/5e/35140615fc1f925023f489e71086a9ecc188053d263d3594237281284d82/torch-1.6.0-cp37-cp37m-manylinux1_x86_64.whl#sha256=87d65c01d1b70bb46070824f28bfd93c86d3c5c56b90cbbe836a3f2491d91c76
RUN pip uninstall -y torchvision \
 && pip install --no-deps --no-cache-dir -U https://mirrors.aliyun.com/pypi/packages/4d/b5/60d5eb61f1880707a5749fea43e0ec76f27dfe69391cdec953ab5da5e676/torchvision-0.7.0-cp37-cp37m-manylinux1_x86_64.whl#sha256=0d1a5adfef4387659c7a0af3b72e16caa0c67224a422050ab65184d13ac9fb13

RUN pip uninstall -y model-archiver multi-model-server \
 && pip install captum \
 && pip install torchserve==$TS_VERSION \
 && pip install torch-model-archiver==$TS_VERSION

RUN useradd -m model-server \
 && mkdir -p /home/model-server/tmp /opt/ml/model \
 && chown -R model-server /home/model-server /opt/ml/model

COPY torchserve-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY config.properties /home/model-server

RUN chmod +x /usr/local/bin/dockerd-entrypoint.py

ADD https://raw.githubusercontent.com/aws/deep-learning-containers/master/src/deep_learning_container.py /usr/local/bin/deep_learning_container.py

RUN chmod +x /usr/local/bin/deep_learning_container.py

RUN pip install --no-cache-dir "sagemaker-pytorch-inference>=2"

RUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.6.0/license.txt -o /license.txt

RUN conda install -y -c conda-forge "pyyaml>5.4,<5.5"
RUN pip install pillow==8.2.0 "awscli<2"

RUN python3 -m pip install detectron2==0.4 -f \
    https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html

RUN HOME_DIR=/root \
 && curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \
 && unzip {HOME_DIR}/ \
 && cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance \
 && chmod +x /usr/local/bin/testOSSCompliance \
 && chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh \
 && {HOME_DIR} ${PYTHON} \
 && rm -rf ${HOME_DIR}/oss_compliance*

EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["torchserve", "--start", "--ts-config", "/home/model-server/config.properties", "--model-store", "/home/model-server/"]

Fastai2 model deployment

I do not know why it is not finding the model:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/model.pth'

Thanks for any help.

Add Next Steps

Be explicit about what can be done after the endpoint is working; otherwise, what is the point of having an endpoint?

  • AutoScaling
  • Model monitoring and Human-In-The-Loop (HITL) with A2I
