sagemaker-studio-custom-image-samples's Introduction

SageMaker Studio Custom Image Samples

Overview

This repository contains examples of Docker images that are valid custom images for KernelGateway Apps in SageMaker Studio. These custom images enable you to bring your own packages, files, and kernels for use with notebooks, terminals, and interactive consoles within SageMaker Studio.

Examples

  • conda-env-kernel-image - This example creates a custom Conda environment in the Docker image and demonstrates using it as a custom kernel.
  • echo-kernel-image - This example uses the echo_kernel from Jupyter as a "Hello World" introduction to writing custom KernelGateway images.
  • jupyter-docker-stacks-julia-image - This example leverages the Data Science image from Jupyter Docker Stacks to add a Julia kernel.
  • python-poetry-image - This example uses Poetry to manage the package dependencies in Python.
  • r-image - This example contains the ir kernel and a selection of R packages, along with the AWS Python SDK (boto3) and the SageMaker Python SDK, which can be used from R through reticulate.
  • rapids-image - This example uses the official rapids.ai image from Docker Hub. Use it with a GPU instance on Studio.
  • scala-image - This example adds a Scala kernel based on Almond Scala Kernel.
  • tf2.3-image - This example uses the official TensorFlow 2.3 image from Docker Hub and demonstrates bundling custom files along with the image.

One-time setup

All examples share a one-time setup step, which creates an ECR repository:

REGION=<aws-region>
aws --region ${REGION} ecr create-repository \
    --repository-name smstudio-custom

Developing Custom Images

See DEVELOPMENT.md

License

This sample code is licensed under the MIT-0 License. See the LICENSE file.

sagemaker-studio-custom-image-samples's People

Contributors

amazon-auto, athewsey, eitansela, jaipreet-s, jpilorget, knaresh, lstilwell, moose-in-australia, w601sxs, zhanghan

sagemaker-studio-custom-image-samples's Issues

exec /opt/.sagemakerinternal/conda/kgw_variant: exec format error

I've been trying to use my own custom image in SageMaker Studio, and I always get the same error (as shown in SageMaker Studio) when trying to associate a notebook with the kernel:

Failed to start kernel
Failed to launch app [xxxxxxx-ml-t3-medium-176855f5e1df73edeeb33d0c81f0]. InternalFailure

And, the only log event seen in CloudWatch for the image is:

exec /opt/.sagemakerinternal/conda/kgw_variant: exec format error.

At first I thought it might be a bug in my container, so I have carefully checked all the steps described in DEVELOPMENT.md.

And finally, to rule out a bug in the platform, I tried the echo-kernel-image and python-poetry-image examples from the repository without modifications, and the same error occurred in both.

I have also tried the tf23-image example, and it works, so I have been comparing the images and my feeling is that it could be related to the base image (the operating system used in the container): echo-kernel-image and python-poetry-image are based on debian and tf23-image is based on ubuntu. Could that be the issue?

Could you please confirm that echo-kernel-image and python-poetry-image images are working (as is) in current SageMaker Studio version?
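
For context, "exec format error" is what Linux reports when it is asked to run a binary in a format it cannot execute, and a frequent cause in general is a CPU architecture mismatch (for example, an image built on an arm64 machine and run on an x86_64 host). Whether that applies to this report is an assumption; the thread does not confirm it. A minimal Python sketch for checking which architecture an image reports, meant to be run inside the container (for instance locally with docker run). The helper name normalize_arch and its alias table are illustrative, not part of any SageMaker tooling:

```python
import platform

# Hypothetical alias table: common spellings of the same CPU architecture.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Map a platform.machine() string to a canonical architecture name."""
    key = machine.strip().lower()
    # Unknown values are passed through unchanged (lowercased).
    return _ARCH_ALIASES.get(key, key)


# Report the architecture of the interpreter's host, e.g. inside the container.
print("machine:", platform.machine())
print("normalized:", normalize_arch(platform.machine()))
```

If the normalized value differs from the architecture of the Studio instance type, rebuilding the image for the matching platform would be the first thing to try.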

Why should I run custom images as sagemaker-user?

The examples in this repo and the image configuration in SageMaker Studio assume that custom images are run as user sagemaker-user with UID 1000 and GID 100.

However, at least some of SageMaker's own images (Data Science and PyTorch for example, have not tested all) run as root.

Running as something other than root makes installing into the kernel image at runtime difficult:

$ pip install some-package
Defaulting to user installation because normal site-packages is not writeable
...

and I end up with files in the user's home directory where they don't get cleaned up.

Why should I not just run as root like SageMaker's built-in images?
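
The pip message quoted above appears when the interpreter's site-packages directory is not writable by the current user, in which case pip falls back to the per-user site directory. A small sketch, using only the standard library, that shows both locations and whether the first is writable. The function name site_packages_report is illustrative, and os.getuid assumes a Unix-like container:

```python
import os
import site
import sysconfig


def site_packages_report() -> dict:
    """Report the main site-packages path, its writability, and the user site."""
    # "purelib" is the interpreter's main site-packages directory.
    purelib = sysconfig.get_paths()["purelib"]
    return {
        "site_packages": purelib,
        "writable": os.access(purelib, os.W_OK),
        # Where pip installs when it falls back to a user installation.
        "user_site": site.getusersitepackages(),
        "uid": os.getuid(),
    }


for key, value in site_packages_report().items():
    print(f"{key}: {value}")
```

Run as sagemaker-user, this would be expected to show writable as False for a root-owned site-packages, which matches the fallback described in the question.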

ID issue

Any idea about this issue?

An error occurred (ResourceInUse) when calling the CreateImage operation: The Image with ImageName conda-env-kernel-test already exists. Please use a different ImageName.

An error occurred (ValidationException) when calling the CreateImageVersion operation: Image conda-env-kernel-test is in an invalid state.

An error occurred (ResourceNotFound) when calling the DescribeImageVersion operation: Latest ImageVersion with ImageName conda-env-kernel-test does not exist. Please provide a valid ImageName.

"""An error occurred (ResourceInUse) when calling the CreateAppImageConfig operation: The ID or Name specified is already in use."""

conda-env-kernel-image example is broken

After following the steps listed here exactly, I began a SageMaker Studio session. After selecting the custom image and starting a console, I received the following error:

Invalid response: 404 Not Found
Kernel with name [myenv] does not exist in image [arn:aws:sagemaker:REGION:ACCOUNT_ID:image/conda-test-kernel] on the KernelGateway App [conda-test-kernel-ml-t3-medium-HASH]. To make the kernel available, either update your AppImageConfig to have same kernel name as available in the image or update your SageMaker Image to have the kernel with the same name as specified in AppImageConfig. You can use https://github.com/aws-samples/sagemaker-studio-custom-image-samples/blob/main/DEVELOPMENT.md#local-testing for testing your image locally.

The Dockerfile and environment.yml are identical to the example. Here is the app-image-config-input.json file:

{
    "AppImageConfigName": "myenv-config",
    "KernelGatewayImageConfig": {
        "KernelSpecs": [
            {
                "Name": "myenv",
                "DisplayName": "Python [conda env: myenv]"
            }
        ],
        "FileSystemConfig": {
            "MountPath": "/home/sagemaker-user",
            "DefaultUid": 0,
            "DefaultGid": 0
        }
    }
}

And here is the anonymized create-domain-input.json contents:

{
    "DomainId": "d-xxxxxxxxx",
    "DefaultUserSettings": {
        "ExecutionRole": "ROLE_ARN",
        "KernelGatewayAppSettings": {
            "CustomImages": [
                {
                    "ImageName": "conda-test-kernel",
                    "AppImageConfigName": "myenv-config"
                }
            ]
        }
    }
}

I used IMAGE_NAME=conda-test-kernel throughout. Other things to note:

  • aws sagemaker describe-image-version shows "ImageVersionStatus": "CREATED"
  • aws sagemaker describe-app-image-config gives back all the expected information

I believe the issue is that conda doesn't automatically follow the kernelspec. This quirk needs to be covered in the README for this example. Unfortunately I haven't figured out the solution yet. Any help is appreciated.
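
The 404 message says the kernel name in the AppImageConfig must match a kernel that exists in the image. A rough sketch of that comparison, assuming kernelspecs live as kernel.json files in per-kernel subdirectories of a kernels directory. This only approximates Jupyter's own discovery (it does not search every Jupyter path or account for nb_conda_kernels naming), and the function names are hypothetical:

```python
from pathlib import Path


def kernel_names(kernels_dir: str) -> set:
    """Return names of subdirectories that contain a kernel.json file."""
    root = Path(kernels_dir)
    if not root.is_dir():
        return set()
    return {p.name for p in root.iterdir() if (p / "kernel.json").is_file()}


def missing_kernels(app_image_config: dict, kernels_dir: str) -> list:
    """List KernelSpecs names from the config with no matching kernelspec directory."""
    specs = app_image_config["KernelGatewayImageConfig"]["KernelSpecs"]
    available = kernel_names(kernels_dir)
    return [spec["Name"] for spec in specs if spec["Name"] not in available]
```

Inside the image, one could load app-image-config-input.json with json.load and call missing_kernels(config, "/usr/local/share/jupyter/kernels"), where the directory is an assumed location; an empty list would mean every configured name has a kernelspec directory there.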

Python virtual environment not used by Studio, completely different Python injected instead

Somewhat similar to #9

I have a Docker image with a virtual environment and dependencies installed by pip:

# Dockerfile
RUN python -m venv /opt/venv \
    && . /opt/venv/bin/activate \
    && pip install --upgrade pip wheel \
    && pip install --requirement requirements.txt \
    && pip cache purge \
    && python -m ipykernel install

The image has a kernelspec in the right place:

# in the image terminal
$ cat /usr/local/share/jupyter/kernels/python3/kernel.json
{
 "argv": [
  "/opt/venv/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python 3 (ipykernel)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

However, in Studio the virtual environment is not used; instead I get:

# in Studio with image kernel running
import sys
sys.executable
# '/opt/.sagemakerinternal/conda/bin/python'

which is some Python which isn't even in the image:

# in image terminal
ls -lh /opt/.sagemakerinternal/conda/bin/python3
# lrwxrwxrwx 1 nobody nogroup 9 Oct 12 17:38 /opt/.sagemakerinternal/conda/bin/python3 -> python3.7
ls -lh /opt/.sagemakerinternal/conda/bin/python3.7
# -rwxr-xr-x 1 nobody nogroup 13M Oct 12 17:38 /opt/.sagemakerinternal/conda/bin/python3.7

I was expecting Studio to look through the usual kernelspec locations and find the one matching the config for the attached image (which is set to a kernel called "python3").

What do I have to do for Studio to use the virtual environment as a kernel?
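
To make the comparison above easier to repeat, here is a small diagnostic sketch that can be pasted into a notebook cell or run in the image terminal. It uses only the sys module; the function name describe_interpreter is illustrative. For a kernel actually started from the venv created by the Dockerfile above, prefix would be expected to be /opt/venv and in_venv to be True:

```python
import sys


def describe_interpreter() -> dict:
    """Summarize which Python the current process is running under."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # In a venv, sys.prefix points at the venv and differs from base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
    }


print(describe_interpreter())
```

Comparing the output from the notebook kernel with the output from the image terminal shows directly whether the two sessions use the same interpreter.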

Getting error exec /opt/.sagemakerinternal/conda/kgw_variant: exec format error

Hi All,

I'm using https://github.com/aws-samples/sagemaker-studio-custom-image-samples/tree/main/examples/python-poetry-image to bring the custom image to SageMaker Studio.

However, when I start the notebook with this custom image using app type KernelGateway, it takes a long time to load, and in my CloudWatch log group I see exec /opt/.sagemakerinternal/conda/kgw_variant: exec format error.

Does anyone know why this error occurs, or how I can fix it?

(Feature request) Example for importing conda env spec as Python kernel

Thanks for the samples!

The examples of different languages are useful, but I think it would also be useful to have a template more optimized towards governance of Python environments:

Could we have an example where the user can provide a Conda environment spec YAML alongside the Dockerfile, and the container build process would create the kernel environment from that?

I guess something along the lines of RUN conda env create -f environment.yml and then just checking SageMaker used that created conda env would work - but haven't had a chance to experiment yet.

[nb_conda_kernels] enabled, 0 kernels found After registering R image sample to Sagemaker Studio

After building the R image sample, pushing it to ECR, attaching it to SageMaker Studio, and trying to run the kernel, I get a Kernel Dead error with these logs:

2020-11-26T13:03:33.471-03:00 | + CONDA_DIR=/opt/.sagemakerinternal/conda
2020-11-26T13:03:33.471-03:00 | + CONDA_ENV_FILTER=/opt/conda$
2020-11-26T13:03:33.471-03:00 | + command -v python
2020-11-26T13:03:33.471-03:00 | + [ 0 -eq 0 ]
2020-11-26T13:03:33.471-03:00 | + python -c from __future__ import print_function;import sys; print(sys.prefix)
2020-11-26T13:03:33.471-03:00 | + SYSTEM_PYTHON_PREFIX=/opt/conda
2020-11-26T13:03:33.471-03:00 | + export JUPYTER_PATH=/opt/conda/share/jupyter/
2020-11-26T13:03:33.471-03:00 | + [ ! -f /opt/conda/share/jupyter/kernels/python3/kernel.json ]
2020-11-26T13:03:33.471-03:00 | + [ ! -z Custom ]
2020-11-26T13:03:33.471-03:00 | + [ Custom = DLC ]
2020-11-26T13:03:33.471-03:00 | + [ Custom = Studio ]
2020-11-26T13:03:33.471-03:00 | + echo This is not a DLC/Studio image (found Custom), so not adding Python3 kernel.
2020-11-26T13:03:33.471-03:00 | + export PATH=/opt/conda/bin:/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tmp/miniconda3/condabin:/tmp/anaconda3/condabin:/tmp/miniconda2/condabin:/tmp/anaconda2/condabin
2020-11-26T13:03:33.471-03:00 | + /opt/.sagemakerinternal/conda/bin/jupyter-kernelgateway --ip 0.0.0.0 --port 8888 --JupyterWebsocketPersonality.list_kernels=True --KernelSpecManager.ensure_native_kernel=False --MultiKernelManager.default_kernel_name= --KernelGatewayApp.kernel_spec_manager_class=nb_conda_kernels.CondaKernelSpecManager --CondaKernelSpecManager.env_filter=/opt/conda$
2020-11-26T13:03:33.471-03:00 | This is not a DLC/Studio image (found Custom), so not adding Python3 kernel.
2020-11-26T13:03:33.471-03:00 | [KernelGatewayApp] [nb_conda_kernels] enabled, 0 kernels found
2020-11-26T13:03:38.782-03:00 | [KernelGatewayApp] Jupyter Kernel Gateway at http://0.0.0.0:8888
2020-11-26T13:03:48.730-03:00 | [E 201126 16:03:48 web:1792] Uncaught exception POST /api/kernels (169.255.250.1)
    HTTPServerRequest(protocol='http', host='10.10.21.72:32774', method='POST', uri='/api/kernels', version='HTTP/1.1', remote_ip='169.255.250.1')
    Traceback (most recent call last):
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/tornado/web.py", line 1703, in _execute
        result = await result
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info) # type: ignore
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/kernel_gateway/services/kernels/handlers.py", line 63, in post
        yield super(MainKernelHandler, self).post()
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info) # type: ignore
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/notebook/services/kernels/handlers.py", line 46, in post
        kernel_id = yield maybe_future(km.start_kernel(kernel_name=model['name']))
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info) # type: ignore
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/kernel_gateway/services/kernels/manager.py", line 83, in start_kernel
        kernel_id = yield gen.maybe_future(super(SeedingMappingKernelManager, self).start_kernel(*args, **kwargs))
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
        yielded = next(result)
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/notebook/services/kernels/kernelmanager.py", line 168, in start_kernel
        super(MappingKernelManager, self).start_kernel(**kwargs)
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/jupyter_client/multikernelmanager.py", line 185, in start_kernel
        km.start_kernel(**kwargs)
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/jupyter_client/manager.py", line 309, in start_kernel
        kernel_cmd, kw = self.pre_start_kernel(**kw)
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/jupyter_client/manager.py", line 262, in pre_start_kernel
        kernel_cmd = self.format_kernel_cmd(extra_arguments=extra_arguments)
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/jupyter_client/manager.py", line 181, in format_kernel_cmd
        cmd = self.kernel_spec.argv + extra_arguments
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/jupyter_client/manager.py", line 87, in kernel_spec
        self._kernel_spec = self.kernel_spec_manager.get_kernel_spec(self.kernel_name)
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/nb_conda_kernels/manager.py", line 327, in get_kernel_spec
        res = super(CondaKernelSpecManager, self).get_kernel_spec(kernel_name)
      File "/opt/.sagemakerinternal/conda/lib/python3.7/site-packages/jupyter_client/kernelspec.py", line 235, in get_kernel_spec
        raise NoSuchKernel(kernel_name)
    jupyter_client.kernelspec.NoSuchKernel: No such kernel named hdx-r-kernel
2020-11-26T13:03:48.730-03:00 | [E 201126 16:03:48 log:48] { "Authorization": "token", "Connection": "close", "Host": "10.10.21.72:32774", "Content-Length": "117", "Content-Type": "application/x-www-form-urlencoded", "Accept-Encoding": "gzip" }
2020-11-26T13:03:53.471-03:00 | [E 201126 16:03:48 log:49] 500 POST /api/kernels (169.255.250.1) 294.59ms referer=None

Am I doing something wrong?

r-image example docker build failed

=> ERROR [5/6] RUN conda install --quiet --yes 'r-base=4.0.0' 'r-caret=6.' 'r-crayon=1.3' 'r-devtools=2.3*' 'r-forecast= 83.8s

[5/6] RUN conda install --quiet --yes 'r-base=4.0.0' 'r-caret=6.' 'r-crayon=1.3' 'r-devtools=2.3*' 'r-forecast=8.12*' 'r-hexbin=1.28*' 'r-htmltools=0.4*' 'r-htmlwidgets=1.5*' 'r-irkernel=1.1*' 'r-rmarkdown=2.2*' 'r-rodbc=1.3*' 'r-rsqlite=2.2*' 'r-shiny=1.4*' 'r-tidyverse=1.3*' 'unixodbc=2.3.' 'r-tidymodels=0.1' 'r-reticulate=1.*' && pip install --quiet --no-cache-dir 'boto3>1.0<2.0' 'sagemaker>2.0<3.0' && conda clean --all -f -y:
#9 1.425 Collecting package metadata (current_repodata.json): ...working... done
#9 11.72 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#9 11.73 Collecting package metadata (repodata.json): ...working... done
#9 57.83 Solving environment: ...working... Killed


executor failed running [/bin/sh -c conda install --quiet --yes 'r-base=4.0.0' 'r-caret=6.' 'r-crayon=1.3' 'r-devtools=2.3*' 'r-forecast=8.12*' 'r-hexbin=1.28*' 'r-htmltools=0.4*' 'r-htmlwidgets=1.5*' 'r-irkernel=1.1*' 'r-rmarkdown=2.2*' 'r-rodbc=1.3*' 'r-rsqlite=2.2*' 'r-shiny=1.4*' 'r-tidyverse=1.3*' 'unixodbc=2.3.' 'r-tidymodels=0.1' 'r-reticulate=1.*' && pip install --quiet --no-cache-dir 'boto3>1.0<2.0' 'sagemaker>2.0<3.0' && conda clean --all -f -y]: exit code: 137

As stated, image build failed. Here's the log of docker build step.
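
A note on reading the log: by shell convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 137 corresponds to signal 9, which fits the "Killed" line printed during the conda solve. A process killed this way during a memory-heavy step has often run out of memory (for example, hitting the memory limit of the Docker build environment), though the log alone does not confirm that. A short sketch of the decoding, with an illustrative function name:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit code, decoding 128+N as signal N."""
    if code > 128:
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # Not a known signal number on this platform.
    return f"exited with status {code}"


print(describe_exit_code(137))  # prints "terminated by SIGKILL" on Linux
```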

Docker image critical and high vulnerabilities found after scan in ECR, R kernel dies in SageMaker Studio

Vulnerability results after scan in ECR:

CVE-2019-19816 linux:4.19.152-1 CRITICAL In the Linux kernel 5.0.21, mounting a crafted btrfs filesystem image and performing some operations can cause slab-out-of-bounds write access in __btrfs_map_block in fs/btrfs/volumes.c, because a value of 1 for the number of data stripes is mishandled.
CVE-2019-19814 linux:4.19.152-1 CRITICAL In the Linux kernel 5.0.21, mounting a crafted f2fs filesystem image can cause __remove_dirty_segment slab-out-of-bounds write access because an array is bounded by the number of dirty types (8) but the array index can exceed this.
CVE-2020-27153 bluez:5.50-1.2~deb10u1 HIGH In BlueZ before 5.55, a double free was found in the gatttool disconnect_cb() routine from shared/att.c. A remote attacker could potentially cause a denial of service or code execution, during service discovery, due to a redundant disconnect MGMT event.
CVE-2020-0423 linux:4.19.152-1 HIGH In binder_release_work of binder.c, there is a possible use-after-free due to improper locking. This could lead to local escalation of privilege in the kernel with no additional execution privileges needed. User interaction is not needed for exploitation. Product: Android. Versions: Android kernel. Android ID: A-161151868. References: N/A

Custom images launched in SageMaker Studio use Python installation not present in uploaded image

Expected Behavior

  1. I attach a custom docker container to my SageMaker Studio session (example Dockerfile)
  2. I launch a SageMaker Studio notebook using a Jupyter Kernel from the image
  3. The kernel launches the system Python installation at /usr/local/bin referencing packages installed at /usr/local/lib/python3.8/site-packages

Observed behavior

The SageMaker Studio notebook launches a kernel session using a Python installation, found at /opt/.sagemakerinternal/conda/bin/python, that was not present in the Docker image uploaded to ECR.

This happens with Notebook and Console kernel sessions launched from the image.

Attempted debugging

I can confirm that the Notebook and Console sessions are launching the correct image because I can find my installed packages in /usr/local/lib/python3.8/site-packages from within the Notebook and the Console.

I can confirm this doesn't happen if I launch an Image Terminal from the Studio Session. When I do this, the Docker container is launched as expected and executing $ python points to the expected installation in /usr/local/bin

I think this has something to do with how Jupyter sessions are launched within SageMaker Studio. When I look at the logs on CloudWatch, I see this:

[Screenshot of CloudWatch logs: image (4)]

I have a few concerns:

  1. Why is SageMaker using conda here when my image doesn't have conda installed? Can I override this? I would have thought the main purpose of Custom images was to be able to use a different version of Python than standard SageMaker Images. Why override this when importing custom images?
  2. What does this warning mean: + echo This is not a DLC/Studio image (found Custom), so not adding Python3 kernel. This should be a SageMaker image; I made it by copying the echo_kernel example Dockerfile from this repo. SageMaker has no issues launching the Notebook session from the console.
