Lately I work mainly in SageMaker Studio, and I'd really like to be able to debug / in

[Feature] SageMaker job as Studio kernel about sagemaker-ssh-helper HOT 2 OPEN

aws-samples commented on June 13, 2024

[Feature] SageMaker job as Studio kernel

from sagemaker-ssh-helper.

Comments (2)

ivan-khvostishkov commented on June 13, 2024

Hi, Alex! It's definitely an interesting idea. I will do some research for the best route to take here.

In the meantime, there's already an API that will help you achieve the same results. You can run the following code in notebook cells:

proxy = ssh_wrapper.start_ssm_connection(11022)

proxy.run_command_with_output("ps xfa")

proxy.disconnect()

Let me know if it helps and I will update the documentation accordingly.
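
If a command fails between connecting and disconnecting, the tunnel from the snippet above could be left open. A minimal sketch of one way to guard against that, assuming only the three calls shown in this comment (start_ssm_connection, run_command_with_output, disconnect); the ssm_proxy helper name is hypothetical and is not part of sagemaker-ssh-helper:

```python
from contextlib import contextmanager


@contextmanager
def ssm_proxy(ssh_wrapper, local_port=11022):
    """Yield a connected proxy and always disconnect it afterwards.

    `ssh_wrapper` is assumed to be the same wrapper object used in the
    snippet above; this helper is illustrative, not a library API.
    """
    proxy = ssh_wrapper.start_ssm_connection(local_port)
    try:
        yield proxy
    finally:
        # Runs even if a command inside the `with` block raises.
        proxy.disconnect()


# Usage in a notebook cell (assumes `ssh_wrapper` already exists):
#
#     with ssm_proxy(ssh_wrapper) as proxy:
#         proxy.run_command_with_output("ps xfa")
```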

athewsey commented on June 13, 2024

Thanks Ivan, this is certainly helpful but I guess I'm hoping it's possible to do a bit more...

First, AFAICT the current implementation of run_command_with_output only returns the output of a long-running command after it completes, which can be frustrating or unusable for some tools. For example, compare the results of subprocess.check_output("echo hi && sleep 5 && echo bye", shell=True) and !echo hi && sleep 5 && echo bye in a notebook: the first shows nothing until the whole command finishes, while the second prints each line as it is produced. It would be nice to have a more interactive implementation here, although I know from my own attempts on related topics that it can get complicated quickly...
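
To make the difference concrete, here is a minimal local sketch of the streaming behaviour, using only the Python standard library. It runs a command on the local machine, not over the SSM tunnel, and stream_command is a hypothetical name; how sagemaker-ssh-helper would expose something similar is an open question.

```python
import subprocess


def stream_command(command, sink=print):
    """Run a shell command and pass each output line to `sink` as it arrives.

    Unlike subprocess.check_output, this does not wait for the command
    to finish before handing over the first line.
    """
    process = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    with process.stdout:
        for line in process.stdout:
            sink(line.rstrip("\n"))
    # Return the exit code once the output stream is exhausted.
    return process.wait()
```

Calling stream_command("echo hi && sleep 5 && echo bye") in a notebook cell should show "hi" right away and "bye" about five seconds later, provided the command itself does not buffer its output.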

Second, I guess it's more a question of the intended overall workflow for drafting training (/processing/etc) script bundles in Studio JupyterLab and how this tool would fit in.

I'm thinking of SSH Helper mainly as a workaround for lack of local mode in SMStudio & some limitations of warm pools: Looking for a way to iterate quickly on the scripts in JupyterLab UI and try running them in training/processing job context, finding and fixing basic functional/syntax errors without having to wait for infrastructure spin-up. Features that seem helpful to me in this context include:

1. (Full interactive debugging if it's possible, but I think that's a stretch)
2. Notebook-like Python REPL in the context of the job, with visibility of the data and the uploaded code dir. (This is useful because the Studio kernels don't always match available framework container images today, and so we can see the pre-loaded data channels & folder structure)
   - Diagnostic shell commands like ps/top matter less to me for this kind of functional debugging
   - Shell commands like pwd/ls are of course still useful, but mainly for helping us understand/check the folder structure for our Python scripts
3. An "easy button" to replace the job's source code dir with an updated one I've drafted in Studio JupyterLab
4. An "easy button" to launch (updated) job source code dir and entrypoint the same way the framework/toolkit would: Without having to know about e.g. the CLI parameters or environment variables that get created (but maybe being able to override them if needed?)
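
As a rough illustration of what the launch "easy button" would have to reproduce, the sketch below builds a command line and a few environment variables loosely following the SageMaker script-mode conventions (hyperparameters passed as --name value arguments, SM_MODEL_DIR and SM_CHANNEL_<NAME> variables). Both function names are hypothetical, the paths are the usual defaults rather than values read from a real job, and the actual toolkits set many more variables than shown here.

```python
import shlex
import sys


def build_launch_command(entry_point, hyperparameters, python=sys.executable):
    """Return a shell command that passes hyperparameters as CLI arguments."""
    args = [python, entry_point]
    for name, value in hyperparameters.items():
        args += [f"--{name}", str(value)]
    # shlex.join quotes each argument so the result is safe to paste in a shell.
    return shlex.join(args)


def build_environment(channels, model_dir="/opt/ml/model", overrides=None):
    """Return a small, illustrative subset of script-mode environment variables."""
    env = {"SM_MODEL_DIR": model_dir}
    for name in channels:
        env[f"SM_CHANNEL_{name.upper()}"] = f"/opt/ml/input/data/{name}"
    # Let the caller override any value, as suggested in the list above.
    env.update(overrides or {})
    return env
```

For example, build_launch_command("train.py", {"epochs": 3}, python="python") gives "python train.py --epochs 3", which could then be sent to the job over the existing connection.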

I raised this issue with (2) originally in mind, but I think magics could be used to provide (3) and (4) too. The main goal is to provide a super easy-to-use way (after the initial platform SSM/etc. setup is done, of course) for JupyterLab-minded scientists to iterate on their scripts until they functionally work, before quitting the interactive training/processing job and launching a "proper" non-interactive one that runs the training/processing in a known, reproducible way.

Appreciate that there are other use-cases for SSH Helper of course (like diagnosing processes/threads/etc in an actually ongoing job) - I'm just wondering if it has potential to deliver a purpose-built, friction-free script debugging experience from Studio.
