
Comments (10)

jlewitt1 avatar jlewitt1 commented on May 21, 2024

Thanks for bringing this up! This is something we should definitely support. Would adding a separate parameter for the proxy command work for your use case?

0dc953e

from runhouse.

dongreenberg avatar dongreenberg commented on May 21, 2024

We actually already support this, but haven't explicitly documented it. Can you try adding "ssh_proxy_command": "{proxy string}" to the ssh_creds dictionary?

One caveat - if you're using folder objects (or blob or table, which depend on folder), we haven't yet added this, but are actually in the process of significantly expanding our SSH flexibility, and likely will release it within the week.
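For anyone following along, here is a minimal sketch of what that dictionary might look like. Only the "ssh_proxy_command" key comes from this thread; the other key names ("ssh_user", "ssh_private_key"), the helper `build_ssh_creds`, and the placeholder proxy string are assumptions for illustration, so check them against your Runhouse version.

```python
def build_ssh_creds(user, key_path, proxy_command=None):
    """Assemble an ssh_creds dict (hypothetical helper, not part of Runhouse)."""
    # "ssh_user" and "ssh_private_key" are assumed key names.
    creds = {"ssh_user": user, "ssh_private_key": key_path}
    if proxy_command:
        # "ssh_proxy_command" is the key mentioned in this thread.
        creds["ssh_proxy_command"] = proxy_command
    return creds


# Placeholder values; the exact proxy string format Runhouse expects is not confirmed here.
creds = build_ssh_creds("ubuntu", "~/.ssh/id_rsa", "<proxy string>")

# With Runhouse installed, the dict would then be passed roughly like this:
# import runhouse as rh
# gpu = rh.cluster(name="myvm", ips=["<target host>"], ssh_creds=creds)
```

The cluster call is left commented out because it attempts a real SSH connection.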


gopitk avatar gopitk commented on May 21, 2024

Thanks for the tips, @jlewitt1 and @dongreenberg. I tried adding "ssh_proxy_command" to the ssh_creds dict. It seems to work (in that I see an SSH connection), but then it threw an exception when RH tried to check connectivity again; the exception is below. I don't see this when I have the ProxyCommand in my ~/.ssh/config. BTW, I have a couple of other SSH options (-o) in the config file that I had no way to pass in through the dict.

INFO | 2023-07-12 13:23:29,522 | Checking server myvm again.
---------------------------------------------------------------------------
BaseSSHTunnelForwarderError               Traceback (most recent call last)
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster.py:357, in Cluster.check_server(self, restart_server)
    356 try:
--> 357     self.connect_server_client()
    358     cluster_config = self.config_for_rns

File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster.py:324, in Cluster.connect_server_client(self, tunnel, force_reconnect)
    323 else:
--> 324     self._rpc_tunnel, connected_port = self.ssh_tunnel(
    325         HTTPClient.DEFAULT_PORT,
    326         remote_port=DEFAULT_SERVER_PORT,
    327         num_ports_to_try=5,
    328     )
    329 open_cluster_tunnels[self.address] = (
    330     self._rpc_tunnel,
    331     connected_port,
    332     tunnel_refcount + 1,
    333 )

AttributeError                            Traceback (most recent call last)

gpu = rh.cluster(....)
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster_factory.py:59, in cluster(name, ips, ssh_creds, dryrun, **kwargs)
     50 if {"instance_type", "num_instances", "provider"} <= kwargs.keys():
     51     # Commenting out for now. If two creation paths creates confusion let's push people to use
     52     # ondemand_cluster() instead.
   (...)
     55     #     "If you would like to create an on-demand cluster, please use `rh.ondemand_cluster()` instead."
     56     # )
     57     return ondemand_cluster(name=name, **kwargs)
---> 59 return Cluster(ips=ips, ssh_creds=ssh_creds, name=name, dryrun=dryrun)

File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster.py:58, in Cluster.__init__(self, name, ips, ssh_creds, dryrun, **kwargs)
     55 self.client = None
     57 if not dryrun and self.address:
---> 58     self.check_server()
     59     # OnDemandCluster will start ray itself, but will also set address later, so won't reach here.
     60     self.start_ray()

File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster.py:379, in Cluster.check_server(self, restart_server)
    377     self.restart_server(resync_rh=False)
    378     logger.info(f"Checking server {self.name} again.")
--> 379     self.client.check_server(cluster_config=cluster_config)
    380 else:
    381     raise ValueError(f"Could not connect to cluster <{self.name}>")

AttributeError: 'NoneType' object has no attribute 'check_server'


dongreenberg avatar dongreenberg commented on May 21, 2024

Oh great point, we aren't passing the proxy into the tunnel. I can patch that and the options up shortly. Out of curiosity, you said this all works (through to running the remote function) when you've provided the options and proxy info in your SSH config?


gopitk avatar gopitk commented on May 21, 2024

Actually, the remote function did not work with my ~/.ssh/config either. It hung for a long time. What did work was the function setup (such as the pip installs) and cluster.run_python.


dongreenberg avatar dongreenberg commented on May 21, 2024

Got it. I've been banging on this, and the tunneling library we use (ironically, to handle different credentials scenarios) doesn't support proxies nicely: it looks like it does, but I spent hours debugging and it still wouldn't proxy correctly despite working from the command line. I've implemented a fix (#85) that goes directly through the command line to remove that discrepancy, but I want to test it a bit further before releasing because it's a core execution path. Adding more SSH options is straightforward and I'll push that shortly too. If you're blocked and would like to try it as it stands, please feel free:
pip install git+https://github.com/run-house/runhouse.git@proxy_tunneling


gopitk avatar gopitk commented on May 21, 2024

Thanks, @dongreenberg, for the quick fix. I think there are still some issues after switching to the proxy_tunneling branch. The good news is that the e2e run (the Stable Diffusion tutorial) works fine, remote functions included, when I use ~/.ssh/config to specify my ProxyCommand.

However, when I use the ssh_proxy_command dict item to pass that info, I get an error in how the ssh/bash command is constructed. It seems to be looking for ssh in my current directory.

/bin/bash: /home/user/runhouse/tutorials/t01_Stable_Diffusion/ssh -i ~/.ssh/id_rsa -W <host>:<port> -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null <proxyhost>: No such file or directory

Then I see the remote pkill and the start of the runhouse HTTP server go through fine. This is followed by a connection refused error.

INFO | 2023-07-13 13:11:47,807 | Checking server myvm again.

ConnectionRefusedError                    Traceback (most recent call last)
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/urllib3/connection.py:174, in HTTPConnection._new_conn(self)
    173 try:
--> 174     conn = connection.create_connection(
    175         (self._dns_host, self.port), self.timeout, **extra_kw
    176     )
    178 except SocketTimeout:


dongreenberg avatar dongreenberg commented on May 21, 2024

Oh, that is interesting. Glad to hear it works with the .ssh/config, and I appreciate you helping us through the dict case too. I think I spotted the error and just pushed a fix to the branch. It runs through on my side, but I've only set up a phony jumpbox to test it, so it's really helpful that you've tried it on yours.


gopitk avatar gopitk commented on May 21, 2024

My environment is a bit custom (I'm not sure how common it is). The target hostname (which I pass in the ips field) is somewhat dynamic: it follows a pattern that I specify in .ssh/config, won't resolve to any known IP address locally on my client machine, and is only meaningful to the proxy host, which can resolve these dynamic target hostnames and route them to my target server. As a result, if I don't use ~/.ssh/config and let Runhouse use the ssh_proxy_command, my proxy host seems unable to resolve the target host passed in the -W option of the proxy command and returns "Could not resolve IP address for : Name or service not known".

The SSH client I run from the command line passes the dynamic hostname to the proxy command (as I have -W %h:%p in it), and there I don't see my proxy failing to resolve the target.
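To make the %h:%p behavior concrete, here is a rough sketch of the token substitution the OpenSSH client performs on a ProxyCommand before running it: %h becomes the target host name, %p the port, and %% a literal percent sign. The function name `expand_proxy_command` and the host names are made up for illustration; OpenSSH supports further tokens, which this sketch simply leaves untouched.

```python
def expand_proxy_command(template: str, host: str, port: int) -> str:
    """Expand %h, %p and %% in a ProxyCommand template (illustrative only)."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "h":
                out.append(host)
            elif nxt == "p":
                out.append(str(port))
            elif nxt == "%":
                out.append("%")
            else:
                # Other tokens are passed through unchanged in this sketch.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical dynamic target name and jump host.
expanded = expand_proxy_command("ssh -W %h:%p jumpbox", "dyn-host-123", 22)
print(expanded)
```

If a tool fills in the -W argument itself, comparing its string against this expansion is one way to check whether the proxy host receives the same target name in both setups.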

For now, I can use the ProxyCommand in ~/.ssh/config, which is working great for me with Runhouse; I was able to run several of the tutorials remotely. I am happy to add some logging to Runhouse locally to see how it constructs the full SSH commands, so I can diff my ssh/config setup against passing ssh_proxy_command in rh.cluster. Where can I find that code so I can debug it in my environment and isolate this further?


dongreenberg avatar dongreenberg commented on May 21, 2024

Good point, we should log the SSH commands, and I'm curious why they wouldn't be resolving the same way as through the command line. I have a gnarly commit in the works on a separate branch that I'll land shortly, and then add that logging and push to this branch. In general, would you say it's preferable for our ssh activity to run through the command line so it's consistent with whatever you know you can do directly, rather than use tools which depend on Python SSH tools (e.g. Paramiko, asyncssh)?
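As a rough idea of what command-line tunneling with logging could look like, here is a sketch that assembles an `ssh` port-forwarding command as an argument list and logs the shell-quoted form before it would be executed. This is not Runhouse's implementation; `build_tunnel_command`, its parameters, and the port numbers are hypothetical. The ssh flags used (-N, -L, -i, -o) are standard OpenSSH options.

```python
import logging
import shlex

logger = logging.getLogger("ssh_debug")


def build_tunnel_command(host, local_port, remote_port, user=None,
                         key_path=None, proxy_command=None, extra_options=None):
    """Build an ssh port-forward command as an argv list and log it."""
    # -N: no remote command; -L: forward local_port to remote_port on the target.
    cmd = ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}"]
    if key_path:
        cmd += ["-i", key_path]
    if proxy_command:
        # Kept as a single argv element so the proxy string is not split up.
        cmd += ["-o", f"ProxyCommand={proxy_command}"]
    for key, value in (extra_options or {}).items():
        cmd += ["-o", f"{key}={value}"]
    cmd.append(f"{user}@{host}" if user else host)
    # Log the exact command so it can be pasted into a terminal and compared.
    logger.info("SSH command: %s", shlex.join(cmd))
    return cmd


cmd = build_tunnel_command(
    "myvm", 8080, 8080,
    proxy_command="ssh -W %h:%p jumpbox",
    extra_options={"StrictHostKeyChecking": "no"},
)
print(shlex.join(cmd))
```

Passing the list to `subprocess` without `shell=True` avoids a second round of shell parsing, and the logged string can be run by hand to compare against what works from the terminal.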
