Thanks for bringing this up! This is something we should definitely support. Would adding a separate parameter for the proxy command work for your use case?
We actually already support this, but haven't explicitly documented it. Can you try adding `"ssh_proxy_command": "{proxy string}"` to the `ssh_creds` dictionary?
One caveat: if you're using `folder` objects (or `blob` or `table`, which depend on `folder`), we haven't added proxy support for those yet. We are, however, in the process of significantly expanding our SSH flexibility and will likely release it within the week.
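A minimal sketch of the suggestion above. Only the `ssh_proxy_command` key comes from this thread; the `ssh_user` and `ssh_private_key` keys are assumed from Runhouse's examples at the time, and `make_ssh_creds`, the user name, and `user@proxyhost` are hypothetical placeholders.

```python
from typing import Dict, Optional


def make_ssh_creds(user: str, private_key: str, proxy_command: Optional[str] = None) -> Dict[str, str]:
    """Build an ssh_creds dict (hypothetical helper; key names other than ssh_proxy_command are assumptions)."""
    creds = {"ssh_user": user, "ssh_private_key": private_key}
    if proxy_command:
        # Plain string, written as it would appear after "ProxyCommand"
        # in ~/.ssh/config, with no extra quotes around it.
        creds["ssh_proxy_command"] = proxy_command
    return creds


creds = make_ssh_creds(
    user="ubuntu",  # placeholder user
    private_key="~/.ssh/id_rsa",
    proxy_command="ssh -i ~/.ssh/id_rsa -W %h:%p user@proxyhost",  # placeholder proxy
)
print(creds)
```

The dict would then be passed as `rh.cluster(name="myvm", ips=[...], ssh_creds=creds)`, matching the `cluster(name, ips, ssh_creds, ...)` signature visible in the traceback below; that call is left out of the block so it runs without a cluster.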
Thanks for the tips, @jlewitt1 and @dongreenberg. I tried adding "ssh_proxy_command" to the ssh_creds dict. It seems to work (in the sense that I see an SSH connection), but then it throws an exception when RH tries to check connectivity again. This is the exception. I don't see it when I have the ProxyCommand in my ~/.ssh/config. BTW, I have a couple of other SSH options (`-o`) in the config file that I had no way to pass in through the dict.
```
INFO | 2023-07-12 13:23:29,522 | Checking server myvm again.
---------------------------------------------------------------------------
BaseSSHTunnelForwarderError Traceback (most recent call last)
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster.py:357, in Cluster.check_server(self, restart_server)
356 try:
--> 357 self.connect_server_client()
358 cluster_config = self.config_for_rns
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster.py:324, in Cluster.connect_server_client(self, tunnel, force_reconnect)
323 else:
--> 324 self._rpc_tunnel, connected_port = self.ssh_tunnel(
325 HTTPClient.DEFAULT_PORT,
326 remote_port=DEFAULT_SERVER_PORT,
327 num_ports_to_try=5,
328 )
329 open_cluster_tunnels[self.address] = (
330 self._rpc_tunnel,
331 connected_port,
332 tunnel_refcount + 1,
333 )
AttributeError Traceback (most recent call last)
gpu = rh.cluster(....)
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster_factory.py:59, in cluster(name, ips, ssh_creds, dryrun, **kwargs)
50 if {"instance_type", "num_instances", "provider"} <= kwargs.keys():
51 # Commenting out for now. If two creation paths creates confusion let's push people to use
52 # ondemand_cluster() instead.
(...)
55 # "If you would like to create an on-demand cluster, please use `rh.ondemand_cluster()` instead."
56 # )
57 return ondemand_cluster(name=name, **kwargs)
---> 59 return Cluster(ips=ips, ssh_creds=ssh_creds, name=name, dryrun=dryrun)
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster.py:58, in Cluster.__init__(self, name, ips, ssh_creds, dryrun, **kwargs)
55 self.client = None
57 if not dryrun and self.address:
---> 58 self.check_server()
59 # OnDemandCluster will start ray itself, but will also set address later, so won't reach here.
60 self.start_ray()
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/runhouse/rns/hardware/cluster.py:379, in Cluster.check_server(self, restart_server)
377 self.restart_server(resync_rh=False)
378 logger.info(f"Checking server {self.name} again.")
--> 379 self.client.check_server(cluster_config=cluster_config)
380 else:
381 raise ValueError(f"Could not connect to cluster <{self.name}>")
AttributeError: 'NoneType' object has no attribute 'check_server'
```
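On the extra `-o` options: ssh accepts any ~/.ssh/config keyword on the command line as `-o Keyword=value`, so a dict of such options can be flattened into arguments. The helper below is hypothetical and is not an existing Runhouse parameter; the option names are simply ones that appear elsewhere in this thread.

```python
from typing import Dict, List


def ssh_option_args(options: Dict[str, str]) -> List[str]:
    """Flatten {"Keyword": "value"} into ssh's ["-o", "Keyword=value", ...] form."""
    args: List[str] = []
    for keyword, value in options.items():
        args += ["-o", f"{keyword}={value}"]
    return args


extra = ssh_option_args({"StrictHostKeyChecking": "no", "UserKnownHostsFile": "/dev/null"})
print(["ssh", *extra, "myvm"])  # "myvm" is a placeholder host
```

Keeping each option as its own `-o` argument avoids having to quote the whole option string for a shell.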
Oh, great point: we aren't passing the proxy into the tunnel. I can patch that, and add support for the extra options, shortly. Out of curiosity, you said this all works (through to running the remote function) when you've provided the options and proxy info in your SSH config?
Actually, remote functions didn't work with my ~/.ssh/config either; they hung for a long time. What did work was setting up the functions (e.g., the pip installs) and cluster.run_python.
Got it. I've been banging on this, and the tunneling library we use (ironically, to handle different credential scenarios...) doesn't support proxies nicely. It looks like it does, but I spent hours debugging and it still wouldn't proxy correctly, even though the same setup worked from the command line. I've implemented a fix (#85) that goes directly through the command line to remove that discrepancy, but I want to test it a bit further before releasing because it's a core execution path. Adding more SSH options is straightforward, and I'll push that shortly too. If you're blocked and would like to try what's there so far, feel free:
```shell
pip install git+https://github.com/run-house/runhouse.git@proxy_tunneling
```
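Roughly, "going directly through the command line" means building an `ssh -L` port-forwarding command and letting OpenSSH handle the proxy itself. The sketch below is an assumption about the general shape of such a command, not the code in #85; `tunnel_command`, the host, user, and the ports are hypothetical.

```python
import os
import shlex
from typing import List, Optional


def tunnel_command(host: str, user: str, key: str, local_port: int, remote_port: int,
                   proxy_command: Optional[str] = None) -> List[str]:
    """Build an `ssh -L` port-forwarding command as an argv list (no shell involved)."""
    cmd = [
        "ssh",
        "-N",  # don't run a remote command; only forward ports
        "-i", os.path.expanduser(key),  # expand "~" ourselves since no shell is used
        "-L", f"{local_port}:localhost:{remote_port}",  # local port -> port on the remote host
    ]
    if proxy_command:
        # A single argv element; ssh hands the value to the shell itself.
        cmd += ["-o", f"ProxyCommand={proxy_command}"]
    cmd.append(f"{user}@{host}")
    return cmd


cmd = tunnel_command("myvm", "ubuntu", "~/.ssh/id_rsa", 8000, 8000,
                     proxy_command="ssh -W %h:%p user@proxyhost")
print(shlex.join(cmd))  # copy-pasteable form for debugging
# Opening the tunnel (not done here) would be e.g. subprocess.Popen(cmd).
```

Because the proxy is handed to ssh as a `ProxyCommand` option, the tunnel uses the same proxy logic as a manual `ssh` call from the terminal.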
Thanks @dongreenberg for the quick fix. I think there are still some issues on the proxy_tunneling branch. The good news is that the end-to-end example (the Stable Diffusion tutorial) runs fine, remote functions included, when I use ~/.ssh/config to specify my ProxyCommand.
However, when I pass that info through the ssh_proxy_command dict item, I get an error in how the ssh/bash command is constructed. It seems to be looking for `ssh` in my current directory:
```
/bin/bash: /home/user/runhouse/tutorials/t01_Stable_Diffusion/ssh -i ~/.ssh/id_rsa -W <host>:<port> -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null <proxyhost>: No such file or directory
```
After that, the remote pkill and the start of the Runhouse HTTP server go through fine, but they are followed by a connection refused error:
```
INFO | 2023-07-13 13:11:47,807 | Checking server myvm again.
ConnectionRefusedError Traceback (most recent call last)
File ~/miniconda3/envs/rh/lib/python3.9/site-packages/urllib3/connection.py:174, in HTTPConnection._new_conn(self)
173 try:
--> 174 conn = connection.create_connection(
175 (self._dns_host, self.port), self.timeout, **extra_kw
176 )
178 except SocketTimeout:
```
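One plausible way to get a `/bin/bash: .../ssh -i ...: No such file or directory` error like the one above is an extra layer of quoting around the proxy command. OpenSSH runs the `ProxyCommand` value through the shell with an `exec` prefix, so if the value arrives wrapped in quotes, the shell treats the whole string as a single program name. This sketch illustrates the pitfall only; it is not a diagnosis of the branch's code, and `user@proxyhost` and `myvm` are placeholders.

```python
import shlex

proxy = "ssh -i ~/.ssh/id_rsa -W %h:%p user@proxyhost"

# Correct as an argv list: the proxy command is one element, with no quotes of its own.
argv = ["ssh", "-o", f"ProxyCommand={proxy}", "myvm"]

# Correct as a single shell string: shlex.join adds exactly one layer of quoting.
shell_line = shlex.join(argv)

# Pitfall: quoting the value *inside* an argv element. ssh would hand
# "exec 'ssh -i ~/.ssh/id_rsa -W ...'" to the shell, which then looks for one
# program literally named "ssh -i ~/.ssh/id_rsa -W ..." and fails.
argv_double_quoted = ["ssh", "-o", f"ProxyCommand={shlex.quote(proxy)}", "myvm"]

print(shell_line)
print(argv_double_quoted[2])
```

The rule of thumb is to quote once for whichever layer parses the string: none for an argv list passed to `subprocess`, one `shlex.join` for a command line given to a shell.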
Oh, that is interesting. Glad to hear it works with ~/.ssh/config, and I appreciate you helping us through the dict case too. I think I spotted the error and just pushed a fix to the branch. It runs through on my side, but I've only set up a phony jumpbox to test it, so it's really helpful that you've tried it on yours.
My environment is a bit custom (not sure how common it is). The target hostname (which I pass in the ips field) is somewhat dynamic, but follows a pattern that I specify in ~/.ssh/config. It won't resolve to any known IP address locally on my client machine and is only meaningful to the proxy host, which has a way to resolve these dynamic target hostnames and route them to my target server. As a result, if I don't use ~/.ssh/config and instead let Runhouse use ssh_proxy_command, my proxy host somehow fails to resolve the target host passed in the proxy command's -W option and returns "Could not resolve IP address for : Name or service not known".
The SSH client I run from the command line passes the dynamic hostname to the proxy command (I have -W %h:%p in the proxy command), and there I don't see the proxy failing to resolve the target.
For now, I can use the ProxyCommand in ~/.ssh/config, which seems to be working great with Runhouse: I was able to run several of the tutorials remotely. I'm happy to add some logging locally to see how Runhouse constructs the full SSH commands, so I can diff my ~/.ssh/config setup against passing ssh_proxy_command in rh.cluster. Where can I find that code so I can debug it in my environment and further isolate the issue?
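For comparing the two setups, it can help to preview what the proxy actually receives. In OpenSSH, `%h` in `ProxyCommand` expands to the target host name (after any `HostName` substitution from the config) and `%p` to the port, and ssh leaves resolving the target to the proxy in that case, which is why `-W %h:%p` works for names only the proxy can resolve. The hypothetical helper below approximates just `%h`, `%p`, `%r`, and `%%`; real ssh supports more tokens, and `dyn-host-123` is a placeholder.

```python
def expand_proxy_tokens(proxy_command: str, host: str, port: int, user: str = "") -> str:
    """Approximate ssh's %h/%p/%r/%% expansion in a ProxyCommand (subset of tokens only)."""
    table = {"h": host, "p": str(port), "r": user, "%": "%"}
    out = []
    i = 0
    while i < len(proxy_command):
        ch = proxy_command[i]
        if ch == "%" and i + 1 < len(proxy_command) and proxy_command[i + 1] in table:
            out.append(table[proxy_command[i + 1]])  # known token: substitute it
            i += 2
            continue
        out.append(ch)  # ordinary character (unknown tokens are left as-is here)
        i += 1
    return "".join(out)


print(expand_proxy_tokens("ssh -W %h:%p user@proxyhost", "dyn-host-123", 22))
```

If the expanded `-W` argument differs between the ~/.ssh/config route and the `ssh_proxy_command` route (for example, an empty or already-resolved host), that would point at where the two diverge.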
Good point, we should log the SSH commands, and I'm curious why they wouldn't resolve the same way as through the command line. I have a gnarly commit in the works on a separate branch that I'll land shortly; then I'll add that logging and push it to this branch. In general, would you say it's preferable for our SSH activity to run through the command line, so it's consistent with whatever you know you can do directly, rather than using Python SSH libraries (e.g., Paramiko, asyncssh)?
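A minimal sketch of the logging idea, assuming SSH commands are run as argument lists via `subprocess`: log each command with `shlex.join` so the exact line can be pasted into a terminal and compared against a manual `ssh` invocation that works. `run_logged` and the logger name are hypothetical, and the demo runs the Python interpreter instead of ssh so it works anywhere.

```python
import logging
import shlex
import subprocess
import sys
from typing import List

logger = logging.getLogger("ssh_debug")  # hypothetical logger name


def run_logged(cmd: List[str], **kwargs) -> subprocess.CompletedProcess:
    """Log the exact command (shell-quoted, paste-ready), then run it without a shell."""
    logger.info("Running: %s", shlex.join(cmd))
    return subprocess.run(cmd, **kwargs)


logging.basicConfig(level=logging.INFO)
# Stand-in command; an ssh or tunnel argv list would be logged the same way.
result = run_logged([sys.executable, "-c", "print('hello')"], capture_output=True, text=True)
print(result.stdout.strip())
```

Logging the argv list rather than a pre-built string also makes quoting mistakes visible, since `shlex.join` shows exactly how each argument would need to be quoted in a shell.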