Comments (43)
Same issue, unusual plugin
from community.aws.
Does python ansible-tmp-1601664494.6311371-10330-41749016017816/AnsiballZ_setup.py open stdin? If yes, then it might be eating the printf command (meaning that the shell would never receive it).
Yes, I'm finding that the python command eats the printf. I can get around it by adding echo | {cmd}; to the wrapped command, and then the end-mark printf seems to appear.
Here's the full patch, making use of your previous patch, @DanielFallon.
I made some minor changes so we don't use the regexes, as they were being fickle.
diff --git a/playbooks/connection_plugins/aws_ssm2.py b/playbooks/connection_plugins/aws_ssm2.py
index 1c01dba..baa15f3 100644
--- a/playbooks/connection_plugins/aws_ssm2.py
+++ b/playbooks/connection_plugins/aws_ssm2.py
@@ -444,10 +444,94 @@ class Connection(ConnectionBase):
     def _prepare_terminal(self):
         """ perform any one-time terminal settings """
-        if not self.is_windows:
-            cmd = "stty -echo\n" + "PS1=''\n"
-            cmd = to_bytes(cmd, errors="surrogate_or_strict")
-            self._session.stdin.write(cmd)
+        # No windows setup for now
+        if self.is_windows:
+            return
+
+        # *_complete variables are 3 valued:
+        # - None: not started
+        # - False: started
+        # - True: complete
+
+        startup_complete = False
+        disable_echo_complete = None
+        disable_echo_cmd = to_bytes("stty -echo\n", errors="surrogate_or_strict")
+
+        disable_prompt_complete = None
+        end_mark = "".join(
+            [random.choice(string.ascii_letters) for i in xrange(self.MARK_LENGTH)]
+        )
+        disable_prompt_cmd = to_bytes(
+            "PS1='' ; printf '\\n%s\\n' '" + end_mark + "'\n",
+            errors="surrogate_or_strict",
+        )
+        disable_prompt_reply = re.compile(
+            r"\r\r\n" + re.escape(end_mark) + r"\r\r\n", re.MULTILINE
+        )
+
+        stdout = ""
+        # Custom command execution for when we're waiting for startup
+        stop_time = int(round(time.time())) + self.get_option("ssm_timeout")
+        while (not disable_prompt_complete) and (self._session.poll() is None):
+            remaining = stop_time - int(round(time.time()))
+            if remaining < 1:
+                self._timeout = True
+                display.vvvv(
+                    "PRE timeout stdout: {0}".format(to_bytes(stdout)), host=self.host
+                )
+                raise AnsibleConnectionFailure(
+                    "SSM start_session timeout on host: %s" % self.instance_id
+                )
+            if self._poll_stdout.poll(1000):
+                stdout += to_text(self._stdout.read(1024))
+                display.vvvv(
+                    "PRE stdout line: {0}".format(to_bytes(stdout)), host=self.host
+                )
+            else:
+                display.vvvv("PRE remaining: {0}".format(remaining), host=self.host)
+
+            # wait til prompt is ready
+            if startup_complete is False:
+                match = str(stdout).find("Starting session with SessionId")
+                if match != -1:
+                    display.vvvv("PRE startup output received", host=self.host)
+                    startup_complete = True
+
+            # disable echo
+            if startup_complete and (disable_echo_complete is None):
+                display.vvvv(
+                    "PRE Disabling Echo: {0}".format(disable_echo_cmd), host=self.host
+                )
+                self._session.stdin.write(disable_echo_cmd)
+                disable_echo_complete = False
+
+            if disable_echo_complete is False:
+                match = str(stdout).find("stty -echo")
+                if match != -1:
+                    disable_echo_complete = True
+
+            # disable prompt
+            if disable_echo_complete and disable_prompt_complete is None:
+                display.vvvv(
+                    "PRE Disabling Prompt: {0}".format(disable_prompt_cmd),
+                    host=self.host,
+                )
+                self._session.stdin.write(disable_prompt_cmd)
+                disable_prompt_complete = False
+
+            if disable_prompt_complete is False:
+                match = disable_prompt_reply.search(stdout)
+                if match:
+                    stdout = stdout[match.end() :]
+                    disable_prompt_complete = True
+
+        if not disable_prompt_complete:
+            raise AnsibleConnectionFailure(
+                "SSM process closed during _prepare_terminal on host: %s"
+                % self.instance_id
+            )
+        else:
+            display.vvv("PRE Terminal configured", host=self.host)
     def _wrap_command(self, cmd, sudoable, mark_start, mark_end):
         """ wrap command so stdout and status can be extracted """
@@ -460,14 +544,9 @@ class Connection(ConnectionBase):
             if sudoable:
                 cmd = "sudo " + cmd
             cmd = (
-                "echo "
-                + mark_start
-                + "\n"
-                + cmd
-                + "\necho $'\\n'$?\n"
-                + "echo "
-                + mark_end
-                + "\n"
+                f"printf '%s\\n' '{mark_start}';\n"
+                f"echo | {cmd};\n"
+                f"printf '\\n%s\\n%s\\n' \"$?\" '{mark_end}';\n"
             )
         display.vvvv(u"_wrap_command: '{0}'".format(to_text(cmd)), host=self.host)
I put the patched version of aws_ssm2.py into the connection_plugins/ directory in my playbooks directory, set ansible_connection: aws_ssm2, and was able to successfully use SSM to run playbooks against my Ubuntu EC2 instance.
from community.aws.
Even with the patched code from https://github.com/Filirom1/community.aws/blob/cb79826540fc58a2d13c1d95dde91c7544578748/plugins/connection/aws_ssm.py#L524 and your instructions, I still experience an issue:
ansible-playbook -i inventory_aws_ssm.yml -c aws_ssm2 playbook.yml
# playbook.yml
---
- name: Test command
  gather_facts: false
  hosts: all
  vars:
    ansible_connection: aws_ssm2
    ansible_aws_ssm_region: eu-central-1
  tasks:
    - name: test
      command:
        cmd: ls -l
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: expected string or bytes-like object
fatal: [instance-id]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}
Update, fixing the new issue:
TL;DR: Use the patched code with @thomas-anderson-bsl's instructions AND also set the variable ansible_aws_ssm_bucket_name. The bucket name is required, and there is no graceful exit if it's not supplied.
Apparently the plugin requires ansible_aws_ssm_bucket_name to be set; otherwise the run fails. I am still not sure why it tries to run _file_transport_command for a cmd: ls -l command. Maybe someone else can clarify that. Anyway, with the patched code and the bucket variable set (make sure your account has access rights to the bucket), I can connect to the instance 👍
from community.aws.
As a workaround, you can switch the default shell from sh to bash in the Session Manager preferences:
https://aws.amazon.com/premiumsupport/knowledge-center/ssm-session-manager-change-shell/
from community.aws.
Here is a workaround that avoids sh entirely by starting a bash shell.
--- a/plugins/connection/aws_ssm.py 2020-09-03 18:43:43.818000000 +0200
+++ b/plugins/connection/aws_ssm.py 2020-09-03 18:43:19.805000000 +0200
@@ -288,11 +288,17 @@
         profile_name = ''
         region_name = self.get_option('region')
-        ssm_parameters = dict()
         client = boto3.client('ssm', region_name=region_name)
         self._client = client
-        response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
+
+        if self.is_windows:
+            ssm_parameters = dict()
+            response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
+        else:
+            ssm_parameters = {"command": ["bash -l"]}
+            response = client.start_session(Target=self.instance_id, DocumentName="AWS-StartInteractiveCommand", Parameters=ssm_parameters)
+
         self._session_id = response['SessionId']
         cmd = [
from community.aws.
As another workaround, here's a proxy command that should just work to allow connections via ec2-instance-connect and then proxy via ssm:
https://gist.github.com/DanielFallon/45310cc76f46c1f1b2f7272d19b76312
^Technically, this is significantly more efficient than the plugin in the gist, because SSH multiplexing will be used if it is available (and it is by default).
from community.aws.
Why isn't anyone using the fix in #583? It seems to work flawlessly for me.
from community.aws.
I found that this problem also occurs on Windows.
from community.aws.
I looked into this, and I think there are several things wrong.
See this line:
cmd = "echo " + mark_start + "\n" + cmd + "\necho $'\\n'$?\n" + "echo " + mark_end + "\n"
The shell used by session-manager is sh, which does not support bashisms like $'\n' to print a newline.
See the difference:
sh
$ echo $'\n'$?
$
0
bash
$ echo $'\n'$?

0
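To see why printf is the safer choice for the status line, the sketch below runs the printf form after a command and returns what sh prints. It assumes a local POSIX /bin/sh; the helper name status_line_output is hypothetical. Because printf interprets \n in its format string on both dash and bash, the exit status always lands on its own line, which is what the marker parsing relies on.

```python
import subprocess

# printf is specified by POSIX, so '\n' in its format string behaves the same
# under dash and bash, unlike the bash-only $'\n' quoting used with echo.
STATUS_LINE = "printf '\\n%s\\n' \"$?\""


def status_line_output(preceding_cmd):
    """Run `preceding_cmd` followed by the printf status line under /bin/sh."""
    script = preceding_cmd + " ; " + STATUS_LINE
    result = subprocess.run(
        ["/bin/sh", "-c", script], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    print(repr(status_line_output("true")))
    print(repr(status_line_output("false")))
```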
However, that might be a red herring. Notice the output:
<i-xxxxx> EXEC stdout line: $ stty -echo
<i-xxxxx> EXEC stdout line: PS1=''
<i-xxxxx> EXEC stdout line: echo YYYYY
<i-xxxxx> EXEC stdout line: echo ~
<i-xxxxx> EXEC stdout line: echo $'\n'$?
<i-xxxxx> EXEC stdout line: echo ZZZZZ
Those are the input commands, not the stdout. Where is the output?
from community.aws.
I looked at this for a bit today and found that you were right to be skeptical, @abeluck.
The issue actually appears to be in _prepare_terminal. The current code does not wait for stty -echo to return before sending additional bytes. As a result, every character sent before stty -echo takes effect is echoed back, and any characters written afterwards are not.
Since the commands are sent so quickly, the terminal echoes all of the lines. Then we break out of the loop, because <start_mark> is definitely present in the echoed echo <start_mark>. You can quickly fix the behavior by adding a delay (and probably also the fix for the bashism). Here are some logs to demonstrate:
After removing the loop breakout:
Loading callback plugin minimal of type stdout, v2.0 from /usr/lib/python3.8/site-packages/ansible/plugins/callback/minimal.py
META: ran handlers
<ip-___-__-_-___.ec2.internal> ESTABLISH SSM CONNECTION TO: i-0cc44a1e2f7995c53
<ip-___-__-_-___.ec2.internal> SSM COMMAND: ['/usr/local/bin/session-manager-plugin',...]
<ip-___-__-_-___.ec2.internal> SSM CONNECTION ID: DanFallon-095d5d09e63ce624c
<ip-___-__-_-___.ec2.internal> EXEC echo ~ubuntu
<ip-___-__-_-___.ec2.internal> _wrap_command: 'echo PKjewgTIHJcLjaXeCjlasDjfZw
echo ~ubuntu
echo $'\n'$?
echo uZmXXhFRmkCzCEZEPjWMQlNiIF
'
<ip-___-__-_-___.ec2.internal> EXEC stdout line:
<ip-___-__-_-___.ec2.internal> EXEC stdout line: Starting session with SessionId: DanFallon-095d5d09e63ce624c
<ip-___-__-_-___.ec2.internal> EXEC remaining: 60
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $ stty -echo ;
<ip-___-__-_-___.ec2.internal> EXEC stdout line: PS1='' ;
<ip-___-__-_-___.ec2.internal> EXEC stdout line: echo PKjewgTIHJcLjaXeCjlasDjfZw
<ip-___-__-_-___.ec2.internal> EXEC stdout line: echo ~ubuntu
<ip-___-__-_-___.ec2.internal> EXEC stdout line: echo $'\n'$?
<ip-___-__-_-___.ec2.internal> EXEC stdout line: echo uZmXXhFRmkCzCEZEPjWMQlNiIF
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $ PKjewgTIHJcLjaXeCjlasDjfZw
<ip-___-__-_-___.ec2.internal> EXEC stdout line: /home/ubuntu
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $
<ip-___-__-_-___.ec2.internal> EXEC stdout line: 0
<ip-___-__-_-___.ec2.internal> EXEC stdout line: uZmXXhFRmkCzCEZEPjWMQlNiIF
<ip-___-__-_-___.ec2.internal> EXEC remaining: 59
<ip-___-__-_-___.ec2.internal> EXEC remaining: 58
<ip-___-__-_-___.ec2.internal> EXEC remaining: 57
<ip-___-__-_-___.ec2.internal> EXEC remaining: 56
<ip-___-__-_-___.ec2.internal> EXEC remaining: 55
<ip-___-__-_-___.ec2.internal> EXEC remaining: 54
<ip-___-__-_-___.ec2.internal> EXEC remaining: 53
...
After adding a 5-second delay:
Loading callback plugin minimal of type stdout, v2.0 from /usr/lib/python3.8/site-packages/ansible/plugins/callback/minimal.py
META: ran handlers
<ip-___-__-_-___.ec2.internal> ESTABLISH SSM CONNECTION TO: i-0cc44a1e2f7995c53
<ip-___-__-_-___.ec2.internal> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', ...]
Sleeping for 5 seconds.
<ip-___-__-_-___.ec2.internal> SSM CONNECTION ID: DanFallon-05193c016e933c229
<ip-___-__-_-___.ec2.internal> EXEC echo ~ubuntu
<ip-___-__-_-___.ec2.internal> _wrap_command: 'echo KylNOeAyCbfmTzlCZGrbHiYnSm
echo ~ubuntu
echo $'\n'$?
echo PujmuNjsUIsmGZzoUwNAUvowBk
'
<ip-___-__-_-___.ec2.internal> EXEC stdout line:
<ip-___-__-_-___.ec2.internal> EXEC stdout line: Starting session with SessionId: DanFallon-05193c016e933c229
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $ stty -echo ;
<ip-___-__-_-___.ec2.internal> EXEC stdout line: PS1='' ;
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $ KylNOeAyCbfmTzlCZGrbHiYnSm
<ip-___-__-_-___.ec2.internal> EXEC stdout line: /home/ubuntu
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $
<ip-___-__-_-___.ec2.internal> EXEC stdout line: 0
<ip-___-__-_-___.ec2.internal> EXEC stdout line: PujmuNjsUIsmGZzoUwNAUvowBk
<ip-___-__-_-___.ec2.internal> EXEC remaining: 60
<ip-___-__-_-___.ec2.internal> EXEC remaining: 59
<ip-___-__-_-___.ec2.internal> EXEC remaining: 58
<ip-___-__-_-___.ec2.internal> EXEC remaining: 57
<ip-___-__-_-___.ec2.internal> EXEC remaining: 56
<ip-___-__-_-___.ec2.internal> EXEC remaining: 55
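Instead of a fixed 5-second sleep, the plugin could block until the bytes it expects have actually arrived, and only then send the next command. This is a minimal sketch, assuming a hypothetical read_chunk callable that returns whatever decoded output arrived during one short poll (for example, a wrapper around polling and reading the session-manager-plugin's stdout), or an empty string if nothing arrived. The name read_until is also hypothetical.

```python
import time


def read_until(read_chunk, marker, timeout, clock=time.monotonic, initial=""):
    """Accumulate output until `marker` appears, instead of sleeping a fixed time.

    read_chunk: callable returning the next chunk of decoded output, or "" when
        nothing arrived; in real use it should block briefly (e.g. a 1 s poll)
        so this loop does not spin.
    initial: leftover text from a previous call, so no output is lost.
    Returns (text up to and including the marker, leftover text after it).
    Raises TimeoutError carrying everything received so far.
    """
    buffer = initial
    deadline = clock() + timeout
    while True:
        index = buffer.find(marker)
        if index != -1:
            end = index + len(marker)
            return buffer[:end], buffer[end:]
        if clock() >= deadline:
            raise TimeoutError("never saw %r; received %r" % (marker, buffer))
        buffer += read_chunk()


if __name__ == "__main__":
    # Fake session output arriving in pieces, loosely modelled on the logs above.
    chunks = iter(["\r\nStarting session with SessionId: abc\r\n$ ", "stty -echo\r\r\n$ "])

    def read():
        return next(chunks, "")

    banner, rest = read_until(read, "SessionId:", timeout=5)
    # Only at this point would the plugin write "stty -echo\n" to the session.
    echoed, rest = read_until(read, "stty -echo", timeout=5, initial=rest)
    print(repr(echoed))
```

The same helper can then gate the PS1/end-mark step, so each command is written only after the previous one has visibly taken effect.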
from community.aws.
Would love to see these fixes land, and maybe get some tests around this plugin (not sure how that would work).
We bailed on the ssm plugin for now and are running ansible over ssh over ssm, which means we still have to manage ssh keys on instances (though we don't need a bastion any longer).
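For the ssh-over-SSM route, the sketch below builds inventory variables that send Ansible's regular ssh connection through Session Manager. It assumes the AWS CLI and session-manager-plugin are installed on the controller, that ansible_host is the instance ID (ssh substitutes it for %h), and that the AWS-StartSSHSession document is usable in the account; the function name ssh_over_ssm_vars is hypothetical. SSH keys still have to be managed on the instances.

```python
import shlex

# ProxyCommand in the form AWS documents for tunnelling SSH through Session
# Manager; ssh substitutes %h (target host) and %p (port) before running it.
SSM_PROXY = (
    "aws ssm start-session --target %h "
    "--document-name AWS-StartSSHSession --parameters portNumber=%p"
)


def ssh_over_ssm_vars(os_user, region=None):
    """Build hypothetical inventory vars that run Ansible's ssh plugin via SSM."""
    proxy = SSM_PROXY
    if region:
        proxy += " --region " + region
    return {
        "ansible_connection": "ssh",
        "ansible_user": os_user,
        # shlex.quote keeps the whole "ProxyCommand=..." value as one ssh argument
        # when the extra args are split shell-style.
        "ansible_ssh_common_args": "-o " + shlex.quote("ProxyCommand=" + proxy),
    }


if __name__ == "__main__":
    for key, value in ssh_over_ssm_vars("ubuntu", "eu-central-1").items():
        print("%s: %s" % (key, value))
```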
from community.aws.
Another strange issue I see related to this: during fact gathering, the AnsiballZ_setup.py that is run does something to mess with the shell. It always hits EXEC remaining and always times out, without echoing the newline and the end mark to stdout. My remote host is running an older version of Python, so my initial thought is that it's related. I'm trying to stand up a venv on the remote host to use as the environment.
from community.aws.
Here's a patch that I'm willing to contribute (I haven't gotten a chance to set up a fork yet):
- Replaces echo with printf, because it's more portable and robust.
- Waits to receive command replies before starting additional executions.
Problems I still see:
- Does aws_ssm.py need to support Python 2? I don't think my code is compatible (and I thought boto3 was Python 3 only; maybe I'm wrong).
- cmd (line 517), when executed, could eat the end_mark command. Maybe we could disconnect cmd's stdin? Any solution I thought of for this makes some assumptions about the command or the terminal that I didn't like.
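One way to disconnect cmd's stdin, sketched under assumptions: the command runs inside a { ...; } group whose stdin is redirected from /dev/null, so a program that reads stdin (as AnsiballZ_setup.py appears to) gets EOF instead of consuming the trailing printf. This assumes the wrapped command never needs interactive input; it would break, for example, sudo -S reading a password from stdin (the sudo line later in this thread uses -n, which does not prompt). wrap_command here is a standalone sketch, not the plugin's method, and run_like_ssm_session is a hypothetical helper that feeds the script to a local /bin/sh on stdin, roughly the way the session delivers commands.

```python
import subprocess


def wrap_command(cmd, mark_start, mark_end, sudoable=False):
    """Wrap `cmd` so it cannot read (and swallow) the lines sent after it."""
    if sudoable:
        cmd = "sudo " + cmd
    return (
        "printf '%s\\n' '{0}' ;\n".format(mark_start)
        # Group the command and give the whole group EOF on stdin.
        + "{ " + cmd + "\n} < /dev/null ;\n"
        # Leading newline keeps the status on its own line even if the command's
        # output has no trailing newline.
        + "printf '\\n%s\\n%s\\n' \"$?\" '{0}' ;\n".format(mark_end)
    )


def run_like_ssm_session(script):
    """Feed `script` to /bin/sh on stdin, the way the SSM session delivers it."""
    result = subprocess.run(["/bin/sh"], input=script, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # `cat` would normally consume the rest of the script from stdin.
    print(repr(run_like_ssm_session(wrap_command("cat", "START", "END"))))
```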
Anywho, I freely release the below under a GPL v3.0+ license, like the files from which they were derived. I can look at figuring out the contributor agreement etc. later this week.
diff --git a/plugins/connection/aws_ssm.py b/plugins/connection/aws_ssm.py
index 7f7d692..4dc3965 100644
--- a/plugins/connection/aws_ssm.py
+++ b/plugins/connection/aws_ssm.py
@@ -412,11 +412,97 @@ class Connection(ConnectionBase):
     def _prepare_terminal(self):
         ''' perform any one-time terminal settings '''
-        if not self.is_windows:
-            cmd = "stty -echo\n" + "PS1=''\n"
-            cmd = to_bytes(cmd, errors='surrogate_or_strict')
-            self._session.stdin.write(cmd)
+        # No windows setup for now
+        if self.is_windows:
+            return
+
+        # *_complete variables are 3 valued:
+        # - None: not started
+        # - False: started
+        # - True: complete
+
+        startup_complete = False
+        startup_reply = re.compile(
+            r"Starting session with SessionId:\W+" +
+            re.escape(self._session_id) +
+            r"\r\n\$ ", re.MULTILINE)
+
+        disable_echo_complete = None
+        disable_echo_cmd = to_bytes("stty -echo\n", errors='surrogate_or_strict')
+        disable_echo_reply = re.compile(
+            r"stty \-echo" +
+            r"\r\r\n\$", re.MULTILINE
+        )
+
+        disable_prompt_complete = None
+        end_mark = "".join([random.choice(string.ascii_letters) for i in xrange(self.MARK_LENGTH)])
+        disable_prompt_cmd = to_bytes(
+            "PS1='' ; printf '\\n%s\\n' '" + end_mark + "'\n",
+            errors='surrogate_or_strict')
+        disable_prompt_reply = re.compile(
+            r"\r\r\n" +
+            re.escape(end_mark) +
+            r"\r\r\n", re.MULTILINE
+        )
+
+        stdout = ""
+        cursor = 0
+        # Custom command execution for when we're waiting for startup
+        stop_time = int(round(time.time())) + self.get_option('ssm_timeout')
+        while (not disable_prompt_complete) and (self._session.poll() is None):
+            remaining = stop_time - int(round(time.time()))
+            if remaining < 1:
+                self._timeout = True
+                display.vvvv(u"PRE timeout stdout: {0}".format(to_bytes(stdout)), host=self.host)
+                raise AnsibleConnectionFailure("SSM start_session timeout on host: %s"
+                                               % self.instance_id)
+            if self._poll_stdout.poll(1000):
+                stdout += to_text(self._stdout.read(1024))
+                display.vvvv(u"PRE stdout line: {0}".format(to_bytes(stdout)), host=self.host)
+            else:
+                display.vvvv(u"PRE remaining: {0}".format(remaining), host=self.host)
+
+            # wait til prompt is ready
+            if startup_complete is False:
+                match = startup_reply.search(stdout,cursor)
+                if match:
+                    display.vvvv(u"PRE startup output received", host=self.host)
+                    cursor = match.end()
+                    startup_complete = True
+
+
+            # disable echo
+            if startup_complete and (disable_echo_complete is None):
+                display.vvvv(u"PRE Disabling Echo: {0}".format(disable_echo_cmd), host=self.host)
+                self._session.stdin.write(disable_echo_cmd)
+                disable_echo_complete = False
+
+            if disable_echo_complete is False:
+                match = disable_echo_reply.search(stdout)
+                if match:
+                    stdout = stdout[match.end():]
+                    disable_echo_complete = True
+
+
+            # disable prompt
+            if disable_echo_complete and disable_prompt_complete is None:
+                display.vvvv(u"PRE Disabling Prompt: {0}".format(disable_prompt_cmd), host=self.host)
+                self._session.stdin.write(disable_prompt_cmd)
+                disable_prompt_complete = False
+
+            if disable_prompt_complete is False:
+                match = disable_prompt_reply.search(stdout)
+                if match:
+                    stdout = stdout[match.end():]
+                    disable_prompt_complete = True
+
+        if not disable_prompt_complete:
+            raise AnsibleConnectionFailure("SSM process closed during _prepare_terminal on host: %s"
+                                           % self.instance_id)
+        else:
+            display.vvv(u"PRE Terminal configured", host=self.host)
+
     def _wrap_command(self, cmd, sudoable, mark_start, mark_end):
         ''' wrap command so stdout and status can be extracted '''
@@ -427,7 +513,9 @@ class Connection(ConnectionBase):
         else:
             if sudoable:
                 cmd = "sudo " + cmd
-            cmd = "echo " + mark_start + "\n" + cmd + "\necho $'\\n'$?\n" + "echo " + mark_end + "\n"
+            cmd = ("printf '%s\\n' '{0}' ;\n".format(mark_start) +
+                   cmd +
+                   " ;\nprintf '\\n%s\\n%s\\n' \"$?\" '{0}' ;\n".format(mark_end))
         display.vvvv(u"_wrap_command: '{0}'".format(to_text(cmd)), host=self.host)
         return cmd
from community.aws.
I'm willing to try it because I'm stuck. My executing node is running Python 3. My endpoint is only running 2.7 (legacy OS), so the plugin being Python 3 only should be fine. The Python 2 issues may just be with what Ansible packages up and sends over to the endpoint.
from community.aws.
@DanielFallon thanks for the patch. I had to change your ssm_timeout to timeout in the self.get_option call.
Upon executing, I'm getting stuck in a PRE remaining loop (i.e. the prompt is never ready) during fact gathering.
from community.aws.
The change in the get_option call is because my patch was for the development version instead of the currently released version (they just changed it from timeout to ssm_timeout).
Can you post a redacted log with a debug level of at least 4 (-vvvv)? I'd like to see which part of the setup loop it's getting stuck in.
There are roughly 3 phases:
- First, it waits to receive the string: Starting session with SessionId:
- Second, it waits to receive the string stty -echo followed by a $ prompt on the next line.
- Third, it waits to receive a custom-generated end string: <end_mark>
I expected this code to be a bit fragile but fail-safe. Maybe I should add some more detail to the timeout exception.
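The three phases map naturally onto a more specific timeout message. This is a sketch assuming the printf-based handshake from the patch above; handshake_phase and timeout_message are hypothetical names, and the check only inspects the output received so far, not the plugin's internal state.

```python
import re


def handshake_phase(stdout, end_mark):
    """Return the first setup phase whose expected output has not been seen.

    Returns None once all three phases are visible in `stdout`.
    """
    text = stdout.replace("\r", "")
    phases = [
        ("startup", "Starting session with SessionId:" in text),
        # The echoed "stty -echo" command followed by a line break.
        ("disable echo", re.search(r"stty -echo\n", text) is not None),
        # The end mark printed on a line of its own by the PS1/printf command.
        ("disable prompt",
         re.search(r"(^|\n)" + re.escape(end_mark) + r"\n", text) is not None),
    ]
    for name, seen in phases:
        if not seen:
            return name
    return None


def timeout_message(instance_id, stdout, end_mark):
    """Build a timeout error message naming the phase that stalled."""
    return (
        "SSM terminal setup timed out on host %s while waiting for phase %r; "
        "last output: %r" % (instance_id, handshake_phase(stdout, end_mark), stdout[-200:])
    )
```

With a stalled transcript such as the one in the log below (banner and an sh-4.2$ prompt, nothing else), this would point at the "disable echo" phase.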
from community.aws.
I can grab you a log, but it may take some time: my inventory script, which sets up the S3 bucket name and instance ID, isn't populating as expected right now.
from community.aws.
It's also worth noting that I only tested this with a sort of hello-world example. I wouldn't be surprised if there are still things that don't work. I do think that some sort of scripted regression testing would be very valuable here, as it is fragile and could be broken by:
- changes to Ansible
- changes to the SSM session plugin
- changes to the AWS API
- probably differences in a host's /etc/profile
All things to enumerate and test.
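A starting point for that kind of regression testing could be a pure parser for the wrapped-command output, checked against recorded transcripts. The sketch below assumes the printf-based wrapper (start mark, output, a newline, the exit status, end mark); parse_wrapped_output, CASES, and run_regressions are hypothetical, and the transcripts are trimmed imitations of the logs in this thread rather than real captures. Requiring each mark on a line of its own is what stops the echoed command echo <start_mark> from counting as a match.

```python
def parse_wrapped_output(stdout, mark_start, mark_end):
    """Return (exit_status, output) from a wrapped command's transcript, or None.

    None means the transcript is incomplete: a mark is missing or only appears
    inside other text (e.g. an echoed command line).
    """
    padded = "\n" + stdout.replace("\r", "")  # so a mark on line 1 also matches
    start_token = "\n" + mark_start + "\n"
    end_token = "\n" + mark_end + "\n"
    start = padded.find(start_token)
    if start == -1:
        return None
    body_start = start + len(start_token)
    end = padded.find(end_token, body_start)
    if end == -1:
        return None
    body = padded[body_start:end]  # "<output>\n<status>"
    if "\n" not in body:
        return None
    output, status = body.rsplit("\n", 1)
    try:
        return int(status), output
    except ValueError:
        return None


# (name, transcript, expected result) with marks "START" / "END".
CASES = [
    ("clean run", "START\r\n/home/ubuntu\r\n\r\n0\r\nEND\r\n", (0, "/home/ubuntu\n")),
    ("output without trailing newline", "START\nabc\n1\nEND\n", (1, "abc")),
    ("only echoed input so far",
     "$ stty -echo ;\r\nPS1='' ;\r\necho START\r\necho ~ubuntu\r\n", None),
    ("end mark eaten by a stdin reader", "START\r\n{\"ansible_facts\": {}}\r\n", None),
]


def run_regressions(parse=parse_wrapped_output):
    """Return a list of (name, got, expected) for every failing case."""
    failures = []
    for name, transcript, expected in CASES:
        got = parse(transcript, "START", "END")
        if got != expected:
            failures.append((name, got, expected))
    return failures
```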
from community.aws.
Does your patch apply to the 1.2 tagged release or on the main branch?
from community.aws.
Please see the sanitized output. I took stock 1.2, added just your patch, and got the following:
$ ansible-playbook $ANS_VAULT_FLAGS --limit TEST-GRP --user testuser --tags debug-vars common_aws_node.yml -vvvvvv
ansible-playbook 2.9.9
config file = /home/user/ansibletest/ansible.cfg
configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/user/venvs/ansible/lib64/python3.6/site-packages/ansible
executable location = /home/user/venvs/ansible/bin/ansible-playbook
python version = 3.6.8 (default, Apr 2 2020, 13:34:55) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
Using /home/user/ansibletest/ansible.cfg as config file
Reading vault password file: XXXX
Reading vault password file: XXXX
Reading vault password file: XXXX
setting up inventory plugins
host_list declined parsing /home/user/ansibletest/inventories/dynamic_inv.yml as it did not pass its verify_file() method
Start parsing DYN inventory
Blacklisted hosts
Done parsing DYN inventory
Parsed /home/user/ansibletest/inventories/dynamic_inv.yml inventory source with auto plugin
statically imported: /home/user/ansibletest/roles/basic-provisioning/tasks/10_install_start_aws_ssm.yml
Loading callback plugin default of type stdout, v2.0 from /home/user/venvs/ansible/lib64/python3.6/site-packages/ansible/plugins/callback/default.py
PLAYBOOK: common_aws_node.yml ************************************************************************************************************************************************************************************************************************
Positional arguments: common_aws_node.yml
verbosity: 6
remote_user: testuser
connection: smart
timeout: 10
become_method: sudo
tags: ('debug-vars',)
inventory: ('/home/user/ansibletest/inventories/dynamic_inv.yml',)
subset: TEST-GRP
vault_ids: XXXX
forks: 5
7 plays in common_aws_node.yml
PLAY [provision] *************************************************************************************************************************************************************************************************************************************
META: ran handlers
META: ran handlers
META: ran handlers
--- CUT --- (Skipped roles)
PLAY [debug-vars] ************************************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************
task path: /home/user/ansibletest/common_aws_node.yml:59
<10.x.x.x> ESTABLISH SSM CONNECTION TO: i-0AWS_INSTANCEID
<10.x.x.x> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', '{"SessionId": "[email protected]", "TokenValue": "XXXCUTXXX", "StreamUrl": "wss://ssmmessages.us-east-1.amazonaws.com/v1/data-channel/[email protected]?role=publish_subscribe", "ResponseMetadata": {"RequestId": "XXX REQ ID XXX", "HTTPStatusCode": 200, "HTTPHeaders": {"server": "Server", "date": "Fri, 02 Oct 2020 14:49:59 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "668", "connection": "keep-alive", "x-amzn-requestid": "XXX amzn id XXX"}, "RetryAttempts": 0}}', 'us-east-1', 'StartSession', '', '{"Target": "i-0AWS_INSTANCEID"}', 'https://ssm.us-east-1.amazonaws.com']
<10.x.x.x> PRE stdout line: b'\r\nStarting session with SessionId: [email protected]\r\n'
<10.x.x.x> PRE remaining: 60
<10.x.x.x> PRE stdout line: b'\r\nStarting session with SessionId: [email protected]\r\n\x1b[?1034hsh-4.2$ '
<10.x.x.x> PRE remaining: 59
<10.x.x.x> PRE remaining: 58
--- CUT ---
<10.x.x.x> PRE remaining: 3
<10.x.x.x> PRE remaining: 2
<10.x.x.x> PRE remaining: 1
<10.x.x.x> PRE timeout stdout: b'\r\nStarting session with SessionId: [email protected]\r\n\x1b[?1034hsh-4.2$ '
<10.x.x.x> ssm_retry: attempt: 0, cmd (echo ~testuser...), pausing for 0 seconds
<10.x.x.x> CLOSING SSM CONNECTION TO: i-0AWS_INSTANCEID
<10.x.x.x> TERMINATE SSM SESSION: [email protected]
<10.x.x.x> ESTABLISH SSM CONNECTION TO: i-0AWS_INSTANCEID
<10.x.x.x> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', '{"SessionId": "[email protected]", "TokenValue": "XXXCUTXXX", "StreamUrl": "wss://ssmmessages.us-east-1.amazonaws.com/v1/data-channel/[email protected]?role=publish_subscribe", "ResponseMetadata": {"RequestId": "XXX REQ ID XXX", "HTTPStatusCode": 200, "HTTPHeaders": {"server": "Server", "date": "Fri, 02 Oct 2020 14:51:00 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "668", "connection": "keep-alive", "x-amzn-requestid": "XXX amzn id XXX"}, "RetryAttempts": 0}}', 'us-east-1', 'StartSession', '', '{"Target": "i-0AWS_INSTANCEID"}', 'https://ssm.us-east-1.amazonaws.com']
<10.x.x.x> PRE stdout line: b'\r\nStarting session with SessionId: [email protected]\r\n'
<10.x.x.x> PRE remaining: 60
<10.x.x.x> PRE stdout line: b'\r\nStarting session with SessionId: [email protected]\r\n\x1b[?1034hsh-4.2$ '
<10.x.x.x> PRE remaining: 59
<10.x.x.x> PRE remaining: 58
I also tried the bash shell patch (but changed bash -l to bash --noprofile -l), with a similar result.
from community.aws.
I modified some of the regexes in the patch (mainly removing the \r\n's). Now I'm hitting the same issue: AnsiballZ_setup.py gets transferred and executed, and it prints out the JSON for the facts, but I never get the end mark. That is, I never see the output of the second printf:
sudo sudo -H -S -n -u root /bin/sh -c 'echo BECOME-SUCCESS-lzrbkhycxnugzysliwrhdukemyhhtyde ; /usr/bin/python /home/maintuser/.ansible/tmp/ansible-tmp-1601664494.6311371-10330-41749016017816/AnsiballZ_setup.py' ;
printf '\n%s\n%s\n' "$?" 'icjRpqBJJHvCxBxLeqZMYeJKhQ' ;
Then back to the retry loop
from community.aws.
The bash patch is definitely incompatible with mine (because the regex would have to change).
Does python ansible-tmp-1601664494.6311371-10330-41749016017816/AnsiballZ_setup.py open stdin? If yes, then it might be eating the printf command (meaning that the shell would never receive it).
This is one of the things I was worried about in my comment above. I kind of want to look at some other connection plugins and see how they handle this, because I think a lot of pieces of this are pretty fragile and I want to see others' ideas.
Maybe instead of clearing the prompt, we could set it to an end_mark so that the shell will print it.
from community.aws.
I tried both sh and bash (with and without modifying the regexes), and both show similar behavior. I do think that AnsiballZ_setup.py is doing something to stdin.
from community.aws.
This is not an SSM-specific solution, but I'm tempted to try using SendSSHPublicKey from the EC2 Instance Connect API and then wrapping the ssh connection library instead. It feels like it might be generally more stable (and it would support sftp as well).
SSM might still be necessary for the tunnel, but I'm not sure piping commands to stdin is ever really going to be particularly stable.
Is there any reason why ssm StartSession is superior that I should know about?
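For reference, the SendSSHPublicKey step looks roughly like the following with boto3. This is a sketch: client is assumed to be boto3.client("ec2-instance-connect") and is passed in so the logic can be tried with a fake client; push_temporary_key is a hypothetical name. AWS only keeps the pushed key for 60 seconds, so the ssh connection has to be opened right after the call.

```python
def push_temporary_key(client, instance_id, os_user, public_key, availability_zone=None):
    """Push a one-off SSH public key to an instance via EC2 Instance Connect.

    client: an EC2 Instance Connect client, e.g. boto3.client("ec2-instance-connect").
    public_key: the OpenSSH-format public key text (e.g. "ssh-ed25519 AAAA...").
    """
    kwargs = {
        "InstanceId": instance_id,
        "InstanceOSUser": os_user,
        "SSHPublicKey": public_key,
    }
    if availability_zone:
        # Only passed through when the caller supplies it.
        kwargs["AvailabilityZone"] = availability_zone
    response = client.send_ssh_public_key(**kwargs)
    if not response.get("Success"):
        raise RuntimeError("SendSSHPublicKey failed for %s: %r" % (instance_id, response))
    return response
```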
from community.aws.
Looking at this a bit closer, if SSM is what you want to use, the right way to address this seems to be adding a parameter like ssm_exec_document_name. I had not gotten a chance to use SSM much before now and wasn't super familiar with its API. In the current version of the API, the document name can define most of the behavior of how an interactive session starts, although unfortunately it still seems to start with a raw shell (see here: https://github.com/aws/amazon-ssm-agent/blob/b9654b268afcb7e70a9cc6c6d9b7d2a676f5b468/agent/session/plugins/shell/shell_unix.go#L53).
Because customization of the shell is so fragile, it's probably worthwhile to provide a sane default and then allow users to override the --document-name provided to the session to configure things further. To facilitate this, the important part is to establish a protocol for communicating stdout and the exit code back to the client (preferably one that is slightly more robust than what is present now, so that a user can always produce working results for their system).
I'll give it another half an hour today and post results
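A sketch of how such an option could be plumbed into the start_session call, mirroring the bash workaround earlier in the thread. ssm_exec_document_name is only the name floated above, and start_session_kwargs is a hypothetical helper; its result would be passed as client.start_session(**kwargs) on a boto3 SSM client.

```python
def start_session_kwargs(instance_id, document_name=None, command=None):
    """Build keyword arguments for boto3's ssm start_session().

    document_name models the hypothetical ssm_exec_document_name option. When it
    is unset, the session uses the default Session Manager shell, as today. When
    it is set, e.g. to "AWS-StartInteractiveCommand", the optional command is
    passed through that document's "command" parameter.
    """
    kwargs = {"Target": instance_id}
    if document_name:
        kwargs["DocumentName"] = document_name
        if command:
            kwargs["Parameters"] = {"command": [command]}
    return kwargs


if __name__ == "__main__":
    print(start_session_kwargs("i-0123", "AWS-StartInteractiveCommand", "bash --noprofile --norc"))
```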
from community.aws.
@DanielFallon would be interested to see your results. Seems to be a step in the right direction.
from community.aws.
Quoting: "Here is a workaround that avoids sh entirely by starting a bash shell." [...]
I think this solution only partially works, because it does not close the sessions in Session Manager. The system opens one session for every task in Ansible, so if I execute a playbook with 10 tasks, 10 sessions are opened and never closed xd ...
from community.aws.
The above won't work if there is anything custom about the system's (or user's) bash profile that changes what is presented on stdout. I've tried this even with --noprofile, and it still hangs for me when gathering facts.
from community.aws.
@mikeneiderhauser hey btw I did rewrite this using paramiko ssh over the weekend. it's still a bit grody but should work. The biggest problem is that the connections aren't cached. so there are A LOT of aws API calls. I didn't realize that paramiko didn't support persistent connections. I'll have to switch it to borrow from the SSH connection plugin and leverage ssh pipelining instead. Hopefully I can detect whether or not a pipeline is open and not restart the connection from the logic of that plugin. If not, then I'll have to build a disk based cache of some kind for the key timeout.
(Also sorry for the extra cruft trying to cache things, I forgot that ansible forks to a new process for each connection)
Here's what I tried and I think this finally should work just as well as the ssh plugin, but it will exhaust your aws api rate limit rather quickly.
https://gist.github.com/DanielFallon/dffad373c688da32919e709d6738d715
from community.aws.
@DanielFallon thanks for this. ill take a look
from community.aws.
I'll turn this into a pull request later this week, but here's a plugin that uses ec2-instance-connect to push a temporary ssh key to the host and uses this to authenticate.
If paired with a proxycommand for ssm this should be pretty effective:
https://gist.github.com/DanielFallon/67572f4439602774d02fbbe3bce8a5b9
from community.aws.
Hi,
is there an official solution?
I am facing the same problem.
from community.aws.
Same
from community.aws.
I have the same issue with Debian-based systems, e.g. Ubuntu.
from community.aws.
As with @eliskovets, I had to change the default shell to bash, but I used the instructions at https://unix.stackexchange.com/questions/442510/how-to-use-bash-for-sh-in-ubuntu.
from community.aws.
For me, changing the shell using the profile in the Session Manager setting broke logins to Amazon Linux 2 for some reason (which was working OK before). What worked was changing the link that /bin/sh uses via the "sudo dpkg-reconfigure dash" command in Ubuntu. Thanks @eliskovets!
Ran into the same thing: https://stackoverflow.com/questions/68734815/ansible-aws-ssm-connectivity-plugin-ciphertext-refers-to-a-customer-master
Disabling KMS encryption in the SSM config fixed it.
Does this warrant a separate bug?
Or am I missing a config value for enabling "KMS encryption" in the Session Manager preferences?
(While replicating this, I also confirmed that the default shell needed to NOT be dash.)
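If you'd rather keep the "KMS encryption off" setting in code than toggle it in the console, the Session Manager preferences are stored in the `SSM-SessionManagerRunShell` document. Below is a minimal sketch that builds that document's content with an empty `kmsKeyId`, which leaves KMS encryption off. The field list follows the default preferences document as I understand it, so compare it against your account's copy before applying it with `aws ssm update-document --name SSM-SessionManagerRunShell --content file://prefs.json --document-version '$LATEST'`. The helper names are hypothetical.

```python
import json


def session_manager_prefs(kms_key_id=""):
    """Build SSM-SessionManagerRunShell content; kms_key_id='' leaves KMS encryption off."""
    return {
        "schemaVersion": "1.0",
        "description": "Document to hold regional settings for Session Manager",
        "sessionType": "Standard_Stream",
        "inputs": {
            "s3BucketName": "",
            "s3KeyPrefix": "",
            "s3EncryptionEnabled": True,
            "cloudWatchLogGroupName": "",
            "cloudWatchEncryptionEnabled": True,
            "cloudWatchStreamingEnabled": False,
            "kmsKeyId": kms_key_id,  # empty string: session data is not KMS-encrypted
            "runAsEnabled": False,
            "runAsDefaultUser": "",
            "idleSessionTimeout": "20",
            "maxSessionDuration": "",
            "shellProfile": {"windows": "", "linux": ""},
        },
    }


def write_prefs(path, kms_key_id=""):
    """Write the document as JSON so it can be passed to update-document."""
    with open(path, "w") as fh:
        json.dump(session_manager_prefs(kms_key_id), fh, indent=2)
```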
(Update: moving away from dash has no immediate downside.)
Re: the Ubuntu default-shell fix, Ansible version:
# See "/var/cache/debconf/config.dat" for name of config item after changing manually
- name: aws-ssm ansible plugin fails if dash is the default shell
  ansible.builtin.debconf:
    name: dash/sh
    question: dash/sh
    value: false
    vtype: boolean
@bedge your vtype should be boolean, and I don't think you should be quoting 'false'.
I haven't tested with cloud-init yet, but hopefully that's all you need to do.
@jagibson Confirmed, your suggestion is correct for the Ansible (no) dash fix:
# See "/var/cache/debconf/config.dat" for name of config item after changing manually
- name: aws-ssm ansible plugin fails if dash is the default shell
  ansible.builtin.debconf:
    name: dash/sh
    question: dash/sh
    value: false
    vtype: boolean
Thanks.
Now the remaining issue is the S3 permissions. It seems to work only with:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}
Any attempt to rein that in makes it start failing with:
2021-08-13 20:29:58 INFO [ssm-agent-worker] [MessageGatewayService] Sending reply {
  "SchemaVersion": 1,
  "TaskId": "[email protected]",
  "Topic": "agent_task_complete",
  "FinalTaskStatus": "Failed",
  "IsRoutingFailure": false,
  "AwsAccountId": "",
  "InstanceId": "i-01c257e02de698f9b",
  "Output": "Couldn't start the session because we are unable to validate encryption on Amazon S3 bucket. Error: AccessDenied: Access Denied\n\tstatus code: 403, request id: F20PG8SWSH8V4J6Z, host id: 9JY/CcOq6C6Bswaw7AJfbGLcTlzD8scLt/nEBncsI8ac9GPTEeVMDTU7B2yWcgDxn0W+fsZINW4=",
  "S3Bucket": "",
  "S3UrlSuffix": "",
  "CwlGroup": "",
  "CwlStream": ""
}
Is there any way to restrict the S3 access to a specific bucket:///... ?
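One way to attempt this is sketched below: a policy scoped to a single bucket, such as the one set in `ansible_aws_ssm_bucket_name`. It is untested against this exact failure. The action list is my assumption: object-level read/write/delete for the files the plugin stages, plus `s3:GetEncryptionConfiguration` on the bucket itself, because the "unable to validate encryption" message points at that check. The function name and bucket name are placeholders.

```python
import json


def scoped_ssm_s3_policy(bucket):
    """Sketch: S3 permissions for aws_ssm limited to a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access for files staged in the bucket (assumed set)
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Bucket-level access; GetEncryptionConfiguration is the permission
                # the "unable to validate encryption" message appears to need (assumed)
                "Effect": "Allow",
                "Action": ["s3:GetEncryptionConfiguration", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name; substitute the one configured for the plugin
    print(json.dumps(scoped_ssm_s3_policy("my-ansible-ssm-bucket"), indent=2))
```

If it still fails, the agent log should show which request was denied, and you can widen only that action rather than granting `s3:*`.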
After only intermittent success with dash removed as the default /bin/sh on the targets, I switched to setting the shell profile in the AWS Session Manager preferences, and that seems to be more reliable.
That is, it hasn't failed with the profile set, and it consistently fails with it unset.
I'm not sure how this differs from removing the /bin/sh -> dash association, but the end result is definitely better.
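If you manage the preferences document in code, the shell profile can be set on it as well. Below is a small sketch that assumes you already have the document content loaded as a dict, for example from `aws ssm get-document --name SSM-SessionManagerRunShell`. The profile string `exec /bin/bash` and the helper name are my assumptions, not something confirmed in this thread.

```python
import copy


def with_linux_shell_profile(prefs_doc, profile="exec /bin/bash"):
    """Return a copy of a Session Manager prefs document with a Linux shell profile set."""
    doc = copy.deepcopy(prefs_doc)  # leave the caller's document untouched
    shell = doc.setdefault("inputs", {}).setdefault("shellProfile", {})
    shell["linux"] = profile  # commands run at the start of each Linux session
    shell.setdefault("windows", "")
    return doc
```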
Does this plugin work?
1st attempt:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ValueError: invalid literal for int() with base 10: "echo $'\\n'$?"
ubuntu2004 | FAILED! => {
  "msg": "Unexpected failure during module execution.",
  "stdout": ""
}
2nd attempt:
Changed the default shell to /bin/bash in Session Manager's preferences, as suggested above.
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: expected string or bytes-like object
ubuntu2004 | FAILED! => {
  "msg": "Unexpected failure during module execution.",
  "stdout": ""
}
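The first error suggests that the plugin took the line after its end marker and passed it straight to `int()`, but got the echoed command `echo $'\n'$?` instead of an exit status. That fits the dash discussion above. Below is a hypothetical, more defensive parser that reports what it actually received. The function, the exception, and the marker handling are illustrative and are not the plugin's code.

```python
import re

# A bare 1-3 digit number, optionally padded (e.g. a trailing "\r")
_STATUS_RE = re.compile(r"^\s*(\d{1,3})\s*$")


class ExitStatusNotFound(ValueError):
    """Raised when no numeric exit status follows the end mark."""


def parse_exit_status(lines, end_mark):
    """Return the integer status on the first non-empty line after end_mark."""
    seen_mark = False
    for line in lines:
        if not seen_mark:
            seen_mark = line.strip() == end_mark
            continue
        match = _STATUS_RE.match(line)
        if match:
            return int(match.group(1))
        if line.strip():
            # Something other than a status: likely the shell echoed our command
            raise ExitStatusNotFound(f"expected exit status after mark, got {line!r}")
    raise ExitStatusNotFound("end mark or exit status not found in output")
```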
Recent Amazon Linux AMIs add some Unicode characters to inputs/outputs that can cause this same error message. If you recently launched a new instance or updated AMIs, try the following fix: #1756 (comment)
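For illustration only: this sketch assumes the extra characters are terminal escape sequences, such as the bracketed-paste toggle `ESC[?2004h`. The linked comment has the actual details. Output lines can be cleaned like this before they are parsed. The regex matches standard ANSI CSI sequences, and the helper name is hypothetical.

```python
import re

# ANSI CSI sequences: ESC [ parameter bytes, intermediate bytes, final byte
_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def clean_terminal_line(line):
    """Strip ANSI CSI sequences and carriage returns from one line of output."""
    return _ANSI_CSI.sub("", line).replace("\r", "")
```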