Comments (43)

ilyavaiser avatar ilyavaiser commented on June 12, 2024 8

Same issue, unusual plugin

from community.aws.

thomas-anderson-bsl avatar thomas-anderson-bsl commented on June 12, 2024 6

does python ansible-tmp-1601664494.6311371-10330-41749016017816/AnsiballZ_setup.py open stdin? if yes, then it might be eating the printf command (meaning that the shell would never receive it.)

Yes I'm finding that the python command eats the printf. I can get around it by adding an echo | {cmd}; to the wrapped command, and the mark end printf seems to appear.

Here's the full patch, making use of your previous patch @DanielFallon

I made some minor changes so we don't use the regexes, as they were being fickle.

diff --git a/playbooks/connection_plugins/aws_ssm2.py b/playbooks/connection_plugins/aws_ssm2.py
index 1c01dba..baa15f3 100644
--- a/playbooks/connection_plugins/aws_ssm2.py
+++ b/playbooks/connection_plugins/aws_ssm2.py
@@ -444,10 +444,94 @@ class Connection(ConnectionBase):
     def _prepare_terminal(self):
         """ perform any one-time terminal settings """

-        if not self.is_windows:
-            cmd = "stty -echo\n" + "PS1=''\n"
-            cmd = to_bytes(cmd, errors="surrogate_or_strict")
-            self._session.stdin.write(cmd)
+        # No windows setup for now
+        if self.is_windows:
+            return
+
+        # *_complete variables are 3 valued:
+        #   - None: not started
+        #   - False: started
+        #   - True: complete
+
+        startup_complete = False
+        disable_echo_complete = None
+        disable_echo_cmd = to_bytes("stty -echo\n", errors="surrogate_or_strict")
+
+        disable_prompt_complete = None
+        end_mark = "".join(
+            [random.choice(string.ascii_letters) for i in xrange(self.MARK_LENGTH)]
+        )
+        disable_prompt_cmd = to_bytes(
+            "PS1='' ; printf '\\n%s\\n' '" + end_mark + "'\n",
+            errors="surrogate_or_strict",
+        )
+        disable_prompt_reply = re.compile(
+            r"\r\r\n" + re.escape(end_mark) + r"\r\r\n", re.MULTILINE
+        )
+
+        stdout = ""
+        # Custom command execution for when we're waiting for startup
+        stop_time = int(round(time.time())) + self.get_option("ssm_timeout")
+        while (not disable_prompt_complete) and (self._session.poll() is None):
+            remaining = stop_time - int(round(time.time()))
+            if remaining < 1:
+                self._timeout = True
+                display.vvvv(
+                    "PRE timeout stdout: {0}".format(to_bytes(stdout)), host=self.host
+                )
+                raise AnsibleConnectionFailure(
+                    "SSM start_session timeout on host: %s" % self.instance_id
+                )
+            if self._poll_stdout.poll(1000):
+                stdout += to_text(self._stdout.read(1024))
+                display.vvvv(
+                    "PRE stdout line: {0}".format(to_bytes(stdout)), host=self.host
+                )
+            else:
+                display.vvvv("PRE remaining: {0}".format(remaining), host=self.host)
+
+            # wait til prompt is ready
+            if startup_complete is False:
+                match = str(stdout).find("Starting session with SessionId")
+                if match != -1:
+                    display.vvvv("PRE startup output received", host=self.host)
+                    startup_complete = True
+
+            # disable echo
+            if startup_complete and (disable_echo_complete is None):
+                display.vvvv(
+                    "PRE Disabling Echo: {0}".format(disable_echo_cmd), host=self.host
+                )
+                self._session.stdin.write(disable_echo_cmd)
+                disable_echo_complete = False
+
+            if disable_echo_complete is False:
+                match = str(stdout).find("stty -echo")
+                if match != -1:
+                    disable_echo_complete = True
+
+            # disable prompt
+            if disable_echo_complete and disable_prompt_complete is None:
+                display.vvvv(
+                    "PRE Disabling Prompt: {0}".format(disable_prompt_cmd),
+                    host=self.host,
+                )
+                self._session.stdin.write(disable_prompt_cmd)
+                disable_prompt_complete = False
+
+            if disable_prompt_complete is False:
+                match = disable_prompt_reply.search(stdout)
+                if match:
+                    stdout = stdout[match.end() :]
+                    disable_prompt_complete = True
+
+        if not disable_prompt_complete:
+            raise AnsibleConnectionFailure(
+                "SSM process closed during _prepare_terminal on host: %s"
+                % self.instance_id
+            )
+        else:
+            display.vvv("PRE Terminal configured", host=self.host)

     def _wrap_command(self, cmd, sudoable, mark_start, mark_end):
         """ wrap command so stdout and status can be extracted """
@@ -460,14 +544,9 @@ class Connection(ConnectionBase):
             if sudoable:
                 cmd = "sudo " + cmd
             cmd = (
-                "echo "
-                + mark_start
-                + "\n"
-                + cmd
-                + "\necho $'\\n'$?\n"
-                + "echo "
-                + mark_end
-                + "\n"
+                f"printf '%s\\n' '{mark_start}';\n"
+                f"echo | {cmd};\n"
+                f"printf '\\n%s\\n%s\\n' \"$?\" '{mark_end}';\n"
             )

         display.vvvv(u"_wrap_command: '{0}'".format(to_text(cmd)), host=self.host)

I put the patched version of aws_ssm2.py into my playbooks directory in connection_plugins/ dir, set ansible_connection: aws_ssm2, and was able to successfully use ssm to run playbooks against my Ubuntu EC2 instance.
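The wrapping scheme in the patch above can be tried offline. What follows is a minimal sketch rather than the plugin's code: wrap_command repeats the three lines from the diff, and parse_wrapped_output is a hypothetical helper. It assumes the session output has already been collected as text with plain \n line endings and that neither mark occurs in the command's own output.

```python
def wrap_command(cmd, mark_start, mark_end):
    """Build the shell text sent to the session, mirroring the patched _wrap_command."""
    return (
        f"printf '%s\\n' '{mark_start}';\n"
        f"echo | {cmd};\n"  # echo feeds the command's stdin so it cannot eat the next line
        f"printf '\\n%s\\n%s\\n' \"$?\" '{mark_end}';\n"
    )


def parse_wrapped_output(output, mark_start, mark_end):
    """Hypothetical helper: return (stdout, returncode) found between the two marks."""
    lines = output.split("\n")
    start = lines.index(mark_start)          # each mark must be alone on its line
    end = lines.index(mark_end, start + 1)
    body = lines[start + 1:end]
    returncode = int(body[-1])               # the line just before the end mark is "$?"
    stdout = "\n".join(body[:-1])            # everything before the exit-code line
    return stdout, returncode


if __name__ == "__main__":
    print(wrap_command("ls -l", "AAA", "ZZZ"))
    print(parse_wrapped_output("AAA\n/home/ubuntu\n\n0\nZZZ\n", "AAA", "ZZZ"))
```

The leading \n in the second printf is what keeps the exit code on its own line even when the command's output does not end with a newline, which is why the parser can simply take the last line before the end mark.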

lassebenni avatar lassebenni commented on June 12, 2024 3

does python ansible-tmp-1601664494.6311371-10330-41749016017816/AnsiballZ_setup.py open stdin? if yes, then it might be eating the printf command (meaning that the shell would never receive it.)

Yes I'm finding that the python command eats the printf. I can get around it by adding an echo | {cmd}; to the wrapped command, and the mark end printf seems to appear.

Here's the full patch, making use of your previous patch @DanielFallon

I made some minor changes so we don't use the regexes, as they were being fickle.

[...]

I put the patched version of aws_ssm2.py into my playbooks directory in connection_plugins/ dir, set ansible_connection: aws_ssm2, and was able to successfully use ssm to run playbooks against my Ubuntu EC2 instance.

Even with the patched code from https://github.com/Filirom1/community.aws/blob/cb79826540fc58a2d13c1d95dde91c7544578748/plugins/connection/aws_ssm.py#L524 and your instructions, I still experience an issue:

ansible-playbook -i inventory_aws_ssm.yml -c aws_ssm2 playbook.yml

#playbook.yml
---
- name: Test command
  gather_facts: false
  hosts: all
  vars:
    ansible_connection: aws_ssm2
    ansible_aws_ssm_region: eu-central-1
  tasks:
    - name: test
      command:
        cmd: ls -l

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: expected string or bytes-like object
fatal: [instance-id]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

Update fixing the new issue:
TL;DR: Use the patched code with @thomas-anderson-bsl's instructions and ALSO set the variable ansible_aws_ssm_bucket_name. The bucket name is required, and there is no graceful exit if it's not supplied.

Apparently the plugin requires ansible_aws_ssm_bucket_name to be set; otherwise the run fails. I am still not sure why it tries to run _file_transport_command for a cmd: ls -l command. Maybe someone else can clarify that. Anyway, with the patched code and the bucket variable set (make sure your account has access rights to the bucket), I can connect to the instance 👍
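One way to get a clearer failure than the TypeError above would be to validate the bucket option before any file transfer is attempted. This is only a sketch: require_bucket_name is a hypothetical helper, not part of the plugin, and it raises ValueError to stay self-contained where the plugin would more likely raise an Ansible error.

```python
def require_bucket_name(bucket_name):
    """Hypothetical pre-flight check: fail early with a readable message."""
    if not bucket_name or not str(bucket_name).strip():
        raise ValueError(
            "ansible_aws_ssm_bucket_name must be set; "
            "the aws_ssm connection uses an S3 bucket for file transfer"
        )
    return str(bucket_name).strip()


if __name__ == "__main__":
    print(require_bucket_name("my-example-bucket"))  # made-up bucket name
```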

eliskovets avatar eliskovets commented on June 12, 2024 3

As a work-around you can switch default shell to bash instead of sh in Session Manager preferences:
https://aws.amazon.com/premiumsupport/knowledge-center/ssm-session-manager-change-shell/

abeluck avatar abeluck commented on June 12, 2024 2

Here is a workaround that avoids sh entirely by starting a bash shell.

--- a/plugins/connection/aws_ssm.py	2020-09-03 18:43:43.818000000 +0200
+++ b/plugins/connection/aws_ssm.py	2020-09-03 18:43:19.805000000 +0200
@@ -288,11 +288,17 @@
 
         profile_name = ''
         region_name = self.get_option('region')
-        ssm_parameters = dict()
 
         client = boto3.client('ssm', region_name=region_name)
         self._client = client
-        response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
+
+        if self.is_windows:
+            ssm_parameters = dict()
+            response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
+        else:
+            ssm_parameters = {"command": ["bash -l"]}
+            response = client.start_session(Target=self.instance_id, DocumentName="AWS-StartInteractiveCommand", Parameters=ssm_parameters)
+
         self._session_id = response['SessionId']
 
         cmd = [

DanielFallon avatar DanielFallon commented on June 12, 2024 1

As another workaround, here's a proxy command that should just work to allow connections via ec2-instance-connect and then proxy via ssm:

https://gist.github.com/DanielFallon/45310cc76f46c1f1b2f7272d19b76312

^Technically this is significantly more efficient than the plugin in the gist because SSH multiplexing will be used if it is available (and it is by default).

wilinger avatar wilinger commented on June 12, 2024 1

Why isn't anyone using the fix in #583? It seems to work flawlessly for me.

Hokwang avatar Hokwang commented on June 12, 2024 1

I found that this problem also occurs on Windows.

abeluck avatar abeluck commented on June 12, 2024

I looked into this, and there are several things wrong I think.

See this line

            cmd = "echo " + mark_start + "\n" + cmd + "\necho $'\\n'$?\n" + "echo " + mark_end + "\n"

The shell used by the session-manager is sh, which does not support bashisms like $'\n' to print a new line.
See the difference:

sh

$ echo $'\n'$?
$
0

bash

$ echo $'\n'$?

0

However, that might be a red herring. Notice the output:

<i-xxxxx> EXEC stdout line: $ stty -echo
<i-xxxxx> EXEC stdout line: PS1=''
<i-xxxxx> EXEC stdout line: echo YYYYY
<i-xxxxx> EXEC stdout line: echo ~
<i-xxxxx> EXEC stdout line: echo $'\n'$?
<i-xxxxx> EXEC stdout line: echo ZZZZZ

That is the input commands, not the stdout. Where is the output?

DanielFallon avatar DanielFallon commented on June 12, 2024

I looked at this for a bit today and found that you were right to be skeptical @abeluck .

The issue actually appears to be in _prepare_terminal. The current code does not wait for stty -echo to return before sending additional bytes. As a result, every character sent before stty -echo returns is echoed back, and only characters written afterwards are not echoed.

Since the commands are sent so quickly, the terminal echoes all of the lines. We then break out of the loop because <start_mark> is definitely present in the echoed echo <start_mark>. You can quickly fix the behavior by adding a delay (and probably also the fix for the bashism). Here are some logs to demonstrate:

After removing the loop breakout:

Loading callback plugin minimal of type stdout, v2.0 from /usr/lib/python3.8/site-packages/ansible/plugins/callback/minimal.py
META: ran handlers
<ip-___-__-_-___.ec2.internal> ESTABLISH SSM CONNECTION TO: i-0cc44a1e2f7995c53
<ip-___-__-_-___.ec2.internal> SSM COMMAND: ['/usr/local/bin/session-manager-plugin',...]
<ip-___-__-_-___.ec2.internal> SSM CONNECTION ID: DanFallon-095d5d09e63ce624c
<ip-___-__-_-___.ec2.internal> EXEC echo ~ubuntu
<ip-___-__-_-___.ec2.internal> _wrap_command: 'echo PKjewgTIHJcLjaXeCjlasDjfZw
echo ~ubuntu
echo $'\n'$?
echo uZmXXhFRmkCzCEZEPjWMQlNiIF
'
<ip-___-__-_-___.ec2.internal> EXEC stdout line: 
<ip-___-__-_-___.ec2.internal> EXEC stdout line: Starting session with SessionId: DanFallon-095d5d09e63ce624c
<ip-___-__-_-___.ec2.internal> EXEC remaining: 60
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $ stty -echo ; 
<ip-___-__-_-___.ec2.internal> EXEC stdout line: PS1='' ; 
<ip-___-__-_-___.ec2.internal> EXEC stdout line: echo PKjewgTIHJcLjaXeCjlasDjfZw
<ip-___-__-_-___.ec2.internal> EXEC stdout line: echo ~ubuntu
<ip-___-__-_-___.ec2.internal> EXEC stdout line: echo $'\n'$?
<ip-___-__-_-___.ec2.internal> EXEC stdout line: echo uZmXXhFRmkCzCEZEPjWMQlNiIF
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $ PKjewgTIHJcLjaXeCjlasDjfZw
<ip-___-__-_-___.ec2.internal> EXEC stdout line: /home/ubuntu
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $
<ip-___-__-_-___.ec2.internal> EXEC stdout line: 0
<ip-___-__-_-___.ec2.internal> EXEC stdout line: uZmXXhFRmkCzCEZEPjWMQlNiIF
<ip-___-__-_-___.ec2.internal> EXEC remaining: 59
<ip-___-__-_-___.ec2.internal> EXEC remaining: 58
<ip-___-__-_-___.ec2.internal> EXEC remaining: 57
<ip-___-__-_-___.ec2.internal> EXEC remaining: 56
<ip-___-__-_-___.ec2.internal> EXEC remaining: 55
<ip-___-__-_-___.ec2.internal> EXEC remaining: 54
<ip-___-__-_-___.ec2.internal> EXEC remaining: 53
...
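In the log above the start mark shows up twice: first inside the echoed echo <mark> line, then as real output after the prompt. A small offline sketch of why a plain substring test leaves the loop too early is below. The function names are made up for illustration, and the stricter pattern is just one possible check, not what the plugin does.

```python
import re


def naive_seen(stdout, mark):
    """Substring test: also true for the terminal's echo of 'echo <mark>'."""
    return mark in stdout


def strict_seen(stdout, mark):
    """True only if the mark stands alone on a line, optionally after a '$ ' prompt."""
    pattern = r"^(?:\$ )?" + re.escape(mark) + r"\r?$"
    return re.search(pattern, stdout, re.MULTILINE) is not None


mark = "PKjewgTIHJcLjaXeCjlasDjfZw"
# Only the echoed input has arrived so far
echoed_only = "$ stty -echo ; \nPS1='' ; \necho " + mark + "\necho ~ubuntu\n"
# The command's real output has arrived as well
with_output = echoed_only + "$ " + mark + "\n/home/ubuntu\n"

print(naive_seen(echoed_only, mark))    # True: the loop would exit before any output
print(strict_seen(echoed_only, mark))   # False
print(strict_seen(with_output, mark))   # True
```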

After adding a 5-second delay:

Loading callback plugin minimal of type stdout, v2.0 from /usr/lib/python3.8/site-packages/ansible/plugins/callback/minimal.py
META: ran handlers
<ip-___-__-_-___.ec2.internal> ESTABLISH SSM CONNECTION TO: i-0cc44a1e2f7995c53
<ip-___-__-_-___.ec2.internal> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', ...]
Sleeping for 5 seconds.
<ip-___-__-_-___.ec2.internal> SSM CONNECTION ID: DanFallon-05193c016e933c229
<ip-___-__-_-___.ec2.internal> EXEC echo ~ubuntu
<ip-___-__-_-___.ec2.internal> _wrap_command: 'echo KylNOeAyCbfmTzlCZGrbHiYnSm
echo ~ubuntu
echo $'\n'$?
echo PujmuNjsUIsmGZzoUwNAUvowBk
'
<ip-___-__-_-___.ec2.internal> EXEC stdout line: 
<ip-___-__-_-___.ec2.internal> EXEC stdout line: Starting session with SessionId: DanFallon-05193c016e933c229
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $ stty -echo ; 
<ip-___-__-_-___.ec2.internal> EXEC stdout line: PS1='' ; 
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $ KylNOeAyCbfmTzlCZGrbHiYnSm
<ip-___-__-_-___.ec2.internal> EXEC stdout line: /home/ubuntu
<ip-___-__-_-___.ec2.internal> EXEC stdout line: $
<ip-___-__-_-___.ec2.internal> EXEC stdout line: 0
<ip-___-__-_-___.ec2.internal> EXEC stdout line: PujmuNjsUIsmGZzoUwNAUvowBk
<ip-___-__-_-___.ec2.internal> EXEC remaining: 60
<ip-___-__-_-___.ec2.internal> EXEC remaining: 59
<ip-___-__-_-___.ec2.internal> EXEC remaining: 58
<ip-___-__-_-___.ec2.internal> EXEC remaining: 57
<ip-___-__-_-___.ec2.internal> EXEC remaining: 56
<ip-___-__-_-___.ec2.internal> EXEC remaining: 55

abeluck avatar abeluck commented on June 12, 2024

Would love to see these fixes land, and maybe get some tests around this plugin (not sure how that would work).

We bailed on the ssm plugin for now and are running ansible over ssh over ssm, which means we still have to manage ssh keys on instances (though we don't need a bastion any longer).

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

Another strange issue I see, related to this one, occurs while gathering facts: the AnsiballZ_setup.py that is run does something that interferes with the shell, so the connection always ends up in the EXEC remaining loop and times out without the new line and the end mark ever being echoed to stdout. My remote host is running an older version of Python, so my initial guess is that this is related. I'm trying to stand up a venv on the remote host to use as the environment.

DanielFallon avatar DanielFallon commented on June 12, 2024

Here's a patch that I'm willing to contribute (haven't gotten a chance to set up a fork yet)

  • Replaces echo with printf because it's more portable and robust
  • Waits to receive each command's reply before starting additional executions

Problems I still see:

  • Does aws_ssm.py need to support python2? I don't think my code is compatible. (and I thought boto3 was python3 only, maybe I'm wrong)
  • cmd (line 517) when executed could eat the end_mark command. maybe we could disconnect cmd's stdin? Any solution I thought of for this makes some assumptions about the command or the terminal that I didn't like.

Anywho, I freely release the patch below under a GPL V3.0+ license, like the files from which it was derived. I can look at figuring out the contributor agreement etc. later this week.

diff --git a/plugins/connection/aws_ssm.py b/plugins/connection/aws_ssm.py
index 7f7d692..4dc3965 100644
--- a/plugins/connection/aws_ssm.py
+++ b/plugins/connection/aws_ssm.py
@@ -412,11 +412,97 @@ class Connection(ConnectionBase):
     def _prepare_terminal(self):
         ''' perform any one-time terminal settings '''
 
-        if not self.is_windows:
-            cmd = "stty -echo\n" + "PS1=''\n"
-            cmd = to_bytes(cmd, errors='surrogate_or_strict')
-            self._session.stdin.write(cmd)
+        # No windows setup for now
+        if self.is_windows:
+            return
+
+        # *_complete variables are 3 valued:
+        #   - None: not started
+        #   - False: started
+        #   - True: complete
+
 
+        startup_complete = False
+        startup_reply = re.compile(
+            r"Starting session with SessionId:\W+" +
+            re.escape(self._session_id) +
+            r"\r\n\$ ", re.MULTILINE)
+
+        disable_echo_complete = None
+        disable_echo_cmd = to_bytes("stty -echo\n", errors='surrogate_or_strict')
+        disable_echo_reply = re.compile(
+            r"stty \-echo" +
+            r"\r\r\n\$", re.MULTILINE
+        )
+
+        disable_prompt_complete = None
+        end_mark = "".join([random.choice(string.ascii_letters) for i in xrange(self.MARK_LENGTH)])
+        disable_prompt_cmd = to_bytes(
+            "PS1='' ; printf '\\n%s\\n' '" + end_mark + "'\n",
+            errors='surrogate_or_strict')
+        disable_prompt_reply = re.compile(
+            r"\r\r\n" +
+            re.escape(end_mark) +
+            r"\r\r\n", re.MULTILINE
+        )
+
+        stdout = ""
+        cursor = 0
+        # Custom command execution for when we're waiting for startup
+        stop_time = int(round(time.time())) + self.get_option('ssm_timeout')
+        while (not disable_prompt_complete) and (self._session.poll() is None):
+            remaining = stop_time - int(round(time.time()))
+            if remaining < 1:
+                self._timeout = True
+                display.vvvv(u"PRE timeout stdout: {0}".format(to_bytes(stdout)), host=self.host)
+                raise AnsibleConnectionFailure("SSM start_session timeout on host: %s"
+                                               % self.instance_id)
+            if self._poll_stdout.poll(1000):
+                stdout += to_text(self._stdout.read(1024))
+                display.vvvv(u"PRE stdout line: {0}".format(to_bytes(stdout)), host=self.host)
+            else:
+                display.vvvv(u"PRE remaining: {0}".format(remaining), host=self.host)
+
+            # wait til prompt is ready
+            if startup_complete is False:
+                match = startup_reply.search(stdout,cursor)
+                if match:
+                    display.vvvv(u"PRE startup output received", host=self.host)
+                    cursor = match.end()
+                    startup_complete = True
+                    
+
+            # disable echo
+            if startup_complete and (disable_echo_complete is None):
+                display.vvvv(u"PRE Disabling Echo: {0}".format(disable_echo_cmd), host=self.host)
+                self._session.stdin.write(disable_echo_cmd)
+                disable_echo_complete = False
+
+            if disable_echo_complete is False:
+                match = disable_echo_reply.search(stdout)
+                if match:
+                    stdout = stdout[match.end():]
+                    disable_echo_complete = True
+            
+
+            # disable prompt
+            if disable_echo_complete and disable_prompt_complete is None:
+                display.vvvv(u"PRE Disabling Prompt: {0}".format(disable_prompt_cmd), host=self.host)
+                self._session.stdin.write(disable_prompt_cmd)
+                disable_prompt_complete = False
+
+            if disable_prompt_complete is False:
+                match = disable_prompt_reply.search(stdout)
+                if match:
+                    stdout = stdout[match.end():]
+                    disable_prompt_complete = True
+        
+        if not disable_prompt_complete:
+            raise AnsibleConnectionFailure("SSM process closed during _prepare_terminal on host: %s"
+                                               % self.instance_id)
+        else:
+            display.vvv(u"PRE Terminal configured", host=self.host)
+            
     def _wrap_command(self, cmd, sudoable, mark_start, mark_end):
         ''' wrap command so stdout and status can be extracted '''
 
@@ -427,7 +513,9 @@ class Connection(ConnectionBase):
         else:
             if sudoable:
                 cmd = "sudo " + cmd
-            cmd = "echo " + mark_start + "\n" + cmd + "\necho $'\\n'$?\n" + "echo " + mark_end + "\n"
+            cmd = ("printf '%s\\n' '{0}' ;\n".format(mark_start) +
+                cmd + 
+                " ;\nprintf '\\n%s\\n%s\\n' \"$?\" '{0}' ;\n".format(mark_end))
 
         display.vvvv(u"_wrap_command: '{0}'".format(to_text(cmd)), host=self.host)
         return cmd

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

I'm willing to try it because I'm stuck. My executing node is running py3. My endpoint is only running 2.7 (legacy OS). So the plugin being py3-only should be fine. The py2 issues may just be with what Ansible packages up and sends over to the endpoint.

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

@DanielFallon thanks for the patch. I had to change your ssm_timeout to timeout in the self.get_option call.

Upon executing, I'm getting stuck in a PRE remaining loop (i.e. the prompt is never detected as ready) while gathering facts.

DanielFallon avatar DanielFallon commented on June 12, 2024

The change in the get_option call is needed because my patch was for the development version rather than the currently released version (they just changed the option from timeout to ssm_timeout).

Can you post a redacted log with debug level at least 4 (-vvvv)?
I'd like to see which part of the setup loop it's getting stuck in.

There are sort of 3 phases:
first, waits for it to receive the string:

Starting session with SessionId:

second, waits to receive the string:

stty -echo
$

and the third waits to receive a custom-generated end string:

<end_mark>

I expected this code to be a bit fragile but fail safe. Maybe I should add some more detail to the timeout exception
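The three phases above can be modelled without any I/O, which might help when working out which phase a log is stuck in. This is a simplified sketch under my own assumptions, not the patch itself: TerminalSetup and feed are hypothetical names, each call advances at most one phase, and the first two phases use plain substring checks as in the later regex-free variant of the patch.

```python
import re


class TerminalSetup:
    """Hypothetical, I/O-free model of the three waiting phases."""

    def __init__(self, end_mark):
        self.end_mark = end_mark
        self.buffer = ""
        self.phase = "startup"  # startup -> echo -> prompt -> done

    def feed(self, chunk):
        """Add output read from the session; return the commands to write next."""
        self.buffer += chunk
        to_send = []
        if self.phase == "startup":
            if "Starting session with SessionId" in self.buffer:
                to_send.append("stty -echo\n")
                self.phase = "echo"
        elif self.phase == "echo":
            if "stty -echo" in self.buffer:
                to_send.append("PS1='' ; printf '\\n%s\\n' '" + self.end_mark + "'\n")
                self.phase = "prompt"
        elif self.phase == "prompt":
            # The mark alone on a line is printf's output, not an echoed command
            pattern = r"^" + re.escape(self.end_mark) + r"\r*$"
            if re.search(pattern, self.buffer, re.MULTILINE):
                self.phase = "done"
        return to_send


setup = TerminalSetup("ENDMARK")
print(setup.feed("\r\nStarting session with SessionId: abc\r\n$ "))
print(setup.feed("stty -echo\r\r\n$ "))
print(setup.feed("\r\r\nENDMARK\r\r\n"), setup.phase)
```

A timeout while phase is still "startup" would mean the session banner was never recognised, "echo" that the echo of stty -echo never came back, and "prompt" that the end mark never arrived.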

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

I can grab you a log, but it may take some time: my inventory script, which sets up the S3 bucket name and instance ID, is no longer populating as expected.

DanielFallon avatar DanielFallon commented on June 12, 2024

It's also worth noting that I only tested this with a sort of hello-world example. I wouldn't be surprised if there are still things that don't work. I do think that some sort of scripted regression testing would be very valuable for this, as it is fragile and could be broken by:

  • changes to ansible
  • changes to the ssm session plugin
  • changes to the aws api
  • probably differences in a host's /etc/profile

These are all things to enumerate and test.

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

Does your patch apply to the 1.2 tagged release or on the main branch?

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

Please see sanitized output. I took stock 1.2 and added just your patch and got the following

$ ansible-playbook $ANS_VAULT_FLAGS --limit TEST-GRP --user testuser --tags debug-vars common_aws_node.yml -vvvvvv
ansible-playbook 2.9.9
  config file = /home/user/ansibletest/ansible.cfg
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/user/venvs/ansible/lib64/python3.6/site-packages/ansible
  executable location = /home/user/venvs/ansible/bin/ansible-playbook
  python version = 3.6.8 (default, Apr  2 2020, 13:34:55) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
Using /home/user/ansibletest/ansible.cfg as config file
Reading vault password file: XXXX
Reading vault password file: XXXX
Reading vault password file: XXXX
setting up inventory plugins
host_list declined parsing /home/user/ansibletest/inventories/dynamic_inv.yml as it did not pass its verify_file() method
Start parsing DYN inventory
Blacklisted hosts
Done parsing DYN inventory
Parsed /home/user/ansibletest/inventories/dynamic_inv.yml inventory source with auto plugin
statically imported: /home/user/ansibletest/roles/basic-provisioning/tasks/10_install_start_aws_ssm.yml
Loading callback plugin default of type stdout, v2.0 from /home/user/venvs/ansible/lib64/python3.6/site-packages/ansible/plugins/callback/default.py

PLAYBOOK: common_aws_node.yml ************************************************************************************************************************************************************************************************************************
Positional arguments: common_aws_node.yml
verbosity: 6
remote_user: testuser
connection: smart
timeout: 10
become_method: sudo
tags: ('debug-vars',)
inventory: ('/home/user/ansibletest/inventories/dynamic_inv.yml',)
subset: TEST-GRP
vault_ids: XXXX
forks: 5
7 plays in common_aws_node.yml

PLAY [provision] *************************************************************************************************************************************************************************************************************************************
META: ran handlers
META: ran handlers
META: ran handlers

--- CUT --- (Skipped roles)

PLAY [debug-vars] ************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************
task path: /home/user/ansibletest/common_aws_node.yml:59
<10.x.x.x> ESTABLISH SSM CONNECTION TO: i-0AWS_INSTANCEID
<10.x.x.x> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', '{"SessionId": "[email protected]", "TokenValue": "XXXCUTXXX", "StreamUrl": "wss://ssmmessages.us-east-1.amazonaws.com/v1/data-channel/[email protected]?role=publish_subscribe", "ResponseMetadata": {"RequestId": "XXX REQ ID XXX", "HTTPStatusCode": 200, "HTTPHeaders": {"server": "Server", "date": "Fri, 02 Oct 2020 14:49:59 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "668", "connection": "keep-alive", "x-amzn-requestid": "XXX amzn id XXX"}, "RetryAttempts": 0}}', 'us-east-1', 'StartSession', '', '{"Target": "i-0AWS_INSTANCEID"}', 'https://ssm.us-east-1.amazonaws.com']
<10.x.x.x> PRE stdout line: b'\r\nStarting session with SessionId: [email protected]\r\n'
<10.x.x.x> PRE remaining: 60
<10.x.x.x> PRE stdout line: b'\r\nStarting session with SessionId: [email protected]\r\n\x1b[?1034hsh-4.2$ '
<10.x.x.x> PRE remaining: 59
<10.x.x.x> PRE remaining: 58
--- CUT ---
<10.x.x.x> PRE remaining: 3
<10.x.x.x> PRE remaining: 2
<10.x.x.x> PRE remaining: 1
<10.x.x.x> PRE timeout stdout: b'\r\nStarting session with SessionId: [email protected]\r\n\x1b[?1034hsh-4.2$ '
<10.x.x.x> ssm_retry: attempt: 0, cmd (echo ~testuser...), pausing for 0 seconds
<10.x.x.x> CLOSING SSM CONNECTION TO: i-0AWS_INSTANCEID
<10.x.x.x> TERMINATE SSM SESSION: [email protected]
<10.x.x.x> ESTABLISH SSM CONNECTION TO: i-0AWS_INSTANCEID
<10.x.x.x> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', '{"SessionId": "[email protected]", "TokenValue": "XXXCUTXXX", "StreamUrl": "wss://ssmmessages.us-east-1.amazonaws.com/v1/data-channel/[email protected]?role=publish_subscribe", "ResponseMetadata": {"RequestId": "XXX REQ ID XXX", "HTTPStatusCode": 200, "HTTPHeaders": {"server": "Server", "date": "Fri, 02 Oct 2020 14:51:00 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "668", "connection": "keep-alive", "x-amzn-requestid": "XXX amzn id XXX"}, "RetryAttempts": 0}}', 'us-east-1', 'StartSession', '', '{"Target": "i-0AWS_INSTANCEID"}', 'https://ssm.us-east-1.amazonaws.com']
<10.x.x.x> PRE stdout line: b'\r\nStarting session with SessionId: [email protected]\r\n'
<10.x.x.x> PRE remaining: 60
<10.x.x.x> PRE stdout line: b'\r\nStarting session with SessionId: [email protected]\r\n\x1b[?1034hsh-4.2$ '
<10.x.x.x> PRE remaining: 59
<10.x.x.x> PRE remaining: 58

I also added the bash shell patch (but changed bash -l to bash --noprofile -l), with a similar result.
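The PRE stdout lines in this log suggest why the startup phase never completes: the patch's startup regex expects a bare "$ " prompt right after the banner, while this host prints an escape sequence and "sh-4.2$ ". The sketch below reproduces that offline. The session id is made up (the real one is redacted in the log), the bare-prompt sample is an assumption based on the earlier Ubuntu logs, and loose_pattern is only one possible relaxation, which would still miss prompts that do not end in "$ ".

```python
import re

session_id = "user-0123456789abcdef0"  # made-up value; the real id is redacted in the log
banner = "\r\nStarting session with SessionId: " + session_id + "\r\n"

plain_prompt_output = banner + "$ "               # assumed shape on the Ubuntu host
legacy_output = banner + "\x1b[?1034hsh-4.2$ "    # shape shown in the log above

# Startup pattern as written in the patch
patch_pattern = re.compile(
    r"Starting session with SessionId:\W+" + re.escape(session_id) + r"\r\n\$ ",
    re.MULTILINE,
)
# Hypothetical relaxation: allow anything on the prompt line before "$ "
loose_pattern = re.compile(
    r"Starting session with SessionId:\W+" + re.escape(session_id) + r"\r\n[^\r\n]*\$ "
)

print(bool(patch_pattern.search(plain_prompt_output)))  # True
print(bool(patch_pattern.search(legacy_output)))        # False: setup waits until timeout
print(bool(loose_pattern.search(legacy_output)))        # True
```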

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

I modified some of the regexes in the patch (mainly removing the \r\n's). Now I'm hitting the same issue as before: AnsiballZ_setup.py gets transferred and executed and prints out the JSON for the facts, but I never get the end mark. That is, I never see the output of the second printf.

sudo sudo -H -S -n  -u root /bin/sh -c 'echo BECOME-SUCCESS-lzrbkhycxnugzysliwrhdukemyhhtyde ; /usr/bin/python /home/maintuser/.ansible/tmp/ansible-tmp-1601664494.6311371-10330-41749016017816/AnsiballZ_setup.py' ;
printf '\n%s\n%s\n' "$?" 'icjRpqBJJHvCxBxLeqZMYeJKhQ' ;

Then it goes back to the retry loop.

DanielFallon avatar DanielFallon commented on June 12, 2024

The bash patch is definitely incompatible with mine (because the regex would definitely have to change).

does python ansible-tmp-1601664494.6311371-10330-41749016017816/AnsiballZ_setup.py open stdin? if yes, then it might be eating the printf command (meaning that the shell would never receive it.)

This is one of the things I was worried about in my comment above. I kind of want to look at some other connection plugins and see how they handle this, because I think a lot of pieces of this are pretty fragile and I want to see others' ideas.

Maybe instead of clearing the prompt, we could set it to an end_mark so that the shell will print it.

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

I tried both sh and bash (with and w/o modding the re's) and both have similar behavior. I do think that AnsiballZ_setup.py is doing something to stdin.

DanielFallon avatar DanielFallon commented on June 12, 2024

This is not an SSM-specific solution, but I'm tempted to try using SendSSHPublicKey from the EC2 Instance Connect API and then wrapping the ssh connection library instead. It feels like it might be generally more stable (and would support sftp as well).

SSM might still be necessary for the tunnel, but I'm not sure piping commands to stdin is ever really going to be particularly stable.

Is there any reason why ssmStartSession is superior that I should know about?

DanielFallon avatar DanielFallon commented on June 12, 2024

Looking at this a bit closer, it looks like the right way to address this if use of SSM is wanted is to add a parameter like ssm_exec_document_name. I had not gotten a chance to use SSM much prior to now and wasn't super familiar with its api. The document name in the current version of the api can define most of the behavior of how an interactive session starts, although it unfortunately seems to still start with a raw shell (see here: https://github.com/aws/amazon-ssm-agent/blob/b9654b268afcb7e70a9cc6c6d9b7d2a676f5b468/agent/session/plugins/shell/shell_unix.go#L53 )

Because customization of the shell is so fragile, it's probably worthwhile to provide a sane default and then allow users to override the --document-name provided to the session to configure things further. To facilitate this, the important part is to establish a protocol for communicating stdout and the exit code back to the client (preferably one that is slightly more robust than what is present now so that a user can always produce working results for their system)
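
As a rough sketch of what such a protocol could look like (not what the plugin currently does): frame each command between a begin line and an end line that carries the exit code, and only accept frames that match exactly. The names wrap_command and parse_framed_output are hypothetical. Redirecting stdin from /dev/null is an assumption that would not suit commands that need input on stdin.

```python
import re
import shlex


def wrap_command(cmd: str, mark: str) -> str:
    """Wrap `cmd` so its output and exit status are framed by marker lines."""
    begin = shlex.quote("BEGIN-" + mark)
    end = shlex.quote("END-" + mark)
    # stdin is redirected so the wrapped command cannot swallow the final printf.
    return (
        f"printf '%s\\n' {begin}; "
        f"( {cmd} ) < /dev/null; "
        f"printf '\\n%s %s\\n' {end} \"$?\"\n"
    )


def parse_framed_output(raw: str, mark: str) -> tuple:
    """Return (exit_code, stdout) from the first complete frame in `raw`."""
    pattern = re.compile(
        r"^BEGIN-" + re.escape(mark) + r"\r?\n"
        r"(.*?)"
        r"\r?\nEND-" + re.escape(mark) + r" (\d+)\r?$",
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(raw)
    if match is None:
        raise ValueError("no complete frame found in session output")
    return int(match.group(2)), match.group(1)
```

Because the begin mark must start a line and the end mark must be followed by digits, an echoed copy of the wrapped command line does not parse as a frame, so the exit code is never read from echoed text.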

I'll give it another half an hour today and post results

from community.aws.

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

@DanielFallon would be interested to see your results. Seems to be a step in the right direction.

from community.aws.

jiba21 avatar jiba21 commented on June 12, 2024

Here is a workaround that avoids sh entirely by starting a bash shell.

--- a/plugins/connection/aws_ssm.py	2020-09-03 18:43:43.818000000 +0200
+++ b/plugins/connection/aws_ssm.py	2020-09-03 18:43:19.805000000 +0200
@@ -288,11 +288,17 @@
 
         profile_name = ''
         region_name = self.get_option('region')
-        ssm_parameters = dict()
 
         client = boto3.client('ssm', region_name=region_name)
         self._client = client
-        response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
+
+        if self.is_windows:
+            ssm_parameters = dict()
+            response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
+        else:
+            ssm_parameters = {"command": ["bash -l"]}
+            response = client.start_session(Target=self.instance_id, DocumentName="AWS-StartInteractiveCommand", Parameters=ssm_parameters)
+
         self._session_id = response['SessionId']
 
         cmd = [

I think this solution only partially works, because it does not close the sessions in Session Manager. The plugin opens one session for every Ansible task, so if I run a playbook with 10 tasks, 10 sessions are opened and none of them are closed.
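
One way the cleanup could be handled, as a sketch only: wrap start_session in a context manager that always calls terminate_session. The function name interactive_session is hypothetical; client is assumed to be a boto3 SSM client such as boto3.client('ssm'), passed in by the caller.

```python
from contextlib import contextmanager


@contextmanager
def interactive_session(client, instance_id, command="bash -l"):
    """Start an SSM session and always terminate it when the block exits.

    `client` is assumed to be a boto3 SSM client, e.g. boto3.client("ssm").
    """
    response = client.start_session(
        Target=instance_id,
        DocumentName="AWS-StartInteractiveCommand",
        Parameters={"command": [command]},
    )
    session_id = response["SessionId"]
    try:
        yield response
    finally:
        # Runs on success and on error, so the session is not left open.
        client.terminate_session(SessionId=session_id)
```

Usage would be along the lines of: with interactive_session(client, instance_id) as response: ..., with the session-manager-plugin process started inside the block.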

from community.aws.

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

The above won't work if there is anything custom about the system's (or user's) bash profile that changes what is presented on stdout. I've tried this even with --noprofile and it still hangs for me when gathering facts.

from community.aws.

DanielFallon avatar DanielFallon commented on June 12, 2024

@mikeneiderhauser hey, by the way, I did rewrite this using paramiko ssh over the weekend. It's still a bit grody but should work. The biggest problem is that the connections aren't cached, so there are A LOT of AWS API calls. I didn't realize that paramiko didn't support persistent connections. I'll have to switch it to borrow from the SSH connection plugin and leverage ssh pipelining instead. Hopefully I can detect whether a pipeline is already open and avoid restarting the connection from within that plugin's logic. If not, I'll have to build a disk-based cache of some kind for the key timeout.

(Also sorry for the extra cruft trying to cache things, I forgot that ansible forks to a new process for each connection)
Here's what I tried and I think this finally should work just as well as the ssh plugin, but it will exhaust your aws api rate limit rather quickly.
https://gist.github.com/DanielFallon/dffad373c688da32919e709d6738d715
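
A rough sketch of what a disk-based cache for the key timeout could look like, so that forked worker processes can skip a key push if one happened recently. This is not what the gist does. All names here (key_is_fresh, record_push, the file naming) are hypothetical, and the 60-second window is an assumption based on the EC2 Instance Connect key lifetime.

```python
import json
import os
import tempfile
import time

KEY_TTL_SECONDS = 60  # assumption: how long a pushed key is treated as usable


def _cache_path(instance_id, cache_dir=None):
    """Return the cache file path for one instance."""
    cache_dir = cache_dir or tempfile.gettempdir()
    return os.path.join(cache_dir, "eic-key-" + instance_id + ".json")


def key_is_fresh(instance_id, cache_dir=None, now=None):
    """Return True if a key was pushed recently enough to skip another API call."""
    now = time.time() if now is None else now
    try:
        with open(_cache_path(instance_id, cache_dir)) as handle:
            pushed_at = json.load(handle)["pushed_at"]
    except (OSError, ValueError, KeyError, TypeError):
        return False  # missing or corrupt cache file: push again
    return now - pushed_at < KEY_TTL_SECONDS


def record_push(instance_id, cache_dir=None, now=None):
    """Record the time of a successful key push, replacing the file atomically."""
    now = time.time() if now is None else now
    path = _cache_path(instance_id, cache_dir)
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as handle:
        json.dump({"pushed_at": now}, handle)
    # os.replace is atomic, so another process never reads a half-written file.
    os.replace(tmp_path, path)
```

A file on disk is used because Ansible forks a new process per connection, so an in-memory cache would not be shared.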

from community.aws.

mikeneiderhauser avatar mikeneiderhauser commented on June 12, 2024

@DanielFallon thanks for this. I'll take a look.

from community.aws.

DanielFallon avatar DanielFallon commented on June 12, 2024

I'll turn this into a pull request later this week, but here's a plugin that uses ec2-instance-connect to push a temporary ssh key to the host and uses this to authenticate.

If paired with a proxycommand for ssm this should be pretty effective:
https://gist.github.com/DanielFallon/67572f4439602774d02fbbe3bce8a5b9
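
For readers who do not want to open the gist, the two pieces can be sketched roughly as follows. This is an illustration under stated assumptions, not the gist's code: push_temporary_key and ssm_proxy_command are hypothetical names, and client is assumed to be boto3.client('ec2-instance-connect') supplied by the caller.

```python
def push_temporary_key(client, instance_id, os_user, public_key, availability_zone=None):
    """Push a short-lived public key to the instance via EC2 Instance Connect.

    `client` is assumed to be boto3.client("ec2-instance-connect").
    """
    params = {
        "InstanceId": instance_id,
        "InstanceOSUser": os_user,
        "SSHPublicKey": public_key,
    }
    if availability_zone:
        params["AvailabilityZone"] = availability_zone
    response = client.send_ssh_public_key(**params)
    return bool(response.get("Success"))


def ssm_proxy_command(region=None):
    """Build an ssh ProxyCommand string that tunnels the connection through SSM."""
    command = (
        "aws ssm start-session --target %h "
        "--document-name AWS-StartSSHSession --parameters portNumber=%p"
    )
    if region:
        command += " --region " + region
    return command
```

The proxy command relies on ssh substituting %h and %p, so the ssh host would need to be the instance ID. The pushed key is only valid for a short time, so the ssh connection has to be opened right after the push.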

from community.aws.

jonormann avatar jonormann commented on June 12, 2024

Hi,
is there an official solution?
I am facing the same problem.

from community.aws.

wilinger avatar wilinger commented on June 12, 2024

Hi,

is there an official solution?

I am facing the same problem.

Same

from community.aws.

ChenTsungYu avatar ChenTsungYu commented on June 12, 2024

I have the same issue on Debian-based systems, e.g. Ubuntu.

from community.aws.

hexsel avatar hexsel commented on June 12, 2024

As with @eliskovets , I had to change the default shell to bash, but I used the instructions at https://unix.stackexchange.com/questions/442510/how-to-use-bash-for-sh-in-ubuntu.

from community.aws.

jagibson avatar jagibson commented on June 12, 2024

For me, changing the shell using the profile in the Session Manager setting broke logins to Amazon Linux 2 for some reason (which was working OK before). What worked was changing the link that /bin/sh uses via the "sudo dpkg-reconfigure dash" command in Ubuntu. Thanks @eliskovets!

from community.aws.

bedge avatar bedge commented on June 12, 2024

Ran into the identical thing: https://stackoverflow.com/questions/68734815/ansible-aws-ssm-connectivity-plugin-ciphertext-refers-to-a-customer-master
Disabling KMS encryption in the SSM config fixed it:
[Screenshot, 2021-08-10: image not preserved]

Does this warrant a separate bug?
Or, am I missing a config value to enable the "KMS encryption" in the session manager preferences?

(Just confirmed while replicating that the default shell needed to NOT be dash as well)

from community.aws.

bedge avatar bedge commented on June 12, 2024

(Update, moving away from dash has no immediate downside)

Re: Changing the ubuntu default shell fix - ansible version

# See "/var/cache/debconf/config.dat" for name of config item after changing manually
- name: aws-ssm ansible plugin fails if dash is the default shell
  ansible.builtin.debconf:
    name: dash/sh
    question: dash/sh
    value: false
    vtype: boolean

from community.aws.

jagibson avatar jagibson commented on June 12, 2024

@bedge your vtype should be boolean and I don't think you should be quoting 'false'.

I haven't tested with cloudinit yet but hopefully that's all you need to do.

from community.aws.

bedge avatar bedge commented on June 12, 2024

@jagibson Confirmed your suggestion is correct for the ansible (no) dash fix:

# See "/var/cache/debconf/config.dat" for name of config item after changing manually
- name: aws-ssm ansible plugin fails if dash is the default shell
  ansible.builtin.debconf:
    name: dash/sh
    question: dash/sh
    value: false
    vtype: boolean

thanks

Now the remaining issue is the s3 permissions. Seems to work only with

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}

Any attempt to rein that in and it starts failing with:

2021-08-13 20:29:58 INFO [ssm-agent-worker] [MessageGatewayService] Sending reply {
  "SchemaVersion": 1,
  "TaskId": "[email protected]",
  "Topic": "agent_task_complete",
  "FinalTaskStatus": "Failed",
  "IsRoutingFailure": false,
  "AwsAccountId": "",
  "InstanceId": "i-01c257e02de698f9b",
  "Output": "Couldn't start the session because we are unable to validate encryption on Amazon S3 bucket. Error: AccessDenied: Access Denied\n\tstatus code: 403, request id: F20PG8SWSH8V4J6Z, host id: 9JY/CcOq6C6Bswaw7AJfbGLcTlzD8scLt/nEBncsI8ac9GPTEeVMDTU7B2yWcgDxn0W+fsZINW4=",
  "S3Bucket": "",
  "S3UrlSuffix": "",
  "CwlGroup": "",
  "CwlStream": ""
}

Is there any way to restrict the s3 access to a specific bucket:///... ?
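
An untested sketch of what a bucket-scoped policy might look like, generated in Python so the bucket name is a parameter. The error above mentions validating bucket encryption, which suggests s3:GetEncryptionConfiguration is involved. The exact set of actions the agent and the plugin need is an assumption here, and scoped_s3_policy and the bucket name are hypothetical.

```python
import json


def scoped_s3_policy(bucket, prefix=""):
    """Build an IAM policy limited to one bucket (and optionally one prefix)."""
    prefix = prefix.strip("/")
    bucket_arn = "arn:aws:s3:::" + bucket
    object_arn = bucket_arn + "/" + (prefix + "/" if prefix else "") + "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumption: lets the agent check the bucket's encryption settings.
                "Sid": "ValidateBucketEncryption",
                "Effect": "Allow",
                "Action": "s3:GetEncryptionConfiguration",
                "Resource": bucket_arn,
            },
            {
                # Assumption: object access needed for file transfer and logs.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": object_arn,
            },
        ],
    }


if __name__ == "__main__":
    # "my-ssm-bucket" is a placeholder bucket name.
    print(json.dumps(scoped_s3_policy("my-ssm-bucket"), indent=4))
```

If the Session Manager preferences log to a different bucket than the one the plugin uses for file transfer, both buckets would presumably need to be covered.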

from community.aws.

bedge avatar bedge commented on June 12, 2024

After some unreliable success with dash removed as the default for /bin/sh on the targets, I switched to setting the shell profile in the AWS session-manager prefs and that seems to be more reliable.
I.e., it hasn't failed with that set, and it consistently fails with it unset.
Not sure how this is different from removing the /bin/sh -> dash association, but the end result is definitely better.

from community.aws.

shinebayar-g avatar shinebayar-g commented on June 12, 2024

Does this plugin work?

1st attempt:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ValueError: invalid literal for int() with base 10: "echo $'\\n'$?"
ubuntu2004 | FAILED! => {
    "msg": "Unexpected failure during module execution.",
    "stdout": ""
}

2nd attempt:
Changed default shell to /bin/bash in Session Manager's Preferences as suggested above.

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: expected string or bytes-like object
ubuntu2004 | FAILED! => {
    "msg": "Unexpected failure during module execution.",
    "stdout": ""
}

from community.aws.

ulyr avatar ulyr commented on June 12, 2024

Recent AWS Linux AMIs add some unicode characters to inputs/outputs that can cause this same error message. If you recently launched a new instance or updated AMIs, try the following fix: #1756 (comment)

from community.aws.
