openshift / openshift-ansible-contrib
Additional roles and playbooks for OpenShift installation and management
Home Page: https://www.openshift.com
License: Apache License 2.0
We should be reusing the roles in the core repo so that the appropriate variables can be used.
Hello,
I intended to start up a cluster with small defaults but higher-capacity app nodes, so I used the app_instance_type parameter. The result, however, is that all nodes are the same size, t2.medium, and no errors were reported.
$ ./ose-on-aws.py --region=eu-west-1 --rhsm-user=jbrannst --public-hosted-zone=ocp.rocks --keypair=OpenShift-Key --rhsm-pool="Employee SKU" --app-instance-type=m4.xlarge --ami=ami-8b8c57f8 --s3-bucket-name=ocp-infra-registry-123456 --s3-username=ec2-user
RHSM password?:
Configured values:
stack_name: openshift-infra
ami: ami-8b8c57f8
region: eu-west-1
master_instance_type: m4.large
node_instance_type: t2.medium
app_instance_type: m4.xlarge
bastion_instance_type: t2.micro
keypair: OpenShift-Key
create_key: no
key_path: /dev/null
create_vpc: yes
vpc_id: None
private_subnet_id1: None
private_subnet_id2: None
private_subnet_id3: None
public_subnet_id1: None
public_subnet_id2: None
public_subnet_id3: None
byo_bastion: no
bastion_sg: /dev/null
console port: 443
deployment_type: openshift-enterprise
public_hosted_zone: ocp.rocks
app_dns_prefix: apps
apps_dns: apps.ocp.rocks
rhsm_user: jbrannst
rhsm_password: *******
rhsm_pool: Employee SKU
containerized: False
s3_bucket_name: ocp-infra-registry-123456
s3_username: ec2-user
It looks like a 'Subscription Name' update has left the pool RegEx[1] matching nothing.
Subscription Name: Red Hat OpenShift Container Platform, Premium, 2-Core
Subscription Name: Red Hat OpenShift Container Platform, Standard, 2-Core
Subscription Name: Red Hat OpenShift Container Platform Broker/Master Infrastructure
ExecStart=/usr/bin/kubectl proxy -p 8080 --address=0.0.0.0 --accept-hosts=^*$ --config=/etc/origin/master/admin.kubeconfig
This exposes a world readable cluster admin port on the internet. We should not be doing this.
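A less exposed unit is easy to sketch: binding to loopback and dropping the wildcard --accept-hosts restores kubectl proxy's default localhost-only host filter. This is a sketch, not a tested replacement for the ref-arch unit:

```ini
ExecStart=/usr/bin/kubectl proxy -p 8080 --address=127.0.0.1 --config=/etc/origin/master/admin.kubeconfig
```

Anything that still needs remote access to the API should then go through an authenticated endpoint rather than the raw proxy.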
Encountered this during a normal deprovision run - inventory failed because someone was deleting volumes at the same time. This almost certainly will lead to flaking in the CI runs when we turn them on.
ventory()
File "/usr/share/ansible/openshift-ansible-gce/inventory/gce/hosts/gce.py", line 130, in __init__
print(self.json_format_dict(self.group_instances(zones),
File "/usr/share/ansible/openshift-ansible-gce/inventory/gce/hosts/gce.py", line 311, in group_instances
for node in self.driver.list_nodes():
File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 1606, in list_nodes
v.get('instances', [])]
File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 5300, in _to_node
extra['boot_disk'] = self.ex_get_volume(bd['name'], bd['zone'])
File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 4178, in ex_get_volume
response = self.connection.request(request, method='GET').object
File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 120, in request
response = super(GCEConnection, self).request(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 718, in request
*args, **kwargs)
File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 797, in request
response = responseCls(**kwargs)
File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 145, in __init__
self.object = self.parse_body()
File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 271, in parse_body
raise ResourceNotFoundError(message, self.status, code)
libcloud.common.google.ResourceNotFoundError: {u'domain': u'global', u'message': u"The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/qe-chezhang-0106-master-1' was not found", u'reason': u'notFound'}
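One way to tolerate this race is to retry the listing when a resource vanishes mid-enumeration. Sketched generically in Python — NotFoundError and the callable are stand-ins for libcloud's ResourceNotFoundError and the list_nodes driver call; this is not a patch to gce.py:

```python
# Sketch: tolerate resources deleted between the list call and the
# per-resource describe call that libcloud makes while building nodes.
import time

class NotFoundError(Exception):
    """Stand-in for libcloud.common.google.ResourceNotFoundError."""

def list_nodes_with_retry(list_nodes_once, attempts=3, delay=0.0):
    """Retry a listing call that can race with concurrent deletions."""
    for attempt in range(attempts):
        try:
            return list_nodes_once()
        except NotFoundError:
            # A disk or instance vanished mid-listing; list again.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```

For CI this would at least turn a hard inventory failure into a re-list, at the cost of masking genuinely missing resources until the retries are exhausted.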
Running gcloud after manually resolving yum errors on the bastion results in the following SSH error:
To install or remove components at your current SDK version [138.0.0], run:
$ gcloud components install COMPONENT_ID
$ gcloud components remove COMPONENT_ID
To update your SDK installation to the latest version [138.0.0], run:
$ gcloud components update
Backing up [/home/cloud-user/.bashrc] to [/home/cloud-user/.bashrc.backup].
[/home/cloud-user/.bashrc] has been updated.
Start a new shell for the changes to take effect.
For more information on how to get started, please visit:
https://cloud.google.com/sdk/docs/quickstarts
Generating public/private rsa key pair.
Your identification has been saved in /home/cloud-user/.ssh/google_compute_engine.
Your public key has been saved in /home/cloud-user/.ssh/google_compute_engine.pub.
The key fingerprint is:
1d:ec:5d:80:ed:b3:cd:2b:89:91:32:72:ea:61:2a:db cloud-user
The key's randomart image is:
+--[ RSA 2048]----+
| o. |
| .. .. |
| o. . |
| o oo. |
| S o..= |
| . + o . o |
| o+ o o . . |
| .. o.. . o . |
| .oE.. . |
+-----------------+
Updating project ssh metadata...\Updated [https://www.googleapis.com/compute/v1/projects/machinelearning-nick].
Updating project ssh metadata...done.
Warning: Permanently added 'compute.3888108791522642653' (ECDSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
(the line above repeats 12 more times)
(I'm tearing down this build so ssh key information will disappear thus no security risk)
Several commands in ./gcloud.sh hung because they expected a region (both create and describe).
++ gcloud --project openshift-gce-devel compute addresses describe origin-ci-master-network-lb-ip '--format=value(address)'
For the following addresses:
- [origin-ci-master-network-lb-ip]
choose a region:
[1] asia-east1
[2] europe-west1
[3] us-central1
[4] us-east1
[5] us-west1
Please enter your numeric choice: 3
I added --region flags to all the commands that didn't have it specified and that fixed the issue.
$ gcloud version
Google Cloud SDK 130.0.0
alpha 2016.01.12
beta 2016.01.12
bq 2.0.24
bq-nix 2.0.24
core 2016.10.07
core-nix 2016.03.28
gsutil 4.21
gsutil-nix 4.18
Hi,
vagrant up --provider=libvirt --debug results in an infinite loop.
==> node1: Updating /etc/hosts file on host machine (password may be required)...
INFO environment: Getting machine: node1 (libvirt)
INFO environment: Returning cached machine: node1 (libvirt)
INFO environment: Getting machine: admin1 (libvirt)
INFO environment: Returning cached machine: admin1 (libvirt)
INFO warden: Calling OUT action: #<VagrantPlugins::HostManager::Action::UpdateAll:0x0000000352d6d8>
INFO warden: Calling OUT action: #<Vagrant::Action::Builtin::ConfigValidate:0x0000000352d728>
INFO provision: Writing provisioning sentinel so we don't provision again
INFO warden: Calling OUT action: #<Vagrant::Action::Builtin::Provision:0x00000003565808>
INFO warden: Calling OUT action: #<VagrantPlugins::ProviderLibvirt::Action::CreateDomain:0x000000035a4990>
INFO warden: Calling OUT action: #<VagrantPlugins::ProviderLibvirt::Action::CreateDomainVolume:0x0000000360b8c0>
INFO warden: Calling OUT action: #<VagrantPlugins::ProviderLibvirt::Action::HandleBoxImage:0x0000000364bdd0>
INFO warden: Calling OUT action: #<Vagrant::Action::Builtin::HandleBox:0x0000000368feb8>
INFO warden: Calling OUT action: #<VagrantPlugins::ProviderLibvirt::Action::HandleStoragePool:0x000000036e4878>
INFO warden: Calling OUT action: #<VagrantPlugins::ProviderLibvirt::Action::SetNameOfDomain:0x00000003728d70>
INFO warden: Calling OUT action: #<Proc:0x000000037acd28@/opt/vagrant/embedded/gems/gems/vagrant-1.8.6/lib/vagrant/action/warden.rb:94 (lambda)>
INFO warden: Calling OUT action: #<Vagrant::Action::Builtin::Call:0x000000036ec230>
INFO warden: Calling OUT action: #<Vagrant::Action::Builtin::ConfigValidate:0x000000036ec280>
INFO interface: Machine: action ["up", "end", {:target=>:node1}]
INFO environment: Released process lock: machine-action-258d7116eec64fc595d6c05f5e723d3c
DEBUG environment: Attempting to acquire process-lock: dotlock
INFO environment: Acquired process lock: dotlock
INFO environment: Released process lock: dotlock
DEBUG ssh: Sending SSH keep-alive...
... (the keep-alive message repeats indefinitely)
Best,
Peter
Hi again
When I provision a service, let's say Jenkins, the URL scheme is http://{service}-{project}.apps.domain.com, e.g. http://jenkins-me.apps.domain.com. Is there a setting to change the separator from - to . so I get http://jenkins.me.apps.domain.com instead?
I get various connection failures when browsing the OSE web console, though I have no issues with the command-line client. Shutting down the other masters, 02 and 03, solves the problem, but I assume it should work in HA as well.
Maybe the asynchronous loading fails when a request is load-balanced to a different master than the one that served the first page. This is after permanently accepting all certificates. I am running Firefox 49.0.2 and have seen this issue before. I noticed a mention in an internal doc about configuring a reverse proxy; maybe there is some additional AWS configuration I must do manually that I missed?
command
./ose-on-aws.py --create-key=yes [email protected] --rhsm-password=XXXX --public-hosted-zone=ocp.alberttwong.com --key-path=/root/.ssh/id_rsa.pub --keypair=us-east-1 --rhsm-pool=Employee
error message
ERROR: "Forbidden", while: getting RDS instances
ERROR! Inventory script (inventory/aws/hosts/ec2.py) had an execution error: ERROR: "Forbidden", while: getting RDS instances
I modified my IAM policy to the one below, plus S3FullAccess:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1459269951000",
      "Effect": "Allow",
      "Action": [
        "cloudformation:*",
        "iam:*",
        "route53:*",
        "elasticloadbalancing:*",
        "ec2:*",
        "cloudwatch:*",
        "autoscaling:*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
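The "Forbidden" on RDS suggests the policy above grants nothing under rds:, while the ec2.py inventory also enumerates RDS instances. A sketch of the missing statement, assuming read-only access is enough:

```json
{
  "Effect": "Allow",
  "Action": [
    "rds:Describe*"
  ],
  "Resource": ["*"]
}
```

Alternatively, if I recall the inventory options correctly, ec2.ini has an rds flag that can disable RDS enumeration outright.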
The openshift-vm-facts role that is referenced in openshift-install.yaml is not present in the roles directory.
ERROR! the role 'openshift-vm-facts' was not found in /root/openshift/reference-architecture/vmware-ansible/roles:/root/openshift/reference-architecture/vmware-ansible:/opt/ansible/roles:/root/openshift/reference-architecture/vmware-ansible/roles
The error appears to have been in '/root/openshift/reference-architecture/vmware-ansible/openshift-install.yaml': line 9, column 9, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- instance-groups
- openshift-vm-facts
^ here
./gcloud on Fedora fails with this error:
Warning: Permanently added 'compute.3888108791522642653' (ECDSA) to the list of known hosts.
ansible-config.yml 100% 783 13.8KB/s 00:00
Loaded plugins: search-disabled-repos
There are no enabled repos.
Run "yum repolist all" to see the repos you have.
To enable Red Hat Subscription Management repositories:
subscription-manager repos --enable
To enable custom repositories:
yum-config-manager --enable
Connection to 104.197.128.176 closed.
I don't see rotation of /var/log/messages happening in the GCE ref arch, and we have a very high debug level configured. First, do we have this already and I missed it? Second, if not, do we have it in OpenShift Ansible? Third, if not, we should probably add it to the ref-arch, since this is a very likely cause of node failure on GCE.
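If it is missing, a minimal logrotate drop-in would be a small addition. A sketch only — the file name, retention, and schedule here are assumptions, not the ref-arch's choices:

```
# /etc/logrotate.d/messages-highdebug (hypothetical file name)
/var/log/messages {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
```

A real drop-in would also need the rsyslog postrotate reload that the stock syslog rule carries, and should not conflict with that stock rule.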
Hi,
I tried to deploy OpenShift Origin using these Ansible playbooks with Vagrant, and for some reason, when trying to connect to the web console with admin/admin123, I always got "Invalid username or password". I then changed the htpasswd file to a re-hashed value of "admin123" and it worked. What's the actual password behind "$2y$11$jJioXC3WgyRq.FVy1vqtfuywDwEZp18d9Kkqb4MgFVzlgCGQNwy36"?
I guess either the documentation or the Vagrantfile should be updated with something that matches.
Am I alone with this problem?
When using ose-on-aws.py, a password containing special characters must be escaped or quoted, otherwise the shell mangles it. For example, --rhsm-password=My$ecret arrives as My$. A workaround is to single-quote it: --rhsm-password='My$ecret'.
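The same pitfall can be avoided when building the command line programmatically; a minimal sketch using Python's shlex.quote (the helper name is mine, not part of ose-on-aws.py):

```python
# Sketch: quote a secret so the shell passes it through verbatim.
import shlex

def rhsm_password_arg(password):
    """Build an --rhsm-password argument that survives the shell."""
    # shlex.quote wraps anything containing shell metacharacters in
    # single quotes, so $ecret is not expanded as a variable.
    return "--rhsm-password=" + shlex.quote(password)
```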
Hi
I'm having an issue I don't know how to solve. Basically you cannot run the s3 user tasks more than once and still retrieve the required data structure.
If the user already exists and this task runs, you do not get the s3user.user_meta data needed by the subsequent tasks, i.e.
TASK [s3-registry-user : debug s3] *********************************************
task path: playbooks/roles/s3-registry-user/tasks/main.yaml:9
ok: [localhost] => {
"s3user": {
"changed": false,
"created_keys": [],
"groups": null,
"keys": {
"CHANGED": "Active"
},
"user_name": "apim"
}
}
Which results in
TASK [s3-registry-user : Set fact] *********************************************
task path: playbooks/roles/s3-registry-user/tasks/main.yaml:11
fatal: [localhost]: FAILED! => {
"failed": true,
"msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'user_meta'\n\nThe error appears to have been in 'playbooks/roles/s3-registry-user/tasks/main.yaml': line 11, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n debug: var=s3user\n- name: Set fact\n ^ here\n"
}
This means that any method to add an infra node will fail, and any further run of ose-on-aws.py
will not complete due to the missing data structures. A way around this could be to pre-create the s3 user, store the AWS secret, and pass it in as a parameter, but that seems clunky.
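The fix belongs in the role itself, but the failure mode is easy to sketch in Python: guard the nested lookup instead of assuming the iam module always returns user_meta. Names mirror the s3user output above; the shape below user_meta is an assumption for illustration:

```python
# Sketch: tolerate a registered result without user_meta (user pre-exists).
def extract_access_keys(s3user):
    """Return freshly created access keys, or None if the user already existed."""
    meta = s3user.get("user_meta")
    if meta is None:
        # User existed before this run: the module returned only key status,
        # so the caller must fall back to previously stored credentials.
        return None
    return meta.get("access_keys")
```

The Ansible equivalent would be a when: guard (or a default() filter) on the set_fact task, plus a path for reusing stored credentials.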
TASK [instance-groups : Add app instances to host group] ***********************
[DEPRECATION WARNING]: Skipping task due to undefined Error, in the future this will be a fatal error.: 'dict object' has no attribute 'tag_provision_node'.
This feature will
be removed in a future release. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
skipping: [localhost]
Fail with file not found:
TASK [ssh-key : OSE ec2 key] ***************************************************
task path: /home/mangis/workspace/openshift-ansible-contrib/reference-architecture/aws-ansible/playbooks/roles/ssh-key/tasks/main.yaml:2
skipping: [localhost] => (item=) => {"changed": false, "item": "", "skip_reason": "Conditional check failed", "skipped": true}
TASK [cloudformation-infra : Create Greenfield Infrastructure] *****************
task path: /home/mangis/workspace/openshift-ansible-contrib/reference-architecture/aws-ansible/playbooks/roles/cloudformation-infra/tasks/main.yaml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: mangis
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1475326149.27-207773419727810 `" && echo ansible-tmp-1475326149.27-207773419727810="` echo $HOME/.ansible/tmp/ansible-tmp-1475326149.27-207773419727810 `" ) && sleep 0'
<127.0.0.1> PUT /tmp/tmpTaGfYL TO /home/mangis/.ansible/tmp/ansible-tmp-1475326149.27-207773419727810/cloudformation
<127.0.0.1> EXEC /bin/sh -c 'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python2 /home/mangis/.ansible/tmp/ansible-tmp-1475326149.27-207773419727810/cloudformation; rm -rf "/home/mangis/.ansible/tmp/ansible-tmp-1475326149.27-207773419727810/" > /dev/null 2>&1 && sleep 0'
An exception occurred during task execution. The full traceback is:
Traceback (most recent call last):
File "/tmp/ansible_R4Ze5y/ansible_module_cloudformation.py", line 401, in <module>
main()
File "/tmp/ansible_R4Ze5y/ansible_module_cloudformation.py", line 269, in main
template_body = open(module.params['template'], 'r').read()
IOError: [Errno 2] No such file or directory: 'roles/cloudformation-infra/files/greenfield.json'
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "invocation": {"module_name": "cloudformation"}, "module_stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_R4Ze5y/ansible_module_cloudformation.py\", line 401, in <module>\n main()\n File \"/tmp/ansible_R4Ze5y/ansible_module_cloudformation.py\", line 269, in main\n template_body = open(module.params['template'], 'r').read()\nIOError: [Errno 2] No such file or directory: 'roles/cloudformation-infra/files/greenfield.json'\n", "module_stdout": "", "msg": "MODULE FAILURE", "parsed": false}
I am testing this with a slightly modified version, as I don't have a subscription, but that should not matter here. It looks like the template path is not being resolved.
Is anyone working on auto-scaling groups?
If not, I wouldn't mind starting on it.
I'd like to be able to separate the image-creation step from the provisioning of the instance group. I was thinking gcloud-image.sh and gcloud.sh, where gcloud.sh would either skip the image creation or call into gcloud-image.sh if and only if REGISTERED_IMAGE was unset.
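The guard is simple; a sketch of the intended control flow in Python (gcloud-image.sh is the proposed script, not an existing one, and the run parameter is injectable just to make the logic testable):

```python
# Sketch: run the image-creation step only when REGISTERED_IMAGE is unset.
import os
import subprocess

def ensure_image(env=None, run=subprocess.check_call):
    """Call gcloud-image.sh iff no pre-registered image is configured."""
    env = os.environ if env is None else env
    if env.get("REGISTERED_IMAGE"):
        return False  # image already registered; skip creation
    run(["./gcloud-image.sh"])
    return True
```

In gcloud.sh itself this collapses to a one-line shell test around the call.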
I tried a containerized setup with this tool in Amazon Ireland:
./ose-on-aws.py --keypair=$AWS_KEY --rhsm-user=$RHUSER --rhsm-password=$RHPWD --public-hosted-zone=konttikoulu.fi --rhsm-pool=$RHPOOL --region eu-west-1 --ami ami-02ace471 --master-instance-type t2.medium --node-instance-type t2.large --app-instance-type t2.medium --containerized true --app-dns-prefix=iken
which leads to failure in:
TASK [set_fact] ****************************************************************
fatal: [ose-master01.konttikoulu.fi]: FAILED! => {
"failed": true
}
MSG:
the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'openshift'
The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/initialize_facts.yml': line 12, column 5, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
hostname: "{{ openshift_hostname | default(None) }}"
- set_fact:
^ here
fatal: [ose-infra-node02.konttikoulu.fi]: FAILED! => {
"failed": true
}
My clone of this tool is of version:
commit f5a7778
Merge: dd87087 df0f724
Author: Jason DeTiberus [email protected]
Date: Mon Nov 21 12:10:41 2016 -0500
and I'm running this from the latest RHEL 7 container on my laptop.
openshift-ansible-playbooks-3.3.54-1.git.0.61a1dee.el7.noarch
I'm running my setup using the qcow downloaded from the server. Is this because I'm not in the right pool for my subscription?
+ gcloud --project openshift-gce-devel compute copy-files /Users/clayton/projects/origin/src/github.com/openshift/openshift-ansible-contrib/reference-architecture/gce-cli/ansible-config.yml cloud-user@bastion: --zone us-central1-a
Warning: Permanently added 'compute.5845721122299694843' (ECDSA) to the list of known hosts.
ansible-config.yml 100% 779 0.8KB/s 00:00
+ gcloud --project openshift-gce-devel compute ssh cloud-user@bastion --zone us-central1-a --ssh-flag=-t --command 'sudo sh -c '\''
yum install -y python-libcloud atomic-openshift-utils;
if ! grep -q "export GCE_PROJECT=openshift-gce-devel" /etc/profile.d/ocp.sh 2>/dev/null; then
echo "export GCE_PROJECT=openshift-gce-devel" >> /etc/profile.d/ocp.sh;
fi
if ! grep -q "export INVENTORY_IP_TYPE=internal" /etc/profile.d/ocp.sh 2>/dev/null; then
echo "export INVENTORY_IP_TYPE=internal" >> /etc/profile.d/ocp.sh;
fi
'\'''
Loaded plugins: search-disabled-repos
google-cloud-compute/signature | 454 B 00:00:00
google-cloud-compute/signature | 1.4 kB 00:00:00 !!!
google-cloud-compute/primary | 1.8 kB 00:00:00
google-cloud-compute 4/4
No package python-libcloud available.
No package atomic-openshift-utils available.
Error: Nothing to do
Connection to 104.155.172.153 closed.
Hi
My deployments sometimes get stuck for quite a while on the RHN registration. After being "stuck" for two hours, one failed:
PLAY [localhost] ***************************************************************
TASK [instance-groups : Add bastion to group] **********************************
changed: [localhost]
TASK [instance-groups : Add masters to requisite groups] ***********************
changed: [localhost] => (item=ip-10-20-4-101.us-west-2.compute.internal)
changed: [localhost] => (item=ip-10-20-5-157.us-west-2.compute.internal)
changed: [localhost] => (item=ip-10-20-6-26.us-west-2.compute.internal)
TASK [instance-groups : Add a master to the primary masters group] *************
changed: [localhost] => (item=ip-10-20-4-101.us-west-2.compute.internal)
TASK [instance-groups : Add infra instances to host group] *********************
changed: [localhost] => (item=ip-10-20-5-61.us-west-2.compute.internal)
changed: [localhost] => (item=ip-10-20-4-22.us-west-2.compute.internal)
TASK [instance-groups : Add app instances to host group] ***********************
changed: [localhost] => (item=ip-10-20-5-170.us-west-2.compute.internal)
changed: [localhost] => (item=ip-10-20-4-177.us-west-2.compute.internal)
TASK [instance-groups : Add app instances to host group] ***********************
PLAY [localhost] ***************************************************************
TASK [host-up : check to see if host is available] *****************************
ok: [localhost]
PLAY [cluster_hosts] ***********************************************************
TASK [setup] *******************************************************************
ok: [ose-master01....com]
TASK [rhsm-subscription : Register host] ***************************************
changed: [ose-master01....com]
PLAY [cluster_hosts] ***********************************************************
TASK [setup] *******************************************************************
ok: [ose-master02....com]
TASK [rhsm-subscription : Register host] ***************************************
fatal: [ose-master02....com]: UNREACHABLE! => {
"changed": false,
"unreachable": true
}
MSG:
Failed to connect to the host via ssh: Shared connection to ose-master02....com closed.
PLAY RECAP *********************************************************************
localhost : ok=6 changed=5 unreachable=0 failed=0
ose-master01....com : ok=2 changed=1 unreachable=0 failed=0
ose-master02....com : ok=1 changed=0 unreachable=1 failed=0
I suspect it may be caused by some DNS propagation delay or another race condition. Connecting to master-02 did work when connecting manually:
ssh -v ose-master02.....com
OpenSSH_7.2p2, OpenSSL 1.0.2h-fips 3 May 2016
debug1: Reading configuration data /home/jhenner/.ssh/config
debug1: /home/jhenner/.ssh/config line 81: Applying options for *
debug1: /home/jhenner/.ssh/config line 198: Applying options for *.....com
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for *
debug1: Executing proxy command: exec ssh ec2-user@bastion -W ose-master02.....com:22
debug1: permanently_drop_suid: 1000
debug1: identity file /home/jhenner/work/cm-infra/jenkins-rhel-slave/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /home/jhenner/work/cm-infra/jenkins-rhel-slave/id_rsa-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.2
Warning: Permanently added 'bastion.....com' (ECDSA) to the list of known hosts.
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1
debug1: match: OpenSSH_6.6.1 pat OpenSSH_6.6.1* compat 0x04000000
debug1: Authenticating to ose-master02.....com:22 as 'ec2-user'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: [email protected]
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: [email protected] MAC: <implicit> compression: none
debug1: kex: client->server cipher: [email protected] MAC: <implicit> compression: none
debug1: kex: [email protected] need=64 dh_need=64
debug1: kex: [email protected] need=64 dh_need=64
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:Xxq5YiZ1fL5MiO4pF8pzvHfg/BvUwBX/qjjRIQhz4kU
Warning: Permanently added 'ose-master02.....com' (ECDSA) to the list of known hosts.
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS received
debug1: Skipping ssh-dss key jhenner@jezevec - not in PubkeyAcceptedKeyTypes
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: jhenner@veverka
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Offering RSA public key: /home/jhenner/work/cm-infra/jenkins-rhel-slave/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
Authenticated to ose-master02.....com (via proxy).
debug1: channel 0: new [client-session]
debug1: Requesting [email protected]
debug1: Entering interactive session.
debug1: pledge: proc
debug1: Sending environment.
debug1: Sending env LC_PAPER = cs_CZ.utf8
debug1: Sending env LC_MONETARY = cs_CZ.utf8
debug1: Sending env LC_NUMERIC = cs_CZ.utf8
debug1: Sending env XMODIFIERS = @im=none
debug1: Sending env LANG = en_US.utf8
debug1: Sending env LC_MEASUREMENT = cs_CZ.utf8
debug1: Sending env LC_TIME = cs_CZ.utf8
Last login: Wed Oct 26 09:34:25 2016 from ip-10-20-1-39.us-west-2.compute.internal
./ose-on-aws.py --rhsm-user=* --rhsm-password=** --public-hosted-zone=....com --keypair=jenkins --ami=ami-775e4f16 --rhsm-pool=ES0113909 --no-confirm --region=us-west-2
When I run the brownfield deployment (i.e. run it again), it passes this phase with no problems.
To tear down, you need to run:
ansible-playbook -i inventory/aws/hosts -e 'region=us-west-2 s3_username=openshift-s3-docker-registry ci=true' playbooks/teardown.yaml
Change the region to your own.
What is the AMI to use if I want to install in another AWS region? It seems that I can only install in us-east-1, as I don't have the AMI for other regions.
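A per-region lookup table (suggested elsewhere in this thread set) is one way to handle this. The AMI IDs below are placeholders, not real images — they would need to be filled in from the RHEL images published in each region:

```python
# Sketch: resolve an AMI per region, preferring an explicit --ami override.
REGION_AMIS = {
    "us-east-1": "ami-xxxxxxxx",  # placeholder, not a real AMI
    "us-west-2": "ami-yyyyyyyy",  # placeholder
    "eu-west-1": "ami-zzzzzzzz",  # placeholder
}

def ami_for_region(region, override=None):
    """Return the AMI for a region, or exit with a hint to pass --ami."""
    if override:
        return override
    try:
        return REGION_AMIS[region]
    except KeyError:
        raise SystemExit("No known AMI for region %s; pass --ami" % region)
```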
TASK [instance-groups : Add masters to requisite groups] ***********************
task path: /root/AWS/openshift-ansible-contrib/reference-architecture/aws-ansible/playbooks/roles/instance-groups/tasks/main.yaml:9
fatal: [localhost]: FAILED! => {
"failed": true
}
MSG:
'dict object' has no attribute 'tag_openshift_role_master'
getting this error...
TASK [host-up : check to see if host is available] *****************************
fatal: [localhost]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for bastion.ocp.alberttwong.com:22"}
I tried to connect manually:
[root@localhost aws-ansible]# ssh [email protected]
[ec2-user@ip-10-20-1-191 ~]$ ls
[ec2-user@ip-10-20-1-191 ~]$ exit
logout
Connection to ec2-35-161-6-30.us-west-2.compute.amazonaws.com closed.
It didn't ask me for a username and let me straight in.
I found out the issue is that somehow the DNS entries in Route53 aren't propagating down to me. Make sure you can get to your bastion (bastion.ocp.alberttwong.com) by ssh [email protected]. If this doesn't work, fix your DNS.
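The propagation check can be done up front; a minimal sketch that verifies the bastion name resolves before any SSH attempt (the host name is just an example):

```python
# Sketch: fail fast when the bastion's Route53 record has not propagated yet.
import socket

def dns_resolves(host):
    """Return True if `host` currently resolves on this machine."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

# e.g. check dns_resolves("bastion.ocp.example.com") before running playbooks
```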
Hi again :)
I tried moving openshift_master_identity_providers: from playbooks/openshift-setup.yaml to playbooks/vars/main.yaml so I could use a different OAuth provider without editing a core playbook. I commented the provider out of openshift-setup.yaml, but I always ended up with the default DenyAll provider when I put it in vars/main.yaml.
I'd like the ability to change the OAuth provider per environment, e.g. using the --env-file= feature from #39.
The use case is to have this repo as a git submodule of a larger deployment system that manages multiple IaaS providers and environments; I don't want to have to modify an upstream repo.
As a user I want to add a node to my cluster; I know the size and what it should do, and I have my credentials ready. I'd prefer not to have to dig into the detailed network/security setup of my cluster.
add_node.py
Things the user can easily specify:
region: eu-west-1
node_instance_type: r3.xlarge
keypair: OpenShift-Key
public_hosted_zone: ocp.rocks
shortname: ose-app-node04
rhsm_user: jbrannst
rhsm_password: *******
rhsm_pool: Employee SKU
Items that are hard to specify, for which we should be able to provide defaults, if necessary by querying AWS:
ami: ami-ce66d8bd
(can be looked up via a hardcoded per-region table, as suggested in another thread)
subnet_id: subnet-44763020
(as there are typically 3 relevant options, they should be printed with one pre-selected, asking the user to confirm or enter another)
node_sg: sg-f62ce390
(there is typically only one option)
iam_role: OpenShift-Infra-NodeInstanceProfile-8DDK8K32INSK
(also only one option)
I have no idea how hard this would be to do, but it would definitely boost the user experience.
Hi,
I tried to create a sample environment with the following command:
vagrant up --provider=libvirt --debug
Reported Errors:
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
An error occurred while executing the action on the 'node1'
machine. Please handle this error then try again:
Vagrant can't use the requested machine because it is locked! This
means that another Vagrant process is currently reading or modifying
the machine. Please wait for that Vagrant process to end and try
again. Details about the machine are shown below:
Name: node1
Provider: libvirt
An error occurred while executing the action on the 'node2'
machine. Please handle this error then try again:
While attempting to connect with SSH, a "no route to host" (EHOSTUNREACH)
error was received. Please verify your network settings are correct
and try again.
An error occurred while executing the action on the 'admin1'
machine. Please handle this error then try again:
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
Debug Information:
DEBUG ssh: == Net-SSH connection debug-level log END ==
INFO retryable: Retryable exception raised: #<Errno::EHOSTUNREACH: No route to host - connect(2) for 192.168.121.64:22>
INFO ssh: Attempting to connect to SSH...
INFO ssh: - Host: 192.168.121.64
INFO ssh: - Port: 22
INFO ssh: - Username: vagrant
INFO ssh: - Password? false
INFO ssh: - Key Path: ["/home/labuser/openshift-ansible-contrib/vagrant/.vagrant/machines/node1/libvirt/private_key"]
DEBUG ssh: - connect_opts: {:auth_methods=>["none", "hostbased", "publickey"], :config=>false, :forward_agent=>false, :send_env=>false, :keys_only=>true, :paranoid=>false, :password=>nil, :port=>22, :timeout=>15, :user_known_hosts_file=>[], :verbose=>:debug, :logger=>#<Logger:0x007fc480bf8820 @progname=nil, @level=0, @default_formatter=#<Logger::Formatter:0x007fc480bf87f8 @datetime_format=nil>, @formatter=nil, @logdev=#<Logger::LogDevice:0x007fc480bf8730 @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<StringIO:0x007fc480bf8870>, @mutex=#<Logger::LogDevice::LogDeviceMutex:0x007fc480bf8708 @mon_owner=nil, @mon_count=0, @mon_mutex=#<Mutex:0x007fc480bf8690>>>, :keys=>["/home/labuser/openshift-ansible-contrib/vagrant/.vagrant/machines/node1/libvirt/private_key"]}
DEBUG ssh: == Net-SSH connection debug-level log START ==
DEBUG ssh: D, [2016-10-25T07:01:55.922869 #4853] DEBUG -- net.ssh.transport.session[3fe2405c97b8]: establishing connection to 192.168.121.64:22
DEBUG ssh: == Net-SSH connection debug-level log END ==
ERROR warden: Error occurred: While attempting to connect with SSH, a "no route to host" (EHOSTUNREACH)
error was received. Please verify your network settings are correct
and try again.
INFO warden: Beginning recovery process...
INFO warden: Calling recover: #<VagrantPlugins::ProviderLibvirt::Action::WaitTillUp:0x00000002621928>
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
ERROR warden: Error occurred: While attempting to connect with SSH, a "no route to host" (EHOSTUNREACH)
error was received. Please verify your network settings are correct
and try again.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
ERROR warden: Error occurred: While attempting to connect with SSH, a "no route to host" (EHOSTUNREACH)
error was received. Please verify your network settings are correct
and try again.
INFO warden: Beginning recovery process...
INFO warden: Calling recover: #<Vagrant::Action::Builtin::Call:0x000000024b6598>
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO environment: Released process lock: machine-action-985e874cf515a9058c4a05af16e98c77
INFO interface: error: An error occurred. The error will be shown after all tasks complete.
INFO interface: error: ==> master1: An error occurred. The error will be shown after all tasks complete.
==> master1: An error occurred. The error will be shown after all tasks complete.
Host:
Ubuntu 16.04.01
Ansible 2.1.2.0
Vagrant 1.8.6
Best,
Peter
Using Ansible compiled from source, having add_node defined in the instance groups causes the installation to fail, whereas with the RH-provided RPM we are only warned with:
[DEPRECATION WARNING]: Skipping task due to undefined Error, in the future this will be a fatal error.: 'dict object' has no attribute 'tag_provision_node'.
This feature will be removed in a future release.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
skipping: [localhost]
Instead of leveraging kube-proxy, it would probably be better to stand up either Apache or HAProxy to forward only the healthz/ready endpoint, rather than proxying the entire API behind 8080.
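A minimal sketch of what that could look like with HAProxy (the hostname, port, and paths here are illustrative assumptions, not taken from the repo):

```haproxy
# Hypothetical haproxy.cfg fragment: expose only the health endpoint locally,
# deny everything else instead of proxying the whole API.
frontend healthz
    mode http
    bind 127.0.0.1:8080
    default_backend master-api-healthz

backend master-api-healthz
    mode http
    # Allow only the health/readiness paths through.
    acl is_healthz path /healthz /healthz/ready
    http-request deny if !is_healthz
    server master1 master1.example.com:443 ssl verify none
```

This keeps the rest of the API surface off the unauthenticated port while still giving load balancers something to probe.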
Hi, first of all, congratulations on your good work!
I'm trying to install OpenShift Origin on AWS in the us-east-1 region.
The ose-on-aws.py command executes almost successfully. However, I get the following error on all 3 masters:
TASK [openshift_cloud_provider : Create cloud config] **************************
fatal: [ose-master01.foxtek.net]: FAILED! => {
"changed": false,
"failed": true,
"rc": 257
}
MSG: Destination /etc/origin/cloudprovider/aws.conf does not exist !
So, the "play recap" shows the following summary:
PLAY RECAP *********************************************************************
localhost : ok=19 changed=13 unreachable=0 failed=0
ose-app-node01.foxtek.net : ok=52 changed=8 unreachable=0 failed=0
ose-app-node02.foxtek.net : ok=52 changed=8 unreachable=0 failed=0
ose-infra-node01.foxtek.net : ok=52 changed=8 unreachable=0 failed=0
ose-infra-node02.foxtek.net : ok=52 changed=8 unreachable=0 failed=0
ose-master01.foxtek.net : ok=218 changed=65 unreachable=0 failed=1
ose-master02.foxtek.net : ok=195 changed=64 unreachable=0 failed=1
ose-master03.foxtek.net : ok=195 changed=64 unreachable=0 failed=1
Do you have any ideas regarding why the /etc/origin/cloudprovider/aws.conf file was not created on any of the master nodes?
I have connected to the master nodes ("ssh [email protected]") and the directory /etc/origin/cloudprovider exists but is totally empty (no files inside). Any ideas?
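For context on what should have landed there: the Kubernetes AWS cloud provider config that OpenShift templates into that directory is typically just a tiny INI file naming the availability zone (the zone value below is only an example):

```ini
[Global]
Zone = us-east-1a
```

So an empty /etc/origin/cloudprovider suggests the templating task itself never ran or failed silently, rather than a partially written file.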
Thank you very much in advance!
Best Regards
Hi,
https://github.com/openshift/openshift-ansible-contrib/blob/master/vagrant/README.md
Section: Installation
After cloning the GitHub repo, the documented path "vagrant-openshift-cluster/vagrant" does not exist.
It should be:
cd openshift-ansible-contrib/vagrant/
Best,
Peter
Hi, Where is the cloudformation template located for this HA architecture?
I am looking to make some modifications in cloud formation template and then deploy the stack.
Any help?
Thanks in advance
The uninstall command is wrong (it currently says "playook"). It should be:
ansible-playbook -i inventory/aws/hosts -e 'region=us-east-1 stack_name=openshift-infra ci=false' playbooks/teardown.yaml
./ose-on-aws.py --rhsm-user=$USERNAME --rhsm-password=$PASS --public-hosted-zone=jhenner.mooo.com --keypair=jenkins --ami=ami-775e4f16 --rhsm-pool=ES0113909 --no-confirm --region=us-west-2
Configured values:
ami: ami-775e4f16
region: us-west-2
master_instance_type: m4.large
node_instance_type: t2.medium
bastion_instance_type: t2.micro
keypair: jenkins
create_key: no
key_path: /dev/null
create_vpc: yes
vpc_id: None
private_subnet_id1: None
private_subnet_id2: None
private_subnet_id3: None
public_subnet_id1: None
public_subnet_id2: None
public_subnet_id3: None
byo_bastion: no
bastion_sg: /dev/null
console port: 443
deployment_type: openshift-enterprise
public_hosted_zone: jhenner.mooo.com
app_dns_prefix: apps
apps_dns: apps.jhenner.mooo.com
rhsm_user: ******
rhsm_password: *******
rhsm_pool: ES0113909
containerized: False
TASK [s3-registry-user : Create S3 OpenShift registry user] ********************
ok: [localhost]
TASK [s3-registry-user : Set fact] *********************************************
fatal: [localhost]: FAILED! => {
"failed": true
}
MSG:
the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'user_meta'
The error appears to have been in '/home/jenkins/workspace/ose-on-aws-deploy/openshift-ansible-contrib/reference-architecture/aws-ansible/playbooks/roles/s3-registry-user/tasks/main.yaml': line 9, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
register: s3user
- name: Set fact
^ here
This is with the commit 8c52fb6
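A possible workaround sketch (hypothetical: `s3user` is the variable registered by the task above, but the fact name and key path below are my assumptions, not the role's actual code) would be to guard the fact against runs where the module returns no user_meta:

```yaml
# Sketch only: skip the fact when the registered result has no user_meta,
# e.g. when the IAM user already existed and the module returned nothing new.
- name: Set fact
  set_fact:
    s3_user_access_key: "{{ s3user.user_meta.access_keys[0].access_key_id }}"
  when: s3user.user_meta is defined
```

Whether skipping is acceptable here depends on how the fact is consumed later in the role.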
Below is a snip from the end of my vagrant provisioning. I believe this shows that admin1 (the one running the ansible playbook in question) CAN in fact reach all the other nodes, but for some reason this one task consistently fails with this same error. Not sure what's going on.
TASK [openshift_node_certificates : Ensure CA certificate exists on openshift_ca_host] ***
ok: [master1 -> 192.168.50.20] => {"changed": false, "stat": {"atime": 1478024342.644244, "checksum": "6f7070de78ad472fe8fee4232319111aaf3873d5", "ctime": 1478024342.638244, "dev": 64769, "exists": true, "gid": 0, "gr_name": "root", "inode": 1573519, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "md5": "66d821c9bb1611bcbc784110e8c23ba6", "mode": "0644", "mtime": 1478024342.638244, "nlink": 1, "path": "/etc/origin/master/ca.crt", "pw_name": "root", "rgrp": true, "roth": true, "rusr": true, "size": 1070, "uid": 0, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}}
TASK [openshift_node_certificates : fail] **************************************
skipping: [master1] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
TASK [openshift_node_certificates : Check status of node certificates] *********
ok: [node2] => (item=system:node:node2.example.com.crt) => {"changed": false, "item": "system:node:node2.example.com.crt", "stat": {"exists": false}}
ok: [master1] => (item=system:node:master1.example.com.crt) => {"changed": false, "item": "system:node:master1.example.com.crt", "stat": {"exists": false}}
ok: [node1] => (item=system:node:node1.example.com.crt) => {"changed": false, "item": "system:node:node1.example.com.crt", "stat": {"exists": false}}
ok: [node2] => (item=system:node:node2.example.com.key) => {"changed": false, "item": "system:node:node2.example.com.key", "stat": {"exists": false}}
ok: [master1] => (item=system:node:master1.example.com.key) => {"changed": false, "item": "system:node:master1.example.com.key", "stat": {"exists": false}}
ok: [node1] => (item=system:node:node1.example.com.key) => {"changed": false, "item": "system:node:node1.example.com.key", "stat": {"exists": false}}
ok: [master1] => (item=system:node:master1.example.com.kubeconfig) => {"changed": false, "item": "system:node:master1.example.com.kubeconfig", "stat": {"exists": false}}
ok: [node1] => (item=system:node:node1.example.com.kubeconfig) => {"changed": false, "item": "system:node:node1.example.com.kubeconfig", "stat": {"exists": false}}
ok: [node2] => (item=system:node:node2.example.com.kubeconfig) => {"changed": false, "item": "system:node:node2.example.com.kubeconfig", "stat": {"exists": false}}
ok: [master1] => (item=ca.crt) => {"changed": false, "item": "ca.crt", "stat": {"exists": false}}
ok: [node1] => (item=ca.crt) => {"changed": false, "item": "ca.crt", "stat": {"exists": false}}
ok: [node2] => (item=ca.crt) => {"changed": false, "item": "ca.crt", "stat": {"exists": false}}
ok: [master1] => (item=server.key) => {"changed": false, "item": "server.key", "stat": {"exists": false}}
ok: [node2] => (item=server.key) => {"changed": false, "item": "server.key", "stat": {"exists": false}}
ok: [node1] => (item=server.key) => {"changed": false, "item": "server.key", "stat": {"exists": false}}
ok: [master1] => (item=server.crt) => {"changed": false, "item": "server.crt", "stat": {"exists": false}}
ok: [node1] => (item=server.crt) => {"changed": false, "item": "server.crt", "stat": {"exists": false}}
ok: [node2] => (item=server.crt) => {"changed": false, "item": "server.crt", "stat": {"exists": false}}
TASK [openshift_node_certificates : set_fact] **********************************
ok: [master1] => {"ansible_facts": {"node_certs_missing": true}, "changed": false}
ok: [node1] => {"ansible_facts": {"node_certs_missing": true}, "changed": false}
ok: [node2] => {"ansible_facts": {"node_certs_missing": true}, "changed": false}
TASK [openshift_node_certificates : Create openshift_generated_configs_dir if it does not exist] ***
fatal: [node1]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}
fatal: [node2]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}
fatal: [master1]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}
[root@localhost ~]# ssh [email protected]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
06:35:3b:37:90:8e:50:32:4c:c0:67:d2:0e:54:79:0c.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /root/.ssh/known_hosts:2
Password authentication is disabled to avoid man-in-the-middle attacks.
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Agent forwarding is disabled to avoid man-in-the-middle attacks.
Error: forwarding disabled due to host key check failure
ssh_exchange_identification: Connection closed by remote host
[root@localhost ~]# ssh [email protected]
Last login: Wed Oct 19 23:25:31 2016 from cpe-75-83-58-118.socal.res.rr.com
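When a Vagrant box is recreated, the stale entry can be dropped from known_hosts with ssh-keygen -R. A self-contained sketch (the IP comes from the log above; the throwaway file and placeholder key are stand-ins so the real ~/.ssh/known_hosts is untouched):

```shell
# Demonstrate clearing a stale host key from a known_hosts file.
kh=$(mktemp)
printf '192.168.50.21 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopq\n' > "$kh"
# Remove all keys belonging to that host from the file.
ssh-keygen -R 192.168.50.21 -f "$kh" >/dev/null 2>&1
remaining=$(grep -c '192.168.50.21' "$kh" || true)
echo "entries remaining: $remaining"
rm -f "$kh" "$kh.old"
```

Against the real file it is just `ssh-keygen -R 192.168.50.21`, after which the next connection records the new key.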
related to #58
When installing from a Vagrant image, etc., the time might not be synced correctly. This may cause Python errors that are really hard to figure out.
I tried to deploy OCP on AWS using the python script :
./ose-on-aws.py --keypair=OSE-key --region eu-central-1 --ami ami-7def1712 --create-key=yes --key-path=/root/.ssh/id_rsa.pub --rhsm-user=dwojciec --rhsm-password=password \
--public-hosted-zone=mydomain.com --rhsm-pool="Red Hat OpenShift Container Platform, Standard, 2-Core" \
--github-client-secret=6746b9659154d680a06ee9ba07b1d379582ab40f --github-organization=myorg-openshift \
--github-client-id=f64169d9eaa3efccf87e -vvv
The error received is :
TASK [cloudformation-infra : Create Greenfield Infrastructure] *****************
task path: /root/AWS/openshift-ansible-contrib/reference-architecture/aws-ansible/playbooks/roles/cloudformation-infra/tasks/main.yaml:2
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/cloud/amazon/cloudformation.py
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1483376807.65-246631564182145`" && echo ansible-tmp-1483376807.65-246631564182145="`echo $HOME/.ansible/tmp/ansible-tmp-1483376807.65-246631564182145`" ) && sleep 0'
<127.0.0.1> PUT /tmp/tmph8KP_7 TO /root/.ansible/tmp/ansible-tmp-1483376807.65-246631564182145/cloudformation.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1483376807.65-246631564182145/ /root/.ansible/tmp/ansible-tmp-1483376807.65-246631564182145/cloudformation.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1483376807.65-246631564182145/cloudformation.py; rm -rf "/root/.ansible/tmp/ansible-tmp-1483376807.65-246631564182145/" > /dev/null 2>&1 && sleep 0'
fatal: [localhost]: FAILED! => {
"changed": false,
"failed": true,
"invocation": {
"module_args": {
"aws_access_key": null,
"aws_secret_key": null,
"disable_rollback": false,
"ec2_url": null,
"notification_arns": null,
"profile": null,
"region": "eu-central-1",
"security_token": null,
"stack_name": "openshift-infra",
"stack_policy": null,
"state": "present",
"tags": null,
"template": "roles/cloudformation-infra/files/greenfield.json",
"template_format": null,
"template_parameters": {
"AmiId": "ami-7def1712",
"AppNodeInstanceType": "t2.medium",
"AppWildcardDomain": "*.apps.dwojciec.com",
"BastionInstanceType": "t2.micro",
"BastionRootVolType": "gp2",
"BastionUserData": "I2Nsb3VkLWNvbmZpZwp1c2VyczoKLSBkZWZhdWx0CgpzeXN0ZW1faW5mbzoKICBkZWZhdWx0X3VzZXI6CiAgICBuYW1lOiBlYzItdXNlcg==",
"InfraDockerVolSize": 25,
"InfraDockerVolType": "gp2",
"InfraInstanceType": "t2.medium",
"InfraRootVolSize": 15,
"InfraRootVolType": "gp2",
"KeyName": "OSE-key",
"MasterApiPort": "443",
"MasterClusterHostname": "internal-openshift-master.dwojciec.com",
"MasterClusterPublicHostname": "openshift-master.dwojciec.com",
"MasterDockerVolSize": 25,
"MasterDockerVolType": "gp2",
"MasterEmptyVolSize": 5,
"MasterEmptyVolType": "gp2",
"MasterEtcdVolSize": 25,
"MasterEtcdVolType": "gp2",
"MasterHealthTarget": "TCP:443",
"MasterInstanceType": "m4.large",
"MasterRootVolSize": 10,
"MasterRootVolType": "gp2",
"MasterUserData": "I2Nsb3VkLWNvbmZpZwpjbG91ZF9jb25maWdfbW9kdWxlczoKLSBkaXNrX3NldHVwCi0gbW91bnRzCgpmc19zZXR1cDoKLSBsYWJlbDogZXRjZF9zdG9yYWdlCiAgZmlsZXN5c3RlbTogeGZzCiAgZGV2aWNlOiAvZGV2L3h2ZGMKICBwYXJ0aXRpb246IGF1dG8KLSBsYWJlbDogZW1wdHlkaXIKICBmaWxlc3lzdGVtOiB4ZnMKICBkZXZpY2U6IC9kZXYveHZkZAogIHBhcnRpdGlvbjogYXV0bwoKcnVuY21kOgotIG1rZGlyIC1wIC92YXIvbGliL2V0Y2QKLSBta2RpciAtcCAvdmFyL2xpYi9vcmlnaW4vb3BlbnNoaWZ0LmxvY2FsLnZvbHVtZXMKCm1vdW50czoKLSBbIC9kZXYveHZkYywgL3Zhci9saWIvZXRjZCwgeGZzLCAiZGVmYXVsdHMiIF0KLSBbIC9kZXYveHZkZCwgL3Zhci9saWIvb3JpZ2luL29wZW5zaGlmdC5sb2NhbC52b2x1bWVzLCB4ZnMsICJkZWZhdWx0cyxncXVvdGEiIF0KCgp3cml0ZV9maWxlczoKLSBjb250ZW50OiB8CiAgICBERVZTPScvZGV2L3h2ZGInCiAgICBWRz1kb2NrZXJfdm9sCiAgICBEQVRBX1NJWkU9OTUlVkcKICAgIEVYVFJBX0RPQ0tFUl9TVE9SQUdFX09QVElPTlM9Ii0tc3RvcmFnZS1vcHQgZG0uYmFzZXNpemU9M0ciCiAgcGF0aDogL2V0Yy9zeXNjb25maWcvZG9ja2VyLXN0b3JhZ2Utc2V0dXAKICBvd25lcjogcm9vdDpyb290Cgp1c2VyczoKLSBkZWZhdWx0CgpzeXN0ZW1faW5mbzoKICBkZWZhdWx0X3VzZXI6CiAgICBuYW1lOiBlYzItdXNlcg==",
"NodeDockerVolSize": 25,
"NodeDockerVolType": "gp2",
"NodeEmptyVolSize": 50,
"NodeEmptyVolType": "gp2",
"NodeRootVolSize": 15,
"NodeRootVolType": "gp2",
"NodeUserData": "I2Nsb3VkLWNvbmZpZwpjbG91ZF9jb25maWdfbW9kdWxlczoKLSBkaXNrX3NldHVwCi0gbW91bnRzCgpmc19zZXR1cDoKLSBsYWJlbDogZW1wdHlkaXIKICBmaWxlc3lzdGVtOiB4ZnMKICBkZXZpY2U6IC9kZXYveHZkYwogIHBhcnRpdGlvbjogYXV0bwoKcnVuY21kOgotIG1rZGlyIC1wIC92YXIvbGliL29yaWdpbi9vcGVuc2hpZnQubG9jYWwudm9sdW1lcwoKbW91bnRzOgotIFsgL2Rldi94dmRjLCAvdmFyL2xpYi9vcmlnaW4vb3BlbnNoaWZ0LmxvY2FsLnZvbHVtZXMsIHhmcywgImRlZmF1bHRzLGdxdW90YSIgXQoKd3JpdGVfZmlsZXM6Ci0gY29udGVudDogfAogICAgREVWUz0nL2Rldi94dmRiJwogICAgVkc9ZG9ja2VyX3ZvbAogICAgREFUQV9TSVpFPTk1JVZHCiAgICBFWFRSQV9ET0NLRVJfU1RPUkFHRV9PUFRJT05TPSItLXN0b3JhZ2Utb3B0IGRtLmJhc2VzaXplPTNHIgogIHBhdGg6IC9ldGMvc3lzY29uZmlnL2RvY2tlci1zdG9yYWdlLXNldHVwCiAgb3duZXI6IHJvb3Q6cm9vdAoKdXNlcnM6Ci0gZGVmYXVsdAoKc3lzdGVtX2luZm86CiAgZGVmYXVsdF91c2VyOgogICAgbmFtZTogZWMyLXVzZXI=",
"PublicHostedZone": "dwojciec.com",
"Region": "eu-central-1",
"Route53HostedZone": "dwojciec.com.",
"S3BucketName": "openshift-infra-ocp-registry-dwojciec",
"S3User": "openshift-infra-s3-openshift-user",
"SubnetAvailabilityZones": "eu-central-1a,eu-central-1b",
"SubnetCidrBlocks": "10.20.1.0/24,10.20.2.0/24,10.20.3.0/24,10.20.4.0/24,10.20.5.0/24,10.20.6.0/24",
"VpcCidrBlock": "10.20.0.0/16",
"VpcName": "ose-multi-az-vpc-openshift-infra"
},
"template_url": null,
"validate_certs": true
},
"module_name": "cloudformation"
}
}
MSG:
Template error: Fn::Select cannot select nonexistent value at index 2
PLAY RECAP *********************************************************************
localhost : ok=3 changed=2 unreachable=0 failed=1
I tried using the default region us-east-1 instead of the European region and I received this issue in the CloudFormation console:
CREATE_FAILED AWS::EC2::Subnet PublicSubnet3 Value (us-east-1c) for parameter availabilityZone is invalid. Subnets can currently only be created in the following availability zones: us-east-1e, us-east-1a, us-east-1b, us-east-1d.
I only succeeded by using: --region us-east-2 --ami ami-0a33696f
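A way to see why both regions fail: CloudFormation's Fn::Select is zero-indexed, and the error message above says the template selects index 2 from the AZ list, i.e. it expects a third availability zone. The eu-central-1 list passed in (SubnetAvailabilityZones above) has only two entries, which can be emulated roughly like this:

```shell
# eu-central-1 exposed only two AZs at the time of this report; selecting
# index 2 (the third element) from that list has nothing to return.
azs="eu-central-1a,eu-central-1b"
first=$(echo "$azs" | cut -d, -f1)
third=$(echo "$azs" | cut -d, -f3)   # empty: no third field exists
echo "first=$first third=$third"
```

Regions that expose three or more usable AZs (like us-east-2 here) avoid the error.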
Hi. Even though I have set the --public-hosted-zone parameter, accessing the console (for example https://openshift-master.MYDOMAIN.TLD/console/) redirects me to
https://openshift-master.rcook-aws.sysdeseng.com/oauth2callback/github?error=redirect_uri_mismatch&error_description=The+redirect_uri+MUST+match+the+registered+callback+URL+for+this+application.&error_uri=https%3A%2F%2Fdeveloper.github.com%2Fv3%2Foauth%2F%23redirect-uri-mismatch&...
It's possible that this has already been solved in the base openshift-ansible and we should be reusing a pattern from there.
I'm trying to deploy the aws-ansible reference architecture, and get this error when running ose-on-aws.py, seemingly when it tries to apply playbooks/openshift-install.yaml:
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.
The error appears to have been in '/vagrant/openshift-ansible-contrib/reference-architecture/aws-ansible/playbooks/roles/prerequisite/tasks/main.yaml': line 2, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
---
- name: Gather facts
^ here
I've tried this on both Mac OSX and CentOS 7 in Vagrant (above). In both cases all dependencies are installed and openshift-ansible is checked out - although on OSX the destination path has to be changed to /usr/local/share/... due to OSX security policy on /usr/share.
Me again :)
I'm reading this doc on OCP/AWS integration but can't find the AWS_ vars in any of the specified files.
Am I missing something?
Is there a blocker to using boto3? I'd like to use the cloudformation_facts module, and it's only in boto3.
Maybe more a question than an issue.
After provisioning I got all the infrastructure deployed with security groups, EC2 instances, etc.
As per the reference architecture, the bastion is the ONLY SSH entry point to the platform. But we are running Ansible from outside AWS, so access to the master hosts to build OpenShift itself is not permitted by the security group.
Action failing:
- hosts: cluster_hosts
gather_facts: yes
become: yes
serial: 1
user: ec2-user
vars_files:
- vars/main.yaml
roles:
- rhsm-subscription
The error is simple: no SSH possible.
PLAY [bastion] *****************************************************************
TASK [host-up : check to see if host is available] *****************************
ok: [bastion.bgol.lt] => {"changed": false, "elapsed": 20, "path": null, "port": 22, "search_regex": null, "state": "started"}
PLAY [cluster_hosts] ***********************************************************
TASK [setup] *******************************************************************
fatal: [ose-master01.bgol.lt]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
PLAY RECAP *********************************************************************
bastion.bgol.lt : ok=1 changed=0 unreachable=0 failed=0
localhost : ok=5 changed=5 unreachable=0 failed=0
ose-master01.bgol.lt : ok=0 changed=0 unreachable=1 failed=0
Does this mean we can't build the infrastructure using an external Ansible master, or am I missing something?
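One common way to handle this (a sketch only; the hostnames come from the recap above, and whether the repo already generates an equivalent config is worth checking) is an ~/.ssh/config that tunnels every cluster host through the bastion:

```
# Hypothetical ssh config: reach the private masters/nodes via the bastion.
Host bastion
    HostName bastion.bgol.lt
    User ec2-user

Host *.bgol.lt
    User ec2-user
    ProxyCommand ssh -W %h:%p bastion
```

With this in place (and matching ssh_args in ansible.cfg), Ansible run from outside AWS can reach ose-master01.bgol.lt even though only the bastion's security group allows inbound SSH.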
Error message "- The tar archive is not a valid image." when running the ./gcloud script on Mac.