
hive's Introduction

OpenShift Hive

API driven OpenShift 4 cluster provisioning and management.

Hive is an operator which runs as a service on top of Kubernetes/OpenShift. The Hive service can be used to provision and perform initial configuration of OpenShift clusters.
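
For illustration, a minimal ClusterDeployment manifest might look like the following. This is only a sketch using the current hive.openshift.io/v1 field names; older v1alpha1 clusters (as seen in some of the issues below) used a different schema, and all names here are placeholders.

apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: mycluster
  namespace: mycluster
spec:
  clusterName: mycluster
  baseDomain: example.com            # cluster DNS becomes <clusterName>.<baseDomain>
  platform:
    aws:
      region: us-east-1
      credentialsSecretRef:
        name: mycluster-aws-creds    # secret holding AWS credentials (placeholder name)
  provisioning:
    imageSetRef:
      name: openshift-release        # ClusterImageSet naming the release image (placeholder)
    installConfigSecretRef:
      name: mycluster-install-config # secret containing the install-config (placeholder)
  pullSecretRef:
    name: mycluster-pull-secret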

Supported cloud providers

  • AWS
  • Azure
  • Google Cloud Platform
  • IBM Cloud
  • OpenStack
  • vSphere

In the future Hive will support more cloud providers.

Documentation

hive's People

Contributors

2uasimojo, abhinavdahiya, abraverm, abutcher, akhil-rane, celebdor, csrwng, dgoodwin, dhellmann, dlom, fxierh, gregsheremeta, gyliu513, hassenius, jianping-shu, jmelis, jstuever, lalatendumohanty, lleshchi, maorfr, mjlshen, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, shivamchamoli, staebler, suhanime, waseem-h, wking, yithian

hive's Issues

Stuck uninstall when deleting a ClusterDeployment before provisioning is complete

I deleted a ClusterDeployment before provisioning was complete. Now the uninstall pod is stuck trying to delete network interfaces:

time="2018-11-28T02:14:23Z" level=debug msg="Exiting deleting EIPs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:47Z" level=debug msg="Deleting internet gateways (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:47Z" level=debug msg="deleting internet gateway: igw-0597f04b2b21e9150"
time="2018-11-28T05:32:47Z" level=debug msg="detaching Internet GW igw-0597f04b2b21e9150 from VPC vpc-02dbc31a784cf1a70"
time="2018-11-28T05:32:47Z" level=debug msg="error detaching igw: error detaching internet gateway: DependencyViolation: Network vpc-02dbc31a784cf1a70 has some mapped public address(es). Please unmap those public address(es) before detaching the gateway.\n\tstatus code: 400, request id: 3f1c9453-42d5-4680-b708-1538dc27b1d8"
time="2018-11-28T05:32:47Z" level=debug msg="Exiting deleting internet gateways (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:57Z" level=debug msg="Deleting subnets (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:57Z" level=debug msg="error deleting subnet: DependencyViolation: The subnet 'subnet-0aab27e3305a5f066' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: ad047fa6-2244-41c5-99c4-067780dc3c8b"
time="2018-11-28T05:32:57Z" level=debug msg="error deleting subnet: DependencyViolation: The subnet 'subnet-0524453910b3f2445' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: c8acf976-0942-4b66-abf0-1636aea03791"
time="2018-11-28T05:32:57Z" level=debug msg="Exiting deleting subnets (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:57Z" level=debug msg="Deleting VPCs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:58Z" level=debug msg="Deleting load balancers (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="from 16 total load balancers, 0 scheduled for deletion"
time="2018-11-28T05:32:58Z" level=debug msg="Exiting deleting load balancers (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="Deleting V2 load balancers (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="from 4 total V2 load balancers, 0 scheduled for deletion"
time="2018-11-28T05:32:58Z" level=debug msg="Deleting target groups (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="from 7 total target groups, 0 scheduled for deletion"
time="2018-11-28T05:32:58Z" level=debug msg="Exiting deleting target groups (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="Exiting deleting V2 load balancers (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="deleting VPC: vpc-02dbc31a784cf1a70"
time="2018-11-28T05:32:58Z" level=debug msg="error deleting VPC vpc-02dbc31a784cf1a70: DependencyViolation: The vpc 'vpc-02dbc31a784cf1a70' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: 623dcf6e-7f39-491d-8379-2b3b0aa84a8c"
time="2018-11-28T05:32:58Z" level=debug msg="Exiting deleting VPCs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:33:09Z" level=debug msg="Deleting EIPs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:33:09Z" level=debug msg="deleting EIP: eni-0dbf268139e5a6263"
time="2018-11-28T05:33:09Z" level=debug msg="deleting network interface: eni-0dbf268139e5a6263"
time="2018-11-28T05:33:09Z" level=debug msg="error deleting network iface: InvalidParameterValue: Network interface 'eni-0dbf268139e5a6263' is currently in use.\n\tstatus code: 400, request id: 5714fd9f-840a-4cc7-92dc-fe8b808297a4"
time="2018-11-28T05:33:09Z" level=debug msg="error deleting network iface: InvalidParameterValue: Network interface 'eni-0dbf268139e5a6263' is currently in use.\n\tstatus code: 400, request id: 5714fd9f-840a-4cc7-92dc-fe8b808297a4"
time="2018-11-28T05:33:09Z" level=debug msg="deleting EIP: eni-091fed9de78f91bd6"
time="2018-11-28T05:33:09Z" level=debug msg="deleting network interface: eni-091fed9de78f91bd6"
time="2018-11-28T05:33:10Z" level=debug msg="error deleting network iface: InvalidParameterValue: Network interface 'eni-091fed9de78f91bd6' is currently in use.\n\tstatus code: 400, request id: d3554626-ac73-4168-a020-a49d37299205"
time="2018-11-28T05:33:10Z" level=debug msg="error deleting network iface: InvalidParameterValue: Network interface 'eni-091fed9de78f91bd6' is currently in use.\n\tstatus code: 400, request id: d3554626-ac73-4168-a020-a49d37299205"
time="2018-11-28T05:33:10Z" level=debug msg="Exiting deleting EIPs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"

These lines have been repeating for ~18 hours now.
ealfassa-test-7-8xqhk-uninstall-bvzrb 1/1 Running 0 18h

Cluster provisioning without cluster installation

Goal: on an existing (freshly installed) cluster, have Hive set up all of the standard SRE tooling, especially monitoring. The installation is openshift-installer based, but is not in any way controlled by Hive.

Issues I'd like to clarify regarding this goal:

  1. What communication channel(s) does Hive use for Hive<->end-cluster syncing (especially SyncSets, as they form the basis AFAIK), and how can such a channel be set up? (See the sketch after this list.)
  2. Are there any additional installation steps that a Hive-managed cluster is expected to have completed after openshift-install?
  3. Can the communication channels between Hive and the end cluster be severed once everything is provisioned? The idea is to have monitored clusters without them being managed.
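
Regarding question 1: as far as I understand, Hive talks to the end cluster using the admin kubeconfig it stores as a secret alongside the ClusterDeployment, and SyncSets are applied over that connection. A hedged sketch of a SyncSet follows; names are placeholders, and the API version shown is the current hive.openshift.io/v1 (the issues in this repo predate it with v1alpha1):

apiVersion: hive.openshift.io/v1
kind: SyncSet
metadata:
  name: sre-monitoring-config
  namespace: mycluster
spec:
  clusterDeploymentRefs:
  - name: mycluster              # ClusterDeployment(s) in this namespace to sync to
  resourceApplyMode: Sync        # Sync also removes resources deleted from the SyncSet
  resources:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: sre-monitoring
      namespace: openshift-monitoring
    data:
      example: "value"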

Hive fails to provision cluster - index out of range error

During the process of provisioning a new cluster, hive failed with the following error:

Error: Error applying plan:

3 error(s) occurred:

* module.vpc.data.aws_route_table.worker[5]: data.aws_route_table.worker.5: Your query returned no results. Please change your search criteria and try again.
*module.vpc.aws_route.to_nat_gw[5]: index 5 out of range for list aws_route_table.private_routes.*.id (max 5) in:

${aws_route_table.private_routes.*.id[count.index]}
*module.vpc.aws_route_table_association.worker_routing[5]: index 5 out of range for list aws_route_table.private_routes.*.id (max 5) in:

${aws_route_table.private_routes.*.id[count.index]}

After re-running with the same arguments to provision another cluster, it passed this phase successfully, so this seems to be an intermittent failure.

Support for post-installation hooks

We have a use case where we would like to perform additional configuration steps once the cluster has been installed. These steps will most probably need the cluster's admin kubeconfig to run. Will Hive support this kind of post-install hook?

CC: @zgalor

incorrect console URL on deployed clusters

Hi,

Our QE noticed that webConsoleURL for deployed clusters has an incorrect value. We've been able to reproduce this every time.

The webConsoleURL in the ClusterDeployment status is usually something like https://yasun-stg4-03-api.devcluster.openshift.com:6443/console, which leads to an error page, but running oc get route -n openshift-console on the deployed cluster shows a different URL, e.g. console-openshift-console.apps.yasun-stg4-03.devcluster.openshift.com, and that URL works.

If I'm reading the Hive source code correctly, it assumes the console URL can be derived from the API server URL, and that assumption is apparently not accurate:

// We should be able to assume only one cluster in here:
server := cluster.Server
cdLog.Debugf("found cluster API URL in kubeconfig: %s", server)
u, err := url.Parse(server)
if err != nil {
    return err
}
cd.Status.APIURL = server
u.Path = path.Join(u.Path, "console")
cd.Status.WebConsoleURL = u.String()

Can the `hiveImage` and `installerImage` be removed or optional?

Currently, in order to create a cluster deployment, the user of the API has to explicitly specify the hiveImage and installerImage parameters. This is error prone, as it is easy to make mistakes and to use images that aren't in sync with the rest of Hive. Can these be removed or made optional? If optional, Hive should have enough information/configuration to decide which images are the right ones to use.
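
One hedged sketch of how this could look (and roughly the direction later Hive versions took): a ClusterImageSet names a release image, and ClusterDeployments reference it instead of carrying image fields directly. Names and versions below are placeholders:

apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-v4.x            # placeholder name
spec:
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.x-placeholder
  # the installer image is derived from the release payload, and the hive image is
  # configured centrally rather than per ClusterDeployment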

Retrying Install Can Fail on IAM role already existing.

Not sure how this happens; it looks like this role somehow didn't get cleaned up. The cluster install had been retried several times by the point this happened.

module.vpc.aws_lb_listener.api_internal_api: Creation complete after 1s
module.vpc.aws_lb_listener.api_internal_services: Creation complete after 1s
aws_route53_record.etcd_cluster: Still creating... (20s elapsed)
module.dns.aws_route53_record.tectonic_api_external: Still creating... (20s elapsed)
module.dns.aws_route53_record.tectonic_api_internal: Still creating... (10s elapsed)
aws_route53_record.etcd_cluster: Still creating... (30s elapsed)
module.dns.aws_route53_record.tectonic_api_external: Still creating... (30s elapsed)
module.dns.aws_route53_record.tectonic_api_internal: Still creating... (20s elapsed)
aws_route53_record.etcd_cluster: Still creating... (40s elapsed)
module.dns.aws_route53_record.tectonic_api_external: Creation complete after 35s (ID: Z2I29TC6NNC5SM_jamesh-test-4-api.aws.openshift.com_A)                                                                        
aws_route53_record.etcd_cluster: Creation complete after 45s (ID: Z1S45H7KZWOOGM__etcd-server-ssl._tcp.jamesh-test-4_SRV)                                                                                          
module.dns.aws_route53_record.tectonic_api_internal: Still creating... (30s elapsed)
module.dns.aws_route53_record.tectonic_api_internal: Still creating... (40s elapsed)
module.dns.aws_route53_record.tectonic_api_internal: Creation complete after 46s (ID: Z1S45H7KZWOOGM_jamesh-test-4-api.aws.openshift.com_A)                                                                        

Error: Error applying plan:

1 error(s) occurred:

* module.iam.aws_iam_role.worker_role: 1 error(s) occurred:

* aws_iam_role.worker_role: Error creating IAM Role jamesh-test-4-worker-role: EntityAlreadyExists: Role with name jamesh-test-4-worker-role already exists.                                                       
        status code: 409, request id: a953a657-ecd9-11e8-80ce-5fcb6c926834

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.


level=fatal msg="Error executing openshift-install: failed to fetch Cluster: failed to generate asset \"Cluster\": failed to run terraform: failed to execute Terraform: exit status 1"                            
time="2018-11-20T15:37:39Z" level=error msg="error capturing openshift-install stdout" error="read |0: file already closed"                                                                                        
time="2018-11-20T15:37:39Z" level=error msg="error capturing openshift-install stderr" error="read |0: file already closed"                                                                                        
time="2018-11-20T15:37:39Z" level=error msg="error running openshift-install" error="exit status 1"
time="2018-11-20T15:37:39Z" level=info msg="uploading cluster metadata"
time="2018-11-20T15:37:39Z" level=error msg="error creating metadata configmap" error="configmaps \"jamesh-test-4-854wl-metadata\" already exists"                                                                 
time="2018-11-20T15:37:39Z" level=fatal msg="error uploading cluster metadata.json" error="configmaps \"jamesh-test-4-854wl-metadata\" already exists"

Role Used For Cluster Installs Is Not Updated For New Permissions

Our code in the clusterdeployment controller that creates the role used to run the installer pod ("cluster-installer") does not update the role if it already exists. We should probably ensure the permissions on the role are kept correct.

Workaround is to delete the role and let it be recreated.

Persist install logs for debugging?

Hi. Currently the main log is the installer pod's log. These logs have a limited lifetime and also get trampled if the pod keeps restarting.
Are there any plans to copy them somewhere more persistent?

There are also various other logs that might be of interest: the controller's logs (though I guess it could create Events when interesting things happen), bootstrap node logs, and so on.

cc @elad661 @tzvatot

Support optional adopting ownership of AWS cred secret (possibly SSH keys as well)

SD would like the AWS credentials secret to go away with the cluster deployment. We did not support this because we thought credentials would be shared by multiple clusters. However, we could optionally add a boolean to the ClusterDeployment spec indicating that it should adopt and own the AWS creds secret (and probably the SSH key secret as well). The controller could then make sure we own these objects as soon as it's possible to do so.

ClusterDeployment without aws credentials => uninstall stuck

Might be related to #114 ? Might also be NOTABUG, PEBKAC 😉

I created ClusterDeployments (actually 2 of them) without providing AWS credentials; the AWS secrets contain:

data:
  awsAccessKeyId: null
  awsSecretAccessKey: null

Naturally, no cluster got created.
I deleted the ClusterDeployments and now uninstall pods seem stuck.
Logs: https://gist.github.com/cben/3bf9c8c5e4e8c8f207c24b7460b2ee0c
(as can be expected lots of UnauthorizedOperation and AccessDenied...)

Installer Not Completing Successfully When Run in Hive Pods

We're getting failed clusters where all masters are left in the NotReady state due to CNI not being configured, but only when we run the installer in Hive pods. If we run externally, even with the same image via podman, the cluster comes up fine.

Suspect this points to a problem in how Hive's install manager is executing the installer.

End of a failed Hive provision:

level=info msg="API v1.11.0+d4cacc0 up"                                                                                                                                                                     [0/3239]
level=debug msg="added kube-controller-manager.1566fa9c9ff3b86e: ip-10-0-10-83_b3efcb53-e801-11e8-8d14-0e50234f79ea became leader"                                                                                 
level=debug msg="added kube-scheduler.1566fa9cca7f44e9: ip-10-0-10-83_b3feb2d0-e801-11e8-98c8-0e50234f79ea became leader"
level=warning msg="RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 64"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=64&watch=true: dial tcp 54.234.142.41:6443: con
nect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=64&watch=true: dial tcp 18.214.173.45:6443: con
nect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=64&watch=true: dial tcp 54.234.142.41:6443: con
nect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=64&watch=true: dial tcp 54.211.8.168:6443: conn
ect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=64&watch=true: dial tcp 54.161.146.11:6443: con
nect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=64&watch=true: dial tcp 18.207.48.49:6443: conn
ect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=64&watch=true: dial tcp 54.159.141.39:6443: con
nect: connection refused"
level=error msg="waiting for bootstrap-complete: watch closed before UntilWithoutRetry timeout"
level=info msg="Install complete! Run 'export KUBECONFIG=/output/auth/kubeconfig' to manage your cluster."
level=info msg="After exporting your kubeconfig, run 'oc -h' for a list of OpenShift client commands."
time="2018-11-14T11:58:23Z" level=info msg="uploading cluster metadata"
time="2018-11-14T11:58:23Z" level=info msg="uploaded cluster metadata configmap" configMapName=dgoodwin1-metadata
time="2018-11-14T11:58:23Z" level=info msg="uploading admin kubeconfig"
time="2018-11-14T11:58:23Z" level=info msg="uploaded admin kubeconfig secret" secretName=dgoodwin1-admin-kubeconfig

Running in Podman with:

sudo podman run -ti --rm -e AWS_ACCESS_KEY_ID=SNIP -e AWS_SECRET_ACCESS_KEY=SNIP -e OPENSHIFT_INSTALL_CLUSTER_NAME="dgoodwin2" -e OPENSHIFT_INSTALL_BASE_DOMAIN="SNIP" -e OPENSHIFT_INSTALL_EMAIL_ADDRESS="SNIP" -e OPENSHIFT_INSTALL_PASSWORD="password" -e  OPENSHIFT_INSTALL_SSH_PUB_KEY="SNIP" -e OPENSHIFT_INSTALL_PULL_SECRET="SNIP" -e OPENSHIFT_INSTALL_PLATFORM=aws -e OPENSHIFT_INSTALL_AWS_REGION=us-east-1 -v /home/dgoodwin/go/src/github.com/openshift/installer/output:/output:Z registry.svc.ci.openshift.org/openshift/origin-v4.0:installer create cluster --log-level=debug

The podman install comes up healthy with different output near the end:

WARNING Failed to connect events watcher: Get https://dgoodwin2-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=1940&watch=true: dial tcp 54.145.130.123:6443: connect: co
nnection refused                                                                                                                                                                                                    
DEBUG added openshift-master-controllers.1566fde611cb83a1: controller-manager-nnvhs became leader                                                                                                                  
DEBUG added kube-controller-manager.1566fde67693b5ec: ip-10-0-17-165_06eba434-e80a-11e8-9e6c-02b5eb23c7ce became leader                                                                                             
DEBUG added bootstrap-complete: cluster bootstrapping has completed              
INFO Destroying the bootstrap resources...                                                                                                                                                                          
DEBUG Stopping RetryWatcher.                                                             
INFO Using Terraform to destroy bootstrap resources...    

The code we use to execute the installer can be seen here: https://github.com/openshift/hive/blob/master/contrib/pkg/installmanager/installmanager.go#L246

This was working previously but may have gone bad with installer changes around monitoring the logs from the bootstrap node.

Appears to be the root cause of openshift/cluster-network-operator#35

CC @wking

Install/Uninstall Jobs Must Be Deleted For Changes to Take Effect

Certain fields in the cluster deployment affect the jobs that are created, image overrides for example. If the user updates the cluster deployment to correct a problem with these fields, the jobs are never updated and must be deleted instead, at which point they are recreated correctly. It would be nice if we made some attempt to update them for critical fields like this.

How to get OpenShift version out of the ClusterDeployment?

In UHC, we're looking at the ClusterDeployment's status.clusterVersionStatus.current.version, hoping that this would give us the version of OpenShift deployed on the cluster, but the values we're seeing are something like 0.0.1-2018-12-08-172651.

I assume that this is the version of the Cluster Version Operator, and not OpenShift itself (as this is OpenShift 4, not 0.0.1).

Is there a canonical way to convert this string into an OpenShift version? Is this string expected to always look like this, or to turn into the user facing 4.0.x style number when OpenShift 4 is GA?

Thanks.

cluster provisioning doesn't finish

I've provisioned a cluster using api.openshift.com.

As a result I see the instances were created in AWS; however, the bootstrap node is still running.
The end of the output of "journalctl --unit=bootkube.service" on the bootstrap node is:

Nov 26 14:32:48 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-apiserver/openshift-kube-apiserver DoesNotExist
Nov 26 14:32:53 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-controller-manager/openshift-kube-controller-manager-ip-10-0-43-201.ec2.internal Pending
Nov 26 14:32:53 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-cluster-version/cluster-version-operator-8bb6cff75-7fhxd Running
Nov 26 14:32:53 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-apiserver/openshift-kube-apiserver DoesNotExist
Nov 26 14:32:53 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-18-37.ec2.internal Running
Nov 26 14:34:38 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-apiserver/openshift-kube-apiserver-ip-10-0-43-201.ec2.internal Pending
Nov 26 14:34:38 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-18-37.ec2.internal Running
Nov 26 14:34:38 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-controller-manager/openshift-kube-controller-manager-ip-10-0-43-201.ec2.internal Pending
Nov 26 14:34:38 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-cluster-version/cluster-version-operator-8bb6cff75-7fhxd Running
Nov 26 14:34:43 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-apiserver/openshift-kube-apiserver-ip-10-0-43-201.ec2.internal Pending
Nov 26 14:34:43 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-18-37.ec2.internal Running
Nov 26 14:34:43 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-controller-manager/openshift-kube-controller-manager-ip-10-0-43-201.ec2.internal Running
Nov 26 14:34:43 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-cluster-version/cluster-version-operator-8bb6cff75-7fhxd Running
Nov 26 14:34:48 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-18-37.ec2.internal Running
Nov 26 14:34:48 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-controller-manager/openshift-kube-controller-manager-ip-10-0-43-201.ec2.internal Running
Nov 26 14:34:48 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-cluster-version/cluster-version-operator-8bb6cff75-7fhxd Running
Nov 26 14:34:48 ip-10-0-7-196 bootkube.sh[4235]: Pod Status:openshift-kube-apiserver/openshift-kube-apiserver-ip-10-0-43-201.ec2.internal Running
Nov 26 14:34:48 ip-10-0-7-196 bootkube.sh[4235]: All self-hosted control plane components successfully started
Nov 26 14:34:48 ip-10-0-7-196 bootkube.sh[4235]: Tearing down temporary bootstrap control plane...

And on one of the master nodes I see:
[core@ip-10-0-13-248 ~]$ sudo crictl logs $(sudo crictl ps --pod=$(sudo crictl pods --name=etcd-member --quiet) --quiet)
E1127 08:00:27.762235 3992 remote_runtime.go:278] ContainerStatus "CONTAINER" from runtime service failed: rpc error: code = Unknown desc = container with ID starting with CONTAINER not found: ID does not exist
FATA[0000] rpc error: code = Unknown desc = container with ID starting with CONTAINER not found: ID does not exist
[core@ip-10-0-13-248 ~]$ sudo crictl pods --name=etcd-member
POD ID CREATED STATE NAME NAMESPACE ATTEMPT
73e729675f488 17 hours ago Ready etcd-member-ip-10-0-13-248.ec2.internal kube-system 1
0110484f5a685 18 hours ago NotReady etcd-member-ip-10-0-13-248.ec2.internal kube-system 0

Cluster provision fails on missing cluster-config configmap

During provisioning of a cluster using Hive, the network-operator pod ends up in CreateContainerConfigError:

[core@ip-10-0-3-26 ~]$ sudo oc get pods --config=/var/opt/tectonic/auth/kubeconfig --all-namespaces 
NAMESPACE                              NAME                                                              READY     STATUS                       RESTARTS   AGE
kube-system                            etcd-member-ip-10-0-8-201.ec2.internal                            1/1       Running                      0          1h
kube-system                            kube-proxy-n64xx                                                  1/1       Running                      0          1h
kube-system                            kube-scheduler-mjsc9                                              0/1       ContainerCreating            0          1h
kube-system                            tectonic-network-operator-gjps2                                   1/1       Running                      0          1h
openshift-cluster-api                  machine-api-operator-649c446d5b-49rf9                             0/1       Pending                      0          1h
openshift-cluster-dns-operator         cluster-dns-operator-6b8b9cbbcd-rmhsx                             0/1       Pending                      0          1h
openshift-cluster-network-operator     cluster-network-operator-cz466                                    0/1       CreateContainerConfigError   0          1h
openshift-cluster-version              cluster-version-operator-4lkmd                                    1/1       Running                      0          1h
openshift-core-operators               openshift-cluster-kube-apiserver-operator-576768b698-7mwg9        0/1       Pending                      0          1h
openshift-core-operators               openshift-cluster-kube-controller-manager-operator-76c57b55sgf8   0/1       Pending                      0          1h
openshift-core-operators               openshift-cluster-kube-scheduler-operator-5b6c66dd59-cc9nr        0/1       Pending                      0          1h
openshift-core-operators               openshift-cluster-openshift-apiserver-operator-759fd945d8-gw8tl   0/1       Pending                      0          1h
openshift-core-operators               openshift-cluster-openshift-controller-manager-operator-58prxwq   0/1       Pending                      0          1h
openshift-core-operators               openshift-service-cert-signer-operator-7fd688bd7f-rjft5           0/1       Pending                      0          1h
openshift-machine-config-operator      machine-config-operator-744f64d9c7-lrhtk                          0/1       Pending                      0          1h
openshift-operator-lifecycle-manager   catalog-operator-cf4cd9c5c-hhc9h                                  0/1       Pending                      0          1h
openshift-operator-lifecycle-manager   olm-operator-7f44dd6495-fv2wg                                     0/1       Pending                      0          1h
openshift-operator-lifecycle-manager   package-server-54d99f7dfc-s2jbq                                   0/1       Pending                      0          1h

Running sudo oc get -o yaml pod -n openshift-cluster-network-operator cluster-network-operator-cz466 --config=/var/opt/tectonic/auth/kubeconfig shows that the pod is waiting for the cluster-config configmap:

containerStatuses:
  - image: registry.svc.ci.openshift.org/openshift/origin-v4.0-20181107015454@sha256:af8333760046cefb84d5f222a96e28c8af06823de8e2fed44647d614cedc925a
    imageID: ""
    lastState: {}
    name: cluster-network-operator
    ready: false
    restartCount: 0
    state:
      waiting:
        message: configmaps "cluster-config" not found
        reason: CreateContainerConfigError

Please report installation errors in the ClusterDeployment status

Right now the ClusterDeployment status is very vague: a boolean of installed: true or installed: false. However, it seems that in some cases Hive can know when the installer errored out, for example:

if err := m.updateClusterDeploymentStatus(cd, adminKubeconfigSecret.Name, m); err != nil {
    // non-fatal. log and continue.
    // will fix up any updates to the clusterdeployment in the periodic controller
    m.log.WithError(err).Warning("error updating cluster deployment status")
}
if installErr != nil {
    m.log.WithError(installErr).Fatal("failed due to install error")
}
return nil

In these cases, it would be beneficial if the status of the ClusterDeployment contained error: true or even error: "failed due to installer error".
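
As a purely illustrative sketch (not necessarily the shape Hive ends up using), a condition on the status could carry the failure:

status:
  installed: false
  conditions:
  - type: ProvisionFailed
    status: "True"
    reason: InstallError
    message: "failed due to install error: exit status 1"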

Clarify Cluster ID, UUID, Name in API

Right now we have the cluster deployment name, a ClusterID, and a ClusterUUID.

The installer has a Name, and a ClusterID. Name is used in DNS in combination with the base domain.

Our ClusterUUID -> their ClusterID
Our ClusterID -> their cluster Name.
Our cluster deployment name is not used.

SD has raised a use case where name is not unique even within one account, as it could be combined with different base domains. As such we can't really map our cluster name to their cluster name. (We're going to need something validating uniqueness)

We should probably rename our ClusterID to ClusterName to eliminate the confusion around ClusterID in each API. Or should we make it "DNSName" instead?

Should we then rename our ClusterUUID to just ClusterID to match the installer? Or is the UUID more precise and clear?

The result would be:

ClusterDeployment:
  Name: foo-xldkj
  DNSName: foo
  BaseDomain: example.com
  ClusterUUID: UUIDHERE

Resource deletion races with MachineAPI actuators

When the installer attempts to tear down a cluster, it ends up racing with MAO, which is desperately trying to replace the machines that keep getting deleted. The result is that a few machines get recreated before we completely tear down the control plane. The deletion logic in hive doesn't scan for new resources when it runs, so it never sees these new machines and will get stuck trying to delete dependent resources.

I think there are basically two options going forward:

  • Hive/Installer tells MAO (and every future infrastructure operator) to pause before destroying the cluster.
  • The deletion code continually scans for and deletes resources. This process would end once no more resources have been observed.

For what it's worth, I've observed this on both AWS and libvirt.

Some `hive-controller-manager` messages contain `ERROR: logging before flag.Parse`

Some of the messages in the log of the hive-controller-manager pod contain the prefix ERROR: logging before flag.Parse, for example:

ERROR: logging before flag.Parse: W1120 11:25:50.863560       1 reflector.go:341] github.com/openshift/hive/vendor/sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1alpha1.DNSZone ended with: The resourceVersion for the provided watch is too old.

This is confusing. In the above example it looks like an error, but it is actually a warning. Can this prefix be removed?

Cannot provision the openshift cluster

Following the document https://github.com/openshift/hive/blob/master/docs/using-hive.md, I cloned the latest Hive code and used hiveutil to create an OpenShift cluster. It always fails with Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.189.82.228:6443: connect: connection refused. The three master nodes and one bootstrap node are created in AWS.

More detailed log information is appended below:

time="2019-10-14T07:40:07Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.190.58.3:6443: connect: connection refused"
time="2019-10-14T07:40:07Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.190.58.3:6443: connect: connection refused"
time="2019-10-14T07:40:41Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.189.82.228:6443: connect: connection refused"
time="2019-10-14T07:40:41Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.189.82.228:6443: connect: connection refused"
time="2019-10-14T07:40:41Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.189.82.228:6443: connect: connection refused"
time="2019-10-14T07:41:15Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.189.82.228:6443: connect: connection refused"
time="2019-10-14T07:41:15Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.189.82.228:6443: connect: connection refused"
time="2019-10-14T07:41:15Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.clyang-oc-cluster.clyang.de:6443/version?timeout=32s: dial tcp 18.189.82.228:6443: connect: connection refused"
time="2019-10-14T07:41:29Z" level=debug msg="Fetching \"Install Config\"..."
time="2019-10-14T07:41:29Z" level=debug msg="Loading \"Install Config\"..."
time="2019-10-14T07:41:29Z" level=debug msg="Fetching \"Install Config\"..."
time="2019-10-14T07:41:29Z" level=debug msg="Fetching \"Install Config\"..."
time="2019-10-14T07:41:29Z" level=debug msg="Loading \"Install Config\"..."
time="2019-10-14T07:41:29Z" level=debug msg="Loading \"Install Config\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"SSH Key\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"SSH Key\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Base Domain\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Base Domain\"..."
time="2019-10-14T07:41:29Z" level=debug msg="    Loading \"Platform\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Cluster Name\"..."
time="2019-10-14T07:41:29Z" level=debug msg="    Loading \"Base Domain\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Pull Secret\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Platform\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"SSH Key\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Base Domain\"..."
time="2019-10-14T07:41:29Z" level=debug msg="    Loading \"Platform\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Cluster Name\"..."
time="2019-10-14T07:41:29Z" level=debug msg="    Loading \"Base Domain\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Pull Secret\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Platform\"..."
time="2019-10-14T07:41:29Z" level=debug msg="Using \"Install Config\" loaded from state file"
time="2019-10-14T07:41:29Z" level=debug msg="Reusing previously-fetched \"Install Config\""
time="2019-10-14T07:41:29Z" level=debug msg="    Loading \"Platform\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Cluster Name\"..."
time="2019-10-14T07:41:29Z" level=debug msg="    Loading \"Base Domain\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Pull Secret\"..."
time="2019-10-14T07:41:29Z" level=debug msg="  Loading \"Platform\"..."
time="2019-10-14T07:41:29Z" level=debug msg="Using \"Install Config\" loaded from state file"
time="2019-10-14T07:41:29Z" level=debug msg="Reusing previously-fetched \"Install Config\""
time="2019-10-14T07:41:29Z" level=debug msg="Using \"Install Config\" loaded from state file"
time="2019-10-14T07:41:29Z" level=debug msg="Reusing previously-fetched \"Install Config\""
time="2019-10-14T07:41:29Z" level=info msg="Pulling debug logs from the bootstrap machine"
time="2019-10-14T07:41:29Z" level=error msg="Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: failed to initialize the SSH agent: failed to read directory \"/root/.ssh\": open /root/.ssh: no such file or directory"
time="2019-10-14T07:41:29Z" level=fatal msg="Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded"
time="2019-10-14T07:41:29Z" level=info msg="Pulling debug logs from the bootstrap machine"
time="2019-10-14T07:41:29Z" level=error msg="Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: failed to initialize the SSH agent: failed to read directory \"/root/.ssh\": open /root/.ssh: no such file or directory"
time="2019-10-14T07:41:29Z" level=fatal msg="Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded"
time="2019-10-14T07:41:29Z" level=info msg="Pulling debug logs from the bootstrap machine"
time="2019-10-14T07:41:29Z" level=error msg="Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: failed to initialize the SSH agent: failed to read directory \"/root/.ssh\": open /root/.ssh: no such file or directory"

I have tried using openshift/installer to install an OpenShift cluster with the same configuration and credentials as used by Hive; it creates the cluster successfully in AWS.

Any comments? Thanks.

Cannot get the htpasswd secret created in the provisioned cluster

I have followed this guide to create a SyncIdentityProvider, but the htpasswd secret does not get created in the target cluster. Here is my SyncIdentityProvider file content:

---
apiVersion: hive.openshift.io/v1alpha1
kind: SyncIdentityProvider
metadata:
  name: allowall-identity-provider
spec:
  identityProviders:
  - name: htpasswd
    challenge: true
    login: true
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpasswd-zzl2r
  clusterDeploymentRefs:
  - name: "mycluster"

Am I missing anything? Update: I just tried creating the htpasswd-zzl2r secret in the openshift-config namespace, and everything is working now. Thanks.
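
For anyone hitting the same thing, a hedged sketch of the secret the fileData reference expects: the contents must live under the htpasswd key, and the value below is a truncated placeholder.

apiVersion: v1
kind: Secret
metadata:
  name: htpasswd-zzl2r
  namespace: openshift-config     # on the target cluster; where Hive expects it when syncing may differ by version
type: Opaque
data:
  htpasswd: dXNlcjE6JGFwcjEk...   # base64-encoded htpasswd file contents (placeholder)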

Drop Admin Password From API

This functionality was recently dropped from the installer in favor of using a static user with an auto-generated password.

We need to revendor the installer, adjust our API to match the change in openshift/installer#771, and I suspect we will need to pull out the generated admin password, similar to how we upload the admin kubeconfig after an install.

CC @jhernand: the installer dropped the functionality that let you specify the OpenShift admin password and email. Is this workable on your side?

Is it planned for Hive to provide information about the nodes once the cluster is deployed?

It would be really useful for us (the SD-A team) to get information about the nodes once the cluster is deployed: a list of nodes, their types, IP addresses, their OS version, and their container runtime version. All of this information is in the output of oc get nodes -o wide.

I was wondering if it's planned for Hive to provide this information in the ClusterDeployment object of a deployed cluster.

Support for disconnecting a cluster from Hive without de-provisioning it

We have a use case where users would like to end their relationship with us, but they would like to keep the cluster that they have created. How can this be achieved with Hive? Deleting the cluster is not OK for this, because it will de-provision it. In some cases we edited the ClusterDeployment manually to remove the finalizer, and then we deleted the ClusterDeployment object. Is that enough, or is there anything else that needs to be done to make sure that the cluster is completely disconnected from Hive? Also, will that be supported going forward? (A sketch of the finalizer edit is shown below.)
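
A hedged sketch of the manual workaround mentioned above: remove Hive's deprovision finalizer from the ClusterDeployment before deleting it, so the delete does not trigger a deprovision. The finalizer name below is what current Hive uses; verify against your version.

metadata:
  name: mycluster
  finalizers: []                  # was: ["hive.openshift.io/deprovision"]; with it removed, deleting the object skips deprovisioning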

Hive limitations on CIDR

It seems like Hive has limitations/constraints on the CIDR blocks for ServiceCIDR and PodCIDR.
Can you please share what the limitations are?
Could Hive calculate the ServiceCIDR and PodCIDR itself based on the VPCCIDRBlock?
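
For reference, a sketch of the relevant install-config networking block with the installer's usual defaults (field names vary by installer version, and as far as I know Hive passes these through to the installer). The main constraint is that the machine, service, and cluster (pod) networks must not overlap each other or the VPC CIDR:

networking:
  machineCIDR: 10.0.0.0/16        # must contain the VPC/subnet addresses
  serviceCIDR: 172.30.0.0/16      # the ServiceCIDR; called serviceNetwork in later install-config versions
  clusterNetworks:                # the PodCIDR; called clusterNetwork in later versions
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9           # later versions use hostPrefix: 23 instead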

Hive can't load the admin kubeconfig it created for a new cluster.

Description:

After creating a new cluster, Hive creates a secret with the new cluster's kubeconfig.
Hive then tries to load it and fails.

TL;DR: It looks like a generateName vs name bug.

What happens:

In my Hive logs I get:

DEBU[9541] reconcile complete                            clusterDeployment=yaacov-05-jnptv job=yaacov-05-jnptv-install namespace=unified-hybrid-cloud
ERRO[9542] unable to load admin kubeconfig               clusterDeployment=yaacov-05-jnptv controller=remotemachineset error="Secret \"yaacov-05-admin-kubeconfig\" not found" namespace=unified-hybrid-cloud
ERRO[9543] unable to load admin kubeconfig               clusterDeployment=yaacov-05-jnptv controller=remotemachineset error="Secret \"yaacov-05-admin-kubeconfig\" not found" namespace=unified-hybrid-cloud

When I do oc get secrets I see a kubeconfig secret named yaacov-05-jnptv-admin-kubeconfig

Use registry.svc.ci.openshift.org/openshift/hive-v4.0:hive as the default hive image

Presently we're using local image defaults in the code; this should match the installer:

defaultInstallerImage           = "registry.svc.ci.openshift.org/openshift/origin-v4.0:installer"                                                                                                           
defaultInstallerImagePullPolicy = corev1.PullAlways                                                                                                                                                         
defaultHiveImage                = "hive-controller:latest"                                                                                                                                                  
defaultHiveImagePullPolicy      = corev1.PullNever       

The pull policy should be Always as well.

'Zones' field does not work

Specifying the "Zones" field in the ClusterDeployment object should provision the nodes on the specified list of zones, Assuming I understand correctly. However this does not work - for example creating the following deployment:

Name:         degas-mtpjc
Namespace:    uhc-development
Labels:       api.openshift.com/id=1F4FyulChH0kFyhSlewODxbjYVV
              api.openshift.com/name=degas
              controller-tools.k8s.io=1.0
Annotations:  <none>
API Version:  hive.openshift.io/v1alpha1
Kind:         ClusterDeployment
...
Spec:
  Cluster UUID:  94d16fa8-3e75-4b0b-a40d-9579c269c5a0
  Config:
    Base Domain:  sdev.devshift.net
    Cluster ID:   degas
    Machines:
      Name:  master
      Platform:
        Aws:
          Iam Role Name:  
          Root Volume:
            Iops:  100
            Size:  32
            Type:  gp2
          Type:    m5.xlarge
      Replicas:    3
      Name:        worker
      Platform:
        Aws:
          Iam Role Name:  TBD
          Root Volume:
            Iops:  100
            Size:  32
            Type:  gp2
          Type:    m5.xlarge
          Zones:
            us-east-1a
....

This does not create a single-AZ cluster in zone us-east-1a, even though the AZ is available according to my AWS console (see the screenshot reference below).

[Screenshot: AWS console availability zones, 2018-12-30 11-13-54]

cc: @jhernand @oourfali @tzvatot @dgoodwin
