ISSUE:
When installing an OpenShift cluster that has a combination of:
- Outbound Proxy
- SSL re-encryption signed by a different trusted root CA (essentially SSL man-in-the-middle inspection, as done by Cisco, McAfee, or Squid proxies)
...the installation stalls: the bootstrap node stays at the installation stage "Waiting for controller: waiting for controller pod ready event", and the other two control plane nodes remain in the "Joined" status.
CAUSE:
Once the assisted-installer-controller Job is created in the assisted-installer namespace, it correctly receives the outbound proxy settings, but the additional trust bundle is not mounted into it as a volume. If the outbound proxy performs SSL re-encryption, the Pod fails with the following errors, even when the CA that performs the re-encryption has been supplied as an additionalTrustBundle certificate:
time="2022-08-01T23:39:07Z" level=info msg="Start running Assisted-Controller. Configuration is:\n struct ControllerConfig {\n\tClusterID: \"6e08c390-3b8c-4009-9a41-605c7ff40f25\",\n\tURL: \"https://api.openshift.com\",\n\tPullSecretToken: <SECRET>,\n\tSkipCertVerification: false,\n\tCACertPath: \"\",\n\tNamespace: \"assisted-installer\",\n\tOpenshiftVersion: \"4.10.18\",\n\tHighAvailabilityMode: \"Full\",\n\tWaitForClusterVersion: true,\n\tMustGatherImage: \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:03a06499c3efae948535eb61c340efb8511d3ac45db1ca9fccfe5515e49a70ac\",\n\tDryRunEnabled: false,\n\tDryFakeRebootMarkerPath: \"\",\n\tDryRunClusterHostsPath: \"\",\n\tParsedClusterHosts: config.DryClusterHosts(nil),\n}"
W0801 23:39:07.295522 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0801 23:39:09.236233 1 request.go:601] Waited for 1.045203989s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/node.k8s.io/v1?timeout=32s
time="2022-08-01T23:39:09Z" level=info msg="Using proxy {HTTPProxy:http://192.168.42.31:3128/ HTTPSProxy:http://192.168.42.31:3128/ NoProxy:.cluster.local,.kemo.labs,.kemo.network,.svc,.svc.cluster.local,10.128.0.0/14,127.0.0.1,172.30.0.0/16,192.168.0.0/16,192.168.70.0/23,api-int.core-ocp.d70.lab.kemo.network,localhost} to set env-vars for installer-controller pod"
time="2022-08-01T23:39:09Z" level=info msg="Start waiting to be ready"
time="2022-08-01T23:39:09Z" level=info msg="Making sure service dns-default can reserve the .10 address"
time="2022-08-01T23:39:09Z" level=info msg="No service found with IP 172.30.0.10, attempt 1/45"
time="2022-08-01T23:39:10Z" level=warning msg="Failed to connect to assisted service" error="Get \"https://api.openshift.com/api/assisted-install/v2/clusters/6e08c390-3b8c-4009-9a41-605c7ff40f25?exclude-hosts=true\": x509: certificate signed by unknown authority"
time="2022-08-01T23:39:11Z" level=warning msg="Failed to connect to assisted service" error="Get \"https://api.openshift.com/api/assisted-install/v2/clusters/6e08c390-3b8c-4009-9a41-605c7ff40f25?exclude-hosts=true\": x509: certificate signed by unknown authority"
time="2022-08-01T23:39:12Z" level=warning msg="Failed to connect to assisted service" error="Get \"https://api.openshift.com/api/assisted-install/v2/clusters/6e08c390-3b8c-4009-9a41-605c7ff40f25?exclude-hosts=true\": x509: certificate signed by unknown authority"
time="2022-08-01T23:39:13Z" level=warning msg="Failed to connect to assisted service" error="Get \"https://api.openshift.com/api/assisted-install/v2/clusters/6e08c390-3b8c-4009-9a41-605c7ff40f25?exclude-hosts=true\": x509: certificate signed by unknown authority"
time="2022-08-01T23:39:14Z" level=warning msg="Failed to connect to assisted service" error="Get \"https://api.openshift.com/api/assisted-install/v2/clusters/6e08c390-3b8c-4009-9a41-605c7ff40f25?exclude-hosts=true\": x509: certificate signed by unknown authority"
The container uses the CA bundle installed by its own ca-certificates RPM, so it has none of the additionalTrustBundle CA certificates that were added to the RHCOS host's trust store.
You can manually force the installation to continue by setting .data.ca-cert-path to /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem in the assisted-installer-controller-config ConfigMap, then deleting the assisted-installer-controller Job in the assisted-installer namespace and recreating it with the RHCOS system trusted root store mounted:
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: assisted-installer-controller
    job-name: assisted-installer-controller
  name: assisted-installer-controller
  namespace: assisted-installer
spec:
  backoffLimit: 100
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: assisted-installer-controller
        job-name: assisted-installer-controller
    spec:
      containers:
      - env:
        - name: CLUSTER_ID
          valueFrom:
            configMapKeyRef:
              key: cluster-id
              name: assisted-installer-controller-config
        - name: INVENTORY_URL
          valueFrom:
            configMapKeyRef:
              key: inventory-url
              name: assisted-installer-controller-config
        - name: PULL_SECRET_TOKEN
          valueFrom:
            secretKeyRef:
              key: pull-secret-token
              name: assisted-installer-controller-secret
        - name: CA_CERT_PATH
          valueFrom:
            configMapKeyRef:
              key: ca-cert-path
              name: assisted-installer-controller-config
              optional: true
        - name: SKIP_CERT_VERIFICATION
          valueFrom:
            configMapKeyRef:
              key: skip-cert-verification
              name: assisted-installer-controller-config
              optional: true
        - name: OPENSHIFT_VERSION
          value: 4.10.18
        - name: HIGH_AVAILABILITY_MODE
          valueFrom:
            configMapKeyRef:
              key: high-availability-mode
              name: assisted-installer-controller-config
              optional: true
        - name: CHECK_CLUSTER_VERSION
          valueFrom:
            configMapKeyRef:
              key: check-cluster-version
              name: assisted-installer-controller-config
              optional: true
        - name: MUST_GATHER_IMAGE
          valueFrom:
            configMapKeyRef:
              key: must-gather-image
              name: assisted-installer-controller-config
              optional: true
        image: registry.redhat.io/rhai-tech-preview/assisted-installer-reporter-rhel8:v1.0.0-238
        imagePullPolicy: IfNotPresent
        name: assisted-installer-controller
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: service-ca-cert-config
          mountPath: /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/master: ""
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: assisted-installer-controller
      serviceAccountName: assisted-installer-controller
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      volumes:
      - name: service-ca-cert-config
        hostPath:
          path: /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
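Before relying on the hostPath mount, it can help to confirm that the host's tls-ca-bundle.pem really contains the proxy's re-encryption CA. The following Go sketch counts the certificates in a PEM bundle and checks for a given subject common name. The names bundleHasCA and selfSignedCAPEM are hypothetical, and matching on common name is a simplification, since several CAs can share a CN. Run without arguments, it checks a throwaway generated CA instead of a real bundle.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
	"time"
)

// bundleHasCA walks every CERTIFICATE block in a PEM bundle and reports
// whether one has the given subject common name, plus how many certificates
// were parsed. Non-certificate blocks are skipped.
func bundleHasCA(bundle []byte, commonName string) (bool, int, error) {
	found, count := false, 0
	for {
		block, rest := pem.Decode(bundle)
		if block == nil {
			break
		}
		bundle = rest
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return false, count, fmt.Errorf("parsing certificate %d: %w", count+1, err)
		}
		count++
		if cert.Subject.CommonName == commonName {
			found = true
		}
	}
	return found, count, nil
}

// selfSignedCAPEM builds a throwaway CA certificate so the sketch can run
// without a real proxy CA; it is for demonstration only.
func selfSignedCAPEM(commonName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	// Usage: bundlecheck <bundle.pem> <common name>; with no arguments the
	// sketch checks a freshly generated demo CA instead.
	bundle, name := []byte(nil), "Example Proxy CA"
	if len(os.Args) == 3 {
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		bundle, name = data, os.Args[2]
	} else {
		demo, err := selfSignedCAPEM(name)
		if err != nil {
			panic(err)
		}
		bundle = demo
	}
	found, n, err := bundleHasCA(bundle, name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d certificate(s) parsed, %q present: %v\n", n, name, found)
}
```

Against a copy of the node's bundle this would be invoked with the bundle path and the proxy CA's common name (here a placeholder, since the real CA name depends on the proxy).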
EXPECTED RESULT:
When the root CA certificates defined in the additionalTrustBundle spec are added to the RHCOS system trust store and the store is updated, they are prepended to /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem. The installer should therefore:
- attach that file as a volume in the Job's spec and reference it in the associated ConfigMap, by setting a default for .CACertPath when passing it to the assisted-installer-controller-pod.yaml.template file.
The assisted-installer-controller Job's Pod should then continue as follows:
time="2022-08-02T01:26:55Z" level=info msg="Start running Assisted-Controller. Configuration is:\n struct ControllerConfig {\n\tClusterID: \"ef4420c4-7d6b-4184-b746-e194e422b0fc\",\n\tURL: \"https://api.openshift.com\",\n\tPullSecretToken: <SECRET>,\n\tSkipCertVerification: false,\n\tCACertPath: \"/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem\",\n\tNamespace: \"assisted-installer\",\n\tOpenshiftVersion: \"4.10.18\",\n\tHighAvailabilityMode: \"Full\",\n\tWaitForClusterVersion: true,\n\tMustGatherImage: \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:03a06499c3efae948535eb61c340efb8511d3ac45db1ca9fccfe5515e49a70ac\",\n\tDryRunEnabled: false,\n\tDryFakeRebootMarkerPath: \"\",\n\tDryRunClusterHostsPath: \"\",\n\tParsedClusterHosts: config.DryClusterHosts(nil),\n}"
W0802 01:26:55.410855 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0802 01:26:56.465700 1 request.go:601] Waited for 1.014466487s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/k8s.ovn.org/v1?timeout=32s
time="2022-08-02T01:26:57Z" level=info msg="Using proxy {HTTPProxy:http://192.168.42.31:3128/ HTTPSProxy:http://192.168.42.31:3128/ NoProxy:.cluster.local,.kemo.labs,.kemo.network,.svc,.svc.cluster.local,10.128.0.0/14,127.0.0.1,172.30.0.0/16,192.168.0.0/16,192.168.70.0/23,api-int.core-ocp.d70.lab.kemo.network,localhost} to set env-vars for installer-controller pod"
time="2022-08-02T01:26:57Z" level=info msg="Using custom CA certificate: /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"
time="2022-08-02T01:26:57Z" level=info msg="Start waiting to be ready"
time="2022-08-02T01:26:57Z" level=info msg="Making sure service dns-default can reserve the .10 address"
time="2022-08-02T01:26:57Z" level=info msg="Service dns-default has successfully taken IP 172.30.0.10"
time="2022-08-02T01:26:57Z" level=info msg="HackDNSAddressConflict finished"
time="2022-08-02T01:26:58Z" level=info msg="assisted-service is available"
time="2022-08-02T01:26:58Z" level=info msg="kube-apiserver is available"
time="2022-08-02T01:26:58Z" level=info msg="Sending ready event"
time="2022-08-02T01:26:58Z" level=info msg="monitor cluster installation status"
time="2022-08-02T01:26:58Z" level=info msg="Start sending logs"
time="2022-08-02T01:26:58Z" level=info msg="Waiting till all nodes will join and update status to assisted installer"
time="2022-08-02T01:26:58Z" level=info msg="Start approving CSRs"
PROPOSED FIX:
- Set a default value of /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem for the CACertPath configuration variable in the src/config/config.go file.
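On the config side, the proposed default could look roughly like this Go sketch. It does not reproduce the real src/config/config.go, which may use a different configuration loader; loadCACertPath is a hypothetical helper that reads CA_CERT_PATH (the variable the Job passes in) through an injectable lookup function and falls back to the RHCOS bundle path when it is unset or blank.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultCACertPath is the proposed default: the RHCOS extracted bundle that
// already contains any additionalTrustBundle CAs.
const defaultCACertPath = "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"

// loadCACertPath is a hypothetical helper, not the real config loader. It reads
// CA_CERT_PATH through lookup and falls back to defaultCACertPath when the
// variable is unset or contains only whitespace.
func loadCACertPath(lookup func(string) (string, bool)) string {
	if v, ok := lookup("CA_CERT_PATH"); ok {
		if v = strings.TrimSpace(v); v != "" {
			return v
		}
	}
	return defaultCACertPath
}

func main() {
	fmt.Println("CACertPath:", loadCACertPath(os.LookupEnv))
}
```

Injecting the lookup function instead of calling os.LookupEnv directly keeps the default testable without mutating the process environment.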