
ocp4-metal-install's Introduction

OpenShift 4 Bare Metal Install - User Provisioned Infrastructure (UPI)

Architecture Diagram


Download Software

  1. Download CentOS 8 x86_64 image

  2. Log in to the Red Hat OpenShift Cluster Manager

  3. Select 'Create Cluster' from the 'Clusters' navigation menu

  4. Select 'RedHat OpenShift Container Platform'

  5. Select 'Run on Bare Metal'

  6. Download the following files:

    • OpenShift Installer for Linux
    • Pull secret
    • Command Line Interface for Linux and your workstation's OS
    • Red Hat Enterprise Linux CoreOS (RHCOS)
      • rhcos-X.X.X-x86_64-metal.x86_64.raw.gz
      • rhcos-X.X.X-x86_64-installer.x86_64.iso (or rhcos-X.X.X-x86_64-live.x86_64.iso for newer versions)

Prepare the 'Bare Metal' environment

VMware ESXi is used in this guide

  1. Copy the CentOS 8 iso to an ESXi datastore
  2. Create a new Port Group called 'OCP' under Networking
    • (For VirtualBox, choose "Internal Network" when creating each VM and give them all the same name, e.g. 'ocp')
    • (For Proxmox, you can use the same network bridge and choose a specific VLAN tag, e.g. 50)
  3. Create 3 Control Plane virtual machines with minimum settings:
    • Name: ocp-cp-# (Example ocp-cp-1)
    • 4 vCPU
    • 8 GB RAM
    • 50 GB HDD
    • NIC connected to the OCP network
    • Load the rhcos-X.X.X-x86_64-installer.x86_64.iso image into the CD/DVD drive
  4. Create 2 Worker virtual machines (or more if you want) with minimum settings:
    • Name: ocp-w-# (Example ocp-w-1)
    • 4 vCPU
    • 8 GB RAM
    • 50 GB HDD
    • NIC connected to the OCP network
    • Load the rhcos-X.X.X-x86_64-installer.x86_64.iso image into the CD/DVD drive
  5. Create a Bootstrap virtual machine (this vm will be deleted once installation completes) with minimum settings:
    • Name: ocp-bootstrap
    • 4 vCPU
    • 8 GB RAM
    • 50 GB HDD
    • NIC connected to the OCP network
    • Load the rhcos-X.X.X-x86_64-installer.x86_64.iso image into the CD/DVD drive
  6. Create a Services virtual machine with minimum settings:
    • Name: ocp-svc
    • 4 vCPU
    • 4 GB RAM
    • 120 GB HDD
    • NIC1 connected to the VM Network (LAN)
    • NIC2 connected to the OCP network
    • Load the CentOS_8.iso image into the CD/DVD drive
  7. Boot all virtual machines so that each is assigned a MAC address
  8. Shut down all virtual machines except for 'ocp-svc'
  9. Use the VMware ESXi dashboard to record the MAC address of each VM; these will be used later to set static IPs

Configure Environmental Services

  1. Install CentOS8 on the ocp-svc host

    • Remove the home dir partition and assign all free storage to '/'
    • Optionally you can install the 'Guest Tools' package to have monitoring and reporting in the VMware ESXi dashboard
    • Enable only the LAN NIC, so it obtains a DHCP address from the LAN network, and make note of the IP address (ocp-svc_IP_address) assigned to the VM
  2. Boot the ocp-svc VM

  3. Move the files downloaded from the RedHat Cluster Manager site to the ocp-svc node

    scp ~/Downloads/openshift-install-linux.tar.gz ~/Downloads/openshift-client-linux.tar.gz ~/Downloads/rhcos-metal.x86_64.raw.gz root@{ocp-svc_IP_address}:/root/
  4. SSH to the ocp-svc vm

    ssh root@{ocp-svc_IP_address}
  5. Extract Client tools and copy them to /usr/local/bin

    tar xvf openshift-client-linux.tar.gz
    mv oc kubectl /usr/local/bin
  6. Confirm Client Tools are working

    kubectl version
    oc version
  7. Extract the OpenShift Installer

    tar xvf openshift-install-linux.tar.gz
  8. Update CentOS so we get the latest packages for each of the services we are about to install

    dnf update
  9. Install Git

    dnf install git -y
  10. Download config files for each of the services

    git clone https://github.com/ryanhay/ocp4-metal-install
  11. OPTIONAL: Create a file '~/.vimrc' and paste the following (this helps with editing in vim, particularly yaml files):

    cat <<EOT >> ~/.vimrc
    syntax on
    set nu et ai sts=0 ts=2 sw=2 list hls
    EOT

    Update the preferred editor

    export OC_EDITOR="vim"
    export KUBE_EDITOR="vim"
  12. Set a static IP for the OCP network interface with nmtui-edit ens224 or by editing /etc/sysconfig/network-scripts/ifcfg-ens224

    • Address: 192.168.22.1
    • DNS Server: 127.0.0.1
    • Search domain: ocp.lan
    • Never use this network for default route
    • Automatically connect

    If the changes aren't applied automatically, you can bounce the NIC with nmcli connection down ens224 and nmcli connection up ens224
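
    If you prefer the command line over nmtui, the same settings can be applied with nmcli. This is a minimal sketch assuming the OCP-facing interface is named ens224 as in this guide; adjust the interface name to match your system.

    # Assumed interface: ens224 (OCP network)
    nmcli connection modify ens224 \
      ipv4.method manual \
      ipv4.addresses 192.168.22.1/24 \
      ipv4.dns 127.0.0.1 \
      ipv4.dns-search ocp.lan \
      ipv4.never-default yes \
      connection.autoconnect yes
    # Bounce the connection to apply the changes
    nmcli connection down ens224 && nmcli connection up ens224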

  13. Setup firewalld

    Create internal and external zones

    nmcli connection modify ens224 connection.zone internal
    nmcli connection modify ens192 connection.zone external

    View zones:

    firewall-cmd --get-active-zones

    Set masquerading (source NAT) on both zones.

    A quick example of source NAT: packets leaving the external interface (ens192 in this case) have their source address rewritten, after routing, to the address of ens192, so that return packets can find their way back to this interface, where the reverse translation happens.

    firewall-cmd --zone=external --add-masquerade --permanent
    firewall-cmd --zone=internal --add-masquerade --permanent

    Reload firewall config

    firewall-cmd --reload

    Check the current settings of each zone

    firewall-cmd --list-all --zone=internal
    firewall-cmd --list-all --zone=external

    When masquerading is enabled, IP forwarding is enabled as well, which effectively makes this host a router. Check:

    cat /proc/sys/net/ipv4/ip_forward
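
    IP forwarding is normally switched on for you when masquerading is enabled, so no extra step is required here. If you ever need to enable it explicitly and persistently, a standard approach (shown only for completeness) is:

    # Persist the setting and apply all sysctl config
    echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/99-ip-forward.conf
    sysctl --system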
  14. Install and configure BIND DNS

    Install

    dnf install bind bind-utils -y

    Apply configuration

    \cp ~/ocp4-metal-install/dns/named.conf /etc/named.conf
    cp -R ~/ocp4-metal-install/dns/zones /etc/named/
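
    For orientation, the zone files provide records along these lines. This is only a sketch; the authoritative records, TTLs and serial numbers are in the files you just copied, and the IPs follow this guide's addressing.

    ; illustrative records for the ocp.lan zone
    ocp-svc.ocp.lan.            IN  A  192.168.22.1
    api.lab.ocp.lan.            IN  A  192.168.22.1    ; HAProxy on ocp-svc
    api-int.lab.ocp.lan.        IN  A  192.168.22.1    ; HAProxy on ocp-svc
    *.apps.lab.ocp.lan.         IN  A  192.168.22.1    ; HAProxy on ocp-svc
    ocp-bootstrap.lab.ocp.lan.  IN  A  192.168.22.200
    ocp-cp-1.lab.ocp.lan.       IN  A  192.168.22.201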

    Configure the firewall for DNS

    firewall-cmd --add-port=53/udp --zone=internal --permanent
    # for OCP 4.9 and later 53/tcp is required
    firewall-cmd --add-port=53/tcp --zone=internal --permanent
    firewall-cmd --reload

    Enable and start the service

    systemctl enable named
    systemctl start named
    systemctl status named

    At the moment DNS will still be pointing to the LAN DNS server. You can see this by testing with dig ocp.lan.

    Change the LAN NIC (ens192) to use 127.0.0.1 for DNS and ensure 'Ignore automatically obtained DNS parameters' is ticked

    nmtui-edit ens192
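
    The same change can be made non-interactively with nmcli, shown here as an alternative (ens192 is the LAN NIC name used in this guide):

    nmcli connection modify ens192 ipv4.dns 127.0.0.1 ipv4.ignore-auto-dns yes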

    Restart Network Manager

    systemctl restart NetworkManager

    Confirm dig now sees the correct DNS results by using the DNS Server running locally

    dig ocp.lan
    # The following should return the answer ocp-bootstrap.lab.ocp.lan from the local server
    dig -x 192.168.22.200
  15. Install & configure DHCP

    Install the DHCP Server

    dnf install dhcp-server -y

    Edit dhcpd.conf from the cloned git repo so it has the correct MAC address for each host, then copy the conf file to the location the DHCP service uses

    \cp ~/ocp4-metal-install/dhcpd.conf /etc/dhcp/dhcpd.conf
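
    The host entries in dhcpd.conf look roughly like the following sketch. Replace the placeholder MAC addresses with the ones recorded from the ESXi dashboard and keep the fixed addresses consistent with the DNS zone.

    host ocp-bootstrap {
      hardware ethernet 00:00:00:00:00:00;   # MAC recorded earlier for ocp-bootstrap
      fixed-address 192.168.22.200;
    }
    host ocp-cp-1 {
      hardware ethernet 00:00:00:00:00:00;   # MAC recorded earlier for ocp-cp-1
      fixed-address 192.168.22.201;
    }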

    Configure the Firewall

    firewall-cmd --add-service=dhcp --zone=internal --permanent
    firewall-cmd --reload

    Enable and start the service

    systemctl enable dhcpd
    systemctl start dhcpd
    systemctl status dhcpd
  16. Install & configure Apache Web Server

    Install Apache

    dnf install httpd -y

    Change default listen port to 8080 in httpd.conf

    sed -i 's/Listen 80/Listen 0.0.0.0:8080/' /etc/httpd/conf/httpd.conf

    Configure the firewall for Web Server traffic

    firewall-cmd --add-port=8080/tcp --zone=internal --permanent
    firewall-cmd --reload

    Enable and start the service

    systemctl enable httpd
    systemctl start httpd
    systemctl status httpd

    Making a GET request to localhost on port 8080 should now return the default Apache webpage

    curl localhost:8080
  17. Install & configure HAProxy

    Install HAProxy

    dnf install haproxy -y

    Copy HAProxy config

    \cp ~/ocp4-metal-install/haproxy.cfg /etc/haproxy/haproxy.cfg
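
    As a rough illustration of what the copied config does, it defines TCP frontends on ocp-svc that balance traffic across the cluster nodes. The sketch below is an assumption based on this guide's addressing; the repo's haproxy.cfg is the authoritative version and also covers ports 22623, 80 and 443.

    # Kubernetes API server, balanced across bootstrap and control plane nodes
    frontend ocp4-kubernetes-api-server
      mode tcp
      bind *:6443
      default_backend ocp4-kubernetes-api-server
    backend ocp4-kubernetes-api-server
      mode tcp
      balance source
      server ocp-bootstrap 192.168.22.200:6443 check
      server ocp-cp-1 192.168.22.201:6443 check
      server ocp-cp-2 192.168.22.202:6443 check
      server ocp-cp-3 192.168.22.203:6443 check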

    Configure the Firewall

    Note: Opening port 9000 in the external zone allows access to HAProxy stats that are useful for monitoring and troubleshooting. The UI can be accessed at: http://{ocp-svc_IP_address}:9000/stats

    firewall-cmd --add-port=6443/tcp --zone=internal --permanent # kube-api-server on control plane nodes
    firewall-cmd --add-port=6443/tcp --zone=external --permanent # kube-api-server on control plane nodes
    firewall-cmd --add-port=22623/tcp --zone=internal --permanent # machine-config server
    firewall-cmd --add-service=http --zone=internal --permanent # web services hosted on worker nodes
    firewall-cmd --add-service=http --zone=external --permanent # web services hosted on worker nodes
    firewall-cmd --add-service=https --zone=internal --permanent # web services hosted on worker nodes
    firewall-cmd --add-service=https --zone=external --permanent # web services hosted on worker nodes
    firewall-cmd --add-port=9000/tcp --zone=external --permanent # HAProxy Stats
    firewall-cmd --reload

    Enable and start the service

    setsebool -P haproxy_connect_any 1 # SELinux name_bind access
    systemctl enable haproxy
    systemctl start haproxy
    systemctl status haproxy
  18. Install and configure NFS for the OpenShift Registry. Providing storage for the Registry is a requirement; emptyDir can be specified if necessary.

    Install NFS Server

    dnf install nfs-utils -y

    Create the Share

    Check available disk space and its location with df -h

    mkdir -p /shares/registry
    chown -R nobody:nobody /shares/registry
    chmod -R 777 /shares/registry

    Export the Share

    echo "/shares/registry  192.168.22.0/24(rw,sync,root_squash,no_subtree_check,no_wdelay)" > /etc/exports
    exportfs -rv

    Set Firewall rules:

    firewall-cmd --zone=internal --add-service mountd --permanent
    firewall-cmd --zone=internal --add-service rpc-bind --permanent
    firewall-cmd --zone=internal --add-service nfs --permanent
    firewall-cmd --reload

    Enable and start the NFS related services

    systemctl enable nfs-server rpcbind
    systemctl start nfs-server rpcbind nfs-mountd

Generate and host install files

  1. Generate an SSH key pair keeping all default options

    ssh-keygen
  2. Create an install directory

    mkdir ~/ocp-install
  3. Copy the install-config.yaml included in the cloned repository to the install directory

    cp ~/ocp4-metal-install/install-config.yaml ~/ocp-install
  4. Update the install-config.yaml with your own pull-secret and ssh key.

    • Line 23 should contain the contents of your pull-secret.txt
    • Line 24 should contain the contents of your '~/.ssh/id_rsa.pub'
    vim ~/ocp-install/install-config.yaml
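
    For orientation, the file has this general shape. It is a sketch only; use the install-config.yaml from the cloned repo, which already matches this guide's cluster name, domain and networking.

    apiVersion: v1
    baseDomain: ocp.lan
    metadata:
      name: lab                  # cluster name -> *.lab.ocp.lan
    compute:
    - name: worker
      replicas: 0                # workers are added manually in a UPI install
    controlPlane:
      name: master
      replicas: 3
    platform:
      none: {}                   # bare metal / user provisioned
    pullSecret: '<contents of pull-secret.txt>'
    sshKey: '<contents of ~/.ssh/id_rsa.pub>'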
  5. Generate Kubernetes manifest files

    ~/openshift-install create manifests --dir ~/ocp-install

    A warning is shown about making the control plane nodes schedulable. It is up to you whether you want to run workloads on the Control Plane nodes. If you don't want to, you can disable it with: sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' ~/ocp-install/manifests/cluster-scheduler-02-config.yml. Make any other custom changes you like to the core Kubernetes manifest files.

    Generate the Ignition config and Kubernetes auth files

    ~/openshift-install create ignition-configs --dir ~/ocp-install/
  6. Create a hosting directory to serve the configuration files for the OpenShift booting process

    mkdir /var/www/html/ocp4
  7. Copy all generated install files to the new web server directory

    cp -R ~/ocp-install/* /var/www/html/ocp4
  8. Move the CoreOS image to the web server directory (you will need to type this path multiple times later, so it is a good idea to shorten the name)

    mv ~/rhcos-X.X.X-x86_64-metal.x86_64.raw.gz /var/www/html/ocp4/rhcos
  9. Change ownership and permissions of the web server directory

    chcon -R -t httpd_sys_content_t /var/www/html/ocp4/
    chown -R apache: /var/www/html/ocp4/
    chmod 755 /var/www/html/ocp4/
  10. Confirm you can see all files added to the /var/www/html/ocp4/ dir through Apache

    curl localhost:8080/ocp4/

Deploy OpenShift

  1. Power on the ocp-bootstrap host and ocp-cp-# hosts and press 'Tab' to enter boot configuration. Enter the following configuration:

    # Bootstrap Node - ocp-bootstrap
    coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.22.1:8080/ocp4/rhcos coreos.inst.insecure=yes coreos.inst.ignition_url=http://192.168.22.1:8080/ocp4/bootstrap.ign
    
    # Or, if you waited for it to boot, use the following command, then reboot after it finishes and make sure you remove the attached .iso
    sudo coreos-installer install /dev/sda -u http://192.168.22.1:8080/ocp4/rhcos -I http://192.168.22.1:8080/ocp4/bootstrap.ign --insecure --insecure-ignition
    # Each of the Control Plane Nodes - ocp-cp-#
    coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.22.1:8080/ocp4/rhcos coreos.inst.insecure=yes coreos.inst.ignition_url=http://192.168.22.1:8080/ocp4/master.ign
    
    # Or, if you waited for it to boot, use the following command, then reboot after it finishes and make sure you remove the attached .iso
    sudo coreos-installer install /dev/sda -u http://192.168.22.1:8080/ocp4/rhcos -I http://192.168.22.1:8080/ocp4/master.ign --insecure --insecure-ignition
  2. Power on the ocp-w-# hosts and press 'Tab' to enter boot configuration. Enter the following configuration:

    # Each of the Worker Nodes - ocp-w-#
    coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.22.1:8080/ocp4/rhcos coreos.inst.insecure=yes coreos.inst.ignition_url=http://192.168.22.1:8080/ocp4/worker.ign
    
    # Or, if you waited for it to boot, use the following command, then reboot after it finishes and make sure you remove the attached .iso
    sudo coreos-installer install /dev/sda -u http://192.168.22.1:8080/ocp4/rhcos -I http://192.168.22.1:8080/ocp4/worker.ign --insecure --insecure-ignition

Monitor the Bootstrap Process

  1. You can monitor the bootstrap process from the ocp-svc host at different log levels (debug, error, info)

    ~/openshift-install --dir ~/ocp-install wait-for bootstrap-complete --log-level=debug
  2. Once bootstrapping is complete, the ocp-bootstrap node can be removed

Remove the Bootstrap Node

  1. Remove all references to the ocp-bootstrap host from the /etc/haproxy/haproxy.cfg file

    # Two entries
    vim /etc/haproxy/haproxy.cfg
    # Restart HAProxy - If you are still watching the HAProxy stats console you will see that the ocp-bootstrap host has been removed from the backends.
    systemctl reload haproxy
  2. The ocp-bootstrap host can now be safely shut down and deleted from the VMware ESXi console; the host is no longer required

Wait for installation to complete

IMPORTANT: if you set mastersSchedulable to false, the worker nodes will need to be joined to the cluster to complete the installation. This is because the OpenShift Router needs to be scheduled on the worker nodes and is a dependency for cluster operators such as ingress, console and authentication.

  1. Collect the OpenShift Console address and kubeadmin credentials from the output of the install-complete event

    ~/openshift-install --dir ~/ocp-install wait-for install-complete
  2. Continue to join the worker nodes to the cluster in a new tab whilst waiting for the above command to complete

Join Worker Nodes

  1. Setup 'oc' and 'kubectl' clients on the ocp-svc machine

    export KUBECONFIG=~/ocp-install/auth/kubeconfig
    # Test auth by viewing cluster nodes
    oc get nodes
  2. View and approve pending CSRs

    Note: Once you approve the first set of CSRs additional 'kubelet-serving' CSRs will be created. These must be approved too. If you do not see pending requests wait until you do.

    # View CSRs
    oc get csr
    # Approve all pending CSRs
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
    # Wait for kubelet-serving CSRs and approve them too with the same command
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
  3. Watch and wait for the Worker Nodes to join the cluster and enter a 'Ready' status

    This can take 5-10 minutes

    watch -n5 oc get nodes

Configure storage for the Image Registry

A bare-metal cluster does not provide storage by default, so the Image Registry Operator bootstraps itself as 'Removed' to allow the installer to complete. Now that the installation has completed, storage can be added for the Registry and the operator updated to a 'Managed' state.

  1. Create the 'image-registry-storage' PVC by editing the Image Registry operator config: set the management state to 'Managed' and add the 'pvc' and 'claim' keys under the storage key:

    oc edit configs.imageregistry.operator.openshift.io
    managementState: Managed
    storage:
      pvc:
        claim: # leave the claim blank
  2. Confirm the 'image-registry-storage' pvc has been created and is currently in a 'Pending' state

    oc get pvc -n openshift-image-registry
  3. Create the persistent volume for the 'image-registry-storage' pvc to bind to

    oc create -f ~/ocp4-metal-install/manifest/registry-pv.yaml
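
    The persistent volume simply points the claim at the NFS export created earlier. It looks roughly like the sketch below; registry-pv.yaml in the cloned repo is the authoritative manifest, and the capacity shown here is an assumption.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: registry-pv
    spec:
      capacity:
        storage: 100Gi
      accessModes:
      - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      nfs:
        server: 192.168.22.1     # ocp-svc
        path: /shares/registry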
  4. After a short wait the 'image-registry-storage' pvc should now be bound

    oc get pvc -n openshift-image-registry

Create the first Admin user

  1. Apply the oauth-htpasswd.yaml file to the cluster

    This will create a user 'admin' with the password 'password'. To set a different username and password, substitute the htpasswd key in the '~/ocp4-metal-install/manifest/oauth-htpasswd.yaml' file with the output of htpasswd -n -B -b <username> <password>

    oc apply -f ~/ocp4-metal-install/manifest/oauth-htpasswd.yaml
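
    Conceptually the manifest pairs an htpasswd Secret with an OAuth config that references it, along the lines of the sketch below. The object names here are illustrative; the real values are in the repo's oauth-htpasswd.yaml.

    apiVersion: v1
    kind: Secret
    metadata:
      name: htpasswd-secret      # illustrative name
      namespace: openshift-config
    stringData:
      htpasswd: |
        admin:$2y$05$...         # output of: htpasswd -n -B -b admin password
    ---
    apiVersion: config.openshift.io/v1
    kind: OAuth
    metadata:
      name: cluster
    spec:
      identityProviders:
      - name: htpasswd_provider
        type: HTPasswd
        mappingMethod: claim
        htpasswd:
          fileData:
            name: htpasswd-secret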
  2. Assign the new user (admin) admin permissions

    oc adm policy add-cluster-role-to-user cluster-admin admin

Access the OpenShift Console

  1. Wait for the 'console' Cluster Operator to become available

    oc get co
  2. Append the following to your local workstation's /etc/hosts file:

    If you do not want to add an entry on your local workstation for each new service made available on OpenShift, you can instead configure the ocp-svc DNS server to serve external clients and create a wildcard entry for *.apps.lab.ocp.lan

    # Open the hosts file
    sudo vi /etc/hosts
    
    # Append the following entries:
    192.168.0.96 ocp-svc api.lab.ocp.lan console-openshift-console.apps.lab.ocp.lan oauth-openshift.apps.lab.ocp.lan downloads-openshift-console.apps.lab.ocp.lan alertmanager-main-openshift-monitoring.apps.lab.ocp.lan grafana-openshift-monitoring.apps.lab.ocp.lan prometheus-k8s-openshift-monitoring.apps.lab.ocp.lan thanos-querier-openshift-monitoring.apps.lab.ocp.lan
  3. Navigate to the OpenShift Console URL and log in as the 'admin' user

    You will get self-signed certificate warnings, which you can ignore. If you need to log in as kubeadmin again, you can retrieve the password with: cat ~/ocp-install/auth/kubeadmin-password

Troubleshooting

  1. You can collect logs from all cluster hosts by running the following command from the 'ocp-svc' host:

    ./openshift-install gather bootstrap --dir ocp-install --bootstrap=192.168.22.200 --master=192.168.22.201 --master=192.168.22.202 --master=192.168.22.203
  2. Modify the role of the Control Plane Nodes

    If you would like to schedule workloads on the Control Plane nodes, apply the 'worker' role by changing the value of 'mastersSchedulable' to true.

    If you do not want to schedule workloads on the Control Plane nodes, remove the 'worker' role by changing the value of 'mastersSchedulable' to false.

    Remember that, depending on where you host your workloads, you will have to update HAProxy to include or exclude the control plane nodes from the ingress backends.

    oc edit schedulers.config.openshift.io cluster
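
    The relevant field sits under spec in the cluster-wide Scheduler resource; a minimal sketch of the object you are editing:

    apiVersion: config.openshift.io/v1
    kind: Scheduler
    metadata:
      name: cluster
    spec:
      mastersSchedulable: false  # set to true to run workloads on the control plane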

ocp4-metal-install's People

Contributors

idemery, ryanhay


ocp4-metal-install's Issues

DEBUG Still waiting for the Kubernetes API: the server has asked for the client to provide credentials

[root@ocp-svc ~]# ~/openshift-install --dir ~/ocp-install wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer 4.9.18
DEBUG Built from commit eb132dae953888e736c382f1176c799c0e1aa49e
INFO Waiting up to 20m0s for the Kubernetes API at https://api.lab.ocp.lan:6443...
DEBUG Still waiting for the Kubernetes API: the server has asked for the client to provide credentials

Does anyone know why this is happening?

overlayfs: unrecognized mount option ""volatile" or missing value

I'm trying to install OpenShift 4.9 using IPI and UPI, but the same error appears on the bootstrap machine: "overlayfs: unrecognized mount option ""volatile" or missing value",
and the installation didn't complete, with the following output (if anyone can help, I'll be very thankful):

time="2021-11-08T05:38:16+02:00" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
time="2021-11-08T05:38:16+02:00" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
time="2021-11-08T05:38:16+02:00" level=fatal msg="Bootstrap failed to complete"

======

I'm using vSphere 6.7U2.
.openshift_install.log is attached

Error when issuing the setsebool command under the HAProxy section

When issuing this command

setsebool -P haproxy_connect_any 1 # SELinux name_bind access

I get the following error and cannot enable haproxy

[root@ocp-svc haproxy]# libsepol.context_from_record: type hwtracing_device_t is not defined
libsepol.context_from_record: could not create context structure
libsepol.context_from_string: could not create context structure
libsepol.sepol_context_to_sid: could not convert system_u:object_r:hwtracing_device_t:s0 to sid
invalid context system_u:object_r:hwtracing_device_t:s0
Failed to commit changes to booleans: Success

RHEL 9

Kubernetes API Issue

Hi Ryan,

Thank you so much for this repository. I am running OpenShift and am trying to get the bootstrap to complete, debugging it using the command ~/openshift-install --dir ~/ocp-install wait-for bootstrap-complete --log-level=debug.

Error:
INFO Waiting up to 20m0s for the Kubernetes API at https://api.lab.ocp.lan:6443/...
DEBUG Still waiting for the Kubernetes Get " https://api.lab.ocp.lan:6443/version":EOF.

If anyone else had the same error or knows how to fix it let me know.
Thanks

Critical documentation error

I don't know if this was introduced with OpenShift 4.7, but you need to change the kernel boot arguments, otherwise it will fail to boot complaining about missing GPG signature files that you cannot get hold of.

The official documents state ...

"If you are using coreos.inst.image_url, you must also use coreos.inst.insecure. This is because the bare-metal media are not GPG-signed for OpenShift Container Platform."

Your kernel boot documentation is omitting the final

coreos.inst.insecure

!!!
It took me 2 weeks to figure out; I kept trying to sign the image files with ssh-keygen, which is currently impossible with the tools in RHEL 8.

Suggestion for NFS shares

Hey Ryan,

I have a couple of suggestions for your NFS section (18).

  1. create a couple of extra shares for when the cluster is ready - folks are going to need a few, might as well set them up here...
   mkdir -p /shares/{registry,pv0001,pv0002,pv0003}
   chown -R nobody:nobody /shares/{registry,pv0001,pv0002,pv0003}
   chmod -R 777 /shares/{registry,pv0001,pv0002,pv0003}

and

echo "/shares/registry 192.168.22.0/24(rw,sync,root_squash,no_subtree_check,no_wdelay)" > /etc/exports
echo "/shares/pv0001 192.168.22.0/24(rw,sync,root_squash,no_subtree_check,no_wdelay)" >> /etc/exports
echo "/shares/pv0002 192.168.22.0/24(rw,sync,root_squash,no_subtree_check,no_wdelay)" >> /etc/exports
echo "/shares/pv0003 192.168.22.0/24(rw,sync,root_squash,no_subtree_check,no_wdelay)" >> /etc/exports
exportfs -arv
  2. Folks should pay close attention to the subnet and use the correct one. I deployed a cluster on IBM's cloud and my subnet was 10.70.174.128/26 - which looks like a normal IP address but isn't; it's a "network" because the CIDR is /26 ;-)

I would even recommend spinning up an extra "test" VM on the cluster's subnet and actually testing the NFS mounts before attempting to use them as persistent volumes:

sudo mkdir /test
mount -t nfs ocp-svc:/shares/registry /test
touch /test/it-works
rm /test/it-works
umount /test

If this works, it can save you some real issues down the road...

You could also use the VM to validate some network routes and name resolution, to ensure the firewall has opened its ports and is forwarding traffic correctly.

Thanks again for your great work capturing all of this.

E.

Is the ~/ocp-install/auth directory needed to be copied to /var/www/html/ocp4?

For this step
cp -R ~/ocp-install/* /var/www/html/ocp4

This would also copy the ~/ocp-install/auth directory and exposed via the web server.
It would include the two files:
kubeadmin-password and kubeconfig

In case it is necessary for installation of the control plane / nodes, would it be OK to remove the directory /var/www/html/ocp4/auth after installation?

HAProxy stats not updating on adding new worker nodes

I followed the same steps to configure ocp4-metal-install and successfully configured the cluster with 3 control plane nodes and two worker nodes.
I added an additional node to the cluster successfully, but the status is not updating in HAProxy. Do I need to make any changes in HAProxy? Any support is much appreciated.

[root@ocp-svc log]# oc get nodes
NAME STATUS ROLES AGE VERSION
ocp-cp-1.lab.ocp.lan Ready control-plane,master,worker 28h v1.27.8+4fab27b
ocp-cp-2.lab.ocp.lan Ready control-plane,master,worker 27h v1.27.8+4fab27b
ocp-cp-3.lab.ocp.lan Ready control-plane,master,worker 27h v1.27.8+4fab27b
ocp-w-1.lab.ocp.lan Ready worker 26h v1.27.8+4fab27b
ocp-w-2.lab.ocp.lan Ready worker 26h v1.27.8+4fab27b
ocp-w-3.lab.ocp.lan Ready worker 7h37m v1.27.8+4fab27b
ocp-w-4.lab.ocp.lan Ready worker 7h17m v1.27.8+4fab27b
[root@ocp-svc log]#


Documentation issue under Deploy OpenShift

Step 1 - I cannot get either option to work.

Q1: Is rhcos equal to the rhcos*.iso file?
Q2: For the bootstrap method, do you replace the entire line after pressing the TAB?
Q3: For the other method I get the error shown in the attached screenshot.

# Bootstrap Node - ocp-bootstrap
coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.22.1:8080/ocp4/rhcos coreos.inst.insecure=yes coreos.inst.ignition_url=http://192.168.22.1:8080/ocp4/bootstrap.ign

# Or if you waited for it boot, use the following command then just reboot after it finishes and make sure you remove the attached .iso
sudo coreos-installer install /dev/sda -u http://192.168.22.1:8080/ocp4/rhcos -I http://192.168.22.1:8080/ocp4/bootstrap.ign --insecure --insecure-ignition

Issue during boot on bootstrap and control plane

Version
$ openshift-install version
./openshift-install 4.5.9
built from commit 0d5c871ce7d03f3d03ab4371dc39916a5415cf5c
release image quay.io/openshift-release-dev/ocp-release@sha256:7ad540594e2a667300dd2584fe2ede2c1a0b814ee6a62f60809d87ab564f4425
Platform:
baremetal

UPI (semi-manual installation on customised infrastructure)
What happened?
Cluster details:
Control plane nodes on bare metal and workers on bare metal, but the bootstrap is running on an ESXi server which is on the same network.

After I launch my bootstrap and control plane nodes I can see this message for the bootstrap:

~/openshift-install --dir ~/ocp-install wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer 4.5.9
DEBUG Built from commit 0d5c871
INFO Waiting up to 20m0s for the Kubernetes API at https://api.lab.ocp.lan:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.lab.ocp.lan:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.lab.ocp.lan:6443/version?timeout=32s: EOF
(the same DEBUG line repeats until the wait times out)

And on the control plane nodes I can see the error shown in the attached screenshot.

May I know if I have missed something? I feel there is some issue with connectivity.

expose applications to external IPs

I deployed an Oracle Database container and am trying to access the DB using a service. But after exposing the service, I'm not able to access or ping the external IP. I used NodePort and an external IP as well but am still not able to access the database.

[root@ocp-svc ~]# oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.30.0.1 443/TCP 21d
openshift ExternalName kubernetes.default.svc.cluster.local 21d
os-sample-java-web ClusterIP 172.30.104.87 8080/TCP,8443/TCP,8778/TCP 4d19h
sdsqa-oracle-svc02 NodePort 172.30.111.211 10.221.92.160 1521:31864/TCP 5d23h
sdsqa-oracle-svc03 NodePort 172.30.167.211 10.221.92.160 1521:31865/TCP 4d21h
sdsqa-oracle-svc04 NodePort 172.30.213.22 1521:31866/TCP 2d4h
[root@ocp-svc ~]#

Any idea how we can access applications using external IPs or the NodePort mechanism?

Error installing worker node

Hi,

Following your install guide, I got an error on worker node installation; it has attempted more than 600 times and keeps going. See the attached screenshot.


Please help with what needs to be checked and done here.

RHCOS Install file

This file (rhcos-X.X.X-x86_64-metal.x86_64.raw.gz) is no longer an available download when creating a cluster. Any idea what I should use instead?

CP nodes not taking URLs

Hello Sir,

I did the same configuration of the services node on CentOS 7 rather than 8, because I was facing issues downloading repositories on CentOS 8. All configuration was successful on CentOS 7, but whenever I try to pull the files on the bootstrap node it fails to download the files from the URL.

I attached a screenshot of the error.

Bootstrap fails with error

Hello, I have a problem when installing OpenShift 4.11; I get the following error on the bootstrap machine.
Error: couldn't find boot device for /dev/sda
Resetting partition table
Error : install failed

https://photos.app.goo.gl/jFRcjr24hm11rVdN7

OpenShift 4.7: there is a known bug that is causing the bootstrap and install phases to fail on VMware hardware version 14 VMs.

You may want to put a warning up in your docs :-) - 2 months of pain.

https://bugzilla.redhat.com/show_bug.cgi?id=1935539

RH have issued a warning on their web site

Virtual machines (VMs) configured to use virtual hardware version 14 or greater might result in a failed installation. It is recommended to configure VMs with virtual hardware version 13. This is a known issue that is being addressed in BZ#1935539.

I have seen this with both ESXi 7.0b and Proxmox 6.3-6. This may save people the 2 months of pain I have been going through.

The nature of the problem means you just have to keep restarting the bootstrap/install phase every time it times out and fails; eventually, if you are lucky, you will get through the problem within 24 hours.

Unable to "Open Console"

Hello All
I am new to OpenShift. I have successfully deployed a cluster as per all the given instructions.
In the Overview tab the cluster status is "Ready", but I am unable to open the link through the "Open Console" button.
It gives the error "This site can't be reached".

Attached a snapshot for reference. Thanks in advance!


Combination of ESXi and BareMetal

This is more of a question than an issue, @ryanhay. I would like to know if I can use a combination of ESXi and bare metal: can I spin up all the control plane nodes on bare metal, and even the worker nodes on bare metal, and keep the services and bootstrap VMs on the ESXi server?

My catch here is the OCP network (the 2nd network that we create): how do I bridge that to my VMs? Also, my bootstrap machine, which is on ESXi, is on the OCP network but is unable to fetch files from the CentOS VM. I'm a newbie to this, so these might sound like basic questions. Any help would be appreciated.

Image registry access outside the private network

Hi ,

Your guide was very useful.
Just one question though: how do I access the OpenShift image registry from outside the network? The image registry URL ends with apps.lab.ocp.lan, but when I try to access it from the ocp-svc machine it doesn't show up. The pod is running and it has an internal IP (assigned by libvirt, I think), so how can I access it from the ocp-svc machine?

HAProxy stats not updating on adding two more worker nodes.

I added two more workers to the cluster and added their entries in the HAProxy configuration.

The entries are listed on the stats page, but details for only 2 out of 4 workers get updated there. What more configuration should I do to see updates from all 4 worker nodes?

Making sure that master is not schedulable

Hi Ryan

I would suggest making it clear that the:

sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' ~/ocp-install/manifests/cluster-scheduler-02-config.yml

command is not optional. I missed it the first time, and the following instructions only work when the command has been run beforehand. Great guide though, highly appreciated!

Unable to access https://192.168.2.200:9000/stats and https://192.168.2.200:6443/

It looks like the OpenShift environment built, but I am not able to access the following:

https://192.168.2.200:6443/

{
"kind": "Status",
"apiVersion": "v1",
"metadata": {

},
"status": "Failure",
"message": "forbidden: User "system:anonymous" cannot get path "/"",
"reason": "Forbidden",
"details": {

},
"code": 403
}

https://192.168.2.200:9000/stats

This site can't provide a secure connection. 192.168.2.200 sent an invalid response.
ERR_SSL_PROTOCOL_ERROR

Details of configured env below.

[root@okd4-services ~]# curl -kv https://oauth-openshift.apps.lab.ocp.lan/healthz

  • Trying 192.168.2.200...
  • TCP_NODELAY set
  • Connected to oauth-openshift.apps.lab.ocp.lan (192.168.2.200) port 443 (#0)
  • ALPN, offering h2
  • ALPN, offering http/1.1
  • successfully set certificate verify locations:
  • CAfile: /etc/pki/tls/certs/ca-bundle.crt
    CApath: none
  • TLSv1.3 (OUT), TLS handshake, Client hello (1):
  • OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.lab.ocp.lan:443
    curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.lab.ocp.lan:443

[root@okd4-services ~]# sh -x /tmp/.g

  • oc get csr
    No resources found
  • oc get nodes
    NAME STATUS ROLES AGE VERSION
    okd4-compute-1.lab.ocp.lan Ready worker 6h10m v1.22.0-rc.0+a44d0f0
    okd4-compute-2.lab.ocp.lan Ready worker 5h35m v1.22.0-rc.0+a44d0f0
    okd4-control-plane-1.lab.ocp.lan Ready master,worker 6h20m v1.22.0-rc.0+a44d0f0
    okd4-control-plane-2.lab.ocp.lan Ready master,worker 6h17m v1.22.0-rc.0+a44d0f0
    okd4-control-plane-3.lab.ocp.lan Ready master,worker 6h14m v1.22.0-rc.0+a44d0f0
  • oc get pods
    NAME READY STATUS RESTARTS AGE
    myapache-7bcf9c6d44-tjmzs 1/1 Running 0 4h5m
    myapache-7bcf9c6d44-xmtqx 1/1 Running 0 4h5m
  • oc get pods -n openshift-network-operator
    NAME READY STATUS RESTARTS AGE
    network-operator-6f4564ffb-b82hr 1/1 Running 15 (14m ago) 6h25m
  • oc get daemonsets -n openshift-sdn
    NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
    sdn 5 5 5 5 5 kubernetes.io/os=linux 6h19m
    sdn-controller 3 3 3 3 3 node-role.kubernetes.io/master= 6h19m
  • oc get network.config.openshift.io cluster -o yaml
    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      creationTimestamp: "2021-11-12T17:50:19Z"
      generation: 2
      name: cluster
      resourceVersion: "3114"
      uid: 67438d09-d9df-4b9c-b19d-9863e990df05
    spec:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      externalIP:
        policy: {}
      networkType: OpenShiftSDN
      serviceNetwork:
      - 172.30.0.0/16
    status:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      clusterNetworkMTU: 1450
      networkType: OpenShiftSDN
      serviceNetwork:
      - 172.30.0.0/16

[root@okd4-services ~]# dig ocp.lan

; <<>> DiG 9.11.26-RedHat-9.11.26-4.el8_4 <<>> ocp.lan
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58296
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 0478592c3088da21edc7206c618f046c2b6a75fe33e2ea60 (good)
;; QUESTION SECTION:
;ocp.lan. IN A

;; AUTHORITY SECTION:
ocp.lan. 604800 IN SOA okd4-services.ocp.lan. contact.ocp.lan.ocp.lan. 1 604800 86400 2419200 604800

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 12 19:18:52 EST 2021
;; MSG SIZE rcvd: 130

[root@okd4-services ~]# dig -x 192.168.22.211

; <<>> DiG 9.11.26-RedHat-9.11.26-4.el8_4 <<>> -x 192.168.22.211
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16721
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 4c8bb19fd94b10e384c6db3b618f046f084882a1da69e033 (good)
;; QUESTION SECTION:
;211.22.168.192.in-addr.arpa. IN PTR

;; ANSWER SECTION:
211.22.168.192.in-addr.arpa. 604800 IN PTR okd4-compute-1.lab.ocp.lan.

;; AUTHORITY SECTION:
22.168.192.in-addr.arpa. 604800 IN NS okd4-services.ocp.lan.

;; ADDITIONAL SECTION:
okd4-services.ocp.lan. 604800 IN A 192.168.22.1

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 12 19:18:55 EST 2021
;; MSG SIZE rcvd: 168
