
sandboxed-containers-operator's Introduction

Introduction to sandboxed containers

OpenShift sandboxed containers, based on the Kata Containers open source project, provides an Open Container Initiative (OCI) compliant container runtime that uses lightweight virtual machines to run your workloads in their own isolated kernel, contributing an additional layer of isolation to OpenShift’s defense-in-depth strategy.

Features & benefits of sandboxed containers

  • Isolated Developer Environments & Privileges Scoping: as a developer debugging an application with state-of-the-art tooling, you might need elevated privileges such as CAP_SYS_ADMIN or CAP_BPF. With OpenShift sandboxed containers, any impact is limited to a separate, dedicated kernel.

  • Legacy Containerized Workload Isolation: you are mid-way through converting a containerized monolith into cloud-native microservices, but the monolith still runs on your cluster, unpatched and unmaintained. OpenShift sandboxed containers helps isolate it in its own kernel to reduce risk.

  • Safe Multi-tenancy & Resource Sharing (CI/CD Jobs, CNFs, ..): if you are providing a service to multiple tenants, the service workloads may share the same resources (e.g., a worker node). By deploying each workload in a dedicated kernel, the impact these workloads have on one another is greatly reduced.

  • Additional Isolation with Native Kubernetes User Experience: OpenShift sandboxed containers is exposed as a compliant OCI runtime, so many operational patterns used with normal containers are preserved, including but not limited to image scanning, GitOps, ImageStreams, and so on. A pod opts in simply by selecting the runtime class, as sketched below.
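
For instance, once the operator has installed Kata and created the corresponding runtime class, a workload opts into the sandboxed runtime with a single field. A minimal sketch (the pod name and image are illustrative; the runtime class name kata matches the examples in the issues further down):

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: example-fedora
  namespace: default
spec:
  runtimeClassName: kata   # created by the operator once the KataConfig is installed
  containers:
  - name: example
    image: registry.fedoraproject.org/fedora:latest
    command: ["sleep", "infinity"]
EOF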

Please refer to this blog for a detailed overview of sandboxed containers use cases and other related details.

OpenShift sandboxed containers Operator

The operator manages the lifecycle (install/configure/update) of the sandboxed containers runtime (Kata Containers) on OpenShift clusters.

Operator Architecture

The following diagram shows how the operator components are connected to the OpenShift overall architecture:

High Level Overview

Here is a brief summary of the components:

  • OpenShift clusters consist of controller and worker nodes organized as machine config pools.
  • The Machine Config Operator (MCO) manages the operating system and keeps the cluster up to date and configured.
  • The control-plane nodes run all the services that are required to control the cluster such as the API server, etcd, controller-manager, and the scheduler.
  • The OpenShift sandboxed containers operator runs on a control plane node.
  • The cluster worker nodes run all the end-user workloads.
  • The container engine CRI-O uses either the default container runtime runc or, in the sandboxed containers case, the Kata Containers runtime.

KataConfig Custom Resource Definition

The operator owns and controls the KataConfig Custom Resource Definition (CRD). Please refer to the code for the details of the KataConfig CRD; a minimal custom resource is sketched below.
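
A minimal sketch of the corresponding custom resource, using the field names that appear in the samples and issues further down (the selector is optional and the label value is illustrative):

cat <<EOF | oc apply -f -
apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
spec:
  # optional: restrict installation to labelled nodes
  kataConfigPoolSelector:
    matchLabels:
      custom-kata1: test
EOF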

Getting Started

Please refer to the OpenShift release specific documentation for getting started with sandboxed containers.

  • For the latest OpenShift documentation, please follow this doc

Further note that, starting with OpenShift 4.9, the branch naming is tied to the operator version rather than the OpenShift version. For example, release-1.1 corresponds to the operator release version 1.1.x.

Operator Development

Please take a look at the following doc. Contributions are most welcome!!

Demos

You can find various demos on the following YouTube channel.

Further Reading

sandboxed-containers-operator's People

Contributors

andreabolognani, beraldoleal, bpradipt, cpmeadors, dobbymoodge, esposem, etrunko, fgiudici, fidencio, gkurz, harche, jensfr, littlejawa, pmores, smuda, snir911, spotlesstofu, tbuskey


sandboxed-containers-operator's Issues

followed "without a git repo checkout" steps, failed to launch qemu

Description

# oc version
Client Version: 4.7.0-rc.3
Server Version: 4.7.0-0.nightly-2021-02-18-110409
Kubernetes Version: v1.20.0+bd9e442

followed https://github.com/openshift/kata-operator
without a git repo checkout

1. Make sure that oc is configured to talk to the cluster

2. To deploy the operator and create a custom resource (which installs Kata on all worker nodes), run curl https://raw.githubusercontent.com/openshift/kata-operator/master/deploy/install.sh | bash

result we got

# oc describe kataconfig example-kataconfig
...
Status:
  Installation Status:
    Completed:
      Completed Nodes Count:  3
      Completed Nodes List:
        ip-10-0-207-246
        ip-10-0-149-128
        ip-10-0-176-142
    Failed:
    In Progress:
...
# oc get node -o wide | grep worker
ip-10-0-149-128.us-east-2.compute.internal   Ready    worker   108m   v1.20.0+ba45583   10.0.149.128   <none>        Red Hat Enterprise Linux CoreOS 47.83.202102090044-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.git8921e00.el8.51
ip-10-0-176-142.us-east-2.compute.internal   Ready    worker   108m   v1.20.0+ba45583   10.0.176.142   <none>        Red Hat Enterprise Linux CoreOS 47.83.202102090044-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.git8921e00.el8.51
ip-10-0-207-246.us-east-2.compute.internal   Ready    worker   108m   v1.20.0+ba45583   10.0.207.246   <none>        Red Hat Enterprise Linux CoreOS 47.83.202102090044-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.git8921e00.el8.51

Apply example-fedora.yaml from kata-operator repo

# oc -n default get po
NAME             READY   STATUS              RESTARTS   AGE
example-fedora   0/1     ContainerCreating   0          113s
# oc -n default describe pod example-fedora
...
Events:
  Type     Reason                  Age                             From               Message
  ----     ------                  ----                            ----               -------
  Normal   Scheduled               <invalid>                       default-scheduler  Successfully assigned default/example-fedora to ip-10-0-176-142.us-east-2.compute.internal
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.212/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.213/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.214/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.215/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.216/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.217/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.218/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.219/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.220/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.221/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.222/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.223/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.225/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.226/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.227/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.228/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.229/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.230/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.231/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.232/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.233/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.234/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.235/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.236/23]
  Normal   AddedInterface          <invalid>                       multus             Add eth0 [10.131.0.237/23]
  Warning  FailedCreatePodSandBox  <invalid> (x25 over <invalid>)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = CreateContainer failed: failed to launch qemu: exit status 1, error messages from qemu log: Could not access KVM kernel module: No such file or directory
qemu-kvm: failed to initialize kvm: No such file or directory
: unknown

Additional environment details (platform, options, etc.):
4.7 aws cluster

# TYPE virt_platform gauge
virt_platform{type="kvm"} 1
virt_platform{type="aws"} 1

Using custom payload resulting in error

Description

Using a custom payload results in failure. The earlier assumption was some issue with the custom payload image; however, even using the default kata-operator-payload image via the configmap fails as well.

Steps to reproduce the issue:

  1. Deploy Kata-operator
  2. Create payload-config configmap
kind: ConfigMap
apiVersion: v1
metadata:
  name: payload-config
  namespace: kata-operator-system
data:
  # change to your custom payload repository:tag value
  daemon.payload: quay.io/isolatedcontainers/kata-operator-payload:4.7.0
  3. Deploy Kata
 oc create -f config/samples/kataconfiguration_v1_kataconfig.yaml

Describe the results you received:

Status:
  Installation Status:
    Completed:
    Failed:
      Failed Nodes Count:  2
      Failed Nodes List:
        Error:  image layout must contain index.json file
        Name:   worker1.ocp4.example.com
        Error:  image layout must contain index.json file
        Name:   worker0.ocp4.example.com

POD log

I1222 14:20:45.356871  100697 request.go:645] Throttling request took 1.02506771s, request: GET:https://172.30.0.1:443/apis/scheduling.k8s.io/v1beta1?timeout=32s
W1222 14:20:46.813982  100697 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020/12/22 14:20:46 Kata operator payload tag: 4.7.0
W1222 14:20:46.819014  100697 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/usr/bin/mkdir -p /host/opt/kata-install
2020/12/22 14:20:46 WARNING: kataconfig installation is tainted
2020/12/22 14:20:46 Using env variable KATA_PAYLOAD_IMAGE quay.io/isolatedcontainers/kata-operator-payload:4.7.0
error creating Runtime bundle layout in /usr/local/kata

Describe the results you expected:
Successful Kata setup with custom payload

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc describe kataconfig <your-kataconfig>:

Name:         example-kataconfig
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  kataconfiguration.openshift.io/v1
Kind:         KataConfig
Metadata:
  Creation Timestamp:  2020-12-22T14:19:56Z
  Finalizers:
    finalizer.kataconfiguration.openshift.io
  Generation:        2
  Resource Version:  2790479
  Self Link:         /apis/kataconfiguration.openshift.io/v1/kataconfigs/example-kataconfig
  UID:               680b7e1b-d021-46a0-8c2d-172799332823
Spec:
  Config:
    Source Image:
  Kata Config Pool Selector:
    Match Labels:
      node-role.kubernetes.io/worker:
Status:
  Installation Status:
    Completed:
    Failed:
      Failed Nodes Count:  2
      Failed Nodes List:
        Error:  image layout must contain index.json file
        Name:   worker1.ocp4.example.com
        Error:  image layout must contain index.json file
        Name:   worker0.ocp4.example.com
    In Progress:
  Kata Image:
  Runtime Class:
  Total Nodes Count:  2
  Un Installation Status:
    Completed:
    Failed:
    In Progress:
  Upgrade Status:

Additional environment details (platform, options, etc.):
ocp version

NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-14-165231   True        False         6d      Cluster version is 4.7.0-0.nightly-2020-12-14-165231

release-1.1 deployment instructions point to wrong links

Description

https://github.com/openshift/sandboxed-containers-operator/tree/release-1.1#without-a-git-repo-checkout points to a non-existent file.

The document mentions:

To deploy the operator and create a custom resource (which installs Kata on all worker nodes), run curl https://raw.githubusercontent.com/openshift/sandboxed-containers-operator/master/deploy/install.sh | bash

However, https://raw.githubusercontent.com/openshift/sandboxed-containers-operator/master/deploy/install.sh is a non-existent file.
Most likely it should point to https://raw.githubusercontent.com/openshift/sandboxed-containers-operator/release-1.1/deploy/install.sh

Steps to reproduce the issue:

  1. Open your web browser
  2. Go to https://github.com/openshift/sandboxed-containers-operator/tree/release-1.1#without-a-git-repo-checkout
  3. Try to follow the instructions provided

Describe the results you received:

[fidencio@kundera ocp]$ curl https://raw.githubusercontent.com/openshift/sandboxed-containers-operator/master/deploy/install.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    14  100    14    0     0     57      0 --:--:-- --:--:-- --:--:--    57
bash: line 1: 404:: command not found

Describe the results you expected:

The deployment would start.

Additional information you deem important (e.g. issue happens only occasionally):

Not relevant

Output of oc describe kataconfig <your-kataconfig>:

Not relevant

Additional environment details (platform, options, etc.):

Not relevant

fix retrieving list of nodes

Description
When the cluster has nodes in the master pool but none in the worker pool, the operator will not update the status information in the KataConfig CR. This is due to a bug in the way we update the nodes: we have a hardcoded node-role of worker in there.
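
A sketch of the direction a fix could take: derive the node list from the node selector of the targeted machine config pool instead of the hardcoded worker role (the pool name below is illustrative):

# the MCP itself advertises which nodes belong to it
oc get mcp master -o jsonpath='{.spec.nodeSelector.matchLabels}{"\n"}'
# list the nodes matching that selector rather than assuming node-role.kubernetes.io/worker
oc get nodes -l node-role.kubernetes.io/master=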

Steps to reproduce the issue:
This is not super easy to reproduce, you have to change the machine config pools.

Create a cluster, install the operator

  1. take all nodes out of the worker machine config pool and remove the worker label from the nodes
  2. create a KataConfig; don't select a custom node selector
  3. watch during the installation that the status in the KataConfig is not updated; the installation will still go through, but the runtime will be installed on the master nodes

Describe the results you received:
Status in KataConfig was not updated during installation/uninstallation.

Describe the results you expected:
An updated Status in the KataConfig (nodes in progress, completed, etc.)

Additional information you deem important (e.g. issue happens only occasionally):
This should be ported to release-1.1 and the master branch

[RFE] Support sandboxed-containers-operator as part of OKD

Currently, sandboxed-containers-operator only works on OpenShift. This happens mainly due to packages we depend on that are not part of Fedora (and, consequently, not part of Fedora CoreOS).

We need to re-evaluate this as soon as we have a modular QEMU available on Fedora and we also take advantage of the modular QEMU downstream. Plus, internally at Red Hat, we have folks evaluating the possibility / benefits of having the project as part of OKD.

This issue is a placeholder for the effort; those interested in OKD can subscribe to it here.

DEVELOPMENT.md is out of date and does not work

Prereqs say golang 1.16+, but 1.18 apparently doesn't work. Also operator-sdk is up to 1.20

make build results in an error during "generate" due to a missing controller-gen:

/home/cmeadors/repos/github.com/openshift/sandboxed-containers-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
bash: line 1: /home/cmeadors/repos/github.com/openshift/sandboxed-containers-operator/bin/controller-gen: No such file or directory
make: *** [Makefile:91: manifests] Error 127

make controller-gen succeeds but does not resolve the error. Only manually "go getting" the tool and copying the binary to the local bin dir resolves the issue.

Then make build fails because the setup-envtest binary is not found, similarly to controller-gen (make envtest does not put the binary in the local bin dir). A possible manual workaround is sketched below.
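
A possible manual workaround, assuming the tools just need to land in the repo-local bin/ directory (the module paths are the upstream ones; the pinned versions here are guesses, the Makefile is authoritative):

# install controller-gen and setup-envtest into ./bin
GOBIN=$(pwd)/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
GOBIN=$(pwd)/bin go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest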

[RFE] Get rid of sleeps in our codebase

While reviewing #85 we've noticed that in at least a few places the code simply sleeps for a minute in order to wait for something to happen. A better, less error-prone practice would be to make the code asynchronous, waiting for a condition to happen before proceeding.

This issue is opened as a placeholder to investigate which parts of the code could be improved and could use async code to remove the sleeps in the future.

Update to operator-sdk 1.23

We are currently on operator-sdk 1.20. Latest is 1.23. We need to update the operator-sdk version and apply all the migration changes listed in the docs:

https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.21.0/
https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.22.0/
https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.23.0/

This will also need a golang update to 1.18. Not sure if that can be a separate change or not, but I will file a separate issue for tracking.

New uninstallation DS created after it finished

Description
After uninstallation finishes successfully, another uninstallation daemonset is created.

Steps to reproduce the issue:
Happened only once, seems like a race condition

  1. Remove kataconfig CR
  2. Wait until uninstall finished
  3. See a new DS created and error message as copied to bottom of this post

Describe the results you received:
2021-01-18T10:52:04.480Z ERROR controller Reconciler error {"reconcilerGroup": "kataconfiguration.openshift.io", "reconcilerKind": "KataConfig", "controller": "kataconfig", "name": "example-kataconfig", "namespace": "", "error": "Operation cannot be fulfilled on kataconfigs.kataconfiguration.openshift.io "example-kataconfig": StorageError: invalid object, Code: 4, Key: /kubernetes.io/kataconfiguration.openshift.io/kataconfigs/example-kataconfig, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 7dd7007d-d21a-4375-85fe-63b71f453230, UID in object meta: "}

Describe the results you expected:
No new DS is created after all worker machines are ready again and no error message in controller logs

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc describe kataconfig <your-kataconfig>:
kataconfig CR was deleted as part of uninstall

Additional environment details (platform, options, etc.):

2021-01-18T10:52:04.180Z INFO controllers.KataConfig Monitoring worker mcp {"worker mcp name": "worker", "ready machines": 3, "total machines": 3}
2021-01-18T10:52:04.193Z INFO controllers.KataConfig Deleting uninstall daemonset
2021-01-18T10:52:04.201Z INFO controllers.KataConfig Uninstallation completed on all nodes. Proceeding with the KataConfig deletion
2021-01-18T10:52:04.222Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "kataconfiguration.openshift.io", "reconcilerKind": "KataConfig", "controller": "kataconfig", "name": "example-kataconfig", "namespace": ""}
2021-01-18T10:52:04.223Z INFO controllers.KataConfig Reconciling KataConfig in OpenShift Cluster
2021-01-18T10:52:04.223Z INFO controllers.KataConfig KataConfig deletion in progress:
2021-01-18T10:52:04.275Z INFO controllers.KataConfig Creating a new uninstallation Daemonset {"ds.Namespace": "kata-operator-system", "ds.Name": "kata-operator-daemon-uninstall"}
2021-01-18T10:52:04.373Z INFO controllers.KataConfig Making sure parent MCP is synced properly
2021-01-18T10:52:04.473Z INFO controllers.KataConfig Monitoring worker mcp {"worker mcp name": "worker", "ready machines": 3, "total machines": 3}
2021-01-18T10:52:04.480Z ERROR controller Reconciler error {"reconcilerGroup": "kataconfiguration.openshift.io", "reconcilerKind": "KataConfig", "controller": "kataconfig", "name": "example-kataconfig", "namespace": "", "error": "Operation cannot be fulfilled on kataconfigs.kataconfiguration.openshift.io "example-kataconfig": StorageError: invalid object, Code: 4, Key: /kubernetes.io/kataconfiguration.openshift.io/kataconfigs/example-kataconfig, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 7dd7007d-d21a-4375-85fe-63b71f453230, UID in object meta: "}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:246
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:197
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90
2021-01-18T10:52:05.480Z INFO controllers.KataConfig Reconciling KataConfig in OpenShift Cluster
2021-01-18T10:52:05.480Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "kataconfiguration.openshift.io", "reconcilerKind": "KataConfig", "controller": "kataconfig", "name": "example-kataconfig", "namespace": ""}

Uninstall fails due to incorrect node name when Kata is installed on selected nodes

Description

Uninstall fails when Kata is installed on selected nodes
Steps to reproduce the issue:

  1. Create a KataConfig with kataConfigPoolSelector
  2. Wait for Kata payload installation to succeed
  3. Delete the KataConfig (Uninstall)

Describe the results you received:

2021-02-23T18:00:24.139Z        INFO    controllers.KataConfig  Removing the kata pool selector label from the node     {"node name ": "ip-10-0-223-105"}
2021-02-23T18:00:24.144Z        ERROR   controller      Reconciler error        {"reconcilerGroup": "kataconfiguration.openshift.io", "reconcilerKind": "KataConfig", "controller": "kataconfig", "name": "example-kataconfig", "namespace": "", "error": "nodes \"ip-10-0-223-105\" not found"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:246
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:197
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90

Describe the results you expected:
Successful uninstall

Additional information you deem important (e.g. issue happens only occasionally):
The problem happens due to a mismatch in the node name: Kata install/uninstall uses the host name, not the FQDN, as the node name. For environments like AWS the actual node name is the FQDN, e.g. ip-10-0-223-105.ec2.internal, whereas the host name returned is ip-10-0-223-105.

A possible fix is to modify the following code to return the FQDN (see the check below): https://github.com/openshift/sandboxed-containers-operator/blob/master/images/daemon/pkg/daemon/kata_actions.go#L62
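
A quick way to see the mismatch on a live AWS cluster (assuming cluster-admin access; replace the placeholder node name):

# node names as registered with the API server (FQDNs on AWS)
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
# hostname as seen from the node itself (short name)
oc debug node/<node-name> -- chroot /host hostname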

Additional environment details (platform, options, etc.):
OpenShift cluster in AWS

Basic CI for PR checks

We need to implement basic CI for PR checks:

  • linting
  • build
  • run unit tests
  • get coverage for unit tests

These actions should be manually runnable, i.e. as make targets; then they can easily be automated with GitHub Actions or a Prow job. This follows the pattern of the CI work done in confidential containers and will hopefully be reusable in the coco operator repo as well. A local sketch follows.
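
A sketch of what that could look like locally, under the assumption that the targets are added to the Makefile (only build and test are mentioned in the issues here; lint and the coverage file location are assumptions):

# hypothetical local equivalent of the proposed PR checks
make lint        # assumed new target wrapping go vet / golangci-lint
make build
make test        # assumed to write a coverage profile, e.g. cover.out
go tool cover -func=cover.out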

Migration to Ignition spec 3

The openshift controller is currently using Ignition spec 2
https://github.com/openshift/kata-operator/blob/d36a9ddde10e31e9dad2cb7b3b803cc79b7f3b8b/controllers/openshift_controller.go#L28
https://github.com/openshift/kata-operator/blob/d36a9ddde10e31e9dad2cb7b3b803cc79b7f3b8b/go.mod#L7
which is still supported on 4.6 but not under development anymore. Ignition spec 3 support is recommended from 4.6 onward.

Let me know if you need help to convert the code to spec 3. Note that Ignition spec 3 support is only available since 4.6.

release-1.1: deployment breaks due to `flag provided but not defined: -metrics-addr`

Description

It's absolutely impossible to deploy the sandboxed-containers operator following the instructions from the release-1.1 branch.
After manually editing the installation script to point to the content of the release-1.1 branch (due to #158), the deployment simply breaks with:

[fidencio@kundera ~]$ oc -n openshift-sandboxed-containers-operator logs openshift-sandboxed-containers-controller-manager-7df78c78mn4fw                                             
flag provided but not defined: -metrics-addr
Usage of /manager:        
  -kubeconfig string                                                                                                                                                                 
        Paths to a kubeconfig. Only required if out-of-cluster.
  -leader-elect                                                                                                                                                                      
        Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.                                                        
  -metrics-bind-address string

Steps to reproduce the issue:

  1. Open your web browser
  2. Go to https://github.com/openshift/sandboxed-containers-operator/tree/release-1.1#without-a-git-repo-checkout
  3. Work around #158 by:
    3.1. Manually getting https://raw.githubusercontent.com/openshift/sandboxed-containers-operator/master/deploy/install.sh
    3.2. Change references from master to release-1.1
  4. Run the script
  5. After some time check the state of the controller manager pod (oc get pods -n openshift-sandboxed-containers-operator | grep controller-manager), and verify it's in CrashLoopBackOff
  6. Check its logs (

Describe the results you received:

[fidencio@kundera ~]$ oc get pods -n openshift-sandboxed-containers-operator | grep controller-manager                                                                               
openshift-sandboxed-containers-controller-manager-7df78c78mn4fw   0/1     CrashLoopBackOff   10         28m                                                                          
[fidencio@kundera ~]$ oc -n openshift-sandboxed-containers-operator logs openshift-sandboxed-containers-controller-manager-7df78c78mn4fw                                             
flag provided but not defined: -metrics-addr
Usage of /manager:        
  -kubeconfig string                                                                                                                                                                 
        Paths to a kubeconfig. Only required if out-of-cluster.
  -leader-elect                                                                                                                                                                      
        Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.                                                        
  -metrics-bind-address string
        The address the metric endpoint binds to. (default ":8080")   

Describe the results you expected:

That the installation would succeed.

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc describe kataconfig <your-kataconfig>:

[fidencio@kundera ~]$ oc describe kataconfig example-kataconfig
Name:         example-kataconfig
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  kataconfiguration.openshift.io/v1
Kind:         KataConfig
Metadata:
  Creation Timestamp:  2021-11-30T13:25:40Z
  Generation:          1
  Managed Fields:
    API Version:  kataconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:kataConfigPoolSelector:
          .:
          f:matchLabels:
            .:
            f:custom-kata1:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2021-11-30T13:25:40Z
  Resource Version:  24566
  UID:               a35db2f0-eea0-43ed-b41c-2c8dc14e4292
Spec:
  Kata Config Pool Selector:
    Match Labels:
      custom-kata1:  test
Events:              <none>

Additional environment details (platform, options, etc.):

Missing manifest for kata-operator-daemon

Description

Steps to reproduce the issue:

  1. Clone the operator code
  2. make install && make deploy IMG=quay.io/bpradipt/kata-operator:test
  3. oc create -f config/samples/kataconfiguration_v1_kataconfig.yaml

Describe the results you received:
Kata installation unsuccessful

NAME                                                READY   STATUS             RESTARTS   AGE   IP             NODE                       NOMINATED NODE   READINESS GATES
kata-operator-controller-manager-8568c77c5d-qkgwg   2/2     Running            0          35m   10.254.2.94    master1.ocp4.example.com   <none>           <none>
kata-operator-daemon-install-fvn94                  0/1     ImagePullBackOff   0          34m   192.168.7.12   worker1.ocp4.example.com   <none>           <none>
kata-operator-daemon-install-pbhnh                  0/1     ImagePullBackOff   0          34m   192.168.7.11   worker0.ocp4.example.com   <none>           <none>
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  28m                    default-scheduler  Successfully assigned kata-operator-system/kata-operator-daemon-install-fvn94 to worker1.ocp4.example.com
  Normal   Pulling    25m (x4 over 27m)      kubelet            Pulling image "quay.io/isolatedcontainers/kata-operator-daemon@sha256:e34f796499ad304b82833904c27aa1fef837df5ec33851a14a722bc2c4eeaea3"
  Warning  Failed     25m (x4 over 27m)      kubelet            Failed to pull image "quay.io/isolatedcontainers/kata-operator-daemon@sha256:e34f796499ad304b82833904c27aa1fef837df5ec33851a14a722bc2c4eeaea3": rpc error: code = Unknown desc = Error reading manifest sha256:e34f796499ad304b82833904c27aa1fef837df5ec33851a14a722bc2c4eeaea3 in quay.io/isolatedcontainers/kata-operator-daemon: manifest unknown: manifest unknown
  Warning  Failed     25m (x4 over 27m)      kubelet            Error: ErrImagePull
  Normal   BackOff    7m17s (x88 over 27m)   kubelet            Back-off pulling image "quay.io/isolatedcontainers/kata-operator-daemon@sha256:e34f796499ad304b82833904c27aa1fef837df5ec33851a14a722bc2c4eeaea3"
  Warning  Failed     2m21s (x109 over 27m)  kubelet            Error: ImagePullBackOff

Describe the results you expected:

Successful Kata install

Update to golang 1.18

We are on golang 1.17 currently. Operator-sdk 1.23 (latest) requires 1.18. ( Issue for that is #213 ). We will need to update the version and code to support 1.18.

installation stuck when selected node is in custom machine-config pool

When one of the worker nodes is in an additional machine-config pool and the user wants to install only on this node, the machine-config is not created.

Steps to reproduce the issue:

  1. add one worker node to a new machine-config pool for example worker-special
  2. start the installation
  3. the daemon pods run on all nodes, but the next step (deploying the CRI-O config files) is not done for the node in the worker-special pool

Describe the results you received:
installation never finishes

Describe the results you expected:
machine-config is also deployed to the node in the worker-special pool

Additional information you deem important (e.g. issue happens only occasionally):
I found a similar problem when putting one node in a pool called worker-perf and then starting a normal install, i.e. without adding a label to the nodes. The problem seems to be that our machine config only mentions the worker label, but that one node is also in worker-perf. So I edited the machine-config and added another line with
machineconfiguration.openshift.io/role: worker-perf

After that the node rebooted and the config file was put there. But looking at the machine config again with 'oc edit', I see that the role worker was simply replaced with worker-perf. So the machine-config-operator doesn't seem to support multiple roles for a machine-config entry. I found this GH issue which confirms it (as of 2019):
openshift/machine-config-operator#429 (comment by kikisdeliveryservice)

Also related:
openshift/machine-config-operator#571 and openshift/machine-config-operator#429

Problem: we tell the user the default is to install on all worker nodes, but we don't do it when a node is in two machine config pools at the same time, for example worker and worker-perf.
Idea: create multiple machine-config entries, one for all machine-config pools except master (a custom-pool sketch follows).
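
For reference, the usual custom-pool pattern makes the pool select MachineConfigs for both roles, roughly as below (a sketch of the standard MachineConfigPool approach, not something the operator creates today):

cat <<EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-perf
spec:
  # pick up MachineConfigs rendered for either role
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-perf]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-perf: ""
EOF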

Adjust podOverhead values for kata-containers pods

We currently use the very same podOverhead values as upstream, which are:

overhead:
  podFixed:
    cpu: 250m
    memory: 160Mi

This, unfortunately, is not accurate and should be properly calculated for our use case, considering our restrictions (such as the kernel image we load).

Past measurements show a memory overhead of ~300Mi when spawning sleep pods. This value should be used instead of the 160Mi currently present there.

About the CPU, we'd need some help from @RobertKrawitz to get a more accurate number for our use cases; until then, the currently applied values can be checked as shown below.
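
In the meantime, the overhead currently advertised can be inspected directly (assuming the runtime class created by the operator is named kata, as in the RuntimeClass issue further down):

# show the podFixed overhead advertised by the kata RuntimeClass
oc get runtimeclass kata -o jsonpath='{.overhead.podFixed}{"\n"}'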

add more operator debug data to must-gather image

Please also add the output of the following commands:

oc describe csv -n openshift-sandboxed-containers-operator
oc describe subscription -n openshift-sandboxed-containers-operator

(pseudo code, made concrete as a runnable loop)

for deploymentUnit in pods deployments statefulsets deploymentconfigs; do
  echo "$deploymentUnit using kata runtime class still running:"
  # pods carry runtimeClassName in .spec, the other kinds in .spec.template.spec
  oc get "$deploymentUnit" -A -o json \
    | jq -r '.items[] | select((.spec.runtimeClassName // .spec.template.spec.runtimeClassName // "") | test("kata")) | .metadata.name'
done

error 'failed to check Node eligibility' when running make test

We should get rid of this error when running 'make test':

2022-01-28T13:55:01.397+0100	ERROR	controllers.KataConfig	Failed to check Node eligibility for running Kata containers	{"error": "No Nodes with required labels found. Is NFD running?"}
github.com/openshift/sandboxed-containers-operator/controllers.(*KataConfigOpenShiftReconciler).Reconcile
	/home/jfreiman/go/src/github.com/openshift/kata-operator/controllers/openshift_controller.go:163
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/home/jfreiman/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/jfreiman/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/jfreiman/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/jfreiman/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227

cordoned workers are ignored, installation won't finish

Description
On a cluster where some workers are cordoned, the Operator will create installer daemon pods on those nodes anyway. This means the installation never finishes, because it waits for all the daemon pods on the nodes to report back when they are done.

Steps to reproduce the issue:

  1. In a cluster with 3 worker nodes do 'oc adm cordon worker-x'
  2. deploy operator and create CR
  3. see how 3 kata-installer-daemon pods are created

Describe the results you received:
Operator tries to run the installer daemon on all nodes, cordoned or not.

Describe the results you expected:
The Operator detects cordoned nodes and takes them out of the list of nodes where Kata should be installed. A sketch of detecting the cordoned state follows.
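
A sketch of how the cordoned state could be detected; cordoning is already exposed on the Node object, so no extra bookkeeping is needed:

# cordoned nodes have spec.unschedulable set to true
oc get nodes -o custom-columns=NAME:.metadata.name,UNSCHEDULABLE:.spec.unschedulable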

daemon install failed: error: Base packages would be removed: glusterfs-fuse-6.0-37.el8.x86_64

After running

$ oc apply -f deploy/crds/kataconfiguration.openshift.io_v1alpha1_kataconfig_cr.yaml

I checked the status with

$ oc describe kataconfig example-kataconfig

and observed that "Installation Status: section had errors

Status:
  Installation Status:
    Completed:
    Failed:
      Failed Nodes Count:  3
      Failed Nodes List:
        Error:  exit status 1
        Name:   worker-02.sunilc-kata.qe.devcluster.openshift.com
        Error:  exit status 1
        Name:   worker-00.sunilc-kata.qe.devcluster.openshift.com
        Error:  exit status 1
        Name:   worker-01.sunilc-kata.qe.devcluster.openshift.com
    In Progress:
  Kata Image:
  Runtime Class:
  Total Nodes Count:  3

All three kata-operator-daemon-install pods had errors in their logs:

W0817 14:50:52.926960 103161 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/usr/bin/mkdir -p /host/opt/kata-install
/usr/bin/mkdir -p /etc/yum.repos.d/
/usr/bin/cp -f /usr/local/kata/latest/packages.repo /etc/yum.repos.d/
/usr/bin/cp -a /usr/local/kata/latest/packages /opt/kata-install/packages
Checking out tree c3b71a4...done
Enabled rpm-md repositories: packages
rpm-md repo 'packages' (cached); generated: 2020-07-21T11:38:31Z
Importing rpm-md...done
Resolving dependencies...done
error: Base packages would be removed: glusterfs-fuse-6.0-37.el8.x86_64
/bin/bash -c /usr/bin/rpm-ostree install --idempotent kata-runtime kata-osbuilder
2020/08/17 14:51:23 exit status 1

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-16-072105   True        False         66m     Cluster version is 4.6.0-0.nightly-2020-08-16-072105

Operator is not able to handle custom machineconfigpool properly

Description

When using a custom machineconfig pool like the one shown below, different problems are encountered. For example, adding a new node to the MCP doesn't trigger a reconcile, and deleting the CR doesn't work.

apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
spec:
  kataConfigPoolSelector:
    matchLabels:
      custom-kata1: test

Steps to reproduce the issue:

Case-1
$ oc label node worker0.ocp4.example.com custom-kata1=test
$ oc create -f custom_kataconfig.yaml
Once install succeeds, label the other worker
$ oc label node worker1.ocp4.example.com custom-kata1=test

Describe the results you received:
Watch the Kataconfig CR. The status will not get updated

Describe the results you expected:
Kataconfig CR should show updated status with new node info

Case-2
$ oc label node worker0.ocp4.example.com custom-kata1=test
$ oc label node worker1.ocp4.example.com custom-kata1=test
$ oc create -f custom_kataconfig.yaml
Once installation is over, delete the KataConfig CR.

Describe the results you received:
Delete never completes.

Describe the results you expected:
Successful deletion

Support Firecracker

Previously discussed elsewhere; @fidencio suggested raising an issue here.

Can/should the OpenShift implementation of Kata containers support Firecracker? AWS trusts Firecracker to isolate Lambda functions from different users running on the same physical machine, which is an exceptionally high security bar. QEMU is great, but I'd argue Firecracker provides a higher level of isolation because it is not a general-purpose implementation and is focused on simplicity and security, uses Rust, etc.

Why would you want the option of a higher level of isolation?

  • Running workloads with enhanced separation. If someone compromises an Internet-facing container in my cluster, I want to be confident they couldn't break out onto the host.
  • Running untrusted workloads. Maybe I want to do something crazy like automated malware analysis within containers.

run 'make test' during pre-merge test

It seems like we don't run 'make test' in our pre-merge tests right now, or it doesn't lead to failing the check. PR #168 introduced a bug that made 'make test' fail, but it wasn't noticed before merge. Why I did not see it in my test runs is a different matter :-)

Add build artifacts to .gitignore

I see these new or modified files when I run make docker-build, bundle-build, and catalog-build. They should be added to the .gitignore file so they don't accidentally get checked in; a sketch follows the list below.

Untracked files:
bin/controller-gen
bin/setup-envtest
bin/kustomize
bin/opm
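
A minimal sketch of the addition; the entries below cover every path in the list above (testbin/ additionally mirrors the generated .dockerignore shown in the next issue):

# append the build-artifact directories to .gitignore
cat >> .gitignore <<'EOF'
bin/
testbin/
EOF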

add .dockerignore from operator-sdk

operator-sdk generates a .dockerignore file that looks like this:

# More info: https://docs.docker.com/engine/reference/builder/#dockerignore-file
# Ignore build and test binaries.
bin/
testbin/

We should start with this.

Cannot install 1.1.0 operator in OCP 4.9

Description

Steps to reproduce the issue:

  1. Deploy an OCP 4.9 cluster (virtual cluster with kcli in my case)
  2. Add appropriate CatalogSource to access the 1.1.0 operator image
  3. Start installation

Describe the results you received:

The installation seems to complete in less than 30 seconds: the web UI notifies that the operator is installed and ready to be used. But then, after a couple of seconds, the operator's status transitions to "Installing". This repeats again and again until the cluster sets a CrashLoopBackOff status and the installation is canceled.

Describe the results you expected:

The installation succeeds and I can use Kata in the cluster.

Additional information you deem important (e.g. issue happens only occasionally):

This is reproducible 100% in my setup.

Here's the console output of the pod that fails in a loop during installation :

I0909 11:42:30.555078       1 request.go:645] Throttling request took 1.089807089s, request: GET:https://172.30.0.1:443/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s
2021-09-09T11:42:37.952Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
I0909 11:42:40.603369       1 request.go:645] Throttling request took 2.644938699s, request: GET:https://172.30.0.1:443/apis/whereabouts.cni.cncf.io/v1alpha1?timeout=32s
2021-09-09T11:42:46.355Z	ERROR	setup	unable to use discovery client	{"error": "unable to retrieve the complete list of server APIs: subresources.kubevirt.io/v1: the server is currently unable to handle the request, subresources.kubevirt.io/v1alpha3: the server is currently unable to handle the request, upload.cdi.kubevirt.io/v1alpha1: the server is currently unable to handle the request, upload.cdi.kubevirt.io/v1beta1: the server is currently unable to handle the request"}
github.com/go-logr/zapr.(*zapLogger).Error
	/remote-source/deps/gomod/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
main.main
	/remote-source/app/main.go:78
runtime.main
	/usr/lib/golang/src/runtime/proc.go:204

Output of oc describe kataconfig <your-kataconfig>:

N/A

Additional environment details (platform, options, etc.):

CatalogSource for the operator image

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: gregs-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/jensfr/sandboxed-containers-operator-index:1.1.0-5

NodeSelector is not being set for RuntimeClass when using MatchExpressions in KataConfigPoolSelector

Description
NodeSelector is not being set for RuntimeClass when using MatchExpressions in KataConfigPoolSelector

Steps to reproduce the issue:

  1. Create KataConfig from OpenShift UI which uses matchExpressions or use the following yaml
apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
spec:
  kataConfigPoolSelector:
     matchExpressions:
       - key: custom-kata1
         operator: In
         values:
         - "test"

Describe the results you received:

The Kata runtime will be successfully installed on the nodes labelled with custom-kata1=test; however, the nodeSelector settings in the RuntimeClass will be empty:
oc get runtimeclass kata -o yaml

scheduling:
  nodeSelector: {}

Describe the results you expected:
oc get runtimeclass kata -o yaml

scheduling:
    nodeSelector:
        custom-kata1: test

README from `master` doesn't point to the 4.8 or 4.9 README

Description

The README on the master branch points to the 4.7 and 4.6 installation instructions, but doesn't mention the 4.8 one.

Steps to reproduce the issue:

  1. Open a browser
  2. Go to https://github.com/openshift/sandboxed-containers-operator
  3. Read the README

Describe the results you received:

The first thing you'll see is two warnings saying that "if you're using OCP 4.x please follow ...", for both OpenShift 4.6 and 4.7.

Describe the results you expected:

The same info presented for 4.6 and 4.7 to also be present for OpenShift 4.8 and OpenShift 4.9.

Additional information you deem important (e.g. issue happens only occasionally):

Not relevant

Output of oc describe kataconfig <your-kataconfig>:

Not relevant

Additional environment details (platform, options, etc.):

Not relevant

Removed unused code

Description
With the switch to the RHCOS extension, quite a few code paths are unused. For example, KataInstallConfig is no longer used, the payload image is not needed since the daemonset is no longer used, and the kubernetes controller is obsolete.

What's the source code of image quay.io/harpatil/kata-install-daemon:1.5

I'm working on integrating the kata runtime with OCP 4.6 on the IBM Z (s390x) platform. I have built the kata-operator, kata-operator-daemon, and kata-operator-payload images, but https://github.com/openshift/kata-operator/blob/release-4.6/images/daemon/image/daemon.yaml#L18
references a kata-install-daemon image, which I think is in the private repository of @harche or @jensfr, since it can't be found anywhere.

- name: kata-install-daemon 
        image: quay.io/harpatil/kata-install-daemon:1.5
        #image: quay.io/jensfr/kata-install-daemon:latest

Could someone point out where the code of kata-install-daemon is, so I can build it for the s390x platform?

Modified files after building

After running docker-build, bundle-build, and catalog-build, there are a number of modified files. Are these safe if they accidentally get checked in? Should they be getting updated? Should they be updated in the repo?

bundle.Dockerfile
config/crd/bases/kataconfiguration.openshift.io_kataconfigs.yaml
config/rbac/role.yaml
config/webhook/manifests.yaml
bundle/manifests/kataconfiguration.openshift.io_kataconfigs.yaml
bundle/metadata/annotations.yaml

How is NetworkPolicy implemented

Previously discussed elsewhere; @fidencio suggested raising an issue here.

Some people might want to use Kata containers to get enhanced isolation between containers. I'd like to use NetworkPolicy to restrict what network resources a container can access, but would like to know how confident I can be of its security. Is NetworkPolicy implemented outside or inside the VM? Obviously, if it were outside, that would give increased assurance of that security control.

Limit the installation to known and tested cases

Description

This afternoon I created a cluster on Azure using clusterbot and it gave me a "3 mains, 1 worker" cluster. After trying to deploy the kata-operator and watching it stay stuck for more than an hour, I pinged @jensfr and he told me this may be an edge case: the operator couldn't finish the installation because the worker node got degraded, meaning the node couldn't be rebooted since its running pods couldn't be moved to another worker node.

Steps to reproduce the issue:

  1. send launch 4.7 azure to clusterbot
  2. ensure you get a "3 mains, 1 worker" cluster (oc get nodes will provide you this info)
  3. deploy kata-operator according to its README

Describe the results you received:

The installation got stuck without much information provided.

Describe the results you expected:

Either a big "this cluster configuration is not supported" error before I start the installation, or the installation finishing successfully.

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc describe kataconfig <your-kataconfig>:

fidencio@dahmer ~/src/upstream/kata-operator $ oc describe kataconfig example-kataconfig
Name:         example-kataconfig
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  kataconfiguration.openshift.io/v1
Kind:         KataConfig
Metadata:
  Creation Timestamp:  2021-02-03T13:05:05Z
  Finalizers:
    finalizer.kataconfiguration.openshift.io
  Generation:        2
  Resource Version:  28831
  Self Link:         /apis/kataconfiguration.openshift.io/v1/kataconfigs/example-kataconfig
  UID:               150a3ff7-5955-4d64-ae6a-1d4c1ac56105
Spec:
  Config:
    Source Image:  
  Kata Config Pool Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
Status:
  Installation Status:
    Completed:
    Failed:
    In Progress:
      Binaries Install Nodes List:
        ci-ln-gbh7ws2-002ac-97m5b-worker-westus-d8smw
      In Progress Nodes Count:  1
  Kata Image:                   
  Runtime Class:                
  Total Nodes Count:            1
  Un Installation Status:
    Completed:
    Failed:
    In Progress:
  Upgrade Status:
Events:  <none>

Additional environment details (platform, options, etc.):

status not updated when nodes are added/removed from machine config pool

Description

The status (installation/uninstallation) is only updated when the kataconfig CR is created or deleted. When a custom MCP is used and a node is added or removed, the status is not updated even though the change triggers enabling/disabling the sandboxed-containers extension on nodes.

Steps to reproduce the issue:

  1. create a kataconfig and specify a node selector
  2. add label to two nodes -> extension will be enabled on those nodes
  3. remove the label from one of those nodes -> the extension is disabled on this node but the status doesn't reflect that operation

Describe the results you received:

when removing a label from the node there was no update

Describe the results you expected:

an updated status showing that something is going on

What we could do is:

  • cache the previous size of the mcp and, when the mcp status is updating and its size has
    1. increased: clear install status, set installing=true, update install stats, set installing=false when mcp status is 'updated'
    2. decreased: clear uninstall status, set uninstalling=true, update uninstall stats, set uninstalling=false when mcp status is 'updated'
    3. stayed unchanged while the mcp is updating: do 1 and 2 (maybe the same number of nodes was removed and added at the same time)

release-4.8 branch content still points to `master`

Description

When taking a look at the instructions to install the operator from the release-4.8 branch, everything still points to the master branch.

Steps to reproduce the issue:

  1. Access https://github.com/openshift/sandboxed-containers-operator/tree/release-4.8#without-a-git-repo-checkout
  2. Verify that all the instructions & scripts are using content from master

Describe the results you received:
Everything points to master

Describe the results you expected:
If we branched release-4.8, the content should point to the release-4.8 branch.

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc describe kataconfig <your-kataconfig>:

(paste your output here)

Additional environment details (platform, options, etc.):

Install fails on OKD 4.7

Description

Installation fails on OKD 4.7 with FCOS 33.

Steps to reproduce the issue:

  1. curl https://raw.githubusercontent.com/openshift/kata-operator/master/deploy/install.sh | bash

Describe the results you received:
Installation fails. Looks like the rpm packages' dependencies are not friendly to FCOS...

Describe the results you expected:
Installation succeeds? (I am actually asking, given it is OKD)

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc describe kataconfig <your-kataconfig>:

$ oc describe kataconfig example-kataconfig
Name:         example-kataconfig
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  kataconfiguration.openshift.io/v1
Kind:         KataConfig
Metadata:
  Creation Timestamp:  2021-03-19T01:32:06Z
  Finalizers:
    finalizer.kataconfiguration.openshift.io
  Generation:  2
  Managed Fields:
    API Version:  kataconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2021-03-19T01:32:06Z
    API Version:  kataconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizer.kataconfiguration.openshift.io":
      f:spec:
        .:
        f:config:
          .:
          f:sourceImage:
        f:kataConfigPoolSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
      f:status:
        .:
        f:installationStatus:
          .:
          f:completed:
          f:failed:
          f:inProgress:
        f:kataImage:
        f:runtimeClass:
        f:totalNodesCount:
        f:unInstallationStatus:
          .:
          f:completed:
          f:failed:
          f:inProgress:
        f:upgradeStatus:
    Manager:      manager
    Operation:    Update
    Time:         2021-03-19T01:34:12Z
    API Version:  kataconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:installationStatus:
          f:failed:
            f:failedNodesCount:
            f:failedNodesList:
    Manager:         daemon
    Operation:       Update
    Time:            2021-03-19T01:34:55Z
  Resource Version:  3740359
  Self Link:         /apis/kataconfiguration.openshift.io/v1/kataconfigs/example-kataconfig
  UID:               8190feba-53a0-41ef-9d39-0a5f93da270d
Spec:
  Config:
    Source Image:  
  Kata Config Pool Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
Status:
  Installation Status:
    Completed:
    Failed:
      Failed Nodes Count:  2
      Failed Nodes List:
        Error:  exit status 1
        Name:   okd-compute-1.lab2.okd
        Error:  exit status 1
        Name:   okd-compute-2.lab2.okd
    In Progress:
  Kata Image:         
  Runtime Class:      
  Total Nodes Count:  2
  Un Installation Status:
    Completed:
    Failed:
    In Progress:
  Upgrade Status:
Events:  <none>

Additional environment details (platform, options, etc.):

$ oc version
Client Version: 4.7.0-0.okd-2021-03-07-090821
Server Version: 4.7.0-0.okd-2021-03-07-090821
Kubernetes Version: v1.20.0-1046+5fbfd197c16d3c-dirty
FCOS: 33.20210217.3.0 stable
$ oc get pod -n kata-operator-system
NAME                                                READY   STATUS    RESTARTS   AGE
kata-operator-controller-manager-79556d56d6-6j5g2   2/2     Running   2          11m
kata-operator-daemon-install-jzfxf                  1/1     Running   0          9m41s
kata-operator-daemon-install-zx76r                  1/1     Running   0          9m41s

$ oc logs kata-operator-daemon-install-zx76r -n kata-operator-system
I0319 01:34:22.488189 2250013 request.go:645] Throttling request took 1.040143398s, request: GET:https://172.30.0.1:443/apis/quota.openshift.io/v1?timeout=32s
W0319 01:34:25.546345 2250013 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2021/03/19 01:34:25 Kata operator payload tag: 4.7.0
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/usr/bin/mkdir -p /host/opt/kata-install
/usr/bin/mkdir -p /etc/yum.repos.d/
/usr/bin/cp -f /usr/local/kata/latest/packages.repo /etc/yum.repos.d/
/usr/bin/cp -a /usr/local/kata/latest/packages /opt/kata-install/packages
Checking out tree 17a9b3a...done
Enabled rpm-md repositories: packages
rpm-md repo 'packages' (cached); generated: 2021-02-04T21:00:23Z
Importing rpm-md...done
error: Packages not found: NetworkManager-ovs, glusterfs, glusterfs-fuse, qemu-guest-agent
/bin/bash -c /usr/bin/rpm-ostree install --idempotent kata-runtime kata-osbuilder
2021/03/19 01:34:50 exit status 1

...and if I log into one of the nodes and try myself:

[core@okd-compute-1 ~]$ sudo rpm-ostree install kata-runtime kata-osbuilder
Checking out tree 17a9b3a... done
Enabled rpm-md repositories: packages
rpm-md repo 'packages' (cached); generated: 2021-02-04T21:00:23Z
Importing rpm-md... done
error: Packages not found: NetworkManager-ovs, glusterfs, glusterfs-fuse, qemu-guest-agent

... which is interesting because:

[core@okd-compute-1 ~]$ rpm -qa | grep gluster
libglusterfs0-8.4-1.fc33.x86_64
glusterfs-8.4-1.fc33.x86_64
glusterfs-client-xlators-8.4-1.fc33.x86_64
glusterfs-fuse-8.4-1.fc33.x86_64
[core@okd-compute-1 ~]$ rpm -qa | grep qemu
qemu-guest-agent-5.1.0-9.fc33.x86_64
[core@okd-compute-1 ~]$ rpm -qa | grep NetworkManager
NetworkManager-libnm-1.26.6-1.fc33.x86_64
NetworkManager-1.26.6-1.fc33.x86_64
NetworkManager-tui-1.26.6-1.fc33.x86_64
NetworkManager-team-1.26.6-1.fc33.x86_64
NetworkManager-ovs-1.26.6-1.fc33.x86_64

... also, not that it matters much, but I cannot find any reference to `/bin/bash -c /usr/bin/rpm-ostree install --idempotent kata-runtime kata-osbuilder` in this repository's code. I could find `/bin/bash -c /usr/bin/rpm-ostree install --idempotent kata-containers`, though. So where is the former coming from?
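
Not an answer, just a hedged illustration: one way the exact command line could show up in the daemon log without being grepable in the repository is if it is assembled at runtime (for example from strings shipped in the payload image) and then run through a shell, roughly like the sketch below. The package names are taken from the log above; everything else is hypothetical.

// Hypothetical sketch: assembling the rpm-ostree command at runtime and running
// it via "/bin/bash -c", which matches the shape of the logged command line.
package sketch

import (
	"fmt"
	"os/exec"
	"strings"
)

func installPackages(pkgs []string) error {
	// e.g. pkgs = []string{"kata-runtime", "kata-osbuilder"} as seen in the log
	cmdline := fmt.Sprintf("/usr/bin/rpm-ostree install --idempotent %s",
		strings.Join(pkgs, " "))
	out, err := exec.Command("/bin/bash", "-c", cmdline).CombinedOutput()
	if err != nil {
		return fmt.Errorf("rpm-ostree install failed: %v: %s", err, out)
	}
	return nil
}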

Kata deployment stalls with invalid extensions error

Description

Steps to reproduce the issue:

  1. oc label node <node-name> runtime=kata
  2. update the KataConfig CR as described by the docs to target a specific node (see the label-matching sketch after these steps):
apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: demolab-kataconfig
spec:
  kataConfigPoolSelector:
    matchLabels:
      runtime: kata
  3. run the deployment steps as described by the docs
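
As an aside, a small Go sketch (an illustration only, not the operator's actual code) of how a matchLabels selector like the one above matches the label applied in step 1, using the standard apimachinery helpers:

// Illustration: checking whether a node's labels satisfy the
// kataConfigPoolSelector.matchLabels from the steps above.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

func nodeMatches(nodeLabels map[string]string) (bool, error) {
	poolSelector := &metav1.LabelSelector{
		MatchLabels: map[string]string{"runtime": "kata"},
	}
	sel, err := metav1.LabelSelectorAsSelector(poolSelector)
	if err != nil {
		return false, err
	}
	return sel.Matches(labels.Set(nodeLabels)), nil
}

func main() {
	ok, _ := nodeMatches(map[string]string{"runtime": "kata"})
	fmt.Println(ok) // prints true once the node carries the runtime=kata label
}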

Describe the results you received:
Kata operator is deployed successfully.

Rollout of the kata runtime stalls after creation of the CR with:

Status:
  Installation Status:
    Is In Progress:  True
    Completed:
    Failed:
      Failed Nodes Reason:  Node skylake-fn6tt-worker-5mzzg is reporting: "invalid extensions found: [sandboxed-containers]"

This condition is never reconciled.
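
For context: the message appears to come from the machine-config side rejecting an unknown entry in a MachineConfig's extensions list; the sandboxed-containers extension is not known to older MCO versions, which would explain the rejection on OCP 4.7. Below is a hedged Go sketch of what a MachineConfig requesting that extension could look like; the extension name is taken from the error message, while the object name and labels are placeholders.

// Illustrative sketch: a MachineConfig requesting the "sandboxed-containers"
// extension (the name reported in the error), built as an unstructured object
// so it stays independent of the MCO Go API.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/yaml"
)

func main() {
	mc := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "machineconfiguration.openshift.io/v1",
		"kind":       "MachineConfig",
		"metadata": map[string]interface{}{
			"name": "50-enable-sandboxed-containers-extension", // placeholder name
			"labels": map[string]interface{}{
				"machineconfiguration.openshift.io/role": "worker",
			},
		},
		"spec": map[string]interface{}{
			"extensions": []interface{}{"sandboxed-containers"},
		},
	}}

	out, err := yaml.Marshal(mc.Object)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}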

Describe the results you expected:
Successful deployment of the kata runtime

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc describe kataconfig <your-kataconfig>:

Name:         demolab-kataconfig
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  kataconfiguration.openshift.io/v1
Kind:         KataConfig
Metadata:
  Creation Timestamp:  2021-06-02T11:20:01Z
  Finalizers:
    finalizer.kataconfiguration.openshift.io
  Generation:  2
  Managed Fields:
    API Version:  kataconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:kataConfigPoolSelector:
          .:
          f:matchLabels:
            .:
            f:runtime:
    Manager:      kubectl-create
    Operation:    Update
    Time:         2021-06-02T11:20:01Z
    API Version:  kataconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizer.kataconfiguration.openshift.io":
      f:spec:
        f:config:
          .:
          f:sourceImage:
      f:status:
        .:
        f:installationStatus:
          .:
          f:IsInProgress:
          f:completed:
          f:failed:
            .:
            f:failedNodesReason:
          f:inprogress:
        f:kataImage:
        f:prevMcpGeneration:
        f:runtimeClass:
        f:totalNodesCount:
        f:unInstallationStatus:
          .:
          f:completed:
          f:failed:
          f:inProgress:
            .:
            f:status:
        f:upgradeStatus:
    Manager:         manager
    Operation:       Update
    Time:            2021-06-02T11:22:37Z
  Resource Version:  52292
  Self Link:         /apis/kataconfiguration.openshift.io/v1/kataconfigs/demolab-kataconfig
  UID:               8fe6d0c6-d44d-4c99-8afb-238ec8ad2444
Spec:
  Config:
    Source Image:  
  Kata Config Pool Selector:
    Match Labels:
      Runtime:  kata
Status:
  Installation Status:
    Is In Progress:  True
    Completed:
    Failed:
      Failed Nodes Reason:  Node skylake-fn6tt-worker-5mzzg is reporting: "invalid extensions found: [sandboxed-containers]"
    Inprogress:
  Kata Image:           
  Prev Mcp Generation:  2
  Runtime Class:        
  Total Nodes Count:    1
  Un Installation Status:
    Completed:
    Failed:
    In Progress:
      Status:  
  Upgrade Status:
Events:  <none>

Additional environment details (platform, options, etc.):
OCP 4.7.12, IPI install on vSphere.

Kata Runtime Installation incomplete on 4.6 of s390x

Description

Steps to reproduce the issue:

  1. Follow guide to install kata-operator
  2. oc describe kataconfig example-kataconfig

Describe the results you received:

[root@kubernetes2 deploy]# oc describe kataconfig example-kataconfig
Name:         example-kataconfig
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  kataconfiguration.openshift.io/v1alpha1
Kind:         KataConfig
Metadata:
  Creation Timestamp:  2021-03-08T14:56:18Z
  Finalizers:
    finalizer.kataconfiguration.openshift.io
  Generation:  2
  Managed Fields:
    API Version:  kataconfiguration.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizer.kataconfiguration.openshift.io":
      f:spec:
        .:
        f:config:
          .:
          f:sourceImage:
        f:kataConfigPoolSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
      f:status:
        .:
        f:installationStatus:
          .:
          f:completed:
          f:failed:
          f:inProgress:
        f:kataImage:
        f:runtimeClass:
        f:totalNodesCount:
        f:unInstallationStatus:
          .:
          f:completed:
          f:failed:
          f:inProgress:
        f:upgradeStatus:
    Manager:      kata-operator
    Operation:    Update
    Time:         2021-03-08T14:56:18Z
    API Version:  kataconfiguration.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2021-03-08T14:56:18Z
  Resource Version:  25410
  Self Link:         /apis/kataconfiguration.openshift.io/v1alpha1/kataconfigs/example-kataconfig
  UID:               441690de-a508-4bf9-9d1a-af50468d2fba
Spec:
  Config:
    Source Image:
  Kata Config Pool Selector:
    Match Labels:
      node-role.kubernetes.io/worker:
Status:
  Installation Status:
    Completed:
    Failed:
    In Progress:
  Kata Image:
  Runtime Class:
  Total Nodes Count:  2
  Un Installation Status:
    Completed:
    Failed:
    In Progress:
  Upgrade Status:
Events:  <none>

Describe the results you expected:
The 'Completed Nodes Count' field in the status should match the number of worker nodes on which the installation has completed.

Additional information you deem important (e.g. issue happens only occasionally):

Output of oc describe kataconfig <your-kataconfig>:

see above

Additional environment details (platform, options, etc.):
Some errors from the kata-operator-daemon pod

[root@kubernetes2 deploy]# oc get pod -n kata-operator
NAME                                 READY   STATUS    RESTARTS   AGE
kata-operator-6b94c57479-cvzfz       1/1     Running   0          3m24s
kata-operator-daemon-install-8wbc4   1/1     Running   0          72s
kata-operator-daemon-install-dhbgf   1/1     Running   0          72s
[root@kubernetes2 deploy]# oc logs kata-operator-daemon-install-8wbc4 -n kata-operator
I0308 14:56:28.131259   46717 request.go:645] Throttling request took 1.034577527s, request: GET:https://172.30.0.1:443/apis/packages.operators.coreos.com/v1?timeout=32s
Error while installation: no matches for kind "KataConfig" in version "kataconfiguration.openshift.io/v1"[root@kubernetes2 deploy]#
[root@kubernetes2 deploy]# oc logs kata-operator-6b94c57479-cvzfz -n kata-operator
{"level":"info","ts":1615215254.6735685,"logger":"cmd","msg":"Operator Version: 4.6.0"}
{"level":"info","ts":1615215254.6738572,"logger":"cmd","msg":"Go Version: go1.14.12"}
{"level":"info","ts":1615215254.6738694,"logger":"cmd","msg":"Go OS/Arch: linux/s390x"}
{"level":"info","ts":1615215254.6738803,"logger":"cmd","msg":"Version of operator-sdk: v0.16.0"}
{"level":"info","ts":1615215254.6743443,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1615215257.095014,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1615215257.111728,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1615215259.520002,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1615215259.5209112,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1615215266.7815835,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"kata-operator-metrics","Service.Namespace":"kata-operator"}
{"level":"info","ts":1615215269.272033,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1615215269.272406,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1615215269.2725463,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kataconfig-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615215269.373161,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kataconfig-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615215269.473884,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kataconfig-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615215269.574539,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kataconfig-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615215269.675325,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kataconfig-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615215269.77606,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"kataconfig-controller"}
{"level":"info","ts":1615215269.7761438,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"kataconfig-controller","worker count":1}
{"level":"info","ts":1615215378.1485403,"logger":"controller_kataconfig","msg":"Reconciling KataConfig in OpenShift Cluster","Request.Name":"example-kataconfig"}
{"level":"info","ts":1615215378.2636633,"logger":"controller_kataconfig","msg":"Creating a new installation Daemonset","Request.Name":"example-kataconfig","ds.Namespace":"kata-operator","ds.Name":"kata-operator-daemon-install"}
{"level":"info","ts":1615215378.3360097,"logger":"controller_kataconfig","msg":"Adding Finalizer for the KataConfig","Request.Name":"example-kataconfig"}
{"level":"info","ts":1615215378.3438842,"logger":"controller_kataconfig","msg":"Reconciling KataConfig in OpenShift Cluster","Request.Name":"example-kataconfig"}
{"level":"info","ts":1615215378.3440423,"logger":"controller_kataconfig","msg":"Adding Finalizer for the KataConfig","Request.Name":"example-kataconfig"}
{"level":"error","ts":1615215378.3492286,"logger":"controller_kataconfig","msg":"Failed to update KataConfig with finalizer","Request.Name":"example-kataconfig","error":"Operation cannot be fulfilled on kataconfigs.kataconfiguration.openshift.io \"example-kataconfig\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/openshift/kata-operator/pkg/controller/kataconfig.(*ReconcileKataConfigOpenShift).addFinalizer\n\tkata-operator/pkg/controller/kataconfig/openshift_reconciler.go:407\ngithub.com/openshift/kata-operator/pkg/controller/kataconfig.(*ReconcileKataConfigOpenShift).processKataConfigInstallRequest\n\tkata-operator/pkg/controller/kataconfig/openshift_reconciler.go:530\ngithub.com/openshift/kata-operator/pkg/controller/kataconfig.(*ReconcileKataConfigOpenShift).Reconcile.func1\n\tkata-operator/pkg/controller/kataconfig/openshift_reconciler.go:99\ngithub.com/openshift/kata-operator/pkg/controller/kataconfig.(*ReconcileKataConfigOpenShift).Reconcile\n\tkata-operator/pkg/controller/kataconfig/openshift_reconciler.go:100\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1615215378.34932,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"kataconfig-controller","request":"/example-kataconfig","error":"Operation cannot be fulfilled on kataconfigs.kataconfiguration.openshift.io \"example-kataconfig\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1615215379.349677,"logger":"controller_kataconfig","msg":"Reconciling KataConfig in OpenShift Cluster","Request.Name":"example-kataconfig"}
{"level":"info","ts":1615215387.1940982,"logger":"controller_kataconfig","msg":"Reconciling KataConfig in OpenShift Cluster","Request.Name":"example-kataconfig"}
{"level":"info","ts":1615215387.6948373,"logger":"controller_kataconfig","msg":"Reconciling KataConfig in OpenShift Cluster","Request.Name":"example-kataconfig"}
[root@kubernetes2 deploy]#

Create and publish a release-4.9 branch

Description

There's no release-4.9 branch, even with OpenShift 4.9 being released on Oct, 18th 2021.

Steps to reproduce the issue:

  1. Open a browser
  2. Go to https://github.com/openshift/sandboxed-containers-operator
  3. Check the branches

Describe the results you received:

The newest release-x.y branch is 4.8.

Describe the results you expected:

A release-4.9 branch should be present and published, considering master is being used for the "under development OCP 4.10".

Additional information you deem important (e.g. issue happens only occasionally):

Not relevant

Output of oc describe kataconfig <your-kataconfig>:

Not relevant

Additional environment details (platform, options, etc.):

Not relevant

vanilla pods created in the default namespace disappeared due to the kata-operator deployment

Description

As mentioned in the subject, vanilla pods created in the default namespace disappeared due to the kata-operator deployment.
In my case there were 2 vanilla pods running in the default namespace, and they were simply removed when the operator was deployed. This happened on a 4.6 OpenShift cluster while deploying kata-operator from the release-4.6 branch.

Steps to reproduce the issue:

  1. Prepare a clean 4.6 environment;
  2. Create a simple pod in the default namespace;
  3. Deploy kata-operator from release-4.6 branch;

Describe the results you received:
The simple pod disappeared from the default namespace;

Describe the results you expected:
Vanilla pods from the default namespace shouldn't be affected by the kata-operator deployment;

Additional information you deem important (e.g. issue happens only occasionally):
I don't have a cluster handy to try to reproduce it here, sorry. I don't know whether it affects OpenShift 4.7 as well.

Output of oc describe kataconfig <your-kataconfig>:

[kni@provisionhost-0-0 ~]$ oc describe kataconfig
Name:         example-kataconfig             
Namespace:                        
Labels:       <none>       
Annotations:  <none>                     
API Version:  kataconfiguration.openshift.io/v1alpha1
Kind:         KataConfig                                                                         
Metadata:                                                
  Creation Timestamp:  2021-01-28T09:35:54Z
  Finalizers:                                                                                
    finalizer.kataconfiguration.openshift.io
  Generation:  2            
  Managed Fields:              
    API Version:  kataconfiguration.openshift.io/v1alpha1
    Fields Type:  FieldsV1                                  
    fieldsV1:                                    
      f:metadata:                                
        f:annotations:          
          .:                      
          f:kubectl.kubernetes.io/last-applied-configuration:
    Manager:      kubectl-client-side-apply
    Operation:    Update                             
    Time:         2021-01-28T09:35:54Z                                                           
    API Version:  kataconfiguration.openshift.io/v1alpha1
    Fields Type:  FieldsV1                 
    fieldsV1:     
      f:status:                             
        f:installationStatus:
          f:completed:
            f:completedNodesCount:                       
            f:completedNodesList:
    Manager:      kata-install-daemon
    Operation:    Update
    Time:         2021-01-28T09:54:50Z
    API Version:  kataconfiguration.openshift.io/v1alpha1
    Fields Type:  FieldsV1                                   
    fieldsV1:                              
      f:metadata:         
        f:finalizers:
          .:
          v:"finalizer.kataconfiguration.openshift.io":
      f:spec:
        .:
        f:config:
        f:kataConfigPoolSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
      f:status:
        .:
        f:installationStatus:
          .:
          f:completed:
          f:failed:
          f:inProgress:
        f:kataImage:
        f:runtimeClass:
        f:totalNodesCount:
        f:unInstallationStatus:
          .:
          f:completed:
          f:failed:
          f:inProgress:
        f:upgradeStatus:
    Manager:         kata-operator
    Operation:       Update
    Time:            2021-01-28T09:54:50Z
  Resource Version:  285986
  Self Link:         /apis/kataconfiguration.openshift.io/v1alpha1/kataconfigs/example-kataconfig
  UID:               7e28ba18-7807-431b-9dcd-8bbe0b5bde55
Spec:
  Config:
    Source Image:
  Kata Config Pool Selector:
    Match Labels:
      node-role.kubernetes.io/worker:
Status:
  Installation Status:
    Completed:
      Completed Nodes Count:  7
      Completed Nodes List:
        worker-0-1
        worker-0-6
        worker-0-0
        worker-0-2
        worker-0-5
        worker-0-4
        worker-0-3
    Failed:
    In Progress:
  Kata Image:
  Runtime Class:      kata
  Total Nodes Count:  7
  Un Installation Status:
    Completed:
    Failed:
    In Progress:
  Upgrade Status:
Events:  <none>

Additional environment details (platform, options, etc.):

[kni@provisionhost-0-0 ~]$ oc version 
Client Version: 4.6.15
Server Version: 4.6.15
Kubernetes Version: v1.19.0+1833054
