must-gather-operator's Introduction

Must Gather Operator

The Must Gather operator helps collect must-gather information on a cluster and upload it to a case. To use the operator, a cluster administrator can create the following MustGather CR:

apiVersion: managed.openshift.io/v1alpha1
kind: MustGather
metadata:
  name: example-mustgather-basic
spec:
  caseID: '02527285'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin

This request collects the standard must-gather information and uploads it to case #02527285 using the credentials found in the case-management-creds secret.

Adding other must-gather images

In this example we use a specific service account (which must have cluster-admin permissions, as must-gather requires) and specify two additional must-gather images to run for the kubevirt and ocs subsystems. If not specified, serviceAccountRef.name defaults to default. The standard must-gather image, quay.io/openshift/origin-must-gather:latest, is always added by default.

apiVersion: managed.openshift.io/v1alpha1
kind: MustGather
metadata:
  name: example-mustgather-full
spec:
  caseID: '02527285'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin
  mustGatherImages:
  - quay.io/kubevirt/must-gather:latest
  - quay.io/ocs-dev/ocs-must-gather

Collecting Audit logs

The audit field is false by default unless explicitly set to true. Setting it to true generates the default collection of audit logs as per the collection script gather_audit_logs.

apiVersion: managed.openshift.io/v1alpha1
kind: MustGather
metadata:
  name: example-mustgather-full
spec:
  caseID: '02527285'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin
  audit: true

Proxy Support

The Must Gather operator supports using a proxy. The proxy setting can be specified in the MustGather object. If not specified, the cluster default proxy setting will be used. Here is an example:

apiVersion: managed.openshift.io/v1alpha1
kind: MustGather
metadata:
  name: example-mustgather-proxy
spec:
  caseID: '02527285'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin
  proxyConfig:
    http_proxy: http://myproxy
    https_proxy: https://my_http_proxy
    no_proxy: master-api

Garbage collection

MustGather instances are cleaned up by the Must Gather operator about 6 hours after completion, regardless of whether they were successful. This is a way to prevent the accumulation of unwanted MustGather resources and their corresponding job resources.
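The retention rule above can be pictured with a small shell sketch. This is not the operator's own code: the function name is_gc_eligible and the exact 6-hour constant are assumptions taken from the "about 6 hours" wording, and timestamps are assumed to be Unix seconds.

```shell
#!/usr/bin/env bash
# Illustrative only: not the operator's actual implementation.
# Retention window assumed from the "about 6 hours" statement above.
RETENTION_SECONDS=$((6 * 60 * 60))

# is_gc_eligible COMPLETED_AT [NOW]
# Succeeds when a MustGather that completed at COMPLETED_AT (Unix seconds)
# has been finished for at least the retention window at time NOW.
is_gc_eligible() {
  completed_at="$1"
  now="${2:-$(date +%s)}"
  [ $((now - completed_at)) -ge "$RETENTION_SECONDS" ]
}
```

Whether the run succeeded or failed does not enter the check, which matches the statement that instances are removed regardless of outcome.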

Deploying the Operator

This is a cluster-level operator that you can deploy in any namespace; must-gather-operator is recommended.

Deploying directly with manifests

The following instructions install the latest release by creating the manifests directly in OCP.

git clone git@github.com:openshift/must-gather-operator.git; cd must-gather-operator
oc apply -f deploy/crds/managed.openshift.io_mustgathers_crd.yaml
oc new-project must-gather-operator
oc -n must-gather-operator apply -f deploy

Meeting the operator requirements

In order to run, the operator needs a secret to be created by the admin as follows (this assumes the operator is running in the must-gather-operator namespace).

oc create secret generic case-management-creds --from-literal=username=<username> --from-literal=password=<password>

Local Development

Execute the following steps to develop the functionality locally. It is recommended that development be done using a cluster with cluster-admin permissions.

In the operator's Deployment.yaml file, add a variable to the deployment's spec.template.spec.containers.env list called OPERATOR_IMAGE and set the value to your local copy of the image:

          env:
            - name: OPERATOR_IMAGE
              value: "registry.example/repo/image:latest"

Then run:

go mod download

Using the operator-sdk, run the operator locally:

oc apply -f deploy/crds/managed.openshift.io_mustgathers_crd.yaml
oc new-project must-gather-operator
export DEFAULT_MUST_GATHER_IMAGE='quay.io/openshift/origin-must-gather:latest'
export JOB_TEMPLATE_FILE_NAME=./build/templates/job.template.yaml
OPERATOR_NAME=must-gather-operator operator-sdk run --verbose --local --namespace ''

must-gather-operator's People

Contributors

2uasimojo, alexvulaj, anispate, bdematte, billmvt, bng0y, cblecker, clcollins, csheremeta, dependabot[bot], dustman9000, ehvs, garethahealy, jbpratt, jewzaam, jharrington22, jwai7, lnguyen1401, nautilux, npecka, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, raffaelespazzoli, ritmun, robotmaxtron, sabre1041, sam-nguyen7, yithian

must-gather-operator's Issues

Check that sftp.access.redhat.com is accessible before generating must-gather

As per the AWS firewall prerequisites, sftp.access.redhat.com is only recommended, not mandatory. [1]

On PrivateLink clusters this can mean the URL is not allowed through the firewall. Because of retries, several loops of must-gather capture, archive, and upload attempts are then wasted, each failing at the end of the loop in the upload container with:

ssh: connect to host sftp.access.redhat.com port 22: Connection timed out

Performing a check before the must-gather is even attempted would save a lot of wasted time in cases where the must-gather could never be uploaded because sftp.access.redhat.com is blocked.

[1] https://docs.openshift.com/rosa/rosa_planning/rosa-sts-aws-prereqs.html#osd-aws-privatelink-firewall-prerequisites_rosa-sts-aws-prereqs
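One possible shape for such a pre-flight check is sketched below. It is not part of the operator; the function name can_connect and the 5-second default timeout are assumptions, and it relies on bash's /dev/tcp redirection and the coreutils timeout command being available in the image.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of the operator.
# can_connect HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection to HOST:PORT opens within the timeout.
can_connect() {
  host="$1"
  port="$2"
  timeout_s="${3:-5}"
  # bash's /dev/tcp pseudo-device opens a TCP socket;
  # `timeout` (GNU coreutils) bounds how long we wait for it.
  timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Running can_connect sftp.access.redhat.com 22 before the gather step, and stopping early on failure, would avoid the repeated capture and archive loops described above. It only tests that the TCP port is reachable, not that the SFTP login would succeed.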

Must Gather Operator Failure on Large OpenShift Clusters with Files > 1GB

On large OpenShift clusters, the Must Gather operator fails when it tries to upload must gather files larger than 1GB to the Red Hat API. This is causing issues in workflows that rely on these large must gathers.

Steps to reproduce:

  1. Run the Must Gather operator on a large OpenShift cluster (large enough to produce a must gather file larger than 1GB)
  2. Observe the failure when the operator tries to upload this must gather file to the Red Hat API.

Expected Behavior:

The Must Gather operator should successfully upload the must gather file to the Red Hat API, regardless of the file size.

Actual Behavior:

When the must gather file is larger than 1GB, the Must Gather operator fails to upload the file to the Red Hat API. The following error message is displayed:

<error>
    <code>400</code>
    <detailMessage>The attachment you are uploading is too big, it must be less than 1024 MB in length</detailMessage>
    <message>Failed to upload attachment must-gather-20230704_142547Z.tar.gz, case number 03509325</message>
</error>
Error: Upload to Red Hat Customer Portal did not return expected status code. Expected: 201. Actual: 400

We also sometimes get the following error:

Error: Upload to Red Hat Customer Portal did not return expected status code. Expected: 201. Actual: 504
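A size check before the upload attempt would at least fail fast with a clear message. The sketch below is an illustration only: the helper name fits_upload_limit is hypothetical, the 1024 MB default comes from the error message above, and treating 1 MB as 1024 * 1024 bytes is an assumption about how the portal counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the operator.
# fits_upload_limit ARCHIVE [LIMIT_MB]
# Returns 0 if ARCHIVE is strictly smaller than the limit,
# 1 if it is too large, and 2 if the file does not exist.
fits_upload_limit() {
  archive="$1"
  limit_mb="${2:-1024}"   # default taken from the portal error message
  [ -f "$archive" ] || return 2
  # Arithmetic expansion strips any padding some wc implementations add.
  size_bytes=$(( $(wc -c < "$archive") ))
  limit_bytes=$((limit_mb * 1024 * 1024))
  [ "$size_bytes" -lt "$limit_bytes" ]
}
```

A check like this does not solve the underlying problem for large clusters, which would still need splitting or a different upload path, and it does not address the intermittent 504 responses.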

`operator-sdk run FATA[0000] unknown flag: --local` error

Symptoms

In the Local Development section, operator-sdk run --verbose --local is used, but I'm getting this error:

operator-sdk run FATA[0000] unknown flag: --local

It looks like --local has been removed from operator-sdk, and it is recommended to use make run instead:

"operator-sdk run local" not present in 1.1.0 #4158

Versions

operator-sdk version: "v1.21.0-11-g07b7a7fc", 
commit: "07b7a7fc99a86e6129cb55a8c1c5fed17d8605f8", 
kubernetes version: "v1.23", 
go version: "go1.17.5", 
GOOS: "linux", 
GOARCH: "amd64"

MGO is not able to run the gather_ppc collection in ROSA/OSD

When running the latest must-gather image, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a491bc654e4fb0b1df3ce9300f1aac9cc21148746abb5a79f2dc29aadda59d6a, the gather_ppc collection (node performance) fails due to the permissions of the service account used by the MGO.

INFO: Image with low level tools to use: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a491bc654e4fb0b1df3ce9300f1aac9cc21148746abb5a79f2dc29aadda59d6a
daemonset.apps/perf-node-gather-daemonset created
Waiting for performance profile collector pods to become ready: 1
Waiting for performance profile collector pods to become ready: 2
[...]
Waiting for performance profile collector pods to become ready: 158

It runs until it times out.

Events:
  Type     Reason        Age                From                  Message
  ----     ------        ----               ----                  -------
  Warning  FailedCreate  76s (x17 over 4m)  daemonset-controller  Error creating: pods "perf-node-gather-daemonset-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider "newrelic-scc": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].hostPID: Invalid value: true: Host PID is not allowed to be used, provider restricted: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, provider restricted: .containers[0].hostPID: Invalid value: true: Host PID is not allowed to be used, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "pcap-dedicated-admins": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "log-collector-scc": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "splunkforwarder": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount

Add a network policy

The OpenShift Compliance Operator has a check that all namespaces have a network policy. This operator should add an appropriate one.
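As a starting point for discussion, the sketch below writes a default-deny-ingress NetworkPolicy for the operator namespace. Everything specific here is an assumption rather than something taken from the project: the helper name write_network_policy, the policy name default-deny-ingress, and the idea that the operator needs no inbound traffic. If the operator exposes endpoints that must stay reachable (metrics, for example), an allow rule would be needed on top.

```shell
#!/usr/bin/env bash
# Hypothetical example; the policy name and deny-all-ingress choice are assumptions.
# write_network_policy OUT_FILE [NAMESPACE]
# Writes a NetworkPolicy manifest that selects every pod in the namespace
# and allows no ingress traffic.
write_network_policy() {
  out_file="$1"
  namespace="${2:-must-gather-operator}"
  cat > "$out_file" <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF
}
```

The generated file could then be reviewed and applied with oc apply -f, or the manifest could be added alongside the other files under deploy. Egress is left unrestricted on purpose, since the upload step needs outbound access.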

ACM must-gather image not working

After adding the ACM must-gather image, I can see that the image gets pulled, but when the must-gather completes only the OCP must-gather is created.
Image used for ACM:
registry.redhat.io/rhacm2/acm-must-gather-rhel8:v2.8.0

Deploying by following the wiki deploy section doesn't work

I tried to install by following the wiki deploy instructions and it didn't work.

git clone git@github.com:openshift/must-gather-operator.git; cd must-gather-operator
oc apply -f deploy/crds/managed.openshift.io_mustgathers_crd.yaml          <<< Bad file name
oc new-project must-gather-operator
oc -n must-gather-operator apply -f deploy          <<< Bad NS name.
