

Insights Operator

This cluster operator gathers anonymized system configuration and reports it to Red Hat Insights. It is a part of the standard OpenShift distribution. The data collected allows for debugging in the event of cluster failures or unanticipated errors.

Building

To build the operator, install Go 1.11 or above and run:

make build

To test the operator against a remote cluster, run:

bin/insights-operator start --config=config/local.yaml --kubeconfig=$KUBECONFIG

where $KUBECONFIG has sufficiently high permissions against the target cluster.

Testing

Unit tests can be run with the following command:

make test

It is also possible to specify CLI options for Go test. For example, if you need to disable test results caching, use the following command:

VERBOSE=-count=1 make test
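
Assuming VERBOSE is passed straight through to go test (as the example above suggests), multiple flags can be combined; for example, to disable caching and enable verbose output:

VERBOSE="-count=1 -v" make test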

Integration (e2e) tests are not part of this repository; you can find them here.

Documentation

The document docs/gathered-data contains the list of collected data and the API that is used to collect it. This documentation is generated by the command below, which collects the comment tags located above each Gather method.

To start generating the document run:

make docs

The configuration and functionality of the Insights Operator are described in more detail in the architecture document.

Getting metrics from Prometheus

Generate the certificate and key

A certificate and key are required to access Prometheus metrics (otherwise a Forbidden error is returned). It is possible to generate these two files from a Kubernetes config file: the certificate is stored in users/admin/client-certificate-data and the key in users/admin/client-key-data. Please note that these values are Base64-encoded, so they need to be decoded, for example with base64 -d.
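
For illustration, a minimal shell sketch (assuming kubeconfig.yaml contains a single user entry with inline credentials; the tool described below automates this):

# extract and decode the client certificate and key (fields per the text above)
grep 'client-certificate-data' kubeconfig.yaml | awk '{print $2}' | base64 -d > k8s.crt
grep 'client-key-data' kubeconfig.yaml | awk '{print $2}' | base64 -d > k8s.key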

There's a tool named gen_cert_file.py that can be used to generate both files automatically. It is stored in the tools subdirectory.

gen_cert_file.py kubeconfig.yaml

Prometheus metrics provided by Insights Operator

It is possible to read the Prometheus metrics provided by the Insights Operator. An example of the metrics it exposes can be found in metrics.txt, and a list of possible metrics is available in the architecture document.

Depending on how or where the IO is running, there are different ways to retrieve the metrics. Here are some options so you can find the one that fits your setup:

Running IO locally

If the IO runs locally, the following command might be used:

curl --cert k8s.crt --key k8s.key -k https://localhost:8443/metrics

Running IO on K8s

Get the token

oc whoami -t

Read metrics from Pod

oc exec \
    -it deployment/insights-operator \
    -n openshift-insights -- \
    curl -k -H "Authorization: Bearer YOUR-TOKEN-HERE" 'https://localhost:8443/metrics'
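
As a convenience variant (not from the original steps), the token lookup can be inlined with command substitution, which the local shell expands before oc exec runs:

oc exec \
    -it deployment/insights-operator \
    -n openshift-insights -- \
    curl -k -H "Authorization: Bearer $(oc whoami -t)" 'https://localhost:8443/metrics'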

Getting the data directly from Prometheus

sudo kubefwd svc -n openshift-monitoring -d openshift-monitoring.svc -l prometheus=k8s
curl --cert k8s.crt --key k8s.key  -k 'https://prometheus-k8s.openshift-monitoring.svc:9091/metrics'

Debugging Prometheus metrics without valid CA

  1. Forward the service:

sudo kubefwd svc -n openshift-monitoring -d openshift-monitoring.svc -l prometheus=k8s

  2. Set the INSECURE_PROMETHEUS_TOKEN environment variable:

export INSECURE_PROMETHEUS_TOKEN=$(oc sa get-token prometheus-k8s -n openshift-monitoring)

  3. Run the operator (see the command below).
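
With the service forwarded and the token exported, start the operator locally using the same command as in the Building section:

bin/insights-operator start --config=config/local.yaml --kubeconfig=$KUBECONFIG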

Debugging

Using the profiler

Starting IO with the profiler

IO starts a profiler when given the correct environment. Set the OPENSHIFT_PROFILE environment variable to "web":

export OPENSHIFT_PROFILE=web

Collect profiling data

After IO starts, the profiler can be accessed at http://localhost:6060; you can use the pprof tool to connect to it.

Some profiling examples:

# CPU profiling for 30 seconds
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
# heap profiling
go tool pprof http://localhost:6060/debug/pprof/heap

These commands will create a compressed file that can be visualized using a variety of tools; one of them is the pprof tool.

Analyzing profiling data

To start a web UI at localhost:8080 for visualizing/analyzing the profiling data:

go tool pprof -http=:8080 /path/to/profiling.out

For more information, check this link.

Changelog

You can find the project changelog in CHANGELOG.md.

Updating the changelog

At ./cmd/changelog/main.go there is a script that can update the changelog for you.

It uses both the local Git repository and GitHub's API to update the file, so:

  • To get info from GitHub, you will need to set the GITHUB_TOKEN environment variable to a GitHub access token.
  • Make sure that you have a local, up-to-date copy of each release-branch that might be in the changelog.

It can be used in two ways:

  1. With no command-line arguments, the script updates the current CHANGELOG.md with the latest changes according to the local Git state.

🚨 IMPORTANT: This only works with changelogs created by this script.

go run cmd/changelog/main.go

  2. With two command-line arguments, the AFTER and UNTIL dates, the script generates a new CHANGELOG.md within the provided time frame.

go run cmd/changelog/main.go 2021-01-10 2021-01-20

If the changelog is not generated properly, try switching to the release branch you want to generate it for.

Reported data

  • ClusterVersion
  • ClusterOperator objects
  • All non-secret global config (hostnames and URLs anonymized)

The list of all collected data, with descriptions, locations in the produced archive, links to the API, and some examples, is in docs/gathered-data.md.

The resulting data is packed in a .tar.gz archive with the folder structure indicated in the document. An example of such an archive is in docs/insights-archive-sample.
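
To inspect the folder structure of an archive without extracting it (the archive name here is a placeholder):

tar -tzf insights-archive.tar.gz | head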

For more information on which data points are anonymized by the Insights Operator, please see this list.

Insights Operator Archive

Sample IO archive

There is a sample IO archive maintained in this repo to use as a quick reference (it can be found at docs/insights-archive-sample).

To keep it up to date, it has to be updated manually whenever a new data enhancement is developed.

Make sure the .json files in the sample archive are in a human-readable format. This makes it easier to review a data enhancement PR, and rule developers can easily check what data it collects.

Generating a sample archive

Run the insights-operator on a test cluster (from cluster-bot, Quicklab, etc.).
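
For a local run against such a cluster, the command from the Building section applies:

bin/insights-operator start --config=config/local.yaml --kubeconfig=$KUBECONFIG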

Formatting archive json files

This formats the .json files in the folder containing the extracted archive:

find . -type f -name '*.json' | while read -r line; do jq . "$line" > "$line.tmp" && mv "$line.tmp" "$line"; done

Obfuscating an archive

You can obfuscate an archive by running the following command:

go run ./cmd/obfuscate-archive/main.go YOUR_ARCHIVE.tar.gz

where YOUR_ARCHIVE.tar.gz is the path to the archive. The obfuscated version will be created in the same directory and named YOUR_ARCHIVE-obfuscated.tar.gz.

Updating the sample archive

The docs/insights-archive-sample/ directory contains an example of an Insights Operator archive, extracted and with pretty-formatted JSON files. In case of any changes that affect multiple files in the archive, it is a good idea to regenerate the sample archive to make sure it remains up-to-date.

There are two ways of updating the sample archive directory automatically. Both of them require running the Insights Operator, letting it generate an archive and extracting the archive into an otherwise empty directory.
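
For example, to extract an archive into an otherwise empty directory (the paths and archive name are placeholders):

# extract into an empty directory for the update script to consume
mkdir -p /tmp/io-archive
tar -xzf insights-archive.tar.gz -C /tmp/io-archive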

The script will automatically replace existing files in the sample archive with their respective counterparts from the supplied extracted IO archive. In case of files with (partially) randomized names, such as pods or nodes, the entire directory is deleted and replaced with a matching directory from the new archive if possible. Changes made by the script can be checked and reverted using Git. The updated JSON files will be automatically pretty-formatted using jq, which is the only dependency required for running the script.

All existing files in the sample archive can be updated using the following command:

./scripts/update_sample_archive.sh <Path of directory with the NEW extracted IO archive>

If you only want to update files containing a certain string pattern, you can supply a regular expression as a second optional argument. For example, the following command was used to replace JSON files containing the managedFields field when it was removed from the IO archive to save space:

./scripts/update_sample_archive.sh <Path of directory with the NEW extracted IO archive> '"managedFields":'

The path of the sample archive directory is constant relative to the script's location, so it does not have to be specified explicitly.

Conditional Gathering

See docs/conditional-gatherer/README.md

Contributing

See CONTRIBUTING for workflow & convention details.

See STYLEGUIDE for file format and coding style guide.

Support

The Insights Operator is part of the Red Hat OpenShift Container Platform. For product-related issues, please file a ticket in Red Hat JIRA for the "Insights Operator" component.

License

This project is licensed under the Apache License 2.0. For more information, check the LICENSE file.


insights-operator's Issues

RFE: case attachment helper

Consider adding a feature by which a cluster administrator can request that the status of the cluster be attached to a support case. The request could be triggered by a CR which, among other information, contains the case ID. The operator would then collect the must-gather information, potentially an SOS report on a subset of nodes, etc. Once the collection phase is done, the operator would upload everything to the case as an attachment using a standard format (so tools can potentially be created on the support side to analyse the payload).

Pull Insights report retry mechanism is broken

When Insights Operator uploads a new archive, it waits for a configured amount of time and then starts to retrieve the report generated by the Insights Results Smart Proxy service.

If something goes wrong, there should be a retry mechanism that allows IO to try again after some time, but it doesn't work properly with some errors.

For example, if the Smart Proxy service returns a 404 (which is very usual after the first archive upload), the retry mechanism doesn't try to get it again after some time. (The error is very likely to be in this code block.)

This problem is impacting users, as they won't be able to see Insights results for their clusters in the OCP WebConsole until the second archive is uploaded.

RFE: Customized MachineConfigs Insights

In preparation for the Ignition Spec V2 to V3 migration, we'd like to find out how many of our users have customized their MachineConfigs on their clusters.

The initial idea was to do something like this to find out whether or not a cluster has customized MCs (note the factor of 2 to account for both the worker and master MachineConfigPools and their respective rendered MCs):

if len(rendered_MCs) - 2 * updates_count > 2:
  customized = true
else:
  customized = false
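
For reference, the rendered-MC count in that formula could be obtained with something like the following sketch (rendered MachineConfigs are prefixed with rendered-):

# count rendered MachineConfigs across all pools
oc get machineconfigs --no-headers | grep -c '^rendered-'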

WDYT about this approach?

/cc @runcom @darkmuggle @ajeddeloh @smarterclayton

Support Operator Degraded

Version: 4.4.0-0.okd-2020-02-04-174205

Using the full pull secret, including the secret for cloud.redhat.com.

The support operator becomes degraded as it attempts to post to cloud.redhat.com but receives the ingress certificate instead:

Unable to report: Post https://cloud.redhat.com/api/ingress/v1/upload: x509: certificate is valid for *.apps.okd.example.net, not cloud.redhat.com
