
runwhen-contrib / runwhen-local


RunWhen Local provides a tailored troubleshooting cheat sheet for Kubernetes environments

Home Page: https://docs.runwhen.com/public/v/runwhen-local/

License: Apache License 2.0

Dockerfile 0.94% CSS 2.99% JavaScript 1.70% HTML 3.10% Jinja 5.92% Python 82.37% Shell 2.99%
cli discovery kubernetes troubleshooting

runwhen-local's People

Contributors

dependabot[bot], stewartshea, vaterlaus


runwhen-local's Issues

[runwhen-local-feedback] OpenShift error: sqlite3.OperationalError: unable to open database file The above exception was the direct cause of the following exception:

Observation
Attempting to deploy this directly into OpenShift causes the following error:

Starting up neo4j
Waiting a bit before starting workspace builder REST server
WARNING -  Config value 'build': Unrecognised configuration name: build
WARNING -  Config value 'dev_addr': The use of the IP address '0.0.0.0' suggests a production environment or the use of a proxy to connect to the MkDocs server. However, the MkDocs' server is intended for local development purposes only. Please use a third party production-ready server instead.
INFO    -  Building documentation...
INFO    -  Cleaning site directory
INFO    -  Documentation built in 0.46 seconds
INFO    -  [18:02:55] Watching paths for changes: 'cheat-sheet-docs/docs', 'cheat-sheet-docs/mkdocs.yml'
INFO    -  [18:02:55] Serving on http://0.0.0.0:8081/
Changed password for user 'neo4j'. IMPORTANT: this change will only take effect if performed before the database is started for the first time.
2023-08-17 18:02:59.849+0000 INFO  Starting...
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 180, in get_new_connection
    conn = Database.connect(**conn_params)
sqlite3.OperationalError: unable to open database file

Possible Suggestions
The UID assigned by OpenShift is likely the root cause here. We need to modify the entrypoint to support the random UIDs that OpenShift assigns, adding them to the runwhen group so the process has access to the correct files. This will require testing in an OCP cluster.
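As a starting point for debugging, a preflight check along these lines could run in the entrypoint before Django starts, turning the opaque sqlite3 error into an explicit permissions message. This is a sketch only; the database path and the idea of a preflight step are assumptions, not part of the current entrypoint.

```python
import os


def preflight_sqlite(db_path):
    # sqlite3 needs write access to both the database file and its parent
    # directory (for journal files); an arbitrary OpenShift-assigned UID often
    # has neither when the image was built for a fixed user.
    problems = []
    db_dir = os.path.dirname(os.path.abspath(db_path))
    if not os.access(db_dir, os.W_OK | os.X_OK):
        problems.append("uid %d cannot write to directory %s" % (os.getuid(), db_dir))
    if os.path.exists(db_path) and not os.access(db_path, os.W_OK):
        problems.append("uid %d cannot write to file %s" % (os.getuid(), db_path))
    return problems
```

Printing the returned problems and exiting early would make the OpenShift failure mode self-explanatory.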

[runwhen-local-feedback] Need additional error handling

Observation
Additional error handling would be appreciated when executing the run.sh process. Given the short-lived nature of kubeconfig tokens, it's common for the token to have expired, but the current error doesn't indicate this. If we inspect the actual response seen by the Django server, we can see that the call is unauthorized:

({\'Audit-Id\': \'5a992f37-0fbf-4d48-bfb0-224c68c2b2a8\', \'Cache-Control\': \'no-cache, private\', \'Content-Type\': \'application/json\', \'Date\': \'Fri, 30 Jun 2023 13:56:44 GMT\', \'Content-Length\': \'129\'})\nHTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}\n\n'}

Possible Suggestions
Build in additional error handling that indicates:

  • kubeconfig auth issues
  • kubeconfig/workspaceinfo/output directory permission issues
  • any other django response error that should be passed through to the caller
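A minimal sketch of the kind of error translation the bullets above describe. The function name and message wording are illustrative assumptions, not existing code; the 401 body shown earlier in this issue is the motivating case.

```python
import json


def explain_api_error(status_code, body):
    # Translate raw Kubernetes API failures into actionable messages instead
    # of surfacing a bare traceback to the run.sh caller.
    if status_code == 401:
        return ("Kubernetes API returned 401 Unauthorized: the kubeconfig token "
                "has likely expired. Regenerate the kubeconfig and re-run.")
    if status_code == 403:
        return ("Kubernetes API returned 403 Forbidden: check RBAC permissions "
                "for the identity in the kubeconfig.")
    try:
        # Other Kubernetes errors carry a JSON Status object with a "message".
        message = json.loads(body).get("message", body)
    except (TypeError, ValueError):
        message = body
    return "Unexpected API error %s: %s" % (status_code, message)
```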

Any other details or context

Removal of Neo4j

The container image currently uses Neo4j as a state management tool. While this might make sense for longer-term plans that include widening the indexer capability, it feels like overkill for the purpose of indexing Kubernetes resources. Operationally, it adds quite a bit of extra size to the container image (the base neo4j image is about 500 MB) along with all of the packages and dependencies it brings to manage. It doesn't appear to provide value commensurate with its additional overhead, so we will remove it for now and reconsider at a later date when the design work to re-introduce indexers for GCP/AWS/Azure comes up.

Update: below is the (probably mostly redundant) description from a duplicate ticket that I closed.

There are several neo4j-related issues that cause friction/bugs with the deployment of the workspace builder and runwhen local. At least for now, the use of neo4j doesn't really buy us anything since we completely reset the model state before every run of the workspace builder to maintain data isolation of the different runs. So for now it would simplify things to just store the model information in memory. It would presumably eliminate the networking-related issues some people see when deploying runwhen local and it would cut down the size of the docker image by a lot.

Eventually, if/when we have something like a product/resource graph feature, it will be desirable to maintain the model state in a database, so ideally I'll try to make this change by abstracting out the model storage layer so that it can be switched between storing in memory vs. in neo4j (or possibly some other database).
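The abstraction described above could look something like the following sketch. The interface and class names are hypothetical; the point is that an in-memory dict satisfies the same contract a neo4j-backed store would, since state is reset before every run anyway.

```python
from abc import ABC, abstractmethod


class ModelStore(ABC):
    # Minimal storage interface so the backend can be swapped between an
    # in-memory dict and a database such as neo4j later on.
    @abstractmethod
    def put(self, kind, name, obj): ...

    @abstractmethod
    def get(self, kind, name): ...

    @abstractmethod
    def reset(self): ...


class InMemoryModelStore(ModelStore):
    def __init__(self):
        self._data = {}

    def put(self, kind, name, obj):
        self._data[(kind, name)] = obj

    def get(self, kind, name):
        return self._data.get((kind, name))

    def reset(self):
        # The workspace builder resets model state before every run anyway,
        # which is why an in-memory store loses nothing.
        self._data.clear()
```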

[runwhen-local-feedback] BUG - KeyError: ''

Observation
The following error has been showing up on the latest container image:

  File "/workspace-builder/pgrun.py", line 333, in <module>
    main()
  File "/workspace-builder/pgrun.py", line 312, in main
    cmdassist.command_assist(output_path)
  File "/workspace-builder/cmdassist.py", line 342, in command_assist
    parsed_robot = parse_robot_file(robot_file)
  File "/workspace-builder/cmdassist.py", line 29, in parse_robot_file
    suite = TestSuite.from_file_system(fpath)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/robot/running/model.py", line 445, in from_file_system
    return TestSuiteBuilder(**config).build(*paths)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/robot/running/builder/builders.py", line 156, in build
    suite = SuiteStructureParser(self._get_parsers(paths), self.defaults,
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/robot/running/builder/builders.py", line 217, in parse
    structure.visit(self)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/robot/parsing/suitestructure.py", line 65, in visit
    visitor.visit_file(self)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/robot/running/builder/builders.py", line 224, in visit_file
    suite = self._build_suite_file(structure)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/robot/running/builder/builders.py", line 248, in _build_suite_file
    parser = self.parsers[structure.extension]
KeyError: ''

Possible Suggestions
The issue exists in the processing of the robot file to render the markdown content.
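One defensive option, sketched below: filter candidate paths before handing them to Robot Framework, since `TestSuiteBuilder` keys its parsers by file extension and an extensionless path reaches `self.parsers[structure.extension]` as `''`. The helper name is hypothetical.

```python
from pathlib import Path


def parseable_robot_files(paths):
    # A path with no extension (or an unexpected one) produces KeyError: ''
    # inside the suite builder; filter up front so only .robot files are parsed.
    return [p for p in paths if Path(p).suffix.lower() == ".robot"]
```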

Any other details or context
I can reproduce locally and in our always-online demo version, but can't reproduce in dev (gitpod).

[runwhen-local-feedback] Kubeconfig setup is frustrating

Observation
The initial setup of creating / managing a kubeconfig that is used to scan the environment can cause friction depending on the environment.

Possible Suggestions
Identify possible ways to leverage the existing kubeconfig without having to do any manipulation. We realize this might not be possible, as each kubeconfig context might have different dependencies (e.g. GKE doesn't store a token in the kubeconfig but uses a binary to fetch token details; OpenShift has a different approach; EKS and AKS likely do as well).

Part of this approach will be to separate out some of the kubeconfig documentation for different use cases; the other part is to see how we can better share these resources with the host system to reduce the steps (if at all possible).

Any other details or context

[runwhen-local-feedback] simplify contribution feedback

Observation
In order to allow users to share their best troubleshooting commands with others in the community, we need to simplify the contribution process so that users can easily:

  • fire and forget (e.g. ship us a command, describe its purpose, and let RunWhen do the rest)
  • request help from the RunWhen team with troubleshooting commands

Possible Suggestions
I'm currently proposing a GitHub Issues or Discussions based approach, which should help constrain the conversation to this specific repo and allows teams to easily collaborate with nothing more than a GitHub account.

Any other details or context

Listing PVCs in terminating State

Hi Team,

Environment is Azure AKS 1.25.6

I've deployed runwhen-local and am able to access the dashboard.

I just deleted a PVC to validate the command/info available in the runwhen-local dashboard. The expected result is that it should list the PVC while it is in the Terminating state, but it didn't.

[runwhen-local-feedback] Enhancement suggestion for feedback/upvotes/etc

Observation
It would be nice to have a user experience that can indicate which commands are more helpful than others. While users could bookmark the most helpful commands (via standard browser bookmarks), as the environment changes, so too will the bookmark URLs.

Possible Suggestions
Some sort of user experience improvement could address this; specific ideas are still being collected.

[runwhen-local-feedback] sed / macOS might not operate the same way

Observation

Either people have GNU sed installed and can call gsed, or we could add sed detection into the shell script. The current script likely won't work as desired with BSD-based sed implementations.

# Set the sed binary
if [[ $OSTYPE == "darwin"* ]]; then
  export SED=gsed
else
  export SED=sed
fi

Possible Suggestions
Test and update the script for macOS, checking against the most common installation method (Homebrew), or determine whether a compatible sed is already in the base OS.

[runwhen-local-feedback] support OIDC kubelogin in gen_rw_kubeconfig.sh

Observation
For users of kubelogin, we need to enhance gen_rw_kubeconfig.sh to support token generation as we do for GKE and OKE.

Any other details or context

Example exec stanza:

users:
  - name: someuser
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        args:
          - oidc-login
          - get-token
          - --oidc-issuer-url=https://issuer.region.amazonaws.com/xxxxxxx
          - --oidc-client-id=xxxxxxx
        command: kubectl
        env: null
        interactiveMode: IfAvailable
        provideClusterInfo: false

[runwhen-local-feedback] Last Scan Timestamp is incorrect.

Observation
I suspect this isn't the case when the container is run in an ad-hoc manner, but in our persistent Sandbox/demo environment (https://runwhen-local.sandbox.runwhen.com/), the Last Scan timestamp isn't getting updated with each scan (which executes every 30 minutes, I think).

Need to check the timestamp being set in the mkdocs.yaml file - I suspect it's not getting re-read and might need to be stored in another location or written directly to the partials override.
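One shape the "store it in another location" idea could take, sketched below. The file path and helper name are hypothetical; the point is that a standalone file read at render time sidesteps whatever is cached from mkdocs.yml at server startup.

```python
import datetime
import pathlib


def write_last_scan_timestamp(path):
    # Write the scan time to its own small file (hypothetical location) that a
    # theme partial can read at render time, avoiding the value cached from
    # mkdocs.yml when the server started.
    ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    pathlib.Path(path).write_text(ts + "\n")
    return ts
```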

[runwhen-local-feedback] Error 500 from Workspace Builder service for command "run": 'str' object does not support item assignment

Hello RunWhen,
I was trying to test runwhen-local on my machine by referencing https://docs.runwhen.com/public/runwhen-local/getting-started/running-locally. When I execute this command:
shared $ docker exec -w /workspace-builder -- RunWhenLocal ./run.sh
I encounter this error:
shared $ docker exec -w /workspace-builder -- RunWhenLocal ./run.sh
Error 500 from Workspace Builder service for command "run": 'str' object does not support item assignment

Need help with this issue.
Thanks.

[runwhen-local-feedback] Helpful Links - tuning for kubernetes variants like ocp

Observation
The helpful links content generated beside the troubleshooting code currently doesn't take into account a variant like OpenShift. At this point it will always generate Kubernetes documentation, when it might be more appropriate to generate OpenShift documentation.

Possible Suggestions
I'm not sure how feasible or even useful this is since a) the Kubernetes docs are still helpful, and b) without knowing upfront which platform we are querying, we might need to run the generation task for each platform and then have the runwhen-local container image choose the right link reference based on input from the workspaceInfo.yaml file.

Any other details or context
I'm more curious at this point whether this is a feature that anyone would want or use. Please 👍 this issue if you would like or would use it.

[enhancement] Support for rendering scripts

Observation
Given the feature here (runwhen-contrib/rw-cli-codecollection#193) to support running scripts, complex one-liners become easier to maintain and manage in multi-line script format. It would be nice to have these accessible to RunWhen Local in the cheat sheet.

Possible Suggestions
Expand the rendering to support RW.CLI.Run Bash File keywords.

Any other details or context
For those specific commands, we will likely want to drop the LLM-generated "multi-line expansion with explanations" metadata - but this would require additional documentation in the script itself, unless we still use the LLM to dress it up.

Support for using branches of code collections other than the main branch

The support for specifying this in the user configuration of the code collections is already implemented. This handles extracting the gen rules and the templates from the specified branch of the code collection repo. What's missing, though, is including the branch/tag information in the generated SLX content, so that the platform uses the correct branch when executing the SLXs.

The idea is to have the workspace builder configure template variables for the code collection repo URL and the configured branch/ref/tag, which could then be accessed in the templates file to generate the code collection/bundle info in the spec part of the generated SLX contents.
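The template-variable idea could be sketched roughly as below. The variable names and spec field names (`code_collection_url`, `repoUrl`, `ref`) are assumptions for illustration; the real SLX spec layout may differ.

```python
from string import Template

# Hypothetical template and field names; the real SLX spec layout may differ.
SLX_SPEC = Template(
    "spec:\n"
    "  codeBundle:\n"
    "    repoUrl: $code_collection_url\n"
    "    ref: $code_collection_ref\n"
)


def render_slx_spec(repo_url, ref="main"):
    # Fall back to "main" when no branch/tag was configured for the collection.
    return SLX_SPEC.substitute(
        code_collection_url=repo_url,
        code_collection_ref=ref,
    )
```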

Move workspace builder generation rule definitions to the associated code bundles

This is currently tracked internally https://github.com/runwhen/platform-core/issues/1014 and duplicated here for public visibility.

Currently the generation rules (and associated templates) for the workspace builder are centralized in the workspace builder directory. This does not scale well and more immediately means that creating/editing the gen rules requires a new build of the runwhen local container image, which slows development turnaround time.

The plan is to move the gen rules into the code bundle directory associated with the gen rule. The directory structure will probably be that there's a "workspace-builder" at the top level of the code bundle and inside that directory there are "generation-rules" and "templates" subdirectories.

The workspace builder is configured with a set of code collections that it scans to collect all of the gen rules information across all of the code bundles in the code collections. Individual runs of the workspace builder can be configured (probably in the workspaceInfo.yaml file) to include additional code collections or suppress one or more of the default collections (or perhaps suppress individual code bundles in the included code collections).

The workspace builder tree will still contain the standard "common-labels.yaml" and "common-annotations.yaml" files that were added recently. The templating code in the workspace builder will include in the file search path both the directory for the templates that were downloaded from the code bundle plus the directory that contains the built-in standard templates.
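The two-directory search path described above amounts to something like this sketch (function name hypothetical): try the templates downloaded from the code bundle first, then fall back to the built-in standard templates.

```python
from pathlib import Path


def resolve_template(name, search_dirs):
    # Look in the templates downloaded from the code bundle first, then fall
    # back to the built-in standard templates shipped with the workspace builder.
    for base in search_dirs:
        candidate = Path(base) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError("template not found on search path: %s" % name)
```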

Note: this ticket is for the code changes in the core workspace builder to support this feature. There's a separate ticket (https://github.com/runwhen/platform-core/issues/1017) for the task of actually reorganizing all of the existing gen rules to take advantage of this feature.

[runwhen-local-feedback] Documentation enhancements -> integration with GitBook. Add docs for custom codecollection repositories.

Observation
Currently there are some docs in the open source repo related to RunWhen, and a few in the GitBook instance. These need to be harmonized, with additional RunWhen Local customizations documented (there's lots of capability here, it's just not documented).

Possible Suggestions

  • Migrate the md docs from GitBook to the Open Source repo.
  • Open up a new GitBook space pointing to the open source docs, resulting in a single location for RunWhen Local docs that can easily accept contributions.

Any other details or context

[runwhen-local-feedback] Enhancement - add progress indicator

Observation
It would be useful to have visual indication that a discovery is in progress, especially when run directly in Kubernetes environments.

Possible Suggestions
Not quite sure yet; we could add something to the discovery process that customizes the mkdocs homepage.

Any other details or context

[runwhen-local-feedback] Install instructions for running in a cluster rather than a local desktop

Observation
Some users might find it useful to host a central version of this cheatsheet. It does require that every user leverages the same kubeconfig context name (as the commands tend to always include the specific context), but it could be useful to share how to run this directly in the cluster.

Possible Suggestions
Document the manifests used to run this in a vanilla K8s cluster, as is done with the live demo / sandbox version here: https://runwhen-local.sandbox.runwhen.com/

Any other details or context

[runwhen-local-feedback] Additional learning resources

Observation
It would be interesting to expand the new Learn More section with documentation links that could pertain to the command - enabling users to explore more details about the specific commands as they are using them in their environments.

Possible Suggestions
Let's see what we can generate with LLM generated documentation links.

Any other details or context

[runwhen-local-feedback] Feature Request - Multi-line educational content with comments.

Observation
One-liners work great for copy-paste, and "what does it do" provides a decent description of functionality, but for learning it would be nice to see the more complex commands in multi-line format with comments.

Possible Suggestions
We can likely use OpenAI to generate this content (rather than maintaining it in a separate human-maintained file) as a starting point and validate the efficacy. I suspect (based on experience) that you couldn't copy/paste most multi-line converted content, as OpenAI isn't perfect at translating between single-line and multi-line formats, especially where multi-shell compatibility is concerned, but we can include a disclaimer that it is for educational purposes only. Likely this would be in a collapsible code block below the copy/paste sections.

Any other details or context

[new-command-request] Ingress Health Check - Deployment ContainerPort/SVC port matching Ingress Backend port

What do you need the command to do?

Hi Stewart,

Please find the attached PDF (RunWhen Local - Ingress Health Check - Commands wanted.pdf) with the details. Basically, it's related to an ingress health check. The commands we have in RunWhen Local are just amazing. However, I've come across a scenario and thought to share it with the community.

What should the output look like?

No response

Any other helpful context?

No response

Contact

Yes, please

[runwhen-local-feedback] Learn More formatting has an edge case

Observation
With one of the latest commands, the "learn more" is supposed to present a multi-line format of the command with comments in a code block, but it's not presented correctly.


Possible Suggestions
Investigate either why the command expansion is stored as it is in the meta.yaml file, or determine whether the template that renders the page is picking up odd characters that need to be escaped. The best thing to do in this case is to view the markdown output of the content and debug from there.

Any other details or context

[runwhen-local-feedback] Docs: Running the container in docker for macOS requires additional network parameters

Observation
There have been a couple of occurrences where running the runwhen-local container in Docker for Mac has required additional network parameters.

Possible Suggestions
Add documentation for different docker for mac related network issues.

Any other details or context
One user reported running into a DNS issue when trying to connect to an OpenShift CodeReady Containers instance and an OKD cluster running in Proxmox from their M2 MacBook Pro. They had to add --add-host to the docker command to specify the cluster target:

docker run --name RunWhenLocal -p 8081:8081 --add-host=api.crc.testing:host-gateway -v ~/runwhen-local/shared:/shared -d ghcr.io/runwhen-contrib/runwhen-local:latest && sleep 20

Another macOS user produced an error in which the container was not able to connect to its own (localhost) neo4j database without exposing the port.

[runwhen-local-feedback] Health Check for Namespaces in K8s or how to delete namespaces which are in terminating state

Observation

Namespace Health Check in dashboard- A namespace can be in one of two phases: Active or Terminating

Possible Suggestions

I know it's a very basic health check, but usually if a namespace is in the Terminating state, either it should not be displayed in the runwhen dashboard, or we need to provide a command to check namespace health / how to delete a namespace that is stuck in the Terminating state.

Any other details or context

Currently in my Azure AKS v1.25.6 cluster, I've deleted a few namespaces and they are in a terminating state. But in the runwhen portal/dashboard I can still see the data/details specific to those namespaces.


[runwhen-local-feedback] Installation via helm

Observation
There have been a couple of questions as to whether RunWhen Local can be installed in-cluster via Helm.

Possible Suggestions
While we do provide a pretty simple all-in-one deployment yaml example (https://github.com/runwhen-contrib/runwhen-local/blob/main/deploy/k8s/all-in-one.yaml), helm might enable us to provide additional customization at the values.yaml layer. One simple customization might be to support in-cluster authentication instead of needing to supply a kubeconfig.
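The in-cluster authentication idea could hinge on a check like the one below. The function name and the "mode" strings are illustrative assumptions; the service account token path is the standard Kubernetes mount location.

```python
import os

# Standard mount location for a pod's service account token.
SA_TOKEN = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def auth_mode():
    # When deployed in-cluster (e.g. via Helm), the pod's mounted service
    # account token can be used directly, so no kubeconfig needs to be
    # generated or mounted into the container.
    return "in-cluster" if os.path.exists(SA_TOKEN) else "kubeconfig"
```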

Any other details or context
Right now this issue is open to support the discussion or idea.

Please provide a 👍 if you're interested in this.

[runwhen-local-feedback] Scan error handling when branch ref or meta file doesn't exist

Observation
We can run into some scenarios where the referenced troubleshooting code might not exist, or it exists and has no meta.yaml associated with it.

Request failed with status code: 404
Traceback (most recent call last):
  File "/workspace-builder/run.py", line 441, in <module>
    main()
  File "/workspace-builder/run.py", line 353, in main
    cheatsheet.cheat_sheet(output_path)
  File "/workspace-builder/cheatsheet.py", line 693, in cheat_sheet
    interesting_commands = search_keywords(parsed_robot, parsed_runbook_config, search_list, meta)
  File "/workspace-builder/cheatsheet.py", line 213, in search_keywords
    for cmd_meta in meta['commands']:
TypeError: 'NoneType' object is not subscriptable

Possible Suggestions
We need to add in some checking and default content here to keep building the documentation even when the file is not accessible.
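A minimal sketch of that guard (helper name hypothetical): when the meta.yaml fetch 404s, `meta` ends up as None, and returning an empty command list lets the build continue with default content instead of raising "'NoneType' object is not subscriptable".

```python
def iter_command_meta(meta):
    # meta is None when the meta.yaml fetch returned a 404 or the branch ref
    # doesn't exist; an empty list keeps the cheat sheet build going for the
    # remaining code bundles.
    if not isinstance(meta, dict):
        return []
    return meta.get("commands") or []
```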

Any other details or context

User configuration of which code collections are scanned for gen rules

Most of the code is already implemented in the REST service backend, but run.py needs to be updated to pull the configuration out of workspace info and convert to a request param.

A bonus feature would be to support user customization of which code bundles within a code collection are enabled, which sounds like it could be useful for Justin with his testing (and probably in general too).
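The run.py change described above could be sketched as below. The "codeCollections" key names and entry shape are assumptions about workspaceInfo.yaml, not the settled format.

```python
def code_collection_params(workspace_info):
    # Hypothetical workspaceInfo.yaml shape: a "codeCollections" list of
    # {"repoURL": ..., "branch": ...} entries, converted into the request
    # parameter the REST service backend already understands.
    entries = workspace_info.get("codeCollections") or []
    return {
        "codeCollections": [
            {"repoURL": e["repoURL"], "branch": e.get("branch", "main")}
            for e in entries
        ]
    }
```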

[runwhen-local-feedback] index page update with summarization

Observation
The index page is very static and generic, which might cause confusion about which "instance" of runwhen-local is running.

Possible Suggestions
It would be helpful to provide a summary on the index page of when it was generated, a high-level count of what it found, which clusters were scanned, and who ran the scan.

Any other details or context

[runwhen-local-feedback] macOS permission issues & podman

Observation
Some scenarios in testing have shown nested directory permission issues on macOS. It's unclear if this is at the OS permission level or related to the podman configuration.

Possible Suggestions
Reproduce issue, resolve specific configuration requirements for Podman on macOS and update documentation.

Any other details or context
Podman was initialized in this way:

podman machine init -v /tmp:/tmp          # -v ${HOME}:${HOME} gave a lot of permission trouble
podman machine start

Lots of issues continued to occur writing to the initialized path, but it's not clear if this was mounted as /tmp or ${HOME}/runwhen-local.

[runwhen-local-feedback] adding Oracle Cloud OKE authentication

Observation
I would like to try runwhen-local on my OKE Kubernetes clusters on Oracle Cloud. Unfortunately the kubeconfig converter ./gen_rw_kubeconfig.sh doesn't handle Oracle authentication. This is what my config file looks like:

apiVersion: v1
kind: ""
clusters:
- name: cluster-c6oz4zfbmka
  cluster:
    server: ****
    certificate-authority-data: ***
users:
- name: user-c6oz4zfbmka
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: oci
      args:
      - ce
      - cluster
      - generate-token
      - --cluster-id
      - ocid1.cluster.oc1.***
      - --region
      - ***
      - --profile
      - ***
      env: []
contexts:
- name: context-c6oz4zfbmka
  context:
    cluster: cluster-c6oz4zfbmka
    user: user-c6oz4zfbmka
current-context: context-c6oz4zfbmka

Possible Suggestions
It looks like GKE has similar token-based authentication, and you already seem to handle this in the current gen_rw_kubeconfig.sh script:

cat <<E0F >gen_rw_kubeconfig.sh

#!/bin/bash

# Set the working directory
workdir="$workdir"

# Check if the workdir/shared folder exists
if [ -d "$workdir/shared" ]; then
  echo "Directory '$workdir/shared' exists."
else
  echo "Error: Directory '$workdir/shared' does not exist."
  exit 1
fi

# Copy ~/.kube/config to $shared_dir/kubeconfig
cp ~/.kube/config "$workdir/shared/kubeconfig"

# Find Kubernetes users using gke-gcloud-auth-plugin
users=\$(grep -B 7 "command: gke-gcloud-auth-plugin" "$workdir/shared/kubeconfig" | awk '/- name:/{print \$3}')

yq eval 'del(.users[].user.exec)' -i "$workdir/shared/kubeconfig"

# Perform substitution using yq
for user in \$users; do
        echo "Fetching new token"
	token=\$(gke-gcloud-auth-plugin generate-token "\$user" | awk -F'"' '/"token":/{print \$4}')
	T=\$token yq  eval '.users[].user |= {"token": env(T)}' -i "$workdir/shared/kubeconfig"
	#yq eval ".users[] | select(.name == \"\$user\") | .user.token = \"\$token\"" -i "$workdir/shared/kubeconfig"
done

# Set permissions for container to read the file
chmod 655 "$workdir/shared/kubeconfig" 

# Output the modified kubeconfig
cat "$workdir/shared/kubeconfig"
E0F

Would it be as simple as adding a line similar to the Google one:
token=$(gke-gcloud-auth-plugin generate-token "$user" | awk -F'"' '/"token":/{print $4}')

and use the oci command instead:

oci ce cluster generate-token --cluster-id ocid1.cluster.oc1.*** --region ap-sydney-1 --profile myprofile

This OCI command results in:

{
    "apiVersion": "client.authentication.k8s.io/v1beta1",
    "kind": "ExecCredential",
    "status": {
        "token": "_TOKEN_",
        "expirationTimestamp": "2023-08-30T05:35:37.977978Z"
    }
}
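Since the OCI command emits a structured ExecCredential document (shown above), extracting the token in the script could be done by parsing JSON rather than awk field-splitting. A sketch, with a hypothetical helper name:

```python
import json


def token_from_exec_credential(raw):
    # `oci ce cluster generate-token` prints an ExecCredential JSON document;
    # parsing it as JSON is sturdier than the awk field-splitting used for the
    # GKE branch of the script.
    doc = json.loads(raw)
    return doc["status"]["token"]
```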

Any other details or context

[runwhen-local-feedback] workspace-builder function needs a way to specify max matched objects

Observation
We have a scenario where a gen rule is set to match the presence of a specific object in a namespace, for example certificates, but the script/command itself is scoped to the entire namespace, so only one should exist per namespace. Right now, if we have multiple certificates, for example, we end up with duplicate SLXs/instances of the certificate health check command.

This doesn't tend to show up in the cheat sheet as far as I can tell, but it becomes present in the RunWhen Platform workspace.

Possible Suggestions
Maybe there's some sort of rule option that flags a genrule as a "namespace scoped" or "single instance"? We'd need to also consider the SLX naming here too.
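A rough sketch of what such a flag might do at match time (function name and match shape are hypothetical): collapse the matched objects down to one per namespace before SLXs are generated.

```python
def dedupe_namespace_scoped(matches):
    # Hypothetical handling for a "namespace scoped" / "single instance"
    # gen-rule flag: keep only the first matched object per namespace so a
    # single SLX is emitted per namespace rather than one per certificate.
    seen = set()
    kept = []
    for match in matches:
        ns = match["namespace"]
        if ns not in seen:
            seen.add(ns)
            kept.append(match)
    return kept
```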

Any other details or context


Error 500 from Workspace Builder service for command "run": while parsing a block collection

I was running runwhen-local with podman and encountered an Error 500 message while discovering resources.

Run the container image

podman run --name RunWhenLocal -p 8081:8081 -v $workdir/shared:/shared --userns=keep-id:uid=999,gid=999 ghcr.io/runwhen-contrib/runwhen-local:latest
Trying to pull ghcr.io/runwhen-contrib/runwhen-local:latest...
Getting image source signatures
Copying blob sha256:47bcaba622c99cea3f9f529579f9714eea4f752d0fa962fdcdffccc32755d970
Copying blob sha256:14726c8f78342865030f97a8d3492e2d1a68fbd22778f9a31dc6be4b4f12a9bc
Copying blob sha256:f7d4946b1998501e173a27b5106ab354beefe3a0cead8354e4f41fc3bc266f22
Copying blob sha256:98a8504abe389a4c8f0b40ca0f43e8394ceecfc90fc5f483f064b5a82b5ee893
Copying blob sha256:227f24781f6bd764b0b800ad8181f3fe4feb7c974df900a8aed47d5fc6939a17
Copying blob sha256:e3f84e9db4a1ee24db5a8ad6b39b7dfac8baf7c533eccad9c1b1ec979fd9da1f
Copying blob sha256:247b7ffb1b23854ca97d9e0f525c7bf004a5b020ad59073c3ad122939ed12023
Copying blob sha256:4bd9d73adfda118620666cb30e399571043b82fc17866d12edd2c880bfb70db9
Copying blob sha256:66b2cb53484684459a1e96284e30d58b51d34ab1a599a6b423017b2984ea36e2
Copying blob sha256:f82910e5a9e1568664faeb21e356a2ace6857bf9176a5c6e375147c1882065ab
Copying blob sha256:31ea48e98a13de8ce1947a50786c59e25c3b81baca2566ed69a9498eb562340c
Copying blob sha256:a7ffe3f652a34476291971d5d0145af680e3ad32276f60d484396362ee628794
Copying blob sha256:b7a708fb29abd16c0c9b41d09197690bbb98550ad1d6fba040bf6a6b883c175a
Copying blob sha256:93a5f4a23a2a39450862862c71bcbad0f8d640dfe5e2b2481e81552fbe3cdca1
Copying blob sha256:b0f50eff62fb4a201a7b5674ae158c6aad357545fd6ce04641012bb56abd0899
Copying blob sha256:36e50f18fcd15a5938bc30dbd61f77e723f3c7313eefc555dd6d0b71a1f4360c
Copying blob sha256:b6e594ce037e18cf0c543be8f93791a6ec71c7a31b4ca38bdbe9c18a4b7a9a2f
Copying config sha256:6cc446bc739696fdc057b0a4ab5d4113848cb0fbbf5f86d0b96d42500d09ab82
Writing manifest to image destination
Directory /shared/output already exists.
Starting up neo4j
Waiting a bit before starting workspace builder REST server
WARNING - Config value 'build': Unrecognised configuration name: build
WARNING - Config value 'dev_addr': The use of the IP address '0.0.0.0' suggests a production environment or the use of a proxy to connect to the MkDocs server. However, the MkDocs' server is intended for local development purposes only. Please use a third party production-ready server instead.
INFO - Building documentation...
INFO - Cleaning site directory
INFO - Documentation built in 1.95 seconds
INFO - [13:10:55] Watching paths for changes: 'cheat-sheet-docs/docs', 'cheat-sheet-docs/mkdocs.yml'
INFO - [13:10:55] Serving on http://0.0.0.0:8081/
Changed password for user 'neo4j'. IMPORTANT: this change will only take effect if performed before the database is started for the first time.
Operations to perform:
Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
Applying contenttypes.0001_initial... OK
Applying auth.0001_initial... OK
Applying admin.0001_initial... OK
Applying admin.0002_logentry_remove_auto_add... OK
Applying admin.0003_logentry_add_action_flag_choices... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying auth.0007_alter_validators_add_error_messages... OK
Applying auth.0008_alter_user_username_max_length... OK
Applying auth.0009_alter_user_last_name_max_length... OK
Applying auth.0010_alter_group_name_max_length... OK
Applying auth.0011_update_proxy_permissions... OK
Applying auth.0012_alter_user_first_name_max_length... OK
Applying sessions.0001_initial... OK
Starting workspace builder REST server
Performing system checks...

System check identified no issues (0 silenced).
September 13, 2023 - 13:11:10
Django version 4.2.4, using settings 'config.settings'
Starting development server at http://0.0.0.0:8000/
Quit the server with CONTROL-C.

2023-09-13 13:11:13.143+0000 INFO Starting...
2023-09-13 13:11:16.459+0000 INFO This instance is ServerId{13c72ad8} (13c72ad8-e081-4765-8b2f-8751fef99fab)
2023-09-13 13:11:18.757+0000 INFO ======== Neo4j 5.11.0 ========
2023-09-13 13:11:23.432+0000 INFO Bolt enabled on 0.0.0.0:7687.
2023-09-13 13:11:26.092+0000 INFO Remote interface available at http://localhost:7474/
2023-09-13 13:11:26.104+0000 INFO id: 977AA64408D53387838BCF5532365AD84FA7BB9E52F171A6FFD7E3CAA05F375E
2023-09-13 13:11:26.107+0000 INFO name: system
2023-09-13 13:11:26.110+0000 INFO creationDate: 2023-09-13T13:11:20.132Z
2023-09-13 13:11:26.115+0000 INFO Started.
INFO - [13:12:32] Browser connected: http://0.0.0.0:8081/
INFO - [13:13:49] Browser connected: http://0.0.0.0:8081/
INFO - [13:14:05] Browser connected: http://0.0.0.0:8081/
INFO - [13:14:08] Browser connected: http://0.0.0.0:8081/about/
INFO - [13:14:09] Browser connected: http://0.0.0.0:8081/list/
INFO - [13:14:14] Browser connected: http://0.0.0.0:8081/
INFO - [13:16:44] Detected file changes
INFO - Building documentation...
WARNING - Config value 'build': Unrecognised configuration name: build
WARNING - Config value 'dev_addr': The use of the IP address '0.0.0.0' suggests a production environment or the use of a proxy to connect to the MkDocs server. However, the MkDocs' server is intended for local development purposes only. Please use a third party production-ready server instead.
INFO - Documentation built in 0.41 seconds
INFO - [13:16:44] Reloading browsers
INFO - [13:16:45] Browser connected: http://0.0.0.0:8081/
File "/workspace-builder/workspace_builder/views.py", line 102, in post
run_components(context, components)

File "/workspace-builder/component.py", line 316, in run_components
component.load_func(context)

File "/workspace-builder/enrichers/generation_rules.py", line 826, in load
generation_rules_config = yaml.safe_load(generation_rules_config_text)

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/init.py", line 125, in safe_load
return load(stream, SafeLoader)

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/init.py", line 81, in load
return loader.get_single_data()

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/constructor.py", line 49, in get_single_data
node = self.get_single_node()

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/composer.py", line 82, in compose_node
node = self.compose_sequence_node(anchor)

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/composer.py", line 110, in compose_sequence_node
while not self.check_event(SequenceEndEvent):

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()

File "/opt/pysetup/.venv/lib/python3.9/site-packages/yaml/parser.py", line 392, in parse_block_sequence_entry
raise ParserError("while parsing a block collection", self.marks[-1],

Internal Server Error: /run/
[13/Sep/2023 13:16:47] "POST /run/ HTTP/1.1" 500 12799

------------------------------------------------------------
#podman exec -w /workspace-builder -- RunWhenLocal ./run.sh
Discovering resources...
Error 500 from Workspace Builder service for command "run": while parsing a block collection
  in "<unicode string>", line 5, column 5:
      - resourceTypes:
        ^
expected <block end>, but found '?'
  in "<unicode string>", line 24, column 5:
      slxs:
      ^

[runwhen-local-feedback] Attempting to limit discovery to a single namespace returns odd results

Observation
In the workspaceInfo.yaml file, it's possible to specify additional constraints on discovery. There is a notion of specifying the level of detail (LOD) for namespaces. The defaultLOD is applied to all discovered namespaces, with the following possible values:

  • 0 - Don't discover anything
  • 1 - Light level of detail
  • 2 - Deep level of detail

Strictly speaking, the generation rules determine the level of detail at which a resource is included in a discovery task, but the above is the gist of it.

So, if I want to discover or create a cheat sheet for a single namespace, I would do something like this:

defaultLOD: 0
namespaceLODs:
  kube-public: 0
  kube-system: 0
  online-boutique: 2

In the above workspaceInfo.yaml spec, this says to skip detail on all namespaces (defaultLOD: 0), but perform deep discovery on online-boutique.
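The intended override behaviour can be sketched with a tiny helper (illustrative only; the function name and the default of 2 are assumptions, not the actual workspace builder code):

```python
def effective_lod(namespace, default_lod=2, namespace_lods=None):
    """Return the level of detail for a namespace.

    0 = skip discovery, 1 = light detail, 2 = deep detail.
    An explicit per-namespace setting overrides defaultLOD.
    (The default of 2 here is an assumption for illustration.)
    """
    namespace_lods = namespace_lods or {}
    return namespace_lods.get(namespace, default_lod)

lods = {"kube-public": 0, "kube-system": 0, "online-boutique": 2}

# With defaultLOD: 0, only online-boutique should be deeply discovered.
print(effective_lod("online-boutique", default_lod=0, namespace_lods=lods))  # 2
print(effective_lod("artifactory", default_lod=0, namespace_lods=lods))      # 0
```

Under this interpretation, namespaces like artifactory and argocd resolve to LOD 0 and should never appear in the generated cheat sheet.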

The challenge that I'm experiencing right now is that the above configuration produces something like this:
image

which starts to look about right, but when exploring the commands we can clearly see an ungrouped section that includes the namespace-scoped commands for every namespace in the cluster. (Note entries such as artifactory, argocd, and so on, where all we wanted was online-boutique.)

image

Possible Suggestions
This behaviour didn't occur before, as far as I can tell - but I'm not sure we've tested this "reduced discovery" configuration since migrating the generation rules into the codecollection repositories (e.g. https://github.com/runwhen-contrib/rw-cli-codecollection/tree/main/codebundles/k8s-argocd-application-health/.runwhen)

  1. Let's check the actual output created by the discovery tool to ensure that this is occurring at the discovery layer and not a leftover artifact somehow rendered by mkdocs.
  2. If the above turns out to be true, review the generation rules to ensure they aren't being included at LOD:0
  3. Dig deeper to see if a regression is causing the behaviour

Any other details or context

runwhen-local:0.2.0

[runwhen-local-feedback] Add github buttons

Observation
It would be helpful to add some github buttons to the app, such as the ability to star a repo. We hope that this will be a starting point towards surfacing the community aspect of the tool to better enable discussions and to let other users know which commands are most helpful or how they can be used.

Possible Suggestions
Leverage the official https://buttons.github.io/ for familiarity.
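For reference, the buttons.github.io embed for a star button looks roughly like this (adapted from the buttons.github.io docs; exactly where it slots into the mkdocs theme is an open question):

```html
<!-- Star button for the runwhen-local repo -->
<a class="github-button"
   href="https://github.com/runwhen-contrib/runwhen-local"
   data-icon="octicon-star"
   data-show-count="true"
   aria-label="Star runwhen-contrib/runwhen-local on GitHub">Star</a>

<!-- Load once, near the end of <body> -->
<script async defer src="https://buttons.github.io/buttons.js"></script>
```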

Any other details or context

[runwhen-local-feedback] Need better grouping rules for multiple clusters

Observation
I have a case where there are some 4000 commands indexed across 9 clusters and it's hard to sort them by context. We need a better way to align the sidebar/nav so that clusters can be easily managed when there's more than one cluster involved in the scan.

Possible Suggestions
We can look at nested folders based on context information potentially - it's probably the most straight forward approach until we start coming up with more intuitive grouping rules.
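The nested-folder idea could be sketched as grouping the rendered command pages by kubeconfig context before emitting the nav sidebar (the data shape here is hypothetical):

```python
from collections import defaultdict

def group_by_context(commands):
    """Group command pages into one nav folder per cluster context.

    `commands` is a list of dicts with `context` and `title` keys
    (a hypothetical shape for the rendered command index).
    """
    grouped = defaultdict(list)
    for cmd in commands:
        grouped[cmd["context"]].append(cmd["title"])
    # One top-level nav folder per cluster context, titles sorted.
    return {ctx: sorted(titles) for ctx, titles in grouped.items()}

commands = [
    {"context": "prod-east", "title": "Check Deployment Health"},
    {"context": "prod-west", "title": "Check Deployment Health"},
    {"context": "prod-east", "title": "List Failing Pods"},
]
print(group_by_context(commands))
```

With ~4000 commands across 9 clusters this at least bounds each sidebar section to a single cluster's commands.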

Any other details or context

[runwhen-local-feedback] Speed up rendering

Observation
The content rendering process can be fairly lengthy -- a few minutes or more depending on how many resources are discovered. It would be nice to speed this up.

Possible Suggestions
The rendering process is currently serial; we could look at some parallel processing to speed this up, but this will require some thought and design choices around how the local git caches are handled to properly support this.
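A hedged sketch of the parallelization idea, assuming each render task is independent and git cache access is serialized with a per-repo lock (all names here are hypothetical):

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# One lock per git cache so concurrent tasks never touch the same
# local clone at the same time (hypothetical locking scheme).
repo_locks = defaultdict(threading.Lock)

def render_one(task):
    with repo_locks[task["repo"]]:
        # ... fetch/refresh the local git cache here ...
        pass
    # ... render the markdown for this task outside the lock ...
    return task["name"]

def render_all(tasks, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(render_one, tasks))

tasks = [{"repo": "rw-cli-codecollection", "name": f"cmd-{i}"} for i in range(4)]
print(render_all(tasks))
```

The real design question is exactly the one noted above: whether the git caches can be shared read-only after a single refresh, or need per-worker copies.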

Any other details or context

[runwhen-local-feedback] Support for in-cluster auth

Observation
It would provide a faster setup experience, especially with Helm, if we supported in-cluster auth. Currently RunWhen Local looks for a kubeconfig at /shared/kubeconfig.

Possible Suggestions
A couple of options:

  1. construct a kubeconfig builder or template that uses the provisioned service account token and in-cluster credentials
  2. modify the code to attempt to use the projected token from the pre-configured service account at something like /var/run/secrets/kubernetes.io/serviceaccount/token

Note: This is likely a 1-cluster configuration at this point in time. The method of adding in a hand-crafted kubeconfig is still the appropriate method for multi-cluster discovery.
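Option 1 could look something like the following kubeconfig builder that reads the projected service account credentials (the mount paths are the standard in-cluster defaults; the function itself is illustrative, not existing code):

```python
def build_incluster_kubeconfig(
    server,
    token_path="/var/run/secrets/kubernetes.io/serviceaccount/token",
    ca_path="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
):
    """Build a single-cluster kubeconfig dict from in-cluster credentials."""
    with open(token_path) as f:
        token = f.read().strip()
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{"name": "in-cluster",
                      "cluster": {"server": server,
                                  "certificate-authority": ca_path}}],
        "users": [{"name": "sa", "user": {"token": token}}],
        "contexts": [{"name": "in-cluster",
                      "context": {"cluster": "in-cluster", "user": "sa"}}],
        "current-context": "in-cluster",
    }
```

Serializing this to YAML at /shared/kubeconfig would let the rest of the pipeline run unchanged; as noted, this only covers the single-cluster case.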

Any other details or context

[runwhen-local-feedback] Enhancement Suggestion - add requests / resources to deployment example

Observation
Currently, RunWhen Local Deployment doesn't have resource requests/limits.

root@zoh-aks:~# kubectl get pods --context=zoh-aks-cilium -n runwhen-local --field-selector=status.phase=Running -ojson | jq -r '[.items[] as $pod | ($pod.spec.containers // [][])[] | select(.resources.requests == null) | {pod: $pod.metadata.name, container_without_requests: .name}]'
[
  {
    "pod": "runwhen-local-66bf8b7b78-jj2nm",
    "container_without_requests": "runwhen-local"
  }
]

Possible Suggestions

Use pod requests and limits to manage compute resources within a cluster. Pod requests and limits inform the Kubernetes scheduler of the compute resources to assign to a pod.

Pods in the BestEffort QoS class can use node resources that aren't specifically assigned to Pods in other QoS classes. For example, if you have a node with 16 CPU cores available to the kubelet, and you assign 4 CPU cores to a Guaranteed Pod, then a Pod in the BestEffort QoS class can try to use any amount of the remaining 12 CPU cores.

Adding the below can help, so the deployment won't fail compliance checks when we scan clusters.

RunWhen Local Resource Management:

request/limits - Pod Memory
request - Pod CPU

Any other details or context

I've deployed RunWhen Local and scanned the cluster. Since the deployment doesn't have resource requests/limits defined, it failed the compliance check.

image

[runwhen-local-feedback] Bug: Developers can't list namespaces in OpenShift

Observation
Tested on OKD 4.13, OpenShift 4.12.3 (CodeReadyContainers).

As a cluster admin there were no issues in either environment.

As a developer with access to only a few namespaces I ran into RBAC issues because OpenShift prefers projects instead of namespaces by default. The standard user / developer RBAC does not include permissions to list namespaces resulting in a fatal exception. (OpenShift creates an identical Project when a Namespace is created and vice-versa)

Note: Make sure the developer context is not in the kubeconfig or generation will fail with an exception.

Internal Server Error: /run/
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/workspace-builder/prodgraph/views.py", line 105, in post
    raise e
  File "/workspace-builder/prodgraph/views.py", line 98, in post
    component.run_func(context)
  File "/workspace-builder/indexers/kubeapi.py", line 113, in index
    ret = core_api_client.list_namespace()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 14721, in list_namespace
    return self.list_namespace_with_http_info(**kwargs)  # noqa: E501
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 14828, in list_namespace_with_http_info
    return self.api_client.call_api(
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '3b8d84a0-b522-47ed-bfdc-7a4d6324110a', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '19a3dc24-ba00-43cc-bd97-8679fcc8ca83', 'X-Kubernetes-Pf-Prioritylevel-Uid': '78b618f3-7def-4784-8cf4-75416a1bd9d0', 'Date': 'Thu, 01 Jun 2023 05:12:03 GMT', 'Content-Length': '264'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces is forbidden: User \"developer\" cannot list resource \"namespaces\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"kind":"namespaces"},"code":403}


[01/Jun/2023 05:13:00] "POST /run/ HTTP/1.1" 500 145

Possible Suggestions
If running against OpenShift we should try listing projects if namespaces can't be listed.

  apiVersion: project.openshift.io/v1
  kind: Project
  metadata:
    annotations:
      openshift.io/sa.scc.mcs: s0:c26,c5
      openshift.io/sa.scc.supplemental-groups: 1000660000/10000
      openshift.io/sa.scc.uid-range: 1000660000/10000
    creationTimestamp: "2023-06-01T04:49:55Z"
    labels:
      kubernetes.io/metadata.name: test
      pod-security.kubernetes.io/audit: restricted
      pod-security.kubernetes.io/audit-version: v1.24
      pod-security.kubernetes.io/warn: restricted
      pod-security.kubernetes.io/warn-version: v1.24
    name: test
    resourceVersion: "57784"
    uid: ef083331-7581-4b50-8c0c-e3b80c810fcd
  spec:
    finalizers:
    - kubernetes
  status:
    phase: Active
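The fallback itself is generic and could be sketched like this (illustrative stand-ins; the real indexer would call the CoreV1 list_namespace and the project.openshift.io API instead of these callables):

```python
class Forbidden(Exception):
    """Stand-in for kubernetes.client.ApiException with status 403."""
    status = 403

def list_with_fallback(list_namespaces, list_projects):
    """Try namespaces first; on a 403, fall back to listing projects.

    Both arguments are zero-argument callables. Any exception carrying
    a `status` attribute of 403 triggers the fallback; other errors
    propagate unchanged.
    """
    try:
        return list_namespaces()
    except Exception as e:
        if getattr(e, "status", None) == 403:
            return list_projects()
        raise

def forbidden():
    raise Forbidden()

print(list_with_fallback(forbidden, lambda: ["test", "online-boutique"]))
# ['test', 'online-boutique']
```

Since OpenShift keeps Projects and Namespaces in lockstep, the project listing should yield the same set of names the rest of the indexer expects.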

[runwhen-local-feedback] [Error -2] Name or service not known

Observation

Getting a connection broken error:

Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fb51f9f7550>: Failed to establish a new connection: [Errno -2] Name or service not known')': /apis/

Possible Suggestions
Until we have better error reporting, we need to triage the error manually and provide additional troubleshooting steps.

Any other details or context

This is being tested in a K3S environment.
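One quick triage step is to check whether the API server hostname from the kubeconfig resolves inside the container at all, since "[Errno -2] Name or service not known" is a failed getaddrinfo (hypothetical helper, stdlib only; in practice the host comes from the kubeconfig's `server:` field):

```python
import socket
from urllib.parse import urlparse

def can_resolve(server_url):
    """Return True if the kubeconfig server hostname resolves from here."""
    host = urlparse(server_url).hostname
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False

# A False result reproduces "[Errno -2] Name or service not known":
# getaddrinfo failed, pointing at DNS/search-domain setup in the container.
print(can_resolve("https://localhost:6443"))
```

In a K3S setup this would distinguish a container-level DNS problem from a kubeconfig pointing at an unreachable endpoint.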
