magtape's People

Contributors

dependabot[bot], freakin, ilrudie, jsteichen12, kamleshjoshi8102, phenixblue, pramod74, xytian315

magtape's Issues

Fix DockerHub Secret References

What happened:

It looks like the secrets for referencing the DockerHub username/password were renamed in the workflow file, but not in the repo. I can't see why the name change was necessary, so we need to make the workflow match what's already configured.

What you expected to happen:

Image builds for releases succeed.

Allow specification of more diverse kubectl verbs in functional testing

What would you like to be added:
Investigate updating functional testing to allow verbs such as kubectl create instead of only supporting apply.

Why is this needed:
Adds capability to the CI to test a more diverse range of policies, covering a wider variety of changes that might be requested against a cluster.
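A minimal sketch of how a functional-test runner could accept a verb per test instead of hard-coding apply. The `build_kubectl_cmd` helper, the `SUPPORTED_VERBS` set, and the manifest path are hypothetical. The verbs themselves are standard kubectl subcommands that accept `-f`.

```python
import shlex
from typing import List

# Hypothetical set of verbs a functional-test runner could accept; the issue
# asks for at least "create" in addition to "apply".
SUPPORTED_VERBS = ("apply", "create", "replace", "delete")


def build_kubectl_cmd(verb: str, manifest: str, namespace: str) -> List[str]:
    """Return the kubectl argv for running one test manifest with `verb`."""
    if verb not in SUPPORTED_VERBS:
        raise ValueError(f"unsupported kubectl verb: {verb!r}")
    return ["kubectl", verb, "-f", manifest, "-n", namespace]


if __name__ == "__main__":
    # Print the command a test entry using "create" would run (path is made up).
    print(shlex.join(build_kubectl_cmd("create", "testing/deployments/example.yaml", "test1")))
```

Validating the verb up front means a typo in a test definition fails immediately instead of surfacing as a confusing kubectl error.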

Add HPA for MagTape

What would you like to be added:

Add a Horizontal Pod Autoscaler resource to the MagTape deployment artifacts.

Why is this needed:

This will be used to scale out replicas instead of scaling up workers/threads per pod. It is related to the remediation of the issue noted in #48.

Add support for post-assessment Webhook

What would you like to be added:

Add functionality to allow for calling a user-defined endpoint for policy failures (possibly passes as well).

Not sure if the granularity should be a single global configuration for a MagTape installation, a different endpoint per policy, etc.

  • It should be bypassed if no config is provided.
  • It should have a timeout value and should not cause a failure in the policy assessment if the call to the endpoint fails.
  • Ideally this can happen asynchronously and be non-blocking to the end-user request.

Why is this needed:

To allow for integration with existing systems for alerting/reporting.
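The requirements above (bypass when unconfigured, timeout, never fail the assessment, non-blocking) fit a fire-and-forget call on a daemon thread. This is a stdlib-only sketch. The function names, the JSON payload shape, and the 2-second default timeout are assumptions. The `sender` parameter exists only so the call can be swapped out in tests.

```python
import json
import logging
import threading
import urllib.request
from typing import Callable, Optional

Sender = Callable[[str, dict, float], None]


def post_json(url: str, payload: dict, timeout: float) -> None:
    """POST `payload` as JSON; raises on connection errors or timeouts."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout):
        pass


def notify_post_assessment(
    url: Optional[str],
    payload: dict,
    timeout: float = 2.0,
    sender: Sender = post_json,
) -> Optional[threading.Thread]:
    """Fire-and-forget call to a user-defined endpoint.

    Returns None when no endpoint is configured (the feature is bypassed);
    otherwise returns the daemon thread making the call.
    """
    if not url:
        return None

    def worker() -> None:
        try:
            sender(url, payload, timeout)
        except Exception:  # a failed call must never change the policy result
            logging.exception("post-assessment webhook call to %s failed", url)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

A thread per call is the simplest option. A bounded queue with a single worker would cap concurrency if endpoint latency became a concern.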

Add Rego test automation

What would you like to be added:

Need to add Rego unit tests to CI checks

Why is this needed:

Increase automated checking to build a higher level of confidence in policies and related changes.

Add metric for Webhook Cert Expiration

What would you like to be added:

Would like to have a background process/sidecar to track a metric for webhook cert expiration (i.e. number of days left).

Why is this needed:

General observability concerns and tracking for lifecycle touch points (certificate rotation).
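The "days left" metric reduces to a small calculation on the certificate's notAfter timestamp. The sketch below assumes the timestamp is already available in the string format `ssl.getpeercert()` returns (for example from a TLS connection to the webhook service). The metric name `magtape_webhook_cert_days_remaining` is hypothetical. The output follows the Prometheus text exposition format.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days left before a certificate's notAfter timestamp.

    `not_after` uses the format returned by ssl.getpeercert(),
    e.g. "Jan 01 00:00:00 2030 GMT". Negative once the cert has expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires - current) // 86400)


def render_metric(days: int) -> str:
    """Render a gauge in Prometheus text format (metric name is hypothetical)."""
    return (
        "# TYPE magtape_webhook_cert_days_remaining gauge\n"
        f"magtape_webhook_cert_days_remaining {days}\n"
    )
```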

Need to update README for Testing

What would you like to be added:

The testing README located here needs to be updated for the most recent changes to the functional testing framework.

Reference #45 for more context

Why is this needed:

The existing info is slightly outdated given the changes to functional-tests.yaml.

Validate cert/key pairs for init workflow

What would you like to be added:

Add functionality to the magtape-init workflow to validate TLS Cert/Key relationship for both self-generated pairs and the BYOC mode.

Why is this needed:

To catch init errors sooner in the process to make the UX smoother and more robust.
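One stdlib way to check the cert/key relationship is to load both files into an `ssl.SSLContext`. `load_cert_chain()` raises `ssl.SSLError` when the private key does not match the certificate, and raises `OSError` when a file is missing. This sketch assumes PEM files and an unencrypted key: for an encrypted key with no password supplied, OpenSSL may prompt interactively. The helper name is hypothetical.

```python
import ssl


def cert_key_pair_valid(cert_path: str, key_path: str) -> bool:
    """Return True if the PEM cert and (unencrypted) key load together.

    Works the same for self-generated pairs and BYOC pairs written to disk.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except (ssl.SSLError, OSError):  # mismatch, bad PEM, or missing file
        return False
    return True
```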

Add multi-arch image builds

What would you like to be added:

We currently build container images for magtape and magtape-init for amd64 architecture only. We should start building for arm64 and ppc64le at a minimum. Probably good to check out a few other projects that are doing multi-arch builds and include any other architectures that seem relevant.

Why is this needed:

Wider support for hardware architectures that are gaining popularity within the Kubernetes community.

Add support for server-side warnings for K8s v1.19+

What would you like to be added:

Add functionality to take advantage of the server-side warnings enabled in Kubernetes v1.19.

More info:

Why is this needed:

This will allow surfacing policy failures back to the client (kubectl/client-go), even in cases where the admission response is not a denial.
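In the admission.k8s.io/v1 AdmissionReview response, Kubernetes v1.19+ accepts a `warnings` list of strings that is relayed to the client even when `allowed` is true. A minimal sketch of building such a response follows. The function name and the choice to attach `status.message` only on denials are assumptions.

```python
from typing import List, Optional


def admission_review_response(
    uid: str, allowed: bool, warnings: List[str], message: Optional[str] = None
) -> dict:
    """Build an admission.k8s.io/v1 AdmissionReview response body.

    On Kubernetes v1.19+, strings in response.warnings are returned to the
    client (kubectl/client-go) even when the request is allowed.
    """
    response: dict = {"uid": uid, "allowed": allowed}
    if warnings:
        response["warnings"] = list(warnings)
    if not allowed and message:
        response["status"] = {"message": message}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

Low-severity policy failures could then be returned as warnings on an allowed request rather than being visible only in logs and alerts.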

Background scanning for policy violations

What would you like to be added:

A mechanism to scan and alert on Kubernetes resources that are already deployed in the cluster (i.e. past the initial admission control workflow).

  • Probably needs to run at a configurable interval
  • Could be a background daemon or sidecar, or a completely separate pod.
  • Could be a good thing to look at doing in Golang
  • Maybe think through the possibility of an enforcement action in addition to alerts (e.g. scale to 0 pods on a Deployment with a privileged pod spec)
  • Not sure if we'd want a separate severity/deny level for the background scanning vs. the admission response flow

Why is this needed:

This would cover brownfield environments, or scenarios where new policies are added or policy severity changes while resources may be long-lived/deployed infrequently.
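A rough sketch of an alert-only scan loop at a configurable interval. Python is used only to match the other examples here; the issue suggests Go for the real thing. The resource listing and alert delivery are injected callables, so no Kubernetes client API is assumed. The privileged-container check is a hypothetical stand-in for real policy evaluation.

```python
import time
from typing import Any, Callable, Dict, Iterable, List

Resource = Dict[str, Any]
Check = Callable[[Resource], bool]  # True means the resource passes


def no_privileged_containers(res: Resource) -> bool:
    """Example check: fail Deployments whose pod spec has a privileged container."""
    pod_spec = res.get("spec", {}).get("template", {}).get("spec", {})
    return not any(
        c.get("securityContext", {}).get("privileged", False)
        for c in pod_spec.get("containers", [])
    )


def scan_once(resources: Iterable[Resource], checks: Dict[str, Check]) -> List[str]:
    """Evaluate already-deployed resources and return one line per violation."""
    violations = []
    for res in resources:
        meta = res.get("metadata", {})
        ref = f"{res.get('kind')}/{meta.get('namespace')}/{meta.get('name')}"
        violations.extend(f"{name}: {ref}" for name, check in checks.items() if not check(res))
    return violations


def run_forever(
    list_resources: Callable[[], Iterable[Resource]],
    checks: Dict[str, Check],
    alert: Callable[[str], None],
    interval_seconds: float = 300.0,
) -> None:
    """Re-scan on a configurable interval; alerting only, no enforcement."""
    while True:
        for violation in scan_once(list_resources(), checks):
            alert(violation)
        time.sleep(interval_seconds)
```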

Add documentation for fix release procedure

What would you like to be added:

Need to enhance contributor docs to include steps for fix release procedures.

Why is this needed:

To provide a consistent maintainer experience when backporting bug/security fixes.

Add conditional CI Checks

What would you like to be added:

Need to add some conditional logic to certain CI checks to increase efficiency for some PRs.

Why is this needed:

Not all checks need to run on a given PR unless certain files change. GitHub Actions has filtering capabilities, but enabling them on a required check breaks things when the check then doesn't trigger in a PR (https://github.community/t5/GitHub-Actions/Feature-request-conditional-required-checks/m-p/36938#M2735). Until there's a solution from GitHub, we may need to add the conditional logic into the CI checks themselves.

Use something like this in a helper function that lives somewhere in ./hack. It should be generic enough to work for any number of directory/file paths and for any CI check.

$ git --no-pager diff --name-only --ignore-blank-lines HEAD $REF -- app

Probably not an exhaustive list, but good to start collecting the specific paths we want to trigger on for each set of checks:

Python Checks

    - /app
    - /.github/workflows

e2e Checks

    - /app
    - /deploy/manifests
    - /policies
    - /hack
    - /.github/workflows

Manifests Checks

    - /deploy/manifests
    - /hack
    - /.github/workflows
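A sketch of the generic helper described above, written in Python rather than shell for consistency with the other examples. The path lists are copied from this issue. The leading "/" is dropped because `git diff --name-only` prints paths relative to the repo root. `CHECK_PATHS`, `changed_files`, and `should_run` are hypothetical names.

```python
import subprocess
from typing import Dict, Iterable, List

# Path prefixes per check, taken from the lists in this issue.
CHECK_PATHS: Dict[str, List[str]] = {
    "python": ["app/", ".github/workflows/"],
    "e2e": ["app/", "deploy/manifests/", "policies/", "hack/", ".github/workflows/"],
    "manifests": ["deploy/manifests/", "hack/", ".github/workflows/"],
}


def changed_files(ref: str) -> List[str]:
    """Files that differ between HEAD and `ref` (same git command as above)."""
    out = subprocess.run(
        ["git", "--no-pager", "diff", "--name-only", "--ignore-blank-lines", "HEAD", ref],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def should_run(check: str, files: Iterable[str]) -> bool:
    """True if any changed file falls under one of the check's trigger paths."""
    prefixes = tuple(CHECK_PATHS[check])
    return any(f.startswith(prefixes) for f in files)
```

A CI step would call `should_run(...)` on `changed_files(...)` and exit successfully early when it returns False. The check then still reports as passed, which keeps it usable as a required check.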

Instrument Distributed Tracing for MagTape

What would you like to be added:

Instrument Distributed Tracing for the MagTape application. Ideally using OpenTelemetry packages.

Why is this needed:

To give more robust telemetry collection for MagTape. This is helpful for development of new features with performance in mind, tracking regressions between releases, for tracking the impact of additional policies over time, and general troubleshooting of performance related issues.

Update CHANGELOG and release docs post-v2.3.0 release

What would you like to be added:

  • Add changes to CHANGELOG.md noted from v2.3.0 release
  • Add clarification to release docs based on feedback from the v2.3.0 release

Why is this needed:

We made some last-minute updates to the changelog for the v2.3.0 release and need to make sure those changes make their way back into the actual CHANGELOG.md file.

Also noted some gaps in the release documentation that need to be added.

Update PR Template

What would you like to be added:

  • Need to add /kind release to PR type section in PR template

Why is this needed:

Additional relevance for the PR template that should make the contributor experience a bit better.

Enhance docs around kube-mgmt cache customization

What would you like to be added:

Need to add some verbiage to the docs that covers the possibility of adjusting resources (CPU/MEM) as you add additional Kubernetes resources to be replicated by kube-mgmt.

Why is this needed:

I've noticed on some large clusters with many objects that the HPA kicks in automatically following an initial deployment of MagTape and remains at max replicas. This seems to be associated with kube-mgmt replication (ie. more resource types/number of a given type of resource on the cluster).

Possibly look into adjusting the sync interval or the resource allocation for the kube-mgmt container. At the very least we need to call it out in the docs.

Move end-user Slack Webhook URL to Secret

What would you like to be added:

Move end-user Slack Webhook URL to Secret

Why is this needed:

Currently an end-user can supply their own Slack Incoming Webhook URL as an annotation on their namespace to direct alerts at their own Slack channel for policy violations within their namespace. As the Slack Incoming Webhook URL is considered sensitive, this should be moved to a Secret resource.

The two ideas I have for this are:

  • Use a namespace label to specify a Secret resource to read the information from (e.g. k8s.t-mobile.com/slack-webhook-secret: <my_custom_secret_name>)
    • The expected namespace label should be globally configurable via ENV var similar to MAGTAPE_SLACK_USER_LABEL
  • Use a consistent Secret resource name (e.g. magtape-slack-secret)
    • The expected secret name should be globally configurable via ENV var similar to MAGTAPE_SLACK_USER_SECRET

Cleanup:

The end goal should involve the cleanup of the existing configs/tests. For example:

  • Remove the existing MAGTAPE_SLACK_ANNOTATION ENV var
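The two options above can be combined: use the Secret named by the namespace label if present, otherwise fall back to the consistent name. This is a sketch under that assumption. The ENV var names `MAGTAPE_SLACK_SECRET_LABEL`/`MAGTAPE_SLACK_SECRET_NAME` and the Secret data key `webhook-url` are hypothetical. The label key and fallback name come from the examples in this issue. Secret `data` values are base64-encoded, as Kubernetes returns them.

```python
import base64
import os
from typing import Mapping, Optional

# Hypothetical ENV var names, modelled on the MAGTAPE_SLACK_* names in this issue.
LABEL_ENV = "MAGTAPE_SLACK_SECRET_LABEL"
NAME_ENV = "MAGTAPE_SLACK_SECRET_NAME"


def slack_secret_name(ns_labels: Mapping[str, str], env: Mapping[str, str] = os.environ) -> str:
    """Pick the Secret holding a namespace's Slack webhook URL.

    Option 1: a namespace label names the Secret.
    Option 2: fall back to a consistent, globally configurable Secret name.
    """
    label_key = env.get(LABEL_ENV, "k8s.t-mobile.com/slack-webhook-secret")
    if label_key in ns_labels:
        return ns_labels[label_key]
    return env.get(NAME_ENV, "magtape-slack-secret")


def webhook_url(secret_data: Mapping[str, str], key: str = "webhook-url") -> Optional[str]:
    """Decode the URL from a Secret's `data` map (values are base64-encoded)."""
    encoded = secret_data.get(key)
    return base64.b64decode(encoded).decode("utf-8") if encoded else None
```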

Support arm64 Architecture

What would you like to be added:

We need to have MagTape support deployment to arm64 based cluster environments.

We have multi-arch builds of the magtape-init and magtape container images, but we need supported images for opa and kube-mgmt as well.

Related to open-policy-agent/opa#2233 for arm64 support with OPA.

Why is this needed:

Further deployment flexibility

Verify all K8s resource manifests have standard labels

What would you like to be added:

Need to make sure all kubernetes resource manifests in ./deploy/manifests have a standard set of labels:

  • app=magtape
  • resource=<resource_type> (e.g. resource=deployment)

Why is this needed:

Standardization to easily identify all installed MagTape resources.
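A sketch of a check for the two standard labels that operates on already-parsed manifests, so no YAML library is assumed. It assumes `<resource_type>` is the lower-cased `kind` (e.g. Deployment becomes deployment). The function name is hypothetical.

```python
from typing import Any, Dict, List


def label_problems(manifest: Dict[str, Any]) -> List[str]:
    """List missing/incorrect standard labels on one parsed manifest."""
    kind = str(manifest.get("kind", ""))
    meta = manifest.get("metadata", {})
    labels = meta.get("labels") or {}
    expected = {"app": "magtape", "resource": kind.lower()}
    ref = f"{kind}/{meta.get('name', '<unnamed>')}"
    return [
        f"{ref}: expected label {key}={value}, found {labels.get(key)!r}"
        for key, value in expected.items()
        if labels.get(key) != value
    ]
```

Run over every document in ./deploy/manifests, a non-empty result could fail the manifests CI check.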

Pods crash when scheduled on nodes with 24 or more CPUs

What happened:

Installing and running MagTape on worker nodes with 24 or more CPUs causes Gunicorn to spawn a high number of workers/threads, and there appears to be a memory leak of some sort.

What you expected to happen:

Pods to start up normally

How to reproduce it (as minimally and precisely as possible):

Run the simple install in a cluster with worker nodes that have 24 or more CPUs

Anything else we need to know?:

Experienced on worker nodes with 24 cores x 128GB RAM

Example output from MagTape container logs:

[2020-10-02 04:52:27 +0000] [107] [INFO] Booting worker with pid: 107
[2020-10-02 04:52:27 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:62)
[2020-10-02 04:52:27 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:63)
[2020-10-02 04:52:27 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:64)
[2020-10-02 04:52:27 +0000] [62] [INFO] Worker exiting (pid: 62)
[2020-10-02 04:52:27 +0000] [64] [INFO] Worker exiting (pid: 64)
[2020-10-02 04:52:27 +0000] [63] [INFO] Worker exiting (pid: 63)
[2020-10-02 04:52:29 +0000] [108] [INFO] Booting worker with pid: 108
[2020-10-02 04:52:29 +0000] [109] [INFO] Booting worker with pid: 109
[2020-10-02 04:52:30 +0000] [1] [INFO] Unhandled exception in main loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 211, in run
    self.manage_workers()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
    self.spawn_workers()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
    self.spawn_worker()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 567, in spawn_worker
    pid = os.fork()
OSError: [Errno 12] Out of memory

Environment:

  • Kubernetes version (use kubectl version): v1.15.5
  • Worker Node OS: Ubuntu 16.04
  • Cloud provider or hardware configuration:
  • Others:
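The log shows the arbiter failing on `os.fork()` while booting workers. If the worker count is derived from the CPU count (for example with Gunicorn's documented "(2 x cores) + 1" starting point), the count scales with the node rather than the pod. `multiprocessing.cpu_count()` reports the node's CPUs, not the container's CPU limit. The `gunicorn.conf.py` sketch below caps the count. The `MAGTAPE_MAX_WORKERS`/`MAGTAPE_THREADS` ENV vars and their defaults are hypothetical.

```python
# gunicorn.conf.py (sketch): Gunicorn reads module-level settings such as
# `workers` and `threads` from its Python config file.
import multiprocessing
import os


def worker_count(cpus: int, cap: int) -> int:
    """Gunicorn's suggested (2 x cores) + 1 starting point, bounded by `cap`."""
    return max(1, min(2 * cpus + 1, cap))


# Uncapped, a 24-CPU node would yield 2 * 24 + 1 = 49 workers per pod.
workers = worker_count(
    multiprocessing.cpu_count(),
    cap=int(os.environ.get("MAGTAPE_MAX_WORKERS", "4")),  # hypothetical ENV var
)
threads = int(os.environ.get("MAGTAPE_THREADS", "2"))  # hypothetical ENV var
```

A fixed, small worker count per pod also pairs with scaling out replicas via an HPA instead of scaling up workers per pod.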

Add logic to handle Github Action Workflow dependencies on releases

What would you like to be added:

Add logic to handle Github Action Workflow dependencies on releases

Why is this needed:

Currently the release flow executes e2e tests prior to the new container image build because the same flow is used for PR/push to master. Need to make sure the image build happens before e2e tests for release prep (push to master). This may require pre-release image builds or refactoring the GitHub Actions workflows overall.

Review CI for Required Checks

What would you like to be added:

Need to review CI configuration with regards to required checks. I've seen some other projects that have minimal required checks (vs. almost all checks for MagTape being required).

Why is this needed:

If we can unmark certain checks as required, we can add path filters to minimize CI checks for small things like docs updates (e.g. no need to run e2e checks if only docs are updated, or no need to run Rego checks if no Rego files are touched in a PR).

Add ability to disable policy per namespace

What would you like to be added:

It would be nice to have the ability to disable individual policies on a per-namespace basis.

Why is this needed:

This allows for flexibility to granularly disable policies without the need to completely remove a policy from a cluster, or having to lower the severity level of the policy at a global cluster level to meet the needs of a specific namespace.

Ideally I think something along the lines of a label on the namespace could work. Something like:

k8s.t-mobile.com/magtape-disable-<policy_name>: true

If the label is found for a specific policy, we need to add logic to skip denies but still track failures for alerting/event creation. Because this is primarily valid in environments where end-users can't manipulate their assigned namespace resource, this should also be a global toggle with an ENV var similar to MAGTAPE_ENABLE_NS_TOGGLE.
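A sketch of the decision logic described above: a disabled policy never denies, but failures are still recorded for alerting/events, and the whole feature is gated by a global toggle. It assumes the toggle uses the MAGTAPE_ENABLE_NS_TOGGLE name from the text with a "TRUE"/"FALSE" value. It also assumes label values are compared case-insensitively. One design constraint to keep in mind: the name part of a Kubernetes label key is limited to 63 characters, so with the `magtape-disable-` prefix (16 characters) the policy name can be at most 47.

```python
from typing import Mapping, Tuple

DISABLE_PREFIX = "k8s.t-mobile.com/magtape-disable-"


def policy_disabled(policy: str, ns_labels: Mapping[str, str], env: Mapping[str, str]) -> bool:
    """True only if the global toggle is on and the namespace opts out of `policy`."""
    if env.get("MAGTAPE_ENABLE_NS_TOGGLE", "FALSE").upper() != "TRUE":
        return False
    return ns_labels.get(DISABLE_PREFIX + policy, "").lower() == "true"


def evaluate(
    policy: str, failed: bool, ns_labels: Mapping[str, str], env: Mapping[str, str]
) -> Tuple[bool, bool]:
    """Return (deny, record_failure) for one policy result."""
    return failed and not policy_disabled(policy, ns_labels, env), failed
```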

Linter check fails without any Python changes

What happened:

The lint job in the python-checks workflow is failing when no changes to Python files are made.

What you expected to happen:

Linting should pass if no changes to Python files have been made

How to reproduce it (as minimally and precisely as possible):

Open any PR; the check will fail.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:

Add descriptions to functional tests

What would you like to be added:

Add descriptions for all existing functional tests to support the new functionality added in #86

May also be good to add some comments to the contrib docs around this pattern now that we have a more solid structure to the testing framework.

Why is this needed:

More descriptive output to track what each functional test is actually testing for in a simple human readable format.

Extend testing options for functional-test automation

What would you like to be added:

Currently there's not a good way to perform per-test setup/teardown for things outside of k8s artifacts.

Why is this needed:

Some tests may require setting up specific scenarios before/after a given functional test. Example:

NodePort test should add an annotation to the target namespace and then remove it when done.

Ideally this is done in a generic way where each resource type can have related setup/teardown hooks.
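A sketch of the generic hook idea: each resource type can register optional setup/teardown callables that receive the target namespace. Teardown runs in a `finally` so it happens even when the test fails. It also prints a per-test description, which relates to the descriptive-name requests elsewhere in this list. The registry, function names, and the "services" key are hypothetical. The NodePort annotate/unannotate pair from the example above is represented only by the callables the caller passes in.

```python
from typing import Callable, Dict, Optional, Tuple, TypeVar

T = TypeVar("T")
Hook = Callable[[str], None]  # receives the target namespace

# resource type -> (setup, teardown); both optional.
HOOKS: Dict[str, Tuple[Optional[Hook], Optional[Hook]]] = {}


def register_hooks(resource_type: str, setup: Optional[Hook] = None, teardown: Optional[Hook] = None) -> None:
    HOOKS[resource_type] = (setup, teardown)


def run_test(resource_type: str, namespace: str, description: str, test: Callable[[], T]) -> T:
    """Run one functional test wrapped in its resource type's hooks."""
    setup, teardown = HOOKS.get(resource_type, (None, None))
    print(f"[{resource_type}] {description}")
    if setup:
        setup(namespace)
    try:
        return test()
    finally:
        if teardown:  # runs even if the test raises
            teardown(namespace)
```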

Fix Bring Your Own Cert Docs

What would you like to be added:

The references to the MAGTAPE_TLS_SECRET environment variable should be removed and the documentation for the "Bring Your Own Cert" (BYOC) model needs to be corrected.

The BYOC model requires an annotation on the magtape-tls secret. Details are within the magtape-init code.

Why is this needed:

To correctly describe the BYOC scenario and configuration.

Add Shellcheck CI checks

What would you like to be added:

Need to add Shellcheck CI Checks for all bash scripts in the repo.

Why is this needed:

To ensure adherence to a standard for Bash scripts, for consistency and best practices. This should help to maintain trusted tooling within the repo.

Review CI for pinning utilities to specific versions

What would you like to be added:

During work on #77 I encountered an issue with a change to the version of the kubectl utility used in the ubuntu-latest GitHub Actions image. v1.19.0 seemed to produce errors for the compare-manifest CI job. I added a ci-bootstrap Make target to download a specific version of kubectl and replace the default in the container image.

Why is this needed:

We should review the work I did, assess the process for longer-term usage, and extend it to any other utilities we want (opa, kustomize, kind, etc.). The Gatekeeper project had some examples of doing this that can be used for reference.

Fix typos in Policies Doc

What would you like to be added:

Fix a few typos in the Policies doc.

  • NodePort policy: The nodePort annotation on the namespace should be "k8s.t-mobile.com/nodeportRange" Set the annotation to "na" if no nodePort range will to be set, that is seen as an exception value
  • emptyDir policy: Using emptyDir leads to consumption of ephemeral storage on the underlying nodes and can fill up easily affecting others on the platform.

There are probably others, so a good overall review would be nice.

Why is this needed:

Make our docs clear and understandable.

Reorganize Rego tests/mock data

What would you like to be added:

Would like to reorganize the Rego policy unit tests and mocked data. We should add some additional verbiage specific to policy contributions in CONTRIBUTING.md (expected file layout, test coverage, etc.)

Why is this needed:

  • Having the tests in their own packages seems against standards
  • Having the mocked data inline with the tests can be a bit busy/doesn't lend well to reuse.

Add matrix to test multiple K8s versions

What would you like to be added:

Add a matrix for Kubernetes versions to the e2e CI check Action

Why is this needed:

We need to identify a target range of Kubernetes versions to test each release against/support. We currently only test against the latest (currently 1.18.2) in the version of KinD that's used in the e2e check Action.

Enable functional tests to have a descriptive name

What would you like to be added:
Allow a descriptive name to be associated with each functional test which can be printed during testing.

Why is this needed:
Enable more meaningful output during functional testing to make it easier to determine exactly what is being tested.

Documentation update:
Once this is implemented and descriptive names are set for each test the Test Samples Available table can be removed from readme.md

Install times out waiting for CSR approval

What happened:
Ran kubectl apply -f https://raw.githubusercontent.com/tmobile/magtape/master/deploy/install.yaml
magtape-init on Kubernetes 1.18 (KinD) timed out waiting for the CSR to be approved

What you expected to happen:
MagTape to deploy on my test system

How to reproduce it (as minimally and precisely as possible):
Apply the install.yaml on a cluster running Kubernetes 1.18

Anything else we need to know?:
INFO: Waiting for certificate approval
INFO: Timed out reading certificate request "magtape-svc.magtape-system.cert-request"
forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"","reason":"Forbidden"

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: v1.16.6-beta.0
    Server Version: v1.18.2
  • Cloud provider or hardware configuration:
  • Others:
    kind v0.8.1 go1.14.6 darwin/amd64

Bump KinD node Image versions used in CI

What would you like to be added:

Bump the KinD node image version to be on the latest dot releases for each currently supported minor version.

KinD node images are defined here in our e2e checks workflow:

kind-node-image: kindest/node:v1.19.1@sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600

Reference the KinD release pages for current node images.

Why is this needed:

There have been some security fixes and associated new Kubernetes upstream releases.

Add documentation for customizing webhook label

What would you like to be added:
Need to add documentation on how to customize the namespace label used with the webhook's label selector.

Why is this needed:
Additional flexibility in customization

Investigate porting magtape-init to Go

What would you like to be added:

Investigate effort level/advantages for moving the magtape-init code to Golang.

Why is this needed:

This came up in conversation for a couple of reasons:

  • Migrating the core magtape code to Go in order to consume the OPA Go library and move away from the sidecar
  • Potential simplification of TLS bits in init process and better support for extended cert/key validation

Add more detail around contrib scenarios

What would you like to be added:

Need to add additional documentation around different contributor scenarios.

Examples:

  • Run linting/formatting if you edit Python files (#80)
  • Run linting/formatting if you edit Rego files (#60/#80)
  • Rebuild install manifest if you edit YAML manifests (#80)

Why is this needed:

Lower barrier to entry for new contributors and to help me not have to remember!

Investigate porting magtape core code to Go

What would you like to be added:

Investigate migrating core MagTape code from Python to Golang

Need to get a general idea of functionality with the OPA Go library and assess UX improvements across the project lifecycle, as well as for installation/testing of MagTape.

Why is this needed:

Potential usability simplification and performance increase by consuming the OPA Go library and moving away from the sidecar

Migrate to Gunicorn WSGI Server

What would you like to be added:

Currently the native Flask HTTP server is used within the container image for MagTape. This is not the best for production use and should be updated to Gunicorn or some other production-ready WSGI server.

Why is this needed:

Better performance and resiliency

Extend NodePort policy functional testing

What would you like to be added:

Since #45 has now been merged we should be able to extend the functional testing for the NodePort policy.

Why is this needed:

We're currently not accurately testing the NodePort policies.

These tests require specific annotations on the target namespace for testing and will require a specifically formatted script that adheres to the pattern specified by the new functional testing framework.

Disable name suffix in configMapGenerator

What would you like to be added:

Need to disable the name suffix for the configMapGenerator in the base kustomization.yaml

Why is this needed:

This is needed to be consistent with the advanced install workflow.

Need to add versioning for QuickStart install link in README

What would you like to be added:

We need to add a versioned reference for the install.yaml linked in the main README.

Why is this needed:

Currently visitors to the repo will pull the latest install.yaml linked to the master branch, which could be under active development/not in a working state. Moving to a versioned reference provides a higher degree of stability to the casual visitor and allows us to maintain active development on the master branch.

This should be tied to the set-release-version make target for updating on new releases.
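A sketch of the substitution the set-release-version target could perform. It rewrites the QuickStart link, whichever ref it currently points at (master or an older tag), to the new release tag. The URL shape is taken from the install command quoted in this list. The "vX.Y.Z" tag format is assumed from v2.3.0. The helper names are hypothetical.

```python
import re

RAW_BASE = "https://raw.githubusercontent.com/tmobile/magtape"


def install_url(version: str) -> str:
    """Versioned raw URL for the install manifest."""
    if not re.fullmatch(r"v\d+\.\d+\.\d+", version):
        raise ValueError(f"expected a release tag like v2.3.0, got {version!r}")
    return f"{RAW_BASE}/{version}/deploy/install.yaml"


def pin_readme(text: str, version: str) -> str:
    """Point every install.yaml link in `text` at `version` instead of its current ref."""
    pattern = rf"{re.escape(RAW_BASE)}/[^/]+/deploy/install\.yaml"
    return re.sub(pattern, install_url(version), text)
```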
