observability's Introduction

Observability

A repository to collect all of the initiatives around Observability currently being worked on at Canonical.

A list of all the active repositories maintained by the Observability team can be found using the observability topic.

Want to know more? See the CharmHub topic page on Observability.

GitHub Workflows

This repository holds all of our reusable workflows, in the .github/workflows folder; our other repositories implement their workflows by calling these. We follow two conventions for naming them:

  • workflows starting with _ are “private”, meaning they are used by other workflows and shouldn't be called directly;
  • the name should loosely follow a {scope}-{function}.yaml schema, to make the folder easily searchable.
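The two naming conventions above lend themselves to a quick lint. The sketch below is hypothetical; this repo does not ship it. It assumes a workflow name is an optional leading `_`, a one-word scope, then a hyphenated function, and it returns None for names that don't fit. Because the schema is only followed loosely (issues.yaml, for instance, has no scope), a check like this is better suited to a warning than to a hard failure.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed schema: optional "_" (private), a one-word scope, then a
# hyphenated function name, e.g. "_charm-quality-checks.yaml".
WORKFLOW_NAME = re.compile(
    r"^(?P<private>_)?(?P<scope>[a-z0-9]+)-(?P<function>[a-z0-9]+(?:-[a-z0-9]+)*)\.yaml$"
)


def classify_workflow(path: str) -> Optional[dict]:
    """Split a workflow file name into its naming-convention parts.

    Returns None when the name does not follow {scope}-{function}.yaml.
    """
    name = PurePosixPath(path).name
    match = WORKFLOW_NAME.match(name)
    if match is None:
        return None
    return {
        "private": match.group("private") is not None,
        "scope": match.group("scope"),
        "function": match.group("function"),
    }


print(classify_workflow(".github/workflows/_charm-quality-checks.yaml"))
# {'private': True, 'scope': 'charm', 'function': 'quality-checks'}
```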

Base Workflows

The issues.yaml workflow is used in all of our repositories to propagate GitHub issues to Jira: opening an issue creates the related Jira issue, and closing it closes the Jira issue.

Charm Workflows

On PRs:
charm-pull-request.yaml
└── _charm-quality-checks.yaml
    ├── _charm-codeql-analysis.yaml
    ├── _charm-static-analysis.yaml
    ├── _charm-linting.yaml
    ├── _charm-unit-tests.yaml
    ├── _charm-scenario-tests.yaml
    └── _charm-integration-tests.yaml

On main:
charm-release.yaml
├── _charm-quality-checks.yaml
│   ├── _charm-codeql-analysis.yaml
│   ├── _charm-static-analysis.yaml
│   ├── _charm-linting.yaml
│   ├── _charm-unit-tests.yaml
│   ├── _charm-scenario-tests.yaml
│   └── _charm-integration-tests.yaml
└── _charm-release.yaml

Periodically:
charm-update-libs.yaml

Manually:
charm-promote.yaml
(charm-update-libs.yaml)

Whenever a PR is opened to a charm repository, some quality checks are run:

  • first check that the CHARMHUB_TOKEN secret is set on the repo, as it's needed by other actions;
  • run the Canonical inclusive naming workflow;
  • make sure charm libraries are updated and tag the PR accordingly with "Libraries: OK" or "Libraries: Out of Sync";
  • run linting, analyses and tests to ensure the code quality.
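To make the libraries check concrete: charm libraries declare their version through the LIBAPI and LIBPATCH constants, so "up to date" can be read as "no local library is behind its published version". The sketch below illustrates only that comparison. How the real workflow obtains the published versions (most likely through charmcraft) is not shown, so they are passed in as a plain dict, and the library name `some_lib` is hypothetical.

```python
import re
from typing import Dict, Tuple

Version = Tuple[int, int]  # (LIBAPI, LIBPATCH)


def lib_version(source: str) -> Version:
    """Read the LIBAPI/LIBPATCH constants declared in a charm library's source."""
    api = re.search(r"^LIBAPI\s*=\s*(\d+)", source, re.MULTILINE)
    patch = re.search(r"^LIBPATCH\s*=\s*(\d+)", source, re.MULTILINE)
    if api is None or patch is None:
        raise ValueError("LIBAPI/LIBPATCH not found")
    return int(api.group(1)), int(patch.group(1))


def sync_label(local: Dict[str, str], published: Dict[str, Version]) -> str:
    """Pick the PR label: any local lib older than its published version is out of sync."""
    for name, source in local.items():
        if name in published and lib_version(source) < published[name]:
            return "Libraries: Out of Sync"
    return "Libraries: OK"


local_libs = {"some_lib": "LIBID = 'abc'\nLIBAPI = 0\nLIBPATCH = 3\n"}
print(sync_label(local_libs, {"some_lib": (0, 4)}))  # Libraries: Out of Sync
```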

After a PR is merged, the same quality checks are run on the main branch; when passing, the CI takes care of publishing any bumped charm library and releasing the charm to edge.

Periodically, CI checks whether the charm libraries are up-to-date; if not (i.e., another charm published an updated library), a PR is automatically opened to update them with the new version.

There's also a manual action to promote the charm (e.g., from latest/edge to latest/beta), making the process more user-friendly.

ROCK Workflows

On PRs:
_rock-pull-request.yaml
└── _rock-build-test.yaml

On main:
rock-release-dev.yaml
rock-release-oci-factory.yaml

Periodically:
rock-update.yaml

Manually:
(rock-release-dev.yaml)
(rock-update.yaml)

Our ROCKs are built in oci-factory, which covers:

  • building and publishing the ROCKs to DockerHub;
  • tagging with semantic versions (e.g., prometheus:{major} pointing to the latest prometheus:{major}.{minor}.{patch});
  • periodically rebuilding ROCKs to pick up security fixes.

These workflows make the repositories holding our ROCKs almost fully automated: whenever the upstream project releases a new version, a PR is opened automatically to add a ROCK for that specific version. A workflow then runs a quality check on that PR by building the ROCK locally.
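Detecting "a new upstream version" boils down to a numeric version comparison. This is a hypothetical illustration, not the logic of rock-update.yaml: it assumes upstream releases are plain numeric tags, optionally prefixed with v (pre-releases such as -rc1 would have to be filtered out first), and that existing ROCKs are identified by their version string.

```python
from typing import Iterable, List, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn "v2.47.1" or "2.47.1" into (2, 47, 1) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def new_versions(upstream: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Upstream releases newer than the newest existing ROCK, oldest first."""
    newest = max((parse(v) for v in existing), default=())
    fresh = [v.lstrip("v") for v in upstream if parse(v) > newest]
    return sorted(fresh, key=parse)


print(new_versions(["v2.46.0", "v2.47.0", "v2.47.1"], ["2.46.0"]))
# ['2.47.0', '2.47.1']
```

Comparing tuples of integers rather than strings matters here: as strings, "2.100.0" would sort before "2.47.0".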

When the PR is merged, the ROCK is published to the GitHub Container Registry (GHCR) with a :dev tag. At the same time, a PR is opened to the oci-factory repo for the ROCKs team to approve and merge, triggering the actual build process.

Meta Repo

This repo also contains the manifest (manifest.yaml) for syncing all repositories maintained by the Observability team. The tooling assumes that you want to place all repos in the parent folder of the observability repo. To use it, do the following:

# install the git-metarepo module
$ pip3 install metarepo

# sync the repos using the manifest
$ git meta sync

observability's People

Contributors

abuelodelanada, artivis, barrettj12, beliaev-maksim, bencekov, dstathis, jnsgruk, lucabello, mmkay, nsklikas, rbarry82, sed-i, simskij, skatsaounis


observability's Issues

Wrong tags when adding multiple Rocks to OCI Factory

There's a bug in our CI workflow that opens a PR to OCI Factory.

As you can see in this file (produced by our CI), multiple images are assigned the major 10-22.04 tag, which is obviously wrong (only 10.3.3 should have it, not 10.3.1).

The way we decide what to tag with major.minor and major is based on checking against the list of already-existing tags. However, this means that if we now add 11.0, 11.1, and 11.2 in the same PR, they will all be tagged as 11, which is wrong.

The CI needs to check against the union of the already-existing tags and the tags being added or modified by the PR itself, rather than against the existing tags alone.
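A minimal sketch of that fix, assuming plain major.minor.patch versions (the -22.04 base suffix seen in the real tags is left out) and hypothetical version numbers. The newest version per floating tag is computed over published and new versions together, and only the tags that land on a version added by the PR are returned.

```python
from typing import Dict, Iterable, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn "10.3.3" into (10, 3, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def floating_tags_for_pr(existing: Iterable[str], added: Iterable[str]) -> Dict[str, str]:
    """Floating tags (major and major.minor) that should point at a version added by this PR."""
    added_set = set(added)
    newest: Dict[str, str] = {}
    # Walk the union of published and new versions oldest-first, so each
    # floating tag ends up on the newest version overall.
    for version in sorted(set(existing) | added_set, key=parse):
        major, minor = parse(version)[:2]
        newest[f"{major}"] = version
        newest[f"{major}.{minor}"] = version
    # Tags whose newest version is already published stay where they are.
    return {tag: v for tag, v in newest.items() if v in added_set}


print(floating_tags_for_pr([], ["11.0.0", "11.1.0", "11.2.0"]))
# {'11': '11.2.0', '11.0': '11.0.0', '11.1': '11.1.0', '11.2': '11.2.0'}
```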

Add inclusive naming to CI?

Enhancement Proposal

pytest-operator has this:

jobs:
  call-inclusive-naming-check:
    name: Inclusive naming
    uses: canonical-web-and-design/Inclusive-naming/.github/workflows/woke.yaml@main
    with:
      fail-on-error: "true"

Unpin unnecessary deps for itests

We need to verify whether the following workarounds in the itests CI are still needed and, if not, remove them:

  • Juju agent pinning to 2.9.29
  • Hostpath provisioner pinning to 1.3.0

bootstrap-options: "--agent-version 2.9.29"

- name: Patch hostpath-provisioner
  run: >
    sg microk8s -c "kubectl patch deployment hostpath-provisioner -n kube-system -p '{\"spec\": {\"template\": {\"spec\": {\"containers\": [{\"name\":\"hostpath-provisioner\", \"image\": \"cdkbot/hostpath-provisioner:1.3.0\" }] }}}}'"

Parametrize the `provider` key in `_charm-tests-integration.yaml`

grafana-agent-operator is a machine charm that uses our centralized CI (canonical/grafana-agent-operator/pull/2).
We never ran machine itests in CI when it was part of the hybrid k8s-machine charm repo.

Now, for itests to work, we need to be able to bootstrap an LXD controller.

- name: Setup operator environment
  uses: charmed-kubernetes/actions-operator@main
  with:
    juju-channel: 3.1/edge
    provider: microk8s

Ability to release a rock from different branches on different tags

On a rock repository, we've been duplicating the rock-release-dev.yaml file.

We quickly had the need to release branches on different tags (to test a k8s charm for example).

We modified our flow as follows: https://github.com/ubuntu-robotics/foxglove-studio-rock/pull/12/files

This way, when manually triggered on a branch (branch_A) the flow is publishing the rock with the associated tag (:branch_A).
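When the tag is taken straight from the branch name, names containing a slash (a common convention, e.g. feature/new-ui) are not valid image tags. The following hypothetical helper is not taken from the linked PR; it maps a branch name onto the tag grammar used by Docker/OCI registries, where a tag is one character from [A-Za-z0-9_] followed by up to 127 characters from [A-Za-z0-9_.-].

```python
import re

# Characters outside the OCI/Docker tag alphabet [A-Za-z0-9_.-].
_NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_.-]")


def branch_to_tag(branch: str) -> str:
    """Turn a git branch name into a valid image tag (hypothetical helper)."""
    tag = _NOT_ALLOWED.sub("-", branch)  # e.g. "feature/x" -> "feature-x"
    tag = tag.lstrip(".-")               # the first character must be [A-Za-z0-9_]
    tag = tag[:128]                      # tags are at most 128 characters long
    if not tag:
        raise ValueError(f"cannot derive an image tag from branch {branch!r}")
    return tag


print(branch_to_tag("branch_A"))        # branch_A
print(branch_to_tag("feature/new-ui"))  # feature-new-ui
```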

Would you be interested in integrating a similar flow?

rock-pull-request workflow does nothing

The rock-pull-request workflow does nothing.

The reason is that it relies on the tj-actions/changed-files action to get a list of all modified rockcraft.yaml files, but then reads the wrong output variable from this action. It uses changed_files, which was renamed a while back to type_changed_files and only lists files whose type has changed.
This results in an empty list, which the job iterates over without error, so CI passes while building nothing.
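Failing loudly on an empty input would have surfaced this bug. A minimal sketch, assuming the file list comes from the action's all_changed_files output, which is space-separated by default (so paths containing spaces are not handled):

```python
from pathlib import PurePosixPath
from typing import List


def changed_rockcraft_files(all_changed_files: str) -> List[str]:
    """Return the rockcraft.yaml paths in a space-separated list of changed files."""
    paths = all_changed_files.split()
    if not paths:
        # A pull request almost always changes at least one file, so an empty
        # list most likely means the workflow read an output that does not
        # exist (GitHub Actions expands unknown step outputs to "").
        raise ValueError("no changed files received; check the output name")
    return sorted(p for p in paths if PurePosixPath(p).name == "rockcraft.yaml")
```

A PR that touches only a README still yields an empty result, but a misnamed output now stops the job instead of producing a green run.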

Promotions do not work correctly when using multiple builds-on and runs-on combinations

After running Promote charm from edge to beta, the following happens:

simme@willow:~$ charmcraft status blackbox-exporter-k8s
Track    Base                  Channel    Version    Revision    Resources                                            
latest   ubuntu 20.04 (amd64)  stable     -          -           -                                                    
                               candidate  -          -           -                                                    
                               beta       1          1           blackbox-exporter-image (r1)                         
                               edge       1          1           blackbox-exporter-image (r1)                         
         ubuntu 22.04 (amd64)  stable     -          -           -                                                    
                               candidate  -          -           -                                                    
                               beta       -          -           -                                                    
                               edge       3          3           blackbox-exporter-image (r3)   

The expectation is that both bases get promoted. Additionally, the revisions already differ between the two bases on edge (1 vs 3).

Tracking issue: Mimir distributed deployment

Deployment Graph

Workloads:

  • coordinator: nginx + Grafana agent
  • worker: Mimir
graph LR

subgraph mimir
worker1 ---|mimir-cluster| coordinator+gagent
worker2 ---|mimir-cluster| coordinator+gagent
worker3 ---|mimir-cluster| coordinator+gagent


end

subgraph cos-lite
coordinator+gagent ---|logging| loki
coordinator+gagent ---|remote-write| prometheus
coordinator+gagent ---|dashboards| grafana
coordinator+gagent ---|trace| tempo

end

Alert rules and dashboards belong to the coordinator.

Issues to complete

Mimir Worker

Mimir Coordinator

Others

Releasing libs should depend on a successful release to edge

Currently, on merge, libs are released separately from the release to edge.
This is a problem because, if the release to edge fails, Charmhub ends up with a lib version that is not yet part of the charm itself.

Maybe releasing the libs should be a step in the release-to-edge workflow?


Runners crash when building too many rocks

When a PR is modifying multiple rocks, the rock-pull-request.yaml workflow will try to build all of them. However, without some cleanup we fill up the disk of the runner, which in turn makes CI fail.

We should at least run rockcraft clean (which might be enough if we run it after building each rock), or something along the lines of:

# --force also deletes instances that are still running
for instance in $(lxc list --project=rockcraft --format json | jq -r '.[] | select(.name | startswith("rockcraft-")) | .name'); do
    lxc --project=rockcraft delete --force "$instance"
done
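The first option, running rockcraft clean right after each build, could look like the sketch below. It assumes each rock lives in its own directory (paths such as "10.3.3" are hypothetical) and that rockcraft pack and rockcraft clean are run from that directory; the runner is injectable so the loop can be exercised without rockcraft installed.

```python
import subprocess
from typing import Callable, Sequence


def build_rocks(rock_dirs: Sequence[str], run: Callable = subprocess.run) -> None:
    """Pack every rock, cleaning each build environment before starting the next.

    `run` defaults to subprocess.run and can be swapped for a stub in tests.
    """
    for rock_dir in rock_dirs:
        try:
            run(["rockcraft", "pack"], cwd=rock_dir, check=True)
        finally:
            # Runs even when packing fails, so the build instance does not keep
            # occupying the runner's disk; a failed pack still stops the loop.
            run(["rockcraft", "clean"], cwd=rock_dir, check=True)
```

In a workflow step this would be called as, e.g., build_rocks(["10.3.3", "10.4.0"]).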

Dump logs on itest failure

Inspired by kubeflow, shall we add a log dump to the itests workflow?

    - name: Dump deployments
      if: failure()
      run: kubectl describe deployments -A
    - name: Dump replicasets
      if: failure()
      run: kubectl describe replicasets -A
    - name: Dump pods and their logs
      if: failure()
      shell: bash
      run: |
        juju status --relations --storage
        kubectl get pods -A -o=jsonpath='{range.items[*]}{.metadata.namespace} {.metadata.name}{"\n"}' --sort-by=.metadata.namespace | while read namespace pod; do kubectl -n $namespace describe pod $pod; kubectl -n $namespace logs $pod --all-containers=true --tail=100; done

cc: @lucabello @ca-scribner

Investigate a solution to push messages on critical issues to users

We should implement automation so that we can tag GitHub issues and run automatically post the message by linking the issue; this should happen through multiple mediums like Mattermost, RSS feed, a mailing list, etc.

We should figure out if we want to write a bot to run somewhere, use the GitHub bot, or different things.

CI should fail if a tox env doesn't exist

tox will happily pass for non-existing testenvs:

$ tox -e volkswagen
  volkswagen: OK (0.84 seconds)
  congratulations :) (0.87 seconds)

This is especially misleading when a repo defines unit-k8s and unit-machine instead of unit: tox -e unit still reports success.

We should probably change the CI command to run tox to:

if tox -l | grep -q '^unit$'; then tox -e unit; else echo "Error: 'unit' testenv does not exist" >&2; exit 1; fi
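The same check can be extended to several environments at once (for example unit-k8s and unit-machine). This Python sketch makes the same assumption as the grep: tox -l prints one environment name per line, and only the environments in that listing are seen.

```python
import subprocess
import sys
from typing import Iterable, List


def missing_envs(required: Iterable[str], listed_output: str) -> List[str]:
    """Return the required testenvs absent from a one-name-per-line listing."""
    listed = {line.strip() for line in listed_output.splitlines() if line.strip()}
    return [env for env in required if env not in listed]


def main(required: List[str]) -> int:
    """Exit status for CI: 1 if any required testenv is undefined, else 0."""
    listing = subprocess.run(["tox", "-l"], capture_output=True, text=True, check=True).stdout
    missing = missing_envs(required, listing)
    if missing:
        print(f"Error: testenv(s) not defined: {', '.join(missing)}", file=sys.stderr)
        return 1
    return 0
```

A CI step would call main(["unit-k8s", "unit-machine"]) and pass its return value to sys.exit before running tox -e for each environment.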
