Comments (5)
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5487.
This message was autogenerated
from bundle-kubeflow.
admission-webhook
Code doesn't implement any metrics, although there are some references to them in its go.mod and go.sum files.
argo-controller
- Upstream docs on metrics exposed https://argo-workflows.readthedocs.io/en/latest/metrics
- argo-controller.txt
dex
- Code in upstream implementing metrics https://github.com/dexidp/dex/blob/088339fc287a24de23ffcfe7985287cef2a3b2fa/cmd/dex/serve.go#L26-L30
- dex-auth.txt
envoy
- Exposes metrics (Envoy calls them statistics): https://www.envoyproxy.io/docs/envoy/v1.15.0/configuration/upstream/cluster_manager/cluster_stats https://www.envoyproxy.io/docs/envoy/v1.15.0/configuration/http/http_conn_man/stats
- envoy.txt
istio-gateway
- charm endpoint: https://github.com/canonical/istio-operators/blob/e0d2c0261a3760366b87e7be27100393009967ac/charms/istio-gateway/src/manifest.yaml#L39-L40
- Upstream docs on metrics (it's an envoy): https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/observability/statistics.html https://istio.io/latest/docs/ops/configuration/telemetry/envoy-stats/
- istio-gateway.txt
istio-pilot
jupyter-controller
- Provides default golang/prometheus and custom metrics
- jupyter-controller.txt
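To separate the default golang/prometheus metrics from the custom ones in a dump like jupyter-controller.txt, something like the sketch below could work. It assumes the file is in the Prometheus text exposition format and that "default" means names starting with go_, process_ or promhttp_. The sample input and the example_notebook_create_total name are made up for illustration and are not taken from the attached file.

```python
# Prefixes assumed to mark the default Go/Prometheus client metrics.
DEFAULT_PREFIXES = ("go_", "process_", "promhttp_")


def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names found in a text dump."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return sorted(names)


def split_default_and_custom(exposition_text):
    """Split metric names into (default, custom) lists by prefix."""
    names = metric_names(exposition_text)
    default = [n for n in names if n.startswith(DEFAULT_PREFIXES)]
    custom = [n for n in names if not n.startswith(DEFAULT_PREFIXES)]
    return default, custom


# Hypothetical sample, not copied from any of the attached .txt files.
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
process_cpu_seconds_total 1.5
example_notebook_create_total{namespace="demo"} 3
"""

default, custom = split_default_and_custom(SAMPLE)
print("default:", default)
print("custom:", custom)
```

Metrics that controller-runtime adds (for example the controller_runtime_ and workqueue_ families) would land in the custom list unless their prefixes are added to DEFAULT_PREFIXES.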
jupyter-ui
Code doesn't implement any metrics.
katib-controller
- Provides default golang/prometheus and custom metrics.
- katib-controller.txt
katib-db-manager
Code doesn't implement any prometheus metrics.
katib-ui
Code doesn't implement any metrics.
kfp-api
- Upstream endpoint https://github.com/kubeflow/pipelines/blob/5399585b6a0f92446bcfc5a7588f2a85ea0fe6a3/backend/src/apiserver/main.go#L208
- custom metrics in code
- kfp-api.txt
Pipeline steps don't expose metrics by default. Feature requests:
kfp-metadata-writer
Code doesn't implement anything related to metrics https://github.com/kubeflow/pipelines/tree/master/backend/metadata_writer
kfp-persistence
Code doesn't implement any metrics. The only reference to metrics is about "metrics" provided from the application for exposing artifacts in the UI.
kfp-profile-controller
Code doesn't implement any metrics.
kfp-schedwf
Code doesn't implement any prometheus metrics.
kfp-ui
Code doesn't implement any prometheus metrics.
kfp-viewer
Code doesn't implement any prometheus metrics.
kfp-viz
Code doesn't implement any prometheus metrics.
knative-eventing & knative-serving
- They do expose metrics
- However, they do so through one shared exporter (so we can't use it for querying the `up` metric of those charms)
- knative-eventing-serving.txt
knative-operator
- Exposes metrics through this endpoint (but charm doesn't configure its service right now)
- knative-operator.txt
kserve-controller
- Workload exposes metrics at endpoint (uses `/metrics` by default), although our charm doesn't patch the service to expose this endpoint.
- Implements some kind of default k8s metrics.
- kserve-controller.txt
Side note: if we'd like ISVCs to expose metrics too (via their own endpoints - docs), we should modify the values in the configmap according to this guide.
kubeflow-dashboard
- Implements metrics, but they are only accessible when the deployment runs on GCE. When deploying CKF, this is what the dashboard logs:
2024-04-02T08:03:21.120Z [serve] > [email protected] serve
2024-04-02T08:03:21.120Z [serve] > node dist/server.js
2024-04-02T08:03:21.120Z [serve]
2024-04-02T08:03:23.555Z [serve] Initializing Kubernetes configuration
2024-04-02T08:03:23.611Z [serve] Unable to fetch Application information: 404 page not found
2024-04-02T08:03:23.611Z [serve]
2024-04-02T08:03:23.637Z [serve] "other" is not a supported platform for Metrics
2024-04-02T08:03:23.638Z [serve] Using Profiles service at http://kubeflow-profiles.kubeflow:8081/kfam
2024-04-02T08:03:23.645Z [serve] Server listening on port http://localhost:8082 (in production mode)
- Curling the pod's endpoint `<pod-ip>:8082/api/metrics` returns `{"error":"Operation not supported"}`
Thus, metrics are not available from upstream in our case.
kubeflow-profiles
kfam
- implements metrics at endpoint
- golang/prometheus and custom metrics
- kfam.txt
profiles
- implements metrics at endpoint
- golang/prometheus and custom metrics
- kubeflow-profiles.txt
kubeflow-roles
There isn't an upstream app for this charm.
kubeflow-volumes
Code doesn't implement any metrics.
metacontroller
- Implements golang/prometheus plus some custom metrics at endpoint (already exposed by charm)
- metacontroller-operator.txt
minio
mlmd
Code doesn't implement any metrics.
oidc-gatekeeper
There are some references to prometheus packages in its go.mod and go.sum files, but nothing is implemented in its code.
pvcviewer-operator
- Code defines a metrics endpoint and also has the corresponding manifests (auth endpoint, serviceMonitor)
- However, I couldn't access those although the charm exposes the port
This is probably due to this note in upstream manifests. We should probably remove this from our charm command (although I haven't tried this, so it needs verifying). It could also be that we have not included the Prometheus manifests in our case.
k exec envoy-operator-0 -n kubeflow -- curl pvcviewer-operator.kubeflow.svc.cluster.local:8443/metrics
Client sent an HTTP request to an HTTPS server.
- Thus, no text file containing exposed metrics here.
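The "Client sent an HTTP request to an HTTPS server" reply suggests the port expects TLS, so a retry over https might get further. Below is a rough sketch of such a probe, written under assumptions that still need verifying: that the endpoint serves TLS with a certificate we can't validate (hence skipping verification, like `curl -k`), and that it may want a bearer token if it sits behind auth. The function names and the token handling are hypothetical and not taken from the charm or upstream.

```python
import ssl
import urllib.request


def build_probe_request(host, port, path="/metrics", token=None):
    """Build an HTTPS GET request for a metrics endpoint."""
    request = urllib.request.Request(f"https://{host}:{port}{path}")
    if token:
        # Assumption: the endpoint accepts a bearer token when behind auth.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def insecure_context():
    """TLS context that skips certificate verification, like `curl -k`."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def probe(host, port, token=None, timeout=5):
    """Fetch the endpoint and return (status, body).

    Needs network access to the service, so run it from inside the cluster.
    urlopen raises urllib.error.HTTPError for 4xx/5xx answers such as 401.
    """
    request = build_probe_request(host, port, token=token)
    with urllib.request.urlopen(
        request, timeout=timeout, context=insecure_context()
    ) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


request = build_probe_request(
    "pvcviewer-operator.kubeflow.svc.cluster.local", 8443
)
print(request.full_url)
```

Only `probe` touches the network. If it raises an HTTPError with 401 or 403, that would point at the auth setup rather than at missing Prometheus manifests.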
seldon-controller-manager
- Implements metrics.
- Upstream docs on metrics and here
- seldon-core.txt
tensorboard-controller
- Implements metrics and defines an endpoint in manifests. Like pvcviewer-operator, it also has references to putting metrics behind auth, so we should be careful when implementing it (it looks like it's not behind auth right now)
- tensorboard-controller.txt
tensorboard-web-app
Code doesn't implement any metrics.
training-operator
- Metrics are exposed in endpoint
- Golang/prometheus and custom metrics are implemented. Custom metrics are also described here.
- (Not 100% relevant, but) Jobs implement some custom metrics too (references 1, 2 and 3)
- training-operator.txt
Upstream apps that do not already expose metrics (wip)
- admission-webhook
- jupyter-ui (jupyter-web-app)
- katib-db-manager
- katib-ui
- kfp-metadata-writer
- kfp-persistence
- kfp-profile-controller
- kfp-schedwf
- kfp-ui
- kfp-viewer
- kfp-viz
- kubeflow-dashboard (explanation in previous comment)
- kubeflow-roles. This isn't an upstream app, but we'd still need an exporter if we'd like metrics from this charm.
- kubeflow-volumes (volumes-web-app)
- mlmd
- oidc-authservice
- tensorboards-web-app
Regarding all the K8s controllers from kubeflow/kubeflow (notebooks, profiles, tensorboards): they will get some quite useful metrics by default because of the controller-runtime Go package that comes with Kubebuilder: https://book.kubebuilder.io/reference/metrics-reference
Those are perfect for capturing whether the controllers are working as expected, and it's great that this will be handled by default.
For this to happen, though, someone upstream will need to bump the controller-runtime package from 0.11 to 0.16.3.
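To see which controllers still need that bump, one option is to read the controller-runtime version out of each go.mod. The sketch below assumes the dependency is listed under the module path sigs.k8s.io/controller-runtime with a plain vX.Y.Z version, and uses 0.16.3 as the threshold from the comment above. The sample go.mod, including its module name and the exact v0.11.0 patch version, is hypothetical.

```python
import re

MODULE = "sigs.k8s.io/controller-runtime"

# Matches both `require <module> vX.Y.Z` and entries inside a require block.
_PATTERN = re.compile(
    r"^\s*(?:require\s+)?" + re.escape(MODULE) + r"\s+v(\d+)\.(\d+)\.(\d+)",
    re.MULTILINE,
)


def controller_runtime_version(go_mod_text):
    """Return the required version as a tuple of ints, or None if absent."""
    match = _PATTERN.search(go_mod_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def needs_bump(go_mod_text, minimum=(0, 16, 3)):
    """True when controller-runtime is required at a version below `minimum`."""
    version = controller_runtime_version(go_mod_text)
    return version is not None and version < minimum


# Hypothetical go.mod content, for illustration only.
SAMPLE_GO_MOD = """\
module example.com/hypothetical-controller

go 1.17

require (
	k8s.io/api v0.23.0
	sigs.k8s.io/controller-runtime v0.11.0
)
"""

print(controller_runtime_version(SAMPLE_GO_MOD), needs_bump(SAMPLE_GO_MOD))
```

Pre-release or pseudo-versions are compared on their X.Y.Z part only, which is good enough for a quick audit but not a full semver check.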