temporal-operator's Introduction

temporal-operator

The Kubernetes Operator to deploy and manage Temporal clusters.

Using this operator, deploying a Temporal Cluster on Kubernetes is as easy as deploying the following manifest:

apiVersion: temporal.io/v1beta1
kind: TemporalCluster
metadata:
  name: prod
  namespace: demo
spec:
  version: 1.23.0
  numHistoryShards: 1
  persistence:
    defaultStore:
      sql:
        user: temporal
        pluginName: postgres
        databaseName: temporal
        connectAddr: postgres.demo.svc.cluster.local:5432
        connectProtocol: tcp
      passwordSecretRef:
        name: postgres-password
        key: PASSWORD
    visibilityStore:
      sql:
        user: temporal
        pluginName: postgres
        databaseName: temporal_visibility
        connectAddr: postgres.demo.svc.cluster.local:5432
        connectProtocol: tcp
      passwordSecretRef:
        name: postgres-password
        key: PASSWORD

Documentation

The documentation is available at: https://temporal-operator.pages.dev/.

Quick start

To start using the operator and deploy your first cluster in a matter of minutes, follow the documentation's getting started guide.

Examples

Some examples are available to help you get started.

Compatibility matrix

The following table shows operator compatibility with Temporal and Kubernetes. Note that this table only reports end-to-end test suite coverage; other versions may work.

Temporal Operator   Temporal             Kubernetes
v0.18.x             v1.19.x to v1.23.x   v1.25 to v1.29
v0.17.x             v1.18.x to v1.22.x   v1.25 to v1.29
v0.16.x             v1.18.x to v1.22.x   v1.24 to v1.27
v0.15.x             v1.18.x to v1.21.x   v1.24 to v1.27
v0.14.x             v1.18.x to v1.21.x   v1.24 to v1.27
v0.13.x             v1.18.x to v1.20.x   v1.24 to v1.27
v0.12.x             v1.18.x to v1.20.x   v1.23 to v1.26
v0.11.x             v1.17.x to v1.19.x   v1.23 to v1.26
v0.10.x             v1.17.x to v1.19.x   v1.23 to v1.26
v0.9.x              v1.16.x to v1.18.x   v1.22 to v1.25

Roadmap

Features

  • Deploy a new temporal cluster.
  • Ability to deploy multiple clusters.
  • Support for SQL datastores.
  • Deploy Web UI.
  • Deploy admin tools.
  • Support for Elasticsearch.
  • Support for Cassandra datastore.
  • Automatic mTLS certificates management (using cert-manager).
  • Support for integration in meshes: istio & linkerd.
  • Namespace management using CRDs.
  • Cluster version upgrades.
  • Cluster monitoring.
  • Complete end2end test suite.
  • Archival.
  • Auto scaling.
  • Multi cluster replication.

Contributing

Feel free to contribute to the project! All issues and PRs are welcome. To start hacking on the project, follow the local development documentation page.

License

Temporal Operator is licensed under Apache License Version 2.0. See LICENSE for more information.

temporal-operator's People

Contributors

alexandrevilain, aoao54, brianirish, debuggerpk, demch1k, dependabot[bot], dmytro-orlenko, dodgecamaro, ed-marks, ganievs, jashandeep-sohi, ktenzer, mcombspangea, michaelcombs28, yujunz

temporal-operator's Issues

CSV release needs to update containerImage and replaces

containerImage should be the same as image:
containerImage: ghcr.io/alexandrevilain/temporal-operator:v0.10.0

replaces should be the previous version of the operator; for 0.10.0 we replace 0.9.1:
replaces: temporal-operator.v0.9.1

Add feature to provide performance profiles

This likely won't be an exact science as performance can vary drastically depending on workload. However there are likely some patterns we can extrapolate and some knobs / dials that would work well with those assumptions.

We have a sliding multiplier (we have a base and it can be increased by 1-5x for example). Ideally we would do STNs (state transitions per second) but I don't think this would be possible as there is too much variance.

We would tune things like shards and other parameters as well as how many replicas of each type (frontend, history, matching, worker) and of course their resource limits (CPU/Mem).

We could also deploy our benchmark tool maru: https://github.com/temporalio/maru as a way to allow users to verify easily.

For the spec I was thinking something like

...
spec:
  performance:
    profile: base  # one of: base, 1x, 2x, 3x, 4x, 5x
... 

UI Completes / CrashLoopBackOff

UI is running temporalio/ui:2.5.0

The process logs are:

2023/01/04 22:44:03 Loading config; env=docker,configDir=config
2023/01/04 22:44:03 Loading config files=[config/docker.yaml]

After that, it exits with code 0 and completes.

I tried upgrading to UI 2.9.0; the logs change, but it crashes:

2023/01/04 22:46:20 Loading config; env=docker,configDir=config
2023/01/04 22:46:20 Loading config files=[config/docker.yaml]
config file corrupted: yaml: unmarshal errors:
  line 2: cannot unmarshal !!str `tcp://1...` into int

Add mutating and validating admission webhooks

For now, the operator has no defaulting or validating webhooks.
Default field values are set before the first reconciliation starts, and validation also happens during reconciliation.
As a result, an invalid manifest can be applied to the cluster successfully and only fail later, during reconciliation.

We should improve UX using mutating and validating admission webhooks.
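As a rough illustration of the split between defaulting and validation, here is a minimal, stdlib-only Go sketch. The ClusterSpec type is a hypothetical, trimmed-down stand-in for the real spec, and both the default of one history shard and the validation rules are assumptions made for this example, not the operator's actual behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterSpec is a hypothetical, trimmed-down stand-in for the TemporalCluster spec.
type ClusterSpec struct {
	Version          string
	NumHistoryShards int32
}

// Default fills unset fields, which is what a mutating (defaulting) webhook would do.
// Defaulting to 1 shard is an assumption made for this sketch.
func (s *ClusterSpec) Default() {
	if s.NumHistoryShards == 0 {
		s.NumHistoryShards = 1
	}
}

// Validate rejects specs that could never reconcile, which is what a
// validating webhook would do at apply time.
func (s *ClusterSpec) Validate() error {
	if s.Version == "" {
		return errors.New("spec.version is required")
	}
	if s.NumHistoryShards < 1 {
		return fmt.Errorf("spec.numHistoryShards must be >= 1, got %d", s.NumHistoryShards)
	}
	return nil
}

func main() {
	spec := ClusterSpec{Version: "1.23.0"}
	spec.Default()
	if err := spec.Validate(); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("accepted with", spec.NumHistoryShards, "history shard(s)")
}
```

With webhooks in place, a rejected spec would surface as an error at apply time rather than as a failed reconciliation later.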

Improve how schema migrations are applied

The current implementation is pretty naive.
It should be improved by:

  • determining from the desired temporal version, the desired schema version. (Maybe we will have to maintain a compatibility matrix in this repository).
  • applying the schema update if the current schema version doesn't match the desired schema version
  • otherwise do nothing.

I think it will help reduce database connections done by the operator and be a first step toward supporting version upgrades.
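The decision described in the list above could be sketched as follows. This is only an illustration: the desiredSchema map and the schema version numbers in it are placeholders, not the real compatibility matrix, and planMigration is a hypothetical helper name.

```go
package main

import "fmt"

// desiredSchema maps a Temporal minor version to the schema version it expects.
// The values are placeholders for illustration, not the real matrix.
var desiredSchema = map[string]string{
	"1.22": "1.10",
	"1.23": "1.11",
}

// planMigration returns the target schema version and whether an update is needed.
// It returns an error when the Temporal version is not in the matrix.
func planMigration(temporalMinor, currentSchema string) (string, bool, error) {
	want, ok := desiredSchema[temporalMinor]
	if !ok {
		return "", false, fmt.Errorf("no schema version known for Temporal %s", temporalMinor)
	}
	if want == currentSchema {
		// Already up to date: do nothing and open no database connection.
		return want, false, nil
	}
	return want, true, nil
}

func main() {
	target, migrate, err := planMigration("1.23", "1.10")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("target:", target, "migrate:", migrate)
}
```

Keeping the matrix as plain data would make the "otherwise do nothing" branch cheap, since the operator would only need to read the current schema version.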

Create operator bundle and add to Operatorhub

I would like to propose adding this operator to OperatorHub by creating the necessary manifests and building a bundle.

I have created a sample CSV here: https://gist.githubusercontent.com/ktenzer/74874e5d902d60115654c31135928314/raw/eb0dfdfbf43e4c4469bb0374d09f7cd4aa69631f/temporal-operator.clusterserviceversion.yaml

The CSV can be previewed here: https://operatorhub.io/preview

Things to do:

  • I would still need to add sample spec for the namespace and client CRDs
  • Add to CI/CD build process to generate bundle after building operator
  • Replace current CRD endpoints .apps.alexandrevilain.dev with .temporal.io
  • Consider replacing k8s v1alpha1 with just v1 for Operator since the channel provides way to distinguish alpha from stable, etc which is more elegant
  • Decide how to do versioning
  • Decide what channels to provide (probably just alpha for now)

Jobs should be cleaned up and not left around

The Job spec provides a TTL field, .spec.ttlSecondsAfterFinished; once it elapses, the finished Job is deleted.

Make .spec.ttlSecondsAfterFinished configurable, and also make the jobs' owner reference depend on the controller so they aren't re-created when deleted by the controller.

Elasticsearch not working after configured

After deploying Elasticsearch and connecting to the admintools pod, I ran the following command:
$ tctl adm cluster gsa

This should return the default Elasticsearch attributes but instead returns a bad request (400) error.

Namespace creation through CRD isn't working

As of the latest merge, namespace creation no longer works:

Status:
  Conditions:
    Last Transition Time:  2022-09-29T00:11:18Z
    Message:               can't create "default" namespace: context deadline exceeded
    Observed Generation:   1
    Reason:                LastReconcileCycleFailed
    Status:                True
    Type:                  ReconcileError
Events:                    <none>

I tested from the admin pod, and namespace creation works there:

# tctl namespace re
Namespace default successfully registered.

Also, we might want to reconsider the CRD names. I have been playing with them, and because Kubernetes has its own Namespace object, you need to reference ours as temporal.io.namespace. That might be confusing.

I would propose reconsidering TemporalNamespace, TemporalCluster and TemporalClusterClient. WDYT?

Reconcile errors (cosmetic)

These are mostly cosmetic errors caused by objects changing while they are being updated, but it would be nice to avoid them if possible.

1.6638699037493746e+09	ERROR	Can't reconcile resources	{"controller": "temporalcluster", "controllerGroup": "apps.alexandrevilain.dev", "controllerKind": "TemporalCluster", "temporalCluster": {"name":"prod","namespace":"temporal"}, "namespace": "temporal", "name": "prod", "reconcileID": "e2f53a27-5f93-40bd-aa5d-2c184441bf0e", "error": "Operation cannot be fulfilled on deployments.apps \"prod-admintools\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
1.6638699042728388e+09	ERROR	Can't reconcile resources	{"controller": "temporalcluster", "controllerGroup": "apps.alexandrevilain.dev", "controllerKind": "TemporalCluster", "temporalCluster": {"name":"prod","namespace":"temporal"}, "namespace": "temporal", "name": "prod", "reconcileID": "eec5e0a6-54a5-44ca-97af-aa0735399024", "error": "Operation cannot be fulfilled on deployments.apps \"prod-history\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
1.6638699042732098e+09	DEBUG	events	Warning	{"object": {"kind":"TemporalCluster","namespace":"temporal","name":"prod","uid":"51555dfc-2690-4fe0-94e3-3bfe589f3443","apiVersion":"apps.alexandrevilain.dev/v1alpha1","resourceVersion":"30391000"}, "reason": "ProcessingError", "message": "Operation cannot be fulfilled on deployments.apps \"prod-history\": the object has been modified; please apply your changes to the latest version and try again"}
1.6638699043738122e+09	ERROR	Reconciler error	{"controller": "temporalcluster", "controllerGroup": "apps.alexandrevilain.dev", "controllerKind": "TemporalCluster", "temporalCluster": {"name":"prod","namespace":"temporal"}, "namespace": "temporal", "name": "prod", "reconcileID": "eec5e0a6-54a5-44ca-97af-aa0735399024", "error": "Operation cannot be fulfilled on temporalclusters.apps.alexandrevilain.dev \"prod\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/ktenzer/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234

Custom annotations for pods

Perhaps there is a general operator pattern that I am missing, but I'm looking to add some custom annotations to the pods so that I can tell the Datadog Agent to scrape metrics. Is there a way I can pass through these annotations from the CRD? If this is the right way to go about this, but doesn't exist yet, would you take a PR for this?

Thanks for all the work on this operator! I'm pretty new to operators and have been doing lots of reading, so I'm enjoying poking around this one!

Add support for external credentials providers like Hashicorp Vault

Users can manage database secrets from sources other than Kubernetes Secrets, such as Vault.
Most external credentials providers deliver secrets to pods using a sidecar and a volume, writing the secret to a file.
The operator should provide a way for users to use credentials coming from files.

The main issue I see is that the operator itself then can't connect to the database to set up the schema and keep it up to date across version upgrades.

Some strategies can be adopted:

  • If the user wants to use an external credentials provider, the operator does not reconcile persistence schemas. But if users have to run database migrations themselves, the operator is less useful ...
  • If the user uses an external credentials provider, they should provide credentials for jobs, and the operator spawns jobs that run the migrations. This seems hacky, and it forces the operator to maintain two ways of running DB migrations.
  • Maybe someone has another idea ..

Going towards 1.0.0

This is the main issue listing the tasks needed to stabilize the operator and release it as 1.0.0.

Improve API

Improve unit test coverage

  • #295
  • Unit tests for pkg/temporal

Improve e2e test coverage

Improve build process

Improve reconciler logic

Improve documentation

Feel free to propose new tasks so I can add them to this issue. Don't hesitate to create issues for tasks too.

SIGSEGV on reconciling temporal worker process

Hi,

I am getting the following error when reconciling a Temporal worker process resource.

1.6692205965847738e+09  INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "temporalworkerprocess", "controllerGroup": "temporal.io", "controllerKind": "TemporalWorkerProcess", "TemporalWorkerProcess": {"name":"test","namespace":"dev-stage"}, "namespace": "dev-stage", "name": "test", "reconcileID": "49490e44-9277-4f1d-972c-5c2288ffb791"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1c0a7e6]

Please see full log and my manifests in the attachment.

Thanks,
Premek
temporallogs.tar.gz

Error with WorkerProcess

My YAML looks like this:

apiVersion: temporal.io/v1beta1
kind: TemporalWorkerProcess
metadata:
  name: test
spec:
  version: latest
  replicas: 3
  image: ghcr.io/rawkodeacademy/studio-workflows-youtube
  pullPolicy: "Always"
  clusterRef:
    name: temporal
    namespace: studio
  temporalNamespace: default

My expectation is that this image will be run with enough environment variables to configure the Temporal client for the cluster.

However, we get the following error:

2023-01-04T17:59:18Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "temporalworkerprocess", "controllerGroup": "temporal.io", "controllerKind": "TemporalWorkerProcess", "TemporalWorkerProcess": {"name":"test","namespace":"studio"}, "namespace": "studio", "name": "test", "reconcileID": "f9cbef8c-5b9e-4f6d-b0a2-2c1a72cebf9b"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1cf66c9]

goroutine 448 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x1fa
panic({0x1ef1b00, 0x381ba30})
        /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/alexandrevilain/temporal-operator/pkg/resource/workerprocess.(*DeploymentBuilder).Update(0xc00075e690, {0x26fbfc0?, 0xc00082c500?})
        /workspace/pkg/resource/workerprocess/workerprocess_deployment_builder.go:71 +0x409
github.com/alexandrevilain/temporal-operator/pkg/reconciler.(*Base).ReconcileBuilders.func1()
        /workspace/pkg/reconciler/base.go:133 +0x2d
sigs.k8s.io/controller-runtime/pkg/controller/controllerutil.mutate(0x26c2600?, {{0xc000a3acf0?, 0x0?}, {0xc0005a11c0?, 0x26e2a38?}}, {0x26fbfc0, 0xc00082c500})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:339 +0x4f
sigs.k8s.io/controller-runtime/pkg/controller/controllerutil.CreateOrUpdate({0x26e2a38, 0xc000faf920}, {0x26ed068, 0xc0008a73e0}, {0x26fbfc0?, 0xc00082c500}, 0x0?)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:201 +0x13f
github.com/alexandrevilain/temporal-operator/pkg/reconciler.(*Base).ReconcileBuilders(0xc000ccb4d0, {0x26e2a38, 0xc000faf920}, {0x26c5db8, 0xc000c9e000}, {0xc0001ba9f0, 0x1, 0x2?})

The nil reference appears to be accessing builder.buildAttempt, which confused me. So I updated the YAML with:

builder:
  enabled: false

However, it still panics, only at a slightly different location:

2023-01-04T18:00:28Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "temporalworkerprocess", "controllerGroup": "temporal.io", "controllerKind": "TemporalWorkerProcess", "TemporalWorkerProcess": {"name":"test","namespace":"studio"}, "namespace": "studio", "name": "test", "reconcileID": "8b0ad98b-32d0-4558-8916-bf2a8e7b93a4"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1d07ef2]

goroutine 423 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x1fa
panic({0x1ef1b00, 0x381ba30})
        /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/alexandrevilain/temporal-operator/controllers.(*TemporalWorkerProcessReconciler).reconcileDefaults(0x0?, {0x0?, 0x0?}, 0xc000582700)
        /workspace/controllers/temporalworkerprocess_controller.go:228 +0x92
github.com/alexandrevilain/temporal-operator/controllers.(*TemporalWorkerProcessReconciler).Reconcile(0xc000bb9770, {0x26e2a38, 0xc000c97050}, {{{0xc000bd6244?, 0x10?}, {0xc000bd6240?, 0x40da87?}}})
        /workspace/controllers/temporalworkerprocess_controller.go:79 +0x1bb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x26e2a38?, {0x26e2a38?, 0xc000c97050?}, {{{0xc000bd6244?, 0x1e4dbe0?}, {0xc000bd6240?, 0x0?}}})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:122 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000b76be0, {0x26e2990, 0xc000b50ac0}, {0x1f85960?, 0xc0007de4c0?})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:323 +0x38f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000b76be0, {0x26e2990, 0xc000b50ac0})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2

Integrate with Temporal Cloud

In addition to working with OSS Temporal clusters, the operator should also work with Temporal Cloud.

TemporalWorkerProcess

  • Provide the connection string to access Temporal Cloud (namespace.accountId.tmprl.cloud:7233) via an environment variable
  • Consume mTLS certificates inside the container image (TemporalClusterClient should provide the cert, so we just need to consume it and ensure it works with the builder process)

TemporalClusterClients

  • Support clusters without a clusterRef, since Temporal Cloud isn't running on Kubernetes locally
  • Update the certificate in Temporal Cloud when cert rotation occurs?

TemporalNamespaces

  • Create namespace in Temporal Cloud using tcld
  • Add certificate (public)
  • Set retention
  • Set or update any custom search attributes

@alexandrevilain I am not sure how to deal with cert-manager and cert rotation. Once we create the namespace and upload the cert to Temporal Cloud, how could we update the cert if it gets rotated by cert-manager? Would we need to listen for cert-manager API events and trigger an update that way? Can you add any implementation details I am missing?

v1beta1 API and GroupVersionKind update

This issue opens the discussion about API changes before going to v1.0.0.

API

The current state of the operator's API:

  • apps.alexandrevilain.dev/v1alpha1.TemporalCluster
  • apps.alexandrevilain.dev/v1alpha1.TemporalClusterClient
  • apps.alexandrevilain.dev/v1alpha1.TemporalNamespace

I find the Temporal prefix on each CRD redundant.
Maybe we can rename them to:

  • temporal.alexandrevilain.dev/v1beta1.Cluster
  • temporal.alexandrevilain.dev/v1beta1.Client
  • temporal.alexandrevilain.dev/v1beta1.Namespace

The v1beta1 API version can be the starting point to declare the API stable and stop making breaking changes to it.

Webhooks

The v1beta1 can also be the first API with defaulting and validating webhooks.

As webhooks need certificates, two solutions are available:

  • Asking the user to install the operator alongside cert-manager
  • Using cert-controller

Any thoughts?

I would be happy to get any feedback about this idea. 🙂

Support Advanced Visibility with Elasticsearch

TODO:

  • Support Elasticsearch in CRD (in DatastoreSpec)
  • Create Elasticsearch indexes when reconciling persistence
  • Report Elasticsearch schema version in status
  • Add an example to deploy Elasticsearch and use it in a Temporal cluster

Improve controller logging

For now, the controller logs some actions but does not provide logs that help the platform administrator understand the operator's behavior.
The persistence manager uses Temporal's default CLI logger; we should have a log adapter to keep logs consistent across the whole operator code base.

Add new CRD TemporalWorker to manage worker fleets

It would be great to have a CRD endpoint to manage workers. The temporal-operator could then be used not only by those self-hosting but also to help scale and manage workers which would likely be controlled by another group (developers instead of platform engineers).

Initial ideas:

  • CR should control worker farm for a given app/workflow
  • CR should allow setting resource limits
  • CR should support worker mtls and handling cert bootstrap via ClusterClient CR
  • CR should allow image to be selected
  • CR should perform upgrade strategy if the image is changed
  • CR should have clear naming convention and differentiate with internal worker(s) deployed by temporal server CR

Nice to have:

  • CR to support autoscaling using server metrics to detect worker starvation
  • CR to support code injection through mount or maybe sidecar
  • CR to support SDK metrics metering
  • CR to support CI through tekton or something else

Add Prometheus endpoint to config for server metrics scraping

Under the global params of the config map, we should add a binding for the Prometheus endpoint.

...
  metrics:
    tags:
      type: {{ .Env.SERVICES }}
    prometheus:
      timerType: histogram
      listenAddress: "0.0.0.0:9090"

Once configured, each service can be scraped:
curl http://temporaltest-frontend-headless:9090/metrics
curl http://temporaltest-history-headless:9090/metrics
curl http://temporaltest-matching-headless:9090/metrics
curl http://temporaltest-worker-headless:9090/metrics

Add support for automatic cluster upgrades

This is mostly done, but a few things are needed before declaring it ready:

  • add an e2e test suite for each supported upgrade path
  • support Elasticsearch schema migration (v1.17 introduces Elasticsearch schema v2)

Add support for mysql8 and postgres12 sql driver names

Support for Temporal v1.20.x

Temporal v1.20.0 has been released.
This release provides many new features, such as Advanced Visibility for SQL databases and the internal Frontend.
This PR lists the work needed to fully support the new v1.20.x features:

Support creating default namespace using the CRD

When we spin up a Temporal cluster, we have to create namespaces using tctl or with custom code using the SDK.
It would be a great feature if the operator could create them.
For instance:

apiVersion: apps.alexandrevilain.dev/v1alpha1
kind: TemporalCluster
metadata:
  name: prod
  namespace: demo
spec:
  version: 1.17.0
  numHistoryShards: 1
  [...]
  namespaces:
    - default
    - accounting
