oam-dev / rudr
A Kubernetes implementation of the Open Application Model specification
Home Page: https://oam.dev
License: MIT License
The Service Mesh Interface (SMI) gives us a vendor-neutral interface, so infrastructure admins have the flexibility to choose the service mesh that works for them. We will start by integrating SMI, providing Linkerd and Consul Connect as the first implementations. From there we hope to integrate more service mesh implementations as the community sees fit.
With a service mesh integration, app operators will be able to use traffic routing rules, for example to split traffic between versions of an application. These bits of functionality will be exposed as traits per component.
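To make the traffic-splitting idea concrete, here is a minimal sketch of the core decision such a trait delegates to the mesh: pick a version according to configured weights. The weight pairs and the routing-by-bucket scheme are illustrative assumptions, not SMI's actual API.

```rust
/// Route a request (identified by a hash bucket) to the version whose
/// cumulative weight range contains it. Assumes a non-empty weight list
/// with a positive total.
fn route<'a>(weights: &[(&'a str, u32)], bucket: u32) -> &'a str {
    let total: u32 = weights.iter().map(|&(_, w)| w).sum();
    let mut b = bucket % total;
    for &(version, w) in weights {
        if b < w {
            return version;
        }
        b -= w;
    }
    unreachable!("bucket is always within the total weight range")
}

fn main() {
    // A 90/10 split between two versions of a component.
    let weights = [("v1", 90), ("v2", 10)];
    assert_eq!(route(&weights, 0), "v1");
    assert_eq!(route(&weights, 89), "v1");
    assert_eq!(route(&weights, 95), "v2");
    println!("bucket 95 -> {}", route(&weights, 95));
}
```

In a real integration the mesh itself does the routing; the trait's job is only to translate component weights into the mesh's traffic-split resource.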
I am experimenting with a ReplicatedService and specify a containerPort in the ComponentSchematic.
The Kubernetes Service is created and its targetPort matches the above, but the Service port itself is always 80. Is there any way to control this, or at least have it match the containerPort?
We need to add instanceName support to the ops config for a component.
Integrate the Kubernetes Horizontal Pod Autoscaler into Hydra to allow users to scale their applications on CPU and memory metrics. Each 'component' in a developer's config that is of type replicated service will be able to use this autoscaling trait.
We should also create a path forward for users who want to scale on custom metrics, but the actual implementation can come after we release.
Depending on the PersistentVolumes (PVs) paved in the cluster, a user can make PersistentVolumeClaims (PVCs) to specify volumes as a trait for their components.
The infrastructure admin must have paved that kind of PV for static provisioning, or they must have created and configured a storage class for dynamic provisioning.
At one point we were going to change the Hydra spec to call ComponentSchematic simply Component, and Scylla followed the proposed naming change. However, the change was never formally added to the spec, so we need to rename Component back to ComponentSchematic.
I was able to successfully install Scylla in AKS, but after scaling the cluster, none of the component/configuration installations succeeded. Because of that, I deleted the deployment using Helm and tried to install it again, but I'm stuck with the error below. I had to manually remove the CRDs, as they were not deleted as part of helm delete scylla. Thanks for your help.
helm install --name scylla ./charts/scylla --wait
Error: validation failed: unable to recognize "": no matches for kind "Trait" in version "core.hydra.io/v1alpha1"
kubectl get trait
error: the server doesn't have a resource type "trait"
It should be possible to make a directory (say, _cache) that can be mounted rw during docker build. That way, we won't incur the global rebuild of all of Rust's libraries each time we do make docker-build.
After deploying with Helm the pod is never able to start. I can see the following information in the logs:
thread 'main' panicked at 'component reflector cannot be created: Error(Status(403), "https://10.0.0.1/apis/core.hydra.io/v1alpha1/namespaces/default/components?")', src/libcore/result.rs:999:5
Which refers to line 44 in main.rs.
let component_cache: Reflector<Component, Status> =
Reflector::new(client.clone(), component_resource.clone().into())
.expect("component reflector cannot be created");
Are we lacking some permissions?
Right now, Scylla runs as a non-server watcher in a pod. To make it work as a proper Kubernetes controller, though, we need a health check in main.rs.
Here's an example: https://github.com/clux/controller-rs/blob/master/src/main.rs#L34-L48
Option B is a huge pain, but Option A is probably pretty straightforward.
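As a rough illustration of what the liveness/readiness endpoint needs to do, here is a dependency-free sketch using only the standard library. A real controller would serve this with an HTTP framework (e.g. the actix-web setup in the linked example); the `/health` path and JSON body are assumptions.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Build the HTTP response a kubelet probe expects: 200 OK plus a small body.
fn health_response() -> String {
    let body = "{\"status\":\"ok\"}";
    format!(
        "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
    let addr = listener.local_addr().unwrap();

    // Simulate a kubelet probe hitting /health once.
    let probe = thread::spawn(move || {
        let mut stream = TcpStream::connect(addr).unwrap();
        stream.write_all(b"GET /health HTTP/1.1\r\n\r\n").unwrap();
        let mut resp = String::new();
        stream.read_to_string(&mut resp).unwrap();
        resp
    });

    // Serve exactly one request, then exit (a real controller loops forever).
    let (mut conn, _) = listener.accept().unwrap();
    let mut req = [0u8; 512];
    let _ = conn.read(&mut req).unwrap();
    conn.write_all(health_response().as_bytes()).unwrap();
    drop(conn);

    let resp = probe.join().unwrap();
    assert!(resp.starts_with("HTTP/1.1 200"));
    println!("probe got: {}", resp.lines().next().unwrap());
}
```

The key point is that the endpoint must respond independently of the watch loop, so a wedged reflector shows up as a failed probe rather than a silently stuck pod.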
So some traits are pre-boot, while others are post-boot. And still others may be both.
Examples:
There could even be a more fine-grained set of policies than pre- and post-boot.
So I'm thinking that we internally implement a trait lifecycle system where any given trait can respond to several lifecycle hooks. Since this is not something that the spec folks generally agree on, this will be a Scylla-specific implementation for now.
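One possible shape for such a lifecycle system is sketched below. All names here (`LifecyclePhase`, `TraitImpl`, `run_phase`) are hypothetical stand-ins, not part of the Scylla codebase; the point is only that each trait declares the phases it responds to and the dispatcher filters on that.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum LifecyclePhase {
    PreBoot,
    PostBoot,
}

trait TraitImpl {
    fn name(&self) -> &str;
    /// Phases this trait wants to respond to (a trait may list both).
    fn phases(&self) -> Vec<LifecyclePhase>;
    fn on_phase(&self, phase: LifecyclePhase);
}

/// An ingress only makes sense once the workload is running, so post-boot.
struct Ingress;

impl TraitImpl for Ingress {
    fn name(&self) -> &str { "ingress" }
    fn phases(&self) -> Vec<LifecyclePhase> { vec![LifecyclePhase::PostBoot] }
    fn on_phase(&self, phase: LifecyclePhase) {
        println!("{}: handling {:?}", self.name(), phase);
    }
}

/// Dispatch a phase to every trait that registered for it.
fn run_phase(traits: &[Box<dyn TraitImpl>], phase: LifecyclePhase) {
    for t in traits.iter().filter(|t| t.phases().contains(&phase)) {
        t.on_phase(phase);
    }
}

fn main() {
    let traits: Vec<Box<dyn TraitImpl>> = vec![Box::new(Ingress)];
    run_phase(&traits, LifecyclePhase::PreBoot);  // ingress is skipped here
    run_phase(&traits, LifecyclePhase::PostBoot); // ingress runs here
}
```

A finer-grained policy set would just mean more variants on the phase enum, without changing the dispatch model.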
We recently discussed adding a new CRD that is not in the Hydra spec. This type would be ComponentInstance and would track each instance of a component.
Relationally, an OpsConfig is the parent of N ComponentInstances, each of which is the parent of the component's workload type implementation. So an ops config with a Singleton component and a Task component would have two component instances, one of which would have the owner reference for the Deployment/Pod of the Singleton, and the other would have the owner reference for the Job.
Currently, the only utility this buys us is allowing the user to easily discover the hierarchy of the ops config.
In this model, it is not possible to create a configuration that contains components deployed in different namespaces, but it may be possible to create configurations in multiple namespaces that contain components that can communicate with each other.
This may have limitations when scopes are introduced, since we will have to decide whether scopes are global or are namespaced.
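The ownership chain described above can be modeled roughly as follows. This is an illustrative sketch: the flat `Resource` record and its `owner` field are stand-ins for Kubernetes `ownerReferences`, and the names are hypothetical.

```rust
#[derive(Debug)]
struct Resource {
    kind: &'static str,
    name: String,
    /// Name of the owning resource (stand-in for a Kubernetes ownerReference).
    owner: Option<String>,
}

/// All resources owned directly by `owner`.
fn children<'a>(all: &'a [Resource], owner: &str) -> Vec<&'a Resource> {
    all.iter().filter(|r| r.owner.as_deref() == Some(owner)).collect()
}

/// An ops config with a Singleton component and a Task component, per the
/// example in the text: two ComponentInstances, each owning its workload.
fn sample() -> Vec<Resource> {
    vec![
        Resource { kind: "OpsConfig", name: "first-app".into(), owner: None },
        Resource { kind: "ComponentInstance", name: "first-app-nginx".into(), owner: Some("first-app".into()) },
        Resource { kind: "ComponentInstance", name: "first-app-task".into(), owner: Some("first-app".into()) },
        Resource { kind: "Deployment", name: "nginx".into(), owner: Some("first-app-nginx".into()) },
        Resource { kind: "Job", name: "task".into(), owner: Some("first-app-task".into()) },
    ]
}

fn main() {
    let resources = sample();
    // Walking down from the ops config recovers the whole hierarchy.
    for inst in children(&resources, "first-app") {
        println!("{} {} owns {} resource(s)", inst.kind, inst.name, children(&resources, &inst.name).len());
    }
}
```

Because garbage collection follows owner references, deleting the ops config would cascade down this tree automatically.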
For post-processing of operational configurations, there are a few places where the syntax [fromVariable(NAME)] should result in substitution of a variable name with its value. This is described here: https://github.com/microsoft/hydra-spec/blob/master/6.operational_configuration.md#properties
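A minimal sketch of the substitution pass might look like this. The function name and the behavior on an unknown variable (leave the token in place) are assumptions; the spec section linked above is the authority on the exact semantics.

```rust
use std::collections::HashMap;

/// Replace every [fromVariable(NAME)] occurrence in `value` with the value of
/// NAME from `vars`. Unknown names and unterminated tokens are left verbatim.
fn resolve_from_variable(value: &str, vars: &HashMap<String, String>) -> String {
    const OPEN: &str = "[fromVariable(";
    const CLOSE: &str = ")]";
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        let after = &rest[start + OPEN.len()..];
        match after.find(CLOSE) {
            Some(end) => {
                let name = &after[..end];
                match vars.get(name) {
                    Some(v) => out.push_str(v),
                    // Unknown variable: keep the whole token untouched.
                    None => out.push_str(&rest[start..start + OPEN.len() + end + CLOSE.len()]),
                }
                rest = &after[end + CLOSE.len()..];
            }
            None => {
                // No closing ")]": not a valid token, emit the rest as-is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("message".to_string(), "world".to_string());
    let resolved = resolve_from_variable("hello [fromVariable(message)]", &vars);
    assert_eq!(resolved, "hello world");
    println!("{}", resolved);
}
```

The same pass can be applied to any string-valued property in the operational configuration before the component workloads are rendered.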
Currently the Azure Pipelines CI results cannot be viewed without being granted some sort of access. It would be great to have some way to share results.
Our team has discussed and reached consensus on the following work.
This serves as the umbrella issue for tracking the goals and work to refactor Scylla.
The goals we want to achieve:
Here is a list of sub-tasks or features that we want to implement:
"rust to go" is the features that are current implemented in Rust and needs migration to Go.
"W" - Workload. "T" - Trait.
Hydra API | Implementation | Status |
---|---|---|
ReplicatedService (W) | k8s Deployment | rust to go |
ReplicatedTask (W) | k8s Job | rust to go |
Singleton (W) | k8s Pod+Service | rust to go |
Task (W) | k8s Job | rust to go |
AutoScaler (T) | k8s HPA | rust to go |
Ingress (T) | k8s Ingress | rust to go |
ManualScaler (T) | k8s Deployment replicas, Job parallelism | rust to go |
StatefulService (W) | k8s StatefulSet | TODO |
DaemonService (W) | k8s DaemonSet | TODO |
CronTask (W) | k8s CronJob | TODO |
Monitoring (T) | Prometheus | TODO |
Rollout (T) | k8s Deployment, StatefulSet | TODO |
I think it is possible to use owner references to link configurations to component instances, and component instances to all their resources.
The deployment defines health probes, which currently fail after a deployment. The current YAML has a typo, listing port: http instead of port: 8080. The deployment also lists a containerPort of 80, and looking at the main.rs code it looks like it should be 8080. I tried updating the containerPort to 8080 and both probe ports to 8080, but the health checks still failed.
Warning Unhealthy <invalid> (x6 over <invalid>) kubelet, aks-nodepool1-28966993-0 Liveness probe failed: Get http://10.244.0.19:8080/health: dial tcp 10.244.0.19:8080: connect: connection refused
Warning Unhealthy <invalid> (x94 over <invalid>) kubelet, aks-nodepool1-28966993-0 Readiness probe failed: Get http://10.244.0.19:8080/health: dial tcp 10.244.0.19:8080: connect: connection refused
I removed the health checks from the deployment to get past it for now.
➜ scylla git:(fix404) RUST_LOG="scylla=debug" RUST_BACKTRACE=short cargo run
Compiling scylla v0.1.0 (/Users/sunjianbo/codebox/scylla)
Finished dev [unoptimized + debuginfo] target(s) in 16.13s
Running `target/debug/scylla`
Error: Error { inner: stack backtrace:
0: failure::backtrace::internal::InternalBacktrace::new::h4b8e3b3717dcd6bb (0x10cf0f65f)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/backtrace/internal.rs:44
1: failure::backtrace::Backtrace::new::h47f946e0254b7690 (0x10cf0f3ee)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/backtrace/mod.rs:111
2: failure::context::Context<D>::new::hcc4c441593440f30 (0x10c68d7b1)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/context.rs:84
3: <kube::Error as core::convert::From<kube::ErrorKind>>::from::he90041a04b79ba96 (0x10c6852f9)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-0.12.0/src/lib.rs:80
4: kube::client::APIClient::request::hf3374454a4b9b630 (0x10c2d5068)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-0.12.0/src/client/mod.rs:97
5: kube::api::reflector::Reflector<K>::get_full_resource_entries::he9ecd491374f6bf8 (0x10c21a7ba)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-0.12.0/src/api/reflector.rs:165
6: kube::api::reflector::Reflector<K>::init::h4dd0003b630f00b6 (0x10c21ae47)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-0.12.0/src/api/reflector.rs:116
7: scylla::main::hd307ca40df2bb7c5 (0x10c2bc8fc)
at /Users/sunjianbo/codebox/scylla/src/main.rs:43
8: std::rt::lang_start::{{closure}}::h5d5a11653c904935 (0x10c2cad61)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libstd/rt.rs:64
9: std::panicking::try::do_call::h8037d9f03e27d896 (0x10d136047)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libstd/panicking.rs:293
10: ___rust_maybe_catch_panic (0x10d13a52e)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libpanic_unwind/lib.rs:85
11: std::rt::lang_start_internal::hc8e69e673740d4ae (0x10d136b2d)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libstd/rt.rs:48
12: std::rt::lang_start::h6524883d1453e01f (0x10c2cad41)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libstd/rt.rs:64
13: _main (0x10c2bd6e1)
ApiError https://192.168.99.101:8443/apis/core.hydra.io/v1alpha1/namespaces/default/componentschematics?: Client Error: 404 Not Found ("Error(Status(404), \"https://192.168.99.101:8443/apis/core.hydra.io/v1alpha1/namespaces/default/componentschematics?\")") }
Just delete all componentschematics and cargo run scylla.
This may be a tricky one, because Kubernetes distinguishes between Job and CronJob... so it sorta changes the workload type.
The worker type recently added to the spec is not implemented.
The organization account can authorize our accounts to cooperate and publish some test images.
I have created an account, along with a scylla repo: hydraoss/scylla
If you agree, I can help add all Scylla maintainers as collaborators on this Docker image repo.
Implement imagePullSecret as defined here: https://github.com/microsoft/hydra-spec/pull/37
It should be backed by Kubernetes secrets.
A recent PR against the spec added cmd and args fields to the component's container section. We need to add that support.
I'm attempting to implement a ReplicatedService with some basic environment variables to allow for connecting to CosmosDB and AppInsights.
I tried using parameters and also fromVariable, but it doesn't work so far. The variables end up in the pod spec, but they are blank.
I am hoping to define the variables in the ComponentSchematic and then set them in the OperationalConfiguration.
Right now, all "log" messages are just println! calls. We need to use a real logging facility.
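In practice the idiomatic choice would be the log facade with a backend such as env_logger (which the repro output above already hints at via RUST_LOG). As a dependency-free illustration of what leveled, environment-filtered logging buys us over bare println!, here is a sketch; the Level enum and filter logic are stand-ins, not the log crate's API.

```rust
use std::env;

// Variant order defines severity: Error is most severe, Debug least.
#[derive(Debug, PartialEq, PartialOrd)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
}

/// Mimic RUST_LOG-style filtering (e.g. RUST_LOG=debug shows everything).
fn max_level() -> Level {
    match env::var("RUST_LOG").as_deref() {
        Ok("debug") => Level::Debug,
        Ok("info") => Level::Info,
        Ok("warn") => Level::Warn,
        _ => Level::Error,
    }
}

/// Emit to stderr only when the message's level passes the filter.
fn log(level: Level, msg: &str) {
    if level <= max_level() {
        eprintln!("[{:?}] {}", level, msg);
    }
}

fn main() {
    log(Level::Error, "component reflector cannot be created");
    log(Level::Debug, "only shown when RUST_LOG=debug");
}
```

With the real log crate, call sites become error!/warn!/info!/debug! macros and the filtering/formatting lives entirely in the chosen backend.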
The first cut of the Autoscaler trait only supports CPU metrics. Of course, there are tons of others supported by Kubernetes. We need to support those as well.
The specification defines extensible workload types, but we support neither such types nor the ability to provide workload settings.
The specification defines several scopes. None of them are provided by the current implementation.
Consider attempting to apply changes to a configuration after updating a referenced component:
c:>kubectl apply -f scylla-component.yaml
component.core.hydra.io/scyllacomponent created
c:>kubectl apply -f scylla-configuration.yaml
configuration.core.hydra.io/scyllaconfiguration created
c:>kubectl apply -f scylla-component.yaml
component.core.hydra.io/scyllacomponent configured
c:>kubectl apply -f scylla-configuration.yaml
configuration.core.hydra.io/scyllaconfiguration unchanged
In the fourth kubectl call I expected Scylla to notice that the component referenced by the configuration had been modified and consequently to apply changes to the existing component instances. It, however, did not do that because I did not modify scylla-configuration.yaml itself.
Repro steps:
Create a default AKS cluster with no RBAC, Virtual Nodes, Basic Networking, and no Diagnostics
Pull credentials into kubeconfig
Install helm3 from: https://github.com/helm/helm/releases/tag/v3.0.0-alpha.2 and set it in path
Run kubectl create -f k8s/crds.yaml
Run helm install scylla ./charts/scylla
Result: Error: apiVersion "core.hydra.io/v1alpha1" in scylla/templates/traits.yaml is not available
Expected Result:
NAME: scylla
LAST DEPLOYED: 2019-08-08 09:00:07.754179 -0600 MDT m=+0.710068733
NAMESPACE: default
STATUS: deployed
NOTES:
Scylla is a Kubernetes controller to manage Configuration CRDs.
It has been successfully installed.
When I was developing with Scylla, I found there was always an error reporting ErrorMessage { msg: "Modify on first-app-nginx: ApiError NotFound (\"componentinstances.core.hydra.io \\\"first-app-nginx\\\" not found\")" }.
Finally, I figured out that the object was not successfully created the first time I created the Configuration YAML. Then every time I modified the YAML, a modify event was triggered that tried to get the first-app-nginx componentinstances object; this always fails, as our instigator modify-event handler doesn't handle this case.
Controllers written in Go also don't distinguish between create/modify/delete events; all cases are handled in the reconcile function.
I upgraded to Helm 3 (I think I did it right?) but now I'm getting an error when I try to install the charts.
Error: failed pre-install: unable to decode "": no kind "CustomResourceDefinition" is registered for version "apiextensions.k8s.io/v1beta1" in scheme "k8s.io/client-go/kubernetes/scheme/register.go:65"
Rias-MacBook-Pro:scylla riabhatia$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-20T04:49:16Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.8", GitCommit:"a89f8c11a5f4f132503edbc4918c98518fd504e3", GitTreeState:"clean", BuildDate:"2019-04-23T04:41:47Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Rias-MacBook-Pro:scylla riabhatia$ helm version
version.BuildInfo{Version:"v3.0.0-alpha.2", GitCommit:"97e7461e41455e58d89b4d7d192fed5352001d44", GitTreeState:"clean", GoVersion:"go1.12.7"}
My understanding of an ops config is that it represents a release of a set of component instances at a particular point in time. Under the covers, it effectively just provisions the set of component instances. That leaves the ops config object as something purely for auditing purposes. But that only works if there is history, and there's no history associated with individual Kubernetes objects. For it to work here, you'd have to rename the ops config object each time you deploy it, and then query the set of ops configs ordered by creation time to get a history.
What's the plan here?
When a singleton pod is being deleted, the following message shows up in the logs:
Error processing event: Error(Status(404), "https://hydra-dev-hydra-dev-5a2d89-c7de8a1f.hcp.westus2.azmk8s.io/api/v1/namespaces/default/pods/first-app?")
Looking at the pod list, it appears that the name of the pod should be: first-app-nginx-singleton
The replicaCount field on an ops config was recently added.
Hopefully, we should be able to just drop in a subset of the schema from the spec. Reference: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#specifying-a-structural-schema
Rollout is defined as rolling out or upgrading an existing application to a new version, and might involve other manual/automated processes like verification, rolling delay, etc.
Rollout is an important feature that we want to add as a trait on a Component. We could begin by using the rollout knobs already in the k8s Deployment/StatefulSet, and further implement advanced strategies like canary rollouts.
In the spec, workload types are being renamed to worker, task, and service, along with singletonWorker, singletonTask, and singletonService.
This change needs to be reflected in the code.
Right now, parameters are not correctly injected into the components. We need to add support for this.
Right now, traits cannot accept parameters.
Currently, the Task workload type is not implemented.
Currently, we have ReplicatedService, which provides good support for stateless workloads and is based on the k8s Deployment. For our use case, we want Scylla to support existing stateful workloads that already run on k8s StatefulSets. Adding this StatefulService workload abstraction would be necessary for more adoption, and could further benefit the community.
When deleting a configuration with an ingress, the following error is sent to the log:
Error deleting trait for ingress: https://hydra-dev-hydra-dev-5a2d89-c7de8a1f.hcp.westus2.azmk8s.io/apis/extensions/v1beta1/namespaces/default/ingresses/first-app?: Client Error: 404 Not Found
This is because the actual name of the ingress is first-app-nginx-singleton
A monitoring trait is an important feature but is missing from the current implementation. It would be great to define a monitoring trait and have it implemented in Scylla. We could begin with some very standard metrics exposed via Prometheus rules, e.g. CPU/memory/IO metrics and application healthiness.
The kube crate has undergone a major refactor, and version 0.10 is now out. We need to update.
After #11 gets merged, add secrets to ReplicatedService and Singleton implementations
If you load a malformed componentschematic into Kubernetes, starting Scylla will fail with an empty error message. The Reflector seems to be the source of the failure. We can probably capture this and log a useful error.
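One way to capture this is to replace the bare .expect(...) on the reflector result with a match that substitutes a descriptive message when the error's Display output is empty. The sketch below uses a plain String as a stand-in for the kube error type, and the helper name is hypothetical.

```rust
/// Turn a reflector error into a message that is never empty, so a
/// malformed componentschematic doesn't panic with a blank string.
fn describe_reflector_error(e: &str) -> String {
    if e.is_empty() {
        "component reflector failed with an empty error (malformed componentschematic?)".to_string()
    } else {
        format!("component reflector cannot be created: {}", e)
    }
}

fn main() {
    // Stand-in for the empty error produced by a malformed schematic.
    let err = String::new();
    println!("{}", describe_reflector_error(&err));

    // A normal API error keeps its detail.
    println!("{}", describe_reflector_error("Status(403)"));
}
```

At the call site, Reflector::new(...) would be matched instead of .expect()ed, logging this message and exiting cleanly rather than panicking.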