oam-dev / rudr
A Kubernetes implementation of the Open Application Model specification
Home Page: https://oam.dev
License: MIT License
The Service Mesh Interface (SMI) gives us a vendor-neutral interface, so infrastructure admins have the flexibility to choose the service mesh that works for them. We will start by integrating SMI, providing Linkerd and Consul Connect as the first implementations. From there we hope to integrate more service mesh implementations as the community sees fit.
With a service mesh integration, app operators will be able to use traffic routing rules, for example to split traffic between versions of an application. These bits of functionality will be exposed as traits per component.
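To make the traffic-splitting idea concrete, here is a minimal sketch of the core decision such a trait delegates to the mesh: pick a version according to configured weights. The weight pairs and the routing-by-bucket scheme are illustrative assumptions, not SMI's actual API.

```rust
/// Route a request (identified by a hash bucket) to the version whose
/// cumulative weight range contains it. Assumes a non-empty weight list
/// with a positive total.
fn route<'a>(weights: &[(&'a str, u32)], bucket: u32) -> &'a str {
    let total: u32 = weights.iter().map(|&(_, w)| w).sum();
    let mut b = bucket % total;
    for &(version, w) in weights {
        if b < w {
            return version;
        }
        b -= w;
    }
    unreachable!("bucket is always within the total weight range")
}

fn main() {
    // A 90/10 split between two versions of a component.
    let weights = [("v1", 90), ("v2", 10)];
    assert_eq!(route(&weights, 0), "v1");
    assert_eq!(route(&weights, 89), "v1");
    assert_eq!(route(&weights, 95), "v2");
    println!("bucket 95 -> {}", route(&weights, 95));
}
```

In a real integration the mesh itself does the routing; the trait's job is only to translate component weights into the mesh's traffic-split resource.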
I am experimenting with a ReplicatedService and specify a containerPort in the ComponentSchematic.
The Kubernetes Service is created and its targetPort matches the above, but the Service port itself is always 80. Is there any way to control this, or at least have it match the containerPort?
We need to add instanceName support to the ops config for a component.
Integrate the Kubernetes Horizontal Pod Autoscaler into Hydra to allow users to scale their applications on CPU and memory metrics. Each 'component' in a developer's config that is of type replicated service will be able to use this autoscaling trait.
We should also create a path forward for users who want to scale on custom metrics, but the actual implementation can come after we release.
Depending on the PersistentVolumes (PVs) paved in the cluster, a user can make PersistentVolumeClaims (PVCs) to specify volumes as a trait for their components.
The infrastructure admin must have paved that kind of PV for static provisioning, or they must have created and configured a storage class for dynamic provisioning.
At one point we were going to change the Hydra spec to call ComponentSchematic simply Component, and Scylla followed the proposed naming change. However, the change was never formally added to the spec, so we need to rename Component back to ComponentSchematic.
I was able to successfully install Scylla in AKS, but after scaling the cluster, none of the component/configuration installations succeeded. Because of that, I deleted the deployment using Helm and tried to install it again, but I'm stuck with the error below. I had to manually remove the CRDs, as they were not deleted as part of helm delete scylla. Thanks for your help.
helm install --name scylla ./charts/scylla --wait
Error: validation failed: unable to recognize "": no matches for kind "Trait" in version "core.hydra.io/v1alpha1"
kubectl get trait
error: the server doesn't have a resource type "trait"
It should be possible to make a directory (say, _cache) that can be mounted rw during docker build. That way, we won't incur the global rebuild of all of Rust's libraries each time we do make docker-build.
After deploying with Helm the pod is never able to start. I can see the following information in the logs:
thread 'main' panicked at 'component reflector cannot be created: Error(Status(403), "https://10.0.0.1/apis/core.hydra.io/v1alpha1/namespaces/default/components?")', src/libcore/result.rs:999:5
Which refers to line 44 in main.rs.
let component_cache: Reflector<Component, Status> =
Reflector::new(client.clone(), component_resource.clone().into())
.expect("component reflector cannot be created");
Are we lacking some permissions?
Right now, Scylla runs as a non-server watcher in a pod. To make it work as a proper Kubernetes controller, though, we need a health check in main.rs.
Here's an example: https://github.com/clux/controller-rs/blob/master/src/main.rs#L34-L48
Option B is a huge pain, but Option A is probably pretty straightforward.
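As a rough illustration of what the liveness/readiness endpoint needs to do, here is a dependency-free sketch using only the standard library. A real controller would serve this with an HTTP framework (e.g. the actix-web setup in the linked example); the `/health` path and JSON body are assumptions.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Build the HTTP response a kubelet probe expects: 200 OK plus a small body.
fn health_response() -> String {
    let body = "{\"status\":\"ok\"}";
    format!(
        "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
    let addr = listener.local_addr().unwrap();

    // Simulate a kubelet probe hitting /health once.
    let probe = thread::spawn(move || {
        let mut stream = TcpStream::connect(addr).unwrap();
        stream.write_all(b"GET /health HTTP/1.1\r\n\r\n").unwrap();
        let mut resp = String::new();
        stream.read_to_string(&mut resp).unwrap();
        resp
    });

    // Serve exactly one request, then exit (a real controller loops forever).
    let (mut conn, _) = listener.accept().unwrap();
    let mut req = [0u8; 512];
    let _ = conn.read(&mut req).unwrap();
    conn.write_all(health_response().as_bytes()).unwrap();
    drop(conn);

    let resp = probe.join().unwrap();
    assert!(resp.starts_with("HTTP/1.1 200"));
    println!("probe got: {}", resp.lines().next().unwrap());
}
```

The key point is that the endpoint must respond independently of the watch loop, so a wedged reflector shows up as a failed probe rather than a silently stuck pod.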
So some traits are pre-boot, while others are post-boot. And still others may be both.
Examples:
There could even be a more fine-grained set of policies than pre- and post-boot.
So I'm thinking that we internally implement a trait lifecycle system where any given trait can respond to several lifecycle hooks. Since this is not something that the spec folks generally agree on, this will be a Scylla-specific implementation for now.
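One possible shape for such a lifecycle system is sketched below. All names here (`LifecyclePhase`, `TraitImpl`, `run_phase`) are hypothetical stand-ins, not part of the Scylla codebase; the point is only that each trait declares the phases it responds to and the dispatcher filters on that.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum LifecyclePhase {
    PreBoot,
    PostBoot,
}

trait TraitImpl {
    fn name(&self) -> &str;
    /// Phases this trait wants to respond to (a trait may list both).
    fn phases(&self) -> Vec<LifecyclePhase>;
    fn on_phase(&self, phase: LifecyclePhase);
}

/// An ingress only makes sense once the workload is running, so post-boot.
struct Ingress;

impl TraitImpl for Ingress {
    fn name(&self) -> &str { "ingress" }
    fn phases(&self) -> Vec<LifecyclePhase> { vec![LifecyclePhase::PostBoot] }
    fn on_phase(&self, phase: LifecyclePhase) {
        println!("{}: handling {:?}", self.name(), phase);
    }
}

/// Dispatch a phase to every trait that registered for it.
fn run_phase(traits: &[Box<dyn TraitImpl>], phase: LifecyclePhase) {
    for t in traits.iter().filter(|t| t.phases().contains(&phase)) {
        t.on_phase(phase);
    }
}

fn main() {
    let traits: Vec<Box<dyn TraitImpl>> = vec![Box::new(Ingress)];
    run_phase(&traits, LifecyclePhase::PreBoot);  // ingress is skipped here
    run_phase(&traits, LifecyclePhase::PostBoot); // ingress runs here
}
```

A finer-grained policy set would just mean more variants on the phase enum, without changing the dispatch model.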
We recently discussed adding a new CRD that is not in the Hydra spec. This type would be ComponentInstance and would track each instance of a component.
Relationally, an OpsConfig is the parent of N ComponentInstances, each of which is the parent of the component's workload type implementation. So an ops config with a Singleton component and a Task component would have two component instances, one of which would have the owner reference for the Deployment/Pod of the Singleton, and the other would have the owner reference for the Job.
Currently, the only utility this buys us is allowing the user to easily discover the hierarchy of the ops config.
In this model, it is not possible to create a configuration that contains components deployed in different namespaces, but it may be possible to create configurations in multiple namespaces that contain components that can communicate with each other.
This may have limitations when scopes are introduced, since we will have to decide whether scopes are global or are namespaced.
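The ownership chain described above can be modeled roughly as follows. This is an illustrative sketch: the flat `Resource` record and its `owner` field are stand-ins for Kubernetes `ownerReferences`, and the names are hypothetical.

```rust
#[derive(Debug)]
struct Resource {
    kind: &'static str,
    name: String,
    /// Name of the owning resource (stand-in for a Kubernetes ownerReference).
    owner: Option<String>,
}

/// All resources owned directly by `owner`.
fn children<'a>(all: &'a [Resource], owner: &str) -> Vec<&'a Resource> {
    all.iter().filter(|r| r.owner.as_deref() == Some(owner)).collect()
}

/// An ops config with a Singleton component and a Task component, per the
/// example in the text: two ComponentInstances, each owning its workload.
fn sample() -> Vec<Resource> {
    vec![
        Resource { kind: "OpsConfig", name: "first-app".into(), owner: None },
        Resource { kind: "ComponentInstance", name: "first-app-nginx".into(), owner: Some("first-app".into()) },
        Resource { kind: "ComponentInstance", name: "first-app-task".into(), owner: Some("first-app".into()) },
        Resource { kind: "Deployment", name: "nginx".into(), owner: Some("first-app-nginx".into()) },
        Resource { kind: "Job", name: "task".into(), owner: Some("first-app-task".into()) },
    ]
}

fn main() {
    let resources = sample();
    // Walking down from the ops config recovers the whole hierarchy.
    for inst in children(&resources, "first-app") {
        println!("{} {} owns {} resource(s)", inst.kind, inst.name, children(&resources, &inst.name).len());
    }
}
```

Because garbage collection follows owner references, deleting the ops config would cascade down this tree automatically.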
For post-processing of operational configurations, there are a few places where the syntax [fromVariable(NAME)] should result in substitution of a variable name with its value. This is described here: https://github.com/microsoft/hydra-spec/blob/master/6.operational_configuration.md#properties
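A minimal sketch of the substitution pass might look like this. The function name and the behavior on an unknown variable (leave the token in place) are assumptions; the spec section linked above is the authority on the exact semantics.

```rust
use std::collections::HashMap;

/// Replace every [fromVariable(NAME)] occurrence in `value` with the value of
/// NAME from `vars`. Unknown names and unterminated tokens are left verbatim.
fn resolve_from_variable(value: &str, vars: &HashMap<String, String>) -> String {
    const OPEN: &str = "[fromVariable(";
    const CLOSE: &str = ")]";
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        let after = &rest[start + OPEN.len()..];
        match after.find(CLOSE) {
            Some(end) => {
                let name = &after[..end];
                match vars.get(name) {
                    Some(v) => out.push_str(v),
                    // Unknown variable: keep the whole token untouched.
                    None => out.push_str(&rest[start..start + OPEN.len() + end + CLOSE.len()]),
                }
                rest = &after[end + CLOSE.len()..];
            }
            None => {
                // No closing ")]": not a valid token, emit the rest as-is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("message".to_string(), "world".to_string());
    let resolved = resolve_from_variable("hello [fromVariable(message)]", &vars);
    assert_eq!(resolved, "hello world");
    println!("{}", resolved);
}
```

The same pass can be applied to any string-valued property in the operational configuration before the component workloads are rendered.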
Currently the Azure Pipelines CI results cannot be viewed without being granted some sort of access. It would be great to have some way to share results.
Our team has discussed and reached consensus on the following work.
This serves as the umbrella issue for tracking the goals and work to refactor Scylla.
The goals we want to achieve:
Here is a list of sub-tasks or features that we want to implement:
"rust to go" is the features that are current implemented in Rust and needs migration to Go.
"W" - Workload. "T" - Trait.
Hydra API | Implementation | Status |
---|---|---|
ReplicatedService (W) | k8s Deployment | rust to go |
ReplicatedTask (W) | k8s Job | rust to go |
Singleton (W) | k8s Pod+Service | rust to go |
Task (W) | k8s Job | rust to go |
AutoScaler (T) | k8s HPA | rust to go |
Ingress (T) | k8s Ingress | rust to go |
ManualScaler (T) | k8s Deployment replicas, Job parallelism | rust to go |
StatefulService (W) | k8s StatefulSet | TODO |
DaemonService (W) | k8s DaemonSet | TODO |
CronTask (W) | k8s CronJob | TODO |
Monitoring (T) | Prometheus | TODO |
Rollout (T) | k8s Deployment, StatefulSet | TODO |
I think it is possible to use owner references to link configurations to component instances, and component instances to all their resources.
The deployment defines health probes, which currently fail after a deployment. The current YAML has a typo, listing port: http instead of port: 8080. The deployment also lists a containerPort of 80, and looking at the main.rs code it looks like it should be 8080. I tried updating the containerPort to 8080 and both probe ports to 8080, but the health checks still failed.
Warning Unhealthy <invalid> (x6 over <invalid>) kubelet, aks-nodepool1-28966993-0 Liveness probe failed: Get http://10.244.0.19:8080/health: dial tcp 10.244.0.19:8080: connect: connection refused
Warning Unhealthy <invalid> (x94 over <invalid>) kubelet, aks-nodepool1-28966993-0 Readiness probe failed: Get http://10.244.0.19:8080/health: dial tcp 10.244.0.19:8080: connect: connection refused
I removed the health checks from the deployment to get past it for now.
➜ scylla git:(fix404) RUST_LOG="scylla=debug" RUST_BACKTRACE=short cargo run
Compiling scylla v0.1.0 (/Users/sunjianbo/codebox/scylla)
Finished dev [unoptimized + debuginfo] target(s) in 16.13s
Running `target/debug/scylla`
Error: Error { inner: stack backtrace:
0: failure::backtrace::internal::InternalBacktrace::new::h4b8e3b3717dcd6bb (0x10cf0f65f)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/backtrace/internal.rs:44
1: failure::backtrace::Backtrace::new::h47f946e0254b7690 (0x10cf0f3ee)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/backtrace/mod.rs:111
2: failure::context::Context<D>::new::hcc4c441593440f30 (0x10c68d7b1)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/context.rs:84
3: <kube::Error as core::convert::From<kube::ErrorKind>>::from::he90041a04b79ba96 (0x10c6852f9)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-0.12.0/src/lib.rs:80
4: kube::client::APIClient::request::hf3374454a4b9b630 (0x10c2d5068)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-0.12.0/src/client/mod.rs:97
5: kube::api::reflector::Reflector<K>::get_full_resource_entries::he9ecd491374f6bf8 (0x10c21a7ba)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-0.12.0/src/api/reflector.rs:165
6: kube::api::reflector::Reflector<K>::init::h4dd0003b630f00b6 (0x10c21ae47)
at /Users/sunjianbo/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-0.12.0/src/api/reflector.rs:116
7: scylla::main::hd307ca40df2bb7c5 (0x10c2bc8fc)
at /Users/sunjianbo/codebox/scylla/src/main.rs:43
8: std::rt::lang_start::{{closure}}::h5d5a11653c904935 (0x10c2cad61)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libstd/rt.rs:64
9: std::panicking::try::do_call::h8037d9f03e27d896 (0x10d136047)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libstd/panicking.rs:293
10: ___rust_maybe_catch_panic (0x10d13a52e)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libpanic_unwind/lib.rs:85
11: std::rt::lang_start_internal::hc8e69e673740d4ae (0x10d136b2d)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libstd/rt.rs:48
12: std::rt::lang_start::h6524883d1453e01f (0x10c2cad41)
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libstd/rt.rs:64
13: _main (0x10c2bd6e1)
ApiError https://192.168.99.101:8443/apis/core.hydra.io/v1alpha1/namespaces/default/componentschematics?: Client Error: 404 Not Found ("Error(Status(404), \"https://192.168.99.101:8443/apis/core.hydra.io/v1alpha1/namespaces/default/componentschematics?\")") }
Just delete all componentschematics and cargo run scylla.
This may be a tricky one, because Kubernetes distinguishes between Job and CronJob... so it sorta changes the workload type.
The worker type recently added to the spec is not implemented.
The organization account can authorize our accounts to cooperate and publish some test images.
I have created an account, along with a scylla repo: hydraoss/scylla
If you agree, I can help add all Scylla maintainers as collaborators on this Docker image repo.
Implement imagePullSecret as defined here: https://github.com/microsoft/hydra-spec/pull/37
It should be backed by Kubernetes secrets.
A recent PR against the spec added cmd and args fields to the component's container section. We need to add that support.
I'm attempting to implement a ReplicatedService with some basic environment variables to allow for connecting to CosmosDB and AppInsights.
I tried using parameters and also fromVariable, but it doesn't work so far. The variables end up in the pod spec, but they are blank.
I am hoping to define the variables in the ComponentSchematic and then set them in the OperationalConfiguration.
Right now, all "log" messages are just println! calls. We need to use a real logging facility.
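In practice the idiomatic choice would be the log facade with a backend such as env_logger (which the repro output above already hints at via RUST_LOG). As a dependency-free illustration of what leveled, environment-filtered logging buys us over bare println!, here is a sketch; the Level enum and filter logic are stand-ins, not the log crate's API.

```rust
use std::env;

// Variant order defines severity: Error is most severe, Debug least.
#[derive(Debug, PartialEq, PartialOrd)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
}

/// Mimic RUST_LOG-style filtering (e.g. RUST_LOG=debug shows everything).
fn max_level() -> Level {
    match env::var("RUST_LOG").as_deref() {
        Ok("debug") => Level::Debug,
        Ok("info") => Level::Info,
        Ok("warn") => Level::Warn,
        _ => Level::Error,
    }
}

/// Emit to stderr only when the message's level passes the filter.
fn log(level: Level, msg: &str) {
    if level <= max_level() {
        eprintln!("[{:?}] {}", level, msg);
    }
}

fn main() {
    log(Level::Error, "component reflector cannot be created");
    log(Level::Debug, "only shown when RUST_LOG=debug");
}
```

With the real log crate, call sites become error!/warn!/info!/debug! macros and the filtering/formatting lives entirely in the chosen backend.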
The first cut of the Autoscaler trait only supports CPU metrics. Of course, there are tons of others supported by Kubernetes. We need to support those as well.
The specification defines extensible workload types, but we support neither such types nor the ability to provide workload settings.
The specification defines several scopes. None of them are provided by the current implementation.
Consider attempting to apply changes to a configuration after updating a referenced component:
c:>kubectl apply -f scylla-component.yaml
component.core.hydra.io/scyllacomponent created
c:>kubectl apply -f scylla-configuration.yaml
configuration.core.hydra.io/scyllaconfiguration created
c:>kubectl apply -f scylla-component.yaml
component.core.hydra.io/scyllacomponent configured
c:>kubectl apply -f scylla-configuration.yaml
configuration.core.hydra.io/scyllaconfiguration unchanged
In the fourth kubectl call I expected Scylla to notice that the component referenced by the configuration had been modified and consequently to apply changes to the existing component instances. It, however, did not do that because I did not modify scylla-configuration.yaml itself.
Repro steps:
Create a default AKS cluster with no RBAC, Virtual Nodes, Basic Networking, and no Diagnostics
Pull credentials into kubeconfig
Install helm3 from: https://github.com/helm/helm/releases/tag/v3.0.0-alpha.2 and set it in path
Run kubectl create -f k8s/crds.yaml
Run helm install scylla ./charts/scylla
Result: Error: apiVersion "core.hydra.io/v1alpha1" in scylla/templates/traits.yaml is not available
Expected Result:
NAME: scylla
LAST DEPLOYED: 2019-08-08 09:00:07.754179 -0600 MDT m=+0.710068733
NAMESPACE: default
STATUS: deployed
NOTES:
Scylla is a Kubernetes controller to manage Configuration CRDs.
It has been successfully installed.
When I was developing with Scylla, I found there was always an error reporting ErrorMessage { msg: "Modify on first-app-nginx: ApiError NotFound (\"componentinstances.core.hydra.io \\\"first-app-nginx\\\" not found\")" }.
Finally, I figured out that the object was not successfully created the first time I created the Configuration YAML. Then every time I modified the YAML, a modify event was triggered that tried to get the first-app-nginx componentinstances object; this always fails, as our instigator modify-event handler doesn't handle this case.
Controllers written in Go also don't distinguish between create/modify/delete events; all cases are handled in the reconcile function.
I upgraded to Helm 3 (I think I did it right?) but now I'm getting an error when I try to install the charts.
Error: failed pre-install: unable to decode "": no kind "CustomResourceDefinition" is registered for version "apiextensions.k8s.io/v1beta1" in scheme "k8s.io/client-go/kubernetes/scheme/register.go:65"
Rias-MacBook-Pro:scylla riabhatia$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-20T04:49:16Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.8", GitCommit:"a89f8c11a5f4f132503edbc4918c98518fd504e3", GitTreeState:"clean", BuildDate:"2019-04-23T04:41:47Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Rias-MacBook-Pro:scylla riabhatia$ helm version
version.BuildInfo{Version:"v3.0.0-alpha.2", GitCommit:"97e7461e41455e58d89b4d7d192fed5352001d44", GitTreeState:"clean", GoVersion:"go1.12.7"}
My understanding of an ops config is that it represents a release of a set of component instances at a particular point in time. Under the covers, it effectively just provisions the set of component instances. That leaves the ops config object as something purely for auditing purposes. But that only works if there is history, and there's no history associated with individual Kubernetes objects. For it to work here, you'd have to rename the ops config object each time you deploy it, and then query the set of ops configs ordered by creation time to get a history.
What's the plan here?
When a singleton pod is being deleted, the following message shows up in the logs:
Error processing event: Error(Status(404), "https://hydra-dev-hydra-dev-5a2d89-c7de8a1f.hcp.westus2.azmk8s.io/api/v1/namespaces/default/pods/first-app?")
Looking at the pod list, it appears that the name of the pod should be: first-app-nginx-singleton
The replicaCount field on an ops config was recently added.
Hopefully, we should be able to just drop in a subset of the schema from the spec. Reference: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#specifying-a-structural-schema
Rollout is defined as rolling out or upgrading an existing application to a new version, and might involve other manual/automated processes like verification, rolling delay, etc.
Rollout is an important feature that we want to add as a trait on a Component. We could begin by using the rollout knobs already in the k8s Deployment/StatefulSet, and further implement advanced strategies like canary rollouts.
In the spec, workload types are being renamed to worker, task, and service, along with singletonWorker, singletonTask, and singletonService.
This change needs to be reflected in the code.
Right now, parameters are not correctly injected into the components. We need to add support for this.
Right now, traits cannot accept parameters.
Currently, the Task workload type is not implemented.
Currently, we have ReplicatedService, which provides good support for stateless workloads and is based on the k8s Deployment. For our use case, we want Scylla to support existing stateful workloads that already run on k8s StatefulSets. Adding this StatefulService workload abstraction would be necessary for more adoption, and could further benefit the community.
When deleting a configuration with an ingress, the following error is sent to the log:
Error deleting trait for ingress: https://hydra-dev-hydra-dev-5a2d89-c7de8a1f.hcp.westus2.azmk8s.io/apis/extensions/v1beta1/namespaces/default/ingresses/first-app?: Client Error: 404 Not Found
This is because the actual name of the ingress is first-app-nginx-singleton
A monitoring trait is an important feature but is missing from the current implementation. It would be great to define a monitoring trait and have it implemented in Scylla. We could begin with some very standard metrics exposed via Prometheus rules, e.g. CPU/memory/IO metrics and application healthiness.
The kube crate has undergone a major refactor, and version 0.10 is now out. We need to update.
After #11 gets merged, add secrets to ReplicatedService and Singleton implementations
If you load a malformed componentschematic into Kubernetes, starting Scylla will fail with an empty error message. The Reflector seems to be the source of the failure. We can probably capture this and log a useful error.
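One way to capture this is to replace the bare .expect(...) on the reflector result with a match that substitutes a descriptive message when the error's Display output is empty. The sketch below uses a plain String as a stand-in for the kube error type, and the helper name is hypothetical.

```rust
/// Turn a reflector error into a message that is never empty, so a
/// malformed componentschematic doesn't panic with a blank string.
fn describe_reflector_error(e: &str) -> String {
    if e.is_empty() {
        "component reflector failed with an empty error (malformed componentschematic?)".to_string()
    } else {
        format!("component reflector cannot be created: {}", e)
    }
}

fn main() {
    // Stand-in for the empty error produced by a malformed schematic.
    let err = String::new();
    println!("{}", describe_reflector_error(&err));

    // A normal API error keeps its detail.
    println!("{}", describe_reflector_error("Status(403)"));
}
```

At the call site, Reflector::new(...) would be matched instead of .expect()ed, logging this message and exiting cleanly rather than panicking.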