(ported from the internal repo design docs)
This document describes part of our plan for helping users diagnose how their rollout is going. The plan has two components: a low-level, high-detail view in the status of the CapacityTarget object, and a high-level, low-detail view in the status of the Release object. This document covers the low-level, high-detail CapacityTarget view: we think it'll be easier to start with a domain where we don't need to invent summarization or prioritization schemes.
Reporting Progress
Previously, we introduced the concept of sad pods, which allowed the
user to see the pods that were not ready. There were a few problems
with this approach:
- It was hard to read: we dumped the whole status of the pod into the capacity target for every single pod that was not working.
- We capped the list at 5 pods to keep the objects small, which meant that distinct problems wouldn't necessarily be surfaced.
- The user wouldn't see the positive things (e.g. pods that are running), so it'd be hard to tell whether the release was progressing, or for tooling to show the status of the whole release across multiple clusters.
So, we decided to summarize the status of all the pods per cluster.
Criteria For Summarizing
Owner
The first level of the summary is the owner of the pod.
Several kinds of Kubernetes objects can cause pods to be created: DaemonSets, Deployments, Jobs, ReplicaSets, and StatefulSets. This means that further down the hierarchy we might have container names that clash. To prevent that, we use the owner of a pod as the top-level category of the summary report.
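For illustration, the owner grouping might look like the following Python sketch (the real controller is written in Go; the helper name and pod data here are hypothetical):

```python
from collections import defaultdict

def group_by_owner(pods):
    """Group pods under a top-level key derived from their owner,
    e.g. 'replicaset/reviewsapi-abc'. In the real implementation the
    owner comes from the pod's ownerReferences."""
    report = defaultdict(list)
    for pod in pods:
        owner = pod["owner"]
        key = "%s/%s" % (owner["kind"].lower(), owner["name"])
        report[key].append(pod)
    return report

pods = [
    {"name": "reviewsapi-abc-1", "owner": {"kind": "ReplicaSet", "name": "reviewsapi-abc"}},
    {"name": "reviewsapi-abc-2", "owner": {"kind": "ReplicaSet", "name": "reviewsapi-abc"}},
    {"name": "migrate-1", "owner": {"kind": "Job", "name": "migrate"}},
]
report = group_by_owner(pods)
# Two top-level entries: 'replicaset/reviewsapi-abc' and 'job/migrate'
```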
Pod Condition: Type, Reason and Status
Under each owner, there is a pod status breakdown. This breakdown is
grouped by the following fields, in order:
- Pod condition type (e.g. Ready)
- Pod condition reason (e.g. ContainersNotReady)
- Pod condition status (True, False, or Unknown)
Apart from categorizing pods by their conditions, we also sort the
results with the same criteria to keep the ordering consistent across
multiple updates.
To aid humans in deciding which problem to look into, we also maintain a count of the number of pods with each type + reason + status combination.
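The grouping, counting, and sorting above can be sketched like this in Python (the real controller is Go; the condition tuples here are hypothetical):

```python
from collections import Counter

def condition_breakdown(conditions):
    """Count pods per (type, reason, status) combination, then sort by
    that same key so the ordering is stable across status updates."""
    counts = Counter(conditions)
    return [
        {"type": t, "reason": r, "status": s, "count": n}
        for (t, r, s), n in sorted(counts.items())
    ]

conditions = [
    ("Ready", "", "True"),
    ("Ready", "ContainersNotReady", "False"),
    ("Ready", "", "True"),
]
rows = condition_breakdown(conditions)
# rows[0] -> {'type': 'Ready', 'reason': '', 'status': 'True', 'count': 2}
```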
Container Name
Within the containers field of each type + reason + status combination, there is another level of grouping, this time by container name. This means that we have a report per container name, and that report follows a structure very similar to the pod report.
Container State: Type and Reason
Just as the pod status breakdown works with conditions, the container state breakdown works with container states. The only difference is that container states are not as transparent as pod conditions, so we need to infer the type and reason with logic of our own.
Type
Each container state has three nullable fields, called Waiting, Running, and Terminated. We use these to derive the container state Type: the type is whichever of the three fields is not null.
Reason
Containers keep two states, not one. The field called State is the container's current state, and the one called LastTerminationState is the most recent terminated state, i.e. the state of the container before it was last restarted.
Reason is tricky mostly because it is not always informative. Based on
our experience so far, what users usually want to see is the Reason
of the current state, if the current state is Waiting.
Here are the steps we go through to come up with the Reason for a
container state:
- If the State (i.e. the container's current state) is Waiting, we use its Reason.
- Otherwise, the Reason is empty.
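Both derivations can be sketched in Python as follows, assuming a container state dict shaped like Kubernetes' ContainerState, with waiting/running/terminated as mutually exclusive nullable fields (the real controller is Go; function names are hypothetical):

```python
def state_type(state):
    # The type is whichever of the three nullable fields is set.
    for field in ("waiting", "running", "terminated"):
        if state.get(field) is not None:
            return field.capitalize()
    return ""

def state_reason(state):
    # We only surface a reason when the current state is Waiting.
    waiting = state.get("waiting")
    return waiting.get("reason", "") if waiting else ""

current = {"waiting": {"reason": "CrashLoopBackOff"}}
state_type(current)    # 'Waiting'
state_reason(current)  # 'CrashLoopBackOff'
```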
Constructing Examples
In each pod status breakdown, we have an example that contains a pod
name and a message. At best, this message helps the user know what is
wrong without having to switch to the target cluster. At worst, the
user can use the pod name to look through logs or events after
switching to the application cluster.
The example is picked from the list of pods that fall into that breakdown. To keep it consistent across updates, the pods are sorted and the first one is picked as the example.
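Picking the example deterministically is just a sort-and-take-first, as in this sketch (names hypothetical):

```python
def pick_example(pods):
    """Sort by pod name and take the first one, so the same pod stays
    the example across successive status updates."""
    return sorted(pods, key=lambda pod: pod["name"])[0]

pods = [{"name": "reviewsapi-b"}, {"name": "reviewsapi-a"}]
pick_example(pods)  # {'name': 'reviewsapi-a'}
```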
The example contains only two fields, the pod name and a message.
Pod Name
This is copied, verbatim, from the name of the example pod.
Message
We try to surface useful information to the user through the example's Message. Here is where the message comes from:
- If LastTerminationState.Terminated.Message is set, meaning that the user has written to the termination message path, we use it.
- If it's not set, we construct a message ourselves. The initial proposal is a string like "Terminated with exit code <exitcode>", or "Terminated with signal <signal>" if there is a signal instead of an exit code.
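The steps above can be sketched in Python, with field names mirroring the pod status API (the exact strings are the initial proposal, not final wording; the real controller is Go):

```python
def example_message(container_status):
    """Prefer the user-provided termination message; otherwise build one
    from the signal or exit code of the last terminated state."""
    terminated = container_status.get("lastState", {}).get("terminated")
    if terminated is None:
        return ""
    if terminated.get("message"):
        return terminated["message"]
    if terminated.get("signal") is not None:
        return "Terminated with signal %d" % terminated["signal"]
    return "Terminated with exit code %d" % terminated["exitCode"]

example_message({"lastState": {"terminated": {"exitCode": 1}}})
# 'Terminated with exit code 1'
```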
Example
To bring it all together, here is an example of what a capacity target would look like with a replica set maintaining 20 pods, each with 2 containers (app and envoy):
status:
  clusters:
  - name: us-west1
    report:
    - owner:
        name: replicaset/reviewsapi-$hash-0-$hash
      breakdown:
      - type: Ready
        status: "True"
        count: 12
        containers:
        - name: app
          states:
          - type: Running
            count: 12
            example:
              pod: reviewsapi-$hash-0-$hash-1234
        - name: envoy
          states:
          - type: Running
            count: 12
            example:
              pod: reviewsapi-$hash-0-$hash-1234
      - type: Ready
        status: "False"
        reason: ContainersNotReady
        count: 8
        containers:
        - name: app
          states:
          - type: Waiting
            reason: ImagePullBackOff
            count: 6
            example:
              pod: reviewsapi-$hash-0-$hash-4567
              message: "failed to pull reviewsapi:abcd123"
          - type: Waiting
            reason: ContainerCreating
            count: 1
            example:
              pod: reviewsapi-$hash-0-$hash-4567
          - type: Waiting
            reason: CrashLoopBackOff
            count: 1
            example:
              pod: reviewsapi-$hash-0-$hash-4567
              message: 'Terminated with exit code 1' # constructed by Shipper from `LastState.Terminated.ExitCode`
        - name: envoy
          states:
          - type: Waiting
            reason: CrashLoopBackOff
            count: 8
            example:
              pod: reviewsapi-$hash-0-$hash-4567
              message: 'cannot fetch service mesh config. argh!' # Read from terminationMessagePath
Caveats
Memory impact of pod informer for each cluster
This scheme is predicated on maintaining a pod informer for each cluster. For very large clusters with hundreds of thousands of pods, this may add up to a significant memory footprint. Taking an extreme case, consider a management cluster orchestrating 10 Kubernetes clusters, each with 5,000 nodes and 100 pods per node: that is 5,000,000 pods, or roughly 50GB of heap if each cached pod takes about 10KB of memory.
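The arithmetic behind that estimate, for the record (all figures are the hypothetical extreme case above, not measurements):

```python
clusters = 10
nodes_per_cluster = 5000
pods_per_node = 100
pod_bytes = 10 * 1024  # assume ~10KB per cached pod object

total_pods = clusters * nodes_per_cluster * pods_per_node
heap_gb = total_pods * pod_bytes / 1024**3
# total_pods -> 5,000,000; heap_gb -> ~48, i.e. roughly 50GB
```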
Update rate for informers subscribing to a very large number of pod changes
We're not sure how client-go will handle a very high churn subscription on big clusters.
CPU impact of doing crunchy summarization work
The summarization scheme we're proposing involves a lot of aggregation over the set of pods and their containers. This might add up to significant CPU load when summarizing multiple very large clusters.
API call throttling updating capacity target objects for a high-churn pod fleet
We're likely to run into the client-go throttling limits when attempting to keep a CapacityTarget object up-to-date with a very large pod fleet. In this case, it should be safe to drop updates and re-process at the next resync period, or retry after a certain amount of time. None of the state depends on catching each update.