cockroach-operator's Introduction

CockroachDB Kubernetes Operator

The CockroachDB Kubernetes Operator deploys CockroachDB on a Kubernetes cluster. You can use the Operator to manage the configuration of a running CockroachDB cluster, including:

  • Authenticating certificates
  • Configuring resource requests and limits
  • Scaling the cluster
  • Performing a rolling upgrade

Build Status

  • GKE Nightly
  • OpenShift Nightly

Limitations

Prerequisites

  • Kubernetes 1.18 or higher
  • kubectl
  • A GKE cluster (n2-standard-4 is the minimum requirement for testing)

Install the Operator

Apply the custom resource definition (CRD) for the Operator:

kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/crds.yaml

Apply the Operator manifest. By default, the Operator is configured to install in the cockroach-operator-system namespace. To use the Operator in a custom namespace, download the Operator manifest and edit all instances of namespace: cockroach-operator-system to specify your custom namespace. Then apply this version of the manifest to the cluster with kubectl apply -f {local-file-path} instead of using the command below.

kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/operator.yaml
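
If you are installing into a custom namespace instead, here is a minimal sketch of the download-and-edit step described above (assuming GNU sed and a namespace named my-namespace):

kubectl create namespace my-namespace
curl -sO https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/operator.yaml
sed -i 's/namespace: cockroach-operator-system/namespace: my-namespace/g' operator.yaml
kubectl apply -f operator.yaml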

Validate that the Operator is running:

kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
cockroach-operator-6f7b86ffc4-9ppkv   1/1     Running   0          54s

Start CockroachDB

Download the example.yaml custom resource.

Note: The latest stable CockroachDB release is specified by default in image.name.
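
For orientation, the custom resource in example.yaml has roughly the following shape (a sketch, not the authoritative file; field values are illustrative and the downloaded example.yaml is the source of truth):

apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
spec:
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "60Gi"
        volumeMode: Filesystem
  tlsEnabled: true
  image:
    name: cockroachdb/cockroach:vX.Y.Z   # placeholder; the downloaded file pins the latest stable release
  nodes: 3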

Resource requests and limits

By default, the Operator allocates 2 CPUs and 8Gi memory to CockroachDB in the Kubernetes pods. These resources are appropriate for n2-standard-4 (GCP) and m5.xlarge (AWS) machines.

On a production deployment, you should modify the resources.requests object in the custom resource with values appropriate for your workload. For details, see the CockroachDB documentation.
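
As a sketch, the block to tune sits under the CrdbCluster spec and follows the usual Kubernetes resources convention (the 2 CPU / 8Gi values below are the defaults mentioned above, not a production recommendation):

spec:
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "2"
      memory: 8Gi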

Certificate signing

The Operator generates and approves 1 root and 1 node certificate for the cluster.

Apply the custom resource

Apply example.yaml:

kubectl create -f example.yaml

Check that the pods were created:

kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
cockroach-operator-6f7b86ffc4-9t9zb   1/1     Running   0          3m22s
cockroachdb-0                         1/1     Running   0          2m31s
cockroachdb-1                         1/1     Running   0          102s
cockroachdb-2                         1/1     Running   0          46s

Each pod should have READY status soon after being created.

Access the SQL shell

To use the CockroachDB SQL client, first launch a secure pod running the cockroach binary.

kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/examples/client-secure-operator.yaml

Get a shell into the client pod:

kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach/cockroach-certs --host=cockroachdb-public

If you want to access the DB Console, create a SQL user with a password while you're here:

CREATE USER roach WITH PASSWORD 'Q7gc8rEdS';

Then assign roach to the admin role to enable access to secure DB Console pages:

GRANT admin TO roach;
\q

Access the DB Console

To access the cluster's DB Console, port-forward from your local machine to the cockroachdb-public service:

kubectl port-forward service/cockroachdb-public 8080

Access the DB Console at https://localhost:8080.

Scale the CockroachDB cluster

Note: Due to a known issue, automatic pruning of PVCs is currently disabled by default. This means that after decommissioning and removing a node, the Operator will not remove the persistent volume that was mounted to its pod. If you plan to eventually scale up the cluster after scaling down, you will need to manually delete any PVCs that were orphaned by node removal before scaling up. For more information, see the documentation.
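
Until automatic pruning is re-enabled, you can find and remove the orphaned claims by hand; a minimal sketch, assuming the cluster is named cockroachdb so its claims follow the datadir-cockroachdb-<ordinal> pattern:

kubectl get pvc
kubectl delete pvc datadir-cockroachdb-3   # only delete claims for pods that have already been decommissioned and removed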

To scale the cluster up and down, modify nodes in the custom resource. For details, see the CockroachDB documentation.

Do not scale down to fewer than 3 nodes. This is considered an anti-pattern for CockroachDB and will cause errors.

Note: You must scale by updating the nodes value in the Operator configuration. Using kubectl scale statefulset <cluster-name> --replicas=4 will result in new pods immediately being terminated.
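
For example, a merge patch is one way to change nodes without re-editing the file (a sketch, assuming the custom resource is addressable as crdbcluster and is named cockroachdb as in example.yaml):

kubectl patch crdbcluster cockroachdb --type=merge -p '{"spec":{"nodes":4}}'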

Upgrade the CockroachDB cluster

Perform a rolling upgrade by changing image.name in the custom resource. For details, see the CockroachDB documentation.
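
For example (a sketch; the version shown is purely illustrative, and the cluster name is assumed to be cockroachdb as in example.yaml):

kubectl patch crdbcluster cockroachdb --type=merge -p '{"spec":{"image":{"name":"cockroachdb/cockroach:vX.Y.Z"}}}'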

Stop the CockroachDB cluster

Delete the custom resource:

kubectl delete -f example.yaml

Remove the Operator:

kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/operator.yaml

Note: If you want to delete the persistent volumes and free up the storage used by CockroachDB, be sure you have a backup copy of your data. Data cannot be recovered once the persistent volumes are deleted. For more information, see the Kubernetes documentation.

Releases

Our release process has a few phases. The first involves creating a new branch, updating the version, and getting a PR with all of the generated files merged into master.

Subsequent steps will need to be carried out in TeamCity and RedHat Connect.

Creating a new release PR

From a clean, up-to-date master (seriously...check), run the following where <version> is the desired new version (e.g. 2.2.0).

$ make release/new VERSION=<version>
...
...
$ git push origin release-$(cat version.txt)

This will do the following for you:

  • Create a new branch named release-<version>
  • Update version.txt
  • Generate the manifest, bundles, etc.
  • Commit the changes with the message Bump version to <version>.
  • Push to a new branch on origin (that wasn't automated)

Tag the release

After the PR is merged, run the following to create the tag (you'll need to be a member of CRL to do this).

git tag v$(cat version.txt)
git push upstream v$(cat version.txt)

Run Release Automation

From here, the rest of the release process is done with TeamCity. A CRL team member will need to perform some manual steps in RedHat Connect as well. Ping one of us in Slack for info.

cockroach-operator's People

Contributors

abhishekdwivedi3060, alinadonisa, antonioua, chrislovecnm, chrisseto, davidwding, dbist, falcon-pioupiou, github-actions[bot], himanshu-cockroach, jfrconley, jlinder, juanleon1, keith-mcclellan, mbrancato, meridional, moonsphere, neurodrone, noguchitoshi, pawelprazak, prafull01, pseudomuto, rail, sukki37, sumlare, taroface, udnay, vladdy, zmalik

cockroach-operator's Issues

Recover Docker PR

Somehow #16 did not end up in master after the merge. We need to find out why this happened and ensure the needed changes get into the main branch.

cc @johnrk

Downgrade/Upgrade fails with EmptyDir

Downgraded a cluster from cockroachdb/cockroach:v20.1.3 to cockroachdb/cockroach:v20.1.2 and got

I200713 14:46:01.758684 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:02.466477 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:02.466658 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:03.468843 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:03.469082 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:03.758602 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:03.758686 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:04.469214 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:04.469488 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
W200713 14:46:04.593992 234 kv/kvserver/node_liveness.go:563  [n1,liveness-hb] slow heartbeat took 4.5s
W200713 14:46:04.594057 234 kv/kvserver/node_liveness.go:488  [n1,liveness-hb] failed node liveness heartbeat: operation "node liveness heartbeat" timed out after 4.5s
(1) operation "node liveness heartbeat" timed out after 4.5s
Wraps: (2) context deadline exceeded
Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
I200713 14:46:05.471587 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:05.471726 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:06.471659 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:06.472125 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:07.474348 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:07.474422 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:07.577865 227 server/status/runtime.go:498  [n1] runtime stats: 182 MiB RSS, 227 goroutines, 76 MiB/43 MiB/115 MiB GO alloc/idle/total, 19 MiB/25 MiB CGO alloc/total, 14.9 CGO/sec, 1.4/0.7 %(u/s)time, 0.0 %gc (0x), 46 KiB/60 KiB (r/w)net
I200713 14:46:07.759240 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:07.759331 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:08.104190 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:08.474522 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:08.474745 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
W200713 14:46:09.094304 234 kv/kvserver/node_liveness.go:563  [n1,liveness-hb] slow heartbeat took 4.5s
W200713 14:46:09.094403 234 kv/kvserver/node_liveness.go:488  [n1,liveness-hb] failed node liveness heartbeat: operation "node liveness heartbeat" timed out after 4.5s
(1) operation "node liveness heartbeat" timed out after 4.5s
Wraps: (2) context deadline exceeded
Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
I200713 14:46:09.477148 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:09.759314 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:09.759448 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:10.477071 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:10.477388 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.479423 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.479698 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.656282 15212 rpc/nodedialer/nodedialer.go:160  [n1] unable to connect to n3: failed to connect to n3 at crdb-tls-enabled-2.crdb-tls-enabled.default.svc.cluster.local:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client requested node ID 3 doesn't match server node ID 4
I200713 14:46:11.759730 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.760068 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.975049 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.975415 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:12.479738 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:12.480122 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:13.483348 6839 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:13.483692 6703 gossip/server.go:227  [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:262

Pods are not starting

First Tasks

cc @johnrk

A list of the first possible tasks that I need to complete. We can figure out which you want me to knock out first.

  • CI
  • scripted end to end tasks
  • implement new functionality

CI

  1. Get the binary and Docker image building in TeamCity
  2. Push the binary to a private GCP registry - we can switch this to Docker Hub once the operator is deemed beta-level by your team

e2e testing

  1. Get a basic e2e test running when the container is pushed to the GCP registry
  2. Build a k8s cluster
  3. Run the e2e tests

Implement new functionality

From the requirements document:

As an operator, I can configure CockroachDB custom resources on a
new cluster in Kubernetes, so that I can initialize a cluster that will meet
my deployment needs.

What is missing currently? Do we have a diff on what still needs implementation in the statefulset?

e2e testing error

I ran into this error, which might be a flake or a framework error:

  TestCreatesSecureClusterWithGeneratedCert/creates_1-node_secure_cluster: e2e_test.go:109:
                Error Trace:    e2e_test.go:109
                Error:          Received unexpected error:
                                an error on the server ("Internal Server Error: \"/apis/crd.projectcalico.org/v1/namespaces/crdb-test-8vk9nj/ne
                                failed to list objects in namespace crdb-test-8vk9nj
                                github.com/cockroachdb/cockroach-operator/pkg/testutil/env.listAllObjs
                                        /ws/pkg/testutil/env/sandbox.go:193
                                github.com/cockroachdb/cockroach-operator/pkg/testutil/env.(*DiffingSandbox).Diff
                                        /ws/pkg/testutil/env/sandbox.go:174
                                github.com/cockroachdb/cockroach-operator/e2e.TestCreatesSecureClusterWithGeneratedCert.func1
                                        /ws/e2e/e2e_test.go:108
                                testing.tRunner
                                        /usr/local/go/src/testing/testing.go:991
                                runtime.goexit
                                        /usr/local/go/src/runtime/asm_amd64.s:1373
                Test:           TestCreatesSecureClusterWithGeneratedCert/creates_1-node_secure_cluster

It does not seem to be an error with the operator.

Pod security with manifest that is deployed

I need to look at the Pod manifest that we are deploying with the operator. I think it still has the auth token mounted. I should also double-check the container security. I might be wrong about the auth token, but we may need to remove it.

Future CI

First phase task list

This issue is for me to track the first phase of work. This is brainstorming so that we have an initial list of stuff.

Here is a breakdown of tasks:

  • Learn more about CockroachDB
  • Learn more about the CockroachDB Helm chart
  • Find documentation for various configurations
  • Learn more about the operator
  • Test the operator and determine a baseline
  • Get the operator building
  • Determine if the project has CI/CD
  • Get the tests for the operator working
  • OpenStack testing is one thing I am concerned about
  • Fix headers in code to meet Apache 2
  • Document the coding standard that this project follows

Administrative Items

  • Get John's GitHub ID and start organizing the project in GitHub
  • Determine standup schedule
  • Get Slack access
  • Cloud access
  • Clean up GitHub issues a bit
  • I need GitHub access to do things like add labels
  • Status of #26

Rename to cockroachdb-operator?

I think we should probably rename this to cockroachdb-operator (or maybe cockroach-operator) for consistency with everything else - we don't really use crdb in formal contexts much. It'll get a lot harder to change after this ships, so if we're going to change it we should do it soon.

Update CRDB pod config

Provide a mechanism for updating the pod config for an existing CRDB cluster (add more pods, change persistent volume size, change CPU count, etc)

Google Cloud Marketplace

As an operator, I can get the latest Google Anthos certified
CockroachDB Kubernetes Operator on the Google Cloud Marketplace,
so that I can purchase and deploy CockroachDB on my Google Cloud
Anthos-based environment.

Nodes value in the API might be a tad confusing

We have the following value in the API

Nodes int32 `json:"nodes"`
// (Required) Container image with supported CockroachDB version

The term Node and Nodes are also used by Kubernetes

https://kubernetes.io/docs/concepts/architecture/nodes/

@johnrk I recommend renaming this to something like PodReplicaCount, Replicas, or something else that does not use the term Node.

I am not the best at figuring out names that make sense, but I am good at figuring out overloaded terms, and this one is overloaded.

Improve e2e testing kubectl auth with GKE

I am testing against GKE and I am getting a kubectl auth timeout. The container that is running expects to be able to access the gcloud binary that is set in my kubeconfig file. This command is the helper command that refreshes the auth token for the cluster.

For instance:

        cmd-path: /Users/clove/Downloads/google-cloud-sdk/bin/gcloud

The container needs to run gcloud auth inside the container itself, rather than relying on my kubeconfig file.

gcloud container clusters get-credentials "$CLUSTER_NAME" --zone "$ZONE"

There are a bunch of ways to fix this, but the auth for kubectl is different between cloud providers and k8s cluster types on cloud providers. I would like this to work against multiple different cloud providers, so I need to work on this a bit.
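
For GKE specifically, one possible shape of the in-container auth flow (a sketch; it assumes a service-account key is mounted at /var/secrets/gcloud/key.json and that CLUSTER_NAME and ZONE are set):

gcloud auth activate-service-account --key-file=/var/secrets/gcloud/key.json
gcloud container clusters get-credentials "$CLUSTER_NAME" --zone "$ZONE"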

Acceptance testing platforms

@johnrk what platforms does this operator need to be tested against? We are building this to support on-prem as well as cloud-deployed k8s. What types of on-prem installations? I think this is a shortlist of testing that we can start with:

  • Kind (baseline)
  • GKE
  • EKS
  • kops on AWS
  • Some bare metal / VMware / Openstack

Create a new secure cluster

  1. download an intermediate cert from K8s CA
  2. sign a cert bundle for the cluster
  3. Start an {n} node secure cluster using this bundle with PVs (Persistent Volumes) using a custom name
  4. Initialize the database

[crdb-operator] [bug] unable to make changes (e.g. crdb version, cache size, # of nodes, size of pvc) to an existing k8s cluster

Unable to make changes to an existing k8s cluster
Watching for changes is one of the expected features of the K8s Operator MVP, but it appears the Operator is not watching for any changes to a Cockroach cluster. When starting up a new cluster, it appears to apply the spec properly.

I think Vlad intended to watch for changes here: https://github.com/cockroachdb/crdb-operator/blob/master/pkg/controller/cluster_controller.go#L103

Test Details

  1. Set up a K8s cluster deployed on GCP, following these instructions: https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes.html
  • gcloud container clusters create cockroachdb --machine-type n1-standard-4
  • kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=[email protected]
  2. Apply CRD: kubectl apply -f ./config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml
  3. Apply deployment file (in PR): kubectl apply -f deploy/operator.yaml
  4. Apply example config to create new cockroachdb pods: kubectl apply -f config/examples/example.yaml
  • at this point it works
  5. Then make a minor change to config/examples/example.yaml, such as major version, minor version, number of nodes, size of persistent volume, or cache

Expected Result: K8s watches the status of the pods, notices a discrepancy between the pods' status and their spec, and, in the reconcile function, updates the pods.

Actual Result: no restarts happen; when looking at the node in GCP and when opening a CRDB client, nothing has changed.

Initial CI configured in TeamCity

When we have a PR, we need to run the following make targets:

  • make docker/build/operator-ubi
  • make docker/build/test-runner
  • make test
  • make lint

Operator and database configurations of Custom Resources

  1. As an operator, I can configure CockroachDB custom resources on a
    new cluster in Kubernetes, so that I can initialize a cluster that will meet
    my deployment needs.
  2. As an operator, I can modify custom resources on an existing
    CockroachDB cluster in Kubernetes, so that I can tune my cluster to
    meet my changing workload needs.

Pod crashed when it first started

This is a partial log dump. The pod restarted and came up, but I am wondering if this is a timing issue.

This is from running the container cockroachdb/cockroach:v19.2.6

Factory (0x7ff46d8153e8)
  cache_index_and_filter_blocks: 0
  cache_index_and_filter_blocks_with_high_priority: 0
  pin_l0_filter_and_index_blocks_in_cache: 0
  pin_top_level_index_and_filter: 1
  index_type: 0
  data_block_index_type: 0
  index_shortening: 1
  data_block_hash_table_util_ratio: 0.750000
  hash_index_allow_collision: 1
  checksum: 1
  no_block_cache: 0
  block_cache: 0x7ff46d976210
  block_cache_name: LRUCache
  block_cache_options:
    capacity : 650437632
    num_shard_bits : 4
    strict_capacity_limit : 0
    memory_allocator : None
    high_pri_pool_ratio: 0.000
  block_cache_compressed: (nil)
  persistent_cache: (nil)
  block_size: 32768
  block_size_deviation: 10
  block_restart_interval: 16
  index_block_restart_interval: 1
  metadata_block_size: 4096
  partition_filters: 0
  use_delta_encoding: 1
  filter_policy: rocksdb.BuiltinBloomFilter
  whole_key_filtering: 0
  verify_compression: 0
  read_amp_bytes_per_bit: 0
  format_version: 2
  enable_index_compression: 1
  block_align: 0
I200708 20:44:20.993521 22 storage/engine/rocksdb.go:120         Options.write_buffer_size: 67108864
I200708 20:44:20.993609 22 storage/engine/rocksdb.go:120   Options.max_write_buffer_number: 4
I200708 20:44:20.993704 22 storage/engine/rocksdb.go:120           Options.compression: Snappy
I200708 20:44:20.993773 22 storage/engine/rocksdb.go:120                   Options.bottommost_compression: Disabled
I200708 20:44:20.993837 22 storage/engine/rocksdb.go:120        Options.prefix_extractor: cockroach_prefix_extractor
I200708 20:44:20.993909 22 storage/engine/rocksdb.go:120    Options.memtable_insert_with_hint_prefix_extractor: nullptr
I200708 20:44:20.993975 22 storage/engine/rocksdb.go:120              Options.num_levels: 7
I200708 20:44:20.994039 22 storage/engine/rocksdb.go:120         Options.min_write_buffer_number_to_merge: 1
I200708 20:44:20.994047 22 storage/engine/rocksdb.go:120      Options.max_write_buffer_number_to_maintain: 0
I200708 20:44:20.994053 22 storage/engine/rocksdb.go:120             Options.bottommost_compression_opts.window_bits: -14
I200708 20:44:20.994073 22 storage/engine/rocksdb.go:120                   Options.bottommost_compression_opts.level: 32767
I200708 20:44:20.994170 22 storage/engine/rocksdb.go:120                Options.bottommost_compression_opts.strategy: 0
I200708 20:44:20.994193 22 storage/engine/rocksdb.go:120          Options.bottommost_compression_opts.max_dict_bytes: 0
I200708 20:44:20.994200 22 storage/engine/rocksdb.go:120          Options.bottommost_compression_opts.zstd_max_train_bytes: 0
I200708 20:44:20.994206 22 storage/engine/rocksdb.go:120                   Options.bottommost_compression_opts.enabled: false
I200708 20:44:20.994212 22 storage/engine/rocksdb.go:120             Options.compression_opts.window_bits: -14
I200708 20:44:20.994219 22 storage/engine/rocksdb.go:120                   Options.compression_opts.level: 32767
I200708 20:44:20.994228 22 storage/engine/rocksdb.go:120                Options.compression_opts.strategy: 0
I200708 20:44:20.994234 22 storage/engine/rocksdb.go:120          Options.compression_opts.max_dict_bytes: 0
I200708 20:44:20.994239 22 storage/engine/rocksdb.go:120          Options.compression_opts.zstd_max_train_bytes: 0
I200708 20:44:20.994245 22 storage/engine/rocksdb.go:120                   Options.compression_opts.enabled: false
I200708 20:44:20.994251 22 storage/engine/rocksdb.go:120       Options.level0_file_num_compaction_trigger: 2
I200708 20:44:20.994259 22 storage/engine/rocksdb.go:120           Options.level0_slowdown_writes_trigger: 950
I200708 20:44:20.994265 22 storage/engine/rocksdb.go:120               Options.level0_stop_writes_trigger: 1000
I200708 20:44:20.994271 22 storage/engine/rocksdb.go:120                    Options.target_file_size_base: 4194304
I200708 20:44:20.994276 22 storage/engine/rocksdb.go:120              Options.target_file_size_multiplier: 2
I200708 20:44:20.994282 22 storage/engine/rocksdb.go:120                 Options.max_bytes_for_level_base: 67108864
I200708 20:44:20.994291 22 storage/engine/rocksdb.go:120  Options.level_compaction_dynamic_level_bytes: 1
I200708 20:44:20.994299 22 storage/engine/rocksdb.go:120           Options.max_bytes_for_level_multiplier: 10.000000
I200708 20:44:20.994304 22 storage/engine/rocksdb.go:120  Options.max_bytes_for_level_multiplier_addtl[0]: 1
I200708 20:44:20.994310 22 storage/engine/rocksdb.go:120  Options.max_bytes_for_level_multiplier_addtl[1]: 1
I200708 20:44:20.994315 22 storage/engine/rocksdb.go:120  Options.max_bytes_for_level_multiplier_addtl[2]: 1
I200708 20:44:20.994324 22 storage/engine/rocksdb.go:120  Options.max_bytes_for_level_multiplier_addtl[3]: 1
I200708 20:44:20.994329 22 storage/engine/rocksdb.go:120  Options.max_bytes_for_level_multiplier_addtl[4]: 1
I200708 20:44:20.994335 22 storage/engine/rocksdb.go:120  Options.max_bytes_for_level_multiplier_addtl[5]: 1
I200708 20:44:20.994340 22 storage/engine/rocksdb.go:120  Options.max_bytes_for_level_multiplier_addtl[6]: 1
I200708 20:44:20.994346 22 storage/engine/rocksdb.go:120        Options.max_sequential_skip_in_iterations: 8
I200708 20:44:20.994354 22 storage/engine/rocksdb.go:120                     Options.max_compaction_bytes: 104857600
I200708 20:44:20.994360 22 storage/engine/rocksdb.go:120                         Options.arena_block_size: 8388608
I200708 20:44:20.994365 22 storage/engine/rocksdb.go:120    Options.soft_pending_compaction_bytes_limit: 2199023255552
I200708 20:44:20.994371 22 storage/engine/rocksdb.go:120    Options.hard_pending_compaction_bytes_limit: 4400193994752
I200708 20:44:20.994380 22 storage/engine/rocksdb.go:120        Options.rate_limit_delay_max_milliseconds: 100
I200708 20:44:20.994385 22 storage/engine/rocksdb.go:120                 Options.disable_auto_compactions: 0
I200708 20:44:20.994393 22 storage/engine/rocksdb.go:120                         Options.compaction_style: kCompactionStyleLevel
I200708 20:44:20.994404 22 storage/engine/rocksdb.go:120                           Options.compaction_pri: kMinOverlappingRatio
I200708 20:44:20.994409 22 storage/engine/rocksdb.go:120  Options.compaction_options_universal.size_ratio: 1
I200708 20:44:20.994418 22 storage/engine/rocksdb.go:120  Options.compaction_options_universal.min_merge_width: 2
I200708 20:44:20.994425 22 storage/engine/rocksdb.go:120  Options.compaction_options_universal.max_merge_width: 4294967295
I200708 20:44:20.994431 22 storage/engine/rocksdb.go:120  Options.compaction_options_universal.max_size_amplification_percent: 200
I200708 20:44:20.994437 22 storage/engine/rocksdb.go:120  Options.compaction_options_universal.compression_size_percent: -1
I200708 20:44:20.994443 22 storage/engine/rocksdb.go:120  Options.compaction_options_universal.stop_style: kCompactionStopStyleTotalSize
I200708 20:44:20.994452 22 storage/engine/rocksdb.go:120  Options.compaction_options_fifo.max_table_files_size: 1073741824
I200708 20:44:20.994458 22 storage/engine/rocksdb.go:120  Options.compaction_options_fifo.allow_compaction: 0
I200708 20:44:20.994467 22 storage/engine/rocksdb.go:120                    Options.table_properties_collectors: TimeBoundTblPropCollectorFactory; DeleteRangeTblPropCollectorFactory;
I200708 20:44:20.994474 22 storage/engine/rocksdb.go:120                    Options.inplace_update_support: 0
I200708 20:44:20.994480 22 storage/engine/rocksdb.go:120                  Options.inplace_update_num_locks: 10000
I200708 20:44:20.994497 22 storage/engine/rocksdb.go:120                Options.memtable_prefix_bloom_size_ratio: 0.000000
I200708 20:44:20.994503 22 storage/engine/rocksdb.go:120                Options.memtable_whole_key_filtering: 0
I200708 20:44:20.994509 22 storage/engine/rocksdb.go:120    Options.memtable_huge_page_size: 0
I200708 20:44:20.994515 22 storage/engine/rocksdb.go:120                            Options.bloom_locality: 0
I200708 20:44:20.994520 22 storage/engine/rocksdb.go:120                     Options.max_successive_merges: 0
I200708 20:44:20.994529 22 storage/engine/rocksdb.go:120                 Options.optimize_filters_for_hits: 1
I200708 20:44:20.994535 22 storage/engine/rocksdb.go:120                 Options.paranoid_file_checks: 0
I200708 20:44:20.994540 22 storage/engine/rocksdb.go:120                 Options.force_consistency_checks: 0
I200708 20:44:20.994880 22 storage/engine/rocksdb.go:120                 Options.report_bg_io_stats: 0
I200708 20:44:20.994961 22 storage/engine/rocksdb.go:120                                Options.ttl: 0
I200708 20:44:20.995010 22 storage/engine/rocksdb.go:120           Options.periodic_compaction_seconds: 0
I200708 20:44:20.995873 22 storage/engine/rocksdb.go:120  [db/version_set.cc:4286] Recovered from manifest file:/cockroach/cockroach-data/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
I200708 20:44:20.996005 22 storage/engine/rocksdb.go:120  [db/version_set.cc:4295] Column family [default] (ID 0), log number is 0
I200708 20:44:21.002097 22 storage/engine/rocksdb.go:120  DB pointer 0x7ff46d963000
I200708 20:44:21.002442 22 server/config.go:502  [n?] 1 storage engine initialized
I200708 20:44:21.002585 22 server/config.go:505  [n?] RocksDB cache size: 684 MiB
I200708 20:44:21.002669 22 server/config.go:505  [n?] store 0: RocksDB, max size 0 B, max open file limit 1043576
W200708 20:44:21.003037 22 gossip/gossip.go:1517  [n?] no incoming or outgoing connections
I200708 20:44:21.003225 22 server/server.go:1391  [n?] no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node.
I200708 20:44:21.017215 75 gossip/client.go:124  [n?] started gossip client to crdb-0.crdb.default:26257
I200708 20:44:21.019346 22 server/node.go:645  [n?] connecting to gossip network to verify cluster ID...
I200708 20:44:21.019582 22 server/node.go:665  [n?] node connected via gossip and verified as part of cluster "cf11fb7a-2dc4-4976-9af7-87b88567520e"
I200708 20:44:21.036451 22 server/node.go:381  [n?] new node allocated ID 2
I200708 20:44:21.036825 22 gossip/gossip.go:394  [n2] NodeDescriptor set to node_id:2 address:<network_field:"tcp" address_field:"crdb-1.crdb.default.svc.cluster.local:26257" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:2 patch:0 unstable:0 > build_tag:"v19.2.6" started_at:1594241061036645861 cluster_name:"" sql_address:<network_field:"tcp" address_field:"crdb-1.crdb.default.svc.cluster.local:26257" >
I200708 20:44:21.037014 22 storage/stores.go:240  [n2] read 0 node addresses from persistent storage
I200708 20:44:21.037204 22 storage/stores.go:259  [n2] wrote 1 node addresses to persistent storage
I200708 20:44:21.088507 22 server/node.go:620  [n2] bootstrapped store [n2,s2]
I200708 20:44:21.090884 22 server/node.go:512  [n2] node=2: started with [<no-attributes>=/cockroach/cockroach-data] engine(s) and attributes []
I200708 20:44:21.091299 22 server/server.go:1519  [n2] starting http server at [::]:8080 (use: crdb-1.crdb.default.svc.cluster.local:8080)
I200708 20:44:21.091457 22 server/server.go:1526  [n2] starting grpc/postgres server at [::]:26257
I200708 20:44:21.091671 22 server/server.go:1527  [n2] advertising CockroachDB node at crdb-1.crdb.default.svc.cluster.local:26257
W200708 20:44:21.124763 22 jobs/registry.go:340  [n2] unable to get node liveness: node not in the liveness table
I200708 20:44:21.196382 172 sql/event_log.go:130  [n2,intExec=add-constraints-ttl] Event: "set_zone_config", target: 25, info: {Target:TABLE system.public.replication_constraint_stats Config: Options:"gc.ttlseconds" = 600 User:root}
I200708 20:44:21.243416 180 sql/event_log.go:130  [n2,intExec=add-replication-status-ttl] Event: "set_zone_config", target: 27, info: {Target:TABLE system.public.replication_stats Config: Options:"gc.ttlseconds" = 600 User:root}
I200708 20:44:21.247871 185 sql/sqlbase/structured.go:1529  [n2,intExec=update-reports-meta-generated] publish: descID=28 (reports_meta) version=2 mtime=1970-01-01 00:00:00 +0000 UTC
I200708 20:44:22.618476 60 gossip/gossip.go:1531  [n2] node has connected to cluster via gossip
I200708 20:44:22.618881 60 storage/stores.go:259  [n2] wrote 1 node addresses to persistent storage
I200708 20:44:31.110661 143 server/status/runtime.go:498  [n2] runtime stats: 130 MiB RSS, 138 goroutines, 78 MiB/46 MiB/125 MiB GO alloc/idle/total, 2.4 MiB/3.4 MiB CGO alloc/total, 0.0 CGO/sec, 0.0/0.0 %(u/s)time, 0.0 %gc (9x), 69 KiB/59 KiB (r/w)net
E200708 20:44:31.258105 204 sql/flowinfra/flow_registry.go:234  [n2,intExec=count-leases] flow id:f166c019-b942-4adf-bec6-493b3f5f4875 : 1 inbound streams timed out after 10s; propagated error throughout flow
E200708 20:44:31.508664 22 util/log/crash_reporting.go:537  [n2] Reported as error a5c43e29b97e4a328e5ce96e5e88b509
F200708 20:44:31.508900 22 server/server.go:1592  [n2] error with attached stack trace:
    github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:472
    github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:569
    github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).ExecWithUser
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:320
    github.com/cockroachdb/cockroach/pkg/sqlmigrations.glob..func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:242
    github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations
    	/go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:552
    github.com/cockroachdb/cockroach/pkg/server.(*Server).Start
    	/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1586
    github.com/cockroachdb/cockroach/pkg/cli.runStart.func3.2
    	/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:699
    github.com/cockroachdb/cockroach/pkg/cli.runStart.func3
    	/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:814
    runtime.goexit
    	/usr/local/go/src/runtime/asm_amd64.s:1337
  - error with embedded safe details: update-reports-meta-generated
  - update-reports-meta-generated:
  - error with attached stack trace:
    github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:472
    github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:569
    github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).queryInternal
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:252
    github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).Query
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:223
    github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).QueryRow
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:278
    github.com/cockroachdb/cockroach/pkg/sql.CountLeases
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:535
    github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).checkTableTwoVersionInvariant
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:505
    github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).commitSQLTransaction
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:577
    github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).handleAutoCommit
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1255
    github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState.func5
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:211
    github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:446
    github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:98
    github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1243
    github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1172
    github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).initConnEx.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:202
    runtime.goexit
    	/usr/local/go/src/runtime/asm_amd64.s:1337
  - error with embedded safe details: count-leases
  - count-leases:
  - no inbound stream connection
    github.com/cockroachdb/cockroach/pkg/sql/flowinfra.init.ializers
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow_registry.go:30
    runtime.main
    	/usr/local/go/src/runtime/proc.go:188
    runtime.goexit
    	/usr/local/go/src/runtime/asm_amd64.s:1337
failed to run migration "change reports fields from timestamp to timestamptz"
github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations
	/go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:553
github.com/cockroachdb/cockroach/pkg/server.(*Server).Start
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1586
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3.2
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:699
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:814
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337
goroutine 22 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc00041c300, 0xc00041c300, 0x0, 0xc00063c6b8)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1024 +0xb1
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x753dbe0, 0xc000000004, 0x6ceef15, 0x10, 0x638, 0xc000ecd900, 0x12c2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:871 +0x95b
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x4a895a0, 0xc00068c540, 0x4000000000000004, 0x2, 0x40efecb, 0x3, 0xc000a5ce40, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x4a895a0, 0xc00068c540, 0x1, 0xc000000004, 0x40efecb, 0x3, 0xc000a5ce40, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:180
github.com/cockroachdb/cockroach/pkg/server.(*Server).Start(0xc0009a0000, 0x4a895a0, 0xc000d40390, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1592 +0x2b9b
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3.2(0xc0006767e0, 0xc000010538, 0xc0002b4220, 0x4a895a0, 0xc000d40390, 0xc00008b700, 0x30cb81ed, 0xed6982724, 0x0, 0x7c716e, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:699 +0x10d
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3(0xc000010538, 0x4a895a0, 0xc000d40390, 0x4af3940, 0xc000b889a0, 0xc0006767e0, 0xc0002b4220, 0x0, 0x30cb81ed, 0xed6982724, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:814 +0x12e
created by github.com/cockroachdb/cockroach/pkg/cli.runStart
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:655 +0x8f1


****************************************************************************

This node experienced a fatal error (printed above), and as a result the
process is terminating.

Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
problem in CockroachDB. With your help, the support team at Cockroach Labs
will try to determine the root cause, recommend next steps, and we can
improve CockroachDB based on your report.

Please submit a crash report by following the instructions here:

    https://github.com/cockroachdb/cockroach/issues/new/choose

If you would rather not post publicly, please contact us directly at:

    [email protected]

The Cockroach Labs team appreciates your feedback.
I200708 20:44:31.509939 1 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] intent_resolver_ir_batcher
1      [async] intent_resolver_gc_batcher
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber

Operator Major Version Upgrades/Downgrades

As an operator, I can perform major version upgrades via rolling restarts,
so that my application can take advantage of new features and run on a
more stable version of CockroachDB without losing availability.

Developer Documentation

Most of what is in the README.md is developer documentation. We probably should move that into a developer document, and start working on user documentation in the README.md.

e2e tests failures on GKE

We are getting some test failures because of Calico annotations:

TestCreatesInsecureCluster/creates_1-node_insecure_cluster: assert.go:10: unexpected result (-want +got):
          strings.Join({
                ... // 14 identical lines
                "kind: Pod",
                "metadata:",
        +       "  annotations:",
        +       "    cni.projectcalico.org/podIP: 10.56.1.5/32",
                "  labels:",
                "    app.kubernetes.io/component: database",
                ... // 233 identical lines
          }, "\n")
--- FAIL: TestCreatesInsecureCluster (24.40s)
    --- FAIL: TestCreatesInsecureCluster/creates_1-node_insecure_cluster (21.38s)
TestCreatesSecureClusterWithGeneratedCert/creates_1-node_secure_cluster: assert.go:10: unexpected result (-want +got):
          strings.Join({
                ... // 15 identical lines
                "kind: Pod",
                "metadata:",
        +       "  annotations:",
        +       "    cni.projectcalico.org/podIP: 10.56.1.6/32",
                "  labels:",
                "    app.kubernetes.io/component: database",
                ... // 299 identical lines
          }, "\n")
--- FAIL: TestCreatesSecureClusterWithGeneratedCert (45.40s)
    --- FAIL: TestCreatesSecureClusterWithGeneratedCert/creates_1-node_secure_cluster (42.37s)

Not cleaning up PVCs

When we delete a database, we do not seem to delete the PVCs correctly.

# you need a bad storage class name or request over quota in the example
# this step has to fail
./hack/apply-apply-crdb-example.sh -c test
# then delete
./hack/apply-delete-crdb-example.sh -c test

You can then see the hanging PVC

$ k get pvc
NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-crdb-0   Pending                                      crdb-io1       3m43s

applying crd requires turning off client side validation (--validate=false)

When applying the CRD (kubectl apply -f config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml), it requires using the --validate=false flag.

At the moment, I receive this error:
error: error validating "config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml": error validating data: ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.spec.properties.dataStore.properties.emptyDir.properties.sizeLimit): unknown field "x-kubernetes-int-or-string" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaProps; if you choose to ignore these errors, turn validation off with --validate=false
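
For reference, the full command with the workaround flag that the error message suggests is:

kubectl apply -f config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml --validate=false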

Environment:
Running on a k8s cluster on GKE, following these instructions: https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes.html

kubectl version:
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.36", GitCommit:"34a615f32e9a0c9e97cdb9f749adb392758349a6", GitTreeState:"clean", BuildDate:"2020-04-06T16:33:17Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

Improve testing - using text files to represent objects could be improved

Testing against strings that represent objects is a bit too fragile for my preference. I am wondering if

var expectedPod *v1.Pod
cmp.Diff(expectedPod, actualPod)

would work better. When we update our API or the k8s API is updated, I do not want to modify a ton of text documents. Also, as I ran into with Calico, the objects are even modified by the Kubernetes server. I had to remove the Pod annotations added by Calico in order to get the e2e testing to pass.

I like it when unit tests won't compile when APIs are updated. I am uncertain about how to address this well.

I would love feedback on this design.

Makefile improvement

  • documentation in the file
  • .PHONY statements
  • allow for specific testing targets with variables

Update CRDB configuration

Provide a mechanism for updating the configuration of an existing CRDB cluster (e.g., updating cache settings).

RedHat Marketplace support

As an operator, I can get the latest OpenShift certified CockroachDB
Kubernetes Operator on the Red Hat Marketplace, so that I can more
easily purchase and deploy CockroachDB in my OpenShift environment.

Scripts for CI Support

This is a larger issue tracking sub-tasks. We need to get this project building within TeamCity.

  • script out unit testing. Might be able to use Makefile
  • push containers for testing
  • create containers
  • script out integration testing (big task)

Note: we need to ensure that #36 will run on the build servers or use a container.

Feature parity with helm chart

Our helm chart currently has features not found in this operator. One of our goals for the operator is to be able to deprecate and replace the helm chart.

There's some upcoming maintenance work that we'd like to be able to avoid due to infrastructure changes in the helm project. The deadline to either do that work or deprecate the helm chart is Aug 13.

Error when deleting

k delete -f ../config/examples/example.yaml

and we get

2020-07-08T18:52:28.256Z	ERROR	controller.CrdbCluster	failed to retrieve CrdbCluster resource	{"CrdbCluster": "default/crdb", "error": "CrdbCluster.crdb.cockroachlabs.com \"crdb\" not found"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile
	/workspace/pkg/controller/cluster_controller.go:50
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88

Minor Version Upgrades/Downgrades

As an operator, I can perform minor version upgrades via rolling restarts,
so that my application can start using a more stable version of
CockroachDB without losing availability.

Minor version upgrades do not require a finalization step.
