pravega-operator's Introduction

Pravega is an open source distributed storage service implementing Streams. It offers Stream as the main primitive for the foundation of reliable storage systems: a high-performance, durable, elastic, and unlimited append-only byte stream with strict ordering and consistency.

To learn more about Pravega, visit https://pravega.io

Prerequisites

  • Java 11+

Although JDK 11+ is required to build this project, the client artifacts (and their dependencies) must remain compatible with a Java 8 runtime. All other components are built and run using JDK 11+.

The clientJavaVersion project property determines the version used to build the client (defaults to 8).
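
For example, to build the client against a newer Java version, the property can be overridden on the command line (the value shown is illustrative):

./gradlew distribution -PclientJavaVersion=11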

Building Pravega

Checkout the source code:

git clone https://github.com/pravega/pravega.git
cd pravega

Build the pravega distribution:

./gradlew distribution

Install the Pravega jar files into the local Maven repository. This is handy for running pravega-samples locally against a custom version of Pravega.

./gradlew install

Running unit tests:

./gradlew test

Setting up your IDE

Pravega uses Project Lombok, so you should ensure your IDE is set up with the required plugins. Using IntelliJ IDEA is recommended.

To import the source into IntelliJ:

  1. Import the project directory into IntelliJ. It will automatically detect the Gradle project and import it correctly.
  2. Enable annotation processing by going to Build, Execution, Deployment -> Compiler -> Annotation Processors and checking 'Enable annotation processing'.
  3. Install the Lombok Plugin. This can be found in Preferences -> Plugins. Restart your IDE.
  4. Pravega should now compile properly.

For Eclipse, you can generate Eclipse project files by running ./gradlew eclipse.

Note: Some unit tests will create (and delete) a significant number of files. For improved performance on Windows machines, be sure to add the appropriate 'Microsoft Defender' exclusion.

Releases

The latest Pravega releases can be found on the GitHub Releases page.

Snapshot artifacts

All snapshot artifacts from master and release branches are available in the GitHub Packages Registry.

Add the following to your repositories list and import dependencies as usual.

maven {
    url "https://maven.pkg.github.com/pravega/pravega"
    credentials {
        username = "pravega-public"
        password = "\u0067\u0068\u0070\u005F\u0048\u0034\u0046\u0079\u0047\u005A\u0031\u006B\u0056\u0030\u0051\u0070\u006B\u0079\u0058\u006D\u0035\u0063\u0034\u0055\u0033\u006E\u0032\u0065\u0078\u0039\u0032\u0046\u006E\u0071\u0033\u0053\u0046\u0076\u005A\u0049"
    }
}

Note: GitHub Packages requires authentication to download packages, so the credentials above are required. Use the provided password as-is; please do not decode it.
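
With the repository configured, snapshot dependencies can then be declared as usual, for example (the version shown is illustrative):

dependencies {
    implementation "io.pravega:pravega-client:0.13.0-SNAPSHOT"
}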

If you need a dedicated token to use in your repository (and GitHub Actions) please reach out to us.

As an alternative, you can use JitPack (https://jitpack.io/#pravega/pravega) to get pre-release artifacts.

Quick Start

Read the Getting Started page for more information, and visit the sample-apps repo for more example applications.

Running Pravega

Pravega can be installed locally or in a distributed environment. The installation and deployment of Pravega is covered in the Running Pravega guide.

Support

Don’t hesitate to ask! Contact the developers and community on Slack (signup) if you need any help. If you find a bug, open an issue on GitHub Issues.

Documentation

The Pravega documentation is hosted on the website: https://pravega.io/docs/latest or in the documentation directory of the source code.

Contributing

Become one of the contributors! We strive to build a welcoming and open community for anyone who wants to use the system or contribute to it. Here we describe how to contribute to Pravega! You can see the roadmap document here.

About

Pravega is 100% open source and community-driven. All components are available under Apache 2 License on GitHub.

pravega-operator's Issues

Controller resource structure

Proposal for initial controller resource structure:

controller

Controller should be defined as either a Deployment or ReplicaSet. A single controller service provides access to the cluster.

Add an initContainer field to the PravegaClusterResource

Motivation:

To facilitate customizing a Pravega cluster installation, it would be very useful to have an initContainer "hook" such that one can inject files into the controller's runtime environment. The concrete use for now is the ability to provide plugin implementations (the auth handler being the primary one at the moment, I think).

Proposal:

 pravega:
    controllerReplicas: 1
    segmentStoreReplicas: 3

    plugins:  <-- new element
      images: [ "repohost/reponame/imagename:tag" ]  

    cacheVolumeClaimTemplate:

When the plugins element is seen, the operator would do the following:

  1. add an empty volume to the controller pod
  2. mount this volume at /opt/pravega/plugin/lib in an init container of the main controller deployment spec, using the specified image from the plugins spec
  3. mount that same volume at the same location in the controller container itself.

Let k8s do the rest. :)
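
A rough sketch of the controller pod spec the operator might generate for this (the volume and container names are illustrative; the image is taken from the example above):

    volumes:
      - name: plugin-lib
        emptyDir: {}
    initContainers:
      - name: plugin-installer
        image: "repohost/reponame/imagename:tag"
        volumeMounts:
          - name: plugin-lib
            mountPath: /opt/pravega/plugin/lib
    containers:
      - name: pravega-controller
        volumeMounts:
          - name: plugin-lib
            mountPath: /opt/pravega/plugin/lib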

The requirement for the init-container image implementation is simply to:

  1. contain a fat jar for the desired plugin implementation
  2. have an entry point/command that automatically copies the jar into /opt/pravega/plugin/lib.

This will allow those jar files to show up in the controller container's /opt/pravega/plugin/lib directory, ready to use.

Note: the proposal uses a list for images, as there could be multiple init containers back to back which, according to the k8s documentation, would execute one after the other in the order of the list.

Identify Operator goals and required actions

Some operator goals are fairly obvious (like deployment and scaling), but we should identify which goals are required for the operator to be "minimally viable" for release. Also, the separation of duties between the Pravega Operator and other projects (Nautilus, Flink, etc.) should be considered.

This issue should track the goals considered for initial release. Once identified, these will be tracked as issues and added to the 0.0.1 milestone.

Bookkeeper's ledger & journal volumes are not getting cleaned up after destroying the Pravega cluster

After destroying a Pravega cluster, the Bookkeeper volumes are not cleaned up, so they contain data from older deployments. Due to the stale volumes, new deployments of bookie pods fail with the error below:

2018-10-31 10:27:44,400 - INFO  - [main-SendThread(10.100.200.42:2181):ClientCnxn$SendThread@1381] - Session establishment complete on server 10.100.200.42/10.100.200.42:2181, sessionid = 0x1000384e32b00a7, negotiated timeout = 10000
2018-10-31 10:27:44,402 - INFO  - [main-EventThread:ZooKeeperWatcherBase@131] - ZooKeeper client is connected now.
2018-10-31 10:27:44,491 - INFO  - [main:BookieNettyServer@382] - Shutting down BookieNettyServer
2018-10-31 10:27:44,497 - ERROR - [main:BookieServer@435] - Exception running bookie server :
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4
bookieHost: "10.200.84.29:3181"
journalDir: "/bk/journal"
ledgerDirs: "1\t/bk/ledgers"
instanceId: "f3db659f-6039-4399-b76e-8d5bedbf2bd7"
] is not matching with [4
bookieHost: "10.200.59.7:3181"
journalDir: "/bk/journal"
ledgerDirs: "1\t/bk/ledgers"
instanceId: "f3db659f-6039-4399-b76e-8d5bedbf2bd7"
]
       at org.apache.bookkeeper.bookie.Cookie.verifyInternal(Cookie.java:141)
        at org.apache.bookkeeper.bookie.Cookie.verify(Cookie.java:152)
        at org.apache.bookkeeper.bookie.Bookie.checkEnvironment(Bookie.java:329)
        at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:687)
        at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:124)
        at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:100)
        at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:418)

Segment store resource structure

There are a few open questions regarding the proper design for the segment store resources:

  • Does the segment store require a persistent volume, or should the local disk be considered ephemeral?
  • Should the Bookkeeper be included within the segment store pod? If these are always expected to scale in concert, then it would make sense to keep them together.
  • Segment stores find the bookkeeper instances via Zookeeper. Does each segment store write directly to the associated Bookie?

Handle Go errors correctly

The ReconcilePravegaCluster() function does not seem to handle Go errors correctly:

deployBookie(pravegaCluster)
if err != nil {
	return err
}

err in this case will never be set to any error returned by deployBookie(). We should change the first line in this scenario to err = deployBookie(pravegaCluster).
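
The corrected form, as suggested above (assuming err is already declared in the enclosing function):

// capture the returned error so the check below actually sees it
err = deployBookie(pravegaCluster)
if err != nil {
	return err
}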

Allow Authorization to be configured

Currently the operator configures Pravega with authorization turned off. This should be modified to allow authorization to be configured and turned on.

An example from Pravega controller configuration:

configData := map[string]string{
	"CLUSTER_NAME":           pravegaCluster.Name,
	"ZK_URL":                 pravegaCluster.Spec.ZookeeperUri,
	"JAVA_OPTS":              strings.Join(javaOpts, " "),
	"REST_SERVER_PORT":       "10080",
	"CONTROLLER_SERVER_PORT": "9090",
	"AUTHORIZATION_ENABLED":  "false",
	"TOKEN_SIGNING_KEY":      "secret",
	"USER_PASSWORD_FILE":     "/etc/pravega/conf/passwd",
	"TLS_ENABLED":            "false",
	"WAIT_FOR":               pravegaCluster.Spec.ZookeeperUri,
}
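
A rough sketch of how these entries could be driven by the resource spec instead of hardcoded values (the Spec.Authorization field and its members are hypothetical, not part of the current CRD; this continues the configData example above):

// hypothetical spec fields for authorization settings
authEnabled := "false"
tokenSigningKey := "secret"
if auth := pravegaCluster.Spec.Authorization; auth != nil && auth.Enabled {
	authEnabled = "true"
	tokenSigningKey = auth.TokenSigningKey
}
configData["AUTHORIZATION_ENABLED"] = authEnabled
configData["TOKEN_SIGNING_KEY"] = tokenSigningKey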

Identify Pravega top-level structure

The Pravega project is made up of a number of components, which will each be defined as independent Kubernetes resources. For each component, we must evaluate both the appropriate resource type, resource requirements (connectivity, volumes, configuration, etc.) and inter-dependencies.

Push docker image to docker hub

We are going to open pravega-operator to the public, so we need to move the Docker images from the private location to a well-known place for the Pravega community.

Update ReadMe with further information

The ReadMe file needs to be updated with the following:

  • References to Zookeeper Operator removed
  • Specify that an instance of Zookeeper 3.5 must be installed (perhaps suggest the Zookeeper Operator)
  • An example of using kubectl get all -l apps=example to show the running resources

Admission Controller for PravegaCluster resource

Currently the resource contains the Bookkeeper and Pravega image definitions. The operator should have sensible defaults set in its configuration and apply these to the resource via a mutating Admission Controller.

Support external connectivity

Overview

The Pravega clusters that are produced by the operator should support external connectivity (i.e. connectivity from outside the Kubernetes cluster). The specific endpoints in question are the controller RPC/REST ports, and the segment store RPC port.

Challenges

Pravega ingests data directly from the client to a dynamic set of segment stores, unlike a conventional service that relies on a stable, load-balanced endpoint. The client discovers the segment stores with the help of the controller, which is aware of active segment stores and their endpoint addresses. Specific challenges include:

  • advertising usable addresses to the client
  • facilitating transport encryption (TLS) to the segment store (e.g. supporting hostname verification)
  • optimizing internal connectivity vs external connectivity (e.g. avoiding an expensive route when possible)

Vendor Specifics

  • PKS: has option to use NSXT for Ingress. Istio is apparently on the roadmap.
  • GKE: see references at bottom

Implementation

For conventional services, external connectivity is generally accomplished with an Ingress resource. Ingress primarily supports HTTP(s) and it is unclear whether gRPC (which is HTTP/2-based) is supported (ref).

Ingress is probably not suitable for exposing the segment store. For workloads that are similar to Pravega, e.g. Kafka, the typical solution is to use a NodePort.

Keep in mind that Ingress and services of type LoadBalancer may incur additional costs in cloud environments (GCP pricing).

Multiple Advertised Addresses

Certain Pravega clients will be internal to the cluster, others external. Imagine that the segment store advertised only an external address (backed by a NodePort or other type of service); would the performance of internal clients suffer due to a needlessly expensive route? A mitigation would be to introduce support for multiple advertised addresses ("internal"/"external"). Given a prioritized list, the client could strive to connect to the cheapest endpoint.

This idea could extend to full-fledged multi-homing, where the segment store binds to a separate interface/port per endpoint, possibly with a separate SSL configuration per endpoint.

NodePort Details

Be sure to set the externalTrafficPolicy field of the Service to Local. This will ensure that traffic entering a given VM will be routed to the segment store on that same VM.

One limitation of NodePort is that only a single segment store may be scheduled on a given cluster node. If multiple were to be scheduled, some would fail with a port-unavailable error. One way to avoid this is to use a DaemonSet to manage the segment store pods.
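
A minimal sketch of such a Service (the name and selector label are illustrative; port 12345 is the segment store port seen in the logs further below):

apiVersion: v1
kind: Service
metadata:
  name: pravega-segmentstore-external
spec:
  type: NodePort
  externalTrafficPolicy: Local   # route traffic only to pods on the receiving node
  selector:
    component: pravega-segmentstore
  ports:
    - port: 12345
      targetPort: 12345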

References

Bookkeeper resource structure

Proposal for initial bookkeeper resource structure:

bookkeeper

Bookkeeper should be defined as a StatefulSet with Read Write Once (RWO) volumes identified for each Pod. A single bookkeeper service provides access to the cluster.

Allow empty namespace to be passed in

Currently the operator requires a namespace; however, a blank namespace ("") specifies that the operator should watch all namespaces and should be an allowed value.

Zookeeper resource structure

Proposal for initial zookeeper resource structure:

zookeeper

Zookeeper should be defined as a StatefulSet with Read Write Once (RWO) volumes identified for each Pod. A single zookeeper service provides access to the cluster.

Operator crashes if Spec is invalid

If the PravegaCluster resource spec is invalid the operator crashes. It should handle invalid specifications without crashing.

For example, the following specification will crash the operator because options expects map[string]string but numbers have been specified:

apiVersion: "pravega.pravega.io/v1alpha1"
kind: "PravegaCluster"
metadata:
  name: "nautilus"
  namespace: {{ .Values.pravegaNamespace }}
spec:
  zookeeperUri: nautilus-pravega-zookeeper-client:2181

  bookkeeper:
    image:
      repository: pravega/bookkeeper
      tag: 0.3.0
      pullPolicy: IfNotPresent

    replicas: 3

    storage:
      ledgerVolumeClaimTemplate:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 10Gi

      journalVolumeClaimTemplate:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "standard"
        resources:
          requests:
            storage: 10Gi

    autoRecovery: true

  pravega:
    controllerReplicas: 1
    segmentStoreReplicas: 3

    cacheVolumeClaimTemplate:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 20Gi

    image:
      repository: pravega/pravega
      tag: 0.3.0
      pullPolicy: IfNotPresent

    options:
      metrics.enableStatistics: "true"
      metrics.statsdHost: telegraf
      metrics.statsdPort: 8125

    tier2:
      filesystem:
        persistentVolumeClaim:
          claimName: nautilus-pravega-tier2

This results in the operator crashing while parsing the options, which should be strings:

    options:
      metrics.enableStatistics: "true"
      metrics.statsdHost: "telegraf"
      metrics.statsdPort: "8125"

Add versioning

Since we are about to open up the repo and make the first release, we need to add versioning to the project.

Ensure data protection (DU/DL)

Operator should ensure that components (e.g. bookies) are deployed in a reasonable way from a data protection perspective. Kubernetes has features including anti-affinity, multi-zone, and pod disruption budgets.

Achieving adequate data protection likely has deployment- and upgrade-time considerations, e.g. anti-affinity to prevent DL, pod disruption budget to prevent DU during upgrade/maintenance. Note that Kubernetes has support for "planned" node maintenance operations, e.g. kubectl drain (ref). Such operations respect the pod disruption budget.

For a conceptual overview, please read the Disruptions section of the Kubernetes documentation.

Add support for multiple Bookie ledger directories

Bookkeeper supports writing to multiple ledger directories for better performance. The ledgers should each be a separate volume, supplied to the bookie via ledgerDirs, which is a comma-separated list of mounted directories.
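
For illustration, with two mounted ledger volumes the supplied value might look like this (the mount paths are hypothetical):

ledgerDirs: "/bk/ledgers/dir1,/bk/ledgers/dir2"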

Operator versioning strategy

The operator will use SemVer for versioning. The initial release will be 0.1.0, with the following meaning:

  • 0: indicates initial development. Anything may change at any time. The public API should not be considered stable.
  • 1: first set of features.
  • 0: no bug fixes yet.

After each release, we will immediately append the +git suffix to the version number to indicate that there are changes on top of that version. E.g. after the first release, we will change the version to 0.1.0+git. This is what the Operator SDK and other operators are also doing.

Handle Resource Deletions

Currently the Operator only handles creation of resources when a PravegaCluster resource is created. It should also handle the PravegaCluster resource being deleted and destroy all corresponding Kubernetes resources for that PravegaCluster.

Support Pravega upgrades

As new Pravega versions are released, users will want to upgrade to a newer version. The operator should provide support for such upgrades with minimal to no disruption.

Cleanup Zookeeper MetaData on PravegaCluster Delete

When deleting a PravegaCluster, the Kubernetes resources are removed but the metadata for the cluster remains in Zookeeper. This metadata should be removed in order to clean everything up, as well as to stop it from interfering if a cluster with the same name is recreated.

Validate Tier2 Configuration with Admissions Webhook

Validate that the provided Tier2 configuration is correct before the PravegaCluster Resource is committed to the API. It is assumed that an Admissions Webhook would be put into the controller in order to perform this validation.

Add extra options to configurations

The PravegaCluster resource allows extra options to be specified for both Bookkeeper and Pravega; however, these are currently NOT added to the configuration ConfigMaps.

Support for `PravegaCluster` resource status

Report the high-level status / conditions via the status section of the custom resource. A key scenario is to enable Helm to --wait on the deployment of the cluster.

Ideally the status would accurately indicate whether Pravega is ready to create streams. Note that the Pravega controller startup routine involves more than exposing a TCP endpoint; some system streams are created before Pravega is truly ready. Could the true readiness be detected somehow?

The solution will involve adding a status section to the CRD, and using the UpdateStatus method.

Note that the status section is expected to have some well-known elements, e.g. conditions. The word conditions has a meaning like "weather conditions" not "conditional logic". There is a mindful design underlying this (example).
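
A minimal sketch of what such a status section could look like on the PravegaCluster resource (the condition type and reason shown are illustrative, not a final design):

status:
  conditions:
    - type: Ready                 # illustrative condition type
      status: "False"
      reason: SystemStreamsNotReady
      message: The Pravega controller has not finished creating system streams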

Pravega IO workload failed when Pravega Controller end point is accessed from outside cluster using 'kubectl' port forward

Running the Pravega-Benchmark IO tool against the Pravega controller endpoint through kubectl port forwarding fails. It may be because the pravega-segment-store endpoints are not being forwarded through kubectl.

However, the expectation here is that a client should be able to access Pravega via the Pravega controller endpoint from outside the Kubernetes cluster. For example, if a customer wants to process or run analytics on an existing IO workload and IoT sensors outside the Kubernetes environment, is there a way to achieve that with Pravega running on a Kubernetes cluster?

[root@rhel ~]# kubectl port-forward -n default pravega-pravega-controller-7489c9776d-vdqxl 9090:9090 10080:10080
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from 127.0.0.1:10080 -> 10080
Handling connection for 9090
Handling connection for 9090

Error logs:
[root@rhel pravega-benchmark]# ./pravega-benchmark/bin/pravega-benchmark  --controller tcp://127.0.0.1:9090 --stream StreamName-r1  -producers 1 --size 1000  -eventspersec 300000 --runtime 60 --randomkey true  --writeonly true
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] INFO io.pravega.client.stream.impl.ControllerImpl - Controller client connecting to server at 127.0.0.1:9090
[StreamManager-Controller-1] INFO io.pravega.client.stream.impl.ControllerResolverFactory - Updating client with controllers: [[addrs=[/127.0.0.1:9090], attrs={}]]
[grpc-default-executor-0] WARN io.pravega.client.stream.impl.ControllerImpl - Scope already exists: Scope
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] INFO io.pravega.client.stream.impl.ControllerImpl - Controller client connecting to server at 127.0.0.1:9090
[main] INFO io.pravega.client.admin.impl.StreamManagerImpl - Creating scope/stream: Scope/StreamName-r1 with configuration: StreamConfiguration(scope=Scope, streamName=StreamName-r1, scalingPolicy=ScalingPolicy(scaleType=FIXED_NUM_SEGMENTS, targetRate=0, scaleFactor=0, minNumSegments=1), retentionPolicy=null)
[grpc-default-executor-0] WARN io.pravega.client.stream.impl.ControllerImpl - Stream already exists: StreamName-r1
[pool-1-thread-1] INFO io.pravega.client.stream.impl.ControllerResolverFactory - Updating client with controllers: [[addrs=[/127.0.0.1:9090], attrs={}]]
Current segments of the stream: StreamName-r1 = 1
[main] WARN io.pravega.client.ClientConfig - The credentials are not specified or could not be extracted.
[main] INFO io.pravega.client.stream.impl.ControllerResolverFactory - Shutting down ControllerNameResolver
[main] INFO io.pravega.client.stream.impl.ClientFactoryImpl - Creating writer for stream: StreamName-r1 with configuration: EventWriterConfig(initalBackoffMillis=1, maxBackoffMillis=20000, retryAttempts=10, backoffMultiple=10, transactionTimeoutTime=29999)
[main] INFO io.pravega.client.stream.impl.SegmentSelector - Refreshing segments for stream StreamImpl(scope=Scope, streamName=StreamName-r1)
[clientInternal-1] INFO io.pravega.client.segment.impl.SegmentOutputStreamImpl - Fetching endpoint for segment Scope/StreamName-r1/3.#epoch.1, writerID: b9782b6b-ca3f-472c-81bd-2413d69a7887
[clientInternal-1] INFO io.pravega.client.segment.impl.SegmentOutputStreamImpl - Establishing connection to PravegaNodeUri(endpoint=10.32.0.8, port=12345) for Scope/StreamName-r1/3.#epoch.1, writerID: b9782b6b-ca3f-472c-81bd-2413d69a7887
[epollEventLoopGroup-4-1] INFO io.pravega.client.netty.impl.ClientConnectionInboundHandler - Connection established ChannelHandlerContext(ClientConnectionInboundHandler#0, [id: 0xa981360b])
[epollEventLoopGroup-4-1] WARN io.pravega.client.netty.impl.ClientConnectionInboundHandler - Keep alive failed, killing connection 10.32.0.8 due to DefaultChannelPromise@31ffd58e(uncancellable)
[epollEventLoopGroup-4-1] WARN io.pravega.client.segment.impl.SegmentOutputStreamImpl - b9782b6b-ca3f-472c-81bd-2413d69a7887 Failed to connect:
java.util.concurrent.CompletionException: io.pravega.shared.protocol.netty.ConnectionFailedException: java.nio.channels.ClosedChannelException
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.biApply(CompletableFuture.java:1088)
        at java.util.concurrent.CompletableFuture$BiApply.tryFire(CompletableFuture.java:1070)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at io.pravega.client.netty.impl.ConnectionFactoryImpl$2.operationComplete(ConnectionFactoryImpl.java:166)
        at io.pravega.client.netty.impl.ConnectionFactoryImpl$2.operationComplete(ConnectionFactoryImpl.java:155)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
        at io.netty.channel.epoll.AbstractEpollChannel.doClose(AbstractEpollChannel.java:163)
        at io.netty.channel.epoll.AbstractEpollStreamChannel.doClose(AbstractEpollStreamChannel.java:686)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:763)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:740)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:611)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1301)
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
        at io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
        at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:465)
        at io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:973)
        at io.netty.channel.AbstractChannel.close(AbstractChannel.java:238)
        at io.pravega.client.netty.impl.ClientConnectionInboundHandler.close(ClientConnectionInboundHandler.java:164)
        at io.pravega.client.netty.impl.ClientConnectionInboundHandler$KeepAliveTask.run(ClientConnectionInboundHandler.java:195)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:126)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.pravega.shared.protocol.netty.ConnectionFailedException: java.nio.channels.ClosedChannelException
        ... 34 more
Caused by: java.nio.channels.ClosedChannelException
        at io.netty.channel.epoll.AbstractEpollChannel.doClose()(Unknown Source)
[epollEventLoopGroup-4-1] ERROR io.pravega.shared.protocol.netty.ExceptionLoggingHandler - Uncaught exception on connection 10.32.0.8
java.nio.channels.ClosedChannelException

Note: here, the 10.32.0.8 IP address belongs to pravega-segment-store-2; the Pravega-Benchmark client tries to access the segment store directly from outside the cluster and eventually fails.

Update ReadMe with GKE Tier2 Storage Options

The current 'demo' uses the NFS provisioner to provide some basic Tier2 storage. ECS is obviously not an option, and Google Cloud Storage, although it has an HDFS connector, does not support the HDFS Append command required by Pravega.

Google Filestore presents itself as an NFS share and would therefore be a compatible native storage option for installing on GKE.

Allow k8s service accounts to be configured for the PravegaCluster resource

In some environments, the various components of the Pravega cluster may need to run under specific k8s service accounts. Such service accounts would have different types of secrets, permissions and/or annotations.

Two components of interest are the controller and the segment stores. Webhooks may attach different secrets to the service accounts associated with the pods running these components. Pravega's auth plugin implementations will often require a way to inject credentials-related data.

Taking as an example an implementation where the controller has a plugin that can validate an OAuth token, the controller will need some issuer-related configuration injected. The segment store will need some secret/configuration in order to obtain a token when talking to the controller.

All this can be facilitated with opaque (to the operator) annotations on the k8s service accounts that get associated with the pods. What we need here is a way to specify which k8s service accounts should be associated with the controller and the segment store.

Something along the lines of:

...
pravega:
   controller-service-account: foo
   segment-store-service-account: bar
...

The chart deploying the PravegaCluster resource will be responsible for creating foo and bar, but the operator should be responsible for setting the appropriate service account names on the corresponding pods.
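
In practice this would mean the operator sets the standard serviceAccountName field on the generated pod templates, along these lines (a sketch only; the container name and image are illustrative):

spec:
  template:
    spec:
      serviceAccountName: foo          # from the controller-service-account setting
      containers:
        - name: pravega-controller
          image: pravega/pravega:0.3.0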

Automatically create a PersistentVolumeClaim for tier-2 storage

To reduce the amount of manual work needed to deploy a cluster, the operator should automatically create a claim for tier-2 based on a template. The current implementation relies on an existing claim. Seems useful to support both options.

The claim would always be of ReadWriteMany. The storageClassName is the more important input, in addition to resources.

The template approach will combine nicely with CSI plugins, e.g. gcp-filestore-csi-driver, which automatically create PVs given PVCs.
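
Following the pattern of the existing volume claim templates in the spec, a tier-2 claim template might look like the sketch below (the element name tier2VolumeClaimTemplate and the storage class are illustrative):

    tier2:
      filesystem:
        tier2VolumeClaimTemplate:
          accessModes: [ "ReadWriteMany" ]
          storageClassName: "nfs"
          resources:
            requests:
              storage: 50Gi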

Add SegmentStore RocksDB cache Volume

The segment store uses a RocksDB cache that should be on a dedicated fast volume. The volume is ephemeral and is not required if the process is restarted.

From Pravega Configuration (https://github.com/pravega/pravega/blob/master/config/config.properties#L347):

#Path to the working directory where RocksDB can store its databases. The contents of this folder can be discarded after
#the process exits (and it will be cleaned up upon startup), but Pravega requires exclusive use of this while running.
#Recommended values: a path to a locally mounted directory that sits on top of a fast SSD.
#rocksdb.dbDir=/tmp/pravega/cache
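
One way to provide this would be an ephemeral volume mounted at the configured dbDir, for example (a sketch using an emptyDir; the container name is illustrative, and the comment above recommends a fast SSD-backed disk):

      volumes:
        - name: rocksdb-cache
          emptyDir: {}
      containers:
        - name: pravega-segmentstore
          volumeMounts:
            - name: rocksdb-cache
              mountPath: /tmp/pravega/cache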

Support cluster rescaling

Support scale changes for the various components (e.g. bookies, segment stores).

Scale down might not be supported by all components, or may require special attention. For example, it may be necessary to drain the tier-1 storage when tearing down a given bookie. Similar functionality may be implied by #60.

Error trying to run the operator locally

I'm trying to run the operator locally as instructed in the operator-sdk user guide, using the operator-sdk up local command. However, the operator panics and outputs the following log:

$ operator-sdk up local
INFO[0000] Go Version: go1.11                                                                                                       
INFO[0000] Go OS/Arch: linux/amd64                                                                                                  
INFO[0000] operator-sdk Version: 0.0.5+git                                                                                          
INFO[0000] Watching pravega.pravega.io/v1alpha1, PravegaCluster, default, 5                                                        
panic: No Auth Provider found for name "gcp"

goroutine 1 [running]:
github.com/pravega/pravega-operator/vendor/k8s.io/client-go/kubernetes/typed/admissionregistration/v1alpha1.NewForConfigOrDie(0xc0001ef0e0, 0xc0002b82d0)
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/k8s.io/client-go/kubernetes/typed/admissionregistration/v1alpha1/admissionregistration_client.go:58 +0x65                                                               
github.com/pravega/pravega-operator/vendor/k8s.io/client-go/kubernetes.NewForConfigOrDie(0xc0001ef0e0, 0x0)                        
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/k8s.io/client-go/kubernetes/clientset.go:529 +0x49
github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.mustNewKubeClientAndConfig(0x56, 0xc00018bc20, 0xe83800)
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:138 +0x68
github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.newSingletonFactory()          
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:52 +0x34
sync.(*Once).Do(0x1b9d370, 0x114b4b8)
        /home/adrian/.gvm/gos/go1.11/src/sync/once.go:44 +0xb3
github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.GetResourceClient(0x10e5c3e, 0x1b, 0x10da599, 0xe, 0xc00003e450, 0x7, 0xc000429380, 0xc00018be10, 0xe8268e, 0xc0000e14f0, ...)                                     
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:70 +0x3d
github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.Watch(0x10e5c3e, 0x1b, 0x10da599, 0xe, 0xc00003e450, 0x7, 0x12a05f200, 0x0, 0x0, 0x0)
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/api.go:45 +0x84
main.main()
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/pravega-operator/cmd/pravega-operator/main.go:33 +0x287     
exit status 2
Error: failed to run operator locally: exit status 1

I did some quick research and found sources that suggest importing the k8s.io/client-go/plugin/pkg/client/auth/gcp package in the main.go file. I'll give it a try and submit a PR if it fixes the issue.
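
For reference, the commonly suggested fix is a blank import of the auth plugin package in main.go, along these lines:

import (
	// side-effect import: registers the GCP auth provider with client-go
	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
)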
