opensearch-k8s-operator's Introduction

OpenSearch Kubernetes Operator

The OpenSearch Kubernetes Operator is used for automating the deployment, provisioning, management, and orchestration of OpenSearch clusters and OpenSearch Dashboards.

Getting started

The Operator can be easily installed using Helm on any CNCF-certified Kubernetes cluster. Please refer to the User Guide for installation instructions.

Roadmap

  • Auto-Scaler.
  • OpenShift support.
  • Data-prepper support.

Current feature list

Features:

  • Deploy a new OpenSearch cluster.
  • Deploy multiple clusters.
  • Spin up OpenSearch Dashboards.
  • Configure all node roles (master, data, coordinating, etc.).
  • Scale the cluster resources (manually), per node role group.
  • Drain strategy for scale-down.
  • Version updates.
  • Change nodes' memory allocation and limits.
  • Secured installation features.
  • Certificate management.
  • Rolling restarts through the API.
  • Scale nodes' disks (increase disk size).
  • Update cluster configurations and node settings.
  • Operator monitoring with Prometheus and Grafana.

Installation

The Operator can be easily installed using Helm:

  1. Add the helm repo: helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
  2. Install the Operator: helm install opensearch-operator opensearch-operator/opensearch-operator
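
For copy-paste convenience, the same two commands from the steps above:

helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
helm install opensearch-operator opensearch-operator/opensearch-operator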

OpenSearch Kubernetes Operator installation & demo video

Watch the video

Compatibility

The OpenSearch Kubernetes Operator aims to be compatible with all supported OpenSearch versions. Please check the table below for details:

Operator Version | Min Supported OpenSearch Version | Max Supported OpenSearch Version | Comment
2.3              | 1.0                              | 2.8                              |
2.2              | 1.0                              | 2.5                              |
2.1              | 1.0                              | 2.3                              |
2.0              | 1.0                              | 2.3                              |
1.x              | 1.0                              | 1.x                              |
0.x              | 1.0                              | 1.x                              | Beta

This table only lists versions that have been explicitly tested with the operator; the operator will not prevent you from using other versions. Newer minor versions (2.x) not listed here generally also work, but you should proceed with caution and test them in a non-production environment first.

Development

If you want to develop the operator, please see the separate developer docs.

Contributions

We welcome contributions! See how you can get involved by reading CONTRIBUTING.md.

opensearch-k8s-operator's Issues

explicitly list supported OpenSearch & OpenSearch Dashboards versions

currently neither the README nor the release notes (on GitHub) list the versions of OpenSearch and OpenSearch Dashboards which are supported by this version of the operator.

it'd be great if:

  • the README would contain a compatibility matrix for all still supported operator versions (i.e. the version of the README at the tip of the master branch would always reflect the current state, checking the README as of an older version will just confirm the compatibility for that version w/o stating whether it's still supported in any way)
  • the release notes would always list the currently compatible version range for that operator version

Create scaler service

Create a standalone service called scaler which runs in a Kubernetes pod.

The service should be able (via APIs) to scale nodes up or down according to node role.

Detailed explanation:
As part of the architecture, the operator will be responsible for loading service workers (like this scaler).
The scaler service will be the one that handles what is needed in order to scale the cluster up or down.

Workflow:

The service will receive an API call to increase or decrease the number of nodes for the desired node role.
If the service finds that more nodes should be added, it will update the CRD with the desired number of nodes.
If the service finds that nodes should be removed from the cluster, it will trigger a drain process which, on successful completion, should update the CRD via the API.
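
As a rough sketch of the kind of CRD change the scaler would make (the nodePools structure is borrowed from the cluster examples elsewhere in this document; the values are illustrative only):

nodePools:
  - component: nodes
    replicas: 5   # scaler raises this from 3 to 5 to scale the "nodes" role group up
    diskSize: "100Gi"
    roles:
      - "data"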

Allow override of BusyBox image

It should be possible to override the image for BusyBox, either at the operator level (as an argument when starting) or as part of the CRD.

Opensearch Operator + Istio

I have tried to create an Opensearch cluster in a namespace that has:

  • Istio enabled
  • The Istio sidecar image is vendored and the registry hosting the sidecar requires an imagePullSecret

Cluster creation fails with image pull errors of the istio sidecar.

I attempted to apply configuration that includes the appropriate stanza in the general portion of the spec:

spec:
  general:
    imagePullSecrets:
      - name: XXX

This still does not succeed. When first creating the opensearchcluster resource, imagePullSecrets is omitted. If I apply the identical resource a second time, the imagePullSecrets configuration appears in the output of kubectl describe. Unfortunately, even after applying the configuration twice so that imagePullSecrets is set, the containers managed by the operator do not include imagePullSecrets in their specification. I would expect imagePullSecrets to take effect without applying the configuration twice.

Example:

$ cat ha-poc-third-try.yml
---
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: ha-poc-third-try
spec:
  general:
    serviceName: os-ha-poc-third-try
    version: 1.3.1
    imagePullSecrets:
      - name: XXX
  dashboards:
    enable: true
    version: 1.3.1
    replicas: 1
    resources:
      requests:
         memory: "512Mi"
         cpu: "200m"
      limits:
         memory: "512Mi"
         cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "5Gi"
      NodeSelector:
      resources:
         requests:
            memory: "2Gi"
            cpu: "500m"
         limits:
            memory: "2Gi"
            cpu: "500m"
      roles:
        - "data"
        - "master"
[1080] GalensReltioMBP:~/Documents/gdrive/opensearch-k8s-operator% kubectl describe os -n ha-poc
No resources found in ha-poc namespace.
[1081] GalensReltioMBP:~/Documents/gdrive/opensearch-k8s-operator% kubectl apply -n ha-poc -f ha-poc-third-try.yml
opensearchcluster.opensearch.opster.io/ha-poc-third-try created
[1082] GalensReltioMBP:~/Documents/gdrive/opensearch-k8s-operator% kubectl describe os -n ha-poc
Name:         ha-poc-third-try
Namespace:    ha-poc
Labels:       <none>
Annotations:  <none>
API Version:  opensearch.opster.io/v1
Kind:         OpenSearchCluster
Metadata:
  Creation Timestamp:  2022-04-27T22:33:20Z
  Finalizers:
    Opster
  Generation:  2
  Managed Fields:
    API Version:  opensearch.opster.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:dashboards:
          .:
          f:enable:
          f:replicas:
          f:resources:
            .:
            f:limits:
              .:
              f:cpu:
              f:memory:
            f:requests:
              .:
              f:cpu:
              f:memory:
          f:version:
        f:general:
          .:
          f:httpPort:
          f:serviceName:
          f:version:
        f:nodePools:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-04-27T22:33:20Z
    API Version:  opensearch.opster.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:spec:
        f:confMgmt:
        f:dashboards:
          f:opensearchCredentialsSecret:
      f:status:
        .:
        f:componentsStatus:
        f:phase:
        f:version:
    Manager:         manager
    Operation:       Update
    Time:            2022-04-27T22:33:21Z
  Resource Version:  13740903
  Self Link:         /apis/opensearch.opster.io/v1/namespaces/ha-poc/opensearchclusters/ha-poc-third-try
  UID:               fc1ff7e1-19bc-4101-92b9-a6401794aeaf
Spec:
  Conf Mgmt:
  Dashboards:
    Enable:  true
    Opensearch Credentials Secret:
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  512Mi
      Requests:
        Cpu:     200m
        Memory:  512Mi
    Version:     1.3.1
  General:
    Http Port:     9200
    Service Name:  os-ha-poc-third-try
    Version:       1.3.1
  Node Pools:
    Component:  masters
    Disk Size:  5Gi
    Replicas:   3
    Resources:
      Limits:
        Cpu:     500m
        Memory:  2Gi
      Requests:
        Cpu:     500m
        Memory:  2Gi
    Roles:
      data
      master
Status:
  Components Status:
  Phase:    RUNNING
  Version:  1.3.1
Events:     <none>
[1083] GalensReltioMBP:~/Documents/gdrive/opensearch-k8s-operator% kubectl apply -n ha-poc -f ha-poc-third-try.yml
opensearchcluster.opensearch.opster.io/ha-poc-third-try configured
[1084] GalensReltioMBP:~/Documents/gdrive/opensearch-k8s-operator% kubectl describe os -n ha-poc
Name:         ha-poc-third-try
Namespace:    ha-poc
Labels:       <none>
Annotations:  <none>
API Version:  opensearch.opster.io/v1
Kind:         OpenSearchCluster
Metadata:
  Creation Timestamp:  2022-04-27T22:33:20Z
  Finalizers:
    Opster
  Generation:  3
  Managed Fields:
    API Version:  opensearch.opster.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:spec:
        f:confMgmt:
        f:dashboards:
          f:opensearchCredentialsSecret:
      f:status:
        .:
        f:componentsStatus:
        f:phase:
        f:version:
    Manager:      manager
    Operation:    Update
    Time:         2022-04-27T22:33:21Z
    API Version:  opensearch.opster.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:dashboards:
          .:
          f:enable:
          f:replicas:
          f:resources:
            .:
            f:limits:
              .:
              f:cpu:
              f:memory:
            f:requests:
              .:
              f:cpu:
              f:memory:
          f:version:
        f:general:
          .:
          f:httpPort:
          f:serviceName:
          f:version:
        f:nodePools:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2022-04-27T22:34:10Z
  Resource Version:  13741241
  Self Link:         /apis/opensearch.opster.io/v1/namespaces/ha-poc/opensearchclusters/ha-poc-third-try
  UID:               fc1ff7e1-19bc-4101-92b9-a6401794aeaf
Spec:
  Conf Mgmt:
  Dashboards:
    Enable:  true
    Opensearch Credentials Secret:
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  512Mi
      Requests:
        Cpu:     200m
        Memory:  512Mi
    Version:     1.3.1
  General:
    Http Port:  9200
    Image Pull Secrets:
      Name:        XXX
    Service Name:  os-ha-poc-third-try
    Version:       1.3.1
  Node Pools:
    Component:  masters
    Disk Size:  5Gi
    Replicas:   3
    Resources:
      Limits:
        Cpu:     500m
        Memory:  2Gi
      Requests:
        Cpu:     500m
        Memory:  2Gi
    Roles:
      data
      master
Status:
  Components Status:
  Phase:    RUNNING
  Version:  1.3.1
Events:     <none>
[1085] GalensReltioMBP:~/Documents/gdrive/opensearch-k8s-operator% kubectl describe pod -n ha-poc ha-poc-third-try-bootstrap-0
Name:         ha-poc-third-try-bootstrap-0
Namespace:    ha-poc
Priority:     0
Node:         ip-10-10-144-255.ec2.internal/10.10.144.255
Start Time:   Wed, 27 Apr 2022 15:33:21 -0700
Labels:       opster.io/opensearch-cluster=ha-poc-third-try
              security.istio.io/tlsMode=istio
Annotations:  banzaicloud.com/last-applied:
                UEsDBBQACAAIAAAAAAAAAAAAAAAAAAAAAAAIAAAAb3JpZ2luYWzMVF+P4jYQ/yponpOQcHs9kTe0e1VV6QqC6+mkE0KOM2xcHNsa2+wilO9e2aEhbHe3L304IZGxPX9+85s/Z2jRsZ...
              kubernetes.io/psp: eks.privileged
              sidecar.istio.io/status:
                {"version":"023ae377d1a8981380141286422d04a98c39883f0647804d1ec0a7b5683da18d","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status:       Pending
IP:           X.X.X.X
IPs:
  IP:           X.X.X.X
Controlled By:  OpenSearchCluster/ha-poc-third-try
Init Containers:
  init:
    Container ID:  docker://26438a038ba34b5a81bada0d60751206e61c5942f06b3f6f5699c461fb3eaa7b
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:d2b53584f580310186df7a2055ce3ff83cc0df6caacf1e3489bff8cf5d0af5d8
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
    Args:
      chown -R 1000:1000 /usr/share/opensearch/data
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Apr 2022 15:33:22 -0700
      Finished:     Wed, 27 Apr 2022 15:33:22 -0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/opensearch/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-smv66 (ro)
  istio-init:
    Container ID:
    Image:         gcr.io/XXXXX/istio/proxyv2:1.4.4
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      istio-iptables
      -p
      15001
      -z
      15006
      -u
      1337
      -m
      REDIRECT
      -i
      10.10.0.0/16
      -x

      -b
      *
      -d
      9160,9042,15020
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:        10m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-smv66 (ro)
Containers:
  opensearch:
    Container ID:
    Image:          docker.io/opensearchproject/opensearch:1.3.1
    Image ID:
    Ports:          9200/TCP, 9300/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Liveness:       tcp-socket :9200 delay=10s timeout=5s period=20s #success=1 #failure=10
    Startup:        tcp-socket :9200 delay=10s timeout=5s period=20s #success=1 #failure=10
    Environment:
      cluster.initial_master_nodes:  ha-poc-third-try-bootstrap-0
      discovery.seed_hosts:          ha-poc-third-try-discovery
      cluster.name:                  ha-poc-third-try
      network.bind_host:             0.0.0.0
      network.publish_host:          ha-poc-third-try-bootstrap-0 (v1:metadata.name)
      OPENSEARCH_JAVA_OPTS:          -Xmx512M -Xms512M
      node.roles:                    master
    Mounts:
      /usr/share/opensearch/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-smv66 (ro)
  istio-proxy:
    Container ID:
    Image:         gcr.io/customer-facing/istio/proxyv2:1.4.4
    Image ID:
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --configPath
      /etc/istio/proxy
      --binaryPath
      /usr/local/bin/envoy
      --serviceCluster
      ha-poc-third-try-bootstrap-0.ha-poc
      --drainDuration
      45s
      --parentShutdownDuration
      1m0s
      --discoveryAddress
      istio-pilot.helm-istio-system:15010
      --zipkinAddress
      zipkin.helm-istio-system:9411
      --proxyLogLevel=warning
      --dnsRefreshRate
      5s
      --connectTimeout
      10s
      --proxyAdminPort
      15000
      --concurrency
      2
      --controlPlaneAuthPolicy
      NONE
      --statusPort
      15020
      --applicationPorts
      9200,9300
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
    Environment:
      POD_NAME:                          ha-poc-third-try-bootstrap-0 (v1:metadata.name)
      ISTIO_META_POD_PORTS:              [
                                             {"name":"http","containerPort":9200,"protocol":"TCP"}
                                             ,{"name":"transport","containerPort":9300,"protocol":"TCP"}
                                         ]
      ISTIO_META_CLUSTER_ID:             Kubernetes
      POD_NAMESPACE:                     ha-poc (v1:metadata.namespace)
      INSTANCE_IP:                        (v1:status.podIP)
      SERVICE_ACCOUNT:                    (v1:spec.serviceAccountName)
      ISTIO_AUTO_MTLS_ENABLED:           true
      ISTIO_META_POD_NAME:               ha-poc-third-try-bootstrap-0 (v1:metadata.name)
      ISTIO_META_CONFIG_NAMESPACE:       ha-poc (v1:metadata.namespace)
      SDS_ENABLED:                       false
      ISTIO_META_INTERCEPTION_MODE:      REDIRECT
      ISTIO_META_INCLUDE_INBOUND_PORTS:  9200,9300
      ISTIO_METAJSON_ANNOTATIONS:        {"banzaicloud.com/last-applied":"UEsDBBQACAAIAAAAAAAAAAAAAAAAAAAAAAAIAAAAb3JpZ2luYWzMVF+P4jYQ/yponpOQcHs9kTe0e1VV6QqC6+mkE0KOM2xcHNsa2+w
ilO9e2aEhbHe3L304IZGxPX9+85s/Z2jRsZo5BuUZJKtQ2iBpYx1SJvRUG1QWGfEm5dKHWyihYanRPHWNoDp1dIIuAcVafOUprbR21hEzaQ69ljWMX1UhAf2kkNa4R0LF0UL54wzMiG9IVmgFJVxBZFdkxwISqKTmh2Wwf0CJLqo78pgA
18qRljIA7m8OQtVQwtKg2kRn95eE3gQPCXgRbPa82O8/YZEW84qnd0VepPNZNU/ZL3d58Wl+x5Dtodt2CViDPDAYwjOhkPp0UB3j9xLoQmUmlHCCyV3LwnGndI0WEjgy6f+TzC4Z3NXCcn1EOmUWsd412rp33Qz6Yyf/YIrHt43HJgrdk
6ZDVgnVRx3Z5Vn8vaZufCWFbW4sfiXdBt72AmW9xn2Qb3oglju+rphroBw6twfcdaNAy9XnPzafF+v733a/L74tdsvV180IWvq9ff5YzL5M0u+tDcINSF1jRlreFKKvD3TbBETLHiPnmh9ejogh/RdyN7opi+xDFpBLcUSF1q5IVxgTZU
J6wq8NoW20rKEs8gQuDfGAkp02yLWqbf9gkISuh6tZnoD1nKO1Yw8JOG42AZkLMYwmB+V8luddAk60qL0bXHy8Du0VLyTRqG/aoYdXg5/rqDhnIm0vlT6MlBwxZSOIwByh1Z7igJ/DpDhGzpufi5Cjlr7FL9qrCwdtEC8dN/WWprZhhKM
KT+P2HFKOp24bO0UJd3+7Bhg9BgF4o5/UJF1PijzPy/A3ecf5NmyztmVhff0AG4qUchj1YuXtqdLPVxQhNPybceSehDsFUPgcCSGvFvZPG5Zk/j+n3/u6rL/WuNODoB7HC93YCs5HkN3fAQAA//9QSwcIbVtlHqkCAACSBgAAUEsBAhQA
FAAIAAgAAAAAAG1bZR6pAgAAkgYAAAgAAAAAAAAAAAAAAAAAAAAAAG9yaWdpbmFsUEsFBgAAAAABAAEANgAAAN8CAAAAAA==","kubernetes.io/psp":"eks.privileged"}

      ISTIO_METAJSON_LABELS:             {"opster.io/opensearch-cluster":"ha-poc-third-try"}

      ISTIO_META_WORKLOAD_NAME:          ha-poc-third-try-bootstrap-0
      ISTIO_META_OWNER:                  kubernetes://apis/v1/namespaces/ha-poc/pods/ha-poc-third-try-bootstrap-0
      ISTIO_KUBE_APP_PROBERS:            {}
    Mounts:
      /etc/certs/ from istio-certs (ro)
      /etc/istio/proxy from istio-envoy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-smv66 (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-smv66:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-smv66
    Optional:    false
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  istio.default
    Optional:    true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  11m                   default-scheduler  Successfully assigned ha-poc/ha-poc-third-try-bootstrap-0 to ip-10-10-144-255.ec2.internal
  Normal   Pulling    11m                   kubelet            Pulling image "busybox"
  Normal   Pulled     11m                   kubelet            Successfully pulled image "busybox"
  Normal   Created    11m                   kubelet            Created container init
  Normal   Started    11m                   kubelet            Started container init
  Warning  Failed     10m (x3 over 11m)     kubelet            Error: ErrImagePull
  Normal   Pulling    9m33s (x4 over 11m)   kubelet            Pulling image "gcr.io/XXX/istio/proxyv2:1.4.4"
  Warning  Failed     9m33s (x4 over 11m)   kubelet            Failed to pull image "gcr.io/XXX/istio/proxyv2:1.4.4": rpc error: code = Unknown desc = Error response
 from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps
 in: https://cloud.google.com/container-registry/docs/advanced-authentication
  Warning  Failed     5m54s (x20 over 11m)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    62s (x42 over 11m)    kubelet            Back-off pulling image "gcr.io/XXX/istio/proxyv2:1.4.4"

Rework CRD to make it more congruent with kubernetes and opensearch terminology

After looking at the CRD and having worked with the opensearch helm chart I have some suggestions on how to rework the CRD to make it more congruent with kubernetes and opensearch terminology.

Note: This builds on the changes I proposed in #24.

To start a discussion, this is my suggestion for a reworked example definition (all changes commented):

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opster-opensearch
  namespace: operator-os
spec:
  general:
    clusterName: os-from-operator  # Could be optional, if not set the operator can just use metadata.name
    httpPort: 9200
    transportPort: 9300 # Introduced to make both ports configurable
    #vendor: opensearch # Do we need that? Managing both opensearch and elasticsearch would likely become very hard due to the different implementations. I'd say remove field until we really have a use for it
    version: latest
    serviceName: es-svc  # Could be optional, if not set the operator can just use clusterName
  dashboards:
    enabled: true
  nodePools:
    - name: masters  # Renamed from component, use it as just a name, the component will be defined by the roles
      replicas: 3
      storage: # renamed and introduced substructure
        size: "30Gi" # Switch to using kubernetes resource units
        storageClassName: "default" # Optional to e.g. use local disks or fast SSDs
      nodeSelector:  # start with lowercase letter to be consistent
      resources: # Switch to resource definition structure and units as is used for pods/containers in kubernetes
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
      roles: # Removed ingest parameter and made it more generic 
        - master
    - name: data
      replicas: 3
      storage:
        size: "100Gi"
      nodeSelector:
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
      roles:
        - ingest
        - data
    - name: coordinators
      replicas: 3
      storage:
        size: "100Gi"
      nodeSelector:
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
      roles: []  # No roles means the node is a coordinator node

Looking forward to your ideas.

Make defining security files optional

For example, if I want to replace internal_users.yml with custom values, I should be able to do that without having to include every securityconfig file.

This is related to #89
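
As a sketch of how this could look, a securityconfig secret carrying only internal_users.yml. This assumes the secret keys map one-to-one to the security plugin's file names, which is an assumption for illustration, not a documented guarantee of the operator:

apiVersion: v1
kind: Secret
metadata:
  name: securityconfig-secret
type: Opaque
stringData:
  internal_users.yml: |        # only this file is overridden; all other security files keep their defaults
    _meta:
      type: "internalusers"
      config_version: 2
    admin:
      hash: "<bcrypt-hash>"    # placeholder; generate with the security plugin's hash.sh tool
      reserved: true
      backend_roles:
        - "admin"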

Implementation Process for Scaling nodes' disks - increase/replace disks.

Hey, I just created this issue with thoughts on how we can achieve the disk reconciler task of increasing the disk size.

Solution [1]:

  • Blue/green model: add new nodes with the new disk size, adjust the OpenSearch cluster, and gracefully remove the old nodes and their underlying PVs.
  • The cluster should remain functional with this approach.
  • On failure, remove the newly added nodes and re-adjust the OpenSearch cluster.

Solution [2]:

  • Work with the existing volumes: just increase the PVCs' size, let Kubernetes and the underlying cloud provider take care of the backend volume adjustment, and restart the OpenSearch cluster one pod at a time (see the kubectl sketch below).
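
For Solution [2], the manual equivalent is a plain PVC resize; a sketch with kubectl, assuming the StorageClass has allowVolumeExpansion enabled (the PVC name is illustrative):

kubectl patch pvc data-my-cluster-masters-0 -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'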

Operator security: for TLS to work there is a dependency on securityConfigSecret.

For the operator to add TLS settings as follows:

    tls:
      transport:
        generate: true
      http:
        generate: true

There is a dependency on:

  security:
    config:
      securityConfigSecret:
        name: <secret_name>  # pre-create this secret with the required roles and security configs

If only TLS is added, the following error appears:

ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:02,622][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-first-cluster-masters-2] Failure no such index [.opendistro_security] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:02,622][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-first-cluster-masters-2] Failure no such index [.opendistro_security] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:02,622][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-first-cluster-masters-2] Failure no such index [.opendistro_security] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:02,622][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-first-cluster-masters-2] Failure no such index [.opendistro_security] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:02,622][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-first-cluster-masters-2] Failure no such index [.opendistro_security] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:02,622][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-first-cluster-masters-2] Failure no such index [.opendistro_security] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:02,622][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-first-cluster-masters-2] Failure no such index [.opendistro_security] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:02,622][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-first-cluster-masters-2] Failure no such index [.opendistro_security] retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
[2022-03-30T17:47:03,001][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-masters-2] Not yet initialized (you may need to run securityadmin)
[2022-03-30T17:47:03,004][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-masters-2] Not yet initialized (you may need to run securityadmin)
[2022-03-30T17:47:05,500][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-masters-2] Not yet initialized (you may need to run securityadmin)
[2022-03-30T17:47:05,503][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-masters-2] Not yet initialized (you may need to run securityadmin)
[2022-03-30T17:47:08,001][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-masters-2] Not yet initialized (you may need to run securityadmin)
[2022-03-30T17:47:08,004][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-masters-2] Not yet initialized (you may need to run securityadmin)

Background:
Once TLS is added for node transport and the HTTP REST API, the embedded security plugin creates the .opendistro_security index to enable security settings. For this, securityadmin.sh has to run to load the new settings. Otherwise, the demo install_demo_configuration.sh script runs by default when no TLS settings are added (if you do not configure anything, OpenSearch will use the included demo TLS certificates, which are not suited for real deployments).

curl -k https://localhost:9200/_cat/indices -u admin:admin
green open security-auditlog-2022.03.29 SHZ_xtRBTGub4NFhbtugSw 1 1 7 0 116.4kb 96.8kb
green open .kibana_1                    UOntE6z9Soa73BSdk3JI5Q 1 1 0 0    416b   208b
green open .opendistro_security         RYmlNkB5RgWAKMZU3_S05Q 1 2 9 0 178.1kb 59.3kb

With the current setup from the PR https://github.com/Opster/opensearch-k8s-operator/pull/61/files#diff-190387233823a104ed9004f0cba248cf0aa504090c923cad3be1a901bd01e99f,
securityadmin.sh will be called by a Kubernetes batch job.

securityadmin.sh needs to run when we add TLS or custom secrets, and it should also run when we add new config files.

Just adding the TLS settings does not run the batch job; the following is seen in the logs. Once TLS is added via the operator, opensearch.yml is already modified with security settings, so the demo installer quits:

OpenSearch Security Demo Installer
 ** Warning: Do not use on production or public reachable systems **
Basedir: /usr/share/opensearch
OpenSearch install type: rpm/deb on NAME="Amazon Linux"
OpenSearch config dir: /usr/share/opensearch/config
OpenSearch config file: /usr/share/opensearch/config/opensearch.yml
OpenSearch bin dir: /usr/share/opensearch/bin
OpenSearch plugins dir: /usr/share/opensearch/plugins
OpenSearch lib dir: /usr/share/opensearch/lib
Detected OpenSearch Version: x-content-1.2.3
Detected OpenSearch Security Version: 1.2.3.0
/usr/share/opensearch/config/opensearch.yml seems to be already configured for Security. Quit.
sed: cannot rename /usr/share/opensearch/config/seddRF6sR: Device or resource busy
Enabling OpenSearch Security Plugin

To move forward, we need to add a securityConfigSecret so the security plugin can pick up TLS and the passed-in roles (for example as in https://github.com/opensearch-project/security/tree/main/securityconfig).
A README doc on configuring this setup would be helpful.
Once the following is added:

security:
  config:
    securityConfigSecret:
      name: securityconfig-secret  # pre-create this secret with the required roles and security configs
  tls:
    transport:
      generate: true
    http:
      generate: true

the batch job runs to call securityadmin.sh, and now all pods come up successfully.

Create security service

Features

  • All components are wired using secure connections:
  • Basic auth
  • SSL
  • #6
  • Disk encryption options

OpenSearch Dashboards replicas to desired user input

Make opensearch-k8s-operator able to adjust the OpenSearch Dashboards Deployment replicas to the desired user input. Dashboards can work with HA, so the user should be able to pass the number of replicas for Dashboards; currently it is hardcoded to a default of 1.

Example configuration:

  dashboards:
    enable: true
    replicas: 2

Test operator compatibility on all clouds

Currently, the Operator is tested locally and on AWS. We would like to make sure before the Operator is GA that it is fully compatible with the major clouds - AWS, GCP, and Azure.

Transport certs creation is taking too long

It seems to take a really long time to create the transport certs. I think we need to create them asynchronously in a goroutine and let the rest of the reconciliation continue.
OpenSearch should automatically load the certs when they appear.

Operator DiskSize: to modify with String input

Ability to set DiskSize as a string rather than a fixed int value (which the backend currently parses as 'Gi'); the user should be free to pass 'G', 'Gi', and so on.

Background, from the Kubernetes docs:
Limits and requests for memory are measured in bytes. We can express memory as a plain integer or as a fixed-point integer using one of these suffixes: E, P, T, G, M, K. We can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki.

The kibibyte was designed to replace the kilobyte in those computer science contexts in which the term kilobyte is used to mean 1024 bytes. The interpretation of kilobyte to denote 1024 bytes, conflicting with the SI definition of the prefix kilo (1000), used to be common.

So, as we can see, 5G means 5 Gigabytes while 5Gi means 5 Gibibytes. They amount to:

5 G = 5000000 KB / 5000 MB
5 Gi = 5368709.12 KB / 5368.70 MB
Therefore, in terms of size, they are not the same.
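
To illustrate the proposal, diskSize would accept any Kubernetes quantity string instead of a bare integer (a sketch of the requested behaviour, not the current schema):

nodePools:
  - component: masters
    replicas: 3
    diskSize: "30Gi"   # proposed: also accept "30G", "500M", etc.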

Rework CRD to avoid the Os abbreviation

Currently the kind for the operator CRD is Os. That abbreviation is also used in other places in the CRD.

Using the Os abbreviation is not optimal for two reasons:

  • OS is already widely known and used for Operating System, so using it here differently can confuse people
  • A leading member of the OpenSearch community has asked in a discussion thread (OpenSearch name: abbreviation?) not to abbreviate the name to establish a clear brand

As such I suggest renaming the kind of the CRD and also any mention of Os in the CRD fields.
Note: os can still be configured as a shortName for the CRD so a user can still do a quick kubectl get os

This is my suggestion of how the example definition could look with new names:

apiVersion: opensearch.opster.io/v1 # Renamed
kind: OpenSearchCluster # Renamed
metadata:
  name: opster-opensearch
  namespace: operator-opensearch
spec:
  general:
    clusterName: my-opensearch
    httpPort: 9200 # renamed from osPort
    transportPort: 9300 # Introduced to make both ports configurable
    vendor: opensearch
    version: latest
    serviceName: es-svc
  dashboards: # Renamed from osConfMgmt and moved to make kibana/dashboards a top-level member, later we can add other dashboards-related config options here
    enabled: true
  nodePools: # renamed from osNodes
    - component: masters
      replicas: 3
      diskSize: 30
      NodeSelector:
      cpu: 4
      memory: 16
      ingest: "false"
    - component: nodes
      replicas: 3
      diskSize: 100
      NodeSelector:
      cpu: 4
      memory: 16
      ingest: "true"
    - component: coordinators
      replicas: 3
      diskSize: 100
      NodeSelector:
      cpu: 4
      memory: 16
      ingest: "false"

Cluster not able to reach quorum with single master.

When creating a single-master cluster, I get an error that the master is not discovered or elected yet.

Cluster.yaml

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: os-logs
  namespace: os
spec:
  security:
    tls:
      http:
        generate: true
      transport:
        generate: true
        perNode: true
  general:
    httpPort: 9200
    vendor: opensearch
    version: 1.2.3
    serviceName: os-svc
    setVMMaxMapCount: true
  confMgmt:
    autoScaler: false
    monitoring: false
  dashboards:
    enable: true
    version: 1.2.0
    replicas: 1
  nodePools:
    - component: master
      replicas: 1
      diskSize: "100Gi"
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          memory: 1Gi
      roles:
        - master
    - component: nodes
      replicas: 1
      diskSize: 1000Gi
      resources:
        requests:
          cpu: 500m
          memory: 2Gi
        limits:
          memory: 2Gi
      jvm: "-Xmx1G -Xms1G"
      roles:
        - data
    - component: client
      replicas: 1
      diskSize: 100Gi
      resources:
        requests:
          cpu: 500m
          memory: 2Gi
        limits:
          memory: 2Gi
      jvm: "-Xmx1G -Xms1G"
      roles:
        - data

Log:

[2022-04-04T12:05:00,713][WARN ][o.o.c.NodeConnectionsService] [os-logs-master-0] failed to connect to {os-logs-bootstrap-0}{zs52XaaoT0mHtvYMg3N_Aw}{0NSI6IQXSEqq-0WJv0bICQ}{os-logs-bootstrap-0}{192.168.4.104:9300}{m}{shard_indexing_pressure_enabled=true} (tried [25] times)
org.opensearch.transport.ConnectTransportException: [os-logs-bootstrap-0][192.168.4.104:9300] connect_exception
	at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1064) ~[opensearch-1.2.3.jar:1.2.3]
	at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:213) ~[opensearch-1.2.3.jar:1.2.3]
	at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:55) ~[opensearch-core-1.2.3.jar:1.2.3]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) ~[?:?]
	at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:70) ~[opensearch-core-1.2.3.jar:1.2.3]
	at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: os-logs-bootstrap-0/192.168.4.104:9300
Caused by: java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
	at sun.nio.ch.Net.pollConnectNow(Net.java:660) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
	... 7 more

master not discovered or elected yet, an election requires a node with id [zs52XaaoT0mHtvYMg3N_Aw], have discovered [{os-logs-master-0}{T08PHfy_TgOcFmrxp8t-Xg}{G4DypmTBQn63VlsqqgI7fw}{os-logs-master-0}{192.168.20.231:9300}{m}{shard_indexing_pressure_enabled=true}] which is not a quorum; discovery will continue using [192.168.39.238:9300, 192.168.50.75:9300] from hosts providers and [{os-logs-bootstrap-0}{zs52XaaoT0mHtvYMg3N_Aw}{0NSI6IQXSEqq-0WJv0bICQ}{os-logs-bootstrap-0}{192.168.4.104:9300}{m}{shard_indexing_pressure_enabled=true}, {os-logs-master-0}{T08PHfy_TgOcFmrxp8t-Xg}{G4DypmTBQn63VlsqqgI7fw}{os-logs-master-0}{192.168.20.231:9300}{m}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 1, last-accepted version 29 in term 1

Cause:
The operator spins up the cluster with effectively two master nodes, then removes the initial (bootstrap) master, which causes quorum problems.

Possible solution:
Before removing the bootstrap node, the operator has to make sure it is not part of the voting configuration.
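
One way to do that is the standard voting-configuration-exclusions API; a manual sketch with curl, using the bootstrap node name from the log above (the operator would do the equivalent through its OpenSearch client):

curl -k -u admin:admin -X POST "https://localhost:9200/_cluster/voting_config_exclusions?node_names=os-logs-bootstrap-0"
# ...remove the bootstrap node, then clear the exclusion list:
curl -k -u admin:admin -X DELETE "https://localhost:9200/_cluster/voting_config_exclusions?wait_for_removal=false"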

Resource requirements missing from cluster bootstrap pod

The bootstrap pod does not declare any required resources and is also not configurable via the CRD.
When launched on a cluster with resource defaults defined (to ensure that all pods are configured with limits), the bootstrap pod may fail to complete bootstrapping if the configured defaults are too small to accommodate the hard-coded Java heap opts.
Both the pod resources and the bootstrap Java opts should be exposed via the CRD as configurable parameters, or else a default resource limit should be configured to match the Java opts.
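
As a purely hypothetical illustration of what such CRD knobs could look like (the bootstrap field names below are invented for this sketch and are not the operator's current schema):

spec:
  bootstrap:                  # hypothetical section for the bootstrap pod
    jvm: "-Xmx512M -Xms512M"  # would replace the hard-coded Java heap opts
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "500m"
        memory: "1Gi"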

OpenSearch Operator CLI

Add a CLI for interacting with the operator. The CLI should expose the operator's APIs and capabilities as a high-level DSL.

The recent (1.0?) Helm Chart Namespace changes have made the chart unusable

The chart now outputs a namespace definition with no name, and aside from rendering the chart as a raw template and manipulating it, you can no longer install the operator using the Helm repo at https://opster.github.io/opensearch-k8s-operator-chart/. These changes all seem to have been merged in the last 24 hours. PR #130 even references this issue, yet that PR has not been merged. It seems the changes were made to main and the gh-pages publication process 'jumped the gun'.

Provide secure certificates and credentials by default

This came up during discussions in the biweekly meeting. Today there is a way for users to set up the Kubernetes operator with weak (hardcoded) demo certificates and admin:admin default credentials. These weak configurations are fine for building a PoC but bad if carried forward to production. Going forward, we should fix this to provide strong out-of-the-box defaults with autogenerated self-signed certificates and stronger default passwords.

Partial OpenSearchCluster spec > security > tls results in controller validation error

Creating an OpenSearchCluster resource with a partial spec:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: test-opensearch
  namespace: default
spec:
  security:
    tls:
      http:
        secret:
          name: test-opensearch-http
...

Results in:

1.6522032290237098e+09  INFO    controller.opensearchcluster    Reconciling OpenSearchCluster   {"reconciler group": "opensearch.opster.io", "reconciler kind": "OpenSearchCluster", "name": "test-opensearch", "namespace": "default", "cluster": "default/test-opensearch"}
1.6522032290357742e+09  ERROR   controller.opensearchcluster    Not all secrets for http provided       {"reconciler group": "opensearch.opster.io", "reconciler kind": "OpenSearchCluster", "name": "test-opensearch", "namespace": "default", "error": "missing secret in spec"}
opensearch.opster.io/pkg/reconcilers.(*TLSReconciler).Reconcile
        /workspace/pkg/reconcilers/tls.go:70
opensearch.opster.io/controllers.(*OpenSearchClusterReconciler).reconcilePhaseRunning
        /workspace/controllers/opensearchController.go:326
opensearch.opster.io/controllers.(*OpenSearchClusterReconciler).Reconcile
        /workspace/controllers/opensearchController.go:141
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227

If I pass in a nearly full spec (with expected defaults) it errors the same way.

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: test-opensearch
  namespace: default
spec:
  security:
    tls:
      http:
        generate: false
        caSecret:
          name: test-opensearch-http
        secret:
          name: test-opensearch-http
      transport:
        generate: true
        perNode: true
...

Expected behavior would be that the remaining defaults populate on their own; the CRD documentation lists most of the properties as optional and does not specify this level of cross-dependency (all-or-none behavior). Alternatively, the logged error could be clearer, since the "secret" is not missing.

Disable namespace creation

The OS cluster resource is currently namespaced, although it creates a separate namespace to deploy the cluster into.
I think for more convenience it should either be namespaced and deploy the cluster in the same namespace where the CRD has been created, or be a cluster-scoped resource and deploy the cluster in a namespace specified in the spec.
In my opinion, namespace management via the operator makes sense when the CRD requires multiple namespaces; otherwise we will have some pain managing the full lifecycle in terms of PVC management.
There are a number of operators for orchestrating databases that I have worked with, and all of them create clusters in the same namespace where the CRD is created (Cassandra, VictoriaMetrics, etcd, etc.).

Operator not creating pods with default installation.

Hey, I'm following this guide to install the operator but I don't see the operator creating pods:

make build manifests
(connecting to the cluster)
make install

But I don't see any pods brought up by the controller:

kubectl get pods
No resources found in default namespace.

I used the following cluster.yaml file:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  general:
    serviceName: my-first-cluster
  dashboards:
    enable: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: 30
      NodeSelector:
      cpu: 1
      memory: 1
      roles:
        - "master"
        - "data"

The following is the output describing the cluster

Name:         my-cluster
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  opensearch.opster.io/v1
Kind:         OpenSearchCluster
Metadata:
  Creation Timestamp:  2022-03-17T20:50:17Z
  Generation:          1
  Managed Fields:
    API Version:  opensearch.opster.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:confMgmt:
          .:
          f:smartScaler:
        f:dashboards:
          .:
          f:enable:
        f:general:
          .:
          f:httpPort:
          f:serviceName:
          f:vendor:
          f:version:
        f:nodePools:
        f:security:
          .:
          f:tls:
            .:
            f:http:
              .:
              f:generate:
            f:transport:
              .:
              f:generate:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2022-03-17T20:50:17Z
  Resource Version:  264753
  UID:               8f6f66ce-04f8-413d-9c50-6b1cb1985c50
Spec:
  Conf Mgmt:
    Smart Scaler:  true
  Dashboards:
    Enable:  true
  General:
    Http Port:     9200
    Service Name:  my-cluster
    Vendor:        opensearch
    Version:       latest
  Node Pools:
    Component:  masters
    Cpu:        1
    Disk Size:  30
    Memory:     1
    Replicas:   3
    Roles:
      master
      data
    Component:  nodes
    Cpu:        1
    Disk Size:  100
    Memory:     1
    Replicas:   3
    Roles:
      data
    Component:  coordinators
    Cpu:        1
    Disk Size:  100
    Replicas:   3
    Roles:
      ingest
  Security:
    Tls:
      Http:
        Generate:  true
      Transport:
        Generate:  true
Events:            <none>

Add OpenSearch gateway

Library for communicating with the OpenSearch cluster.
This functionality should support different cluster versions.

Cluster resources reconciler - CPU and Memory

Make opensearch-k8s-operator able to adjust the OpenSearch StatefulSet resources (CPU/memory requests and limits).
The user should be able to pass CPU and memory resources to the cluster from cluster.yaml (see https://github.com/Opster/opensearch-k8s-operator/blob/main/docs/userguide/main.md).
Example:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  general:
    serviceName: my-first-cluster
  dashboards:
    enable: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: 30
      NodeSelector:
      cpu: "0.3"
      memory: "2Gi"
      roles:
        - "data"
        - "master"

The values should be passed as follows; the user should be able to specify cpu with "m" and memory with "Mi" or "Gi".

      cpu: "0.3"
      memory: "2Gi"

How do I add additional labels and environment variables for master nodes

Discussed in #133

Originally posted by NoorKumar May 10, 2022
Hi Team,
I was trying to create the cluster using the operator, and I do not see a way to add additional labels or extra environment variables to any nodes (master nodes or data nodes). When I submit the cluster config file to the operator to create the cluster, I would like to add some labels and env variables.
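
A hypothetical sketch of what per-nodePool labels and env could look like if added to the CRD (these field names are invented for illustration and do not reflect the operator's current schema):

nodePools:
  - component: masters
    replicas: 3
    labels:                      # hypothetical: extra labels for the pods of this node pool
      team: search
    env:                         # hypothetical: extra environment variables for the containers
      - name: MY_EXTRA_VAR
        value: "some-value"
    roles:
      - "master"
      - "data"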

ARM-based support?

Firstly, thanks so much for this initiative! I attempted to install the operator on a c6g.4xlarge Graviton (ARM) based processor and received the below error:

standard_init_linux.go:228: exec user process caused: exec format error

Has this come up before?
