keng-operator's Issues

otg-port pods require extended privileges but still use the default service account

When deploying a topology with IXIA to a cluster, a bunch of otg-port pods are created. Since they do not use a specific service account, on OpenShift only minimal privileges are used to run the containers. This causes log entries in the controller manager such as this one:

time="2022-10-11T07:44:07Z" level=error msg="Failed to create pod for otg in 3-node-ceos-with-traffic - pods \"otg-port-eth1\" is forbidden: unable to validate against any security context constraint: [provider \"anyuid\": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, provider \"nonroot\": Forbidden: not usable by user or serviceaccount, provider \"hostmount-anyuid\": Forbidden: not usable by user or serviceaccount, provider \"machine-api-termination-handler\": Forbidden: not usable by user or serviceaccount, provider \"hostnetwork\": Forbidden: not usable by user or serviceaccount, provider \"hostaccess\": Forbidden: not usable by user or serviceaccount, provider \"node-exporter\": Forbidden: not usable by user or serviceaccount, provider \"meshnet\": Forbidden: not usable by user or serviceaccount, provider \"privileged\": Forbidden: not usable by user or serviceaccount]"

For more details please check the attached log file.
ixiatg-op-controller-manager-66d9845cd9-27v25-manager.log

This seems to be fixable by extending the privileges of the default service account as shown below, but in general this is not a recommended practice, as other pods that do not specify another service account will also inherit these privileges.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ixiatg-role
rules:
  - apiGroups:
      - security.openshift.io
    resourceNames:
      - privileged
    resources:
      - securitycontextconstraints
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ixiatg-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ixiatg-role
subjects:
  - kind: ServiceAccount
    name: default

A better solution would be to use a dedicated service account for pods created by the controller, so that extending privileges is limited to a specific set of applications running in this namespace.

Missing readme.md

Please describe:

  • How to build the production image
  • How to set up a dev environment
  • How to ensure the changes I've made are validated / tested

Review the dependencies needed to setup operator

As of now, I noticed that the following dependencies are installed, which may not necessarily be needed:

curl git openssh-server vim unzip tar make bash wget sshpass build-essential

Let's review and see if we really need openssh-server and build-essential at least.

Allow Ixia-c Community Edition deployments

We should allow deploying Ixia-c Community Edition (Traffic Engine only). Right now, if there is no Protocol Engine in the configmap, the deployment fails with:

Error: Error in cpdp node deploy Failed to find protocol engine image for release local-latest

Tested with configmap:

apiVersion: v1
kind: ConfigMap
metadata:
    name: ixiatg-release-config
    namespace: ixiatg-op-system
data:
    versions: |
        {
          "release": "local-latest",
          "images": [
                {
                    "name": "controller",
                    "path": "ghcr.io/open-traffic-generator/ixia-c-controller",
                    "tag": "0.0.1-3662"
                },
                {
                    "name": "gnmi-server",
                    "path": "ghcr.io/open-traffic-generator/ixia-c-gnmi-server",
                    "tag": "1.9.9"
                },
                {
                    "name": "traffic-engine",
                    "path": "ghcr.io/open-traffic-generator/ixia-c-traffic-engine",
                    "tag": "1.6.0.19"
                }
            ]
        }

And a deployment spec:

{
    "metadata": {
        "name": "otg",
        "namespace": "ixia-c"
    },
    "spec": {
        "api_endpoint_map": {
            "https": {
                "in": 443,
                "out": 31001
            },
            "grpc": {
                "in": 40051,
                "out": 31002
            },
            "gnmi": {
                "in": 50051,
                "out": 31003
            }
        },
        "interfaces": [
            {
                "name": "eth1",
                "peer": "localhost",
                "peer_interface": "veth0"
            },
            {
                "name": "eth2",
                "peer": "localhost",
                "peer_interface": "veth1"
            }
        ],
        "release": "local-latest"
    }
}
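
A minimal Go sketch of how a release lookup could treat the protocol engine as optional, assuming the `versions` payload has the shape shown in the configmap above. The type and function names (`releaseConfig`, `resolveImages`) and the `protocol-engine` image name are hypothetical and are not taken from the operator source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// image mirrors one entry of the "images" array in the configmap.
type image struct {
	Name string `json:"name"`
	Path string `json:"path"`
	Tag  string `json:"tag"`
}

// releaseConfig mirrors the JSON stored under the "versions" key.
type releaseConfig struct {
	Release string  `json:"release"`
	Images  []image `json:"images"`
}

// resolveImages returns "path:tag" per image name. Names in required must be
// present; names in optional are skipped silently when absent.
func resolveImages(raw []byte, required, optional []string) (map[string]string, error) {
	var cfg releaseConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parse release config: %w", err)
	}
	byName := make(map[string]string, len(cfg.Images))
	for _, img := range cfg.Images {
		byName[img.Name] = img.Path + ":" + img.Tag
	}
	out := make(map[string]string)
	for _, name := range required {
		ref, ok := byName[name]
		if !ok {
			return nil, fmt.Errorf("failed to find %s image for release %s", name, cfg.Release)
		}
		out[name] = ref
	}
	for _, name := range optional {
		if ref, ok := byName[name]; ok {
			out[name] = ref
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{"release":"local-latest","images":[
	  {"name":"controller","path":"ghcr.io/open-traffic-generator/ixia-c-controller","tag":"0.0.1-3662"},
	  {"name":"traffic-engine","path":"ghcr.io/open-traffic-generator/ixia-c-traffic-engine","tag":"1.6.0.19"}]}`)
	// "protocol-engine" is an assumed image name, listed here as optional.
	images, err := resolveImages(raw, []string{"controller", "traffic-engine"}, []string{"protocol-engine"})
	fmt.Println(images, err)
}
```

With the protocol engine listed as optional, a Traffic Engine only configmap resolves without error, while a missing controller or traffic engine entry would still be reported.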

OOM killed while trying to locate the configmap

I have installed both the operator and the configmap on my cluster:

kubectl apply -f https://github.com/open-traffic-generator/ixia-c-operator/releases/download/v0.2.2/ixiatg-operator.yaml
kubectl apply -f https://github.com/open-traffic-generator/ixia-c/releases/download/v0.0.1-3423/ixia-configmap.yaml
kne create ...
I1007 11:39:19.682752 1 request.go:601] Waited for 1.046900147s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/migration.k8s.io/v1alpha1?timeout=32s
{"level":"info","ts":1665142761.2360508,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":"127.0.0.1:8080"}
{"level":"info","ts":1665142761.2364528,"logger":"setup","msg":"starting manager - version 0.2.1\n"}
{"level":"info","ts":1665142761.2368538,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"127.0.0.1:8080"}
{"level":"info","ts":1665142761.236937,"msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
I1007 11:39:21.237000 1 leaderelection.go:248] attempting to acquire leader lease ixiatg-op-system/b867187a.keysight.com...
I1007 11:39:36.934777 1 leaderelection.go:258] successfully acquired lease ixiatg-op-system/b867187a.keysight.com
{"level":"info","ts":1665142776.936252,"logger":"controller.ixiatg","msg":"Starting EventSource","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG","source":"kind source: *v1beta1.IxiaTG"}
{"level":"info","ts":1665142776.936296,"logger":"controller.ixiatg","msg":"Starting Controller","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG"}
{"level":"info","ts":1665142777.0372126,"logger":"controller.ixiatg","msg":"Starting workers","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG","worker count":1}
time="2022-10-07T11:39:37Z" level=info msg="Reconcile: otg (Desired State: INITIATED), Namespace: 3-node-ceos-withtraffic"
time="2022-10-07T11:39:37Z" level=info msg="Checking for finalizer"
time="2022-10-07T11:39:37Z" level=info msg="IXIA DS INITIATED CS "
time="2022-10-07T11:39:37Z" level=info msg="Contacting Ixia server for release dependency info - https://github.com/open-traffic-generator/ixia-c/releases/download/v0.0.1-9999/ixia-configmap.yaml"
time="2022-10-07T11:39:37Z" level=error msg="Failed to download release config file - Got http response 404"
time="2022-10-07T11:39:37Z" level=info msg="Try locating in ConfigMap..."

In addition, the operator is not able to locate the configmap, which I created before creating any topology. This results in OOM kills and finally in a CrashLoopBackOff.

otg-controller pod containers do not support arbitrary user IDs

Both containers of the otg-controller pod, ixia-c and gnmi, fail to start with permission denied errors when running the operator on OpenShift. This is most likely due to the use of arbitrary UIDs as part of the OpenShift multi-layer security strategy.

panic: Logger init failed: mkdir /home/keysight/ixia-c/controller/logs: permission denied
goroutine 1 [running]:
keysight/athena/controller/config.init.0()
/home/keysight/athena/controller/config/init.go:102 +0x1b7
panic: Logger init failed: mkdir /home/keysight/ixia-c-gnmi-server/logs: permission denied
goroutine 1 [running]:
github.com/open-traffic-generator/ixia-c-gnmi-server/config.init.0()
/home/keysight/ixia-c-gnmi-server/config/init.go:76 +0x173

To support using this operator on OpenShift, files and directories should be readable and writable by GID=0 (a container is always a member of the root group). Commands invoked by the Entrypoint are executed with an unprivileged UID and GID=0, and the UID used during execution is not known in advance. From a technical design perspective, this means that directories and files that processes in the container may write to should be owned by the root group and be readable and writable by GID=0. Files to be executed should also have group execute permissions.

If you could point me in the right direction, I could contribute the required changes myself.

otg-controller uses privileged port 443

When creating the otg-controller pod (ixia-c container), there is no way to change the port on which the HTTPS server is started:

https://github.com/open-traffic-generator/ixia-c-operator/blob/372d38785bd5210586b9d256ab2cd070bbd63674/controllers/ixiatg_controller.go#L1064

Port 443 is a privileged port, so the application requires additional user privileges (root) to run properly. Either this port should be configurable or it should be set to a non-privileged port (1024 or higher).
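
One way a configurable port could be validated is sketched below in Go. The function name `resolveHTTPSPort` and the default of 8443 are assumptions made for illustration; neither is an existing operator setting. The sketch rejects privileged ports outright, which is only one possible policy.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultHTTPSPort is an assumed unprivileged default, not an operator setting.
const defaultHTTPSPort = 8443

// resolveHTTPSPort parses a user-supplied port, falls back to the default
// when the value is empty, and rejects ports that need elevated privileges.
func resolveHTTPSPort(value string) (int, error) {
	if value == "" {
		return defaultHTTPSPort, nil
	}
	port, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", value, err)
	}
	if port < 1024 || port > 65535 {
		return 0, fmt.Errorf("port %d is outside the unprivileged range 1024-65535", port)
	}
	return port, nil
}

func main() {
	for _, v := range []string{"", "9443", "443"} {
		port, err := resolveHTTPSPort(v)
		fmt.Println(port, err)
	}
}
```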

Transient container pull from remote repos causes flakes with topology creation in KNE

Hello @anjan-keysight @biplamal, seeing some flakes when bringing up KNE topos with IxiaTG:

creating topology: failed to create topology: Node "otg": Status FAILED Reason got failure in ixia CRD status: Container ixia-c failed - rpc error: code = Unknown desc = failed to pull and unpack image "us-west1-docker.pkg.dev/.../ixia-c-controller:0.0.1-4013": failed to copy: read tcp 172.18.0.2:47650->74.125.132.82:443: read: connection reset by peer

This happens rarely (less than 1%) but still affects our KNE test runs. It appears that the Ixia operator treats an image pull error as a FAILED state (https://github.com/open-traffic-generator/ixia-c-operator/blob/a6bc34d9bc987a7d01869cfbfae670c7294862b7/README.md#ixiatg-crd). However, when the pull itself flakes (interrupted, etc.), a retry should be allowed before returning FAILED. If the image is not found and that is what led to the failure, then FAILED makes sense. For transient pull errors, FAILED is too harsh; INITIATED should be returned for a certain number of pull failures before declaring FAILED.

K8s retries ErrImagePull failures automatically; however, for ixiatg we poll the status from the operator, and that is what causes the error.

This is specifically when there is a transient error with Kubernetes pulling the image from a remote repo (in this case read: connection reset by peer). Normally k8s silently retries these errors and will hang in a backoff loop indefinitely. The ixia-c operator treats these transient errors as unrecoverable failures.
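
The retry budget proposed above could look roughly like the following Go sketch. The helper names and the budget of 5 are assumptions, not operator code; `ErrImagePull` and `ImagePullBackOff` are the container waiting reasons Kubernetes reports for pull failures. Because those reasons do not by themselves distinguish a missing image from a network flake, the sketch relies on a failure count only.

```go
package main

import "fmt"

// maxPullFailures is an assumed retry budget, not a value from the operator.
const maxPullFailures = 5

// isPullError reports whether a container waiting reason is one of the
// image-pull reasons that the kubelet retries with backoff.
func isPullError(reason string) bool {
	return reason == "ErrImagePull" || reason == "ImagePullBackOff"
}

// stateForPullFailures keeps reporting INITIATED while the number of observed
// pull failures is within the budget, and FAILED once it is exhausted.
func stateForPullFailures(failures int) string {
	if failures < maxPullFailures {
		return "INITIATED"
	}
	return "FAILED"
}

func main() {
	reason := "ErrImagePull"
	for failures := 1; failures <= maxPullFailures; failures++ {
		if isPullError(reason) {
			fmt.Println(failures, stateForPullFailures(failures))
		}
	}
}
```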

Support deploying Ixia-C pods for topologies involving OTG SW ports and DUT HW ports in docker environment

Here’s the summary:

  • FeatureProfiles has already been enhanced to support static binding in order to run OTG tests (earlier only ATE tests were supported).
    • OTG tests will now run without issues against both Ixia-C S/W ports and Ixia-C H/W ports
    • The PR is still under review by a team at Google
  • From test execution POV, the OTG and DUT endpoint addresses need to be specified in the static binding file (no change here).
  • From deployment POV,
    • ixia-c-operator will need to be deployed on host node, mapped to docker socket
    • ixia-c controller and port containers shall be deployed using docker API on same host based on a declarative YAML
    • the YAML will need to be pushed using curl, e.g. curl -k -X POST https://localhost:6443/ -d @deploy.yaml
    • the YAML would specify list of port containers to deploy - i.e. each item consisting of the name of interfaces inside port container and name of corresponding real interface on host it’ll bind to (real interface shall be connected to DUT H/W ports)
    • ixia-c-operator will automate network plumbing (e.g. use MacVLAN to bind port container interface to host interface)
  • Deployment is limited to single node and KNE is not needed.
