open-traffic-generator / keng-operator
License: Other
When deploying a topology with IXIA to a cluster, a number of otg-port pods are created. Since they do not use a dedicated service account, on OpenShift the containers run with only minimal privileges. This produces entries like the following in the controller manager logs:
time="2022-10-11T07:44:07Z" level=error msg="Failed to create pod for otg in 3-node-ceos-with-traffic - pods \"otg-port-eth1\" is forbidden: unable to validate against any security context constraint: [provider \"anyuid\": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, provider \"nonroot\": Forbidden: not usable by user or serviceaccount, provider \"hostmount-anyuid\": Forbidden: not usable by user or serviceaccount, provider \"machine-api-termination-handler\": Forbidden: not usable by user or serviceaccount, provider \"hostnetwork\": Forbidden: not usable by user or serviceaccount, provider \"hostaccess\": Forbidden: not usable by user or serviceaccount, provider \"node-exporter\": Forbidden: not usable by user or serviceaccount, provider \"meshnet\": Forbidden: not usable by user or serviceaccount, provider \"privileged\": Forbidden: not usable by user or serviceaccount]"
For more details please check the attached log file.
ixiatg-op-controller-manager-66d9845cd9-27v25-manager.log
This seems to be fixable by extending the privileges of the default service account as shown below. In general, however, this is not a recommended practice, because other pods that do not specify another service account will also inherit these privileges.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ixiatg-role
rules:
  - apiGroups:
      - security.openshift.io
    resourceNames:
      - privileged
    resources:
      - securitycontextconstraints
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ixiatg-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ixiatg-role
subjects:
  - kind: ServiceAccount
    name: default
A better solution would be to use a dedicated service account for pods created by the controller, so that the extended privileges are limited to a specific set of applications running in this namespace.
As of now, I noticed that the following dependencies are installed, which may not necessarily be needed:
curl git openssh-server vim unzip tar make bash wget sshpass build-essential
Let's review and see if we really need openssh-server and build-essential at least.
We should allow deploying Ixia-c Community Edition (Traffic Engine only). Right now, if there is no Protocol Engine in the configmap, the deployment fails with:
Error: Error in cpdp node deploy Failed to find protocol engine image for release local-latest
Tested with configmap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ixiatg-release-config
  namespace: ixiatg-op-system
data:
  versions: |
    {
      "release": "local-latest",
      "images": [
        {
          "name": "controller",
          "path": "ghcr.io/open-traffic-generator/ixia-c-controller",
          "tag": "0.0.1-3662"
        },
        {
          "name": "gnmi-server",
          "path": "ghcr.io/open-traffic-generator/ixia-c-gnmi-server",
          "tag": "1.9.9"
        },
        {
          "name": "traffic-engine",
          "path": "ghcr.io/open-traffic-generator/ixia-c-traffic-engine",
          "tag": "1.6.0.19"
        }
      ]
    }
And a deployment spec:
{
  "metadata": {
    "name": "otg",
    "namespace": "ixia-c"
  },
  "spec": {
    "api_endpoint_map": {
      "https": {
        "in": 443,
        "out": 31001
      },
      "grpc": {
        "in": 40051,
        "out": 31002
      },
      "gnmi": {
        "in": 50051,
        "out": 31003
      }
    },
    "interfaces": [
      {
        "name": "eth1",
        "peer": "localhost",
        "peer_interface": "veth0"
      },
      {
        "name": "eth2",
        "peer": "localhost",
        "peer_interface": "veth1"
      }
    ],
    "release": "local-latest"
  }
}
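The request above, to let a Traffic Engine only deployment succeed, could be approached by treating some images as optional during lookup. The following is a minimal sketch, not the operator's actual code: the function names `findImage` and `resolveImages` are hypothetical, the image name `protocol-engine` is an assumed configmap key, and treating `gnmi-server` as optional is likewise an assumption. Only the JSON layout is taken from the configmap shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Image mirrors one entry of the "images" array in the versions JSON.
type Image struct {
	Name string `json:"name"`
	Path string `json:"path"`
	Tag  string `json:"tag"`
}

// Release mirrors the top-level versions document.
type Release struct {
	Release string  `json:"release"`
	Images  []Image `json:"images"`
}

// findImage returns "path:tag" for the named image and whether it was present.
func findImage(r Release, name string) (string, bool) {
	for _, img := range r.Images {
		if img.Name == name {
			return img.Path + ":" + img.Tag, true
		}
	}
	return "", false
}

// resolveImages (hypothetical) fails only when a required image is missing.
// Optional images are included when present and silently skipped otherwise.
func resolveImages(raw []byte) (map[string]string, error) {
	var r Release
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, fmt.Errorf("parsing versions: %w", err)
	}
	out := map[string]string{}
	for _, name := range []string{"controller", "traffic-engine"} {
		ref, ok := findImage(r, name)
		if !ok {
			return nil, fmt.Errorf("failed to find %s image for release %s", name, r.Release)
		}
		out[name] = ref
	}
	// Assumed optional components; "protocol-engine" is an assumed key name.
	for _, name := range []string{"gnmi-server", "protocol-engine"} {
		if ref, ok := findImage(r, name); ok {
			out[name] = ref
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{"release":"local-latest","images":[
		{"name":"controller","path":"ghcr.io/open-traffic-generator/ixia-c-controller","tag":"0.0.1-3662"},
		{"name":"traffic-engine","path":"ghcr.io/open-traffic-generator/ixia-c-traffic-engine","tag":"1.6.0.19"}]}`)
	images, err := resolveImages(raw)
	if err != nil {
		panic(err)
	}
	_, hasProtocolEngine := images["protocol-engine"]
	fmt.Println("resolved:", len(images), "protocol engine present:", hasProtocolEngine)
}
```

With this shape, a release that omits the Protocol Engine resolves successfully, while a release that omits the controller or the Traffic Engine still produces an error.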
As of today, all Ixia-C port pods deployed on a given node use the same set of CPU cores.
Check if it's feasible to rename the file to ixiatg-config.yaml and assess compatibility issues.
I noticed multiple environment variables referring to the internal Artifactory, which may not really be needed.
Add a license similar to https://github.com/open-traffic-generator/snappi/blob/main/LICENSE
Please remove it if it's not needed anymore.
I have installed both the operator and the configmap on my cluster:
kubectl apply -f https://github.com/open-traffic-generator/ixia-c-operator/releases/download/v0.2.2/ixiatg-operator.yaml
kubectl apply -f https://github.com/open-traffic-generator/ixia-c/releases/download/v0.0.1-3423/ixia-configmap.yaml
kne create ...
I1007 11:39:19.682752 1 request.go:601] Waited for 1.046900147s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/migration.k8s.io/v1alpha1?timeout=32s
{"level":"info","ts":1665142761.2360508,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":"127.0.0.1:8080"}
{"level":"info","ts":1665142761.2364528,"logger":"setup","msg":"starting manager - version 0.2.1\n"}
{"level":"info","ts":1665142761.2368538,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"127.0.0.1:8080"}
{"level":"info","ts":1665142761.236937,"msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
I1007 11:39:21.237000 1 leaderelection.go:248] attempting to acquire leader lease ixiatg-op-system/b867187a.keysight.com...
I1007 11:39:36.934777 1 leaderelection.go:258] successfully acquired lease ixiatg-op-system/b867187a.keysight.com
{"level":"info","ts":1665142776.936252,"logger":"controller.ixiatg","msg":"Starting EventSource","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG","source":"kind source: *v1beta1.IxiaTG"}
{"level":"info","ts":1665142776.936296,"logger":"controller.ixiatg","msg":"Starting Controller","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG"}
{"level":"info","ts":1665142777.0372126,"logger":"controller.ixiatg","msg":"Starting workers","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG","worker count":1}
time="2022-10-07T11:39:37Z" level=info msg="Reconcile: otg (Desired State: INITIATED), Namespace: 3-node-ceos-withtraffic"
time="2022-10-07T11:39:37Z" level=info msg="Checking for finalizer"
time="2022-10-07T11:39:37Z" level=info msg="IXIA DS INITIATED CS "
time="2022-10-07T11:39:37Z" level=info msg="Contacting Ixia server for release dependency info - https://github.com/open-traffic-generator/ixia-c/releases/download/v0.0.1-9999/ixia-configmap.yaml"
time="2022-10-07T11:39:37Z" level=error msg="Failed to download release config file - Got http response 404"
time="2022-10-07T11:39:37Z" level=info msg="Try locating in ConfigMap..."
Additionally, the operator is not able to locate the configmap, which I created before creating any topology. This results in OOM kills and finally in a CrashLoopBackOff.
Both containers (ixia-c and gnmi) of the otg-controller pod fail to start due to permission denied errors when trying to run the operator on OpenShift. This is most likely due to the use of arbitrary UIDs as part of the OpenShift multi-layer security strategy as described here.
panic: Logger init failed: mkdir /home/keysight/ixia-c/controller/logs: permission denied
goroutine 1 [running]:
keysight/athena/controller/config.init.0()
/home/keysight/athena/controller/config/init.go:102 +0x1b7
panic: Logger init failed: mkdir /home/keysight/ixia-c-gnmi-server/logs: permission denied
goroutine 1 [running]:
github.com/open-traffic-generator/ixia-c-gnmi-server/config.init.0()
/home/keysight/ixia-c-gnmi-server/config/init.go:76 +0x173
To support using this operator on OpenShift, the files should be readable and writable by GID=0 (on OpenShift a container process is always a member of the root group). Commands invoked by the entrypoint are executed with an unprivileged UID paired with GID=0, and the UID used during execution is not known in advance. From a design perspective, this means that directories and files that may be written to by processes in the container should be owned by the root group and be readable and writable by GID=0. Files to be executed should also have group execute permissions.
If you could point me in the right direction, I could contribute the required changes myself.
When creating the otg-controller pod (ixia-c container), there is no way to change the port on which the HTTPS server is started.
Port 443 is a privileged port, so the application requires additional user privileges (root) to run properly. Either this port should be configurable or it should be set to a non-privileged port (1024 or higher).
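One way to make the port configurable is to read it from a setting and fall back to a non-privileged default. The sketch below makes several assumptions: the default of 8443 and the helper names `httpsPort` and `needsPrivilege` are hypothetical, and the value is expected to come from something like an environment variable or flag that the controller does not necessarily expose today.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultHTTPSPort is an assumed non-privileged default, not a project setting.
const defaultHTTPSPort = 8443

// httpsPort (hypothetical) parses a configured port value and falls back to
// the default when the value is empty.
func httpsPort(value string) (int, error) {
	if value == "" {
		return defaultHTTPSPort, nil
	}
	port, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", value, err)
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("port %d out of range 1-65535", port)
	}
	return port, nil
}

// needsPrivilege reports whether binding to the port normally requires root
// or CAP_NET_BIND_SERVICE on Linux (ports below 1024).
func needsPrivilege(port int) bool {
	return port < 1024
}

func main() {
	for _, v := range []string{"", "443", "8443"} {
		port, err := httpsPort(v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> port %d, privileged: %t\n", v, port, needsPrivilege(port))
	}
}
```

The "in" value of the https entry in the deployment spec would then need to match whichever port is chosen.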
Hello @anjan-keysight @biplamal, seeing some flakes when bringing up KNE topos with IxiaTG:
creating topology: failed to create topology: Node "otg": Status FAILED Reason got failure in ixia CRD status: Container ixia-c failed - rpc error: code = Unknown desc = failed to pull and unpack image "us-west1-docker.pkg.dev/.../ixia-c-controller:0.0.1-4013": failed to copy: read tcp 172.18.0.2:47650->74.125.132.82:443: read: connection reset by peer
This happens rarely (less than 1%) but still affects our KNE test runs. It appears that the Ixia operator treats an image pull error as a FAILED state (https://github.com/open-traffic-generator/ixia-c-operator/blob/a6bc34d9bc987a7d01869cfbfae670c7294862b7/README.md#ixiatg-crd). However, when the pull merely flakes (is interrupted, etc.), a retry should be allowed before returning FAILED. If the image is not found and that is what led to the failure, then FAILED makes sense; for transient pull errors, FAILED is too harsh, and INITIATED should be returned for a certain number of pull failures before declaring FAILED.
K8s retries ErrImagePull failures automatically; however, for ixiatg we poll the status from the operator, and that is what causes the error.
This is specifically when there is a transient error with Kubernetes pulling the image from a remote repo (in this case, read: connection reset by peer). Normally k8s silently retries these errors and will hang in a backoff loop indefinitely. The ixia-c operator treats these transient errors as unrecoverable failures.
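The behavior requested above, reporting INITIATED for a bounded number of transient pull failures before declaring FAILED, might look roughly like the following. This is a sketch under stated assumptions rather than the operator's logic: the retry budget of 5, the list of message markers, and the function names are all hypothetical, and matching on message text is a heuristic chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// State names as used in the IxiaTG CRD status discussed above.
const (
	StateInitiated = "INITIATED"
	StateFailed    = "FAILED"
)

// maxPullRetries is an assumed retry budget, not a value from the project.
const maxPullRetries = 5

// isTransientPullError (hypothetical) applies a text heuristic to decide
// whether a pull error is likely to succeed on retry.
func isTransientPullError(msg string) bool {
	markers := []string{"connection reset by peer", "i/o timeout", "TLS handshake timeout"}
	for _, marker := range markers {
		if strings.Contains(msg, marker) {
			return true
		}
	}
	return false
}

// nextState keeps reporting INITIATED while transient failures stay within
// the budget, and reports FAILED for any other error or once it is exceeded.
func nextState(msg string, failures int) string {
	if isTransientPullError(msg) && failures <= maxPullRetries {
		return StateInitiated
	}
	return StateFailed
}

func main() {
	msg := "failed to copy: read tcp 172.18.0.2:47650->74.125.132.82:443: read: connection reset by peer"
	fmt.Println(nextState(msg, 1))                // INITIATED
	fmt.Println(nextState(msg, maxPullRetries+1)) // FAILED
	fmt.Println(nextState("manifest unknown", 1)) // FAILED
}
```

A client polling the status would then keep waiting through a short flake, while a missing image still surfaces as FAILED on the first attempt.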
Also need to address probe for multiple ports and code refactor for probe addition for each published port.