intel / intel-device-plugins-for-kubernetes Goto Github PK

Collection of Intel device plugins for Kubernetes

License: Apache License 2.0

Makefile 1.08% Python 1.11% Go 89.60% Shell 2.00% Dockerfile 6.21%

kubernetes swrepo cloud-computing edge-computing 5g cloud plug-ins

intel-device-plugins-for-kubernetes's Introduction

Overview

This repository contains a framework for developing plugins for the Kubernetes device plugins framework, along with a number of device plugin implementations utilizing that framework.

The v0.30 release is the latest feature release with its documentation available here.

Table of Contents

Prerequisites
Plugins
Device Plugins Operator
XeLink XPU Manager sidecar
Demos
Workload Authors
Developers
Releases
- Supported Kubernetes versions
- Release procedures
Pre-built plugin images
- Signed container images
License
Helm charts

Prerequisites

Prerequisites for building and running these device plugins include:

Appropriate hardware and drivers
A fully configured [Kubernetes cluster]
A working [Go environment], of at least version v1.16.

Plugins

The below sections detail existing plugins developed using the framework.

GPU Device Plugin

The GPU device plugin provides access to discrete and integrated Intel GPU device files.

The demo subdirectory contains both a GPU plugin demo video and an OpenCL sample deployment (intelgpu-job.yaml).

FPGA Device Plugin

The FPGA device plugin supports FPGA passthrough for the following hardware:

Intel® Arria® 10 devices
Intel® Stratix® 10 devices

The FPGA plugin comes as three parts.

the device plugin
the admission controller
the OCI createRuntime hook

Refer to each individual sub-components documentation for more details. Brief overviews of the sub-components are below.

The demo subdirectory contains a video showing deployment and use of the FPGA plugin. Sources relating to the demo can be found in the opae-nlb-demo subdirectory.

Device Plugin

The FPGA device plugin is responsible for discovering and reporting FPGA devices to kubelet.

Admission Controller

The FPGA admission controller webhook is responsible for performing mapping from user-friendly function IDs to the Interface ID and Bitstream ID that are required for FPGA programming. It also implements access control by namespacing FPGA configuration information.

OCI createRuntime Hook

The FPGA OCI createRuntime hook performs discovery of the requested FPGA function bitstream and programs FPGA devices based on the environment variables in the workload description.

QAT Device Plugin

The QAT plugin supports device plugin for Intel QAT adapters, and includes code showing deployment via DPDK.

The demo subdirectory includes details of both a QAT DPDK demo and a QAT OpenSSL demo. Source for the OpenSSL demo can be found in the relevant subdirectory.

Details for integrating the QAT device plugin into Kata Containers can be found in the Kata Containers documentation repository.

SGX Device Plugin

The SGX device plugin allows workloads to use Intel® Software Guard Extensions (Intel® SGX) on platforms with SGX Flexible Launch Control enabled, e.g.,:

3rd Generation Intel® Xeon® Scalable processor family, code-named “Ice Lake”
Intel® Xeon® E3 processor
Intel® NUC Kit NUC7CJYH

The Intel SGX plugin comes in three parts.

the device plugin
the admission webhook
the SGX EPC memory registration

The demo subdirectory contains a video showing the deployment and use of the Intel SGX device plugin. Sources relating to the demo can be found in the sgx-sdk-demo and sgx-aesmd-demo subdirectories.

Brief overviews of the Intel SGX sub-components are given below.

device plugin

The SGX device plugin is responsible for discovering and reporting Intel SGX device nodes to kubelet.

Containers requesting Intel SGX resources in the cluster should not use the device plugins resources directly.

Intel SGX Admission Webhook

The Intel SGX admission webhook is responsible for performing Pod mutations based on the sgx.intel.com/quote-provider pod annotation set by the user. The purpose of the webhook is to hide the details of setting the necessary device resources and volume mounts for using Intel SGX remote attestation in the cluster. Furthermore, the Intel SGX admission webhook is responsible for writing a pod/sandbox sgx.intel.com/epc annotation that is used by Kata Containers to dynamically adjust its virtualized Intel SGX encrypted page cache (EPC) bank(s) size.

The Intel SGX admission webhook is available as part of Intel Device Plugin Operator or as a standalone SGX Admission webhook image.

Intel SGX EPC memory registration

The Intel SGX EPC memory available on each node is registered as a Kubernetes extended resource using node-feature-discovery (NFD). An NFD Node Feature Rule is installed as part of SGX device plugin operator deployment and NFD is configured to register the Intel SGX EPC memory extended resource.

Containers requesting Intel SGX EPC resources in the cluster use sgx.intel.com/epc resource which is of type memory.

DSA Device Plugin

The DSA device plugin supports acceleration using the Intel Data Streaming accelerator(DSA).

DLB Device Plugin

The DLB device plugin supports Intel Dynamic Load Balancer accelerator(DLB).

IAA Device Plugin

The IAA device plugin supports acceleration using the Intel Analytics accelerator(IAA).

Device Plugins Operator

To simplify the deployment of the device plugins, a unified device plugins operator is implemented.

Currently the operator has support for the DSA, DLB, FPGA, GPU, IAA, QAT, and Intel SGX device plugins. Each device plugin has its own custom resource definition (CRD) and the corresponding controller that watches CRUD operations to those custom resources.

The Device plugins operator README gives the installation and usage details for the community operator available on operatorhub.io.

The Device plugins Operator for OpenShift gives the installation and usage details for the operator available on Red Hat OpenShift Container Platform.

XeLink XPU Manager Sidecar

To support interconnected GPUs in Kubernetes, XeLink sidecar is needed.

The XeLink XPU Manager sidecar README gives information how the sidecar functions and how to use it.

Demos

The demo subdirectory contains a number of demonstrations for a variety of the available plugins.

Workload Authors

For workloads to get accesss to devices managed by the plugins, the Pod spec must specify the hardware resources needed:

spec:
  containers:
    - name: demo-container
      image: <registry>/<image>:<version>
      resources:
        limits:
          <device namespace>/<resource>: X

The summary of resources available via plugins in this repository is given in the list below.

Device Namespace : Registered Resource(s)

dlb.intel.com : pf or vf
- dlb-libdlb-demo-pod.yaml
dsa.intel.com : wq-user-[shared or dedicated]
- dsa-accel-config-demo-pod.yaml
fpga.intel.com : custom, see mappings
- intelfpga-job.yaml
gpu.intel.com : i915, i915_monitoring, xe or xe_monitoring
- intelgpu-job.yaml
iaa.intel.com : wq-user-[shared or dedicated]
- iaa-accel-config-demo-pod.yaml
qat.intel.com : generic or cy/dc/asym-dc/sym-dc
- compress-perf-dpdk-pod-requesting-qat-dc.yaml
- crypto-perf-dpdk-pod-requesting-qat-cy.yaml
sgx.intel.com : epc
- intelsgx-job.yaml

Developers

For information on how to develop a new plugin using the framework or work on development task in this repository, see the Developers Guide.

Releases

Supported Kubernetes Versions

Releases are made under the github releases area. Supported releases and matching Kubernetes versions are listed below:

Branch	Kubernetes branch/version	Status
release-0.30	Kubernetes 1.30 branch v1.30.x	supported
release-0.29	Kubernetes 1.29 branch v1.29.x	supported
release-0.28	Kubernetes 1.28 branch v1.28.x	supported
release-0.27	Kubernetes 1.27 branch v1.27.x	unsupported
release-0.26	Kubernetes 1.26 branch v1.26.x	unsupported
release-0.25	Kubernetes 1.25 branch v1.25.x	unsupported
release-0.24	Kubernetes 1.24 branch v1.24.x	unsupported
release-0.23	Kubernetes 1.23 branch v1.23.x	unsupported
release-0.22	Kubernetes 1.22 branch v1.22.x	unsupported
release-0.21	Kubernetes 1.21 branch v1.21.x	unsupported
release-0.20	Kubernetes 1.20 branch v1.20.x	unsupported

Note: Device plugins leverage the Kubernetes v1 API. The API itself is GA (generally available) and does not change between Kubernetes versions. One does not necessarily need to use the latest Kubernetes cluster with the latest device plugin version. Using a newer device plugins release should work without issues on an older Kubernetes cluster. One possible exception to this are the device plugins CRDs that can vary between versions.

Go environment | Kubernetes cluster setup

Release procedures

Project's release cadence is tied to Kubernetes release cadence. Device plugins release typically follows a couple of weeks after the Kubernetes release. There can be some delays on the releases due to required changes in the pull request pipeline. Once the content is available in the main branch and CI & e2e validation PASSes, release branch will be created (e.g. release-0.26). The HEAD of release branch will also be tagged with the corresponding tag (e.g. v0.26.0).

During the release creation, the project's documentation, deployment files etc. will be changed to point to the newly created version.

Patch releases (e.g. 0.26.3) are done on a need basis if there are security issues or minor fixes requested for specific version. Fixes are always cherry-picked from the main branch to the release branches.

Pre-built plugin images

Pre-built images of the plugins are available on the Docker hub. These images are automatically built and uploaded to the hub from the latest main branch of this repository.

Release tagged images of the components are also available on the Docker hub, tagged with their release version numbers in the format x.y.z, corresponding to the branches and releases in this repository.

Note: the default deployment files and operators are configured with imagePullPolicy IfNotPresent and can be changed with scripts/set-image-pull-policy.sh.

Signed container images

Starting from 0.31 release, the images (0.31.0 etc., not devel) are signed with keyless signing using cosign. The signing proof is stored in rekor.sigstore.dev in an append-only transparency log. The signature is also stored within the dockerhub.

To verify the signing in Kubernetes, one can use policy managers with keyless authorities.

License

All of the source code required to build intel-device-plugins-for-kubernetes is available under Open Source licenses. The source code files identify external Go modules used. Binaries are distributed as container images on DockerHub*. Those images contain license texts and source code under /licenses.

Helm Charts

Device Plugins Helm Charts are located in Intel Helm Charts repository Intel Helm Charts. This is another way of distributing Kubernetes resources of the device plugins framework.

To add repo:

helm repo add intel https://intel.github.io/helm-charts

intel-device-plugins-for-kubernetes's People

Contributors

Stargazers

Watchers

Forkers

rojkov mythi eero-t tkatila hugo-syn jstamel eriklindahl hj-johannes-lee joshuaandrew

intel-device-plugins-for-kubernetes's Issues

FPGA webhook should be in kube-system namespace

@rojkov please update deployment configurations that FPGA admission hook components will be deployed to kube-system namespace.

Create new demos based on OpenVINO

For FPGA and maybe GPU demos we can use something from https://software.intel.com/en-us/openvino-toolkit/choose-download to create more spectacular demos

Broken link to QAT screencast

We have a broken link on our demos page:
https://asciinema.org/a/H723QlyIf69H6Rpv0YihCtcKw

It leads to error page with:

This recording has been archived
All unclaimed recordings (the ones not linked to any user account) are automatically archived 7 days after upload.

qat_plugin: additional review comments

I've tested the QAT plug-in on my host and read the code a little bit. Here's my observations/improvement ideas:

while re-running the plug-in multiple times, I eventually run "out of resources" because the devices remain bound to the dpdk user IO driver. The driver should probably implement:
for i in `ls -d /sys/bus/pci/drivers/vfio-pci/0000:*`; do echo `basename $i` > /sys/bus/pci/drivers/vfio-pci/unbind; done (for the given dpdk-driver) in its init (Edit: this might be trickier than just unbind)
maxNumdevice check looks wrong. 1) the loop gives maxNumdevicefrom each kernel driver resulting N_drivers * maxNumdevice devices. 2) the loop continues for all files even if maxNumdevice is already reached

opae-nlb-demo: install content in /usr/local

in order to better separate between "base image" and add-on content.

vendored deps: prune unused code.

We need to check if we can reduce size of vendored dependencies by playing with dep tool options:
https://golang.github.io/dep/docs/Gopkg.toml.html#prune

check for formatting issues in TravisCI

It looks like TravisCI is supposed to detect code that isn't formatted in the canonic way, because there is a make format invocation in .travis.yml. But what that does is reformat code, without failing the CI build because go fmt returns a zero exit code regardless whether changes were necessary or not.

Here's what I've been using in a different project:

test_fmt:
	@ files=$$(find pkg cmd test -name '*.go'); \
	if [ $$(gofmt -d $$files | wc -l) -ne 0 ]; then \
		echo "formatting errors:"; \
		gofmt -d $$files; \
		false; \
	fi
fmt:
	gofmt -l -w $$(find pkg cmd test -name '*.go')

qat_plugin: README: add notes about kernel mode

we should add a brief intro about -mode kernel in qat_plugin README. It should also add a note about the fact that this mode currently requires all UIO devices mounted in all containers.

gpu plugin device can only be used by one pod

Hi, currently gpu plugin only exposes one device instance e.g one 'card0' with device node /dev/dri/card0 and /dev/dri/renderD128 for service, so only one pod can use it. But as drm device node could be accessed by any number of clients, only limit to one pod is not good, gpu device can be utilized by many more pods.

I'm not sure what's the best way to handle this. One option is to be able to pass max number of pods for gpu access when plugin start, then plugin will report that number of devices to kubelet service, which could server more pods for gpu access. Idea?

Health monitoring for FPGA

We seen situations where FPGA device become irresponsible after using some bitstreams.
We need to get proper health reporting by device plugin

Support for Intel Movidius Neural Compute Stick?

Any chance of seeing a plugin for supporting using the ncs with kubernetes? I'd really like to be able to use it in my cluster.

QAT: crypto-perf image fails to build

Here is what I'v got when I tried to build crypto-perf image:

$ cd demo
$ ./build-image.sh crypto-perf
++ dirname ./build-image.sh
+ CWD=.
+ IMG=crypto-perf
+ '[' -z crypto-perf ']'
+ '[' '!' -d crypto-perf ']'
+ PROXY_VARS='http_proxy https_proxy'
+ BUILD_ARGS=
+ for proxy in '$PROXY_VARS'
+ '[' -v http_proxy ']'
++ tr -d ' '
++ echo http://proxy-chain.intel.com:911
+ val=http://proxy-chain.intel.com:911
+ BUILD_ARGS=' --build-arg http_proxy=http://proxy-chain.intel.com:911'
+ RUN_ARGS=' -e http_proxy=http://proxy-chain.intel.com:911'
+ for proxy in '$PROXY_VARS'
+ '[' -v https_proxy ']'
++ tr -d ' '
++ echo http://proxy-chain.intel.com:911
+ val=http://proxy-chain.intel.com:911
+ BUILD_ARGS=' --build-arg http_proxy=http://proxy-chain.intel.com:911 --build-arg https_proxy=http://proxy-chain.intel.com:911'
+ RUN_ARGS=' -e http_proxy=http://proxy-chain.intel.com:911 -e https_proxy=http://proxy-chain.intel.com:911'
+ docker build -t crypto-perf --build-arg http_proxy=http://proxy-chain.intel.com:911 --build-arg https_proxy=http://proxy-chain.intel.com:911 ./crypto-perf/
Sending build context to Docker daemon  2.048kB
Step 1/8 : FROM centos:7
 ---> 5182e96772bf
Step 2/8 : ENV SHELL /bin/bash
 ---> Using cache
 ---> 19e95b4e9030
Step 3/8 : COPY ./resolv.conf /etc/resolv.conf
COPY failed: stat /var/lib/docker/tmp/docker-builder406254565/resolv.conf: no such file or directory

OpenCL does not work on servers with two PACs

Hi. Using this plug-in,
I'm trying OpenVINO container with FPGA.

When run OpenVINO on my DELL R640 with one PAC,
I could execute inference with FPGA without any problem.

But, on my DELL R740 which has 2 PAC,
I could not execute OpenVINO inference or OpenCL memory diagnostics.

I attached some experiments log.
Please find it.

log summary

please find detailed log of these 6 patterns below.

DELL R640 (It has one PAC)
- 1. on the pod
  - "It works"
DELL R740 (It has two PAC)
- 1. on the host
- 1. on the pod (each pod is attached 1 PAC)
- 1. on the pod (single pod is attached 2 PAC)
- 1. on docker container without k8s (container is attached 2 PAC)
  - "It works, but 2 PAC are necessary..."
- 1. on docker container without k8s (container is attached 1 PAC)

enviroments

DELL R640 opae version 1.0.2-1
DELL R740 opae version 1.0.2-1

R640

0. on the pod

It works

[kaz@r640-2 openvino]$ k apply -f openvino-pod.yaml
pod/openvino-pod created


[kaz@r640-2 openvino]$ k get po
NAME                                            READY   STATUS    RESTARTS   AGE
intel-fpga-webhook-deployment-98d745549-27cnx   1/1     Running   2          33d
openvino-pod-nfwrc                              1/1     Running   0          11s


[kaz@r640-2 openvino]$ k exec -ti openvino-job-nfwrc bash
root@openvino-pod-nfwrc:/#

root@openvino-pod-nfwrc:/# aocl diagnose
--------------------------------------------------------------------
Device Name:
acl0

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_f300000     Passed            PAC Arria 10 Platform (pac_a10_f300000)
                                      PCIe 216:00.0
                                      FPGA temperature = 52 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices


root@openvino-pod-nfwrc:/# aocl diagnose all
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_f300000)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Write top speed = 6192.44 MB/s
Read top speed = 5507.40 MB/s
Throughput = 5849.92 MB/s

DIAGNOSTIC_PASSED


[ INFO ] InferenceEngine:
	API version ............ 1.4
	Build .................. 17328
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     /opt/intel/computer_vision_sdk/deployment_tools/demo/car.png
[ INFO ] Loading plugin

	API version ............ 1.4
	Build .................. heteroPlugin
	Description ....... heteroPlugin
[ INFO ] Loading network files:
	/root/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml
	/root/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Top 10 results:

Image /opt/intel/computer_vision_sdk/deployment_tools/demo/car.png

817 0.8741471 label sports car, sport car
511 0.0435212 label convertible
479 0.0435212 label car wheel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ INFO ] Execution successful

R740

1. on the host

[kaz@r740 openvino]$ aocl diagnose
--------------------------------------------------------------------
Device Name:
acl0

Package Pat:
/home/kaz/inteldevstack/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00001     Passed            PAC Arria 10 Platform (pac_a10_ee00001)
                                      PCIe 216:00.0
                                      FPGA temperature = 43 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------
--------------------------------------------------------------------
Device Name:
acl1

Package Pat:
/home/kaz/inteldevstack/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00000     Passed            PAC Arria 10 Platform (pac_a10_ee00000)
                                      PCIe 134:00.0
                                      FPGA temperature = 42 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices


[kaz@r740 openvino]$ aocl diagnose all
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ee00001)
~~~~~~~~~~~~~
~~~~~~~~~~~~~
Write top speed = 6824.68 MB/s
Read top speed = 6797.44 MB/s
Throughput = 6811.06 MB/s

DIAGNOSTIC_PASSED

Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ee00000)
~~~~~~~~~~~~~
~~~~~~~~~~~~~
Write top speed = 6848.64 MB/s
Read top speed = 6766.13 MB/s
Throughput = 6807.38 MB/s

DIAGNOSTIC_PASSED

2. on the pod (each pod is attached 1 PAC)

[kaz@r740 openvino]$ k apply -f openvino-pod.yaml
pod/openvino-pod created

[kaz@r740 openvino]$ k apply -f openvino-pod2.yaml
pod/openvino-pod2 created

[kaz@r740 openvino]$ k get po
NAME                                            READY   STATUS    RESTARTS   AGE
intel-fpga-webhook-deployment-98d745549-27cnx   1/1     Running   2          33d
openvino-pod                                    1/1     Running   0          4s
openvino-pod2                                   1/1     Running   0          2s

[kaz@r740 openvino]$ k exec -ti openvino-pod bash
root@openvino-pod:/# aocl diagnose
Error opening AFC: no driver available
--------------------------------------------------------------------
Device Name:
acl0

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00001     Passed            PAC Arria 10 Platform (pac_a10_ee00001)
                                      PCIe 216:00.0
                                      FPGA temperature = 42 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices
root@openvino-pod:/# aocl diagnose all
Error opening AFC: no driver available
Error opening AFC: no driver available

root@openvino-pod:/# exit
exit

[kaz@r740 openvino]$ k exec -ti openvino-pod2 bash
root@openvino-pod2:/# aocl diagnose
Error opening AFC: no driver available
--------------------------------------------------------------------
Device Name:
acl0

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00000     Passed            PAC Arria 10 Platform (pac_a10_ee00000)
                                      PCIe 134:00.0
                                      FPGA temperature = 41 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices
root@openvino-pod2:/# aocl diagnose all
Error opening AFC: no driver available
Error opening AFC: no driver available
root@openvino-pod2:/#

3. on the pod (single pod is attached 2 PAC)

[kaz@r740 openvino]$ clear
[kaz@r740 openvino]$ k apply -f openvino-pod12.yaml
pod/openvino-pod12 created
[kaz@r740 openvino]$ k exec -ti openvino-pod12 bash
root@openvino-pod12:/# aocl diagnose
--------------------------------------------------------------------
Device Name:
acl0

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00001     Passed            PAC Arria 10 Platform (pac_a10_ee00001)
                                      PCIe 216:00.0
                                      FPGA temperature = 43 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------
--------------------------------------------------------------------
Device Name:
acl1

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00000     Passed            PAC Arria 10 Platform (pac_a10_ee00000)
                                      PCIe 134:00.0
                                      FPGA temperature = 41 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices

root@openvino-pod12:/# aocl diagnose all
Error initializing DMA: 1
Error initializing mmd dma
Error initializing bsp
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ee00001)
~~~~~~~~~~~~~
~~~~~~~~~~~~~
Write top speed = 6853.11 MB/s
Read top speed = 6812.40 MB/s
Throughput = 6832.75 MB/s

DIAGNOSTIC_PASSED

Error initializing DMA: 1
Error initializing mmd dma
Error initializing bsp
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Can't open device #1
root@openvino-pod12:/#

root@openvino-pod12:/# aocl diagnose acl0
Error initializing DMA: 1
Error initializing mmd dma
Error initializing bsp
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ee00001)
~~~~~~~~~~~~~
~~~~~~~~~~~~~
Write top speed = 6845.63 MB/s
Read top speed = 6823.58 MB/s
Throughput = 6834.60 MB/s

DIAGNOSTIC_PASSED

root@openvino-pod12:/# aocl diagnose acl1
Error initializing DMA: 1
Error initializing mmd dma
Error initializing bsp
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Can't open device #1

4. on docker container without k8s (container is attached 2 PAC)

It works, but 2 PAC are necessary...

root@bf083a3cd0a1:/# aocl diagnose
--------------------------------------------------------------------
Device Name:
acl0

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00001     Passed            PAC Arria 10 Platform (pac_a10_ee00001)
                                      PCIe 216:00.0
                                      FPGA temperature = 43 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------
--------------------------------------------------------------------
Device Name:
acl1

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00000     Passed            PAC Arria 10 Platform (pac_a10_ee00000)
                                      PCIe 134:00.0
                                      FPGA temperature = 42 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices
root@bf083a3cd0a1:/# aocl diagnose all
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ee00001)
~~~~~~~~~~~~~
~~~~~~~~~~~~~
Write top speed = 6800.08 MB/s
Read top speed = 6746.52 MB/s
Throughput = 6773.30 MB/s

DIAGNOSTIC_PASSED
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ee00000)
~~~~~~~~~~~~~
~~~~~~~~~~~~~
Write top speed = 6852.69 MB/s
Read top speed = 6690.27 MB/s
Throughput = 6771.48 MB/s

DIAGNOSTIC_PASSED

5. on docker container without k8s (container is attached 1 PAC)

[kaz@r740 openvino]$ docker run --rm -it --mount type=bind,source=/home/kaz/inteldevstack/a10_gx_pac_ias_1_1_pv/,destination=/opt/a10_gx_pac_ias_1_1_pv/ --mount type=bind,source=/home/kaz/inteldevstack/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp/linux64/lib/,destination=/home/kaz/inteldevstack/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp/linux64/lib/ --mount type=bind,source=/opt/intel/computer_vision_sdk/bitstreams/a10_dcp_bitstreams/,destination=/opt/intel/computer_vision_sdk/bitstreams/a10_dcp_bitstreams/ --mount type=bind,source=/opt/intel/fpga-sw/opae/lib/,destination=/opt/intel/fpga-sw/opae/lib/ --mount type=bind,source=/opt/intel/fpga-sw/opencl/opencl_bsp/linux64/lib/,destination=/opt/intel/fpga-sw/opencl/opencl_bsp/linux64/lib/ --mount type=bind,source=/home/kaz/inteldevstack/intelFPGA_pro,destination=/opt/intel/intelFPGA_pro --mount type=bind,source=/opt/altera,destination=/opt/altera --mount type=bind,source=/etc/OpenCL/vendors,destination=/etc/OpenCL/vendors --mount type=bind,source=/opt/Intel/OpenCL/Boards,destination=/opt/Intel/OpenCL/Boards --device /dev/intel-fpga-fme.0:/dev/intel-fpga-fme.0 --device /dev/intel-fpga-port.0:/dev/intel-fpga-port.0  --cap-add=IPC_LOCK openvino_fpga:1.0

root@b9d0856a4801:/# aocl diagnose
Error opening AFC: no driver available
--------------------------------------------------------------------
Device Name:
acl0

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00000     Passed            PAC Arria 10 Platform (pac_a10_ee00000)
                                      PCIe 134:00.0
                                      FPGA temperature = 42 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices
root@b9d0856a4801:/# aocl diagnose all
Error opening AFC: no driver available
Error opening AFC: no driver available
root@b9d0856a4801:/# exit
exit

switch PAC


[kaz@r740 openvino]$ docker run --rm -it --mount type=bind,source=/home/kaz/inteldevstack/a10_gx_pac_ias_1_1_pv/,destination=/opt/a10_gx_pac_ias_1_1_pv/ --mount type=bind,source=/home/kaz/inteldevstack/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp/linux64/lib/,destination=/home/kaz/inteldevstack/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp/linux64/lib/ --mount type=bind,source=/opt/intel/computer_vision_sdk/bitstreams/a10_dcp_bitstreams/,destination=/opt/intel/computer_vision_sdk/bitstreams/a10_dcp_bitstreams/ --mount type=bind,source=/opt/intel/fpga-sw/opae/lib/,destination=/opt/intel/fpga-sw/opae/lib/ --mount type=bind,source=/opt/intel/fpga-sw/opencl/opencl_bsp/linux64/lib/,destination=/opt/intel/fpga-sw/opencl/opencl_bsp/linux64/lib/ --mount type=bind,source=/home/kaz/inteldevstack/intelFPGA_pro,destination=/opt/intel/intelFPGA_pro --mount type=bind,source=/opt/altera,destination=/opt/altera --mount type=bind,source=/etc/OpenCL/vendors,destination=/etc/OpenCL/vendors --mount type=bind,source=/opt/Intel/OpenCL/Boards,destination=/opt/Intel/OpenCL/Boards  --device /dev/intel-fpga-fme.1:/dev/intel-fpga-fme.1 --device /dev/intel-fpga-port.1:/dev/intel-fpga-port.1 --cap-add=IPC_LOCK openvino_fpga:1.0
root@3d83eeca62e8:/# export AOCL_BOARD_PACKAGE_ROOT=/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp
root@3d83eeca62e8:/# export LD_LIBRARY_PATH=/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp/linux64/lib:$LD_LIBRARY_PATH:
root@3d83eeca62e8:/# export LD_LIBRARY_PATH=/opt/intel/fpga-sw/opae/lib/:$LD_LIBRARY_PATH:
root@3d83eeca62e8:/#
root@3d83eeca62e8:/# aocl diagnose
Error opening AFC: no driver available
--------------------------------------------------------------------
Device Name:
acl0

BSP Install Location:
/opt/a10_gx_pac_ias_1_1_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status            Information

pac_a10_ee00001     Passed            PAC Arria 10 Platform (pac_a10_ee00001)
                                      PCIe 216:00.0
                                      FPGA temperature = 43 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices
root@3d83eeca62e8:/# aocl diagnose all
Error opening AFC: no driver available
Error opening AFC: no driver available

Evaluate operator approach to distribute out-of-tree kernel drivers and tools for accelerators

For FPGA and QAT we want to evaluate means on how to distribute and automate lifecycle of out-of-tree kernel modules and tools via operators.

Reference:
https://github.com/zvonkok/special-resource-operator

Restructure README.md file(s)

We need to split current README.md file into several:

main README.md file in top level directory should have high level description of the repo, supported k8s versions, list and pointers to README files for each plugin inside that repo.
CONTRIBUTING.md - description on how to contribute to that repository.
common file building.md or something like that: instructions how to get sources and build binaries.
For each plugin (GPU, FPGA, ...) we need to have dedicated README file, containing pointer to building.md, and then description of functionality and usages for this particular plugin.

FPGA CRI hook: make hook output to be visible in the logs

Currently output of the hook is not visible anywhere, which makes it hard to investigate what's going on when hook doesn't work as expected.

It would be great to have the output visible in the CRI-O systemd logs: journalctl -u crio.

Hopefully it's just a matter of configuring systemd to put the logs into its binary log.

Use FPGA in container

I can use FPGA in a physical machine with opencl, but I can not use it the same way in
container. And I see, I can not aocl program a aocx file in container. Can someone tell me how to use FPGA in a container, whether it doesn't make sense to use it in containers. If not, can give me a artificial intelligence demo to use FPGA in container

kubernetes application

I just allocated one region of fpga for my pod. While , in my pod, i can see all of fpga on this host. The attachment named cern.yml is the detail of my pod, and the 1.png is the output of kubectl get pod -o yaml, and the 2.png is the output of lspci | grep 9c[45] running in my pod, 3.png for fpgainfo fme in my pod and 4.png for fpgainfo portin my pod.

demo/ubuntu-demo-opencl looks like can't work

Hi,
I try to build demo/ubuntu-demo-opencl, and use it to test gpu_plugin. But it looks like that this demo can't work correctly.

If only run this image using docker in host os, this app crashed.

$ docker run -it --rm -v /dev:/dev -v /etc:/etc -v /opt:/opt --privileged ubuntu-demo-opencl:devel

$ ./run-opencl-example.sh /root/6-1/fft

Segmentation fault (core dumped)

and at the same time, It looks like "clinfo" could not get the gpu infomation.

$ clinfo

Number of platforms 0

If copy "/root/6-1/fft/*" to the host environment and execute it, the app can work correctly.

I'm not sure whether there are something is wrong in my steps. Or need to do some extra setup operations?

FPGA: init container for plugin daemon set

Idea is to create init container for FPGA device plugin daemonset where following items will be done:

host mount some directory, e.g. /opt/intel/fpga and CRI-O hook configuration directory
rsync --delete all needed libraries and binaries for fpga programming into this directory (including fpga_crihook)
force-update fpga_crihook configuration file for CRI-O

Multiple afu modes

There are 2 fpga afu on one of our hosts with uuid 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f and d8424dc4-a4a3-c413-f89e-433683f9040b like the attachment named afu.png.
While, if we run fpga_plugin in af mode, we can found only one afu (18b79ffa-2ee5-4aa0-96ef-4230dafacb5f) been registed into k8s, like the attachment name registed_afu.png.

can't build fpga_plugin

What's the golang version I should use? I can't pass the fpga_plugin build:
with golang 1.6 I get the following error:

../../internal/deviceplugin/server.go:18:2: cannot find the package "context" in any of:
$GOROOT
$GOPATH

With Golang 1.7 and 1.8 I get:

*server does not implement v1beta1.DevicePluginServer (wrong type for Allocate method)
		have Allocate("context".Context, *v1beta1.AllocateRequest) (*v1beta1.AllocateResponse, error)
		want Allocate("github.com/intel/intel-device-plugins-for-kubernetes/vendor/golang.org/x/net/context".Context, *v1beta1.AllocateRequest) (*v1beta1.AllocateResponse, error)

and With latest I get:

import cycle not allowed
package github.com/intel/intel-device-plugins-for-kubernetes/cmd/fpga_plugin
	imports flag
	imports errors
	imports runtime
	imports internal/bytealg
	imports internal/cpu
	imports runtime
Makefile:35: recipe for target 'fpga_plugin' failed
make: *** [fpga_plugin] Error 1

Support for GVT-g vGPUs

I'm interested in using GVT-g to virtualize the GPUs in my cluster nodes and assign them to pods (essentially for use with qemu in kubevirt).

I came up with a very naïve implementation, and wanted to check with you whether there's any interest in making this part of the "official" device plugin?

"make build" against vendored Kubernetes v1.13 fails

The code generated with the codegenerator from client-go v8.0 isn't compatible with the latest K8s release. As result make build produces this output:

$ make all
cd cmd/fpga_admissionwebhook; go build
# github.com/intel/intel-device-plugins-for-kubernetes/vendor/k8s.io/client-go/pkg/apis/clientauthentication/v1beta1
../../vendor/k8s.io/client-go/pkg/apis/clientauthentication/v1beta1/zz_generated.conversion.go:39:15: scheme.AddGeneratedConversionFuncs undefined (type *runtime.Scheme has no field or method AddGeneratedConversionFuncs)
# github.com/intel/intel-device-plugins-for-kubernetes/vendor/k8s.io/client-go/pkg/apis/clientauthentication/v1alpha1
../../vendor/k8s.io/client-go/pkg/apis/clientauthentication/v1alpha1/zz_generated.conversion.go:39:15: scheme.AddGeneratedConversionFuncs undefined (type *runtime.Scheme has no field or method AddGeneratedConversionFuncs)
make: *** [Makefile:35: fpga_admissionwebhook] Error 2

Upgrade vendored K8s packages to v1.13 and fix the build issues.

gpu deployment: mount /dev/dri only

Looks like we don't need to bind mount full /dev. Can we limit that to /dev/dri/?

Switch to 'go mod' instead of 'dep'

When we will be updating to new version of kubernetes dependecies, let's switch to use go mod and go 1.12+ for our codebase.

Need to explicitly check for QAT devices when DPDK drivers are used as other devices (eg NICs) bound to DPDK driver get wrongly counted as QAT devices

When the device plugin runs it scans the /sys/bus/pci/drivers/ directory for the following drivers to detect the VFs associated with them:
dh895xccvf,c6xxvf,c3xxxvf,d15xxvf,igb_uio

The first 4 drivers have no issue as they are the kernel drivers for QAT VFs.
However, when scanning the igb_uio driver for devices attached to it as well as detecting QAT VFs the device plugin will also detect any NIC VFs that are bound to the igb_uio driver.
This results in the QAT device plugin managing and allocating NIC VFs.

qat_plugin: update loop exhausts VF driver devices

The update loop doing scan() exhausts devices from the VF driver pool

for _, driver := range append(dp.kernelVfDrivers, dp.dpdkDriver) {

After a while, all devices bound to kernelVfDrivers have been unbound and the plugin only scans dpdkDriver directory.

intel-fpga-plugin dial tcp i/o timeout

intel-fpga-plugin pod get error

$ kubectl logs intel-fpga-plugin-5dppj -n kube-system
ERROR: Get https://110.1.0.1:443/api/v1/nodes/node10: dial tcp 110.1.0.1:443: i/o timeout

I created Kubernetes cluster as follow. There are two worker nodes in the cluster, node09(110.1.1.109), node10(110.1.1.110)

kubeadm init --kubernetes-version=1.14.2 \
  --apiserver-advertise-address=110.1.1.108  \
  --image-repository registry.aliyuncs.com/google_containers  \
  --service-cidr=110.1.0.0/16  --pod-network-cidr=110.244.0.0/16

I created intel-fpga-plugin DaemonSet as intel-device-plugins-for-kubernetes/cmd/fpga_plugin. And I finished all the guides: FPGA device plugin, FPGA admission controller webhook, FPGA prestart CRI-O hook.

However, logs in intel-fpga-plugin pods and intel-fpga-webhook pod get errors. Theses pods try to access https://110.1.0.1:443, but there is no such a pod with this ip.

Could you guys help me figure out how to fix it? Am I wrong to create cluster with wrong service-cidr?

Check that FPGA mapping is working correctly within namespace bounds

In some of the tests I've noticed that pod that requested FPGA in namespace default was successfully admitted with mappings from kube-system namespace.
It should fail if mapping is not present in namespace of the pod.

Can’t run the demo on k8s node with two FPGAs

Hi.
I'm trying the FPGA plugin and I have 2 questions related to a multi-FPGA k8s node.
If any one here can give me some hint I will be very appreciate.

I have a DELL R740 server with two Intel PAC (FPGA Board), and I use this server as a k8s node.
BTW, I also have a DELL R640 with single Intel PAC and I do not have problem with this one.

Question1

I want to create two pod, each pod attached with one PAC.
But inside pod, I can see both two PAC with lspci command.
Is this behavior correct?

$ lspci  | grep accelerators
86:00.0 Processing accelerators: Intel Corporation Device 09c4
d8:00.0 Processing accelerators: Intel Corporation Device 09c4

Question2

When I run the nlb3 of ubuntu-demo-opae with FPGA Card,
the demo fails with Error: device enumeration failed. in af mode although the AFU id is correct.
And Error: couldn't open device. in both af and region mode.

The only way to run the demo I found is to change af mode to region mode and also specify the bus number like ./nbl3 -B 0x86.

Is it required to specify the bus number on a multi-FPGA node?
If so, how do I know the allocated FPGA bus number from the pod? Because I can not tell which is the one with the lspci command.

pod creation

I modified the yaml and created my pod as following

test-fpga-region2.yml (copy of test-fpga-region.yml)

-name: test-fpga-region
+name: test-fpga-region2
-command: ["sh", "/usr/bin/test_fpga.sh"]
+command: ["sleep", "3600"]
-fpga.intel.com/arria10-nlb3: 1 
+fpga.intel.com/arria10: 1　# To use region mode

nlb3 execution result

I executed my pod as following

## execute nlb3  (region mode)

$ kubectl exec -ti test-fpga-region2 /usr/bin/nlb3
[22][WARN][accelerator::open] Errors encountered while opening accelerator resource
Error: couldn't open device.
command terminated with exit code 102

## execute nlb3 with Bus number option (region mode)

$ kubectl exec -ti test-fpga-region2 /usr/bin/nlb3 -- -B 0x86


Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss   Eviction 'Clocks(@200 MHz)'   Rd_Bandwidth   Wr_Bandwidth
         1          1           0            0            0             0             0          0              155     0.083 GB/s     0.000 GB/s

VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
           1            1            0            0            0            0

## execute nlb3  (af mode)
ubectl exec -ti test-fpga-region /usr/bin/nlb3
[74][WARN][accelerator::open] Errors encountered while opening accelerator resource
Error: couldn't open device.
command terminated with exit code 102

## execute nlb3 with Bus number option (af mode)
ubectl exec -ti test-fpga-region /usr/bin/nlb3 -- -B 0x86
Error: device enumeration failed.
Please make sure that the driver is loaded and that a bitstream for
AFU id: F7DF405C-BD7A-CF72-22F1-44B0B93ACD18 is programmed.
command terminated with exit code 102

k8s status

$ kubectl get no
NAME     STATUS   ROLES    AGE   VERSION
r740     Ready    <none>   31d   v1.12.3
master   Ready    master   40d   v1.12.3

$ kubectl get po
NAME                                            READY   STATUS    RESTARTS   AGE
intel-fpga-webhook-deployment-98d745549-nf8v5   1/1     Running   0          17h

$ kubectl describe node r740 | grep fpga.intel.com
fpga.intel.com/device-plugin-mode: region
fpga.intel.com/region-9926ab6d6c925a68aabca7d84c545738:  2
fpga.intel.com/region-9926ab6d6c925a68aabca7d84c545738:  2

$ kubectl describe node r740 | grep hugepage
 hugepages-1Gi:                                       0
 hugepages-2Mi:                                       2Gi
 hugepages-1Gi:                                       0
 hugepages-2Mi:                                       2Gi
 
$ cat /sys/class/fpga/intel-fpga-dev.0/intel-fpga-port.0/afu_id
f7df405cbd7acf7222f144b0b93acd18

$ cat /sys/class/fpga/intel-fpga-dev.1/intel-fpga-port.1/afu_id
f7df405cbd7acf7222f144b0b93acd18

FPGA CRI hook: implement support for OpenCL bitstreams

Current implementation supports only OPAE bitstreams.
It would be great to have OpenCL bitstreams also supported.

OpenVINO project uses OpenCL bitstreams, so supporting OpenCL bitstreams would also help to support OpenVINO workloads.

VCA1(VCA1283LVV) support？

I installed the plug-in in the VCA card , but when I started gpu_plugins, I saw the following error through debug mode:

Is VCA1 supported?

Upgrade to OPAE 1.3.0

Evaluate and potentially upgrade the tools in init container to OPAE SDK 1.3.0 (or later)

Enable codecov.io in travis-ci

Main site: https://codecov.io/
example for Go integration: https://github.com/codecov/example-go

Dockerfile for openssl-qat-engine fails to build with debian:sid as builder

I am not sure what exactly is going on but the ca-certificates package refuses to install. Below at 1 is what the failure looks like using the Dockerfile as is. Below at 2 is when I modified the docker file to try to specify installing ca-certificates right after the apt-get update. This failed on 3 different Clear Linux machines with builds from December and from March. These machines are all behind a proxy. However, another clear linux machine that is behind a transparent proxy (Don't need to set proxy variables) works fine for building. If I change the Dockerfile so that the builder is from ubuntu:latest vs from debian:sid it gets past this ca-certificates error and nearly finishes except for some sed errors. This worked for me a couple days ago with debian:sid so I have no idea what might have changed and can't find any recent issues from others on installing this package on debian.

I have tried the following without success
a. RUN apt-get install --reinstall ca-certificates before installing ca-certificates.
b. RUN dpkg --purge --force-depends ca-certificates before install ca-certificates
c. dpkg --configure -a

Note: When I manually specify to install the ca-certificates I get a lot of errors about TERM not set but if I set "ENV DEBIAN_FRONTEND="noninteractive"" then those errors don't come up. The image still fails to build from debian though.

Setting up libalgorithm-merge-perl (0.08-3) ...
Processing triggers for libc-bin (2.28-7) ...
Errors were encountered while processing:
 ca-certificates
E: Sub-process /usr/bin/dpkg returned an error code (1)
The command '/bin/sh -c apt-get update &&     apt-get install -y git build-essential wget libssl-dev openssl libudev-dev pkg-config autoconf autogen libtool &&     git clone https://github.com/intel/QAT_Engine &&     git clone -b OpenSSL_1_1_1-stable https://github.com/openssl/openssl.git &&     wget https://01.org/sites/default/files/downloads/intelr-quickassist-technology/$QAT_DRIVER_RELEASE.tar.gz &&     tar zxf $QAT_DRIVER_RELEASE.tar.gz' returned a non-zero code: 100

Unpacking libssl1.1:amd64 (1.1.1b-1) ...
Selecting previously unselected package openssl.
Preparing to unpack .../openssl_1.1.1b-1_amd64.deb ...
Unpacking openssl (1.1.1b-1) ...
Selecting previously unselected package ca-certificates.
Preparing to unpack .../ca-certificates_20190110_all.deb ...
Unpacking ca-certificates (20190110) ...
Setting up libssl1.1:amd64 (1.1.1b-1) ...
Setting up openssl (1.1.1b-1) ...
Setting up ca-certificates (20190110) ...
Updating certificates in /etc/ssl/certs...
mv: cannot move '/tmp/ca-certificates.crt.tmp.sL7u4e' to a subdirectory of itself, 'ca-certificates.crt'
dpkg: error processing package ca-certificates (--configure):
 installed ca-certificates package post-installation script subprocess returned error exit status 1
Processing triggers for libc-bin (2.28-7) ...
Errors were encountered while processing:
 ca-certificates
E: Sub-process /usr/bin/dpkg returned an error code (1)
The command '/bin/sh -c apt-get install -y ca-certificates' returned a non-zero code: 100

Wrong FPGA device plugin af/region ID

I followed the doc cmd/fpga_plugin.

Run FPGA device plugin as administrator in af mode

# kubectl describe node node08 node09 node10 | grep fpga.intel.com
                    fpga.intel.com/device-plugin-mode: af
 fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18:      1
 fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee:  0
 fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18:      1
 fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee:  0
  fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18      0           0
  fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee  0           0
                    fpga.intel.com/device-plugin-mode: af
 fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18:      1
 fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee:  0
 fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18:      1
 fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee:  0
  fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18      0          0
  fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee  0          0
                    fpga.intel.com/device-plugin-mode: af
 fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18:      1
 fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee:  0
 fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18:      1
 fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee:  0
  fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18      0          0
  fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee  0          0

the af ID is af-f7df405cbd7acf7222f144b0b93acd18, BUT when I run the demo and replaced fpga.intel.com/arria10-nlb3 to fpga.intel.com/arria10-nlb0. I got

[root@node08 demo]# kubectl apply -f test-fpga-region-ypc.yml
pod/test-fpga-region-ypc2 created
[root@node08 demo]#
[root@node08 demo]# kubectl describe pod test-fpga-region-ypc2
Name:               test-fpga-region-ypc2
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"test-fpga-region-ypc2","namespace":"default"},"spec":{"containers":[{...
Status:             Pending
IP:
Containers:
  test-container:
    Image:      ubuntu-demo-opae:devel
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      /usr/bin/test_fpga.sh
    Limits:
      cpu:                                                 1
      fpga.intel.com/af-d8424dc4a4a3c413f89e433683f9040b:  1
      hugepages-2Mi:                                       20Mi
    Requests:
      cpu:                                                 1
      fpga.intel.com/af-d8424dc4a4a3c413f89e433683f9040b:  1
      hugepages-2Mi:                                       20Mi
    Environment:                                           <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9mrqb (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-9mrqb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9mrqb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  3s    default-scheduler  0/3 nodes are available: 3 Insufficient fpga.intel.com/af-d8424dc4a4a3c413f89e433683f9040b, 3 Insufficient hugepages-2Mi.

Such situation appeared when I use region mode too. My region id is fpga.intel.com/region-a9f2d0f3b39857b0b34fd226bf364fee. BUT when I run the demo, reported that it requests region-9926ab6d6c925a68aabca7d84c545738

So how to set right configuration in yaml.

spec:
  containers:
  - name: test-container
    ...
    resources:
      limits:
        fpga.intel.com/arria10-nlb3: 1  # here
        cpu: 1
        hugepages-2Mi: 20Mi

Thank you!

Update FPGA demo to use HugePages

Currently FPGA demo expects that hugepages are pre-allocated outside of kubernetes.
We need to update pod spec for using standard kubernetes feature for hugepages:
https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/

Document updating of /etc/crio/crio.conf

Some installation (e.g. package in ubuntu) has in default config:

hooks_dir = [
]

This means that no hooks will be discovered and enabled.
Updates to documentation is needed to make sure that /etc/containers/oci/hooks.d is listed there.

FPGA: support multiple FPGA devices

FPGA plugin, Admission Webhook and CRI hook currently support only one FPGA device. They should be extended to support multiple devices. This will help to support more complex FPGA workflows that use more than one accelerated function.

Evaluate usage of FPGA/GPU/QAT devices on nodes with SELinux enabled.

In order to support properly usage of our device plugins with OpenShift on top of CentOS/RHEL hosts, we need to validate that SELinux policies are not conflicting with any of the devices.

Any chance you of Intel RealSense device plugin?

We are currently building solution on top of Kubernetes and would like to plugin RealSesne cameras all the way into container. It would be really good to see support for USB devices like RealSense.

qat_plugin: increase code coverage with unit tests

Currently about 50% of qat_plugin is covered with unit tests. More tests are needed.

FPGA: Failure on creating container if host OS does not have some python modules

Failure on creating container if host OS does not have some python modules
9926ab6d6c925a68aabca7d84c545738/f7df405cbd7acf7222f144b0b93acd18: can't get bitstream info

/opt/intel/fpga-sw/opae/bin/packager 
jsonschema module has no validatiors() or exceptions()
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/intel/fpga-sw/opae/bin/packager/__main__.py", line 2, in <module>
  File "/opt/intel/fpga-sw/opae/bin/packager/packager.py", line 35, in <module>
  File "/opt/intel/fpga-sw/opae/bin/packager/afu.py", line 52, in <module>
ImportError: No module named jsonschema

non-root workloads

Implement (with documentation) cluster/node setup scripts and deployments that allows to run non-root workloads.

Enable TopologyInfo in pkg/deviceplugin

Allow passing NUMA information when using pkg/deviceplugin

https://github.com/kubernetes/kubernetes/blob/11678fb1c029ee1f3c7223e87d339b2d0016fdb8/pkg/kubelet/apis/deviceplugin/v1beta1/api.proto#L84-L101

Automate e2e testing for fpga_admissionwebhook

The proposed design is that the Makefile target e2e-test depending on intel-fpga-admissionwebhook

creates a cluster with kind,
copies the image intel-fpga-admissionwebhook:devel to the cluster's container,
import the image inside the cluster,
deploys the webhook,
creates a test pod like

apiVersion: v1
kind: Pod
metadata:
  name: test-pod-1
  annotations:
      key1: value1
spec:
  restartPolicy: Never
  containers:
    -
      name: test-pod-1-container-1
      image: ubuntu:bionic
      imagePullPolicy: IfNotPresent
      command: [ "ls", "-l", "/" ]
      resources:
        limits:
          fpga.intel.com/arria10-nlb0: 1

waits until the pod gets successfully mutated to

# kubectl get pods -o jsonpath="{..env}"
[map[name:FPGA_AFU_1 value:d8424dc4a4a3c413f89e433683f9040b] map[name:FPGA_REGION_1 value:9926ab6d6c925a68aabca7d84c545738]]

Otherwise times out.
Finally tears down the cluster.