Comments (8)

mythi commented on August 23, 2024

After investigating a bit, we root caused it to the by-path feature added to gpu plugin. We have verified by reverting this change on an image and it works for us.

Is this the same issue we discussed with Dan earlier? The problem was the by-path symlink labeling.

from intel-device-plugins-for-kubernetes.

mregmi commented on August 23, 2024

@mythi Yes, it's the same issue.

mythi commented on August 23, 2024

My takeaway was that a fix was provided. On our side, we need to keep by-path, so the fix likely needs more SELinux changes?

mregmi commented on August 23, 2024

Yeah, it's a little strange that it led to an SELinux issue. It seems that in that change all nodes are added properly. It may need some more debugging to find out exactly why it's causing that.

mregmi commented on August 23, 2024

One thing that stands out to me is that the files and folders in by-path are labelled differently, and the owners/groups are also different. Maybe that's the issue? I will check what exactly sets the owner and label.

sh-5.1# ls -lZ /dev/dri
total 0
drwxr-xr-x. 2 root root   system_u:object_r:device_t:s0          140 Oct 31 04:50 by-path
crw-rw----. 1 root video  system_u:object_r:dri_device_t:s0 226,   0 Oct 31 03:51 card0
crw-rw----. 1 root video  system_u:object_r:dri_device_t:s0 226,   1 Oct 31 04:50 card1
crw-rw----. 1 root video  system_u:object_r:dri_device_t:s0 226,   2 Oct 31 04:50 card2
crw-rw-rw-. 1 root render system_u:object_r:dri_device_t:s0 226, 128 Oct 31 04:50 renderD128
crw-rw-rw-. 1 root render system_u:object_r:dri_device_t:s0 226, 129 Oct 31 04:50 renderD129

sh-5.1# ls -lZ /dev/dri/by-path/
total 0
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0  8 Oct 31 03:51 pci-0000:02:00.0-card -> ../card0
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0  8 Oct 31 04:50 pci-0000:37:00.0-card -> ../card1
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0 13 Oct 31 04:50 pci-0000:37:00.0-render -> ../renderD128
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0  8 Oct 31 04:50 pci-0000:3c:00.0-card -> ../card2
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0 13 Oct 31 04:50 pci-0000:3c:00.0-render -> ../renderD129

mregmi commented on August 23, 2024

I think the issue is that the labels inside the container are incorrect: the entries under by-path are labelled as dri_device_t instead of container_file_t.
Not sure why the label under by-path is wrong.

Workload Pod

sh-4.4$ ls -laZ /dev/dri   
total 0
drwxr-xr-x. 3 root root  system_u:object_r:container_file_t:s0:c27,c28      140 Nov  8 22:42 .
drwxr-xr-x. 6 root root  system_u:object_r:container_file_t:s0:c27,c28      380 Nov  8 22:42 ..
drwxr-xr-x. 2 root root  system_u:object_r:container_file_t:s0:c27,c28      120 Nov  8 22:42 by-path
crw-rw-rw-. 1 root video system_u:object_r:container_file_t:s0:c27,c28 226,   1 Nov  8 22:42 card1
crw-rw-rw-. 1 root video system_u:object_r:container_file_t:s0:c27,c28 226,   2 Nov  8 22:42 card2
crw-rw-rw-. 1 root   797 system_u:object_r:container_file_t:s0:c27,c28 226, 128 Nov  8 22:42 renderD128
crw-rw-rw-. 1 root   797 system_u:object_r:container_file_t:s0:c27,c28 226, 129 Nov  8 22:42 renderD129
sh-4.4$ ls -laZ /dev/dri/by-path/
total 0
drwxr-xr-x. 2 root root  system_u:object_r:container_file_t:s0:c27,c28      120 Nov  8 22:42 .
drwxr-xr-x. 3 root root  system_u:object_r:container_file_t:s0:c27,c28      140 Nov  8 22:42 ..
crw-rw----. 1 root video system_u:object_r:dri_device_t:s0             226,   1 Oct 31 04:50 pci-0000:37:00.0-card
crw-rw-rw-. 1 root   797 system_u:object_r:dri_device_t:s0             226, 128 Oct 31 04:50 pci-0000:37:00.0-render
crw-rw----. 1 root video system_u:object_r:dri_device_t:s0             226,   2 Oct 31 04:50 pci-0000:3c:00.0-card
crw-rw-rw-. 1 root   797 system_u:object_r:dri_device_t:s0             226, 129 Oct 31 04:50 pci-0000:3c:00.0-render
sh-4.4$
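
The comparison above can also be scripted. The following is only a minimal sketch, not part of the plugin or of container-selinux: it assumes the text of an `ls -lZ` listing is available as a string, and the helper name `mislabeled` is hypothetical. It reports every entry whose SELinux type differs from the one expected inside the container.

```python
import re

# Matches an SELinux context such as system_u:object_r:dri_device_t:s0
# and captures user, role and type (the type is the third group).
CONTEXT_RE = re.compile(r"(\w+):(\w+):(\w+):s\d+")


def mislabeled(ls_output, expected_type="container_file_t"):
    """Return (name, selinux_type) for entries whose type is not expected_type.

    Hypothetical helper: it only parses `ls -lZ` text, it does not query SELinux.
    """
    found = []
    for line in ls_output.splitlines():
        match = CONTEXT_RE.search(line)
        if match is None:
            continue  # "total 0", prompt lines, blank lines
        fields = line.split()
        if "->" in fields:
            # Symlink entry: the name is the field just before the arrow.
            name = fields[fields.index("->") - 1]
        else:
            name = fields[-1]
        selinux_type = match.group(3)
        if selinux_type != expected_type:
            found.append((name, selinux_type))
    return found


if __name__ == "__main__":
    listing = (
        "crw-rw----. 1 root video system_u:object_r:dri_device_t:s0 "
        "226, 1 Oct 31 04:50 pci-0000:37:00.0-card"
    )
    for name, selinux_type in mislabeled(listing):
        print(f"{name}: {selinux_type}")
```

Run against the pod listing above, this would flag the four entries under by-path, since they carry dri_device_t while everything else in the pod is container_file_t.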

eero-t commented on August 23, 2024

I checked the host by-path dir contents with an upstream 6.6 kernel under Ubuntu, 6.5 under Fedora, and 5.14+DKMS under SLES.

On all of those hosts, the by-path content indeed differs slightly from the Pod contents. All files are:

  • symlinks to parent dir files
  • owned by root:root

On Fedora it looks like this:

$ ls -laZ /dev/dri/by-path/
total 0
drwxr-xr-x. 2 root root system_u:object_r:device_t:s0  80  9.11. 10:44 .
drwxr-xr-x. 3 root root system_u:object_r:device_t:s0 100  9.11. 10:44 ..
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0   8  9.11. 10:44 pci-0000:00:02.0-card -> ../card1
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0  13  9.11. 10:44 pci-0000:00:02.0-render -> ../renderD128

Besides the SELinux labeling issue, the pod files not being symlinks to the matching device files might cause issues for some workloads?
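
A workload that depends on this could check it at startup. The sketch below is an illustration only, assuming a Linux system; the function name `describe_by_path` is hypothetical. It uses `os.lstat` so that symlinks are not followed, and reports for each by-path entry whether it is a symlink (as on the hosts) or a character device (as seen in the pod).

```python
import os
import stat


def describe_by_path(directory="/dev/dri/by-path"):
    """Yield (name, kind, target) for each entry in the directory.

    kind is "symlink", "char-device" or "other"; target is the raw
    symlink target for symlinks and None otherwise.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        mode = os.lstat(path).st_mode  # lstat: inspect the entry itself
        if stat.S_ISLNK(mode):
            yield name, "symlink", os.readlink(path)
        elif stat.S_ISCHR(mode):
            yield name, "char-device", None
        else:
            yield name, "other", None


if __name__ == "__main__":
    if os.path.isdir("/dev/dri/by-path"):
        for name, kind, target in describe_by_path():
            suffix = f" -> {target}" if target else ""
            print(f"{kind:12} {name}{suffix}")
```

On a host like the Fedora one above, every entry would be expected to come back as a symlink pointing at ../cardN or ../renderDN, whereas in the pod listing the same names are device nodes.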

mregmi commented on August 23, 2024

This is fixed via a change in container-selinux.
