
Comments (6)

bart0sh avatar bart0sh commented on August 23, 2024

But inside the pod, I can see both PACs with the lspci command. Is this behavior correct?

Yes, it's correct.
As far as I know, lspci output is not masked inside containers. It shows all PCI devices, not just FPGA accelerators.
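
Since lspci is not a reliable guide here, a more telling check is which FPGA device nodes are actually present in the container's /dev. The following is a minimal sketch of that check, assuming the node names follow the `intel-fpga-*` pattern used later in this thread; the function name `visible_fpga_nodes` is hypothetical.

```python
import glob
import os


def visible_fpga_nodes(dev_dir="/dev"):
    """Return the sorted intel-fpga-* entries present under dev_dir."""
    pattern = os.path.join(dev_dir, "intel-fpga-*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    # Run inside the container: only nodes passed into it should be listed.
    for name in visible_fpga_nodes():
        print(name)
```

This is the same information as `ls -la /dev/intel-fpga*`, without the permission and device-number columns.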

from intel-device-plugins-for-kubernetes.

bart0sh avatar bart0sh commented on August 23, 2024

@khrd

When I run the nlb3 of ubuntu-demo-opae with FPGA Card, the demo fails with Error: device enumeration failed.

Can you run these commands inside the container and show me the output, please?

lsmod
ls -la /dev/intel-fpga*


khrd avatar khrd commented on August 23, 2024

Thanks for the reply.
Here are the results (I tried both modes).

inside the container (af mode)

lsmod

# lsmod
Module                  Size  Used by
ipt_REJECT             12541  0
nf_reject_ipv4         13373  1 ipt_REJECT
veth                   13410  0
vxlan                  49241  0
ip6_udp_tunnel         12755  1 vxlan
udp_tunnel             14137  1 vxlan
xt_statistic           12601  2
xt_nat                 12681  7
xt_comment             12504  24
xt_mark                12563  5
ipt_MASQUERADE         12678  4
nf_nat_masquerade_ipv4    13412  1 ipt_MASQUERADE
nf_conntrack_netlink    40449  0
nfnetlink              14696  2 nf_conntrack_netlink
iptable_nat            12875  1
nf_conntrack_ipv4      15053  6
nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
nf_nat_ipv4            14115  1 iptable_nat
xt_addrtype            12676  3
iptable_filter         12810  1
xt_conntrack           12760  5
nf_nat                 26787  3 nf_nat_ipv4,xt_nat,nf_nat_masquerade_ipv4
nf_conntrack          133387  6 nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
br_netfilter           22209  0
bridge                136173  1 br_netfilter
stp                    12976  1 bridge
llc                    14552  2 stp,bridge
overlay                51863  10
rpcrdma                86152  0
sunrpc                348674  1 rpcrdma
ib_isert               50770  0
iscsi_target_mod      302966  1 ib_isert
ib_iser                47813  0
libiscsi               57233  1 ib_iser
scsi_transport_iscsi    99909  2 ib_iser,libiscsi
ib_srpt                48170  0
target_core_mod       367918  3 iscsi_target_mod,ib_srpt,ib_isert
ib_srp                 48454  0
scsi_transport_srp     20993  1 ib_srp
scsi_tgt               20027  1 scsi_transport_srp
ib_ipoib              110142  0
rdma_ucm               26841  0
ib_ucm                 22589  0
ib_uverbs              64636  2 ib_ucm,rdma_ucm
ib_umad                22080  0
rdma_cm                54426  4 rpcrdma,ib_iser,rdma_ucm,ib_isert
ib_cm                  47287  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
iw_cm                  46260  1 rdma_cm
mlx5_ib               171306  0
ib_core               211874  14 rdma_cm,ib_cm,iw_cm,rpcrdma,mlx5_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
iTCO_wdt               13480  0
iTCO_vendor_support    13718  1 iTCO_wdt
dcdbas                 14847  0
skx_edac               13258  0
edac_core              58151  1 skx_edac
intel_powerclamp       14419  0
vfat                   17461  1
coretemp               13444  0
fat                    65950  1 vfat
intel_rapl             19362  0
iosf_mbi               13523  1 intel_rapl
kvm_intel             170086  0
kvm                   566340  1 kvm_intel
irqbypass              13503  1 kvm
crc32_pclmul           13113  0
ghash_clmulni_intel    13259  0
aesni_intel            69884  0
lrw                    13286  1 aesni_intel
gf128mul               14951  1 lrw
glue_helper            13990  1 aesni_intel
ablk_helper            13597  1 aesni_intel
cryptd                 20359  3 ghash_clmulni_intel,aesni_intel,ablk_helper
ipmi_ssif              23829  0
pcspkr                 12718  0
sg                     40721  0
mei_me                 32712  0
mei                    96296  1 mei_me
shpchp                 37032  0
lpc_ich                21073  0
i2c_i801               22418  0
ipmi_si                53456  0
ipmi_devintf           17572  0
ipmi_msghandler        46608  3 ipmi_ssif,ipmi_devintf,ipmi_si
nfit                   49183  0
libnvdimm             132047  1 nfit
acpi_pad              116305  0
acpi_power_meter       18087  0
ip_tables              27115  2 iptable_filter,iptable_nat
xfs                   978100  3
libcrc32c              12644  3 xfs,nf_nat,nf_conntrack
sd_mod                 46322  4
sr_mod                 22416  0
crc_t10dif             12714  2 target_core_mod,sd_mod
cdrom                  42556  1 sr_mod
crct10dif_generic      12647  0
altera_asmip2          13168  0
spi_nor_mod            26321  1 altera_asmip2
mtd                    59531  3 altera_asmip2
intel_fpga_pac_hssi    18107  0
intel_fpga_fme         52380  0
intel_fpga_afu         31735  0
fpga_mgr_mod           14693  1 intel_fpga_fme
mgag200                41365  1
drm_kms_helper        159169  1 mgag200
syscopyarea            12529  1 drm_kms_helper
sysfillrect            12701  1 drm_kms_helper
sysimgblt              12640  1 drm_kms_helper
crct10dif_pclmul       14289  1
fb_sys_fops            12703  1 drm_kms_helper
crct10dif_common       12595  3 crct10dif_pclmul,crct10dif_generic,crc_t10dif
crc32c_intel           22079  1
ttm                    99345  1 mgag200
mlx5_core             389556  1 mlx5_ib
drm                   370825  4 ttm,drm_kms_helper,mgag200
igb                   205773  0
ahci                   34042  0
libahci                31992  1 ahci
devlink                30193  1 mlx5_core
libata                238896  2 ahci,libahci
intel_fpga_pci         26519  2 intel_fpga_afu,intel_fpga_fme
megaraid_sas          149357  3
ptp                    19231  2 igb,mlx5_core
dca                    15130  1 igb
pps_core               19057  1 ptp
i2c_algo_bit           13413  2 igb,mgag200
i2c_core               40756  7 drm,igb,i2c_i801,ipmi_ssif,drm_kms_helper,mgag200,i2c_algo_bit
dm_mirror              22124  0
dm_region_hash         20813  1 dm_mirror
dm_log                 18411  2 dm_region_hash,dm_mirror
dm_mod                123303  10 dm_log,dm_mirror

ls -la /dev/intel-fpga*

# ls -la /dev/intel-fpga*
crw-------. 1 root root 243, 0 Jan  9 00:57 /dev/intel-fpga-port.0

inside the container (region mode)

lsmod

$ lsmod
Module                  Size  Used by
ip6table_nat           12864  0
nf_conntrack_ipv6      18935  1
nf_defrag_ipv6         35104  1 nf_conntrack_ipv6
nf_nat_ipv6            14131  1 ip6table_nat
ip6_tables             26901  1 ip6table_nat
ipt_REJECT             12541  0
nf_reject_ipv4         13373  1 ipt_REJECT
veth                   13410  0
vxlan                  49241  0
ip6_udp_tunnel         12755  1 vxlan
udp_tunnel             14137  1 vxlan
xt_statistic           12601  2
xt_nat                 12681  7
xt_comment             12504  24
xt_mark                12563  5
ipt_MASQUERADE         12678  4
nf_nat_masquerade_ipv4    13412  1 ipt_MASQUERADE
nf_conntrack_netlink    40449  0
nfnetlink              14696  2 nf_conntrack_netlink
iptable_nat            12875  1
nf_conntrack_ipv4      15053  6
nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
nf_nat_ipv4            14115  1 iptable_nat
xt_addrtype            12676  3
iptable_filter         12810  1
xt_conntrack           12760  5
nf_nat                 26787  4 nf_nat_ipv4,nf_nat_ipv6,xt_nat,nf_nat_masquerade_ipv4
nf_conntrack          133387  8 nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4,nf_conntrack_ipv6
br_netfilter           22209  0
bridge                136173  1 br_netfilter
stp                    12976  1 bridge
llc                    14552  2 stp,bridge
overlay                51863  10
rpcrdma                86152  0
sunrpc                348674  1 rpcrdma
ib_isert               50770  0
iscsi_target_mod      302966  1 ib_isert
ib_iser                47813  0
libiscsi               57233  1 ib_iser
scsi_transport_iscsi    99909  2 ib_iser,libiscsi
ib_srpt                48170  0
target_core_mod       367918  3 iscsi_target_mod,ib_srpt,ib_isert
ib_srp                 48454  0
scsi_transport_srp     20993  1 ib_srp
scsi_tgt               20027  1 scsi_transport_srp
ib_ipoib              110142  0
rdma_ucm               26841  0
ib_ucm                 22589  0
ib_uverbs              64636  2 ib_ucm,rdma_ucm
ib_umad                22080  0
rdma_cm                54426  4 rpcrdma,ib_iser,rdma_ucm,ib_isert
ib_cm                  47287  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
iw_cm                  46260  1 rdma_cm
mlx5_ib               171306  0
ib_core               211874  14 rdma_cm,ib_cm,iw_cm,rpcrdma,mlx5_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
dcdbas                 14847  0
iTCO_wdt               13480  0
iTCO_vendor_support    13718  1 iTCO_wdt
vfat                   17461  1
fat                    65950  1 vfat
skx_edac               13258  0
edac_core              58151  1 skx_edac
intel_powerclamp       14419  0
ipmi_ssif              23829  0
coretemp               13444  0
intel_rapl             19362  0
iosf_mbi               13523  1 intel_rapl
kvm_intel             170086  0
kvm                   566340  1 kvm_intel
irqbypass              13503  1 kvm
crc32_pclmul           13113  0
ghash_clmulni_intel    13259  0
aesni_intel            69884  0
lrw                    13286  1 aesni_intel
gf128mul               14951  1 lrw
glue_helper            13990  1 aesni_intel
ablk_helper            13597  1 aesni_intel
cryptd                 20359  3 ghash_clmulni_intel,aesni_intel,ablk_helper
pcspkr                 12718  0
sg                     40721  0
mei_me                 32712  0
mei                    96296  1 mei_me
i2c_i801               22418  0
shpchp                 37032  0
lpc_ich                21073  0
ipmi_si                53456  0
ipmi_devintf           17572  0
ipmi_msghandler        46608  3 ipmi_ssif,ipmi_devintf,ipmi_si
nfit                   49183  0
libnvdimm             132047  1 nfit
acpi_power_meter       18087  0
acpi_pad              116305  0
ip_tables              27115  2 iptable_filter,iptable_nat
xfs                   978100  3
libcrc32c              12644  3 xfs,nf_nat,nf_conntrack
sd_mod                 46322  4
sr_mod                 22416  0
cdrom                  42556  1 sr_mod
crc_t10dif             12714  2 target_core_mod,sd_mod
crct10dif_generic      12647  0
altera_asmip2          13168  0
spi_nor_mod            26321  1 altera_asmip2
mtd                    59531  3 altera_asmip2
intel_fpga_pac_hssi    18107  0
intel_fpga_fme         52380  0
intel_fpga_afu         31735  0
fpga_mgr_mod           14693  1 intel_fpga_fme
crct10dif_pclmul       14289  1
crct10dif_common       12595  3 crct10dif_pclmul,crct10dif_generic,crc_t10dif
mgag200                41365  1
crc32c_intel           22079  1
drm_kms_helper        159169  1 mgag200
syscopyarea            12529  1 drm_kms_helper
mlx5_core             389556  1 mlx5_ib
sysfillrect            12701  1 drm_kms_helper
sysimgblt              12640  1 drm_kms_helper
fb_sys_fops            12703  1 drm_kms_helper
ttm                    99345  1 mgag200
igb                   205773  0
drm                   370825  4 ttm,drm_kms_helper,mgag200
ahci                   34042  0
libahci                31992  1 ahci
devlink                30193  1 mlx5_core
libata                238896  2 ahci,libahci
intel_fpga_pci         26519  2 intel_fpga_afu,intel_fpga_fme
megaraid_sas          149357  3
ptp                    19231  2 igb,mlx5_core
pps_core               19057  1 ptp
dca                    15130  1 igb
i2c_algo_bit           13413  2 igb,mgag200
i2c_core               40756  7 drm,igb,i2c_i801,ipmi_ssif,drm_kms_helper,mgag200,i2c_algo_bit
dm_mirror              22124  0
dm_region_hash         20813  1 dm_mirror
dm_log                 18411  2 dm_region_hash,dm_mirror
dm_mod                123303  10 dm_log,dm_mirror

ls -la /dev/intel-fpga*

# ls -la /dev/intel-fpga*
crw-------. 1 root root 244, 0 Jan  9 00:24 /dev/intel-fpga-port.a


bart0sh avatar bart0sh commented on August 23, 2024

@khrd thanks. The output looks OK except for one thing:

# ls -la /dev/intel-fpga*
crw-------. 1 root root 244, 0 Jan  9 00:24 /dev/intel-fpga-port.a

I have never seen this before. Usually port devices are named /dev/intel-fpga-port.N, where N is a number. Are you sure it's /dev/intel-fpga-port.a?
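
The naming convention described here can be checked mechanically. This is a minimal sketch, assuming only the `/dev/intel-fpga-port.N` pattern stated above; the helper name `port_index` is hypothetical.

```python
import re

# Port device nodes are expected to end in a decimal index, e.g. ".0".
PORT_RE = re.compile(r"/dev/intel-fpga-port\.(\d+)")


def port_index(path):
    """Return the numeric index of a port device path, or None if it does not match."""
    match = PORT_RE.fullmatch(path)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for path in ("/dev/intel-fpga-port.0", "/dev/intel-fpga-port.a"):
        print(path, port_index(path))
```

A `None` result flags a node name, such as the one ending in `.a`, that does not follow the usual pattern.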

Anyway, I'm not able to reproduce this issue on my setup. I have the same setup: two Arria 10 DCP1.1 cards in a Dell R720 machine, flashed with the nlb3 green bitstream:

> cat /sys/class/fpga/intel-fpga-dev.0/intel-fpga-fme.0/pr/interface_id 
9926ab6d6c925a68aabca7d84c545738
> cat /sys/class/fpga/intel-fpga-dev.1/intel-fpga-fme.1/pr/interface_id 
9926ab6d6c925a68aabca7d84c545738
> cat /sys/class/fpga/intel-fpga-dev.0/intel-fpga-port.0/afu_id
f7df405cbd7acf7222f144b0b93acd18
> cat /sys/class/fpga/intel-fpga-dev.1/intel-fpga-port.1/afu_id
f7df405cbd7acf7222f144b0b93acd18
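
For setups with more than a couple of cards, the same IDs can be collected in one pass. This is a minimal sketch, assuming the sysfs layout shown in the commands above; the function name `read_fpga_ids` and the `sysfs_root` parameter are hypothetical.

```python
import glob
import os


def read_fpga_ids(sysfs_root="/sys/class/fpga"):
    """Map each interface_id and afu_id file under sysfs_root to its contents."""
    patterns = [
        "intel-fpga-dev.*/intel-fpga-fme.*/pr/interface_id",
        "intel-fpga-dev.*/intel-fpga-port.*/afu_id",
    ]
    ids = {}
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(sysfs_root, pattern))):
            with open(path) as handle:
                # Key by the path relative to sysfs_root to keep output short.
                ids[os.path.relpath(path, sysfs_root)] = handle.read().strip()
    return ids


if __name__ == "__main__":
    for name, value in read_fpga_ids().items():
        print(name, value)
```

Comparing the printed values across cards shows at a glance whether all of them carry the same region interface ID and AFU ID.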

The plugin is running in region mode:

> kubectl get pod -n kube-system
NAME                                             READY   STATUS    RESTARTS   AGE
coredns-86c58d9df4-g7hpn                         1/1     Running   4          24h
coredns-86c58d9df4-r42sp                         1/1     Running   4          24h
etcd-r720-1                                      1/1     Running   4          24h
intel-fpga-plugin-lncxv                          1/1     Running   2          20h
intel-fpga-webhook-deployment-58845f7bfc-sq22p   1/1     Running   1          20h
kube-apiserver-r720-1                            1/1     Running   1          20h
kube-controller-manager-r720-1                   1/1     Running   16         24h
kube-proxy-lzp2m                                 1/1     Running   4          24h
kube-scheduler-r720-1                            1/1     Running   15         24h
weave-net-rjxcb                                  2/2     Running   7          24h

> kubectl logs intel-fpga-plugin-lncxv -n kube-system
Overriding mode to  region
FPGA device plugin started in  region  mode
Start server for region-9926ab6d6c925a68aabca7d84c545738 at: /var/lib/kubelet/device-plugins/fpga.intel.com-region-9926ab6d6c925a68aabca7d84c545738.sock
Device plugin for region-9926ab6d6c925a68aabca7d84c545738 registered
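
The region ID in the log should match the interface_id read from sysfs. This is a minimal sketch of extracting it from the plugin log, assuming the "Device plugin for region-... registered" line format shown above; the function name `registered_regions` is hypothetical, and other plugin versions may log differently.

```python
import re

# Matches the registration line printed by the plugin in region mode.
REGISTERED_RE = re.compile(r"Device plugin for region-([0-9a-f]+) registered")


def registered_regions(log_text):
    """Return the region interface IDs that the log reports as registered."""
    return REGISTERED_RE.findall(log_text)


if __name__ == "__main__":
    import sys

    # Example: kubectl logs <plugin-pod> -n kube-system | python this_script.py
    for region_id in registered_regions(sys.stdin.read()):
        print(region_id)
```

An empty result would suggest that the plugin did not register any region resource, which is worth ruling out before debugging the pod itself.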

Here is my modified spec. Please note that it differs from yours: you removed the AFU ID from the resource for some reason, and I didn't.

> cat test-fpga-region2.yml 
apiVersion: v1
kind: Pod
metadata:
  name: test-fpga-region2
spec:
  containers:
  - name: test-container
    image: ubuntu-demo-opae:devel
    imagePullPolicy: IfNotPresent
    command: ["sleep", "1000000"]
    securityContext:
      capabilities:
        add:
          [IPC_LOCK]
    resources:
      limits:
        fpga.intel.com/arria10-nlb3: 1
        cpu: 1
        hugepages-2Mi: 20Mi

  restartPolicy: Never

Here is how I started the pod:

> kubectl create -f test-fpga-region2.yml 
pod/test-fpga-region2 created

> kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
test-fpga-region2   1/1     Running   0          13m

Here is how I run nlb3:

> kubectl exec test-fpga-region2 /usr/bin/nlb3

Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss   Eviction 'Clocks(@200 MHz)'   Rd_Bandwidth   Wr_Bandwidth
         1          1           0            0            0             0             0          0              140     0.091 GB/s     0.000 GB/s

VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count 
           1            1            0            0            0            0 

As you can see, everything works as expected, which is good. We just need to figure out the difference between your setup and mine.

My current idea is that you installed the latest version (1.2) of the kernel drivers. Our plugins have been tested with OPAE 1.1.0 from here: https://github.com/OPAE/opae-sdk/releases/tag/1.1.0-2

I'd suggest you do the following:

  • Forget about af mode for now. The demo uses region mode, and I have a working setup in region mode. Let's concentrate on this until it works for you.
  • Show the interface IDs for both of your cards here: cat /sys/class/fpga/intel-fpga-dev.0/intel-fpga-fme.0/pr/interface_id && cat /sys/class/fpga/intel-fpga-dev.1/intel-fpga-fme.1/pr/interface_id
  • Use my test-fpga-region2.yml, as yours is not correct.
  • Make sure you have OPAE kernel drivers v1.1.0-2 installed and loaded. Reinstall them if needed, and power cycle your server just to be sure the hardware is set up correctly.
  • Try to repeat what I did.
  • Report here whether it works for you or not. Please do so in either case.

Hope it helps,
Ed


khrd avatar khrd commented on August 23, 2024

I appreciate your help with this.

Actually, OPAE kernel driver v1.0.2-2 was installed,
so I updated to v1.1.0-2.
However, I found that one of the FPGAs has a problem (it failed diagnostics with NLB_3 on the host machine).
It might take a long time to resolve this problem.
I'll try asking vendor support.

Anyway, thank you for your support!


kad avatar kad commented on August 23, 2024

This should have been fixed earlier by #179, and is now updated by #199.
