Comments (14)
Hey Hao,
Thanks for reporting the issue.
Could you provide the following information:
- Step-by-step instructions and the command line you use to run the training?
- Detailed logs with -x FI_LOG_LEVEL=warn -x FI_LOG_PROV=efa?
- Output of fi_info -p efa -t FI_EP_RDM and lspci -i efa?
- EFA installer version? The reason is that libfabric only has versions up to v1.15.1.
from aws-ofi-nccl.
Hi Rashika,
I am working with Hao to troubleshoot this. Steps to reproduce the problem:
- Install EFA using https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html on 2 instances of p4dn.24xlarge with kernel version 5.13.0-1023-aws and using CUDA 11.6
- Download the distributed training code from https://github.com/pytorch/examples/tree/main/distributed/ddp
- Command on node1
FI_PROVIDER="efa" FI_EFA_USE_DEVICE_RDMA=1 NCCL_DEBUG=INFO FI_LOG_LEVEL=warn FI_LOG_PROV=efa LD_LIBRARY_PATH=/opt/nccl/build/lib:/usr/local/cuda/lib64:/opt/amazon/efa/lib:/opt/aws-ofi-nccl/lib:$LD_LIBRARY_PATH \
python <path_to_launch.py> \
--nnode=2 --node_rank=0 --nproc_per_node=8 --master_addr="10.216.179.193" --master_port=35000 \
<path_to_example.py> --local_world_size=8
- Command on node2
FI_PROVIDER="efa" FI_EFA_USE_DEVICE_RDMA=1 NCCL_DEBUG=INFO FI_LOG_LEVEL=warn FI_LOG_PROV=efa LD_LIBRARY_PATH=/opt/nccl/build/lib:/usr/local/cuda/lib64:/opt/amazon/efa/lib:/opt/aws-ofi-nccl/lib:$LD_LIBRARY_PATH \
python <path_to_launch.py> \
--nnode=2 --node_rank=1 --nproc_per_node=8 --master_addr="10.216.179.193" --master_port=35000 \
<path_to_example.py> --local_world_size=8
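The two commands above differ only in --node_rank. A minimal sketch of generating them from one template follows. The script names launch.py and example.py are hypothetical stand-ins for the elided paths, the flag spellings are copied from the commands above, and LD_LIBRARY_PATH is left out because it relies on the shell expanding $LD_LIBRARY_PATH.

```python
import shlex

# Environment shared by both nodes, copied from the commands above.
# LD_LIBRARY_PATH is intentionally omitted (it needs shell expansion).
COMMON_ENV = {
    "FI_PROVIDER": "efa",
    "FI_EFA_USE_DEVICE_RDMA": "1",
    "NCCL_DEBUG": "INFO",
    "FI_LOG_LEVEL": "warn",
    "FI_LOG_PROV": "efa",
}


def build_launch_command(node_rank, launch_script, example_script,
                         master_addr="10.216.179.193", master_port=35000,
                         nnodes=2, nproc_per_node=8):
    """Return the argv list for one node; only node_rank changes per node."""
    return [
        "python", launch_script,
        f"--nnode={nnodes}",  # spelling as used in the commands above
        f"--node_rank={node_rank}",
        f"--nproc_per_node={nproc_per_node}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        example_script,
        f"--local_world_size={nproc_per_node}",
    ]


def as_shell_line(env, argv):
    """Render env assignments plus argv as a single shell command line."""
    assignments = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return assignments + " " + " ".join(shlex.quote(a) for a in argv)


if __name__ == "__main__":
    # launch.py and example.py are hypothetical placeholder paths.
    for rank in (0, 1):
        print(as_shell_line(COMMON_ENV, build_launch_command(rank, "launch.py", "example.py")))
```

Generating both lines from the same function makes it harder for the two nodes to drift apart in anything other than the rank.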
Output is
ip-10-216-179-87:718885:719039 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
libfabric:718885:1658904522::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
libfabric:718885:1658904522::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718885:719039 [0] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-87:718885:719039 [0] NCCL INFO Using network AWS Libfabric
ip-10-216-179-87:718890:718890 [5] NCCL INFO cudaDriverVersion 11060
ip-10-216-179-87:718892:718892 [7] NCCL INFO cudaDriverVersion 11060
ip-10-216-179-87:718891:718891 [6] NCCL INFO cudaDriverVersion 11060
ip-10-216-179-87:718889:718889 [4] NCCL INFO cudaDriverVersion 11060
ip-10-216-179-87:718887:718887 [2] NCCL INFO cudaDriverVersion 11060
ip-10-216-179-87:718890:718890 [5] NCCL INFO Bootstrap : Using ens32:10.216.179.87<0>
ip-10-216-179-87:718890:718890 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
ip-10-216-179-87:718890:718890 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
ip-10-216-179-87:718890:719040 [5] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
ip-10-216-179-87:718890:719040 [5] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /opt/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-179-87:718890:719040 [5] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
libfabric:718890:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
libfabric:718890:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718890:719040 [5] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-87:718890:719040 [5] NCCL INFO Using network AWS Libfabric
ip-10-216-179-87:718892:718892 [7] NCCL INFO Bootstrap : Using ens32:10.216.179.87<0>
ip-10-216-179-87:718892:718892 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
ip-10-216-179-87:718892:718892 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
ip-10-216-179-87:718892:719041 [7] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
ip-10-216-179-87:718892:719041 [7] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /opt/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-179-87:718892:719041 [7] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
libfabric:718892:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
libfabric:718892:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718892:719041 [7] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-87:718892:719041 [7] NCCL INFO Using network AWS Libfabric
ip-10-216-179-87:718891:718891 [6] NCCL INFO Bootstrap : Using ens32:10.216.179.87<0>
ip-10-216-179-87:718891:718891 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
ip-10-216-179-87:718891:718891 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
ip-10-216-179-87:718891:719042 [6] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
ip-10-216-179-87:718891:719042 [6] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /opt/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-179-87:718891:719042 [6] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
libfabric:718891:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718886:718886 [1] NCCL INFO cudaDriverVersion 11060
libfabric:718891:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718891:719042 [6] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-87:718891:719042 [6] NCCL INFO Using network AWS Libfabric
ip-10-216-179-87:718889:718889 [4] NCCL INFO Bootstrap : Using ens32:10.216.179.87<0>
ip-10-216-179-87:718889:718889 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
ip-10-216-179-87:718889:718889 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
ip-10-216-179-87:718889:719043 [4] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
ip-10-216-179-87:718889:719043 [4] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /opt/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-179-87:718889:719043 [4] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
libfabric:718889:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718887:718887 [2] NCCL INFO Bootstrap : Using ens32:10.216.179.87<0>
ip-10-216-179-87:718887:718887 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
ip-10-216-179-87:718887:718887 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
ip-10-216-179-87:718887:719044 [2] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
ip-10-216-179-87:718887:719044 [2] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /opt/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-179-87:718887:719044 [2] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
libfabric:718887:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
libfabric:718889:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718889:719043 [4] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-87:718889:719043 [4] NCCL INFO Using network AWS Libfabric
libfabric:718887:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718887:719044 [2] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-87:718887:719044 [2] NCCL INFO Using network AWS Libfabric
ip-10-216-179-87:718886:718886 [1] NCCL INFO Bootstrap : Using ens32:10.216.179.87<0>
ip-10-216-179-87:718886:718886 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
ip-10-216-179-87:718886:718886 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
ip-10-216-179-87:718886:719045 [1] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
ip-10-216-179-87:718886:719045 [1] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /opt/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-179-87:718886:719045 [1] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
libfabric:718886:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718888:718888 [3] NCCL INFO cudaDriverVersion 11060
libfabric:718886:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718886:719045 [1] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-87:718886:719045 [1] NCCL INFO Using network AWS Libfabric
ip-10-216-179-87:718888:718888 [3] NCCL INFO Bootstrap : Using ens32:10.216.179.87<0>
ip-10-216-179-87:718888:718888 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
ip-10-216-179-87:718888:718888 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
ip-10-216-179-87:718888:719046 [3] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
ip-10-216-179-87:718888:719046 [3] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /opt/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-179-87:718888:719046 [3] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
libfabric:718888:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
libfabric:718888:1658904523::core:core:cuda_gdrcopy_hmem_init():191<warn> gdrcopy_dl_hmem_init failed!
ip-10-216-179-87:718888:719046 [3] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-87:718888:719046 [3] NCCL INFO Using network AWS Libfabric
ip-10-216-179-87:718887:719044 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00ffffff
ip-10-216-179-87:718888:719046 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffff0000,00ffffff
ip-10-216-179-87:718889:719043 [4] NCCL INFO Setting affinity for GPU 4 to ffffff00,0000ffff,ff000000
ip-10-216-179-87:718891:719042 [6] NCCL INFO Setting affinity for GPU 6 to ffffff00,0000ffff,ff000000
ip-10-216-179-87:718890:719040 [5] NCCL INFO Setting affinity for GPU 5 to ffffff00,0000ffff,ff000000
ip-10-216-179-87:718886:719045 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffff0000,00ffffff
ip-10-216-179-87:718892:719041 [7] NCCL INFO Setting affinity for GPU 7 to ffffff00,0000ffff,ff000000
ip-10-216-179-87:718885:719039 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffff0000,00ffffff
ip-10-216-179-87:718891:719042 [6] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 15/-1/-1->14->6 [3] 15/-1/-1->14->13 [4] 15/-1/-1->14->13 [5] 15/6/-1->14->-1
ip-10-216-179-87:718890:719040 [5] NCCL INFO Trees [0] 14/-1/-1->13->12 [1] 14/-1/-1->13->12 [2] -1/-1/-1->13->12 [3] 14/-1/-1->13->12 [4] 14/-1/-1->13->12 [5] -1/-1/-1->13->12
ip-10-216-179-87:718892:719041 [7] NCCL INFO Trees [0] 8/-1/-1->15->14 [1] 8/-1/-1->15->14 [2] 8/-1/-1->15->14 [3] 8/-1/-1->15->14 [4] 8/-1/-1->15->14 [5] 8/-1/-1->15->14
ip-10-216-179-87:718886:719045 [1] NCCL INFO Trees [0] -1/-1/-1->9->8 [1] 10/-1/-1->9->8 [2] 10/-1/-1->9->8 [3] -1/-1/-1->9->8 [4] 10/-1/-1->9->8 [5] 10/-1/-1->9->8
ip-10-216-179-87:718887:719044 [2] NCCL INFO Trees [0] 11/-1/-1->10->2 [1] 11/-1/-1->10->9 [2] 11/-1/-1->10->9 [3] 11/2/-1->10->-1 [4] 11/-1/-1->10->9 [5] 11/-1/-1->10->9
ip-10-216-179-87:718888:719046 [3] NCCL INFO Trees [0] 12/-1/-1->11->10 [1] -1/-1/-1->11->10 [2] 12/-1/-1->11->10 [3] 12/-1/-1->11->10 [4] -1/-1/-1->11->10 [5] 12/-1/-1->11->10
ip-10-216-179-87:718889:719043 [4] NCCL INFO Trees [0] 13/-1/-1->12->11 [1] 13/-1/-1->12->4 [2] 13/-1/-1->12->11 [3] 13/-1/-1->12->11 [4] 13/4/-1->12->-1 [5] 13/-1/-1->12->11
ip-10-216-179-87:718885:719039 [0] NCCL INFO Trees [0] 9/-1/-1->8->15 [1] 9/-1/-1->8->15 [2] 9/-1/-1->8->15 [3] 9/-1/-1->8->15 [4] 9/-1/-1->8->15 [5] 9/-1/-1->8->15
ip-10-216-179-87:718887:719044 [2] NCCL INFO Channel 00 : 10[201c0] -> 15[a01d0] via P2P/IPC/read
ip-10-216-179-87:718889:719043 [4] NCCL INFO Channel 02 : 12[901c0] -> 15[a01d0] via P2P/IPC/read
ip-10-216-179-87:718885:719039 [0] NCCL INFO Channel 00 : 8[101c0] -> 11[201d0] via P2P/IPC/read
ip-10-216-179-87:718887:719044 [2] NCCL INFO Channel 03 : 10[201c0] -> 15[a01d0] via P2P/IPC/read
ip-10-216-179-87:718889:719043 [4] NCCL INFO Channel 05 : 12[901c0] -> 15[a01d0] via P2P/IPC/read
ip-10-216-179-87:718885:719039 [0] NCCL INFO Channel 03 : 8[101c0] -> 11[201d0] via P2P/IPC/read
ip-10-216-179-87:718885:719039 [0] NCCL INFO Channel 02 : 8[101c0] -> 13[901d0] via P2P/IPC/read
ip-10-216-179-87:718885:719039 [0] NCCL INFO Channel 05 : 8[101c0] -> 13[901d0] via P2P/IPC/read
ip-10-216-179-87:718888:719046 [3] NCCL INFO Channel 00/0 : 11[201d0] -> 2[201c0] [send] via NET/AWS Libfabric/0/GDRDMA
ip-10-216-179-87:718888:719046 [3] NCCL INFO Channel 03/0 : 11[201d0] -> 2[201c0] [send] via NET/AWS Libfabric/0/GDRDMA
ip-10-216-179-87:718885:719039 [0] NCCL INFO Channel 01 : 8[101c0] -> 15[a01d0] via P2P/IPC/read
ip-10-216-179-87:718885:719039 [0] NCCL INFO Channel 04 : 8[101c0] -> 15[a01d0] via P2P/IPC/read
ip-10-216-179-87:718890:719040 [5] NCCL INFO Channel 01/0 : 13[901d0] -> 4[901c0] [send] via NET/AWS Libfabric/1/GDRDMA
ip-10-216-179-87:718890:719040 [5] NCCL INFO Channel 04/0 : 13[901d0] -> 4[901c0] [send] via NET/AWS Libfabric/1/GDRDMA
ip-10-216-179-87:718892:719041 [7] NCCL INFO Channel 02/0 : 15[a01d0] -> 6[a01c0] [send] via NET/AWS Libfabric/2/GDRDMA
ip-10-216-179-87:718892:719041 [7] NCCL INFO Channel 05/0 : 15[a01d0] -> 6[a01c0] [send] via NET/AWS Libfabric/2/GDRDMA
ip-10-216-179-87:718891:719042 [6] NCCL INFO Channel 02/0 : 7[a01d0] -> 14[a01c0] [receive] via NET/AWS Libfabric/2/GDRDMA
ip-10-216-179-87:718891:719042 [6] NCCL INFO Channel 05/0 : 7[a01d0] -> 14[a01c0] [receive] via NET/AWS Libfabric/2/GDRDMA
ip-10-216-179-87:718889:719043 [4] NCCL INFO Channel 01/0 : 5[901d0] -> 12[901c0] [receive] via NET/AWS Libfabric/1/GDRDMA
ip-10-216-179-87:718889:719043 [4] NCCL INFO Channel 04/0 : 5[901d0] -> 12[901c0] [receive] via NET/AWS Libfabric/1/GDRDMA
ip-10-216-179-87:718887:719044 [2] NCCL INFO Channel 00/0 : 3[201d0] -> 10[201c0] [receive] via NET/AWS Libfabric/0/GDRDMA
ip-10-216-179-87:718887:719044 [2] NCCL INFO Channel 03/0 : 3[201d0] -> 10[201c0] [receive] via NET/AWS Libfabric/0/GDRDMA
libfabric:718891:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718891:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718891:1658904526::efa:cq:rxr_ep_grow_rx_pkt_pools():1585<warn> cannot allocate memory for EFA's RX packet pool. error: Cannot allocate memory
libfabric:718891:1658904526::efa:eq:efa_eq_write_error():662<warn> Writing error Cannot allocate memory to EQ.
libfabric:718891:1658904526::efa:eq:efa_eq_write_error():676<warn> Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12)
Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12) ./prov/efa/src/rxr/rxr.h:683
libfabric:718889:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718889:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718889:1658904526::efa:cq:rxr_ep_grow_rx_pkt_pools():1585<warn> cannot allocate memory for EFA's RX packet pool. error: Cannot allocate memory
libfabric:718889:1658904526::efa:eq:efa_eq_write_error():662<warn> Writing error Cannot allocate memory to EQ.
libfabric:718889:1658904526::efa:eq:efa_eq_write_error():676<warn> Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12)
Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12) ./prov/efa/src/rxr/rxr.h:683
libfabric:718887:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718887:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718887:1658904526::efa:cq:rxr_ep_grow_rx_pkt_pools():1585<warn> cannot allocate memory for EFA's RX packet pool. error: Cannot allocate memory
libfabric:718887:1658904526::efa:eq:efa_eq_write_error():662<warn> Writing error Cannot allocate memory to EQ.
libfabric:718887:1658904526::efa:eq:efa_eq_write_error():676<warn> Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12)
Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12) ./prov/efa/src/rxr/rxr.h:683
libfabric:718892:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718892:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718890:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718890:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718888:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718888:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718892:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718892:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718892:1658904526::efa:cq:rxr_ep_grow_rx_pkt_pools():1585<warn> cannot allocate memory for EFA's RX packet pool. error: Cannot allocate memory
libfabric:718892:1658904526::efa:eq:efa_eq_write_error():662<warn> Writing error Cannot allocate memory to EQ.
libfabric:718892:1658904526::efa:eq:efa_eq_write_error():676<warn> Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12)
Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12) ./prov/efa/src/rxr/rxr.h:683
libfabric:718890:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718890:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718890:1658904526::efa:cq:rxr_ep_grow_rx_pkt_pools():1585<warn> cannot allocate memory for EFA's RX packet pool. error: Cannot allocate memory
libfabric:718890:1658904526::efa:eq:efa_eq_write_error():662<warn> Writing error Cannot allocate memory to EQ.
libfabric:718890:1658904526::efa:eq:efa_eq_write_error():676<warn> Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12)
Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12) ./prov/efa/src/rxr/rxr.h:683
libfabric:718888:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to register MR: Cannot allocate memory
libfabric:718888:1658904526::efa:mr:efa_mr_regattr():632<warn> Unable to register MR: Cannot allocate memory
libfabric:718888:1658904526::efa:cq:rxr_ep_grow_rx_pkt_pools():1585<warn> cannot allocate memory for EFA's RX packet pool. error: Cannot allocate memory
libfabric:718888:1658904526::efa:eq:efa_eq_write_error():662<warn> Writing error Cannot allocate memory to EQ.
libfabric:718888:1658904526::efa:eq:efa_eq_write_error():676<warn> Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12)
Unable to write to EQ: Missing or unavailable event queue. err: Cannot allocate memory (-12) prov_errno: Cannot allocate memory (-12) ./prov/efa/src/rxr/rxr.h:683
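With eight ranks interleaving their output, it can help to group the libfabric warnings by function and message before reading them. The sketch below assumes the line layout seen in the log above (libfabric:PID:TIMESTAMP::provider:subsystem:function():line<level> message). The regular expression is inferred from these lines, not taken from libfabric documentation.

```python
import re
from collections import Counter

# Pattern inferred from the libfabric lines in the log above, e.g.
# libfabric:718891:1658904526::efa:mr:efa_mr_reg_impl():517<warn> Unable to ...
LOG_RE = re.compile(
    r"^libfabric:(?P<pid>\d+):(?P<ts>\d+)::(?P<prov>\w+):(?P<subsys>\w+):"
    r"(?P<func>\w+)\(\):(?P<line>\d+)<(?P<level>\w+)>\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Count (function, message) pairs and collect the PIDs that emitted each."""
    counts = Counter()
    pids = {}
    for raw in lines:
        m = LOG_RE.match(raw.strip())
        if m is None:
            continue  # NCCL INFO lines and unprefixed lines are skipped
        key = (m["func"], m["msg"])
        counts[key] += 1
        pids.setdefault(key, set()).add(int(m["pid"]))
    return counts, pids


if __name__ == "__main__":
    import sys
    counts, pids = summarize(sys.stdin)
    for (func, msg), n in counts.most_common():
        print(f"{n:4d}  {func}: {msg}  pids={sorted(pids[(func, msg)])}")
```

In the excerpt above, the "Unable to register MR" warnings come from PIDs 718887 through 718892. A per-PID grouping like this makes that pattern easier to spot.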
The output of fi_info -p efa -t FI_EP_RDM is:
provider: efa
fabric: efa
domain: rdmap32s27-rdm
version: 116.0
type: FI_EP_RDM
protocol: FI_PROTO_EFA
provider: efa
fabric: efa
domain: rdmap144s27-rdm
version: 116.0
type: FI_EP_RDM
protocol: FI_PROTO_EFA
provider: efa
fabric: efa
domain: rdmap160s27-rdm
version: 116.0
type: FI_EP_RDM
protocol: FI_PROTO_EFA
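To count how many EFA domains fi_info reports, the key/value output above can be split into one record per "provider:" line. This is a rough sketch that assumes the "key: value" layout shown here. It is not an official parser for fi_info output.

```python
def parse_fi_info(text):
    """Split fi_info output into one dict per 'provider:' record."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so values that contain colons
        # (for example IPv6-style fabric names) stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "provider":
            records.append({})
        if records:
            records[-1][key] = value
    return records


if __name__ == "__main__":
    import sys
    recs = parse_fi_info(sys.stdin.read())
    print(len(recs), "domains:", [r.get("domain") for r in recs])
```

Applied to the output above, this yields three records, with domains rdmap32s27-rdm, rdmap144s27-rdm and rdmap160s27-rdm.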
The output of lspci -i efa is:
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
00:04.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061
00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061
10:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
10:1c.0 3D controller: NVIDIA Corporation Device 20b0 (rev a1)
10:1d.0 3D controller: NVIDIA Corporation Device 20b0 (rev a1)
10:1e.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
10:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
20:01.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
20:1b.0 Ethernet controller: Amazon.com, Inc. Elastic Fabric Adapter (EFA)
20:1c.0 3D controller: NVIDIA Corporation Device 20b0 (rev a1)
20:1d.0 3D controller: NVIDIA Corporation Device 20b0 (rev a1)
20:1e.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
20:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
80:1a.0 Bridge: NVIDIA Corporation Device 1af1 (rev a1)
80:1b.0 Bridge: NVIDIA Corporation Device 1af1 (rev a1)
80:1c.0 Bridge: NVIDIA Corporation Device 1af1 (rev a1)
80:1d.0 Bridge: NVIDIA Corporation Device 1af1 (rev a1)
80:1e.0 Bridge: NVIDIA Corporation Device 1af1 (rev a1)
80:1f.0 Bridge: NVIDIA Corporation Device 1af1 (rev a1)
90:01.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
90:1b.0 Ethernet controller: Amazon.com, Inc. Elastic Fabric Adapter (EFA)
90:1c.0 3D controller: NVIDIA Corporation Device 20b0 (rev a1)
90:1d.0 3D controller: NVIDIA Corporation Device 20b0 (rev a1)
90:1e.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
90:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
a0:01.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
a0:1b.0 Ethernet controller: Amazon.com, Inc. Elastic Fabric Adapter (EFA)
a0:1c.0 3D controller: NVIDIA Corporation Device 20b0 (rev a1)
a0:1d.0 3D controller: NVIDIA Corporation Device 20b0 (rev a1)
a0:1e.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
a0:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
The output of sudo cat /opt/amazon/efa_installed_packages is:
# EFA installer version: 1.17.2
# Debug packages installed: no
# Packages installed:
efa-config_1.10_all efa-profile_1.5_all libfabric-aws-bin_1.16.0~amzn3.0_amd64 libfabric-aws-dev_1.16.0~amzn3.0_amd64 libfabric1-aws_1.16.0~amzn3.0_amd64 openmpi40-aws_4.1.4-1_amd64 ibacm_41.0-1_amd64 ibverbs-providers_41.0-1_amd64 ibverbs-utils_41.0-1_amd64 infiniband-diags_41.0-1_amd64 libibmad-dev_41.0-1_amd64 libibmad5_41.0-1_amd64 libibnetdisc-dev_41.0-1_amd64 libibnetdisc5_41.0-1_amd64 libibumad-dev_41.0-1_amd64 libibumad3_41.0-1_amd64 libibverbs-dev_41.0-1_amd64 libibverbs1_41.0-1_amd64 librdmacm-dev_41.0-1_amd64 librdmacm1_41.0-1_amd64 rdma-core_41.0-1_amd64 rdmacm-utils_41.0-1_amd64 efa_1.16.0-1.amzn1_amd64
It looks like even though EFA installer 1.17.2 is supposed to install libfabric 1.16.0, what actually gets installed in the folder /opt/amazon/efa/lib is libfabric.so.1.19.0.
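One way to cross-check the packaged libfabric version is to read it from the efa_installed_packages listing rather than from the shared-object file name. The sketch below assumes the name_version_arch token layout shown in that listing. As an assumption not confirmed in this thread, the suffix on libfabric.so.1.19.0 may be the library's shared-object (ABI) version rather than its release version, so the two numbers need not match.

```python
import re


def parse_installed_packages(text):
    """Return (installer_version, {package_name: version}) from the file text."""
    installer = None
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"#\s*EFA installer version:\s*(\S+)", line)
        if m:
            installer = m.group(1)
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and the other comment headers
        for token in line.split():
            parts = token.split("_")
            if len(parts) == 3:  # name_version_arch, as in the listing above
                name, version, _arch = parts
                packages[name] = version
    return installer, packages


if __name__ == "__main__":
    with open("/opt/amazon/efa_installed_packages") as fh:
        installer, pkgs = parse_installed_packages(fh.read())
    print("installer:", installer)
    for name, version in pkgs.items():
        if name.startswith("libfabric"):
            print(name, version)
```

For the listing above, this reports installer 1.17.2 and version 1.16.0~amzn3.0 for the libfabric packages.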
One additional thing is that when we run nccl-tests across these multiple instances, we don't see any errors and everything passes.
/opt/amazon/openmpi/bin/mpirun \
-x FI_PROVIDER="efa" \
-x FI_EFA_USE_DEVICE_RDMA=1 \
-x LD_LIBRARY_PATH=/opt/nccl/build/lib:/usr/local/cuda/lib64:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:$LD_LIBRARY_PATH \
-x NCCL_DEBUG=INFO \
--hostfile my-hosts -n 32 -N 8 \
--mca pml ^cm --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0 --bind-to none \
$HOME/nccl-tests/build/all_reduce_perf -b 8 -e 1G -f 2 -g 1 -c 1 -n 100
...
...
..
ip-10-216-179-193:641226:641269 [7] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /opt/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-179-193:641226:641269 [7] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
ip-10-216-179-193:641220:641267 [1] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-193:641220:641267 [1] NCCL INFO Using network AWS Libfabric
ip-10-216-179-193:641224:641268 [5] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-193:641224:641268 [5] NCCL INFO Using network AWS Libfabric
ip-10-216-179-193:641226:641269 [7] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-179-193:641226:641269 [7] NCCL INFO Using network AWS Libfabric
ip-10-216-179-193:641223:641223 [4] NCCL INFO Bootstrap : Using ens32:10.216.179.193<0>
...
...
...
#
# out-of-place in-place
# size count type redop time algbw busbw error time algbw busbw error
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
ip-10-216-179-193:641221:641271 [2] NCCL INFO comm 0x7f115c000c60 rank 2 nranks 32 cudaDev 2 busId 201c0 - Init COMPLETE
ip-10-216-178-255:416486:416524 [7] NCCL INFO comm 0x7ff6bc000c60 rank 23 nranks 32 cudaDev 7 busId a01d0 - Init COMPLETE
ip-10-216-178-255:416483:416523 [5] NCCL INFO comm 0x7f1720000c60 rank 21 nranks 32 cudaDev 5 busId 901d0 - Init COMPLETE
ip-10-216-178-255:416479:416530 [1] NCCL INFO comm 0x7fb05c000c60 rank 17 nranks 32 cudaDev 1 busId 101d0 - Init COMPLETE
ip-10-216-178-255:416482:416525 [4] NCCL INFO comm 0x7f7998000c60 rank 20 nranks 32 cudaDev 4 busId 901c0 - Init COMPLETE
ip-10-216-178-255:416481:416522 [3] NCCL INFO comm 0x7f586c000c60 rank 19 nranks 32 cudaDev 3 busId 201d0 - Init COMPLETE
ip-10-216-178-255:416478:416527 [0] NCCL INFO comm 0x7ff8f8000c60 rank 16 nranks 32 cudaDev 0 busId 101c0 - Init COMPLETE
ip-10-216-178-255:416480:416529 [2] NCCL INFO comm 0x7fd66c000c60 rank 18 nranks 32 cudaDev 2 busId 201c0 - Init COMPLETE
ip-10-216-178-255:416484:416528 [6] NCCL INFO comm 0x7f191c000c60 rank 22 nranks 32 cudaDev 6 busId a01c0 - Init COMPLETE
8 2 float sum 167.0 0.00 0.00 2e-07 161.7 0.00 0.00 2e-07
16 4 float sum 162.9 0.00 0.00 2e-07 162.8 0.00 0.00 2e-07
32 8 float sum 163.2 0.00 0.00 2e-07 162.9 0.00 0.00 2e-07
64 16 float sum 163.7 0.00 0.00 2e-07 164.0 0.00 0.00 2e-07
128 32 float sum 164.3 0.00 0.00 2e-07 164.0 0.00 0.00 2e-07
256 64 float sum 165.6 0.00 0.00 2e-07 165.2 0.00 0.00 2e-07
512 128 float sum 169.3 0.00 0.01 2e-07 169.1 0.00 0.01 1e-07
1024 256 float sum 175.0 0.01 0.01 5e-07 175.3 0.01 0.01 5e-07
2048 512 float sum 182.2 0.01 0.02 5e-07 181.6 0.01 0.02 5e-07
4096 1024 float sum 193.1 0.02 0.04 5e-07 193.1 0.02 0.04 5e-07
8192 2048 float sum 210.0 0.04 0.08 5e-07 194.4 0.04 0.08 5e-07
16384 4096 float sum 212.0 0.08 0.15 5e-07 195.3 0.08 0.16 5e-07
32768 8192 float sum 391.8 0.08 0.16 5e-07 382.1 0.09 0.17 5e-07
65536 16384 float sum 392.4 0.17 0.32 7e-07 421.1 0.16 0.30 7e-07
131072 32768 float sum 406.8 0.32 0.62 7e-07 468.9 0.28 0.54 7e-07
262144 65536 float sum 413.8 0.63 1.23 7e-07 417.3 0.63 1.22 7e-07
524288 131072 float sum 518.2 1.01 1.96 7e-07 524.6 1.00 1.94 7e-07
1048576 262144 float sum 721.1 1.45 2.82 7e-07 719.4 1.46 2.82 7e-07
2097152 524288 float sum 808.1 2.60 5.03 7e-07 806.9 2.60 5.04 7e-07
4194304 1048576 float sum 956.5 4.39 8.50 7e-07 953.2 4.40 8.53 7e-07
8388608 2097152 float sum 1446.7 5.80 11.23 7e-07 1453.5 5.77 11.18 7e-07
16777216 4194304 float sum 2436.3 6.89 13.34 7e-07 2437.5 6.88 13.34 7e-07
33554432 8388608 float sum 3616.4 9.28 17.98 7e-07 3645.9 9.20 17.83 7e-07
67108864 16777216 float sum 5298.8 12.66 24.54 1e-06 5248.0 12.79 24.78 1e-06
134217728 33554432 float sum 11120 12.07 23.39 1e-06 11208 11.98 23.20 1e-06
268435456 67108864 float sum 19494 13.77 26.68 1e-06 19551 13.73 26.60 1e-06
536870912 134217728 float sum 37906 14.16 27.44 1e-06 37963 14.14 27.40 1e-06
1073741824 268435456 float sum 70783 15.17 29.39 1e-06 70790 15.17 29.39 1e-06
ip-10-216-179-193:641220:641220 [1] NCCL INFO comm 0x7faea0000c60 rank 1 nranks 32 cudaDev 1 busId 101d0 - Destroy COMPLETE
ip-10-216-179-119:416602:416602 [1] NCCL INFO comm 0x7f8328000c60 rank 25 nranks 32 cudaDev 1 busId 101d0 - Destroy COMPLETE
ip-10-216-179-193:641224:641224 [5] NCCL INFO comm 0x7f9a40000c60 rank 5 nranks 32 cudaDev 5 busId 901d0 - Destroy COMPLETE
ip-10-216-179-87:417132:417132 [1] NCCL INFO comm 0x7f6738000c60 rank 9 nranks 32 cudaDev 1 busId 101d0 - Destroy COMPLETE
ip-10-216-179-87:417136:417136 [5] NCCL INFO comm 0x7f0fd8000c60 rank 13 nranks 32 cudaDev 5 busId 901d0 - Destroy COMPLETE
ip-10-216-179-119:416606:416606 [5] NCCL INFO comm 0x7fb640000c60 rank 29 nranks 32 cudaDev 5 busId 901d0 - Destroy COMPLETE
ip-10-216-178-255:416479:416479 [1] NCCL INFO comm 0x7fb05c000c60 rank 17 nranks 32 cudaDev 1 busId 101d0 - Destroy COMPLETE
ip-10-216-179-193:641222:641222 [3] NCCL INFO comm 0x7f031c000c60 rank 3 nranks 32 cudaDev 3 busId 201d0 - Destroy COMPLETE
ip-10-216-179-193:641219:641219 [0] NCCL INFO comm 0x7f94f0000c60 rank 0 nranks 32 cudaDev 0 busId 101c0 - Destroy COMPLETE
ip-10-216-179-119:416601:416601 [0] NCCL INFO comm 0x7efd4c000c60 rank 24 nranks 32 cudaDev 0 busId 101c0 - Destroy COMPLETE
ip-10-216-179-119:416604:416604 [3] NCCL INFO comm 0x7f8a5c000c60 rank 27 nranks 32 cudaDev 3 busId 201d0 - Destroy COMPLETE
ip-10-216-179-119:416608:416608 [7] NCCL INFO comm 0x7f5330000c60 rank 31 nranks 32 cudaDev 7 busId a01d0 - Destroy COMPLETE
ip-10-216-179-193:641226:641226 [7] NCCL INFO comm 0x7ff018000c60 rank 7 nranks 32 cudaDev 7 busId a01d0 - Destroy COMPLETE
ip-10-216-178-255:416483:416483 [5] NCCL INFO comm 0x7f1720000c60 rank 21 nranks 32 cudaDev 5 busId 901d0 - Destroy COMPLETE
# Out of bounds values : 0 OK
# Avg bus bandwidth : 6.95596
#
ip-10-216-179-193:641225:641225 [6] NCCL INFO comm 0x7fc938000c60 rank 6 nranks 32 cudaDev 6 busId a01c0 - Destroy COMPLETE
ip-10-216-178-255:416478:416478 [0] NCCL INFO comm 0x7ff8f8000c60 rank 16 nranks 32 cudaDev 0 busId 101c0 - Destroy COMPLETE
ip-10-216-179-87:417131:417131 [0] NCCL INFO comm 0x7f51ec000c60 rank 8 nranks 32 cudaDev 0 busId 101c0 - Destroy COMPLETE
ip-10-216-179-87:417134:417134 [3] NCCL INFO comm 0x7f40ec000c60 rank 11 nranks 32 cudaDev 3 busId 201d0 - Destroy COMPLETE
ip-10-216-179-119:416607:416607 [6] NCCL INFO comm 0x7f477c000c60 rank 30 nranks 32 cudaDev 6 busId a01c0 - Destroy COMPLETE
ip-10-216-178-255:416481:416481 [3] NCCL INFO comm 0x7f586c000c60 rank 19 nranks 32 cudaDev 3 busId 201d0 - Destroy COMPLETE
ip-10-216-179-119:416603:416603 [2] NCCL INFO comm 0x7fd5ac000c60 rank 26 nranks 32 cudaDev 2 busId 201c0 - Destroy COMPLETE
ip-10-216-179-193:641223:641223 [4] NCCL INFO comm 0x7f006c000c60 rank 4 nranks 32 cudaDev 4 busId 901c0 - Destroy COMPLETE
ip-10-216-179-193:641221:641221 [2] NCCL INFO comm 0x7f115c000c60 rank 2 nranks 32 cudaDev 2 busId 201c0 - Destroy COMPLETE
ip-10-216-179-87:417138:417138 [7] NCCL INFO comm 0x7f0a9c000c60 rank 15 nranks 32 cudaDev 7 busId a01d0 - Destroy COMPLETE
ip-10-216-178-255:416486:416486 [7] NCCL INFO comm 0x7ff6bc000c60 rank 23 nranks 32 cudaDev 7 busId a01d0 - Destroy COMPLETE
ip-10-216-179-119:416605:416605 [4] NCCL INFO comm 0x7fd69c000c60 rank 28 nranks 32 cudaDev 4 busId 901c0 - Destroy COMPLETE
ip-10-216-178-255:416480:416480 [2] NCCL INFO comm 0x7fd66c000c60 rank 18 nranks 32 cudaDev 2 busId 201c0 - Destroy COMPLETE
ip-10-216-178-255:416484:416484 [6] NCCL INFO comm 0x7f191c000c60 rank 22 nranks 32 cudaDev 6 busId a01c0 - Destroy COMPLETE
ip-10-216-179-87:417137:417137 [6] NCCL INFO comm 0x7fe7fc000c60 rank 14 nranks 32 cudaDev 6 busId a01c0 - Destroy COMPLETE
ip-10-216-179-87:417135:417135 [4] NCCL INFO comm 0x7f8d90000c60 rank 12 nranks 32 cudaDev 4 busId 901c0 - Destroy COMPLETE
ip-10-216-179-87:417133:417133 [2] NCCL INFO comm 0x7f05f0000c60 rank 10 nranks 32 cudaDev 2 busId 201c0 - Destroy COMPLETE
ip-10-216-178-255:416482:416482 [4] NCCL INFO comm 0x7f7998000c60 rank 20 nranks 32 cudaDev 4 busId 901c0 - Destroy COMPLETE
from aws-ofi-nccl.
Is there a reason that you are using just 3 interfaces on a P4Dn rather than 4? Could you also provide the dmesg output from when the failure happens?
from aws-ofi-nccl.
I've made the change to use 4 interfaces now (we also wanted to understand the performance impact of going from 1 to 4).
Here is the new fi_info result.
fi_info -p efa -t FI_EP_RDM
provider: efa
fabric: EFA-fe80::5a:feff:feea:aec1
domain: rdmap16s27-rdm
version: 111.10
type: FI_EP_RDM
protocol: FI_PROTO_EFA
provider: efa
fabric: EFA-fe80::fc:75ff:fe6c:e223
domain: rdmap32s27-rdm
version: 111.10
type: FI_EP_RDM
protocol: FI_PROTO_EFA
provider: efa
fabric: EFA-fe80::3e:52ff:fe5a:367d
domain: rdmap144s27-rdm
version: 111.10
type: FI_EP_RDM
protocol: FI_PROTO_EFA
provider: efa
fabric: EFA-fe80::90:eaff:fe36:3bed
domain: rdmap160s27-rdm
version: 111.10
type: FI_EP_RDM
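A quick way to confirm that all four interfaces show up is to count the domain entries in the fi_info output. The following is only a minimal sketch: the helper name efa_domains and the SAMPLE string are illustrative (SAMPLE is an abbreviated copy of the output above), and it assumes the plain `key: value` layout shown here rather than any documented fi_info output format.

```python
def efa_domains(fi_info_text):
    """Collect the value of every `domain:` line in fi_info text output."""
    domains = []
    for line in fi_info_text.splitlines():
        # Split on the first colon only; fabric values contain colons of their own.
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "domain":
            domains.append(value.strip())
    return domains


# Abbreviated copy of the output above: one provider/fabric/domain group per interface.
SAMPLE = """\
provider: efa
fabric: EFA-fe80::5a:feff:feea:aec1
domain: rdmap16s27-rdm
provider: efa
fabric: EFA-fe80::fc:75ff:fe6c:e223
domain: rdmap32s27-rdm
provider: efa
fabric: EFA-fe80::3e:52ff:fe5a:367d
domain: rdmap144s27-rdm
provider: efa
fabric: EFA-fe80::90:eaff:fe36:3bed
domain: rdmap160s27-rdm
"""

found = efa_domains(SAMPLE)
print(len(found), found)
```

On a live node the same function could be fed the captured stdout of the fi_info command instead of SAMPLE.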
The error we get is
ip-10-216-181-207:3966:3966 [0] NCCL INFO Bootstrap : Using ens32:10.216.181.207<0>
ip-10-216-181-207:3966:3966 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
ip-10-216-181-207:3966:3966 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.2.0aws
ip-10-216-181-207:3966:3966 [0] NCCL INFO NET/OFI Running on P4d platform, Setting NCCL_TOPO_FILE environment variable to /usr/local/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
ip-10-216-181-207:3966:3966 [0] NCCL INFO NET/OFI Setting RDMAV_FORK_SAFE environment variable to 1
libibverbs: Warning: RLIMIT_MEMLOCK is 100 bytes.
This will severely limit memory registrations.
ip-10-216-181-207:3966:3966 [0] NCCL INFO NET/OFI Forcing AWS OFI ndev 4
ip-10-216-181-207:3966:3966 [0] NCCL INFO NET/OFI Selected Provider is efa
ip-10-216-181-207:3966:3966 [0] NCCL INFO Using network AWS Libfabric
NCCL version 2.10.3+cuda11.6
...
...
...
libfabric:3966:efa:mr:efa_mr_reg_impl():312<warn> Unable to register MR: Unknown error -12
libfabric:3966:efa:mr:efa_mr_regattr():413<warn> Unable to register MR: Cannot allocate memory
libfabric:3966:efa:ep_ctrl:rxr_ep_post_buf():288<warn> Unable to allocate rx_pkt_entry
ip-10-216-181-207:3966:4034 [0] create_nccl_ofi_component:765 NCCL WARN NET/OFI Couldn't enable endpoint. RC: -12, ERROR: Cannot allocate memory
ip-10-216-181-207:3966:4034 [0] NCCL INFO include/net.h:20 -> 2
ip-10-216-181-207:3966:4034 [0] NCCL INFO transport/net.cc:199 -> 2
ip-10-216-181-207:3966:4034 [0] NCCL INFO transport.cc:34 -> 2
ip-10-216-181-207:3966:4034 [0] NCCL INFO transport.cc:84 -> 2
ip-10-216-181-207:3966:4034 [0] NCCL INFO init.cc:778 -> 2
ip-10-216-181-207:3966:4034 [0] NCCL INFO init.cc:904 -> 2
ip-10-216-181-207:3966:4034 [0] NCCL INFO group.cc:72 -> 2 [Async thread]
libfabric:3968:efa:mr:efa_mr_reg_impl():312<warn> Unable to register MR: Unknown error -12
libfabric:3968:efa:mr:efa_mr_regattr():413<warn> Unable to register MR: Cannot allocate memory
libfabric:3968:efa:ep_ctrl:rxr_ep_post_buf():288<warn> Unable to allocate rx_pkt_entry
ip-10-216-181-207:3968:4041 [2] create_nccl_ofi_component:765 NCCL WARN NET/OFI Couldn't enable endpoint. RC: -12, ERROR: Cannot allocate memory
ip-10-216-181-207:3968:4041 [2] NCCL INFO include/net.h:20 -> 2
ip-10-216-181-207:3968:4041 [2] NCCL INFO transport/net.cc:199 -> 2
ip-10-216-181-207:3968:4041 [2] NCCL INFO transport.cc:34 -> 2
ip-10-216-181-207:3968:4041 [2] NCCL INFO transport.cc:84 -> 2
ip-10-216-181-207:3968:4041 [2] NCCL INFO init.cc:778 -> 2
ip-10-216-181-207:3968:4041 [2] NCCL INFO init.cc:904 -> 2
ip-10-216-181-207:3968:4041 [2] NCCL INFO group.cc:72 -> 2 [Async thread]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3967 closing signal SIGTERM
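The log above contains a libibverbs warning that RLIMIT_MEMLOCK is 100 bytes, followed by memory registration failures with error -12 (Cannot allocate memory). Whether the two are related is not confirmed here, but the limit that a Python training process actually sees can be checked from inside that process. This is a minimal sketch using the standard library resource module; the helper name format_limit is made up for illustration.

```python
import resource


def format_limit(value):
    """Render an rlimit value, which is either a byte count or RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


# getrlimit returns a (soft, hard) tuple for the calling process.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print(f"RLIMIT_MEMLOCK soft={format_limit(soft)} hard={format_limit(hard)}")
```

Running this in the same environment that launches the training script (same user, same shell or service) shows the limit that the worker processes inherit, which may differ from what an interactive login shell reports.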
The dmesg output is as follows
[ 0.000000] Linux version 5.13.0-1023-aws (buildd@lcy02-amd64-104) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #25~20.04.1-Ubuntu SMP Mon Apr 25 19:28:27 UTC 2022 (Ubuntu 5.13.0-1023.25~20.04.1-aws 5.13.19)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-1023-aws root=UUID=436cf32d-5e3d-46ca-b557-f870c8a25794 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Hygon HygonGenuine
[ 0.000000] Centaur CentaurHauls
[ 0.000000] zhaoxin Shanghai
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: xstate_offset[5]: 960, xstate_sizes[5]: 64
[ 0.000000] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]: 512
[ 0.000000] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[ 0.000000] x86/fpu: xstate_offset[9]: 2560, xstate_sizes[9]: 8
[ 0.000000] x86/fpu: Enabled xstate features 0x2ff, context size is 2568 bytes, using 'compacted' format.
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffe1fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007ffe2000-0x000000007fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000e03fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000008ef7ffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000008ef8000000-0x000000907fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000009080000000-0x0000011ef7ffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000011ef8000000-0x000001207fffffff] reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.7 present.
[ 0.000000] DMI: Amazon EC2 p4d.24xlarge/, BIOS 1.0 10/16/2017
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 47fb801001, primary cpu clock
[ 0.000001] kvm-clock: using sched offset of 5868486377 cycles
[ 0.000003] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000006] tsc: Detected 2999.998 MHz processor
[ 0.000318] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000321] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000325] last_pfn = 0x11ef8000 max_arch_pfn = 0x400000000
[ 0.000370] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.000377] last_pfn = 0x7ffe2 max_arch_pfn = 0x400000000
[ 0.006564] Using GB pages for direct mapping
[ 0.006768] RAMDISK: [mem 0x2c11b000-0x32084fff]
[ 0.006784] ACPI: Early table checksum verification disabled
[ 0.006790] ACPI: RSDP 0x00000000000F8F40 000014 (v00 AMAZON)
[ 0.006796] ACPI: RSDT 0x000000007FFE7380 000044 (v01 AMAZON AMZNRSDT 00000001 AMZN 00000001)
[ 0.006800] ACPI: FACP 0x000000007FFEFF80 000074 (v01 AMAZON AMZNFACP 00000001 AMZN 00000001)
[ 0.006805] ACPI: DSDT 0x000000007FFE73D0 001F87 (v01 AMAZON AMZNDSDT 00000001 AMZN 00000001)
[ 0.006808] ACPI: FACS 0x000000007FFEFF40 000040
[ 0.006811] ACPI: SSDT 0x000000007FFEA190 005DA1 (v01 AMAZON AMZNSSDT 00000001 AMZN 00000001)
[ 0.006813] ACPI: APIC 0x000000007FFE9CB0 000366 (v01 AMAZON AMZNAPIC 00000001 AMZN 00000001)
[ 0.006816] ACPI: SRAT 0x000000007FFE9400 0006A8 (v01 AMAZON AMZNSRAT 00000001 AMZN 00000001)
[ 0.006819] ACPI: SLIT 0x000000007FFE9390 00006C (v01 AMAZON AMZNSLIT 00000001 AMZN 00000001)
[ 0.006822] ACPI: WAET 0x000000007FFE9360 000028 (v01 AMAZON AMZNWAET 00000001 AMZN 00000001)
[ 0.006826] ACPI: HPET 0x00000000000C9000 000038 (v01 AMAZON AMZNHPET 00000001 AMZN 00000001)
[ 0.006829] ACPI: SSDT 0x00000000000C9040 00007B (v01 AMAZON AMZNSSDT 00000001 AMZN 00000001)
[ 0.006831] ACPI: Reserving FACP table memory at [mem 0x7ffeff80-0x7ffefff3]
[ 0.006833] ACPI: Reserving DSDT table memory at [mem 0x7ffe73d0-0x7ffe9356]
[ 0.006834] ACPI: Reserving FACS table memory at [mem 0x7ffeff40-0x7ffeff7f]
[ 0.006834] ACPI: Reserving SSDT table memory at [mem 0x7ffea190-0x7ffeff30]
[ 0.006835] ACPI: Reserving APIC table memory at [mem 0x7ffe9cb0-0x7ffea015]
[ 0.006836] ACPI: Reserving SRAT table memory at [mem 0x7ffe9400-0x7ffe9aa7]
[ 0.006837] ACPI: Reserving SLIT table memory at [mem 0x7ffe9390-0x7ffe93fb]
[ 0.006838] ACPI: Reserving WAET table memory at [mem 0x7ffe9360-0x7ffe9387]
[ 0.006839] ACPI: Reserving HPET table memory at [mem 0xc9000-0xc9037]
[ 0.006840] ACPI: Reserving SSDT table memory at [mem 0xc9040-0xc90ba]
[ 0.006858] ACPI: Local APIC address 0xfee00000
[ 0.006922] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[ 0.006924] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[ 0.006926] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[ 0.006927] SRAT: PXM 0 -> APIC 0x03 -> Node 0
[ 0.006929] SRAT: PXM 0 -> APIC 0x04 -> Node 0
[ 0.006931] SRAT: PXM 0 -> APIC 0x05 -> Node 0
[ 0.006933] SRAT: PXM 0 -> APIC 0x06 -> Node 0
[ 0.006935] SRAT: PXM 0 -> APIC 0x07 -> Node 0
[ 0.006936] SRAT: PXM 0 -> APIC 0x08 -> Node 0
[ 0.006938] SRAT: PXM 0 -> APIC 0x09 -> Node 0
[ 0.006940] SRAT: PXM 0 -> APIC 0x0a -> Node 0
[ 0.006942] SRAT: PXM 0 -> APIC 0x0b -> Node 0
[ 0.006944] SRAT: PXM 0 -> APIC 0x0c -> Node 0
[ 0.006945] SRAT: PXM 0 -> APIC 0x0d -> Node 0
[ 0.006947] SRAT: PXM 0 -> APIC 0x0e -> Node 0
[ 0.006949] SRAT: PXM 0 -> APIC 0x0f -> Node 0
[ 0.006951] SRAT: PXM 0 -> APIC 0x10 -> Node 0
[ 0.006953] SRAT: PXM 0 -> APIC 0x11 -> Node 0
[ 0.006954] SRAT: PXM 0 -> APIC 0x12 -> Node 0
[ 0.006956] SRAT: PXM 0 -> APIC 0x13 -> Node 0
[ 0.006958] SRAT: PXM 0 -> APIC 0x14 -> Node 0
[ 0.006960] SRAT: PXM 0 -> APIC 0x15 -> Node 0
[ 0.006961] SRAT: PXM 0 -> APIC 0x16 -> Node 0
[ 0.006963] SRAT: PXM 0 -> APIC 0x17 -> Node 0
[ 0.006965] SRAT: PXM 0 -> APIC 0x18 -> Node 0
[ 0.006967] SRAT: PXM 0 -> APIC 0x19 -> Node 0
[ 0.006969] SRAT: PXM 0 -> APIC 0x1a -> Node 0
[ 0.006970] SRAT: PXM 0 -> APIC 0x1b -> Node 0
[ 0.006972] SRAT: PXM 0 -> APIC 0x1c -> Node 0
[ 0.006974] SRAT: PXM 0 -> APIC 0x1d -> Node 0
[ 0.006976] SRAT: PXM 0 -> APIC 0x1e -> Node 0
[ 0.006977] SRAT: PXM 0 -> APIC 0x1f -> Node 0
[ 0.006979] SRAT: PXM 0 -> APIC 0x20 -> Node 0
[ 0.006981] SRAT: PXM 0 -> APIC 0x21 -> Node 0
[ 0.006983] SRAT: PXM 0 -> APIC 0x22 -> Node 0
[ 0.006985] SRAT: PXM 0 -> APIC 0x23 -> Node 0
[ 0.006986] SRAT: PXM 0 -> APIC 0x24 -> Node 0
[ 0.006988] SRAT: PXM 0 -> APIC 0x25 -> Node 0
[ 0.006990] SRAT: PXM 0 -> APIC 0x26 -> Node 0
[ 0.006992] SRAT: PXM 0 -> APIC 0x27 -> Node 0
[ 0.006993] SRAT: PXM 0 -> APIC 0x28 -> Node 0
[ 0.006995] SRAT: PXM 0 -> APIC 0x29 -> Node 0
[ 0.006997] SRAT: PXM 0 -> APIC 0x2a -> Node 0
[ 0.006999] SRAT: PXM 0 -> APIC 0x2b -> Node 0
[ 0.007001] SRAT: PXM 0 -> APIC 0x2c -> Node 0
[ 0.007002] SRAT: PXM 0 -> APIC 0x2d -> Node 0
[ 0.007004] SRAT: PXM 0 -> APIC 0x2e -> Node 0
[ 0.007006] SRAT: PXM 0 -> APIC 0x2f -> Node 0
[ 0.007008] SRAT: PXM 1 -> APIC 0x40 -> Node 1
[ 0.007010] SRAT: PXM 1 -> APIC 0x41 -> Node 1
[ 0.007011] SRAT: PXM 1 -> APIC 0x42 -> Node 1
[ 0.007013] SRAT: PXM 1 -> APIC 0x43 -> Node 1
[ 0.007015] SRAT: PXM 1 -> APIC 0x44 -> Node 1
[ 0.007017] SRAT: PXM 1 -> APIC 0x45 -> Node 1
[ 0.007018] SRAT: PXM 1 -> APIC 0x46 -> Node 1
[ 0.007020] SRAT: PXM 1 -> APIC 0x47 -> Node 1
[ 0.007022] SRAT: PXM 1 -> APIC 0x48 -> Node 1
[ 0.007024] SRAT: PXM 1 -> APIC 0x49 -> Node 1
[ 0.007026] SRAT: PXM 1 -> APIC 0x4a -> Node 1
[ 0.007027] SRAT: PXM 1 -> APIC 0x4b -> Node 1
[ 0.007029] SRAT: PXM 1 -> APIC 0x4c -> Node 1
[ 0.007031] SRAT: PXM 1 -> APIC 0x4d -> Node 1
[ 0.007033] SRAT: PXM 1 -> APIC 0x4e -> Node 1
[ 0.007034] SRAT: PXM 1 -> APIC 0x4f -> Node 1
[ 0.007036] SRAT: PXM 1 -> APIC 0x50 -> Node 1
[ 0.007038] SRAT: PXM 1 -> APIC 0x51 -> Node 1
[ 0.007040] SRAT: PXM 1 -> APIC 0x52 -> Node 1
[ 0.007042] SRAT: PXM 1 -> APIC 0x53 -> Node 1
[ 0.007043] SRAT: PXM 1 -> APIC 0x54 -> Node 1
[ 0.007045] SRAT: PXM 1 -> APIC 0x55 -> Node 1
[ 0.007047] SRAT: PXM 1 -> APIC 0x56 -> Node 1
[ 0.007049] SRAT: PXM 1 -> APIC 0x57 -> Node 1
[ 0.007050] SRAT: PXM 1 -> APIC 0x58 -> Node 1
[ 0.007052] SRAT: PXM 1 -> APIC 0x59 -> Node 1
[ 0.007054] SRAT: PXM 1 -> APIC 0x5a -> Node 1
[ 0.007056] SRAT: PXM 1 -> APIC 0x5b -> Node 1
[ 0.007057] SRAT: PXM 1 -> APIC 0x5c -> Node 1
[ 0.007059] SRAT: PXM 1 -> APIC 0x5d -> Node 1
[ 0.007061] SRAT: PXM 1 -> APIC 0x5e -> Node 1
[ 0.007063] SRAT: PXM 1 -> APIC 0x5f -> Node 1
[ 0.007065] SRAT: PXM 1 -> APIC 0x60 -> Node 1
[ 0.007066] SRAT: PXM 1 -> APIC 0x61 -> Node 1
[ 0.007068] SRAT: PXM 1 -> APIC 0x62 -> Node 1
[ 0.007070] SRAT: PXM 1 -> APIC 0x63 -> Node 1
[ 0.007072] SRAT: PXM 1 -> APIC 0x64 -> Node 1
[ 0.007073] SRAT: PXM 1 -> APIC 0x65 -> Node 1
[ 0.007075] SRAT: PXM 1 -> APIC 0x66 -> Node 1
[ 0.007077] SRAT: PXM 1 -> APIC 0x67 -> Node 1
[ 0.007079] SRAT: PXM 1 -> APIC 0x68 -> Node 1
[ 0.007081] SRAT: PXM 1 -> APIC 0x69 -> Node 1
[ 0.007082] SRAT: PXM 1 -> APIC 0x6a -> Node 1
[ 0.007084] SRAT: PXM 1 -> APIC 0x6b -> Node 1
[ 0.007086] SRAT: PXM 1 -> APIC 0x6c -> Node 1
[ 0.007088] SRAT: PXM 1 -> APIC 0x6d -> Node 1
[ 0.007089] SRAT: PXM 1 -> APIC 0x6e -> Node 1
[ 0.007091] SRAT: PXM 1 -> APIC 0x6f -> Node 1
[ 0.007095] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[ 0.007098] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x907fffffff]
[ 0.007101] ACPI: SRAT: Node 1 PXM 1 [mem 0x9080000000-0x1207fffffff]
[ 0.007108] NUMA: Initialized distance table, cnt=2
[ 0.007110] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x907fffffff] -> [mem 0x00000000-0x907fffffff]
[ 0.007118] NODE_DATA(0) allocated [mem 0x8ef7fd6000-0x8ef7ffffff]
[ 0.007150] NODE_DATA(1) allocated [mem 0x11ef7fd3000-0x11ef7ffcfff]
[ 0.008662] Zone ranges:
[ 0.008663] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.008665] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.008667] Normal [mem 0x0000000100000000-0x0000011ef7ffffff]
[ 0.008668] Device empty
[ 0.008669] Movable zone start for each node
[ 0.008672] Early memory node ranges
[ 0.008672] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.008674] node 0: [mem 0x0000000000100000-0x000000007ffe1fff]
[ 0.008675] node 0: [mem 0x0000000100000000-0x0000008ef7ffffff]
[ 0.008706] node 1: [mem 0x0000009080000000-0x0000011ef7ffffff]
[ 0.008737] Initmem setup node 0 [mem 0x0000000000001000-0x0000008ef7ffffff]
[ 0.008739] On node 0 totalpages: 149389184
[ 0.008740] DMA zone: 64 pages used for memmap
[ 0.008741] DMA zone: 158 pages reserved
[ 0.008741] DMA zone: 3998 pages, LIFO batch:0
[ 0.008743] DMA32 zone: 8128 pages used for memmap
[ 0.008744] DMA32 zone: 520162 pages, LIFO batch:63
[ 0.008744] Normal zone: 2326016 pages used for memmap
[ 0.008745] Normal zone: 148865024 pages, LIFO batch:63
[ 0.008746] Initmem setup node 1 [mem 0x0000009080000000-0x0000011ef7ffffff]
[ 0.008748] On node 1 totalpages: 149389312
[ 0.008748] Normal zone: 2334208 pages used for memmap
[ 0.008749] Normal zone: 149389312 pages, LIFO batch:63
[ 0.008807] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.008832] On node 0, zone DMA: 97 pages in unavailable ranges
[ 1.007176] On node 0, zone Normal: 30 pages in unavailable ranges
[ 2.345979] ACPI: PM-Timer IO Port: 0xb008
[ 2.345983] ACPI: Local APIC address 0xfee00000
[ 2.345999] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 2.346032] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[ 2.346035] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 2.346037] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 2.346038] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 2.346039] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 2.346041] ACPI: IRQ5 used by override.
[ 2.346042] ACPI: IRQ9 used by override.
[ 2.346042] ACPI: IRQ10 used by override.
[ 2.346043] ACPI: IRQ11 used by override.
[ 2.346045] Using ACPI (MADT) for SMP configuration information
[ 2.346046] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 2.346049] TSC deadline timer available
[ 2.346050] smpboot: Allowing 96 CPUs, 0 hotplug CPUs
[ 2.346080] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 2.346084] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 2.346086] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000effff]
[ 2.346088] PM: hibernation: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[ 2.346091] PM: hibernation: Registered nosave memory: [mem 0x7ffe2000-0x7fffffff]
[ 2.346093] PM: hibernation: Registered nosave memory: [mem 0x80000000-0xdfffffff]
[ 2.346095] PM: hibernation: Registered nosave memory: [mem 0xe0000000-0xe03fffff]
[ 2.346097] PM: hibernation: Registered nosave memory: [mem 0xe0400000-0xfffbffff]
[ 2.346099] PM: hibernation: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
[ 2.346102] PM: hibernation: Registered nosave memory: [mem 0x8ef8000000-0x907fffffff]
[ 2.346105] [mem 0x80000000-0xdfffffff] available for PCI devices
[ 2.346107] Booting paravirtualized kernel on KVM
[ 2.346110] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 2.346120] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:96 nr_cpu_ids:96 nr_node_ids:2
[ 2.358753] percpu: Embedded 65 pages/cpu s229376 r8192 d28672 u524288
[ 2.358763] pcpu-alloc: s229376 r8192 d28672 u524288 alloc=1*2097152
[ 2.358765] pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
[ 2.358770] pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
[ 2.358775] pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
[ 2.358779] pcpu-alloc: [0] 48 49 50 51 [0] 52 53 54 55
[ 2.358783] pcpu-alloc: [0] 56 57 58 59 [0] 60 61 62 63
[ 2.358787] pcpu-alloc: [0] 64 65 66 67 [0] 68 69 70 71
[ 2.358791] pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
[ 2.358796] pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
[ 2.358800] pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
[ 2.358804] pcpu-alloc: [1] 72 73 74 75 [1] 76 77 78 79
[ 2.358808] pcpu-alloc: [1] 80 81 82 83 [1] 84 85 86 87
[ 2.358812] pcpu-alloc: [1] 88 89 90 91 [1] 92 93 94 95
[ 2.358851] kvm-guest: stealtime: cpu 0, msr 8cbc837080
[ 2.358854] kvm-guest: PV spinlocks disabled, no host support
[ 2.358861] Built 2 zonelists, mobility grouping on. Total pages: 294109922
[ 2.358863] Policy zone: Normal
[ 2.358864] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-1023-aws root=UUID=436cf32d-5e3d-46ca-b557-f870c8a25794 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
[ 2.358931] printk: log_buf_len individual max cpu contribution: 4096 bytes
[ 2.358932] printk: log_buf_len total cpu_extra contributions: 389120 bytes
[ 2.358933] printk: log_buf_len min size: 262144 bytes
[ 2.360386] printk: log_buf_len: 1048576 bytes
[ 2.360388] printk: early log buf free: 246704(94%)
[ 2.361383] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 4.946776] Memory: 1176201756K/1195113984K available (16393K kernel code, 3519K rwdata, 10532K rodata, 2896K init, 5724K bss, 18911968K reserved, 0K cma-reserved)
[ 4.946782] random: get_random_u64 called from __kmem_cache_create+0x2d/0x440 with crng_init=0
[ 4.948036] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=96, Nodes=2
[ 4.948085] Kernel/User page tables isolation: enabled
[ 4.948154] ftrace: allocating 49236 entries in 193 pages
[ 4.962908] ftrace: allocated 193 pages with 3 groups
[ 4.963462] rcu: Hierarchical RCU implementation.
[ 4.963463] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=96.
[ 4.963465] Rude variant of Tasks RCU enabled.
[ 4.963466] Tracing variant of Tasks RCU enabled.
[ 4.963467] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 4.963468] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=96
[ 4.966837] NR_IRQS: 524544, nr_irqs: 1192, preallocated irqs: 16
[ 4.967182] random: crng done (trusting CPU's manufacturer)
[ 5.089395] Console: colour VGA+ 80x25
[ 5.864795] printk: console [tty1] enabled
[ 6.106796] printk: console [ttyS0] enabled
[ 6.110291] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[ 6.117688] ACPI: Core revision 20210331
[ 6.121062] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 30580167144 ns
[ 6.127630] APIC: Switch to symmetric I/O mode setup
[ 6.131464] x2apic enabled
[ 6.135105] Switched APIC routing to physical x2apic.
[ 6.140433] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2b3e43c8763, max_idle_ns: 440795360101 ns
[ 6.147595] Calibrating delay loop (skipped) preset value.. 5999.99 BogoMIPS (lpj=11999992)
[ 6.151594] pid_max: default: 98304 minimum: 768
[ 6.151594] LSM: Security Framework initializing
[ 6.151594] Yama: becoming mindful.
[ 6.151594] AppArmor: AppArmor initialized
[ 6.151594] Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes, vmalloc)
[ 6.151594] Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes, vmalloc)
[ 6.151594] Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc)
[ 6.151594] Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc)
[ 6.151594] process: using mwait in idle threads
[ 6.151594] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[ 6.151594] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[ 6.151594] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 6.151594] Spectre V2 : Mitigation: Retpolines
[ 6.151594] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 6.151594] Speculative Store Bypass: Vulnerable
[ 6.151594] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 6.151594] Freeing SMP alternatives memory: 40K
[ 6.151594] smpboot: Estimated ratio of average max frequency by base frequency (times 1024): 1262
[ 6.151594] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz (family: 0x6, model: 0x55, stepping: 0x7)
[ 6.151856] Performance Events: Skylake events, Intel PMU driver.
[ 6.155598] ... version: 2
[ 6.158782] ... bit width: 48
[ 6.159596] ... generic registers: 4
[ 6.162768] ... value mask: 0000ffffffffffff
[ 6.163596] ... max period: 000000007fffffff
[ 6.167248] ... fixed-purpose events: 3
[ 6.167596] ... event mask: 000000070000000f
[ 6.171451] rcu: Hierarchical SRCU implementation.
[ 6.173244] smp: Bringing up secondary CPUs ...
[ 6.175736] x86: Booting SMP configuration:
[ 6.178976] .... node #0, CPUs: #1
[ 1.184458] kvm-clock: cpu 1, msr 47fb801041, secondary cpu clock
[ 6.181931] kvm-guest: stealtime: cpu 1, msr 8cbc8b7080
[ 6.187719] #2
[ 1.184458] kvm-clock: cpu 2, msr 47fb801081, secondary cpu clock
[ 6.189319] kvm-guest: stealtime: cpu 2, msr 8cbc937080
[ 6.195712] #3
[ 1.184458] kvm-clock: cpu 3, msr 47fb8010c1, secondary cpu clock
[ 6.200580] kvm-guest: stealtime: cpu 3, msr 8cbc9b7080
[ 6.207719] #4
[ 1.184458] kvm-clock: cpu 4, msr 47fb801101, secondary cpu clock
[ 6.209176] kvm-guest: stealtime: cpu 4, msr 8cbca37080
[ 6.215714] #5
[ 1.184458] kvm-clock: cpu 5, msr 47fb801141, secondary cpu clock
[ 6.217172] kvm-guest: stealtime: cpu 5, msr 8cbcab7080
[ 6.223703] #6
[ 1.184458] kvm-clock: cpu 6, msr 47fb801181, secondary cpu clock
[ 6.228145] kvm-guest: stealtime: cpu 6, msr 8cbcb37080
[ 6.235719] #7
[ 1.184458] kvm-clock: cpu 7, msr 47fb8011c1, secondary cpu clock
[ 6.237193] kvm-guest: stealtime: cpu 7, msr 8cbcbb7080
[ 6.243709] #8
[ 1.184458] kvm-clock: cpu 8, msr 47fb801201, secondary cpu clock
[ 6.245153] kvm-guest: stealtime: cpu 8, msr 8cbcc37080
[ 6.251707] #9
[ 1.184458] kvm-clock: cpu 9, msr 47fb801241, secondary cpu clock
[ 6.255779] kvm-guest: stealtime: cpu 9, msr 8cbccb7080
[ 6.259715] #10
[ 1.184458] kvm-clock: cpu 10, msr 47fb801281, secondary cpu clock
[ 6.265012] kvm-guest: stealtime: cpu 10, msr 8cbcd37080
[ 6.271713] #11
[ 1.184458] kvm-clock: cpu 11, msr 47fb8012c1, secondary cpu clock
[ 6.273167] kvm-guest: stealtime: cpu 11, msr 8cbcdb7080
[ 6.279713] #12
[ 1.184458] kvm-clock: cpu 12, msr 47fb801301, secondary cpu clock
[ 6.283631] kvm-guest: stealtime: cpu 12, msr 8cbce37080
[ 6.287715] #13
[ 1.184458] kvm-clock: cpu 13, msr 47fb801341, secondary cpu clock
[ 6.292934] kvm-guest: stealtime: cpu 13, msr 8cbceb7080
[ 6.299712] #14
[ 1.184458] kvm-clock: cpu 14, msr 47fb801381, secondary cpu clock
[ 6.301168] kvm-guest: stealtime: cpu 14, msr 8cbcf37080
[ 6.307721] #15
[ 1.184458] kvm-clock: cpu 15, msr 47fb8013c1, secondary cpu clock
[ 6.309168] kvm-guest: stealtime: cpu 15, msr 8cbcfb7080
[ 6.315706] #16
[ 1.184458] kvm-clock: cpu 16, msr 47fb801401, secondary cpu clock
[ 6.320837] kvm-guest: stealtime: cpu 16, msr 8cbd037080
[ 6.327710] #17
[ 1.184458] kvm-clock: cpu 17, msr 47fb801441, secondary cpu clock
[ 6.329159] kvm-guest: stealtime: cpu 17, msr 8cbd0b7080
[ 6.335709] #18
[ 1.184458] kvm-clock: cpu 18, msr 47fb801481, secondary cpu clock
[ 6.337164] kvm-guest: stealtime: cpu 18, msr 8cbd137080
[ 6.343729] #19
[ 1.184458] kvm-clock: cpu 19, msr 47fb8014c1, secondary cpu clock
[ 6.348587] kvm-guest: stealtime: cpu 19, msr 8cbd1b7080
[ 6.355704] #20
[ 1.184458] kvm-clock: cpu 20, msr 47fb801501, secondary cpu clock
[ 6.357149] kvm-guest: stealtime: cpu 20, msr 8cbd237080
[ 6.363710] #21
[ 1.184458] kvm-clock: cpu 21, msr 47fb801541, secondary cpu clock
[ 6.365146] kvm-guest: stealtime: cpu 21, msr 8cbd2b7080
[ 6.371715] #22
[ 1.184458] kvm-clock: cpu 22, msr 47fb801581, secondary cpu clock
[ 6.376391] kvm-guest: stealtime: cpu 22, msr 8cbd337080
[ 6.383714] #23
[ 1.184458] kvm-clock: cpu 23, msr 47fb8015c1, secondary cpu clock
[ 6.385172] kvm-guest: stealtime: cpu 23, msr 8cbd3b7080
[ 6.487596] .... node #1, CPUs: #24
[ 1.184458] kvm-clock: cpu 24, msr 47fb801601, secondary cpu clock
[ 1.184458] smpboot: CPU 24 Converting physical 0 to logical die 1
[ 6.492340] kvm-guest: stealtime: cpu 24, msr 11cbc837080
[ 6.495750] #25
[ 1.184458] kvm-clock: cpu 25, msr 47fb801641, secondary cpu clock
[ 6.497347] kvm-guest: stealtime: cpu 25, msr 11cbc8b7080
[ 6.503734] #26
[ 1.184458] kvm-clock: cpu 26, msr 47fb801681, secondary cpu clock
[ 6.505168] kvm-guest: stealtime: cpu 26, msr 11cbc937080
[ 6.511756] #27
[ 1.184458] kvm-clock: cpu 27, msr 47fb8016c1, secondary cpu clock
[ 6.516724] kvm-guest: stealtime: cpu 27, msr 11cbc9b7080
[ 6.523744] #28
[ 1.184458] kvm-clock: cpu 28, msr 47fb801701, secondary cpu clock
[ 6.525171] kvm-guest: stealtime: cpu 28, msr 11cbca37080
[ 6.531742] #29
[ 1.184458] kvm-clock: cpu 29, msr 47fb801741, secondary cpu clock
[ 6.533180] kvm-guest: stealtime: cpu 29, msr 11cbcab7080
[ 6.539733] #30
[ 1.184458] kvm-clock: cpu 30, msr 47fb801781, secondary cpu clock
[ 6.544902] kvm-guest: stealtime: cpu 30, msr 11cbcb37080
[ 6.551749] #31
[ 1.184458] kvm-clock: cpu 31, msr 47fb8017c1, secondary cpu clock
[ 6.553190] kvm-guest: stealtime: cpu 31, msr 11cbcbb7080
[ 6.559742] #32
[ 1.184458] kvm-clock: cpu 32, msr 47fb801801, secondary cpu clock
[ 6.563697] kvm-guest: stealtime: cpu 32, msr 11cbcc37080
[ 6.571704] #33
[ 1.184458] kvm-clock: cpu 33, msr 47fb801841, secondary cpu clock
[ 6.573125] kvm-guest: stealtime: cpu 33, msr 11cbccb7080
[ 6.579740] #34
[ 1.184458] kvm-clock: cpu 34, msr 47fb801881, secondary cpu clock
[ 6.581152] kvm-guest: stealtime: cpu 34, msr 11cbcd37080
[ 6.587755] #35
[ 1.184458] kvm-clock: cpu 35, msr 47fb8018c1, secondary cpu clock
[ 6.591853] kvm-guest: stealtime: cpu 35, msr 11cbcdb7080
[ 6.599736] #36
[ 1.184458] kvm-clock: cpu 36, msr 47fb801901, secondary cpu clock
[ 6.601170] kvm-guest: stealtime: cpu 36, msr 11cbce37080
[ 6.607752] #37
[ 1.184458] kvm-clock: cpu 37, msr 47fb801941, secondary cpu clock
[ 6.609208] kvm-guest: stealtime: cpu 37, msr 11cbceb7080
[ 6.615749] #38
[ 1.184458] kvm-clock: cpu 38, msr 47fb801981, secondary cpu clock
[ 6.620163] kvm-guest: stealtime: cpu 38, msr 11cbcf37080
[ 6.627755] #39
[ 1.184458] kvm-clock: cpu 39, msr 47fb8019c1, secondary cpu clock
[ 6.629165] kvm-guest: stealtime: cpu 39, msr 11cbcfb7080
[ 6.635735] #40
[ 1.184458] kvm-clock: cpu 40, msr 47fb801a01, secondary cpu clock
[ 6.637163] kvm-guest: stealtime: cpu 40, msr 11cbd037080
[ 6.643735] #41
[ 1.184458] kvm-clock: cpu 41, msr 47fb801a41, secondary cpu clock
[ 6.648335] kvm-guest: stealtime: cpu 41, msr 11cbd0b7080
[ 6.655753] #42
[ 1.184458] kvm-clock: cpu 42, msr 47fb801a81, secondary cpu clock
[ 6.657199] kvm-guest: stealtime: cpu 42, msr 11cbd137080
[ 6.663742] #43
[ 1.184458] kvm-clock: cpu 43, msr 47fb801ac1, secondary cpu clock
[ 6.665189] kvm-guest: stealtime: cpu 43, msr 11cbd1b7080
[ 6.671749] #44
[ 1.184458] kvm-clock: cpu 44, msr 47fb801b01, secondary cpu clock
[ 6.676589] kvm-guest: stealtime: cpu 44, msr 11cbd237080
[ 6.683738] #45
[ 1.184458] kvm-clock: cpu 45, msr 47fb801b41, secondary cpu clock
[ 6.685134] kvm-guest: stealtime: cpu 45, msr 11cbd2b7080
[ 6.691738] #46
[ 1.184458] kvm-clock: cpu 46, msr 47fb801b81, secondary cpu clock
[ 6.693179] kvm-guest: stealtime: cpu 46, msr 11cbd337080
[ 6.699756] #47
[ 1.184458] kvm-clock: cpu 47, msr 47fb801bc1, secondary cpu clock
[ 6.704808] kvm-guest: stealtime: cpu 47, msr 11cbd3b7080
[ 6.713958] .... node #0, CPUs: #48
[ 1.184458] kvm-clock: cpu 48, msr 47fb801c01, secondary cpu clock
[ 6.717011] kvm-guest: stealtime: cpu 48, msr 8cbd437080
[ 6.723896] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 6.727735] #49
[ 1.184458] kvm-clock: cpu 49, msr 47fb801c41, secondary cpu clock
[ 6.729173] kvm-guest: stealtime: cpu 49, msr 8cbd4b7080
[ 6.735709] #50
[ 1.184458] kvm-clock: cpu 50, msr 47fb801c81, secondary cpu clock
[ 6.737118] kvm-guest: stealtime: cpu 50, msr 8cbd537080
[ 6.743712] #51
[ 1.184458] kvm-clock: cpu 51, msr 47fb801cc1, secondary cpu clock
[ 6.747647] kvm-guest: stealtime: cpu 51, msr 8cbd5b7080
[ 6.751729] #52
[ 1.184458] kvm-clock: cpu 52, msr 47fb801d01, secondary cpu clock
[ 6.756897] kvm-guest: stealtime: cpu 52, msr 8cbd637080
[ 6.763716] #53
[ 1.184458] kvm-clock: cpu 53, msr 47fb801d41, secondary cpu clock
[ 6.765148] kvm-guest: stealtime: cpu 53, msr 8cbd6b7080
[ 6.771715] #54
[ 1.184458] kvm-clock: cpu 54, msr 47fb801d81, secondary cpu clock
[ 6.773154] kvm-guest: stealtime: cpu 54, msr 8cbd737080
[ 6.779717] #55
[ 1.184458] kvm-clock: cpu 55, msr 47fb801dc1, secondary cpu clock
[ 6.784657] kvm-guest: stealtime: cpu 55, msr 8cbd7b7080
[ 6.791718] #56
[ 1.184458] kvm-clock: cpu 56, msr 47fb801e01, secondary cpu clock
[ 6.793116] kvm-guest: stealtime: cpu 56, msr 8cbd837080
[ 6.799719] #57
[ 1.184458] kvm-clock: cpu 57, msr 47fb801e41, secondary cpu clock
[ 6.801131] kvm-guest: stealtime: cpu 57, msr 8cbd8b7080
[ 6.807710] #58
[ 1.184458] kvm-clock: cpu 58, msr 47fb801e81, secondary cpu clock
[ 6.812422] kvm-guest: stealtime: cpu 58, msr 8cbd937080
[ 6.819721] #59
[ 1.184458] kvm-clock: cpu 59, msr 47fb801ec1, secondary cpu clock
[ 6.821156] kvm-guest: stealtime: cpu 59, msr 8cbd9b7080
[ 6.827712] #60
[ 1.184458] kvm-clock: cpu 60, msr 47fb801f01, secondary cpu clock
[ 6.829152] kvm-guest: stealtime: cpu 60, msr 8cbda37080
[ 6.835719] #61
[ 1.184458] kvm-clock: cpu 61, msr 47fb801f41, secondary cpu clock
[ 6.840303] kvm-guest: stealtime: cpu 61, msr 8cbdab7080
[ 6.847713] #62
[ 1.184458] kvm-clock: cpu 62, msr 47fb801f81, secondary cpu clock
[ 6.849144] kvm-guest: stealtime: cpu 62, msr 8cbdb37080
[ 6.855713] #63
[ 1.184458] kvm-clock: cpu 63, msr 47fb801fc1, secondary cpu clock
[ 6.857110] kvm-guest: stealtime: cpu 63, msr 8cbdbb7080
[ 6.863714] #64
[ 1.184458] kvm-clock: cpu 64, msr 118f68001, secondary cpu clock
[ 6.867975] kvm-guest: stealtime: cpu 64, msr 8cbdc37080
[ 6.875709] #65
[ 1.184458] kvm-clock: cpu 65, msr 118f68041, secondary cpu clock
[ 6.877120] kvm-guest: stealtime: cpu 65, msr 8cbdcb7080
[ 6.883722] #66
[ 1.184458] kvm-clock: cpu 66, msr 118f68081, secondary cpu clock
[ 6.885145] kvm-guest: stealtime: cpu 66, msr 8cbdd37080
[ 6.891721] #67
[ 1.184458] kvm-clock: cpu 67, msr 118f680c1, secondary cpu clock
[ 6.893150] kvm-guest: stealtime: cpu 67, msr 8cbddb7080
[ 6.899709] #68
[ 1.184458] kvm-clock: cpu 68, msr 118f68101, secondary cpu clock
[ 6.904692] kvm-guest: stealtime: cpu 68, msr 8cbde37080
[ 6.911721] #69
[ 1.184458] kvm-clock: cpu 69, msr 118f68141, secondary cpu clock
[ 6.913130] kvm-guest: stealtime: cpu 69, msr 8cbdeb7080
[ 6.919717] #70
[ 1.184458] kvm-clock: cpu 70, msr 118f68181, secondary cpu clock
[ 6.921118] kvm-guest: stealtime: cpu 70, msr 8cbdf37080
[ 6.927714] #71
[ 1.184458] kvm-clock: cpu 71, msr 118f681c1, secondary cpu clock
[ 6.932303] kvm-guest: stealtime: cpu 71, msr 8cbdfb7080
[ 6.941980] .... node #1, CPUs: #72
[ 1.184458] kvm-clock: cpu 72, msr 118f68201, secondary cpu clock
[ 6.944128] kvm-guest: stealtime: cpu 72, msr 11cbd437080
[ 6.951750] #73
[ 1.184458] kvm-clock: cpu 73, msr 118f68241, secondary cpu clock
[ 6.953182] kvm-guest: stealtime: cpu 73, msr 11cbd4b7080
[ 6.959742] #74
[ 1.184458] kvm-clock: cpu 74, msr 118f68281, secondary cpu clock
[ 6.961148] kvm-guest: stealtime: cpu 74, msr 11cbd537080
[ 6.967755] #75
[ 1.184458] kvm-clock: cpu 75, msr 118f682c1, secondary cpu clock
[ 6.972284] kvm-guest: stealtime: cpu 75, msr 11cbd5b7080
[ 6.979797] #76
[ 1.184458] kvm-clock: cpu 76, msr 118f68301, secondary cpu clock
[ 6.981191] kvm-guest: stealtime: cpu 76, msr 11cbd637080
[ 6.987756] #77
[ 1.184458] kvm-clock: cpu 77, msr 118f68341, secondary cpu clock
[ 6.989182] kvm-guest: stealtime: cpu 77, msr 11cbd6b7080
[ 6.995747] #78
[ 1.184458] kvm-clock: cpu 78, msr 118f68381, secondary cpu clock
[ 7.000611] kvm-guest: stealtime: cpu 78, msr 11cbd737080
[ 7.007755] #79
[ 1.184458] kvm-clock: cpu 79, msr 118f683c1, secondary cpu clock
[ 7.009177] kvm-guest: stealtime: cpu 79, msr 11cbd7b7080
[ 7.015743] #80
[ 1.184458] kvm-clock: cpu 80, msr 118f68401, secondary cpu clock
[ 7.017134] kvm-guest: stealtime: cpu 80, msr 11cbd837080
[ 7.023751] #81
[ 1.184458] kvm-clock: cpu 81, msr 118f68441, secondary cpu clock
[ 7.028730] kvm-guest: stealtime: cpu 81, msr 11cbd8b7080
[ 7.035744] #82
[ 1.184458] kvm-clock: cpu 82, msr 118f68481, secondary cpu clock
[ 7.037130] kvm-guest: stealtime: cpu 82, msr 11cbd937080
[ 7.043751] #83
[ 1.184458] kvm-clock: cpu 83, msr 118f684c1, secondary cpu clock
[ 7.045155] kvm-guest: stealtime: cpu 83, msr 11cbd9b7080
[ 7.051754] #84
[ 1.184458] kvm-clock: cpu 84, msr 118f68501, secondary cpu clock
[ 7.056747] kvm-guest: stealtime: cpu 84, msr 11cbda37080
[ 7.063749] #85
[ 1.184458] kvm-clock: cpu 85, msr 118f68541, secondary cpu clock
[ 7.065179] kvm-guest: stealtime: cpu 85, msr 11cbdab7080
[ 7.071738] #86
[ 1.184458] kvm-clock: cpu 86, msr 118f68581, secondary cpu clock
[ 7.073164] kvm-guest: stealtime: cpu 86, msr 11cbdb37080
[ 7.079764] #87
[ 1.184458] kvm-clock: cpu 87, msr 118f685c1, secondary cpu clock
[ 7.084926] kvm-guest: stealtime: cpu 87, msr 11cbdbb7080
[ 7.091739] #88
[ 1.184458] kvm-clock: cpu 88, msr 118f68601, secondary cpu clock
[ 7.093156] kvm-guest: stealtime: cpu 88, msr 11cbdc37080
[ 7.099769] #89
[ 1.184458] kvm-clock: cpu 89, msr 118f68641, secondary cpu clock
[ 7.103626] kvm-guest: stealtime: cpu 89, msr 11cbdcb7080
[ 7.111603] #90
[ 1.184458] kvm-clock: cpu 90, msr 118f68681, secondary cpu clock
[ 7.113032] kvm-guest: stealtime: cpu 90, msr 11cbdd37080
[ 7.119747] #91
[ 1.184458] kvm-clock: cpu 91, msr 118f686c1, secondary cpu clock
[ 7.121182] kvm-guest: stealtime: cpu 91, msr 11cbddb7080
[ 7.127749] #92
[ 1.184458] kvm-clock: cpu 92, msr 118f68701, secondary cpu clock
[ 7.131767] kvm-guest: stealtime: cpu 92, msr 11cbde37080
[ 7.139710] #93
[ 1.184458] kvm-clock: cpu 93, msr 118f68741, secondary cpu clock
[ 7.141114] kvm-guest: stealtime: cpu 93, msr 11cbdeb7080
[ 7.147757] #94
[ 1.184458] kvm-clock: cpu 94, msr 118f68781, secondary cpu clock
[ 7.149198] kvm-guest: stealtime: cpu 94, msr 11cbdf37080
[ 7.155745] #95
[ 1.184458] kvm-clock: cpu 95, msr 118f687c1, secondary cpu clock
[ 7.160118] kvm-guest: stealtime: cpu 95, msr 11cbdfb7080
[ 7.167754] smp: Brought up 2 nodes, 96 CPUs
[ 7.170993] smpboot: Max logical packages: 2
[ 7.171599] smpboot: Total of 96 processors activated (575999.61 BogoMIPS)
[ 7.231219] devtmpfs: initialized
[ 7.231634] x86/mm: Memory block size: 128MB
[ 7.295189] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 7.296044] futex hash table entries: 32768 (order: 9, 2097152 bytes, vmalloc)
[ 7.299979] pinctrl core: initialized pinctrl subsystem
[ 7.303845] PM: RTC time: 07:10:05, date: 2022-07-28
[ 7.307793] NET: Registered protocol family 16
[ 7.312026] DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations
[ 7.316428] DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[ 7.320416] DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[ 7.323605] audit: initializing netlink subsys (disabled)
[ 7.327320] audit: type=2000 audit(1658992205.187:1): state=initialized audit_enabled=0 res=1
[ 7.327320] thermal_sys: Registered thermal governor 'fair_share'
[ 7.327597] thermal_sys: Registered thermal governor 'bang_bang'
[ 7.331563] thermal_sys: Registered thermal governor 'step_wise'
[ 7.331596] thermal_sys: Registered thermal governor 'user_space'
[ 7.335552] thermal_sys: Registered thermal governor 'power_allocator'
[ 7.335600] EISA bus registered
[ 7.342473] cpuidle: using governor ladder
[ 7.343613] cpuidle: using governor menu
[ 7.347792] ACPI: bus type PCI registered
[ 7.350981] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 7.351895] PCI: Using configuration type 1 for base access
[ 7.369483] Kprobes globally optimized
[ 7.371675] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[ 7.375599] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 7.379761] ACPI: Added _OSI(Module Device)
[ 7.383598] ACPI: Added _OSI(Processor Device)
[ 7.386963] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 7.387612] ACPI: Added _OSI(Processor Aggregator Device)
[ 7.391328] ACPI: Added _OSI(Linux-Dell-Video)
[ 7.391596] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 7.395266] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 7.400856] ACPI: 3 ACPI AML tables successfully acquired and loaded
[ 7.410818] ACPI: Interpreter enabled
[ 7.411604] ACPI: (supports S0 S4 S5)
[ 7.414638] ACPI: Using IOAPIC for interrupt routing
[ 7.415605] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 7.420107] ACPI: Enabled 16 GPEs in block 00 to 0F
[ 7.436179] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00])
[ 7.439601] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 7.443605] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 7.447894] acpiphp: Slot [3] registered
[ 7.451040] acpiphp: Slot [4] registered
[ 7.451609] acpiphp: Slot [5] registered
[ 7.454737] acpiphp: Slot [6] registered
[ 7.455609] acpiphp: Slot [7] registered
[ 7.458771] acpiphp: Slot [8] registered
[ 7.459609] acpiphp: Slot [9] registered
[ 7.462785] acpiphp: Slot [10] registered
[ 7.463610] acpiphp: Slot [11] registered
[ 7.466789] acpiphp: Slot [12] registered
[ 7.467609] acpiphp: Slot [13] registered
[ 7.470790] acpiphp: Slot [14] registered
[ 7.471610] acpiphp: Slot [15] registered
[ 7.474831] acpiphp: Slot [16] registered
[ 7.475609] acpiphp: Slot [17] registered
[ 7.478803] acpiphp: Slot [18] registered
[ 7.479609] acpiphp: Slot [19] registered
[ 7.482811] acpiphp: Slot [20] registered
[ 7.483610] acpiphp: Slot [21] registered
[ 7.486807] acpiphp: Slot [22] registered
[ 7.487609] acpiphp: Slot [23] registered
[ 7.490790] acpiphp: Slot [24] registered
[ 7.491609] acpiphp: Slot [25] registered
[ 7.494767] acpiphp: Slot [26] registered
[ 7.495612] acpiphp: Slot [27] registered
[ 7.498798] acpiphp: Slot [28] registered
[ 7.499609] acpiphp: Slot [29] registered
[ 7.502791] acpiphp: Slot [30] registered
[ 7.503609] acpiphp: Slot [31] registered
[ 7.506790] PCI host bridge to bus 0000:00
[ 7.507597] pci_bus 0000:00: Unknown NUMA node; performance will be reduced
[ 7.511597] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 7.515597] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 7.519596] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[ 7.523597] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xc3ffffff window]
[ 7.527627] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[ 7.532163] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[ 7.536596] pci 0000:00:01.3: [8086:7113] type 00 class 0x000000
[ 7.540347] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
[ 7.543613] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
[ 7.547645] pci 0000:00:01.3: PIIX4 devres E PIO at fff0-ffff
[ 7.551498] pci 0000:00:01.3: PIIX4 devres F MMIO at ffc00000-ffffffff
[ 7.551612] pci 0000:00:01.3: PIIX4 devres G PIO at fff0-ffff
[ 7.555419] pci 0000:00:01.3: PIIX4 devres H MMIO at ffc00000-ffffffff
[ 7.555612] pci 0000:00:01.3: PIIX4 devres I PIO at fff0-ffff
[ 7.559441] pci 0000:00:01.3: PIIX4 devres J PIO at fff0-ffff
[ 7.559597] pci 0000:00:01.3: quirk_piix4_acpi+0x0/0x170 took 19531 usecs
[ 7.564310] pci 0000:00:03.0: [1d0f:1111] type 00 class 0x030000
[ 7.568531] pci 0000:00:03.0: reg 0x10: [mem 0xc2000000-0xc23fffff pref]
[ 7.575573] pci 0000:00:03.0: reg 0x30: [mem 0xc0000000-0xc000ffff pref]
[ 7.575674] pci 0000:00:03.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[ 7.580074] pci 0000:00:04.0: [1d0f:8061] type 00 class 0x010802
[ 7.585056] pci 0000:00:04.0: reg 0x10: [mem 0xc0010000-0xc0013fff]
[ 7.600982] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
[ 7.604847] pci 0000:00:1f.0: reg 0x10: [mem 0xc0014000-0xc0017fff]
[ 7.614873] ACPI: PCI Root Bridge [PC01] (domain 0000 [bus 10])
[ 7.615603] acpi PNP0A03:01: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 7.619612] acpi PNP0A03:01: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 7.623847] acpiphp: Slot [32] registered
[ 7.627002] acpiphp: Slot [33] registered
[ 7.627609] acpiphp: Slot [34] registered
[ 7.630810] acpiphp: Slot [35] registered
[ 7.631609] acpiphp: Slot [36] registered
[ 7.634797] acpiphp: Slot [37] registered
[ 7.635609] acpiphp: Slot [38] registered
[ 7.638812] acpiphp: Slot [39] registered
[ 7.639609] acpiphp: Slot [40] registered
[ 7.642802] acpiphp: Slot [41] registered
[ 7.643610] acpiphp: Slot [42] registered
[ 7.646810] acpiphp: Slot [43] registered
[ 7.647608] acpiphp: Slot [44] registered
[ 7.650792] acpiphp: Slot [45] registered
[ 7.651611] acpiphp: Slot [46] registered
[ 7.654822] acpiphp: Slot [47] registered
[ 7.655609] acpiphp: Slot [48] registered
[ 7.658790] acpiphp: Slot [49] registered
[ 7.659608] acpiphp: Slot [50] registered
[ 7.662842] acpiphp: Slot [51] registered
[ 7.663611] acpiphp: Slot [52] registered
[ 7.666774] acpiphp: Slot [53] registered
[ 7.667608] acpiphp: Slot [54] registered
[ 7.670820] acpiphp: Slot [55] registered
[ 7.671609] acpiphp: Slot [56] registered
[ 7.674787] acpiphp: Slot [57] registered
[ 7.675609] acpiphp: Slot [58] registered
[ 7.678798] acpiphp: Slot [59] registered
[ 7.679609] acpiphp: Slot [60] registered
[ 7.682790] acpiphp: Slot [61] registered
[ 7.683609] acpiphp: Slot [62] registered
[ 7.686803] acpiphp: Slot [63] registered
[ 7.687605] PCI host bridge to bus 0000:10
[ 7.690800] pci_bus 0000:10: root bus resource [bus 10]
[ 7.691597] pci_bus 0000:10: root bus resource [mem 0xc4000000-0xc7ffffff window]
[ 7.695597] pci_bus 0000:10: root bus resource [mem 0x39c000000000-0x39f4e80fffff window]
[ 7.699678] pci 0000:10:00.0: [1d0f:ec20] type 00 class 0x020000
[ 7.705879] pci 0000:10:00.0: reg 0x10: [mem 0xc6800000-0xc6803fff]
[ 7.711580] pci 0000:10:00.0: reg 0x18: [mem 0x39d417c00000-0x39d417ffffff 64bit pref]
[ 7.716074] pci 0000:10:00.0: enabling Extended Tags
[ 7.720302] pci 0000:10:01.0: [1d0f:ec20] type 00 class 0x020000
[ 7.725591] pci 0000:10:01.0: reg 0x10: [mem 0xc6804000-0xc6807fff]
[ 7.731397] pci 0000:10:01.0: reg 0x18: [mem 0x39d417800000-0x39d417bfffff 64bit pref]
[ 7.736062] pci 0000:10:01.0: enabling Extended Tags
[ 7.746020] pci 0000:10:1b.0: [1d0f:efa0] type 00 class 0x020000
[ 7.750098] pci 0000:10:1b.0: reg 0x10: [mem 0xc6808000-0xc680bfff]
[ 7.756089] pci 0000:10:1b.0: reg 0x18: [mem 0x39d418000000-0x39d41fffffff 64bit pref]
[ 7.762030] pci 0000:10:1b.0: reg 0x20: [mem 0xc6000000-0xc67fffff]
[ 7.768074] pci 0000:10:1b.0: enabling Extended Tags
[ 7.772370] pci 0000:10:1c.0: [10de:20b0] type 00 class 0x030200
[ 7.789325] pci 0000:10:1c.0: reg 0x10: [mem 0xc4000000-0xc4ffffff]
[ 7.797315] pci 0000:10:1c.0: reg 0x14: [mem 0x39e000000000-0x39efffffffff 64bit pref]
[ 7.805304] pci 0000:10:1c.0: reg 0x1c: [mem 0x39f420000000-0x39f421ffffff 64bit pref]
[ 7.817393] pci 0000:10:1c.0: Enabling HDA controller
[ 7.820050] pci 0000:10:1c.0: PME# supported from D0 D3hot
[ 7.824266] pci 0000:10:1d.0: [10de:20b0] type 00 class 0x030200
[ 7.841300] pci 0000:10:1d.0: reg 0x10: [mem 0xc5000000-0xc5ffffff]
[ 7.849317] pci 0000:10:1d.0: reg 0x14: [mem 0x39c000000000-0x39cfffffffff 64bit pref]
[ 7.857292] pci 0000:10:1d.0: reg 0x1c: [mem 0x39d420000000-0x39d421ffffff 64bit pref]
[ 7.869422] pci 0000:10:1d.0: Enabling HDA controller
[ 7.872054] pci 0000:10:1d.0: PME# supported from D0 D3hot
[ 7.876186] pci 0000:10:1e.0: [1d0f:cd01] type 00 class 0x010802
[ 7.880884] pci 0000:10:1e.0: reg 0x10: [mem 0xc680c000-0xc680ffff]
[ 7.886054] pci 0000:10:1e.0: reg 0x18: [mem 0x39d4177fe000-0x39d4177fffff 64bit pref]
[ 7.891929] pci 0000:10:1f.0: [1d0f:cd01] type 00 class 0x010802
[ 7.896868] pci 0000:10:1f.0: reg 0x10: [mem 0xc6810000-0xc6813fff]
[ 7.902117] pci 0000:10:1f.0: reg 0x18: [mem 0x39d4177fc000-0x39d4177fdfff 64bit pref]
[ 7.907674] pci_bus 0000:10: on NUMA node 0
[ 7.907814] ACPI: PCI Root Bridge [PC02] (domain 0000 [bus 20])
[ 7.911599] acpi PNP0A03:02: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 7.915611] acpi PNP0A03:02: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 7.919844] acpiphp: Slot [64] registered
[ 7.923007] acpiphp: Slot [65] registered
[ 7.923612] acpiphp: Slot [66] registered
[ 7.926772] acpiphp: Slot [67] registered
[ 7.927609] acpiphp: Slot [68] registered
[ 7.930805] acpiphp: Slot [69] registered
[ 7.931609] acpiphp: Slot [70] registered
[ 7.934783] acpiphp: Slot [71] registered
[ 7.935610] acpiphp: Slot [72] registered
[ 7.938797] acpiphp: Slot [73] registered
[ 7.939609] acpiphp: Slot [74] registered
[ 7.942788] acpiphp: Slot [75] registered
[ 7.943611] acpiphp: Slot [76] registered
[ 7.946791] acpiphp: Slot [77] registered
[ 7.947609] acpiphp: Slot [78] registered
[ 7.950778] acpiphp: Slot [79] registered
[ 7.951608] acpiphp: Slot [80] registered
[ 7.954756] acpiphp: Slot [81] registered
[ 7.955608] acpiphp: Slot [82] registered
[ 7.958801] acpiphp: Slot [83] registered
[ 7.959609] acpiphp: Slot [84] registered
[ 7.962860] acpiphp: Slot [85] registered
[ 7.963609] acpiphp: Slot [86] registered
[ 7.966806] acpiphp: Slot [87] registered
[ 7.967609] acpiphp: Slot [88] registered
[ 7.970769] acpiphp: Slot [89] registered
[ 7.971644] acpiphp: Slot [90] registered
[ 7.971922] acpiphp: Slot [91] registered
[ 7.974781] acpiphp: Slot [92] registered
[ 7.975594] acpiphp: Slot [93] registered
[ 7.983610] acpiphp: Slot [94] registered
[ 7.986790] acpiphp: Slot [95] registered
[ 7.987609] PCI host bridge to bus 0000:20
[ 7.990768] pci_bus 0000:20: root bus resource [bus 20]
[ 7.995598] pci_bus 0000:20: root bus resource [mem 0xc8000000-0xcbffffff window]
[ 7.999597] pci_bus 0000:20: root bus resource [mem 0x3ac000000000-0x3af4e80fffff window]
[ 8.007906] pci 0000:20:01.0: [1d0f:ec20] type 00 class 0x020000
[ 8.013576] pci 0000:20:01.0: reg 0x10: [mem 0xca800000-0xca803fff]
[ 8.023286] pci 0000:20:01.0: reg 0x18: [mem 0x3ad417c00000-0x3ad417ffffff 64bit pref]
[ 8.031972] pci 0000:20:01.0: enabling Extended Tags
[ 8.041973] pci 0000:20:1b.0: [1d0f:efa0] type 00 class 0x020000
[ 8.049794] pci 0000:20:1b.0: reg 0x10: [mem 0xca804000-0xca807fff]
[ 8.056017] pci 0000:20:1b.0: reg 0x18: [mem 0x3ad418000000-0x3ad41fffffff 64bit pref]
[ 8.065857] pci 0000:20:1b.0: reg 0x20: [mem 0xca000000-0xca7fffff]
[ 8.076022] pci 0000:20:1b.0: enabling Extended Tags
[ 8.080361] pci 0000:20:1c.0: [10de:20b0] type 00 class 0x030200
[ 8.137275] pci 0000:20:1c.0: reg 0x10: [mem 0xc8000000-0xc8ffffff]
[ 8.157259] pci 0000:20:1c.0: reg 0x14: [mem 0x3ae000000000-0x3aefffffffff 64bit pref]
[ 8.177252] pci 0000:20:1c.0: reg 0x1c: [mem 0x3af420000000-0x3af421ffffff 64bit pref]
[ 8.213360] pci 0000:20:1c.0: Enabling HDA controller
[ 8.221299] pci 0000:20:1c.0: PME# supported from D0 D3hot
[ 8.224249] pci 0000:20:1d.0: [10de:20b0] type 00 class 0x030200
[ 8.281324] pci 0000:20:1d.0: reg 0x10: [mem 0xc9000000-0xc9ffffff]
[ 8.301268] pci 0000:20:1d.0: reg 0x14: [mem 0x3ac000000000-0x3acfffffffff 64bit pref]
[ 8.321267] pci 0000:20:1d.0: reg 0x1c: [mem 0x3ad420000000-0x3ad421ffffff 64bit pref]
[ 8.357357] pci 0000:20:1d.0: Enabling HDA controller
[ 8.364049] pci 0000:20:1d.0: PME# supported from D0 D3hot
[ 8.368185] pci 0000:20:1e.0: [1d0f:cd01] type 00 class 0x010802
[ 8.372941] pci 0000:20:1e.0: reg 0x10: [mem 0xca808000-0xca80bfff]
[ 8.377839] pci 0000:20:1e.0: reg 0x18: [mem 0x3ad417bfe000-0x3ad417bfffff 64bit pref]
[ 8.387777] pci 0000:20:1f.0: [1d0f:cd01] type 00 class 0x010802
[ 8.392873] pci 0000:20:1f.0: reg 0x10: [mem 0xca80c000-0xca80ffff]
[ 8.402123] pci 0000:20:1f.0: reg 0x18: [mem 0x3ad417bfc000-0x3ad417bfdfff 64bit pref]
[ 8.412104] pci_bus 0000:20: on NUMA node 0
[ 8.412246] ACPI: PCI Root Bridge [PC03] (domain 0000 [bus 80])
[ 8.415599] acpi PNP0A03:03: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 8.423611] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 8.427857] acpiphp: Slot [96] registered
[ 8.431038] acpiphp: Slot [97] registered
[ 8.435609] acpiphp: Slot [98] registered
[ 8.438793] acpiphp: Slot [99] registered
[ 8.443610] acpiphp: Slot [100] registered
[ 8.446812] acpiphp: Slot [101] registered
[ 8.447609] acpiphp: Slot [102] registered
[ 8.450832] acpiphp: Slot [103] registered
[ 8.455609] acpiphp: Slot [104] registered
[ 8.458836] acpiphp: Slot [105] registered
[ 8.459610] acpiphp: Slot [106] registered
[ 8.462832] acpiphp: Slot [107] registered
[ 8.467609] acpiphp: Slot [108] registered
[ 8.470840] acpiphp: Slot [109] registered
[ 8.475610] acpiphp: Slot [110] registered
[ 8.478792] acpiphp: Slot [111] registered
[ 8.479609] acpiphp: Slot [112] registered
[ 8.482822] acpiphp: Slot [113] registered
[ 8.487610] acpiphp: Slot [114] registered
[ 8.490812] acpiphp: Slot [115] registered
[ 8.491610] acpiphp: Slot [116] registered
[ 8.494825] acpiphp: Slot [117] registered
[ 8.499609] acpiphp: Slot [118] registered
[ 8.502824] acpiphp: Slot [119] registered
[ 8.507613] acpiphp: Slot [120] registered
[ 8.510864] acpiphp: Slot [121] registered
[ 8.511610] acpiphp: Slot [122] registered
[ 8.514862] acpiphp: Slot [123] registered
[ 8.519609] acpiphp: Slot [124] registered
[ 8.522818] acpiphp: Slot [125] registered
[ 8.527609] acpiphp: Slot [126] registered
[ 8.530830] acpiphp: Slot [127] registered
[ 8.531605] PCI host bridge to bus 0000:80
[ 8.534806] pci_bus 0000:80: root bus resource [bus 80]
[ 8.539597] pci_bus 0000:80: root bus resource [mem 0xd4000000-0xdfffffff window]
[ 8.549710] pci 0000:80:1a.0: [10de:1af1] type 00 class 0x068000
[ 8.591203] pci 0000:80:1a.0: reg 0x10: [mem 0xd4000000-0xd5ffffff]
[ 8.615916] pci 0000:80:1a.0: PME# supported from D0 D3hot
[ 8.620337] pci 0000:80:1b.0: [10de:1af1] type 00 class 0x068000
[ 8.663152] pci 0000:80:1b.0: reg 0x10: [mem 0xd6000000-0xd7ffffff]
[ 8.688528] pci 0000:80:1b.0: PME# supported from D0 D3hot
[ 8.692345] pci 0000:80:1c.0: [10de:1af1] type 00 class 0x068000
[ 8.731234] pci 0000:80:1c.0: reg 0x10: [mem 0xd8000000-0xd9ffffff]
[ 8.756542] pci 0000:80:1c.0: PME# supported from D0 D3hot
[ 8.760343] pci 0000:80:1d.0: [10de:1af1] type 00 class 0x068000
[ 8.807351] pci 0000:80:1d.0: reg 0x10: [mem 0xda000000-0xdbffffff]
[ 8.832535] pci 0000:80:1d.0: PME# supported from D0 D3hot
[ 8.836345] pci 0000:80:1e.0: [10de:1af1] type 00 class 0x068000
[ 8.859278] pci 0000:80:1e.0: reg 0x10: [mem 0xdc000000-0xddffffff]
[ 8.879997] pci 0000:80:1e.0: PME# supported from D0 D3hot
[ 8.884345] pci 0000:80:1f.0: [10de:1af1] type 00 class 0x068000
[ 8.927242] pci 0000:80:1f.0: reg 0x10: [mem 0xde000000-0xdfffffff]
[ 8.952546] pci 0000:80:1f.0: PME# supported from D0 D3hot
[ 8.956198] pci_bus 0000:80: on NUMA node 1
[ 8.956329] ACPI: PCI Root Bridge [PC04] (domain 0000 [bus 90])
[ 8.959599] acpi PNP0A03:04: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 8.967611] acpi PNP0A03:04: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 8.975866] acpiphp: Slot [128] registered
[ 8.979093] acpiphp: Slot [129] registered
[ 8.979612] acpiphp: Slot [130] registered
[ 8.982794] acpiphp: Slot [131] registered
[ 8.987609] acpiphp: Slot [132] registered
[ 8.990825] acpiphp: Slot [133] registered
[ 8.995611] acpiphp: Slot [134] registered
[ 8.998801] acpiphp: Slot [135] registered
[ 8.999611] acpiphp: Slot [136] registered
[ 9.002809] acpiphp: Slot [137] registered
[ 9.007609] acpiphp: Slot [138] registered
[ 9.010916] acpiphp: Slot [139] registered
[ 9.015610] acpiphp: Slot [140] registered
[ 9.018937] acpiphp: Slot [141] registered
[ 9.019611] acpiphp: Slot [142] registered
[ 9.022839] acpiphp: Slot [143] registered
[ 9.027609] acpiphp: Slot [144] registered
[ 9.030866] acpiphp: Slot [145] registered
[ 9.031609] acpiphp: Slot [146] registered
[ 9.034839] acpiphp: Slot [147] registered
[ 9.039609] acpiphp: Slot [148] registered
[ 9.042817] acpiphp: Slot [149] registered
[ 9.047611] acpiphp: Slot [150] registered
[ 9.050859] acpiphp: Slot [151] registered
[ 9.051609] acpiphp: Slot [152] registered
[ 9.054824] acpiphp: Slot [153] registered
[ 9.059610] acpiphp: Slot [154] registered
[ 9.062839] acpiphp: Slot [155] registered
[ 9.063609] acpiphp: Slot [156] registered
[ 9.066888] acpiphp: Slot [157] registered
[ 9.071616] acpiphp: Slot [158] registered
[ 9.074825] acpiphp: Slot [159] registered
[ 9.079606] PCI host bridge to bus 0000:90
[ 9.082813] pci_bus 0000:90: root bus resource [bus 90]
[ 9.083597] pci_bus 0000:90: root bus resource [mem 0xcc000000-0xcfffffff window]
[ 9.091597] pci_bus 0000:90: root bus resource [mem 0x3ec000000000-0x3ef4e7ffffff window]
[ 9.095910] pci 0000:90:01.0: [1d0f:ec20] type 00 class 0x020000
[ 9.105593] pci 0000:90:01.0: reg 0x10: [mem 0xce800000-0xce803fff]
[ 9.111119] pci 0000:90:01.0: reg 0x18: [mem 0x3ed417c00000-0x3ed417ffffff 64bit pref]
[ 9.124001] pci 0000:90:01.0: enabling Extended Tags
[ 9.133964] pci 0000:90:1b.0: [1d0f:efa0] type 00 class 0x020000
[ 9.138111] pci 0000:90:1b.0: reg 0x10: [mem 0xce804000-0xce807fff]
[ 9.148040] pci 0000:90:1b.0: reg 0x18: [mem 0x3ed418000000-0x3ed41fffffff 64bit pref]
[ 9.154072] pci 0000:90:1b.0: reg 0x20: [mem 0xce000000-0xce7fffff]
[ 9.164046] pci 0000:90:1b.0: enabling Extended Tags
[ 9.168359] pci 0000:90:1c.0: [10de:20b0] type 00 class 0x030200
[ 9.225362] pci 0000:90:1c.0: reg 0x10: [mem 0xcc000000-0xccffffff]
[ 9.245352] pci 0000:90:1c.0: reg 0x14: [mem 0x3ee000000000-0x3eefffffffff 64bit pref]
[ 9.269348] pci 0000:90:1c.0: reg 0x1c: [mem 0x3ef420000000-0x3ef421ffffff 64bit pref]
[ 9.397440] pci 0000:90:1c.0: Enabling HDA controller
[ 9.404052] pci 0000:90:1c.0: PME# supported from D0 D3hot
[ 9.408257] pci 0000:90:1d.0: [10de:20b0] type 00 class 0x030200
[ 9.467598] pci 0000:90:1d.0: reg 0x10: [mem 0xcd000000-0xcdffffff]
[ 9.487600] pci 0000:90:1d.0: reg 0x14: [mem 0x3ec000000000-0x3ecfffffffff 64bit pref]
[ 9.509278] pci 0000:90:1d.0: reg 0x1c: [mem 0x3ed420000000-0x3ed421ffffff 64bit pref]
[ 9.545374] pci 0000:90:1d.0: Enabling HDA controller
[ 9.548056] pci 0000:90:1d.0: PME# supported from D0 D3hot
[ 9.552189] pci 0000:90:1e.0: [1d0f:cd01] type 00 class 0x010802
[ 9.556759] pci 0000:90:1e.0: reg 0x10: [mem 0xce808000-0xce80bfff]
[ 9.561651] pci 0000:90:1e.0: reg 0x18: [mem 0x3ed417bfe000-0x3ed417bfffff 64bit pref]
[ 9.567396] pci 0000:90:1f.0: [1d0f:cd01] type 00 class 0x010802
[ 9.568646] pci 0000:90:1f.0: reg 0x10: [mem 0xce80c000-0xce80ffff]
[ 9.573720] pci 0000:90:1f.0: reg 0x18: [mem 0x3ed417bfc000-0x3ed417bfdfff 64bit pref]
[ 9.579194] pci_bus 0000:90: on NUMA node 1
[ 9.579336] ACPI: PCI Root Bridge [PC05] (domain 0000 [bus a0])
[ 9.579599] acpi PNP0A03:05: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 9.583611] acpi PNP0A03:05: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 9.587867] acpiphp: Slot [160] registered
[ 9.591068] acpiphp: Slot [161] registered
[ 9.591610] acpiphp: Slot [162] registered
[ 9.594800] acpiphp: Slot [163] registered
[ 9.595611] acpiphp: Slot [164] registered
[ 9.598834] acpiphp: Slot [165] registered
[ 9.599609] acpiphp: Slot [166] registered
[ 9.602829] acpiphp: Slot [167] registered
[ 9.603608] acpiphp: Slot [168] registered
[ 9.606815] acpiphp: Slot [169] registered
[ 9.607610] acpiphp: Slot [170] registered
[ 9.610815] acpiphp: Slot [171] registered
[ 9.611609] acpiphp: Slot [172] registered
[ 9.614823] acpiphp: Slot [173] registered
[ 9.615608] acpiphp: Slot [174] registered
[ 9.618876] acpiphp: Slot [175] registered
[ 9.619610] acpiphp: Slot [176] registered
[ 9.622853] acpiphp: Slot [177] registered
[ 9.623608] acpiphp: Slot [178] registered
[ 9.626901] acpiphp: Slot [179] registered
[ 9.627610] acpiphp: Slot [180] registered
[ 9.630867] acpiphp: Slot [181] registered
[ 9.631613] acpiphp: Slot [182] registered
[ 9.634830] acpiphp: Slot [183] registered
[ 9.635608] acpiphp: Slot [184] registered
[ 9.638835] acpiphp: Slot [185] registered
[ 9.639609] acpiphp: Slot [186] registered
[ 9.642832] acpiphp: Slot [187] registered
[ 9.643609] acpiphp: Slot [188] registered
[ 9.646924] acpiphp: Slot [189] registered
[ 9.647609] acpiphp: Slot [190] registered
[ 9.650796] acpiphp: Slot [191] registered
[ 9.651605] PCI host bridge to bus 0000:a0
[ 9.654769] pci_bus 0000:a0: root bus resource [bus a0]
[ 9.655597] pci_bus 0000:a0: root bus resource [mem 0xd0000000-0xd3ffffff window]
[ 9.659597] pci_bus 0000:a0: root bus resource [mem 0x3fc000000000-0x3ff4e7ffffff window]
[ 9.663905] pci 0000:a0:01.0: [1d0f:ec20] type 00 class 0x020000
[ 9.669697] pci 0000:a0:01.0: reg 0x10: [mem 0xd2800000-0xd2803fff]
[ 9.675161] pci 0000:a0:01.0: reg 0x18: [mem 0x3fd417c00000-0x3fd417ffffff 64bit pref]
[ 9.679998] pci 0000:a0:01.0: enabling Extended Tags
[ 9.689930] pci 0000:a0:1b.0: [1d0f:efa0] type 00 class 0x020000
[ 9.694137] pci 0000:a0:1b.0: reg 0x10: [mem 0xd2804000-0xd2807fff]
[ 9.700031] pci 0000:a0:1b.0: reg 0x18: [mem 0x3fd418000000-0x3fd41fffffff 64bit pref]
[ 9.706015] pci 0000:a0:1b.0: reg 0x20: [mem 0xd2000000-0xd27fffff]
[ 9.712057] pci 0000:a0:1b.0: enabling Extended Tags
[ 9.716331] pci 0000:a0:1c.0: [10de:20b0] type 00 class 0x030200
[ 9.729373] pci 0000:a0:1c.0: reg 0x10: [mem 0xd0000000-0xd0ffffff]
[ 9.737368] pci 0000:a0:1c.0: reg 0x14: [mem 0x3fe000000000-0x3fefffffffff 64bit pref]
[ 9.745372] pci 0000:a0:1c.0: reg 0x1c: [mem 0x3ff420000000-0x3ff421ffffff 64bit pref]
[ 9.757491] pci 0000:a0:1c.0: Enabling HDA controller
[ 9.760053] pci 0000:a0:1c.0: PME# supported from D0 D3hot
[ 9.764254] pci 0000:a0:1d.0: [10de:20b0] type 00 class 0x030200
[ 9.781298] pci 0000:a0:1d.0: reg 0x10: [mem 0xd1000000-0xd1ffffff]
[ 9.789299] pci 0000:a0:1d.0: reg 0x14: [mem 0x3fc000000000-0x3fcfffffffff 64bit pref]
[ 9.797310] pci 0000:a0:1d.0: reg 0x1c: [mem 0x3fd420000000-0x3fd421ffffff 64bit pref]
[ 9.809400] pci 0000:a0:1d.0: Enabling HDA controller
[ 9.813224] pci 0000:a0:1d.0: PME# supported from D0 D3hot
[ 9.816184] pci 0000:a0:1e.0: [1d0f:cd01] type 00 class 0x010802
[ 9.820746] pci 0000:a0:1e.0: reg 0x10: [mem 0xd2808000-0xd280bfff]
[ 9.825540] pci 0000:a0:1e.0: reg 0x18: [mem 0x3fd417bfe000-0x3fd417bfffff 64bit pref]
[ 9.831302] pci 0000:a0:1f.0: [1d0f:cd01] type 00 class 0x010802
[ 9.832549] pci 0000:a0:1f.0: reg 0x10: [mem 0xd280c000-0xd280ffff]
[ 9.837573] pci 0000:a0:1f.0: reg 0x18: [mem 0x3fd417bfc000-0x3fd417bfdfff 64bit pref]
[ 9.843274] pci_bus 0000:a0: on NUMA node 1
[ 9.843472] ACPI: PCI: Interrupt link LNKA configured for IRQ 10
[ 9.847674] ACPI: PCI: Interrupt link LNKB configured for IRQ 10
[ 9.851734] ACPI: PCI: Interrupt link LNKC configured for IRQ 11
[ 9.859667] ACPI: PCI: Interrupt link LNKD configured for IRQ 11
[ 9.863753] ACPI: PCI: Interrupt link LNKS configured for IRQ 9
[ 9.876443] iommu: Default domain type: Translated
[ 9.879831] SCSI subsystem initialized
[ 9.882939] libata version 3.00 loaded.
[ 9.882939] pci 0000:00:03.0: vgaarb: setting as boot VGA device
[ 9.883551] pci 0000:00:03.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[ 9.883602] pci 0000:00:03.0: vgaarb: bridge control possible
[ 9.887442] vgaarb: loaded
[ 9.887611] ACPI: bus type USB registered
[ 9.890803] usbcore: registered new interface driver usbfs
[ 9.891601] usbcore: registered new interface driver hub
[ 9.895310] usbcore: registered new device driver usb
[ 9.895611] pps_core: LinuxPPS API ver. 1 registered
[ 9.899154] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 9.899598] PTP clock support registered
[ 9.902770] EDAC MC: Ver: 3.0.0
[ 9.905738] NetLabel: Initializing
[ 9.907597] NetLabel: domain hash size = 128
[ 9.910877] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 9.911607] NetLabel: unlabeled traffic allowed by default
[ 9.915360] PCI: Using ACPI for IRQ routing
[ 9.915597] PCI: pci_cache_line_size set to 64 bytes
[ 9.916073] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
[ 9.916075] e820: reserve RAM buffer [mem 0x7ffe2000-0x7fffffff]
[ 9.916423] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
[ 9.919597] hpet0: 8 comparators, 32-bit 62.500000 MHz counter
[ 9.926869] clocksource: Switched to clocksource kvm-clock
[ 9.938840] VFS: Disk quotas dquot_6.6.0
[ 9.942145] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 9.946610] AppArmor: AppArmor Filesystem Enabled
[ 9.950056] pnp: PnP ACPI init
[ 9.952890] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[ 9.952914] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[ 9.952931] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[ 9.952975] pnp 00:03: Plug and Play ACPI device, IDs PNP0400 (active)
[ 9.953017] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (active)
[ 9.953371] pnp: PnP ACPI: found 5 devices
[ 9.962407] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 9.969094] NET: Registered protocol family 2
[ 9.972856] IP idents hash table entries: 262144 (order: 9, 2097152 bytes, vmalloc)
[ 9.981946] tcp_listen_portaddr_hash hash table entries: 65536 (order: 8, 1048576 bytes, vmalloc)
[ 9.989500] TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc)
[ 9.996376] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, vmalloc)
[ 10.002390] TCP: Hash tables configured (established 524288 bind 65536)
[ 10.007100] MPTCP token hash table entries: 65536 (order: 8, 1572864 bytes, vmalloc)
[ 10.013809] UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)
[ 10.018762] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)
[ 10.025131] NET: Registered protocol family 1
[ 10.028418] NET: Registered protocol family 44
[ 10.031765] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
[ 10.035757] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
[ 10.039724] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[ 10.043963] pci_bus 0000:00: resource 7 [mem 0xc0000000-0xc3ffffff window]
[ 10.048227] pci_bus 0000:10: resource 4 [mem 0xc4000000-0xc7ffffff window]
[ 10.052469] pci_bus 0000:10: resource 5 [mem 0x39c000000000-0x39f4e80fffff window]
[ 10.058538] pci_bus 0000:20: resource 4 [mem 0xc8000000-0xcbffffff window]
[ 10.062764] pci_bus 0000:20: resource 5 [mem 0x3ac000000000-0x3af4e80fffff window]
[ 10.068841] pci_bus 0000:80: resource 4 [mem 0xd4000000-0xdfffffff window]
[ 10.073099] pci_bus 0000:90: resource 4 [mem 0xcc000000-0xcfffffff window]
[ 10.077336] pci_bus 0000:90: resource 5 [mem 0x3ec000000000-0x3ef4e7ffffff window]
[ 10.083370] pci_bus 0000:a0: resource 4 [mem 0xd0000000-0xd3ffffff window]
[ 10.087613] pci_bus 0000:a0: resource 5 [mem 0x3fc000000000-0x3ff4e7ffffff window]
[ 10.093661] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 10.097567] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 10.103150] PCI: CLS 32 bytes, default 64
[ 10.106327] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 10.106377] Trying to unpack rootfs image as initramfs...
[ 10.110420] software IO TLB: mapped [mem 0x000000007bfe2000-0x000000007ffe2000] (64MB)
[ 10.110487] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2b3e43c8763, max_idle_ns: 440795360101 ns
[ 10.129067] clocksource: Switched to clocksource tsc
[ 10.133693] Initialise system trusted keyrings
[ 10.137062] Key type blacklist registered
[ 10.140299] workingset: timestamp_bits=36 max_order=29 bucket_order=0
[ 10.145311] zbud: loaded
[ 10.241271] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 10.245443] fuse: init (API version 7.34)
[ 10.249091] integrity: Platform Keyring initialized
[ 10.258861] Freeing initrd memory: 97704K
[ 10.258883] Key type asymmetric registered
[ 10.265258] Asymmetric key parser 'x509' registered
[ 10.268745] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
[ 10.274832] io scheduler mq-deadline registered
[ 10.279661] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 10.288104] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 10.293978] ACPI: button: Power Button [PWRF]
[ 10.297232] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
[ 10.302935] ACPI: button: Sleep Button [SLPF]
[ 10.308443] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 10.338328] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 10.344745] Linux agpgart interface v0.103
[ 10.495684] loop: module loaded
[ 10.499384] nvme nvme0: pci function 0000:00:04.0
[ 10.502871] nvme nvme1: pci function 0000:00:1f.0
[ 10.506426] nvme nvme2: pci function 0000:10:1e.0
[ 10.510016] nvme nvme3: pci function 0000:10:1f.0
[ 10.513058] nvme nvme0: 2/0/0 default/read/poll queues
[ 10.513598] nvme nvme4: pci function 0000:20:1e.0
[ 10.514017] nvme nvme1: 2/0/0 default/read/poll queues
[ 10.515567] nvme1n1: p1
[ 10.522833] nvme0n1: p1 p128
[ 10.524378] nvme nvme2: 31/0/0 default/read/poll queues
[ 10.524500] nvme nvme5: pci function 0000:20:1f.0
[ 10.527448] nvme nvme3: 31/0/0 default/read/poll queues
[ 10.530241] nvme nvme6: pci function 0000:90:1e.0
[ 10.540680] nvme nvme4: 31/0/0 default/read/poll queues
[ 10.544012] nvme nvme7: pci function 0000:90:1f.0
[ 10.549469] nvme nvme5: 31/0/0 default/read/poll queues
[ 10.551655] nvme nvme8: pci function 0000:a0:1e.0
[ 10.558691] nvme nvme9: pci function 0000:a0:1f.0
[ 10.562593] tun: Universal TUN/TAP device driver, 1.6
[ 10.566629] PPP generic driver version 2.4.2
[ 10.570060] VFIO - User Level meta-driver version: 0.3
[ 10.573946] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 10.574648] nvme nvme6: 31/0/0 default/read/poll queues
[ 10.578358] ehci-pci: EHCI PCI platform driver
[ 10.579555] nvme nvme7: 31/0/0 default/read/poll queues
[ 10.580641] nvme nvme8: 31/0/0 default/read/poll queues
[ 10.584822] nvme nvme9: 31/0/0 default/read/poll queues
[ 10.585570] ehci-platform: EHCI generic platform driver
[ 10.599759] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 10.603676] ohci-pci: OHCI PCI platform driver
[ 10.606945] ohci-platform: OHCI generic platform driver
[ 10.610491] uhci_hcd: USB Universal Host Controller Interface driver
[ 10.614415] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 10.620620] i8042: Warning: Keylock active
[ 10.625013] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 10.628511] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 10.632017] mousedev: PS/2 mouse device common for all mice
[ 10.635848] rtc_cmos 00:00: RTC can wake from S4
[ 10.640113] rtc_cmos 00:00: registered as rtc0
[ 10.643583] rtc_cmos 00:00: setting system clock to 2022-07-28T07:10:09 UTC (1658992209)
[ 10.649716] rtc_cmos 00:00: alarms up to one day, 114 bytes nvram
[ 10.653517] i2c /dev entries driver
[ 10.656412] device-mapper: uevent: version 1.0.3
[ 10.659739] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
[ 10.665940] platform eisa.0: Probing EISA bus 0
[ 10.669197] platform eisa.0: EISA: Cannot allocate resource for mainboard
[ 10.673259] platform eisa.0: Cannot allocate resource for EISA slot 1
[ 10.677166] platform eisa.0: Cannot allocate resource for EISA slot 2
[ 10.681058] platform eisa.0: Cannot allocate resource for EISA slot 3
[ 10.684963] platform eisa.0: Cannot allocate resource for EISA slot 4
[ 10.688876] platform eisa.0: Cannot allocate resource for EISA slot 5
[ 10.692817] platform eisa.0: Cannot allocate resource for EISA slot 6
[ 10.696713] platform eisa.0: Cannot allocate resource for EISA slot 7
[ 10.700623] platform eisa.0: Cannot allocate resource for EISA slot 8
[ 10.704507] platform eisa.0: EISA: Detected 0 cards
[ 10.707836] intel_pstate: P-states controlled by the platform
[ 10.716839] ledtrig-cpu: registered to indicate activity on CPUs
[ 10.720867] drop_monitor: Initializing network drop monitor service
[ 10.724829] NET: Registered protocol family 10
[ 10.733828] Segment Routing with IPv6
[ 10.736905] NET: Registered protocol family 17
[ 10.740243] Key type dns_resolver registered
[ 10.756768] No MBM correction factor available
[ 10.760111] IPI shorthand broadcast: enabled
[ 10.763266] sched_clock: Marking stable (9579648027, 1180458248)->(11937101814, -1176995539)
[ 10.772450] registered taskstats version 1
[ 10.775586] Loading compiled-in X.509 certificates
[ 10.779634] Loaded X.509 cert 'Build time autogenerated kernel key: 1c87debd80b0db7d2d960450056c96567636ad46'
[ 10.786719] Loaded X.509 cert 'Canonical Ltd. Live Patch Signing: 14df34d1a87cf37625abec039ef2bf521249b969'
[ 10.793768] Loaded X.509 cert 'Canonical Ltd. Kernel Module Signing: 88f752e560a1e0737e31163a466ad7b70a850c19'
[ 10.800401] blacklist: Loading compiled-in revocation X.509 certificates
[ 10.804501] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing: 61482aa2830d0ab2ad5af10b7250da9033ddcef0'
[ 10.819215] zswap: loaded using pool lzo/zbud
[ 10.823050] Key type ._fscrypt registered
[ 10.826138] Key type .fscrypt registered
[ 10.829151] Key type fscrypt-provisioning registered
[ 10.836462] Key type encrypted registered
[ 10.839627] AppArmor: AppArmor sha1 policy hashing enabled
[ 10.843239] ima: No TPM chip found, activating TPM-bypass!
[ 10.846890] Loading compiled-in module X.509 certificates
[ 10.850821] Loaded X.509 cert 'Build time autogenerated kernel key: 1c87debd80b0db7d2d960450056c96567636ad46'
[ 10.857591] ima: Allocated hash algorithm: sha1
[ 10.860848] ima: No architecture policies found
[ 10.864083] evm: Initialising EVM extended attributes:
[ 10.867572] evm: security.selinux
[ 10.870411] evm: security.SMACK64
[ 10.873222] evm: security.SMACK64EXEC
[ 10.876090] evm: security.SMACK64TRANSMUTE
[ 10.879199] evm: security.SMACK64MMAP
[ 10.882150] evm: security.apparmor
[ 10.885011] evm: security.ima
[ 10.887665] evm: security.capability
[ 10.890563] evm: HMAC attrs: 0x1
[ 10.893697] PM: Magic number: 14:467:168
[ 10.897090] acpi device:31: hash matches
[ 10.900174] memory memory8796: hash matches
[ 10.903414] memory memory8115: hash matches
[ 10.906723] memory memory7024: hash matches
[ 10.909859] memory memory6869: hash matches
[ 10.913213] memory memory5583: hash matches
[ 10.916471] memory memory4747: hash matches
[ 10.919749] memory memory3461: hash matches
[ 10.923054] memory memory2370: hash matches
[ 10.926332] memory memory1534: hash matches
[ 10.929486] memory memory1088: hash matches
[ 10.932613] memory memory747: hash matches
[ 10.951521] RAS: Correctable Errors collector initialized.
[ 10.957517] Freeing unused decrypted memory: 2036K
[ 10.961744] Freeing unused kernel image (initmem) memory: 2896K
[ 10.999638] Write protecting the kernel read-only data: 30720k
[ 11.004401] Freeing unused kernel image (text/rodata gap) memory: 2036K
[ 11.009183] Freeing unused kernel image (rodata/data gap) memory: 1756K
[ 11.085472] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[ 11.089370] x86/mm: Checking user space page tables
[ 11.154485] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[ 11.158361] Run /init as init process
[ 11.161284] with arguments:
[ 11.161285] /init
[ 11.161286] with environment:
[ 11.161287] HOME=/
[ 11.161288] TERM=linux
[ 11.161288] BOOT_IMAGE=/boot/vmlinuz-5.13.0-1023-aws
[ 11.167986] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[ 11.703789] cryptd: max_cpu_qlen set to 1000
[ 11.704428] ena 0000:10:00.0: ENA device version: 0.10
[ 11.711004] ena 0000:10:00.0: ENA controller version: 0.0.1 implementation version 1
[ 11.728938] AVX2 version of gcm_enc/dec engaged.
[ 11.731808] ena 0000:10:00.0: Elastic Network Adapter (ENA) found at mem c6800000, mac addr 02:60:79:9a:33:fb
[ 11.740392] AES CTR mode by8 optimization enabled
[ 11.740479] ena 0000:10:01.0: ENA device version: 0.10
[ 11.747563] ena 0000:10:01.0: ENA controller version: 0.0.1 implementation version 1
[ 11.756358] md127: detected capacity change from 0 to 14685380608
[ 11.772286] ena 0000:10:01.0: Elastic Network Adapter (ENA) found at mem c6804000, mac addr 02:ea:29:2f:53:57
[ 11.780381] ena 0000:20:01.0: ENA device version: 0.10
[ 11.784156] ena 0000:20:01.0: ENA controller version: 0.0.1 implementation version 1
[ 11.799250] ena 0000:20:01.0: Elastic Network Adapter (ENA) found at mem ca800000, mac addr 02:d0:ac:04:56:c1
[ 11.806992] ena 0000:90:01.0: ENA device version: 0.10
[ 11.810686] ena 0000:90:01.0: ENA controller version: 0.0.1 implementation version 1
[ 11.814895] nvidia: loading out-of-tree module taints kernel.
[ 11.814895] nvidia: loading out-of-tree module taints kernel.
[ 11.814895] nvidia: loading out-of-tree module taints kernel.
[ 11.814898] nvidia: loading out-of-tree module taints kernel.
[ 11.814907] nvidia: module license 'NVIDIA' taints kernel.
[ 11.814907] nvidia: module license 'NVIDIA' taints kernel.
[ 11.814908] Disabling lock debugging due to kernel taint
[ 11.832937] ena 0000:90:01.0: Elastic Network Adapter (ENA) found at mem ce800000, mac addr 02:3b:22:6c:86:a3
[ 11.836646] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 11.840531] ena 0000:a0:01.0: ENA device version: 0.10
[ 11.953774] ena 0000:a0:01.0: ENA controller version: 0.0.1 implementation version 1
[ 11.964903] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 11.968838] ena 0000:a0:01.0: Elastic Network Adapter (ENA) found at mem d2800000, mac addr 02:7d:ff:c7:46:5d
[ 11.977967] nvidia-nvswitch: Probing device 0000:80:1a.0, Vendor Id = 0x10de, Device Id = 0x1af1, Class = 0x68000
[ 11.987381] nvidia-nvswitch 0000:80:1a.0: can't derive routing for PCI INT A
[ 11.988891] ena 0000:10:01.0 ens33: renamed from eth1
[ 11.991671] nvidia-nvswitch 0000:80:1a.0: PCI INT A: no GSI - using ISA IRQ 10
[ 12.312689] nvidia-nvswitch0: using MSI
[ 12.512211] nvidia-nvswitch: Probing device 0000:80:1b.0, Vendor Id = 0x10de, Device Id = 0x1af1, Class = 0x68000
[ 12.519211] nvidia-nvswitch 0000:80:1b.0: can't derive routing for PCI INT A
[ 12.523442] nvidia-nvswitch 0000:80:1b.0: PCI INT A: no GSI - using ISA IRQ 11
[ 12.523894] ena 0000:10:00.0 ens32: renamed from eth0
[ 12.585674] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input4
[ 12.803749] ena 0000:20:01.0 ens65: renamed from eth2
[ 12.856924] nvidia-nvswitch1: using MSI
[ 13.057462] nvidia-nvswitch: Probing device 0000:80:1c.0, Vendor Id = 0x10de, Device Id = 0x1af1, Class = 0x68000
[ 13.064467] nvidia-nvswitch 0000:80:1c.0: can't derive routing for PCI INT A
[ 13.068705] nvidia-nvswitch 0000:80:1c.0: PCI INT A: no GSI - using ISA IRQ 11
[ 13.107884] ena 0000:90:01.0 ens129: renamed from eth3
[ 13.411286] nvidia-nvswitch2: using MSI
[ 13.653383] nvidia-nvswitch: Probing device 0000:80:1d.0, Vendor Id = 0x10de, Device Id = 0x1af1, Class = 0x68000
[ 13.660431] nvidia-nvswitch 0000:80:1d.0: can't derive routing for PCI INT A
[ 13.664682] nvidia-nvswitch 0000:80:1d.0: PCI INT A: no GSI - using ISA IRQ 10
[ 13.676367] ena 0000:a0:01.0 ens161: renamed from eth4
[ 14.006394] nvidia-nvswitch3: using MSI
[ 14.207517] nvidia-nvswitch: Probing device 0000:80:1e.0, Vendor Id = 0x10de, Device Id = 0x1af1, Class = 0x68000
[ 14.214864] nvidia-nvswitch 0000:80:1e.0: can't derive routing for PCI INT A
[ 14.219108] nvidia-nvswitch 0000:80:1e.0: PCI INT A: no GSI - using ISA IRQ 10
[ 14.558065] nvidia-nvswitch4: using MSI
[ 14.760095] nvidia-nvswitch: Probing device 0000:80:1f.0, Vendor Id = 0x10de, Device Id = 0x1af1, Class = 0x68000
[ 14.767383] nvidia-nvswitch 0000:80:1f.0: can't derive routing for PCI INT A
[ 14.771684] nvidia-nvswitch 0000:80:1f.0: PCI INT A: no GSI - using ISA IRQ 11
[ 15.111720] nvidia-nvswitch5: using MSI
[ 15.314290] nvidia 0000:10:1c.0: can't derive routing for PCI INT A
[ 15.318243] nvidia 0000:10:1c.0: PCI INT A: no GSI - using ISA IRQ 11
[ 15.372373] nvidia 0000:10:1d.0: can't derive routing for PCI INT A
[ 15.376381] nvidia 0000:10:1d.0: PCI INT A: no GSI - using ISA IRQ 10
[ 15.426467] nvidia 0000:20:1c.0: can't derive routing for PCI INT A
[ 15.430409] nvidia 0000:20:1c.0: PCI INT A: no GSI - using ISA IRQ 11
[ 15.481000] nvidia 0000:20:1d.0: can't derive routing for PCI INT A
[ 15.485009] nvidia 0000:20:1d.0: PCI INT A: no GSI - using ISA IRQ 10
[ 15.534587] nvidia 0000:90:1c.0: can't derive routing for PCI INT A
[ 15.538926] nvidia 0000:90:1c.0: PCI INT A: no GSI - using ISA IRQ 11
[ 15.588587] nvidia 0000:90:1d.0: can't derive routing for PCI INT A
[ 15.592611] nvidia 0000:90:1d.0: PCI INT A: no GSI - using ISA IRQ 10
[ 15.646342] nvidia 0000:a0:1c.0: can't derive routing for PCI INT A
[ 15.650261] nvidia 0000:a0:1c.0: PCI INT A: no GSI - using ISA IRQ 11
[ 15.700342] nvidia 0000:a0:1d.0: can't derive routing for PCI INT A
[ 15.704368] nvidia 0000:a0:1d.0: PCI INT A: no GSI - using ISA IRQ 10
[ 15.751639] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 510.73.08 Wed May 18 20:34:14 UTC 2022
[ 15.762982] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 510.73.08 Wed May 18 20:27:26 UTC 2022
[ 15.771992] [drm] [nvidia-drm] [GPU ID 0x0000101c] Loading driver
[ 15.775982] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:10:1c.0 on minor 0
[ 15.782159] [drm] [nvidia-drm] [GPU ID 0x0000101d] Loading driver
[ 15.786077] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:10:1d.0 on minor 1
[ 15.792033] [drm] [nvidia-drm] [GPU ID 0x0000201c] Loading driver
[ 15.795933] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:20:1c.0 on minor 2
[ 15.802281] [drm] [nvidia-drm] [GPU ID 0x0000201d] Loading driver
[ 15.806142] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:20:1d.0 on minor 3
[ 15.812058] [drm] [nvidia-drm] [GPU ID 0x0000901c] Loading driver
[ 15.815967] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:90:1c.0 on minor 4
[ 15.822050] [drm] [nvidia-drm] [GPU ID 0x0000901d] Loading driver
[ 15.825959] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:90:1d.0 on minor 5
[ 15.832019] [drm] [nvidia-drm] [GPU ID 0x0000a01c] Loading driver
[ 15.835927] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:a0:1c.0 on minor 6
[ 15.842002] [drm] [nvidia-drm] [GPU ID 0x0000a01d] Loading driver
[ 15.845768] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:a0:1d.0 on minor 7
[ 17.375598] raid6: avx512x4 gen() 286620 MB/s
[ 17.423598] raid6: avx512x4 xor() 132062 MB/s
[ 17.471598] raid6: avx512x2 gen() 286885 MB/s
[ 17.519598] raid6: avx512x2 xor() 494582 MB/s
[ 17.567600] raid6: avx512x1 gen() 287266 MB/s
[ 17.615598] raid6: avx512x1 xor() 448854 MB/s
[ 17.663599] raid6: avx2x4 gen() 286438 MB/s
[ 17.711600] raid6: avx2x4 xor() 126087 MB/s
[ 17.759599] raid6: avx2x2 gen() 286869 MB/s
[ 17.807597] raid6: avx2x2 xor() 363353 MB/s
[ 17.855598] raid6: avx2x1 gen() 218711 MB/s
[ 17.903598] raid6: avx2x1 xor() 308254 MB/s
[ 17.951600] raid6: sse2x4 gen() 195529 MB/s
[ 17.999598] raid6: sse2x4 xor() 122240 MB/s
[ 18.047599] raid6: sse2x2 gen() 211031 MB/s
[ 18.095598] raid6: sse2x2 xor() 128693 MB/s
[ 18.143600] raid6: sse2x1 gen() 194295 MB/s
[ 18.191599] raid6: sse2x1 xor() 102450 MB/s
[ 18.194844] raid6: using algorithm avx512x1 gen() 12696 MB/s
[ 18.198536] raid6: .... xor() 448854 MB/s, rmw enabled
[ 18.202001] raid6: using avx512x2 recovery algorithm
[ 18.206547] xor: automatically using best checksumming function avx
[ 18.212155] async_tx: api initialized (async)
[ 18.286138] Btrfs loaded, crc32c=crc32c-intel, zoned=yes
[ 18.372884] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 18.632655] systemd[1]: Inserted module 'autofs4'
[ 18.658125] systemd[1]: systemd 245.4-4ubuntu3.15 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[ 18.672705] systemd[1]: Detected virtualization kvm.
[ 18.676222] systemd[1]: Detected architecture x86-64.
[ 18.712678] systemd[1]: Set hostname to <ip-10-216-181-207>.
[ 18.903191] systemd[1]: Configuration file /etc/systemd/system/ufw.service.d/override.conf is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 18.966349] systemd[1]: Configuration file /etc/systemd/system/sensei-tags.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 18.978798] systemd[1]: Configuration file /etc/systemd/system/sensei-init-script-setup.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 18.991924] systemd[1]: Configuration file /etc/systemd/system/sensei-init-script.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.016129] systemd[1]: Configuration file /etc/systemd/system/sensei-fs-symlink.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.032760] systemd[1]: Configuration file /etc/systemd/system/process-exporter.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.053714] systemd[1]: Configuration file /etc/systemd/system/node_exporter.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.069726] systemd[1]: Configuration file /etc/systemd/system/mpproxy.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.082181] systemd[1]: Configuration file /etc/systemd/system/miniprom.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.096725] systemd[1]: Configuration file /etc/systemd/system/jupyterlab.service is marked executable. Please remove executable permission bits. Proceeding anyway.
[ 19.108339] systemd[1]: Configuration file /etc/systemd/system/jupyterlab-setup-user.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.133870] systemd[1]: Configuration file /usr/lib/systemd/system/docker.service.d/users-permission-docker-socket.conf is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.150268] systemd[1]: Configuration file /etc/systemd/system/dcgm_exporter.service is marked executable. Please remove executable permission bits. Proceeding anyway.
[ 19.160867] systemd[1]: Configuration file /etc/systemd/system/dcgm_exporter.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.178047] systemd[1]: Configuration file /etc/systemd/system/sensei-init-script-started.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.193757] systemd[1]: Configuration file /etc/systemd/system/aws-mount-local-ssds.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[ 19.273537] systemd[1]: Created slice system-modprobe.slice.
[ 19.368068] systemd[1]: Created slice system-serial\x2dgetty.slice.
[ 19.373431] systemd[1]: Created slice User and Session Slice.
[ 19.378188] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[ 19.385378] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[ 19.392985] systemd[1]: Reached target Slices.
[ 19.396981] systemd[1]: Reached target Swap.
[ 19.400874] systemd[1]: Reached target System Time Set.
[ 19.405298] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[ 19.410548] systemd[1]: Listening on LVM2 poll daemon socket.
[ 19.415259] systemd[1]: Listening on multipathd control socket.
[ 19.420165] systemd[1]: Listening on Syslog Socket.
[ 19.424380] systemd[1]: Listening on fsck to fsckd communication Socket.
[ 19.429613] systemd[1]: Listening on initctl Compatibility Named Pipe.
[ 19.434849] systemd[1]: Listening on Journal Audit Socket.
[ 19.439509] systemd[1]: Listening on Journal Socket (/dev/log).
[ 19.444346] systemd[1]: Listening on Journal Socket.
[ 19.448676] systemd[1]: Listening on Network Service Netlink Socket.
[ 19.453786] systemd[1]: Listening on udev Control Socket.
[ 19.458407] systemd[1]: Listening on udev Kernel Socket.
[ 19.464055] systemd[1]: Mounting POSIX Message Queue File System...
[ 19.470506] systemd[1]: Mounting Kernel Debug File System...
[ 19.476356] systemd[1]: Mounting Kernel Trace File System...
[ 19.482630] systemd[1]: Starting Journal Service...
[ 19.487992] systemd[1]: Starting Elastic Fabric Adapter Configuration...
[ 19.494597] systemd[1]: Starting Set the console keyboard layout...
[ 19.500612] systemd[1]: Starting Create list of static device nodes for the current kernel...
[ 19.509263] systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
[ 19.517388] systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
[ 19.523705] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
[ 19.531584] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[ 19.538210] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
[ 19.546951] systemd[1]: Starting Load Kernel Modules...
[ 19.553099] systemd[1]: Starting Remount Root and Kernel File Systems...
[ 19.559352] systemd[1]: Starting udev Coldplug all Devices...
[ 19.563520] EXT4-fs (nvme0n1p1): re-mounted. Opts: discard. Quota mode: none.
[ 19.566662] systemd[1]: Started Journal Service.
[ 19.577529] IPMI message handler: version 39.2
[ 19.584597] ipmi device interface
[ 19.589723] systemd-journald[1342]: Received client request to flush runtime journal.
[ 19.770073] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 19.775111] nvidia-uvm: Loaded the UVM driver, major device number 509.
[ 20.285935] parport_pc 00:03: reported by Plug and Play ACPI
[ 20.391337] ppdev: user-space parallel port driver
[ 20.411899] efa 0000:10:1b.0: Setup irq:0x0000000014e0030e vector:457 name:efa-mgmnt@pci:0000:10:1b.0
[ 20.417953] efa 0000:10:1b.0 efa_0: IB device registered
[ 20.533296] efa 0000:20:1b.0: Setup irq:0x00000000cd6abbb1 vector:458 name:efa-mgmnt@pci:0000:20:1b.0
[ 20.552824] efa 0000:20:1b.0 efa_1: IB device registered
[ 20.664401] efa 0000:90:1b.0: Setup irq:0x00000000b2317bb1 vector:459 name:efa-mgmnt@pci:0000:90:1b.0
[ 20.667181] efa 0000:90:1b.0 efa_2: IB device registered
[ 20.776003] efa 0000:a0:1b.0: Setup irq:0x00000000033d5165 vector:460 name:efa-mgmnt@pci:0000:a0:1b.0
[ 20.778647] efa 0000:a0:1b.0 efa_3: IB device registered
[ 20.821481] Loading iSCSI transport class v2.0-870.
[ 20.859415] iscsi: registered transport (iser)
[ 22.065010] alua: device handler registered
[ 22.066305] emc: device handler registered
[ 22.068331] rdac: device handler registered
[ 22.116965] loop0: detected capacity change from 0 to 51152
[ 22.272003] SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled
[ 22.275133] XFS (nvme1n1p1): Mounting V5 Filesystem
[ 22.368650] XFS (nvme1n1p1): Ending clean mount
[ 22.388749] xfs filesystem being mounted at /var/tmp supports timestamps until 2038 (0x7fffffff)
[ 22.423946] loop1: detected capacity change from 0 to 113792
[ 22.579808] loop2: detected capacity change from 0 to 137712
[ 22.651800] loop3: detected capacity change from 0 to 138880
[ 22.707800] loop4: detected capacity change from 0 to 51416
[ 22.755789] loop5: detected capacity change from 0 to 96176
[ 22.907796] loop6: detected capacity change from 0 to 113736
[ 22.939773] loop7: detected capacity change from 0 to 91496
[ 23.139816] loop8: detected capacity change from 0 to 126824
[ 23.527831] loop9: detected capacity change from 0 to 126888
[ 23.694332] bpfilter: Loaded bpfilter_umh pid 2006
[ 23.694557] Started bpfilter
[ 23.696898] audit: type=1400 audit(1658992222.551:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lsb_release" pid=1997 comm="apparmor_parser"
[ 23.697214] audit: type=1400 audit(1658992222.551:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1999 comm="apparmor_parser"
[ 23.697221] audit: type=1400 audit(1658992222.551:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1999 comm="apparmor_parser"
[ 23.697718] audit: type=1400 audit(1658992222.551:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=2001 comm="apparmor_parser"
[ 23.697725] audit: type=1400 audit(1658992222.551:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=2001 comm="apparmor_parser"
[ 23.697731] audit: type=1400 audit(1658992222.551:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=2001 comm="apparmor_parser"
[ 23.700007] audit: type=1400 audit(1658992222.555:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/chronyd" pid=2003 comm="apparmor_parser"
[ 23.700492] audit: type=1400 audit(1658992222.555:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/snapd/snap-confine" pid=1995 comm="apparmor_parser"
[ 23.700499] audit: type=1400 audit(1658992222.555:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=1995 comm="apparmor_parser"
[ 23.702866] audit: type=1400 audit(1658992222.555:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=2000 comm="apparmor_parser"
[ 31.532159] LNet: HW NUMA nodes: 2, HW CPU cores: 96, npartitions: 2
[ 31.644359] kauditd_printk_skb: 33 callbacks suppressed
[ 31.644362] audit: type=1400 audit(1658992230.499:45): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/usr/share/sssd/cfg_rules.ini" pid=2317 comm="sssd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 31.789641] audit: type=1400 audit(1658992230.643:46): apparmor="ALLOWED" operation="file_lock" profile="/usr/sbin/sssd" name="/var/lib/sss/mc/passwd" pid=2494 comm="sssd_nss" requested_mask="k" denied_mask="k" fsuid=0 ouid=0
[ 31.790121] audit: type=1400 audit(1658992230.643:47): apparmor="ALLOWED" operation="file_lock" profile="/usr/sbin/sssd" name="/var/lib/sss/mc/passwd" pid=2494 comm="sssd_nss" requested_mask="k" denied_mask="k" fsuid=0 ouid=0
[ 31.797319] audit: type=1400 audit(1658992230.651:48): apparmor="ALLOWED" operation="file_lock" profile="/usr/sbin/sssd" name="/var/lib/sss/mc/group" pid=2494 comm="sssd_nss" requested_mask="k" denied_mask="k" fsuid=0 ouid=0
[ 31.797374] audit: type=1400 audit(1658992230.651:49): apparmor="ALLOWED" operation="file_lock" profile="/usr/sbin/sssd" name="/var/lib/sss/mc/group" pid=2494 comm="sssd_nss" requested_mask="k" denied_mask="k" fsuid=0 ouid=0
[ 31.802890] audit: type=1400 audit(1658992230.655:50): apparmor="ALLOWED" operation="file_lock" profile="/usr/sbin/sssd" name="/var/lib/sss/mc/initgroups" pid=2494 comm="sssd_nss" requested_mask="k" denied_mask="k" fsuid=0 ouid=0
[ 31.802895] audit: type=1400 audit(1658992230.655:51): apparmor="ALLOWED" operation="file_lock" profile="/usr/sbin/sssd" name="/var/lib/sss/mc/initgroups" pid=2494 comm="sssd_nss" requested_mask="k" denied_mask="k" fsuid=0 ouid=0
[ 31.900127] aufs 5.x-rcN-20210809
[ 32.159729] audit: type=1400 audit(1658992231.015:52): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/2579/cmdline" pid=2494 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 32.426004] Lustre: Lustre: Build Version: 2.10.8
[ 32.566500] LNet: 2732:0:(config.c:1637:lnet_inet_enumerate()) lnet: Ignoring interface ens33: it's down
[ 32.566738] LNet: Added LNI 10.216.181.207@tcp [8/256/0/180]
[ 32.566783] LNet: Accept secure, port 988
[ 32.607391] audit: type=1400 audit(1658992231.459:53): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=2946 comm="apparmor_parser"
[ 32.836335] Lustre: ggazbbmv: root_squash is set to 65534:65534
[ 32.992725] Lustre: ggazbbmv: nosquash_nids set to 10.216.139.147@tcp 10.216.139.89@tcp 10.216.139.62@tcp 10.216.138.45@tcp 10.216.139.166@tcp 10.216.139.161@tcp 10.216.143.143@tcp 10.216.140.188@tcp 10.216.139.205@tcp *@tcp1 0@lo
[ 33.096625] audit: type=1400 audit(1658992231.951:54): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/3044/cmdline" pid=2494 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 33.378933] EXT4-fs (md127): mounted filesystem without journal. Opts: (null). Quota mode: none.
[ 33.411942] Lustre: Mounted ggazbbmv-client
[ 33.530008] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 33.533601] Bridge firewalling registered
[ 33.585962] Initializing XFRM netlink socket
[ 35.323901] loop10: detected capacity change from 0 to 8
[ 35.378685] docker0: port 1(veth8012af0) entered blocking state
[ 35.378688] docker0: port 1(veth8012af0) entered disabled state
[ 35.378749] device veth8012af0 entered promiscuous mode
[ 35.419367] nvidia-nvswitch0: open (major=511)
[ 35.419642] nvidia-nvswitch1: open (major=511)
[ 35.419833] nvidia-nvswitch2: open (major=511)
[ 35.420018] nvidia-nvswitch3: open (major=511)
[ 35.420198] nvidia-nvswitch4: open (major=511)
[ 35.420391] nvidia-nvswitch5: open (major=511)
[ 35.635327] docker0: port 1(veth8012af0) entered disabled state
[ 35.636652] device veth8012af0 left promiscuous mode
[ 35.636656] docker0: port 1(veth8012af0) entered disabled state
[ 37.137797] kauditd_printk_skb: 30 callbacks suppressed
[ 37.137802] audit: type=1400 audit(1658992235.991:85): apparmor="DENIED" operation="ptrace" profile="/snap/snapd/16292/usr/lib/snapd/snap-confine" pid=3681 comm="ps" requested_mask="readby" denied_mask="readby" peer="snap.amazon-ssm-agent.amazon-ssm-agent"
[ 37.146049] audit: type=1400 audit(1658992235.999:86): apparmor="DENIED" operation="ptrace" profile="/snap/snapd/16292/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=3681 comm="ps" requested_mask="readby" denied_mask="readby" peer="snap.amazon-ssm-agent.amazon-ssm-agent"
[ 37.146059] audit: type=1400 audit(1658992235.999:87): apparmor="DENIED" operation="ptrace" profile="snap-update-ns.lxd" pid=3681 comm="ps" requested_mask="readby" denied_mask="readby" peer="snap.amazon-ssm-agent.amazon-ssm-agent"
[ 42.786983] audit: type=1400 audit(1658992243.003:88): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/3818/cmdline" pid=2494 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=91223
[ 47.655162] nvidia-modeset: ERROR: Failed to find GPU ID
[ 58.797258] nvidia-nvlink: nvlink driver open
[ 58.797263] nvidia-nvlink: nvlink driver close
[ 58.797265] nvidia-nvlink: nvlink driver open
[ 70.100775] nvidia-nvswitch0: open (major=511)
[ 70.100804] nvidia-nvswitch0: open (major=511)
[ 70.100814] nvidia-nvswitch0: open (major=511)
[ 70.119720] nvidia-nvswitch0: open (major=511)
[ 70.119748] nvidia-nvswitch1: open (major=511)
[ 70.119766] nvidia-nvswitch1: open (major=511)
[ 70.119775] nvidia-nvswitch1: open (major=511)
[ 70.119784] nvidia-nvswitch1: open (major=511)
[ 70.119794] nvidia-nvswitch2: open (major=511)
[ 70.119803] nvidia-nvswitch2: open (major=511)
[ 70.119812] nvidia-nvswitch2: open (major=511)
[ 70.119822] nvidia-nvswitch2: open (major=511)
[ 70.119832] nvidia-nvswitch3: open (major=511)
[ 70.119842] nvidia-nvswitch3: open (major=511)
[ 70.119851] nvidia-nvswitch3: open (major=511)
[ 70.119860] nvidia-nvswitch3: open (major=511)
[ 70.119881] nvidia-nvswitch4: open (major=511)
[ 70.119884] nvidia-nvswitch4: open (major=511)
[ 70.119888] nvidia-nvswitch4: open (major=511)
[ 70.119891] nvidia-nvswitch4: open (major=511)
[ 70.119895] nvidia-nvswitch5: open (major=511)
[ 70.119898] nvidia-nvswitch5: open (major=511)
[ 70.119901] nvidia-nvswitch5: open (major=511)
[ 70.119904] nvidia-nvswitch5: open (major=511)
[ 88.547686] python[4034]: segfault at 9 ip 00007fb57226fa24 sp 00007fb483ffede0 error 4 in libc-2.31.so[7fb5721fc000+178000]
[ 88.547697] Code: c9 0f 11 4b 20 48 89 ee 66 48 0f 6e c0 48 83 ce 01 0f 16 44 24 08 48 89 73 08 0f 11 43 10 49 89 2c 24 48 85 d2 74 8f 48 89 d3 <48> 8b 43 08 89 c2 c1 ea 04 83 ea 02 49 8d 54 d7 10 49 39 d5 0f 85
[ 88.561612] python[4041]: segfault at 9 ip 00007f00a78d7a24 sp 00007effb8268de0 error 4 in libc-2.31.so[7f00a7864000+178000]
[ 88.561623] Code: c9 0f 11 4b 20 48 89 ee 66 48 0f 6e c0 48 83 ce 01 0f 16 44 24 08 48 89 73 08 0f 11 43 10 49 89 2c 24 48 85 d2 74 8f 48 89 d3 <48> 8b 43 08 89 c2 c1 ea 04 83 ea 02 49 8d 54 d7 10 49 39 d5 0f 85
[ 97.583987] docker0: port 1(vetha1e8df7) entered blocking state
[ 97.583993] docker0: port 1(vetha1e8df7) entered disabled state
[ 97.584054] device vetha1e8df7 entered promiscuous mode
[ 97.637097] docker0: port 1(vetha1e8df7) entered disabled state
[ 97.638089] device vetha1e8df7 left promiscuous mode
[ 97.638092] docker0: port 1(vetha1e8df7) entered disabled state
[ 139.870984] audit: type=1400 audit(1658992340.091:89): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/4331/cmdline" pid=2498 comm="sssd_sudo" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 139.875658] audit: type=1400 audit(1658992340.095:90): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/4331/cmdline" pid=2494 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 139.900535] audit: type=1400 audit(1658992340.123:91): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/4331/cmdline" pid=2495 comm="sssd_pam" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 155.035263] audit: type=1400 audit(1658992355.255:92): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/4340/cmdline" pid=2498 comm="sssd_sudo" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 155.038672] audit: type=1400 audit(1658992355.259:93): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/4340/cmdline" pid=2494 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 155.045388] audit: type=1400 audit(1658992355.267:94): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/proc/4340/cmdline" pid=2495 comm="sssd_pam" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
from aws-ofi-nccl.
Reading through the libfabric code for this error, it looks to me like the max locked memory limit set by the installer is not being honored. The EFA installation created the file /etc/security/limits.d/efa.conf
but whenever I run ulimit -l
, I still get the system default value of 64.
I went ahead and made modifications to /etc/systemd/system.conf
and /etc/systemd/user.conf
to add DefaultLimitMEMLOCK=107374182400
After reboot, ulimit -l
showed 104857600
and subsequently I was able to run some of these tests.
Is there a reason why the configuration in /etc/security/limits.d/efa.conf
is not getting honored?
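The two numbers above differ only in units: DefaultLimitMEMLOCK is given in bytes, while ulimit -l reports KiB, so 107374182400 / 1024 = 104857600. As a minimal sketch, the limit that a process actually inherited (for example the Python training process) can be read with the standard resource module. The helper name format_limit is hypothetical and used only for illustration.

```python
import resource


def format_limit(value):
    """Render an rlimit value (bytes) in KiB, the unit that `ulimit -l` prints."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value // 1024)


# Soft and hard RLIMIT_MEMLOCK as seen by *this* process.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("memlock soft (KiB):", format_limit(soft))
print("memlock hard (KiB):", format_limit(hard))
```

Running this from the same shell or launcher that starts the training shows whether that session picked up the limit from efa.conf or is still on the default.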
from aws-ofi-nccl.
Is there any other file under /etc/security/limits.d
that has higher priority than efa.conf
?
from aws-ofi-nccl.
No. This is the only file under /etc/security/limits.d
from aws-ofi-nccl.
Did you log out and log back in after installing the EFA installer?
from aws-ofi-nccl.
Yes. We rebooted the machine after the installation completed.
from aws-ofi-nccl.
Hi,
Can you let us know which OS you are using?
from aws-ofi-nccl.
Ubuntu 20.04 with CIS benchmarks applied for security hardening.
from aws-ofi-nccl.
Can you check whether you have a line like
session required pam_limits.so
in /etc/pam.d/system-auth
?
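A rough sketch of that check in Python follows. It assumes the PAM files are readable by the current user; the candidate list is an assumption, not something confirmed in this thread. /etc/pam.d/system-auth is the usual location on RHEL-style systems, while Ubuntu normally uses /etc/pam.d/common-session and /etc/pam.d/common-session-noninteractive. The helper name has_pam_limits is hypothetical.

```python
import re
from pathlib import Path

# Matches an active (uncommented) "session ... pam_limits.so" line.
PAM_LIMITS = re.compile(r"^\s*-?session\s+.*\bpam_limits\.so\b")


def has_pam_limits(text):
    """Return True if any line of a PAM config enables pam_limits.so for sessions."""
    return any(PAM_LIMITS.match(line) for line in text.splitlines())


# Assumed candidate files; adjust for the distribution in use.
CANDIDATES = [
    "/etc/pam.d/system-auth",
    "/etc/pam.d/common-session",
    "/etc/pam.d/common-session-noninteractive",
    "/etc/pam.d/sshd",
]

for name in CANDIDATES:
    path = Path(name)
    try:
        found = has_pam_limits(path.read_text())
    except OSError:
        print(name, "-> not present or not readable")
        continue
    print(name, "->", "pam_limits enabled" if found else "pam_limits not found")
```

If none of the files used by the login path enables pam_limits.so, the entries under /etc/security/limits.d are not applied to that session.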
from aws-ofi-nccl.
@taruntandon88 Any updates on Wei's question?
from aws-ofi-nccl.