TODO: Create a file called LICENSE (not LICENSE.TXT, LICENSE.md, etc.)…
microsoft / azure-linux-kernel
Patches for building an Azure-tuned Linux kernel.
License: Other
I backported the required pci-hyperv changes to Linux 4.9, and the driver works.
However, I am seeing the following issues with NVMe:
[74530.686555] do_IRQ: 10.232 No irq handler for vector
[74530.712068] do_IRQ: 10.232 No irq handler for vector
[74530.737579] do_IRQ: 10.232 No irq handler for vector
[74530.763092] do_IRQ: 10.232 No irq handler for vector
[74532.832221] nvme nvme1: I/O 206 QID 6 timeout, reset controller
[74532.873967] nvme nvme1: completing aborted command with status: fffffffc
[74532.873971] blk_update_request: I/O error, dev nvme1n1, sector 1048320
I also backported the following patch, but I am still facing the same issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/drivers/nvme/host/pci.c?id=0ff199cb48b4af6f29a1bf15d92d93f44a22eeb4
There are important files that all Microsoft projects should have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged, this issue will be closed automatically.
Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.
This repository is currently missing a LICENSE file.
A license helps users understand how to use your project in a compliant manner. You can find the standard MIT license Microsoft uses at: https://github.com/microsoft/repo-templates/blob/main/shared/LICENSE.
If you would like to learn more about open source licenses, please visit the document at https://aka.ms/license (Microsoft-internal guidance).
An Azure VM with the 4.15.0-1041-azure kernel cannot install the Mellanox OFED driver (same issue for 4.3-*, 4.4-*, 4.5-*).
Here's part of the log after executing ./mlnxofedinstall --force --kernel-only --without-dkms --without-fw-update --with-infiniband-diags --package-install-options -D2 -vv (having run mlnx_add_kernel_support.sh beforehand to add kernel support).
Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):
libibumad
libopensm
libibmad
infiniband-diags
ofed-scripts
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-modules
iser-modules
isert-modules
srp-modules
mlnx-nfsrdma-modules
mlnx-rdma-rxe-modules
kernel-mft-modules
knem-modules
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.
Checking SW Requirements...
Running: dpkg --configure -a --force-all
Running: apt-get install -f
Removing old packages...
Installing new packages
Installing libibumad-43.1.1.MLNX20171122.0eb0969...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libibumad_43.1.1.MLNX20171122.0eb0969-0.1.43101_amd64.deb
Installing libopensm-5.0.0.MLNX20180219.c610c42...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libopensm_5.0.0.MLNX20180219.c610c42-0.1.43101_amd64.deb
Installing libibmad-1.3.13.MLNX20170511.267a441...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libibmad_1.3.13.MLNX20170511.267a441-0.1.43101_amd64.deb
Installing infiniband-diags-5.0.0.MLNX20180124.dfd2235...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/infiniband-diags_5.0.0.MLNX20180124.dfd2235-0.1.43101_amd64.deb
Installing ofed-scripts-4.3...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/ofed-scripts_4.3-OFED.4.3.1.0.1_amd64.deb
Installing mlnx-ofed-kernel-utils-4.3...
Running /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-utils_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_amd64.deb
Installing mlnx-ofed-kernel-modules-4.3...
Running /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb
Error: mlnx-ofed-kernel-modules installation failed!
Collecting debug info...
See:
/tmp/MLNX_OFED_LINUX.31695.logs/mlnx-ofed-kernel-modules.debinstall.log
Removing newly installed packages...
Running: /usr/sbin/ofed_uninstall.sh --force --keep-mft
Here's part of the log file:
/usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb
Selecting previously unselected package mlnx-ofed-kernel-modules.
(Reading database ... 33122 files and directories currently installed.)
Preparing to unpack .../mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb ...
D000002: maintscript_new nonexistent preinst '/var/lib/dpkg/tmp.ci/preinst'
Unpacking mlnx-ofed-kernel-modules (4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure) ...
D000002: process_archive tmp.ci script/file '.' contains dot
D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/postinst' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.postinst'
D000002: process_archive tmp.ci script/file '..' contains dot
D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/control' is control
D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/postrm' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.postrm'
D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/md5sums' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.md5sums'
Setting up mlnx-ofed-kernel-modules (4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure) ...
D000002: fork/exec /var/lib/dpkg/info/mlnx-ofed-kernel-modules.postinst ( configure )
---------------- START OF DEBUG INFO -------------------
Install command: ./mlnxofedinstall --force --kernel-only --without-dkms --without-fw-update --with-infiniband-diags --package-install-options -D2 -vv
Vars dump:
- ofedlogs: /tmp/MLNX_OFED_LINUX.9852.logs
- MLNX_OFED_LINUX_VERSION: 4.3-1.0.1.0
- MLNX_OFED_ARCH: x86_64
- MLNX_OFED_DISTRO: ubuntu16.04
- distro: ubuntu16.04
- arch: x86_64
- kernel: 4.15.0-1041-azure
- config: /tmp/ofed.conf
- update_firmware: 0
Setup info:
- uname -r: 4.15.0-1041-azure
- uname -m: x86_64
- lsb_release -a: No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial
- cat /etc/issue: Ubuntu 16.04.6 LTS \n \l
- cat /proc/version: Linux version 4.15.0-1041-azure (buildd@lcy01-amd64-013) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #45-Ubuntu SMP Fri Mar 15 14:41:00 UTC 2019
- gcc --version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
The command /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb
was executed successfully, but the mlnx-ofed-kernel-modules were not installed afterwards. The following commands produce no output:
$ depmod -a
$ lsmod | grep mlnx
The issue started after the Azure VM automatically upgraded to the 4.15.0-1041-azure kernel.
Hi Team,
One of our customers is using our solution, a custom image based on Linux kernel 4.14.51 and CentOS 7.8. The customer is seeing random traffic loss in production on netvsc interfaces (non-accelerated). P.S. The deployment size is ~600 VM instances.
(2) We have rebuilt the kernel with the two patches below, because this symptom (NAPI gets disabled when the ring is temporarily busy) is similar to the issue mentioned in #36:
(1) hv_netvsc: Fix napi reschedule while receive completion is busy
(2) hv_netvsc: fix race that may miss tx queue wakeup
(3) With these patches there is some improvement, in the sense that fewer instances run into this problem, but the issue still persists (~5 out of ~200 instances). On these bad instances, the ethtool stats show very high 'rx_comp_busy' and 'tx_send_full' values, as shown below. I think a very high 'rx_comp_busy' is expected after these patches.
-bash-4.2# ethtool -S eth2
NIC statistics:
tx_scattered: 0
tx_no_memory: 0
tx_no_space: 0
tx_too_big: 0
tx_busy: 0
tx_send_full: 417979<<<<<<<<<<<<<<<<<<<
rx_comp_busy: 36978379935<<<<<<<<<<<<<< rapid fast increments
vf_rx_packets: 0
vf_rx_bytes: 0
vf_tx_packets: 0
vf_tx_bytes: 0
vf_tx_dropped: 0
tx_queue_0_packets: 22487545
tx_queue_0_bytes: 4594218563
rx_queue_0_packets: 33816104
rx_queue_0_bytes: 3148800004
tx_queue_1_packets: 23095847
tx_queue_1_bytes: 4629433827
rx_queue_1_packets: 34169457
rx_queue_1_bytes: 3198473995
tx_queue_2_packets: 22235899
tx_queue_2_bytes: 4554101089
rx_queue_2_packets: 35447873
rx_queue_2_bytes: 3306351633
tx_queue_3_packets: 22655564
tx_queue_3_bytes: 4658776077
rx_queue_3_packets: 34320559
rx_queue_3_bytes: 3200636386
tx_queue_4_packets: 43152346
tx_queue_4_bytes: 17461777045
rx_queue_4_packets: 34941411
rx_queue_4_bytes: 3240195702
tx_queue_5_packets: 22992696
tx_queue_5_bytes: 4613837166
rx_queue_5_packets: 32975505
rx_queue_5_bytes: 3079512739
tx_queue_6_packets: 22535083
tx_queue_6_bytes: 4672503110
rx_queue_6_packets: 33796904
rx_queue_6_bytes: 3159691807
tx_queue_7_packets: 22452840
tx_queue_7_bytes: 4584966389
rx_queue_7_packets: 33860772
rx_queue_7_bytes: 3155304090
rx_queue_7_bytes: 6525418289
I would like to ask the Azure team for a list of patches that we can try with the 4.14.51 kernel, as the LIS option is not applicable to us.
Please let me know if I can provide any additional details.
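To tell whether counters such as tx_send_full and rx_comp_busy are still climbing on a bad instance, two `ethtool -S` snapshots taken some seconds apart can be compared. The following is a minimal sketch, not part of the original report: the helper names `parse_ethtool_stats` and `counter_deltas` are hypothetical, and the parser assumes the `name: value` layout shown in the output above, ignoring any trailing annotation after the number.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` style output into a dict of counter name -> int."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop hand-added markers such as "<<<<" that follow the number.
        digits = value.strip().split("<")[0].strip()
        if digits.isdigit():
            stats[name.strip()] = int(digits)
    return stats


def counter_deltas(before, after, names=("tx_send_full", "rx_comp_busy")):
    """Return how much each named counter grew between two snapshots."""
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}
```

Feeding it the text of two consecutive `ethtool -S eth2` runs gives the growth of each counter over that interval, which is easier to compare across instances than the absolute totals.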
Hello,
I'm trying to get the Mellanox drivers working on the 4.9.76 branch for Azure Accelerated Networking. This is what happens on a "Standard_D16_v3" instance:
[ 18.827068] hv_vmbus: registering driver hv_netvsc
[ 18.827989] hv_netvsc: hv_netvsc channel opened successfully
[ 18.845437] hv_netvsc 000d3a2e-fcbb-000d-3a2e-fcbb000d3a2e: Send section size: 6144, Section count:2560
[ 18.846091] hv_netvsc 000d3a2e-fcbb-000d-3a2e-fcbb000d3a2e: Device MAC 00:0d:3a:2e:fc:bb link state up
[ 21.890743] hv_utils: Registering HyperV Utility Driver
[ 21.890744] hv_vmbus: registering driver hv_util
[ 21.891766] hv_utils: Using TimeSync version 4.0
[ 21.901584] pps_core: LinuxPPS API ver. 1 registered
[ 21.901596] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <[email protected]>
and later on
[ 51.336993] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 51.340963] mlx4_core: Initializing bd45:00:02.0
[ 51.353391] mlx4_core bd45:00:02.0: Detected virtual function - running in slave mode
[ 51.358380] mlx4_core bd45:00:02.0: Sending reset
[ 51.361118] mlx4_core bd45:00:02.0: Sending vhcr0
[ 51.366673] mlx4_core bd45:00:02.0: HCA minimum page size:512
[ 51.370358] mlx4_core bd45:00:02.0: Timestamping is not supported in slave mode
[ 51.383500] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.387946] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.393170] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.397522] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.402003] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.406938] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.411471] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.415622] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.420364] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.424936] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.429269] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.433900] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.439128] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.443571] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.448011] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.453216] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.457835] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[ 51.546176] mlx4_core bd45:00:02.0: Failed to initialize event queue table, aborting
[ 51.560164] mlx4_core: probe of bd45:00:02.0 failed with error -12
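When a log like the one above repeats the same line many times, collapsing it into per-message counts makes the sequence easier to read. This is a small sketch under the assumption that each line starts with a `[seconds.micros]` timestamp as in the excerpt; `summarize_dmesg` is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Matches the leading "[   51.383500] " kernel timestamp.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def summarize_dmesg(text):
    """Count identical kernel messages after stripping their timestamps."""
    counts = Counter()
    for line in text.splitlines():
        message = TIMESTAMP.sub("", line.strip())
        if message:
            counts[message] += 1
    return counts
```

Calling `summarize_dmesg(log_text).most_common()` lists the most frequent messages first.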
Hi ,
I am trying to enable SR-IOV in my customized Linux, whose kernel version is also 4.4.XX. I patched the code according to the README, and it took me a lot of time to apply 0003-OFED-port-to-4.4.patch,
but getting it to compile is even harder, and I have had to solve many problems. The biggest problem I am facing now is that the patch seems to ignore the file linux-4.4/include/linux/rhashtable.h and instead creates a new file, linux-4.4/include/linux/mlx_rhashtable.h, which duplicates most of the rhashtable.h interface. The two files conflict with each other, and because many of the interfaces from rhashtable.h were changed, the code cannot be compiled.
Could someone look into this issue?
The last contribution to this project was in Oct 2020. Is it still actively maintained?
Hi there,
I am tracing a production network issue in the Azure cloud. We see random outbound timeouts on some of the VMs. On those VMs, tcpdump shows a lot of SYN retransmissions to external hosts.
When we look at the output of "ethtool -S eth0", the only abnormal stat is tx_send_full:
NIC statistics:
tx_scattered: 0
tx_no_memory: 0
tx_no_space: 0
tx_too_big: 0
tx_busy: 0
tx_send_full: 17464
I traced the kernel code, and this seems to be the place where the counter is incremented. Can you explain what this metric means and why it increases?
Hello,
I'm looking for the minimal set of CONFIG parameters for the Accelerated Networking (AN) feature.
Obviously CONFIG_PCI_HYPERV is needed, but other than that, are there any other options that are mandatory?
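Whatever the final list turns out to be, checking a built kernel's config against it is easy to script. The sketch below is only an illustration: CONFIG_PCI_HYPERV is the one option named in this question, while CONFIG_HYPERV and CONFIG_HYPERV_NET are added here as assumptions and are not a confirmed minimal set. The function name `option_states` is hypothetical.

```python
import re

# Only CONFIG_PCI_HYPERV comes from the question above; the other two
# symbols are assumptions for illustration, not a confirmed minimal set.
CANDIDATE_OPTIONS = ("CONFIG_PCI_HYPERV", "CONFIG_HYPERV", "CONFIG_HYPERV_NET")


def option_states(config_text, options=CANDIDATE_OPTIONS):
    """Map each option to its value ('y', 'm', ...) or 'n' if it is not set."""
    states = {name: "n" for name in options}
    for line in config_text.splitlines():
        # "# CONFIG_FOO is not set" lines do not match and stay at 'n'.
        match = re.match(r"^(CONFIG_\w+)=(\S+)$", line.strip())
        if match and match.group(1) in states:
            states[match.group(1)] = match.group(2)
    return states
```

On many distributions the text to pass in can be read from /boot/config-<kernel release>; any option reported as 'n' is neither built in nor built as a module.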