
azure-linux-kernel's Introduction

TODO: Create a file called LICENSE (not LICENSE.TXT, LICENSE.md, etc.)…

azure-linux-kernel's People

Contributors

dcui, haiyangz, microsoft-github-policy-service[bot], microsoftopensource, mkaniganti, msftgits, shemminger, shirgall, simonxiaoss, szarkos


azure-linux-kernel's Issues

Linux 4.9: nvme-pci: No irq handler for vector

I backported the required pci-hyperv changes to Linux 4.9, and that driver works.

However, I am getting the following errors with nvme:

[74530.686555] do_IRQ: 10.232 No irq handler for vector
[74530.712068] do_IRQ: 10.232 No irq handler for vector
[74530.737579] do_IRQ: 10.232 No irq handler for vector
[74530.763092] do_IRQ: 10.232 No irq handler for vector
[74532.832221] nvme nvme1: I/O 206 QID 6 timeout, reset controller
[74532.873967] nvme nvme1: completing aborted command with status: fffffffc
[74532.873971] blk_update_request: I/O error, dev nvme1n1, sector 1048320

I also backported the following patch, but I am still facing the same issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/drivers/nvme/host/pci.c?id=0ff199cb48b4af6f29a1bf15d92d93f44a22eeb4
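
When reporting logs like the one above, it can help to summarize how often each unhandled vector shows up. The following is a minimal sketch, not part of the original report: it assumes the `10.232` pair in the `do_IRQ` message is `CPU.vector`, and the function name `tally_unhandled_vectors` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   [74530.686555] do_IRQ: 10.232 No irq handler for vector
# Reading the pair as CPU.vector is an assumption about the message format.
PATTERN = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def tally_unhandled_vectors(log_text):
    """Count unhandled-interrupt messages per (cpu, vector) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            cpu, vector = int(match.group(1)), int(match.group(2))
            counts[(cpu, vector)] += 1
    return counts


# Sample lines copied from the log in this issue.
sample = """\
[74530.686555] do_IRQ: 10.232 No irq handler for vector
[74530.712068] do_IRQ: 10.232 No irq handler for vector
[74530.737579] do_IRQ: 10.232 No irq handler for vector
[74530.763092] do_IRQ: 10.232 No irq handler for vector
[74532.832221] nvme nvme1: I/O 206 QID 6 timeout, reset controller
"""

for (cpu, vector), count in tally_unhandled_vectors(sample).items():
    print(f"cpu={cpu} vector={vector} count={count}")
```

In practice the input would be the full `dmesg` text rather than the short sample.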

Cannot install Mellanox OFED driver with 4.15.0-1041-azure kernel

An Azure VM running the 4.15.0-1041-azure kernel cannot install the Mellanox OFED driver (the same issue occurs with 4.3-*, 4.4-*, and 4.5-*).

Here's part of the log after executing ./mlnxofedinstall --force --kernel-only --without-dkms --without-fw-update --with-infiniband-diags --package-install-options -D2 -vv (mlnx_add_kernel_support.sh was run beforehand to add kernel support):

Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):

libibumad
libopensm
libibmad
infiniband-diags
ofed-scripts
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-modules
iser-modules
isert-modules
srp-modules
mlnx-nfsrdma-modules
mlnx-rdma-rxe-modules
kernel-mft-modules
knem-modules

This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Checking SW Requirements...
Running: dpkg --configure -a --force-all
Running: apt-get install -f
Removing old packages...
Installing new packages
Installing libibumad-43.1.1.MLNX20171122.0eb0969...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libibumad_43.1.1.MLNX20171122.0eb0969-0.1.43101_amd64.deb
Installing libopensm-5.0.0.MLNX20180219.c610c42...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libopensm_5.0.0.MLNX20180219.c610c42-0.1.43101_amd64.deb
Installing libibmad-1.3.13.MLNX20170511.267a441...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/libibmad_1.3.13.MLNX20170511.267a441-0.1.43101_amd64.deb
Installing infiniband-diags-5.0.0.MLNX20180124.dfd2235...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/infiniband-diags_5.0.0.MLNX20180124.dfd2235-0.1.43101_amd64.deb
Installing ofed-scripts-4.3...
Running /usr/bin/dpkg -i --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/ofed-scripts_4.3-OFED.4.3.1.0.1_amd64.deb
Installing mlnx-ofed-kernel-utils-4.3...
Running /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-utils_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_amd64.deb
Installing mlnx-ofed-kernel-modules-4.3...
Running /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb

Error: mlnx-ofed-kernel-modules installation failed!
Collecting debug info...
See:
        /tmp/MLNX_OFED_LINUX.31695.logs/mlnx-ofed-kernel-modules.debinstall.log
Removing newly installed packages...

Running: /usr/sbin/ofed_uninstall.sh --force  --keep-mft

Here's part of the log file:

/usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb
Selecting previously unselected package mlnx-ofed-kernel-modules.
(Reading database ... 33122 files and directories currently installed.)
Preparing to unpack .../mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb ...
D000002: maintscript_new nonexistent preinst '/var/lib/dpkg/tmp.ci/preinst'
Unpacking mlnx-ofed-kernel-modules (4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure) ...
D000002: process_archive tmp.ci script/file '.' contains dot
D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/postinst' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.postinst'
D000002: process_archive tmp.ci script/file '..' contains dot
D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/control' is control
D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/postrm' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.postrm'
D000002: process_archive tmp.ci script/file '/var/lib/dpkg/tmp.ci/md5sums' installed as '/var/lib/dpkg/info/mlnx-ofed-kernel-modules.md5sums'
Setting up mlnx-ofed-kernel-modules (4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure) ...
D000002: fork/exec /var/lib/dpkg/info/mlnx-ofed-kernel-modules.postinst ( configure  )


---------------- START OF DEBUG INFO -------------------
Install command: ./mlnxofedinstall --force --kernel-only --without-dkms --without-fw-update --with-infiniband-diags --package-install-options -D2 -vv

Vars dump:
- ofedlogs: /tmp/MLNX_OFED_LINUX.9852.logs
- MLNX_OFED_LINUX_VERSION: 4.3-1.0.1.0
- MLNX_OFED_ARCH: x86_64
- MLNX_OFED_DISTRO: ubuntu16.04
- distro: ubuntu16.04
- arch: x86_64
- kernel: 4.15.0-1041-azure
- config: /tmp/ofed.conf
- update_firmware: 0

Setup info:

- uname -r: 4.15.0-1041-azure

- uname -m: x86_64

- lsb_release -a: No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:        16.04
Codename:       xenial

- cat /etc/issue: Ubuntu 16.04.6 LTS \n \l


- cat /proc/version: Linux version 4.15.0-1041-azure (buildd@lcy01-amd64-013) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #45-Ubuntu SMP Fri Mar 15 14:41:00 UTC 2019

- gcc --version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609

The command /usr/bin/dpkg -i --force-confnew --force-confmiss -D2 /var/drivers/mellanox/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64/DEBS/mlnx-ofed-kernel-modules_4.3-OFED.4.3.1.0.1.1.g8509e41.kver.4.15.0-1041-azure_all.deb executed successfully, but no mlnx-ofed-kernel-modules were produced afterwards. The following commands print nothing:

$ depmod -a
$ lsmod | grep mlnx

The issue started after the Azure VM automatically upgraded to the 4.15.0-1041-azure kernel.
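
One thing worth checking is whether any Mellanox module files were installed for the running kernel at all. Kernel logs elsewhere on this page show a module named mlx4_core, so a filter on "mlnx" may not match module names that start with "mlx". The sketch below is an illustration, not an OFED tool: the function name `find_mlx_modules` is hypothetical, and the "mlx" prefix and the /lib/modules/<release> layout are assumptions.

```python
import os


def find_mlx_modules(modules_root):
    """Return sorted relative paths of mlx*.ko* files under modules_root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(modules_root):
        for name in filenames:
            # ".ko" in name also accepts compressed modules such as .ko.xz
            if name.startswith("mlx") and ".ko" in name:
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, modules_root))
    return sorted(found)


if __name__ == "__main__":
    # Assumed layout: modules for the running kernel live here.
    root = os.path.join("/lib/modules", os.uname().release)
    modules = find_mlx_modules(root)
    if not modules:
        print(f"no mlx* modules found under {root}")
    for path in modules:
        print(path)
```

An empty result would suggest the package's postinst step did not place any modules for this kernel, which narrows the problem to the install step rather than to module loading.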

netvsc: high rx_comp_busy and tx_send_full + traffic loss

Hi Team,
One of our customers uses our solution, a custom image based on Linux kernel 4.14.51 and CentOS 7.8. The customer is seeing random traffic loss in production on netvsc interfaces (non-accelerated). P.S. The deployment size is ~600 VM instances.

  1. On problematic instances, 'rx_comp_busy' is always non-zero (=1) and 'tx_send_full' is high, as shown below:
    -bash-4.2# ethtool -S eth2
    NIC statistics:
    tx_scattered: 0
    tx_no_memory: 0
    tx_no_space: 0
    tx_too_big: 0
    tx_busy: 0
    tx_send_full: 75776 <<<<<<<<<<<<<
    rx_comp_busy: 1 <<<<<<<<<<<<<<<<
    vf_rx_packets: 0
    vf_rx_bytes: 0
    vf_tx_packets: 0
    vf_tx_bytes: 0
    vf_tx_dropped: 0
    tx_queue_0_packets: 48323650
    tx_queue_0_bytes: 9856533412
    rx_queue_0_packets: 70704892
    rx_queue_0_bytes: 6523868834
    tx_queue_1_packets: 44242587
    tx_queue_1_bytes: 9561505139
    rx_queue_1_packets: 67683390
    rx_queue_1_bytes: 6248204528
    tx_queue_2_packets: 45780035
    tx_queue_2_bytes: 10119440310
    rx_queue_2_packets: 69738233
    rx_queue_2_bytes: 6443619208
    tx_queue_3_packets: 44413637
    tx_queue_3_bytes: 9640385380
    rx_queue_3_packets: 69258427
    rx_queue_3_bytes: 6396199857
    tx_queue_4_packets: 96161043
    tx_queue_4_bytes: 43152567515
    rx_queue_4_packets: 68506662
    rx_queue_4_bytes: 6329763902
    tx_queue_5_packets: 42685859
    tx_queue_5_bytes: 9232930840
    rx_queue_5_packets: 68869195
    rx_queue_5_bytes: 6360734718
    tx_queue_6_packets: 44105935
    tx_queue_6_bytes: 9641517238
    rx_queue_6_packets: 71297219
    rx_queue_6_bytes: 6568436535
    tx_queue_7_packets: 44680296
    tx_queue_7_bytes: 9764630663
    rx_queue_7_packets: 70747471

  2. We rebuilt the kernel with the two patches below, because this symptom (NAPI gets disabled when the ring is temporarily busy) is similar to the issue mentioned in #36:

     - hv_netvsc: Fix napi reschedule while receive completion is busy
     - hv_netvsc: fix race that may miss tx queue wakeup

  3. With these patches there is some improvement, in the sense that fewer instances hit this problem, but the issue still persists (~5 out of ~200). On these bad instances, the ethtool stats show very high 'rx_comp_busy' and 'tx_send_full', as shown below. I think a very high 'rx_comp_busy' is expected after these patches.

-bash-4.2# ethtool -S eth2
NIC statistics:
tx_scattered: 0
tx_no_memory: 0
tx_no_space: 0
tx_too_big: 0
tx_busy: 0
tx_send_full: 417979<<<<<<<<<<<<<<<<<<<
rx_comp_busy: 36978379935<<<<<<<<<<<<<< rapid fast increments
vf_rx_packets: 0
vf_rx_bytes: 0
vf_tx_packets: 0
vf_tx_bytes: 0
vf_tx_dropped: 0
tx_queue_0_packets: 22487545
tx_queue_0_bytes: 4594218563
rx_queue_0_packets: 33816104
rx_queue_0_bytes: 3148800004
tx_queue_1_packets: 23095847
tx_queue_1_bytes: 4629433827
rx_queue_1_packets: 34169457
rx_queue_1_bytes: 3198473995
tx_queue_2_packets: 22235899
tx_queue_2_bytes: 4554101089
rx_queue_2_packets: 35447873
rx_queue_2_bytes: 3306351633
tx_queue_3_packets: 22655564
tx_queue_3_bytes: 4658776077
rx_queue_3_packets: 34320559
rx_queue_3_bytes: 3200636386
tx_queue_4_packets: 43152346
tx_queue_4_bytes: 17461777045
rx_queue_4_packets: 34941411
rx_queue_4_bytes: 3240195702
tx_queue_5_packets: 22992696
tx_queue_5_bytes: 4613837166
rx_queue_5_packets: 32975505
rx_queue_5_bytes: 3079512739
tx_queue_6_packets: 22535083
tx_queue_6_bytes: 4672503110
rx_queue_6_packets: 33796904
rx_queue_6_bytes: 3159691807
tx_queue_7_packets: 22452840
tx_queue_7_bytes: 4584966389
rx_queue_7_packets: 33860772
rx_queue_7_bytes: 3155304090
rx_queue_7_bytes: 6525418289

I would like to ask the Azure team for a list of patches that we can try with the 4.14.51 kernel, as the LIS option is not applicable to us.
Please let me know if I can provide any additional details.
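
To judge whether 'tx_send_full' and 'rx_comp_busy' are still climbing, it is more informative to compare two `ethtool -S` samples taken some time apart than to read a single dump. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the parser only understands the `name: value` layout shown above (including the trailing `<<<<` annotations), and the second sample's numbers are made up for illustration.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # prompts and other lines without a colon
        # Drop hand-written "<<<<" annotations that follow the number.
        number = value.split("<")[0].strip()
        if number.isdigit():
            stats[name.strip()] = int(number)
    return stats


def counter_deltas(before, after, names=("tx_send_full", "rx_comp_busy")):
    """Return how much each named counter grew between two samples."""
    return {n: after[n] - before[n] for n in names if n in before and n in after}


# First sample uses values from the dump above; the second is hypothetical.
first = parse_ethtool_stats("""\
NIC statistics:
tx_busy: 0
tx_send_full: 417979<<<<<<<<<<<<<<<<<<<
rx_comp_busy: 36978379935<<<<<<<<<<<<<< rapid fast increments
""")
second = parse_ethtool_stats("""\
NIC statistics:
tx_busy: 0
tx_send_full: 418100
rx_comp_busy: 36978999935
""")

print(counter_deltas(first, second))
```

Dividing each delta by the sampling interval gives a rate, which is easier to compare across instances than absolute counter values.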

Backport patches for linux 4.9

Hello,
I'm trying to get the Mellanox drivers working on the 4.9.76 branch for Azure Accelerated Networking. This is what happens on a "Standard_D16_v3" instance:

[   18.827068] hv_vmbus: registering driver hv_netvsc
[   18.827989] hv_netvsc: hv_netvsc channel opened successfully
[   18.845437] hv_netvsc 000d3a2e-fcbb-000d-3a2e-fcbb000d3a2e: Send section size: 6144,     Section count:2560
[   18.846091] hv_netvsc 000d3a2e-fcbb-000d-3a2e-fcbb000d3a2e: Device MAC 00:0d:3a:2e:fc:bb     link state up
[   21.890743] hv_utils: Registering HyperV Utility Driver
[   21.890744] hv_vmbus: registering driver hv_util
[   21.891766] hv_utils: Using TimeSync version 4.0
[   21.901584] pps_core: LinuxPPS API ver. 1 registered
[   21.901596] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <[email protected]>

and later on

[   51.336993] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[   51.340963] mlx4_core: Initializing bd45:00:02.0
[   51.353391] mlx4_core bd45:00:02.0: Detected virtual function - running in slave mode
[   51.358380] mlx4_core bd45:00:02.0: Sending reset
[   51.361118] mlx4_core bd45:00:02.0: Sending vhcr0
[   51.366673] mlx4_core bd45:00:02.0: HCA minimum page size:512
[   51.370358] mlx4_core bd45:00:02.0: Timestamping is not supported in slave mode
[   51.383500] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.387946] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.393170] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.397522] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.402003] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.406938] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.411471] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.415622] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.420364] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.424936] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.429269] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.433900] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.439128] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.443571] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.448011] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.453216] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.457835] hv_pci 920132cd-7705-40a7-bd45-5d19732e5993: Request for interrupt failed: 0xc0350005
[   51.546176] mlx4_core bd45:00:02.0: Failed to initialize event queue table, aborting
[   51.560164] mlx4_core: probe of bd45:00:02.0 failed with error -12

4.4.116 patch 0003-OFED-port-to-4.4.patch has too many problems

Hi,
I am trying to enable SR-IOV in my customized Linux, whose kernel version is also 4.4.XX. I patched the code according to the README, and it took me a lot of time to apply 0003-OFED-port-to-4.4.patch, but getting the result to compile is even harder. I have had to solve many problems. One of the big problems I am facing now is that you seem to ignore the file linux-4.4/include/linux/rhashtable.h and create a new file named linux-4.4/include/linux/mlx_rhashtable.h, which duplicates most of the interfaces in rhashtable.h. The two files conflict with each other, and you also changed many interfaces in rhashtable.h, so it is impossible to get the code to compile.
Could someone look into this issue?

Q: what is tx_send_full?

Hi there,
I am tracing a production network issue in the Azure cloud. We see random outbound timeouts on some of the VMs; on those VMs, tcpdump shows a lot of SYN retransmissions to external hosts.

When we look at the output of "ethtool -S eth0", the only abnormal stat is tx_send_full:

NIC statistics:
     tx_scattered: 0
     tx_no_memory: 0
     tx_no_space: 0
     tx_too_big: 0
     tx_busy: 0
     tx_send_full: 17464

I traced the kernel code, and this seems to be where it starts. Can you explain what this metric is and why it increases?
