Comments (24)

dcui commented on August 16, 2024

Does the unmodified upstream 4.4 kernel (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v4.4.159) have the same issue in your environment?

from lis-next.

capsl0cker commented on August 16, 2024

dcui commented on August 16, 2024

On Windows Server 2012 R2, my VM (4.19.0-rc5+) works fine with KVP (Hyper-V Manager's Networking tab can show the VM's IP, and "strings /var/lib/hyperv/.kvp_pool_*" shows the correct info like PhysicalHostName), and I also get the warning:

Hyper-V Data Exchange connected to virtual machine 'u1804', but the version does not match the version expected by Hyper-V (Virtual machine ID A9EF4C51-FCE6-4A55-A4F1-7AF166FCF97E). Framework version: Negotiated (3.0) - Expected (3.0); Message version: Negotiated (4.0) - Expected (5.0). This is an unsupported configuration. This means that technical support will not be provided until this problem is resolved. To fix this problem, upgrade the integration services. To upgrade, connect to the virtual machine and select Insert Integration Services Setup Disk from the Action menu.

As I recall, this warning can be safely ignored for Linux VMs.
Something else must be wrong with your VM.

Please test the latest 4.19.0-rc5 code, or test the standard Ubuntu distro (like http://releases.ubuntu.com/16.04/. I remember Ubuntu 16.04 uses the 4.x kernel: https://askubuntu.com/questions/517136/list-of-ubuntu-versions-with-corresponding-linux-kernel-version)

capsl0cker commented on August 16, 2024

capsl0cker commented on August 16, 2024

dcui commented on August 16, 2024

Looks like a bug in the 4.x kernel.

I can reproduce the issue ("strings /var/lib/hyperv/.kvp_pool_*" returns nothing) on WS 2012 R2 with a newly-created Ubuntu 14.04.5 VM installed from http://releases.ubuntu.com/14.04/ubuntu-14.04.5-desktop-amd64.iso. Note: in my test, the kvp daemon itself is working, i.e. Hyper-V Manager's Networking tab can show the VM's IP correctly.

dcui commented on August 16, 2024

BTW, Ubuntu 14.04.5's kernel version here is:
4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016

Ubuntu 16.04.5 (4.15.x kernel) does not reproduce the issue: http://releases.ubuntu.com/16.04/ubuntu-16.04.5-desktop-amd64.iso

dcui commented on August 16, 2024

Unfortunately, I don't have time to follow up for now. I hope somebody else on the team can help...
@capsl0cker I suggest you compare the code diff between 4.4.x and 4.15.x

capsl0cker commented on August 16, 2024

I think the root cause is:

  1. The Hyper-V host seems to send the OP_SET commands only once.
  2. If the user-space kvp daemon has not registered yet when the Hyper-V host sends an OP_SET command, kvp's state is < HVUTIL_READY and the kernel replies to the host with HV_E_FAIL, so these commands are lost.
    The code is below:

    kvp_transaction.kvp_msg = kvp_msg;

    if (kvp_transaction.state < HVUTIL_READY) {
        /* Userspace is not registered yet */
        kvp_respond_to_host(NULL, HV_E_FAIL);
        return;
    }
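
The race described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `model_state`, `model_host_request` and `model_daemon_register` are hypothetical names, and the enum values are simplified stand-ins for the kernel's definitions.

```c
#include <stdio.h>

/* Simplified stand-ins for the kernel definitions; only the ordering of the
 * states matters in this model, not their real values. */
enum hvutil_state { HVUTIL_DEVICE_INIT = 0, HVUTIL_READY };
enum host_status { HV_S_OK = 0, HV_E_FAIL = 1 };

/* Hypothetical model state: the daemon has not registered yet. */
static enum hvutil_state model_state = HVUTIL_DEVICE_INIT;

/* Models the check quoted above: a host request that arrives before the
 * daemon has registered is answered with a failure. Per the observation
 * above, the host does not appear to resend it. */
static enum host_status model_host_request(const char *op)
{
	if (model_state < HVUTIL_READY) {
		printf("%s: daemon not registered, replying HV_E_FAIL\n", op);
		return HV_E_FAIL;
	}
	printf("%s: forwarded to the daemon\n", op);
	return HV_S_OK;
}

/* Models the daemon completing its registration handshake. */
static void model_daemon_register(void)
{
	model_state = HVUTIL_READY;
}
```

Calling `model_host_request("OP_SET")` before `model_daemon_register()` returns `HV_E_FAIL`, while the same call after registration returns `HV_S_OK`. If the host really sends OP_SET only once, the early request is the one that matters, which would explain the empty pool files.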

Below is my fix, and it works:

--- a/drivers/hv/hv_kvp.c 2018-10-07 23:44:39.855507393 +0800
+++ b/drivers/hv/hv_kvp.c 2018-10-07 23:48:26.605510551 +0800
@@ -155,7 +155,7 @@
 	pr_debug("KVP: userspace daemon ver. %d registered\n",
 		 KVP_OP_REGISTER);
 	kvp_register(dm_reg_value);
-	kvp_transaction.state = HVUTIL_READY;
+	hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
 
 	return 0;
 }
@@ -596,7 +596,9 @@
 	int util_fw_version;
 	int kvp_srv_version;
 
-	if (kvp_transaction.state > HVUTIL_READY)
+	kvp_transaction.recv_channel = channel;
+
+	if (kvp_transaction.state != HVUTIL_READY)
 		return;
 
 	vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 4, &recvlen,
@@ -640,7 +642,6 @@
 	 */
 
 	kvp_transaction.recv_len = recvlen;
-	kvp_transaction.recv_channel = channel;
 	kvp_transaction.recv_req_id = requestid;
 	kvp_transaction.kvp_msg = kvp_msg;

What do you think of this? If it is okay, I can submit a patch to the upstream 4.4 stable kernel.
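
The idea behind the patch above (remember the channel, leave early requests queued, and poll the channel once the daemon registers) can be modelled in user space roughly as follows. This is a simplified sketch under assumptions: `struct model_channel`, `model_on_channel_callback`, `model_register` and the counters are hypothetical names, and the real driver sets the READY state inside its poll wrapper rather than in the registration path.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's state enum. */
enum hvutil_state { HVUTIL_DEVICE_INIT = 0, HVUTIL_READY };

/* Hypothetical channel model: counts host requests that are still queued. */
struct model_channel {
	int pending;
};

static enum hvutil_state model_state = HVUTIL_DEVICE_INIT;
static struct model_channel *model_recv_channel;
static int model_delivered; /* requests handed to the daemon */

/* Models the patched callback: record the channel first, then return
 * without replying if the daemon is not ready, so nothing is failed. */
static void model_on_channel_callback(struct model_channel *ch)
{
	model_recv_channel = ch;

	if (model_state != HVUTIL_READY)
		return;

	while (ch->pending > 0) {
		ch->pending--;
		model_delivered++;
	}
}

/* Models registration: mark the daemon ready, then poll the remembered
 * channel so requests that arrived early are processed now. */
static void model_register(void)
{
	model_state = HVUTIL_READY;
	if (model_recv_channel != NULL)
		model_on_channel_callback(model_recv_channel);
}
```

With two requests queued before registration, the callback delivers nothing and leaves both pending; after `model_register()` both are delivered. The design choice being illustrated is deferring the requests instead of answering them with a failure.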

dcui commented on August 16, 2024

@capsl0cker :
Instead of accepting "new" patches, the maintainers of the stable kernels usually ask for the upstream patches that fixed the issue.

For this particular issue, according to my test, you can cherry-pick these 3 upstream patches from Linus's tree onto v4.4.159:
2d0c3b5ad739 ("Drivers: hv: utils: Invoke the poll function after handshake")
b9830d120cbe ("Drivers: hv: util: Pass the channel information during the init call")
4dbfc2e68004 ("Drivers: hv: kvp: fix IP Failover")

But it looks like v4.4.159 lacks a lot of fixes... I'm not sure if the maintainer will accept so many backport requests:

$ git log v4.4.159..v4.17-rc7 --oneline -- drivers/hv/hv_util.c drivers/hv/hv_kvp.c drivers/hv/hv_snapshot.c drivers/hv/hv_fcopy.c drivers/hv/hv_utils_transport.h
549e658 Drivers: hv: fcopy: restore correct transfer length
ddce54b Drivers: hv: kvp: Use MAX_ADAPTER_ID_SIZE for translating adapter id
1d10602 hv_utils: fix TimeSync work on pre-TimeSync-v4 hosts
4f9bac0 hv_utils: drop .getcrosststamp() support from PTP driver
a3ade8c HV: properly delay KVP packets when negotiation is in progress
57c0eab Merge 4.11-rc4 into char-misc-next
bdc1dd4 vmbus: fix spelling errors
8b1f91f vmbus: remove useless return's
5a16dfc Drivers: hv: util: don't forget to init host_ts.lock
e9c18ae Drivers: hv: util: move waiting for release to hv_utils_transport itself
b71e328 vmbus: add direct isr callback mode
bb6a4db Drivers: hv: util: Fix a typo
3716a49 hv_utils: implement Hyper-V PTP source
1274a69 Drivers: hv: Log the negotiated IC versions.
a165645 Drivers: hv: vmbus: Use all supported IC versions to negotiate
1724462 hv_util: switch to using timespec64
305f754 Drivers: hv: util: Use hv_get_current_tick() to get current tick
d77044d Drivers: hv: util: Backup: Fix a rescind processing issue
20951c7 Drivers: hv: util: Fcopy: Fix a rescind processing issue
5a66fec Drivers: hv: util: kvp: Fix a rescind processing issue
b357fd3 Drivers: hv: vss: Operation timeouts should match host expectation
23d2cc0 Drivers: hv: vss: Improve log messages.
3da0401b Drivers: hv: utils: Fix the mapping between host version and protocol to use
407a3ae hv: do not lose pending heartbeat vmbus packets
3ba1eb1 Drivers: hv: hv_util: Avoid dynamic allocation in time synch
8e1d260 Drivers: hv: utils: Support TimeSync version 4.0 protocol samples.
2e338f7 Drivers: hv: utils: Use TimeSync samples to adjust the clock after boot.
abeda47 Drivers: hv: utils: Rename version definitions to reflect protocol version.
db886e4 Drivers: hv: utils: Check VSS daemon is listening before a hot backup
497af84 Drivers: hv: utils: Continue to poll VSS channel after handling requests.
e0fa3e5 Drivers: hv: utils: fix a race on userspace daemons registration
4dbfc2e Drivers: hv: kvp: fix IP Failover
b9830d1 Drivers: hv: util: Pass the channel information during the init call
a150256 Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode
a72f3a4 Drivers: hv: utils: rename outmsg_lock
2d0c3b5 Drivers: hv: utils: Invoke the poll function after handshake
ed9ba60 Drivers: hv: vss: run only on supported host versions
3cace4a Drivers: hv: utils: run polling callback always in interrupt context
c0b200c Drivers: hv: util: Increase the timeout for util services

capsl0cker commented on August 16, 2024

capsl0cker commented on August 16, 2024

dcui commented on August 16, 2024

So what's the proper way to deal with this issue?

Maybe we should identify the minimal set of patches that fixes the issue, and ask the stable-kernel maintainers for a backport.

dcui commented on August 16, 2024

Hi, it seems these 3 patches alone are not enough. We still need to poll the channel after negotiating the version (the bold line in the patch below). Otherwise, I sometimes find that the kernel can't get other messages after negotiating the version. I don't know why.

According to the change log of 4dbfc2e68004 ("Drivers: hv: kvp: fix IP Failover"), the kvp daemon needs to start promptly, i.e. within 75 seconds of the hv_utils driver loading. I'm not sure whether your daemon starts in time.

dcui commented on August 16, 2024

About "Upstream also does similar things" -- which upstream repo/branch are you using? The latest mainline kernel is supposed to work fine, but recently there is a regression caused by https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/commit/?h=char-misc-next&id=fc62c3b1977d62e6374fd6e28d371bb42dfa5c9d. If your branch contains this buggy patch, please "git revert" it for now.

capsl0cker commented on August 16, 2024

capsl0cker commented on August 16, 2024

dcui commented on August 16, 2024

Ok, then let's add the 4th patch which should be backported from the upstream to the 4.4 branch:
a3ade8cc474d ("HV: properly delay KVP packets when negotiation is in progress")

dcui commented on August 16, 2024

I made a list of patches, which should be backported to v4.4:

1 b003596 Drivers: hv: utils: use memdup_user in hvt_op_write
2 2d0c3b5 Drivers: hv: utils: Invoke the poll function after handshake
3 1f75338 Drivers: hv: utils: fix memory leak on on_msg() failure
4 a72f3a4 Drivers: hv: utils: rename outmsg_lock
5 a150256 Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode
6 9420098 Drivers: hv: utils: fix crash when device is removed from host side
7 77b744a Drivers: hv: utils: fix hvt_op_poll() return value on transport destroy
8 b9830d1 Drivers: hv: util: Pass the channel information during the init call
9 e66853b Drivers: hv: utils: Remove util transport handler from list if registration fails
10 4dbfc2e Drivers: hv: kvp: fix IP Failover
11 e0fa3e5 Drivers: hv: utils: fix a race on userspace daemons registration
12 497af84 Drivers: hv: utils: Continue to poll VSS channel after handling requests.
13 db886e4 Drivers: hv: utils: Check VSS daemon is listening before a hot backup
14 abeda47 Drivers: hv: utils: Rename version definitions to reflect protocol version.
15 2e338f7 Drivers: hv: utils: Use TimeSync samples to adjust the clock after boot.
16 8e1d260 Drivers: hv: utils: Support TimeSync version 4.0 protocol samples.
17 3ba1eb1 Drivers: hv: hv_util: Avoid dynamic allocation in time synch
18 3da0401b Drivers: hv: utils: Fix the mapping between host version and protocol to use
19 23d2cc0 Drivers: hv: vss: Improve log messages.
20 b357fd3 Drivers: hv: vss: Operation timeouts should match host expectation
21 1724462 hv_util: switch to using timespec64
22 a165645 Drivers: hv: vmbus: Use all supported IC versions to negotiate
23 1274a69 Drivers: hv: Log the negotiated IC versions.
24 bb6a4db Drivers: hv: util: Fix a typo
25 e9c18ae Drivers: hv: util: move waiting for release to hv_utils_transport itself
26 bdc1dd4 vmbus: fix spelling errors
27 ddce54b Drivers: hv: kvp: Use MAX_ADAPTER_ID_SIZE for translating adapter id
28 a3ade8c HV: properly delay KVP packets when negotiation is in progress

Except for the last one, the patches (the first 27) can be cherry-picked cleanly from the mainline onto the latest linux-4.4.y branch.

Patch-28 needs a lot of big supporting patches... :-(

In my test, KVP works fine for me without patch-28.

dcui commented on August 16, 2024

So we either ask the 4.4 kernel maintainer to take the 27 patches, or we ask the maintainer to take the smaller list of 3 patches:
2d0c3b5 ("Drivers: hv: utils: Invoke the poll function after handshake")
b9830d1 ("Drivers: hv: util: Pass the channel information during the init call")
4dbfc2e ("Drivers: hv: kvp: fix IP Failover")

I'm not sure what we should do with the 28th patch:
a3ade8c HV: properly delay KVP packets when negotiation is in progress

dcui commented on August 16, 2024

I pushed the 27 patches + 1 reworked patch (the top commit) to my github branch:
https://github.com/dcui/linux/commits/decui/v4.4.159

@capsl0cker : can you please test the 28 patches?

Please also test cherry-picking
2d0c3b5 ("Drivers: hv: utils: Invoke the poll function after handshake")
b9830d1 ("Drivers: hv: util: Pass the channel information during the init call")
4dbfc2e ("Drivers: hv: kvp: fix IP Failover")
and cherry-picking the re-worked patch (dcui/linux@2a48252)

If everything works fine for you, I'll ask the 4.4 maintainer (e.g. Greg KH) if the 28 patches are acceptable; if not, I'll ask if the 4 patches are acceptable.

capsl0cker commented on August 16, 2024

dcui commented on August 16, 2024

This is the 4-patch version: https://github.com/dcui/linux/commits/decui/v4.4.160-with-4-patches
This is the 28-patch version (which includes more fixes): https://github.com/dcui/linux/commits/decui/v4.4.160-with-28-patches

dcui commented on August 16, 2024

I emailed '[email protected]' with the backport request:
[PATCH] [linux-4.4.y only] HV: properly delay KVP packets when negotiation is in progress
There has been no reply so far.

Let's close this github issue as we have addressed the kvp issue, and here the affected kernel (v4.4) is not supported by this lis-next repo, which is only for RHEL/CentOS.
