
Comments (24)

jglathe avatar jglathe commented on July 24, 2024 1

Both these commits are in the 6.6 tree, too. Without the latter no USB-A ports on the wdk.
Digging a little, I found that my 6.5.6 build is the last one doing fine with thermals. It's also the last one where I see CPU core temps. Similar behaviour on the x13s, except that I also see memory temp there.
Digging a little into the source code, there appears to be an oddity: the thermal sensors of the Snapdragon SoC are handled in drivers/thermal/qcom/tsens.c. The Kconfig symbol for it is CONFIG_QCOM_TSENS. If you run make menuconfig, QCOM_TSENS is not shown as an option, but you can search for it (and its menu path is orphaned). And after you save a changed configuration (for example, after enabling FW zstd compression), this option is gone from your config.
I just checked, and yes: if CONFIG_QCOM_TSENS is actually present in your .config (and you didn't hack it in there manually), the darn thing gets built and I have CPU core temps back 😀. Next will be to distill a useful defconfig file and revised build instructions (and maybe a bug report). Oh well.

jglathe@snapdragix:~$ sensors
pm8280_2_thermal-virtual-0
Adapter: Virtual device
temp1:        +37.0°C  

cpu7_thermal-virtual-0
Adapter: Virtual device
temp1:        +47.6°C  

cpu2_thermal-virtual-0
Adapter: Virtual device
temp1:        +48.0°C  

cpu5_thermal-virtual-0
Adapter: Virtual device
temp1:        +48.3°C  

cpu0_thermal-virtual-0
Adapter: Virtual device
temp1:        +47.6°C  

cpu3_thermal-virtual-0
Adapter: Virtual device
temp1:        +47.3°C  

ath11k_hwmon-pci-60100
Adapter: PCI adapter
temp1:            N/A  

qcom_battmgr_bat-virtual-0
Adapter: Virtual device
in0:           8.80 V  
temp:         +30.1°C  

skin_temp_thermal-virtual-0
Adapter: Virtual device
temp1:        -41.0°C  

nvme-pci-20100
Adapter: PCI adapter
Composite:    +46.9°C  (low  = -40.1°C, high = +83.8°C)
                       (crit = +87.8°C)
Sensor 1:     +55.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +46.9°C  (low  = -273.1°C, high = +65261.8°C)

cluster0_thermal-virtual-0
Adapter: Virtual device
temp1:        +47.6°C  

pm8280_1_thermal-virtual-0
Adapter: Virtual device
temp1:        +37.0°C  

cpu6_thermal-virtual-0
Adapter: Virtual device
temp1:        +47.3°C  

cpu1_thermal-virtual-0
Adapter: Virtual device
temp1:        +47.0°C  

cpu4_thermal-virtual-0
Adapter: Virtual device
temp1:        +47.6°C  

mem_thermal-virtual-0
Adapter: Virtual device
temp1:        +46.6°C  

qcom_battmgr_bat-virtual-0
Adapter: Virtual device
ERROR: Can't get value of subfeature in0_input: Kernel interface error
in0:              N/A  
temp1:        +30.1°C  
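
To repeat the check described above on another build, here is a minimal sketch that reports how a symbol ended up in a .config. The function name config_state is made up for illustration. It assumes the usual .config conventions: a set symbol appears as CONFIG_X=value, a visible but disabled one as "# CONFIG_X is not set", and a symbol whose dependencies are unmet does not appear at all.

```shell
#!/bin/sh
# config_state FILE SYMBOL
# Prints the value of CONFIG_<SYMBOL> (y, m, or a literal value),
# "n" if the symbol is explicitly not set, or "absent" if the
# symbol does not appear in FILE at all.
config_state() {
    file="$1"
    sym="$2"
    if grep -q "^CONFIG_${sym}=" "$file"; then
        # e.g. CONFIG_QCOM_TSENS=y -> y
        sed -n "s/^CONFIG_${sym}=//p" "$file" | head -n 1
    elif grep -q "^# CONFIG_${sym} is not set" "$file"; then
        echo "n"
    else
        echo "absent"
    fi
}
```

For example, config_state .config QCOM_TSENS should print y on a working build, and absent when the option has silently dropped out of the configuration.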

from linux_ms_dev_kit.

xlazom00 avatar xlazom00 commented on July 24, 2024 1

So it looks like a bug in the Makefile dependencies for CONFIG_QCOM_TSENS.


jglathe avatar jglathe commented on July 24, 2024 1

Docker works, yes. Thought I could get /dev/kvm somehow, but unlikely. Although if you need a VM for (whatever) it would be cool to have it. It's research, like this whole thing.
I'm still not quite satisfied with the cooling behaviour. I have seen (and heard) it rev up gradually over time, not like now, with these emergency coolings and nothing really in between. Will try to investigate on sdbox2 with a 23.04 installation; maybe it's another oddity from the release change to 23.10.


jglathe avatar jglathe commented on July 24, 2024 1

ath11k_hwmon is for the QCNFA765 wireless adapter. On Windows it loads a whole bunch of stuff to behave, among other things a "thermal mitigation driver". I guess if you hammer it with bandwidth you can get it pretty hot.


xlazom00 avatar xlazom00 commented on July 24, 2024

@jglathe
I don't have this problem with the 6.6.0 kernel. In my case the whole SoC just slows down the CPU frequency and I hear noise from the internal fan.

But the thermal sensors don't work. Any progress with this?


xlazom00 avatar xlazom00 commented on July 24, 2024

I found that the best cooling option is to have it upside down, as heated air can then leave the case through the ventilation holes on the bottom side.


jglathe avatar jglathe commented on July 24, 2024

Appears to be intermittent, no real handle on it yet. It seems to be a timing issue connected to the userspace services and the coprocessors. Sometimes it will cool diligently as soon as the load goes up, starting silent and gradually increasing over time; sometimes it won't and does only the emergency blasts, which may not be enough. Documentation is still not really there.


xlazom00 avatar xlazom00 commented on July 24, 2024

@jglathe you wrote that the temp sensors work fine in the 6.5 kernel version;
maybe it is related to this change: steev@41c1855

And maybe we can also enable USB MP
steev@e9bb9fe


xlazom00 avatar xlazom00 commented on July 24, 2024

I know that both commits are in tree but channel renaming isn't in sc8280xp-microsoft-dev-kit-2023.dts

this one
steev@41c1855

I did check 2.


jglathe avatar jglathe commented on July 24, 2024

And it really stubbornly ends up not set on 6.6:


jglathe@snapdragix:~/src/linux_ms_dev_kit$ make -j8 devkit_defconfig
#
# configuration written to .config
#
jglathe@snapdragix:~/src/linux_ms_dev_kit$ grep TSENS .config
jglathe@snapdragix:~/src/linux_ms_dev_kit$ make -j8 defconfig
*** Default configuration is based on 'defconfig'
#
# configuration written to .config
#
jglathe@snapdragix:~/src/linux_ms_dev_kit$ grep TSENS .config
CONFIG_QCOM_TSENS=y


jglathe avatar jglathe commented on July 24, 2024

I know that both commits are in tree but channel renaming isn't in sc8280xp-microsoft-dev-kit-2023.dts

this one steev@41c1855

I did check 2.

Okay, I'll amend this.


xlazom00 avatar xlazom00 commented on July 24, 2024

@jglathe
I found that
CONFIG_QCOM_TSENS depends on NVMEM_QCOM_QFPROM
https://github.com/jglathe/linux_ms_dev_kit/blob/jg/wdk2023-6.6.1/drivers/thermal/qcom/Kconfig#L4
and NVMEM_QCOM_QFPROM is missing in your config


jglathe avatar jglathe commented on July 24, 2024

Yeah, I've seen that, and adding it just gives "unexpected data" but no change. Removing the dependency on NVMEM_QCOM_QFPROM from Kconfig actually helps. And this QFPROM option appears to be independent of / to have vanished from menuconfig, too.
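
To see which "depends on" lines gate a symbol before editing Kconfig, something like the following can help. It is only a rough sketch: the function name kconfig_deps is made up, and it looks only at "depends on" lines directly inside the symbol's own config entry, so it will miss dependencies inherited from an enclosing "if" or "menu" block.

```shell
#!/bin/sh
# kconfig_deps FILE SYMBOL
# Prints the expression of every "depends on" line found inside the
# "config SYMBOL" (or "menuconfig SYMBOL") entry of a Kconfig file.
kconfig_deps() {
    file="$1"
    sym="$2"
    awk -v sym="$sym" '
        # Each config/menuconfig line starts a new entry; track whether it is ours.
        $1 == "config" || $1 == "menuconfig" { in_block = ($2 == sym); next }
        # Inside our entry, strip the "depends on" prefix and print the rest.
        in_block && $1 == "depends" && $2 == "on" {
            sub(/^[ \t]*depends on[ \t]+/, "")
            print
        }
    ' "$file"
}
```

Run from the top of the kernel tree, kconfig_deps drivers/thermal/qcom/Kconfig QCOM_TSENS should list the dependency mentioned above.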


jglathe avatar jglathe commented on July 24, 2024

@xlazom00 thank you for the debug. Looks like it's working now.

jglathe@snapdragix:~$ uname -a
Linux snapdragix 6.6.2+ #7 SMP PREEMPT Fri Nov 24 14:11:31 CET 2023 aarch64 aarch64 aarch64 GNU/Linux
jglathe@snapdragix:~$ sensors
pm8280_2_thermal-virtual-0
Adapter: Virtual device
temp1:        +37.0°C  

mem_thermal-virtual-0
Adapter: Virtual device
temp1:        +42.6°C  

cpu3_thermal-virtual-0
Adapter: Virtual device
temp1:        +43.3°C  

cpu7_thermal-virtual-0
Adapter: Virtual device
temp1:        +42.6°C  

cpu1_thermal-virtual-0
Adapter: Virtual device
temp1:        +42.9°C  

cpu5_thermal-virtual-0
Adapter: Virtual device
temp1:        +43.3°C  

skin_temp_thermal-virtual-0
Adapter: Virtual device
temp1:        -41.0°C  

nvme-pci-20100
Adapter: PCI adapter
Composite:    +39.9°C  (low  = -40.1°C, high = +83.8°C)
                       (crit = +87.8°C)
Sensor 1:     +52.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +39.9°C  (low  = -273.1°C, high = +65261.8°C)

pm8280_1_thermal-virtual-0
Adapter: Virtual device
temp1:        +37.0°C  

cpu4_thermal-virtual-0
Adapter: Virtual device
temp1:        +43.6°C  

cluster0_thermal-virtual-0
Adapter: Virtual device
temp1:        +43.9°C  

cpu2_thermal-virtual-0
Adapter: Virtual device
temp1:        +42.3°C  

cpu6_thermal-virtual-0
Adapter: Virtual device
temp1:        +42.9°C  

cpu0_thermal-virtual-0
Adapter: Virtual device
temp1:        +43.6°C  

ath11k_hwmon-pci-60100
Adapter: PCI adapter
temp1:            N/A  

qcom_battmgr_bat-virtual-0
Adapter: Virtual device
in0:           8.80 V  
temp:         +30.1°C  

Will test a bit, then probably close it. We could try to get ath11k_hwmon to do something useful, too, but I guess soundwire over dp would be of greater benefit. Or the VM support with gunyah (if possible at all).


xlazom00 avatar xlazom00 commented on July 24, 2024

ath11k_hwmon? What is it a thermal monitor for?
Why gunyah? Docker is good enough 😁


jglathe avatar jglathe commented on July 24, 2024

hmm done my test with sdbox2, it's running kernel 6.6.2 with Lunar (23.04) now, from a SATA SSD. So, state of the art Lunar setup, enough to test large compile load. And, it's scaling smoothly with the fan. Lots of reserves, no emergency blasts.

Screenshot from 2023-11-24 20-16-12

Now, any ideas on how to go about debugging this on 23.10? I mean, we got our QCOM_TSENS devices back, that's something. But I'd like to understand this further.

On 23.10, we also have qrtr-ns and pd-mapper running as services. Any help or ideas would be appreciated.


jglathe avatar jglathe commented on July 24, 2024

oh look, ath11k temp also works on 23.04.


xlazom00 avatar xlazom00 commented on July 24, 2024

[ 7.890048] ath11k_pci 0006:01:00.0: BAR 0: assigned [mem 0x30400000-0x305fffff 64bit]
[ 7.890155] ath11k_pci 0006:01:00.0: enabling device (0000 -> 0002)
[ 7.893791] ath11k_pci 0006:01:00.0: MSI vectors: 32
[ 7.893828] ath11k_pci 0006:01:00.0: wcn6855 hw2.1
[ 8.867216] ath11k_pci 0006:01:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211
[ 8.867223] ath11k_pci 0006:01:00.0: fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
[ 8.969033] ath11k_pci 0006:01:00.0: no success fetching board data for bus=pci,vendor=17cb,device=1103,subsystem-vendor=17cb,subsystem-device=0108,qmi-chip-id=18,qmi-board-id=255,variant=volterra
[ 9.791193] ath11k_pci 0006:01:00.0 wlP6p1s0: renamed from wlan0

maybe related to this


jglathe avatar jglathe commented on July 24, 2024

Nah. That‘s my debug message 😃 it says your board-2.bin doesn’t contain the calibration profile for this combination. I hacked up a board file that contains the X13s data as these… will be overwritten by linux-firmware when it updates ☹️ To avoid this I have placed the hacked variant into /lib/firmware/updates/ath11k…
Screenshot from 2023-11-30 06-46-02
board-2.zip
Anyway, doesn’t impact the temp measurements.
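
Why the copy under /lib/firmware/updates wins can be sketched like this. As far as I know, the kernel firmware loader tries /lib/firmware/updates/<release>, /lib/firmware/updates, /lib/firmware/<release> and /lib/firmware in that order (after any custom firmware_class.path). The function name first_firmware_match and the relative path argument are hypothetical, and compressed variants (.zst, .xz) are ignored here.

```shell
#!/bin/sh
# first_firmware_match REL_PATH [ROOT] [RELEASE]
# Prints the first existing candidate for a firmware file, walking the
# directories in the order the firmware loader is assumed to use.
# ROOT defaults to /lib/firmware, RELEASE to the running kernel.
first_firmware_match() {
    rel="$1"
    root="${2:-/lib/firmware}"
    release="${3:-$(uname -r)}"
    for dir in "$root/updates/$release" "$root/updates" "$root/$release" "$root"; do
        if [ -e "$dir/$rel" ]; then
            printf '%s\n' "$dir/$rel"
            return 0
        fi
    done
    # Nothing found in any of the candidate directories.
    return 1
}
```

Called with the board file's path relative to /lib/firmware, it shows whether the hacked variant or the one shipped by linux-firmware would be picked up.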


jglathe avatar jglathe commented on July 24, 2024

A strange tale of recovery images, internal SSDs and power management behaviour on Linux

What a weekend. I think I narrowed down the odd behaviour of my wdk regarding cooling. One (the first bought and now thoroughly used looking) has been opened to replace internal SSD with larger models (2TB Micron 2400 as of recently). The other one, sdbox2, was bought this summer via ebay and looked quite pristine despite being sold as used in a non-original packaging. The SSD statistics confirmed that it was barely used. Anyway, sdbox2 got a recovery image treatment and has never been opened to replace the SSD yet. The Linux is usually booted from an external USB SSD enclosure, with GRUB on the external SSD, which works nicely and can also boot the Windows of the internal SSD. On sdbox2, the fan behaviour is smooth, cooling is efficient and mostly silent, with enough headroom for peaks (I guess). No full blasts.
On the original box, volterra, I eventually noticed these blasts. No smooth fan use (on Linux). Temperatures will grow gradually until ~85°C, then it will blast full power until it reaches 60°C, then switch off. And repeat. Occasionally it will get too hot and something will crash, resulting in a read-only filesystem.
Anyway. I refitted volterra with a different SSD, did the recovery treatment, rebooted with the external Linux SSD (23.04), tested the temps and it operated smoothly. I replaced the SSD with the previous one, rebooted to 23.04 (on the internal SSD) and it worked in blasts. Booted from external SSD (23.04) and it worked in blasts. Replaced the SSD back to the previously recovery-treated, booted from external SSD and it worked in blasts. What? 🤨 I did the recovery treatment again, didn't replace the SSD, booted from external SSD and... it worked smoothly again.

Findings

On the root cause I can only speculate, I guess it's some sort of signing / hardware signature thing. But there definitely is a connection between the SSD contents as created by the recovery image and the operation of the power management.

  • Resizing the Windows partition appears to be allowed
  • Installing GRUB on the existing EFI partition appears to be allowed
  • Adding a Linux partition appears to be allowed
  • Copying the whole structure over to a bigger SSD (with dd or more sophisticated tools) breaks power management.
  • Replacing the SSD with another working version breaks power management.

So, beware. First step after internal SSD change seems to be recovery treatment to ensure you get working power management on Linux. Fascinating.


jglathe avatar jglathe commented on July 24, 2024

You know what affects ath11k_hwmon-pci-60100 measurements? Actually, rfkill does 😵‍💫 Another mystery solved.
Screenshot from 2023-12-04 13-22-58
Seen it on 23.10 only, though. Quite relieved.


jglathe avatar jglathe commented on July 24, 2024

Update re 23.10 and thermal management: The same applies as told in the cautionary tale of recovery images. 23.10 on USB boot, with core temps thanks to CONFIG_QCOM_TSENS being enabled, behaves well now.


jglathe avatar jglathe commented on July 24, 2024

Update to add: If you fail to mount the EFI partition in Linux for whatever reason, you're fscked. The hardware binding is broken at this moment. Still no idea what the reason is, but the WDK has an EC from Microsoft, and it might check a thing or two. To restore the binding you need to boot to Windows. It appears to repair or reset the hardware binding.
There's some research underway to access the EC and to get fan speeds. Let's see what comes of this.


jglathe avatar jglathe commented on July 24, 2024

For a quick test how the temperatures are, I use this script:
temperatures.sh.txt
and run it with watch -n 1 "./temperatures.sh |sort"
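
The attached temperatures.sh.txt is not reproduced in this thread, so the following is not that script. It is only a minimal sketch of one way to get similar output, assuming the standard sysfs thermal zone layout (a type file and a temp file in millidegrees Celsius under /sys/class/thermal/thermal_zone*). The function name print_thermal_zones is made up for illustration.

```shell
#!/bin/sh
# print_thermal_zones [BASE_DIR]
# Prints "<zone type> <temperature in degrees C>" for every thermal zone
# found under BASE_DIR (default: /sys/class/thermal).
print_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones (or an unmatched glob) without readable files.
        [ -r "$zone/type" ] && [ -r "$zone/temp" ] || continue
        name=$(cat "$zone/type")
        # Some zones fail on read; skip those instead of aborting.
        millideg=$(cat "$zone/temp" 2>/dev/null) || continue
        # sysfs reports millidegrees Celsius; let awk do the division.
        awk -v n="$name" -v m="$millideg" 'BEGIN { printf "%s %.1f\n", n, m / 1000 }'
    done
}
```

Saved as a script that calls print_thermal_zones at the end, it can be run under watch and piped through sort in the same way as above.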
