Comments (24)
Both of these commits are in the 6.6 tree, too. Without the latter, the USB-A ports on the wdk don't work.
Digging a little, I found that my 6.5.6 build is the last one doing fine with thermals. It's also the last one where I see CPU core temps. Similar behaviour on the x13s, except that I also see memory temp there.
Digging a little into the source code, there appears to be an oddity: the thermal sensors of the Snapdragon SoC are handled in drivers/thermal/qcom/tsens.c. The Kconfig symbol for it is CONFIG_QCOM_TSENS. If you use make menuconfig, QCOM_TSENS is not shown as an option, but you can search for it (and its path is orphaned). And after saving your changed configuration (for example, after enabling FW zstd compression), this option will be gone from your config.
I just made the check, and yes, if CONFIG_QCOM_TSENS is actually present in your .config (and you didn't hack it there manually), the darn thing builds it and I have CPU core temps back 😀. Next will be to distill a useful defconfig file and revised build instructions (and maybe a bug report). Oh well.
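To verify whether the symbol survived a configuration save, a small helper along these lines can be used. This is only a sketch: the function name config_state is made up for illustration, and it assumes the usual .config conventions (CONFIG_FOO=y or =m when enabled, "# CONFIG_FOO is not set" when disabled).

```shell
#!/bin/sh
# Report whether a Kconfig symbol is set in a kernel .config file.
# Usage: config_state SYMBOL [CONFIG_FILE]   (hypothetical helper name)
config_state() {
    symbol="$1"
    config_file="${2:-.config}"
    if grep -q "^CONFIG_${symbol}=[ym]" "$config_file"; then
        # Built in (=y) or built as a module (=m)
        echo "enabled"
    elif grep -q "^# CONFIG_${symbol} is not set" "$config_file"; then
        # Visible to Kconfig, but switched off
        echo "disabled"
    else
        # Not mentioned at all, e.g. because a dependency is unmet
        echo "missing"
    fi
}
```

Running config_state QCOM_TSENS in the kernel source directory before and after make menuconfig should show whether the option was dropped on save.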
jglathe@snapdragix:~$ sensors
pm8280_2_thermal-virtual-0
Adapter: Virtual device
temp1: +37.0°C
cpu7_thermal-virtual-0
Adapter: Virtual device
temp1: +47.6°C
cpu2_thermal-virtual-0
Adapter: Virtual device
temp1: +48.0°C
cpu5_thermal-virtual-0
Adapter: Virtual device
temp1: +48.3°C
cpu0_thermal-virtual-0
Adapter: Virtual device
temp1: +47.6°C
cpu3_thermal-virtual-0
Adapter: Virtual device
temp1: +47.3°C
ath11k_hwmon-pci-60100
Adapter: PCI adapter
temp1: N/A
qcom_battmgr_bat-virtual-0
Adapter: Virtual device
in0: 8.80 V
temp: +30.1°C
skin_temp_thermal-virtual-0
Adapter: Virtual device
temp1: -41.0°C
nvme-pci-20100
Adapter: PCI adapter
Composite: +46.9°C (low = -40.1°C, high = +83.8°C)
(crit = +87.8°C)
Sensor 1: +55.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +46.9°C (low = -273.1°C, high = +65261.8°C)
cluster0_thermal-virtual-0
Adapter: Virtual device
temp1: +47.6°C
pm8280_1_thermal-virtual-0
Adapter: Virtual device
temp1: +37.0°C
cpu6_thermal-virtual-0
Adapter: Virtual device
temp1: +47.3°C
cpu1_thermal-virtual-0
Adapter: Virtual device
temp1: +47.0°C
cpu4_thermal-virtual-0
Adapter: Virtual device
temp1: +47.6°C
mem_thermal-virtual-0
Adapter: Virtual device
temp1: +46.6°C
qcom_battmgr_bat-virtual-0
Adapter: Virtual device
ERROR: Can't get value of subfeature in0_input: Kernel interface error
in0: N/A
temp1: +30.1°C
from linux_ms_dev_kit.
So it looks like there is a bug in the Makefile dependencies for CONFIG_QCOM_TSENS.
from linux_ms_dev_kit.
Docker works, yes. Thought I could get /dev/kvm somehow, but unlikely. Although if you need a VM for (whatever) it would be cool to have it. It's research, like this whole thing.
I'm still not quite satisfied with the cooling behaviour. I have seen (and heard) it rev up gradually over time, not like now, with these emergency coolings and nothing really in between. I will try to investigate on sdbox2 with a 23.04 installation; maybe it's another oddity from the release change to 23.10.
from linux_ms_dev_kit.
ath11k_hwmon is for the QCNFA765 wireless adapter. On Windows it loads a whole bunch of stuff to behave, among other things a "thermal mitigation driver". I guess if you hammer it with bandwidth you can get it pretty hot.
from linux_ms_dev_kit.
@jglathe
I don't have this problem with the 6.6.0 kernel. In my case the whole SoC just slows down the CPU frequency and I hear noise from the internal fan.
But the thermal sensors don't work. Any progress with this?
from linux_ms_dev_kit.
I found that the best cooling option is to have it upside down, as the heated air can then leave the case through the ventilation holes on the bottom side.
from linux_ms_dev_kit.
Appears to be intermittent, no real handle on it yet. It seems to be a timing issue connected to the userspace services and the coprocessors. Sometimes it will cool diligently as soon as the load goes up, starting silent and gradually increasing over time; sometimes it won't, and does only the emergency blasts, which may not be enough. Documentation is still not really there.
from linux_ms_dev_kit.
@jglathe you wrote that the temp sensors work fine in the 6.5 kernel version.
Maybe it is related to this change: steev@41c1855
And maybe we can also enable USB MP: steev@e9bb9fe
from linux_ms_dev_kit.
I know that both commits are in the tree, but the channel renaming from this one isn't in sc8280xp-microsoft-dev-kit-2023.dts: steev@41c1855
I did check 2.
from linux_ms_dev_kit.
And it is really stubborn about not getting set on 6.6:
jglathe@snapdragix:~/src/linux_ms_dev_kit$ make -j8 devkit_defconfig
#
# configuration written to .config
#
jglathe@snapdragix:~/src/linux_ms_dev_kit$ grep TSENS .config
jglathe@snapdragix:~/src/linux_ms_dev_kit$ make -j8 defconfig
*** Default configuration is based on 'defconfig'
#
# configuration written to .config
#
jglathe@snapdragix:~/src/linux_ms_dev_kit$ grep TSENS .config
CONFIG_QCOM_TSENS=y
from linux_ms_dev_kit.
I know that both commits are in tree but channel renaming isn't in sc8280xp-microsoft-dev-kit-2023.dts
this one steev@41c1855
I did check 2.
Okay, I'll amend this.
from linux_ms_dev_kit.
@jglathe
I found that CONFIG_QCOM_TSENS depends on NVMEM_QCOM_QFPROM:
https://github.com/jglathe/linux_ms_dev_kit/blob/jg/wdk2023-6.6.1/drivers/thermal/qcom/Kconfig#L4
NVMEM_QCOM_QFPROM is missing in your config.
from linux_ms_dev_kit.
Yeah, I've seen that, and adding it gives just "unexpected data" but no change. Removing the dependency on NVMEM_QCOM_QFPROM from Kconfig actually helps. And this QFPROM option appears to be independent, and to have vanished from menuconfig, too.
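To see what a symbol depends on without opening menuconfig, the Kconfig file can be scanned directly. The following is a rough sketch, not a full Kconfig parser: kconfig_depends is a hypothetical helper name, and it only handles plain "config"/"menuconfig" entries with single-line "depends on" statements.

```shell
#!/bin/sh
# Print the "depends on" expressions of one symbol in a Kconfig file.
# Usage: kconfig_depends SYMBOL KCONFIG_FILE   (hypothetical helper name)
kconfig_depends() {
    symbol="$1"
    kconfig_file="$2"
    awk -v sym="$symbol" '
        # A new config/menuconfig entry starts; track whether it is ours
        /^(menu)?config[ \t]/ { in_entry = ($2 == sym); next }
        # Inside our entry, print each dependency expression
        in_entry && /^[ \t]+depends on/ {
            sub(/^[ \t]+depends on[ \t]+/, "")
            print
        }
    ' "$kconfig_file"
}
```

For example, kconfig_depends QCOM_TSENS drivers/thermal/qcom/Kconfig lists the expressions that must be satisfied before the option can be enabled; each one can then be checked against .config.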
from linux_ms_dev_kit.
@xlazom00 thank you for the debug. Looks like it's working now.
jglathe@snapdragix:~$ uname -a
Linux snapdragix 6.6.2+ #7 SMP PREEMPT Fri Nov 24 14:11:31 CET 2023 aarch64 aarch64 aarch64 GNU/Linux
jglathe@snapdragix:~$ sensors
pm8280_2_thermal-virtual-0
Adapter: Virtual device
temp1: +37.0°C
mem_thermal-virtual-0
Adapter: Virtual device
temp1: +42.6°C
cpu3_thermal-virtual-0
Adapter: Virtual device
temp1: +43.3°C
cpu7_thermal-virtual-0
Adapter: Virtual device
temp1: +42.6°C
cpu1_thermal-virtual-0
Adapter: Virtual device
temp1: +42.9°C
cpu5_thermal-virtual-0
Adapter: Virtual device
temp1: +43.3°C
skin_temp_thermal-virtual-0
Adapter: Virtual device
temp1: -41.0°C
nvme-pci-20100
Adapter: PCI adapter
Composite: +39.9°C (low = -40.1°C, high = +83.8°C)
(crit = +87.8°C)
Sensor 1: +52.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +39.9°C (low = -273.1°C, high = +65261.8°C)
pm8280_1_thermal-virtual-0
Adapter: Virtual device
temp1: +37.0°C
cpu4_thermal-virtual-0
Adapter: Virtual device
temp1: +43.6°C
cluster0_thermal-virtual-0
Adapter: Virtual device
temp1: +43.9°C
cpu2_thermal-virtual-0
Adapter: Virtual device
temp1: +42.3°C
cpu6_thermal-virtual-0
Adapter: Virtual device
temp1: +42.9°C
cpu0_thermal-virtual-0
Adapter: Virtual device
temp1: +43.6°C
ath11k_hwmon-pci-60100
Adapter: PCI adapter
temp1: N/A
qcom_battmgr_bat-virtual-0
Adapter: Virtual device
in0: 8.80 V
temp: +30.1°C
Will test a bit, then probably close it. We could try to get ath11k_hwmon to do something useful, too, but I guess soundwire over dp would be of greater benefit. Or the VM support with gunyah (if possible at all).
from linux_ms_dev_kit.
ath11k_hwmon? What is it a thermal monitor for?
Why gunyah? Docker is good enough 😁
from linux_ms_dev_kit.
Hmm, I've done my test with sdbox2; it's running kernel 6.6.2 with Lunar (23.04) now, from a SATA SSD. So, a state-of-the-art Lunar setup, enough to test a large compile load. And the fan is scaling smoothly. Lots of reserves, no emergency blasts.
Now, any ideas on how to go about debugging this on 23.10? I mean, we got our QCOM_TSENS devices back, that's something. But I'd like to understand this further.
On 23.10, we also have qrtr-ns and pd-mapper running as services. Any help or ideas would be appreciated.
from linux_ms_dev_kit.
oh look, ath11k temp also works on 23.04.
from linux_ms_dev_kit.
[ 7.890048] ath11k_pci 0006:01:00.0: BAR 0: assigned [mem 0x30400000-0x305fffff 64bit]
[ 7.890155] ath11k_pci 0006:01:00.0: enabling device (0000 -> 0002)
[ 7.893791] ath11k_pci 0006:01:00.0: MSI vectors: 32
[ 7.893828] ath11k_pci 0006:01:00.0: wcn6855 hw2.1
[ 8.867216] ath11k_pci 0006:01:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211
[ 8.867223] ath11k_pci 0006:01:00.0: fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
[ 8.969033] ath11k_pci 0006:01:00.0: no success fetching board data for bus=pci,vendor=17cb,device=1103,subsystem-vendor=17cb,subsystem-device=0108,qmi-chip-id=18,qmi-board-id=255,variant=volterra
[ 9.791193] ath11k_pci 0006:01:00.0 wlP6p1s0: renamed from wlan0
maybe related to this
from linux_ms_dev_kit.
Nah. That's my debug message 😃 It says your board-2.bin doesn't contain the calibration profile for this combination. I hacked up a board file that contains the X13s data as these… will be overwritten by linux-firmware when it updates /lib/firmware/updates/ath11k…
board-2.zip
Anyway, doesn’t impact the temp measurements.
from linux_ms_dev_kit.
A strange tale of recovery images, internal SSDs and power management behaviour on Linux
What a weekend. I think I narrowed down the odd behaviour of my wdk regarding cooling. One (the first bought and now thoroughly used looking) has been opened to replace internal SSD with larger models (2TB Micron 2400 as of recently). The other one, sdbox2, was bought this summer via ebay and looked quite pristine despite being sold as used in a non-original packaging. The SSD statistics confirmed that it was barely used. Anyway, sdbox2 got a recovery image treatment and has never been opened to replace the SSD yet. The Linux is usually booted from an external USB SSD enclosure, with GRUB on the external SSD, which works nicely and can also boot the Windows of the internal SSD. On sdbox2, the fan behaviour is smooth, cooling is efficient and mostly silent, with enough headroom for peaks (I guess). No full blasts.
On the original box, volterra, I eventually noticed these blasts. No smooth fan use (on Linux). Temperatures will grow gradually until ~85°C, then it will blast full power until it reaches 60°C, then switch off. And repeat. Occasionally it will get too hot and something will crash, resulting in a read-only filesystem.
Anyway. I refitted volterra with a different SSD, did the recovery treatment, rebooted with the external Linux SSD (23.04), tested the temps and it operated smoothly. I replaced the SSD with the previous one, rebooted to 23.04 (on the internal SSD) and it worked in blasts. Booted from external SSD (23.04) and it worked in blasts. Replaced the SSD back to the previously recovery-treated, booted from external SSD and it worked in blasts. What? 🤨 I did the recovery treatment again, didn't replace the SSD, booted from external SSD and... it worked smoothly again.
Findings
I can only speculate on the root cause; I guess it's some sort of signing / hardware signature thing. But there definitely is a connection between the SSD contents as created by the recovery image and the operation of the power management.
- Resizing the Windows partition appears to be allowed
- Installing GRUB on the existing EFI partition appears to be allowed
- Adding a Linux partition appears to be allowed
- Copying the whole structure over to a bigger SSD (with dd or more sophisticated tools) breaks power management.
- Replacing the SSD with another working version breaks power management.
So, beware. First step after internal SSD change seems to be recovery treatment to ensure you get working power management on Linux. Fascinating.
from linux_ms_dev_kit.
You know what affects ath11k_hwmon-pci-60100 measurements? Actually, rfkill does 😵‍💫 Another mystery solved.
Seen it on 23.10 only, though. Quite relieved.
from linux_ms_dev_kit.
Update re 23.10 and thermal management: The same applies as told in the cautionary tale of recovery images. 23.10 on USB boot, with core temps thanks to CONFIG_QCOM_TSENS being enabled, behaves well now.
from linux_ms_dev_kit.
Update to add: If you fail to mount the EFI partition in Linux for whatever reason, you're fscked. The hardware binding is broken at this moment. Still no idea what the reason is, but the WDK has an EC from Microsoft, and it might check a thing or two. To restore the binding you need to boot to Windows. It appears to repair or reset the hardware binding.
There's some research underway to access the EC and to get fan speeds. Let's see what comes of this.
from linux_ms_dev_kit.
For a quick check of the temperatures, I use this script:
temperatures.sh.txt
and run it with watch -n 1 "./temperatures.sh |sort"
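The attached script is not reproduced in this thread. As a rough stand-in, and explicitly not the script referenced above, something like the following reads the thermal zones from sysfs. It assumes the standard /sys/class/thermal/thermal_zone*/{type,temp} layout with temperatures in millidegrees Celsius; print_temperatures is a made-up name.

```shell
#!/bin/sh
# Print "zone_type temperature" for each thermal zone under a sysfs root.
# Usage: print_temperatures [THERMAL_ROOT]   (hypothetical helper name)
print_temperatures() {
    thermal_root="${1:-/sys/class/thermal}"
    for zone in "$thermal_root"/thermal_zone*; do
        # Skip zones without a readable temp file (also covers "no match")
        [ -r "$zone/temp" ] || continue
        zone_type=$(cat "$zone/type")
        # Some zones return an error on read; skip those quietly
        millideg=$(cat "$zone/temp" 2>/dev/null) || continue
        # Convert millidegrees Celsius to degrees with one decimal place
        LC_ALL=C awk -v t="$millideg" -v n="$zone_type" \
            'BEGIN { printf "%s %.1f\n", n, t / 1000 }'
    done
}
```

Saved as a script that ends with a call to print_temperatures, it can be run under watch and piped through sort in the same way as shown above.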
from linux_ms_dev_kit.