Comments (4)
Thank you for the report. I've filed NVIDIA internal bug 4706166 to track this.
If you're willing to rebuild the open kernel modules, could you please apply this patch, and then upload the system log after the problem reproduces again? Thanks!
$ cat 0001-instrumentation-for-suspend-crash.patch
From 44afc9067af6df0671724e37b8f2c2cde7386590 Mon Sep 17 00:00:00 2001
From: Andy Ritger <aritger@nvidia.com>
Date: Mon, 17 Jun 2024 15:03:14 -0700
Subject: [PATCH] instrumentation for suspend crash
X-NVConfidentiality: public
---
kernel-open/nvidia/nv.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel-open/nvidia/nv.c b/kernel-open/nvidia/nv.c
index 99792de96307..bc003399cd83 100644
--- a/kernel-open/nvidia/nv.c
+++ b/kernel-open/nvidia/nv.c
@@ -3111,7 +3111,8 @@ nv_map_guest_pages(nv_alloc_t *at,
if (pages == NULL)
{
nv_printf(NV_DBG_ERRORS,
- "NVRM: failed to allocate vmap() page descriptor table!\n");
+ "NVRM: failed to allocate vmap() page descriptor table! (page_count: %d)\n", page_count);
+ dump_stack();
return 0;
}
@@ -3604,7 +3605,8 @@ void* NV_API_CALL nv_alloc_kernel_mapping(
if (pages == NULL)
{
nv_printf(NV_DBG_ERRORS,
- "NVRM: failed to allocate vmap() page descriptor table!\n");
+ "NVRM: failed to allocate vmap() page descriptor table! (page_count:%d)\n", page_count);
+ dump_stack();
return NULL;
}
--
2.44.0
from open-gpu-kernel-modules.
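For anyone reading the instrumented log later: the patch prints page_count because the size of the table being allocated presumably grows with the number of pages mapped, so a very large or garbage value would point at the caller rather than at memory pressure. Below is a rough user-space sketch of that relationship. It assumes one pointer-sized entry per page, which is an assumption for illustration rather than something confirmed in this thread, and the helper name page_table_bytes is hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not driver code: computes how many bytes a table
 * holding one pointer per page would need. The one-pointer-per-page layout
 * is an assumption made for this sketch.
 * Returns false if the multiplication would overflow size_t.
 */
static bool page_table_bytes(uint64_t page_count, size_t *out_bytes)
{
    assert(out_bytes != NULL);

    /* Reject counts whose table size cannot be represented in size_t. */
    if (page_count > SIZE_MAX / sizeof(void *))
        return false;

    *out_bytes = (size_t)page_count * sizeof(void *);
    return true;
}
```

Feeding the page_count value from the new log message into a helper like this gives a quick sense of how large the failed request was. A result in the megabyte range or an overflow would suggest the count itself is suspect.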
Thanks for the patch. I am currently on the proprietary 555.58.02 module because I need to avoid the slowdowns in KDE caused by the GSP firmware, so I have not run into this sleep issue again. Once the GSP bug is resolved, I will switch to the open module again and apply the patch to see what's going on.
from open-gpu-kernel-modules.
I am on the proprietary NVIDIA 555.58.02-1 driver and I see the same problems described here. My system is Arch Linux with an NVIDIA GeForce RTX™ 3050 Laptop GPU.
I have tried each of the following:
- linux 6.9.7 or linux-lts 6.6.37
- NVreg_EnableS0ixPowerManagement, or NVreg_PreserveVideoMemoryAllocations with the allocations saved on /var/tmp (over 250 GB of free space)
- nvidia_drm.modeset set to 0 or 1 as a boot parameter
- nvidia_drm.fbdev set to 0 or 1 as a boot parameter
- X11 or Xwayland
Nothing helped except module_blacklist=nvidia.
Jul 06 03:35:17 archlinux kernel: NVRM: failed to allocate vmap() page descriptor table!
Jul 06 03:35:17 archlinux kernel: NVRM: GPU at PCI:0000:01:00: GPU-887a46df-29b2-be1c-8c55-e637117338ba
Jul 06 03:35:17 archlinux kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=8874, name=kworker/u48:11, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080205b 0x4).
Jul 06 03:35:17 archlinux kernel: NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) and data 0x000000002080205b 0x0000000000000004.
Jul 06 03:35:17 archlinux kernel: NVRM: GPU0 RPC history (CPU -> GSP):
Jul 06 03:35:17 archlinux kernel: NVRM: entry function data0 data1 ts_start ts_end duration actively_polling
Jul 06 03:35:17 archlinux kernel: NVRM: 0 76 GSP_RM_CONTROL 0x000000002080205b 0x0000000000000004 0x00061c8a2d4a3f2a 0x0000000000000000 y
Jul 06 03:35:17 archlinux kernel: NVRM: -1 47 UNLOADING_GUEST_DRIVE 0x0000000000000000 0x0000000000000000 0x00061c8a2d32031f 0x00061c8a2d34fd27 195080us
Jul 06 03:35:17 archlinux kernel: NVRM: -2 10 FREE 0x00000000c1e016c0 0x0000000000000000 0x00061c8a2d320088 0x00061c8a2d3202fc 628us
Jul 06 03:35:17 archlinux kernel: NVRM: -3 10 FREE 0x000000000000000a 0x0000000000000000 0x00061c8a2d31fa32 0x00061c8a2d320087 1621us
Jul 06 03:35:17 archlinux kernel: NVRM: -4 10 FREE 0x000000000000000b 0x0000000000000000 0x00061c8a2d31f763 0x00061c8a2d31f943 480us
Jul 06 03:35:17 archlinux kernel: NVRM: -5 10 FREE 0x0000000000000006 0x0000000000000000 0x00061c8a2d31f52a 0x00061c8a2d31f75e 564us
Jul 06 03:35:17 archlinux kernel: NVRM: -6 10 FREE 0x0000000000000002 0x0000000000000000 0x00061c8a2d31e4fe 0x00061c8a2d31f4fd 4095us
Jul 06 03:35:17 archlinux kernel: NVRM: -7 10 FREE 0x0000000000000005 0x0000000000000000 0x00061c8a2d31da4b 0x00061c8a2d31e4fb 2736us
Jul 06 03:35:17 archlinux kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
Jul 06 03:35:17 archlinux kernel: NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc
Jul 06 03:35:17 archlinux kernel: NVRM: 0 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x00061c8a2d32b15b 0x00061c8a2d32b15c 1us
Jul 06 03:35:17 archlinux kernel: NVRM: -1 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000028 0x00061c8a2d324d79 0x00061c8a2d324d7b 2us
Jul 06 03:35:17 archlinux kernel: NVRM: -2 4111 PERF_BRIDGELESS_INFO_ 0x0000000000000000 0x0000000000000000 0x00061c8a2d2fb48c 0x00061c8a2d2fb48c
Jul 06 03:35:17 archlinux kernel: NVRM: -3 4111 PERF_BRIDGELESS_INFO_ 0x0000000000000000 0x0000000000000000 0x00061c8a2d24eef8 0x00061c8a2d24eef9 1us
Jul 06 03:35:17 archlinux kernel: NVRM: -4 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x00061c8a2ce35a01 0x00061c8a2ce35a01
Jul 06 03:35:17 archlinux kernel: NVRM: -5 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x00061c8a2ce357ff 0x00061c8a2ce35800 1us
Jul 06 03:35:17 archlinux kernel: NVRM: -6 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x00061c8a2ce33db1 0x00061c8a2ce33db3 2us
Jul 06 03:35:17 archlinux kernel: NVRM: -7 4098 GSP_RUN_CPU_SEQUENCER 0x000000000000060a 0x0000000000003fe2 0x00061c8a2ce27b5b 0x00061c8a2ce28c8b 4400us
Jul 06 03:35:17 archlinux kernel: CPU: 4 PID: 8874 Comm: kworker/u48:11 Tainted: P OE 6.9.7-arch1-1 #1 44783200744f92500e6484c6d93590bc19db4a83
Jul 06 03:35:17 archlinux kernel: Hardware name: Micro-Star International Co., Ltd. Thin GF63 12UC/MS-16R8, BIOS E16R8IMS.111 03/21/2024
Jul 06 03:35:17 archlinux kernel: Workqueue: async async_run_entry_fn
Jul 06 03:35:17 archlinux kernel: Call Trace:
Jul 06 03:35:17 archlinux kernel: <TASK>
Jul 06 03:35:17 archlinux kernel: dump_stack_lvl+0x5d/0x80
Jul 06 03:35:17 archlinux kernel: _nv012672rm+0x437/0x4b0 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv012592rm+0x74/0x330 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv046348rm+0x49f/0x7f0 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv049583rm+0xa1/0x150 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv045638rm+0x19e/0x1b0 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv047612rm+0x3fc/0x500 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv014430rm+0x42e/0x690 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv045777rm+0x26/0x30 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv000751rm+0x55/0x70 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv000750rm+0x21b/0x220 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: _nv000701rm+0x2ad/0x300 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: rm_power_management+0x22c/0x260 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: ? wait_for_completion+0x91/0x170
Jul 06 03:35:17 archlinux kernel: nv_power_management+0x92/0x170 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: nvidia_suspend+0x6c/0x100 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: nv_pmops_suspend+0x15/0x30 [nvidia 1c25e0c66b648a34b428942d4985971b2b3325f6]
Jul 06 03:35:17 archlinux kernel: pci_pm_suspend+0x7c/0x170
Jul 06 03:35:17 archlinux kernel: ? __pfx_pci_pm_suspend+0x10/0x10
Jul 06 03:35:17 archlinux kernel: dpm_run_callback+0x47/0x150
Jul 06 03:35:17 archlinux kernel: device_suspend+0x141/0x510
Jul 06 03:35:17 archlinux kernel: ? try_to_wake_up+0x76/0x660
Jul 06 03:35:17 archlinux kernel: async_suspend+0x1d/0x30
Jul 06 03:35:17 archlinux kernel: async_run_entry_fn+0x31/0x140
Jul 06 03:35:17 archlinux kernel: process_one_work+0x18b/0x350
Jul 06 03:35:17 archlinux kernel: worker_thread+0x2eb/0x410
Jul 06 03:35:17 archlinux kernel: ? __pfx_worker_thread+0x10/0x10
Jul 06 03:35:17 archlinux kernel: kthread+0xcf/0x100
Jul 06 03:35:17 archlinux kernel: ? __pfx_kthread+0x10/0x10
Jul 06 03:35:17 archlinux kernel: ret_from_fork+0x31/0x50
Jul 06 03:35:17 archlinux kernel: ? __pfx_kthread+0x10/0x10
Jul 06 03:35:17 archlinux kernel: ret_from_fork_asm+0x1a/0x30
Jul 06 03:35:17 archlinux kernel: </TASK>
Jul 06 03:35:17 archlinux kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=8874, name=kworker/u48:11, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a81 0x4).
Jul 06 03:35:17 archlinux kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=8874, name=kworker/u48:11, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a76 0x2).
Jul 06 03:35:17 archlinux kernel: NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:01:00 (printing 1 of every 30). The GPU likely needs to be reset.
Jul 06 03:35:17 archlinux kernel: nvidia-modeset: ERROR: GPU:0: Failed to determine display capabilities
Jul 06 03:35:17 archlinux kernel: nvidia-modeset: ERROR: GPU:0: Failed to tear down Disp
Jul 06 03:35:17 archlinux kernel: nvidia-modeset: ERROR: GPU:0: Failed to determine display capabilities
Jul 06 03:35:17 archlinux kernel: nvidia-modeset: ERROR: GPU:0: Failed to tear down Disp
Jul 06 03:35:17 archlinux kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x30 [nvidia] returns -5
Jul 06 03:35:17 archlinux kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
Jul 06 03:35:17 archlinux kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
Jul 06 03:35:17 archlinux kernel: PM: Some devices failed to suspend, or early wake event detected
Jul 06 03:35:17 archlinux kernel: iwlwifi 0000:00:14.3: WRT: Invalid buffer destination
Jul 06 03:35:17 archlinux kernel: done.
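When comparing logs like the one above across machines, it can help to pull out the Xid records and to check the RPC timings mechanically. The following is a minimal sketch, assuming the line layout shown in this log. The struct and function names are hypothetical. For the completed rows in the RPC history, ts_end minus ts_start equals the printed duration, so the timestamps appear to be in microseconds. Entry 0 has a ts_end of zero because that RPC never completed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fields pulled out of an "NVRM: Xid (...)" kernel log line. */
struct xid_event {
    char pci[32]; /* PCI address, e.g. "0000:01:00" */
    int  xid;     /* Xid number, e.g. 119 */
    int  pid;     /* reporting process ID */
};

/* Returns 1 on success, 0 if the line holds no parsable Xid record. */
static int parse_xid_line(const char *line, struct xid_event *ev)
{
    assert(line != NULL && ev != NULL);

    /* Skip the journal prefix (timestamp, host, "kernel:"). */
    const char *p = strstr(line, "NVRM: Xid (PCI:");
    if (p == NULL)
        return 0;

    /* %31[^)] reads the PCI address up to the closing parenthesis. */
    return sscanf(p, "NVRM: Xid (PCI:%31[^)]): %d, pid=%d",
                  ev->pci, &ev->xid, &ev->pid) == 3;
}

/* Difference between two hex timestamps from the RPC history table. */
static uint64_t rpc_duration(const char *ts_start_hex, const char *ts_end_hex)
{
    /* Base 16 lets strtoull accept the leading "0x". */
    uint64_t start = strtoull(ts_start_hex, NULL, 16);
    uint64_t end   = strtoull(ts_end_hex, NULL, 16);
    return end - start;
}
```

For example, passing the ts_start and ts_end values of the UNLOADING_GUEST_DRIVE row (0x00061c8a2d32031f and 0x00061c8a2d34fd27) to rpc_duration gives 195080, which matches the 195080us printed in that row.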
from open-gpu-kernel-modules.
Also seeing this on RTX 2070 with the proprietary 555.58.02 driver.
from open-gpu-kernel-modules.