Comments (7)
Since you said that the problem does not occur with the proprietary driver, please post an nvidia-bug-report.log.gz with the proprietary driver installed, so that we can compare the two.
from open-gpu-kernel-modules.
Using the proprietary driver:
Sun Apr 21 01:33:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76                 Driver Version: 550.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   55C    P0             13W /   80W |       2MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Same here: I can't get any information about power usage with the latest drivers (both the proprietary driver and the open GPU kernel modules).
Hi there! The nvidia-bug-report.log from the original post shows:
Apr 19 22:36:49 Reverier-Arch kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 550.67 Release Build (archlinux-builder@)
[...snip...]
Apr 20 19:55:29 Reverier-Arch kernel: NVRM: API mismatch: the client has the version 550.76, but
NVRM: this kernel module has the version 550.67. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
but your nvidia-smi output shows Driver Version: 550.76. Are you sure these are from the same run? A mismatch between the kernel-mode and user-mode driver versions could easily cause this error. It can happen if you built the wrong version of the open driver from source. dkms shows you built 550.76, but you're still loading 550.67 somehow.
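The check described in this comment can be sketched in Python. This is only an illustration, assuming the kernel log has already been captured as text (for example from nvidia-bug-report.log); the function names are hypothetical, and the regular expressions are modelled on the NVRM messages quoted in this comment, so other driver releases may word them differently.

```python
import re

# Patterns modelled on the NVRM messages quoted in this thread (assumption:
# other driver releases may format these lines differently).
VERSION = r"(\d+(?:\.\d+)+)"
LOADED_RE = re.compile(r"NVRM: loading NVIDIA UNIX .*?Kernel Module for \S+\s+" + VERSION)
CLIENT_RE = re.compile(r"API mismatch: the client has the version " + VERSION)
MODULE_RE = re.compile(r"this kernel module has the version " + VERSION)


def loaded_module_versions(log_text):
    """Return every kernel module version announced at load time."""
    return LOADED_RE.findall(log_text)


def find_version_mismatch(log_text):
    """Return (client_version, module_version) if the log reports an
    API mismatch between user-mode and kernel-mode components, else None."""
    client = CLIENT_RE.search(log_text)
    module = MODULE_RE.search(log_text)
    if client and module and client.group(1) != module.group(1):
        return client.group(1), module.group(1)
    return None


# Sample input copied from the log excerpt above (timestamps omitted).
SAMPLE_LOG = """\
kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  550.67  Release Build  (archlinux-builder@)
kernel: NVRM: API mismatch: the client has the version 550.76, but
NVRM: this kernel module has the version 550.67.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
"""

if __name__ == "__main__":
    print("loaded:", loaded_module_versions(SAMPLE_LOG))
    print("mismatch:", find_version_mismatch(SAMPLE_LOG))
```

If the function returns a pair, the user-mode components (such as nvidia-smi) and the loaded kernel module come from different releases, which is the situation described above.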
Previously the driver versions were inconsistent, probably due to distro packaging issues; sorry for that. I can now confirm that the driver versions are consistent, but the issue remains. I re-generated a log file using nvidia-bug-report.sh.
The package manager confirms that the dkms driver has the right version, 550.76:
$ pacman -Qi nvidia-open-dkms
Name : nvidia-open-dkms
Version : 550.76-3
Description : NVIDIA open kernel modules
Architecture : x86_64
URL : https://github.com/NVIDIA/open-gpu-kernel-modules
Licenses : GPL
Groups : None
Provides : nvidia-open NVIDIA-MODULE
Depends On : nvidia-utils=550.76 libglvnd dkms
Optional Deps : None
Required By : None
Optional For : None
Conflicts With : nvidia-open NVIDIA-MODULE
Replaces : None
Installed Size : 77.26 MiB
Packager : Jan Alexander Steffens (heftig) <[email protected]>
Build Date : Mon Apr 29 00:27:24 2024
Install Date : Tue Apr 30 22:05:55 2024
Install Reason : Explicitly installed
Install Script : No
Validated By : Signature
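The same consistency check can be scripted. The following Python sketch assumes the "Key : Value" layout of the pacman -Qi output shown above; the function names are hypothetical, and the version handling is simplified (it strips only the trailing package release such as "-3" and does not handle epochs).

```python
def parse_pacman_info(text):
    """Parse `pacman -Qi` style 'Key : Value' lines into a dict.
    Lines without a ' : ' separator (e.g. wrapped values) are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            info[key.strip()] = value.strip()
    return info


def utils_version_matches(info):
    """Return True/False depending on whether the pinned nvidia-utils
    dependency equals the package's upstream version, or None if the
    package does not pin nvidia-utils to an exact version."""
    # Drop the package release suffix: "550.76-3" -> "550.76".
    # Assumption: no epoch prefix such as "1:" is present.
    upstream = info["Version"].rsplit("-", 1)[0]
    for dep in info.get("Depends On", "").split():
        name, sep, version = dep.partition("=")
        if name == "nvidia-utils" and sep:
            return version == upstream
    return None


# Sample input: a subset of the output quoted above.
SAMPLE = """\
Name            : nvidia-open-dkms
Version         : 550.76-3
Depends On      : nvidia-utils=550.76  libglvnd  dkms
"""

if __name__ == "__main__":
    info = parse_pacman_info(SAMPLE)
    print(info["Name"], info["Version"], utils_version_matches(info))
```

This only compares package metadata. It does not prove which module version the running kernel actually loaded, which is what the earlier log excerpt was about.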
Thanks for the update! Looking at the new log, this really stands out:
[ 6.999219] NVRM: GPU at PCI:0000:01:00: GPU-e8108ab1-bcb6-22ff-7cab-c21072716616
[ 6.999223] NVRM: Xid (PCI:0000:01:00): 62, pid='<unknown>', name=<unknown>, 20262044 2027f08e 2027df5c 2022ec3e 20281296 2022adbe 00000000 00000000
Xid 62 is PMU_HALT_ERROR, which would make a lot of the power readings unavailable and could also lead to bigger system instability. GSP logs confirm as much. I've filed bug 4630466 so that our PMU experts can look into it.
In the meantime, could you please load nvidia.ko with NVreg_RmMsg=":" and try one more time? That should flood your dmesg with a lot of debug info, and hopefully some of it can help us narrow it down.
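For anyone scanning their own dmesg for the same symptom, here is a minimal Python sketch that pulls Xid events out of kernel log text. The regular expression is modelled on the Xid line quoted above and may not match other message variants; the function name is hypothetical, and the name table contains only the one code identified in this comment.

```python
import re

# Modelled on: "NVRM: Xid (PCI:0000:01:00): 62, pid=..., name=..., ..."
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")

# Only the code named in this thread; other Xid codes are left unnamed.
XID_NAMES = {62: "PMU_HALT_ERROR"}


def parse_xid_events(dmesg_text):
    """Return a list of dicts, one per Xid line found in the log text."""
    events = []
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match:
            xid = int(match.group(2))
            events.append({
                "pci": match.group(1),
                "xid": xid,
                "name": XID_NAMES.get(xid, "unknown"),
            })
    return events


# Sample input copied from the log excerpt above.
SAMPLE_DMESG = """\
[ 6.999219] NVRM: GPU at PCI:0000:01:00: GPU-e8108ab1-bcb6-22ff-7cab-c21072716616
[ 6.999223] NVRM: Xid (PCI:0000:01:00): 62, pid='<unknown>', name=<unknown>, 20262044 2027f08e 2027df5c 2022ec3e 20281296 2022adbe 00000000 00000000
"""

if __name__ == "__main__":
    for event in parse_xid_events(SAMPLE_DMESG):
        print(event)
```

The first sample line mentions the GPU but is not an Xid report, so only the second line produces an event.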
Arch Linux has pushed nvidia-open 550.78 to the repository, and it seems that this issue is solved. Thanks!