gddr6's Introduction

GDDR6/GDDR6X GPU Memory Temperature Reader for Linux

Reads GDDR6/GDDR6X VRAM temperatures from one or more supported NVIDIA GPUs in a Linux host. These findings are based on reverse engineering of the NVIDIA GPU Linux driver.

Prerequisites

  • Kernel boot parameter: iomem=relaxed
sudo vim /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iomem=relaxed"
sudo update-grub
sudo reboot
  • Disabling Secure Boot

This can be done in the UEFI/BIOS configuration or using mokutil:

sudo mokutil --disable-validation

Check state with:

$ sudo mokutil --sb
SecureBoot disabled
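
For the iomem=relaxed prerequisite above, it can help to confirm after the reboot that the parameter really reached the running kernel. The following is a minimal sketch, assuming the kernel command line is readable from /proc/cmdline; the helper names cmdline_has_param and iomem_relaxed_active are hypothetical and are not part of gddr6.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if `param` appears as a whole space-separated token
 * in `cmdline`. Helper name is hypothetical, not part of gddr6. */
bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return false;

    while ((p = strstr(p, param)) != NULL) {
        bool starts_token = (p == cmdline) || p[-1] == ' ';
        char end = p[plen];
        bool ends_token = end == '\0' || end == ' ' || end == '\n';
        if (starts_token && ends_token)
            return true;
        p += plen;
    }
    return false;
}

/* Reads /proc/cmdline and reports whether iomem=relaxed is active.
 * Returns 1 if present, 0 if absent, -1 if the file could not be read. */
int iomem_relaxed_active(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return cmdline_has_param(buf, "iomem=relaxed") ? 1 : 0;
}
```

Matching whole tokens avoids a false positive on a longer parameter that merely begins with the same text.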

Dependencies

  • libpci-dev
sudo apt install libpci-dev -y

Installation (cmake)

git clone https://github.com/olealgoritme/gddr6
cd gddr6
./build_install.sh
sudo gddr6

Supported GPUs

  • RTX 4090 (AD102)
  • RTX 4080 Super (AD103)
  • RTX 4080 (AD103)
  • RTX 4070 Ti Super (AD103)
  • RTX 4070 Ti (AD104)
  • RTX 4070 Super (AD104)
  • RTX 4070 (AD104)
  • RTX 3090 Ti (GA102)
  • RTX 3090 (GA102)
  • RTX 3080 Ti (GA102)
  • RTX 3080 (GA102)
  • RTX 3080 LHR (GA102)
  • RTX 3070 (GA104)
  • RTX 3070 LHR (GA104)
  • RTX A2000 (GA106)
  • RTX A4500 (GA102)
  • RTX A5000 (GA102)
  • RTX A6000 (AD102)
  • L4 (AD104)
  • L40S (AD102)
  • A10 (GA102)

gddr6's People

Contributors

ajk-dev, bengt, bluzukk, daniel-dona, dasanyx, deinferno, eisenh, hackettjp, leikareipa, olealgoritme, panicchoiceai, shivams, yhemery

gddr6's Issues

RTX 6000 ADA

A suggestion to add RTX 6000 ADA support using the following line:

{ .offset = 0x0000E2A8, .dev_id = 0x26B1, .vram = "GDDR6", .arch = "AD102", .name = "RTX 6000 ADA" },

support for 3080 LHR

Add

    { .offset = 0x0000E2A8, .dev_id = 0x2216, .vram = "GDDR6X", .arch = "GA102", .name =  "RTX 3080 LHR" },

for support of LHR versions. Tested, works.
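
The two suggested entries above share one shape: a register offset, a PCI device ID, and three descriptive strings. A minimal sketch of how such a table might be searched by device ID follows. The field names come from the quoted lines; the struct name gpu_entry, the field types, and the find_gpu helper are assumptions made for illustration, not the actual gddr6 source.

```c
#include <stddef.h>
#include <stdint.h>

/* Field names follow the entries quoted above; the struct name and
 * the field types are assumptions. */
struct gpu_entry {
    uint32_t    offset;  /* offset the temperature is read from */
    uint16_t    dev_id;  /* PCI device ID */
    const char *vram;    /* memory type, e.g. "GDDR6X" */
    const char *arch;    /* GPU chip, e.g. "GA102" */
    const char *name;    /* marketing name */
};

static const struct gpu_entry gpu_table[] = {
    { .offset = 0x0000E2A8, .dev_id = 0x26B1, .vram = "GDDR6",  .arch = "AD102", .name = "RTX 6000 ADA" },
    { .offset = 0x0000E2A8, .dev_id = 0x2216, .vram = "GDDR6X", .arch = "GA102", .name = "RTX 3080 LHR" },
};

/* Linear search by PCI device ID; returns NULL for unknown devices. */
const struct gpu_entry *find_gpu(uint16_t dev_id)
{
    for (size_t i = 0; i < sizeof gpu_table / sizeof gpu_table[0]; i++) {
        if (gpu_table[i].dev_id == dev_id)
            return &gpu_table[i];
    }
    return NULL;
}
```

With a table like this, a GPU whose device ID has no entry yields NULL from the lookup, which is why adding a line per card is enough to extend support in this sketch.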

Random output on RTX 3070 (FE) on driver 535.113.01

Here are the reported temps on my NVIDIA RTX 3070 (FE) running driver version 535.113.01:

Device: RTX 3070 GDDR6 (GA104 / 0x2484) pci=1:0:0
VRAM Temps: |  24°c | @ 0x0000ee50
VRAM Temps: |  43°c | @ 0x0000ee50
VRAM Temps: |  26°c | @ 0x0000ee50
VRAM Temps: |  56°c | @ 0x0000ee50
VRAM Temps: |  90°c | @ 0x0000ee50
VRAM Temps: |   9°c | @ 0x0000ee50
VRAM Temps: |   1°c | @ 0x0000ee50
VRAM Temps: |  50°c | @ 0x0000ee50
VRAM Temps: |  85°c | @ 0x0000ee50
VRAM Temps: |   4°c | @ 0x0000ee50
VRAM Temps: | 118°c | @ 0x0000ee50
VRAM Temps: |  36°c | @ 0x0000ee50
VRAM Temps: |  71°c | @ 0x0000ee50
VRAM Temps: |  78°c | @ 0x0000ee50
VRAM Temps: |  69°c | @ 0x0000ee50
VRAM Temps: |  94°c | @ 0x0000ee50

The values look like random numbers between 0 and 127.
I modified the code so that it prints the .offset it reads from and does not flush the output.
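
One way to put a number on "random" is to look at how far consecutive readings jump, since real VRAM temperatures change gradually. The sketch below is an illustration only: the function names and the idea of a fixed jump limit are assumptions, and nothing here is taken from the gddr6 source. The sample array copies the readings from the output above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Readings copied from the output quoted above, in degrees Celsius. */
static const int issue_readings[] = {
    24, 43, 26, 56, 90, 9, 1, 50, 85, 4, 118, 36, 71, 78, 69, 94
};

/* Largest absolute change between two consecutive readings. */
int max_step(const int *temps, size_t n)
{
    int worst = 0;

    for (size_t i = 1; i < n; i++) {
        int step = abs(temps[i] - temps[i - 1]);
        if (step > worst)
            worst = step;
    }
    return worst;
}

/* Flags a series as implausible when any consecutive jump exceeds
 * `limit` degrees. The limit is a hypothetical tuning value. */
int looks_unstable(const int *temps, size_t n, int limit)
{
    return max_step(temps, n) > limit;
}
```

For the readings above, the largest jump between consecutive samples is 114 degrees (from 4 to 118), which no plausible limit would accept.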

Only shows one GPU temp when multiple GPUs in system

Hi, really sorry to ask for more, but would it be possible to observe the VRAM temps of multiple GPUs? In my dual-GPU setup (for AI/ML work), with GPU 0 and GPU 1 present, it appears to pull only the VRAM temps of GPU 1. I will look at the code and see if I can make the changes myself, but thought I might as well put the request in here too. Thanks in advance, in case you have time to figure it out!

Support for 3090 Ti

I have two 3090 Ti FEs in my workstation and was trying to use this tool but it reports "No compatible GPU found". I see that 3090 Ti is not in the list of supported GPUs. Can we make it work for 3090 Ti?

Multi-GPU order and corresponding temperatures wrong

Looks like the order of the output is reversed compared to nvidia-smi output.

Here is the output of gddr6:

user@test_server:~/tmp/vramtest/gddr6$ sudo ./gddr6
Device: RTX A2000 GDDR6 (GA106 / 0x2531) pci=8:d:0
Device: RTX A2000 GDDR6 (GA106 / 0x2531) pci=6:1b:0
Device: RTX A2000 GDDR6 (GA106 / 0x2531) pci=6:10:0
VRAM Temps: | 76°c | 74°c | 70°c |

And here is the output of nvidia-smi:

user@test_server:~$ nvidia-smi
Thu May 18 16:15:14 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A2000                On | 00000000:06:10.0 Off |                  Off |
| 36%   66C    P2                56W / 70W|    4320MiB / 6138MiB |     30%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A2000                On | 00000000:06:1B.0 Off |                  Off |
| 37%   67C    P2                54W / 70W|    4320MiB / 6138MiB |     32%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A2000                On | 00000000:08:0D.0 Off |                  Off |
| 43%   72C    P2                68W / 70W|    4628MiB / 6138MiB |     33%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
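
In the two outputs above, nvidia-smi lists the cards in ascending bus address order (06:10, 06:1B, 08:0D) while gddr6 lists them in the opposite order. One possible fix is to sort the detected devices by PCI address before printing. The sketch below assumes the pci= field is bus:device:function in hexadecimal; the gpu_dev struct and both function names are hypothetical and not taken from the gddr6 source.

```c
#include <stdlib.h>

/* Hypothetical record for one detected GPU; not the gddr6 struct. */
struct gpu_dev {
    unsigned bus;
    unsigned dev;
    unsigned func;
};

/* qsort comparator: ascending by bus, then device, then function. */
int cmp_pci_addr(const void *a, const void *b)
{
    const struct gpu_dev *x = a;
    const struct gpu_dev *y = b;

    if (x->bus != y->bus)
        return x->bus < y->bus ? -1 : 1;
    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->func != y->func)
        return x->func < y->func ? -1 : 1;
    return 0;
}

/* Sorts the device array in place into ascending PCI address order. */
void sort_by_pci_addr(struct gpu_dev *devs, size_t n)
{
    qsort(devs, n, sizeof devs[0], cmp_pci_addr);
}
```

Applied to the three A2000 cards above (8:d:0, 6:1b:0, 6:10:0), this yields 6:10:0, 6:1b:0, 8:d:0, matching the nvidia-smi listing. The temperature columns would need to be printed in the same sorted order.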

segmentation fault / sigsegv error

Hi, thanks for working on this! I compile using
gcc gddr6.c -o gddr6 -lpci

There are no errors, but when I run ./gddr6, I get a segmentation fault. I followed the instructions for the apt install, updated GRUB, and rebooted (twice).

Temperature for RTX A2000

Is there a chance to add details for the RTX A2000 (GA106), 6 GB, GDDR6?
These are the Device ID details. No idea which offset to set.

$> lspci -nn |grep NVIDIA
06:10.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [RTX A2000] [10de:2531] (rev a1)
06:11.0 Audio device [0403]: NVIDIA Corporation Device [10de:228e] (rev a1)

Thank you for your help.
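
In the lspci -nn line above, the bracketed pair [10de:2531] is the PCI vendor ID (10de, NVIDIA) followed by the device ID (2531), and the device ID is the value a .dev_id entry would carry. A minimal sketch for pulling that pair out of such a line is shown below; parse_lspci_ids is a hypothetical helper and is not part of gddr6.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the last "[vvvv:dddd]" hex pair from an `lspci -nn` line.
 * Returns 0 on success, -1 if no such pair is found.
 * Hypothetical helper, not part of gddr6. */
int parse_lspci_ids(const char *line, unsigned *vendor, unsigned *device)
{
    const char *p = line;
    int found = -1;

    while ((p = strchr(p, '[')) != NULL) {
        unsigned v, d;
        char close;

        /* Brackets such as "[0300]" or "[RTX A2000]" fail this match. */
        if (sscanf(p, "[%4x:%4x%c", &v, &d, &close) == 3 && close == ']') {
            *vendor = v;
            *device = d;
            found = 0;
        }
        p++;
    }
    return found;
}
```

The offset for a new card is a separate matter; the device ID alone does not reveal it.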

A5000 Mobile

Requesting help with adding the A5000 Mobile. I sent you a friend request on Discord.
