
peakperf's Issues

Intel 13th Gen is unknown

Hi,
I got the following error:
Unknown microarchitecture detected: M=0x00000007 EM=0x0000000B F=0x00000006 EF=0x00000000 S=0x00000001

The CPU is a 13th gen Intel i7-13700K.

If you need anything else let me know.
Dennis
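For context, a minimal sketch (not peakperf's actual detection code) of where those M/EM/F/EF values come from: they are fields of CPUID leaf 1, and for family 6 the effective model is (extended_model << 4) | model, which here gives 0xB7, the Raptor Lake ID that a detection table would need to recognize.

#include <cpuid.h>   // GCC/Clang helper for the CPUID instruction
#include <cstdio>

int main() {
  unsigned eax, ebx, ecx, edx;
  __get_cpuid(1, &eax, &ebx, &ecx, &edx);       // CPUID leaf 1
  unsigned family    = (eax >> 8)  & 0xF;       // F  = 0x6 in the report above
  unsigned ext_fam   = (eax >> 20) & 0xFF;      // EF = 0x0
  unsigned model     = (eax >> 4)  & 0xF;       // M  = 0x7
  unsigned ext_model = (eax >> 16) & 0xF;       // EM = 0xB
  // For family 6 (and 15) the extended model extends the model field,
  // so the i7-13700K reports model 0xB7 (Raptor Lake).
  unsigned eff_model = (ext_model << 4) | model;
  printf("family 0x%X (ext 0x%X), model 0x%X\n", family, ext_fam, eff_model);
  return 0;
}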

Contact

Hey Dr-noob, 2 years ago you posted something about hacking pacybits. I need help with it and I'm wondering if you could help me. Can I contact you somewhere?

I know your post is old, but I would really appreciate it if you could help me.

Found invalid uarch: 'Zen 3'

[ERROR]: Found invalid uarch: 'Zen 3'
[ERROR]: peakperf is unable to automatically select the benchmark for your CPU. Please, select the benchmark manually (see peakperf -h) and/or post this error message in https://github.com/Dr-Noob/peakperf/issues


Running FLOPS in Windows

I've tried running FLOPS in Windows:

First, one has to change some int and long variables to stdint types (int32_t and int64_t). After that, I tried running it and the performance was horrible. Then, looking at the assembly, I figured out that the loop was compiled poorly, and I noticed I was using a 32-bit compiler (mingw, which is 32 bits). Lastly, I tried compiling it with a 64-bit compiler (mingw-w64), using both the Windows build (MingW-W64-builds) and the Arch Linux one (mingw-w64-gcc-bin). Both gave me the same result: segmentation fault. I read somewhere that code running fine as 32-bit but segfaulting as 64-bit could be caused by issues with the stack, which should be solvable with -fno-stack-protector. That does not solve the segfault. I love you, Windows ❤️
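For reference, a minimal sketch of the kind of change described (not the actual diff): 64-bit Windows uses the LLP64 model, where long is still 32 bits, so any counter that has to hold 64-bit values needs a fixed-width type.

#include <cstdint>
#include <cstdio>

int main() {
  // long is 64-bit on Linux (LP64) but only 32-bit on Windows (LLP64);
  // int64_t behaves the same on both platforms.
  int64_t iterations = INT64_C(1) << 34;
  printf("%lld\n", (long long)iterations);
  return 0;
}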

Wrong compute architecture is being detected during build

Hello!
I noticed the following during build:

./build.sh
...

-- The CXX compiler identification is GNU 13.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /opt/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 12.2.140
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- ----------------------
-- peakperf build report:
-- CPU mode: ON
-- GPU mode: ON
-- ----------------------
-- Configuring done (7.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/william/src/peakperf/build
[  5%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/cpufetch/cpufetch.cpp.o
[  5%] Building CXX object CMakeFiles/512_8.dir/src/cpu/arch/512_8.cpp.o
[ 11%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/cpufetch/cpuid.cpp.o
[ 11%] Building CXX object CMakeFiles/512_12.dir/src/cpu/arch/512_12.cpp.o
[ 14%] Building CUDA object CMakeFiles/gpu_device.dir/src/gpu/arch.cu.o
[ 20%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/cpufetch/uarch.cpp.o
[ 23%] Building CXX object CMakeFiles/256_5.dir/src/cpu/arch/256_5.cpp.o
[ 29%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/arch/arch.cpp.o
[ 29%] Building CXX object CMakeFiles/256_6.dir/src/cpu/arch/256_6.cpp.o
[ 35%] Building CXX object CMakeFiles/256_6_nofma.dir/src/cpu/arch/256_6_nofma.cpp.o
[ 35%] Building CUDA object CMakeFiles/gpu_device.dir/src/gpu/kernel.cu.o
[ 35%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/arch/arch_sse.cpp.o
[ 38%] Building CXX object CMakeFiles/128_6.dir/src/cpu/arch/128_6.cpp.o
[ 44%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/arch/arch_avx512.cpp.o
[ 47%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/arch/arch_avx.cpp.o
[ 47%] Building CXX object CMakeFiles/256_8.dir/src/cpu/arch/256_8.cpp.o
[ 52%] Building CXX object CMakeFiles/128_8.dir/src/cpu/arch/128_8.cpp.o
[ 52%] Building CXX object CMakeFiles/256_10.dir/src/cpu/arch/256_10.cpp.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make[2]: *** [CMakeFiles/gpu_device.dir/build.make:76: CMakeFiles/gpu_device.dir/src/gpu/arch.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make[2]: *** [CMakeFiles/gpu_device.dir/build.make:90: CMakeFiles/gpu_device.dir/src/gpu/kernel.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:401: CMakeFiles/gpu_device.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 55%] Linking CXX static library lib512_12.a
[ 58%] Linking CXX static library lib512_8.a
[ 58%] Built target 512_12
[ 58%] Built target 512_8
[ 61%] Linking CXX static library lib256_5.a
[ 64%] Linking CXX static library lib256_8.a
[ 67%] Linking CXX static library lib256_10.a
[ 70%] Linking CXX static library lib256_6.a
[ 73%] Linking CXX static library lib128_8.a
[ 76%] Linking CXX static library lib128_6.a
[ 76%] Built target 256_5
[ 79%] Built target 256_8
[ 79%] Built target 256_10
[ 79%] Linking CXX static library lib256_6_nofma.a
[ 79%] Built target 256_6
[ 79%] Built target 128_8
[ 82%] Linking CXX static library libcpu_device.a
[ 82%] Built target 128_6
[ 82%] Built target 256_6_nofma
[ 82%] Built target cpu_device
make: *** [Makefile:136: all] Error 2

The relevant part being nvcc fatal : Unsupported gpu architecture 'compute_35'
I looked at the code briefly, but couldn't see anything obvious that would cause it to return as compute_35.
(screenshot attached: Screenshot_20231026-030647)

The getGencode script is just a validation tool I cobbled together for another project: https://github.com/wallentx/alpha-report/blob/90ee2e7c006dcfd75dd76fe31ffc0a866179d819/get-gencode#L1-L33
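For reference, a minimal sketch (not peakperf's actual detection path) of how the installed GPU's compute capability can be queried; if a step like this fails or never runs, the build can fall back to a stale default such as compute_35, which nvcc from CUDA 12 rejects.

#include <cuda_runtime.h>
#include <cstdio>

int main() {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
    fprintf(stderr, "no CUDA device found\n");
    return 1;
  }
  // Prints e.g. "compute_86" on an RTX 30 series card
  printf("compute_%d%d\n", prop.major, prop.minor);
  return 0;
}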

Hybrid mode

Run peakperf on the CPU and GPU at the same time:

device == DEVICE_TYPE_HYBRID

Nº  Time(s)  TFLOP/s (CPU +  GPU)
 1  2.50984   4.300  (500 + 3800)
 2  2.50898   4.310  (500 + 3810)
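A minimal sketch of the idea, with hypothetical entry points (run_cpu_benchmark/run_gpu_benchmark are stand-ins, not peakperf's real API): run both benchmarks on separate threads and sum their throughput per iteration.

#include <cstdio>
#include <thread>

// Stand-ins for the existing CPU and GPU benchmark loops (hypothetical)
static double run_cpu_benchmark() { return 0.500; }   // TFLOP/s
static double run_gpu_benchmark() { return 3.800; }   // TFLOP/s

int main() {
  double cpu = 0.0, gpu = 0.0;
  std::thread t_cpu([&] { cpu = run_cpu_benchmark(); });
  std::thread t_gpu([&] { gpu = run_gpu_benchmark(); });
  t_cpu.join();
  t_gpu.join();
  // Matches the mockup above: total (CPU + GPU)
  printf("%.3f TFLOP/s (%.3f + %.3f)\n", cpu + gpu, cpu, gpu);
  return 0;
}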

[GPU] Support for RT cores

Same as tensor cores, but with RT cores. Not sure whether RT cores would provide more performance than tensor cores, though.

[GPU] Support for tensor cores

  1. Detect the uarch and deduce whether the GPU has tensor cores or not
  2. Run a GEMM (how?) using tensor cores to achieve the peak performance in half precision (a sketch follows below)
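A minimal sketch of step 2, assuming a cuBLAS-based GEMM is acceptable (peakperf might instead want its own kernel): with FP16 inputs, cublasGemmEx lets cuBLAS route the multiply through tensor cores on Volta and newer, and the FLOP count per call is 2*N^3, so peak throughput is that divided by the measured time.

#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  const int N = 8192;                        // arbitrary square GEMM size
  __half *A, *B, *C;                         // buffers left uninitialized; a
  cudaMalloc(&A, sizeof(__half) * N * N);    // real benchmark would fill them
  cudaMalloc(&B, sizeof(__half) * N * N);
  cudaMalloc(&C, sizeof(__half) * N * N);

  cublasHandle_t handle;
  cublasCreate(&handle);
  float alpha = 1.0f, beta = 0.0f;

  // FP16 inputs with FP32 accumulation; cuBLAS uses tensor cores when the
  // GPU has them.
  cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
               &alpha, A, CUDA_R_16F, N, B, CUDA_R_16F, N,
               &beta,  C, CUDA_R_16F, N,
               CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
  cudaDeviceSynchronize();

  printf("issued one %dx%d FP16 GEMM (%.1f GFLOP)\n",
         N, N, 2.0 * N * N * N / 1e9);

  cublasDestroy(handle);
  cudaFree(A); cudaFree(B); cudaFree(C);
  return 0;
}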

GFLOPS off by factor of 2 on AMD Ryzen 7 5800X 8-Core Processor

When I run the benchmark with more than 8 threads, my average performance is consistently lower than the expected GFLOP/s. Running peakperf with no arguments reports an estimated peak of 2048 GFLOP/s but an average performance under 1200 (I have a decent undervolt enabled). Specifying 8 threads changes the printed peak to 1024 GFLOP/s, and I see an average performance above 1100 (but lower than the results for t = 16). I'm not sure whether this is an issue with my CPU configuration or with the test expectations.
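A plausible explanation (the exact clock is an assumption): Zen 3 has two 256-bit FMA units per core, so the hardware peak is 8 cores x 2 FMA/cycle x 8 FP32 lanes x 2 FLOP per FMA = 256 FLOP/cycle, or roughly 1024 GFLOP/s at 4 GHz. SMT threads share those FMA units, so 16 threads cannot double that; if the printed estimate scales with thread count rather than physical cores, the 2048 figure is twice what the silicon can deliver, which would match the measured ~1100-1200 GFLOP/s.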

Add some feedback while benchmark is running

Something that lets the user see that the computation is still running.

For example, a progress bar, like:

|███████████████                | 50%
|███████████████████████████████| 100%

or a blinking cursor like this function does:

spin() {
   local -a marks=( '/' '-' '\' '|' )
   local -i i=0
   # Redraw one of the four marks every 100 ms until the process is killed
   while true; do
     printf '%s\r' "${marks[i++ % ${#marks[@]}]}"
     sleep 0.1
   done
 }

This would require adding an additional thread in main.
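A minimal sketch of that extra thread in C++ (hypothetical, not wired into peakperf's actual main): an atomic flag plus a spinner thread that redraws until the benchmark finishes.

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
  std::atomic<bool> running{true};

  // Spinner thread: redraw until the benchmark signals completion
  std::thread spinner([&] {
    const char marks[] = {'/', '-', '\\', '|'};
    for (int i = 0; running; i = (i + 1) % 4) {
      printf("%c\r", marks[i]);
      fflush(stdout);
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
  });

  // ... run the benchmark here ...
  std::this_thread::sleep_for(std::chrono::seconds(2));  // placeholder work

  running = false;
  spinner.join();
  return 0;
}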

[CPU] Support for non-AVX variants

There are many uarchs (e.g., Kaby Lake) where most CPUs support AVX but some do not (e.g., Celeron), yet peakperf currently assumes that they all support AVX.
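A minimal sketch of the runtime check using the GCC/Clang builtin (peakperf's cpufetch code could equally read the CPUID feature bit directly):

#include <cstdio>

int main() {
  __builtin_cpu_init();   // initialise the feature probe (GCC/Clang)
  // __builtin_cpu_supports reads the CPUID feature flags at runtime
  if (__builtin_cpu_supports("avx")) {
    printf("AVX available: use the 256-bit benchmark\n");
  } else {
    printf("no AVX (e.g. some Celeron/Pentium parts): fall back to SSE\n");
  }
  return 0;
}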

Separate CPU backends by latencies and ALUs, not uarchs

Commit c763c17 separated files by uarch instead of the old system of separating backends by latency, number of ALUs, and vector width. I think I introduced that change because I thought I had to use specific -march flags for each architecture, but that is not needed. Going back to the old system would reduce the amount of code and its complexity.
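A minimal sketch of what the old-style selection could look like (the struct and the latency/unit numbers are illustrative assumptions, not peakperf's real table): a backend is identified by vector width plus the number of independent accumulators, taken here as FMA latency times the number of FMA units, which lines up with target names like 256_10 or 512_8 in the build log above.

#include <cstdio>
#include <string>

struct benchmark_cfg {
  int vector_width;   // 128, 256 or 512 bits
  int fma_latency;    // cycles (assumed)
  int fma_units;      // per core (assumed)
};

// The kernel needs latency x units independent accumulators to keep the
// FMA pipes full; that (width, accumulators) pair names the backend.
static std::string select_kernel(const benchmark_cfg& cfg) {
  return std::to_string(cfg.vector_width) + "_" +
         std::to_string(cfg.fma_latency * cfg.fma_units);
}

int main() {
  benchmark_cfg haswell = {256, 5, 2};   // illustrative numbers only
  benchmark_cfg skylake = {256, 4, 2};
  printf("%s\n", select_kernel(haswell).c_str());   // "256_10"
  printf("%s\n", select_kernel(skylake).c_str());   // "256_8"
  return 0;
}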

Automatic benchmarking

Detect CPU microarchitecture (using cpufetch code) and select the right benchmark automatically
