TF-32-Tensor-Test-Suite

TensorFlow 2 and PyTorch tests for tensor core performance. Tested with TensorFlow 2.10.0 and PyTorch 1.13.0.

The tests were written against CUDA 11.8 and cuDNN 8.6.0.163 on Debian 11.

Several parameters can be set before a test run:

Number of iterations

-i or --iters. Default value: 100

Matrix size (n x n)

-s or --size. Default value: 8192

Tensor core selector

-t or --tensor. Default value: True

It is recommended not to change the values of iters and size without a specific need.

The tensor parameter switches between TF-32 math (when selected) and FP-32 math (when not selected).
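As a rough illustration of what such a switch can look like, the sketch below toggles TF-32 through the public PyTorch and TensorFlow settings. The function name set_tf32 is hypothetical, and this is not necessarily how the test suite itself applies the tensor parameter; it only shows the framework flags that control TF-32 math.

```python
def set_tf32(enabled: bool) -> list:
    """Enable or disable TF-32 math in whichever frameworks are installed.

    Hypothetical helper for illustration; returns the names of the
    frameworks that were configured.
    """
    configured = []

    try:
        import torch

        # TF-32 for CUDA matrix multiplications.
        torch.backends.cuda.matmul.allow_tf32 = enabled
        # TF-32 for cuDNN convolutions.
        torch.backends.cudnn.allow_tf32 = enabled
        configured.append("pytorch")
    except ImportError:
        pass  # PyTorch is not installed; skip it.

    try:
        import tensorflow as tf

        # TF-32 for TensorFlow ops that support it.
        tf.config.experimental.enable_tensor_float_32_execution(enabled)
        configured.append("tensorflow")
    except ImportError:
        pass  # TensorFlow is not installed; skip it.

    return configured


if __name__ == "__main__":
    print("TF-32 enabled for:", set_tf32(True))
```

These flags only have an effect on GPUs whose tensor cores support TF-32 (Ampere and newer); on other hardware the computation stays in FP-32.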

Other parameters

If you need more information while a test is running, use -v or --verbose for more detailed output.

For the Test-Suite there are two more parameters: -a to run all tests, and -r or --runs to override the predefined 13 runs.
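The options described above could be declared roughly as follows with argparse. This is a sketch under assumptions, not the suite's actual parser: the helper str_to_bool, the destination name all_tests, and the idea that --tensor takes an explicit true/false value are all hypothetical; only the flag names and defaults come from the text above.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: accept common spellings of true and false."""
    text = value.strip().lower()
    if text in {"true", "1", "yes", "y"}:
        return True
    if text in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TF-32 tensor core test (sketch)")
    parser.add_argument("-i", "--iters", type=int, default=100,
                        help="number of iterations")
    parser.add_argument("-s", "--size", type=int, default=8192,
                        help="matrix size n for an n x n matrix")
    # Assumption: the flag takes an explicit value such as "--tensor false".
    parser.add_argument("-t", "--tensor", type=str_to_bool, default=True,
                        help="use TF-32 math (True) or FP-32 math (False)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print more information while running")
    # Test-Suite only: number of runs and "run all tests".
    parser.add_argument("-r", "--runs", type=int, default=13,
                        help="override the predefined number of runs")
    parser.add_argument("-a", dest="all_tests", action="store_true",
                        help="run all tests")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```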

Performance boosts for TF-32 math

For a single GeForce RTX 3060, the median acceleration measured with TF-32 math averaged 37,02% for TensorFlow and 81,07% for PyTorch. The tests used iters values of 100, 1'000, 10'000, 50'000 and 100'000 with n = 8192.
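For orientation, throughput figures of this kind are commonly derived from the matrix size, the iteration count and the elapsed time. The sketch below assumes the usual estimate of 2·n³ floating-point operations per n x n matrix multiplication; the source does not state which formula the tests use, and the function name and the 10-second timing are hypothetical.

```python
def matmul_gflops(n: int, iters: int, elapsed_seconds: float) -> float:
    """Estimate GFLOP/s for `iters` multiplications of two n x n matrices."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Assumption: about n^3 multiplications plus n^3 additions per product.
    flop_per_matmul = 2 * n ** 3
    return flop_per_matmul * iters / elapsed_seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing: 100 iterations at n = 8192 finishing in 10 s
    # corresponds to roughly 10995 GFLOP/s.
    print(f"{matmul_gflops(8192, 100, 10.0):.2f} GFLOP/s")
```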

3060 tests

For a single GeForce RTX 3060, over a set of 13 runs (excluding the first) with iters = 100, the TensorFlow 2 medians were 9594,22 GFLOP/s with FP-32 (σ 5,5555) and 13102,84 GFLOP/s with TF-32 (σ 7,0173), an acceleration of 36,57%.

For a single GeForce RTX 3060, over a set of 13 runs (excluding the first) with iters = 100, the PyTorch medians were 7255,66 GFLOP/s with FP-32 (σ 4,7315) and 13137,63 GFLOP/s with TF-32 (σ 23,8104), an acceleration of 81,07%.
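The per-set statistics (median and σ with the first run left out) can be reproduced along these lines. This is a minimal sketch: summarize_runs is a hypothetical name, the sample standard deviation is an assumption (the source does not say whether σ is the sample or population form), and the input values are placeholders rather than measurements.

```python
import statistics


def summarize_runs(gflops_per_run, skip_first=1):
    """Return (median, sigma) of the runs after dropping the first ones."""
    kept = list(gflops_per_run)[skip_first:]
    if len(kept) < 2:
        raise ValueError("need at least two runs after skipping")
    # Assumption: sigma is the sample standard deviation.
    return statistics.median(kept), statistics.stdev(kept)


if __name__ == "__main__":
    # Placeholder values only; the first entry plays the role of a slow
    # initial run that is excluded from the statistics.
    runs = [50.0, 100.0, 102.0, 98.0, 101.0, 99.0]
    median, sigma = summarize_runs(runs)
    print(f"median = {median:.2f} GFLOP/s, sigma = {sigma:.4f}")
```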

3080 Ti tests

For a single GeForce RTX 3080 Ti, over a set of 13 runs (excluding the first) with iters = 100, the TensorFlow 2 medians were 25481,71 GFLOP/s with FP-32 (σ 43,8388) and 39424,27 GFLOP/s with TF-32 (σ 1,3126), an acceleration of 54,72%.

For a single GeForce RTX 3080 Ti, over a set of 13 runs (excluding the first) with iters = 100, the PyTorch medians were 21671,57 GFLOP/s with FP-32 (σ 31,9975) and 38914,05 GFLOP/s with TF-32 (σ 71,7127), an acceleration of 79,56%.

A40 tests

For a single Tesla A40, over a set of 13 runs (excluding the first) with iters = 100, the TensorFlow 2 medians were 23396,17 GFLOP/s with FP-32 (σ 469,9513) and 63867,91 GFLOP/s with TF-32 (σ 374,0034), an acceleration of 172,98%.

For a single Tesla A40, over a set of 13 runs (excluding the first) with iters = 100, the PyTorch medians were 23906,69 GFLOP/s with FP-32 (σ 157,2779) and 64557,36 GFLOP/s with TF-32 (σ 496,7070), an acceleration of 170,04%.
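The acceleration figures in these sections are consistent with the relative increase of the TF-32 median over the FP-32 median. The sketch below assumes that definition (the function name is hypothetical) and recomputes the percentages from the medians reported above, written with decimal points as Python requires.

```python
def acceleration_percent(fp32_gflops: float, tf32_gflops: float) -> float:
    """Relative throughput gain of TF-32 over FP-32, in percent."""
    return (tf32_gflops / fp32_gflops - 1.0) * 100.0


# (FP-32 median, TF-32 median) in GFLOP/s, taken from the sections above.
reported_medians = {
    ("RTX 3060", "TensorFlow 2"): (9594.22, 13102.84),
    ("RTX 3060", "PyTorch"): (7255.66, 13137.63),
    ("RTX 3080 Ti", "TensorFlow 2"): (25481.71, 39424.27),
    ("RTX 3080 Ti", "PyTorch"): (21671.57, 38914.05),
    ("A40", "TensorFlow 2"): (23396.17, 63867.91),
    ("A40", "PyTorch"): (23906.69, 64557.36),
}

if __name__ == "__main__":
    for (gpu, framework), (fp32, tf32) in reported_medians.items():
        gain = acceleration_percent(fp32, tf32)
        print(f"{gpu:12s} {framework:13s} {gain:7.2f}%")
```

Rounded to two decimals, the results match the percentages quoted for each GPU and framework.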
