Giter VIP home page Giter VIP logo

Comments (5)

mcabbott avatar mcabbott commented on May 25, 2024 1

Thanks for looking into all this. That's certainly not a graph which will impress anyone!

I'd call it disappointing but unsurprising. Tullio generates the most naiive possible KernelAbstractions code, and I know from e.g. these transpose examples that there is much more than could be done. How easily and how well this can be done automatically, for arbitrary expressions, I don't know.

I also don't know whether there are operations on which this naiive code is more competitive. On the CPU, large matmul is a severe test of how you handle awkward memory access, but something like @tullio d[i] := p[i,k] * log(p[i,k]/q[i,k]) has much more work per element and a more obvious access pattern.

Would be nice for Tullio to do better at these things, but I don't even have a working GPU at the moment, and I have many projects. If anyone wanted to try that would be fantastic, I think the code to generate loops can be played with without diving into the more tangled macro code for parsing etc.

For the purposes of BLASBenchmarksGPU, there may already exist much better KernelAbstractions matmul code, no need to auto-generate it. And it might be interesting to see how varying levels of cleverness affect things at different sizes.

from tullio.jl.

DilumAluthge avatar DilumAluthge commented on May 25, 2024

Here is a plot for Matrix{Float32}=Matrix{Float32}×Matrix{Float32}:

plot

And here are the TFLOPS values:

14×4 DataFrame
 Row │ Size   Library  TFLOPS      Time
     │ Int64  Symbol   Float64     Float64
─────┼────────────────────────────────────────────
   1 │   128  CUBLAS    0.189052    22186.0
   2 │   128  Tullio    0.0552115   75968.0
   3 │   256  CUBLAS    0.93772     35783.0
   4 │   256  Tullio    0.290567   115479.0
   5 │   512  CUBLAS    3.5438      75748.0
   6 │   512  Tullio    0.633555   423697.0
   7 │  1024  CUBLAS   19.232      111662.0
   8 │  1024  Tullio    0.745863        2.8792e6
   9 │  2048  CUBLAS   41.5728     413248.0
  10 │  2048  Tullio    0.535823        3.20626e7
  11 │  4096  CUBLAS   49.3898          2.78274e6
  12 │  4096  Tullio    0.31201         4.40496e8
  13 │  8192  CUBLAS   39.9035          2.75543e7
  14 │  8192  Tullio    0.298638        3.68176e9

from tullio.jl.

DilumAluthge avatar DilumAluthge commented on May 25, 2024

Here is Matrix{Float32}=Matrix{Float32}×Matrix{Float32} at small matrix sizes:

plot

And here are the TFLOPS values:

 Row │ Size   Library  TFLOPS       Time
     │ Int64  Symbol   Float64      Float64
─────┼───────────────────────────────────────
   1 │     1  CUBLAS   1.07614e-7    18585.0
   2 │     1  Tullio   1.1015e-7     18157.0
   3 │     2  CUBLAS   1.12859e-6    14177.0
   4 │     2  Tullio   8.46158e-7    18909.0
   5 │     3  CUBLAS   3.05603e-6    17670.0
   6 │     3  Tullio   2.58807e-6    20865.0
   7 │     4  CUBLAS   7.21411e-6    17743.0
   8 │     4  Tullio   6.12821e-6    20887.0
   9 │     5  CUBLAS   1.40473e-5    17797.0
  10 │     5  Tullio   1.19355e-5    20946.0
  11 │     6  CUBLAS   2.91695e-5    14810.0
  12 │     6  Tullio   2.05695e-5    21002.0
  13 │     7  CUBLAS   4.50575e-5    15225.0
  14 │     7  Tullio   3.25922e-5    21048.0
  15 │     8  CUBLAS   6.45772e-5    15857.0
  16 │     8  Tullio   4.75682e-5    21527.0
  17 │     9  CUBLAS   8.25875e-5    17654.0
  18 │     9  Tullio   6.71889e-5    21700.0
  19 │    10  CUBLAS   0.000111744   17898.0
  20 │    10  Tullio   9.05305e-5    22092.0
  21 │    30  CUBLAS   0.00295421    18279.0
  22 │    30  Tullio   0.00176655    30568.0
  23 │    50  CUBLAS   0.011872      21058.0
  24 │    50  Tullio   0.00448053    55797.0
  25 │    70  CUBLAS   0.0324304     21153.0
  26 │    70  Tullio   0.0105823     64825.0
  27 │    90  CUBLAS   0.0624679     23340.0
  28 │    90  Tullio   0.023251      62707.0
  29 │   110  CUBLAS   0.107313      24806.0
  30 │   110  Tullio   0.0386391     68894.0
  31 │   130  CUBLAS   0.194994      22534.0
  32 │   130  Tullio   0.0501592     87601.0
  33 │   150  CUBLAS   0.238196      28338.0
  34 │   150  Tullio   0.0735534     91770.0
  35 │   170  CUBLAS   0.395524      24843.0
  36 │   170  Tullio   0.104512      94018.0
  37 │   190  CUBLAS   0.437074      31386.0
  38 │   190  Tullio   0.140838      97403.0
  39 │   210  CUBLAS   0.548053      33796.0
  40 │   210  Tullio   0.177458     104374.0
  41 │   230  CUBLAS   0.76275       31903.0
  42 │   230  Tullio   0.223804     108729.0
  43 │   250  CUBLAS   1.06841       29249.0
  44 │   250  Tullio   0.273533     114246.0

from tullio.jl.

mcabbott avatar mcabbott commented on May 25, 2024

Re the Float32 graph, in round numbers my CPU gets to about half a teraflop from the low-hundreds, which appears to be faster than Tullio's GPU matmul in the range shown. My graph is here:

https://github.com/mcabbott/Tullio.jl/blob/master/benchmarks/02/matmul-0.2.11-Float32-1.6.0-dev.png

And xref also #30 about GPU vs CPU speed.

from tullio.jl.

simeonschaub avatar simeonschaub commented on May 25, 2024

I would be curious to know whether #82 helps here. I am not expecting a huge difference, but it might help a little.

from tullio.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.