Comments (10)
Any ideas for mitigating this?
No easy ones, sorry. You could try to find a different implementation of this operation that doesn't require wider temporaries.
from metal.jl.
Thanks, @maleadt. Do you think that this will fail on CUDA/AMDGPU? I am asking because it is used by some operations in Flux, so this might be a relevant issue for them too.
from metal.jl.
CUDA has more log intrinsics: https://github.com/JuliaGPU/CUDA.jl/blob/38fb7071d893fad8591361d0da96149f45f7653f/src/device/intrinsics/math.jl#L102-L123
from metal.jl.
Do you think a patch similar to this would be appropriate?
from metal.jl.
Yes, it would.
cc @oscardssmith (I mentioned this being a problem at JuliaCon)
from metal.jl.
How much accuracy are you willing to lose? I can probably cook you up a Float32-only one that performs pretty well at the cost of slightly lower accuracy (probably in the 2-4 ULP area).
from metal.jl.
If the only problem is the implementation of log_proc2, there is a single-precision implementation in the original paper (pp. 385-386). But I think there is another Float64 cast in log_proc1 too.
from metal.jl.
Oh, I think it's possible that the double-precision casts may not be necessary at all. The biggest difference between our algorithm and Tang's is that we add an extra multiply at the end to account for the different log bases, and that multiply may introduce a fair amount of error without the extended precision. It would be good to try the version that doesn't upcast, though...
from metal.jl.
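One way to check how much accuracy a Float32-only variant loses is to compare it against a Float64 reference on the CPU and report the worst error in Float32 ULPs. The following is a minimal sketch of such a check, not the implementation discussed in this thread: `max_ulp_error` and `naive_log2` are hypothetical names, and `naive_log2` is only a stand-in that applies a change-of-base multiply in single precision.

```julia
# Largest error of `f` over `xs`, measured in Float32 ULPs of the reference value.
# `ref` is evaluated in Float64, so this check is meant to run on the CPU.
function max_ulp_error(f, ref, xs)
    worst = 0.0
    for x in xs
        exact = ref(Float64(x))               # Float64 reference value
        approx = Float64(f(x))                # candidate result, widened for comparison
        ulp = Float64(eps(Float32(exact)))    # Float32 spacing at the reference value
        worst = max(worst, abs(approx - exact) / ulp)
    end
    return worst
end

# Hypothetical Float32-only log2: natural log followed by a change-of-base
# multiply (1/ln 2), with no Float64 temporaries.
naive_log2(x::Float32) = log(x) * 1.442695f0

xs = Float32.(range(0.01, 100; length = 10_000))

println("naive Float32 log2: ", max_ulp_error(naive_log2, log2, xs), " ULP")
println("Base log2:          ", max_ulp_error(log2, log2, xs), " ULP")
```

Running the same check on a candidate rewrite of the log routines would show whether dropping the upcast stays within the error budget mentioned above.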
I attempted to rewrite the functions without using double-precision floats in #236.
The functions seem to work when I try them in a REPL; however, using them in Metal with MtlArray does not work.
from metal.jl.