This raises InvalidIRError [...] Reason: unsupported unsuppo


`log1p` fails on `MtlArray{Float32}` (metal.jl issue, 10 comments, closed)

sotlampr commented on September 22, 2024

`log1p` fails on `MtlArray{Float32}`

from metal.jl.

Comments (10)

maleadt commented on September 22, 2024

> any ideas for mitigating this?

No easy ones, sorry. You could try to find a different implementation of this operation that doesn't require wider temporaries.
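As an illustration of what "a different implementation" could look like (this sketch is not from the thread and is not the fix that was eventually merged), one well-known approach computes `log(1 + x)` from an ordinary `log` call plus a correction factor, so the routine itself introduces no `Float64` temporaries. The name `log1p_f32` is hypothetical. It assumes the backend supplies a native `Float32` `log`; on the CPU, Base's own `log(::Float32)` may still widen internally.

```julia
# Hypothetical helper, not part of Base or Metal.jl.
# Computes log(1 + x) without introducing Float64 temporaries of its own.
function log1p_f32(x::Float32)
    u = 1f0 + x
    # If 1 + x rounds to 1, then log1p(x) equals x to working precision.
    # If u overflowed to Inf32, x itself must be Inf32, so returning x is correct.
    (u == 1f0 || u == Inf32) && return x
    # log(u) is the log of the *rounded* sum; the factor x / (u - 1)
    # compensates for the rounding error committed when forming u.
    return log(u) * (x / (u - 1f0))
end

log1p_f32(1f-10)  # tiny inputs are returned unchanged
log1p_f32(0.5f0)  # close to log(1.5)
```

Whether this is accurate enough depends on the quality of the backend's `log`, so it would need to be checked against a higher-precision reference before being relied on.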

sotlampr commented on September 22, 2024

Thanks, @maleadt. Do you think that this will fail on CUDA/AMDGPU? I am asking because it is used by some operations in Flux, so this might be a relevant issue for them too.

maleadt commented on September 22, 2024

CUDA has more log intrinsics: https://github.com/JuliaGPU/CUDA.jl/blob/38fb7071d893fad8591361d0da96149f45f7653f/src/device/intrinsics/math.jl#L102-L123

sotlampr commented on September 22, 2024

Do you think a patch similar to this would be appropriate?

maleadt commented on September 22, 2024

Yes, it would.

cc @oscardssmith (I mentioned this being a problem at JuliaCon)

oscardssmith commented on September 22, 2024

How much accuracy are you willing to lose? I can probably cook up a Float32-only version that performs pretty well at the cost of slightly lower accuracy (probably in the 2-4 ULP range).
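For readers unfamiliar with the unit, an error "in ULPs" can be estimated by comparing a `Float32` result against a `Float64` reference and dividing by the spacing of `Float32` values at that point. The sketch below is illustrative only; `ulp_error` and `max_ulp_error` are hypothetical helper names, and the input range is an arbitrary choice.

```julia
# Hypothetical helpers for measuring Float32 accuracy in ULPs.

# Error of `approx` relative to `exact`, in units of the Float32 spacing near `exact`.
ulp_error(approx::Float32, exact::Float64) =
    abs(Float64(approx) - exact) / Float64(eps(Float32(exact)))

# Worst-case ULP error of a Float32 function `f` over the inputs `xs`,
# using Base's Float64 log1p as the reference.
function max_ulp_error(f, xs)
    worst = 0.0
    for x in xs
        exact = log1p(Float64(x))
        worst = max(worst, ulp_error(f(x), exact))
    end
    return worst
end

xs = range(-0.5f0, 2f0; length=10_001)  # Float32 sample points
max_ulp_error(log1p, xs)                # baseline: Base's own Float32 log1p
```

Passing a candidate Float32-only implementation in place of `log1p` gives a rough picture of how many ULPs it loses on the sampled range; a sampled maximum is only a lower bound on the true worst case.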

sotlampr commented on September 22, 2024

If the only problem is the implementation of `log_proc2`, there is a single-precision implementation in the original paper (pp. 385-386). But I think there is another `Float64` cast in `log_proc1` too.

oscardssmith commented on September 22, 2024

Oh, I think it's possible that the double-precision casts are not necessary at all. The biggest difference between our algorithm and Tang's is that we add an extra multiply at the end to account for the different log bases, and that multiply may introduce a lot of error without the extended precision. It would be good to try the version that doesn't upcast, though.

sotlampr commented on September 22, 2024

I made an attempt to rewrite the functions without using double-precision floats: #236

I tried them in a REPL and the functions seem to work; however, using them in Metal with `MtlArray` does not.

maleadt commented on September 22, 2024

#239
