Hello, with a pytorch tensor t, I can call t.norm(p, dim

norms should delegate to the backend where possible (eagerpy, open issue)

jonasrauber commented on May 19, 2024

norms should delegate to the backend where possible

Comments (3)

mglisse commented on May 19, 2024

With an example to demonstrate the issue:

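import torch
import eagerpy

a = torch.tensor([0.], requires_grad=True)

# Native PyTorch norm: the gradient at 0 is a valid subgradient (0).
torch.norm(a, p=2).backward()
print(a.grad)

# Reset so the second result is not accumulated onto the first.
a.grad = None

# EagerPy norm of the same tensor: the gradient at 0 is NaN.
eagerpy.astensor(a).norms.l2().raw.backward()
print(a.grad)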

tensor([0.])
tensor([nan])
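
A plausible reading of the NaN above, assuming the norm is computed manually as the square root of a sum of squares (an assumption about the implementation, not something confirmed in this thread): the chain rule multiplies an infinite derivative of the square root at 0 by a zero derivative of the square. The sketch below reproduces that arithmetic with NumPy only; `manual_l2_grad` is a hypothetical helper, not part of EagerPy or PyTorch.

```python
import numpy as np


def manual_l2_grad(x):
    """Chain-rule gradient of sqrt(sum(x**2)), as a naive autodiff pass would form it."""
    s = np.sum(x * x)
    with np.errstate(divide="ignore", invalid="ignore"):
        dsqrt_ds = 0.5 / np.sqrt(s)  # derivative of sqrt at s; inf when s == 0
        ds_dx = 2.0 * x              # derivative of the sum of squares; 0 when x == 0
        return dsqrt_ds * ds_dx      # inf * 0 evaluates to nan at x == 0


grad_at_zero = manual_l2_grad(np.array([0.0]))        # NaN, like the second print above
grad_elsewhere = manual_l2_grad(np.array([3.0, 4.0]))  # x / ||x||, finite away from 0
```

A backend norm with a dedicated (sub)gradient can return 0 at the origin instead, which would explain the difference between the two printed values.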

jonasrauber commented on May 19, 2024

Hi @mglisse, thanks for the request and the example code.
That makes a lot of sense and I think this might be doable.
May I ask how you use EagerPy? Do you just use it as an alternative API for PyTorch, without needing the ability to run the same code on different frameworks? Or is there another reason this is only a problem with PyTorch?

mglisse commented on May 19, 2024

Hi, thanks for the reply. I use eagerpy so that I can write the code once and have it work with several frameworks, although at the moment I mostly experiment with pytorch.
The problem isn't limited to pytorch. The first time I hit this NaN issue with pytorch, jax was giving good numbers, so I assumed it was doing something different. I didn't keep the exact code, and now that I try to reproduce it, I seem to get NaN from both jax and pytorch in the same cases. So I don't know whether my experiment at the time was bogus or hit a very special case...
A good thing is that all frameworks seem to provide a norm function (at least for p not 0?). A bad thing is that the one in jax (I did not check tensorflow) does not seem to have a special (sub)gradient implementation: jax.numpy.linalg.norm(x,2) also gives a NaN gradient at 0. But I could go ask them about that. Another bad thing is that the frameworks don't share the same definition. On the matrix [[1,2],[3,4]] with p=1, numpy/jax return 6 while torch/tensorflow return 10, which complicates things a bit...
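
To make the two values above concrete, here is a small NumPy-only sketch. It assumes the 10 reported for torch/tensorflow corresponds to an entrywise sum of absolute values (that matches the number quoted, but the frameworks are not called here).

```python
import numpy as np

m = np.array([[1.0, 2.0], [3.0, 4.0]])

# numpy treats ord=1 on a 2-D input as the induced matrix norm:
# the maximum absolute column sum, max(1 + 3, 2 + 4).
induced_1 = np.linalg.norm(m, 1)

# Entrywise 1-norm: the matrix is treated as a flat vector, 1 + 2 + 3 + 4.
entrywise_1 = np.abs(m).sum()

print(induced_1, entrywise_1)
```

Delegating to the backend would therefore need either a flattening step or a restriction to vector inputs to keep results identical across frameworks.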
Of course there are workarounds. I could compute norms manually and add a tiny value (going through the dtype's finfo to get it) before taking the square root. Or I could let eagerpy compute the norm and, if the result is 0, replace it with a constant via result=result.from_numpy(0.) (or rather some better formulation that gets the right dtype; also, with pytorch this constant does not have requires_grad, so calling .raw.backward() directly on it, without combining it with other numbers, fails).
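
The first workaround can be sketched as follows, in NumPy only and with the gradient written out by hand rather than obtained from a framework. `safe_l2` is a hypothetical name, and using `finfo(...).tiny` as the offset is one possible choice; it assumes a floating-point input.

```python
import numpy as np


def safe_l2(x):
    """L2 norm with a tiny offset under the square root, plus its analytic gradient.

    Assumes x has a floating-point dtype; np.finfo raises for integer dtypes.
    """
    tiny = np.finfo(x.dtype).tiny          # smallest positive normal number of the dtype
    value = np.sqrt(np.sum(x * x) + tiny)  # strictly positive, even for x == 0
    grad = x / value                       # d value / d x; 0 / positive = 0 at the origin
    return value, grad


zero_value, zero_grad = safe_l2(np.zeros(3, dtype=np.float32))
value_345, _ = safe_l2(np.array([3.0, 4.0]))
```

The offset keeps the derivative of the square root finite, so the gradient at 0 comes out as 0 rather than NaN, at the cost of a norm that is never exactly 0.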
