Comments (10)

ilyes319 commented on August 24, 2024

Can you update your torch version to the latest one? As of 2.0.1, torch.vmap was still experimental.
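
The installed torch version can be checked without importing torch itself. This is a minimal sketch: the helper names `parse_version`, `is_at_least`, and `installed_torch_version` are hypothetical (they are not part of torch or mace), and 2.3.0 is used as the threshold only because that release is reported as working later in this thread.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn a string such as '2.3.0+cu121' into (2, 3, 0).

    Local build tags after the numeric part are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_at_least(version: str, minimum: tuple) -> bool:
    """Return True if `version` is greater than or equal to `minimum`."""
    return parse_version(version) >= minimum


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


# Assumed threshold: 2.3.0, the version reported as working in this thread.
found = installed_torch_version()
if found is None:
    print("torch is not installed in this environment")
else:
    print(f"torch {found}: at least 2.3.0? {is_at_least(found, (2, 3, 0))}")
```

Comparing parsed tuples avoids the pitfalls of comparing version strings directly, where "2.10" would sort before "2.3".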

from mace.

ilyes319 commented on August 24, 2024

Works fine on Colab with PyTorch 2.3.0: https://colab.research.google.com/drive/1GJETYYW8iUpDeyJ7p0j44FkjOM7gbYAm#scrollTo=mxSX_Y1lRR7p

Nilsgoe commented on August 24, 2024

I tested it with torch 2.3.1 and yes, it works, but it falls back to the loop all the time.

ilyes319 commented on August 24, 2024

What GPU are you using?

Nilsgoe commented on August 24, 2024

It's a "NVIDIA A100 80GB PCIe".

ilyes319 commented on August 24, 2024

Can you confirm that it is working on CPU?

Nilsgoe commented on August 24, 2024

Yes, it works on CPU.
But when I try it in Colab with print statements added to the try/except:

try:
    print("vmap")
    chunk_size = 1 if num_elements < 64 else 16
    gradient = torch.vmap(get_vjp, in_dims=0, out_dims=0, chunk_size=chunk_size)(
        I_N
    )[0]
except RuntimeError:
    print("loop")
    gradient = compute_hessians_loop(forces, positions)

It still prints both "vmap" and "loop", which tells me that it falls back to the loop implementation.
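
In the snippet above, print("vmap") runs before the vmap call, so "vmap" is printed whether or not the call succeeds; seeing "loop" as well means a RuntimeError was raised and swallowed. A minimal sketch of the same fallback pattern that also reports why the fast path failed is given here. The names `run_with_fallback`, `failing_fast_path`, `working_fast_path`, and `slow_loop` are hypothetical stand-ins, not mace functions.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("hessian_fallback")


def run_with_fallback(fast_path, slow_path, *args):
    """Try `fast_path`; on RuntimeError, log the reason and use `slow_path`.

    Returns the result together with a label naming the path that produced it.
    """
    try:
        result = fast_path(*args)
        return result, "fast"
    except RuntimeError as exc:
        # Logging the exception makes the cause of the fallback visible
        # instead of hiding it behind a bare print.
        logger.warning("fast path failed (%s); falling back to slow path", exc)
        return slow_path(*args), "slow"


def failing_fast_path(values):
    # Stand-in for a vectorized call that raises at runtime.
    raise RuntimeError("simulated failure inside the vectorized path")


def working_fast_path(values):
    # Stand-in for a vectorized call that succeeds.
    return [2 * v for v in values]


def slow_loop(values):
    # Stand-in for the element-by-element fallback.
    doubled = []
    for v in values:
        doubled.append(2 * v)
    return doubled


print(run_with_fallback(working_fast_path, slow_loop, [1, 2, 3]))
print(run_with_fallback(failing_fast_path, slow_loop, [1, 2, 3]))
```

Returning the label alongside the result lets a caller assert which path actually ran, which is harder to misread than interleaved print output.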

ilyes319 commented on August 24, 2024

Can you confirm that it is working on GPU when you remove the line that computes the E0?

Nilsgoe commented on August 24, 2024

Yes, it now prints only "vmap", once, when I run it on my own GPU (in the Colab it does not seem to do that). But when I run it like this:

import time

# calc and initial are defined earlier in the script
for i in range(5):
    s = time.time()
    h_autograd = calc.get_hessian(atoms=initial)
    e = time.time()
    print(f"This system needs {e - s} seconds")

it falls back to the loop implementation after the first iteration of the loop.

EDIT:
So for the colab it is now the same as for my own GPU.
The error is:

Traceback (most recent call last):
  File "/work/home/ngoen/Documents/torch_testing/large_test.py", line 102, in <module>
    h_autograd=calc.get_hessian(atoms=initial)#,method="vectorized_autograd")
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/mace/calculators/mace.py", line 320, in get_hessian
    hessians = [
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/mace/calculators/mace.py", line 321, in <listcomp>
    model(
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/mace/modules/models.py", line 395, in forward
    forces, virials, stress, hessian = get_outputs(
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/mace/modules/utils.py", line 211, in get_outputs
    hessian = compute_hessians_vmap(forces, positions)
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/mace/modules/utils.py", line 135, in compute_hessians_vmap
    gradient = torch.vmap(get_vjp, in_dims=0, out_dims=0, chunk_size=chunk_size)(
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/_functorch/apis.py", line 188, in wrapped
    return vmap_impl(func, in_dims, out_dims, randomness, chunk_size, *args, **kwargs)
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 277, in vmap_impl
    return _chunked_vmap(func, flat_in_dims, chunks_flat_args,
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 365, in _chunked_vmap
    _flat_vmap(
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 47, in fn
    return f(*args, **kwargs)
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 403, in _flat_vmap
    batched_outputs = func(*batched_inputs, **kwargs)
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/mace/modules/utils.py", line 122, in get_vjp
    return torch.autograd.grad(
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 412, in grad
    result = _engine_run_backward(
  File "/work/home/ngoen/mace_h-venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Cannot access data pointer of Tensor that doesn't have storage

This can be solved by deactivating the jit.compile.script of e3nn spherical_harmonics, but that will make it slower.

ilyes319 commented on August 24, 2024

You can use:
calc = mace_mp(model="medium", dispersion=False, default_dtype="float64", device="cuda", compile_mode="default")
It will use torch.compile, so the first pass will be slow.
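
Because the first pass carries a one-time compilation cost, it helps to report it separately from the later passes when timing. A minimal sketch under stated assumptions: `time_calls` and `summarize` are hypothetical helpers, and the workload is a stand-in, not a Hessian calculation. For GPU work, one would probably pass `torch.cuda.synchronize` as `sync`, so that asynchronous kernels finish before the clock is read.

```python
import time


def time_calls(fn, repeats=5, sync=None):
    """Call `fn` `repeats` times and return the duration of each call in seconds.

    `sync` is an optional zero-argument callable invoked after each call,
    e.g. torch.cuda.synchronize when timing GPU work (assumption: torch with
    CUDA is available in that case).
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Split timings into (first call, mean of the remaining calls).

    The mean is None when only one call was timed.
    """
    first, rest = durations[0], durations[1:]
    steady = sum(rest) / len(rest) if rest else None
    return first, steady


def stand_in_workload():
    # Placeholder for something like calc.get_hessian(atoms=initial).
    return sum(i * i for i in range(10_000))


first, steady = summarize(time_calls(stand_in_workload, repeats=5))
print(f"first call: {first:.6f} s, later calls (mean): {steady:.6f} s")
```

`time.perf_counter` is used instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring short intervals.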
