Comments (5)
Technically there is; you can look at our dynamo implementation, where we:
- execute the tracing (xla/torch_xla/core/dynamo_bridge.py, line 337 at 08e63e3)
- compute the hash + warm up the cache (compilation) (xla/torch_xla/core/dynamo_bridge.py, lines 395 to 399 at 08e63e3)
- execute the hash with input (xla/torch_xla/core/dynamo_bridge.py, line 497 at 08e63e3)
Dynamo is supposed to do what you expected; it handles the input ordering, output ordering, functionalization of the graph, etc. If you use these APIs directly you need to be very careful.
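For reference, a minimal sketch of that three-step flow using the same private _XLAC bindings exercised later in this thread (model, xla_inputs, and the GraphInputMatcher plumbing that produces graph_input are assumed, not spelled out):

# 1. Tracing: running the model on XLA tensors only records a lazy graph.
output = model(*xla_inputs)

# 2. Hash + warm up: hash the pending graph and compile it into the cache.
graph_hash = torch_xla._XLAC._get_graph_hash([output])
torch_xla._XLAC._xla_warm_up_cache([output], [])

# 3. Execute: replay the compiled graph with concrete device inputs
#    (graph_input is what GraphInputMatcher produces, as shown below).
res = torch_xla._XLAC._run_cached_graph(graph_hash, graph_input)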
Thank you very much for your answer. I have successfully run the forward calculation of the model following your tips and this unit test: https://github.com/pytorch/xla/blob/08e63e32af9eee71e8cd13d672f3200ee3356ab4/test/dynamo/test_graph_input_matcher.py
But I do not know how to add the backward calculation and the optimizer state update.
Technically you can do:
loss = fwd(input)
loss.backward()
optimizer.step()
graph_hash = torch_xla._XLAC._get_graph_hash([loss] + [all_parameter_gradient])
From the XLA perspective there is no fwd and bwd; you just need to pass all of the outputs (in this case the gradients) and it will use those as roots to construct the whole graph.
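Concretely, collecting those gradients might look like this (a sketch; loss is the traced output and backward() has already run):

# The loss plus every parameter gradient serve as the roots from which
# XLA reconstructs the whole forward + backward (+ step) graph.
all_parameter_gradients = [
    p.grad for p in model.parameters() if p.grad is not None
]
graph_hash = torch_xla._XLAC._get_graph_hash([loss] + all_parameter_gradients)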
Thank you very much for your reply. I went through the whole process, but I found that the parameters were not updated, resulting in the same loss (in my case, res[0]). I constructed a minimal unit test that reproduces this problem. @JackCaoG
import torch
import torch_xla
import torch_xla.core.xla_model as xm
from torch import nn
from torch.utils._pytree import tree_map_only
from torch_xla.core.dynamo_bridge import GraphInputMatcher
from torch_xla.amp import syncfree


class M(nn.Module):

  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(5, 3)

  def forward(self, x):
    return self.linear(x)

  def get_example_inputs(self):
    return (torch.rand(10, 5),)


xla_dev = xm.xla_device()
model = M().to(device=xla_dev)
optimizer = syncfree.AdamW(model.parameters(), lr=0.01)
inputs = tree_map_only(torch.Tensor, lambda x: x.to(device=xla_dev),
                       model.get_example_inputs())
xm.mark_step()

# Map each input tensor id to its argument index for GraphInputMatcher.
args_tensor_ids = [
    torch_xla._XLAC._xla_get_tensor_id(xla_arg) for xla_arg in inputs
]
tensor_id_to_arg_idx = {
    tensor_id: i for i, tensor_id in enumerate(args_tensor_ids)
}

# Trace one full training step: forward, backward, optimizer update.
output = model(*inputs).sum()
output.backward()
found_inf = torch.isnan(output).to(torch.float32).to(xla_dev)
optimizer.step(found_inf=found_inf)

# Use the loss plus every (parameter, gradient) pair as graph roots.
opt_state = []
for name, p in model.named_parameters():
  if p.grad is not None:
    opt_state.append(p)
    opt_state.append(p.grad)
  else:
    print(name, "no grad")
output_list = [output] + opt_state

# Hash the graph, compile it into the cache, and capture its inputs.
xla_graph_hash = torch_xla._XLAC._get_graph_hash(output_list)
torch_xla._XLAC._xla_warm_up_cache(output_list, [])
(
    graph_input_tensor_ids,
    graph_input_xla_values,
) = torch_xla._XLAC._get_tensors_xla_device_data_node(output_list)
xla_args_tensor_ids = set(
    tree_map_only(torch.Tensor,
                  lambda input: torch_xla._XLAC._xla_get_tensor_id(input),
                  inputs))
graph_input_matcher = GraphInputMatcher(tensor_id_to_arg_idx,
                                        graph_input_tensor_ids,
                                        graph_input_xla_values,
                                        xla_args_tensor_ids)

# Replay the cached graph; res[0] is the loss and it never changes.
for i in range(3):
  graph_input = graph_input_matcher(inputs)
  res = torch_xla._XLAC._run_cached_graph(xla_graph_hash, graph_input)
  print(res[0])
I think the code below is logically equivalent to the code above, yet the loss changes for the code below but not for the code above:
import torch
import torch_xla
import torch_xla.core.xla_model as xm
from torch import nn
from torch.utils._pytree import tree_map_only
from torch_xla.core.dynamo_bridge import GraphInputMatcher
from torch_xla.amp import syncfree


class M(nn.Module):

  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(5, 3)

  def forward(self, x):
    return self.linear(x)

  def get_example_inputs(self):
    return (torch.rand(10, 5),)


xla_dev = xm.xla_device()
model = M().to(device=xla_dev)
optimizer = syncfree.AdamW(model.parameters(), lr=0.01)
inputs = tree_map_only(torch.Tensor, lambda x: x.to(device=xla_dev),
                       model.get_example_inputs())
xm.mark_step()
args_tensor_ids = [
    torch_xla._XLAC._xla_get_tensor_id(xla_arg) for xla_arg in inputs
]
tensor_id_to_arg_idx = {
    tensor_id: i for i, tensor_id in enumerate(args_tensor_ids)
}

# Re-trace every iteration and materialize it with mark_step(); here the
# parameters are updated in place and the loss changes as expected.
for i in range(3):
  output = model(*inputs).sum()
  output.backward()
  found_inf = torch.isnan(output).to(torch.float32).to(xla_dev)
  optimizer.step(found_inf=found_inf)
  optimizer.zero_grad()
  xm.mark_step()
  print("debug ", output)
@JackCaoG @dewitt @sprt @ezyang
Hello, I have located the root cause of the problem in the above unit test: only the placeholder gets assigned, while the parameters themselves are not, so although new_param is computed, the parameters never change.
https://github.com/pytorch/xla/blob/master/torch_xla/csrc/xla_graph_executor.cpp#L816-L817
But I don't know how to fix this problem. Could you give me some ideas?
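One possible direction (a sketch of my own, mirroring how the dynamo bridge treats parameters, not a verified fix): register the parameters as graph arguments before hashing and warming up the cache, so GraphInputMatcher resolves them from the live args on every replay instead of reusing the device data captured at trace time, then copy the values returned by _run_cached_graph back into the parameters between iterations. Every name below refers to the first repro above; params and all_args are new, and the indexing assumes every parameter received a gradient.

# Hypothetical rework of the first repro's argument mapping and replay
# loop. The parameters must be added to the arg mapping *before* the
# trace/hash/warm-up so the matcher maps them instead of snapshotting them.
params = list(model.parameters())
all_args = list(inputs) + params
args_tensor_ids = [
    torch_xla._XLAC._xla_get_tensor_id(a) for a in all_args
]
tensor_id_to_arg_idx = {
    tensor_id: i for i, tensor_id in enumerate(args_tensor_ids)
}
# ... trace, build output_list, hash, warm up, and construct
# graph_input_matcher exactly as above, passing all_args wherever the
# original code passed inputs ...
for i in range(3):
  graph_input = graph_input_matcher(all_args)
  res = torch_xla._XLAC._run_cached_graph(xla_graph_hash, graph_input)
  print(res[0])  # the loss
  # res mirrors output_list: [loss, p1, p1.grad, p2, p2.grad, ...].
  # Write the updated parameter values back in place so the next replay
  # sees them through the argument mapping.
  with torch.no_grad():
    for j, p in enumerate(params):
      p.copy_(res[1 + 2 * j])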