paddlepaddle / padiff
Paddle Automatically Diff Precision Toolkits.
PaDiff supports API-level alignment checks. The mechanism wraps the framework's networking APIs into a model and attaches hooks to that model, so every API call is guaranteed to enter the hook logic; this also holds for model components provided by the framework (as opposed to user-defined ones).
Task: modify PaDiff's hook logic to check where the current API was triggered. If the API was triggered inside a model component provided by paddle/torch, skip recording that API's information.
Hint: get familiar with the contents of the report module, and make the judgment using the stack structure maintained by the PaDiff tool.
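The skip rule above can be sketched roughly as follows. This is a minimal illustration only, not PaDiff's actual implementation: the names `FRAMEWORK_COMPONENTS`, `module_stack`, and `api_hook` are all hypothetical stand-ins for the real report-module structures.

```python
# Hypothetical sketch: skip recording APIs triggered inside framework
# components. A stack tracks the modules currently executing; the API hook
# consults it and records nothing when any enclosing module belongs to the
# framework rather than to user code.

FRAMEWORK_COMPONENTS = {"MultiheadAttention", "TransformerEncoderLayer"}  # assumed set

module_stack = []  # pushed/popped by module-level pre/post hooks

def push_module(name):
    module_stack.append(name)

def pop_module():
    module_stack.pop()

def api_hook(api_name, record):
    """Record an API call unless it fires under a framework component."""
    if any(m in FRAMEWORK_COMPONENTS for m in module_stack):
        return  # triggered inside a framework-provided component: skip
    record.append(api_name)

record = []
push_module("MyModel")
api_hook("linear", record)          # user-level call: recorded
push_module("MultiheadAttention")   # framework component begins
api_hook("matmul", record)          # internal call: skipped
pop_module()
api_hook("relu", record)            # back in user code: recorded
# record == ["linear", "relu"]
```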
The PaDiff repository contains a yaml file (under the datas folder) that annotates the differences between the weights of components provided by paddle and by torch; for example, a linear layer's weight must be transposed before it can be aligned. In addition, the alignment check has an actions mechanism (under the checker folder) that selects a comparison function based on the type name passed in. At present the two mechanisms are independent, and since only one action actually exists, the actions mechanism has no real effect.
Task: modify the alignment logic for model weights and gradients under the checker module to remove the dependency on the yamls and use the actions mechanism instead (the get_action() interface can be cleaned up at the same time).
P.S. The yaml file is still used by the weight-initialization feature, which is an independent module; for the alignment tool, only the parts under the checker module need to be handled.
When fetching actions, the current alignment target must be distinguished: model outputs vs. model weights. Different alignment targets should influence the action type returned, which may require adding extra information to the dumped files.
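The actions-based dispatch could look roughly like the sketch below. None of the names come from the real PaDiff code base; `get_action` here keys on both the type name and the alignment target ("output" vs. "weight"), which is the distinction the task above asks for.

```python
# Hypothetical sketch: replace the yaml lookup table with an actions
# dispatch keyed on (type name, alignment target). The Linear/weight entry
# encodes the transpose rule mentioned above (torch stores Linear weights
# as [out, in], paddle as [in, out]).

def compare_default(base, raw):
    return base == raw

def compare_linear_weight(base, raw):
    # transpose the paddle weight before comparing with the torch weight
    transposed = [list(col) for col in zip(*raw)]
    return base == transposed

ACTIONS = {
    # (type_name, target) -> comparison function
    ("Linear", "weight"): compare_linear_weight,
}

def get_action(type_name, target):
    """Return the comparison function for this type and alignment target."""
    return ACTIONS.get((type_name, target), compare_default)

# A Linear weight needs the transpose rule; a Linear output does not.
w_torch = [[1, 2, 3], [4, 5, 6]]      # shape [2, 3]
w_paddle = [[1, 4], [2, 5], [3, 6]]   # shape [3, 2]
assert get_action("Linear", "weight")(w_torch, w_paddle)
assert get_action("Linear", "output") is compare_default
```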
After recent feature updates to the PaDiff repository, the usage documentation has not been kept in sync, and the repository also lacks a module design diagram for the new version.
Task: update the documents under the docs directory; most of them describe the old version and should be adapted to the new one.
The repository's docs/Interfaces.md has already received a brief update and can serve as a reference.
During model precision checks, it is sometimes enough to monitor only a few key Tensors. Task: provide an interface that users call to intrusively modify their networking code and mark the Tensors that need monitoring. This interface is independent of the other interfaces.
Upon receiving a Tensor, the interface should save the Tensor's information (in a global variable, a closure, or some other mechanism) and register a backward hook. After a step, some mechanism should save the Tensor information recorded during that step.
For a concrete approach, see the interaction between report and tensor_hook under the report folder in the repository. Finally, an interface for dumping the recorded information is also needed.
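A minimal, framework-agnostic sketch of such a marking interface is shown below. All names (`mark_tensor`, `_marked`, `dump_marked`) are hypothetical; in real code the gradient callback would be registered via `tensor.register_hook` (torch) or its paddle equivalent, and the dump would write to a file like the report module does.

```python
# Hypothetical sketch of a "mark this Tensor" interface: a module-level
# registry stores info about each marked tensor, a backward callback fills
# in its gradient, and a dump function flushes the records for the step.

_marked = []  # per-step registry of marked-tensor records

def mark_tensor(name, tensor, register_hook):
    """Save the tensor's info and register a backward hook.

    `register_hook` stands in for tensor.register_hook (torch/paddle):
    it accepts a callable that will receive the gradient.
    """
    record = {"name": name, "value": tensor, "grad": None}
    _marked.append(record)

    def on_grad(grad):
        record["grad"] = grad  # filled in during the backward pass

    register_hook(on_grad)
    return tensor

def dump_marked():
    """Return and clear the records collected in this step."""
    out, _marked[:] = list(_marked), []
    return out

# Simulate one step: "backward" simply invokes the registered callbacks.
callbacks = []
mark_tensor("logits", [0.1, 0.9], callbacks.append)
for cb in callbacks:            # stand-in for the real backward pass
    cb([0.01, -0.01])

records = dump_marked()
```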
Adding MultiheadAttention to the sample code and attempting to copy parameter values fails.
Versions:
paddlepaddle-gpu == 2.4.2
torch == 1.12.0+cu102
Code:
import paddle
import torch
from padiff import assign_weight, create_model, auto_diff, add_special_init

class SimpleModule(torch.nn.Module):
    def __init__(self):
        super(SimpleModule, self).__init__()
        self.linear1 = torch.nn.Linear(100, 10)
        self.attention = torch.nn.MultiheadAttention(64, 8, dropout=0.1, batch_first=True)

    def forward(self, x):
        x = self.linear1(x)
        return x

class SimpleLayer(paddle.nn.Layer):
    def __init__(self):
        super(SimpleLayer, self).__init__()
        self.linear1 = paddle.nn.Linear(100, 10)
        self.attention = paddle.nn.MultiHeadAttention(64, 8, dropout=0.1)

    def forward(self, x):
        x = self.linear1(x)
        return x

module = create_model(SimpleModule())
module.auto_layer_map("base")
layer = create_model(SimpleLayer())
layer.auto_layer_map("raw")
assign_weight(module, layer)
Error:
RuntimeError: Error occured when trying init weights, between:
base_model: MultiheadAttention()
SimpleModule.attention.in_proj_weight
raw_model: Linear(in_features=64, out_features=64, dtype=float32)
SimpleLayer.attention.q_proj.weight
Model architecture log files:
padiff_log/weight_init_SimpleModule.log:
SimpleModule
========================================
SimpleModule
|--- Linear
+--- MultiheadAttention <--- *** HERE ***
+--- NonDynamicallyQuantizableLinear (skip)
padiff_log/weight_init_SimpleLayer.log:
SimpleLayer
========================================
SimpleLayer
|--- Linear
+--- MultiHeadAttention (skip)
|--- Linear <--- *** HERE ***
|--- Linear
|--- Linear
+--- Linear
How should this be fixed?
Versions:
Running the demo code below fails:
import paddle
import torch
from padiff import auto_diff

class SimpleModule(torch.nn.Module):
    def __init__(self):
        super(SimpleModule, self).__init__()
        self.linear1 = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        return x

class SimpleLayer(paddle.nn.Layer):
    def __init__(self):
        super(SimpleLayer, self).__init__()
        self.linear1 = paddle.nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        return x

module = SimpleModule()
layer = SimpleLayer()

inp = paddle.rand((100, 100)).numpy().astype("float32")
inp = ({'x': torch.as_tensor(inp)},
       {'x': paddle.to_tensor(inp)})
auto_diff(module, layer, inp, atol=1e-4, auto_init=True)
The error message is:
Traceback (most recent call last):
File "padiff_test.py", line 63, in <module>
auto_diff(module, layer, inp, atol=1e-4, auto_init=True)
TypeError: auto_diff() got an unexpected keyword argument 'atol'
import torch
import paddle
import numpy as np
from padiff import create_model, auto_diff

class SparseDownSampleCloseBase(torch.nn.Module):
    def __init__(self, stride):
        super(SparseDownSampleCloseBase, self).__init__()
        self.pooling = torch.nn.MaxPool2d(stride, stride)
        self.large_number = 600

    def forward(self, d, mask):
        encode_d = -(1 - mask) * self.large_number - d
        d = -self.pooling(encode_d)
        mask_result = self.pooling(mask)
        d_result = d - (1 - mask_result) * self.large_number
        return d_result, mask_result

class SparseDownSampleCloseRaw(paddle.nn.Layer):
    def __init__(self, stride):
        super(SparseDownSampleCloseRaw, self).__init__()
        self.pooling = paddle.nn.MaxPool2D(stride, stride)
        self.large_number = 600

    def forward(self, d, mask):
        encode_d = -(1 - mask) * self.large_number - d
        d = -self.pooling(encode_d)
        mask_result = self.pooling(mask)
        d_result = d - (1 - mask_result) * self.large_number
        return d_result, mask_result

module = create_model(SparseDownSampleCloseBase(1))
layer = create_model(SparseDownSampleCloseRaw(1))

x = np.random.randn(1, 320, 320, 1).astype("float32")
y = np.random.randn(1, 320, 320, 1).astype("float32")
inp = ({"d": torch.as_tensor(x),
        "mask": torch.as_tensor(y)},
       {"d": paddle.to_tensor(x),
        "mask": paddle.to_tensor(y)})
auto_diff(module, layer, inp, auto_weights=True, atol=1e-4)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
This line fails under paddle 2.4.2. Is there a version of padiff compatible with 2.4.2?
PaDiff (Paddle Automatically Diff precision toolkits) is a model precision alignment tool based on PaddlePaddle and PyTorch. Given a Paddle or Torch model, it aligns intermediate training results as well as the trained model weights, and reports the first position at which a precision diff appears.
By participating in this project, you can:
No. | Task | Difficulty | Issue | Assignee | PR |
---|---|---|---|---|---|
1 | Add support for marking Tensors | Medium | #78 | | |
2 | Decouple the yaml configuration file | Medium | #79 | | |
3 | Filter API checks inside framework components | Medium | #82 | | |
4 | Update and upgrade the setup.py packaging logic | Easy | #83 | @littsk | #84 |
Note
After deciding to claim a task, remember to contact the admins promptly!
After PaDiff's file-structure update, installing from source via setup.py no longer works correctly, because the data files under the datas folder are not packaged.
Task: update the setup.py file in the PaDiff repository so that the PaDiff tool is packaged correctly.
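A sketch of the usual setuptools fix is shown below. It assumes the yaml data files live under padiff/datas; the actual glob patterns would have to match the repository's real layout.

```python
# Hypothetical setup.py fragment: ship the non-Python data files under
# padiff/datas with the package so that source installs keep working.
from setuptools import find_packages

# Keyword arguments for setuptools.setup(); calling setup(**SETUP_KWARGS)
# at the bottom of setup.py completes the file.
SETUP_KWARGS = {
    "name": "padiff",
    "packages": find_packages(),
    "package_data": {"padiff": ["datas/*.yaml", "datas/*.yml"]},  # assumed layout
    "include_package_data": True,  # also honor MANIFEST.in if present
}
```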
import numpy as np
import paddle
import paddle.nn as nn
import torch
from padiff import auto_diff, create_model

class SimpleModel(nn.Layer):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, dropout_p=0.0):
        super().__init__()
        c_ = 1280
        if p is None:
            p = k // 2
        self.conv = nn.Conv2D(c1, c_, k, s, padding=p, groups=g)
        self.norm = nn.InstanceNorm2D(c_)
        self.pool = nn.AdaptiveAvgPool2D(1)
        self.drop = nn.Dropout(p=dropout_p)
        self.linear = nn.Linear(c_, c2)

    def forward(self, x):
        if isinstance(x, list):
            x = paddle.concat(x, 1)
        return self.linear(self.drop(self.pool(self.norm(self.conv(x))).flatten(1)))

model = SimpleModel(3, 10)
paddle.summary(model, (1, 3, 224, 224))
state_dict = model.state_dict()
for key in state_dict:
    print(key, state_dict[key].shape)

class SimpleModelRef(torch.nn.Module):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, dropout_p=0.0):
        super().__init__()
        c_ = 1280
        if p is None:
            p = k // 2
        self.conv = torch.nn.Conv2d(c1, c_, k, s, padding=p, groups=g)
        self.norm = torch.nn.InstanceNorm2d(c_)
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.drop = torch.nn.Dropout(p=dropout_p, inplace=True)
        self.linear = torch.nn.Linear(c_, c2)

    def forward(self, x):
        if isinstance(x, list):
            x = torch.cat(x, 1)
        return self.linear(self.drop(self.pool(self.norm(self.conv(x))).flatten(1)))

model = SimpleModelRef(3, 10)
print("---" * 20)
state_dict = model.state_dict()
for key in state_dict:
    print(key, state_dict[key].shape)

module = create_model(SimpleModelRef(3, 10))
module.auto_layer_map("base")
layer = create_model(SimpleModel(3, 10))
layer.auto_layer_map("raw")

input = np.random.randn(4, 3, 320, 320).astype("float32")
inp = ({"x": torch.as_tensor(input)}, {"x": paddle.to_tensor(input)})
auto_diff(module, layer, inp, auto_weights=True)
[AutoDiff] Auto set layer_map start searching...
[AutoDiff] Your options:
{
auto_init: `True`
single_step: `False`
use_loss: `False`
use_opt: `False`
atol: `0`
rtol: `1e-07`
compare_mode: `mean`
}
[AutoDiff] Assign weight Failed !!!
RuntimeError: Error occured when trying init weights, between:
base_model: `Linear(in_features=1280, out_features=10, bias=True)`
`SimpleModelRef.linear.weight`
raw_model: `InstanceNorm2D(num_features=1280, epsilon=1e-05)`
`SimpleModel.norm.scale`
AssertionError: Shape of param `weight` in torch::Linear and param `scale` in paddle::InstanceNorm2D is not the same. [10, 1280] vs [1280]
Weight init log saved to
/home/greatx/repos/DocTrPP/padiff_log/weight_init_SimpleModelRef.log
/home/greatx/repos/DocTrPP/padiff_log/weight_init_SimpleModel.log
Please view the reports and checkout the layer marked with `<--- *** HERE ***` !
Hint:
1. Check the definition order of params is same in submodels.
2. Check the corresponding submodel have the same style:
param <=> param, buffer <=> buffer, embedding <=> embedding ...
cases like param <=> buffer, param <=> embedding are not allowed.
3. If can not change model codes, try to use a `LayerMap`
which can solve most problems.
4. (skip) means this layer is skipped because it is under black_list, or it has no param.
0. Visit `https://github.com/PaddlePaddle/PaDiff` to find more infomation.