paddlepaddle / padiff
Paddle Automatically Diff Precision Toolkits.
PaDiff supports API-level alignment checks. The mechanism wraps the framework's networking APIs into a model and attaches hooks to that model, so every API call is guaranteed to enter the hook logic; this also holds for model components provided by the framework (as opposed to user-defined ones).
Task: modify PaDiff's hook logic to check where the current API was triggered. If the API was triggered inside a model component provided by paddle/torch, skip recording that API's information.
Hint: get familiar with the contents of the report module, and make the judgment using the stack structure maintained by the PaDiff tool.
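The skip rule above can be sketched roughly as follows. This is a minimal illustration only, not PaDiff's actual implementation: the names `FRAMEWORK_COMPONENTS`, `module_stack`, and `api_hook` are all hypothetical stand-ins for the real report-module structures.

```python
# Hypothetical sketch: skip recording APIs triggered inside framework
# components. A stack tracks the modules currently executing; the API hook
# consults it and records nothing when any enclosing module belongs to the
# framework rather than to user code.

FRAMEWORK_COMPONENTS = {"MultiheadAttention", "TransformerEncoderLayer"}  # assumed set

module_stack = []  # pushed/popped by module-level pre/post hooks

def push_module(name):
    module_stack.append(name)

def pop_module():
    module_stack.pop()

def api_hook(api_name, record):
    """Record an API call unless it fires under a framework component."""
    if any(m in FRAMEWORK_COMPONENTS for m in module_stack):
        return  # triggered inside a framework-provided component: skip
    record.append(api_name)

record = []
push_module("MyModel")
api_hook("linear", record)          # user-level call: recorded
push_module("MultiheadAttention")   # framework component begins
api_hook("matmul", record)          # internal call: skipped
pop_module()
api_hook("relu", record)            # back in user code: recorded
# record == ["linear", "relu"]
```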
The PaDiff repository contains a yaml file (under the datas folder) that annotates the differences between the weights of components provided by paddle and by torch; for example, a linear layer's weight must be transposed before it can be aligned. In addition, the alignment check has an actions mechanism (under the checker folder) that selects a comparison function based on the type name passed in. At present the two mechanisms are independent, and since only one action actually exists, the actions mechanism has no real effect.
Task: modify the alignment logic for model weights and gradients under the checker module to remove the dependency on the yamls and use the actions mechanism instead (the get_action() interface can be cleaned up at the same time).
P.S. The yaml file is still used by the weight-initialization feature, which is an independent module; for the alignment tool, only the parts under the checker module need to be handled.
When fetching actions, the current alignment target must be distinguished: model outputs vs. model weights. Different alignment targets should influence the action type returned, which may require adding extra information to the dumped files.
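The actions-based dispatch could look roughly like the sketch below. None of the names come from the real PaDiff code base; `get_action` here keys on both the type name and the alignment target ("output" vs. "weight"), which is the distinction the task above asks for.

```python
# Hypothetical sketch: replace the yaml lookup table with an actions
# dispatch keyed on (type name, alignment target). The Linear/weight entry
# encodes the transpose rule mentioned above (torch stores Linear weights
# as [out, in], paddle as [in, out]).

def compare_default(base, raw):
    return base == raw

def compare_linear_weight(base, raw):
    # transpose the paddle weight before comparing with the torch weight
    transposed = [list(col) for col in zip(*raw)]
    return base == transposed

ACTIONS = {
    # (type_name, target) -> comparison function
    ("Linear", "weight"): compare_linear_weight,
}

def get_action(type_name, target):
    """Return the comparison function for this type and alignment target."""
    return ACTIONS.get((type_name, target), compare_default)

# A Linear weight needs the transpose rule; a Linear output does not.
w_torch = [[1, 2, 3], [4, 5, 6]]      # shape [2, 3]
w_paddle = [[1, 4], [2, 5], [3, 6]]   # shape [3, 2]
assert get_action("Linear", "weight")(w_torch, w_paddle)
assert get_action("Linear", "output") is compare_default
```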
After recent feature updates to the PaDiff repository, the usage documentation has not been kept in sync, and the repository also lacks a module design diagram for the new version.
Task: update the documents under the docs directory; most of them describe the old version and should be adapted to the new one.
The repository's docs/Interfaces.md has already received a brief update and can serve as a reference.
During model precision checks, it is sometimes enough to monitor only a few key Tensors. Task: provide an interface that users call to intrusively modify their networking code and mark the Tensors that need monitoring. This interface is independent of the other interfaces.
Upon receiving a Tensor, the interface should save the Tensor's information (in a global variable, a closure, or some other mechanism) and register a backward hook. After a step, some mechanism should save the Tensor information recorded during that step.
For a concrete approach, see the interaction between report and tensor_hook under the report folder in the repository. Finally, an interface for dumping the recorded information is also needed.
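A minimal, framework-agnostic sketch of such a marking interface is shown below. All names (`mark_tensor`, `_marked`, `dump_marked`) are hypothetical; in real code the gradient callback would be registered via `tensor.register_hook` (torch) or its paddle equivalent, and the dump would write to a file like the report module does.

```python
# Hypothetical sketch of a "mark this Tensor" interface: a module-level
# registry stores info about each marked tensor, a backward callback fills
# in its gradient, and a dump function flushes the records for the step.

_marked = []  # per-step registry of marked-tensor records

def mark_tensor(name, tensor, register_hook):
    """Save the tensor's info and register a backward hook.

    `register_hook` stands in for tensor.register_hook (torch/paddle):
    it accepts a callable that will receive the gradient.
    """
    record = {"name": name, "value": tensor, "grad": None}
    _marked.append(record)

    def on_grad(grad):
        record["grad"] = grad  # filled in during the backward pass

    register_hook(on_grad)
    return tensor

def dump_marked():
    """Return and clear the records collected in this step."""
    out, _marked[:] = list(_marked), []
    return out

# Simulate one step: "backward" simply invokes the registered callbacks.
callbacks = []
mark_tensor("logits", [0.1, 0.9], callbacks.append)
for cb in callbacks:            # stand-in for the real backward pass
    cb([0.01, -0.01])

records = dump_marked()
```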
Adding MultiheadAttention to the sample code and attempting to copy parameter values fails.
Versions:
paddlepaddle-gpu == 2.4.2
torch == 1.12.0+cu102
Code:
import paddle
import torch
from padiff import assign_weight, create_model, auto_diff, add_special_init

class SimpleModule(torch.nn.Module):
    def __init__(self):
        super(SimpleModule, self).__init__()
        self.linear1 = torch.nn.Linear(100, 10)
        self.attention = torch.nn.MultiheadAttention(64, 8, dropout=0.1, batch_first=True)

    def forward(self, x):
        x = self.linear1(x)
        return x

class SimpleLayer(paddle.nn.Layer):
    def __init__(self):
        super(SimpleLayer, self).__init__()
        self.linear1 = paddle.nn.Linear(100, 10)
        self.attention = paddle.nn.MultiHeadAttention(64, 8, dropout=0.1)

    def forward(self, x):
        x = self.linear1(x)
        return x

module = create_model(SimpleModule())
module.auto_layer_map("base")
layer = create_model(SimpleLayer())
layer.auto_layer_map("raw")
assign_weight(module, layer)
Error:
RuntimeError: Error occured when trying init weights, between:
base_model: MultiheadAttention()
SimpleModule.attention.in_proj_weight
raw_model: Linear(in_features=64, out_features=64, dtype=float32)
SimpleLayer.attention.q_proj.weight
Model architecture log files:
padiff_log/weight_init_SimpleModule.log:
SimpleModule
========================================
SimpleModule
|--- Linear
+--- MultiheadAttention <--- *** HERE ***
+--- NonDynamicallyQuantizableLinear (skip)
padiff_log/weight_init_SimpleLayer.log:
SimpleLayer
========================================
SimpleLayer
|--- Linear
+--- MultiHeadAttention (skip)
|--- Linear <--- *** HERE ***
|--- Linear
|--- Linear
+--- Linear
How should this be fixed?
Versions:
Running the demo code below fails:
import paddle
import torch
from padiff import auto_diff

class SimpleModule(torch.nn.Module):
    def __init__(self):
        super(SimpleModule, self).__init__()
        self.linear1 = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        return x

class SimpleLayer(paddle.nn.Layer):
    def __init__(self):
        super(SimpleLayer, self).__init__()
        self.linear1 = paddle.nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        return x

module = SimpleModule()
layer = SimpleLayer()

inp = paddle.rand((100, 100)).numpy().astype("float32")
inp = ({'x': torch.as_tensor(inp)},
       {'x': paddle.to_tensor(inp)})
auto_diff(module, layer, inp, atol=1e-4, auto_init=True)
The error message is:
Traceback (most recent call last):
File "padiff_test.py", line 63, in <module>
auto_diff(module, layer, inp, atol=1e-4, auto_init=True)
TypeError: auto_diff() got an unexpected keyword argument 'atol'
import torch
import paddle
import numpy as np
from padiff import create_model, auto_diff

class SparseDownSampleCloseBase(torch.nn.Module):
    def __init__(self, stride):
        super(SparseDownSampleCloseBase, self).__init__()
        self.pooling = torch.nn.MaxPool2d(stride, stride)
        self.large_number = 600

    def forward(self, d, mask):
        encode_d = -(1 - mask) * self.large_number - d
        d = -self.pooling(encode_d)
        mask_result = self.pooling(mask)
        d_result = d - (1 - mask_result) * self.large_number
        return d_result, mask_result

class SparseDownSampleCloseRaw(paddle.nn.Layer):
    def __init__(self, stride):
        super(SparseDownSampleCloseRaw, self).__init__()
        self.pooling = paddle.nn.MaxPool2D(stride, stride)
        self.large_number = 600

    def forward(self, d, mask):
        encode_d = -(1 - mask) * self.large_number - d
        d = -self.pooling(encode_d)
        mask_result = self.pooling(mask)
        d_result = d - (1 - mask_result) * self.large_number
        return d_result, mask_result

module = create_model(SparseDownSampleCloseBase(1))
layer = create_model(SparseDownSampleCloseRaw(1))

x = np.random.randn(1, 320, 320, 1).astype("float32")
y = np.random.randn(1, 320, 320, 1).astype("float32")
inp = ({"d": torch.as_tensor(x),
        "mask": torch.as_tensor(y)},
       {"d": paddle.to_tensor(x),
        "mask": paddle.to_tensor(y)})
auto_diff(module, layer, inp, auto_weights=True, atol=1e-4)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
This line fails under paddle 2.4.2. Is there a version of padiff compatible with 2.4.2?
PaDiff (Paddle Automatically Diff precision toolkits) is a model precision alignment tool based on PaddlePaddle and PyTorch. Given a Paddle or Torch model, it aligns intermediate training results as well as the trained model weights, and reports the first position at which a precision diff appears.
By participating in this project, you can:
No. | Task | Difficulty | Issue | Assignee | PR |
---|---|---|---|---|---|
1 | Add support for marking Tensors | Medium | #78 | | |
2 | Decouple the yaml configuration file | Medium | #79 | | |
3 | Filter API checks inside framework components | Medium | #82 | | |
4 | Update and upgrade the setup.py packaging logic | Easy | #83 | @littsk | #84 |
Note
After deciding to claim a task, remember to contact the admins promptly!
After PaDiff's file-structure update, installing from source via setup.py no longer works correctly, because the data files under the datas folder are not packaged.
Task: update the setup.py file in the PaDiff repository so that the PaDiff tool is packaged correctly.
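A sketch of the usual setuptools fix is shown below. It assumes the yaml data files live under padiff/datas; the actual glob patterns would have to match the repository's real layout.

```python
# Hypothetical setup.py fragment: ship the non-Python data files under
# padiff/datas with the package so that source installs keep working.
from setuptools import find_packages

# Keyword arguments for setuptools.setup(); calling setup(**SETUP_KWARGS)
# at the bottom of setup.py completes the file.
SETUP_KWARGS = {
    "name": "padiff",
    "packages": find_packages(),
    "package_data": {"padiff": ["datas/*.yaml", "datas/*.yml"]},  # assumed layout
    "include_package_data": True,  # also honor MANIFEST.in if present
}
```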
import numpy as np
import paddle
import paddle.nn as nn
import torch
from padiff import auto_diff, create_model

class SimpleModel(nn.Layer):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, dropout_p=0.0):
        super().__init__()
        c_ = 1280
        if p is None:
            p = k // 2
        self.conv = nn.Conv2D(c1, c_, k, s, padding=p, groups=g)
        self.norm = nn.InstanceNorm2D(c_)
        self.pool = nn.AdaptiveAvgPool2D(1)
        self.drop = nn.Dropout(p=dropout_p)
        self.linear = nn.Linear(c_, c2)

    def forward(self, x):
        if isinstance(x, list):
            x = paddle.concat(x, 1)
        return self.linear(self.drop(self.pool(self.norm(self.conv(x))).flatten(1)))

model = SimpleModel(3, 10)
paddle.summary(model, (1, 3, 224, 224))
state_dict = model.state_dict()
for key in state_dict:
    print(key, state_dict[key].shape)

class SimpleModelRef(torch.nn.Module):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, dropout_p=0.0):
        super().__init__()
        c_ = 1280
        if p is None:
            p = k // 2
        self.conv = torch.nn.Conv2d(c1, c_, k, s, padding=p, groups=g)
        self.norm = torch.nn.InstanceNorm2d(c_)
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.drop = torch.nn.Dropout(p=dropout_p, inplace=True)
        self.linear = torch.nn.Linear(c_, c2)

    def forward(self, x):
        if isinstance(x, list):
            x = torch.cat(x, 1)
        return self.linear(self.drop(self.pool(self.norm(self.conv(x))).flatten(1)))

model = SimpleModelRef(3, 10)
print("---" * 20)
state_dict = model.state_dict()
for key in state_dict:
    print(key, state_dict[key].shape)

module = create_model(SimpleModelRef(3, 10))
module.auto_layer_map("base")
layer = create_model(SimpleModel(3, 10))
layer.auto_layer_map("raw")

input = np.random.randn(4, 3, 320, 320).astype("float32")
inp = ({"x": torch.as_tensor(input)}, {"x": paddle.to_tensor(input)})
auto_diff(module, layer, inp, auto_weights=True)
[AutoDiff] Auto set layer_map start searching...
[AutoDiff] Your options:
{
auto_init: `True`
single_step: `False`
use_loss: `False`
use_opt: `False`
atol: `0`
rtol: `1e-07`
compare_mode: `mean`
}
[AutoDiff] Assign weight Failed !!!
RuntimeError: Error occured when trying init weights, between:
base_model: `Linear(in_features=1280, out_features=10, bias=True)`
`SimpleModelRef.linear.weight`
raw_model: `InstanceNorm2D(num_features=1280, epsilon=1e-05)`
`SimpleModel.norm.scale`
AssertionError: Shape of param `weight` in torch::Linear and param `scale` in paddle::InstanceNorm2D is not the same. [10, 1280] vs [1280]
Weight init log saved to
/home/greatx/repos/DocTrPP/padiff_log/weight_init_SimpleModelRef.log
/home/greatx/repos/DocTrPP/padiff_log/weight_init_SimpleModel.log
Please view the reports and checkout the layer marked with `<--- *** HERE ***` !
Hint:
1. Check the definition order of params is same in submodels.
2. Check the corresponding submodel have the same style:
param <=> param, buffer <=> buffer, embedding <=> embedding ...
cases like param <=> buffer, param <=> embedding are not allowed.
3. If can not change model codes, try to use a `LayerMap`
which can solve most problems.
4. (skip) means this layer is skipped because it is under black_list, or it has no param.
0. Visit `https://github.com/PaddlePaddle/PaDiff` to find more infomation.