
padiff's People

Contributors

2742195759, aurelius84, feifei-111, feixliu, linjieccc, littsk, wenmuzhou, xiaoguanghu01, xing-lil, zh-hike

padiff's Issues

Filter out API checks inside framework components

1. Problem Description 📚

PaDiff performs API-level alignment checks by wrapping the framework's model-building APIs into a model and registering hooks on that model, so every API call necessarily enters the hook logic. This also holds for model components provided by the framework itself (i.e., not user-defined).

2. Task Goal 🚀

Modify PaDiff's hook logic to check where the current API was triggered: if the API fired inside a model component provided by paddle/torch, skip recording that API's information. A sketch of the idea follows the tips below.

3. Tips

Get familiar with the contents of the report module, and make the decision with the help of the stack structure maintained by the PaDiff tool.
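
A minimal sketch of the idea (the names layer_stack, api_hook, and record_fn are hypothetical stand-ins; the real hook logic lives in the report module): before recording an API call, inspect the innermost module on the stack and skip recording when that module is defined by the framework itself.

def is_framework_internal(layer_stack):
    # The current API fired inside a framework-provided component when the
    # innermost module on the stack comes from paddle.nn / torch.nn.
    if not layer_stack:
        return False
    owner = type(layer_stack[-1]).__module__
    return owner.startswith("paddle.nn") or owner.startswith("torch.nn")

def api_hook(api_name, layer_stack, record_fn, *args, **kwargs):
    if is_framework_internal(layer_stack):
        return  # skip info recording for framework-internal API calls
    record_fn(api_name, *args, **kwargs)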

Remove the influence of the yaml file

1. Problem Description 📚

The PaDiff repo contains a yaml file (under the datas folder) that records the differences between the component weights provided by paddle and torch; for example, Linear's weight must be transposed before it can be aligned. In addition, the alignment check has an actions mechanism (under the checker folder) that selects a comparison function based on the incoming type name. The two mechanisms are currently independent, and actions effectively has only one entry, so it plays no real role.

2. Task Goal 🚀

In the checker module, modify the alignment logic for model weights and gradients: remove the dependence on the yamls and switch to the actions mechanism (the get_action() interface can be optimized at the same time).

P.S. The yaml file is still used by the weight-initialization feature, which is an independent module. For the alignment tool, only the parts under the checker module need to be handled.

3. Tips

When fetching actions, distinguish the current alignment target: model outputs vs. model weights. Different alignment targets should affect the type of action returned; to support this, extra information may need to be added to the dumped files, as the sketch below illustrates.
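
One possible shape for the upgraded dispatch (a sketch with hypothetical names; the real get_action() lives under checker): key the action table on both the type name and the alignment target, with the target read from the extra information dumped alongside the tensors.

def compare_outputs(base, raw, atol, rtol): ...
def compare_weights(base, raw, atol, rtol): ...  # e.g. transpose Linear weights here

ACTIONS = {
    ("Linear", "weight"): compare_weights,
    ("Linear", "output"): compare_outputs,
}

def get_action(type_name, target="output"):
    # `target` ("output" vs "weight"/"grad") would come from the extra
    # information added to the dumped files.
    return ACTIONS.get((type_name, target), compare_outputs)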

Code reading and documentation updates

1. Problem Description 📚

After recent feature updates to the PaDiff repo, the usage documentation has not been updated in sync, and the repo also lacks a module design diagram for the new version.

2. Task Goal 🚀

Update the documents under the docs directory; most of them describe the old version and should be adapted to the new one.

3. Tips

docs/Interfaces.md in the repo has already received a brief update and can serve as a reference.

Tensor marking feature

1. Problem Description 📚

When checking model accuracy, sometimes only a few key Tensors need to be monitored. Provide an interface that users can call to intrusively modify their model-building code and mark the Tensors to monitor. This interface is independent of the other interfaces.

2. Task Goal 🚀

When the interface receives a Tensor, it should store that Tensor's information (in a global variable, a closure, or some other mechanism) and register a backward hook. After a step, some mechanism should dump the Tensor information recorded during that step.

3. Tips

For a concrete approach, refer to the interaction between report and tensor_hook under the report folder in the repo. Finally, an interface for dumping the recorded information also needs to be provided; a sketch follows.
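
A minimal sketch of such an interface in plain torch (marked_tensors, mark_tensor, and dump_marked are hypothetical names, not PaDiff's API):

import torch

marked_tensors = []  # global store for the current step

def mark_tensor(tensor, name):
    # Save the forward value and register a backward hook to capture the grad.
    info = {"name": name, "value": tensor.detach().clone(), "grad": None}
    marked_tensors.append(info)

    def _grad_hook(grad):
        info["grad"] = grad.detach().clone()

    if tensor.requires_grad:
        tensor.register_hook(_grad_hook)
    return tensor  # return it so the call can be inlined into model code

def dump_marked(path):
    # Dump everything recorded during this step, then reset the store.
    torch.save(marked_tensors, path)
    marked_tensors.clear()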

MultiheadAttention initialization fails

Adding MultiheadAttention to the sample code and attempting to copy parameter values fails.

Versions:
paddlepaddle-gpu == 2.4.2
torch == 1.12.0+cu102

Code:

import paddle
import torch
from padiff import assign_weight, create_model, auto_diff, add_special_init

class SimpleModule(torch.nn.Module):
    def __init__(self):
        super(SimpleModule, self).__init__()
        self.linear1 = torch.nn.Linear(100, 10)
        self.attention = torch.nn.MultiheadAttention(64, 8, dropout=0.1, batch_first=True)

    def forward(self, x):
        x = self.linear1(x)
        return x

class SimpleLayer(paddle.nn.Layer):
    def __init__(self):
        super(SimpleLayer, self).__init__()
        self.linear1 = paddle.nn.Linear(100, 10)
        self.attention = paddle.nn.MultiHeadAttention(64, 8, dropout=0.1)

    def forward(self, x):
        x = self.linear1(x)
        return x
  
module = create_model(SimpleModule())
module.auto_layer_map("base")
layer = create_model(SimpleLayer())
layer.auto_layer_map("raw")

assign_weight(module, layer)

Error:
RuntimeError: Error occured when trying init weights, between:
base_model: MultiheadAttention()
SimpleModule.attention.in_proj_weight
raw_model: Linear(in_features=64, out_features=64, dtype=float32)
SimpleLayer.attention.q_proj.weight

Model architecture log files:
padiff_log/weight_init_SimpleModule.log:

SimpleModule
========================================
    SimpleModule
     |--- Linear
     +--- MultiheadAttention    <---  *** HERE ***
           +--- NonDynamicallyQuantizableLinear  (skip)

padiff_log/weight_init_SimpleLayer.log:

SimpleLayer
========================================
    SimpleLayer
     |--- Linear
     +--- MultiHeadAttention  (skip)
           |--- Linear    <---  *** HERE ***
           |--- Linear
           |--- Linear
           +--- Linear

How should this be fixed?
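
PaDiff cannot automatically pair torch's fused in_proj_weight with paddle's four separate Linear sublayers, which is why the auto layer_map marks them as mismatched. One workaround (a sketch, not PaDiff's built-in behavior; it assumes embed_dim == kdim == vdim so torch keeps the fused [3E, E] in_proj_weight) is to copy the attention weights manually; the same logic could also be wrapped into a custom initializer registered through add_special_init.

import paddle
import torch

def copy_mha_weights(torch_mha, paddle_mha):
    # torch fuses q/k/v into in_proj_weight of shape [3E, E]; paddle keeps
    # separate q_proj/k_proj/v_proj Linear layers whose weights are stored
    # transposed ([in, out] rather than torch's [out, in]).
    E = torch_mha.embed_dim
    w = torch_mha.in_proj_weight.detach().numpy()
    b = torch_mha.in_proj_bias.detach().numpy()
    for i, proj in enumerate([paddle_mha.q_proj, paddle_mha.k_proj, paddle_mha.v_proj]):
        proj.weight.set_value(paddle.to_tensor(w[i * E:(i + 1) * E].T))
        proj.bias.set_value(paddle.to_tensor(b[i * E:(i + 1) * E]))
    paddle_mha.out_proj.weight.set_value(
        paddle.to_tensor(torch_mha.out_proj.weight.detach().numpy().T))
    paddle_mha.out_proj.bias.set_value(
        paddle.to_tensor(torch_mha.out_proj.bias.detach().numpy()))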

TypeError: auto_diff() got an unexpected keyword argument 'atol'

Versions:

  • torch == 1.11.0+cu102
  • paddlepaddle-gpu == 2.4.2
  • padiff == 0.2.0

Running the demo code below raises an error:

import paddle
import torch
from padiff import auto_diff

class SimpleModule(torch.nn.Module):
    def __init__(self):
        super(SimpleModule, self).__init__()
        self.linear1 = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        return x

class SimpleLayer(paddle.nn.Layer):
    def __init__(self):
        super(SimpleLayer, self).__init__()
        self.linear1 = paddle.nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        return x

module = SimpleModule()
layer = SimpleLayer()

inp = paddle.rand((100, 100)).numpy().astype("float32")
inp = ({'x': torch.as_tensor(inp) },
     {'x': paddle.to_tensor(inp)})

auto_diff(module, layer, inp, atol=1e-4, auto_init=True)

The error message is:

Traceback (most recent call last):
  File "padiff_test.py", line 63, in <module>
    auto_diff(module, layer, inp, atol=1e-4, auto_init=True)
TypeError: auto_diff() got an unexpected keyword argument 'atol'
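
A possible fix, judging from the other examples on this page rather than the 0.2.0 changelog (so an assumption, not a verified answer): the newer interface expects the models to be wrapped with create_model before calling auto_diff, after which atol is accepted among the options.

from padiff import create_model, auto_diff

# Wrap the raw models first, as the other 0.2.x examples on this page do.
module = create_model(SimpleModule())
layer = create_model(SimpleLayer())

auto_diff(module, layer, inp, atol=1e-4, auto_init=True)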

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

import numpy as np
import paddle
import torch

from padiff import create_model, auto_diff


class SparseDownSampleCloseBase(torch.nn.Module):
    def __init__(self, stride):
        super(SparseDownSampleCloseBase, self).__init__()
        self.pooling = torch.nn.MaxPool2d(stride, stride)
        self.large_number = 600

    def forward(self, d, mask):
        encode_d = -(1 - mask) * self.large_number - d

        d = -self.pooling(encode_d)
        mask_result = self.pooling(mask)
        d_result = d - (1 - mask_result) * self.large_number

        return d_result, mask_result


class SparseDownSampleCloseRaw(paddle.nn.Layer):
    def __init__(self, stride):
        super(SparseDownSampleCloseRaw, self).__init__()
        self.pooling = paddle.nn.MaxPool2D(stride, stride)
        self.large_number = 600

    def forward(self, d, mask):
        encode_d = -(1 - mask) * self.large_number - d

        d = -self.pooling(encode_d)
        mask_result = self.pooling(mask)
        d_result = d - (1 - mask_result) * self.large_number

        return d_result, mask_result

module = create_model(SparseDownSampleCloseBase(1))
layer = create_model(SparseDownSampleCloseRaw(1))

x = np.random.randn(1, 320, 320, 1).astype("float32")
y = np.random.randn(1, 320, 320, 1).astype("float32")
inp = ({"d": torch.as_tensor(x),
        "mask": torch.as_tensor(y)},
        {"d": paddle.to_tensor(x),
        "mask": paddle.to_tensor(y)})
auto_diff(module, layer, inp, auto_weights=True, atol=1e-4)

Error:

Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
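
Both models here contain no trainable parameters (only pooling and arithmetic), so the outputs carry no grad_fn and the backward pass has nothing to differentiate. One workaround, sketched with plain torch/paddle calls under the assumption that this is indeed the cause, is to make the inputs themselves require gradients:

import numpy as np
import paddle
import torch

x = np.random.randn(1, 320, 320, 1).astype("float32")
y = np.random.randn(1, 320, 320, 1).astype("float32")

# Make the inputs leaves that require grad so backward() has a grad_fn.
xt = torch.as_tensor(x).requires_grad_(True)
yt = torch.as_tensor(y).requires_grad_(True)
xp = paddle.to_tensor(x, stop_gradient=False)
yp = paddle.to_tensor(y, stop_gradient=False)

inp = ({"d": xt, "mask": yt}, {"d": xp, "mask": yp})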

🚀 PaDiff precision alignment tool: Happy Open Source tasks 🎉

1. Background 📚

PaDiff, short for Paddle Automatically Diff precision toolkits, is a model precision alignment tool built on PaddlePaddle and PyTorch. Given a Paddle or Torch model, it aligns the intermediate training results as well as the trained model weights, and reports the first location where a precision diff appears.

2. Motivation 🚀

By participating in this project, you can:

  • gain a deep understanding of how deep learning frameworks (paddle/torch) work
  • gain a deep understanding of the scenarios and common problems of model precision alignment

3. Issue Tasks 🙋🏻‍♀️

No.  Task                                              Difficulty  Issue  Assignee  PR
1    Add support for marking Tensors                   Medium      #78
2    Decouple the yaml config file                     Medium      #79
3    Filter API checks inside framework components     Medium      #82
4    Update and upgrade the setup.py packaging logic   Easy        #83    @littsk   #84

Note
After deciding to claim a task, remember to contact the maintainers promptly~

Update setup.py

1. Problem Description 📚

After PaDiff's file structure was updated, installing from source via setup.py no longer works correctly, because the data files under the datas folder are not packaged.

2. Task Goal 🚀

Update the PaDiff repo's setup.py so that the PaDiff tool is packaged correctly; a sketch follows.
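
A minimal sketch of the fix, assuming the data files live under padiff/datas/ (the exact path and glob are for the packager to confirm): tell setuptools to ship the non-Python files alongside the package.

from setuptools import setup, find_packages

setup(
    name="padiff",
    packages=find_packages(),
    # Assumption: the yaml files live under padiff/datas/; package_data makes
    # setuptools copy them into wheels and source installs.
    package_data={"padiff": ["datas/*.yaml"]},
    include_package_data=True,
)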

A model containing InstanceNorm2D makes padiff error out

import numpy as np
import paddle
import paddle.nn as nn
import torch

from padiff import auto_diff, create_model

class SimpleModel(nn.Layer):
    def __init__(self,
                 c1,
                 c2,
                 k=1,
                 s=1,
                 p=None,
                 g=1,
                 dropout_p=0.0):
        super().__init__()
        c_ = 1280
        if p is None:
            p = k // 2
        self.conv = nn.Conv2D(c1, c_, k, s, padding=p, groups=g)
        self.norm = nn.InstanceNorm2D(c_)
        self.pool = nn.AdaptiveAvgPool2D(1)
        self.drop = nn.Dropout(p=dropout_p)
        self.linear = nn.Linear(c_, c2)

    def forward(self, x):
        if isinstance(x, list):
            x = paddle.concat(x, 1)
        return self.linear(self.drop(self.pool(self.norm(self.conv(x))).flatten(1)))


model = SimpleModel(3, 10)

paddle.summary(model, (1, 3, 224, 224))

state_dict = model.state_dict()
for key in state_dict:
    print(key, state_dict[key].shape)


class SimpleModelRef(torch.nn.Module):
    def __init__(self,
                 c1,
                 c2,
                 k=1,
                 s=1,
                 p=None,
                 g=1,
                 dropout_p=0.0):
        super().__init__()
        c_ = 1280
        if p is None:
            p = k // 2
        self.conv = torch.nn.Conv2d(c1, c_, k, s, padding=p, groups=g)
        self.norm = torch.nn.InstanceNorm2d(c_)
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.drop = torch.nn.Dropout(p=dropout_p, inplace=True)
        self.linear = torch.nn.Linear(c_, c2)

    def forward(self, x):
        if isinstance(x, list):
            x = torch.cat(x, 1)
        return self.linear(self.drop(self.pool(self.norm(self.conv(x))).flatten(1)))



model = SimpleModelRef(3, 10)

print("---" * 20)
state_dict = model.state_dict()

for key in state_dict:
    print(key, state_dict[key].shape)

module = create_model(SimpleModelRef(3, 10))
module.auto_layer_map("base")

layer = create_model(SimpleModel(3, 10))
layer.auto_layer_map("raw")

input = np.random.randn(4, 3, 320, 320).astype("float32")
inp = ({"x": torch.as_tensor(input)}, {"x": paddle.to_tensor(input)})
auto_diff(module, layer, inp, auto_weights=True)

Output:

[AutoDiff] Auto set layer_map start searching...

[AutoDiff] Auto set layer_map start searching...

[AutoDiff] Your options:
{
  auto_init: `True`
  single_step: `False`
  use_loss: `False`
  use_opt: `False`
  atol: `0`
  rtol: `1e-07`
  compare_mode: `mean`
}
[AutoDiff] Assign weight Failed !!!

RuntimeError:  Error occured when trying init weights, between:
    base_model: `Linear(in_features=1280, out_features=10, bias=True)`
                `SimpleModelRef.linear.weight`
    raw_model: `InstanceNorm2D(num_features=1280, epsilon=1e-05)`
               `SimpleModel.norm.scale`
AssertionError:  Shape of param `weight` in torch::Linear and param `scale` in paddle::InstanceNorm2D is not the same. [10, 1280] vs [1280]

Weight init log saved to 
    /home/greatx/repos/DocTrPP/padiff_log/weight_init_SimpleModelRef.log
    /home/greatx/repos/DocTrPP/padiff_log/weight_init_SimpleModel.log

Please view the reports and checkout the layer marked with `<---  *** HERE ***` !
Hint:
    1. Check the definition order of params is same in submodels.
    2. Check the corresponding submodel have the same style:
       param <=> param, buffer <=> buffer, embedding <=> embedding ...
       cases like param <=> buffer, param <=> embedding are not allowed.
    3. If can not change model codes, try to use a `LayerMap`
       which can solve most problems.
    4. (skip) means this layer is skipped because it is under black_list, or it has no param.
    0. Visit `https://github.com/PaddlePaddle/PaDiff` to find more infomation.
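
For this particular mismatch, the likely cause (an inference from the error above, grounded in the two frameworks' documented defaults) is that torch.nn.InstanceNorm2d defaults to affine=False and therefore creates no scale/bias parameters, while paddle.nn.InstanceNorm2D creates learnable scale and bias by default, so the two models' parameter lists fall out of step at `norm`. A minimal sketch of making the definitions symmetric, in either direction:

import paddle.nn as nn
import torch

c_ = 1280
# Option 1: give the torch side learnable parameters too.
torch_norm = torch.nn.InstanceNorm2d(c_, affine=True)
# Option 2: drop the parameters on the paddle side instead.
paddle_norm = nn.InstanceNorm2D(c_, weight_attr=False, bias_attr=False)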
