Comments (4)

jacquesqiao commented on June 5, 2024

This may be expected behavior; I need to analyze it. There was a bug here previously.

from paddle-ce-latest-kpis.

jacquesqiao commented on June 5, 2024

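This behavior is expected. The previous Adam optimizer implementation used a trick: all parameters shared a single beta1_pow_acc and a single beta2_pow_acc, and the scale was computed only once per iteration. For distributed synchronous training the two approaches are equivalent, and the old one saved a large number of scale ops. For asynchronous training, however, the old approach had a bug that caused beta not to be updated.

After the fix, both synchronous and asynchronous training are correct. Each parameter now has its own beta1_pow_acc and beta2_pow_acc, and therefore its own scale ops, so more scale ops are computed.

For the transformer model, this adds 183 * 2 scale ops.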
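
The op-count difference described above can be illustrated with a small standalone sketch. This is not Paddle code: the class names, the `run` helper, and the beta values (0.9 and 0.999, the usual Adam defaults) are assumptions made for illustration, and it only models the synchronous case, where every parameter is updated once per iteration.

```python
from dataclasses import dataclass, field

# Assumed Adam defaults; the thread does not state the values used.
BETA1, BETA2 = 0.9, 0.999


@dataclass
class SharedPowAcc:
    """Old scheme (hypothetical model): one accumulator pair for all parameters."""
    beta1_pow: float = BETA1
    beta2_pow: float = BETA2
    scale_ops: int = 0

    def step(self, param_names):
        # Two scale ops per iteration, independent of the parameter count.
        self.beta1_pow *= BETA1
        self.beta2_pow *= BETA2
        self.scale_ops += 2


@dataclass
class PerParamPowAcc:
    """New scheme (hypothetical model): one accumulator pair per parameter."""
    beta1_pow: dict = field(default_factory=dict)
    beta2_pow: dict = field(default_factory=dict)
    scale_ops: int = 0

    def step(self, param_names):
        # Two scale ops per parameter per iteration.
        for name in param_names:
            self.beta1_pow[name] = self.beta1_pow.get(name, BETA1) * BETA1
            self.beta2_pow[name] = self.beta2_pow.get(name, BETA2) * BETA2
            self.scale_ops += 2


def run(acc, param_names, iterations):
    """Apply `iterations` synchronous steps and return the accumulator."""
    for _ in range(iterations):
        acc.step(param_names)
    return acc


# 183 parameters, matching the transformer figure quoted in the thread.
params = [f"param_{i}" for i in range(183)]
shared = run(SharedPowAcc(), params, iterations=3)
per_param = run(PerParamPowAcc(), params, iterations=3)

print("shared scale ops:", shared.scale_ops)
print("per-parameter scale ops:", per_param.scale_ops)
```

In this model the per-parameter accumulators hold the same values as the shared pair after every synchronous step, which matches the claim that the two schemes are equivalent for synchronous training; only the number of scale ops differs.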

guochaorong commented on June 5, 2024

This explains why the time got longer:

    for p in parameters:
        self._add_accumulator(self._moment1_acc_str, p)
        self._add_accumulator(self._moment2_acc_str, p)

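But it does not explain what the benefit of this fix is.
Why does each parameter need its own beta1_pow_acc or beta2_pow_acc? After this change, the training accuracy does not appear to have improved.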

jacquesqiao commented on June 5, 2024

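For synchronous training the two implementations behave identically; for asynchronous training the previous implementation was incorrect. After some discussion, we plan to add a pass later that fuses the related scale ops into one, which fits the framework's overall design better.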
