Comments (10)

zenghsh3 avatar zenghsh3 commented on July 20, 2024
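  1. In both DDPG and PPO, the last fully connected (fc) layer of the policy model (policy/actor) uses tanh as its activation function, which guarantees that the output lies in (-1, 1). That is why no clip is needed at test time.

  2. DDPG and PPO add noise in different ways, and this follows from the algorithms themselves. DDPG explores simply by adding noise to the action. PPO needs the policy's mean and vars (variance) to compute action probabilities and the KL divergence, so the policy's variance has to be modeled in the model layer. For that reason we sample directly in the model layer according to that variance.

  3. Clipping can also be done in the model layer at sampling time.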
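The three points above can be sketched in plain Python. This is a minimal illustration, not PARL's code: the function names, the noise scale `noise_std=0.1`, and the example inputs are all assumptions made for the sketch.

```python
import math
import random


def tanh_policy_mean(pre_activation):
    """Squash raw network outputs into (-1, 1), as a tanh output layer would."""
    return [math.tanh(x) for x in pre_activation]


def ddpg_explore(mean, noise_std, rng):
    """DDPG-style exploration: add fixed-scale Gaussian noise to the action, then clip."""
    noisy = [m + rng.gauss(0.0, noise_std) for m in mean]
    return [max(-1.0, min(1.0, a)) for a in noisy]


def ppo_sample(mean, logvars, rng, clip=False):
    """PPO-style sampling: the standard deviation exp(logvars / 2) is part of the model."""
    sampled = [m + math.exp(lv / 2.0) * rng.gauss(0.0, 1.0)
               for m, lv in zip(mean, logvars)]
    if clip:
        # Optional clip inside the "model layer", as mentioned in point 3.
        sampled = [max(-1.0, min(1.0, a)) for a in sampled]
    return sampled


rng = random.Random(0)
mean = tanh_policy_mean([3.0, -0.2, -5.0])  # already inside (-1, 1), no clip needed
ddpg_action = ddpg_explore(mean, noise_std=0.1, rng=rng)
ppo_action = ppo_sample(mean, logvars=[-1.0, -1.0, -1.0], rng=rng, clip=True)
```

In the DDPG case the noise scale is a hyperparameter outside the model, while in the PPO case `logvars` would be a trainable quantity.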

from parl.

TomorrowIsAnOtherDay avatar TomorrowIsAnOtherDay commented on July 20, 2024

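The answer above focuses on implementation details, so let me add to it:

2. DDPG and PPO add Gaussian noise at different points

In essence, both add Gaussian noise. The difference between the two approaches is whether the magnitude of the noise is modeled.
The main reason is that PPO is an on-policy algorithm, and during on-policy learning the learning policy tends toward the behaviour policy. For the model to converge gradually, the noise contained in the samples produced by the behaviour policy has to shrink gradually, so we model the variance of the Gaussian noise and let the noise magnitude decrease as training converges.
DDPG, on the other hand, is an off-policy algorithm. In theory its convergence is not affected by the magnitude of the noise, so the noise is not modeled and is added directly to the action.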
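Once the noise variance is part of the policy, the policy is a full Gaussian distribution, and quantities such as the KL divergence between the behaviour policy and the updated policy become computable. A minimal sketch, assuming a diagonal Gaussian parameterized by means and log-variances (the function name `gaussian_kl` is hypothetical, not PARL's API):

```python
import math


def gaussian_kl(old_means, old_logvars, new_means, new_logvars):
    """KL(old || new) between two diagonal Gaussians given means and log-variances."""
    kl = 0.0
    for mo, lvo, mn, lvn in zip(old_means, old_logvars, new_means, new_logvars):
        # Per-dimension closed form:
        # 0.5 * (log(var_new / var_old) + (var_old + (mu_old - mu_new)^2) / var_new - 1)
        kl += 0.5 * (lvn - lvo
                     + (math.exp(lvo) + (mo - mn) ** 2) / math.exp(lvn)
                     - 1.0)
    return kl


same = gaussian_kl([0.0], [0.0], [0.0], [0.0])     # identical policies
shifted = gaussian_kl([0.0], [0.0], [1.0], [0.0])  # mean moved by 1, unit variance
```

With a fixed, external noise scale as in DDPG there is no such distribution parameter to differentiate through, which matches the point that DDPG does not model the noise.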

kosoraYintai avatar kosoraYintai commented on July 20, 2024

Oh, so layers.exp(logvars / 2.0) * XXX is where the automatic noise decay shows up, right?

TomorrowIsAnOtherDay avatar TomorrowIsAnOtherDay commented on July 20, 2024

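It mainly shows up when the policy network is optimized: the gradient is propagated back to the variance.

vars_item = -0.5 * layers.reduce_sum(logvars)

Judging from this line, while the policy model is being trained, the network updates the policy parameters and also reduces the variance in order to minimize the loss.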
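A small sketch may make the gradient path concrete. It assumes the policy is a diagonal Gaussian whose log-probability is the `vars_item` term from the line above plus a quadratic term and a constant; the function names are illustrative and not PARL's API.

```python
import math


def gaussian_logprob(action, means, logvars):
    """Log-density of a diagonal Gaussian, split into the terms discussed above."""
    vars_item = -0.5 * sum(logvars)
    quad_item = -0.5 * sum((a - m) ** 2 / math.exp(lv)
                           for a, m, lv in zip(action, means, logvars))
    const_item = -0.5 * len(action) * math.log(2.0 * math.pi)
    return vars_item + quad_item + const_item


def dlogprob_dlogvar(a, m, lv):
    """Derivative of the log-density with respect to one log-variance entry."""
    return -0.5 + 0.5 * (a - m) ** 2 / math.exp(lv)


near = dlogprob_dlogvar(0.1, 0.0, 0.0)  # action close to the mean: negative
far = dlogprob_dlogvar(2.0, 0.0, 0.0)   # action far from the mean: positive
```

For an action close to the mean the derivative is negative, so raising that action's log-probability lowers the variance; for an action far from the mean the sign flips. How the variance moves during training therefore also depends on which actions the loss pushes up or down.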

TomorrowIsAnOtherDay avatar TomorrowIsAnOtherDay commented on July 20, 2024

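Thank you very much for raising this question. This part is indeed not intuitive, and we will update the documentation later to explain the reasoning behind it :)
Sometimes, in order to reproduce paper-level results, we use tricks that are not mentioned in the paper, for example using huber_loss instead of the conventional square_loss to fit the Q function. The core algorithm still strictly follows the paper.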
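For readers unfamiliar with the trick, here is a minimal scalar sketch of the two losses. It uses the standard Huber definition with an assumed threshold `delta=1.0`; it is not PARL's or Paddle's implementation, and the function names are chosen only for illustration.

```python
def square_loss(pred, target):
    """Squared error with the conventional 1/2 factor."""
    return 0.5 * (pred - target) ** 2


def huber_loss(pred, target, delta=1.0):
    """Quadratic for small errors, linear beyond delta, so outliers weigh less."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


small = huber_loss(0.5, 0.0)  # inside the quadratic region, same as square_loss
large = huber_loss(3.0, 0.0)  # linear region, grows more slowly than square_loss
```

Because the loss is linear for large errors, a single badly estimated Q target produces a bounded gradient, which is the usual motivation for the substitution.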

kosoraYintai avatar kosoraYintai commented on July 20, 2024

Thanks for the guidance, everyone.

TomorrowIsAnOtherDay avatar TomorrowIsAnOtherDay commented on July 20, 2024

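Thanks for your support.
We have also received your comment on our official account about the convergence speed when applying PARL to the flappy bird benchmark. We have already asked colleagues who work on the Paddle core to help us track down the problem. Since the Lunar New Year is approaching and people are starting their holidays, this performance issue will be investigated and fixed with high priority after the Spring Festival.
Thanks again for your feedback, and Happy New Year.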

kosoraYintai avatar kosoraYintai commented on July 20, 2024

Happy New Year!

TomorrowIsAnOtherDay avatar TomorrowIsAnOtherDay commented on July 20, 2024

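@kosoraYintai Sorry for the late reply. We recently worked with the PaddlePaddle team to track down the CNN performance gap. The current recommendation (this workaround is admittedly cumbersome to use; the Paddle team says the issue will be fixed in version 1.4) is:

Use ParallelExecutor instead of the conventional Executor. See the code example below for the configuration.

As for optimizing this at the PARL framework level, we would prefer to wait until Paddle merges ParallelExecutor and Executor, because the current workaround is simply too ugly.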

TomorrowIsAnOtherDay avatar TomorrowIsAnOtherDay commented on July 20, 2024
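    import paddle.fluid as fluid

    # train_program is assumed to be the fluid.Program holding the training
    # graph; it is built elsewhere and is not shown in this snippet.
    exec_strategy = fluid.ExecutionStrategy()
    exec_strategy.allow_op_delay = True
    exec_strategy.use_experimental_executor = True
    exec_strategy.num_threads = 20  # threads used to run operators
    build_strategy = fluid.BuildStrategy()
    build_strategy.remove_unnecessary_lock = True

    # Run the training program through ParallelExecutor instead of Executor.
    train_exe = fluid.ParallelExecutor(
            use_cuda=False,
            main_program=train_program,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy)

    # Previous setup, kept for comparison:
    # train_exe = fluid.Executor(fluid.CPUPlace())
    # exe = fluid.Executor(fluid.CUDAPlace(0))

    # A plain Executor still runs the startup (parameter initialization) program.
    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())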
