
parl's Introduction

PARL

English | 简体中文


PARL is a flexible and highly efficient reinforcement learning framework.

About PARL

Features

Reproducible. We provide algorithms that stably reproduce the results of many influential reinforcement learning algorithms.

Large Scale. Supports high-performance parallel training with thousands of CPUs and multiple GPUs.

Reusable. Algorithms provided in the repository can be adapted directly to a new task by defining a forward network; the training mechanism is built automatically.

Extensible. Build new algorithms quickly by inheriting the abstract class in the framework.

Abstractions

(figure: the main abstractions of PARL)

PARL aims to build an agent for training algorithms to perform complex tasks. The main abstractions introduced by PARL, which are used to build up an agent layer by layer, are the following:

Model

Model abstracts the forward network: it defines the policy network or critic network that takes the state as input.
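
A minimal sketch of a custom Model (assuming the paddle backend; the class name and layer sizes below are illustrative, not from the README):

import parl
import paddle.nn as nn
import paddle.nn.functional as F

class CartpoleModel(parl.Model):
    # forward network: maps a state observation to one Q-value per action
    def __init__(self, obs_dim, act_dim):
        super(CartpoleModel, self).__init__()
        self.fc1 = nn.Linear(obs_dim, 128)
        self.fc2 = nn.Linear(128, act_dim)

    def forward(self, obs):
        h = F.relu(self.fc1(obs))
        return self.fc2(h)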

Algorithm

Algorithm describes the mechanism for updating the parameters in a Model, and often contains at least one model.

Agent

Agent, a data bridge between the environment and the algorithm, is responsible for data I/O with the outside environment and describes data preprocessing before feeding data into the training process.
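
To show how the three abstractions compose, here is a hedged sketch that wraps the CartpoleModel above with the built-in DQN algorithm (the hyperparameters and the epsilon value are placeholders, not recommendations):

import numpy as np
import paddle
import parl

class CartpoleAgent(parl.Agent):
    def __init__(self, algorithm, act_dim):
        super(CartpoleAgent, self).__init__(algorithm)
        self.act_dim = act_dim

    def sample(self, obs):
        # simple epsilon-greedy exploration used while collecting training data
        if np.random.rand() < 0.1:
            return np.random.randint(self.act_dim)
        return self.predict(obs)

    def predict(self, obs):
        obs = paddle.to_tensor(obs, dtype='float32')
        q_values = self.alg.predict(obs)
        return int(q_values.argmax())

model = CartpoleModel(obs_dim=4, act_dim=2)
alg = parl.algorithms.DQN(model, gamma=0.99, lr=1e-3)
agent = CartpoleAgent(alg, act_dim=2)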

Note: For more information about base classes, please visit our tutorial and API documentation.

Parallelization

PARL provides a compact API for distributed training, allowing users to convert their code into a parallelized version by simply adding a decorator. For more information about our APIs for parallel training, please visit our documentation.
Here is a Hello World example to demonstrate how easy it is to leverage outer computation resources.

#============Agent.py=================
import parl

@parl.remote_class
class Agent(object):

    def say_hello(self):
        print("Hello World!")

    def sum(self, a, b):
        return a+b

parl.connect('localhost:8037')
agent = Agent()
agent.say_hello()
ans = agent.sum(1,5) # it runs remotely, without consuming any local computation resources

Two steps to use outer computation resources:

  1. Use the parl.remote_class decorator on a class; it is then transformed into a new class whose instances can run on other CPUs or machines.
  2. Call parl.connect to initialize parallel communication before creating an object. Calling any method of such objects does not consume local computation resources, since they are executed elsewhere (a cluster master must already be running; see the sketch below).
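
Before parl.connect can attach to anything, a cluster master has to be running. A minimal local setup, assuming the xparl command-line tool that ships with PARL (the port is just an example and must match the address passed to parl.connect), looks like:

# start a cluster master on this machine and contribute 4 CPUs to it
xparl start --port 8037 --cpu_num 4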

(figure: PARL parallelization architecture)

As shown in the figure above, real actors (orange circles) run on the CPU cluster, while the learner (blue circle) runs on the local GPU alongside several remote actor proxies (yellow circles with dotted edges).

Users can write code in a simple way, just as they would write multi-threaded code, except that the actors consume remote resources. We also provide examples of parallelized algorithms such as IMPALA and A2C; for usage details, please refer to these examples.

Install:

Dependencies

  • Python 3.6+ (Python 3.8+ is preferable for distributed training).
  • paddlepaddle>=2.3.1 (optional; not needed if you only want to use the parallelization APIs)
pip install parl
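
If the full set of RL algorithm APIs is wanted rather than parallelization alone, paddlepaddle needs to be installed as well; a minimal sketch (PyPI package names, version pinned as listed above) is:

pip install "paddlepaddle>=2.3.1"  # optional: only needed for the RL algorithm APIs
pip install parl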

Getting Started

Several points to get you started:

For beginners who know little about reinforcement learning, we also provide an introductory course: ( Video | Code )

Examples

(demo GIFs: NeurIPS 2018 AI for Prosthetics Challenge, Half-Cheetah, Breakout)

parl's People

Contributors

aidilele, alexqdh, banma12956, caoxixiya, cclauss, emailweixu, fuyw, goshawk22, jiukaishi, liuyixin-louis, ljy2222, nepeplwu, rical730, shuaibinli, skylian, smallx, swag1ong, tangzhiyi11, tanzeyy, tomorrowisanotherday, wanglei828, wangzelong0663, xiaoyao4573, ynjxsjmh, yuechengliu, zenghsh3, zhanghandi, zhengyuhong, zhma1996, ziyuanma


parl's Issues

AssertionError: The loss_name should be set here.

When running sh scripts/train_difficulty1.sh ./low_speed_model in /PARL/examples/NeurIPS2019-Learn-to-Move-Challenge, an AssertionError occurred.

(opensim-rl) luo@idserver:~/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge$ sh scripts/train_difficulty1.sh ./low_speed_model

/home/luo/anaconda3/envs/opensim-rl/bin/python
[12-15 11:41:49 MainThread @logger.py:224] Argv: train.py --actor_num 300 --difficulty 1 --penalty_coeff 3.0 --logdir ./output/difficulty1 --restore_model_path ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/opensim/simbody.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[12-15 11:41:49 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
[12-15 11:41:49 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
W1215 11:41:50.501416 31581 device_context.cc:235] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W1215 11:41:50.504907 31581 device_context.cc:243] device: 0, cuDNN Version: 7.5.
[12-15 11:41:51 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
Traceback (most recent call last):
File "train.py", line 327, in
learner = Learner(args)
File "train.py", line 78, in init
self.agent = OpenSimAgent(algorithm, OBS_DIM, ACT_DIM)
File "/home/luo/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_agent.py", line 40, in init
build_strategy=build_strategy)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/parallel_executor.py", line 201, in init
if share_vars_from else None)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py", line 244, in with_data_parallel
assert self._loss_name is not None, "The loss_name should be set here."
AssertionError: The loss_name should be set here.

Does anybody know how to solve that? Thanks in advance.

pip install error

Following the quickstart instructions, I entered the directory and ran pip install . , and the following error appeared; I don't know how to resolve it.
(error screenshot)

[bug] abnormal GPU detection on macOS

export CUDA_VISIBLE_DEVICES="0"
>> from parl.utils import machine_info
>> machine_info.is_gpu_available()
>> True

machine_info.is_gpu_available() should return False on macOS, where no GPU is installed.

parl.remote.exceptions.RemoteError: [PARL remote error when calling function `__init__`]:

I ran the default IMPALA algorithm with 2 actors.
Runtime environment:

GPU: MX150
CUDA: 10.02.89
paddlepaddle-gpu (1.6.3.post107)
parl (1.2.1)

The error is as follows:

[02-29 17:50:42 MainThread @train.py:148] Waiting for 2 remote actors to connect.
[02-29 17:50:42 MainThread @train.py:152] Remote actor count: 1
[02-29 17:50:42 MainThread @train.py:152] Remote actor count: 2
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xtq/PARL/examples/IMPALA/train.py", line 163, in run_remote_sample
    remote_actor = Actor(self.config)
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/remote_decorator.py", line 127, in __init__
    raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function `__init__`]:
No module named 'atari_model'
traceback:
Traceback (most recent call last):
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/job.py", line 298, in wait_for_connection
    cls = cloudpickle.loads(message[1])
ModuleNotFoundError: No module named 'atari_model'


Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xtq/PARL/examples/IMPALA/train.py", line 163, in run_remote_sample
    remote_actor = Actor(self.config)
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/remote_decorator.py", line 127, in __init__
    raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function `__init__`]:
No module named 'atari_model'
traceback:
Traceback (most recent call last):
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/job.py", line 298, in wait_for_connection
    cls = cloudpickle.loads(message[1])
ModuleNotFoundError: No module named 'atari_model'

Add logs for remote actors

Currently, remote actors with PARL record no logs, which is inconvenient for debugging.
Add logs for remote actors and display them in the front end (Web UI).

ValueError: share_vars_from is not compiled and run, so there is no var to share.

When running python3 simulator_server.py --port 8030 --ensemble_num 1 within NeurIPS2018-AI-for-Prosthetics-Challenge, I got the following error:

share_vars_from is set, scope is ignored.
Traceback (most recent call last):
  File "simulator_server.py", line 331, in <module>
    simulator_server = SimulatorServer()
  File "simulator_server.py", line 84, in __init__
    self.agent = OpenSimAgent(alg, OBS_DIM, ACT_DIM, args.ensemble_num)
  File "/home/rsa-key-20191010/test/PARL/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim
_agent.py", line 64, in __init__
    share_vars_parallel_executor=self.learn_pe[i])
  File "/home/rsa-key-20191010/test/PARL/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/multi_h
ead_ddpg.py", line 121, in sync_target
    share_vars_parallel_executor=share_vars_parallel_executor)
  File "/home/davidzhenggd/.local/lib/python3.5/site-packages/parl/core/fluid/model.py", line 182,
 in sync_weights_to
    self._cached_fluid_executor.run(fetch_list=[])
  File "/home/davidzhenggd/.local/lib/python3.5/site-packages/paddle/fluid/parallel_executor.py", 
line 280, in run
    return_numpy=return_numpy)
  File "/home/davidzhenggd/.local/lib/python3.5/site-packages/paddle/fluid/executor.py", line 664,
 in run
    program._compile(scope, self.place)
  File "/home/davidzhenggd/.local/lib/python3.5/site-packages/paddle/fluid/compiler.py", line 376,
 in _compile
    scope=self._scope)
  File "/home/davidzhenggd/.local/lib/python3.5/site-packages/paddle/fluid/compiler.py", line 284,
 in _compile_data_parallel
    "share_vars_from is not compiled and run, so there is no "
ValueError: share_vars_from is not compiled and run, so there is no var to share.

Do you know how to resolve this issue? Thanks in advance!

Implementing the GUNREAL model

How should I implement the following model? I already have a preliminary design (a modification based on GA3C):

(model architecture screenshot)

I now have the following questions:

Is the ParallelExecutor design correct?

self.learn_exe = fluid.ParallelExecutor(
            use_cuda=use_cuda,
            main_program=self.learn_program,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy)
        # Obtain training data from the environments running in parallel; this yields the base, pc, vr, and rp data
        self.base_sample_exes = []
        for _ in range(base_predict_thread_num):
            with fluid.scope_guard(fluid.global_scope().new_scope()):
                pe = fluid.ParallelExecutor(
                    use_cuda=use_cuda,
                    main_program=self.base_sample_program,
                    build_strategy=build_strategy,
                    exec_strategy=exec_strategy)
                self.base_sample_exes.append(pe)
        self.pc_sample_exes = []
        for _ in range(pc_predict_thread_num):
            with fluid.scope_guard(fluid.global_scope().new_scope()):
                pe = fluid.ParallelExecutor(
                    use_cuda=use_cuda,
                    main_program=self.pc_sample_program,
                    build_strategy=build_strategy,
                    exec_strategy=exec_strategy)
                self.pc_sample_exes.append(pe)
        self.vr_sample_exes = []
        for _ in range(vr_predict_thread_num):
            with fluid.scope_guard(fluid.global_scope().new_scope()):
                pe = fluid.ParallelExecutor(
                    use_cuda=use_cuda,
                    main_program=self.vr_sample_program,
                    build_strategy=build_strategy,
                    exec_strategy=exec_strategy)
                self.vr_sample_exes.append(pe)

Is the Program setup correct?

# Probably a program of the kind used to obtain the output names
        self.base_sample_program = fluid.Program()
        self.vr_sample_program = fluid.Program()
        self.pc_sample_program = fluid.Program()
        # Programs for base, vr, rp, and pc
        # self.predict_program = fluid.Program()
        self.learn_program = fluid.Program()

        with fluid.program_guard(self.base_sample_program):
            # the input state
            base_states = layers.data(name='base_states', shape=self.obs_shape, dtype='float32')
            base_sample_actions, base_values = self.alg.sample(base_states)
            # Output the action index and the value function. Why are their names needed?
            self.base_sample_outputs = [base_sample_actions.name, base_values.name]

        with fluid.program_guard(self.pc_sample_program):
            pc_states = layers.data(name='pc_states', shape=self.obs_shape, dtype='float32')
            pc_q, pc_q_max = self.alg.predict_pc_q_and_pc_q_max(pc_states)
            # Output the action index and the value function. Why are their names needed?
            self.pc_outputs = [pc_q_max.name]

        with fluid.program_guard(self.vr_sample_program):
            vr_states = layers.data(name='vr_states', shape=self.obs_shape, dtype='float32')
            vr_v = self.alg.predict_vr_value(vr_states)
            # Output the action index and the value function. Why are their names needed?
            self.vr_outputs = [vr_v.name]

        with fluid.program_guard(self.learn_program):
            base_states = layers.data(name='base_states', shape=self.obs_shape, dtype='float32')
            # This algorithm does the one-hot encoding itself
            base_actions = layers.data(name='base_actions', shape=[6], dtype='float32')

            base_R = layers.data(name='base_R', shape=[], dtype='float32')
            base_values = layers.data(name='base_values', shape=[], dtype='float32')
            # Data for the auxiliary tasks
            pc_states = layers.data(name='pc_states', shape=self.obs_shape, dtype='float32'); pc_R = layers.data(name='pc_R', shape=[20, 20], dtype='float32'); pc_actions = layers.data(name='pc_actions', shape=[6], dtype='float32')
            vr_states = layers.data(name='vr_states', shape=self.obs_shape, dtype='float32'); vr_R = layers.data(name='vr_R', shape=[], dtype='float32')  # vr
            rp_states = layers.data(name='rp_states', shape=self.obs_shape, dtype='float32'); rp_C = layers.data(name='rp_C', shape=[3], dtype='float32')  # rp
            lr = layers.data(name='lr', shape=[1], dtype='float32', append_batch_size=False)
            entropy_coeff = layers.data(name='entropy_coeff', shape=[], dtype='float32')
            # Wrap the training data
            self.learn_reader = fluid.layers.create_py_reader_by_data(
                capacity=32,
                feed_list=[
                    base_states, base_actions, base_R, base_values,
                    # Data for the other auxiliary tasks
                    pc_states, pc_actions, pc_R,
                    vr_states, vr_R,
                    rp_states, rp_C,
                    # Data for training the network
                    lr, entropy_coeff
                ])
            base_states, base_actions, base_R, base_values, pc_states, pc_actions, pc_R, vr_states, vr_R, rp_states, rp_C, lr, entropy_coeff = fluid.layers.read_file(self.learn_reader)

            total_loss, pi_loss, vf_loss, entropy, pc_loss, vr_loss, rp_loss = self.alg.learn(
                base_states, base_actions, base_R, base_values,
                # training data
                pc_states, pc_R, pc_actions,
                vr_states, vr_R,
                rp_states, rp_C,
                lr, entropy_coeff)
            self.learn_outputs = [
                total_loss.name, pi_loss.name, vf_loss.name, entropy.name, pc_loss.name, vr_loss.name, rp_loss.name
            ]

A baseline for grinding POJ and LeetCode problems (half manual, half AI) is out

After N rounds of hyperparameter tuning, the model that uses Rainbow (Double-Q + SegmentTree + Advantage) to solve the 8-puzzle has finally finished training. Without knowing how to search for the optimal solution, it learned from scratch purely from the returned rewards and automatically generated lookup-table code. To my surprise, it shot up to Top 1 on LeetCode and became the default submission! (Locally, cross-checking against an A* implementation gave 98.6% accuracy, which did not stop it from getting accepted, heh.)
(screenshot of the Top 1 submission)
So a series of questions that have bothered me for more than three years is finally resolved:
1. Quite a few Zhihu influencers believe that with machine learning available, a passing familiarity with the classic "Introduction to Algorithms" is enough; careful study is unnecessary because the two are barely related.
2. 99% of Maimai posts believe that grinding algorithm problems is only for passing interviews and is never used at work.
3. How can machine learning be used to solve traditional graph search and discrete optimization problems, and how can traditional dynamic programming and tree recursion be used to optimize deep learning models?
With PARL, these doubts really are resolved: grasp both and grasp both firmly; solve more problems and also learn more models.
Finally, a shout-out for PARL:
The deep reinforcement learning framework PARL is a bridge connecting traditional data structures and algorithms with emerging deep learning algorithms, and the obvious choice for algorithm engineers!
O(∩_∩)O

Time for an upgrade; it is out of sync with paddlehub

ERROR: parl 1.2.3 has requirement flask==1.0.4, but you'll have flask 1.1.1 which is incompatible.

ERROR: paddlehub 1.6.1 has requirement flask>=1.1.0, but you'll have flask 1.0.4 which is incompatible.

locale.Error: unsupported locale setting in Windows

A locale setting error occurred when I ran parl on Windows.

C:\Users\mxfeng>xparl start --cpu_num 64 --port 8010 --monitor_port 7777
[32m[09-26 15:08:16 MainThread @logger.py:217] [0m Argv: C:\Users\mxfeng\Anaconda3\envs\dist_deep_rl\Scripts\xparl.exe start --cpu_num 64 --port 8010 --monitor_port 7777
Traceback (most recent call last):
  File "c:\users\mxfeng\anaconda3\envs\dist_deep_rl\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\mxfeng\anaconda3\envs\dist_deep_rl\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\mxfeng\Anaconda3\envs\dist_deep_rl\Scripts\xparl.exe\__main__.py", line 5, in <module>
  File "c:\users\mxfeng\anaconda3\envs\dist_deep_rl\lib\site-packages\parl\remote\scripts.py", line 36, in <module>
    locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
  File "c:\users\mxfeng\anaconda3\envs\dist_deep_rl\lib\locale.py", line 598, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

I know how to fix this on Ubuntu, but I have no idea how to do it on Windows.

Error when running the demo

Executed in the /PARL/examples/QuickStart directory:
python train.py
W1105 17:30:22.264058 15205 init.cc:212] *** Aborted at 1572946222 (unix time) try "date -d @1572946222" if you are using GNU date ***
W1105 17:30:22.266913 15205 init.cc:212] PC: @ 0x0 (unknown)
W1105 17:30:22.267369 15205 init.cc:212] *** SIGSEGV (@0x0) received by PID 15205 (TID 0x7f9838abb740) from PID 0; stack trace: ***
W1105 17:30:22.269906 15205 init.cc:212] @ 0x7f98386a8130 (unknown)
W1105 17:30:22.270401 15205 init.cc:212] @ 0x7f97c49c6a96 pybind11::detail::make_new_python_type()
W1105 17:30:22.270716 15205 init.cc:212] @ 0x7f97c49c81f8 pybind11::detail::generic_type::initialize()
W1105 17:30:22.271059 15205 init.cc:212] @ 0x7f97c4b4e552 ZN8pybind115enum_IN10onnx_torch20TensorProto_DataTypeEEC1IJEEERKNS_6handleEPKcDpRKT
W1105 17:30:22.271669 15205 init.cc:212] @ 0x7f97c4b46848 torch::onnx::initONNXBindings()
W1105 17:30:22.272096 15205 init.cc:212] @ 0x7f97c482fc28 initModule()
W1105 17:30:22.275000 15205 init.cc:212] @ 0x7f9838cef335 _PyImport_LoadDynamicModuleWithSpec
W1105 17:30:22.278131 15205 init.cc:212] @ 0x7f9838cef540 _imp_create_dynamic
W1105 17:30:22.281011 15205 init.cc:212] @ 0x7f9838bec711 PyCFunction_Call
W1105 17:30:22.283577 15205 init.cc:212] @ 0x7f9838c9a4ad _PyEval_EvalFrameDefault
W1105 17:30:22.285887 15205 init.cc:212] @ 0x7f9838c698e4 _PyEval_EvalCodeWithName
W1105 17:30:22.288254 15205 init.cc:212] @ 0x7f9838c6a771 fast_function
W1105 17:30:22.290830 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.293635 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.296411 15205 init.cc:212] @ 0x7f9838c6a53b fast_function
W1105 17:30:22.298804 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.301241 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.303527 15205 init.cc:212] @ 0x7f9838c6a53b fast_function
W1105 17:30:22.305831 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.308269 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.310564 15205 init.cc:212] @ 0x7f9838c6a53b fast_function
W1105 17:30:22.312858 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.315277 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.317591 15205 init.cc:212] @ 0x7f9838c6a53b fast_function
W1105 17:30:22.319972 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.322440 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.324836 15205 init.cc:212] @ 0x7f9838c6abab _PyFunction_FastCallDict
W1105 17:30:22.327309 15205 init.cc:212] @ 0x7f9838be9b0f _PyObject_FastCallDict
W1105 17:30:22.329708 15205 init.cc:212] @ 0x7f9838c2b810 _PyObject_CallMethodIdObjArgs
W1105 17:30:22.332087 15205 init.cc:212] @ 0x7f9838be0b10 PyImport_ImportModuleLevelObject
W1105 17:30:22.334471 15205 init.cc:212] @ 0x7f9838c97a8b _PyEval_EvalFrameDefault
W1105 17:30:22.336856 15205 init.cc:212] @ 0x7f9838c6b289 PyEval_EvalCodeEx
Segmentation fault

What could be the cause? Thanks.
The paddle version is: https://paddle-wheel.bj.bcebos.com/1.6.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-1.6.0.post97-cp36-cp36m-linux_x86_64.whl

Error: Paddle internal Check failed.

This is a modification based on IMPALA; the following error appears when it runs. Other algorithms run without problems.

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::operators::ReadOp::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
3   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
4   paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
5   paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool)

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2488, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/paddle/fluid/layers/io.py", line 872, in read_file
    type='read', inputs={'Reader': [reader]}, outputs={'Out': out})
  File "/home/xtq/DVTrace v2.3.1/atari_agent.py", line 74, in build_program
    self.learn_reader)
  File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 87, in __init__
    self.build_program()
  File "/home/xtq/DVTrace v2.3.1/atari_agent.py", line 29, in __init__
    super(AtariAgent, self).__init__(algorithm)
  File "train.py", line 65, in __init__
    self.learn_data_provider)
  File "train.py", line 276, in <module>
    learner = Learner(config)

----------------------
Error Message Summary:
----------------------
Error: Paddle internal Check failed. (Please help us create a new issue, here we need to find the developer to add a user friendly error message)
  [Hint: Expected ins.size() == out_arg_names.size(), but received ins.size():9 != out_arg_names.size():8.] at (/paddle/paddle/fluid/operators/reader/read_op.cc:92)
  [operator < read > error]

We are hiring!! Come and join us!!!

We are looking for interns with a machine learning background and good programming skills.

We hope that you have at least three months for the internship position.

If you are interested in machine learning and its application in industrial production, please email us your CV at [email protected].

Working city: Shenzhen, China.

Saving and loading the model fails in AIStudio

Saving and loading model parameters works locally, but in AIStudio the following problems appear.
1. After calling fluid.io.save_params(), the parameter file indices change instead of starting from 0:

(screenshot: AIbug01)

2. After copying the model parameters to my local machine, they cannot be loaded; only after renaming them to start from 0 does a local call to fluid.io.load_params() succeed:

(screenshot: AIbug02)

3. However, whether or not they are renamed, calling fluid.io.load_params() in AIStudio always fails; moreover, the indices change on every load, e.g. 12 and 16 in the screenshots below:

(screenshot: AIBug04)

(screenshot: AIBUg05)

Project URL:
https://aistudio.baidu.com/aistudio/projectdetail/63441?_=1560765585066
The error messages are included there; please take a look.

Hi, a question about parallel computing!

Hi, I would like to use PARL for parallel computing and reinforcement learning training on a computer cluster. The cluster has two parts: a GPU cluster running Linux, which mainly runs the reinforcement learning algorithms, and a CPU cluster running Windows, which mainly runs the simulation environments. The simulation environments communicate with the RL algorithms over gRPC. Is it feasible to use PARL in this architecture? Thanks!

On computing the action distribution

PPO and SAC both need to compute action probabilities, but they handle it differently. PPO sets up a separate trainable parameter as the variance and hand-writes the probability and KL computations, while SAC uses a neural network to produce the mean and variance, builds the action distribution with layers.Normal, samples with Normal.sample(), and computes the action probability and KL with Normal.log_prob(action) and Normal.kl_divergence(other). What is the difference between these two approaches?
PPO:

    def _calc_kl(self, means, logvars, old_means, old_logvars):
        log_det_cov_old = layers.reduce_sum(old_logvars)
        log_det_cov_new = layers.reduce_sum(logvars)
        tr_old_new = layers.reduce_sum(layers.exp(old_logvars - logvars))
        kl = 0.5 * (layers.reduce_sum(
            layers.square(means - old_means) / layers.exp(logvars), dim=1) + (
                log_det_cov_new - log_det_cov_old) + tr_old_new - self.act_dim)
        return kl

SAC:

    def sample(self, obs):
        mean, log_std = self.actor.policy(obs)
        std = layers.exp(log_std)
        normal = Normal(mean, std)
        x_t = normal.sample([1])[0]
        y_t = layers.tanh(x_t)
        action = y_t * self.max_action
        log_prob = normal.log_prob(x_t)
        log_prob -= layers.log(self.max_action * (1 - layers.pow(y_t, 2)) +
                               epsilon)
        log_prob = layers.reduce_sum(log_prob, dim=1, keep_dim=True)
        log_prob = layers.squeeze(log_prob, axes=[1])
        return action, log_prob

A question about ZQCNN face recognition

Hi, I saw under Zuo Qing's ZQCNN article that you got it running on a PC; I am also just starting with ZQCNN. I have already finished cmake .. and make in the ZQCNN directory on Ubuntu, but I don't know what the next step is, or how to use this library to run recognition on images.

What does ensemble_num mean in Part 3 of the NIPS2018 AI for Prosthetics Challenge?

In the winning solution for NIPS2018: AI for Prosthetics Challenge, Part 3 ("Training in random velocity environment for round2 evaluation"), what does ensemble_num refer to? Is it something like the ** in A3C? Or does it train ensemble_num models in parallel at the same time, then evaluate each model separately at test time and pick the best one?

Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.

This is my modified IMPALA algorithm; it fails when running, and I don't know how to track down the problem.

[03-02 14:47:41 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 1
[03-02 14:47:41 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 1
[03-02 14:47:41 MainThread @train.py:154] Waiting for 1 remote actors to connect.
[03-02 14:47:41 MainThread @train.py:158] Remote actor count: 1
Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.

Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.

Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.

Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.

Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.
......
Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.

F0302 14:47:44.609232 23176 device_context.cc:328] cudaStreamSynchronize unspecified launch failure errno: 4
*** Check failure stack trace: ***
    @     0x7f1e600d138d  google::LogMessage::Fail()
    @     0x7f1e600d4e3c  google::LogMessage::SendToLog()
    @     0x7f1e600d0eb3  google::LogMessage::Flush()
    @     0x7f1e600d634e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f1e629ee8f7  paddle::platform::CUDADeviceContext::Wait()
    @     0x7f1e601184fb  paddle::framework::Executor::RunPreparedContext()
    @     0x7f1e60a5c312  paddle::operators::RecurrentOp::RunImpl()
    @     0x7f1e6295783c  paddle::framework::OperatorBase::Run()
    @     0x7f1e601184c6  paddle::framework::Executor::RunPreparedContext()
    @     0x7f1e6011bd0f  paddle::framework::Executor::Run()
    @     0x7f1e5ff5609d  _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE103_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES10_
    @     0x7f1e5ff9edd1  pybind11::cpp_function::dispatcher()
    @     0x7f1eb6d6a302  _PyCFunction_FastCallDict
    @     0x7f1eb6def95b  call_function
    @     0x7f1eb6df2d40  _PyEval_EvalFrameDefault
    @     0x7f1eb6dee100  _PyEval_EvalCodeWithName
    @     0x7f1eb6defb2a  call_function
    @     0x7f1eb6df32cc  _PyEval_EvalFrameDefault
    @     0x7f1eb6dee100  _PyEval_EvalCodeWithName
    @     0x7f1eb6defb2a  call_function
    @     0x7f1eb6df32cc  _PyEval_EvalFrameDefault
    @     0x7f1eb6dee100  _PyEval_EvalCodeWithName
    @     0x7f1eb6defb2a  call_function
    @     0x7f1eb6df32cc  _PyEval_EvalFrameDefault
    @     0x7f1eb6ded514  _PyFunction_FastCall
    @     0x7f1eb6defc88  call_function
    @     0x7f1eb6df2d40  _PyEval_EvalFrameDefault
    @     0x7f1eb6ded514  _PyFunction_FastCall
    @     0x7f1eb6dee515  _PyFunction_FastCallDict
    @     0x7f1eb6d12ce6  _PyObject_FastCallDict
    @     0x7f1eb6d12f3c  _PyObject_Call_Prepend
    @     0x7f1eb6d12fd6  PyObject_Call
Aborted

Cannot run IMPALA in conda environment

I cannot run the IMPALA algorithm in my docker container's conda environment.
My docker container is built on nvidia/cuda:18.04 with Anaconda 5.3.0. I created an environment named dist-rl, installing python=3.7, paddlepaddle-gpu=1.5.2, and cudatoolkit=10.0 via conda, and parl, gym[atari], and opencv-python via pip.
When I run python train.py after starting the CPU cluster with xparl start --port 8010 --cpu_num 5 (I also changed the number of CPUs in impala_config.py), the following errors occurred:

(screenshots of the error output)

It seems that the main error is paddle.fluid.core_avx.EnforceNotMet: Invoke operator elementwise_mul error., but I don't know how to deal with it.
Thanks very much~

PARL cluster setup failed in windows; get_ip_address not supported

I tried to follow the example in https://parl.readthedocs.io/en/latest/parallel_training/setup.html to set up my Windows machine as a cluster. However, the xparl start command failed, and it seems the get_ip_address function implemented in parl/utils/machine_info.py doesn't support Windows.

my questions are threefold:

  1. Does PARL distributed training support Windows?
  2. If it does, is the incompatibility of get_ip_address under Windows the only reason the example program failed in my case?
  3. How should I fix get_ip_address (i.e. "set ip address manually", as the code comment says)? I am not familiar with socket communication, so hopefully there is an easy way (a possible fallback is sketched below).
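
For question 3, one possible cross-platform fallback, only a sketch using the standard socket module and not PARL's actual implementation, is:

import socket

def get_ip_address():
    # "Connecting" a UDP socket to a public address makes the OS pick a
    # routable local interface; no packet is actually sent.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    finally:
        s.close()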

PaddleCheckError: cudaGetDeviceCount failed in paddle::platform::GetCUDADeviceCountImpl

paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::GetCUDADeviceCount()


Error Message Summary:

PaddleCheckError: cudaGetDeviceCount failed in paddle::platform::GetCUDADeviceCountImpl, error code : 30, Please see detail in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038: unknown error at [/paddle/paddle/fluid/platform/gpu_info.cc:67]
This problem happens occasionally, and it goes away after I reboot the computer.
What should I do?

Errors occurred when running training scripts in NeurIPS2019-Learn-to-Move-Challenge

When running sh scripts/train_difficulty1.sh ./low_speed_model in /PARL/examples/NeurIPS2019-Learn-to-Move-Challenge, absurd errors occurred (as shown below). Can anyone help me? Thanks in advance!

(opensim-rl) luo@idserver:~/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge$ sh scripts/train_difficulty1.sh ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/bin/python
[12-16 23:08:12 MainThread @logger.py:224] Argv: train.py --actor_num 300 --difficulty 1 --penalty_coeff 3.0 --logdir ./output/difficulty1 --restore_model_path ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/opensim/simbody.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[12-16 23:08:12 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
[12-16 23:08:13 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
W1216 23:08:14.078102 6084 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 8.0
W1216 23:08:14.081565 6084 device_context.cc:267] device: 0, cuDNN Version: 7.5.
[12-16 23:08:16 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py:239: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
""")
WARNING:root:
You can try our memory optimize feature to save your memory usage:
# create a build_strategy variable to set memory optimize option
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True

     # pass the build_strategy to with_data_parallel API
     compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
         loss_name=loss.name, build_strategy=build_strategy)
  
 !!! Memory optimize is our experimental feature !!!
     some variables may be removed/reused internal to save memory usage, 
     in order to fetch the right value of the fetch_list, please set the 
     persistable property to true for each variable in fetch_list

     # Sample
     conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
     # if you need to fetch conv1, then:
     conv1.persistable = True

I1216 23:08:16.079864 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.081748 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py:239: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
""")
WARNING:root:
You can try our memory optimize feature to save your memory usage:
# create a build_strategy variable to set memory optimize option
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True

     # pass the build_strategy to with_data_parallel API
     compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
         loss_name=loss.name, build_strategy=build_strategy)
  
 !!! Memory optimize is our experimental feature !!!
     some variables may be removed/reused internal to save memory usage, 
     in order to fetch the right value of the fetch_list, please set the 
     persistable property to true for each variable in fetch_list

     # Sample
     conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
     # if you need to fetch conv1, then:
     conv1.persistable = True

I1216 23:08:17.209542 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.324332 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py:239: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
""")
WARNING:root:
You can try our memory optimize feature to save your memory usage:
# create a build_strategy variable to set memory optimize option
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True

     # pass the build_strategy to with_data_parallel API
     compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
         loss_name=loss.name, build_strategy=build_strategy)
  
 !!! Memory optimize is our experimental feature !!!
     some variables may be removed/reused internal to save memory usage, 
     in order to fetch the right value of the fetch_list, please set the 
     persistable property to true for each variable in fetch_list

     # Sample
     conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
     # if you need to fetch conv1, then:
     conv1.persistable = True

share_vars_from is set, scope is ignored.
I1216 23:08:17.525264 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.640771 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @train.py:303] restore model from ./low_speed_model
Traceback (most recent call last):
File "train.py", line 327, in
learner = Learner(args)
File "train.py", line 85, in init
self.restore(args.restore_model_path)
File "train.py", line 304, in restore
self.agent.restore(model_path)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 221, in restore
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 699, in load_params
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 611, in load_vars
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 648, in load_vars
executor.run(load_prog)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/executor.py", line 651, in run
use_program_cache=use_program_cache)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/executor.py", line 749, in run
exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator load_combine error.
Python Callstacks:
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1771, in append_op
attrs=kwargs.get("attrs", None))
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 647, in load_vars
attrs={'file_path': os.path.join(load_dirname, filename)})
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 611, in load_vars
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 699, in load_params
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 221, in restore
filename=filename)
File "train.py", line 304, in restore
self.agent.restore(model_path)
File "train.py", line 85, in init
self.restore(args.restore_model_path)
File "train.py", line 327, in
learner = Learner(args)
C++ Callstacks:
tensor version 3393762800 is not supported. at [/paddle/paddle/fluid/framework/lod_tensor.cc:256]
PaddlePaddle Call Stacks:
0 0x7efdba6c1f10p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1 0x7efdba6c2289p paddle::platform::EnforceNotMet::EnforceNotMet(std::exception_ptr::exception_ptr, char const*, int) + 137
2 0x7efdbc38c7d4p paddle::framework::DeserializeFromStream(std::istream&, paddle::framework::LoDTensor*, paddle::platform::DeviceContext const&) + 724
3 0x7efdbb35e480p paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>::LoadParamsFromBuffer(paddle::framework::ExecutionContext const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void
, boost::detail::variant::void
, boost::detail::variant::void
, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, std::istream*, bool, std::vector<std::string, std::allocatorstd::string > const&) const + 352
4 0x7efdbb35edfep paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 798
5 0x7efdbb35f273p std::Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, signed char>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, long> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::M_invoke(std::Any_data const&, paddle::framework::ExecutionContext const&) + 35
6 0x7efdbc7411e7p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void
, boost::detail::variant::void
, boost::detail::variant::void
, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 375
7 0x7efdbc7415c1p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
8 0x7efdbc73ebbcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
9 0x7efdba84cd0ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
10 0x7efdba84fdafp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocatorstd::string > const&, bool) + 143
11 0x7efdba6b359dp
12 0x7efdba6f4826p
13 0x7efe81ea2df2p _PyCFunction_FastCallDict + 258
14 0x7efe81f282bbp
15 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
16 0x7efe81f26a60p
17 0x7efe81f2848ap
18 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805
19 0x7efe81f26a60p
20 0x7efe81f2848ap
21 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
22 0x7efe81f26a60p
23 0x7efe81f2848ap
24 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805
25 0x7efe81f26a60p
26 0x7efe81f2848ap
27 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805
28 0x7efe81f26a60p
29 0x7efe81f2848ap
30 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805
31 0x7efe81f26a60p
32 0x7efe81f2848ap
33 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
34 0x7efe81f25e74p
35 0x7efe81f285e8p
36 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
37 0x7efe81f25e74p
38 0x7efe81f26e75p _PyFunction_FastCallDict + 645
39 0x7efe81e4bba6p _PyObject_FastCallDict + 358
40 0x7efe81e4bdfcp _PyObject_Call_Prepend + 204
41 0x7efe81e4be96p PyObject_Call + 86
42 0x7efe81ec4233p
43 0x7efe81eb9d4cp
44 0x7efe81e4badep _PyObject_FastCallDict + 158
45 0x7efe81f282bbp
46 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
47 0x7efe81f26a60p
48 0x7efe81f26ee3p PyEval_EvalCodeEx + 99
49 0x7efe81f26f2bp PyEval_EvalCode + 59
50 0x7efe81f596c0p PyRun_FileExFlags + 304
51 0x7efe81f5ac83p PyRun_SimpleFileExFlags + 371
52 0x7efe81f760b5p Py_Main + 3621
53 0x400c1dp main + 365
54 0x7efe80f01830p __libc_start_main + 240
55 0x4009e9p

Questions about clipping and Gaussian noise

I looked at the DDPG and PPO examples and there are two details I am unclear about.

1. Both DDPG and PPO apply clipping plus action_mapping during training, but only action_mapping during testing, as shown below:

        #train
        action = agent.policy_sample(obs)
        action = np.clip(action, -1.0, 1.0)
        action = action_mapping(action, env.action_space.low[0],
                                env.action_space.high[0])

       #test
        action = agent.policy_predict(obs)
        action = action_mapping(action, env.action_space.low[0],
                                env.action_space.high[0])

Since neither the agent nor the model layer clips the action, does this guarantee that the predicted action stays within (-1.0, 1.0) at test time? If not, would applying action_mapping directly introduce a bias?

2. DDPG and PPO add Gaussian noise at different points.
DDPG adds the noise outside the model:

        # Add exploration noise, and clip to [-1.0, 1.0]
        action = np.clip(np.random.normal(action, 1.0), -1.0, 1.0)
        action = action_mapping(action, env.action_space.low[0],
                                env.action_space.high[0])

PPO adds the noise inside the model:

    def sample(self, obs):
        means, logvars = self.policy(obs)
        sampled_act = means + (             
            layers.exp(logvars / 2.0) *  # stddev
            layers.gaussian_random(shape=(self.act_dim, ), dtype='float32'))
        return sampled_act

Are these two strategies just different choices learned from tuning experience, or can either be used? Also, would it also work to clip the action directly in the model layer with fluid.clip(action, low, high)?

paddle.fluid.core_avx.EnforceNotMet: Input(C@GRAD) should not be null

I want to implement COMA with PARL, and I use two fluid.Program() instances to train the critic and the actor respectively. However, I ran into two errors related to the optimizer.

error 1:

code:

    def learn(self, obs, actions, last_actions, q_vals, lr):
        """
        Args:
            obs: [4*env*batch,time,84]
            actions: [4*env*batch,time,1]
            last_actions: [4*env*batch,time,1]
            q_vals:[env*batch,4,time,22]
            lr: float scalar of learning rate.
        """
        mac_out = []
        hidden_state = None
        pre_cell = None
        obs_batch = self._build_actor_inputs(obs, last_actions)  # [4*env*batch,time,106]
        for t in range(obs_batch.shape[1]):
            obs_ = layers.slice(obs_batch, axes=[1], starts=[t], ends=[t + 1])  # [4*env*batch,106]
            if hidden_state is None:
                hidden_state, pre_cell = self.model.init_hidden_state(obs_)  # [4*env*batch,64]
            logits, hidden_state, pre_cell = self.model.policy(obs_, hidden_state, pre_cell)  # [4*env*batch, 22]
            mac_out.append(logits)  # [times,4*env*batch, 22]
        mac_out = layers.stack(mac_out, axis=1)  # [4*env*batch,time,22]

        # Calculated baseline

        q_vals = layers.reshape(q_vals, [-1, self.action_dim])  # [4*env*batch*(time),22]
        pi = layers.reshape(mac_out, [-1, self.action_dim])  # [4*env*batch*(time),22]
        baseline = layers.reduce_sum(pi * q_vals, dim=-1, keep_dim=True)  # [4*env*batch*(time),1]

        # Calculate policy grad
        actions_for_one_hot = layers.reshape(actions, [-1, 1])  # [4*env*batch*(time),1]
        actions_one_hot = layers.one_hot(actions_for_one_hot, self.action_dim)  # [4*env*batch*(time),22]
        q_taken = layers.reduce_sum(actions_one_hot * q_vals, dim=-1, keep_dim=True)  # [4*env*batch*(time),1]
        pi_taken = layers.reduce_sum(actions_one_hot * pi, dim=-1, keep_dim=True)  # [4*env*batch*time,1]
        log_pi_taken = layers.log(pi_taken)  # [4*env*batch*time,1]

        advantages = (q_taken - baseline)
        coma_loss = layers.reduce_sum(advantages * log_pi_taken)  # [1]

        # Optimise agents
        fluid.clip.set_gradient_clip(
            clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=self.grad_norm_clip))

        optimizer = fluid.optimizer.RMSPropOptimizer(lr, rho=self.optim_alpha, epsilon=self.optim_eps)
        optimizer.minimize(coma_loss) # error
        return coma_loss

error:

line 300, in learn
optimizer.minimize(total_loss)
File "", line 2, in minimize
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/dygraph/base.py", line 87, in impl
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 594, in minimize
no_grad_set=no_grad_set)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 493, in backward
no_grad_set, callbacks)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/backward.py", line 578, in append_backward
append_backward_vars(root_block, fwd_op_num, grad_to_var, grad_info_map)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/backward.py", line 392, in append_backward_vars
op_desc.infer_shape(block.desc)
paddle.fluid.core_avx.EnforceNotMet: Input(C@GRAD) should not be null at [/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/operators/lstm_unit_op.cc:88]

error 2:

code:

def _train_critic(self, obs, actions, last_actions, rewards, targets, lr_critic):
     """
     :param obs: [4*env*batch,time,84]
     :param actions: [4*env*batch,time,1]
     :param last_actions: [4*env*batch,time,1]
     :param rewards: [env*batch,time]
     :param targets: [env*batch,4,time]
     :return: q_vals, critic_train_stats [env*batch,4,time,22]
     """
     # init state
     batch = self._build_critic_inputs(obs, actions, last_actions)  # [env*batch,time,452]
     actions_one_hot = layers.one_hot(actions, self.action_dim)  # [4*env*batch,time,22]
     actions_one_hot = layers.reshape(actions_one_hot, [-1, 4, batch.shape[-2], self.action_dim])  # [env*batch,4,time,22]

     # Optimise agents
     fluid.clip.set_gradient_clip(
         clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=self.grad_norm_clip))

     optimizer = fluid.optimizer.RMSPropOptimizer(lr_critic, rho=self.optim_alpha, epsilon=self.optim_eps)

     critic_train_stats = {
         "critic_loss": [],
         "td_error_abs": [],
         "target_mean": [],
         "q_taken_mean": []
     }

     q_vals_list = []
     for t in range(rewards.shape[1]):  # time
         obs_ = batch[:, t]  # [env*batch,452]
         q_t = self.model.value(obs_)  # [env*batch,22]
         q_t = layers.reshape(q_t, [q_t.shape[0], 1, q_t.shape[-1]])
         q_t = layers.expand(q_t, [1, 4, 1])  # [env*batch,4,22]
         q_taken = layers.reduce_sum(q_t * actions_one_hot[:, :, t, :], dim=-1)  # [env*batch,4]
         q_t_taken = targets[:, :, t]  # [env*batch,4]
         td_error = q_taken - q_t_taken  # [env*batch,4]
         q_vals_list.append(q_t)  # [env*batch,4,22]

         loss = layers.reduce_sum(td_error ** 2)  # [1]
         optimizer.minimize(loss)
         critic_train_stats["critic_loss"].append(loss)
         critic_train_stats['td_error_abs'].append(td_error)
         critic_train_stats['q_taken_mean'].append(q_taken)
         critic_train_stats['target_mean'].append(q_t_taken)

     q_vals = layers.stack(q_vals_list, axis=2)  # [env*batch,4,time,22]
     for key in critic_train_stats.keys():
         critic_train_stats[key] = layers.reduce_sum(layers.stack(critic_train_stats[key]))
     return q_vals, critic_train_stats

error:

File ,line 150, in learn
q_vals, critic_train_stats = self._train_critic(obs, actions, last_actions, rewards, targets, lr_critic)
File line 116, in train_critic
optimizer.minimize(loss)
File "", line 2, in minimize
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/dygraph/base.py", line 87, in impl
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 594, in minimize
no_grad_set=no_grad_set)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 493, in backward
no_grad_set, callbacks)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/backward.py", line 571, in append_backward
input_grad_names_set=input_grad_names_set)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/backward.py", line 310, in append_backward_ops
op.desc, cpt.to_text(no_grad_dict[block.idx]), grad_sub_block_list)
paddle.fluid.core_avx.EnforceNotMet: grad_op_maker
should not be null
Operator GradOpMaker has not been registered. at [/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/framework/op_info.h:69]

How can I solve this?
Thanks!

Example code for the cluster doesn't work

I built a cluster successfully, but when I run the example and the recommended practice code, it returns an error:
[02-27 09:23:17 MainThread @logger.py:224] Argv: a.test.py
[02-27 09:23:37 Thread-2 @client.py:244] ERR [xparl] lost connection with a job, current actor num: 0

I tested it on Ubuntu 18, Ubuntu 16, and WSL.

batch_norm bug when copying models

batch_norm has two accessible params, "bias" and "scale", as well as two hidden params, "mean" and "var". The current code does not copy "mean" and "var".
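
As a hedged illustration of the point (using the paddle 2.x API rather than the fluid-era code this issue refers to): the moving statistics live in state_dict() in addition to the trainable parameters, so a full copy has to go through the state dict rather than the parameter list alone:

import paddle

bn = paddle.nn.BatchNorm1D(8)
print([name for name, _ in bn.named_parameters()])  # trainable scale/bias only
print(list(bn.state_dict().keys()))  # also contains the moving mean/variance buffers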

UnboundLocalError in NeurIPS2019-Learn-to-Move-Challenge

When running sh scripts/train_difficulty1.sh ./low_speed_model, a PARL remote error occurred. It looks like the local variable 'reward_footstep_0' is referenced before assignment.

[12-21 10:39:48 Thread-1 @train.py:287] saving models
[12-21 10:39:48 Thread-1 @train.py:290] saving rpm
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 190, in run_remote_sample
obs, reward, done, info = remote_actor.step(action)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/remote/remote_decorator.py", line 189, in wrapper
raise RemoteError(attr, error_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function step]:
local variable 'reward_footstep_0' referenced before assignment
traceback:
Traceback (most recent call last):
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/remote/job.py", line 379, in single_task
ret = getattr(obj, function_name)(*args, **kwargs)
File "/home/luo/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge/actor.py", line 56, in step
return self.env.step(action, project=False)
File "/tmp/tmpm9llqji1/env_wrapper.py", line 219, in step
obs, r, done, info = self.env.step(action, **kwargs)
File "/tmp/tmpm9llqji1/env_wrapper.py", line 51, in step
return self.env.step(action, **kwargs)
File "/tmp/tmpm9llqji1/env_wrapper.py", line 68, in step
obs, reward, done, info = self.env.step(action, **kwargs)
File "/tmp/tmpm9llqji1/env_wrapper.py", line 119, in step
obs, r, done, info = self.env.step(action, **kwargs)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/osim/env/osim.py", line 562, in step
_, reward, done, info = super(L2M2019Env, self).step(action_mapped, project=project, obs_as_dict=obs_as_dict)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/osim/env/osim.py", line 356, in step
return [ obs, self.get_reward(), self.is_done() or (self.osim_model.istep >= self.spec.timestep_limit), {} ]
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/osim/env/osim.py", line 764, in get_reward
return self.get_reward_1()
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/osim/env/osim.py", line 821, in get_reward_1
reward += reward_footstep_0 + 10
UnboundLocalError: local variable 'reward_footstep_0' referenced before assignment

I worked on Ubuntu 16.04 with Titan XP GPU.

I don't understand the second part; could you explain?

I see that the target implementations are split into a server and a client. Does that mean one runs on the server and one runs on the client? The client shows the target's running speed, but the server does not print it. How should I understand this part?

restore model error

# save the parameters to ./model.ckpt
self.agent.save('./model.ckpt')

I have saved the model successfully, and now I want to restore it and render.

 algorithm = IMPALA(
            model,
            sample_batch_steps=config['sample_batch_steps'],
            gamma=config['gamma'],
            vf_loss_coeff=config['vf_loss_coeff'],
            clip_rho_threshold=config['clip_rho_threshold'],
            clip_pg_rho_threshold=config['clip_pg_rho_threshold'])
    agent = MPEAgent(algorithm, obs_shape, act_dim)
    agent.restore('./model.ckpt')

Then the TypeError happened:

    agent.restore('./model.ckpt')
  File "/home/tianqi/anaconda3/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 221, in restore
    filename=filename)
  File "/home/tianqi/anaconda3/lib/python3.6/site-packages/paddle/fluid/io.py", line 798, in load_params
    filename=filename)
  File "/home/tianqi/anaconda3/lib/python3.6/site-packages/paddle/fluid/io.py", line 675, in load_vars
    raise TypeError("program's type should be Program")

TypeError: program's type should be Program

Reducing the time consumed by tests

Currently, the unit tests take around 25 minutes on TeamCity because they finish one by one. We can speed this up by running multiple tests concurrently, i.e. using ctest -j10 in .teamcity/build.sh, but we need to change our code to make it runnable in parallel.
