
Comments (20)

zenghsh3 commented on July 20, 2024

Below is a simple example of using LoDTensor with dynamic_lstm that you can refer to. (Note that the to_lodtensor helper provided here only supports mini-batches where every sequence has the same seq_len.)

#!/usr/bin/env python
# coding=utf8

import parl
from paddle import fluid
import numpy as np

OBS_DIM = 5
SEQ_LEN = 10
BATCH_SIZE = 16

HID_DIM = 16

def to_lodtensor(data, place):
    """
    Only supports mini-batches where every sequence has the same seq_len.
    """
    lod = [i * data.shape[1] for i in range(data.shape[0] + 1)]

    if len(data.shape) == 2:
        # (batch_size, seq_len)
        data = data.reshape([-1, 1])
    elif len(data.shape) == 3:
        # (batch_size, seq_len, input_dim)
        data = data.reshape([-1, data.shape[-1]])
    else:
        assert False, 'only 2-D or 3-D input is supported'

    res = fluid.LoDTensor()
    res.set(data, place)
    res.set_lod([lod])
    return res
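
# A variable-length variant (an untested sketch, not from the original thread):
# concatenate the sequences along axis 0 and record cumulative offsets, e.g.
# lengths [3, 5] give lod = [0, 3, 8].
def seqs_to_lodtensor(seqs, place):
    lod = [0]
    for seq in seqs:
        lod.append(lod[-1] + len(seq))
    flat = np.concatenate(seqs, axis=0).astype('float32')
    res = fluid.LoDTensor()
    res.set(flat, place)
    res.set_lod([lod])
    return res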

class TestModel(parl.Model):
    def __init__(self):
        """
        NOTE:
            1. An fc projection is needed before dynamic_lstm.
            2. The size of the fc projection and of dynamic_lstm must be the hidden size multiplied by 4.
        """
        self.lstm_before_fc = parl.layers.fc(size=HID_DIM * 4)
        self.dynamic_lstm = parl.layers.dynamic_lstm(size=HID_DIM * 4)

    def policy(self, obs):
        forward_proj = self.lstm_before_fc(obs)
        output, _ = self.dynamic_lstm(forward_proj)
        return output

test_model = TestModel()

test_program = fluid.Program()
with fluid.program_guard(test_program):
    x = fluid.layers.data(name='x', shape=[OBS_DIM], dtype='float32', lod_level=1)
    output = test_model.policy(x)


place = fluid.CPUPlace()
exe = fluid.Executor(place=place)
exe.run(fluid.default_startup_program())

x_np = np.random.random([BATCH_SIZE, SEQ_LEN, OBS_DIM]).astype('float32')

print('input shape:', x_np.shape)
feed = {'x': to_lodtensor(x_np, place)}

output_np = exe.run(program=test_program, feed=feed, fetch_list=[output], return_numpy=False)[0]

print('output shape:', np.array(output_np).shape)
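
# If this behaves as described later in the thread, the input shape prints as
# (16, 10, 5), while the output is a LoDTensor whose first axis merges batch
# and time, so the output shape should print as (160, 16).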


RonaldJEN commented on July 20, 2024
HID_DIM = 512     
class AtariModel(Model):
    def __init__(self, act_dim):
        self.policy_fc = layers.fc(size=act_dim, act='softmax')
        self.value_fc  = layers.fc(size=1)
    def policy(self, obs):
        fc1  = fluid.layers.fc(input=obs, size=HID_DIM)
        lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=HID_DIM)
        policy_logits  = self.policy_fc(lstm1)
        return policy_logits
    def value(self, obs):
        flatten   = layers.flatten(obs, axis=1)
        fc_output = layers.fc(size=512, act='relu')(flatten)
        values    = self.value_fc(fc_output)
        values    = layers.squeeze(values, axes=[1])
        return values
    def policy_and_value(self, obs):
        policy_logits = self.policy(obs)
        values        = self.value(obs)
        return policy_logits, values

Hello.

We discussed your code; the change you made swaps the current CNN network out for an RNN.

But it has to be pointed out that RNNs and CNNs are not simply interchangeable. The CNN here extracts features from the images, which the later decision layers consume; naively replacing the CNN with an RNN cannot represent images. There is prior work that introduces RNNs into RL; see the DRQN paper for details.

Since the goal of the current code change is not sound, we do not recommend pursuing this issue further.

Hi, I understand that CNNs and RNNs are not simply interchangeable. I made this replacement because I swapped the gym env for one I defined myself. In my env, obs is a 2-D matrix of (time_step, features), i.e. a time series, so I need to build an LSTM model.

As for the soundness of the model, rest assured: I have already built an A3C model in TensorFlow that uses an LSTM to handle the time-series obs, and it indeed performs well.

Expressed in Keras:

lstm_input  = Input(shape=(40, 11), name='lstm_in')
lstm_output = LSTM(128, activation='tanh', dropout_W=0.2, dropout_U=0.1)(lstm_input)

Could you give me an example of how to construct a lod_tensor at agent.learn time and how to feed it in?


RonaldJEN commented on July 20, 2024

Thanks to all the developers for the replies; I hope an example of a Model using an LSTM can be provided in the future.

PaddlePaddle's LoDTensor concept is just too hard to program against. If the value and policy networks differ, the LoDTensor inputs and outputs additionally have to be converted separately for each. Something that takes an afternoon with the OpenAI baselines took me four days here and it still errors out :(

I'm giving up on this. Wishing you all the best, and thank you.


TomorrowIsAnOtherDay commented on July 20, 2024

Understood. The underlying Paddle interface design is indeed hard to pick up. We will suggest to the PaddlePaddle developers that the upcoming dynamic-graph version simplify lod tensor usage.

Best wishes :)
Thanks again for giving it a try.


TomorrowIsAnOtherDay commented on July 20, 2024

Hello :)
Could you paste the specific error message?


RonaldJEN commented on July 20, 2024

AtariModel

HID_DIM = 512     
class AtariModel(Model):
    def __init__(self, act_dim):
        self.policy_fc = layers.fc(size=act_dim, act='softmax')
        self.value_fc  = layers.fc(size=1)
    def policy(self, obs):
        fc1  = fluid.layers.fc(input=obs, size=HID_DIM)
        lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=HID_DIM)
        policy_logits  = self.policy_fc(lstm1)
        return policy_logits
    def value(self, obs):
        flatten   = layers.flatten(obs, axis=1)
        fc_output = layers.fc(size=512, act='relu')(flatten)
        values    = self.value_fc(fc_output)
        values    = layers.squeeze(values, axes=[1])
        return values
    def policy_and_value(self, obs):
        policy_logits = self.policy(obs)
        values        = self.value(obs)
        return policy_logits, values

The modified part of the Agent (obs now declared with lod_level):

        self.sample_program  = fluid.Program()
        self.predict_program = fluid.Program()
        self.value_program   = fluid.Program()
        self.learn_program   = fluid.Program()

        with fluid.program_guard(self.sample_program):
            obs = layers.data(
                name='obs', shape=self.config['obs_shape'], dtype='float32',lod_level=1)
            sample_actions, values = self.alg.sample(obs)
            self.sample_outputs = [sample_actions, values]

        with fluid.program_guard(self.predict_program):
            obs = layers.data(
                name='obs', shape=self.config['obs_shape'], dtype='float32',lod_level=1)
            self.predict_actions = self.alg.predict(obs)

        with fluid.program_guard(self.value_program):
            obs = layers.data(
                name='obs', shape=self.config['obs_shape'], dtype='float32',lod_level=1)
            self.values = self.alg.value(obs)

        with fluid.program_guard(self.learn_program):
            obs = layers.data(
                name='obs', shape=self.config['obs_shape'], dtype='float32',lod_level=1)
            actions = layers.data(name='actions', shape=[], dtype='int64')

            advantages = layers.data(
                name='advantages', shape=[], dtype='float32')
            target_values = layers.data(
                name='target_values', shape=[], dtype='float32')
            lr = layers.data(
                name='lr', shape=[1], dtype='float32', append_batch_size=False)
            entropy_coeff = layers.data(
                name='entropy_coeff', shape=[], dtype='float32')

            total_loss, pi_loss, vf_loss, entropy = self.alg.learn(
                obs, actions, advantages, target_values, lr, entropy_coeff)
            self.learn_outputs = [
                total_loss.name, pi_loss.name, vf_loss.name, entropy.name
            ]

Error message:

ParallelExecutor is deprecated. Please use CompiledProgram and Executor. CompiledProgram is a central place for optimization and Executor is the unified executor. Example can be found in compiler.py.
W0711 13:32:59.592660 292443584 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
I0711 13:32:59.605545 292443584 build_strategy.cc:282] SeqOnlyAllReduceOps:0, num_trainers:1
[07-11 13:32:59 MainThread @learner.py:85] Waiting for 2 remote actors to connect.
[07-11 13:32:59 Thread-1 @remote_object.py:51] [connect_remote_client] client_address:192.168.75.236:54698
[07-11 13:32:59 Thread-1 @remote_manager.py:88] [RemoteManager] Added a new remote object.
[07-11 13:32:59 MainThread @learner.py:94] Remote actor count: 1
[07-11 13:32:59 Thread-1 @remote_object.py:51] [connect_remote_client] client_address:192.168.75.236:54694
[07-11 13:32:59 Thread-1 @remote_manager.py:88] [RemoteManager] Added a new remote object.
[07-11 13:32:59 MainThread @learner.py:94] Remote actor count: 2
[07-11 13:32:59 MainThread @learner.py:100] All remote actors are ready, begin to learn.
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/anaconda3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/anaconda3/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "learner.py", line 109, in run_remote_sample
    batch  = remote_actor.sample()
  File "/anaconda3/lib/python3.5/site-packages/parl/remote/remote_object.py", line 82, in wrapper
    raise RemoteError(attr, error_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function `sample`]:
Invoke operator lstm error.
Python Callstacks: 
  File "/anaconda3/lib/python3.5/site-packages/paddle/fluid/framework.py", line 1654, in append_op
    attrs=kwargs.get("attrs", None))
  File "/anaconda3/lib/python3.5/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/anaconda3/lib/python3.5/site-packages/paddle/fluid/layers/nn.py", line 525, in dynamic_lstm
    'candidate_activation': candidate_activation
  File "/Users/renyanxue/Desktop/git_project/paper/atari_model.py", line 116, in policy
    lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=HID_DIM)
  File "/Users/renyanxue/Desktop/git_project/paper/atari_model.py", line 143, in policy_and_value
    policy_logits = self.policy(obs)
  File "/anaconda3/lib/python3.5/site-packages/parl/algorithms/a3c.py", line 72, in sample
    logits, values = self.model.policy_and_value(obs)
  File "/Users/renyanxue/Desktop/git_project/paper/atari_agent.py", line 42, in build_program
    sample_actions, values = self.alg.sample(obs)
  File "/anaconda3/lib/python3.5/site-packages/parl/framework/agent_base.py", line 46, in __init__
    self.build_program()
  File "/Users/renyanxue/Desktop/git_project/paper/atari_agent.py", line 11, in __init__
    super(AtariAgent, self).__init__(algorithm)
  File "actor.py", line 45, in __init__
    self.agent= AtariAgent(algorithm, config)
  File "/anaconda3/lib/python3.5/site-packages/parl/remote/remote_decorator.py", line 54, in __init__
    self.unwrapped = cls(*args, **kwargs)
  File "actor.py", line 117, in <module>
    actor = Actor(config)
C++ Callstacks: 
Enforce failed. Expected lods.size() == 1UL, but received lods.size():0 != 1UL:1.
Only support one level sequence now. at [/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/operators/math/sequence2batch.h:79]
PaddlePaddle Call Stacks: (C++ stack frames trimmed)


TomorrowIsAnOtherDay commented on July 20, 2024

Got it. We'll first try to reproduce and pin down your problem locally. Thanks for the feedback.


TomorrowIsAnOtherDay commented on July 20, 2024

@RonaldJEN Please share the paddle version you are using.


RonaldJEN commented on July 20, 2024

parl==1.1
paddle==1.0.2
paddlehub==1.0.1
paddlepaddle==1.4.0

pip couldn't update PARL to the latest version, so I just took the code from GitHub and overwrote the installed copy directly.


TomorrowIsAnOtherDay commented on July 20, 2024

Got it. To install the latest PARL you can run:

git clone https://github.com/PaddlePaddle/PARL/
cd PARL
pip install .

and that's it; no need for such a crude "overwrite it" approach :)


RonaldJEN commented on July 20, 2024

Got it. To install the latest PARL you can run:

git clone https://github.com/PaddlePaddle/PARL/
cd PARL
pip install .

and that's it; no need for such a crude "overwrite it" approach :)

That is how I updated it, following the method you describe; my wording was probably just unclear.


TomorrowIsAnOtherDay commented on July 20, 2024
HID_DIM = 512     
class AtariModel(Model):
    def __init__(self, act_dim):
        self.policy_fc = layers.fc(size=act_dim, act='softmax')
        self.value_fc  = layers.fc(size=1)
    def policy(self, obs):
        fc1  = fluid.layers.fc(input=obs, size=HID_DIM)
        lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=HID_DIM)
        policy_logits  = self.policy_fc(lstm1)
        return policy_logits
    def value(self, obs):
        flatten   = layers.flatten(obs, axis=1)
        fc_output = layers.fc(size=512, act='relu')(flatten)
        values    = self.value_fc(fc_output)
        values    = layers.squeeze(values, axes=[1])
        return values
    def policy_and_value(self, obs):
        policy_logits = self.policy(obs)
        values        = self.value(obs)
        return policy_logits, values

Hello.
We discussed your code; the change you made swaps the current CNN network out for an RNN.
But it has to be pointed out that RNNs and CNNs are not simply interchangeable. The CNN here extracts features from the images, which the later decision layers consume; naively replacing the CNN with an RNN cannot represent images. There is prior work that introduces RNNs into RL; see the DRQN paper for details.

Since the goal of the current code change is not sound, we do not recommend pursuing this issue further.


TomorrowIsAnOtherDay commented on July 20, 2024

As for the runtime error in the code itself, our take is:
since obs is now declared as a lod_tensor input, you also have to construct a lod_tensor at agent.learn time, instead of feeding the raw array in directly as the code does now.
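
As a rough illustration (a sketch only, reusing the to_lodtensor helper from the first comment; the surrounding Agent attributes are placeholders, not PARL's API):

def learn(self, obs_np, actions_np):
    obs_np = obs_np.astype('float32')  # (batch_size, seq_len, obs_dim)
    feed = {
        'obs': to_lodtensor(obs_np, self.place),  # wrap obs as a LoDTensor
        'actions': actions_np.astype('int64'),    # non-sequence inputs stay plain ndarrays
    }
    return self.fluid_executor.run(
        self.learn_program, feed=feed, fetch_list=self.learn_outputs)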

That said, we still suggest you first confirm that this change makes sense, to avoid wasted effort. Happy to keep discussing :)


zenghsh3 commented on July 20, 2024

For LoDTensor examples you can refer to the paddle models repo: https://github.com/PaddlePaddle/models/search?p=1&q=LodTensor&unscoped_q=LodTensor

Also, because A2C uses the get_params/set_params interface in parl.Agent to synchronize parameters between the learner and the actors, and that interface only syncs parameterized layers created with parl.layers in the Model's __init__, we recommend declaring parl.layers.dynamic_lstm, parl.layers.fc, etc. inside the Model's __init__.
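
For instance, a minimal sketch of that pattern (HID_DIM and the method body are illustrative assumptions):

import parl

HID_DIM = 16

class LstmModel(parl.Model):
    def __init__(self, act_dim):
        # Declared with parl.layers in __init__, so get_params/set_params can sync them.
        self.lstm_before_fc = parl.layers.fc(size=HID_DIM * 4)
        self.dynamic_lstm = parl.layers.dynamic_lstm(size=HID_DIM * 4)
        self.policy_fc = parl.layers.fc(size=act_dim, act='softmax')

    def policy(self, obs):
        hidden, _ = self.dynamic_lstm(self.lstm_before_fc(obs))
        return self.policy_fc(hidden)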


RonaldJEN commented on July 20, 2024

Agent
obs = (batch_size, seq_len, input_dim) = (80, 20, 11)

I made the changes below based on the example above, but the returned actions and values now come back in a different format than before.

# ε-greedy
    def build_program(self):
        self.sample_program  = fluid.Program()
        self.predict_program = fluid.Program()
        self.value_program   = fluid.Program()
        self.learn_program   = fluid.Program()

        with fluid.program_guard(self.sample_program):
            obs = layers.data(
                name='obs', shape=[self.config['obs_shape'][1]], dtype='float32',lod_level=1)
            sample_actions, values = self.alg.sample(obs)
            self.sample_outputs = [sample_actions, values]
        with fluid.program_guard(self.predict_program):
            obs = layers.data(
                name='obs', shape=[self.config['obs_shape'][1]], dtype='float32',lod_level=1)
            self.predict_actions = self.alg.predict(obs)

        with fluid.program_guard(self.value_program):
            obs = layers.data(
                name='obs', shape=[self.config['obs_shape'][1]], dtype='float32',lod_level=1)
            self.values = self.alg.value(obs)

        with fluid.program_guard(self.learn_program):
            obs = layers.data(
                name='obs', shape=[self.config['obs_shape'][1]], dtype='float32',lod_level=1)

            actions = layers.data(name='actions', shape=[], dtype='int64')

            advantages = layers.data(
                name='advantages', shape=[], dtype='float32')
            target_values = layers.data(
                name='target_values', shape=[], dtype='float32')
            lr = layers.data(
                name='lr', shape=[1], dtype='float32', append_batch_size=False)
            entropy_coeff = layers.data(
                name='entropy_coeff', shape=[], dtype='float32')

            total_loss, pi_loss, vf_loss, entropy = self.alg.learn(
                obs, actions, advantages, target_values, lr, entropy_coeff)
            self.learn_outputs = [
                total_loss.name, pi_loss.name, vf_loss.name, entropy.name
            ]
    def sample(self, obs_np):
        obs_np = obs_np.astype('float32')

        sample_actions, values = self.fluid_executor.run(
                    self.sample_program,
                    feed={'obs': to_lodtensor(obs_np,self.place)},
                    fetch_list=self.sample_outputs,
                    return_numpy=False)


        return sample_actions, values

    # predict
    def predict(self, obs_np):
        obs_np = obs_np.astype('float32')
        predict_actions = self.fluid_executor.run(
            self.predict_program,
            feed={'obs': to_lodtensor(obs_np,self.place)},
            fetch_list=[self.predict_actions],
            return_numpy=False)[0]
        return predict_actions


    def value(self, obs_np):
        obs_np = obs_np.astype('float32')

        values = self.fluid_executor.run(
            self.value_program, feed={'obs': to_lodtensor(obs_np,self.place)},
            fetch_list=[self.values],
            return_numpy=False)[0]
        return values

    # learn
    def learn(self, obs_np, actions_np, advantages_np, target_values_np, terminal=None):
        obs_np           = obs_np.astype('float32')
        actions_np       = actions_np.astype('int64')
        advantages_np    = advantages_np.astype('float32')
        target_values_np = target_values_np.astype('float32')
        lr = self.lr_scheduler.step(step_num=obs_np.shape[0])
        entropy_coeff = self.entropy_coeff_scheduler.step()

        total_loss, pi_loss, vf_loss, entropy = self.learn_exe.run(
            feed={
                'obs':           to_lodtensor(obs_np,self.place),
                'actions':       to_lodtensor(actions_np,self.place),
                'advantages':    to_lodtensor(advantages_np,self.place),
                'target_values': to_lodtensor(target_values_np,self.place),
                'lr':            to_lodtensor(np.array([lr], dtype='float32'),self.place),
                'entropy_coeff': to_lodtensor(np.array([entropy_coeff], dtype='float32'),self.place)
            },
            fetch_list=self.learn_outputs,
            return_numpy=False)

        return total_loss, pi_loss, vf_loss, entropy, lr, entropy_coeff

Atarimodel

HID_DIM = 16
class Atarimodel(Model):
    def __init__(self, act_dim):
        self.lstm_before_fc = layers.fc(size=HID_DIM * 4)
        self.dynamic_lstm   = layers.dynamic_lstm(size=HID_DIM * 4)
        self.policy_fc      = layers.fc(size=act_dim, act='softmax')
        self.value_fc       = layers.fc(size=1)
    def policy(self, obs):
        forward_proj   = self.lstm_before_fc(obs)
        output, _      = self.dynamic_lstm(forward_proj)
        policy_logits  = self.policy_fc(output)
        return policy_logits
    def value(self, obs):
        forward_proj = self.lstm_before_fc(obs)
        output, _ = self.dynamic_lstm(forward_proj)
        values         = self.value_fc(output)
        values         = layers.squeeze(values, axes=[1])
        return values
    def policy_and_value(self, obs):
        policy_logits = self.policy(obs)
        values        = self.value(obs)
        return policy_logits, values

Observed issue: the returned action is now a <class 'paddle.fluid.core.Tensor'>, and so is value, which differs from the original outputs.


zenghsh3 commented on July 20, 2024

np.array(output_tensor) converts it into a numpy array. Also note that a LoDTensor output merges batch_size*seq_len into a single dimension, so you have to convert it back yourself, either at the paddle level or at the numpy level.
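
For example (a sketch assuming every sequence in the batch has the same SEQ_LEN, as with the to_lodtensor helper above, and that output_tensor is the fetched LoDTensor):

SEQ_LEN = 10  # assumed fixed sequence length

out_np = np.array(output_tensor)  # core.Tensor -> ndarray of shape (batch*seq_len, hid_dim)
out_np = out_np.reshape(-1, SEQ_LEN, out_np.shape[-1])  # recover (batch, seq_len, hid_dim)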


RonaldJEN commented on July 20, 2024

The learn step now produces the error message below (I changed it all night without getting it to work; any pointers appreciated):

def learn(self, obs_np, actions_np, advantages_np, target_values_np, terminal=None):
        obs_np           = obs_np.astype('float32')
        Batch_size       = obs_np.shape[0]
        actions_np       = actions_np.astype('int64').reshape(Batch_size, 1)
        advantages_np    = advantages_np.astype('float32').reshape(Batch_size, 1)
        target_values_np = target_values_np.astype('float32').reshape(Batch_size, 1)

        lr = self.lr_scheduler.step(step_num=Batch_size)
        entropy_coeff = self.entropy_coeff_scheduler.step()

        total_loss, pi_loss, vf_loss, entropy = self.learn_exe.run(
            feed={
                'obs':           to_lodtensor(obs_np,self.place),
                'actions':       to_lodtensor(actions_np,self.place),
                'advantages':    to_lodtensor(advantages_np,self.place),
                'target_values': to_lodtensor(target_values_np,self.place),
                'lr':            np.array([lr], dtype='float32'),
                'entropy_coeff': np.array([entropy_coeff], dtype='float32')
            },
            fetch_list=self.learn_outputs,
            return_numpy=False)
        return total_loss, pi_loss, vf_loss, entropy, lr, entropy_coeff

Error message:

C++ Callstacks: 
Enforce failed. Expected x_dim.size() >= y_dim.size(), but received x_dim.size():2 < y_dim.size():3.
Rank of first input must >= rank of second input. at [/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/operators/elementwise/elementwise_op.h:56]
PaddlePaddle Call Stacks: (C++ stack frames trimmed)


zenghsh3 commented on July 20, 2024

From the error message, a call to elementwise_op requires x_dim.size() >= y_dim.size(), i.e. the rank of the first input must be at least the rank of the second input. You can print the tensor shapes to locate exactly which op is failing.
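
One way to do that (a sketch; fluid.layers.Print exists in Paddle 1.x, though argument defaults may vary by version) is to wrap the suspect variables so their shapes are logged when the program runs:

lstm1 = fluid.layers.Print(lstm1, message='lstm1', print_tensor_shape=True)
values = fluid.layers.Print(values, message='values', print_tensor_shape=True)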


phlrain commented on July 20, 2024

Paddle 1.5 added a new LSTM API; for now it still lives in the contrib directory.
For usage, see
https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/models/language_model/lm_model.py#L338
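
For reference, that contrib API operates on padded batch-major tensors rather than LoDTensors. A rough sketch (argument order is inferred from the linked lm_model.py and may differ slightly across 1.5.x releases):

obs = fluid.layers.data(name='obs', shape=[SEQ_LEN, OBS_DIM], dtype='float32')
init_h = fluid.layers.fill_constant([1, BATCH_SIZE, HID_DIM], 'float32', 0.0)
init_c = fluid.layers.fill_constant([1, BATCH_SIZE, HID_DIM], 'float32', 0.0)
rnn_out, last_h, last_c = fluid.contrib.layers.basic_lstm(
    obs, init_h, init_c, HID_DIM, num_layers=1, batch_first=True)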


RonaldJEN commented on July 20, 2024

Paddle 1.5 added a new LSTM API; for now it still lives in the contrib directory.
For usage, see
https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/models/language_model/lm_model.py#L338

Thank you for the pointer. I'll dig into it when I have some time later :)

