Comments (20)
Below is a simple example of using LoDTensor with dynamic_lstm for reference. (The provided to_lodtensor helper only supports mini-batches in which every sample has the same seq_len.)
#!/usr/bin/env python
# coding=utf8
import parl
from paddle import fluid
import numpy as np

OBS_DIM = 5
SEQ_LEN = 10
BATCH_SIZE = 16
HID_DIM = 16


def to_lodtensor(data, place):
    """Only for mini-batches where every sample has the same seq_len."""
    lod = [i * data.shape[1] for i in range(data.shape[0] + 1)]
    if len(data.shape) == 2:
        # (batch_size, seq_len)
        data = data.reshape([-1, 1])
    elif len(data.shape) == 3:
        # (batch_size, seq_len, input_dim)
        data = data.reshape([-1, data.shape[-1]])
    else:
        assert False
    res = fluid.LoDTensor()
    res.set(data, place)
    res.set_lod([lod])
    return res
class TestModel(parl.Model):
    def __init__(self):
        """
        NOTE:
        1. An fc projection is needed before dynamic_lstm.
        2. The input hidden size of dynamic_lstm must be multiplied by 4.
        """
        self.lstm_before_fc = parl.layers.fc(size=HID_DIM * 4)
        self.dynamic_lstm = parl.layers.dynamic_lstm(size=HID_DIM * 4)

    def policy(self, obs):
        forward_proj = self.lstm_before_fc(obs)
        output, _ = self.dynamic_lstm(forward_proj)
        return output
test_model = TestModel()

test_program = fluid.Program()
with fluid.program_guard(test_program):
    x = fluid.layers.data(name='x', shape=[OBS_DIM], dtype='float32', lod_level=1)
    output = test_model.policy(x)

place = fluid.CPUPlace()
exe = fluid.Executor(place=place)
exe.run(fluid.default_startup_program())

x_np = np.random.random([BATCH_SIZE, SEQ_LEN, OBS_DIM]).astype('float32')
print('input shape:', x_np.shape)
feed = {'x': to_lodtensor(x_np, place)}
output_np = exe.run(program=test_program, feed=feed, fetch_list=[output], return_numpy=False)[0]
print('output shape:', np.array(output_np).shape)
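To make the LoD bookkeeping concrete, here is a plain-NumPy sketch (no Paddle required, sizes illustrative) of the offset scheme that to_lodtensor builds: each entry of lod marks the row in the flattened tensor where a sequence starts.

```python
import numpy as np

batch_size, seq_len, obs_dim = 4, 3, 5

# (batch, seq, dim) flattened to (batch * seq, dim), as to_lodtensor does.
data = np.random.random([batch_size, seq_len, obs_dim]).astype('float32')
flat = data.reshape([-1, obs_dim])

# Offset-style LoD: lod[i] is the row in `flat` where sequence i starts.
lod = [i * seq_len for i in range(batch_size + 1)]
print(lod)  # [0, 3, 6, 9, 12]

# Sequence i is recovered by slicing between consecutive offsets.
for i in range(batch_size):
    assert np.array_equal(flat[lod[i]:lod[i + 1]], data[i])
```

With variable-length sequences the same offset scheme still applies; only the spacing between consecutive offsets changes.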
HID_DIM = 512

class AtariModel(Model):
    def __init__(self, act_dim):
        self.policy_fc = layers.fc(size=act_dim, act='softmax')
        self.value_fc = layers.fc(size=1)

    def policy(self, obs):
        fc1 = fluid.layers.fc(input=obs, size=HID_DIM)
        lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=HID_DIM)
        policy_logits = self.policy_fc(lstm1)
        return policy_logits

    def value(self, obs):
        flatten = layers.flatten(obs, axis=1)
        fc_output = layers.fc(size=512, act='relu')(flatten)
        values = self.value_fc(fc_output)
        values = layers.squeeze(values, axes=[1])
        return values

    def policy_and_value(self, obs):
        policy_logits = self.policy(obs)
        values = self.value(obs)
        return policy_logits, values
Hello.
We discussed your code. The change you made aims to replace the current CNN network with an RNN.
However, it should be pointed out that an RNN and a CNN are not simply interchangeable: the CNN here extracts features from images for the downstream decision making, and naively swapping the CNN for an RNN cannot represent the images. There is related work introducing RNNs into RL; see the DRQN paper.
Since the goal of the current code change is not sound, we do not recommend pursuing this issue further.
Hi, I understand that CNN and RNN are not simply interchangeable. I made this replacement because I swapped out the gym env for one I defined myself. In my env, obs is a 2-D matrix (time_step, features), i.e. a time series, so I need to build an LSTM model.
As for the soundness of the model, don't worry. I have already built an A3C model in TensorFlow that uses an LSTM to handle the time-series obs, and it indeed works well.
In Keras this would be expressed as:
lstm_input = Input(shape=(40,11), name='lstm_in')
lstm_output = LSTM(128, activation=tanh, dropout_W=0.2, dropout_U=0.1)(lstm_input)
Could you give me an example of how to construct the lod_tensor and feed it in during agent.learn?
Thanks to the developers for the replies. I hope an example of using an LSTM in a model will be provided in the future.
PaddlePaddle's LoDTensor concept is really hard to work with. If the value and policy networks are different, you additionally have to convert the LoDTensor inputs and outputs separately. Something that takes an afternoon to write with OpenAI Baselines has taken me four days and still errors out :(
I'm giving up. Wishing you all the best, thanks.
Understood. The underlying Paddle interface really is hard to pick up. We will suggest to the PaddlePaddle developers that they simplify LoD tensor usage in the upcoming dynamic-graph version.
Best wishes :)
Thanks again for giving it a try.
Hi :)
Could you paste the specific error message?
AtariModel
HID_DIM = 512

class AtariModel(Model):
    def __init__(self, act_dim):
        self.policy_fc = layers.fc(size=act_dim, act='softmax')
        self.value_fc = layers.fc(size=1)

    def policy(self, obs):
        fc1 = fluid.layers.fc(input=obs, size=HID_DIM)
        lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=HID_DIM)
        policy_logits = self.policy_fc(lstm1)
        return policy_logits

    def value(self, obs):
        flatten = layers.flatten(obs, axis=1)
        fc_output = layers.fc(size=512, act='relu')(flatten)
        values = self.value_fc(fc_output)
        values = layers.squeeze(values, axes=[1])
        return values

    def policy_and_value(self, obs):
        policy_logits = self.policy(obs)
        values = self.value(obs)
        return policy_logits, values
Changed parts of the Agent (obs now has a lod_level):
self.sample_program = fluid.Program()
self.predict_program = fluid.Program()
self.value_program = fluid.Program()
self.learn_program = fluid.Program()

with fluid.program_guard(self.sample_program):
    obs = layers.data(
        name='obs', shape=self.config['obs_shape'], dtype='float32', lod_level=1)
    sample_actions, values = self.alg.sample(obs)
    self.sample_outputs = [sample_actions, values]

with fluid.program_guard(self.predict_program):
    obs = layers.data(
        name='obs', shape=self.config['obs_shape'], dtype='float32', lod_level=1)
    self.predict_actions = self.alg.predict(obs)

with fluid.program_guard(self.value_program):
    obs = layers.data(
        name='obs', shape=self.config['obs_shape'], dtype='float32', lod_level=1)
    self.values = self.alg.value(obs)

with fluid.program_guard(self.learn_program):
    obs = layers.data(
        name='obs', shape=self.config['obs_shape'], dtype='float32', lod_level=1)
    actions = layers.data(name='actions', shape=[], dtype='int64')
    advantages = layers.data(name='advantages', shape=[], dtype='float32')
    target_values = layers.data(name='target_values', shape=[], dtype='float32')
    lr = layers.data(
        name='lr', shape=[1], dtype='float32', append_batch_size=False)
    entropy_coeff = layers.data(name='entropy_coeff', shape=[], dtype='float32')
    total_loss, pi_loss, vf_loss, entropy = self.alg.learn(
        obs, actions, advantages, target_values, lr, entropy_coeff)
    self.learn_outputs = [
        total_loss.name, pi_loss.name, vf_loss.name, entropy.name
    ]
Error message:
ParallelExecutor is deprecated. Please use CompiledProgram and Executor. CompiledProgram is a central place for optimization and Executor is the unified executor. Example can be found in compiler.py.
W0711 13:32:59.592660 292443584 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
I0711 13:32:59.605545 292443584 build_strategy.cc:282] SeqOnlyAllReduceOps:0, num_trainers:1
[07-11 13:32:59 MainThread @learner.py:85] Waiting for 2 remote actors to connect.
[07-11 13:32:59 Thread-1 @remote_object.py:51] [connect_remote_client] client_address:192.168.75.236:54698
[07-11 13:32:59 Thread-1 @remote_manager.py:88] [RemoteManager] Added a new remote object.
[07-11 13:32:59 MainThread @learner.py:94] Remote actor count: 1
[07-11 13:32:59 Thread-1 @remote_object.py:51] [connect_remote_client] client_address:192.168.75.236:54694
[07-11 13:32:59 Thread-1 @remote_manager.py:88] [RemoteManager] Added a new remote object.
[07-11 13:32:59 MainThread @learner.py:94] Remote actor count: 2
[07-11 13:32:59 MainThread @learner.py:100] All remote actors are ready, begin to learn.
Exception in thread Thread-4:
Traceback (most recent call last):
File "/anaconda3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/anaconda3/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "learner.py", line 109, in run_remote_sample
batch = remote_actor.sample()
File "/anaconda3/lib/python3.5/site-packages/parl/remote/remote_object.py", line 82, in wrapper
raise RemoteError(attr, error_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function `sample`]:
Invoke operator lstm error.
Python Callstacks:
File "/anaconda3/lib/python3.5/site-packages/paddle/fluid/framework.py", line 1654, in append_op
attrs=kwargs.get("attrs", None))
File "/anaconda3/lib/python3.5/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/anaconda3/lib/python3.5/site-packages/paddle/fluid/layers/nn.py", line 525, in dynamic_lstm
'candidate_activation': candidate_activation
File "/Users/renyanxue/Desktop/git_project/paper/atari_model.py", line 116, in policy
lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=HID_DIM)
File "/Users/renyanxue/Desktop/git_project/paper/atari_model.py", line 143, in policy_and_value
policy_logits = self.policy(obs)
File "/anaconda3/lib/python3.5/site-packages/parl/algorithms/a3c.py", line 72, in sample
logits, values = self.model.policy_and_value(obs)
File "/Users/renyanxue/Desktop/git_project/paper/atari_agent.py", line 42, in build_program
sample_actions, values = self.alg.sample(obs)
File "/anaconda3/lib/python3.5/site-packages/parl/framework/agent_base.py", line 46, in __init__
self.build_program()
File "/Users/renyanxue/Desktop/git_project/paper/atari_agent.py", line 11, in __init__
super(AtariAgent, self).__init__(algorithm)
File "actor.py", line 45, in __init__
self.agent= AtariAgent(algorithm, config)
File "/anaconda3/lib/python3.5/site-packages/parl/remote/remote_decorator.py", line 54, in __init__
self.unwrapped = cls(*args, **kwargs)
File "actor.py", line 117, in <module>
actor = Actor(config)
C++ Callstacks:
Enforce failed. Expected lods.size() == 1UL, but received lods.size():0 != 1UL:1.
Only support one level sequence now. at [/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/operators/math/sequence2batch.h:79]
PaddlePaddle Call Stacks:
0 0x1a2056a454p void paddle::platform::EnforceNotMet::Init<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, char const*, int) + 628
1 0x1a2056a180p paddle::platform::EnforceNotMet::EnforceNotMet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, int) + 80
2 0x1a2151c081p paddle::operators::math::LoDTensor2BatchFunctor<paddle::platform::CPUDeviceContext, float>::operator()(paddle::platform::CPUDeviceContext const&, paddle::framework::LoDTensor const&, paddle::framework::LoDTensor*, bool, bool) const + 2657
3 0x1a208a8c27p paddle::operators::LSTMKernel<paddle::platform::CPUDeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 1143
4 0x1a208a8770p std::__1::__function::__func<paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::LSTMKernel<paddle::platform::CPUDeviceContext, float>, paddle::operators::LSTMKernel<paddle::platform::CPUDeviceContext, double> >::operator()(char const*, char const*, int) const::'lambda'(paddle::framework::ExecutionContext const&), std::__1::allocator<paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::LSTMKernel<paddle::platform::CPUDeviceContext, float>, paddle::operators::LSTMKernel<paddle::platform::CPUDeviceContext, double> >::operator()(char const*, char const*, int) const::'lambda'(paddle::framework::ExecutionContext const&)>, void (paddle::framework::ExecutionContext const&)>::operator()(paddle::framework::ExecutionContext const&) + 32
5 0x1a2174fd11p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 865
6 0x1a2174f92cp paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 300
7 0x1a2174baf5p paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 357
8 0x1a206c73eep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 334
9 0x1a206c6e8cp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, bool) + 172
10 0x1a205d49fdp void pybind11::cpp_function::initialize<paddle::pybind::pybind11_init_core(pybind11::module&)::$_96, void, paddle::framework::Executor&, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(paddle::pybind::pybind11_init_core(pybind11::module&)::$_96&&, void (*)(paddle::framework::Executor&, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) + 301
11 0x1a2054b308p pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3400
12 0x10ba3d59fp PyCFunction_Call + 127
13 0x10bb087e7p PyEval_EvalFrameEx + 33207
14 0x10bafefafp _PyEval_EvalCodeWithName + 335
15 0x10bb052a7p PyEval_EvalFrameEx + 19575
16 0x10bafefafp _PyEval_EvalCodeWithName + 335
17 0x10bb052a7p PyEval_EvalFrameEx + 19575
18 0x10bb04fb8p PyEval_EvalFrameEx + 18824
19 0x10bafefafp _PyEval_EvalCodeWithName + 335
20 0x10ba0a6aap function_call + 106
21 0x10b9c6b35p PyObject_Call + 69
22 0x10bb05c9bp PyEval_EvalFrameEx + 22123
23 0x10bafefafp _PyEval_EvalCodeWithName + 335
24 0x10ba0a6aap function_call + 106
25 0x10b9c6b35p PyObject_Call + 69
26 0x10bb05c9bp PyEval_EvalFrameEx + 22123
27 0x10bb04fb8p PyEval_EvalFrameEx + 18824
28 0x10bb04fb8p PyEval_EvalFrameEx + 18824
29 0x10bafefafp _PyEval_EvalCodeWithName + 335
30 0x10ba0a6aap function_call + 106
31 0x10b9c6b35p PyObject_Call + 69
32 0x10b9e9694p method_call + 148
33 0x10b9c6b35p PyObject_Call + 69
34 0x10bb0dbf4p PyEval_CallObjectWithKeywords + 68
35 0x10bb7a72ap t_bootstrap + 122
36 0x7fff78c4b2ebp _pthread_body + 126
37 0x7fff78c4e249p _pthread_start + 66
38 0x7fff78c4a40dp thread_start + 13
Got it. We will first try to reproduce and pin down your problem locally. Thanks for the feedback.
@RonaldJEN please tell us which paddle version you are using.
parl==1.1
paddle==1.0.2
paddlehub==1.0.1
paddlepaddle==1.4.0
Since pip couldn't update PARL to the latest version, I copied the version from git straight over it.
Got it. To install the latest PARL you can run:
git clone https://github.com/PaddlePaddle/PARL/
cd PARL
pip install .
That's it; there's no need for the brute-force "copy over" approach.
I did update using the method you described; I probably just explained it poorly.
Regarding the runtime error in the code itself, our take is: since you now feed obs as a lod_tensor, you also have to construct a lod_tensor in agent.learn instead of feeding the arrays in directly as before.
But we still suggest you first confirm that this change makes sense, to avoid wasted effort. Happy to keep discussing :)
For LoDTensor usage, see the examples in paddle models: https://github.com/PaddlePaddle/models/search?p=1&q=LodTensor&unscoped_q=LodTensor
Also, because A2C uses the get_params/set_params interface of parl.Agent to synchronize parameters between the learner and the actors, and that interface only supports parameterized layers created with parl.layers inside the Model's __init__, we recommend declaring parl.layers.dynamic_lstm, parl.layers.fc, etc. in the Model's __init__.
Agent
obs = (batch_size, seq_len, input_dim), here (80, 20, 11).
I made the following changes based on the example above, but the output actions and values now have a different format than before.
# ε-greedy
def build_program(self):
    self.sample_program = fluid.Program()
    self.predict_program = fluid.Program()
    self.value_program = fluid.Program()
    self.learn_program = fluid.Program()

    with fluid.program_guard(self.sample_program):
        obs = layers.data(
            name='obs', shape=[self.config['obs_shape'][1]], dtype='float32', lod_level=1)
        sample_actions, values = self.alg.sample(obs)
        self.sample_outputs = [sample_actions, values]

    with fluid.program_guard(self.predict_program):
        obs = layers.data(
            name='obs', shape=[self.config['obs_shape'][1]], dtype='float32', lod_level=1)
        self.predict_actions = self.alg.predict(obs)

    with fluid.program_guard(self.value_program):
        obs = layers.data(
            name='obs', shape=[self.config['obs_shape'][1]], dtype='float32', lod_level=1)
        self.values = self.alg.value(obs)

    with fluid.program_guard(self.learn_program):
        obs = layers.data(
            name='obs', shape=[self.config['obs_shape'][1]], dtype='float32', lod_level=1)
        actions = layers.data(name='actions', shape=[], dtype='int64')
        advantages = layers.data(name='advantages', shape=[], dtype='float32')
        target_values = layers.data(name='target_values', shape=[], dtype='float32')
        lr = layers.data(
            name='lr', shape=[1], dtype='float32', append_batch_size=False)
        entropy_coeff = layers.data(name='entropy_coeff', shape=[], dtype='float32')
        total_loss, pi_loss, vf_loss, entropy = self.alg.learn(
            obs, actions, advantages, target_values, lr, entropy_coeff)
        self.learn_outputs = [
            total_loss.name, pi_loss.name, vf_loss.name, entropy.name
        ]
def sample(self, obs_np):
    obs_np = obs_np.astype('float32')
    sample_actions, values = self.fluid_executor.run(
        self.sample_program,
        feed={'obs': to_lodtensor(obs_np, self.place)},
        fetch_list=self.sample_outputs,
        return_numpy=False)
    return sample_actions, values

# predict
def predict(self, obs_np):
    obs_np = obs_np.astype('float32')
    predict_actions = self.fluid_executor.run(
        self.predict_program,
        feed={'obs': to_lodtensor(obs_np, self.place)},
        fetch_list=[self.predict_actions],
        return_numpy=False)[0]
    return predict_actions

def value(self, obs_np):
    obs_np = obs_np.astype('float32')
    values = self.fluid_executor.run(
        self.value_program,
        feed={'obs': to_lodtensor(obs_np, self.place)},
        fetch_list=[self.values],
        return_numpy=False)[0]
    return values

# learn
def learn(self, obs_np, actions_np, advantages_np, target_values_np, terminal=None):
    obs_np = obs_np.astype('float32')
    actions_np = actions_np.astype('int64')
    advantages_np = advantages_np.astype('float32')
    target_values_np = target_values_np.astype('float32')
    lr = self.lr_scheduler.step(step_num=obs_np.shape[0])
    entropy_coeff = self.entropy_coeff_scheduler.step()
    total_loss, pi_loss, vf_loss, entropy = self.learn_exe.run(
        feed={
            'obs': to_lodtensor(obs_np, self.place),
            'actions': to_lodtensor(actions_np, self.place),
            'advantages': to_lodtensor(advantages_np, self.place),
            'target_values': to_lodtensor(target_values_np, self.place),
            'lr': to_lodtensor(np.array([lr], dtype='float32'), self.place),
            'entropy_coeff': to_lodtensor(np.array([entropy_coeff], dtype='float32'), self.place)
        },
        fetch_list=self.learn_outputs,
        return_numpy=False)
    return total_loss, pi_loss, vf_loss, entropy, lr, entropy_coeff
Atarimodel
HID_DIM = 16

class Atarimodel(Model):
    def __init__(self, act_dim):
        self.lstm_before_fc = layers.fc(size=HID_DIM * 4)
        self.dynamic_lstm = layers.dynamic_lstm(size=HID_DIM * 4)
        self.policy_fc = layers.fc(size=act_dim, act='softmax')
        self.value_fc = layers.fc(size=1)

    def policy(self, obs):
        forward_proj = self.lstm_before_fc(obs)
        output, _ = self.dynamic_lstm(forward_proj)
        policy_logits = self.policy_fc(output)
        return policy_logits

    def value(self, obs):
        forward_proj = self.lstm_before_fc(obs)
        output, _ = self.dynamic_lstm(forward_proj)
        values = self.value_fc(output)
        values = layers.squeeze(values, axes=[1])
        return values

    def policy_and_value(self, obs):
        policy_logits = self.policy(obs)
        values = self.value(obs)
        return policy_logits, values
Error message:
The output action is of type <class 'paddle.fluid.core.Tensor'>, and so is value, which differs from the original output.
np.array(output_tensor) converts it into a numpy array. Also note that a LoDTensor output merges batch_size * seq_len into a single dimension, so you need to undo that yourself, either at the Paddle level or at the numpy level.
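Assuming every sequence in the mini-batch has the same length (the only case the to_lodtensor helper earlier in this thread handles), the merged output can be restored with a single reshape. A small NumPy sketch with illustrative sizes:

```python
import numpy as np

batch_size, seq_len, hid_dim = 2, 4, 3

# What the executor hands back: np.array(fetched_lodtensor) has shape
# (batch_size * seq_len, hid_dim) -- batch and time merged into one axis.
merged = np.arange(batch_size * seq_len * hid_dim, dtype='float32')
merged = merged.reshape([batch_size * seq_len, hid_dim])

# With equal-length sequences, one reshape restores (batch, seq, hid).
restored = merged.reshape([batch_size, seq_len, hid_dim])
print(restored.shape)  # (2, 4, 3)

# Row t of sequence b in `restored` is row b * seq_len + t of `merged`.
assert np.array_equal(restored[1, 2], merged[1 * seq_len + 2])
```

With variable-length sequences a plain reshape no longer works; the LoD offsets have to be used to split the merged axis instead.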
The learn step now produces the error below (I spent a whole evening on it without success; pointers appreciated):
def learn(self, obs_np, actions_np, advantages_np, target_values_np, terminal=None):
    obs_np = obs_np.astype('float32')
    Batch_size = obs_np.shape[0]
    actions_np = actions_np.astype('int64').reshape(Batch_size, 1)
    advantages_np = advantages_np.astype('float32').reshape(Batch_size, 1)
    target_values_np = target_values_np.astype('float32').reshape(Batch_size, 1)
    lr = self.lr_scheduler.step(step_num=Batch_size)
    entropy_coeff = self.entropy_coeff_scheduler.step()
    total_loss, pi_loss, vf_loss, entropy = self.learn_exe.run(
        feed={
            'obs': to_lodtensor(obs_np, self.place),
            'actions': to_lodtensor(actions_np, self.place),
            'advantages': to_lodtensor(advantages_np, self.place),
            'target_values': to_lodtensor(target_values_np, self.place),
            'lr': np.array([lr], dtype='float32'),
            'entropy_coeff': np.array([entropy_coeff], dtype='float32')
        },
        fetch_list=self.learn_outputs,
        return_numpy=False)
    return total_loss, pi_loss, vf_loss, entropy, lr, entropy_coeff
Error message:
C++ Callstacks:
Enforce failed. Expected x_dim.size() >= y_dim.size(), but received x_dim.size():2 < y_dim.size():3.
Rank of first input must >= rank of second input. at [/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/operators/elementwise/elementwise_op.h:56]
PaddlePaddle Call Stacks:
0 0x1a22d7a454p void paddle::platform::EnforceNotMet::Init<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, char const*, int) + 628
1 0x1a22d7a180p paddle::platform::EnforceNotMet::EnforceNotMet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, int) + 80
2 0x1a23a390cap paddle::operators::ElementwiseOp::InferShape(paddle::framework::InferShapeContext*) const + 842
3 0x1a23f5fcdfp paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 815
4 0x1a23f5f92cp paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 300
5 0x1a23f5baf5p paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 357
6 0x1a23de3d8fp std::__1::__function::__func<paddle::framework::details::ComputationOpHandle::RunImpl()::$_0, std::__1::allocator<paddle::framework::details::ComputationOpHandle::RunImpl()::$_0>, void ()>::operator()() + 111
7 0x1a23de3803p paddle::framework::details::ComputationOpHandle::RunImpl() + 195
8 0x1a23dc0c62p std::__1::__packaged_task_func<std::__1::__bind<paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpAsync(std::__1::unordered_map<paddle::framework::details::OpHandleBase*, std::__1::atomic<int>, std::__1::hash<paddle::framework::details::OpHandleBase*>, std::__1::equal_to<paddle::framework::details::OpHandleBase*>, std::__1::allocator<std::__1::pair<paddle::framework::details::OpHandleBase* const, std::__1::atomic<int> > > >*, paddle::framework::details::OpHandleBase*, std::__1::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&)::$_0>, std::__1::allocator<std::__1::__bind<paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpAsync(std::__1::unordered_map<paddle::framework::details::OpHandleBase*, std::__1::atomic<int>, std::__1::hash<paddle::framework::details::OpHandleBase*>, std::__1::equal_to<paddle::framework::details::OpHandleBase*>, std::__1::allocator<std::__1::pair<paddle::framework::details::OpHandleBase* const, std::__1::atomic<int> > > >*, paddle::framework::details::OpHandleBase*, std::__1::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&)::$_0> >, void ()>::operator()() + 82
9 0x1a23d4dc99p std::__1::packaged_task<void ()>::operator()() + 73
10 0x1a22e49f9ap ThreadPool::ThreadPool(unsigned long)::'lambda'()::operator()() const + 522
11 0x1a22e49d1dp void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPool::ThreadPool(unsigned long)::'lambda'()> >(void*) + 45
12 0x7fff5a89c2ebp _pthread_body + 126
13 0x7fff5a89f249p _pthread_start + 66
14 0x7fff5a89b40dp thread_start + 13
From the error message, it looks like an elementwise op is being invoked that requires x_dim.size() >= y_dim.size(), i.e. the rank of the first input must be greater than or equal to the rank of the second. You can print the tensor shapes to locate exactly where it fails.
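As an illustration of that constraint (a toy check, not Paddle's actual implementation; the shapes are hypothetical but chosen to match the ranks 2 and 3 named in the error), pairing a (batch, 1) tensor as first input with a (batch, seq, dim) tensor as second input would trip it:

```python
import numpy as np

def check_elementwise_ranks(x, y):
    """Toy version of the elementwise_op precondition from the error."""
    if x.ndim < y.ndim:
        raise ValueError(
            'Rank of first input (%d) must be >= rank of second input (%d)'
            % (x.ndim, y.ndim))

advantages = np.zeros([80, 1], dtype='float32')     # rank 2
per_step = np.zeros([80, 20, 11], dtype='float32')  # rank 3, hypothetical

try:
    check_elementwise_ranks(advantages, per_step)
except ValueError as err:
    print(err)  # Rank of first input (2) must be >= rank of second input (3)
```

Printing .shape of each operand right before the failing op (or the fetched LoDTensor shapes) narrows down which pair has the mismatched ranks.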
Paddle 1.5 added a new API for this; it is still in the contrib directory. For usage, see:
https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/models/language_model/lm_model.py#L338
Thanks for the pointer. I'll look into it when I find the time :)