Comments (5)
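Found the problem: with Normal(means, std), std must not be 0 when computing log_prob. The neural network output has to be constrained before it is used as std.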
def log_prob(self, value):
    var = self.scale * self.scale
    log_scale = nn.log(self.scale)
    return -1. * ((value - self.loc) * (value - self.loc)) / (
        2. * var) - log_scale - math.log(math.sqrt(2. * math.pi))
This code was copied from SAC. I don't understand why it runs without errors in SAC.
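A minimal sketch of why a zero std breaks log_prob and how bounding log_std avoids it. It is written in plain Python rather than Paddle, and the names `LOG_STD_MIN`, `LOG_STD_MAX`, `normal_log_prob` and `safe_std` as well as the bound values are assumptions for illustration, not taken from this thread or from the PARL source.

```python
import math

# Hypothetical bounds for log_std; the exact values are an assumption.
LOG_STD_MIN = -20.0
LOG_STD_MAX = 2.0


def normal_log_prob(value, loc, scale):
    """Scalar version of the log_prob formula quoted above."""
    var = scale * scale
    return (-((value - loc) ** 2) / (2.0 * var)
            - math.log(scale)
            - math.log(math.sqrt(2.0 * math.pi)))


def safe_std(log_std):
    """Clip log_std to a fixed range before exponentiating, so std > 0."""
    clipped = min(max(log_std, LOG_STD_MIN), LOG_STD_MAX)
    return math.exp(clipped)


# A very negative log_std underflows to std == 0.0 and the division fails.
failed = False
try:
    normal_log_prob(0.1, 0.0, math.exp(-1000.0))
except (ZeroDivisionError, ValueError):
    failed = True

# With the clipped std the same call returns a finite number.
clipped_result = normal_log_prob(0.1, 0.0, safe_std(-1000.0))
```

If the network that produces log_std is already bounded in this way in one code path but not in another, the same log_prob formula can work in the first and fail in the second.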
from parl.
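Could you tell us which parts you have changed so far?
We suggest not changing too much at once. Add your code step by step, and if a particular change causes a problem, post that change here.
That way we can pinpoint the problem more easily.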
from parl.
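@TomorrowIsAnOtherDay I have not changed much so far. I want to use IMPALA with a continuous action space, so I modified the algorithm's learn() as shown below. Nothing else changed significantly, so the problem should be here. The actor network fits means and std, following the SAC algorithm. Please help check whether this is reasonable.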
values = self.model.value(obs)
actions_mu, log_std_mu = self.model.policy(obs)
std_pi = layers.exp(log_std)
normal_pi = Normal(actions, std_pi)
x_t1 = normal_pi.sample([1])[0]
y_t1 = layers.tanh(x_t1)
# action1 = y_t1 * self.max_action
log_prob1 = normal_pi.log_prob(x_t1)
log_prob1 -= layers.log(self.max_action * (1 - layers.pow(y_t1, 2)) + epsilon)
log_prob1 = layers.reduce_sum(log_prob1, dim=1, keep_dim=True)
log_prob_pi = layers.squeeze(log_prob1, axes=[1])
std_mu = layers.exp(log_std_mu)
normal_mu = Normal(actions_mu, std_mu)
x_t2 = normal_mu.sample([1])[0]
y_t2 = layers.tanh(x_t2)
# action2 = y_t2 * self.max_action
log_prob2 = normal_mu.log_prob(x_t2)
log_prob2 -= layers.log(self.max_action * (1 - layers.pow(y_t2, 2)) + epsilon)
log_prob2 = layers.reduce_sum(log_prob2, dim=1, keep_dim=True)
log_prob_mu = layers.squeeze(log_prob2, axes=[1])
# target_policy_distribution = CategoricalDistribution(target_logits)
# behaviour_policy_distribution = CategoricalDistribution(
# behaviour_logits)
policy_entropy = normal_pi.entropy()
target_actions_log_probs = log_prob_mu
behaviour_actions_log_probs = log_prob_pi
# Calculating kl for debug
# kl = target_policy_distribution.kl(behaviour_policy_distribution)
kl = normal_pi.kl_divergence(normal_mu)
kl = layers.reduce_mean(kl)
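The tanh correction term in the snippet above can be sketched for a single scalar as follows. This is plain Python, not Paddle. The function name `squash_log_correction` and the default `epsilon` value are assumptions for illustration, since the thread does not show how `epsilon` is defined.

```python
import math


def squash_log_correction(x, max_action, epsilon=1e-6):
    """log(max_action * (1 - tanh(x)^2) + epsilon) for one pre-tanh sample x.

    This value is subtracted from the Gaussian log-probability when the
    action is squashed with tanh. The epsilon keeps the argument of the
    log positive when tanh(x) saturates at +/-1.
    """
    y = math.tanh(x)
    return math.log(max_action * (1.0 - y * y) + epsilon)


# Near x = 0 the correction is close to log(max_action).
near_zero = squash_log_correction(0.0, 1.0)

# For a large |x| the tanh saturates; the result stays finite thanks to epsilon.
saturated = squash_log_correction(50.0, 1.0)
```

Note that epsilon only protects this log term. It does not protect the division by the variance inside log_prob, which depends on std itself being nonzero.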
from parl.
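Hello. Judging from the error message "Integer division by zero encountered in divide", the failure should come from elementwise_divide (the division op), where the divisor contains a 0. You can check which numeric computation is producing unreasonable values, for example by inserting fluid.layers.Print before the division op to print the tensor's runtime values.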
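The same idea, inspecting the divisor right before the division, can be sketched in plain Python. This is only a stand-in for printing a tensor at runtime. The helper names `find_zero_divisors` and `checked_divide` are hypothetical and are not part of Paddle or PARL.

```python
def find_zero_divisors(divisors, tol=0.0):
    """Return the indices whose absolute value is <= tol."""
    return [i for i, d in enumerate(divisors) if abs(d) <= tol]


def checked_divide(numerators, divisors):
    """Element-wise division that reports where the divisor is zero."""
    bad = find_zero_divisors(divisors)
    if bad:
        raise ValueError("zero divisor at indices %s" % bad)
    return [n / d for n, d in zip(numerators, divisors)]


# Inspect the divisor first, then divide only when it is safe.
zero_positions = find_zero_divisors([1.0, 0.0, 2.0, 0.0])
quotients = checked_divide([1.0, 4.0], [2.0, 2.0])
```

Reporting the offending indices narrows the search to the specific outputs that collapsed to zero, rather than only signalling that a division failed somewhere.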
from parl.
Thumbs up! We will go ahead and close this issue then :)
from parl.