Comments (7)
Hello,
Interestingly, the action bound in your code affects learning a great deal. My custom environment needs only 2 actions, both within the range [0, 1]. With an action bound of [0, 1] the agent gets stuck in a local minimum, but when I set the action bound to [-1, 1] in your code it no longer gets stuck. Could you please explain this phenomenon?
Regards,
Akhil
from reinforcement-learning-with-tensorflow.
Hi Akhil,
It may be related to the activation function you selected. For example, to map actions to (-1, 1) you would choose tanh as the output activation, and sigmoid for (0, 1). These two mappings have different derivatives, which may affect your training.
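To make the point about derivatives concrete, here is a minimal sketch in plain Python (not code from the repository) comparing the slopes of the two activations. The helper names `sigmoid_grad` and `tanh_grad` are made up for this illustration. At x = 0 the sigmoid slope peaks at 0.25 while the tanh slope peaks at 1, so gradients through a sigmoid output are scaled down more near the origin.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, at most 1."""
    return 1.0 - math.tanh(x) ** 2


# Print both slopes at a few pre-activation values.
for x in (0.0, 1.0, 2.0):
    print(f"x={x:.1f}  sigmoid'={sigmoid_grad(x):.4f}  tanh'={tanh_grad(x):.4f}")
```

Both derivatives shrink as |x| grows, so which activation passes more gradient depends on where the pre-activations sit.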
Hello Zhou,
Thank you very much for the reply. I have tried the sigmoid activation function with a [0, 1] action bound, but it still gets stuck in the local minimum. With the sigmoid activation function and a [-1, 1] action bound, however, it starts learning really well again. Do you have any idea why?
regards,
akhil
Then I think it is likely that backpropagation through tanh works better than through sigmoid. This might be one of the reasons.
Hello Zhou,
But I am getting good results with the sigmoid activation function and an action bound of [-1, 1]. I am confused about how the action bound can still have an effect when the activation function is sigmoid in both cases.
The action bound is applied as tf.clip_by_value(output, lower_bound, upper_bound). Because the action is sampled from a normal distribution, the sampled output can still be less than 0 even when the mean comes from a sigmoid. Therefore, using an action bound of (-1, 1) still affects the final result.
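The effect can be sketched without TensorFlow. The snippet below is only an illustration under assumptions: `clip` is a plain-Python stand-in for `tf.clip_by_value` on a scalar, and the mean of 0.1 and standard deviation of 0.3 are made-up values, not taken from the repository. It draws the same Gaussian samples twice and clips them with the two different bounds.

```python
import random


def clip(value, lower, upper):
    """Scalar stand-in for tf.clip_by_value (assumption for this sketch)."""
    return max(lower, min(upper, value))


def sample_actions(mean, std, lower, upper, n, seed=0):
    """Draw n actions from N(mean, std) and clip each into [lower, upper]."""
    rng = random.Random(seed)  # fixed seed: both calls see the same raw samples
    return [clip(rng.gauss(mean, std), lower, upper) for _ in range(n)]


mean = 0.1  # hypothetical sigmoid output, already inside (0, 1)
std = 0.3   # hypothetical exploration noise

actions_01 = sample_actions(mean, std, 0.0, 1.0, 10000)
actions_11 = sample_actions(mean, std, -1.0, 1.0, 10000)

# With bound [0, 1], every negative sample collapses onto exactly 0.0.
frac_at_floor = sum(a == 0.0 for a in actions_01) / len(actions_01)
# With bound [-1, 1], the same samples keep their negative values.
frac_negative = sum(a < 0.0 for a in actions_11) / len(actions_11)

print("fraction clipped to 0 with [0, 1]:", frac_at_floor)
print("fraction still negative with [-1, 1]:", frac_negative)
```

So even with a sigmoid mean, the two bounds pass different actions to the environment: with [0, 1] many samples are flattened onto the lower bound, while [-1, 1] lets them through unchanged.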
Thanks, Zhou.