Hello, Thank you very much for the A3C implementation. I was trying

Problem with more than one action - A3C (reinforcement-learning-with-tensorflow, closed issue, 7 comments)

akhilsanand commented on May 15, 2024

Problem with more than one action - A3C

from reinforcement-learning-with-tensorflow.

Comments (7)

akhilsanand commented on May 15, 2024

Hello,
Interestingly, the action bound in your code strongly affects learning. My custom environment needs only 2 actions, both within the range [0, 1]. With an action bound of [0, 1], training gets stuck in a local minimum, but with [-1, 1] it no longer does. Could you please explain this phenomenon?

Regards,
Akhil

MorvanZhou commented on May 15, 2024

Hi Akhil,

It may be related to the activation function you selected. For example, to map actions to (-1, 1) you would choose tanh as the mapping function, and sigmoid for (0, 1). These two mappings have different derivatives, which may affect your training.
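
To make the derivative difference concrete, here is a minimal sketch in plain Python (standard library only, not taken from the repository). It evaluates both derivatives at a few hypothetical pre-activation values; the helper names are illustrative. It only shows that the gradients differ in scale and is not meant to establish that this is the cause of the behaviour reported above.

```python
import math

def sigmoid(x):
    """Logistic function, maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    """Derivative of sigmoid: s * (1 - s), at most 0.25 (reached at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    """Derivative of tanh: 1 - tanh(x)^2, at most 1.0 (reached at x = 0)."""
    return 1.0 - math.tanh(x) ** 2

# Hypothetical pre-activation values, chosen only for illustration.
for x in (0.0, 1.0, 2.0):
    print(f"x={x:.1f}  d_sigmoid={d_sigmoid(x):.4f}  d_tanh={d_tanh(x):.4f}")
```

At x = 0 the tanh gradient is four times the sigmoid gradient, so the same upstream error signal produces a larger update to the layer feeding the action mean.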

akhilsanand commented on May 15, 2024

Hello Zhou,

Thank you very much for the reply. I have tried the sigmoid activation function with a [0, 1] action bound, but it still gets stuck in the local minimum. With the sigmoid activation function and [-1, 1] as the action bound, however, it starts learning really well again. Do you have any idea why?

regards,
akhil

MorvanZhou commented on May 15, 2024

Then I think it is likely that backprop with tanh works better than with sigmoid. This might be one of the reasons.

akhilsanand commented on May 15, 2024

Hello Zhou,

But I am getting good results with the sigmoid activation function and an action bound of [-1, 1]. I am confused about how the action bound still has an effect even when a sigmoid activation function is used.

MorvanZhou commented on May 15, 2024

The action bound is applied as tf.clip_by_value(output, lower_bound, upper_bound). Because the action is sampled from a normal distribution, the output can sometimes still be less than 0 even when the mean comes from a sigmoid. Therefore, using an action bound of (-1, 1) still affects the final result.
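
The effect described above can be sketched without TensorFlow. The snippet below is an illustration in plain Python, not the repository's code: clip() stands in for tf.clip_by_value, and the pre-activation (-2.0), the standard deviation (0.5), and the function names are hypothetical values picked so that the sigmoid mean sits close to 0.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def clip(value, lower, upper):
    """Plain-Python stand-in for tf.clip_by_value on a scalar."""
    return max(lower, min(upper, value))

def sample_actions(pre_activation, sigma, lower, upper, n=10000, seed=0):
    """Sample n actions from Normal(sigmoid(pre_activation), sigma), then clip."""
    rng = random.Random(seed)
    mu = sigmoid(pre_activation)  # mean always lies in (0, 1)
    return [clip(rng.gauss(mu, sigma), lower, upper) for _ in range(n)]

# Same seed, so both runs draw identical raw samples and differ only in clipping.
narrow = sample_actions(-2.0, 0.5, 0.0, 1.0)   # action bound [0, 1]
wide = sample_actions(-2.0, 0.5, -1.0, 1.0)    # action bound [-1, 1]

frac_at_zero = sum(a == 0.0 for a in narrow) / len(narrow)
frac_negative = sum(a < 0.0 for a in wide) / len(wide)
print(f"[0, 1] bound:  fraction of actions clipped to exactly 0: {frac_at_zero:.3f}")
print(f"[-1, 1] bound: fraction of actions below 0:              {frac_negative:.3f}")
```

With the [0, 1] bound, every negative sample collapses to exactly 0, so many distinct samples become the same action. With the [-1, 1] bound, those samples keep their distinct negative values, which is one way the choice of bound can change what the environment sees even though the mean is produced by a sigmoid.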

akhilsanand commented on May 15, 2024

Thanks, Zhou.
