
hybrid-sac's People

Contributors

nisheeth-golakiya


hybrid-sac's Issues

Question on formula of the continuous action

First, thank you for the code related to the paper "Discrete and Continuous Action Representation for Practical RL in Video Games".

Second, according to your code, all of the action spaces of the environments used in this project follow the 5th action-space architecture stated in the paper: a single discrete action dimension plus a continuous action space for EACH discrete action (if I am wrong, please tell me).

My question is about the loss formulas related to the continuous part:

# from calculating critic loss
min_qf_next_target = next_state_prob_d * (torch.min(qf1_next_target, qf2_next_target) - alpha * next_state_prob_d * next_state_log_pi_c - alpha_d * next_state_log_pi_d)

# from calculating policy loss
policy_loss_c = (prob_d * (alpha * prob_d * log_pi_c - min_qf_pi)).sum(1).mean()

# from calculating temperature loss
alpha_loss = (-log_alpha * p_d * (p_d * lpi_c + target_entropy)).sum(1).mean()

In each of these formulas, you multiply the distribution object (next_state_prob_d, prob_d and p_d) with the corresponding continuous term (next_state_log_pi_c, log_pi_c and lpi_c) inside the parentheses, even though the same distribution object already appears outside the parentheses.

Intuitively, I think there is no need to multiply the distribution object with the continuous term again inside the parentheses.

I don't know whether I am wrong mathematically, so I am asking this question.
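To make the question concrete, here is a minimal runnable sketch of the two readings of the policy-loss formula; the tensor shapes and random values below are assumptions for illustration only, not taken from the repo:

import torch

# Assumed shapes: batch of 4 states, 3 discrete actions.
B, D = 4, 3
prob_d = torch.softmax(torch.randn(B, D), dim=1)   # pi_d(a_d | s)
log_pi_c = torch.randn(B, D)                       # stand-in for log pi_c(a_c | s, a_d), one entry per discrete action
min_qf_pi = torch.randn(B, D)                      # stand-in for min(Q1, Q2) per discrete action
alpha = 0.2

# Formulation as written in the repo: log_pi_c is weighted by prob_d a second time inside the parentheses.
policy_loss_repo = (prob_d * (alpha * prob_d * log_pi_c - min_qf_pi)).sum(1).mean()

# Formulation the question expects: the expectation over the discrete action is taken only once, by the outer prob_d.
policy_loss_alt = (prob_d * (alpha * log_pi_c - min_qf_pi)).sum(1).mean()

print(policy_loss_repo.item(), policy_loss_alt.item())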

Dependency of actions

Hi, thank you so much for offering this code.

I read the code: hybrid_sac_platform.py

And, the paper said:
If the continuous action a_c must depend on the discrete action chosen by the agent, then a_d can be used as input when computing µ_c and σ_c.

I think the continuous action should depend on the discrete action, but I could not find where a_d is used to compute µ_c and σ_c.

Could you point me to where this dependency of actions is implemented?

Thank you very much in advance!
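For reference, here is a minimal sketch of one way such a dependency could be implemented, i.e. feeding a one-hot encoding of a_d into the layers that produce µ_c and σ_c. This is not code from hybrid_sac_platform.py; the module and all names are hypothetical and only illustrate the sentence quoted from the paper:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DependentContinuousHead(nn.Module):
    # Hypothetical head: conditions mu_c / sigma_c on the chosen discrete action
    # by concatenating its one-hot encoding with the observation features.
    def __init__(self, obs_dim, n_discrete, cont_dim, hidden=64):
        super().__init__()
        self.n_discrete = n_discrete
        self.fc = nn.Linear(obs_dim + n_discrete, hidden)
        self.mu = nn.Linear(hidden, cont_dim)
        self.log_std = nn.Linear(hidden, cont_dim)

    def forward(self, obs, a_d):
        one_hot = F.one_hot(a_d, self.n_discrete).float()   # a_d: LongTensor of shape (batch,)
        h = F.relu(self.fc(torch.cat([obs, one_hot], dim=-1)))
        return self.mu(h), self.log_std(h)

# e.g. mu_c, log_std_c = DependentContinuousHead(9, 3, 1)(torch.randn(4, 9), torch.tensor([0, 2, 1, 0]))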

about policy loss

Hi, thank you for this wonderful work.

I am new to RL, and I am trying to use your code in my env;

however, this policy loss seems a little strange.

[attached image of the policy loss omitted]

I wonder whether this is expected, or whether I have made a mistake somewhere?

Thank you again, and all the best.

numpy.core._exceptions.MemoryError: Unable to allocate 8.00 GiB for an array with shape (2147483647,) and data type int32

python hybrid_sac_platform.py --seed 7 --gym-id Platform-v0 --total-timesteps 100000 --learning-starts 1000
pygame 2.1.0 (SDL 2.0.16, Python 3.8.16)
Hello from the pygame community. https://www.pygame.org/contribute.html
D:\Anaconda3\envs\pytorch2023\lib\site-packages\gym\spaces\box.py:73: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(
<class 'gym.spaces.tuple.Tuple'>
Tuple(Box([0. 0. 0. 0. 0. 0. 0. 0. 0.], [1. 1. 1. 1. 1. 1. 1. 1. 1.], (9,), float32), Discrete(200))
Traceback (most recent call last):
File "hybrid_sac_platform.py", line 155, in
env.action_space.seed(args.seed)
File "D:\Anaconda3\envs\pytorch2023\lib\site-packages\gym\spaces\tuple.py", line 30, in seed
subseeds = self.np_random.choice(
File "mtrand.pyx", line 1010, in numpy.random.mtrand.RandomState.choice
File "mtrand.pyx", line 4699, in numpy.random.mtrand.RandomState.permutation
numpy.core._exceptions.MemoryError: Unable to allocate 8.00 GiB for an array with shape (2147483647,) and data type int32

I would be very grateful if you could answer this question.
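Not the author, but the traceback shows the failure happens inside gym's Tuple.seed(): on this gym/numpy combination it draws sub-seeds with np_random.choice(..., replace=False) over the full int32 range, which triggers a permutation of ~2^31 integers and the 8 GiB allocation. A minimal workaround sketch (an assumption on my part, not an official fix) is to seed each sub-space directly instead of the Tuple:

# Replace env.action_space.seed(args.seed) with:
for i, space in enumerate(env.action_space.spaces):
    space.seed(args.seed + i)

Upgrading gym to a version where Tuple.seed() no longer samples sub-seeds this way may also avoid the error.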

Don't try and do the continuous action scaling in the Policy network.

This isn't so much an issue with the code as it is a user error that I'd like to help others avoid:

For anyone who is going to use this code, make sure that you DO NOT try to do the action scaling/bias in the policy model itself. I'm using a custom environment, so I figured it would be easier to do the scaling in Policy.get_action(), as the cleanrl implementation does (lines 133 and 136). With the scaling in the policy, my code refused to converge even on very simple cases. Unless I'm very bad at merging the two codebases, I think there is an issue with either (i) erroneous values getting attached to the backpropagation graph or (ii) values being stored in the replay buffer with the wrong scaling.

Solution: Leave the code completely alone and do all of your scaling inside the environment. For anyone else using a custom environment, here's the easy approach:

[attached screenshots: extract_actions and scale_bias snippets]

where as_high, as_low are the high and low parameters (of type np.array) that you pass to the environment's Box() space in __init__().
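A minimal sketch of what the scaling inside the environment could look like, following the suggestion above (this is not the code from the attached screenshots; the function name and the assumption that the policy emits actions in [-1, 1] are mine):

import numpy as np

def scale_action(a_c, as_low, as_high):
    # Map a policy output assumed to lie in [-1, 1] onto the Box bounds [as_low, as_high].
    scale = (as_high - as_low) / 2.0
    bias = (as_high + as_low) / 2.0
    return a_c * scale + bias

# e.g. inside the environment's step():
# a_env = scale_action(np.asarray(a_c), self.as_low, self.as_high)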

Also, thanks for making this code. It has helped me out a lot!
