adik993 / ppo-pytorch

Proximal Policy Optimization (PPO) with Intrinsic Curiosity Module (ICM)

Python 100.00%

reinforcement-learning ppo pytorch icm intrinsic-curiosity-module proximal-policy-optimization generalized-advantage-estimation cartpole-v1 mountaincar-v0 pendulum-v0

ppo-pytorch's Issues

Do you also have an LSTM implementation?

I really love this implementation, and I see that LSTM is still in the TODO. Have you made any progress on this in the last two months or should I just do it myself?

How can I reproduce your TensorBoard results without episodes ending early?

I tried your script on the MountainCar environment, and it seems that an episode ends once it reaches 200 steps. In your TensorBoard plots, however, an episode does not stop until it reaches the final state (the top of the mountain). I wonder whether this is due to some mechanism in your code that handles early ending, but I could not find one. Could you give me some advice on how to reproduce the TensorBoard results you published?
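For context, Gym registers MountainCar-v0 with a 200-step episode limit and a reward of -1 per step, which is consistent with the cutoff described above. The sketch below is not taken from this repository: `ToyEnv`, `StepLimit`, and `run_episode` are hypothetical names, used only to illustrate how a step limit ends an episode early and how a larger limit would let episodes run longer.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment that never ends on its own."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        # Reward of -1 per step, mirroring MountainCar-v0's per-step penalty.
        return float(self.t), -1.0, False, {}


class StepLimit:
    """Ends an episode after max_steps, as a time-limit wrapper would."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps and not done:
            done = True
            info["truncated"] = True  # cut by the limit, not by reaching the goal
        return obs, reward, done, info


def run_episode(env):
    """Roll out one episode with a fixed action; return (steps, total reward)."""
    env.reset()
    steps, total, done = 0, 0.0, False
    while not done:
        _, reward, done, _ = env.step(0)
        steps += 1
        total += reward
    return steps, total


steps, total = run_episode(StepLimit(ToyEnv(), max_steps=200))
print(steps, total)  # 200 -200.0
```

With a limit of 200 and -1 per step, an agent that never reaches the goal always scores -200. Raising `max_steps` in this sketch lets the same episode run longer before it is cut off.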

It works well for discrete action spaces, but how can I use it with a continuous space?

I want to train the agent with this project after building a custom environment based on Gym's continuous spaces. The environment's states and actions are defined as follows:

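    # Per-dimension action bounds; dtype matches the Box dtype declared below
    self.min_action = np.array([[-3, -3, -3, -3, -3]], dtype=np.float32).reshape(1, 5)
    self.max_action = np.array([[3, 3, 3, 3, 3]], dtype=np.float32).reshape(1, 5)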

    self.low_state = np.array(
        [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=np.float32
    ).reshape(1,10)
    self.high_state = np.array(
        [[50, 50, 50, 50, 50, 300, 300, 300, 300, 300]], dtype=np.float32
    ).reshape(1,10)

    self.action_space = spaces.Box(
        low=self.min_action, high=self.max_action, shape=(1, 5), dtype=np.float32
    )

    self.observation_space = spaces.Box(
        low=self.low_state, high=self.high_state, shape=(1, 10), dtype=np.float32
    )

Is it possible to implement this with PPO + ICM? Thanks!
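PPO is commonly applied to continuous `Box` action spaces by having the policy output the mean of a diagonal Gaussian. Whether this repository supports that is not stated here, so the following is only a minimal sketch under that assumption. The helper names `normalize_state`, `clip_action`, and `gaussian_log_prob` are hypothetical. Plain lists stand in for the `(1, 10)` and `(1, 5)` arrays, and the bounds are copied from the definition above.

```python
import math

# Bounds copied from the question; flat lists stand in for the NumPy arrays.
MIN_ACTION = [-3.0] * 5
MAX_ACTION = [3.0] * 5
LOW_STATE = [0.0] * 10
HIGH_STATE = [50.0] * 5 + [300.0] * 5


def normalize_state(state):
    """Map each state component from [low, high] to [-1, 1]."""
    return [2.0 * (s - lo) / (hi - lo) - 1.0
            for s, lo, hi in zip(state, LOW_STATE, HIGH_STATE)]


def clip_action(action):
    """Clip a sampled action to the environment's bounds before stepping."""
    return [min(max(a, lo), hi)
            for a, lo, hi in zip(action, MIN_ACTION, MAX_ACTION)]


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mean, std)
    )


state = [25.0] * 5 + [150.0] * 5
print(normalize_state(state))                     # ten zeros: every component is mid-range
print(clip_action([-5.0, 0.5, 4.0, 3.0, -3.0]))   # [-3.0, 0.5, 3.0, 3.0, -3.0]
```

Rescaling matters here because the state components span different ranges (0 to 50 versus 0 to 300). The summed log-probability is the quantity PPO's probability ratio is built from when an action has several dimensions.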

How many episodes are needed to solve MountainCar-v0 with PPO + curiosity?

I tried your run_mountain_car.py, but the accumulated rewards do not change at all.
Are there any hyper-parameters that I need to change? And how many episodes are needed in general?
