Comments (14)
Great, I will add it to the v5 change list.
from gymnasium.
Here is the v4 vs v5 comparison for InvertedPendulum (the only difference in v5 is that the reward is fixed). As expected, the v5 version has a faster learning transient.
Yes, I think that is a reasonable thing to consider adding for v5. @rodrigodelazcano thoughts?
The same appears to be the case for InvertedPendulumEnv.
That is a good catch. I agree with @pseudo-rnd-thoughts. This should be added in v5, since v4 only updates to the new mujoco bindings and this reward error exists in older versions as well.
Here is some code verifying the bug (note that the reward remains 1.0 even after terminated becomes True):
>>> import gymnasium
>>> env = gymnasium.make('InvertedPendulum-v4')
>>> env.reset()
(array([-0.00114481, 0.00315834, -0.00689603, -0.00764207]), {})
>>> env.step([1])
(array([ 0.0052199 , -0.01239018, 0.32425438, -0.76226102]), 1.0, False, False, {})
>>> env.step([1])
(array([ 0.02474693, -0.05746427, 0.65169342, -1.48966764]), 1.0, False, False, {})
>>> env.step([1])
(array([ 0.05732965, -0.13159401, 0.97709572, -2.21890001]), 1.0, False, False, {})
>>> env.step([1])
(array([ 0.10286945, -0.23521879, 1.29895519, -2.96571882]), 1.0, True, False, {})
>>> env.step([1])
(array([ 0.16112042, -0.36907483, 1.6112052 , -3.72861976]), 1.0, True, False, {})
>>> env.step([1])
(array([ 0.23148975, -0.53346372, 1.902614 , -4.48774083]), 1.0, True, False, {})
>>> env = gymnasium.make('InvertedDoublePendulum-v4')
>>> env.reset()
(array([-0.05209413, -0.03106399, -0.05757982, 0.9995174 , 0.99834091,
-0.00319314, -0.10766195, 0.09683618, 0. , 0. ,
0. ]), {})
>>> env.step([1])
(array([ 7.67962813e-04, -1.44606909e-01, 8.22320453e-02, 9.89489182e-01,
9.96613210e-01, 2.11193186e+00, -4.45134196e+00, 5.49346477e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00]), 9.17710405832815, False, False, {})
>>> env.step([1])
(array([ 0.15545075, -0.44371616, 0.43168352, 0.89616738, 0.90202513,
3.9987776 , -7.74866516, 7.9830774 , 0. , 0. ,
0. ]), 8.877556821859912, False, False, {})
>>> env.step([1])
(array([ 0.39199627, -0.77051144, 0.69186829, 0.63742617, 0.72202373,
5.38530052, -8.71215195, 3.8089483 , 0. , 0. ,
0. ]), 8.807853136081622, True, False, {})
- v4 is the current v4 version.
- v4-fixed is the current v4 version, with the reward_alive fixed.
- v5 is the current v4 version, with the reward_alive fixed and the observation fix (#228).
This is a massive reward difference. This doesn't explain the performance difference to me, as the only change is that when terminated=True, alive_bonus = 0, so I expected the episode reward to be only about 10 points lower. @Kallinteris-Andreas am I misunderstanding something?
The episodic reward being 10 points lower happens only if the episode terminates (which does not happen after some training, regardless of the reward function).
The best policy in all cases resulted in the same return (~9360); it is just that with the fixed reward function it is possible to get there more consistently.
Note: I have double-checked the source code, nothing is wrong there.
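To make the "10 points lower" arithmetic concrete, here is a hypothetical calculation of the alive-bonus portion of the return for an episode that terminates on its final step (using InvertedDoublePendulum's healthy_reward of 10 and ignoring the distance and velocity penalty terms):

```python
HEALTHY_REWARD = 10.0  # InvertedDoublePendulum's per-step alive bonus

def alive_return(num_steps: int, buggy: bool) -> float:
    """Total alive bonus over an episode that terminates on its last step."""
    total = 0.0
    for step in range(num_steps):
        terminated = step == num_steps - 1
        if terminated and not buggy:
            total += 0.0             # fixed reward: no bonus once fallen
        else:
            total += HEALTHY_REWARD  # v4 bug: bonus paid regardless
    return total

# A terminating episode differs by exactly one alive bonus, i.e. 10 points:
print(alive_return(100, buggy=True) - alive_return(100, buggy=False))
```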
That doesn't explain the ~4000 point increase shown by the plots. To me, the change to the reward function only applies when terminated=True, such that reward_alive=0. Have I misunderstood something?
No, your understanding of the change in the reward function is correct.
Then why the ~4000 point difference? To me, if the agents were already achieving the optimal result, then the difference should be on average 10 points.
Because on some runs with the old reward function, the agent is not able to learn how to "escape" an unbalanced state.
The optimal results are identical with both reward functions (since the "optimal" policy would never be unbalanced).
Wow, that is amazing if purely changing that variable causes such a massive change in performance.