tensorlayer / rlzoo
A Comprehensive Reinforcement Learning Zoo for Simple Usage 🚀
Home Page: http://rlzoo.readthedocs.io
License: Apache License 2.0
Hi, I am checking out this repository and was able to install everything without any apparent problem.
I am testing the run_rlzoo.py script using RLBench with the ReachTarget task. It runs, but it is quite slow: one episode takes about 2~3 minutes. I wonder whether you have seen the same behavior when training, or whether it is somehow my configuration. At first an episode took about 5 minutes; then I realized that TensorFlow was not using my GPU. I fixed that and it is now about twice as fast, but 3 minutes per episode is still quite slow, especially if it is meant to run for 1000 episodes.
Is that the normal speed when using V-REP? Is there anything I can do to train with faster-than-real-time simulation in RLBench?
My customized gym env has a dict-type obs_space. Even though I also customized ActorNetwork and CriticNetwork, I found that RLzoo's source code seems to support only a single input and cannot handle a dict state.
Is there any plan to support dict gym env states?
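Until dict states are supported, one possible workaround is to flatten the dict observation into a single vector before it reaches the networks. The sketch below is only an illustration in plain Python: flatten_observation is a hypothetical helper, not part of RLzoo, and the observation keys are made up. Recent Gym versions also ship a FlattenObservation wrapper that may serve the same purpose, if your version provides it.

```python
def flatten_observation(obs):
    """Flatten a (possibly nested) dict/list observation into a flat list of floats.

    Hypothetical helper, not part of RLzoo. Dict keys are visited in sorted
    order so that the layout of the resulting vector is deterministic.
    """
    def _flatten(value):
        if isinstance(value, dict):
            for key in sorted(value):
                yield from _flatten(value[key])
        elif isinstance(value, (list, tuple)):
            for item in value:
                yield from _flatten(item)
        else:
            yield float(value)

    return list(_flatten(obs))


# Made-up observation with three entries of different shapes.
obs = {"position": [0.1, 0.2], "image": [[1, 2], [3, 4]], "gripper_open": 1}
flat = flatten_observation(obs)  # order: gripper_open, image, position
```

NumPy arrays would need to be converted first (for example with .tolist()), and the flattened length would then define the shape of a single Box observation space.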
https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/value_networks.py#L187
I think the message at L187, "raise ValueError("State Shape Not Accepted!")", should be changed to "raise ValueError("Action Shape Not Accepted!")", since that branch checks the action shape.
https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/algorithms/dqn/dqn.py#L178-L179
I think these lines should be modified to "eps = 1 - (1 - exploration_final_eps) * min(1, i / (exploration_rate * train_episodes * max_steps))"
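The proposed schedule can be written as a small standalone function. This is a sketch of the formula quoted in this issue, not the code currently in dqn.py; the function name linear_epsilon is hypothetical, and step stands in for the counter i, assumed here to be the global environment step.

```python
def linear_epsilon(step, exploration_rate, exploration_final_eps,
                   train_episodes, max_steps):
    """Linearly anneal epsilon from 1.0 down to exploration_final_eps.

    The decay finishes after exploration_rate * train_episodes * max_steps
    steps; after that, epsilon stays at exploration_final_eps.
    """
    decay_steps = exploration_rate * train_episodes * max_steps
    fraction = min(1.0, step / decay_steps)
    return 1.0 - (1.0 - exploration_final_eps) * fraction


# Example values (assumed, not RLzoo defaults): decay over 0.5 * 100 * 200 = 10000 steps.
eps_start = linear_epsilon(0, 0.5, 0.01, 100, 200)
eps_half = linear_epsilon(5000, 0.5, 0.01, 100, 200)
eps_end = linear_epsilon(20000, 0.5, 0.01, 100, 200)
```

With these values epsilon starts at 1.0, is about 0.505 halfway through the decay window, and stays at about 0.01 once the window has passed.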
Traceback (most recent call last):
File "D:/Anaconda3/Lib/site-packages/rlzoo/run_rlzoo.py", line 30, in <module>
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
File "D:\Anaconda3\Lib\site-packages\rlzoo\common\utils.py", line 131, in call_default_params
default_seed) # need manually set seed in the main script if default_seed = False
File "D:\Anaconda3\Lib\site-packages\rlzoo\algorithms\sac\default.py", line 43, in classic_control
soft_q_net1 = QNetwork(env.observation_space, env.action_space,
NameError: name 'QNetwork' is not defined
I tried to benchmark the following environments ['BipedalWalker-v2', 'BipedalWalkerHardcore-v2', 'CarRacing-v0', 'LunarLander-v2', 'LunarLanderContinuous-v2'] using the ['A3C', 'DDPG', 'TD3', 'SAC', 'PG', 'TRPO', 'PPO', 'DPPO'] algorithms. Most of the combinations failed to learn the task and did not converge. Only (SAC, LunarLanderContinuous-v2) and (TD3, LunarLanderContinuous-v2) learned the task, and only sub-optimally. Can someone address this issue?
Hi,
In your code, the training parameter setting is imported from utils.
from rlzoo.common.utils import call_default_params
May I ask whether there is any documentation that explains what these default settings are and how you set them?
I am getting this issue (in the screenshot) while running RLzoo with RLbench using the following code:
from rlzoo.common.env_wrappers import *
from rlzoo.common.utils import *
from rlzoo.algorithms import *
EnvName = 'ReachTarget'
EnvType = 'rlbench'
env = build_env(EnvName, EnvType, state_type='vision')
AlgName = 'SAC'
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
alg = eval(AlgName+'(**alg_params)')
alg.learn(env=env, mode='train', render=False, **learn_params)
alg.learn(env=env, mode='test', render=True, **learn_params)
env.close()
I have also added export PYTHONPATH="/home/sidharth/RLBench" to my .bashrc.
Any help would be appreciated! Thanks.
Hi,
First of all, let me say that I really appreciate the work done in this repo.
I would like to know if you have had success in training any algorithm using RLBench as the environment.
I'm currently trying to train the DDPG algorithm on the ReachTarget task using all the observations available with state_type='vision'. As suggested in issue #6, I modified the default params for DDPG, lowering max_steps and increasing train_episodes, but I can't seem to get any results.
Any feedback is much appreciated.
Mirko
Edit:
I noticed that RLBench doesn't provide "usable" reward metrics, or am I wrong? All the episode rewards are either 0.000 or 1.000. Any insight on this problem?
Why do you use "assert len(self._action_shape) == 1" (see https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/value_networks.py#L143)? As I understand it, a spaces.Box shape may have two or more dimensions.
Does RLzoo support environments written by the user?
In https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/distributions.py#L96, I think it should be modified to "return -tf.reduce_sum(x*self._logits, axis=1)" to return the cross entropy, because self._logits is already the logarithm of the probability.
In https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/distributions.py#L99, I think there is a concise implementation:
@expand_dims
def kl(self, logits):
    p = tf.exp(self._logits)
    kl = tf.reduce_sum(p * (self._logits - logits), axis=-1)
    return kl
Similarly in https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/distributions.py#L115
@expand_dims
def entropy(_logits):
    p = tf.exp(_logits)
    return tf.reduce_sum(-p * _logits, axis=-1)
I don’t know why you implemented it in a more complicated way in your code. Could you tell me the reason?
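The three formulas in this issue can be checked numerically without TensorFlow. The sketch below uses only the Python standard library and assumes, as the issue states, that the stored logits are already normalized log-probabilities. The function names are illustrative and do not correspond to RLzoo's API.

```python
import math


def log_softmax(logits):
    """Convert raw logits to normalized log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def cross_entropy(one_hot, log_p):
    """-sum(x * log p): the negative log-likelihood of a one-hot action x."""
    return -sum(x * lp for x, lp in zip(one_hot, log_p))


def kl(log_p, log_q):
    """KL(p || q) = sum(p * (log p - log q))."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def entropy(log_p):
    """H(p) = -sum(p * log p)."""
    return -sum(math.exp(lp) * lp for lp in log_p)


log_p = log_softmax([1.0, 2.0, 3.0])
log_q = log_softmax([0.0, 0.0, 0.0])  # uniform over three actions

kl_pq = kl(log_p, log_q)
h_q = entropy(log_q)
```

These short forms rely on the log-probabilities being normalized. If a library stores unnormalized logits, it has to subtract the log-sum-exp first, which may be one reason an implementation looks longer than the versions above.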
When I run run_dqn.py with QNetwork(...)'s parameter state_only set to False, the line "states, actions = inputs" in value_networks.py raises a ValueError, as the title indicates. It occurs because of "obv = np.expand_dims(obv, 0).astype('float32')" in dqn.py, which passes only the observation. I think that if state_only = False, obv should be passed together with act_inputs. I hope you can fix this error.
I installed RLzoo and used its PPO to train an agent for the DM Control Suite. I tested the environments CheetahRun-v0 and CartpoleSwingup-v0, but the current PPO could solve neither of them. Could you please help me? I attach the testing rewards for CartpoleSwingup-v0 below.
Testing... | Algorithm: PPO_CLIP | Environment: CartpoleSwingup-v0
Episode: 0/100 | Episode Reward: 28.4757 | Running Time: 0.5145
Episode: 1/100 | Episode Reward: 28.9385 | Running Time: 0.9912
Episode: 2/100 | Episode Reward: 28.6354 | Running Time: 1.4690
Episode: 3/100 | Episode Reward: 29.3395 | Running Time: 1.9561
Episode: 4/100 | Episode Reward: 28.6659 | Running Time: 2.4513
Episode: 5/100 | Episode Reward: 29.1282 | Running Time: 2.9346
Episode: 6/100 | Episode Reward: 28.1226 | Running Time: 3.4143
Episode: 7/100 | Episode Reward: 28.2178 | Running Time: 3.8957
Episode: 8/100 | Episode Reward: 28.2538 | Running Time: 4.3784
Episode: 9/100 | Episode Reward: 28.1161 | Running Time: 4.8581
Episode: 10/100 | Episode Reward: 28.2593 | Running Time: 5.3397
Episode: 11/100 | Episode Reward: 28.5096 | Running Time: 5.8084
Episode: 12/100 | Episode Reward: 27.9026 | Running Time: 6.2774
Episode: 13/100 | Episode Reward: 28.5970 | Running Time: 6.7430
Episode: 14/100 | Episode Reward: 28.6751 | Running Time: 7.2089
Episode: 15/100 | Episode Reward: 28.5764 | Running Time: 7.6745
Episode: 16/100 | Episode Reward: 28.2926 | Running Time: 8.1450
Episode: 17/100 | Episode Reward: 28.1627 | Running Time: 8.6118
Episode: 18/100 | Episode Reward: 28.3646 | Running Time: 9.1045
Episode: 19/100 | Episode Reward: 29.0091 | Running Time: 9.5852
Episode: 20/100 | Episode Reward: 28.6220 | Running Time: 10.0611
Episode: 21/100 | Episode Reward: 28.0959 | Running Time: 10.5363
Episode: 22/100 | Episode Reward: 28.3945 | Running Time: 11.0277
Episode: 23/100 | Episode Reward: 27.9222 | Running Time: 11.4943
Episode: 24/100 | Episode Reward: 29.1726 | Running Time: 11.9824
Episode: 25/100 | Episode Reward: 27.4740 | Running Time: 12.4491
Episode: 26/100 | Episode Reward: 29.0847 | Running Time: 13.0193
Episode: 27/100 | Episode Reward: 28.7490 | Running Time: 13.5816
Episode: 28/100 | Episode Reward: 29.4471 | Running Time: 14.0713
Episode: 29/100 | Episode Reward: 28.9889 | Running Time: 14.5420
Episode: 30/100 | Episode Reward: 27.9317 | Running Time: 15.0105
Episode: 31/100 | Episode Reward: 28.6057 | Running Time: 15.4807
Episode: 32/100 | Episode Reward: 27.9958 | Running Time: 15.9642
Episode: 33/100 | Episode Reward: 28.7247 | Running Time: 16.4500
Episode: 34/100 | Episode Reward: 28.7462 | Running Time: 16.9543
Episode: 35/100 | Episode Reward: 27.8867 | Running Time: 17.4389
Episode: 36/100 | Episode Reward: 28.2105 | Running Time: 17.9195
Episode: 37/100 | Episode Reward: 28.7734 | Running Time: 18.4043
Episode: 38/100 | Episode Reward: 28.7227 | Running Time: 18.8784
Episode: 39/100 | Episode Reward: 28.0787 | Running Time: 19.4129
Episode: 40/100 | Episode Reward: 28.7410 | Running Time: 19.9047
Episode: 41/100 | Episode Reward: 28.2244 | Running Time: 20.3926
Episode: 42/100 | Episode Reward: 28.6137 | Running Time: 20.8720
Episode: 43/100 | Episode Reward: 28.1213 | Running Time: 21.3447
Episode: 44/100 | Episode Reward: 28.7770 | Running Time: 21.8240
Episode: 45/100 | Episode Reward: 28.4468 | Running Time: 22.3140
Episode: 46/100 | Episode Reward: 28.3316 | Running Time: 22.7774
Episode: 47/100 | Episode Reward: 28.9745 | Running Time: 23.2579
Episode: 48/100 | Episode Reward: 28.5198 | Running Time: 23.7473
Episode: 49/100 | Episode Reward: 28.2299 | Running Time: 24.2266
Episode: 50/100 | Episode Reward: 27.5498 | Running Time: 24.7056
Episode: 51/100 | Episode Reward: 28.1589 | Running Time: 25.1900
Episode: 52/100 | Episode Reward: 28.2864 | Running Time: 25.6784
Episode: 53/100 | Episode Reward: 28.6884 | Running Time: 26.1567
Episode: 54/100 | Episode Reward: 28.1469 | Running Time: 26.6400
Episode: 55/100 | Episode Reward: 28.5643 | Running Time: 27.1302
Episode: 56/100 | Episode Reward: 28.3990 | Running Time: 27.6151
Episode: 57/100 | Episode Reward: 28.4950 | Running Time: 28.0974
Episode: 58/100 | Episode Reward: 27.8701 | Running Time: 28.5832
Episode: 59/100 | Episode Reward: 28.7812 | Running Time: 29.0694
Episode: 60/100 | Episode Reward: 27.9976 | Running Time: 29.5470
Episode: 61/100 | Episode Reward: 28.6969 | Running Time: 30.0193
Episode: 62/100 | Episode Reward: 28.6212 | Running Time: 30.4909
Episode: 63/100 | Episode Reward: 27.4787 | Running Time: 30.9708
Episode: 64/100 | Episode Reward: 28.4545 | Running Time: 31.4938
Episode: 65/100 | Episode Reward: 28.5045 | Running Time: 31.9696
Episode: 66/100 | Episode Reward: 27.7482 | Running Time: 32.4473
Episode: 67/100 | Episode Reward: 28.2154 | Running Time: 32.9266
Episode: 68/100 | Episode Reward: 28.5635 | Running Time: 33.3989
Episode: 69/100 | Episode Reward: 28.1430 | Running Time: 33.8813
Episode: 70/100 | Episode Reward: 28.6439 | Running Time: 34.3629
Episode: 71/100 | Episode Reward: 28.0486 | Running Time: 34.8364
Episode: 72/100 | Episode Reward: 28.7735 | Running Time: 35.3201
Episode: 73/100 | Episode Reward: 28.5547 | Running Time: 35.8198
Episode: 74/100 | Episode Reward: 28.8559 | Running Time: 36.3093
Episode: 75/100 | Episode Reward: 28.3621 | Running Time: 36.7830
Episode: 76/100 | Episode Reward: 28.5224 | Running Time: 37.2498
Episode: 77/100 | Episode Reward: 27.8541 | Running Time: 37.7178
Episode: 78/100 | Episode Reward: 28.5152 | Running Time: 38.1858
Episode: 79/100 | Episode Reward: 28.2094 | Running Time: 38.6564
Episode: 80/100 | Episode Reward: 27.7631 | Running Time: 39.1511
Episode: 81/100 | Episode Reward: 28.6392 | Running Time: 39.6275
Episode: 82/100 | Episode Reward: 29.1371 | Running Time: 40.1082
Episode: 83/100 | Episode Reward: 28.3666 | Running Time: 40.6566
Episode: 84/100 | Episode Reward: 28.5571 | Running Time: 41.1946
Episode: 85/100 | Episode Reward: 27.9868 | Running Time: 41.7414
Episode: 86/100 | Episode Reward: 28.3500 | Running Time: 42.3022
Episode: 87/100 | Episode Reward: 28.3478 | Running Time: 42.8577
Episode: 88/100 | Episode Reward: 28.2653 | Running Time: 43.4031
Episode: 89/100 | Episode Reward: 27.4543 | Running Time: 43.9250
Episode: 90/100 | Episode Reward: 28.3082 | Running Time: 44.4695
Episode: 91/100 | Episode Reward: 28.7126 | Running Time: 44.9914
Episode: 92/100 | Episode Reward: 28.8012 | Running Time: 45.5375
Episode: 93/100 | Episode Reward: 28.3971 | Running Time: 46.0787
Episode: 94/100 | Episode Reward: 28.2597 | Running Time: 46.6196
Episode: 95/100 | Episode Reward: 28.6428 | Running Time: 47.1560
Episode: 96/100 | Episode Reward: 28.4852 | Running Time: 47.6989
Episode: 97/100 | Episode Reward: 29.4192 | Running Time: 48.2465
Episode: 98/100 | Episode Reward: 29.0577 | Running Time: 48.7841
Episode: 99/100 | Episode Reward: 28.3379 | Running Time: 49.3224
I was trying to use the RLzoo algorithms with the RLBench environment and I got an error from line 50 of the build_rlbench_env.py file.
It seems that in this commit they changed where the action_size attribute comes from, so line 50 should now be
self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(self.env.action_size,), dtype=np.float32)
Hi, I ran rlzoo/interactive/main.ipynb and got this error:
ImportError: cannot import name 'ArmActionMode'
I applied the solution that was suggested in #43, but now I get a new error and the problem isn't solved.
The error is: action_shape() missing 1 required positional argument: 'scene'
What should I do?
In https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/policy_networks.py#L192, what is the purpose of the init parameter state_conditioned of the class StochasticPolicyNetwork?
Hello!
I want to introduce a new RLBench task (or also override one). How do I accomplish this properly? The only way I can think of now is to rewrite parts of the code in the RLBench package, which I don't think is the proper way to do it. Should there be an argument to indicate where the task is defined?
Thank you!
At https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/distributions.py#L137, self.action_scale is initialized to None. At https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/common/distributions.py#L173, I think this causes an error because of a division by None.
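In Python, dividing a number by None raises a TypeError, so a scale that defaults to None needs a guard before it is used. The following is a minimal sketch of such a guard. The function normalize_action and its parameters are hypothetical and only mirror the attribute names mentioned in this issue; this is not the code in distributions.py.

```python
def normalize_action(action, action_mean=0.0, action_scale=None):
    """Shift and scale an action vector, treating a missing scale as 1.0.

    Hypothetical helper: falls back to the identity scale when
    action_scale is None instead of dividing by None.
    """
    scale = 1.0 if action_scale is None else action_scale
    return [(a - action_mean) / scale for a in action]


# Without the guard, the division itself fails:
try:
    1.0 / None
    divide_by_none_fails = False
except TypeError:
    divide_by_none_fails = True

unscaled = normalize_action([2.0, 4.0])  # scale defaults to 1.0
scaled = normalize_action([2.0, 4.0], action_mean=1.0, action_scale=2.0)
```

Whether 1.0 is the right fallback depends on how the distribution is meant to be used; raising a clear error when the scale is unset would be an equally reasonable choice.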