Comments (3)
Hi @aravindr93, could you share a script to reproduce the discrepancy?
The stub mode is necessary for performing sampling in parallel, since Theano does not interact well with multiprocessing. It is also useful for running experiments on EC2. Unfortunately, we don't have this feature documented right now.
from rllab.
Hi @dementrock
I just ran the code below on my laptop without a GPU.
```python
from __future__ import print_function
from __future__ import absolute_import

import sys
sys.dont_write_bytecode = True

from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.gym_env import GymEnv
from rllab.envs.normalized_env import normalize
from rllab.misc.instrument import stub, run_experiment_lite
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

stub(globals())

env = GymEnv("Hopper-v1")
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
baseline = LinearFeatureBaseline(env_spec=env.spec)
algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=5000,
    max_path_length=env.horizon,
    n_itr=150,
    discount=0.99,
    step_size=1e-2,
)
```
Followed by either

```python
algo.train()  # with the stub(globals()) call commented out
```

or

```python
run_experiment_lite(
    algo.train(),
    # Number of parallel workers for sampling
    n_parallel=4,
    # Only keep the snapshot parameters for the last iteration
    snapshot_mode="last",
    # Specifies the seed for the experiment. If this is not provided,
    # a random seed will be used
    seed=10,
    # plot=True,
)
```
Though the performance is somewhat variable, the algo.train() policy ends up somewhere in the ballpark of 850 with the above hyperparameters, whereas the stubbed mode gets close to 2000. Despite some variability in the exact numbers, the stubbed mode performs consistently better across many hyperparameter choices.
Also, just to confirm: the batch_size parameter in TRPO is the overall batch size and not per processor, right? I'm curious because the total run-time for both of the above was roughly the same (around 10 minutes on my laptop). I looked at the source code, but I just want to make sure.
Also, how do we convert a stubbed policy to a normal one? For example, let's say I train a policy in stubbed mode on a cluster. How do we "un-stub" it for sharing with other folks? Basically, I want to directly call a = policy.get_action(o)[0] instead of going through a serialized function.
Thanks & Regards!
Hi @aravindr93,
First of all, you might want to check the bounds of the action space. For continuous tasks it is usually advisable to wrap the environment with normalize() (see https://github.com/rllab/rllab/blob/master/rllab/envs/normalized_env.py), which rescales the action range to [-1, 1] and facilitates exploration.
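As a rough sketch of what the normalize() wrapper does to actions (an assumption based on reading normalized_env.py, not the exact implementation), mapping an action from [-1, 1] back to the env's native bounds is just a linear rescaling:

```python
def scale_action(a, low, high):
    # Linearly map an action a in [-1, 1] to the env's native range
    # [low, high]; the wrapper applies this before stepping the inner env.
    return low + 0.5 * (a + 1.0) * (high - low)

# e.g. with native bounds [-2, 2]:
# scale_action(-1.0, -2.0, 2.0) -> -2.0
# scale_action( 0.0, -2.0, 2.0) ->  0.0
```

This way the policy can always output actions on a common scale, regardless of the underlying environment's bounds.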
Also, to make the results less sensitive to initialization, you can try setting the same seed in the non-stubbed version. This can be done by calling rllab.misc.ext.set_seed(10) in the code without stub(globals()).
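To illustrate the effect (a minimal stdlib sketch, not rllab's actual set_seed implementation, which also seeds NumPy and Theano): seeding before each run makes the random draws used for initialization identical, so two runs start from the same parameters:

```python
import random

def set_seed(seed):
    # Stand-in for rllab.misc.ext.set_seed: reset the global RNG so that
    # parameter initialization and sampling are reproducible.
    random.seed(seed)

set_seed(10)
w1 = [random.gauss(0.0, 1.0) for _ in range(3)]
set_seed(10)
w2 = [random.gauss(0.0, 1.0) for _ in range(3)]
# w1 == w2: identical draws from the same seed
```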
Question about batch_size: yes, it is the overall batch size.
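So with batch_size=5000 and n_parallel=4, each worker collects roughly 5000/4 samples per iteration, which is consistent with the two runs taking similar wall-clock time. A hypothetical illustration of such a split (not rllab's exact scheduling logic):

```python
def samples_per_worker(batch_size, n_parallel):
    # Divide the overall batch as evenly as possible; the total collected
    # per iteration stays batch_size regardless of n_parallel.
    base, extra = divmod(batch_size, n_parallel)
    return [base + (1 if i < extra else 0) for i in range(n_parallel)]

samples_per_worker(5000, 4)  # [1250, 1250, 1250, 1250]
```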
Question about the stubbed policy: when running in stub mode, intermediate results are stored in a local folder (most likely data/local/experiment/...), and there will be a pkl file storing the trained policy parameters. You can load the policy via joblib.load(). See https://github.com/rllab/rllab/blob/master/scripts/sim_policy.py#L40 for an example.
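Since the snapshot is just a serialized dict containing the policy, the round trip looks roughly like this (a self-contained sketch using a dummy policy and the stdlib pickle module; joblib.load() is used the same way on rllab's actual snapshot file, and DummyPolicy here is purely illustrative):

```python
import os
import pickle
import tempfile

class DummyPolicy:
    """Stand-in for a trained policy: any picklable object with a
    get_action(obs) -> (action, agent_info) method behaves the same."""
    def __init__(self, scale):
        self.scale = scale

    def get_action(self, obs):
        return ([o * self.scale for o in obs], {})

# Save a snapshot dict, as the training loop does at each iteration.
path = os.path.join(tempfile.mkdtemp(), "params.pkl")
with open(path, "wb") as f:
    pickle.dump({"policy": DummyPolicy(2.0)}, f)

# Later (or on another machine): load it back and query actions directly,
# with no stubbing involved.
with open(path, "rb") as f:
    policy = pickle.load(f)["policy"]
a = policy.get_action([1.0, -0.5])[0]  # [2.0, -1.0]
```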