Hi, I'm noticing a considerable difference in performance between th

Difference in performance between normal and stubbed modes (rllab, closed)

rll commented on July 25, 2024

Difference in performance between normal and stubbed modes

Comments (3)

dementrock commented on July 25, 2024

Hi @aravindr93, could you share a script to reproduce the discrepancy?

Stub mode is necessary for sampling in parallel, because Theano does not interact well with multiprocessing. It is also useful for running experiments on EC2. Unfortunately, this feature is not documented right now.
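To make the idea of stubbing more concrete, here is a minimal sketch of the general pattern: a constructor call is recorded as a lightweight recipe and only executed later, for example inside a worker process. This is not rllab's actual implementation; `Sampler`, `StubbedCall`, and `stubbed` are hypothetical names used purely for illustration.

```python
class Sampler:
    """Hypothetical stand-in for an object that should not be built too early."""

    def __init__(self, batch_size):
        self.batch_size = batch_size

    def describe(self):
        return "sampler with batch_size=%d" % self.batch_size


class StubbedCall:
    """Records a constructor call (class plus arguments) without running it."""

    def __init__(self, cls, args, kwargs):
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def build(self):
        # The real object is only constructed here, e.g. on the worker side.
        return self.cls(*self.args, **self.kwargs)


def stubbed(cls):
    """Return a factory that records calls to cls instead of executing them."""
    def factory(*args, **kwargs):
        return StubbedCall(cls, args, kwargs)
    return factory


LazySampler = stubbed(Sampler)
recipe = LazySampler(batch_size=5000)  # nothing is constructed yet
sampler = recipe.build()               # construction happens on demand
print(sampler.describe())
```

Under this assumption, the launcher only ever handles the recipe, so heavyweight state is created after the worker processes exist rather than before.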

aravindr93 commented on July 25, 2024

Hi @dementrock

I just ran the code below on my laptop, which has no GPU.

from __future__ import print_function
from __future__ import absolute_import

import sys
sys.dont_write_bytecode = True
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.gym_env import GymEnv
from rllab.envs.normalized_env import normalize
from rllab.misc.instrument import stub, run_experiment_lite
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

stub(globals())

env = GymEnv("Hopper-v1")
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32,32))
baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(env=env, policy=policy, baseline=baseline, batch_size=5000, max_path_length=env.horizon, n_itr=150, discount=0.99, step_size=1e-2)

Followed by either

algo.train()  # with the stub(globals()) call commented out

or

run_experiment_lite(
    algo.train(),
    # Number of parallel workers for sampling
    n_parallel=4,
    # Only keep the snapshot parameters for the last iteration
    snapshot_mode="last",
    # Specifies the seed for the experiment. If this is not provided, a random seed
    # will be used
    seed=10,
    #plot=True,
)

Although the results vary from run to run, the policy trained with a direct algo.train() call ends up somewhere in the ballpark of 850 with the above hyper-parameters, whereas the stubbed mode gets close to 2000. The exact numbers fluctuate, but the stubbed mode performs consistently better across many hyper-parameter choices.

Also, just to confirm: the batch_size parameter in TRPO is the overall batch size and not per worker, right? I'm asking because the total run time for both scripts above was roughly the same (around 10 minutes on my laptop). I looked at the source code, but I just want to make sure.

Also, how do we convert a stubbed policy to a normal one? For example, let's say I train a policy in stubbed mode on a cluster. How do we "un-stub" it for sharing with other folks? Basically, I want to directly get a = policy.get_action(o)[0] instead of a serialized function.

Thanks & Regards!

dementrock commented on July 25, 2024

Hi @aravindr93,

First of all, you might want to check the bounds of the action space. For continuous tasks it is usually advisable to wrap the environment with normalize() (see https://github.com/rllab/rllab/blob/master/rllab/envs/normalized_env.py), which rescales the action range to [-1, 1] and thereby facilitates exploration.
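In the script above, this would presumably amount to writing env = normalize(GymEnv("Hopper-v1")), since normalize is already imported there. As a rough illustration of what such a wrapper does to actions, the sketch below maps a policy output in [-1, 1] onto an environment's native bounds. The function name and the example bounds are assumptions for illustration and are not taken from rllab.

```python
def rescale_action(action, low, high):
    """Map each component of a normalized action in [-1, 1] to [low, high]."""
    scaled = []
    for a, lo, hi in zip(action, low, high):
        a = max(-1.0, min(1.0, a))  # clip into the normalized range first
        scaled.append(lo + (a + 1.0) * 0.5 * (hi - lo))
    return scaled


# Hypothetical bounds for a two-dimensional action space.
low = [-2.0, 0.0]
high = [2.0, 10.0]

print(rescale_action([0.0, 1.0], low, high))
```

With this mapping the policy always works in the same range, whatever bounds the underlying environment declares.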

Also, to make the results less sensitive to initialization, you can try using the same seed in the non-stubbed version. This can be done by calling rllab.misc.ext.set_seed(10) in the code without stub(globals()).
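The effect of fixing the seed can be illustrated with a small standalone sketch that uses only Python's random module. It does not call rllab; init_weights is a hypothetical helper that stands in for any random parameter initialization.

```python
import random


def init_weights(seed, n=3):
    """Draw n pseudo-random 'weights' from a generator seeded with seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same seed reproduces the same initialization exactly.
same = init_weights(10) == init_weights(10)
# A different seed gives a different starting point.
different = init_weights(10) != init_weights(11)
print(same, different)
```

When comparing two run modes, using the same seed in both removes one source of variation, although it does not guarantee identical trajectories if the two modes consume random numbers in a different order.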

Question about batch_size: Yes. It is the overall batch size.

Question about stubbed policy: when running in stub mode, intermediate results will be stored in a local folder (most likely data/local/experiment/...), and there will be a pkl file storing the trained policy parameters. You can load the policy data via joblib.load(). See https://github.com/rllab/rllab/blob/master/scripts/sim_policy.py#L40 for an example.
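The loading step might look roughly like the sketch below. To keep it runnable without rllab or a real snapshot, the loader is passed in as a parameter and a fake one is used here; in practice joblib.load would take its place. The "policy" key, the file name, and ConstantPolicy are assumptions for illustration, so check the linked script for the actual snapshot layout.

```python
class ConstantPolicy:
    """Hypothetical stand-in for a trained policy object."""

    def __init__(self, action):
        self.action = action

    def get_action(self, observation):
        # Returns (action, info), matching the get_action(o)[0] usage above.
        return self.action, {}


def load_policy(snapshot_path, loader):
    """Load a snapshot with loader and return the policy stored in it.

    The "policy" key is an assumption about the snapshot layout.
    """
    snapshot = loader(snapshot_path)
    return snapshot["policy"]


def fake_loader(path):
    # Stand-in for joblib.load so the sketch runs without a real pkl file.
    return {"policy": ConstantPolicy([0.1, -0.2, 0.3])}


policy = load_policy("params.pkl", fake_loader)  # hypothetical file name
observation = [0.0, 0.0, 0.0]                    # placeholder observation
action = policy.get_action(observation)[0]
print(action)
```

Once loaded this way, the object is an ordinary policy rather than a stub, so it can be shared and queried directly.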
