Comments (3)

dementrock commented on July 25, 2024

Hi @aravindr93, could you share a script to reproduce the discrepancy?

Stub mode is needed for parallel sampling, because Theano does not interact well with multiprocessing. It is also useful for running experiments on EC2. Unfortunately, we don't have this feature documented right now.

from rllab.

aravindr93 commented on July 25, 2024

Hi @dementrock

I just ran the code below on my laptop, which has no GPU.

from __future__ import print_function
from __future__ import absolute_import

import sys
sys.dont_write_bytecode = True
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.gym_env import GymEnv
from rllab.envs.normalized_env import normalize
from rllab.misc.instrument import stub, run_experiment_lite
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

stub(globals())

env = GymEnv("Hopper-v1")
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32,32))
baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=5000,
    max_path_length=env.horizon,
    n_itr=150,
    discount=0.99,
    step_size=1e-2,
)

Followed by either

algo.train()  # with the stub(globals()) call commented out

or

run_experiment_lite(
    algo.train(),
    # Number of parallel workers for sampling
    n_parallel=4,
    # Only keep the snapshot parameters for the last iteration
    snapshot_mode="last",
    # Specifies the seed for the experiment. If this is not provided, a random seed
    # will be used
    seed=10,
    #plot=True,
)

Although results vary from run to run, the algo.train() policy performs at roughly 850 with the above hyper-parameters, while the stubbed mode gets close to 2000. The exact numbers vary, but the stubbed mode performs consistently better across many hyper-parameter choices.

Also, just to confirm: the batch_size parameter in TRPO is the overall batch size and not per processor, right? I ask because the total run time for both versions above was roughly the same (around 10 minutes on my laptop). I looked at the source code, but I want to make sure.

Also, how do we convert a stubbed policy to a normal one? For example, let's say I train a policy in stubbed mode on a cluster. How do we "un-stub" it to share with other people? Basically, I want to call a = policy.get_action(o)[0] directly instead of getting a serialized function.

Thanks & Regards!

dementrock commented on July 25, 2024

Hi @aravindr93,

First of all, you might want to check the bounds of the action space. For continuous tasks it is usually advisable to wrap the environment with normalize() (see https://github.com/rllab/rllab/blob/master/rllab/envs/normalized_env.py), which rescales the action range to [-1, 1] and thereby facilitates exploration.
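In the script above, the wrapping would look like env = normalize(GymEnv("Hopper-v1")). As a rough illustration of what such a wrapper does to actions, here is a minimal standalone sketch, assuming a linear map from [-1, 1] to the environment's bounds followed by clipping. The helper rescale_action and the bounds are hypothetical and are not rllab's actual code; check normalized_env.py for the real behavior.

```python
def rescale_action(action, low, high):
    """Map each action component from [-1, 1] to [low[i], high[i]].

    Hypothetical helper for illustration only; values outside [-1, 1]
    are clipped to the bounds after rescaling.
    """
    scaled = []
    for a, lo, hi in zip(action, low, high):
        value = lo + (a + 1.0) * 0.5 * (hi - lo)  # linear map to [lo, hi]
        scaled.append(min(max(value, lo), hi))    # clip any overshoot
    return scaled


# Made-up bounds for a 2-dimensional action space.
low = [-2.0, 0.0]
high = [2.0, 10.0]

# The policy emits actions on the [-1, 1] scale; the environment sees
# them on its native scale.
print(rescale_action([0.0, 1.0], low, high))
```

With a wrapper of this kind the policy's output scale no longer depends on the environment's native action bounds.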

Also, to make the results less sensitive to initialization, you can try setting the same seed in the non-stubbed version. This can be done by calling rllab.misc.ext.set_seed(10) in the code without stub(globals()).
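To show why a fixed seed matters when comparing the two modes, here is a small standalone sketch that uses only Python's standard library as a stand-in for the random draws made during initialization and sampling. The function noisy_draws is hypothetical; in the actual script you would call rllab.misc.ext.set_seed(10) before building the policy.

```python
import random


def noisy_draws(seed, n=5):
    """Return n Gaussian draws from a generator seeded with `seed`.

    Hypothetical stand-in for the randomness in weight initialization
    and action sampling.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Same seed: identical draws, so two runs start from the same point.
same = noisy_draws(10) == noisy_draws(10)

# Different seeds: different draws, so run-to-run differences can come
# from initialization alone rather than from the training mode.
different = noisy_draws(10) != noisy_draws(11)

print(same, different)
```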

Question about batch_size: Yes. It is the overall batch size.

Question about stubbed policy: when running in stub mode, intermediate results will be stored in a local folder (most likely data/local/experiment/...), and there will be a pkl file storing the trained policy parameters. You can load the policy data via joblib.load(). See https://github.com/rllab/rllab/blob/master/scripts/sim_policy.py#L40 for an example.
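A minimal sketch of that loading step follows, assuming the pkl file holds a dictionary with a "policy" entry as the linked script suggests. The helpers load_snapshot and act and the DummyPolicy class are hypothetical; the dummy stands in for a trained policy so the sketch runs without rllab or a snapshot file.

```python
def load_snapshot(path):
    """Load a snapshot written by a stubbed run.

    Requires joblib, and rllab must be importable so the pickled
    classes can be restored. Imported lazily so the rest of this
    sketch runs without either.
    """
    import joblib
    return joblib.load(path)


def act(policy, observation):
    """Return only the action; get_action returns (action, extra info)."""
    return policy.get_action(observation)[0]


class DummyPolicy:
    """Hypothetical stand-in for a trained policy object."""

    def get_action(self, observation):
        # Always output a zero action with one entry per observation entry.
        return [0.0 for _ in observation], {}


# With a real run: snapshot = load_snapshot("<path to the pkl file>")
snapshot = {"policy": DummyPolicy()}  # assumed key name
print(act(snapshot["policy"], [0.1, 0.2, 0.3]))
```

The loaded object is an ordinary policy instance, so it can be shared as the pkl file and used without stub mode.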
