
Comments (8)

dementrock commented on July 25, 2024

Hi @lchenat, all the parameters should have been documented in the appendix of the paper. However, it is not guaranteed that you will get exactly the same result, due to differences in random seeds. I'd be happy to assist if you observe significant discrepancies.

lchenat commented on July 25, 2024

I did not find the parameters for DDPG in the appendix of the paper. I ran the following code, and the maximal average return across iterations was no more than 2400:

import re
import numpy
import sys
from subprocess import call
from rllab.algos.vpg import VPG
from rllab.algos.tnpg import TNPG
from rllab.algos.erwr import ERWR
from rllab.algos.reps import REPS
from rllab.algos.trpo import TRPO
from rllab.algos.cem import CEM
from rllab.algos.cma_es import CMAES
from rllab.algos.ddpg import DDPG
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy
from rllab.misc.instrument import stub, run_experiment_lite
from rllab.exploration_strategies.ou_strategy import OUStrategy
from rllab.policies.deterministic_mlp_policy import DeterministicMLPPolicy
from rllab.q_functions.continuous_mlp_q_function import ContinuousMLPQFunction

path = "/home/data/lchenat/rllab-master/data/local/experiment/"
exp_name = "test_cartpole_again_ddpg"

stub(globals())

env = normalize(CartpoleEnv())

policy = DeterministicMLPPolicy(
    env_spec=env.spec,
    hidden_sizes=(400, 300),
)
es = OUStrategy(env_spec=env.spec)
qf = ContinuousMLPQFunction(env_spec=env.spec)
algo = DDPG(
    env=env,
    policy=policy,
    es=es,
    qf=qf,
    n_epochs=600,
)

# delete the previous data
call(["rm", "-rf", path + exp_name])

run_experiment_lite(
    algo.train(),
    n_parallel=1,
    snapshot_mode="last",
    # seed=1,
    exp_name=exp_name,
    # plot=True,
)

lchenat commented on July 25, 2024

By the way, is there a function that can calculate the metric defined in the paper (the average over all iterations and all trajectories)? The debug log only provides the average return of each iteration, and the number of trajectories per iteration is not provided for some algorithms.

dementrock commented on July 25, 2024

Also, as mentioned in the paper (it probably should have been clearer), we scaled all the rewards by 0.1 when running DDPG. Refer to https://github.com/openai/rllab/blob/master/rllab/algos/ddpg.py#L112

In general, we found this parameter to be very important, and due to time constraints we weren't able to tune it extensively. You may try other values on other tasks, which may give you even better results.
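
For example (a minimal sketch building on your launcher script above; I'm assuming the constructor argument is named scale_reward, matching the attribute used at the linked line):

algo = DDPG(
    env=env,
    policy=policy,
    es=es,
    qf=qf,
    n_epochs=600,
    scale_reward=0.1,  # assumed argument name: rewards are multiplied by this factor inside DDPG
)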

Re the second question: I think we did a very crude approximation and simply averaged the results over all iterations (treating it as if all iterations had the same number of trajectories). Feel free to submit a pull request that adds additional logging.
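
A rough sketch of that approximation (assuming the per-iteration average return ends up in progress.csv under a column named AverageReturn; the exact column name may differ between algorithms):

import csv

def approx_metric(progress_csv, column="AverageReturn"):
    # Unweighted mean of the per-iteration average returns, i.e. treating
    # every iteration as if it contained the same number of trajectories.
    with open(progress_csv) as f:
        values = [float(row[column]) for row in csv.DictReader(f) if row[column]]
    return sum(values) / len(values)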

lchenat commented on July 25, 2024

I have scaled the reward by 0.1, but I still get returns around 2500. Are there any other parameters I need to tune?

dementrock commented on July 25, 2024

Oh, you should change the max path length in DDPG to 500. Otherwise, the optimal score is 2500!
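
Something along these lines (a sketch only; I'm assuming DDPG exposes a max_path_length constructor argument, combined with the reward scaling mentioned above):

algo = DDPG(
    env=env,
    policy=policy,
    es=es,
    qf=qf,
    n_epochs=600,
    max_path_length=500,  # assumed argument name: episode horizon, otherwise the return is capped at 2500
    scale_reward=0.1,
)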

lchenat commented on July 25, 2024

Yes, the optimal score increased to 5000 after I changed the max path length to 500, but the average over all iterations is around 3100. Here are the average returns extracted from debug.log:

[85.1877, 22.4833, 22.2935, 22.4445, 22.561, 22.3393, 22.8141, 22.2145, 22.2697, 22.3604, 100.441, 177.388, 196.363, 183.331, 223.452, 272.554, 293.124, 407.079, 535.813, 619.828, 695.468, 872.355, 1028.65, 952.744, 645.209, 846.002, 601.686, 607.632, 656.687, 697.427, 715.399, 646.103, 646.78, 621.531, 609.173, 629.381, 598.768, 633.524, 603.093, 692.313, 627.032, 665.51, 671.895, 678.046, 721.31, 670.6, 645.387, 603.164, 594.49, 617.101, 676.009, 634.184, 627.533, 658.008, 700.695, 684.835, 622.859, 596.207, 691.321, 615.621, 612.777, 573.243, 598.272, 611.166, 596.099, 598.044, 551.066, 636.267, 740.511, 599.541, 605.533, 615.751, 710.193, 662.288, 619.205, 661.016, 582.386, 582.968, 601.911, 653.29, 617.729, 651.414, 744.331, 714.654, 658.312, 804.903, 841.202, 925.207, 855.179, 1044.97, 895.128, 936.976, 1066.89, 1406.07, 2131.26, 4021.35, 1814.43, 1877.28, 1512.61, 1993.6, 1686.47, 1991.07, 3476.89, 4138.7, 2385.71, 3379.73, 2648.44, 2970.91, 4008.72, 4683.97, 3603.48, 4999.14, 4999.04, 4998.86, 2328.25, 4534.03, 4999.28, 4999.24, 4998.56, 4283.28, 4998.47, 4998.89, 4998.86, 2223.49, 4999.18, 2702.06, 4998.8, 4998.67, 4999.02, 4998.57, 4999.6, 4998.84, 4998.5, 4998.65, 2449.9, 2153.85, 2034.24, 1275.76, 1394.86, 2258.75, 4557.9, 4998.51, 4998.52, 4998.37, 4998.73, 4998.16, 4997.71, 4997.81, 4583.94, 4998.32, 4998.46, 4998.38, 4998.21, 4804.9, 4997.79, 4998.41, 4998.03, 4998.44, 4998.26, 4998.16, 4998.07, 4998.21, 4997.73, 4998.04, 4997.81, 4998.3, 4998.33, 4998.2, 4998.27, 4998.15, 4998.6, 4998.23, 4998.63, 4998.58, 4998.57, 4999.11, 4999.32, 4999.47, 4999.41, 4790.46, 4999.45, 4999.45, 4999.57, 4999.45, 4781.79, 4999.5, 4999.46, 2834.94, 2667.89, 4999.43, 4879.07, 4999.51, 4999.5, 4256.07, 4999.24, 3749.83, 3140.73, 2184.49, 3293.37, 4276.64, 4570.93, 4549.38, 4448.15, 4999.32, 4608.16, 4999.52, 4999.38, 4999.16, 4999.43, 4790.45, 4999.54, 4724.55, 4999.43, 4627.56, 4999.58, 4999.45, 4272.88, 4999.26, 4999.38, 4784.83, 4731.7, 4696.11, 4427.15, 4165.41, 4906.99, 4422.53, 3953.47, 3692.44, 4123.02, 4571.29, 4450.07, 4999.32, 4859.32, 4999.44, 4498.9, 4895.5, 4999.22, 4589.09, 4998.88, 4733.38, 4775.73, 4999.29, 4999.18, 4640.48, 4610.55, 4935.44, 4999.2, 4883.15, 4852.51, 4900.67, 4835.74, 4500.04, 4738.27, 4531.23, 4530.79, 4999.0, 4999.18, 3974.69, 4797.54, 4998.95, 4000.32, 3699.98, 3424.3, 4998.86, 4003.68, 4878.38, 4915.73, 4763.66, 4998.63, 4688.21, 4998.92, 4926.33, 3244.25, 4507.45, 4998.75, 4998.79, 4998.45, 3060.27, 2583.36, 2717.86, 2005.12, 4911.39, 4998.91, 4998.66, 4660.82, 4789.71, 4998.43, 4998.52, 4884.03, 4541.58, 4998.37]
291 results, mean = 3179.83032646
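
That summary is just the length and the numpy mean of the list above:

import numpy

returns = [85.1877, 22.4833, 22.2935]  # truncated here; the full 291-entry list is printed above
print(len(returns))
print(numpy.mean(returns))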

The average return drops from 5000 down to 2000-3000 from time to time. Is that a normal phenomenon in DDPG?

dementrock commented on July 25, 2024

@lchenat The benchmark results were run over 25 million samples, to match the sample complexity used by the other algorithms; this should correspond to roughly 2500 epochs. A good approximation would be to extrapolate the performance of the last few epochs out to the same number of samples and compute the average return over all of that data.
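
A rough sketch of that, purely as an illustration (the 2500-epoch figure and the tail length are placeholders, not the exact benchmark procedure):

import numpy as np

def extrapolated_average(avg_returns, total_epochs=2500, tail=20):
    # Extend the per-epoch average returns to `total_epochs` by repeating the
    # mean of the last `tail` epochs, then average over the padded sequence.
    avg_returns = np.asarray(avg_returns, dtype=float)
    pad = np.full(total_epochs - len(avg_returns), avg_returns[-tail:].mean())
    return np.concatenate([avg_returns, pad]).mean()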

I have also observed that DDPG is sometimes unstable, even on cartpole, so what you're getting seems about right. One thing we didn't try was batch normalization, which we could not get working before the paper deadline; it could be a good thing to try. You can also try other reward scalings (e.g. 0.01), which might stabilize learning further.
