
Comments (6)

araffin commented on May 21, 2024

Hello,

You mean optimizing the model architecture?
Yes, it is possible. You need to change the sampler script a bit and pass policy_kwargs=dict(net_arch=[64, 64]) (or layers= for SAC/DQN, ...) to the constructor (cf. the docs).
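For example, a minimal sketch of what those constructor calls look like (the environment IDs here are just placeholders):

from stable_baselines import PPO2, SAC

# PPO2 (and A2C, ACKTR, ...): network architecture via net_arch
model = PPO2('MlpPolicy', 'CartPole-v1',
             policy_kwargs=dict(net_arch=[64, 64]))

# SAC (and DQN, DDPG, TD3): hidden layer sizes via layers
model = SAC('MlpPolicy', 'Pendulum-v0',
            policy_kwargs=dict(layers=[64, 64]))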


jarlva commented on May 21, 2024

Thanks Antonin,

Yes, optimizing the model architecture (tensors, layers, etc.).
I'm new to SB and have tried a few things (https://stable-baselines.readthedocs.io/en/master/guide/custom_policy.html), yet it's not clear how exactly to tune the model (via Optuna, I assume). Would it be possible to get a simple example (like CartPole)?

Much appreciated!
Jake


eunomiadev commented on May 21, 2024

Is this what you want?

import gym
import numpy as np
import optuna
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpLnLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

n_cpu = 4


def optimize_ppo2(trial):
    """ Learning hyperparameters we want to optimise """
    return {
        'n_steps': int(trial.suggest_loguniform('n_steps', 16, 2048)),
        'gamma': trial.suggest_loguniform('gamma', 0.9, 0.9999),
        'learning_rate': trial.suggest_loguniform('learning_rate', 1e-5, 1.),
        'ent_coef': trial.suggest_loguniform('ent_coef', 1e-8, 1e-1),
        'cliprange': trial.suggest_uniform('cliprange', 0.1, 0.4),
        'noptepochs': int(trial.suggest_loguniform('noptepochs', 1, 48)),
        'lam': trial.suggest_uniform('lam', 0.8, 1.)
    }


def optimize_agent(trial):
    """ Train the model and evaluate it.
        Optuna minimises the objective by default, so we
        negate the mean reward here.
    """
    model_params = optimize_ppo2(trial)
    env = SubprocVecEnv([lambda: gym.make('CartPole-v1') for _ in range(n_cpu)])
    model = PPO2(MlpLnLstmPolicy, env, verbose=0, nminibatches=1, **model_params)
    model.learn(10000)

    # Evaluate by tracking the first sub-environment for a few episodes
    rewards = []
    n_episodes, reward_sum = 0, 0.0

    obs = env.reset()
    state = None
    done = [False] * n_cpu
    while n_episodes < 4:
        # Feed the recurrent state and the done mask back in for the LSTM policy
        action, state = model.predict(obs, state=state, mask=done)
        obs, reward, done, _ = env.step(action)
        # reward and done are arrays (one entry per sub-environment)
        reward_sum += reward[0]

        if done[0]:
            rewards.append(reward_sum)
            reward_sum = 0.0
            n_episodes += 1
            # SubprocVecEnv resets finished sub-environments automatically

    last_reward = np.mean(rewards)
    # Note: newer Optuna versions require a step argument for trial.report()
    trial.report(-1 * last_reward)

    return -1 * last_reward


if __name__ == '__main__':
    study = optuna.create_study(study_name='cartpol_optuna', storage='sqlite:///params.db', load_if_exists=True)
    study.optimize(optimize_agent, n_trials=1000, n_jobs=1)
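If it helps, a hypothetical follow-up sketch for reading the best trial back from the same params.db storage and retraining with it (study.best_params holds the raw sampled values, so the integer casts are repeated here):

import gym
import optuna
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpLnLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

# Reattach to the existing study (load_if_exists avoids creating a new one)
study = optuna.create_study(study_name='cartpol_optuna', storage='sqlite:///params.db',
                            load_if_exists=True)
print(study.best_value)   # best objective so far (the negated mean reward)
print(study.best_params)  # sampled hyperparameters of the best trial

best_params = dict(study.best_params)
best_params['n_steps'] = int(best_params['n_steps'])
best_params['noptepochs'] = int(best_params['noptepochs'])

env = SubprocVecEnv([lambda: gym.make('CartPole-v1') for _ in range(4)])
model = PPO2(MlpLnLstmPolicy, env, verbose=1, nminibatches=1, **best_params)
model.learn(100000)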


jarlva commented on May 21, 2024

Thanks for the script, Eunomia! That has been very helpful!

Is there a place to define and tune the TensorFlow model layers/tensors? For example, in Keras the model is defined by:

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

In TensorFlow it is a bit less simple.
Optimizing the model (tensors/layers and activation) for a specific problem can yield remarkable results and speed-ups. To that end, Google came up with AdaNet (AutoML), a way to automatically find/tune the best TensorFlow model (not sure how to apply it in RL). Is there a way to tune the model's tensors/layers/activation via Optuna (or maybe AdaNet), perhaps by modifying the script above?


araffin commented on May 21, 2024

@jheffez

The code you are looking for (and that @eunomiadev wrote) is here.

Is there a place to define and tune the tensorflow model layers/tensors?

Please read the documentation for that (especially the "custom policy" part).
A quick example:

model = PPO2('MlpPolicy', 'CartPole-v1', policy_kwargs=dict(net_arch=[256, 256]))

with optuna:

def optimize_ppo2(trial):
    """ Learning hyperparamters we want to optimise"""
    net_arch = trial.suggest_categorical('net_arch', ['small', 'medium'])
    net_arch = {
        'small': [dict(pi=[64, 64], vf=[64, 64])],
        'medium': [dict(pi=[256, 256], vf=[256, 256])],
    }[net_arch]
    return {
        'policy_kwargs': dict(net_arch=net_arch),
    }
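To also tune the activation function, policy_kwargs can additionally carry act_fun (an argument of the built-in feed-forward/LSTM policies); a possible extension of the snippet above, with tanh vs. relu as example choices:

import tensorflow as tf

def optimize_ppo2(trial):
    """ Sample the network architecture and the activation function """
    net_arch = trial.suggest_categorical('net_arch', ['small', 'medium'])
    act_fun = trial.suggest_categorical('act_fun', ['tanh', 'relu'])
    return {
        'policy_kwargs': dict(
            net_arch={
                'small': [dict(pi=[64, 64], vf=[64, 64])],
                'medium': [dict(pi=[256, 256], vf=[256, 256])],
            }[net_arch],
            act_fun={'tanh': tf.nn.tanh, 'relu': tf.nn.relu}[act_fun],
        ),
    }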

I also recommend reading the Optuna documentation; you should find answers to your questions there ;)


jarlva commented on May 21, 2024

Thanks again!
I'll check it out.
