
Comments (6)

araffin commented on May 21, 2024

Hello,

You mean optimizing the model architecture?
Yes, it is possible. You need to change the sampler script a bit and pass policy_kwargs=dict(net_arch=[64, 64]) (or layers= for SAC/DQN, ...) to the constructor (cf. the docs).
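For example, a minimal sketch of what those constructor calls look like (the environment IDs here are just placeholders):

from stable_baselines import PPO2, SAC

# PPO2 (and A2C, ACKTR, ...): network architecture via net_arch
model = PPO2('MlpPolicy', 'CartPole-v1',
             policy_kwargs=dict(net_arch=[64, 64]))

# SAC (and DQN, DDPG, TD3): hidden layer sizes via layers
model = SAC('MlpPolicy', 'Pendulum-v0',
            policy_kwargs=dict(layers=[64, 64]))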


jarlva commented on May 21, 2024

Thanks Antonin,

Yes, optimizing the model architecture (tensors, layers, etc.).
I'm new to SB and have tried a few things (https://stable-baselines.readthedocs.io/en/master/guide/custom_policy.html), yet it's not clear how exactly to tune the model (via Optuna, I assume). Would it be possible to get a simple example (like CartPole)?

Much appreciated!
Jake


eunomiadev commented on May 21, 2024

Is this what you want?

import gym
import numpy as np
import optuna
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpLnLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

n_cpu = 4


def optimize_ppo2(trial):
    """ Learning hyperparameters we want to optimise """
    return {
        'n_steps': int(trial.suggest_loguniform('n_steps', 16, 2048)),
        'gamma': trial.suggest_loguniform('gamma', 0.9, 0.9999),
        'learning_rate': trial.suggest_loguniform('learning_rate', 1e-5, 1.),
        'ent_coef': trial.suggest_loguniform('ent_coef', 1e-8, 1e-1),
        'cliprange': trial.suggest_uniform('cliprange', 0.1, 0.4),
        'noptepochs': int(trial.suggest_loguniform('noptepochs', 1, 48)),
        'lam': trial.suggest_uniform('lam', 0.8, 1.)
    }


def optimize_agent(trial):
    """ Train the model and evaluate it.
        Optuna minimises the objective by default, so we
        negate the mean reward here.
    """
    model_params = optimize_ppo2(trial)
    env = SubprocVecEnv([lambda: gym.make('CartPole-v1') for _ in range(n_cpu)])
    model = PPO2(MlpLnLstmPolicy, env, verbose=0, nminibatches=1, **model_params)
    model.learn(10000)

    # Evaluate by tracking the first sub-environment for a few episodes
    rewards = []
    n_episodes, reward_sum = 0, 0.0

    obs = env.reset()
    state = None
    done = [False] * n_cpu
    while n_episodes < 4:
        # Feed the recurrent state and the done mask back in for the LSTM policy
        action, state = model.predict(obs, state=state, mask=done)
        obs, reward, done, _ = env.step(action)
        # reward and done are arrays (one entry per sub-environment)
        reward_sum += reward[0]

        if done[0]:
            rewards.append(reward_sum)
            reward_sum = 0.0
            n_episodes += 1
            # SubprocVecEnv resets finished sub-environments automatically

    last_reward = np.mean(rewards)
    # Note: newer Optuna versions require a step argument for trial.report()
    trial.report(-1 * last_reward)

    return -1 * last_reward


if __name__ == '__main__':
    study = optuna.create_study(study_name='cartpol_optuna', storage='sqlite:///params.db', load_if_exists=True)
    study.optimize(optimize_agent, n_trials=1000, n_jobs=1)
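If it helps, a hypothetical follow-up sketch for reading the best trial back from the same params.db storage and retraining with it (study.best_params holds the raw sampled values, so the integer casts are repeated here):

import gym
import optuna
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpLnLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

# Reattach to the existing study (load_if_exists avoids creating a new one)
study = optuna.create_study(study_name='cartpol_optuna', storage='sqlite:///params.db',
                            load_if_exists=True)
print(study.best_value)   # best objective so far (the negated mean reward)
print(study.best_params)  # sampled hyperparameters of the best trial

best_params = dict(study.best_params)
best_params['n_steps'] = int(best_params['n_steps'])
best_params['noptepochs'] = int(best_params['noptepochs'])

env = SubprocVecEnv([lambda: gym.make('CartPole-v1') for _ in range(4)])
model = PPO2(MlpLnLstmPolicy, env, verbose=1, nminibatches=1, **best_params)
model.learn(100000)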


jarlva commented on May 21, 2024

Thanks for the script, Eunomia! That has been very helpful!

Is there a place to define and tune the TensorFlow model layers/tensors? For example, in Keras the model is defined by:

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

In TensorFlow it is a bit less simple.
Optimizing the model (tensors/layers and activation) for a specific problem can yield remarkable results and speed-ups. To that end, Google came up with AdaNet (AutoML), a way to automatically find/tune the best TensorFlow model (not sure how to apply it in RL). Is there a way to tune the model's tensors/layers/activation via Optuna (or maybe AdaNet), perhaps by modifying the script above?


araffin commented on May 21, 2024

@jheffez

The code you are looking for (and that @eunomiadev wrote) is here.

Is there a place to define and tune the tensorflow model layers/tensors?

Please read the documentation for that (especially the "custom policy" part).
A quick example:

model = PPO2('MlpPolicy', 'CartPole-v1', policy_kwargs=dict(net_arch=[256, 256]))

with optuna:

def optimize_ppo2(trial):
    """ Learning hyperparamters we want to optimise"""
    net_arch = trial.suggest_categorical('net_arch', ['small', 'medium'])
    net_arch = {
        'small': [dict(pi=[64, 64], vf=[64, 64])],
        'medium': [dict(pi=[256, 256], vf=[256, 256])],
    }[net_arch]
    return {
        'policy_kwargs': dict(net_arch=net_arch),
    }
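To also tune the activation function, policy_kwargs can additionally carry act_fun (an argument of the built-in feed-forward/LSTM policies); a possible extension of the snippet above, with tanh vs. relu as example choices:

import tensorflow as tf

def optimize_ppo2(trial):
    """ Sample the network architecture and the activation function """
    net_arch = trial.suggest_categorical('net_arch', ['small', 'medium'])
    act_fun = trial.suggest_categorical('act_fun', ['tanh', 'relu'])
    return {
        'policy_kwargs': dict(
            net_arch={
                'small': [dict(pi=[64, 64], vf=[64, 64])],
                'medium': [dict(pi=[256, 256], vf=[256, 256])],
            }[net_arch],
            act_fun={'tanh': tf.nn.tanh, 'relu': tf.nn.relu}[act_fun],
        ),
    }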

I also recommend reading the Optuna documentation; you should find answers to your questions there ;)


jarlva commented on May 21, 2024

Thanks again!
I'll check it out.
