
dhdev0 / stochastic-muzero

49 stars · 5 watchers · 9 forks · 12.87 MB

PyTorch implementation of Stochastic MuZero for Gym environments. The algorithm supports a wide range of action and observation spaces, both discrete and continuous.

License: GNU General Public License v3.0

Languages: Python 93.35% · Jupyter Notebook 6.31% · Dockerfile 0.35%
Topics: arxiv-papers, machine-learning, offline-reinforcement-learning, online-reinforcement-learning, muzero-stochastic, stochastic-muzero, deep-reinforcement-learning, gym-environments, lstm, monte-carlo-tree-search

stochastic-muzero's People

Contributors

dhdev0

Stargazers: 49

Watchers: 5

stochastic-muzero's Issues

loss is NaN from the beginning with the default config

Hello, with the default configuration of this code, when CartPole is run the loss is NaN from the beginning. It looks like a gradient problem somewhere in the code. Is this a complete copy of the code, or are there bugs that haven't been fixed yet?
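
For reference, one way to narrow down where the NaN first appears is PyTorch's anomaly detection together with gradient clipping. This is a minimal sketch assuming a standard PyTorch training loop; model, optimizer, compute_loss and batches are placeholders, not names from this repository.

import torch

# Report the backward op that produced the first NaN/Inf instead of silently propagating it.
torch.autograd.set_detect_anomaly(True)

for batch in batches:                              # placeholder iterable of training batches
    loss = compute_loss(model, batch)              # placeholder loss computation
    if not torch.isfinite(loss):
        print("non-finite loss detected:", loss.item())
        break
    optimizer.zero_grad()
    loss.backward()
    # Clipping the gradient norm often prevents early blow-ups in the value/reward heads.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()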

Default experiments are not converging

Hi Daniel,

I'm running experiment_450_config.json without modifications, using this command:

python muzero_cli.py train report config/experiment_450_config.json

And I'm getting this report:

[image: model_450_data_of_the_average_reward]

Is it mandatory to run python muzero_cli.py human_buffer config/experiment_450_config.json before training?

training loss: nan

Hi Daniel,

I'm trying to run a custom environment (it works with MuZero) with your Stochastic MuZero version.

After creating a config file (just changing the env name in experiment_450_config.json), I'm getting 'training loss: nan' while running.

Do you have any idea why?

reproducing the result on 2048

Hello Daniel, thanks a lot for your contribution. I am trying to reproduce the reported result on the 2048 game, but I cannot find the environment implementation in the repo. I would appreciate it if you could recommend an open-source 2048 env to use with this repo.
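
In case it helps while waiting for a recommendation, here is a minimal sketch of a Gym-compatible 2048 environment written against the gymnasium API. It is not taken from this repository; the action mapping, flat observation encoding, and reward (sum of merged tile values) are assumptions you would likely want to adapt.

import numpy as np
import gymnasium as gym
from gymnasium import spaces

class Env2048(gym.Env):
    # Hypothetical 4x4 2048 environment, not part of this repo.
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(4)  # 0: left, 1: up, 2: right, 3: down
        self.observation_space = spaces.Box(low=0, high=2**16, shape=(16,), dtype=np.float32)
        self.board = np.zeros((4, 4), dtype=np.int64)

    def _spawn(self):
        # Place a 2 (90%) or a 4 (10%) on a random empty cell.
        empty = np.argwhere(self.board == 0)
        r, c = empty[self.np_random.integers(len(empty))]
        self.board[r, c] = 2 if self.np_random.random() < 0.9 else 4

    def _slide_left(self, board):
        # Compress and merge every row to the left; returns the new board and the merge reward.
        new, reward = np.zeros_like(board), 0
        for i, row in enumerate(board):
            tiles, merged, j = row[row != 0].tolist(), [], 0
            while j < len(tiles):
                if j + 1 < len(tiles) and tiles[j] == tiles[j + 1]:
                    merged.append(tiles[j] * 2)
                    reward += tiles[j] * 2
                    j += 2
                else:
                    merged.append(tiles[j])
                    j += 1
            new[i, :len(merged)] = merged
        return new, reward

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.board[:] = 0
        self._spawn()
        self._spawn()
        return self.board.flatten().astype(np.float32), {}

    def step(self, action):
        # Rotate so the chosen move becomes a left slide, slide, then rotate back.
        slid, reward = self._slide_left(np.rot90(self.board, action))
        new_board = np.rot90(slid, -action)
        if not np.array_equal(new_board, self.board):
            self.board = new_board
            self._spawn()
        # The episode ends when no action changes the board.
        terminated = all(
            np.array_equal(np.rot90(self._slide_left(np.rot90(self.board, a))[0], -a), self.board)
            for a in range(4)
        )
        return self.board.flatten().astype(np.float32), float(reward), terminated, False, {}

With this sketch, env.observation_space.shape[0] is 16 and env.action_space.n is 4, so it should plug into the same mlp_model-style setup shown in the other issues on this page.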

Problem adapting Stoch-muzero to custom gymnasium environment

I'm trying to adapt the tutorial code to my environment, which has the following dimensions:

env.observation_space.shape[0] #Continuous
50
env.action_space.n #Discrete
3

I'm getting an error in the following code:

# Define your custom trading environment using the 'TradingEnv' class with your DataFrame
env = TradingEnv(train_df)

# Set the random seed for reproducibility
seed = 0
np.random.seed(seed)
torch.manual_seed(seed)

# Initialize the MuZero model with appropriate parameters for your environment
muzero = Muzero(
    model_structure='mlp_model',
    observation_space_dimensions=env.observation_space.shape[0],  # Use the length of the 1D observation space
    action_space_dimensions=env.action_space.n,
    state_space_dimensions=61,
    hidden_layer_dimensions=126,
    number_of_hidden_layer=4,
    k_hypothetical_steps=10,
    learning_rate=0.01,
    optimizer="adam",
    lr_scheduler="cosineannealinglr",
    loss_type="general",
    num_of_epoch=1000,
    device="cuda",
    type_format=torch.float32,
    load=False,
    use_amp=False,
    bin_method="uniform_bin",
    bin_decomposition_number=10,
    priority_scale=0.5,
    rescale_value_loss=1
)

# Initialize the 'demonstration_buffer', 'replay_buffer', and 'mcts' as per your requirements

# Create the 'gameplay' instance for your custom environment
gameplay = Game(
    limit_of_game_play=500,
    gym_env=env,
    discount=mcts.discount,
    observation_dimension=env.observation_space.shape[0],  # Use the length of the 1D observation space
    action_dimension=env.action_space.n,
    rgb_observation=False,  # Change to True if you have RGB observations
    action_map={i: i for i in range(env.action_space.n)},  # Modify as needed
    priority_scale=muzero.priority_scale
)


# Train your model using the 'learning_cycle' function with your adapted environment
epoch_pr, loss, reward, learning_config = learning_cycle(
    number_of_iteration=1000,
    number_of_self_play_before_training=10,
    number_of_training_before_self_play=1,
    model_tag_number=450,
    temperature_type="static_temperature",
    verbose=True,
    number_of_worker_selfplay=0,
    muzero_model=muzero,
    gameplay=gameplay,
    monte_carlo_tree_search=mcts,
    replay_buffer=replay_buffer
)
TypeError                                 Traceback (most recent call last)
<ipython-input-22-29674da447ad> in <cell line: 10>()
      8 
      9 # Initialize the MuZero model with appropriate parameters for your environment
---> 10 muzero = Muzero(
     11     model_structure='mlp_model',
     12     observation_space_dimensions=env.observation_space.shape[0],  # Use the length of the 1D observation space

3 frames
/content/Stochastic-muzero/muzero_model.py in obs_space(self, obs)
    492             return int(sum(checker(i) for i in obs))
    493         else:
--> 494             return int(checker(obs))
    495 
    496     def one_hot_encode(self, action, counter_part):

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

Feel free to run the entire reproducible example in this Google Colab notebook.

Stochastic MuZero for Simultaneous-Move Games

Hello,

I have been thinking of training different artificial intelligence algorithms for use in Pokémon Showdown, which is a game with simultaneous moves and imperfect information. The package I would use - poke-env - can expose an OpenAI Gym wrapper, which is what makes me think it should be possible to use it. The agent would use self-play on a local Showdown server to train and then ideally be evaluated by challenging opponents on the main Showdown server.

I wanted to ask some questions before I started experimenting. First, Pokémon is a simultaneous-move game, and I understand this is a departure from the sequential-move games, such as Go, that the original AlphaZero worked on. Does Stochastic MuZero in its current state support training on simultaneous-move games through self-play?

Second, this would be a new environment used through Gym, so I would hope it is simple to add the environment to this package. What advice would you give for adding the environment and/or tuning the hyperparameters? Thank you in advance.

Dependency Bug

I cannot build the Docker container; it fails with this error:
278.6 ERROR: Ignored the following yanked versions: 0.1.63, 0.4.0, 0.4.15
278.6 ERROR: Ignored the following versions that require a different python version: 0.7 Requires-Python >=3.6, <3.7; 0.8 Requires-Python >=3.6, <3.7; 8.19.0 Requires-Python >=3.10
278.6 ERROR: Could not find a version that satisfies the requirement jaxlib==0.3.24; extra == "all" (from gymnasium[all]) (from versions: 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.6, 0.4.7, 0.4.9, 0.4.10, 0.4.11, 0.4.12, 0.4.13, 0.4.14, 0.4.16, 0.4.17, 0.4.18, 0.4.19, 0.4.20, 0.4.21, 0.4.22, 0.4.23)
278.6 ERROR: No matching distribution found for jaxlib==0.3.24; extra == "all"
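
One possible workaround, assuming the jax-based environments pulled in by gymnasium[all] are not actually needed for this project (this is an assumption, not something stated in the repo), is to install gymnasium without the all extra, or to relax the jaxlib pin to a release that still exists on PyPI (the error above lists 0.4.1 and later):

pip install gymnasium
pip install "jaxlib>=0.4.1"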

Questions about is_chance label assignment

When I run monte_carlo_tree_search.py, I find that decision nodes and chance nodes do not alternate within a search path.

Normally, we would expect a search path to look like this: [decision node, chance node, decision node, chance node, ...].
However, after running the code, the result is: [decision node, decision node, chance node, chance node, ...].

I believe the issue arises within the create_new_node_in_the_chosen_node_with_action_and_policy function. We should assign the value of is_child_chance to self.node.is_chance instead of self.node.children[i].is_chance.
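
For clarity, the proposed change would look roughly like this; the surrounding function body is not reproduced here, and only the assignment targets come from the names mentioned above:

# inside create_new_node_in_the_chosen_node_with_action_and_policy (paraphrased)
# current: self.node.children[i].is_chance = is_child_chance
# proposed: flag the expanded node itself instead, so decision and chance
# nodes alternate along the search path as [decision, chance, decision, ...]
self.node.is_chance = is_child_chance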
