
dhdev0 / stochastic-muzero

49 stars · 5 watchers · 9 forks · 12.87 MB

PyTorch implementation of Stochastic MuZero for Gym environments. The algorithm supports a wide range of action and observation spaces, both discrete and continuous.

License: GNU General Public License v3.0

Languages: Python 93.35% · Jupyter Notebook 6.31% · Dockerfile 0.35%
Topics: arxiv-papers, machine-learning, offline-reinforcement-learning, online-reinforcement-learning, muzero-stochastic, stochastic-muzero, deep-reinforcement-learning, gym-environments, lstm, monte-carlo-tree-search

stochastic-muzero's People

Contributors

dhdev0

Stargazers: 49

Watchers: 5

stochastic-muzero's Issues

loss is NaN from the beginning with the default config

Hello, with the default configuration of this code, when CartPole is run the loss is NaN from the beginning. It looks like a gradient problem somewhere in the code. Is this a complete copy of the code, or are there bugs that haven't been fixed yet?
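
For reference, one way to narrow down where the NaN first appears is PyTorch's anomaly detection together with gradient clipping. This is a minimal sketch assuming a standard PyTorch training loop; model, optimizer, compute_loss and batches are placeholders, not names from this repository.

import torch

# Report the backward op that produced the first NaN/Inf instead of silently propagating it.
torch.autograd.set_detect_anomaly(True)

for batch in batches:                              # placeholder iterable of training batches
    loss = compute_loss(model, batch)              # placeholder loss computation
    if not torch.isfinite(loss):
        print("non-finite loss detected:", loss.item())
        break
    optimizer.zero_grad()
    loss.backward()
    # Clipping the gradient norm often prevents early blow-ups in the value/reward heads.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()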

Default experiments are not converging

Hi Daniel,

I'm running experiment_450_config.json without modifications, using this command:

python muzero_cli.py train report config/experiment_450_config.json

And I'm getting this report:

[image: model_450_data_of_the_average_reward]

Is it mandatory to run python muzero_cli.py human_buffer config/experiment_450_config.json before training?

training loss: nan

Hi Daniel,

I'm trying to run a custom environment (it works with MuZero) with your Stochastic MuZero version.

After creating a config file (just changing the env name in experiment_450_config.json), I'm getting 'training loss: nan' while running.

Do you have any idea why?

reproducing the result on 2048

Hello Daniel, thanks a lot for your contribution. I am trying to reproduce the reported result on the 2048 game, but I cannot find the environment implementation in the repo. I would appreciate it if you could recommend an open-source 2048 env to use with this repo.
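
In case it helps while waiting for a recommendation, here is a minimal sketch of a Gym-compatible 2048 environment written against the gymnasium API. It is not taken from this repository; the action mapping, flat observation encoding, and reward (sum of merged tile values) are assumptions you would likely want to adapt.

import numpy as np
import gymnasium as gym
from gymnasium import spaces

class Env2048(gym.Env):
    # Hypothetical 4x4 2048 environment, not part of this repo.
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(4)  # 0: left, 1: up, 2: right, 3: down
        self.observation_space = spaces.Box(low=0, high=2**16, shape=(16,), dtype=np.float32)
        self.board = np.zeros((4, 4), dtype=np.int64)

    def _spawn(self):
        # Place a 2 (90%) or a 4 (10%) on a random empty cell.
        empty = np.argwhere(self.board == 0)
        r, c = empty[self.np_random.integers(len(empty))]
        self.board[r, c] = 2 if self.np_random.random() < 0.9 else 4

    def _slide_left(self, board):
        # Compress and merge every row to the left; returns the new board and the merge reward.
        new, reward = np.zeros_like(board), 0
        for i, row in enumerate(board):
            tiles, merged, j = row[row != 0].tolist(), [], 0
            while j < len(tiles):
                if j + 1 < len(tiles) and tiles[j] == tiles[j + 1]:
                    merged.append(tiles[j] * 2)
                    reward += tiles[j] * 2
                    j += 2
                else:
                    merged.append(tiles[j])
                    j += 1
            new[i, :len(merged)] = merged
        return new, reward

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.board[:] = 0
        self._spawn()
        self._spawn()
        return self.board.flatten().astype(np.float32), {}

    def step(self, action):
        # Rotate so the chosen move becomes a left slide, slide, then rotate back.
        slid, reward = self._slide_left(np.rot90(self.board, action))
        new_board = np.rot90(slid, -action)
        if not np.array_equal(new_board, self.board):
            self.board = new_board
            self._spawn()
        # The episode ends when no action changes the board.
        terminated = all(
            np.array_equal(np.rot90(self._slide_left(np.rot90(self.board, a))[0], -a), self.board)
            for a in range(4)
        )
        return self.board.flatten().astype(np.float32), float(reward), terminated, False, {}

With this sketch, env.observation_space.shape[0] is 16 and env.action_space.n is 4, so it should plug into the same mlp_model-style setup shown in the other issues on this page.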

Problem adapting Stoch-muzero to custom gymnasium environment

I'm trying to adapt the tutorial code to my environment, which has the following dimensions:

env.observation_space.shape[0] #Continuous
50
env.action_space.n #Discrete
3

I'm getting an error in the following code:

# Define your custom trading environment using the 'TradingEnv' class with your DataFrame
env = TradingEnv(train_df)

# Set the random seed for reproducibility
seed = 0
np.random.seed(seed)
torch.manual_seed(seed)

# Initialize the MuZero model with appropriate parameters for your environment
muzero = Muzero(
    model_structure='mlp_model',
    observation_space_dimensions=env.observation_space.shape[0],  # Use the length of the 1D observation space
    action_space_dimensions=env.action_space.n,
    state_space_dimensions=61,
    hidden_layer_dimensions=126,
    number_of_hidden_layer=4,
    k_hypothetical_steps=10,
    learning_rate=0.01,
    optimizer="adam",
    lr_scheduler="cosineannealinglr",
    loss_type="general",
    num_of_epoch=1000,
    device="cuda",
    type_format=torch.float32,
    load=False,
    use_amp=False,
    bin_method="uniform_bin",
    bin_decomposition_number=10,
    priority_scale=0.5,
    rescale_value_loss=1
)

# Initialize the 'demonstration_buffer', 'replay_buffer', and 'mcts' as per your requirements

# Create the 'gameplay' instance for your custom environment
gameplay = Game(
    limit_of_game_play=500,
    gym_env=env,
    discount=mcts.discount,
    observation_dimension=env.observation_space.shape[0],  # Use the length of the 1D observation space
    action_dimension=env.action_space.n,
    rgb_observation=False,  # Change to True if you have RGB observations
    action_map={i: i for i in range(env.action_space.n)},  # Modify as needed
    priority_scale=muzero.priority_scale
)


# Train your model using the 'learning_cycle' function with your adapted environment
epoch_pr, loss, reward, learning_config = learning_cycle(
    number_of_iteration=1000,
    number_of_self_play_before_training=10,
    number_of_training_before_self_play=1,
    model_tag_number=450,
    temperature_type="static_temperature",
    verbose=True,
    number_of_worker_selfplay=0,
    muzero_model=muzero,
    gameplay=gameplay,
    monte_carlo_tree_search=mcts,
    replay_buffer=replay_buffer
)
TypeError                                 Traceback (most recent call last)
<ipython-input-22-29674da447ad> in <cell line: 10>()
      8 
      9 # Initialize the MuZero model with appropriate parameters for your environment
---> 10 muzero = Muzero(
     11     model_structure='mlp_model',
     12     observation_space_dimensions=env.observation_space.shape[0],  # Use the length of the 1D observation space

3 frames
/content/Stochastic-muzero/muzero_model.py in obs_space(self, obs)
    492             return int(sum(checker(i) for i in obs))
    493         else:
--> 494             return int(checker(obs))
    495 
    496     def one_hot_encode(self, action, counter_part):

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

Feel free to run the entire reproducible example in this Google Colab notebook.

Stochastic MuZero for Simultaneous-Move Games

Hello,

I have been thinking of training different artificial intelligence algorithms for use in Pokémon Showdown, which is a game with simultaneous moves and imperfect information. The package I would use - poke-env - can expose an OpenAI Gym wrapper, which is what makes me think it should be possible to use it. The agent would use self-play on a local Showdown server to train and then ideally be evaluated by challenging opponents on the main Showdown server.

I wanted to ask some questions before I started experimenting. First, Pokémon is a simultaneous-move game, and I understand this is a departure from the sequential-move games, such as Go, that the original AlphaZero worked on. Does Stochastic MuZero in its current state support training on simultaneous-move games through self-play?

Second, this would be a new environment used through Gym, so I would hope it is simple to add the environment to this package. What advice would you give for adding the environment and/or tuning the hyperparameters? Thank you in advance.

Dependency Bug

I cannot build the Docker container; it fails with this error:
278.6 ERROR: Ignored the following yanked versions: 0.1.63, 0.4.0, 0.4.15
278.6 ERROR: Ignored the following versions that require a different python version: 0.7 Requires-Python >=3.6, <3.7; 0.8 Requires-Python >=3.6, <3.7; 8.19.0 Requires-Python >=3.10
278.6 ERROR: Could not find a version that satisfies the requirement jaxlib==0.3.24; extra == "all" (from gymnasium[all]) (from versions: 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.6, 0.4.7, 0.4.9, 0.4.10, 0.4.11, 0.4.12, 0.4.13, 0.4.14, 0.4.16, 0.4.17, 0.4.18, 0.4.19, 0.4.20, 0.4.21, 0.4.22, 0.4.23)
278.6 ERROR: No matching distribution found for jaxlib==0.3.24; extra == "all"
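
One possible workaround, assuming the jax-based environments pulled in by gymnasium[all] are not actually needed for this project (this is an assumption, not something stated in the repo), is to install gymnasium without the all extra, or to relax the jaxlib pin to a release that still exists on PyPI (the error above lists 0.4.1 and later):

pip install gymnasium
pip install "jaxlib>=0.4.1"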

Questions about is_chance label assignment

When I run monte_carlo_tree_search.py, I find that decision nodes and chance nodes do not alternate within a search path.

Normally, we would expect a search path to look like this: [decision node, chance node, decision node, chance node, ...].
However, after running the code, the result is: [decision node, decision node, chance node, chance node, ...].

I believe the issue arises within the create_new_node_in_the_chosen_node_with_action_and_policy function. We should assign the value of is_child_chance to self.node.is_chance instead of self.node.children[i].is_chance.
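
For clarity, the proposed change would look roughly like this; the surrounding function body is not reproduced here, and only the assignment targets come from the names mentioned above:

# inside create_new_node_in_the_chosen_node_with_action_and_policy (paraphrased)
# current: self.node.children[i].is_chance = is_child_chance
# proposed: flag the expanded node itself instead, so decision and chance
# nodes alternate along the search path as [decision, chance, decision, ...]
self.node.is_chance = is_child_chance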
