
rl-starter-files's Introduction

RL Starter Files

RL starter files to immediately train, visualize and evaluate an agent without writing a single line of code.

These files are suited for minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms.

Features

  • Script to train, including:
    • Logging to txt, CSV and Tensorboard
    • Save model
    • Stop and restart training
    • Use A2C or PPO algorithms
  • Script to visualize, including:
    • Act by sampling or argmax
    • Save as GIF
  • Script to evaluate, including:
    • Act by sampling or argmax
    • List the worst-performing episodes

Installation

  1. Clone this repository.

  2. Install minigrid environments and torch-ac RL algorithms:

pip3 install -r requirements.txt

Note: If you want to modify the torch-ac algorithms, you will need to install a cloned version instead, i.e.:

git clone https://github.com/lcswillems/torch-ac.git
cd torch-ac
pip3 install -e .

Example of use

Train, visualize and evaluate an agent on the MiniGrid-DoorKey-5x5-v0 environment:

  1. Train the agent on the MiniGrid-DoorKey-5x5-v0 environment with PPO algorithm:
python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

  2. Visualize agent's behavior:
python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

  3. Evaluate agent's performance:
python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

Note: More details on the commands are given below.

Other examples

Handle textual instructions

In the GoToDoor environment, the agent receives an image along with a textual instruction. To handle the latter, add --text to the command:

python3 -m scripts.train --algo ppo --env MiniGrid-GoToDoor-5x5-v0 --model GoToDoor --text --save-interval 10 --frames 1000000

Add memory

In the RedBlueDoors environment, the agent has to open the red door and then the blue one. To solve it efficiently, the agent has to remember that it has already opened the red door. To add memory to the agent, add --recurrence X to the command:

python3 -m scripts.train --algo ppo --env MiniGrid-RedBlueDoors-6x6-v0 --model RedBlueDoors --recurrence 4 --save-interval 10 --frames 1000000

Files

This package contains:

  • scripts to:
    • train an agent
      in scripts/train.py (more details below)
    • visualize an agent's behavior
      in scripts/visualize.py (more details below)
    • evaluate an agent's performance
      in scripts/evaluate.py (more details below)
  • a default agent's model
    in model.py (more details below)
  • utility classes and functions used by the scripts
    in utils

These files are suited for minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms by modifying:

  • model.py
  • utils/format.py

scripts/train.py

An example of use:

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

The script loads the model in storage/DoorKey or creates it if it doesn't exist, then trains it with the PPO algorithm on the MiniGrid DoorKey environment, and saves it every 10 updates in storage/DoorKey. It stops after 80 000 frames.

Note: You can define a different storage location in the environment variable PROJECT_STORAGE.
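For reference, a minimal sketch of such a lookup (this is an illustration, not the repo's actual utils code; only the variable name PROJECT_STORAGE comes from the note above):

# Hedged sketch: resolve the storage directory from PROJECT_STORAGE,
# falling back to the default "storage" folder used in the examples above.
import os

def resolve_storage_dir() -> str:
    return os.environ.get("PROJECT_STORAGE", "storage")

print(resolve_storage_dir())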

More generally, the script has 2 required arguments:

  • --algo ALGO: name of the RL algorithm used to train
  • --env ENV: name of the environment to train on

and a bunch of optional arguments among which:

  • --recurrence N: the gradient will be backpropagated over N timesteps. By default, N = 1. If N > 1, an LSTM is added to the model to provide memory.
  • --text: a GRU is added to the model to handle text input.
  • ... (see more using --help)

During training, logs are printed in your terminal (and saved in text and CSV format):

Note: U gives the update number, F the total number of frames, FPS the number of frames per second, D the total duration, rR:μσmM the mean, std, min and max reshaped return per episode, F:μσmM the mean, std, min and max number of frames per episode, H the entropy, V the value, pL the policy loss, vL the value loss and ∇ the gradient norm.
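For instance, the CSV log can be inspected with a few lines of Python (the file name log.csv under the model directory is an assumption; adjust it to whatever file the training script actually wrote):

# Print the header and the latest row of the training log for the DoorKey run above.
import csv

with open("storage/DoorKey/log.csv") as f:
    rows = list(csv.reader(f))

print(rows[0])   # column names, corresponding to the quantities listed in the note above
print(rows[-1])  # most recent logged update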

During training, logs are also plotted in Tensorboard:

scripts/visualize.py

An example of use:

python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

In this use case, the script displays how the model in storage/DoorKey behaves on the MiniGrid DoorKey environment.

More generally, the script has 2 required arguments:

  • --env ENV: name of the environment to act on.
  • --model MODEL: name of the trained model.

and a bunch of optional arguments among which:

  • --argmax: select the action with highest probability
  • ... (see more using --help)

scripts/evaluate.py

An example of use:

python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

In this use case, the script prints in the terminal the performance over 100 episodes of the model in storage/DoorKey.

More generally, the script has 2 required arguments:

  • --env ENV: name of the environment to act on.
  • --model MODEL: name of the trained model.

and a bunch of optional arguments among which:

  • --episodes N: number of episodes of evaluation. By default, N = 100.
  • ... (see more using --help)

model.py

The default model is described by the following schema:

By default, the memory part (in red) and the language part (in blue) are disabled. They can be enabled by setting the use_memory and use_text parameters of the model constructor to True.

This model can be easily adapted to your needs.
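For instance, here is a minimal sketch of building the model with the memory part enabled. The helper names (utils.make_env, utils.get_obss_preprocessor) follow what is quoted elsewhere on this page; treat the exact signatures as assumptions rather than guaranteed API.

import utils
from model import ACModel

# Build one environment and the observation preprocessor, then enable memory.
env = utils.make_env("MiniGrid-RedBlueDoors-6x6-v0", seed=1)
obs_space, preprocess_obss = utils.get_obss_preprocessor(env.observation_space)
acmodel = ACModel(obs_space, env.action_space, use_memory=True, use_text=False)
print(acmodel)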


rl-starter-files's Issues

Question: Did anyone solve MiniGrid-DoorKey-8x8-v0?

Hi all,

Did anyone solve the MiniGrid-DoorKey-8x8-v0 environment with the PPO algorithm and if so, with which hyperparameters, environment steps and for how many frames did you run this?

Thanks! :)

Kind regards,

Erik

Training broken on MiniGrid envs?

With the latest clone of this repo, installing from requirements.txt:

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --save-interval 10 --frames 8000000

Namespace(algo='ppo', batch_size=256, clip_eps=0.2, discount=0.99, entropy_coef=0.01, env='MiniGrid-DoorKey-5x5-v0', epochs=4, frames=8000000, frames_per_proc=None, gae_lambda=0.95, log_interval=1, lr=0.0007, max_grad_norm=0.5, mem=False, model=None, optim_alpha=0.99, optim_eps=1e-05, procs=16, recurrence=1, save_interval=10, seed=1, tb=False, text=False, value_loss_coef=0.5)

Traceback (most recent call last):
  File "/home/maximecb/Desktop/rl-starter-files/scripts/train.py", line 117, in <module>
    acmodel = utils.load_model(model_dir)
  File "/home/maximecb/Desktop/rl-starter-files/utils/save.py", line 15, in load_model
    model = torch.load(path)
  File "/home/maximecb/.local/lib/python3.6/site-packages/torch/serialization.py", line 366, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'storage/MiniGrid-DoorKey-5x5-v0_ppo_seed1_19-04-30-15-38-46/model.pt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/maximecb/Desktop/rl-starter-files/scripts/train.py", line 120, in <module>
    acmodel = ACModel(obs_space, envs[0].action_space, args.mem, args.text)
  File "/home/maximecb/Desktop/rl-starter-files/model.py", line 58, in __init__
    nn.Linear(self.embedding_size, 64),
  File "/home/maximecb/.local/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 56, in __init__
    self.reset_parameters()
  File "/home/maximecb/.local/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 59, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "/home/maximecb/.local/lib/python3.6/site-packages/torch/nn/init.py", line 290, in kaiming_uniform_
    std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero

Not learning in large 16x16 gridworld

I tried it on a 16x16 gridworld, and it seems the agent is not learning (it can't reach the goal), even after I changed the hidden dimension of the network to 128. Any idea how to make this work?

How to use an RGB image as input instead of the original observation?

Hi there,

rl-starter-files is really a great project that makes it much easier for people to analyze their RL algorithms quickly; thanks very much for this great contribution!

I currently want to learn the state information from the RGB image instead of the original observation in the gym-minigrid environment. I noticed that the last section of the rl-starter-files readme seems to suggest the image is used as input, but when I read the code of this project it seems not to be. I'm not sure about this; could you please give me some help:

  1. How to use the image as input in rl-starter-files?

  2. Do you have some code examples about that?

Thanks very much, wish u all the best~


Visualization script problem when dealing with a CPU-only machine

Salut Lucas!

I found this error while trying the visualization script using only the CPU.

Traceback (most recent call last):
  File "/envs/pytorch/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/envs/pytorch/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "rl-starter-files/scripts/visualize.py", line 55, in <module>
    device=device, argmax=args.argmax, use_memory=args.memory, use_text=args.text)
  File "rl-starter-files/utils/agent.py", line 25, in __init__
    self.acmodel.load_state_dict(utils.get_model_state(model_dir))
  File "rl-starter-files/utils/storage.py", line 46, in get_model_state
    return get_status(model_dir)["model_state"]
  File "rl-starter-files/utils/storage.py", line 32, in get_status
    return torch.load(path)
  File "/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 592, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 851, in _load
    result = unpickler.load()
  File "/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 843, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 832, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
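For reference, the workaround named in the error message looks roughly like this (the checkpoint path is illustrative, not necessarily the repo's actual file layout):

# Force tensors that were saved on a CUDA device to be loaded onto the CPU.
import torch

status = torch.load("storage/DoorKey/status.pt", map_location=torch.device("cpu"))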

Tensorboard not working

I've installed all the packages in a miniconda environment called "pytorch" and ran everything. It all worked until trying tensorboard:

I installed tensorboard with:
pip3 install tensorboardX
and then added --tb at the end of the training command, but got a segmentation fault:

(pytorch) fabian_57@Gregor:~/python_libraries/pytorch-a2c-ppo$ python3 -m train --algo ppo --env MiniGrid-DoorKey-6x6-v0 --no-instr --no-mem --model DoorKey-6x6-ppo --save-interval 10 --tb
Segmentation fault (core dumped)

Also, when trying to view some results I got:

(pytorch) fabian_57@Gregor:~/python_libraries/pytorch-a2c-ppo$ tensorboard --logdir storage
tensorboard: command not found

even though pip3 list shows tensorboardX as installed!

Do I really have to install tensorflow and then tensorboardX to make it work?

Data mixed from different parallel environments

https://github.com/lcswillems/torch-rl/blob/c33bf422aad70be89498fc712a7bed56aa2512aa/torch_rl/torch_rl/algos/base.py#L126

I think the data from different environments is getting mixed here.
The preprocessed_obs seems to receive observations from all the parallel environments, and these then get forwarded to the model together.
My understanding was that only observations from a specific environment should go to the model, and then, based on the model's prediction, you would select an action for that specific environment. But it seems all the observations from the parallel environments are forwarded to the model at once.
Please correct me if I am wrong.

Support for continuous action spaces

Hi Lucas,

In the readme you note that your implementation supports continuous action spaces, but after reading your code I haven't encountered any special handling of a continuous action space. Does your code currently support continuous action spaces? If not, can you please add this important feature?

Thanks,
Ori

CartPole-v1 does not run

python3 -m scripts.train --algo a2c --env CartPole-v1

Namespace(algo='a2c', batch_size=256, clip_eps=0.2, discount=0.99, entropy_coef=0.01, env='CartPole-v1', epochs=4, frames=10000000, frames_per_proc=None, gae_lambda=0.95, log_interval=1, lr=0.0007, max_grad_norm=0.5, model=None, no_instr=False, no_mem=False, optim_alpha=0.99, optim_eps=1e-05, procs=16, recurrence=1, save_interval=0, seed=1, tb=False, value_loss_coef=0.5)

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/simon/pytorch-a2c-ppo-git/scripts/train.py", line 106, in 
    preprocess_obss = utils.ObssPreprocessor(model_dir, envs[0].observation_space)
  File "/home/simon/pytorch-a2c-ppo-git/utils/format.py", line 40, in __init__
    "image": obs_space.spaces['image'].shape,
AttributeError: 'Box' object has no attribute 'spaces'

handling the end of the episode

Here is a very tricky RL detail which is often overlooked. I think it can be important when your episodes are not very long and your discount is large.

When the advantage is estimated for the last step of the episode, the current code computes it as A_t = r_t - V_t. One can argue that for modelling infinite episodes a more appropriate estimate is A_t = r_t + \gamma V_{t+1} - V_t. It is used broadly, e.g. in the PPO paper.

There is nothing too wrong with A_t = r_t - V_t if we optimize for finite episodes, or if we have a genuine episode termination (which is the case in Baby AI), so I guess I will just leave this here as food for thought.
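Written with an explicit termination flag (standard notation, not code from this repository), the two estimates are the two cases of

A_t = r_t + \gamma (1 - d_t) V_{t+1} - V_t,

where d_t = 1 at a genuine episode termination (giving A_t = r_t - V_t) and d_t = 0 otherwise (giving A_t = r_t + \gamma V_{t+1} - V_t).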

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_mm

Whenever I try to use the visualize script on memory dependent environments and agents, I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/wen-chung/rl-starter-files/scripts/visualize.py", line 75, in <module>
    action = agent.get_action(obs)
  File "/home/wen-chung/rl-starter-files/utils/agent.py", line 48, in get_action
    return self.get_actions([obs])[0]
  File "/home/wen-chung/rl-starter-files/utils/agent.py", line 36, in get_actions
    dist, _, self.memories = self.acmodel(preprocessed_obss, self.memories)
  File "/home/wen-chung/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wen-chung/rl-starter-files/model.py", line 88, in forward
    hidden = self.memory_rnn(x, hidden)
  File "/home/wen-chung/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wen-chung/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 955, in forward
    self.bias_ih, self.bias_hh,
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_mm

Here is the command I'm running to reproduce the above error:
python3 -m scripts.visualize --env MiniGrid-Empty-RandomGoal-16x16-v0 --model A2CMiniGrid-Empty-RandomGoal-16x16-v0-Int1-Fr10M-LSTM4-Test --memory

How do I resolve this issue?

Thanks,
Andy Cheng

Can't run example of use

C:\Users\matsa\torch-ac>python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000
C:\Users\matsa\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\python.exe: Error while finding module specification for 'scripts.train' (ModuleNotFoundError: No module named 'scripts')

ZeroDivisionError when running with FullyObsWrapper

Hi,

First of all, thank you for this great project!

I am trying to work with a fully observable environment and I am doing so by making the following changes to env.py:

def make_env(env_key, seed=None): env = gym.make(env_key) env = gym_minigrid.wrappers.FullyObsWrapper(env) env.reset() env.seed(seed) return env

When I run this on the MiniGrid-DoorKey-5x5-v0, MiniGrid-Empty-5x5-v0 and
MiniGrid-Empty-6x6-v0 environments - I think all the small environments - I get the following error:
ZeroDivisionError: float division by zero

This seems to happen during the initialisation of the acmodel:
acmodel = ACModel(obs_space, envs[0].action_space, args.mem, args.text)

I tried this for the MiniGrid-DoorKey-8x8-v0 environment as well and here I do not get this error. Any idea what this could be and how we could solve it?

Kind regards,
Erik

Won't replay RNN policies using scripts.visualize

File "/home/rl-starter-files/utils/agent.py", line 24, in __init__ self.acmodel.load_state_dict(utils.get_model_state(model_dir))
  File "/home/miniconda3/envs/ml/lib/python3.7/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ACModel:

Unexpected key(s) in state_dict: "memory_rnn.weight_ih", "memory_rnn.weight_hh", "memory_rnn.bias_ih", "memory_rnn.bias_hh". 

Error: No module named scripts.train

I followed all the steps mentioned in your repo and also followed the instructions discussed in Issue#21. However, when I try to execute the following line to train the agent, I get an error.

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

Error: No module named scripts.train

Broken pipe when training a model on CPU

Hi,

I followed the instructions in README.md to train an A2C agent in the DoorKey environment using the following command (Python 3.7.3) on Ubuntu 18.04 with 8 CPUs.

python scripts/train.py --algo a2c --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

The training went well initially but ended with a BrokenPipeError exception that crashed the training process. The error message is copied below. According to scripts/train.py, the above command will run with 16 processes. Initially, I thought the error was because the training initialized too many processes. But even when setting --procs=6, the same exception happened again. Only when setting --procs=1 did the training run successfully. Is there any special setting I should use to enable training with multiple processes?

(Just realized that the error roots in torch_ac)

Error Message

Exception ignored in: <function ParallelEnv.__del__ at 0x7f2df3411a60>
Traceback (most recent call last):
  File "~/torch-ac/torch_ac/utils/penv.py", line 41, in __del__
  File "~/anaconda3/lib/python3.7/multiprocessing/connection.py", line 206, in send
  File "~/anaconda3/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
  File "~/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
BrokenPipeError: [Errno 32] Broken pipe

Support for MiniWorld (3D indoor environment)?

Hi Lucas,

I've been working on my 3D indoor environment. It's still very basic, but it works, and I just made the repository public: https://github.com/maximecb/gym-miniworld

I've tried to adjust your pytorch-a2c-ppo code to work with MiniWorld, but ran into issues. One is that MiniWorld produces observations which are not dictionary-based, and it was awkward to support this (the obs are just 60x80x3 RGB arrays). The other is that I only got something like 16 frames per second while training with 16 processes, and I have no idea why.

Would you have time to take a look? It would be great if it could work with your RL code out of the box. Right now I'm using my own fork of ikostrikov's, but there are multiple other issues with that code, one of which is that the performance when visualizing trained agents doesn't match the performance reported while training.

Cannot run the scripts

Traceback (most recent call last):
  File "/opt/anaconda3/envs/env1/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda3/envs/env1/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/../rl_in_polycraft_domain/base_environment/rl-starter-files/scripts/train.py", line 5, in <module>
    import torch_ac
ModuleNotFoundError: No module named 'torch_ac'

I have installed torch-ac as mentioned in the repository. Still, it throws this error. I would appreciate any help.

Problem with 'agent.py' for gym.spaces.Box based gym environments

Hi there :)

When I run the evaluate script with some gym environments (e.g. procgen coinrun) I get the following error:

AttributeError: 'function' object has no attribute 'vocab'

This appears to be because when the Agent class (in agent.py) is initialized the vocab attribute only gets created by get_obss_preprocessor (in format.py) for environments that have dictionaries as their observation space.

This bug can be temporarily fixed by putting the line that throws the error in an if statement:
[agent.py]

if isinstance(obs_space, gym.spaces.Dict) and list(obs_space.spaces.keys()) == ["image"]:
    self.preprocess_obss.vocab.load_vocab(utils.get_vocab(model_dir))

I'm not confident I understand the code well enough to submit a PR, but the above might solve it.

Thanks!
Gus

Training from image

I wanted to try to train the model from the environment's RGB image.
I added a wrapper, adapted from MiniWorld, like this:

import gym
import skimage.transform

resolution = 32

class ImageWrapper(gym.core.ObservationWrapper):

    def __init__(self, env):
        super().__init__(env)
        self.__dict__.update(vars(env))  # hack to pass values to super wrapper
        self.observation_space = gym.spaces.Box(
            low=0,
            high=255,
            shape=(resolution, resolution, 3),  # number of cells
            dtype='uint8'
        )

    def reset(self):
        obs = self.env.reset()
        img = self.env.render(mode="rgb_array")
        obs["image"] = skimage.transform.resize(img, (resolution, resolution), anti_aliasing=False)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        img = self.env.render(mode="rgb_array")
        obs["image"] = skimage.transform.resize(img, (resolution, resolution), anti_aliasing=False)
        return obs, reward, done, info

I know this solution is not so elegant but it's just for testing.
I tried to change the image size from 7 to 32px.
But results are always similar to:
(screenshot of the training results from 2019-03-01 omitted)

When the image size is 7px, it should obtain similar results as without using my wrapper, right? However, my results are much worse. Did anyone try using an RGB image?

Segmentation fault: 11

Hello there!

I've just cloned the repository, installed the environment, and checked whether it's working by launching (as described in the README):

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

I get the following output, with a Segmentation Fault 11 error message.

Namespace(algo='ppo', batch_size=256, clip_eps=0.2, discount=0.99, entropy_coef=0.01, env='MiniGrid-DoorKey-5x5-v0', epochs=4, frames=80000, frames_per_proc=None, gae_lambda=0.95, log_interval=1, lr=0.001, max_grad_norm=0.5, mem=False, model='DoorKey', optim_alpha=0.99, optim_eps=1e-08, procs=16, recurrence=1, save_interval=10, seed=1, text=False, value_loss_coef=0.5)

Device: cpu

Environments loaded

Training status loaded

Observations preprocessor loaded
Model loaded

ACModel(
  (image_conv): Sequential(
    (0): Conv2d(3, 16, kernel_size=(2, 2), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(16, 32, kernel_size=(2, 2), stride=(1, 1))
    (4): ReLU()
    (5): Conv2d(32, 64, kernel_size=(2, 2), stride=(1, 1))
    (6): ReLU()
  )
  (actor): Sequential(
    (0): Linear(in_features=64, out_features=64, bias=True)
    (1): Tanh()
    (2): Linear(in_features=64, out_features=7, bias=True)
  )
  (critic): Sequential(
    (0): Linear(in_features=64, out_features=64, bias=True)
    (1): Tanh()
    (2): Linear(in_features=64, out_features=1, bias=True)
  )
)

Optimizer loaded

Segmentation fault: 11

Does anybody know what is going on here?
Maybe @maximecb or @lcswillems know whether it's still related to this issue -> Farama-Foundation/Minigrid#77

Thanks in advance

TypeError: tuple indices must be integers or slices, not str

Hi,
I get an error after executing "!python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000".

Error:
Optimizer loaded

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/MyDrive/TFM/code/rl-starter-files/scripts/train.py", line 155, in <module>
    exps, logs1 = algo.collect_experiences()
  File "/usr/local/lib/python3.7/dist-packages/torch_ac/algos/base.py", line 129, in collect_experiences
    preprocessed_obs = self.preprocess_obss(self.obs, device=self.device)
  File "/content/drive/MyDrive/TFM/code/rl-starter-files/utils/format.py", line 30, in preprocess_obss
    "image": preprocess_images([obs["image"] for obs in obss], device=device),
  File "/content/drive/MyDrive/TFM/code/rl-starter-files/utils/format.py", line 30, in <listcomp>
    "image": preprocess_images([obs["image"] for obs in obss], device=device),
TypeError: tuple indices must be integers or slices, not str

Handle full observability

Hi Lucas!

If we use the FullyObsWrapper on a Minigrid environment then the format of observation_space will go from Dict(image:Box(7, 7, 3)) to Box(19, 19, 3). (19 is an example)

In utils/format.py the get_preprocessor function first checks whether re.match("MiniGrid-.*", env_id) matches, and assumes that every MiniGrid environment is partially observable, so it can't handle a fully observable MiniGrid environment.

We could just change the order of the if ... and elif ... branches to make it work, but I am not sure this would be optimal, which is why I prefer opening an issue.

Thanks :)

Issue with relative imports

I'm trying to set up automated testing with CircleCI and running into issues with the absolute imports used by this package, e.g.:

https://github.com/lcswillems/pytorch-a2c-ppo/blob/master/torch_rl/torch_rl/__init__.py#L1
https://github.com/lcswillems/pytorch-a2c-ppo/blob/master/torch_rl/torch_rl/algos/base.py#L5

Example error: https://circleci.com/gh/maximecb/baby-ai-game/4

 File "/home/circleci/project/babyai/utils/format.py", line 6, in <module>
    import torch_rl
  File "/home/circleci/.local/lib/python3.5/site-packages/torch_rl/__init__.py", line 1, in <module>
    from torch_rl.algos import A2CAlgo, PPOAlgo
ImportError: No module named 'torch_rl.algos'

So, as you can see, torch_rl imports, but torch_rl.algos doesn't resolve within the torch_rl package. I would suggest either switching to relative imports, e.g. from .algos import A2CAlgo, PPOAlgo, or modifying setup.py to export the subpackages as well.

You can take a look at how the baby-ai-game package exports subpackage names: https://github.com/maximecb/baby-ai-game/blob/master/setup.py#L8

AttributeError

Hi there,
I have this issue when running 'python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000' on macOS.

Error
AttributeError: Can't pickle local object 'DoorKeyEnv.init..'

It shows after this line:
File"/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump ForkingPickler(file, protocol).dump(obj)

Logging intermediate/cumulative rewards

Although MiniGrid standardly uses a single discounted reward at the end of an episode, I want to experiment with intermediate rewards. Specifically, I use a RewardWrapper that replaces the discounted final reward with a non-discounted final reward (of 1), and instead yields a small negative reward at every step to encourage faster completion.

I don't see their effect on the logged values. Browsing through the code, I encountered in BaseAlgo:

self.mask = 1 - torch.tensor(done, device=self.device, dtype=torch.float)

and

self.log_episode_return *= self.mask
self.log_episode_reshaped_return *= self.mask
self.log_episode_num_frames *= self.mask

Do I understand correctly that this is to filter out non-final steps from the logged statistics - a trick that works if all rewards are issued at the final step of episodes, but doesn't work if we want statistics on the cumulative rewards during one episode? I'd imagine that fixing that is non-trivial, as endings of episodes do not coincide with model update steps and we would need a way to count the cumulative rewards per process asynchronously with the training loop?

Also, do I understand correctly that this only affects the logs and that the intermediate rewards are still taken into account during training?

the initial memory of chunks

I have a question about this implementation, in order to ask which I guess I have to introduce a bit of terminology. At each PPO-step we use --procs processes to produce a rollout of --frames-per-proc steps. All these rollouts are then concatenated. Several epochs of optimization are then performed. At each epoch we split the concatenated rollouts in chunks of --recurrence steps. For each such chunk we initialize the memory LSTM with the values remembered from the rollout stage or with values from the previous PPO epoch.

My question is as follows. In this line we update the memory state for the next epoch. We don't however perform an update when i == self.recurrence - 1. That means that some of the memories in exps.memory, in particular the ones at indices 0, self.recurrence, 2 * self.recurrence, will be stale.

Is that correct? Perhaps memory should not be updated at all in PPO?

Evaluate not working with CUDA

Whenever I try to use the evaluate script on memory dependent environments, I get the following error:

  File "C:\User\rl-starter-files-master\scripts\evaluate.py", line 73, in <module>
    actions = agent.get_actions(obss)
  File "C:\User\rl-starter-files-master\utils\agent.py", line 36, in get_actions
    dist, _, self.memories = self.acmodel(preprocessed_obss, self.memories)
  File "C:\User\Anaconda3\envs\SSASC\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\User\rl-starter-files-master\model.py", line 88, in forward
    hidden = self.memory_rnn(x, hidden)
  File "C:\User\Anaconda3\envs\SSASC\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\User\Anaconda3\envs\SSASC\lib\site-packages\torch\nn\modules\rnn.py", line 944, in forward
    self.bias_ih, self.bias_hh,
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'mat2'

Here is the command I'm running to reproduce the above error:

python3 -m scripts.evaluate --env MiniGrid-RedBlueDoors-6x6-v0 --model RedBlueDoors --memory

Thanks!
Lili

Error in Mac OS X

python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey
Device: cpu

Environments loaded

Agent loaded

Device: cpu

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 234, in prepare
    _fixup_main_from_name(data['init_main_from_name'])
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 258, in _fixup_main_from_name
    main_content = runpy.run_module(mod_name,
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 207, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 47, in <module>
    env = ParallelEnv(envs)
  File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 34, in __init__
    p.start()
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

(The same "Device: cpu" line and traceback are then printed by every other spawned process.)
p.start()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "", line 1, in
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 234, in prepare
Device: cpu

Traceback (most recent call last):
File "", line 1, in
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 234, in prepare
_fixup_main_from_name(data['init_main_from_name'])
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 258, in _fixup_main_from_name
main_content = runpy.run_module(mod_name,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 47, in
env = ParallelEnv(envs)
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 34, in init
p.start()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Device: cpu

_fixup_main_from_name(data['init_main_from_name'])

File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 258, in _fixup_main_from_name
main_content = runpy.run_module(mod_name,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 47, in
env = ParallelEnv(envs)
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 34, in init
p.start()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "", line 1, in
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 234, in prepare
_fixup_main_from_name(data['init_main_from_name'])
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 258, in _fixup_main_from_name
main_content = runpy.run_module(mod_name,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 47, in
env = ParallelEnv(envs)
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 34, in init
p.start()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "", line 1, in
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 234, in prepare
_fixup_main_from_name(data['init_main_from_name'])
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 258, in _fixup_main_from_name
main_content = runpy.run_module(mod_name,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 47, in
env = ParallelEnv(envs)
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 34, in init
p.start()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "", line 1, in
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 234, in prepare
_fixup_main_from_name(data['init_main_from_name'])
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 258, in _fixup_main_from_name
main_content = runpy.run_module(mod_name,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 47, in
env = ParallelEnv(envs)
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 34, in init
p.start()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 234, in prepare
_fixup_main_from_name(data['init_main_from_name'])
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 258, in _fixup_main_from_name
main_content = runpy.run_module(mod_name,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 47, in
env = ParallelEnv(envs)
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 34, in init
p.start()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "", line 1, in
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 234, in prepare
_fixup_main_from_name(data['init_main_from_name'])
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 258, in _fixup_main_from_name
main_content = runpy.run_module(mod_name,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 47, in
env = ParallelEnv(envs)
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 34, in init
p.start()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
return Popen(process_obj)

File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/patras2/rl-starter-files/scripts/evaluate.py", line 66, in
obss = env.reset()
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 40, in reset
results = [self.envs[0].reset()] + [local.recv() for local in self.locals]
File "/usr/local/lib/python3.8/site-packages/torch_ac/utils/penv.py", line 40, in
results = [self.envs[0].reset()] + [local.recv() for local in self.locals]
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
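
For what it's worth, one workaround I would expect to help here (my assumption, not something taken from the repository): Python 3.8 changed the default multiprocessing start method on macOS from fork to spawn, and spawn re-imports the main module in every worker, which is exactly what the bootstrapping check above complains about. Forcing fork before ParallelEnv starts its workers avoids the re-import:

    # Hypothetical addition near the top of scripts/evaluate.py, before
    # ParallelEnv is constructed; "fork" restores the pre-3.8 default on macOS.
    import multiprocessing as mp

    mp.set_start_method("fork", force=True)

Wrapping the script's top-level code in an if __name__ == '__main__': guard, as the error message itself suggests, should work as well.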

Bug with fully observable small environments

The size of the observation space of MiniGrid environments now depends on the size of the grid when the environment is fully observable, instead of always being (7, 7, 3). This causes errors for environments with a dimension of 6 or smaller because of this line:

self.image_embedding_size = ((n-1)//2-2)*((m-1)//2-2)*64
since the embedding size evaluates to zero. This results in a division-by-zero error:

  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/nix/cirl/scripts/train.py", line 121, in <module>
    acmodel = ACModel(obs_space, envs[0].action_space, args.mem, args.text)
  File "/Users/nix/cirl/model.py", line 63, in __init__
    nn.Linear(self.embedding_size, 64),
  File "/Users/nix/cirl/env/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 56, in __init__
    self.reset_parameters()
  File "/Users/nix/cirl/env/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 59, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "/Users/nix/cirl/env/lib/python3.7/site-packages/torch/nn/init.py", line 291, in kaiming_uniform_
    std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero
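
For concreteness, here is the arithmetic behind that error for a fully observable 6x6 grid (my own sketch of the formula quoted above, assuming n and m are the height and width of the image observation):

    n = m = 6  # fully observable 6x6 environment
    image_embedding_size = ((n - 1) // 2 - 2) * ((m - 1) // 2 - 2) * 64
    print(image_embedding_size)  # 0 -> nn.Linear(0, 64) has fan_in == 0,
                                 # hence the division by zero in kaiming_uniform_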

Incompatible with MiniGrid > 1.1.0

This project and torch-ac don't support the gym API upgrades introduced in MiniGrid >= 1.2.0.
MiniGrid 1.1.0 still works, so pinning requirements.txt as follows is a workaround:

torch-ac>=1.1.0
gym-minigrid==1.1.0
tensorboardX>=1.6
numpy>=1.3

Multiprocessing error without if-clause protection on Windows

Hi,

I came across an issue when training & evaluating a model on Windows 10. Specifically, if the code in train.py, evaluate.py, and visualize.py is not wrapped in some form of main function which is then called with if __name__ == '__main__': main(), the code is run multiple times until a multiprocessing runtime error occurs.

This can be solved, as stated in the PyTorch doc here, by wrapping the code in a function and calling it under that guard, as in the sketch below.
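
A minimal sketch of that wrapping (the main() name and its body are placeholders, not the repository's actual code):

    def main():
        # everything that currently sits at module level in scripts/train.py:
        # argument parsing, environment and model setup, the training loop, ...
        ...

    if __name__ == "__main__":
        main()

With the guard in place, the re-import that the spawn start method performs in each worker no longer re-executes the training code.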

I wasn't sure whether you wanted to address this, as I don't know how the added code would affect Linux & Mac users, but I wanted to give you a heads-up that the issue exists for Windows users.

Running scripts.train on Windows 10

Hi there,

I followed the instructions to install the dependencies for this project, but once I get to the step where I run, for example, python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000, the terminal just moves to a new line and no model is trained.

Here are the steps I took to get to this point:

  1. OS is Windows 10
  2. Created a new conda env with basic science packages at the start
  3. Installed latest version of pytorch
  4. Installed gym-minigrid from source
  5. Installed torch-ac from source
  6. Cloned rl-starter-files & ran pip3 install -r requirements.txt in root dir of repo
  7. Ran python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000 in root dir of repo, and saw no output

Help would be greatly appreciated!

Training instructions unclear

Hi Lucas,

The training instructions are a bit unclear. First you suggest installing torch-ac using pip. Then you suggest people run:

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10

There's a missing step, which is to clone your rl-starter-files repository.

Personally, I don't love the splitting into two repos. I find it complicates things a bit. Now there are two repos and more steps involved. You also have to keep in mind, it's likely that people will want to modify the PPO training code to add auxiliary losses or some such. I would encourage you to keep things in one repo and to remove levels of abstraction, reduce the amount of files and the amount of code, etc. Just my two cents.

Trying to give training hints

Hi Lucas, this is not an issue, but more hoping you might have a moment to share some insight. Firstly, thanks for the great package. One of the few RL frameworks that support Dict Spaces!

I am using your torch-rl and this package as the basis of something I'm playing with. The environment gives sparse rewards and hence can take a long time to train. I have the benefit of some known data to train on, so I know in advance what a 'good' move is. Hence I have extended my environment to supply an extra observation space called hint, and I have modified the end of the forward method in model.py as follows:

        x = self.actor(embedding)

        if self.hints:
            # Override the actor's output: build a one-hot distribution that
            # always selects the action given by the extra "hint" observation.
            x.zero_()
            indexes = obs.hint.type(torch.long).view(-1, 1)
            x.scatter_(1, indexes, 1)
            dist = Categorical(x)
        else:
            # Normal path: a categorical distribution over the actor's logits.
            dist = Categorical(logits=F.log_softmax(x, dim=1))

        x = self.critic(embedding)
        value = x.squeeze(1)

        return dist, value, memory

My theory was that if I manually set the output of the actor to a 'good' move, this would be backpropagated and the network would learn quicker: a sort of hybrid between RL and supervised learning.

Sure enough, when I run it, the rewards go right up from the start, as the agent is always making 'good' moves, and hence gets high rewards. But... it would appear the network is not actually learning this. If I disable the hints and continue training, then it appears the training so far has learned nothing at all, and it makes very obvious bad moves.

So my question is this: does my theory make sense? Should I be able to get my 'hints' propagated back into the network so that it learns from them, or am I missing something fundamental here? I'm pretty new to PyTorch, having mainly used TensorFlow and Keras in the past.

trying to visualize the example model

Hi Lucas,

I just installed gym-minigrid and torch-ac, cloned this repo and tried to run through the basic example in the README, but I get an error when I try visualizing. The training works fine:

 python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

[...]

U 40 | F 081920 | FPS 2519 | D 29 | rR:μσmM 0.93 0.02 0.87 0.97 | F:μσmM 19.3 6.4 8.0 35.0 | H 1.344 | V 0.840 | pL -0.001 | vL 0.001 | ∇ 0.028
Status saved

However, when I try to visualize it, I get the following error:

python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

Device: cuda

Environment loaded

Agent loaded

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "[...]/rl-starter-files/scripts/visualize.py", line 71, in <module>
    if renderer.window is None:
AttributeError: 'numpy.ndarray' object has no attribute 'window'

Do you have any suggestions? Thanks a lot!

-Adrian

Add an open source license

Could you please add an open source license to this repo? Usually this is in a file named LICENSE. I would recommend something very liberal like the MIT one. You can put your own name as the copyright holder. It's just that without this, your code can't legally be assumed to be open source, AFAIK.

Possible bug of the calculation precision

Hello, Lucas
First, thanks for your code. I'm using your PPO algorithm for my project, but I found a possible numerical-precision issue which may cause the training process to diverge.

Here is the problem. My original internal reward (each step) is small, like 0.02 or 0.001, and the maximum return for the environment is 0.08. Strangely, the training does not converge at all, and the problem is solved by multiplying the reward by a factor of ten.
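
For reference, the factor-of-ten workaround can be written as a small gym reward wrapper (just a sketch to make the workaround concrete; ScaleReward is a made-up name, and the scaling can equally be done inside the environment):

    import gym

    class ScaleReward(gym.RewardWrapper):
        """Multiply every per-step reward by a constant factor."""

        def __init__(self, env, scale=10.0):
            super().__init__(env)
            self.scale = scale

        def reward(self, reward):
            return reward * self.scale

    # usage: env = ScaleReward(gym.make(env_id), scale=10.0)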

I used OpenAI's baselines when I was working in TensorFlow. Precision was not a problem there, and the small rewards worked well. The policy and the agent are basically the same in my PyTorch port, so I do not know how to handle this possible issue.

Here are the training logs for the original reward and for the reward magnified by a factor of ten; compare the mean return of each.
The first one stays in the range [0.0100, 0.012] and just goes up and down.

[plot: reward_origin]

With the magnified reward, the return increases over time.

[plot: reward_magnified]

I wonder why such small rewards make the training fail, so I'm opening this issue to discuss it.

Thanks in advance.

TypeError: tuple indices must be integers or slices, not str

Hi, I have a little problem.
When I run

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

It fails with:

        image": preprocess_images([obs["image"] for obs in obss], device=device),
TypeError: tuple indices must be integers or slices, not str

which happened here:

        def preprocess_obss(obss, device=None):
            return torch_ac.DictList({
                "image": preprocess_images([obs["image"] for obs in obss], device=device),
                "text": preprocess_texts([obs["mission"] for obs in obss], vocab, device=device)
            })
        preprocess_obss.vocab = vocab
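
I wonder if this is caused by the newer gym/Gymnasium API, where env.reset() returns an (observation, info) tuple instead of the observation itself, so each obs would be a tuple and obs["image"] would raise exactly this TypeError. A tiny illustration of the difference (the observation dict here is made up):

    obs_dict = {"image": [[0]], "mission": "go to the door"}

    # Old gym API: reset() returns the observation itself.
    obs = obs_dict
    print(obs["image"])      # works

    # New gym / Gymnasium API: reset() returns (observation, info).
    obs = (obs_dict, {})
    try:
        print(obs["image"])  # TypeError: tuple indices must be integers or slices, not str
    except TypeError as err:
        print(err)

    # Unpacking first restores the old behaviour:
    obs, info = (obs_dict, {})
    print(obs["image"])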

How should I fix it? Thx

Module 'torch_rl' has no attribute 'RecurrentACModel'

When trying to run

(spinningup) Pablos-iMac:torch-rl pablo$ python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0

I get the following error

Traceback (most recent call last):
  File "/anaconda3/envs/spinningup/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda3/envs/spinningup/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/pablo/torch-rl/scripts/train.py", line 17, in <module>
    from model import ACModel
  File "/Users/pablo/torch-rl/model.py", line 17, in <module>
    class ACModel(nn.Module, torch_rl.RecurrentACModel):
AttributeError: module 'torch_rl' has no attribute 'RecurrentACModel'

I have not been able to find out what is going on.
Thanks!
