farama-foundation / minigrid
Simple and easily configurable grid world environments for reinforcement learning
Home Page: https://minigrid.farama.org/
License: Other
Is there a way to receive an RGB observation instead of a matrix of values [0,3...]?
Is there an option when I create the agent?
When using RGBImgObsWrapper, the green goal tile disappears from the partial observation once the agent reaches the goal tile.
To reproduce:
# manual_control.py, line 33
if done:
    print('done!')
    from PIL import Image
    Image.fromarray(obs).show()
./manual_control.py --env MiniGrid-Empty-6x6-v0 --agent_view
Hi there! Thank you for creating these really interesting environments!
I am wondering why the observation_space of a flattened-observation environment contains an extra first dimension. The obs space is Box(1, xxx)
while the observation itself is a vector with shape (xxx,).
This violates the rules of Gym spaces and may cause errors in some applications.
env = gym_minigrid.wrappers.FlatObsWrapper(gym.make("MiniGrid-Empty-8x8-v0"))
env.observation_space
Out[20]: Box(1, 2739)
env.reset().shape
Out[21]: (2739,)
In some applications the observation is checked via:
obs = env.reset()
env.observation_space.contains(obs)
Out[23]: False
and thus this will cause problems.
Hello,
It seems to me that the sub-grid observed by the agent is sometimes wrong, depending on the agent's direction.
For example, assume that we have an AGENT_VIEW_SIZE of 3. We will have a 3x3 grid around the agent and 9 elements in the observation array.
Here are the indexes of the elements in the grid visible to an agent that is facing up and is positioned at the center-bottom of the grid.
8 7 6
5 4 3
2 A 0
In this example, shouldn't index 8 of the observation array always be the element that the agent perceives at its top-left corner, and the element at index 0 always be the one at bottom-right (and so on)?
It seems that in the current implementation of gen_obs_grid, when the agent rotates, the positions of the elements in the agent's observation view are not consistent with their positions in the actual sub-grid generated as the observation.
topX, topY, botX, botY = self.get_view_exts()
grid = self.grid.slice(topX, topY, AGENT_VIEW_SIZE, AGENT_VIEW_SIZE)
for i in range(self.agent_dir + 1):
    grid = grid.rotate_left()
I tried with a very naive and not optimized implementation to fix this issue:
topX, topY, botX, botY = self.get_view_exts()
grid = self.grid.slice(topX, topY, AGENT_VIEW_SIZE, AGENT_VIEW_SIZE)
grid_rotate_left_1 = grid
grid_rotate_left_2 = grid
grid_rotate_left_3 = grid
grid_rotate_left_1 = grid_rotate_left_1.rotate_left()
for i in range(2):
    grid_rotate_left_2 = grid_rotate_left_2.rotate_left()
for i in range(3):
    grid_rotate_left_3 = grid_rotate_left_3.rotate_left()
# agent facing right
if self.agent_dir == 0:
    grid = grid_rotate_left_3
# agent facing left
elif self.agent_dir == 2:
    grid = grid_rotate_left_1
# agent facing up
elif self.agent_dir == 3:
    grid = grid_rotate_left_2
What do you think about this?
Thank you for your collaboration!
Pier
Hello,
the current Grid.process_vis is allowing the agent to see around walls if the wall is horizontal, relative to the agent.
This is due to the horizontal processing of the visible area cascading around the walled cells.
I have included an example of this light-curving vision and the tree node that shows how each cell mask is being activated (the first time)
I believe a potential fix is to use a sequence of beam emulations. It may be less computationally efficient, but it does produce more accurate vision. Code attached:
import math
import numpy as np

def border(grid, agent_pos):
    # Border cells as offsets from the agent's x position, cached on the grid
    if hasattr(grid, "sett"):
        return grid.sett
    grid.sett = []
    for j in range(0, grid.height):
        grid.sett.append([0 - agent_pos[0], j])
        grid.sett.append([grid.width - 1 - agent_pos[0], j])
    for i in range(1 - agent_pos[0], grid.width - 1 - agent_pos[0]):
        grid.sett.append([i, grid.height - 1])
    #print(sett)
    return grid.sett

def process_vis(grid, agent_pos):
    mask = np.zeros(shape=(grid.width, grid.height), dtype=bool)
    mask[agent_pos[0], agent_pos[1]] = True
    # Cast one beam towards each border cell (border is called as a Grid method here)
    for b in grid.border(agent_pos):
        angle = math.atan2(b[1], b[0])
        #print(angle)
        beam_range = 1
        while True:
            x, y = round(agent_pos[0] + beam_range * math.cos(angle)), round(agent_pos[1] - beam_range * math.sin(angle))
            #print("-",x,y)
            if x < 0 or x >= grid.width or y < 0 or y >= grid.height:
                break
            mask[x, y] = True
            cell = grid.get(x, y)
            # Stop the beam at the first cell that blocks vision
            if cell and not cell.see_behind():
                break
            beam_range += 1
    # Hide every cell that no beam reached
    for j in range(0, grid.height):
        for i in range(0, grid.width):
            if not mask[i, j]:
                grid.set(i, j, None)
    return mask
When using RGBImgPartialObsWrapper
I get an error:
File ".../lib/python3.7/site-packages/gym_minigrid/wrappers.py", line 200, in __init__
obs_shape = env.observation_space['image'].shape
TypeError: 'Dict' object is not subscriptable
I think it should be
obs_shape = env.observation_space.spaces['image'].shape
instead of
obs_shape = env.observation_space['image'].shape
I added colored floor tiles to one of your environments and it failed to render:
self.grid.set(i, j, Floor(color=self._rand_elem(colors)))
File "site-packages/gym/core.py", line 233, in render
return self.env.render(mode, **kwargs)
File "gym-minigrid/gym_minigrid/minigrid.py", line 1279, in render
highlight_mask=highlight_mask if highlight else None
File "gym-minigrid/gym_minigrid/minigrid.py", line 527, in render
tile_size=tile_size
File "gym-minigrid/gym_minigrid/minigrid.py", line 469, in render_tile
obj.render(img)
File "gym-minigrid/gym_minigrid/minigrid.py", line 177, in render
r.setLineColor(100, 100, 100, 0)
AttributeError: 'numpy.ndarray' object has no attribute 'setLineColor'
Am I using them wrong? Thanks!
I found a bug in the Dynamic Obstacles environment. When I run the script below, I get an assertion error after only a few iterations.
import random
import time
import gym
from gym_minigrid import *
env = gym.make("MiniGrid-Dynamic-Obstacles-5x5-v0")
while True:
    action = random.randint(0, 6)
    observation, reward, done, info = env.step(action)
    # env.render("human")
    # time.sleep(1/20)
    if done:
        env.reset()
An optional observation variable that points to the direction of the first goal rather than the orientation of the agent. The observable image in itself provides very little information for the agent to advance ( even if a human looked at the partially observable grid, they would have to rely on mere random guessing ). This need for the agent to take arbitrary steps to discover the grid further could be removed.
The direction can be simply the tan inverse of the difference of goal and current position:
dir_radians = numpy.arctan( (goal[1] - agent_pos[1]) / (goal[0] - agent_pos[0]) )
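A plain arctan of the ratio divides by zero when the goal shares the agent's x coordinate and cannot tell opposite quadrants apart. A minimal sketch of the same idea with atan2, which handles both cases; goal_direction is a hypothetical helper name, and positions are assumed to be (x, y) tuples:

```python
import math

def goal_direction(agent_pos, goal_pos):
    """Angle in radians from the agent to the goal, in the range [-pi, pi]."""
    dx = goal_pos[0] - agent_pos[0]
    dy = goal_pos[1] - agent_pos[1]
    # atan2 takes dy and dx separately, so dx == 0 is not a special case.
    return math.atan2(dy, dx)

# Goal directly "above or below" the agent along the y axis: dx is 0.
print(goal_direction((1, 1), (1, 4)))
```

If y grows downward, as in image coordinates, positive angles point down the grid rather than up, so the sign may need flipping depending on the convention wanted.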
It would allow for a greater range of use cases if there was an option that could be passed to train using RGB arrays instead of only the image encodings. The readme describes a way to do this with get_obs_render, but it would be nice if it could be included as an argument.
Would you consider adding something like this?
Could anyone provide what type of performance they get on these domains with some of the provided baseline algorithms?
It is probably overkill to maintain something like a leaderboard, but it would be helpful to have some mechanism to confirm that the RL algorithms are behaving as expected and everything is installed correctly. Sometimes plots of reward vs episode number are provided in the readme. In addition, it would be cool to hear about any hyperparameters that were found to be particularly important for good performance.
For at least an order-of-magnitude example based on my installation, without making any changes to the repo, I saw convergence to ~0.9 mean reward in 130,000 timesteps when running this command:
python3 /home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py --env-name MiniGrid-Empty-6x6-v0 --no-vis --num-processes 1 --algo a2c
which took about 5mins on my i7-6700 w/ 1060GTX, though I don't know whether the GPU was being used by default.
For the 8x8 empty domain, it took about 45 mins and 1.8M timesteps to converge to ~0.9 reward with this command:
python3 /home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py --env-name MiniGrid-Empty-8x8-v0 --no-vis --num-processes 1 --algo a2c
Taking inspiration from the work done by @planetceres, I think it would be neat to have a new kind of tile/object which is lava, that is, the agent dies if it tries to go over it. This is useful for studying safety in RL. I would probably draw the tile in orange with some little triangular wave lines on it.
This should be accompanied by a MiniGrid-LavaCrossing-v0 environment where the agent has to get through a gap between lava tiles in order to get to the goal. The number of lava crossings should be configurable in the environment's constructor (i.e. num_crossings).
To see how tiles/objects are implemented, check out gym_minigrid/minigrid.py
For examples of how environments are implemented, see: gym_minigrid/envs
Environment: MiniGrid-Empty-5x5-v0
Wrappers: ImgObsWrapper + ViewSizeWrapper
When the agent_view_size is even (e.g. 6 as in the example below), an agent observes:
######
######
######
X--###
---###
--^-##
instead of
######
######
######
X--###
---###
--^###
Legend: # walls, - empty, X goal, ^ agent facing up (forward).
As you can see, there's an empty cell to the right of the agent, while in reality (in the environment's grid) there's a wall.
This particular behavior is the effect of this line of code, which "places" the agent itself (or what it carries) here. Actually, there's no bug there.
The real cause is in how the agent's view rectangle is calculated in get_view_exts. There, the topX and topY calculation doesn't take into consideration that, for an even agent_view_size, the number of cells to the left and to the right of the agent is different.
E.g. for agent_view_size = 6, the agent views 3 cells on the left and 2 on the right (plus the 1 it is on, which sums to 6): ---^--. And for agent_view_size = 7, the agent views 3 cells on both the left and the right: ---^---
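The left/right asymmetry described above can be stated as a small calculation. This is only an illustrative sketch of the counts given in the post; view_split is a hypothetical helper, not a function in gym-minigrid:

```python
def view_split(agent_view_size: int):
    """Cells visible to the left and right of the agent within its own row."""
    left = agent_view_size // 2
    # The agent occupies one cell; whatever remains is on the right.
    right = agent_view_size - 1 - left
    return left, right

for size in (6, 7):
    print(size, view_split(size))
```

For odd sizes the two counts are equal; for even sizes the left side gets one extra cell, which is the case get_view_exts would need to account for.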
Different number of rooms in a gridworld fashion (no of rooms: 1 ,2,3,4 . .. . )
Baseline - 2 room gridworld, 2X2, top row - red room, bottom row - green room. No doors.
Different number of independent Agents. (number as input 1,2,3,4)
Baseline: 1 Agent at top left in the red room
Different number of balls. ( 1,2,3,4)
baseline: 1 ball at the top right in the red room
Goal Condition
Baseline: Move the ball to green room, and the robot should also be in green room.
Hi, what is required for fixing the numpy 1.15.4 dependency? I have a situation where gym-minigrid downgrades the numpy version installed by pytorch via conda.
Hello,
I'm using gym-minigrid and I need to get the rendering image, transform it and then display it.
After the "transformation", I get a np.array matrix with 3 channels (RGB). Is there a way to give this result to the environment (for instance the Render class ?) in order to display it ?
Thanks in advance !
PS: In other environment, I used to do something like
from gym.envs.classic_control import rendering
env_unwrapped = gym.make("...").env.unwrapped
img = env_unwrapped.ale.getScreenRGB2()
image_transformed = ....
if env_unwrapped.viewer is None:
    env_unwrapped.viewer = rendering.SimpleImageViewer()
env_unwrapped.viewer.imshow(image_transformed)
return env_unwrapped.viewer.isopen
but apparently env_unwrapped has no attribute viewer.
I think it might be useful to implement the classic "four rooms" environment, which is used in many classic reinforcement learning papers.
The environment should ideally have configurable start and goal positions, as parameters to the constructor. If these are not specified, they should be chosen at random every time the environment is reset.
I'm opening an issue to keep track of potential improvements to a MiniGrid v2.0. These are things I would do differently if I had to do it again. I'm not sure if these will get merged in the current version of MiniGrid because there is a big risk of breaking existing code, and this is particularly sensitive since the people using this code are using it for research purposes, where replicability is important. What may eventually happen is that I create a MiniGrid2 repository.
Separate out the gridworld from the OpenAI Gym interface. As pointed out in #37, it would make sense to have a gridworld class separate from the OpenAI Gym environment class. This would help us support multi-agent type of setups. It might also be slightly cleaner. We are already part of the way there with the current Grid class.
The agent should be treated like any other gridworld object. This again goes in line with multi-agent support. I think it would also be cleaner.
Observations should be encoded using a one-hot scheme rather than integers corresponding to each object type and color. The observations may not actually be any bigger in terms of bytes taken if we use a numpy array of bools (bits). This would likely be easier for a neural network to decode. (Won't be done, results inconclusive)
By default, observations should be tensors, so that the code works out of the box with most RL frameworks. Mission strings should be provided as part of the info dict. The unfortunate truth is that OpenAI Gym has very poor support for any other kind of observation encoding.
Doors and locked doors should probably have the same object type. We could take advantage of a one-hot scheme here and have a "door" bit as well as a "locked" flag bit. Possibly, each object should have a method to produce its own encoding in terms of bits. (Done)
Some people have had difficulty installing PyQT. It might be nice to explore alternative rendering options. We could potentially generate graphics directly using 2D NumPy tensors, but we would still need some library to open display windows. I don't know what is the simplest Python option for that. The goal is for this package to work everywhere without any issues as much as possible.
Render current step, max steps, reward obtained in env.render('human').
Rename max_steps to max_episode_steps, which is more in keeping with OpenAI Gym conventions.
Other suggestions welcome.
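The one-hot suggestion in the list above can be sketched with NumPy. This is an illustration only: it assumes the observation is a 2D array of small non-negative integer codes, and one_hot is a hypothetical helper rather than anything in MiniGrid.

```python
import numpy as np

def one_hot(obs: np.ndarray, num_values: int) -> np.ndarray:
    """Turn an integer grid of shape (H, W) into a bool array (H, W, num_values)."""
    # Row i of the identity matrix is the one-hot vector for code i.
    return np.eye(num_values, dtype=bool)[obs]

# Toy object-type indices, not real MiniGrid data.
obs = np.array([[0, 2], [1, 1]])
encoded = one_hot(obs, num_values=3)
print(encoded.shape)  # (2, 2, 3)
```

Each cell then carries exactly one True entry, stored as a NumPy bool.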
Hello,
Is there a way to change the origin of the observation? Currently the observation covers the 7x7 area "in front" of the agent. I'm wondering if I can change it so that the observation will be a 7x7 area centered at the agent's position.
Does this environment support saving and restoring states?
For example, I want to save an exact map state in MiniGrid-Dynamic-Obstacles-5x5-v0 at some point and later I should be able to make the agent start from the saved state.
Gym's Atari environment has such a functionality: openai/gym#402 (comment)
Hey, very nice job!
I am wondering if you're planning to release a new environment for the firefighter problem, i.e., a grid world where a cell might have its state updated (burning, protected, none) after each iteration i.
(in a more simplistic configuration: a firefighter agent, an initial burning cell and a fixed object to protect)
Cheers!
Is there an easy way to change the grid between different episodes?
I am looking to recreate some of the dynamic room environments, i.e where a shortcut opens up or a goal changes location after a certain number of timesteps or episodes.
Hello,
I've been using gym-minigrid for a personal project and I need to add new kinds of tiles/objects.
For example, I'm adding a FoodSource object, which produces Food around it. The agent has an energy level and performing actions uses energy. Taking Food adds energy to the agent.
My question is: are you interested in pull requests for new kinds of tiles/objects, or do you prefer to keep gym-minigrid as it is?
Thanks a lot for your work !
Antoine.
Hello,
RGBImgObsWrapper() seems to not extract the dictionary items anymore.
EDIT: Really great env nonetheless :-)
Hi:
It's really a great project! After reading the related issues about FullyObs, I want to try it, but I don't know how to use the class FullyObsWrapper in the file 'wrappers.py'. Should I import it, or change the code of the function 'gen_obs(self)' in the file 'minigrid.py'?
Thank you!
Hello, I found a problem in the Dynamic-Obstacles environment.
The problem is that in env.step(action), the obstacles position is updated after calling the env.step() of the base class, and the observation is not updated with the new position of the obstacles. So, the returned observation by env.step() has the obstacles in the old position.
Example:
obs, reward, done, info = env.step(action) # in obs, obstacles are in the old position
pixmap = env.render('pixmap') # in pixmap, obstacles are in the new (correct) position
Workaround: manual update observation after env.step()
obs, reward, done, info = env.step(action)
obs = env.gen_obs()
Suggested fix:
In dynamicobstacles.py, change the return after the obstacles positions update from:
return obs, reward, done, info
to:
return self.gen_obs(), reward, done, info
Is there a way to get the position ( x, y) of the goal?
I could get which grid is my goal by this loop
for grid in env.grid.grid:
    if grid is not None and grid.type == "goal":
        print("This grid is my goal")
but I don't know how to map this to an (x,y) coordinates
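One way to recover coordinates is from the position in the flat list. The sketch below assumes the list is stored row by row, so that index = y * width + x; that layout is an assumption to check against the Grid class, and index_to_xy and find_cells are hypothetical helpers.

```python
from types import SimpleNamespace

def index_to_xy(index: int, width: int):
    """Convert a row-major flat index into (x, y) grid coordinates."""
    y, x = divmod(index, width)
    return x, y

def find_cells(cells, width, cell_type):
    """Return the (x, y) of every cell whose `type` attribute matches."""
    return [
        index_to_xy(i, width)
        for i, cell in enumerate(cells)
        if cell is not None and cell.type == cell_type
    ]

# Toy 3x2 grid with a single goal in the last cell (stand-in objects).
cells = [None] * 6
cells[5] = SimpleNamespace(type="goal")
print(find_cells(cells, width=3, cell_type="goal"))
```

With a real environment the call would look like find_cells(env.grid.grid, env.grid.width, "goal"), under the same layout assumption.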
I'm unable to train the model with the provided pytorch main.py script with --num-processes > 1.
When I run this command:
$ python3 /home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py --env-name MiniGrid-Empty-8x8-v0 --no-vis --num-processes 5 --algo a2c
The error I get is:
obs.shape: (5, 1875)
shape_dim0: 1
current_obs.shape: torch.Size([5, 1, 1875])
current_obs.shape: torch.Size([5, 1, 1875])
Traceback (most recent call last):
File "/home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py", line 271, in <module>
main()
File "/home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py", line 98, in main
update_current_obs(obs)
File "/home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py", line 95, in update_current_obs
current_obs[:, -shape_dim0:] = obs
RuntimeError: The expanded size of the tensor (1) must match the existing size (5) at non-singleton dimension 1
I believe this is related to the code on line 86 of main.py:
def update_current_obs(obs):
    print("obs.shape:", obs.shape)
    shape_dim0 = envs.observation_space.shape[0]
    print("shape_dim0:", shape_dim0)
    obs = torch.from_numpy(obs).float()
    print("current_obs.shape:", current_obs.shape)
    if args.num_stack > 1:
        current_obs[:, :-shape_dim0] = current_obs[:, shape_dim0:]
    print("current_obs.shape:", current_obs.shape)
    current_obs[:, -shape_dim0:] = obs
It seems that the dictionary is outdated. I'm fixing it by adding the proper keys and values from the OBJ_TO_IDX dictionary.
Hey,
I haven't thought about two issues with the FullyObsWrapper:
Any suggestion on how I can improve on it?
Thank you !
Some users have difficulties installing PyQT, on clusters in particular. It would be useful to build a renderer which uses NumPy instead of PyQT. This should not be too complicated to do given that the 2D graphics of MiniGrid are very simple.
The first step will be to evaluate what the performance impact might be. A caching strategy may need to be used to maximize performance.
Hey there,
I'm experiencing a segmentation fault after I call env.step when there are multiple renderings open. I saw a closed thread on openai for the same issue, but I'm unsure if the bug still persists across some of their environments.
I got this in the middle of training on DynamicObstacles-Random-6x6. Not sure exactly how to reproduce it; I will try to run more experiments and see how often it happens.
File "/opt/conda/envs/torch110/lib/python3.7/site-packages/gym/core.py", line 273, in step
observation, reward, done, info = self.env.step(action)
File "/opt/conda/envs/torch110/lib/python3.7/site-packages/gym_minigrid/envs/dynamicobstacles.py", line 75, in step
self.place_obj(self.obstacles[i_obst], top=top, size=(3,3), max_tries=100)
File "/opt/conda/envs/torch110/lib/python3.7/site-packages/gym_minigrid/minigrid.py", line 923, in place_obj
raise RecursionError('rejection sampling failed in place_obj')
RecursionError: rejection sampling failed in place_obj
Hi! While looking at the method I saw that it is only checking the agent start_pos and not the current agent_pos. I can suggest a fix with a pull request if it was not intentional (it will also affect the Dynamic-Obstacles envs as it was counting that it could place an object on top of the agent and then check for overlap - so have to fix this as well if it changes).
# Don't place the object where the agent is
if np.array_equal(pos, self.start_pos):
By the way great job with the envs. Thank you!
I'd like to compile a list of submissions and accepted papers using MiniGrid. If you've used MiniGrid in a paper on arxiv or published at a conference, or if you see papers in the wild using MiniGrid, please list them in comments :)
Hi Maxime,
I wasn't able to make the agent learn RedBlueDoors and I thought it was because my implementation of the agent's memory was buggy but, in fact, it is because max_steps is too low.
In this line, I tried max_steps=20*size*size instead of max_steps=10*size*size and it works now.
Do you think it is possible to write the text in the env.render() window (at the bottom)?
It would be a really nice feature because everything would be displayed in the same window. Currently, making a gif with the grid and the text one below the other is very hard: https://github.com/lcswillems/pytorch-a2c-ppo/blob/a0a19f994517a7fe198066685688ad66f50efa53/README-images/enjoy-gotodoor.gif
I need to precisely move my terminal and the rendering window.
I have a problem: when I import gym-minigrid as well as torch and call the rendering function, I get
"dlopen: cannot load any more object with static TLS". I am trying to use the code on a server (it works on my local machine).
` ImportError: dlopen: cannot load any more object with static TLS
args.optim_eps, args.clip_eps, args.epochs, args.batch_size, preprocess_obss)
File "/home/nicolas/InstrictMotivation/InstrictGoalReward/torch-rl/torch_rl/torch_rl/algos/ppo.py", line 18, in init
value_loss_coef, max_grad_norm, recurrence, preprocess_obss, reshape_reward)
File "/home/nicolas/InstrictMotivation/InstrictGoalReward/torch-rl/torch_rl/torch_rl/algos/base.py", line 78, in init
self.obs = self.env.reset()
File "/home/nicolas/InstrictMotivation/InstrictGoalReward/torch-rl/torch_rl/torch_rl/utils/penv.py", line 51, in reset
results = [self.envs[0].reset()] + [local.recv() for local in self.locals]
File "/home/nicolas/InstrictMotivation/InstrictGoalReward/torch-rl/scripts/train.py", line 91, in reset
img = self.env.render(mode="rgb_array")
File "/home/nicolas/gym-minigrid/gym_minigrid/minigrid.py", line 1269, in render
from gym_minigrid.rendering import Renderer
File "/home/nicolas/gym-minigrid/gym_minigrid/rendering.py", line 3, in
from PyQt5.QtGui import QImage, QPixmap, QPainter, QColor, QPolygon
ImportError: dlopen: cannot load any more object with static TLS
`
Hi, is there an easy way to extend this environment to support the multi-agent setting? It seems like the MiniGridEnv class assumes that there is only one agent in the environment. Would it be possible to have a wrapper that first instantiates a single grid environment, then calls the methods in MiniGridEnv only with respect to a particular agent?
I'm trying to get inspiration from Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, but I'm having trouble seeing exactly how and where to extend MiniGrid.
A naive approach would be simply to copy-paste the relevant methods, e.g. have dir_vec1 and dir_vec2, right_vec1 and right_vec2, etc. However, some methods like reset, _reward, _rand_int, steps_remaining need not be copied.
What would be an easy and elegant way to go about extending to multiple (2 or more) agents?
Best,
Kevin
First of all, thanks for the great work!
When I was trying to render an rgb array with the environment MiniGrid-KeyCorridorS3R1-v0, the image came out totally scrambled. It seems to be either because it's a non-square environment or because there's some issue with the RoomGrid superclass. Could you look into this? Thanks!
Hi Maxime,
MiniGrid is growing fast!
I was wondering: why don't you create a pip package for MiniGrid? It would make your package easier to install. And why don't you create releases of your code? This way, you can make changes that are not backward compatible without breaking people's code. You can also detail the changes you made.
Best,
Lucas
Minigrid uses a config module which is easily shadowed by local config files, could this import be made relative or renamed?
Here we discuss the possibility of creating a simple environment from a txt file. A few considerations first:
MiniGridEnv.__str__.
My suggestion would be to "invert" the str function and use the same dictionary to generate the grid, with 1 character for each cell indicating the type of the object, and complete the rendering with the default color.
Regarding 1. maybe a wrapper that recreates the environment, taking the path of the text file as argument?
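The "inverted str" idea above could look roughly like the following. It is a sketch under stated assumptions: the one-character legend CHAR_TO_TYPE is hypothetical (the real mapping would come from whatever dictionary MiniGridEnv.__str__ uses), and parse_grid only produces object-type names rather than building an environment.

```python
# Hypothetical one-character legend, for illustration only.
CHAR_TO_TYPE = {"#": "wall", " ": None, "G": "goal", "A": "agent"}

def parse_grid(text: str):
    """Parse a rectangular text layout into rows of object-type names."""
    rows = [line for line in text.splitlines() if line]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    # One character per cell; unknown characters raise KeyError.
    return [[CHAR_TO_TYPE[ch] for ch in row] for row in rows]

layout = "####\n#A #\n# G#\n####"
grid = parse_grid(layout)
for row in grid:
    print(row)
```

A wrapper taking a file path, as suggested above, could read the file's text and pass it to a function like this before placing objects with the default color.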
Hi,
I want to introduce Dynamic Obstacles to the Empty Environment (MiniGrid-Empty). This could be represented by a moving colored square on the grid. The objective of the robot would be to navigate without colliding with these dynamic obstacles.
Moreover, I also want to introduce occlusion. If an obstacle lies in front of the robot, and another obstacle lies directly behind the first obstacle, the second obstacle should be hidden by the first one, i.e. the robot fails to see the 2nd obstacle.
The final environment will be similar to the Lava Crossing Environment with the following changes:
To achieve these tasks, could you kindly point me to the files to be modified?
Thank you
Hello there! I've found a small bug with the FullyObsWrapper I think. Cheers!
import gym_minigrid
from gym_minigrid.wrappers import FullyObsWrapper
import gym
env = gym.make('MiniGrid-Fetch-5x5-N2-v0')
env.reset()
wrapped_env = FullyObsWrapper(env)
print(env.mission, wrapped_env.mission)
wrapped_env.reset()
print("Unupdated mission variable",wrapped_env.mission)
print("Actual mission variable",wrapped_env.unwrapped.mission)
When using the FullyObsWrapper (and using the recommended A2C algorithm from https://github.com/lcswillems/pytorch-a2c-ppo), no issues arise when training on a 8x8 grid. However, on a 16x16 grid the following happens:
Traceback (most recent call last):
File "C:\Python36\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\pytorch-a2c-ppo\scripts\train.py", line 157, in <module>
logs = algo.update_parameters()
File "c:\pytorch-a2c-ppo\torch_rl\torch_rl\algos\ppo.py", line 32, in update_parameters
exps, logs = self.collect_experiences()
File "c:\pytorch-a2c-ppo\torch_rl\torch_rl\algos\base.py", line 131, in collect_experiences
dist, value, memory = self.acmodel(preprocessed_obs, self.memory * self.mask.unsqueeze(1))
File "C:\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\pytorch-a2c-ppo\model.py", line 94, in forward
x = self.actor(embedding)
File "C:\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Python36\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
input = module(input)
File "C:\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Python36\lib\site-packages\torch\nn\modules\linear.py", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "C:\Python36\lib\site-packages\torch\nn\functional.py", line 1024, in linear
return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [16 x 1600], m2: [64 x 64] at c:\new-builder_2\win-wheel\pytorch\aten\src\th\generic/THTensorMath.cpp:2070
Any ideas as what could be the cause of this?
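One possible explanation, offered as an assumption rather than a diagnosis: if the model's image encoder is an unpadded stack of a 2x2 convolution, a 2x2 max-pool and two more 2x2 convolutions with 64 output channels, and the layers after it are sized for 64 inputs, then the flattened size depends on the grid size. The sketch below only does that arithmetic; flattened_conv_size is a hypothetical helper and the layer stack is assumed, not read from model.py.

```python
def flattened_conv_size(height: int, width: int, channels: int = 64) -> int:
    """Flattened output size for the assumed conv stack (no padding, stride 1).

    Assumed layers: conv 2x2 -> max-pool 2x2 -> conv 2x2 -> conv 2x2.
    """
    def out(n: int) -> int:
        # conv 2x2: n - 1; pool 2x2: floor division by 2; two more convs: - 2
        return (n - 1) // 2 - 2
    return out(height) * out(width) * channels

for size in (7, 8, 16):
    print(size, flattened_conv_size(size, size))
```

Under these assumptions a 7x7 or 8x8 input flattens to 64 values while a 16x16 input flattens to 1600, which would match the [16 x 1600] versus [64 x 64] shapes in the error.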
I think this would be useful for use cases where we want learning to converge to a deterministic policy (which is not optimal in some of the partially-observable settings). It would be nice to have this as an option, simply using the full grid encoding doesn't work because it doesn't represent the agent.
Can this be turned into:
topX + self._rand_int(1, sizeX) - 1
To allow for rooms to be 1x1 size?
I think it would be beneficial (at least to me) to include the option of distinguishing between a cell which is observed to be empty, and a cell which is not seen at all (e.g. behind a wall). This is currently not the case, as both scenarios correspond to 0 values in the observation image.
I would probably try to do this in my own fork, but if there is any interest in pulling this to the main repo I would like to discuss it here because there's multiple ways to approach this:
Adding an item to the observation dictionary.
pros: does not change observation image, i.e. less likely to break existing code. Completely separates "what is seen" from "whether it is seen", so that the user can choose whether to use just one of both.
cons: The original ImgObsWrapper would ignore the addition and just return the original image.
Adding a cell type value to the observation image; perhaps this can be done by adding a new cell type altogether "Unseen" which will only be used in observation grids?
pros: minor change to observation image, less likely to break code
cons: Arguably awkward to add a type which is not associated with the state of the true environments and its objects.
Adding a binary channel to the observation image.
pros: dedicated channel, "unseen" is not a physical property inherent to the environment so it makes sense to separate it from things like the actual items.
cons: major change to observation image, more likely to break existing code.
Personally, I think option 1 would work best: Existing code will continue working as before, and methods which want to exploit the new mask are able to do so. New observation wrappers can be added to handle the case where the user wants just the image, or the image and the "mask".
Thoughts?
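Option 1 above could be prototyped as a small function over the observation dictionary. This is a sketch only: the 'mask' key name is hypothetical, add_visibility_mask is not part of gym-minigrid, and the visibility array is assumed to be computed elsewhere.

```python
import numpy as np

def add_visibility_mask(obs: dict, visible: np.ndarray) -> dict:
    """Return a copy of the observation dict with an extra boolean 'mask' entry."""
    if visible.shape != obs["image"].shape[:2]:
        raise ValueError("mask must match the image's first two dimensions")
    out = dict(obs)  # shallow copy, so the original dict is left unchanged
    out["mask"] = visible.astype(bool)
    return out

# Toy data: a 3x3 observation image where one cell is not seen.
image = np.zeros((3, 3, 3), dtype=np.uint8)
visible = np.ones((3, 3), dtype=bool)
visible[0, 0] = False
obs = add_visibility_mask({"image": image}, visible)
print(sorted(obs.keys()))
```

Code that reads only obs["image"] keeps working unchanged, which is the main argument given for option 1.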
I followed the installation instructions but I did not get a working minigrid system.
I installed in a conda virtual environment. First I did:
source activate tf
pip3 install gym-minigrid
./manual_control.py
But there is no manual_control file created in the current directory or anywhere else that I could find.
Then I tried the other installation method:
git clone https://github.com/maximecb/gym-minigrid.git
cd gym-minigrid
pip3 install -e .
After this I saw a copy of manual_control.py which succeeded in a running a simple environment. However I could not load other environments. The following (taken from the instructions) failed:
./manual_control.py --env_name MiniGrid-Empty-8x8-v0
I believe the problem is conda not knowing the correct path for the packages, but it is my experience that 95% of any problem using python is getting it to use the right installer, package versions, paths and interpreter.