farama-foundation / minigrid

Simple and easily configurable grid world environments for reinforcement learning

Home Page: https://minigrid.farama.org/

License: Other

Python 99.82% Dockerfile 0.09% Shell 0.09%

gridworld-environment gym

minigrid's Issues

RGB observation

Is there a way to receive an RGB observation instead of a matrix of values [0,3...]?
Is there an option for this when I create the agent?

Goal tile vanishes when agent reaches goal

When using RGBImgObsWrapper, the green goal tile disappears from the partial observation once the agent reaches the goal tile.

To reproduce:

# manual_control.py, line 33
if done:
    print('done!')
    from PIL import Image
    Image.fromarray(obs).show()

./manual_control.py --env MiniGrid-Empty-6x6-v0 --agent_view

Why there is an extra dimension in flatten observation?

Hi there! Thanks for creating these really interesting environments!

I am wondering why the observation_space of a flattened-observation environment contains an extra first dimension. The observation space is Box(1, xxx), while the observation itself is a vector with shape (xxx,).

This violates the contract of a gym space and may cause errors in some applications.

env = gym_minigrid.wrappers.FlatObsWrapper(gym.make("MiniGrid-Empty-8x8-v0"))

env.observation_space
Out[20]: Box(1, 2739)

env.reset().shape
Out[21]: (2739,)

In some applications the observation is checked via:

obs = env.reset()

env.observation_space.contains(obs)
Out[23]: False

and thus this check fails, which causes problems.

bug in gen_obs_grid?

Hello,

It seems to me that the sub-grid observed by the agent is sometimes wrong, depending on the agent's direction.
For example, assume an AGENT_VIEW_SIZE of 3. We then have a 3x3 grid around the agent and 9 elements in the observation array.
Here are the indices of the elements in the grid visible to the agent when it faces up and is positioned at the bottom center of the grid.

8 7 6
5 4 3
2 A 0

In this example, shouldn't index 8 of the observation array always be the element that the agent perceives at its top-left corner, and index 0 always be the one at its bottom-right (and so on)?

It seems that in the current implementation of gen_obs_grid, the positions of the elements in the agent's observation view are not consistent with their positions in the actual sub-grid when the agent rotates.

    topX, topY, botX, botY = self.get_view_exts()

    grid = self.grid.slice(topX, topY, AGENT_VIEW_SIZE, AGENT_VIEW_SIZE)

    for i in range(self.agent_dir + 1):
        grid = grid.rotate_left()

I tried a very naive, unoptimized implementation to fix this issue:

    topX, topY, botX, botY = self.get_view_exts()

    grid = self.grid.slice(topX, topY, AGENT_VIEW_SIZE, AGENT_VIEW_SIZE)
    grid_rotate_left_1 = grid
    grid_rotate_left_2 = grid
    grid_rotate_left_3 = grid

    grid_rotate_left_1 = grid_rotate_left_1.rotate_left()

    for i in range (2):
        grid_rotate_left_2 = grid_rotate_left_2.rotate_left()

    for i in range (3):
        grid_rotate_left_3 = grid_rotate_left_3.rotate_left()

    # agent facing right
    if self.agent_dir == 0:
        grid = grid_rotate_left_3
    # agent facing left
    elif self.agent_dir == 2:
        grid = grid_rotate_left_1
    # agent facing up
    elif self.agent_dir == 3:
        grid = grid_rotate_left_2

What do you think about this?

Thank you for your collaboration!
Pier
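
As a rough sketch of the proposal above: the per-direction branches amount to applying (agent_dir + 3) % 4 left rotations, where direction 1 (down) is assumed to leave the sliced grid unrotated because the snippet has no branch for it. The rotate_left helper below is a stand-in operating on plain lists of rows, not the Grid.rotate_left method from the repository.

```python
def rotate_left(rows):
    """Rotate a list-of-rows grid 90 degrees counterclockwise (stand-in helper)."""
    return [list(col) for col in zip(*rows)][::-1]


def proposed_rotations(agent_dir):
    """Number of left rotations the proposal applies for each direction.

    Directions follow the comments in the snippet: 0 = right, 2 = left, 3 = up.
    Direction 1 (down) is assumed to need no rotation.
    """
    return (agent_dir + 3) % 4


def rotate_view(rows, agent_dir):
    """Apply the proposed number of left rotations to the sliced view."""
    for _ in range(proposed_rotations(agent_dir)):
        rows = rotate_left(rows)
    return rows


view = [[1, 2],
        [3, 4]]
for agent_dir in range(4):
    print(agent_dir, proposed_rotations(agent_dir), rotate_view(view, agent_dir))
```

For comparison, the current loop applies agent_dir + 1 rotations, so the two versions differ by two rotations (a 180 degree turn) for every direction.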

Agent can see around walls

Hello,
The current Grid.process_vis allows the agent to see around walls when the wall is horizontal relative to the agent.
This happens because the horizontal pass over the visible area cascades around the walled cells.

I have included an example of this light-curving vision and the tree node that shows how each cell mask is being activated (the first time).

I believe a potential fix is to use a sequence of beam emulations. It may be less computationally efficient, but it produces more accurate vision. Code attached.

import math
import numpy as np

def border(grid, agent_pos):
    # Cache the beam targets (offsets relative to the agent) on the grid
    if hasattr(grid, "sett"):
        return grid.sett
    grid.sett = []
    for j in range(0, grid.height):
        grid.sett.append([0 - agent_pos[0], j])
        grid.sett.append([grid.width - 1 - agent_pos[0], j])
    for i in range(1 - agent_pos[0], grid.width - 1 - agent_pos[0]):
        grid.sett.append([i, grid.height - 1])
    return grid.sett

def process_vis(grid, agent_pos):
    mask = np.zeros(shape=(grid.width, grid.height), dtype=bool)
    mask[agent_pos[0], agent_pos[1]] = True
    # Cast one beam towards each border target
    for b in border(grid, agent_pos):
        angle = math.atan2(b[1], b[0])
        beam_range = 1
        while True:
            x = round(agent_pos[0] + beam_range * math.cos(angle))
            y = round(agent_pos[1] - beam_range * math.sin(angle))
            if x < 0 or x >= grid.width or y < 0 or y >= grid.height:
                break
            mask[x, y] = True
            cell = grid.get(x, y)
            # Stop the beam at the first cell that cannot be seen through
            if cell and not cell.see_behind():
                break
            beam_range += 1
    # Clear every cell that no beam reached
    for j in range(0, grid.height):
        for i in range(0, grid.width):
            if not mask[i, j]:
                grid.set(i, j, None)
    return mask

Probably a bug in RGBImgPartialObsWrapper

When using RGBImgPartialObsWrapper I get an error:

 File ".../lib/python3.7/site-packages/gym_minigrid/wrappers.py", line 200, in __init__
    obs_shape = env.observation_space['image'].shape
TypeError: 'Dict' object is not subscriptable

I think it should be
obs_shape = env.observation_space.spaces['image'].shape
instead of
obs_shape = env.observation_space['image'].shape

https://github.com/maximecb/gym-minigrid/blob/1e8b22e92b02a23fb4f85e0509815a41693aee75/gym_minigrid/wrappers.py#L200

Floor object doesn't render

I added colored floor tiles to one of your environments and it failed to render:

self.grid.set(i, j, Floor(color=self._rand_elem(colors)))

  File "site-packages/gym/core.py", line 233, in render
    return self.env.render(mode, **kwargs)
  File "gym-minigrid/gym_minigrid/minigrid.py", line 1279, in render
    highlight_mask=highlight_mask if highlight else None
  File "gym-minigrid/gym_minigrid/minigrid.py", line 527, in render
    tile_size=tile_size
  File "gym-minigrid/gym_minigrid/minigrid.py", line 469, in render_tile
    obj.render(img)
  File "gym-minigrid/gym_minigrid/minigrid.py", line 177, in render
    r.setLineColor(100, 100, 100, 0)
AttributeError: 'numpy.ndarray' object has no attribute 'setLineColor'

Am I using them wrong? Thanks!

Obstacles

I found a bug in the Dynamic Obstacles environment. When I run the script below, I get an assertion error after only a few iterations.

import random
import time
import gym
from gym_minigrid import *

env = gym.make("MiniGrid-Dynamic-Obstacles-5x5-v0")
while True:
    action = random.randint(0, 6)
    observation, reward, done, info = env.step(action)
    # env.render("human")
    # time.sleep(1/20)
    if done:
        env.reset()

Add optional goal compass

An optional observation variable that points in the direction of the first goal rather than giving the orientation of the agent. The observable image by itself provides very little information for the agent to advance (even a human looking at the partially observable grid would have to rely on random guessing). This would remove the need for the agent to take arbitrary steps to discover more of the grid.
The direction can simply be the inverse tangent of the difference between the goal and the current position:
dir_radians = numpy.arctan( (goal[1] - agent_pos[1]) / (goal[0] - agent_pos[0]) )
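
One caveat with the arctan of a ratio: it divides by zero when the goal and the agent share the same x coordinate, and it cannot tell opposite directions apart. A minimal sketch using the two-argument arctangent avoids both problems. The function name goal_compass is made up for this example, and positions are assumed to be (x, y) tuples.

```python
import math


def goal_compass(agent_pos, goal_pos):
    """Angle in radians from the agent to the goal, in the range (-pi, pi]."""
    dx = goal_pos[0] - agent_pos[0]
    dy = goal_pos[1] - agent_pos[1]
    # atan2 handles dx == 0 and keeps the quadrant information
    return math.atan2(dy, dx)


print(goal_compass((2, 2), (2, 5)))  # same column: no division by zero
print(goal_compass((2, 2), (0, 2)))  # goal on the negative x side
```

Note that in a grid where y grows downward, a positive angle points towards larger y, so the sign convention may need flipping depending on how the compass is consumed.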

Option for observations as RGB array

It would allow for a greater range of use cases if there was an option that could be passed to train using RGB arrays instead of only the image encodings. The readme describes a way to do this with get_obs_render, but it would be nice if it could be included as an argument.

Would you consider adding something like this?

planetceres@9c2ae3c

Expected Performance?

Could anyone provide what type of performance they get on these domains with some of the provided baseline algorithms?

It is probably overkill to maintain something like a leaderboard, but it would be helpful to have some mechanism to confirm that the RL algorithms are behaving as expected and everything is installed correctly. Sometimes plots of reward vs episode number are provided in the readme. In addition, it would be cool to hear about any hyperparameters that were found to be particularly important for good performance.

For at least an order-of-magnitude example based on my installation, without making any changes to the repo, I saw convergence to ~0.9 mean reward in 130,000 timesteps when running this command:

python3 /home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py --env-name MiniGrid-Empty-6x6-v0 --no-vis --num-processes 1 --algo a2c

which took about 5mins on my i7-6700 w/ 1060GTX, though I don't know whether the GPU was being used by default.

For the 8x8 empty domain, it took about 45 mins and 1.8M timesteps to converge to ~0.9 reward with this command:

python3 /home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py --env-name MiniGrid-Empty-8x8-v0 --no-vis --num-processes 1 --algo a2c

Lava Tile

Taking inspiration from the work done by @planetceres. I think it would be neat to have a new kind of tile/object which is lava, that is, the agent dies if it tries to go over it. This is useful for studying safety in RL. I would probably draw the tile in orange with some little triangular wave lines on it.

This should be accompanied by a MiniGrid-LavaCrossing-v0 environment where the agent has to get through a gap between lava tiles in order to get to the goal. The number of lava crossings should be configurable in the environment's constructor (i.e. num_crossings).

To see how tiles/objects are implemented, check out gym_minigrid/minigrid.py

For examples of how environments are implemented, see: gym_minigrid/envs

Bug: agent's view rectangle calculation

Environment: MiniGrid-Empty-5x5-v0
Wrappers: ImgObsWrapper + ViewSizeWrapper

When the agent_view_size is even (e.g. 6 as in the example below), an agent observes:

######
######
######
X--###
---###
--^-##

instead of

######
######
######
X--###
---###
--^###

Legend: # walls, - empty, X goal, ^ agent facing up (forward).

As you see, there's an empty cell on the right of an agent, while in reality (in environment's grid) there's a wall.

This particular behavior is the effect of this line of code, which "places" an agent himself here (or what he carries). Actually, there's no bug here.

The real cause is in how the agent's view rectangle is calculated in get_view_exts. There, the topX and topY calculation doesn't take into account that, for an even agent_view_size, the number of cells to the left and to the right of the agent differs.

E.g. with agent_view_size = 6, the agent views 3 cells on the left and 2 on the right (plus the 1 it occupies, which sums to 6): ---^--. With agent_view_size = 7, the agent views 3 cells on both the left and the right: ---^---
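
The left/right split described above can be written down as a small piece of arithmetic. This is only an illustration of the counts in the report, assuming the agent sits at column agent_view_size // 2 of its view row; it is not the code from get_view_exts.

```python
def view_split(agent_view_size):
    """Return (cells_left, cells_right) of the agent within one view row."""
    agent_col = agent_view_size // 2
    return agent_col, agent_view_size - agent_col - 1


def view_row(agent_view_size):
    """Draw the agent's row using the legend from the report."""
    left, right = view_split(agent_view_size)
    return "-" * left + "^" + "-" * right


for size in (6, 7):
    print(size, view_split(size), view_row(size))
```

Any extent calculation that assumes the same number of cells on both sides will therefore be off by one cell for even sizes.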

Extending Support for Four Rooms

Different number of rooms in a gridworld fashion (no of rooms: 1 ,2,3,4 . .. . )
Baseline - 2 room gridworld, 2X2, top row - red room, bottom row – green room. No doors.

- Can generate gridworld env with number of rooms as input
- Should also take color of the rooms as input (default, same and options for all different)
- Should also take the size of the gridworld as an input (default size is smallest grid has some number of grids.
- Options for Doors (random placement)

Different number of independent Agents. (number as input 1,2,3,4)
Baseline: 1 Agent at top left in the red room

- Random start places or fixed start places.
- One grid can contain more than one agents ? ( flag)
- Representation pixel styles of the Agent (Random or different color or patterns)
- Agent Properties:
  - Partial or full observability. (front 3 by 3 grid) or front(triangle rather than square)
  - Agent can move in four 2-D directions
  - Agent can interact with other objects
    - Blocked by the wall
    - Pick a ball
    - Drop a ball
    - Open door ( lock and door environment)

Different number of balls. ( 1,2,3,4)
baseline: 1 ball at the top right in the red room

- Random start place or fixed start places.
- One grid can contain more than one agents ? ( flag)
- Representation pixel styles of the Agent (Random or different color or patterns)
- Ball Properties :
  - Balls can be picked up and dropped down.
  - Ball explodes (May be) (Additional Ball behavior according to color)

Goal Condition
Baseline: Move the ball to green room, and the robot should also be in green room.

- Move red balls to red room.
- Or Move all balls to specified room.
- Agent may be asked to get back to initial position or to any of the rooms.

Fix numpy 1.15.4 dependency

Hi, what is required for fixing the numpy 1.15.4 dependency? I have a situation where gym-minigrid downgrades the numpy version installed by pytorch via conda.

How can I transform the render ?

Hello,
I'm using gym-minigrid and I need to get the rendering image, transform it and then display it.

After the "transformation", I get a np.array matrix with 3 channels (RGB). Is there a way to give this result to the environment (for instance the Render class ?) in order to display it ?

Thanks in advance !

PS: In other environment, I used to do something like

from gym.envs.classic_control import rendering

env_unwrapped = gym.make("...").env.unwrapped

img = env_unwrapped.ale.getScreenRGB2()
image_transformed = .... 

if env_unwrapped.viewer is None:
    env_unwrapped.viewer = rendering.SimpleImageViewer()

env_unwrapped.viewer.imshow(image_transformed)
return env_unwrapped.viewer.isopen

but apparently env_unwrapped has no attribute viewer.

Four-rooms environment

I think it might be useful to implement the classic "four rooms" environment, which is used in many classic reinforcement learning papers.

Eg: http://rstb.royalsocietypublishing.org/content/royptb/369/1655/20130480/F2.large.jpg?width=800&height=600&carousel=1

The environment should ideally have configurable start and goal positions, as parameters to the constructor. If these are not specified, they should be chosen at random every time the environment is reset.

List of potential improvements

I'm opening an issue to keep track of potential improvements to a MiniGrid v2.0. These are things I would do differently if I had to do it again. I'm not sure if these will get merged in the current version of MiniGrid because there is a big risk of breaking existing code, and this is particularly sensitive since the people using this code are using it for research purposes, where replicability is important. What may eventually happen is that I create a MiniGrid2 repository.

Separate out the gridworld from the OpenAI Gym interface. As pointed out in #37, it would make sense to have a gridworld class separate from the OpenAI Gym environment class. This would help us support multi-agent type of setups. It might also be slightly cleaner. We are already part of the way there with the current Grid class.
The agent should be treated like any other gridworld object. This again goes in line with multi-agent support. I think it would also be cleaner.
Observations should be encoded using a one-hot scheme rather than integers corresponding to each object type and color. The observations may not actually be any bigger in terms of bytes taken if we use a numpy array of bools (bits). This would likely be easier for a neural network to decode. (Won't be done, results inconclusive)
By default, observations should be tensors, so that the code works out of the box with most RL frameworks. Mission strings should be provided as part of the info dict. The unfortunate truth is that OpenAI Gym has very poor support for any other kind of observation encoding.
Doors and locked doors should probably have the same object type. We could take advantage of a one-hot scheme here and have a "door" bit as well as a "locked" flag bit. Possibly, each object should have a method to produce its own encoding in terms of bits. (Done)
Some people have had difficulty installing PyQT. It might be nice to explore alternative rendering options. We could potentially generate graphics directly using 2D NumPy tensors, but we would still need some library to open display windows. I don't know what is the simplest Python option for that. The goal is for this package to work everywhere without any issues as much as possible.
Render current step, max steps, reward obtained in env.render('human').
Rename max_steps to max_episode_steps, which is more in keeping with OpenAI Gym conventions.

Other suggestions welcome.
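
To make the one-hot suggestion from the list above concrete, here is a small sketch. The object and color tables are invented for the example and are not MiniGrid's actual index tables.

```python
# Hypothetical tables for illustration only
OBJECT_TYPES = ["empty", "wall", "door", "key", "goal"]
COLORS = ["red", "green", "blue"]


def one_hot(index, size):
    """Return a list of booleans with a single True at the given index."""
    bits = [False] * size
    bits[index] = True
    return bits


def encode_cell(obj_type, color):
    """Concatenate a one-hot object type with a one-hot color."""
    type_bits = one_hot(OBJECT_TYPES.index(obj_type), len(OBJECT_TYPES))
    color_bits = one_hot(COLORS.index(color), len(COLORS))
    return type_bits + color_bits


print(encode_cell("door", "green"))
```

With this layout, a "locked" flag such as the one mentioned for doors would simply be one more bit appended to the cell encoding.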

Can I change the origin of observation?

Hello,
Is there a way to change the origin of observation? Currently the observation covers the 7x7 area "in front" of the agent. I'm wondering if I can change it so that the observation is a 7x7 area centered on the agent's position.

Saving and restoring states

Does this environment support saving and restoring states?

For example, I want to save an exact map state in MiniGrid-Dynamic-Obstacles-5x5-v0 at some point and later I should be able to make the agent start from the saved state.

Gym's Atari environment has such a functionality: openai/gym#402 (comment)

Firefighter

Hey, very nice job!

I am wondering if you're planning to release a new environment for the firefighter problem, i.e., a grid world where a cell might have its state updated (burning, protected, none) after each iteration i.

(in a more simplistic configuration: a firefighter agent, an initial burning cell and a fixed object to protect)

Cheers!

Dynamic Room Environment

Is there an easy way to change the grid between different episodes?

I am looking to recreate some of the dynamic room environments, i.e where a shortcut opens up or a goal changes location after a certain number of timesteps or episodes.

New kind of tiles/objects ?

Hello,

I've been using gym-minigrid for a personal project and I need to add new kinds of tiles/objects.
For example, I'm adding a FoodSource object, which produces Food around it. The agent has an energy level, and taking actions uses energy. Taking Food adds energy to the agent.

My question is: are you interested in pull requests for new kinds of tiles/objects, or do you prefer to keep gym-minigrid as it is?

Thanks a lot for your work !
Antoine.

Wrappers not working correctly

Hello,

RGBImgObsWrapper() no longer seems to extract the dictionary items.

EDIT: Really great env nonetheless :-)

How to use Class FullyObsWrapper in 'wrappers.py'

Hi,
It's really a great project! After reading the related issues about FullyObs, I want to try it, but I don't know how to use the FullyObsWrapper class in the file 'wrappers.py'. Should I import it, or change the code of the function 'gen_obs(self)' in the file 'minigrid.py'?
Thank you!

Old obstacles positions in observation from env.step() in Dynamic-Obstacles environment

Hello, I found a problem in the Dynamic-Obstacles environment.
The problem is that in env.step(action), the obstacles position is updated after calling the env.step() of the base class, and the observation is not updated with the new position of the obstacles. So, the returned observation by env.step() has the obstacles in the old position.

Example:
obs, reward, done, info = env.step(action) # in obs, obstacles are in the old position
pixmap = env.render('pixmap') # in pixmap, obstacles are in the new (correct) position

Workaround: manually update the observation after env.step()
obs, reward, done, info = env.step(action)
obs = env.gen_obs()

Suggested fix:
In dynamicobstacles.py, change the return after the obstacles positions update from:
return obs, reward, done, info
to:
return self.gen_obs(), reward, done, info

Get position of the goal

Is there a way to get the position ( x, y) of the goal?

I can find which grid cell is my goal with this loop:

    for grid in env.grid.grid:
        if grid is not None and grid.type == "goal":
            print("This grid is my goal")

but I don't know how to map this to (x, y) coordinates.
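
One possible way to recover coordinates, assuming the flat list is stored row by row (index = y * width + x), is to enumerate the list and split each index with divmod. The Cell class and the small list below are stand-ins so the sketch runs on its own; with a real environment the list and width would presumably come from env.grid.grid and env.grid.width.

```python
class Cell:
    """Stand-in for a grid object; only the `type` attribute is used."""

    def __init__(self, type):
        self.type = type


def find_cells(cells, width, wanted_type):
    """Return (x, y) for every cell of the wanted type in a flat, row-major list."""
    positions = []
    for index, cell in enumerate(cells):
        if cell is not None and cell.type == wanted_type:
            y, x = divmod(index, width)
            positions.append((x, y))
    return positions


# 3x2 grid with the goal in the bottom-right corner
cells = [None, Cell("wall"), None,
         None, None, Cell("goal")]
print(find_cells(cells, width=3, wanted_type="goal"))  # [(2, 1)]
```

If the storage order turned out to be column-major instead, the two values returned by divmod would simply swap roles.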

Observation tensor size error

I'm unable to train the model with the provided pytorch main.py script with --num-processes > 1.

When I run this command:

$ python3 /home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py --env-name MiniGrid-Empty-8x8-v0 --no-vis --num-processes 5 --algo a2c

The error I get is:

obs.shape: (5, 1875)
shape_dim0: 1
current_obs.shape: torch.Size([5, 1, 1875])
current_obs.shape: torch.Size([5, 1, 1875])
Traceback (most recent call last):
  File "/home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py", line 271, in <module>
    main()
  File "/home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py", line 98, in main
    update_current_obs(obs)
  File "/home/mfe/code/gyms/gym-minigrid/pytorch_rl/main.py", line 95, in update_current_obs
    current_obs[:, -shape_dim0:] = obs
RuntimeError: The expanded size of the tensor (1) must match the existing size (5) at non-singleton dimension 1

I believe this is related to the code on line 86 of main.py:

def update_current_obs(obs):
        print("obs.shape:", obs.shape)
        shape_dim0 = envs.observation_space.shape[0]
        print("shape_dim0:", shape_dim0)
        obs = torch.from_numpy(obs).float()
        print("current_obs.shape:", current_obs.shape)
        if args.num_stack > 1:
            current_obs[:, :-shape_dim0] = current_obs[:, shape_dim0:]
        print("current_obs.shape:", current_obs.shape)
        current_obs[:, -shape_dim0:] = obs

Index error in str encoding

It seems that the dictionary is outdated. I'm fixing it by adding the proper keys and values from the OBJ_TO_IDX dictionary.

https://github.com/maximecb/gym-minigrid/blob/a6678d060dfa3afa7f572e9528f99a3d1fc3b356/gym_minigrid/minigrid.py#L802

Fully observability

Hey,
I hadn't thought about two issues with the FullyObsWrapper:

- using the wrapper as it is, it renders an image of 800 x 800 x 3, which is quite heavy to manage
- one must call env.render('human')

The combination of the two makes the env very slow.

Any suggestion on how I can improve on it?

Thank you !

NumPy-based renderer (eliminate PyQT dependency)

Some users have difficulties installing PyQT, on clusters in particular. It would be useful to build a renderer which uses NumPy instead of PyQT. This should not be too complicated to do given that the 2D graphics of MiniGrid are very simple.

The first step will be to evaluate what the performance impact might be. A caching strategy may need to be used to maximize performance.

SIGSEGV on env.step for multiple renderings.

Hey there,
I'm experiencing a segmentation fault after I call env.step when there are multiple renderings open. I saw a closed thread on openai for the same issue, but I'm unsure if the bug still persists across some of their environments.

RecursionError in DynamicObstacles-Random-6x6

I got this in the middle of training on DynamicObstacles-Random-6x6. Not sure exactly how to reproduce it, I will try to run more experiments and see how often it happens.

  File "/opt/conda/envs/torch110/lib/python3.7/site-packages/gym/core.py", line 273, in step
    observation, reward, done, info = self.env.step(action)
  File "/opt/conda/envs/torch110/lib/python3.7/site-packages/gym_minigrid/envs/dynamicobstacles.py", line 75, in step
    self.place_obj(self.obstacles[i_obst], top=top, size=(3,3), max_tries=100)
  File "/opt/conda/envs/torch110/lib/python3.7/site-packages/gym_minigrid/minigrid.py", line 923, in place_obj
    raise RecursionError('rejection sampling failed in place_obj')
RecursionError: rejection sampling failed in place_obj

Minigrid.place_obj only avoiding agent start_pos

Hi! While looking at the method I saw that it is only checking the agent start_pos and not the current agent_pos. I can suggest a fix with a pull request if it was not intentional (it will also affect the Dynamic-Obstacles envs as it was counting that it could place an object on top of the agent and then check for overlap - so have to fix this as well if it changes).

            # Don't place the object where the agent is
            if np.array_equal(pos, self.start_pos):

Ref code

By the way great job with the envs. Thank you!

Submissions/papers using MiniGrid?

I'd like to compile a list of submissions and accepted papers using MiniGrid. If you've used MiniGrid in a paper on arxiv or published at a conference, or if you see papers in the wild using MiniGrid, please list them in comments :)

RedBlueDoors not trainable with the actual `max_step`

Hi Maxime,

I wasn't able to make the agent learn RedBlueDoors and I thought it was because my implementation of agent's memory was buggy but, in fact, it is because the max_step is too low.

In this line, I tried max_steps=20*size*size instead of max_steps=10*size*size and it works now.

Write the instruction in `env.render()` window

Do you think it is possible to write the text in env.render() window (at the bottom)?

It would be a really nice feature because everything would be displayed in the same window. Currently, making a GIF with the grid and the text one below the other is very hard: https://github.com/lcswillems/pytorch-a2c-ppo/blob/a0a19f994517a7fe198066685688ad66f50efa53/README-images/enjoy-gotodoor.gif

I need to precisely move my terminal and the rendering window.

Issue rendering on server

I have a problem when I import gym-minigrid as well as torch and then call the rendering function:
"dlopen: cannot load any more object with static TLS". I am trying to run the code on a server (it works on my local machine).

ImportError: dlopen: cannot load any more object with static TLS
args.optim_eps, args.clip_eps, args.epochs, args.batch_size, preprocess_obss)
File "/home/nicolas/InstrictMotivation/InstrictGoalReward/torch-rl/torch_rl/torch_rl/algos/ppo.py", line 18, in init
value_loss_coef, max_grad_norm, recurrence, preprocess_obss, reshape_reward)
File "/home/nicolas/InstrictMotivation/InstrictGoalReward/torch-rl/torch_rl/torch_rl/algos/base.py", line 78, in init
self.obs = self.env.reset()
File "/home/nicolas/InstrictMotivation/InstrictGoalReward/torch-rl/torch_rl/torch_rl/utils/penv.py", line 51, in reset
results = [self.envs[0].reset()] + [local.recv() for local in self.locals]
File "/home/nicolas/InstrictMotivation/InstrictGoalReward/torch-rl/scripts/train.py", line 91, in reset
img = self.env.render(mode="rgb_array")
File "/home/nicolas/gym-minigrid/gym_minigrid/minigrid.py", line 1269, in render
from gym_minigrid.rendering import Renderer
File "/home/nicolas/gym-minigrid/gym_minigrid/rendering.py", line 3, in
from PyQt5.QtGui import QImage, QPixmap, QPainter, QColor, QPolygon
ImportError: dlopen: cannot load any more object with static TLS

Multi-agent extension?

Hi, is there an easy way to extend this environment to support the multi-agent setting? It seems like the MiniGridEnv class assumes that there is only one agent in the environment. Would it be possible to have a wrapper that first instantiates a single grid environment, then calls the methods in MiniGridEnv only with respect to a particular agent?

I'm trying to get inspiration from Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, but I'm having trouble seeing exactly how and where to extend MiniGrid.

A naive approach would be simply to copy paste the relevant methods, e.g. have dir_vec1 and dir_vec2, right_vec1 and right_vec2, etc... However some methods like reset, _reward, _rand_int, steps_remaining need not be copied.

What would be an easy and elegant way to go about extending to multiple (2 or more) agents?

Best,
Kevin

Potential rendering bug

First of all, thanks for the great work!

When I was trying to render an rgb array with the environment MiniGrid-KeyCorridorS3R1-v0, the image came out totally scrambled. It seems to be either because it's a non-square environment or because there's some issue with the RoomGrid superclass. Could you look into this? Thanks!

Why not make a pip package?

Hi Maxime,

MiniGrid is growing fast!

I was wondering: why don't you create a pip package for MiniGrid? It would make your package easier to install. And why don't you create releases of your code? This way, you can make changes that are not backward compatible without breaking people's code. You can also detail the changes you made.

Best,
Lucas

import config

Minigrid uses a config module which is easily shadowed by local config files. Could this import be made relative or renamed?

Env from text file

Here we discuss the possibility of creating a simple environment from a txt file. A few considerations first:

1. The gym interface doesn't allow for extra arguments in gym.make; how can we handle this?
2. The core method should already be present in MiniGridEnv.__str__.
3. How diverse can the set of environments created from a txt file be?

My suggestion would be to "invert" the str function and use the same dictionary to generate the grid, with 1 character for each pixel indicating the type of the object, completing the rendering with the default color.

Regarding 1., maybe a wrapper that recreates the environment, taking the path of the text file as an argument?
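
A minimal sketch of the "inverted str" idea, under the assumption of one character per cell. The character table below is invented for the example and is not the dictionary used by MiniGridEnv.__str__; the parser only produces object type names and leaves building the actual Grid to the environment.

```python
# Hypothetical character table for illustration only
CHAR_TO_TYPE = {"W": "wall", "G": "goal", "A": "agent", " ": None}


def parse_grid(text):
    """Turn a block of text into a list of rows of object type names."""
    lines = text.strip("\n").split("\n")
    width = max(len(line) for line in lines)
    grid = []
    for line in lines:
        # Pad short lines so that every row has the same width
        grid.append([CHAR_TO_TYPE[ch] for ch in line.ljust(width)])
    return grid


layout = """
WWWWW
WA  W
W  GW
WWWWW
"""
grid = parse_grid(layout)
print(len(grid), len(grid[0]))  # 4 5
print(grid[2][3])               # goal
```

An unknown character raises a KeyError here, which is probably the behaviour you want for a hand-written level file.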

Introducing Dynamic Obstacles to the Empty Environment

Hi,

I want to introduce Dynamic Obstacles to the Empty Environment (MiniGrid-Empty). This could be represented by a moving colored square on the grid. The objective of the robot would be to navigate without colliding with these dynamic obstacles.

Moreover, I also want to introduce occlusion. If an obstacle lies in front of the robot, and another obstacle lies directly behind the first obstacle, the second obstacle should be hidden by the first one, i.e. the robot fails to see the 2nd obstacle.

The final environment will be similar to the Lava Crossing Environment with the following changes:

- Each obstacle (lava) is only 1 cell in size.
- Motion of the obstacles is random.
- Obstacle occlusion.

To achieve these tasks, could you kindly point me to the files to be modified?

Thank you

FullyObsWrapper mission attribute not updated

Hello there! I've found a small bug with the FullyObsWrapper I think. Cheers!

import gym_minigrid
from gym_minigrid.wrappers import FullyObsWrapper
import gym

env = gym.make('MiniGrid-Fetch-5x5-N2-v0')
env.reset()
wrapped_env = FullyObsWrapper(env)
print(env.mission, wrapped_env.mission)
wrapped_env.reset()
print("Unupdated mission variable",wrapped_env.mission)
print("Actual mission variable",wrapped_env.unwrapped.mission)

size mismatch when using FullyObsWrapper

When using the FullyObsWrapper (and using the recommended A2C algorithm from https://github.com/lcswillems/pytorch-a2c-ppo), no issues arise when training on a 8x8 grid. However, on a 16x16 grid the following happens:

Traceback (most recent call last):
  File "C:\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\pytorch-a2c-ppo\scripts\train.py", line 157, in <module>
    logs = algo.update_parameters()
  File "c:\pytorch-a2c-ppo\torch_rl\torch_rl\algos\ppo.py", line 32, in update_parameters
    exps, logs = self.collect_experiences()
  File "c:\pytorch-a2c-ppo\torch_rl\torch_rl\algos\base.py", line 131, in collect_experiences
    dist, value, memory = self.acmodel(preprocessed_obs, self.memory * self.mask.unsqueeze(1))
  File "C:\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\pytorch-a2c-ppo\model.py", line 94, in forward
    x = self.actor(embedding)
  File "C:\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Python36\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Python36\lib\site-packages\torch\nn\modules\linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Python36\lib\site-packages\torch\nn\functional.py", line 1024, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [16 x 1600], m2: [64 x 64] at c:\new-builder_2\win-wheel\pytorch\aten\src\th\generic/THTensorMath.cpp:2070

Any ideas as to what could be the cause of this?

Option for full observability

I think this would be useful for use cases where we want learning to converge to a deterministic policy (which is not optimal in some of the partially-observable settings). It would be nice to have this as an option; simply using the full grid encoding doesn't work because it doesn't represent the agent.

1x1 Room sizes

https://github.com/maximecb/gym-minigrid/blob/999599a412db112bc7efa9a0f72f8c315074f8bb/gym_minigrid/envs/multiroom.py#L201

Can this be turned into:
topX + self._rand_int(1, sizeX) - 1

To allow for rooms to be 1x1 size?

Observation does not distinguish between empty cells and unobserved cells

I think it would be beneficial (at least to me) to include the option of distinguishing between a cell which is observed to be empty, and a cell which is not seen at all (e.g. behind a wall). This is currently not the case, as both scenarios correspond to 0 values in the observation image.

I would probably try to do this in my own fork, but if there is any interest in pulling this to the main repo I would like to discuss it here because there's multiple ways to approach this:

1. Adding an item to the observation dictionary.
   - pros: does not change observation image, i.e. less likely to break existing code. Completely separates "what is seen" from "whether it is seen", so that the user can choose whether to use just one or both.
   - cons: The original ImgObsWrapper would ignore the addition and just return the original image.
2. Adding a cell type value to the observation image; perhaps this can be done by adding a new cell type altogether, "Unseen", which will only be used in observation grids?
   - pros: minor change to observation image, less likely to break code
   - cons: Arguably awkward to add a type which is not associated with the state of the true environment and its objects.
3. Adding a binary channel to the observation image.
   - pros: dedicated channel, "unseen" is not a physical property inherent to the environment so it makes sense to separate it from things like the actual items.
   - cons: major change to observation image, more likely to break existing code.

Personally, I think option 1 would work best: Existing code will continue working as before, and methods which want to exploit the new mask are able to do so. New observation wrappers can be added to handle the case where the user wants just the image, or the image and the "mask".

Thoughts?
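
To illustrate the first option (an extra item in the observation dictionary), here is a rough sketch with plain Python lists. The key name 'visible' and both helper functions are placeholders invented for this example, not existing MiniGrid names.

```python
def add_visibility_mask(obs, vis_mask):
    """Return a copy of the observation dict with an extra 'visible' entry."""
    new_obs = dict(obs)
    new_obs["visible"] = [list(row) for row in vis_mask]
    return new_obs


def image_only(obs):
    """Mimic an image-extracting wrapper: the extra key is simply ignored."""
    return obs["image"]


obs = {"image": [[0, 2], [0, 0]], "mission": "get to the goal"}
mask = [[True, True], [True, False]]
new_obs = add_visibility_mask(obs, mask)

# Two cells hold the value 0, but only one of them was actually observed
print(new_obs["image"][1][0], new_obs["visible"][1][0])
print(new_obs["image"][1][1], new_obs["visible"][1][1])
```

Code that only reads the image keeps working unchanged, which is the main argument for this option.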

Installation instructions did not work.

I followed the installation instructions but I did not get a working minigrid system.

I installed in a conda virtual environment. First I did:

source activate tf
pip3 install gym-minigrid
./manual_control.py

But there is no manual_control file created in the current directory or anywhere else that I could find.

Then I tried the other installation method:

git clone https://github.com/maximecb/gym-minigrid.git
cd gym-minigrid
pip3 install -e .

After this I saw a copy of manual_control.py, which succeeded in running a simple environment. However, I could not load other environments. The following (taken from the instructions) failed:
./manual_control.py --env_name MiniGrid-Empty-8x8-v0
I believe the problem is conda not knowing the correct path for the packages, but it is my experience that 95% of any problem using python is getting it to use the right installer, package versions, paths and interpreter.
