Giter VIP home page Giter VIP logo

Comments (22)

d3sm0 avatar d3sm0 commented on July 4, 2024

How about stochastic transitions and maybe stochastic rewards? And maybe a better test suite?

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

I think stochastic transitions and rewards are already possible using the random number generation methods.

For a better test suite, I'm open to it, but would need more concrete suggestions.

from minigrid.

d3sm0 avatar d3sm0 commented on July 4, 2024

I'll think more about it. I never wrote one but I'll be happy to take care of it.

One colleague of mine was asking about the possibility of creating envs from a text file. It has been done for a long time before gym, and maybe it can be useful, for specific research.

I would also like to have a benchmark or leaderboard if you want over a subset of all the envs like the top 10 that ore more difficult.

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

It would make sense to open issues for all 3 of these things to discuss what's involved. Like, for tests, what tests make sense to have. For creating envs from text files, the format needs to be discussed. For the leaderboard, I don't know exactly how that works. One issue is that the maximum reward you can get is 1.0. Maybe if it was averaged over 100 runs?

from minigrid.

d3sm0 avatar d3sm0 commented on July 4, 2024

Maybe let's gather a list of potential improvements /changes and desiderata for the 2.0 and with this we can split the the issues and start working? At least that's what I understood what was this thread about. If not, sorry i'll open an issue for each of those.

Beside there are two more things that i was wondering, could do some rendering of some stats like current step, reward, action, and maybe value of the state. Especially the latter would be super useful in debugging the methods

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

Some of these improvements we can integrate in the current MiniGrid because they are not at risk of breaking existing code. We could already improve the tests, for instance. Creating envs from a text file too, possibly. The things that are the most risky are changing the APIs that people call when interacting with MiniGrid and changing the observation format.

We could definitely render current state, reward and action. Value of the state I'm not sure because that's dependent on the RL implementation you're using.

from minigrid.

abaisero avatar abaisero commented on July 4, 2024

Hello, just throwing in my 2c since I've taken interest in this project recently:

Separate out the gridworld from the OpenAI Gym interface. As pointed out in #37, it would make sense to have a gridworld class separate from the OpenAI Gym environment class. This would help us support multi-agent type of setups. It might also be slightly cleaner. We are already part of the way there with the current Grid class.

While OpenAI Gym is technically designed for single-agent environments in mind, there is nothing stopping people to use the same library / interface to handle multi-agent environments. E.g. env.step receives a tuple of actions, and returns a tuple of observations. At least, this is how I'm planning to handle this scenario.

The agent should be treated like any other gridworld object. This again goes in line with multi-agent support. I think it would also be cleaner.

+1

Observations should be encoded using a one-hot scheme rather than integers corresponding to each object type and color. The observations may not actually be any bigger in terms of bytes taken if we use a numpy array of bools (bits). This would likely be easier for a neural network to decode.

I don't think this is necessary; Yes, one should obviously use one-hot-encodings (categorical indices are not features!) but these can happen outside of the environment, e.g. I do it at the input of whatever model you want to use them with (e.g. with pytorch, one can just use torch.nn.Embedding.from_pretrained). On the other hand, having the index-version makes some things easier, e.g. to debug or inspect, one can print the observation map knowing what each index represents.

Doors and locked doors should probably have the same object type. We could take advantage of a one-hot scheme here and have a "door" bit as well as a "locked" flag bit. Possibly, each object should have a method to produce its own encoding in terms of bits.

Absolutely agree that they should be the same object type, what I'm not sure about is, why isn't the state of the object completely contained in the third channel? e.g. for doors, 0 = open, 1 = closed, 2 = locked. And this could be used to contain the state of other types of objects too.. This leaves some open questions, e.g. would each object have their own "state" encoding? For now only door have this type of state, but I don't see a reason why this same channel shouldn't be used more broadly.

from minigrid.

d3sm0 avatar d3sm0 commented on July 4, 2024

It would make sense to open issues for all 3 of these things to discuss what's involved. Like, for tests, what tests make sense to have. For creating envs from text files, the format needs to be discussed. For the leaderboard, I don't know exactly how that works. One issue is that the maximum reward you can get is 1.0. Maybe if it was averaged over 100 runs?

For the leaderboard I'm thinking more of a page with final score for the top 10 hardest environment and the learning curve of the top 3 algorithms, PPO, A2C, DQN. It should serve as a reference if someone wants to benchmark its' algorithm on this environment against the basic version. We could pick the OpenAi/Baselines implementation and just run MiniGrid or implement DQN in the linked repo in pytorch.

About tests if we are going to break the env in the upcoming weeks, I would rather work on it in the 2.0.

About text files I'm going to open an issue :)

from minigrid.

d3sm0 avatar d3sm0 commented on July 4, 2024

Some of these improvements we can integrate in the current MiniGrid because they are not at risk of breaking existing code. We could already improve the tests, for instance. Creating envs from a text file too, possibly. The things that are the most risky are changing the APIs that people call when interacting with MiniGrid and changing the observation format.

We could definitely render current state, reward and action. Value of the state I'm not sure because that's dependent on the RL implementation you're using.

Sounds fair, how about a debug mode with an optional string or variable to render?

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

While OpenAI Gym is technically designed for single-agent environments in mind, there is nothing stopping people to use the same library / interface to handle multi-agent environments. E.g. env.step receives a tuple of actions, and returns a tuple of observations. At least, this is how I'm planning to handle this scenario.

@abaisero That's a fair point.

I don't think this is necessary; Yes, one should obviously use one-hot-encodings (categorical indices are not features!) but these can happen outside of the environment, e.g. I do it at the input of whatever model you want to use them with (e.g. with pytorch, one can just use torch.nn.Embedding.from_pretrained). On the other hand, having the index-version makes some things easier, e.g. to debug or inspect, one can print the observation map knowing what each index represents.

We do have a Grid.decode function to translate back from observations though. I'd be curious to see how the training performance differs with a one-hot encoding. Would be happy if someone benchmarked it.

For the leaderboard I'm thinking more of a page with final score for the top 10 hardest environment and the learning curve of the top 3 algorithms, PPO, A2C, DQN. It should serve as a reference if someone wants to benchmark its' algorithm on this environment against the basic version.

@d3sm0 If you have the time to do this kind of study it would be awesome.

We could pick the OpenAi/Baselines implementation and just run MiniGrid or implement DQN in the linked repo in pytorch.

Would be happy to have sample code using OpenAI baselines in this repo, to serve as a reference, though we would then ideally want to maintain this code afterwards.

About tests if we are going to break the env in the upcoming weeks, I would rather work on it in the 2.0.

After thinking about this some more, I think that we may actually want to make the changes in this repo. The reason being, no code is ever perfect. We will likely need to make yet more changes in the future. Creating a new repo would just be kicking the can down the road, and people might be confused if there are multiple repos named MiniGrid. We will try to be careful and not break too many things at once, avoid unnecessary changes to outward-facing APIs.

from minigrid.

d3sm0 avatar d3sm0 commented on July 4, 2024

While OpenAI Gym is technically designed for single-agent environments in mind, there is nothing stopping people to use the same library / interface to handle multi-agent environments. E.g. env.step receives a tuple of actions, and returns a tuple of observations. At least, this is how I'm planning to handle this scenario.

@abaisero That's a fair point.

I don't think this is necessary; Yes, one should obviously use one-hot-encodings (categorical indices are not features!) but these can happen outside of the environment, e.g. I do it at the input of whatever model you want to use them with (e.g. with pytorch, one can just use torch.nn.Embedding.from_pretrained). On the other hand, having the index-version makes some things easier, e.g. to debug or inspect, one can print the observation map knowing what each index represents.

I do agree that the one-hot encoding should be left to the agent because is an architectural choice, but as we have seen with the FullyObsWrapper the number inside the encoding can change the results of the learning. Maybe a wrapper then?

We do have a Grid.decode function to translate back from observations though. I'd be curious to see how the training performance differs with a one-hot encoding. Would be happy if someone benchmarked it.

For the leaderboard I'm thinking more of a page with final score for the top 10 hardest environment and the learning curve of the top 3 algorithms, PPO, A2C, DQN. It should serve as a reference if someone wants to benchmark its' algorithm on this environment against the basic version.

@d3sm0 If you have the time to do this kind of study it would be awesome.

Indeed I do, cause I'm using this for my own research and for the repr challenge. But the algorithms are my own which is not ideal. Baselines changed a lot and it will change. Eventually we will just need to update the results from time to time, without maintaining the codebase, and run script that collects the results and render an html file with it.

We could pick the OpenAi/Baselines implementation and just run MiniGrid or implement DQN in the linked repo in pytorch.

Would be happy to have sample code using OpenAI baselines in this repo, to serve as a reference, though we would then ideally want to maintain this code afterwards.

About tests if we are going to break the env in the upcoming weeks, I would rather work on it in the 2.0.

After thinking about this some more, I think that we may actually want to make the changes in this repo. The reason being, no code is ever perfect. We will likely need to make yet more changes in the future. Creating a new repo would just be kicking the can down the road, and people might be confused if there are multiple repos named MiniGrid. We will try to be careful and not break too many things at once, avoid unnecessary changes to outward-facing APIs.

Fair, maybe we can just branch this, so who is using it will be stable on the master and iterate in it, starting from the proposed core changes.

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

I do agree that the one-hot encoding should be left to the agent because is an architectural choice, but as we have seen with the FullyObsWrapper the number inside the encoding can change the results of the learning. Maybe a wrapper then?

I like to make things as easy and as optimal out of the box for newcomers. I would like to benchmark the training performance of a one-hot type embedding vs the class indices we have now, and decide based on that. If the one-hot scheme trains noticeably faster, we might want to make it the default.

Indeed I do, cause I'm using this for my own research and for the repr challenge. But the algorithms are my own which is not ideal. Baselines changed a lot and it will change. Eventually we will just need to update the results from time to time, without maintaining the codebase, and run script that collects the results and render an html file with it.

If you write such a script I will merge it. What is the repr challenge?

from minigrid.

d3sm0 avatar d3sm0 commented on July 4, 2024

I do agree that the one-hot encoding should be left to the agent because is an architectural choice, but as we have seen with the FullyObsWrapper the number inside the encoding can change the results of the learning. Maybe a wrapper then?

I like to make things as easy and as optimal out of the box for newcomers. I would like to benchmark the training performance of a one-hot type embedding vs the class indices we have now, and decide based on that. If the one-hot scheme trains noticeably faster, we might want to make it the default.

Fair. I think can be related to #39 and #26. I'm not confident that the neural networks learn the same pattern if it's one-hot or real-valued. Eventually they should.

Indeed I do, cause I'm using this for my own research and for the repr challenge. But the algorithms are my own which is not ideal. Baselines changed a lot and it will change. Eventually we will just need to update the results from time to time, without maintaining the codebase, and run script that collects the results and render an html file with it.

If you write such a script I will merge it. What is the repr challenge?

This is the reproducibility challenge (repr) https://www.cs.mcgill.ca/~jpineau/ICLR2019-ReproducibilityChallenge.html

Alright i'll make the script. I'll open an issue for this, since it's not clear what the script should do.

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

@d3sm0 if you'd like to benchmark the one-hot encoding, I quickly hacked it together in the maximecb-onehot branch. I'd like to try this on non-trivial environment (eg: harder than the empty ones) and measure the time it takes to reach some success rate, as well as max success rate.

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

I've done some benchmarks of my own on the BabyAI GoTo and PutNextLocal environments and the results are inconclusive for the onehot change.

from minigrid.

d3sm0 avatar d3sm0 commented on July 4, 2024

@maximecb
Sweet thanks! What do you mean by inconclusive? what are you looking at? Only the cumulative reward?

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

@d3sm0 I mean that on some levels the success rate was very slightly higher, on others it was worse. Success rate as measured as the percentage of episodes ending with a positive reward.

from minigrid.

d3sm0 avatar d3sm0 commented on July 4, 2024

@maximecb I see. If you change the representation of the state, in principle the function approximator should behave differently, thus maybe is worth looking at the moments of the gradient of the neural network, the smoothness of the losses that you are considering and the cumulative reward over time during training.

I might run some experiments too over the weekend.

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

I've pushed a commit that unifies Door and LockedDoor. This simplifies some code.

from minigrid.

TheodoreGalanos avatar TheodoreGalanos commented on July 4, 2024

Hello! I was wondering if it is possible to assign a custom, arbitrary, shape for the grid world. I'm not sure if that is available already (apologies if it is!) but it would make for a nice addition. I plant to use RL for assessing occupant satisfaction and custom shapes allow for different layouts to be tested and would make minigrid a nice tool for my use case, although not sure how useful it'd be for the rest.

Kind regards,
Theodore.

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

@TheodoreGalanos Hi Theodore. Right now you can have square shapes of any size. I assume you mean support for environments that are rectangular, but not necessarily squares? I will try to get that working today.

from minigrid.

maximecb avatar maximecb commented on July 4, 2024

@TheodoreGalanos it's done. See the RedBlueDoors environment for an example: https://github.com/maximecb/gym-minigrid/blob/master/gym_minigrid/envs/redbluedoors.py

from minigrid.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.