juliareinforcementlearning / gridworlds.jl

46.0 8.0 9.0 35.49 MB

Help! I'm lost in the flatland!

License: MIT License

Julia 100.00%

reinforcement-learning gridworld grid-world gridworld-environment julia makie hacktoberfest

gridworlds.jl's People

findmyway sriyash421 sriram13m landrumb adinhobl juliatagbot kharyal sid-bhatia-0 looseterrifyingspacemonkey

gridworlds.jl's Issues

Should we store assets in a different repository?

@findmyway

The total repo size of GridWorlds.jl is already around 55MB.
I was wondering if we should store assets in a different repository GridWorldsAssets and only use links in this repository. I understand that this might not be a very good practice in general. It is likely that we will add and modify the gifs in the future. And as the number of environments increases, modifications become costly for gifs.

It would be nice to freely experiment with different images & gifs, and not worry about the repo size drastically increasing.

Replace Makie.jl playability with REPL playability

Deprecate Makie.jl playability and replace it with REPL playability.

Fix Travis CI

I think lazily loading Makie.jl should be enough to fix this issue.

ERROR: type FourRooms has no field agent_dir

julia> w3 = FourRooms()

julia> w3(MOVE_FORWARD)
ERROR: type FourRooms has no field agent_dir
Stacktrace:
 [1] getproperty(::FourRooms, ::Symbol) at ./Base.jl:33
 [2] (::FourRooms)(::Gridworld.MoveForward) at /home/sid/.julia/packages/Gridworld/yQBUD/src/envs/fourrooms.jl:25
 [3] top-level scope at REPL[10]:1

FourRooms functor tries to access w.agent_dir, a field that does not exist. In file fourrooms.jl

mutable struct FourRooms <: AbstractGridWorld
    world::GridWorldBase{Tuple{Empty,Wall,Goal}}
    agent_pos::CartesianIndex{2}
    agent::Agent
end

function (w::FourRooms)(::MoveForward)
    dest = w.agent_dir(w.agent_pos)
    if !w.world[WALL, dest]
        w.agent_pos = dest
    end
    w
end
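
One possible fix is sketched below, assuming the facing direction is stored on the agent field rather than on the environment. The Agent definition is not shown in this report, so `Agent`, its `dir` field, the direction structs, and `MiniRooms` (which uses a plain `BitMatrix` of walls in place of GridWorldBase) are hypothetical stand-ins rather than the package's actual types.

```julia
# Hypothetical stand-ins for illustration; not the package's real definitions.
abstract type Direction end
struct Up <: Direction end
struct Right <: Direction end
struct Down <: Direction end
struct Left <: Direction end

# A direction is callable: it maps a position to the neighbouring position.
(::Up)(p::CartesianIndex{2}) = p + CartesianIndex(-1, 0)
(::Right)(p::CartesianIndex{2}) = p + CartesianIndex(0, 1)
(::Down)(p::CartesianIndex{2}) = p + CartesianIndex(1, 0)
(::Left)(p::CartesianIndex{2}) = p + CartesianIndex(0, -1)

mutable struct Agent
    dir::Direction          # assumed location of the facing direction
end

mutable struct MiniRooms
    walls::BitMatrix        # true where a wall blocks movement
    agent_pos::CartesianIndex{2}
    agent::Agent
end

struct MoveForward end

function (w::MiniRooms)(::MoveForward)
    dest = w.agent.dir(w.agent_pos)   # was: w.agent_dir(w.agent_pos)
    if !w.walls[dest]
        w.agent_pos = dest
    end
    return w
end

# A 5×5 room surrounded by walls, with the agent in the centre facing right.
walls = trues(5, 5)
walls[2:4, 2:4] .= false
w = MiniRooms(walls, CartesianIndex(3, 3), Agent(Right()))
w(MoveForward())
```

With this layout the call no longer looks up a missing field, and a second `MoveForward` leaves the agent in place because the next tile is a wall.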

Allow configuring if agent can see through walls

Currently the agent can see through walls by default.

BoundsError in GoToDoor while moving beyond a door on the edge of the world

Trying to move beyond a door in the GoToDoor environment throws a BoundsError

Should there be an error in the first place? Also depends upon the semantics of making such an "illegal" move.

julia> play(w2)
Key bindings:
←: TurnLeft
→: TurnRight
↑: MoveForward
q: Quit
Error in c callback: 
BoundsError: attempt to access 8×8×8 BitArray{3} at index [2, 2, 0]
Stacktrace:
 [1] throw_boundserror(::BitArray{3}, ::Tuple{Int64,Int64,Int64}) at ./abstractarray.jl:541
 [2] checkbounds at ./abstractarray.jl:506 [inlined]
 [3] _getindex at ./abstractarray.jl:1082 [inlined]
 [4] getindex at ./abstractarray.jl:1060 [inlined]
 [5] getindex at /home/sid/.julia/packages/Gridworld/yQBUD/src/grid_world_base.jl:35 [inlined]
 [6] getindex at /home/sid/.julia/packages/Gridworld/yQBUD/src/grid_world_base.jl:36 [inlined]
 [7] (::GoToDoor{Gridworld.GridWorldBase{Tuple{Gridworld.Empty,Gridworld.Wall,Gridworld.Door{:red},Gridworld.Door{:green},Gridworld.Door{:blue},Gridworld.Door{:magenta},Gridworld.Door{:yellow},Gridworld.Door{:white}}}})(::Gridworld.MoveForward) at /home/sid/.julia/packages/Gridworld/yQBUD/src/envs/gotodoor.jl:32
 [8] (::Gridworld.var"#49#50"{GoToDoor{Gridworld.GridWorldBase{Tuple{Gridworld.Empty,Gridworld.Wall,Gridworld.Door{:red},Gridworld.Door{:green},Gridworld.Door{:blue},Gridworld.Door{:magenta},Gridworld.Door{:yellow},Gridworld.Door{:white}}}},Observables.Observable{GoToDoor{Gridworld.GridWorldBase{Tuple{Gridworld.Empty,Gridworld.Wall,Gridworld.Door{:red},Gridworld.Door{:green},Gridworld.Door{:blue},Gridworld.Door{:magenta},Gridworld.Door{:yellow},Gridworld.Door{:white}}}}},Base.RefValue{Bool}})(::Set{AbstractPlotting.Keyboard.Button}) at /home/sid/.julia/packages/Gridworld/yQBUD/src/render_with_Makie.jl:72
 [9] #invokelatest#1 at ./essentials.jl:710 [inlined]
 [10] invokelatest at ./essentials.jl:709 [inlined]
 [11] setindex!(::Observables.Observable{Set{AbstractPlotting.Keyboard.Button}}, ::Set{AbstractPlotting.Keyboard.Button}; notify::Observables.var"#6#8") at /home/sid/.julia/packages/Observables/0wrF6/src/Observables.jl:132
 [12] setindex!(::Observables.Observable{Set{AbstractPlotting.Keyboard.Button}}, ::Set{AbstractPlotting.Keyboard.Button}) at /home/sid/.julia/packages/Observables/0wrF6/src/Observables.jl:126
 [13] addbuttons(::AbstractPlotting.Scene, ::Symbol, ::GLFW.Key, ::GLFW.Action, ::Type{AbstractPlotting.Keyboard.Button}) at /home/sid/.julia/packages/GLMakie/4EXKe/src/events.jl:36
 [14] (::GLMakie.var"#keyoardbuttons#59"{AbstractPlotting.Scene})(::GLFW.Window, ::GLFW.Key, ::Int32, ::GLFW.Action, ::Int32) at /home/sid/.julia/packages/GLMakie/4EXKe/src/events.jl:124
 [15] _KeyCallbackWrapper(::GLFW.Window, ::GLFW.Key, ::Int32, ::GLFW.Action, ::Int32) at /home/sid/.julia/packages/GLFW/CBo9c/src/callback.jl:43
 [16] PollEvents at /home/sid/.julia/packages/GLFW/CBo9c/src/glfw3.jl:620 [inlined]
 [17] pollevents at /home/sid/.julia/packages/GLMakie/4EXKe/src/screen.jl:475 [inlined]
 [18] fps_renderloop(::GLMakie.Screen, ::Float64) at /home/sid/.julia/packages/GLMakie/4EXKe/src/rendering.jl:21
 [19] renderloop(::GLMakie.Screen; framerate::Float64) at /home/sid/.julia/packages/GLMakie/4EXKe/src/rendering.jl:47
 [20] renderloop(::GLMakie.Screen) at /home/sid/.julia/packages/GLMakie/4EXKe/src/rendering.jl:41
 [21] (::GLMakie.var"#47#49"{GLMakie.Screen})() at ./task.jl:356
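
If the intended semantics are that an out-of-bounds move is simply ignored, a bounds check before indexing would avoid the error. The sketch below assumes a 3-dimensional Bool array whose first dimension indexes object layers and whose last two dimensions are the spatial axes (consistent with the index `[2, 2, 0]` in the trace). `safe_destination` is a hypothetical helper, not part of the package.

```julia
# Hypothetical helper: return the destination if it lies inside the grid,
# otherwise keep the current position.
function safe_destination(grid::AbstractArray{Bool,3},
                          pos::CartesianIndex{2},
                          step::CartesianIndex{2})
    dest = pos + step
    # Layer 1 is used only to test the two spatial coordinates.
    inside = checkbounds(Bool, grid, 1, dest[1], dest[2])
    return inside ? dest : pos
end

grid = falses(8, 8, 8)

# Stepping left from column 1 would land on column 0, so the move is ignored.
blocked = safe_destination(grid, CartesianIndex(2, 1), CartesianIndex(0, -1))

# A step that stays inside the grid goes through unchanged.
allowed = safe_destination(grid, CartesianIndex(2, 2), CartesianIndex(0, 1))
```

Whether ignoring the move, terminating the episode, or throwing a more descriptive error is the right behaviour remains the open design question raised above.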

Pressing `q` key does not close the Makie window

Pressing the q key terminates the command play(w) in the REPL but does not close the Makie window, where I can still play with the agent. Is this intentional?

Create benchmark table

Summarize benchmarks using BenchmarkTools.@btime into a table so that multiple environments can be compared.
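
A rough sketch of such a table follows. To stay self-contained it uses Base's `@elapsed` with a median over repeated samples instead of BenchmarkTools.@btime, which would give more reliable numbers in practice. The workloads, `median_time_ns`, and `benchmark_table` are hypothetical placeholders for real environment operations.

```julia
using Printf
using Statistics

# Placeholder workloads; real entries would be environment operations
# such as constructing, resetting, or stepping an environment.
workloads = [
    ("sum_1e3", () -> sum(1:1_000)),
    ("sort_1e3", () -> sort(rand(1_000))),
]

# Median wall-clock time of `f` in nanoseconds over `samples` runs.
function median_time_ns(f; samples = 50)
    f()  # warm-up call so compilation time is not measured
    times = [(@elapsed f()) * 1e9 for _ in 1:samples]
    return median(times)
end

# Build a plain-text table with one row per workload.
function benchmark_table(workloads)
    io = IOBuffer()
    @printf(io, "%-12s %16s\n", "name", "median time (ns)")
    for (name, f) in workloads
        @printf(io, "%-12s %16.1f\n", name, median_time_ns(f))
    end
    return String(take!(io))
end

table = benchmark_table(workloads)
print(table)
```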

Adding pickup and drop action to the agent

The environments in #34, #32, and #26 need the agent to pick up and drop objects present in the environment. So I think we need to add an object field to the agent (assuming it can carry only one thing at a time). Currently, the door key env handles this by assuming that the agent picks up the key as it moves over its tile.
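
A minimal sketch of the idea, assuming the agent carries at most one object. `CarryingAgent`, its `inventory` field, `pickup!`, `drop!`, and the `Dict` of tile contents are all hypothetical names chosen for illustration.

```julia
abstract type AbstractObject end
struct Key <: AbstractObject end
struct Gem <: AbstractObject end

mutable struct CarryingAgent
    inventory::Union{Nothing,AbstractObject}   # at most one object at a time
end
CarryingAgent() = CarryingAgent(nothing)

const Tiles = Dict{CartesianIndex{2},AbstractObject}

# Pick up the object at `pos` if the agent's hands are free. Returns true on success.
function pickup!(agent::CarryingAgent, tiles::Tiles, pos::CartesianIndex{2})
    if agent.inventory === nothing && haskey(tiles, pos)
        agent.inventory = pop!(tiles, pos)
        return true
    end
    return false
end

# Drop the carried object on `pos` if that tile is empty. Returns true on success.
function drop!(agent::CarryingAgent, tiles::Tiles, pos::CartesianIndex{2})
    if agent.inventory !== nothing && !haskey(tiles, pos)
        tiles[pos] = agent.inventory
        agent.inventory = nothing
        return true
    end
    return false
end

tiles = Tiles(CartesianIndex(1, 1) => Key(), CartesianIndex(2, 2) => Gem())
agent = CarryingAgent()
picked = pickup!(agent, tiles, CartesianIndex(1, 1))        # takes the key
second = pickup!(agent, tiles, CartesianIndex(2, 2))        # refused: hands are full
dropped = drop!(agent, tiles, CartesianIndex(3, 3))         # key now lies at (3, 3)
```

Making pickup and drop explicit actions would replace the implicit pick-up-on-step behaviour described for the door key env.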

Benchmarking

Tracking this here instead of README to ensure our implementations do not have significant performance issues.

Accelerate parallel environment interactions with GPU

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Separate API for this package?

Ref: #146

Add DynamicObstacles

RLBase.reward(env) is not defined

Hi!

I'm working on recreating some basic RL policies, and trying to use the common interface of Environment and it looks like RLBase.reward is not implemented for EmptyRoom.

I can open a pull request, but wanted to ask if it was intentional or not!

Thanks!

Purpose of GoToDoor is not novel

GoToDoor does not seem to serve a purpose in terms of demonstrating something unique.
From the perspective of implementing the RLBase API for it, the purpose of this environment seems vague.

There is no goal to reach in this environment. get_reward and get_terminal become subjective here.
There is no point in having a door without a key if one can just walk through it just like one walks through the narrow empty entrance between rooms, as given in FourRooms.
If we are going to consider adding a key, then well, we already have DoorKey to demonstrate this concept.

I suggest we either come up with some tangible and useful purpose for it, or better, remove the GoToDoor environment altogether.

Environment constructors should reuse `reset!` method

Having all environment constructors reuse the reset! method for their environment has the following benefits:

Increased code reuse.
Increased consistency. It would be consistent if a freshly instantiated environment is just like an environment that has been reset.
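
As a sketch of this pattern, the constructor below builds the static parts of a toy environment and then delegates all randomized state to `reset!`. `TinyEnv`, its fields, and the keyword names are hypothetical and only illustrate the structure.

```julia
using Random

mutable struct TinyEnv{R<:AbstractRNG}
    walls::BitMatrix
    agent_pos::CartesianIndex{2}
    rng::R
end

# Place the agent on a random free tile.
function reset!(env::TinyEnv)
    free = findall(!, env.walls)
    env.agent_pos = rand(env.rng, free)
    return env
end

# The constructor sets up the walls, then reuses reset! for the initial state,
# so a fresh environment is built exactly like a reset one.
function TinyEnv(; n = 5, rng = Random.default_rng())
    walls = trues(n, n)
    walls[2:n-1, 2:n-1] .= false
    env = TinyEnv(walls, CartesianIndex(2, 2), rng)  # placeholder position
    return reset!(env)
end

a = TinyEnv(rng = MersenneTwister(1))
b = TinyEnv(rng = MersenneTwister(1))
```

Because both the constructor and `reset!` draw from the same code path, two environments built with equally seeded generators start in the same state.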

From tomorrow, I'll start working on a multi-agent environment.
I am thinking of a cooperative multi-agent version of the CollectGemsUndirected environment with full-observability. A few reasons/advantages for choosing this:

This problem seems easy to transition from single-agent to multi-agent. Both single agent and multi-agent versions have the same purpose, to collect as many gems as fast as possible.
Rewards are distributed somewhat evenly during the episode. I think this makes it easier to learn than sparse-reward environments (as in the case of goal-reaching environments).
The difficulty of this problem is easily tunable with the density of scattered gems and number of agents.
Visualization of behavior in this environment can aid in easily detecting if the agents are learning to collaborate with each other (for example, by collecting gems from different regions of the map) or not (competing for the same gems).
Fully-observable because we want the agents to have a broader context in order to be able to collaborate effectively. And it is simpler to implement than partially-observable too.

Let me know what you think.

Also, please suggest if there is a better platform to document our discussions. RL.jl has a discussions section. Maybe we can enable it for GridWorlds.jl @findmyway .

`GoToDoor` environment display looks a little jagged in the terminal

GoToDoor environment display looks a little jagged on the terminal.

Could we use a different character than the current door emoji? Something like a wall but a different color perhaps?

Keeping constant pixel dimensions for all characters would probably lead to a more uniform visual representation in the terminal.

Add agent's view as a subplot

Ref: http://makie.juliaplots.org/stable/makielayout/tutorial.html

DeepMind Lab2D

Maybe we can borrow some ideas from lab2d

Add sokoban

Ideally, this environment should be integrated in https://github.com/JuliaReinforcementLearning/Maze.jl

Add KeyCorridorSXRY

DynamicObstacles updates obstacles only on MOVE_FORWARD action

The obstacles in DynamicObstacles seem to only get updated on taking the MOVE_FORWARD action and not on the TURN_LEFT or TURN_RIGHT actions.

I was wondering if this is by design. After all, TURN_LEFT and TURN_RIGHT do consume a time-step, just like MOVE_FORWARD. So it seems more natural to me that the agent should be vigilant of the obstacles for each step it takes.
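
The alternative behaviour can be sketched as a single step function that moves the obstacles after every action, regardless of its type. The toy environment below is hypothetical: it tracks one obstacle that shifts one column per time-step, whereas the real environment moves obstacles differently.

```julia
abstract type Action end
struct MoveForward <: Action end
struct TurnLeft <: Action end
struct TurnRight <: Action end

mutable struct ToyObstacleEnv
    heading::Int        # 0 = up, 1 = right, 2 = down, 3 = left
    agent_col::Int
    obstacle_col::Int
    width::Int
end

act_agent!(env::ToyObstacleEnv, ::TurnLeft) = (env.heading = mod(env.heading - 1, 4); env)
act_agent!(env::ToyObstacleEnv, ::TurnRight) = (env.heading = mod(env.heading + 1, 4); env)
function act_agent!(env::ToyObstacleEnv, ::MoveForward)
    if env.heading == 1
        env.agent_col = min(env.agent_col + 1, env.width)
    elseif env.heading == 3
        env.agent_col = max(env.agent_col - 1, 1)
    end
    return env
end

# Simplified obstacle dynamics: shift one column to the right, wrapping around.
move_obstacles!(env::ToyObstacleEnv) = (env.obstacle_col = mod1(env.obstacle_col + 1, env.width); env)

# Every action consumes a time-step, so obstacles move after every action.
function step!(env::ToyObstacleEnv, action::Action)
    act_agent!(env, action)
    move_obstacles!(env)
    return env
end

env = ToyObstacleEnv(1, 1, 3, 5)
step!(env, TurnLeft())      # obstacle moves from column 3 to 4
step!(env, TurnRight())     # obstacle moves from column 4 to 5
step!(env, MoveForward())   # obstacle wraps to column 1, agent moves to column 2
```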

gym-maze

https://github.com/MattChanTK/gym-maze

API for GridWorlds.jl environments

This is to decide upon a consistent API that would be implemented by GridWorlds environments.

I propose to use the ReinforcementLearningBase interface (see here) for this package for the following reasons:

The RLBase interface is quite general and covers several typical RL scenarios, including but not limited to multiple agents, different stochasticity styles, and sequential or simultaneous moves. My hunch is that GridWorlds will soon expand and require such sophisticated features (say, multi-agent support for a gridworld-soccer-like environment).
It is well integrated with the rest of the JuliaRL ecosystem. RLCore utilizes RLBase and implements a lot of commonly required functionalities for RL tasks. RLZoo utilizes RLBase and implements several useful algorithms that can be used off the shelf. We can utilize all of that work for free.
If we choose to implement an independent API from scratch specifically tailored for GridWorlds, it is possible that we will encounter some conversion friction while trying to convert it to RLBase interface in order to work with the rest of the JuliaRL packages. In this case, we would mostly need to write our own implementations of the functionalities offered in RLCore or RLZoo. On the other hand, if we make a GridWorlds API from scratch that is as general as the RLBase one, such that we never encounter any lossy conversion, then well, we might as well have used RLBase API in the first place.
In the rare case that we require something not currently offered by RLBase (our current environments are already well covered, I think), this is an improvement opportunity for RLBase, and the RLBase interfaces would expand accordingly. If that does not happen for whatever reason, we can always expand our own API for such special cases while sticking to the RLBase interface as much as possible. Of course, since we would be incorporating features never implemented in the rest of the JuliaRL ecosystem, we would have to write our own implementations for such special features. But this is a rare case, I think. Moreover, if we do not follow the RLBase API, we have to write custom implementations anyway, without much choice.

Of course, this would add a dependency upon RLBase. But I feel that is very much worth it.

I would love to hear other people's thoughts on this.

`env[:, :, :] .= false` not working as expected

Broadcasted setindex! on AbstractGridWorld objects, for example env[:, :, :] .= false, doesn't actually modify the env object's world field, even though we have used the forward macro here.

If such behaviour is needed, use something like the following (for now) instead:

world = get_world(env)
world[:, :, :] .= false
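
The behaviour can be reproduced without the package. A plausible explanation (an assumption, not confirmed in this issue) is that `x[i...] .= v` goes through Base's internal `dotview`, which creates a view only for `AbstractArray`s and otherwise falls back to `getindex`. For a wrapper type that merely forwards `getindex`, the broadcast then writes into a temporary copy. `Wrapper` below is a hypothetical stand-in for an environment type.

```julia
# Hypothetical wrapper that forwards indexing to its `world` field,
# similar in spirit to what a forwarding macro would generate.
struct Wrapper
    world::BitArray{3}
end
Base.getindex(w::Wrapper, args...) = getindex(w.world, args...)
Base.setindex!(w::Wrapper, v, args...) = setindex!(w.world, v, args...)

w = Wrapper(trues(2, 3, 3))

# Broadcasted assignment: getindex returns a copy, so the copy is modified.
w[:, :, :] .= false
untouched = all(w.world)

# Plain (non-broadcast) assignment is forwarded by setindex! as expected.
w[1, 1, 1] = false
forwarded = !w.world[1, 1, 1]

# Workaround from the issue: broadcast into the underlying array directly.
w.world[:, :, :] .= false
cleared = !any(w.world)
```

If the explanation holds, making the environment type a subtype of `AbstractArray` or forwarding the view machinery as well would be possible long-term fixes. Both are design choices beyond what this issue states.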

Why not a `get_world` method instead of `convert(GridWorldBase, env)` method?

We currently have Base.convert(::Type{GridWorldBase}, env::AbstractGridWorld) = env.world

One of my reservations against the convert method is that it is a significantly lossy conversion. An environment struct will typically have extra information that GridWorldBase cannot hold. And the use of convert hints that this is okay, which is a little hard to digest.

Instead, I was wondering if we could have a simple getter method like get_world(env::AbstractGridWorld) = env.world

@findmyway

CollectGems env needs a rng argument for reproducibility
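
A minimal sketch of what an rng argument buys, using a hypothetical `scatter_gems` helper rather than the actual CollectGems code: passing an explicitly seeded generator makes the gem layout repeatable.

```julia
using Random

# Hypothetical helper: place `n_gems` gems on distinct tiles of an h×w grid,
# drawing all randomness from the supplied rng.
function scatter_gems(rng::AbstractRNG, h::Int, w::Int, n_gems::Int)
    gems = falses(h, w)
    positions = randperm(rng, h * w)[1:n_gems]   # distinct linear indices
    gems[positions] .= true
    return gems
end

layout_a = scatter_gems(MersenneTwister(42), 6, 6, 5)
layout_b = scatter_gems(MersenneTwister(42), 6, 6, 5)
```

Two calls with equally seeded generators produce identical layouts, which is what reproducible experiments need.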

Make separate module for each environment

Add MultiRoomNXSY

GSoC 2021 Work Product

cc @findmyway @jonathan-laurent

Here I'll summarize the work done by me as part of GSoC 2021 and also provide links to the related pull requests.

I added the first multi-agent reinforcement learning environment in this package, called CollectGemsUndirectedMultiAgent (later renamed to CollectGemsMultiAgentUndirected). This was done in #143.

I also experimented with batch environments for a couple of weeks (a struct-of-arrays-like collection of environments for improved performance). This work is in the draft PR #146. It would later prove useful when we support algorithms that can leverage a struct-of-arrays representation of environments for better performance.

Then there was a series of pull requests revamping each of the environments according to a newer, simpler design for the entire package. The new design involved placing each environment into its own module and reducing code reuse in favor of clarity. Revisiting each environment also meant a chance to look for performance improvements. Overall, we were able to get to most of the low-hanging fruit after this exercise. By this time, I had already established a system for playing these games interactively inside the Julia REPL, which proved immensely helpful while testing the environments. These are the related PRs: #153 #154 #155 #156 #159 #160 #161 #162 #164 #165 #166 #168 #169

#170 cleaned up a bunch of things pertaining to the old design, removing unused dependencies, structs, and methods.

#171 revisited benchmarking the performance of environments, which had been paused while revamping the new environments. This also provided a concise tabular format for the memory and (median) time usages for the most common operations on an environment.

#172 re-added the agent's view

#173 and #174 reorganized the code and cleaned up a number of miscellaneous things.

#175 made playing and replaying more robust

#176 provided input validation for the act! methods for all the environments. Additionally, it contained a few bug fixes.

#177 cleaned up and fixed a few miscellaneous things

#179 updated the benchmarks and #178 #180 #183 updated the documentation (in the README)

Finally #183 bumped the minor version and released v0.5.0. JuliaReinforcementLearning/ReinforcementLearning.jl#406 also updated the related experiments in RL.jl as per the latest version of GridWorlds.jl.

Allow rectangular grid dimensions

This can be added after the v0.1.0 release.
