
juliapomdp / pomdps.jl


MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.

Home Page: http://juliapomdp.github.io/POMDPs.jl/latest/

License: Other

pomdps markov-decision-processes julia artificial-intelligence control-systems reinforcement-learning reinforcement-learning-algorithms mdps python

pomdps.jl's Introduction

POMDPs


This package provides a core interface for working with Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). The POMDPTools package acts as a "standard library" for the POMDPs.jl interface, providing implementations of commonly-used components such as policies, belief updaters, distributions, and simulators.

Our goal is to provide a common programming vocabulary for:

  1. Expressing problems as MDPs and POMDPs.
  2. Writing solver software.
  3. Running simulations efficiently.

POMDPs.jl also integrates with other ecosystems; see the documentation for details.

For a detailed introduction, check out our Julia Academy course! For help, please post in the GitHub Discussions tab. We welcome contributions from anyone! See CONTRIBUTING.md for information about contributing.

Installation

POMDPs.jl and associated solver packages can be installed using Julia's package manager. For example, to install POMDPs.jl and the QMDP solver package, type the following in the Julia REPL:

using Pkg; Pkg.add("POMDPs"); Pkg.add("QMDP")

Quick Start

To run a simple simulation of the classic Tiger POMDP using a policy created by the QMDP solver, you can use the following code (note that POMDPs.jl is not limited to discrete problems with explicitly-defined distributions like this):

using POMDPs, QuickPOMDPs, POMDPTools, QMDP

m = QuickPOMDP(
    states = ["left", "right"],
    actions = ["left", "right", "listen"],
    observations = ["left", "right"],
    initialstate = Uniform(["left", "right"]),
    discount = 0.95,

    transition = function (s, a)
        if a == "listen"
            return Deterministic(s) # tiger stays behind the same door
        else # a door is opened
            return Uniform(["left", "right"]) # reset
        end
    end,

    observation = function (s, a, sp)
        if a == "listen"
            if sp == "left"
                return SparseCat(["left", "right"], [0.85, 0.15]) # sparse categorical distribution
            else
                return SparseCat(["right", "left"], [0.85, 0.15])
            end
        else
            return Uniform(["left", "right"])
        end
    end,

    reward = function (s, a)
        if a == "listen"
            return -1.0
        elseif s == a # the tiger was found
            return -100.0
        else # the tiger was escaped
            return 10.0
        end
    end
)

solver = QMDPSolver()
policy = solve(solver, m)

rsum = 0.0
for (s,b,a,o,r) in stepthrough(m, policy, "s,b,a,o,r", max_steps=10)
    println("s: $s, b: $([s=>pdf(b,s) for s in states(m)]), a: $a, o: $o")
    global rsum += r
end
println("Undiscounted reward was $rsum.")

For more examples, including examples with visualizations, see the Examples and Gallery of POMDPs.jl Problems sections of the documentation.

Documentation and Tutorials

In addition to the above-mentioned Julia Academy course, detailed documentation and examples can be found here.


Supported Packages

Many packages use the POMDPs.jl interface, including MDP and POMDP solvers, support tools, and extensions to the POMDPs.jl interface. POMDPs.jl and all packages in the JuliaPOMDP project are fully supported on Linux. macOS and Windows are supported for all native solvers*, and most non-native solvers should work but may require additional configuration.

Tools:

POMDPs.jl itself contains only the core interface for communicating about problem definitions; these packages contain implementations of commonly-used components:

  • POMDPTools (hosted in this repository)
  • ParticleFilters

Implemented Models:

Many models have been implemented using the POMDPs.jl interface for various projects. This list contains a few commonly used models:

  • POMDPModels
  • LaserTag
  • RockSample
  • TagPOMDPProblem
  • DroneSurveillance
  • ContinuumWorld
  • VDPTag2
  • RoombaPOMDPs (Roomba Localization)

MDP solvers:

Package | Online/Offline | Continuous States-Actions | Rating³
--- | --- | --- | ---
DiscreteValueIteration | Offline | N-N | ★★★★★
LocalApproximationValueIteration | Offline | Y-N | ★★
GlobalApproximationValueIteration | Offline | Y-N | ★★
MCTS (Monte Carlo Tree Search) | Online | Y (DPW)-Y (DPW) | ★★★★

POMDP solvers:

Package | Online/Offline | Continuous States-Actions-Observations | Rating³
--- | --- | --- | ---
QMDP (suboptimal) | Offline | N-N-N | ★★★★★
FIB (suboptimal) | Offline | N-N-N | ★★
BeliefGridValueIteration | Offline | N-N-N | ★★
SARSOP* | Offline | N-N-N | ★★★★
NativeSARSOP | Offline | N-N-N | ★★★★
ParticleFilterTrees (SparsePFT, PFT-DPW) | Online | Y-Y²-Y | ★★★
BasicPOMCP | Online | Y-N-N¹ | ★★★★
ARDESPOT | Online | Y-N-N¹ | ★★★★
AdaOPS | Online | Y-N-Y | ★★★★
MCVI | Offline | Y-N-Y | ★★
POMDPSolve* | Offline | N-N-N | ★★★
IncrementalPruning | Offline | N-N-N | ★★★
POMCPOW | Online | Y-Y²-Y | ★★★
AEMS | Online | N-N-N | ★★
PointBasedValueIteration | Offline | N-N-N | ★★

¹: Will run, but will not converge to the optimal solution.

²: Will run, but convergence to the optimal solution is not proven, and it will likely not work well on multidimensional action spaces. See also https://github.com/michaelhlim/VOOTreeSearch.jl.

Reinforcement Learning:

Package | Continuous States | Continuous Actions | Rating³
--- | --- | --- | ---
TabularTDLearning | N | N | ★★
DeepQLearning | Y¹ | N | ★★★

¹: For POMDPs, it will use the observation instead of the state as input to the policy.

³: Subjective rating; file an issue if you believe one should be changed.

  • ★★★★★: Reliably computes a solution for every problem.
  • ★★★★: Works well for most problems; may require some configuration or may not support every corner of the interface.
  • ★★★: May work well, but could require difficult or significant configuration.
  • ★★: Not recently used (condition unknown); may not conform to the interface exactly or may have package compatibility issues.
  • ★: Not known to run.

Performance Benchmarks:

  • DESPOT

*These packages require non-Julia dependencies

Citing POMDPs

If POMDPs is useful in your research and you would like to acknowledge it, please cite this paper:

@article{egorov2017pomdps,
  author  = {Maxim Egorov and Zachary N. Sunberg and Edward Balaban and Tim A. Wheeler and Jayesh K. Gupta and Mykel J. Kochenderfer},
  title   = {{POMDP}s.jl: A Framework for Sequential Decision Making under Uncertainty},
  journal = {Journal of Machine Learning Research},
  year    = {2017},
  volume  = {18},
  number  = {26},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v18/16-300.html}
}

pomdps.jl's People

Contributors

aidmandorky, ajkeith, alexbork, bozenkhaa, deyandyankov, dressel, dykim07, dylan-asmar, ebalaban, etotheipluspi, felixmg312, fredcallaway, github-actions[bot], himanshugupta1009, johannes-fischer, juliatagbot, lassepe, logankilpatrick, maximebouton, michaelhatherly, mossr, mykelk, neroblackstone, potatoboiler, rejuvyesh, shushman, tawheeler, tkelman, whifflefish, zsunberg


pomdps.jl's Issues

standard simulate() function

Hi all, has anyone discussed having a standard simulate() function to run a simulation once a policy has been solved for? This would be useful for both policy evaluation and documentation - someone can easily look at the simulate function to see how all of the functions in the interface work together.

The arguments might be the problem, an initial belief, the policy, and the random number source. Should I go ahead and implement this (or perhaps it belongs in POMDPToolbox)?
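
For concreteness, here is a minimal sketch of what such a function might look like, assuming generic interface functions (transition, observation, reward, discount, action) and a belief updater are available; the names and signature are illustrative, not a finished proposal:

# Illustrative sketch only: a bare-bones simulation loop for a POMDP policy.
# Assumes transition/observation/reward/discount/action/update are defined for the problem.
function simulate(pomdp, policy, updater, initial_belief, initial_state, rng; max_steps=100)
    b = initial_belief
    s = initial_state
    r_total = 0.0
    disc = 1.0
    for _ in 1:max_steps
        a = action(policy, b)                        # policy maps belief to action
        sp = rand(rng, transition(pomdp, s, a))      # sample next state
        o = rand(rng, observation(pomdp, s, a, sp))  # sample observation
        r_total += disc * reward(pomdp, s, a)
        disc *= discount(pomdp)
        b = update(updater, b, a, o)                 # belief update
        s = sp
    end
    return r_total
end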

random states (perhaps actions and observations as well)

Here is another issue that came up for me. In order to refill an empty particle filter, I'd like to randomly sample states. State creation should, obviously, happen on the problem side. Given our current interface, I don't see a clean way of generating a random state, but here are a couple of options that come to mind:

  1. Use rand!(). This is my preferred option, but the only two distribution types currently supported through the interface are transition distributions and observation distributions. We could, for example, add create_state_distribution() that would return an AbstractDistribution that can then be immediately used in rand!(). I can also see the same being useful for actions and observations down the road.
  2. Have create_state() and the like return a random state rather than some arbitrary/empty/invalid state. Given their current form, we would not be able to pass an RNG into create_* functions though, and this option also takes us away from our philosophy of create_* functions serving pre-allocation purposes only.

Can anybody propose something better/easier that I am overlooking?
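
To make option 1 concrete, here is a rough sketch of what the problem writer's side could look like (create_state_distribution, MyPOMDP, and MyStateDistribution are hypothetical names used only for this example):

using Random

# Hypothetical sketch of option 1: the problem exposes a sampleable state distribution.
struct MyPOMDP
    n_states::Int
end

struct MyStateDistribution
    n::Int
end

Base.rand(rng::AbstractRNG, d::MyStateDistribution) = rand(rng, 1:d.n)

create_state_distribution(pomdp::MyPOMDP) = MyStateDistribution(pomdp.n_states)

# Refilling an empty particle filter could then look like:
rng = MersenneTwister(1)
d = create_state_distribution(MyPOMDP(10))
particles = [rand(rng, d) for _ in 1:100]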

rollout!()

Something that we may want to add to the interface in the future is the ability for a user to define a custom rollout function in case there is a really fast way to estimate the reward of a rollout for a particular problem.

rng as an argument to rand

Hi guys,

I was just looking through this quickly, and I noticed that the rand!() function for abstract distribution does not take a random number generator as an argument. I have found the ability to supply an AbstractRNG to make rand!() deterministic very useful for debugging. I think that this approach is superior to setting the global random seed, and it is used in all of the random functions in Julia 0.4 (http://julia.readthedocs.org/en/latest/stdlib/numbers/?highlight=rand#Base.rand). Should we add an optional AbstractRNG argument to rand!()?
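
For reference, this is the pattern already used by Julia's built-in random functions, which the proposal would mirror (a self-contained example, not POMDPs.jl code):

using Random

rng = MersenneTwister(42)     # seeded generator makes the run reproducible
x = rand(rng, 1:10)           # same value every run with the same seed
v = Vector{Float64}(undef, 5)
rand!(rng, v)                 # in-place fill, analogous to the proposed rand!(rng, ...)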

Rename policy functions

In the spirit of keeping function names as short as possible without losing meaning (as per the Julia style), perhaps we should get rid of the get_ in front of the policy functions.

PolicyState

A policy should map policy states to actions

Up until this time, we have conceptualized a policy to be a mapping from beliefs to actions, and we have given responsibility for defining the belief structure and update behavior to the problem writer. However, a more general formulation that is perhaps more appropriate in this context is that a policy maps a policy state (that may be a belief) to an action. This reflects the fact that many advanced POMDP solvers have a closely associated belief representation, so they maintain and update their own state or belief. Thus, belief maintenance should be under control of the solver writer rather than the problem writer (though the problem writer will provide a structure for representing a distribution over states that can be used to represent the initial belief, and may provide exact belief updates for the policy to use if they are available).

The policy state will take different forms for different solvers. For example, the policy state for an MCVI policy is the node in the policy graph, the policy state for a POMCP policy is the action-observation tree and associated particle filter, and the policy state for a policy that feeds a baby when it is crying is simply the previous observation.

In order to implement this new paradigm, we are considering adding three things to the interface (there will probably be other minor changes associated with this as well).

First, we will introduce a new abstract type

abstract PolicyState

Second, we will introduce a function to create an initial policy state at the beginning of the simulation

function initial_policy_state(policy::Policy, belief::Belief)

(also the standard create_policy_state(p::Policy) will be declared)

Finally, we will replace the belief function in the simulation loop with a function that updates the policy state (which, again, could be a belief representation)

function update(policy::Policy, ps_old::PolicyState, a::Action, o::Observation, ps_new=create_policy_state(policy))

A simulation loop will look like this:

function simulate(sim::MySimulator, pomdp::POMDP, policy::Policy)

    # initialize belief, state, stopping criteria, allocate stuff etc
    ...

    # create the policy state from the initial belief
    ps = initial_policy_state(policy, belief)
    ps2 = create_policy_state(policy)

    while ... # stopping criteria

        a = action(policy, ps, a)
        r += reward(pomdp, s, a)

        # sample the next state and observation
        trans_dist = transition(pomdp, s, a, trans_dist)
        rand!(sim.rng, s2, trans_dist)
        obs_dist = observation(pomdp, s2, a, obs_dist)
        rand!(sim.rng, o, obs_dist)

        ps2 = update(policy, ps, a, o, ps2)

        # switch things around so we don't have to allocate more
        tmp=s; s=s2; s2=tmp;
        tmpps=ps; ps=ps2; ps2=tmpps;
    end
end

Issues

There are a few issues that we need to nail down before adding this:

Type Hierarchy

The proper location of PolicyState in the type hierarchy is not completely clear. Originally we thought that Belief and PolicyState should simply be aliased:

typealias PolicyState Belief

However, this is a little strange because a PolicyState may not be sampleable even though it would then be a subtype of AbstractDistribution.

Belief should perhaps be a subtype of PolicyState because it is a policy state that can be sampled. However, since Julia does not allow multiple inheritance, Belief could then no longer be an AbstractDistribution.

Another option is to declare that PolicyState is not connected by inheritance to any other types, and that if a policy state is based on the belief, it should include the belief as a member, i.e.

type MyPolicyState <: PolicyState
    b::Belief
end

Another wrinkle to think about is that, for an MDP, the policy state could be the MDP state itself, but I think we should just handle this separately with an extra action(..., s::State,...) function like we have now.

Arguments

We need to decide for sure what the arguments for each function should be. In particular, we need to answer whether the following arguments should be in the following functions:

  • pomdp in action? (I think no because the policy should have knowledge of the pomdp from the call to solve())
  • pomdp in create_policy_state? (I think no for the same reason as above)

Problem Specific Belief Updates

Should POMDPs.jl still contain a standard belief update interface (I think it should), or should we break it out into a different package that can focus more on it? Should it look any different? Does this have any bearing on (#32)?

Does anyone have any thoughts on this?

Naming of solver

Should the convention be for packages to define things like SARSOPSolver, or should it be just Solver for simplicity? And when necessary use SARSOP.Solver?

current function signature for action is invalid

The current function signature for action is

action(p::Policy, state::State, action=create_action(pomdp))

This is not valid because pomdp, the argument to create_action, is not one of the arguments of action.

How should we fix this? If a policy is required to be able to specify an action given the state, then a create_action(::Policy) method makes sense. I think that's the best option. What do y'all think?

isterminal() for observations?

Since we give "special" status to terminal states, should we also be able to distinguish terminal observations? I used to have a sanity check in DESPOT where I verified whether a terminal observation was generated upon transition to a terminal state. It's not critical though. Does anyone else see any need for such a function?

State Indexing

When dealing with discrete MDPs it is not clear what the best way to obtain the utility/action for a given state is. The current implementation of value iteration creates an array of utility values, so an index is needed for access. For example, calling action(policy, state) or value(policy, state) requires a mapping from the state (which can be a concrete type) to the index represented by that state.
There are a few possible options to deal with this:

  • Create a dictionary that maps states to their indices in each solver (this would require the user to define hash() and == functions)
  • Define the policy and the utility as a dictionary
  • Require the user to define an indexing function that has form index(mdp::POMDP, state::Any)

What do you all think?
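
As an illustration of the first option, a solver could build the mapping once it can enumerate the states; GridState and the surrounding names are made up for this sketch:

# Sketch: build a state-to-index map once, then use it for utility lookups.
struct GridState
    x::Int
    y::Int
end

state_list = [GridState(x, y) for x in 1:3 for y in 1:3]
state_index = Dict(s => i for (i, s) in enumerate(state_list))   # default == and hash are value-based for immutable bits types

utilities = zeros(length(state_list))        # would be filled in by value iteration
value_of(s::GridState) = utilities[state_index[s]]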

display functions

I think it would be handy to have display functions for states, actions, and observations (e.g. display_state, display_action, and display_obs). They, if implemented, would produce domain-specific output (e.g. "next action: go right", instead of "next action: 2") that users could utilize for monitoring execution (particularly in the case of online solvers).

Someday... Documentation/warnings about missing functions and explicit importing

Printing an error when a function is not implemented for a POMDP model is good, but we need to make sure that we have clear documentation about explicitly importing the functions that you are implementing - it is rather confusing if you have, for example, reward() clearly written in CustomPOMDP.jl, but you are getting the error CustomPOMDP does not implement reward because you didn't explicitly import POMDPs.reward.

In addition to documentation, we might also improve the error messages (e.g. by printing out the currently available methods of the function) or even implement a simple lint-like tool to help users.

I realize this is down the line a ways, but writing this out in an issue makes me feel more at peace that it will get done in the future.

allocating observation sample

Hi all,

Currently the interface does not appear to contain a mechanism for allocating an observation. When a sample state needs to be generated, I can use create_state() [which is not currently documented btw] to allocate a state and then rand!() to fill it, but create_observation() currently creates a distribution, not a single observation. This conflict stems from the fact that the term "observation" is currently overloaded to refer to either a single observation or a distribution of observations.

How should we solve this? Is there another word that we could use? Is there a word that fits in the following analogy: "state" is to "transition" as "observation" is to ___?

Tiger example

The readme for POMDPs.jl has links to two examples, grid world for MDPs and the tiger problem for POMDPs, but the link to the tiger problem goes nowhere. I looked in the "examples" folder as well and found grid world but no tiger problem.

POMDP Tutorial link broken

The POMDP Tutorial link in the README is not working for me.

The browser says:

400 : Bad Request
We couldn't render your notebook
Perhaps it is not valid JSON, or not the right URL.
If this should be a working notebook, please let us know.
The error was: HTTP 403: Forbidden

I am logged into github, so it seems that I don't have the proper credentials on ipython.org

Observation function depends on the resulting state

It would be good to update the documentation to make sure it is clear that the function depends on the resulting state (s') that follows from taking action a. This created confusion for Zach and Edward, so we should clarify it for others.

Remove !?

Should we remove the ! functions? So, for example, instead of:

transition!(distribution, pomdp::POMDP, state::State, action::Action)

we would have

transition(pomdp::POMDP, state::State, action::Action; distribution = create_transition_distribution(pomdp))

or something like that? It would return distribution. Is there a performance difference? The benefit of this change is that it allows you to have simpler calls if you want the distribution to be allocated for you. I think this would also allow you to work with immutables if you wanted. What do folks think?

Hashing States in Search Trees

In the current implementation of MCTS, the tree is represented by a dictionary that is keyed by states and actions. That is, the tree is hashed by the concrete types that represent a given state or action. This is problematic for two reasons:

  • The user would need to define their own Base.hash and == functions for their states and actions in order for the haskey function to work on the tree
  • Hashing on mutable types is not recommended, see here. Since we want to support rand!(rng, state, distribution), we can't make states immutable.

So what should we do here? One option is to have a mapping between states and indices, but this will not work for continuous spaces (if MCTS with double progressive widening is used).

This seems like a problem that could come up in other places, so I am posting it here for visibility.
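
For reference, the first bullet would look roughly like this on the problem side (CarState is an illustrative state type, not from any package):

# Sketch: value-based equality and hashing for a mutable state type
# so it can be used as a dictionary key in a search tree.
mutable struct CarState
    x::Float64
    y::Float64
end

Base.:(==)(a::CarState, b::CarState) = a.x == b.x && a.y == b.y
Base.hash(s::CarState, h::UInt) = hash(s.y, hash(s.x, h))

tree = Dict{CarState,Int}()
tree[CarState(1.0, 2.0)] = 1
haskey(tree, CarState(1.0, 2.0))   # true because of the definitions above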

update_belief and update_belief!

Is there a reason that we have both of these? Shouldn't we just pick one so that no one has to worry about supporting both? Also, we need to add the one we choose to the readme.

Should there be an actions(::POMDP, ::Belief, ::AbstractSpace) method?

Tree-based solvers like POMCP need to evaluate all (or some in the sparse or dpw case) of the actions available at a sample belief state. Right now we only have the actions(::POMDP, ::State, ::AbstractSpace) method.

By default, the method would return the full action space, so that it only needs to be implemented in cases where only certain actions are available from certain beliefs.

@pomdp_func actions(pomdp::POMDP, b::Belief, aspace::AbstractSpace=actions(pomdp)) = aspace

Macro to automatically generate error messages for interface functions

Hi all, I am working on a macro that will automatically generate the error messages for the functions in the interface. I talked to Max about this yesterday, and I think it will be pretty useful.

Instead of

create_state(pomdp::POMDP) = error("$(typeof(pomdp)) does not implement create_state")

We'll just be able to type

@pomdp_func create_state(pomdp::POMDP)

or something similar in the source and it will automatically generate the default error implementation. I think this will help us be more consistent and correct, and if we want to change the way these errors show up we can do it all from one place.

I don't really think pomdp_func is a great name - if anyone has any ideas, let me know.

Interface Changes

This issue will be a central place to discuss implementation of the changes decided on in our October 2nd in-person meeting. This should fix #32, #33, and #34

I created a branch (b43ecf8), but the only changes I have made so far are to the README. I have changed the format somewhat (if we don't like this I'm fine with going back).

First big question: Should I continue with this branch, or just start pushing straight to master?

get rid of DiscretePOMDP?

Is anyone using the DiscretePOMDP type? To me, it seems like just having one POMDP type is adequate. Does someone have a strong reason for having it? (Note: I think I may have been the one to propose it in the first place - if so, I no longer think it's necessary)

Introduce BeliefUpdater type

I think it would be very useful to define a BeliefUpdater type. Here are the main benefits, as I see them:

  • Allows use of different belief update methods for the same belief type. For instance, for a particle-based belief, one may choose an exact belief updater for a small problem and a particle-filtering updater for a larger problem.
  • Similarly to the Solver type, allows for a clean way of providing configuration parameters for belief updaters and preallocation/storage of their frequently used variables (thus possibly improving performance). An alternative to this would be to store such parameters and variables in either the belief structure or the POMDP structure, neither of which would be ideal, in my opinion.

We'd need to modify the belief function as follows:

belief(bu::BeliefUpdater, pomdp::POMDP, belief_old::Belief, action::Any, obs::Any, belief_new::Belief=create_belief(pomdp))

action() uniformity with the rest of the interface

action() has been slightly different from the other functions in this interface since the beginning. If I understood correctly, the argument for this was that the action type is less likely than other types to be big and mutable and will be generated less often, so the performance hit of allocating every time is not that large. Now that we have finished transitioning away from ! functions (#23), should we make action more uniform with the rest of the interface (i.e. add an optional argument for pre-allocated memory)?

One decision that needs to be made regarding this is what the argument for create_action() would be. On one hand

create_action(pomdp::POMDP)

seems natural because the action type should depend on the POMDP. However, this precludes its use as a default allocator for action() because action() does not take the POMDP as one of its arguments.

The alternative is to use

create_action(policy::Policy)

This seems reasonable, except that it requires knowledge of the problem along with the declaration of the policy type, so it violates the separation of solver and problem code.

Perhaps both of these functions should exist - the first written by the problem-writer and the second (which will probably call the first) written by the solver-writer and used as the default final argument for action(). Or should we add the pomdp as one of the arguments for action()?

Require ==, hash(), and iterator functions for State, Action, and Observation types?

For DESPOT I need to use states, actions, and observations as associative keys in various places. I also find it useful to be able to determine, for example, if two different Action objects represent the same exact action. So far the cleanest solution I found is defining the == operator (and thus isequal()) and the hash() function for each of the types in the problem code, so that they can be used in dictionaries and such.

Likewise, at least for the time being, I need to iterate through ranges of states, actions, and observations, so I've implemented start(), next(), and done() for each of the types.

I am guessing that this will be useful to more solvers than just DESPOT. If so, should we require that problem writers provide these operators/functions along with the state, action, and observation types? Can you think of a more elegant solution? We already have index() for discrete POMDP states in the interface, by the way. It would be replaced by hash(), which would just return the state index in the simplest case or some unique user-defined hash number for a more complex representation. Same for actions and observations.
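
To illustrate the iteration part of this request (in later Julia versions the start/next/done protocol was replaced by Base.iterate, which this sketch uses; the types are made up for the example):

# Sketch: a two-element state space that can be iterated over.
struct DoorState
    tiger_left::Bool
end

struct DoorStateSpace end

Base.iterate(::DoorStateSpace, i=1) = i > 2 ? nothing : (DoorState(i == 1), i + 1)
Base.length(::DoorStateSpace) = 2

for s in DoorStateSpace()
    println(s)   # DoorState(true), then DoorState(false)
end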

terminal states

Hi All,

This isn't urgent, but I wanted to make a stub to start the discussion. Should we have a way of detecting terminal states to notify rollout-based solvers that there will be no additional reward? e.g.

is_terminal(state) returns true if the state is terminal

Why is domain() necessary?

I can't remember if we've discussed why domain() is in the interface. Why not just define start(), next(), and done() directly for subtypes of AbstractSpace?

The only reason I can think of is that it might be convenient to just have domain() return an array in some cases. Is there something else?

POMDP belief representations vs solver/belief updater belief representations

Here is an issue with our current approach to belief management:

  • initial_belief() or create_belief() return a belief representation that's problem-specific (i.e. POMDP-specific). For the purposes of explanation, let's assume that initial_belief(my_pomdp) returns a 10-element vector of particles that is then stored in current_belief.
  • Now, in my [online] solver, I'd like to use a higher-fidelity 500-particle belief representation. Furthermore, I'd also like to use the same 500-particle representation for maintaining the belief state in the simulation loop (via our belief() function).
  • It would, obviously, be undesirable to try to convert back and forth between the 10-particle (POMDP) and 500-particle (solver/belief updater) representations.

Here are a few possible solutions:

  1. During the first iteration of the simulation, the belief structure is converted from 10 to 500 particles inside action(), then again inside belief() (current_belief goes into belief() as a 10-particle vector, then comes out as a 500-particle updated_belief). Both current_belief and updated_belief then remain 500-particle vectors until the end of the simulation. This is kind of ugly, especially since we'd have to ensure that the conversions inside action() and belief() are done in exactly the same way (alternatively, we could return the converted belief vector from action() and pass it into belief()).
  2. Use a wrapper type, like what Zach uses, and change the actual representation within that wrapper structure. This, to me, is not all that different from the first option and also requires the user to know about some MyBeliefWrapper type and construct an instance of it before calling action() and belief().
  3. Add initial_belief() and create_belief() versions that take the solver and the belief updater as arguments and return a belief structure that's tailored to solver/belief updater needs from the very beginning. We would still retain the current versions of initial_belief() and create_belief(), of course:

create_belief(pomdp::POMDP)
create_belief(pomdp::POMDP, solver::Solver, bu::BeliefUpdater)

initial_belief(pomdp::POMDP, belief = create_belief(pomdp))
initial_belief(pomdp::POMDP, solver::Solver, bu::BeliefUpdater, belief = create_belief(pomdp))

The third option is my preferred solution (assuming that we go ahead and implement the BeliefUpdater type, see #32), but I am open to other ideas, of course.

JMLR Paper Roadmap

I thought it might be worthwhile to have a list of deliverables that can also serve as a roadmap for the POMDPs.jl paper.

The following is a minimal list of solvers I think we need to support. A finished solver should have extensive documentation, run tests and should be validated.

Have to have solvers:

Nice to have solvers (variants can now be found in POMDPSolve):

  • PBVI and simple variants (no repo)
  • Witness (no repo)
  • AMDP (no repo)
  • Perseus (no repo)

The following support tools are provided:

  • POMDPToolbox (still not clear what this should contain)
  • POMDPXFile (SARSOP file generation and .policy file parsing)
  • POMDPFile
  • POMDPDistributions (Most agree we don't need this)
  • POMDPSimulator (part of POMDPToolbox now)
  • PLite.jl high-level modeling language modeled after JuMP

Miscellaneous:

  • Main Doc
  • [POMDPModels.jl]

Benchmarks against the following solvers would be a nice addition to the paper:

  • AIToolbox (C++, supports VI, MCTS, QMDP, POMCP, Witness, AMDP, Perseus, and others)
  • pomdp-solve (C++, supports PBVI, Witness and others)
  • MDP Toolbox (MATLAB, supports VI, policy iteration and LP methods)
  • PyMDP Toolbox (python, supports VI and variants)

Please add or edit as you see necessary.

reward should only have the reward(state, action, statep) method

Currently, if a solver uses reward(s, a), it will not work with any problems that have reward defined based on s, a, s'. We either need to only have one method available (this is my preference, and what we decided to do for observation), or document that solvers must use the s, a, s' version.
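
One way to make the second option painless for problem writers would be a generic fallback, sketched below with hypothetical types: solvers always call the three-argument form, and it falls back to the two-argument form when the reward does not depend on s'.

# Sketch: three-argument reward falls back to the two-argument version.
abstract type MyPOMDP end

reward(p::MyPOMDP, s, a, sp) = reward(p, s, a)    # generic fallback

struct BabyPOMDP <: MyPOMDP end
reward(::BabyPOMDP, hungry::Bool, feed::Bool) = hungry ? -10.0 : 0.0

reward(BabyPOMDP(), true, false, false)           # -10.0 via the fallback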

FYI: POMCP is ready for public consumption/improvement

Hey all,

Didn't know how to notify interested people about this in any way other than creating an issue. FYI I added POMCP to the list of solvers. It's pretty bare-bones right now, and probably pretty slow compared to what it could be, but it's ready for initial consumption if AA228ers want to use it or help me improve it.

I'll be adding features as I use it more.

observation from current state

Hi all,

I am having trouble figuring out how to represent a class of problems using our framework. In most of the control problems that I am familiar with, for example the LQG problem, the controller is allowed to make a decision based on an observation produced by the current state. This does not appear to be possible using the current framework. In the current framework, since the distributions of s_{t+1} and o_t are independent and based on s_t and a_t, there is no way for an observation based on s_{t+1} to be used to make decision a_{t+1}.

An alternative way of stating this difficulty is that it is impossible to express a correlated joint distribution for s' and o in the current framework. If this were possible then the issue would be resolved.

Another alternative way of thinking about this is that, if the pomdp has a generative model, G, the current framework only allows one to define
o = G1(s,a) and
s' = G2(s,a),
but in order to express many problems, we would want to define
(s', o) = G(s,a) as in the POMCP paper.

A similar problem might also arise if the reward is a function of both s' and s. That is, as in the POMCP paper, what we really want to define in a generative model is
(s', o, r) = G(s,a)
This does not appear to be possible in the current framework. Am I missing something? Is there a way to do this?

How do you think we should fix this? This will take some thought, so I am not going to make a suggestion right now.
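
For concreteness, here is a sketch of what a joint generative definition could look like; the gen_step name and the NoisyWalk problem are purely illustrative, not a proposal for the actual function name:

using Random

# Sketch: a single generative step returning correlated s', o, and r.
struct NoisyWalk end

function gen_step(::NoisyWalk, s::Float64, a::Float64, rng::AbstractRNG)
    sp = s + a + 0.1 * randn(rng)     # next state
    o  = sp + 0.05 * randn(rng)       # observation depends on s', so they are correlated
    r  = -abs(sp)                     # reward may depend on the next state as well
    return (sp, o, r)
end

sp, o, r = gen_step(NoisyWalk(), 0.0, 1.0, MersenneTwister(1))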

Abstract simulate

An idea is to keep simulate abstract (meaning have it just error out in POMDPs.jl) and provide some basic implementations in POMDPToolbox.jl. As mentioned in #19, the idea comes from RLGlue. See fig. 2 of this paper. You will see the "agent program" (policy), the "experiment program" (simulator), and the "environment program" (pomdp).

We seem to have two options:

  1. Commit to keeping POMDPs.jl implementation-free. Provide a basic implementation in POMDPToolbox.jl.
  2. Provide a basic implementation of simulate in POMDPs.jl for educational reasons (as suggested by @zsunberg). Of course, we can just include a basic implementation in the documentation of POMDPs.jl and not necessarily in the code itself (with the idea of keeping POMDPs.jl pristine).

keyword argument in initial_belief()

@ebalaban , just curious if you meant to make the last argument of initial_belief a keyword argument. All the other functions just have optional arguments. If there's a reason it's totally fine though

Belief Initialization

Currently, there are no rules on how a belief should be initialized. A belief is attached to a pomdp, so having a create function for it seems to make sense. There are two ways that I think might work here:

  • Have something like create_belief(pomdp) that returns an initial belief instance.
  • Have create_state_distribution(pomdp) return a distribution that can be used in transition!() and in update_belief!(). This means that both the belief and the transition distribution are the same. This is reasonable because both are strictly distributions over states.

What does everyone think?

rename 'rand' to 'sample'?

I think that calling a function that samples from a distribution 'rand' is somewhat unintuitive, particularly since one is likely to call rand(rng, ...) of some kind inside its implementation. How about calling this function 'sample' instead? Has there been a discussion on this already that I missed?

AbstractSpace type alias of AbstractDistribution

Hi team,

What is the reasoning behind AbstractSpace being identical to (a type alias of) AbstractDistribution? To me, they seem like they represent different things conceptually. It would be helpful to have the concepts described in documentation or here in this issue.

Observations conditioned on state-action-state

Right now we have P(o|a, s'). Should we allow P(o|s, a, s')?

PROS: More flexible. Edward can imagine cases where this might make things easier. Of course, one can always add stuff to the state. The POMDP file format allows for P(o|s, a, s').

CONS: Maybe it will slow things down? Adds another argument. Not used often?

execute_action function?

It seems to me that we may want the following function in the API (or something similar):

obs, reward = execute_action(pomdp, action)

Then we can use the observation to update our belief state. This is handy for online solvers that generate one action at a time. Am I missing something in the API that can support this already? Simulate(...) is not quite the same thing.
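
A rough sketch of the idea, with the environment owning the hidden true state (all names are illustrative; this is not a proposal for the exact signature):

# Sketch: an "environment" wrapper that hides the true state and returns
# only an observation and a reward for each executed action.
mutable struct TigerEnv
    tiger_left::Bool
end

function execute_action(env::TigerEnv, a::String)
    if a == "listen"
        correct = rand() < 0.85                       # listening is right 85% of the time
        o = xor(env.tiger_left, !correct) ? "left" : "right"
        return o, -1.0
    else
        r = (a == "left") == env.tiger_left ? -100.0 : 10.0
        env.tiger_left = rand(Bool)                   # a door was opened, so reset
        return rand(Bool) ? "left" : "right", r
    end
end

env = TigerEnv(true)
o, r = execute_action(env, "listen")    # the belief can now be updated with o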

`belief` or `update_belief`

As I was doing the getting-rid-of-! refactor, I noticed this.

update_belief is now similar to transition, observation, etc. Why should it be named differently (with update at the beginning)? One potential reason is that the object to be modified may be passed in as the second argument in addition to its place as the final argument. I went ahead and changed it to belief, but we could keep it as update_belief. What do y'all think?

Also, the current setup could get a little weird because the problem-programmer might assume that the old belief and the new belief arguments are different objects and might do calculations in a way that doesn't function correctly if both are pointing to the same object. Is this ok?

Documentation policy

Hi guys, can we clarify our documentation policy?

Right now, I see documentation cropping up (or potentially cropping up in the future) in three places:

  1. In the readme
  2. In the source files
  3. In the "Main Doc" mentioned in the JMLR roadmap #30

To me this seems like one too many places. Can we clarify which of these will be supported so that we can put a lot of work into making at least one or two of them really good and consistent, rather than sporadically making changes and forgetting to fix documentation everywhere?

Do we want to try to maintain all three? Maybe the README should just have the function names and documentation should be elsewhere? Is there a tool for automatically converting inline documentation in the source to become the skeleton of a main doc?

better error messages

Debugging would be a whole lot easier if we made the error messages more verbose, e.g.

MyPomdp does not implement reward for states of type MyStateType and actions of type MyActionType

instead of just

MyPomdp does not implement reward

This would help when diagnosing problems like action accidentally returning the wrong thing or returning nothing.

Can I go ahead and start adding things to the error messages? Has anyone else been thinking about this? Do we want to have some kind of standard format or base it off of some principles?
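
As a small illustration of the proposed style, the default method could interpolate the argument types into the message (this is a sketch, not the exact wording used in the package):

# Sketch: a fallback that reports the problem and argument types in the error.
reward(pomdp, s, a) = error("$(typeof(pomdp)) does not implement reward for " *
                            "states of type $(typeof(s)) and actions of type $(typeof(a))")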

Space/distribution dimensions

I am working on wrapping the MCVI code in C, and I would like to access the dimensions of the state space. Can we have something like:

dims(space::AbstractSpace)

for spaces and maybe for distributions?

I really only need the number of state variables in the state space for MCVI, so maybe we can avoid creating a new method. I can get around this by having the user pass the number of dimensions of the state space into the solver as well.

What does everyone think?

Nitpick: "supported" in documentation

Shouldn't the documentation read
"The following MDP solvers support this interface:"
and
"The following POMDP solvers support this interface:"
instead of "... are supported"?

Sorry, this might seem kind of petty, but I think using clear, precise language can make a big difference in understanding.

discount part of POMDP

Have you all had the talk about whether the discount factor should be defined by the POMDP object? I noticed that it is part of the solver object in the QMDP implementation.

My initial impression is that it should be part of the POMDP object, i.e. we should make

discount(pomdp::POMDP)

part of the interface.
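
If this is adopted, a problem writer would just store the factor in the problem type and implement the accessor, e.g. (illustrative types):

# Sketch: discount factor carried by the problem definition.
struct MyTigerPOMDP
    discount::Float64
end

discount(p::MyTigerPOMDP) = p.discount

discount(MyTigerPOMDP(0.95))   # 0.95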
