I started trying to implement a new, saner vector API, and quickly realized that it is indeed as messy as I expected, so the code will have to wait for some time.
Here I want to dump my thoughts about how this whole thing should/could look so that we have a discussion going.
Main desired outcome: we can use the vector API to easily create envs vectorized through either simple vectorization, or jax vmapping (or any other fast mechanism). This can give us huge performance improvements for some envs without relying on additional external libraries. For other envs, we default to Sync/Async/EnvPool?
Current situation: vectorization is only possible via Sync/Async, which is slow af, but very general. EnvPool (not officially supported) only works with some envs, but is faster. Other existing options are generally similar to Sync/Async, with their own quirks (e.g. ray in rllib, or the custom implementation in SB3)
The main complication is wrappers. If an environment provides its own optimized vectorized version, then we can't apply single-env wrappers to it. A nice solution would be an automatic conversion from a `Wrapper` to a `VectorWrapper`, but that seems either very tricky or impossible to do in a general case. Fortunately, many actual wrappers don't need that "general case" treatment.
The hope I see for this is switching to lambda wrappers, at least for some of the existing wrappers. ActionWrappers, ObservationWrappers and RewardWrappers can in principle be stateful, which requires some workarounds to map them over vectorized envs. With lambda wrappers, we can literally just do a map.
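To make the "just do a map" idea concrete, here is a minimal sketch in plain Python. The class names `LambdaObservation` and `VectorLambdaObservation` are hypothetical (they are not existing gym classes), and the envs are reduced to bare observations to keep it short:

```python
from typing import Any, Callable, List, Sequence


class LambdaObservation:
    """Hypothetical stateless observation wrapper: holds only a pure function."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def observation(self, obs: Any) -> Any:
        return self.func(obs)


class VectorLambdaObservation:
    """Hypothetical vector counterpart: applies the same function per sub-env."""

    def __init__(self, single: LambdaObservation):
        self.func = single.func

    def observation(self, batch: Sequence[Any]) -> List[Any]:
        # No per-env state to track, so vectorizing is a plain map.
        return [self.func(obs) for obs in batch]


# Example: scale pixel-like observations into [0, 1].
single = LambdaObservation(lambda obs: obs / 255.0)
vector = VectorLambdaObservation(single)

single_out = single.observation(51.0)
batch_out = vector.observation([0.0, 51.0, 255.0])
```

The conversion only works here because the wrapper carries nothing but the function. A stateful wrapper would need one copy of its state per sub-env, which is exactly the workaround this approach tries to avoid.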
An element that I think will be crucial is different levels of optimization - existing third-party environments and wrappers should work exactly the same way, with the clunky subprocess vecenv approach, unless they do a few extra things to opt-in for the improvements.
Another rough edge might be autoreset. Currently this concept is barely present in gym: it's an optional wrapper for single envs, and in that scope it works fine. In a vectorized case, it's more important and a bit more complicated. If we don't have some sort of autoreset by default in vector envs, that makes them borderline useless for many envs. Consider cartpole where the first env instance happens to take 10 steps and the second takes 100: if we only reset after both are terminated, we collect 110 of the 200 possible steps and lose 45% of the data.
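The 45% figure from the cartpole example can be checked with a few lines of arithmetic. This is only a worked calculation, using the two episode lengths mentioned above:

```python
# Episode lengths from the cartpole example.
episode_lengths = [10, 100]

# Without autoreset, every env waits for the slowest one before the batch resets.
steps_per_env = max(episode_lengths)
possible_steps = steps_per_env * len(episode_lengths)  # 200 step slots
useful_steps = sum(episode_lengths)                    # 110 real transitions

lost_fraction = 1 - useful_steps / possible_steps
print(f"{lost_fraction:.0%} of the step slots are wasted")
```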
While a vectorized autoreset is trivial with a subprocess-like vector env, that's not the case with e.g. numpy/jax acceleration. While I can see some hacks that maybe would kinda work to add it in some of these cases via wrapper, we might just have to add a requirement that the environment handles autoreset itself. Note that this wouldn't be a breaking change in env design - envs that don't have built-in autoreset can still use the standard vectorization. But if you want to use vectorized wrappers and the more efficient vectorization paradigm, you need to add it.
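As a rough illustration of what "the environment handles autoreset itself" could look like in a numpy-accelerated env, here is a toy sketch. `CountingVectorEnv` is a made-up example, and the mask-based reset is just one possible way to do it, not a proposal for the actual API:

```python
import numpy as np


class CountingVectorEnv:
    """Toy batched env (hypothetical): sub-env i terminates after lengths[i] steps."""

    def __init__(self, lengths):
        self.lengths = np.asarray(lengths)
        self.t = np.zeros(len(self.lengths), dtype=np.int64)

    def reset(self):
        self.t[:] = 0
        return self.t.copy()

    def step(self, actions):
        # Actions are ignored in this toy example.
        self.t += 1
        terminated = self.t >= self.lengths
        reward = np.ones(len(self.t))
        # Built-in autoreset: finished sub-envs restart immediately while the
        # others keep running. The observation returned for a finished sub-env
        # is therefore the first observation of its next episode.
        self.t = np.where(terminated, 0, self.t)
        return self.t.copy(), reward, terminated


env = CountingVectorEnv(lengths=[2, 5])
env.reset()
dones = [env.step(None)[2] for _ in range(10)]
episodes_finished = np.sum(dones, axis=0)
```

Over 10 steps the short env completes five episodes and the long one completes two, with no step slots wasted on waiting.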
Finally, a question is - how much can we break? I'm not aware of any significant usage of `gym.vector`, though I know it is used at least sometimes. Ideally I'd like to keep the outside API as similar as possible, perhaps even exactly the same (with additional capabilities). But can we change some of the internal semantics that are in principle exposed to the public, but are also just one of the few remaining relics of the past? As I recall, we want to do the vector revamp before 1.0, which is good, because after 1.0 we have to be very careful about breaking stuff.
Below I'm including a braindump of my semi-structured thoughts on this, just to have it recorded here with some additional details (most of this was mentioned above):
- Each environment can implement its own VectorEnv, or use built-in Sync/Async
  - If implements its own, we can’t use individual wrappers - there’s no instance of `gym.Env` we can actually apply them to
  - If uses built-in, then the VectorEnv contains several instances of `gym.Env`, to which individual wrappers are applied
- Each (?) wrapper should have single and vector mode - need to convert single to multi
  - Should be trivial for:
    - Observation wrapper - map `observation`
    - Reward wrapper - map `reward`
    - Action wrapper - map `action`
    - EDIT - it's actually not trivial, needs lambda wrappers
  - Need to have selectable/automatic optimization
    - Jax envs/wrappers → `vmap`
    - Pure numpy → nothing? or np vfunc?
    - Generic → `np.array(map)` or `np.array([... for o in obs])`
    - Settable in the wrapper? `self.optimization: Literal["numpy"] | Literal["jax"] | None`
- Some wrappers can’t be vectorized
  - Atari preprocessing - needs to reset envs asynchronously
  - Autoreset in general?
    - We can require optimized envs to autoreset internally. Third-party envs will default to the regular vectorization, and they can opt-in for this
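
The selectable optimization idea from the list above could be sketched roughly like this. The `vectorize` helper and its `optimization` argument are hypothetical names chosen to mirror the `self.optimization` attribute floated above; only `np.array`, `np.clip` and `jax.vmap` are real library calls:

```python
from typing import Callable, Optional

import numpy as np


def vectorize(func: Callable, optimization: Optional[str] = None) -> Callable:
    """Turn a single-observation function into a batched one (hypothetical helper)."""
    if optimization == "jax":
        # Imported lazily so the sketch runs without jax installed.
        import jax

        return jax.vmap(func)
    if optimization == "numpy":
        # Assumes func already broadcasts over a leading batch axis.
        return func
    # Generic fallback: plain Python loop, always works but is slow.
    return lambda obs: np.array([func(o) for o in obs])


def clip_obs(obs):
    return np.clip(obs, -1.0, 1.0)


batch = np.array([[-3.0, 0.5], [0.2, 7.0]])
fast = vectorize(clip_obs, optimization="numpy")(batch)
generic = vectorize(clip_obs, optimization=None)(batch)
```

Both paths give the same result here; the difference is only in how the batch is traversed, which is why the generic path can stay the default for third-party wrappers that don't opt in.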
Issues in the meantime:
- OrderEnforcing (and others?) accept arguments in render
- Atari wrapper
- Several typing errors in vector API
Questions:
- Can we break the whole vector API? Does anyone use it?
  - SB3 and rllib definitely have their own
- (check myself) do we want vector API before 1.0?