First of all, congratulations on your paper. I really enjoyed reading it; the idea is quite refreshing, and I was happy to see I'm not the only one using Unity for RL research. ;)
I do have a few technical questions, however. I have been wondering for several days now how to deal with message passing when learning in mini-batches.
Do you have parallel environments? If so, there is a good deal of observation preprocessing, right? I'm talking about the fact that the observations have to be properly aligned in order to feed each parent its child's message (in the bottom-up case).
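To make the alignment question concrete, here is a minimal sketch of the bookkeeping I have in mind. It is not taken from the paper: the function name `route_child_messages`, the `parent_of` layout, and the choice to sum the messages of several children are all my own assumptions for illustration.

```python
from typing import List, Optional

Message = List[float]


def route_child_messages(
    messages: List[List[Message]],
    parent_of: List[List[Optional[int]]],
    message_dim: int,
) -> List[List[Message]]:
    """Build, for every environment in the batch, the message each module receives.

    messages[b][i]  : message emitted by module i in environment b.
    parent_of[b][i] : index of module i's parent in environment b, or None for a root.
    Assumption: a parent with several children receives the sum of their messages,
    and a module without children receives a zero message.
    """
    inbox = []
    for env_messages, env_parents in zip(messages, parent_of):
        # Start every module with a zero message.
        env_inbox = [[0.0] * message_dim for _ in env_messages]
        for child, parent in enumerate(env_parents):
            if parent is None:
                continue  # roots send their message to nobody
            for k in range(message_dim):
                env_inbox[parent][k] += env_messages[child][k]
        inbox.append(env_inbox)
    return inbox


# Two parallel environments, three modules each, 2-dimensional messages.
messages = [
    [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]],
    [[0.5, 0.5], [1.0, 1.0], [2.0, 0.0]],
]
# Environment 0 is a chain (2 -> 1 -> 0); in environment 1 both 1 and 2 attach to 0.
parent_of = [
    [None, 0, 1],
    [None, 0, 0],
]
inbox = route_child_messages(messages, parent_of, message_dim=2)
```

The point is that the morphology can differ across environments, so the routing has to be redone per environment (and per step, if links change) before the batch can be fed to the policy.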
Do you consider the message to be part of the output action? If not, how do you backpropagate through the message-sending head? The appendix states that only the sensory inputs and the action (torque + link/unlink) are considered.
There are a few details that I do not yet completely grasp, but I really enjoyed the paper overall.