Implementation of the Advantage Actor-Critic (A2C) algorithm with visual observations and a discrete action space.
Custom-built MLDriver Unity environment.
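As a minimal sketch of the A2C update used here (the function name and `gamma` default are illustrative, not taken from the repository): each rollout's rewards are turned into discounted n-step returns bootstrapped from the critic's value of the final state, and the advantage is the return minus the critic's estimate.

```python
import numpy as np

def n_step_returns(rewards, values, bootstrap_value, gamma=0.99):
    """Discounted n-step returns and advantages for one A2C rollout.

    rewards:         rewards r_t collected during the rollout
    values:          critic estimates V(s_t) for the same steps
    bootstrap_value: V(s_T) for the state following the last step
    """
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    # Accumulate discounted return backwards through the rollout.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage A_t = R_t - V(s_t) weights the policy-gradient term.
    advantages = returns - np.asarray(values, dtype=float)
    return returns, advantages
```

The actor is then trained on `-log pi(a_t|s_t) * A_t` and the critic on the squared error between `returns` and its value predictions.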
- Observation space: [64, 64, 1]
- Action space: [3]
- Input tensor with dimensions [64, 64, 5]
- Convolutional layer with 32 kernels of size [8, 8], strides [4, 4], and ReLU activation
- Convolutional layer with 64 kernels of size [4, 4], strides [2, 2], and ReLU activation
- Convolutional layer with 64 kernels of size [3, 3], strides [1, 1], and ReLU activation
- Fully-connected layer with 1024 neurons and ReLU activation
- Fully-connected layer with 512 neurons and ReLU activation
- Fully-connected layer with 256 neurons and ReLU activation
- Policy head (actor) with 3 output neurons and value head (critic) with 1 output neuron
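The layer list above can be sketched in PyTorch as follows (a hedged reconstruction, not the repository's actual code; the class name is made up, and the input is assumed channels-first, i.e. a batch of shape [N, 5, 64, 64]). With 64x64 input, the three convolutions produce feature maps of 15x15, 6x6, and 4x4, so the flattened feature vector has 64 * 4 * 4 = 1024 elements, matching the first fully-connected layer.

```python
import torch
import torch.nn as nn

class A2CNetwork(nn.Module):
    """Actor-critic network matching the layer list above (illustrative)."""

    def __init__(self, n_actions=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(5, 32, kernel_size=8, stride=4), nn.ReLU(),   # -> [32, 15, 15]
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # -> [64, 6, 6]
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> [64, 4, 4]
            nn.Flatten(),                                           # -> 1024 features
            nn.Linear(64 * 4 * 4, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        self.policy = nn.Linear(256, n_actions)  # actor: action logits
        self.value = nn.Linear(256, 1)           # critic: state-value estimate

    def forward(self, x):
        h = self.features(x)
        return self.policy(h), self.value(h)
```

Sampling an action then amounts to `torch.distributions.Categorical(logits=logits).sample()` on the policy head's output.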
Smoothed average episode reward vs. number of training steps
Pavel Koryakin [email protected]