python3 main.py --env-name "PongNoFrameskip-v4"
#######
WARNING: All rewards are clipped or normalized so you need to use a monitor (see envs.py) or visdom plot to get true rewards
#######
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.uint8'>. Please provide explicit dtype.
[previous line repeated 15 more times]
WARN: <class 'envs.WrapPyTorch'> doesn't implement 'observation' method. Maybe it implements deprecated '_observation' method.
[previous line repeated 15 more times]
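Both warning families point at small compatibility fixes in envs.py. A minimal sketch of the core change, assuming the wrapper simply reorders image axes for PyTorch (the class internals are not shown in this log): rename the deprecated `_observation` hook to `observation`, and pass `dtype` explicitly when constructing the `Box` space.

```python
import numpy as np

def to_chw(obs):
    """Reorder an image observation from (H, W, C) to (C, H, W)."""
    return np.moveaxis(obs, 2, 0)

# In envs.WrapPyTorch the deprecated hook
#     def _observation(self, observation): ...
# becomes (same body, new name, so newer gym versions find it):
#     def observation(self, observation):
#         return np.moveaxis(observation, 2, 0)
#
# The dtype warning goes away by constructing the space explicitly, e.g.
#     gym.spaces.Box(low=0, high=255, shape=(1, 84, 84), dtype=np.uint8)

frame = np.zeros((84, 84, 1), dtype=np.uint8)
print(to_chw(frame).shape)  # -> (1, 84, 84)
```

The warnings are harmless for training, but silencing them this way keeps the log readable and future-proofs the wrapper against gym versions that drop the `_observation` fallback.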
Updates 0, num timesteps 80, FPS 109, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.77930, value loss 0.02714, policy loss -0.21655
Updates 10, num timesteps 880, FPS 821, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.79153, value loss 0.12026, policy loss -0.22026
Updates 20, num timesteps 1680, FPS 1243, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.50924, value loss 0.00107, policy loss 0.04321
Updates 30, num timesteps 2480, FPS 1526, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.56517, value loss 0.00118, policy loss 0.04794
Updates 40, num timesteps 3280, FPS 1715, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.36696, value loss 0.02682, policy loss 0.09570
Updates 50, num timesteps 4080, FPS 1857, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.70921, value loss 0.12343, policy loss -0.20016
Updates 60, num timesteps 4880, FPS 1972, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.33832, value loss 0.11343, policy loss -0.06592
Updates 70, num timesteps 5680, FPS 2060, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.60979, value loss 0.06754, policy loss -0.03790
Updates 80, num timesteps 6480, FPS 2133, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.44397, value loss 0.12002, policy loss -0.07597
Updates 90, num timesteps 7280, FPS 2189, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 0.56885, value loss 0.01250, policy loss 0.05751
Updates 100, num timesteps 8080, FPS 2228, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 0.90573, value loss 0.01997, policy loss 0.08723
Updates 110, num timesteps 8880, FPS 2266, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 0.79921, value loss 0.04416, policy loss 0.07492
Updates 120, num timesteps 9680, FPS 2296, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.61159, value loss 0.00413, policy loss 0.08989
Updates 130, num timesteps 10480, FPS 2333, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.73758, value loss 0.08978, policy loss -0.10888
Updates 140, num timesteps 11280, FPS 2362, mean/median reward 0.0/0.0, min/max reward 0.0/0.0, entropy 1.64044, value loss 0.03661, policy loss 0.09212
Updates 150, num timesteps 12080, FPS 2382, mean/median reward -2.6/0.0, min/max reward -21.0/0.0, entropy 1.76401, value loss 0.00949, policy loss 0.10924
Updates 160, num timesteps 12880, FPS 2383, mean/median reward -9.2/0.0, min/max reward -21.0/0.0, entropy 1.77274, value loss 0.16604, policy loss -0.09154
Updates 170, num timesteps 13680, FPS 2398, mean/median reward -11.8/-21.0, min/max reward -21.0/0.0, entropy 1.52948, value loss 0.05661, policy loss 0.03393
Updates 180, num timesteps 14480, FPS 2410, mean/median reward -14.3/-21.0, min/max reward -21.0/0.0, entropy 1.76128, value loss 0.13569, policy loss -0.21295
Updates 190, num timesteps 15280, FPS 2412, mean/median reward -18.1/-21.0, min/max reward -21.0/0.0, entropy 1.67226, value loss 0.18734, policy loss -0.28125
Updates 200, num timesteps 16080, FPS 2423, mean/median reward -19.3/-21.0, min/max reward -21.0/0.0, entropy 1.65022, value loss 0.11167, policy loss -0.14717
Updates 210, num timesteps 16880, FPS 2437, mean/median reward -19.3/-21.0, min/max reward -21.0/0.0, entropy 1.69721, value loss 0.10054, policy loss -0.11975
Updates 220, num timesteps 17680, FPS 2450, mean/median reward -20.4/-21.0, min/max reward -21.0/-18.0, entropy 1.64935, value loss 0.06355, policy loss 0.12313
Updates 230, num timesteps 18480, FPS 2463, mean/median reward -20.4/-21.0, min/max reward -21.0/-18.0, entropy 1.70666, value loss 0.00183, policy loss 0.06517
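As the first warning says, the mean/median columns above show clipped rewards, which is why they sit at 0.0 for so long; true episode returns have to be read from the monitor files instead. A small parser sketch, assuming the baselines-style monitor layout (a `#`-prefixed JSON header line followed by CSV rows with `r`, `l`, `t` columns — check the monitor used in envs.py for the actual format):

```python
import csv
import io

def read_monitor(f):
    """Yield (episode_reward, episode_length) pairs from a monitor file.

    Assumed layout: first line is a '#'-prefixed JSON header, then CSV
    with columns r (true episode reward), l (length), t (wall time).
    """
    header = f.readline()
    assert header.startswith('#'), "unexpected monitor header"
    for row in csv.DictReader(f):
        yield float(row['r']), int(row['l'])

# Usage with a synthetic file standing in for 0.monitor.csv:
sample = io.StringIO(
    '#{"t_start": 0.0, "env_id": "PongNoFrameskip-v4"}\n'
    'r,l,t\n'
    '-21.0,764,12.3\n'
    '-20.0,812,25.1\n'
)
episodes = list(read_monitor(sample))
print(episodes)  # -> [(-21.0, 764), (-20.0, 812)]
```

For Pong the unclipped returns range over [-21, 21], so the -21.0/-18.0 min/max values in the later log lines are already true game scores reported once full episodes complete.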