Hi,
I tested your code after downgrading TensorFlow to 0.12.1 and Keras to 1.1.0. The program now runs without errors, but it does not converge to the optimal policy, as you can see in the following output:
('Episode', 4, 'Step', 268993, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.098257193098788914, 'Loss', 0.021010929718613625)
('Episode', 4, 'Step', 268994, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0065970534669316794, 'Loss', 0.042674191296100616)
('Episode', 4, 'Step', 268995, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.00031545438826979712, 'Loss', 0.024418037384748459)
('Episode', 4, 'Step', 268996, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.019568415147379711, 'Loss', 0.014433514326810837)
('Episode', 4, 'Step', 268997, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.018506277374548623, 'Loss', 0.041520103812217712)
('Episode', 4, 'Step', 268998, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.010767906475774695, 'Loss', 0.015593868680298328)
('Episode', 4, 'Step', 268999, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.002420423072458714, 'Loss', 0.0040719900280237198)
('Episode', 4, 'Step', 269000, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.004262467498310836, 'Loss', 0.037771023809909821)
('Episode', 4, 'Step', 269001, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.026749610641322499, 'Loss', 0.028206927701830864)
('Episode', 4, 'Step', 269002, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0030027620641139251, 'Loss', 0.015061482787132263)
('Episode', 4, 'Step', 269003, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.081173965529346151, 'Loss', 0.036130554974079132)
('Episode', 4, 'Step', 269004, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.10121600213326937, 'Loss', 0.014408881776034832)
('Episode', 4, 'Step', 269005, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.030823456600634971, 'Loss', 0.0036484464071691036)
('Episode', 4, 'Step', 269006, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.01758625249384254, 'Loss', 0.016869166865944862)
('Episode', 4, 'Step', 269007, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.0040950106730622809, 'Loss', 0.0061152027919888496)
('Episode', 4, 'Step', 269008, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0015352772668305245, 'Loss', 0.0020759631879627705)
('Episode', 4, 'Step', 269009, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.013906418235719725, 'Loss', 0.013349947519600391)
('Episode', 4, 'Step', 269010, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.00019903168120680918, 'Loss', 0.033619172871112823)
('Episode', 4, 'Step', 269011, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.00017914316922484539, 'Loss', 0.026164039969444275)
('Episode', 4, 'Step', 269012, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.003975938587499582, 'Loss', 0.0075099128298461437)
('Episode', 4, 'Step', 269013, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0088532675322910981, 'Loss', 0.051988624036312103)
('Episode', 4, 'Step', 269014, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.0022390205788955062, 'Loss', 0.021969024091959)
('Episode', 4, 'Step', 269015, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.033058362379728756, 'Loss', 0.0012654899619519711)
('Episode', 4, 'Step', 269016, 'Action', array([[-1., 1., 1.]]), 'Reward', -0.019517203300999583, 'Loss', 0.012628044933080673)
('Episode', 4, 'Step', 269017, 'Action', array([[-1., 1., 1.]]), 'Reward', 0.0091726067061909666, 'Loss', 0.032372094690799713)
As you can see, the actions are saturated at the bounds of the action space: the first component is pinned at -1 and the other two at +1 on every step. Could you help me fix this problem?
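In case it helps with the diagnosis, here is a minimal sketch of the check I would run to see whether the actor's output layer itself is saturating. This is only an illustration under my own assumptions, not part of your code: it assumes the actor is a Keras model whose action head is a Dense layer followed by a separate Activation layer (tanh or sigmoid), so that layers[-2].output is the pre-activation tensor; the layer index and the check_saturation name are hypothetical and may need adjusting for your model.

import numpy as np
from keras import backend as K

def check_saturation(actor_model, states):
    """Print pre-activation statistics of the actor's final layer.

    If the final activation is tanh, pre-activations with |z| >> 1
    mean the output is pinned near -1/+1, i.e. the action saturates.
    """
    # Hypothetical assumption: layers[-1] is the Activation layer and
    # layers[-2].output is the pre-activation tensor feeding into it.
    get_pre_activation = K.function([actor_model.input],
                                    [actor_model.layers[-2].output])
    z = get_pre_activation([states])[0]
    print('pre-activation min/max:', z.min(), z.max())
    print('fraction with |z| > 3 :', np.mean(np.abs(z) > 3))

If the pre-activations turn out to be huge, my guess is that the usual DDPG suspects apply (actor learning rate too high, exploration noise missing or decayed too fast, or unclipped gradients blowing up the output weights), but I would appreciate your input.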
Thanks,