juliusfrost / dreamer-pytorch Goto Github PK

View Code? Open in Web Editor NEW

267.0 267.0 34.0 677 KB

Dream to Control: Learning Behaviors by Latent Imagination, implemented in PyTorch.

License: MIT License

Python 100.00%

dreamer-pytorch's People

Contributors

Stargazers

Watchers

dreamer-pytorch's Issues

main.py does not run on pytorch==1.5.0

~~main.py does not run in mac os. Because I don't use a mac to develop, I won't fix this, but someone else can take a crack at it!~~

It turns out, it was because PyTorch updated to 1.5.0 and the pip install for mac didn't specify a version number. It seems this error affects all os versions in 1.5.0.

Error log from the build:

2020-04-22T20:42:36.5876540Z Collecting pytest
2020-04-22T20:42:36.6195830Z   Downloading pytest-5.4.1-py3-none-any.whl (246 kB)
2020-04-22T20:42:36.8314650Z Collecting pytest-cov
2020-04-22T20:42:36.8405240Z   Downloading pytest_cov-2.8.1-py2.py3-none-any.whl (18 kB)
2020-04-22T20:42:37.1253830Z Collecting packaging
2020-04-22T20:42:37.1347300Z   Downloading packaging-20.3-py2.py3-none-any.whl (37 kB)
2020-04-22T20:42:37.2577280Z Collecting py>=1.5.0
2020-04-22T20:42:37.2647560Z   Downloading py-1.8.1-py2.py3-none-any.whl (83 kB)
2020-04-22T20:42:37.4000280Z Collecting pluggy<1.0,>=0.12
2020-04-22T20:42:37.4081120Z   Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
2020-04-22T20:42:37.5453000Z Collecting more-itertools>=4.0.0
2020-04-22T20:42:37.5524190Z   Downloading more_itertools-8.2.0-py3-none-any.whl (43 kB)
2020-04-22T20:42:37.6967750Z Collecting attrs>=17.4.0
2020-04-22T20:42:37.7051850Z   Downloading attrs-19.3.0-py2.py3-none-any.whl (39 kB)
2020-04-22T20:42:37.9789450Z Collecting importlib-metadata>=0.12; python_version < "3.8"
2020-04-22T20:42:37.9865520Z   Downloading importlib_metadata-1.6.0-py2.py3-none-any.whl (30 kB)
2020-04-22T20:42:38.1663180Z Collecting wcwidth
2020-04-22T20:42:38.1730610Z   Downloading wcwidth-0.1.9-py2.py3-none-any.whl (19 kB)
2020-04-22T20:42:39.4831290Z Collecting coverage>=4.4
2020-04-22T20:42:39.4918760Z   Downloading coverage-5.1-cp36-cp36m-macosx_10_13_x86_64.whl (203 kB)
2020-04-22T20:42:39.7908350Z Requirement already satisfied: six in /Users/runner/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/site-packages (from packaging->pytest) (1.14.0)
2020-04-22T20:42:40.1113990Z Collecting pyparsing>=2.0.2
2020-04-22T20:42:40.1195300Z   Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
2020-04-22T20:42:40.2541140Z Collecting zipp>=0.5
2020-04-22T20:42:40.2603170Z   Downloading zipp-3.1.0-py3-none-any.whl (4.9 kB)
2020-04-22T20:42:40.7142020Z Installing collected packages: pyparsing, packaging, py, zipp, importlib-metadata, pluggy, more-itertools, attrs, wcwidth, pytest, coverage, pytest-cov
2020-04-22T20:42:42.0165720Z Successfully installed attrs-19.3.0 coverage-5.1 importlib-metadata-1.6.0 more-itertools-8.2.0 packaging-20.3 pluggy-0.13.1 py-1.8.1 pyparsing-2.4.7 pytest-5.4.1 pytest-cov-2.8.1 wcwidth-0.1.9 zipp-3.1.0
2020-04-22T20:42:43.0003890Z ============================= test session starts ==============================
2020-04-22T20:42:43.0007750Z platform darwin -- Python 3.6.10, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
2020-04-22T20:42:43.0061510Z rootdir: /Users/runner/runners/2.169.0/work/dreamer-pytorch/dreamer-pytorch
2020-04-22T20:42:43.0062060Z plugins: cov-2.8.1
2020-04-22T20:42:51.8153180Z collected 18 items
2020-04-22T20:42:51.8160320Z 
2020-04-22T20:42:54.6529430Z tests/dreamer/test_main.py F                                             [  5%]
2020-04-22T20:42:54.6733660Z tests/dreamer/models/test_action.py ....                                 [ 27%]
2020-04-22T20:42:54.8353140Z tests/dreamer/models/test_agent.py ...                                   [ 44%]
2020-04-22T20:42:54.8440990Z tests/dreamer/models/test_dense.py ..                                    [ 55%]
2020-04-22T20:42:54.8464560Z tests/dreamer/models/test_distribution.py .                              [ 61%]
2020-04-22T20:42:55.3288250Z tests/dreamer/models/test_observation.py .....                           [ 88%]
2020-04-22T20:42:55.6524340Z tests/dreamer/models/test_rnns.py ..                                     [100%]
2020-04-22T20:42:55.6524610Z 
2020-04-22T20:42:55.6527570Z =================================== FAILURES ===================================
2020-04-22T20:42:55.6527860Z __________________________________ test_main ___________________________________
2020-04-22T20:42:55.6527920Z 
2020-04-22T20:42:55.6529400Z     def test_main():
2020-04-22T20:42:55.6530410Z         logdir = 'data/tests/'
2020-04-22T20:42:55.6530610Z >       build_and_train(logdir)
2020-04-22T20:42:55.6530660Z 
2020-04-22T20:42:55.6530780Z tests/dreamer/test_main.py:63: 
2020-04-22T20:42:55.6531140Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2020-04-22T20:42:55.6532490Z tests/dreamer/test_main.py:58: in build_and_train
2020-04-22T20:42:55.6532630Z     runner.train()
2020-04-22T20:42:55.6534550Z /Users/runner/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/site-packages/rlpyt/runners/minibatch_rl.py:259: in train
2020-04-22T20:42:55.6534760Z     opt_info = self.algo.optimize_agent(itr, samples)
2020-04-22T20:42:55.6535470Z dreamer/algos/dreamer_algo.py:152: in optimize_agent
2020-04-22T20:42:55.6535770Z     actor_loss.backward()
2020-04-22T20:42:55.6537470Z /Users/runner/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/site-packages/torch/tensor.py:198: in backward
2020-04-22T20:42:55.6537860Z     torch.autograd.backward(self, gradient, retain_graph, create_graph)
2020-04-22T20:42:55.6539330Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2020-04-22T20:42:55.6539440Z 
2020-04-22T20:42:55.6540780Z tensors = (tensor(0.0182, grad_fn=<NegBackward>),), grad_tensors = (tensor(1.),)
2020-04-22T20:42:55.6541740Z retain_graph = False, create_graph = False, grad_variables = None
2020-04-22T20:42:55.6541830Z 
2020-04-22T20:42:55.6543880Z     def backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None):
2020-04-22T20:42:55.6545170Z         r"""Computes the sum of gradients of given tensors w.r.t. graph leaves.
2020-04-22T20:42:55.6545330Z     
2020-04-22T20:42:55.6546700Z         The graph is differentiated using the chain rule. If any of ``tensors``
2020-04-22T20:42:55.6547560Z         are non-scalar (i.e. their data has more than one element) and require
2020-04-22T20:42:55.6548400Z         gradient, then the Jacobian-vector product would be computed, in this
2020-04-22T20:42:55.6549220Z         case the function additionally requires specifying ``grad_tensors``.
2020-04-22T20:42:55.6549570Z         It should be a sequence of matching length, that contains the "vector"
2020-04-22T20:42:55.6551040Z         in the Jacobian-vector product, usually the gradient of the differentiated
2020-04-22T20:42:55.6551510Z         function w.r.t. corresponding tensors (``None`` is an acceptable value for
2020-04-22T20:42:55.6553210Z         all tensors that don't need gradient tensors).
2020-04-22T20:42:55.6553350Z     
2020-04-22T20:42:55.6553850Z         This function accumulates gradients in the leaves - you might need to zero
2020-04-22T20:42:55.6554030Z         them before calling it.
2020-04-22T20:42:55.6554210Z     
2020-04-22T20:42:55.6555190Z         Arguments:
2020-04-22T20:42:55.6556250Z             tensors (sequence of Tensor): Tensors of which the derivative will be
2020-04-22T20:42:55.6556430Z                 computed.
2020-04-22T20:42:55.6558330Z             grad_tensors (sequence of (Tensor or None)): The "vector" in the Jacobian-vector
2020-04-22T20:42:55.6558920Z                 product, usually gradients w.r.t. each element of corresponding tensors.
2020-04-22T20:42:55.6559670Z                 None values can be specified for scalar Tensors or ones that don't require
2020-04-22T20:42:55.6560480Z                 grad. If a None value would be acceptable for all grad_tensors, then this
2020-04-22T20:42:55.6560680Z                 argument is optional.
2020-04-22T20:42:55.6562090Z             retain_graph (bool, optional): If ``False``, the graph used to compute the grad
2020-04-22T20:42:55.6562230Z                 will be freed. Note that in nearly all cases setting this option to ``True``
2020-04-22T20:42:55.6563730Z                 is not needed and often can be worked around in a much more efficient
2020-04-22T20:42:55.6563960Z                 way. Defaults to the value of ``create_graph``.
2020-04-22T20:42:55.6565200Z             create_graph (bool, optional): If ``True``, graph of the derivative will
2020-04-22T20:42:55.6565750Z                 be constructed, allowing to compute higher order derivative products.
2020-04-22T20:42:55.6566350Z                 Defaults to ``False``.
2020-04-22T20:42:55.6566470Z         """
2020-04-22T20:42:55.6567060Z         if grad_variables is not None:
2020-04-22T20:42:55.6568310Z             warnings.warn("'grad_variables' is deprecated. Use 'grad_tensors' instead.")
2020-04-22T20:42:55.6568530Z             if grad_tensors is None:
2020-04-22T20:42:55.6569320Z                 grad_tensors = grad_variables
2020-04-22T20:42:55.6569490Z             else:
2020-04-22T20:42:55.6571150Z                 raise RuntimeError("'grad_tensors' and 'grad_variables' (deprecated) "
2020-04-22T20:42:55.6571360Z                                    "arguments both passed to backward(). Please only "
2020-04-22T20:42:55.6572000Z                                    "use 'grad_tensors'.")
2020-04-22T20:42:55.6572180Z     
2020-04-22T20:42:55.6577090Z         tensors = (tensors,) if isinstance(tensors, torch.Tensor) else tuple(tensors)
2020-04-22T20:42:55.6577350Z     
2020-04-22T20:42:55.6577520Z         if grad_tensors is None:
2020-04-22T20:42:55.6577790Z             grad_tensors = [None] * len(tensors)
2020-04-22T20:42:55.6578960Z         elif isinstance(grad_tensors, torch.Tensor):
2020-04-22T20:42:55.6579360Z             grad_tensors = [grad_tensors]
2020-04-22T20:42:55.6579740Z         else:
2020-04-22T20:42:55.6580230Z             grad_tensors = list(grad_tensors)
2020-04-22T20:42:55.6580560Z     
2020-04-22T20:42:55.6581290Z         grad_tensors = _make_grads(tensors, grad_tensors)
2020-04-22T20:42:55.6582120Z         if retain_graph is None:
2020-04-22T20:42:55.6582300Z             retain_graph = create_graph
2020-04-22T20:42:55.6582870Z     
2020-04-22T20:42:55.6583350Z         Variable._execution_engine.run_backward(
2020-04-22T20:42:55.6583970Z             tensors, grad_tensors, retain_graph, create_graph,
2020-04-22T20:42:55.6584480Z >           allow_unreachable=True)  # allow_unreachable flag
2020-04-22T20:42:55.6587140Z E       RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [300, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
2020-04-22T20:42:55.6587270Z 
2020-04-22T20:42:55.6589120Z /Users/runner/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/site-packages/torch/autograd/__init__.py:100: RuntimeError
2020-04-22T20:42:55.6590200Z ----------------------------- Captured stdout call -----------------------------
2020-04-22T20:42:55.6590800Z 2020-04-22 20:42:51.890607  | dreamer_pong_0 Runner  master CPU affinity: UNAVAILABLE MacOS.
2020-04-22T20:42:55.6591310Z 2020-04-22 20:42:51.890813  | dreamer_pong_0 Runner  master Torch threads: 1.
2020-04-22T20:42:55.6591770Z �[32musing seed 5737�[0m
2020-04-22T20:42:55.6592310Z 2020-04-22 20:42:54.065460  | dreamer_pong_0 Sampler decorrelating envs, max steps: 0
2020-04-22T20:42:55.6593190Z 2020-04-22 20:42:54.066371  | dreamer_pong_0 Serial Sampler initialized.
2020-04-22T20:42:55.6593700Z 2020-04-22 20:42:54.066464  | dreamer_pong_0 Running 20 iterations of minibatch RL.
2020-04-22T20:42:55.6594160Z 2020-04-22 20:42:54.067698  | dreamer_pong_0 Optimizing over 10 iterations.
2020-04-22T20:42:55.6594300Z Warning: No valid output stream.
2020-04-22T20:42:55.6594810Z 2020-04-22 20:42:54.139205  | dreamer_pong_0 itr #9 saving snapshot...
2020-04-22T20:42:55.6595280Z 2020-04-22 20:42:54.204652  | dreamer_pong_0 itr #9 saved
2020-04-22T20:42:55.6595770Z 2020-04-22 20:42:54.226654  | -----------------------------  ----------
2020-04-22T20:42:55.6596490Z 2020-04-22 20:42:54.226800  | Diagnostics/NewCompletedTrajs    0
2020-04-22T20:42:55.6597090Z 2020-04-22 20:42:54.227001  | Diagnostics/StepsInTrajWindow    0
2020-04-22T20:42:55.6597590Z 2020-04-22 20:42:54.227099  | Diagnostics/Iteration            9
2020-04-22T20:42:55.6598070Z 2020-04-22 20:42:54.227242  | Diagnostics/CumTime (s)          0.137128
2020-04-22T20:42:55.6598560Z 2020-04-22 20:42:54.227420  | Diagnostics/CumSteps            10
2020-04-22T20:42:55.6598990Z 2020-04-22 20:42:54.227517  | Diagnostics/CumCompletedTrajs    0
2020-04-22T20:42:55.6599470Z 2020-04-22 20:42:54.227582  | Diagnostics/CumUpdates           0
2020-04-22T20:42:55.6599950Z 2020-04-22 20:42:54.227731  | Diagnostics/StepsPerSecond      72.9245
2020-04-22T20:42:55.6600420Z 2020-04-22 20:42:54.227937  | Diagnostics/UpdatesPerSecond     0
2020-04-22T20:42:55.6600900Z 2020-04-22 20:42:54.228049  | Diagnostics/ReplayRatio          0
2020-04-22T20:42:55.6601340Z 2020-04-22 20:42:54.228121  | Diagnostics/CumReplayRatio       0
2020-04-22T20:42:55.6601820Z 2020-04-22 20:42:54.228331  | loss/Average                   nan
2020-04-22T20:42:55.6602380Z 2020-04-22 20:42:54.228460  | loss/Std                       nan
2020-04-22T20:42:55.6602870Z 2020-04-22 20:42:54.228529  | loss/Median                    nan
2020-04-22T20:42:55.6603320Z 2020-04-22 20:42:54.228627  | loss/Min                       nan
2020-04-22T20:42:55.6603790Z 2020-04-22 20:42:54.228784  | loss/Max                       nan
2020-04-22T20:42:55.6604270Z 2020-04-22 20:42:54.228888  | model_loss/Average             nan
2020-04-22T20:42:55.6604730Z 2020-04-22 20:42:54.228953  | model_loss/Std                 nan
2020-04-22T20:42:55.6605210Z 2020-04-22 20:42:54.229017  | model_loss/Median              nan
2020-04-22T20:42:55.6605640Z 2020-04-22 20:42:54.229254  | model_loss/Min                 nan
2020-04-22T20:42:55.6606520Z 2020-04-22 20:42:54.229476  | model_loss/Max                 nan
2020-04-22T20:42:55.6607260Z 2020-04-22 20:42:54.229558  | actor_loss/Average             nan
2020-04-22T20:42:55.6607780Z 2020-04-22 20:42:54.229625  | actor_loss/Std                 nan
2020-04-22T20:42:55.6608230Z 2020-04-22 20:42:54.229818  | actor_loss/Median              nan
2020-04-22T20:42:55.6608710Z 2020-04-22 20:42:54.229951  | actor_loss/Min                 nan
2020-04-22T20:42:55.6609210Z 2020-04-22 20:42:54.230024  | actor_loss/Max                 nan
2020-04-22T20:42:55.6609680Z 2020-04-22 20:42:54.230090  | value_loss/Average             nan
2020-04-22T20:42:55.6610160Z 2020-04-22 20:42:54.230197  | value_loss/Std                 nan
2020-04-22T20:42:55.6610590Z 2020-04-22 20:42:54.230384  | value_loss/Median              nan
2020-04-22T20:42:55.6611080Z 2020-04-22 20:42:54.230493  | value_loss/Min                 nan
2020-04-22T20:42:55.6611560Z 2020-04-22 20:42:54.230618  | value_loss/Max                 nan
2020-04-22T20:42:55.6612020Z 2020-04-22 20:42:54.230844  | prior_entropy/Average          nan
2020-04-22T20:42:55.6612510Z 2020-04-22 20:42:54.230915  | prior_entropy/Std              nan
2020-04-22T20:42:55.6613100Z 2020-04-22 20:42:54.231061  | prior_entropy/Median           nan
2020-04-22T20:42:55.6613590Z 2020-04-22 20:42:54.231234  | prior_entropy/Min              nan
2020-04-22T20:42:55.6614030Z 2020-04-22 20:42:54.231336  | prior_entropy/Max              nan
2020-04-22T20:42:55.6614920Z 2020-04-22 20:42:54.231451  | post_entropy/Average           nan
2020-04-22T20:42:55.6615410Z 2020-04-22 20:42:54.231570  | post_entropy/Std               nan
2020-04-22T20:42:55.6615880Z 2020-04-22 20:42:54.231637  | post_entropy/Median            nan
2020-04-22T20:42:55.6616320Z 2020-04-22 20:42:54.231701  | post_entropy/Min               nan
2020-04-22T20:42:55.6616800Z 2020-04-22 20:42:54.231813  | post_entropy/Max               nan
2020-04-22T20:42:55.6617340Z 2020-04-22 20:42:54.231878  | divergence/Average             nan
2020-04-22T20:42:55.6617820Z 2020-04-22 20:42:54.231942  | divergence/Std                 nan
2020-04-22T20:42:55.6618300Z 2020-04-22 20:42:54.232005  | divergence/Median              nan
2020-04-22T20:42:55.6618990Z 2020-04-22 20:42:54.232069  | divergence/Min                 nan
2020-04-22T20:42:55.6619560Z 2020-04-22 20:42:54.232132  | divergence/Max                 nan
2020-04-22T20:42:55.6620040Z 2020-04-22 20:42:54.232195  | reward_loss/Average            nan
2020-04-22T20:42:55.6620530Z 2020-04-22 20:42:54.232259  | reward_loss/Std                nan
2020-04-22T20:42:55.6621010Z 2020-04-22 20:42:54.232322  | reward_loss/Median             nan
2020-04-22T20:42:55.6621440Z 2020-04-22 20:42:54.232386  | reward_loss/Min                nan
2020-04-22T20:42:55.6621930Z 2020-04-22 20:42:54.232449  | reward_loss/Max                nan
2020-04-22T20:42:55.6622420Z 2020-04-22 20:42:54.232512  | image_loss/Average             nan
2020-04-22T20:42:55.6622890Z 2020-04-22 20:42:54.232611  | image_loss/Std                 nan
2020-04-22T20:42:55.6623330Z 2020-04-22 20:42:54.232688  | image_loss/Median              nan
2020-04-22T20:42:55.6623810Z 2020-04-22 20:42:54.232743  | image_loss/Min                 nan
2020-04-22T20:42:55.6624300Z 2020-04-22 20:42:54.232796  | image_loss/Max                 nan
2020-04-22T20:42:55.6624850Z 2020-04-22 20:42:54.232850  | -----------------------------  ----------
2020-04-22T20:42:55.6625360Z 2020-04-22 20:42:54.233423  | dreamer_pong_0 itr #9 Optimizing over 10 iterations.
2020-04-22T20:42:55.6625470Z Warning: No valid output stream.
2020-04-22T20:42:55.6625930Z ----------------------------- Captured stderr call -----------------------------
2020-04-22T20:42:55.6626040Z 
2020-04-22T20:42:55.6626120Z Imagination:   0%|          | 0/1 [00:00<?, ?it/s]
2020-04-22T20:42:55.6626230Z Imagination:   0%|          | 0/1 [00:00<?, ?it/s]
2020-04-22T20:42:55.6626320Z 
2020-04-22T20:42:55.6626750Z ---------- coverage: platform darwin, python 3.6.10-final-0 ----------
2020-04-22T20:42:55.6626860Z Coverage XML written to file coverage.xml
2020-04-22T20:42:55.6626910Z 
2020-04-22T20:42:55.6627070Z =========================== short test summary info ============================
2020-04-22T20:42:55.6628160Z FAILED tests/dreamer/test_main.py::test_main - RuntimeError: one of the varia...
2020-04-22T20:42:55.6628320Z ======================== 1 failed, 17 passed in 12.66s =========================
2020-04-22T20:42:55.7923550Z ##[error]Process completed with exit code 1.

Error with anomaly detection:

2020-04-22T21:21:18.5506060Z ##[group]Run pip install pytest pytest-cov
2020-04-22T21:21:18.5506380Z �[36;1mpip install pytest pytest-cov�[0m
2020-04-22T21:21:18.5506460Z �[36;1mpytest tests --cov=dreamer --cov-report=xml�[0m
2020-04-22T21:21:18.5668270Z shell: /bin/bash -e {0}
2020-04-22T21:21:18.5668470Z env:
2020-04-22T21:21:18.5668610Z   pythonLocation: /Users/runner/hostedtoolcache/Python/3.7.6/x64
2020-04-22T21:21:18.5668740Z ##[endgroup]
2020-04-22T21:21:19.6524770Z Collecting pytest
2020-04-22T21:21:19.6826470Z   Downloading pytest-5.4.1-py3-none-any.whl (246 kB)
2020-04-22T21:21:19.9416970Z Collecting pytest-cov
2020-04-22T21:21:19.9484140Z   Downloading pytest_cov-2.8.1-py2.py3-none-any.whl (18 kB)
2020-04-22T21:21:20.0467580Z Collecting wcwidth
2020-04-22T21:21:20.0575740Z   Downloading wcwidth-0.1.9-py2.py3-none-any.whl (19 kB)
2020-04-22T21:21:20.1687880Z Collecting py>=1.5.0
2020-04-22T21:21:20.1749250Z   Downloading py-1.8.1-py2.py3-none-any.whl (83 kB)
2020-04-22T21:21:20.4829230Z Collecting importlib-metadata>=0.12; python_version < "3.8"
2020-04-22T21:21:20.4914430Z   Downloading importlib_metadata-1.6.0-py2.py3-none-any.whl (30 kB)
2020-04-22T21:21:20.5889610Z Collecting pluggy<1.0,>=0.12
2020-04-22T21:21:20.5957990Z   Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
2020-04-22T21:21:20.7772780Z Collecting packaging
2020-04-22T21:21:20.7841680Z   Downloading packaging-20.3-py2.py3-none-any.whl (37 kB)
2020-04-22T21:21:20.9053280Z Collecting more-itertools>=4.0.0
2020-04-22T21:21:20.9122460Z   Downloading more_itertools-8.2.0-py3-none-any.whl (43 kB)
2020-04-22T21:21:21.0086450Z Collecting attrs>=17.4.0
2020-04-22T21:21:21.0162000Z   Downloading attrs-19.3.0-py2.py3-none-any.whl (39 kB)
2020-04-22T21:21:21.8292050Z Collecting coverage>=4.4
2020-04-22T21:21:21.8397260Z   Downloading coverage-5.1-cp37-cp37m-macosx_10_13_x86_64.whl (203 kB)
2020-04-22T21:21:21.9922260Z Collecting zipp>=0.5
2020-04-22T21:21:22.0005150Z   Downloading zipp-3.1.0-py3-none-any.whl (4.9 kB)
2020-04-22T21:21:22.0293940Z Requirement already satisfied: six in /Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages (from packaging->pytest) (1.14.0)
2020-04-22T21:21:22.2198340Z Collecting pyparsing>=2.0.2
2020-04-22T21:21:22.2269270Z   Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
2020-04-22T21:21:22.5346110Z Installing collected packages: wcwidth, py, zipp, importlib-metadata, pluggy, pyparsing, packaging, more-itertools, attrs, pytest, coverage, pytest-cov
2020-04-22T21:21:23.6225720Z Successfully installed attrs-19.3.0 coverage-5.1 importlib-metadata-1.6.0 more-itertools-8.2.0 packaging-20.3 pluggy-0.13.1 py-1.8.1 pyparsing-2.4.7 pytest-5.4.1 pytest-cov-2.8.1 wcwidth-0.1.9 zipp-3.1.0
2020-04-22T21:21:24.3545700Z ============================= test session starts ==============================
2020-04-22T21:21:24.3553280Z platform darwin -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
2020-04-22T21:21:24.3614370Z rootdir: /Users/runner/runners/2.169.0/work/dreamer-pytorch/dreamer-pytorch
2020-04-22T21:21:24.3615030Z plugins: cov-2.8.1
2020-04-22T21:21:33.8253480Z collected 18 items
2020-04-22T21:21:33.8262020Z 
2020-04-22T21:21:37.2885550Z tests/dreamer/test_main.py F                                             [  5%]
2020-04-22T21:21:37.4517150Z tests/dreamer/models/test_action.py ....                                 [ 27%]
2020-04-22T21:21:38.0522990Z tests/dreamer/models/test_agent.py ...                                   [ 44%]
2020-04-22T21:21:38.1096380Z tests/dreamer/models/test_dense.py ..                                    [ 55%]
2020-04-22T21:21:38.1124260Z tests/dreamer/models/test_distribution.py .                              [ 61%]
2020-04-22T21:21:38.7461140Z tests/dreamer/models/test_observation.py .....                           [ 88%]
2020-04-22T21:21:40.0865880Z tests/dreamer/models/test_rnns.py ..                                     [100%]
2020-04-22T21:21:40.0866590Z 
2020-04-22T21:21:40.0867820Z =================================== FAILURES ===================================
2020-04-22T21:21:40.0868040Z __________________________________ test_main ___________________________________
2020-04-22T21:21:40.0868100Z 
2020-04-22T21:21:40.0868240Z     def test_main():
2020-04-22T21:21:40.0869190Z         logdir = 'data/tests/'
2020-04-22T21:21:40.0869290Z >       build_and_train(logdir)
2020-04-22T21:21:40.0869380Z 
2020-04-22T21:21:40.0869500Z tests/dreamer/test_main.py:63: 
2020-04-22T21:21:40.0869590Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2020-04-22T21:21:40.0869740Z tests/dreamer/test_main.py:58: in build_and_train
2020-04-22T21:21:40.0869860Z     runner.train()
2020-04-22T21:21:40.0870740Z /Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/rlpyt/runners/minibatch_rl.py:259: in train
2020-04-22T21:21:40.0871270Z     opt_info = self.algo.optimize_agent(itr, samples)
2020-04-22T21:21:40.0871420Z dreamer/algos/dreamer_algo.py:154: in optimize_agent
2020-04-22T21:21:40.0871570Z     actor_loss.backward()
2020-04-22T21:21:40.0872340Z /Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/torch/tensor.py:198: in backward
2020-04-22T21:21:40.0873600Z     torch.autograd.backward(self, gradient, retain_graph, create_graph)
2020-04-22T21:21:40.0874080Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2020-04-22T21:21:40.0874810Z 
2020-04-22T21:21:40.0875570Z tensors = (tensor(-0.2482, grad_fn=<NegBackward>),)
2020-04-22T21:21:40.0876050Z grad_tensors = (tensor(1.),), retain_graph = False, create_graph = False
2020-04-22T21:21:40.0876240Z grad_variables = None
2020-04-22T21:21:40.0876370Z 
2020-04-22T21:21:40.0876530Z     def backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None):
2020-04-22T21:21:40.0877720Z         r"""Computes the sum of gradients of given tensors w.r.t. graph leaves.
2020-04-22T21:21:40.0877910Z     
2020-04-22T21:21:40.0878060Z         The graph is differentiated using the chain rule. If any of ``tensors``
2020-04-22T21:21:40.0878700Z         are non-scalar (i.e. their data has more than one element) and require
2020-04-22T21:21:40.0879210Z         gradient, then the Jacobian-vector product would be computed, in this
2020-04-22T21:21:40.0879390Z         case the function additionally requires specifying ``grad_tensors``.
2020-04-22T21:21:40.0879540Z         It should be a sequence of matching length, that contains the "vector"
2020-04-22T21:21:40.0880070Z         in the Jacobian-vector product, usually the gradient of the differentiated
2020-04-22T21:21:40.0880250Z         function w.r.t. corresponding tensors (``None`` is an acceptable value for
2020-04-22T21:21:40.0880740Z         all tensors that don't need gradient tensors).
2020-04-22T21:21:40.0880940Z     
2020-04-22T21:21:40.0881430Z         This function accumulates gradients in the leaves - you might need to zero
2020-04-22T21:21:40.0881620Z         them before calling it.
2020-04-22T21:21:40.0881740Z     
2020-04-22T21:21:40.0882160Z         Arguments:
2020-04-22T21:21:40.0882850Z             tensors (sequence of Tensor): Tensors of which the derivative will be
2020-04-22T21:21:40.0883360Z                 computed.
2020-04-22T21:21:40.0884550Z             grad_tensors (sequence of (Tensor or None)): The "vector" in the Jacobian-vector
2020-04-22T21:21:40.0884750Z                 product, usually gradients w.r.t. each element of corresponding tensors.
2020-04-22T21:21:40.0885540Z                 None values can be specified for scalar Tensors or ones that don't require
2020-04-22T21:21:40.0886160Z                 grad. If a None value would be acceptable for all grad_tensors, then this
2020-04-22T21:21:40.0886270Z                 argument is optional.
2020-04-22T21:21:40.0886630Z             retain_graph (bool, optional): If ``False``, the graph used to compute the grad
2020-04-22T21:21:40.0887210Z                 will be freed. Note that in nearly all cases setting this option to ``True``
2020-04-22T21:21:40.0888060Z                 is not needed and often can be worked around in a much more efficient
2020-04-22T21:21:40.0888320Z                 way. Defaults to the value of ``create_graph``.
2020-04-22T21:21:40.0888820Z             create_graph (bool, optional): If ``True``, graph of the derivative will
2020-04-22T21:21:40.0889420Z                 be constructed, allowing to compute higher order derivative products.
2020-04-22T21:21:40.0889750Z                 Defaults to ``False``.
2020-04-22T21:21:40.0889950Z         """
2020-04-22T21:21:40.0890400Z         if grad_variables is not None:
2020-04-22T21:21:40.0891350Z             warnings.warn("'grad_variables' is deprecated. Use 'grad_tensors' instead.")
2020-04-22T21:21:40.0891550Z             if grad_tensors is None:
2020-04-22T21:21:40.0891840Z                 grad_tensors = grad_variables
2020-04-22T21:21:40.0892230Z             else:
2020-04-22T21:21:40.0892880Z                 raise RuntimeError("'grad_tensors' and 'grad_variables' (deprecated) "
2020-04-22T21:21:40.0893100Z                                    "arguments both passed to backward(). Please only "
2020-04-22T21:21:40.0894040Z                                    "use 'grad_tensors'.")
2020-04-22T21:21:40.0894220Z     
2020-04-22T21:21:40.0894310Z         tensors = (tensors,) if isinstance(tensors, torch.Tensor) else tuple(tensors)
2020-04-22T21:21:40.0894460Z     
2020-04-22T21:21:40.0894780Z         if grad_tensors is None:
2020-04-22T21:21:40.0895260Z             grad_tensors = [None] * len(tensors)
2020-04-22T21:21:40.0895680Z         elif isinstance(grad_tensors, torch.Tensor):
2020-04-22T21:21:40.0896060Z             grad_tensors = [grad_tensors]
2020-04-22T21:21:40.0896200Z         else:
2020-04-22T21:21:40.0896610Z             grad_tensors = list(grad_tensors)
2020-04-22T21:21:40.0896790Z     
2020-04-22T21:21:40.0897320Z         grad_tensors = _make_grads(tensors, grad_tensors)
2020-04-22T21:21:40.0897600Z         if retain_graph is None:
2020-04-22T21:21:40.0898000Z             retain_graph = create_graph
2020-04-22T21:21:40.0898270Z     
2020-04-22T21:21:40.0898630Z         Variable._execution_engine.run_backward(
2020-04-22T21:21:40.0899000Z             tensors, grad_tensors, retain_graph, create_graph,
2020-04-22T21:21:40.0899540Z >           allow_unreachable=True)  # allow_unreachable flag
2020-04-22T21:21:40.0901920Z E       RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [300, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
2020-04-22T21:21:40.0902060Z 
2020-04-22T21:21:40.0903330Z /Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/torch/autograd/__init__.py:100: RuntimeError
2020-04-22T21:21:40.0904240Z ----------------------------- Captured stdout call -----------------------------
2020-04-22T21:21:40.0904840Z 2020-04-22 21:21:33.965571  | dreamer_pong_0 Runner  master CPU affinity: UNAVAILABLE MacOS.
2020-04-22T21:21:40.0905360Z 2020-04-22 21:21:33.965814  | dreamer_pong_0 Runner  master Torch threads: 1.
2020-04-22T21:21:40.0905860Z �[32musing seed 3925�[0m
2020-04-22T21:21:40.0906330Z 2020-04-22 21:21:36.152252  | dreamer_pong_0 Sampler decorrelating envs, max steps: 0
2020-04-22T21:21:40.0906850Z 2020-04-22 21:21:36.153202  | dreamer_pong_0 Serial Sampler initialized.
2020-04-22T21:21:40.0907380Z 2020-04-22 21:21:36.153295  | dreamer_pong_0 Running 20 iterations of minibatch RL.
2020-04-22T21:21:40.0908340Z 2020-04-22 21:21:36.154483  | dreamer_pong_0 Optimizing over 10 iterations.
2020-04-22T21:21:40.0908500Z Warning: No valid output stream.
2020-04-22T21:21:40.0909000Z 2020-04-22 21:21:36.221418  | dreamer_pong_0 itr #9 saving snapshot...
2020-04-22T21:21:40.0909490Z 2020-04-22 21:21:36.296967  | dreamer_pong_0 itr #9 saved
2020-04-22T21:21:40.0909940Z 2020-04-22 21:21:36.321011  | -----------------------------  ----------
2020-04-22T21:21:40.0910730Z 2020-04-22 21:21:36.321133  | Diagnostics/NewCompletedTrajs    0
2020-04-22T21:21:40.0911310Z 2020-04-22 21:21:36.321347  | Diagnostics/StepsInTrajWindow    0
2020-04-22T21:21:40.0911810Z 2020-04-22 21:21:36.321423  | Diagnostics/Iteration            9
2020-04-22T21:21:40.0912320Z 2020-04-22 21:21:36.321491  | Diagnostics/CumTime (s)          0.142672
2020-04-22T21:21:40.0912780Z 2020-04-22 21:21:36.321568  | Diagnostics/CumSteps            10
2020-04-22T21:21:40.0913290Z 2020-04-22 21:21:36.321764  | Diagnostics/CumCompletedTrajs    0
2020-04-22T21:21:40.0913810Z 2020-04-22 21:21:36.321852  | Diagnostics/CumUpdates           0
2020-04-22T21:21:40.0914300Z 2020-04-22 21:21:36.321923  | Diagnostics/StepsPerSecond      70.0909
2020-04-22T21:21:40.0914760Z 2020-04-22 21:21:36.321999  | Diagnostics/UpdatesPerSecond     0
2020-04-22T21:21:40.0915250Z 2020-04-22 21:21:36.322199  | Diagnostics/ReplayRatio          0
2020-04-22T21:21:40.0915750Z 2020-04-22 21:21:36.322296  | Diagnostics/CumReplayRatio       0
2020-04-22T21:21:40.0916230Z 2020-04-22 21:21:36.322368  | loss/Average                   nan
2020-04-22T21:21:40.0916730Z 2020-04-22 21:21:36.322471  | loss/Std                       nan
2020-04-22T21:21:40.0917260Z 2020-04-22 21:21:36.322661  | loss/Median                    nan
2020-04-22T21:21:40.0917760Z 2020-04-22 21:21:36.322733  | loss/Min                       nan
2020-04-22T21:21:40.0918270Z 2020-04-22 21:21:36.322801  | loss/Max                       nan
2020-04-22T21:21:40.0918750Z 2020-04-22 21:21:36.322870  | model_loss/Average             nan
2020-04-22T21:21:40.0919210Z 2020-04-22 21:21:36.322946  | model_loss/Std                 nan
2020-04-22T21:21:40.0919710Z 2020-04-22 21:21:36.323139  | model_loss/Median              nan
2020-04-22T21:21:40.0920200Z 2020-04-22 21:21:36.323212  | model_loss/Min                 nan
2020-04-22T21:21:40.0920680Z 2020-04-22 21:21:36.323280  | model_loss/Max                 nan
2020-04-22T21:21:40.0921180Z 2020-04-22 21:21:36.323378  | actor_loss/Average             nan
2020-04-22T21:21:40.0921640Z 2020-04-22 21:21:36.323574  | actor_loss/Std                 nan
2020-04-22T21:21:40.0922130Z 2020-04-22 21:21:36.323646  | actor_loss/Median              nan
2020-04-22T21:21:40.0922630Z 2020-04-22 21:21:36.323725  | actor_loss/Min                 nan
2020-04-22T21:21:40.0923120Z 2020-04-22 21:21:36.323793  | actor_loss/Max                 nan
2020-04-22T21:21:40.0923570Z 2020-04-22 21:21:36.323862  | value_loss/Average             nan
2020-04-22T21:21:40.0924080Z 2020-04-22 21:21:36.323930  | value_loss/Std                 nan
2020-04-22T21:21:40.0924580Z 2020-04-22 21:21:36.323997  | value_loss/Median              nan
2020-04-22T21:21:40.0925060Z 2020-04-22 21:21:36.324065  | value_loss/Min                 nan
2020-04-22T21:21:40.0925560Z 2020-04-22 21:21:36.324133  | value_loss/Max                 nan
2020-04-22T21:21:40.0926010Z 2020-04-22 21:21:36.324200  | prior_entropy/Average          nan
2020-04-22T21:21:40.0926510Z 2020-04-22 21:21:36.324268  | prior_entropy/Std              nan
2020-04-22T21:21:40.0927100Z 2020-04-22 21:21:36.324336  | prior_entropy/Median           nan
2020-04-22T21:21:40.0927610Z 2020-04-22 21:21:36.324405  | prior_entropy/Min              nan
2020-04-22T21:21:40.0928120Z 2020-04-22 21:21:36.324598  | prior_entropy/Max              nan
2020-04-22T21:21:40.0928600Z 2020-04-22 21:21:36.324691  | post_entropy/Average           nan
2020-04-22T21:21:40.0929100Z 2020-04-22 21:21:36.324784  | post_entropy/Std               nan
2020-04-22T21:21:40.0929870Z 2020-04-22 21:21:36.324953  | post_entropy/Median            nan
2020-04-22T21:21:40.0930380Z 2020-04-22 21:21:36.325047  | post_entropy/Min               nan
2020-04-22T21:21:40.0930870Z 2020-04-22 21:21:36.325138  | post_entropy/Max               nan
2020-04-22T21:21:40.0931360Z 2020-04-22 21:21:36.325308  | divergence/Average             nan
2020-04-22T21:21:40.0931810Z 2020-04-22 21:21:36.325401  | divergence/Std                 nan
2020-04-22T21:21:40.0932560Z 2020-04-22 21:21:36.325471  | divergence/Median              nan
2020-04-22T21:21:40.0933860Z 2020-04-22 21:21:36.325574  | divergence/Min                 nan
2020-04-22T21:21:40.0934380Z 2020-04-22 21:21:36.325738  | divergence/Max                 nan
2020-04-22T21:21:40.0934880Z 2020-04-22 21:21:36.325838  | reward_loss/Average            nan
2020-04-22T21:21:40.0935330Z 2020-04-22 21:21:36.325916  | reward_loss/Std                nan
2020-04-22T21:21:40.0935830Z 2020-04-22 21:21:36.326030  | reward_loss/Median             nan
2020-04-22T21:21:40.0936340Z 2020-04-22 21:21:36.326099  | reward_loss/Min                nan
2020-04-22T21:21:40.0936820Z 2020-04-22 21:21:36.326167  | reward_loss/Max                nan
2020-04-22T21:21:40.0937360Z 2020-04-22 21:21:36.326234  | image_loss/Average             nan
2020-04-22T21:21:40.0937860Z 2020-04-22 21:21:36.326332  | image_loss/Std                 nan
2020-04-22T21:21:40.0938370Z 2020-04-22 21:21:36.326405  | image_loss/Median              nan
2020-04-22T21:21:40.0938860Z 2020-04-22 21:21:36.326462  | image_loss/Min                 nan
2020-04-22T21:21:40.0939370Z 2020-04-22 21:21:36.326517  | image_loss/Max                 nan
2020-04-22T21:21:40.0939820Z 2020-04-22 21:21:36.326572  | -----------------------------  ----------
2020-04-22T21:21:40.0940350Z 2020-04-22 21:21:36.327139  | dreamer_pong_0 itr #9 Optimizing over 10 iterations.
2020-04-22T21:21:40.0940510Z Warning: No valid output stream.
2020-04-22T21:21:40.0941000Z ----------------------------- Captured stderr call -----------------------------
2020-04-22T21:21:40.0941090Z 
2020-04-22T21:21:40.0941220Z Imagination:   0%|          | 0/1 [00:00<?, ?it/s]Warning: Error detected in MmBackward. Traceback of forward call that caused the error:
2020-04-22T21:21:40.0942520Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/bin/pytest", line 8, in <module>
2020-04-22T21:21:40.0942780Z     sys.exit(main())
2020-04-22T21:21:40.0943550Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/config/__init__.py", line 125, in main
2020-04-22T21:21:40.0943720Z     config=config
2020-04-22T21:21:40.0944290Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
2020-04-22T21:21:40.0944470Z     return self._hookexec(self, self.get_hookimpls(), kwargs)
2020-04-22T21:21:40.0945060Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
2020-04-22T21:21:40.0945240Z     return self._inner_hookexec(hook, methods, kwargs)
2020-04-22T21:21:40.0945850Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
2020-04-22T21:21:40.0946020Z     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
2020-04-22T21:21:40.0946630Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
2020-04-22T21:21:40.0946760Z     res = hook_impl.function(*args)
2020-04-22T21:21:40.0947370Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/main.py", line 240, in pytest_cmdline_main
2020-04-22T21:21:40.0947550Z     return wrap_session(config, _main)
2020-04-22T21:21:40.0948160Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/main.py", line 191, in wrap_session
2020-04-22T21:21:40.0948310Z     session.exitstatus = doit(config, session) or 0
2020-04-22T21:21:40.0948910Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/main.py", line 247, in _main
2020-04-22T21:21:40.0949410Z     config.hook.pytest_runtestloop(session=session)
2020-04-22T21:21:40.0950060Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
2020-04-22T21:21:40.0950240Z     return self._hookexec(self, self.get_hookimpls(), kwargs)
2020-04-22T21:21:40.0951170Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
2020-04-22T21:21:40.0951340Z     return self._inner_hookexec(hook, methods, kwargs)
2020-04-22T21:21:40.0951950Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
2020-04-22T21:21:40.0952080Z     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
2020-04-22T21:21:40.0952690Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
2020-04-22T21:21:40.0952820Z     res = hook_impl.function(*args)
2020-04-22T21:21:40.0953430Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/main.py", line 272, in pytest_runtestloop
2020-04-22T21:21:40.0953620Z     item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
2020-04-22T21:21:40.0954230Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
2020-04-22T21:21:40.0954380Z     return self._hookexec(self, self.get_hookimpls(), kwargs)
2020-04-22T21:21:40.0954990Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
2020-04-22T21:21:40.0955110Z     return self._inner_hookexec(hook, methods, kwargs)
2020-04-22T21:21:40.0955710Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
2020-04-22T21:21:40.0955900Z     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
2020-04-22T21:21:40.0956490Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
2020-04-22T21:21:40.0956670Z     res = hook_impl.function(*args)
2020-04-22T21:21:40.0957280Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/runner.py", line 85, in pytest_runtest_protocol
2020-04-22T21:21:40.0957450Z     runtestprotocol(item, nextitem=nextitem)
2020-04-22T21:21:40.0958020Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/runner.py", line 100, in runtestprotocol
2020-04-22T21:21:40.0958160Z     reports.append(call_and_report(item, "call", log))
2020-04-22T21:21:40.0958770Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/runner.py", line 186, in call_and_report
2020-04-22T21:21:40.0958930Z     call = call_runtest_hook(item, when, **kwds)
2020-04-22T21:21:40.0959560Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/runner.py", line 217, in call_runtest_hook
2020-04-22T21:21:40.0959710Z     lambda: ihook(item=item, **kwds), when=when, reraise=reraise
2020-04-22T21:21:40.0960310Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/runner.py", line 244, in from_call
2020-04-22T21:21:40.0960430Z     result = func()
2020-04-22T21:21:40.0961030Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/runner.py", line 217, in <lambda>
2020-04-22T21:21:40.0961210Z     lambda: ihook(item=item, **kwds), when=when, reraise=reraise
2020-04-22T21:21:40.0961800Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
2020-04-22T21:21:40.0961970Z     return self._hookexec(self, self.get_hookimpls(), kwargs)
2020-04-22T21:21:40.0962580Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
2020-04-22T21:21:40.0963060Z     return self._inner_hookexec(hook, methods, kwargs)
2020-04-22T21:21:40.0963650Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
2020-04-22T21:21:40.0963830Z     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
2020-04-22T21:21:40.0964690Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
2020-04-22T21:21:40.0964890Z     res = hook_impl.function(*args)
2020-04-22T21:21:40.0965560Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/runner.py", line 135, in pytest_runtest_call
2020-04-22T21:21:40.0965710Z     item.runtest()
2020-04-22T21:21:40.0966310Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/python.py", line 1479, in runtest
2020-04-22T21:21:40.0966440Z     self.ihook.pytest_pyfunc_call(pyfuncitem=self)
2020-04-22T21:21:40.0967130Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
2020-04-22T21:21:40.0967320Z     return self._hookexec(self, self.get_hookimpls(), kwargs)
2020-04-22T21:21:40.0967920Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
2020-04-22T21:21:40.0968100Z     return self._inner_hookexec(hook, methods, kwargs)
2020-04-22T21:21:40.0968700Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
2020-04-22T21:21:40.0968860Z     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
2020-04-22T21:21:40.0969420Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
2020-04-22T21:21:40.0969600Z     res = hook_impl.function(*args)
2020-04-22T21:21:40.0970210Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/_pytest/python.py", line 184, in pytest_pyfunc_call
2020-04-22T21:21:40.0970390Z     result = testfunction(**testargs)
2020-04-22T21:21:40.0970990Z   File "/Users/runner/runners/2.169.0/work/dreamer-pytorch/dreamer-pytorch/tests/dreamer/test_main.py", line 63, in test_main
2020-04-22T21:21:40.0971150Z     build_and_train(logdir)
2020-04-22T21:21:40.0971770Z   File "/Users/runner/runners/2.169.0/work/dreamer-pytorch/dreamer-pytorch/tests/dreamer/test_main.py", line 58, in build_and_train
2020-04-22T21:21:40.0971890Z     runner.train()
2020-04-22T21:21:40.0972480Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/rlpyt/runners/minibatch_rl.py", line 259, in train
2020-04-22T21:21:40.0972660Z     opt_info = self.algo.optimize_agent(itr, samples)
2020-04-22T21:21:40.0973270Z   File "/Users/runner/runners/2.169.0/work/dreamer-pytorch/dreamer-pytorch/dreamer/algos/dreamer_algo.py", line 147, in optimize_agent
2020-04-22T21:21:40.0973440Z     model_loss, actor_loss, value_loss, loss_info = self.loss(*loss_inputs, itr, i)
2020-04-22T21:21:40.0974060Z   File "/Users/runner/runners/2.169.0/work/dreamer-pytorch/dreamer-pytorch/dreamer/algos/dreamer_algo.py", line 239, in loss
2020-04-22T21:21:40.0974220Z     imag_reward = model.reward_model(imag_feat).mean
2020-04-22T21:21:40.0974790Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
2020-04-22T21:21:40.0974960Z     result = self.forward(*input, **kwargs)
2020-04-22T21:21:40.0975550Z   File "/Users/runner/runners/2.169.0/work/dreamer-pytorch/dreamer-pytorch/dreamer/models/dense.py", line 30, in forward
2020-04-22T21:21:40.0975720Z     x = self.model(features)
2020-04-22T21:21:40.0976330Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
2020-04-22T21:21:40.0976810Z     result = self.forward(*input, **kwargs)
2020-04-22T21:21:40.0977400Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
2020-04-22T21:21:40.0977580Z     input = module(input)
2020-04-22T21:21:40.0978180Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
2020-04-22T21:21:40.0978600Z     result = self.forward(*input, **kwargs)
2020-04-22T21:21:40.0979720Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
2020-04-22T21:21:40.0979950Z     return F.linear(input, self.weight, self.bias)
2020-04-22T21:21:40.0981230Z   File "/Users/runner/hostedtoolcache/Python/3.7.6/x64/lib/python3.7/site-packages/torch/nn/functional.py", line 1612, in linear
2020-04-22T21:21:40.0981370Z     output = input.matmul(weight.t())
2020-04-22T21:21:40.0981520Z  (print_stack at ../torch/csrc/autograd/python_anomaly_mode.cpp:60)
2020-04-22T21:21:40.0981620Z 
2020-04-22T21:21:40.0981710Z Imagination:   0%|          | 0/1 [00:00<?, ?it/s]
2020-04-22T21:21:40.0981800Z 
2020-04-22T21:21:40.0982350Z ---------- coverage: platform darwin, python 3.7.6-final-0 -----------
2020-04-22T21:21:40.0982460Z Coverage XML written to file coverage.xml
2020-04-22T21:21:40.0982540Z 
2020-04-22T21:21:40.0982770Z =========================== short test summary info ============================
2020-04-22T21:21:40.0986770Z FAILED tests/dreamer/test_main.py::test_main - RuntimeError: one of the varia...
2020-04-22T21:21:40.0986950Z ======================== 1 failed, 17 passed in 15.74s =========================
2020-04-22T21:21:40.2397990Z ##[error]Process completed with exit code 1.
2

For more detail, see:

https://github.com/juliusfrost/dreamer-pytorch/actions/runs/85118145

https://github.com/juliusfrost/dreamer-pytorch/runs/610066248

Gradient and loss are shown as Nan(Atari game)

Describe the bug
When I run the main.py, the grad and loss printed by the terminal are both Nan. I noticed that here seems to be a similar question, but there is no clear answer (I cannot open the Log file).

To Reproduce
I run pytest tests first and found no problems as follows:

============================= test session starts ==============================
platform linux -- Python 3.7.9, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
rootdir: /home/tq/2_code/dreamer
collected 18 items                                                             

tests/dreamer/test_main.py .                                             [  5%]
tests/dreamer/models/test_action.py ....                                 [ 27%]
tests/dreamer/models/test_agent.py ...                                   [ 44%]
tests/dreamer/models/test_dense.py ..                                    [ 55%]
tests/dreamer/models/test_distribution.py .                              [ 61%]
tests/dreamer/models/test_observation.py .....                           [ 88%]
tests/dreamer/models/test_rnns.py ..                                     [100%]

=============================== warnings summary ===============================
../../anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541
  /home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
    _np_qint8 = np.dtype([("qint8", np.int8, 1)])

../../anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542
  /home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
    _np_quint8 = np.dtype([("quint8", np.uint8, 1)])

../../anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543
  /home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
    _np_qint16 = np.dtype([("qint16", np.int16, 1)])

../../anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544
  /home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
    _np_quint16 = np.dtype([("quint16", np.uint16, 1)])

../../anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545
  /home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
    _np_qint32 = np.dtype([("qint32", np.int32, 1)])

../../anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550
  /home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
    np_resource = np.dtype([("resource", np.ubyte, 1)])

../../anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/past/builtins/misc.py:45
  /home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    from imp import reload

-- Docs: https://docs.pytest.org/en/stable/warnings.html
======================== 18 passed, 7 warnings in 4.75s ========================

But when I run python3 main.py, the result is as follows:

/home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/tq/anaconda3/envs/py3_torch_1/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
run 0 already exists. 
run 1 already exists. 
run 2 already exists. 
run 3 already exists. 
Using run id = 4
2020-12-08 21:40:34.528947  | dreamer_pong_4 Runner  master CPU affinity: [0, 1, 2, 3].
2020-12-08 21:40:34.529074  | dreamer_pong_4 Runner  master Torch threads: 4.
using seed 7705
2020-12-08 21:40:36.323988  | dreamer_pong_4 Sampler decorrelating envs, max steps: 0
2020-12-08 21:40:36.324458  | dreamer_pong_4 Serial Sampler initialized.
2020-12-08 21:40:36.324551  | dreamer_pong_4 Running 5000000 iterations of minibatch RL.
2020-12-08 21:40:36.325991  | dreamer_pong_4 Optimizing over 1000 iterations.
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
2020-12-08 21:40:40.829984  | dreamer_pong_4 itr #999 saving snapshot...
2020-12-08 21:40:40.852852  | dreamer_pong_4 itr #999 saved
2020-12-08 21:40:40.868678  | -----------------------------  -----------
2020-12-08 21:40:40.869799  | Diagnostics/NewCompletedTrajs     2
2020-12-08 21:40:40.869839  | Diagnostics/StepsInTrajWindow  1000
2020-12-08 21:40:40.869859  | Diagnostics/Iteration           999
2020-12-08 21:40:40.869876  | Diagnostics/CumTime (s)           4.53176
2020-12-08 21:40:40.869890  | Diagnostics/CumSteps           1000
2020-12-08 21:40:40.869904  | Diagnostics/CumCompletedTrajs     2
2020-12-08 21:40:40.869917  | Diagnostics/CumUpdates            0
2020-12-08 21:40:40.869930  | Diagnostics/StepsPerSecond      220.665
2020-12-08 21:40:40.869944  | Diagnostics/UpdatesPerSecond      0
2020-12-08 21:40:40.869957  | Diagnostics/ReplayRatio           0
2020-12-08 21:40:40.869971  | Diagnostics/CumReplayRatio        0
2020-12-08 21:40:40.869984  | Length/Average                  500
2020-12-08 21:40:40.869997  | Length/Std                        0
2020-12-08 21:40:40.870011  | Length/Median                   500
2020-12-08 21:40:40.870024  | Length/Min                      500
2020-12-08 21:40:40.870037  | Length/Max                      500
2020-12-08 21:40:40.870051  | Return/Average                   -5.5
2020-12-08 21:40:40.870064  | Return/Std                        0.5
2020-12-08 21:40:40.870077  | Return/Median                    -5.5
2020-12-08 21:40:40.870091  | Return/Min                       -6
2020-12-08 21:40:40.870104  | Return/Max                       -5
2020-12-08 21:40:40.870117  | NonzeroRewards/Average            5.5
2020-12-08 21:40:40.870130  | NonzeroRewards/Std                0.5
2020-12-08 21:40:40.870144  | NonzeroRewards/Median             5.5
2020-12-08 21:40:40.870157  | NonzeroRewards/Min                5
2020-12-08 21:40:40.870171  | NonzeroRewards/Max                6
2020-12-08 21:40:40.870184  | DiscountedReturn/Average         -0.380603
2020-12-08 21:40:40.870198  | DiscountedReturn/Std              0.163669
2020-12-08 21:40:40.870211  | DiscountedReturn/Median          -0.380603
2020-12-08 21:40:40.870225  | DiscountedReturn/Min             -0.544272
2020-12-08 21:40:40.870238  | DiscountedReturn/Max             -0.216933
2020-12-08 21:40:40.870251  | GameScore/Average                -5.5
2020-12-08 21:40:40.870272  | GameScore/Std                     0.5
2020-12-08 21:40:40.870287  | GameScore/Median                 -5.5
2020-12-08 21:40:40.870300  | GameScore/Min                    -6
2020-12-08 21:40:40.870314  | GameScore/Max                    -5
2020-12-08 21:40:40.870327  | loss/Average                    nan
2020-12-08 21:40:40.870340  | loss/Std                        nan
2020-12-08 21:40:40.870354  | loss/Median                     nan
2020-12-08 21:40:40.870367  | loss/Min                        nan
2020-12-08 21:40:40.870380  | loss/Max                        nan
2020-12-08 21:40:40.870394  | grad_norm_model/Average         nan
2020-12-08 21:40:40.870407  | grad_norm_model/Std             nan
2020-12-08 21:40:40.870421  | grad_norm_model/Median          nan
2020-12-08 21:40:40.870434  | grad_norm_model/Min             nan
2020-12-08 21:40:40.870447  | grad_norm_model/Max             nan
2020-12-08 21:40:40.870461  | grad_norm_actor/Average         nan
2020-12-08 21:40:40.870474  | grad_norm_actor/Std             nan
2020-12-08 21:40:40.870488  | grad_norm_actor/Median          nan
2020-12-08 21:40:40.870501  | grad_norm_actor/Min             nan
2020-12-08 21:40:40.870514  | grad_norm_actor/Max             nan
2020-12-08 21:40:40.870533  | grad_norm_value/Average         nan
2020-12-08 21:40:40.870548  | grad_norm_value/Std             nan
2020-12-08 21:40:40.870562  | grad_norm_value/Median          nan
2020-12-08 21:40:40.870575  | grad_norm_value/Min             nan
2020-12-08 21:40:40.870588  | grad_norm_value/Max             nan
2020-12-08 21:40:40.870601  | model_loss/Average              nan
2020-12-08 21:40:40.870615  | model_loss/Std                  nan
2020-12-08 21:40:40.870628  | model_loss/Median               nan
2020-12-08 21:40:40.870642  | model_loss/Min                  nan
2020-12-08 21:40:40.870655  | model_loss/Max                  nan
2020-12-08 21:40:40.870669  | actor_loss/Average              nan
2020-12-08 21:40:40.870682  | actor_loss/Std                  nan
2020-12-08 21:40:40.870695  | actor_loss/Median               nan
2020-12-08 21:40:40.870709  | actor_loss/Min                  nan
2020-12-08 21:40:40.870723  | actor_loss/Max                  nan
2020-12-08 21:40:40.870736  | value_loss/Average              nan
2020-12-08 21:40:40.870749  | value_loss/Std                  nan
2020-12-08 21:40:40.870763  | value_loss/Median               nan
2020-12-08 21:40:40.870776  | value_loss/Min                  nan
2020-12-08 21:40:40.870789  | value_loss/Max                  nan
2020-12-08 21:40:40.870803  | prior_entropy/Average           nan
2020-12-08 21:40:40.870816  | prior_entropy/Std               nan
2020-12-08 21:40:40.870830  | prior_entropy/Median            nan
2020-12-08 21:40:40.870843  | prior_entropy/Min               nan
2020-12-08 21:40:40.870856  | prior_entropy/Max               nan
2020-12-08 21:40:40.870869  | post_entropy/Average            nan
2020-12-08 21:40:40.870883  | post_entropy/Std                nan
2020-12-08 21:40:40.870896  | post_entropy/Median             nan
2020-12-08 21:40:40.870910  | post_entropy/Min                nan
2020-12-08 21:40:40.870923  | post_entropy/Max                nan
2020-12-08 21:40:40.870936  | divergence/Average              nan
2020-12-08 21:40:40.870950  | divergence/Std                  nan
2020-12-08 21:40:40.870963  | divergence/Median               nan
2020-12-08 21:40:40.870976  | divergence/Min                  nan
2020-12-08 21:40:40.870989  | divergence/Max                  nan
2020-12-08 21:40:40.871003  | reward_loss/Average             nan
2020-12-08 21:40:40.871016  | reward_loss/Std                 nan
2020-12-08 21:40:40.871029  | reward_loss/Median              nan
2020-12-08 21:40:40.871043  | reward_loss/Min                 nan
2020-12-08 21:40:40.871056  | reward_loss/Max                 nan
2020-12-08 21:40:40.871069  | image_loss/Average              nan
2020-12-08 21:40:40.871082  | image_loss/Std                  nan
2020-12-08 21:40:40.871096  | image_loss/Median               nan
2020-12-08 21:40:40.871109  | image_loss/Min                  nan
2020-12-08 21:40:40.871122  | image_loss/Max                  nan
2020-12-08 21:40:40.871136  | pcont_loss/Average              nan
2020-12-08 21:40:40.871149  | pcont_loss/Std                  nan
2020-12-08 21:40:40.871162  | pcont_loss/Median               nan
2020-12-08 21:40:40.871175  | pcont_loss/Min                  nan
2020-12-08 21:40:40.871189  | pcont_loss/Max                  nan
2020-12-08 21:40:40.871202  | -----------------------------  -----------
2020-12-08 21:40:40.871387  | dreamer_pong_4 itr #999 Optimizing over 1000 iterations.
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
2020-12-08 21:40:45.441675  | dreamer_pong_4 itr #1999 saving snapshot...
2020-12-08 21:40:45.470369  | dreamer_pong_4 itr #1999 saved
2020-12-08 21:40:45.479243  | -----------------------------  -----------
2020-12-08 21:40:45.479561  | Diagnostics/NewCompletedTrajs     2
2020-12-08 21:40:45.479634  | Diagnostics/StepsInTrajWindow  2000
2020-12-08 21:40:45.479701  | Diagnostics/Iteration          1999
2020-12-08 21:40:45.479765  | Diagnostics/CumTime (s)           9.14452
2020-12-08 21:40:45.479829  | Diagnostics/CumSteps           2000
2020-12-08 21:40:45.479892  | Diagnostics/CumCompletedTrajs     4
2020-12-08 21:40:45.479960  | Diagnostics/CumUpdates            0
2020-12-08 21:40:45.480024  | Diagnostics/StepsPerSecond      216.79
2020-12-08 21:40:45.480087  | Diagnostics/UpdatesPerSecond      0
2020-12-08 21:40:45.480150  | Diagnostics/ReplayRatio           0
2020-12-08 21:40:45.480213  | Diagnostics/CumReplayRatio        0
2020-12-08 21:40:45.480276  | Length/Average                  500
2020-12-08 21:40:45.480339  | Length/Std                        0
2020-12-08 21:40:45.480402  | Length/Median                   500
2020-12-08 21:40:45.480465  | Length/Min                      500
2020-12-08 21:40:45.480527  | Length/Max                      500
2020-12-08 21:40:45.480590  | Return/Average                   -5.75
2020-12-08 21:40:45.480652  | Return/Std                        0.433013
2020-12-08 21:40:45.480715  | Return/Median                    -6
2020-12-08 21:40:45.480778  | Return/Min                       -6
2020-12-08 21:40:45.480841  | Return/Max                       -5
2020-12-08 21:40:45.480903  | NonzeroRewards/Average            5.75
2020-12-08 21:40:45.480966  | NonzeroRewards/Std                0.433013
2020-12-08 21:40:45.481047  | NonzeroRewards/Median             6
2020-12-08 21:40:45.481110  | NonzeroRewards/Min                5
2020-12-08 21:40:45.481172  | NonzeroRewards/Max                6
2020-12-08 21:40:45.481234  | DiscountedReturn/Average         -0.48546
2020-12-08 21:40:45.481296  | DiscountedReturn/Std              0.157067
2020-12-08 21:40:45.481358  | DiscountedReturn/Median          -0.555435
2020-12-08 21:40:45.481421  | DiscountedReturn/Min             -0.614036
2020-12-08 21:40:45.481483  | DiscountedReturn/Max             -0.216933
2020-12-08 21:40:45.481545  | GameScore/Average                -5.75
2020-12-08 21:40:45.481607  | GameScore/Std                     0.433013
2020-12-08 21:40:45.481669  | GameScore/Median                 -6
2020-12-08 21:40:45.481731  | GameScore/Min                    -6
2020-12-08 21:40:45.481793  | GameScore/Max                    -5
2020-12-08 21:40:45.481855  | loss/Average                    nan
2020-12-08 21:40:45.481986  | loss/Std                        nan
2020-12-08 21:40:45.482050  | loss/Median                     nan
2020-12-08 21:40:45.482112  | loss/Min                        nan
2020-12-08 21:40:45.482174  | loss/Max                        nan
2020-12-08 21:40:45.482236  | grad_norm_model/Average         nan
2020-12-08 21:40:45.482306  | grad_norm_model/Std             nan
2020-12-08 21:40:45.482368  | grad_norm_model/Median          nan
2020-12-08 21:40:45.482431  | grad_norm_model/Min             nan
2020-12-08 21:40:45.482493  | grad_norm_model/Max             nan
2020-12-08 21:40:45.482556  | grad_norm_actor/Average         nan
2020-12-08 21:40:45.482618  | grad_norm_actor/Std             nan
2020-12-08 21:40:45.482680  | grad_norm_actor/Median          nan
2020-12-08 21:40:45.482742  | grad_norm_actor/Min             nan
2020-12-08 21:40:45.482804  | grad_norm_actor/Max             nan
2020-12-08 21:40:45.482867  | grad_norm_value/Average         nan
2020-12-08 21:40:45.482929  | grad_norm_value/Std             nan
2020-12-08 21:40:45.482991  | grad_norm_value/Median          nan
2020-12-08 21:40:45.483054  | grad_norm_value/Min             nan
2020-12-08 21:40:45.483116  | grad_norm_value/Max             nan
2020-12-08 21:40:45.483178  | model_loss/Average              nan
2020-12-08 21:40:45.483241  | model_loss/Std                  nan
2020-12-08 21:40:45.483303  | model_loss/Median               nan
2020-12-08 21:40:45.483365  | model_loss/Min                  nan
2020-12-08 21:40:45.483427  | model_loss/Max                  nan
2020-12-08 21:40:45.483490  | actor_loss/Average              nan
2020-12-08 21:40:45.483569  | actor_loss/Std                  nan
2020-12-08 21:40:45.483633  | actor_loss/Median               nan
2020-12-08 21:40:45.483695  | actor_loss/Min                  nan
2020-12-08 21:40:45.483758  | actor_loss/Max                  nan
2020-12-08 21:40:45.483821  | value_loss/Average              nan
2020-12-08 21:40:45.483887  | value_loss/Std                  nan
2020-12-08 21:40:45.484033  | value_loss/Median               nan
2020-12-08 21:40:45.484099  | value_loss/Min                  nan
2020-12-08 21:40:45.484162  | value_loss/Max                  nan
2020-12-08 21:40:45.484225  | prior_entropy/Average           nan
2020-12-08 21:40:45.484288  | prior_entropy/Std               nan
2020-12-08 21:40:45.484351  | prior_entropy/Median            nan
2020-12-08 21:40:45.484487  | prior_entropy/Min               nan
2020-12-08 21:40:45.484551  | prior_entropy/Max               nan
2020-12-08 21:40:45.484614  | post_entropy/Average            nan
2020-12-08 21:40:45.484677  | post_entropy/Std                nan
2020-12-08 21:40:45.484740  | post_entropy/Median             nan
2020-12-08 21:40:45.484803  | post_entropy/Min                nan
2020-12-08 21:40:45.484865  | post_entropy/Max                nan
2020-12-08 21:40:45.484928  | divergence/Average              nan
2020-12-08 21:40:45.484991  | divergence/Std                  nan
2020-12-08 21:40:45.485053  | divergence/Median               nan
2020-12-08 21:40:45.485116  | divergence/Min                  nan
2020-12-08 21:40:45.485179  | divergence/Max                  nan
2020-12-08 21:40:45.485242  | reward_loss/Average             nan
2020-12-08 21:40:45.485305  | reward_loss/Std                 nan
2020-12-08 21:40:45.485368  | reward_loss/Median              nan
2020-12-08 21:40:45.485431  | reward_loss/Min                 nan
2020-12-08 21:40:45.485494  | reward_loss/Max                 nan
2020-12-08 21:40:45.485556  | image_loss/Average              nan
2020-12-08 21:40:45.485619  | image_loss/Std                  nan
2020-12-08 21:40:45.485682  | image_loss/Median               nan
2020-12-08 21:40:45.485745  | image_loss/Min                  nan
2020-12-08 21:40:45.485808  | image_loss/Max                  nan
2020-12-08 21:40:45.485871  | pcont_loss/Average              nan
2020-12-08 21:40:45.485934  | pcont_loss/Std                  nan
2020-12-08 21:40:45.486012  | pcont_loss/Median               nan
2020-12-08 21:40:45.486084  | pcont_loss/Min                  nan
2020-12-08 21:40:45.486147  | pcont_loss/Max                  nan
2020-12-08 21:40:45.486210  | -----------------------------  -----------
2020-12-08 21:40:45.486394  | dreamer_pong_4 itr #1999 Optimizing over 1000 iterations.
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
2020-12-08 21:40:49.996258  | dreamer_pong_4 itr #2999 saving snapshot...
2020-12-08 21:40:50.101570  | dreamer_pong_4 itr #2999 saved
2020-12-08 21:40:50.112942  | -----------------------------  -----------
2020-12-08 21:40:50.113748  | Diagnostics/NewCompletedTrajs     2
2020-12-08 21:40:50.114253  | Diagnostics/StepsInTrajWindow  3000
2020-12-08 21:40:50.114965  | Diagnostics/Iteration          2999
2020-12-08 21:40:50.115052  | Diagnostics/CumTime (s)          13.7764
2020-12-08 21:40:50.115133  | Diagnostics/CumSteps           3000
2020-12-08 21:40:50.115199  | Diagnostics/CumCompletedTrajs     6
2020-12-08 21:40:50.115263  | Diagnostics/CumUpdates            0
2020-12-08 21:40:50.115327  | Diagnostics/StepsPerSecond      215.894
2020-12-08 21:40:50.115391  | Diagnostics/UpdatesPerSecond      0
2020-12-08 21:40:50.115504  | Diagnostics/ReplayRatio           0
2020-12-08 21:40:50.115569  | Diagnostics/CumReplayRatio        0
2020-12-08 21:40:50.115633  | Length/Average                  500
2020-12-08 21:40:50.115696  | Length/Std                        0
2020-12-08 21:40:50.115760  | Length/Median                   500
2020-12-08 21:40:50.115822  | Length/Min                      500
2020-12-08 21:40:50.115886  | Length/Max                      500
2020-12-08 21:40:50.116015  | Return/Average                   -5.33333
2020-12-08 21:40:50.116079  | Return/Std                        1.10554
2020-12-08 21:40:50.116142  | Return/Median                    -6
2020-12-08 21:40:50.116205  | Return/Min                       -6
2020-12-08 21:40:50.116268  | Return/Max                       -3
2020-12-08 21:40:50.116330  | NonzeroRewards/Average            5.33333
2020-12-08 21:40:50.116393  | NonzeroRewards/Std                1.10554
2020-12-08 21:40:50.116462  | NonzeroRewards/Median             6
2020-12-08 21:40:50.116526  | NonzeroRewards/Min                3
2020-12-08 21:40:50.116590  | NonzeroRewards/Max                6
2020-12-08 21:40:50.118740  | DiscountedReturn/Average         -0.481913
2020-12-08 21:40:50.119389  | DiscountedReturn/Std              0.155374
2020-12-08 21:40:50.119484  | DiscountedReturn/Median          -0.555435
2020-12-08 21:40:50.119553  | DiscountedReturn/Min             -0.626503
2020-12-08 21:40:50.119618  | DiscountedReturn/Max             -0.216933
2020-12-08 21:40:50.119681  | GameScore/Average                -5.33333
2020-12-08 21:40:50.119744  | GameScore/Std                     1.10554
2020-12-08 21:40:50.119808  | GameScore/Median                 -6
2020-12-08 21:40:50.119871  | GameScore/Min                    -6
2020-12-08 21:40:50.119935  | GameScore/Max                    -3
2020-12-08 21:40:50.120008  | loss/Average                    nan
2020-12-08 21:40:50.120074  | loss/Std                        nan
2020-12-08 21:40:50.120137  | loss/Median                     nan
2020-12-08 21:40:50.120201  | loss/Min                        nan
2020-12-08 21:40:50.120264  | loss/Max                        nan
2020-12-08 21:40:50.120327  | grad_norm_model/Average         nan
2020-12-08 21:40:50.120390  | grad_norm_model/Std             nan
2020-12-08 21:40:50.120453  | grad_norm_model/Median          nan
2020-12-08 21:40:50.120516  | grad_norm_model/Min             nan
2020-12-08 21:40:50.120579  | grad_norm_model/Max             nan
2020-12-08 21:40:50.120643  | grad_norm_actor/Average         nan
2020-12-08 21:40:50.120706  | grad_norm_actor/Std             nan
2020-12-08 21:40:50.120769  | grad_norm_actor/Median          nan
2020-12-08 21:40:50.120832  | grad_norm_actor/Min             nan
2020-12-08 21:40:50.120962  | grad_norm_actor/Max             nan
2020-12-08 21:40:50.121026  | grad_norm_value/Average         nan
2020-12-08 21:40:50.121089  | grad_norm_value/Std             nan
2020-12-08 21:40:50.121152  | grad_norm_value/Median          nan
2020-12-08 21:40:50.121215  | grad_norm_value/Min             nan
2020-12-08 21:40:50.121278  | grad_norm_value/Max             nan
2020-12-08 21:40:50.121341  | model_loss/Average              nan
2020-12-08 21:40:50.121405  | model_loss/Std                  nan
2020-12-08 21:40:50.121469  | model_loss/Median               nan
2020-12-08 21:40:50.121532  | model_loss/Min                  nan
2020-12-08 21:40:50.121595  | model_loss/Max                  nan
2020-12-08 21:40:50.121659  | actor_loss/Average              nan
2020-12-08 21:40:50.121721  | actor_loss/Std                  nan
2020-12-08 21:40:50.121785  | actor_loss/Median               nan
2020-12-08 21:40:50.121847  | actor_loss/Min                  nan
2020-12-08 21:40:50.121910  | actor_loss/Max                  nan
2020-12-08 21:40:50.121973  | value_loss/Average              nan
2020-12-08 21:40:50.122036  | value_loss/Std                  nan
2020-12-08 21:40:50.122099  | value_loss/Median               nan
2020-12-08 21:40:50.122162  | value_loss/Min                  nan
2020-12-08 21:40:50.122225  | value_loss/Max                  nan
2020-12-08 21:40:50.122295  | prior_entropy/Average           nan
2020-12-08 21:40:50.122359  | prior_entropy/Std               nan
2020-12-08 21:40:50.122422  | prior_entropy/Median            nan
2020-12-08 21:40:50.122495  | prior_entropy/Min               nan
2020-12-08 21:40:50.122559  | prior_entropy/Max               nan
2020-12-08 21:40:50.122621  | post_entropy/Average            nan
2020-12-08 21:40:50.122684  | post_entropy/Std                nan
2020-12-08 21:40:50.122747  | post_entropy/Median             nan
2020-12-08 21:40:50.122810  | post_entropy/Min                nan
2020-12-08 21:40:50.125242  | post_entropy/Max                nan
2020-12-08 21:40:50.125794  | divergence/Average              nan
2020-12-08 21:40:50.126293  | divergence/Std                  nan
2020-12-08 21:40:50.126885  | divergence/Median               nan
2020-12-08 21:40:50.129519  | divergence/Min                  nan
2020-12-08 21:40:50.129971  | divergence/Max                  nan
2020-12-08 21:40:50.131475  | reward_loss/Average             nan
2020-12-08 21:40:50.132900  | reward_loss/Std                 nan
2020-12-08 21:40:50.133001  | reward_loss/Median              nan
2020-12-08 21:40:50.133803  | reward_loss/Min                 nan
2020-12-08 21:40:50.134231  | reward_loss/Max                 nan
2020-12-08 21:40:50.134686  | image_loss/Average              nan
2020-12-08 21:40:50.135153  | image_loss/Std                  nan
2020-12-08 21:40:50.136847  | image_loss/Median               nan
2020-12-08 21:40:50.136925  | image_loss/Min                  nan
2020-12-08 21:40:50.136992  | image_loss/Max                  nan
2020-12-08 21:40:50.137056  | pcont_loss/Average              nan
2020-12-08 21:40:50.137120  | pcont_loss/Std                  nan
2020-12-08 21:40:50.137183  | pcont_loss/Median               nan
2020-12-08 21:40:50.137247  | pcont_loss/Min                  nan
2020-12-08 21:40:50.137310  | pcont_loss/Max                  nan
2020-12-08 21:40:50.137373  | -----------------------------  -----------
2020-12-08 21:40:50.137565  | dreamer_pong_4 itr #2999 Optimizing over 1000 iterations.

Is this result normal?

Expected behavior
Hope you can explain this.

Desktop:

OS: Ubuntu 16.04
Version Torch 1.2.0

Observation decoder cannot reconstruct images other than size 3, 64, 64

Observation decoder cannot reconstruct images other than size (3, 64, 64)

See dreamer/models/observation.py, ObservationDecoder class.

Make hyperparameters aligned with dreamer

The hyperparameters are not yet aligned with the dreamer algorithm, such as sampling and agent optimizers.

What are the results of this on atari envs?

I see that the results for mujoco are posted.
What results did you get for the Atari envs?
Was the agent able to more-or-less converge after 1 million steps?

Error with gradients

Check logs here: https://github.com/juliusfrost/dreamer-pytorch/runs/623489903
Resulted from pull request #59

Probability of Continuing / Discount Modeling

Add the probability of continuing with a dense model. This is described very briefly in appendix A for use in Atari Environments. "We predict the discount factor from the latent state with a binary classiﬁer that is trained towards the soft labels of 0 and γ."
pcont is the equivalent term in the TensorFlow implementation

Why are observation_embed and action at the same “t” in the rollout_representation function?

Hi, I'm confused... In rnns.py，there is a function as follows:

def rollout_representation(self, steps: int, obs_embed: torch.Tensor, action: torch.Tensor,
                           prev_state: RSSMState):
    priors = []
    posteriors = []
    for t in range(steps):
        prior_state, posterior_state = self.representation_model(obs_embed[t], action[t], prev_state)
        priors.append(prior_state)
        posteriors.append(posterior_state)
        prev_state = posterior_state
    prior = stack_states(priors, dim=0)
    post = stack_states(posteriors, dim=0)
    return prior, post

According to the original formula in paper, the input of the representation model should be the action at the previous moment and the obs_embed at the current moment?
So why is it the same moment here <prior_state, posterior_state = self.representation_model(obs_embed[t], action[t], prev_state)>?
Maybe I missed some details, please help me to resolve my confusion. Thank you.

Does this repo reproduce original result

Is this implementation of dreamer verified to work? And did anyone tested if this reproduces result in original paper

some bug with "python main.py" ~v~

Dear authers, after run “python main.py”, there is a error.

run 0 already exists.
run 1 already exists.
run 2 already exists.
run 3 already exists.
run 4 already exists.
run 5 already exists.
run 6 already exists.
run 7 already exists.
run 8 already exists.
run 9 already exists.
run 10 already exists.
run 11 already exists.
Using run id = 12
2022-07-13 11:45:12.949051 | dreamer_pong_12 Runner master CPU affinity: [0, 1, 2, 3, 4, 5, 6, 7].
2022-07-13 11:45:12.949116 | dreamer_pong_12 Runner master Torch threads: 4.
using seed 970
2022-07-13 11:45:14.629311 | dreamer_pong_12 Sampler decorrelating envs, max steps: 0
2022-07-13 11:45:14.629631 | dreamer_pong_12 Serial Sampler initialized.
2022-07-13 11:45:14.629661 | dreamer_pong_12 Running 5000000 iterations of minibatch RL.
/home/uav-robot/anaconda3/envs/juliusfrost/lib/python3.8/site-packages/torch/optim/adam.py:90: UserWarning: optimizer contains a parameter group with duplicate parameters; in future, this will cause an error; see github.com/pytorch/pytorch/issues/40967 for more information
super(Adam, self).init(params, defaults)
2022-07-13 11:45:14.630298 | dreamer_pong_12 Optimizing over 1000 iterations.
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:03
2022-07-13 11:45:18.363206 | dreamer_pong_12 itr #999 saving snapshot...
2022-07-13 11:45:18.388679 | dreamer_pong_12 itr #999 saved
2022-07-13 11:45:18.396905 | ----------------------------- ------------
2022-07-13 11:45:18.396941 | Diagnostics/NewCompletedTrajs 2
2022-07-13 11:45:18.396986 | Diagnostics/StepsInTrajWindow 1000
2022-07-13 11:45:18.397003 | Diagnostics/Iteration 999
2022-07-13 11:45:18.397028 | Diagnostics/CumTime (s) 3.75844
2022-07-13 11:45:18.397065 | Diagnostics/CumSteps 1000
2022-07-13 11:45:18.397090 | Diagnostics/CumCompletedTrajs 2
2022-07-13 11:45:18.397130 | Diagnostics/CumUpdates 0
2022-07-13 11:45:18.397179 | Diagnostics/StepsPerSecond 266.068
2022-07-13 11:45:18.397235 | Diagnostics/UpdatesPerSecond 0
2022-07-13 11:45:18.397261 | Diagnostics/ReplayRatio 0
2022-07-13 11:45:18.397300 | Diagnostics/CumReplayRatio 0
2022-07-13 11:45:18.397325 | Length/Average 500
2022-07-13 11:45:18.397365 | Length/Std 0
2022-07-13 11:45:18.397379 | Length/Median 500
2022-07-13 11:45:18.397402 | Length/Min 500
2022-07-13 11:45:18.397442 | Length/Max 500
2022-07-13 11:45:18.397466 | Return/Average -6
2022-07-13 11:45:18.397506 | Return/Std 0
2022-07-13 11:45:18.397520 | Return/Median -6
2022-07-13 11:45:18.397568 | Return/Min -6
2022-07-13 11:45:18.397582 | Return/Max -6
2022-07-13 11:45:18.397605 | NonzeroRewards/Average 6
2022-07-13 11:45:18.397645 | NonzeroRewards/Std 0
2022-07-13 11:45:18.397659 | NonzeroRewards/Median 6
2022-07-13 11:45:18.397682 | NonzeroRewards/Min 6
2022-07-13 11:45:18.397710 | NonzeroRewards/Max 6
2022-07-13 11:45:18.397721 | DiscountedReturn/Average -0.596882
2022-07-13 11:45:18.397731 | DiscountedReturn/Std 0.0359496
2022-07-13 11:45:18.397741 | DiscountedReturn/Median -0.596882
2022-07-13 11:45:18.397751 | DiscountedReturn/Min -0.632832
2022-07-13 11:45:18.397761 | DiscountedReturn/Max -0.560932
2022-07-13 11:45:18.397771 | GameScore/Average -6
2022-07-13 11:45:18.397801 | GameScore/Std 0
2022-07-13 11:45:18.397826 | GameScore/Median -6
2022-07-13 11:45:18.397854 | GameScore/Min -6
2022-07-13 11:45:18.397864 | GameScore/Max -6
2022-07-13 11:45:18.397890 | loss/Average nan
2022-07-13 11:45:18.397900 | loss/Std nan
2022-07-13 11:45:18.397910 | loss/Median nan
2022-07-13 11:45:18.397919 | loss/Min nan
2022-07-13 11:45:18.397929 | loss/Max nan
2022-07-13 11:45:18.397939 | grad_norm_model/Average nan
2022-07-13 11:45:18.397968 | grad_norm_model/Std nan
2022-07-13 11:45:18.397978 | grad_norm_model/Median nan
2022-07-13 11:45:18.398003 | grad_norm_model/Min nan
2022-07-13 11:45:18.398031 | grad_norm_model/Max nan
2022-07-13 11:45:18.398040 | grad_norm_actor/Average nan
2022-07-13 11:45:18.398065 | grad_norm_actor/Std nan
2022-07-13 11:45:18.398093 | grad_norm_actor/Median nan
2022-07-13 11:45:18.398102 | grad_norm_actor/Min nan
2022-07-13 11:45:18.398127 | grad_norm_actor/Max nan
2022-07-13 11:45:18.398161 | grad_norm_value/Average nan
2022-07-13 11:45:18.398187 | grad_norm_value/Std nan
2022-07-13 11:45:18.398216 | grad_norm_value/Median nan
2022-07-13 11:45:18.398226 | grad_norm_value/Min nan
2022-07-13 11:45:18.398236 | grad_norm_value/Max nan
2022-07-13 11:45:18.398246 | model_loss/Average nan
2022-07-13 11:45:18.398256 | model_loss/Std nan
2022-07-13 11:45:18.398266 | model_loss/Median nan
2022-07-13 11:45:18.398276 | model_loss/Min nan
2022-07-13 11:45:18.398285 | model_loss/Max nan
2022-07-13 11:45:18.398295 | actor_loss/Average nan
2022-07-13 11:45:18.398305 | actor_loss/Std nan
2022-07-13 11:45:18.398315 | actor_loss/Median nan
2022-07-13 11:45:18.398325 | actor_loss/Min nan
2022-07-13 11:45:18.398335 | actor_loss/Max nan
2022-07-13 11:45:18.398345 | value_loss/Average nan
2022-07-13 11:45:18.398355 | value_loss/Std nan
2022-07-13 11:45:18.398365 | value_loss/Median nan
2022-07-13 11:45:18.398374 | value_loss/Min nan
2022-07-13 11:45:18.398384 | value_loss/Max nan
2022-07-13 11:45:18.398394 | prior_entropy/Average nan
2022-07-13 11:45:18.398404 | prior_entropy/Std nan
2022-07-13 11:45:18.398414 | prior_entropy/Median nan
2022-07-13 11:45:18.398424 | prior_entropy/Min nan
2022-07-13 11:45:18.398434 | prior_entropy/Max nan
2022-07-13 11:45:18.398443 | post_entropy/Average nan
2022-07-13 11:45:18.398453 | post_entropy/Std nan
2022-07-13 11:45:18.398463 | post_entropy/Median nan
2022-07-13 11:45:18.398473 | post_entropy/Min nan
2022-07-13 11:45:18.398483 | post_entropy/Max nan
2022-07-13 11:45:18.398493 | divergence/Average nan
2022-07-13 11:45:18.398503 | divergence/Std nan
2022-07-13 11:45:18.398513 | divergence/Median nan
2022-07-13 11:45:18.398523 | divergence/Min nan
2022-07-13 11:45:18.398533 | divergence/Max nan
2022-07-13 11:45:18.398543 | reward_loss/Average nan
2022-07-13 11:45:18.398553 | reward_loss/Std nan
2022-07-13 11:45:18.398562 | reward_loss/Median nan
2022-07-13 11:45:18.398572 | reward_loss/Min nan
2022-07-13 11:45:18.398582 | reward_loss/Max nan
2022-07-13 11:45:18.398592 | image_loss/Average nan
2022-07-13 11:45:18.398602 | image_loss/Std nan
2022-07-13 11:45:18.398612 | image_loss/Median nan
2022-07-13 11:45:18.398621 | image_loss/Min nan
2022-07-13 11:45:18.398631 | image_loss/Max nan
2022-07-13 11:45:18.398641 | pcont_loss/Average nan
2022-07-13 11:45:18.398651 | pcont_loss/Std nan
2022-07-13 11:45:18.398661 | pcont_loss/Median nan
2022-07-13 11:45:18.398671 | pcont_loss/Min nan
2022-07-13 11:45:18.398681 | pcont_loss/Max nan
2022-07-13 11:45:18.398691 | ----------------------------- ------------
2022-07-13 11:45:18.398820 | dreamer_pong_12 itr #999 Optimizing over 1000 iterations.
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:03
2022-07-13 11:45:22.201126 | dreamer_pong_12 itr #1999 saving snapshot...
2022-07-13 11:45:22.230317 | dreamer_pong_12 itr #1999 saved
2022-07-13 11:45:22.239040 | ----------------------------- -----------
2022-07-13 11:45:22.239081 | Diagnostics/NewCompletedTrajs 2
2022-07-13 11:45:22.239163 | Diagnostics/StepsInTrajWindow 2000
2022-07-13 11:45:22.239189 | Diagnostics/Iteration 1999
2022-07-13 11:45:22.239264 | Diagnostics/CumTime (s) 7.60008
2022-07-13 11:45:22.239300 | Diagnostics/CumSteps 2000
2022-07-13 11:45:22.239357 | Diagnostics/CumCompletedTrajs 4
2022-07-13 11:45:22.239444 | Diagnostics/CumUpdates 0
2022-07-13 11:45:22.239523 | Diagnostics/StepsPerSecond 260.306
2022-07-13 11:45:22.239546 | Diagnostics/UpdatesPerSecond 0
2022-07-13 11:45:22.239600 | Diagnostics/ReplayRatio 0
2022-07-13 11:45:22.239614 | Diagnostics/CumReplayRatio 0
2022-07-13 11:45:22.239664 | Length/Average 500
2022-07-13 11:45:22.239679 | Length/Std 0
2022-07-13 11:45:22.239729 | Length/Median 500
2022-07-13 11:45:22.239754 | Length/Min 500
2022-07-13 11:45:22.239796 | Length/Max 500
2022-07-13 11:45:22.239810 | Return/Average -5.75
2022-07-13 11:45:22.239834 | Return/Std 0.433013
2022-07-13 11:45:22.239874 | Return/Median -6
2022-07-13 11:45:22.239889 | Return/Min -6
2022-07-13 11:45:22.239912 | Return/Max -5
2022-07-13 11:45:22.239955 | NonzeroRewards/Average 5.75
2022-07-13 11:45:22.239979 | NonzeroRewards/Std 0.433013
2022-07-13 11:45:22.240006 | NonzeroRewards/Median 6
2022-07-13 11:45:22.240018 | NonzeroRewards/Min 5
2022-07-13 11:45:22.240028 | NonzeroRewards/Max 6
2022-07-13 11:45:22.240038 | DiscountedReturn/Average -0.537198
2022-07-13 11:45:22.240049 | DiscountedReturn/Std 0.116308
2022-07-13 11:45:22.240059 | DiscountedReturn/Median -0.587484
2022-07-13 11:45:22.240069 | DiscountedReturn/Min -0.632832
2022-07-13 11:45:22.240079 | DiscountedReturn/Max -0.340991
2022-07-13 11:45:22.240089 | GameScore/Average -5.75
2022-07-13 11:45:22.240099 | GameScore/Std 0.433013
2022-07-13 11:45:22.240109 | GameScore/Median -6
2022-07-13 11:45:22.240119 | GameScore/Min -6
2022-07-13 11:45:22.240129 | GameScore/Max -5
2022-07-13 11:45:22.240139 | loss/Average nan
2022-07-13 11:45:22.240150 | loss/Std nan
2022-07-13 11:45:22.240160 | loss/Median nan
2022-07-13 11:45:22.240170 | loss/Min nan
2022-07-13 11:45:22.240180 | loss/Max nan
2022-07-13 11:45:22.240190 | grad_norm_model/Average nan
2022-07-13 11:45:22.240200 | grad_norm_model/Std nan
2022-07-13 11:45:22.240210 | grad_norm_model/Median nan
2022-07-13 11:45:22.240220 | grad_norm_model/Min nan
2022-07-13 11:45:22.240230 | grad_norm_model/Max nan
2022-07-13 11:45:22.240240 | grad_norm_actor/Average nan
2022-07-13 11:45:22.240250 | grad_norm_actor/Std nan
2022-07-13 11:45:22.240260 | grad_norm_actor/Median nan
2022-07-13 11:45:22.240270 | grad_norm_actor/Min nan
2022-07-13 11:45:22.240280 | grad_norm_actor/Max nan
2022-07-13 11:45:22.240290 | grad_norm_value/Average nan
2022-07-13 11:45:22.240300 | grad_norm_value/Std nan
2022-07-13 11:45:22.240310 | grad_norm_value/Median nan
2022-07-13 11:45:22.240320 | grad_norm_value/Min nan
2022-07-13 11:45:22.240330 | grad_norm_value/Max nan
2022-07-13 11:45:22.240340 | model_loss/Average nan
2022-07-13 11:45:22.240350 | model_loss/Std nan
2022-07-13 11:45:22.240360 | model_loss/Median nan
2022-07-13 11:45:22.240370 | model_loss/Min nan
2022-07-13 11:45:22.240380 | model_loss/Max nan
2022-07-13 11:45:22.240390 | actor_loss/Average nan
2022-07-13 11:45:22.240400 | actor_loss/Std nan
2022-07-13 11:45:22.240410 | actor_loss/Median nan
2022-07-13 11:45:22.240420 | actor_loss/Min nan
2022-07-13 11:45:22.240430 | actor_loss/Max nan
2022-07-13 11:45:22.240440 | value_loss/Average nan
2022-07-13 11:45:22.240453 | value_loss/Std nan
2022-07-13 11:45:22.240464 | value_loss/Median nan
2022-07-13 11:45:22.240474 | value_loss/Min nan
2022-07-13 11:45:22.240484 | value_loss/Max nan
2022-07-13 11:45:22.240494 | prior_entropy/Average nan
2022-07-13 11:45:22.240504 | prior_entropy/Std nan
2022-07-13 11:45:22.240514 | prior_entropy/Median nan
2022-07-13 11:45:22.240524 | prior_entropy/Min nan
2022-07-13 11:45:22.240534 | prior_entropy/Max nan
2022-07-13 11:45:22.240544 | post_entropy/Average nan
2022-07-13 11:45:22.240553 | post_entropy/Std nan
2022-07-13 11:45:22.240563 | post_entropy/Median nan
2022-07-13 11:45:22.240573 | post_entropy/Min nan
2022-07-13 11:45:22.240583 | post_entropy/Max nan
2022-07-13 11:45:22.240593 | divergence/Average nan
2022-07-13 11:45:22.240603 | divergence/Std nan
2022-07-13 11:45:22.240613 | divergence/Median nan
2022-07-13 11:45:22.240623 | divergence/Min nan
2022-07-13 11:45:22.240633 | divergence/Max nan
2022-07-13 11:45:22.240643 | reward_loss/Average nan
2022-07-13 11:45:22.240653 | reward_loss/Std nan
2022-07-13 11:45:22.240663 | reward_loss/Median nan
2022-07-13 11:45:22.240673 | reward_loss/Min nan
2022-07-13 11:45:22.240683 | reward_loss/Max nan
2022-07-13 11:45:22.240693 | image_loss/Average nan
2022-07-13 11:45:22.240702 | image_loss/Std nan
2022-07-13 11:45:22.240712 | image_loss/Median nan
2022-07-13 11:45:22.240722 | image_loss/Min nan
2022-07-13 11:45:22.240732 | image_loss/Max nan
2022-07-13 11:45:22.240742 | pcont_loss/Average nan
2022-07-13 11:45:22.240752 | pcont_loss/Std nan
2022-07-13 11:45:22.240762 | pcont_loss/Median nan
2022-07-13 11:45:22.240772 | pcont_loss/Min nan
2022-07-13 11:45:22.240782 | pcont_loss/Max nan
2022-07-13 11:45:22.240792 | ----------------------------- -----------
2022-07-13 11:45:22.240882 | dreamer_pong_12 itr #1999 Optimizing over 1000 iterations.
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
2022-07-13 11:45:26.303541 | dreamer_pong_12 itr #2999 saving snapshot...
2022-07-13 11:45:26.333394 | dreamer_pong_12 itr #2999 saved
2022-07-13 11:45:26.344508 | ----------------------------- ------------
2022-07-13 11:45:26.344557 | Diagnostics/NewCompletedTrajs 2
2022-07-13 11:45:26.344587 | Diagnostics/StepsInTrajWindow 3000
2022-07-13 11:45:26.344640 | Diagnostics/Iteration 2999
2022-07-13 11:45:26.344670 | Diagnostics/CumTime (s) 11.7032
2022-07-13 11:45:26.344712 | Diagnostics/CumSteps 3000
2022-07-13 11:45:26.344737 | Diagnostics/CumCompletedTrajs 6
2022-07-13 11:45:26.344776 | Diagnostics/CumUpdates 0
2022-07-13 11:45:26.344802 | Diagnostics/StepsPerSecond 243.718
2022-07-13 11:45:26.344845 | Diagnostics/UpdatesPerSecond 0
2022-07-13 11:45:26.344864 | Diagnostics/ReplayRatio 0
2022-07-13 11:45:26.344910 | Diagnostics/CumReplayRatio 0
2022-07-13 11:45:26.344924 | Length/Average 500
2022-07-13 11:45:26.344947 | Length/Std 0
2022-07-13 11:45:26.344990 | Length/Median 500
2022-07-13 11:45:26.345010 | Length/Min 500
2022-07-13 11:45:26.345055 | Length/Max 500
2022-07-13 11:45:26.345069 | Return/Average -5.66667
2022-07-13 11:45:26.345093 | Return/Std 0.471405
2022-07-13 11:45:26.345134 | Return/Median -6
2022-07-13 11:45:26.345163 | Return/Min -6
2022-07-13 11:45:26.345205 | Return/Max -5
2022-07-13 11:45:26.345219 | NonzeroRewards/Average 5.66667
2022-07-13 11:45:26.345243 | NonzeroRewards/Std 0.471405
2022-07-13 11:45:26.345282 | NonzeroRewards/Median 6
2022-07-13 11:45:26.345316 | NonzeroRewards/Min 5
2022-07-13 11:45:26.345364 | NonzeroRewards/Max 6
2022-07-13 11:45:26.345383 | DiscountedReturn/Average -0.548505
2022-07-13 11:45:26.345401 | DiscountedReturn/Std 0.0969068
2022-07-13 11:45:26.345417 | DiscountedReturn/Median -0.575386
2022-07-13 11:45:26.345432 | DiscountedReturn/Min -0.632832
2022-07-13 11:45:26.345449 | DiscountedReturn/Max -0.340991
2022-07-13 11:45:26.345464 | GameScore/Average -5.66667
2022-07-13 11:45:26.345480 | GameScore/Std 0.471405
2022-07-13 11:45:26.345495 | GameScore/Median -6
2022-07-13 11:45:26.345509 | GameScore/Min -6
2022-07-13 11:45:26.345524 | GameScore/Max -5
2022-07-13 11:45:26.345539 | loss/Average nan
2022-07-13 11:45:26.345553 | loss/Std nan
2022-07-13 11:45:26.345568 | loss/Median nan
2022-07-13 11:45:26.345584 | loss/Min nan
2022-07-13 11:45:26.345600 | loss/Max nan
2022-07-13 11:45:26.345616 | grad_norm_model/Average nan
2022-07-13 11:45:26.345630 | grad_norm_model/Std nan
2022-07-13 11:45:26.345644 | grad_norm_model/Median nan
2022-07-13 11:45:26.345658 | grad_norm_model/Min nan
2022-07-13 11:45:26.345673 | grad_norm_model/Max nan
2022-07-13 11:45:26.345689 | grad_norm_actor/Average nan
2022-07-13 11:45:26.345707 | grad_norm_actor/Std nan
2022-07-13 11:45:26.345723 | grad_norm_actor/Median nan
2022-07-13 11:45:26.345739 | grad_norm_actor/Min nan
2022-07-13 11:45:26.345755 | grad_norm_actor/Max nan
2022-07-13 11:45:26.345770 | grad_norm_value/Average nan
2022-07-13 11:45:26.345787 | grad_norm_value/Std nan
2022-07-13 11:45:26.345805 | grad_norm_value/Median nan
2022-07-13 11:45:26.345824 | grad_norm_value/Min nan
2022-07-13 11:45:26.345841 | grad_norm_value/Max nan
2022-07-13 11:45:26.345858 | model_loss/Average nan
2022-07-13 11:45:26.345874 | model_loss/Std nan
2022-07-13 11:45:26.345891 | model_loss/Median nan
2022-07-13 11:45:26.345908 | model_loss/Min nan
2022-07-13 11:45:26.345927 | model_loss/Max nan
2022-07-13 11:45:26.345945 | actor_loss/Average nan
2022-07-13 11:45:26.345964 | actor_loss/Std nan
2022-07-13 11:45:26.345982 | actor_loss/Median nan
2022-07-13 11:45:26.346001 | actor_loss/Min nan
2022-07-13 11:45:26.346020 | actor_loss/Max nan
2022-07-13 11:45:26.346039 | value_loss/Average nan
2022-07-13 11:45:26.346057 | value_loss/Std nan
2022-07-13 11:45:26.346075 | value_loss/Median nan
2022-07-13 11:45:26.346093 | value_loss/Min nan
2022-07-13 11:45:26.346111 | value_loss/Max nan
2022-07-13 11:45:26.346129 | prior_entropy/Average nan
2022-07-13 11:45:26.346146 | prior_entropy/Std nan
2022-07-13 11:45:26.346164 | prior_entropy/Median nan
2022-07-13 11:45:26.346181 | prior_entropy/Min nan
2022-07-13 11:45:26.346198 | prior_entropy/Max nan
2022-07-13 11:45:26.346216 | post_entropy/Average nan
2022-07-13 11:45:26.346234 | post_entropy/Std nan
2022-07-13 11:45:26.346250 | post_entropy/Median nan
2022-07-13 11:45:26.346266 | post_entropy/Min nan
2022-07-13 11:45:26.346284 | post_entropy/Max nan
2022-07-13 11:45:26.346303 | divergence/Average nan
2022-07-13 11:45:26.346321 | divergence/Std nan
2022-07-13 11:45:26.346339 | divergence/Median nan
2022-07-13 11:45:26.346364 | divergence/Min nan
2022-07-13 11:45:26.346383 | divergence/Max nan
2022-07-13 11:45:26.346402 | reward_loss/Average nan
2022-07-13 11:45:26.346419 | reward_loss/Std nan
2022-07-13 11:45:26.346437 | reward_loss/Median nan
2022-07-13 11:45:26.346457 | reward_loss/Min nan
2022-07-13 11:45:26.346476 | reward_loss/Max nan
2022-07-13 11:45:26.346493 | image_loss/Average nan
2022-07-13 11:45:26.346510 | image_loss/Std nan
2022-07-13 11:45:26.346527 | image_loss/Median nan
2022-07-13 11:45:26.346545 | image_loss/Min nan
2022-07-13 11:45:26.346563 | image_loss/Max nan
2022-07-13 11:45:26.346582 | pcont_loss/Average nan
2022-07-13 11:45:26.346600 | pcont_loss/Std nan
2022-07-13 11:45:26.346618 | pcont_loss/Median nan
2022-07-13 11:45:26.346637 | pcont_loss/Min nan
2022-07-13 11:45:26.346657 | pcont_loss/Max nan
2022-07-13 11:45:26.346676 | ----------------------------- ------------
2022-07-13 11:45:26.346844 | dreamer_pong_12 itr #2999 Optimizing over 1000 iterations.
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
2022-07-13 11:45:30.617001 | dreamer_pong_12 itr #3999 saving snapshot...
2022-07-13 11:45:30.642814 | dreamer_pong_12 itr #3999 saved
2022-07-13 11:45:30.651841 | ----------------------------- -----------
2022-07-13 11:45:30.651886 | Diagnostics/NewCompletedTrajs 2
2022-07-13 11:45:30.651942 | Diagnostics/StepsInTrajWindow 4000
2022-07-13 11:45:30.651966 | Diagnostics/Iteration 3999
2022-07-13 11:45:30.652023 | Diagnostics/CumTime (s) 16.0126
2022-07-13 11:45:30.652075 | Diagnostics/CumSteps 4000
2022-07-13 11:45:30.652097 | Diagnostics/CumCompletedTrajs 8
2022-07-13 11:45:30.652140 | Diagnostics/CumUpdates 0
2022-07-13 11:45:30.652173 | Diagnostics/StepsPerSecond 232.051
2022-07-13 11:45:30.652220 | Diagnostics/UpdatesPerSecond 0
2022-07-13 11:45:30.652243 | Diagnostics/ReplayRatio 0
2022-07-13 11:45:30.652290 | Diagnostics/CumReplayRatio 0
2022-07-13 11:45:30.652312 | Length/Average 500
2022-07-13 11:45:30.652366 | Length/Std 0
2022-07-13 11:45:30.652424 | Length/Median 500
2022-07-13 11:45:30.652471 | Length/Min 500
2022-07-13 11:45:30.652497 | Length/Max 500
2022-07-13 11:45:30.652543 | Return/Average -5.5
2022-07-13 11:45:30.652567 | Return/Std 0.707107
2022-07-13 11:45:30.652596 | Return/Median -6
2022-07-13 11:45:30.652608 | Return/Min -6
2022-07-13 11:45:30.652618 | Return/Max -4
2022-07-13 11:45:30.652628 | NonzeroRewards/Average 5.5
2022-07-13 11:45:30.652638 | NonzeroRewards/Std 0.707107
2022-07-13 11:45:30.652647 | NonzeroRewards/Median 6
2022-07-13 11:45:30.652657 | NonzeroRewards/Min 4
2022-07-13 11:45:30.652667 | NonzeroRewards/Max 6
2022-07-13 11:45:30.652676 | DiscountedReturn/Average -0.496869
2022-07-13 11:45:30.652686 | DiscountedReturn/Std 0.166305
2022-07-13 11:45:30.652696 | DiscountedReturn/Median -0.563765
2022-07-13 11:45:30.652705 | DiscountedReturn/Min -0.632832
2022-07-13 11:45:30.652715 | DiscountedReturn/Max -0.117326
2022-07-13 11:45:30.652725 | GameScore/Average -5.5
2022-07-13 11:45:30.652734 | GameScore/Std 0.707107
2022-07-13 11:45:30.652744 | GameScore/Median -6
2022-07-13 11:45:30.652754 | GameScore/Min -6
2022-07-13 11:45:30.652763 | GameScore/Max -4
2022-07-13 11:45:30.652773 | loss/Average nan
2022-07-13 11:45:30.652783 | loss/Std nan
2022-07-13 11:45:30.652796 | loss/Median nan
2022-07-13 11:45:30.652807 | loss/Min nan
2022-07-13 11:45:30.652816 | loss/Max nan
2022-07-13 11:45:30.652826 | grad_norm_model/Average nan
2022-07-13 11:45:30.652836 | grad_norm_model/Std nan
2022-07-13 11:45:30.652846 | grad_norm_model/Median nan
2022-07-13 11:45:30.652855 | grad_norm_model/Min nan
2022-07-13 11:45:30.652865 | grad_norm_model/Max nan
2022-07-13 11:45:30.652874 | grad_norm_actor/Average nan
2022-07-13 11:45:30.652884 | grad_norm_actor/Std nan
2022-07-13 11:45:30.652894 | grad_norm_actor/Median nan
2022-07-13 11:45:30.652903 | grad_norm_actor/Min nan
2022-07-13 11:45:30.652913 | grad_norm_actor/Max nan
2022-07-13 11:45:30.652923 | grad_norm_value/Average nan
2022-07-13 11:45:30.652932 | grad_norm_value/Std nan
2022-07-13 11:45:30.652942 | grad_norm_value/Median nan
2022-07-13 11:45:30.652951 | grad_norm_value/Min nan
2022-07-13 11:45:30.652961 | grad_norm_value/Max nan
2022-07-13 11:45:30.652970 | model_loss/Average nan
2022-07-13 11:45:30.652980 | model_loss/Std nan
2022-07-13 11:45:30.652990 | model_loss/Median nan
2022-07-13 11:45:30.652999 | model_loss/Min nan
2022-07-13 11:45:30.653009 | model_loss/Max nan
2022-07-13 11:45:30.653018 | actor_loss/Average nan
2022-07-13 11:45:30.653028 | actor_loss/Std nan
2022-07-13 11:45:30.653038 | actor_loss/Median nan
2022-07-13 11:45:30.653047 | actor_loss/Min nan
2022-07-13 11:45:30.653057 | actor_loss/Max nan
2022-07-13 11:45:30.653067 | value_loss/Average nan
2022-07-13 11:45:30.653076 | value_loss/Std nan
2022-07-13 11:45:30.653086 | value_loss/Median nan
2022-07-13 11:45:30.653096 | value_loss/Min nan
2022-07-13 11:45:30.653105 | value_loss/Max nan
2022-07-13 11:45:30.653134 | prior_entropy/Average nan
2022-07-13 11:45:30.653149 | prior_entropy/Std nan
2022-07-13 11:45:30.653161 | prior_entropy/Median nan
2022-07-13 11:45:30.653187 | prior_entropy/Min nan
2022-07-13 11:45:30.653197 | prior_entropy/Max nan
2022-07-13 11:45:30.653207 | post_entropy/Average nan
2022-07-13 11:45:30.653216 | post_entropy/Std nan
2022-07-13 11:45:30.653226 | post_entropy/Median nan
2022-07-13 11:45:30.653236 | post_entropy/Min nan
2022-07-13 11:45:30.653246 | post_entropy/Max nan
2022-07-13 11:45:30.653255 | divergence/Average nan
2022-07-13 11:45:30.653265 | divergence/Std nan
2022-07-13 11:45:30.653275 | divergence/Median nan
2022-07-13 11:45:30.653284 | divergence/Min nan
2022-07-13 11:45:30.653294 | divergence/Max nan
2022-07-13 11:45:30.653304 | reward_loss/Average nan
2022-07-13 11:45:30.653313 | reward_loss/Std nan
2022-07-13 11:45:30.653323 | reward_loss/Median nan
2022-07-13 11:45:30.653333 | reward_loss/Min nan
2022-07-13 11:45:30.653343 | reward_loss/Max nan
2022-07-13 11:45:30.653369 | image_loss/Average nan
2022-07-13 11:45:30.653408 | image_loss/Std nan
2022-07-13 11:45:30.653425 | image_loss/Median nan
2022-07-13 11:45:30.653436 | image_loss/Min nan
2022-07-13 11:45:30.653462 | image_loss/Max nan
2022-07-13 11:45:30.653471 | pcont_loss/Average nan
2022-07-13 11:45:30.653481 | pcont_loss/Std nan
2022-07-13 11:45:30.653491 | pcont_loss/Median nan
2022-07-13 11:45:30.653501 | pcont_loss/Min nan
2022-07-13 11:45:30.653535 | pcont_loss/Max nan
2022-07-13 11:45:30.653546 | ----------------------------- -----------
2022-07-13 11:45:30.653701 | dreamer_pong_12 itr #3999 Optimizing over 1000 iterations.
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
2022-07-13 11:45:34.724956 | dreamer_pong_12 itr #4999 saving snapshot...
2022-07-13 11:45:34.750314 | dreamer_pong_12 itr #4999 saved
2022-07-13 11:45:34.760227 | ----------------------------- -----------
2022-07-13 11:45:34.760293 | Diagnostics/NewCompletedTrajs 2
2022-07-13 11:45:34.760357 | Diagnostics/StepsInTrajWindow 5000
2022-07-13 11:45:34.760392 | Diagnostics/Iteration 4999
2022-07-13 11:45:34.760416 | Diagnostics/CumTime (s) 20.1201
2022-07-13 11:45:34.760459 | Diagnostics/CumSteps 5000
2022-07-13 11:45:34.760484 | Diagnostics/CumCompletedTrajs 10
2022-07-13 11:45:34.760526 | Diagnostics/CumUpdates 0
2022-07-13 11:45:34.760546 | Diagnostics/StepsPerSecond 243.455
2022-07-13 11:45:34.760592 | Diagnostics/UpdatesPerSecond 0
2022-07-13 11:45:34.760606 | Diagnostics/ReplayRatio 0
2022-07-13 11:45:34.760657 | Diagnostics/CumReplayRatio 0
2022-07-13 11:45:34.760673 | Length/Average 500
2022-07-13 11:45:34.760775 | Length/Std 0
2022-07-13 11:45:34.760817 | Length/Median 500
2022-07-13 11:45:34.760865 | Length/Min 500
2022-07-13 11:45:34.760880 | Length/Max 500
2022-07-13 11:45:34.760932 | Return/Average -5.5
2022-07-13 11:45:34.760946 | Return/Std 0.67082
2022-07-13 11:45:34.760997 | Return/Median -6
2022-07-13 11:45:34.761012 | Return/Min -6
2022-07-13 11:45:34.761064 | Return/Max -4
2022-07-13 11:45:34.761078 | NonzeroRewards/Average 5.5
2022-07-13 11:45:34.761119 | NonzeroRewards/Std 0.67082
2022-07-13 11:45:34.761131 | NonzeroRewards/Median 6
2022-07-13 11:45:34.761141 | NonzeroRewards/Min 4
2022-07-13 11:45:34.761160 | NonzeroRewards/Max 6
2022-07-13 11:45:34.761171 | DiscountedReturn/Average -0.490804
2022-07-13 11:45:34.761181 | DiscountedReturn/Std 0.15736
2022-07-13 11:45:34.761191 | DiscountedReturn/Median -0.563765
2022-07-13 11:45:34.761201 | DiscountedReturn/Min -0.632832
2022-07-13 11:45:34.761211 | DiscountedReturn/Max -0.117326
2022-07-13 11:45:34.761220 | GameScore/Average -5.5
2022-07-13 11:45:34.761230 | GameScore/Std 0.67082
2022-07-13 11:45:34.761240 | GameScore/Median -6
2022-07-13 11:45:34.761249 | GameScore/Min -6
2022-07-13 11:45:34.761259 | GameScore/Max -4
2022-07-13 11:45:34.761269 | loss/Average nan
2022-07-13 11:45:34.761278 | loss/Std nan
2022-07-13 11:45:34.761288 | loss/Median nan
2022-07-13 11:45:34.761298 | loss/Min nan
2022-07-13 11:45:34.761308 | loss/Max nan
2022-07-13 11:45:34.761317 | grad_norm_model/Average nan
2022-07-13 11:45:34.761327 | grad_norm_model/Std nan
2022-07-13 11:45:34.761336 | grad_norm_model/Median nan
2022-07-13 11:45:34.761346 | grad_norm_model/Min nan
2022-07-13 11:45:34.761356 | grad_norm_model/Max nan
2022-07-13 11:45:34.761365 | grad_norm_actor/Average nan
2022-07-13 11:45:34.761375 | grad_norm_actor/Std nan
2022-07-13 11:45:34.761385 | grad_norm_actor/Median nan
2022-07-13 11:45:34.761394 | grad_norm_actor/Min nan
2022-07-13 11:45:34.761404 | grad_norm_actor/Max nan
2022-07-13 11:45:34.761414 | grad_norm_value/Average nan
2022-07-13 11:45:34.761423 | grad_norm_value/Std nan
2022-07-13 11:45:34.761433 | grad_norm_value/Median nan
2022-07-13 11:45:34.761448 | grad_norm_value/Min nan
2022-07-13 11:45:34.761459 | grad_norm_value/Max nan
2022-07-13 11:45:34.761468 | model_loss/Average nan
2022-07-13 11:45:34.761478 | model_loss/Std nan
2022-07-13 11:45:34.761488 | model_loss/Median nan
2022-07-13 11:45:34.761498 | model_loss/Min nan
2022-07-13 11:45:34.761508 | model_loss/Max nan
2022-07-13 11:45:34.761517 | actor_loss/Average nan
2022-07-13 11:45:34.761527 | actor_loss/Std nan
2022-07-13 11:45:34.761537 | actor_loss/Median nan
2022-07-13 11:45:34.761546 | actor_loss/Min nan
2022-07-13 11:45:34.761556 | actor_loss/Max nan
2022-07-13 11:45:34.761566 | value_loss/Average nan
2022-07-13 11:45:34.761575 | value_loss/Std nan
2022-07-13 11:45:34.761585 | value_loss/Median nan
2022-07-13 11:45:34.761595 | value_loss/Min nan
2022-07-13 11:45:34.761605 | value_loss/Max nan
2022-07-13 11:45:34.761615 | prior_entropy/Average nan
2022-07-13 11:45:34.761624 | prior_entropy/Std nan
2022-07-13 11:45:34.761634 | prior_entropy/Median nan
2022-07-13 11:45:34.761644 | prior_entropy/Min nan
2022-07-13 11:45:34.761654 | prior_entropy/Max nan
2022-07-13 11:45:34.761663 | post_entropy/Average nan
2022-07-13 11:45:34.761673 | post_entropy/Std nan
2022-07-13 11:45:34.761683 | post_entropy/Median nan
2022-07-13 11:45:34.761692 | post_entropy/Min nan
2022-07-13 11:45:34.761702 | post_entropy/Max nan
2022-07-13 11:45:34.761712 | divergence/Average nan
2022-07-13 11:45:34.761721 | divergence/Std nan
2022-07-13 11:45:34.761731 | divergence/Median nan
2022-07-13 11:45:34.761741 | divergence/Min nan
2022-07-13 11:45:34.761751 | divergence/Max nan
2022-07-13 11:45:34.761760 | reward_loss/Average nan
2022-07-13 11:45:34.761770 | reward_loss/Std nan
2022-07-13 11:45:34.761780 | reward_loss/Median nan
2022-07-13 11:45:34.761789 | reward_loss/Min nan
2022-07-13 11:45:34.761799 | reward_loss/Max nan
2022-07-13 11:45:34.761809 | image_loss/Average nan
2022-07-13 11:45:34.761819 | image_loss/Std nan
2022-07-13 11:45:34.761828 | image_loss/Median nan
2022-07-13 11:45:34.761838 | image_loss/Min nan
2022-07-13 11:45:34.761848 | image_loss/Max nan
2022-07-13 11:45:34.761857 | pcont_loss/Average nan
2022-07-13 11:45:34.761867 | pcont_loss/Std nan
2022-07-13 11:45:34.761877 | pcont_loss/Median nan
2022-07-13 11:45:34.761886 | pcont_loss/Min nan
2022-07-13 11:45:34.761896 | pcont_loss/Max nan
2022-07-13 11:45:34.761906 | ----------------------------- -----------
2022-07-13 11:45:34.761999 | dreamer_pong_12 itr #4999 Optimizing over 1000 iterations.
Imagination: 0%| | 0/100 [00:05<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 92, in
build_and_train(
File "main.py", line 65, in build_and_train
runner.train()
File "/home/uav-robot/anaconda3/envs/juliusfrost/lib/python3.8/site-packages/rlpyt/runners/minibatch_rl.py", line 259, in train
opt_info = self.algo.optimize_agent(itr, samples)
File "/home/uav-robot/MBRL/juliusfrost/dreamer-pytorch/dreamer/algos/dreamer_algo.py", line 147, in optimize_agent
model_loss, actor_loss, value_loss, loss_info = self.loss(buffed_samples, itr, i)
File "/home/uav-robot/MBRL/juliusfrost/dreamer-pytorch/dreamer/algos/dreamer_algo.py", line 232, in loss
pcont_loss = -torch.mean(pcont_pred.log_prob(pcont_target))
File "/home/uav-robot/anaconda3/envs/juliusfrost/lib/python3.8/site-packages/torch/distributions/independent.py", line 95, in log_prob
log_prob = self.base_dist.log_prob(value)
File "/home/uav-robot/anaconda3/envs/juliusfrost/lib/python3.8/site-packages/torch/distributions/bernoulli.py", line 100, in log_prob
self._validate_sample(value)
File "/home/uav-robot/anaconda3/envs/juliusfrost/lib/python3.8/site-packages/torch/distributions/distribution.py", line 293, in _validate_sample
raise ValueError(
ValueError: Expected value argument (Tensor of shape (50, 50, 1)) to be within the support (Boolean()) of the distribution Bernoulli(logits: torch.Size([50, 50, 1])), but found invalid values:
tensor([[[0.9900],
[0.9900],
[0.9900],
...,
[0.9900],
[0.9900],
[0.9900]],

    [[0.9900],
     [0.9900],
     [0.9900],
     ...,
     [0.9900],
     [0.9900],
     [0.9900]],

    [[0.9900],
     [0.9900],
     [0.9900],
     ...,
     [0.9900],
     [0.9900],
     [0.9900]],

    ...,

    [[0.9900],
     [0.9900],
     [0.9900],
     ...,
     [0.9900],
     [0.9900],
     [0.9900]],

    [[0.9900],
     [0.9900],
     [0.9900],
     ...,
     [0.9900],
     [0.9900],
     [0.9900]],

    [[0.9900],
     [0.9900],
     [0.9900],
     ...,
     [0.9900],
     [0.9900],
     [0.9900]]])

and my env is

packages in environment at /home/uav-robot/anaconda3/envs/juliusfrost:

Issue in Adjusting Exploration Amount in Dreamer Agent

Describe the bug
I have been working on wrapping a custom game with a Dreamer agent. During the execution in the epsilon_greedy mode, I noticed that the expl_amount was not decreasing as expected in dreamer/agents/dreamer_agent.py. It seems the variable self._itr was not properly updated, which led to this issue.

    def exploration(self, action: torch.Tensor) -> torch.Tensor:
        """
        :param action: action to take, shape (1,) (if categorical), or (action dim,) (if continuous)
        :return: action of the same shape passed in, augmented with some noise
        """
        if self._mode in ["train", "sample"]:
            expl_amount = self.train_noise
            if self.expl_decay:  # Linear decay
                expl_amount = expl_amount - self._itr / self.expl_decay
            if self.expl_min:
                expl_amount = max(self.expl_min, expl_amount)
        elif self._mode == "eval":
            expl_amount = self.eval_noise
        else:
            raise NotImplementedError
            
        if self.expl_type == "additive_gaussian":  # For continuous actions
            noise = torch.randn(*action.shape, device=action.device) * expl_amount
            return torch.clamp(action + noise, -1, 1)
        if self.expl_type == "completely_random":  # For continuous actions
            if expl_amount == 0:
                return action
            else:
                return (
                    torch.rand(*action.shape, device=action.device) * 2 - 1
                )  # scale to [-1, 1]
        if self.expl_type == "epsilon_greedy":  # For discrete actions
            action_dim = self.env_model_kwargs["action_shape"][0]
            if np.random.uniform(0, 1) < expl_amount:
                index = torch.randint(
                    0, action_dim, action.shape[:-1], device=action.device
                )
                action = torch.zeros_like(action)
                action[..., index] = 1
            return action
        raise NotImplementedError(self.expl_type)

Solve
To remedy this, I have added self.agent._itr = itr in rlpyt/rlpyt/runners/minibatch_rl.py as shown below:

    def train(self):
        """samples
        Performs startup, then loops by alternating between
        ``sampler.obtain_samples()`` and ``algo.optimize_agent()``, logging
        diagnostics at the specified interval.
        """
        n_itr = self.startup()
        for itr in range(n_itr):
            logger.set_iteration(itr)
            with logger.prefix(f"itr #{itr} "):
                self.agent._itr = itr # add
                self.agent.sample_mode(itr)  # Might not be this agent sampling.
                samples, traj_infos = self.sampler.obtain_samples(itr)
                self.agent.train_mode(itr)
                opt_info = self.algo.optimize_agent(itr, samples)
                self.store_diagnostics(itr, traj_infos, opt_info)
                if (itr + 1) % self.log_interval_itrs == 0:
                    self.log_diagnostics(itr)
        self.shutdown()

Additional context
With this modification, I was able to observe the decrease in expl_amount during the execution in the epsilon_greedy mode.

Could you please confirm if this is the correct way to address this issue? If not, any suggestions or guidance would be greatly appreciated.

batch_length doesn't seem to work

dreamer-pytorch/dreamer/algos/dreamer_algo.py

Line 194 in 47bd509

 observation = samples.all_observation[:-1] # [t, t+batch_length+1] -> [t, t+batch_length] 

in there, observation shape is [15, 50, 3, 64, 64].

dreamer-pytorch/dreamer/algos/dreamer_algo.py

Line 145 in 47bd509

 samples_from_replay = self.replay_buffer.sample_batch(self._batch_size, self.batch_length) 

but i guess you intended to get [50, 50, 3, 64, 64]. what do you think?

Very minor bug in RSSM

Very minor (and I may have misunderstood): I believe the input to the stochastic_prior_model in the RSSMTransition class should be deterministic_size rather than hidden_size (line 76 in rnns.py).

This doesn't cause any errors at the moment as these default to the same value (200). But I see no reason why these have to be the same - one is the hidden size of the RNN, and the other is a generic size for hidden nodes in the model.

juliusfrost / dreamer-pytorch Goto Github PK

dreamer-pytorch's People

Contributors

Stargazers

Watchers

Forkers

dreamer-pytorch's Issues

packages in environment at /home/uav-robot/anaconda3/envs/juliusfrost:

Name Version Build Channel

Recommend Projects

Recommend Topics

Recommend Org