
rl-unity's Introduction

Implementation of state-of-the-art (SOTA) Deep Reinforcement Learning (DRL) algorithms in the Unity Engine.

Algorithms:

  1. Value Based Method - Deep Q Network (DQN)
  2. Policy Based Method - Deep Deterministic Policy Gradient (DDPG)
  3. Multi Agent Reinforcement Learning using MADDPG
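
A minimal sketch of the value-based update these implementations build on, assuming PyTorch; the qnetwork_local / qnetwork_target names and the batch layout are illustrative, not taken from the repo:

```python
import torch.nn.functional as F

def dqn_loss(qnetwork_local, qnetwork_target, batch, gamma=0.99):
    """Standard DQN TD loss for a batch of transitions.

    batch = (states, actions, rewards, next_states, dones), where actions is
    a LongTensor of shape (B, 1) and the rest are FloatTensors of shape (B, 1)
    or (B, state_size).
    """
    states, actions, rewards, next_states, dones = batch
    # Greedy bootstrap value from the (frozen) target network.
    q_targets_next = qnetwork_target(next_states).detach().max(1)[0].unsqueeze(1)
    # TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    q_targets = rewards + gamma * q_targets_next * (1 - dones)
    # Q-value of the action actually taken, from the online (local) network.
    q_expected = qnetwork_local(states).gather(1, actions)
    return F.mse_loss(q_expected, q_targets)
```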

Outputs:

Value Based Method - DQN

In this Unity environment, the goal of the agent is to pick up yellow bananas while avoiding blue bananas.
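
The interaction loop for this environment is sketched below, assuming the unityagents wrapper that the issue tracebacks further down suggest; the build file name is illustrative, and `agent` stands in for the DQN agent whose `get_action` / `step` methods appear in those tracebacks:

```python
from unityagents import UnityEnvironment

# Path to the Banana build is illustrative; adjust for your platform/build.
env = UnityEnvironment(file_name="Banana.app")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]   # reset the environment
state = env_info.vector_observations[0]             # initial observation
score = 0
done = False
while not done:
    action = agent.get_action(state)                 # agent: DQN agent (assumed defined)
    env_info = env.step(action)[brain_name]          # send the action to Unity
    next_state = env_info.vector_observations[0]
    reward = env_info.rewards[0]                     # +1 yellow banana, -1 blue banana
    done = env_info.local_done[0]                    # episode finished?
    agent.step((state, action, reward, next_state, done))
    state = next_state
    score += reward
```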

Rolling Scores

Policy Based Method - DDPG

In this Unity environment, the goal of the agent is to move the double-jointed arm to the target location indicated by the turquoise sphere. This video demonstrates a more practical application of the Reacher Unity environment.

Rolling Scores

Multi-Agent RL - MADDPG

In this Unity environment, the goal is to maximize the rally between the two tennis agents: the longer the agents keep passing the ball to each other without dropping it, the higher the reward.

Rolling Scores

rl-unity's People

Contributors

qasimwani


rl-unity's Issues

bottleneck achieved in Tennis (MADDPG) env using complete competition

The model fails to converge: it found a local optimum, almost a saddle point, and started degrading afterwards because the gradient ascent could not keep improving.

potential solutions:

  1. Change the model architecture from the default (128 -> 64) to something else; try smaller and simpler first (see the sketch after this list).
  2. Change other hyperparameters.
  3. Implement a different style of DDPG update.
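
A sketch of option 1, assuming a plain PyTorch actor; the (128, 64) sizes are the defaults mentioned above, and the smaller (64, 32) variant is just one illustrative alternative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy network; the hidden sizes are the knob to tune."""
    def __init__(self, state_size, action_size, fc1_units=64, fc2_units=32):
        # Default in the issue was (128, 64); try a smaller (64, 32) first.
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))   # tennis actions are bounded, so tanh
```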

actor loss backprop version diff

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [128, 2]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
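
A common cause of this error in DDPG/MADDPG updates (which may or may not be what happened here) is calling the critic's `optimizer.step()` between building the actor-loss graph and backpropagating it: the in-place parameter update bumps the tensor versions that the saved graph expects. A hedged sketch of the usual ordering fix, with illustrative network and optimizer names:

```python
import torch
import torch.nn.functional as F

# ---- critic update first ----
q_expected = critic_local(states, actions)
with torch.no_grad():
    q_next = critic_target(next_states, actor_target(next_states))
q_targets = rewards + gamma * q_next * (1 - dones)
critic_loss = F.mse_loss(q_expected, q_targets)
critic_optimizer.zero_grad()
critic_loss.backward()
critic_optimizer.step()          # modifies critic_local's parameters in place

# ---- actor update built AFTER the critic step ----
# Fresh forward pass, so the actor-loss graph references the post-step
# parameter versions and backward() no longer trips the version check.
actor_loss = -critic_local(states, actor_local(states)).mean()
actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()
```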

What is the Unity version?

Hi, great repo.

What is the Unity version that you used? I noticed that 'Tennis.app' doesn't work with the latest version.

optimizer grad error

element 0 of tensors does not require grad and does not have a grad_fn

Stack:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-178-212068df461b> in <module>
     15         done = env_info.local_done[0]                  # see if episode has finished
     16         transition = (state, action, reward, next_state, done) #set transition
---> 17         agent.step(transition)                         # step into the next state
     18         score += reward                                # update the score
     19         state = next_state                             # roll over the state to next time step

<ipython-input-173-50233688ff15> in step(self, transition)
     42             if(len(self.memory) > BATCH_SIZE):
     43                 experiences = self.memory.sample()
---> 44                 self.train(experiences)
     45 
     46     def get_action(self, state, eps=0.0):

<ipython-input-173-50233688ff15> in train(self, experiences)
     84         loss = F.mse_loss(Q_expected, Q_targets)
     85         self.optimizer.zero_grad()
---> 86         loss.backward()
     87         self.optimizer.step()
     88 

~/opt/anaconda3/envs/drlnd/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    196                 products. Defaults to ``False``.
    197         """
--> 198         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    199 
    200     def register_hook(self, hook):

~/opt/anaconda3/envs/drlnd/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     98     Variable._execution_engine.run_backward(
     99         tensors, grad_tensors, retain_graph, create_graph,
--> 100         allow_unreachable=True)  # allow_unreachable flag
    101 
    102 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
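
This error at `loss.backward()` usually means `Q_expected` carries no autograd graph, e.g. the local network's forward pass was run inside `torch.no_grad()` or its output was detached before the loss. A minimal sketch of the usual fix; the network names are illustrative, since only the loss fragment is visible in the traceback:

```python
import torch
import torch.nn.functional as F

# Targets should NOT require grad, so no_grad()/detach() is correct here.
with torch.no_grad():
    Q_targets_next = qnetwork_target(next_states).max(1)[0].unsqueeze(1)
Q_targets = rewards + gamma * Q_targets_next * (1 - dones)

# Expected values MUST keep their graph: plain forward pass, no .detach(),
# no torch.no_grad(), and no .data access on the output.
Q_expected = qnetwork_local(states).gather(1, actions)

loss = F.mse_loss(Q_expected, Q_targets)
optimizer.zero_grad()
loss.backward()      # works because Q_expected now has a grad_fn
optimizer.step()
```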

Local Optima reached, decreased rewards

For Continuous Control with 20 parallel Reacher agents, the model seems to have reached a saddle point or local optimum at a windowed score of around 26 (mean of the last 100 scores) and then starts to decay.
Need to fine-tune the hyperparameters; maybe change the activation from ReLU to Leaky ReLU?
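
As a sketch of the activation swap being considered (the layer names are illustrative; only the activation call changes):

```python
import torch
import torch.nn.functional as F

# Drop-in change inside the actor's forward pass; self.fc1/fc2/fc3 stand in
# for the existing linear layers.
def forward(self, state):
    x = F.leaky_relu(self.fc1(state))   # was: F.relu(self.fc1(state))
    x = F.leaky_relu(self.fc2(x))       # was: F.relu(self.fc2(x))
    return torch.tanh(self.fc3(x))      # bounded continuous actions
```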
