
rl-unity's Introduction

Implementation of state-of-the-art (SOTA) Deep Reinforcement Learning (DRL) algorithms in the Unity Engine.

Algorithms:

  1. Value Based Method - Deep Q Network (DQN)
  2. Policy Based Method - Deep Deterministic Policy Gradient (DDPG)
  3. Multi Agent Reinforcement Learning using MADDPG
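
A minimal sketch of the value-based update these implementations build on, assuming PyTorch; the qnetwork_local / qnetwork_target names and the batch layout are illustrative, not taken from the repo:

```python
import torch.nn.functional as F

def dqn_loss(qnetwork_local, qnetwork_target, batch, gamma=0.99):
    """Standard DQN TD loss for a batch of transitions.

    batch = (states, actions, rewards, next_states, dones), where actions is
    a LongTensor of shape (B, 1) and the rest are FloatTensors of shape (B, 1)
    or (B, state_size).
    """
    states, actions, rewards, next_states, dones = batch
    # Greedy bootstrap value from the (frozen) target network.
    q_targets_next = qnetwork_target(next_states).detach().max(1)[0].unsqueeze(1)
    # TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    q_targets = rewards + gamma * q_targets_next * (1 - dones)
    # Q-value of the action actually taken, from the online (local) network.
    q_expected = qnetwork_local(states).gather(1, actions)
    return F.mse_loss(q_expected, q_targets)
```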

Outputs:

Value Based Method - DQN

In this Unity environment, the goal of the agent is to pick up yellow bananas while avoiding blue bananas.
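
The interaction loop for this environment is sketched below, assuming the unityagents wrapper that the issue tracebacks further down suggest; the build file name is illustrative, and `agent` stands in for the DQN agent whose `get_action` / `step` methods appear in those tracebacks:

```python
from unityagents import UnityEnvironment

# Path to the Banana build is illustrative; adjust for your platform/build.
env = UnityEnvironment(file_name="Banana.app")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]   # reset the environment
state = env_info.vector_observations[0]             # initial observation
score = 0
done = False
while not done:
    action = agent.get_action(state)                 # agent: DQN agent (assumed defined)
    env_info = env.step(action)[brain_name]          # send the action to Unity
    next_state = env_info.vector_observations[0]
    reward = env_info.rewards[0]                     # +1 yellow banana, -1 blue banana
    done = env_info.local_done[0]                    # episode finished?
    agent.step((state, action, reward, next_state, done))
    state = next_state
    score += reward
```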

Rolling Scores

Policy Based Method - DDPG

In this Unity environment, the goal of the agent is to move the double-jointed arm to the target location indicated by the turquoise sphere. This video demonstrates a more practical application of the Reacher Unity environment.

Rolling Scores

Multi-Agent RL - MADDPG

In this Unity environment, the goal is to maximize the rally between the two tennis agents: the longer the agents keep passing the ball to each other without dropping it, the higher the reward.

Rolling Scores

rl-unity's People

Contributors

qasimwani


rl-unity's Issues

bottleneck achieved in Tennis (MADDPG) env using complete competition

The model fails to converge: it found a local optimum, almost a saddle point, and started degrading afterwards because the gradient ascent could not keep improving.

potential solutions:

  1. Change the model architecture from the default (128 -> 64) to something else; try smaller and simpler first (see the sketch after this list).
  2. Change other hyperparameters.
  3. Implement a different style of DDPG update.
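
A sketch of option 1, assuming a plain PyTorch actor; the (128, 64) sizes are the defaults mentioned above, and the smaller (64, 32) variant is just one illustrative alternative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy network; the hidden sizes are the knob to tune."""
    def __init__(self, state_size, action_size, fc1_units=64, fc2_units=32):
        # Default in the issue was (128, 64); try a smaller (64, 32) first.
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))   # tennis actions are bounded, so tanh
```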

actor loss backprop version diff

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [128, 2]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
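
A common cause of this error in DDPG/MADDPG updates (which may or may not be what happened here) is calling the critic's `optimizer.step()` between building the actor-loss graph and backpropagating it: the in-place parameter update bumps the tensor versions that the saved graph expects. A hedged sketch of the usual ordering fix, with illustrative network and optimizer names:

```python
import torch
import torch.nn.functional as F

# ---- critic update first ----
q_expected = critic_local(states, actions)
with torch.no_grad():
    q_next = critic_target(next_states, actor_target(next_states))
q_targets = rewards + gamma * q_next * (1 - dones)
critic_loss = F.mse_loss(q_expected, q_targets)
critic_optimizer.zero_grad()
critic_loss.backward()
critic_optimizer.step()          # modifies critic_local's parameters in place

# ---- actor update built AFTER the critic step ----
# Fresh forward pass, so the actor-loss graph references the post-step
# parameter versions and backward() no longer trips the version check.
actor_loss = -critic_local(states, actor_local(states)).mean()
actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()
```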

What is the Unity version?

Hi, great repo.

What is the Unity version that you used? I noticed that 'Tennis.app' doesn't work with the latest version.

optimizer grad error

element 0 of tensors does not require grad and does not have a grad_fn

Stack:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-178-212068df461b> in <module>
     15         done = env_info.local_done[0]                  # see if episode has finished
     16         transition = (state, action, reward, next_state, done) #set transition
---> 17         agent.step(transition)                         # step into the next state
     18         score += reward                                # update the score
     19         state = next_state                             # roll over the state to next time step

<ipython-input-173-50233688ff15> in step(self, transition)
     42             if(len(self.memory) > BATCH_SIZE):
     43                 experiences = self.memory.sample()
---> 44                 self.train(experiences)
     45 
     46     def get_action(self, state, eps=0.0):

<ipython-input-173-50233688ff15> in train(self, experiences)
     84         loss = F.mse_loss(Q_expected, Q_targets)
     85         self.optimizer.zero_grad()
---> 86         loss.backward()
     87         self.optimizer.step()
     88 

~/opt/anaconda3/envs/drlnd/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    196                 products. Defaults to ``False``.
    197         """
--> 198         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    199 
    200     def register_hook(self, hook):

~/opt/anaconda3/envs/drlnd/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     98     Variable._execution_engine.run_backward(
     99         tensors, grad_tensors, retain_graph, create_graph,
--> 100         allow_unreachable=True)  # allow_unreachable flag
    101 
    102 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
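
This error at `loss.backward()` usually means `Q_expected` carries no autograd graph, e.g. the local network's forward pass was run inside `torch.no_grad()` or its output was detached before the loss. A minimal sketch of the usual fix; the network names are illustrative, since only the loss fragment is visible in the traceback:

```python
import torch
import torch.nn.functional as F

# Targets should NOT require grad, so no_grad()/detach() is correct here.
with torch.no_grad():
    Q_targets_next = qnetwork_target(next_states).max(1)[0].unsqueeze(1)
Q_targets = rewards + gamma * Q_targets_next * (1 - dones)

# Expected values MUST keep their graph: plain forward pass, no .detach(),
# no torch.no_grad(), and no .data access on the output.
Q_expected = qnetwork_local(states).gather(1, actions)

loss = F.mse_loss(Q_expected, Q_targets)
optimizer.zero_grad()
loss.backward()      # works because Q_expected now has a grad_fn
optimizer.step()
```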

Local Optima reached, decreased rewards

For Continuous Control with 20 parallel Reacher agents, the model seems to have reached a saddle point or local optimum at a windowed score of around 26 (mean of the last 100 scores) and then starts to decay.
Need to fine-tune the hyperparameters; maybe change the activation from ReLU to Leaky ReLU?
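
As a sketch of the activation swap being considered (the layer names are illustrative; only the activation call changes):

```python
import torch
import torch.nn.functional as F

# Drop-in change inside the actor's forward pass; self.fc1/fc2/fc3 stand in
# for the existing linear layers.
def forward(self, state):
    x = F.leaky_relu(self.fc1(state))   # was: F.relu(self.fc1(state))
    x = F.leaky_relu(self.fc2(x))       # was: F.relu(self.fc2(x))
    return torch.tanh(self.fc3(x))      # bounded continuous actions
```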
