Pytorch-A2C

This implementation of an actor-critic network takes an image as input. The image features are shared between the actor and the critic: they are extracted by two sequential convolution layers followed by a dense layer, and both the actor and the critic receive the output of that dense layer as input.
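The shared-feature layout described above could look roughly like the following sketch. It is not the repository's actual model: the class name `ActorCritic`, the channel counts, kernel sizes, strides and the hidden size are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class ActorCritic(nn.Module):
    """Shared convolutional features feeding separate actor and critic heads."""

    def __init__(self, in_shape, num_actions, hidden=256):
        super().__init__()
        self.in_shape = in_shape  # (channels, height, width)
        # Two sequential convolution layers (layer sizes are assumed, not from the repo).
        self.features = nn.Sequential(
            nn.Conv2d(in_shape[0], 16, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, stride=2),
            nn.ReLU(),
        )
        # Dense layer on top of the flattened convolution output.
        self.fc = nn.Sequential(
            nn.Linear(self._feature_size(), hidden),
            nn.ReLU(),
        )
        self.actor = nn.Linear(hidden, num_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)           # state value estimate

    def _feature_size(self):
        # Push a dummy tensor through the conv stack to find the flattened size.
        with torch.no_grad():
            dummy = torch.zeros(1, *self.in_shape)
            return self.features(dummy).view(1, -1).size(1)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        # Both heads read the same dense-layer output.
        return self.actor(x), self.critic(x)
```

Calling `ActorCritic((3, 15, 19), num_actions=5)` on a batch of images returns one tensor of action logits and one tensor of state values per image; the input shape used here is only an example.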

The original version is from https://github.com/higgsfield/Imagination-Augmented-Agents

Required libraries:

  • PyTorch 0.4.1 or later
  • Torchvision
  • gym

########################### Pacman training ###########################

Details of the environment: the code was written by @sracaniere from DeepMind (https://github.com/sracaniere).

See https://arxiv.org/abs/1707.06203 [p. 15ff], "C MiniPacman additional details", for more information about the MiniPacman environment.

To start the training of the agent, run A2C_training_pacman.py. To see the result, run PlayPacman.py.

This A2C is trained with n-step returns. To prevent the agent from overfitting, at least 16 environments should be used so that the training data varies enough. If a different environment with sparser rewards is used, the number of environments should be increased.
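As a rough illustration of n-step training, the sketch below computes bootstrapped n-step returns for a single environment. The function name `n_step_returns`, the `masks` convention and the default `gamma` are assumptions made for this example and are not taken from the training scripts.

```python
def n_step_returns(rewards, masks, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns for one rollout.

    rewards[t]      reward received at step t
    masks[t]        0.0 if the episode ended at step t, else 1.0
    bootstrap_value critic's value estimate for the state after the last step
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return can reuse the one that follows it.
    for reward, mask in zip(reversed(rewards), reversed(masks)):
        running = reward + gamma * running * mask
        returns.append(running)
    returns.reverse()
    return returns
```

The critic is then regressed towards these returns, and the difference between return and value estimate serves as the advantage for the actor. With several environments, the same computation is applied to each environment's rollout.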

########################### KeyCollect training ###########################

In https://arxiv.org/abs/1707.06203, the Sokoban environment is described as a second training environment. An implementation of this Sokoban environment can be found here: https://github.com/mpSchrader/gym-sokoban. To make this environment usable for the I2A, its observations have to be downscaled to (3x80x80).

Since generating a Sokoban puzzle can take from 2 seconds to over one minute and the A2C is synchronized, training would take far too long. A3C should be considered for this environment, as its processes don't have to wait for other processes to finish level generation. The network proposed in the paper for Sokoban is also larger and takes many more training epochs (see https://arxiv.org/abs/1707.06203 for details).
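One possible way to bring a rendered frame down to (3x80x80) is nearest-neighbour sampling followed by a move to channel-first layout, sketched below with NumPy. The helper name `downscale` and the assumption that frames arrive as height x width x channel arrays are illustrative; the actual frame format of the Sokoban environment should be checked before use.

```python
import numpy as np


def downscale(frame, size=(80, 80)):
    """Nearest-neighbour resize of an (H, W, C) frame to (C, size[0], size[1])."""
    height, width, _ = frame.shape
    # Pick evenly spaced source rows and columns for the target grid.
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    small = frame[rows][:, cols]
    # Convert from height x width x channel to channel-first layout.
    return small.transpose(2, 0, 1)
```

Nearest-neighbour sampling keeps tile colours unchanged, which may matter for grid worlds; an averaging resize is an alternative if smoother images are preferred.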

KeyCollect is a computationally very lightweight environment that has sparse rewards and keeps training costs low. Training on the KeyCollect environment is done with the same network as MiniPacman. In KeyCollect, the agent must reach one to five subgoals before reaching the goal. If it reaches the goal before reaching all subgoals, it is trapped there until the maximum number of steps for the level is reached.

Rewards:

  • every step: -1
  • subgoal: 1
  • goal: 10
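The episode rules and rewards can be summarised in a small bookkeeping sketch. This is not the environment's code: the class `KeyCollectRules` and the event labels are hypothetical, no grid or movement is modelled, and two details are assumptions, namely that only one reward applies per step and that a trapped agent keeps receiving the step reward.

```python
STEP_REWARD = -1
SUBGOAL_REWARD = 1
GOAL_REWARD = 10


class KeyCollectRules:
    """Toy model of the KeyCollect episode rules (assumed details, see above)."""

    def __init__(self, num_subgoals, max_steps):
        self.remaining = num_subgoals  # subgoals still to be reached
        self.max_steps = max_steps
        self.steps = 0
        self.trapped = False

    def step(self, event):
        """event is 'move', 'subgoal' or 'goal'; returns (reward, done)."""
        self.steps += 1
        out_of_time = self.steps >= self.max_steps
        if self.trapped:
            # Trapped in the goal: nothing changes until the step limit.
            return STEP_REWARD, out_of_time
        if event == "subgoal" and self.remaining > 0:
            self.remaining -= 1
            return SUBGOAL_REWARD, out_of_time
        if event == "goal":
            if self.remaining == 0:
                return GOAL_REWARD, True
            # Goal reached too early: the agent is stuck from now on.
            self.trapped = True
        return STEP_REWARD, out_of_time
```

Under these assumptions, reaching the goal too early means the rest of the episode yields only step penalties.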

To start the training of the agent, run A2C_training_keyCollect.py. To see the result, run PlayKeyCollect.py.

This A2C is trained with n-step returns. The advantage of n-step training is that no target networks are required. Because the rewards are very sparse, at least 32 environments should be used to prevent the agent from overfitting and to guarantee enough variation in the training data.

Feel free to give any feedback or comments.

Contributors: olloxan
