rl-from-scratch

This project was done as part of the course project for COMPSCI 687. We implemented REINFORCE with baseline, the episodic actor-critic method, and PPO, and benchmarked their performance on environments provided by the OpenAI Gymnasium toolkit. We also implemented variations of the PPO algorithm, such as adding an entropy bonus to encourage exploration and normalizing advantage estimates for stability. Please refer to this report for more specific details.
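As a rough illustration of the two PPO variations mentioned above, the following sketch computes a clipped surrogate loss with normalized advantages and an entropy bonus, using only the Python standard library. It is not taken from this repository: the function names, `clip_eps=0.2`, and `entropy_coef=0.01` are assumptions chosen for the example.

```python
import math
from statistics import fmean, pstdev


def normalize_advantages(advantages, eps=1e-8):
    """Shift and scale advantages to zero mean and (near) unit std."""
    mean = fmean(advantages)
    std = pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


def entropy(probs):
    """Shannon entropy (in nats) of one discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_loss(new_logps, old_logps, advantages, action_dists,
             clip_eps=0.2, entropy_coef=0.01):
    """Clipped surrogate loss minus an entropy bonus (to be minimized).

    Hypothetical helper: hyperparameter defaults are illustrative only.
    """
    advantages = normalize_advantages(advantages)
    surrogate = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the two surrogate terms.
        surrogate.append(min(ratio * adv, clipped * adv))
    mean_entropy = fmean([entropy(d) for d in action_dists])
    # Higher entropy lowers the loss, which rewards exploration.
    return -fmean(surrogate) - entropy_coef * mean_entropy
```

In a real training loop the log-probabilities would come from the policy network and the loss would be differentiated by an autograd framework; the plain floats here only show the arithmetic.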

Setting up the environment :

For best results, we recommend setting up a conda environment or a venv. To install the dependencies, run:

pip install -r requirements.txt

Training and Visualizing the agent :

To train the agent using REINFORCE with baseline, run :

For Lunar Lander : python reinforce.py

For Cartpole : python reinforce-cartpole.py

For Acrobot : python reinforce-acrobot.py
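For readers unfamiliar with REINFORCE with baseline, the sketch below shows the core computation for a single episode: discounted returns, the baseline-subtracted return, and the resulting policy and value losses. It is a minimal illustration, not the code in the scripts above; the function names and `gamma=0.99` are assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_with_baseline_losses(log_probs, rewards, values, gamma=0.99):
    """Return (policy_loss, value_loss) for one episode.

    log_probs: log pi(a_t | s_t) of the actions taken
    values:    baseline estimates V(s_t) from a value function
    """
    returns = discounted_returns(rewards, gamma)
    # Baseline-subtracted return; in an autograd framework this term
    # would be treated as a constant inside the policy loss.
    deltas = [g - v for g, v in zip(returns, values)]
    policy_loss = -sum(lp * d for lp, d in zip(log_probs, deltas))
    value_loss = sum(d * d for d in deltas)
    return policy_loss, value_loss
```

Subtracting the baseline leaves the expected policy gradient unchanged while reducing its variance, which is the usual motivation for this variant.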

Actor Critic

To train the agent using actor-critic, run :

python actor_critic.py --env_name <env_name> --rand_seed <rand_seed>

To visualize a model trained with actor-critic, run the following (to use the models we trained, keep rand_seed between 0 and 4) :

python actor_critic.py --env_name <env_name> --rand_seed <rand_seed> --eval_only
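The exact actor-critic variant used by actor_critic.py is not described here, so the following is only a generic sketch of a one-step actor-critic update, assuming a TD(0) error as the learning signal. The function names and `gamma=0.99` are hypothetical.

```python
def td_error(reward, value, next_value, done, gamma=0.99):
    """One-step TD error: r + gamma * V(s') - V(s).

    V(s') is taken to be 0 when the episode has terminated.
    """
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def actor_critic_step_losses(log_prob, reward, value, next_value, done,
                             gamma=0.99):
    """Return (actor_loss, critic_loss) for a single transition."""
    delta = td_error(reward, value, next_value, done, gamma)
    # The actor is pushed toward actions with a positive TD error;
    # delta would be treated as a constant in this term under autograd.
    actor_loss = -log_prob * delta
    # The critic regresses V(s) toward the bootstrapped target.
    critic_loss = delta ** 2
    return actor_loss, critic_loss
```

Unlike REINFORCE, this update bootstraps from the critic's estimate of the next state and can therefore be applied at every step instead of only at the end of an episode.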

PPO

To train the agent using PPO, run :

python ppo.py --env_name <env_name> --rand_seed <rand_seed>

To visualize a model trained with PPO, run the following (to use the models we trained, keep rand_seed between 0 and 4) :

python ppo.py --env_name <env_name> --rand_seed <rand_seed> --eval_only
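Both actor_critic.py and ppo.py are invoked with the same three flags. A command-line interface of that shape could be declared with `argparse` roughly as follows. This is a sketch of how such flags are typically parsed, not the scripts' actual code: the types, defaults, help strings, and the example environment name `CartPole-v1` are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Train or evaluate an RL agent")
    parser.add_argument("--env_name", type=str, required=True,
                        help="environment to train or evaluate on")
    parser.add_argument("--rand_seed", type=int, default=0,
                        help="random seed (0-4 for the provided models)")
    parser.add_argument("--eval_only", action="store_true",
                        help="skip training and only visualize a trained model")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--env_name", "CartPole-v1", "--rand_seed", "3", "--eval_only"]
)
```

With `action="store_true"`, `--eval_only` defaults to `False` and becomes `True` only when the flag is present, which matches how the training and visualization commands differ.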

Reinforce with baseline results :

Cartpole :

Acrobot :

Lunar lander :

Actor critic results :

Cartpole :

Acrobot :

Lunar lander :

Proximal policy optimization :

Cartpole :

Acrobot :

Lunar lander :

Contributors :

demon702, sriharsha-hatwar
