Giter VIP home page Giter VIP logo

multiarmedbandit_rl's Introduction

MultiArmedBandit_RL

Implementation of various multi-armed bandits algorithms using Python.

Algorithms Implemented

The following algorithms are implemented on a 10-arm testbed, as described in Reinforcement Learning : An Introduction by Richard and Sutton.

  • Epsilon-Greedy Algorithm
  • Softmax Algorithm
  • Upper Confidence Bound(UCB1)
  • Median Elimination Algorithm(MEA)

Dependencies

  • numpy(used version 1.13.3)
  • matplotlib(used version 1.5.1)

Description of the Experiment

A 10-arm bandit testbed was generated as described in Section 'The 10-armed Testbed' of the book. The testbed contains 2000 bandit problems with 10 arms each, with the true action values q(a) for each action/arm in each bandit sampled from a normal distribution N(0,1). When a learning algorithm is applied to any of these arms at time t, action At is selected from each bandit problem and it gets a reward Rt which is sampled from N (q(At),1). The performance of each of the implemented algorithms is measured as ”Average Reward” (or) ”% Optimal Actions picked” at each time step.

Implementations

The codes for each algorithm and corresponding plots generated can be found in the respective folders.

The 10-arm testbed is implemented separately for each algorithm in the corresponding code file. The epsilon-greedy, softmax and UCB1 algorithms have been implemented for 1000 (time) steps each with varying values of respective parameters. The Median Elimination Algorithm has been implemented for different values of ε and δ (total number of time steps for MEA is determined by the value of these hyperparameters). Running it for these 1000(or in the case of MEA, as calculated by values of ε and δ) time steps constitutes 1 run. This is repeated for 2000 independent runs, each time with a different bandit problem. The rewards obtained over all these runs is averaged to get an idea of the algorithm's average behaviour.

Comparison graphs have been plotted and can be found in the UCB and MEA folders.

More descriptions can be found in the respective folders.

References

multiarmedbandit_rl's People

Contributors

sahanaramnath avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.