openmined / campx

Tensor Based Environment Framework for Training RL Agents - Pre Alpha

Python 100.00%

campx's Introduction

OpenMined Web Monorepo

Welcome to the OpenMined web monorepo, the home of all of OpenMined's many websites. Below are some basic instructions for getting this repository running on your machine.

Support

If you're looking for support with the courses, please go to the Courses Discussion Board. If you've found a bug, or have a suggestion for improving the Courses site or any of our other websites, please file an issue here.

Contributing

We are currently only accepting bug fixes from our community. If you're interested in working on these sites regularly as part of a team, please DM @Patrick Cason on Slack with your resume and qualifications.

Local Setup

Make sure that you have Node, NPM, and Yarn installed on your machine.
Install NX, our monorepo management framework.
From this point forward, you will run all commands in the root folder. Start by running yarn install to install all dependencies.
Run one of the commands below, depending on what you're trying to do. Note that the third word in the command corresponds to the app in question. For instance, yarn start courses will run the courses app, located at apps/courses.

Courses

The OpenMined Courses website, where we host our educational material. The site is a React.js web application running on a Firebase backend. It uses Jest for testing, Cypress for end-to-end testing, and Sanity.io as the content management system (CMS).

yarn start courses - Runs the courses site with hot reloading for development purposes.
yarn lint courses - Runs the linter for the courses site
yarn test courses - Runs the test suite for the courses site
yarn build courses - Builds the courses site
yarn build courses --prod - Builds a production version of the courses site
yarn analyze courses - Analyzes the file sizes and distribution of a built version of the courses site

Courses E2E Testing

The OpenMined Courses website uses Cypress for end-to-end testing. You have access to the following commands:

yarn e2e courses-e2e - Runs all the end-to-end tests for the Courses website
yarn lint courses-e2e - Runs the linter for the courses end-to-end app

Firebase API

Firebase is the primary backend for all of OpenMined's websites. If you want to test any functions or security rules before pushing them live, you may do so using the emulator suite.

yarn test firebase-api - Runs all the tests for our Firebase backend

Sanity CMS

Sanity is the primary CMS for all of OpenMined's websites. You must have a user account to change any actual values. However, if you want to run it on your machine, you have access to the following commands:

yarn start sanity-api - Runs the Sanity CMS with hot reloading for development purposes.
yarn lint sanity-api - Runs the linter for the Sanity CMS
yarn test sanity-api - Runs the test suite for the Sanity CMS
yarn build sanity-api - Builds the Sanity CMS
yarn build sanity-api --prod - Builds a production version of the Sanity CMS
yarn analyze sanity-api - Analyzes the file sizes and distribution of a built version of the Sanity CMS

campx's Issues

Set defaults to match the original paper

Original env is 24x24 and the paddle is 2 pixels wide.

Implement these defaults, and make sure a variable paddle width is handled correctly (in particular, in wall collisions).
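One way to handle wall collisions for a paddle of any width is to clamp the paddle's left edge, as in the minimal sketch below. The names move_paddle, BOARD_COLS and PADDLE_WIDTH are hypothetical and not part of CampX, and the sketch assumes the whole 24-column row is playable (no wall characters occupying the outer columns).

```python
# Hypothetical defaults taken from the issue text: a 24x24 board, 2-pixel paddle.
BOARD_COLS = 24
PADDLE_WIDTH = 2


def move_paddle(left, delta, width=PADDLE_WIDTH, cols=BOARD_COLS):
    """Return the new left edge of the paddle after moving by `delta` columns.

    The paddle occupies columns [left, left + width). The result is clamped
    so that the whole paddle stays on the board, whatever its width.
    """
    if not 1 <= width <= cols:
        raise ValueError("paddle width must be between 1 and the board width")
    max_left = cols - width  # right-most legal position of the left edge
    return max(0, min(left + delta, max_left))


# Usage: a 2-wide paddle at the right wall cannot move further right.
print(move_paddle(22, 1))           # 22
print(move_paddle(20, 5, width=6))  # 18
```

Clamping the left edge against cols - width keeps the collision rule independent of the paddle width, so changing the default only changes one constant.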

Boat Race demo

Question on this boat race demo -- is the agent actually learning to solve the boat race environment? It seems like the running reward that's used as the stopping condition doesn't take into account the episodic reward the agent is accumulating. Is this agent simply meant to demonstrate the CampX API?

Refactor for use with pycolab and safe-grid-agents

Overview

I've been working on a general purpose library for training safe RL agents called safe-grid-agents. It primarily uses the AI Safety Gridworlds from DeepMind.

The goal of this issue is to use the Base class from here (or something very similar) as a parent class to CampX's TensorWorld. This way, all we have to do is properly implement the abstractmethods from this Base class in order for us to be able to use specific environments based on TensorWorld with the agents I've been working on in safe-grid-agents.

Caveat

One thing we'll have to decide is if we want to use pycolab as a backend for this Base class, as is done here. One issue would be that pycolab is running numpy in the backend, and it's not clear how we could refactor that to use our MPC-shared version of PyTorch. It seems like the best way forward would be to just use the Base class and then try to mimic the kind of information that's supplied by the pycolab backend, but with torch tensors instead of with numpy arrays.

Additional requirements

In addition to the generic environment methods from Base, we'll also want two methods specific to the safety gridworlds -- get_overall_performance, which returns the safety score for an episode, and _get_hidden_reward, which supplies the per-timestep safety score. The latter is used for debugging, while the former can be used in some safe RL training schemes (e.g. semi-supervised RL). Implementing all the abstract methods as well as these two would give us an MVP of sorts that we can build on.
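As a rough illustration of these two methods, here is a minimal sketch of a helper that records a hidden reward at every timestep. The class name SafetyScoreMixin and the helper methods _record_hidden_reward and _reset_safety_score are hypothetical. Summing the per-timestep values to get the episode score is an assumption based on the description above, not the behavior of any existing library.

```python
class SafetyScoreMixin:
    """Hypothetical helper that tracks a hidden (safety) reward per timestep."""

    def __init__(self):
        self._hidden_rewards = []  # one entry per timestep of the episode

    def _reset_safety_score(self):
        """Forget the previous episode; call this from the environment's reset."""
        self._hidden_rewards = []

    def _record_hidden_reward(self, value):
        """Store the hidden reward earned at the current timestep."""
        self._hidden_rewards.append(float(value))

    def _get_hidden_reward(self):
        """Per-timestep safety score: the latest hidden reward (0.0 before any step)."""
        return self._hidden_rewards[-1] if self._hidden_rewards else 0.0

    def get_overall_performance(self):
        """Safety score for the episode so far (assumed here to be the sum)."""
        return sum(self._hidden_rewards)


# Usage: two timesteps with hidden rewards -1 and 2.
tracker = SafetyScoreMixin()
tracker._record_hidden_reward(-1)
tracker._record_hidden_reward(2)
print(tracker._get_hidden_reward())       # 2.0
print(tracker.get_overall_performance())  # 1.0
```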

Plan

Each of these should be spun out into separate issues (either individually or grouped).

Subclass Base in TensorWorld
Implement observation_spec and action_spec with tests
TODO rest of plan

Add random agent to Demo 5

For Demo 5: Boat Race Example.ipynb, it might be illustrative to add a purely random agent as a baseline to compare policies against.

Also, by implementing the random agent, we might reveal an action bias in the policy defined in the notebook if, for instance, it doesn't use random tie breaking between argmax actions.

It is not immediately clear to me how the action 0 or 1 is being selected given the action calculation line:

action = tdist.data - torch.cat([torch.zeros(1), tdist.data[:-1]])
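The sketch below restates the points above in plain Python, without torch. first_difference reproduces the arithmetic of the quoted line (each element minus its predecessor, with a zero prepended). The issue does not say what tdist holds, so the cumulative example inputs are an assumption. argmax_random_tiebreak and random_agent are hypothetical helper names showing random tie breaking and a purely random baseline.

```python
import random


def first_difference(values):
    """Plain-Python form of `t - cat([zeros(1), t[:-1]])`: out[i] = t[i] - t[i-1]."""
    shifted = [0.0] + list(values[:-1])  # prepend 0 and drop the last element
    return [v - s for v, s in zip(values, shifted)]


def argmax_random_tiebreak(scores, rng=random):
    """Return the index of a maximal score, choosing uniformly among ties."""
    best = max(scores)
    candidates = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(candidates)


def random_agent(num_actions, rng=random):
    """Baseline policy that ignores the observation entirely."""
    return rng.randrange(num_actions)


# If tdist.data were a cumulative step vector (an assumption), the
# difference would be a one-hot encoding of the chosen action.
print(first_difference([0.0, 1.0]))  # [0.0, 1.0]
print(first_difference([1.0, 1.0]))  # [1.0, 0.0]
```

A plain argmax always returns the first maximal index, so ties would systematically favor action 0. Comparing action counts against random_agent is one way to make such a bias visible.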

Clear evaluation metric reporting

Report the percent caught / success of episodic trials effectively.
Perhaps draw a learning curve and/or save a results vector.
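A minimal sketch of such reporting, assuming each episode yields a boolean outcome (caught or not) and a scalar reward. The function names success_rate and moving_average are hypothetical. The smoothed values form a results vector that can be saved or plotted as a learning curve.

```python
def success_rate(outcomes):
    """Percentage of episodes flagged as successful (e.g. the ball was caught)."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(bool(o) for o in outcomes) / len(outcomes)


def moving_average(values, window):
    """Trailing mean over up to `window` values, one point per episode."""
    if window < 1:
        raise ValueError("window must be at least 1")
    curve = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        curve.append(running / min(i + 1, window))
    return curve


# Usage: four episodes, three of them successful.
print(success_rate([True, False, True, True]))  # 75.0
print(moving_average([1, 2, 3, 4], window=2))   # [1.0, 1.5, 2.5, 3.5]
```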

Remove unused references in Demo 5

matplotlib

Remove lines from notebook

import matplotlib.pyplot as plt
%matplotlib inline

These lines are neither needed nor used, and they can crash the kernel for reasons unrelated to CampX if matplotlib is not configured appropriately.

tqdm

tqdm is imported but not used. It would be good to use it to measure the iteration time of the algorithm.

TypeError: can't convert np.ndarray of type numpy.bool_.

TypeError: can't convert np.ndarray of type numpy.bool_. The only supported types are: double, float, float16, int64, int32, and uint8.

When running either of the examples:

Maybe I am missing something?

Full error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-37d4bd5ccb9c> in <module>()
     25       drapes={'@': RollingDrape},
     26       z_order='12@34')
---> 27 game = make_game()

<ipython-input-3-37d4bd5ccb9c> in make_game()
     24                '4': Partial(SlidingSprite, 3)},
     25       drapes={'@': RollingDrape},
---> 26       z_order='12@34')
     27 game = make_game()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/campx-0.1.0-py3.6.egg/campx/ascii_art.py in ascii_art_to_game(art, what_lies_beneath, sprites, drapes, backdrop, update_schedule, z_order, occlusion_in_layers)
    180             game.add_prefilled_drape(character, mask,
    181                                      partial.pycolab_thing,
--> 182                                      *partial.args, **partial.kwargs)
    183 
    184         if character in sprites:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/campx-0.1.0-py3.6.egg/campx/engine.py in add_prefilled_drape(self, character, prefill, drape_class, *args, **kwargs)
    329         self._runtime_error_if_characters_claimed_already(character)
    330         # Construct a new curtain for the drape.
--> 331         curtain = torch.ByteTensor(np.zeros((self._rows, self._cols), dtype=np.bool_))
    332         # Fill the curtain with the prefill data.
    333         curtain.set_(prefill)

TypeError: can't convert np.ndarray of type numpy.bool_. The only supported types are: double, float, float16, int64, int32, and uint8.
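The error message lists uint8 among the supported types, so one possible workaround is to build the mask as uint8 rather than np.bool_ before handing it to torch. The sketch below shows only the NumPy side. The helper names zeros_mask and as_uint8 are hypothetical, and this is not necessarily how CampX resolved the issue. The prefill mask passed to add_prefilled_drape may need the same conversion.

```python
import numpy as np


def zeros_mask(rows, cols):
    """All-False mask stored as uint8, a dtype the error message lists as supported."""
    return np.zeros((rows, cols), dtype=np.uint8)


def as_uint8(mask):
    """Convert an existing boolean mask to uint8 (False -> 0, True -> 1)."""
    return np.asarray(mask).astype(np.uint8)


# Usage: both results hold 0/1 values in a uint8 array.
curtain = zeros_mask(2, 3)
prefill = as_uint8(np.array([[True, False, True], [False, False, True]]))
print(curtain.dtype, prefill.dtype)  # uint8 uint8
```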

Notes / TODOs / Ideas for cleanup and speedup

.append() (used 6 times in Demo 8) can waste time on repeated memory allocation. Using a pre-allocated structure avoids those allocations.
Similarly, insert() is used to collect rewards; a deque may be more efficient.
np.argmax(x) is really fast when used with native numpy arrays.
time is the name of a standard library module (not a reserved keyword), and Demo 8 uses it as a loop variable over a list, which shadows the module if it is imported.
We may want to add the hidden layer size as a parameter in the policy constructor. This would allow a simple size vs. performance sweep to be run quickly. For instance:

class Policy(nn.Module):
    def __init__(self, state_space, action_space, hidden_layer_size):
        ....
        self.hidden_layer_size = hidden_layer_size
        ....
        self.l1 = nn.Linear(self.state_space, self.hidden_layer_size, bias=False)
        self.l2 = nn.Linear(self.hidden_layer_size, self.action_space, bias=False)
        ....
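The deque suggestion in the notes above can be sketched as follows. This assumes the rewards are collected by inserting at the front of a list (index 0) while computing discounted returns. The notes do not spell that out, so both the function name discounted_returns and the gamma parameter are hypothetical.

```python
from collections import deque


def discounted_returns(rewards, gamma=0.99):
    """Compute the return at each timestep, walking the episode backwards.

    deque.appendleft is O(1), whereas list.insert(0, x) shifts every
    element already in the list on each call.
    """
    returns = deque()
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.appendleft(running)
    return list(returns)


# Usage: three steps with reward 1 and a discount of 0.5.
print(discounted_returns([1, 1, 1], gamma=0.5))  # [1.75, 1.5, 1.0]
```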

Refactor to use Gym/AI Safety Gridworlds API

RL environments usually inherit their API from an abstract base class Env that requires at least reset and step. render and seed are also common (a la Gym), although seed sometimes ends up being a constructor argument, and render isn't strictly necessary for the agent-environment interaction to take place.

I'm proposing that we align our tensor worlds with this API so that code we've written with other environment packages can be converted to secret worlds more cleanly.
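A minimal sketch of what such a shared interface could look like, using only the standard library. The Env class here is a stand-in rather than Gym's or CampX's actual class, CountdownEnv and run_episode are hypothetical, and the 4-tuple returned by step follows the classic Gym convention (observation, reward, done, info) as an assumption.

```python
from abc import ABC, abstractmethod


class Env(ABC):
    """Stand-in abstract base class requiring only reset and step."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the first observation."""

    @abstractmethod
    def step(self, action):
        """Apply `action` and return (observation, reward, done, info)."""


class CountdownEnv(Env):
    """Toy environment: the episode ends after `horizon` steps; action 1 pays 1."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t

    def step(self, action):
        self._t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self._t >= self.horizon
        return self._t, reward, done, {}


def run_episode(env, policy):
    """Agent-environment loop that relies only on reset and step."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


print(run_episode(CountdownEnv(horizon=3), lambda obs: 1))  # 3.0
```

Because run_episode touches nothing beyond reset and step, the same loop would work for any environment that implements this interface, which is the portability the proposal is after.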