
adeptrl's Introduction


adept is a reinforcement learning framework designed to accelerate research by abstracting away engineering challenges associated with deep reinforcement learning. adept provides:

  • multi-GPU training
  • a modular interface for using custom networks, agents, and environments
  • baseline reinforcement learning models and algorithms for PyTorch
  • built-in tensorboard logging, model saving, reloading, evaluation, and rendering
  • proven hyperparameter defaults

This code is early access; expect rough edges. Interfaces are subject to change. We're happy to accept feedback and contributions.

Read More

Documentation

Examples

Installation

git clone https://github.com/heronsystems/adeptRL
cd adeptRL
pip install -e .[all]

From docker:

Quickstart

Train an Agent

Logs go to /tmp/adept_logs/ by default. The log directory contains the tensorboard file, saved models, and other metadata.

# Local Mode (A2C)
# We recommend 4GB+ GPU memory, 8GB+ RAM, 4+ Cores
python -m adept.app local --env BeamRiderNoFrameskip-v4

# Distributed Mode (A2C, requires NCCL)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.app distrib --env BeamRiderNoFrameskip-v4

# IMPALA (requires ray, resource intensive)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.app actorlearner --env BeamRiderNoFrameskip-v4

# To see a full list of options:
python -m adept.app -h
python -m adept.app help <command>

Use your own Agent, Environment, Network, or SubModule

"""
my_script.py

Train an agent on a single GPU.
"""
from adept.scripts.local import parse_args, main
from adept.network import NetworkModule, SubModule1D
from adept.agent import AgentModule
from adept.env import EnvModule


class MyAgent(AgentModule):
    pass  # Implement


class MyEnv(EnvModule):
    pass  # Implement


class MyNet(NetworkModule):
    pass  # Implement


class MySubModule1D(SubModule1D):
    pass  # Implement


if __name__ == '__main__':
    import adept
    adept.register_agent(MyAgent)
    adept.register_env(MyEnv)
    adept.register_network(MyNet)
    adept.register_submodule(MySubModule1D)
    main(parse_args())
  • Call your script like this: python my_script.py --agent MyAgent --env env-id-1 --custom-network MyNet
  • You can see all the args here or how to implement the stubs in the examples section above.

Features

Scripts

Local (Single-node, Single-GPU)

  • Best place to start if you're trying to understand the code.

Distributed (Multi-node, Multi-GPU)

  • Uses the NCCL backend to all-reduce gradients across GPUs without a parameter server or host process.
  • Supports NVLink and InfiniBand to reduce communication overhead.
  • InfiniBand is untested since we do not have a setup to test on.
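The gradient averaging that the all-reduce step performs can be illustrated with a small, framework-free sketch (pure Python and purely conceptual; NCCL does this across GPU memory, not Python lists): each worker contributes its local gradients, and every worker ends up holding the element-wise mean.

```python
# Conceptual sketch of gradient all-reduce averaging across workers.
# `worker_grads` stands in for each worker's flattened gradient vector;
# this is illustrative only, not adept's actual distributed code path.

def all_reduce_mean(worker_grads):
    """Element-wise average across workers, as a gradient all-reduce computes."""
    n_workers = len(worker_grads)
    return [sum(vals) / n_workers for vals in zip(*worker_grads)]
```

After this step every worker applies the same averaged gradient, which is why no parameter server or host process is needed.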

Importance Weighted Actor-Learner Architecture, IMPALA (Single-Node, Multi-GPU)

  • Our implementation uses GPU workers rather than CPU workers for forward passes.
  • On Atari we achieve ~4k SPS = ~16k FPS with two GPUs and an 8-core CPU.
  • "Note that the shallow IMPALA experiment completes training over 200 million frames in less than one hour."
  • IMPALA official experiments use 48 cores.
  • Ours: ~2,000 frames/second per CPU core; DeepMind: ~1,157 frames/second per CPU core.
  • Does not yet support multiple nodes or direct GPU memory transfers.

Agents

Networks

  • Modular Network Interface: supports arbitrary input and output shapes up to 4D via a SubModule API.
  • Stateful networks (i.e., LSTMs)
  • Batch normalization (paper)

Environments

  • OpenAI Gym Atari

Performance

  • ~3,000 steps/second = ~12,000 FPS (Atari)
    • Local Mode
    • 64 environments
    • GeForce 2080 Ti
    • Ryzen 2700x 8-core
  • Architecture used to win a Doom competition (Ben Bell / Marv2in)
  • Trained for 50M Steps / 200M Frames
  • Up to 30 no-ops at start of each episode
  • Evaluated on different seeds than trained on
  • Architecture: Four Convs (F=32) followed by an LSTM (F=512)
  • Reproduce with python -m adept.app local --logdir ~/local64_benchmark --eval -y --nb-step 50e6 --env <env-id>

Acknowledgements

We borrow pieces of OpenAI's gym and baselines code. We indicate where this is done.

adeptrl's People

Contributors

benbellheron, heron-ci, heronbrett, jdenalil, josephbanks, jtatusko, ryansingman, sflc6, trangml, wyattlansford


adeptrl's Issues

clean up argument parsing

  • should use docopt
  • args that depend on other args need to be collected by prompting the user
    • this should be done by showing a defaults dictionary and asking the user to modify keys
  • adept should have a centralized entry point (adept.app)

Potential dependency conflicts between adeptrl and cloudpickle

Hi, as shown in the following full dependency graph of adeptrl, adeptrl requires cloudpickle (>=0.5), while the installed version of gym (0.17.1) requires cloudpickle>=1.2.0,<1.4.0.

According to pip's "first found wins" installation strategy, cloudpickle 1.3.0 is the version actually installed.

Although cloudpickle 1.3.0 happens to satisfy the later constraint (cloudpickle>=1.2.0,<1.4.0), the loose >=0.5 bound will lead to a build failure once a newer, incompatible version of cloudpickle is released.

Dependency tree:

adeptrl  - 0.2.0
| +- absl-py(install version:0.9.0 version range:>=0.2)
| +- cloudpickle(install version:1.3.0 version range:>=0.5)
| +- docopt(install version:0.6.2 version range:>=0.6)
| +- gym(install version:0.17.1 version range:>=0.10)
| | +- cloudpickle(install version:1.3.0 version range:>=1.2.0,<1.4.0)
| | +- enum34(install version: version range:<.2,>=1.1.6)
| | +- numpy(install version:1.18.2 version range:>=1.10.4)
| | +- pyglet(install version:1.5.0 version range:>=1.4.0,<=1.5.0)
| | +- scipy(install version:1.2.3 version range:*)
| | +- six(install version:1.14.0 version range:*)
| +- numpy(install version:1.18.2 version range:>=1.14)
| +- opencv-python-headless(install version:4.1.0.25 version range:>=3.4)
| +- pyzmq(install version:19.0.0 version range:>=17.1.2)
| +- tensorboard(install version:1.14.0 version range:>=1.14)
| +- torch(install version:0.1.2.post2 version range:>=1.3.1)
| | +- pyyaml(install version:5.3.1 version range:*)
| +- torchvision(install version:0.3.0 version range:>=0.4.2)

Thanks for your attention.
Best,
Neolith
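One low-risk mitigation (assuming adeptrl is otherwise compatible with these versions; this is a hypothetical setup.py fragment, not a patch from the maintainers) is to align adeptrl's own cloudpickle constraint with gym's, so pip's "first found wins" resolution cannot select an incompatible version:

```python
# Hypothetical setup.py fragment: tighten the cloudpickle bound to match
# gym's constraint so the resolved version always satisfies both packages.
install_requires = [
    "cloudpickle>=1.2.0,<1.4.0",
    # ... other dependencies unchanged ...
]
```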

env cleanup

  • should be easier to construct environments
  • should be easier to understand how to add your own environment

Container.from_args

  • simplify construction of containers by allowing them to be created from_args

agent cleanup

  • should be easier to construct agents
  • agents should be pluggable
  • should be easier to understand how to add your own agent

[Bug] ObsPreprocessor does not apply name_filters on call

Example:

cpu_ops = [CustomOpFn(name_filters=[dict_key])]
cpu_preprocessor = ObsPreprocessor(
    cpu_ops,
    Space.from_gym(observation_space),
    Space.dtypes_from_gym(observation_space),
)

I expect my CustomOpFn to receive only the keys matching the name_filters I specified when update_shape, update_dtype, and update_obs are called, but this does not happen for update_obs.

Relevant file/lines: https://github.com/heronsystems/adeptRL/blob/master/adept/preprocess/observation.py#L51
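The expected behavior can be expressed as a small self-contained sketch (illustrative only; not the actual ObsPreprocessor implementation): an op constructed with name_filters should only ever be applied to the observation keys it filtered for.

```python
# Illustrative sketch of the expected name_filters semantics. `ops` is a list
# of (fn, name_filters) pairs; an empty filter list means "apply to all keys".

def apply_ops(obs, ops):
    """Apply each op only to the observation keys its name_filters select."""
    result = dict(obs)
    for fn, name_filters in ops:
        keys = name_filters if name_filters else list(result)
        for key in keys:
            result[key] = fn(result[key])
    return result
```

Under these semantics, an op filtered to "screen" would leave a "minimap" entry untouched; the bug report says update_obs currently does not honor this.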

IMPALA quickstart example doesn't work

In particular, running the following example on master:

# IMPALA (requires ray, resource intensive)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.app impala --agent ActorCriticVtrace --env BeamRiderNoFrameskip-v4

yields the following error message:

impala is not a valid command. See 'adept.app --help'.

This command is likely out of date - we should replace it with something that works.

MPI Error - Impala

Hi, I'm trying to run the example on a CUDA cluster.

I am running:
adeptRL/0.1.1
glibc/2.14
python/3.7.0
mpi4py/3.0.0
torch-nightly/1.0.0.dev20180929
CUDA Version 8.0.61
mpicc -v
Using built-in specs.
COLLECT_GCC=/share/apps/gcc/7.2.0/bin/gcc
COLLECT_LTO_WRAPPER=/share/apps/gcc/7.2.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/share/apps/gcc//7.2.0 --with-gmp=/share/apps/gcc//7.2.0/gmp/ --with-mpfr=/share/apps/gcc//7.2.0/mpfr/ --with-mpc=/share/apps/gcc//7.2.0/mpc/ --disable-multilib
Thread model: posix
gcc version 7.2.0 (GCC)

mpiexec -n 3 python3 -m adept.scripts.impala --env-id BeamRiderNoFrameskip-v4 0 1

gives

[cuda-24-1:02171] *** Process received signal ***
[cuda-24-1:02171] Signal: Segmentation fault (11)
[cuda-24-1:02171] Signal code: Address not mapped (1)
[cuda-24-1:02171] Failing at address: 0x46f

network cleanup

  • should be easier to create networks
  • networks should be pluggable
  • should be easier to understand how to add your own network

[Enhancement] Log the global gradient norms to tensorboard

Adept uses global gradient-norm clipping of 0.5 by default, which could prevent learning (or at least slow it down) if the gradients are high and clipping occurs at every training step. At minimum we should log the global norm to tensorboard so users can see and judge this for themselves.
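The quantity worth logging can be written down in a few lines (a stdlib-only sketch of global-norm clipping; the function names are illustrative, not adept's API, and frameworks like PyTorch provide this as a built-in):

```python
import math

# Sketch of global gradient-norm clipping. `grads` stands in for the
# flattened gradients of all parameters concatenated together.

def global_grad_norm(grads):
    """L2 norm over every gradient element, across all parameters."""
    return math.sqrt(sum(g * g for g in grads))

def clip_scale(grads, max_norm=0.5):
    """Factor the gradients get multiplied by; < 1.0 means clipping fired."""
    norm = global_grad_norm(grads)
    return min(1.0, max_norm / norm) if norm > 0 else 1.0
```

If clip_scale is below 1.0 at nearly every step, the effective step size is being silently shrunk, which is exactly what a tensorboard plot of the global norm would reveal.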

jenkins upgrades

  • build badge
  • fix github hooks
  Both of these require a domain and a port forward.

get rid of eval thread in local

  • Eval performance is usually about the same as train performance.
  • You need to run the eval script after each run anyway, since it computes the 30-environment averages needed to compare performance against papers.

examples

  • adding a custom environment
  • using a custom network
  • using a custom agent

Bug and/or dead code in modular_network.py?

In particular, see here:

# Dict[Dim, SubModule]
# instantiate heads based on output_shapes
head_submodules = {}
for output_key, shape in output_space.items():
    dim = len(shape)
    if dim in head_submodules:
        continue
    elif dim == 1:
        submod_cls = net_reg.lookup_submodule(args.head1d)
    elif dim == 2:
        submod_cls = net_reg.lookup_submodule(args.head2d)
    elif dim == 3:
        submod_cls = net_reg.lookup_submodule(args.head3d)
    elif dim == 4:
        submod_cls = net_reg.lookup_submodule(args.head4d)
    else:
        raise ValueError("Invalid dim: {}".format(dim))
    submod = submod_cls.from_args(
        args,
        body_submod.output_shape(submod_cls.dim),
        "head" + str(dim) + "d",
    )
    head_submodules[str(dim)] = submod

In the following if-statement:

if dim in head_submodules:
    continue

type(dim) == int; however, the keys of head_submodules are guaranteed to be str due to the line head_submodules[str(dim)] = submod. This suggests the if-statement above will never continue. Thoughts on this?
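The mismatch is easy to confirm in isolation (a minimal reproduction, not adept code):

```python
# Minimal reproduction of the int-vs-str key mismatch described above.
head_submodules = {}

def register(dim, submod):
    if dim in head_submodules:          # `dim` is an int ...
        return False                    # ... so this dedup branch never fires,
    head_submodules[str(dim)] = submod  # because keys are stored as strings.
    return True

first = register(1, "head1d-a")
second = register(1, "head1d-b")  # expected to be skipped, but is not
```

Using `head_submodules[dim] = submod` (or checking `str(dim) in head_submodules`) would make the guard effective; either fix keeps only the first head per dim.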

In SubModuleXD, shapes are duplicated within output_shape and _to_xd

For example, looking at SubModule2D:

    def output_shape(self, dim=None):
        if dim == 1:
            f, l = self._output_shape
            return (f * l,)
        ...
    ...
    def _to_1d(self, submodule_output):
        """
        :param submodule_output: torch.Tensor (Batch + 2D)
        :return: torch.Tensor (Batch + 1D)
        """
        n, f, l = submodule_output.size()
        return submodule_output.view(n, f * l)

The fact that the 2D -> 1D conversion goes from (F, L) -> (F * L,) is encoded in two places within the SubModule2D class. The same is true in general for mD -> nD. It may be worth eliminating this duplication.
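One way to remove the duplication (a sketch under assumed class layout, not a patch against adept's actual SubModule2D) is to derive both the reported shape and the reshape from a single helper, so the mD -> 1D fact lives in exactly one place:

```python
from math import prod  # Python 3.8+

def flattened_shape(shape):
    """(F, L) -> (F * L,); generalizes any mD feature shape to 1D."""
    return (prod(shape),)

class SubModule2DSketch:
    """Hypothetical stand-in for SubModule2D, shapes only (no torch)."""

    def __init__(self, output_shape_2d):
        self._output_shape = output_shape_2d

    def output_shape(self, dim=None):
        if dim == 1:
            return flattened_shape(self._output_shape)
        return self._output_shape

    def _to_1d(self, submodule_output):
        # The view target reuses flattened_shape(), so output_shape(1)
        # and the actual reshape can never disagree.
        n = submodule_output.size(0)
        return submodule_output.view(n, *flattened_shape(self._output_shape))
```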

sc2 3d action_space

It would be nice to replicate some of the work in the papers where ConvLSTM, RMC, and RMA are used. Currently the action space is all 1D outputs; 84x84 outputs are treated as separate heads (84, 84).

unit tests for EnvManagers

  • Pretty sure SimpleEnvManager does not work for StarCraft 2.
  • Can reproduce by swapping SimpleEnvManager into replay_gen_sc2.py.
