SAIDA RL

Welcome to SAIDA RL! This is an open-source platform for anyone interested in StarCraft I and reinforcement learning to train and evaluate their models and algorithms.

starcraft

What is SAIDA RL?

It is a simulator for users to train and evaluate their own algorithms and models in the challenging StarCraft I environment. It provides not only the simulator itself but also tutorials and API documentation. It is specialized for StarCraft I and offers many scenarios, so you can try fresh, challenging games and test your own ideas for making StarCraft units play better.

architecture

Basics

Anyone who is familiar with reinforcement learning concepts and how OpenAI Gym works can skip this. In reinforcement learning, there are the environment and the agent. The agent sends actions to the environment, and the environment replies with observations and rewards.

SAIDA RL inherits the interface of Gym's Env class and provides baseline algorithms and agent sources that are independent of Env. Whether to use these baselines is up to you. The following are the Env methods you should know:

| function | explanation |
| --- | --- |
| `init(self)` | Not used. |
| `render(self)` | Not used. |
| `reset(self)` | Reset the environment's state. Returns observation. |
| `step(self, action)` | Step the environment by one timestep. Returns observation, reward, done, info. |
| `close(self)` | Close the connection with StarCraft. |
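To make the interface concrete, here is a minimal sketch of the Gym-style interaction loop these methods imply. The `DummyEnv` class below is a stand-in written for this example, not part of SAIDA RL; a real SAIDA RL environment plugs into `run_episode` the same way.

```python
class DummyEnv:
    """Stand-in implementing the Gym-style interface described above."""

    def __init__(self):
        self.t = 0

    def reset(self):
        # Reset state and return the initial observation.
        self.t = 0
        return {"step": self.t}

    def step(self, action):
        # Advance one timestep; return observation, reward, done, info.
        self.t += 1
        obs = {"step": self.t}
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 5
        return obs, reward, done, {}

    def close(self):
        pass  # a real env would close the StarCraft connection here


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total += reward
    env.close()
    return total
```

Any environment exposing `reset`/`step`/`close` with these signatures works with a loop like this.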

Guide

You can access three documents to run an agent in this environment.

Installation

Link

Tutorials

Link

API

Link

Environment

We built the environment on top of OpenAI Gym; it exposes the Gym-style interface described in the Basics section above.

Agent

The agent we provide is based on keras-rl, one of the most widely used reinforcement learning frameworks, which we extended ourselves to support more features. But you can use your own agent if you want. The agent and environment are decoupled, with no dependencies between them, so the platform is compatible with most numerical computation libraries, such as TensorFlow or Theano. You can use it from Python code, and soon from other languages. If you're not sure where to start, we recommend beginning with the tutorials on our site.
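Because the agent and environment are decoupled, any object that maps observations to actions can drive an environment. A minimal custom-agent sketch follows; the `act` method name and the action-space size are illustrative assumptions for this example, not SAIDA RL's actual agent API.

```python
import random


class RandomAgent:
    """Toy agent that samples actions uniformly at random.

    Replace `act` with a learned policy (e.g. a neural network's
    forward pass) for real training.
    """

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, observation):
        # The observation is ignored here; a real policy would condition on it.
        return self.rng.randrange(self.n_actions)
```

An agent like this can be paired with any Gym-style environment without either side importing the other.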

Scenarios

We offer a variety of challenging scenarios to motivate you to solve them with reinforcement learning algorithms.

| Map Name | Env Name | Desc. | Terrain (Y/N) | Agent | Action space | Termination Condition |
| --- | --- | --- | --- | --- | --- | --- |
| Vul_VS_Zeal_v0(~3) | VultureVsZealot | The agent (a Terran Vulture) should kill all Protoss Zealots while taking minimal damage. The number of Zealots and the presence of terrain depend on the map version. | Depends on map version | Vulture | Move in a specific direction; patrol toward an enemy (i.e. attack) | All Zealots killed, or defeated |
| Avoid_Observer_v0 | AvoidObserver | Reach the top of the map while avoiding Observers in the middle area. | N | Scourge | Move in a specific direction | Reach the goal, or collide with an Observer |
| Avoid_Reaver_v0 | AvoidReaver | Reach the bottom-right area of the map while avoiding Reavers in the middle area. | N | Dropship | Move in a specific direction | Reach the goal |

Algorithms

We divide the algorithms into three categories.

Value based

  • Before DQN

    • Q-Learning

    • SARSA

  • DQN with variants

  • Deep Recurrent DQN

Policy based

Multi Agent algorithms

  • MADQN

  • TD Gradient Lambda

  • MADDPG [12]

  • MARDPG

  • BicNet [13]

Working Examples

Demos of well-trained agents' play.

Grid World in Starcraft I

For warming up, you can try this problem yourself.

Grid_World

Avoid Observers

The Scourge's goal is to reach the top area of the map while avoiding collisions with the surrounding Observers.

Avoid_observer

Avoid Reavers

The Dropship's goal is to reach the bottom-right area of the map while avoiding collisions with the surrounding Reavers.

Avoid_Reaver

Vultures 1 vs Zealot 1

Battle between one Vulture and one Zealot.

vulture-vs-zealot

Vultures 1 vs Zealot 2

Battle between one Vulture and two Zealots.

vulture-vs-zealot

Plan

  • We will add more challenging scenarios.
  • We will add more multi-agent algorithms.

References

  1. Playing Atari with Deep Reinforcement Learning, Mnih et al., 2013
  2. Human-level control through deep reinforcement learning, Mnih et al., 2015
  3. Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., 2015
  4. Continuous control with deep reinforcement learning, Lillicrap et al., 2015
  5. Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., 2016
  6. Continuous Deep Q-Learning with Model-based Acceleration, Gu et al., 2016
  7. Deep Reinforcement Learning (MLSS lecture notes), Schulman, 2016
  8. Dueling Network Architectures for Deep Reinforcement Learning, Wang et al., 2016
  9. Reinforcement learning: An introduction, Sutton and Barto, 2011
  10. Proximal Policy Optimization Algorithms, Schulman et al., 2017
  11. Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015
  12. Multi-agent actor-critic for mixed cooperative-competitive environments, Lowe, Ryan, et al., 2017
  13. Multiagent Bidirectionally-Coordinated Nets Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games, Peng et al., 2017
  14. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams, 1992

saida_rl's People

Contributors

oglee815, iljooyoon, verystrongjoe


saida_rl's Issues

Typo found

/TeamSAIDA/SAIDA_RL/blob/master/python/core/common/agent.py
At line 86:

raise RuntimeError('Your tried to run your agent but it hasn't been compiled yet. Please call compile() before run().')

Your -> You

avoidObserver tutorial reward function: wrong x-coordinate inequality

When the Scourge dies near the safe zone, the x-coordinate inequality in the reward shaping should not be

896 - 32*MARGINAL_SPACE >= observation.my_unit[0].pos_x and observation.my_unit[0].pos_x <= 1056 + 32*MARGINAL_SPACE

but rather

896 - 32*MARGINAL_SPACE <= observation.my_unit[0].pos_x and observation.my_unit[0].pos_x <= 1056 + 32*MARGINAL_SPACE
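The corrected check can be written as a chained comparison in Python. The `MARGINAL_SPACE` value below is a placeholder for this illustration; the tutorial defines its own constant.

```python
MARGINAL_SPACE = 2  # placeholder value; the tutorial defines its own constant


def in_safe_zone_x(pos_x, marginal_space=MARGINAL_SPACE):
    # Corrected condition: pos_x must lie between the lower and upper pixel
    # bounds of the safe zone (the original used >= on the lower bound,
    # which can never be true together with the upper-bound check for
    # positions inside the zone).
    return 896 - 32 * marginal_space <= pos_x <= 1056 + 32 * marginal_space
```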

numpy error in avoidObserver.py and girdWorld.py

At line 60 of SAIDA_RL/python/core/callbacks.py, printing the variables raises the following error:

TypeError: unsupported format string passed to numpy.ndarray.__format__

The error comes from str.format while averaging the episode's actions, and can be fixed as follows:
'action_mean': np.mean(self.actions[episode])
-> 'action_mean': float(np.mean(self.actions[episode]))
