
SCII_Bots

Several agents able to play StarCraft II are built in this repository!


Initializing

Build a basic development environment

First of all, you need to download and install the game. Then follow the instructions below to build awesome battle bots!

All packages in the requirements file have been tested on Windows 10 with conda 4.9.2 and Python 3.7.3.

Create a new conda environment with

conda create -n SCII_Bots python=3.7.3

Install git for python with

conda install git

Install the requirements with (note that some of the packages may not be needed)

pip install -r requirements.txt

If everything goes well, the following result should appear in PowerShell or the command prompt.

PS: the newest version of the PySC2 package is 3.0.0, while most of the code published online is based on 2.x.x, so much of it no longer runs.

PPS: PySC2 is very different from, and more complex than, python-sc2. In this repository we mainly focus on PySC2.

Learning

introduction to the game environment

State: obtained from env.observation, including the feature screen, feature minimap and player info.

Action: determines what to do and where to go to win the game. There are two types of actions: a set of basic actions (currently 11) and a coordinate position on a 64×64 grid.
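To make the two-part action space concrete, here is a hedged sketch of decoding a flat (action index, point index) pair into a basic-action id and (x, y) coordinates on the 64×64 grid. The helper name `decode_action` is illustrative, not a function from this repository.

```python
# Illustrative sketch: decoding an agent's output into the two action parts
# described above (11 basic actions plus a 64x64 coordinate grid).

N_BASE_ACTIONS = 11  # basic actions mentioned above
MAP_SIZE = 64        # 64x64 coordinate grid

def decode_action(action_index, point_index):
    """Map a flat (action, point) pair to an action id and (x, y) coordinates."""
    assert 0 <= action_index < N_BASE_ACTIONS
    assert 0 <= point_index < MAP_SIZE * MAP_SIZE
    x = point_index % MAP_SIZE   # column on the grid
    y = point_index // MAP_SIZE  # row on the grid
    return action_index, (x, y)

print(decode_action(3, 64 * 10 + 5))  # -> (3, (5, 10))
```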

Reward:

version 1 (needs further adjustment): $$ (\text{score} + \text{total\_value\_units} + \text{total\_value\_structures} + 10 \cdot \text{killed\_value\_units} + 10 \cdot \text{killed\_value\_structures} + \text{collected\_minerals} + \text{collected\_rate\_minerals} + 5 \cdot \text{spent\_minerals} - 8 \cdot \text{idle\_work\_time}) \times 10^{-6} $$
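A hedged sketch of the version-1 reward above. The field names follow the formula, the "10e-6" factor is interpreted here as 1e-6, and `reward_v1` with its example values are illustrative, not taken from the repository.

```python
# Sketch of the version-1 reward; field names follow the README's formula.
# "10e-6" is interpreted as 1e-6 (an assumption).

def reward_v1(score_fields):
    raw = (score_fields["score"]
           + score_fields["total_value_units"]
           + score_fields["total_value_structures"]
           + 10 * score_fields["killed_value_units"]
           + 10 * score_fields["killed_value_structures"]
           + score_fields["collected_minerals"]
           + score_fields["collected_rate_minerals"]
           + 5 * score_fields["spent_minerals"]
           - 8 * score_fields["idle_work_time"])
    return raw * 1e-6

# Illustrative values only:
example = {"score": 5000, "total_value_units": 600, "total_value_structures": 900,
           "killed_value_units": 100, "killed_value_structures": 0,
           "collected_minerals": 1200, "collected_rate_minerals": 300,
           "spent_minerals": 800, "idle_work_time": 50}
print(reward_v1(example))  # raw sum is 12600, scaled to ~0.0126
```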

version 2:

use *spent_minerals* to reward the action; use *killed_value_units + killed_value_structures* to reward the attack point.

In summary, $$ reward = [reward_a, reward_p] $$ where $$ reward_a = \text{spent\_minerals} \times 10^{-2} $$

$$ reward_p = \text{killed\_value\_units} + \text{killed\_value\_structures} $$

In addition, the reward will be adjusted further to simulate the returns from the environment more precisely.

  • If the action is available, actual_action is the action; otherwise an UnboundLocalError is expected and actual_action falls back to actions.FUNCTIONS.no_op.

    if actual_action == action:
        reward_a = reward_a * 10

  • If the episode is won:

    reward = list(np.array(reward) + 10000)

  • If done is True (the two sides' forces are evenly matched):

    reward = list(np.array(reward) - 5000)
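The version-2 reward and the adjustments above can be sketched in one place. `reward_v2` and its arguments are illustrative names; "10e^{-2}" is interpreted as 1e-2, and the ×10, +10000, and -5000 adjustments follow the list above.

```python
import numpy as np

# Sketch of the version-2 reward with the adjustments listed above.
# Function and argument names are illustrative, not the repo's code.

def reward_v2(spent_minerals, killed_value_units, killed_value_structures,
              action_matched, won=False, drawn=False):
    reward_a = spent_minerals * 1e-2                         # rewards the action
    reward_p = killed_value_units + killed_value_structures  # rewards the attack point
    if action_matched:      # actual_action == action
        reward_a *= 10
    reward = np.array([reward_a, reward_p], dtype=float)
    if won:                 # episode won
        reward += 10000
    elif drawn:             # done with evenly matched forces
        reward -= 5000
    return list(reward)

print(reward_v2(100, 200, 50, action_matched=True))  # [10.0, 250.0]
```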

Run the environment test script with

python runner_basic_test.py

I also use a supervised value network to validate that the gradient update works; run it with

python runner_nn_test.py

Train a DQN agent to play the game with

python runner_dqn.py

Train an A2C agent to play the game with (Coming Soon)

python runner_a2c.py

details of neural agents and algorithms

The value neural agent is trained with the DQN algorithm. It takes three types of input tensors: 27-channel screen features, 11-channel minimap features, and 11-channel player-information features.

In addition, two functional models, named the operation model and the warfare model, share several layers of the network as well as the inputs. The two models output the action value and the attack-position value, respectively.
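The shared-trunk, two-head layout can be sketched at the shape level. This is a NumPy stand-in for the PyTorch model described above: the three inputs feed a shared trunk that splits into an operation head (11 action values) and a warfare head (64×64 attack-position values). All layer sizes other than the input and output shapes are assumptions for illustration.

```python
import numpy as np

# Shape-level sketch of the shared-trunk, two-head value network described
# above. The hidden size is an illustrative assumption.

rng = np.random.default_rng(0)
TRUNK = 32  # assumed hidden size

def forward(screen, minimap, player):
    # flatten and concatenate the three input feature tensors
    shared_in = np.concatenate([screen.ravel(), minimap.ravel(), player.ravel()])
    W = rng.standard_normal((TRUNK, shared_in.size)) * 0.01
    h = np.maximum(W @ shared_in, 0.0)                           # shared trunk + ReLU
    action_values = rng.standard_normal((11, TRUNK)) @ h         # operation head
    position_values = rng.standard_normal((64 * 64, TRUNK)) @ h  # warfare head
    return action_values, position_values

a, p = forward(rng.standard_normal((27, 64, 64)),  # screen features
               rng.standard_normal((11, 64, 64)),  # minimap features
               rng.standard_normal(11))            # player info
print(a.shape, p.shape)  # (11,) (4096,)
```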

The DQN algorithm is expressed as follows.
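As a generic reference for the update rule, the standard DQN temporal-difference target can be sketched as follows. This is the textbook form of the algorithm, not this repository's exact code.

```python
import numpy as np

# Standard DQN temporal-difference target (textbook form):
#   target = r + gamma * max_a' Q(s', a'), with no bootstrap on terminal states.

def td_target(reward, next_q_values, gamma=0.99, done=False):
    bootstrap = 0.0 if done else gamma * float(np.max(next_q_values))
    return reward + bootstrap

q_next = np.array([1.0, 3.0, 2.0])
print(td_target(10.0, q_next, gamma=0.9))             # 10 + 0.9 * 3
print(td_target(10.0, q_next, gamma=0.9, done=True))  # 10.0
```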

The structure of the policy neural agent is built with PyTorch 1.2.0. An A2C algorithm will be used to search for a better policy distribution (Coming Soon).

evaluating

A value neural agent is built to learn high-value actions with the DQN algorithm; the current state of convergence is as follows:

  • Most importantly, the agent has learned a viable action sequence, e.g. how to select an army and attack (select_scv --> build_supply_depot --> build_barracks --> train marines multiple times --> select_all_troops --> reach the attack_point and attack);
  • it can train new battle units when the army is losing, building around 10 marines for each attack wave;
  • it can attack a specific position.

Several replays are saved here. Best score in one episode: 8339872.5. Average learning loss after 1068 batch_pools.

And finally,

En Taro Adun !!!

En Taro Tassadar !!!

En Taro Zeratul !!!

En Taro Artanis !!!

TODO List:

  1. Standardize the game rules in the environment;
  2. Use keyframes rather than all frames as inputs;
  3. Use a recurrent block to extract temporal features;
  4. Optimize the attack position to select from the positions of known enemies, rather than all positions on the minimap;
  5. Add a residual block to strengthen image-feature extraction;
  6. Add an attention module to refine the functions in each module.

references

https://github.com/deepmind/pysc2

https://github.com/skjb/pysc2-tutorial

https://github.com/Dentosal/python-sc2

https://github.com/ClausewitzCPU0/SC2AI

Dentosal/python-sc2 Wiki (GitHub)

