
tleague's Introduction

TLeague

A Framework for Competitive Self-Play Based Multi-Agent Reinforcement Learning.

Install

cd to the folder and run the command:

pip install -e .

Quick Example

See the docs here for examples that can run on a single machine.

Large-Scale Run

We recommend using k8s (Kubernetes) to manage large-scale runs. See the docs here (TODO) for more information.

Coding Style

Use the Google Python coding style, except:

  • Indent code blocks with 2 spaces (see the snippet below)
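For illustration, a function written in this style (an invented snippet, not taken from the codebase) looks like:

def count_living_players(population):
  """Return the number of players whose is_living flag is set (2-space indent)."""
  living = 0
  for player in population.values():
    if player.is_living:
      living += 1
  return living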

Disclaimer

This is not an officially supported Tencent product. The code and data in this repository are for research purposes only. No representation or warranty whatsoever, expressed or implied, is made as to their accuracy, reliability or completeness. We assume no liability and are not responsible for any misuse or damage caused by the code and data. Your use of the code and data is subject to applicable laws, and your use of them is at your own risk.

tleague's People

Contributors

jchxiong


tleague's Issues

Fatal Python error: Aborted

Hi, when I run bash example_pommerman_pfsp_ppo.sh learner, I get the following error:

2020-12-01 05:10:43.476860: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-12-01 05:10:43.497785: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-12-01 05:10:43.498066: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x60694d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-01 05:10:43.498082: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-12-01 05:10:43.499164: F tensorflow/core/common_runtime/device.cc:28] Check failed: DeviceNameUtils::ParseFullName(name(), &parsed_name_) Invalid device name: /job:localhost/replica:0/task:0/device:XLA_GPU:-1
Fatal Python error: Aborted

Thread 0x00007fc9a653f700 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/zmq/sugar/socket.py", line 683 in recv_string
  File "/home/TLeague/tleague/learners/base_learner.py", line 86 in _message_worker
  File "/usr/lib/python3.6/threading.py", line 864 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Current thread 0x00007fca0bc53740 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 699 in __init__
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1585 in __init__
  File "/home/TLeague/tleague/learners/pg_learner.py", line 75 in __init__
  File "/home/TLeague/tleague/learners/ppo_learner3.py", line 35 in __init__
  File "/home/TLeague/tleague/bin/run_pg_learner.py", line 123 in main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250 in _run_main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299 in run
  File "/home/TLeague/tleague/bin/run_pg_learner.py", line 128 in <module>
  File "/usr/lib/python3.6/runpy.py", line 85 in _run_code
  File "/usr/lib/python3.6/runpy.py", line 193 in _run_module_as_main
example_pommerman_pfsp_ppo.sh: line 127:   450 Aborted                 (core dumped) python3 -m tleague.bin.run_pg_learner --learner_spec=-1:30003:30004 --model_pool_addrs=localhost:10003:10004 --league_mgr_addr=localhost:20005 --learner_id=lrngrp0 --unroll_length=32 --rollout_length=8 --batch_size=32 --rm_size=64 --pub_interval=100 --log_interval=100 --total_timesteps=2000000 --burn_in_timesteps=12 --env="${env}" --policy="${policy}" --policy_config="${policy_config}" --batch_worker_num=1 --rwd_shape --learner_config="${learner_config}" --type=PPO

AssertionError with PBTPSROGameMgr

set game_mgr_type=tleague.game_mgr.game_mgrs.PBTPSROGameMgr

During the 'request_learner_task' process,

parent_model_key, is_mutate = self.game_mgr.get_player(cur_model_key)

a new model_key is obtained from the game_mgr based on cur_model_key. However, when the diversity of the current player is less than the threshold, the game_mgr sets self._population[current_player].is_living = False to kill the current player and randomly selects another player.
self._population[current_player].is_living = False

After some debugging, I found that the population size was always one, as the new model key is either None or equal to cur_model_key and therefore cannot be added to the population. As a result, an error occurs when selecting another player.
assert len(living_players) > 0
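For reference, a minimal standalone sketch of the failure mode described above (hypothetical Player class and names, not the actual game_mgrs.py code):

class Player:
  def __init__(self, model_key):
    self.model_key = model_key
    self.is_living = True

# The population never grows beyond one player, because the new model key
# is either None or equal to cur_model_key.
population = {'rand_model:0001': Player('rand_model:0001')}

# When the diversity check fails, the only player is killed ...
population['rand_model:0001'].is_living = False

# ... so no living player is left to select, and the assert fires,
# reproducing the AssertionError in the debug log below.
living_players = [p for p in population.values() if p.is_living]
assert len(living_players) > 0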

Below is my debug log:

_on_request_learner_task:  done adding player rand_model:0001, parent_player rand_model:0001, to game mgr
_on_request_learner_task: learner_id:'lrngrp0'
Traceback (most recent call last):
  File "/Users/huyueyue/opt/anaconda3/envs/rl/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/huyueyue/opt/anaconda3/envs/rl/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/huyueyue/Work/Code/TLeague/tleague/bin/run_league_mgr.py", line 107, in <module>
    app.run(main)
  File "/Users/huyueyue/opt/anaconda3/envs/rl/lib/python3.6/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/Users/huyueyue/opt/anaconda3/envs/rl/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/Users/huyueyue/Work/Code/TLeague/tleague/bin/run_league_mgr.py", line 103, in main
    league_mgr.run()
  File "/Users/huyueyue/Work/Code/TLeague/tleague/league_mgrs/base_league_mgr.py", line 72, in run
    learner_task = self._on_request_learner_task(learner_id)
  File "/Users/huyueyue/Work/Code/TLeague/tleague/league_mgrs/league_mgr.py", line 125, in _on_request_learner_task
    parent_model_key, is_mutate = self.game_mgr.get_player(cur_model_key)
  File "/Users/huyueyue/Work/Code/TLeague/tleague/game_mgr/game_mgrs.py", line 404, in get_player
    assert len(living_players) > 0
AssertionError

There seems to be a problem with the code; please help check it.
Thanks.

Does the Actor run the model locally?

I see that the Actor gets a task, pulls the model from the model pool, and runs it locally. Doesn't that mean every Actor has to run its own TF session, which seems to consume a lot of resources? I also couldn't find where the inf server is used; is the inf server only used on the training (learner) side?

IndexError: tuple index out of range

When I was running the example for Running SC2 with SelfPlay + PPO2, the following error occurred when I started the learner process. Why does this problem occur?

(python-3.7.10) sh-4.4$bash example_sc2_sp_ppo2.sh learner
Running as learner
Empty config string, returning empty dict
Empty config string, returning empty dict
pygame 2.1.2 (SDL 2.0.16, Python 3.7.10)
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "/home/anaconda3/envs/python-3.7.10/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anaconda3/envs/python-3.7.10/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/work/tff_files/alphastar/TLeague-dev-open/tleague/bin/run_pg_learner.py", line 138, in <module>
    app.run(main)
  File "/home/anaconda3/envs/python-3.7.10/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/anaconda3/envs/python-3.7.10/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/work/alphastar/TLeague-dev-open/tleague/bin/run_pg_learner.py", line 93, in main
    ob_space, ac_space = env_space(FLAGS.env, env_config, interface_config)
  File "/home/work/alphastar/TLeague-dev-open/tleague/envs/create_envs.py", line 75, in env_space
    return sc2_env_space(arena_id, env_config, inter_config)
  File "/home/work/alphastar/TLeague-dev-open/tleague/envs/sc2/create_sc2_envs.py", line 493, in sc2_env_space
    **inter_config
  File "/home/work/alphastar/TLeague-dev-open/tleague/envs/sc2/create_sc2_envs.py", line 336, in make_sc2full_v8_interface
    **kwargs)
  File "/home/work/alphastar/Arena-dev-open/arena/interfaces/sc2full_formal/obs_int.py", line 36, in __init__
    crop_to_playable_area=crop_to_playable_area)
  File "/home/work/alphastar/TImitate-dev-open/timitate/lib6/pb2feature_converter.py", line 1096, in __init__
    distinguish_effect_camp, lurker_effect_decay))
IndexError: tuple index out of range

Sampling from Experience Replay during Imitation Learning

Hello,

During Imitation Learning, do you process the trajectories from the same replay in a sequential manner or do you randomly sample a trajectory from a random starting point in the replay? In the latter case, how do you keep track of the LSTM state so that the memory is maintained throughout the game?

Thank you!

No module named 'pysc2.lib.typeenums'

The error comes up while running

"bash example_sc2_sp_ppo2.sh learner"
mentioned in
"docs/EXAMPLE_SC2_SP_PPO2.md"

The traceback info is:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/tleague/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/miniconda3/envs/tleague/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/xx/Documents/code/TLeague/tleague/bin/run_pg_learner.py", line 138, in <module>
    app.run(main)
  File "/opt/miniconda3/envs/tleague/lib/python3.6/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/miniconda3/envs/tleague/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/Users/xx/Documents/code/TLeague/tleague/bin/run_pg_learner.py", line 93, in main
    ob_space, ac_space = env_space(FLAGS.env, env_config, interface_config)
  File "/Users/xx/Documents/code/TLeague/tleague/envs/create_envs.py", line 74, in env_space
    from tleague.envs.sc2 import sc2_env_space
  File "/Users/xx/Documents/code/TLeague/tleague/envs/sc2/__init__.py", line 1, in <module>
    from .create_sc2_envs import *
  File "/Users/xx/Documents/code/TLeague/tleague/envs/sc2/create_sc2_envs.py", line 11, in <module>
    from arena.wrappers.sc2_wrapper import VecRwd
  File "/Users/xx/Documents/code/Arena/arena/wrappers/sc2_wrapper.py", line 2, in <module>
    from arena.utils.constant import AllianceType
  File "/Users/xx/Documents/code/Arena/arena/utils/constant.py", line 2, in <module>
    from pysc2.lib.typeenums import UNIT_TYPEID
ModuleNotFoundError: No module named 'pysc2.lib.typeenums'

related libs are:
TImitate 1.3 /Users/xx/Documents/code/TImitate
TLeague 1.3 /Users/xx/Documents/code/TLeague
TPolicies 1.3 /Users/xx/Documents/code/TPolicies

python: 3.6.13
MacOS

If you need any more info, please let me know.
I have also tried different versions of pysc2, from v1.0 to v3.0, and this error always occurs.

Thank you~

Why don't you remove the instance when you skip a frame (step)?

delta = (mb_rewards[t] + (self._gamma ** (mb_skips[t]+1))

For the skipped data, gamma is applied multiple times here via self._gamma ** (mb_skips[t]+1), which confuses me. Why not simply drop this data? 🤔 Is it meant to account for the step that is skipped over before next_value? Have you compared the results against simply dropping the data?

My understanding is that a skip happens at moments like when a hero is stunned and the actor has no control; there is no operation to take at that point other than an empty action, so training on such data does not seem very meaningful.
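For context, a skip-aware TD error of this form is commonly computed as below. This is a generic illustration with assumed variable names (mb_values, next_value), not a verbatim copy of TLeague's learner code:

import numpy as np

def skip_aware_deltas(mb_rewards, mb_values, mb_skips, next_value, gamma):
  """Generic TD errors when an action is held for (skip + 1) environment steps."""
  T = len(mb_rewards)
  deltas = np.zeros(T, dtype=np.float32)
  for t in range(T):
    # The next value estimate lies (skip + 1) steps ahead, so it is
    # discounted by gamma ** (skip + 1) rather than by a single gamma.
    next_v = mb_values[t + 1] if t + 1 < T else next_value
    deltas[t] = mb_rewards[t] + (gamma ** (mb_skips[t] + 1)) * next_v - mb_values[t]
  return deltas

Under this reading, the exponent simply counts the environment transitions between the two value estimates; whether discarding the skipped samples instead makes a difference in practice is exactly the question raised above.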

No module named "tensorflow"

Hi team,

Per the title, there is a missing reference to the "tensorflow" installation in your setup.py and/or README. I was trying out your Pommerman example and, following the instructions, found that the learner requires tensorflow. What's more, it seems that you require tensorflow version 1, which isn't available through pip anymore. Please provide instructions on how to test your code.

In case you are thinking about upgrading your codebase to tensorflow v2, there has been some progress in helping users do so semi-automatically. Please see https://www.tensorflow.org/guide/upgrade .
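For example, one path that upgrade guide describes is running TF1-style graph code on a TF2 installation through the v1 compatibility module; a minimal sketch (not verified against TLeague's code):

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # restore graph-mode / Session semantics

with tf.Session() as sess:
  x = tf.placeholder(tf.float32, shape=[None, 4], name='x')
  y = tf.reduce_sum(x, axis=1)
  print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}))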

How about the efficiency of TLeague

First, I appreciate your contribution to the community.

But I'm wondering about the efficiency of these examples on a single machine. I have run this system on my personal server for a long time, but I cannot see the agent (in the Atari and Pong games) becoming stronger than before.

I have read the run shell script, and some parameters (batch size, unroll length, and batch_worker_num) are very small. So I wonder whether better parameters can be provided for a more impressive result with a shorter training time.

Best regards!
