google-deepmind / open_spiel Goto Github PK

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

License: Apache License 2.0

Shell 0.54% CMake 0.88% Python 41.17% C++ 54.83% Jupyter Notebook 1.92% Julia 0.19% Go 0.20% C 0.07% Rust 0.21%

games reinforcement-learning multiagent cpp python

open_spiel's Introduction

OpenSpiel: A Framework for Reinforcement Learning in Games

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially- and fully- observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. Games are represented as procedural extensive-form games, with some natural extensions. The core API and games are implemented in C++ and exposed to Python. Algorithms and tools are written both in C++ and Python.

To try OpenSpiel in Google Colaboratory, please refer to open_spiel/colabs subdirectory or start here.

Index

Please choose among the following options:

For a longer introduction to the core concepts, formalisms, and terminology, including an overview of the algorithms and some results, please see OpenSpiel: A Framework for Reinforcement Learning in Games.

For an overview of OpenSpiel and example uses of the core API, please check out our tutorials:

Motivation, Core API, Brief Intro to Replictor Dynamics and Imperfect Information Games by Marc Lanctot. (slides) (colab)
Motivation, Core API, Implementing CFR and REINFORCE on Kuhn poker, Leduc poker, and Goofspiel by Edward Lockhart. (slides) (colab)

If you use OpenSpiel in your research, please cite the paper using the following BibTeX:

@article{LanctotEtAl2019OpenSpiel,
  title     = {{OpenSpiel}: A Framework for Reinforcement Learning in Games},
  author    = {Marc Lanctot and Edward Lockhart and Jean-Baptiste Lespiau and
               Vinicius Zambaldi and Satyaki Upadhyay and Julien P\'{e}rolat and
               Sriram Srinivasan and Finbarr Timbers and Karl Tuyls and
               Shayegan Omidshafiei and Daniel Hennes and Dustin Morrill and
               Paul Muller and Timo Ewalds and Ryan Faulkner and J\'{a}nos Kram\'{a}r
               and Bart De Vylder and Brennan Saeta and James Bradbury and David Ding
               and Sebastian Borgeaud and Matthew Lai and Julian Schrittwieser and
               Thomas Anthony and Edward Hughes and Ivo Danihelka and Jonah Ryan-Davis},
  year      = {2019},
  eprint    = {1908.09453},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  journal   = {CoRR},
  volume    = {abs/1908.09453},
  url       = {http://arxiv.org/abs/1908.09453},
}

Versioning

We use Semantic Versioning.

open_spiel's People

Contributors

Stargazers

Watchers

Forkers

b2220333 shaneisley endpress tzq2doc nlgrf kiransterling lyh0208 sharathraparthy stjordanis johnathana shyamalschandra ralami1859 alleboudy flamato donsetpg tokarev-tt-33 allensmile hyzcn raihan-seraj wanghuimu splendor-kill sampsonguo hdony wangzy btbujiangjun arnoldzhou bitice shashisingh pierceming yanwen0614 xuexi1522234101 shaobin awesome-archive colorfulclouds treseizero gonion ggpanda ian09 wangyy161 dailyactie flybeyond kornbergfresnel redyjq shamcondor zhujiang0086 plzo ujwal2910 hayfred bradhowes zenoengine snazz2001 yinjiangjin wqzsscc dmorrill10 guodongzhu ytsafe hello-yaowq jamesliu brettkoonce vivienzou1 comsuite jafei amatc saeta tianyunkeml smwade paperflight tianskyluj b-xiang liuwenhaha penglonghu gerrysonx jinc0418 jw447 batermj nononowow leiluo2020 kailianghu logan-lu shadowkun qq1588518 adrianhust moving zxkyjimmy xuejinqi ice-amber wangxiqiu minglingge juliendehos dmitriial gitchenze volodymyrpavliukevych jack51706 rajeevbk zsnake1209 amir22010 jiaodaxiaozi gms2009 virus19887719 fuxianh

open_spiel's Issues

Backgammon notation issue

First of all a huge thank you for making this framework available, I think this is going to be a fantastic resource for anyone wanting to learn modern best practices for reinforcement learning.

I'm just getting my head around what this framework can do, but have noticed an issue with the notation within the Backgammon implementation - this doesn't follow the normal standards so may cause confusion. Currently it shows the starting point and the number of spaces moved, as opposed to the starting point / ending point. For example from the start state, the following is displayed after a roll of 6-1:

chose action: 621 (23-6 23-1)

This should actually be:
chose action: 621 (23-17 23-22)

not sure if this has been done deliberately, but I'm happy to fix this if not - just wanted to check first.

I'd also be keen to implement the pubeval code as an agent / bot - this is a reference backgammon agent created by the mighty Gerard Tesauro which is useful for being able to test the performance of a backgammon bot.

Finally, would be keen to get involved with the Windows port - is there a vision for the toolset that this will use?

Question about games wrapper for Ludii

I saw that a games wrapper for the Ludii General Game System was something that you were looking to add in contributing.md. It looks like Ludii is written in Java. Was the idea to be able to call the Ludii jar from C++ in order to interact with it?

[MacOS] Unable to Building and running tests____make -j$(nproc)

environment:
system: mac
compiler: gcc

I have seccessfully build the test until "make -j$(nproc)", and have change the command line to "make -j$(sysctl -n hw.ncpu)",since mac not identify nproc.
and the this happen:

Scanning dependencies of target open_spiel_core
[ 1%] Building CXX object abseil-cpp/absl/base/CMakeFiles/absl_spinlock_wait.dir/internal/spinlock_wait.cc.o
[ 1%] Building CXX object abseil-cpp/absl/base/CMakeFiles/absl_dynamic_annotations.dir/dynamic_annotations.cc.o
[ 2%] Building CXX object abseil-cpp/absl/base/CMakeFiles/absl_log_severity.dir/log_severity.cc.o
[ 2%] Building CXX object CMakeFiles/open_spiel_core.dir/game_parameters.cc.o
clangclang: clangclang: : : error: argument unused during compilation: '-undefined dynamic_lookup' [-Werror,-Wunused-command-line-argument]errorerror: :
argument unused during compilation: '-undefined dynamic_lookup' [-Werror,-Wunused-command-line-argument]argument unused during compilation: '-undefined dynamic_lookup' [-Werror,-Wunused-command-line-argument]

error: argument unused during compilation: '-undefined dynamic_lookup' [-Werror,-Wunused-command-line-argument]
make[2]: *** [CMakeFiles/open_spiel_core.dir/game_parameters.cc.o] Error 1
make[1]: make[2]: *** [CMakeFiles/open_spiel_core.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
*** [abseil-cpp/absl/base/CMakeFiles/absl_log_severity.dir/log_severity.cc.o] Error 1
make[2]: *** [abseil-cpp/absl/base/CMakeFiles/absl_spinlock_wait.dir/internal/spinlock_wait.cc.o] Error 1
make[2]: *** [abseil-cpp/absl/base/CMakeFiles/absl_dynamic_annotations.dir/dynamic_annotations.cc.o] Error 1
make[1]: *** [abseil-cpp/absl/base/CMakeFiles/absl_log_severity.dir/all] Error 2
make[1]: *** [abseil-cpp/absl/base/CMakeFiles/absl_spinlock_wait.dir/all] Error 2
make[1]: *** [abseil-cpp/absl/base/CMakeFiles/absl_dynamic_annotations.dir/all] Error 2
make: *** [all] Error 2

when I use python to import pyspiel, this happen:
ModuleNotFoundError: No module named 'pyspiel'

please help!

Unable to build and run tests - CMAKE_HAVE_LIBC_PTHREAD

environment:
system: Ubuntu 18.04.3 LST
cmake: 3.15.2
compiler: gcc (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026

I am unable to build and run tests (the following was run within the activated virtual environment, as specified in the installation instructions):

~/github/open_spiel/build$ CXX=g++ cmake -DPython_TARGET_VERSION=3.6 -DCMAKE_CXX_COMPILER=${CXX} ../open_spiel
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 6.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/local/cuda-9.0/bin/g++
-- Check for working CXX compiler: /usr/local/cuda-9.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found Python3: /usr/lib/x86_64-linux-gnu/libpython3.6m.so (found version "3.6.8") found components: Development
-- Configuring done
You have changed variables that require your cache to be deleted.
Configure will be re-run and you may have to reset some variables.
The following variables have changed:
CMAKE_CXX_COMPILER= /usr/local/cuda-9.0/bin/g++

-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 6.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/local/cuda-9.0/bin/g++
-- Check for working CXX compiler: /usr/local/cuda-9.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
CMake Error at python/CMakeLists.txt:1 (if):
if given arguments:

"STREQUAL" ""

Unknown arguments specified

-- Configuring incomplete, errors occurred!
See also "/home/skif/github/open_spiel/build/CMakeFiles/CMakeOutput.log".
See also "/home/skif/github/open_spiel/build/CMakeFiles/CMakeError.log".
You have changed variables that require your cache to be deleted.
Configure will be re-run and you may have to reset some variables.
The following variables have changed:
CMAKE_CXX_COMPILER= /usr/local/cuda-9.0/bin/g++

CMAKE_HAVE_LIBC_PTHREAD is causing an issue - I am not entirely sure how to resolve it.

Any suggestions would be most welcome. Apologies in advance if it's a trivial issue

Unable to find the "example" program in the In the "examples" directory

I have successful run the following steps on MacOS:

./install.sh

virtualenv -p python3 venv
source venv/bin/activate
pip3 install -r requirements.txt
3) ./open_spiel/scripts/build_and_run_tests.sh

But when I try to run my first example, like:
examples/example --game=tic_tac_toe

I can not find the "example" program in the "examples" directory:

Anyone knows what happens here? Thanks!

make freezes on Google Compute instance (Ubuntu 19.04) at 100%

So it just freezes forever at the final make step (install.sh, venv, etc. run without a glitch):
[ 97%] Built target matrix_games_test
[ 97%] Built target turn_based_simultaneous_game_test
[ 98%] Built target misere_test
[ 99%] Built target spiel_test
[100%] Building CXX object python/CMakeFiles/pyspiel.dir/pybind11/pyspiel.cc.o

Question: information state targetting

I'm trying to rewrite OOS (http://mlanctot.info/files/papers/aamas15-iioos.pdf) and specifically information state targeting.

Part of it is to write a function which takes two ActionObservation sequences and returns if they are compatible (it's possible to get from the current infoset to the target infoset).

In OpenSpiel, neither the InformationState / InformationStateAsNormalizedVector provide "incremental" information: it's a domain-dependent representation of the whole AO sequence.
Observation or ObservationAsNormalizedVector do these "incremental" observations: however how is the observation of "no observation" encoded in phantom games? Empty vector?

Also, do you think it would make sense to store the ActionObservation for each player in State (similarly to how history_ is stored)? It is not possible to do a currentState->parentState lookup, so to retrieve the whole AO for arbitrary state you'd have to make targetted tree traversal from the root to this state.

I'm asking this specifically in the context of OOS, where you will be given a target infoset (or State, in the Bot API), but you might not know what it's ActionObservations are (if you didn't visit this infoset before and cached it). The result is that you cannot target this infoset efficiently.

make process error...

When I use "make -j$(nproc)" to make, some errors display as the follows:

In file included from /Users/user/documents/deeplearning/open_spiel/open_spiel/abseil-cpp/absl/random/internal/randen_slow.cc:21:
/Users/user/documents/deeplearning/open_spiel/open_spiel/abseil-cpp/absl/random/internal/platform.h:165:48: error:
'TARGET_OS_IPHONE_SIMULATOR' is not defined, evaluates to 0 [-Werror,-Wundef]
#if defined(APPLE) && (TARGET_OS_IPHONE || TARGET_OS_IPHONE_SIMULATOR)

[ 8%] Built target absl_leak_check_disable
In file included from /Users/user/documents/deeplearning/open_spiel/open_spiel/abseil-cpp/absl/random/internal/randen_hwaes.cc:26:
/Users/user/documents/deeplearning/open_spiel/open_spiel/abseil-cpp/absl/random/internal/platform.h:165:48: error:
'TARGET_OS_IPHONE_SIMULATOR' is not defined, evaluates to 0 [-Werror,-Wundef]
#if defined(APPLE) && (TARGET_OS_IPHONE || TARGET_OS_IPHONE_SIMULATOR)
...

Please give some help. Thanks a lot.

Issues with Hanabi rewards and self-play.

Hello,
I was checking policy gradient algorithm on HLE.
Firstly, I think there is something wrong with how Hanabi handles rewards (or I am missing something). I ran this simple code (similar to PG for poker example) and looked how agents save rewards. I made discount = 0, so agent._dataset['returns'] is a list of immediate rewards.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from open_spiel.python import policy
from open_spiel.python import rl_environment
from open_spiel.python.algorithms import policy_gradient

game = "hanabi"
num_players = 2
discount =0
env_configs = {"players": num_players, 'max_life_tokens' : 1, 'colors' : 2, 
               'ranks' : 5, 'hand_size' : 2, 'max_information_tokens' : 3, 'discount' : discount}
env = rl_environment.Environment(game, **env_configs)
info_state_size = env.observation_spec()["info_state"][0]
num_actions = env.action_spec()["num_actions"]

with tf.Session() as sess:
    agents = [policy_gradient.PolicyGradient(sess, idx, info_state_size, num_actions, 
                                             hidden_layers_sizes=(128,)) for idx in range(num_players)]

    
    sess.run(tf.global_variables_initializer())
    for ep in range(1):
        time_step = env.reset()
        while not time_step.last():
            player_id = time_step.observations["current_player"]
            print('Player %d' % player_id)
            fireworks_before = env._state.observation()[30:47]
                
            agent_output = agents[player_id].step(time_step)
            action_list = [agent_output.action]
            time_step = env.step(action_list)
            
            if not time_step.last():
                fireworks_after = env._state.observation()[30:47]
            else:
                fireworks_after = 'lost'
                

            print(fireworks_before, '-->', fireworks_after, '\n')
        for agent in agents:
            agent.step(time_step)
print('\n')
print('Agent 0 rewards history:')
print(agents[0]._dataset['returns'])
print('Agent 1 rewards history:')
print(agents[1]._dataset['returns'])

This is an example of weird output.

P0 get 0pt, then P1 gets 1pt and then P0 loses. As I understand, P0 should have rewards [1, -1] which is correct. However P1 should have [0] since he scored a point and right after P0 lost it.

Player0
Fireworks: R0 Y0 --> Fireworks: R0 Y0

Player 1
Fireworks: R0 Y0 --> Fireworks: R0 Y1

Player 0
Fireworks: R0 Y1 --> lost

Agent 0 rewards history:
[1.0, -1.0]
Agent 1 rewards history:
[-1.0]

Second question is concerning self-play. I would like to run PG agent in a self-play there it plays with itself (not similar agent.) All examples I found in open_spiel have several copies of agents playing with each other and learning separately. As I understand, using one agent to play with itself in current implementation will not work since episode experience of different players will mix up.

Hanabi

It would be great if we could use this repo with Hanabi. Probably adding some kind of wrapper would be enough.

terminal state detection connect_four not working correctly

It seems that the terminal state detection of connect_four does not detect all winning states.
See the following example for a minimal reproduction:

import pyspiel
game = pyspiel.load_game("connect_four")
state = game.new_initial_state()
state.apply_action(3)
state.apply_action(2)
state.apply_action(3)
state.apply_action(2)
state.apply_action(3)
state.apply_action(2)
state.apply_action(3)
print(state)
print(state.is_terminal())

results in:

.......
.......
...x...
..ox...
..ox...
..ox...

False

This should be a winning state for x and thus print True, right? Or do I just understand the functionalities of state and is_terminal() incorrectly?

RuntimeError: Unknown parameter players

Hello,

I tried to use example.py to play a tic_tac_toe game using command python3 example.py --players=1. Here's the error I ran into:
Traceback (most recent call last): File "example.py", line 111, in <module> app.run(main) File "/open_spiel/venv/lib/python3.7/site-packages/absl/app.py", line 299, in run _run_main(main, args) File "/open_spiel/venv/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main sys.exit(main(argv)) File "example.py", line 46, in main {"players": pyspiel.GameParameter(FLAGS,players)}) RuntimeError: Unknown parameter players
The example.py works well if I do not pass any parameter to it. Could anyone tells me how to solve this problem? Thanks!

Limiting number of actions in the domains

As pointed out in #51 domains can factorize their actions if there's too many of them per node.

Would you like to impose a limit on that, for example having at most 256 actions per node? The reason why this would practically useful is because pretty much all the algorithms work with some notion of saving a value per action.

Backgammon doubling cube and matches

I'd like to take a look at implementation of doubling cube and match logic for Backgammon.

The doubling cube is critical to backgammon strategy, and previous Backgammon value networks such as TDGammon have not implemented this within the neural network - it's been traditionally bolted on via a separate doubling algorithm. I think this has the potential to be an area where this framework could offer significant improvements over previous works.

The way I would see it working would be:

Players with an available double would have a move choice of offering the double or rolling the dice.
If double is offered, opponent would have the move choices of accept the double or resign.
If the double is accepted the doubling dice would then be owned by the opposing player.

For this to work, the backgammon State will need to include:

Match Player 0 score
Match Player 1 score
Match SetPoints (the number of points a player needs to get to)
Match isCrawford (is the Crawford rule applicable?)
Doubling cube value
Doubling cube owner (0/1 for players, -1 for no owner)

Would also need to have a way of passing the following options to backgammon.cc :

UseDoublingCube
MatchSetPoints (1 for single game)
UseCrawfordRule

Would be interested in views on this - probably won't be looking at this for a few weeks (assuming you're happy to support this), but keen to get initial thoughts down.

Questions about different CFR variants

Thanks a lot for open sourcing the repository. It's quite helpful.

I see there are many variants of CFR algorithm implemented. I would like to ask if there are someone already working on the VR-MCCFR and if the discounted regret minimisation algorithm also on the list to be done.

If I want to try this two tasks, are there some suggestions to make the code more suitable for the repository?

Support for Games with Continuous Action-Spaces

Thanks for open sourcing open_spiel. I love the aim for completeness in this space!

One thing that I noticed is missing is games with continuous action-spaces and related algorithms (such as Deterministic Policy Gradient). Are there any plans to add support for those?

I'm especially interested in MARL and equilibrium learning in auction games.
As an example, I looked at the included implementation of the discretized first-price sealed-bid auction with integer valuations and actions. On the one hand, continuous-actions are necessary when studying emergent MARL-behavior: While a discrete implementation is pretty straight-forward for symmetric-uniform valuation-distributions, it become a lot less meaningful in settings with asymmetric or non-uniform (e.g. Gaussian) priors, where equilibria are nonlinear.
On the other hand, more involved auction games, that are being studied in economics, simply become all but intractable in a discrete implementation: Take, e.g. combinatorial FPSB auctions, where bundles of multiple items are sold to bidders at the same time. Even in a continuous-action implementation, the representation size of the action space already grows exponentially in the number of items; but in a discrete implementation it grows double-exponentially.

I'm sure there's many other use cases, e.g. in optimal control.

Public states support

I've looked over the code and the arxiv paper but I didn't see a mention of public states. What is the status on these? Are you planning to support them in the library? If yes, are you considering also introducing "shared" states between players? And how about the FOG formalism?

We are developing a game theoretic library at CTU as well. It seems like wasteful redundant effort to make two libraries, however for the algorithms we develop public trees / infoset trees are needed.

I'm very happy to open a discussion on this topic! :) It would save us tremendous engineering efforts if we do not have to duplicate code like this.

Preferred priority for PyTorch example contributions?

I read contributing.md and saw that one of them was adding examples and supports for PyTorch. I wanted to contribute, but was wondering if you had particular algorithms or examples in mind to start with.

If there is no particular preference, I would like to start with DQN, imitating the style of the existing TensorFlow example in open_spiel/python/algorithms/dqn.py.

Thank you!

Install.sh does not work in zsh

Need to include #!/bin/bash in the first line.

Backgammon PubEval bot

Looking for some guidance - keen to implement the backgammon PubEval bot for benchmarking backgammon game performance. The original code is in C++ and would rather implement in this if possible, but all the existing bots are written in Python - is it possible to write a C++ bot, or is there a smarter way of doing this?

I can convert the code to Python if necessary, but just want to know if there are other options first.

CMake error running build_and_run_tests.sh

I am following the instructions from here and I am getting an error in step 3 when running
./open_spiel/scripts/build_and_run_tests.sh
The output is

++ CXX=g++
++ [[ linux-gnu == \d\a\r\w\i\n* ]]
+++ nproc
++ MAKE_NUM_PROCS=8
+++ nproc
++ let 'TEST_NUM_PROCS=4*8'
+++ python3 -c 'import sys; print(sys.version.split(" ")[0])'
++ PYVERSION=3.5.2
+++ python3 -c 'import sys; print(sys.version_info.major)'
++ PY_VERSION_MAJOR=3
++ BUILD_DIR=build_python_3
++ mkdir -p build_python_3
++ cd build_python_3
++ echo 'Building and testing in /home/alex/open_spiel/build_python_3 using '\''python'\'' (version 3.5.2).'
Building and testing in /home/alex/open_spiel/build_python_3 using 'python' (version 3.5.2).
++ cmake -DPython_TARGET_VERSION=3.5.2 -DCMAKE_CXX_COMPILER=g++ ../open_spiel
CMake Error at CMakeLists.txt:45 (add_subdirectory):
  add_subdirectory given source "abseil-cpp" which is not an existing
  directory.


-- Configuring incomplete, errors occurred!
See also "/home/alex/open_spiel/build_python_3/CMakeFiles/CMakeOutput.log".

I am running this is Ubuntu 16.04 and using the virtualenv specified in the previous steps

Implementation of discounted cfr and linear cfr

I try to implement discounted cfr by writing a subclass of _CFRSolver class. I overwrite the _compute_counterfactual_regret_for_player function. Since linear CFR is just a special case of discounted cfr, so it is also implemented. I am not sure if I implement it in a nice way.

I test it by running on goofspiel4, it gets decent results. Will the implementation be helpful? Should I pull it?

Guaranteed order of actions within an infoset

I just wanted to make sure I understand some things:

Action is an int64_t, but it is not guaranteed to have them indexed from i=0...n-1, i.e. State.LegalActions.at(i) == i does not always hold. (Btw this could be mentioned in comment for using Action = int64_t; in spiel_utils.h)

However, does it hold that x.LegalActions.at(i) == y.LegalActions.at(i) for all states x,y within the same infoset I and all action indices i? Or in other words, are actions of histories within infoset consistent?

If yes, is there a test for that which checks it for all games (maybe up to certain depth of the tree)?

Thanks for great work!

Suggestion: Player instead of int

For similar reasons why there is typedef int64_t Action I'd like to suggest to use typedef int Player as well.

Cool work. Do you plan to support large games, such as Texas Hold'em?

FYI

Segmentation fault for `python_playthrough_test`

Output of the test:

1/93 Test #49: python_playthrough_test .....................***Exception: SegFault 11.71 sec
Running tests under Python 3.7.3: /home/michal/.virtualenvs/os/bin/python3
[ RUN      ] PlaythroughTest.test_rerun_playthroughs
I0902 13:00:14.309487 140507519309632 playthrough_test.py:42] liars_dice.txt
I0902 13:00:14.368588 140507519309632 playthrough_test.py:42] havannah(board_size=4).txt
I0902 13:00:14.671652 140507519309632 playthrough_test.py:42] bridge_uncontested_bidding.txt
I0902 13:00:21.458908 140507519309632 playthrough_test.py:42] backgammon.txt
I0902 13:00:22.513101 140507519309632 playthrough_test.py:42] coop_box_pushing.txt
I0902 13:00:23.643769 140507519309632 playthrough_test.py:42] y(board_size=9).txt
I0902 13:00:24.053513 140507519309632 playthrough_test.py:42] hex(board_size=5).txt
I0902 13:00:24.171892 140507519309632 playthrough_test.py:42] misere(game=tic_tac_toe()).txt
I0902 13:00:24.201291 140507519309632 playthrough_test.py:42] pig_4p.txt
I0902 13:00:24.263108 140507519309632 playthrough_test.py:42] markov_soccer.txt
I0902 13:00:24.363373 140507519309632 playthrough_test.py:42] leduc_poker_773740114.txt
I0902 13:00:24.374366 140507519309632 playthrough_test.py:42] tiny_bridge_4p.txt
I0902 13:00:24.389407 140507519309632 playthrough_test.py:42] tic_tac_toe.txt
I0902 13:00:24.415252 140507519309632 playthrough_test.py:42] matrix_rps.txt
I0902 13:00:24.417987 140507519309632 playthrough_test.py:42] leduc_poker_1540482260.txt
I0902 13:00:24.436774 140507519309632 playthrough_test.py:42] oshi_zumo.txt
I0902 13:00:24.454514 140507519309632 playthrough_test.py:42] kuhn_poker_2p.txt
I0902 13:00:24.458492 140507519309632 playthrough_test.py:42] pig_5p.txt
I0902 13:00:24.823575 140507519309632 playthrough_test.py:42] catch.txt
Fatal Python error: Segmentation fault

Current thread 0x00007fca74c6d740 (most recent call first):
  File "/home/michal/Code/GT/open_spiel/open_spiel/python/algorithms/generate_playthrough.py", line 161 in playthrough_lines
  File "/home/michal/Code/GT/open_spiel/open_spiel/python/algorithms/generate_playthrough.py", line 52 in playthrough
  File "/home/michal/Code/GT/open_spiel/open_spiel/python/algorithms/generate_playthrough.py", line 254 in replay
  File "/home/michal/Code/GT/open_spiel/open_spiel/python/../integration_tests/playthrough_test.py", line 43 in test_rerun_playthroughs
  File "/usr/lib/python3.7/unittest/case.py", line 615 in run
  File "/usr/lib/python3.7/unittest/case.py", line 663 in __call__
  File "/usr/lib/python3.7/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.7/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.7/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.7/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.7/unittest/runner.py", line 176 in run
  File "/usr/lib/python3.7/unittest/main.py", line 271 in runTests
  File "/usr/lib/python3.7/unittest/main.py", line 101 in __init__
  File "/home/michal/.virtualenvs/os/lib/python3.7/site-packages/absl/testing/absltest.py", line 2200 in _run_and_get_tests_result
  File "/home/michal/.virtualenvs/os/lib/python3.7/site-packages/absl/testing/absltest.py", line 2230 in run_tests
  File "/home/michal/.virtualenvs/os/lib/python3.7/site-packages/absl/testing/absltest.py", line 1971 in main_function
  File "/home/michal/.virtualenvs/os/lib/python3.7/site-packages/absl/app.py", line 251 in _run_main
  File "/home/michal/.virtualenvs/os/lib/python3.7/site-packages/absl/app.py", line 300 in run
  File "/home/michal/.virtualenvs/os/lib/python3.7/site-packages/absl/testing/absltest.py", line 1973 in _run_in_app
  File "/home/michal/.virtualenvs/os/lib/python3.7/site-packages/absl/testing/absltest.py", line 1855 in main
  File "/home/michal/Code/GT/open_spiel/open_spiel/python/../integration_tests/playthrough_test.py", line 53 in <module>

My setup:

python --version
Python 3.7.3

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 19.04
Release:        19.04
Codename:       disco

cmake --version
cmake version 3.13.4

g++ --version
g++ (Ubuntu 8.3.0-6ubuntu1) 8.3.0

Question: Is there any information that can be used to identify Open Spiel supports?

I am not familiar with game classification.

For example, I want to add tetris.
But I don't know if Tetris applies to the classification described in Open Spiel supports. Is there any information that can be used to identify it?

Thank you.

Automatic tests of games

I think it would be nice to have a general test suite for all games which makes these kinds of generic tests, like action-infoset consistency, zero-sumness if the game claims so, etc., it would be beneficial for game developers. I didn't notice such a thing in the lib. The kinds of tests we've done in GTLib are:

isDomainZeroSum
domainMaxUtility
domainMaxDepth
isNumPlayersCountActionsConsistentInState
isActionGenerationAndAOHConsistent

It's not much, but catches bunch of bugs fast.

unable to run install.sh [ubuntu 19.04]

while trying to run install.sh, errors pump up for some reason finally ends up manually install dependencies in that .sh file

env:
Ubuntu 19.04
python 3.6

error:

+ [[  == linux-gnu ]]
install.sh: 24: install.sh: [[: not found
+ [[  == darwin* ]]
install.sh: 30: install.sh: [[: not found
+ echo The OS '' is not supported (Only Linux and MacOS is).  Feel free to contribute the install for a new OS.
The OS '' is not supported (Only Linux and MacOS is).  Feel free to contribute the install for a new OS.
+ exit 1

`kStandard` strategy averaging incorrect in `ExternalSamplingMCCFRSolver`

https://github.com/deepmind/open_spiel/blob/ee3b6d906e982e3e62b11bf239a98340b172b9a8/open_spiel/algorithms/external_sampling_mccfr.cc#L108

The behavioral policy is being added to the cumulative policy when the player's sequence probability (the probability that the player plays to the current information state and plays action aidx) should be added instead. The probability that the node was sampled will also have to be taken into account by dividing by that probability.

It doesn't look like there's a trivial fix for this because the reach and sampling probabilities aren't being passed down the tree, but adding and keeping track of these values would make the fix easy.

Bots API - step function does not allow to restrict number of iterations/time

The Step method

std::pair<ActionsAndProbs, Action> Step(const State& state)

or respectively EvaluateBots does not offer a way to specify for how many iterations or for what time the bots are allowed to update their strategies, so it's not possible for example to compare the performance of two Monte Carlo bots which both receive 5s of time per move.

If there's interest, I can submit a PR (I've developed something similar in GTLib)

Cooperative box pushing giving unexpected observations

Hello,

according to the paper referenced in connection to the "Cooperative Box-Pushing" game, the agents are supposed to receive observations that correspond to what they currently see in a given time step:

After every time step each agent gets one out of 5 possible observations deterministically describing the situation of the environment in front of the agent: empty field,wall, other agent, small box, large box.

However, the game as implemented in OpenSpiel appears to return something else. In time_step.observations["info_state"], each agent receives a field of 704 binary values which seems to correspond to the global state (704 = 64 cells x 11 possible cell states).

Also, the GameType flag provides_information_state is set to true, while provides_observation is set to false which also goes contrary to what I would expect after reading the paper.

Is this an oversight or is there a purpose behind this?

Adding a game docs is outdated

The documentation docs/developer_guide.md seems outdated - there is no new_game.h and the registration seems to be done via REGISTER_SPIEL_GAME macro.

Unable to get Observation/StateInformation for the game of go

there seems to be no way to access the go board using the python API

Using xrange in python3

https://github.com/deepmind/open_spiel/blob/b1c2b285abe0b80eba7357dcaa2582bf93851a0c/open_spiel/python/examples/example.py#L86

Suspicious implementation of UCB formula in MCTS algorithm

Hi All

Firstly, thanks for this fantastic repo!

Looking at the python/algorithms/mcts.py function SearchNode.child_value it seems UCB is implemented as follows:

return (
        self.player_sign * child.total_reward / child.explore_count +
        uct_c * math.sqrt(math.log(child.explore_count) / child.explore_count))

In the usual UCB formula (wiki or link) the second term contains log(parentN) / childN, while above is log(childN)/childN.

I tested both formulas on different implementation of MCTS and both work, but log(childN)/childN seems to degrade performance.

There seems to be equivalent issue in C++ version in mcts.cc

Is this a potential bug, or am I missing something?

Unable to build and run test.(Solved)

The Problem is solved.Thanks to rejuvyesh.

(lost) hyh@amax2080-1:~/Downloads/open_spiel-master/build$ CXX=g++ cmake
-DPython_TARGET_VERSION=3.6 -DCMAKE_CXX_COMPILER=${CXX} ../open_spiel
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++
-- Check for working CXX compiler: /usr/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:45 (add_subdirectory):
add_subdirectory given source "abseil-cpp" which is not an existing
directory.

-- Found Python3: /home/hyh/anaconda3/lib/libpython3.7m.so (found version "3.7.0") found components: Development
-- Configuring incomplete, errors occurred!
See also "/home/hyh/Downloads/open_spiel-master/build/CMakeFiles/CMakeOutput.log".

Gamut library

I just came across Gamut library: http://gamut.stanford.edu/

I thought it might be an interesting addition for generating normal-form games and wanted to let you know about it, in case you didn't. It's written in Java, so maybe it would be possible to write a "bridge" for it? Maybe it can be added to the list of call for contributions?

Question: where is supported game list?

Is this a list of supported games?

CMakeLists.txt

I wanted to see a simpler list if possible.

Thank you.

Unable to run example on different games

I want to run cfr_example.py on different games other than Kuhn poker. I changed the game name to "goofspiel". Then I got a running error like this:

RuntimeError: /Desktop/open_spiel/open_spiel/games/goofspiel.cc:313 player >= 0
player = -2, 0 = 0

I don't know how to fix it. What mistake do I make?

ModuleNotFoundError: No module named 'open_spiel'

I want to run open_spiel/python/examples/playthrough.py for an richer example generating a playthrough and printing all available information. (MacOS, Python 3.7.3)

The error message is:
ModuleNotFoundError: No module named 'open_spiel'

The snapshot is as the follows:

How can I import open_spiel.python.algorithms when I run the python program?
Thanks~

@lanctot

Games with large branching factor

In games with large branching factor (such as stratego): std::vector<Action> LegalActions will simply run out of memory, even on a smaller 6x6 variant (12!^2).

Do you think it would make sense to support them, and if yes, how?
I suggest implementing methods such as unsigned long LegalActionsCount and LegalActionAt(unsigned long index)

Failed to invoke `nproc` alias

(venv) howes% ./open_spiel/scripts/build_and_run_tests.sh
++ CXX=g++
++ [[ darwin18 == \d\a\r\w\i\n* ]]
++ alias 'nproc=sysctl -n hw.physicalcpu'
++ CXX=/usr/local/bin/g++-7
+++ nproc
./open_spiel/scripts/build_and_run_tests.sh: line 37: nproc: command not found
++ MAKE_NUM_PROCS=

Apparently nproc alias is not expanded in the build_and_run_tests.sh script in the $(nproc) expression. My fix:

diff --git a/open_spiel/scripts/build_and_run_tests.sh b/open_spiel/scripts/build_and_run_tests.sh
index ba0ec84..a4a64ab 100755
--- a/open_spiel/scripts/build_and_run_tests.sh
+++ b/open_spiel/scripts/build_and_run_tests.sh
@@ -29,13 +29,14 @@ set -e  # exit when any command fails
 set -x
 
 CXX=g++
+NPROC=nproc
 if [[ "$OSTYPE" == "darwin"* ]]; then  # Mac OSX
-  alias nproc="sysctl -n hw.physicalcpu"
+  NPROC="sysctl -n hw.physicalcpu"
   CXX=/usr/local/bin/g++-7
 fi
 
-MAKE_NUM_PROCS=$(nproc)
-let TEST_NUM_PROCS=4*$(nproc)
+MAKE_NUM_PROCS=$(${NPROC})
+let TEST_NUM_PROCS=4*${MAKE_NUM_PROCS}
 
 PYVERSION=$(python3 -c 'import sys; print(sys.version.split(" ")[0])')
 PY_VERSION_MAJOR=$(python3 -c 'import sys; print(sys.version_info.major)')

Infinite loop configuring CMake

I'm getting into an infinite loop when running cmake per the installation guide:

(venv) howes% CXX=/usr/local/bin/g++-7 cmake -DPython_TARGET_VERSION=3.6.5 -DCMAKE_CXX_COMPILER=${CXX} ../open_spiel
-- The C compiler identification is AppleClang 10.0.1.10010046
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-7
-- Check for working CXX compiler: /usr/local/bin/g++-7 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE  
-- Found Python3: /usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/libpython3.6m.dylib (found version "3.6.5") found components:  Development 
-- Configuring done
You have changed variables that require your cache to be deleted.
Configure will be re-run and you may have to reset some variables.
The following variables have changed:
CMAKE_CXX_COMPILER= /usr/local/bin/g++-7

-- The C compiler identification is AppleClang 10.0.1.10010046
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-7
-- Check for working CXX compiler: /usr/local/bin/g++-7 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE  
-- Found Python: /usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/libpython3.6m.dylib (found version "3.6.5") found components:  Development 
-- Configuring done
You have changed variables that require your cache to be deleted.
Configure will be re-run and you may have to reset some variables.
The following variables have changed:
CMAKE_CXX_COMPILER= /usr/local/bin/g++-7
...

Page not found when clicked on Contributing.md

In README under Guidlines and Contributing when we click on CONTRIBUTING.md under "Roadmap and Call for Contributions" it says Not Found.

Apparently, the link address is incorrect. I'm willing to open a PR to fix this.

Bots API - information state instead of state

The API currently has a method

std::pair<ActionsAndProbs, Action> Step(const State& state);

I believe this should be rather

std::pair<ActionsAndProbs, Action> Step(const std::string& informationState);

as the bots should not get access to current state of the world, but only to their possible perception of it.

The current API also allows bots to "cheat", to see where they are exactly in the game, or to ask for observations of their opponent.

I understand changing this has some drawbacks - for example in perfect information (PI) games it's unnecessary (as num. of histories in information state is always 1) and it would require the authors somehow derive what state they are in. For such games, a workaround could be to derive all the actions from information state string -- player can always know his actions, and opponent's actions are received player's observations, since they are public in perfect info games. There can be a helper function for that in PI games.

In imp. info games this derivation is a crucial component of the algorithms themselves, as they have to "figure out" what could have happened in the game so that they've received such observations.

Running OpenSpiel in Windows

Have had a first stab at running OpenSpiel within the Windows Subsystem for Linux (WSL) version 1 on Windows 10, and its been partially successful - 54% of tests passed. Not suggesting this as a long term solution for Windows support, but may hep some in the short term.

First of all installed WSL as per the instructions here: https://docs.microsoft.com/en-us/windows/wsl/install-win10

The highest version of SUSE support was 18.04, so had to upgrade cmake via the following commands:

wget http://www.cmake.org/files/v3.12/cmake-3.12.4.tar.gz
tar -xvzf cmake-3.12.4.tar.gz
cd cmake-3.12.4/
./configure
make
sudo make install
sudo update-alternatives --install /usr/bin/cmake cmake /usr/local/bin/cmake 1 --force

After that, the install instructions seemed to work fine. Looks like the tests failing are the Python ones - anyone got any suggestions on resolving this?

54% tests passed, 43 tests failed out of 94

Total Test time (real) = 130.80 sec

The following tests FAILED:
49 - python_api_test (Failed)
50 - python_playthrough_test (Failed)
51 - python_action_value_vs_best_response_test (Failed)
52 - python_best_response_test (Failed)
53 - python_cfr_test (Failed)
54 - python_deep_cfr_test (Failed)
55 - python_dqn_test (Failed)
56 - python_eva_test (Failed)
57 - python_evaluate_bots_test (Failed)
58 - python_expected_game_score_test (Failed)
59 - python_exploitability_descent_test (Failed)
60 - python_exploitability_test (Failed)
61 - python_fictitious_play_test (Failed)
62 - python_generate_playthrough_test (Failed)
64 - python_rl_losses_test (Failed)
65 - python_lp_solver_test (Failed)
66 - python_mcts_test (Failed)
68 - python_nfsp_test (Failed)
69 - python_outcome_sampling_mccfr_test (Failed)
70 - python_policy_gradient_test (Failed)
71 - python_projected_replicator_dynamics_test (Failed)
72 - python_generalized_psro_test (Failed)
73 - python_rectified_nash_response_test (Failed)
74 - python_random_agent_test (Failed)
75 - python_rcfr_test (Failed)
76 - python_sequence_form_lp_test (Failed)
78 - python_bluechip_bridge_wrapper_test (Failed)
79 - python_uniform_random_test (Failed)
80 - python_alpharank_test (Failed)
81 - python_alpharank_visualizer_test (Failed)
82 - python_dynamics_test (Failed)
83 - python_heuristic_payoff_table_test (Failed)
84 - python_utils_test (Failed)
85 - python_visualization_test (Failed)
86 - python_catch_test (Failed)
87 - python_cliff_walking_test (Failed)
88 - python_data_test (Failed)
89 - python_bot_test (Failed)
90 - python_games_sim_test (Failed)
91 - python_matrix_game_utils_test (Failed)
92 - python_policy_test (Failed)
93 - python_pyspiel_test (Failed)
94 - python_rl_environment_test (Failed)
Errors while running CTest
jfadmin@QJF-SURFACEBOOK:~/open_spiel/build$

problem encoutered when test neurd_example.py

(venv) hadoop2@hadoop2-OptiPlex-3060:~/open_spiel/open_spiel/python$ ipython examples/neurd_example.py 
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2019-09-01 21:21:08.022815: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-01 21:21:08.043147: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-09-01 21:21:08.043872: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3da5710 executing computations on platform Host. Devices:
2019-09-01 21:21:08.043885: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-01 21:21:08.369054: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-09-01 21:21:08.710001: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_feedforward_evaluate_707
2019-09-01 21:21:08.710106: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __forward_feedforward_evaluate_795
2019-09-01 21:21:08.710185: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference___backward_feedforward_evaluate_793_824
2019-09-01 21:21:08.716664: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_reduce_reduce_body_with_dummy_state_853
2019-09-01 21:21:08.716829: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_feedforward_evaluate_707
2019-09-01 21:21:08.716861: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __forward_feedforward_evaluate_795
2019-09-01 21:21:08.716907: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference___backward_feedforward_evaluate_793_824
2019-09-01 21:21:08.827831: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_feedforward_evaluate_917
2019-09-01 21:21:08.827891: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __forward_feedforward_evaluate_1005
2019-09-01 21:21:08.827929: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference___backward_feedforward_evaluate_1003_1034
2019-09-01 21:21:08.833679: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_reduce_reduce_body_with_dummy_state_1063
2019-09-01 21:21:08.833821: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_feedforward_evaluate_917
2019-09-01 21:21:08.833853: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __forward_feedforward_evaluate_1005
2019-09-01 21:21:08.833887: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference___backward_feedforward_evaluate_1003_1034
Iteration 0 exploitability 0.4247961000921089
Iteration 100 exploitability 0.13983355315315135
Iteration 200 exploitability 0.1823885485363587
Iteration 300 exploitability 0.1990055277899966
Iteration 400 exploitability 0.21137665455358728
Iteration 500 exploitability 0.21808422199937166
Iteration 600 exploitability 0.22395193202207553
Iteration 700 exploitability 0.21091725506283007
Iteration 800 exploitability 0.18711425547827917
Iteration 900 exploitability 0.16794287456432777
Exception ignored in: <bound method _EagerDefinedFunctionDeleter.__del__ of <tensorflow.python.eager.function._EagerDefinedFunctionDeleter object at 0x7fa2447a3208>>
Traceback (most recent call last):
  File "/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 305, in __del__
AttributeError: 'NoneType' object has no attribute 'remove_function'
Exception ignored in: <bound method _EagerDefinedFunctionDeleter.__del__ of <tensorflow.python.eager.function._EagerDefinedFunctionDeleter object at 0x7fa2446f90f0>>
Traceback (most recent call last):
  File "/home/hadoop2/open_spiel/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 305, in __del__
AttributeError: 'NoneType' object has no attribute 'remove_function'

add_subdirectory given source "abseil-cpp" which is not existing

Running (patched -- see #2) ./open_spiel/scripts/build_and_run_tests.sh per install instructions results in the following:

++ CXX=g++
++ NPROC=nproc
++ [[ darwin18 == \d\a\r\w\i\n* ]]
++ NPROC='sysctl -n hw.physicalcpu'
++ CXX=/usr/local/bin/g++-7
+++ sysctl -n hw.physicalcpu
++ MAKE_NUM_PROCS=4
++ let 'TEST_NUM_PROCS=4*4'
+++ python3 -c 'import sys; print(sys.version.split(" ")[0])'
++ PYVERSION=3.6.5
+++ python3 -c 'import sys; print(sys.version_info.major)'
++ PY_VERSION_MAJOR=3
++ BUILD_DIR=build_python_3
++ mkdir -p build_python_3
++ cd build_python_3
/Users/howes/src/open_spiel/build_python_3
++ echo 'Building and testing in /Users/howes/src/open_spiel/build_python_3 using '\''python'\'' (version 3.6.5).'
Building and testing in /Users/howes/src/open_spiel/build_python_3 using 'python' (version 3.6.5).
++ cmake -DPython_TARGET_VERSION=3.6.5 -DCMAKE_CXX_COMPILER=/usr/local/bin/g++-7 ../open_spiel
-- The C compiler identification is AppleClang 10.0.1.10010046
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-7
-- Check for working CXX compiler: /usr/local/bin/g++-7 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:45 (add_subdirectory):
  add_subdirectory given source "abseil-cpp" which is not an existing
  directory.


-- Found Python3: /usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/libpython3.6m.dylib (found version "3.6.5") found components:  Development 
-- Configuring incomplete, errors occurred!
See also "/Users/howes/src/open_spiel/build_python_3/CMakeFiles/CMakeOutput.log".

alphaZero implementation ideas and passing state to rl_agent

First of all, thanks for making this framework open source.

I’m investigating the possibility of making a (simplified) alphaZero implementation using openspiel, and I was looking for some implementation ideas, especially since you already mention this in the contributors guide.

Please note: I am not sure if I will have the time to make the code up to openspiel standards. Also I might not very closely follow the alphazero pseudo-code. Thus, I am uncertain sure whether this effort will eventually result in a pull request. I still think some pointers would be very helpful, since others might be going to work on similar algorithms.

Implementation wise, it seems most logical to me to create an rl_agent implementation called alphaZero. When taking a step however; the agent will perform a MCTS. To do this, a complete game state will have to be reconstructed. The easiest way to do this would be to pass the current environment state as an argument to the step() function of the rl_agent and then creating a game with this state internally in the rl_agent. This feels hacky to me: instead of using the time_step argument, which was seemingly designed to provide all available information to the agent, you are feeding it extra with the full game state (of course, in a perfect information game this would be available already anyways).

What would be your perspective on this topic?
Of course, any other design advice on the implementation would be very welcome as well.

tl;dr: How to do state reconstruction in rl_agent? Do you have design advice on the implementation of an alphaZero-like algorithm?