
dickreuter / neuron_poker

602 stars · 52 watchers · 167 forks · 2.39 MB

Texas Hold'em OpenAI Gym poker environment with reinforcement learning based on keras-rl. Includes virtual rendering and Monte Carlo simulation for equity calculation.

License: MIT License

Python 77.71% C++ 9.94% CMake 0.21% Jupyter Notebook 1.94% Cython 10.21%
poker openai-gym holdem reinforcement-learning neural-network gym-environment pokerbot

neuron_poker's People

Contributors

chesterhsieh, codeanimal, dickreuter, dsfdsfgdsa, han-adam, joeydebreuk, lucascolas, swrd06bp, unonth, vkresch


neuron_poker's Issues

Last raiser should not take action again if all call/fold afterwards

Describe the bug
When running DQN training, I noticed something strange: the last raiser would sometimes fold even though nobody re-raised. In other words, the last raiser can take action again even if everyone calls or folds after them.

To Reproduce
Steps to reproduce the behavior:

  1. Run python main.py dqn_train
  2. From the log you will see that the last raiser can take action again after everyone calls or folds.

Take the log below as an example: only Seat 4 and Seat 3 are still at the table, and Seat 4 was still able to make a CALL action after Seat 4 raised 3 BB and Seat 3 called. (Sometimes the last raiser would even FOLD.)

INFO - ===ROUND: Stage.FLOP ===
INFO - Seat 3 (Random): Action.CHECK - Remaining stack: 46, Round pot: 0, Community pot: 280, player pot: 0
INFO - Seat 4 (Random): Action.RAISE_3BB - Remaining stack: 35, Round pot: 6, Community pot: 280, player pot: 6
INFO - Seat 3 (Random): Action.CALL - Remaining stack: 40, Round pot: 12, Community pot: 280, player pot: 6
INFO - Seat 4 (Random): Action.CALL - Remaining stack: 35, Round pot: 12, Community pot: 280, player pot: 6

I looked at the code, and something seems wrong with the max-steps check after the raiser in gym_env/env.py, PlayerCycle.next_player: the comparison should be >= rather than >, as in the line below

if self.max_steps_after_raiser and (self.counter > self.max_steps_after_raiser + raiser_reference):
=>
if self.max_steps_after_raiser and (self.counter >= self.max_steps_after_raiser + raiser_reference):
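
A minimal, hypothetical sketch (not the repo's actual PlayerCycle) of why the comparison matters: with the counter exactly at max_steps_after_raiser + raiser_reference, the strict > check lets the original raiser act one more time, while >= ends the street.

def street_is_over(counter, max_steps_after_raiser, raiser_reference, strict=False):
    """True when no further action should be allowed after the last raiser."""
    if strict:
        return counter > max_steps_after_raiser + raiser_reference   # current behaviour
    return counter >= max_steps_after_raiser + raiser_reference      # proposed fix

# Heads-up example: the raise happened at step 0 (raiser_reference = 0) and only
# one more action should follow (max_steps_after_raiser = 1).
for counter in range(3):
    print(counter,
          street_is_over(counter, 1, 0, strict=True),   # > : False, False, True
          street_is_over(counter, 1, 0))                # >=: False, True, True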

Support for Pot Limit Omaha(Hi), or Shortdeck(6+ hold em)

I was wondering if you ever had different game variants on your roadmap? I think Short Deck (6+ Hold'em) would be the easiest to add, but it'd also be great to see this bot handle some PLO (hi or hi/lo).

If you are into supporting it, I could probably write up a PR for it. Thoughts everyone?

Betting round limit

Currently only two betting rounds are supported, i.e. no 4-bets are possible.
I believe this should be addressed at some point.
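
As a rough illustration only (this is not the project's current API), one way to lift that limit is to track the number of raises per street against a configurable cap instead of a hard-coded one:

MAX_RAISES_PER_STREET = 4  # would allow 3-bets and 4-bets

class BettingRound:
    """Hypothetical per-street raise counter with a configurable cap."""

    def __init__(self, max_raises=MAX_RAISES_PER_STREET):
        self.max_raises = max_raises
        self.raises_this_street = 0

    def can_raise(self):
        return self.raises_this_street < self.max_raises

    def register_raise(self):
        if not self.can_raise():
            raise ValueError("Raise cap for this street reached")
        self.raises_this_street += 1

    def new_street(self):
        self.raises_this_street = 0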

TypeError: len is not well defined for symbolic Tensors. (dense_4/BiasAdd:0)

Hello everyone,
I have some problems with the following run:

python main.py dqn_train

Exception:
(base) franzman007:neuron_poker-master franzman007$ python main.py dqn_train
/Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Using TensorFlow backend.
Using default log file
Saving log file to: /Users/franzman007/Downloads/neuron_poker-master/log/default.log
Saving info file to: /Users/franzman007/Downloads/neuron_poker-master/log/default_info.log
Saving error only file to: /Users/franzman007/Downloads/neuron_poker-master/log/default_errors.log
Screenloglevel: 20
INFO - Initializing program
INFO -
INFO - ++++++++++++++++++
INFO - Starting new hand.
INFO - ++++++++++++++++++
INFO - Dealer is at position 0
INFO - Player 0 got ['KD', '2H'] and $100
INFO - Player 1 got ['9D', 'JC'] and $100
INFO - Player 2 got ['QD', '6H'] and $100
INFO - Player 3 got ['7D', 'AD'] and $100
INFO - Player 4 got ['8D', 'JS'] and $100
INFO - Player 5 got ['JH', '4H'] and $100
INFO -
INFO - ===Round: Stage: PREFLOP
INFO - Seat 1 (equity/20/30): Action.SMALL_BLIND - Remaining stack: 99, Round pot: 1, Community pot: 0, player pot: 1
INFO - Seat 2 (Random): Action.BIG_BLIND - Remaining stack: 98, Round pot: 3, Community pot: 0, player pot: 2
INFO - Seat 3 (Random): Action.FOLD - Remaining stack: 100, Round pot: 3, Community pot: 0, player pot: 0
INFO - Seat 4 (Random): Action.RAISE_POT - Remaining stack: 97, Round pot: 6, Community pot: 0, player pot: 3
WARNING:tensorflow:From /Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING - From /Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
File "main.py", line 210, in
command_line_parser()
File "main.py", line 70, in command_line_parser
runner.dqn_train()
File "main.py", line 187, in dqn_train
dqn.initiate_agent(env)
File "/Users/franzman007/Downloads/neuron_poker-master/agents/agent_dqn.py", line 98, in initiate_agent
batch_size=batch_size, train_interval=train_interval, enable_double_dqn=enable_double_dqn)
File "/Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/rl/agents/dqn.py", line 107, in init
if hasattr(model.output, 'len') and len(model.output) > 1:
File "/Users/franzman007/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 741, in len
"shape information.".format(self.name))
TypeError: len is not well defined for symbolic Tensors. (dense_4/BiasAdd:0) Please call x.shape rather than len(x) for shape information.
(base) franzman007:neuron_poker-master franzman007$

Is it possible that my version of Python is incompatible?

Thank you for your great work

model_from_config error

Hello

When I run ""poetry run python main.py selfplay dqn_train"" in cmd to run the code for training the rl model, I get this error:

ImportError: cannot import name 'model_from_config' from 'tensorflow.keras.models'

Any idea why? The problem seems to be with the model_from_config method.

I have tensorflow 2.16.1 and keras 3.0.0 installed.

Any help would be appreciated!
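
For context, keras-rl2 imports model_from_config from tensorflow.keras.models, and that symbol no longer exists once Keras 3 is the active backend. A hedged sketch of a startup guard that fails with a clearer message (the remediation hints are assumptions, not tested instructions):

# Hypothetical startup guard: fail early with a readable message instead of a
# deep ImportError from inside keras-rl2 when Keras 3 is installed.
try:
    from tensorflow.keras.models import model_from_config  # noqa: F401  (removed in Keras 3)
except ImportError as exc:
    raise SystemExit(
        "keras-rl2 needs the Keras 2 API (tensorflow.keras.models.model_from_config). "
        "This TensorFlow/Keras installation exposes Keras 3; use a TensorFlow release "
        "that still bundles Keras 2, or the tf-keras compatibility package."
    ) from exc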

Test your Poker Agents on pokerwars.io

Hi Guys,

Just wanted to let you know that there is a free poker bot platform to test your agents in a more heterogeneous environment. I think the current bots are already pretty competitive, although it would be nice to compete against more ML experts. There are around 20-40 bots online almost 24/7.

Jump to pokerwars leaderboard or check out several API languages on pokerwars github

Hope to see some of you there and exchange some insights.

Cheers,
Simon

Problems with Tensorflow 2.4

Hi,
Because of problems running TensorFlow 2.3 with the new CUDA 11.2, I had to update to TensorFlow 2.4.

I want to run dqn_train.
Now I get an error because of tf.compat.v1.disable_eager_execution() in agent_keras_rl_dqn.py:
AttributeError: 'TensorBoard' object has no attribute '_should_trace'

I changed it to tf.compat.v1.enable_eager_execution(), and now I get this error:
AttributeError: 'DQNAgent' object has no attribute 'distribute_strategy'
If I try to add "distribute_strategy" to DQNAgent(main=.....), it doesn't change anything.
What should I do?

Here is the complete log:

2021-03-10 14:42:29.846718: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cupti64_110.dll'; dlerror: cupti64_110.dll not found
2021-03-10 14:42:29.849472: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cupti.dll'; dlerror: cupti.dll not found
2021-03-10 14:42:29.849573: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1415] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
2021-03-10 14:42:29.849915: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2021-03-10 14:42:29.850186: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1496] function cupti_interface_->Finalize()failed with error CUPTI could not be loaded or symbol could not be found.
Traceback (most recent call last):
File "main.py", line 255, in
command_line_parser()
File "main.py", line 78, in command_line_parser
runner.dqn_train_keras_rl(model_name)
File "main.py", line 204, in dqn_train_keras_rl
dqn.train(env_name=model_name)
File "C:\Users\neuron_poker\agents\agent_keras_rl_dqn.py", line 102, in train
start_step_policy=self.start_step_policy, callbacks=[tensorboard])
File "C:\Users\Shadow.conda\envs\neuron_poker\lib\site-packages\rl\core.py", line 103, in fit
callbacks.set_model(self)
File "C:\Users\Shadow.conda\envs\neuron_poker\lib\site-packages\tensorflow\python\keras\callbacks.py", line 286, in set_model
callback.set_model(model)
File "C:\Users\Shadow.conda\envs\neuron_poker\lib\site-packages\tensorflow\python\keras\callbacks.py", line 2110, in set_model
self._log_write_dir = self._get_log_write_dir()
File "C:\Users\Shadow.conda\envs\neuron_poker\lib\site-packages\tensorflow\python\keras\callbacks.py", line 2143, in _get_log_write_dir
self.model.distribute_strategy)
AttributeError: 'DQNAgent' object has no attribute 'distribute_strategy'

Negative remaining stack and wrong size of RAISE_3BB

The following log shows that negative remaining stacks are currently possible. There is also a wrong bet size for RAISE_3BB: on PokerStars, a raise of 3 BB with a big blind of 100 would be a raise to 300.

INFO - Dealer is at position 3
INFO - Player 0 got ['JC', '4D'] and $594
INFO - Player 3 got ['QS', '9C'] and $6
INFO -
INFO - ===Round: Stage: PREFLOP
INFO - Seat 0 (equity/50/70): Action.SMALL_BLIND - Remaining stack: 593, Round pot: 1, Community pot: 0, player pot: 1
INFO - Seat 3 (Random): Action.BIG_BLIND - Remaining stack: 4, Round pot: 3, Community pot: 0, player pot: 2
INFO - Seat 0 (equity/50/70): Action.CALL - Remaining stack: 592, Round pot: 4, Community pot: 0, player pot: 2
INFO - Seat 3 (Random): Action.RAISE_3BB - Remaining stack: -4, Round pot: 12, Community pot: 0, player pot: 10
INFO - Seat 0 (equity/50/70): Action.FOLD - Remaining stack: 592, Round pot: 12, Community pot: 0, player pot: 2
INFO - Only one player remaining in round
INFO - Player 3 won: Only remaining player in round

I could provide a test and a fix, just let me know.
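
A minimal, hypothetical sketch (names are illustrative, not the repo's) of clamping a bet to the remaining stack so that a short stack goes all-in instead of going negative:

def apply_bet(stack, desired_contribution):
    """Clamp a desired contribution to the remaining stack (all-in if short)."""
    contribution = min(desired_contribution, stack)
    return stack - contribution, contribution, contribution == stack  # new stack, paid, is_all_in

# Example from the log above: a stack of 4 asked to put in 8 becomes an all-in for 4.
print(apply_bet(4, 8))  # (0, 4, True)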

agent_keras_rl_dqn train with steps param

Is your feature request related to a problem? Please describe.
When fitting a DQN agent to a new environment, I may wish to vary the number of steps to train for.

Describe the solution you'd like
Having the steps as a param would allow this.
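
A hedged sketch of what such a signature could look like; the wrapper below only relies on keras-rl's public DQNAgent.fit and is not the project's actual train method:

from rl.agents import DQNAgent  # keras-rl2

def train_dqn(dqn: DQNAgent, env, nb_steps: int = 100_000):
    """Fit a keras-rl DQN agent for a caller-chosen number of steps."""
    dqn.fit(env, nb_steps=nb_steps, visualize=False, verbose=2)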

Player legal moves are accessed before being set

Describe the bug
Using a blank installation from the repo, following your description and running "python .\main.py selfplay dqn_train", leads to self.legal_moves being None instead of an iterable when it is used in a list comprehension, which raises a TypeError.

Steps To Reproduce
Steps to reproduce the behavior:
1.) Be on Windows 10, download the repo from GitHub and unzip it
2.) Install the repo's packages following your instructions, using conda and pip install -r requirements.txt
3.) cd to the directory and call "python .\main.py selfplay dqn_train" from PowerShell
4.) self.legal_moves_limit = [move.value for move in self.legal_moves_limit]
TypeError: 'NoneType' object is not iterable

Expected behavior
training

Proposed Next Steps
There are a lot of deprecation warnings appearing, which may be related to the problem. Maybe an updated package simply no longer works the way the developers intended. Could you add your package version information to requirements.txt? The output of pip list and conda list should be suitable.

Additional context
System Information:
Win10 version 2004
gpu: RTX2060

----------- SCROLLBACK ------------------------

(base) PS D:\poker\neuron_poker-master\tests> conda activate neuron_poker
(neuron_poker) PS D:\poker\neuron_poker-master\tests> cd ..
(neuron_poker) PS D:\poker\neuron_poker-master> python .\main.py
Usage:
main.py selfplay random [options]
main.py selfplay keypress [options]
main.py selfplay consider_equity [options]
main.py selfplay equity_improvement --improvement_rounds=<> [options]
main.py selfplay dqn_train [options]
main.py selfplay dqn_play [options]
main.py learn_table_scraping [options]
(neuron_poker) PS D:\poker\neuron_poker-master> python .\main.py selfplay dqn_train
Using default log file
Saving log file to: D:\poker\neuron_poker-master\log\default.log
Saving info file to: D:\poker\neuron_poker-master\log\default_info.log
Saving error only file to: D:\poker\neuron_poker-master\log\default_errors.log
Screenloglevel: 20
INFO - Initializing program
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
D:\Continuum\envs\neuron_poker\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Using TensorFlow backend.
DQN_TRAIN_KERAS_RL
INFO -
INFO - ++++++++++++++++++
INFO - Starting new hand.
INFO - ++++++++++++++++++
INFO - Dealer is at position 0
INFO - Player 0 got ['KD', '2H'] and $100
INFO - Player 1 got ['9D', 'JC'] and $100
INFO - Player 2 got ['QD', '6H'] and $100
INFO - Player 3 got ['7D', 'AD'] and $100
INFO - Player 4 got ['8D', 'JS'] and $100
INFO - Player 5 got ['JH', '4H'] and $100
INFO -
INFO - ===Round: Stage: PREFLOP
INFO - Seat 1 (equity/20/30): Action.SMALL_BLIND - Remaining stack: 99, Round pot: 1, Community pot: 0, player pot: 1
INFO - Seat 2 (Random): Action.BIG_BLIND - Remaining stack: 98, Round pot: 3, Community pot: 0, player pot: 2
INFO - Seat 3 (Random): Action.FOLD - Remaining stack: 100, Round pot: 3, Community pot: 0, player pot: 0
INFO - Seat 4 (Random): Action.RAISE_3BB - Remaining stack: 92, Round pot: 11, Community pot: 0, player pot: 8
WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\backend\tensorflow_backend.py:66: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\backend\tensorflow_backend.py:541: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\backend\tensorflow_backend.py:4432: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\backend\tensorflow_backend.py:148: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\backend\tensorflow_backend.py:3733: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\backend\tensorflow_backend.py:190: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2020-08-31 20:44:20.497962: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\optimizers.py:793: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\callbacks.py:1122: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING - From D:\Continuum\envs\neuron_poker\lib\site-packages\keras\callbacks.py:1125: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Training for 1000 steps ...
INFO -
INFO - ++++++++++++++++++
INFO - Starting new hand.
INFO - ++++++++++++++++++
INFO - Dealer is at position 0
INFO - Player 0 got ['5D', '4C'] and $100
INFO - Player 1 got ['KH', '7H'] and $100
INFO - Player 2 got ['4H', 'JC'] and $100
INFO - Player 3 got ['AH', '9C'] and $100
INFO - Player 4 got ['AC', '3D'] and $100
INFO - Player 5 got ['KD', '6H'] and $100
INFO -
INFO - ===Round: Stage: PREFLOP
INFO - Seat 1 (equity/20/30): Action.SMALL_BLIND - Remaining stack: 99, Round pot: 1, Community pot: 0, player pot: 1
INFO - Seat 2 (Random): Action.BIG_BLIND - Remaining stack: 98, Round pot: 3, Community pot: 0, player pot: 2
INFO - Seat 3 (Random): Action.RAISE_2POT - Remaining stack: 94, Round pot: 9, Community pot: 0, player pot: 6
INFO - Seat 4 (Random): Action.RAISE_2POT - Remaining stack: 82, Round pot: 27, Community pot: 0, player pot: 18
INFO - Chosen action by keras-rl 1 - probabilities: [0.11855229 0.14552703 0.17519688 0.1530902 0.15422752 0.1201125
0.13329357]
Traceback (most recent call last):
File ".\main.py", line 261, in
command_line_parser()
File ".\main.py", line 76, in command_line_parser
runner.dqn_train_keras_rl(model_name)
File ".\main.py", line 208, in dqn_train_keras_rl
dqn.train(env_name=model_name)
File "D:\poker\neuron_poker-master\agents\agent_keras_rl_dqn.py", line 100, in train
start_step_policy=self.start_step_policy, callbacks=[tensorboard])
File "D:\Continuum\envs\neuron_poker\lib\site-packages\rl\core.py", line 171, in fit
action = self.processor.process_action(action)
File "D:\poker\neuron_poker-master\agents\agent_keras_rl_dqn.py", line 211, in process_action
self.legal_moves_limit = [move.value for move in self.legal_moves_limit]
TypeError: 'NoneType' object is not iterable

Illegal moves occur while training with DQN

Some illegal moves arise during training, mainly involving the check action, for example:

env.py - _illegal_move - 241 - 6 is an Illegal move, try again. Currently allowed: [<Action.CHECK: 1>]
env.py - _illegal_move - 241 - 1 is an Illegal move, try again. Currently allowed: [<Action.CALL: 2>, <Action.FOL .... ]

Do you know how to fix it? Could I remove the check action from the actions list?

Best
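
One common workaround (a hedged sketch, not the repo's implementation) is to fall back to a random legal action whenever the chosen action is illegal, instead of looping on the illegal-move warning:

import random

def enforce_legal_action(chosen_action, legal_moves):
    """Return the chosen action if legal, otherwise a random legal fallback."""
    if legal_moves and chosen_action in legal_moves:
        return chosen_action
    return random.choice(list(legal_moves))

# Matching the log: the agent picked 6 (ALL_IN) while only CHECK (1) was allowed.
print(enforce_legal_action(6, [1]))  # -> 1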

Doubts about montecarlo_python.py tests results

Hello
How do you know that expected test results are correct?
I have checked the montecarlo_python.py file and tried to optimize a few things. One thing I do not like is removing cards from the deck using pop(random(0, len(deck))). I have rewritten some pieces of code using just pop() from the deque class, but then some tests failed. My idea was to shuffle the deck and then just remove the cards one by one, instead of having an ordered deck and removing random cards.
What I suspect is that the shuffle() method should receive a random seed... I must verify this. I have tried random and np.random, but it is still failing.
You may take a look at the 'opt' branch in my repo.
BUT:
When I was reading the original code, I found one piece which probably does not work as expected:
In distribute_cards_to_players, in lines 169 & 170, you get random indexes:
random_card1 = np.random.randint(0, len(deck))
random_card2 = np.random.randint(0, len(deck) - 1)
Then in line 173 you correctly get the corresponding cards:
self.get_two_short_notation([deck[random_card1], deck[random_card2]]
But later you add the cards to the player's hand (and remove them from the deck at the same moment) using the index, not the retrieved card:
random_player.append(deck.pop(random_card1))
random_player.append(deck.pop(random_card2))
Now, if for example random_card1=0 and random_card2=1, in line 173 you retrieve the cards for verification, and in line 178 you retrieve the first card and remove it from the deck. The indexes have now shifted, so random_card2 (=1) will give you a different card, not the verified one (which originally had index 2).
The quick fix is to check the indexes and remove the card with the bigger index first. But after that fix, I started to get test failures too...
assert 3.191379578251258 < 2
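
A small, self-contained sketch of the index-shift problem and the quick fix described above (popping the larger index first); the deck and indexes are illustrative, not the file's actual data:

deck = ["AS", "KD", "7C", "2H"]
idx1, idx2 = 0, 1                  # indexes drawn for the two hole cards

# Buggy order: popping idx1 first shifts later elements, so idx2 now points
# at a different card than the one that was checked.
d = deck.copy()
hand_buggy = [d.pop(idx1), d.pop(idx2)]

# Quick fix: pop the larger index first so the smaller index stays valid.
d = deck.copy()
hand_fixed = [d.pop(i) for i in sorted((idx1, idx2), reverse=True)][::-1]

print(hand_buggy, hand_fixed)      # ['AS', '7C'] ['AS', 'KD']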

AssertionError: The environment must specify an observation space.

I tried to start it and got the following:
Screenloglevel: 20
INFO - Initializing program
Traceback (most recent call last):
  File "main.py", line 255, in <module>
    command_line_parser()
  File "main.py", line 65, in command_line_parser
    runner.random_agents()
  File "main.py", line 107, in random_agents
    self.env = gym.make(env_name, initial_stacks=self.stack, render=self.render)
  File "/Users/stefanschumm/anaconda3/envs/neuron_poker/lib/python3.7/site-packages/gym/envs/registration.py", line 669, in make
    env = PassiveEnvChecker(env)
  File "/Users/stefanschumm/anaconda3/envs/neuron_poker/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 26, in __init__
    ), "The environment must specify an observation space. https://www.gymlibrary.dev/content/environment_creation/"
AssertionError: The environment must specify an observation space. https://www.gymlibrary.dev/content/environment_creation/

error for inference

I'm running main.py with the command python main.py selfplay random --render
and getting the following error:
"AssertionError: The environment must specify an observation space."
I'm new to Gym and reinforcement learning, so maybe there is a simple step I missed.
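
For context, newer Gym versions wrap every environment in PassiveEnvChecker, which asserts that observation_space and action_space are set on the environment itself. A hedged sketch of declaring them in an environment's __init__ (the sizes below are placeholders, not the project's actual dimensions):

import numpy as np
from gym import spaces

class HoldemTableSketch:
    """Illustrative only: newer Gym requires these attributes before the first reset."""

    def __init__(self, num_observation_features: int = 100):   # placeholder length
        self.action_space = spaces.Discrete(9)                  # placeholder action count
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(num_observation_features,), dtype=np.float64)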

bash: main.py: command not found

Describe the bug
When I type main.py selfplay random --render in my terminal, I get bash: main.py: command not found

To Reproduce
Steps to reproduce the behavior:

  1. Create a virtual environment with conda create -n neuron_poker python=3.7
  2. Activate it with conda activate neuron_poker, then install all required packages with pip install -r requirements.txt
  3. Run 6 random players playing against each other: main.py selfplay random --render

Expected behavior
I expected a 6-player game to begin, either as text in the terminal or as a GUI in a separate window

Screenshots
(screenshot omitted)

Potential bug determining call amount

At env.py line 222 the contribution amount when calling is apparently determined by subtracting our current total bet from the total bet of the previous player.

I reckon that if the previous player wasn't able to cover the current bet and went all in, the current player would then only call the all-in amount and not the current bet, even if they could cover it.

I think there should be a field for the current bet of this stage in the stage data to simplify the logic and to give relevant information to the agents.

I haven't read the entire thing yet, so I may be wrong here :)
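
A hedged sketch of computing the call amount against the highest bet of the street rather than the previous player's bet (field names are illustrative):

def call_amount(street_max_bet, own_bet_this_street, own_stack):
    """Chips needed to call: match the street's highest bet, capped by our stack."""
    return min(street_max_bet - own_bet_this_street, own_stack)

# The previous player went all-in for 30, but the highest bet this street is 80:
# with 20 already invested and a stack of 200, we should owe 60, not 10.
print(call_amount(street_max_bet=80, own_bet_this_street=20, own_stack=200))  # 60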

test_heads_up_after_flop bug

Here are the first actions from test_heads_up_after_flop and an assert that follows them.
env.step(Action.ALL_IN) # seat 3 utg
env.step(Action.ALL_IN) # seat 4
env.step(Action.ALL_IN) # seat 5
env.step(Action.ALL_IN) # seat 0
env.step(Action.CALL) # seat 1 small blind = all in
env.step(Action.FOLD) # seat 2 big blind folds
assert sum(env.player_cycle.alive) == 2

This test has, in my opinion, several bugs.

  1. After all these actions, no player should be able to act: five players are all-in with a remaining stack of 0, so they cannot take any further actions, and one player folded, so no action is possible. Yet this test is green. I tried to fix this, but I see that env.player_cycle.alive often changes behaviour: sometimes indexes 2 and 4 are True, other times indexes 0 and 3, and so on.
  2. Related to point 1, the test name and the implemented behaviour do not match. The test name suggests a heads-up scenario, but no heads-up scenario is actually generated.

Because of these points, I recommend ignoring this test and adding a comment explaining why it is ignored.

Is active_players taking effect?

Reading the code, it seems that active_players in CommunityData is never changed at all. Is this desired or a bug?

Thanks and amazing job

Support Tensorflow 2

Is your feature request related to a problem? Please describe.
I wish to use Tensorflow 2 (which also supports the CUDA drivers I currently have installed)

Describe the solution you'd like
Upgrade the packages and keras usages to support tf 2.

Describe alternatives you've considered
The alternative is for me to uninstall my current CUDA drivers and install an older CUDA driver, that is supported on TF1.14. This seems unreasonable - also I think supporting the most recent, and supported, version of tensorflow is important.

Higher / variable number of iterations for equity calculation

Great work first of all!

The montecarlo simulation works fine and very fast for me.

However, I do not understand why the number of iterations for equity calculation is set to 10 in the main environment (unlike e.g. the montecarlo tests with 5000 iterations):

  • it results in a very high variance
  • it can be set higher without high cost (especially for the c++ implementation)
  • it should be an argument imho
  • especially the agent that considers equity should have a good estimate (and maybe recalculate it)

Hope I did not overlook something, otherwise it should be an easy fix
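
A hedged sketch of threading an iterations argument down to the equity estimate instead of hard-coding a small number; the dummy simulator below just illustrates how the variance shrinks with more iterations:

import random

DEFAULT_EQUITY_ITERATIONS = 5_000

def estimate_equity(run_simulation, iterations=DEFAULT_EQUITY_ITERATIONS):
    """run_simulation(n) is any callable returning a win rate over n Monte Carlo rollouts."""
    return run_simulation(iterations)

def dummy_simulator(n, true_equity=0.42):
    wins = sum(random.random() < true_equity for _ in range(n))
    return wins / n

print(estimate_equity(dummy_simulator, iterations=10))     # noisy estimate
print(estimate_equity(dummy_simulator, iterations=5_000))  # much closer to 0.42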

Project dependencies may have API risk issues

Hi, in neuron_poker, inappropriate dependency version constraints can cause risks.

Below are the dependencies and version constraints that the project is using

pandas
pytest
pylint
pydocstyle
gym
numpy
matplotlib
pyglet
keras-rl2
docopt
tensorflow==2.3.2
tensorboard
pybind11
cppimport

The version constraint == introduces a risk of dependency conflicts because it is too strict.
Version constraints with no upper bound (or *) introduce a risk of missing-API errors, because the latest versions of the dependencies may remove some APIs.

After further analysis, in this project,
The version constraint of dependency pandas can be changed to >=0.7.0,<=1.4.2.
The version constraint of dependency numpy can be changed to >=1.5.0,<=1.23.0rc3.
The version constraint of dependency pyglet can be changed to >=1.3.0rc2,<=1.4.11.

The modification suggestions above reduce dependency conflicts as much as possible,
while still allowing the latest versions that do not cause API errors in the project.

The current project invokes all of the following methods.

Methods called from pandas:
pandas.concat
Methods called from numpy:
distutils.core.setup
Methods called from pyglet:
pyglet.graphics.vertex_list
pyglet.clock.tick
datetime.date.today
time.strftime
pyglet.graphics.draw
All other called methods:
ValueError
self.player_cycle.deactivate_current
agents.agent_keras_rl_dqn.Player.train
collections.Counter
float
self.player_cycle.next_player
self._distribute_cards
numpy.ceil
arr3.arr2.arr1.np.concatenate.flatten
logging.getLogger.setLevel
gym.envs.registration.register
pyglet.graphics.vertex_list
MonteCarlo.get_two_short_notation
type
self.player_cycle.next_dealer
tools.hand_evaluator.SUITS_ORIGINAL.find
pyglet.clock.tick
info.keys
numpy.maximum
cppimport.imp
self.player_data.__dict__.values
self.get_multiplecards
self._save_funds_history
Memoise
matplotlib.pyplot.show
numpy.random.seed
numpy.mean
logging.getLogger.error
numpy.array
join
this_player_action_space.intersection
format
numpy.sum
time.time
self.distribute_cards
pandas.Series
logging.getLogger.removeHandler
gym.make.add_player
SelfPlay.random_agents
player_alive.append
preflop_state.get_reverse_sheetname
logging.StreamHandler.setFormatter
MonteCarlo
self._get_legal_moves
ui_action_and_signals.signal_progressbar_increase.emit
gym_env.rendering.PygletWindow
self.dqn.compile
winner_in_episodes.pd.Series.value_counts
traceback.format_exception
self._get_environment
os.path.isfile
self.community_data.__dict__.values
logging.getLogger.removeFilter
tensorflow.compat.v1.disable_eager_execution
max
self.get_four_of_a_kind
self._end_round
self._create_card_deck
self._end_hand
self.winnerCardTypeList.items
pyglet.graphics.vertex_list.draw
self.config.read
deck.index
self._award_winner
multiprocessing.pool.ThreadPool.map
self._close_round
flatten
get_multiprocessing_config
logging.getLogger.warning
self.preflop_equities.items
Action
hasattr
player.cards.append
os.getenv
self.get_straightflush
eval_best_hand
self.player_cycle.deactivate_player
numpy.squeeze
agents.agent_keras_rl_dqn.Player.initiate_agent
get_config
SelfPlay.dqn_train_keras_rl
numpy.argsort
datetime.date.today
pickle.dumps
self.get_two_short_notation
pool_fn
self.model.add
numpy.diff
numpy.sort
self.viewer.text
self.legal_moves.append
pyglet.text.Label
pyglet.gl.glClear
gym.spaces.Discrete
agents.agent_custom_q1.Player
self._calculate_reward
preflop_state.get_rangecards_from_sheetname
pyglet.graphics.draw.draw
cls.Singleton.super.__call__
os.path.abspath
self._process_decision
self._next_dealer
self.callers.append
datetime.date.today.strftime
self.model.load_weights
os.environ.get
self.viewer.circle
self.get_highcard
hand._.r.r.hand.join.count.r.CARD_RANKS_ORIGINAL.find.items
setuptools.find_packages
self.player_cycle.mark_checker
PlayerShell
Exception
numpy.all
str
self._game_over
copy.copy.append
super
compiled_args.append
numpy.argmax
numpy.random.randint
kwargs.items
self._check_game_over
t.PlayerCardList_and_others.append
self.current_player.agent_obj.action
logging.Formatter
numpy.round
self.table_cards.append
PlayerCycle
logging.handlers.RotatingFileHandler.setLevel
self.deck.append
CustomProcessor
int
tensorflow.keras.models.model_from_json
_calc_score
self.raisers.append
numpy.logical_and
self.player_cycle.mark_out_of_cash_but_contributed
random.choice
agents.agent_keras_rl_dqn.TrumpPolicy
os.path.dirname
Evaluation.run_evaluation
SelfPlay.key_press_agents
StageData
sum
self.calc_score
self.func
numpy.tile
PygletWindow.update
self.card_to_num
numpy.minimum
SelfPlay
self.viewer.reset
json.dump
highCardsVal.sum.sum
self.player_cycle.mark_folder
isinstance
self._agent_is_autoplay
deck.pop
numpy.stack
player_cards_with_table_cards.index
agents.agent_keypress.Player
self.player_cycle.update_alive
tools.helper.init_logger
print
hand.join.count
self.player_cycle.new_round_reset
logging.getLogger.info
self.current_player.actions.append
numpy.delete
get_config.get
self.create_card_deck.index
self.render
pyglet.window.Window
cards_combined_array.append
multiprocessing.pool.ThreadPool.starmap
self.step
multiprocessing.cpu_count
self.display_surface.flip
os.path.realpath
tensorflow.keras.layers.Dense
rl.memory.SequentialMemory
self.create_card_deck
logging.handlers.RotatingFileHandler.setFormatter
open
configparser.ConfigParser
input
multiplier.self.sorted_5flushcards.sum
os.path.join
self.dqn.test
pyglet.text.Label.draw
filename.replace.replace
self.counts.sum
numpy.cos
table_card_list.append
winner_in_episodes.append
copy.copy.pop
agents.agent_keras_rl_dqn.Player
self.env.action_space.sample
math.cos
multiprocessing.pool.ThreadPool
known_player.append
SelfPlay.dqn_play_keras_rl
self.player_cycle.mark_raiser
CARD_RANKS_ORIGINAL.find
math.radians
self.get_opponent_allowed_cards_list
self._start_new_hand
self.deck.pop
tuple
player_cards.append
list
self.env.reset
self.distribute_cards_to_table
PlayerFinalCardsWithTableCards.index
sorted
tensorflow.keras.layers.Dropout
copy.copy
self._next_player
CustomConfigParser
self.get_counts
gym_env.env.Action
self.viewer.rectangle
math.sin
numpy.concatenate
PygletWindow
agents.agent_consider_equity.Player
configparser.ExtendedInterpolation
self._distribute_cards_to_table
set
numpy.logical_or
self.get_three_of_a_kind
logging.getLogger.debug
self.cards.np.arange.sum
numpy.any
self.dqn.fit
round
MonteCarlo.run_montecarlo
docopt.docopt
self._get_winner
distutils.core.setup
suits.count
self.set_args
strided
pyglet.graphics.draw
numpy.logical_not
CardsOnTable.append
self.display_surface.switch_to
self.player_cycle.get_potential_winners
tools.helper.flatten
self.update_alive
PlayerData
self.current_player.temp_stack.append
numpy.insert
numpy.random.choice
numpy.exp
logging.StreamHandler
self.distribute_cards_to_players
sd.__dict__.values
docopt.docopt.upper
SelfPlay.equity_vs_random
command_line_parser
logging.getLogger.addHandler
zip
pandas.DataFrame
random_player.append
PygletWindow.circle
setuptools.setup
CommunityData
self.winner_in_episodes.pd.Series.value_counts
tools.hand_evaluator.eval_best_hand
numpy.zeros
self.new_hand_reset
self.funds_history.reset_index
tensorflow.keras.callbacks.TensorBoard
self.get_kickers
winnerCardTypeList.append
self.funds_history.reset_index.plot
TrumpPolicy
get_dir
t.PlayerCardList.append
min
logging.handlers.RotatingFileHandler
tools.hand_evaluator.get_winner
Cython.Build.cythonize
pandas.concat
self.get_straight
ui_action_and_signals.signal_status.emit
tensorflow.keras.optimizers.Adam
get_config.getboolean
table_cards_numeric.append
flush_hand._.r.r.flush_hand.join.count.r.CARD_RANKS_ORIGINAL.find.items
rl.agents.DQNAgent
self._clean_up_pots
json.load
numpy.append
operator.itemgetter
self.create_card_deck.pop
player_cards_with_table_cards.append
readme.read
numpy.random.random
len
self.players.append
tensorflow.keras.models.Sequential
tools.hand_evaluator.CARD_RANKS_ORIGINAL.find
tableCardList.append
numpy.amax
time.strftime
L.get_collusion_cards
self.deactivate_current
numpy.sin
agents.agent_random.Player
self.load
self.get_pair_score
enumerate
numpy.isin
self.stage_data.sd.sd.__dict__.values.flatten.list.np.array.flatten
MyWinnerMask.Winners.all
self.log.info
self.get_flush
tools.helper.get_config
self.player_cycle.mark_bb
PygletWindow.reset
gym_env.env.PlayerShell
agents.agent_keras_rl_dqn.Player.play
self.player_status.append
self.display_surface.dispatch_events
os.path.relpath
self.env.add_player
self._initiate_round
self.get_fullhouse
PlayerFinalCardsWithTableCards.append
gym.make
SelfPlay.equity_self_improvement
all_players.append
range
self.reset
numpy.arange.self.suits.sum
self._illegal_move
self.model.to_json
self.player_cycle.new_hand_reset
numpy.clip
getattr
pyglet.gl.glColor4f
numpy.arange
get_config.getint
logging.StreamHandler.setLevel
_keys_to_tuple
self.dqn.save_weights
gym.make.seed
self.winner_in_episodes.append
flush_hand.join.count
PygletWindow.text
logging.getLogger
Evaluation
self.viewer.update
self.get_equity
self.get_two_pair_score
q_values.astype.astype
RuntimeError
self._execute_step
gym.make.reset

@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.

AttributeError: 'NoneType' object has no attribute 'load_weights'

I'm trying to train and then play DQN.

Training works fine and generates a weights file. Now I try to play, and I suppose it can't load the weights for some reason.


Traceback (most recent call last):
  File "D:/Pokerbot/main.py", line 208, in <module>
    command_line_parser()
  File "D:/Pokerbot/main.py", line 73, in command_line_parser
    runner.dqn_play()
  File "D:/Pokerbot/main.py", line 201, in dqn_play
    self.env.add_player(DQNPlayer(load_model='neuron_poker-v0'))
  File "D:\Pokerbot\agents\agent_dqn.py", line 42, in __init__
    self.load(load_model)
  File "D:\Pokerbot\agents\agent_dqn.py", line 120, in load
    self.dqn.load_weights('dqn_{}_weights.h5'.format(env_name))
AttributeError: 'NoneType' object has no attribute 'load_weights'

Process finished with exit code 1

Save/load memory with DQN player

Is your feature request related to a problem? Please describe.
I wish to be able to load the sequential memory when loading a model and its weights, so that I can continue to train the model on a new environment.

Describe the solution you'd like
The memory to be saved/loaded whenever the model/weights are.
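
keras-rl does not ship a memory save/load helper, so here is a hedged sketch using pickle (assuming the SequentialMemory object is picklable; the file name is an example, not a convention from the repo):

import pickle
from rl.memory import SequentialMemory

def save_memory(memory: SequentialMemory, path: str) -> None:
    """Persist the replay memory next to the saved weights."""
    with open(path, "wb") as f:
        pickle.dump(memory, f)

def load_memory(path: str) -> SequentialMemory:
    with open(path, "rb") as f:
        return pickle.load(f)

# e.g. save_memory(dqn.memory, "dqn_neuron_poker-v0_memory.pkl") after training,
# then pass the loaded memory back into DQNAgent(...) before continuing training.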

"stack" option in main.py

Is your feature request related to a problem? Please describe.
I want to be able to specify the stack size when running main.py.

Describe the solution you'd like
Implement a stack option, defaulting to 500 (the current default)
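
Since main.py already uses docopt, a hedged sketch of the option and its parsing (the option name and default come from this request; the plumbing is an assumption):

"""Usage:
  main.py [--stack=<amount>]

Options:
  --stack=<amount>  Initial stack for each player [default: 500].
"""
from docopt import docopt

if __name__ == "__main__":
    args = docopt(__doc__)
    stack = int(args["--stack"])
    # ...pass it through to the environment, e.g. gym.make(..., initial_stacks=stack)
    print(f"Initial stack: {stack}")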

6 random players playing against each other uses A LOT of ram

Describe the bug
This is not really a bug, it's more of a question: why does playing with only random players use such a large amount of RAM?
It's expected that RAM consumption increases as more hands are played (to keep logs and other data in memory), but I am talking about 3-4 GB of RAM after only ~20 hands.

To Reproduce
Steps to reproduce the behavior:
Follow the installation steps. (The readme is quite clear and very helpful, thank you!)
Then run: python main.py random --render

Expected behavior
I would expect less RAM to be used as hands are being played.

Screenshots
(screenshot omitted)

Additional context
I only have 8 GB of RAM.

Problem with dqn_train

Describe the bug
How do I launch training? Is there a particular version of Python or TensorFlow that must be used?

I get the error: ValueError: could not convert string to float

python main.py dqn_train
returns:

INFO - Starting new hand.
INFO - ++++++++++++++++++
INFO - Dealer is at position 0
INFO - Player 0 got ['3C', '5H'] and $100
INFO - Player 1 got ['KC', '2S'] and $100
INFO -
INFO - ===Round: Stage: PREFLOP
INFO - Seat 1 (keras-rl): Action.SMALL_BLIND - Remaining stack: 99, Round pot: 1, Community pot: 0, player pot: 1
INFO - Seat 0 (equity/50/70): Action.BIG_BLIND - Remaining stack: 98, Round pot: 3, Community pot: 0, player pot: 2
INFO - Random action
INFO - Seat 1 (keras-rl): Action.ALL_IN - Remaining stack: 0, Round pot: 102, Community pot: 0, player pot: 100
INFO - Previous action reward for seat 1: 0
Traceback (most recent call last):
File "main.py", line 238, in
command_line_parser()
File "main.py", line 75, in command_line_parser
runner.dqn_train()
File "main.py", line 218, in dqn_train
dqn.train(env_name='dqn1')
File "c:\Users\ms\git\neuron_poker\agents\agent_dqn.py", line 110, in train
start_step_policy=self.start_step_policy, callbacks=[tensorboard])
File "C:\Users\ms\AppData\Local\Programs\Python\Python36\lib\site-packages\rl\core.py", line 169, in fit
action = self.forward(observation)
File "C:\Users\ms\AppData\Local\Programs\Python\Python36\lib\site-packages\rl\agents\dqn.py", line 228, in forward
q_values = self.compute_q_values(state)
File "C:\Users\ms\AppData\Local\Programs\Python\Python36\lib\site-packages\rl\agents\dqn.py", line 69, in compute_q_values
q_values = self.compute_batch_q_values([state]).flatten()
File "C:\Users\ms\AppData\Local\Programs\Python\Python36\lib\site-packages\rl\agents\dqn.py", line 64, in compute_batch_q_values
q_values = self.model.predict_on_batch(batch)
File "C:\Users\ms\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1506, in predict_on_batch
outputs = self.predict_function(ins)
File "C:\Users\ms\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2979, in call
return self._call(inputs)
File "C:\Users\ms\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2917, in _call
dtype=tf.as_dtype(tensor.dtype).as_numpy_dtype))
File "C:\Users\ms\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core_asarray.py", line 85, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: '3C'

'Card Type error' in Montecarlo for some hands

Please close the issue.

I got this error for a hand like this:

get_equity({'10H', '4C'},{'10C', '10D', '10S'}, 2, 5000)

However, if you replace 10 with T it works. For some reason, all other hands (except four times T) do not cause an error; I am not sure why.
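
A hedged sketch of normalising the card notation before it reaches the equity calculation, since the evaluator expects the single-character rank 'T' rather than '10':

def normalise_card(card: str) -> str:
    """Convert '10H'-style notation to the evaluator's 'TH' style."""
    return "T" + card[-1] if card.startswith("10") else card

hand = {normalise_card(c) for c in {"10H", "4C"}}
board = {normalise_card(c) for c in {"10C", "10D", "10S"}}
print(hand, board)   # e.g. {'TH', '4C'} {'TC', 'TD', 'TS'}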

Can't raise after my first turn

Describe the bug
The following is currently not possible:
Player 1 bets -> Player 2 raises -> Player 1 raises
And:
Player 1 checks -> Player 2 raises -> Player 1 raises

To Reproduce
Steps to reproduce the behavior:

  1. Add one KeyPressAgent instead of a RandomPlayer in the method random_agents in main.py
  2. run main.py random --render

The simulation:
3. Bet in your turn
4. Opponents raises
5. Your options are only CALL and FOLD

Expected behaviour
Player 1 can raise

Additional context
I am not sure if you would expect issues when adding a KeyPressAgent to the random simulation. Looking at the code, it doesn't seem like it should be a problem. Also, looking at the random simulation without adding a KeyPressAgent, I don't see players raising twice.

log:

INFO - ===ROUND: Stage.RIVER ===
Choose action with number: [<Action.CHECK: 1>, <Action.RAISE_3BB: 3>, <Action.RAISE_3BB: 3>, <Action.RAISE_POT: 4>, <Action.RAISE_2POT: 5>, <Action.ALL_IN: 6>]
3
INFO - Seat 0 (Keypress): Action.RAISE_3BB - Remaining stack: 641, Round pot: 6, Community pot: 174, player pot: 6
INFO - Seat 1 (Random): Action.CALL - Remaining stack: 459, Round pot: 12, Community pot: 174, player pot: 6
Choose action with number: [<Action.CALL: 2>, <Action.FOLD: 0>]

Duplicated enum value

Duplicated enum value in Action enum
./gym_env/enums.py

class Action(Enum):
"""Allowed actions"""
FOLD = 0
CHECK = 1
CALL = 2
RAISE_3BB = 3 <--------------
RAISE_HALF_POT = 3 <--------------
RAISE_POT = 4
RAISE_2POT = 5
ALL_IN = 6
SMALL_BLIND = 7
BIG_BLIND = 8
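
Because RAISE_3BB and RAISE_HALF_POT share the value 3, Python's Enum silently makes RAISE_HALF_POT an alias of RAISE_3BB, which is also why RAISE_3BB shows up twice in the action list logged in the previous issue. A hedged sketch of the fix with distinct values (the renumbering is an assumption, not the project's chosen resolution):

from enum import Enum

class Action(Enum):
    """Allowed actions, each member with a distinct value."""
    FOLD = 0
    CHECK = 1
    CALL = 2
    RAISE_3BB = 3
    RAISE_HALF_POT = 4   # previously also 3, i.e. an alias of RAISE_3BB
    RAISE_POT = 5
    RAISE_2POT = 6
    ALL_IN = 7
    SMALL_BLIND = 8
    BIG_BLIND = 9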

Need to specify pyglet version in requirements.txt ?

Stunning work, looking forward to digging in. I noticed that installing from requirements.txt prompts:

"ERROR: gym 0.15.3 has requirement pyglet<=1.3.2,>=1.2.0, but you'll have pyglet 1.4.6 which is incompatible."

Illegal moves

Hi Nicolas.

Cool project! I just got it up and running, testing the DQN agent.

I had problems with the agent repeatedly trying illegal moves. While googling around, I found an SA question of yours related to the same problem.

There might be multiple things wrong, but one episode I stumbled over in my logs is a case where the agent wants to make a call or a bet while the legal moves are limited to check and all-in. So the agent is determined to put money in the pot, but is not smart enough to figure out that calling or betting would imply an all-in. Might it not be better to find a way to accept a call or a bet as a legal move in this instance, rather than penalizing it with negative rewards?
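
A hedged sketch of the mapping suggested above, where an illegal call or bet is interpreted as the closest legal aggressive action (all-in) rather than being punished; the numeric values follow the Action enum shown in the "Duplicated enum value" issue, and the mapping itself is an assumption:

FOLD, CHECK, CALL = 0, 1, 2
RAISES = {3, 4, 5}          # the raise actions
ALL_IN = 6

def map_to_legal(chosen, legal_moves):
    """Interpret an illegal 'put money in' action as the closest legal alternative."""
    if chosen in legal_moves:
        return chosen
    if (chosen == CALL or chosen in RAISES) and ALL_IN in legal_moves:
        return ALL_IN        # the agent wanted to invest chips: treat it as an all-in
    if CHECK in legal_moves:
        return CHECK
    return next(iter(legal_moves))

print(map_to_legal(CALL, {CHECK, ALL_IN}))   # -> 6 (ALL_IN)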

Support C++17 compiler (Visual Studio 2019)

Is your feature request related to a problem? Please describe.
I want to use the montecarlo_cpp module, but I have Visual Studio 2019 Community edition with C++17.

Describe the solution you'd like
The compiler_args defined in the pymontecarlo.cpp file are not compatible with a C++17 compiler; they should be updated so that the module compiles successfully with C++17.

Describe alternatives you've considered
I am not sure what other alternative solution there is, other than installing an older version of Visual Studio. It seems reasonable to support the latest version of the C++ compiler.

First call to process_action throws error when training with DQN agent

Describe the bug
When I attempted to train/fit the DQN agent model, the first call that the keras-rl DQN agent made to process_action produced an error because legal_moves_limit was None. There is a check for whether legal_moves_limit is defined, but not for whether it holds a None value.

To Reproduce
Steps to reproduce the behavior:

  1. Run command "python main.py selfplay dqn_train -c --name=test"
  2. File: agent_keras_rl_dqn.py
  3. Line number: 210

Expected behavior
Player.process_action invocation should not throw when legal_moves_limit is None.

Screenshots
N/A

Additional context
N/A
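
A hedged sketch of the missing guard as a standalone helper (the real code lives inside the agent's process_action; the fallback choice here is illustrative only):

def limit_action_to_legal_moves(action, legal_moves_limit):
    """Constrain an action to the legal moves, tolerating a None limit."""
    if legal_moves_limit is None:            # the guard the report asks for
        return action
    legal_values = [move.value for move in legal_moves_limit]
    return action if action in legal_values else legal_values[0]  # illustrative fallback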
