junxiaosong / AlphaZero_Gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
License: MIT License
I wonder how I can change the board size to 19.
Do you think it is possible to train it with size 19 instead of 8?
In some papers Q is described as the plain average of the backed-up values, but in the code Q is updated as a running average. These seemed like different quantities to me, so may I ask why the code uses a running average?
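For what it's worth, the incremental ("running") update used in the repo's TreeNode.update(), of the form Q += (leaf_value - Q) / n_visits, is algebraically identical to the plain arithmetic mean of all backed-up values, so the two descriptions coincide. A quick check with made-up leaf values:

```python
# Verify that the incremental update equals the plain mean.
values = [0.5, -1.0, 0.25, 1.0]  # hypothetical leaf values backed up at one node

q_incremental = 0.0
n_visits = 0
for v in values:
    n_visits += 1
    # same form as: self._Q += 1.0 * (leaf_value - self._Q) / self._n_visits
    q_incremental += (v - q_incremental) / n_visits

q_mean = sum(values) / len(values)
print(abs(q_incremental - q_mean) < 1e-12)  # True
```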
I used the default configuration: 6x6 board and 4 in a row.
Running on macOS.
batch i:1100, episode_len:21
kl:0.00058,lr_multiplier:11.391,loss:4.518421649932861,entropy:3.5188446044921875,explained_var_old:0.000,explained_var_new:0.000
current self-play batch: 1100
num_playouts:1000, win: 2, lose: 8, tie:0
Any advice would be appreciated.
Hello, training produces a model in three file formats (.model.meta; .model.index; model.data.....). Which one should I use when loading the model, or does it need to be converted first?
May I ask what your computer configuration is? I'd like to use it as a reference.
Also, one more question: I ran the 8,8,5 board size with TensorFlow on an i5-6300HQ CPU and a GTX 965M GPU, and the GPU run took longer than the CPU run. Is that because the GPU is too weak?
Hello, how can I save the weight parameters to a txt file?
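One possible approach (a hedged sketch, not code from this repo): if the parameters are available as a list of numpy arrays, for example via the net's get_policy_param(), np.savetxt can dump them to text. The variable names and toy weights below are illustrative:

```python
import numpy as np

# Toy stand-ins for the network's weight arrays.
params = [np.arange(6, dtype=float).reshape(2, 3), np.array([0.1, 0.2])]

with open("weights.txt", "w") as f:
    for i, w in enumerate(params):
        # Record the index and original shape so the dump can be reloaded.
        f.write(f"# param {i}, shape {w.shape}\n")
        np.savetxt(f, w.reshape(1, -1))  # one flattened row per parameter
```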
Hello, I'm a university student who has only recently started with machine learning, and I'd like to ask you a few things. I noticed that while training on the GPU, utilization stays below 20% for long stretches; after reading the other issues I understand this is because generating self-play games takes so long. So, if I increase the mini-batch size, that should improve training efficiency, right? But could it cause other problems? I hope you can find time to reply. Thank you.
Hello, I noticed the code controls the learning rate by comparing the KL divergence between the outputs of the old and new networks. In my experiments the learning rate first increases quickly and then gradually decreases, so the method clearly works. Is there any literature describing this technique, or did you devise it from experience?
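The adjustment in this repo's train.py amounts to a simple feedback rule (reconstructed here from memory, so treat the exact thresholds as a sketch): after each optimizer step, the measured KL between old and new policy outputs raises or lowers an lr multiplier.

```python
# Sketch of KL-based learning-rate control: shrink the multiplier when the
# policy moved too far in one update, grow it when it barely moved.
def adjust_lr_multiplier(kl, kl_targ, lr_multiplier):
    if kl > kl_targ * 2 and lr_multiplier > 0.1:    # overshot: slow down
        lr_multiplier /= 1.5
    elif kl < kl_targ / 2 and lr_multiplier < 10:   # undershot: speed up
        lr_multiplier *= 1.5
    return lr_multiplier

print(adjust_lr_multiplier(kl=0.005, kl_targ=0.02, lr_multiplier=1.0))  # 1.5
```

A closely related idea appears in the literature as the adaptive KL penalty coefficient in the PPO paper (Schulman et al., 2017), which also doubles or halves a coefficient depending on whether the measured KL overshoots or undershoots a target.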
While running, GPU memory is fully occupied but the utilization of the GPU compute units is only 12%. Is there a way to increase utilization?
I'd just like to know the difference between these two files:
best_policy_8_8_5.model
best_policy_8_8_5.model2
I'm wondering whether it would be better to extend the input to more feature planes that encode board patterns such as 'live three' and 'sleep four'...
Here is a link about Tencent's PhoenixGo: http://tech.qq.com/a/20180511/024785.htm
Source code: https://github.com/Tencent/PhoenixGo
The article says a single GPU is enough?! Does that mean one GPU for training, or one GPU for running the trained model?
Line 150 in 68603c0
If play_data = [1, 2] and data_buffer is an empty deque, then after self.data_buffer.extend(play_data) the desired result is data_buffer = [1, 2] (where 1 and 2 each stand for a training sample [s, probability matrix, z]). But could the actual result be data_buffer = [[1, 2]]? Should each item in play_data instead be appended to data_buffer individually?
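To the extend-vs-append question: collections.deque.extend() adds each element of the iterable individually, so no extra flattening is needed. A quick check:

```python
from collections import deque

data_buffer = deque(maxlen=10000)
play_data = [1, 2]          # stand-ins for (state, mcts_prob, winner) samples
data_buffer.extend(play_data)
print(list(data_buffer))    # [1, 2], not [[1, 2]]
```

It is append(), not extend(), that would produce the nested [[1, 2]] result.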
What about evaluating against the previous best model instead of against pure MCTS? I'd expect this to improve the results, and it would also remove the need to write a pure MCTS yourself. Personally I feel that for pure MCTS, increasing the number of playouts gives diminishing returns on its win rate.
I referred to another GitHub repo, https://github.com/suragnair/alpha-zero-general, and their results are also quite good.
BTW, thanks to the author for posting this code for everyone to learn from; I'm a complete beginner.
AlphaZero_Gomoku/mcts_alphaZero.py
Line 137 in 68603c0
I can't fully understand the negated leaf_value, which differs from the AlphaGo Zero paper.
Could you give an explanation?
Thank you very much.
The source code for the protocol can be found here:
https://github.com/stranskyjan/pbrain-pyrandom/blob/master/pisqpipe.py
Then it can be used with the Piskvork gomoku manager to compare against other engines such as http://www.aiexp.info/pages/yixin.html (presently the top gomoku engine).
What if MCTS performed its simulations and move selection according to the value network? I wonder how well that would work.
See title.
While working through your code I ran into two problems; could you please advise?
Hello, I ran into the problem "No module named cPickle"; how can I solve it? Thanks!
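cPickle is a Python 2 module; under Python 3 the fast C implementation is used automatically by the plain pickle module. A compatible import pattern:

```python
# Works under both Python 2 and Python 3.
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle             # Python 3 (C implementation is built in)

data = pickle.loads(pickle.dumps({"ok": True}))
print(data)  # {'ok': True}
```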
See title.
Most of the compute is spent on MCTS select and expand; the policy network takes less than 1/8 of the time. Even running on a TPU, the CPU can't keep up. How can MCTS be made to run in parallel more efficiently?
self._u = (c_puct * self._P *np.sqrt(self._parent._n_visits) / (1 + self._n_visits))
Should it be changed to the following:
self._u = (c_puct * self._P *np.sqrt(self._parent._n_visits / (1 + self._n_visits)))
i.e. move the closing parenthesis from after self._parent._n_visits to the end?
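For reference, the original line matches the PUCT exploration term used in the AlphaGo Zero paper, u = c_puct * P * sqrt(N_parent) / (1 + N); moving the parenthesis changes the value. A quick numerical comparison (the numbers are made up):

```python
import math

c_puct, P = 5.0, 0.25
parent_visits, n_visits = 100, 9

u_paper = c_puct * P * math.sqrt(parent_visits) / (1 + n_visits)  # as in the code
u_moved = c_puct * P * math.sqrt(parent_visits / (1 + n_visits))  # proposed change

print(u_paper)  # 1.25
print(u_moved)  # ~3.95, a different quantity
```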
def update_recursive(self, leaf_value):
    """Like a call to update(), but applied recursively for all ancestors."""
    # If it is not root, this node's parent should be updated first.
    if self._parent:
        self._parent.update_recursive(-leaf_value)
    self.update(leaf_value)
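This recursion also bears on the question above about the negated leaf_value: parent and child nodes belong to alternating players, so a value that is good for the player at the leaf must be backed up as bad for the opponent one ply up. A toy trace with a minimal Node class mirroring the repo's update logic:

```python
class Node:
    def __init__(self, parent=None):
        self._parent = parent
        self._n_visits = 0
        self._Q = 0.0

    def update(self, leaf_value):
        # Incremental mean of backed-up leaf values.
        self._n_visits += 1
        self._Q += (leaf_value - self._Q) / self._n_visits

    def update_recursive(self, leaf_value):
        # Flip the sign at each ply: ancestors alternate players.
        if self._parent:
            self._parent.update_recursive(-leaf_value)
        self.update(leaf_value)

root = Node()
child = Node(root)
grandchild = Node(child)
grandchild.update_recursive(1.0)
print(grandchild._Q, child._Q, root._Q)  # 1.0 -1.0 1.0
```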
Hello, and thank you very much for sharing!
I tried running your code on TensorFlow, but on the 8*8 board the results were not very good for some reason. I then restricted legal moves to points near existing stones, but the results were still unsatisfactory. If the number of legal actions is reduced, should c_puct be increased or decreased accordingly?
How does it perform on a larger board, say 15 * 15?
Traceback (most recent call last):
  File "human_play.py", line 75, in <module>
    run()
  File "human_play.py", line 59, in run
    policy_param = pickle.load(open('best_policy_8_8_5.model', 'rb'))
ImportError: No module named 'numpy.core.multiarray\r'
This neural network architecture is quite different from the one in the AlphaGo Zero paper; the latter took a resnet approach, using 1 convolutional block and 19 residual blocks.
Simply stacking layers may cause certain defects (e.g. in speed and accuracy) during network training.
I noticed that
game.py#L42
and
game.py#L97
assume height and width are equal when computing h and w:
h = m // width
w = m % width
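For what it's worth, the arithmetic h = m // width, w = m % width is correct for rectangular boards too, as long as width (not height) is the divisor under a row-major move numbering. A self-contained round-trip check on a non-square board:

```python
def move_to_location(move, width):
    # Row-major indexing, as in game.py: h = move // width, w = move % width.
    return move // width, move % width

def location_to_move(h, w, width):
    return h * width + w

width, height = 6, 4  # deliberately non-square
for move in range(width * height):
    h, w = move_to_location(move, width)
    assert location_to_move(h, w, width) == move
print("round-trip OK for a 6x4 board")
```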
Hello, the probs output by the Monte Carlo tree search part go through a softmax, with the max subtracted from each value first:
probs = np.exp(x - np.max(x))
Is the max subtracted to prevent the result from overflowing?
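Yes, subtracting the max before exponentiating is the standard numerical-stability trick; it leaves the softmax unchanged because exp(x - m) / sum(exp(x - m)) = exp(x) / sum(exp(x)). A quick check with inputs that would overflow a naive implementation:

```python
import numpy as np

def softmax(x):
    probs = np.exp(x - np.max(x))  # shift by the max so exp() cannot overflow
    return probs / np.sum(probs)

x = np.array([1000.0, 1001.0, 1002.0])  # naive np.exp(x) would give inf
p = softmax(x)
print(p)          # finite probabilities
print(np.sum(p))  # 1.0
```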
Hello, I'd like to use human-played games to speed up convergence. For human games, how should mcts_probs_batch be set during training? Is it acceptable to set the probability of the chosen action to 1 and all others to 0?
AlphaZero_Gomoku/mcts_alphaZero.py
Line 206 in 66292c5
I've never understood what this parameter means. What is it for?
In the TensorFlow version, isn't it wrong to convert the input_state layout with a plain reshape? The input is [batch_size, c, h, w] while TensorFlow expects [batch_size, h, w, c]; a direct reshape only changes the shape, it doesn't transpose the data.
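The concern is valid in general: reshape reinterprets the same memory, while transpose permutes axes. A numpy check that NCHW to NHWC via reshape gives the right shape but the wrong data:

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)  # [batch, c, h, w]

nhwc_transpose = np.transpose(x, (0, 2, 3, 1))  # correct: permutes the axes
nhwc_reshape = x.reshape(2, 4, 4, 3)            # same shape, scrambled data

print(nhwc_transpose.shape == nhwc_reshape.shape)    # True
print(np.array_equal(nhwc_transpose, nhwc_reshape))  # False
```

In TensorFlow the corresponding fix would be tf.transpose(input_state, [0, 2, 3, 1]) rather than a plain tf.reshape.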
If the game being trained favors the first player (e.g. freestyle Gomoku without forbidden moves), the training data will be dominated by Black wins. What effect would that have on the training results?
state_dict in pytorch is a dict, while params trained with theano are dumped as a list.
When you want to retrain a model that was trained with theano, it seems the model can't be loaded properly.
Is there any way to solve this?
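One possible workaround (a hedged sketch; the key names and shapes are assumptions, not taken from this repo): if the theano dump is a flat list of numpy arrays in the same order as the pytorch module's parameters, it can be zipped into an ordered dict for load_state_dict(). Shown with plain numpy so it runs without torch installed:

```python
from collections import OrderedDict
import numpy as np

theano_params = [np.zeros((32, 4, 3, 3)), np.zeros(32)]  # the dumped list
pytorch_keys = ["conv1.weight", "conv1.bias"]            # from model.state_dict().keys()

# Pair each dumped array with the corresponding state_dict key, in order.
state_dict = OrderedDict(zip(pytorch_keys, theano_params))

# With torch available, the final step would be something like:
# net.load_state_dict({k: torch.from_numpy(v) for k, v in state_dict.items()})
print(list(state_dict.keys()))
```

This only works if the parameter ordering and array shapes actually line up between the two frameworks, which should be verified against the concrete models.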
During self-play training, both sides eventually end up only attacking and never defending, so games finish very quickly and the positions all look similar. What is causing this?
Have you looked at competitive Gomoku rules???
Black is subject to restrictions (forbidden moves).
In Go and Gomoku, stones cannot move once placed, so the action and evaluation heads can share part of the network.
For games like chess or checkers, where pieces move, how should the action part of the network be designed?
Could you offer some ideas?
Hello, the entropy in your model seems to be computed incorrectly; shouldn't it use log? You used tf.exp.
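For context: entropy is H(p) = -sum(p * log p), but if the network's output is already log-probabilities (e.g. from a log-softmax), then exp(log_p) * log_p is exactly p * log p, so an exp in that expression can be correct. A numpy check of the equivalence:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])  # a policy distribution
log_p = np.log(p)              # what a log-softmax layer would output

entropy_direct = -np.sum(p * np.log(p))
entropy_from_logp = -np.sum(np.exp(log_p) * log_p)  # exp recovers p from log p

print(np.isclose(entropy_direct, entropy_from_logp))  # True
```

So whether the TF code is a bug depends on whether the tensor being exponentiated holds probabilities or log-probabilities.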
The difference between AlphaGo Zero and AlphaZero is that AlphaZero drops the eval step; self-play and optimization are otherwise the same. In AlphaGo Zero the steps run one after another: on a single machine we first run self-play to generate data, then run optimization on that data, then evaluation, and the three steps can't run at the same time because they don't connect seamlessly. Even though AlphaZero drops eval, the remaining steps still run one by one. What would it take for the freshly optimized model to be used for self-play automatically? What I have in mind is that as soon as self-play data is generated it updates the model, and that model immediately does more self-play. Wouldn't that be faster? Then we could sleep soundly and wake up to a better model the next day. For example: https://github.com/chncyhn/flappybird-qlearning-bot
AlphaZero_Gomoku/mcts_alphaZero.py
Line 137 in 68603c0
By the way, just to confirm: does the "current player" mentioned throughout the code mean the player to move in the current position?