pbsinclair42 / mcts
A simple package to allow users to run Monte Carlo Tree Search on any perfect information domain
License: MIT License
Hi, thank you for your great work. Is it possible to have more information returned along with the chosen action, like a probability of success, the expected reward, or anything else that could be interesting? Thank you.
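One way to surface this: the search accumulates a total reward and a visit count for each root child, and those two quantities can be turned into an expected reward and a visit share. A standalone sketch (the statistics dict is invented for illustration; the field names mirror the library's treeNode, but this is not the package's API):

```python
# Hypothetical per-action statistics as MCTS would accumulate them at the root.
child_stats = {
    "action_a": {"totalReward": 42.0, "numVisits": 60},
    "action_b": {"totalReward": 12.0, "numVisits": 40},
}

def action_report(stats):
    """Expected reward and visit share per action, from root-child statistics."""
    total_visits = sum(s["numVisits"] for s in stats.values())
    return {
        action: {
            "expectedReward": s["totalReward"] / s["numVisits"],
            # The visit share is often read as a soft "move probability".
            "visitShare": s["numVisits"] / total_visits,
        }
        for action, s in stats.items()
    }
```

If the library exposes the root node after a search, the same computation could be run over its real children.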
I'm trying to implement a rather complicated application of this: a fantasy football draft. I have classes for the draft, league members, and NFL players. Every time it is my turn to draft, I want to simulate the rest of the draft to determine the best possible pick. Can somebody point me in the right direction? I feel like I'm really close. I've added only a few members and players and set the number of rounds to 2 for testing. Obviously these classes are bare bones and I would have to add more constraints; I'm just trying to get the logic to work. Thanks.
from __future__ import division
from copy import deepcopy
from mcts import mcts

# Class for a member of a league. Has a name, draft position, and me=True if the member is me.
class Member:
    def __init__(self, name, draft_pos, me=False):
        self.name = name
        self.draft_pos = draft_pos
        self.me = me
        self.roster = []

    @property
    def total_pts(self):
        return sum(p.proj for p in self.roster)

# Class for an NFL player. Has a name, team, position, and projected points.
class Player:
    def __init__(self, name, team, pos, proj):
        self.name = name
        self.team = team
        self.pos = pos
        self.proj = proj
        self.drafted = False

    def __repr__(self):
        return f'{self.name} {self.proj}'

# Class for a draft. Has members, players, and a number of rounds.
class Draft:
    def __init__(self, members, players, num_rounds):
        self.members = members
        self.players = players
        self.num_rounds = num_rounds

    # Players that have been drafted
    @property
    def drafted_players(self):
        return [p for p in self.players if p.drafted]

    # Players that are undrafted
    @property
    def undrafted_players(self):
        return [p for p in self.players if not p.drafted]

    # Current overall pick in the draft
    @property
    def current_overall_pick(self):
        return len(self.drafted_players) + 1

    # Current round of the draft
    @property
    def current_round(self):
        return ((self.current_overall_pick - 1) // len(self.members)) + 1

    # Current pick within the round
    @property
    def current_pick_in_round(self):
        return self.current_overall_pick - (len(self.members) * (self.current_round - 1))

    # The member currently drafting (snake order: even rounds reverse)
    @property
    def current_member_drafting(self):
        if self.current_round % 2 != 0:
            return next((m for m in self.members if m.draft_pos == self.current_pick_in_round), None)
        return next((m for m in self.members
                     if m.draft_pos == (len(self.members) - self.current_pick_in_round + 1)), None)

    # Draft a player to the member currently drafting
    def draft_player(self, player):
        self.current_member_drafting.roster.append(player)
        print(f'{self.current_member_drafting.name} drafted {player.name}')
        player.drafted = True

    # If I am drafting, return 1, else return -1
    def getCurrentPlayer(self):
        return 1 if self.current_member_drafting.me else -1

    # All players available to draft
    def getPossibleActions(self):
        return list(self.undrafted_players)

    # Copy the state and draft the player to the current member drafting
    def takeAction(self, action):
        new = deepcopy(self)
        new.draft_player(action)
        return new

    # The state is terminal when every roster is full
    def isTerminal(self):
        return all(len(m.roster) == self.num_rounds for m in self.members)

    def getReward(self):
        me = next((m for m in self.members if m.me), None)
        others = [m for m in self.members if not m.me]
        if all(me.total_pts > m.total_pts for m in others):
            return 1
        return 0  # reward should be numeric; was `return False`

if __name__ == "__main__":
    m1 = Member('m1', 1, me=True)
    m2 = Member('m2', 2)
    m3 = Member('m3', 3)
    members = [m1, m2, m3]
    p1 = Player('Tom Brady', 'TB', 'QB', 300)
    p2 = Player('Josh Allen', 'BUF', 'QB', 325)
    p3 = Player('Russell Wilson', 'DEN', 'QB', 275)
    p4 = Player('Mac Jones', 'NE', 'QB', 350)
    p5 = Player('Matt Ryan', 'DEN', 'QB', 250)
    p6 = Player('Drew Lock', 'NE', 'QB', 225)
    players = [p1, p2, p3, p4, p5, p6]
    draft = Draft(members, players, num_rounds=2)
    searcher = mcts(iterationLimit=10)
    action = searcher.search(initialState=draft)
    print(action)
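One thing worth flagging in the draft code above: a bare win/lose reward gives the search very little gradient in a points-based game, since most rollouts of a short draft score identically. A smoother alternative (my own sketch; the function name and the 100-point scale are assumptions you would tune for a real league) scores the margin over the best opponent, clipped to [-1, 1]:

```python
def draft_reward(my_points, opponent_points):
    """Margin-based reward for a draft state: how far my projected total is
    above (or below) the best opposing roster, squashed into [-1, 1]."""
    best_opponent = max(opponent_points)
    margin = my_points - best_opponent
    scale = 100.0  # assumed typical projected-points spread; tune for your league
    return max(-1.0, min(1.0, margin / scale))
```

This keeps the sign convention of the win/lose version (positive means I'm ahead) while letting the search distinguish a narrow win from a blowout.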
I'm playing against the AI, but it doesn't seem to play well even in this easy game. For example:
def __init__(self):
    self.board = [[1, -1, 0],
                  [1, 1, 0],
                  [-1, 0, -1]]
    self.currentPlayer = 1
I think a human would definitely choose (1, 2), but the code seems to usually choose (0, 2). Am I using it wrong, or what causes it to go the wrong way?
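For reference, enumerating the legal moves of this position confirms that (1, 2) is the only immediate win for player 1, so a correct search should strongly prefer it. A quick standalone check:

```python
def winner(board):
    """Return +1 or -1 if that player has three in a row, else 0."""
    lines = [row[:] for row in board]                    # rows
    lines += [list(col) for col in zip(*board)]          # columns
    lines.append([board[i][i] for i in range(3)])        # main diagonal
    lines.append([board[i][2 - i] for i in range(3)])    # anti-diagonal
    for line in lines:
        if abs(sum(line)) == 3:
            return line[0]
    return 0

board = [[1, -1, 0],
         [1, 1, 0],
         [-1, 0, -1]]

# Try every empty square for player 1 and record the immediate wins.
winning_moves = []
for r in range(3):
    for c in range(3):
        if board[r][c] == 0:
            board[r][c] = 1
            if winner(board) == 1:
                winning_moves.append((r, c))
            board[r][c] = 0
```

Here `winning_moves` comes out as `[(1, 2)]`, matching the human choice.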
After pip install succeeds (and pip show mcts finds the package), I still get a ModuleNotFoundError.
Hi! I am having some issues running this library, and I would like some clarification about how/why this implementation works.
It appears that, in this MCTS implementation, the getReward() function (https://github.com/pbsinclair42/MCTS/blob/master/naughtsandcrosses.py) does not vary based on the "perspective" of the player (i.e. which player is currently selecting a move). This means that a reward of 1 means first-player wins, while a reward of -1 means second-player wins. The backpropagation step sums this reward to get the totalReward for a node.
However, if you look at getBestChild() (https://github.com/pbsinclair42/MCTS/blob/master/mcts.py), the following lines are worrying to me:
nodeValue = child.totalReward / child.numVisits + explorationValue * math.sqrt(
    2 * math.log(node.numVisits) / child.numVisits)
if nodeValue > bestValue:
Doesn't this mean that, regardless of who the current player is, this function is choosing to explore the child node that is best for player 1 only? Where is the minimax aspect of this algorithm, where the reward is "flipped/inverted" based on which player is playing?
Usually, MCTS implementations will choose one of two strategies: 1) track the reward from each player's own perspective (e.g. a reward per player, with each node maximizing the reward of the player to move), or 2) negamax-style backpropagation, flipping the sign of the reward at each level of the tree.
I think that some people choose 1) because it's more general and can apply to games with 1, 2, 3+ players, whereas 2) is limited to two-player adversarial games.
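A minimal sketch of the two-player fix, assuming rewards are always stored from player +1's perspective: multiply the exploitation term by the player to move at the node, so each player maximizes its own outcome. This mirrors the commonly suggested patch of scaling child.totalReward by getCurrentPlayer(), but the node class below is a stand-in for illustration, not the library's:

```python
import math

class Node:
    """Stand-in for the library's treeNode; totalReward is from player +1's view."""
    def __init__(self, totalReward, numVisits, playerToMove=1, children=None):
        self.totalReward = totalReward
        self.numVisits = numVisits
        self.playerToMove = playerToMove  # +1 or -1: who chooses at this node
        self.children = children or {}

def best_child(node, exploration_value):
    best_value, best = float("-inf"), None
    for child in node.children.values():
        # Flip the sign of the exploitation term when player -1 is choosing,
        # so each player picks the child that is best for *them*.
        exploit = node.playerToMove * child.totalReward / child.numVisits
        explore = exploration_value * math.sqrt(
            2 * math.log(node.numVisits) / child.numVisits)
        value = exploit + explore
        if value > best_value:
            best_value, best = value, child
    return best

a = Node(totalReward=5.0, numVisits=10)   # good for player +1
b = Node(totalReward=-5.0, numVisits=10)  # good for player -1
root_p1 = Node(0, 20, playerToMove=1, children={"a": a, "b": b})
root_p2 = Node(0, 20, playerToMove=-1, children={"a": a, "b": b})
```

With exploration set to 0, player +1's root picks `a` and player -1's root picks `b`, which is the minimax behaviour the question is asking about.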
Hi, thanks for your great work. When I run your NaughtsAndCrossesState example, I get AttributeError: 'NoneType' object has no attribute 'state'.
def getBestChild(self, node, explorationValue):
    bestValue = float("-inf")
    bestNodes = []
    for child in node.children.values():
        nodeValue = child.totalReward / child.numVisits + explorationValue * math.sqrt(
            2 * math.log(node.numVisits) / child.numVisits)
        if nodeValue >= bestValue:
            bestValue = nodeValue
            bestNodes.append(child)
    return random.choice(bestNodes)
This implementation looks like it will always keep the first child it evaluates in bestNodes: any child whose value ties or beats the running best is appended, and the list is never reset when a strictly better child turns up later, so a suboptimal child could be selected at random at the end, right?
I might be misunderstanding the process, but that might explain the performance issues raised by the other poster.
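For comparison, a selection loop that avoids the stale-entry problem resets the candidate list whenever a strictly better child appears, and only appends on an exact tie. A standalone sketch with a minimal node class (my own illustration, not the repo's code):

```python
import math
import random

class _Node:
    """Minimal node carrying just the fields the selection loop needs."""
    def __init__(self, totalReward, numVisits, children=None):
        self.totalReward = totalReward
        self.numVisits = numVisits
        self.children = children or {}

def get_best_child(node, exploration_value):
    best_value = float("-inf")
    best_nodes = []
    for child in node.children.values():
        node_value = child.totalReward / child.numVisits + exploration_value * math.sqrt(
            2 * math.log(node.numVisits) / child.numVisits)
        if node_value > best_value:
            best_value = node_value
            best_nodes = [child]      # strictly better: start a fresh tie list
        elif node_value == best_value:
            best_nodes.append(child)  # exact tie: keep both for random tie-break
    return random.choice(best_nodes)

root = _Node(0, 2, children={"a": _Node(1.0, 1), "b": _Node(0.2, 1)})
best = get_best_child(root, exploration_value=0)  # only "a" survives the tie list
```

The random choice then only ever picks among genuinely tied children.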
Hi,
thank you for your work, really good stuff.
I am a bit confused however on one point of the implementation logic.
Let's assume I need to decide the best possible action between A and B, given an initial state and an iteration-limit parameter (say limit = 1000).
In the first two iterations, until the node is fully expanded, all of the node's children are explored. So, going back to our example, after two iterations both branch A and branch B have been explored once. Great.
However, once the node is expanded and both branches have been visited once, a number of things can happen:
In case A wins (the same applies with the letters reversed if B wins), then on iteration 3, selectNode() picks the best child, in this case A.
The problem I have noticed is that at the end of the 1000 iterations, branch A has been explored 999 times (node.numVisits), while branch B was explored just once, at the beginning.
It looks like getBestChild() gets stuck on the first branch that won and never recovers, and at the end of the simulation that becomes the winning branch.
Something to consider is that I am using continuous rewards rather than True/False (win or lose) rewards.
I tried playing with the exploration constant, but it didn't help.
Is the implementation correct? Is there anything you would suggest to mitigate this?
Thank you
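One possible culprit with continuous rewards is scale: the UCT exploration term is calibrated for rewards in roughly [0, 1], so if the rewards are on a much larger scale the exploration bonus can never close the gap between the arms' means, and the second branch is starved exactly as described. A toy two-arm model (entirely synthetic numbers, not the library's code) illustrates this:

```python
import math

def uct_visits(rewards, c, iterations):
    """Toy 2-arm UCT loop: each arm returns a fixed reward; each step picks the
    arm maximizing mean + c*sqrt(2*ln(N)/n). Returns the final visit counts."""
    n = [1, 1]                        # each arm visited once during expansion
    total = [rewards[0], rewards[1]]  # accumulated reward per arm
    for _ in range(iterations):
        N = sum(n)
        ucb = [total[i] / n[i] + c * math.sqrt(2 * math.log(N) / n[i])
               for i in range(2)]
        i = 0 if ucb[0] >= ucb[1] else 1
        n[i] += 1
        total[i] += rewards[i]
    return n

# Raw rewards on a large scale: with c = sqrt(2), the exploration bonus
# (at most ~5 here) never bridges the 10-point gap, so arm 1 is starved.
raw = uct_visits([100.0, 90.0], math.sqrt(2), 998)

# Same rewards normalized to [0, 1]: the same constant keeps exploring both.
norm = uct_visits([1.0, 0.9], math.sqrt(2), 998)
```

Normalizing rewards into [0, 1] (or scaling the exploration constant to the reward range) is the usual mitigation; with the numbers above, `raw` ends up at [999, 1] while `norm` gives the weaker arm hundreds of visits.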
I'm trying out your code, just expanding on your demo. I set up the computer to play itself, but it's not doing a very good job, even when I extend the time limit from 1 second to 10. X almost always wins. Any idea why? Am I running the code wrong, or is there something wrong with your algorithm and/or naughtsandcrosses implementation?
I can share my NaughtsAndCrossesState.renderBoard() code if you'd like. Everything else is unchanged from your code.
mcts = mcts(timeLimit=1000)
state = NaughtsAndCrossesState()
print(state.renderBoard())
while not state.isTerminal():
    action = mcts.search(state)
    state = state.takeAction(action)
    print(state.renderBoard())
def expand(self, node):
    actions = node.state.getPossibleActions()
    for action in actions:
        if action not in node.children:
            newNode = treeNode(node.state.takeAction(action), node)
            node.children[action] = newNode
            if len(actions) == len(node.children):
                node.isFullyExpanded = True
            return newNode
    raise Exception("Should never reach here")
We can see the line node.children[action] = newNode — does that mean this algorithm assumes that if we take the same action multiple times in the same state, we always get the same next state?
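That is exactly the assumption: children are keyed by action alone, so takeAction must be deterministic for a revisit of the same action to land on the right node. A sketch of one common workaround for stochastic transitions (my own illustration, not part of the library) keys children by (action, outcome) instead, so each sampled result gets its own node:

```python
class StochasticNode:
    """Tree node whose children are keyed by (action, outcome), so that a
    stochastic takeAction can map the same action to several child nodes."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # (action, outcome_key) -> StochasticNode

    def child_for(self, action, next_state, key_fn):
        """Fetch or create the child for this sampled outcome; key_fn turns a
        state into something hashable that identifies the outcome."""
        key = (action, key_fn(next_state))
        if key not in self.children:
            self.children[key] = StochasticNode(next_state, parent=self)
        return self.children[key]

root = StochasticNode(state=0)
n1 = root.child_for("roll", 3, key_fn=lambda s: s)  # first sample of outcome 3
n2 = root.child_for("roll", 3, key_fn=lambda s: s)  # same outcome: same node
n3 = root.child_for("roll", 4, key_fn=lambda s: s)  # new outcome: new node
```

The deterministic version in the library is just the special case where every action has a single outcome.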
As far as I understand, MCTS returns only the single most beneficial next move after its iterations, but I wondered about putting the mcts object in a for loop so that I end up with the best path. However, I get TypeError: 'mcts' object is not callable each time I try to use the object a second time.
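This TypeError usually has nothing to do with reusing the searcher: the demo snippet earlier in this thread rebinds the class name with `mcts = mcts(timeLimit=1000)`, so any later attempt to construct (or call) `mcts` hits the instance, not the class. A minimal reproduction with a stand-in class (not the real library, just the same naming pattern):

```python
# Stand-in class reproducing the shadowing pattern from the demo snippet.
class mcts:
    def __init__(self, timeLimit=None):
        self.timeLimit = timeLimit

mcts = mcts(timeLimit=1000)        # the name `mcts` now points at an instance
try:
    again = mcts(timeLimit=1000)   # constructing a "second searcher" now fails
except TypeError as err:
    message = str(err)             # "'mcts' object is not callable"

# Fix: keep the class and instance names distinct, e.g.
#   searcher = mcts(timeLimit=1000)
# and then call searcher.search(initialState=state) inside your loop.
```

With distinct names, the same searcher instance can be reused across loop iterations to walk out a full path of moves.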