Giter VIP home page Giter VIP logo

Comments (27)

lanctot avatar lanctot commented on May 10, 2024 1

Wait a sec; I take back the make a copy thing. You can just check the end points of both checker moves to see if there is one opponent checker on each point. (Being sure to cover the exception that if the end point is the same for both checker moves then the second one can't be a hit). That is doable from within ActionToString and requires minimal change. Shouldn't that work?

(To summarize: I am saying the hit information is missing by construction and suggesting setting them manually within ActionToString)

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

Hi @jamesdfrost, thanks for the kind words! ... and for catching this :) This was not deliberate.... and entirely due to my own lack of knowledge on the common notational conventions of Backgammon.

It would be a welcome change! Do you also know if it is common to also include captures in the string representations? If so, we should probably include that as well if we're not already.

A reference backgammon agents -- especially an important one in the history of AI -- would be awesome and highly encourages!

Regarding a Windows port; oh, it's been a while since I've coded on Windows... the last thing I used was Cygwin At the bare mininum this can get you a bash shell with a view editors and g++/Make. Just took a quick check; still seems like it's in use, so could be a good start, but I've been out of the Windows game for too long, so might be good to check with a few others to see. Maybe let's start a separate thread for this point as I would like to point a few people I know to it and we could try to get a few opinions from the community.

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

Lanctot - thanks for the quick reply. I'll fixup the notation, and yes - captures are usually indicated by a , e.g. 6/4 indicates a piece captured on the 4 spot. From a quick look at the code, will also need to make this change across backgammon_test.cc as well - let me know if there's anywhere else I should look.

Will also look at an implementation of pubeval - any guidance on where this best fits in the framework would be appreciated - would it be a "bot"?

Would be good to start a new thread on Windows porting - would you see this as something which would run natively under Windows, or running in an environment like Cygwin? Win10 now has the Windows Subsystem for Linux so that might be another option of getting this running. https://www.howtogeek.com/424886/windows-10s-linux-kernel-is-now-available/

It's been a long time since I did any proper C++ coding, so might take me a while to get up to speed, but I'm really keen to get involved and help where I can.

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

Hmm can you clarify why we'd need to change backgammon_test.cc? We should not be comparing those strings anywhere (except maybe the playthrough -- which would need to be regenerated). To be clear, you're only suggesting changing the implementation of backgammon::BackGammonState::ActionToString, right? That's what the tests call.

Yes, a bot would make sense, maybe under a new directory games/backgammon ?

Would love it if someone could get it working natively in Windows without Cygwin. I started with suggesting Cygwin because I assume this is too difficult or too much work without out since we've adopted a Linux/MacOS workflow based on CMake and g++/clang, but would be happy to be proven wrong on this! I would be absolutely blown away if WSL gave enough on top of just the kernel that allowed OpenSpiel to compile on Windows. On that new thread, I'd love to know what kind of workflow typical Windows researchers use-- would be good to start from there.

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

On the backgammon_test front, it depends on whether you'd just like to change the human readable notation, or the actual "SpielMove" representation - not sure what is best for DL. (I have created a backgammon playing AI previously, but it was based on state values not action values, this is one area I'm keen to play with using OpenSpiel.)

For example:
bstate->CheckerMovesToSpielMove({{18, 1, false}, {19, 6, false}})));

should this be:
bstate->CheckerMovesToSpielMove({{18, 17, false}, {19, 13, false}})));

Otherwise will just change the ActionToString, plus some of the comments in backgammon_test for clarity.

On the Windows front, I'll have a play with WSL first as that might be a quick win if it works.

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

Ah yeah, sorry thought you meant only changing the string representation only, not the internal representation of the action encoding. We have this specific encoding for efficiency reasons:

The way we are encoding actions now is documented here: https://github.com/deepmind/open_spiel/blob/316a71f083afdc59fabb2609faa1e6f3c542ed4b/open_spiel/games/backgammon.h#L57

This allows an action space of size 1352. That's already large (this is e.g. the width of the output head of a policy network). If we were to change the encoding to four positions, this would have to be ~26^4 which is roughly 460000, and there would be a lot of unused wasted space. It would be nice if we could either keep the internal representation (since it avoids these redundancies already) or still have a compact encoding that allow users to construct explicit actions using i.e. "15-17"; that one would be ok because it's clearly using a die with two pips, but for example "23 - score" would not be mappable to an explicit action without more context-- maybe this is ok, though since they are only used when paired with states anyway. To avoid this, I did the easiest thing and defined a Checker move to have the specific die being used.

I agree it would be nicer but I'm not sure how to do it cleanly, properly, and efficiently. I might make the legal move generation more complex (which is already complex -- there are so many tricky cases to cover). Happy if you want to try this out, but it's also quite a big change. @locked-deepmind may also have some thoughts on this.

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

Lanctot - thanks, that makes perfect sense, hadn't really considered the impact on action state size. I'll leave the internal representation alone just now and just focus on the user readable text.

Backgammon legal move generation is quite tricky, and your code for this seems pretty slick. I spent a large portion of my time doing this for my previous backgammon training framework, and my code was pretty horrific by the end of it.

One other potential issue I've spotted is the splitting a double roll into two separate turns - this could cause the network to miss potential moves which it would chose if it had 4 moves rather than 2X2. An example would be the board position below with a roll of 1-1

+------|------+
|......|.....o|
|......|.....o|
|......|......|
|......|......|
|......|......|
|      |      |
|......|......|
|......|......|
|o.....|......|
|o.....|......|
|o..o..|......|
+------|------+

ideally here what you should do is move a checker 3 moves from 12-9 to make the point on the 9 spot, and then move 24-23 with the remaining move.

If you are looking at 2 moves at a time you'd never do this, you would probably end up doing (24-23, 24-23) and then (23-22, 23-22)

However, treating the double as a single move is going to cause the number of potential actions to explode.

I think the solution is potentially to treat the first move on a double as a different kind of move to the second move. So the moves for a double six become 661 followed by 660. So at least the network gets some form of clue that it has a second move to come after the first move.

Not sure if this makes any sense, but happy to try and explain it better if not!

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

BTW you can use three back-ticks for getting a fixed font (reposting so it's easier for me to interpret):

+------|------+
|......|.....o|
|......|.....o|
|......|......|
|......|......|
|......|......|
|      |      |
|......|......|
|......|......|
|o.....|......|
|o.....|......|
|o..o..|......|
+------|------+

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

Ah yes, my general answer to this would "the agent should learn the high-level bahavior".

One way to do that would be to include whether you're on the second turn of a double-move in the observation (rather than augmenting the actions), which can be a single bit. The the agent would be able to learn if it sees doubles but that double is off, then it's on the first turn of the doubles, and if it sees 1 in that bit then it's on the second. And a sufficiently smart policy could then plan both moves since it would know that a second one is coming.

I think we're currently not including that information in the observation, because we matched it to the Tesauro paper-- but we probably want to add it, at least as parameter when instatiating the game. What do you think?

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

I think having a bit to encode whether its on the first or second move of a double makes perfect sense, and does make more sense as an observation. I'd probably have it as 1 if you are on the first move of a double, and 0 for all other scenarios (including not a double) as that feels simpler for the agent to understand, but either should work.

All of Tesauro's published work which I've read is based on state values rather than action values, so the problem of the doubles exploding the action values didn't occur - you just consider which of the (huge number of) possible end positions had the highest state value and move to that.

Backgammon has a relatively small state space (10^20) but a huge branching factor - I've seen positions with 6000+ possible branches from a single (double) roll which probably makes it more suited to learning from the state value, but really interested to see if learning from the action values is going to end up with a stronger network.

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

All of Tesauro's published work which I've read is based on state values rather than action values, so the problem of the doubles exploding the action values didn't occur - you just consider which of the (huge number of) possible end positions had the highest state value and move to that.

Right. I only made reference to Tesauro's work to refer to the precise observations (ie. what would be input the the neural network). Right now, we have implemented exactly what is documented in his paper. If we were to add a bit indicating whether or not you are on a double-turn, it would then become inconsistent with his inputs, which is why I'd prefer it to be optional (i.e. a game paremeter flag that you can enable) because that is implementation-dependent.

I am quite curious about what current methods can do with Q-networks, though, and whether the choice of value nets is just the way to go in this game. I know in AlphaZero the Shogi policy net was a distribution over thousands of actions and still worked, but it was huge networks there. I'm curious if DQN could work out-of-the-box on backgammon, and how a Tesauro-style value net-- but with a search based backup operator rather than the Q-learning one, and otherwise everything else identical to the DQN setup-- how well this would do.

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

I'd be really interested in collaborating on that if possible - my MSc data science project was about applying modern DL practices to backgammon, and ended up achieving a pretty strong convolutional value network on a learning framework I built myself, trying to (simply) apply some of the principles from AlphaZero. However there were a number of issues which I didn't manage to resolve, in particular the lack of look ahead search which I believe would have significantly improved the strength of the network. I think that search based backup on a value network would easily surpass the current best programs - although these are close to perfect already, they all have concepts of "outplays" where a human picks a move that on deeper inspection is better than the best move suggested by the neural net.

I had been thinking about starting again from scratch when I came across this framework, hence the interest!

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

OK, back to the main point here - going to look at changing the human readable notation, but don't understand what the "chance outcome" section of ActionToState relates to - this doesn't seem to be actual moves as far as I can tell, but not sure what this is actually showing, example below.

Also, there seems to be a player 1, player -1 (which I'd expect), but occasionally Player 0 pops up.

The rest of the changes in actionToString are straightforward, I'll also add an "Off" to the PositionToString function to represent pieces which have been beared off.

player -1
sampled outcome: chance outcome 8 (roll: 26)
State: 
+------|------+
|o...x.|x....o|
|o...x.|x....o|
|o...x.|x.....|
|o.....|x.....|
|o.....|x.....|
|      |      |
|x.....|o.....|
|x.....|o.....|
|x...o.|o.....|
|x...o.|o....x|
|x...o.|o....x|
+------|------+
Turn: o
Dice: 26
Bar:
Scores, X: 0, O: 0

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

@jamesdfrost : ok, great-- I will keep you posted if we do any work on this front! Would be fun to collaborate.

You should not need to change anything on how the action gets mapped to a string when it's the chance player (kChancePlayerId which is -1, see https://github.com/deepmind/open_spiel/blob/e39f9c2950f990c8c974e4bab4968c8ed6ef0638/open_spiel/spiel.h#L38) only when it's one of the players (0 or 1).

In backgammon: Player 0 is the X player, player 1 is the O player. See here: https://github.com/deepmind/open_spiel/blob/e39f9c2950f990c8c974e4bab4968c8ed6ef0638/open_spiel/games/backgammon.h#L46

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

Ahhh player (-1) is the dice, sorry should have spotted that... D'oh!

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

OK, I've got the new code for the human readable moves, but I've got an issue with displaying whether a piece is "hit". In the old proc, there was code to do this using the following bools:
cmoves[0].hit
cmoves[1].hit

However, these flags don't seem to be getting set. Within DoApplyAction the following code is definitely working:

bool first_move_hit = ApplyCheckerMove(cur_player_, moves[0]);
bool second_move_hit = ApplyCheckerMove(cur_player_, moves[1]);

not been able to track down where in the flow this is getting missed - will have a look over the weekend but any pointers to where this might be going wrong would be appreciated.

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

Hmm.. I'm a bit confused without seeing the code. It's not clear where the discussion settled.

Are you doing more than modifying backgammon::BackgammonState::ActionToString?

The .hit is being set online in the legal move generation, and I think it's only being used in two places: ActionToString (maybe) and Undo.

Maybe you can put the code into a pull request and we can iterate a few times there? I'm a bit lost on what specifically we agreed on, sorry!

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

I’m only modifying actionTostring currently to display the move in standard backgammon format.

Happy to post the existing changes up tomorrow, but the issue of not recognising the hit exists already in the following code which looks like it is trying to show the hit with a * on it - it’s in the current return function of actionToString:

cmoves[0].hit ? "*" : ""

(And the same for cmoves[1].hit)

These always return false currently, even straight after a hit.

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

Ah yes, I see. The hit information is not really a part of the move, it's just meta-information that is included for certain purposes (such as making the ActionToString nicer and allowing the Undo to put back a stone when it was a hit).

So when the moves are constructed manually, that meta-info can be added. But when the CheckerMoves are obtained by a conversion, i.e. using SpielMoveToCheckerMove, then it won't have that extra meta-information (because there's no difference from the perspective of the internal move representation).

An "easy" way to fix this would be to make a copy the state within ActionToString, then check if the first one is a hit, and set the cmoves[0].hit appropriately. Then apply the checkermove to the copy, and then check if the second one is a hit and set the boolean in cmoves[1].hit. It would make the ActionToString quite heavy, though...

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

Lanctot, thanks for your help on this - that works well. I have now modified the ActionToString function to make the following changes:

  • Always show the numbering going from Bar->24->0->Off, irrespective of which player is moving.
  • Show the start position followed by end position, instead of start position, number of moves. E.g. 24/22 10/8
  • Show hits with an asterisk, e.g. 9/7*.
  • Order the moves by highest number first, e.g. 22/7 10/8 not 10/8 22/7. Not an official requirement, but seems to be standard convention.
  • Show duplicate moves as 10/8(2) instead of 10/8 10/8.
  • Show moves on a single piece as 10/8/5 not 10/8 8/5

The code is a lot bigger than previous, as it needs to cope with the rules above - hopefully still quite efficient, but its certainly heading towards "quite heavy". I'm presuming this function is only called when you want to display human readable content rather than in training but if not it may be worth having the option to switch it off.

Have loaded this file into my fork of the code, but think I have mucked up the PR for this, so looks like it has merged into my previous PR. Sorry, not really sure what I'm doing with Github, off to look for a training video on this...

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

Ok, great. I'm not worried about the complexity of ActionToString, but yes please let's start another separate PR for this because I've already imported your Windows docs for that one, and it will be closed on our next update.

Also:

  • for some reason, the diff is showing deleting the entire file and then a new file? Should only be the ActionToString? I guess this will be fixed in the separate PR.
  • (if you have not already) Can you add an explicit test to backgammon_test.cc that checks the ActionToString and covers the various corner cases? Does not need to be too thorough.

(btw, I've never done a github PR myself.. good luck, but yeah as you say, probably a lot of online help :))

Thanks!
Marc

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

yes - will add some checks and re-do the PR.

Sorry, have made a bit of a mess of the branch, as I tried to revert the changes I made and ended up deleting the backgammon.cc file, then tried to re-upload the original version and finally decided to stop and get some advice from someone who knows what they are doing here! I'll probably reset the branch and start again if I can work out how to do that.

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

@jamesdfrost can you leave this branch intact? I need it for when we push the Windows install instructions.

Can I suggest something potentially easier? Copy your current backgammon.cc file to $HOME. Then move your open_spiel dir to open_spiel_windows, then reclone open_spiel from scratch, then create the PR branch as usual (as you did it for the first time) then copy your modified backgammon.cc from $HOME?

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

(And it's not a problem to leave that branch a mess because I imported before you added the backgsmmon stuff so it should only take the initial commit)

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

I've left the original branch as it was, as was concerned about making things worse! Instead have made a new branch with the updated files backgammon.cc and backgammon_tests.cc - have added the tests as per your suggestion, hope this works for you.

from open_spiel.

jamesdfrost avatar jamesdfrost commented on May 10, 2024

Right - following the import of my other PR I've blown everything away and started again - starting to get my head round how this whole GitHub thing works (with a little help from a very good friend who knows this stuff). So there's a new PR for this notation, which should definitely be OK!

Sorry for all the hassle.

from open_spiel.

lanctot avatar lanctot commented on May 10, 2024

Closing this one now. PR will be merged (and closed) on our next update (tomorrow).

from open_spiel.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.