# Frontier Software

Robert Laing's programing notes

# Game Trees

By Robert Laing

$$P(s'|s, a)$$

## Rewards

Discount factor γ is a value between 0 and 1, similar to interest or inflation rates (1 - r) used in present and future value calculatons in finance.

When γ = 1 (ie effectively not there) the formulat is called “additive”.

$$U_h([s_0, s_1, s_2, \ldots, s_n]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots + \gamma^n R(s_n)\\ = \sum_{x=0}^n \gamma^x R(s_x)$$

If R(s_n) is the same (ie a constant) for all ns, the formula for the sum of a finite geometric series applies:

$$\sum_{x=0}^n \gamma^x R = \frac{(1 - \gamma^n)}{(1 - \gamma)} R$$

If the horizon is infinite, n = ∞, the standard formula for an infinite geometric series turns this to:

$$U_h([s_0, s_1, s_2, \ldots]) = \frac{R_{max}}{(1 - \gamma)}$$
$$\sum_{x=1}^n x = \frac{n(n+1)}{2}$$

## Bellman Equation

https://en.wikipedia.org/wiki/Bellman_equation

### Objective Function

MIT OpenCourseWare lecture on Monte Carlo Tree Search

Probabilistic analysis of game trees

### 2. Expansion

#### Upper Confidence Bound

$$ucb1(state, action) = expand + explore$$

expand is the average result from past times an action has been taken from a given state:

$$expand = \frac{wins_{state, action}}{visits_state}$$

explore biases the choice to less explored p(state, action)

### 4. Backpropagation

https://artint.info/html/ArtInt_62.html

These are notes on the basics of game trees and how to use them in Prolog.

As per tradition, I’ll use Tic Tac Toe as the simplest example. Instead of starting with a blank board, it’s best to start with examples with obvious answers: It is X to move. For a human player, it’s fairly obvious X must mark the bottom left corner (the rightmost option in the above diagram), else O will win in the following turn.

To an automaton, however, this involves a state-space search.

## Graphviz

The above types of diagrams are commonly known as digraphs (a contraction of directed graph). The best way to represent these textually and convert them into drawings is probably graphviz’s dot. I used inkscape to create the above since I wanted drawings of the tic tac toe boards, and battle to get graphviz’s automated layout to keep things as I want. The left-to-right order of the states above are not arbitrary: since the textual representation of the states is row-column-mark and sets in software are invariably alphanumerically ordered lists, that’s the likely order of exploration.

Abstracting the full game tree for the above tic-tac-toe example by using lower case alphabet letters for the states and integers for the edges, I get the following diagram: The code to produce this by running dot -Tsvg ttt3.dot -o ttt3.svg looks like this:

digraph game_tree {
node [shape=circle, ordering=out];
f, h [shape=doublecircle, color=red];
k [shape=doublecircle, color=blue];
l [shape=doublecircle];
a -> b [label=1];
a -> c [label=2];
a -> d [label=3];
b -> e [label=4];
b -> f [label=5];
c -> g [label=4];
c -> h [label=5];
d -> i [label=4];
d -> j [label=5];
e -> k [label=6];
j -> k [label=8];
g -> l [label=6];
i -> l [label=7];
}


A couple of points to note: each state or node name is unique — there can’t be two a or whatever nodes in the diagram.

The labels (representing moves) can appear more than once in the overall graph, but not from a given state because a graph tree is a deterministic finite automata (DFA), as opposed to nondeterministic finite automata (NFA) which we mainly find in natural language parsing where the same label — eg the word spring might be a season, mechanical device, geographic feature… — can lead to different states, requiring them to be bundled into sets to convert NFAs to DFAs. Mercifully, we don’t have the same ambiguities in games, so can ignore NFAs.

States and labels have different types, illustrated by using alphabet letters as states and integers as labels. I call labels moves, and many textbooks call them input symbols.

In a game tree, some terminals are desirable and others to be avoided — as I’ve attempted to illustrate by colouring O victories red, X victories blue, and draws black.

## Prolog

Digraphs are easily translated into Prolog.

move(a, 1, b).
move(a, 2, c).
move(a, 3, d).
move(b, 4, e).
move(b, 5, f).
% move(b, 9, a). % cycle
move(c, 4, g).
move(c, 5, h).
move(d, 4, i).
move(d, 5, j).
move(e, 6, k).
move(g, 6, l).
move(i, 7, l).
move(j, 8, k).

win(k).
draw(l).
lose(f).
lose(h).


## Route finding

Prolog textbooks tend to focus on relation(Parent, Child) tuples rather than arc(Parent, Label, Child) triples used to describe automata, which is a pitty since automata are the basis of programming language compilers and lots else.

Since the various graph traversal techniques use states rather than arcs (and how they were initally connected gets lost), these predicates are needed to reasamble the path of inputs signals (which I think is the origin of the jargon term string).

connected([Parent|Parents], [Child|Children], Moves, Acc) :-
move(Parent, Move, Child), !,
connected(Parents, [Parent, Child|Children], [Move|Moves], Acc).

connected([_|Parents], Children, Moves, Acc) :-
connected(Parents, Children, Moves, Acc).

connected([], _, Acc, Acc).


I’m jumping to the type of graph traversal best suited to game trees — iterative deepening with depth first — and then digressing into depth first vs breadth first at the end.

depthlimited(Limit, Start, [R|Path]) :-
depthlimited_(Limit, [depth(0, Start)], [], [R, L|ReversedPath]),
connected(ReversedPath, [L], [], Path).

depthlimited_(_, [depth(_, State)|_], ReversedPath, [lose, State|ReversedPath]) :-
lose(State).

depthlimited_(_, [depth(_, State)|_], ReversedPath, [draw, State|ReversedPath]) :-
draw(State).

depthlimited_(_, [depth(_, State)|_], ReversedPath, [win, State|ReversedPath]) :-
win(State).

depthlimited_(Limit, [depth(Limit, _)|Frontier], ReversedPath, Acc) :-
depthlimited_(Limit, Frontier, ReversedPath, Acc).

depthlimited_(Limit, [depth(N, Parent)|Frontier0], ReversedPath, Acc) :-
Limit > N,
succ(N, M),
getchildren(Parent, Children0),
findall(depth(M, Child), member(Child, Children0), Children1),
append(Children1, Frontier0, Frontier1),
depthlimited_(Limit, Frontier1, [Parent|ReversedPath], Acc).


This guards against infinite cycles for depth first searches, and also helps find and prune losing moves early.

?- depthlimited(1, a, Path).
false.

?- depthlimited(2, a, Path).
Path = [lose, 1, 5] ;
Path = [lose, 2, 5] ;
false.

?- depthlimited(3, a, Path).
Path = [win, 1, 4, 6] ;
Path = [lose, 1, 5] ;
Path = [draw, 2, 4, 6] ;
Path = [lose, 2, 5] ;
Path = [draw, 3, 4, 7] ;
Path = [win, 3, 5, 8] ;
false.


## Iterative Deepening

iterative_deepening(Start, Path) :-
iterative_deepening_(0, Start, Path).

iterative_deepening_(N, Start, Path) :-
depthlimited(N, Start, Path).

iterative_deepening_(N, Start, Path) :-
\+depthlimited(N, Start, Path),
succ(N, M),
iterative_deepening_(M, Start, Path).


This has a snag in that it quits after finding the terminals in paths 1 and 2, abandoning 3 too early.

?- iterative_deepening(a, Path).
Path = [lose, 1, 5] ;
Path = [lose, 2, 5] ;
false.


A way around that is to split the problem into a subtree for each of the initial moves:

?- maplist(iterative_deepening, [b, c, d], Paths).
Paths = [[lose, 5], [lose, 5], [draw, 4, 7]] ;
Paths = [[lose, 5], [lose, 5], [win, 5, 8]] ;
false.


Dividing the problem like this opens the way to two big advantages:

1. It opens the way to using modern multicore hardware which is ready backed into SWI Prolog with concurrent_maplist(:Goal, +List1, +List2)
2. Bad moves can be pruned early on to focus on more promssing moves.
:- use_module(library(thread)).

...

?- concurrent_maplist(iterative_deepening, [b, c, d], Paths).
Paths = [[lose, 5], [lose, 5], [draw, 4, 7]].


## Finding the best move

filtermoves([], [], Acc, Acc).
filtermoves([_|Moves], [[lose|_]|Nexts], Filtered, Acc) :- !,
filtermoves(Moves, Nexts, Filtered, Acc).
filtermoves([Move|Moves], [_|Nexts], Filtered, Acc) :-
filtermoves(Moves, Nexts, [Move|Filtered], Acc).

bestmove(State, Move) :-
findall(Next, move(State, _, Next), Nexts),
concurrent_maplist(iterative_deepening, Nexts, Paths),
findall(Opening, move(State, Opening, _), Openings),
filtermoves(Openings, Paths, [], [Move|_]).


In this example, where there is only one best move, the above code works fine:

?- bestmove(a, Move).
Move = 3.


# Graph traversal basics in Prolog

## Transitive Closure

The “Hello World” of Prolog textbooks is something called a transitive closure, enabling it to search graphs in a few lines of code such as this:

transitive_closure(Start, Path) :-
transitive_closure_(Start, [], RPath),
reverse(RPath, Path).

transitive_closure_(S0, Ms, [lose,M|Ms]) :-
move(S0, M, S1),
lose(S1), !.

transitive_closure_(S0, Ms, [draw,M|Ms]) :-
move(S0, M, S1),
draw(S1), !.

transitive_closure_(S0, Ms, [win,M|Ms]) :-
move(S0, M, S1),
win(S1), !.

transitive_closure_(S0, Ms, Acc) :-
move(S0, M, S1),
transitive_closure_(S1, [M|Ms], Acc).


Using Prolog’s ! cut after finding the first terminal helps it see selections 1 and 2 will lead to loses, whereas 3 will probably lead to a draw.

?- transitive_closure(a, Path).
Path = [1, 5, lose] ;
Path = [2, 5, lose] ;
Path = [3, 4, 7, draw] ;
Path = [3, 5, 8, win].


If the ! is omitted, the result is:

?- transitive_closure(a, Path).
Path = [1, 5, lose] ;
Path = [1, 4, 6, win] ;
Path = [2, 5, lose] ;
Path = [2, 4, 6, draw] ;
Path = [3, 4, 7, draw] ;
Path = [3, 5, 8, win] ;
false.


## Listifying searches

The default Prolog search method does a lot of stuff automagically under the hood. When we progress from abstract game trees to ones which are generated on the fly and are too large to fit into RAM, we need to get back to basics, which involves listifying graphs.

## Spanning the tree

To keep things simple, initially I’m going to traverse the entire tree:

getchildren(Parent, Children) :-
findall(Child, move(Parent, _ , Child), Unsorted),
sort(Unsorted, Children).

depthfirst(Start, Path) :-
depthfirst_([Start], [], ReversedPath),
reverse(ReversedPath, Path).

depthfirst_([], ReversedPath, ReversedPath).

depthfirst_([Parent|Frontier0], ReversedPath, Acc) :-
getchildren(Parent, Children),
append(Children, Frontier0, Frontier1),
depthfirst_(Frontier1, [Parent|ReversedPath], Acc).

?- depthfirst(a, Path).
Path = [a, b, e, k, f, c, g, l, h, d, i, m, j, n].


A danger with depth first search is cycles. Modifying the graph to something like below will put the search into an infinite loop. There are two ways of guarding against cycle traps: by keeping a history list of visited nodes, then using something like memberchk(?Elem, +List) to skip these, or setting a depth limit (often called a boundary in textbooks), which is the approach I’ll follow later.

The only difference between depth first and breadth first is append(Children, Frontier0, Frontier1) is changed to append(Frontier0, Children, Frontier1) — instead of stacking fresh nodes on the frontier, they are queued.

While the coding change is trivial, the change to performance can be enormous, especially for seventies languages such as Prolog, Lisp and ML where finding the end of lists involves laboriously traversing them each time. This is a throwback to the limited memory of the day, and many of these languages have been re-implemented to make appending at the end of long lists less honorous, but it’s best to test implementations.

Traversing the entire graph breadth first produces:

?- breadthfirst(a, Path).
Path = [a, b, c, d, e, f, g, h, i, j, k, l, m, n].


## Searching for goals

### Depth First

I’m going to replace depthfirst_([], ReversedPath, ReversedPath). with:

depthfirst_([State|_], ReversedPath, [lose, State|ReversedPath]) :-
lose(State).

depthfirst_([State|_], ReversedPath, [draw, State|ReversedPath]) :-
draw(State).

depthfirst_([State|_], ReversedPath, [win, State|ReversedPath]) :-
win(State).


Note I’ve omitted the ! cuts, else only the first result would be returned.

This yields:

?- depthfirst(a, Path).
Path = [a, b, e, k, win] ;
Path = [a, b, e, k, f, lose] ;
Path = [a, b, e, k, f, c, g, l, draw] ;
Path = [a, b, e, k, f, c, g, l, h, lose] ;
Path = [a, b, e, k, f, c, g, l, h, d, i, m, draw] ;
Path = [a, b, e, k, f, c, g, l, h, d, i, m, j, n, win] ;
false.


An important thing to note with the above is that the first solution gives the misimpression going from a to b is a winning move for X, failing to see it would open O to win with f. One of the flaws with depth first search is it often gives wrong answers because it works left and down, a problem breadth first doesn’t have.

?- breadthfirst(a, Path), print(Path).
Path = [a, b, c, d, e, f, lose] ;
Path = [a, b, c, d, e, f, g, h, lose] ;
Path = [a, b, c, d, e, f, g, h, i, j, k, win] ;
Path = [a, b, c, d, e, f, g, h, i, j, k, l, draw] ;
Path = [a, b, c, d, e, f, g, h, i, j, k, l, m, draw] ;
Path = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, win] ;
false.


Breadth first has the advantage of seeing a -> b leads to disaster for X first. Also, a cycle doesn’t hang the computer, though it will cause it to offer neverending solutions after the correct ones above:

Path = [a, b, c, d, a, e, f, g, h, i, j, b, c, d, k, l, m, n, a, e, f, lose] ;
...

Last updated on 12 Jan 2021
Published on 12 Jan 2021

Content Footer