Backgammon AI: Exploration of Implementation Logic and Relevant Background (Top)

preface

This collection will detail how to implement a swarm-only genetic algorithm based backgammon AI, using C++ as the underlying programming language.
This post will briefly discuss the implementation ideas and expand on each of them in the subsequent text

Understanding Backgammon

Backgammon Rules

Backgammon is a classic board game with simple yet strategic rules. The game is played on a 19×19 board (13×13 or 15×15 boards can also be used). The goal is to be the first to connect five pieces of the same color on the board (horizontally, vertically or diagonally).

Ground rules:

Pieces: The game uses two colors of pieces, usually black and white.
Drop: Players take turns placing their pieces on the board.
Victory condition: The first player to connect five pieces in a straight line (horizontally, vertically or diagonally) wins.

Backgammon is simple to play and the rules are easy to understand, but it requires a high level of strategy and skill to win the game.

How do human players play backgammon?

Here are some ideas for quintet matchups:

Control center area

Importance of Center Position: Control of the center of the board is crucial to the game. Controlling the center area gives you more chances to create and block your opponent's five-piece lines.

Creating Threats

Connected Threat: Try to make it so that the other team has to play defense and can't focus on their own offense.
Double live three: if two triple links are formed and these are not easily blocked by the opponent, you are able to win within a few moves.

Defending the opponent's connection

Observe your opponent's piece layout: pay attention to the arrangement of your opponent's pieces, especially the three- and four-disc lines that your opponent is trying to form.
Timely blocking: If the opponent has a consecutive three- or four-pawn alignment, priority should be given to blocking the opponent's line.

anticipate the opponent's strategy

Guessing the opponent's intention: understanding the opponent's strategy, predicting the opponent's next move, and making corresponding defense or attack in advance.

How should AI be imitated?

In order for an AI player to learn to play backgammon and even surpass the level of a human player, the following steps should be taken first:

Understanding board information: Converting the state of the board into a format that a program can process. This usually involves encoding the state of each position on the board (e.g., blank, black, or white) into a specific data structure that can be analyzed and processed by the program.
Setting the set of behaviors: defining the range of operations that the AI can perform. In backgammon, the AI can drop a piece at any unoccupied position on the board.
Setting the decision mode: determining how the AI makes decisions. In this example, the AI uses a greedy strategy, which means that it chooses the action with the highest expected payoff in each move. The greedy strategy selects the action that is most favorable to the current situation by evaluating the immediate payoff of each possible drop position.

Understanding board information

Theoretically, the more information that can be given to the AI, the higher the quality of the decisions that the AI will make, and the information about the board can be evaluated in terms of squares, and the value of that square to your side, and to the enemy.

For example, if the enemy can form a pentomino here, then this is a very valuable square for the place, so on your turn it is imperative that you land here to prevent your opponent from winning, unless you land elsewhere where you can win.

In this regard, we can score each viable position on the board and assess its value to our side and the enemy.

How do you define the value of the position to your side?

A piece can be connected to other pieces in four directions to form a quintet, namely: horizontal, vertical, diagonal, main diagonal

The value in a specific direction can be judged using the following methods

Choose one of the four directions
Find 4 squares in the positive direction and 4 squares in the negative direction, stopping early if you encounter a space or an enemy piece.
Count the number of your side's pieces and how well the ends are covered.

For example, for the diagram below, after dropping the disc at that spot, it forms a horizontally oriented two discs in a row with one end blocked and one end unblocked

In total, the following scenarios are possible, and we can assess their value based on empirical formulas.

antennae	1 son	2 sons	3 sons	4 sons	5 sons
commit a gaffe	③	③	②	①	MAX
cover up one end	-	-	③	②	①
stop at both ends	-	-	-	-	①

'-' indicates that the position has a low value in that direction and is not considered. However, if the situation is more favorable in the other direction, then the value of the location may still be very high.

Next, we can discuss the value of pieces in multiple directions, and in general, only the two highest-value directions need to be considered.

This is because two live threes (unobstructed triplets in a row) are enough to win, and three live threes are not a clear advantage.

fig. values (ethical, cultural etc)	optimal direction	suboptimal direction
Lv1	MAX	?
Lv1	①	?
Lv2	②	②
Lv3	②	③
Lv3	②	-
Lv4	③	③
Lv4	③	-
Lv4	-	-

"?" refers to arbitrary cases, e.g. (MAX-?) corresponds to (MAX-MAX), (MAX-①), (MAX-②), (MAX-③), (MAX-'-')

This definitional approach prioritizes behaviors into four levels

Lv1: Winning outright in the next round, or in one round.
Lv2: Down in the probability of winning in a number of rounds.
Lv3: Being able to force the opponent to defend all the time.
Lv4: Lower returns.

How do you synthesize offense and defense?

In order to synthesize attack and defense, the value of a specific position to the enemy must be taken into account. If a position is of high value to the enemy, then our moves here can disrupt the enemy's formation and weaken the enemy's value, and again we can give a table of values as follows

Combined value ranking	Value of your own side	Enemy value	Corresponding bonus values
1	Lv1	?	\(2^{20}\)
2	?	Lv1	\(2^{16}\)
3	Lv2	?	\(2^{12}\)
4	？	Lv2	\(2^{8}\)
5	Lv3	？	\(2^{4}\)
6	Lv4	？	\(2^{0}\)

"?" refers to arbitrary situations, e.g. (Lv1-?) corresponds to (Lv1-Lv1), (Lv1-Lv2), (Lv1-Lv3), (Lv1-Lv4)
When making judgments, they should be made one by one, from top to bottom.

The bonus values given here are for reference only.

summarize

In this subsection, we carefully construct a new comprehensive evaluation method, aiming to comprehensively measure the strategic value of each square on the chessboard for both our own side and the enemy. By designing a series of fine-grained quantitative metrics, we endow the AI/computer with the ability to deeply interpret the chessboard pattern, enabling it to accurately determine the specific value of each square. This approach provides solid data support for the AI/computer to make decisions.

Behavioral set prescriptions and decision-making

In order for the AI to make efficient decisions, we first need to define a suitable and concise set of behaviors. This means that the AI does not have to consider all the positions on the board one by one each time when making a choice. Based on this, we need to develop a strategy that helps the AI filter out the most appropriate one from the many possible decisions. In this way, the AI is able to make the best decision quickly and accurately in a complex environment.

ROI Region of Interest

If you landed in the center of the board in the previous round, you should not land in the far corner of the board in the next round.

In general, only bits that are neighbors to existing pieces (either your own or the enemy's) have value when they are dropped, and first we define neighbors.

Considering a situation where there is only one piece on the board, the following areas are planned to be "linked" to it with high value:

A more specific definition could be given:

If a lattice lies in the horizontal, vertical, main diagonal or sub-diagonal direction of another lattice and the distance between the two lattices is less than or equal to 2, then the two lattices are said to benext toRelationship. Further, a distance of 1 is said to be a strong neighbor and a distance of 2 is said to be a weak neighbor.

Further, we define the region of interest as follows:

A space subsection that meets one of the following requirements is a region of interest:

The grid is a chessboard.epicenter。
The lattice is associated with at least oneyour own sideExistence of the grid on which the piece is placednext toRelationships.
The lattice is associated with at least onethe enemyExistence of the grid on which the piece is placedstrong adjacencyRelationships.

The following figure gives an example of the ROI region for your side's drop, where red is your side's piece, blue is the opponent's piece, and gray indicates the region of interest.
description

Decision-making

Before making a decision, we can evaluate the value of all grids in the region of interest, assuming that the number of grids in the ROI is\(N\)The values of the grids are\(x_0, x_1, ..., x_{N-1}\)We can use the following two methods to choose a decision

hardmax

Choose the decision with the greatest reward, i.e.

\[h(\mathbf{x}) = \arg\max_{i} x_i \]

Softmax hardmax
Unlike hard maxima, soft maxima accept non-optimal solutions with a certain chance, and they contain a constant\(K\), a larger constant K indicates greater acceptance of low-value decisions when the constant\(K\to 0\)When the soft maximum is degraded toHard Max; when the constants\(K\to +\infty\)When the soft maximum is degraded torandomly selected。

\[\text{softmax}(x_i) = \frac{e^{x_i/K}}{\sum_{j=1}^N e^{x_j/K}} \]

concluding remarks

In the next post we will continue to discuss how to train AI.

fig. values (ethical, cultural etc)	optimal direction	suboptimal direction
Lv1	MAX	?
Lv1	①	?
Lv2	②	②
Lv3	②	③
Lv3	②	-
Lv4	③	③
Lv4	③	-
Lv4	-	-

fig. values (ethical, cultural etc)	optimal direction	suboptimal direction
Lv1	MAX	?
Lv1	①	?
Lv2	②	②
Lv3	②	③
Lv3	②	-
Lv4	③	③
Lv4	③	-
Lv4	-	-

fig. values (ethical, cultural etc)	optimal direction	suboptimal direction
Lv1	MAX	?
Lv1	①	?
Lv2	②	②
Lv3	②	③
Lv3	②	-
Lv4	③	③
Lv4	③	-
Lv4	-	-