Backgammon AI: Exploration of Implementation Logic and Relevant Background (below)

foreword

In the previous article, we agreed on a way to measure the value of the lattice, as shown in the table below.

Combined value ranking	Value of your own side	Enemy value	Corresponding bonus values
1	Lv1	?	\(2^{20}\)
2	?	Lv1	\(2^{16}\)
3	Lv2	?	\(2^{12}\)
4	？	Lv2	\(2^{8}\)
5	Lv3	？	\(2^{4}\)
6	Lv4	？	\(2^{0}\)

In the table, different reward values are designed for different scenarios, which are mostly empirical formulas, human-estimated values and not the best values. Similarly, in addition to the first two categories in the above table, the rest can be further subdivided into weights according to the actual situation, a sample example is given here for your reference/understanding:

Combined value ranking	Value of your own side	Enemy value	Corresponding bonus values
3.1	Lv2	Lv2	\(2^{13}\)
3.2	Lv2	Lv3	\(2^{12}\)
3.3	Lv2	Lv4	\(2^{11}\)

Again, it can constitute a kill move (Lv2 rank), and it's naturally better to be able to block the opposite side of a kill move/excellent position in the meantime.

A detailed weighting table is given in the appendix

In this post we will discuss how to make AI learn reward values based on genetic algorithms.

Overview of genetic algorithms

Genetic Algorithm (GA) is an optimization algorithm that simulates natural selection and genetic mechanisms. It is used to find the optimal solution of a problem, and is particularly suitable for complex optimization problems and search problems. Genetic Algorithm is based on Darwin's theory of natural selection and improves the solution step by step by simulating the process of biological evolution.

The basic steps of the genetic algorithm are as follows:

Initialization: creating an initial population where each individual in the population (often called a chromosome or solution) is a potential solution to the problem.
Evaluation: Calculate the fitness value of each individual in the population, this is usually done through an objective function where the fitness value indicates the individual's superiority or inferiority.
Selection: individuals are selected for reproduction based on fitness values, with superior individuals having a higher probability of being selected to generate the next generation of the population.
Crossover (Crossover): generates a new individual by swapping some of the genes of two individuals. This step mimics the mating process of organisms and can generate new solutions.
Mutation: randomly changing genes in some individuals to introduce new genetic variants. This step helps the algorithm to jump out of the local optimum and increase the diversity of solutions.
Replacement: replacing a portion of individuals with the best individuals, retaining the best genes so that there is no decline or oscillation in the phenotype of the population.
Termination: determine whether the algorithm satisfies the termination conditions, such as reaching the maximum number of iterations or finding a good enough solution. If the condition is satisfied, the algorithm terminates and returns the optimal solution; otherwise, return to step 2.

Genetic Algorithm Implementation Ideas

initialization

The AI decision scheme designed in this paper contains a total of 12 parameters, 11 of which are reward weights\(R_i\), 1 is acceptance of inferior options\(K\)。

We can define\(N\)The individual intelligences, respectively, are initialized with initial weights and, in general, the\(N\)You can take 10~100, it is better to choose an even number, otherwise there will be some unnecessary trouble.

The initialization process can be expressed mathematically as:

\[W_i^{t=0} = W_0 \]

Among them.\(W_0\)denotes the initial weight.\(W_i^{t=0}\)denote\(t\)thirteenth meeting of the sixty-fifth session of the General Assembly\(i\)Individuals.

valuation

In this example, the method of letting the AI play the game is used to evaluate the AI score based on the AI's performance in the game, and the specific process is as follows:

Generate a random permutation from 1 to N and assign it to the AI in order
Combine the AIs with serial numbers 1, 3, 5,... AI of serial numbers 1,3,5,... AIs of the numbers 2, 4, 6,...
Record the results of the game into the AI score sheet.
Completion\(N_R\)The round is played, and if it is not completed, the game returns to 1.
Ranking the AI.

crossover

When the ranking is completed, let the AIs in the bottom 50% and the AIs in the top 50% be combined two by two with the following mathematical equation

\[\begin{align*} W_{i}^{t+1}&=W_i^t\times(1-c)+W_{i-50}^t\times c, &N/2 \leq &t \leq N \\ W_{i}^{t+1}&=W_i^t\times(1-c)+W_{i+50}^t\times c, &0 \leq &t \leq N/2 \end{align*} \]

Among them:\(c\)is the learning factor (crossover rate), which indicates how well the AI accepts new knowledge (weights) during the learning process.\(c\)larger, the more the AI tends to accept the new weights.\(c\)The smaller it is, the more the AI tends to keep the old weights. Crossover rate\(c\)in one move you can take it\(0.01\sim0.3\)

interchangeability

First define the locally optimal individual and the globally optimal individual.

local optimum\(W_b^t\): An individual is said to be locally optimal if it ranks first in the round in terms of overall performance (most wins).
global optimum\(W_B\): When only one round of iterations is performed, the globally optimal individual is equal to the locally optimal individual, i.e:\(W_B=W_b^{t=0}\).. When more than one game has been played, the new locally optimal individuals are compared to the globally optimal individuals\(N_R\)rounds, if the global optimal individual wins, it remains the global optimal individual, and if it loses, the local optimal individual becomes the new global optimal individual. This can be expressed mathematically as:

\[W_B= \begin{cases} W_b^t &\text{if}\;W_b^t \;\text{win},\\ W_B &\text{otherwise}. \end{cases} \]

In order to retain the optimal traits, some of the bottom-ranked individuals are replaced with the globally optimal individuals, and the replacement rate is written as\(s\)The general take\(0.02\sim 0.1\)

variation

During mutation, an individual's genes are randomly altered. Define coefficient of variation\(m\), the extent to which it has absolutely mutated, in general\(m\)The scope of the\(0.01\sim0.1\)The mathematical formula is as follows:

\[W_{i,j}^{t}=W^t_{i,j}\times (1+m_j) \]

included among these\(W_{i,j}^{t}\)denote\(t\)thirteenth meeting of the sixty-fifth session of the General Assembly\(i\)Individuals of the first\(j\)weights.\(m_j\)Yes, it is.\((-m,m)\)within the random number.

Process Summary

The flow of genetic algorithm learning is given below

Initializing populations
Create a game where individuals play against each other, score points and rank them.
Determines whether the stop condition is reached, and if not, continues.
Match individuals two by two for crossover operations in order of ranking
Replace the bottom-ranked individuals with locally optimal and globally optimal individuals, respectively
Perform mutation operations
Go to step 2

appendice

Behavioral Priorities

Lv1: Play to win outright, or in one turn.
Lv2: Lower in a high probability to win in a number of turns.
Lv3: Ability to force the opponent to keep defending.
Lv4: Lower returns.

Initial weights table

Combined value ranking	Value of your own side	Enemy value	Corresponding bonus values
1	Lv1	?	\(2^{20}\)
2	?	Lv1	\(2^{16}\)
3.1	Lv2	Lv2	\(2^{13}\)
3.2	Lv2	Lv3	\(2^{12}\)
3.3	Lv2	Lv4	\(2^{11}\)
4.1	Lv3	Lv2	\(2^{9}\)
4.2	Lv4	Lv2	\(2^{8}\)
5.1	Lv3	Lv3	\(2^{6}\)
5.2	Lv3	Lv4	\(2^{4}\)
6.1	Lv4	Lv3	\(2^{2}\)
6.2	Lv4	Lv4	\(2^{0}\)

Description of symbols

notation	significance	Numerical range
\(W\)	Individual (weight)	-
\(R\)	Rewards for action	-
\(K\)	Acceptance of inferior options	-
\(N\)	population size	10~100
\(N_R\)	Number of rounds played at the time of the assessment	10~100
\(T\)	Number of iterations	20~500
\(c\)	crossover rate	0.01~0.03
\(s\)	replacement rate	0.02~0.1
\(m\)	variation rate	0.01~0.1