
Meditations on probability theory: quantitative rules

Popularity:931 ℃/2024-10-17 17:05:31

The TOEFL project has finally come to an end, and 20 days of last month were completely wasted (# ̄~ ̄#), so I'll continue updating the Probability Theory Meditations column and make up for my earlier delays (while studying Japanese on the side, so cheers to myself!).

Introduction

Probability theory is nothing more than common sense expressed as a mathematical formula.

--Laplace (1819)

In our last blog post, Meditations on Probability Theory: Plausible Reasoning, we introduced the conditions that plausible reasoning must satisfy[1][2], namely:

\[\begin{aligned} (Ⅰ) & \space \text{Degrees of plausibility are represented by real numbers (numerical condition).} \\\\ (Ⅱ) & \space \text{Qualitative correspondence with common sense (common-sense condition).} \\\\ (Ⅲ) & \space \text{Consistency (consistency condition).} \end{aligned} \]

Of these, the consistency condition \((Ⅲ)\) in turn encompasses the following three specific meanings:

\[\begin{aligned} & (Ⅲ\text{a}) \space \text{If a conclusion can be reached in more than one way, every possible way must give the same result (path independence).} \\\\ & (Ⅲ\text{b}) \space \text{The robot always takes into account all the evidence relevant to a question and does not arbitrarily ignore information (non-ideology).} \\\\ & (Ⅲ\text{c}) \space \text{The robot always represents equivalent states of knowledge by assigning the same plausibilities (full homogeneity).} \end{aligned} \]

The above conditions are qualitative. As we will see in this post, none of them is vacuous, and together they are just enough: no more, no less. Once we derive the quantitative rules of plausible reasoning that satisfy these conditions, we will find that we have, in effect, recovered the original definition of probability (multiplication rule + addition rule + principle of indifference).

Conditions \((Ⅰ)(Ⅱ)(Ⅲ\text{a})\) are the "structural" conditions of the robot's brain: they determine the inner workings of the reasoning robot's brain (where "brain" can mean a circuit / neural network / ...), and from them we derive the multiplication rule (product rule) of probability

\[p(AB\mid C) = p(A\mid C)p(B\mid AC)=p(B\mid C)p(A\mid BC) \]

and the addition rule (sum rule)

\[p(A\mid B) + p(\bar{A}\mid B) = 1 \]

where \(p(x)\) is any continuous monotonically increasing function with range \(0\leqslant p(x) \leqslant 1\).

Conditions \((Ⅲ\text{b})(Ⅲ\text{c})\) are the "interface" conditions, which further establish the connection between the reasoning robot and the objective world. In particular, \((Ⅲ\text{c})\) yields the principle of indifference of probability

\[p(A_i\mid B) = \frac{1}{n}, \quad 1 \leqslant i \leqslant n \]

where \(\{A_1, \cdots, A_n\}\) is a mutually exclusive and exhaustive set of propositions, i.e., the background information \(B\) determines that one and only one of them must be true.
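These rules can be sanity-checked numerically even before we derive them. Below is a minimal sketch in Python; the joint table over two propositions \(A, B\) is an arbitrary illustrative assumption, with the background information \(C\) left implicit in the table itself:

```python
# Sanity check: the product rule gives the same p(AB|C) via either path,
# and the sum rule ties p(A|C) to p(not-A|C).  The joint table over the
# truth values of (A, B) is an arbitrary illustrative assumption.
P = {(True, True): 0.30, (True, False): 0.20,
     (False, True): 0.40, (False, False): 0.10}

def p(pred):
    """Plausibility of a predicate over (A, B), given background info C."""
    return sum(w for (a, b), w in P.items() if pred(a, b))

def cond(pred, given):
    """Conditional plausibility p(pred | given, C)."""
    return p(lambda a, b: pred(a, b) and given(a, b)) / p(given)

A = lambda a, b: a
B = lambda a, b: b

# Product rule, both decompositions of p(AB|C):
path1 = p(B) * cond(A, B)   # p(B|C) p(A|BC)
path2 = p(A) * cond(B, A)   # p(A|C) p(B|AC)
assert abs(path1 - path2) < 1e-12
assert abs(path1 - p(lambda a, b: a and b)) < 1e-12
# Sum rule: p(A|C) + p(not-A|C) = 1
assert abs(p(A) + p(lambda a, b: not a) - 1.0) < 1e-12
print("both paths give p(AB|C) =", path1)
```

Both decompositions agree with the joint value read directly from the table, which is exactly the path independence the derivation below will demand.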

Summarized as a mind map below:

Next we look at how the multiplication rule, the addition rule, and the principle of indifference are actually derived from the plausibility conditions.

1 Multiplication rule

The mind map for this subsection is below:

We first look for a rule that relates the plausibility of the logical product \(AB\) to the plausibilities of \(A\) and \(B\) separately, i.e., we seek an expression for \(AB\mid C\). We decompose the robot's process of deciding that \(AB\) is true into successive decisions about \(B\) and \(A\), that is:

  1. Decide that \(B\) is true;

    \(B\mid C\)

  2. Accepting that \(B\) is true, decide that \(A\) is true.

    \(A\mid BC\)

(Deciding \(A\) first works in the same way.) We have appended after each step the plausibility that corresponds to it.

Let us spell this out in natural language. For the proposition \(AB\) to be true, the proposition \(B\) must be true (by the definition of the logical product), so we need the plausibility \(B\mid C\). Next, to further decide that \(A\) is true, the plausibility we need is \(A\mid BC\), not \(A\mid C\). For if the robot knows that \(B\) is false, then whatever the plausibility of \(A\) may be (i.e., \(A\mid \bar{B}C\)), \(AB\) is certainly false. And once the robot knows \(A\mid BC\), it no longer needs to know \(A\mid C\) (which would add no new information about \(AB\)).

In addition, the robot does not need to know \(A\mid B\) or \(B\mid A\): whatever plausibility \(A\) or \(B\) might have in the absence of the information \(C\) is irrelevant to the robot's judgments when it knows \(C\) to be true. For example, a robot that already knows the Earth is round need not, when judging today's cosmological questions, consider the opinions it might have held (i.e., the extra possibilities) had it not known that the Earth is round.

(Of course, since the logical product is commutative, i.e., \(AB=BA\), we can interchange \(A\) and \(B\) in the above statements and obtain \(BA\mid C=AB\mid C\).) That the robot obtains the same value of \(AB\mid C\) either way is precisely the consistency condition \((Ⅲ\text{a})\): path independence.

Going further, we have the following proposition:

Proposition 1 \(AB\mid C\) is some function of \(B \mid C\) and \(A \mid BC\), i.e.:

\[AB \mid C = F[(B \mid C), (A\mid BC)]\tag{1} \]

(By symmetry, it is equally a function of \(A\mid C\) and \(B\mid AC\).)

If in doubt about the above reasoning, consider the other alternatives, such as \(AB\mid C=F[(B \mid C), (A\mid C)]\). This form, however, fails the qualitative requirement \((Ⅱ)\): correspondence with common sense. For, given \(C\), \(A\) may be quite plausible and \(B\) may also be quite plausible, while \(AB\) is nevertheless very implausible. For example, that a person's left eye is blue is quite plausible, and that a person's right eye is brown is also quite plausible, but that the same person has a blue left eye and a brown right eye is quite implausible.

Note: Readers who have been "schooled" by probability theory / causal inference will know that the propositions "the left eye is blue" and "the right eye is brown" are indeed causally independent, yet still not probabilistically independent. The reason is that both share the genes controlling eye color as a common cause (a so-called confounder), which produces the logical correlation "if \(A\) is true then \(B\) is false" (rather than a physical causation).

Next, let us see what properties the combining function \(F(x, y)\) must have in order to satisfy our plausibility conditions (it is convenient to set \(x=(B\mid C)\), \(y=(A\mid BC)\)).

Let us first consider the qualitative requirement \((Ⅱ)\): correspondence with common sense. Suppose a change in the prior information, \(C\rightarrow C^{\prime}\), makes \(B\) more plausible while leaving \(A\) unchanged:

\[B\mid C^{\prime} > B \mid C, \\ A\mid BC^{\prime} = A\mid BC \]

Common sense requires that \(AB\) can only become more plausible, never the reverse:

\[AB \mid C^{\prime} \geqslant AB \mid C \]

The equality holds if and only if \(A\mid BC\) corresponds to impossibility. This requires \(F(x, y)\) to be a monotonically increasing function of \(x\), whose partial derivative with respect to \(x\) is 0 if and only if \(y\) denotes impossibility. Similarly, \(F(x, y)\) must be a monotonically increasing function of \(y\), whose partial derivative with respect to \(y\) is 0 if and only if \(x\) denotes impossibility. Furthermore, the function \(F(x, y)\) must be continuous; otherwise a small increase in one of the plausibilities on the right-hand side of \((1)\) could produce a large increase in \(AB\mid C\).

In summary, we have the following proposition:

Proposition 2 \(F(x, y)\) must be a continuous monotonically increasing function of both \(x\) and \(y\). If we assume it is differentiable (which is not necessary but simplifies our derivation), we have

\[F_1(x, y) \equiv \frac{\partial F}{\partial x} \geqslant 0, \quad F_2(x, y) \equiv \frac{\partial F}{\partial y} \geqslant 0 \tag{2} \]

where the first inequality is an equality if and only if \(y\) denotes impossibility, and the second is an equality if and only if \(x\) denotes impossibility (here \(F_i\) denotes the derivative of \(F\) with respect to its \(i\)-th argument).

Next, we impose the "structural" consistency condition \((Ⅲ\text{a})\): path independence. Consider, for example, the plausibility \(ABC\mid D\). Since, by the associativity of Boolean algebra, \(ABC=(AB)C=A(BC)\), we must obtain the same result no matter in which order we compute the plausibility. For example, one possible order is to first regard \(BC\) as a single proposition and then apply equation \((1)\) twice:

\[(ABC \mid D) = F[(BC \mid D), (A\mid BCD)] = F\{F[(C\mid D), (B\mid CD)], (A\mid BCD)\} \]

Another possible order is to regard \(AB\) as a single proposition:

\[(ABC \mid D) = F[(C \mid D), (AB\mid CD)] = F\{(C\mid D), F[(B\mid CD), (A\mid BCD)]\} \]

The plausibilities obtained by these two orders must be equal. We thus have the following proposition:

Proposition 3 A necessary condition for the robot to reason consistently is that the function \(F\) satisfy the equation

\[F[F(x, y), z] = F[x, F(y, z)] \tag{3} \]

Abel was the first to use this equation; Aczél called it the associativity equation.
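The force of the associativity equation is easy to see numerically: the product form \(F(x,y)=xy\), which the derivation below will single out, satisfies it on a grid, while a superficially reasonable alternative such as averaging does not. A small Python sketch (the grid and the counterexample are arbitrary choices):

```python
# Check the associativity equation F[F(x,y), z] = F[x, F(y,z)] on a grid.
import itertools

def assoc_defect(F, grid):
    """Largest violation of F[F(x,y),z] = F[x,F(y,z)] over the grid."""
    return max(abs(F(F(x, y), z) - F(x, F(y, z)))
               for x, y, z in itertools.product(grid, repeat=3))

grid = [i / 10 for i in range(11)]
print(assoc_defect(lambda x, y: x * y, grid))        # zero up to rounding
print(assoc_defect(lambda x, y: (x + y) / 2, grid))  # clearly nonzero
```

Averaging fails because \(F[F(x,y),z]\) weights \(z\) by \(1/2\) but \(x\) by only \(1/4\), so the two bracketings disagree.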

From this functional equation we can eventually prove the following conclusion (see the original book for the details of the proof, which assumes that \(F\) is differentiable):

Conclusion 1 If the functional equation \((3)\) is to be satisfied, then the relation we are looking for must take the following functional form:

\[w(AB\mid C) = w(A\mid BC)w(B\mid C) = w(B\mid AC)w(A\mid C)\tag{4} \]

We call this the multiplication rule (product rule). Here \(w(x)\) is a continuous monotonic positive-valued function of the form

\[w(x) \equiv \exp \left\{ \int^x \frac{\mathrm{d} x}{H(x)}\right\} \]

where the integral has no fixed lower bound and \(H(x)\) is an arbitrary function (so far, \(w\) may be either increasing or decreasing and may take arbitrary positive values).

Equation \((4)\) is a necessary condition for satisfying the consistency condition \((Ⅲ\text{a})\): path independence. It can also be shown by mathematical induction that equation \((4)\) extends to any number of propositions (e.g., \(ABCDEFG\mid H\)).

In fact, besides continuity and monotonicity, the plausibility condition \((Ⅱ)\): correspondence with common sense imposes further conditions on the function \(w(x)\). For example, take the first form of equation \((4)\) and suppose that \(A\) is certain when \(C\) is given. Then, in the "logical environment" generated by the knowledge \(C\), the propositions satisfy \(AB=B\) (\(AB\) is true if and only if \(B\) is true). By the most primitive axiom discussed in the previous post, propositions with the same truth value must have the same plausibility, namely

\[AB\mid C = B\mid C \]

In addition, according to common sense we must have

\[A\mid BC = A\mid C \]

For, given that \(A\) is certain when \(C\) is given (i.e., \(C\) implies \(A\)), \(A\) remains certain given any additional information \(B\) that does not contradict \(C\).

Note: Readers who have been "schooled" by probability theory will know that "\(A\) is certain given \(C\)" \(\Rightarrow\) "\(A\) and \(B\) are conditionally independent given \(C\)".

Putting these together, equation \((4)\) becomes

\[w(B\mid C) = w(A\mid C)w(B\mid C) \]

This must hold no matter how plausible or implausible \(B\) is to the robot. So we have the following proposition:

Proposition 4 The function \(w(x)\) must also have the following property:

\[\text{certainty is represented by } w(A\mid C)=1 \tag{5} \]

Conversely, suppose that \(A\) is impossible when \(C\) is given (i.e., \(C\) implies \(\overline{A}\)). Then equation \((4)\) becomes

\[w(A \mid C) = w(A\mid C)w(B\mid C) \]

This equation must hold no matter what plausibility \(B\) has. There are only two possible values of \(w(A\mid C)\) that satisfy this condition: \(0\) or \(+\infty\) (\(-\infty\) is excluded, since otherwise, by continuity, \(w(B\mid C)\) would have to be able to take negative values, contradicting the above equation). Therefore, we have the following proposition:

Proposition 5 The function \(w(x)\) satisfies:

\[\text{impossibility is represented by } w(A\mid C)=0 \text{ or } +\infty \tag{6}\]

In summary, besides being a continuous monotonic positive function, by the plausibility condition \((Ⅱ)\): correspondence with common sense, \(w(x)\) must also satisfy the following: if it is an increasing function, its range runs from \(0\) (impossibility) to \(1\) (certainty); if it is a decreasing function, its range runs from \(+\infty\) (impossibility) to \(1\) (certainty). (So far, our conditions say nothing about how it varies within these ranges.)

There is no real difference in content between these two possible representations. Given any monotonically decreasing function \(w_1(x)\) that meets the above criteria and represents impossibility by \(+\infty\), we can define a monotonically increasing function \(w_2(x)\equiv 1/ w_1(x)\) that meets the same criteria and represents impossibility by \(0\). Thus, without loss of generality, we now choose the first (increasing) form, and we have the following proposition:

Proposition 6 \(w(x)\) is a function satisfying the following requirements:

\[0 \leqslant w(x) \leqslant 1, \quad \text{and } w(x) \text{ is continuous and monotonically increasing} \tag{7} \]

So far, however, apart from the above conditions, \(w(x)\) remains arbitrary.

2 Addition rule

Next, let us impose further restrictions on \(w(x)\). Since the propositions we now consider are of the Aristotelian logical type, each must be either true or false; their logical product \(A\overline{A}\) is always false (the law of non-contradiction), and their logical sum \(A+\overline{A}\) is always true (the law of the excluded middle). This means that the plausibility of \(A\) being false must depend in some way on the plausibility of its being true. If we define \(u\equiv w(A\mid B)\) and \(v\equiv w(\overline{A}\mid B)\), there must be some functional relationship

\[v = S(u) \]

Clearly, to satisfy the plausibility condition \((Ⅱ)\): correspondence with common sense, the following proposition must hold:

Proposition 7 \(S(u)\) is a continuous monotonically decreasing function on \(0\leqslant u \leqslant 1\) with extreme values

\[S(0)=1, S(1)=0 \tag{8} \]

But we will see further that it cannot be an arbitrary function with these properties, because it must also be consistent with the multiplication rules for \(AB\) and \(A\overline{B}\):

\[\begin{aligned} w(AB\mid C) = w(A\mid C)w(B\mid AC),\\ w(A\overline{B}\mid C) = w(A\mid C) w(\overline{B}\mid AC) \end{aligned} \]

Substituting the relation \(v=S(u)\) into the above equations gives

\[w(AB\mid C)=w(A\mid C)S(w(\overline{B}\mid AC))=w(A\mid C) S\left[\frac{w(A\overline{B}\mid C)}{w(A\mid C)}\right] \]

Now we apply commutativity again: \(w(AB\mid C)\) is symmetric in \(A\) and \(B\), so the consistency condition \((Ⅲ\text{a})\): path independence requires

\[w(A\mid C) S\left[\frac{w(A\overline{B}\mid C)}{w(A\mid C)}\right] = w(B\mid C) S\left[\frac{w(B\overline{A}\mid C)}{w(B\mid C)}\right] \tag{9} \]

This holds for all propositions \(A, B, C\). In particular, it certainly holds when \(\overline{B}=AD\) for an arbitrary new proposition \(D\). For this case, our previous blog post Meditations on Probability Theory: Plausible Reasoning has already derived the following conclusions:

\[A\overline{B} = \overline{B}, \quad B\overline{A} = \overline{A} \]

In this way, we can make the following substitutions:

\[\begin{aligned} w(A\overline{B}\mid C) = w(\overline{B}\mid C)=S[w(B\mid C)],\\ w(B\overline{A}\mid C) = w(\overline{A}\mid C)=S[w(A\mid C)] \end{aligned} \]

Let \(x\equiv w(A\mid C), y\equiv w(B \mid C)\); then \(w(A\overline{B}\mid C)=S(y)\) and \(w(B\overline{A}\mid C)=S(x)\). Substituting into \((9)\) yields the following proposition:

Proposition 8

\[xS[\frac{S(y)}{x}] = y S[\frac{S(x)}{y}],\quad 0\leqslant S(y)\leqslant x,\space 0\leqslant x \leqslant 1\tag{10} \]

(Regarding the domain here: since \(S(y)=w(\overline{B}\mid C)=w(AD\mid C)=w(A\mid C) w(D\mid AC)\), with \(w(A\mid C)=x\) and \(0\leqslant w(D\mid AC)\leqslant 1\) for any proposition \(D\), it follows that \(0\leqslant S(y) \leqslant x\). Note that, by symmetry, we equally have \(0\leqslant S(x) \leqslant y,\space 0\leqslant y \leqslant 1\).)

This shows that, in order to remain compatible with the multiplication rule, \(S(x)\) must have a scaling property. In the special case \(y=1\) it becomes

\[S[S(x)] = x \]

This shows that \(S(x)\) is a self-inverse function: \(S(x) = S^{-1}(x)\) (i.e., the inverse function coincides with the original function). Hence if \(v=S(u)\), then necessarily \(u=S^{-1}(v)=S(v)\). This reflects the obvious fact that the relation between \(A\) and \(\overline{A}\) is reciprocal: it does not matter which of the plain letter and the barred letter denotes the original proposition and which denotes its negation. We already noted this in the last post when defining the negation of a proposition (although it may not have been obvious at the time).

In fact, we have the following proposition (see the original book for a detailed proof):

Proposition 9 The only solution of \(S\) satisfying the above conditions (and \(S(0)=1\)) is

\[S(x) = (1 - x^m)^{1/m},\quad 0\leqslant x \leqslant 1, \space 0 < m < +\infty \tag{11} \]

Conversely, we can verify that equation \((11)\) is a solution of equation \((10)\). Equation \((11)\) is the most general function satisfying the functional equation \((10)\) and the left boundary condition \(S(0)=1\); we then find that it automatically satisfies the right boundary condition \(S(1)=0\).
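Both claims, that \(S\) is self-inverse and that it satisfies the functional equation \((10)\), are easy to verify numerically. A small Python sketch (the exponent \(m = 2.5\) is an arbitrary illustrative choice):

```python
# Numerical check that S(x) = (1 - x^m)^(1/m) is self-inverse and
# satisfies the consistency equation x*S(S(y)/x) = y*S(S(x)/y).
m = 2.5  # arbitrary illustrative exponent

def S(x):
    return (1 - x ** m) ** (1 / m)

xs = [i / 20 for i in range(21)]
# Self-inverse: S(S(x)) = x
assert all(abs(S(S(x)) - x) < 1e-9 for x in xs)
# Functional equation (10), tested on the interior of its domain
for x in xs:
    for y in xs:
        if S(y) < x - 1e-9 and S(x) < y - 1e-9:
            assert abs(x * S(S(y) / x) - y * S(S(x) / y)) < 1e-9
print("S(x) = (1 - x^m)^(1/m) passes both checks")
```

Both sides of the functional equation in fact reduce to \((x^m + y^m - 1)^{1/m}\), which makes the symmetry in \(x\) and \(y\) manifest.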

Since the derivation of the functional equation \((10)\) used the special choice \(\overline{B}=AD\), so far we have only shown that equation \((11)\) is a necessary condition for satisfying the general consistency requirement \((9)\). To check its sufficiency, substitute equation \((11)\) into equation \((9)\); we get

\[w^m (A\mid C) - w^m (A\overline{B}\mid C) = w^m (B\mid C) - w^m (B\overline{A}\mid C) \]

This equation follows from the multiplication rule. Thus we have proved that equation \((11)\) is also sufficient for \(S(x)\) to be consistent in the sense of equation \((9)\).

Our results so far can be summarized as follows: the associativity of the logical product requires that some monotonic function \(w(x)\) of the plausibility \(x = A\mid B\) obey the multiplication rule \((4)\); and our result \((11)\) states that this function must also obey the following rule:

Conclusion 2 For some positive number \(m\), the function \(w(x)\) must satisfy:

\[w^m(A\mid B) + w^m (\overline{A}\mid B) = 1 \tag{12} \]

(obtained from \(x^m + (1 - x^m)^{\frac{1}{m} \cdot m}=1\))

We call this the addition rule (sum rule).

Of course, the multiplication rule can also be written as

\[w^m(AB\mid C) = w^m(A\mid BC)w^m(B\mid C) = w^m(B\mid AC)w^m(A\mid C) \]

We find that the value of \(m\) is actually irrelevant: whatever value \(m\) takes, we can define a new function

\[p(x) \equiv w^m (x) \]

and if \(w(x)\) is a continuous monotonically increasing function from \(0\) to \(1\), then \(w^m(x)\) also satisfies that condition. Thus, our rules become

1. Multiplication rule

\[p(AB\mid C) = p(A\mid C)p(B\mid AC) = p(B\mid C)p(A\mid BC) \tag{13} \]

2. Addition rule

\[p(A\mid B) + p(\overline{A}\mid B) = 1 \tag{14} \]

where \(p(x)\) is any continuous monotonically increasing function with range \(0\leqslant p(x) \leqslant 1\).

Beyond the multiplication and addition rules, are more relations needed to obtain a complete set of rules of plausible reasoning, able to determine the plausibility of an arbitrary logical function \(f(A_1, \cdots, A_n)\)? In the multiplication and addition rules we have obtained formulas for the plausibilities of the conjunction \(AB\) and of the negation \(\overline{A}\). And, as already mentioned in our last blog post Meditations on Probability Theory: Plausible Reasoning, conjunction and negation form an adequate set of operations from which all logical functions can be constructed. Thus, by applying the multiplication and addition rules repeatedly, we can obtain the plausibility of any proposition in the Boolean algebra generated by \(A_1, \cdots, A_n\).

To demonstrate this, we first seek a formula for the logical sum \(A + B\). Repeatedly applying the multiplication rule and the addition rule, we get

\[\begin{aligned} p(A + B \mid C) &= 1 - p(\overline{A}\,\overline{B} \mid C) \\ & = 1 - p(\overline{A}\mid C)p(\overline{B}\mid \overline{A} C)\\ & = 1 - p(\overline{A}\mid C) \left[ 1 - p(B\mid \overline{A}C)\right]\\ & = p(A\mid C) + p(\overline{A}B\mid C) \\ & = p(A\mid C) + p(B\mid C)p(\overline{A}\mid BC) \\ & = p(A\mid C) + p(B\mid C)\left[1 - p(A\mid BC)\right] \\ & = p(A\mid C) + p(B\mid C) - p(AB\mid C) \end{aligned} \]

Finally, we have

\[p(A + B \mid C) = p(A\mid C) + p(B\mid C) - p(AB\mid C) \tag{15} \]

We call this last equation the generalized sum rule (generalized addition rule). Clearly, the primitive addition rule \((14)\) is the special case of the generalized addition rule \((15)\) with \(B=\overline{A}\).

We mentioned in the last post that any logical function other than a contradiction can be represented, via disjunctive normal form (DNF), as a logical sum of elementary conjunctions. Now, since the plausibility of every elementary conjunction \(\{Q_i, 1\leqslant i \leqslant 2^n\}\) (where \(n\) is the number of propositions) can be determined by repeated application of the multiplication rule, repeated application of \((15)\) yields the plausibility of any logical sum of the \(Q_i\).

Thus, whenever the background information is sufficient to determine the plausibilities of the elementary conjunctions, our rules are sufficient to determine the plausibility of every proposition in the Boolean algebra generated by \(\{A_1, \cdots, A_n\}\). Thus, just as conjunction and negation are an adequate set of operations for deductive logic, the above multiplication and addition rules form an adequate set of rules for plausible reasoning.
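This completeness claim can be made concrete: once the background information fixes the probabilities of the elementary conjunctions, the probability of any logical function follows by summing over its DNF. A sketch in Python (the weights of the elementary conjunctions are arbitrary assumptions):

```python
import itertools

# Probabilities of the 2^n elementary conjunctions Q_i over (A1, A2, A3);
# the weights are arbitrary but sum to 1.
weights = [0.05, 0.10, 0.15, 0.05, 0.20, 0.10, 0.25, 0.10]
Q = dict(zip(itertools.product([True, False], repeat=3), weights))

def p(f):
    """Probability of a logical function f(A1, A2, A3): sum the
    elementary conjunctions on which f is true (i.e., its DNF)."""
    return sum(w for tv, w in Q.items() if f(*tv))

# Generalized sum rule (15): p(A+B) = p(A) + p(B) - p(AB)
lhs = p(lambda a, b, c: a or b)
rhs = p(lambda a, b, c: a) + p(lambda a, b, c: b) - p(lambda a, b, c: a and b)
assert abs(lhs - rhs) < 1e-12
# Any other logical function is handled the same way, e.g. (A xor B) -> C
print(p(lambda a, b, c: (not (a != b)) or c))
```

Every Boolean function of the three propositions gets its probability this way, which is exactly the sense in which the rule set is complete.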

3 Principle of indifference (initial assignments)

So far, we have obtained the multiplication and addition rules, which describe the relations between the plausibilities of different propositions, i.e., the basic rules of the inner workings of the robot's "brain". However, we have not yet shown how plausibility relates to our objective world, i.e., how the robot initializes its plausibility assignments from the background information. To this end, we must resort to the "interface" condition not yet used among the plausibility conditions: \((Ⅲ\text{c})\): full homogeneity.

Starting from the generalized addition rule \((15)\) and gradually adding more propositions \(A_3, A_4, A_5, \cdots\), it can be shown by mathematical induction that if we have \(n\) mutually exclusive propositions \(\{A_1,\cdots, A_n\}\), the above equation generalizes to:

\[p(A_1 + \cdots + A_m \mid B) = \sum_{i=1}^m p (A_i\mid B), 1\leqslant m \leqslant n\tag{16} \]

Next, we assume that the propositions \(\{A_1,\cdots, A_n\}\) are not only mutually exclusive but also exhaustive, i.e., the background information stipulates that one and only one of them must be true. In this case we have the following proposition:

Proposition 10 When \(m=n\), the above sum must equal 1:

\[\sum_{i=1}^n p (A_i\mid B) = 1 \tag{17} \]

So far, we are still unable to determine the individual values \(p(A_i\mid B)\). Intuitively, we might simply assert \(p (A_i\mid B) = \frac{1}{n}\). Here, however, we must suppress all intuition and argue from logical analysis alone.

We now consider a mutually exclusive and exhaustive set of propositions:

\[\{A_1, A_2, \cdots, A_n\} \]

We think of them as \(n\) boxes labeled \(1, 2, \cdots, n\). Now, we arbitrarily permute the labels of the boxes to get a renumbered set of boxes:

\[\{A_1^{\prime}, A_2^{\prime}, \cdots, A_n^{\prime}\} \]

Suppose the box now labeled \(k\), \(A^{\prime}_k\), actually corresponds to the original box \(A_i\). Since they are essentially the same box (proposition), from an objective standpoint we stipulate that for the robot there must be:

\[p(A_i\mid B) = p(A^{\prime}_k\mid B), \quad i = 1, 2, \cdots, n \]

We call the above equation the transformation equation; it must hold for any information \(B\).

But that was only the objective "God's-eye view". The robot itself does not know how the labels of the boxes were permuted, i.e., its states of knowledge about the original set of propositions \(\{A_1, A_2, \cdots, A_n\}\) and about the relabeled set \(\{A_1^{\prime}, A_2^{\prime}, \cdots, A_n^{\prime}\}\) are identical. And our consistency condition \((Ⅲ\text{c})\) requires the robot to assign the same plausibility in equivalent states of knowledge, which means we must also have:

\[p(A_k\mid B) = p(A^{\prime}_k\mid B), \quad k = 1, 2, \cdots, n \]

We call this the symmetry equation.

Note: Readers with a physics background should have good intuition for this equation: \(B\) can be understood as a given Hamiltonian, and assigning probabilities to the propositions \(A_k\), \(A^{\prime}_k\) can be understood as the problem of finding the corresponding equilibrium/ground state. In the absence of any spontaneous symmetry breaking (i.e., when the plausibility condition \((Ⅲ\text{c})\): full homogeneity is satisfied), the final equilibrium state should be unique, so our conclusion follows naturally.

Combining the transformation equation and the symmetry equation, we have

\[p(A_i\mid B) = p(A_k \mid B) \quad i=1,2,\cdots, n \]

This comprises \(n\) equations, in which each \(i\) corresponds to some \(k\).

However, the above is only one particular permutation, and we require these relations to hold for arbitrary permutations of the labels. There are \(n!\) ways to permute the labels, hence \(n!\) equivalent problems. And for a given \(i\), the \(k\) in the above equation will in fact range over all the other \(n-1\) subscripts. Therefore, the only way to satisfy all these equations is for all the \(p(A_i\mid B)\) to be equal. Moreover, since \(\{A_1^{\prime}, A_2^{\prime}, \cdots, A_n^{\prime}\}\) is exhaustive, equation \((17)\) must hold, which leads us to the following conclusion:

Conclusion 3 The only possible initial assignment of plausibilities to the set of propositions \(\{A_1, A_2, \cdots, A_n\}\) is

\[p(A_i\mid B) = \frac{1}{n}, \quad 1 \leqslant i \leqslant n \tag{18} \]

We have finally obtained a definite numerical value for plausibility! We call this result the principle of indifference.
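The permutation argument can be mimicked numerically: averaging any candidate assignment over all \(n!\) relabelings (under which the robot's knowledge state is unchanged) always yields the uniform assignment, and the uniform assignment is the only one invariant under every permutation. A small Python sketch (the starting assignment is an arbitrary assumption):

```python
import itertools

n = 4
p = [0.1, 0.2, 0.3, 0.4]   # arbitrary candidate assignment, sums to 1

# Under condition (IIIc) the robot's knowledge state is unchanged by any
# relabeling, so a consistent assignment must equal its average over all
# n! permutations -- and that average is always the uniform assignment.
perms = list(itertools.permutations(range(n)))
avg = [sum(p[perm[i]] for perm in perms) / len(perms) for i in range(n)]
assert all(abs(a - 1 / n) < 1e-12 for a in avg)

# Conversely, the uniform assignment is invariant under every permutation.
u = [1 / n] * n
assert all([u[perm[i]] for i in range(n)] == u for perm in perms)
print("the only permutation-invariant assignment is uniform:", avg)
```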

Thus, our robot only needs to store the values \(p_i\) in its internal memory circuits. The concept of plausibility, \(x\equiv A\mid B\), can then be retired; we no longer need it. We can realize our theory of plausible reasoning entirely through the quantity \(p\), which we will call probability.

The probability \(p\) defines a particular scale on which plausibility can be measured. While all possible monotonic functions could in principle serve this purpose equally well, we choose this particular one (which satisfies the principle of indifference) not because we believe it is more accurate, but because it is more convenient. The situation is analogous to calibration in thermodynamics. All possible empirical temperature scales \(t\) are monotonic functions of one another; we decided to use the Kelvin scale \(T\) not because it is more accurate than other scales, but because it is more convenient: the theorems of thermodynamics take their simplest form on this scale, e.g. the familiar \(\mathrm{d}U = T\mathrm{d}S - P\mathrm{d}V, \mathrm{d}G = - S\mathrm{d}T + V\mathrm{d}P\), etc., where \(T\) is the Kelvin temperature.

Note: The addition rule \(p(A\mid C) + p(\overline{A}\mid C) = 1\) together with the two boundary conditions \(p(A\mid C)=1\) (if \(A\) is true) and \(p(A\mid C)=0\) (if \(A\) is false) has in fact already accomplished a first calibration: it fixes the relation between \(p(A\mid C)\) and \(p(\overline{A}\mid C)\) and their common range (the scoring interval \([0, 1]\)). The first calibration can be understood as making everyone's plausibility scores share the same scoring interval. However, even with \(p\) constrained in this way, \(p\) is still an arbitrary function (different for each person), so we also need a second calibration, which is our rule of full homogeneity: \(p(A_i\mid B) = \frac{1}{n}\). The second calibration makes each person's plausibility scores conform numerically to the same standard. In this way, we can convert each person's subjective feelings into uniform values for comparison. An intuitive picture of the two calibrations is shown in the figure below (the black and red curves can be regarded as two different people's plausibility scores/probabilities):

From equation \((17)\) we can also immediately derive another rule that matches our intuition. Consider the traditional "Bernoulli urn" problem of probability theory: an urn contains 10 balls of identical size and weight, labeled \(\{1, 2, \cdots, 10\}\); three of them (labeled \(4, 6, 7\)) are black and the other 7 are white. We shake the urn and draw a ball blindly. The background information \(B\) consists of these two statements. What is the probability that we draw a black ball?

Define the propositions \(A_i \equiv\) "the \(i\)-th ball is drawn" (\(1\leqslant i \leqslant 10\)). Since the background information is the same for all 10 possibilities, equation \((18)\) applies, and the robot assigns the same probability value to all 10 possibilities:

\[p(A_i\mid B) = \frac{1}{10}, \quad 1 \leqslant i \leqslant 10 \]

To say "a black ball is drawn" is to say "a ball labeled 4, 6, or 7 is drawn":

\[p(\text{black ball}\mid B) = p(A_4 + A_6 + A_7\mid B) \]

And these are mutually exclusive propositions (i.e., they denote mutually exclusive events), so equation \((16)\) applies:

\[p(\text{black ball}\mid B) = p(A_4\mid B) + p(A_6\mid B) + p(A_7\mid B) = \frac{3}{10} \]

This is just what intuition tells us. More generally, if there are \(N\) such balls, and a proposition \(A\) is defined to be true on an arbitrary subset of \(M\) of them (\(0\leqslant M \leqslant N\)) and false on its complement, then we have:

\[p(A\mid B) = \frac{M}{N} \]

This was the original mathematical definition of probability given by James Bernoulli, and it was used by most authors for the next 150 years. For example, Laplace's masterpiece Théorie analytique des probabilités[3]opens with this sentence:

> The probability of an event is the ratio of the number of cases favorable to it to the number of all possible cases, provided that nothing leads us to expect that any one of these cases should occur more often than the others, i.e., that they are, for us, equally possible.
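The urn computation above can be reproduced mechanically. Here is a minimal Python sketch (the labels and counts come from the example; the variable names are my own) that encodes the principle of indifference and then applies the sum rule for mutually exclusive propositions:

```python
from fractions import Fraction

N = 10
black = {4, 6, 7}                      # labels of the black balls

# Principle of indifference (the second calibration): p(A_i | B) = 1/N.
p = {i: Fraction(1, N) for i in range(1, N + 1)}

# The A_i are mutually exclusive, so the sum rule applies:
# p(black | B) = p(A_4 + A_6 + A_7 | B) = p(A_4|B) + p(A_6|B) + p(A_7|B)
p_black = sum(p[i] for i in black)
print(p_black)  # 3/10
```

Using exact `Fraction` arithmetic keeps the result in the form \(M/N\), matching Bernoulli's definition directly.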

4 Connection to the qualitative properties

Finally, let us look at how the quantitative rules relate to the three qualitative syllogisms mentioned in the last blogMeditations on Probability Theory: Plausible Reasoning. First, it is clear that in the limiting cases \(p(A\mid B)\rightarrow 0\) or \(p(A\mid B)\rightarrow 1\), the sum rule \((14)\) reduces to the original postulate of Aristotelian logic: if \(A\) is true, then \(\overline{A}\) must be false, and so on.

In fact, all of deductive logic consists of the two strong syllogisms mentioned in the last blog and everything that can be deduced from them. The two strong syllogisms are:

\[\begin{aligned} A \Rightarrow B \\ \underline{\quad A \text{ is true} \quad}\\ B \text{ is true} \end{aligned}\quad\quad\quad\quad \begin{aligned} A \Rightarrow B \\ \underline{\quad B \text{ is false} \quad}\\ A \text{ is false} \end{aligned} \tag{19} \]

(We now use the implication sign \(\Rightarrow\) to denote the major premise.)

Their consequences are endless. The major premise here is what we have been calling background information (common sense), which we denote by the letter \(C\):

\[C \equiv A \Rightarrow B \]

The two syllogisms then amount to determining \(p(B\mid AC)\) and \(p(A\mid \overline{B}C)\) respectively. By the product rule \((13)\) we can write them as:

\[p(B\mid AC) = \frac{p(AB\mid C)}{p(A\mid C)}, \quad p(A\mid \overline{B}C) = \frac{p(A\overline{B}\mid C)}{p(\overline{B}\mid C)} \]

Next, from the major premise \(A\Rightarrow B\) in Eq. \((19)\), we have the Boolean relations \(AB=A\) and \(\overline{A} + B = 1,\ A\overline{B}=0\) (see the previous blog for these conclusions). Hence \(p(AB \mid C) = p(A\mid C)\) and \(p(A\overline{B}\mid C)=0\), and so:

\[p(B\mid AC) = 1, \quad p(A\mid \overline{B}C) = 0 \]

This is exactly what the strong syllogisms \((19)\) state. Thus the relationship is simple:Aristotelian deductive logic is the limiting form of our rules of plausible reasoning as the robot becomes more and more certain of its conclusions.
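This limiting behavior can also be checked numerically. In the sketch below, the joint values are arbitrary illustrative numbers, subject only to the constraint \(A\overline{B}=0\) imposed by the premise \(A \Rightarrow B\); applying the product rule then recovers both strong syllogisms:

```python
from fractions import Fraction

# Joint plausibilities of the four conjunctions, all conditional on C.
# The premise C: A => B forces p(A and not-B | C) = 0; the other three
# values are arbitrary (they only need to sum to 1).
joint = {
    ("A", "B"): Fraction(3, 10),
    ("A", "notB"): Fraction(0),        # excluded by the premise A => B
    ("notA", "B"): Fraction(2, 10),
    ("notA", "notB"): Fraction(1, 2),
}

def p(a=None, b=None):
    """Plausibility of a conjunction or marginal, given C."""
    return sum(v for (ka, kb), v in joint.items()
               if (a is None or ka == a) and (b is None or kb == b))

# Product rule (13): p(B | AC) = p(AB | C) / p(A | C)
p_B_given_AC = p("A", "B") / p(a="A")
# and: p(A | notB C) = p(A notB | C) / p(notB | C)
p_A_given_notB_C = p("A", "notB") / p(b="notB")

print(p_B_given_AC, p_A_given_notB_C)  # 1 0
```

Whatever nonzero values we choose for the other three conjunctions, the two conditional probabilities come out as 1 and 0, exactly as the syllogisms demand.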

Beyond this, our rules also contain something not found in deductive logic: quantitative forms of the weak syllogisms mentioned in the previous blog. For example, for the first kind of weak syllogism:

\[\begin{aligned} A \Rightarrow B \\ \underline{\quad B \text{ is true} \quad}\\ A \text{ becomes more plausible} \end{aligned} \tag{20} \]

we simply write:

\[p(A\mid BC) = p( B \mid AC) \frac{p(A\mid C)}{p(B\mid C)} \]

Since \(p(B\mid AC)=1\) while \(p(B\mid C)\leqslant 1\) (the range inherent to any probability), we get:

\[p(A \mid BC) \geqslant p(A\mid C) \]

which is exactly what the weak syllogism \((20)\) asserts.

For the second kind of weak syllogism:

\[\begin{aligned} A \Rightarrow B \\ \underline{\quad A \text{ is false} \quad}\\ B \text{ becomes less plausible} \end{aligned} \tag{21} \]

we can write:

\[p(B \mid \overline{A}C) = p(B\mid C)\frac{p(\overline{A} \mid BC)}{p(\overline{A}\mid C)} \]

From \(p(A \mid BC) \geqslant p(A\mid C)\) we get \(p(\overline{A}\mid BC) \leqslant p(\overline{A}\mid C)\), so:

\[p(B\mid \overline{A}C) \leqslant p(B\mid C) \]

This likewise agrees with the weak syllogism \((21)\).

Finally, let us look at the syllogism used in the policeman's reasoning (see the previous blogMeditations on Probability Theory: Plausible Reasoning). Let proposition \(A\) be "the man is a criminal", let \(B\) be "the man performed the behavior described", and let the background information \(C\) be "if \(A\) is true, then \(B\) is more plausible" (in the policeman's experience, it is almost impossible for an honest man to behave this way, while it is quite plausible for a criminal to do so). The weak syllogism is then:

\[\begin{aligned} \text{if } A \text{ is true, then } B \text{ is more plausible} \\ \underline{\quad B \text{ is true} \quad}\\ A \text{ becomes more plausible} \end{aligned} \tag{22} \]

we can write:

\[p(A \mid BC) = p(A\mid C)\frac{p(B \mid AC)}{p(B\mid C)} \]

From the background information \(C\) we have \(p(B\mid AC) > p(B\mid C)\), and so:

\[p(A\mid BC) > p(A\mid C) \]

which is exactly as our weak syllogism describes.

In fact, once the probability \(p\) is introduced, we can go beyond the qualitative description above and analyze quantitatively how much the plausibility changes. In the "Thinking Computer" section of the last blog we asked: what determines whether the plausibility of \(A\) rises dramatically, almost to certainty, or rises only by a negligible amount, leaving the data \(B\) almost irrelevant? The answer we can now give: since \(p(B\mid AC)\leqslant 1\), the plausibility of \(A\) can increase dramatically only when \(p(B\mid C)\) is very small. That is, if the policeman almost never sees passers-by behave this way, then upon observing the man's behavior (\(B\)) he will be almost certain the man is guilty (\(A\)). Conversely, if knowing \(A\) to be true increases the plausibility of \(B\) only negligibly, then observing \(B\) can in turn increase the plausibility of \(A\) only negligibly.
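This quantitative point is easy to verify. The sketch below (all numbers are invented for illustration) applies Bayes' rule and shows that the same likelihood \(p(B\mid AC)\) can produce either a dramatic or a negligible update on \(A\), depending on how small \(p(B\mid C)\) is:

```python
# Bayes' rule as an update factor: p(A|BC) = p(A|C) * p(B|AC) / p(B|C).

def posterior(prior_A, p_B_given_A, p_B_given_notA):
    """p(A|BC) from the prior p(A|C) and the two likelihoods."""
    p_B = p_B_given_A * prior_A + p_B_given_notA * (1 - prior_A)
    return p_B_given_A * prior_A / p_B

prior = 0.01                            # p(A|C): few passers-by are criminals
# Honest people almost never behave this way -> p(B|C) is tiny, and the
# plausibility of A shoots up toward certainty:
print(posterior(prior, 0.99, 0.0001))   # ~0.99
# The behavior is common regardless of A -> p(B|C) is close to p(B|AC),
# and observing B is nearly irrelevant:
print(posterior(prior, 0.99, 0.98))     # ~0.01
```

The likelihood under \(A\) is identical in both calls; only the plausibility of \(B\) under the alternative changes, and that alone decides whether the policeman's conclusion is near-certainty or essentially no update at all.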

Besides the few classic weak syllogisms shown above, many more weak syllogisms can be represented by the quantitative rules of plausible reasoning described here (see Pólya's work[4]); interested readers can pursue this as further reading.

5 Commentary

Subjective and objective

In the theory we have developed, any probability assignment is necessarily "subjective" in the sense that it describes only a state of knowledge, not anything measurable in a physical experiment (here the state of knowledge is that of the reasoning robot, or of any other person who reasons according to our desiderata). At the same time, our desiderata \((Ⅲ\text{b})(Ⅲ\text{c})\) make these probability assignments completely "objective" in the sense that they are independent of the personality of the user. They are a means of describing (or encoding) the information given in the statement of a problem, independent of whatever personal feelings (hopes, fears, value judgments, etc.) you or I may have about the proposition involved. Objectivity in this sense is what a respectable theory of scientific inference needs.

Venn diagrams
Some readers may ask: "Why not use a Venn diagram to explain the generalized sum rule \(p(A + B \mid C) = p(A\mid C) + p(B\mid C) - p(AB\mid C)\)? It would make its meaning clearer." We believe Venn diagrams have limitations, because they require the areas of the regions corresponding to \(A\) and \(B\) to be additive, i.e., they require that \(A\) and \(B\) can each be decomposed into disjunctions of mutually exclusive sub-propositions. We can imagine carrying this decomposition all the way down to the individual points of the diagram, the ultimate "elementary" propositions \(\omega_i\) (of course, physicists would refuse to call them "atomic" propositions (#^..^#)).

However, most propositions we reason about, such as \(A\): "it will rain today" and \(B\): "the roof will leak", are simply declarative, factual statements, and in a given problem they need not decompose into anything more elementary. Of course, one can force a decomposition by introducing irrelevancies. For example, even though \(B\) as defined has nothing to do with penguins, we can decompose it into the disjunction \(B = BC_1 + BC_2 + BC_3 + \cdots + BC_N\), where \(C_k\) means "the number of penguins in Antarctica is \(k\)". By making \(N\) large enough we certainly obtain a valid Boolean-algebra statement, but this is a dead end: it does not help us reason about whether the roof will leak.

The Kolmogorov axioms

In 1933, Kolmogorov presented probability theory in the language of set theory and measure theory, formalizing and axiomatizing what is implicit in the Venn diagrams just mentioned. In fact, the four axioms of probability measure that Kolmogorov seemingly laid down arbitrarily in his system (and for which he has been criticized) can all be derived as consequences of our consistency desiderata. So on many technical issues we shall find ourselves agreeing with Kolmogorov and disagreeing with his critics.

However, our system of probability differs conceptually from Kolmogorov's in that we do not interpret propositions in terms of sets; instead, we interpret probability distributions as carriers of incomplete information. Partly as a result, our system has analytical resources that simply do not exist in Kolmogorov's system, which allows us to formulate and solve a wider range of problems (to be discussed in later sections).

Frequentist vs. Bayesian

This subsection is my own addition, intended to contrast the Bayesian school (the viewpoint of this book) with the frequentist school for ease of later study:

| | Frequentist school | Bayesian school |
| --- | --- | --- |
| Background | Initial ideas date back to the 19th century; systematically developed in the early 20th century, with Ronald A. Fisher and Jerzy Neyman as representative figures. Infers fixed parameter values from repeated trials and bases statistical inference on them. | Origins trace back to the 18th century, to Thomas Bayes and Pierre-Simon Laplace. Combines prior knowledge with observed data to update beliefs about unknown parameters. |
| Mathematical foundation | Kolmogorov's axiom system | The five desiderata of plausible reasoning \((Ⅰ)(Ⅱ)(Ⅲ\text{a})(Ⅲ\text{b})(Ⅲ\text{c})\) + Boolean algebra |
| What it describes/models | The events themselves in a sample space | As extended logic: human knowledge/beliefs about events |
| Worldview in a nutshell | • God's-eye view: events themselves are random / the world carries some inherent randomness. • Probability is a property of the event itself. • As independent repeated trials accumulate, our estimate of an event's probability becomes more accurate, but the probability itself never changes. | • Observer's view: human knowledge of the world is incomplete. • Probability describes the observer's state of knowledge (feelings/perceptions/beliefs) about an event. • Probability values are updated as more information is acquired. • "Everything is a distribution." |
| Definition of probability | Statistical definition: the limit \(p\) of the frequency of occurrence in independent repeated trials. Classical definition: the experiment has \(N\) equally possible outcomes, event \(E\) contains \(M\) of them, so \(P(E)=M/N\). | A real number representing human knowledge/belief about an event, calibrated and normalized so that it can be compared between different people. |
| Parameter estimation | • The parameter has a fixed true value; the data are random and variable. • Results are reported as a point estimate (a number) + a confidence interval, in the form \(\text{estimate}^{+\text{upper}}_{-\text{lower}}\). • A 95% confidence interval means: repeat the experiment many times, each time computing a point estimate and interval; 95% of the intervals will contain (cover) the true value (the true value stays fixed while the interval varies). | • The data are fixed, while the unknown parameter is variable. • Results are characterized by a posterior distribution (a function), though a credible interval can simplify the output, e.g. \(\text{MAP/mean/median}^{+\text{upper}}_{-\text{lower}}\). • A 95% credible interval means: the parameter falls in this interval with probability 95% (the interval stays fixed while the "true value" varies). |
| Additional tools needed | Needs special-purpose tools (ad-hoc devices) | Only probabilities need be discussed; no other tools |
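The contrast between the two summaries of parameter estimation can be made concrete. Below is a minimal sketch (illustrative numbers; a uniform Beta(1, 1) prior is assumed for the Bayesian side) for a Bernoulli parameter \(\theta\) after observing 7 successes in 10 trials:

```python
from statistics import NormalDist

n, k = 10, 7                # 7 successes in 10 Bernoulli trials

# Frequentist summary: point estimate + 95% confidence interval
# (normal approximation to the binomial).
theta_hat = k / n
se = (theta_hat * (1 - theta_hat) / n) ** 0.5
z = NormalDist().inv_cdf(0.975)
ci = (theta_hat - z * se, theta_hat + z * se)

# Bayesian summary: with a uniform Beta(1, 1) prior the posterior is the
# full distribution Beta(1 + k, 1 + n - k); its mean is one simple way
# to compress that distribution into a single number.
post_mean = (1 + k) / (2 + n)

print(round(theta_hat, 3), round(post_mean, 3))  # 0.7 0.667
```

Note the asymmetry the table describes: the frequentist output is a number plus an interval about the data-generating procedure, while the Bayesian output is an entire distribution over \(\theta\), of which the mean shown here is only a convenient compression.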

    References

    • [1] Jaynes E. T. Probability Theory: The Logic of Science [M]. Cambridge University Press, 2003.
    • [2] Jaynes E. T. (trans. Liao H. R.). Meditations on Probability Theory [M]. People's Posts and Telecommunications Press, 2024.
    • [3] Laplace P. S. Théorie analytique des probabilités [M]. Courcier, 1820.
    • [4] Pólya G. Mathematics and Plausible Reasoning, Vol. II: Patterns of Plausible Inference [M]. Princeton University Press, 1990.