0. Preface
I don't know how this kind of thing was invented. It feels very nb.
But it should be simpler than KMP.
oi-wiki
1. Concept
SAM, i.e. the suffix automaton, is for a string\(s\)It means DFA that can represent all its substrings/suffixes. In fact, SAM is a DAG with several points representing several states, including an initial state\(t_0\), These states may represent more than one string, and the edges on the SAM represent several transitions, each transition is a letter connected from one state to another. Each item on the SAM\(t_0\)The string composed of all transfers on the path as the starting point is actually the original string\(s\)A substring of , this substring belongs to the set of strings represented by the end point of this path.
2. Mysterious nature
right collection
First we define: for strings\(s\)a non-empty substring of\(s_0\), its right collection is\(s_0\)exist\(s\)All end positions appearing in
Obviously, the following properties can be obtained:
-
A state on SAM corresponds to several substrings of the same right set. If these substrings are sorted from small to large in length, their lengths are continuous.
-
\( \begin{cases} right(x) \subseteq right(y) & \text{if y is a suffix of x.}\\ right(x) \cap right(y) = \phi & \text{otherwise.} \end{cases} \)
-
If substring\(x\)and\(y\)The right set of the same, so the shorter one must be the longer suffix.
Suffix link fa
According to the nature of the right collection, we can build Parent Tree from the DAG of SAM.
For status\(X\), its suffix link is to connect to the state to which the longest suffix belongs to the right collection\(Y\)superior.
make\(t\)In status\(X\)The longest string in, now we know the string\(t\)The first several suffixes of the right collection and\(t\)The same, and the right collection with several suffixes does not match\(t\)The same, the longest suffix that satisfies the latter is\(v\),but\(right(t) \subseteq right(v)\)。
This is equivalent to constantly\(t\)The letters are deleted at the beginning of the string, and the length of the string will become shorter and shorter. When a certain suffix is deleted\(v\)When I found that it was not only\(t\)These places that appear have appeared, and are still in the original string\(s\)There have been other places. Therefore, its right collection will become larger and include\(right(t)\)elements in.
So\(X\)The suffix link will be connected to\(v\)Status\(Y\)superior.
-
All the suffix links will form a\(t_0\)The tree is the root node.
-
A tree constructed by a right set (the right set of each child node is contained in the right set of the parent node) is the same as a tree constructed by a suffix link.
-
When maintaining SAM, the maximum length of the substring represented by the status is usually recorded.\(len_u\), It can be found that on Parent Tree, a state represents the length range of the substring should be\([len_{fa_u} + 1,len_u]\)。
The above properties are obvious. The automaton we built is actually DAG + Parent Tree.
3. Construct SAM
Make a state\(u\)The longest substring in the substring is\(longest(u)\)。
The current string is\(s\), the newly added character is\(c\), the newly added status is\(u\). The construction process is as follows:
From the last time\(s\)The status of\(p\)Start jumping up the suffix link until\(p\)There is a one on the DAG\(c\)transfer;
If not found, then we add a new letter.\(fa_u = t_0\)Just do it. If found, let the transferred status be\(q\):
- if\(len_q = len_p + 1\),prove\(q\)Only represent\(longest(p) + c\)This substring will be directly\(u\)The suffix links to\(q\);
- if\(len_q > len_p + 1\),prove\(q\)More than just representatives\(longest(p) + c\)This substring, we need to\(q\)Take it apart, break it into one only means\(longest(p) + c\)The state of this substring\(now\)and the rest\(q'\),Will\(now\)The suffix link connects to\(p\)and will\(q'\)and\(u\)The suffix link connects to\(now\), then we continue to use\(p\)Jump upward through the suffix link, if there is a state that is promising\(c\)The transfer points to the original\(q\), point it to\(now\)。
This is because we\(s\)The state starts to jump to the suffix link, and the points reached are\(s\)suffixes, when these suffixes exist as\(c\)When transferring, the state that the transfer reaches must be\(s + c\)a suffix of the longest right collection with\(right(u)\)Different suffixes.
The SAM is constructed.
4. Good application
oi-wiki is already detailed enough.