Location>code7788 >text

Quickly understand the problem of LCS longest common subsequence

Popularity:781 ℃/2025-03-13 00:43:01

In the classic problems of algorithms and data structures, the Longest Common Subsequence (LCS) problem occupies an important position. Given two sequences, we need to find their longest common subsequence, which requires keeping the order of the original sequence elements but not continuity. The LCS problem has a wide application in the fields of text comparison and gene sequence alignment in bioinformatics.

For example, for the sequence X = "ABCBDAB" and the sequence Y = "BDCABB", their longest common subsequence is "BCAB" and its length is 4.

The standard solution to LCS can be solved in relatively efficient time through dynamic programming, but in some special situations, we can further optimize its efficiency through clever transformation, especially when elements in one of the sequences are not repeated.

Dynamic planning

Dynamic programming is the standard method to solve LCS problems. Suppose the lengths of the sequence $X $ and the sequence $Y $ are $ m $ and $ n $ respectively. We define the two-dimensional array $ dp[i][j] $ as the longest common subsequence length of the first $ i $ elements of $ X $ and the first $ j $ elements of $ Y $, then the state transition equation is:

  • If $ X[i-1] = Y[j-1] $, then $ dp[i][j] = dp[i-1][j-1] + 1 $;
  • It can also be taken from the previous state, $ dp[i][j] = \max(dp[i-1][j], dp[i][j-1]) $.

Each $ dp[i][j] $ can be directly derived, the time complexity\(O(mn)\)

Each $ dp[i][j] $ is only deduced by $ dp[i-1]$ , reducing the space complexity to $ O(\min(m,n)) $ by scrolling or alternating arrays. When implementing the specific implementation, only the information of the current and previous row (or column) is maintained.

int LCS(const string &X, const string &Y) {
    int m = (), n = ();
    vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (X[i - 1] == Y[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            } else {
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }
    return dp[m][n];
}

LCS issues without duplicate elements

When there are additional constraints in the problem, we often find the possibility of optimization. Specifically, when the elements in the sequence $X$ are unique, that is, when each element occurs at most once, the LCS problem can be converted into a Longest Increase Subsequence (LIS) problem, thereby achieving efficient solution.

The specific transformation process is:

  1. Create a mapping of elements in the sequence $Y $ to their index positions;
  2. Use the element in the sequence $X $ to check whether the element exists in $Y $. If it exists, it will be replaced by its position in $Y $;
  3. Form a new sequence $Z $, each element in the sequence $Z $ is the index position in the sequence $Y $.

The problem after conversion becomes a LIS looking for sequence $Z $. The reason why this transformation is true is due to the following two core facts:

  • Order preservation: The original LCS requires sequence matching, which is consistent with the strict incremental characteristics of the element position index in the converted sequence $Z$.
  • Uniqueness guarantees mapping single radioactivity: Since the sequence $X$ appears, it must only appear in a unique position. The uniqueness of elements in the mapping process will not be ambiguity of element positions, ensuring that the new sequence $Z$ can accurately represent the sequential relationship between $X$ and $Y$.

Since elements in $Y $ appear in $X $, they must only appear in a unique position, so $Y $ can be converted into an index to find the "incremental sequence of indexes". An incremental index sequence represents a common subsequence.

Therefore, this transformation optimizes the originally naive LCS problem to the LIS problem of $O(m \log m) $, which greatly improves the algorithm efficiency in specific scenarios. The code implementation of LIS can be used to refer to previous articles.

int main() {
    int n;
    cin >> n;

    vector<int> a(n), tail;

    unordered_map<int, int> mp;
    (n);
	
    for (int i = 0; i < n; i++) {
        Cin >> a[i];
        mp[a[i]] = i;
    }
    for (int i = 0; i < n; i++) {
        int x;
        Cin >> x;
        if (!(x)) continue;
        
        x = mp[x];

        auto it = lower_bound((), (), x);
        if (it == ()) {
            tail.push_back(x);
        } else {
            *it = x;
        }
    } printf("%d\n", ());
}