Classical Interval Line Tree Explained: from Principle to Practice

线段树（Segment Tree）是一种非常高效的树形数据结构，用于解决区间查询和修改问题。本文将通过分步骤讲解，带领读者熟练掌握线段树的原理与实现，并探索其应用场景。

引言：数组区间修改问题

线段树要解决这样一个经典问题：比如给定一个数组，频繁地需要进行以下操作：

区间查询：查询数组某一子区间内的最大值、最小值、总和等。假定我们总要反复求某个区间内的元素和。
区间修改：将某个子区间的所有值都进行一次操作，假定我们总要将区间内所有数增加一个固定数值。

暴力来说，我们直接在原数组上求和和修改即可。对于区间查询，我们可以每次直接遍历子区间计算结果；对于区间修改，可以遍历整个子区间进行更新。然而，区间长度最长可以是几乎整个数组长度，这种方法的时间复杂度为 $O(n)$，如果操作频繁且数组较大，效率会变得不可接受。

我们需要一种数据结构能够在单次 $O(\log n)$ 的时间内完成上述操作。线段树应运而生。

线段树的结构与实现

线段树是一种二叉树，用于高效地存储和操作区间信息。

1. 从区间到二叉树

线段树将数组下标空间反复二分划分为多个区间，并使用二叉树存储这些区间的信息：

叶节点：表示数组的单个元素。
内部节点：表示某一子区间的汇总信息（如区间和、最大值等）。

例如，给定数组 [1, 3, 5, 7, 9, 11]，其线段树如下：

              [0, 5]
             /       \
        [0, 2]       [3, 5]
       /     \       /     \
   [0, 1]  (2, 2) [3, 4]  (5, 5)
  /    \          /    \
(0, 0)(1, 1)  (3, 3)(4, 4)

小括号为叶节点，i.e., the value of this element；The center bracket is the extra node that stores the interval information，in this context，Sum of his storage intervals，This value is calculated by the left and right sons。

2. Space complexity and 4-fold arrays

Ordinary arrays take up only 1x the space and do not require extra data, while binary trees of line trees are usually represented as arrays. For arrays of size$n$ The size of the line tree array is usually$4n$This is because:

A line segment tree is a fully binary tree with no more than nodes$2n - 1$。
However, the pre-allocation takes into account the worst case (not a power of 2) will increase the storage requirements, in order to leaf nodes of the left and right nodes of the empty node also still do not need special treatment, in order to simplify the code implementation, we directly assign the array size of$4n$, to ensure that no boundaries are crossed and that practices are developed.

Of course the space complexity is$O(n)$ s, the change is only in the coefficients. You can understand this by comparing it to the array in the above example.

3. Construction of line trees

The following is a code example for constructing a line tree:

#include <iostream>
#include <vector>

using namespace std;

vector<int> tree; // Arrays hold binary trees
vector<int> lazy; // Lazy labeling corresponding to each node of a binary tree，To be used later
vector<int> arr; // Raw data used for tree construction

void build(int node, int start, int end) {
    if (start == end) {
        // leaf node
        tree[node] = arr[start];
    } else {
        // 非leaf node都被继续二分
        int mid = (start + end) / 2;
        int left_child = 2 * node + 1;
        int right_child = 2 * node + 2;

        build(left_child, start, mid);
        build(right_child, mid + 1, end);

        tree[node] = tree[left_child] + tree[right_child]; // The interval sum is the sum of the left and right
    }
}

Interval Queries and Modifications

1. Inter-area queries

Line Tree supports efficient interval queries by partitioning the problem into subintervals. To ask for the sum of all numbers in an interval, you just need to keep splitting the interval in the line tree. The following is the implementation code:

int query(int node, int start, int end, int l, int r) {
    if (r < start || l > end) {
        return 0; // disjointed
    }
    if (l <= start && end <= r) {
        return tree[node]; // fully inclusive of
    }

    // include，then it is left to the left and right subtrees
    int mid = (start + end) / 2;
    int left_child = 2 * node + 1;
    int right_child = 2 * node + 2;

    int left_sum = query(left_child, start, mid, l, r);
    int right_sum = query(right_child, mid + 1, end, l, r);

    return left_sum + right_sum;
}

Suppose we want to modify the interval$[1,4]$, it can be found that the interval is eventually split into several sub-intervals without necessarily always going to the bottom, greatly improving efficiency.

                     [0, 5][36]×
                     /           \
            [0, 2][9]×             [3, 5][27]×
           /           \              /     \
    [0, 1][4]×  (2, 2)[5]√   [3, 4][16]√  (5, 5)[11]
   /       \                 /        \
(0, 0)[1] (1, 1)[3]√      (3, 3)[7] (4, 4)[9]

2. Single-point modification

If you want to modify an element in the array, then just look it up from the top all the way to the bottom, and the bottom node change affects the value of the parent node, and the recursion ends with a recalculation of the sum. We need to update the line tree:

void update(int node, int start, int end, int idx, int val) {
    if (start == end) {
        tree[node] = val;
    } else {
        int mid = (start + end) / 2;
        int left_child = 2 * node + 1;
        int right_child = 2 * node + 2;

        if (idx <= mid) {
            update(left_child, start, mid, idx, val);
        } else {
            update(right_child, mid + 1, end, idx, val);
        }

        tree[node] = tree[left_child] + tree[right_child];
    }
}

仍使用刚才的例子，假定修改下标 1 的值：

                     [0, 5][37]×
                     /           \
            [0, 2][10]×             [3, 5][27]
           /           \              /     \
    [0, 1][5]×  (2, 2)[5]   [3, 4][16]   (5, 5)[11]
   /       \                 /        \
(0, 0)[1] (1, 1)[4]√      (3, 3)[7] (4, 4)[9]

3. 区间修改：懒标记

这是线段树最难的一部分。线段树通过懒标记（Lazy Propagation）来优化区间修改。核心思想: Delayed update, logging the modification operation in an array of tokens and updating only when necessary.

Specifically, if we want to modify the value of a certain interval (e.g., both increase the value of$a$), we still split it into subintervals, and if an interval is fully contained, then instead of recursing downward, we modify only that node and set the lazy marker at that node as$a$, indicating that all my child nodes should add the$a$, but have not yet actually done so. It wasn't until one of the subsequent queries came here that we cleared the lazy marker and pushed it down one level.



void updateRange(int node, int start, int end, int l, int r, int val) {
    if (lazy[node] ! = 0) { // Come to a node, first check for markers, if present then push down one level
        tree[node] += (end - start + 1) * lazy[node];
        if (start ! = end) {
            lazy[2 * node + 1] += lazy[node]; if (start != end) {
            lazy[2 * node + 2] += lazy[node]; }
        }
        lazy[node] = 0; }
    }

    if (r < start || l > end) { // do not intersect at all
        return;
    }

    if (l <= start && end <= r) { // fully contained, then stop here and use lazy marking
        tree[node] += (end - start + 1) * val.
        if (start ! = end) {
            lazy[2 * node + 1] += val; lazy[2 * node + 1] += val; if (start ! = end) {
            lazy[2 * node + 2] += val; if (start ! = end) { lazy[2 * node + 1] += val
        }
        return; }
    }

    // Partial containment is left to the left and right subtrees
    int mid = (start + end) / 2; updateRange(2 * node + 1, start, mid, l, r, val); }
    updateRange(2 * node + 1, start, mid, l, r, val); updateRange(2 * node + 2, mid + 1, end, l, r, val); }
    updateRange(2 * node + 2, mid + 1, end, l, r, val); updateRange(2 * node + 2, mid + 1, end, l, r, val).

    tree[node] = tree[2 * node + 1] + tree[2 * node + 2]; }
}

Continuing with the above example, the$[1,4]$are increased by 1, we find that in$[3,4]$It is labeled at the point and is no longer propagated downwards. As a result, the amount of operations for interval modifications is the same as for interval queries. Without lazy labeling, each modification pushes to the bottom, which is worse than violence. So lazy labeling is a must for line trees, not the icing on the cake.

Although the value of its subtree is incorrect for the moment, accessing the subtree will definitely go through lazy tagging, and when you come here again in any future situation, you will definitely go through lazy tagging and push it down, ensuring that as long as you have accessed the subtree. The result is always correct. As a result, interval queries and single-point modification functions also need to add a push-down marker segment (theif (lazy[node] != 0) (Partial).

The Lazy Marker is like a diligent and lazy janitor of a large field. Whenever fertilizer needs to be applied to certain fields, and if the whole field needs the same fertilizer, he will hang a sign at the gate saying "This field needs fertilizer", but not actually do anything about it for a while. This means that he doesn't have to run down to the field piece by piece, but when someone actually walks into the field, he applies the fertilizer right away and assigns the task to a smaller field on the fly. This saves time and ensures that all the crops in the field are taken care of in a timely manner, and that everyone who walks into the field sees it fertilized. The "laziness" of the Lazy Marker is simply a matter of holding off; his "diligence" is reflected in the fact that he always completes all tasks with precision!

                     [0, 5][40]×
                     /           \
            [0, 2][11]×             [3, 5][29]×
           /           \              /     \
    [0, 1][5]×  (2, 2)[6]√   [3, 4][18|+1]√  (5, 5)[11]
   /       \                 /        \
(0, 0)[1] (1, 1)[4]√      (3, 3)[7] (4, 4)[9]

Why does the line tree decompose into$O(\log n)$ Duan?

We find that each time a query or modification is made, the line tree greatly improves efficiency by bisecting the target interval into several large segments by bisecting the target interval, and long blocks of modifications are always not pushed to the bottom. The reason why a single operation of a line tree is always $ O(\log n) $ is that it reduces the problem size quickly by bisecting the recursion. With each query or modification, the line tree determines whether the intervals are fully contained, fully disjoint, or partially overlapping based on their location:

If the left endpoint of the interval is aligned with the left endpoint of this node, and the right endpoint does not cross the midpoint, then it goes directly to the left subtree, reducing the problem size by half. The reverse is the same.
If the left endpoint of the interval is aligned with the left endpoint of this node, and the right endpoint crosses the midpoint, then the problem is halved in size by querying the entire left subtree or lazy labeling the modifications, and then entering the right subtree. The reverse is the same.

In both cases above, one end of the interval remains aligned after entering the subtree, and the new case is always one of 1 and 2.

If the interval endpoints are not aligned, but are all in one side of the left and right intervals, then it is straightforward to enter and the problem size is still halved. Entering the subtree may be case 3 or 4.
If the interval endpoints are not aligned and span the midpoint of this interval, only in this case will both the left and right subtrees be entered, and the left and right subtrees are always cases 1 and 2, so this happens at most once.

In such a partitioning process, only once will the left and right subtrees be accessed at the same time, and the depth of the recursion is the same as the height of the line tree, which is $ \log n $, so the complexity of each operation is $ O(\log n) $, and the number of nodes accessed is at worst $ 2\log n $ or so. This partitioning mechanism effectively avoids the inefficient operation of traversing the entire array, and is the core reason why line trees are so efficient. You can draw a longer line segment tree yourself to help understand it.

Expanding Knowledge

Pointer version of the line segment tree

在算法思想完全一样的情况下，二叉树也可以使用指针和动态申请空间来实现，指针版线段树动态分配节点内存，适用于稀疏数组。其可以在更新赋值时再创建节点，内存使用效率高，且不需要 4 倍空间，也叫动态开点线段树。适用于大范围稀疏数据。缺点是编程复杂度较高且常数项性能较低。这种方式本文就不展示了。

在本文章的场景下（维护数组），静态数组版本效率高，更为常用。若要维护巨大但稀疏的值域，则指针版本可节省大量空间。

线段树的其他应用

除了区间查询和修改，线段树还能解决以下问题：

区间最值：除了区间和，在线段树节点存储区间最小值或最大值。Their minds are almost identical，Simply change the interval sum to the interval maximum when updating and querying in the branch node。
第 k 小值查询：结合其他算法，可以实现排序和统计信息。
二维线段树：拓展到二维情况。用于处理平面上的区间问题。

线段树擅长处理可分解的区间性质（如求和、最大值、最小值、乘积等），但对于某些非线性性质，它难以处理或效率低下。Some interval problems cannot use line trees，Typical examples areInterval Plurality (Mode) and Interval Median (Median): The plurality cannot be obtained by simply combining the results of two subintervals because it requires global information, i.e., the plurality of subintervals cannot simply be combined into the plurality of the overall interval. The median is similar in that it requires global ordering information within the interval and cannot be solved directly by the partitioning idea of a line tree.