-
implementation model
- Iterator Model
- Materialization Model
- Vectoriazation Model
- comparison
-
Data access methods
- Sequential Scan
- Index Scan
- Multi-Index Scan
- Halloween Problem
- Expression evaluation
implementation model
The Processing Model defines how a database system executes a query plan.
Iterator Model
Basic idea: use a tree structure to organize operators, and then traverse the execution of the entire tree in the middle order, the final output of the root node is the result of the entire query plan.
Each Operator implements the following functions:
-
Next()
- Return value: a tuple or EOF.
- Execution process: a loop calls the child node's
Next()
function.
-
Open()
cap (a poem)Close()
: Similar to constructors and destructors.
The output converges from the bottom to the top (Bottom-To-Top) and supports streaming operations, so it is also called Valcano Model, Pipeline Model.
Materialization Model
Basic idea: instead of returning data one at a time, the operator stores all the data and returns it to the parent node at once.
Compared to Iterator Model, it reduces the function call overhead, but the intermediate results may have to be stored temporarily on disk, and the IO overhead is large.
It is possible to pass some hints down the line, such asLimit
, to avoid scanning too much data.
More applicable to OLTP than OLAP.
Vectoriazation Model
Basic idea: the operator returns a batch of data.
Combines the advantages of Iterator Model and Materialization Model to reduce function calls without intermediate results being too large.
SIMD instructions can be used to accelerate the processing of batch data.
comparison
characterization | Iterator Model | Materialization Model | Vectorization Model |
---|---|---|---|
Data processing unit | Single record (tuple-at-a-time) | Whole intermediate results (table-at-a-time) | Batch recording (vector/batch-at-a-time) |
performances | High overhead and inefficiency of function calls | High Latency, High Memory/I/O Overheads | Low function call overhead and excellent SIMD acceleration performance |
Memory Usage | Low memory requirements | High memory requirements | moderate |
I/O Overhead | lower (one's head) | your (honorific) | moderate |
Cache utilization | differ from | differ from | your (honorific) |
sophistication | Simple to implement | moderate | Complexity of implementation |
Applicable Scenarios | Small data sets, streaming processing | Complex queries with intermediate result reuse | Scenarios with large datasets requiring high performance computing |
Data access methods
There are three main types of data access:
- Full Table Scan (Sequential Scan)
- Index Scan
- Multi-Index Scan (MIS)
Sequential Scan
Optimization means for full table scans:
Data Skipping method:
- Only approximate results are needed: sampling estimates.
- Precise results: Zone Map
Zone Map basic idea: to reduce the whole to zero, in advance of the aggregation of data pages.
fulfillmentSelect * From table Where val > 600
When you do, the following pages can be skipped directly.
Index Scan
How to determine which index to use: data distribution.
Multi-Index Scan
The basic idea: according to the predicate on each index, independently find the data record (Record) that meets the conditions, and then according to the join predicate to perform operations (union, intersection, difference, etc.).
Halloween Problem
For UPDATE statements, you need to keep track of the statements that have been updated, otherwise you will have problems with cascading updates.
<999, Andy> Perform update, go index scan:
- Remove Index
- Update Tuple, <1099, Andy>
- Insert Index
- (Binding check)
At this point, if <1099, Andy> is not tagged, he satisfies the Where clause and will be updated again.
Expression evaluation
Basic idea: use tree structure, build expression tree, use middle-order traversal to perform all the evaluation actions, the result of the evaluation of the root node is the final value.
Where in the database is a tree structure used:
- B+Tree: storage.
- Tree structure + mid-order traversal for value: query plan, expression evaluation.
Optimization: JIT Compilatoin. treats hotspot expression computation nodes as functions and compiles them as inline machine code instead of traversing the nodes each time.