cmu15545 Notes-Query Excution

implementation model
- Iterator Model
- Materialization Model
- Vectoriazation Model
- comparison
Data access methods
- Sequential Scan
- Index Scan
- Multi-Index Scan
Halloween Problem
Expression evaluation

implementation model

The Processing Model defines how a database system executes a query plan.

Iterator Model

Basic idea: use a tree structure to organize operators, and then traverse the execution of the entire tree in the middle order, the final output of the root node is the result of the entire query plan.

Each Operator implements the following functions:

Next()
- Return value: a tuple or EOF.
- Execution process: a loop calls the child node'sNext()function.
Open()cap (a poem)Close(): Similar to constructors and destructors.

The output converges from the bottom to the top (Bottom-To-Top) and supports streaming operations, so it is also called Valcano Model, Pipeline Model.

Materialization Model

Basic idea: instead of returning data one at a time, the operator stores all the data and returns it to the parent node at once.

Compared to Iterator Model, it reduces the function call overhead, but the intermediate results may have to be stored temporarily on disk, and the IO overhead is large.

It is possible to pass some hints down the line, such asLimit, to avoid scanning too much data.

More applicable to OLTP than OLAP.

Vectoriazation Model

Basic idea: the operator returns a batch of data.

Combines the advantages of Iterator Model and Materialization Model to reduce function calls without intermediate results being too large.

SIMD instructions can be used to accelerate the processing of batch data.

comparison

characterization	Iterator Model	Materialization Model	Vectorization Model
Data processing unit	Single record (tuple-at-a-time)	Whole intermediate results (table-at-a-time)	Batch recording (vector/batch-at-a-time)
performances	High overhead and inefficiency of function calls	High Latency, High Memory/I/O Overheads	Low function call overhead and excellent SIMD acceleration performance
Memory Usage	Low memory requirements	High memory requirements	moderate
I/O Overhead	lower (one's head)	your (honorific)	moderate
Cache utilization	differ from	differ from	your (honorific)
sophistication	Simple to implement	moderate	Complexity of implementation
Applicable Scenarios	Small data sets, streaming processing	Complex queries with intermediate result reuse	Scenarios with large datasets requiring high performance computing

Data access methods

There are three main types of data access:

Full Table Scan (Sequential Scan)
Index Scan
Multi-Index Scan (MIS)

Sequential Scan

Optimization means for full table scans:

Data Skipping method:

Only approximate results are needed: sampling estimates.
Precise results: Zone Map

Zone Map basic idea: to reduce the whole to zero, in advance of the aggregation of data pages.

fulfillmentSelect * From table Where val > 600When you do, the following pages can be skipped directly.

Index Scan

How to determine which index to use: data distribution.

Multi-Index Scan

The basic idea: according to the predicate on each index, independently find the data record (Record) that meets the conditions, and then according to the join predicate to perform operations (union, intersection, difference, etc.).

Halloween Problem

For UPDATE statements, you need to keep track of the statements that have been updated, otherwise you will have problems with cascading updates.

<999, Andy> Perform update, go index scan:

Remove Index
Update Tuple, <1099, Andy>
Insert Index
(Binding check)

At this point, if <1099, Andy> is not tagged, he satisfies the Where clause and will be updated again.

Expression evaluation

Basic idea: use tree structure, build expression tree, use middle-order traversal to perform all the evaluation actions, the result of the evaluation of the root node is the final value.

Where in the database is a tree structure used:

B+Tree: storage.

Tree structure + mid-order traversal for value: query plan, expression evaluation.

Optimization: JIT Compilatoin. treats hotspot expression computation nodes as functions and compiles them as inline machine code instead of traversing the nodes each time.