Location>code7788 >text

cmu15545 Notes-Query Excution

Popularity:864 ℃/2024-11-18 12:24:24

catalogs
  • implementation model
    • Iterator Model
    • Materialization Model
    • Vectoriazation Model
    • comparison
  • Data access methods
    • Sequential Scan
    • Index Scan
    • Multi-Index Scan
  • Halloween Problem
  • Expression evaluation

implementation model

The Processing Model defines how a database system executes a query plan.

Iterator Model

Basic idea: use a tree structure to organize operators, and then traverse the execution of the entire tree in the middle order, the final output of the root node is the result of the entire query plan.

Each Operator implements the following functions:

  • Next()
    • Return value: a tuple or EOF.
    • Execution process: a loop calls the child node'sNext()function.
  • Open()cap (a poem)Close(): Similar to constructors and destructors.

image-20241118105113714

The output converges from the bottom to the top (Bottom-To-Top) and supports streaming operations, so it is also called Valcano Model, Pipeline Model.

Materialization Model

Basic idea: instead of returning data one at a time, the operator stores all the data and returns it to the parent node at once.

Compared to Iterator Model, it reduces the function call overhead, but the intermediate results may have to be stored temporarily on disk, and the IO overhead is large.

image-20241118105733041

It is possible to pass some hints down the line, such asLimit, to avoid scanning too much data.

More applicable to OLTP than OLAP.

Vectoriazation Model

Basic idea: the operator returns a batch of data.

Combines the advantages of Iterator Model and Materialization Model to reduce function calls without intermediate results being too large.

SIMD instructions can be used to accelerate the processing of batch data.

image-20241118110928540

comparison

characterization Iterator Model Materialization Model Vectorization Model
Data processing unit Single record (tuple-at-a-time) Whole intermediate results (table-at-a-time) Batch recording (vector/batch-at-a-time)
performances High overhead and inefficiency of function calls High Latency, High Memory/I/O Overheads Low function call overhead and excellent SIMD acceleration performance
Memory Usage Low memory requirements High memory requirements moderate
I/O Overhead lower (one's head) your (honorific) moderate
Cache utilization differ from differ from your (honorific)
sophistication Simple to implement moderate Complexity of implementation
Applicable Scenarios Small data sets, streaming processing Complex queries with intermediate result reuse Scenarios with large datasets requiring high performance computing

Data access methods

There are three main types of data access:

  1. Full Table Scan (Sequential Scan)
  2. Index Scan
  3. Multi-Index Scan (MIS)

Sequential Scan

Optimization means for full table scans:

image-20241118113337122

Data Skipping method:

  1. Only approximate results are needed: sampling estimates.
  2. Precise results: Zone Map

image-20241118113508953

Zone Map basic idea: to reduce the whole to zero, in advance of the aggregation of data pages.

fulfillmentSelect * From table Where val > 600When you do, the following pages can be skipped directly.

image-20241118113722074

Index Scan

How to determine which index to use: data distribution.

image-20241118114047331

Multi-Index Scan

The basic idea: according to the predicate on each index, independently find the data record (Record) that meets the conditions, and then according to the join predicate to perform operations (union, intersection, difference, etc.).

image-20241118114343292

Halloween Problem

For UPDATE statements, you need to keep track of the statements that have been updated, otherwise you will have problems with cascading updates.

image-20241118114850271

<999, Andy> Perform update, go index scan:

  1. Remove Index
  2. Update Tuple, <1099, Andy>
  3. Insert Index
  4. (Binding check)

At this point, if <1099, Andy> is not tagged, he satisfies the Where clause and will be updated again.

Expression evaluation

Basic idea: use tree structure, build expression tree, use middle-order traversal to perform all the evaluation actions, the result of the evaluation of the root node is the final value.

image-20241118115507962

Where in the database is a tree structure used:

  • B+Tree: storage.
  • Tree structure + mid-order traversal for value: query plan, expression evaluation.

Optimization: JIT Compilatoin. treats hotspot expression computation nodes as functions and compiles them as inline machine code instead of traversing the nodes each time.

image-20241118120356183