Location>code7788 >text

cmu15545 - Data Access Method: B+Tree (B+Tree)

Popularity:486 ℃/2024-11-13 17:51:26

catalogs
  • basic concept
    • Disk-based B+ trees
    • Queries and Indexes
  • Design Choices
    • Node Size
    • Merge Thredshold
    • Variable-length Keys
    • Intra-Node Search (INS)
  • Optimization tools
    • Pointer Swizzling
    • Bε-trees
    • Bulk Insert
    • Prefix Compression
    • Deduplication
    • Suffix Truncation

basic concept

Disk-based B+ trees

Why use B+ numbers for data access (Access Method):

image-20241109143725399

  • Naturally organized, supports range finding
  • Supports traversing all data, utilizing sequential IO
  • The time complexity is\(O(logn)\)Meet Performance Needs
  • Compared to B-tree, data accesses are at leaf nodes: high disk space utilization; fewer concurrent conflicts

A basic B+ tree:

  • Three types of borrowing points: root nodes, intermediate nodes, and leaf nodes
  • Data distribution: root and intermediate nodes store only indexes, leaf nodes store data
  • Pointer relationships: parent-child pointers, brother pointers

image-20241113104944036

Disk-based B+ tree mapping:

A node is stored in a Heap File page; the PageId acts in place of a pointer.

  • key-value union storage

    image-20241113105746186

  • The keys are stored separately

    image-20241113105801713

The leaf nodes of a B+ tree store the actual data, and how this data is interpreted depends on different database implementations: some store RecordIDs, and some databases based on Index-Organized Storage (IOS) store tuple (tuple) data directly.

image-20241113110032259

If you don't know about RecordID, the way the data is organized, see theThis blog post

Queries and Indexes

leftmost prefix matching

With union index<a,b,c>The following query conditions are supported

  • (a=1 AND b=2 AND c=3)
  • (a=1 AND b=2)

image-20241113111703944

If all do not satisfy the leftmost prefix match principle, a full table scan is required.

How to handle duplicate keys

  • Add RecordID to make it a unique key

    image-20241113111945385

  • Leaf node overflow (not used in actual systems)

    image-20241113111956652

clustered index

  • A table can only have one clustered index
  • Index keys and values are stored together
  • The data is sorted by the key of the index
  • Synchronize indexes when manipulating data

Clustered indexes are non-essential, depending on the database specific implementation, data in Mysql and SQLite is organized directly with clustered indexes.

Implementing clustered indexes with B+ trees makes it easy to implement range queries and facilitates the full utilization of sequential IO.

image-20241113112518741

For non-clustered indexes, although the keys of the index are ordered, the corresponding data are not necessarily stored in order on disk, so a very effective way is to get the PageID first, then sort according to the PageID, and finally get the data to take full advantage of sequential IO.
image-20241113112630957

Design Choices

Node Size

The slower the storage device reads data and the more it needs to utilize sequential IO, the larger the node will be;

The faster the storage device reads data and the more redundant data reads need to be reduced, the smaller the node will be.

  • HDD:~1MB
  • SSD:~10KB
  • In-Memory:~512B

Merge Thredshold

When the number of keys in a node is less than half-full, the merge is not performed immediately, but small nodes are allowed to exist before the whole tree is periodically rebuilt.

PostgreSQL calls it an unbalanced B+ tree ("non-balanced" B+Tree, nbtree).

Variable-length Keys

  • Pointer: key stores a pointer to the actual data [can't utilize sequential IO because you have to jump to read the contents of the pointer]
  • variable length node (math.)
  • Padding data (Padding)

Index data in the actual system and the heap file data, is able to store the node on the node in the store, is in the store does not store pointers.

Intra-Node Search (INS)

  • Linear lookup: due to the existence of the SIMD instruction set, the actual sequential lookup, which can actually be batch processing

    image
  • binary search

  • Interpolation: when the key has no gap (self-incrementing), the offset can be calculated directly

    image

Optimization tools

Pointer Swizzling

The basic idea: when an object is loaded from disk into memory, its disk address is converted to a memory address (swizzling) in order for the program to be accessed directly in memory by the pointer.

Example: For example, the root node of a B+ tree indexed by the primary key will be pinned and not replaced by the Buffer Pool when it is read, so at this point you can access the root node directly with a memory pointer, omitting the step of asking the Buffer Pool for the memory address using the PageID.

Illustration:

imageimage

Bε-trees

A B+-tree writing optimization.

Basic idea: Instead of modifying data directly when updating, logging is done (similar to log-structured data storage).

The log is recorded on the node, and when the node log is full, the log for that node is pushed down to the child node.

image-20241113133509231

image-20241113133530151

Bulk Insert

Basic idea: create a B+ tree from bottom to top, not top to bottom.

Reduces structural changes in the tree during insertion, provided that the data needs to be pre-sorted.

Keys: 3, 7, 9, 13, 6, 1
Sorted Keys: 1, 3, 6, 7, 9, 13

image-20241113133953890

image-20241113134001752

Prefix Compression

Basic idea: dictionary ordering compresses prefixes.

Deduplication

Basic idea: avoid storing the same key repeatedly in non-unique indexes.

Suffix Truncation

The basic idea: the intermediate nodes just act as guides, so it is sufficient to store the smallest prefix that is recognizable.

imageimage