Database Series: Mainstream library and table middleware introduction (graphic summary)

Database Series: MySQL Slow Query Analysis and Performance Optimization
Database Series: MySQL Index Optimization Summary (Comprehensive Edition)
Database Series: Data Field Changes under High Concurrency
Database Series: Overriding Indexes and Circumventing Table Returns
Database Series: Database High Availability and Lossless Scaling
Database Series: Improving Performance with Highly Distinctive Index Columns
Database Series: Prefix Indexes and Index Length Tradeoffs
Database Series: MySQL Engines MyISAM and InnoDB Comparison
Database Series: Implementing High Concurrency Control under InnoDB
Database Series: 4 Isolation Levels for Transactions
Database Series: Difference between snapshot reads under RR and RC
Database Series: Introduction to MySQL InnoDB Locking Mechanisms
Database Series: What locks are used for different MySQL operations?
Database Series: A Look at the Industry's Mainstream MySQL Data Middleware
Database Series: Paging Performance Issues with Huge Data Tables

1 Introduction

The physical server has limited resources such as CPU, memory, storage devices, number of connections, etc. A large number of connections executing operations at the same time in a certain period of time will cause the database to encounter a performance bottleneck in processing. In order to solve this problem, industry pioneers have fully developed the idea of divide-and-conquer by partitioning the large database tables.
Better control and management is then implemented to provide better performance by using CPU, memory, and storage of multiple machines at the same time. And there are two ways to implement partitioning: vertical splitting and horizontal splitting.

Vertical split (Scale Up) is divided into a vertical library and a vertical table, mainly split by functional modules to solve the competition for resources between individual libraries or individual tables. For example, it is divided into order library, product library, user library... In this way, the table structure is different between multiple databases.
Horizontal split (Scale Out) is subdivided into library table and library table.The table structure in these databases is exactly the same, to address the stress that occurs when the amount of data in a single table grows.

For more details you can refer to these two articles by the author:.
《MySQL Total Disintegration 28: Split Library, Split Table》
《MySQL Comprehensive Disassembly 29: Partition Functions for Partitioning Tables and Libraries》

2 Common mainstream middleware introduction

Since it has been realized in the library and table solution, it is necessary to have a convenient component to support the management.
Split library and table middleware is a key component of database architecture used to solve problems such as high concurrency and large data volumes. These middleware improve the performance and scalability of databases through data sharding, routing, load balancing, and other features. The following is an introduction to some common split-repository and split-table middleware:

2.1 ShardingSphere

summarize：

ShardingSphere is an open source distributed database middleware , provides a library and table , read-write separation , distributed transactions and other functions .
It supports a wide range of databases, such as MySQL, PostgreSQL, Oracle, SQL Server, etc., and can be seamlessly integrated with existing database systems.

Architecture and Principles：

ShardingSphere consists of three main components: Sharding-JDBC, Sharding-Proxy and Sharding-Sidecar.

Sharding-JDBC：The module used to realize the function of branching library and table, it can realize transparent branching library and table operation through simple configuration at the application layer.
Sharding-Proxy：Module used to implement the database proxy function, which can route database requests to different database nodes to achieve read-write separation and load balancing.
Sharding-Sidecar (planned):Positioned as a cloud-native database proxy for Kubernetes, it proxies all access to the database as a Sidecar.

The principle is to realize split database and split table by data slicing and routing. Data sharding is to divide the data into multiple segments, each of which is stored in a different database instance or data table; routing is to route requests to the corresponding database instance or data table according to the data sharding rules.

Advantages and Scenarios：

Flexible scalability: support for horizontal and vertical expansion, you can flexibly adjust the size and performance of the database according to business needs.
High Availability: Supports master-slave replication and multi-live architecture to provide highly available database access and data protection.
Simplifies development and maintenance: Provides easy-to-use interfaces and configurations that can reduce developer workload and maintenance costs.
It is suitable for scenarios such as high concurrent access, large data volume storage, and cross-region deployment.

2.2 MyCAT

summarize：

MyCAT is an open source distributed database middleware , written in Java-based , support for the MySQL protocol , can be used as a proxy server for MySQL .
It supports functions such as separate libraries and tables , read-write separation , global serial number , and has cross-language , cross-platform , cross-database versatility .

Architecture and Principles：

MyCAT uses a proxy model for database routing and sharding.
It consists of two main parts, MyCAT-Server and MyCAT-DataNode. MyCAT-Server is used to receive database requests from clients and route the requests to different database nodes; MyCAT-DataNode is used as the database node that actually stores the data.

Advantages and Scenarios：

Easy to deploy and use: transparent to the project, if you encounter upgrades and other operations, just need to be carried out at the middleware level.
Applicable to the management and scaling problems of large-scale MySQL clusters.
However, MyCAT's SQL support is relatively weak and may require some rewriting and optimization of SQL statements.

2.3 Vitess

summarize：

Vitess is an open source distributed database middleware developed by YouTube , mainly used to solve large-scale MySQL cluster management and scaling problems .

Architecture and Principles：

Vitess provides data sharding, read/write separation, horizontal scaling, and has strong load balancing and fault recovery capabilities.
It implements access control and load balancing of databases through vtgate (Vitess' query router).

Advantages and Scenarios：

Suitable for large-scale MySQL cluster scenarios with powerful horizontal scaling and load balancing features.
However, Vitess has weak support for non-MySQL databases and may not be applicable to other types of database systems.

2.4 Introduction to other middleware

In addition to the above three common split library and split table middleware, there are other middleware such as Cobar (which has been gradually eliminated), TDDL (Taobao Distributed Data Layer), Atlas (Qihoo 360 open source) and so on. These middleware in a specific historical period or specific business scenarios have a certain application value, but with the development of technology and market changes, the scope of their use and influence gradually weakened.

3 Summary

When choosing a middleware for database and table distribution, you need to consider the specific business needs, technology stack, performance requirements, etc. ShardingSphere, MyCAT, and Vitess are the more popular and mature middleware choices, and each of them has different advantages and applicable scenarios. At the same time, we also need to pay attention to the development of middleware and community support, so that we can get better support in the subsequent technical upgrades and maintenance.