From self-built to cloud-native: the future and transformation of data management

2024-08-13 09:07:59

Preface

Recently, I finished watching The Past and Present Life of China's Database, a documentary released by the Tencent Cloud community, and one of its segments made me chuckle. The documentary mentions that when cloud databases were first emerging, the biggest challenge came not from competitors, but from users who chose to build their own databases on cloud servers.

This documentary resonated with me deeply: the exploratory journey of homegrown databases, my own experience in the database field, and the advanced technology of today's cloud-native databases all came to mind. If you are also interested in this topic, feel free to read my other articles, which cover related explorations and insights.

Discussion on the development trend of domestic databases: /guoxiaoyu/p/18295131

Personal experience of meeting and getting to know databases: /guoxiaoyu/p/18338820

A look at the history of storage and the future of data: /guoxiaoyu/p/18352499

Back to the topic. Although the statement struck me as a bit funny at first, it was only after digging into the differences between cloud databases and self-built solutions that I understood where this perception came from. Because cloud databases were not yet commonplace, many people mistakenly thought that a cloud database was nothing more than open source MySQL (or another database) installed on a cloud server, with an extra fee on top. If that were the case, they reasoned, why not just rent a cloud server and maintain the database themselves?

Let's look at the advantages and disadvantages of self-built databases and cloud databases. The traditional self-built model requires companies to provision their own hardware and maintain their own software systems, and as the business expands, this model gradually exposes problems of high cost, poor scalability, and difficult maintenance.

In contrast, cloud databases give enterprises a more flexible, efficient, and cost-controllable solution. They not only simplify data management, but also make data processing more convenient and efficient through automation and intelligent tooling. As a result, cloud databases have a clear advantage in meeting the growing data needs of enterprises.

According to Gartner's analysis, the cloud database market is on the rise. In 2023, the global database market was expected to surpass $100 billion, with cloud databases expected to account for 55% of it. The report also states that AWS's database management system (DBMS) revenue surpassed that of traditional database vendors such as Oracle in 2022, showing the strong growth momentum of cloud databases.

In my opinion, the development of cloud databases is going through two important phases: the first phase is the migration from traditional databases to cloud databases; the second phase is the evolution from cloud databases to cloud-native databases. Currently, distributed cloud-native databases are becoming a mainstream trend in the market, signaling the continuous maturity and progress of cloud database technology.

Traditional self-build

As database technology became widespread, the Internet industry grew rapidly as well. The ability to store and process data quickly greatly contributed to the growth of organizations and businesses. During that period, enterprises had to buy their own servers and storage equipment, build server rooms, and staff dedicated development, network, and database operations teams to meet growing business needs. This self-built model not only burdened enterprises with high hardware and staffing costs, but also brought challenges such as long recruitment cycles and rapid technology turnover.

In addition, the operational risks of self-built databases should not be ignored. Unexpected power outages, hardware failures, and even human error can all interrupt service and disrupt normal business operations. These problems gradually exposed the limitations of the traditional self-built model, paving the way for the rise and spread of cloud databases.

At small scale, an enterprise may be able to handle the challenges of building its own database. However, once its product traffic grows significantly, it runs short of server room resources and has to work out how to implement off-site disaster recovery. Because development and operations staff are usually concentrated in one location, an enterprise cannot simply rent a new server room in another region to meet its expansion needs. Organizations therefore had to look for other ways to deal with the infrastructure issues that come with rapid growth, and this is how the so-called "pseudo-cloud" phenomenon emerged.

Cloud Server

I bring up cloud servers before cloud databases because, judging from the customer pain points above, both services trace their origins to the period from the 2000s to the early 2010s. At that time, businesses were not growing as fast as they are today, so even users who were aware of cloud databases often viewed them as open source databases that cloud providers had installed on cloud servers, packaged, and sold. Cloud databases were therefore seen as an add-on to cloud servers rather than a standalone, fully capable solution.

During this period, what users mainly needed was a place to build their own databases: they could simply buy a server and install database software such as MySQL on it. As a result, many cloud customers at the time chose the self-build option, running their own databases on cloud servers. This approach let users choose configurations and server room locations according to their needs, and it also solved, to some extent, most of the earlier problems, such as server room location, unexpected power outages, and hardware failures. For these users, the flexible configuration options and controlled environment offered by cloud servers addressed their primary needs, so the remaining issues seemed irrelevant.

Cloud Database

Every product's requirements have to be understood in the context of its time. In the 2010s and into the 2020s, the Internet grew rapidly, and many programmers regard this period as a decade of dividends. As Internet products advanced quickly, self-built databases faced the need for automatic scale-out and scale-in. No one can foresee which product will suddenly explode in popularity, and once traffic surges, a traditional database cluster may struggle to cope with the volume. To avoid losing customer traffic to downtime, the automatic scaling capability of cloud databases became a key solution and a highly favored choice in the market. During this period, cloud databases excelled at responding to rapidly changing business needs thanks to their flexibility and scalability.
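
To make the idea of automatic scale-out and scale-in concrete, here is a minimal sketch of a proportional scaling rule in Python. It does not describe how any particular cloud database implements scaling; the function name `desired_nodes`, the 60% CPU target, and the node limits are all assumptions chosen for illustration.

```python
def desired_nodes(current_nodes: int, avg_cpu_pct: int,
                  target_cpu_pct: int = 60,
                  min_nodes: int = 1, max_nodes: int = 16) -> int:
    """Return a node count that would bring average CPU back near the target.

    All thresholds are hypothetical. Integer percentages are used so the
    ceiling division below is exact.
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    if target_cpu_pct < 1:
        raise ValueError("target_cpu_pct must be at least 1")
    # Total load is proportional to nodes * utilization; divide by the target
    # utilization and round up to get the number of nodes needed.
    needed = (current_nodes * avg_cpu_pct + target_cpu_pct - 1) // target_cpu_pct
    # Clamp to the allowed range so the cluster never scales to zero
    # or beyond its quota.
    return max(min_nodes, min(max_nodes, needed))
```

For example, `desired_nodes(4, 90)` asks for 6 nodes during a surge, while `desired_nodes(4, 15)` shrinks back to 1 once traffic subsides.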

In the past, data forms and application scenarios were relatively uniform; traditional databases were mainly used in traditional industries such as finance, telecom operators, and government. With the rapid development of the Internet, the mobile Internet, and the industrial Internet, however, industries are accelerating their digitization and informatization. Application services are diversifying as a result, and the forms of data and application scenarios across industries are becoming increasingly rich. This diversification places higher demands on the underlying database: new application scenarios not only involve more complex data types, but also require stronger scalability, flexibility, and performance.

Cloud database ceilings are determined by scheduling

In the past, we usually thought that the performance bottleneck of a database lay mainly in the capability of its optimizer. In those days, programmer interviews almost always involved SQL tuning, because the optimizer was mostly limited to simple operations such as adjusting the order of fields and could not deeply optimize a user's SQL queries. So although optimizers improved performance to some extent, their effect on complex queries or large-scale data processing was limited and could not fully meet high-performance requirements.

Today, what matters even more for a cloud database is its integrated scheduling capability, which covers scheduling at the compute layer, the data layer, and the resource layer. The database must manage and optimize the allocation of computing resources to cope with dynamically changing workloads; schedule efficiently at the data storage and access level to ensure fast processing and accurate access; and schedule at the resource level to ensure effective utilization and cost control.

Intelligent scheduling of data is crucial in cloud databases. Tencent Cloud Database, for example, supports the following five layers of scheduling capabilities across storage and compute:

  1. Compute type: Includes online transaction processing (OLTP), online analytical processing (OLAP), and stream computing to accommodate different business needs and load types.
  2. Database and table scheduling: Schedules databases and tables based on data affinity to optimize data storage and access efficiency.
  3. Storage type: Supports a variety of storage methods such as row storage, column storage, in-memory tables, and logs to suit different data processing needs.
  4. Hot and cold data: Differentiates between cold, warm, and hot data for greater storage and access flexibility.
  5. Disaster recovery scheduling: Schedules across data centers and across cities based on disaster recovery needs to ensure data reliability and business continuity.

In practice, we face a variety of scenarios, such as online transactions, analytical processing (AP), and stream computing, all of which require cloud database technologies to be integrated effectively to strike the best balance between performance and cost. Intelligent scheduling is therefore not only a test of technical ability, but also the key to efficient data processing and resource optimization.
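
As a rough illustration of the hot and cold data layer in the list above, the sketch below classifies tables into tiers by how recently they were accessed. The 7-day and 90-day windows, the function names, and the use of last-access time as the only signal are assumptions made for this example; the source does not describe how Tencent Cloud Database actually decides data temperature.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: tune these to the workload.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def classify_tier(last_access: datetime, now: datetime) -> str:
    """Classify data as hot, warm, or cold from its last access time."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


def plan_placement(tables: dict[str, datetime], now: datetime) -> dict[str, list[str]]:
    """Group table names by tier so each group can go to suitable storage."""
    plan: dict[str, list[str]] = {"hot": [], "warm": [], "cold": []}
    # Sort by name so the plan is deterministic.
    for name, last_access in sorted(tables.items()):
        plan[classify_tier(last_access, now)].append(name)
    return plan
```

A scheduler built on this idea might keep the hot group in memory or on fast disks and move the cold group to cheaper archival storage.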

Intelligent Operation and Maintenance

Finally, it should be emphasized that optimizing the functions and performance of the database itself is not enough to meet comprehensive business needs; business processes must also be optimized and transformed so that database applications form a virtuous cycle. Through its intelligent operations platform, Tencent Cloud Database uses full-link analysis to attach a business label to each SQL statement, which supports both improvements to core business and precise optimization in business R&D.

This approach not only enables databases to more accurately identify and deal with problems in real-world business scenarios, but also drives continuous improvement and optimization of business processes.
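
The source does not say how the labels are attached, so the following is only a sketch of one way to picture the idea: the application prefixes each statement with a comment carrying a business label, and an analysis step groups observed latency by that label. The `/* biz=... */` comment format and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical label format: /* biz=<label> */ at any position in the SQL text.
LABEL_PATTERN = re.compile(r"/\*\s*biz=([\w.-]+)\s*\*/")


def tag_sql(sql: str, label: str) -> str:
    """Prefix a SQL statement with a comment carrying its business label."""
    return f"/* biz={label} */ {sql}"


def extract_label(sql: str) -> str:
    """Return the business label of a statement, or 'unlabeled' if absent."""
    match = LABEL_PATTERN.search(sql)
    return match.group(1) if match else "unlabeled"


def latency_by_label(samples):
    """Sum latency in milliseconds per business label.

    samples is an iterable of (sql_text, latency_ms) pairs, for example
    parsed from a slow query log.
    """
    totals = defaultdict(float)
    for sql, latency_ms in samples:
        totals[extract_label(sql)] += latency_ms
    return dict(totals)
```

With statements labeled this way, a slow query can be traced back to the business flow that issued it rather than only to a table or an instance.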

Operations and maintenance (O&M) is a challenge for many engineers because the tasks involved are numerous and complex, which can be daunting. An intuitive, visual interface, however, greatly simplifies O&M management.

A comprehensive visualization interface clearly displays system metrics and performance data, including real-time monitoring, alarm management, resource usage, and performance bottleneck analysis. With graphical dashboards and interactive charts, O&M personnel can more quickly identify system health status, potential problems, and trend changes, thus improving troubleshooting efficiency and reducing the occurrence of human error. At the same time, the integrated analysis tools can help O&M personnel gain a deeper understanding of the system's operation and formulate more targeted optimization strategies.

Migration Practice

Take Tencent Cloud as an example: two major state-owned banks in China have successfully carried out a distributed transformation of their core banking systems based on TDSQL, migrating core systems that originally relied on foreign databases such as IBM DB2 to TDSQL's domestically developed distributed architecture. This marks an important breakthrough for domestic cloud databases in the financial industry and shows the growing competitiveness and trustworthiness of domestic databases in the global market.

In addition, TDSQL has open sourced OpenTenBase and donated it to the OpenAtom Foundation, aiming to promote the adoption and development of domestic database technology and to encourage more developers to take part in its innovation and ecosystem building. Through open source co-creation, TDSQL hopes to work with upstream and downstream partners and developers to build a thriving database technology ecosystem.

Cloud Native Database

Entering the 2020s, cloud-native technology has gradually become a mainstream trend in the industry. Cloud-native databases, as database systems designed and optimized for cloud environments, offer a remarkable degree of flexibility, scalability, elasticity, and automation by deeply integrating the advantages of cloud-native technologies. This type of database system can take full advantage of the dynamic resource allocation and on-demand scalability of cloud computing to cope with growing data processing demand and achieve efficient, reliable data management. Let's take Tencent Cloud TDSQL as an example.

It supports a high degree of elasticity and flexibility tailored to enterprise needs, because in real-world applications server load has peaks and valleys. Tencent Cloud TDSQL is designed with this volatility in mind, so that the system can automatically adjust resource allocation based on real-time load. This lets enterprises obtain sufficient resources during peak periods and reduce resource usage during troughs, maximizing resource utilization and lowering operating costs.
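
A small worked example may help show why following the peaks and valleys lowers cost. The sketch below compares node-hours under fixed provisioning (sized for the peak all day) with elastic provisioning (sized hour by hour). The function names and the demand profile are made up for illustration and are not TDSQL measurements.

```python
def node_hours_fixed(hourly_demand: list[int]) -> int:
    """Node-hours when capacity is fixed at the peak for every hour."""
    if not hourly_demand:
        raise ValueError("hourly_demand must not be empty")
    return max(hourly_demand) * len(hourly_demand)


def node_hours_elastic(hourly_demand: list[int], min_nodes: int = 1) -> int:
    """Node-hours when capacity follows demand, never below min_nodes."""
    if not hourly_demand:
        raise ValueError("hourly_demand must not be empty")
    return sum(max(min_nodes, demand) for demand in hourly_demand)


# Hypothetical 24-hour profile: 8 quiet hours, a 4-hour peak, 12 normal hours.
EXAMPLE_DEMAND = [2] * 8 + [10] * 4 + [4] * 12
```

For `EXAMPLE_DEMAND`, fixed provisioning uses 240 node-hours while elastic provisioning uses 104, so under these assumed numbers elasticity saves a little over half of the capacity that would otherwise sit idle.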

To better understand the architecture of TDSQL and its elastic properties, you can refer to the architecture diagram of Tencent Cloud Native Database:

[Figure: architecture diagram of Tencent Cloud Native Database]

Why cloud-native databases are better than cloud databases

Still using Tencent Cloud as an example, the cloud-native database TDSQL-C MySQL edition is superior to the traditional cloud database MySQL in a number of ways, so let's compare.

| Feature / capability | Cloud Database MySQL: disadvantages | TDSQL-C MySQL Edition: advantages |
| --- | --- | --- |
| Data storage limit | Limited by a single physical machine | Distributed storage, up to 400 TB in a single instance |
| Upgrades and scaling | Specification upgrades and adding read-only instances are time-consuming | Compute nodes are stateless; scaling up, switching over, and adding read-only nodes take seconds |
| Data consistency | High cost of achieving strong data consistency | Strong three-copy consistency at the data block level |
| Data synchronization | Binlog-based data synchronization, with master-backup latency issues | Redo-based master-slave synchronization, with slave latency down to the millisecond level |
| Write performance | Full log and data page updates cause write performance bottlenecks | Only redo logs are written; lightweight writes, higher write performance (140%) |
| Disk expansion | - | Disk expansion is transparent to the business |
| Backup/archive rate | Slow archiving and long data recovery times | GB/s backup and recovery rates, addressing slow backups and slow recoveries |
| Compute resources | - | Serverless capable |

Cloud-native databases represent the future trend of database technology, which are designed for cloud computing environments and have features that traditional databases do not have. These databases take full advantage of the elasticity, scalability, and automated operation and maintenance capabilities of cloud platforms, and are able to quickly adapt to changes in business requirements and achieve optimal allocation of resources.

These advantages of cloud-native databases become especially important as enterprise digital transformation accelerates. They support microservice architectures, containerized deployments, and DevOps practices, making application development and operations more efficient and agile. In addition, the high availability and disaster recovery capabilities of cloud-native databases ensure enterprise data security and business continuity. As a result, cloud-native databases not only meet current technological needs, but also lay the foundation for future technological development, becoming a key force in driving enterprise innovation and growth.

Summary

Against the backdrop of evolving data technologies, the rise of cloud databases and the popularity of cloud-native databases mark a significant change in database technology. From the initial self-built model to today's cloud-native databases, organizations have gained richer and more flexible choices for data management. Cloud databases are not just a migration of traditional database technology, but a comprehensive upgrade and optimization of it.

In retrospect, although the traditional self-built model gives enterprises great flexibility and control, its high hardware investment, complex maintenance, and difficulty in scaling leave many enterprises facing serious challenges. When business grows rapidly and data processing needs increase dramatically, the traditional model is stretched to its limit. Cloud databases arrived at the right time, offering enterprises a more efficient solution through automated management, elastic scaling, and cost-effectiveness.

With the further development of cloud technology, cloud-native databases, as a new generation of database systems built for the cloud, show even better performance and adaptability. They not only bring a high level of intelligence to resource management and scheduling, but also far exceed traditional databases in data processing flexibility and scalability. Tencent Cloud TDSQL, for example, has advantages in elastic scaling, data consistency, and performance optimization, which help enterprises handle large-scale data with ease and greatly improve business continuity and stability.

In the face of increasingly complex business requirements and technical challenges, the advantages of cloud-native databases are becoming more and more prominent. They not only effectively respond to the high load and diverse demands of modern data processing, but also provide strong support for enterprise digital transformation. From business agility to operational efficiency, from data security to cost control, cloud-native databases excel in every aspect.

To summarize, the development of data technology never stops, and the emergence of cloud databases and cloud-native databases has given new impetus to enterprise data management and business growth. As technology advances further and application scenarios keep expanding, cloud-native databases will continue to play an important role in making data management more intelligent, flexible, and efficient. Enterprises need to keep up with technology trends and adjust their strategies flexibly during this change to remain competitive and innovative in the digital era.


I'm Rain, a Java server-side coder who is also exploring the mysteries of AI technology. I love technical communication and sharing, and I'm passionate about the open source community. I also hold several titles, including Juejin (Nuggets) Outstanding Author, Tencent Cloud Creative Star, Alibaba Cloud Expert Blogger, and Huawei Cloud Sharing Expert.

💡 I won't be shy about sharing my personal explorations and experiences on the path of technology, in the hope that I can bring some inspiration and help to your learning and growth.

🌟 Welcome to the effortless drizzle! 🌟