This article was shared from the fifth special issue of Huawei Cloud DTSE on Open Source. Author: Ren Hongcai, Senior Software Engineer, Huawei Cloud, and Karmada Community Maintainer.
Managing and orchestrating containerized applications across multiple cloud platforms is one of the major challenges facing enterprises today. Karmada's multi-cloud container orchestration technology lets users manage multiple clusters as easily as operating a single cluster, simplifying operations and maintenance in multi-cloud environments and accelerating the adoption of distributed cloud-native applications.
Industry Background
With the rapid development of cloud computing technology and increasingly diverse enterprise requirements for cloud infrastructure, a multi-cloud strategy has become the preferred choice for many organizations. A multi-cloud environment not only improves business agility and availability, but also effectively reduces the risk of depending on a single cloud provider. According to a recent survey report, more than 87% of enterprises use services from multiple cloud vendors. However, this brings with it the complex challenge of managing and orchestrating containerized applications across multiple cloud platforms.
The industry's most popular container orchestration tool, Kubernetes (K8s for short), has demonstrated strong resource management and automated deployment capabilities within a single cluster, but in multi-cloud scenarios, cross-cluster resource scheduling, unified management, and data consistency remain pain points to be resolved.
At this stage, cloud-native multi-cloud, multi-cluster orchestration faces many challenges:
- Duplicated work across many clusters: Ops engineers must deal with cumbersome cluster configuration, management differences between clusters from different cloud vendors, and fragmented API access entry points.
- Maintenance challenges from overly dispersed services: differentiated application configuration across clusters is cumbersome, and cross-cloud service access and cross-cluster application synchronization are hard to manage.
- Boundary constraints of clusters: application availability is limited by individual clusters, and resource scheduling and elastic scaling are confined within cluster boundaries.
- Vendor lock-in: business deployments become sticky, automated failover is lacking, and there has been no neutral, open source multi-cloud container orchestration project.
Karmada Multi-Cloud Container Orchestration Engine: Simplifying the Complexity of Managing Multi-Cloud Environments
To address the above challenges, Huawei officially launched the open source project Karmada in 2021, aiming to create a cloud-native multi-cloud container orchestration platform. Karmada (Kubernetes Armada, meaning "fleet") inherits and goes beyond the design concepts of the community's Federation v1 and v2 (Kubefed). Instead of simply replicating resources across clusters, it achieves seamless deployment and management of distributed workloads in multi-cloud environments through a new set of APIs and control plane components, while keeping Kubernetes' original resource definition APIs unchanged.
Karmada provides a global control plane that enables users to manage Kubernetes clusters on multiple clouds as if they were operating a single cluster, simplifying the O&M complexity of multi-cloud environments. It introduces advanced cross-cluster scheduling policies that automatically place workloads on the most appropriate cloud platform or region based on resource requirements, cost, compliance, and other factors. Its distributed data management and synchronization mechanism keeps data and configuration consistent across clouds, reducing the complexity of data management.
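As a rough illustration of how such cross-cluster placement is declared, the sketch below uses client-go's dynamic client to create a Karmada PropagationPolicy on the control plane that distributes an nginx Deployment to two member clusters. The kubeconfig path, cluster names, and workload name are placeholders chosen for this example, not values from the article.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client against the Karmada control-plane apiserver
	// (the kubeconfig path is an assumption for this sketch).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/karmada/karmada-apiserver.config")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// PropagationPolicy is a namespaced Karmada resource in policy.karmada.io/v1alpha1.
	gvr := schema.GroupVersionResource{
		Group:    "policy.karmada.io",
		Version:  "v1alpha1",
		Resource: "propagationpolicies",
	}

	// Propagate the "nginx" Deployment to two member clusters (placeholder names).
	policy := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "policy.karmada.io/v1alpha1",
		"kind":       "PropagationPolicy",
		"metadata":   map[string]interface{}{"name": "nginx-propagation", "namespace": "default"},
		"spec": map[string]interface{}{
			"resourceSelectors": []interface{}{
				map[string]interface{}{"apiVersion": "apps/v1", "kind": "Deployment", "name": "nginx"},
			},
			"placement": map[string]interface{}{
				"clusterAffinity": map[string]interface{}{
					"clusterNames": []interface{}{"member1", "member2"},
				},
			},
		},
	}}

	_, err = client.Resource(gvr).Namespace("default").Create(context.TODO(), policy, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
```

Note that the Deployment itself is still an ordinary Kubernetes manifest applied to the Karmada control plane; only the placement policy is Karmada-specific, which is how the original resource definition APIs stay unchanged.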
Practical Example: Karmada in Industrial Intelligent Inspection
Industrial intelligent inspection urgently needs standardization to improve efficiency
In LCD panel production, products often become defective due to a variety of factors. For this reason, Automatic Optical Inspection (AOI) equipment has been introduced after key process nodes to detect common defects through optical principles. However, existing AOI equipment only detects defects; classifying and identifying them still requires manual work, a time-consuming process that affects productivity. A leading panel company, the customer in this case, introduced an Automatic Defect Classification (ADC) system to improve judgment accuracy and reduce labor intensity. The system uses deep learning to automatically classify the defect images output by AOI and screen out misclassifications, thereby improving productivity.
The client company first introduced ADC in one plant and subsequently rolled it out to other plants to save human resources and improve judgment efficiency. However, due to process complexity and differences among suppliers, the on-site deployments became fragmented and decentrally managed, making data sharing and O&M difficult. To solve these problems, the client company initiated the construction of an industrial intelligent inspection platform that uses AI technology to achieve standardized intelligent inspection and improve production efficiency and yield.
Industrial Intelligent Inspection Platform
The industrial intelligent inspection platform takes ADC as its core and extends it to model training and inspection review, forming an integrated "cloud" (management + training) + "edge" (inference) + "device" (business) solution. The solution aims to improve production quality and data value through a standardized platform. The scope of construction includes a resource sharing center and on-site training and edge inference sub-platforms, to be implemented across several plants.
Industrial Intelligent Inspection Platform Architecture Diagram
The goal of the project is to bring on-site ADCs online, share resources, and standardize across cloud and edge, reducing O&M load and raising standards. By standardizing and unifying ADC systems across the customer's entire group, the Industrial Intelligent Inspection Platform provides samples and templates for subsequent ADC construction, reducing cost and cycle time and improving production and QC efficiency as well as product yield. The platform includes user roles such as system administrators and resource allocators, and covers information flows such as ADC inference, model training, data sharing, and cloud-edge collaboration, ensuring the automated defect classification process for ADC and improving the utilization of models and defect images.
Building Solutions with Karmada Multi-Cluster Management
I. Cluster management: unified management of clusters across regions
K8s clusters in different geographic locations are registered to the central cloud system, which manages the clusters in multiple locations.
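For reference, here is a minimal sketch of how a central management system could enumerate the member clusters registered with the Karmada control plane. It assumes the Cluster resources live in the cluster.karmada.io/v1alpha1 API group (as in current Karmada releases); the kubeconfig path is a placeholder.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Connect to the Karmada control-plane apiserver (path is an assumption).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/karmada/karmada-apiserver.config")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Member clusters registered with Karmada appear as cluster-scoped
	// Cluster objects in the cluster.karmada.io/v1alpha1 API group.
	gvr := schema.GroupVersionResource{
		Group:    "cluster.karmada.io",
		Version:  "v1alpha1",
		Resource: "clusters",
	}
	clusters, err := client.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, c := range clusters.Items {
		fmt.Println("registered member cluster:", c.GetName())
	}
}
```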
II. Application management: globally unified deployment and monitoring
With Karmada's unified cluster access capability, functions that require aggregating data from member clusters, such as visualization dashboards in the central cloud, can be implemented.
1. Cluster monitoring
For online clusters, the central cloud system can display monitoring data for indicators such as memory, CPU, disk, network ingress and egress rates, GPU, and logs, and users can switch between clusters to view the data.
Resource Monitoring
The central cloud sees the same monitoring views as the training cloud: PromQL queries are wrapped by a Java program in the member cluster, exposed through the Karmada aggregation layer API, and served to the front-end pages.
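A minimal sketch of this pattern, assuming a monitoring Service named metrics-gateway exposed in a member cluster: the request goes to the Karmada aggregation layer proxy endpoint and is forwarded to the in-cluster Service via the standard Kubernetes service proxy subresource. Cluster, namespace, service, port, and query path are placeholders, not the names used in the actual system.

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Client against the Karmada apiserver (kubeconfig path is an assumption).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/karmada/karmada-apiserver.config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The Karmada aggregation layer exposes each member cluster under
	// /apis/cluster.karmada.io/v1alpha1/clusters/<name>/proxy/, after which the
	// member cluster's own Kubernetes API paths apply. Here we reach an
	// in-cluster monitoring Service via the standard service proxy subresource.
	path := "/apis/cluster.karmada.io/v1alpha1/clusters/member1/proxy" +
		"/api/v1/namespaces/monitoring/services/metrics-gateway:8080/proxy/metrics/summary"

	data, err := clientset.CoreV1().RESTClient().
		Get().
		AbsPath(path).
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```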
2. Data distribution from the central cloud
Data uploaded by users to the central cloud, including datasets, annotations, operator projects, operator images, and models, can be selectively distributed to a specified site.
Datasets, operator projects, and models are usually files and are saved to local storage or NAS after the transfer completes. Annotations are usually structured data and are written to a database after the transfer completes. Operator images are typically exported as tarballs and pushed to the destination cluster's Harbor registry after the transfer completes. In addition to the Karmada control plane, the central cloud has its own business K8s cluster with storage, so it can act as a staging area. All of the above is implemented by calling, through Karmada's aggregation layer API, the file upload Service (svc) that we provide in each cluster, from one cluster to another.
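The sketch below illustrates the cluster-to-cluster invocation described above under the same assumptions: a staged file in the central cloud is POSTed to a hypothetical file-upload Service in the target member cluster through the Karmada aggregation layer API. All names, paths, and ports are placeholders.

```go
package main

import (
	"context"
	"os"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Client against the Karmada apiserver (kubeconfig path is an assumption).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/karmada/karmada-apiserver.config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Read a model file staged in the central cloud (path is a placeholder).
	payload, err := os.ReadFile("/data/staging/defect-model-v3.tar")
	if err != nil {
		panic(err)
	}

	// POST the file to a hypothetical "file-upload" Service deployed in the
	// target member cluster, routed through the Karmada aggregation layer API.
	path := "/apis/cluster.karmada.io/v1alpha1/clusters/site-b/proxy" +
		"/api/v1/namespaces/inspection/services/file-upload:9000/proxy/upload/models"

	_, err = clientset.CoreV1().RESTClient().
		Post().
		AbsPath(path).
		SetHeader("Content-Type", "application/octet-stream").
		Body(payload).
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
}
```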
3. Cross-site training
If training resources at one site are insufficient, cross-site training can be carried out by applying for resources at another site. This is done by sending the datasets, annotations, operator projects, operator images, and other data required for training from site A to site B, training the model with site B's resources, and then returning the trained model to site A.
The principle is similar to the central cloud data distribution: the data required for a task is sent directly to the corresponding cluster, reflecting direct invocation between member clusters.
4. Visualization big screen
Based on the sites registered with the central cloud, the big screen displays statistics of various indicator data from the different sites. When displaying real-time data on such a big screen, the Karmada aggregation layer API lets us conveniently call member cluster Services directly, rather than routing every data display through offline and real-time big data analysis pipelines, which gives better timeliness.
Summary and Outlook
The Karmada project has achieved significant growth and recognition since it was open sourced and joined the Cloud Native Computing Foundation (CNCF) as a sandbox project in 2021. At the end of 2023, the project was officially promoted to CNCF incubation level. This milestone marks broad recognition of Karmada's technology ecosystem by the global industry and further solidifies its leading position in distributed cloud-native technology. With its innovative multi-cloud, multi-cluster container orchestration capabilities, the project has been adopted by more than 30 well-known enterprises worldwide for building enterprise-grade cloud-native platforms.
The emergence of Karmada provides a powerful and flexible container orchestration solution for enterprises in the multi-cloud era, which not only solves the pain points of multi-cloud management, but also provides solid technical support for enterprises to explore broader application scenarios in their cloud-native journey. As cloud-native technologies continue to evolve, Karmada is expected to become a key force in connecting and simplifying the multi-cloud ecosystem, helping enterprises unleash the full potential of the cloud and accelerate the process of digital transformation.