
Reading notes on a top-conference paper: Application-oriented cloud workload prediction: a survey and new perspectives

Posted 2024-10-29


Abstract

Accurate workload prediction is valuable for both cloud subscribers and providers, as it can effectively guide practices such as performance assurance, cost reduction, and energy-consumption optimization. However, cloud workload prediction is very challenging because of the complexity and dynamics of workloads, and various solutions have been proposed to improve predictive performance. Unlike existing surveys, this paper for the first time takes a new perspective, i.e., an application-oriented one rather than one centered on prediction methods per se, and provides a comprehensive overview and analysis of the evolving landscape of workload forecasting. Specifically, it first presents the basic features of workload prediction, and then analyzes and categorizes existing work based on two notable features of cloud applications: variability and heterogeneity.

1 Introduction

Resource management faces great challenges due to the dynamic nature of cloud environments, the diversity of user requests and services, and the elastic provisioning of cloud resources. Typical symptoms include long queuing times, unstable performance, resource contention, idle resources, and high energy consumption.

To address these issues, the following diagram illustrates a proactive framework implementing Artificial Intelligence for IT Operations (AIOps).

The AIOps framework (located in the cloud platform layer) consists of four stages:

Monitoring:
The Monitor center is responsible for collecting system operation status and performance data.
Analysis:
Workload predictor: Predicts future load conditions.
SLA analyzer: Analyzes whether a service meets a service level agreement (SLA).
Planning:
Decision maker: develops resource allocation and optimization plans based on forecasts and analysis.
Execution:
Executor: executes specific resource adjustments based on the decision maker's plan.
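One iteration of the four-stage loop above can be sketched in a few lines. The sketch below is purely illustrative; all function names (naive_forecast, sla_ok, plan) are hypothetical placeholders, not APIs from any surveyed system.

```python
def naive_forecast(history):
    """Analysis (workload predictor): predict the next load as the
    mean of the last few observations."""
    window = history[-3:]
    return sum(window) / len(window)

def sla_ok(predicted_load, capacity):
    """Analysis (SLA analyzer): flag whether predicted load stays
    within the provisioned capacity."""
    return predicted_load <= capacity

def plan(predicted_load, capacity, step=10):
    """Planning (decision maker): scale capacity up or down around
    the prediction."""
    if predicted_load > capacity:
        return capacity + step              # scale out
    if predicted_load < 0.5 * capacity:
        return max(step, capacity - step)   # scale in
    return capacity

def mape_iteration(history, capacity):
    """One monitor-analyze-plan-execute pass; the executor would then
    apply new_capacity to the infrastructure."""
    predicted = naive_forecast(history)
    violated = not sla_ok(predicted, capacity)
    new_capacity = plan(predicted, capacity)
    return predicted, violated, new_capacity

pred, violated, cap = mape_iteration([40, 55, 70], capacity=60)
```

In a real system the predictor would be one of the models from Section 2.2 and the executor would call a cloud provider's scaling API; the control-loop shape stays the same.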

2 Basic Characteristics

2.1 Predicted Targets

There are two main types to consider

Request workload: external access requests (HTTP requests, API calls, etc.) and internal system calls (requests generated by different components or services within the application calling each other)

Resource workload: regular resources, including CPU, memory, disk and network bandwidth

2.2 Modeling Technologies

Statistical methods: moving average (MA), autoregressive (AR), exponential smoothing (ES), autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA) methods, etc.

ML: Linear Regression (LR), Logistic Regression (LoR), Support Vector Machine (SVM), K Nearest Neighbors (KNN), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), etc.

DL: Artificial Neural Networks (ANN), Extreme Learning Machines (ELM), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), Gated Recurrent Unit Networks (GRU), Convolutional Neural Networks (CNN), and Temporal Convolutional Neural Networks (TCN)

RL: Q-Learning, Deep Q Networks (DQN), etc.
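As a concrete instance of the statistical family above, simple exponential smoothing (ES) fits in a few lines. This is a generic textbook sketch, not code from the survey; the last smoothed value serves as the one-step-ahead forecast.

```python
def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing: each smoothed value is a weighted
    average of the current observation (weight alpha) and the previous
    smoothed value (weight 1 - alpha)."""
    smoothed = series[0]
    for y in series[1:]:
        smoothed = alpha * y + (1 - alpha) * smoothed
    return smoothed

forecast = exponential_smoothing([10, 12, 11, 13], alpha=0.5)
```

Larger alpha reacts faster to recent changes; smaller alpha smooths out noise, which is the basic accuracy/stability trade-off these statistical predictors expose.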

2.3 Evaluation Metrics

The following are direct indicators

1. Mean absolute error (MAE).\(\begin{equation*} \text{MAE} =\frac{\sum_{t=1}^{m}\vert y_{t}-\hat{y}_{t}\vert }{m} \tag{1} \end{equation*}\)

2. Mean Square Error (MSE).\(\begin{equation*} \text{MSE} =\frac{\sum_{t=1}^{m}\vert y_{t}-\hat{y}_{t}\vert ^{2}}{m} \tag{2} \end{equation*}\)

3. Root Mean Square Error (RMSE).\(\begin{equation*} \text{RMSE}=\sqrt{\text{MSE}}=\sqrt{\frac{\sum_{t=1}^m\left\vert y_t-\hat{y}_t\right\vert^2}{m}} \tag{3} \end{equation*}\)

4. Mean Absolute Percentage Error (MAPE).\(\begin{equation*} \text { MAPE }=\frac{1}{m} \sum_{t=1}^m\left\vert\frac{y_t-\hat{y}_t}{y_t}\right\vert \tag{4} \end{equation*}\)It is a relative error measurement that uses absolute values to prevent positive and negative errors from canceling each other out.

5. Coefficient of determination (R2).\(\begin{equation*} R^{2}=1-\frac{\sum_{t=1}^{m}\vert y_{t}-\hat{y}_{t}\vert ^{2}}{\sum_{t=1}^{m}\vert y_{t}-\overline{y}\vert ^{2}} \tag{5} \end{equation*}\)One minus the ratio of the residual sum of squares to the total sum of squares; the closer to 1, the better the fit.
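The direct metrics above translate mechanically into code. The sketch below implements equations (1)-(5) over plain Python lists, with R² in its standard 1 − SS_res/SS_tot form.

```python
import math

def mae(y, y_hat):
    """Mean absolute error, eq. (1)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    """Mean square error, eq. (2)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error, eq. (3)."""
    return math.sqrt(mse(y, y_hat))

def mape(y, y_hat):
    """Mean absolute percentage error, eq. (4); assumes no true value is 0."""
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination, eq. (5): 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

Note that MAPE divides by the true value, so it is undefined when the workload is zero; papers typically filter or offset such intervals.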

Classification-based workload prediction model: let us first define the following four basic classification metrics:

1. True Positive (TP): The number of samples where the true value is positive and the prediction is also positive.

2. False Positives (FP): the number of samples where the true value is negative and the prediction is positive.

3. True Negative (TN): The number of samples where the true value is negative and the prediction is also negative.

4. False Negative (FN): The number of samples where the true value is positive and the prediction is negative.

The following evaluation indicators can be obtained:

1. Accuracy:\(\begin{equation*} \text { Accuracy }=\frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}} \tag{6} \end{equation*}\), which is the ratio of the number of correctly predicted samples to the number of all samples

2. Precision.\(\begin{equation*} \text { Precision }=\frac{\text{TP}}{\text{TP}+\text{FP}} \tag{7} \end{equation*}\), is the ratio of the number of samples correctly predicted as positive to the number of all samples predicted as positive

3. Recall.\(\begin{equation*} \text { Recall }=\frac{\text{TP}}{\text{TP}+\text{FN}} \tag{8} \end{equation*}\), which is the ratio of the number of samples correctly predicted as positive to the number of all true-positive samples

4. F1 Score:\(\begin{equation*} \text { F1 Score }=\frac{2 \times \text { Precision } \times \text { Recall }}{\text { Precision }+ \text { Recall }} \tag{9} \end{equation*}\), which is the harmonic mean of precision and recall. A higher F1 score indicates that the model better balances precision and recall.
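Given the four counts defined above, metrics (6)-(9) follow directly; a minimal sketch:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 score from the four
    basic counts (eqs. 6-9). Assumes tp+fp and tp+fn are nonzero."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

For workload prediction cast as classification (e.g., "will load exceed a threshold in the next interval?"), F1 is usually preferred over accuracy because overload intervals are rare and the classes are imbalanced.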

Here are some more indirect metrics: execution time, throughput, success rate, SLA violation rate, resource utilization, number of threads, cost (resources, violations, management), profit, energy consumption

2.4 Data sets

Google Cluster Data: It tracks data from Google's cluster management system, also known as Borg.

Alibaba Cluster Data:It tracks data from Alibaba's production clusters and contains detailed information about jobs/applications.

Microsoft Azure Traces: It is trace data from Microsoft Azure systems, including virtual machine (VM) traces and Azure Functions traces.

WS-Dream:It maintains three datasets: (1) the QoS dataset, (2) the log dataset, and (3) the comment dataset.

Wikipedia Pagecounts-Raw: It is a trace of Web requests served by Wikipedia servers; known outages and server issues that may affect the trace are documented.

Docker Registry Trace Player:It is used to replay anonymous production-level traces of the registry. These traces come from the IBM docker registry.

Grid Workloads Archive: It is a repository of usage traces for multiple grids.

Failure Trace Archive: It is a repository for parallel and distributed system availability tracing.

Planetlab Workload Traces: It is a set of CPU utilization traces collected from PlanetLab VMs over a random 10-day period.

Parallel Workloads Archive: It is a collection of workload traces and models for High Performance Computing (HPC) machines.

Lublin-Feitelson: It is a model of parallel jobs in supercomputers.

Pegasus Synthetic Workflows: It provides analyzed data from 20 synthetic workflow applications, each with different sizing options.

Fisher:It is a collection of resource and performance metrics from a real Kubernetes system, logged over 30 days for 10 containers.

3 Application-Oriented Workload Prediction

3.1 Workload variability

3.1.1 High volatility

Unlike HPC systems and grid computing, cloud applications are much more interactive and their workloads far more variable: on average, they are almost 20 times noisier than grid-computing workloads.

1. Linear analysis
The researchers proposed solutions for MA, AR, ES, ARIMA and SARIMA based on statistical analysis of historical data to fit statistical models.

2. Nonlinear analysis
Machine learning algorithms such as KNN, ANN, RF, and SVM have been tested, with RF confirmed to achieve the highest prediction accuracy. A workload prediction method based on an improved LSTM has been proposed, which builds a new RNN architecture by combining BiLSTM and GridLSTM. Other work uses deep Q-learning to perform federated cloud workload prediction, extracting latent patterns and optimizing VM resource-allocation models.
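For intuition, the linear-analysis branch above can be illustrated with the smallest possible model: an AR(1) fitted by ordinary least squares. This is a generic sketch, not a specific method from the survey.

```python
def fit_ar1(series):
    """Fit y_t = c + phi * y_{t-1} by ordinary least squares."""
    x = series[:-1]          # lagged values
    y = series[1:]           # next values
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var          # slope on the previous observation
    c = my - phi * mx        # intercept
    return c, phi

def forecast_ar1(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]
```

Real deployments use higher-order AR/ARIMA fits, but the failure mode is the same one the text describes: a linear model cannot capture the nonlinear bursts that motivate the ML and DL approaches.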

3.1.2 Variety of models

New workload patterns constantly emerge as user behavior, applications, and environments evolve. In addition, non-stationary workloads exhibit different patterns over time, requiring more frequent model regeneration and incurring the corresponding overhead.

Therefore, ensemble prediction methods have gradually been applied to cloud workload prediction. A prediction method named REAP, which integrates feature selection with eight machine learning methods, achieves excellent prediction accuracy. A hybrid model based on ARIMA and ANN has also been proposed to predict CPU and memory utilization: ARIMA captures the linear components, while the ANN analyzes the nonlinear components using the residuals derived from ARIMA.

3.2 Workload Heterogeneity

3.2.1 Forecasting objectives

1. Request workload
An ANN-based workload prediction model is proposed

2. Resource workloads
A multivariate time series-based workload prediction framework for multi-attribute resource allocation is proposed, and a BiLSTM model is developed to predict the supply and utilization of multiple resources

3.2.2 Organizational structure

As applications evolve from monolithic architectures to service-oriented architectures to microservice architectures, researchers explore how workload prediction models can incorporate the structural characteristics of applications.

1. Independent analysis
The researchers focus only on modeling the workload variations of the application itself. For example, an ensemble workload prediction model combining adaptive sliding windows with temporal locality has been proposed.

2. Correlation analysis
Facing distributed or tiered applications, researchers design prediction methods by analyzing the workload variations of different components in an application. An end-to-end workload prediction method based on deep learning is proposed and the concept of workload group behavior is creatively introduced

3. Large-scale analysis
Large cloud applications may have thousands of instances. Balancing prediction accuracy and model overhead has become a serious challenge for large-scale workload prediction. A feature selection method is proposed to reduce the inference time of predictive models

3.2.3 Type of operation

1. Cluster/Data Center Granularity
A tree-level deep convolutional neural network based on a population-based optimization algorithm is proposed. In addition, an improved adaptive differential evolution (AADE) learning algorithm with three-dimensional adaptive capability is presented and applied to train neural networks for workload prediction in data centers.

2. PM/VM Granularity
SARIMA outperforms LSTM in long-term tasks but performs poorly in short-term tasks

3. Container granularity
As virtualization technology continues to improve, many applications have shifted from traditional PM/VM-based deployments to Serverful-based container deployments
A hybrid model of ARIMA and triple exponential smoothing is proposed, which is responsible for mining and predicting linear and nonlinear relationships in container resource workload sequences, respectively

4. Instance Granularity
A probability-based workload prediction and warm-up model for serverless applications is proposed. It uses the Fast Fourier Transform to predict whether, and with what concurrency, a function will be invoked in a specific time interval, guiding function warm-up and reducing cold-start overhead.
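As a toy illustration of the FFT idea (a naive DFT here, not the paper's probabilistic model), the dominant invocation period of a function can be recovered from its call history:

```python
import math

def dominant_period(series):
    """Return the period (in samples) of the strongest non-DC frequency
    component, using a naive discrete Fourier transform."""
    n = len(series)
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(series[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(series[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# A function invoked every 2nd interval has dominant period 2;
# one invoked every 4th interval has dominant period 4.
```

Once the period is known, the platform can pre-warm the function container just before the next expected invocation instead of paying the cold-start penalty.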

4 Integration With Resource Management

4.1 Active Capacity Planning

Planning cloud infrastructure capacity for applications is a critical issue that may lead to significant service improvements, cost savings and environmental sustainability.
A capacity planning system for cloud data centers has been developed, which introduces the concept of scenario combination, allowing the exploration of heterogeneous possible topologies and resources, as well as horizontal and vertical resource scaling.

4.2 Active application deployment

Application deployment enables the placement of application instances on the cloud infrastructure, which typically determines the initial location of the application. Proactive deployment helps to prevent co-located applications from entering undesirable states, such as resource contention and wastage
A containerized task deployment algorithm is proposed to optimize server resource utilization by deploying co-located containers. It uses K-means algorithm to classify tasks using historical traces with reference to workload dimensions such as CPU usage, memory consumption, disk storage, and network bandwidth.

4.3 Proactive request scheduling

Effective traffic sensing can guide proactive scheduling of user requests
A method for flexible task scheduling using data clustering is proposed. The number of tasks in each class cluster is predicted using ARIMA to provide a reference for resource allocation. Then, the proposed energy-efficient resource allocation method dynamically provides resources for the tasks in each cluster

4.4 Proactive resource allocation

Resource allocation is essentially short-term capacity planning, which is the process of configuring and allocating resources from installed capacity. Allocating appropriate resources to applications at runtime is a key issue because cloud computing is an on-demand and pay-as-you-go model.
A memory allocation optimization model, SLAM, is designed for serverless workflow applications. It uses distributed tracing to identify relationships between functions and estimate workflow execution times for different memory configurations.

4.5 Proactive elastic scaling

Elasticity is a key feature provided by the cloud computing model for applications [105], where variants of horizontal and vertical scaling and their combinations are common implementation operations. Active elastic scaling allows resources to be added or removed in advance, reducing the impact of resource scaling time and ensuring high quality service and cost efficiency.
A Kubernetes-based scaling system is proposed, which includes BiLSTM-based workload prediction algorithms with an attention mechanism and reinforcement learning methods for passive and active scaling
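A proactive horizontal scaler of this kind typically feeds the predicted metric into a replica-count rule of the shape the Kubernetes HPA applies to current metrics. The sketch below is a generic illustration under that assumption, not the cited system's algorithm.

```python
import math

def desired_replicas(current_replicas, predicted_load_per_replica,
                     target_load, min_replicas=1, max_replicas=100):
    """Proactive variant of the HPA rule: size the deployment for the
    predicted load, desired = ceil(current * predicted / target),
    clamped to configured bounds."""
    desired = math.ceil(current_replicas * predicted_load_per_replica
                        / target_load)
    return max(min_replicas, min(max_replicas, desired))

# e.g. 4 replicas with predicted 90% CPU against a 60% target
# should scale out to 6 replicas before the load actually arrives.
```

Acting on the prediction rather than the observed metric is what removes the scaling lag; the cost is that prediction errors now translate directly into over- or under-provisioning, which is why the cited system pairs the predictor with reinforcement learning for passive correction.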

4.6 Active dynamic migration

Application migration is essentially dynamic application deployment that dynamically adjusts the mapping between application instances and infrastructure. Therefore, there is a need to realize the transformation of applications from source to target locations.
A workload prediction based VM migration strategy is presented to improve energy efficiency. Neural networks are utilized for workload prediction and VM migration is executed based on the proposed Harris Hawks Spider Monkey Optimization where the decision making process considers power, workload and resource parameters.

5 Future Directions

Large Scale Workload Prediction, Workload Prediction with Serverless Instances, Multi-Topology Guided Workload Prediction, Large Scale Models Applied to Workload Prediction, Model Interpretability, Model Unreliability