k8s cluster deployment with multiple Master nodes
I. Preparatory work
1. Prepare five hosts (three Master nodes, one Node node, and one regular-user host) as follows:
Role | IP | RAM | CPU cores | Disk |
---|---|---|---|---|
Master01 | 192.168.116.141 | 4G | 4 | 55G |
Master02 | 192.168.116.142 | 4G | 4 | 55G |
Master03 | 192.168.116.143 | 4G | 4 | 55G |
Node | 192.168.116.144 | 4G | 4 | 55G |
regular user | 192.168.116.150 | 4G | 4 | 55G |
2. Disable SELinux, because SELinux prevents some K8S components from working properly:
sed -i '1,$s/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# reboot
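To confirm the change took effect after the reboot, a quick check (a small sketch using standard RHEL-family commands) is:
getenforce # should print Disabled after the reboot
grep '^SELINUX=' /etc/selinux/config # should show SELINUX=disabled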
3. Configure hostnames on the four hosts (excluding the regular-user host) as follows:
Control node Master01:
hostnamectl set-hostname master01 && bash
Control node Master02:
hostnamectl set-hostname master02 && bash
Control node Master03:
hostnamectl set-hostname master03 && bash
Work node Node:
hostnamectl set-hostname node && bash
4. Configure the hosts file on the four hosts (excluding the regular-user host):
- Edit the hosts file:
vim /etc/hosts
- Modify the contents of the file to add the four hosts and their IPs:
127.0.0.1 localhost localhost4 localhost4.localdomain4
::1 localhost localhost6 localhost6.localdomain6
192.168.116.141 master01
192.168.116.142 master02
192.168.116.143 master03
192.168.116.144 node
- After the modification, use the ping command on all four hosts to check that they can reach each other:
ping -c1 -W1 master01
ping -c1 -W1 master02
ping -c1 -W1 master03
ping -c1 -W1 node
5. On the four hosts (excluding the regular-user host), install the required component packages and related dependencies:
yum install -y yum-utils device-mapper-persistent-data lvm2 wget net-tools nfs-utils lrzsz gcc gcc-c++ make cmake libxml2-devel openssl-devel curl curl-devel unzip autoconf automake zlib-devel epel-release openssh-server libaio-devel vim ncurses-devel socat conntrack telnet ipvsadm
The required packages are explained below:
yum-utils: Provides auxiliary tools for the yum package manager, such as yum-config-manager and repoquery.
device-mapper-persistent-data: Relates to Linux's device mapping capabilities, often associated with LVM (Logical Volume Management) and container storage (e.g. Docker).
lvm2: Logical Volume Manager for managing logical volumes on disks, allowing flexible disk partition management.
wget: A non-interactive web downloader that supports HTTP, HTTPS and FTP protocols and is often used to download files.
net-tools: Provides classic networking tools such as ifconfig and netstat for viewing and managing network configurations.
nfs-utils: A toolkit that supports NFS (Network File System) and allows clients to mount remote file systems.
lrzsz: lrz and lsz are command-line tools for the X/ZMODEM file transfer protocols on Linux, commonly used to transfer data over a serial port.
gcc: A GNU C compiler for compiling C programs.
gcc-c++: A GNU C++ compiler for compiling C++ language programs.
make: Used to build and compile programs, usually together with a Makefile to control the compilation and packaging process.
cmake: A cross-platform build system generation tool for managing the compilation process of a project, especially for large and complex projects.
libxml2-devel: Development headers for the libxml2 library; libxml2 is a C library for parsing XML files.
openssl-devel: Header files and development libraries for development of the OpenSSL library, the library for SSL/TLS encryption.
curl: A command line tool for transferring data, supporting multiple protocols (HTTP, FTP, etc.).
curl-devel: Development libraries and header files for curl, supporting the use of curl functionality in code.
unzip: Used to decompress .zip files.
autoconf: A tool for automatically generating configuration scripts, often used to generate a package's configure script.
automake: Automatically generates Makefile templates; used together with autoconf in build systems.
zlib-devel: Development headers for the zlib library; zlib is a data-compression library.
epel-release: Used to enable the EPEL (Extra Packages for Enterprise Linux) repository, which provides a large number of additional packages.
openssh-server: OpenSSH server for remote login and management of the system via SSH.
libaio-devel: Development header file for an asynchronous I/O library that provides asynchronous file I/O support, commonly used in database and high performance applications.
vim: A powerful text editor with multiple language support and extended functionality.
ncurses-devel: Development headers for the ncurses library, which provides tools for building terminal controls and text user interfaces.
socat: A versatile network tool for bi-directional data transfer that supports multiple protocols and address types.
conntrack: Connection tracking tool that displays and manipulates the connection tracking table in the kernel, commonly used for network firewall and NAT configuration.
telnet: A simple network protocol for remote login that allows communication with a remote host via the command line.
ipvsadm: Used to manage IPVS (IP Virtual Server), a load balancing module in the Linux kernel commonly used for high availability load balancing clusters.
6. Configure password-free login between hosts
Four nodes execute simultaneously:
1) Generate an SSH key pair so that the three Master hosts and the Node host can log in to each other without a password:
ssh-keygen # press Enter at every prompt without typing anything
2) Copy the public key just generated to the other Master and Node hosts; enter yes when prompted, then enter the password of the corresponding host:
ssh-copy-id master01
ssh-copy-id master02
ssh-copy-id master03
ssh-copy-id node
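As a quick sanity check of the password-free login, a loop like the following can be run from each host (a sketch; the hostnames are the ones from the table above):
for h in master01 master02 master03 node; do
ssh -o BatchMode=yes $h hostname # should print each hostname without asking for a password
done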
7. Turn off the firewall on all hosts.
If you do not want to turn off the firewall, you can instead add firewall-cmd rules for the required ports (listed below); this is not demonstrated in detail here.
Turn off the firewall:
systemctl stop firewalld && systemctl disable firewalld
systemctl status firewalld # Query firewall status; when stopped, it should read Active: inactive (dead)
Adding Firewall Rules:
6443: Kubernetes API Server
2379, 2380: etcd database
10250, 10255: kubelet service
10257: kube-controller-manager service
10259: kube-scheduler service
30000-32767: NodePort ports mapped on the physical machine
179, 473, 4789, 9099: Calico services
9090, 3000: Prometheus monitoring + Grafana panel
8443: Kubernetes Dashboard control panel
# Kubernetes API Server
firewall-cmd --zone=public --add-port=6443/tcp --permanent
# etcd database
firewall-cmd --zone=public --add-port=2379-2380/tcp --permanent
# Kubelet service
firewall-cmd --zone=public --add-port=10250/tcp --permanent
firewall-cmd --zone=public --add-port=10255/tcp --permanent
# Kube-Controller-Manager service
firewall-cmd --zone=public --add-port=10257/tcp --permanent
# Kube-Scheduler service
firewall-cmd --zone=public --add-port=10259/tcp --permanent
# NodePort mapping port
firewall-cmd --zone=public --add-port=30000-32767/tcp --permanent
# Calico service
firewall-cmd --zone=public --add-port=179/tcp --permanent # BGP
firewall-cmd --zone=public --add-port=473/tcp --permanent # IP-in-IP
firewall-cmd --zone=public --add-port=4789/udp --permanent # VXLAN
firewall-cmd --zone=public --add-port=9099/tcp --permanent # Calico service
# Prometheus monitoring + Grafana panel
firewall-cmd --zone=public --add-port=9090/tcp --permanent
firewall-cmd --zone=public --add-port=3000/tcp --permanent
# Kubernetes Dashboard control panel
firewall-cmd --zone=public --add-port=8443/tcp --permanent
# Reload the firewall configuration to apply the changes
firewall-cmd --reload
8. Turn off the swap partition on the four hosts
Swap partitions are much slower to read and write than physical memory. If Kubernetes workloads rely on swap to compensate for a lack of memory, performance can degrade significantly, especially for resource-intensive container applications. Kubernetes prefers to expose nodes to out-of-memory situations directly rather than relying on swap, which prompts the scheduler to reallocate resources.
By default, the kubelet checks the swap status at startup and requires it to be turned off. If swap is not turned off, Kubernetes may fail to start and report an error such as:
[!WARNING]
kubelet: Swap is enabled; production deployments should disable swap.
In order for Kubernetes to work properly, it is recommended to permanently disable swap on all nodes, as well as adjust the system's memory management:
swapoff -a # Shut down the current swap
sed -i '/swap/s/^/#/' /etc/fstab # add comment before swap
grep swap /etc/fstab # A successful shutdown would look like this: #/dev/mapper/rl-swap none swap defaults 0 0
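To confirm that swap is really off, a minimal check could be:
free -h | grep -i swap # the Swap line should show 0B used and 0B total
swapon --show # should print nothing when no swap device is active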
9. Modify kernel parameters
Each of the four hosts (excluding the regular user) executes.
modprobe br_netfilter
- modprobe: command for loading or unloading kernel modules.
- br_netfilter: this module allows bridged network traffic to be filtered by iptables rules and is typically used when network bridging is enabled.
- The module is primarily used in Kubernetes container networking environments to ensure that the Linux kernel properly handles the filtering and forwarding of network traffic, especially for inter-container communication.
Each of the four hosts executes:
cat > /etc// <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl -p /etc// # put the configuration into effect
- net.bridge.bridge-nf-call-ip6tables = 1: allow IPv6 traffic crossing a Linux bridge to be filtered by ip6tables.
- net.bridge.bridge-nf-call-iptables = 1: allow IPv4 traffic crossing a Linux bridge to be filtered by iptables.
- net.ipv4.ip_forward = 1: allow the Linux kernel to forward (route) IPv4 packets.
These settings ensure that in Kubernetes, bridged network traffic can be filtered by iptables and ip6tables, and that IPv4 packet forwarding is enabled, improving network security and communication.
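For reference, a commonly used persistent layout for these settings is sketched below; the file names k8s.conf under /etc/modules-load.d/ and /etc/sysctl.d/ are assumptions, not something this guide mandates:
# assumed file name; loads br_netfilter automatically on every boot
cat > /etc/modules-load.d/k8s.conf <<EOF
br_netfilter
EOF
# assumed file name holding the three sysctl settings shown above
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system # reload every file under /etc/sysctl.d/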
10. Configure the yum sources for installing Docker and Containerd
Each of the four hosts installs a docker-ce repository (pick any one of the sources below, only one is needed); the subsequent steps demonstrate only the Ali source.
# Ali source
yum-config-manager --add-repo /docker-ce/linux/centos/
# Tsinghua University open source software mirrors
yum-config-manager --add-repo /docker-ce/linux/centos/
# University of Science and Technology of China open source mirrors
yum-config-manager --add-repo /docker-ce/linux/centos/ # USTC open source mirrors
# CSU mirror repositories
yum-config-manager --add-repo /docker-ce/linux/centos/
# Huawei cloud sources
yum-config-manager --add-repo /docker-ce/linux/centos/ # Huawei cloud source
11. Configure the yum sources needed for the K8S command line tools
cat > /etc// <<EOF
[kubernetes]
name=Kubernetes
baseurl=/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=/kubernetes/yum/doc/
/kubernetes/yum/doc/
EOF
yum makecache
12. Four hosts for time synchronization
Both Chrony and NTPD are tools used for time synchronization, but Chrony has its own unique advantages in many ways. Below are some of the main advantages of Chrony over NTPD, and based on these, a deployment of chrony time synchronization:
Advantage | Chrony | NTPD |
---|---|---|
Fast synchronization | Chrony can synchronize time faster when network latency is high or the connection is unstable. | Usually takes longer to achieve time synchronization. |
Adaptability | Performs well on mobile devices or in virtual environments and adapts quickly to network changes. | Performs poorly in these environments. |
Clock drift correction | Handles system clock drift better, through frequency adjustment. | Weak handling of system clock drift. |
Simple configuration | Configuration is relatively simple and intuitive, easy to understand and use. | Has more configuration options and may take more time to become familiar with. |
1) Install Chrony on the four hosts
yum -y install chrony
2) On the four hosts, modify the configuration file to add domestic NTP servers
echo "server iburst" >> /etc/
echo "server iburst" >> /etc/
echo "server iburst" >> /etc/
echo "server iburst" >> /etc/
tail -n 4 /etc/
systemctl restart chronyd
3) You can set up a timed task to restart the chrony service every minute for time calibration (not required)
echo "* * * * * /usr/bin/systemctl restart chronyd" | tee -a /var/spool/cron/root
It is recommended to do this manually: first execute the crontab -e command, then add the following line to the scheduled tasks:
* * * * * /usr/bin/systemctl restart chronyd
- The five asterisks indicate time scheduling, with each asterisk representing a time field, from left to right:
- First asterisk: minutes (0-59)
- Second asterisk: hours (0-23)
- Third asterisk: Date (1-31)
- Fourth asterisk: month (1-12)
- Fifth asterisk: day of the week (0-7, with 0 and 7 both representing Sundays)
- Here, every field is *, which means "every", so * * * * * means "every minute".
- /usr/bin/systemctl is the full path to the systemctl command, used to manage system services.
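After chronyd restarts, synchronization can be verified with chrony's own client (standard chronyc subcommands):
chronyc sources -v # lists the configured NTP servers and their reachability
chronyc tracking # shows the current offset and whether the clock is synchronized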
13. Install Containerd
Containerd is a high-performance container runtime that takes care of container lifecycle management in Kubernetes, including creating, running, stopping, and deleting containers, as well as pulling and managing images from image repositories. Containerd provides a Container Runtime Interface (CRI) that integrates seamlessly with Kubernetes, ensuring efficient resource utilization and fast container startup times. Containerd also supports event monitoring and logging, making operations and debugging easier, and is a key component for container orchestration and management.
Install containerd version 1.6.22 on the four hosts:
yum -y install -1.6.22
yum -y install -1.6.22 --allowerasing # Choose this if you have problems installing, use the first one by default.
Create containerd's configuration file directory and modify the default configuration.
mkdir -pv /etc/containerd
vim /etc/containerd/
The modifications are as follows:
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2
[cgroup]
path = ""
[debug]
address = ""
format = ""
gid = 0
level = ""
uid = 0
[grpc]
address = "/run/containerd/"
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
tcp_address = ""
tcp_tls_ca = ""
tcp_tls_cert = ""
tcp_tls_key = ""
uid = 0
[metrics]
address = ""
grpc_histogram = false
[plugins]
[plugins."."]
deletion_threshold = 0
mutation_threshold = 100
pause_threshold = 0.02
schedule_delay = "0s"
startup_delay = "100ms"
[plugins."."]
device_ownership_from_security_context = false
disable_apparmor = false
disable_cgroup = false
disable_hugetlb_controller = true
disable_proc_mount = false
disable_tcp_service = true
enable_selinux = false
enable_tls_streaming = false
enable_unprivileged_icmp = false
enable_unprivileged_ports = false
ignore_image_defined_volumes = false
max_concurrent_downloads = 3
max_container_log_line_size = 16384
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
sandbox_image = "/google_containers/pause:3.9"
selinux_category_range = 1024
stats_collect_period = 10
stream_idle_timeout = "4h0m0s"
stream_server_address = "127.0.0.1"
stream_server_port = "0"
systemd_cgroup = false
tolerate_missing_hugetlb_controller = true
unset_seccomp_profile = ""
[plugins.".".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/"
conf_template = ""
ip_pref = ""
max_conf_num = 1
[plugins.".".containerd]
default_runtime_name = "runc"
disable_snapshot_annotations = true
discard_unpacked_layers = false
ignore_rdt_not_enabled_errors = false
no_pivot = false
snapshotter = "overlayfs"
[plugins.".".containerd.default_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
[plugins.".".containerd.default_runtime.options]
[plugins.".".]
[plugins.".".]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ".v2"
[plugins.".".]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = true
[plugins.".".containerd.untrusted_workload_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
[plugins.".".containerd.untrusted_workload_runtime.options]
[plugins.".".image_decryption]
key_model = "node"
[plugins.".".registry]
config_path = ""
[plugins.".".]
[plugins.".".]
[plugins.".".]
[plugins.".".]
[plugins.".".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""
[plugins."."]
path = "/opt/containerd"
[plugins."."]
interval = "10s"
[plugins."."]
sampling_ratio = 1.0
service_name = "containerd"
[plugins."."]
content_sharing_policy = "shared"
[plugins."."]
no_prometheus = false
[plugins."."]
no_shim = false
runtime = "runc"
runtime_root = ""
shim = "containerd-shim"
shim_debug = false
[plugins."."]
platforms = ["linux/amd64"]
sched_core = false
[plugins.".-service"]
default = ["walking"]
[plugins.".-service"]
rdt_config_file = ""
[plugins."."]
root_path = ""
[plugins."."]
root_path = ""
[plugins."."]
async_remove = false
base_image_size = ""
discard_blocks = false
fs_options = ""
fs_type = ""
pool_name = ""
root_path = ""
[plugins."."]
root_path = ""
[plugins."."]
root_path = ""
upperdir_label = false
[plugins."."]
root_path = ""
[plugins."."]
endpoint = ""
insecure = false
protocol = ""
[proxy_plugins]
[stream_processors]
[stream_processors."."]
accepts = ["application/.+encrypted"]
args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/."
[stream_processors."."]
accepts = ["application/.+gzip+encrypted"]
args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/.+gzip"
[timeouts]
"" = "0s"
"" = "5s"
"" = "5s"
"" = "3s"
"" = "2s"
[ttrpc]
address = ""
gid = 0
uid = 0
Sandbox image source: sets the sandbox (pause) container image used by Kubernetes to support efficient container management.
- sandbox_image = "/google_containers/pause:3.9"
hugeTLB Controller: Disables the hugeTLB controller, reducing memory management complexity for environments that don't need it.
- disable_hugetlb_controller = true
CNI plugin path: specifies the binary and configuration paths for the CNI network plugin to ensure proper network functionality.
- bin_dir = "/opt/cni/bin"
- conf_dir = "/etc/cni/"
Garbage Collection Scheduler: Adjust garbage collection thresholds and startup latency to optimize container resource management and performance.
- pause_threshold = 0.02
- startup_delay = "100ms"
Stream server: configures the address and port of the streaming service for efficient data transmission with clients.
- stream_server_address = "127.0.0.1"
- stream_server_port = "0"
Start containerd and enable it at boot:
systemctl enable containerd --now
systemctl status containerd
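If you would rather not paste the whole file by hand, an alternative sketch is to generate containerd's built-in default configuration and then adjust the setting that matters most for kubeadm (the path /etc/containerd/config.toml is containerd's default config location; the sandbox_image value still needs to be pointed at your chosen registry as described above):
containerd config default > /etc/containerd/config.toml # write the built-in defaults
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml # use the systemd cgroup driver
systemctl restart containerd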
14. Install Docker-ce (to use docker's image pull feature)
1) Install the latest version of docker-ce on each of the four hosts:
yum -y install docker-ce
2) Start and set up docker to boot itself:
systemctl start docker && systemctl enable docker
3) Configure docker's registry mirror (accelerator) address:
Note: For the Ali accelerator address, log in to the AliCloud Accelerator console to check it; everyone's accelerator address is different.
tee /etc/docker/ <<-'EOF'
{
"registry-mirrors": [
"",
"",
"",
"https://dockerhub.",
"."
]
}
EOF
systemctl daemon-reload
systemctl restart docker
systemctl status docker
II. K8S installation and deployment
1.Install K8S related core components
Each of the four hosts installs K8S-related core components:
yum -y install kubelet-1.28.2 kubeadm-1.28.2 kubectl-1.28.2
systemctl enable kubelet
- kubelet is the core agent on each node in a Kubernetes cluster. It manages and maintains the lifecycle of Pods and containers on the node according to instructions from the control plane, ensures that containers run as specified, and communicates with the control plane regularly. The kubelet reports the state of nodes and Pods to the control node's apiserver, which stores this information in the etcd database.
- kubeadm is a tool for simplifying the installation and management of Kubernetes clusters. It quickly initializes control plane nodes and adds worker nodes to the cluster, reducing the complexity of manual configuration.
- kubectl is the Kubernetes command line tool used by administrators to interact with the cluster and perform tasks such as deploying applications, viewing resources, troubleshooting issues, and managing cluster state. It communicates directly with the Kubernetes API from the command line.
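Before continuing, the installed versions can be confirmed quickly (standard subcommands of the tools just installed):
kubeadm version -o short # expect v1.28.2
kubectl version --client # client version should be v1.28.2
systemctl is-enabled kubelet # should print enabled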
2. Realize high availability of kubernetes apiServer nodes through keepalived+nginx
1) Install keepalived+nginx on each of the three Master nodes to achieve load balancing and reverse proxy for apiserver. master01 serves as the master node for keepalived, and master02 and master03 serve as standby nodes for keepalived.
yum -y install epel-release nginx keepalived nginx-mod-stream
2) Modify the configuration file:
vim /etc/nginx/
3) The complete contents of the configuration file are as follows:
user nginx;
worker_processes auto;
error_log /var/log/nginx/;
pid /run/;
include /usr/share/nginx/modules/*.conf;
events {
worker_connections 1024;
}
# additional stream configure
stream {
# Log format
log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
# Log storage path
access_log /var/log/nginx/ main;
# master scheduling resource pool
upstream k8s-apiserver {
server 192.168.116.141:6443 weight=5 max_fails=3 fail_timeout=30s;
server 192.168.116.142:6443 weight=5 max_fails=3 fail_timeout=30s;
server 192.168.116.143:6443 weight=5 max_fails=3 fail_timeout=30s;
}
server {
listen 16443; # avoid conflicting with the Kubernetes master node's default port
proxy_pass k8s-apiserver; # Doing a reverse proxy to a resource pool
}
}
http {
include /etc/nginx/;
default_type application/octet-stream;
# Log format
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/ main;
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
#gzip on;
include /etc/nginx//*.conf;
}
- The stream module is used for Layer 4 load balancing, forwarding traffic to multiple k8s-apiserver nodes.
  - Layer 4 load balancing works at the transport layer: it only looks at TCP/UDP connection information such as source IP, source port, destination IP and destination port, and forwards traffic based on this information without inspecting the transmitted data.
  - Layer 7 load balancing works at the application layer: it understands higher-level protocols (e.g. HTTP, HTTPS) and performs more complex traffic distribution based on request details (e.g. URL, Cookies, Headers).
- log_format and access_log are used for logging.
- upstream defines multiple k8s-apiserver servers; the load balancing policy is weight-based (weight) and includes a failure detection mechanism.
- In the server block, listen uses port 16443 to avoid conflicting with the Kubernetes master node's default port 6443.
4) Restart nginx and set it to boot up:
systemctl restart nginx && systemctl enable nginx
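Before moving on to keepalived, the stream proxy can be sanity-checked locally (a sketch; 16443 is the listen port configured above):
nginx -t # validate the configuration syntax
ss -lntp | grep 16443 # confirm nginx is listening on the load-balancer port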
5) Write the keepalived check script first:
Note: Place this script on each of the three Master nodes. It is recommended to place it in the /etc/keepalived/ directory for subsequent optimization updates.
#!/bin/bash
# Author :lyx
# Description :check nginx
# Date :2024.10.06
# Version :2.3
# Define the log file path
LOG_FILE="/var/log/nginx_keepalived_check.log" # Define the path to the log file.
MAX_LINES=1000 # Set the log to keep 1000 lines (because without limiting the log, it will grow indefinitely and take up a lot of disk space)
# Function that logs the log with detailed time formatting and retains the last 1000 lines of logging
log_message() {
local time_stamp=$(date '+%Y-%m-%d %H:%M:%S') # Define time formatting
echo "$time_stamp - $1" >> $LOG_FILE
# Intercept the log file, keeping only the last 1000 lines.
tail -n $MAX_LINES $LOG_FILE > ${LOG_FILE}.tmp && mv ${LOG_FILE}.tmp $LOG_FILE
}
# Functions to check if Nginx is running
check_nginx() {
pgrep -f "nginx: master" > /dev/null 2>&1
echo $?
}
# 1. Check if Nginx is alive.
log_message "Checking Nginx status..."
if [ $(check_nginx) -ne 0 ]; then
log_message "Nginx is not running, trying to start Nginx..."
# 2. If Nginx is not running, then try starting it
systemctl start nginx
sleep 2 # Wait for Nginx to start
# 3. Check the Nginx status again
log_message "Checking status again after starting Nginx..."
if [ $(check_nginx) -ne 0 ]; then
log_message "Nginx failed to start, stopping the Keepalived service..."
# 4. If Nginx fails to start, stop Keepalived
systemctl stop keepalived
else
log_message "Nginx started successfully."
fi
else
log_message "Nginx is running normally."
fi
- If there is a problem with nginx, the script writes logs to /var/log/nginx_keepalived_check.log; you can check the logged messages manually and, based on the time of the failure, combine them with the nginx logs for repair or optimization.
Grant the script executable permissions on each node:
chmod +x /etc/keepalived/keepalived_nginx_check.sh
The main purpose of this script is to monitor the running status of the Nginx service and attempt to restart Nginx when it detects that Nginx has stopped. If the restart fails, the script stops the Keepalived service to avoid continuing to advertise an unavailable service.
- Monitor Nginx status: the script checks that Nginx is running correctly, using the pgrep command to detect the master process.
- Auto-repair mechanism: if Nginx is not running, try restarting the service and check its status again; if the restart succeeds, log it.
- Stop Keepalived: if Nginx fails to start, stop the Keepalived service to prevent the server from continuing to run as a failed node.
6) Modify the configuration file for the keepalived master node Master01:
global_defs {
notification_email {
failover@
sysadmin@
}
notification_email_from @
smtp_server 127.0.0.1 # Server used to send, receive and relay emails
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_script keepalived_nginx_check { # Here is the script that was added in the previous step, call it here
script "/etc/keepalived/keepalived_nginx_check.sh" # Modify the path of the script according to your own additions, it is recommended to put it in this directory for easy management.
}
vrrp_instance VI_1 {
state MASTER # Modify state to MASTER for primary and BACKUP for backup.
interface ens160 # Change your actual NIC name.
virtual_router_id 51 # The virtual route ID of the master and backup should be the same.
priority 100 # Priority, set the priority of the backup server to be lower than that of the master server.
advert_int 1 # Broadcast packet delivery interval is 1 sec.
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.116.16/24 # Change the virtual IP to an unoccupied IP address, just make the virtual IP the same for both the primary and the backup
}
track_script {
keepalived_nginx_check # vrrp_script The name of the script to be called for tracking, so that Keepalived can take appropriate action based on the results returned by the script.
}
}
7) Modify the configuration file for the keepalived backup node Master02:
global_defs {
notification_email {
failover@
sysadmin@
}
notification_email_from @
smtp_server 127.0.0.1 # Server used to send, receive and relay emails
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_script keepalived_nginx_check { # Here is the script that was added in the previous step, call it here
script "/etc/keepalived/keepalived_nginx_check.sh" # Modify the path of the script according to your own additions, it is recommended to put it in this directory for easy management.
}
vrrp_instance VI_1 {
state BACKUP # Modify state to MASTER for primary and BACKUP for backup.
interface ens160 # Change your actual NIC name.
virtual_router_id 51 # The virtual route IDs of the master and backup should be the same.
priority 90 # Priority, set the priority of the backup server to be lower than that of the master server.
advert_int 1 # Broadcast packet delivery interval is 1 second.
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.116.16/24 # Change the virtual IP to an unoccupied IP address, just make the virtual IP the same for both the primary and the backup
}
track_script {
keepalived_nginx_check # vrrp_script The name of the script to be called for tracking, so that Keepalived can act accordingly based on the results returned by the script.
}
}
8) Modify the configuration file for the keepalived backup node Master03:
global_defs {
notification_email {
failover@
sysadmin@
}
notification_email_from @
smtp_server 127.0.0.1 # Server used to send, receive and relay emails
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_script keepalived_nginx_check { # Here is the script that was added in the previous step, call it here
script "/etc/keepalived/keepalived_nginx_check.sh" # Modify the path of the script according to your own additions, it is recommended to put it in this directory for easy management.
}
vrrp_instance VI_1 {
state BACKUP # Modify state to MASTER for primary and BACKUP for backup.
interface ens160 # Change your actual NIC name.
virtual_router_id 51 # The virtual route IDs of the master and backup should be the same.
priority 80 # Priority, set the priority of the backup server to be lower than that of the master server.
advert_int 1 # Broadcast packet delivery interval is 1 second.
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.116.16/24 # Change the virtual IP to an unoccupied IP address, just make the virtual IP the same for both the primary and the backup
}
track_script {
keepalived_nginx_check # vrrp_script The name of the script to be called for tracking, so that Keepalived can act accordingly based on the results returned by the script.
}
}
- The main modifications to the three nodes are these three parts:
- state MASTER / BACKUP
- interface ens160
- priority 100 / 90 / 80
9) Reload the configuration file and restart the nginx+keepalived services:
systemctl daemon-reload && systemctl restart nginx
systemctl restart keepalived && systemctl enable keepalived
10) Check whether the virtual IP is bound successfully:
ip address show | grep 192.168.116.16 # Check the virtual IP against your own settings
The following message will be displayed if the binding is successful:
[!IMPORTANT]
inet 192.168.116.16/24 scope global secondary ens160
11) Check whether the keepalived VIP drift (failover) works, i.e. that the keepalived_nginx_check.sh script takes effect
(1) Shut down the keepalived service on the keepalived master node Master01:
systemctl stop keepalived
(2) Switch to the keepalived standby node Master02 and check the NIC information:
ip address show | grep 192.168.116.16 # Check the virtual IP against your own settings
If the following prompt appears, the drift is successful:
[!IMPORTANT]
inet 192.168.116.16/24 scope global secondary ens160
Note: If the master keepalived is not abnormal, the 192.168.116.16 virtual IP cannot be viewed on the two standby nodes.
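To finish the failover test, keepalived can be started again on Master01; with the highest priority (100) and keepalived's default preemptive behaviour, it should take the virtual IP back (a sketch of the expected result):
systemctl start keepalived # run on Master01
sleep 3
ip address show | grep 192.168.116.16 # the virtual IP should reappear on Master01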
3. Initialize the K8S cluster
1) On the Master01 node, initialize the K8S cluster using kubeadm:
Note: when kubeadm installs K8S, the components of the control and worker nodes run as Pods.
kubeadm config print init-defaults >
- Generates a default configuration file and redirects the output into the file.
2) Modify the file you just generated with kubeadm:
sed -i '/localAPIEndpoint/s/^/#/'
sed -i '/advertiseAddress/s/^/#/'
sed -i '/bindPort/s/^/#/'
sed -i '/name: node/s/^/#/'
sed -i "s|criSocket:.*|criSocket: unix://$(find / -name | head -n 1)|"
sed -i 's|imageRepository: registry.|imageRepository: /google_containers|' # The original configuration is a foreign k8s source, in order to speed up the download of the image, you need to change it to a domestic source
sed -i '/serviceSubnet/a\ podSubnet: 10.244.0.0/12' # /a\ means one line below the serviceSubnet line.
cat <<EOF >>
---
apiVersion: . /v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: . /v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
more # Check it manually
- advertiseAddress is the advertised address of the Kubernetes control node, through which other nodes communicate with the control plane. It is usually the IP address of the server on which the control node resides; make sure other nodes on the network can reach the control plane node through the correct control node IP address (my Master IP is 192.168.116.141).
- criSocket specifies the address of the container runtime (CRI) socket used by Kubernetes, through which K8S communicates with container runtimes such as containerd to manage and start containers. Finding the socket path with the find command and substituting it into the configuration file ensures that the path is accurate and avoids manual search and configuration errors.
- IPVS mode supports more load balancing algorithms and offers better performance, especially when the cluster has many nodes and services, significantly improving network forwarding efficiency and stability (if you do not specify mode as ipvs, iptables is selected by default, and iptables performance is relatively poor).
- Uniformly using systemd as the cgroup driver for both containers and system services, instead of cgroupfs, improves compatibility and stability between Kubernetes and the host system. Note: the host IP, Pod IP, and Service IP ranges cannot be on the same network segment; overlap leads to IP conflicts, routing confusion, and network isolation failures, affecting normal communication and network security in Kubernetes.
3) Based on the configuration file, pull the images required for Kubernetes 1.28.0 on the Master01 node (you can choose either of the two methods):
(1) Use the kubeadm command to quickly pull images of all the core Kubernetes components and ensure the versions are consistent.
kubeadm config images pull --image-repository="/google_containers" --kubernetes-version=v1.28.0
(2) Use the ctr command when finer-grained control is needed, or when kubeadm has problems pulling images; in that case you can pull the images manually with ctr.
ctr -n= images pull /google_containers/kube-apiserver:v1.28.0
ctr -n= images pull /google_containers/kube-controller-manager:v1.28.0
ctr -n= images pull /google_containers/kube-scheduler:v1.28.0
ctr -n= images pull /google_containers/kube-proxy:v1.28.0
ctr -n= images pull /google_containers/pause:3.9
ctr -n= images pull /google_containers/etcd:3.5.9-0
ctr -n= images pull /google_containers/coredns:v1.10.1
4) At the Master control node, initialize the Kubernetes master node
kubeadm init --config= --ignore-preflight-errors=SystemVerification
On some operating systems the kubelet may fail to start, with a prompt like the one below; if the initialization succeeds, ignore the following steps:
[!WARNING]
dial tcp [::1]:10248: connect: connection refused
Execute systemctl status kubelet
The following error message was found:
[!WARNING]
Process: 2226953 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 2226953 (code=exited, status=1/FAILURE)
The solution is as follows; execute on the control node:
sed -i 's|ExecStart=/usr/bin/kubelet|ExecStart=/usr/bin/kubelet --container-runtime-endpoint=unix://$(find / -name | head -n 1) --kubeconfig=/etc/kubernetes/ --config=/var/lib/kubelet/|' /usr/lib/systemd/system/
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset # remove the failed K8S installation
kubeadm init --config= --ignore-preflight-errors=SystemVerification # reinstallation
5) Set up the Kubernetes configuration file so that the current user can use the kubectl command to interact with the Kubernetes cluster
The control node Master01 executes:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/ $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
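At this point kubectl should already be able to talk to the cluster from Master01 (the node usually shows NotReady until the network plugin is installed later):
kubectl get nodes # master01 is listed, typically NotReady before Calico is deployed
kubectl get pod -n kube-system # control plane Pods should be Running or Pending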
4. Expand the k8s control plane and add Master02 and Master03 to the k8s cluster
1) Master02 and Master03 create a certificate storage directory:
mkdir -pv /etc/kubernetes/pki/etcd && mkdir -pv ~/.kube/
2) Remotely copy the certificate file to Master02:
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ [email protected]:/etc/kubernetes/pki/etcd/
scp /etc/kubernetes/pki/etcd/ [email protected]:/etc/kubernetes/pki/etcd/
3) Remotely copy the certificate file to Master03:
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ [email protected]:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ [email protected]:/etc/kubernetes/pki/etcd/
scp /etc/kubernetes/pki/etcd/ [email protected]:/etc/kubernetes/pki/etcd/
4) Control node Master01 generates cluster token:
kubeadm token create --print-join-command
The generated token command is as follows:
[!IMPORTANT]
kubeadm join 192.168.116.141:6443 --token pb1pk7.6p6w2jl1gjvlmmdz --discovery-token-ca-cert-hash sha256:b3b9de172cf6c48d97396621858a666e0be2d2d5578e4ce0fba5f1739b735fc1
- If a control node joins the cluster, append --control-plane after the generated join command, as follows:
kubeadm join 192.168.116.141:6443 --token pb1pk7.6p6w2jl1gjvlmmdz --discovery-token-ca-cert-hash sha256:b3b9de172cf6c48d97396621858a666e0be2d2d5578e4ce0fba5f1739b735fc1 --control-plane
- After a control node joins the cluster, execute the following commands so that kubectl can be used:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/ $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
- If a worker node joins the cluster, just copy the generated command directly to the node host and execute it:
kubeadm join 192.168.116.141:6443 --token pb1pk7.6p6w2jl1gjvlmmdz --discovery-token-ca-cert-hash sha256:b3b9de172cf6c48d97396621858a666e0be2d2d5578e4ce0fba5f1739b735fc1
- After a worker node joins the cluster, set up the user's kubectl environment so that it can interact with the Kubernetes cluster:
mkdir ~/.kube
cp /etc/kubernetes/ ~/.kube/config
Attention: if the following error is reported when expanding a node:
[!CAUTION]
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.unable to add a new control plane instance a cluster that doesn't have a stable controlPlaneEndpoint address
Please ensure that:
- The cluster has a stable controlPlaneEndpoint address.
- The certificates that must be shared among control plane instances are provided.
To see the stack trace of this error execute with --v=5 or higher
- The solution is to add controlPlaneEndpoint:
kubectl -n kube-system edit cm kubeadm-config
- Adding controlPlaneEndpoint: 192.168.116.141:6443 (Master01's IP address) solves it:
kind: ClusterConfiguration
kubernetesVersion: v1.18.0
controlPlaneEndpoint: 192.168.116.141:6443
- Re-copy the generated join command to the node host and execute it (control nodes need to append --control-plane). The expansion is successful if the prompt is as follows:
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/ $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
5. Install k8s network component Calico
Calico is a popular open source networking solution designed to provide efficient, scalable, and secure network connectivity for Kubernetes. It uses an IP-based network model, so each Pod gets a unique IP address, which simplifies network management. Calico supports a variety of network policies for fine-grained traffic control and security, such as label-based access control that allows users to define which Pods can communicate with each other.
1) Install calico on each of the four hosts:
ctr image pull /ddn-k8s//calico/cni:v3.25.0
ctr image pull /ddn-k8s//calico/pod2daemon-flexvol:v3.25.0
ctr image pull /ddn-k8s//calico/node:v3.25.0
ctr image pull /ddn-k8s//calico/kube-controllers:v3.25.0
ctr image pull /ddn-k8s//calico/typha:v3.25.0
2) On the control node, download the calico 3.25.0 yaml configuration file (if the download fails, copy the URL into a browser and manually copy and paste the contents onto the Master01 node; the effect is the same):
curl -O -L /projectcalico/calico/v3.25.0/manifests/
3) Edit the file: find the CLUSTER_TYPE line and add a key-value pair underneath it to ensure the correct NIC interface is used (note the indentation):
Original configuration:
- name: CLUSTER_TYPE
value: "k8s,bgp"
New Configuration:
- name: CLUSTER_TYPE
value: "k8s,bgp"
- name: IP_AUTODETECTION_METHOD
value: "interface=ens160"
Note: NIC names differ between operating systems; for example, on CentOS 7.9 the NIC name is ens33, so you would fill in value: "interface=ens33". Adjust flexibly as needed.
Note: If calico reports image pull errors (for example, the manifest still references the official registry), you can change the official source to the Huawei source as follows:
sed -i '1,$s|/calico/cni:v3.25.0|/ddn-k8s//calico/cni:v3.25.0|g'
sed -i '1,$s|/calico/node:v3.25.0|/ddn-k8s//calico/node:v3.25.0|g'
sed -i '1,$s|/calico/kube-controllers:v3.25.0|/ddn-k8s//calico/kube-controllers:v3.25.0|g'
4) Deploy the calico network service
kubectl apply -f
View details of all Pods in the kube-system namespace of the Kubernetes cluster (check both control and worker nodes):
kubectl get pod --namespace kube-system -o wide
The message that calico was successfully installed is roughly as follows:
[!IMPORTANT]
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-665548954f-5dgjw 1/1 Running 0 3m3s 10.255.112.131 master01
calico-node-4gb27 1/1 Running 0 3m3s 192.168.116.142 master02
calico-node-4kckd 1/1 Running 0 3m3s 192.168.116.144 node
calico-node-cpnwp 1/1 Running 0 3m3s 192.168.116.141 master01
calico-node-hldt2 1/1 Running 0 3m3s 192.168.116.143 master03
coredns-66f779496c-8pzvp 1/1 Running 0 61m 10.255.112.129 master01
coredns-66f779496c-frsvq 1/1 Running 0 61m 10.255.112.130 master01
etcd-master01 1/1 Running 0 61m 192.168.116.141 master01
etcd-master02 1/1 Running 0 21m 192.168.116.142 master02
etcd-master03 1/1 Running 0 20m 192.168.116.143 master03
kube-apiserver-master01 1/1 Running 0 61m 192.168.116.141 master01
kube-apiserver-master02 1/1 Running 0 22m 192.168.116.142 master02
kube-apiserver-master03 1/1 Running 1 (21m ago) 19m 192.168.116.143 master03
kube-controller-manager-master01 1/1 Running 1 (21m ago) 61m 192.168.116.141 master01
kube-controller-manager-master02 1/1 Running 0 22m 192.168.116.142 master02
kube-controller-manager-master03 1/1 Running 0 20m 192.168.116.143 master03
kube-proxy-jvt6w 1/1 Running 0 31m 192.168.116.144 node
kube-proxy-lw8g4 1/1 Running 0 61m 192.168.116.141 master01
kube-proxy-mjw8h 1/1 Running 0 22m 192.168.116.142 master02
kube-proxy-rtlpz 1/1 Running 0 21m 192.168.116.143 master03
kube-scheduler-master01 1/1 Running 1 (21m ago) 61m 192.168.116.141 master01
kube-scheduler-master02 1/1 Running 0 22m 192.168.116.142 master02
kube-scheduler-master03 1/1 Running 0 19m 192.168.116.143 master03
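Once all the calico Pods are Running, the nodes should switch to Ready; a quick check (the k8s-app=calico-node label is the one used by the upstream calico manifest):
kubectl get nodes -o wide # all four nodes should report STATUS Ready
kubectl get pod -n kube-system -l k8s-app=calico-node # expect one calico-node Pod per node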
6. Configure Etcd for high availability
In etcd's default yaml file, --initial-cluster only specifies the node itself, so it needs to be modified to list all three Master hosts:
Master01 node
sed -i 's|--initial-cluster=master01=https://192.168.116.141:2380|--initial-cluster=master01=https://192.168.116.141:2380,master02=https://192.168.116.142:2380,master03=https://192.168.116.143:2380|' /etc/kubernetes/manifests/
Master02 node
sed -i 's|--initial-cluster=master01=https://192.168.116.141:2380|--initial-cluster=master01=https://192.168.116.141:2380,master02=https://192.168.116.142:2380,master03=https://192.168.116.143:2380|' /etc/kubernetes/manifests/
Master03 node
sed -i 's|--initial-cluster=master01=https://192.168.116.141:2380|--initial-cluster=master01=https://192.168.116.141:2380,master02=https://192.168.116.142:2380,master03=https://192.168.116.143:2380|' /etc/kubernetes/manifests/
- The purpose of the modification is to provide each etcd node in the Kubernetes high-availability (HA) cluster with the initial cluster membership information. Simply put, --initial-cluster tells each etcd node about the other members of the cluster so that they can communicate with each other, maintain consistency, and provide high-availability guarantees.
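The etcd membership can then be verified from Master01 by exec-ing into one of the etcd Pods (a sketch; the certificate paths below are the standard locations kubeadm mounts into the etcd static Pod):
kubectl -n kube-system exec etcd-master01 -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list # should list master01, master02 and master03 as started members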
7. k8s apiserver certificate renewal
View the certificate expiration date for the Kubernetes API server:
openssl x509 -in /etc/kubernetes/pki/ -noout -text | grep Not
[!CAUTION]
Not Before: Oct 6 14:09:37 2024 GMT Not After : Oct 6 14:55:00 2025 GMT
The crt certificates automatically created when kubeadm deploys k8s are valid for 1 year. To keep the certificates valid for a long time, here we change the certificate validity period to 100 years by downloading the script provided by the project (run on the Master control nodes):
git clone /yuyicai/
cd update-kube-cert
chmod 755
sed -i '1,$s/CERT_DAYS=3650/CERT_DAYS=36500/g'
./ all
./ all --cri containerd
openssl x509 -in /etc/kubernetes/pki/ -noout -text | grep Not
[!CAUTION]
Not Before: Oct 6 15:43:28 2024 GMT Not After : Sep 12 15:43:28 2124 GMT
The certificate validity period has been successfully extended!
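As an extra check, kubeadm itself can report the expiry dates of all cluster certificates (a built-in kubeadm subcommand in 1.28):
kubeadm certs check-expiration # every certificate should now show an expiry date far in the future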