This article is the nineteenth in a series on writing Docker from scratch, adding support for cgroup v2.
For the full code, see: /lixd/mydocker
Stars are welcome.
We recommend reading the following articles first to get a general idea of how Docker is implemented:
- Core principles: A Deeper Understanding of Docker Core Principles: Namespace, Cgroups, and Rootfs
- Namespace-based view isolation: Exploring Linux Namespace: Behind the Magic of Docker Isolation
- Resource limitation based on cgroups:
  - A First Look at Linux Cgroups: The Wonderful World of Resource Control
  - A deep dive into the Linux Cgroups subsystem: fine-grained resource management
  - Docker and Linux Cgroups: A Magic Tour of Resource Isolation
- Filesystems based on overlayfs: Docker Magic Demystified: Exploring UnionFS and OverlayFS
- Docker networking based on veth pair, bridge, iptables, etc.: Docker Networking Revealed: Manually Implementing Docker Bridged Networks
The development environment is as follows:
root@mydocker:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
root@mydocker:~# uname -r
5.4.0-74-generic
Note: root privileges are required.
1. Overview
This article adds support for cgroup v2 and automatically detects which cgroup version the current system uses.
2. Implementation
Determining cgroup version
Use the following command to check whether your system is currently using cgroup v1 or v2:
stat -fc %T /sys/fs/cgroup/
If the output is cgroup2fs, the system is using v2, like this:
root@tezn:~# stat -fc %T /sys/fs/cgroup/
cgroup2fs
If the output is tmpfs, the system is using v1, like this:
[root@docker cgroup]# stat -fc %T /sys/fs/cgroup/
tmpfs
The Go implementation is as follows:
const (
	unifiedMountpoint = "/sys/fs/cgroup"
)

var (
	isUnifiedOnce sync.Once
	isUnified     bool
)

// IsCgroup2UnifiedMode returns whether we are running in cgroup v2 unified mode.
// The result is computed once and cached.
func IsCgroup2UnifiedMode() bool {
	isUnifiedOnce.Do(func() {
		var st unix.Statfs_t
		err := unix.Statfs(unifiedMountpoint, &st)
		if err != nil && os.IsNotExist(err) {
			// For rootless containers, sweep it under the rug.
			isUnified = false
			return
		}
		// The filesystem magic number tells cgroup2fs apart from tmpfs.
		isUnified = st.Type == unix.CGROUP2_SUPER_MAGIC
	})
	return isUnified
}
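As a cross-check of the two `stat -fc %T` outcomes above, here is a minimal sketch that uses only the standard library's syscall package instead of golang.org/x/sys/unix. The helper name cgroupVersionFromMagic is hypothetical and not part of mydocker; the two magic numbers are the values defined in linux/magic.h. It assumes a Linux host.

```go
package main

import (
	"fmt"
	"syscall"
)

// Filesystem magic numbers from linux/magic.h.
const (
	cgroup2SuperMagic = 0x63677270 // shown by `stat -fc %T` as cgroup2fs
	tmpfsMagic        = 0x01021994 // shown by `stat -fc %T` as tmpfs
)

// cgroupVersionFromMagic is a hypothetical helper that maps the f_type
// field returned by statfs on /sys/fs/cgroup to a cgroup version label.
func cgroupVersionFromMagic(fsType int64) string {
	switch fsType {
	case cgroup2SuperMagic:
		return "v2"
	case tmpfsMagic:
		return "v1"
	default:
		return "unknown"
	}
}

func main() {
	var st syscall.Statfs_t
	if err := syscall.Statfs("/sys/fs/cgroup", &st); err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	// st.Type has a different integer width on some architectures,
	// so it is converted explicitly.
	fmt.Println("cgroup version:", cgroupVersionFromMagic(int64(st.Type)))
}
```

Keeping the mapping in a pure function makes it easy to unit test without touching the real /sys/fs/cgroup.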
cgroup v2 support
The process of using cgroup v2 is basically the same as v1.
- 1) Create a sub-cgroup
- 2) Configure the cpu, memory, and other subsystems
- 3) Add the processes that need to be restricted to the cgroup
Creating sub-cgroups
To create a sub-cgroup, create a subdirectory under the cgroup root directory, which for cgroup v2 is /sys/fs/cgroup.
const UnifiedMountpoint = "/sys/fs/cgroup"

// getCgroupPath returns the absolute path of the cgroup on the filesystem.
// The path is the cgroup root directory joined with the cgroup name.
// If autoCreate is set and the directory does not exist, the cgroup does
// not exist yet, so it is created here.
func getCgroupPath(cgroupPath string, autoCreate bool) (string, error) {
	// Return the path directly when auto-creation is not requested.
	cgroupRoot := UnifiedMountpoint
	absPath := path.Join(cgroupRoot, cgroupPath)
	if !autoCreate {
		return absPath, nil
	}
	// Check for existence only when autoCreate is specified.
	_, err := os.Stat(absPath)
	// Create the directory only if it does not exist.
	if err != nil && os.IsNotExist(err) {
		err = os.Mkdir(absPath, constant.Perm0755)
		return absPath, err
	}
	return absPath, errors.Wrap(err, "create cgroup")
}
Configuring the Subsystem
Taking cpu as an example, simply write the limit into the corresponding file, like this:
echo 5000 10000 >
The first value is the quota and the second is the period, both in microseconds: in every 10000-microsecond period, the processes in this cgroup may use at most 5000 microseconds of CPU time, that is, no more than 50% of a single core.
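The arithmetic behind that value can be sketched as a small function. The name cpuMaxValue and its parameters are hypothetical and used only for illustration; the sketch assumes the limit is given as an integer percentage of one core.

```go
package main

import "fmt"

// cpuMaxValue is a hypothetical helper that turns a percentage of one CPU
// core into the "<quota> <period>" string expected by the cgroup v2 CPU
// limit file. Both numbers are in microseconds.
func cpuMaxValue(percent, periodUs int) string {
	// Multiply before dividing so integer division does not truncate
	// when the period is not a multiple of 100.
	quotaUs := periodUs * percent / 100
	return fmt.Sprintf("%d %d", quotaUs, periodUs)
}

func main() {
	fmt.Println(cpuMaxValue(50, 10000))  // 50% of one core
	fmt.Println(cpuMaxValue(10, 100000)) // 10% of one core
}
```

A percentage above 100 yields a quota larger than the period, which corresponds to more than one core's worth of CPU time.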
The implementation is as follows:
func (s *CpuSubSystem) Set(cgroupPath string, res *) error {
if == 0 {
return nil
}
subCgroupPath, err := getCgroupPath(cgroupPath, true)
if err != nil {
return err
}
// In v1, cpu.cfs_period_us and cpu.cfs_quota_us control CPU usage in microseconds, e.g. if the process may only use 200ms out of every 1 second, it is limited to 20% of a CPU.
// In v2, the quota and the period are written together to cpu.max, e.g. "5000 10000" limits usage to 50% of a CPU.
if != 0 {
// The quota is derived from the parameter passed by the user, e.g. a value of 20 means a 20% CPU limit, so the quota is set to 20% of the period.
// This is a simple calculation that does not handle special cases such as negative numbers.
if err = ((subCgroupPath, ""), []byte(("%s %s", (PeriodDefault/Percent*), PeriodDefault)), constant.Perm0644); err != nil {
return ("set cgroup cpu share fail %v", err)
}
}
return nil
}
Configure the processes to be restricted
Just write the PID of the process into the cgroup:
echo 1033 >
The Go implementation is as follows:
func (s *CpuSubSystem) Apply(cgroupPath string, pid int) error {
return applyCgroup(pid, cgroupPath)
}
func applyCgroup(pid int, cgroupPath string) error {
subCgroupPath, err := getCgroupPath(cgroupPath, true)
if err != nil {
return (err, "get cgroup %s", cgroupPath)
}
if err = ((subCgroupPath, ""), []byte((pid)),
constant.Perm0644); err != nil {
return ("set cgroup proc fail %v", err)
}
return nil
}
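To confirm that a process actually joined the cgroup, the cgroup.procs file can be read back: it lists one PID per line. The following is a minimal sketch of that check. The helper name containsPID is hypothetical, and the sample content is made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// containsPID is a hypothetical helper that reports whether pid appears in
// the content of a cgroup.procs file (one PID per line).
func containsPID(procsContent string, pid int) bool {
	for _, field := range strings.Fields(procsContent) {
		if n, err := strconv.Atoi(field); err == nil && n == pid {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative content; on a real system this would come from
	// os.ReadFile on the cgroup.procs file of the sub-cgroup.
	sample := "1033\n1040\n"
	fmt.Println(containsPID(sample, 1033))
	fmt.Println(containsPID(sample, 103))
}
```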
Removal
Delete the subdirectory under the cgroup root to remove the cgroup:
func (s *CpuSubSystem) Remove(cgroupPath string) error {
subCgroupPath, err := getCgroupPath(cgroupPath, false)
if err != nil {
return err
}
return (subCgroupPath)
}
Compatibility with v1 and v2
Simply determine the current system's cgroup version when creating the CgroupManager:
func NewCgroupManager(path string) CgroupManager {
if IsCgroup2UnifiedMode() {
("use cgroup v2")
return NewCgroupManagerV2(path)
}
("use cgroup v1")
return NewCgroupManagerV1(path)
}
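The dispatch pattern can be illustrated in isolation. In this sketch every name (manager, v1Manager, v2Manager, newManager) is hypothetical and much simpler than the real CgroupManager. The detection result is passed in as a parameter, which is an assumption made here so that both branches can be exercised on any machine.

```go
package main

import "fmt"

// manager is a hypothetical, stripped-down stand-in for CgroupManager.
type manager interface {
	Version() string
}

type v1Manager struct{ path string }

func (m *v1Manager) Version() string { return "v1" }

type v2Manager struct{ path string }

func (m *v2Manager) Version() string { return "v2" }

// newManager picks the implementation from the detection result, so callers
// only ever depend on the manager interface.
func newManager(path string, unified bool) manager {
	if unified {
		return &v2Manager{path: path}
	}
	return &v1Manager{path: path}
}

func main() {
	fmt.Println(newManager("mydocker-cgroup", true).Version())
	fmt.Println(newManager("mydocker-cgroup", false).Version())
}
```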
3. Testing
cgroup v1
Run the test in a cgroup v1 environment:
root@mydocker:~/mydocker# ./mydocker run -mem 10m -cpu 10 -it -name cgroupv1 busybox /bin/sh
{"level":"info","msg":"createTty true","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"resConf:\u0026{10m 10 }","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"lower:/var/lib/mydocker/overlay2/3845479957/lower :/var/lib/mydocker/image/","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"mount overlayfs: [/usr/bin/mount -t overlay overlay -o lowerdir=/var/lib/mydocker/overlay2/3845479957/lower,upperdir=/var/lib/mydocker/overlay2/3845479957/upper,workdir=/var/lib/mydocker/overlay2/3845479957/work /var/lib/mydocker/overlay2/3845479957/merged]","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"use cgroup v1","time":"2024-04-14T13:23:19+08:00"}
{"level":"error","msg":"apply subsystem:cpuset err:set cgroup proc fail write /sys/fs/cgroup/cpuset/mydocker-cgroup/tasks: no space left on device","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"command all is /bin/sh","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"init come on","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"Current location is /var/lib/mydocker/overlay2/3845479957/merged","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"Find path /bin/sh","time":"2024-04-14T13:23:19+08:00"}
According to the logs, cgroup v1 is currently in use.
{"level":"info","msg":"use cgroup v1","time":"2024-04-14T13:23:19+08:00"}
Execute the following command to test the memory limit:
yes > /dev/null
As you can see, the process was killed by the OOM killer after a while:
/ # yes > /dev/null
Killed
Execute the following command to saturate the CPU:
while : ; do : ; done &
CPU usage is indeed limited to about 10%:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1212 root 20 0 1332 68 4 R 9.9 0.0 0:02.30 sh
cgroup v2
Run the test in a cgroup v2 environment, or follow the steps below to switch to v2.
Switch to cgroup v2
You can manually enable cgroup v2 on your Linux distribution by modifying the kernel cmdline boot parameters.
If your distribution uses GRUB, add systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX in /etc/default/grub, then run sudo update-grub.
Edit the GRUB configuration:
vi /etc/default/grub
The content is roughly like this:
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""
Modify the last line, GRUB_CMDLINE_LINUX, as follows:
GRUB_CMDLINE_LINUX="quiet splash systemd.unified_cgroup_hierarchy=1"
Then execute the following command to update the GRUB configuration
sudo update-grub
Finally, check the boot parameters to make sure the configuration change is in place:
cat /boot/grub/ | grep "systemd.unified_cgroup_hierarchy=1"
Then reboot:
reboot
After rebooting, a check confirms that the system has switched to cgroup v2 as expected:
root@cgroupv2:~# stat -fc %T /sys/fs/cgroup/
cgroup2fs
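After the reboot, the running kernel's parameters are also visible in /proc/cmdline, so the flag can be checked from Go as well. This is a minimal sketch with a hypothetical helper name. It only looks for the exact systemd.unified_cgroup_hierarchy=1 argument used in this article, and on distributions that default to cgroup v2 the flag may be absent even though v2 is active, so the stat check remains the authoritative one.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// unifiedHierarchyRequested is a hypothetical helper that reports whether
// the kernel command line contains systemd.unified_cgroup_hierarchy=1.
func unifiedHierarchyRequested(cmdline string) bool {
	for _, arg := range strings.Fields(cmdline) {
		if arg == "systemd.unified_cgroup_hierarchy=1" {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Println("read /proc/cmdline:", err)
		return
	}
	fmt.Println("unified hierarchy requested:", unifiedHierarchyRequested(string(data)))
}
```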
Test
./mydocker run -mem 10m -cpu 10 -it -name cgroupv2 busybox /bin/sh
root@mydocker:~/mydocker# ./mydocker run -mem 10m -cpu 10 -it -name cgroupv2 busybox /bin/sh
{"level":"info","msg":"createTty true","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"resConf:\u0026{10m 10 }","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"lower:/var/lib/mydocker/overlay2/3526930704/lower :/var/lib/mydocker/image/","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"mount overlayfs: [/usr/bin/mount -t overlay overlay -o lowerdir=/var/lib/mydocker/overlay2/3526930704/lower,upperdir=/var/lib/mydocker/overlay2/3526930704/upper,workdir=/var/lib/mydocker/overlay2/3526930704/work /var/lib/mydocker/overlay2/3526930704/merged]","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"use cgroup v2","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"init come on","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"command all is /bin/sh","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"Current location is /var/lib/mydocker/overlay2/3526930704/merged","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"Find path /bin/sh","time":"2024-04-14T13:26:32+08:00"}
According to the logs, cgroup v2 is currently in use.
{"level":"info","msg":"use cgroup v2","time":"2024-04-14T13:26:32+08:00"}
Performing the same tests gives consistent results, which indicates that cgroup v2 support is working properly.
Execute the following command to test the memory limit:
yes > /dev/null
As you can see, the process was killed by the OOM killer after a while:
/ # yes > /dev/null
Killed
Execute the following command to saturate the CPU:
while : ; do : ; done &
CPU usage is indeed limited to about 10%:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1212 root 20 0 1332 68 4 R 9.9 0.0 0:02.30 sh
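Besides watching top, the configured limit can be read back from the cgroup's cpu.max file, which holds either "<quota> <period>" or "max <period>" when no limit is set. The following sketch parses that content into a percentage. The helper name parseCPUMax is hypothetical, and the sample strings are illustrative rather than captured from the test run above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax is a hypothetical helper that converts the content of a
// cpu.max file into a percentage of one core. limited is false when the
// quota is "max", meaning no CPU limit is configured.
func parseCPUMax(content string) (percent float64, limited bool, err error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max content %q", content)
	}
	period, err := strconv.Atoi(fields[1])
	if err != nil || period <= 0 {
		return 0, false, fmt.Errorf("invalid period %q", fields[1])
	}
	if fields[0] == "max" {
		return 0, false, nil
	}
	quota, err := strconv.Atoi(fields[0])
	if err != nil {
		return 0, false, fmt.Errorf("invalid quota %q", fields[0])
	}
	return float64(quota) * 100 / float64(period), true, nil
}

func main() {
	for _, sample := range []string{"10000 100000", "max 100000"} {
		percent, limited, err := parseCPUMax(sample)
		fmt.Println(sample, "->", percent, limited, err)
	}
}
```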
4. Summary
This article adds cgroup v2 support to mydocker, so that it automatically adapts to the cgroup version in use on the host.
The relevant code can be found on the feat-cgroup-v2 branch. The test script is as follows:
You need to prepare the files in the /var/lib/mydocker/image directory in advance, as described in Section IV.2.
# Clone the code
git clone -b feat-cgroup-v2 /lixd/
cd mydocker
# Pull dependencies and build
go mod tidy
go build .
# Test
./mydocker run -mem 10m -cpu 10 -it -name cgroupv2 busybox /bin/sh