
Writing Docker from Scratch (XIX) - Adding cgroup v2 Support

2024-07-24 09:12:57

This article is the nineteenth in a series on writing Docker from scratch. It adds support for cgroup v2.


For the full code, see: /lixd/mydocker
Stars are welcome.

We recommend reading the following articles first to get a general idea of how Docker is implemented:

  • Core principles: A Deeper Understanding of Docker Core Principles: Namespace, Cgroups, and Rootfs
  • Namespace-based view isolation: Exploring Linux Namespace: Behind the Magic of Docker Isolation
  • Resource limitation based on cgroups:
    • A First Look at Linux Cgroups: The Wonderful World of Resource Control
    • A deep dive into the Linux Cgroups subsystem: fine-grained resource management
    • Docker and Linux Cgroups: A Magic Tour of Resource Isolation
  • Filesystems based on overlayfs: Docker Magic Demystified: Exploring UnionFS and OverlayFS
  • Docker networking based on veth pair, bridge, iptables, etc.: Docker Networking Revealed: Manually Implementing Docker Bridged Networks

The development environment is as follows:

root@mydocker:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.2 LTS
Release:	20.04
Codename:	focal
root@mydocker:~# uname -r
5.4.0-74-generic

Note: The root user is required

1. Overview

This article adds support for cgroup v2 and automatically detects which cgroup version the current system uses.

2. Implementation

Determining cgroup version

Use the following command to check whether your system is currently using cgroup v1 or v2:

stat -fc %T /sys/fs/cgroup/

If the output is cgroup2fs, the system is using v2, like this:

root@tezn:~# stat -fc %T /sys/fs/cgroup/
cgroup2fs

If the output is tmpfs, the system is using v1, like this:

[root@docker cgroup]# stat -fc %T /sys/fs/cgroup/
tmpfs

The Go implementation is as follows:

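import (
	"os"
	"sync"

	"golang.org/x/sys/unix"
)

const (
	unifiedMountpoint = "/sys/fs/cgroup"
)

var (
	isUnifiedOnce sync.Once
	isUnified     bool
)

// IsCgroup2UnifiedMode returns whether we are running in cgroup v2 unified mode.
// The check runs only once; later calls return the cached result.
func IsCgroup2UnifiedMode() bool {
	isUnifiedOnce.Do(func() {
		var st unix.Statfs_t
		err := unix.Statfs(unifiedMountpoint, &st)
		if err != nil && os.IsNotExist(err) {
			// For rootless containers, sweep it under the rug.
			isUnified = false
			return
		}
		// The filesystem type of a cgroup v2 mount is CGROUP2_SUPER_MAGIC.
		isUnified = st.Type == unix.CGROUP2_SUPER_MAGIC
	})
	return isUnified
}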
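
A dependency-free cross-check is also possible. In cgroup v2 every cgroup directory, including the root, contains a cgroup.controllers file, while a v1 tmpfs mount at /sys/fs/cgroup does not. The following is only a minimal sketch of that idea; the helper name hasCgroupV2Root is hypothetical and is not part of mydocker.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hasCgroupV2Root reports whether root looks like a cgroup v2 mount point.
// It checks for the cgroup.controllers file, which exists only in the
// unified (v2) hierarchy. Hypothetical helper, for illustration only.
func hasCgroupV2Root(root string) bool {
	info, err := os.Stat(filepath.Join(root, "cgroup.controllers"))
	return err == nil && !info.IsDir()
}

func main() {
	fmt.Println("cgroup v2:", hasCgroupV2Root("/sys/fs/cgroup"))
}
```

Unlike the statfs check, this does not need golang.org/x/sys, but it only inspects a file name, so the statfs-based check remains the more direct test.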

cgroup v2 support

Using cgroup v2 involves basically the same steps as v1:

  • 1) Create a sub-cgroup
  • 2) Configure the cpu, memory, and other subsystems
  • 3) Add the processes that need to be restricted

Creating sub-cgroups

To create a sub-cgroup, create a subdirectory under the cgroup root directory, which for cgroup v2 is /sys/fs/cgroup.

const UnifiedMountpoint = "/sys/fs/cgroup"

// getCgroupPath returns the absolute path of the cgroup on the filesystem.
// The path is the cgroup root directory joined with the cgroup name.
// If autoCreate is set and the directory does not exist, the cgroup does
// not exist yet, so it is created here.
func getCgroupPath(cgroupPath string, autoCreate bool) (string, error) {
	cgroupRoot := UnifiedMountpoint
	absPath := path.Join(cgroupRoot, cgroupPath)
	// Without autoCreate, return the path without checking that it exists.
	if !autoCreate {
		return absPath, nil
	}
	// Check for existence only when autoCreate is set.
	_, err := os.Stat(absPath)
	// Create the directory only if it does not exist.
	if err != nil && os.IsNotExist(err) {
		err = os.Mkdir(absPath, constant.Perm0755)
		return absPath, err
	}
	// errors.Wrap (github.com/pkg/errors) returns nil when err is nil.
	return absPath, errors.Wrap(err, "create cgroup")
}
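
One detail worth checking after creating the directory: in cgroup v2 a child cgroup only exposes a controller's interface files (for example cpu.max) when that controller is listed in the parent's cgroup.subtree_control file. A controller can be enabled by writing, for example, "+cpu" to that file. The sketch below only parses the file content and reports what is missing; the function name missingControllers and the sample strings are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// missingControllers returns the controllers in want that are not listed in
// subtreeControl, the content of the parent's cgroup.subtree_control file
// (a space-separated list such as "cpu io memory pids").
func missingControllers(subtreeControl string, want []string) []string {
	enabled := make(map[string]bool)
	for _, name := range strings.Fields(subtreeControl) {
		enabled[name] = true
	}
	var missing []string
	for _, name := range want {
		if !enabled[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Sample content; on a real host read /sys/fs/cgroup/cgroup.subtree_control.
	fmt.Println(missingControllers("cpuset cpu io memory pids\n", []string{"cpu", "memory"})) // prints []
	fmt.Println(missingControllers("memory pids", []string{"cpu", "memory"}))                 // prints [cpu]
}
```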

Configuring the Subsystem

Taking cpu as an example, write the limit to the cpu.max file in the sub-cgroup directory, like this:

echo 5000 10000 > cpu.max

The two values are the quota and the period, both in microseconds: in every 10000-microsecond period, this cgroup may use at most 5000 microseconds of CPU time. In other words, the processes managed by this cgroup will not use more than 50% of a single core.

The implementation is as follows:

// Note: the resource.ResourceConfig type and its integer CpuCfsQuota field are assumed names.
func (s *CpuSubSystem) Set(cgroupPath string, res *resource.ResourceConfig) error {
	if res.CpuCfsQuota == 0 {
		return nil
	}
	subCgroupPath, err := getCgroupPath(cgroupPath, true)
	if err != nil {
		return err
	}

	// In v1, cpu.cfs_period_us and cpu.cfs_quota_us control CPU usage in microseconds, e.g. if the process may use only 200ms in every 1 second, it is limited to 20% of a CPU.
	// In v2, the quota and the period are written together to cpu.max, e.g. "5000 10000" limits usage to 50% of a CPU.
	if res.CpuCfsQuota != 0 {
		// The quota is derived from the user-supplied parameter, e.g. 20 means 20% of a CPU, so the quota is set to 20% of the period.
		// This is a simple calculation and does not handle special cases such as negative numbers.
		if err = os.WriteFile(path.Join(subCgroupPath, "cpu.max"), []byte(fmt.Sprintf("%d %d", PeriodDefault/Percent*res.CpuCfsQuota, PeriodDefault)), constant.Perm0644); err != nil {
			return fmt.Errorf("set cgroup cpu share fail %v", err)
		}
	}
	return nil
}
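
To make the arithmetic concrete, here is a small standalone sketch of turning a percentage into the "quota period" string that cgroup v2 expects in cpu.max. It assumes a period of 100000 microseconds and a divisor of 100 (the article does not show the values of PeriodDefault and Percent), and the function name cpuMaxValue is hypothetical. Unlike the method above, it maps non-positive input to "max", which cpu.max accepts as "no limit".

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	periodDefault = 100000 // assumed period in microseconds (100ms)
	percent       = 100    // assumed divisor: the limit is given in percent
)

// cpuMaxValue converts a CPU limit expressed as a percentage of one core
// into the "<quota> <period>" string written to cpu.max.
// A non-positive percentage yields "max <period>", meaning no limit.
func cpuMaxValue(cpuPercent int) string {
	period := strconv.Itoa(periodDefault)
	if cpuPercent <= 0 {
		return "max " + period
	}
	quota := periodDefault / percent * cpuPercent
	return strconv.Itoa(quota) + " " + period
}

func main() {
	fmt.Println(cpuMaxValue(20)) // prints 20000 100000
	fmt.Println(cpuMaxValue(0))  // prints max 100000
}
```

Dividing by 100 before multiplying keeps the intermediate value small; with a period of 100000 the division is exact, so no precision is lost.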

Configuring the processes to be restricted

Just write the PID into the cgroup.procs file of the sub-cgroup.

echo 1033 > cgroup.procs

The Go implementation is as follows:

func (s *CpuSubSystem) Apply(cgroupPath string, pid int) error {
	return applyCgroup(pid, cgroupPath)
}

func applyCgroup(pid int, cgroupPath string) error {
	subCgroupPath, err := getCgroupPath(cgroupPath, true)
	if err != nil {
		return errors.Wrapf(err, "get cgroup %s", cgroupPath)
	}
	// In cgroup v2, a process is added by writing its PID to cgroup.procs.
	if err = os.WriteFile(path.Join(subCgroupPath, "cgroup.procs"), []byte(strconv.Itoa(pid)),
		constant.Perm0644); err != nil {
		return fmt.Errorf("set cgroup proc fail %v", err)
	}
	return nil
}
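
To confirm that a process really joined the cgroup, the cgroup.procs file can be read back: it lists the member PIDs, one per line. The sketch below parses such content; parseProcs and containsPid are hypothetical helper names, and the sample string stands in for the real file content.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseProcs parses the content of a cgroup.procs file (one PID per line).
func parseProcs(content string) ([]int, error) {
	var pids []int
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		pid, err := strconv.Atoi(line)
		if err != nil {
			return nil, fmt.Errorf("invalid pid %q: %w", line, err)
		}
		pids = append(pids, pid)
	}
	return pids, nil
}

// containsPid reports whether pid appears in pids.
func containsPid(pids []int, pid int) bool {
	for _, p := range pids {
		if p == pid {
			return true
		}
	}
	return false
}

func main() {
	// Sample content; on a real host read <sub-cgroup>/cgroup.procs instead.
	pids, err := parseProcs("1033\n1040\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(containsPid(pids, 1033)) // prints true
}
```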

Removing the cgroup

To remove a cgroup, delete its subdirectory under the cgroup root.

func (s *CpuSubSystem) Remove(cgroupPath string) error {
	subCgroupPath, err := getCgroupPath(cgroupPath, false)
	if err != nil {
		return err
	}
	return os.RemoveAll(subCgroupPath)
}

Compatible with V1 and V2

Simply detect the system's cgroup version when creating the CgroupManager:

func NewCgroupManager(path string) CgroupManager {
	if IsCgroup2UnifiedMode() {
		log.Infof("use cgroup v2")
		return NewCgroupManagerV2(path)
	}
	log.Infof("use cgroup v1")
	return NewCgroupManagerV1(path)
}
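
The dispatch works because both versions sit behind one interface. The following is a stripped-down sketch of that pattern, not mydocker's actual types: the manager interface, its two methods, and newManager are hypothetical names. The only concrete difference it models is the file that receives PIDs, which is tasks in v1 and cgroup.procs in v2.

```go
package main

import "fmt"

// manager is a hypothetical, trimmed-down stand-in for CgroupManager.
type manager interface {
	Version() int
	ProcsFile() string // name of the file that PIDs are written to
}

type managerV1 struct{ path string }

func (m managerV1) Version() int      { return 1 }
func (m managerV1) ProcsFile() string { return "tasks" }

type managerV2 struct{ path string }

func (m managerV2) Version() int      { return 2 }
func (m managerV2) ProcsFile() string { return "cgroup.procs" }

// newManager picks the implementation from the detected mode. In the real
// code the flag would come from IsCgroup2UnifiedMode(); it is a parameter
// here so the sketch can run on any machine.
func newManager(path string, unified bool) manager {
	if unified {
		return managerV2{path: path}
	}
	return managerV1{path: path}
}

func main() {
	m := newManager("mydocker-cgroup", true)
	fmt.Println(m.Version(), m.ProcsFile()) // prints 2 cgroup.procs
}
```

Callers only see the interface, so the rest of the program does not need to know which cgroup version is in use.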

3. Testing

cgroup v1

First, test in a cgroup v1 environment.

root@mydocker:~/mydocker# ./mydocker run -mem 10m -cpu 10 -it -name cgroupv1 busybox /bin/sh
{"level":"info","msg":"createTty true","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"resConf:\u0026{10m 10 }","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"lower:/var/lib/mydocker/overlay2/3845479957/lower :/var/lib/mydocker/image/","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"mount overlayfs: [/usr/bin/mount -t overlay overlay -o lowerdir=/var/lib/mydocker/overlay2/3845479957/lower,upperdir=/var/lib/mydocker/overlay2/3845479957/upper,workdir=/var/lib/mydocker/overlay2/3845479957/work /var/lib/mydocker/overlay2/3845479957/merged]","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"use cgroup v1","time":"2024-04-14T13:23:19+08:00"}
{"level":"error","msg":"apply subsystem:cpuset err:set cgroup proc fail write /sys/fs/cgroup/cpuset/mydocker-cgroup/tasks: no space left on device","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"command all is /bin/sh","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"init come on","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"Current location is /var/lib/mydocker/overlay2/3845479957/merged","time":"2024-04-14T13:23:19+08:00"}
{"level":"info","msg":"Find path /bin/sh","time":"2024-04-14T13:23:19+08:00"}

According to the logs, cgroup v1 is currently in use.

{"level":"info","msg":"use cgroup v1","time":"2024-04-14T13:23:19+08:00"}

Execute the following command to test memory allocation

yes > /dev/null

As you can see, it was killed by the OOM killer after a while.

/ # yes > /dev/null
Killed

Execute the following command to keep the CPU fully busy

while : ; do : ; done &

CPU usage is indeed limited to about 10%.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND              
1212 root      20   0    1332     68      4 R   9.9   0.0   0:02.30 sh  

cgroup v2

Test in a cgroup v2 environment, or follow the steps below to switch to v2.

Switch to cgroup v2

You can manually enable cgroup v2 on your Linux distribution by modifying the kernel cmdline boot parameters.

If your distribution uses GRUB, add systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX in /etc/default/grub, then run sudo update-grub.

Edit the GRUB configuration

vi /etc/default/grub

The content is roughly like this:

GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""

Modify the last line, GRUB_CMDLINE_LINUX:

GRUB_CMDLINE_LINUX="quiet splash systemd.unified_cgroup_hierarchy=1"

Then execute the following command to update the GRUB configuration

sudo update-grub

Finally, check the generated boot parameters to make sure the change has taken effect

cat /boot/grub/grub.cfg | grep "systemd.unified_cgroup_hierarchy=1"

Then reboot.

reboot

After rebooting, check again. As expected, the system has switched to cgroup v2:

root@cgroupv2:~# stat -fc %T /sys/fs/cgroup/
cgroup2fs
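
After the reboot, the parameters the running kernel was actually started with are available in /proc/cmdline, so the flag can also be checked programmatically. This is a minimal sketch under that assumption; hasUnifiedFlag is a hypothetical name and the sample command line is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

const unifiedFlag = "systemd.unified_cgroup_hierarchy=1"

// hasUnifiedFlag reports whether the kernel command line contains the
// parameter that asks systemd to mount the unified cgroup v2 hierarchy.
func hasUnifiedFlag(cmdline string) bool {
	for _, field := range strings.Fields(cmdline) {
		if field == unifiedFlag {
			return true
		}
	}
	return false
}

func main() {
	// Made-up sample; on a real host pass the content of /proc/cmdline.
	sample := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet splash systemd.unified_cgroup_hierarchy=1"
	fmt.Println(hasUnifiedFlag(sample)) // prints true
}
```

Matching whole fields rather than substrings avoids a false positive on a value such as systemd.unified_cgroup_hierarchy=10.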

Test

root@mydocker:~/mydocker# ./mydocker run -mem 10m -cpu 10 -it -name cgroupv2 busybox /bin/sh
{"level":"info","msg":"createTty true","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"resConf:\u0026{10m 10 }","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"lower:/var/lib/mydocker/overlay2/3526930704/lower :/var/lib/mydocker/image/","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"mount overlayfs: [/usr/bin/mount -t overlay overlay -o lowerdir=/var/lib/mydocker/overlay2/3526930704/lower,upperdir=/var/lib/mydocker/overlay2/3526930704/upper,workdir=/var/lib/mydocker/overlay2/3526930704/work /var/lib/mydocker/overlay2/3526930704/merged]","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"use cgroup v2","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"init come on","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"command all is /bin/sh","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"Current location is /var/lib/mydocker/overlay2/3526930704/merged","time":"2024-04-14T13:26:32+08:00"}
{"level":"info","msg":"Find path /bin/sh","time":"2024-04-14T13:26:32+08:00"}

According to the logs, cgroup v2 is currently in use.

{"level":"info","msg":"use cgroup v2","time":"2024-04-14T13:26:32+08:00"}

Running the same tests gives the same results, indicating that cgroup v2 is working properly.

Execute the following command to test memory allocation

yes > /dev/null

As you can see, it was killed by the OOM killer after a while.

/ # yes > /dev/null
Killed

Execute the following command to keep the CPU fully busy

while : ; do : ; done &

CPU usage is indeed limited to about 10%.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND              
1212 root      20   0    1332     68      4 R   9.9   0.0   0:02.30 sh  

4. Summary

This article adds cgroup v2 support to mydocker, which now detects the system's cgroup version and uses the matching implementation automatically.




The Docker from Scratch series is continuously updated. Search for the WeChat official account [Explore Cloud Native] and subscribe to read more articles.


The relevant code is on the feat-cgroup-v2 branch. The test script is as follows:

You need to prepare the files in the /var/lib/mydocker/image directory in advance, as described in Section IV.2.

# Clone the code
git clone -b feat-cgroup-v2 /lixd/
cd mydocker
# Pull dependencies and build
go mod tidy
go build .
# Test
./mydocker run -mem 10m -cpu 10 -it -name cgroupv2 busybox /bin/sh