0. Preface
This series introduces the Go runtime scheduler. To learn Go in depth you cannot avoid the runtime, a layer that acts like an "operating system" for our programs and handles all kinds of work on their behalf. The scheduler is the core of the runtime and is essential knowledge. In this series we dig into the source code of the runtime scheduler, together with Go plan9 assembly, to see how the scheduler behaves in scenarios such as a running program and the creation of goroutines and threads.
This series relies on Go plan9 assembly, so if you are not familiar with it, read up on it first.
1. Go program initialization
Let's start with a classic Hello World program and look at how it starts up and what the scheduler does along the way.
package main

func main() {
	println("Hello World")
}
1.1 Preparation
Before a program can run it goes through two stages, compilation and linking. We can view the build process with go build -x:
# go build -x
...
// compile: compile the package
/usr/local/go/pkg/tool/linux_amd64/compile -o $WORK/b001/_pkg_.a -trimpath "$WORK/b001=>" -p main -complete -buildid uHBjeIlqt1oQO9TLC5SE/uHBjeIlqt1oQO9TLC5SE -goversion go1.21.0 -c=3 -nolocalimports -importcfg $WORK/b001/importcfg -pack ./
...
// link: link the library files to generate the executable
/usr/local/go/pkg/tool/linux_amd64/link -o $WORK/b001/exe/ -importcfg $WORK/b001/ -buildmode=exe -buildid=27kmwBgRtsWy6cL5ofDV/uHBjeIlqt1oQO9TLC5SE/Ye3W7EEwzML-FanTsWbe/27kmwBgRtsWy6cL5ofDV -extld=gcc $WORK/b001/_pkg_.a
Irrelevant output is omitted here. Compilation and linking produce the executable hello:
# ls
hello
# ./hello
Hello World
1.2 Entering the program
The previous section produced the executable hello. Now to the subject of this article: we use dlv to step into the hello program and see what the runtime does when a Go program executes.
We can view the entry point of the executable with readelf:
# readelf -h ./hello
ELF Header:
...
Entry point address: 0x455e40
Irrelevant information is omitted; the field to focus on is Entry point address, which is the entry point of the hello program. We use dlv to set a breakpoint at this address:
# dlv exec ./hello
Type 'help' for list of commands.
(dlv) b *0x455e40
Breakpoint 1 set at 0x455e40 for _rt0_amd64_linux() /usr/local/go/src/runtime/rt0_linux_amd64.s:8
You can see that the entry point is the _rt0_amd64_linux() function in /usr/local/go/src/runtime/rt0_linux_amd64.s.
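The same entry address can also be read programmatically. The following is a minimal sketch using the standard library's debug/elf package; the helper name entryPoint and the default path ./hello are assumptions made for illustration, not part of the original walkthrough.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// entryPoint returns the ELF entry point address of the binary at path.
// (Hypothetical helper; it reads the same header field that readelf -h prints.)
func entryPoint(path string) (uint64, error) {
	f, err := elf.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return f.Entry, nil
}

func main() {
	path := "./hello" // assumed: the binary built in section 1.1
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	entry, err := entryPoint(path)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Entry point address: %#x\n", entry)
}
```

Run against the hello binary, this should print the same address that readelf reports; the exact value depends on the Go version and build.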
Next, go into this function to see what the runtime does when you start a Go program.
// the c command continues execution up to the entry point
(dlv) c
> _rt0_amd64_linux() /usr/local/go/src/runtime/rt0_linux_amd64.s:8 (hits total:1) (PC: 0x455e40)
Warning: debugging optimized function
3: // license that can be found in the LICENSE file.
4:
5: #include ""
6:
7: TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
=> 8: JMP _rt0_amd64(SB) // jump to _rt0_amd64
// si single-step instruction
(dlv) si
> _rt0_amd64() /usr/local/go/src/runtime/asm_amd64.s:16 (PC: 0x454200)
Warning: debugging optimized function
TEXT _rt0_amd64(SB) /usr/local/go/src/runtime/asm_amd64.s
=> asm_amd64.s:16 0x454200 488b3c24 mov rdi, qword ptr [rsp]
asm_amd64.s:17 0x454204 488d742408 lea rsi, ptr [rsp+0x8]
asm_amd64.s:18 0x454209 e912000000 jmp $runtime.rt0_go // jump to runtime.rt0_go here
// go into rt0_go
(dlv) si
> runtime.rt0_go() /usr/local/go/src/runtime/asm_amd64.s:161 (PC: 0x454220)
Warning: debugging optimized function
TEXT runtime.rt0_go(SB) /usr/local/go/src/runtime/asm_amd64.s
=> asm_amd64.s:161 0x454220 4889f8 mov rax, rdi
asm_amd64.s:162 0x454223 4889f3 mov rbx, rsi
asm_amd64.s:163 0x454226 4883ec28 sub rsp, 0x28
asm_amd64.s:164 0x45422a 4883e4f0 and rsp, -0x10
asm_amd64.s:165 0x45422e 4889442418 mov qword ptr [rsp+0x18], rax
asm_amd64.s:166 0x454233 48895c2420 mov qword ptr [rsp+0x20], rbx
rt0_go is the entry point from which the runtime starts executing a Go program.
Note that si shows the instructions the CPU actually executes, one step at a time. Go plan9 assembly is a higher-level pseudo-assembly that the assembler translates into machine instructions, so the output of si does not match the source in asm_amd64.s line for line. When analyzing the code, read the two together.
With asm_amd64.s/rt0_go alongside, we analyze the CPU instructions printed by si:
=> asm_amd64.s:161 0x454220 4889f8 mov rax, rdi // move argc from rdi register to rax register: rax = argc
asm_amd64.s:162 0x454223 4889f3 mov rbx, rsi // move argv from rsi register to rbx register: rbx = argv
asm_amd64.s:163 0x454226 4883ec28 sub rsp, 0x28 // reserve 0x28 bytes of stack space
asm_amd64.s:164 0x45422a 4883e4f0 and rsp, -0x10 // align stack space to an integer multiple of 16 bytes (because a set of SSE instructions for the CPU requires memory addresses to be a multiple of 16 bytes)
asm_amd64.s:165 0x45422e 4889442418 mov qword ptr [rsp+0x18], rax // move argc to stack space [rsp+0x18]
asm_amd64.s:166 0x454233 48895c2420 mov qword ptr [rsp+0x20], rbx // move argv to stack space [rsp+0x20]
The resulting stack layout: argc is saved at [rsp+0x18] and argv at [rsp+0x20], above the 16-byte-aligned rsp.
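The stack arithmetic performed by these instructions can be checked with a small Go sketch. The function name frame is hypothetical, and the starting rsp value is an assumption chosen to be consistent with the register values that dlv prints in this session.

```go
package main

import "fmt"

// frame models `sub rsp, 0x28` followed by `and rsp, -0x10` and returns the
// aligned rsp plus the slots where argc and argv are saved.
// (Hypothetical helper for illustration only.)
func frame(rsp uint64) (aligned, argcSlot, argvSlot uint64) {
	rsp -= 0x28  // sub rsp, 0x28
	rsp &^= 0xf  // and rsp, -0x10: clear the low 4 bits
	return rsp, rsp + 0x18, rsp + 0x20
}

func main() {
	entry := uint64(0x00007ffec8d15620) // assumed rsp on entry to rt0_go
	aligned, argcSlot, argvSlot := frame(entry)
	fmt.Printf("rsp=%#x argc@%#x argv@%#x\n", aligned, argcSlot, argvSlot)
}
```

With this starting value, the aligned rsp comes out as 0x7ffec8d155f0, the same value the Rsp register shows in the regs output of this session.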
Continuing the analysis:
(dlv) si
> runtime.rt0_go() /usr/local/go/src/runtime/asm_amd64.s:170 (PC: 0x454238)
Warning: debugging optimized function
asm_amd64.s:166 0x454233 48895c2420 mov qword ptr [rsp+0x20], rbx
=> asm_amd64.s:170 0x454238 488d3d815b0700 lea rdi, ptr [runtime.g0] // move the address of runtime.g0 into the rdi register, rdi = &g0
asm_amd64.s:171 0x45423f 488d9c240000ffff lea rbx, ptr [rsp+0xffff0000] // rbx = rsp - 0x10000: lea only computes the address 64KB below rsp (0xffff0000 is a sign-extended -0x10000), it does not read memory
asm_amd64.s:172 0x454247 48895f10 mov qword ptr [rdi+0x10], rbx // move the address in rbx to [rdi+0x10], which is actually to g0.stackguard0
asm_amd64.s:173 0x45424b 48895f18 mov qword ptr [rdi+0x18], rbx // move address in rbx to [rdi+0x18], actually to g0.stackguard1
asm_amd64.s:174 0x45424f 48891f mov qword ptr [rdi], rbx // store rbx at [rdi], the first field of g0: the low end of the g0 stack
asm_amd64.s:175 0x454252 48896708 mov qword ptr [rdi+0x8], rsp // store rsp at [rdi+0x8], the next field of g0: the high end of the g0 stack
runtime.g0 provides the execution environment for the main runtime thread; it is not a goroutine that executes user code.
Use regs to see what the rbx register holds:
(dlv) regs
Rip = 0x000000000045423f
Rsp = 0x00007ffec8d155f0
Rbx = 0x00007ffec8d15628
(dlv) si
> runtime.rt0_go() /usr/local/go/src/runtime/asm_amd64.s:172 (PC: 0x454247)
Warning: debugging optimized function
asm_amd64.s:171 0x45423f 488d9c240000ffff lea rbx, ptr [rsp+0xffff0000]
=> asm_amd64.s:172 0x454247 48895f10 mov qword ptr [rdi+0x10], rbx
(dlv) regs
Rip = 0x0000000000454247
Rsp = 0x00007ffec8d155f0
Rbx = 0x00007ffec8d055f0
As you can see, this instruction marks out a region of stack space: rsp (0x00007ffec8d155f0) points to the bottom of the stack at the high address, rbx (0x00007ffec8d055f0) points to the top of the stack at the low address, and the region between them is 64KB (0x00007ffec8d155f0 - 0x00007ffec8d055f0 = 0x10000).
Based on the above analysis, the g0 stack spans the 64KB from rbx up to rsp.
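The 64KB figure follows from sign-extending the displacement in the lea instruction. The sketch below reproduces that calculation in Go; g0Bounds is a hypothetical name, and the rsp value is the one shown by regs above.

```go
package main

import "fmt"

// g0Bounds models `lea rbx, [rsp+0xffff0000]`. The CPU sign-extends the
// 32-bit displacement, so 0xffff0000 means -0x10000 (minus 64KB).
// (Hypothetical helper for illustration only.)
func g0Bounds(rsp uint64) (lo, hi uint64) {
	raw := uint32(0xffff0000)
	disp := int64(int32(raw)) // sign-extend to -65536
	return rsp + uint64(disp), rsp
}

func main() {
	lo, hi := g0Bounds(0x00007ffec8d155f0) // Rsp from the regs output
	fmt.Printf("lo=%#x hi=%#x size=%dKB\n", lo, hi, (hi-lo)/1024)
}
```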
Continuing the analysis and skipping some irrelevant assembly, we pick up directly at line 258 of asm_amd64.s/runtime·rt0_go:
258 LEAQ runtime·m0+m_tls(SB), DI
259 CALL runtime·settls(SB)
260
261 // store through it, to make sure it works
262 get_tls(BX)
263 MOVQ $0x123, g(BX)
264 MOVQ runtime·m0+m_tls(SB), AX
265 CMPQ AX, $0x123
266 JEQ 2(PC)
267 CALL runtime·abort(SB)
Use dlv to set a breakpoint at line 258 of the assembly source:
(dlv) b /usr/local/go/src/runtime/asm_amd64.s:258
Breakpoint 2 set at 0x4542cb for runtime.rt0_go() /usr/local/go/src/runtime/asm_amd64.s:258
(dlv) c
(dlv) si
> runtime.rt0_go() /usr/local/go/src/runtime/asm_amd64.s:259 (PC: 0x4542d2)
Warning: debugging optimized function
// load the address runtime.m0+136 into rdi (runtime·m0+m_tls in the source)
asm_amd64.s:258 0x4542cb* 488d3d565f0700 lea rdi, ptr [runtime.m0+136]
// call runtime·settls (source line 259) to set up thread local storage
=> asm_amd64.s:259 0x4542d2 e809240000 call $
// move 0x123 to fs:[0xfffffff8]
asm_amd64.s:263 0x4542d7 6448c70425f8ffffff23010000 mov qword ptr fs:[0xfffffff8], 0x123
// move the value of [runtime.m0+136] to the rax register
asm_amd64.s:264 0x4542e4 488b053d5f0700 mov rax, qword ptr [runtime.m0+136]
// compare rax with 0x123; if they are not equal, call runtime·abort (source line 267)
asm_amd64.s:265 0x4542eb 483d23010000 cmp rax, 0x123
asm_amd64.s:266 0x4542f1 7405 jz 0x4542f8
asm_amd64.s:267 0x4542f3 e808040000 call $
This code involves thread local storage (TLS), a mechanism that gives each thread its own private copy of a set of variables even though all threads run the same code. In the Go runtime, each operating system thread (M) needs to know which goroutine (G) it is currently executing. To make that lookup efficient, the runtime stores a pointer to the current G in TLS, so each thread can quickly find the G it is running. m0 is the first operating system thread of a Go program and is responsible for initializing the entire runtime. Other threads are created through the runtime scheduler, which sets up their TLS and writes the G pointer into it automatically. m0 is special: it is created directly by the operating system without going through the Go scheduler, so its TLS has to be set up here with assembly instructions.
The logic of this code is as follows. The address of m0's TLS area (runtime·m0+m_tls) is loaded into the rdi register, and runtime·settls is called to set the fs segment base register so that the TLS area can be reached through [segment base: offset]. The code then verifies that the address set up this way really does reach m0's TLS area: it writes 0x123 to [segment base: offset], loads the contents of [runtime.m0+136] into rax, and compares rax with 0x123. If they are equal, [segment base: offset] accesses m0's TLS area correctly; otherwise runtime·abort is called to abort the runtime.
Each thread has its own set of CPU register values, so different threads keep their private "global" variables behind different fs segment base registers. For more detail, see "Go Language Scheduler Source Code Scenario Analysis No. 10: Thread Local Storage".
To deepen our understanding of this piece, let's look at exactly how it's set up from an assembly perspective.
asm_amd64.s:258 0x4542cb* 488d3d565f0700 lea rdi, ptr [runtime.m0+136]
=> rdi = runtime.m0+136 = 0x00000000004ca228
asm_amd64.s:259 0x4542d2 e809240000 call $
=> This sets the Fs_base segment base register; regs shows Fs_base = 0x00000000004ca230
asm_amd64.s:263 0x4542d7 6448c70425f8ffffff23010000 mov qword ptr fs:[0xfffffff8], 0x123
=> In fs:[0xfffffff8], fs is the segment base, that is, the value of the Fs_base register, and 0xfffffff8 is the offset, which is -8 once sign-extended. So fs:[0xfffffff8] = 0x00000000004ca230 - 8 = 0x00000000004ca228
=> The memory address actually accessed through fs:[0xfffffff8] is therefore 0x00000000004ca228, the same address that was loaded into rdi
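The address calculation above can be reproduced with a short Go sketch. The function name fsAddr is hypothetical; the inputs are the Fs_base value and the displacement taken from the dlv session.

```go
package main

import "fmt"

// fsAddr computes the linear address of fs:[disp32]: the segment base plus
// the sign-extended 32-bit displacement.
// (Hypothetical helper for illustration only.)
func fsAddr(fsBase uint64, disp32 uint32) uint64 {
	return fsBase + uint64(int64(int32(disp32)))
}

func main() {
	fsBase := uint64(0x00000000004ca230) // Fs_base as shown by regs
	addr := fsAddr(fsBase, 0xfffffff8)
	fmt.Printf("fs:[0xfffffff8] -> %#x\n", addr)
}
```

The result equals the address loaded into rdi before the call, which is why the 0x123 written through fs can be read back from [runtime.m0+136].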
Continuing with the next instructions:
=> asm_amd64.s:271 0x4542f8 488d0dc15a0700 lea rcx, ptr [runtime.g0] // move address of runtime.g0 to rcx, rcx = &runtime.g0
asm_amd64.s:272 0x4542ff 6448890c25f8ffffff mov qword ptr fs:[0xfffffff8], rcx // store rcx at fs:[0xfffffff8], so the TLS slot now holds &runtime.g0
asm_amd64.s:273 0x454308 488d05915e0700 lea rax, ptr [runtime.m0] // move address of runtime.m0 to rax, rax = &runtime.m0
asm_amd64.s:276 0x45430f 488908 mov qword ptr [rax], rcx // move address of runtime.g0 to runtime.m0, actually runtime.m0.g0 = &runtime.g0
asm_amd64.s:278 0x454312 48894130 mov qword ptr [rcx+0x30], rax // store &runtime.m0 at [rcx+0x30], the field of runtime.g0 that points back to m0
The instructions above associate the main thread m0 with g0, so that m0 has a runtime execution environment. In the resulting memory layout, the TLS slot and m0 both point to g0, and g0 points back to m0.
2. Summary
This brings us to the end of the initialization part of our program, and the next article will formally move on to the scheduler part.