0. Preface
Finally we are at the point where we can run the main goroutine. Let's walk through the runtime and see how a goroutine really runs.
1. Run goroutine
To recap: the first talk, Go Program Initialization, described how a Go program enters the runtime. The second talk described the initialization of the scheduler, which must be ready before any goroutine can be scheduled. The third talk described how the main goroutine is created; only a created goroutine can start running, otherwise there is no code to execute. In this talk, we continue with how the main goroutine is run.
We know the main goroutine is in the _Grunnable state. To move it to _Grunning, we need to bind it to a P. After all, P is responsible for scheduling tasks onto threads, and only a thread bound to a P can process that P's goroutines.
1.1 Binding P
Back to the newproc code:
func newproc(fn *funcval) {
	gp := getg()
	pc := getcallerpc()
	systemstack(func() {
		newg := newproc1(fn, gp, pc) // create newg, in this case the main goroutine
		pp := getg().m.p.ptr()       // get the P bound to the current worker thread, here allp[0]
		runqput(pp, newg, true)      // put the main goroutine into allp[0]'s run queue
		if mainStarted {             // main has not started yet, so this is false
			wakep()
		}
	})
}
Step into the runqput function to see how the main goroutine is bound to allp[0]:
// runqput tries to put g on the local runnable queue.
// If next is false, runqput adds g to the tail of the runnable queue.
// If next is true, runqput puts g in the pp.runnext slot.
// If the run queue is full, runnext puts g on the global queue.
// Executed only by the owner P.
func runqput(pp *p, gp *g, next bool) {
	...
	if next {
	retryNext:
		oldnext := pp.runnext // the next goroutine P intends to run, here nil
		if !pp.runnext.cas(oldnext, guintptr(unsafe.Pointer(gp))) { // update P's runnext to gp, here the main goroutine
			goto retryNext
		}
		if oldnext == 0 { // if P's original runnext was nil, we are done; creating the main goroutine returns here
			return
		}
		// Kick the old runnext out to the regular run queue;
		// the evicted goroutine is handled below.
		gp = oldnext.ptr()
	}

retry:
	h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with consumers
	t := pp.runqtail
	if t-h < uint32(len(pp.runq)) { // runqhead and runqtail index the local run queue runq; if it is not full, add the evicted goroutine at the tail
		pp.runq[t%uint32(len(pp.runq))].set(gp)
		atomic.StoreRel(&pp.runqtail, t+1)
		return
	}
	if runqputslow(pp, gp, h, t) { // the local queue is full; move the evicted goroutine to the global queue
		return
	}
	goto retry
}
The runqput function binds the goroutine to P and manages P's local run queue at the same time; the basic flow is spelled out in the comments. For the main goroutine, binding simply means setting P's runnext member. To cover runqput in full, though, let's also look at the runqputslow function:
// Put g and a batch of work from local runnable queue on global queue.
// Executed only by the owner P.
func runqputslow(pp *p, gp *g, h, t uint32) bool {
	var batch [len(pp.runq)/2 + 1]*g // batch holds half of the local queue plus one

	// First, grab a batch from local queue.
	n := t - h
	n = n / 2
	if n != uint32(len(pp.runq)/2) {
		throw("runqputslow: queue is not full")
	}
	for i := uint32(0); i < n; i++ {
		batch[i] = pp.runq[(h+i)%uint32(len(pp.runq))].ptr() // take half of the goroutines from P's runq into batch
	}
	if !atomic.CasRel(&pp.runqhead, h, h+n) { // cas-release, commits consume; advance P's runqhead past the taken half
		return false
	}
	batch[n] = gp // put the evicted goroutine at the end of the batch

	if randomizeScheduler { // under randomizeScheduler, shuffle the goroutines in the batch
		for i := uint32(1); i <= n; i++ {
			j := fastrandn(i + 1)
			batch[i], batch[j] = batch[j], batch[i]
		}
	}

	// Link the goroutines.
	for i := uint32(0); i < n; i++ {
		batch[i].schedlink.set(batch[i+1]) // each goroutine's schedlink points to the next, forming a linked list
	}
	var q gQueue // gQueue holds head and tail pointers, set to batch[0] and batch[n]
	q.head.set(batch[0])
	q.tail.set(batch[n])

	// Now put the batch on global queue.
	lock(&sched.lock) // lock before touching the global variable sched
	globrunqputbatch(&q, int32(n+1)) // hand the batch pointed to by q over to sched
	unlock(&sched.lock)
	return true
}

func globrunqputbatch(batch *gQueue, n int32) {
	assertLockHeld(&sched.lock)

	sched.runq.pushBackAll(*batch) // append the batch to the global run queue
	sched.runqsize += n            // grow sched.runqsize by n, the number of goroutines added
	*batch = gQueue{}
}
If P's local queue is full, runqputslow moves half of the local queue, plus the incoming goroutine, to the global queue. The local queue has a fixed, limited capacity and is represented by an array; the global queue has no fixed length and is represented by a linked list.
We can draw a schematic of the two queues as follows. Note that the schematic is only an aid to understanding and is independent of how we run the main goroutine here:
1.2 Running the main goroutine
With P bound to the main goroutine, it is theoretically possible to run the main goroutine already. Go ahead and see what the code executes:
> runtime.rt0_go() /usr/local/go/src/runtime/asm_amd64.s:358 (PC: 0x45434a)
Warning: debugging optimized function
353: PUSHQ AX
354: CALL runtime·newproc(SB)
355: POPQ AX
356:
357: // start this M
=> 358: CALL runtime·mstart(SB) // calling mstart sets the current thread to work; mstart is a function that never returns
359:
360: CALL runtime·abort(SB) // mstart should never return
361: RET
362:
Stepping further:
(dlv) si
> runtime.mstart() /usr/local/go/src/runtime/asm_amd64.s:394 (PC: 0x4543c0)
Warning: debugging optimized function
TEXT runtime·mstart(SB) /usr/local/go/src/runtime/asm_amd64.s
=> asm_amd64.s:394 0x4543c0 e87b290000 call $runtime.mstart0
asm_amd64.s:395 0x4543c5 c3 ret
mstart calls runtime.mstart0:
func mstart0() {
gp := getg() // gp = g0
...
mstart1()
...
}
mstart0 calls mstart1:
func mstart1() {
	gp := getg() // gp = g0
	// Save the thread's execution site so the thread does not return when it
	// enters the schedule function, which makes the g0 stack reusable
	gp.sched.g = guintptr(unsafe.Pointer(gp))
	gp.sched.pc = getcallerpc()
	gp.sched.sp = getcallersp()
	...
	if fn := gp.m.mstartfn; fn != nil { // for the main thread, mstartfn == nil
		fn()
	}
	...
	schedule() // the thread enters the schedule loop, which never returns
}
Step into schedule:
func schedule() {
	mp := getg().m // mp = m0
	...
top:
	pp := mp.p.ptr() // pp = allp[0]
	pp.preempt = false

	// A thread is either spinning or non-spinning. Spinning means the thread has
	// no work and is looking for some; non-spinning means it is working.
	// If the thread is spinning but its P's local queue has goroutines, throw.
	if mp.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
		throw("schedule: spinning with local work")
	}

	// blocks until work is available
	gp, inheritTime, tryWakeP := findRunnable() // find a goroutine in _Grunnable state
	...
	execute(gp, inheritTime) // run the goroutine, in this case the main goroutine
}
The focus of schedule is the findRunnable function; step into it:
// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from local or global queue, poll network.
// tryWakeP indicates that the returned goroutine is not normal (GC worker, trace
// reader) so the caller should try to wake a P.
func findRunnable() (gp *g, inheritTime, tryWakeP bool) {
mp := getg().m // mp = m0
top:
pp := () // pp = allp[0] = p0
...
// Check the global runnable queue once in a while to ensure fairness.
// Otherwise two goroutines can completely occupy the local runqueue
// by constantly respawning each other.
	// The official comment already explains this logic in detail, so we won't repeat it here.
	if pp.schedtick%61 == 0 && sched.runqsize > 0 {
lock(&)
gp := globrunqget(pp, 1)
unlock(&)
if gp != nil {
return gp, false, false
}
}
// local runq
// look for a goroutine in P's local run queue
if gp, inheritTime := runqget(pp); gp != nil {
return gp, inheritTime, false
}
...
}
In findRunnable, for fairness, the global runnable queue is checked once every 61 scheduler ticks, so that goroutines in the global queue are not "starved". Then a goroutine is taken from P's local queue; here the main goroutine is fetched from it. See runqget:
func runqget(pp *p) (gp *g, inheritTime bool) {
	// If there's a runnext, it's the next G to run.
	next := pp.runnext
	// If the runnext is non-0 and the CAS fails, it could only have been stolen by another P,
	// because other Ps can race to set runnext to 0, but only the current P can set it to non-0.
	// Hence, there's no need to retry this CAS if it fails.
	if next != 0 && pp.runnext.cas(next, 0) {
		return next.ptr(), true
	}

	for {
		h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with other consumers
		t := pp.runqtail
		if t == h {
			return nil, false
		}
		gp := pp.runq[h%uint32(len(pp.runq))].ptr()
		if atomic.CasRel(&pp.runqhead, h, h+1) { // cas-release, commits consume
			return gp, false
		}
	}
}
The comments are detailed enough: first take P's runnext as the goroutine to run; if it is non-empty, reset runnext and return it; if it is empty, take a goroutine from the local queue instead.
Through findRunnable we obtain the runnable main goroutine, and call execute to run it. Step into execute:
func execute(gp *g, inheritTime bool) {
	mp := getg().m // mp = m0
	mp.curg = gp   // gp is the main goroutine
	gp.m = mp
	casgstatus(gp, _Grunnable, _Grunning) // update the goroutine's state to _Grunning
	gp.waitsince = 0
	gp.preempt = false
	gp.stackguard0 = gp.stack.lo + stackGuard
	if !inheritTime {
		mp.p.ptr().schedtick++
	}
	...
	gogo(&gp.sched)
}
In execute, the thread is associated with the goroutine, the goroutine's state is updated, and the switch from the g0 stack to the gp stack is performed by calling gogo. gogo is written in assembly out of necessity: scheduling goroutines requires switching between execution streams. As we saw earlier when discussing operating system thread switching, switching execution streams essentially means switching CPU registers and the function call stack. Neither Go nor C, as high-level languages, can control CPU registers precisely, so assembly instructions are required to achieve this.
Step into gogo; it is passed the goroutine's sched structure:
TEXT runtime·gogo(SB), NOSPLIT, $0-8
	MOVQ	buf+0(FP), BX      // BX = gobuf
	MOVQ	gobuf_g(BX), DX    // DX = gobuf.g
	MOVQ	0(DX), CX          // make sure g != nil
	JMP	gogo<>(SB)         // jump to the private function gogo<>

TEXT gogo<>(SB), NOSPLIT, $0
	get_tls(CX)                // get the goroutine from the current thread's TLS
	MOVQ	DX, g(CX)
	MOVQ	DX, R14            // set the g register
	MOVQ	gobuf_sp(BX), SP   // restore SP
	MOVQ	gobuf_ret(BX), AX  // AX = gobuf.ret
	MOVQ	gobuf_ctxt(BX), DX // DX = gobuf.ctxt
	MOVQ	gobuf_bp(BX), BP   // BP = gobuf.bp
	MOVQ	$0, gobuf_sp(BX)   // clear to help garbage collector
	MOVQ	$0, gobuf_ret(BX)
	MOVQ	$0, gobuf_ctxt(BX)
	MOVQ	$0, gobuf_bp(BX)
	MOVQ	gobuf_pc(BX), BX   // BX = gobuf.pc
	JMP	BX                 // jump to gobuf.pc
In gogo<>, the instruction MOVQ gobuf_sp(BX), SP completes the switch from the g0 stack to the gp stack, and JMP BX jumps to the saved PC. Let's see what address the instruction to be executed is at:
asm_amd64.s:421 0x45363a 488b5b08 mov rbx, qword ptr [rbx+0x8]
=> asm_amd64.s:422 0x45363e ffe3 jmp rbx
(dlv) regs
Rbx = 0x000000000042ee80
Executing JMP BX jumps to 0x000000000042ee80:
(dlv) si
> runtime.main() /usr/local/go/src/runtime/proc.go:144 (PC: 0x42ee80)
Warning: debugging optimized function
TEXT runtime.main(SB) /usr/local/go/src/runtime/proc.go
=> :144 0x42ee80 4c8d6424e8 lea r12, ptr [rsp-0x18]
Finally the mystery is unraveled: this address points to the first assembly instruction of runtime.main. That is, gogo jumps to runtime.main, and runtime.main calls the main function in our main package. Check out the runtime.main function:
// The main goroutine.
func main() {
	mp := getg().m // mp = m0
	if goarch.PtrSize == 8 {
		maxstacksize = 1000000000 // on 64-bit systems the maximum stack size is 1 GB
	} else {
		maxstacksize = 250000000
	}
	...
	// Allow newproc to start new Ms.
	mainStarted = true

	if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
		systemstack(func() {
			newm(sysmon, nil, -1) // start the monitor thread; it is important, but we'll cover it in a later talk
		})
	}
	...
	// make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
	fn := main_main // main_main is linked to the main function of the main package
	fn()            // execute it
	...
	runExitHooks(0)
	exit(0) // after main returns, call exit to terminate the process
	for {
		var x *int32
		*x = 0
	}
}
runtime.main executes on the main goroutine's stack. The fn() call inside it runs the user code we wrote:
(dlv) n
266: fn := main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
=> 267: fn()
(dlv) s
> main.main() ./:3 (PC: 0x45766a)
Warning: debugging optimized function
1: package main
2:
=> 3: func main() {
4: println("Hello World")
5: }
After it finishes, the thread calls exit(0) to exit the program.
2. Summary
And with that, our main goroutine has run. It took us four talks to follow a main goroutine all the way through, which was no small effort. Of course, the story of the Go runtime scheduler isn't over yet, so let's continue in the next talk.