0. Preface
Finally we are at the point where we can run the main goroutine. Let's walk through the runtime and see how a goroutine really runs.
1. Run goroutine
To recap: the first talk, Go Program Initialization, described how a Go program enters the runtime. The second talk described the initialization of the scheduler, which must be ready before any goroutine can be scheduled. The third talk described how the main goroutine is created; only a created goroutine can start running, otherwise there is no code to execute. In this talk, we continue with how the main goroutine is run.
We know the main goroutine is in the _Grunnable state. To move it to _Grunning, we need to bind it to a P. After all, P is responsible for scheduling tasks onto threads, and only a thread bound to a P can process that P's goroutines.
1.1 Binding P
Back to the newproc code:
func newproc(fn *funcval) {
	gp := getg()
	pc := getcallerpc()
	systemstack(func() {
		newg := newproc1(fn, gp, pc) // create newg, in this case the main goroutine
		pp := getg().m.p.ptr()       // get the P bound to the current worker thread, here allp[0]
		runqput(pp, newg, true)      // put the main goroutine into allp[0]'s run queue
		if mainStarted {             // main has not started yet, so this is false
			wakep()
		}
	})
}
Step into the runqput function to see how the main goroutine is bound to allp[0]:
// runqput tries to put g on the local runnable queue.
// If next is false, runqput adds g to the tail of the runnable queue.
// If next is true, runqput puts g in the pp.runnext slot.
// If the run queue is full, runnext puts g on the global queue.
// Executed only by the owner P.
func runqput(pp *p, gp *g, next bool) {
	...
	if next {
	retryNext:
		oldnext := pp.runnext // the next goroutine P intends to run, here nil
		if !pp.runnext.cas(oldnext, guintptr(unsafe.Pointer(gp))) { // update P's runnext to gp, here the main goroutine
			goto retryNext
		}
		if oldnext == 0 { // if P's original runnext was nil, we are done; creating the main goroutine returns here
			return
		}
		// Kick the old runnext out to the regular run queue;
		// the evicted goroutine is handled below.
		gp = oldnext.ptr()
	}

retry:
	h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with consumers
	t := pp.runqtail
	if t-h < uint32(len(pp.runq)) { // runqhead and runqtail index the local run queue runq; if it is not full, add the evicted goroutine at the tail
		pp.runq[t%uint32(len(pp.runq))].set(gp)
		atomic.StoreRel(&pp.runqtail, t+1)
		return
	}
	if runqputslow(pp, gp, h, t) { // the local queue is full; move the evicted goroutine to the global queue
		return
	}
	goto retry
}
The runqput function binds the goroutine to P and manages P's local run queue at the same time; the basic flow is spelled out in the comments. For the main goroutine, binding simply means setting P's runnext member. To cover runqput in full, though, let's also look at the runqputslow function:
// Put g and a batch of work from local runnable queue on global queue.
// Executed only by the owner P.
func runqputslow(pp *p, gp *g, h, t uint32) bool {
	var batch [len(pp.runq)/2 + 1]*g // batch holds half of the local queue plus one

	// First, grab a batch from local queue.
	n := t - h
	n = n / 2
	if n != uint32(len(pp.runq)/2) {
		throw("runqputslow: queue is not full")
	}
	for i := uint32(0); i < n; i++ {
		batch[i] = pp.runq[(h+i)%uint32(len(pp.runq))].ptr() // take half of the goroutines from P's runq into batch
	}
	if !atomic.CasRel(&pp.runqhead, h, h+n) { // cas-release, commits consume; advance P's runqhead past the taken half
		return false
	}
	batch[n] = gp // put the evicted goroutine at the end of the batch

	if randomizeScheduler { // under randomizeScheduler, shuffle the goroutines in the batch
		for i := uint32(1); i <= n; i++ {
			j := fastrandn(i + 1)
			batch[i], batch[j] = batch[j], batch[i]
		}
	}

	// Link the goroutines.
	for i := uint32(0); i < n; i++ {
		batch[i].schedlink.set(batch[i+1]) // each goroutine's schedlink points to the next, forming a linked list
	}
	var q gQueue // gQueue holds head and tail pointers, set to batch[0] and batch[n]
	q.head.set(batch[0])
	q.tail.set(batch[n])

	// Now put the batch on global queue.
	lock(&sched.lock) // lock before touching the global variable sched
	globrunqputbatch(&q, int32(n+1)) // hand the batch pointed to by q over to sched
	unlock(&sched.lock)
	return true
}

func globrunqputbatch(batch *gQueue, n int32) {
	assertLockHeld(&sched.lock)

	sched.runq.pushBackAll(*batch) // append the batch to the global run queue
	sched.runqsize += n            // grow sched.runqsize by n, the number of goroutines added
	*batch = gQueue{}
}
If P's local queue is full, runqputslow moves half of the local queue, plus the incoming goroutine, to the global queue. The local queue has a fixed, limited capacity and is represented by an array; the global queue has no fixed length and is represented by a linked list.
We can draw a schematic of the two queues as follows. Note that the schematic is only an aid to understanding and is independent of how we run the main goroutine here:
1.2 Running the main goroutine
With P bound to the main goroutine, it is theoretically possible to run the main goroutine already. Go ahead and see what the code executes:
> runtime.rt0_go() /usr/local/go/src/runtime/asm_amd64.s:358 (PC: 0x45434a)
Warning: debugging optimized function
353: PUSHQ AX
354: CALL runtime·newproc(SB)
355: POPQ AX
356:
357: // start this M
=> 358: CALL runtime·mstart(SB) // calling mstart sets the current thread to work; mstart is a function that never returns
359:
360: CALL runtime·abort(SB) // mstart should never return
361: RET
362:
Stepping further:
(dlv) si
> runtime.mstart() /usr/local/go/src/runtime/asm_amd64.s:394 (PC: 0x4543c0)
Warning: debugging optimized function
TEXT runtime·mstart(SB) /usr/local/go/src/runtime/asm_amd64.s
=> asm_amd64.s:394 0x4543c0 e87b290000 call $runtime.mstart0
asm_amd64.s:395 0x4543c5 c3 ret
mstart calls runtime.mstart0:
func mstart0() {
gp := getg() // gp = g0
...
mstart1()
...
}
mstart0 calls mstart1:
func mstart1() {
	gp := getg() // gp = g0
	// Save the thread's execution site so the thread does not return when it
	// enters the schedule function, which makes the g0 stack reusable
	gp.sched.g = guintptr(unsafe.Pointer(gp))
	gp.sched.pc = getcallerpc()
	gp.sched.sp = getcallersp()
	...
	if fn := gp.m.mstartfn; fn != nil { // for the main thread, mstartfn == nil
		fn()
	}
	...
	schedule() // the thread enters the schedule loop, which never returns
}
Step into schedule:
func schedule() {
	mp := getg().m // mp = m0
	...
top:
	pp := mp.p.ptr() // pp = allp[0]
	pp.preempt = false

	// A thread is either spinning or non-spinning. Spinning means the thread has
	// no work and is looking for some; non-spinning means it is working.
	// If the thread is spinning but its P's local queue has goroutines, throw.
	if mp.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
		throw("schedule: spinning with local work")
	}

	// blocks until work is available
	gp, inheritTime, tryWakeP := findRunnable() // find a goroutine in _Grunnable state
	...
	execute(gp, inheritTime) // run the goroutine, in this case the main goroutine
}
The focus of schedule is the findRunnable function; step into it:
// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from local or global queue, poll network.
// tryWakeP indicates that the returned goroutine is not normal (GC worker, trace
// reader) so the caller should try to wake a P.
func findRunnable() (gp *g, inheritTime, tryWakeP bool) {
mp := getg().m // mp = m0
top:
pp := () // pp = allp[0] = p0
...
// Check the global runnable queue once in a while to ensure fairness.
// Otherwise two goroutines can completely occupy the local runqueue
// by constantly respawning each other.
	// The official comment already explains this logic in detail, so we won't repeat it here.
	if pp.schedtick%61 == 0 && sched.runqsize > 0 {
lock(&)
gp := globrunqget(pp, 1)
unlock(&)
if gp != nil {
return gp, false, false
}
}
// local runq
// look for a goroutine in P's local run queue
if gp, inheritTime := runqget(pp); gp != nil {
return gp, inheritTime, false
}
...
}
In findRunnable, for fairness, the global runnable queue is checked once every 61 scheduler ticks, so that goroutines in the global queue are not "starved". Then a goroutine is taken from P's local queue; here the main goroutine is fetched from it. See runqget:
func runqget(pp *p) (gp *g, inheritTime bool) {
	// If there's a runnext, it's the next G to run.
	next := pp.runnext
	// If the runnext is non-0 and the CAS fails, it could only have been stolen by another P,
	// because other Ps can race to set runnext to 0, but only the current P can set it to non-0.
	// Hence, there's no need to retry this CAS if it fails.
	if next != 0 && pp.runnext.cas(next, 0) {
		return next.ptr(), true
	}

	for {
		h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with other consumers
		t := pp.runqtail
		if t == h {
			return nil, false
		}
		gp := pp.runq[h%uint32(len(pp.runq))].ptr()
		if atomic.CasRel(&pp.runqhead, h, h+1) { // cas-release, commits consume
			return gp, false
		}
	}
}
The comments are detailed enough: first take P's runnext as the goroutine to run; if it is non-empty, reset runnext and return it; if it is empty, take a goroutine from the local queue instead.
Through findRunnable we obtain the runnable main goroutine, and call execute to run it. Step into execute:
func execute(gp *g, inheritTime bool) {
	mp := getg().m // mp = m0
	mp.curg = gp   // gp is the main goroutine
	gp.m = mp
	casgstatus(gp, _Grunnable, _Grunning) // update the goroutine's state to _Grunning
	gp.waitsince = 0
	gp.preempt = false
	gp.stackguard0 = gp.stack.lo + stackGuard
	if !inheritTime {
		mp.p.ptr().schedtick++
	}
	...
	gogo(&gp.sched)
}
In execute, the thread is associated with the goroutine, the goroutine's state is updated, and the switch from the g0 stack to the gp stack is performed by calling gogo. gogo is written in assembly out of necessity: scheduling goroutines requires switching between execution streams. As we saw earlier when discussing operating system thread switching, switching execution streams essentially means switching CPU registers and the function call stack. Neither Go nor C, as high-level languages, can control CPU registers precisely, so assembly instructions are required to achieve this.
Step into gogo; it is passed the goroutine's sched structure:
TEXT runtime·gogo(SB), NOSPLIT, $0-8
	MOVQ	buf+0(FP), BX      // BX = gobuf
	MOVQ	gobuf_g(BX), DX    // DX = gobuf.g
	MOVQ	0(DX), CX          // make sure g != nil
	JMP	gogo<>(SB)         // jump to the private function gogo<>

TEXT gogo<>(SB), NOSPLIT, $0
	get_tls(CX)                // get the goroutine from the current thread's TLS
	MOVQ	DX, g(CX)
	MOVQ	DX, R14            // set the g register
	MOVQ	gobuf_sp(BX), SP   // restore SP
	MOVQ	gobuf_ret(BX), AX  // AX = gobuf.ret
	MOVQ	gobuf_ctxt(BX), DX // DX = gobuf.ctxt
	MOVQ	gobuf_bp(BX), BP   // BP = gobuf.bp
	MOVQ	$0, gobuf_sp(BX)   // clear to help garbage collector
	MOVQ	$0, gobuf_ret(BX)
	MOVQ	$0, gobuf_ctxt(BX)
	MOVQ	$0, gobuf_bp(BX)
	MOVQ	gobuf_pc(BX), BX   // BX = gobuf.pc
	JMP	BX                 // jump to gobuf.pc
In gogo<>, the instruction MOVQ gobuf_sp(BX), SP completes the switch from the g0 stack to the gp stack, and JMP BX jumps to the saved PC. Let's see what address the instruction to be executed is at:
asm_amd64.s:421 0x45363a 488b5b08 mov rbx, qword ptr [rbx+0x8]
=> asm_amd64.s:422 0x45363e ffe3 jmp rbx
(dlv) regs
Rbx = 0x000000000042ee80
Executing JMP BX jumps to 0x000000000042ee80:
(dlv) si
> runtime.main() /usr/local/go/src/runtime/proc.go:144 (PC: 0x42ee80)
Warning: debugging optimized function
TEXT runtime.main(SB) /usr/local/go/src/runtime/proc.go
=> :144 0x42ee80 4c8d6424e8 lea r12, ptr [rsp-0x18]
Finally the mystery is unraveled: this address points to the first assembly instruction of runtime.main. That is, gogo jumps to runtime.main, and runtime.main calls the main function in our main package. Check out the runtime.main function:
// The main goroutine.
func main() {
	mp := getg().m // mp = m0
	if goarch.PtrSize == 8 {
		maxstacksize = 1000000000 // on 64-bit systems the maximum stack size is 1 GB
	} else {
		maxstacksize = 250000000
	}
	...
	// Allow newproc to start new Ms.
	mainStarted = true

	if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
		systemstack(func() {
			newm(sysmon, nil, -1) // start the monitor thread; it is important, but we'll cover it in a later talk
		})
	}
	...
	// make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
	fn := main_main // main_main is linked to the main function of the main package
	fn()            // execute it
	...
	runExitHooks(0)
	exit(0) // after main returns, call exit to terminate the process
	for {
		var x *int32
		*x = 0
	}
}
runtime.main executes on the main goroutine's stack. The fn() call inside it runs the user code we wrote:
(dlv) n
266: fn := main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
=> 267: fn()
(dlv) s
> main.main() ./:3 (PC: 0x45766a)
Warning: debugging optimized function
1: package main
2:
=> 3: func main() {
4: println("Hello World")
5: }
After it finishes, the thread calls exit(0) to exit the program.
2. Summary
And with that, our main goroutine has run. It took us four talks to follow a main goroutine all the way through, which was no small effort. Of course, the story of the Go runtime scheduler isn't over yet, so let's continue in the next talk.