Go runtime scheduler in a nutshell (II): scheduler initialization

0. Preface

The previous installment introduced the Go program initialization process. This installment moves on to the initialization of the scheduler.

Continuing from where the previous installment stopped, and omitting some irrelevant code, execution reaches line 343 of rt0_go in runtime/asm_amd64.s:

(dlv) si
        asm_amd64.s:343 0x45431c* 8b442418 mov eax, dword ptr [rsp+0x18] // [rsp+0x18] stores the value of argc, eax = argc
        asm_amd64.s:344 0x454320 890424 mov dword ptr [rsp], eax // move argc to rsp, [rsp] = argc
        asm_amd64.s:345 0x454323 488b442420 mov rax, qword ptr [rsp+0x20] // [rsp+0x20] stores the value of argv, rax = [rsp+0x20]
        asm_amd64.s:346 0x454328 4889442408 mov qword ptr [rsp+0x8], rax // move argv to [rsp+0x8], [rsp+0x8] = argv
        asm_amd64.s:347 0x45432d e88e2a0000 call $runtime.args // process argc and argv on the stack
        asm_amd64.s:348 0x454332 e8c9280000 call $runtime.osinit // initialize the number of system cores
        asm_amd64.s:349 0x454337 e8e4290000 call $runtime.schedinit // initialize the scheduler

The instructions above first call runtime.args to process the program arguments, then call runtime.osinit to initialize the number of system cores. osinit is defined in runtime/os_linux.go:

func osinit() {
	ncpu = getproccount()
	physHugePageSize = getHugePageSize()
	osArchInit()
}

osinit mainly initializes the system core count ncpu, which is the number of logical cores.
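
The value stored in ncpu is visible to ordinary programs through the public runtime API. The following is a minimal sketch, not runtime code: logicalCPUs is a hypothetical helper name, while runtime.NumCPU and runtime.GOMAXPROCS are standard library functions.

```go
package main

import (
	"fmt"
	"runtime"
)

// logicalCPUs is a hypothetical helper that returns the number of
// logical CPUs usable by this process, as reported by the runtime.
func logicalCPUs() int {
	return runtime.NumCPU()
}

func main() {
	fmt.Println("logical CPUs:", logicalCPUs())
	// GOMAXPROCS(0) only queries the current setting; it changes nothing.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```

Unless the GOMAXPROCS setting has been overridden, the two numbers are normally the same, which matches the procs := ncpu default used during scheduler initialization.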

Now for the main topic of this article: the scheduler initialization function, schedinit.

1. Scheduler initialization

The code for scheduler initialization is in the schedinit function:

// The bootstrap sequence is:
//
// call osinit
// call schedinit
// make & queue new G
// call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
	// step1: get the goroutine the current thread is executing from TLS; here gp = g0
	gp := getg()

	// step2: set the maximum number of threads
	sched.maxmcount = 10000

	// step3: initialize the thread; the thread initialized here is m0
	mcommoninit(gp.m, -1)

	// step4: call procresize to create the Ps
	procs := ncpu
	if procresize(procs) != nil {
		throw("unknown runnable goroutine during bootstrap")
	}
}

Irrelevant code has been omitted from the function.

First, step 1 calls getg() to get the goroutine that the current thread is executing. Calls to getg() appear throughout the runtime: the Go runtime maintains a pointer to g for each thread (OS thread or Go runtime thread), indicating the goroutine that thread is currently running.

Intuitively, the assembly implementation of getg() is similar to the following:

TEXT runtime·getg(SB), NOSPLIT, $0
    MOVQ TLS, AX // get g from Thread Local Storage
    MOVQ g(AX), BX // Move the value of g to the BX register.
    RET

After getting the currently executing goroutine, schedinit calls mcommoninit to initialize the thread that executes it:

func mcommoninit(mp *m, id int64) {
	// Get the goroutine of the current thread; here it is g0
	gp := getg()
	...

	// Lock the global variable sched
	lock(&sched.lock)

	// Set the id of mp
	if id >= 0 {
		mp.id = id
	} else {
		mp.id = mReserveID()
	}

	// Add to allm so garbage collector doesn't free g->m
	// when it is just in a register or thread-local storage.
	mp.alllink = allm

	// NumCgoCall() iterates over allm w/o schedlock,
	// so we need to publish it safely.
	atomicstorep(unsafe.Pointer(&allm), unsafe.Pointer(mp)) // allm = mp, here &m0
	unlock(&sched.lock)
}
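
The two effects of mcommoninit, assigning an id and pushing the thread onto the front of the allm list, can be modeled outside the runtime. This is a toy sketch under stated assumptions: toyM, toySched, nextID and commonInit are hypothetical names, the counter only stands in for mReserveID, and the locking and atomic publication of the real code are left out because the sketch is single-threaded.

```go
package main

import "fmt"

// toyM is a hypothetical stand-in for the runtime's m struct.
type toyM struct {
	id      int64
	alllink *toyM // next entry in the list of all threads
}

// toySched holds the state that the runtime keeps in globals.
type toySched struct {
	allm   *toyM // head of the list, like the runtime's allm
	nextID int64 // hypothetical counter playing the role of mReserveID
}

// commonInit mirrors the two steps of the listing: choose an id,
// then push mp onto the front of the allm list.
func (s *toySched) commonInit(mp *toyM, id int64) {
	if id >= 0 {
		mp.id = id
	} else {
		mp.id = s.nextID
		s.nextID++
	}
	mp.alllink = s.allm
	s.allm = mp
}

func main() {
	s := &toySched{}
	m0 := &toyM{}
	s.commonInit(m0, -1) // bootstrap thread, id reserved automatically
	m1 := &toyM{}
	s.commonInit(m1, -1)
	// Walk the list from the head: the most recently added thread comes first.
	for mp := s.allm; mp != nil; mp = mp.alllink {
		fmt.Println("m id:", mp.id)
	}
}
```

Because each new thread is pushed at the head, walking the list visits threads from newest to oldest, and the bootstrap thread always sits at the tail.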

The mcommoninit function sets the id of mp and links mp into the global variable allm. The memory layout is updated as shown below:

Continue to step 4, the procresize function, which is the focus of schedinit:

func procresize(nprocs int32) *p {
	// At bootstrap, old = gomaxprocs = 0
	old := gomaxprocs
	if old < 0 || nprocs <= 0 {
		throw("procresize: invalid arg")
	}

	// Grow allp when nprocs exceeds its current length; details omitted here
	if nprocs > int32(len(allp)) {
		...
	}

	// Initialize the Ps
	for i := old; i < nprocs; i++ {
		pp := allp[i]
		if pp == nil {
			pp = new(p)
		}
		// Initialize the newly created P
		pp.init(i)
		// Store the newly created P in the global variable allp
		atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp)) // allp[i] = pp
	}
	...
}

The procresize function is rather long, so it is presented here in segments.

It first creates each P, then calls init to initialize the created P:

func (pp *p) init(id int32) {
	pp.id = id
	pp.status = _Pgcstop // _Pgcstop = 3
	...
}

The id of the newly created P is the loop index i, and its state is _Pgcstop. Next, the created P is stored in the global variable allp.
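
The creation loop can be reproduced as a small standalone model. This is only a sketch: toyP, growPs and the pIdle/pRunning/pSyscall/pGCStop constants are hypothetical names chosen to mirror the runtime's P states, and a plain slice assignment replaces the atomic store.

```go
package main

import "fmt"

// Hypothetical state constants mirroring the runtime's numbering.
const (
	pIdle    = iota // 0
	pRunning        // 1
	pSyscall        // 2
	pGCStop         // 3
)

// toyP is a hypothetical stand-in for the runtime's p struct.
type toyP struct {
	id     int32
	status uint32
}

// init sets the id and puts the P into the stopped state,
// like the init method in the listing above.
func (pp *toyP) init(id int32) {
	pp.id = id
	pp.status = pGCStop
}

// growPs extends allp to nprocs entries and initializes every P
// from index old upward.
func growPs(allp []*toyP, old, nprocs int32) []*toyP {
	for int32(len(allp)) < nprocs {
		allp = append(allp, nil)
	}
	for i := old; i < nprocs; i++ {
		pp := allp[i]
		if pp == nil {
			pp = new(toyP)
		}
		pp.init(i)
		allp[i] = pp
	}
	return allp
}

func main() {
	allp := growPs(nil, 0, 3)
	for _, pp := range allp {
		fmt.Println("P", pp.id, "status", pp.status)
	}
}
```

With nprocs = 3 the model ends up with three Ps whose ids are 0, 1 and 2, all in state 3, which is the situation the runtime is in just before it binds a P to the bootstrap thread.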

Continuing with the procresize function:

func procresize(nprocs int32) *p {
	...
	// gp = g0
	gp := getg()

	// Check whether the thread running the current goroutine is already bound to a P.
	// If it is, and that P is still valid, keep the binding; otherwise take the else branch.
	if gp.m.p != 0 && gp.m.p.ptr().id < nprocs {
		// continue to use the current P
		gp.m.p.ptr().status = _Prunning
		gp.m.p.ptr().mcache.prepareForSweep()
	} else {
		...
		gp.m.p = 0         // clear the P bound to the current m
		pp := allp[0]      // take the first P from allp
		pp.m = 0           // clear the m bound to P
		pp.status = _Pidle // update P's state to _Pidle(0)
		acquirep(pp)       // bind P to m
		...
	}
	...
}

The acquirep() function binds P to the current thread m as follows:

func acquirep(pp *p) {
	wirep(pp)
	...
}

func wirep(pp *p) {
	// gp = g0
	gp := getg()

	// Throw if the current thread is already bound to a P
	if gp.m.p != 0 {
		throw("wirep: already in go")
	}

	// Throw if P is already bound to an m or the state of P is not _Pidle
	if pp.m != 0 || pp.status != _Pidle {
		id := int64(0)
		if pp.m != 0 {
			id = pp.m.ptr().id
		}
		print("wirep: p->m=", pp.m, "(", id, ") p->status=", pp.status, "\n")
		throw("wirep: invalid p state")
	}

	gp.m.p.set(pp)        // bind the P of the current thread m to pp; here m0.p = allp[0]
	pp.m.set(gp.m)        // bind the m of P to the current thread; here allp[0].m = m0
	pp.status = _Prunning // once P is bound to m, P can schedule g to run on the thread, so set P's state to _Prunning(1)
}
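
The checks and the two-way binding performed by wirep can be illustrated with a toy version. This sketch makes several assumptions: toyM, toyP and wire are hypothetical names, plain pointers replace the runtime's puintptr/muintptr fields, and errors are returned where the runtime would throw.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical state constants mirroring _Pidle and _Prunning.
const (
	pIdle    = 0
	pRunning = 1
)

// toyM and toyP are hypothetical stand-ins for the runtime's m and p.
type toyM struct{ p *toyP }

type toyP struct {
	m      *toyM
	status uint32
}

// wire binds mp and pp to each other after the same sanity checks
// as the listing above.
func wire(mp *toyM, pp *toyP) error {
	if mp.p != nil {
		return errors.New("wire: m is already bound to a P")
	}
	if pp.m != nil || pp.status != pIdle {
		return errors.New("wire: invalid p state")
	}
	mp.p = pp
	pp.m = mp
	pp.status = pRunning
	return nil
}

func main() {
	m0 := &toyM{}
	p0 := &toyP{status: pIdle}
	fmt.Println("first bind:", wire(m0, p0), "status:", p0.status)
	// Binding again fails because m0 already owns a P.
	fmt.Println("second bind:", wire(m0, p0))
}
```

The binding is deliberately symmetric: after a successful call the m points at the P and the P points back at the m, so either side can find the other without a lookup.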

Based on the analysis above, the memory layout is updated as shown below:

(Here nprocs = 3, so len(allp) = 3 in the figure.)

procresize does not end here. The next segment:

func procresize(nprocs int32) *p {
	...
	// runnablePs collects the runnable Ps.
	var runnablePs *p
	for i := nprocs - 1; i >= 0; i-- {
		pp := allp[i]
		// Skip the P bound to the current thread
		if gp.m.p.ptr() == pp {
			continue
		}
		// Set the state of P to _Pidle(0) to mark it as idle
		pp.status = _Pidle

		// runqempty reports whether P's local run queue is empty,
		// that is, whether P has no goroutine waiting to run
		if runqempty(pp) {
			pidleput(pp, now) // empty: put P on sched's idle list so that threads can find an idle P through sched
		} else {
			pp.m.set(mget())        // not empty: call mget() to get an idle thread m and bind it to P
			pp.link.set(runnablePs) // link P to the previous head of runnablePs
			runnablePs = pp         // make P the new head; the caller gets the runnable Ps via runnablePs
		}
	}

	...
	return runnablePs
}

The final segment handles the Ps in allp that are not bound to the current thread. It first sets the state of P to _Pidle(0), then calls runqempty to determine whether that P's local run queue is empty:

// runqempty reports whether pp has no Gs on its local run queue.
// It never returns true spuriously.
func runqempty(pp *p) bool {
	// Defend against a race where 1) pp has G1 in runqnext but runqhead == runqtail,
	// 2) runqput on pp kicks G1 to the runq, 3) runqget on pp empties runqnext.
	// Simply observing that runqhead == runqtail and then observing that runqnext == nil
	// does not mean the queue is empty.
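	for {
		head := atomic.Load(&pp.runqhead)
		tail := atomic.Load(&pp.runqtail)
		runnext := atomic.Loaduintptr((*uintptr)(unsafe.Pointer(&pp.runnext)))
		if tail == atomic.Load(&pp.runqtail) {
			return head == tail && runnext == 0
		}
	}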
}

The runq field of P stores the local run queue. P's runqhead points to the head of the runq queue (actually an array), and runqtail points to its tail.
P's runnext points to the next goroutine to be executed, which has the highest priority. See the comment in runqempty for why the emptiness check is written the way it is.
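
To make the roles of the head index, the tail index and runnext concrete, here is a single-threaded toy model of a local run queue. It is a sketch under stated assumptions: toyG and toyRunq are hypothetical names, the buffer has 4 slots instead of the runtime's 256, and there are no atomic operations, so the race that runqempty guards against cannot occur here.

```go
package main

import "fmt"

// toyG is a hypothetical stand-in for a goroutine.
type toyG struct{ id int }

// toyRunq models a P's local run queue: a fixed-size ring buffer
// plus a single runnext slot that is served before the ring.
type toyRunq struct {
	head, tail uint32   // ever-increasing counters; slot = counter % len(buf)
	buf        [4]*toyG // 4 slots keep the demo small
	runnext    *toyG
}

// put appends g at the tail and reports false when the ring is full.
func (q *toyRunq) put(g *toyG) bool {
	if q.tail-q.head == uint32(len(q.buf)) {
		return false
	}
	q.buf[q.tail%uint32(len(q.buf))] = g
	q.tail++
	return true
}

// get returns runnext if it is set, otherwise the G at the head.
func (q *toyRunq) get() *toyG {
	if g := q.runnext; g != nil {
		q.runnext = nil
		return g
	}
	if q.head == q.tail {
		return nil
	}
	g := q.buf[q.head%uint32(len(q.buf))]
	q.head++
	return g
}

// empty needs both conditions: an empty ring and no runnext.
func (q *toyRunq) empty() bool {
	return q.head == q.tail && q.runnext == nil
}

func main() {
	q := &toyRunq{}
	q.runnext = &toyG{id: 1}
	q.put(&toyG{id: 2})
	fmt.Println("empty:", q.empty())
	fmt.Println("first:", q.get().id)
	fmt.Println("second:", q.get().id)
	fmt.Println("empty:", q.empty())
}
```

The model shows why head == tail alone is not enough: a queue whose ring is empty but whose runnext slot is occupied still has work to run.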

If P has no goroutine waiting to run, pidleput is called to add P to the idle list of the global variable sched:

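func pidleput(pp *p, now int64) int64 {
	...
	pp.link = sched.pidle // pp.link = previous head of the idle list
	sched.pidle.set(pp)   // sched.pidle = pp
	sched.npidle.Add(1)   // increment the number of idle Ps
	...
	return now
}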
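
The last segment of procresize and pidleput can be combined into one small model that shows how the idle list ends up ordered. This is a sketch, not the runtime's code: toyP, toySched and partition are hypothetical names, runqLen is an invented field standing in for the runqempty check, and the mget call that binds an idle thread to a runnable P is omitted.

```go
package main

import "fmt"

// Hypothetical state constants mirroring the runtime's numbering.
const (
	pIdle    = 0
	pRunning = 1
	pGCStop  = 3
)

// toyP is a hypothetical stand-in for the runtime's p struct.
type toyP struct {
	id      int32
	status  uint32
	runqLen int   // invented field: number of queued goroutines
	link    *toyP // next P in whichever list this P is on
}

// toySched holds the idle list that the runtime keeps in sched.
type toySched struct {
	pidle  *toyP
	npidle int32
}

// pidleput pushes pp onto the front of the idle list.
func (s *toySched) pidleput(pp *toyP) {
	pp.link = s.pidle
	s.pidle = pp
	s.npidle++
}

// partition walks allp from the back, skips the current P, marks the
// others idle, and sorts them into the idle list or the runnable list.
func partition(s *toySched, allp []*toyP, current *toyP) *toyP {
	var runnable *toyP
	for i := len(allp) - 1; i >= 0; i-- {
		pp := allp[i]
		if pp == current {
			continue
		}
		pp.status = pIdle
		if pp.runqLen == 0 {
			s.pidleput(pp)
		} else {
			pp.link = runnable
			runnable = pp
		}
	}
	return runnable
}

func main() {
	allp := []*toyP{
		{id: 0, status: pRunning},
		{id: 1, status: pGCStop},
		{id: 2, status: pGCStop},
	}
	s := &toySched{}
	runnable := partition(s, allp, allp[0])
	fmt.Println("runnable list empty:", runnable == nil)
	for pp := s.pidle; pp != nil; pp = pp.link {
		fmt.Println("idle P", pp.id)
	}
	fmt.Println("npidle:", s.npidle)
}
```

Because the loop runs from the highest index down and each P is pushed at the head, the idle list ends up in ascending id order, here P1 followed by P2.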

Here nprocs = 3. After initialization only allp[0] is in the _Prunning state, and the remaining two Ps are in the _Pidle state. The memory layout is updated as shown below:

2. Summary

At this point we have basically covered the scheduler initialization logic. The next installment will continue with the creation of the main goroutine.