Golang Goroutine Principles and the GMP Model Explained

This article was originally published at: Golang Goroutine Principles and the GMP Model Explained

What is a goroutine?

A Goroutine is a lightweight thread in the Go language, also known as a coroutine, that is managed by the Go runtime. It is one of the core concepts of concurrent programming in Go. Goroutines are designed to make concurrent programming in Go simple and efficient.

Here are some key features of Goroutines:

Lightweight: Goroutines are created and switched with very little overhead. Goroutines take up less memory and resources than OS-level threads. A typical Goroutine requires only a few kilobytes of stack space, and the stack space can grow dynamically as needed.
Concurrent execution: Many Goroutines can run concurrently, and the Go runtime automatically schedules them onto available processors to take full advantage of multi-core CPUs.
Simple syntax: Starting a Goroutine is as simple as prefixing the function call with the go keyword. For example, go myFunction() will start a new Goroutine to execute the myFunction function.
Communication and synchronization: The Go language provides channels for communication and synchronization between Goroutines. A channel is a type-safe way to pass data from one Goroutine to another.
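The features above can be seen together in a short sketch. The function name sumSquares is only illustrative: it starts one Goroutine per number with the go keyword and collects the results over a channel.

```go
package main

import "fmt"

// sumSquares starts one goroutine per number in 1..n. Each goroutine sends
// the square of its number on a shared channel, and the caller adds them up.
func sumSquares(n int) int {
	results := make(chan int) // unbuffered: each send waits for a receive

	for i := 1; i <= n; i++ {
		go func(v int) {
			results <- v * v
		}(i) // pass i as an argument so each goroutine gets its own copy
	}

	total := 0
	for i := 0; i < n; i++ {
		total += <-results
	}
	return total
}

func main() {
	// A thousand goroutines is a small number by Go standards.
	fmt.Println(sumSquares(1000))
}
```

The receive loop doubles as synchronization: sumSquares cannot return until every Goroutine has delivered its value.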

What is a coroutine?

A coroutine is a lighter-weight approach to concurrent programming than a thread. Coroutines allow multiple tasks to be executed within a single thread, and switching between tasks does not incur the overhead of a thread context switch. Coroutines achieve concurrency through cooperative multitasking, which means that switching between tasks is explicitly controlled by the program rather than scheduled by the operating system.

Here are some of the key features of coroutines:

Lightweight: Coroutines have very low creation and switching overhead because they do not require OS-level thread management.
Non-preemptive: Switching between coroutines is explicit and is specified by the programmer in the code, rather than being preemptively scheduled by the operating system.
State preservation: A coroutine saves its state when it is suspended and continues from where it left off when it is resumed.
Asynchronous programming: Coroutines are well suited to asynchronous programming, especially for I/O-intensive tasks, where the program can switch to other tasks while waiting for an I/O operation to complete, increasing concurrency and efficiency.

The Goroutine is Go's implementation of the coroutine concept.

Here is a simple Goroutine example:

package main

import (
    "fmt"
    "sync"
    "time"
)

// printNumbers simulates a time-consuming operation.
func printNumbers(wg *sync.WaitGroup) {
    defer wg.Done() // Call the Done method at the end of the function.
    for i := 1; i <= 5; i++ {
        fmt.Printf("Number: %d\n", i)
        time.Sleep(1 * time.Second) // simulate a time-consuming operation
    }
}

func main() {
    var wg sync.WaitGroup

    // Start a goroutine to execute the printNumbers function.
    wg.Add(1)
    go printNumbers(&wg)

    // The main goroutine continues with the rest of the execution.
    for i := 'A'; i <= 'E'; i++ {
        fmt.Printf("Letter: %c\n", i)
        time.Sleep(1 * time.Second) // simulate a time-consuming operation
    }

    // Wait for all goroutines to complete
    wg.Wait()
}

We define a function called printNumbers that prints the numbers 1 through 5 and pauses for 1 second after each print. Then, in the main function, we use the go keyword to start a new goroutine to execute the printNumbers function. Meanwhile, the main goroutine continues to perform other operations, printing the letters A through E and pausing for 1 second after each print.

Note that the main goroutine and the newly started goroutine run concurrently. To ensure that all goroutines complete, we use a sync.WaitGroup. We call wg.Add(1) before starting the goroutine and wg.Done() at the end of the printNumbers function. Finally, we call wg.Wait() in the main function, which blocks until all goroutines have completed. This ensures that the program does not exit while a goroutine is still running.

Coroutines are a powerful tool that simplifies concurrent programming, especially when dealing with I/O-intensive tasks.

Goroutine Implementation Principles

The implementation of Goroutines covers several aspects: creation, scheduling, context switching, and stack management. Through the GMP model and an efficient scheduling mechanism, the Go runtime can manage and schedule a large number of Goroutines to achieve highly concurrent programs.

Goroutine creation

When starting a new Goroutine with the go keyword, the Go runtime performs the following steps:

Allocate a G structure: the Go runtime allocates a G structure (G for Goroutine) for the new Goroutine, which holds the Goroutine's state information, stack pointer, program counter, etc.
Allocate stack space: the Go runtime allocates a small initial stack for the new Goroutine, usually a few kilobytes, which can grow automatically as needed.
Initialize the G structure: the Go runtime fills the Goroutine's entry function, parameters, stack pointer, and other information into the G structure.
Add the Goroutine to a run queue: the Go runtime adds the new Goroutine to the local run queue of a P (Processor), where it waits to be scheduled.
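The G structure itself is internal to the runtime, but the effect of the steps above can be observed from user code with runtime.NumGoroutine. The following is a minimal sketch; the helper name startParked is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// startParked starts n goroutines that block on a channel. It returns how
// many goroutines runtime.NumGoroutine reports as added, plus a function that
// releases them and waits for them to finish.
func startParked(n int) (added int, release func()) {
	var wg sync.WaitGroup
	gate := make(chan struct{})
	before := runtime.NumGoroutine()

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-gate // block until the gate is closed
		}()
	}

	// Every go statement above has already created a goroutine, whether or
	// not the scheduler has run it yet.
	added = runtime.NumGoroutine() - before
	release = func() {
		close(gate)
		wg.Wait()
	}
	return added, release
}

func main() {
	added, release := startParked(1000)
	fmt.Println("goroutines added:", added)
	release()
}
```

A Goroutine is counted as soon as the go statement has executed, which matches the description above: it exists and is queued before it has run a single instruction.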

Scheduling of the Goroutine

The Go runtime uses the GMP model (Goroutine, Machine, Processor) to manage and schedule Goroutines. The components and the scheduling process are as follows:

P (Processor): P is a Go runtime abstraction that represents a logical processor. Each P holds a local run queue that stores runnable Goroutines. The number of Ps is usually equal to the number of logical CPU cores in the machine and can be set with the runtime.GOMAXPROCS function.
M (Machine): M represents an operating system thread and is responsible for actually executing the Goroutines queued on a P. An M must hold a P to run Go code; at any moment an M holds at most one P, and a P is attached to at most one M. Over time, however, a P can be handed off to a different M, for example when its current M blocks in a system call. The number of Ms is not fixed; the Go runtime adjusts it dynamically according to the program's behavior and system resource usage. The runtime.NumGoroutine() and runtime.NumCPU() functions report the current number of Goroutines and logical CPU cores. The Go runtime also has a default maximum on the number of Ms (10,000 threads) to avoid creating too many and exhausting system resources. This limit can be adjusted with debug.SetMaxThreads from the runtime/debug package, but it usually does not need to be set manually. (The GOMAXPROCS environment variable controls the number of Ps, not the number of Ms.)
G (Goroutine): G represents a Goroutine, which has its own stack, instruction pointer, and other scheduling information (such as the channel it is waiting on).
Scheduling loop: each P repeatedly takes a Goroutine from its local run queue and hands it to the bound M for execution. If P's local run queue is empty, P tries to steal Goroutines from the local run queue of another P (the work-stealing mechanism).
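The scheduling loop and work stealing can be illustrated with a toy, single-threaded model. Everything here (the g and p types, next, simulate) is hypothetical and greatly simplified; it is not the runtime's code, and the choice to steal half of the victim's queue is a modeling assumption.

```go
package main

import "fmt"

// g and p are hypothetical stand-ins for the runtime's G and P.
type g struct{ id int }

type p struct {
	id   int
	runq []g // local run queue
}

// next returns the next goroutine for self to run. It takes from self's own
// queue first and, if that is empty, steals half of another P's queue.
func next(self *p, all []*p) (g, bool) {
	if len(self.runq) == 0 {
		for _, victim := range all {
			if victim == self || len(victim.runq) == 0 {
				continue
			}
			half := (len(victim.runq) + 1) / 2
			self.runq = append(self.runq, victim.runq[:half]...)
			victim.runq = victim.runq[half:]
			break
		}
	}
	if len(self.runq) == 0 {
		return g{}, false // nothing to run and nothing to steal
	}
	head := self.runq[0]
	self.runq = self.runq[1:]
	return head, true
}

// simulate runs rounds in which every P executes at most one goroutine, and
// returns how many goroutines each P ended up running.
func simulate(numP, numG int) []int {
	ps := make([]*p, numP)
	for i := range ps {
		ps[i] = &p{id: i}
	}
	// Put every goroutine on P0's queue so the other Ps have to steal.
	for i := 0; i < numG; i++ {
		ps[0].runq = append(ps[0].runq, g{id: i})
	}

	ran := make([]int, numP)
	for progress := true; progress; {
		progress = false
		for _, cur := range ps {
			if _, ok := next(cur, ps); ok {
				ran[cur.id]++
				progress = true
			}
		}
	}
	return ran
}

func main() {
	fmt.Println(simulate(4, 16)) // goroutines run per P
}
```

Even though all work starts on one P, stealing spreads it across the others, which is the load-balancing effect the scheduling loop relies on.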

As you can see from the figure above, there are two OS threads (M), each holding a processor (P), and each P has one running Goroutine. The number of Ps can be set with runtime.GOMAXPROCS(); it represents the true degree of parallelism, i.e. how many Goroutines can run at the same time. The grayed-out Goroutines in the figure are not running; they are in the ready state, waiting to be scheduled. Each P maintains this queue (called the run queue). Starting a Goroutine in Go is easy: a go statement does it. Every time a go statement is executed, a Goroutine is added to the end of the run queue, and at the next scheduling point a Goroutine is taken out of the run queue (how does the scheduler decide which one to take?) and executed.

Can the number of Ps be greater than the number of CPU cores?

In Go, the number of Ps is usually equal to the number of CPU cores of the machine, but it can be adjusted with the runtime.GOMAXPROCS function. By default, the Go runtime sets the number of Ps to the number of logical CPU cores of the machine. However, the number of Ps can be set higher or lower than the number of CPU cores, depending on the application's requirements and performance considerations.

To adjust the number of Ps, call runtime.GOMAXPROCS. Example:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func worker(id int, wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Printf("Worker %d starting\n", id)
    // Simulate the workload
    for i := 0; i < 1000000000; i++ {
    }
    fmt.Printf("Worker %d done\n", id)
}

func main() {
    // Set the number of P's to twice the number of logical CPU cores of the machine
    numCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(numCPU * 2)

    var wg sync.WaitGroup

    // Start multiple Goroutines
    for i := 1; i <= 10; i++ {
        wg.Add(1)
        go worker(i, &wg)
    }

    // Wait for all Goroutines to complete
    wg.Wait()
    fmt.Println("All workers done")
}

In this example, we set the number of P's to twice the number of logical CPU cores of the machine. The purpose of this is to observe the performance of the program under different settings of the number of P's.

The effect of having more Ps than CPU cores

- More context switches: when the number of Ps is larger than the number of CPU cores, the OS has to schedule more threads (M) onto a limited number of cores, which may increase scheduling overhead.
- Resource contention: more Ps means more Goroutines can run at the same time, which may also lead to more contention for shared resources and, in turn, reduce the overall performance of the program.
- Possible concurrency gains: in some cases, increasing the number of Ps may improve the concurrency of a program that performs a large number of blocking operations. The gain is often small, though, because a Goroutine that blocks on I/O gives up its P while it waits.

The effect of having fewer Ps than CPU cores

- Lower CPU utilization: when the number of Ps is smaller than the number of CPU cores, CPU resources may be underutilized. The number of Ps limits how many Goroutines run at the same time, so some cores may sit idle.
- Fewer context switches: a smaller number of Ps reduces context-switching overhead because the operating system has fewer threads (M) to schedule. This may improve the performance of some CPU-intensive tasks.

Choosing the right number of Ps

Choosing the right number of Ps needs to be tailored to the specific application scenario and performance requirements. The following are some suggestions:

CPU-intensive tasks: For CPU-intensive tasks, the number of Ps is usually set equal to or close to the number of logical CPU cores of the machine to fully utilize the CPU.
I/O-intensive tasks: For I/O-intensive tasks, you can experiment with setting the number of Ps higher than the number of CPU cores, but measure the result before adopting it, since Goroutines waiting on I/O do not occupy a P.
Performance testing and tuning: Find the best setting through performance testing. Try different numbers of Ps, observe the program's performance, and choose the configuration that works best.

Goroutine context switching

Goroutine context switching is managed by the Go runtime scheduler and involves the following steps:

Save the current Goroutine's state: when a Goroutine is suspended, the Go runtime saves its state information, including the program counter, stack pointer, and registers.
Switch to a new Goroutine: the Go runtime takes the next runnable Goroutine from P's local run queue.
Restore the state of the new Goroutine: the Go runtime loads the new Goroutine's saved state into the CPU registers and jumps to its program counter to continue execution.

When is a Goroutine suspended?

A Goroutine is suspended when it performs a blocking operation, waits on a synchronization primitive, or is descheduled by the scheduler; creation and destruction are scheduling points as well. The Go runtime manages the suspension and resumption of Goroutines through an efficient scheduling mechanism that enables highly concurrent, high-performance programs. Understanding these cases can help you write efficient concurrent programs and avoid potential performance problems.

Blocking operations

When a Goroutine performs a blocking operation, it is suspended until the operation completes. Common blocking operations include:

I/O operations: e.g., file reading and writing, network communication, etc.
System calls: e.g. calls to blocking functions provided by the operating system.
Channel operations: a send or receive on an unbuffered channel suspends the Goroutine until a matching receiver or sender is ready. A buffered channel behaves the same way when its buffer is full (for sends) or empty (for receives).
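Whether a channel send would suspend the Goroutine can be probed with a select statement that has a default case. This is a small sketch; the helper name wouldBlock is hypothetical.

```go
package main

import "fmt"

// wouldBlock reports whether sending v on ch would block right now. When the
// send can proceed, the value is actually sent.
func wouldBlock(ch chan int, v int) bool {
	select {
	case ch <- v:
		return false // a receiver or a free buffer slot was available
	default:
		return true // a plain send would have suspended this goroutine
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 1)

	fmt.Println(wouldBlock(unbuffered, 1)) // true: nobody is receiving
	fmt.Println(wouldBlock(buffered, 1))   // false: the buffer has room
	fmt.Println(wouldBlock(buffered, 2))   // true: the buffer is full
}
```

Without the default case, the first and third sends would park the Goroutine until another Goroutine received from the channel.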

Synchronization primitives

When a Goroutine uses a synchronization primitive (e.g., sync.Mutex, sync.Cond, etc.), it may be suspended until a condition is met. Examples:

Mutex: when a Goroutine tries to acquire a mutex that is already held by another Goroutine, it is suspended until the lock is released.
Condition variable (Cond): when a Goroutine waits on a condition variable, it is suspended until the condition variable is signaled.
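Both cases can be sketched with the standard sync package. The function names countWithMutex and waitUntilReady are illustrative only.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex has n goroutines increment a shared counter. The mutex makes
// each increment exclusive; goroutines that find it locked wait their turn.
func countWithMutex(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

// waitUntilReady makes a goroutine wait on a condition variable until the
// caller sets ready and signals it. It returns the value the waiter observed.
func waitUntilReady() bool {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	ready := false
	observed := make(chan bool)

	go func() {
		mu.Lock()
		for !ready { // re-check the condition after every wake-up
			cond.Wait() // releases mu while waiting, re-locks before returning
		}
		seen := ready
		mu.Unlock()
		observed <- seen
	}()

	mu.Lock()
	ready = true
	mu.Unlock()
	cond.Signal()
	return <-observed
}

func main() {
	fmt.Println(countWithMutex(1000))
	fmt.Println(waitUntilReady())
}
```

The waiter checks ready in a loop while holding the lock, so the sketch works whether the signal arrives before or after the waiter starts waiting.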

Scheduler dispatch

The Go runtime scheduler suspends and resumes Goroutines as needed for efficient concurrent scheduling. The scheduler may suspend a Goroutine in the following situations:

Time slice exhaustion: the Go scheduler was originally purely cooperative, and since Go 1.14 it can also preempt a running Goroutine asynchronously. When a Goroutine has been running for longer than its time slice (about 10 ms), the scheduler suspends it and schedules other Goroutines to execute.
Active yielding: a Goroutine can voluntarily yield the CPU by calling runtime.Gosched(); the scheduler then suspends it and schedules other Goroutines to execute.
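Active yielding can be sketched as follows. The function name waitByYielding is hypothetical, and the number of yields it reports varies from run to run.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// waitByYielding starts a goroutine that sets a flag, then repeatedly yields
// the processor until the flag is visible. It returns the number of yields.
func waitByYielding() int {
	var flag int32
	go func() {
		atomic.StoreInt32(&flag, 1)
	}()

	yields := 0
	for atomic.LoadInt32(&flag) == 0 {
		runtime.Gosched() // let other goroutines run, then resume here later
		yields++
	}
	return yields
}

func main() {
	fmt.Println("yields before the flag was set:", waitByYielding())
}
```

In real code a channel or sync.WaitGroup is the better way to wait; the polling loop is used here only to make the yield visible.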

Goroutine creation and destruction

Creation: when a new Goroutine is created, it waits in a run queue in the runnable state until the scheduler picks it for execution.
Destruction: when a Goroutine finishes execution or terminates itself explicitly (for example, with runtime.Goexit), it is removed from the scheduler.

Goroutine's Stack Management

A Goroutine's stack space is dynamically allocated and grows automatically as needed. The Go runtime has used two schemes to manage Goroutine stacks, segmented stacks and contiguous stacks:

Segmented stacks: in earlier versions of Go, Goroutines used segmented stacks. Each Goroutine's stack consisted of several small segments. When the stack space was insufficient, the Go runtime allocated a new stack segment and linked it to the existing ones.
Contiguous stacks: in Go 1.3 and later, Goroutines use contiguous stacks. Each Goroutine's stack is a single contiguous block of memory, and when the stack runs out of space, the Go runtime allocates a larger stack and copies the contents of the existing stack to the new one.
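Stack growth needs no special API: it happens transparently. The sketch below, with the hypothetical helpers recurse and deepCall, runs a deep recursion inside a new Goroutine, which needs far more stack than the few kilobytes a Goroutine starts with.

```go
package main

import "fmt"

// recurse calls itself depth times and returns depth. Each frame holds a
// small array so that the frames take up a noticeable amount of stack.
func recurse(depth int) int {
	var pad [64]byte
	pad[depth%len(pad)] = 1
	if depth == 0 {
		return 0
	}
	return int(pad[depth%len(pad)]) + recurse(depth-1)
}

// deepCall runs the recursion in a fresh goroutine, which starts with a small
// stack that the runtime grows as the recursion deepens.
func deepCall(depth int) int {
	result := make(chan int)
	go func() {
		result <- recurse(depth)
	}()
	return <-result
}

func main() {
	fmt.Println(deepCall(100000))
}
```

The recursion completes without any stack-size configuration because the runtime moves the Goroutine to a larger stack whenever the current one fills up.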