23.1 What is a Fast System Call
System calls are the mechanism through which the operating system provides services to privilege level 3 tasks. In 32-bit operating systems, we implemented system calls through interrupts. System calls are used very frequently, and interrupts were not designed specifically for them, so 64-bit CPUs provide a dedicated mechanism for system calls: fast system calls.
A fast system call is initiated with the dedicated syscall instruction and returns with the dedicated sysret instruction. syscall transfers from privilege level 3 to privilege level 0, and sysret returns from privilege level 0 to privilege level 3. Fast system calls pass everything through registers, and the cs:rip of the system call function is fixed in advance, so neither syscall nor sysret takes any operands.
In short, the whole fast system call mechanism is rigidly fixed, and that rigidity is what makes it efficient.
23.2 Setting Up Fast System Calls
Before you can use fast system calls, you need to set up the components they depend on, which involves four MSRs.
23.2.1 IA32_EFER
The fast system call feature is disabled initially; its enable switch is bit 0 of IA32_EFER. This MSR, which we have already seen, is numbered 0xc0000080.
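Turning the feature on is a read-modify-write of IA32_EFER. The sketch below models only the bit arithmetic in C, assuming the current value has already been read with rdmsr; the rdmsr/wrmsr steps are left out because they only run at privilege level 0. The names EFER_SCE and efer_enable_syscall are illustrative, not taken from the chapter's code.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_EFER 0xc0000080u      /* MSR number from the text */
#define EFER_SCE  (1ull << 0)      /* bit 0, called SCE in the Intel and AMD manuals */

/* Only bit 0 changes, so bits already set during the switch to long mode
   (such as LME, bit 8, and LMA, bit 10) are preserved. This is why the
   kernel reads, ORs and writes back instead of writing a constant. */
static inline uint64_t efer_enable_syscall(uint64_t efer)
{
    return efer | EFER_SCE;
}
```

For example, an EFER value of 0x500 (LME and LMA set) becomes 0x501.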
23.2.2 IA32_STAR
The low 32 bits of this MSR are reserved; bits 32 to 47 set the privilege level 0 segment selectors used by syscall, and bits 48 to 63 set the privilege level 3 segment selectors used by sysret.
Note that we say these bits set "segment selectors" (plural) rather than "a segment selector", because the way they define selectors is rather unusual:
- For bits 32 to 47, the value itself is used as the privilege level 0 code segment selector, and the value plus 8 is used as the privilege level 0 data segment selector.
- For bits 48 to 63, the value itself is used as the privilege level 3 compatibility mode code segment selector, the value plus 8 as the privilege level 3 data segment selector, and the value plus 16 as the privilege level 3 IA-32e mode code segment selector. So which code segment does sysret actually choose when it executes? This question is discussed below, in the section on sysret.
A segment selector is the descriptor index shifted left by 3 bits, so adding 8 selects the next descriptor in the GDT. In other words, bits 32 to 47 point to the first of two consecutive segment descriptors, and bits 48 to 63 point to the first of three consecutive segment descriptors. However, since our operating system never uses compatibility mode code segments, that descriptor is not defined in our GDT.
The number of this MSR is 0xc0000081.
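The field layout of IA32_STAR and the selectors derived from it can be written down directly. This is a sketch under the assumption that we only care about the selector values; star_make, star_decode and the star_selectors struct are hypothetical names. The "& ~3" and "| 3" reflect what the Intel manual specifies: syscall clears the RPL bits of CS, and sysret forces RPL 3 on the selectors it loads.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_STAR 0xc0000081u

/* Bits 31..0 reserved (left 0), bits 47..32 = syscall base, bits 63..48 = sysret base. */
static inline uint64_t star_make(uint16_t syscall_base, uint16_t sysret_base)
{
    return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_base << 32);
}

static inline uint16_t star_syscall_base(uint64_t star) { return (uint16_t)(star >> 32); }
static inline uint16_t star_sysret_base(uint64_t star)  { return (uint16_t)(star >> 48); }

typedef struct {
    uint16_t kernel_cs, kernel_ss;            /* loaded by syscall */
    uint16_t user_cs32, user_ss, user_cs64;   /* loaded by sysret */
} star_selectors;

static inline star_selectors star_decode(uint64_t star)
{
    uint16_t k = star_syscall_base(star);
    uint16_t u = star_sysret_base(star);
    star_selectors s = {
        .kernel_cs = (uint16_t)(k & ~3u),        /* base */
        .kernel_ss = (uint16_t)(k + 8),          /* base + 8 */
        .user_cs32 = (uint16_t)(u | 3),          /* base: compatibility mode code */
        .user_ss   = (uint16_t)((u + 8) | 3),    /* base + 8: data */
        .user_cs64 = (uint16_t)((u + 16) | 3),   /* base + 16: 64-bit code */
    };
    return s;
}
```

With example bases 0x08 and 0x10, syscall would load CS 0x08 and SS 0x10, and a 64-bit sysret would load CS 0x23 and SS 0x1b.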
23.2.3 IA32_LSTAR
This MSR holds the address of the system call function; its number is 0xc0000082.
23.2.4 IA32_FMASK
This MSR sets the RFLAGS mask. Specifically, when syscall executes, rflags is updated as rflags &= ~IA32_FMASK. Our operating system uses this MSR to mask the IF bit, so the mask is 0x200.
The number of this MSR is 0xc0000084.
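The masking rule can be checked with a few concrete values. This is a minimal sketch; RFLAGS_IF and syscall_mask_rflags are illustrative names. Bit 9 of RFLAGS is the interrupt enable flag, which is where 0x200 comes from; bit 1 is always set, which is why typical user-mode flag values look like 0x202 or 0x246.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_FMASK 0xc0000084u
#define RFLAGS_IF  (1ull << 9)   /* 0x200: interrupt enable flag */

/* What syscall does to RFLAGS: every bit that is set in IA32_FMASK is cleared. */
static inline uint64_t syscall_mask_rflags(uint64_t rflags, uint64_t fmask)
{
    return rflags & ~fmask;
}
```

With the mask 0x200, a caller's 0x246 becomes 0x046 on entry, so the handler starts with interrupts disabled. The caller's original flags are not lost, because syscall has already copied them into r11.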
23.3 syscall Implementation Details
When syscall executes, the CPU performs the following operations:
rcx = rip
r11 = rflags
cs = IA32_STAR[32:47]
ss = IA32_STAR[32:47] + 8
rip = IA32_LSTAR
rflags &= ~IA32_FMASK
In other words, rcx and r11 are used by syscall itself, so they cannot be used to pass parameters. In addition, syscall does not do anything to rsp, which is an important issue that we will discuss below.
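The operations listed above can be modelled as a C function over a tiny register file. This is a sketch, not real CPU state: cpu_state, syscall_msrs and model_syscall are hypothetical, c->rip is assumed to already hold the address of the instruction after syscall (which is what ends up in rcx), and the SS load follows the "+ 8" rule from the IA32_STAR section.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t rip, rflags, rcx, r11, rsp;
    uint16_t cs, ss;
} cpu_state;

typedef struct {
    uint64_t star, lstar, fmask;
} syscall_msrs;

static void model_syscall(cpu_state *c, const syscall_msrs *m)
{
    uint16_t base = (uint16_t)(m->star >> 32);

    c->rcx = c->rip;               /* return address for sysret */
    c->r11 = c->rflags;            /* caller's flags, saved before masking */
    c->cs = (uint16_t)(base & ~3u);
    c->ss = (uint16_t)(base + 8);
    c->rip = m->lstar;             /* jump to the system call function */
    c->rflags &= ~m->fmask;        /* e.g. clear IF */
    /* rsp is deliberately left unchanged: still the caller's stack */
}
```

The last comment is the point of the model: after syscall, the kernel is running at privilege level 0 on whatever stack the user task left in rsp.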
23.4 sysret Implementation Details
When sysret executes, the CPU performs the following operations:
rip = rcx
rflags = r11
cs = IA32_STAR[48:63]         (if sysret has no 64-bit prefix)
cs = IA32_STAR[48:63] + 16    (if sysret has a 64-bit prefix)
ss = IA32_STAR[48:63] + 8
That is to say:
- The operating system needs to preserve rcx and r11.
- sysret needs a 64-bit (REX.W) prefix.
Point 1 is discussed below; point 2 is handled in nasm by writing o64 sysret.
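The effect of the prefix is easiest to see by modelling both forms. This is a sketch with hypothetical names (cpu_state, model_sysret). It follows the Intel manual in two details the summary above omits: without the prefix only the low 32 bits of rcx are used, and RFLAGS is taken from r11 with RF, VM and reserved bits cleared and bit 1 set. The selectors get RPL 3 forced on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t rip, rflags, rcx, r11, rsp;
    uint16_t cs, ss;
} cpu_state;

/* o64 == true models "o64 sysret"; false models a plain sysret. */
static void model_sysret(cpu_state *c, uint64_t star, bool o64)
{
    uint16_t base = (uint16_t)(star >> 48);

    c->rip = o64 ? c->rcx : (uint32_t)c->rcx;     /* 32-bit return uses ecx */
    c->rflags = (c->r11 & 0x3C7FD7u) | 2;         /* clears RF and VM, sets bit 1 */
    c->cs = (uint16_t)((o64 ? base + 16 : base) | 3);
    c->ss = (uint16_t)((base + 8) | 3);
    /* rsp is not restored: the kernel must switch back to the user stack itself */
}
```

With this chapter's STAR value, o64 sysret loads CS 0x33 (descriptor #6). A plain sysret would load CS 0x23, which in our GDT is descriptor #4, the kernel data segment, and that is why the prefix is required.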
23.5 Implementation of System Calls
See this chapter's code in 23/.
Line 3 declares the syscallInit function, which is implemented in assembly language.
Next, look at this chapter's code in 23/.
Lines 15 to 18 set bit 0 of IA32_EFER to 1, which enables fast system calls.
Lines 20 to 23 set IA32_STAR. In the GDT, descriptor #3 is the privilege level 0 code segment and descriptor #4 is the privilege level 0 data segment; these two descriptors correspond to bits 32 to 47 of IA32_STAR. Descriptor #5 is the privilege level 3 data segment and descriptor #6 is the privilege level 3 code segment, and there is no compatibility mode code segment. Therefore descriptor #4 has to be placed in bits 48 to 63 of IA32_STAR, so that descriptors #5 and #6 land at the correct offsets (+8 and +16).
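The choice of descriptor #4 can be derived rather than memorized. The sketch below assumes the GDT indices given above and computes the STAR value from them; the enum names and helper functions are hypothetical. Whether the chapter's code writes 0x20 or 0x23 into bits 48 to 63 is not shown here; either works, because sysret forces RPL 3 anyway.

```c
#include <assert.h>
#include <stdint.h>

/* GDT layout described in the text. */
enum {
    GDT_KERNEL_CODE = 3,
    GDT_KERNEL_DATA = 4,
    GDT_USER_DATA   = 5,
    GDT_USER_CODE   = 6,
};

/* Selector = index << 3 | TI << 2 | RPL, with TI = 0 selecting the GDT. */
static inline uint16_t gdt_selector(unsigned index, unsigned rpl)
{
    assert(rpl <= 3);
    return (uint16_t)((index << 3) | rpl);
}

/* 64-bit sysret takes SS from base + 8 and CS from base + 16, i.e. from
   descriptors base_index + 1 and base_index + 2. Solving
   base_index + 2 == GDT_USER_CODE gives 4, the kernel data descriptor. */
static inline unsigned sysret_base_index(void)
{
    return GDT_USER_CODE - 2;
}

static inline uint64_t star_for_this_gdt(void)
{
    uint64_t syscall_base = gdt_selector(GDT_KERNEL_CODE, 0);     /* 0x18 */
    uint64_t sysret_base  = gdt_selector(sysret_base_index(), 0); /* 0x20 */
    return (sysret_base << 48) | (syscall_base << 32);
}
```

The result, 0x0020001800000000, has 0x18 in bits 32 to 47 and 0x20 in bits 48 to 63.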
Lines 25 to 29 install the address of the system call function, syscallHandle, into IA32_LSTAR.
Lines 31 to 34 install the mask 0x200 into IA32_FMASK.
At this point, the fast system call is ready.
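For readers who prefer C to assembly, the four steps of syscallInit can be sketched as follows. This is a hypothetical rendering, not the chapter's code: the MSR accessors are passed in as function pointers so the sequence can be exercised outside privilege level 0 (in a kernel they would wrap rdmsr/wrmsr, with the MSR number in ecx and the value in edx:eax), and the in-memory fake_msr array exists only for testing.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_EFER  0xc0000080u
#define IA32_STAR  0xc0000081u
#define IA32_LSTAR 0xc0000082u
#define IA32_FMASK 0xc0000084u

typedef uint64_t (*rdmsr_fn)(uint32_t msr);
typedef void (*wrmsr_fn)(uint32_t msr, uint64_t value);

/* Hypothetical C counterpart of syscallInit. */
static void syscall_init_sketch(rdmsr_fn rd, wrmsr_fn wr, uint64_t handler)
{
    wr(IA32_EFER, rd(IA32_EFER) | 1);                 /* bit 0: enable syscall/sysret */
    wr(IA32_STAR, (0x20ull << 48) | (0x18ull << 32)); /* sysret base #4, syscall base #3 */
    wr(IA32_LSTAR, handler);                          /* entry point (syscallHandle) */
    wr(IA32_FMASK, 0x200);                            /* clear IF on entry */
}

/* In-memory stand-in for the MSRs, indexed by msr - IA32_EFER. */
static uint64_t fake_msr[5];

static uint64_t fake_rdmsr(uint32_t msr)
{
    assert(msr >= IA32_EFER && msr <= IA32_FMASK);
    return fake_msr[msr - IA32_EFER];
}

static void fake_wrmsr(uint32_t msr, uint64_t value)
{
    assert(msr >= IA32_EFER && msr <= IA32_FMASK);
    fake_msr[msr - IA32_EFER] = value;
}
```

Calling syscall_init_sketch(fake_rdmsr, fake_wrmsr, addr) with an arbitrary handler address leaves the fake MSRs holding the same values the assembly version writes.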
The syscallHandle function is the system call function. In 32-bit operating systems, system calls are implemented with interrupts, and when an interrupt occurs the CPU automatically switches to the privilege level 0 stack, which the operating system provides and can therefore trust. What does a "safe stack" mean, and what goes wrong if the stack is not switched? Consider the following example:
void test()
{
char s[] = "666";
__asm__ __volatile__("syscall" ::: "rcx", "r11", "memory"); /* syscall overwrites rcx and r11 */
}
Compiled to assembly, this code could become:
test:
mov dword [rsp - 4], '666'
syscall
ret
As you can see, this function never subtracts 4 from rsp, and it does not need to: the System V AMD64 ABI lets a function that calls no other function use the 128-byte "red zone" below rsp without adjusting rsp. Handing such an rsp to the system call function would be a mistake, because the system call function does not know how the caller is using its stack, and its first push would overwrite s. This is the problem an unsafe stack causes, so we must switch to a safe stack during a system call.
However, syscall does not switch the stack automatically, so we have to do it manually. The privilege level 0 stack is stored in the TSS, and the address of the TSS is 0xffff800000092000. To use this address, though, we would first have to load the 64-bit immediate into a register. Which register should we use? The ABI does not settle this, and every choice seems imperfect. This is where the IA32_GS_BASE we set earlier comes in handy: through gs we can access the TSS directly. Moreover, our operating system's TSS is extended to 128 bytes, so the small region after the first 104 bytes can hold a backup of the current rsp before we change it. With that, the stack switching problem is solved.
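The offsets the handler relies on follow from the architectural 64-bit TSS layout, which occupies 104 bytes with RSP0 at offset 4. The struct below is a sketch of the extended 128-byte TSS described above; the field names user_rsp and spare are hypothetical, and __attribute__((packed)) is a GCC/Clang extension needed because RSP0 is not 8-byte aligned. With gs based at the TSS, the slots are presumably reached as [gs:104] for the backup and [gs:4] for the privilege level 0 stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct __attribute__((packed)) tss64_ext {
    uint32_t reserved0;
    uint64_t rsp0;         /* offset 4: privilege level 0 stack */
    uint64_t rsp1;
    uint64_t rsp2;
    uint64_t reserved1;
    uint64_t ist[7];       /* offsets 36..91: interrupt stack table */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;   /* offset 102; the architectural TSS ends at 104 */
    uint64_t user_rsp;     /* offset 104: backup of the caller's rsp */
    uint8_t  spare[16];    /* pads the extended TSS to 128 bytes */
};
```

Checking the offsets with offsetof at compile or test time is a cheap way to catch a mismatch between this layout and the hard-coded offsets in the assembly handler.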
Line 44 backs up rsp to [TSS + 104].
Line 45 switches to the privilege level 0 stack.
Lines 47 to 48 save rcx and r11 on the stack, which is now safe to use.
Lines 50 to 51 call the function specified by rax.
Lines 53 to 54 restore rcx and r11.
Line 56 restores the privilege level 3 stack.
Line 58 returns from the fast system call.
Lines 60 to 63 define the system call table. System call #1 is reserved for use in later chapters.
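The "call the function selected by rax" step amounts to indexing a table of function pointers. The sketch below models it in C. The handlers sys_zero and sys_two are hypothetical placeholders, since the text does not say what system calls 0 and 2 do, and the bounds and NULL check is an addition that the chapter's assembly may not perform. Because syscall leaves rdi, rsi, rdx and r8 to r10 untouched, a handler that calls straight into the table can pass those argument registers through unchanged, which is what the single arg0 parameter stands in for here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*syscall_fn)(uint64_t arg0);

/* Hypothetical placeholders for system calls 0 and 2. */
static uint64_t sys_zero(uint64_t a) { return a + 1; }
static uint64_t sys_two(uint64_t a)  { return a * 2; }

static const syscall_fn syscall_table[] = {
    sys_zero,   /* #0 */
    NULL,       /* #1: reserved for a later chapter */
    sys_two,    /* #2 */
};

#define SYSCALL_COUNT (sizeof syscall_table / sizeof syscall_table[0])

/* C model of "call the function specified by rax", with a range check
   so a bad number cannot jump through an arbitrary pointer. */
static uint64_t dispatch(uint64_t rax, uint64_t arg0)
{
    if (rax >= SYSCALL_COUNT || syscall_table[rax] == NULL)
        return (uint64_t)-1;
    return syscall_table[rax](arg0);
}
```

dispatch(2, 21) returns 42, while dispatch(1, 0) and dispatch(7, 0) both return the error value instead of calling through an empty or out-of-range slot.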
Next, look at this chapter's code in 23/.
The _start function is the real entry point of a privilege level 3 task; its purpose is to make the task exit automatically when it finishes.
23.6 Compilation and Testing
This chapter's 23/Makefile adds the compilation and linking commands for the new files.
This chapter's code in 23/ and 23/ tests system calls 0 and 2.