Location>code7788 >text

[rCore Study Notes 031] Hardware Mechanism for SV39 Multilevel Page Tables

Popularity:485 ℃/2024-10-27 03:04:45

When you see this title, you know that the previous section mentionedRISC-V ManualSection 10.6 is back in action.

It is important to note that RV32's paging scheme, Sv32, supports 4GiB of virtual address space, and RV64 supports a variety of paging schemes, but we will only introduce the most popular one, Sv39.

The RISC-V paging scheme is named after the SvX pattern, where X is the length of the virtual address in bits.

Virtual and physical addresses

direct access to a physical address

The default ishasn'tIf the MMU is turned on, the address of the memory will be accessed regardless of the privilege level of the CPU.allPhysical address.

Enable MMU

There is a CSR(Control and Status Register)Control Status Register), determines the control of the MMU. The namesatp(Supervisor Address Translation and Protection Supervisory Address Translation and Protection Register).

The structure is shown in the figure.

  • MODE Controls which page table implementation the CPU uses;
    • existMODEfor 0--b0000All prioritized memory accesses are direct accesses to physical memory.
    • existMODESet to 8--b1000When the paging mechanism is turned on, and the chosenbeSV39paging mechanism, thenS/Ulevel of privileged access will need to go through the MMU.
    • existMODESet to 9--b1001When the paging mechanism is enabled, and aSV48The paging mechanism, hereNo discussion of this paging mechanism.
  • ASID This is an address space identifier, and we don't need to worry about the process concept here.
  • PPN It's stored.root sheetThe physical page number. Thus, given a virtual page number, the CPU can map it to a physical page number step by step, starting from the root of the three-level page table.

this onePPNIt's also important, although we've only touched on mode selection here.MODEBut when it came time to actually finish the deposit.PPNThe root node of the decision determines our current differentAPPSame virtual memoryHow to map toDifferent physical memoryThe ...

address structure

First, we will only considervirtual addresscap (a poem)physical addressthe number of digits, and the number ofIgnoring the role of each bit of the address.

So with a 39-bit virtual address, you can access up to 512 gigabytes of memory.

What's more, there are56-bit (computing)The physical addresses of the GPT are used to access a larger range of memory (data from the GPT).

But in reality it is pointless to discuss this maximum storage limit, there is no way to generate such a form of address.virtual addresscap (a poem)physical addressThere is no fixed mapping between theHarder to pass MMUPerforms address translation.

Then we actually use a paged storage, so there is a set of addresses belonging to the paged storage, as shown in the following figure (the figure is stolen)reference manual).

This storage ismisalignment+paginationThe way in which ...

Note here that the size of each page is set to4KiB. This is our own setting, corresponding to theoffsetbe12bit, so that the offset can be used to access each of theByte.

RISC-V requires an address length of 64 bits, here we only use 39 bits, but that doesn't mean that the first 38 bits are not required.

With SV39 paging mode enabled, only the lower 39 bits are truly significant; SV39 paging mode requires that the 25 bits [63:39] of a 64-bit virtual address must be the same as the 38th bit, otherwise the MMU simply assumes that it is an illegitimate virtual address. After passing this check, the MMU takes the lower 39 bits and tries to convert them into a 56-bit physical address.

Then there's a very amazing thing that happens, is that because the latter part of [63:39] has to be the same as the 38th bit, then it's all 1s or all 0s, then, in fact, it's all 1s or all 0s.0~38The memory space represented by the bits is not a continuous 512 gigabytes, but is divided into 256 gigabytes at the highest address and 256 gigabytes at the lowest address.

data structure abstraction

Encapsulation of usize

Required data structures.

  1. physical address
  2. virtual address
  3. Physical pages
  4. Virtual Pages

The way to use it over here is to use atuple structure (computing).

// os/src/mm/

#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq)]
pub struct PhysAddr(pub usize);

#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq)]
pub struct VirtAddr(pub usize);

#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq)]
pub struct PhysPageNum(pub usize);

#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq)]
pub struct VirtPageNum(pub usize);

innerusizejust likeSTRUCT.0.

One thing to note here is that the use of#[derive(Trait)]automaticTraitEspecially if you haven't used it much before.Ordcap (a poem)PartialOrd:

  1. Copy: This trait indicates that the value of the type can be "copied". HavingCopy The type of a trait can be simply assigned to another variable without changing the original value. For example, basic numeric types (such asi32f64) and tuples (if all the types they contain implement theCopy) are realizedCopyThis trait cannot be implemented manually. This trait cannot be implemented manually, but only through derivation.
  2. Clone: This trait allows a value to be explicitly copied. As with theCopy Different.Clone needs to be called explicitly (e.g., using theclone() (Methods).Clone trait requires that types can be copied, but does not require that copying be costless. If a type implements theCopyIt automatically also realizesClone, but the reverse is not necessarily true.
  3. Ord: This trait indicates that the type supports full-order comparisons, i.e., it can compare any two values and determine the order between them (less than, equal to, greater than). This means that the type must implement thePartialOrd cap (a poem)Eq
  4. PartialOrd: This trait allows for a partially sequential comparison of values of a type. This means that it is possible to compare two values, but it may returnNone, indicating that they are not comparable. Most of the time, if the types can be completely sorted, then thePartialOrd It will also be realized.
  5. Eq: This trait indicates that the type supports equivalence comparisons, i.e., it can compare two values to see if they are equal.Eq bePartialEq cap (a poem)Ord basis, which requires that types can be compared for equivalence.
  6. PartialEq: This trait allows equivalence comparisons of values of types, but may returnNone, indicating that they are not comparable. Most of the time, if the types are fully comparable, thePartialEq It will also be realized.
    (The explanation here comes from Kimi)

So the advantage of this encapsulation is, in effect, that theAdd a layer of abstractionSometimes abstractions that don't seem useful are actually very beneficial when used.

Instead of using the usize base type, which corresponds directly to the RISC-V 64 hardware, we have deliberately abstracted each of them to a different type, so that with the help of the Rust compiler, they can be used in a variety of convenient and safe ways.type conversion (Type Conversion) to build the page table.

Conversion between types and between types and usize via From

Encapsulated types with usize conversion

// os/src/mm/

const PA_WIDTH_SV39: usize = 56;
const PPN_WIDTH_SV39: usize = PA_WIDTH_SV39 - PAGE_SIZE_BITS;

impl From<usize> for PhysAddr {
    fn from(v: usize) -> Self { Self(v & ( (1 << PA_WIDTH_SV39) - 1 )) }
}
impl From<usize> for PhysPageNum {
    fn from(v: usize) -> Self { Self(v & ( (1 << PPN_WIDTH_SV39) - 1 )) }
}

impl From<PhysAddr> for usize {
    fn from(v: PhysAddr) -> Self { v.0 }
}
impl From<PhysPageNum> for usize {
    fn from(v: PhysPageNum) -> Self { v.0 }
}

Pay attention here.PAGE_SIZE_BITSis stored as a constant in theInside.

//os/src/

//*  Constants for mm *//
pub const PAGE_SIZE_BITS: usize = 12;

Conversion between encapsulated types

Just realized something when I saw it here, soaddressis better thanpage numberI need one more.offsetIf the information is converted directly, how can the information not be lost?

Here carefully look at the following code, we can see that if only thePhysPageNum, then convert it toPhysAddrdefaults tooffsetbe0Then the other way around.PhysAddrchange intoPhysPageNumWhen you are in a situation where you need to determine theoffsetyes or no0ifoffsetfault0Then it's an error.

// os/src/mm/

impl PhysAddr {
    pub fn page_offset(&self) -> usize { self.0 & (PAGE_SIZE - 1) }
}

impl From<PhysAddr> for PhysPageNum {
    fn from(v: PhysAddr) -> Self {
        assert_eq!(v.page_offset(), 0);
        ()
    }
}

impl From<PhysPageNum> for PhysAddr {
    fn from(v: PhysPageNum) -> Self { Self(v.0 << PAGE_SIZE_BITS) }
}

Likewise.PAGE_SIZEis a constant that is stored inLane.

//os/src/

//*  Constants for mm *//
pub const PAGE_SIZE: usize = 4096;

In this case, the readoffsetmethodologiespage_offsetRealization of.

impl PhysAddr {
    pub fn page_offset(&self) -> usize { self.0 & (PAGE_SIZE - 1) }
}

Finally, we had some other questions, so if theoffsetfault0Can't it be converted? The answer is that it can be converted, but not through theimplicitThe conversion is done explicitly, i.e., you need to know that you've done the conversion with thediscard four, but treat five as whole (of decimal points)(b) The rounding of .

// os/src/mm/

impl PhysAddr {
    pub fn floor(&self) -> PhysPageNum { PhysPageNum(self.0 / PAGE_SIZE) }
    pub fn ceil(&self) -> PhysPageNum { PhysPageNum((self.0 + PAGE_SIZE - 1) / PAGE_SIZE) }
}

included among theseflooris rounded down, theceilis rounded up.

It's actually still hard to understand the meaning of rounding up, because thisoffsetToo close to the next page, so you just take it down?

Rounding up is to take a new unoccupied page, rounding down is to take the current page.

page entry (in a dictionary)

page entry (in a dictionary)is a program that also containsPage number informationcap (a poem)flag positionThe data structure of the

The page number information is equivalent to a dictionary that maps avirtual addressand aphysical address.

The flags are different: the

  • V(Valid): The page table entry is legal only if bit V is 1;
  • R(Read)/W(Write)/X(eXecute): control whether the corresponding virtual page indexed to this page table entry is allowed to read/write/execute, respectively;
  • U(User): Controls whether the corresponding virtual page indexed to this page table entry is allowed to be accessed if the CPU is at privilege level U or not;
  • G: Ignore it for now;
  • A(Accessed): the processor records whether the corresponding virtual page of the page table entry has been accessed since this bit on the page table entry was cleared;
  • D(Dirty): the processor records whether the corresponding virtual page of the page table entry has been modified since this bit on the page table entry was cleared.

apart fromG Outside of the above bits can be set by the operating system, only theA potentially hazardous substancesD bits are dynamically set directly by the processor to1 The page has been accessed or repaired (note.):A potentially hazardous substancesD (Whether the bits can be directly modified by the processor hardware depends on the specific implementation of the processor.)

This time to implement it in rust, this time the use of a bit for a FLAG form, rust provides such a package, the name is calledbitflags.

Therefore.

  1. existpackage is declared in the
  2. existto declare the use of this package.

As I write this blog.bitflagThe version of the program is as of2.6.0, which can be viewed by looking at theRelated SitesSee the latest version and details.

// os/

[dependencies]
bitflags = "2.6.0"

existAdd the following to the list.

extern crate bitflags;

establishos/src/mm/page_table.rs, realize this part of the FLAG.

use bitflags::bitflags;

bitflags!{
    pub struct PTEFlags: u8{
        const V = 1<<0;
        const R = 1<<1;
        const W = 1<<2;
        const X = 1<<3;
        const U = 1<<4;
        const G = 1<<5;
        const A = 1<<6;
        const D = 1<<7;
    }
}

How to interpret it? Here's a simple example.more detailedThe part that requires continued independent study: the

bitflags! {
    struct FlagsType: u8 {
//                    -- Bits type
//         --------- Flags type
        const A = 1;
//            ----- Flag
    }
}

let flag = FlagsType::A;
//  ---- Flags value

Here are the four corresponding sections.

  1. Bits typeMeaning this.flagHow many digits are there, corresponding to the type of the corresponding number of digits.
  2. FlagsTypeThis is the general name given to the set of flags.
  3. FlagThis is the name given to a bit that is used as a flag.
  4. Flags valuemeans that you can use theFlagsType::FlagAccesses the value of the corresponding flag bit.

So at this point go ahead and implement the page table entries: the

// os/src/mm/page_table.rs

#[derive(Copy, Clone)]
#[repr(C)]
pub struct PageTableEntry {
    pub bits: usize,
}

impl PageTableEntry {
    pub fn new(ppn: PhysPageNum, flags: PTEFlags) -> Self {
        PageTableEntry {
            bits: ppn.0 << 10 |  as usize,
        }
    }
    pub fn empty() -> Self {
        PageTableEntry {
            bits: 0,
        }
    }
    pub fn ppn(&self) -> PhysPageNum {
        ( >> 10 & ((1usize << 44) - 1)).into()
    }
    pub fn flags(&self) -> PTEFlags {
        PTEFlags::from_bits( as u8).unwrap()
    }
}

Then it is also possible to continue to implement something that has a judgment aboutflag positionThe auxiliary function of

// os/src/mm/page_table.rs

impl PageTableEntry {
    pub fn is_valid(&self) -> bool {
        (() & PTEFlags::V).bits() != PTEFlags::empty().bits()
    }
}

Note the new version herebitflagsThe code does not allow thePTEFlagsdirectly, so we can compare it by comparing itsbits().

multilevel page table

linear table

This thing is quite handy, for example, if we want to access theVirtualPageNumbe0(used form a nominal expression)page entry (in a dictionary), to map to thePhysicalPageNumwhen we only need to calculate thebase_addr+8iYou'll be able to access it.

What's hard to understand here is that it starts calculating directly after that, as if all of the values from the first one are being calculated.VirtualPageNum=0To the last one.VirtualPageNum=2^27It's all gone.

Of course, if we use a piece of memory here and a piece of memory there, even though we don't use the intermediate unused memory, does it mean that this piece of memory can't be used? That makes sense.

The expression in the documentation is.

Since the virtual page number has\(2^{27}\) In this case, each virtual page number corresponds to an 8-byte page table entry, and each page table consumes 1GiB of memory!

Our understanding is that.Since the virtual page number has\(2^{27}\) species, each virtual page number corresponds to an 8-byte page table entry, then 1GiB of memory needs to be reserved for each page table!

Dictionary Trees and Multilevel Page Tables

The representation of the dictionary tree inreference manualIt's very clear, I won't repeat it here, just get in there and learn it.

TIPS:
The dictionary tree always reminds me of a Huffman tree, but it's not the same. A Huffman tree holds the characters corresponding to a line of binary code, while a dictionary tree holds a string of characters.

Here he talks about the dictionary tree and then asks you to migrate the page table entries directly, which is really hard to understand from his words.

In fact the SV39 paging mechanism is equivalent to a dictionary tree. The 27-bit virtual page number can be viewed as a string of length n=3 with character set α={0,1,2,... ,511}, since each character consists of 9 bits. And we no longer maintain the so-called count of strings, but rather find the page table entries corresponding to the strings (virtual page numbers). Therefore, each leaf node needs to store 512 8-byte page table entries, which is exactly 4KiB in total, and can be placed directly in a physical page frame. For a non-leaf node, functionally it only needs to save 512 pointers to subordinate nodes, but we save 512 page table entries just like a leaf node, so that each node can be placed in a physical page frame, and the position of the node can be replaced by the physical page number of the physical page frame it is in. When you want to go down from a non-leaf node, you only need to find the physical page number field of the page table entry corresponding to the current character, which points to the location of the next level node, so that the function of non-leaf node transit is also realized. The internal part of each node is a linear table, i.e., the physical address of the node is added to the offset of the character to find the page table entry pointing to the next level of node (for non-leaf nodes) or the page table entry that can be used directly for address translation (for leaf nodes).

I drew a picture here, we understand through this picture, here to break down the virtual memory, the first 27-bit page table number into three 9-bit, that each bit can be expressed in the range of 0 ~ 511 a total of 512 kinds.

Then we can take each step of thePrefixes for nodesSeeing this as a 9-bit number, then the next level is 9-bit again, thusRootThere are 512 variations under the node (first level page table), thenNodeThere are also 512 variations under nodes (secondary page tables), thenLeafCorresponding under the node is the 512 kinds ofpage entry (in a dictionary).

Here's an example of what it's like to be0x000 0001An example of a dictionary tree representing a three-level page table.

By checking the page table entries in thePNNNaturally, you can access thephysical memory.

To summarize, the query sequence looks like this.

Here's a very hard thing to think about, considering that theThe page table entries themselves exist in physical memoryRi.

Then consider that there are 512 page table entries corresponding to the leaf nodes and each page table entry has8byteSo it just happened to take up4KiB.

It will be recalled here that we were setting the size of each page to be4KiB, then each node occupies exactly one physical page.

large page

The big page is the expansion section, you need to read it.SV39 Hardware Mechanisms for Multilevel Page Tables - rCore-Tutorial-Book-v3 3.6.0-alpha.1 Documentation, here do not dwell on, because here and the specific implementation of the code has little to do, I do not have any understanding of their own.

SV39 Memory footprint of multi-level page tables

This part is also straightforward to look atSV39 Hardware Mechanisms for Multilevel Page Tables - rCore-Tutorial-Book-v3 3.6.0-alpha.1 Documentation.

The harder part to understand here is why all of a sudden he starts dividing by2MiB,1GiB.

this one2MiBHere's how it works. Each node in the three-level page table has 512page entry (in a dictionary), then a page table entry is assignable4KiBof memory.512*4KiB=2MiB. What he means is that on average every time you use2MiBThe space requires the creation of a tertiary node.

Similarly, the average distribution1GiBThe space requires the creation of a secondary node.

Another important point here is that the consumption of page table entries (nodes) is8ByteThis is the same as the number of page table entries that can be allocated in a4KiBThe memory is different, so don't get confused.

SV39 Address Translation Process

The principle of checking a page table, which we have already explained, is to check a three-level dictionary tree using 9 bits of data at one level.

There is one more thing to say about the RISC-V page table lookup process.

Suppose we have the virtual address(VPN0,VPN1,VPN2,offset) :

  1. We first record the page number of the "physical page of the currently used level 1 page table" loaded into thesatp in the register;
  2. particle marking the following noun as a direct objectVPN0 Find the physical page number of the secondary page table as an offset in the physical page of the primary page table;
  3. particle marking the following noun as a direct objectVPN1 Finds the physical page number of the tertiary page table as an offset in the physical page of the secondary page table;
  4. particle marking the following noun as a direct objectVPN2 Finds the physical page number of the location to be accessed as an offset in the physical pages of the three-level page table;
  5. The physical page base address corresponding to the physical page number (i.e., the physical page number shifted 12 bits to the left) plus theoffset is the physical address corresponding to the virtual address.

Quicklist (TLB)

Here the description in the reference manual is good enough, summarized by using theTLBThe mapping is cached, which increases the speed of the conversion.

As we know, the access speed of physical memory is much slower than the running speed of CPU. If we follow the page table mechanism step by step, the conversion of a virtual address into a physical address requires three accesses to physical memory, and after obtaining the physical address, we need to access the physical memory once more to complete the access. This definitely reduces the system execution efficiency to a great extent.
Practice has shown that the virtual address access process of the vast majority of applications is characterized by temporal and spatial locality. Therefore, inside the CPU, we use the MMU in theFast Tables (TLB, Translation Lookaside Buffer) The TLB cache is used as the page table cache for mapping virtual page numbers to physical page numbers. This part of the knowledge is reflected in the Computer Composition Principles course, when we want to perform an address translation, there is a high probability that the corresponding address mapping has been completed in the recent past, so we can first go to the TLB cache to check, if there is a mapping, we can complete the mapping directly without having to access the memory so many times.

multitasking sheet

The description above also states.

  1. There's something to be said about having to access the currentFirst level page numberdeposit (e.g. in a bank account)satp
  2. The fast meter is passed through the

Then the different app'sThe table of pages is differentThen the first thing to do is to changeFirst level page number, that is, to modify thesatp.

The cache in the fast table also needs to be updated, so you need to use thecommand to clear the entireTLB.