
DDCA -- large cache, virtual memory: multi-core cache, NUCA cache, page tables, etc.


1. Multicore issues in caching

1.1 Caching in multicore systems

image-20241111144658714
  • Intel Montecito Cache
    • Two cores, each with a private 12 MB L3 cache and a 1 MB L2 cache, are shown in dark blue.
image-20241111144822880
  • Cache efficiency becomes more important in multi-core/multi-threaded systems
    • Memory bandwidth is very valuable
    • Cache space is a limited resource for each core/thread
  • How to design a cache in a multicore system?
    • Shared Cache vs. Private Cache
    • How to maximize overall system performance
    • How to provide QoS (Quality of Service) to different threads in a shared cache
    • Should cache management algorithms be thread-aware (i.e., know which threads are accessing the cache)?
    • How should space in the shared cache be allocated among threads?

1.2 Private vs. Shared Caching

  • Private Cache: a cache belongs to only one core (a shared block may exist in multiple private caches)
  • Shared Cache: Cache shared by multiple cores

L3 is generally shared.

image-20241111150618196

1.2.1 Concept and benefits of resource sharing

Idea: instead of dedicating a hardware resource to a single hardware context, let multiple contexts share it.

  • Resource examples: functional unit, pipeline, cache, bus, memory, interconnect, storage

Pros:

  • Resource sharing increases utilization/efficiency -> throughput
    • When a thread has an idle resource, another thread can use it; no need to copy shared data
  • Reduced communication delays
    • For example, in a multithreaded processor, data shared by multiple threads can be stored in the same cache
  • Compatible with shared memory programming model
    • e.g., CUDA

1.2.2 Disadvantages of resource sharing

  • Resource sharing causes contention for resources

    • When a resource is busy, another thread cannot use it
    • If space is occupied by one thread, another thread must reclaim it before use
  • Sometimes reduces the performance of each or some threads

    • Threads may perform worse than when run individually
  • Eliminates performance isolation, i.e., runtime performance becomes inconsistent

    • Thread performance depends on co-executing threads
      • A program's share of cache space depends on the other programs sharing the cache; if those co-running programs occupy most of the cache, its performance drops
  • Uncontrolled (free-for-all) sharing degrades QoS

    • Causes unfairness and resource starvation

Therefore, shared resources need to be utilized efficiently and fairly.

1.2.3 Shared vs. private caches in multicore

  • Advantages of shared caching:
    • Space is dynamically allocated between cores
    • No wasted space due to copying
    • Cache coherency may be faster (easier to locate data on misses)
  • Advantages of private caching:
    • Smaller L2 -> faster access time
    • Dedicated bus to L2 -> less contention

1.2.4 Inter-core shared caches

Pros:

  • High effective capacity
  • Dynamic partitioning of the available cache space
    • No fragmentation caused by static partitioning
    • If one core is not using some space, another core can use it
  • Easier to maintain cache coherence (each cache block resides in a single location)

Drawbacks:

  • Slower access (the cache is not tightly coupled to any single core)
  • Cores suffer conflict misses caused by other cores' accesses
    • Misses due to inter-core interference
    • Some cores can destroy other cores' hits (evicting blocks that other cores need)
  • Guaranteeing a minimum level of service (or fairness) for each core is harder (how much space? how much bandwidth?)

Example:

Suppose core 1 running thread t1 alone needs the blue-shaded amount of space in the shared L2 cache, and core 2 running thread t2 alone needs the orange-shaded amount. When the two cores run simultaneously, unfair cache sharing can drastically reduce t2's throughput.

image-20241111154409325
image-20241111154508021
image-20241111154543573

1.3 UCA & NUCA

Small caches today use Uniform Cache Access (UCA): the access latency is a constant regardless of where in the cache the data is found.

For large multi-megabyte caches, fixing the access time at the worst-case latency is too costly, hence the introduction of the Non-Uniform Cache Architecture (NUCA).

1.3.1 Large-scale NUCA

In the NUCA architecture, the CPU is connected to multiple cache units, each of which may have different access latencies, unlike the traditional unified cache access architecture. Due to large caches, it is not possible to guarantee that all cache units have the same access latency, so the NUCA architecture is introduced to solve this problem.

image-20241111160653760

There are several key issues that need to be addressed in the NUCA architecture:

  • Mapping: How to map memory addresses to different cache units in order to access data efficiently.

  • Migration: How to move data between different cache units to minimize access latency. For example, frequently used data can be migrated to a cache unit closer to the CPU.

  • Search: How to efficiently search for data in the cache to ensure that the cache unit where the target data is located is found.

  • Replication: How to replicate data in different cache units to increase access speed and reduce communication latency.

1.3.2 Shared NUCA Caches

image-20241111161033222

This figure shows the structure of a multicore processor: each core has its own L1 caches (L1 D and L1 I) plus a slice of the shared L2 cache, and each core's L1 caches and L2 slice together form a separate unit (Tile).

  • Shared L2 Cache: The entire L2 cache is partitioned into multiple parts, with each core owning a portion of the L2 cache slice. This distributed L2 cache design reduces cache access latency and improves cache scalability.
  • Cache Controller Functions: The cache controller in the figure is responsible for forwarding address requests, directing them to the corresponding L2 cache slices, and handling cache coherency operations. This is important for data consistency in a multi-core processor, ensuring that each Core's access to shared data is correct.

2. Virtual Memory (VM)

2.1 Virtual vs. Physical Memory

  • The programmer sees virtual memory

    • Memory can be thought of as "infinite"
  • In fact, physical memory is much smaller than the programmer assumes

  • The system (system software + hardware, working together) maps virtual memory addresses to physical memory

    • The system manages the physical memory space automatically, transparently to the programmer
  • Pros: programmers don't need to know the size of physical memory or manage it

    • To the programmer, a small physical memory can look like a huge memory
  • Drawbacks: more complex system software and architecture

2.1.1 Advantages of automatic memory management

  • Programmers don't deal with physical addresses

  • Each process has its own mapping of virtual to physical addresses

  • Enables

    • Code and data can be located anywhere in physical memory (relocation)
    • Code and data of different processes are isolated/separated in physical memory (protection and isolation)
    • Code and data can be shared between multiple processes (sharing)

2.1.2 Physical memory-only systems

Examples:

  • Early systems
  • Many embedded systems

The CPU's load or store address is used to access memory directly.

image-20241111165416089

Problems with the system:

  • Limited physical memory capacity (cost)
    • Should programmers be concerned about the size of code/data blocks in physical memory?
    • Should programmers manage the movement of data from disk to physical memory?
    • Should a programmer make sure that two processes (different programs) do not use the same physical memory?
  • In addition, the ISA address space can be larger than the physical memory size.
    • For example, a 64-bit byte-addressable address space spans \(2^{64}\) bytes.
    • What if there isn't enough physical memory?

2.1.3 Difficulties with direct physical address access

Programmers need to manage physical memory space

  • Inconvenient and difficult to realize
  • More difficult to manage when there are multiple processes

Difficult to support relocation of code and data

  • Addresses are specified directly in the program, leading to a lack of flexibility

Difficulty in supporting multiple processes

  • Difficulty in achieving protection and isolation between processes
  • Problems with shared physical memory space

Difficult to support data/code sharing across processes

  • Inability to easily share data and code between different processes

2.2 Virtual Memory

  • Idea: give programmers the illusion of a large address space on top of a small physical memory

    • This way programmers don't have to worry about managing physical memory
  • Programmers can assume an "infinite" amount of physical memory.

  • Hardware and software work together to automatically manage physical memory space to provide this illusion

    • This illusion is maintained for each individual process.

2.2.1 Basic mechanisms

  • Indirection in addressing

    • A level of indirection between virtual and physical addresses
  • The address generated by each instruction in a program is a "virtual address"

    • That is, it is not the physical address used to access main memory
    • In the x86 architecture, this is called a "linear address"
  • An address translation mechanism maps virtual addresses to physical addresses

    • In the x86 architecture, the translated address is called a "real address"

    • The address translation mechanism can be realized by both hardware and software.

2.2.2 Mapping between virtual and physical memory

image-20241111175301192

virtual address space

  • Each process has its own virtual address space, labeled "virtual address space 1" and "virtual address space 2"

  • Each virtual address space can be up to 256TB in size.

  • The virtual address space is divided into virtual pages; each virtual page is 4KB.

physical address space

  • The physical memory space is divided into physical pages (frames), the same size as the virtual pages (also 4KB).

address mapping

  • The 0-4KB virtual page of process 1's virtual address space is mapped to the 8-12KB physical page of the physical address space.
  • The 0-4KB virtual page of process 2's virtual address space is mapped to the 0-4KB physical page of the physical address space.

2.2.3 Systems with virtual memory (Page based)

image-20241111192133051

This image shows how the virtual memory system works based on the paging mechanism, explaining how virtual addresses are converted to physical addresses through the page table.

  • Page Table
    • A mapping table used to convert virtual addresses to physical addresses
    • Each virtual page (a fixed-size block of the virtual address space) is mapped via the page table to a corresponding physical page (a fixed-size block of the physical address space)
    • Page tables are managed by the operating system; each process usually has its own page table, so the virtual address spaces of different processes do not interfere with each other
  • Address translation
    • Address translation is performed by looking up the page table
    • When the CPU accesses a virtual address, it consults the page table to find the corresponding physical address; if the mapping exists, the virtual address is translated and the corresponding physical memory is accessed
    • If the virtual address has no corresponding physical page, a page fault occurs, and the required page must be loaded from disk into physical memory

2.2.4 Virtual Pages, Physical Frames/Pages (VPs)

  • Virtual address space divided into pages
    • Virtual address space divided into multiple pages
  • Physical address space divided into frames
    • Physical address space divided into multiple physical frames
  • A virtual page can be mapped to:
    • A physical frame (if the page is in physical memory)
    • Somewhere on disk (if the page is not in physical memory)
  • If the accessed virtual page is not in memory but on disk:
    • The virtual memory system loads the page from disk into a physical frame and establishes the mapping -- this is called demand paging
  • The page table stores the mapping between virtual pages and physical frames

2.2.5 Physical Memory as Cache

In other words, physical memory serves as a cache for pages stored on disk.

In fact, in modern systems it is a fully associative cache (a virtual page can be mapped to any physical frame).

Similar to the caching issue we discussed earlier, there is a similar issue here:

  • Placement: how do we place or find a page in the cache?
  • Replacement: when the cache runs out of space, which page is evicted to make room?
  • Granularity of management: should pages be large, small, or of uniform size?
  • Write policy: how are write operations handled? Write-back or another strategy?

2.2.6 Virtual memory definitions

Many concepts can be analogized:

image-20241111194021657
  • Page Size: Amount of memory transferred from hard disk to DRAM at one time
  • Address translation: Determination of physical address from virtual address
  • Page table: Used to translate virtual addresses to physical addresses (and to find the location of associated data)

2.2.7 Virtual and physical addresses

image-20241111194714848
  • Most accesses take place in physical memory

  • But the program sees a large amount of virtual memory

Example:

The flexibility of mapping virtual pages to physical pages allows physical memory to be used as a cache to store currently active pages while supporting address space isolation and larger virtual address spaces.

image-20241111194850429

2.3 Address translation

  • Virtual and physical memory are divided into pages
image-20241111195251139

As shown in the above figure, both virtual and physical memory are divided into pages, each of which is 8KB in size.

  • The virtual address is divided into two parts:

    • Virtual Page Number (VPN): locates a specific page in virtual memory.

    • Page Offset: a 13-bit offset that specifies a location within the page.

  • The physical address is divided into two parts:

    • Physical Page Number (PPN): obtained by translating the VPN through the page table.
    • Page Offset: identical to the offset in the virtual address; it maps directly to the offset within the physical page.
image-20241111201447750
  • Page Table: each process has its own page table. The virtual page number (VPN) serves as an index into the page table, used to look up the corresponding Page Table Entry (PTE).
  • Page Table Entry (PTE): each PTE holds information about the page, including a valid bit and the Physical Frame Number (PFN). The valid bit indicates whether the page is in memory; a valid bit of 0 means the page is absent and triggers a page fault.
  • Physical address generation: the PFN obtained from the page table is combined with the page offset to produce the physical address.
image-20241111201828485
  • Parameter Definition:

    • \(P=2^p\): The size of the page in bytes.

    • \(N=2^n\): The size of the virtual address space.

    • \(M=2^m\): The size of the physical address space.

  • Page offset bits do not change during translation (the arithmetic is sketched below).
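To make the split concrete, here is a minimal C sketch of the arithmetic above, assuming p = 13 (8 KB pages, matching the earlier figure); the PFN value is a made-up stand-in for a page table lookup.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 13                             /* p = 13 -> P = 2^13 = 8 KB */
#define PAGE_SIZE (1u << PAGE_BITS)

int main(void) {
    uint64_t va  = 0x2A47C;                      /* example virtual address */
    uint64_t vpn = va >> PAGE_BITS;              /* virtual page number */
    uint64_t off = va & (PAGE_SIZE - 1);         /* page offset (unchanged) */

    uint64_t pfn = 0x7;                          /* hypothetical page table result */
    uint64_t pa  = (pfn << PAGE_BITS) | off;     /* physical address */

    printf("VPN=0x%" PRIx64 " offset=0x%" PRIx64 " PA=0x%" PRIx64 "\n",
           vpn, off, pa);
    return 0;
}
```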

2.3.1 Example

image-20241111200101877

As shown above:

  • System Configuration:

    • Virtual memory size:\(2\ GB = 2^{31}\ bytes\)

    • Physical memory size:\(128\ MB = 2^{27}\ bytes\)

    • Page Size:\(4\ KB = 2^{12}\ bytes\)

  • Organizational structure:

    • Virtual address: 31 bits

    • Physical address: 27 bits

    • Page offset: 12 bits

    • Number of virtual pages = \(2^{31} / 2^{12} = 2^{19}\) (19-bit virtual page number)

    • Number of physical pages = \(2^{27} / 2^{12} = 2^{15}\) (15-bit physical page number; both counts are rechecked in the sketch below)
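As a quick check, the sketch below recomputes these counts from the stated parameters; it contains no new data, only the arithmetic above.

```c
#include <stdio.h>

/* Recompute the page counts of the example:
   2 GB virtual, 128 MB physical, 4 KB pages. */
int main(void) {
    int n = 31;   /* virtual address bits:  2 GB  = 2^31 bytes */
    int m = 27;   /* physical address bits: 128 MB = 2^27 bytes */
    int p = 12;   /* page offset bits:      4 KB  = 2^12 bytes */

    printf("virtual pages:  2^%d (VPN is %d bits)\n", n - p, n - p);
    printf("physical pages: 2^%d (PPN is %d bits)\n", m - p, m - p);
    return 0;
}
```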

2.3.2 How do we translate addresses?

  • Page Table

    • Each virtual page has an entry in the page table
  • Each page table entry contains:

    • Valid bit: whether the virtual page is in physical memory (if not, it must be fetched from disk)
    • Physical page number: the location of the virtual page in physical memory
    • (Replacement strategy, dirty bits)

2.3.3 Page Table Access

  • How do we access the page table?

  • Page Table Base Register (PTBR)(CR3 in x86)

  • Page Table Limit Register (PTLR)

  • If the virtual page number (VPN) is out of range (exceeds the PTLR), the process has not been allocated that virtual page -> an exception is raised

  • The page table base register is part of the process context

    • Just like the program counter (PC), status registers, and general-purpose registers

    • It must be saved and restored when the process is context-switched

2.3.4 Page Table Address Translation Example

  • Example page table:
image-20241111202212815

2.3.4.1 Example 1

Virtual address structure: a virtual address consists of a virtual page number (VPN) and a page offset. In the figure above, the VPN of the virtual address is 0x00002 and the page offset is 0x47C.

Indexing the page table: the page table uses the VPN as an index to find the corresponding physical page number (PPN). In the figure, each row of the page table is one mapping entry, and the "V" bit indicates whether the entry is valid.

Page Table Base Register (PTBR): the page table resides in physical memory; the PTBR holds its starting address and provides the entry point for accessing the page table.

Page table provides the PPN: the page table entry contains the physical page number. For the entry whose VPN is 0x00002, the page table gives the physical page number 0x7FFF.

Address translation: the final physical address is the concatenation of the PPN and the page offset. In this example, the translated physical address is 0x7FFF47C.

Page offset unchanged: during address translation, the page offset passes through unchanged from the virtual address to the physical address.

2.3.4.2 Example 2

  • We first need to find the page table entry (PTE) for the given VPN.
  • The PTE is located at address PTBR + VPN * PTE-size.

Problem: what is the physical address of virtual address 0x5F20?

image-20241111204359606

Problem: what is the physical address of virtual address 0x73E0? The PTE is invalid, so the page must be fetched from disk. (The lookup is sketched in code after the figure below.)

image-20241111204535375
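The lookup in these two examples can be sketched as follows: a toy single-level table held in an ordinary C array, where indexing the array plays the role of reading memory at PTBR + VPN * PTE-size. The VPN-to-PPN mapping installed in main is made up for illustration, not the one in the figures.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12                       /* 4 KB pages */

typedef struct {
    bool     valid;                        /* V bit */
    uint32_t ppn;                          /* physical page number */
} pte_t;

static pte_t page_table[1 << 8];           /* toy table: 8-bit VPNs */

/* Returns true and fills *pa on success; false means the PTE is
   invalid and the page must be fetched from disk (a page fault). */
bool translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_BITS;
    uint32_t off = va & ((1u << PAGE_BITS) - 1);
    pte_t *pte = &page_table[vpn];         /* "PTBR + VPN * PTE-size" */
    if (!pte->valid)
        return false;
    *pa = (pte->ppn << PAGE_BITS) | off;
    return true;
}

int main(void) {
    page_table[0x5] = (pte_t){ .valid = true, .ppn = 0x2 };  /* made-up mapping */
    uint32_t pa;
    if (translate(0x5F20, &pa))
        printf("0x5F20 -> 0x%X\n", pa);    /* 0x2F20 under this mapping */
    if (!translate(0x73E0, &pa))
        printf("0x73E0 -> page fault (invalid PTE)\n");
    return 0;
}
```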

2.3.5 Memory Hierarchy Properties

  • Virtual memory pages can be placed anywhere in physical memory (fully associative)

  • Replacement is usually done with an LRU policy (page misses are so costly that it is worth effort to minimize them)

  • The page table (indexed by virtual page number) translates virtual page numbers to physical page numbers

  • The memory-disk hierarchy can be inclusive or exclusive; the write policy is write-back

3. Some challenges to virtual memory

3.1 Page Table Challenges

  • Challenge 1: Page table is large

    • At least some of it needs to be located in physical memory
    • Solution: Multi-level page table
  • Challenge 2: at least two memory accesses per instruction fetch or data access:

    1. One for address translation (reading the page table)
    2. One to access the data using the physical address (after translation)
    • Two memory accesses per instruction fetch or data access can significantly slow execution
      • Hence address translation must be accelerated
  • Challenge 3: when should translation be performed relative to cache access?

3.2 Challenge 1: Page Table Size

image-20241111211241724

The 64-bit virtual address (VA) is divided into two parts, the first 52 bits are the virtual page number (VPN), and the last 12 bits are the page offset, which is converted to a 40-bit physical address (PA) after address translation.

  • Assuming a 64-bit VA and a 40-bit PA, how big is the page table?
    • \(2^{52}\ entries \times 4\ bytes = 2^{54}\ bytes\)
    • And that's just for one process! And that process may not be using the entire VM space!

Solution: Multi-level page table

Organize the page tables into a hierarchy so that only a small portion of the first level of the page table needs to be located in physical memory.

  • The first-level page table must be located in physical memory
  • Only the needed second-level page tables are kept in physical memory
image-20241111214639724
  • For an N-level page table, finding a PTE requires N accesses to the page table
image-20241111215204949

x86 architecture example:

image-20241111212810473

The figure above shows how a linear address is mapped to a physical address through a two-level page table. The example assumes 4KB pages and a 32-bit physical address space.

  • Linear address: a linear address consists of a directory (Dir) field, a table field, and an offset field. These fields index the paging structures to locate the physical address.

  • Page directory and page table

    • Page Directory: the "Dir" field of the linear address indexes the page directory to find the corresponding page directory entry (Pg. Dir. Entry).

    • Page Table: the directory entry points to a page table; the "Table" field of the linear address indexes that page table to find the corresponding page table entry (Pg. Tbl. Entry).

  • physical address

    • Page table entries contain addresses to specific physical pages. Once the physical page is found in the page table entry, the "offset" portion of the linear address is added to give the complete physical address.
  • CR3 Register: The CR3 register holds the physical address of the page directory and is used to locate the page directory during address translation.

Advantage: this method is particularly effective when the full virtual address space is not used, because each level of the page table is allocated only when needed, reducing wasted memory. A sketch of the two-level walk follows.
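Here is a C sketch of the two-level walk under the x86-like split above (10-bit directory index, 10-bit table index, 12-bit offset). Arrays and pointers stand in for physical memory and CR3, and entry formats are simplified to a present bit plus a PPN, so this is illustrative, not the real x86 encoding.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DIR_BITS 10
#define TBL_BITS 10
#define OFF_BITS 12

typedef struct { uint32_t ppn; int present; } pte_t;
typedef struct { pte_t *table; } pde_t;          /* pointer stands in for a PFN */

static pde_t page_directory[1 << DIR_BITS];      /* located via CR3 on real x86 */

/* Two memory accesses: one for the directory entry, one for the table entry. */
int translate(uint32_t linear, uint32_t *pa) {
    uint32_t dir = linear >> (TBL_BITS + OFF_BITS);               /* bits 31..22 */
    uint32_t tbl = (linear >> OFF_BITS) & ((1u << TBL_BITS) - 1); /* bits 21..12 */
    uint32_t off = linear & ((1u << OFF_BITS) - 1);               /* bits 11..0  */

    pde_t *pde = &page_directory[dir];           /* access 1: page directory */
    if (!pde->table) return -1;                  /* second-level table absent */
    pte_t *pte = &pde->table[tbl];               /* access 2: page table */
    if (!pte->present) return -1;                /* page fault */
    *pa = (pte->ppn << OFF_BITS) | off;
    return 0;
}

int main(void) {
    /* Allocate a second-level table only when needed -- the memory saving. */
    page_directory[0].table = calloc(1u << TBL_BITS, sizeof(pte_t));
    page_directory[0].table[3] = (pte_t){ .ppn = 0x42, .present = 1 };

    uint32_t pa;
    if (translate(0x00003ABC, &pa) == 0)
        printf("0x00003ABC -> 0x%X\n", pa);      /* 0x42ABC */
    return 0;
}
```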

x86-64's 4-level page table:

image-20241111221302834

3.3 Challenge 2: At least two memory accesses

  • Idea: use a hardware structure that caches page table entries (PTEs) → the Translation Lookaside Buffer (TLB)
  • What should be done on a TLB miss?
    • Which TLB entry should be replaced?
    • Who handles the TLB miss? Software or hardware?
  • What should be done on a page fault?
    • Which virtual page should be evicted from physical memory?
    • Who handles the page fault? Software or hardware?

3.3.1 Translation lookaside buffer (TLB)

  • Idea: caching page table entries (PTEs) in the processor's hardware fabric to speed up address translation

  • Translation lookaside buffer(TLB):

    • Cache the most recently used PTE

    • Reduce the number of memory accesses required for most instruction fetches and data accesses to just one

  • Page table accesses have high temporal locality

    • Data accesses have temporal and spatial locality
    • Pages are large (e.g., 4KB, 8KB, or even 1-2GB)
    • Consecutive instructions and accesses are very likely to touch the same page
  • TLB

    • Small: accessed in ~1 cycle
    • Typically 16 - 512 entries
    • Highly associative
    • \(>95-99\%\) Typical hit rate (depending on workload)
    • Reduce the number of memory accesses for most instruction fetches and loads/stores to just one

3.3.2 Accelerating Address Translation with TLBs

  • Essentially a cache of recent address translations
    • Avoid going into the page table with every reference
image-20241111222843357
  • Index: the low bits of the virtual page number (VPN) are used as the index to quickly find the corresponding entry.

  • Tag: the VPN bits not used for the index, together with the process ID, form the tag, ensuring the virtual page is uniquely identified.

  • Data: Stores the contents of the Page Table Entry (PTE), which is the physical page number (PPN).

  • Status: Records status information such as the validity of the entry and whether it has been modified (dirty).

image-20241111224033028

Process:

  • The input virtual address consists of a VPN and a page offset.

  • The VPN is used to index the TLB, and the tag is checked to verify that the entry matches.

  • If a valid entry is found, the physical page number is combined with the page offset to get the physical address.

  • If no valid entry is found (a TLB miss), the page table must be walked (sketched in code below).
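A direct-mapped sketch of this lookup in C (real TLBs are usually highly associative; the sizes and field split here are made up): the low VPN bits select the entry, and the remaining VPN bits plus an address-space ID (ASID) form the tag.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS   12
#define TLB_ENTRIES 64                       /* made-up size, direct-mapped */

typedef struct {
    bool     valid;
    uint32_t vpn_tag;                        /* high bits of the VPN */
    uint32_t asid;                           /* process/address-space ID */
    uint32_t ppn;                            /* cached PTE payload */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true on a TLB hit (one memory access total for the load/store);
   false means the page table must be walked and the entry refilled. */
bool tlb_translate(uint32_t va, uint32_t asid, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_BITS;
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];         /* index: low VPN bits */
    if (e->valid && e->vpn_tag == vpn / TLB_ENTRIES   /* tag: high VPN bits  */
                 && e->asid == asid) {
        *pa = (e->ppn << PAGE_BITS) | (va & ((1u << PAGE_BITS) - 1));
        return true;
    }
    return false;                                     /* TLB miss */
}

/* Refill an entry after a page table walk. */
void tlb_fill(uint32_t va, uint32_t asid, uint32_t ppn) {
    uint32_t vpn = va >> PAGE_BITS;
    tlb[vpn % TLB_ENTRIES] =
        (tlb_entry_t){ true, vpn / TLB_ENTRIES, asid, ppn };
}

int main(void) {
    uint32_t pa;
    tlb_fill(0x5F20, /*asid=*/1, /*ppn=*/0x2);
    if (tlb_translate(0x5F20, 1, &pa)) printf("hit: PA=0x%X\n", pa);
    if (!tlb_translate(0x5F20, 2, &pa)) printf("other ASID: miss\n");
    return 0;
}
```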

3.3.3 Handling TLB Misses

  • TLB capacity is too small to hold all PTEs
    • Some translations will inevitably miss in the TLB
    • Memory must be accessed to find the correct PTE
      • Called a page table walk (walking the page directory/tables)
      • A high performance cost
  • Who handles the TLB miss? Hardware or software?

Approach 1: Hardware-managed (e.g. x86)

  • Hardware performs the page walk
  • Hardware fetches the PTE and inserts it into the TLB
    • If the TLB is full, the new entry replaces an existing entry
  • Done transparently to system software

Approach 2: Software-managed (e.g. MIPS)

  • Hardware raises an exception
  • The operating system performs the page walk
  • The operating system fetches the PTE
  • The operating system inserts/removes entries in the TLB

3.4 Challenge 3: When to Perform Translation Relative to Cache Access

  • Relationship between TLB and L1 caches
    • Address translation and caching
  • When is address translation performed?
    • Before or after accessing the L1 cache?

4. Virtual memory support and examples

4.1 Virtual memory support

Virtual memory requires both hardware and software support:

  • Page tables are stored in memory
  • and can be cached in a special hardware structure called the Translation Lookaside Buffer (TLB)

The hardware component is called the MMU (memory management unit)

  • It includes the page table base register, TLBs, and page table walkers

The software's responsibility is to use the MMU to:

  • Populate the page table and decide which pages to replace in physical memory
  • Change page table registers on process switch (to use the page table of the currently running thread)
  • Handle page faults and ensure correct mappings

4.2 Address Translation

Page size is specified by the ISA (Instruction Set Architecture)

  • Common page sizes today: 4KB, 8KB, 2GB ... (a mix of small and large pages)
  • Tradeoffs? (analogy to cache block)

Each virtual page has a page table entry (PTE)

  • What is contained in a Page Table Entry (PTE)?

4.3 Contents of Page Table Entry (PTE)

A page table is a "Tag Store" for physical memory data storage:

  • A table of mappings between virtual and physical memory

A Page Table Entry (PTE) is a "Tag Store Entry" for a virtual page in memory:

  • A valid bit is required to indicate validity/existence in physical memory (corresponding to the "Present/Absent" bit in the diagram).
  • Tag bits (Physical Frame Number PFN) are required to support address translation (corresponding to the "Frame Number" bit in the diagram).
  • Requires bits to support the replacement mechanism (corresponding to the "Reference" bit in the diagram).
  • A dirty bit is required to support "write-back caching".
  • Protection bits are required to enable access control and protection.
Page Table Entry
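These bullets can be summarized as a C bitfield. This is a minimal sketch; the field widths below are illustrative only, not any real architecture's PTE encoding.

```c
#include <stdint.h>

typedef struct {
    uint64_t present    : 1;   /* valid bit: page is in physical memory */
    uint64_t writable   : 1;   /* protection bits: access control       */
    uint64_t user       : 1;
    uint64_t referenced : 1;   /* supports the replacement policy       */
    uint64_t dirty      : 1;   /* supports write-back to disk           */
    uint64_t pfn        : 40;  /* physical frame number (the "tag")     */
    uint64_t reserved   : 19;
} pte_t;

_Static_assert(sizeof(pte_t) == 8, "PTE should pack into 64 bits");
```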

4.4 Page Fault ("A Miss in Physical Memory")

If the page is not in physical memory but on disk:

  • A Page Table Entry indicates that the virtual page is not in memory.
  • Accessing such pages triggers a Page Fault Exception.
  • The operating system's exception handler is called to move the data from disk to memory.
  • During this time, other processes can continue to execute.
  • The operating system has complete control over the placement of data.
image-20241111231237880

Before Fault (before a page fault occurs):

  • The CPU attempts to access a virtual address, and the corresponding page table entry indicates that the page is not in physical memory.
  • The pages are on disk, so there are no valid physical addresses in the page table.
  • The operating system detects the page fault, triggers the exception handler, and loads the page from disk into physical memory.

After Fault (after the page fault has been handled):

  • After the handler completes, the page is successfully loaded into physical memory.
  • The page table entries are updated and now point to actual addresses in physical memory.
  • When the CPU accesses the virtual address again, it can find the physical address directly through the updated page table.

4.5 Address Translation: Page Hit

image-20241111231719428

The above figure shows how the processor translates the virtual address to a physical address through the MMU (Memory Management Unit) and fetches the data from the cache or memory.

  1. The processor sends a virtual address (VA) to the MMU: the processor generates a virtual address and sends it to the MMU to request data stored in memory.

  2-3. The MMU fetches the PTE from the page table in memory: the MMU uses the virtual address to look up the page table entry (PTE) and obtain the virtual-to-physical mapping. If the PTE is not in the TLB, the MMU fetches it from memory.

  4. The MMU generates the physical address (PA) and sends it to the L1 cache: using the information in the PTE, the MMU translates the virtual address into a physical address, which is passed to the L1 cache to speed up the data access.

  5. The L1 cache returns the data word to the processor: the L1 cache reads the data at the physical address and passes the data word back to the processor, completing the request.

4.6 Address Translation: Page Fault

image-20241112172701734

The above figure shows the process of handling a Page Fault exception that occurs during address translation.

  1. The processor sends a virtual address (VA) to the MMU: The processor generates a virtual address and sends it to the MMU, which is responsible for translating the virtual address into a physical address.

  2. MMU gets page table entry address from page table (PTEA): The MMU looks up the corresponding page table entry address (PTEA) through the virtual address to obtain the page table entry (PTE).

  3. MMU gets page table entries (PTEs) from the page table: The MMU accesses page table entries (PTEs) in memory to determine whether a virtual address is in physical memory.

  4. Valid bit is 0, so the MMU raises a page fault exception: if the valid bit in the page table entry is 0, indicating the page is not in physical memory, the MMU raises a Page Fault exception and hands control to the page fault exception handler.

  5. The exception handler picks a victim page and writes it back to disk if it is dirty: the handler selects a page as the "Victim Page" and, if that page is dirty, writes its contents back to disk to free the frame.

  6. The exception handler loads a new page from disk and updates the page table entry (PTE) in memory: The exception handler loads the required pages from disk into physical memory and updates the page table entries so that the virtual addresses can be correctly mapped to the new physical addresses.

  7. The handler returns to the original process, which re-executes the faulting instruction: once the page load completes, the processor re-executes the instruction that triggered the page fault and finishes the access using the correct physical address. (Steps 5-7 are sketched in code below.)
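As a compact summary, steps 5-7 can be written as OS-style pseudocode in C. Every type and helper here (frame_t, select_victim, write_page_to_disk, read_page_from_disk) is a hypothetical stand-in for real kernel machinery, not an actual API.

```c
#include <stdint.h>

typedef struct pte   { int present, dirty; uint64_t pfn; } pte_t;   /* simplified */
typedef struct frame { pte_t *mapped_by; uint64_t number; } frame_t;

/* Hypothetical stand-ins for real kernel machinery: */
frame_t *select_victim(void);                 /* e.g., the CLOCK algorithm (4.9) */
void write_page_to_disk(frame_t *f);
void read_page_from_disk(uint64_t va, frame_t *f);

void handle_page_fault(pte_t *pte, uint64_t faulting_va) {
    /* Step 5: pick a victim frame; write it back only if it is dirty. */
    frame_t *victim = select_victim();
    if (victim->mapped_by && victim->mapped_by->dirty)
        write_page_to_disk(victim);
    if (victim->mapped_by)
        victim->mapped_by->present = 0;        /* unmap the victim's old page */

    /* Step 6: load the needed page from disk and update the faulting PTE. */
    read_page_from_disk(faulting_va, victim);
    pte->pfn = victim->number;
    pte->present = 1;
    victim->mapped_by = pte;

    /* Step 7: return from the exception; the faulting instruction is
       re-executed and now finds the page in physical memory. */
}
```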

4.7 Caching vs. Demand Paging

  • Physical memory (DRAM) is a cache of Disk

    • Typically managed by system software through a virtual memory subsystem
  • Page replacement is similar to cache replacement

  • Page tables are "tag stores" for physical memory data storage

  • What's the difference?

    • Access Speed Requirements for Cache vs. Physical Memory
    • Number of blocks in cache and physical memory
    • "Tolerable" time to find alternative candidate blocks (Disk vs. memory access latency)
    • The role of hardware and software

4.8 Page Replacement Algorithm

  • If physical memory is full (i.e., the free physical page list is empty), which physical page should be replaced on a page fault?

  • Is it feasible to use True LRU?

    • With 4GB of RAM and 4KB pages, how many orderings are possible?
    • There are \(4\ GB / 4\ KB = 2^{20} \approx 10^6\) pages, so the number of possible orderings is the factorial \(2^{20}!\)
  • Modern systems use approximations of LRU

    • For example, the CLOCK algorithm
  • And more sophisticated algorithms that also take usage "frequency" into account

    • e.g., the ARC algorithm

4.9 Clock Page Replacement Algorithm

image-20241112230721842

The figure above illustrates the CLOCK page replacement algorithm, which decides which page to evict to make room for a new page when memory runs low. Details:

  1. Circular list: the operating system maintains a circular list of the physical frames in memory. The list is like the face of a clock, with each physical frame corresponding to a position on the dial.
  2. Pointer (Hand): The CLOCK algorithm uses a "pointer" or "hand" that points to the most recently inspected frame. When a page replacement is needed, the pointer scans the list clockwise, looking for a suitable page to replace.
  3. Reference Bit (R Bit): When a page is accessed, the reference bit (R bit) in its Page Table Entry (PTE) is set to 1 to indicate that the page has been used recently.
  4. Replacement rules
    • When a page needs to be replaced, the CLOCK algorithm scans the list clockwise, starting with the physical frame pointed to by the pointer. If it finds a frame with a reference bit (R bit) of 0, it is selected as the replacement target.
    • If the R bit of the scanned frame is 1, it is cleared to zero and the pointer continues to be moved to find the next frame until a frame with an R bit of 0 is found.
  5. Pointer update: When a suitable page is found for the replacement, the pointer points to the next frame, ready for the next replacement.

Advantages of the CLOCK algorithm:

  • The CLOCK algorithm is an improved FIFO replacement algorithm: it uses the reference bit to judge whether a page has been accessed recently, avoiding frequent eviction of active pages.
  • By sweeping the circular list, CLOCK achieves an "approximate LRU" replacement at far lower cost than true LRU; a minimal sketch follows.
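A minimal C sketch of the scan described above, assuming a fixed array of frames treated as a circle and one R bit per frame (the frame count and fields are made up):

```c
#define NFRAMES 8                      /* made-up number of physical frames */

typedef struct {
    int r_bit;                         /* reference bit, set on access      */
    int page;                          /* which page occupies the frame     */
} frame_t;

static frame_t frames[NFRAMES];
static int hand = 0;                   /* the clock hand: next frame to inspect */

/* Hardware (or the OS) sets the R bit whenever a resident page is accessed. */
void touch(int f) { frames[f].r_bit = 1; }

/* Scan clockwise: clear R bits that are 1 (second chance) and evict the
   first frame whose R bit is already 0. Terminates within two sweeps. */
int clock_select_victim(void) {
    for (;;) {
        if (frames[hand].r_bit == 0) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;   /* hand moves past the victim */
            return victim;
        }
        frames[hand].r_bit = 0;            /* recently used: second chance */
        hand = (hand + 1) % NFRAMES;
    }
}
```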