Physical structure allocation of the file system

In the computer world, files are abstract collections of data that give users an intuitive way to work with data. The data in these files must ultimately be stored on a concrete physical device, such as an HDD, SSD, or USB flash drive. These storage devices map their physical media, through a device controller, into a huge, randomly addressable address space, which we can think of as an oversized array. A device can store many files, so how to map multiple abstract files onto this huge space is an important question. The design of the file system plays a crucial role here: it must lay files out on the physical storage medium in a reasonable way to ensure efficient storage and access. In this article, we will introduce several classic file allocation methods and their advantages and disadvantages.

Basic concepts of physical storage: blocks and clusters

Once the hard disk has been abstracted by the device controller into an oversized array, the block is usually the smallest unit of the storage medium from the device's point of view. Each block has a fixed size, e.g. 4KB, determined by the storage medium itself, and blocks are the smallest allocation and management unit of the storage device. To make better use of space and reduce management overhead, the system can further combine blocks into larger storage units called clusters. A cluster usually consists of several consecutive blocks. The cluster size is typically specified at format time and cannot be changed afterwards; a cluster contains at least one block, and can be set to contain several. After formatting is complete, the cluster becomes the smallest allocation unit from the file system's perspective.

Cluster size directly affects storage efficiency and file system performance. Large clusters (e.g., 32KB or 64KB) mean that each cluster holds more data, which reduces the file system's overhead in managing clusters and yields good read and write performance; but even a 1-byte file still occupies a whole cluster, so small files waste a lot of space, a problem known as internal fragmentation. Conversely, small clusters (e.g., 512 bytes or 4KB) reduce internal fragmentation but increase the complexity and processing overhead of cluster management. The choice of cluster size is therefore an important trade-off between storage efficiency and management overhead.
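
To make the trade-off concrete, here is a minimal Python sketch (the file sizes are made up for illustration) that computes how much space is lost to internal fragmentation at different cluster sizes:

    # Internal fragmentation: space allocated beyond a file's actual size,
    # because allocation happens in whole clusters.
    def allocated_bytes(file_size: int, cluster_size: int) -> int:
        """Round a file's size up to a whole number of clusters."""
        clusters = -(-file_size // cluster_size)  # ceiling division
        return clusters * cluster_size

    files = [1, 700, 5_000, 40_000]  # hypothetical file sizes in bytes
    for cluster in (512, 4096, 32768):
        wasted = sum(allocated_bytes(f, cluster) - f for f in files)
        print(f"cluster={cluster:>5}B  wasted={wasted:>6}B")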

Contiguous allocation: the simplest scheme, but hard to scale

The storage device has been divided into a huge array addressed in clusters, and files have been divided into clusters, so next we need to think about how to arrange their locations.

Contiguous Allocation is the simplest form of physical allocation in a file system. It stores a file in a set of contiguous clusters. The file record only needs to hold the file's starting position and length; to access the file, you simply read the specified length of data starting from that position. Because contiguously allocated files are packed tightly on the physical disk, random access performance is excellent, and sequential access to large files is especially fast. When adding a new file, you simply find a contiguous region large enough to hold it and write it there.
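
The reason random access is so cheap here fits in two lines: finding the cluster that holds any byte offset is pure arithmetic on the file record. A minimal sketch, with illustrative cluster numbers:

    # Contiguous allocation: the file record stores only (start, length),
    # so locating the cluster for any byte offset is O(1) arithmetic.
    CLUSTER_SIZE = 4096

    def cluster_for_offset(start_cluster: int, offset: int) -> int:
        """Map a byte offset within the file to its on-disk cluster number."""
        return start_cluster + offset // CLUSTER_SIZE

    # A file starting at cluster 100: byte 10_000 lives in cluster 102.
    print(cluster_for_offset(100, 10_000))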

However, contiguous allocation has several fatal drawbacks:

  1. External fragmentation: As files are created and deleted, disk space gets cut up into many small, discrete pieces. Eventually, even though the total free space is sufficient, no single contiguous region may be large enough to hold a new file.
  2. Fixed file size: Since a file must live in contiguous space, its size has to be determined at creation time and cannot grow dynamically. If a file is allowed to grow, it may overwrite the data of neighboring files.
  3. Frequent data movement: If a file needs to expand and the space after it is insufficient, the whole file may have to be moved to a larger contiguous region, causing a large amount of data movement and hurting performance.

These drawbacks mean contiguous allocation is rarely used in modern operating systems, especially under high-frequency file operations. However, in some specific cases, such as sequential access storage media like magnetic tape, contiguous allocation remains a valid option.

Simple linked allocation: an attempt to solve the fragmentation problem

To address external fragmentation, an improved approach is Linked Allocation. In linked allocation, the file is divided into blocks, and each block stores a pointer to the next, forming a linked list. The file record only needs to store the address of the file's first block (and possibly its last block), and the remaining blocks are reached by following the pointers. This discrete allocation method eliminates external fragmentation.
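
The cost of this scheme is easiest to see in code. A minimal sketch (the block numbers are made up) that follows a file's chain; note that reaching the n-th block means reading every block before it:

    # Linked allocation: each block stores a pointer to the next one, so
    # reaching block n means visiting every block before it.
    next_ptr = {7: 19, 19: 4, 4: 52, 52: None}  # hypothetical on-disk chain

    def nth_block(first_block: int, n: int) -> int:
        """Follow the chain n hops from the first block: O(n) disk reads."""
        block = first_block
        for _ in range(n):
            block = next_ptr[block]  # each hop would be one disk read
        return block

    print(nth_block(7, 3))  # -> 52, after touching blocks 7, 19 and 4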

However, linked allocation has obvious drawbacks:

  1. No random access: Since blocks are linked by pointers, it is impossible to jump directly to a specific block; the file's contents can only be read sequentially. To reach a specific position in a file, you must traverse the chain from the beginning.
  2. Reliability issues: Since the blocks are linked by pointers, once one block's pointer is corrupted, the entire remainder of the file becomes inaccessible.
  3. Storage overhead: Every block must reserve space for an extra pointer, which adds storage overhead; the smaller the blocks, the larger the fraction of space the pointers consume.

Linked allocation solves the external fragmentation problem, but the lack of random access and the per-block storage overhead limit its use in modern operating systems as well.

Explicit linked allocation: the File Allocation Table (FAT)

Another way to improve on linked allocation is Explicit Linked Allocation: the next-block pointer of every block is stored in a dedicated table instead of inside each block. This table is the File Allocation Table (FAT). The FAT is a structure that centrally records, for each cluster, which cluster follows it, so all the clusters of a file can be found step by step starting from the file's first cluster.

Under the FAT structure, the file record still only needs to store the file's starting block position. This design makes it much easier to jump between blocks within a file; although FAT still does not provide truly random access, the jumps are much faster than with simple linked allocation. To improve access speed further, operating systems often cache the FAT in memory, reducing the need for frequent disk reads during file access.
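
A minimal sketch of this idea (the end-of-chain marker and cluster numbers are illustrative, not the real on-disk FAT format): once the table is cached in memory, recovering a file's entire cluster chain requires no disk reads at all.

    # FAT: next-cluster pointers live in one central table rather than
    # inside the data clusters themselves.
    EOC = -1  # hypothetical end-of-chain marker

    fat = {2: 9, 9: 3, 3: 11, 11: EOC}  # fat[c] = cluster following c

    def file_clusters(start_cluster: int) -> list[int]:
        """Collect every cluster of a file by chasing FAT entries in memory."""
        chain, c = [], start_cluster
        while c != EOC:
            chain.append(c)
            c = fat[c]
        return chain

    print(file_clusters(2))  # -> [2, 9, 3, 11]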

The FAT structure has the following characteristics:

  1. Centralized management of linking information: By gathering all cluster pointers into a single table for the whole file system, FAT makes file management much simpler; it is especially effective for tracking which clusters are free when space is allocated and reclaimed.
  2. Improved access efficiency: Since the FAT can be cached in memory, file access is significantly faster, especially when jumping around within a file.
  3. File size limits: The cluster size and the pointer width determine the maximum size of a single file. Suppose a FAT file system uses 16-bit pointers to identify clusters and the cluster size is 4KB: the maximum number of addressable clusters is 2^16 = 65536, so the maximum file size the file system supports is 65536 * 4KB = 256MB.

The FAT family of file systems was first designed in 1977; it has evolved many times and is still used today, especially in embedded devices and flash storage (such as USB flash drives). exFAT, introduced by Microsoft in 2006 and designed specifically for flash memory devices, aims to overcome the limitations of FAT32 while maintaining compatibility, extending the basic scheme described above.

Index allocation: truly random access

With discrete storage in place, a linked list is clearly not the only option: the location of every block can instead be recorded in a variable-length array inside the file record, which enables truly random access. Such an array is called an index, and this type of allocation is called Indexed Allocation. In index allocation, the file system maintains an index block for each file, which stores the physical location of every data block of that file. By consulting this index block, the operating system can locate any block of the file directly, achieving fast random access. Incidentally, an index can be implemented as an array, a binary tree, or a hash table; that choice is not the focus of this article.

A Single-Level Index is the simplest form of index allocation: each file has a separate index block recording the locations of all of its data blocks. This makes file access very flexible, and for scenarios requiring frequent random access, index allocation has significant advantages. In addition, since the index block centrally stores all block locations, growing or deleting a file simply means updating its index block, with no need to perform complex move operations on the physical blocks.

For example, assuming a single-level indexing system where each index block is 4KB in size and each pointer occupies 4 bytes, an index block can hold up to 4KB / 4B = 1024 pointers. If each data block is also 4KB in size, then the maximum file size supported by this file system is 1024 * 4KB = 4MB.
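
A minimal sketch of both points, the O(1) lookup and the 4MB ceiling, using the same numbers (the block pointers are made up):

    # Single-level index: the index block is an array of block pointers,
    # so any byte offset resolves with a single lookup.
    BLOCK_SIZE = 4096  # 4KB data blocks
    PTR_SIZE = 4       # 4-byte pointers

    index_block = [88, 13, 907, 4]  # hypothetical index block for one file

    def block_for_offset(offset: int) -> int:
        """Random access: one array lookup, no chain traversal."""
        return index_block[offset // BLOCK_SIZE]

    print(block_for_offset(9_000))                # -> 907 (the third block)
    print((BLOCK_SIZE // PTR_SIZE) * BLOCK_SIZE)  # -> 4194304 bytes = 4MB cap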

As you can see, single-level indexes are suitable for small to medium-sized files. However, the index block size is limited, which means that single-level indexes have a low upper file size limit and can only be used for smaller files. For very large files, other multi-level indexing schemes need to be used.

Multilevel indexing

Since the size of a file record is usually limited, file systems introduced Multi-Level Indexing, also called indirect indexing. Multilevel indexing manages the locations of a file's data blocks through multiple levels of index blocks. For example, Two-Level Indexing uses a single primary index block that points to multiple secondary index blocks, which in turn record the locations of the file's data blocks. This design allows the file system to store much larger files than a single-level index.

Taking the two-level index as an example: the index stored in the file record is the primary index, and the blocks it points to are not file data but further index blocks, called secondary indexes; it is these that record the physical locations of the file's clusters.
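
A minimal sketch of a two-level lookup (the nested lists stand in for on-disk index blocks, and the block numbers are made up): resolving any offset always costs exactly two index reads.

    # Two-level index: the primary index points at secondary index blocks,
    # and only the secondary blocks hold data-block numbers.
    BLOCK_SIZE = 4096
    PTRS_PER_BLOCK = 1024  # 4KB index block / 4B pointer

    def two_level_lookup(primary: list[list[int]], offset: int) -> int:
        """Resolve a byte offset with exactly two index-block reads."""
        block_index = offset // BLOCK_SIZE
        secondary = primary[block_index // PTRS_PER_BLOCK]  # read 1
        return secondary[block_index % PTRS_PER_BLOCK]      # read 2

    primary = [[500, 501, 502]]  # hypothetical: one secondary block so far
    print(two_level_lookup(primary, 8_192))  # logical block 2 -> block 502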

Hybrid Indexing: The Mainstream of Modern File Systems

Multilevel indexes raise the upper limit on the size of a single file, but if every file record used a multilevel index, even a 1-byte file would occupy at least one block at each index level. The smaller the file, the larger the relative overhead of the index itself.

To balance the storage needs of small and large files, modern file systems introduced Hybrid Indexing. Hybrid indexing combines the advantages of direct pointers, indirect indexing, and multilevel indexing. For example, in some file systems the file record contains a number of pointers directly to data blocks (for small files), plus pointers to index blocks (for larger files). This design lets the file system handle a wide range of file sizes efficiently, reducing the management overhead of small files while supporting efficient access to large ones.

For example, suppose a file system uses the hybrid indexing approach, where each inode contains 12 direct pointers, 1 single indirect pointer, 1 double indirect pointer, and 1 triple indirect pointer. Assume each data block is 4KB and each pointer occupies 4 bytes. Then:

  • Direct pointers: point directly to 12 data blocks, covering 12 * 4KB = 48KB. Files under 48KB need no indirect blocks at all.
  • Single indirect pointer: points to an index block holding 4KB / 4B = 1024 pointers, each pointing to a data block, so it covers 1024 * 4KB = 4MB of data. Files under 48KB + 4MB need at most one level of indirection.
  • Double indirect pointer: points to an index block whose entries each point to another index block, covering 1024 * 1024 data blocks, i.e. 1024 * 1024 * 4KB = 4GB of data.
  • Triple indirect pointer: by the same logic, covers 1024 * 1024 * 1024 data blocks, i.e. 1024 * 1024 * 1024 * 4KB = 4TB of data.

As a result, the maximum file size supported by this hybrid index structure is approximately 48KB + 4MB + 4GB + 4TB. The hybrid design thus lets the file system handle small files efficiently while still supporting very large files.
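
The arithmetic behind those thresholds fits in a short sketch, using the example's layout (12 direct pointers, 4KB blocks, 1024 pointers per index block), that reports which pointer level serves a given logical block number:

    # Hybrid index: decide which indirection level covers logical block n,
    # following the 12-direct-pointer inode layout from the example above.
    DIRECT = 12
    PTRS = 1024  # pointers per 4KB index block

    def level_for_block(n: int) -> str:
        """Return the indirection level that covers logical block n."""
        if n < DIRECT:
            return "direct"
        n -= DIRECT
        if n < PTRS:
            return "single indirect"
        n -= PTRS
        if n < PTRS ** 2:
            return "double indirect"
        n -= PTRS ** 2
        if n < PTRS ** 3:
            return "triple indirect"
        raise ValueError("offset beyond the maximum file size")

    print(level_for_block(5))          # -> direct          (within 48KB)
    print(level_for_block(600))        # -> single indirect (within 48KB + 4MB)
    print(level_for_block(1_000_000))  # -> double indirect (within ... + 4GB)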

The hybrid indexing approach is widely used in the vast majority of modern file systems, for example:

  • NTFS (New Technology File System): the default file system of Windows, which uses a hybrid index structure to manage files flexibly.
  • HFS+ (Hierarchical File System Plus): the file system formerly used by macOS, managing files through index nodes and hybrid indexes.
  • APFS (Apple File System): Apple's newer file system for macOS, iOS, and other devices, which also uses hybrid indexing to improve the efficiency and flexibility of file access.
  • ext4 (Fourth Extended File System): a default file system on many Linux distributions, which uses a hybrid index structure, extended with multi-level indexing and journaling, to improve file management efficiency and reliability.

The advantage of the hybrid index structure lies in its adaptability to both small and large files, enabling the file system to provide efficient storage and access performance in various application scenarios. Therefore, hybrid indexes have become the mainstream choice for modern file system design. In this way, file systems are not only able to manage storage space efficiently, but also provide flexible random access capabilities to meet the needs of different users and applications.