Solution #1: The File System Checker

In this lesson, we look at a first solution to the crash consistency problem: the file system checker.

What is fsck

Early file systems took a simple approach to crash consistency. Basically, they decided to let inconsistencies happen and then fix them later (when rebooting). A classic example of this lazy approach is found in a tool that does this: fsck (pronounced either “eff-ess-see-kay”, “eff-ess-check”, or, if you don’t like the tool, “eff-suck”; yes, serious professional people use this term). fsck is a UNIX tool for finding such inconsistencies and repairing them; similar tools to check and repair a disk partition exist on different systems. Note that such an approach can’t fix all problems. Consider, for example, the case above where the file system looks consistent but the inode points to garbage data. The only real goal is to make sure the file system metadata is internally consistent.

What does fsck do

The tool fsck operates in a number of phases, as summarized in McKusick and Kowalski’s paper (“Fsck – The UNIX File System Check Program” by Marshall Kirk McKusick and T. J. Kowalski, revised in 1996; it describes the first comprehensive file-system checking tool, the eponymous fsck, and was written by some of the same people who brought you FFS). It is run before the file system is mounted and made available (fsck assumes that no other file-system activity is on-going while it runs). Once finished, the on-disk file system should be consistent and thus can be made accessible to users.

Here is a basic summary of what fsck does:

  • Superblock: fsck first checks if the superblock looks reasonable, mostly doing sanity checks such as making sure the file system size is greater than the number of blocks that have been allocated. Usually, the goal of these sanity checks is to find a suspect (corrupt) superblock. In this case, the system (or administrator) may decide to use an alternate copy of the superblock.

  • Free blocks: Next, fsck scans the inodes, indirect blocks, double indirect blocks, etc., to build an understanding of which blocks are currently allocated within the file system. It uses this knowledge to produce a correct version of the allocation bitmaps; thus, if there is any inconsistency between bitmaps and inodes, it is resolved by trusting the information within the inodes. The same type of check is performed for all the inodes, making sure that all inodes that look like they are in use are marked as such in the inode bitmaps.

  • Inode state: Each inode is checked for corruption or other problems. For example, fsck makes sure that each allocated inode has a valid type field (e.g., regular file, directory, symbolic link, etc.). If there are problems with the inode fields that are not easily fixed, the inode is considered suspect and cleared by fsck; the inode bitmap is correspondingly updated.

  • Inode links: fsck also verifies the link count of each allocated inode. As you may recall, the link count indicates the number of different directories that contain a reference (i.e., a link) to this particular file. To verify the link count, fsck scans through the entire directory tree, starting at the root directory, and builds its own link counts for every file and directory in the file system. If there is a mismatch between the newly-calculated count and that found within an inode, corrective action must be taken, usually by fixing the count within the inode. If an allocated inode is discovered but no directory refers to it, it is moved to the lost+found directory.

  • Duplicates: fsck also checks for duplicate pointers, i.e., cases where two different inodes refer to the same block. If one inode is obviously bad, it may be cleared. Alternatively, the pointed-to block could be copied, thus giving each inode its own copy as desired.

  • Bad blocks: A check for bad block pointers is also performed while scanning through the list of all pointers. A pointer is considered “bad” if it obviously points to something outside its valid range, e.g., it has an address that refers to a block greater than the partition size. In this case, fsck can’t do anything too intelligent; it just removes (clears) the pointer from the inode or indirect block.

  • Directory checks: fsck does not understand the contents of user files. However, directories hold specifically formatted information created by the file system itself. Thus, fsck performs additional integrity checks on the contents of each directory, making sure that . and .. are the first entries, that each inode referred to in a directory entry is allocated, and that no directory is linked to more than once in the entire hierarchy.
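Several of the phases above can be illustrated on a toy, in-memory model. The sketch below is not how any real fsck is implemented: the dictionary layout, the field names (`blocks`, `entries`, `links`), and the helper functions are all hypothetical, and directories here hold only named entries (no . or ..). It rebuilds the block bitmap by trusting the inodes, flags duplicate and out-of-range pointers, and recomputes link counts by walking the tree from the root.

```python
from collections import Counter

# Toy model (all names and layout are illustrative assumptions).
NUM_BLOCKS = 16
ROOT = 2  # root inode number in this toy model

inodes = {
    2: {"type": "dir", "blocks": [4], "links": 1,
        "entries": {"a": 3, "b": 3, "c": 5}},
    3: {"type": "file", "blocks": [5, 6], "links": 1},   # stored count is stale
    5: {"type": "file", "blocks": [6, 99], "links": 1},  # shares 6, 99 is bad
    7: {"type": "file", "blocks": [8], "links": 1},      # no directory names it
}

def rebuild_block_bitmap(inodes, num_blocks):
    """Scan every inode's pointers and derive the bitmap from them."""
    owners = {}  # block number -> inode numbers that point at it
    bad = []     # (inode, pointer) pairs outside the valid range
    for ino, inode in inodes.items():
        for ptr in inode["blocks"]:
            if 0 <= ptr < num_blocks:
                owners.setdefault(ptr, []).append(ino)
            else:
                bad.append((ino, ptr))
    bitmap = [blk in owners for blk in range(num_blocks)]
    duplicates = {blk: inos for blk, inos in owners.items() if len(inos) > 1}
    return bitmap, duplicates, bad

def count_links(inodes, root):
    """Walk the directory tree from the root, counting references per inode."""
    counts = Counter()
    seen, stack = set(), [root]
    while stack:
        ino = stack.pop()
        if ino in seen:
            continue
        seen.add(ino)
        for child in inodes[ino]["entries"].values():
            if child not in inodes:
                continue  # dangling entry; a directory check would flag it
            counts[child] += 1
            if inodes[child]["type"] == "dir":
                stack.append(child)
    return counts

def check_links(inodes, root):
    """Return ({inode: corrected count}, [orphans bound for lost+found])."""
    counts = count_links(inodes, root)
    fixes, orphans = {}, []
    for ino, inode in inodes.items():
        if ino == root:
            continue
        actual = counts[ino]  # Counter yields 0 for never-referenced inodes
        if actual == 0:
            orphans.append(ino)
        elif actual != inode["links"]:
            fixes[ino] = actual
    return fixes, orphans

bitmap, duplicates, bad = rebuild_block_bitmap(inodes, NUM_BLOCKS)
fixes, orphans = check_links(inodes, ROOT)
print("allocated blocks:", [b for b, used in enumerate(bitmap) if used])
print("duplicate pointers:", duplicates)
print("bad pointers:", bad)
print("link count fixes:", fixes, "orphans:", orphans)
```

Note that both helpers must visit every inode and every directory entry, even though the example "damage" is confined to a few fields; the sketch has the same whole-file-system cost profile as the checks it imitates.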

As you can see, building a working fsck requires intricate knowledge of the file system, and making sure such a piece of code works correctly in all cases can be challenging (see “SQCK: A Declarative File System Checker” by Haryadi S. Gunawi, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, OSDI ’08, San Diego, California: our own paper on a new and better way to build a file system checker using SQL queries. We also show some problems with the existing checker, finding numerous bugs and odd behaviors, a direct result of the complexity of fsck). However, fsck (and similar approaches) have a bigger and perhaps more fundamental problem: they are too slow. With a very large disk volume, scanning the entire disk to find all the allocated blocks and read the entire directory tree may take many minutes or hours. Performance of fsck, as disks grew in capacity and RAIDs grew in popularity, became prohibitive (despite recent advances; see “ffsck: The Fast File System Checker” by Ao Ma, Chris Dragga, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, FAST ’13, San Jose, California, February 2013: a recent paper of ours detailing how to make fsck an order of magnitude faster. Some of the ideas have already been incorporated into the BSD file system checker and are deployed today).
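To get a rough feel for the cost, here is a back-of-the-envelope estimate. Every figure in it is an assumption chosen for illustration (a 4 TB volume, 150 MB/s of sequential bandwidth, 4 KB blocks), and it models only a single sequential pass over the disk; it is not a measurement of any real fsck.

```python
def full_scan_seconds(capacity_bytes, bandwidth_bytes_per_s):
    """Time to read every byte once, assuming purely sequential I/O."""
    return capacity_bytes / bandwidth_bytes_per_s

TB = 10**12
MB = 10**6

# Assumed figures, for illustration only.
capacity = 4 * TB
bandwidth = 150 * MB
scan = full_scan_seconds(capacity, bandwidth)
print(f"full scan: {scan / 3600:.1f} hours")  # prints "full scan: 7.4 hours"

# A crash that interrupted an update of three blocks (assumed 4 KB each).
touched = 3 * 4096
print(f"fraction of disk affected: {touched / capacity:.2e}")
```

Under these assumptions, the scan takes hours while the update that may have gone wrong touched only a few billionths of the volume.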

At a higher level, the basic premise of fsck seems just a tad irrational. Consider our example above, where just three blocks are written to the disk; it is incredibly expensive to scan the entire disk to fix problems that occurred during an update of just three blocks. This situation is akin to dropping your keys on the floor in your bedroom, and then commencing a search-the-entire-house-for-keys recovery algorithm, starting in the basement and working your way through every room. It works but is wasteful. Thus, as disks (and RAIDs) grew, researchers and practitioners started to look for other solutions.
