Introduction to Log-Structured File System

This lesson presents the need for a file system focused on improving write performance and introduces the log-structured file system (LFS).

In the early 90s, a group at Berkeley led by Professor John Ousterhout and graduate student Mendel Rosenblum developed a new file system known as the log-structured file system (“Design and Implementation of the Log-structured File System” by Mendel Rosenblum and John Ousterhout, SOSP ’91, Pacific Grove, CA, October 1991). The original SOSP paper about LFS has been cited by hundreds of other papers and inspired many real systems.

Observations

Their motivation to do so was based on the following observations:

  • System memories are growing: As memory gets bigger, more data can be cached in memory. As more data is cached, disk traffic increasingly consists of writes, as reads are serviced by the cache. Thus, file system performance is largely determined by its write performance.

  • There is a large gap between random I/O performance and sequential I/O performance: Hard-drive transfer bandwidth has increased a great deal over the years (“Hardware Technology Trends and Database Opportunities” by David A. Patterson, ACM SIGMOD ’98 Keynote, 1998, available online at http://www.cs.berkeley.edu/~pattrsn/talks/keynote.html; a great set of slides on technology trends in computer systems). As more bits are packed into the surface of a drive, the bandwidth when accessing said bits increases. Seek and rotational delay costs, however, have decreased slowly; it is challenging to make cheap and small motors spin the platters faster or move the disk arm more quickly. Thus, if you are able to use disks in a sequential manner, you gain a sizable performance advantage over approaches that cause seeks and rotations (see the sketch after this list for a rough calculation).

  • Existing file systems perform poorly on many common workloads: For example, FFS (“A Fast File System for UNIX” by Marshall K. McKusick, William N. Joy, Sam J. Leffler, and Robert S. Fabry, ACM TOCS 2:3, August 1984; the original FFS paper, covered in the chapter on FFS) would perform a large number of writes to create a new file of size one block: one for a new inode, one to update the inode bitmap, one to the directory data block that the file is in, one to the directory inode to update it, one to the new data block that is a part of the new file, and one to the data bitmap to mark the data block as allocated. Thus, although FFS places all of these blocks within the same block group, it incurs many short seeks and subsequent rotational delays, and performance falls far short of peak sequential bandwidth.

  • File systems are not RAID-aware: For example, both RAID-4 and RAID-5 suffer from the small-write problem, where a logical write to a single block causes four physical I/Os to take place (reading the old data and old parity, then writing the new data and new parity). Existing file systems do not try to avoid this worst-case RAID writing behavior.
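
To make the random-vs-sequential gap concrete, here is a rough back-of-the-envelope calculation written as a small C program. The numbers (a 10 ms average positioning cost, 100 MB/s of transfer bandwidth, 4 KB writes) are illustrative assumptions, not measurements from the paper:

```c
#include <stdio.h>

int main(void) {
    // Assumed drive characteristics (illustrative only).
    double position_ms   = 10.0;    // avg. seek + rotational delay
    double bandwidth_mbs = 100.0;   // peak sequential transfer rate
    double block_kb      = 4.0;     // size of one small write

    // Sequential writes pay only the transfer cost per block.
    double xfer_ms = (block_kb / 1024.0) / bandwidth_mbs * 1000.0;

    // Random writes pay the positioning cost plus the transfer cost.
    double random_ms = position_ms + xfer_ms;

    printf("sequential: %.3f ms/block (~%.1f MB/s)\n",
           xfer_ms, (block_kb / 1024.0) / (xfer_ms / 1000.0));
    printf("random:     %.3f ms/block (~%.2f MB/s)\n",
           random_ms, (block_kb / 1024.0) / (random_ms / 1000.0));
    return 0;
}
```

With these assumptions, random 4 KB writes achieve under 0.4 MB/s while sequential writes run at roughly the full 100 MB/s, a gap of more than two orders of magnitude. This is exactly the advantage LFS is designed to capture.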

An ideal file system would thus focus on write performance, and try to make use of the sequential bandwidth of the disk. Further, it would perform well on common workloads that not only write out data but also update on-disk metadata structures frequently. Finally, it would work well on RAIDs as well as single disks.

TIP: DETAILS MATTER

All interesting systems are composed of a few general ideas and a number of details. Sometimes, when you are learning about these systems, you think to yourself “Oh, I get the general idea; the rest is just details,” and you use this to only half-learn how things really work. Don’t do this! Many times, the details are critical. As we’ll see with LFS, the general idea is easy to understand, but to really build a working system, you have to think through all of the tricky cases.

LFS

The new type of file system Rosenblum and Ousterhout introduced was called LFS, short for the Log-structured File System. When writing to disk, LFS first buffers all updates (including metadata!) in an in-memory segment. When the segment is full, it is written to disk in one long, sequential transfer to an unused part of the disk. LFS never overwrites existing data, but rather always writes segments to free locations. Because segments are large, the disk (or RAID) is used efficiently, and the performance of the file system approaches its zenith.
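
The core write path is easy to picture in code. Below is a minimal, hypothetical sketch of segment buffering in C: block-sized updates (data or metadata) are appended to an in-memory segment, and when the segment fills, the whole thing is written in one sequential transfer at the current end of the log. All names here (struct segment, lfs_write, disk_write_sequential) are invented for illustration; real LFS is considerably more involved:

```c
#include <stdio.h>
#include <string.h>

#define SEGMENT_SIZE (512 * 1024)   // e.g., a 512 KB segment (assumed)
#define BLOCK_SIZE   4096

// Hypothetical in-memory segment: updates accumulate here.
struct segment {
    char   buf[SEGMENT_SIZE];
    size_t used;                    // bytes buffered so far
};

static struct segment seg;
static size_t log_end;              // next free disk offset (head of the log)

// Stub: a real implementation would issue one large write to the device.
static void disk_write_sequential(size_t offset, const void *buf, size_t len) {
    (void)buf;
    printf("flush %zu bytes sequentially at offset %zu\n", len, offset);
}

// Buffer one block-sized update (data *or* metadata); flush when full.
void lfs_write(const void *block) {
    if (seg.used + BLOCK_SIZE > SEGMENT_SIZE) {
        // One long, sequential transfer to an unused part of the disk;
        // existing data is never overwritten in place.
        disk_write_sequential(log_end, seg.buf, seg.used);
        log_end += seg.used;
        seg.used = 0;
    }
    memcpy(seg.buf + seg.used, block, BLOCK_SIZE);
    seg.used += BLOCK_SIZE;
}
```

Note how this design turns many small, scattered writes into one large write: the cost of positioning the disk head is amortized over the entire segment. (A real system would also flush a partially full segment on a timeout or sync, which this sketch omits.)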

THE CRUX: HOW TO MAKE ALL WRITES SEQUENTIAL WRITES?

How can a file system transform all writes into sequential writes? For reads, this task is impossible, as the desired block to be read may be anywhere on disk. For writes, however, the file system always has a choice, and it is exactly this choice we hope to exploit.
