The Linux Virtual Memory System: The Page Cache
In this lesson, you'll learn how the Linux page cache stores data and uses the 2Q replacement policy to free up space.
To reduce the costs of accessing persistent storage (the focus of the third part of this course), most systems use aggressive caching subsystems to keep popular data items in memory. Linux, in this regard, is no different than traditional operating systems.
How the Linux page cache works
The Linux page cache is unified, keeping pages in memory from three primary sources: memory-mapped files; file data and metadata from devices (usually accessed by directing read() and write() calls to the file system); and heap and stack pages that comprise each process (sometimes called anonymous memory, because there is no named file underneath it, but rather swap space). These entities are kept in a page cache hash table, allowing for quick lookup when said data is needed.
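To make the lookup concrete, here is a minimal C sketch of a hash table keyed by (file, offset); the struct fields, hash function, and bucket count are invented for illustration and are not the kernel's actual data structures.

#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024

/* One cached page: identified by which file (or swap area) it belongs
 * to and its page-aligned offset within that file. */
struct cached_page {
    uint64_t file_id;
    uint64_t offset;
    int dirty;                   /* modified since last write-back? */
    void *data;                  /* the cached page contents        */
    struct cached_page *next;    /* chain for hash collisions       */
};

static struct cached_page *buckets[NBUCKETS];

static unsigned hash(uint64_t file_id, uint64_t offset) {
    return (unsigned)((file_id * 31u + offset) % NBUCKETS);
}

/* Expected O(1) lookup: the reason the cache is hashed. */
struct cached_page *cache_lookup(uint64_t file_id, uint64_t offset) {
    for (struct cached_page *p = buckets[hash(file_id, offset)];
         p != NULL; p = p->next)
        if (p->file_id == file_id && p->offset == offset)
            return p;            /* hit: no device access needed      */
    return NULL;                 /* miss: caller must fetch from disk */
}

On a hit, the data is served straight from memory; on a miss, the caller reads the page from the device and inserts it into the appropriate bucket.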
The page cache tracks if entries are clean (read but not updated) or dirty (a.k.a., modified). Dirty data is periodically written to the backing store (i.e., to a specific file for file data, or to swap space for anonymous regions) by background threads (called pdflush), thus ensuring that modified data is eventually written back to persistent storage. This background activity takes place either after a certain time period has elapsed or when too many pages are considered dirty (both configurable parameters).
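A sketch of the flusher's decision logic may help; the threshold names and values below are illustrative stand-ins for the real tunables (such as vm.dirty_expire_centisecs and vm.dirty_ratio), not the kernel's actual code.

#include <time.h>

#define MAX_DIRTY_AGE  30       /* seconds a page may stay dirty      */
#define MAX_DIRTY_FRAC 0.20     /* max fraction of pages left dirty   */

struct page {
    int dirty;
    time_t dirtied_at;          /* when the page was last modified    */
};

/* Should the background thread write this page back now? It should
 * if the page has been dirty too long, or if too much of memory is
 * dirty overall (memory pressure). */
int should_writeback(const struct page *p, int ndirty, int ntotal) {
    if (!p->dirty)
        return 0;
    if (time(NULL) - p->dirtied_at >= MAX_DIRTY_AGE)
        return 1;
    return (double)ndirty / ntotal >= MAX_DIRTY_FRAC;
}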
2Q replacement policy
In some cases, a system runs low on memory, and Linux has to decide which pages to kick out of memory to free up space. To do so, Linux uses a modified form of 2Q replacement, described here.
The basic idea is simple: standard LRU replacement is effective, but can be subverted by certain common access patterns. For example, if a process repeatedly accesses a large file (especially one that is nearly the size of memory, or larger), LRU will kick every other file out of memory. Even worse: retaining portions of this file in memory isn’t useful, as they are never re-referenced before getting kicked out of memory.
The Linux version of the 2Q replacement algorithm solves this problem by keeping two lists and dividing memory between them. When accessed for the first time, a page is placed on one queue (called A1 in the original paper, but the inactive list in Linux); when it is re-referenced, the page is promoted to the other queue (called Am in the original, but the active list in Linux). When replacement needs to take place, the candidate for replacement is taken from the inactive list.
Linux would ideally manage these lists in perfect LRU order, but, as discussed in earlier chapters, doing so is costly. Thus, as with many OSes, an approximation of LRU (similar to clock replacement) is used.
This 2Q approach generally behaves quite a bit like LRU, but notably handles the case where a cyclic large-file access occurs by confining the pages of that cyclic access to the inactive list. Because said pages are never re-referenced before getting kicked out of memory, they do not flush out other useful pages found in the active list.
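The toy C model below demonstrates exactly this effect: a first touch lands a page on the inactive list, a second touch promotes it to the active list, and victims come from the inactive list's cold end. It keeps exact order for clarity (where Linux, as noted above, only approximates it), and the list capacity and helpers are invented for the sketch.

#include <stdio.h>
#include <string.h>

#define LIST_CAP 4

struct list { int pages[LIST_CAP]; int n; };
static struct list inactive, active;

static int find(struct list *l, int page) {
    for (int i = 0; i < l->n; i++)
        if (l->pages[i] == page) return i;
    return -1;
}

static void remove_at(struct list *l, int i) {
    memmove(&l->pages[i], &l->pages[i + 1],
            (size_t)(l->n - 1 - i) * sizeof(int));
    l->n--;
}

/* Append at the hot end; if full, evict from index 0, the cold end. */
static void push(struct list *l, int page) {
    if (l->n == LIST_CAP)
        remove_at(l, 0);
    l->pages[l->n++] = page;
}

void access_page(int page) {
    int i = find(&inactive, page);
    if (i >= 0) {                /* re-reference: promote to active  */
        remove_at(&inactive, i);
        push(&active, page);
    } else if (find(&active, page) < 0) {
        push(&inactive, page);   /* first touch: inactive list only  */
    }                            /* (already active: nothing to do)  */
}

int main(void) {
    access_page(1); access_page(1);   /* page 1 gets promoted        */
    for (int p = 10; p < 20; p++)     /* one-shot scan of ten pages  */
        access_page(p);
    /* The scan churned only the inactive list; page 1 survives.     */
    printf("active holds page 1: %s\n",
           find(&active, 1) >= 0 ? "yes" : "no");
    return 0;
}

Running it prints "active holds page 1: yes": the scanned pages cycle through the inactive list without ever displacing the re-referenced page.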
ASIDE: THE UBIQUITY OF MEMORY-MAPPING
Memory mapping predates Linux by some years, and is used in many places within Linux and other modern systems. The idea is simple: by calling mmap() on an already opened file descriptor, a process is returned a pointer to the beginning of a region of virtual memory where the contents of the file seem to be located. By then using that pointer, a process can access any part of the file with a simple pointer dereference. Accesses to parts of a memory-mapped file that have not yet been brought into memory trigger page faults, at which point the OS will page in the relevant data and make it accessible by updating the page table of the process accordingly (i.e., demand paging).
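As a small, self-contained C illustration of this pattern, the program below maps a file read-only and reads it through the returned pointer; the file path is arbitrary and error handling is kept minimal.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Ask for a read-only view of the file's contents. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touching a not-yet-resident page faults; the OS pages the data
     * in (demand paging), then the access simply proceeds. */
    fwrite(p, 1, st.st_size, stdout);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}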
Every regular Linux process uses memory-mapped files, even when the code in main() does not call mmap() directly, because of how Linux loads code from the executable and shared library code into memory. Below is the (highly abbreviated) output of the pmap command-line tool, which shows what different mappings comprise the virtual address space of a running program (the shell, in this example, tcsh). The output shows four columns: the virtual address of the mapping, its size, the protection bits of the region, and the source of the mapping:

0000000000400000    372K r-x-- tcsh
00000000019d5000   1780K rw---   [anon ]
00007f4e7cf06000   1792K r-x-- libc-2.23.so
00007f4e7d2d0000     36K r-x-- libcrypt-2.23.so
00007f4e7d508000    148K r-x-- libtinfo.so.5.9
00007f4e7d731000    152K r-x-- ld-2.23.so
00007f4e7d932000     16K rw---   [stack ]
As you can see from this output, the code from the tcsh binary, as well as code from libc, libcrypt, libtinfo, and code from the dynamic linker itself (ld.so) are all mapped into the address space. Also present are two anonymous regions, the heap (the second entry, labeled anon) and the stack (labeled stack). Memory-mapped files provide a straightforward and efficient way for the OS to construct a modern address space.