Measuring File Locality

Let's analyze whether file accesses show significant locality in the namespace.

To understand better whether the heuristics mentioned in the last lesson make sense, let’s analyze some traces of file system access and see if indeed there is namespace locality. For some reason, there doesn’t seem to be a good study of this topic in the literature.

Specifically, we’ll use the SEER traces, which are described in “The Design of the SEER Predictive Caching System” by G. H. Kuenning (MOBICOMM ’94, Santa Cruz, California, December 1994). According to Kuenning, this paper is the best overview of the SEER project, which led to (among other things) the collection of these traces. With the traces, we analyze how “far away” file accesses were from one another in the directory tree. For example, if file f is opened, and then re-opened next in the trace (before any other files are opened), the distance between these two opens in the directory tree is zero (as they are the same file). If a file f in directory dir (i.e., dir/f) is opened and followed by an open of file g in the same directory (i.e., dir/g), the distance between the two file accesses is one, as they share the same directory but are not the same file. Our distance metric, in other words, measures how far up the directory tree you have to travel to find the common ancestor of two files; the closer they are in the tree, the lower the metric.
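To make the distance metric concrete, here is a minimal Python sketch. It is not the tooling used to analyze the SEER traces; the function names (`path_components`, `namespace_distance`, `consecutive_distances`) and the sample paths are hypothetical. It also makes two assumptions the text does not settle: a trace is just an ordered list of opened paths written in the same form (all relative or all absolute), and when two files sit at different depths, the distance is counted from the deeper one.

```python
import posixpath


def path_components(path):
    """Split a path into its components, dropping empty parts."""
    return [part for part in posixpath.normpath(path).split("/") if part]


def namespace_distance(path_a, path_b):
    """Levels to climb from the deeper path to reach the common ancestor.

    Same file -> 0; two files in the same directory -> 1.
    Using the deeper path for unequal depths is an assumption of this sketch.
    """
    a, b = path_components(path_a), path_components(path_b)

    # Count how many leading components the two paths share.
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1

    return max(len(a), len(b)) - shared


def consecutive_distances(opens):
    """Distance between each open in the trace and the open that follows it."""
    return [namespace_distance(p, q) for p, q in zip(opens, opens[1:])]


if __name__ == "__main__":
    # Hypothetical trace: re-open the same file, then a sibling, then
    # a file in an unrelated directory.
    trace = ["dir/f", "dir/f", "dir/g", "other/h"]
    print(consecutive_distances(trace))
```

A histogram of the values returned by `consecutive_distances` over a real trace would show how often successive opens land close together in the tree, which is the question this lesson is asking.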
