A New Problem: Misdirected Writes

In this lesson, we look at the problem of misdirected writes and discuss a solution for it.


The basic scheme described in the previous lesson works well in the general case of corrupted blocks. However, modern disks have a couple of unusual failure modes that require different solutions.

The first failure mode of interest is called a misdirected write. This arises in disk and RAID controllers which write the data to disk correctly, except in the wrong location. In a single-disk system, this means that the disk wrote block $D_x$ not to address $x$ (as desired) but rather to address $y$ (thus “corrupting” $D_y$). In addition, within a multi-disk system, the controller may also write $D_{i,x}$ not to address $x$ of disk $i$ but rather to some other disk $j$. Thus our question:

CRUX: HOW TO HANDLE MISDIRECTED WRITES

How should a storage system or disk controller detect misdirected writes? What additional features are required from the checksum?

Adding a physical identifier

The answer, not surprisingly, is simple: add a little more information to each checksum. In this case, adding a physical identifier (physical ID) is quite helpful. For example, if the stored information now contains the checksum $C(D)$ and both the disk and sector numbers of the block, it is easy for the client to determine whether the correct information resides within a particular locale. Specifically, if the client is reading block 4 on disk 10 ($D_{10,4}$), the stored information should include that disk number and sector offset, as shown below. If the information does not match, a misdirected write has taken place, and a corruption is now detected. Here is an example of what this added information would look like on a two-disk system. Note that this figure, like the others before it, is not to scale, as the checksums are usually small (e.g., 8 bytes) whereas the blocks are much larger (e.g., 4 KB or bigger):
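The check described above can also be sketched in code. The following is a minimal illustration, not how any particular controller implements it: the names `StoredBlock`, `make_block`, and `verify` are hypothetical, the "disks" are plain Python dictionaries, and CRC-32 from the standard `zlib` module stands in for whatever checksum function $C$ a real system would use.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredBlock:
    """What sits on disk for one block (hypothetical layout for illustration)."""
    data: bytes
    checksum: int  # C(D); CRC-32 is an assumption made for this sketch
    disk: int      # physical ID: the disk this block was meant for
    sector: int    # physical ID: the sector offset this block was meant for


def make_block(data: bytes, disk: int, sector: int) -> StoredBlock:
    """Build the stored record for data destined for (disk, sector)."""
    return StoredBlock(data, zlib.crc32(data), disk, sector)


def verify(stored: StoredBlock, disk: int, sector: int) -> str:
    """Classify a block that was just read from (disk, sector)."""
    if zlib.crc32(stored.data) != stored.checksum:
        return "corrupt"       # contents do not match C(D)
    if (stored.disk, stored.sector) != (disk, sector):
        return "misdirected"   # valid block, but it belongs somewhere else
    return "ok"


# A toy two-disk array with three sectors per disk: disks[disk][sector].
disks = {
    d: {s: make_block(f"D{d},{s}".encode(), d, s) for s in range(3)}
    for d in range(2)
}

# Simulate a misdirected write: a block meant for disk 1, sector 2
# lands on disk 0, sector 1 instead.
disks[0][1] = make_block(b"new contents", disk=1, sector=2)

print(verify(disks[0][0], 0, 0))  # an untouched block
print(verify(disks[0][1], 0, 1))  # the block that was overwritten by mistake
```

In this sketch the overwritten block still passes the checksum comparison, because its data and checksum were written together and agree with each other. Only the comparison of the stored disk and sector numbers against the location actually read reveals the problem, which is why the checksum alone is not enough here.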
