Search⌘ K

Storing Files on Tape

Explore how greedy algorithms optimize the order of storing files on tape to minimize expected and total access costs. Understand the role of file lengths and access frequencies, and apply sorting strategies that improve read efficiency without backtracking or dynamic programming.

Introduction to storing files

Suppose we have a set of nn files that we want to store on magnetic tape. In the future, users will want to read those files from the tape. Reading a file from tape isn’t like reading a file from a disk; first, we have to fast-forward past all the other files, and that takes a significant amount of time. Let L[1..n]L[1 .. n] be an array listing the lengths of each file; specifically, file ii has length L[i]L[i]. If the files are stored in order from 11 to nn, then the cost of accessing the kk-th file is

cost(k)=ki=1L[i].cost(k) = \underset{i=1}{\overset{k}\sum}L[i].

Expected cost of accessing a random file

The cost reflects the fact that before we read file kk, we must first scan past all the earlier files on the tape. If we assume for the moment that each file is equally likely to be accessed, then the expected cost of searching for a random file is

E[cost]=nk=1cost(k)n=1nnk=1 ki=1L[i].E[cost] = \underset{k=1}{\overset{n}{\sum}}\frac{cost(k)}{n} = \frac{1}{n} \underset{k=1}{\overset{n}{\sum}} \space \underset{i=1}{\overset{k}{\sum}}L[i]. ...