LinearHashTable: Linear Probing

Learn to implement a LinearHashTable.

Open addressing

The ChainedHashTable data structure uses an array of lists, where the $i$-th list stores all elements x such that hash(x) = i. An alternative, called open addressing, is to store the elements directly in an array, t, with each array location in t storing at most one value. This approach is taken by the LinearHashTable described in this lesson. In some places, this data structure is described as open addressing with linear probing.

Structure of LinearHashTable

The main idea behind a LinearHashTable is that we would, ideally, like to store the element x with hash value i = hash(x) in the table location t[i]. If we cannot do this (because some element is already stored there) then we try to store it at location t[(i + 1) mod t.length]; if that’s not possible, then we try t[(i+ 2) mod t.length], and so on, until we find a place for x.
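
As a concrete illustration (the numbers here are made up for the example), suppose t.length is 8 and hash(x) is 6: linear probing examines indices 6, 7, 0, 1, and so on, wrapping around the end of the array until a free slot is found.

int length = 8;  // assumed table size for this example
int i = 6;       // assumed hash value of x
for (int step = 0; step < 4; step++) {
    System.out.println((i + step) % length); // prints 6, 7, 0, 1
}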

There are three types of entries stored in t:

  1. Data values: Actual values in the USet that we are representing.
  2. null values: Stored at array locations where no data has ever been stored.
  3. del values: Stored at array locations where data was once stored but that has since been deleted.
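
The lesson does not show how del is declared. Since the code below both compares entries against del and assigns del into t, one plausible sketch (an assumption, not the lesson's actual declaration) is a sentinel of the element type, set once when the table is created and never added as a real element.

final T del; // assumed sketch: dedicated sentinel value stored in deleted slots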

In addition to the counter, n, that keeps track of the number of elements in the LinearHashTable, a counter, q, keeps track of the number of elements of Types 1 and 3. That is, q is equal to n plus the number of del values in t. To make this work efficiently, we need t to be considerably larger than q, so that there are lots of null values in t. The operations on a LinearHashTable therefore maintain the invariant that $\text{t.length} \ge 2q$.

To summarize, a LinearHashTable contains an array, t, that stores data elements, and integers n and q that keep track of the number of data elements and non-null values of t, respectively. Because many hash functions only work for table sizes that are a power of 2, we also keep an integer d and maintain the invariant that $\text{t.length} = 2^d$.

T[] t; // the table
int n; // the size
int d; // t.length = 2^d
int q; // number of non-null entries in t
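
The lesson does not show how these fields start out. A minimal initialization sketch, assuming an empty table begins with d = 1 (two slots) and no elements, could look like this; newArray is the same allocation helper used by resize() below.

void initialize() { // assumed helper; a constructor could do the same work
    d = 1;
    t = newArray(1 << d); // allocate a table of size 2^d = 2
    n = 0;
    q = 0;
}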

The LinearHashTable operations

The find(x) operation in a LinearHashTable is simple. We start at array entry t[i] where i = hash(x) and search entries $t[i]$, $t[(i + 1) \mod \text{t.length}]$, $t[(i + 2) \mod \text{t.length}]$, and so on, until we find an index $i'$ such that either $t[i'] = x$ or $t[i']$ = null. In the former case we return $t[i']$. In the latter case, we conclude that x is not contained in the hash table and return null.

T find(T x) {
    int i = hash(x);
    while (t[i] != null) { // stop at the first null entry
        if (t[i] != del && x.equals(t[i])) return t[i];
        i = (i == t.length - 1) ? 0 : i + 1; // increment i, wrapping around
    }
    return null; // x is not in the table
}

The add(x) operation is also fairly easy to implement. After checking that x is not already stored in the table (using find(x)), we search $t[i]$, $t[(i + 1) \mod \text{t.length}]$, $t[(i + 2) \mod \text{t.length}]$, and so on, until we find a null or del entry, store x at that location, increment n and, if appropriate, q.

boolean add(T x) {
    if (find(x) != null) return false; // x is already stored
    if (2 * (q + 1) > t.length) resize(); // max 50% occupancy
    int i = hash(x);
    while (t[i] != null && t[i] != del)
        i = (i == t.length - 1) ? 0 : i + 1; // increment i, wrapping around
    if (t[i] == null) q++; // reusing a del slot does not change q
    n++;
    t[i] = x;
    return true;
}

By now, the implementation of the remove(x) operation should be obvious. We search $t[i]$, $t[(i + 1) \mod \text{t.length}]$, $t[(i + 2) \mod \text{t.length}]$, and so on until we find an index $i'$ such that $t[i'] = x$ or $t[i']$ = null. In the former case, we set $t[i']$ = del and return the value that was stored there. In the latter case, we conclude that x was not stored in the table (and therefore cannot be deleted) and return null.

T remove(T x) {
    int i = hash(x);
    while (t[i] != null) { // stop at the first null entry
        T y = t[i];
        if (y != del && x.equals(y)) {
            t[i] = del; // mark the slot as deleted (q is unchanged)
            n--;
            if (8 * n < t.length) resize(); // min 12.5% occupancy
            return y;
        }
        i = (i == t.length - 1) ? 0 : i + 1; // increment i, wrapping around
    }
    return null; // x was not in the table
}

The correctness of the find(x), add(x), and remove(x) methods is easy to verify, though it relies on the use of del values. Notice that none of these operations ever sets a non-null entry to null. Therefore, when we reach an index $i'$ such that $t[i']$ = null, this is a proof that the element, x, that we are searching for is not stored in the table; $t[i']$ has always been null, so there is no reason that a previous add(x) operation would have proceeded beyond index $i'$.

The resize() method is called by add(x) when the number of non-null entries exceeds $\text{t.length}/2$ or by remove(x) when the number of data entries is less than $\text{t.length}/8$. The resize() method works like the resize() methods in other array-based data structures. We find the smallest non-negative integer d such that $2^d \ge 3n$. We reallocate the array t so that it has size $2^d$, and then we insert all the elements in the old version of t into the newly-resized copy of t. While doing this, we reset q equal to n, since the newly-allocated t contains no del values.

void resize() {
    d = 1;
    while ((1 << d) < 3 * n) d++; // smallest d with 2^d >= 3n
    T[] told = t;
    t = newArray(1 << d);
    q = n; // the new table contains no del values
    // insert everything from told
    for (int k = 0; k < told.length; k++) {
        if (told[k] != null && told[k] != del) {
            int i = hash(told[k]);
            while (t[i] != null)
                i = (i == t.length - 1) ? 0 : i + 1;
            t[i] = told[k];
        }
    }
}
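
To see the operations working together, here is a hedged usage sketch; the class name, constructor, and Integer element type are assumptions for illustration, not code shown in this lesson.

LinearHashTable<Integer> table = new LinearHashTable<>(); // assumed constructor
table.add(42);                      // stored at hash(42), probing past any collision
table.add(7);
System.out.println(table.find(42)); // prints 42
table.remove(7);                    // the slot now holds del, not null
System.out.println(table.find(7));  // prints null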

Analysis of linear probing

Notice that each operation, add(x), remove(x), or find(x), finishes as soon as (or before) it discovers the first null entry in t. The intuition behind the analysis of linear probing is that, since at least half the entries in t are equal to null, an operation should not take long to complete because it will very quickly come across a null entry. We shouldn't rely too heavily on this intuition, though, because it would lead us to the (incorrect) conclusion that the expected number of locations in t examined by an operation is at most 2. The problem is that non-null entries tend to cluster together into runs of consecutive occupied locations, so the locations examined by an operation are far from independent.

For the rest of this section, we will assume that all hash values are independently and uniformly distributed in $\{0, \ldots, \text{t.length} - 1\}$. This is not a realistic assumption, but it will make it possible for us to analyze linear probing. Later in this section, we will describe a method, called tabulation hashing, that produces a hash function that is “good enough” for linear probing. We will also assume that all indices into the positions of t are taken modulo t.length, so that t[i] is really a shorthand for $t[i \mod \text{t.length}]$.
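
The operations above call hash(x) without defining it. As a placeholder (an assumption, and not the tabulation hashing mentioned above), a simple multiplicative hash that keeps the top d bits of the product of x.hashCode() with an odd constant would produce values in {0, ..., t.length - 1}:

int hash(T x) {
    // Assumed sketch: 32-bit multiplicative hashing; z is an arbitrary odd constant.
    int z = 0x7f4a7c15;
    return (z * x.hashCode()) >>> (32 - d);
}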

We say that a run of length $k$ that starts at $i$ occurs when all the table entries $t[i]$, $t[i+$ ...
