LinearHashTable: Linear Probing

Learn to implement a LinearHashTable.

Open addressing

The ChainedHashTable data structure uses an array of lists, where the $i$-th list stores all elements x such that hash(x) = i. An alternative, called open addressing, is to store the elements directly in an array, t, with each array location in t storing at most one value. This approach is taken by the LinearHashTable described in this lesson. In some places, this data structure is described as open addressing with linear probing.

Structure of LinearHashTable

The main idea behind a LinearHashTable is that we would, ideally, like to store the element x with hash value i = hash(x) in the table location t[i]. If we cannot do this (because some element is already stored there) then we try to store it at location t[(i + 1) mod t.length]; if that’s not possible, then we try t[(i+ 2) mod t.length], and so on, until we find a place for x.
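
As a concrete illustration (the numbers here are made up for the example), suppose t.length is 8 and hash(x) is 6: linear probing examines indices 6, 7, 0, 1, and so on, wrapping around the end of the array until a free slot is found.

int length = 8;  // assumed table size for this example
int i = 6;       // assumed hash value of x
for (int step = 0; step < 4; step++) {
    System.out.println((i + step) % length); // prints 6, 7, 0, 1
}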

There are three types of entries stored in t:

  1. Data values: Actual values in the USet that we are representing.
  2. null values: Stored at array locations where no data has ever been stored.
  3. del values: Stored at array locations where data was once stored but that has since been deleted.
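
The lesson does not show how del is declared. Since the code below both compares entries against del and assigns del into t, one plausible sketch (an assumption, not the lesson's actual declaration) is a sentinel of the element type, set once when the table is created and never added as a real element.

final T del; // assumed sketch: dedicated sentinel value stored in deleted slots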

In addition to the counter, n, that keeps track of the number of elements in the LinearHashTable, a counter, q, keeps track of the number of elements of Types 1 and 3. That is, q is equal to n plus the number of del values in t. To make this work efficiently, we need t to be considerably larger than q, so that there are lots of null values in t. The operations on a LinearHashTable therefore maintain the invariant that $\text{t.length} \ge 2q$.

To summarize, a LinearHashTable contains an array, t, that stores data elements, and integers n and q that keep track of the number of data elements and non-null values of t, respectively. Because many hash functions only work for table sizes that are a power of 2, we also keep an integer d and maintain the invariant that $\text{t.length} = 2^d$.

T[] t; // the table
int n; // the size
int d; // t.length = 2^d
int q; // number of non-null entries in t
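
The lesson does not show how these fields start out. A minimal initialization sketch, assuming an empty table begins with d = 1 (two slots) and no elements, could look like this; newArray is the same allocation helper used by resize() below.

void initialize() { // assumed helper; a constructor could do the same work
    d = 1;
    t = newArray(1 << d); // allocate a table of size 2^d = 2
    n = 0;
    q = 0;
}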

The LinearHashTable operations

The find(x) operation in a LinearHashTable is simple. We start at array entry t[i] where i = hash(x) and search entries $t[i]$, $t[(i + 1) \mod \text{t.length}]$, $t[(i + 2) \mod \text{t.length}]$, and so on, until we find an index $i'$ such that either $t[i'] = x$ or $t[i']$ = null. In the former case we return $t[i']$. In the latter case, we conclude that x is not contained in the hash table and return null.

T find(T x) {
    int i = hash(x);
    while (t[i] != null) { // stop at the first null entry
        if (t[i] != del && x.equals(t[i])) return t[i];
        i = (i == t.length - 1) ? 0 : i + 1; // increment i, wrapping around
    }
    return null; // x is not in the table
}

The add(x) operation is also fairly easy to implement. After checking that x is not already stored in the table (using find(x)), we search $t[i]$, $t[(i + 1) \mod \text{t.length}]$, $t[(i + 2) \mod \text{t.length}]$, and so on, until we find a null or del entry, store x at that location, increment n and, if appropriate, q.

boolean add(T x) {
    if (find(x) != null) return false; // x is already stored
    if (2 * (q + 1) > t.length) resize(); // max 50% occupancy
    int i = hash(x);
    while (t[i] != null && t[i] != del)
        i = (i == t.length - 1) ? 0 : i + 1; // increment i, wrapping around
    if (t[i] == null) q++; // reusing a del slot does not change q
    n++;
    t[i] = x;
    return true;
}

By now, the implementation of the remove(x) operation should be obvious. We search $t[i]$, $t[(i + 1) \mod \text{t.length}]$, $t[(i + 2) \mod \text{t.length}]$, and so on until we find an index $i'$ such that $t[i'] = x$ or $t[i']$ = null. In the former case, we set $t[i']$ = del and return the value that was stored there. In the latter case, we conclude that x was not stored in the table (and therefore cannot be deleted) and return null.

T remove(T x) {
    int i = hash(x);
    while (t[i] != null) { // stop at the first null entry
        T y = t[i];
        if (y != del && x.equals(y)) {
            t[i] = del; // mark the slot as deleted (q is unchanged)
            n--;
            if (8 * n < t.length) resize(); // min 12.5% occupancy
            return y;
        }
        i = (i == t.length - 1) ? 0 : i + 1; // increment i, wrapping around
    }
    return null; // x was not in the table
}

The correctness of the find(x), add(x), and remove(x) methods is easy to verify, though it relies on the use of del values. Notice that none of these operations ever sets a non-null entry to null. Therefore, when we reach an index $i'$ such that $t[i']$ = null, this is a proof that the element, x, that we are searching for is not stored in the table; $t[i']$ has always been null, so there is no reason that a previous add(x) operation would have proceeded beyond index $i'$.

The resize() method is called by add(x) when the number of non-null entries exceeds $\text{t.length}/2$ or by remove(x) when the number of data entries is less than $\text{t.length}/8$. The resize() method works like the resize() methods in other array-based data structures. We find the smallest non-negative integer d such that $2^d \ge 3n$. We reallocate the array t so that it has size $2^d$, and then we insert all the elements in the old version of t into the newly-resized copy of t. While doing this, we reset q equal to n, since the newly-allocated t contains no del values.

void resize() {
    d = 1;
    while ((1 << d) < 3 * n) d++; // smallest d with 2^d >= 3n
    T[] told = t;
    t = newArray(1 << d);
    q = n; // the new table contains no del values
    // insert everything from told
    for (int k = 0; k < told.length; k++) {
        if (told[k] != null && told[k] != del) {
            int i = hash(told[k]);
            while (t[i] != null)
                i = (i == t.length - 1) ? 0 : i + 1;
            t[i] = told[k];
        }
    }
}
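
To see the operations working together, here is a hedged usage sketch; the class name, constructor, and Integer element type are assumptions for illustration, not code shown in this lesson.

LinearHashTable<Integer> table = new LinearHashTable<>(); // assumed constructor
table.add(42);                      // stored at hash(42), probing past any collision
table.add(7);
System.out.println(table.find(42)); // prints 42
table.remove(7);                    // the slot now holds del, not null
System.out.println(table.find(7));  // prints null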

Analysis of linear probing

Notice that each operation, add(x), remove(x), or find(x), finishes as soon as (or before) it discovers the first null entry in t. The intuition behind the analysis of linear probing is that, since at least half the entries in t are equal to null, an operation should not take long to complete because it will very quickly come across a null entry. We shouldn't rely too heavily on this intuition, though, because it would lead us to the (incorrect) conclusion that the expected number of locations in t examined by an operation is at most 2. The problem is that non-null entries tend to cluster together into runs of consecutive occupied locations, so the locations examined by an operation are far from independent.

For the rest of this section, we will assume that all hash values are independently and uniformly distributed in $\{0, \ldots, \text{t.length} - 1\}$. This is not a realistic assumption, but it will make it possible for us to analyze linear probing. Later in this section, we will describe a method, called tabulation hashing, that produces a hash function that is “good enough” for linear probing. We will also assume that all indices into the positions of t are taken modulo t.length, so that t[i] is really a shorthand for $t[i \mod \text{t.length}]$.
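
The operations above call hash(x) without defining it. As a placeholder (an assumption, and not the tabulation hashing mentioned above), a simple multiplicative hash that keeps the top d bits of the product of x.hashCode() with an odd constant would produce values in {0, ..., t.length - 1}:

int hash(T x) {
    // Assumed sketch: 32-bit multiplicative hashing; z is an arbitrary odd constant.
    int z = 0x7f4a7c15;
    return (z * x.hashCode()) >>> (32 - d);
}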

We say that a run of length $k$ that starts at $i$ occurs when all the table entries $t[i]$, $t[i+$ ...
