Data Structures with Generic Types in Python/

...

LinearHashTable: Linear Probing

Learn to implement a LinearHashTable.

We'll cover the following...

Open addressing
Structure of LinearHashTable
The LinearHashTable operations
Analysis of linear probing
Analysis of LinearHashTable
Tabulation hashing

Open addressing

The ChainedHashTable data structure uses an array of lists, where the $i^{th}$ list stores all elements x such that hash(x) = i. An alternative, called open addressing, is to store the elements directly in an array, t, with each array location in t storing at most one value. This approach is taken by the LinearHashTable described in this section. In some places, this data structure is described as open addressing with linear probing.

Structure of `LinearHashTable`

The main idea behind a LinearHashTable is that we would, ideally, like to store the element x with hash value i = hash(x) in the table location t[i]. If we cannot do this (because some element is already stored there) then we try to store it at location t[(i + 1) mod t.length]; if that’s not possible, then we try t[(i+ 2) mod t.length], and so on, until we find a place for x.

There are three types of entries stored in t:

data values: actual values in the USet that we are representing;
nil values: at array locations where no data has ever been stored; and
del values: at array locations where data was once stored but that has since been deleted.

In addition to the counter, n, that keeps track of the number of elements in the LinearHashTable, a counter, q, keeps track of the number of elements of Types 1 and 3. That is, q is equal to n plus the number of del values in t. To make this work efficiently, we need t to be considerably larger than q, so that there are lots of nil values in t. The operations on a LinearHashTable therefore maintain the invariant that $\text{t.length} \ge 2q$ .

To summarize, a LinearHashTable contains an array, t, that stores data elements, and integers n and q that keep track of the number of data elements and non-nil values of t, respectively. Because many hash functions only work for table sizes that are a power of $2$ , we also keep an integer d and maintain the invariant that $\text{t.length} = 2^{d}.$

Press + to interact

The `LinearHashTable` operations

The find(x) operation in a LinearHashTable is simple. We start at array entry t[i] where i = hash(x) and search entries $t[i], t[(i + 1) \mod \text{t.length}]$ , $t[(i + 2) \mod \text{t.length}]$ , and so on, until we find an index i such that, either, $t[i^{\prime}] = x,$ or $t[i^{\prime}] =$ nil. In the former case we return $t[i^{\prime}]$ . In the latter case, we conclude that x is not contained in the hash table and return nil.

Press + to interact

class LinearHashTable(BaseSet):
    
    def __init__(self, iterable=[]):
        self._initialize()
        self.initialize()
        self.add_all(iterable)
        
    def add(self, x):
        if self.find(x) is not None: return False
        if 2*(self.q+1) > len(self.t): self._resize()
        i = self._hash(x)
        while self.t[i] is not None and self.t[i] != self.dl:
            i = (i + 1) % len(self.t)
        if self.t[i] is None: self.q += 1
        self.n += 1
        self.t[i] = x
        return True

By now, the implementation of the remove(x) operation should be obvious. We search $t[i], t[(i + 1) \mod \text{t.length}], t[(i + 2) \mod \text{t.length}],$ and so on until we find an index $i^{\prime}$ such that $t[i^{\prime}] =$ x or $t[i^{\prime}] =$ nil. In the former case, we set $t[i^{\prime}]=$ del and return true. In the latter case we conclude that x was not stored in the table (and therefore cannot be deleted) and return false.

Press + to interact

class LinearHashTable(BaseSet):
    
    def __init__(self, iterable=[]):
        self._initialize()
        self.initialize()
        self.add_all(iterable)
        
    def remove(self, x):
        i = self._hash(x)
        while self.t[i] is not None:
            y = self.t[i]
            if y != self.dl and x == y:
                self.t[i] = self.dl
                self.n -= 1
                if 8*self.n < len(self.t): resize()
                return y
            i = (i + 1) % len(self.t)
        return None

The correctness of the find(x), add(x), and remove(x) methods is easy to verify, though it relies on the use of del values. Notice that none of these operations ever sets a non-nil entry to nil. Therefore, when we reach an index $i^{\prime}$ such that $t[i^{\prime} ]$ = nil, this is a proof that the element, x, that we are searching for is not stored in the table; $t[i^{\prime} ]$ has always been nil, so there is no reason that a previous add(x) operation would have proceeded beyond index $i^{\prime}.$

The _resize() method is called by add(x) when the number of non-nil entries exceeds $\text{t.length}/2$ or by remove(x) when the number of data entries is less than $\text{t.length}/8$ . The _resize() method works like the _resize() methods in other array-based data structures. We find the smallest non-negative integer d such that $2^{d} \ge 3n$ . We reallocate the array t so that it has size $2^{d}$ , and then we insert all the elements in the old version of t into the newly-resized copy of t. While doing this, we reset q equal to n since the newly-allocated t contains no del values.

Press + to interact

class LinearHashTable(BaseSet):
    
    def __init__(self, iterable=[]):
        self._initialize()
        self.initialize()
        self.add_all(iterable)
        
    def _resize(self):
        self.d = 1
        while ((1<<self.d) < 3*self.n): self.d += 1
        told = self.t
        self.t = new_array((1<<self.d))
        self.q = self.n
        for x in told:
            if x is not None and x != self.dl:
                i = self._hash(x)
                while self.t[i] is not None: 
                    i = (i+1) % len(self.t)
                self.t[i] = x

Analysis of linear probing

Notice that each operation, add(x), remove(x), or find(x), finishes as soon as (or before) it discovers the first nil entry in t. The intuition behind the analysis of linear probing is that, since at least half the elements in t are equal to nil, an operation should not take long to complete because it will very quickly come across a nil entry. We shouldn’t rely too heavily on this intuition, though, because it would lead us to (the incorrect) the conclusion that the expected number of locations in t examined by an operation is at most 2.

For the rest of this section, we will assume that all hash values are independently and uniformly distributed in $\{0,....,\text{t.length} - 1\}$ . This is not a realistic assumption, but it will make it possible for us to analyze linear probing. Later in this section we will describe a method, called tabulation hashing, that produces a hash function that is “good enough” for linear probing. We will also assume that all indices into the positions of t are taken modulo t.length}, so that t[i] is really a shorthand for $t[i \mod \text{t.length}].$

We say that a run of length $k$ that starts at $i$ occurs when all the table entries $t[i],t[i + 1],...,t[i+k−1]$ ...

Overview

Array-Based Lists

Linked Lists

Skiplists

Hash Tables

Binary Trees

Random Binary Search Trees

Scapegoat Trees

Red-Black Trees

Heaps

Sorting Algorithms

Graphs

Data Structures for Integers

External Memory Searching

Wrap Up

LinearHashTable: Linear Probing

Open addressing

Structure of `LinearHashTable`

The `LinearHashTable` operations

Analysis of linear probing

LinearHashTable: Linear Probing

Open addressing

Structure of LinearHashTable

The LinearHashTable operations

Analysis of linear probing

Structure of `LinearHashTable`

The `LinearHashTable` operations