...

/

LinearHashTable: Linear Probing

LinearHashTable: Linear Probing

Learn to implement a LinearHashTable.

Open addressing

The ChainedHashTable data structure uses an array of lists, where the ithi^{th} list stores all elements x such that hash(x) = i. An alternative, called open addressing, is to store the elements directly in an array, t, with each array location in t storing at most one value. This approach is taken by the LinearHashTable described in this section. In some places, this data structure is described as open addressing with linear probing.

Structure of LinearHashTable

The main idea behind a LinearHashTable is that we would, ideally, like to store the element x with hash value i = hash(x) in the table location t[i]. If we cannot do this (because some element is already stored there) then we try to store it at location t[(i + 1) mod t.length]; if that’s not possible, then we try t[(i+ 2) mod t.length], and so on, until we find a place for x.

There are three types of entries stored in t:

  1. data values: actual values in the USet that we are representing;
  2. nil values: at array locations where no data has ever been stored; and
  3. del values: at array locations where data was once stored but that has since been deleted.

In addition to the counter, n, that keeps track of the number of elements in the LinearHashTable, a counter, q, keeps track of the number of elements of Types 1 and 3. That is, q is equal to n plus the number of del values in t. To make this work efficiently, we need t to be considerably larger than q, so that there are lots of nil values in t. The operations on a LinearHashTable therefore maintain the invariant that t.length2q\text{t.length} \ge 2q.

To summarize, a LinearHashTable contains an array, t, that stores data elements, and integers n and q that keep track of the number of data elements and non-nil values of t, respectively. Because many hash functions only work for table sizes that are a power of 22, we also keep an integer d and maintain the invariant that t.length=2d.\text{t.length} = 2^{d}.

Press + to interact
class LinearHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.initialize()
self.add_all(iterable)
def _initialize(self):
self.dl = object()
def initialize(self):
self.d = 1
self.t = new_array((1<<self.d))
self.q = 0
self.n = 0

The LinearHashTable operations

The find(x) operation in a LinearHashTable is simple. We start at array entry t[i] where i = hash(x) and search entries t[i],t[(i+1)modt.length]t[i], t[(i + 1) \mod \text{t.length}], t[(i+2)modt.length]t[(i + 2) \mod \text{t.length}], and so on, until we find an index i such that, either, t[i]=x,t[i^{\prime}] = x, or t[i]=t[i^{\prime}] = nil. In the former case we return t[i]t[i^{\prime}]. In the latter case, we conclude that x is not contained in the hash table and return nil.

Press + to interact
class LinearHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.initialize()
self.add_all(iterable)
def find(self, x):
i = self._hash(x)
while self.t[i] is not None:
if self.t[i] != self.dl and x == self.t[i]:
return self.t[i]
i = (i + 1) % len(self.t)

The add(x) operation is also fairly easy to implement. After checking that x is not already stored in the table (using find(x)), we search t[i],t[(i+1)modt.length],t[(i+2)modt.length],t[i],t[(i+1) \mod \text{t.length}], t[(i+2) \mod \text{t.length}], and so on, until we find a nil or del and store x at that location, increment n, and q, if appropriate.

Press + to interact
class LinearHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.initialize()
self.add_all(iterable)
def add(self, x):
if self.find(x) is not None: return False
if 2*(self.q+1) > len(self.t): self._resize()
i = self._hash(x)
while self.t[i] is not None and self.t[i] != self.dl:
i = (i + 1) % len(self.t)
if self.t[i] is None: self.q += 1
self.n += 1
self.t[i] = x
return True

By now, the implementation of the remove(x) operation should be obvious. We search t[i],t[(i+1)modt.length],t[(i+2)modt.length],t[i], t[(i + 1) \mod \text{t.length}], t[(i + 2) \mod \text{t.length}], and so on until we find an index ii^{\prime} such that t[i]=t[i^{\prime}] = x or t[i]=t[i^{\prime}] = nil. In the former case, we set t[i]=t[i^{\prime}]= del and return true. In the latter case we conclude that x was not stored in the table (and therefore cannot be deleted) and return false.

Press + to interact
class LinearHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.initialize()
self.add_all(iterable)
def remove(self, x):
i = self._hash(x)
while self.t[i] is not None:
y = self.t[i]
if y != self.dl and x == y:
self.t[i] = self.dl
self.n -= 1
if 8*self.n < len(self.t): resize()
return y
i = (i + 1) % len(self.t)
return None

The correctness of the find(x), add(x), and remove(x) methods is easy to verify, though it relies on the use of del values. Notice that none of these operations ever sets a non-nil entry to nil. Therefore, when we reach an index ii^{\prime} such that t[i]t[i^{\prime} ] = nil, this is a proof that the element, x, that we are searching for is not stored in the table; t[i]t[i^{\prime} ] has always been nil, so there is no reason that a previous add(x) operation would have proceeded beyond index i.i^{\prime}.

The _resize() method is called by add(x) when the number of non-nil entries exceeds t.length/2\text{t.length}/2 or by remove(x) when the number of data entries is less than t.length/8\text{t.length}/8. The _resize() method works like the _resize() methods in other array-based data structures. We find the smallest non-negative integer d such that 2d3n2^{d} \ge 3n. We reallocate the array t so that it has size 2d2^{d} , and then we insert all the elements in the old version of t into the newly-resized copy of t. While doing this, we reset q equal to n since the newly-allocated t contains no del values.

Press + to interact
class LinearHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.initialize()
self.add_all(iterable)
def _resize(self):
self.d = 1
while ((1<<self.d) < 3*self.n): self.d += 1
told = self.t
self.t = new_array((1<<self.d))
self.q = self.n
for x in told:
if x is not None and x != self.dl:
i = self._hash(x)
while self.t[i] is not None:
i = (i+1) % len(self.t)
self.t[i] = x

Analysis of linear probing

Notice that each operation, add(x), remove(x), or find(x), finishes as soon as (or before) it discovers the first nil entry in t. The intuition behind the analysis of linear probing is that, since at least half the elements in t are equal to nil, an operation should not take long to complete because it will very quickly come across a nil entry. We shouldn’t rely too heavily on this intuition, though, because it would lead us to (the incorrect) the conclusion that the expected number of locations in t examined by an operation is at most 2.

For the rest of this section, we will assume that all hash values are independently and uniformly distributed in {0,....,t.length1}\{0,....,\text{t.length} - 1\}. This is not a realistic assumption, but it will make it possible for us to analyze linear probing. Later in this section we will describe a method, called tabulation hashing, that produces a hash function that is “good enough” for linear probing. We will also assume that all indices into the positions of t are taken modulo t.length}, so that t[i] is really a shorthand for t[imodt.length].t[i \mod \text{t.length}].

We say that a run of length kk that starts at ii occurs when all the table entries ...

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy