LinearHashTable: Linear Probing
Learn to implement a LinearHashTable.
Open addressing
The ChainedHashTable
data structure uses an array of lists, where the
list stores all elements x
such that hash(x) = i
. An alternative, called open addressing, is to store the elements directly in an array, t
, with each array location in t
storing at most one value. This approach is taken by the LinearHashTable
described in this section. In some places, this data
structure is described as open addressing with linear probing.
Structure of LinearHashTable
The main idea behind a LinearHashTable
is that we would, ideally, like to store the element x
with hash value i = hash(x)
in the table location t[i]
. If we cannot do this (because some element is already stored there) then we try to store it at location t[(i + 1) mod t.length]
; if that’s not possible, then we try t[(i+ 2) mod t.length]
, and so on, until we find a place for x
.
There are three types of entries stored in t
:
- data values: actual values in the USet that we are representing;
nil
values: at array locations where no data has ever been stored; anddel
values: at array locations where data was once stored but that has since been deleted.
In addition to the counter, n
, that keeps track of the number of elements
in the LinearHashTable
, a counter, q
, keeps track of the number of elements of Types 1 and 3. That is, q
is equal to n
plus the number of del
values in t
. To make this work efficiently, we need t to be considerably
larger than q
, so that there are lots of nil
values in t
. The operations on
a LinearHashTable
therefore maintain the invariant that .
To summarize, a LinearHashTable
contains an array, t
, that stores
data elements, and integers n
and q
that keep track of the number of
data elements and non-nil values of t
, respectively. Because many hash
functions only work for table sizes that are a power of , we also keep an
integer d
and maintain the invariant that
class LinearHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.initialize()self.add_all(iterable)def _initialize(self):self.dl = object()def initialize(self):self.d = 1self.t = new_array((1<<self.d))self.q = 0self.n = 0
The LinearHashTable
operations
The find(x)
operation in a LinearHashTable
is simple. We start at
array entry t[i]
where i = hash(x)
and search entries , , and so on, until we find an index i
such that, either, or nil
. In the former case we return . In the latter case, we conclude that x
is not contained in the hash table and return nil
.
class LinearHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.initialize()self.add_all(iterable)def find(self, x):i = self._hash(x)while self.t[i] is not None:if self.t[i] != self.dl and x == self.t[i]:return self.t[i]i = (i + 1) % len(self.t)
The add(x)
operation is also fairly easy to implement. After checking
that x
is not already stored in the table (using find(x)
), we search and so on, until we find a nil
or del
and store x
at that location, increment n
, and q
, if appropriate.
class LinearHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.initialize()self.add_all(iterable)def add(self, x):if self.find(x) is not None: return Falseif 2*(self.q+1) > len(self.t): self._resize()i = self._hash(x)while self.t[i] is not None and self.t[i] != self.dl:i = (i + 1) % len(self.t)if self.t[i] is None: self.q += 1self.n += 1self.t[i] = xreturn True
By now, the implementation of the remove(x)
operation should be obvious. We search and so on until we find an index such that x
or nil
. In the former case, we set del
and return true
. In the latter case we conclude that x
was not stored in the table (and therefore cannot be deleted) and return false
.
class LinearHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.initialize()self.add_all(iterable)def remove(self, x):i = self._hash(x)while self.t[i] is not None:y = self.t[i]if y != self.dl and x == y:self.t[i] = self.dlself.n -= 1if 8*self.n < len(self.t): resize()return yi = (i + 1) % len(self.t)return None
The correctness of the find(x)
, add(x)
, and remove(x)
methods is easy
to verify, though it relies on the use of del
values. Notice that none of
these operations ever sets a non-nil entry to nil
. Therefore, when we
reach an index
such that = nil
, this is a proof that the element, x
,
that we are searching for is not stored in the table; has always been
nil
, so there is no reason that a previous add(x)
operation would have
proceeded beyond index
The _resize()
method is called by add(x)
when the number of non-nil entries exceeds or by remove(x)
when the number of
data entries is less than . The _resize()
method works like the _resize()
methods in other array-based data structures. We find the smallest non-negative integer d
such that . We reallocate the array t
so
that it has size
, and then we insert all the elements in the old version
of t
into the newly-resized copy of t
. While doing this, we reset q
equal
to n
since the newly-allocated t
contains no del
values.
class LinearHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.initialize()self.add_all(iterable)def _resize(self):self.d = 1while ((1<<self.d) < 3*self.n): self.d += 1told = self.tself.t = new_array((1<<self.d))self.q = self.nfor x in told:if x is not None and x != self.dl:i = self._hash(x)while self.t[i] is not None:i = (i+1) % len(self.t)self.t[i] = x
Analysis of linear probing
Notice that each operation, add(x)
, remove(x)
, or find(x)
, finishes as soon
as (or before) it discovers the first nil
entry in t
. The intuition behind the analysis of linear probing is that, since at least half the elements in t
are equal to nil
, an operation should not take long to complete because
it will very quickly come across a nil entry. We shouldn’t rely too heavily on this intuition, though, because it would lead us to (the incorrect)
the conclusion that the expected number of locations in t
examined by an
operation is at most 2.
For the rest of this section, we will assume that all hash values are
independently and uniformly distributed in . This is
not a realistic assumption, but it will make it possible for us to analyze
linear probing. Later in this section we will describe a method, called
tabulation hashing, that produces a hash function that is “good enough”
for linear probing. We will also assume that all indices into the positions
of t
are taken modulo t.length}
, so that t[i]
is really a shorthand for
We say that a run of length that starts at occurs when all the table entries ...
Create a free account to access the full course.
By signing up, you agree to Educative's Terms of Service and Privacy Policy