ChainedHashTable: Hashing with Chaining
Check your understanding of hash tables.
We'll cover the following...
Hash tables overview
Hash tables are an efficient method of storing a small number, n
, of integers from a large range . The term hash table includes a
broad range of data structures. There are
two of the most common implementations of hash tables:
- hashing with chaining
- linear probing
Very often hash tables store types of data that are not integers. In this case, an integer hash code is associated with each data item and is used in the hash table. There are various ways of how such hash codes are generated.
Some of the methods used for hashing require random choices of integers in some specific range. In the code samples, some of these “random” integers are hard-coded constants. These constants were obtained using random bits generated from atmospheric noise.
The ChainedHashTable
structure
A ChainedHashTable
data structure uses hashing with chaining to store
data as an array, t
, of lists. An integer, n
, keeps track of the total number of items in all lists. See the below figure:
To initialize a ChainedHashTable
, we use the following variables:
class ChainedHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.add_all(iterable)def _initialize(self):self.d = 1self.t = self._alloc_table((1<<self.d))self.z = self._random_odd_int()self.n = 0
The hash value of a data item x
, denoted hash(x)
is a value in the range
. All items with hash value i
are stored in the list at t[i]
. To ensure that lists don’t get too long, we maintain the invariant.
so that the average number of elements stored in one of these lists is
t.length
.
To add an element, x
, to the hash table, we first check if the length of
t
needs to be increased and, if so, we grow t
. With this out of the way
we hash x
to get an integer, i
, in the range t.length
, and we
append x
to the list t[i]
:
class ChainedHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.add_all(iterable)def add(self, x):if self.find(x) is not None: return Falseif self.n+1 > len(self.t): self._resize()self.t[self._hash(x)].append(x)self.n += 1return True
Growing the table, if necessary, involves doubling the length of t and
reinserting all elements into the new table. This strategy is exactly the
same as the one used in the implementation of ArrayStack
and the same result applies: The cost of growing is only constant when amortized over
a sequence of insertions.
Besides growing, the only other work done when adding a new value
x
to a ChainedHashTable
involves appending x
to the list t[hash(x)]
. For any of the list implementations, this takes only constant time.
To remove an element, x
, from the hash table, we iterate over the list
t[hash(x)]
until we find x
so that we can remove it:
class ChainedHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.add_all(iterable)def remove(self, x):ell = self.t[self._hash(x)]for y in ell:if y == x:ell.remove_value(y)self.n -= 1if 3*self.n < len(self.t): self._resize()return yreturn None
This takes time, where denotes the length of the list stored at t[i]
.
Searching for the element x
in a hash table is similar. We perform a
linear search on the list t[hash(x)]
:
class ChainedHashTable(BaseSet):def __init__(self, iterable=[]):self._initialize()self.add_all(iterable)def find(self, x):for y in self.t[self._hash(x)]:if y == x:return yreturn None
Again, this takes time proportional to the length of the list t[hash(x)]
.
The performance of a hash table depends critically on the choice of
the hash function. A good hash function will spread the elements evenly
among the t.length
lists, so that the expected size of the list t[hash(x)]
is
. On the other hand, a bad hash function will hash all values (including x
) to the same table location, in which case the size
of the list t[hash(x)]
will be n
. In the next section, we describe a good hash
function.
Multiplicative hashing
Multiplicative hashing is an efficient method of generating hash values based on modular arithmetic and integer division. It uses the div
operator, which calculates the integral part of a quotient, while discarding the remainder. Formally, for any integers
and , .
In multiplicative hashing, we use a hash table of size for some integer d
(called the dimension). The formula for hashing an integer is
Here, z
is a randomly chosen odd integer in . This hash function can be realized very efficiently by observing that, by default, operations on integers are already done modulo where w
is the number of bits in an integer. See the below figure:
Note: This is true for most programming languages including C, C#, C++, and Java. Notable exceptions are Python and Ruby, in which the result of a fixed-length ...
Create a free account to access the full course.
By signing up, you agree to Educative's Terms of Service and Privacy Policy