...

/

ChainedHashTable: Hashing with Chaining

ChainedHashTable: Hashing with Chaining

Check your understanding of hash tables.

Hash tables overview

Hash tables are an efficient method of storing a small number, n, of integers from a large range U={0,...,2w1}U = \{0,...,2^{w}− 1\}. The term hash table includes a broad range of data structures. There are two of the most common implementations of hash tables:

  • hashing with chaining
  • linear probing

Very often hash tables store types of data that are not integers. In this case, an integer hash code is associated with each data item and is used in the hash table. There are various ways of how such hash codes are generated.

Some of the methods used for hashing require random choices of integers in some specific range. In the code samples, some of these “random” integers are hard-coded constants. These constants were obtained using random bits generated from atmospheric noise.

The ChainedHashTable structure

A ChainedHashTable data structure uses hashing with chaining to store data as an array, t, of lists. An integer, n, keeps track of the total number of items in all lists. See the below figure:

Press + to interact
An example of a ChainedHashTable with n = 14 and t.length = 16. In this example hash(x) = 6
An example of a ChainedHashTable with n = 14 and t.length = 16. In this example hash(x) = 6

To initialize a ChainedHashTable, we use the following variables:

Press + to interact
class ChainedHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.add_all(iterable)
def _initialize(self):
self.d = 1
self.t = self._alloc_table((1<<self.d))
self.z = self._random_odd_int()
self.n = 0

The hash value of a data item x, denoted hash(x) is a value in the range {0,...,t.length1}\{0,..., \text{t.length}-1\}. All items with hash value i are stored in the list at t[i]. To ensure that lists don’t get too long, we maintain the invariant.

nt.length\textcolor{Red}{n \leq \text{t.length}}

so that the average number of elements stored in one of these lists is n/n / t.length 1\le 1.

To add an element, x, to the hash table, we first check if the length of t needs to be increased and, if so, we grow t. With this out of the way we hash x to get an integer, i, in the range {0,,\{0,\ldots,t.length1}-1\}, and we append x to the list t[i]:

Press + to interact
class ChainedHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.add_all(iterable)
def add(self, x):
if self.find(x) is not None: return False
if self.n+1 > len(self.t): self._resize()
self.t[self._hash(x)].append(x)
self.n += 1
return True

Growing the table, if necessary, involves doubling the length of t and reinserting all elements into the new table. This strategy is exactly the same as the one used in the implementation of ArrayStack and the same result applies: The cost of growing is only constant when amortized over a sequence of insertions.

Besides growing, the only other work done when adding a new value x to a ChainedHashTable involves appending x to the list t[hash(x)]. For any of the list implementations, this takes only constant time.

To remove an element, x, from the hash table, we iterate over the list t[hash(x)] until we find x so that we can remove it:

Press + to interact
class ChainedHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.add_all(iterable)
def remove(self, x):
ell = self.t[self._hash(x)]
for y in ell:
if y == x:
ell.remove_value(y)
self.n -= 1
if 3*self.n < len(self.t): self._resize()
return y
return None

This takes O(nhash(x))O(n_ \text{hash(x)}) time, where nin_i denotes the length of the list stored at t[i].

Searching for the element x in a hash table is similar. We perform a linear search on the list t[hash(x)]:

Press + to interact
class ChainedHashTable(BaseSet):
def __init__(self, iterable=[]):
self._initialize()
self.add_all(iterable)
def find(self, x):
for y in self.t[self._hash(x)]:
if y == x:
return y
return None

Again, this takes time proportional to the length of the list t[hash(x)].

The performance of a hash table depends critically on the choice of the hash function. A good hash function will spread the elements evenly among the t.length lists, so that the expected size of the list t[hash(x)] is O(n/t.length)=O(1)O(n/ \text{t.length}) = O(1). On the other hand, a bad hash function will hash all values (including x) to the same table location, in which case the size of the list t[hash(x)] will be n. In the next section, we describe a good hash function.

Multiplicative hashing

Multiplicative hashing is an efficient method of generating hash values based on modular arithmetic and integer division. It uses the div operator, which calculates the integral part of a quotient, while discarding the remainder. Formally, for any integers a0a \ge 0 and b1b \ge 1, a div b=a/ba \text{ div }b = \lfloor a/b \rfloor.

In multiplicative hashing, we use a hash table of size 2d2^{d} for some integer d (called the dimension). The formula for hashing an integer x{0,...,2w1}x \in \{0,...,2^{w} − 1\} is

hash(x)=((zx)  mod 2w)div 2wd\text{hash(x)} = ((z · x)\ \text{ mod}\ 2^{w}) \text{div}\ 2^{w-d}

Here, z is a randomly chosen odd integer in {1,...,2w1}\{ 1,...,2^{w} −1\}. This hash function can be realized very efficiently by observing that, by default, operations on integers are already done modulo 2w2^{w} where w is the number of bits in an integer. See the below figure:

Note: This is true for most programming languages including C, C#, C++, and Java. Notable exceptions are Python and Ruby, in which the result of a fixed-length wbitw-bit ...

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy