In hash tables, generally, a hash function is used to compute the index of the array. The hash value is used as an index in the hash table to hold the key.
For two or more keys, the hash function can return the same hash value.
Before discussing collision resolution techniques, let's first understand what is hashing.
A collision occurs when two or more keys are assigned the same hash value. For example, if you have a hash table that has indexes from 0–6, the hash function will then be H(i)%6. Suppose we have to insert the values: 2, 6, 24, and 15 in this hashtable. Then the hash function will map the values 6 and 24 to the same index, causing a collision. Collision may also occur due to the limited size of the hash table or the use of a poor hash function.
We employ collision resolution strategies to deal with this collision. There are generally two types of collision resolution techniques:
Open hashing.
Closed hashing.
Let's first discuss open hashing in detail.
Open hashing or more widely known as chaining is one of the simplest approaches to avoid collision in hash tables. In open hashing, each hash table slot, also known as a bucket, contains a linked list of elements that hash to the same slot. The term chaining is used to describe the process because the linked list is used, which is like a chain of elements.
Let's understand chaining with an example:
Suppose we have a set of values {5,36,20,41,14,1,27,34} that are to be inserted in a hash table of length 7. Therefore, indexes from 0-6 and hash function to be H(i)%6. Here's a visual representation of how chaining works.
Note: Don't get confused, insertion at head is performed while inserting in linked list in chaining
Like open hashing, closed hashing is also a technique used for collision resolution in hash tables. Unlike open hashing, where collisions are resolved by chaining elements in separate chains, closed hashing aims to store all elements directly within the hash table itself without the use of additional data structures like linked lists.
In closed hashing, when a collision occurs, and a hash slot is already occupied, the algorithm probes for the next available slot in the hash table until an empty slot is found.
Let's have a look at how search, insert and delete operations are conducted in closed hashing:
Search: To search for an element in closed hashing, you calculate its hash value and probe through the slots until you find the desired element or an empty slot. If the element is found, you can retrieve it. If an empty slot is encountered during probing, the element is not present in the hash table.
Insert: When inserting a new element, you calculate its hash value and probe through the slots until you find an empty slot. Once an empty slot is found, you insert the element into that slot.
Delete: Deleting an element in closed hashing typically involves marking the slot as "deleted" or using a special marker to indicate that the slot was once occupied but is now vacant.
The probing process can be done in various ways, such as linear probing, quadratic probing, or double hashing.
Linear Probing: In linear probing, if a collision occurs and a slot is already occupied, the algorithm linearly searches for the next available slot in a sequential manner. It checks the next slot and continues checking subsequent slots until it finds an empty slot.
The general formula for linear probing:
Note: Linear probing may cause primary clustering. Primary clustering refers to a phenomenon in closed hashing where consecutive collisions form long chains of occupied slots, leading to the accumulation of elements in specific regions of the hash table.
Quadratic Probing: In quadratic probing, instead of probing the next slot linearly, the algorithm uses a quadratic function to determine the next probe position. The probing sequence follows a quadratic pattern until an empty slot is found.
The general formula for quadratic probing:
Note: Quadratic probing may cause secondary clustering. Secondary clustering is another form of clustering in closed hashing that occurs when different keys produce the same probe sequence, leading to repeated patterns of collisions.
Double hashing: utilizes two hash functions to determine the probe sequence. When a collision occurs, it applies the second hash function to calculate an offset, which is then used to find the next slot to probe.
The general formula for double hashing:
At last, let's compare the collision resolution techniques in terms of their advantages, disadvantages, and performance.
Technique | Advantages | Disadvantages | Performance |
Chaining | Efficient for high collision scenarios. | Requires extra memory for linked lists. | Lookup, insertion, and deletion have O(1 + k/n) average complexity, where k is the number of elements. |
Linear Probing | Simple implementation and cache-friendly. | Prone to primary clustering. | Lookup, insertion, and deletion have O(1) average complexity, but can degrade to O(n) in worst cases. |
Quadratic Probing | Reduces primary clustering. | Suffers from secondary clustering. | Lookup, insertion, and deletion have O(1) average complexity, but can degrade to O(n) in worst cases. |
Double Hashing | Uniform distribution and avoids clustering issues. | Requires careful selection of hash functions. | Lookup, insertion, and deletion have O(1) average complexity, but performance can degrade if hash functions produce many collisions. |
Free Resources