Solution: Design HashSet

Let’s solve the Design HashSet using the Custom Data Structures pattern.

We'll cover the following

Statement
Solution
- Time complexity
- Space complexity

Statement

Design a MyHashSet class without using any built-in hash table libraries and implement the following methods in it:

void add(key): Inserts the value key into the HashSet.
bool contains(key): Returns TRUE if the key exists in the HashSet, FALSE otherwise.
void remove(key): Removes the value key if it exists in the HashSet.

Constraints:

$0$ $\leq$ key $\leq 10^6$
At most, $10^4$ calls will be made to add, contains, and remove methods.

Solution

Designing a hash set involves addressing two main challenges:

Hash function: A hash function maps a given key to an index in a storage space. This function takes an integer key and returns its remainder when divided by a predefined number. A good hash function ensures that keys are evenly distributed across the storage space, preventing the clustering of keys in certain locations. This even distribution helps maintain efficient access to stored values.
Collision handling: Although the purpose of the hash function is to distribute keys evenly across the storage space, collisions can still occur when two different keys map to the same index. We can use open addressing, 2-choice hashing, or separate chaining to handle these collisions. This solution uses separate chaining, where each index of storage space, called a bucket, contains a binary search tree (BST) to store multiple values.

The algorithm to solve this problem is as follows:

Write the Bucket class and implement the following methods in it:
- Initialize the bucket: The Bucket class initializes an empty binary search tree (BST) when creating a new bucket. This tree will store the keys that hash to this specific bucket.
- Insert a key: The insert(Integer key) method inserts the key into the binary search tree. If the key is already present, the insertion is skipped.
- Delete a key: The delete(Integer key) method removes the specified key from the binary search tree. It searches for the key and deletes it if found.
- Check if a key exists: The exists(Integer key) method checks if a key is present in the binary search tree. It searches for the key and returns TRUE if found and FALSE otherwise.
Write the MyHashSet class and implement the following methods in it:
- Initialize the HashSet (constructor): Create a bucket_array with $769$ elements, each representing a bucket. Each bucket is initialized as an empty binary search tree (BST) to hold multiple values. We use $769$ as the size of the bucket_array because it is a prime number, which helps reduce collisions by distributing keys more evenly across buckets.
- Hash function: Implement the hash(int key) method to compute the index for a key using key % 769. This helps distribute keys uniformly across the array.
- Add a key: In the add(int key) method, use the hash function to find the appropriate bucket index. Insert the key into the binary search tree at that index, skipping the insertion if the key already exists.
- Remove a key: In the remove(int key) method, determine the bucket index using the hash function. Locate and delete the key from the corresponding binary search tree.
- Check if a key exists: The contains(int key) method uses the hash function to find the bucket index. It then checks the binary search tree at that index to see if the key is present.

Let’s look at the following illustration to get a better understanding of the solution:

Level up your interview prep. Join Educative to access 80+ hands-on prep courses.

Getting Started

Two Pointers

Fast and Slow Pointers

Sliding Window

Merge Intervals

In-Place Manipulation of a Linked List

Two Heaps

K-way merge

Top K Elements

Modified Binary Search

Subsets

Greedy Techniques

Backtracking

Dynamic Programming

Cyclic Sort

Topological Sort

Matrices

Stacks

Graphs

Tree Depth-First Search

Tree Breadth-First Search

Trie

Hash Maps

Knowing What to Track

Union Find

Custom Data Structures

Bitwise Manipulation

Challenge Yourself

Conclusion

Solution: Design HashSet

Statement

Solution