Solution: Design HashSet

Let’s solve the Design HashSet using the Custom Data Structures pattern.

Statement

Design a MyHashSet class without using any built-in hash table libraries and implement the following methods in it:

  • void add(key): Inserts the value key into the HashSet.

  • bool contains(key): Returns TRUE if the key exists in the HashSet, FALSE otherwise.

  • void remove(key): Removes the value key if it exists in the HashSet.

Constraints:

  • 00 \leq key 106\leq 10^6

  • At most, 10410^4 calls will be made to add, contains, and remove methods.

Solution

Designing a hash set involves addressing two main challenges:

  • Hash function: A hash function maps a given key to an index in a storage space. This function takes an integer key and returns its remainder when divided by a predefined number. A good hash function ensures that keys are evenly distributed across the storage space, preventing the clustering of keys in certain locations. This even distribution helps maintain efficient access to stored values.

  • Collision handling: Although the purpose of the hash function is to distribute keys evenly across the storage space, collisions can still occur when two different keys map to the same index. We can use open addressing, 2-choice hashing, or separate chaining to handle these collisions. This solution uses separate chaining, where each index of storage space, called a bucket, contains a binary search tree (BST) to store multiple values.

The algorithm to solve this problem is as follows:

  • Write the Bucket class and implement the following methods in it:

    • Initialize the bucket: The Bucket class initializes an empty binary search tree (BST) when creating a new bucket. This tree will store the keys that hash to this specific bucket.

    • Insert a key: The insert(Integer key) method inserts the key into the binary search tree. If the key is already present, the insertion is skipped.

    • Delete a key: The delete(Integer key) method removes the specified key from the binary search tree. It searches for the key and deletes it if found.

    • Check if a key exists: The exists(Integer key) method checks if a key is present in the binary search tree. It searches for the key and returns TRUE if found and FALSE otherwise.

  • Write the MyHashSet class and implement the following methods in it:

    • Initialize the HashSet (constructor): Create a bucket_array with 769769 elements, each representing a bucket. Each bucket is initialized as an empty binary search tree (BST) to hold multiple values. We use 769769 as the size of the bucket_array because it is a prime number, which helps reduce collisions by distributing keys more evenly across buckets.

    • Hash function: Implement the hash(int key) method to compute the index for a key using key % 769. This helps distribute keys uniformly across the array.

    • Add a key: In the add(int key) method, use the hash function to find the appropriate bucket index. Insert the key into the binary search tree at that index, skipping the insertion if the key already exists.

    • Remove a key: In the remove(int key) method, determine the bucket index using the hash function. Locate and delete the key from the corresponding binary search tree.

    • Check if a key exists: The contains(int key) method uses the hash function to find the bucket index. It then checks the binary search tree at that index to see if the key is present.

Let’s look at the following illustration to get a better understanding of the solution:

Level up your interview prep. Join Educative to access 80+ hands-on prep courses.