Skip to content
← All posts

Design HashSet: Build a Hash Set From Scratch

4 min read
leetcodeproblemeasydesignhash-map

LeetCode 705. Design HashSet asks you to implement a hash set without using any built-in hash table libraries. You need three operations: add(key), remove(key), and contains(key). All values range from 0 to 1,000,000.

The core idea is simple: pick a bucket count, hash each key to a bucket index, and store collisions in a chain (a list) at that bucket.

HashSet (size = 5, hash = key % 5)[0]head1015[1]head1621[2]head2[3]empty[4]head9
A HashSet with 5 buckets. Each bucket holds a chain of values that share the same hash (key % 5).

Problem statement

Design a MyHashSet class that supports:

  • add(key) inserts the value key into the set
  • remove(key) removes key from the set (no-op if not present)
  • contains(key) returns true if the set contains key, false otherwise

All keys are integers in the range [0, 1000000].

Approach: bucket array with chaining

You create a fixed-size array of buckets. For any key, compute bucket_index = key % num_buckets. Each bucket holds a list of keys that map to the same index. When you add a key, you first check if it already exists in the chain (sets do not allow duplicates). When you remove, you scan the chain and delete the matching value. When you check contains, you scan the chain for a match.

Choosing a prime number for the bucket count (like 769) helps distribute keys more evenly and reduces the chance of long chains.

class MyHashSet:

    def __init__(self):
        self.size = 769
        self.buckets = [[] for _ in range(self.size)]

    def _hash(self, key: int) -> int:
        return key % self.size

    def add(self, key: int) -> None:
        bucket = self.buckets[self._hash(key)]
        if key not in bucket:
            bucket.append(key)

    def remove(self, key: int) -> None:
        bucket = self.buckets[self._hash(key)]
        if key in bucket:
            bucket.remove(key)

    def contains(self, key: int) -> bool:
        bucket = self.buckets[self._hash(key)]
        return key in bucket

This solution passes on LeetCode and runs well within the time limits.

Visual walkthrough

Let's trace through a sequence of operations to see how the buckets change step by step.

Step 1: add(1) (add)

Bucket [1]:empty1
Inserted

hash(1) = 1 % 5 = 1. Bucket 1 is empty, so insert 1 into the chain.

Step 2: add(6) (add)

Bucket [1]:116
Inserted

hash(6) = 6 % 5 = 1. Bucket 1 already has [1]. Value 6 is not present, so append it.

Step 3: contains(1) (contains)

Bucket [1]:16
True

hash(1) = 1 % 5 = 1. Search bucket 1 chain: found 1.

Step 4: contains(3) (contains)

Bucket [3]:empty
False

hash(3) = 3 % 5 = 3. Bucket 3 is empty, so 3 is not in the set.

Step 5: remove(1) (remove)

Bucket [1]:166
Removed

hash(1) = 1 % 5 = 1. Search bucket 1 chain, find 1, and remove it. Chain becomes [6].

Step 6: contains(1) (contains)

Bucket [1]:6
False

hash(1) = 1 % 5 = 1. Search bucket 1 chain: only 6 remains. Value 1 is no longer present.

Notice that add(1) and add(6) both land in bucket 1 (since 1 % 5 = 1 and 6 % 5 = 1). This is a hash collision, and chaining handles it naturally by storing both values in the same list. After remove(1), only 6 remains in bucket 1, so contains(1) correctly returns false.

Why not use a boolean array?

You could allocate a boolean array of size 1,000,001 and use direct indexing. That works and gives O(1) for every operation. But it wastes memory when the number of stored keys is small compared to the key range. The bucket + chaining approach uses space proportional to the number of stored elements, which is a better tradeoff in practice. Interviewers also want to see that you understand hashing fundamentals, not just array indexing.

If an interviewer asks about resizing, mention that real hash tables double their bucket count and rehash all keys when the load factor exceeds a threshold (commonly 0.75). For this LeetCode problem, a fixed bucket count is sufficient.

Complexity analysis

MetricValue
TimeO(n / k) average per operation, where n is the number of keys and k is the bucket count. With a good hash function and enough buckets, this is effectively O(1).
SpaceO(k + n). The bucket array takes O(k) space, and storing n keys takes O(n) total across all chains.

Building blocks

This problem decomposes into two reusable pieces that appear across many design problems.

1. Hash function and bucket mapping

Computing key % num_buckets to map a key to a bucket index. This same pattern appears in Design HashMap, open addressing schemes, and consistent hashing. Choosing a prime bucket count reduces clustering.

def _hash(self, key: int) -> int:
    return key % self.size

2. Linked list or array chain operations

Scanning a list for a value, appending if absent, and removing if present. These are the same insert/search/delete operations you use in any linked list or dynamic array. Practicing them here reinforces the pattern for problems like LRU Cache and Implement Trie.

bucket = self.buckets[self._hash(key)]
if key not in bucket:
    bucket.append(key)

On CodeBricks, you drill each building block separately so that when they combine in a design problem like this, both pieces feel automatic.

Edge cases

  • Adding a duplicate: add(1) twice should only store one copy. The if key not in bucket guard prevents duplicates.
  • Removing a non-existent key: remove(5) when 5 was never added should be a no-op. The if key in bucket check handles this.
  • Key is 0: Zero is a valid key. The hash function 0 % 769 = 0 maps it to bucket 0, which works correctly since 0 is a valid list element.
  • Many collisions: If all keys hash to the same bucket, operations degrade to O(n). This is unlikely with a prime bucket count and uniform input distribution.

From understanding to recall

You can read through this solution in a few minutes and understand every line. But understanding is not the same as being able to implement it cleanly under interview pressure two weeks from now.

The gap between understanding and recall is where spaced repetition comes in. CodeBricks breaks Design HashSet into its building blocks (hash function mapping, chain insert/search/delete) and drills them at increasing intervals. You type each piece from scratch, building real muscle memory. After a few review cycles, the bucket + chaining pattern becomes automatic rather than something you have to rederive.

Related posts