Design HashSet: Build a Hash Set From Scratch
LeetCode 705. Design HashSet asks you to implement a hash set without using any built-in hash table libraries. You need three operations: add(key), remove(key), and contains(key). All values range from 0 to 1,000,000.
The core idea is simple: pick a bucket count, hash each key to a bucket index, and store collisions in a chain (a list) at that bucket.
Problem statement
Design a MyHashSet class that supports:
add(key)inserts the valuekeyinto the setremove(key)removeskeyfrom the set (no-op if not present)contains(key)returnstrueif the set containskey,falseotherwise
All keys are integers in the range [0, 1000000].
Approach: bucket array with chaining
You create a fixed-size array of buckets. For any key, compute bucket_index = key % num_buckets. Each bucket holds a list of keys that map to the same index. When you add a key, you first check if it already exists in the chain (sets do not allow duplicates). When you remove, you scan the chain and delete the matching value. When you check contains, you scan the chain for a match.
Choosing a prime number for the bucket count (like 769) helps distribute keys more evenly and reduces the chance of long chains.
class MyHashSet:
def __init__(self):
self.size = 769
self.buckets = [[] for _ in range(self.size)]
def _hash(self, key: int) -> int:
return key % self.size
def add(self, key: int) -> None:
bucket = self.buckets[self._hash(key)]
if key not in bucket:
bucket.append(key)
def remove(self, key: int) -> None:
bucket = self.buckets[self._hash(key)]
if key in bucket:
bucket.remove(key)
def contains(self, key: int) -> bool:
bucket = self.buckets[self._hash(key)]
return key in bucket
This solution passes on LeetCode and runs well within the time limits.
Visual walkthrough
Let's trace through a sequence of operations to see how the buckets change step by step.
Step 1: add(1) (add)
hash(1) = 1 % 5 = 1. Bucket 1 is empty, so insert 1 into the chain.
Step 2: add(6) (add)
hash(6) = 6 % 5 = 1. Bucket 1 already has [1]. Value 6 is not present, so append it.
Step 3: contains(1) (contains)
hash(1) = 1 % 5 = 1. Search bucket 1 chain: found 1.
Step 4: contains(3) (contains)
hash(3) = 3 % 5 = 3. Bucket 3 is empty, so 3 is not in the set.
Step 5: remove(1) (remove)
hash(1) = 1 % 5 = 1. Search bucket 1 chain, find 1, and remove it. Chain becomes [6].
Step 6: contains(1) (contains)
hash(1) = 1 % 5 = 1. Search bucket 1 chain: only 6 remains. Value 1 is no longer present.
Notice that add(1) and add(6) both land in bucket 1 (since 1 % 5 = 1 and 6 % 5 = 1). This is a hash collision, and chaining handles it naturally by storing both values in the same list. After remove(1), only 6 remains in bucket 1, so contains(1) correctly returns false.
Why not use a boolean array?
You could allocate a boolean array of size 1,000,001 and use direct indexing. That works and gives O(1) for every operation. But it wastes memory when the number of stored keys is small compared to the key range. The bucket + chaining approach uses space proportional to the number of stored elements, which is a better tradeoff in practice. Interviewers also want to see that you understand hashing fundamentals, not just array indexing.
If an interviewer asks about resizing, mention that real hash tables double their bucket count and rehash all keys when the load factor exceeds a threshold (commonly 0.75). For this LeetCode problem, a fixed bucket count is sufficient.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O(n / k) average per operation, where n is the number of keys and k is the bucket count. With a good hash function and enough buckets, this is effectively O(1). |
| Space | O(k + n). The bucket array takes O(k) space, and storing n keys takes O(n) total across all chains. |
Building blocks
This problem decomposes into two reusable pieces that appear across many design problems.
1. Hash function and bucket mapping
Computing key % num_buckets to map a key to a bucket index. This same pattern appears in Design HashMap, open addressing schemes, and consistent hashing. Choosing a prime bucket count reduces clustering.
def _hash(self, key: int) -> int:
return key % self.size
2. Linked list or array chain operations
Scanning a list for a value, appending if absent, and removing if present. These are the same insert/search/delete operations you use in any linked list or dynamic array. Practicing them here reinforces the pattern for problems like LRU Cache and Implement Trie.
bucket = self.buckets[self._hash(key)]
if key not in bucket:
bucket.append(key)
On CodeBricks, you drill each building block separately so that when they combine in a design problem like this, both pieces feel automatic.
Edge cases
- Adding a duplicate:
add(1)twice should only store one copy. Theif key not in bucketguard prevents duplicates. - Removing a non-existent key:
remove(5)when 5 was never added should be a no-op. Theif key in bucketcheck handles this. - Key is 0: Zero is a valid key. The hash function
0 % 769 = 0maps it to bucket 0, which works correctly since 0 is a valid list element. - Many collisions: If all keys hash to the same bucket, operations degrade to O(n). This is unlikely with a prime bucket count and uniform input distribution.
From understanding to recall
You can read through this solution in a few minutes and understand every line. But understanding is not the same as being able to implement it cleanly under interview pressure two weeks from now.
The gap between understanding and recall is where spaced repetition comes in. CodeBricks breaks Design HashSet into its building blocks (hash function mapping, chain insert/search/delete) and drills them at increasing intervals. You type each piece from scratch, building real muscle memory. After a few review cycles, the bucket + chaining pattern becomes automatic rather than something you have to rederive.
Related posts
- Hash Maps: The Most Important Data Structure - Deep dive into how hash maps work internally, the foundation for this problem
- Implement Trie - Another "design a data structure from scratch" problem that tests similar implementation skills
- Hash Map Patterns - Common patterns for hash-based problems, including frequency counting and grouping