Making File Names Unique: Hash Map Tracking
Making File Names Unique is LeetCode problem 1487. You are given an array of file name strings. Your job is to process them in order and ensure every name in the output is unique. If a name has already been used, you append the smallest available suffix (k) to make it unique. The result is an array of the same length where no two names are the same.
Why this problem matters
This is not just an interview exercise. It mirrors a real-world pattern you encounter all the time: operating systems do exactly this when you copy a file and the destination already has a file with the same name. Cloud storage, backup tools, and batch import systems all face the same deduplication challenge. The core skill here, using a hash map to track state as you process a sequence, shows up constantly in production code and in other LeetCode problems.
The key insight
A naive approach would be to keep a set of used names and, for each conflict, try (1), (2), (3), etc. until you find one that is free. That works, but it can be extremely slow. Imagine 10,000 copies of "doc". When you process the last one, you would try (1) through (9999) before finding a free suffix. That is O(n^2) in the worst case.
The fix: instead of just tracking which names are used, also track the next suffix to try for each base name. When "doc" is first used, record next_k = 1. When "doc" appears again, you know to start trying at (1) instead of re-scanning from scratch. After using "doc(1)", bump next_k to 2. This turns repeated lookups into amortized O(1) work per name.
The solution
def getFolderNames(names):
name_count = {}
result = []
for name in names:
if name not in name_count:
result.append(name)
name_count[name] = 1
else:
k = name_count[name]
candidate = f"{name}({k})"
while candidate in name_count:
k += 1
candidate = f"{name}({k})"
result.append(candidate)
name_count[name] = k + 1
name_count[candidate] = 1
return result
Here is what each piece does:
name_count = {}is our hash map. Each key is a name string, and its value is the next suffix number to try when that name has a conflict.- For each name, we first check if it is already in the map. If not, we use it directly and set its
next_kto 1 (meaning if this name appears again, start trying from(1)). - If the name is already taken, we grab the stored
next_kand build a candidate like"doc(2)". If that candidate is also taken, we incrementkand try again. - Once we find a free candidate, we add it to the result, register the candidate in the map with
next_k = 1, and update the original name'snext_kso the next conflict starts from a higher number.
Tracking next_k per base name is what makes this O(n) amortized instead of O(n^2). Without it, every conflict would re-scan suffixes from 1, leading to repeated work. With it, each suffix value is tried at most once across the entire input.
Visual walkthrough
Step 1: Process "doc". Not in the map yet.
"doc" is not in the map. Use it as-is and record that the next suffix for "doc" is 1.
Step 2: Process "doc". Already in the map, try suffix (1).
"doc" exists. Try "doc(1)", which is free. Use it, update "doc" next_k to 2, and register "doc(1)" in the map.
Step 3: Process "image". Not in the map.
"image" is fresh. Use it directly and set its next suffix to 1.
Step 4: Process "doc(1)". Already taken by step 2!
"doc(1)" is already used. Try "doc(1)(1)", which is free. Register it and bump "doc(1)" next_k to 2.
Step 5: Process "doc". Still in the map, try suffix (2).
"doc" exists with next_k=2. Try "doc(2)", which is free. Use it, bump "doc" next_k to 3, and register "doc(2)".
Notice how in step 4, "doc(1)" was already registered by step 2. The algorithm treats "doc(1)" as its own base name and appends (1) to it, producing "doc(1)(1)". The map handles this naturally because every unique string gets its own entry.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Hash map with suffix tracking | O(n) amortized | O(n) |
Time: O(n) amortized. Each name is processed once. The while loop that increments k may run multiple times for a single name, but across all names, each suffix value is visited at most once. The total work across all while-loop iterations is bounded by n, giving amortized O(1) per name.
Space: O(n). We store up to n entries in the hash map (one per unique output name) plus the result array of size n.
The building blocks
1. Hash map for tracking used names
name_count = {}
if name not in name_count:
name_count[name] = 1
This is the classic "have we seen this before" pattern. The hash map gives O(1) lookup to check whether a name is already taken. The value stores metadata (the next suffix to try) rather than just a boolean, which is what makes this problem interesting.
2. Suffix increment pattern
k = name_count[name]
candidate = f"{name}({k})"
while candidate in name_count:
k += 1
candidate = f"{name}({k})"
name_count[name] = k + 1
This is where the efficiency lives. By remembering where we left off (next_k), we avoid redundant work. After finding a free suffix, we store k + 1 so the next conflict for this base name picks up right where we stopped. This pattern of "resume from last known position" shows up in many hash map problems.
Edge cases
- No duplicates. If every name is unique, no suffix is ever appended. Each name gets added to the map with
next_k = 1, and the output matches the input exactly. - All names identical. Something like
["a", "a", "a", "a"]produces["a", "a(1)", "a(2)", "a(3)"]. Thenext_kcounter handles this smoothly. - Name with parenthesized suffix already present. Input like
["doc", "doc(1)", "doc"]is tricky. The first"doc"is fine. Then"doc(1)"is added as a completely separate name. When the third"doc"arrives, the map says to try"doc(1)", but that is taken. So we bump to"doc(2)". The while loop handles this case automatically. - Long chains of conflicts. Even if many suffixes are pre-occupied, the amortized cost stays O(n) because each suffix is only tried once globally.
- Empty string.
""is a valid name. Duplicates of""become"(1)","(2)", etc. No special handling required.
From understanding to recall
This problem is a great test of whether you can combine two ideas: hash map lookups and stateful counters. The algorithm is short, but the detail that makes it efficient (storing next_k instead of just a boolean) is easy to forget under pressure.
Spaced repetition helps you internalize the "resume from last position" trick. After a few review cycles, you will not just remember the solution. You will recognize the pattern whenever a problem asks you to deduplicate a sequence efficiently. The combination of "check membership" and "track next available value" is a building block that transfers to problems like making unique usernames, allocating IDs, or assigning resources.
Related posts
- Group Anagrams - Hash map grouping pattern
- Design HashMap - Building hash maps from scratch
- Design Add and Search Words - String tracking with data structures
If you want to lock these hash map patterns into long-term memory, CodeBricks uses spaced repetition to help you practice the right problems at the right time. No more forgetting what you learned last week.