Skip to content
← All posts

Unique Morse Code Words: Hash Set Deduplication

4 min read
leetcodeproblemeasystringshash-map

LeetCode 804. Unique Morse Code Words is a clean warm-up problem that combines two fundamental ideas: transforming strings through a mapping and deduplicating results with a hash set. You are given an array of words and a fixed Morse code alphabet. Your job is to convert each word into its Morse representation and count how many distinct transformations exist.

It is a small problem, but it exercises the exact same "transform then deduplicate" pattern that appears in harder problems like Group Anagrams.

The problem

You are given an array of lowercase English words. Each letter maps to a predefined Morse code string:

a -> ".-", b -> "-...", c -> "-.-.", and so on through z -> "--.."

To transform a word, you concatenate the Morse code of each letter. For example, "gin" becomes "--." + ".." + "-." = "--...-.". Different words can produce the same Morse string. Your task is to return the number of unique Morse representations across all words in the array.

wordsmorse codeunique setgin--...-.zen--...-.gig--...--.msg--...--.--...-.--...--.Result: 2 unique transformations
"gin" and "zen" produce the same Morse string. "gig" and "msg" also collide. Two unique codes remain.

The approach

The algorithm is two steps:

  1. For each word, build its Morse code string by looking up each character and concatenating the results.
  2. Add each Morse string to a set. Sets automatically discard duplicates.
  3. Return the size of the set.
def unique_morse_representations(words: list[str]) -> int:
    morse = [".-","-...","-.-.","-..",".","..-.",
             "--.","....","..",".---","-.-",".-..",
             "--","-.","---",".--.","--.-",".-.",
             "...","-","..-","...-",".--","-..-",
             "-.--","--.."]
    seen = set()
    for word in words:
        code = "".join(morse[ord(c) - ord('a')] for c in word)
        seen.add(code)
    return len(seen)

The morse list maps each letter to its code by index. For each word, you convert every character to its Morse equivalent, join them into one string, and drop it into a set. At the end, the set contains only unique transformations, so its length is your answer.

Step-by-step walkthrough

Let's trace through the example ["gin", "zen", "gig", "msg"] and watch how the set grows (or does not) with each word.

Step 1: Convert "gin" to Morse and add to the set

g--.i..n-.concat: --...-.set:--...-.

Each letter maps to its Morse code. Concatenate them: "--...-.". The set is empty, so this is a new entry.

Step 2: Convert "zen" to Morse. It matches "gin"

z--..e.n-.concat: --...-.set:--...-.

z="--..", e=".", n="-.". Concatenated: "--...-.". Already in the set, so nothing new is added.

Step 3: Convert "gig" to Morse and add to the set

g--.i..g--.concat: --...--.set:--...-.--...--.

g="--.", i="..", g="--." concatenates to "--...--.". This is new. The set now has 2 entries.

Step 4: Convert "msg" to Morse. It matches "gig"

m--s...g--.concat: --...--.set:--...-.--...--.

m="--", s="...", g="--." concatenates to "--...--.". Already in the set. Final answer: 2 unique transformations.

After processing all four words, the set holds exactly two entries. Two pairs of words happened to collide into the same Morse string, leaving us with two unique transformations.

Complexity analysis

MetricValue
TimeO(n * k)
SpaceO(n * k)

Here n is the number of words and k is the average word length. For each word, you iterate through its k characters to build the Morse string. That string gets stored in the set, which in the worst case (all unique) holds n entries, each of length proportional to k. The lookup table itself is a fixed-size array of 26 entries, so it contributes only O(1) space.

The building blocks

Hash set for deduplication

The core pattern here is the "seen set." You process a stream of values, and you only care about how many distinct values exist. A set gives you O(1) insertion and O(1) membership checking, making it the natural choice whenever uniqueness is the question. You will see this same pattern in Contains Duplicate, Happy Number, and dozens of other problems.

String concatenation and mapping

The other building block is the character-level transformation. You take each character of a string, map it to something else (in this case, a Morse code fragment), and concatenate the results into a new string. This "map and join" pattern is also the basis of problems like Group Anagrams (where the mapping is sorting) and Isomorphic Strings (where the mapping is a character-to-character bijection). Once you internalize the template of "iterate characters, transform each one, combine the results," you can adapt it to any transformation function.

Edge cases

  • All words produce the same Morse code. The set ends up with a single entry. The answer is 1.
  • All words are identical. Identical words produce identical Morse strings, so the set has one entry. This is a subset of the previous case.
  • Single-character words. Each word is one letter, so its Morse code is just that letter's code. The number of unique results equals the number of distinct letters in the input.

From understanding to recall

This problem is easy to understand on first read, but the building blocks inside it (set-based deduplication, character mapping, string concatenation) show up constantly in harder problems. The risk is not that you will forget how to solve Unique Morse Code Words specifically. The risk is that when you see a harder problem that requires the same "transform and deduplicate" pattern, you will not make the connection fast enough.

Spaced repetition fixes that. You drill the set deduplication template and the map-and-join template separately, at increasing intervals. After a few review cycles, the moment a problem says "how many unique X exist," the set pattern fires automatically. You spend your time figuring out the transformation function, not rediscovering the deduplication strategy.

Related posts