Skip to content
← All posts

Determine if Two Strings Are Close: Frequency Matching Pattern

5 min read
leetcodeproblemmediumstringshash-mapsorting

You are given two strings word1 and word2. Two strings are considered "close" if you can attain one from the other using these operations any number of times:

  • Operation 1: Swap any two existing characters. For example, abcde becomes aecdb.
  • Operation 2: Transform every occurrence of one existing character into another existing character, and do the same with the other character. For example, aacabb becomes bbcbaa (all a become b and all b become a).

Return true if word1 and word2 are close, and false otherwise.

This is LeetCode 1657: Determine if Two Strings Are Close, and it is a clean example of how thinking about what operations preserve leads you to an efficient solution without simulating the operations at all.

Two strings that are closeword1abc"abc"swap a,b then swap b,cword2bca"bca"Character frequencies (both strings)word11a1b1c=word21a1b1cSorted frequencies: [1, 1, 1] == [1, 1, 1] ✓
word1 = "abc" and word2 = "bca" are close. They share the same character set and the same frequency distribution.

Why this problem matters

String transformation problems show up frequently in interviews, and the naive approach of trying to simulate every possible sequence of operations is a dead end. This problem teaches you to think about invariants: properties that remain true no matter how many operations you apply. Once you identify the right invariants, the problem collapses into a simple comparison.

The pattern here, comparing frequency distributions rather than simulating transformations, appears in anagram problems, character rearrangement problems, and anywhere you need to determine if one structure can be reshuffled into another.

The key insight

Operation 1 (swapping) lets you rearrange characters in any order. That means the positions of characters do not matter, only the frequencies do. Operation 2 (transforming) lets you swap the labels on two characters, effectively exchanging their frequencies. But neither operation can create a character that does not already exist, and neither operation changes the multiset of frequency values.

This gives us two invariants:

  1. Both strings must contain the same set of characters. If word1 has an a but word2 does not, no sequence of operations can bridge that gap.
  2. When you collect the frequency of each character and sort those frequencies, both strings must produce the same sorted list. Operation 2 can shuffle which character gets which frequency, but it cannot change the collection of frequencies themselves.

If both conditions hold, the strings are close.

The solution

from collections import Counter

def close_strings(word1: str, word2: str) -> bool:
    if len(word1) != len(word2):
        return False

    freq1 = Counter(word1)
    freq2 = Counter(word2)

    if freq1.keys() != freq2.keys():
        return False

    return sorted(freq1.values()) == sorted(freq2.values())

The function first checks if the two strings have the same length. If they do not, they cannot possibly be close. Next, it builds a frequency counter for each string using Python's Counter. The keys() comparison checks that both strings contain exactly the same set of characters. Finally, it sorts the frequency values and compares them. If the sorted frequencies match, the strings are close.

This "sorted frequency signature" pattern is powerful beyond this single problem. Whenever you need to check if two collections have the same distribution regardless of labeling, sort the counts and compare. You will see this in Group Anagrams, task scheduling problems, and anywhere frequency equivalence matters.

Visual walkthrough

Let's trace through word1 = "aabbcc" and word2 = "ccbbaa" step by step. We check character sets, count frequencies, then sort and compare.

Step 1: Check if both strings have the same character set

word1 = "aabbcc"aabbccchars: {a, b, c}word2 = "ccbbaa"

word1 = "aabbcc", word2 = "ccbbaa". Both contain exactly {a, b, c}. If the character sets differ, return false immediately.

Step 1 (continued): Compare character sets

word1 chars:abcword2 chars:abcSame set \u2713

Set for word1: {a, b, c}. Set for word2: {a, b, c}. They match, so we continue.

Step 2: Count character frequencies for both strings

word1 frequencies2a2b2cword2 frequencies2a2b2c

Count how many times each character appears. word1: a=2, b=2, c=2. word2: a=2, b=2, c=2.

Step 3: Sort the frequency arrays and compare

word1 sorted freq:222word2 sorted freq:222Match \u2713 return true

word1 sorted frequencies: [2, 2, 2]. word2 sorted frequencies: [2, 2, 2]. They match, so the strings are close. Return true.

Counter-example: word1 = "aab", word2 = "abb"

"aab" vs "aac""aab" chars:ab"aac" chars:acSets differ \u2717 return falseOperation 2 (transform) cannot introduce new characters.

Same character set {a, b}, but word1 frequencies sorted = [1, 2] and word2 frequencies sorted = [1, 2]. These match, so they ARE close. Now consider word1 = "aab", word2 = "aac": character sets differ (b vs c), so return false.

Both conditions pass: the character sets are identical and the sorted frequency arrays match. The answer is true. For the counter-example "aab" vs "aac", the character sets differ (b is not in "aac"), so we return false immediately without even checking frequencies.

Complexity analysis

ApproachTimeSpace
Frequency sortingO(n + k log k)O(k)

Time is O(n + k log k), where n is the length of the strings and k is the size of the character alphabet. Building the frequency counters takes O(n). Sorting the frequency values takes O(k log k). Since k is at most 26 for lowercase English letters, the sorting step is effectively O(1), making the overall time O(n) in practice.

Space is O(k) for the two frequency counters. With a fixed alphabet of 26 lowercase letters, this is O(1) extra space.

The building blocks

1. Character frequency counting

The foundation of this problem is building a frequency map from a string:

from collections import Counter

freq = Counter("aabbcc")

This gives you {'a': 2, 'b': 2, 'c': 2}. Frequency counting is the backbone of anagram detection, sliding window problems, and any problem where character order does not matter but character counts do. You will use this building block in Valid Anagram, Group Anagrams, Find All Anagrams in a String, and Minimum Window Substring.

2. Sorted frequency signature comparison

Once you have frequency maps, extracting and sorting the values gives you a canonical signature:

sorted(Counter("aabbcc").values())
sorted(Counter("ccbbaa").values())

Both produce [2, 2, 2]. Sorting strips away the character labels and focuses purely on the distribution. This is the same idea behind grouping anagrams by sorted strings, but applied to frequency values instead of characters. If two frequency signatures match after sorting, the distributions are equivalent up to relabeling.

Edge cases

Before submitting, think through these scenarios:

  • Different lengths: "abc" vs "abcd". Different lengths means different total character counts. Return false immediately.
  • Different character sets: "aab" vs "aac". The character b appears in one but not the other. No operation can introduce a new character. Return false.
  • Same characters, different frequency distribution: "aabc" vs "abcc". Character sets are both {a, b, c}, but sorted frequencies are [1, 1, 2] vs [1, 1, 2]. These are actually close. Now consider "aabbc" vs "abbcc": sorted frequencies are [1, 2, 2] vs [1, 2, 2], also close.
  • Single character strings: "aaa" vs "aaa". Same character set, same frequencies. Return true.
  • Identical strings: Always close. Same characters, same frequencies by definition.
  • All unique characters: "abc" vs "bca". Sorted frequencies are [1, 1, 1] for both. Return true.

From understanding to recall

You now know that "close strings" boils down to two checks: same character set, same sorted frequency list. The logic is elegant and the code is short. But under interview pressure, you might second-guess yourself. Do you check character sets or frequency values first? Does the length check matter? Do you sort the keys or the values?

These are exactly the kind of details that slip away without practice. Spaced repetition locks them in. You write the frequency counting, the set comparison, and the sorted values check from memory at increasing intervals. After a few rounds, you do not hesitate. You see "can you transform one string into another" and the two invariants surface immediately.

Related posts

CodeBricks breaks Determine if Two Strings Are Close into its frequency counting and sorted signature building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a string transformation problem shows up in your interview, you do not think about it. You just write it.