Minimum Number of People to Teach: Greedy Language Selection
You are given n languages, m users (each knowing some subset of languages), and a list of friendship pairs. Two friends can communicate if they share at least one common language. Your task is to pick ONE language and teach it to as few users as possible so that every pair of friends can communicate.
This is LeetCode 1733: Minimum Number of People to Teach, a medium problem that blends graph thinking with greedy optimization. The trick is not to try every possible assignment of languages to users. Instead, you focus only on the friendships that are already broken and find the single language that fixes all of them with the fewest lessons.
Why this problem matters
At first glance, this looks like it might need some kind of network flow or complex optimization. You have users, languages, and friendships forming a web of constraints. But the problem has a key simplification: you are only allowed to teach one language. That constraint collapses the search space dramatically.
The real challenge is figuring out which users you even need to consider. If two friends already share a language, they are fine. You only care about the friend pairs where no common language exists. Once you isolate those "disconnected" pairs, you collect every user involved in at least one such pair. That is your candidate pool.
This pattern of narrowing the problem before optimizing shows up frequently in interview problems. You do not try to solve the general case. You identify exactly which data points matter and operate only on those.
The key insight
The algorithm has three phases. First, scan every friendship pair and check if the two users share at least one language. If they do, skip them. If they do not, both users go into a "needs help" set. Second, for each of the n languages, count how many users in the "needs help" set do not already know that language. Third, return the minimum count across all languages.
Why does this work? You can only teach one language. Every disconnected pair must end up sharing that language. So every user in a disconnected pair must either already know the chosen language or be taught it. The cost for a given language is the number of users in the pool who do not know it. You want to minimize that cost.
The greedy choice is simply: pick the language that the most people in the pool already know. That minimizes how many people you need to teach.
The solution
def minimum_teachings(n, languages, friendships):
lang_sets = [set()] + [set(langs) for langs in languages]
need_help = set()
for u, v in friendships:
if not lang_sets[u] & lang_sets[v]:
need_help.add(u)
need_help.add(v)
if not need_help:
return 0
min_teach = len(need_help)
for lang in range(1, n + 1):
teach_count = sum(1 for user in need_help if lang not in lang_sets[user])
min_teach = min(min_teach, teach_count)
return min_teach
Let's walk through what each piece does.
First, we convert each user's language list into a set for O(1) lookup. The lang_sets array is 1-indexed (with an empty set at index 0) to match the 1-indexed user IDs from the problem.
Next, we scan every friendship. For each pair (u, v), we check if their language sets have any overlap using set intersection (&). If the intersection is empty, these two cannot communicate, so both go into the need_help set. Using a set here automatically deduplicates users who appear in multiple disconnected pairs.
If need_help is empty, all friends can already talk. Return 0.
Otherwise, for each language from 1 to n, we count how many users in need_help do not know that language. That count is how many people we would need to teach. We track the minimum across all languages and return it.
The set intersection check lang_sets[u] & lang_sets[v] is the core operation. Converting lists to sets upfront makes this O(min(|A|, |B|)) per pair instead of O(|A| * |B|). For users who know many languages, this matters.
Visual walkthrough
Step 1: Find friend pairs that cannot communicate
For each friendship, check if the two users share at least one language. If they do not, this pair is 'disconnected' and at least one of them must learn a new language.
Disconnected users: U1 (knows L1), U2 (knows L2), U3 (knows L1,L2), U4 (knows L3)
3 out of 4 friend pairs cannot communicate. These are the pairs we need to fix.
Step 2: Collect all users involved in disconnected pairs
Gather every user that appears in at least one disconnected friendship. These are the only users we might need to teach. Users who can already talk to all their friends are irrelevant.
The pool of users we must consider: {U1, U2, U3, U4}. All four appear in at least one disconnected pair.
Step 3: Try each language, count how many users need to learn it
For each candidate language, count how many users in the pool do not already know it. The language with the smallest count is the best choice, because teaching it requires the fewest people.
| Language | Users who know it | Users who must learn it | Teach count |
|---|---|---|---|
| L1 | U1, U3 | U2, U4 | 2 |
| L2 | U2, U3 | U1, U4 | 2 |
| L3 | U4 | U1, U2, U3 | 3 |
Language 2 requires teaching only 2 users (U1 and U4). That is fewer than L1 (2 users) or L3 (3 users).
Step 4: Return the minimum teach count
The answer is the minimum across all languages. Here, both L1 and L2 tie at 2. We return 2, meaning we need to teach at most 2 people to ensure every friend pair can communicate.
Answer: 2. Teach L2 to U1 and U4 (or L1 to U2 and U4). Either way, all friend pairs now share a language.
The key observation in Step 3 is that we only count users within the "needs help" pool. Users who are not involved in any disconnected friendship are irrelevant. This keeps the counting efficient and focused.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Greedy + hash set | O(n * (f + m)) | O(m) |
Here n is the number of languages, f is the number of friendships, and m is the number of users.
Time: For each friendship, the set intersection check takes O(L) where L is the size of the smaller language set. In the worst case this is bounded by n, but typically much smaller. Building the need_help set is O(f * L). Then for each of the n languages, we scan the need_help set (at most m users), giving O(n * m). The total is O(f * L + n * m), which simplifies to O(n * (f + m)) in the worst case.
Space: We store language sets for all users (O(m * L) total), plus the need_help set (O(m)). The dominant term is O(m * L), but since the problem constrains total language knowledge, this is effectively O(m).
The building blocks
1. Set intersection for shared-element checking
The pattern of converting lists to sets and using intersection to detect overlap:
set_a = set(list_a)
set_b = set(list_b)
has_overlap = bool(set_a & set_b)
This appears whenever you need to check if two collections share any element. You see it in problems like Group Anagrams (checking character frequency overlap), word ladder (checking one-letter-apart words), and any problem involving shared membership between two groups.
2. Greedy minimization over candidates
The pattern of trying every candidate and picking the one with the lowest cost:
min_cost = float('inf')
for candidate in candidates:
cost = compute_cost(candidate, relevant_data)
min_cost = min(min_cost, cost)
return min_cost
This is the simplest form of greedy optimization. It works when the number of candidates is small enough to enumerate (here, n languages) and the cost function is cheap to evaluate. You see this same skeleton in problems like finding the best meeting point, choosing the optimal starting station, or selecting the minimum-cost operation.
Edge cases
- All friends already communicate: every pair shares at least one language. The
need_helpset is empty, so return 0. - One language, many users who do not know it: if there is only one language, every user in a disconnected pair who does not know it must learn it. The answer is the size of
need_helpminus those who already know language 1. - A single disconnected pair: only two users are in the pool. The answer is at most 2, or 1 if one of them already knows the chosen language.
From understanding to recall
You have seen how the algorithm narrows the problem to disconnected pairs, collects the relevant users, and tries each language. The logic is clean. But writing it under time pressure is a different challenge. Remembering to use set intersection instead of nested loops, building the 1-indexed language set array, and iterating over languages rather than users are all small details that matter.
Spaced repetition turns understanding into recall. You practice writing the set-intersection friendship check and the greedy language scan from scratch, at increasing intervals. After a few rounds, the pattern is automatic. You see "minimize teaching to fix all pairs" and the code flows out without hesitation.
Related posts
- Group Anagrams - Hash map grouping, similar set intersection thinking
- Isomorphic Strings - Mapping relationships between elements
- Word Pattern - Pattern matching with hash maps
CodeBricks breaks the Minimum Number of People to Teach problem into its set-intersection and greedy-minimization building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a greedy optimization question shows up in your interview, you do not think about it. You just write it.