Skip to content
← All posts

Minimum Number of Vertices to Reach All Nodes: Find the Sources

5 min read
leetcodeproblemmediumgraph

Given a directed acyclic graph with n nodes labeled from 0 to n - 1 and a list of directed edges, find the smallest set of vertices from which all nodes in the graph are reachable. The answer is guaranteed to be unique.

This is LeetCode 1557: Minimum Number of Vertices to Reach All Nodes, and it teaches a fundamental graph concept: identifying source nodes by checking in-degrees.

0in:01in:12in:23in:04in:15in:1Source (in-degree 0)Reachable (in-degree > 0)
Nodes 0 and 3 have in-degree 0 (green). They cannot be reached from any other node, so they must be in the answer. Every other node (blue) is reachable from at least one source.

Why this problem matters

In any directed graph, some nodes act as "sources" that feed into the rest of the structure. Think of a water distribution network. The pumping stations (nodes with no incoming pipes) are the only places water can originate. If you want to supply water to every location, you need at least one pumping station for each disconnected source.

This same idea appears in dependency graphs, build systems, and task scheduling. Identifying which nodes have no prerequisites (in-degree 0) is the first step in topological sort and many other graph algorithms. This problem isolates that single concept and makes you practice it.

The key insight

A node with in-degree 0 has no incoming edges. That means no other node can reach it. You must include it in your starting set, because there is no alternative path to get there.

A node with in-degree greater than 0 has at least one incoming edge. Some other node can reach it. So you never need to include it directly in your starting set.

The answer is exactly the set of nodes with in-degree 0. Nothing more, nothing less.

The algorithm:

  1. Build a set of all destination nodes (nodes that appear as the target of at least one edge).
  2. Any node from 0 to n - 1 that is NOT in that set has in-degree 0.
  3. Collect those nodes. That is your answer.

The solution

def find_smallest_set_of_vertices(n: int, edges: list[list[int]]) -> list[int]:
    has_incoming = set()
    for _, to in edges:
        has_incoming.add(to)
    return [i for i in range(n) if i not in has_incoming]

Let's walk through each piece.

The has_incoming set tracks every node that is the target of at least one edge. We only care about the second element in each edge pair (to), because that tells us which node receives an incoming edge.

After scanning all edges, we iterate from 0 to n - 1. Any node not in has_incoming has zero incoming edges. It is a source node, and it must be in the answer.

We do not need to build an adjacency list. We do not need to run BFS or DFS. The problem reduces to a single scan of the edges followed by a single scan of the node range.

You do not need to count exact in-degrees. A boolean "has any incoming edge" is enough. Using a set instead of a counter keeps the logic simple.

Visual walkthrough

Let's trace the algorithm on a 6-node graph with edges [[0,1], [0,2], [1,2], [3,4], [4,5], [3,5]].

Step 1: Count in-degrees for every node.

012345
in-degrees: {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}

Walk through every edge and count how many edges point into each node. Nodes 0 and 3 receive zero incoming edges.

Step 2: Find all nodes with in-degree 0.

012345
in-degrees: {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}

Nodes 0 and 3 have in-degree 0 (green). No other node can reach them, so they must be starting points.

Step 3: Return the sources. That is the answer.

012345
in-degrees: {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}

Answer: [0, 3]. From these two nodes you can reach every other node in the graph by following directed edges.

In Step 1, we count incoming edges. Nodes 1, 2, 4, and 5 all have at least one incoming edge. Nodes 0 and 3 have none.

In Step 2, we identify nodes 0 and 3 as the sources. They are the only nodes that no edge points to.

In Step 3, we return [0, 3]. From node 0, you can reach nodes 1 and 2. From node 3, you can reach nodes 4 and 5. Every node is covered.

Complexity analysis

ApproachTimeSpace
In-degree checkO(n + e)O(n)

Time: We scan all e edges once to build the set, then scan all n nodes once to find those with in-degree 0. Total: O(n + e).

Space: The has_incoming set stores at most n node IDs. The output list stores the source nodes. Total: O(n).

The building blocks

1. In-degree counting

has_incoming = set()
for _, to in edges:
    has_incoming.add(to)

This pattern appears whenever you need to classify nodes by their incoming connections. In topological sort (like in Course Schedule), you count exact in-degrees and use them to determine processing order. Here, a simple set suffices because you only need to know whether the count is zero or not.

2. Filtering nodes by property

result = [i for i in range(n) if i not in has_incoming]

This is a general pattern: iterate over all candidates and filter by a condition. The set lookup makes each check O(1). You see this same structure in problems where you need to find nodes without a certain property, elements missing from a collection, or items that satisfy a boolean predicate.

Edge cases

  • No edges at all: every node has in-degree 0. The answer is all nodes [0, 1, ..., n-1].
  • Single node, no edges: n = 1, edges is empty. The answer is [0].
  • Fully connected chain: edges form a single chain 0 -> 1 -> 2 -> ... -> n-1. Only node 0 has in-degree 0.
  • Multiple disconnected components: each component has its own source node. The answer includes one source per component.
  • Node with many incoming edges: a node that receives edges from every other node still has in-degree greater than 0. It is never included in the answer.
  • Two nodes pointing to each other: not possible in a DAG, but if edges were [[0,1],[1,0]], both nodes would have in-degree 1 and neither would be a source. The problem guarantees a DAG, so this case does not arise.

From understanding to recall

The logic here is simple: find nodes with in-degree 0. But in an interview, simple problems can still trip you up if you overthink them. You might start building an adjacency list, running DFS, or trying to find reachability. None of that is necessary.

Spaced repetition locks in the direct approach. You practice recognizing that "minimum vertices to reach all nodes" maps to "in-degree 0 nodes." You practice writing the set-based solution from memory. After a few reps, you see the problem statement and immediately know the answer pattern. No wasted time exploring dead ends.

Related posts

  • Clone Graph - Another graph traversal problem that requires understanding adjacency representation
  • Course Schedule - Uses in-degree counting for topological sort, the same core pattern as this problem
  • Number of Islands - Graph traversal fundamentals with DFS on a grid

CodeBricks breaks Minimum Number of Vertices to Reach All Nodes into its in-degree-counting and node-filtering building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a graph source identification problem shows up in your interview, you do not hesitate. You just write it.