Skip to content
← All posts

Remove Sub-Folders from the Filesystem: Sort and Prefix Check

4 min read
leetcodeproblemmediumarraysstringstrie

LeetCode 1233, Remove Sub-Folders from the Filesystem, gives you a list of folder paths and asks you to remove every sub-folder. A sub-folder is any folder that lives inside another folder in the list. You return only the parent folders.

For example, given ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"], the answer is ["/a", "/c/d", "/c/f"]. The folder "/a/b" lives inside "/a", and "/c/d/e" lives inside "/c/d", so both get removed.

//a/c/a/b/c/d/c/f/c/d/ekeepremove (sub-folder)
Folder tree for ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]. Green nodes are kept. Red nodes are sub-folders that get removed.

Why this problem matters

Filtering nested paths is a real task that shows up in file system tools, build systems, and version control ignore rules. The underlying technique, sorting strings so that related items cluster together, appears across many problems involving hierarchical or prefix-based data.

The key insight

Sort the folders lexicographically. After sorting, any sub-folder will appear immediately after its parent folder. That means you only need to track the most recent parent you added to the result. For each new folder, check if it starts with that parent followed by "/". If it does, skip it. If it does not, it is a new independent folder, so add it to the result and update the parent.

The "/" suffix is critical. Without it, "/a" would incorrectly match "/ab" as a sub-folder, since "/ab".startswith("/a") is true. By checking "/ab".startswith("/a/"), you correctly identify that "/ab" is a sibling, not a child.

The solution

def remove_subfolders(folder):
    folder.sort()
    result = [folder[0]]
    for i in range(1, len(folder)):
        last = result[-1]
        if not folder[i].startswith(last + "/"):
            result.append(folder[i])
    return result

After sorting, the first folder is always a parent (nothing before it could contain it). Then you walk through the rest. For each folder, you grab the last folder you added to the result. If the current folder does not begin with last + "/", it is not a sub-folder, so you add it. Otherwise you skip it.

Appending "/" before the startswith check prevents false matches between siblings like "/a" and "/ab". The slash guarantees you are checking for an actual directory boundary, not just a shared prefix.

Visual walkthrough

Step 1: Sort the folders lexicographically

Sorted folders

/a/a/b/c/d/c/d/e/c/f

Result so far

[ ]

After sorting: ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]. Sub-folders now appear right after their parents.

Step 2: Process "/a"

Sorted folders

/a/a/b/c/d/c/d/e/c/f

Result so far

/a

No previous parent to compare against. Add "/a" to the result and set it as the current parent.

Step 3: Process "/a/b"

Sorted folders

/a/a/b/c/d/c/d/e/c/f

Result so far

/a

"/a/b" starts with "/a/", so it is a sub-folder of "/a". Skip it.

Step 4: Process "/c/d"

Sorted folders

/a/a/b/c/d/c/d/e/c/f

Result so far

/a/c/d

"/c/d" does not start with "/a/". It is a new parent. Add it to the result.

Step 5: Process "/c/d/e"

Sorted folders

/a/a/b/c/d/c/d/e/c/f

Result so far

/a/c/d

"/c/d/e" starts with "/c/d/", so it is a sub-folder. Skip it.

Step 6: Process "/c/f"

Sorted folders

/a/a/b/c/d/c/d/e/c/f

Result so far

/a/c/d/c/f

"/c/f" does not start with "/c/d/". New parent. Add it to the result.

After processing all five folders, the result contains ["/a", "/c/d", "/c/f"]. The two sub-folders were detected and skipped because they started with an existing parent path followed by "/".

Complexity analysis

ApproachTimeSpace
Sort + prefix checkO(n * m * log n)O(n)

Here n is the number of folders and m is the average length of each folder path. Sorting costs O(n * m * log n) because each string comparison during the sort takes up to O(m) time. The single pass through the sorted list costs O(n * m) for the prefix checks. Space is O(n) for storing the result (or O(n * m) if you count the characters).

The building blocks

1. Lexicographic sorting for prefix grouping

folder.sort()

Sorting strings lexicographically places all paths sharing a common prefix next to each other. This is the same trick used in problems like Longest Common Prefix, where sorting lets you compare just the first and last string. Here, it ensures that a parent always appears before any of its children.

2. Prefix checking with a boundary character

folder[i].startswith(last + "/")

Checking whether one string starts with another is a common pattern in path and URL manipulation. The key detail is appending the delimiter ("/") to avoid partial matches. This same idea applies any time you compare hierarchical identifiers, whether they are file paths, domain names, or package namespaces.

Edge cases

  • No sub-folders at all. If every folder is independent (like ["/a", "/b", "/c"]), sorting still works and the prefix check never triggers. You return all of them.
  • Deeply nested chains. Given ["/a", "/a/b", "/a/b/c", "/a/b/c/d"], only "/a" survives. Each subsequent folder starts with "/a/", so they all get skipped.
  • Similar prefixes that are not parents. ["/a", "/ab", "/a/b"] returns ["/a", "/ab"]. The check "/ab".startswith("/a/") is false, so "/ab" is correctly kept. Only "/a/b" is a true sub-folder.
  • Single folder. If the list contains just one folder, return it as-is. The loop never executes.
  • Root-level folders only. Paths like ["/x", "/y", "/z"] have no nesting. The algorithm handles this with zero skips.

From understanding to recall

The code for this problem is short, but two details are easy to forget: the sort step that makes the prefix check work, and the "/" suffix that prevents false matches. These are exactly the kind of small decisions that slip away after a few days.

Spaced repetition helps you lock in those details. Instead of re-reading the full solution, you drill the building blocks (sort for prefix grouping, boundary character in startswith) at increasing intervals. After a few reps, the entire approach becomes automatic. When you see a filtering or prefix problem in an interview, you will know the pattern without hesitation.

Related posts

  • Longest Common Prefix - Another problem where sorting strings enables efficient prefix comparison
  • Implement Trie - An alternative approach to this problem using a trie data structure
  • Simplify Path - Path manipulation using a stack, another common filesystem pattern