Reorder Data in Log Files: Custom Sorting
You are given an array of logs. Each log is a space-delimited string where the first token is an identifier. The rest of the string is either all lowercase letters (a "letter-log") or all digits (a "digit-log"). You need to reorder the logs so that all letter-logs come before any digit-log, letter-logs are sorted lexicographically by their content (with the identifier as a tiebreaker), and digit-logs remain in their original relative order.
This is LeetCode 937: Reorder Data in Log Files, and it is a perfect exercise in writing a custom sort comparator. The tricky part is not the algorithm itself but getting the sorting rules exactly right.
Breaking down the rules
There are three sorting rules packed into one problem:
- Letter-logs come before digit-logs. This is a partition step.
- Letter-logs are sorted lexicographically by content. If two letter-logs have the same content, you break the tie using the identifier.
- Digit-logs stay in their original order. This means you cannot sort digit-logs at all. Their relative positions from the input must be preserved.
Rule 3 is the one that catches people. If you try to sort the entire array with a single comparator, you need to make sure digit-logs are treated as "equal" to each other so a stable sort preserves their order. Alternatively, you can just separate the two groups, sort only the letter-logs, and concatenate.
The partition-then-sort approach is cleaner than writing one giant comparator. Pull the digit-logs out, sort the letter-logs, and stitch the two lists back together. It makes the code easier to read and harder to get wrong.
The approach
The cleanest way to solve this:
- Iterate through the logs and classify each one as a letter-log or digit-log based on the first character after the identifier.
- Collect letter-logs and digit-logs into separate lists.
- Sort the letter-logs using a custom key: first by the content (everything after the identifier), then by the identifier itself.
- Return the sorted letter-logs followed by the digit-logs (in their original order).
Python's sort is stable, so if you use a tuple key of (content, identifier), logs with identical content will be ordered by identifier, which is exactly what the problem asks for.
The solution
def reorderLogFiles(logs):
letter_logs = []
digit_logs = []
for log in logs:
parts = log.split(" ", 1)
if parts[1][0].isdigit():
digit_logs.append(log)
else:
letter_logs.append(log)
letter_logs.sort(key=lambda log: (log.split(" ", 1)[1], log.split(" ", 1)[0]))
return letter_logs + digit_logs
Let's walk through how this works on the example input ["dig1 8 1 5 1", "let1 art can", "dig2 3 6", "let2 own kit dig", "let3 art zero"]:
Step 1: Classify each log
Look at the first token after the identifier. If it starts with a digit, the log is a digit-log. Otherwise it is a letter-log.
Step 2: Partition into two groups
Separate letter-logs from digit-logs. Digit-logs stay in their original relative order.
Step 3: Sort letter-logs by content, then identifier
Sort by the content after the identifier. "art can" and "art zero" both start with "art", so compare the next word. Ties in content fall back to the identifier.
Step 4: Concatenate sorted letter-logs + digit-logs
Append the digit-logs (in original order) after the sorted letter-logs. This is the final result.
Why this works
The key insight is that the problem is really two independent sub-problems glued together. Letter-logs need a lexicographic sort with a specific tiebreaker. Digit-logs need no sorting at all. By splitting the input into two groups first, you avoid the complexity of a single comparator that handles both cases.
Python's str.split(" ", 1) with a maxsplit of 1 is the perfect tool here. It splits only on the first space, giving you [identifier, content] in one call. The sort key (content, identifier) handles the primary and tiebreaker ordering in a single tuple comparison.
Because Python's built-in sort is stable, equal elements keep their original relative order. This matters for digit-logs if you were sorting the whole array, but since we separate them out entirely, stability is only relevant within the letter-log sort (and tuples already handle ties explicitly).
In an interview, mention that you are relying on stable sort behavior. It shows you understand the subtlety of rule 3. Even though the partition approach sidesteps the issue for digit-logs, interviewers appreciate the awareness.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Partition + sort letter-logs | O(n * m * log n) | O(n * m) |
Time: O(n * m * log n). You iterate through all n logs once to classify them. Then you sort the letter-logs. In the worst case all logs are letter-logs, so you sort n items. Each comparison involves comparing strings of length up to m (the maximum log length), giving O(m * log n) for the sort step. Combined: O(n * m * log n).
Space: O(n * m). You store the letter-logs and digit-logs in separate lists, which together hold all n logs. Each log can be up to length m.
Edge cases to watch for
- All digit-logs: No letter-logs to sort. Return the original array unchanged.
- All letter-logs: No digit-logs to append. Just sort and return.
- Identical content in letter-logs: The tiebreaker is the identifier. Make sure your sort key includes the identifier as the second element of the tuple.
- Single log: Return it as-is. Works naturally with the partition approach.
- Identifier with digits: The identifier like
"let1"can contain digits. Only the content portion (after the first space) determines whether it is a letter-log or digit-log.
The building blocks
This problem exercises two fundamental patterns:
Custom sort comparators. Many problems ask you to sort data with non-standard ordering rules. The key skill is translating English rules into a sort key or comparator function. Here, the rule "sort by content, then by identifier" maps directly to a tuple key (content, identifier).
Stable partitioning. Splitting an array into groups while preserving relative order within each group is a pattern that shows up often. You partition letter-logs from digit-logs, sort one group, and leave the other untouched. This is cleaner than trying to encode everything into one sort.
From understanding to recall
The logic here is simple enough that you might think you will remember it. But the details trip people up. Do you split on the first space or all spaces? Is the tiebreaker the identifier or the full log string? Do digit-logs go before or after letter-logs?
Spaced repetition locks in these details. You practice writing the split(" ", 1) call, the tuple sort key, and the partition logic from memory at increasing intervals. After a few reps, the solution flows out without hesitation.
Custom sorting is one of the most frequently tested patterns in interviews. Drilling it across several problems (this one, custom sort string, group anagrams) builds a reusable mental template you can adapt on the fly.
Related posts
- Sort an Array - Core sorting fundamentals that underpin every custom-sort problem
- Group Anagrams - Another problem where classifying strings into groups and sorting within groups is the key idea
- Custom Sort String - A direct companion problem focused on sorting with a custom character ordering