Skip to content
← All posts

Reorder Data in Log Files: Custom Sorting

5 min read
leetcodeproblemmediumarraysstringssorting

You are given an array of logs. Each log is a space-delimited string where the first token is an identifier. The rest of the string is either all lowercase letters (a "letter-log") or all digits (a "digit-log"). You need to reorder the logs so that all letter-logs come before any digit-log, letter-logs are sorted lexicographically by their content (with the identifier as a tiebreaker), and digit-logs remain in their original relative order.

This is LeetCode 937: Reorder Data in Log Files, and it is a perfect exercise in writing a custom sort comparator. The tricky part is not the algorithm itself but getting the sorting rules exactly right.

Input logsSorted resultdig1 8 1 5 1let1 art candig2 3 6let2 own kit diglet3 art zerolet1 art canlet3 art zerolet2 own kit digdig1 8 1 5 1dig2 3 6Letter-logDigit-log
Letter-logs are sorted and placed first. Digit-logs keep their original order and follow at the end.

Breaking down the rules

There are three sorting rules packed into one problem:

  1. Letter-logs come before digit-logs. This is a partition step.
  2. Letter-logs are sorted lexicographically by content. If two letter-logs have the same content, you break the tie using the identifier.
  3. Digit-logs stay in their original order. This means you cannot sort digit-logs at all. Their relative positions from the input must be preserved.

Rule 3 is the one that catches people. If you try to sort the entire array with a single comparator, you need to make sure digit-logs are treated as "equal" to each other so a stable sort preserves their order. Alternatively, you can just separate the two groups, sort only the letter-logs, and concatenate.

The partition-then-sort approach is cleaner than writing one giant comparator. Pull the digit-logs out, sort the letter-logs, and stitch the two lists back together. It makes the code easier to read and harder to get wrong.

The approach

The cleanest way to solve this:

  1. Iterate through the logs and classify each one as a letter-log or digit-log based on the first character after the identifier.
  2. Collect letter-logs and digit-logs into separate lists.
  3. Sort the letter-logs using a custom key: first by the content (everything after the identifier), then by the identifier itself.
  4. Return the sorted letter-logs followed by the digit-logs (in their original order).

Python's sort is stable, so if you use a tuple key of (content, identifier), logs with identical content will be ordered by identifier, which is exactly what the problem asks for.

The solution

def reorderLogFiles(logs):
    letter_logs = []
    digit_logs = []

    for log in logs:
        parts = log.split(" ", 1)
        if parts[1][0].isdigit():
            digit_logs.append(log)
        else:
            letter_logs.append(log)

    letter_logs.sort(key=lambda log: (log.split(" ", 1)[1], log.split(" ", 1)[0]))

    return letter_logs + digit_logs

Let's walk through how this works on the example input ["dig1 8 1 5 1", "let1 art can", "dig2 3 6", "let2 own kit dig", "let3 art zero"]:

Step 1: Classify each log

dig1 8 1 5 1digit-loglet1 art canletter-logdig2 3 6digit-loglet2 own kit digletter-loglet3 art zeroletter-log

Look at the first token after the identifier. If it starts with a digit, the log is a digit-log. Otherwise it is a letter-log.

Step 2: Partition into two groups

Letter-logslet1 art canlet2 own kit diglet3 art zeroDigit-logs (original order)dig1 8 1 5 1dig2 3 6

Separate letter-logs from digit-logs. Digit-logs stay in their original relative order.

Step 3: Sort letter-logs by content, then identifier

Before sorting:let1 art canlet2 own kit diglet3 art zeroAfter sorting:let1 art canlet3 art zerolet2 own kit dig

Sort by the content after the identifier. "art can" and "art zero" both start with "art", so compare the next word. Ties in content fall back to the identifier.

Step 4: Concatenate sorted letter-logs + digit-logs

let1 art can1let3 art zero2let2 own kit dig3dig1 8 1 5 14dig2 3 65sorted letter-logsdigit-logs (original order)

Append the digit-logs (in original order) after the sorted letter-logs. This is the final result.

Why this works

The key insight is that the problem is really two independent sub-problems glued together. Letter-logs need a lexicographic sort with a specific tiebreaker. Digit-logs need no sorting at all. By splitting the input into two groups first, you avoid the complexity of a single comparator that handles both cases.

Python's str.split(" ", 1) with a maxsplit of 1 is the perfect tool here. It splits only on the first space, giving you [identifier, content] in one call. The sort key (content, identifier) handles the primary and tiebreaker ordering in a single tuple comparison.

Because Python's built-in sort is stable, equal elements keep their original relative order. This matters for digit-logs if you were sorting the whole array, but since we separate them out entirely, stability is only relevant within the letter-log sort (and tuples already handle ties explicitly).

In an interview, mention that you are relying on stable sort behavior. It shows you understand the subtlety of rule 3. Even though the partition approach sidesteps the issue for digit-logs, interviewers appreciate the awareness.

Complexity analysis

ApproachTimeSpace
Partition + sort letter-logsO(n * m * log n)O(n * m)

Time: O(n * m * log n). You iterate through all n logs once to classify them. Then you sort the letter-logs. In the worst case all logs are letter-logs, so you sort n items. Each comparison involves comparing strings of length up to m (the maximum log length), giving O(m * log n) for the sort step. Combined: O(n * m * log n).

Space: O(n * m). You store the letter-logs and digit-logs in separate lists, which together hold all n logs. Each log can be up to length m.

Edge cases to watch for

  • All digit-logs: No letter-logs to sort. Return the original array unchanged.
  • All letter-logs: No digit-logs to append. Just sort and return.
  • Identical content in letter-logs: The tiebreaker is the identifier. Make sure your sort key includes the identifier as the second element of the tuple.
  • Single log: Return it as-is. Works naturally with the partition approach.
  • Identifier with digits: The identifier like "let1" can contain digits. Only the content portion (after the first space) determines whether it is a letter-log or digit-log.

The building blocks

This problem exercises two fundamental patterns:

Custom sort comparators. Many problems ask you to sort data with non-standard ordering rules. The key skill is translating English rules into a sort key or comparator function. Here, the rule "sort by content, then by identifier" maps directly to a tuple key (content, identifier).

Stable partitioning. Splitting an array into groups while preserving relative order within each group is a pattern that shows up often. You partition letter-logs from digit-logs, sort one group, and leave the other untouched. This is cleaner than trying to encode everything into one sort.

From understanding to recall

The logic here is simple enough that you might think you will remember it. But the details trip people up. Do you split on the first space or all spaces? Is the tiebreaker the identifier or the full log string? Do digit-logs go before or after letter-logs?

Spaced repetition locks in these details. You practice writing the split(" ", 1) call, the tuple sort key, and the partition logic from memory at increasing intervals. After a few reps, the solution flows out without hesitation.

Custom sorting is one of the most frequently tested patterns in interviews. Drilling it across several problems (this one, custom sort string, group anagrams) builds a reusable mental template you can adapt on the fly.

Related posts

  • Sort an Array - Core sorting fundamentals that underpin every custom-sort problem
  • Group Anagrams - Another problem where classifying strings into groups and sorting within groups is the key idea
  • Custom Sort String - A direct companion problem focused on sorting with a custom character ordering