Masking Personal Information: String Parsing

LeetCode 831, Masking Personal Information, gives you a string that is either an email address or a phone number. Your job is to mask it according to specific rules and return the result. The challenge is not any single rule; it is handling two completely different formats in one function and getting every detail right.

The problem

You are given a string s that is either a valid email address or a valid phone number. You need to return the masked version.

For an email like "LeetCode@LeetCode.com":

  • Convert the entire string to lowercase
  • Replace the name (everything before @) with the first character, five asterisks, and the last character
  • Keep the domain as is, apart from the lowercasing
  • Result: "l*****e@leetcode.com"

For a phone number like "+1(234)567-8910":

  • Extract only the digits
  • The last 10 digits are the local number; any remaining digits form the country code
  • The last 4 digits stay visible and every other digit is masked
  • Local format: "***-***-8910"
  • With country code: "+*-***-***-8910" (one * per country code digit)

The input is guaranteed to be a valid email or phone number. Per the problem constraints, an email has 8 <= s.length <= 40 and a phone number has 10 <= s.length <= 20.

[Diagram: email input "LeetCode@LeetCode.com" → output "l*****e@leetcode.com"; phone input "+1(234)567-8910" → output "+*-***-***-8910". Legend: kept, masked, domain / last 4.]
Email masking keeps the first and last character of the name, replaces the middle with five asterisks, and lowercases everything. Phone masking extracts digits, keeps the last four visible, and formats with dashes.

The diagram shows both transformations side by side. For the email, the name gets collapsed to first character, five stars, last character, and the domain is lowercased but otherwise untouched. For the phone, all formatting characters are stripped, leaving only digits, and then a standard masked format is applied.

The key insight

The first thing you do is figure out which type of input you have. If the string contains an @, it is an email. Otherwise, it is a phone number. Once you know the type, you apply a small set of rules specific to that type. There is no overlap between the two paths, so you can handle them as two independent branches.

For emails, the masking is purely about string slicing and concatenation after lowercasing. For phones, the masking is about digit extraction and formatted output. Both paths are short and clean when you separate them.

The solution

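def maskPII(s: str) -> str:
    if "@" in s:
        # Email: lowercase first so the name and domain are both normalized.
        s = s.lower()
        name, domain = s.split("@")
        # First character, exactly five asterisks, last character.
        return name[0] + "*****" + name[-1] + "@" + domain
    else:
        # Phone: keep only the digits, dropping "+", "(", ")", "-", and spaces.
        digits = "".join(ch for ch in s if ch.isdigit())
        local = "***-***-" + digits[-4:]
        if len(digits) == 10:
            return local
        # One asterisk per country code digit beyond the 10-digit local number.
        country_code = "*" * (len(digits) - 10)
        return "+" + country_code + "-" + local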

The function checks for @ to decide the branch. In the email branch, it lowercases, splits on @, and rebuilds the name with masking. In the phone branch, it strips non-digits, keeps the last 4 digits visible behind the fixed "***-***-" prefix, and prepends the country code mask if there are more than 10 digits.

Visual walkthrough

Here is a step-by-step trace showing how the algorithm processes an email and a phone number through each stage of the masking logic.

Step 1: Detect the type

s = "LeetCode@LeetCode.com" → contains "@" → email

Check if the input string contains an "@" character. If it does, treat it as an email. Otherwise, treat it as a phone number.

Step 2: Email — lowercase everything

"LeetCode@LeetCode.com" → "leetcode@leetcode.com"

Convert the entire string to lowercase. This satisfies the requirement that every letter in the output is lowercase, for both the name and the domain, in one step.

Step 3: Email — split on @

"leetcode@leetcode.com" → name="leetcode", domain="leetcode.com"

Split the string at the "@" character. The part before it is the name. The part after it is the domain, which stays unchanged.

Step 4: Email — mask the name

"leetcode" → "l" + "*****" + "e" → "l*****e"

Take the first character, add exactly five asterisks, then add the last character. The name length does not matter: the masked name is always 7 characters.
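A quick way to convince yourself of the fixed length is to run the masking step on names of different sizes. The helper name `mask_name` is made up for this sketch, and it assumes the name is non-empty and already lowercased.

```python
def mask_name(name: str) -> str:
    # Hypothetical helper: first character, five asterisks, last character.
    return name[0] + "*" * 5 + name[-1]

for name in ["ab", "leetcode", "averyveryverylongname"]:
    masked = mask_name(name)
    # The masked length is 1 + 5 + 1 = 7 no matter how long the name is.
    print(name, "->", masked, len(masked))
```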

Step 5: Phone — extract digits

"+1(234)567-8910" → digits = "12345678910" → 11 digits

Strip all non-digit characters (parentheses, dashes, spaces, plus sign). Count the digits. If there are 10, it is a local number. If there are more, the extra digits form the country code.
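The extraction step can be sketched in two equivalent ways for this kind of input: a generator with `str.isdigit`, as in the solution, or a regular expression. The function names here are illustrative only.

```python
import re

def extract_digits(s: str) -> str:
    # Keep only the characters that str.isdigit() accepts, preserving order.
    return "".join(ch for ch in s if ch.isdigit())

def extract_digits_re(s: str) -> str:
    # \D matches any non-digit character; replace each one with nothing.
    return re.sub(r"\D", "", s)

sample = "+1(234)567-8910"
digits = extract_digits(sample)
print(digits, len(digits))      # 12345678910 11

# Digits beyond the 10-digit local number belong to the country code.
country_len = len(digits) - 10
print(country_len)              # 1
```

Either version works for the inputs in this problem; the generator avoids an import, which is probably why it is the more common choice in short solutions.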

Step 6: Phone — format the result

"12345678910" → 11 digits → "+*-***-***-8910"

The last 4 digits stay visible. The local part becomes "***-***-XXXX". If there is a country code (11 digits means 1 extra), prepend "+*-". For 12 digits, prepend "+**-", and so on.
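To see how the prefix grows with the digit count, here is a small sketch of the formatting step on its own. It assumes the digits have already been extracted and that there are between 10 and 13 of them; `format_masked` is a name chosen for this example.

```python
def format_masked(digits: str) -> str:
    # The local part always has the same shape; only the last 4 digits vary.
    local = "***-***-" + digits[-4:]
    extra = len(digits) - 10  # number of country code digits
    if extra == 0:
        return local
    return "+" + "*" * extra + "-" + local

for digits in ["2345678910", "12345678910", "112345678910", "1112345678910"]:
    print(len(digits), format_masked(digits))
# 10 ***-***-8910
# 11 +*-***-***-8910
# 12 +**-***-***-8910
# 13 +***-***-***-8910
```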

Complexity analysis

| Aspect     | Value  | Why |
|------------|--------|-----|
| Time       | O(n)   | Single pass to check for @, one pass to lowercase or extract digits, one pass to build output |
| Space      | O(n)   | The lowercased string or digit string requires space proportional to the input length |
| Difficulty | Medium | Two distinct code paths, but each path is a short sequence of string operations |

Both branches scan the input a constant number of times. No nested loops, no data structures beyond a few strings. The space is dominated by the output string itself.

The building blocks

Type detection via character search

The entire solution pivots on a single check: does the string contain @? This pattern of detecting the input type by scanning for a sentinel character appears in many parsing problems. You identify a distinguishing feature, branch on it, and handle each case separately. This keeps the logic flat and avoids complex conditional chains.
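One way to make the "detect, then dispatch" shape explicit is to separate the classifier from the handlers. This is only a sketch of the pattern, not the form the solution takes: `detect_kind`, `mask_email`, `mask_phone`, `HANDLERS`, and `mask_pii` are all names invented for illustration.

```python
from typing import Callable, Dict

def detect_kind(s: str) -> str:
    # The presence of "@" is the only distinguishing feature we need.
    return "email" if "@" in s else "phone"

def mask_email(s: str) -> str:
    name, domain = s.lower().split("@")
    return f"{name[0]}*****{name[-1]}@{domain}"

def mask_phone(s: str) -> str:
    digits = "".join(ch for ch in s if ch.isdigit())
    extra = len(digits) - 10
    prefix = ("+" + "*" * extra + "-") if extra > 0 else ""
    return prefix + "***-***-" + digits[-4:]

# Each input kind maps to exactly one handler; the two paths never overlap.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "email": mask_email,
    "phone": mask_phone,
}

def mask_pii(s: str) -> str:
    return HANDLERS[detect_kind(s)](s)

print(mask_pii("LeetCode@LeetCode.com"))  # l*****e@leetcode.com
print(mask_pii("+1(234)567-8910"))        # +*-***-***-8910
```

For two cases a plain if/else is simpler, but the table form scales more gracefully if a parsing problem has three or more input kinds.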

Digit extraction and formatted output

The phone branch strips everything that is not a digit and then reassembles the result in a fixed format. This "extract then reformat" pattern shows up whenever you need to normalize messy input. You throw away the noise (parentheses, dashes, spaces), work with the clean data (just digits), and produce the output in a canonical format. The same approach works for problems involving phone numbers, credit cards, or any structured numeric string.
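As an illustration of the same pattern outside this problem, here is a sketch that masks a card number. The output format (groups of four separated by spaces, last four digits visible) is an assumption made up for this example and is not part of LeetCode 831; `mask_card` is a hypothetical helper that assumes the input contains at least 4 digits.

```python
def mask_card(raw: str) -> str:
    # Extract: throw away spaces, dashes, and anything else that is not a digit.
    digits = "".join(ch for ch in raw if ch.isdigit())
    # Mask: hide everything except the last 4 digits.
    masked = "*" * (len(digits) - 4) + digits[-4:]
    # Reformat: emit a canonical layout in groups of 4.
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))

print(mask_card("4111-1111 1111 1111"))  # **** **** **** 1111
```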

Edge cases

Minimum-length email name. If the name before @ is only one character, like "a@example.com", then name[0] and name[-1] are the same character. The result is "a*****a@example.com". The five asterisks always appear regardless of the original name length.

Exactly 10 digits (no country code). A phone number like "(234)567-8910" has exactly 10 digits. The result is just "***-***-8910" with no country code prefix. You skip the + prefix entirely.

Long country codes. A number like "+111(234)567-8910" has 13 digits. That means 3 country code digits, producing "+***-***-***-8910". The country code mask grows with however many extra digits exist beyond 10.

Mixed case in email. "AB@cd.EF" must become "a*****b@cd.ef". Lowercasing the entire string before splitting handles this in one step, so you never need to worry about case in either the name or the domain separately.
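The four cases above can be checked mechanically with a small table of inputs and expected outputs. The function body below is a compact restatement of the solution so that the block runs on its own; the `cases` table is just one possible way to organize the checks.

```python
def maskPII(s: str) -> str:
    # Compact restatement of the solution so this block is self-contained.
    if "@" in s:
        name, domain = s.lower().split("@")
        return name[0] + "*****" + name[-1] + "@" + domain
    digits = "".join(ch for ch in s if ch.isdigit())
    local = "***-***-" + digits[-4:]
    extra = len(digits) - 10
    return local if extra == 0 else "+" + "*" * extra + "-" + local

# Input -> expected output, one entry per edge case discussed above.
cases = {
    "a@example.com": "a*****a@example.com",
    "(234)567-8910": "***-***-8910",
    "+111(234)567-8910": "+***-***-***-8910",
    "AB@cd.EF": "a*****b@cd.ef",
}

for raw, expected in cases.items():
    assert maskPII(raw) == expected, (raw, maskPII(raw))
print("all edge cases pass")
```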

From understanding to recall

This problem tests careful implementation more than algorithmic cleverness. The logic is clean once you see it, but the details slip away: how many asterisks in the email name (five, always), the exact dash placement in the phone format, whether the country code prefix includes a trailing dash. These are the details that trip you up in an interview.

Spaced repetition helps you lock in the format rules. You practice writing the two branches from scratch, verify against the expected outputs, and revisit it a few days later. After a few reps, the "***-***-" pattern for the local number and the name[0] + "*****" + name[-1] pattern for the email are automatic. You stop second-guessing the number of asterisks.

Related posts

  • Validate IP Address - Another string parsing problem that branches on input type and applies format-specific validation rules
  • Compare Version Numbers - String splitting and structured comparison, a similar "parse then process" approach
  • String to Integer (atoi) - Character-by-character parsing with multiple edge cases in formatting

When you are ready to make these patterns stick, the best approach is deliberate repetition. Reading the solution once builds understanding. Practicing it from memory builds recall.