Masking Personal Information: String Parsing
LeetCode 831, Masking Personal Information, gives you a string that is either an email address or a phone number. Your job is to mask it according to specific rules and return the result. The challenge is not any single rule. It is handling two completely different formats in one function and getting every detail right.
The problem
You are given a string s that is either a valid email address or a valid phone number. You need to return the masked version.
For an email like "LeetCode@LeetCode.com":
- Convert the entire string to lowercase
- Replace the name (everything before @) with the first character, five asterisks, and the last character
- Keep the domain unchanged
- Result: "l*****e@leetcode.com"
For a phone number like "+1(234)567-8910":
- Extract only the digits
- The last 10 digits are the local number; any remaining digits form the country code
- The last 4 digits stay visible, everything else is masked
- Local format: "***-***-8910"
- With country code: "+*-***-***-8910" (one * per country code digit)
The input length is 1 <= s.length <= 100, and the input is guaranteed to be a valid email or phone number.
The diagram shows both transformations side by side. For the email, the name gets collapsed to first character, five stars, last character, and the domain is lowercased but otherwise untouched. For the phone, all formatting characters are stripped, leaving only digits, and then a standard masked format is applied.
The key insight
The first thing you do is figure out which type of input you have. If the string contains an @, it is an email. Otherwise, it is a phone number. Once you know the type, you apply a small set of rules specific to that type. There is no overlap between the two paths, so you can handle them as two independent branches.
For emails, the masking is purely about string slicing and concatenation after lowercasing. For phones, the masking is about digit extraction and formatted output. Both paths are short and clean when you separate them.
The solution
def maskPII(s: str) -> str:
    if "@" in s:
        # Email: lowercase first, then mask the name in front of "@".
        s = s.lower()
        name, domain = s.split("@")
        return name[0] + "*****" + name[-1] + "@" + domain
    else:
        # Phone: keep digits only; the last 10 are the local number.
        digits = "".join(ch for ch in s if ch.isdigit())
        local = "***-***-" + digits[-4:]
        if len(digits) == 10:
            return local
        # One "*" per country code digit.
        country_code = "*" * (len(digits) - 10)
        return "+" + country_code + "-" + local
The function checks for @ to decide the branch. In the email branch, it lowercases, splits on @, and rebuilds the name with masking. In the phone branch, it strips non-digits, keeps the last 4 digits visible, and prepends the country code mask if there are more than 10 digits.
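As a quick sanity check, the function can be run against the examples from the problem statement. The snippet below restates the solution (with the `else` flattened, which does not change behavior) so that it runs on its own; the third input is the same phone number with the country code dropped.

```python
def maskPII(s: str) -> str:
    if "@" in s:
        s = s.lower()
        name, domain = s.split("@")
        return name[0] + "*****" + name[-1] + "@" + domain
    digits = "".join(ch for ch in s if ch.isdigit())
    local = "***-***-" + digits[-4:]
    if len(digits) == 10:
        return local
    return "+" + "*" * (len(digits) - 10) + "-" + local


print(maskPII("LeetCode@LeetCode.com"))  # l*****e@leetcode.com
print(maskPII("+1(234)567-8910"))        # +*-***-***-8910
print(maskPII("(234)567-8910"))          # ***-***-8910
```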
Visual walkthrough
Here is a step-by-step trace showing how the algorithm processes an email and a phone number through each stage of the masking logic.
Step 1: Detect the type
Check if the input string contains an "@" character. If it does, treat it as an email. Otherwise, treat it as a phone number.
Step 2: Email — lowercase everything
Convert the entire string to lowercase. This satisfies the rule that every letter in the masked email is lowercase, for both the name and the domain, in one step.
Step 3: Email — split on @
Split the string at the "@" character. The part before it is the name. The part after it is the domain, which stays unchanged.
Step 4: Email — mask the name
Take the first character, add exactly five asterisks, then add the last character. The name length does not matter: the masked name is always 7 characters.
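Steps 2 through 4 can be sketched as a standalone helper. The name `mask_email` and the sample addresses other than the LeetCode one are made up for illustration; the helper assumes the input is a valid email containing exactly one "@".

```python
def mask_email(s: str) -> str:
    lowered = s.lower()                # Step 2: lowercase everything
    name, domain = lowered.split("@")  # Step 3: split on "@"
    # Step 4: first character, exactly five asterisks, last character
    return name[0] + "*" * 5 + name[-1] + "@" + domain


for raw in ["LeetCode@LeetCode.com", "AB@qq.com", "Alexandria@Example.ORG"]:
    masked = mask_email(raw)
    masked_name = masked.split("@")[0]
    print(raw, "->", masked, "| masked name length:", len(masked_name))
```

Whatever the original name length, the printed masked name length is 7.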
Step 5: Phone — extract digits
Strip all non-digit characters (parentheses, dashes, spaces, plus sign). Count the digits. If there are 10, it is a local number. If there are more, the extra digits form the country code.
Step 6: Phone — format the result
The last 4 digits stay visible. The local part becomes "***-***-XXXX". If there is a country code (11 digits means 1 extra), prepend "+*-". For 12 digits, prepend "+**-", and so on.
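Steps 5 and 6 can be sketched the same way. The helper name `mask_phone` and the sample numbers are illustrative assumptions; the helper expects a valid phone number with at least 10 digits.

```python
def mask_phone(s: str) -> str:
    # Step 5: drop every character that is not a digit
    digits = "".join(ch for ch in s if ch.isdigit())
    # Step 6: the last 4 digits stay visible in a fixed local format
    local = "***-***-" + digits[-4:]
    extra = len(digits) - 10  # number of country code digits
    if extra == 0:
        return local
    return "+" + "*" * extra + "-" + local


samples = [
    "1(234)567-890",      # 10 digits, no country code
    "+1 (234) 567-8910",  # 11 digits, 1 country code digit
    "86-(10)12345678",    # 12 digits, 2 country code digits
    "+111 234 567 8910",  # 13 digits, 3 country code digits
]
for raw in samples:
    print(raw, "->", mask_phone(raw))
```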
Complexity analysis
| Aspect | Value | Why |
|---|---|---|
| Time | O(n) | Single pass to check for @, one pass to lowercase or extract digits, one pass to build output |
| Space | O(n) | The lowercased string or digit string requires space proportional to the input length |
| Difficulty | Medium | Two distinct code paths, but each path is a short sequence of string operations |
Both branches scan the input a constant number of times. No nested loops, no data structures beyond a few strings. The space is dominated by the output string itself.
The building blocks
Type detection via character search
The entire solution pivots on a single check: does the string contain @? This pattern of detecting the input type by scanning for a sentinel character appears in many parsing problems. You identify a distinguishing feature, branch on it, and handle each case separately. This keeps the logic flat and avoids complex conditional chains.
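In isolation, the detection step is a one-line classifier. The function name `detect_kind` is hypothetical, and the check is only safe because the problem guarantees the input is a valid email or a valid phone number; on arbitrary input it would label anything without an "@" as a phone number.

```python
def detect_kind(s: str) -> str:
    """Classify the input by its sentinel character."""
    return "email" if "@" in s else "phone"


for s in ["LeetCode@LeetCode.com", "+1(234)567-8910"]:
    print(s, "->", detect_kind(s))
```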
Digit extraction and formatted output
The phone branch strips everything that is not a digit and then reassembles the result in a fixed format. This "extract then reformat" pattern shows up whenever you need to normalize messy input. You throw away the noise (parentheses, dashes, spaces), work with the clean data (just digits), and produce the output in a canonical format. The same approach works for problems involving phone numbers, credit cards, or any structured numeric string.
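The extraction half of this pattern can be written in more than one way. The sketch below shows the generator-based filter used in the solution next to a regular-expression alternative; both helper names are made up for this example, and for the ASCII input this problem provides they should produce the same string.

```python
import re


def digits_only(s: str) -> str:
    # Keep characters one by one if they are digits.
    return "".join(ch for ch in s if ch.isdigit())


def digits_only_re(s: str) -> str:
    # Delete every character outside the ASCII range 0-9.
    return re.sub(r"[^0-9]", "", s)


raw = "+1 (234) 567-8910"
print(digits_only(raw))
print(digits_only_re(raw))
```

Once the noise is gone, the reformatting step only has to reason about the digit count and the last four digits.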
Edge cases
Minimum-length email name. If the name before @ is only one character, like "a@example.com", then name[0] and name[-1] are the same character. The result is "a*****a@example.com". The five asterisks always appear regardless of the original name length.
Exactly 10 digits (no country code). A phone number like "(234)567-8910" has exactly 10 digits. The result is just "***-***-8910" with no country code prefix. You skip the + prefix entirely.
Long country codes. A number like "+111(234)567-8910" has 13 digits. That means 3 country code digits, producing "+***-***-***-8910". The country code mask grows with however many extra digits exist beyond 10.
Mixed case in email. "AB@cd.EF" must become "a*****b@cd.ef". Lowercasing the entire string before splitting handles this in one step, so you never need to worry about case in either the name or the domain separately.
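The four edge cases above can be checked with a small table of inputs and expected outputs. The solution is restated in compact form so the snippet runs on its own; the `cases` table is just one way to organize the checks.

```python
def maskPII(s: str) -> str:
    if "@" in s:
        name, domain = s.lower().split("@")
        return name[0] + "*****" + name[-1] + "@" + domain
    digits = "".join(ch for ch in s if ch.isdigit())
    local = "***-***-" + digits[-4:]
    extra = len(digits) - 10
    return local if extra == 0 else "+" + "*" * extra + "-" + local


cases = {
    "a@example.com": "a*****a@example.com",      # one-character name
    "(234)567-8910": "***-***-8910",             # exactly 10 digits
    "+111(234)567-8910": "+***-***-***-8910",    # 3 country code digits
    "AB@cd.EF": "a*****b@cd.ef",                 # mixed case
}
for raw, expected in cases.items():
    assert maskPII(raw) == expected, (raw, maskPII(raw))
print("all edge cases pass")
```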
From understanding to recall
This problem tests careful implementation more than algorithmic cleverness. The logic is clean once you see it, but the details slip away: how many asterisks in the email name (five, always), the exact dash placement in the phone format, whether the country code prefix includes a trailing dash. These are the details that trip you up in an interview.
Spaced repetition helps you lock in the format rules. You practice writing the two branches from scratch, verify against the expected outputs, and revisit it a few days later. After a few reps, the "***-***-" pattern for the local number and the name[0] + "*****" + name[-1] pattern for the email are automatic. You stop second-guessing the number of asterisks.
Related posts
- Validate IP Address - Another string parsing problem that branches on input type and applies format-specific validation rules
- Compare Version Numbers - String splitting and structured comparison, a similar "parse then process" approach
- String to Integer (atoi) - Character-by-character parsing with multiple edge cases in formatting
When you are ready to make these patterns stick, the best approach is deliberate repetition. Reading the solution once builds understanding. Practicing it from memory builds recall.