Serialize and Deserialize Binary Tree
You are given a binary tree. Design an algorithm to serialize it to a string and deserialize that string back into the original tree. The serialized format is up to you, as long as you can perfectly reconstruct the tree from it.
This is LeetCode 297: Serialize and Deserialize Binary Tree, and it is one of those "hard" problems that feels way easier once you see the trick. The trick is null markers. If you record where the nulls are during serialization, you have enough information to rebuild the tree without any ambiguity.
Why this problem matters
Serializing a tree to a string and back is not just a LeetCode exercise. It comes up in real systems all the time. Databases store tree-shaped data as flat strings. Message queues pass tree structures between services. Config files encode nested hierarchies into text. The core question is always the same: how do you flatten a recursive structure into a linear format and reconstruct it perfectly?
The reason most people struggle with this problem is that they try to serialize the tree without null markers. If you just write down the node values in some traversal order, you lose the shape of the tree. The values [1, 2, 3] could represent dozens of different tree structures. But [1, 2, N, N, 3, N, N] (where N marks null children) describes exactly one tree. The null markers encode the structure.
The key insight
When you do a preorder traversal (root, left, right) and record null children explicitly, you capture two things at once:
- The values of every node
- The shape of the tree (where each subtree ends)
That is all you need. The null markers tell the deserializer "stop going deeper, this branch is done." Without them, you would not know where one subtree ends and the next begins.
Think of it like parentheses in math. The null markers act as closing brackets that tell you when to stop recursing and backtrack.
Approach 1: Preorder DFS
This is the cleanest approach and the one you should learn first. The idea is simple: traverse the tree in preorder (root, left, right), recording each value. When you hit a null child, record a special marker like "N". To deserialize, consume tokens one by one in the same preorder order, recursively building left and right subtrees.
Serialize
def serialize(root):
result = []
def dfs(node):
if node is None:
result.append("N")
return
result.append(str(node.val))
dfs(node.left)
dfs(node.right)
dfs(root)
return ",".join(result)
Walk the tree in preorder. At each node, append its value. At each null child, append "N". Join everything with commas. For the tree [1, 2, 3, null, null, 4, 5], this produces "1,2,N,N,3,4,N,N,5,N,N".
Deserialize
def deserialize(data):
tokens = iter(data.split(","))
def dfs():
val = next(tokens)
if val == "N":
return None
node = TreeNode(int(val))
node.left = dfs()
node.right = dfs()
return node
return dfs()
Split the string into tokens and consume them one at a time. If the token is "N", return None (base case). Otherwise, create a node with that value, then recursively build its left subtree, then its right subtree. The order of consumption matches the order of serialization, so everything lines up perfectly.
The deserialize function uses a Python iterator (iter()) so that every call to next(tokens) automatically advances to the next token. This is the pre-order state passing pattern: instead of passing an index around and returning the updated index, you use shared mutable state (the iterator) that all recursive calls read from. Each call to dfs() consumes exactly the tokens it needs, and the iterator keeps track of where we are.
Visual walkthrough: Deserialization
Let's trace how the deserializer rebuilds the tree from "1,2,N,N,3,4,N,N,5,N,N". Watch how the iterator advances through the tokens as recursive calls build the tree from top to bottom, left to right.
Step 1: Read '1'. Create root node.
Pop '1' from tokens. Create node(1). Now recurse left.
Step 2: Read '2'. Create left child of 1.
Pop '2'. Create node(2) as 1's left child. Recurse left from 2.
Step 3: Read 'N'. Left child of 2 is null.
Pop 'N'. Return None. Node 2 has no left child. Now recurse right from 2.
Step 4: Read 'N'. Right child of 2 is null. Node 2 complete.
Pop 'N'. Return None. Node 2 is a leaf (both children null). Back up to build 1's right subtree.
Step 5: Read '3'. Create right child of 1. Then read '4' as left child of 3.
Pop '3', then '4'. Node 4 is 3's left child. Its two 'N' children follow next.
Step 6: Two 'N's close node 4. Read '5' as right child of 3.
Node 4 done. Pop '5' as 3's right child. Two more 'N's will close it.
Step 7: Two final 'N's close node 5. Tree fully rebuilt!
All tokens consumed. The tree is identical to the original. Deserialization complete.
Notice the pattern. Each call to dfs() pops one token. If it is a value, we create a node and immediately recurse left (which pops more tokens), then recurse right (which pops more tokens). If it is "N", we just return None and the caller moves on to the right child. The iterator acts as shared state that all the recursive calls draw from.
Approach 2: BFS (level-order)
You can also serialize using BFS (breadth-first search), which produces a level-order representation. This is actually the format LeetCode uses to display trees in its examples.
Serialize with BFS
from collections import deque
def serialize(root):
if root is None:
return "N"
result = []
queue = deque([root])
while queue:
node = queue.popleft()
if node is None:
result.append("N")
else:
result.append(str(node.val))
queue.append(node.left)
queue.append(node.right)
return ",".join(result)
Process nodes level by level. For each node, record its value and enqueue its children (including nulls). Null nodes get recorded as "N" but do not enqueue their own children.
Deserialize with BFS
from collections import deque
def deserialize(data):
tokens = data.split(",")
if tokens[0] == "N":
return None
root = TreeNode(int(tokens[0]))
queue = deque([root])
i = 1
while queue:
node = queue.popleft()
if tokens[i] != "N":
node.left = TreeNode(int(tokens[i]))
queue.append(node.left)
i += 1
if tokens[i] != "N":
node.right = TreeNode(int(tokens[i]))
queue.append(node.right)
i += 1
return root
Create the root from the first token. Then process nodes from the queue: for each node, the next two tokens are its left and right children. If a token is not "N", create a child node and add it to the queue. The index i walks through the tokens in lockstep with the queue.
The BFS approach produces the same level-order format you see in LeetCode problem descriptions (like [1,2,3,null,null,4,5]). Some interviewers prefer this because it is a more "intuitive" representation. But the DFS approach is shorter, uses less code, and is easier to get right under pressure. Learn DFS first.
Which approach to use?
DFS (preorder) is better when:
- You want the shortest, most elegant code
- You are comfortable with recursion and iterators
- You want to impress an interviewer with a clean solution
BFS (level-order) is better when:
- The interviewer specifically asks for level-order format
- You find iterative code easier to debug
- You want to match LeetCode's own tree display format
For interviews, lead with DFS. It is fewer lines, harder to mess up, and shows you understand recursive tree traversal deeply.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| DFS preorder | O(n) | O(n) |
| BFS level-order | O(n) | O(n) |
Time is O(n) for both serialization and deserialization. Every node gets visited exactly once during serialization, and every token gets consumed exactly once during deserialization.
Space is O(n) for both. The serialized string itself is O(n) since we store one token per node plus one token per null child (and a binary tree with n nodes has n+1 null children, so the total is 2n+1 tokens). For DFS, the recursion stack uses O(h) space where h is the tree height, but the output string dominates. For BFS, the queue holds at most O(w) nodes where w is the maximum width.
Edge cases to watch for
- Empty tree (root is None): serialize as "N", deserialize by checking the first token. Both approaches handle this.
- Single node: serializes to "val,N,N". Two null markers for the absent children. Deserializes back to a single node.
- Skewed tree (linked list shape): DFS recursion stack goes O(n) deep. BFS queue stays small (O(1) per level). If stack depth is a concern, BFS is safer.
- Negative values: node values can be negative. Make sure your delimiter (comma) does not conflict with the minus sign. Since we split on commas, "-3" is a single token and
int("-3")works fine. - Large values: Python handles arbitrarily large integers, so no overflow issues.
The building blocks
This problem is built on two fundamental building blocks.
The first is the pre-order state passing pattern (pre-order-state-passing). During deserialization, you need to consume tokens in preorder while building the tree recursively. The natural approach is to use an iterator (or a global index) as shared mutable state that all recursive calls draw from. Each call pops the next token, decides whether it is a value or null, and either creates a node and recurses or returns None.
# Pre-order state passing skeleton:
def build():
val = next(tokens) # consume next token (shared state)
if val == SENTINEL:
return None
node = TreeNode(val)
node.left = build() # recurse left (consumes more tokens)
node.right = build() # recurse right (consumes more tokens)
return node
This pattern shows up whenever you need to reconstruct a tree from a flat sequence: Construct Binary Tree from Preorder and Inorder, Recover a Tree From Preorder Traversal, and similar problems.
The second is the recursive null base case (recursive-null-base). Every recursive tree function needs to handle the null case. In deserialization, the null base case is "if the token is the sentinel, return None." In serialization, it is "if the node is None, record the sentinel." This is the same base case pattern from Maximum Depth of Binary Tree and Validate BST, just applied to construction instead of inspection.
# Null base case in serialization:
def dfs(node):
if node is None: # base case
result.append("N")
return
result.append(str(node.val))
dfs(node.left)
dfs(node.right)
Once you internalize these two bricks, you can serialize and deserialize from scratch without memorizing the full solution. You know the traversal order (preorder), you know how to pass state through recursion (iterator), and you know the base case (null sentinel). The rest writes itself.
From understanding to recall
You just read through two complete approaches for the serialize deserialize binary tree problem and traced the deserialization step by step. The preorder DFS version is elegant. The BFS version is straightforward. Both make sense right now.
But here is the real test. Two weeks from now, when you sit down for an interview and the interviewer says "serialize and deserialize a binary tree," will you remember that null markers are the key? Will you remember to use an iterator for shared state instead of fumbling with index passing? Will you remember that the base case returns None when it sees "N"?
Understanding and recall are different skills. You understood the solution. Recall means you can produce it under pressure, from scratch, with no reference material.
Spaced repetition is how you bridge that gap. You practice typing the DFS serialize and deserialize functions from memory at increasing intervals. First after a day, then three days, then a week. Each time, you reconstruct the null sentinel, the preorder traversal, the iterator consumption. After a few reps, the tree to string conversion and the reverse are automatic.
The pre-order state passing building block and the recursive null base case together cover dozens of tree construction and traversal problems. Drilling them individually with spaced repetition is far more effective than grinding random LeetCode 297 variants and hoping the patterns stick.
Related posts
- Maximum Depth of Binary Tree - The simplest recursive tree problem, teaches the null base case pattern you use in serialization
- Validate Binary Search Tree - Another tree recursion problem that passes state down through recursive calls, similar to how deserialization passes the iterator