Skip to content
← All posts

Snapshot Array: Versioned Data with Binary Search

7 min read
leetcodeproblemmediumarrayshash-mapbinary-search

You need to design a data structure called SnapshotArray that supports three operations. set(index, val) sets the element at a given index to a value. snap() takes a snapshot of the array and returns the snap_id (the number of times snap has been called minus one). get(index, snap_id) returns the value at the given index at the time the given snapshot was taken.

The naive approach of copying the entire array on every snap() call would be far too expensive. The challenge is to make snapshots cheap while still supporting fast lookups into past versions of the array.

This is LeetCode 1146: Snapshot Array.

SnapshotArray (length=3), snap_id up to 1arr[0](0, 5)(1, 12)arr[1](0, 9)arr[2](1, 7)
Each array index stores a list of (snap_id, value) pairs. Only changes are recorded, not every snapshot.

Why this problem matters

Versioned data structures show up everywhere in real systems: databases use multiversion concurrency control, version control systems track file states over time, and undo/redo functionality relies on snapshots. This problem distills the core idea into a compact design exercise. It also combines two patterns you will see repeatedly in interviews: storing only what changes (instead of copying everything) and using binary search to retrieve the right version efficiently.

Once you understand the approach here, you can apply the same "log changes, search on read" strategy to problems like Time Based Key-Value Store (LeetCode 981) and any problem where you need to look up historical state.

The key insight

Instead of copying the entire array on each snapshot, you store only the changes. Each index maintains its own history: a list of (snap_id, value) pairs, appended only when set is called. When snap() is called, you just increment a counter. No data is copied at all.

The trick is in get(index, snap_id). Since each index's history is sorted by snap_id (snap_ids only increase over time), you can binary search for the largest snap_id that does not exceed the requested one. This gives you the value that was "current" at that snapshot.

The algorithm:

  1. Initialize each index with an empty history list. Keep a snap_counter starting at 0.
  2. On set(index, val), append (snap_counter, val) to the history at that index. If the last entry already has the same snap_id, update it in place instead of appending a duplicate.
  3. On snap(), return the current snap_counter and increment it. Nothing else happens.
  4. On get(index, snap_id), binary search the history at that index for the largest snap_id that is snap_id or less. Return the associated value, or 0 if no entry exists.

The solution

import bisect

class SnapshotArray:
    def __init__(self, length: int):
        self.snaps = [[] for _ in range(length)]
        self.snap_id = 0

    def set(self, index: int, val: int) -> None:
        history = self.snaps[index]
        if history and history[-1][0] == self.snap_id:
            history[-1] = (self.snap_id, val)
        else:
            history.append((self.snap_id, val))

    def snap(self) -> int:
        result = self.snap_id
        self.snap_id += 1
        return result

    def get(self, index: int, snap_id: int) -> int:
        history = self.snaps[index]
        if not history:
            return 0
        i = bisect.bisect_right(history, (snap_id, float("inf"))) - 1
        if i < 0:
            return 0
        return history[i][1]

Let's walk through each piece.

__init__ creates a list of empty lists, one per index. Each inner list will hold (snap_id, value) tuples. The snap_id counter starts at 0.

set appends a (snap_id, val) pair to the history at the given index. There is one optimization: if the most recent entry already belongs to the current snap_id, we overwrite it instead of appending. This prevents the history from growing unnecessarily when set is called multiple times for the same index before the next snapshot.

snap returns the current snap_id and increments the counter. This is the entire cost of a snapshot. No arrays are copied, no data is duplicated. The snap_id acts as a logical timestamp that partitions the history entries.

get performs a binary search on the history at the given index. We use bisect_right with the tuple (snap_id, float("inf")) to find the insertion point just past any entry with that snap_id. Subtracting one gives us the index of the last entry whose snap_id does not exceed the target. If that index is negative, no entry existed before the requested snapshot, so we return 0.

The binary search in get is the same "find the largest value not exceeding a target" pattern you see in Time Based Key-Value Store. Tuples compare element by element in Python, so (snap_id, float("inf")) sorts after any (snap_id, v) for a real value v. This makes bisect_right land exactly where you need it.

Visual walkthrough

Let's trace through the LeetCode example step by step. Watch how each operation only modifies the history for the affected index, and how get uses binary search to find the right version.

Step 1: SnapshotArray(3). Initialize with length 3.

snap_counter = 0arr[0](empty)arr[1](empty)arr[2](empty)

All indices start with empty history lists. snap_counter starts at 0.

Step 2: set(0, 5). Record value 5 at index 0 under snap_id 0.

snap_counter = 0arr[0](0, 5)arr[1](empty)arr[2](empty)

Append (0, 5) to arr[0]'s history. Only the changed index is updated.

Step 3: snap(). Returns 0, then increments snap_counter to 1.

snap_counter = 1arr[0](0, 5)arr[1](empty)arr[2](empty)

snap() returns the current snap_counter (0) and bumps it to 1. No data is copied.

Step 4: set(0, 6). Record value 6 at index 0 under snap_id 1.

snap_counter = 1arr[0](0, 5)(1, 6)arr[1](empty)arr[2](empty)

Append (1, 6) to arr[0]'s history. Both versions are preserved.

Step 5: get(0, 0). Binary search arr[0] for the largest snap_id that is 0 or less.

snap_counter = 1arr[0](0, 5)(1, 6)arr[1](empty)arr[2](empty)

Binary search finds (0, 5). The snap_id 0 entry has value 5. Return 5.

Notice that snap() does zero work on the data itself. It just bumps a counter. All the real work happens lazily in get, where binary search finds the correct version in logarithmic time. This "write the change, search on read" pattern is the core idea behind the entire solution.

Complexity analysis

OperationTimeSpace
setO(1)O(1)
snapO(1)O(1)
getO(log S)O(1)
Overall space-O(Q)

set is O(1) because it either appends to a list or overwrites the last element. Both are constant-time operations.

snap is O(1) because it only increments a counter and returns the old value. This is the key advantage over the naive "copy everything" approach, which would be O(n) per snapshot.

get is O(log S) where S is the number of entries in the history for that particular index. In the worst case, if every snap had a set call for that index, S equals the total number of snapshots. Binary search over S entries costs O(log S).

Overall space is O(Q) where Q is the total number of set calls. Each set call adds at most one entry to one history list. We never store more entries than the number of set operations performed.

The building blocks

1. Lazy snapshotting with a logical timestamp

Instead of copying the array, you use a counter that increments on each snap(). Each set call tags its value with the current counter. This "logical timestamp" approach avoids O(n) copy costs and is the same strategy used in multiversion concurrency control in databases. You will see this pattern whenever a problem asks you to efficiently retrieve past states.

2. Binary search for bounded lookup

The get operation searches a sorted list for the largest entry whose key does not exceed a target. This is the "rightmost valid" variant of binary search. You encounter it in Time Based Key-Value Store, Koko Eating Bananas, and any problem where sorted data meets a "find the best candidate at or before X" query.

Edge cases

Before submitting, think through these scenarios:

  • get before any set: the history for that index is empty. Return 0 (the default value).
  • Multiple sets to the same index in one snap: only the last value matters. The in-place overwrite in set ensures the history does not grow with redundant entries for the same snap_id.
  • get with a snap_id of 0: binary search should still work correctly on a single-element list or return 0 if the index was never set before snap 0.
  • Large gap in snap_ids for an index: if index 5 was only set during snap 0 and snap 1000, get(5, 500) should return the value from snap 0. Binary search handles this naturally because it finds the largest snap_id not exceeding 500.
  • All sets to a single index: the history for one index grows to Q entries while all others remain empty. Space is proportional to actual writes, not to (length * snapshots).
  • snap called many times without any set: snap_counter increases but no new history entries are created. This is free.

From understanding to recall

You have seen how Snapshot Array works: tag each write with a logical timestamp, do nothing on snap, and binary search on read. The design is elegant and the code is short. But reproducing it under pressure requires getting the binary search details exactly right.

The tricky parts are subtle. Do you use bisect_left or bisect_right? How do you handle the tuple comparison to land on the correct entry? What do you return when the binary search index is negative? These are not conceptual misunderstandings. They are recall gaps, and they cost time in interviews.

Spaced repetition is how you close them. You practice writing the set/snap/get methods from memory at increasing intervals. After a few rounds, you see "versioned lookup" in a problem description and the entire structure flows out without hesitation. The binary search template, the lazy snapshot counter, the tuple trick with float("inf") become automatic.

Related posts

  • Time Based Key-Value Store - The same "store timestamped entries, binary search on read" pattern applied to a key-value store
  • Design HashMap - Building a hash map from scratch, the foundational data structure used for per-index storage
  • Binary Search - The core binary search template that powers the bounded lookup in get

CodeBricks breaks Snapshot Array into its lazy timestamp tagging and bounded binary search building blocks, then drills them independently with spaced repetition. You type each piece from scratch until the pattern is automatic. When a versioned data structure problem shows up in your interview, you do not think about it. You just write it.