Encode and Decode TinyURL: Hash Map Design with Random Keys
Encode and Decode TinyURL is LeetCode problem 535. You need to design a URL shortening service with two methods: encode takes a long URL and returns a short one, and decode takes the short URL and returns the original. The system needs to be consistent, so decoding an encoded URL always gives back exactly the original.
This is an open-ended design problem. There is no single correct answer, which is what makes it interesting. The interviewer wants to see you think through trade-offs: random keys vs. counters, collision handling, and how to efficiently map in both directions.
The problem
Design two functions:
encode(longUrl)- Converts a long URL into a shortened URL.decode(shortUrl)- Converts the shortened URL back into the original long URL.
The key constraint: calling decode(encode(url)) must return the original url. The two functions must be inverses of each other.
The key insight
You need a bidirectional mapping. When encoding, you generate a short key for the long URL and store the relationship. When decoding, you look up that key to retrieve the original. The simplest correct approach uses two hash maps: one mapping keys to URLs, and one mapping URLs to keys. The second map lets you check if a URL has already been encoded, so you return the same short URL every time instead of generating a new one.
The URL-to-key map is not strictly required for correctness, but it prevents generating duplicate short URLs for the same long URL. Without it, encoding the same URL twice would create two different short URLs, both pointing to the same destination. That wastes keys and could confuse users.
The solution
import random
import string
class Codec:
def __init__(self):
self.key_to_url = {}
self.url_to_key = {}
self.chars = string.ascii_letters + string.digits
self.base = "http://tinyurl.com/"
def _generate_key(self):
return ''.join(random.choices(self.chars, k=6))
def encode(self, longUrl):
if longUrl in self.url_to_key:
return self.base + self.url_to_key[longUrl]
key = self._generate_key()
while key in self.key_to_url:
key = self._generate_key()
self.key_to_url[key] = longUrl
self.url_to_key[longUrl] = key
return self.base + key
def decode(self, shortUrl):
key = shortUrl.replace(self.base, "")
return self.key_to_url[key]
Step-by-step walkthrough
encode("https://leetcode.com/problems/design-tinyurl")
Generate random key "abc123". Store mapping in both directions. Return short URL.
encode("https://example.com/my-page")
Generate a different random key "xyz789". Add to both maps. No collision with the first key.
decode("http://tinyurl.com/abc123")
Extract key "abc123" from the short URL. Look it up in key-to-url map. Return original URL.
decode("http://tinyurl.com/xyz789")
Same process. Extract key "xyz789", look it up, return the second original URL.
Complexity analysis
| Approach | Time | Space |
|---|---|---|
| Counter-based | O(1) encode/decode | O(n) for n URLs |
| Random key | O(1) amortized | O(n) for n URLs |
Time: Both encode and decode run in O(1). Encode does a hash map lookup and possibly generates a random key. Decode does a single hash map lookup. The random key generation is O(1) amortized because collisions are extremely rare with a 6-character alphanumeric key (62^6 = over 56 billion possible keys).
Space: O(n) where n is the number of unique URLs encoded. Each URL requires two hash map entries (one in each direction).
The building blocks
1. Bidirectional mapping
The core data structure is a pair of hash maps. One maps short keys to long URLs (for decoding), and the other maps long URLs to short keys (for deduplication during encoding). This two-map pattern appears whenever you need O(1) lookup in both directions. You trade extra memory for the ability to go from key to value and from value to key without scanning.
2. Random key generation with collision handling
The key generation strategy matters. A simple counter (0, 1, 2, ...) works but exposes how many URLs have been shortened and produces predictable keys. Random alphanumeric strings are harder to guess. The while loop that regenerates on collision is the safety net. With 62^6 possible keys, you will almost never hit it in practice, but the code must handle it to be correct.
Edge cases
- Same URL encoded twice. Without the URL-to-key map, you would generate a new key each time. The solution handles this by checking
url_to_keyfirst and returning the existing short URL. - Very long URLs. The hash map stores the full URL string as a value. Python handles arbitrarily long strings, so this is not a problem for correctness, but memory usage scales with URL length.
- Collision during key generation. The while loop retries until it finds an unused key. With a 6-character key space of 62 characters, this is astronomically unlikely until you approach billions of stored URLs.
- Empty or malformed URLs. The problem guarantees valid URL inputs, but in a production system you would validate the input before encoding.
- Decoding a key that was never encoded. The problem guarantees decode is only called on previously encoded URLs. In production, you would return an error or a 404.
From understanding to recall
The encode/decode TinyURL problem feels simple once you see the answer, but reproducing it from scratch reveals the decisions you need to make. Which key generation strategy? How do you prevent duplicates? Do you need one map or two? The specific choices are easy to forget. Practice building the full class a few times, and the structure - two maps, a key generator, a collision loop - will become automatic.
Related posts
- LRU Cache - Another design problem pairing hash maps with a second data structure for O(1) operations
- Design Add and Search Words - A trie-based design problem where the data structure must support flexible lookups
- Implement Trie - The foundation for string-based design problems using prefix trees