Find the Index of the First Occurrence in a String
LeetCode 28, Find the Index of the First Occurrence in a String, gives you two strings haystack and needle. Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
Examples:
haystack = "sadbutsad",needle = "sad"returns0haystack = "leetcode",needle = "leeto"returns-1
The approach: sliding window comparison
Think of the needle as a fixed-size window that slides across the haystack one position at a time. At each position i, you extract the substring haystack[i:i+len(needle)] and compare it to needle. If they match, return i. If you reach the end without finding a match, return -1.
The window only needs to start at positions 0 through len(haystack) - len(needle). Any later starting position would not leave enough characters in the haystack to fit the entire needle.
This is a direct application of substring matching. You are not searching for a single character but for a contiguous sequence of characters. The sliding window ensures you check every valid alignment of needle against haystack.
The code
def str_str(haystack: str, needle: str) -> int:
n, m = len(haystack), len(needle)
for i in range(n - m + 1):
if haystack[i:i + m] == needle:
return i
return -1
Visual walkthrough
Let's trace through both examples to see how the sliding window moves across the haystack.
Example 1: haystack = "sadbutsad", needle = "sad"
Step 1: Window at position 0
haystack[0:3] = "sad". Every character matches the needle. Return 0.
Example 2: haystack = "leetcode", needle = "leeto"
Step 1: Window at position 0
haystack[0:5] = "leetc". The first four characters match, but 'c' does not equal 'o'. Slide the window.
Step 2: Window at position 1
haystack[1] = 'e' does not match needle[0] = 'l'. Mismatch on the first character. Slide.
Step 3: Window at position 2
haystack[2] = 'e' does not match needle[0] = 'l'. Slide.
Step 4: Window at position 3
haystack[3] = 't' does not match needle[0] = 'l'. No more valid starting positions remain. Return -1.
In the first example, the very first window position is a match, so the algorithm returns immediately. In the second example, position 0 almost matches but fails at the fifth character. Positions 1 through 3 fail on the first character, and no valid positions remain. The function returns -1.
Complexity analysis
| Metric | Value |
|---|---|
| Time | O((n - m) * m), where n = len(haystack) and m = len(needle) |
| Space | O(1), no extra data structures beyond the loop variable |
At each of the n - m + 1 starting positions, the substring comparison can examine up to m characters. In the worst case (like haystack = "aaaaaab" and needle = "aab"), most comparisons go deep before failing.
In practice, mismatches tend to happen early, and many starting positions fail on the first character. The average case is often much better than the worst case.
For very large inputs, algorithms like KMP or Rabin-Karp can achieve O(n + m) time by avoiding redundant comparisons. For interview purposes and typical input sizes, the sliding window approach is clean, correct, and fast enough.
Building blocks
This problem is built from two fundamental techniques that appear across many string problems.
Substring matching
Extracting a substring of a specific length and comparing it against a target. This is the core operation here: haystack[i:i + m] == needle. The same idea shows up whenever you need to check if one string appears inside another, whether for pattern matching, searching, or validation.
Sliding comparison
Moving a fixed-size window across a sequence and evaluating a condition at each position. You have seen this pattern in problems like Longest Substring Without Repeating Characters and Minimum Window Substring, though those use variable-size windows. Here the window size is fixed at len(needle), which simplifies the logic to a single loop.
Edge cases
Needle is empty. The problem constraints guarantee needle has at least one character, but if you want to handle it defensively, an empty needle should return 0. The loop range(n - 0 + 1) would run from 0 to n, and an empty-string comparison always matches at index 0.
Needle is longer than haystack. If m is greater than n, the expression n - m + 1 is zero or negative. The range() call produces no iterations, so the function falls through to return -1. This is correct without any special handling.
Match at the end. If needle only appears at the very last valid position (like haystack = "mississippi", needle = "ippi"), the loop reaches i = n - m and finds the match there. The window covers the final m characters of the haystack.
No match at all. The loop exhausts every starting position without finding a match, and the function returns -1.
From understanding to recall
This problem feels simple when you read through it. Slide a window, compare substrings, return the index. But two weeks from now, the details that matter are the ones that slip: the loop bound of n - m + 1, the slice haystack[i:i + m], and returning -1 as the fallback. These are small but precise.
Spaced repetition locks these details in. You drill the sliding comparison pattern by typing it from scratch at increasing intervals. After a few reps, the loop structure and boundary logic become automatic. When you encounter a substring search problem in an interview, the code is already in your fingers.
Related posts
- Longest Common Prefix - Another easy string problem that involves character-by-character comparison across strings
- Valid Anagram - A string problem that uses frequency counting instead of positional comparison