Skip to content
← All posts

Find the Index of the First Occurrence in a String

4 min read
leetcodeproblemeasystrings

LeetCode 28, Find the Index of the First Occurrence in a String, gives you two strings haystack and needle. Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Examples:

  • haystack = "sadbutsad", needle = "sad" returns 0
  • haystack = "leetcode", needle = "leeto" returns -1
012345678sadbutsadsadhaystackneedlematch at index 0
The needle "sad" aligns with the first three characters of "sadbutsad". The first occurrence starts at index 0.

The approach: sliding window comparison

Think of the needle as a fixed-size window that slides across the haystack one position at a time. At each position i, you extract the substring haystack[i:i+len(needle)] and compare it to needle. If they match, return i. If you reach the end without finding a match, return -1.

The window only needs to start at positions 0 through len(haystack) - len(needle). Any later starting position would not leave enough characters in the haystack to fit the entire needle.

This is a direct application of substring matching. You are not searching for a single character but for a contiguous sequence of characters. The sliding window ensures you check every valid alignment of needle against haystack.

The code

def str_str(haystack: str, needle: str) -> int:
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1

Visual walkthrough

Let's trace through both examples to see how the sliding window moves across the haystack.

Example 1: haystack = "sadbutsad", needle = "sad"

Step 1: Window at position 0

012345678sadbutsadsad

haystack[0:3] = "sad". Every character matches the needle. Return 0.

Example 2: haystack = "leetcode", needle = "leeto"

Step 1: Window at position 0

01234567leetcodeleeto

haystack[0:5] = "leetc". The first four characters match, but 'c' does not equal 'o'. Slide the window.

Step 2: Window at position 1

01234567leetcodeleeto

haystack[1] = 'e' does not match needle[0] = 'l'. Mismatch on the first character. Slide.

Step 3: Window at position 2

01234567leetcodeleeto

haystack[2] = 'e' does not match needle[0] = 'l'. Slide.

Step 4: Window at position 3

01234567leetcodeleeto

haystack[3] = 't' does not match needle[0] = 'l'. No more valid starting positions remain. Return -1.

In the first example, the very first window position is a match, so the algorithm returns immediately. In the second example, position 0 almost matches but fails at the fifth character. Positions 1 through 3 fail on the first character, and no valid positions remain. The function returns -1.

Complexity analysis

MetricValue
TimeO((n - m) * m), where n = len(haystack) and m = len(needle)
SpaceO(1), no extra data structures beyond the loop variable

At each of the n - m + 1 starting positions, the substring comparison can examine up to m characters. In the worst case (like haystack = "aaaaaab" and needle = "aab"), most comparisons go deep before failing.

In practice, mismatches tend to happen early, and many starting positions fail on the first character. The average case is often much better than the worst case.

For very large inputs, algorithms like KMP or Rabin-Karp can achieve O(n + m) time by avoiding redundant comparisons. For interview purposes and typical input sizes, the sliding window approach is clean, correct, and fast enough.

Building blocks

This problem is built from two fundamental techniques that appear across many string problems.

Substring matching

Extracting a substring of a specific length and comparing it against a target. This is the core operation here: haystack[i:i + m] == needle. The same idea shows up whenever you need to check if one string appears inside another, whether for pattern matching, searching, or validation.

Sliding comparison

Moving a fixed-size window across a sequence and evaluating a condition at each position. You have seen this pattern in problems like Longest Substring Without Repeating Characters and Minimum Window Substring, though those use variable-size windows. Here the window size is fixed at len(needle), which simplifies the logic to a single loop.

Edge cases

Needle is empty. The problem constraints guarantee needle has at least one character, but if you want to handle it defensively, an empty needle should return 0. The loop range(n - 0 + 1) would run from 0 to n, and an empty-string comparison always matches at index 0.

Needle is longer than haystack. If m is greater than n, the expression n - m + 1 is zero or negative. The range() call produces no iterations, so the function falls through to return -1. This is correct without any special handling.

Match at the end. If needle only appears at the very last valid position (like haystack = "mississippi", needle = "ippi"), the loop reaches i = n - m and finds the match there. The window covers the final m characters of the haystack.

No match at all. The loop exhausts every starting position without finding a match, and the function returns -1.

From understanding to recall

This problem feels simple when you read through it. Slide a window, compare substrings, return the index. But two weeks from now, the details that matter are the ones that slip: the loop bound of n - m + 1, the slice haystack[i:i + m], and returning -1 as the fallback. These are small but precise.

Spaced repetition locks these details in. You drill the sliding comparison pattern by typing it from scratch at increasing intervals. After a few reps, the loop structure and boundary logic become automatic. When you encounter a substring search problem in an interview, the code is already in your fingers.

Related posts

  • Longest Common Prefix - Another easy string problem that involves character-by-character comparison across strings
  • Valid Anagram - A string problem that uses frequency counting instead of positional comparison