Unit Testing: Testing Individual Pieces in Isolation
A unit test verifies that a single function, method, or class works correctly in isolation. No database. No network. No filesystem. Just your code and a set of inputs with expected outputs. If the test fails, you know exactly which piece is broken.
Unit testing is the foundation of every serious testing strategy. It is the fastest, cheapest, and most precise way to catch bugs. If you only do one kind of testing (you should do more), unit testing is the one to start with.
What counts as a "unit"?
A unit is the smallest testable piece of your code. In most cases, that means a single function or method. Sometimes it means a class, if the class is small and focused.
def is_palindrome(s):
    cleaned = ''.join(c.lower() for c in s if c.isalnum())
    return cleaned == cleaned[::-1]

# Unit tests
assert is_palindrome("racecar") == True
assert is_palindrome("hello") == False
assert is_palindrome("A man a plan a canal Panama") == True
assert is_palindrome("") == True
Each test exercises one function with one input and checks one expected output. The function does not depend on anything external. That is the ideal unit test: fast, focused, and self-contained.
Why unit testing matters
Fast feedback
Unit tests run in milliseconds. You can execute thousands of them in seconds because they do not touch databases, APIs, or the filesystem. This speed means you can run them constantly, after every change, without interrupting your workflow.
Precise failure localization
When a unit test fails, you know exactly which function is broken. You do not have to dig through logs or trace through multiple layers of the application. The test name tells you what failed and the assertion tells you how.
Cheap bug detection
The cost of fixing a bug goes up dramatically the later you find it. A bug caught by a unit test during development costs minutes. The same bug found during integration testing costs hours. The same bug found in production costs days, along with user trust and potentially revenue. Unit tests are your cheapest line of defense.
How to write good unit tests
The Arrange-Act-Assert pattern
Every unit test follows three steps:
- Arrange the inputs and preconditions.
- Act by calling the function under test.
- Assert that the result matches what you expect.
def test_sorting_empty_list():
    # Arrange
    items = []
    # Act
    result = merge_sort(items)
    # Assert
    assert result == []

def test_sorting_already_sorted():
    # Arrange
    items = [1, 2, 3, 4, 5]
    # Act
    result = merge_sort(items)
    # Assert
    assert result == [1, 2, 3, 4, 5]

def test_sorting_reverse_order():
    # Arrange
    items = [5, 4, 3, 2, 1]
    # Act
    result = merge_sort(items)
    # Assert
    assert result == [1, 2, 3, 4, 5]

def test_sorting_with_duplicates():
    # Arrange
    items = [3, 1, 4, 1, 5, 9, 2, 6, 5]
    # Act
    result = merge_sort(items)
    # Assert
    assert result == [1, 1, 2, 3, 4, 5, 5, 6, 9]
This structure makes every test readable at a glance. You can immediately see what is being set up, what is being called, and what is expected.
One behavior per test
Each test should verify one thing. If a test checks three different behaviors and fails, you have to read the whole test to figure out which behavior is broken. With one behavior per test, the test name alone tells you what went wrong.
# Bad: testing multiple behaviors in one test
def test_stack():
    s = Stack()
    s.push(1)
    s.push(2)
    assert s.peek() == 2
    assert s.pop() == 2
    assert s.pop() == 1
    assert s.is_empty() == True

# Good: one behavior per test
def test_push_adds_to_top():
    s = Stack()
    s.push(1)
    s.push(2)
    assert s.peek() == 2

def test_pop_removes_from_top():
    s = Stack()
    s.push(1)
    s.push(2)
    assert s.pop() == 2

def test_pop_until_empty():
    s = Stack()
    s.push(1)
    s.pop()
    assert s.is_empty() == True
Descriptive names
Test names should describe the scenario and the expected result. When a test fails in a CI pipeline, the name is often the first thing you see. Make it count.
# Vague names
def test_1(): ...
def test_sort(): ...
def test_edge_case(): ...
# Descriptive names
def test_sort_returns_empty_list_for_empty_input(): ...
def test_sort_handles_single_element(): ...
def test_sort_preserves_order_of_equal_elements(): ...
A good test name reads like a sentence: "sort returns empty list for empty input." If that test fails, you know immediately what broke.
What to test
Good unit tests cover four categories of inputs.
Normal cases
The common, expected inputs that represent typical usage.
def test_fibonacci_normal_cases():
    assert fibonacci(1) == 1
    assert fibonacci(5) == 5
    assert fibonacci(10) == 55
Edge cases
Unusual but valid inputs at the extremes of what the function should handle.
def test_fibonacci_edge_cases():
    assert fibonacci(0) == 0
    assert fibonacci(1) == 1
    assert fibonacci(2) == 1
Error cases
Inputs that should cause the function to raise an exception or return an error value.
import pytest

def test_fibonacci_error_cases():
    with pytest.raises(ValueError):
        fibonacci(-1)
    with pytest.raises(TypeError):
        fibonacci("five")
Boundary values
Values right at the edge of where behavior changes. Bugs cluster at boundaries, so these tests catch off-by-one errors and similar issues.
def categorize_age(age):
    if age < 0 or age > 150:
        raise ValueError("Invalid age")
    if age < 13:
        return "child"
    if age < 18:
        return "teen"
    return "adult"

def test_age_boundaries():
    # Right at each boundary
    assert categorize_age(0) == "child"
    assert categorize_age(12) == "child"
    assert categorize_age(13) == "teen"
    assert categorize_age(17) == "teen"
    assert categorize_age(18) == "adult"
    assert categorize_age(150) == "adult"
    # Just outside valid range
    with pytest.raises(ValueError):
        categorize_age(-1)
    with pytest.raises(ValueError):
        categorize_age(151)
When deciding what to test, think about partitions: groups of inputs that should produce the same kind of result. Test at least one value from each partition, and always test the boundaries between partitions. This gives you strong coverage without writing hundreds of redundant tests.
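The partition idea can be sketched with a hypothetical shipping_cost function (the tiers and prices here are made up for illustration): each partition gets one representative test, and each boundary between partitions gets its own.

```python
def shipping_cost(weight_kg):
    """Hypothetical tiered pricing, purely for illustration."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        return 5
    if weight_kg <= 10:
        return 12
    return 30

# One value from inside each partition
def test_light_package_flat_rate():
    assert shipping_cost(0.5) == 5

def test_mid_tier_rate():
    assert shipping_cost(4) == 12

# The boundaries where behavior changes
def test_tier_boundaries():
    assert shipping_cost(1) == 5       # last value in tier one
    assert shipping_cost(1.01) == 12   # first value in tier two
    assert shipping_cost(10) == 12     # last value in tier two
    assert shipping_cost(10.01) == 30  # first value in tier three
```

Three partitions plus their boundaries are fully covered in a handful of assertions, with no redundant tests inside any partition.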
Testing a class: Stack example
Here is a complete example of unit testing a data structure. Each test is focused, descriptive, and tests one behavior.
import pytest

class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return len(self._items) == 0

    def size(self):
        return len(self._items)

def test_new_stack_is_empty():
    s = Stack()
    assert s.is_empty() == True
    assert s.size() == 0

def test_push_increases_size():
    s = Stack()
    s.push(10)
    assert s.size() == 1
    s.push(20)
    assert s.size() == 2

def test_peek_returns_top_without_removing():
    s = Stack()
    s.push(10)
    s.push(20)
    assert s.peek() == 20
    assert s.size() == 2

def test_pop_returns_and_removes_top():
    s = Stack()
    s.push(10)
    s.push(20)
    assert s.pop() == 20
    assert s.size() == 1

def test_pop_empty_stack_raises():
    s = Stack()
    with pytest.raises(IndexError):
        s.pop()

def test_peek_empty_stack_raises():
    s = Stack()
    with pytest.raises(IndexError):
        s.peek()

def test_push_pop_order_is_lifo():
    s = Stack()
    for i in range(5):
        s.push(i)
    results = [s.pop() for _ in range(5)]
    assert results == [4, 3, 2, 1, 0]
Notice the pattern: each test creates a fresh Stack, performs one action, and checks one result. The tests do not depend on each other. You can run them in any order.
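If the repeated `s = Stack()` setup ever grows more involved, a pytest fixture can supply the fresh object while keeping the tests independent. A sketch, with a trimmed-down Stack included so the snippet stands alone:

```python
import pytest

class Stack:
    # Minimal version of the Stack above, just enough to run
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def size(self):
        return len(self._items)

@pytest.fixture
def stack():
    # pytest builds a brand-new Stack for every test that asks for one
    return Stack()

def test_push_increases_size(stack):
    stack.push(10)
    assert stack.size() == 1

def test_pop_empty_stack_raises(stack):
    with pytest.raises(IndexError):
        stack.pop()
```

Because the fixture runs once per test, no state can leak between tests even though they share the setup code.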
Mocking and stubs
Sometimes a function depends on something external, like a database, an API, or a file. You do not want your unit test to call a real API: it would be slow and flaky, and it would test the API instead of your code. That is where mocking comes in.
A mock replaces a real dependency with a fake one that you control. You define what the fake returns, and your test verifies that your function handles the response correctly.
from unittest.mock import patch

def get_user_greeting(user_id):
    user = fetch_user_from_api(user_id)  # External dependency
    return f"Hello, {user['name']}!"

@patch("mymodule.fetch_user_from_api")
def test_greeting_with_valid_user(mock_fetch):
    # Arrange: mock returns a fake user
    mock_fetch.return_value = {"name": "Alice", "id": 1}
    # Act
    result = get_user_greeting(1)
    # Assert
    assert result == "Hello, Alice!"
    mock_fetch.assert_called_once_with(1)

@patch("mymodule.fetch_user_from_api")
def test_greeting_when_api_fails(mock_fetch):
    # Arrange: mock raises an exception
    mock_fetch.side_effect = ConnectionError("API down")
    # Act and Assert
    with pytest.raises(ConnectionError):
        get_user_greeting(1)
The mock lets you test your function's logic without depending on the real API. You can simulate success, failure, timeouts, and any other scenario. Your tests stay fast and reliable.
A stub is similar but simpler. A mock tracks how it was called (you can assert that it was called with specific arguments). A stub just returns a fixed value without tracking calls. In practice, people often use the word "mock" for both.
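The difference is easy to see in code: a hand-rolled stub just returns a canned value, while unittest.mock's Mock also records how it was called.

```python
from unittest.mock import Mock

# Stub: returns a fixed value, remembers nothing about the call
def stub_fetch_user(user_id):
    return {"name": "Alice", "id": 1}

# Mock: returns a fixed value AND tracks the call for later assertions
mock_fetch = Mock(return_value={"name": "Alice", "id": 1})

stub_fetch_user(1)
mock_fetch(1)

# Only the mock can answer "was I called, and with what arguments?"
mock_fetch.assert_called_once_with(1)
```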
Mock at the boundary, not inside. If your function calls a helper function that calls another helper function that calls the database, mock the database call, not the intermediate helpers. Mocking too deep ties your tests to implementation details and makes them brittle.
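A sketch of the principle, using a hypothetical OrderService where only the outermost database call is replaced:

```python
from unittest.mock import patch

class OrderService:
    # Hypothetical call chain: report -> summarize -> query_db
    def query_db(self, user_id):
        raise RuntimeError("real database is not available in unit tests")

    def summarize(self, user_id):
        return len(self.query_db(user_id))

    def report(self, user_id):
        return f"{self.summarize(user_id)} orders"

def test_report_counts_orders():
    svc = OrderService()
    # Mock only the boundary (the database call). summarize() and
    # report() still run for real, so their logic is actually tested.
    with patch.object(svc, "query_db",
                      return_value=[{"id": 1}, {"id": 2}]) as mock_db:
        assert svc.report(42) == "2 orders"
        mock_db.assert_called_once_with(42)
```

If the test had mocked summarize instead, a bug in summarize would slip through while the test kept passing.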
Common mistakes
Testing implementation details instead of behavior
# Bad: testing HOW the function works
def test_sort_uses_partition():
    with patch("mymodule.partition") as mock_partition:
        quick_sort([3, 1, 2])
        mock_partition.assert_called()

# Good: testing WHAT the function produces
def test_sort_returns_sorted_list():
    assert quick_sort([3, 1, 2]) == [1, 2, 3]
The first test breaks if you switch from quicksort to mergesort, even though the behavior (sorting) is the same. The second test passes regardless of the sorting algorithm. Test the output, not the internal mechanics.
Tests that depend on each other
# Bad: test_b depends on test_a running first
shared_list = []

def test_a_adds_item():
    shared_list.append(1)
    assert len(shared_list) == 1

def test_b_checks_item():
    assert shared_list[0] == 1  # Fails if test_a did not run first
Each test should set up its own state from scratch. Tests that share mutable state create hidden dependencies that cause mysterious failures when test order changes.
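One way to fix the example above: each test arranges its own list, so the tests pass in any order.

```python
# Good: no shared state; each test arranges what it needs
def test_append_adds_item():
    items = []
    items.append(1)
    assert len(items) == 1

def test_first_item_is_readable():
    items = [1]  # this test sets up its own precondition
    assert items[0] == 1
```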
Testing too many things in one test
When a test checks five different assertions across three different behaviors, a failure tells you almost nothing. You still have to read the test body and figure out which assertion failed and why. Keep it to one logical behavior per test.
Not testing error cases
If your function can fail, test that it fails correctly. Does it raise the right exception? Does it return a meaningful error message? Untested error paths are where production bugs hide.
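pytest.raises can check the error message as well as the exception type, via its match parameter (interpreted as a regular expression against the message). A sketch with a hypothetical withdraw function:

```python
import pytest

def withdraw(balance, amount):
    # Hypothetical example: fail loudly with a meaningful message
    if amount > balance:
        raise ValueError(f"insufficient funds: balance is {balance}")
    return balance - amount

def test_overdraw_raises_with_meaningful_message():
    # match= verifies the message, not just the exception type
    with pytest.raises(ValueError, match="insufficient funds"):
        withdraw(100, 200)
```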
The testing pyramid
The testing pyramid describes the ideal distribution of test types in your codebase.
At the base: many unit tests. They are fast, cheap, and precise. They form the bulk of your test suite and catch the majority of bugs during development.
In the middle: fewer integration tests. These verify that modules work together correctly. They are slower and less precise than unit tests, but they catch bugs that unit tests cannot, like interface mismatches and configuration errors.
At the top: few end-to-end (E2E) tests. These test the entire system from the user's perspective. They are the slowest and most expensive to maintain, but they catch bugs that only appear when the full system is running.
A common ratio is 70% unit tests, 20% integration tests, and 10% E2E tests. The exact numbers vary by project, but the shape stays the same: lots of fast, cheap tests at the bottom and a few slow, expensive tests at the top.
If your pyramid is inverted (lots of E2E tests and few unit tests), your test suite will be slow, flaky, and expensive to maintain. If a test fails, you will spend more time figuring out what broke than actually fixing it.
What unit tests do NOT catch
Unit tests verify that individual pieces work in isolation. They are blind to problems that only appear when pieces interact or when real-world conditions come into play.
Integration bugs. Function A returns a list. Function B expects a dictionary. Both pass their unit tests individually, but the system breaks when they are connected. Unit tests cannot catch interface mismatches.
UI and layout bugs. A button works correctly in your unit test (it calls the right handler), but it is hidden behind another element on the actual page. Unit tests do not render UI.
Real-world data issues. Your function handles all your test inputs perfectly, but production data contains Unicode characters, null bytes, or strings that are 10 million characters long. Unit tests only cover the cases you think of.
Performance problems. A function returns the correct result but takes 30 seconds on real data. Unit tests verify correctness, not performance (unless you specifically add timing assertions).
Configuration and environment issues. The code works on your machine but fails in production because an environment variable is missing or a file path is different.
This is exactly why unit tests are the foundation, not the entire building. You need integration tests, E2E tests, and other strategies to cover the gaps.
The takeaway
Unit testing is the most fundamental testing practice in software engineering. It gives you fast feedback, precise failure localization, and cheap bug detection. A well-written unit test suite lets you refactor with confidence, catch regressions instantly, and document how your code is supposed to behave.
Write tests that follow Arrange-Act-Assert. Test one behavior per test. Give tests descriptive names. Cover normal cases, edge cases, error cases, and boundary values. Use mocks to isolate your code from external dependencies. And avoid the common traps: do not test implementation details, do not let tests depend on each other, and do not try to verify everything in one test.
Unit tests are not enough on their own. They need integration tests and E2E tests to cover the gaps. But they are where every good testing strategy starts.
Related posts
- Software Testing Types covers the full landscape of testing strategies, from unit to integration to back-to-back testing.
- Verification vs. Validation explains the difference between "did we build it right" and "did we build the right thing."
- Cohesion in Software Design explains why well-structured, focused modules are much easier to unit test.