Integration Testing: Verifying That Modules Work Together

9 min read
software-engineering · patterns

Your unit tests all pass. Every function works perfectly in isolation. You deploy with confidence, and the system immediately breaks. The API sends data that the service layer cannot parse. The service layer writes to the database in a format the query layer does not expect. Every individual piece is correct, but the whole system is wrong.

This is exactly what integration testing is designed to catch. Integration tests verify that two or more modules work correctly when connected. They test the seams, the places where one module hands off data to another. Those seams are where the most dangerous bugs hide.

Why unit tests are not enough

Unit tests answer one question: does this function produce the correct output for a given input? That is valuable, but it says nothing about what happens when that function's output becomes another function's input.

Here are the kinds of bugs that unit tests will never catch:

Interface mismatches. Module A returns a list of tuples. Module B expects a list of dictionaries. Both modules are individually correct. The system breaks the moment they connect.

Configuration errors. The database connection string points to the wrong host. The API route is registered at /api/orders but the frontend calls /api/order. No amount of unit testing will surface a misconfigured environment variable.

Data format issues. Module A stores timestamps as Unix integers. Module B expects ISO 8601 strings. Both modules handle their respective formats perfectly. The data is mangled in transit.

Authentication and authorization flows. The login function returns the right token. The protected endpoint validates tokens correctly. But the token format changed between versions, and the two modules no longer agree on what a valid token looks like.

These bugs live in the connections, not in the components. You need a different kind of test to find them.
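
To see one of these in miniature, take the data format case. Here is a minimal sketch, with invented module and field names: each function is individually correct for its own format, and the pair still breaks the moment they connect. The failing call is commented out so the snippet imports cleanly.

from datetime import datetime

def store_event():
    # Module A writes a Unix timestamp (integer seconds)
    return {"created_at": 1700000000}

def event_year(event):
    # Module B assumes an ISO 8601 string like "2023-11-14T22:13:20"
    return datetime.fromisoformat(event["created_at"]).year

# event_year(store_event())  # TypeError: fromisoformat: argument must be str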

A unit test that passes while the system fails

Here is a concrete example. You have two modules: a UserService that fetches user data and a NotificationService that sends welcome emails.

# user_service.py
class UserService:
    def get_user(self, user_id):
        # Returns user data from the database
        return {"user_id": user_id, "name": "Alice", "email": "alice@example.com"}

# notification_service.py
class NotificationService:
    def send_welcome_email(self, user):
        # Expects user to have an "email_address" field
        recipient = user["email_address"]
        return f"Welcome email sent to {recipient}"

Now look at the unit tests. Both pass.

# test_user_service.py
def test_get_user():
    service = UserService()
    user = service.get_user(1)
    assert user["user_id"] == 1
    assert user["name"] == "Alice"
    assert user["email"] == "alice@example.com"
    # PASSES

# test_notification_service.py
def test_send_welcome_email():
    mock_user = {"email_address": "alice@example.com", "name": "Alice"}
    service = NotificationService()
    result = service.send_welcome_email(mock_user)
    assert "alice@example.com" in result
    # PASSES

Both tests pass. But the system is broken. UserService returns a dictionary with "email" as the key. NotificationService expects "email_address". When you wire these two modules together, you get a KeyError at runtime.

The unit test for NotificationService used a mock with "email_address", which matched the function's expectation. But the real data source uses "email". The mock hid the mismatch.

An integration test catches this immediately:

def test_welcome_email_flow():
    user_service = UserService()
    notification_service = NotificationService()

    user = user_service.get_user(1)
    result = notification_service.send_welcome_email(user)  # KeyError: 'email_address'

    assert "alice@example.com" in result

This test fails, which is exactly what you want. It tells you the two modules do not agree on the data contract before the bug ever reaches production.

Testing an API to service to database flow

Real integration tests often span three or more layers. Here is an example that tests a complete order creation flow: API endpoint, service layer, and database.

import pytest
from app import create_app
from app.database import get_db, reset_db

@pytest.fixture
def client():
    app = create_app(testing=True)
    with app.test_client() as client:
        reset_db()
        yield client

def test_create_order_end_to_end(client):
    # Step 1: Create a user via the API
    response = client.post("/api/users", json={
        "name": "Alice",
        "email": "alice@example.com"
    })
    assert response.status_code == 201
    user_id = response.get_json()["id"]

    # Step 2: Create an order for that user
    response = client.post("/api/orders", json={
        "user_id": user_id,
        "items": [
            {"product_id": "WIDGET-1", "quantity": 3, "price": 9.99},
            {"product_id": "GADGET-2", "quantity": 1, "price": 24.99}
        ]
    })
    assert response.status_code == 201
    order = response.get_json()

    # Step 3: Verify the API response has the correct total
    # pytest.approx avoids brittle exact equality on computed floats
    assert order["total"] == pytest.approx(54.96)  # (3 * 9.99) + (1 * 24.99)
    assert order["status"] == "pending"

    # Step 4: Verify the database actually stored the order
    db = get_db()
    stored_order = db.query_order(order["id"])
    assert stored_order is not None
    assert stored_order.total == pytest.approx(54.96)
    assert stored_order.user_id == user_id
    assert len(stored_order.items) == 2

This test exercises the full chain. The API receives the request and passes it to the service layer. The service layer calculates the total and passes it to the database. The database stores the order. The test verifies every handoff: that the API correctly forwards data, that the service correctly computes the total, and that the database correctly persists the result.

If any of those connections is broken, this test fails. A wrong column name in the SQL query, a mismatched JSON field, a calculation error in the service layer, a missing foreign key constraint: all of these surface here.

Big bang vs. incremental integration

There are two fundamental strategies for integration testing.

Big bang integration combines all modules at once and tests the entire system. This is simple to set up but painful to debug. When something fails, the bug could be in any module or any connection. With 10 modules, you might have dozens of interactions to investigate.

Incremental integration adds and tests one module at a time. You start with module A, verify it works, then add module B and test A + B together, then add module C and test A + B + C. At each step, if something breaks, you know the newest module or its connections caused the problem.

Incremental integration is almost always the better choice. It isolates failures and makes debugging faster. For a deeper look at top-down, bottom-up, and sandwich strategies, see Incremental Testing.
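
Here is a minimal sketch of the incremental approach, using three tiny invented modules in place of a real system. Each test adds one module to the chain, so a failure points at the newest seam.

def parse(raw):
    # Module A: split comma-separated text into rows
    return [line.split(",") for line in raw.splitlines()]

def enrich(rows):
    # Module B: turn raw rows into typed records
    return [{"name": name, "qty": int(qty)} for name, qty in rows]

def summarize(records):
    # Module C: aggregate the records
    return sum(r["qty"] for r in records)

def test_parse_alone():
    assert parse("widget,3") == [["widget", "3"]]

def test_parse_plus_enrich():
    # A + B: feed parse's real output into enrich
    assert enrich(parse("widget,3")) == [{"name": "widget", "qty": 3}]

def test_full_chain():
    # A + B + C: if this fails while the two tests above pass,
    # suspect summarize or its connection to enrich
    assert summarize(enrich(parse("widget,3\ngadget,1"))) == 4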

The testing pyramid

Integration tests sit in the middle layer of the testing pyramid.

At the base, you have unit tests. They are fast, cheap, and numerous. You should have hundreds or thousands of them. They catch bugs in individual functions and give you precise failure messages.

In the middle, you have integration tests. They are slower and more expensive than unit tests because they involve multiple components. You have fewer of them, focused on the most important connections in your system.

At the top, you have end-to-end tests. They test the entire system from the user's perspective, often through a browser or a full API client. They are the slowest and most expensive, so you have the fewest of them.

A common ratio is 70% unit tests, 20% integration tests, and 10% end-to-end tests. The exact numbers depend on your system, but the shape is always a pyramid: many fast tests at the base, fewer slow tests at the top.

Integration tests are the middle layer because they provide the best balance of realism and speed. They are realistic enough to catch connection bugs that unit tests miss, but fast enough that you can run them on every commit.

Best practices

Use a real database, not mocks

The whole point of an integration test is to verify that modules actually work together. If you mock the database, you are not testing the integration. You are testing your mock.

Use an in-memory database like SQLite for fast tests, or spin up a real PostgreSQL instance in a Docker container. The test should exercise the actual SQL queries, the actual schema, and the actual connection logic. That is where the bugs are.

# Bad: mocking the database defeats the purpose
from unittest.mock import Mock

def test_create_user_with_mock():
    mock_db = Mock()
    mock_db.insert.return_value = {"id": 1, "name": "Alice"}
    service = UserService(db=mock_db)
    result = service.create_user("Alice")
    assert result["id"] == 1  # This proves nothing about the real database

# Good: using a real database
def test_create_user_with_real_db():
    db = connect_to_test_database()
    service = UserService(db=db)
    result = service.create_user("Alice")

    # Verify the data actually made it into the database
    stored = db.query("SELECT * FROM users WHERE id = %s", (result["id"],))
    assert stored["name"] == "Alice"
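
If your stack allows it, SQLite's in-memory mode makes the real-database approach cheap. Here is a minimal fixture sketch using only the standard library, with an invented one-table schema:

import sqlite3
import pytest

@pytest.fixture
def db():
    # A real database with real SQL, but no disk I/O and no external service
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    yield conn
    conn.close()

def test_insert_and_query(db):
    db.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
    row = db.execute("SELECT name FROM users").fetchone()
    assert row[0] == "Alice"

The tradeoff: SQLite will not catch PostgreSQL-specific behavior, so if production runs PostgreSQL, run the Docker-backed suite at least in CI.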

Test critical paths end-to-end

You do not need integration tests for every possible interaction. Focus on the paths that matter most: user registration, payment processing, order creation, authentication flows. These are the paths where a failure means lost revenue or lost users.

Keep tests focused on the integration point

An integration test should verify that the connection between modules works. It should not re-test all the business logic that your unit tests already cover. If your unit tests verify that the tax calculation is correct for 50 different scenarios, your integration test only needs to verify that the tax calculation result actually reaches the database.
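
As a sketch of that tax example, with all names invented: the unit tests own the 50 scenarios, and the integration test checks only that the computed value survives the trip to the database.

import sqlite3

def compute_tax(subtotal, rate):
    # Business logic: exhaustively covered by unit tests elsewhere
    return round(subtotal * rate, 2)

def test_tax_reaches_database():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, tax REAL)")
    tax = compute_tax(100.00, 0.08)
    db.execute("INSERT INTO orders (tax) VALUES (?)", (tax,))
    stored = db.execute("SELECT tax FROM orders").fetchone()[0]
    # One scenario is enough here; we only care that the value persisted intact
    assert stored == tax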

Clean up test data

Every integration test should leave the database in the same state it found it. Use transactions that roll back after each test, or reset the database between tests. Leftover data from one test can cause the next test to pass or fail for the wrong reasons.

@pytest.fixture(autouse=True)
def clean_database():
    db = get_test_db()
    db.begin_transaction()
    yield
    db.rollback()  # Undo everything the test did

Common mistakes

Making integration tests too broad

An integration test that tests everything at once is a big bang test in disguise. When it fails, you have no idea which connection is broken. Keep each test focused on one integration point: API to service, service to database, service to external API.

# Too broad: testing everything in one test
def test_entire_checkout_flow():
    register_user()
    add_items_to_cart()
    apply_discount_code()
    process_payment()
    send_confirmation_email()
    update_inventory()
    generate_invoice()
    # If this fails, which of the 7 steps broke?

# Focused: testing one integration at a time
def test_payment_updates_order_status():
    order = create_test_order()
    payment_service.process(order)
    assert db.get_order(order.id).status == "paid"

Not cleaning up test data

If test A creates a user named "Alice" and test B assumes the users table is empty, test B will fail when run after test A but pass when run alone. That is the recipe for flaky tests, which pass or fail depending on the order in which they run. Always clean up.

Slow test suites from hitting a real database on every test

Integration tests are inherently slower than unit tests, but they should not take minutes per test. Use transactions that roll back instead of deleting rows. Use in-memory databases where possible. Run integration tests in parallel if your framework supports it. Group related assertions into a single test instead of making 10 separate tests that each set up and tear down the same database state.
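
On the parallel option: if you use pytest, the pytest-xdist plugin spreads tests across CPU cores with a single flag, though each worker then needs its own isolated database or schema.

pytest -n auto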

If your integration test suite takes more than a few minutes, developers will stop running it. A test suite that nobody runs catches zero bugs.

The takeaway

Unit tests verify that individual modules work. Integration tests verify that those modules work together. Both are necessary. A system that passes all its unit tests but has no integration tests is a system where every brick is perfect but the mortar between them might be missing.

Focus your integration tests on the connections: the API contracts, the database queries, the data formats passed between layers. Use real dependencies instead of mocks. Keep each test focused on one integration point. Clean up after yourself. And keep the suite fast enough that your team actually runs it.

The bugs that bring down production systems are rarely inside a single function. They are in the space between functions, where one module's assumptions clash with another module's reality. Integration testing is how you find those bugs before your users do.

Related posts

  • Software Testing Types covers the full landscape of testing strategies, including unit testing, back-to-back testing, white box, and black box approaches.
  • Unit Testing dives deep into testing individual functions in isolation and when to use mocks.
  • Coupling in Software Design explains how the level of coupling between modules directly affects how hard they are to integration test.