
Test Doubles and Isolation

Pixel art movie production split-scene: busy green-screen rehearsals with stunt doubles (Stub, Mock, Fake) doing thousands of takes, alongside principal photography with the real stars (Real Database, Live API) for the final few shots. Film strip at bottom shows the ratio. Tagline: Rehearse with doubles. Ship with stars.

Poll: Did you get your GitHub Student Pack set up with CoPilot access?

A. Yes

B. Haven't tried yet

C. Tried, still waiting for it to go through

Poll Everywhere QR Code or Logo

Text espertus to 22333 if the URL isn't working for you.

https://pollev.com/espertus

Poll: When did you request GitHub student access?

A. I already had it

B. Last week

C. Over the weekend

D. Monday

E. Tuesday

F. Today

G. I haven't requested it


CS 3100: Program Design and Implementation II

Lecture 15: Test Doubles and Isolation

©2026 Jonathan Bell & Ellen Spertus, CC-BY-SA

Learning Objectives

After this lecture, you will be able to:

  1. Distinguish between unit, integration, and end-to-end tests
  2. Explain the challenge of testing code with external dependencies
  3. Identify properties of high-quality individual tests (hermetic, clear, non-brittle)
  4. Differentiate between stubs, fakes, and spies as types of test doubles
  5. Apply mocking frameworks to generate test doubles
  6. Evaluate the tradeoffs of using test doubles
  7. Apply AI coding assistants to generate test plans and test doubles

What Makes a Good Test Suite?

Fast, reliable, and finds bugs—and individual tests must be hermetic (self-contained), clear, and non-brittle.

🎯

Finds Bugs

Actually detects defects in the code

🔒

Is Reliable

Passes when code is correct, fails when it's not

⚡

Is Fast

Runs quickly so you actually run it

Suite goals

  • Find bugs
  • Run automatically
  • Be cheap to run

Individual tests

  • Hermetic
  • Clear + debuggable
  • Not brittle (avoid unspecified behavior)
  • Use public APIs only

⚡ What Makes Tests Fast (or Slow)?

I/O dominates—memory is microseconds, network is milliseconds.

Fast (microseconds)

  • Creating objects in memory
  • Calling methods
  • Arithmetic, string operations

Slow (milliseconds to seconds)

  • File system I/O
  • Network calls (APIs, databases)
  • Starting processes/containers

🔒 What Makes Tests Reliable (or Flaky)?

Control all state → deterministic.
Depend on external world → flaky.

A "flaky" test sometimes passes, sometimes fails — same code!

Reliable (deterministic)

  • Same inputs → same outputs
  • No shared mutable state
  • No timing dependencies

Flaky (non-deterministic)

  • Network timeouts
  • Race conditions
  • External service state
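A common source of flakiness is depending on the real clock. The fix is to inject time as a dependency so the test controls it. Here is a minimal sketch; the `Session` class and its fields are hypothetical, not from any course codebase:

```java
import java.util.function.LongSupplier;

// Hypothetical example: a session-expiry check made deterministic by
// injecting the clock instead of calling System.currentTimeMillis() directly.
class Session {
    private final long expiresAtMillis;
    private final LongSupplier clock; // the test supplies a frozen clock

    Session(long expiresAtMillis, LongSupplier clock) {
        this.expiresAtMillis = expiresAtMillis;
        this.clock = clock;
    }

    boolean isExpired() {
        return clock.getAsLong() >= expiresAtMillis;
    }
}

public class DeterministicClockDemo {
    public static void main(String[] args) {
        // Freeze "now" at t = 1000 ms: same inputs, same outputs, every run.
        Session fresh = new Session(2000, () -> 1000L);
        Session stale = new Session(500, () -> 1000L);
        System.out.println(fresh.isExpired()); // false
        System.out.println(stale.isExpired()); // true
    }
}
```

In production you would pass `System::currentTimeMillis` (or `java.time.Clock`); in tests, a lambda returning a fixed value.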

🎯 Measuring Bug-Finding: Coverage

Coverage measures what code ran, not whether it's correct.

Code coverage measures what code your tests execute

  • Statement coverage: % of lines executed

  • Branch coverage: % of decision paths taken (if/else, loops)
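The two metrics can disagree. In this sketch (a hypothetical `discount` method, not from the homework), a single test executes every line yet still misses a branch:

```java
// One test can achieve 100% statement coverage while leaving branch
// coverage incomplete: discount(true) runs every line below, but the
// implicit "else" path of the if is never taken.
public class CoverageDemo {
    static int discount(boolean member) {
        int percent = 0;
        if (member) {          // branch coverage needs BOTH outcomes
            percent = 10;
        }
        return percent;
    }

    public static void main(String[] args) {
        System.out.println(discount(true)); // covers every line...
        // ...but without a discount(false) test, branch coverage is 50%.
    }
}
```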

Code Coverage with Factorial 1

VS Code's Test Explorer lets you run tests with coverage, showing coverage of:

  • lines (green for visited)
  • branches (green for both ways, hashed for one way)
VS Code screenshot showing partial coverage of factorial with a single base-case test

Code Coverage with Factorial 2

VS Code screenshot showing 100% coverage of factorial with base-case and recursive tests

We have 100% coverage!

Poll: What Coverage Guarantees Bug-Free Code?

A. 100% line coverage

B. 100% branch coverage

C. 100% line and branch coverage

D. none of the above


Coverage Can Still Miss Bugs

100% branch coverage ≠ 100% tested behaviors.

Even 100% branch coverage does not mean you tested every behavior.

public int magic(int x, int y) {
    int z;
    if (x != 0) { z = x + 10; }
    else { z = 0; }

    if (y > 0) { return y / z; } // BUG: z == 0 when x == 0
    else { return x; }
}

@Test void t1() { assertEquals(2, magic(1, 22)); }  // covers x != 0, y > 0
@Test void t2() { assertEquals(0, magic(0, -10)); } // covers x == 0, y <= 0

// 100% branch coverage… but magic(0, 5) throws ArithmeticException

The Truth About Testing

Edsger W. Dijkstra
"Program testing can be used to show the presence of bugs, but never to show their absence!"
Edsger W. Dijkstra

Where Do Test Inputs Come From?

Spec-driven and code-driven strategies complement each other.

Two fundamental strategies for choosing test inputs:

Spec-Driven

(from requirements)

  • What should the code do?
  • Boundary values, equivalence classes
  • "2 cups → should convert to 32 tbsp"

Code-Driven

(from implementation)

  • Look at branches, paths
  • Choose inputs to exercise each path
  • Coverage guides what's missing

AI-assisted (new reality): ask for equivalence classes + boundaries… then you validate the oracle.

Example-Based Testing: What You Know

You choose specific inputs from requirements or by examining code.

You write specific test cases with specific inputs

@Test
void shouldConvert2CupsTo32Tablespoons() {
    ExactQuantity twoCups = new ExactQuantity(2, Unit.CUP);

    Quantity result = twoCups.convertTo(Unit.TABLESPOON, registry);

    assertThat(result.getAmount()).isEqualTo(32); // 2 cups × 16 tbsp/cup
}

Parameterized Tests: Same Logic, Many Inputs

Write the test logic once; run it with many inputs.

From the HW3 handout — test the same behavior with different data

static Stream<Arguments> toStringTestCases() {
    return Stream.of(
        Arguments.of(1, "Preheat oven", "1. Preheat oven"),
        Arguments.of(2, "Mix ingredients", "2. Mix ingredients"),
        Arguments.of(10, "Serve warm", "10. Serve warm"),
        Arguments.of(99, "Final step", "99. Final step"));
}

@ParameterizedTest(name = "step {0}: \"{1}\" -> \"{2}\"")
@MethodSource("toStringTestCases")
void shouldFormatCorrectly(int stepNumber, String text, String expected) {
    Instruction instruction = new Instruction(stepNumber, text, List.of());
    assertThat(instruction.toString()).isEqualTo(expected);
}

Fuzzing: Generate LOTS of Inputs

Generate millions of inputs to find crashes you'd never imagine.

Automatically generate inputs to find crashes and bugs

Graybox Fuzzing (used by security researchers):

  • Start with sample inputs, randomly mutate them
  • Monitor code coverage to guide mutation
  • If mutation covers new code → keep it, mutate further
  • Run millions of inputs per second

Google's OSS-Fuzz has found 10,000+ bugs in open-source projects
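The mutation loop above can be sketched in a few lines. This is a toy illustration only — real graybox fuzzers (AFL, libFuzzer, Jazzer) instrument the target and keep only coverage-increasing mutations; here we naively keep every mutation. The `parse` method and its planted bug are hypothetical:

```java
import java.util.Random;

public class TinyFuzzer {
    // Hypothetical target with a planted bug: divides by zero
    // whenever the input contains the character 'A'.
    static int parse(String s) {
        if (s.indexOf('A') >= 0) return 1 / (s.length() - s.length());
        return s.length();
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed: reproducible run
        String seed = "hello";       // start from a sample input
        for (int i = 0; i < 100_000; i++) {
            // Mutate: flip one character to a random printable ASCII byte
            char[] chars = seed.toCharArray();
            chars[rng.nextInt(chars.length)] = (char) (32 + rng.nextInt(95));
            String input = new String(chars);
            seed = input; // naive: keep every mutation (graybox keeps only coverage-increasing ones)
            try {
                parse(input);
            } catch (ArithmeticException e) {
                System.out.println("Crash after " + (i + 1) + " inputs: " + input);
                return;
            }
        }
        System.out.println("No crash found");
    }
}
```

The fuzzer never "knows" about the bug; it simply stumbles into an input no human would have thought to write down.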

Unit Tests

Four-panel meme of a blue bird. Panel 1: Bird angrily yells 'Get that thing out of my face!' at a cracker labeled 'Unit Tests'. Panel 2: Bird takes a bite of the cracker with a 'chomp'. Panel 3: Empty. Panel 4: Bird has sparkly eyes and a blissful expression with bubbles around it. Caption reads 'When a unit test helps you find a bug you wouldn't have found otherwise'.

Unit Tests: ⚡ Fast + 🔒 Reliable

Test one unit in isolation: fast, focused, deterministic.

Test a single "unit" — typically one class — in isolation

  • Fast: Run in milliseconds (no I/O, no network)

  • 🔒 Reliable: No external state = deterministic

  • Focused: When it fails, you know exactly where to look

The challenge: achieving isolation when code has dependencies

Unit Test Example: From HW3 Handout

If it fails, you know exactly where the bug is.

Testing Instruction.toString() — one class, no dependencies

@ParameterizedTest(name = "step {0}: \"{1}\" -> \"{2}\"")
@MethodSource("toStringTestCases")
@DisplayName("should format as stepNumber. text")
void shouldFormatCorrectly(int stepNumber, String text, String expected) {
    Instruction instruction = new Instruction(stepNumber, text, List.of());

    assertThat(instruction.toString()).isEqualTo(expected);
}

// Test cases: (1, "Preheat oven", "1. Preheat oven")
//             (2, "Mix ingredients", "2. Mix ingredients")
//             (10, "Serve warm", "10. Serve warm")

Unit Tests Miss Some Problems

Integration Tests: The Middle Ground

Test components together to catch seam bugs.

Test multiple components interacting

  • ⚡≈ Moderate speed: Some I/O, but local

  • 🔒≈ Mostly reliable: Controlled environment

  • 🎯✓ Catches seam bugs: Serialization, protocols, formats

Do components communicate correctly? Agree on data formats?

The Mars Climate Orbiter Disaster

Comic-style illustration of the Mars Climate Orbiter disaster. Top panel shows the spacecraft burning up in Mars' atmosphere while a green alien thinks 'They should have done integration testing.' Bottom left panel labeled 'Jet Propulsion Lab' shows horrified engineers with a whiteboard displaying 'Force: 4.45 N, kg·m/s²'. Bottom right panel labeled 'Lockheed Martin' shows equally distressed engineers with their whiteboard showing 'Force: 1 lbf, pound-force-seconds'.

Integration Test Example: From HW3 Handout

Round-trip tests catch serialization and format issues.

Testing Recipe + JsonRecipeRepository + Jackson + file system together

@Test
@DisplayName("round-trip preserves recipe with all fields")
void roundTripPreservesRecipeWithAllFields() {
    Recipe recipe = new Recipe("test-id", "Chocolate Cake",
        new ExactQuantity(8, Unit.WHOLE),
        List.of(new MeasuredIngredient("flour",
            new ExactQuantity(2, Unit.CUP), null, null)),
        List.of(new Instruction(1, "Mix ingredients", List.of())),
        List.of());

    repository.save(recipe); // Writes JSON to file system
    Optional<Recipe> loaded = repository.findById("test-id");

    assertTrue(loaded.isPresent());
    assertEquals(recipe, loaded.get());
}

End-to-End Tests: 🎯 Finds Real Bugs

Test the whole system as users experience it—slow but realistic.

Test the entire system as a user would experience it

  • ⚡✗ Slow: Seconds or minutes per test

  • 🔒✗ Flaky: Network glitches, timing issues

  • 🎯✓ Realistic: Tests what users actually experience

The opposite tradeoff: sacrifice speed and reliability for realism

E2E Example: Cook Your Books Import-to-Export

One test, many systems—and many potential failure points.

Cook Your Books E2E test workflow

One test, five systems: Image → OCR → Parser → Repository → Exporter

The Practical Mix

E2E tests for critical journeys; unit tests for edge cases.

Why not just write E2E tests for everything?

Imagine testing 15 edge cases for ImportService:

                 Unit Tests            E2E Tests
Example          ImportService logic   Full OCR-to-export workflow
Time to run      ~150ms total          ~30 seconds
Flakiness        None                  OCR API, file system
Debug time       Seconds               Minutes to hours

The Test Pyramid

Test pyramid showing unit tests at base, integration in middle, E2E at top

The Dependency Problem

How do we test code that depends on databases, networks, hardware?

Consider SubmissionService from Pawtograder:

public class SubmissionService {
    private final GitHubService github;
    private final AutograderRunner autograder;
    private final NotificationService notifier;
    private final Database database;

    public SubmissionService(GitHubService github,
                             AutograderRunner autograder,
                             NotificationService notifier,
                             Database database) {
        this.github = github;
        this.autograder = autograder;
        this.notifier = notifier;
        this.database = database;
    }
    // ...
}

The Method Under Test

Testing side effects requires controlling dependencies.

public GradeResult processSubmission(Submission submission) {
    CodeSnapshot code = github.fetchCode(submission.repoUrl());
    TestResult result = autograder.runTests(code);

    if (result.allPassed()) {
        database.saveGrade(submission.studentId(), result.score());
        notifier.send(submission.studentEmail(),
            "Your submission passed!", NotificationLevel.INFO);
    } else {
        notifier.send(submission.studentEmail(),
            "Some tests failed", NotificationLevel.WARNING);
    }
    return new GradeResult(submission.studentId(), result);
}

To test this, we'd need real GitHub, real containers, real email!

Test Doubles: Stand-Ins for Real Dependencies

Stubs return canned answers; fakes work simply; spies record calls.

Stubs

Return canned answers

Fakes

Simplified implementations

Spies

Record what happened

From simplest to most sophisticated

Stubs: Return Canned Answers

Ignore details you don't care about; return what you need.

class StubGitHubService implements GitHubService {
    private final CodeSnapshot fixedCode;

    public StubGitHubService(CodeSnapshot code) {
        this.fixedCode = code;
    }

    @Override
    public CodeSnapshot fetchCode(String repoUrl) {
        return fixedCode; // Always returns the same code
    }
}

Ignores the repo URL, always returns sample code — that's fine!

Spies: Record What Happened (Decorator Pattern)

Wrap, record, delegate—verify interactions after the fact.

class SpyDatabase implements Database {
    private final Database delegate; // Wraps a real implementation
    private boolean saveGradeCalled = false;
    private String savedStudentId = null;
    private int savedScore = -1;

    public SpyDatabase(Database realDatabase) {
        this.delegate = realDatabase; // Decorator pattern!
    }

    @Override
    public void saveGrade(String studentId, int score) {
        this.saveGradeCalled = true; // Record the call
        this.savedStudentId = studentId;
        this.savedScore = score;
        delegate.saveGrade(studentId, score); // Delegate to real impl
    }

    // Query methods for tests
    public boolean wasSaveGradeCalled() { return saveGradeCalled; }
    public String getSavedStudentId() { return savedStudentId; }
    public int getSavedScore() { return savedScore; }
}

A Complete Test with Hand-Rolled Doubles

Stubs + spies enable fast, focused tests.

@Test
public void savesGradeWhenAllTestsPass() {
    StubGitHubService stubGithub = new StubGitHubService(sampleCode());
    StubAutograderRunner stubAutograder = new StubAutograderRunner(
        new TestResult(true, 100)); // All tests pass, score 100
    SpyDatabase spyDatabase = new SpyDatabase(new FakeDatabase()); // spy wraps a simple in-memory fake
    StubNotificationService stubNotifier = new StubNotificationService();

    SubmissionService service = new SubmissionService(
        stubGithub, stubAutograder, stubNotifier, spyDatabase);

    service.processSubmission(new Submission("student123", "repo-url", "email"));

    assertTrue(spyDatabase.wasSaveGradeCalled());
    assertEquals("student123", spyDatabase.getSavedStudentId());
    assertEquals(100, spyDatabase.getSavedScore());
}

The Pain of Hand-Rolling Test Doubles

Four classes for one test? That doesn't scale.

We wrote four separate classes just to test one method!

  • StubGitHubService
  • StubAutograderRunner
  • SpyDatabase
  • StubNotificationService

And we haven't tested failures, timeouts, edge cases...

Do we need a new stub class for every test result?

Mockito is a Test Framework that Dynamically Creates Mocks

A mocking framework generates test doubles at runtime

@Test
public void savesGradeWhenAllTestsPass() {
    // Create test doubles — Mockito generates these at runtime
    GitHubService mockGithub = mock(GitHubService.class);
    AutograderRunner mockAutograder = mock(AutograderRunner.class);
    NotificationService mockNotifier = mock(NotificationService.class);
    Database mockDatabase = mock(Database.class);

    // Configure stub behavior
    when(mockGithub.fetchCode(anyString())).thenReturn(sampleCode());
    when(mockAutograder.runTests(any())).thenReturn(new TestResult(true, 100));

    SubmissionService service = new SubmissionService(
        mockGithub, mockAutograder, mockNotifier, mockDatabase);
    service.processSubmission(new Submission("student123", "repo-url", "email"));

    // Verify spy recordings
    verify(mockDatabase).saveGrade("student123", 100);
    verify(mockNotifier).send(eq("email"), contains("passed"), any());
}

Mockito: Create, Configure, Verify

mock(), when().thenReturn(), verify()—the three operations you'll use most.

mock(Class.class)            Create a test double
when(...).thenReturn(...)    Configure stub behavior
verify(mock).method(...)     Check spy recordings
when(...).thenThrow(...)     Simulate exceptions

Mockito uses reflection to generate implementations at runtime
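To demystify that last point, here is a sketch of the underlying idea using the JDK's built-in dynamic proxies, which can generate an implementation of an interface at runtime. (Mockito itself is far more sophisticated — it uses bytecode generation and can mock classes too. The `cannedMock` helper and the local `GitHubService` interface are illustrative only.)

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    public interface GitHubService {
        String fetchCode(String repoUrl);
    }

    // Generate an implementation of `type` at runtime whose every
    // method returns the same canned answer — a crude mock().
    @SuppressWarnings("unchecked")
    public static <T> T cannedMock(Class<T> type, Object cannedAnswer) {
        InvocationHandler handler = (proxy, method, args) -> cannedAnswer;
        return (T) Proxy.newProxyInstance(
            type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        GitHubService stub = cannedMock(GitHubService.class, "sample code");
        System.out.println(stub.fetchCode("any-url")); // prints "sample code"
    }
}
```

No `StubGitHubService` class was written anywhere, yet we have a working instance — that is what `mock(GitHubService.class)` buys you.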

AI for Test Doubles (Especially Mockito)

AI generates boilerplate; you evaluate whether mocks match reality.

AI assistants are great at generating boilerplate if you keep evaluation in the loop.

  1. Plan first: write the behaviors you need to simulate and verify

  2. Generate: ask AI to produce when(...)/verify(...) scaffolding

  3. Review: does the mock reflect real dependency behavior?

  4. Break it: introduce a bug — does the test fail?

Argument Matchers: Flexible Matching

Match any value, exact values, or custom predicates.

// Return sample code for ANY repo URL
when(mockGithub.fetchCode(anyString())).thenReturn(sampleCode());

// Verify saveGrade was called with any student ID
verify(mockDatabase).saveGrade(anyString(), anyInt());

// Verify with custom condition
verify(mockNotifier).send(
    anyString(),
    argThat(message -> message.contains("passed")),
    eq(NotificationLevel.INFO)
);

anyString(), any(), eq(), argThat()

Fakes: When You Need Real Behavior

When you need save-then-retrieve, use a working in-memory implementation.

class FakeUserRepository implements UserRepository {
    private final Map<String, User> users = new HashMap<>();

    @Override
    public void save(User user) {
        users.put(user.getId(), user);
    }

    @Override
    public User findById(String id) {
        return users.get(id);
    }

    @Override
    public List<User> findAll() {
        return new ArrayList<>(users.values());
    }
}

A working implementation — just simpler than the real database
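Here is a sketch of the fake in use for a save-then-retrieve test — something a canned-answer stub cannot support, because the return value must depend on what was saved earlier. The minimal `User` record is hypothetical, not from the homework code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FakeRepositoryDemo {
    // Minimal hypothetical User with the getId() the fake stores by
    public record User(String id, String name) {
        public String getId() { return id; }
    }

    public interface UserRepository {
        void save(User user);
        User findById(String id);
        List<User> findAll();
    }

    // Same shape as the slide's FakeUserRepository: a real, working
    // implementation backed by a HashMap instead of a database.
    public static class FakeUserRepository implements UserRepository {
        private final Map<String, User> users = new HashMap<>();
        public void save(User user) { users.put(user.getId(), user); }
        public User findById(String id) { return users.get(id); }
        public List<User> findAll() { return new ArrayList<>(users.values()); }
    }

    public static void main(String[] args) {
        UserRepository repo = new FakeUserRepository();
        repo.save(new User("u1", "Ada"));
        repo.save(new User("u1", "Ada Lovelace")); // overwrite, like a DB upsert
        System.out.println(repo.findById("u1").name()); // prints "Ada Lovelace"
        System.out.println(repo.findAll().size());      // prints 1
    }
}
```

Because the fake honors save/find semantics, tests against it exercise realistic behavior in microseconds, with no I/O.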

The Good: Why Test Doubles Work

Speed, determinism, isolation, and easy error simulation.

  • Speed: Tests run in milliseconds

  • Determinism: No flaky tests from real APIs

  • Isolation: Failures point to the code under test

  • Edge cases: Easy to simulate errors

// Simulating a GitHub API failure — one line!
when(mockGithub.fetchCode(anyString()))
    .thenThrow(new GitHubException("API rate limit exceeded"));

Dependency Injection Simplified

Tweet from patrick thomson @importantshock
I love to use [1] dependency [2] injection [3].

1 pass
2 values
3 functions

8:58 AM - 17 Jan 2019

The Dangerous: False Confidence

False confidence from testing with mocks

The Dangerous: Brittle Tests

Test behavior, not implementation details.

Brittle tests rely on unspecified behaviors:

// BRITTLE: Assumes Set iteration order (unspecified!)
assertThat(recipe.getTags().toString())
    .isEqualTo("[vegetarian, quick, healthy]");

// BRITTLE: Exact string match on unspecified message format
verify(mockNotifier).send(eq("email@example.com"),
    eq("Your submission passed!"), any());

// BETTER: Test the behavior, not the implementation
assertThat(recipe.getTags()).containsExactlyInAnyOrder(
    "vegetarian", "quick", "healthy");

When to Use Test Doubles

If mock setup is more complex than the code, reconsider.

Use test doubles when:

  • Dependency is slow or unreliable
  • Need to simulate error conditions
  • Dependency has side effects (emails, charges)
  • Want to verify interactions

Be cautious when:

  • Mock setup is more complex than code
  • Verifying implementation, not behavior
  • Mocking types you don't own

If mock setup is getting elaborate, consider an integration test instead

Mutation Testing

Educational illustration of mutation testing. A valid snowman labeled 'Original Program Code' stands in the center with arrows pointing to four mutated snowmen around it: Mutant 1 has two heads (Code Duplication), Mutant 2 has an upside-down face (Logic Error), Mutant 3 has a scarf covering its face (Missing Functionality), and Mutant 4 has its nose on its belly (Incorrect Variable Usage). A scientist labeled 'Test Suite' holds a beeping detector against Mutant 3. Title reads 'Mutation Testing: Evaluating Test Quality by Introducing Defects'.

Connection to Your Assignments

Mutation testing grades whether your tests catch bugs, not just pass.

Your test suites are graded by mutation testing

  • We introduce bugs (mutations) into your code
  • Your tests should catch those bugs

  • A test that passes with the bug present = weak test

It's not enough for tests to pass — they must detect bugs!
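To make this concrete, here is a sketch of one classic mutation (relational-operator replacement) and the difference between a weak test and one that kills the mutant. The `passes` method is hypothetical, not the actual grading code:

```java
// Suppose the grader mutates >= to > in this passing-grade check.
public class MutationDemo {
    static boolean passes(int score) {
        return score >= 60;       // original
        // mutant: score > 60  -- survives unless a test uses score == 60
    }

    public static void main(String[] args) {
        // Weak test: passes(80) is true for BOTH original and mutant,
        // so it cannot tell them apart — the mutant survives.
        System.out.println(passes(80)); // true

        // Strong test: the boundary input distinguishes them.
        // Original: true. Mutant: false. The mutant is killed.
        System.out.println(passes(60)); // true
    }
}
```

Note how the mutant-killing input is exactly the boundary value that spec-driven testing would have suggested — the two ideas reinforce each other.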

AI for Test Planning (Plan Mode)

Plan first, generate drafts, then evaluate.

AI assistants are particularly good at generating drafts — especially when you start from a plan.

  • Turning “behaviors to test” into test skeletons
  • Enumerating edge cases (equivalence classes, boundaries)
  • Generating Mockito boilerplate (when/verify)

  • Creating test data and fixtures

Key rule (from Lecture 13): only ask AI to produce what you can evaluate.

The Test-First Evaluation Trick

If you can't describe the oracle, you can't evaluate the AI's tests.

Before you ask AI for tests, answer this yourself:

“What would a good test check?”

  • What behavior should change for pass vs fail?
  • What’s the oracle (strong, not “doesn’t crash”)?

  • What inputs hit boundaries / equivalence classes?
  • What side effects must be verified?

If you can’t answer these, you’re in the “low familiarity” danger zone (Lecture 13): you can’t evaluate the output.

Summary

  • Test scope spectrum: Unit (fast/focused) → Integration → E2E (slow/complete)

  • Good tests: hermetic, clear, non-brittle, with strong oracles

  • Test doubles stand in for real dependencies

  • Stubs return canned answers; spies record interactions; fakes are simplified implementations

  • Mockito generates test doubles at runtime

  • Tradeoffs: Speed/isolation vs false confidence/brittleness

  • Mutation testing: Tests must detect bugs, not just pass (coverage helps find gaps)

  • AI for testing: plan first, generate drafts, then evaluate (don’t “vibe test”)

Next Steps

  • Reading: Lecture notes, Mockito documentation (linked on course site)

  • Next lecture: Designing for Testability

Bonus Slide

Reddit post from r/ProgrammerHumor titled 'Found this at work. Someone padded a repo with thousands of lines like this to pass a 75% code coverage check.' Shows Java code with a method called fakeCoverage() containing 'Integer i = 0;' followed by dozens of identical 'i++;' statements to artificially inflate line count and code coverage metrics.