Skip to main content
Pixel art: SceneItAll hub with a speedometer. LEFT side shows slug-slow sequential device chain in orange. RIGHT side shows fast parallel processing in teal with green checkmarks. A magnifying glass labeled PROFILE hovers over the hub. Tagline: Measure, Don't Guess.

CS 3100: Program Design and Implementation II

Lecture 34: Performance

©2026 Jonathan Bell, CC-BY-SA

Learning Objectives

After this lecture, you will be able to:

  1. Reason about algorithmic growth using Big-O notation
  2. Identify performance bottlenecks: measure, don't guess
  3. Analyze the performance impact of architectural decisions
  4. Apply common patterns to improve performance

Performance Touches Everything We've Learned — But Performance for Whom?

LecturePerformance concept
L3 (Java Collections)ArrayList vs LinkedList
L20 (Networks)Network latency, caching
L31 (Concurrency I)Thread overhead
L32 (Concurrency II)I/O scaled-time table
L33 (Event Architecture)Rate limiting

Today we bring those threads together. Core principle: measure, don't guess.

L14's scientific method applies here too — hypothesize, measure, iterate.

Big-O Describes How Code Scales, Not How Fast It Is

NotationNameSceneItAll example
O(1)ConstantHashMap lookup by ID
O(log n)LogarithmicBinary search sorted devices
O(n)LinearIterate all devices by name
O(n log n)LinearithmicSort 1,000 devices
O(n²)QuadraticCompare every device pair

Recognize Complexity in Code Without Proofs

// O(1) — constant: one operation regardless of collection size
Device device = deviceMap.get(deviceId);

// O(n) — linear: one loop through the collection
for (Device d : devices) {
if (d.getName().equals(name)) return d;
}

// O(n²) — quadratic: nested loops over the same collection
for (Device a : devices) {
for (Device b : devices) {
if (a != b && a.getRoom().equals(b.getRoom())) {
// compare every pair — grows with n²
}
}
}

// O(n log n) — sorting
Collections.sort(devices, Comparator.comparing(Device::getBrightness));

The practical test: "If I double the input, how much slower?" O(n) = 2x. O(n²) = 4x.

Constants Don't Matter — Until They Do

// O(n) total: each ArrayList.get(i) is O(1)
for (int i = 0; i < arrayList.size(); i++) {
process(arrayList.get(i)); // ~1 ns per element (cache-friendly)
}
// O(n²) total: each LinkedList.get(i) walks up to O(n) nodes
for (int i = 0; i < linkedList.size(); i++) {
process(linkedList.get(i)); // pointer chasing; cache-unfriendly
}

Walking the whole list with an iterator is O(n) for both. But ArrayList is 10-100x faster due to cache locality.

"Big-O says they're the same. Your users disagree."

Big-O Matters When n Is Large — or When Each Operation Is Expensive

Big-O doesn't tell you about constant factors or GC pressure. "This is why we still need profiling."

You Cannot Trust Your Intuition About Where Time Is Spent

Code that looks slow may be irrelevant to overall runtime.

Code that looks fast may be called millions of times and dominate the profile.

"Is this method inherently expensive, or is it being called too many times?"

  • Inherently expensive → optimize the algorithm
  • Called too many times → cache or batch

Flame Graphs Show Where Time Goes

Simplified flame graph: computeOptimal() is a wide red bar (40% CPU, the bottleneck) while getById() is a tiny blue bar (2% CPU, negligible despite high call count)

Tools (know they exist, not details): JFR (Java Flight Recorder, built into JDK), flame graphs, heap dumps

Performance Has Several Dimensions That Trade Off

MetricWhat it measuresExample
LatencyTime for a single operation"How long until the user sees their grade?"
ThroughputOperations per unit time"How many submissions per minute?"
MemoryHeap/stack consumption"How much RAM for 1,000 devices?"
CPUProcessor time consumed"CPU-bound or I/O-bound?" (recall L32)

Optimizing one can worsen another: caching reduces latency but increases memory.

Data Location Determines Performance

Architecture determines where data lives. Monolith = RAM (5 min). Microservices = network (5 years).

ArrayList Wins Because of Cache Lines, Not Algorithms

Recall L3: "Use ArrayList by default." Now we explain why.

Both are O(n) to iterate. ArrayList is 10-100x faster because of cache locality.

Latency Budgets: Where Does the Time Actually Go?

Optimizing the 5ms computation to 1ms saves 4ms — irrelevant. Batching Zigbee saves 100ms+ — significant.

Hub runs great on Pi 5 — but 80% of deployed hubs are Pi 3s. Optimizing for modern hardware excludes existing users.

You've Lived Inside This Latency Budget

Infrastructure dominates everything. It typically takes 2-3 minutes just to go from push to running tests. Optimizing test execution from 10s to 8s saves 2s — irrelevant compared to the infrastructure overhead.

Amazon found every 100ms of added latency cost them 1% of sales. L20 Fallacy 2: latency is not zero.

Caching: The Fastest Operation Is the One That Doesn't Happen

Before: compute every time

public SceneSettings getSettings(
Scene scene, SensorData sensors) {
// 50ms each time
return settingsEngine
.computeOptimal(scene, sensors);
}

After: cache by inputs — O(f(n)) → O(1)

private final Map<CacheKey, SceneSettings>
cache = new ConcurrentHashMap<>();

public SceneSettings getSettings(
Scene scene, SensorData sensors) {
return cache.computeIfAbsent(
new CacheKey(scene.getId(),
sensors.hash()),
k -> settingsEngine
.computeOptimal(scene, sensors));
}

Cache when: same inputs, staleness acceptable. Don't cache: inputs always change, or staleness is unsafe (L33).

Batching: Amortize the Fixed Cost Across Many Items

Before: 15 calls × 200ms = 3 seconds

for (Device device : devices) {
zigbee.sendCommand(device, command);
}

After: 1 batch call = 200ms

zigbee.sendBatch(devices, command);
WhereThe problemThe fix
DatabaseN queries for N records (N+1 problem)One query with JOIN
NetworkN API callsBatch endpoint
File I/OWrite one byte at a timeBuffered writer

Pooling: Reuse Expensive Resources

// Thread pool: reuse threads instead of creating new ones per task
ExecutorService pool = Executors.newFixedThreadPool(10);

You've already used this pattern — ExecutorService in L31 is a thread pool.

ResourcePool typeWhy it matters
ThreadsThread pool (L31)Each thread costs 512KB-1MB of stack
DB connectionsConnection poolOpening a connection: ~10ms TCP + auth
HTTP connectionsHTTP keep-aliveReuse TCP connections across requests

Creating the resource once and reusing it is always faster than creating and destroying per use.

Premature Optimization Is the Root of All Evil

"Premature optimization is the root of all evil." — Donald Knuth

Every optimization increases coupling (L7): cached values must be invalidated, batched operations add complexity, pooled resources must be managed.

Caching, batching, and pooling all create objects on the heap. Who cleans them up?

Automatic Memory Management Trades Performance for Safety

C/C++ (manual)

Full control over when memory is freed.

But: use-after-free, double-free, memory leaks. Among the most dangerous bugs in software engineering.

Java/Python/JS (automatic)

Garbage collector decides when to free.

You cannot use-after-free in Java. Trade-off: GC may pause your application at inconvenient times.

Automatic memory management is overwhelmingly the right default — the bugs it prevents are far more costly than the performance it sacrifices.

Use-After-Free: How a PNG Can Root Your Phone

This is not hypothetical — Apple's FORCEDENTRY (2021) used exactly this pattern. NSO Group exploited a bug in Apple's image parser to install Pegasus spyware. No user interaction required.

GC Is Everywhere, Not Just the JVM

SystemWhat it managesHow it reclaimsPerformance cost
JVM GCHeap objectsMark-and-sweep: trace from roots, free unreachableGC pauses (ms to seconds)
PostgreSQLTable rowsVACUUM: find dead rows from old transactions, reclaim spaceVACUUM pauses, table bloat
File systemDisk blocksReference counting + periodic GC of orphaned blocksBackground I/O
KafkaLog segmentsRetention policy: delete segments older than N daysDisk cleanup spikes

Same pattern at every level: automatic reclamation of unused resources, background cost.

"A database is a big list with well-maintained indexes — and its own garbage collector."

Comprehension Check

Open Poll Everywhere and answer the three questions.

Performance Trade-offs Aren't Neutral — They Distribute Costs

Performance trade-offs distribute benefits and costs across users:

  • Accessibility + Inclusivity: Poor performance on constrained devices = exclusion
  • Environmental Sustainability: 10% efficiency in code running billions of times matters
  • SceneItAll: Hub runs great on Pi 5, but most deployed hubs are Pi 3s

Same safety-vs-performance trade-off at every level:

  • Strong consistency is slower but safer
  • Error handling adds complexity but prevents silent failure
  • Staged rollouts are slower but limit blast radius

Forward to L36: Jevons' paradox — efficiency enables more usage, not less total consumption.

Looking Ahead: L35

Wednesday: Safety and Reliability

We've been treating performance as "making things faster." But what happens when the safety mechanism you removed for performance is the one that would have prevented harm?

  • Therac-25: Replaced hardware interlocks with software for speed — killed six patients
  • Boeing 737 MAX: Single sensor, no pilot training — 346 killed
  • CrowdStrike Falcon: Skipped staged rollout — 8.5 million machines bricked

Today: where does time go, and how do we spend it wisely?
Wednesday: what happens when the system fails, and who gets hurt?