A pixel art split-scene: LEFT shows a stressed student with their car in a snowy lot — $2,200 parking permit, insurance bills, gas costs, car sitting unused. RIGHT shows the same student relaxed, tapping CharlieCard on the MBTA Green Line, sharing the train with others. BUT: MBTA signs show 'No trains past Government Center' and 'Service ends 12:30 AM' — you play by their rules. Tagline: Shared Infrastructure. Shared Rules. Shared Savings.

CS 3100: Program Design and Implementation II

Lecture 21: Serverless Architecture

©2026 Jonathan Bell, CC-BY-SA

Learning Objectives

After this lecture, you will be able to:

  1. Recognize common infrastructure building blocks (databases, queues, caches, object storage, observability) and their architectural roles
  2. Define "serverless" architecture and Functions as a Service (FaaS) concepts
  3. Compare serverless to traditional and container-based architectures, identifying tradeoffs
  4. Identify requirements that are well-suited or poorly-suited for serverless
  5. Apply a decision framework for choosing between architectural styles based on team size, scaling needs, and operational capacity

Important framing: You will encounter serverless systems in internships and jobs. The goal is to understand why teams choose serverless and reason about whether it fits a given problem — not to become a serverless architect overnight.

Warm-Up: A Simple Feature

You're building an app that lets users upload photos. You need a feature: resize images to create thumbnails. The core logic is straightforward — we will use an image processing library.

import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class ImageUtils {
    public static byte[] resize(byte[] imageData, int width, int height) {
        try {
            BufferedImage original = ImageIO.read(new ByteArrayInputStream(imageData));
            if (original == null) {
                throw new IOException("Unrecognized image format");
            }
            // Use a fixed pixel format: original.getType() can be TYPE_CUSTOM (0),
            // which the BufferedImage constructor rejects.
            BufferedImage thumbnail = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);

            Graphics2D g = thumbnail.createGraphics();
            g.drawImage(original, 0, 0, width, height, null);
            g.dispose();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(thumbnail, "jpg", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

This is the easy part. ~15 lines of code. You could write this in an hour.

But how do users actually send you an image? This code runs on your laptop. Your users are... not on your laptop.

Now Make It a Service

For users to access your image resizer, you need an HTTP server that listens for requests. Now look how much code you need:

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class ImageResizeServer {
    private static final int PORT = 8080;
    private static volatile boolean running = true;

    public static void main(String[] args) throws Exception {
        // YOU set up the server
        HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
        server.createContext("/resize", exchange -> {
            byte[] imageData = parseMultipartUpload(exchange); // Parsing uploads is painful
            byte[] thumbnail = ImageUtils.resize(imageData, 200, 200); // ← Your actual logic
            exchange.sendResponseHeaders(200, thumbnail.length);
            exchange.getResponseBody().write(thumbnail);
            exchange.close();
        });
        server.createContext("/health", ex -> { // Load balancers need this
            ex.sendResponseHeaders(200, 2);
            ex.getResponseBody().write("OK".getBytes());
            ex.close();
        });

        // YOU handle graceful shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            running = false;
            server.stop(5);
        }));

        server.start();
        System.out.println("Server running on port " + PORT);
        while (running) { Thread.sleep(1000); } // Runs FOREVER until killed
    }
}

Your 15 lines of business logic are now buried in 30+ lines of server boilerplate. And we haven't even talked about where this server runs...

Where Does This Server Run?

You can run java ImageResizeServer on your laptop. But your laptop sleeps when you sleep. Your users are in different time zones.

Split scene: LEFT shows a peaceful dorm room at 2:47 AM — student asleep, laptop closed, everything quiet. RIGHT shows a server at the SAME 2:47 AM, anthropomorphized as an exhausted worker surrounded by incoming requests from Tokyo, London, Sydney — it can never sleep. Tagline: When you close your laptop, GitHub is still answering requests from 20 million developers.

The Infrastructure Iceberg

Your ImageResizeServer needs to run somewhere. Running code that serves users around the clock requires far more than just "a computer." This is what we call infrastructure.

An iceberg diagram: Above water (small) is 'Your App' — the code you write. Below water (massive) are the infrastructure layers: Runtime Environment, Operating System, Hardware, Network, Power & Cooling, Physical Space. Tagline: Your app is just the tip of the iceberg.

Two Choices: Own It or Rent It

You have a fundamental choice about who manages all that infrastructure. This is the core tradeoff that defines cloud computing — and it's exactly like choosing between owning a car and taking the T.

Own Your Infrastructure

Buy servers. Rent data center space. Hire ops engineers. Configure everything yourself.

  • Total control — choose any hardware, any software
  • Predictable costs at high scale
  • You're responsible for everything: uptime, security, maintenance
  • When something breaks at 3 AM, your phone rings

Rent from a Cloud Provider

AWS, Google Cloud, Azure, Supabase — they own the data centers, you use their services.

  • Less control — work within their constraints
  • Pay-per-use pricing (can be cheaper... or more expensive)
  • They handle most operational concerns
  • When their stuff breaks at 3 AM, their phone rings

This is the same tradeoff you make with transportation in Boston. Own a car? Total freedom, but you pay for parking, insurance, gas, maintenance — even when it sits unused. Take the T? Cheaper per trip, but you follow their schedule and routes.

Recap: From Distributed Systems to Serverless

In L19, we explored architectural styles — monoliths, modular monoliths, and the tradeoffs between them. In L20, we crossed the network boundary and discovered the Eight Fallacies of Distributed Computing.

L19: How do we organize code?

Architectural styles emerge from quality attribute requirements. Monolith-first is usually right.

L20: What changes over networks?

The eight fallacies. Latency, failures, security boundaries. Every call needs timeout + retry.

L21: What if someone else manages it?

Serverless = technical partitioning with a vendor. You write functions; they operate infrastructure.

Today's key insight: Serverless doesn't eliminate distributed systems complexity — it shifts who deals with it. The eight fallacies still apply. You just don't write the retry logic yourself.

The Cloud Deployment Spectrum

A horizontal spectrum of seven cloud deployment models. Each station shows an 8-layer stack (Facility, Power, Network, Hardware, Virtualization, OS, Runtime, App) with orange layers (you manage) and teal layers (provider manages). LEFT: Own Data Center has all orange layers, sweating developer. Moving RIGHT: progressively more teal layers. RIGHT: FaaS has almost all teal, with developer relaxing on a cloud-shaped hammock holding only a tiny fn() function. Legend shows orange=you, teal=provider.

Not shown: SaaS (Software as a Service) — even further right. For image resizing, you could outsource entirely to a vendor: call their API, pay per transformation, write zero image code. Maximum convenience, zero customization.

Beyond Compute: What Else Does Your Application Need?

Okay, you've got a server (or a function) running your code. But code alone isn't enough. Real applications have needs that go beyond just executing instructions.

A small application mascot standing on a cloud, surrounded by six question bubbles: 'Where do I put user data?' (persistence), 'Where do I store large files?' (storage), '1000 requests at once!' (traffic spikes), 'Why fetch the same data repeatedly?' (caching), 'How do users find me?' (routing), 'What broke at 3 AM?' (debugging). Tagline: Every real application faces these problems.

Infrastructure Building Blocks

Cloud platforms provide standardized components that solve these recurring problems. Just as we have design patterns in code, these "building blocks" appear across architectural styles.

Building Block | Role | Examples
Databases | Structured data persistence | PostgreSQL, MongoDB, DynamoDB
Object Storage | Files and binary data at scale | S3, Cloud Storage, Supabase Storage
Message Queues | Async communication, buffering | SQS, Pub/Sub, RabbitMQ, pgmq
Caches | Fast access to hot data | Redis, Memcached, Upstash
API Gateways | Unified entry point, auth, routing | AWS API Gateway, Kong
Observability | Logs, metrics, traces | Sentry, Datadog, CloudWatch

Serverless architecture is fundamentally about composing these managed services: you write functions containing business logic; the cloud provider operates the infrastructure.

Databases: Structured Data Persistence

When your application needs to remember something across restarts — user accounts, submissions, grades — that data lives in a database. The "right" choice depends on query patterns.

Relational (SQL)

Complex queries, relationships, transactions

PostgreSQL, MySQL

SELECT s.*, a.name
FROM submissions s
JOIN assignments a ON s.assignment_id = a.id
WHERE s.student_id = ?

Document (NoSQL)

Flexible schemas, JSON-like storage, no schema enforcement, limited query power

MongoDB, Firestore

{
  student: "alice",
  scores: [85, 92, 78],
  metadata: { ... }
}

Key-Value

Simple lookups by ID, extremely fast, no schema enforcement, no query power (just fetch by key)

DynamoDB, Redis

GET user:12345
SET session:abc123 {...}

Pawtograder: Uses PostgreSQL — we need queries like "find all submissions by this student across all assignments" and "calculate average scores grouped by section." Relational databases shine here.

Object Storage: Files and Binary Data

Object Storage Characteristics

  • Cheap for large amounts of data
  • Durable (replicated across locations)
  • Simple (put, get, delete by key)

Common Services

Provider | Service
AWS | S3
Google | Cloud Storage
Azure | Blob Storage
Supabase | Storage (built on S3)

Pawtograder: Test files for grading can be several MB. They go to cloud storage; the grading system downloads them when needed. You wouldn't put a 5MB file in a database column.

Message Queues: Asynchronous Communication

A message queue lets components communicate without being online at the same time. Producer puts a message; consumer picks it up later. This decouples producers from consumers and buffers work during spikes.

Pawtograder: Grading

Student pushes to GitHub → GitHub enqueues an Actions workflow → grading runs when a runner is available.

Student sees "grading started" immediately. If the runner crashes, GitHub re-queues the job — no submissions lost.

Pawtograder: Repo Creation

Creating repos for 200 students → enqueue "create repo" tasks → background process works through at GitHub's rate limit (60/min).

Instructor sees immediate confirmation; repos appear over minutes.

Key property: Once the queue confirms receipt, it guarantees eventual delivery. The producer moves on; work happens even if consumers crash and restart. This is the retry + graceful degradation pattern from L20, built into infrastructure.
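The decoupling a queue provides can be sketched in plain Java, with a BlockingQueue standing in for a managed queue service. This is an illustrative sketch only — real queues like SQS or pgmq also persist messages so work survives consumer crashes, which an in-memory queue cannot do.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueSketch {
    // Stands in for a managed queue; a real one persists messages across crashes.
    static BlockingQueue<String> gradingQueue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Producer: enqueue and return immediately ("grading started")
        gradingQueue.put("submission-42");
        System.out.println("Producer done; queue size = " + gradingQueue.size());

        // Consumer: picks up work whenever it is ready, possibly much later
        Thread worker = new Thread(() -> {
            try {
                String task = gradingQueue.take();
                System.out.println("Grading " + task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        worker.join();
    }
}
```

Note the producer never calls the consumer directly: the queue is the only shared dependency, which is exactly what lets the two sides scale and fail independently.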

Caches and API Gateways

Caches: Fast Access to Hot Data

Store copies of frequently-accessed data in memory. Serve directly instead of querying the database every time.

Service | Use Case
Redis | Session data, fast lookups
Memcached | Distributed cache
CDN | Static files at edge

Tradeoff: Speed vs. staleness. When should the cache refresh?
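A minimal sketch of that tradeoff, assuming a simple time-to-live (TTL) policy — real deployments would use Redis's built-in expiry rather than a hand-rolled class like this one:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAt) {}
    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Serve from memory if fresh; otherwise fall back to the slow source
    // (e.g. a database query) and remember the result.
    public V get(K key, Supplier<V> slowSource) {
        Entry<V> e = entries.get(key);
        if (e != null && e.expiresAt() > System.currentTimeMillis()) {
            return e.value(); // fast path: may be stale for up to ttlMillis
        }
        V fresh = slowSource.get();
        entries.put(key, new Entry<>(fresh, System.currentTimeMillis() + ttlMillis));
        return fresh;
    }
}
```

The TTL is the staleness knob: a longer TTL means fewer database hits but a longer window in which readers can see outdated data.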

API Gateways: Unified Entry Point

Single entry point for your APIs. Routes requests, handles auth, enforces rate limits.

Pawtograder example: Supabase Gateway routes

  • /auth/* → authentication
  • /rest/v1/* → PostgREST (database)
  • /functions/v1/* → Edge Functions

Connection to L20: Caching addresses Fallacies 2-3 (latency, bandwidth). API gateways centralize the authentication and security concerns we discussed.

Observability: Seeing Inside Distributed Systems

In a monolith, debugging is (relatively) straightforward: one log file, one stack trace. In serverless, a single user action might trigger multiple functions across different machines that may not even exist anymore.

An illustration showing ephemeral serverless functions (some solid, some fading, some appearing) emitting streams of logs (purple), metrics (teal), and traces (orange) that flow down into a central log aggregation dashboard showing searchable logs, metrics graphs, and trace diagrams. Callout: You can't SSH into a function that no longer exists.

Serverless functions are ephemeral — they spin up, execute, and disappear. You can't SSH in and look around. You must invest in observability, or debugging becomes impossible.

The Request ID: Finding Your Logs in the Chaos

You've used SLF4J — a facade that lets you write logger.info() without knowing where logs go. Serverless platforms tag every log line with a Request ID. This seems minor until you see the alternative.

Without Request IDs: Interleaved chaos

[INFO] Processing submission for alice
[INFO] Processing submission for bob
[INFO] Running tests...
[INFO] Processing submission for carol
[ERROR] Test failed: NullPointerException
[INFO] Running tests...
[INFO] Completed in 847ms
[INFO] Running tests...
[ERROR] Timeout after 30s
[INFO] Completed in 234ms

Which error belongs to which student? Good luck.

With Request IDs: Filter by one request

Filter: RequestId = "3f1e..."

START RequestId: 3f1e...
[INFO] 3f1e Processing submission for bob
[INFO] 3f1e Running tests...
[ERROR] 3f1e Test failed: NullPointerException
END RequestId: 3f1e...
REPORT Duration: 892ms Status: 500

One student's entire request, start to finish.

Key insight: When 100 students submit at once, you get 100 concurrent function instances writing to the same log stream. The Request ID is how you untangle them. The platform adds it automatically — you just filter by it when debugging.

Beyond filtering: Error collection services (Sentry, Datadog) also provide alerting (Slack when errors spike), error grouping (100 identical stack traces → 1 issue), dashboards (error rate over time), and distributed tracing (follow a request across multiple services).
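A toy version of the idea in plain Java — the class below is invented for illustration; serverless platforms attach the Request ID for you, and in traditional Java services SLF4J's MDC (Mapped Diagnostic Context) serves the same purpose:

```java
import java.util.UUID;

public class RequestLogger {
    // Each request gets its own logger carrying a short unique ID,
    // mimicking what the platform's Request ID gives you for free.
    private final String requestId = UUID.randomUUID().toString().substring(0, 8);

    public String requestId() { return requestId; }

    public void info(String msg)  { System.out.println("[INFO] "  + requestId + " " + msg); }
    public void error(String msg) { System.out.println("[ERROR] " + requestId + " " + msg); }

    public static void main(String[] args) {
        RequestLogger log = new RequestLogger();
        log.info("Processing submission for bob");
        log.error("Test failed: NullPointerException");
        // Filtering the shared log stream on this requestId recovers
        // one request's story out of 100 interleaved ones.
    }
}
```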

Defining "Serverless"

"Serverless" is a bit of a misnomer — there are still servers, you just don't manage them. The key insight is organizational: serverless is technical partitioning with a vendor.

Remember L19: Technical vs. Domain Partitioning?

  • Technical: Organize by role (controllers, services, repositories)
  • Domain: Organize by business capability (users, grading, submissions)

Serverless takes technical partitioning to the organizational level: a cloud vendor operates infrastructure as a service.

Specialization Through Outsourcing

The vendor specializes in infrastructure — container orchestration, auto-scaling, security patching.

You specialize in your domain — courses, assignments, grading.

Each side focuses on what they do best.

You gain operational simplicity and elasticity. You lose control: vendor abstractions constrain how you build, pricing determines costs at scale, and switching vendors means rewriting infrastructure code.

Event-Driven Execution

Serverless functions are triggered by events — not just HTTP requests. This enables reactive architectures where functions respond to changes in the system.

Four event sources (HTTP Request like POST /submissions, File Upload like test files, Database Change like new submission triggers, Schedule like nightly exports) shown as distinct icons at top, with arrows converging on a central serverless function. Multiple copies of the function appear (illustrating auto-scaling). Functions connect to downstream services.

The AWS Lambda SDK: A Programming Model

AWS Lambda provides a Java library that defines how your code interacts with the platform. The key abstraction is the RequestHandler interface — a generic interface you implement.

// From the AWS Lambda Java SDK (aws-lambda-java-core)
public interface RequestHandler<I, O> {
    O handleRequest(I input, Context context);
}

The Generic Types

  • I (Input): What triggers your function

    • S3Event — file uploaded to S3
    • APIGatewayProxyRequestEvent — HTTP request
    • SQSEvent — message from a queue
    • ScheduledEvent — schedule/timer trigger
  • O (Output): What your function returns

    • APIGatewayProxyResponseEvent — HTTP response
    • String — simple text output
    • void — fire-and-forget

The Context Object

AWS passes metadata about the invocation:

context.getFunctionName();          // "ImageResizer"
context.getRemainingTimeInMillis(); // 29000
context.getAwsRequestId();          // unique ID
context.getMemoryLimitInMB();       // 512
context.getLogger();                // CloudWatch logger

Useful for logging, timeouts, debugging.

Notice: No main() method. No server setup. No port binding. You implement ONE method — AWS handles the rest.

Functions as a Service (FaaS)

Instead of deploying an application that runs continuously, you deploy functions that execute in response to events. Focus on the principles, not the syntax:

public class CreateSubmissionHandler
        implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    public APIGatewayProxyResponseEvent handleRequest(
            APIGatewayProxyRequestEvent request, Context context) {
        // Create clients for this request (platform provides connection details via environment)
        var db = new PostgresClient(System.getenv("DATABASE_URL"));
        var storage = new S3Client(System.getenv("BUCKET_NAME"));

        SubmissionRequest body = parseJson(request.getBody()); // Parse input
        OIDCClaims claims = verifyGitHubOIDC(request.getHeaders().get("Authorization")); // Verify
        Submission sub = db.insertSubmission(body.assignmentId(), claims.repo()); // Do the work

        return new APIGatewayProxyResponseEvent()
                .withStatusCode(200)
                .withBody(toJson(new SubmissionResponse(sub.id(), storage.getUrl(...))));
    }
}

① Event-Driven
Platform calls you when event arrives. No main(), no server.

② Stateless
No state persists between calls. Create what you need fresh.

③ Input → Output
Request in, response out. Pure transformation.

④ Platform Lifecycle
Container spins up, handler runs, recycles. Pay per call.

Traditional Server: Image Resize (Revisited)

Remember our warm-up? With a traditional server, YOU manage the infrastructure. Let's summarize what that ImageResizeServer requires:

Your code responsibilities:

  • main() method to start the server
  • Port binding and configuration
  • Health check endpoints for load balancers
  • Graceful shutdown handling
  • Multipart form parsing
  • Error handling and logging

Infrastructure responsibilities:

  • Server runs 24/7 (even at 3 AM with zero requests)
  • YOU restart it when it crashes
  • YOU scale horizontally (more instances)
  • YOU configure load balancing
  • YOU handle SSL certificates
  • YOU pay for idle time

The 15 lines of image resize logic are buried under all this operational work. What if you could just write the resize function and let someone else handle the rest?

Lambda: Same Feature, Less Code

Same image resize, but with Lambda. No main(), no health checks, no shutdown hooks. Just implement the handler — AWS runs it when a file arrives.

// Triggered automatically when a file is uploaded to the "uploads" S3 bucket
public class ImageResizeHandler implements RequestHandler<S3Event, String> {

    // Optimization: reuse across "warm" invocations (see note below)
    private final S3Client s3 = S3Client.create();

    @Override
    public String handleRequest(S3Event event, Context context) {
        // S3 tells us which file was uploaded
        var record = event.getRecords().get(0).getS3();
        String bucket = record.getBucket().getName();
        String key = record.getObject().getKey(); // e.g., "uploads/profile-123.jpg"

        // Download the original image
        byte[] original = s3.getObjectAsBytes(r -> r.bucket(bucket).key(key)).asByteArray();

        // Resize it (using any image library)
        byte[] thumbnail = ImageUtils.resize(original, 200, 200);

        // Save the thumbnail to a different location
        String thumbKey = key.replace("uploads/", "thumbnails/");
        s3.putObject(r -> r.bucket(bucket).key(thumbKey), RequestBody.fromBytes(thumbnail));

        return "Resized: " + key + " → " + thumbKey;
    }
}

What you didn't write: No polling loop checking for new files. No server listening. No scaling config. Upload 1000 images? 1000 functions run in parallel.

Energy Efficiency Considerations

Serverless architecture has interesting sustainability implications that cut both ways.

Potential Energy Savings

  • No idle power: Monolith runs 24/7 even at 3 AM. Serverless consumes energy only when executing.

  • Shared infrastructure: Cloud providers achieve high utilization across thousands of customers — a fleet running at 80% utilization wastes far less energy than dedicated servers idling at 10%.

  • Right-sized execution: Functions get exactly the resources needed (modulo startup overhead).

Potential Energy Costs

  • Cold start overhead: Spinning up new containers has energy costs that warm monoliths avoid.

  • Per-request overhead: Each invocation goes through routing, logging, billing infrastructure.

  • Distributed chattiness: Many small functions calling each other = network energy costs.

The architectural lesson: batch operations when possible. Pawtograder's submitFeedback() sends all test results in one call, not 100 separate calls. This saves latency, cost, AND energy.
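The chatty-vs-batched contrast can be sketched like this (FeedbackClient, TestResult, and the method signatures are illustrative, not Pawtograder's actual API):

```java
import java.util.List;

public class FeedbackClient {
    private int networkCalls = 0;

    record TestResult(String name, boolean passed) {}

    // Chatty: one network round trip per test result.
    void submitOneAtATime(List<TestResult> results) {
        for (TestResult r : results) send(List.of(r));
    }

    // Batched: all results in a single call — less latency, cost, and energy.
    void submitFeedback(List<TestResult> results) {
        send(results);
    }

    private void send(List<TestResult> payload) {
        networkCalls++; // stand-in for an HTTP POST
    }

    public int calls() { return networkCalls; }
}
```

With 100 test results, the chatty version pays the per-request routing, logging, and billing overhead 100 times; the batched version pays it once.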

When Does Serverless Fit?

Split illustration showing serverless fit. LEFT (green): Image resize (S3 trigger), Welcome emails (database trigger), Webhook handlers (GitHub events), Submissions at deadline (bursty traffic). RIGHT (red): Video encoding (timeout), Multiplayer games (cold starts), In-memory cache (stateless problem), High-frequency trading (sustained load). Center shows the three key questions: scaling, latency, ops.

Information Hiding at Scale

A zoom-out sequence showing information hiding at four scales: INNERMOST (L6) shows a Submission class with private fields hidden behind public methods. SECOND (L16) shows the class inside hexagonal architecture with ports (SubmissionRepositoryPort, FileStoragePort) and adapters (PostgresSubmissionAdapter, S3StorageAdapter). THIRD (L18) shows the Pawtograder service with modules hidden behind API endpoints. OUTERMOST (L21) shows the service in the cloud where GitHub Actions just calls POST /submissions. Tagline: L6 said hide what might change — same principle at every scale.

GitHub Actions calls POST /submissions and POST /feedback. It doesn't know — or care — whether these are Edge Functions, Lambda, or a traditional server. That's information hiding at the architectural level.
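That boundary can be expressed directly in code: callers depend on a port interface, never on a vendor SDK. The port name below echoes the diagram (FileStoragePort), but the adapter and service are an illustrative sketch:

```java
import java.util.HashMap;
import java.util.Map;

// The port: what the domain needs, with every vendor detail hidden.
interface FileStoragePort {
    void put(String key, byte[] data);
    byte[] get(String key);
}

// One adapter among many; swapping to S3 or local disk changes no caller.
class InMemoryStorageAdapter implements FileStoragePort {
    private final Map<String, byte[]> blobs = new HashMap<>();
    public void put(String key, byte[] data) { blobs.put(key, data); }
    public byte[] get(String key) { return blobs.get(key); }
}

class SubmissionService {
    private final FileStoragePort storage; // depends on the boundary, not the vendor

    SubmissionService(FileStoragePort storage) { this.storage = storage; }

    void saveTestOutput(String submissionId, byte[] output) {
        storage.put("outputs/" + submissionId, output);
    }
}
```

Replacing InMemoryStorageAdapter with an S3-backed adapter changes the wiring in one place; SubmissionService and everything above it are untouched — the same information hiding L6 taught for private fields, now applied to a vendor.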

Bringing It Together: L19 → L20 → L21

Lecture | Question | Key Insight
L19 | How do we organize code? | Architectural styles emerge from quality attribute requirements. Monolith-first is usually right.
L20 | What changes over networks? | The eight fallacies. Every network call can fail, be slow, or be intercepted.
L21 | What if someone else manages infra? | Serverless = technical partitioning with a vendor. Same principles, different operational model.

The thread connecting all three:

Same design principles at every scale:

  • Information hiding (L6)
  • Coupling and cohesion (L7)
  • Hexagonal architecture (L16)
  • Quality attribute tradeoffs (L19)

The practical takeaway:

No single architecture is right for everything. Pawtograder's hybrid approach demonstrates this — serverless API, managed compute for grading, PostgreSQL for domain logic.

Same Questions, Every Scale

At every level — class, module, service, system — you ask the same four questions:

Question | What It Determines
What changes independently? | Where to draw boundaries
Who needs to know? | What the interface should hide
What can fail? | How explicit your error handling must be
What are you trading? | Whether the tradeoff is worth it

Lectures | Scope | Examples
L6-L7 | Classes & methods | Private fields, cohesive modules
L16-L18 | Services & boundaries | Ports, adapters, APIs
L20 | Network boundaries | Fallacies, failures, security
L21 | Vendor boundaries | Managed infra, tradeoffs

The Architect's Toolkit

You now have a framework for approaching any system:

When you see a boundary, ask:

  • What's hiding behind it?
  • Who owns each side?
  • What happens when communication fails?

When you're drawing a boundary, ask:

  • What changes independently?
  • Who needs to know about what?
  • Is this a one-way door or two-way door?

When evaluating an architecture, ask:

  • What quality attributes drove these choices?
  • What tradeoffs were accepted?
  • What would break if requirements changed?

When choosing complexity, ask:

  • Do I have a specific problem this solves?
  • Can I start simpler and evolve?
  • What's the cost of being wrong?

The principles scale. The details change. The questions stay the same.

The Quality Without a Name

In L18, we mentioned Christopher Alexander — the architect whose work inspired software design patterns.

Alexander's insight:

The most livable, enduring structures emerge through gradual, adaptive growth — not grand master plans.

You don't design the perfect building. You create the conditions for one to emerge.

The same is true for software:

  • Start with good boundaries (L18)
  • Let styles emerge from understanding (L19)
  • Respect what networks add (L20)
  • Choose vendors consciously (L21)

Then let the system grow within those constraints.

Alexander called this ineffable quality that makes spaces feel alive the "Quality Without a Name." You can't define it precisely — but you know it when you see it. Well-designed software has it too.

What's Next: Teams and Collaboration

We've been implicitly assuming a single developer making all decisions. Real software is built by teams — and team structure has a big impact on how software gets built.

L22: Teams and Collaboration

  • How teams organize, communicate, coordinate
  • Why org structure shapes system structure
  • Architectural boundaries often become team boundaries
  • Strategies for effective collaboration

The connection:

Today we saw serverless as outsourcing infrastructure to a specialist vendor — your team focuses on domain logic, they focus on infra.

That's an organizational decision as much as a technical one.