A pixel art split-scene: LEFT shows a stressed student with their car in a snowy lot — $2,200 parking permit, insurance bills, gas costs, car sitting unused. RIGHT shows the same student relaxed, tapping CharlieCard on the MBTA Green Line, sharing the train with others. BUT: MBTA signs show 'No trains past Government Center' and 'Service ends 12:30 AM' — you play by their rules. Tagline: Shared Infrastructure. Shared Rules. Shared Savings.

CS 3100: Program Design and Implementation II

Lecture 21: Serverless Architecture

©2026 Jonathan Bell, CC-BY-SA

Learning Objectives

After this lecture, you will be able to:

  1. Recognize common infrastructure building blocks (databases, queues, caches, object storage, observability) and their architectural roles
  2. Define "serverless" architecture and Functions as a Service (FaaS) concepts
  3. Compare serverless to traditional and container-based architectures, identifying tradeoffs
  4. Identify requirements that are well-suited or poorly-suited for serverless
  5. Apply a decision framework for choosing between architectural styles based on team size, scaling needs, and operational capacity

Important framing: You will encounter serverless systems in internships and jobs. The goal is to understand why teams choose serverless and reason about whether it fits a given problem — not to become a serverless architect overnight.

Warm-Up: A Simple Feature

You're building an app that lets users upload photos. You need a feature: resize images to create thumbnails. The core logic is straightforward — we will use an image processing library.

import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class ImageUtils {
    public static byte[] resize(byte[] imageData, int width, int height) {
        try {
            BufferedImage original = ImageIO.read(new ByteArrayInputStream(imageData));
            if (original == null) {
                throw new IOException("Unrecognized image format");
            }
            // Use a fixed pixel format: original.getType() can be TYPE_CUSTOM (0),
            // which the BufferedImage constructor rejects.
            BufferedImage thumbnail = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);

            Graphics2D g = thumbnail.createGraphics();
            g.drawImage(original, 0, 0, width, height, null);
            g.dispose();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(thumbnail, "jpg", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

This is the easy part. ~15 lines of code. You could write this in an hour.

But how do users actually send you an image? This code runs on your laptop. Your users are... not on your laptop.

Now Make It a Service

For users to access your image resizer, you need an HTTP server that listens for requests. Now look how much code you need:

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class ImageResizeServer {
    private static final int PORT = 8080;
    private static volatile boolean running = true;

    public static void main(String[] args) throws Exception {
        // YOU set up the server
        HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
        server.createContext("/resize", exchange -> {
            byte[] imageData = parseMultipartUpload(exchange); // Parsing uploads is painful
            byte[] thumbnail = ImageUtils.resize(imageData, 200, 200); // ← Your actual logic
            exchange.sendResponseHeaders(200, thumbnail.length);
            exchange.getResponseBody().write(thumbnail);
            exchange.close();
        });
        server.createContext("/health", ex -> { // Load balancers need this
            ex.sendResponseHeaders(200, 2);
            ex.getResponseBody().write("OK".getBytes());
            ex.close();
        });

        // YOU handle graceful shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            running = false;
            server.stop(5);
        }));

        server.start();
        System.out.println("Server running on port " + PORT);
        while (running) { Thread.sleep(1000); } // Runs FOREVER until killed
    }
}

Your 15 lines of business logic are now buried in 30+ lines of server boilerplate. And we haven't even talked about where this server runs...

Where Does This Server Run?

You can run java ImageResizeServer on your laptop. But your laptop sleeps when you sleep. Your users are in different time zones.

Split scene: LEFT shows a peaceful dorm room at 2:47 AM — student asleep, laptop closed, everything quiet. RIGHT shows a server at the SAME 2:47 AM, anthropomorphized as an exhausted worker surrounded by incoming requests from Tokyo, London, Sydney — it can never sleep. Tagline: When you close your laptop, GitHub is still answering requests from 20 million developers.

The Infrastructure Iceberg

Your ImageResizeServer needs to run somewhere. Running code that serves users around the clock requires far more than just "a computer." This is what we call infrastructure.

An iceberg diagram: Above water (small) is 'Your App' — the code you write. Below water (massive) are the infrastructure layers: Runtime Environment, Operating System, Hardware, Network, Power & Cooling, Physical Space. Tagline: Your app is just the tip of the iceberg.

Two Choices: Own It or Rent It

You have a fundamental choice about who manages all that infrastructure. This is the core tradeoff that defines cloud computing — and it's exactly like choosing between owning a car and taking the T.

Own Your Infrastructure

Buy servers. Rent data center space. Hire ops engineers. Configure everything yourself.

  • Total control — choose any hardware, any software
  • Predictable costs at high scale
  • You're responsible for everything: uptime, security, maintenance
  • When something breaks at 3 AM, your phone rings

Rent from a Cloud Provider

AWS, Google Cloud, Azure, Supabase — they own the data centers, you use their services.

  • Less control — work within their constraints
  • Pay-per-use pricing (can be cheaper... or more expensive)
  • They handle most operational concerns
  • When their stuff breaks at 3 AM, their phone rings

This is the same tradeoff you make with transportation in Boston. Own a car? Total freedom, but you pay for parking, insurance, gas, maintenance — even when it sits unused. Take the T? Cheaper per trip, but you follow their schedule and routes.

Recap: From Distributed Systems to Serverless

In L19, we explored architectural styles — monoliths, modular monoliths, and the tradeoffs between them. In L20, we crossed the network boundary and discovered the Eight Fallacies of Distributed Computing.

L19: How do we organize code?

Architectural styles emerge from quality attribute requirements. Monolith-first is usually right.

L20: What changes over networks?

The eight fallacies. Latency, failures, security boundaries. Every call needs timeout + retry.

L21: What if someone else manages it?

Serverless = technical partitioning with a vendor. You write functions; they operate infrastructure.

Today's key insight: Serverless doesn't eliminate distributed systems complexity — it shifts who deals with it. The eight fallacies still apply. You just don't write the retry logic yourself.

The Cloud Deployment Spectrum

A horizontal spectrum of seven cloud deployment models. Each station shows an 8-layer stack (Facility, Power, Network, Hardware, Virtualization, OS, Runtime, App) with orange layers (you manage) and teal layers (provider manages). LEFT: Own Data Center has all orange layers, sweating developer. Moving RIGHT: progressively more teal layers. RIGHT: FaaS has almost all teal, with developer relaxing on a cloud-shaped hammock holding only a tiny fn() function. Legend shows orange=you, teal=provider.

Not shown: SaaS (Software as a Service) — even further right. For image resizing, you could outsource entirely to a vendor: call their API, pay per transformation, write zero image code. Maximum convenience, zero customization.

Beyond Compute: What Else Does Your Application Need?

Okay, you've got a server (or a function) running your code. But code alone isn't enough. Real applications have needs that go beyond just executing instructions.

A small application mascot standing on a cloud, surrounded by six question bubbles: 'Where do I put user data?' (persistence), 'Where do I store large files?' (storage), '1000 requests at once!' (traffic spikes), 'Why fetch the same data repeatedly?' (caching), 'How do users find me?' (routing), 'What broke at 3 AM?' (debugging). Tagline: Every real application faces these problems.

Infrastructure Building Blocks

Cloud platforms provide standardized components that solve these recurring problems. Just as we have design patterns in code, these "building blocks" appear across architectural styles.

Building Block | Role | Examples
Databases | Structured data persistence | PostgreSQL, MongoDB, DynamoDB
Object Storage | Files and binary data at scale | S3, Cloud Storage, Supabase Storage
Message Queues | Async communication, buffering | SQS, Pub/Sub, RabbitMQ, pgmq
Caches | Fast access to hot data | Redis, Memcached, Upstash
API Gateways | Unified entry point, auth, routing | AWS API Gateway, Kong
Observability | Logs, metrics, traces | Sentry, Datadog, CloudWatch

Serverless architecture is fundamentally about composing these managed services: you write functions containing business logic; the cloud provider operates the infrastructure.

Databases: Structured Data Persistence

When your application needs to remember something across restarts — user accounts, submissions, grades — that data lives in a database. The "right" choice depends on query patterns.

Relational (SQL)

Complex queries, relationships, transactions

PostgreSQL, MySQL

SELECT s.*, a.name
FROM submissions s
JOIN assignments a ON s.assignment_id = a.id
WHERE s.student_id = ?

Document (NoSQL)

Flexible schemas, JSON-like storage, no schema enforcement, limited query power

MongoDB, Firestore

{
  student: "alice",
  scores: [85, 92, 78],
  metadata: { ... }
}

Key-Value

Simple lookups by ID, extremely fast, no schema enforcement, no query power (just fetch by key)

DynamoDB, Redis

GET user:12345
SET session:abc123 {...}

Pawtograder: Uses PostgreSQL — we need queries like "find all submissions by this student across all assignments" and "calculate average scores grouped by section." Relational databases shine here.

Object Storage: Files and Binary Data

Object Storage Characteristics

  • Cheap for large amounts of data
  • Durable (replicated across locations)
  • Simple (put, get, delete by key)

Common Services

Provider | Service
AWS | S3
Google | Cloud Storage
Azure | Blob Storage
Supabase | Storage (built on S3)

Pawtograder: Test files for grading can be several MB. They go to cloud storage; the grading system downloads them when needed. You wouldn't put a 5MB file in a database column.

Message Queues: Asynchronous Communication

A message queue lets components communicate without being online at the same time. Producer puts a message; consumer picks it up later. This decouples producers from consumers and buffers work during spikes.

Pawtograder: Grading

Student pushes to GitHub → GitHub enqueues an Actions workflow → grading runs when a runner is available.

Student sees "grading started" immediately. If the runner crashes, GitHub re-queues the job — no submissions lost.

Pawtograder: Repo Creation

Creating repos for 200 students → enqueue "create repo" tasks → background process works through at GitHub's rate limit (60/min).

Instructor sees immediate confirmation; repos appear over minutes.

Key property: Once the queue confirms receipt, it guarantees eventual delivery. The producer moves on; work happens even if consumers crash and restart. This is the retry + graceful degradation pattern from L20, built into infrastructure.
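The decoupling a queue provides can be sketched in plain Java, with a BlockingQueue standing in for a managed queue service. This is an illustrative sketch only — real queues like SQS or pgmq also persist messages so work survives consumer crashes, which an in-memory queue cannot do.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueSketch {
    // Stands in for a managed queue; a real one persists messages across crashes.
    static BlockingQueue<String> gradingQueue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Producer: enqueue and return immediately ("grading started")
        gradingQueue.put("submission-42");
        System.out.println("Producer done; queue size = " + gradingQueue.size());

        // Consumer: picks up work whenever it is ready, possibly much later
        Thread worker = new Thread(() -> {
            try {
                String task = gradingQueue.take();
                System.out.println("Grading " + task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        worker.join();
    }
}
```

Note the producer never calls the consumer directly: the queue is the only shared dependency, which is exactly what lets the two sides scale and fail independently.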

Caches and API Gateways

Caches: Fast Access to Hot Data

Store copies of frequently-accessed data in memory. Serve directly instead of querying the database every time.

Service | Use Case
Redis | Session data, fast lookups
Memcached | Distributed cache
CDN | Static files at edge

Tradeoff: Speed vs. staleness. When should the cache refresh?
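A minimal sketch of that tradeoff, assuming a simple time-to-live (TTL) policy — real deployments would use Redis's built-in expiry rather than a hand-rolled class like this one:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAt) {}
    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Serve from memory if fresh; otherwise fall back to the slow source
    // (e.g. a database query) and remember the result.
    public V get(K key, Supplier<V> slowSource) {
        Entry<V> e = entries.get(key);
        if (e != null && e.expiresAt() > System.currentTimeMillis()) {
            return e.value(); // fast path: may be stale for up to ttlMillis
        }
        V fresh = slowSource.get();
        entries.put(key, new Entry<>(fresh, System.currentTimeMillis() + ttlMillis));
        return fresh;
    }
}
```

The TTL is the staleness knob: a longer TTL means fewer database hits but a longer window in which readers can see outdated data.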

API Gateways: Unified Entry Point

Single entry point for your APIs. Routes requests, handles auth, enforces rate limits.

Pawtograder example: Supabase Gateway routes

  • /auth/* → authentication
  • /rest/v1/* → PostgREST (database)
  • /functions/v1/* → Edge Functions

Connection to L20: Caching addresses Fallacies 2-3 (latency, bandwidth). API gateways centralize the authentication and security concerns we discussed.

Observability: Seeing Inside Distributed Systems

In a monolith, debugging is (relatively) straightforward: one log file, one stack trace. In serverless, a single user action might trigger multiple functions across different machines that may not even exist anymore.

An illustration showing ephemeral serverless functions (some solid, some fading, some appearing) emitting streams of logs (purple), metrics (teal), and traces (orange) that flow down into a central log aggregation dashboard showing searchable logs, metrics graphs, and trace diagrams. Callout: You can't SSH into a function that no longer exists.

Serverless functions are ephemeral — they spin up, execute, and disappear. You can't SSH in and look around. You must invest in observability, or debugging becomes impossible.

The Request ID: Finding Your Logs in the Chaos

You've used SLF4J — a facade that lets you write logger.info() without knowing where logs go. Serverless platforms tag every log line with a Request ID. This seems minor until you see the alternative.

Without Request IDs: Interleaved chaos

[INFO] Processing submission for alice
[INFO] Processing submission for bob
[INFO] Running tests...
[INFO] Processing submission for carol
[ERROR] Test failed: NullPointerException
[INFO] Running tests...
[INFO] Completed in 847ms
[INFO] Running tests...
[ERROR] Timeout after 30s
[INFO] Completed in 234ms

Which error belongs to which student? Good luck.

With Request IDs: Filter by one request

Filter: RequestId = "3f1e..."

START RequestId: 3f1e...
[INFO] 3f1e Processing submission for bob
[INFO] 3f1e Running tests...
[ERROR] 3f1e Test failed: NullPointerException
END RequestId: 3f1e...
REPORT Duration: 892ms Status: 500

One student's entire request, start to finish.

Key insight: When 100 students submit at once, you get 100 concurrent function instances writing to the same log stream. The Request ID is how you untangle them. The platform adds it automatically — you just filter by it when debugging.

Beyond filtering: Error collection services (Sentry, Datadog) also provide alerting (Slack when errors spike), error grouping (100 identical stack traces → 1 issue), dashboards (error rate over time), and distributed tracing (follow a request across multiple services).
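A toy version of the idea in plain Java — the class below is invented for illustration; serverless platforms attach the Request ID for you, and in traditional Java services SLF4J's MDC (Mapped Diagnostic Context) serves the same purpose:

```java
import java.util.UUID;

public class RequestLogger {
    // Each request gets its own logger carrying a short unique ID,
    // mimicking what the platform's Request ID gives you for free.
    private final String requestId = UUID.randomUUID().toString().substring(0, 8);

    public String requestId() { return requestId; }

    public void info(String msg)  { System.out.println("[INFO] "  + requestId + " " + msg); }
    public void error(String msg) { System.out.println("[ERROR] " + requestId + " " + msg); }

    public static void main(String[] args) {
        RequestLogger log = new RequestLogger();
        log.info("Processing submission for bob");
        log.error("Test failed: NullPointerException");
        // Filtering the shared log stream on this requestId recovers
        // one request's story out of 100 interleaved ones.
    }
}
```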

Defining "Serverless"

"Serverless" is a bit of a misnomer — there are still servers, you just don't manage them. The key insight is organizational: serverless is technical partitioning with a vendor.

Remember L19: Technical vs. Domain Partitioning?

  • Technical: Organize by role (controllers, services, repositories)
  • Domain: Organize by business capability (users, grading, submissions)

Serverless takes technical partitioning to the organizational level: a cloud vendor operates infrastructure as a service.

Specialization Through Outsourcing

The vendor specializes in infrastructure — container orchestration, auto-scaling, security patching.

You specialize in your domain — courses, assignments, grading.

Each side focuses on what they do best.

You gain operational simplicity and elasticity. You lose control: vendor abstractions constrain how you build, pricing determines costs at scale, and switching vendors means rewriting infrastructure code.

Event-Driven Execution

Serverless functions are triggered by events — not just HTTP requests. This enables reactive architectures where functions respond to changes in the system.

Four event sources (HTTP Request like POST /submissions, File Upload like test files, Database Change like new submission triggers, Schedule like nightly exports) shown as distinct icons at top, with arrows converging on a central serverless function. Multiple copies of the function appear (illustrating auto-scaling). Functions connect to downstream services.

The AWS Lambda SDK: A Programming Model

AWS Lambda provides a Java library that defines how your code interacts with the platform. The key abstraction is the RequestHandler interface — a generic interface you implement.

// From the AWS Lambda Java SDK (aws-lambda-java-core)
public interface RequestHandler<I, O> {
    O handleRequest(I input, Context context);
}

The Generic Types

  • I (Input): What triggers your function

    • S3Event — file uploaded to S3
    • APIGatewayProxyRequestEvent — HTTP request
    • SQSEvent — message from a queue
    • ScheduledEvent — schedule/timer trigger
  • O (Output): What your function returns

    • APIGatewayProxyResponseEvent — HTTP response
    • String — simple text output
    • void — fire-and-forget

The Context Object

AWS passes metadata about the invocation:

context.getFunctionName();          // "ImageResizer"
context.getRemainingTimeInMillis(); // 29000
context.getAwsRequestId();          // unique ID
context.getMemoryLimitInMB();       // 512
context.getLogger();                // CloudWatch logger

Useful for logging, timeouts, debugging.

Notice: No main() method. No server setup. No port binding. You implement ONE method — AWS handles the rest.

Functions as a Service (FaaS)

Instead of deploying an application that runs continuously, you deploy functions that execute in response to events. Focus on the principles, not the syntax:

public class CreateSubmissionHandler
        implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    public APIGatewayProxyResponseEvent handleRequest(
            APIGatewayProxyRequestEvent request, Context context) {
        // Create clients for this request (platform provides connection details via environment)
        var db = new PostgresClient(System.getenv("DATABASE_URL"));
        var storage = new S3Client(System.getenv("BUCKET_NAME"));

        SubmissionRequest body = parseJson(request.getBody()); // Parse input
        OIDCClaims claims = verifyGitHubOIDC(request.getHeaders().get("Authorization")); // Verify
        Submission sub = db.insertSubmission(body.assignmentId(), claims.repo()); // Do the work

        return new APIGatewayProxyResponseEvent()
                .withStatusCode(200)
                .withBody(toJson(new SubmissionResponse(sub.id(), storage.getUrl(...))));
    }
}

① Event-Driven
Platform calls you when event arrives. No main(), no server.

② Stateless
No state persists between calls. Create what you need fresh.

③ Input → Output
Request in, response out. Pure transformation.

④ Platform Lifecycle
Container spins up, handler runs, recycles. Pay per call.

Traditional Server: Image Resize (Revisited)

Remember our warm-up? With a traditional server, YOU manage the infrastructure. Let's summarize what that ImageResizeServer requires:

Your code responsibilities:

  • main() method to start the server
  • Port binding and configuration
  • Health check endpoints for load balancers
  • Graceful shutdown handling
  • Multipart form parsing
  • Error handling and logging

Infrastructure responsibilities:

  • Server runs 24/7 (even at 3 AM with zero requests)
  • YOU restart it when it crashes
  • YOU scale horizontally (more instances)
  • YOU configure load balancing
  • YOU handle SSL certificates
  • YOU pay for idle time

The 15 lines of image resize logic are buried under all this operational work. What if you could just write the resize function and let someone else handle the rest?

Lambda: Same Feature, Less Code

Same image resize, but with Lambda. No main(), no health checks, no shutdown hooks. Just implement the handler — AWS runs it when a file arrives.

// Triggered automatically when a file is uploaded to the "uploads" S3 bucket
public class ImageResizeHandler implements RequestHandler<S3Event, String> {

    // Optimization: reuse across "warm" invocations (see note below)
    private final S3Client s3 = S3Client.create();

    @Override
    public String handleRequest(S3Event event, Context context) {
        // S3 tells us which file was uploaded
        var record = event.getRecords().get(0).getS3();
        String bucket = record.getBucket().getName();
        String key = record.getObject().getKey(); // e.g., "uploads/profile-123.jpg"

        // Download the original image
        byte[] original = s3.getObjectAsBytes(r -> r.bucket(bucket).key(key)).asByteArray();

        // Resize it (using any image library)
        byte[] thumbnail = ImageUtils.resize(original, 200, 200);

        // Save the thumbnail to a different location
        String thumbKey = key.replace("uploads/", "thumbnails/");
        s3.putObject(r -> r.bucket(bucket).key(thumbKey), RequestBody.fromBytes(thumbnail));

        return "Resized: " + key + " → " + thumbKey;
    }
}

What you didn't write: No polling loop checking for new files. No server listening. No scaling config. Upload 1000 images? 1000 functions run in parallel.

Energy Efficiency Considerations

Serverless architecture has interesting sustainability implications that cut both ways.

Potential Energy Savings

  • No idle power: Monolith runs 24/7 even at 3 AM. Serverless consumes energy only when executing.

  • Shared infrastructure: Cloud providers achieve high utilization across thousands of customers — a fleet running at 80% utilization wastes far less energy than dedicated servers idling at 10%.

  • Right-sized execution: Functions get exactly the resources needed (modulo startup overhead).

Potential Energy Costs

  • Cold start overhead: Spinning up new containers has energy costs that warm monoliths avoid.

  • Per-request overhead: Each invocation goes through routing, logging, billing infrastructure.

  • Distributed chattiness: Many small functions calling each other = network energy costs.

The architectural lesson: batch operations when possible. Pawtograder's submitFeedback() sends all test results in one call, not 100 separate calls. This saves latency, cost, AND energy.
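The chatty-vs-batched contrast can be sketched like this (FeedbackClient, TestResult, and the method signatures are illustrative, not Pawtograder's actual API):

```java
import java.util.List;

public class FeedbackClient {
    private int networkCalls = 0;

    record TestResult(String name, boolean passed) {}

    // Chatty: one network round trip per test result.
    void submitOneAtATime(List<TestResult> results) {
        for (TestResult r : results) send(List.of(r));
    }

    // Batched: all results in a single call — less latency, cost, and energy.
    void submitFeedback(List<TestResult> results) {
        send(results);
    }

    private void send(List<TestResult> payload) {
        networkCalls++; // stand-in for an HTTP POST
    }

    public int calls() { return networkCalls; }
}
```

With 100 test results, the chatty version pays the per-request routing, logging, and billing overhead 100 times; the batched version pays it once.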

When Does Serverless Fit?

Split illustration showing serverless fit. LEFT (green): Image resize (S3 trigger), Welcome emails (database trigger), Webhook handlers (GitHub events), Submissions at deadline (bursty traffic). RIGHT (red): Video encoding (timeout), Multiplayer games (cold starts), In-memory cache (stateless problem), High-frequency trading (sustained load). Center shows the three key questions: scaling, latency, ops.

Information Hiding at Scale

A zoom-out sequence showing information hiding at four scales: INNERMOST (L6) shows a Submission class with private fields hidden behind public methods. SECOND (L16) shows the class inside hexagonal architecture with ports (SubmissionRepositoryPort, FileStoragePort) and adapters (PostgresSubmissionAdapter, S3StorageAdapter). THIRD (L18) shows the Pawtograder service with modules hidden behind API endpoints. OUTERMOST (L21) shows the service in the cloud where GitHub Actions just calls POST /submissions. Tagline: L6 said hide what might change — same principle at every scale.

GitHub Actions calls POST /submissions and POST /feedback. It doesn't know — or care — whether these are Edge Functions, Lambda, or a traditional server. That's information hiding at the architectural level.
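That boundary can be expressed directly in code: callers depend on a port interface, never on a vendor SDK. The port name below echoes the diagram (FileStoragePort), but the adapter and service are an illustrative sketch:

```java
import java.util.HashMap;
import java.util.Map;

// The port: what the domain needs, with every vendor detail hidden.
interface FileStoragePort {
    void put(String key, byte[] data);
    byte[] get(String key);
}

// One adapter among many; swapping to S3 or local disk changes no caller.
class InMemoryStorageAdapter implements FileStoragePort {
    private final Map<String, byte[]> blobs = new HashMap<>();
    public void put(String key, byte[] data) { blobs.put(key, data); }
    public byte[] get(String key) { return blobs.get(key); }
}

class SubmissionService {
    private final FileStoragePort storage; // depends on the boundary, not the vendor

    SubmissionService(FileStoragePort storage) { this.storage = storage; }

    void saveTestOutput(String submissionId, byte[] output) {
        storage.put("outputs/" + submissionId, output);
    }
}
```

Replacing InMemoryStorageAdapter with an S3-backed adapter changes the wiring in one place; SubmissionService and everything above it are untouched — the same information hiding L6 taught for private fields, now applied to a vendor.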

Bringing It Together: L19 → L20 → L21

Lecture | Question | Key Insight
L19 | How do we organize code? | Architectural styles emerge from quality attribute requirements. Monolith-first is usually right.
L20 | What changes over networks? | The eight fallacies. Every network call can fail, be slow, or be intercepted.
L21 | What if someone else manages infra? | Serverless = technical partitioning with a vendor. Same principles, different operational model.

The thread connecting all three:

Same design principles at every scale:

  • Information hiding (L6)
  • Coupling and cohesion (L7)
  • Hexagonal architecture (L16)
  • Quality attribute tradeoffs (L19)

The practical takeaway:

No single architecture is right for everything. Pawtograder's hybrid approach demonstrates this — serverless API, managed compute for grading, PostgreSQL for domain logic.

Same Questions, Every Scale

At every level — class, module, service, system — you ask the same four questions:

Question | What It Determines
What changes independently? | Where to draw boundaries
Who needs to know? | What the interface should hide
What can fail? | How explicit your error handling must be
What are you trading? | Whether the tradeoff is worth it

Lectures | Scope | Examples
L6-L7 | Classes & methods | Private fields, cohesive modules
L16-L18 | Services & boundaries | Ports, adapters, APIs
L20 | Network boundaries | Fallacies, failures, security
L21 | Vendor boundaries | Managed infra, tradeoffs

The Architect's Toolkit

You now have a framework for approaching any system:

When you see a boundary, ask:

  • What's hiding behind it?
  • Who owns each side?
  • What happens when communication fails?

When you're drawing a boundary, ask:

  • What changes independently?
  • Who needs to know about what?
  • Is this a one-way door or two-way door?

When evaluating an architecture, ask:

  • What quality attributes drove these choices?
  • What tradeoffs were accepted?
  • What would break if requirements changed?

When choosing complexity, ask:

  • Do I have a specific problem this solves?
  • Can I start simpler and evolve?
  • What's the cost of being wrong?

The principles scale. The details change. The questions stay the same.

The Quality Without a Name

In L18, we mentioned Christopher Alexander — the architect whose work inspired software design patterns.

Alexander's insight:

The most livable, enduring structures emerge through gradual, adaptive growth — not grand master plans.

You don't design the perfect building. You create the conditions for one to emerge.

The same is true for software:

  • Start with good boundaries (L18)
  • Let styles emerge from understanding (L19)
  • Respect what networks add (L20)
  • Choose vendors consciously (L21)

Then let the system grow within those constraints.

Alexander called this ineffable quality that makes spaces feel alive the "Quality Without a Name." You can't define it precisely — but you know it when you see it. Well-designed software has it too.

What's Next: Teams and Collaboration

We've been implicitly assuming a single developer making all decisions. Real software is built by teams — and team structure has a big impact on how software gets built.

L22: Teams and Collaboration

  • How teams organize, communicate, coordinate
  • Why org structure shapes system structure
  • Architectural boundaries often become team boundaries
  • Strategies for effective collaboration

The connection:

Today we saw serverless as outsourcing infrastructure to a specialist vendor — your team focuses on domain logic, they focus on infra.

That's an organizational decision as much as a technical one.