24 · System & Infrastructure Architecture

The layer beneath the application: how systems scale and stay available, and the infrastructure that runs them — load balancing, caching tiers, databases, message queues, CDNs/edge, containers and orchestration, CI/CD, infrastructure-as-code, observability, and deployment strategies. Written for a frontend engineer who must design integrations, reason about system design interviews, and ship to production reliably.

Positioning

A senior frontend engineer isn’t a DevOps engineer or SRE, but still works inside a larger system and must understand it: where the app is served from, how it scales, why the API is sometimes slow, what a CDN/edge does to your caching (18), how a deploy reaches users, and how to approach a system-design interview. This file gives the system-design vocabulary (scaling, availability, consistency, caching, queues) and the infra literacy (containers, CI/CD, IaC, observability, deploy strategies) that senior frontend roles assume. It complements software architecture (10, 20–22) and the decision-making file (25).

Foundations: the qualities you’re designing for

System architecture trades off a handful of qualities:

Scalability — handle growth in load/data without redesign.
Availability — stay up (measured in “nines”: 99.9% ≈ 8.76 h/yr of downtime; 99.99% ≈ 52.6 min/yr).
Reliability / Fault tolerance — keep working despite component failures.
Performance / Latency — fast responses (15, 18).
Consistency — all readers see the same data (vs eventual consistency, 13).
Maintainability, Security (17), Cost.
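
The “nines” arithmetic above can be checked with a few lines. This is a minimal sketch, assuming a 365-day year; the function names are illustrative and not from any library. It also shows why a chain of dependencies is less available than any single link: when every component in a request path must be up, their availabilities multiply.

```typescript
// Hours in a non-leap year (assumption: leap years ignored for simplicity).
const HOURS_PER_YEAR = 365 * 24; // 8760

// Convert an availability target (e.g. 0.999) into allowed downtime per year.
function downtimeHoursPerYear(availability: number): number {
  if (availability < 0 || availability > 1) {
    throw new RangeError("availability must be between 0 and 1");
  }
  return (1 - availability) * HOURS_PER_YEAR;
}

// Availability of a request path where every component must be up.
function serialAvailability(components: number[]): number {
  return components.reduce((product, a) => product * a, 1);
}

const threeNinesHours = downtimeHoursPerYear(0.999);        // about 8.76 hours
const fourNinesMinutes = downtimeHoursPerYear(0.9999) * 60; // about 52.6 minutes

// CDN -> load balancer -> app, each at 99.9%: the chain is only about 99.7%.
const chain = serialAvailability([0.999, 0.999, 0.999]);
const chainHours = downtimeHoursPerYear(chain);             // about 26.25 hours
```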

Two master trade-offs frame everything:

CAP theorem — under a network Partition, a distributed data store must choose between Consistency and Availability. PACELC extends it: if Partitioned, trade Availability vs Consistency; Else (no partition), trade Latency vs Consistency. Many distributed systems accept eventual consistency by choice to gain availability and latency, which is why your UI must tolerate stale reads (13).
Vertical vs horizontal scaling — scale up (bigger machine: simple, has a ceiling, single point of failure) vs scale out (more machines: near-unlimited, needs statelessness + load balancing + coordination). Modern systems scale out; the enabling requirement is statelessness (no per-user state on a given server — push it to a shared store/session service).

Deep dive: system building blocks

1. Load balancing

Distributes traffic across many server instances (round-robin, least-connections, IP-hash, latency-based). Enables horizontal scaling and availability (route around dead instances via health checks). Lives at L4 (TCP) or L7 (HTTP, can route by path/host — relevant to MFE/zone routing, 09/08). Adds the need for statelessness or sticky sessions.
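
Two of the strategies named above, round-robin and least-connections, can be sketched in a few lines. This is an illustrative in-memory model, not how any particular load balancer is implemented; the `Backend` shape and the `healthy` flag (which a real balancer would update from periodic health checks) are assumptions.

```typescript
interface Backend {
  name: string;
  healthy: boolean;          // assumed to be maintained by a health checker
  activeConnections: number; // assumed to be tracked by the balancer
}

class RoundRobinBalancer {
  private next = 0;
  private readonly backends: Backend[];

  constructor(backends: Backend[]) {
    this.backends = backends;
  }

  // Return the next healthy backend in rotation, or undefined if all are down.
  pick(): Backend | undefined {
    const count = this.backends.length;
    for (let i = 0; i < count; i++) {
      const index = (this.next + i) % count;
      const candidate = this.backends[index];
      if (candidate.healthy) {
        this.next = (index + 1) % count; // resume after the one we returned
        return candidate;
      }
    }
    return undefined;
  }
}

// Least-connections: choose the healthy backend with the fewest open connections.
function pickLeastConnections(backends: Backend[]): Backend | undefined {
  let best: Backend | undefined;
  for (const backend of backends) {
    if (!backend.healthy) continue;
    if (best === undefined || backend.activeConnections < best.activeConnections) {
      best = backend;
    }
  }
  return best;
}
```

Note that neither strategy remembers which user went where, which is exactly why the app tier needs to be stateless (or the balancer needs sticky sessions).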

2. Caching (the highest-leverage performance tool, at every tier)

Store computed/fetched results closer to the consumer. Tiers, outer→inner:

Browser cache + HTTP caching (18) — Cache-Control, ETag, stale-while-revalidate.
CDN / edge cache — static assets and increasingly dynamic/edge-rendered content at PoPs near users (18).
Application / in-memory cache — Redis/Memcached for sessions, computed results, rate-limit counters, hot data.
Database cache — query/result caches, materialized views (CQRS read models, 13).

Core concerns: invalidation (“one of the two hard problems”), TTL, eviction (LRU/LFU), cache stampede (many misses at once → use request coalescing/locks), and write strategies (write-through, write-back, cache-aside). A BFF (12) is a common caching choke point.
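
Cache-aside with a TTL and request coalescing can be sketched as follows. This is a minimal in-process illustration, assuming a single Node/browser process; the class name, the `load` callback, and the injectable `now` clock are hypothetical. A shared cache such as Redis would need a distributed lock or similar to get the same stampede protection across instances.

```typescript
type Loader<T> = (key: string) => Promise<T>;

class CoalescingCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly load: Loader<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: Loader<T>, ttlMs: number, now: () => number = () => Date.now()) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(key: string): Promise<T> {
    // Cache-aside read: serve a fresh entry if we have one.
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;

    // Miss: join a load already in progress instead of starting another one.
    const pending = this.inFlight.get(key);
    if (pending !== undefined) return pending;

    const promise = this.load(key)
      .then((value) => {
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key); // allow a retry after success or failure
      });

    this.inFlight.set(key, promise);
    return promise;
  }

  // Explicit invalidation, e.g. after a write to the underlying store.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

With this shape, a burst of concurrent `get("product:42")` calls on a cold or expired key triggers one backend load rather than one per caller.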

3. Databases

Relational (SQL) — Postgres/MySQL. Strong consistency, ACID transactions, joins, schemas. The default for most apps; many teams reach for NoSQL too early.
NoSQL families: document (MongoDB), key-value (Redis, DynamoDB), wide-column (Cassandra), graph (Neo4j). Chosen for scale-out, flexible schema, or specific access patterns; usually eventually consistent and join-light.
Concepts to know: ACID vs BASE, indexing (and how a missing index makes a query O(n)), N+1 query problem (the backend twin of the frontend N+1, 12), replication (read replicas for read scaling), sharding/partitioning (horizontal data split for write scaling), and transactions vs distributed sagas (13).
Frontend touchpoint: this is why some data is strongly consistent and some isn’t; why “search” might hit a different store (Elasticsearch) than “checkout.”
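
The N+1 query problem mentioned above is easiest to see by counting queries. The sketch below uses a hypothetical in-memory `FakeDb` that increments a counter on every call; it stands in for a real database client, and the method names are invented for illustration.

```typescript
interface Author { id: number; name: string }
interface Post { id: number; authorId: number; title: string }

// Stand-in for a database client; each method call counts as one query.
class FakeDb {
  queryCount = 0;
  private readonly authors: Author[];
  private readonly posts: Post[];

  constructor(authors: Author[], posts: Post[]) {
    this.authors = authors;
    this.posts = posts;
  }

  listPosts(): Post[] {
    this.queryCount++;
    return this.posts.slice();
  }

  authorById(id: number): Author | undefined {
    this.queryCount++;
    return this.authors.find((a) => a.id === id);
  }

  // Equivalent of "WHERE id IN (...)": one query for many ids.
  authorsByIds(ids: number[]): Author[] {
    this.queryCount++;
    const wanted = new Set(ids);
    return this.authors.filter((a) => wanted.has(a.id));
  }
}

// N+1: one query for the list, then one more per row.
function postsWithAuthorsNPlusOne(db: FakeDb) {
  return db.listPosts().map((post) => ({
    title: post.title,
    author: db.authorById(post.authorId)?.name,
  }));
}

// Batched: one query for the list, one for all referenced authors.
function postsWithAuthorsBatched(db: FakeDb) {
  const posts = db.listPosts();
  const ids = Array.from(new Set(posts.map((p) => p.authorId)));
  const nameById = new Map<number, string>(
    db.authorsByIds(ids).map((a): [number, string] => [a.id, a.name]),
  );
  return posts.map((post) => ({ title: post.title, author: nameById.get(post.authorId) }));
}
```

For a list of N posts the first version issues N + 1 queries and the second issues 2, regardless of N. A SQL join achieves the same thing in a single query.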

4. Message queues & event streaming

Kafka, RabbitMQ, SQS, NATS decouple producers from consumers for asynchronous, resilient processing (13). Enable: load leveling (absorb spikes), background jobs (emails, image processing), and event-driven architectures. Guarantees to know: at-least-once vs exactly-once delivery, ordering, idempotent consumers, dead-letter queues. Frontend touchpoint: real-time updates pushed to the browser via WebSocket/SSE (04, 18) often originate from these streams; “your order is processing” reflects async queue work.
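
At-least-once delivery means a consumer can see the same message twice, so the handler has to be idempotent. A minimal sketch of one common approach, deduplicating by message id, is below. The `Message` shape and class name are hypothetical, and the in-memory `Set` is a stand-in: a real consumer would keep processed ids in a durable store, ideally updated in the same transaction as the side effect.

```typescript
interface Message {
  id: string; // unique per logical event; redeliveries reuse the same id
  payload: { orderId: string; amountCents: number };
}

class IdempotentConsumer {
  private readonly processed = new Set<string>();
  private readonly handle: (message: Message) => void;

  constructor(handle: (message: Message) => void) {
    this.handle = handle;
  }

  // Returns true if the message was handled now, false if it was a duplicate.
  receive(message: Message): boolean {
    if (this.processed.has(message.id)) return false;
    this.handle(message);
    // Record only after the handler succeeds, so a failed attempt
    // (handler throws) is retried when the broker redelivers.
    this.processed.add(message.id);
    return true;
  }
}
```

If the handler throws, the id is not recorded and the exception propagates, which is what lets the broker redeliver; after repeated failures a broker would typically route the message to a dead-letter queue.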

5. API layer

REST, GraphQL (12), gRPC (service-to-service, binary/HTTP2), tRPC (TS end-to-end). An API gateway is the single entry point (routing, auth, rate-limiting, 13); a BFF is the per-experience variant (12).
Rate limiting (token bucket/leaky bucket), API versioning, idempotency keys for safe retries.
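
The token-bucket algorithm named above fits in a small class. This is a single-process sketch with an injectable clock for testing; the class and parameter names are illustrative, and a gateway serving many instances would keep the counters in a shared store such as Redis instead.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full, so an initial burst is allowed
    this.lastRefillMs = now();
  }

  // Try to spend `cost` tokens; returns false when the caller should be rejected (e.g. HTTP 429).
  tryConsume(cost = 1): boolean {
    this.refill();
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }

  // Add tokens for the time elapsed since the last call, capped at capacity.
  private refill(): void {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
  }
}
```

The capacity sets the allowed burst size and the refill rate sets the sustained rate, which is the main difference from a leaky bucket that smooths output to a constant rate.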

6. CDN & edge compute

CDNs cache near users; edge runtimes (Cloudflare Workers, Vercel Edge, 08) run code at PoPs for SSR/personalization/auth with minimal latency — the infra that makes streaming SSR/RSC fast globally (07).

Deep dive: infrastructure & delivery

7. Containers & orchestration

Docker packages an app + its dependencies into a portable image that runs consistently on any host with a compatible container runtime. That addresses “works on my machine” and makes the image the unit of modern deployment.
Kubernetes (K8s) orchestrates containers at scale: scheduling, self-healing (restart failed pods), horizontal autoscaling, rolling updates, service discovery, secrets/config. Heavyweight; many frontend teams instead use a PaaS (Vercel/Netlify/Render/Fly) that hides the orchestration layer entirely.
Service mesh (13) handles service-to-service mTLS/retries/observability via sidecars.

8. CI/CD (your daily infra)

CI — on every push: install, lint/typecheck, test (16), build (14), and produce artifacts. Fast feedback; gate merges.
CD — automatically deploy passing builds to staging/production. Continuous delivery (one click to prod) vs continuous deployment (fully automatic).
Pipeline shape that works (16): static checks → unit → integration → build → deploy preview → E2E on preview → promote.
Tools: GitHub Actions, GitLab CI (Rian’s context), CircleCI.
Frontend specifics: preview deployments per PR, caching of dependencies and build output, bundle-size budgets (14/15) as a merge gate, and watching CI memory use when a test coverage provider is enabled.
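
A bundle-size budget gate can be as simple as comparing build stats to a budget table and failing the job on any violation. The sketch below assumes the build step has already produced per-bundle gzip sizes; the `BundleStat` and `Budget` shapes and the numbers are hypothetical, not the output format of any specific bundler.

```typescript
interface BundleStat { name: string; gzipBytes: number }
interface Budget { name: string; maxGzipBytes: number }
interface BudgetViolation { name: string; gzipBytes: number; maxGzipBytes: number; overBy: number }

// Compare measured bundle sizes with budgets; bundles without a measurement are skipped.
function checkBudgets(stats: BundleStat[], budgets: Budget[]): BudgetViolation[] {
  const sizeByName = new Map<string, number>();
  for (const stat of stats) sizeByName.set(stat.name, stat.gzipBytes);

  const violations: BudgetViolation[] = [];
  for (const budget of budgets) {
    const size = sizeByName.get(budget.name);
    if (size !== undefined && size > budget.maxGzipBytes) {
      violations.push({
        name: budget.name,
        gzipBytes: size,
        maxGzipBytes: budget.maxGzipBytes,
        overBy: size - budget.maxGzipBytes,
      });
    }
  }
  return violations;
}

// Example: "main" is 10 kB over its budget, "vendor" is within budget.
const exampleViolations = checkBudgets(
  [{ name: "main", gzipBytes: 180_000 }, { name: "vendor", gzipBytes: 90_000 }],
  [{ name: "main", maxGzipBytes: 170_000 }, { name: "vendor", maxGzipBytes: 100_000 }],
);
```

In a CI script, a non-empty result would be printed and followed by a non-zero exit code so the merge is blocked.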

9. Deployment strategies (how new code reaches users safely)

Rolling — replace instances gradually; default in K8s.
Blue-green — two identical environments; switch traffic from blue (old) to green (new) instantly; instant rollback by switching back.
Canary — release to a small % of users, watch metrics, ramp up or roll back. Pairs with feature flags (LaunchDarkly/Unleash) for decoupling deploy from release and gradual rollout/kill-switch.
Frontend note: immutable, content-hashed assets (18) make frontend deploys atomic; keep old chunks available so in-flight sessions don’t 404 mid-deploy.
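
Canary and percentage rollouts usually need each user to land in the same group on every request. One way to sketch that is to hash a stable user id into a bucket from 0 to 99 and compare it with the rollout percentage. The hash below is an FNV-1a-style function over UTF-16 code units, chosen only because it is short and dependency-free; feature-flag services such as those named above use their own bucketing schemes, and the function names here are hypothetical.

```typescript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Stable bucket in [0, 99] for a given user id.
function bucketOf(userId: string): number {
  return fnv1a32(userId) % 100;
}

// A user is in the canary when their bucket falls below the rollout percentage.
function isInCanary(userId: string, rolloutPercent: number): boolean {
  if (rolloutPercent <= 0) return false;
  if (rolloutPercent >= 100) return true;
  return bucketOf(userId) < rolloutPercent;
}
```

Because the bucket is fixed per user, ramping from 5% to 25% only adds users to the canary and never flips anyone back, and setting the percentage to 0 acts as the kill switch.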

10. Infrastructure as Code (IaC)

Define infra in version-controlled code, not clicks: Terraform (declarative, multi-cloud), Pulumi (real languages), AWS CDK, CloudFormation. Benefits: reproducible, reviewable, auditable environments; no “snowflake” servers. GitOps extends this — the repo is the source of truth for infra state.

11. Observability (you can’t fix what you can’t see)

Three pillars: logs (discrete events), metrics (numeric time series such as latency, error rate, and throughput; see the “RED” and “USE” methods), and traces (a request’s path across services; distributed tracing is essential for microservices/BFF debugging, 12/13). OpenTelemetry is the vendor-neutral standard for emitting all three. Add alerting on SLOs, plus error tracking (Sentry) and RUM (15) for the frontend.
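
The RED method tracks Rate, Errors, and Duration per service. As a rough illustration of what those three numbers are, the sketch below derives them from a window of request records. The record shape is hypothetical, server errors are assumed to mean status 500 and above, and the percentile uses the simple nearest-rank definition; a real metrics backend computes these from histograms rather than raw lists.

```typescript
interface RequestRecord { durationMs: number; status: number }
interface RedSummary { ratePerSecond: number; errorRatio: number; p95Ms: number }

// Nearest-rank percentile over an ascending-sorted array; returns 0 for an empty array.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return 0;
  const rank = Math.ceil((p * sortedAsc.length) / 100);
  return sortedAsc[Math.max(0, rank - 1)];
}

// Summarize one time window of requests into Rate, Errors, Duration.
function summarizeRed(records: RequestRecord[], windowSeconds: number): RedSummary {
  const durations = records.map((r) => r.durationMs).sort((a, b) => a - b);
  const errors = records.filter((r) => r.status >= 500).length;
  return {
    ratePerSecond: records.length / windowSeconds,
    errorRatio: records.length === 0 ? 0 : errors / records.length,
    p95Ms: percentile(durations, 95),
  };
}
```

Percentiles matter more than averages here: a p95 of 900 ms means one request in twenty is at least that slow, even if the mean looks healthy.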

12. Frontend deployment infra specifically

Static/SSG → object storage (S3) + CDN (CloudFront) or a Jamstack host (Netlify).
SSR/RSC → Node/edge runtime (Vercel, Cloudflare, a container on K8s) (07, 08).
MFEs → independently deployed remotes behind a CDN, discovered via a manifest (09).
Concerns: cache-busting via hashed filenames, atomic deploys, environment config injection, and not breaking long-lived sessions on deploy.
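
Cache-busting with hashed filenames works because hashed assets and the HTML that references them get opposite cache policies. A minimal sketch of that split is below, assuming a naming convention where a hex hash of eight or more characters precedes the extension (for example `app.3f9a1c2b.js`); the regex, the extension list, and the function name are assumptions to adapt to your bundler's output.

```typescript
// Assumed convention: "<name>.<hex hash>.<ext>" or "<name>-<hex hash>.<ext>".
const HASHED_ASSET = /[.-][0-9a-f]{8,}\.(js|css|woff2|png|svg)$/i;

// Pick a Cache-Control header value for a response path.
function cacheControlFor(path: string): string {
  if (HASHED_ASSET.test(path)) {
    // Content never changes under this URL, so cache for a year and skip revalidation.
    return "public, max-age=31536000, immutable";
  }
  // HTML and unhashed files: always revalidate so a new deploy is picked up.
  return "no-cache";
}
```

A new deploy then only changes the HTML entry point, which points at new hashed URLs; the old hashed files stay valid for sessions that still reference them, provided they are kept on the CDN or origin.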

Worked example: a scalable web system (system-design sketch)

                         ┌─────────── CDN / Edge (static + cache + edge SSR) ──────────┐
   Users ───DNS(anycast)─▶                                                              │
                         └──────────────────────────┬───────────────────────────────────┘
                                                    ▼
                                          Load Balancer (L7, health checks)
                                                    │  (stateless app tier → scale out)
                        ┌───────────────────────────┼───────────────────────────┐
                        ▼                            ▼                           ▼
                   App/SSR node                 App/SSR node                BFF / API gateway
                        │                            │                           │
                        └──────────── Redis (sessions, cache) ───────────────────┤
                                                    │                            ▼
                          Primary DB (writes) ──replication──▶ Read replicas   Services
                                   │                                             │
                                   └────────── events ──▶ Kafka ──▶ async workers (email, search index)
   Cross-cutting: CI/CD pipeline · IaC (Terraform) · Observability (OTel: logs/metrics/traces) · feature flags

Reading it: scale out behind a load balancer (stateless apps, sessions in Redis), cache at CDN/edge/Redis tiers, separate read replicas from the write primary, push slow work to queues, and keep the whole thing reproducible (IaC) and observable (OTel). This is the shape behind most system-design answers.

Pitfalls & gotchas

Stateful app servers blocking horizontal scaling — externalize session/state.
Reaching for microservices/NoSQL/K8s prematurely — huge operational cost; start simple (25).
Cache invalidation bugs — stale data, or stampedes on expiry; plan TTL + coalescing.
No idempotency on retried operations — duplicates (13).
Ignoring the N+1 query on the backend feeding your UI — slow APIs no frontend trick fixes.
Treating eventual consistency as immediate — UIs that break on stale reads (13).
No observability — flying blind; add tracing/metrics/error-tracking before you need them.
Deploys that 404 old chunks — keep prior hashed assets during/after deploy.
Snowflake infra (hand-clicked) — unreproducible; use IaC.

Interview questions

Vertical vs horizontal scaling — trade-offs and the statelessness requirement.
State the CAP theorem (and PACELC). What does choosing AP vs CP mean for a UI?
Where can you cache in a web stack, and what are the invalidation/stampede concerns?
SQL vs NoSQL — when each? What are replication and sharding?
What problem do message queues solve? At-least-once vs exactly-once?
What do Docker and Kubernetes each do?
Blue-green vs canary vs rolling deploys — and where feature flags fit.
What is Infrastructure as Code and why use it?
Name the three pillars of observability and what distributed tracing buys you.
Sketch a scalable system for a high-traffic web app.

Recommendations

Design app tiers to be stateless and scale out behind a load balancer; keep state in shared stores.
Cache at every tier with deliberate TTL/invalidation; protect against stampedes.
Default to relational storage; adopt NoSQL/sharding only for proven scale/access-pattern needs.
Use queues for async/spiky work; make consumers idempotent.
Containerize; reach for managed PaaS over raw K8s unless you need K8s.
Treat CI/CD + IaC + observability as part of the product: PR previews, bundle budgets (15), tracing (OTel), error tracking (Sentry).
Ship frontend with atomic, hash-busted deploys and feature flags to separate deploy from release.
Match complexity to need — start simple (25); add infrastructure when load/teams justify it.

Books & references

“Designing Data-Intensive Applications” — Martin Kleppmann (DDIA). The single best systems book: consistency, replication, partitioning, queues, streams. Essential. (Shared with 13.)
“System Design Interview” Vol 1 & 2 — Alex Xu. The standard interview-prep books; build the vocabulary above into reusable templates. (ByteByteGo is the companion site/newsletter.)
“Building Microservices” — Sam Newman; “Release It!” — Michael Nygard (stability/ops patterns) (12, 13).
“The DevOps Handbook” / “Accelerate” — Kim/Forsgren et al. CI/CD, delivery performance, and the metrics that matter.
“Site Reliability Engineering” — Google (free at sre.google). SLOs, observability, operating at scale.
Docker docs, Kubernetes docs, Terraform docs, OpenTelemetry docs — primary infra references.
AWS/GCP Well-Architected Framework — vendor-neutral-ish principles for reliability, performance, cost, security.

Connections

13-microservices-and-orchestration.md — the distributed-systems patterns (sagas, CQRS, EDA) that run on this infra.
25-architecture-decisions-and-tradeoffs.md — monolith vs microservices, when to add this complexity.
18-networking-and-protocols.md — CDNs, edge, HTTP caching, DNS, TLS at the transport layer.
12-bff-and-data-enrichment.md — the BFF as caching/aggregation tier and tracing node.
15-performance-and-core-web-vitals.md — RUM, edge/CDN, caching as performance levers; bundle budgets in CI.
08-nextjs-and-meta-frameworks.md — edge runtime, deployment targets, caching layers.
16-testing.md — where tests sit in the CI/CD pipeline.