13 · Microservices & Orchestration
The distributed-systems context a senior frontend engineer must understand to design BFFs, reason about consistency, and survive system-design interviews: orchestration vs choreography, sagas, gateways, service mesh, event-driven architecture, and CQRS. Written from the frontend’s perspective on the backend it talks to.
Positioning
You don’t need to build microservices to be a strong frontend engineer, but you must understand them: your BFF (12) sits on top of them, your data is eventually consistent because of them, your MFE boundaries (09) mirror them, and system-design interviews for senior roles assume fluency. This file is the distributed-systems literacy layer — enough to design good frontend integrations and to hold your own in architecture discussions, without pretending to be a backend specialist.
Foundations
Why microservices (and why often not)
Microservices decompose a system into independently deployable services owned by autonomous teams, organized around business capabilities / bounded contexts (10). Benefits: independent deployment, team autonomy, targeted scaling, fault isolation, technology heterogeneity. Costs: distributed-systems complexity, network failure modes, data consistency headaches, operational overhead, harder debugging. The honest default for most teams is a modular monolith first; extract services when team-scale or scaling pressure justifies it (same logic as MFEs). Conway’s Law governs both: your architecture will mirror your org chart, so design the org and the system together.
Communication styles
- Synchronous (request/response: REST, gRPC) — simple mental model, but creates temporal coupling and failure-cascade risk.
- Asynchronous (messaging/events: Kafka, RabbitMQ, SQS) — services emit/consume events; looser coupling, better resilience, but eventual consistency and harder reasoning.
Deep dive
1. Orchestration vs Choreography (the central distinction)
How do you coordinate a multi-service business process (e.g., place order → reserve inventory → charge payment → ship)?
- Orchestration — a central orchestrator explicitly directs each step (“command and control”). One service (or a workflow engine like Temporal/Camunda) tells others what to do and tracks progress.
- Pros: explicit, observable flow; centralized error handling; easy to see the whole process.
- Cons: the orchestrator can become a coupling hub / single point of complexity; it “knows too much.”
- Choreography — no central brain; each service reacts to events and emits its own. The process emerges from local rules (“event-driven”).
- Pros: maximal decoupling, autonomy, resilience; easy to add new reactors.
- Cons: no single place shows the whole flow; emergent behavior is hard to debug/observe; risk of cyclic event storms.
- In practice: hybrid. Choreography between bounded contexts, orchestration within one. Use orchestration when the process is complex and needs visibility/compensation; choreography when you want loose coupling and independent evolution. (Direct analogy to MFE shells: a shell orchestrates; an event-bus is choreography.)
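The contrast can be made concrete with a minimal in-memory sketch. Everything here is illustrative: the step names, the event names, and the `EventBus` class are assumptions made for this example, not a real messaging API. The orchestrated version states the sequence in one place; the choreographed version produces the same sequence, but only as a side effect of each handler's local rule.

```typescript
// Hypothetical step names for the "place order" process.
type Step = "reserveInventory" | "chargePayment" | "ship";

// Orchestration: one function owns the sequence and can see the whole flow.
function orchestrate(run: (step: Step) => void): void {
  const steps: Step[] = ["reserveInventory", "chargePayment", "ship"];
  for (const step of steps) run(step);
}

// Choreography: a tiny in-memory event bus (illustrative, not a real broker).
type Handler = (orderId: string) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();

  on(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit(event: string, orderId: string): void {
    for (const handler of this.handlers.get(event) ?? []) handler(orderId);
  }
}

const orchestratedLog: string[] = [];
orchestrate((step) => {
  orchestratedLog.push(step);
});

const choreographedLog: string[] = [];
const bus = new EventBus();
// Each "service" knows only the event it reacts to and the event it emits.
bus.on("order_placed", (id) => {
  choreographedLog.push("reserveInventory");
  bus.emit("inventory_reserved", id);
});
bus.on("inventory_reserved", (id) => {
  choreographedLog.push("chargePayment");
  bus.emit("payment_charged", id);
});
bus.on("payment_charged", () => {
  choreographedLog.push("ship");
});
bus.emit("order_placed", "order-1");
```

Both logs end up with the same three steps in the same order. The difference is where that order is written down: in `orchestrate` it is one array, while in the bus version you have to read every handler to reconstruct it, which is the observability cost listed above.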
2. Sagas — distributed transactions without 2PC
You can’t hold an ACID transaction across services. A saga is a sequence of local transactions where each step has a compensating action to undo it if a later step fails (semantic rollback, not real rollback).
- Orchestration-based saga — an orchestrator invokes steps and triggers compensations on failure.
- Choreography-based saga — services emit events; failure events trigger compensating handlers.
- Frontend impact: this is why you see eventual consistency and “pending” states. An order may be “placed” before payment confirms; your UI must model intermediate/optimistic states (05’s useOptimistic) and handle eventual failure (a later “payment failed” update).
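An orchestration-based saga can be sketched as a loop over steps, each pairing a local transaction with its compensating action. This is a minimal illustration under simplifying assumptions: `SagaStep` and `runSaga` are names invented here, compensations are assumed never to fail, and nothing is persisted (a workflow engine would durably record progress so the saga survives a crash).

```typescript
// A saga step pairs a local transaction with the action that semantically undoes it.
interface SagaStep {
  name: string;
  execute: () => Promise<void>;
  compensate: () => Promise<void>;
}

interface SagaResult {
  ok: boolean;
  failedAt?: string;
}

async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.execute();
      completed.push(step);
    } catch {
      // Undo the steps that already committed, most recent first.
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return { ok: false, failedAt: step.name };
    }
  }
  return { ok: true };
}
```

If a "reserve stock" step succeeds and a "charge payment" step throws, `runSaga` calls only the stock step's `compensate` and reports `failedAt: "charge payment"`. The failed step itself is not compensated because its local transaction never committed.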
3. API Gateway & BFF
- API Gateway — single entry point: routing, auth, rate-limiting, TLS termination, request aggregation. Decouples clients from the internal service topology.
- BFF (12) — per-experience gateway with UI-specific aggregation/shaping. Often sits behind or alongside the gateway.
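As a rough illustration of "UI-specific aggregation/shaping", here is a sketch of a BFF handler that fans out to three services and returns one screen-shaped object. The service names, their return shapes, and the decision to treat catalog as required and the other two as optional are all assumptions made for this example.

```typescript
// Screen-shaped response: exactly what one product page needs, nothing more.
interface ProductScreen {
  name: string;
  price: number | null; // null means pricing was unavailable
  inStock: boolean | null; // null means inventory was unavailable
}

// Hypothetical downstream clients, injected so the sketch stays self-contained.
interface ProductServices {
  catalog: (id: string) => Promise<{ name: string }>;
  pricing: (id: string) => Promise<{ price: number }>;
  inventory: (id: string) => Promise<{ inStock: boolean }>;
}

async function getProductScreen(id: string, services: ProductServices): Promise<ProductScreen> {
  // Calls run in parallel. A catalog failure rejects the whole request;
  // pricing and inventory failures degrade to null instead.
  const [catalog, pricing, inventory] = await Promise.all([
    services.catalog(id),
    services.pricing(id).catch(() => null),
    services.inventory(id).catch(() => null),
  ]);
  return {
    name: catalog.name,
    price: pricing?.price ?? null,
    inStock: inventory?.inStock ?? null,
  };
}
```

The client makes one request and never learns that three services exist, which is the decoupling from internal topology described above. Which fields may degrade is a product decision, not a technical one.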
4. Service Mesh
Infrastructure layer (e.g., Istio, Linkerd) that handles service-to-service concerns — mTLS, retries, timeouts, circuit breaking, load balancing, and observability — via sidecar proxies, outside application code. It moves resilience patterns (12) into the platform. Frontend relevance: low directly, but it’s why backend teams talk about “the mesh handling retries” and why your BFF may not need to implement all resilience itself.
5. Event-Driven Architecture (EDA)
Services communicate by producing/consuming events on a broker (Kafka, Pulsar, NATS).
- Event notification (thin event, fetch details) vs event-carried state transfer (fat event carries the data) vs event sourcing (the event log is the source of truth; state is a fold over events).
- Pub/sub, topics, partitions, consumer groups, ordering guarantees, at-least-once vs exactly-once delivery, idempotent consumers.
- Frontend touchpoints: real-time UIs consume these via WebSockets/SSE (04, 18) pushed from a gateway; “live” dashboards and notifications are EDA surfacing to the client.
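At-least-once delivery means a consumer can see the same event twice, so handlers need to be idempotent. One common approach, sketched below, is to remember processed event ids and skip repeats. The event shape and class name are hypothetical, and the in-memory `Set` is a simplification: a real consumer would need durable storage and would record the id atomically with the side effect.

```typescript
// Hypothetical event shape; `id` is assumed to be unique per logical event.
interface DomainEvent {
  id: string;
  type: string;
  amount: number;
}

class IdempotentConsumer {
  private seen = new Set<string>();
  balance = 0;

  // Returns true if the event was applied, false if it was a duplicate delivery.
  handle(event: DomainEvent): boolean {
    if (this.seen.has(event.id)) return false;
    this.seen.add(event.id);
    this.balance += event.amount;
    return true;
  }
}
```

The same idea applies in the browser: a client receiving pushed events over SSE or WebSockets may see replays after a reconnect, and deduplicating by event id keeps a live counter or notification list from double-counting.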
6. CQRS & Event Sourcing
- CQRS (Command Query Responsibility Segregation) — separate the write model (commands, normalized, validated) from the read model (queries, denormalized, optimized for views). The read side can be a materialized view tailored per screen.
- Event Sourcing — persist state as an append-only event log; rebuild current state by replaying. Gives audit/time-travel; costs complexity, eventual consistency, and schema-evolution pain.
- Frontend impact: read models are often exactly what a BFF/GraphQL serves; “the read model is eventually consistent” explains why a UI sometimes shows stale data right after a write — design for it (optimistic UI + reconciliation, 05/06).
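"State is a fold over events" maps directly onto `Array.prototype.reduce`, which will look familiar to anyone who has written a reducer. The sketch below uses a made-up cart domain; the event names and the `CartView` shape are assumptions for illustration. The `CartView` plays the role of a read model: a projection shaped for one screen, rebuilt by replaying the log.

```typescript
// Hypothetical append-only event types for a shopping cart.
type CartEvent =
  | { type: "ItemAdded"; sku: string; qty: number }
  | { type: "ItemRemoved"; sku: string }
  | { type: "CheckedOut" };

// Read model: a denormalized view shaped for one screen.
interface CartView {
  items: Record<string, number>;
  checkedOut: boolean;
}

const emptyCart: CartView = { items: {}, checkedOut: false };

// Pure transition: previous view + one event -> next view.
function apply(state: CartView, event: CartEvent): CartView {
  switch (event.type) {
    case "ItemAdded": {
      const current = state.items[event.sku] ?? 0;
      return { ...state, items: { ...state.items, [event.sku]: current + event.qty } };
    }
    case "ItemRemoved": {
      const items = { ...state.items };
      delete items[event.sku];
      return { ...state, items };
    }
    case "CheckedOut":
      return { ...state, checkedOut: true };
  }
}

// Current state is a left fold over the event log.
const replay = (events: CartEvent[]): CartView => events.reduce(apply, emptyCart);
```

Replaying a prefix of the log gives the state at that point in time, which is where the audit and time-travel benefits come from. The staleness comes from the projection being updated some time after the event is appended.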
7. Consistency, CAP, and what the UI must assume
- CAP / PACELC: under a network partition you trade consistency against availability; even without partitions you trade latency against consistency. Most microservice backends accept eventual consistency across service boundaries.
- The senior frontend takeaway: never assume read-your-writes. After a mutation, the next read may not reflect it. Use optimistic updates, cache invalidation with refetch (TanStack Query, 06), and idempotency keys for retried mutations.
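The idempotency-key idea is easiest to see when the response is lost after the server has already done the work. The sketch below simulates that with an in-memory stand-in for a payment endpoint; `FakePaymentServer`, `chargeWithRetry`, and the receipt format are invented for illustration. In a real client the key is typically generated once per user intent and sent with each attempt, often in a request header, though the exact mechanism depends on the API.

```typescript
// Hypothetical in-memory stand-in for an endpoint that honors idempotency keys.
class FakePaymentServer {
  private results = new Map<string, string>();
  chargeCount = 0;

  charge(idempotencyKey: string, amountCents: number): string {
    const previous = this.results.get(idempotencyKey);
    // Replay: return the stored result and do not charge again.
    if (previous !== undefined) return previous;
    this.chargeCount += 1;
    const receipt = `receipt-${this.chargeCount}-${amountCents}`;
    this.results.set(idempotencyKey, receipt);
    return receipt;
  }
}

// Client side: the same key is reused on every attempt for one user intent.
function chargeWithRetry(send: (key: string) => string, key: string, maxAttempts = 3): string {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return send(key);
    } catch (error) {
      lastError = error; // e.g. a timeout: the server may or may not have processed it
    }
  }
  throw lastError;
}
```

Without the key, the retry after a lost response would create a second charge. With it, the second attempt returns the first attempt's receipt, so the retry is safe regardless of whether the original request reached the server.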
Worked example: an order saga and what the UI sees
User clicks "Place order"
  │  UI: optimistic "Order placed (processing…)" (useOptimistic)
  ▼
[Order svc] create order (PENDING) ──event──▶ [Inventory svc] reserve stock
                                                │ success ──event──▶ [Payment svc] charge
                                                │                      │ FAIL
                                                │                      ▼
                                                └──compensate: release ◀── emit "payment_failed"
  ▼
UI receives "payment_failed" (via SSE/poll) → revert optimistic state → show actionable error
The UI must model: optimistic immediate feedback, a real “processing” state, and graceful handling of an asynchronous failure that arrives after the request “succeeded.” This is the everyday consequence of sagas + eventual consistency for frontend engineers.
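One way to model those three requirements on the client is an explicit state machine, sketched here as a reducer. The state and event names are assumptions for this example (only "payment_failed" comes from the flow above), and the client-supplied `orderId` stands in for whatever correlation id the real system provides.

```typescript
type OrderUiState =
  | { status: "idle" }
  | { status: "processing"; orderId: string }
  | { status: "confirmed"; orderId: string }
  | { status: "failed"; orderId: string; reason: string };

type OrderUiEvent =
  | { type: "place_order_clicked"; orderId: string }
  | { type: "order_confirmed"; orderId: string }
  | { type: "payment_failed"; orderId: string; reason: string };

function orderReducer(state: OrderUiState, event: OrderUiEvent): OrderUiState {
  switch (event.type) {
    case "place_order_clicked":
      // Immediate feedback: show "processing" before any service has confirmed.
      return { status: "processing", orderId: event.orderId };
    case "order_confirmed":
    case "payment_failed":
      // Ignore late, duplicate, or unrelated events (e.g. replayed over SSE).
      if (state.status !== "processing" || state.orderId !== event.orderId) return state;
      return event.type === "order_confirmed"
        ? { status: "confirmed", orderId: event.orderId }
        : { status: "failed", orderId: event.orderId, reason: event.reason };
  }
}
```

Because "failed" is a first-class state carrying a reason, the asynchronous failure has somewhere to land and can drive an actionable error message. A reducer of this shape can back React's `useReducer` or a store; `useOptimistic` covers the immediate-feedback part but not the later failure event.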
Pitfalls & gotchas (frontend-facing)
- Assuming synchronous, immediately-consistent backends — designing UIs that break under eventual consistency.
- No optimistic/intermediate states — users stare at spinners during multi-service flows.
- Non-idempotent mutations + retries — double-charges/double-posts; use idempotency keys.
- Treating a distributed monolith as microservices — services that must deploy together give you all the cost and none of the benefit.
- Chatty choreography with no observability — nobody can explain what happened across services (push for distributed tracing).
- Over-decomposing too early — premature microservices/MFEs; start modular.
- Ignoring partial failure — assuming “the API call worked” means the whole business process completed.
Interview questions
- Orchestration vs choreography — define, trade off, when each?
- What is a saga? Compensating transaction? Orchestrated vs choreographed saga?
- Why can’t you use a normal ACID transaction across services?
- API gateway vs BFF vs service mesh — what does each handle?
- What is CQRS, and how might a read model map to a BFF/UI?
- Event sourcing — benefits and costs? How does it surface as stale UI data?
- Explain eventual consistency and how a UI should handle “read-your-writes” failing.
- At-least-once delivery → what must consumers be? (Idempotent.)
- How does Conway’s Law connect microservices and micro-frontends?
- When would you not use microservices?
Recommendations
- Treat the backend as eventually consistent by default; design UIs with optimistic + reconciled states (05, 06).
- Make mutations idempotent (idempotency keys) so retries are safe.
- Expect orchestration within a context, choreography between contexts; mirror this in MFE shells/event buses (09).
- Lean on read models / BFFs to get screen-shaped data; don’t make the client join across services.
- Push for distributed tracing so cross-service failures are debuggable (your BFF should propagate trace headers).
- Know enough CAP/saga/CQRS vocabulary to design integrations and pass senior system-design interviews — you don’t need to run Kafka, you need to reason about it.
Books & references
- “Building Microservices” (2nd ed.) — Sam Newman. The best single overview; communication, decomposition, BFF, resilience.
- “Microservices Patterns” — Chris Richardson (Manning). Sagas, API composition, CQRS, event sourcing — pattern by pattern. The reference.
- “Designing Data-Intensive Applications” — Martin Kleppmann (DDIA). The deep foundation on consistency, replication, partitioning, stream processing. Essential and timeless.
- “Release It!” — Michael Nygard. Stability patterns (circuit breaker, bulkhead, timeout) the BFF inherits.
- “Enterprise Integration Patterns” — Hohpe & Woolf. Messaging patterns underlying EDA.
- microservices.io — Chris Richardson’s pattern catalog (free, canonical).
- martinfowler.com — articles on microservices, CQRS, event sourcing, and the “Monolith First” caution.
- Temporal / Camunda docs — modern orchestration/workflow engines.
Connections
- 12-bff-and-data-enrichment.md — the BFF is the frontend’s adapter onto these services; resilience patterns originate here.
- 09-micro-frontends.md — MFEs are the UI mirror of microservices; orchestration/choreography reappears as shell/event-bus.
- 10-frontend-architecture.md — DDD bounded contexts and Conway’s Law drive decomposition on both tiers.
- 05-react-internals-and-patterns.md — useOptimistic/transitions are how the UI absorbs eventual consistency.
- 06-state-management-and-stores.md — TanStack Query cache invalidation models the read-side staleness.
- 18-networking-and-protocols.md — WebSockets/SSE are how event-driven backends reach the browser.