13 · Microservices & Orchestration

The distributed-systems context a senior frontend engineer must understand to design BFFs, reason about consistency, and survive system-design interviews: orchestration vs choreography, sagas, gateways, service mesh, event-driven architecture, and CQRS. Written from the frontend’s perspective on the backend it talks to.

Positioning

You don’t need to build microservices to be a strong frontend engineer, but you must understand them: your BFF (12) sits on top of them, your data is eventually consistent because of them, your MFE boundaries (09) mirror them, and system-design interviews for senior roles assume fluency. This file is the distributed-systems literacy layer — enough to design good frontend integrations and to hold your own in architecture discussions, without pretending to be a backend specialist.

Foundations

Why microservices (and why often not)

Microservices decompose a system into independently deployable services owned by autonomous teams, organized around business capabilities / bounded contexts (10). Benefits: independent deployment, team autonomy, targeted scaling, fault isolation, technology heterogeneity. Costs: distributed-systems complexity, network failure modes, data consistency headaches, operational overhead, harder debugging. The honest default for most teams is a modular monolith first; extract services when team-scale or scaling pressure justifies it (same logic as MFEs). Conway’s Law governs both: a system’s architecture tends to mirror the communication structure of the organization that builds it, so design the org and the system together.

Communication styles

Synchronous (request/response: REST, gRPC) — simple mental model, but creates temporal coupling and failure-cascade risk.
Asynchronous (messaging/events: Kafka, RabbitMQ, SQS) — services emit/consume events; looser coupling, better resilience, but eventual consistency and harder reasoning.

Deep dive

1. Orchestration vs Choreography (the central distinction)

How do you coordinate a multi-service business process (e.g., place order → reserve inventory → charge payment → ship)?

Orchestration — a central orchestrator explicitly directs each step (“command and control”). One service (or a workflow engine like Temporal/Camunda) tells others what to do and tracks progress.
- Pros: explicit, observable flow; centralized error handling; easy to see the whole process.
- Cons: the orchestrator can become a coupling hub / single point of complexity; it “knows too much.”
Choreography — no central brain; each service reacts to events and emits its own. The process emerges from local rules (“event-driven”).
- Pros: maximal decoupling, autonomy, resilience; easy to add new reactors.
- Cons: no single place shows the whole flow; emergent behavior is hard to debug/observe; risk of cyclic event storms.
In practice: hybrid. Choreography between bounded contexts, orchestration within one. Use orchestration when the process is complex and needs visibility/compensation; choreography when you want loose coupling and independent evolution. (Direct analogy to MFE shells: a shell orchestrates; an event-bus is choreography.)
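
The two styles can be contrasted in a few lines. This is a minimal in-memory sketch, not a real broker or workflow engine: EventBus, orchestrateOrder, wireChoreography, and the event names are all hypothetical, and the "services" only append to a log.

```typescript
// Hypothetical event names; an in-memory stand-in for a broker.
type OrderEvent =
  | { type: "order_placed"; orderId: string }
  | { type: "stock_reserved"; orderId: string }
  | { type: "payment_charged"; orderId: string };

type OrderEventHandler = (event: OrderEvent) => void;

class EventBus {
  private handlers = new Map<OrderEvent["type"], OrderEventHandler[]>();

  on(type: OrderEvent["type"], handler: OrderEventHandler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  emit(event: OrderEvent): void {
    for (const handler of this.handlers.get(event.type) ?? []) handler(event);
  }
}

// Orchestration: one function knows and directs every step, in order.
function orchestrateOrder(orderId: string, log: string[]): void {
  log.push(`reserve:${orderId}`);
  log.push(`charge:${orderId}`);
  log.push(`ship:${orderId}`);
}

// Choreography: each service knows only the event it reacts to and the one it emits.
function wireChoreography(bus: EventBus, log: string[]): void {
  bus.on("order_placed", (e) => {
    log.push(`reserve:${e.orderId}`);
    bus.emit({ type: "stock_reserved", orderId: e.orderId });
  });
  bus.on("stock_reserved", (e) => {
    log.push(`charge:${e.orderId}`);
    bus.emit({ type: "payment_charged", orderId: e.orderId });
  });
  bus.on("payment_charged", (e) => {
    log.push(`ship:${e.orderId}`);
  });
}
```

Both produce the same sequence of steps. The difference is where the knowledge of that sequence lives: in one function you can read top to bottom, or spread across three handlers that a newcomer has to trace event by event.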

2. Sagas — distributed transactions without 2PC

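You can’t practically hold one ACID transaction across services: each service owns its own database, and a distributed commit protocol such as two-phase commit (2PC) blocks when the coordinator fails and couples the availability of every participant. A saga is instead a sequence of local transactions where each step has a compensating action to undo it if a later step fails (semantic rollback, not real rollback).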

Orchestration-based saga — an orchestrator invokes steps and triggers compensations on failure.
Choreography-based saga — services emit events; failure events trigger compensating handlers.
Frontend impact: this is why you see eventual consistency and “pending” states. An order may be “placed” before payment confirms; your UI must model intermediate/optimistic states (05’s useOptimistic) and handle eventual failure (a later “payment failed” update).
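
An orchestration-based saga can be sketched as a loop that remembers which steps completed and compensates them in reverse order on failure. The SagaStep shape and runSaga are hypothetical names for illustration; a real engine (Temporal, Camunda) would also persist progress and retry failed compensations, which this sketch omits.

```typescript
// Hypothetical step shape; in a real system action/compensate are remote calls.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

type SagaResult =
  | { status: "completed" }
  | { status: "compensated"; failedStep: string; undone: string[] };

async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      done.push(step);
    } catch {
      // Semantic rollback: undo completed steps, most recent first.
      const undone: string[] = [];
      for (const prev of done.reverse()) {
        await prev.compensate();
        undone.push(prev.name);
      }
      return { status: "compensated", failedStep: step.name, undone };
    }
  }
  return { status: "completed" };
}
```

The "compensated" result is what eventually reaches the UI as a late failure: by the time it is produced, the first steps have already been committed and then undone.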

3. API Gateway & BFF

API Gateway — single entry point: routing, auth, rate-limiting, TLS termination, request aggregation. Decouples clients from the internal service topology.
BFF (12) — per-experience gateway with UI-specific aggregation/shaping. Often sits behind or alongside the gateway.
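
The UI-specific aggregation a BFF performs might look like the following sketch. The Downstream interface, its three methods, and the ProductPageView shape are assumptions made for illustration; in a real BFF the clients would be HTTP or gRPC calls to the underlying services.

```typescript
// Hypothetical downstream clients, injected so the sketch stays self-contained.
interface Product { id: string; name: string }
interface Price { amount: number; currency: string }

interface Downstream {
  getProduct(id: string): Promise<Product>;
  getPrice(id: string): Promise<Price>;
  getReviewCount(id: string): Promise<number>;
}

// Screen-shaped view model: exactly what the product page renders.
interface ProductPageView {
  id: string;
  title: string;
  priceLabel: string;
  reviewCount: number | null; // null means "reviews unavailable", not zero
}

async function getProductPage(services: Downstream, id: string): Promise<ProductPageView> {
  // Fan out in parallel; the optional call degrades to null instead of failing the page.
  const [product, price, reviewCount] = await Promise.all([
    services.getProduct(id),
    services.getPrice(id),
    services.getReviewCount(id).catch(() => null),
  ]);
  return {
    id: product.id,
    title: product.name,
    priceLabel: `${price.amount.toFixed(2)} ${price.currency}`,
    reviewCount,
  };
}
```

The design choice worth noting is the split between required and optional data: product and price failures fail the whole request, while a reviews outage only removes one element from the screen.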

4. Service Mesh

Infrastructure layer (e.g., Istio, Linkerd) that handles service-to-service concerns — mTLS, retries, timeouts, circuit breaking, load balancing, and observability — via sidecar proxies, outside application code. It moves resilience patterns (12) into the platform. Frontend relevance: low directly, but it’s why backend teams talk about “the mesh handling retries” and why your BFF may not need to implement all resilience itself.

5. Event-Driven Architecture (EDA)

Services communicate by producing/consuming events on a broker (Kafka, Pulsar, NATS).

Event notification (thin event, fetch details) vs event-carried state transfer (fat event carries the data) vs event sourcing (the event log is the source of truth; state is a fold over events).
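Core vocabulary: pub/sub, topics, partitions, consumer groups, ordering guarantees (typically per partition or per key), at-most-once vs at-least-once vs exactly-once delivery, and idempotent consumers (the practical answer to at-least-once redelivery).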
Frontend touchpoints: real-time UIs consume these via WebSockets/SSE (04, 18) pushed from a gateway; “live” dashboards and notifications are EDA surfacing to the client.
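
Why at-least-once delivery demands idempotent consumers is easiest to see in code. A minimal sketch, assuming every event carries a unique id; DomainEvent and IdempotentConsumer are hypothetical names, and the in-memory Set stands in for a durable store that would have to be updated atomically with the side effect.

```typescript
interface DomainEvent {
  id: string; // assumed unique per event
  type: string;
  amount: number;
}

// Remembers processed event ids so a redelivered event is a no-op.
class IdempotentConsumer {
  private seen = new Set<string>();
  balance = 0;

  // Returns true if the event was applied, false if it was a duplicate.
  handle(event: DomainEvent): boolean {
    if (this.seen.has(event.id)) return false;
    this.seen.add(event.id);
    if (event.type === "payment_received") this.balance += event.amount;
    return true;
  }
}
```

The same idea applies to a browser client consuming events over WebSockets/SSE: after a reconnect the server may replay recent events, so the client should dedupe by id before updating UI state.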

6. CQRS & Event Sourcing

CQRS (Command Query Responsibility Segregation) — separate the write model (commands, normalized, validated) from the read model (queries, denormalized, optimized for views). The read side can be a materialized view tailored per screen.
Event Sourcing — persist state as an append-only event log; rebuild current state by replaying. Gives audit/time-travel; costs complexity, eventual consistency, and schema-evolution pain.
Frontend impact: read models are often exactly what a BFF/GraphQL serves; “the read model is eventually consistent” explains why a UI sometimes shows stale data right after a write — design for it (optimistic UI + reconciliation, 05/06).
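
"State is a fold over events" and "a read model tailored per screen" are the same move: reduce the event log into the shape a view needs. A minimal sketch with a hypothetical cart domain; CartEvent, CartView, applyCartEvent, and projectCart are illustrative names, not from any library.

```typescript
type CartEvent =
  | { type: "item_added"; sku: string; qty: number }
  | { type: "item_removed"; sku: string }
  | { type: "cart_checked_out" };

// Read model shaped for a cart badge/summary screen.
interface CartView {
  items: Record<string, number>;
  totalQty: number;
  checkedOut: boolean;
}

const emptyCart: CartView = { items: {}, totalQty: 0, checkedOut: false };

// Pure transition: never mutates the previous view.
function applyCartEvent(view: CartView, event: CartEvent): CartView {
  switch (event.type) {
    case "item_added": {
      const items = { ...view.items, [event.sku]: (view.items[event.sku] ?? 0) + event.qty };
      return { ...view, items, totalQty: view.totalQty + event.qty };
    }
    case "item_removed": {
      const items = { ...view.items };
      const removed = items[event.sku] ?? 0;
      delete items[event.sku];
      return { ...view, items, totalQty: view.totalQty - removed };
    }
    case "cart_checked_out":
      return { ...view, checkedOut: true };
  }
}

// Rebuild current state by replaying the log from the beginning.
function projectCart(events: CartEvent[]): CartView {
  return events.reduce(applyCartEvent, emptyCart);
}
```

In a CQRS system this projection typically runs asynchronously on the read side, which is exactly where the staleness comes from: the command has been accepted, but the view has not yet folded in the newest event.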

7. Consistency, CAP, and what the UI must assume

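CAP / PACELC: under partition you trade consistency vs availability; even without partitions you trade latency vs consistency. Most microservice systems therefore accept eventual consistency across service boundaries, even when each individual service is strongly consistent internally.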
The senior frontend takeaway: never assume read-your-writes. After a mutation, the next read may not reflect it. Use optimistic updates, cache invalidation with refetch (TanStack Query, 06), and idempotency keys for retried mutations.
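
Idempotency keys for retried mutations can be sketched from both sides. Everything here is hypothetical: SendMutation is an injected transport, mutateWithRetry is a bare retry loop without backoff, and PaymentServer is an in-memory stand-in for server-side deduplication.

```typescript
// Hypothetical transport: sends a mutation tagged with an idempotency key.
type SendMutation<T> = (idempotencyKey: string, body: T) => Promise<{ id: string }>;

async function mutateWithRetry<T>(
  send: SendMutation<T>,
  body: T,
  idempotencyKey: string,
  maxAttempts = 3,
): Promise<{ id: string }> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The same key goes out on every attempt so the server can dedupe.
      return await send(idempotencyKey, body);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Server-side counterpart: the first result stored for a key wins.
class PaymentServer {
  private results = new Map<string, { id: string }>();
  charges = 0;

  charge(key: string): { id: string } {
    const existing = this.results.get(key);
    if (existing) return existing; // retry of an already-applied charge
    this.charges += 1;
    const result = { id: `charge-${this.charges}` };
    this.results.set(key, result);
    return result;
  }
}
```

The key should be generated once per user intent (for example when the form is submitted, with crypto.randomUUID() where available) and reused across retries; generating a fresh key per attempt defeats the purpose.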

Worked example: an order saga and what the UI sees

User clicks "Place order"
  │  UI: optimistic "Order placed (processing…)"  (useOptimistic)
  ▼
[Order svc] create order (PENDING) ──event──▶ [Inventory svc] reserve stock
                                                     │ success ──event──▶ [Payment svc] charge
                                                     │                          │ FAIL
                                                     │                          ▼
                                                     └──compensate: release ◀── emit "payment_failed"
  ▼
UI receives "payment_failed" (via SSE/poll) → revert optimistic state → show actionable error

The UI must model: optimistic immediate feedback, a real “processing” state, and graceful handling of an asynchronous failure that arrives after the request “succeeded.” This is the everyday consequence of sagas + eventual consistency for frontend engineers.
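
One way to model those states on the client is a small reducer over a discriminated union. This is a sketch under assumptions: the "order_confirmed" event and the reason field are hypothetical (the flow above only names "payment_failed"), and how events arrive (SSE or polling) is left out.

```typescript
type OrderUiState =
  | { phase: "idle" }
  | { phase: "processing"; orderId: string }
  | { phase: "confirmed"; orderId: string }
  | { phase: "failed"; orderId: string; reason: string };

type OrderUiEvent =
  | { type: "submit"; orderId: string }
  | { type: "order_confirmed"; orderId: string } // hypothetical success event
  | { type: "payment_failed"; orderId: string; reason: string };

function orderReducer(state: OrderUiState, event: OrderUiEvent): OrderUiState {
  switch (event.type) {
    case "submit":
      // Optimistic feedback: show "processing" before any service confirms.
      return { phase: "processing", orderId: event.orderId };
    case "order_confirmed":
    case "payment_failed":
      // Ignore late or foreign events that do not match the in-flight order.
      if (state.phase !== "processing" || state.orderId !== event.orderId) return state;
      return event.type === "order_confirmed"
        ? { phase: "confirmed", orderId: event.orderId }
        : { phase: "failed", orderId: event.orderId, reason: event.reason };
  }
}
```

The same function can be passed to React's useReducer or driven directly from an SSE handler; keeping "processing" as a first-class phase is what lets the UI show something better than a spinner while the saga runs.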

Pitfalls & gotchas (frontend-facing)

Assuming synchronous, immediately-consistent backends — designing UIs that break under eventual consistency.
No optimistic/intermediate states — users stare at spinners during multi-service flows.
Non-idempotent mutations + retries — double-charges/double-posts; use idempotency keys.
Treating a distributed monolith as microservices — services that must deploy together give you all the cost and none of the benefit.
Chatty choreography with no observability — nobody can explain what happened across services (push for distributed tracing).
Over-decomposing too early — premature microservices/MFEs; start modular.
Ignoring partial failure — assuming “the API call worked” means the whole business process completed.

Interview questions

Orchestration vs choreography — define, trade off, when each?
What is a saga? Compensating transaction? Orchestrated vs choreographed saga?
Why can’t you use a normal ACID transaction across services?
API gateway vs BFF vs service mesh — what does each handle?
What is CQRS, and how might a read model map to a BFF/UI?
Event sourcing — benefits and costs? How does it surface as stale UI data?
Explain eventual consistency and how a UI should handle “read-your-writes” failing.
At-least-once delivery → what must consumers be? (Idempotent.)
How does Conway’s Law connect microservices and micro-frontends?
When would you not use microservices?

Recommendations

Treat the backend as eventually consistent by default; design UIs with optimistic + reconciled states (05, 06).
Make mutations idempotent (idempotency keys) so retries are safe.
Expect orchestration within a context, choreography between contexts; mirror this in MFE shells/event buses (09).
Lean on read models / BFFs to get screen-shaped data; don’t make the client join across services.
Push for distributed tracing so cross-service failures are debuggable (your BFF should propagate trace headers).
Know enough CAP/saga/CQRS vocabulary to design integrations and pass senior system-design interviews — you don’t need to run Kafka, you need to reason about it.

Books & references

“Building Microservices” (2nd ed.) — Sam Newman. The best single overview; communication, decomposition, BFF, resilience.
“Microservices Patterns” — Chris Richardson (Manning). Sagas, API composition, CQRS, event sourcing — pattern by pattern. The reference.
“Designing Data-Intensive Applications” — Martin Kleppmann (DDIA). The deep foundation on consistency, replication, partitioning, stream processing. Essential and timeless.
“Release It!” — Michael Nygard. Stability patterns (circuit breaker, bulkhead, timeout) the BFF inherits.
“Enterprise Integration Patterns” — Hohpe & Woolf. Messaging patterns underlying EDA.
microservices.io — Chris Richardson’s pattern catalog (free, canonical).
martinfowler.com — articles on microservices, CQRS, event sourcing, and the “Monolith First” caution.
Temporal / Camunda docs — modern orchestration/workflow engines.

Connections

12-bff-and-data-enrichment.md — the BFF is the frontend’s adapter onto these services; resilience patterns originate here.
09-micro-frontends.md — MFEs are the UI mirror of microservices; orchestration/choreography reappears as shell/event-bus.
10-frontend-architecture.md — DDD bounded contexts and Conway’s Law drive decomposition on both tiers.
05-react-internals-and-patterns.md — useOptimistic/transitions are how the UI absorbs eventual consistency.
06-state-management-and-stores.md — TanStack Query cache invalidation models the read-side staleness.
18-networking-and-protocols.md — WebSockets/SSE are how event-driven backends reach the browser.