How to Read This Post
Each scenario shows a diagram first, then a short note on why the pattern matters. Complexity increases as you scroll.
| Pattern | Approach | Consistency | Best For |
|---|---|---|---|
| 2PC | Coordinator-driven | Strong (atomic) | Short-lived, cross-DB |
| 3PC | Non-blocking 2PC | Strong (reduces blocking) | Theoretical improvement |
| Saga Choreography | Event-driven chain | Eventual | Loosely coupled services |
| Saga Orchestration | Central coordinator | Eventual | Complex workflows |
| Outbox Pattern | Local TX + relay | At-least-once | Event publishing guarantee |
Level 1 — Foundations
1. The Distributed Transaction Problem
(Diagram: a client calls the Order Service "create order ✓", the Payment Service "charge $50 ✓", and the Inventory Service "reserve item ✗ FAILED". Result: order created, payment charged, but no inventory. Refund payment? Cancel order? Retry inventory? All of the above, in what order?)
The core problem. In a monolith, a single database transaction guarantees all-or-nothing. In microservices, each service has its own database — there’s no single transaction boundary. If one step fails after others succeed, you have an inconsistent state.
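The failure mode is easy to reproduce. This minimal sketch (hypothetical in-memory "databases", one per service) makes three sequential calls with no shared transaction boundary; when the third raises, the first two effects persist:

```python
class InventoryError(Exception):
    pass

# Hypothetical per-service data stores -- no shared transaction exists.
orders, payments, inventory = [], [], {"sku-1": 0}

def create_order(order_id):
    orders.append(order_id)             # committed in the Order DB

def charge_payment(order_id, cents):
    payments.append((order_id, cents))  # committed in the Payment DB

def reserve_item(sku):
    if inventory.get(sku, 0) <= 0:
        raise InventoryError(f"no stock for {sku}")
    inventory[sku] -= 1

try:
    create_order("o-1")
    charge_payment("o-1", 5000)
    reserve_item("sku-1")               # fails: stock is 0
except InventoryError:
    pass  # no automatic rollback: order and payment survive

print(orders, payments)  # ['o-1'] [('o-1', 5000)] -- partial state left behind
```

Everything that follows in this post is a strategy for cleaning up (or preventing) exactly this leftover state.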
2. ACID vs BASE
(Diagram: ACID (single DB): Atomicity "all or nothing", Consistency "valid state transitions", Isolation "concurrent TXs don't interfere", Durability "committed = permanent"; maps to a monolith on PostgreSQL or MySQL. BASE (distributed): Basically Available "system always responds", Soft state "state may change over time", Eventually consistent "converges to consistency"; maps to microservices using Saga or Event Sourcing.)
ACID vs BASE. ACID gives you strong guarantees within a single database. BASE trades immediate consistency for availability and partition tolerance. Most distributed transaction patterns implement BASE semantics — eventual consistency through compensating actions.
3. Local vs Distributed Transactions
(Sequence diagram: a local transaction against one database rolls back on failure. A distributed transaction spanning the Order, Inventory, and Payment DBs has no such safety net: the order INSERT and inventory UPDATE succeed, the payment INSERT fails, and there is no automatic rollback across databases.)
Local vs distributed. A local transaction relies on the database engine’s ACID guarantees — BEGIN, do work, COMMIT or ROLLBACK. Across multiple databases, there’s no shared transaction log. You need an external coordination protocol (2PC) or a compensation strategy (Saga).
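The local case can be shown concretely. A minimal sketch using Python's built-in `sqlite3` (any ACID engine behaves the same way): two writes in one transaction either both commit or both roll back.

```python
import sqlite3

# A single local transaction: the engine rolls everything back on failure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE payments (order_id TEXT, cents INTEGER)")

try:
    with conn:  # BEGIN ... COMMIT, or ROLLBACK if the block raises
        conn.execute("INSERT INTO orders VALUES ('o-1')")
        conn.execute("INSERT INTO payments VALUES ('o-1', 5000)")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# Both writes were rolled back together -- all or nothing.
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0
```

Split those two tables across two services and the `with conn:` block has nothing to wrap, which is the entire problem.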
Level 2 — Two-Phase Commit
4. 2PC Happy Path
(Sequence diagram, happy path: the coordinator sends PREPARE to all three participants, the Order, Inventory, and Payment DBs. Each writes to its WAL, acquires locks, and replies VOTE YES. The coordinator then sends COMMIT to all; each ACKs and releases its locks, and the client is told the order was placed successfully.)
2PC happy path. The coordinator asks all participants to prepare (acquire locks, write to WAL). If everyone votes YES, the coordinator sends COMMIT. If anyone votes NO, everyone rolls back. This guarantees atomicity across multiple databases.
5. 2PC Abort Scenario
(Sequence diagram, abort: the coordinator sends PREPARE to all three. Order and Payment vote YES, Inventory votes NO (insufficient stock). The coordinator decides ABORT and sends ROLLBACK to all; every participant rolls back and releases its locks, leaving no partial state.)
2PC abort. One NO vote aborts the entire transaction. All participants release locks and roll back. This is the safe path — better to abort than to leave the system in an inconsistent state.
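Both outcomes, unanimous-YES commit and any-NO abort, fit in a toy coordinator. A minimal sketch with in-process participants (no real WAL, locks, or network; all names are illustrative):

```python
class Participant:
    def __init__(self, name, has_stock=True):
        self.name, self.has_stock = name, has_stock
        self.state = "init"

    def prepare(self):
        # Phase 1: validate, write WAL, acquire locks (all simulated here).
        self.state = "prepared" if self.has_stock else "aborted"
        return self.has_stock  # VOTE YES / VOTE NO

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # Phase 1: prepare / vote
    if all(votes):                               # Phase 2: decide
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:                       # one NO aborts everyone
        p.rollback()
    return "aborted"

ok = two_phase_commit([Participant("order"), Participant("payment")])
bad = two_phase_commit([Participant("order"),
                        Participant("inventory", has_stock=False)])
print(ok, bad)  # committed aborted
```

The real protocol's difficulty is not this control flow, it is what happens when messages or the coordinator disappear between the two phases, which the next scenario shows.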
6. 2PC Coordinator Failure
(Sequence diagram, coordinator crash: all three participants vote YES, then the coordinator crashes before sending COMMIT. Each participant is now stuck holding locks: it can't commit because no instruction arrived, and it can't abort because it might miss a commit decision. Other transactions queue behind the locks, and the participants stay blocked until the coordinator recovers.)
The 2PC blocking problem. If the coordinator crashes after collecting votes but before sending the decision, participants are stuck holding locks. They can’t commit (they don’t know the decision) and they can’t abort (the coordinator might have decided to commit). This is 2PC’s fundamental weakness.
7. Three-Phase Commit (3PC)
(Sequence diagram, 3PC: after Prepare (Phase 1) and PreCommit (Phase 2), a participant that has not received PreCommit can safely ABORT on timeout, since no commit instruction was ever issued. In Phase 3 the coordinator sends DoCommit and all participants commit.)
3PC adds a pre-commit phase. After all participants agree (Phase 1), the coordinator sends PreCommit (Phase 2). If the coordinator crashes before DoCommit, participants haven’t committed yet and can safely abort after a timeout. This reduces the blocking window but doesn’t eliminate it entirely — and it’s rarely used in practice because network partitions break its assumptions.
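The key difference is what a participant may safely do on timeout in each state. A minimal state-machine sketch (simplified: one participant, no network, and ignoring the partition scenarios that break 3PC in practice):

```python
class Participant3PC:
    # States: init -> prepared -> pre-committed -> committed
    def __init__(self):
        self.state = "init"

    def can_commit(self):   # Phase 1: vote YES
        self.state = "prepared"
        return True

    def pre_commit(self):   # Phase 2: the decision is now known to everyone
        self.state = "pre-committed"

    def do_commit(self):    # Phase 3: actually commit
        self.state = "committed"

    def on_timeout(self):
        # "prepared" but no PreCommit seen: nobody can have committed,
        # so aborting is safe.  "pre-committed": everyone voted YES and
        # learned the decision, so proceeding to commit is safe.
        if self.state == "prepared":
            self.state = "aborted"
        elif self.state == "pre-committed":
            self.state = "committed"

# Coordinator crashes after Phase 1: the participant aborts safely
# instead of blocking forever as in 2PC.
p = Participant3PC()
p.can_commit()
p.on_timeout()
print(p.state)  # aborted
```

The timeout rule is exactly what 2PC lacks; the cost is an extra round trip on every transaction and assumptions (bounded delays, no partitions) that real networks violate.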
Level 3 — Saga Pattern
8. Saga Choreography
(Diagram, event chain: the Order Service publishes order.created; the Payment Service reacts and publishes payment.completed; the Inventory Service publishes stock.reserved; the Shipping Service publishes shipment.scheduled; the Notification Service consumes it. Compensation path: on payment.failed, the Order Service cancels the order.)
Saga choreography. Each service publishes an event when it completes its step. The next service listens and acts. No central coordinator — services are loosely coupled and independently deployable. The downside: the workflow is implicit, scattered across services, and hard to trace and debug.
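The chain above can be sketched with a minimal in-memory pub/sub bus (real systems would use Kafka or RabbitMQ; the event names match the diagram, the handlers are stand-ins):

```python
from collections import defaultdict

# Minimal in-memory event bus; a stand-in for Kafka / RabbitMQ.
handlers = defaultdict(list)
log = []

def subscribe(event, fn):
    handlers[event].append(fn)

def publish(event, payload):
    log.append(event)
    for fn in handlers[event]:
        fn(payload)

# Each service reacts to the previous service's event -- no coordinator.
subscribe("order.created",     lambda o: publish("payment.completed", o))
subscribe("payment.completed", lambda o: publish("stock.reserved", o))
subscribe("stock.reserved",    lambda o: publish("shipment.scheduled", o))

publish("order.created", {"id": "o-1"})
print(log)
# ['order.created', 'payment.completed', 'stock.reserved', 'shipment.scheduled']
```

Note that nothing in the code states the overall workflow; it only emerges from the subscriptions, which is exactly why choreography is hard to trace.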
9. Saga Orchestration
(Diagram: a central orchestrator calls each service in turn and persists the state of the entire workflow in its own database.)
Saga orchestration. A central orchestrator drives the workflow step-by-step. It calls each service, waits for the response, and decides the next action. The workflow is explicit and visible in one place. Easier to debug than choreography, but the orchestrator is a single point that must be highly available.
10. Saga Compensation (Rollback)
(Sequence diagram, compensating transactions in reverse order: the orchestrator releases the inventory, then refunds the payment, then cancels the order. Once all compensations complete, the system is consistent again.)
Saga compensation. Unlike 2PC rollback (which undoes uncommitted work), saga compensation applies new transactions that semantically reverse completed steps. Refund instead of undo-charge. Cancel-order instead of undo-create. Each service must implement both forward and compensating operations.
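An orchestrator with reverse-order compensation fits in a few lines. A minimal sketch: each step is an (action, compensation) pair, and the step names are illustrative stand-ins for real service calls.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, compensate in reverse."""
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):  # undo completed steps, newest first
                comp()
            return "compensated"
    return "completed"

trace = []
def step(name, fail=False):
    def action():
        if fail:
            raise RuntimeError(name)
        trace.append(name)
    return action

saga = [
    (step("create_order"),             lambda: trace.append("cancel_order")),
    (step("charge_payment"),           lambda: trace.append("refund_payment")),
    (step("reserve_stock", fail=True), lambda: trace.append("release_stock")),
]
result = run_saga(saga)
print(result, trace)
# compensated ['create_order', 'charge_payment', 'refund_payment', 'cancel_order']
```

Only completed steps are compensated (the failed `reserve_stock` never ran, so `release_stock` is never called), and the undo order is the reverse of the execution order.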
11. Saga with Timeout and Retry
(Sequence diagram: the payment call times out with no response, so the orchestrator retries the $50 charge with the same idempotency key, pay_123. The Payment Service recognizes the key and returns the cached result, so the customer is NOT double-charged, and the saga continues. If the service is still unavailable after 3 retries, the orchestrator gives up and triggers compensation.)
Idempotency + retry. Network failures are expected. The orchestrator retries with idempotency keys to prevent double-processing. If retries are exhausted, compensation kicks in. Every saga step must be idempotent — calling it twice with the same key produces the same result.
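A minimal sketch of the key mechanism: the payment service keeps a map from idempotency key to result, and the orchestrator retries with a fixed key. The `outages` parameter simulates transient unavailability; all names are illustrative.

```python
processed = {}  # idempotency_key -> result, kept by the payment service

def charge(idempotency_key, cents, available=True):
    if idempotency_key in processed:     # duplicate: return the cached result
        return processed[idempotency_key]
    if not available:
        raise ConnectionError("payment service unavailable")
    result = {"status": "charged", "cents": cents}
    processed[idempotency_key] = result  # record result under the key
    return result

def charge_with_retry(key, cents, outages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return charge(key, cents, available=(attempt >= outages))
        except ConnectionError:
            continue  # real code would back off exponentially here
    return None       # retries exhausted -> caller triggers compensation

# First attempt fails; the retry with the SAME key succeeds exactly once.
first = charge_with_retry("pay_123", 5000, outages=1)
# A duplicate request with the same key returns the cached result.
second = charge_with_retry("pay_123", 5000, outages=0)
print(first, len(processed))  # {'status': 'charged', 'cents': 5000} 1
```

The invariant to test for in a real system is the last line: however many times the charge is attempted, exactly one charge is recorded.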
Level 4 — Advanced Patterns
12. Outbox Pattern
(Diagram: the application writes the order row and the event row to an outbox table in one atomic local transaction. An event relay, CDC or a poller, reads the outbox table and publishes to a message broker (Kafka / RabbitMQ), from which the Payment and Inventory services consume.)
Outbox pattern. The service writes both the business data and the event to an outbox table in a single local transaction — no distributed transaction needed. A separate relay (CDC or poller) reads the outbox and publishes to the message broker. This guarantees at-least-once event delivery without 2PC between the database and the broker.
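The two halves, atomic local write and a separate relay, can be sketched with `sqlite3` and a polling relay (a CDC-based relay would tail the WAL instead; the schema and function names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE outbox
                (id INTEGER PRIMARY KEY, payload TEXT,
                 published INTEGER DEFAULT 0)""")

def place_order(order_id):
    # One LOCAL transaction writes business data and the event together:
    # either both rows exist or neither does.  No dual write.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?)", (order_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (json.dumps({"type": "order.created", "id": order_id}),))

def relay(publish):
    # Poller: read unpublished rows, publish, then mark them published.
    # At-least-once: a crash between publish and UPDATE causes a re-send,
    # so consumers must deduplicate (e.g. by event id).
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                         (row_id,))

events = []
place_order("o-1")
relay(events.append)  # stand-in for a Kafka producer
relay(events.append)  # second poll finds nothing new
print(events)         # [{'type': 'order.created', 'id': 'o-1'}]
```

The broker never participates in the database transaction, which is the whole point: the only transaction is local, and delivery reliability comes from the relay re-reading unpublished rows.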
13. Saga vs 2PC Decision Guide
(Decision flowchart: need a distributed transaction? If everything lives in the same database or the databases are XA-compatible, use 2PC: strong consistency with short-lived locks. If the work spans different services with different databases, ask how complex the flow is. A simple flow across 2-3 services fits saga choreography: event-driven, low coupling. A complex, many-step workflow fits saga orchestration: a central coordinator and an explicit flow. Either way, add the outbox pattern for reliable event delivery.)
| Aspect | 2PC | Saga Choreography | Saga Orchestration |
|---|---|---|---|
| Consistency | Strong (atomic) | Eventual | Eventual |
| Coupling | Tight (coordinator) | Loose (events) | Medium (orchestrator) |
| Latency | High (locks held) | Low (async) | Medium (sequential) |
| Visibility | Low (implicit) | Low (scattered) | High (central flow) |
| Scalability | Limited | High | High |
| Failure handling | Rollback | Compensation events | Compensation commands |
| Use case | DB-to-DB, XA | Simple microservice flows | Complex order workflows |
TL;DR — When to Use What
| Scenario | Pattern | Why |
|---|---|---|
| Same DB or XA support | 2PC | Strong atomicity, built-in support |
| 2-3 loosely coupled services | Saga Choreography | Simple, no coordinator needed |
| Complex multi-step workflow | Saga Orchestration | Visible flow, centralized error handling |
| Reliable event publishing | Outbox Pattern | No dual-write problem |
| Sub-second transactions | 2PC | Locks are short-lived |
| Long-running business processes | Saga | Minutes/hours, compensation on failure |
- Can you use a single database? → Do it. Local transactions beat everything.
- Need cross-service atomicity? → Saga with orchestration + outbox. It’s the industry standard.
- Have XA-compatible databases? → 2PC works, but watch for lock contention.
- Building event-driven architecture? → Saga choreography fits naturally.
- Worried about dual writes? → Outbox pattern. Always.