How to Read This Post

Each scenario shows a diagram first, then a short note on why the pattern matters. Complexity increases as you scroll.

| Strategy | Category | Strength | Trade-off |
| --- | --- | --- | --- |
| Leader-Follower | Replication | Simple, strong consistency for reads | Write bottleneck at leader |
| Multi-Leader | Replication | Multi-DC writes | Conflict resolution needed |
| Leaderless (Quorum) | Replication | No single point of failure | Tunable consistency |
| Range Partitioning | Partitioning | Range scans, sorted data | Hotspot risk |
| Hash Partitioning | Partitioning | Even distribution | No range queries |
| Consistent Hashing | Partitioning | Minimal redistribution | Virtual nodes needed |

Level 1 — Foundations

1. Single Node Storage

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart LR
    C1["Client 1"] --> DB[(Single Database)]
    C2["Client 2"] --> DB
    C3["Client 3"] --> DB
    DB -->|"⚡ crash"| LOST["All data LOST<br/>All clients BLOCKED"]
    style C1 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style C2 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style C3 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style DB fill:#ff6b6b,stroke:#d44,color:#fff
    style LOST fill:#546e7a,stroke:#90a4ae,color:#fff
```

Single point of failure. One database handles everything — simple but fatal. A crash blocks every client until it recovers, and a damaged disk loses the data outright. Every distributed storage pattern exists to solve this problem.


2. CAP Theorem

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    CAP{{"CAP Theorem<br/>Pick two during partition"}}
    CAP --> CP["CP — Consistency + Partition Tolerance<br/>Reject writes during partition"]
    CAP --> AP["AP — Availability + Partition Tolerance<br/>Accept writes, resolve conflicts later"]
    CAP --> CA["CA — Consistency + Availability<br/>Only if no partition (single node)"]
    CP --> CP_EX["HBase, MongoDB, etcd<br/>ZooKeeper, Redis Cluster"]
    AP --> AP_EX["Cassandra, DynamoDB<br/>CouchDB, Riak"]
    CA --> CA_EX["PostgreSQL (single node)<br/>MySQL (single node)"]
    style CAP fill:#ffd43b,stroke:#f59f00,color:#333
    style CP fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style AP fill:#51cf66,stroke:#37b24d,color:#fff
    style CA fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style CP_EX fill:#546e7a,stroke:#90a4ae,color:#fff
    style AP_EX fill:#546e7a,stroke:#90a4ae,color:#fff
    style CA_EX fill:#546e7a,stroke:#90a4ae,color:#fff
```

CAP theorem. During a network partition, you must choose: consistency (all nodes see the same data) or availability (every request gets a response). CA systems don’t exist in distributed form — partitions always happen eventually.


3. Read/Write Trade-offs

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    WL["Workload Type?"]
    WL -->|"Read-heavy<br/>(90% reads)"| RH["Add read replicas"]
    WL -->|"Write-heavy<br/>(90% writes)"| WH["Partition data across nodes"]
    WL -->|"Mixed"| MX["Replicate + Partition"]
    RH --> RH_S["1 Leader + N Followers<br/>Scale reads horizontally"]
    WH --> WH_S["Shard by key<br/>Each shard handles its writes"]
    MX --> MX_S["Partition for writes<br/>Replicate each partition for reads"]
    style WL fill:#ffd43b,stroke:#f59f00,color:#333
    style RH fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style WH fill:#ff6b6b,stroke:#d44,color:#fff
    style MX fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style RH_S fill:#51cf66,stroke:#37b24d,color:#fff
    style WH_S fill:#51cf66,stroke:#37b24d,color:#fff
    style MX_S fill:#51cf66,stroke:#37b24d,color:#fff
```

Design for your workload. Read-heavy? Add replicas. Write-heavy? Partition. Most real systems need both — partition data for write throughput, replicate each partition for read scaling and durability.


Level 2 — Replication

4. Leader-Follower Replication

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    CW["Client (writes)"] -->|"INSERT, UPDATE"| L["Leader<br/>(read-write)"]
    CR1["Client (reads)"] --> F1["Follower 1<br/>(read-only)"]
    CR2["Client (reads)"] --> F2["Follower 2<br/>(read-only)"]
    CR3["Client (reads)"] --> L
    L -->|"replication stream"| F1
    L -->|"replication stream"| F2
    L -->|"replication stream"| F3["Follower 3<br/>(standby)"]
    F3 -->|"if leader fails"| PROMOTE["Promoted to Leader"]
    style CW fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style CR1 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style CR2 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style CR3 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style L fill:#ffd43b,stroke:#f59f00,color:#333
    style F1 fill:#51cf66,stroke:#37b24d,color:#fff
    style F2 fill:#51cf66,stroke:#37b24d,color:#fff
    style F3 fill:#546e7a,stroke:#90a4ae,color:#fff
    style PROMOTE fill:#ff6b6b,stroke:#d44,color:#fff
```

Leader-follower (historically "master-slave"). All writes go through a single leader. Followers replicate the leader's write-ahead log and serve reads. PostgreSQL, MySQL, and MongoDB all support this. Simple, but the leader is the write bottleneck and a single point of failure that must be detected and replaced during failover.
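The write path above can be sketched in a few lines. This is a toy model, assuming an in-memory list stands in for the write-ahead log; the `Leader` and `Follower` names are illustrative, not any database's API.

```python
class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0  # index of the next log entry to apply

    def apply(self, log):
        # Replay any log entries this follower has not seen yet.
        while self.applied < len(log):
            key, value = log[self.applied]
            self.data[key] = value
            self.applied += 1

class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []  # append-only stand-in for the WAL
        self.followers = followers

    def write(self, key, value):
        self.log.append((key, value))  # 1. record the write in the log
        self.data[key] = value         # 2. apply it locally
        for f in self.followers:       # 3. ship the log to followers
            f.apply(self.log)

leader = Leader([Follower(), Follower()])
leader.write("user:1", "alice")
```

Reads would go to any follower's `data`; writes only ever enter through `Leader.write`, which is exactly why the leader becomes the bottleneck.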


5. Synchronous vs Asynchronous Replication

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
sequenceDiagram
    participant C as Client
    participant L as Leader
    participant F1 as Follower 1
    participant F2 as Follower 2
    rect rgb(30, 60, 30)
        Note over C,F2: Synchronous Replication
        C->>L: INSERT user=alice
        L->>L: Write to local WAL
        L->>F1: Replicate
        L->>F2: Replicate
        F1-->>L: ACK
        F2-->>L: ACK
        Note over L: Wait for ALL followers
        L-->>C: OK (committed)
        Note over C: Strong consistency<br/>Higher latency (network RTT)
    end
    rect rgb(60, 30, 30)
        Note over C,F2: Asynchronous Replication
        C->>L: INSERT user=bob
        L->>L: Write to local WAL
        L-->>C: OK (committed)
        Note over C: Immediate response<br/>Lower latency
        L->>F1: Replicate (background)
        L->>F2: Replicate (background)
        Note over F1,F2: Followers may lag<br/>Risk: data loss if leader crashes
    end
```

Sync vs async — the durability-latency trade-off. Synchronous replication guarantees no data loss but adds network latency to every write. Asynchronous returns immediately but followers can lag — if the leader crashes before replication, writes are lost. Semi-synchronous (ack from at least one follower) is the common middle ground.
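All three modes reduce to one question: how many follower ACKs must arrive before the leader confirms the write? A minimal sketch, where the mode names and the boolean ACK model are illustrative assumptions:

```python
def required_acks(mode, n_followers):
    """How many follower ACKs the leader waits for before confirming."""
    if mode == "sync":       # wait for every follower
        return n_followers
    if mode == "semi-sync":  # wait for at least one follower
        return 1
    if mode == "async":      # confirm immediately
        return 0
    raise ValueError(mode)

def replicate(mode, follower_acks):
    """follower_acks: list of booleans; did each follower ACK in time?"""
    needed = required_acks(mode, len(follower_acks))
    return sum(follower_acks) >= needed  # True = report "committed"

# One follower down: sync blocks, semi-sync and async still commit.
acks = [True, False]
print(replicate("sync", acks))       # False
print(replicate("semi-sync", acks))  # True
print(replicate("async", acks))      # True
```

The semi-sync row is the middle ground the text describes: one ACK guarantees the write lives on at least two machines, without paying for the slowest follower.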


6. Multi-Leader Replication

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    subgraph DC1["Datacenter US-East"]
        C1["Client"] --> L1["Leader 1"]
        L1 --> F1A["Follower"]
        L1 --> F1B["Follower"]
    end
    subgraph DC2["Datacenter EU-West"]
        C2["Client"] --> L2["Leader 2"]
        L2 --> F2A["Follower"]
        L2 --> F2B["Follower"]
    end
    subgraph DC3["Datacenter AP-South"]
        C3["Client"] --> L3["Leader 3"]
        L3 --> F3A["Follower"]
    end
    L1 <-->|"async replication"| L2
    L2 <-->|"async replication"| L3
    L1 <-->|"async replication"| L3
    L1 -.->|"conflict!"| CONFLICT["Write Conflict Resolution<br/>Last-write-wins / CRDTs / Custom"]
    style C1 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style C2 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style C3 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style L1 fill:#ffd43b,stroke:#f59f00,color:#333
    style L2 fill:#ffd43b,stroke:#f59f00,color:#333
    style L3 fill:#ffd43b,stroke:#f59f00,color:#333
    style F1A fill:#51cf66,stroke:#37b24d,color:#fff
    style F1B fill:#51cf66,stroke:#37b24d,color:#fff
    style F2A fill:#51cf66,stroke:#37b24d,color:#fff
    style F2B fill:#51cf66,stroke:#37b24d,color:#fff
    style F3A fill:#51cf66,stroke:#37b24d,color:#fff
    style CONFLICT fill:#ff6b6b,stroke:#d44,color:#fff
```

Multi-leader replication. Each datacenter has its own leader that accepts writes locally with low latency. Leaders replicate to each other asynchronously. The catch: two leaders can modify the same row simultaneously — you need a conflict resolution strategy (last-write-wins, CRDTs, or application-level merging).
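Last-write-wins is the simplest of the three conflict strategies to sketch. A minimal version, assuming every write carries a timestamp (the values below are made up):

```python
def lww_merge(a, b):
    """Each value is (timestamp, payload); the newer timestamp wins.
    Ties break deterministically on the payload so all leaders converge
    to the same answer regardless of merge order."""
    return max(a, b)

# Two leaders update the same row concurrently:
us_east = (1710000005, "alice@new.example")
eu_west = (1710000003, "alice@old.example")
print(lww_merge(us_east, eu_west))  # (1710000005, 'alice@new.example')
```

Note the trade-off LWW makes: the losing write is silently discarded, which is why CRDTs or application-level merging exist for data you cannot afford to drop.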


7. Leaderless Replication (Quorum)

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
sequenceDiagram
    participant C as Client
    participant N1 as Node 1
    participant N2 as Node 2
    participant N3 as Node 3
    Note over C,N3: N=3 nodes, W=2, R=2<br/>W + R > N ensures overlap
    rect rgb(30, 60, 30)
        Note over C,N3: Write (W=2 required)
        C->>N1: Write x=5 (v2)
        C->>N2: Write x=5 (v2)
        C->>N3: Write x=5 (v2)
        N1-->>C: ACK
        N2-->>C: ACK
        Note over N3: Temporarily unavailable
        Note over C: 2/3 ACKs = W satisfied ✓
    end
    rect rgb(30, 30, 60)
        Note over C,N3: Read (R=2 required)
        C->>N1: Read x
        C->>N2: Read x
        C->>N3: Read x
        N1-->>C: x=5 (v2)
        N2-->>C: x=5 (v2)
        N3-->>C: x=3 (v1, stale!)
        Note over C: 2/3 responses have v2<br/>Return x=5 (latest version)
    end
    rect rgb(60, 30, 30)
        Note over C,N3: Read Repair
        C->>N3: Write x=5 (v2)
        N3-->>C: ACK
        Note over N1,N3: All nodes now consistent
    end
```

Leaderless quorum reads/writes (Dynamo-style). Write to W nodes, read from R nodes. If W + R > N, at least one node in the read set has the latest write. The client picks the newest version and repairs stale nodes. Cassandra, DynamoDB, and Riak use this approach.
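The read-side logic (pick the newest version, repair stale replicas) fits in a few lines. A sketch assuming each replica is a plain dict mapping key to a (version, value) pair, mirroring the x=5 (v2) example in the diagram:

```python
def quorum_read_and_repair(stores, key):
    """stores: list of replica dicts, each mapping key -> (version, value).
    Returns the newest (version, value) and writes it back to stale replicas."""
    responses = [s[key] for s in stores]
    newest = max(responses)      # tuple compare: highest version wins
    for s in stores:             # read repair: fix any stale replica
        if s[key] < newest:
            s[key] = newest
    return newest

n1 = {"x": (2, 5)}
n2 = {"x": (2, 5)}
n3 = {"x": (1, 3)}  # stale replica, as in the diagram
print(quorum_read_and_repair([n1, n2, n3], "x"))  # (2, 5)
print(n3["x"])                                    # (2, 5) -- repaired
```

A real client would contact only R replicas and would use vector clocks or last-write-wins timestamps rather than a bare integer version, but the overlap argument (W + R > N) is the same.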


Level 3 — Partitioning

8. Range-Based Partitioning

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    R["Router / Client"]
    R -->|"keys A-F"| N1["Node 1<br/>A-F<br/>apple, banana, cherry"]
    R -->|"keys G-M"| N2["Node 2<br/>G-M<br/>grape, kiwi, lemon"]
    R -->|"keys N-Z"| N3["Node 3<br/>N-Z<br/>orange, peach, tomato"]
    subgraph Problem["Hotspot Problem"]
        direction LR
        TS["Time-series keys:<br/>2024-03-01, 2024-03-02..."] --> HOT["All writes hit<br/>the SAME partition"]
    end
    style R fill:#ffd43b,stroke:#f59f00,color:#333
    style N1 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style N2 fill:#51cf66,stroke:#37b24d,color:#fff
    style N3 fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style TS fill:#546e7a,stroke:#90a4ae,color:#fff
    style HOT fill:#ff6b6b,stroke:#d44,color:#fff
```

Range partitioning. Keys are split into contiguous ranges — simple and supports efficient range scans (SELECT * WHERE key BETWEEN 'A' AND 'F'). The danger: sequential keys (timestamps, auto-increment IDs) create hotspots where all writes hammer a single partition.
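Routing a key to its range is a sorted-boundary lookup. A minimal sketch using single-letter upper bounds that mirror the diagram's A-F / G-M / N-Z split (node names are illustrative):

```python
import bisect

bounds = ["f", "m", "z"]             # sorted, inclusive upper bound per node
nodes = ["node1", "node2", "node3"]

def node_for(key):
    # Find the first partition whose upper bound is >= the key's first letter.
    i = bisect.bisect_left(bounds, key[0].lower())
    return nodes[i]

print(node_for("apple"))   # node1
print(node_for("kiwi"))    # node2
print(node_for("tomato"))  # node3
```

Because neighboring keys land on the same node, a range scan touches few partitions; that same property is what funnels sequential timestamp keys onto one hotspot.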


9. Hash-Based Partitioning

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    subgraph Before["3 Nodes — hash(key) mod 3"]
        direction LR
        K1["key: alice<br/>hash=7<br/>7 mod 3 = 1"] --> BN1["Node 1"]
        K2["key: bob<br/>hash=12<br/>12 mod 3 = 0"] --> BN0["Node 0"]
        K3["key: charlie<br/>hash=5<br/>5 mod 3 = 2"] --> BN2["Node 2"]
    end
    Before -->|"Add Node 3"| After
    subgraph After["4 Nodes — hash(key) mod 4"]
        direction LR
        K4["key: alice<br/>hash=7<br/>7 mod 4 = 3"] --> AN3["Node 3 ← MOVED!"]
        K5["key: bob<br/>hash=12<br/>12 mod 4 = 0"] --> AN0["Node 0 (same)"]
        K6["key: charlie<br/>hash=5<br/>5 mod 4 = 1"] --> AN1["Node 1 ← MOVED!"]
    end
    After --> PROBLEM["Adding 1 node reshuffles<br/>~75% of all keys!"]
    style K1 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style K2 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style K3 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style BN0 fill:#51cf66,stroke:#37b24d,color:#fff
    style BN1 fill:#51cf66,stroke:#37b24d,color:#fff
    style BN2 fill:#51cf66,stroke:#37b24d,color:#fff
    style K4 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style K5 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style K6 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style AN0 fill:#51cf66,stroke:#37b24d,color:#fff
    style AN1 fill:#ffd43b,stroke:#f59f00,color:#333
    style AN3 fill:#ffd43b,stroke:#f59f00,color:#333
    style PROBLEM fill:#ff6b6b,stroke:#d44,color:#fff
```

Hash mod N partitioning. Hash the key, mod by node count — even distribution, no hotspots. But adding or removing a node changes the mod, reshuffling most keys. This triggers massive data migration. Consistent hashing solves this.
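The reshuffle cost is easy to measure yourself. This sketch hashes 10,000 hypothetical keys with both mod 3 and mod 4 and counts how many change nodes; MD5 is used only because it gives a stable, reproducible hash across runs.

```python
import hashlib

def node(key, n):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(node(k, 3) != node(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 75%
```

The 75% is not bad luck: a key stays put only when hash mod 3 equals hash mod 4, which holds for just 3 of every 12 hash values.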


10. Consistent Hashing

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    subgraph Ring["Hash Ring (0 → 2^32)"]
        direction TB
        NA["Node A<br/>position: 90°"]
        NB["Node B<br/>position: 210°"]
        NC["Node C<br/>position: 330°"]
        K1["key1 → 45°<br/>→ Node A (next clockwise)"]
        K2["key2 → 150°<br/>→ Node B (next clockwise)"]
        K3["key3 → 270°<br/>→ Node C (next clockwise)"]
    end
    subgraph AddNode["Add Node D at 180°"]
        direction TB
        NA2["Node A: keeps key1"]
        ND["Node D (NEW): gets key2<br/>(was Node B's)"]
        NB2["Node B: loses only key2"]
        NC2["Node C: keeps key3"]
    end
    Ring -->|"only 1/N keys move"| AddNode
    style NA fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style NB fill:#51cf66,stroke:#37b24d,color:#fff
    style NC fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style K1 fill:#546e7a,stroke:#90a4ae,color:#fff
    style K2 fill:#546e7a,stroke:#90a4ae,color:#fff
    style K3 fill:#546e7a,stroke:#90a4ae,color:#fff
    style NA2 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style ND fill:#ffd43b,stroke:#f59f00,color:#333
    style NB2 fill:#51cf66,stroke:#37b24d,color:#fff
    style NC2 fill:#a29bfe,stroke:#6c5ce7,color:#fff
```

Consistent hashing. Nodes and keys are placed on the same hash ring. Each key maps to the next node clockwise. Adding or removing a node only moves ~1/N of the keys — not all of them. DynamoDB, Cassandra, and Memcached all use consistent hashing.
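A minimal ring, assuming MD5 as the (stable) hash and one position per physical node; the node names are made up for the demo. The key property to notice: when a node joins, only the keys it takes over change owner.

```python
import bisect
import hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Each node is placed on the ring at its hash position.
        self.ring = sorted((h(n), n) for n in nodes)

    def node_for(self, key):
        # First node clockwise from the key's position, wrapping at the top.
        i = bisect.bisect(self.ring, (h(key), ""))
        return self.ring[i % len(self.ring)][1]

old = Ring(["node-a", "node-b", "node-c"])
new = Ring(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(1000)]
moved = [k for k in keys if old.node_for(k) != new.node_for(k)]
# Every key that moved, moved TO node-d; no other node's keys reshuffle.
```

Compare this with the mod-N version: here adding a node disturbs only the arc the new node claims, roughly 1/N of the keyspace.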


Level 4 — Advanced Patterns

11. Consistent Hashing with Virtual Nodes

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    subgraph NoVnodes["Without Virtual Nodes"]
        direction LR
        P1["Node A<br/>owns 60%"] --- P2["Node B<br/>owns 15%"] --- P3["Node C<br/>owns 25%"]
    end
    subgraph WithVnodes["With Virtual Nodes (4 vnodes each)"]
        direction TB
        V1["Ring positions:"]
        VA1["A-1 (30°)"] --- VB1["B-1 (60°)"] --- VC1["C-1 (100°)"]
        VA2["A-2 (140°)"] --- VB2["B-2 (180°)"] --- VC2["C-2 (220°)"]
        VA3["A-3 (260°)"] --- VB3["B-3 (290°)"] --- VC3["C-3 (320°)"]
        VA4["A-4 (350°)"]
    end
    NoVnodes -->|"unbalanced!"| FIX["Virtual nodes<br/>spread each physical node<br/>across the ring"]
    FIX --> WithVnodes
    WithVnodes --> RESULT["Node A: ~33%<br/>Node B: ~33%<br/>Node C: ~33%"]
    style P1 fill:#ff6b6b,stroke:#d44,color:#fff
    style P2 fill:#51cf66,stroke:#37b24d,color:#fff
    style P3 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style VA1 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style VA2 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style VA3 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style VA4 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style VB1 fill:#51cf66,stroke:#37b24d,color:#fff
    style VB2 fill:#51cf66,stroke:#37b24d,color:#fff
    style VB3 fill:#51cf66,stroke:#37b24d,color:#fff
    style VC1 fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style VC2 fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style VC3 fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style V1 fill:#546e7a,stroke:#90a4ae,color:#fff
    style FIX fill:#ffd43b,stroke:#f59f00,color:#333
    style RESULT fill:#27ae60,stroke:#1e8449,color:#fff
```

Virtual nodes. With only one ring position per physical node, the distribution is uneven. Virtual nodes place each physical node at many positions on the ring, which smooths out the distribution. Cassandra, for example, long defaulted to 256 vnodes per node (newer releases default to 16).
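The smoothing effect is easy to see empirically. This sketch hashes each node at many positions by suffixing a vnode index; the 100-vnode count, node names, and key names are arbitrary demo choices, and MD5 is used only as a stable hash.

```python
import bisect
import hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class VNodeRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node appears on the ring once per vnode: "a#0", "a#1", ...
        self.ring = sorted(
            (h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    def node_for(self, key):
        i = bisect.bisect(self.ring, (h(key), ""))
        return self.ring[i % len(self.ring)][1]

ring = VNodeRing(["a", "b", "c"])
counts = {}
for i in range(30_000):
    owner = ring.node_for(f"key:{i}")
    counts[owner] = counts.get(owner, 0) + 1
# With 100 vnodes each, every node ends up owning roughly a third of the keys.
```

With one position per node the shares can be as lopsided as the 60/15/25 split in the diagram; averaging over many small arcs is what pulls each node back toward 1/3.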


12. Replication + Partitioning Combined

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    subgraph Data["Data: 6 Partitions, RF=3"]
        direction LR
        P1["Partition 1"]
        P2["Partition 2"]
        P3["Partition 3"]
        P4["Partition 4"]
        P5["Partition 5"]
        P6["Partition 6"]
    end
    subgraph Nodes["4 Nodes"]
        direction TB
        N1["Node 1<br/>P1 (leader), P2, P4, P6"]
        N2["Node 2<br/>P2 (leader), P1, P3, P5"]
        N3["Node 3<br/>P3 (leader), P2, P4, P5, P6"]
        N4["Node 4<br/>P4 (leader), P5 (leader), P6 (leader), P1, P3"]
    end
    Data --> Nodes
    N2 -->|"Node 2 dies"| SURVIVE["P1: still on N1, N4<br/>P2: still on N1, N3<br/>P3: still on N3, N4<br/>No data lost!"]
    style P1 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style P2 fill:#51cf66,stroke:#37b24d,color:#fff
    style P3 fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style P4 fill:#ffd43b,stroke:#f59f00,color:#333
    style P5 fill:#ff6b6b,stroke:#d44,color:#fff
    style P6 fill:#27ae60,stroke:#1e8449,color:#fff
    style N1 fill:#546e7a,stroke:#90a4ae,color:#fff
    style N2 fill:#546e7a,stroke:#90a4ae,color:#fff
    style N3 fill:#546e7a,stroke:#90a4ae,color:#fff
    style N4 fill:#546e7a,stroke:#90a4ae,color:#fff
    style SURVIVE fill:#51cf66,stroke:#37b24d,color:#fff
```

Partition + replicate. Data is split into partitions for write throughput, then each partition is replicated across N nodes for durability. Each partition has a leader for writes and followers for reads. Kafka, CockroachDB, and Cassandra all combine partitioning with replication.
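A hypothetical placement function in the same spirit as the diagram: replicas of partition p go to RF consecutive nodes, the first acting as leader. Real systems use far more sophisticated assignment; this only illustrates the survival property.

```python
def place(n_partitions, nodes, rf=3):
    """Assign each partition to rf consecutive nodes (round-robin).
    The first replica is that partition's leader."""
    placement = {}
    for p in range(n_partitions):
        replicas = [nodes[(p + i) % len(nodes)] for i in range(rf)]
        placement[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement

plan = place(6, ["n1", "n2", "n3", "n4"])

# Kill n2: every partition still has at least RF-1 = 2 live replicas.
survivors = {
    p: [n for n in [v["leader"], *v["followers"]] if n != "n2"]
    for p, v in plan.items()
}
```

Because no partition keeps all its replicas on one node, losing any single node leaves every partition readable and re-electable.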


13. Rack-Aware Data Placement

```mermaid
%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TB
    subgraph DC["Datacenter"]
        subgraph Rack1["Rack 1"]
            N1["Node 1<br/>Partition A (leader)<br/>Partition C (replica)"]
            N2["Node 2<br/>Partition B (replica)"]
        end
        subgraph Rack2["Rack 2"]
            N3["Node 3<br/>Partition A (replica)<br/>Partition B (leader)"]
            N4["Node 4<br/>Partition C (replica)"]
        end
        subgraph Rack3["Rack 3"]
            N5["Node 5<br/>Partition A (replica)<br/>Partition B (replica)"]
            N6["Node 6<br/>Partition C (leader)"]
        end
    end
    Rack1 -->|"Rack 1 power failure"| SAFE["Partition A: Rack 2 + Rack 3<br/>Partition B: Rack 2 + Rack 3<br/>Partition C: Rack 2 + Rack 3<br/>ALL partitions survive!"]
    style N1 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style N2 fill:#4a9eff,stroke:#2d7ed8,color:#fff
    style N3 fill:#51cf66,stroke:#37b24d,color:#fff
    style N4 fill:#51cf66,stroke:#37b24d,color:#fff
    style N5 fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style N6 fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style SAFE fill:#27ae60,stroke:#1e8449,color:#fff
```

Rack-aware placement. Never put all replicas of a partition on the same rack, zone, or region. If a rack loses power, every partition should still have a majority of replicas alive in other racks. HDFS, Cassandra, and Kafka all support rack-aware replica placement.
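A sketch of rack-aware selection matching the diagram's layout (3 racks, 2 nodes each). The round-robin rule below is illustrative, not HDFS's or Kafka's actual algorithm; the invariant it enforces is the real point: no rack is reused until every rack holds a replica.

```python
racks = {
    "rack1": ["node1", "node2"],
    "rack2": ["node3", "node4"],
    "rack3": ["node5", "node6"],
}

def rack_aware_replicas(partition, rf=3):
    """Pick rf (rack, node) pairs for a partition, one per rack,
    offset by the partition id so leaders spread across racks."""
    rack_names = sorted(racks)
    chosen = []
    for i in range(rf):
        rack = rack_names[(partition + i) % len(rack_names)]
        node = racks[rack][(partition // len(rack_names)) % len(racks[rack])]
        chosen.append((rack, node))
    return chosen

print(rack_aware_replicas(0))
# [('rack1', 'node1'), ('rack2', 'node3'), ('rack3', 'node5')]
```

With rf equal to the rack count, every partition spans all three racks, so a whole-rack failure leaves two live replicas, exactly the situation the diagram walks through.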


TL;DR — When to Use What

| Scenario | Strategy | Why |
| --- | --- | --- |
| Read-heavy, single region | Leader-Follower | Simple, add read replicas as needed |
| Multi-region writes | Multi-Leader | Low-latency local writes |
| Maximum availability | Leaderless Quorum | No single point of failure |
| Sorted range queries | Range Partitioning | Efficient scans, co-located data |
| Even distribution | Hash Partitioning | No hotspots, but no range queries |
| Elastic scaling | Consistent Hashing | Minimal data movement on resize |
| Production systems | All of the above | Partition + replicate + rack-aware |
```mermaid
quadrantChart
    title Storage Strategy Trade-offs
    x-axis "Simple" --> "Complex"
    y-axis "Read Optimized" --> "Write Optimized"
    quadrant-1 "Sharding"
    quadrant-2 "Advanced"
    quadrant-3 "Replication"
    quadrant-4 "Specialized"
    "Leader-Follower": [0.20, 0.25]
    "Multi-Leader": [0.60, 0.65]
    "Leaderless Quorum": [0.70, 0.55]
    "Range Partitioning": [0.35, 0.70]
    "Hash Partitioning": [0.40, 0.75]
    "Consistent Hashing": [0.65, 0.80]
    "Virtual Nodes": [0.75, 0.85]
    "Rack-Aware Placement": [0.80, 0.50]
```

  • Starting simple? → Leader-Follower replication. It covers 80% of use cases.
  • Outgrowing a single writer? → Partition your data with consistent hashing.
  • Going multi-region? → Multi-Leader or leaderless with tunable consistency.
  • Already at scale? → You’re combining all of these. The real skill is knowing which dials to turn.