How to Read This Post
Each scenario shows a diagram first, then a short note on why the pattern matters. Complexity increases as you scroll.
| Strategy | Category | Strength | Trade-off |
|---|---|---|---|
| Leader-Follower | Replication | Simple, strong consistency for reads | Write bottleneck at leader |
| Multi-Leader | Replication | Multi-DC writes | Conflict resolution needed |
| Leaderless (Quorum) | Replication | No single point of failure, tunable consistency | Stale reads possible; needs read repair |
| Range Partitioning | Partitioning | Range scans, sorted data | Hotspot risk |
| Hash Partitioning | Partitioning | Even distribution | No range queries |
| Consistent Hashing | Partitioning | Minimal redistribution | Virtual nodes needed |
Level 1 — Foundations
1. Single Node Storage
*Diagram: three clients pointing at a single database. When it crashes, the data is lost and all clients are blocked.*
Single point of failure. One database handles everything — simple but fatal. If it crashes, all data is lost and all clients are blocked. Every distributed storage pattern exists to solve this problem.
2. CAP Theorem
*Diagram: "Pick two during a partition." CP (consistency + partition tolerance, rejects writes during a partition): HBase, MongoDB, etcd, ZooKeeper, Redis Cluster. AP (availability + partition tolerance, accepts writes and resolves conflicts later): Cassandra, DynamoDB, CouchDB, Riak. CA (consistency + availability, only without partitions): single-node PostgreSQL, single-node MySQL.*
CAP theorem. During a network partition, you must choose: consistency (all nodes see the same data) or availability (every request gets a response). CA systems don’t exist in distributed form — partitions always happen eventually.
3. Read/Write Trade-offs
*Diagram: workload decision tree. Read-heavy (~90% reads): add read replicas — one leader plus N followers scales reads horizontally. Write-heavy (~90% writes): partition data across nodes, sharding by key so each shard handles its own writes. Mixed: partition for writes, replicate each partition for reads.*
Design for your workload. Read-heavy? Add replicas. Write-heavy? Partition. Most real systems need both — partition data for write throughput, replicate each partition for read scaling and durability.
Level 2 — Replication
4. Leader-Follower Replication
*Diagram: one leader (read-write) streams replication to followers (read-only) that serve client reads; a standby follower is promoted to leader if the leader fails.*
Leader-follower (master-slave). All writes go through a single leader. Followers replicate the write-ahead log and serve reads. PostgreSQL, MySQL, and MongoDB all support this. Simple, but the leader is the write bottleneck and a failover target.
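The routing rule is simple enough to sketch in a few lines. This is a minimal illustration, not a real driver — the node names are hypothetical, and a production client (or a pooler like PgBouncer in front of PostgreSQL) would also health-check nodes and track replication lag:

```python
import random

class LeaderFollowerRouter:
    """Route writes to the single leader, load-balance reads across followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers

    def route(self, operation):
        # All writes must go through the one leader.
        if operation in ("INSERT", "UPDATE", "DELETE"):
            return self.leader
        # Reads can go to any read-only follower.
        return random.choice(self.followers)

router = LeaderFollowerRouter("pg-leader", ["pg-follower-1", "pg-follower-2"])
print(router.route("INSERT"))   # always "pg-leader"
print(router.route("SELECT"))   # one of the followers
```

Note what this sketch ignores: replication lag means a follower may serve a read that predates your own write — "read your own writes" requires routing those reads to the leader.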
5. Synchronous vs Asynchronous Replication
*Sequence diagram: with synchronous replication, the leader waits for follower acks before confirming the write — higher latency (at least one network RTT). With asynchronous replication, the leader writes to its local WAL and responds immediately — lower latency — then replicates in the background; followers may lag, and writes are lost if the leader crashes first.*
Sync vs async — the durability-latency trade-off. Synchronous replication guarantees no data loss but adds network latency to every write. Asynchronous returns immediately but followers can lag — if the leader crashes before replication, writes are lost. Semi-synchronous (ack from at least one follower) is the common middle ground.
6. Multi-Leader Replication
*Diagram: three datacenter leaders, each with local followers, replicating to one another asynchronously. Concurrent writes to the same row collide and need conflict resolution: last-write-wins, CRDTs, or custom merge logic.*
Multi-leader replication. Each datacenter has its own leader that accepts writes locally with low latency. Leaders replicate to each other asynchronously. The catch: two leaders can modify the same row simultaneously — you need a conflict resolution strategy (last-write-wins, CRDTs, or application-level merging).
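The simplest resolver, last-write-wins, picks the version with the highest timestamp. A sketch to make the trade-off concrete — note that LWW silently discards the losing write, which is exactly why CRDTs and application-level merges exist:

```python
def last_write_wins(versions):
    """Resolve conflicting multi-leader writes by keeping the newest timestamp.

    versions: list of (timestamp, value) pairs for the same row,
    one from each leader that accepted a concurrent write.
    """
    # The losing values are dropped -- acceptable for some data
    # (e.g. "last seen" fields), silent data loss for others.
    return max(versions, key=lambda v: v[0])[1]

# Two leaders updated the same row concurrently:
print(last_write_wins([(1710000000, "alice@dc1"), (1710000005, "alice@dc2")]))
# -> alice@dc2
```

LWW also assumes comparable clocks across datacenters; with clock skew, a "newer" write can lose, which is another reason systems like Cassandra document LWW's caveats so heavily.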
7. Leaderless Replication (Quorum)
*Sequence diagram (N=3, W=2, R=2 — W + R > N ensures overlap): a write of x=5 (v2) succeeds with 2 of 3 ACKs even though one node is temporarily unavailable. A read queries all three nodes; two return x=5 (v2), one returns the stale x=3 (v1). The client returns the latest version, then read repair writes v2 to the stale node so all replicas converge.*
Leaderless quorum reads/writes (Dynamo-style). Write to W nodes, read from R nodes. If W + R > N, at least one node in the read set has the latest write. The client picks the newest version and repairs stale nodes. Cassandra, DynamoDB, and Riak use this approach.
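The read path — collect responses, require R of them, return the newest version, flag stale replicas for repair — fits in a short sketch. Function and version scheme are illustrative, not any real client library's API:

```python
def quorum_read(replica_values, r):
    """Dynamo-style quorum read.

    replica_values: {node: (version, value)} for replicas that responded.
    Returns the newest value plus the list of stale nodes to read-repair.
    """
    if len(replica_values) < r:
        raise RuntimeError("quorum not reached")
    # With W + R > N, at least one responder holds the latest write,
    # so the max version among responses is the newest committed value.
    latest_version, latest_value = max(replica_values.values())
    # Read repair: rewrite the latest value to any lagging replica.
    stale = [n for n, (v, _) in replica_values.items() if v < latest_version]
    return latest_value, stale

value, to_repair = quorum_read(
    {"N1": (2, "x=5"), "N2": (2, "x=5"), "N3": (1, "x=3")}, r=2)
print(value, to_repair)   # x=5 ['N3']
```

Real systems use vector clocks or last-write-wins timestamps rather than a single integer version, but the overlap argument is the same.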
Level 3 — Partitioning
8. Range-Based Partitioning
*Diagram: a router sends keys A-F (apple, banana, cherry) to node 1, G-M (grape, kiwi, lemon) to node 2, and N-Z (orange, peach, tomato) to node 3. Hotspot problem: time-series keys like 2024-03-01, 2024-03-02, ... are sequential, so all writes hit the same partition.*
Range partitioning. Keys are split into contiguous ranges — simple and supports efficient range scans (SELECT * WHERE key BETWEEN 'A' AND 'F'). The danger: sequential keys (timestamps, auto-increment IDs) create hotspots where all writes hammer a single partition.
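Routing a key to its range is a binary search over sorted split points. A minimal sketch with hypothetical node names, using the boundaries from the diagram above:

```python
import bisect

# Upper bounds of each range (exclusive): keys before "G" -> node-1,
# "G".."M" -> node-2, "N" and after -> node-3.
BOUNDS = ["G", "N"]
NODES = ["node-1", "node-2", "node-3"]

def range_partition(key):
    """Binary-search the sorted boundary list for the node owning this key."""
    return NODES[bisect.bisect_right(BOUNDS, key[0].upper())]

print(range_partition("apple"))    # node-1
print(range_partition("kiwi"))     # node-2
print(range_partition("orange"))   # node-3
```

Because neighboring keys land on the same node, a range scan touches one (or few) partitions — and, for the same reason, sequential inserts hammer one partition. A common mitigation is prefixing the key with something non-sequential (a hash bucket or tenant ID).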
9. Hash-Based Partitioning
*Diagram: with 3 nodes, hash(key) mod 3 places alice (hash 7 → 7 mod 3 = 1) on node 1, bob (hash 12 → 0) on node 0, and charlie (hash 5 → 2) on node 2. After adding a fourth node, hash mod 4 moves alice to node 3 and charlie to node 1; only bob stays put. Adding one node reshuffles roughly 75% of all keys.*
Hash mod N partitioning. Hash the key, mod by node count — even distribution, no hotspots. But adding or removing a node changes the mod, reshuffling most keys. This triggers massive data migration. Consistent hashing solves this.
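You can measure the reshuffle directly. A sketch that hashes 1,000 synthetic keys with 3 nodes and again with 4, then counts how many keys changed owners (MD5 is used only because Python's built-in `hash()` is salted per process):

```python
import hashlib

def node_for(key, num_nodes):
    """Stable hash-mod-N placement."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user-{i}" for i in range(1000)]
before = {k: node_for(k, 3) for k in keys}
after = {k: node_for(k, 4) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")   # typically around 75%
```

The intuition: a key stays put only when `h mod 3 == h mod 4`, which happens for roughly 1 in 4 keys — the other ~75% must migrate, which is the mass data movement consistent hashing avoids.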
10. Consistent Hashing
*Diagram: nodes A (90°), B (210°), and C (330°) sit on a hash ring; key1 (45°) maps to A, key2 (150°) to B, key3 (270°) to C — each key goes to the next node clockwise. Adding node D at 180° moves only key2 (from B to D); A keeps key1 and C keeps key3. Only ~1/N of the keys move.*
Consistent hashing. Nodes and keys are placed on the same hash ring. Each key maps to the next node clockwise. Adding or removing a node only moves ~1/N of the keys — not all of them. DynamoDB, Cassandra, and Memcached all use consistent hashing.
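A minimal ring is a sorted list of (hash, node) pairs plus a binary search for "next position clockwise". This sketch omits virtual nodes (covered next) and replication:

```python
import bisect
import hashlib

def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: a key maps to the next node clockwise."""

    def __init__(self, nodes):
        # Ring = node positions sorted by hash value.
        self.ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key):
        h = _hash(key)
        # First ring position at or after the key's hash; wrap past the top.
        i = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-A", "node-B", "node-C"])
print(ring.node_for("key1"))
```

The key property: when you add a node, any key that moves can only move *to the new node* — everything else keeps its owner, because no existing arc boundaries change.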
Level 4 — Advanced Patterns
11. Consistent Hashing with Virtual Nodes
*Diagram: with only physical nodes on the ring, ownership is skewed — node A owns 60%, node B 15%, node C 25%. With 4 virtual nodes each (A-1 through A-4, B-1 through B-3, C-1 through C-3 interleaved around the ring), each physical node ends up owning ~33%.*
Virtual nodes. With only physical nodes on the ring, distribution is uneven. Virtual nodes place each physical node at many positions on the ring (e.g., 256 vnodes per server), which smooths out the distribution. Cassandra configures this via its num_tokens setting.
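Extending the ring sketch with vnodes is a one-line change: hash each node name several times with a suffix. Node names and vnode count here are illustrative:

```python
import bisect
import hashlib
from collections import Counter

def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=128):
    """Place each physical node at `vnodes` positions on the ring."""
    return sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def node_for(ring, key):
    i = bisect.bisect(ring, (_hash(key), "")) % len(ring)
    return ring[i][1]

ring = build_ring(["A", "B", "C"], vnodes=128)
load = Counter(node_for(ring, f"key-{i}") for i in range(3000))
print(load)   # each node owns roughly a third of the keys
```

More vnodes means smoother balance (relative skew shrinks roughly with the square root of the vnode count) but a bigger ring to search and more metadata to gossip, which is why defaults sit in the tens-to-hundreds range rather than thousands.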
12. Replication + Partitioning Combined
*Diagram: six partitions (P1-P6) spread over four nodes, each partition replicated three times with one designated leader — e.g., node 1 holds P1 (leader) plus replica copies of P2 and P6; node 4 leads P4, P5, and P6 and holds replicas of P1 and P3. If node 2 dies, P1, P2, and P3 still have copies on surviving nodes — no data lost.*
Partition + replicate. Data is split into partitions for write throughput, then each partition is replicated across N nodes for durability. Each partition has a leader for writes and followers for reads. Kafka, CockroachDB, and Cassandra all combine partitioning with replication.
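A simple way to generate such a layout is round-robin placement: partition p's leader is node p mod N, and its followers are the next RF−1 nodes. This is a sketch of the general idea, not any particular system's assignor:

```python
def place_partitions(num_partitions, nodes, replication_factor):
    """Round-robin leader/follower placement.

    Spreads leaders evenly (every node leads some partitions) and puts
    each partition's replicas on replication_factor distinct nodes.
    """
    n = len(nodes)
    assert replication_factor <= n, "need at least RF nodes"
    assignment = {}
    for p in range(num_partitions):
        replicas = [nodes[(p + i) % n] for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment

layout = place_partitions(6, ["n1", "n2", "n3", "n4"], replication_factor=3)
print(layout[0])   # {'leader': 'n1', 'followers': ['n2', 'n3']}
```

Spreading leaders matters as much as spreading replicas: leaders take all the write traffic for their partitions, so a node leading too many partitions becomes the new bottleneck.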
13. Rack-Aware Data Placement
*Diagram: three racks with two nodes each. Partition A's leader sits on rack 1 with replicas on racks 2 and 3; partition B leads on rack 2 with replicas on racks 1 and 3; partition C leads on rack 3 with replicas on racks 1 and 2. If rack 1 loses power, every partition still has copies on racks 2 and 3 — all partitions survive.*
Rack-aware placement. Never put all replicas of a partition on the same rack, zone, or region. If a rack loses power, every partition should still have a majority of replicas alive in other racks. HDFS, Cassandra, and Kafka all support rack-aware replica placement.
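The core rule — cycle through racks before reusing one — can be sketched in a few lines. This is a deterministic toy placer (rack and node names hypothetical), not HDFS's or Kafka's actual algorithm:

```python
def rack_aware_replicas(racks, replication_factor):
    """Pick replicas by cycling through racks, so the first len(racks)
    replicas all land on different racks.

    racks: {rack_name: [node, ...]}
    """
    rack_names = sorted(racks)
    used_per_rack = {r: 0 for r in rack_names}
    replicas = []
    i = 0
    while len(replicas) < replication_factor:
        rack = rack_names[i % len(rack_names)]
        nodes = racks[rack]
        # Within a rack, cycle through its nodes in order.
        replicas.append(nodes[used_per_rack[rack] % len(nodes)])
        used_per_rack[rack] += 1
        i += 1
    return replicas

racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
print(rack_aware_replicas(racks, 3))   # ['n1', 'n3', 'n5'] -- one per rack
```

With RF=3 and three racks, losing any single rack leaves two of three replicas alive — enough for a majority quorum, which is the property rack-aware placement is really buying you.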
TL;DR — When to Use What
| Scenario | Strategy | Why |
|---|---|---|
| Read-heavy, single region | Leader-Follower | Simple, add read replicas as needed |
| Multi-region writes | Multi-Leader | Low-latency local writes |
| Maximum availability | Leaderless Quorum | No single point of failure |
| Sorted range queries | Range Partitioning | Efficient scans, co-located data |
| Even distribution | Hash Partitioning | No hotspots, but no range queries |
| Elastic scaling | Consistent Hashing | Minimal data movement on resize |
| Production systems | All of the above | Partition + replicate + rack-aware |
- Starting simple? → Leader-Follower replication. It covers 80% of use cases.
- Outgrowing a single writer? → Partition your data with consistent hashing.
- Going multi-region? → Multi-Leader or leaderless with tunable consistency.
- Already at scale? → You’re combining all of these. The real skill is knowing which dials to turn.