The Two Generals’ Problem
The Two Generals’ Problem, also known as the Two Armies Problem, is a classic thought experiment that demonstrates the impossibility of achieving perfect consensus over an unreliable communication channel. It was first formulated by E. A. Akkoyunlu, K. Ekanadham, and R. V. Huber in 1975.
The Scenario
Two armies need to coordinate an attack:
- General A and General B surround an enemy
- They must attack simultaneously to win
- If only one attacks → defeat
- They communicate via messengers through enemy territory
- Messages can be lost or intercepted
- Question: Can they guarantee coordinated attack?
The Impossible Dilemma
at dawn GA->>Enemy: "Attack at dawn" Enemy->>GB: Message delivered Note over GB: Received message,
but A doesn't know! GB->>Enemy: "Acknowledged" Enemy->>GA: ACK delivered? Note over GA: Received ACK,
but B doesn't know! GA->>Enemy: "ACK of ACK" Enemy->>GB: Delivered? Note over GA,GB: This never ends!
The Core Problem
The infinite regress:
- General A sends: “Attack at dawn”
- Did B receive it? A doesn’t know!
- B sends ACK: “Got it, will attack”
- Did A receive ACK? B doesn’t know!
- A sends ACK-of-ACK: “Got your ACK”
- Did B receive it? A doesn’t know!
- → Infinite cycle, no guaranteed consensus
The Impossibility Result
Theorem: It is impossible to achieve guaranteed consensus between two parties communicating over an unreliable channel.
Proof sketch:
- Assume protocol with n messages achieves consensus
- Last message cannot be essential (might be lost)
- Protocol with n-1 messages must work
- By induction, 0 messages must work → contradiction
Real-World Implications
This impossibility affects many systems:
- TCP handshake: Uses 3-way handshake (pragmatic, not perfect)
- Distributed transactions: 2-phase commit has coordinator
- Network protocols: Use timeouts and retries
- Microservices: Eventual consistency instead of strong consistency
- Blockchain: Use probabilistic finality
Implementation in Go
Demonstrating the Problem
package main
import (
"fmt"
"math/rand"
"strings"
"sync"
"time"
)
// MessageType represents different message types
type MessageType string
const (
Attack MessageType = "ATTACK"
Acknowledge MessageType = "ACK"
AckOfAck MessageType = "ACK_OF_ACK"
)
// Message sent between generals
type Message struct {
Type MessageType
Sender string
Recipient string
SeqNum int
Timestamp time.Time
}
// UnreliableChannel simulates enemy territory
type UnreliableChannel struct {
LossRate float64 // Probability of message loss
Delay time.Duration
Delivered int
Lost int
mu sync.Mutex
}
func NewUnreliableChannel(lossRate float64, delay time.Duration) *UnreliableChannel {
return &UnreliableChannel{
LossRate: lossRate,
Delay: delay,
}
}
// Send attempts to deliver a message
func (uc *UnreliableChannel) Send(msg Message, recipient chan Message) bool {
uc.mu.Lock()
defer uc.mu.Unlock()
// Simulate network delay
time.Sleep(uc.Delay)
// Random message loss
if rand.Float64() < uc.LossRate {
uc.Lost++
fmt.Printf(" 📪 Message LOST: %s from %s → %s (seq: %d)\n",
msg.Type, msg.Sender, msg.Recipient, msg.SeqNum)
return false
}
// Deliver message
select {
case recipient <- msg:
uc.Delivered++
fmt.Printf(" ✉️ Message delivered: %s from %s → %s (seq: %d)\n",
msg.Type, msg.Sender, msg.Recipient, msg.SeqNum)
return true
case <-time.After(100 * time.Millisecond):
uc.Lost++
return false
}
}
// General represents one of the two generals
type General struct {
Name string
Inbox chan Message
Outbox chan Message
Channel *UnreliableChannel
Peer *General
AttackDecision bool
Confidence int // How confident are we?
MaxRounds int
mu sync.Mutex
wg sync.WaitGroup
}
func NewGeneral(name string, channel *UnreliableChannel, maxRounds int) *General {
return &General{
Name: name,
Inbox: make(chan Message, 100),
Outbox: make(chan Message, 100),
Channel: channel,
MaxRounds: maxRounds,
}
}
// Start begins general's message handling
func (g *General) Start() {
g.wg.Add(1)
go g.run()
}
// Stop gracefully stops the general
func (g *General) Stop() {
close(g.Inbox)
g.wg.Wait()
}
// run is the main event loop
func (g *General) run() {
defer g.wg.Done()
timeout := time.NewTimer(5 * time.Second)
defer timeout.Stop()
for {
select {
case msg, ok := <-g.Inbox:
if !ok {
return
}
g.handleMessage(msg)
case <-timeout.C:
return
}
}
}
// handleMessage processes incoming messages
func (g *General) handleMessage(msg Message) {
g.mu.Lock()
defer g.mu.Unlock()
fmt.Printf("[%s] 📨 Received %s (seq: %d)\n", g.Name, msg.Type, msg.SeqNum)
switch msg.Type {
case Attack:
// Received attack order
g.AttackDecision = true
g.Confidence++
// Send acknowledgment
ack := Message{
Type: Acknowledge,
Sender: g.Name,
Recipient: msg.Sender,
SeqNum: msg.SeqNum + 1,
Timestamp: time.Now(),
}
g.sendMessage(ack)
case Acknowledge:
// Received acknowledgment
g.Confidence++
fmt.Printf("[%s] ✅ Peer acknowledged! Confidence: %d\n", g.Name, g.Confidence)
// Send ACK of ACK
if g.Confidence < g.MaxRounds {
ackOfAck := Message{
Type: AckOfAck,
Sender: g.Name,
Recipient: msg.Sender,
SeqNum: msg.SeqNum + 1,
Timestamp: time.Now(),
}
g.sendMessage(ackOfAck)
}
case AckOfAck:
// Received ACK of ACK
g.Confidence++
fmt.Printf("[%s] ✅ Peer confirmed ACK! Confidence: %d\n", g.Name, g.Confidence)
// Continue the cycle?
if g.Confidence < g.MaxRounds {
ack := Message{
Type: Acknowledge,
Sender: g.Name,
Recipient: msg.Sender,
SeqNum: msg.SeqNum + 1,
Timestamp: time.Now(),
}
g.sendMessage(ack)
}
}
}
// sendMessage sends a message through the unreliable channel
func (g *General) sendMessage(msg Message) {
go func() {
fmt.Printf("[%s] 📤 Sending %s (seq: %d)\n", g.Name, msg.Type, msg.SeqNum)
g.Channel.Send(msg, g.Peer.Inbox)
}()
}
// InitiateAttack starts the coordination protocol
func (g *General) InitiateAttack() {
g.mu.Lock()
defer g.mu.Unlock()
fmt.Printf("\n[%s] 🎯 Initiating attack coordination\n", g.Name)
g.AttackDecision = true
msg := Message{
Type: Attack,
Sender: g.Name,
Recipient: g.Peer.Name,
SeqNum: 1,
Timestamp: time.Now(),
}
g.sendMessage(msg)
}
// WillAttack returns whether general will attack
func (g *General) WillAttack() bool {
g.mu.Lock()
defer g.mu.Unlock()
return g.AttackDecision && g.Confidence > 0
}
// GetConfidence returns confidence level
func (g *General) GetConfidence() int {
g.mu.Lock()
defer g.mu.Unlock()
return g.Confidence
}
func main() {
rand.Seed(42)
fmt.Println("=== Two Generals' Problem ===")
fmt.Println("Demonstrating impossibility of guaranteed consensus\n")
scenarios := []struct {
name string
lossRate float64
maxRounds int
}{
{"Perfect channel (0% loss)", 0.0, 3},
{"Low loss (20%)", 0.2, 5},
{"High loss (50%)", 0.5, 10},
}
for _, scenario := range scenarios {
fmt.Println("\n" + strings.Repeat("=", 60))
fmt.Printf("Scenario: %s\n", scenario.name)
fmt.Println(strings.Repeat("=", 60) + "\n")
runScenario(scenario.lossRate, scenario.maxRounds)
time.Sleep(1 * time.Second)
}
}
func runScenario(lossRate float64, maxRounds int) {
channel := NewUnreliableChannel(lossRate, 100*time.Millisecond)
generalA := NewGeneral("General A", channel, maxRounds)
generalB := NewGeneral("General B", channel, maxRounds)
// Link generals
generalA.Peer = generalB
generalB.Peer = generalA
// Start both generals
generalA.Start()
generalB.Start()
// General A initiates attack
time.Sleep(100 * time.Millisecond)
generalA.InitiateAttack()
// Let protocol run
time.Sleep(3 * time.Second)
// Check results
fmt.Println("\n📊 Results:")
fmt.Printf(" Channel stats: %d delivered, %d lost (%.1f%% loss)\n",
channel.Delivered, channel.Lost,
float64(channel.Lost)/float64(channel.Delivered+channel.Lost)*100)
aWillAttack := generalA.WillAttack()
bWillAttack := generalB.WillAttack()
aConfidence := generalA.GetConfidence()
bConfidence := generalB.GetConfidence()
fmt.Printf(" General A: Will attack=%v, Confidence=%d\n", aWillAttack, aConfidence)
fmt.Printf(" General B: Will attack=%v, Confidence=%d\n", bWillAttack, bConfidence)
if aWillAttack && bWillAttack {
fmt.Println(" ✅ Both generals will attack (coordinated)")
} else if !aWillAttack && !bWillAttack {
fmt.Println(" ⏸️ Neither general will attack (safe)")
} else {
fmt.Println(" ❌ UNCOORDINATED! One will attack, other won't (disaster)")
}
// Cleanup
generalA.Stop()
generalB.Stop()
}
How Real Systems Cope
Since perfect consensus is impossible, real systems use pragmatic solutions:
1. TCP Three-Way Handshake
// TCP doesn't solve Two Generals, but uses practical approach
type TCPHandshake struct {
ClientSeq int
ServerSeq int
Timeout time.Duration
}
// 1. Client → Server: SYN
// 2. Server → Client: SYN-ACK
// 3. Client → Server: ACK
// After 3-way handshake, both sides have "reasonable confidence"
// but not mathematical guarantee
2. Timeouts and Retries
func SendWithRetry(msg Message, maxRetries int, timeout time.Duration) error {
for i := 0; i < maxRetries; i++ {
if err := send(msg); err == nil {
// Wait for ACK
select {
case <-ackReceived:
return nil // Reasonable confidence
case <-time.After(timeout):
continue // Retry
}
}
}
return errors.New("failed after retries")
}
3. Eventual Consistency
// Accept that perfect consensus is impossible
// Use "eventually consistent" systems instead
type EventuallyConsistent struct {
State map[string]int
Conflicts int
}
// Updates propagate eventually
// Conflicts resolved via:
// - Last-write-wins
// - Vector clocks
// - CRDTs
4. Coordinator-Based Protocols
// Add a third party (coordinator)
// Breaks symmetry of Two Generals
type TwoPhaseCommit struct {
Coordinator *Node
Participants []*Node
}
// Phase 1: Coordinator asks "Can you commit?"
// Phase 2: If all yes, coordinator says "Commit!"
// Note: Coordinator can fail too (3-phase commit addresses this)
Real-World Examples
Microservices Communication
package main
import (
"context"
"fmt"
"time"
)
// Saga pattern for distributed transactions
type SagaOrchestrator struct {
Services []Service
Timeout time.Duration
}
type Service interface {
Execute(ctx context.Context) error
Compensate(ctx context.Context) error
}
func (so *SagaOrchestrator) Execute(ctx context.Context) error {
executed := []Service{}
for _, service := range so.Services {
ctx, cancel := context.WithTimeout(ctx, so.Timeout)
defer cancel()
if err := service.Execute(ctx); err != nil {
// Rollback all executed services
fmt.Printf("❌ Service failed, rolling back...\n")
for i := len(executed) - 1; i >= 0; i-- {
executed[i].Compensate(context.Background())
}
return err
}
executed = append(executed, service)
}
return nil
}
// Use compensation instead of guaranteed consensus
Idempotent Operations
// Make operations safe to retry
type IdempotentHandler struct {
Processed map[string]bool
mu sync.Mutex
}
func (ih *IdempotentHandler) Handle(requestID string, operation func()) {
ih.mu.Lock()
defer ih.mu.Unlock()
if ih.Processed[requestID] {
// Already processed, skip
return
}
operation()
ih.Processed[requestID] = true
}
// Allows safe retries without duplicates
Key Insights
1. Impossibility is Fundamental
No amount of clever engineering can solve Two Generals with 100% guarantee over unreliable channels.
2. Practical Solutions Exist
Systems use:
- Timeouts: Assume success after N confirmations
- Probabilistic: High confidence, not certainty
- Idempotency: Safe to retry
- Compensation: Undo on failure
3. CAP Theorem Connection
Two Generals relates to CAP theorem:
- Consistency
- Availability
- Partition tolerance
Can’t have all three!
Comparison Table
| Approach | Guarantee | Latency | Complexity | Use Case |
|---|---|---|---|---|
| Two Generals | Impossible | ∞ | Infinite | Theoretical |
| TCP Handshake | Practical | 1.5 RTT | Low | Networks |
| 2-Phase Commit | With coordinator | 2 RTT | Medium | Databases |
| Eventual Consistency | Eventually | Low | Medium | Distributed DBs |
| Saga Pattern | With compensation | High | High | Microservices |
When to Use What
✓ Use timeouts when:
- Perfect consistency not required
- Can tolerate occasional failure
- Low latency needed
✓ Use 2PC/3PC when:
- Strong consistency needed
- Coordinator available
- Can tolerate higher latency
✓ Use eventual consistency when:
- High availability critical
- Conflicts rare
- Can reconcile later
✓ Use saga pattern when:
- Long-running transactions
- Multiple services
- Compensating logic possible
Try It Yourself
- Vary loss rate - See how it affects coordination
- Add timeouts - Implement practical cutoffs
- Compare with 3-way handshake - TCP’s approach
- Implement compensation - Saga pattern
- Add vector clocks - Track causality
This is part 20 of “Golang Experiments: Classic Concurrency Problems”
Further Reading
- Original Paper: “Some Constraints and Tradeoffs in the Design of Network Communications” (1975)
- CAP Theorem: Brewer’s theorem on distributed systems
- TCP/IP Illustrated: Stevens’ analysis of TCP handshake
- Designing Data-Intensive Applications: Kleppmann’s chapter on consensus