The Two Generals’ Problem

The Two Generals’ Problem, also known as the Two Armies Problem, is a classic thought experiment that demonstrates the impossibility of achieving perfect consensus over an unreliable communication channel. It was first formulated by E. A. Akkoyunlu, K. Ekanadham, and R. V. Huber in 1975.

The Scenario

Two armies need to coordinate an attack:

  • General A and General B surround an enemy
  • They must attack simultaneously to win
  • If only one attacks → defeat
  • They communicate via messengers through enemy territory
  • Messages can be lost or intercepted
  • Question: Can they guarantee coordinated attack?

The Impossible Dilemma

sequenceDiagram participant GA as General A participant Enemy as Enemy Territory participant GB as General B Note over GA: Wants to attack
at dawn GA->>Enemy: "Attack at dawn" Enemy->>GB: Message delivered Note over GB: Received message,
but A doesn't know! GB->>Enemy: "Acknowledged" Enemy->>GA: ACK delivered? Note over GA: Received ACK,
but B doesn't know! GA->>Enemy: "ACK of ACK" Enemy->>GB: Delivered? Note over GA,GB: This never ends!

The Core Problem

The infinite regress:

  1. General A sends: “Attack at dawn”
  2. Did B receive it? A doesn’t know!
  3. B sends ACK: “Got it, will attack”
  4. Did A receive ACK? B doesn’t know!
  5. A sends ACK-of-ACK: “Got your ACK”
  6. Did B receive it? A doesn’t know!
  7. Infinite cycle, no guaranteed consensus

The Impossibility Result

Theorem: It is impossible to achieve guaranteed consensus between two parties communicating over an unreliable channel.

Proof sketch:

  • Assume protocol with n messages achieves consensus
  • Last message cannot be essential (might be lost)
  • Protocol with n-1 messages must work
  • By induction, 0 messages must work → contradiction

Real-World Implications

This impossibility affects many systems:

  • TCP handshake: Uses 3-way handshake (pragmatic, not perfect)
  • Distributed transactions: 2-phase commit has coordinator
  • Network protocols: Use timeouts and retries
  • Microservices: Eventual consistency instead of strong consistency
  • Blockchain: Use probabilistic finality

Implementation in Go

Demonstrating the Problem

package main

import (
	"fmt"
	"math/rand"
	"strings"
	"sync"
	"time"
)

// MessageType represents different message types
type MessageType string

const (
	Attack       MessageType = "ATTACK"
	Acknowledge  MessageType = "ACK"
	AckOfAck     MessageType = "ACK_OF_ACK"
)

// Message sent between generals
type Message struct {
	Type      MessageType
	Sender    string
	Recipient string
	SeqNum    int
	Timestamp time.Time
}

// UnreliableChannel simulates enemy territory
type UnreliableChannel struct {
	LossRate  float64 // Probability of message loss
	Delay     time.Duration
	Delivered int
	Lost      int
	mu        sync.Mutex
}

func NewUnreliableChannel(lossRate float64, delay time.Duration) *UnreliableChannel {
	return &UnreliableChannel{
		LossRate: lossRate,
		Delay:    delay,
	}
}

// Send attempts to deliver a message
func (uc *UnreliableChannel) Send(msg Message, recipient chan Message) bool {
	uc.mu.Lock()
	defer uc.mu.Unlock()

	// Simulate network delay
	time.Sleep(uc.Delay)

	// Random message loss
	if rand.Float64() < uc.LossRate {
		uc.Lost++
		fmt.Printf("  📪 Message LOST: %s from %s → %s (seq: %d)\n",
			msg.Type, msg.Sender, msg.Recipient, msg.SeqNum)
		return false
	}

	// Deliver message
	select {
	case recipient <- msg:
		uc.Delivered++
		fmt.Printf("  ✉️  Message delivered: %s from %s → %s (seq: %d)\n",
			msg.Type, msg.Sender, msg.Recipient, msg.SeqNum)
		return true
	case <-time.After(100 * time.Millisecond):
		uc.Lost++
		return false
	}
}

// General represents one of the two generals
type General struct {
	Name           string
	Inbox          chan Message
	Outbox         chan Message
	Channel        *UnreliableChannel
	Peer           *General
	AttackDecision bool
	Confidence     int // How confident are we?
	MaxRounds      int
	mu             sync.Mutex
	wg             sync.WaitGroup
}

func NewGeneral(name string, channel *UnreliableChannel, maxRounds int) *General {
	return &General{
		Name:       name,
		Inbox:      make(chan Message, 100),
		Outbox:     make(chan Message, 100),
		Channel:    channel,
		MaxRounds:  maxRounds,
	}
}

// Start begins general's message handling
func (g *General) Start() {
	g.wg.Add(1)
	go g.run()
}

// Stop gracefully stops the general
func (g *General) Stop() {
	close(g.Inbox)
	g.wg.Wait()
}

// run is the main event loop
func (g *General) run() {
	defer g.wg.Done()

	timeout := time.NewTimer(5 * time.Second)
	defer timeout.Stop()

	for {
		select {
		case msg, ok := <-g.Inbox:
			if !ok {
				return
			}
			g.handleMessage(msg)

		case <-timeout.C:
			return
		}
	}
}

// handleMessage processes incoming messages
func (g *General) handleMessage(msg Message) {
	g.mu.Lock()
	defer g.mu.Unlock()

	fmt.Printf("[%s] 📨 Received %s (seq: %d)\n", g.Name, msg.Type, msg.SeqNum)

	switch msg.Type {
	case Attack:
		// Received attack order
		g.AttackDecision = true
		g.Confidence++

		// Send acknowledgment
		ack := Message{
			Type:      Acknowledge,
			Sender:    g.Name,
			Recipient: msg.Sender,
			SeqNum:    msg.SeqNum + 1,
			Timestamp: time.Now(),
		}
		g.sendMessage(ack)

	case Acknowledge:
		// Received acknowledgment
		g.Confidence++
		fmt.Printf("[%s] ✅ Peer acknowledged! Confidence: %d\n", g.Name, g.Confidence)

		// Send ACK of ACK
		if g.Confidence < g.MaxRounds {
			ackOfAck := Message{
				Type:      AckOfAck,
				Sender:    g.Name,
				Recipient: msg.Sender,
				SeqNum:    msg.SeqNum + 1,
				Timestamp: time.Now(),
			}
			g.sendMessage(ackOfAck)
		}

	case AckOfAck:
		// Received ACK of ACK
		g.Confidence++
		fmt.Printf("[%s] ✅ Peer confirmed ACK! Confidence: %d\n", g.Name, g.Confidence)

		// Continue the cycle?
		if g.Confidence < g.MaxRounds {
			ack := Message{
				Type:      Acknowledge,
				Sender:    g.Name,
				Recipient: msg.Sender,
				SeqNum:    msg.SeqNum + 1,
				Timestamp: time.Now(),
			}
			g.sendMessage(ack)
		}
	}
}

// sendMessage sends a message through the unreliable channel
func (g *General) sendMessage(msg Message) {
	go func() {
		fmt.Printf("[%s] 📤 Sending %s (seq: %d)\n", g.Name, msg.Type, msg.SeqNum)
		g.Channel.Send(msg, g.Peer.Inbox)
	}()
}

// InitiateAttack starts the coordination protocol
func (g *General) InitiateAttack() {
	g.mu.Lock()
	defer g.mu.Unlock()

	fmt.Printf("\n[%s] 🎯 Initiating attack coordination\n", g.Name)
	g.AttackDecision = true

	msg := Message{
		Type:      Attack,
		Sender:    g.Name,
		Recipient: g.Peer.Name,
		SeqNum:    1,
		Timestamp: time.Now(),
	}

	g.sendMessage(msg)
}

// WillAttack returns whether general will attack
func (g *General) WillAttack() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.AttackDecision && g.Confidence > 0
}

// GetConfidence returns confidence level
func (g *General) GetConfidence() int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.Confidence
}

func main() {
	rand.Seed(42)

	fmt.Println("=== Two Generals' Problem ===")
	fmt.Println("Demonstrating impossibility of guaranteed consensus\n")

	scenarios := []struct {
		name     string
		lossRate float64
		maxRounds int
	}{
		{"Perfect channel (0% loss)", 0.0, 3},
		{"Low loss (20%)", 0.2, 5},
		{"High loss (50%)", 0.5, 10},
	}

	for _, scenario := range scenarios {
		fmt.Println("\n" + strings.Repeat("=", 60))
		fmt.Printf("Scenario: %s\n", scenario.name)
		fmt.Println(strings.Repeat("=", 60) + "\n")

		runScenario(scenario.lossRate, scenario.maxRounds)
		time.Sleep(1 * time.Second)
	}
}

func runScenario(lossRate float64, maxRounds int) {
	channel := NewUnreliableChannel(lossRate, 100*time.Millisecond)

	generalA := NewGeneral("General A", channel, maxRounds)
	generalB := NewGeneral("General B", channel, maxRounds)

	// Link generals
	generalA.Peer = generalB
	generalB.Peer = generalA

	// Start both generals
	generalA.Start()
	generalB.Start()

	// General A initiates attack
	time.Sleep(100 * time.Millisecond)
	generalA.InitiateAttack()

	// Let protocol run
	time.Sleep(3 * time.Second)

	// Check results
	fmt.Println("\n📊 Results:")
	fmt.Printf("  Channel stats: %d delivered, %d lost (%.1f%% loss)\n",
		channel.Delivered, channel.Lost,
		float64(channel.Lost)/float64(channel.Delivered+channel.Lost)*100)

	aWillAttack := generalA.WillAttack()
	bWillAttack := generalB.WillAttack()
	aConfidence := generalA.GetConfidence()
	bConfidence := generalB.GetConfidence()

	fmt.Printf("  General A: Will attack=%v, Confidence=%d\n", aWillAttack, aConfidence)
	fmt.Printf("  General B: Will attack=%v, Confidence=%d\n", bWillAttack, bConfidence)

	if aWillAttack && bWillAttack {
		fmt.Println("  ✅ Both generals will attack (coordinated)")
	} else if !aWillAttack && !bWillAttack {
		fmt.Println("  ⏸️  Neither general will attack (safe)")
	} else {
		fmt.Println("  ❌ UNCOORDINATED! One will attack, other won't (disaster)")
	}

	// Cleanup
	generalA.Stop()
	generalB.Stop()
}

How Real Systems Cope

Since perfect consensus is impossible, real systems use pragmatic solutions:

1. TCP Three-Way Handshake

// TCP doesn't solve Two Generals, but uses practical approach
type TCPHandshake struct {
	ClientSeq int
	ServerSeq int
	Timeout   time.Duration
}

// 1. Client → Server: SYN
// 2. Server → Client: SYN-ACK
// 3. Client → Server: ACK

// After 3-way handshake, both sides have "reasonable confidence"
// but not mathematical guarantee

2. Timeouts and Retries

func SendWithRetry(msg Message, maxRetries int, timeout time.Duration) error {
	for i := 0; i < maxRetries; i++ {
		if err := send(msg); err == nil {
			// Wait for ACK
			select {
			case <-ackReceived:
				return nil // Reasonable confidence
			case <-time.After(timeout):
				continue // Retry
			}
		}
	}
	return errors.New("failed after retries")
}

3. Eventual Consistency

// Accept that perfect consensus is impossible
// Use "eventually consistent" systems instead

type EventuallyConsistent struct {
	State     map[string]int
	Conflicts int
}

// Updates propagate eventually
// Conflicts resolved via:
// - Last-write-wins
// - Vector clocks
// - CRDTs

4. Coordinator-Based Protocols

// Add a third party (coordinator)
// Breaks symmetry of Two Generals

type TwoPhaseCommit struct {
	Coordinator *Node
	Participants []*Node
}

// Phase 1: Coordinator asks "Can you commit?"
// Phase 2: If all yes, coordinator says "Commit!"

// Note: Coordinator can fail too (3-phase commit addresses this)

Real-World Examples

Microservices Communication

package main

import (
	"context"
	"fmt"
	"time"
)

// Saga pattern for distributed transactions
type SagaOrchestrator struct {
	Services []Service
	Timeout  time.Duration
}

type Service interface {
	Execute(ctx context.Context) error
	Compensate(ctx context.Context) error
}

func (so *SagaOrchestrator) Execute(ctx context.Context) error {
	executed := []Service{}

	for _, service := range so.Services {
		ctx, cancel := context.WithTimeout(ctx, so.Timeout)
		defer cancel()

		if err := service.Execute(ctx); err != nil {
			// Rollback all executed services
			fmt.Printf("❌ Service failed, rolling back...\n")
			for i := len(executed) - 1; i >= 0; i-- {
				executed[i].Compensate(context.Background())
			}
			return err
		}
		executed = append(executed, service)
	}

	return nil
}

// Use compensation instead of guaranteed consensus

Idempotent Operations

// Make operations safe to retry
type IdempotentHandler struct {
	Processed map[string]bool
	mu        sync.Mutex
}

func (ih *IdempotentHandler) Handle(requestID string, operation func()) {
	ih.mu.Lock()
	defer ih.mu.Unlock()

	if ih.Processed[requestID] {
		// Already processed, skip
		return
	}

	operation()
	ih.Processed[requestID] = true
}

// Allows safe retries without duplicates

Key Insights

1. Impossibility is Fundamental

No amount of clever engineering can solve Two Generals with 100% guarantee over unreliable channels.

2. Practical Solutions Exist

Systems use:

  • Timeouts: Assume success after N confirmations
  • Probabilistic: High confidence, not certainty
  • Idempotency: Safe to retry
  • Compensation: Undo on failure

3. CAP Theorem Connection

Two Generals relates to CAP theorem:

  • Consistency
  • Availability
  • Partition tolerance

Can’t have all three!

Comparison Table

Approach Guarantee Latency Complexity Use Case
Two Generals Impossible Infinite Theoretical
TCP Handshake Practical 1.5 RTT Low Networks
2-Phase Commit With coordinator 2 RTT Medium Databases
Eventual Consistency Eventually Low Medium Distributed DBs
Saga Pattern With compensation High High Microservices

When to Use What

Use timeouts when:

  • Perfect consistency not required
  • Can tolerate occasional failure
  • Low latency needed

Use 2PC/3PC when:

  • Strong consistency needed
  • Coordinator available
  • Can tolerate higher latency

Use eventual consistency when:

  • High availability critical
  • Conflicts rare
  • Can reconcile later

Use saga pattern when:

  • Long-running transactions
  • Multiple services
  • Compensating logic possible

Try It Yourself

  1. Vary loss rate - See how it affects coordination
  2. Add timeouts - Implement practical cutoffs
  3. Compare with 3-way handshake - TCP’s approach
  4. Implement compensation - Saga pattern
  5. Add vector clocks - Track causality

This is part 20 of “Golang Experiments: Classic Concurrency Problems”

Further Reading

  • Original Paper: “Some Constraints and Tradeoffs in the Design of Network Communications” (1975)
  • CAP Theorem: Brewer’s theorem on distributed systems
  • TCP/IP Illustrated: Stevens’ analysis of TCP handshake
  • Designing Data-Intensive Applications: Kleppmann’s chapter on consensus