The Byzantine Generals Problem

The Byzantine Generals Problem, proposed by Leslie Lamport, Robert Shostak, and Marshall Pease in 1982, is one of the most important problems in distributed systems. It addresses the challenge of achieving consensus when some participants may be faulty or malicious.

The Scenario

Byzantine army divisions surround a city:

  • N generals command their divisions
  • They must coordinate: attack or retreat
  • They communicate via messengers
  • Some generals are traitors who send conflicting messages
  • Goal: All loyal generals must agree on the same plan

The challenge:

  • If loyalists attack while others retreat → disaster
  • Traitors try to prevent consensus
  • Messages can be altered by traitor generals
  • A traitor commander can send different messages to different generals

The Core Problem

graph TD subgraph "Scenario 1: Traitor Lieutenant" C1[Commander: ATTACK] L1[Lieutenant 1: Loyal] L2[Lieutenant 2: TRAITOR] L3[Lieutenant 3: Loyal] C1 -->|attack| L1 C1 -->|attack| L2 C1 -->|attack| L3 L2 -->|"tells L1: retreat"| L1 L2 -->|"tells L3: attack"| L3 end subgraph "Scenario 2: Traitor Commander" C2[Commander: TRAITOR] L4[Lieutenant 1: Loyal] L5[Lieutenant 2: Loyal] L6[Lieutenant 3: Loyal] C2 -->|"tells L4: attack"| L4 C2 -->|"tells L5: retreat"| L5 C2 -->|"tells L6: attack"| L6 end

The Mathematical Result

Lamport’s Impossibility Result:

  • With n total generals
  • And f traitors
  • You need n ≥ 3f + 1 to reach consensus

Examples:

  • 1 traitor → need at least 4 generals
  • 2 traitors → need at least 7 generals
  • 3 traitors → need at least 10 generals

Real-World Applications

This pattern appears in critical systems:

  • Blockchain: Bitcoin, Ethereum use Byzantine consensus
  • Distributed databases: Ensure data consistency despite node failures
  • Flight control systems: Multiple redundant computers must agree
  • Nuclear reactor control: Safety-critical decisions with redundancy
  • Space missions: Multiple sensors must agree despite failures

Implementation in Go

Part 1: Basic Byzantine Consensus (Oral Messages)

package main

import (
	"fmt"
	"math/rand"
	"strings"
	"sync"
)

// Order represents the command
type Order string

const (
	Attack  Order = "ATTACK"
	Retreat Order = "RETREAT"
)

// Message sent between generals
type Message struct {
	Order  Order
	Sender int
	Path   []int // Track message path to detect forgery
}

// General represents a node in the system
type General struct {
	ID        int
	IsTraitor bool
	Received  map[int][]Message // Messages from each round
	mu        sync.Mutex
}

// ByzantineSystem coordinates the generals
type ByzantineSystem struct {
	Generals   []*General
	NumRounds  int
	Commander  *General
	FinalOrder Order
}

func NewByzantineSystem(numGenerals, numTraitors int) *ByzantineSystem {
	generals := make([]*General, numGenerals)

	// Create generals
	for i := 0; i < numGenerals; i++ {
		generals[i] = &General{
			ID:       i,
			Received: make(map[int][]Message),
		}
	}

	// Randomly assign traitors
	traitorIndices := rand.Perm(numGenerals)[:numTraitors]
	for _, idx := range traitorIndices {
		generals[idx].IsTraitor = true
		fmt.Printf("⚠️  General %d is a TRAITOR\n", idx)
	}

	return &ByzantineSystem{
		Generals:  generals,
		NumRounds: numTraitors + 1, // Need f+1 rounds for f traitors
		Commander: generals[0],
	}
}

// SendOrder - Commander sends initial order
func (bs *ByzantineSystem) SendOrder(order Order) {
	fmt.Printf("\n📜 Commander (General %d) sends order: %s\n", bs.Commander.ID, order)

	if bs.Commander.IsTraitor {
		// Traitor commander sends conflicting orders
		fmt.Println("🔥 Commander is a traitor! Sending conflicting orders...")
		for i, general := range bs.Generals {
			if i == bs.Commander.ID {
				continue
			}

			// Send random conflicting orders
			conflictingOrder := Attack
			if rand.Float32() > 0.5 {
				conflictingOrder = Retreat
			}

			msg := Message{
				Order:  conflictingOrder,
				Sender: bs.Commander.ID,
				Path:   []int{bs.Commander.ID},
			}
			general.ReceiveMessage(msg, 0)
			fmt.Printf("  → General %d receives: %s\n", i, conflictingOrder)
		}
	} else {
		// Loyal commander sends same order to all
		for i, general := range bs.Generals {
			if i == bs.Commander.ID {
				continue
			}

			msg := Message{
				Order:  order,
				Sender: bs.Commander.ID,
				Path:   []int{bs.Commander.ID},
			}
			general.ReceiveMessage(msg, 0)
			fmt.Printf("  → General %d receives: %s\n", i, order)
		}
	}
}

// ReceiveMessage - General receives a message
func (g *General) ReceiveMessage(msg Message, round int) {
	g.mu.Lock()
	defer g.mu.Unlock()

	if g.Received[round] == nil {
		g.Received[round] = []Message{}
	}
	g.Received[round] = append(g.Received[round], msg)
}

// RelayMessages - Generals relay messages to each other
func (bs *ByzantineSystem) RelayMessages(round int) {
	fmt.Printf("\n🔄 Round %d: Generals relay messages\n", round+1)

	var wg sync.WaitGroup

	for _, sender := range bs.Generals {
		wg.Add(1)
		go func(s *General) {
			defer wg.Done()

			s.mu.Lock()
			messages := s.Received[round]
			s.mu.Unlock()

			for _, msg := range messages {
				// Relay to all other generals
				for _, receiver := range bs.Generals {
					if receiver.ID == s.ID || receiver.ID == bs.Commander.ID {
						continue
					}

					// Traitor sends conflicting information
					relayedMsg := msg
					if s.IsTraitor {
						// Randomly flip the order
						if rand.Float32() > 0.5 {
							if relayedMsg.Order == Attack {
								relayedMsg.Order = Retreat
							} else {
								relayedMsg.Order = Attack
							}
						}
					}

					relayedMsg.Path = append(relayedMsg.Path, s.ID)
					receiver.ReceiveMessage(relayedMsg, round+1)
				}
			}
		}(sender)
	}

	wg.Wait()
}

// Decide - Each general decides based on majority
func (g *General) Decide() Order {
	g.mu.Lock()
	defer g.mu.Unlock()

	// Count votes from all rounds
	votes := make(map[Order]int)

	for round := 0; round < len(g.Received); round++ {
		for _, msg := range g.Received[round] {
			votes[msg.Order]++
		}
	}

	// Majority vote
	if votes[Attack] > votes[Retreat] {
		return Attack
	}
	return Retreat
}

// RunConsensus executes the Byzantine agreement protocol
func (bs *ByzantineSystem) RunConsensus(commanderOrder Order) {
	fmt.Println("\n=== Byzantine Generals Problem ===")
	fmt.Printf("Total generals: %d\n", len(bs.Generals))

	// Round 0: Commander sends initial order
	bs.SendOrder(commanderOrder)

	// Subsequent rounds: Relay messages
	for round := 0; round < bs.NumRounds-1; round++ {
		bs.RelayMessages(round)
	}

	// Each general decides
	fmt.Println("\n🎯 Final Decisions:")
	loyalDecisions := make(map[Order]int)

	for _, general := range bs.Generals {
		decision := general.Decide()
		fmt.Printf("  General %d (%s): %s\n",
			general.ID,
			map[bool]string{true: "TRAITOR", false: "LOYAL"}[general.IsTraitor],
			decision)

		if !general.IsTraitor {
			loyalDecisions[decision]++
		}
	}

	// Check consensus
	fmt.Println("\n📊 Consensus Result:")
	if len(loyalDecisions) == 1 {
		fmt.Println("✅ CONSENSUS ACHIEVED among loyal generals!")
		for order := range loyalDecisions {
			fmt.Printf("   All loyal generals agree: %s\n", order)
		}
	} else {
		fmt.Println("❌ CONSENSUS FAILED - loyal generals disagree!")
		fmt.Printf("   Attack: %d, Retreat: %d\n",
			loyalDecisions[Attack], loyalDecisions[Retreat])
	}
}

func main() {
	rand.Seed(42) // For reproducible results

	fmt.Println("Scenario 1: Loyal Commander, 1 Traitor Lieutenant")
	fmt.Println("(Requires n ≥ 3f+1, so 4 generals for 1 traitor)")
	fmt.Println(strings.Repeat("=", 60))

	system1 := NewByzantineSystem(4, 1)
	system1.RunConsensus(Attack)

	fmt.Println("\n\n")

	fmt.Println("Scenario 2: Traitor Commander")
	fmt.Println("(Consensus should still be achieved)")
	fmt.Println(strings.Repeat("=", 60))

	system2 := NewByzantineSystem(4, 1)
	system2.Generals[0].IsTraitor = true // Make commander a traitor
	fmt.Printf("⚠️  General 0 (Commander) is a TRAITOR\n")
	system2.RunConsensus(Attack)

	fmt.Println("\n\n")

	fmt.Println("Scenario 3: Too Few Generals (Consensus Impossible)")
	fmt.Println("(Only 3 generals with 1 traitor - violates n ≥ 3f+1)")
	fmt.Println(strings.Repeat("=", 60))

	system3 := NewByzantineSystem(3, 1)
	system3.RunConsensus(Attack)
}

How It Works

1. The n ≥ 3f+1 Requirement

graph LR subgraph "4 Generals, 1 Traitor: WORKS" A[3 Loyal] --> B[Can identify traitor] B --> C[Achieve consensus] end subgraph "3 Generals, 1 Traitor: FAILS" D[2 Loyal] --> E[Cannot identify traitor] E --> F[No consensus possible] end

2. Message Relay Protocol

// Round 0: Commander sends
for each lieutenant {
    send(order)
}

// Round 1-f: Lieutenants relay
for each round {
    for each message received {
        relay to all other lieutenants
    }
}

// Decision: Majority vote
decision = majority(all_received_messages)

3. Handling Traitor Commander

Even with a traitor commander, loyal generals can achieve consensus:

// Each loyal general collects messages
// Uses majority voting across all rounds
// If commander is traitor, his conflicting messages
// are outvoted by consistent relays from loyal lieutenants

Advanced: Practical Byzantine Fault Tolerance (PBFT)

Real systems use optimized protocols like PBFT:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

// PBFTMessage represents a message in PBFT
type PBFTMessage struct {
	Type      string // "pre-prepare", "prepare", "commit"
	View      int
	Sequence  int
	Digest    string
	ReplicaID int
	Timestamp time.Time
}

// PBFTReplica represents a replica in PBFT
type PBFTReplica struct {
	ID            int
	IsFaulty      bool
	View          int
	Sequence      int
	PrePrepared   map[string]*PBFTMessage
	Prepared      map[string]int
	Committed     map[string]int
	Executed      map[int]bool
	mu            sync.Mutex
	MessageLog    []PBFTMessage
}

type PBFTSystem struct {
	Replicas  []*PBFTReplica
	NumFaulty int
	Client    string
}

func NewPBFTSystem(numReplicas, numFaulty int) *PBFTSystem {
	replicas := make([]*PBFTReplica, numReplicas)

	for i := 0; i < numReplicas; i++ {
		replicas[i] = &PBFTReplica{
			ID:          i,
			PrePrepared: make(map[string]*PBFTMessage),
			Prepared:    make(map[string]int),
			Committed:   make(map[string]int),
			Executed:    make(map[int]bool),
		}
	}

	// Randomly assign faulty nodes
	for i := 0; i < numFaulty && i < numReplicas; i++ {
		replicas[i].IsFaulty = true
		fmt.Printf("⚠️  Replica %d is FAULTY\n", i)
	}

	return &PBFTSystem{
		Replicas:  replicas,
		NumFaulty: numFaulty,
		Client:    "client-1",
	}
}

// Request initiates a client request
func (pbft *PBFTSystem) Request(operation string) {
	fmt.Printf("\n📨 Client request: %s\n", operation)

	// Create digest
	hash := sha256.Sum256([]byte(operation))
	digest := hex.EncodeToString(hash[:])

	// Primary (replica 0) sends pre-prepare
	primary := pbft.Replicas[0]
	primary.mu.Lock()
	primary.Sequence++
	seq := primary.Sequence
	primary.mu.Unlock()

	msg := PBFTMessage{
		Type:      "pre-prepare",
		View:      0,
		Sequence:  seq,
		Digest:    digest,
		ReplicaID: 0,
		Timestamp: time.Now(),
	}

	fmt.Printf("1️⃣  PRE-PREPARE phase (seq=%d)\n", seq)
	pbft.BroadcastPrePrepare(msg)

	// Wait for consensus
	time.Sleep(100 * time.Millisecond)
	pbft.CheckConsensus(digest, seq)
}

// BroadcastPrePrepare sends pre-prepare to all replicas
func (pbft *PBFTSystem) BroadcastPrePrepare(msg PBFTMessage) {
	for _, replica := range pbft.Replicas {
		if replica.ID == 0 {
			continue
		}
		replica.ReceivePrePrepare(msg, pbft)
	}
}

// ReceivePrePrepare handles pre-prepare message
func (r *PBFTReplica) ReceivePrePrepare(msg PBFTMessage, pbft *PBFTSystem) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if r.IsFaulty {
		// Faulty node ignores or sends conflicting messages
		return
	}

	r.PrePrepared[msg.Digest] = &msg
	r.MessageLog = append(r.MessageLog, msg)

	// Send PREPARE to all other replicas
	prepareMsg := PBFTMessage{
		Type:      "prepare",
		View:      msg.View,
		Sequence:  msg.Sequence,
		Digest:    msg.Digest,
		ReplicaID: r.ID,
		Timestamp: time.Now(),
	}

	fmt.Printf("   Replica %d: PRE-PREPARE accepted, sending PREPARE\n", r.ID)

	// Broadcast prepare
	go func() {
		for _, replica := range pbft.Replicas {
			if replica.ID != r.ID {
				replica.ReceivePrepare(prepareMsg)
			}
		}
	}()
}

// ReceivePrepare handles prepare message
func (r *PBFTReplica) ReceivePrepare(msg PBFTMessage) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if r.IsFaulty {
		return
	}

	r.Prepared[msg.Digest]++
	r.MessageLog = append(r.MessageLog, msg)

	// Need 2f+1 prepares (including self)
	requiredPrepares := 2*len(r.Prepared)/3 + 1

	if r.Prepared[msg.Digest] >= requiredPrepares {
		fmt.Printf("2️⃣  Replica %d: PREPARED (received %d PREPARE messages)\n",
			r.ID, r.Prepared[msg.Digest])

		// Send COMMIT
		commitMsg := PBFTMessage{
			Type:      "commit",
			View:      msg.View,
			Sequence:  msg.Sequence,
			Digest:    msg.Digest,
			ReplicaID: r.ID,
			Timestamp: time.Now(),
		}

		// Broadcast commit
		// (simplified: just increment counter)
		r.Committed[msg.Digest]++
	}
}

// CheckConsensus checks if consensus was reached
func (pbft *PBFTSystem) CheckConsensus(digest string, seq int) {
	fmt.Println("\n3️⃣  COMMIT phase:")

	committed := 0
	for _, replica := range pbft.Replicas {
		replica.mu.Lock()
		count := replica.Committed[digest]
		replica.mu.Unlock()

		if !replica.IsFaulty && count > 0 {
			committed++
			replica.mu.Lock()
			replica.Executed[seq] = true
			replica.mu.Unlock()
			fmt.Printf("   Replica %d: COMMITTED and EXECUTED\n", replica.ID)
		}
	}

	requiredCommits := 2*len(pbft.Replicas)/3 + 1

	fmt.Println("\n📊 Consensus Result:")
	if committed >= requiredCommits {
		fmt.Printf("✅ CONSENSUS ACHIEVED (%d/%d replicas committed)\n",
			committed, len(pbft.Replicas))
	} else {
		fmt.Printf("❌ CONSENSUS FAILED (%d/%d replicas committed)\n",
			committed, len(pbft.Replicas))
	}
}

func RunPBFT() {
	fmt.Println("\n=== Practical Byzantine Fault Tolerance (PBFT) ===\n")

	// 4 replicas can tolerate 1 fault (n = 3f + 1)
	system := NewPBFTSystem(4, 1)

	system.Request("SET x = 42")
}

func main() {
	RunPBFT()
}

Key Concepts

1. The 3f+1 Rule

Why this magic number?

  • Need majority of loyal nodes: ⌈(n+f)/2⌉ > f
  • Simplifies to: n > 3f
  • Minimum: n ≥ 3f + 1

2. Message Rounds

// Round complexity: O(n^(f+1))
// For f=1: O(n^2) messages
// For f=2: O(n^3) messages

3. Digital Signatures Help

With signatures (Byzantine Generals with signed messages):

  • Only need n ≥ 2f + 1
  • Fewer messages required
  • Can detect forgeries

Real-World Examples

Bitcoin (Proof of Work)

// Simplified Bitcoin consensus
type Block struct {
    PrevHash     string
    Transactions []string
    Nonce        int
    Hash         string
}

// Byzantine fault tolerance through computational work
func MineBlock(block *Block, difficulty int) {
    for {
        block.Hash = calculateHash(block)
        if hasLeadingZeros(block.Hash, difficulty) {
            break // Found valid block
        }
        block.Nonce++
    }
}

// Longest chain wins - Byzantine nodes can't
// rewrite history without 51% of hash power

Distributed Database

// Replicated state machine with Byzantine tolerance
type ReplicatedDB struct {
    Replicas []Replica
    State    map[string]string
}

func (db *ReplicatedDB) Write(key, value string) error {
    // Propose write to all replicas
    // Wait for 2f+1 confirmations
    // Apply to state
}

Testing Byzantine Consensus

func TestByzantineConsensus(t *testing.T) {
    scenarios := []struct {
        name       string
        numGenerals int
        numTraitors int
        shouldWork bool
    }{
        {"4 generals, 1 traitor", 4, 1, true},
        {"7 generals, 2 traitors", 7, 2, true},
        {"3 generals, 1 traitor", 3, 1, false},
        {"5 generals, 2 traitors", 5, 2, false},
    }

    for _, tc := range scenarios {
        t.Run(tc.name, func(t *testing.T) {
            system := NewByzantineSystem(tc.numGenerals, tc.numTraitors)
            system.RunConsensus(Attack)

            // Check if loyal generals agree
            decisions := make(map[Order]int)
            for _, g := range system.Generals {
                if !g.IsTraitor {
                    decisions[g.Decide()]++
                }
            }

            achieved := len(decisions) == 1
            if achieved != tc.shouldWork {
                t.Errorf("Expected consensus=%v, got %v", tc.shouldWork, achieved)
            }
        })
    }
}

Performance Characteristics

Metric Value
Messages (oral) O(n^(f+1))
Messages (signed) O(n²)
Rounds f + 1
Min nodes 3f + 1
Latency High (multiple rounds)
Complexity Exponential in f

When to Use

Use when:

  • System must tolerate arbitrary failures
  • Nodes may behave maliciously
  • No trusted party exists
  • Consensus is safety-critical
  • Examples: blockchain, aerospace, nuclear control

Avoid when:

  • Only crash failures possible (use Paxos/Raft instead)
  • Performance is critical
  • Trusted central authority exists
  • Cost of Byzantine tolerance too high

Try It Yourself

  1. Vary f and n - Test the 3f+1 boundary
  2. Add signatures - Implement signed messages
  3. Optimize PBFT - Reduce message complexity
  4. Simulate network partitions - Handle message delays
  5. Add view changes - Handle primary failures

This is part 17 of “Golang Experiments: Classic Concurrency Problems”

Further Reading

  • Original Paper: “The Byzantine Generals Problem” - Lamport, Shostak, Pease (1982)
  • PBFT Paper: “Practical Byzantine Fault Tolerance” - Castro & Liskov (1999)
  • Bitcoin Whitepaper: Satoshi Nakamoto (2008)