The Byzantine Generals Problem
The Byzantine Generals Problem, proposed by Leslie Lamport, Robert Shostak, and Marshall Pease in 1982, is one of the most important problems in distributed systems. It addresses the challenge of achieving consensus when some participants may be faulty or malicious.
The Scenario
Byzantine army divisions surround a city:
- N generals command their divisions
- They must coordinate: attack or retreat
- They communicate via messengers
- Some generals are traitors who send conflicting messages
- Goal: All loyal generals must agree on the same plan
The challenge:
- If loyalists attack while others retreat → disaster
- Traitors try to prevent consensus
- Messages can be altered by traitor generals
- A traitor commander can send different messages to different generals
The Core Problem
The Mathematical Result
Lamport’s Impossibility Result:
- With n total generals
- And f traitors
- You need n ≥ 3f + 1 to reach consensus
Examples:
- 1 traitor → need at least 4 generals
- 2 traitors → need at least 7 generals
- 3 traitors → need at least 10 generals
Real-World Applications
This pattern appears in critical systems:
- Blockchain: Bitcoin, Ethereum use Byzantine consensus
- Distributed databases: Ensure data consistency despite node failures
- Flight control systems: Multiple redundant computers must agree
- Nuclear reactor control: Safety-critical decisions with redundancy
- Space missions: Multiple sensors must agree despite failures
Implementation in Go
Part 1: Basic Byzantine Consensus (Oral Messages)
package main
import (
"fmt"
"math/rand"
"strings"
"sync"
)
// Order represents the command
type Order string
const (
Attack Order = "ATTACK"
Retreat Order = "RETREAT"
)
// Message sent between generals
type Message struct {
Order Order
Sender int
Path []int // Track message path to detect forgery
}
// General represents a node in the system
type General struct {
ID int
IsTraitor bool
Received map[int][]Message // Messages from each round
mu sync.Mutex
}
// ByzantineSystem coordinates the generals
type ByzantineSystem struct {
Generals []*General
NumRounds int
Commander *General
FinalOrder Order
}
func NewByzantineSystem(numGenerals, numTraitors int) *ByzantineSystem {
generals := make([]*General, numGenerals)
// Create generals
for i := 0; i < numGenerals; i++ {
generals[i] = &General{
ID: i,
Received: make(map[int][]Message),
}
}
// Randomly assign traitors
traitorIndices := rand.Perm(numGenerals)[:numTraitors]
for _, idx := range traitorIndices {
generals[idx].IsTraitor = true
fmt.Printf("⚠️ General %d is a TRAITOR\n", idx)
}
return &ByzantineSystem{
Generals: generals,
NumRounds: numTraitors + 1, // Need f+1 rounds for f traitors
Commander: generals[0],
}
}
// SendOrder - Commander sends initial order
func (bs *ByzantineSystem) SendOrder(order Order) {
fmt.Printf("\n📜 Commander (General %d) sends order: %s\n", bs.Commander.ID, order)
if bs.Commander.IsTraitor {
// Traitor commander sends conflicting orders
fmt.Println("🔥 Commander is a traitor! Sending conflicting orders...")
for i, general := range bs.Generals {
if i == bs.Commander.ID {
continue
}
// Send random conflicting orders
conflictingOrder := Attack
if rand.Float32() > 0.5 {
conflictingOrder = Retreat
}
msg := Message{
Order: conflictingOrder,
Sender: bs.Commander.ID,
Path: []int{bs.Commander.ID},
}
general.ReceiveMessage(msg, 0)
fmt.Printf(" → General %d receives: %s\n", i, conflictingOrder)
}
} else {
// Loyal commander sends same order to all
for i, general := range bs.Generals {
if i == bs.Commander.ID {
continue
}
msg := Message{
Order: order,
Sender: bs.Commander.ID,
Path: []int{bs.Commander.ID},
}
general.ReceiveMessage(msg, 0)
fmt.Printf(" → General %d receives: %s\n", i, order)
}
}
}
// ReceiveMessage - General receives a message
func (g *General) ReceiveMessage(msg Message, round int) {
g.mu.Lock()
defer g.mu.Unlock()
if g.Received[round] == nil {
g.Received[round] = []Message{}
}
g.Received[round] = append(g.Received[round], msg)
}
// RelayMessages - Generals relay messages to each other
func (bs *ByzantineSystem) RelayMessages(round int) {
fmt.Printf("\n🔄 Round %d: Generals relay messages\n", round+1)
var wg sync.WaitGroup
for _, sender := range bs.Generals {
wg.Add(1)
go func(s *General) {
defer wg.Done()
s.mu.Lock()
messages := s.Received[round]
s.mu.Unlock()
for _, msg := range messages {
// Relay to all other generals
for _, receiver := range bs.Generals {
if receiver.ID == s.ID || receiver.ID == bs.Commander.ID {
continue
}
// Traitor sends conflicting information
relayedMsg := msg
if s.IsTraitor {
// Randomly flip the order
if rand.Float32() > 0.5 {
if relayedMsg.Order == Attack {
relayedMsg.Order = Retreat
} else {
relayedMsg.Order = Attack
}
}
}
relayedMsg.Path = append(relayedMsg.Path, s.ID)
receiver.ReceiveMessage(relayedMsg, round+1)
}
}
}(sender)
}
wg.Wait()
}
// Decide - Each general decides based on majority
func (g *General) Decide() Order {
g.mu.Lock()
defer g.mu.Unlock()
// Count votes from all rounds
votes := make(map[Order]int)
for round := 0; round < len(g.Received); round++ {
for _, msg := range g.Received[round] {
votes[msg.Order]++
}
}
// Majority vote
if votes[Attack] > votes[Retreat] {
return Attack
}
return Retreat
}
// RunConsensus executes the Byzantine agreement protocol
func (bs *ByzantineSystem) RunConsensus(commanderOrder Order) {
fmt.Println("\n=== Byzantine Generals Problem ===")
fmt.Printf("Total generals: %d\n", len(bs.Generals))
// Round 0: Commander sends initial order
bs.SendOrder(commanderOrder)
// Subsequent rounds: Relay messages
for round := 0; round < bs.NumRounds-1; round++ {
bs.RelayMessages(round)
}
// Each general decides
fmt.Println("\n🎯 Final Decisions:")
loyalDecisions := make(map[Order]int)
for _, general := range bs.Generals {
decision := general.Decide()
fmt.Printf(" General %d (%s): %s\n",
general.ID,
map[bool]string{true: "TRAITOR", false: "LOYAL"}[general.IsTraitor],
decision)
if !general.IsTraitor {
loyalDecisions[decision]++
}
}
// Check consensus
fmt.Println("\n📊 Consensus Result:")
if len(loyalDecisions) == 1 {
fmt.Println("✅ CONSENSUS ACHIEVED among loyal generals!")
for order := range loyalDecisions {
fmt.Printf(" All loyal generals agree: %s\n", order)
}
} else {
fmt.Println("❌ CONSENSUS FAILED - loyal generals disagree!")
fmt.Printf(" Attack: %d, Retreat: %d\n",
loyalDecisions[Attack], loyalDecisions[Retreat])
}
}
func main() {
rand.Seed(42) // For reproducible results
fmt.Println("Scenario 1: Loyal Commander, 1 Traitor Lieutenant")
fmt.Println("(Requires n ≥ 3f+1, so 4 generals for 1 traitor)")
fmt.Println(strings.Repeat("=", 60))
system1 := NewByzantineSystem(4, 1)
system1.RunConsensus(Attack)
fmt.Println("\n\n")
fmt.Println("Scenario 2: Traitor Commander")
fmt.Println("(Consensus should still be achieved)")
fmt.Println(strings.Repeat("=", 60))
system2 := NewByzantineSystem(4, 1)
system2.Generals[0].IsTraitor = true // Make commander a traitor
fmt.Printf("⚠️ General 0 (Commander) is a TRAITOR\n")
system2.RunConsensus(Attack)
fmt.Println("\n\n")
fmt.Println("Scenario 3: Too Few Generals (Consensus Impossible)")
fmt.Println("(Only 3 generals with 1 traitor - violates n ≥ 3f+1)")
fmt.Println(strings.Repeat("=", 60))
system3 := NewByzantineSystem(3, 1)
system3.RunConsensus(Attack)
}
How It Works
1. The n ≥ 3f+1 Requirement
2. Message Relay Protocol
// Round 0: Commander sends
for each lieutenant {
send(order)
}
// Round 1-f: Lieutenants relay
for each round {
for each message received {
relay to all other lieutenants
}
}
// Decision: Majority vote
decision = majority(all_received_messages)
3. Handling Traitor Commander
Even with a traitor commander, loyal generals can achieve consensus:
// Each loyal general collects messages
// Uses majority voting across all rounds
// If commander is traitor, his conflicting messages
// are outvoted by consistent relays from loyal lieutenants
Advanced: Practical Byzantine Fault Tolerance (PBFT)
Real systems use optimized protocols like PBFT:
package main
import (
"crypto/sha256"
"encoding/hex"
"fmt"
"sync"
"time"
)
// PBFTMessage represents a message in PBFT
type PBFTMessage struct {
Type string // "pre-prepare", "prepare", "commit"
View int
Sequence int
Digest string
ReplicaID int
Timestamp time.Time
}
// PBFTReplica represents a replica in PBFT
type PBFTReplica struct {
ID int
IsFaulty bool
View int
Sequence int
PrePrepared map[string]*PBFTMessage
Prepared map[string]int
Committed map[string]int
Executed map[int]bool
mu sync.Mutex
MessageLog []PBFTMessage
}
type PBFTSystem struct {
Replicas []*PBFTReplica
NumFaulty int
Client string
}
func NewPBFTSystem(numReplicas, numFaulty int) *PBFTSystem {
replicas := make([]*PBFTReplica, numReplicas)
for i := 0; i < numReplicas; i++ {
replicas[i] = &PBFTReplica{
ID: i,
PrePrepared: make(map[string]*PBFTMessage),
Prepared: make(map[string]int),
Committed: make(map[string]int),
Executed: make(map[int]bool),
}
}
// Randomly assign faulty nodes
for i := 0; i < numFaulty && i < numReplicas; i++ {
replicas[i].IsFaulty = true
fmt.Printf("⚠️ Replica %d is FAULTY\n", i)
}
return &PBFTSystem{
Replicas: replicas,
NumFaulty: numFaulty,
Client: "client-1",
}
}
// Request initiates a client request
func (pbft *PBFTSystem) Request(operation string) {
fmt.Printf("\n📨 Client request: %s\n", operation)
// Create digest
hash := sha256.Sum256([]byte(operation))
digest := hex.EncodeToString(hash[:])
// Primary (replica 0) sends pre-prepare
primary := pbft.Replicas[0]
primary.mu.Lock()
primary.Sequence++
seq := primary.Sequence
primary.mu.Unlock()
msg := PBFTMessage{
Type: "pre-prepare",
View: 0,
Sequence: seq,
Digest: digest,
ReplicaID: 0,
Timestamp: time.Now(),
}
fmt.Printf("1️⃣ PRE-PREPARE phase (seq=%d)\n", seq)
pbft.BroadcastPrePrepare(msg)
// Wait for consensus
time.Sleep(100 * time.Millisecond)
pbft.CheckConsensus(digest, seq)
}
// BroadcastPrePrepare sends pre-prepare to all replicas
func (pbft *PBFTSystem) BroadcastPrePrepare(msg PBFTMessage) {
for _, replica := range pbft.Replicas {
if replica.ID == 0 {
continue
}
replica.ReceivePrePrepare(msg, pbft)
}
}
// ReceivePrePrepare handles pre-prepare message
func (r *PBFTReplica) ReceivePrePrepare(msg PBFTMessage, pbft *PBFTSystem) {
r.mu.Lock()
defer r.mu.Unlock()
if r.IsFaulty {
// Faulty node ignores or sends conflicting messages
return
}
r.PrePrepared[msg.Digest] = &msg
r.MessageLog = append(r.MessageLog, msg)
// Send PREPARE to all other replicas
prepareMsg := PBFTMessage{
Type: "prepare",
View: msg.View,
Sequence: msg.Sequence,
Digest: msg.Digest,
ReplicaID: r.ID,
Timestamp: time.Now(),
}
fmt.Printf(" Replica %d: PRE-PREPARE accepted, sending PREPARE\n", r.ID)
// Broadcast prepare
go func() {
for _, replica := range pbft.Replicas {
if replica.ID != r.ID {
replica.ReceivePrepare(prepareMsg)
}
}
}()
}
// ReceivePrepare handles prepare message
func (r *PBFTReplica) ReceivePrepare(msg PBFTMessage) {
r.mu.Lock()
defer r.mu.Unlock()
if r.IsFaulty {
return
}
r.Prepared[msg.Digest]++
r.MessageLog = append(r.MessageLog, msg)
// Need 2f+1 prepares (including self)
requiredPrepares := 2*len(r.Prepared)/3 + 1
if r.Prepared[msg.Digest] >= requiredPrepares {
fmt.Printf("2️⃣ Replica %d: PREPARED (received %d PREPARE messages)\n",
r.ID, r.Prepared[msg.Digest])
// Send COMMIT
commitMsg := PBFTMessage{
Type: "commit",
View: msg.View,
Sequence: msg.Sequence,
Digest: msg.Digest,
ReplicaID: r.ID,
Timestamp: time.Now(),
}
// Broadcast commit
// (simplified: just increment counter)
r.Committed[msg.Digest]++
}
}
// CheckConsensus checks if consensus was reached
func (pbft *PBFTSystem) CheckConsensus(digest string, seq int) {
fmt.Println("\n3️⃣ COMMIT phase:")
committed := 0
for _, replica := range pbft.Replicas {
replica.mu.Lock()
count := replica.Committed[digest]
replica.mu.Unlock()
if !replica.IsFaulty && count > 0 {
committed++
replica.mu.Lock()
replica.Executed[seq] = true
replica.mu.Unlock()
fmt.Printf(" Replica %d: COMMITTED and EXECUTED\n", replica.ID)
}
}
requiredCommits := 2*len(pbft.Replicas)/3 + 1
fmt.Println("\n📊 Consensus Result:")
if committed >= requiredCommits {
fmt.Printf("✅ CONSENSUS ACHIEVED (%d/%d replicas committed)\n",
committed, len(pbft.Replicas))
} else {
fmt.Printf("❌ CONSENSUS FAILED (%d/%d replicas committed)\n",
committed, len(pbft.Replicas))
}
}
func RunPBFT() {
fmt.Println("\n=== Practical Byzantine Fault Tolerance (PBFT) ===\n")
// 4 replicas can tolerate 1 fault (n = 3f + 1)
system := NewPBFTSystem(4, 1)
system.Request("SET x = 42")
}
func main() {
RunPBFT()
}
Key Concepts
1. The 3f+1 Rule
Why this magic number?
- Need majority of loyal nodes: ⌈(n+f)/2⌉ > f
- Simplifies to: n > 3f
- Minimum: n ≥ 3f + 1
2. Message Rounds
// Round complexity: O(n^(f+1))
// For f=1: O(n^2) messages
// For f=2: O(n^3) messages
3. Digital Signatures Help
With signatures (Byzantine Generals with signed messages):
- Only need n ≥ 2f + 1
- Fewer messages required
- Can detect forgeries
Real-World Examples
Bitcoin (Proof of Work)
// Simplified Bitcoin consensus
type Block struct {
PrevHash string
Transactions []string
Nonce int
Hash string
}
// Byzantine fault tolerance through computational work
func MineBlock(block *Block, difficulty int) {
for {
block.Hash = calculateHash(block)
if hasLeadingZeros(block.Hash, difficulty) {
break // Found valid block
}
block.Nonce++
}
}
// Longest chain wins - Byzantine nodes can't
// rewrite history without 51% of hash power
Distributed Database
// Replicated state machine with Byzantine tolerance
type ReplicatedDB struct {
Replicas []Replica
State map[string]string
}
func (db *ReplicatedDB) Write(key, value string) error {
// Propose write to all replicas
// Wait for 2f+1 confirmations
// Apply to state
}
Testing Byzantine Consensus
func TestByzantineConsensus(t *testing.T) {
scenarios := []struct {
name string
numGenerals int
numTraitors int
shouldWork bool
}{
{"4 generals, 1 traitor", 4, 1, true},
{"7 generals, 2 traitors", 7, 2, true},
{"3 generals, 1 traitor", 3, 1, false},
{"5 generals, 2 traitors", 5, 2, false},
}
for _, tc := range scenarios {
t.Run(tc.name, func(t *testing.T) {
system := NewByzantineSystem(tc.numGenerals, tc.numTraitors)
system.RunConsensus(Attack)
// Check if loyal generals agree
decisions := make(map[Order]int)
for _, g := range system.Generals {
if !g.IsTraitor {
decisions[g.Decide()]++
}
}
achieved := len(decisions) == 1
if achieved != tc.shouldWork {
t.Errorf("Expected consensus=%v, got %v", tc.shouldWork, achieved)
}
})
}
}
Performance Characteristics
| Metric | Value |
|---|---|
| Messages (oral) | O(n^(f+1)) |
| Messages (signed) | O(n²) |
| Rounds | f + 1 |
| Min nodes | 3f + 1 |
| Latency | High (multiple rounds) |
| Complexity | Exponential in f |
When to Use
✓ Use when:
- System must tolerate arbitrary failures
- Nodes may behave maliciously
- No trusted party exists
- Consensus is safety-critical
- Examples: blockchain, aerospace, nuclear control
✗ Avoid when:
- Only crash failures possible (use Paxos/Raft instead)
- Performance is critical
- Trusted central authority exists
- Cost of Byzantine tolerance too high
Try It Yourself
- Vary f and n - Test the 3f+1 boundary
- Add signatures - Implement signed messages
- Optimize PBFT - Reduce message complexity
- Simulate network partitions - Handle message delays
- Add view changes - Handle primary failures
This is part 17 of “Golang Experiments: Classic Concurrency Problems”
Further Reading
- Original Paper: “The Byzantine Generals Problem” - Lamport, Shostak, Pease (1982)
- PBFT Paper: “Practical Byzantine Fault Tolerance” - Castro & Liskov (1999)
- Bitcoin Whitepaper: Satoshi Nakamoto (2008)