The Bully Election Algorithm

The Bully Algorithm, proposed by Hector Garcia-Molina in 1982, is a classic leader election algorithm for distributed systems. It’s called “bully” because the highest-numbered process always wins and “bullies” the others into accepting it as leader.

The Scenario

A distributed system needs a coordinator:

  • N nodes in a network
  • Each node has a unique ID (priority)
  • One node must be elected as leader
  • When the leader fails, a new leader must be elected
  • Rule: The node with the highest ID wins

The protocol:

  1. Any node can start an election when it detects leader failure
  2. Node sends “ELECTION” message to all higher-ID nodes
  3. If no response → declare victory
  4. If response → wait for “COORDINATOR” announcement
  5. New leader broadcasts “COORDINATOR” message to all

The Challenge

sequenceDiagram participant N1 as Node 1 participant N2 as Node 2 participant N3 as Node 3 participant N4 as Node 4 participant N5 as Node 5 (Leader) Note over N5: Leader Crashes! N5--xN1: ❌ Note over N2: Detects failure,
starts election N2->>N3: ELECTION N2->>N4: ELECTION N2->>N5: ELECTION (no response) N3->>N4: ELECTION N3->>N5: ELECTION (no response) N4->>N5: ELECTION (no response) Note over N4: Highest alive,
becomes leader N4->>N1: COORDINATOR N4->>N2: COORDINATOR N4->>N3: COORDINATOR

Real-World Applications

This pattern appears in many distributed systems:

  • Apache ZooKeeper: Leader election for distributed coordination
  • etcd: Leader election in distributed key-value store
  • MongoDB Replica Sets: Primary election
  • RabbitMQ Clustering: Master node election
  • Kubernetes: Controller leader election
  • Elasticsearch: Master node election

Implementation in Go

package main

import (
	"context"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// NodeState represents the current state of a node
type NodeState int

const (
	Follower NodeState = iota
	Candidate
	Leader
)

func (s NodeState) String() string {
	return []string{"Follower", "Candidate", "Leader"}[s]
}

// Message types
type MessageType string

const (
	Election    MessageType = "ELECTION"
	OK          MessageType = "OK"
	Coordinator MessageType = "COORDINATOR"
	Heartbeat   MessageType = "HEARTBEAT"
)

// Message sent between nodes
type Message struct {
	Type     MessageType
	SenderID int
	Term     int // Election term for detecting stale messages
}

// Node represents a process in the distributed system
type Node struct {
	ID                int
	State             NodeState
	CurrentLeader     int
	IsAlive           bool
	Term              int // Current election term
	Peers             []*Node
	IncomingMessages  chan Message
	HeartbeatReceived time.Time
	mu                sync.Mutex
	ctx               context.Context
	cancel            context.CancelFunc
	wg                sync.WaitGroup
}

// BullySystem coordinates the nodes
type BullySystem struct {
	Nodes     []*Node
	mu        sync.Mutex
	Elections int // Track number of elections
}

func NewBullySystem(numNodes int) *BullySystem {
	nodes := make([]*Node, numNodes)

	for i := 0; i < numNodes; i++ {
		ctx, cancel := context.WithCancel(context.Background())
		nodes[i] = &Node{
			ID:                i,
			State:             Follower,
			CurrentLeader:     -1,
			IsAlive:           true,
			IncomingMessages:  make(chan Message, 100),
			HeartbeatReceived: time.Now(),
			ctx:               ctx,
			cancel:            cancel,
		}
	}

	// Set up peer connections
	for _, node := range nodes {
		node.Peers = nodes
	}

	return &BullySystem{
		Nodes: nodes,
	}
}

// Start begins node operations
func (n *Node) Start() {
	n.wg.Add(1)
	go n.run()
}

// Stop gracefully stops the node
func (n *Node) Stop() {
	n.cancel()
	n.wg.Wait()
}

// run is the main event loop for the node
func (n *Node) run() {
	defer n.wg.Done()

	heartbeatTicker := time.NewTicker(500 * time.Millisecond)
	defer heartbeatTicker.Stop()

	electionTimeout := time.NewTicker(2 * time.Second)
	defer electionTimeout.Stop()

	for {
		select {
		case <-n.ctx.Done():
			return

		case msg := <-n.IncomingMessages:
			n.handleMessage(msg)

		case <-heartbeatTicker.C:
			n.sendHeartbeat()

		case <-electionTimeout.C:
			n.checkLeader()
		}
	}
}

// handleMessage processes incoming messages
func (n *Node) handleMessage(msg Message) {
	n.mu.Lock()
	defer n.mu.Unlock()

	if !n.IsAlive {
		return
	}

	switch msg.Type {
	case Election:
		// Received election from lower-ID node
		fmt.Printf("[Node %d] 📨 Received ELECTION from Node %d\n", n.ID, msg.SenderID)

		if n.ID > msg.SenderID {
			// Send OK response
			n.sendTo(msg.SenderID, Message{
				Type:     OK,
				SenderID: n.ID,
				Term:     n.Term,
			})
			fmt.Printf("[Node %d] ✉️  Sent OK to Node %d\n", n.ID, msg.SenderID)

			// Start own election
			go n.StartElection()
		}

	case OK:
		// Higher-ID node responded
		fmt.Printf("[Node %d] ✅ Received OK from Node %d (higher priority)\n", n.ID, msg.SenderID)
		n.State = Follower

	case Coordinator:
		// New leader announcement
		fmt.Printf("[Node %d] 👑 Node %d is the new COORDINATOR\n", n.ID, msg.SenderID)
		n.CurrentLeader = msg.SenderID
		n.State = Follower
		n.Term = msg.Term
		n.HeartbeatReceived = time.Now()

	case Heartbeat:
		// Leader heartbeat
		if msg.SenderID == n.CurrentLeader {
			n.HeartbeatReceived = time.Now()
		}
	}
}

// StartElection initiates an election
func (n *Node) StartElection() {
	n.mu.Lock()
	if !n.IsAlive || n.State == Candidate {
		n.mu.Unlock()
		return
	}

	n.State = Candidate
	n.Term++
	term := n.Term
	n.mu.Unlock()

	fmt.Printf("\n[Node %d] 🗳️  Starting ELECTION (term %d)\n", n.ID, term)

	// Send ELECTION to all higher-ID nodes
	higherNodes := n.getHigherNodes()

	if len(higherNodes) == 0 {
		// No higher nodes, become leader
		n.becomeLeader()
		return
	}

	// Send election messages
	for _, node := range higherNodes {
		n.sendTo(node.ID, Message{
			Type:     Election,
			SenderID: n.ID,
			Term:     term,
		})
	}

	// Wait for OK responses (with timeout)
	time.Sleep(1 * time.Second)

	n.mu.Lock()
	defer n.mu.Unlock()

	// If still candidate, no higher node responded
	if n.State == Candidate {
		n.mu.Unlock()
		n.becomeLeader()
		n.mu.Lock()
	}
}

// becomeLeader transitions node to leader
func (n *Node) becomeLeader() {
	n.mu.Lock()
	n.State = Leader
	n.CurrentLeader = n.ID
	term := n.Term
	n.mu.Unlock()

	fmt.Printf("\n[Node %d] 🎖️  Became LEADER (term %d)\n", n.ID, term)

	// Broadcast coordinator message
	for _, peer := range n.Peers {
		if peer.ID != n.ID {
			n.sendTo(peer.ID, Message{
				Type:     Coordinator,
				SenderID: n.ID,
				Term:     term,
			})
		}
	}
}

// sendHeartbeat sends periodic heartbeats if leader
func (n *Node) sendHeartbeat() {
	n.mu.Lock()
	defer n.mu.Unlock()

	if !n.IsAlive || n.State != Leader {
		return
	}

	for _, peer := range n.Peers {
		if peer.ID != n.ID && peer.IsAlive {
			n.sendTo(peer.ID, Message{
				Type:     Heartbeat,
				SenderID: n.ID,
				Term:     n.Term,
			})
		}
	}
}

// checkLeader checks if leader is still alive
func (n *Node) checkLeader() {
	n.mu.Lock()
	defer n.mu.Unlock()

	if !n.IsAlive || n.State == Leader {
		return
	}

	// Check if heartbeat timeout
	if time.Since(n.HeartbeatReceived) > 3*time.Second && n.CurrentLeader != -1 {
		fmt.Printf("\n[Node %d] ⚠️  Leader timeout detected!\n", n.ID)
		n.mu.Unlock()
		n.StartElection()
		n.mu.Lock()
	}
}

// getHigherNodes returns all nodes with higher IDs
func (n *Node) getHigherNodes() []*Node {
	var higher []*Node
	for _, peer := range n.Peers {
		peer.mu.Lock()
		if peer.ID > n.ID && peer.IsAlive {
			higher = append(higher, peer)
		}
		peer.mu.Unlock()
	}
	return higher
}

// sendTo sends a message to a specific node
func (n *Node) sendTo(targetID int, msg Message) {
	if targetID < 0 || targetID >= len(n.Peers) {
		return
	}

	target := n.Peers[targetID]
	target.mu.Lock()
	defer target.mu.Unlock()

	if target.IsAlive {
		select {
		case target.IncomingMessages <- msg:
		case <-time.After(100 * time.Millisecond):
			// Message dropped
		}
	}
}

// Crash simulates node failure
func (n *Node) Crash() {
	n.mu.Lock()
	defer n.mu.Unlock()

	if !n.IsAlive {
		return
	}

	fmt.Printf("\n[Node %d] 💥 CRASHED\n", n.ID)
	n.IsAlive = false
	n.State = Follower
}

// Recover simulates node recovery
func (n *Node) Recover() {
	n.mu.Lock()
	defer n.mu.Unlock()

	if n.IsAlive {
		return
	}

	fmt.Printf("\n[Node %d] 🔄 RECOVERED\n", n.ID)
	n.IsAlive = true
	n.CurrentLeader = -1
	n.HeartbeatReceived = time.Now()

	// Start election after recovery
	go func() {
		time.Sleep(500 * time.Millisecond)
		n.StartElection()
	}()
}

// GetLeader returns current leader info
func (bs *BullySystem) GetLeader() (leaderID int, found bool) {
	bs.mu.Lock()
	defer bs.mu.Unlock()

	for _, node := range bs.Nodes {
		node.mu.Lock()
		if node.IsAlive && node.State == Leader {
			node.mu.Unlock()
			return node.ID, true
		}
		node.mu.Unlock()
	}
	return -1, false
}

// PrintStatus prints system status
func (bs *BullySystem) PrintStatus() {
	fmt.Println("\n📊 System Status:")
	fmt.Println("   Node | State     | Leader | Alive")
	fmt.Println("   -----|-----------|--------|------")

	for _, node := range bs.Nodes {
		node.mu.Lock()
		fmt.Printf("   %4d | %-9s | %6d | %v\n",
			node.ID, node.State, node.CurrentLeader, node.IsAlive)
		node.mu.Unlock()
	}
}

func main() {
	rand.Seed(42)

	fmt.Println("=== Bully Election Algorithm ===\n")

	const numNodes = 5
	system := NewBullySystem(numNodes)

	// Start all nodes
	fmt.Println("🚀 Starting all nodes...")
	for _, node := range system.Nodes {
		node.Start()
	}

	// Wait for initial election
	time.Sleep(1 * time.Second)
	system.PrintStatus()

	// Node 4 (highest) should be leader
	if leader, found := system.GetLeader(); found {
		fmt.Printf("\n✅ Initial leader: Node %d\n", leader)
	}

	// Scenario 1: Leader crashes
	fmt.Println("\n\n" + "=== Scenario 1: Leader Crash ===" + "\n")
	time.Sleep(1 * time.Second)
	system.Nodes[4].Crash()
	time.Sleep(3 * time.Second)

	system.PrintStatus()
	if leader, found := system.GetLeader(); found {
		fmt.Printf("\n✅ New leader after crash: Node %d\n", leader)
	}

	// Scenario 2: Original leader recovers
	fmt.Println("\n\n" + "=== Scenario 2: Leader Recovery ===" + "\n")
	time.Sleep(1 * time.Second)
	system.Nodes[4].Recover()
	time.Sleep(3 * time.Second)

	system.PrintStatus()
	if leader, found := system.GetLeader(); found {
		fmt.Printf("\n✅ Leader after recovery: Node %d\n", leader)
	}

	// Scenario 3: Multiple simultaneous failures
	fmt.Println("\n\n" + "=== Scenario 3: Multiple Failures ===" + "\n")
	time.Sleep(1 * time.Second)
	system.Nodes[4].Crash()
	system.Nodes[3].Crash()
	time.Sleep(3 * time.Second)

	system.PrintStatus()
	if leader, found := system.GetLeader(); found {
		fmt.Printf("\n✅ Leader after multiple failures: Node %d\n", leader)
	}

	// Cleanup
	fmt.Println("\n\n🛑 Shutting down...")
	for _, node := range system.Nodes {
		node.Stop()
	}

	fmt.Println("\n✓ Simulation complete!")
}

How It Works

1. Election Initiation

graph TD A[Node detects
leader failure] --> B{Am I highest
alive node?} B -->|Yes| C[Become leader] B -->|No| D[Send ELECTION
to higher nodes] D --> E{Received OK?} E -->|Yes| F[Wait for
COORDINATOR] E -->|No| C

2. Message Flow

// 1. Detect failure
if time.Since(lastHeartbeat) > timeout {
    startElection()
}

// 2. Send ELECTION to higher nodes
for node := range higherNodes {
    send(node, ELECTION)
}

// 3. If OK received → wait
// 4. If timeout → become leader
if !receivedOK {
    becomeLeader()
}

3. Handling Simultaneous Elections

When multiple nodes start elections simultaneously:

// Node 2 starts election
// Node 3 starts election
// Both send to higher nodes

// Node 3 receives Node 2's ELECTION
// Responds with OK
// Node 2 defers to Node 3

// Node 4 receives both elections
// Responds OK to both
// Starts own election
// Becomes leader (highest)

Advanced: Ring-Based Bully Election

Optimize message complexity using a ring topology:

package main

import (
	"fmt"
	"sync"
)

// RingNode represents a node in a ring topology
type RingNode struct {
	ID       int
	IsAlive  bool
	Next     *RingNode
	LeaderID int
	mu       sync.Mutex
}

type RingElection struct {
	Nodes []*RingNode
}

func NewRingElection(numNodes int) *RingElection {
	nodes := make([]*RingNode, numNodes)

	// Create nodes
	for i := 0; i < numNodes; i++ {
		nodes[i] = &RingNode{
			ID:       i,
			IsAlive:  true,
			LeaderID: -1,
		}
	}

	// Create ring
	for i := 0; i < numNodes; i++ {
		nodes[i].Next = nodes[(i+1)%numNodes]
	}

	return &RingElection{Nodes: nodes}
}

// StartElection initiates ring-based election
func (rn *RingNode) StartElection() int {
	fmt.Printf("\n[Node %d] 🗳️  Starting ring election\n", rn.ID)

	// Pass token with own ID
	participantIDs := []int{rn.ID}

	// Forward around ring
	current := rn.Next
	for current.ID != rn.ID {
		current.mu.Lock()
		if current.IsAlive {
			participantIDs = append(participantIDs, current.ID)
			fmt.Printf("[Node %d] → Adding self to election\n", current.ID)
		}
		current.mu.Unlock()

		current = current.Next
	}

	// Find highest ID
	maxID := rn.ID
	for _, id := range participantIDs {
		if id > maxID {
			maxID = id
		}
	}

	// Announce leader to all
	current = rn
	for {
		current.mu.Lock()
		current.LeaderID = maxID
		current.mu.Unlock()

		current = current.Next
		if current.ID == rn.ID {
			break
		}
	}

	fmt.Printf("\n[Ring] 👑 Node %d elected leader\n", maxID)
	return maxID
}

func RunRingElection() {
	fmt.Println("\n=== Ring-Based Election ===\n")

	re := NewRingElection(5)

	// Simulate some node failures
	re.Nodes[4].IsAlive = false // Highest node down
	re.Nodes[3].IsAlive = false

	// Start election from node 1
	leader := re.Nodes[1].StartElection()

	fmt.Printf("\n✅ Elected leader: Node %d\n", leader)
	fmt.Println("\nNode States:")
	for _, node := range re.Nodes {
		fmt.Printf("  Node %d: Alive=%v, Leader=%d\n",
			node.ID, node.IsAlive, node.LeaderID)
	}
}

func main() {
	RunRingElection()
}

Key Concepts

1. Bully Property

The highest-ID process always wins:

  • Simple and deterministic
  • No ties possible
  • Predictable leader

2. Message Complexity

Worst case (Node 0 starts election):

  • O(n²) messages total
  • Node i sends to (n-i) higher nodes
  • Total: n(n-1)/2

Best case (highest node starts):

  • O(n) messages
  • Single broadcast to all nodes

3. Failure Detection

// Heartbeat-based detection
ticker := time.NewTicker(heartbeatInterval)
for {
    select {
    case <-ticker.C:
        if leader {
            sendHeartbeat()
        } else {
            checkHeartbeat()
        }
    }
}

Real-World Example: Database Primary Election

package main

import (
	"fmt"
	"sync"
	"time"
)

type DatabaseReplica struct {
	ID        int
	IsPrimary bool
	IsOnline  bool
	DataLag   time.Duration // Replication lag
	Priority  int           // Configured priority
	mu        sync.Mutex
}

type DatabaseCluster struct {
	Replicas []*DatabaseReplica
	mu       sync.Mutex
}

func (dc *DatabaseCluster) ElectPrimary() *DatabaseReplica {
	dc.mu.Lock()
	defer dc.mu.Unlock()

	fmt.Println("\n🗳️  Electing database primary...")

	// Find eligible candidates
	var candidates []*DatabaseReplica
	for _, replica := range dc.Replicas {
		replica.mu.Lock()
		if replica.IsOnline && replica.DataLag < 5*time.Second {
			candidates = append(candidates, replica)
			fmt.Printf("  Candidate: Replica %d (lag: %v, priority: %d)\n",
				replica.ID, replica.DataLag, replica.Priority)
		}
		replica.mu.Unlock()
	}

	if len(candidates) == 0 {
		fmt.Println("❌ No eligible candidates!")
		return nil
	}

	// Elect based on: 1) Priority, 2) Lowest lag, 3) Highest ID
	winner := candidates[0]
	for _, candidate := range candidates[1:] {
		if candidate.Priority > winner.Priority {
			winner = candidate
		} else if candidate.Priority == winner.Priority {
			if candidate.DataLag < winner.DataLag {
				winner = candidate
			} else if candidate.DataLag == winner.DataLag && candidate.ID > winner.ID {
				winner = candidate
			}
		}
	}

	// Set as primary
	for _, replica := range dc.Replicas {
		replica.mu.Lock()
		replica.IsPrimary = (replica.ID == winner.ID)
		replica.mu.Unlock()
	}

	winner.mu.Lock()
	winner.IsPrimary = true
	winner.mu.Unlock()

	fmt.Printf("\n👑 Replica %d elected as PRIMARY\n", winner.ID)
	return winner
}

func RunDatabaseElection() {
	fmt.Println("=== Database Primary Election ===")

	cluster := &DatabaseCluster{
		Replicas: []*DatabaseReplica{
			{ID: 0, IsOnline: true, DataLag: 100 * time.Millisecond, Priority: 1},
			{ID: 1, IsOnline: true, DataLag: 50 * time.Millisecond, Priority: 2},
			{ID: 2, IsOnline: true, DataLag: 200 * time.Millisecond, Priority: 1},
			{ID: 3, IsOnline: false, DataLag: 0, Priority: 3}, // Offline
		},
	}

	primary := cluster.ElectPrimary()

	fmt.Println("\nCluster State:")
	for _, replica := range cluster.Replicas {
		replica.mu.Lock()
		role := "Secondary"
		if replica.IsPrimary {
			role = "PRIMARY"
		}
		fmt.Printf("  Replica %d: %s (online=%v, lag=%v, priority=%d)\n",
			replica.ID, role, replica.IsOnline, replica.DataLag, replica.Priority)
		replica.mu.Unlock()
	}

	fmt.Printf("\n✅ Primary: Replica %d\n", primary.ID)
}

func main() {
	RunDatabaseElection()
}

Performance Characteristics

Metric Value
Messages O(n²) worst case
Latency 2 RTT (election + announcement)
Elections O(n) per failure
Scalability Poor for large n
Determinism Perfect (highest always wins)

Optimizations

1. Fast Path Election

// If node knows it's highest, immediately announce
if n.ID == maxKnownID {
    n.becomeLeader()
    return
}

2. Election Suppression

// Ignore duplicate elections during ongoing election
if n.electionInProgress {
    return
}

3. Priority-Based IDs

// Assign IDs based on:
// - Hardware capability
// - Network location
// - Data freshness
priority := (hardware * 1000) + (location * 100) + ID

When to Use

Use when:

  • Small to medium cluster size
  • Clear priority ordering exists
  • Network is reliable
  • Deterministic leader selection needed
  • Simplicity is valued

Avoid when:

  • Large clusters (O(n²) messages)
  • Network is unreliable
  • Frequent failures occur
  • Need partition tolerance
  • Use Raft/Paxos instead for production

Try It Yourself

  1. Add priorities - Not just ID-based election
  2. Implement ring version - Reduce message complexity
  3. Add network delays - Simulate real networks
  4. Test split-brain - Handle network partitions
  5. Compare with Raft - Understand trade-offs

This is part 18 of “Golang Experiments: Classic Concurrency Problems”

Further Reading

  • Original Paper: “Elections in a Distributed Computing System” - Garcia-Molina (1982)
  • Raft Consensus: More robust alternative
  • ZooKeeper: Production leader election
  • Chubby: Google’s distributed lock service