← Bank Account Drama | Series Overview | Login Counter →


The Problem: Selling Tickets That Don’t Exist

A concert has 1,000 tickets. Multiple goroutines handle sales concurrently. Each seller checks if tickets remain, and if yes, decrements the count and sells one. Sounds simple, right?

Two sellers check simultaneously. Both see “1 ticket remaining.” Both sell. You’ve just sold ticket number -1. Congratulations, you’ve discovered the check-then-act race condition, one of the most common concurrency bugs in the wild.

But here’s the twist: locks solve correctness but kill throughput. When you have thousands of concurrent requests, what do you do? This is the eternal tension in concurrent programming: correctness versus performance.

The Naive Implementation

Let’s start with the broken version that seems so reasonable:

package main

import (
    "fmt"
    "sync"
    "time"
)

type TicketSeller struct {
    available int
    sold      int
}

func (ts *TicketSeller) SellTicket() bool {
    // Check if tickets are available
    if ts.available > 0 {
        // Simulate some processing (checking payment, generating ticket, etc.)
        time.Sleep(1 * time.Millisecond)

        // Decrement available, increment sold
        ts.available--
        ts.sold++
        return true
    }
    return false
}

func main() {
    seller := &TicketSeller{available: 1000}
    var wg sync.WaitGroup

    // Simulate 1500 concurrent buyers
    for i := 0; i < 1500; i++ {
        wg.Add(1)
        go func(buyer int) {
            defer wg.Done()
            if seller.SellTicket() {
                fmt.Printf("Buyer %d got a ticket!\n", buyer)
            }
        }(i)
    }

    wg.Wait()
    fmt.Printf("Sold: %d, Available: %d\n", seller.sold, seller.available)
    fmt.Printf("Total: %d (should be 1000)\n", seller.sold+seller.available)
}

Typical output:

Sold: 1847, Available: -847
Total: 1000 (should be 1000)

We sold 1,847 tickets when we only had 1,000. The math checks out (1847 - 847 = 1000), but reality doesn’t care about math. We’ve over-booked by 847 seats.

Why This Happens: The Race Condition Anatomy

The problem is that SellTicket() is not atomic. Here’s what happens:

Time  Goroutine A              Goroutine B              available
----  -----------------------  -----------------------  ---------
t0                                                              1
t1    Check: available > 0                                      1
t2    (yes, it's 1)            Check: available > 0             1
t3                             (yes, it's 1)                    1
t4    available--                                               0
t5                             available--                     -1

Both goroutines read available = 1, both pass the check, both decrement. Result: oversold.
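
This interleaving can even be replayed deterministically. The sketch below uses channels as a hand-rolled scheduler to force the exact t0–t5 schedule from the table; the helper `replayRace` and its channel names are illustrative, not part of the seller code:

```go
package main

import "fmt"

// replayRace deterministically reproduces the t0–t5 interleaving above.
// Channel operations create happens-before edges, so this replay is
// itself race-free — it merely forces the schedule that a real race
// can produce by accident.
func replayRace() int {
	available := 1

	aChecked := make(chan bool)
	bChecked := make(chan bool)
	aDone := make(chan struct{})
	done := make(chan struct{})

	// Goroutine A: check at t1, decrement at t4.
	go func() {
		okA := available > 0 // t1: A reads available == 1
		aChecked <- okA
		<-bChecked // hold the decrement until B has also passed its check
		if okA {
			available-- // t4: 1 -> 0
		}
		close(aDone)
	}()

	// Goroutine B: check at t2, decrement at t5.
	go func() {
		<-aChecked
		okB := available > 0 // t2: B also reads available == 1
		bChecked <- okB
		<-aDone // wait for A's decrement
		if okB {
			available-- // t5: 0 -> -1, the oversell
		}
		close(done)
	}()

	<-done
	return available
}

func main() {
	fmt.Println(replayRace()) // prints -1: both checks passed, both decremented
}
```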

The “Easy” Fix: Mutex Lock

type TicketSeller struct {
    mu        sync.Mutex
    available int
    sold      int
}

func (ts *TicketSeller) SellTicket() bool {
    ts.mu.Lock()
    defer ts.mu.Unlock()

    if ts.available > 0 {
        time.Sleep(1 * time.Millisecond) // Processing time
        ts.available--
        ts.sold++
        return true
    }
    return false
}

This works! Now let’s benchmark it:

func BenchmarkMutexSeller(b *testing.B) {
    seller := &TicketSeller{available: 1000000}

    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            seller.SellTicket()
        }
    })
}

Result: ~45,000 operations/second on 8 cores.

But we hold the mutex across the simulated 1ms of processing, so sales are completely serialized: only one goroutine can sell at a time, and throughput craters.

Optimization 1: Shorten the Critical Section

The processing (payment, ticket generation) doesn’t need to be in the critical section. We only need atomicity for the inventory check:

func (ts *TicketSeller) SellTicket() bool {
    // Fast path: check and reserve
    ts.mu.Lock()
    if ts.available <= 0 {
        ts.mu.Unlock()
        return false
    }
    ts.available--
    ts.sold++
    ts.mu.Unlock()

    // Slow path: process payment, generate ticket (outside lock)
    time.Sleep(1 * time.Millisecond)
    return true
}

Result: ~2,500,000 operations/second.

We just made it 55x faster by moving work outside the critical section!

Optimization 2: Atomic Operations

For simple counters, we can use atomic operations:

import "sync/atomic"

type TicketSeller struct {
    available int64
    sold      int64
}

func (ts *TicketSeller) SellTicket() bool {
    for {
        current := atomic.LoadInt64(&ts.available)
        if current <= 0 {
            return false
        }

        // Try to decrement atomically
        if atomic.CompareAndSwapInt64(&ts.available, current, current-1) {
            // Success! Now increment sold
            atomic.AddInt64(&ts.sold, 1)

            // Process payment outside atomic operation
            time.Sleep(1 * time.Millisecond)
            return true
        }
        // CAS failed, someone else sold a ticket, retry
    }
}

Result: ~3,200,000 operations/second.

Even faster! But we’re using a compare-and-swap (CAS) loop, which can suffer from contention under high load.
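
One common mitigation is to back off briefly when a CAS fails, which reduces cache-line ping-pong when many goroutines hammer the same word. A sketch; the backoff bounds below are illustrative, not tuned values:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
	"time"
)

// BackoffSeller is the CAS loop from above plus a small randomized
// pause on failure, so losing goroutines stop hammering the cache line.
type BackoffSeller struct {
	available int64
}

func (ts *BackoffSeller) SellTicket() bool {
	for {
		current := atomic.LoadInt64(&ts.available)
		if current <= 0 {
			return false
		}
		if atomic.CompareAndSwapInt64(&ts.available, current, current-1) {
			return true
		}
		// CAS failed: someone else won the race; back off before retrying.
		time.Sleep(time.Duration(rand.Intn(100)) * time.Nanosecond)
	}
}

func main() {
	seller := &BackoffSeller{available: 1000}
	var wg sync.WaitGroup
	var sold atomic.Int64

	for i := 0; i < 5000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if seller.SellTicket() {
				sold.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(sold.Load()) // prints 1000: never oversells
}
```

Whether backoff helps depends on the contention level; benchmark both variants before committing to one.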

Optimization 3: Sharded Counters

If contention is the problem, reduce contention by sharding:

type ShardedTicketSeller struct {
    shards    []shard
    numShards int
}

type shard struct {
    mu        sync.Mutex
    available int
}

func NewShardedSeller(total, numShards int) *ShardedTicketSeller {
    perShard := total / numShards
    remainder := total % numShards

    shards := make([]shard, numShards)
    for i := 0; i < numShards; i++ {
        shards[i].available = perShard
        if i < remainder {
            shards[i].available++
        }
    }

    return &ShardedTicketSeller{
        shards:    shards,
        numShards: numShards,
    }
}

func (s *ShardedTicketSeller) SellTicket() bool {
    // Try each shard in random order
    start := rand.Intn(s.numShards)

    for i := 0; i < s.numShards; i++ {
        idx := (start + i) % s.numShards
        shard := &s.shards[idx]

        shard.mu.Lock()
        if shard.available > 0 {
            shard.available--
            shard.mu.Unlock()
            time.Sleep(1 * time.Millisecond)
            return true
        }
        shard.mu.Unlock()
    }

    return false
}

Result: ~5,800,000 operations/second (8 shards on 8 cores).

By splitting the inventory into shards, goroutines can operate on different shards simultaneously. Less contention, more throughput.

Tradeoff: Complexity increases, and under contention a buyer may have to probe several shards (in the worst case, all of them) before finding a ticket or concluding none remain.

Optimization 4: Batch Reservations

Instead of one-at-a-time, reserve in batches:

type BatchTicketSeller struct {
    mu           sync.Mutex
    available    int
    reservations chan int
}

func NewBatchSeller(total int) *BatchTicketSeller {
    s := &BatchTicketSeller{
        available:    total,
        reservations: make(chan int, 100),
    }

    // Background goroutine processes reservations
    go s.processReservations()

    return s
}

func (s *BatchTicketSeller) processReservations() {
    for count := range s.reservations {
        s.mu.Lock()
        ok := s.available >= count
        if ok {
            s.available -= count
        }
        s.mu.Unlock()

        if ok {
            // Process batch (payment, ticket generation) outside the lock
            time.Sleep(time.Duration(count) * time.Millisecond)
        }
    }
}

func (s *BatchTicketSeller) SellTickets(count int) bool {
    s.reservations <- count
    // In real system, would return confirmation channel
    return true
}

Batching reduces lock contention by processing multiple tickets per lock acquisition.
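
The “confirmation channel” the comment alludes to can be sketched by pairing each reservation with its own reply channel. This variant also drops the mutex entirely: a single goroutine owns the inventory, so funneling all requests through it serializes updates. Type and field names here are illustrative:

```go
package main

import "fmt"

// request pairs a batch size with a reply channel, so each caller
// learns whether its reservation succeeded.
type request struct {
	count int
	reply chan bool
}

type ConfirmedBatchSeller struct {
	available int
	requests  chan request
}

func NewConfirmedBatchSeller(total int) *ConfirmedBatchSeller {
	s := &ConfirmedBatchSeller{
		available: total,
		requests:  make(chan request, 100),
	}
	go s.loop()
	return s
}

// loop is the sole owner of `available`; no mutex is needed because
// every read and write happens on this one goroutine.
func (s *ConfirmedBatchSeller) loop() {
	for req := range s.requests {
		ok := s.available >= req.count
		if ok {
			s.available -= req.count
		}
		req.reply <- ok
	}
}

func (s *ConfirmedBatchSeller) SellTickets(count int) bool {
	reply := make(chan bool)
	s.requests <- request{count: count, reply: reply}
	return <-reply
}

func main() {
	s := NewConfirmedBatchSeller(10)
	fmt.Println(s.SellTickets(6)) // true: 4 remain
	fmt.Println(s.SellTickets(6)) // false: only 4 remain
	fmt.Println(s.SellTickets(4)) // true: sold out
}
```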

The Performance-Correctness Matrix

Let’s compare all approaches:

Approach           Throughput (ops/sec)  Complexity  Correctness
-----------------  --------------------  ----------  ------------
Naive (broken)     8,000,000             Low         ❌ Oversells
Mutex (full CS)    45,000                Low         ✅ Correct
Mutex (short CS)   2,500,000             Low         ✅ Correct
Atomic CAS         3,200,000             Medium      ✅ Correct
Sharded            5,800,000             High        ✅ Correct
Batched            Varies                High        ✅ Correct

Real-World Scenario: Flash Sale

Now let’s simulate a real flash sale: 10,000 tickets, 100,000 concurrent buyers, all hitting at once:

// TicketSeller here is an interface — anything with a SellTicket() bool
// method — so each implementation above can be plugged in.
func FlashSale(seller TicketSeller, numBuyers int) {
    var wg sync.WaitGroup
    var successful atomic.Int64
    var failed atomic.Int64

    start := time.Now()

    for i := 0; i < numBuyers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if seller.SellTicket() {
                successful.Add(1)
            } else {
                failed.Add(1)
            }
        }()
    }

    wg.Wait()
    duration := time.Since(start)

    fmt.Printf("Duration: %v\n", duration)
    fmt.Printf("Successful: %d\n", successful.Load())
    fmt.Printf("Failed: %d\n", failed.Load())
    fmt.Printf("Throughput: %.0f tickets/sec\n",
        float64(successful.Load())/duration.Seconds())
}

Results (10,000 tickets, 100,000 buyers):

Mutex (short CS):     2.1s  (4,762 tickets/sec)
Atomic CAS:           1.8s  (5,556 tickets/sec)
Sharded (8 shards):   1.2s  (8,333 tickets/sec)

Sharding wins for high contention, but at the cost of complexity.

When Optimization Goes Wrong

Here’s a tempting but broken optimization:

func (ts *TicketSeller) SellTicket() bool {
    // "Fast path" without lock
    if ts.available <= 0 {
        return false
    }

    ts.mu.Lock()
    defer ts.mu.Unlock()

    if ts.available > 0 {
        ts.available--
        ts.sold++
        return true
    }
    return false
}

Problem: The unlocked read ts.available <= 0 is a data race! Even though we check again inside the lock, reading without synchronization is undefined behavior in Go’s memory model.

The race detector will catch this:

go test -race
==================
WARNING: DATA RACE
Read at 0x00c000012090 by goroutine 7:

Lesson: Don’t try to be clever. Correctness first.
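
That said, if profiling shows the lock-free fast path is worth keeping, the fix is to make the unlocked read synchronized rather than to abandon it: an atomic load participates in the memory model, so it is not a data race. A sketch, with `available` switched to `atomic.Int64` (type and field names illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// FastPathSeller keeps the cheap pre-check, but does it with an atomic
// load instead of a plain read, so the race detector stays quiet.
type FastPathSeller struct {
	mu        sync.Mutex
	available atomic.Int64
	sold      int
}

func (ts *FastPathSeller) SellTicket() bool {
	// Fast path: an atomic read is synchronized, unlike a plain read.
	if ts.available.Load() <= 0 {
		return false
	}

	ts.mu.Lock()
	defer ts.mu.Unlock()

	// Re-check under the lock; the count may have changed since the load.
	if ts.available.Load() > 0 {
		ts.available.Add(-1)
		ts.sold++
		return true
	}
	return false
}

func main() {
	ts := &FastPathSeller{}
	ts.available.Store(2)
	fmt.Println(ts.SellTicket(), ts.SellTicket(), ts.SellTicket()) // true true false
}
```

The double check is still required: the fast path only filters out the obviously sold-out case; the lock makes the decrement atomic.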

Testing for Correctness Under Load

func TestNoOversell(t *testing.T) {
    const (
        tickets = 1000
        buyers  = 10000
    )

    seller := NewTicketSeller(tickets)
    var wg sync.WaitGroup
    var sold atomic.Int64

    for i := 0; i < buyers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if seller.SellTicket() {
                sold.Add(1)
            }
        }()
    }

    wg.Wait()

    if sold.Load() != tickets {
        t.Errorf("Sold %d tickets, expected exactly %d", sold.Load(), tickets)
    }

    if seller.Available() != 0 {
        t.Errorf("Available: %d, expected 0", seller.Available())
    }
}

Run with: go test -race -count=100

The -count=100 flag runs the test 100 times to catch timing-dependent bugs.

Advanced: Priority Queue for VIP Tickets

What if VIP buyers should get priority?

type PriorityTicketSeller struct {
    mu       sync.Mutex
    vipPool  int
    genPool  int
    vipQueue chan chan bool
    genQueue chan chan bool
}

func NewPrioritySeller(vip, general int) *PriorityTicketSeller {
    s := &PriorityTicketSeller{
        vipPool:  vip,
        genPool:  general,
        vipQueue: make(chan chan bool, 100),
        genQueue: make(chan chan bool, 100),
    }

    go s.processSales()
    return s
}

func (s *PriorityTicketSeller) processSales() {
    for {
        select {
        case result := <-s.vipQueue:
            // VIP has priority
            s.mu.Lock()
            if s.vipPool > 0 {
                s.vipPool--
                s.mu.Unlock()
                result <- true
            } else if s.genPool > 0 {
                // VIP can buy from general pool
                s.genPool--
                s.mu.Unlock()
                result <- true
            } else {
                s.mu.Unlock()
                result <- false
            }

        default:
            // No VIP requests, serve general
            select {
            case result := <-s.genQueue:
                s.mu.Lock()
                if s.genPool > 0 {
                    s.genPool--
                    s.mu.Unlock()
                    result <- true
                } else {
                    s.mu.Unlock()
                    result <- false
                }
            default:
                time.Sleep(10 * time.Microsecond)
            }
        }
    }
}

func (s *PriorityTicketSeller) SellVIPTicket() bool {
    result := make(chan bool)
    s.vipQueue <- result
    return <-result
}

func (s *PriorityTicketSeller) SellGeneralTicket() bool {
    result := make(chan bool)
    s.genQueue <- result
    return <-result
}

This uses channels and a biased select to implement priority: whenever a VIP request is waiting, it is served before any general request.

Real-World Applications

  1. E-commerce: Flash sales, limited edition releases
  2. Event Ticketing: Concerts, sports, conferences
  3. Resource Allocation: Cloud computing (instances, IPs), database connections
  4. Rate Limiting: API quotas, request throttling

Common Pitfalls

  1. Unlocked reads for “optimization”: always a data race
  2. Locks too coarse: Kills performance
  3. Locks too fine: Allows race conditions
  4. Ignoring the race detector: It exists for a reason
  5. Over-optimizing early: Correctness first, measure before optimizing

Best Practices

  1. Start simple: Mutex with short critical section
  2. Measure before optimizing: Profile to find actual bottlenecks
  3. Use the race detector: Always. In development, testing, and CI/CD
  4. Consider atomic operations: For simple counters and flags
  5. Shard for high contention: But measure the benefit
  6. Document your invariants: What properties must always hold?

Benchmarking Tools

func BenchmarkTicketSeller(b *testing.B) {
    sellers := map[string]TicketSeller{
        "Mutex":   NewMutexSeller(1000000),
        "Atomic":  NewAtomicSeller(1000000),
        "Sharded": NewShardedSeller(1000000, 8),
    }

    for name, seller := range sellers {
        b.Run(name, func(b *testing.B) {
            b.RunParallel(func(pb *testing.PB) {
                for pb.Next() {
                    seller.SellTicket()
                }
            })
        })
    }
}

Run with: go test -bench=. -benchmem -cpuprofile=cpu.prof

Then analyze: go tool pprof cpu.prof

Conclusion

The ticket seller problem teaches us:

  • Check-then-act is dangerous: Race conditions lurk in “simple” logic
  • Locks are not inherently slow: It’s how you use them
  • Atomic operations are powerful: But limited to simple cases
  • Sharding reduces contention: At the cost of complexity
  • Measure before optimizing: Intuition often misleads

The key insight: There’s no one-size-fits-all solution. The best approach depends on:

  • Your throughput requirements
  • Your consistency requirements
  • Your complexity budget
  • Your contention patterns

Start with the simplest correct solution (mutex with short critical section), then optimize only if measurements show a need.

Remember: Selling 1,001 tickets when you have 1,000 is infinitely worse than selling them slowly.

