The Problem: Selling Tickets That Don’t Exist
A concert has 1,000 tickets. Multiple goroutines handle sales concurrently. Each seller checks if tickets remain, and if yes, decrements the count and sells one. Sounds simple, right?
Two sellers check simultaneously. Both see “1 ticket remaining.” Both sell. You’ve just sold ticket number -1. Congratulations, you’ve discovered the check-then-act race condition, one of the most common concurrency bugs in the wild.
But here’s the twist: locks solve correctness but kill throughput. When you have thousands of concurrent requests, what do you do? This is the eternal tension in concurrent programming: correctness versus performance.
The Naive Implementation
Let’s start with the broken version that seems so reasonable:
package main

import (
	"fmt"
	"sync"
	"time"
)

type TicketSeller struct {
	available int
	sold      int
}

func (ts *TicketSeller) SellTicket() bool {
	// Check if tickets are available
	if ts.available > 0 {
		// Simulate some processing (checking payment, generating ticket, etc.)
		time.Sleep(1 * time.Millisecond)
		// Decrement available, increment sold
		ts.available--
		ts.sold++
		return true
	}
	return false
}

func main() {
	seller := &TicketSeller{available: 1000}
	var wg sync.WaitGroup

	// Simulate 1500 concurrent buyers
	for i := 0; i < 1500; i++ {
		wg.Add(1)
		go func(buyer int) {
			defer wg.Done()
			if seller.SellTicket() {
				fmt.Printf("Buyer %d got a ticket!\n", buyer)
			}
		}(i)
	}

	wg.Wait()
	fmt.Printf("Sold: %d, Available: %d\n", seller.sold, seller.available)
	fmt.Printf("Total: %d (should be 1000)\n", seller.sold+seller.available)
}
Typical output:
Sold: 1847, Available: -847
Total: 1000 (should be 1000)
We sold 1,847 tickets when we only had 1,000. The math checks out (1847 - 847 = 1000), but reality doesn’t care about math. We’ve over-booked by 847 seats.
Why This Happens: The Race Condition Anatomy
The problem is that SellTicket() is not atomic. Here’s what happens:
Time  Goroutine A              Goroutine B              available
----  -----------------------  -----------------------  ---------
t0                                                      1
t1    Check: available > 0                              1
t2    (yes, it's 1)            Check: available > 0     1
t3                             (yes, it's 1)            1
t4    available--                                       0
t5                             available--              -1
Both goroutines read available = 1, both pass the check, both decrement. Result: oversold.
The “Easy” Fix: Mutex Lock
type TicketSeller struct {
	mu        sync.Mutex
	available int
	sold      int
}

func (ts *TicketSeller) SellTicket() bool {
	ts.mu.Lock()
	defer ts.mu.Unlock()

	if ts.available > 0 {
		time.Sleep(1 * time.Millisecond) // Processing time
		ts.available--
		ts.sold++
		return true
	}
	return false
}
This works! Now let’s benchmark it:
func BenchmarkMutexSeller(b *testing.B) {
	seller := &TicketSeller{available: 1000000}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			seller.SellTicket()
		}
	})
}
Result: ~45,000 operations/second on 8 cores.
Why so slow? We hold the lock across the entire 1ms of simulated processing, so sales are fully serialized: only one goroutine can be inside SellTicket at a time, and everyone else queues up behind it. Our throughput is terrible.
Optimization 1: Shorten the Critical Section
The processing (payment, ticket generation) doesn’t need to be in the critical section. We only need atomicity for the inventory check:
func (ts *TicketSeller) SellTicket() bool {
	// Fast path: check and reserve
	ts.mu.Lock()
	if ts.available <= 0 {
		ts.mu.Unlock()
		return false
	}
	ts.available--
	ts.sold++
	ts.mu.Unlock()

	// Slow path: process payment, generate ticket (outside the lock)
	time.Sleep(1 * time.Millisecond)
	return true
}
Result: ~2,500,000 operations/second.
We just made it 55x faster by moving work outside the critical section!
Optimization 2: Atomic Operations
For simple counters, we can use atomic operations:
import "sync/atomic"

type TicketSeller struct {
	available int64
	sold      int64
}

func (ts *TicketSeller) SellTicket() bool {
	for {
		current := atomic.LoadInt64(&ts.available)
		if current <= 0 {
			return false
		}
		// Try to decrement atomically
		if atomic.CompareAndSwapInt64(&ts.available, current, current-1) {
			// Success! Now increment sold
			atomic.AddInt64(&ts.sold, 1)
			// Process payment outside the atomic operation
			time.Sleep(1 * time.Millisecond)
			return true
		}
		// CAS failed: someone else sold a ticket, retry
	}
}
Result: ~3,200,000 operations/second.
Even faster! But we’re using a compare-and-swap (CAS) loop, which can suffer from contention under high load.
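One common mitigation for CAS thrashing is a small randomized backoff between retries, so losing goroutines stop hammering the same cache line. A sketch, not from the original (the backoffSell helper and its backoff constants are illustrative):

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// backoffSell is the same CAS loop as above, but a goroutine that loses
// the race yields and sleeps briefly (with a capped exponential backoff)
// before retrying, which reduces contention on the shared counter.
func backoffSell(available *int64) bool {
	backoff := time.Microsecond
	for {
		current := atomic.LoadInt64(available)
		if current <= 0 {
			return false
		}
		if atomic.CompareAndSwapInt64(available, current, current-1) {
			return true
		}
		// CAS lost: yield the processor, then back off a random amount
		runtime.Gosched()
		time.Sleep(time.Duration(rand.Int63n(int64(backoff)) + 1))
		if backoff < 64*time.Microsecond {
			backoff *= 2
		}
	}
}

func main() {
	var available int64 = 1000
	var sold int64
	var wg sync.WaitGroup
	for i := 0; i < 1500; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if backoffSell(&available) {
				atomic.AddInt64(&sold, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Printf("Sold: %d, Available: %d\n", sold, atomic.LoadInt64(&available)) // Sold: 1000, Available: 0
}
```

The backoff only pays off under heavy contention; with few goroutines, the plain CAS loop is simpler and just as fast.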
Optimization 3: Sharded Counters
If contention is the problem, reduce contention by sharding:
import "math/rand"

type ShardedTicketSeller struct {
	shards    []shard
	numShards int
}

type shard struct {
	mu        sync.Mutex
	available int
}

func NewShardedSeller(total, numShards int) *ShardedTicketSeller {
	perShard := total / numShards
	remainder := total % numShards

	shards := make([]shard, numShards)
	for i := 0; i < numShards; i++ {
		shards[i].available = perShard
		if i < remainder {
			shards[i].available++
		}
	}

	return &ShardedTicketSeller{
		shards:    shards,
		numShards: numShards,
	}
}

func (s *ShardedTicketSeller) SellTicket() bool {
	// Try each shard, starting from a random one to spread contention
	start := rand.Intn(s.numShards)
	for i := 0; i < s.numShards; i++ {
		idx := (start + i) % s.numShards
		shard := &s.shards[idx]

		shard.mu.Lock()
		if shard.available > 0 {
			shard.available--
			shard.mu.Unlock()
			time.Sleep(1 * time.Millisecond)
			return true
		}
		shard.mu.Unlock()
	}
	return false
}
Result: ~5,800,000 operations/second (8 shards on 8 cores).
By splitting the inventory into shards, goroutines can operate on different shards simultaneously. Less contention, more throughput.
Tradeoff: Complexity increases, and we might have tickets in one shard while buyers check an empty shard.
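To see how much inventory remains across all shards, a monitoring helper can sum them under their locks. A minimal sketch (the Remaining method is a hypothetical addition, not part of the original seller):

```go
package main

import (
	"fmt"
	"sync"
)

type shard struct {
	mu        sync.Mutex
	available int
}

type ShardedTicketSeller struct {
	shards []shard
}

// Remaining sums the inventory across all shards. Each shard is locked
// individually, so the result is a snapshot suitable for monitoring,
// not a linearizable global count: sales may land between shard reads.
func (s *ShardedTicketSeller) Remaining() int {
	total := 0
	for i := range s.shards {
		s.shards[i].mu.Lock()
		total += s.shards[i].available
		s.shards[i].mu.Unlock()
	}
	return total
}

func main() {
	s := &ShardedTicketSeller{shards: make([]shard, 8)}
	for i := range s.shards {
		s.shards[i].available = 125 // 1000 tickets split across 8 shards
	}
	fmt.Println("Remaining:", s.Remaining()) // Remaining: 1000
}
```

A production variant might rebalance or steal work from richer shards when one empties, at the cost of yet more complexity.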
Optimization 4: Batch Reservations
Instead of one-at-a-time, reserve in batches:
type BatchTicketSeller struct {
	mu           sync.Mutex
	available    int
	reservations chan int
}

func NewBatchSeller(total int) *BatchTicketSeller {
	s := &BatchTicketSeller{
		available:    total,
		reservations: make(chan int, 100),
	}
	// Background goroutine processes reservations
	go s.processReservations()
	return s
}

func (s *BatchTicketSeller) processReservations() {
	for count := range s.reservations {
		s.mu.Lock()
		if s.available >= count {
			s.available -= count
			// Process batch (payment, ticket generation)
			time.Sleep(time.Duration(count) * time.Millisecond)
		}
		s.mu.Unlock()
	}
}

func (s *BatchTicketSeller) SellTickets(count int) bool {
	s.reservations <- count
	// In a real system, this would return a confirmation channel
	return true
}
Batching reduces lock contention by processing multiple tickets per lock acquisition.
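The comment in SellTickets hints at a confirmation channel. One way to sketch that (the reservation struct and its fields are illustrative, not the original API) is to send a reply channel along with the count, so the caller learns whether the batch actually fit:

```go
package main

import (
	"fmt"
	"sync"
)

type reservation struct {
	count int
	done  chan bool // confirmation: true if the whole batch was reserved
}

type BatchTicketSeller struct {
	mu           sync.Mutex
	available    int
	reservations chan reservation
}

func NewBatchSeller(total int) *BatchTicketSeller {
	s := &BatchTicketSeller{
		available:    total,
		reservations: make(chan reservation, 100),
	}
	go s.processReservations()
	return s
}

func (s *BatchTicketSeller) processReservations() {
	for r := range s.reservations {
		s.mu.Lock()
		ok := s.available >= r.count
		if ok {
			s.available -= r.count
		}
		s.mu.Unlock()
		r.done <- ok // tell the caller whether the reservation succeeded
	}
}

// SellTickets blocks until the background goroutine confirms or rejects.
func (s *BatchTicketSeller) SellTickets(count int) bool {
	r := reservation{count: count, done: make(chan bool)}
	s.reservations <- r
	return <-r.done
}

func main() {
	s := NewBatchSeller(1000)
	fmt.Println(s.SellTickets(3))    // small batch fits: true
	fmt.Println(s.SellTickets(2000)) // exceeds inventory: false
}
```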
The Performance-Correctness Matrix
Let’s compare all approaches:
| Approach | Throughput (ops/sec) | Complexity | Correctness |
|---|---|---|---|
| Naive (broken) | 8,000,000 | Low | ❌ Oversells |
| Mutex (full CS) | 45,000 | Low | ✅ Correct |
| Mutex (short CS) | 2,500,000 | Low | ✅ Correct |
| Atomic CAS | 3,200,000 | Medium | ✅ Correct |
| Sharded | 5,800,000 | High | ✅ Correct |
| Batched | Varies | High | ✅ Correct |
Real-World Scenario: Flash Sale
Now let’s simulate a real flash sale: 10,000 tickets, 100,000 concurrent buyers, all hitting at once:
// TicketSeller is assumed here to be an interface satisfied by all the
// implementations above: interface{ SellTicket() bool }
func FlashSale(seller TicketSeller, numBuyers int) {
	var wg sync.WaitGroup
	var successful atomic.Int64
	var failed atomic.Int64

	start := time.Now()
	for i := 0; i < numBuyers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if seller.SellTicket() {
				successful.Add(1)
			} else {
				failed.Add(1)
			}
		}()
	}
	wg.Wait()

	duration := time.Since(start)
	fmt.Printf("Duration: %v\n", duration)
	fmt.Printf("Successful: %d\n", successful.Load())
	fmt.Printf("Failed: %d\n", failed.Load())
	fmt.Printf("Throughput: %.0f tickets/sec\n",
		float64(successful.Load())/duration.Seconds())
}
Results (10,000 tickets, 100,000 buyers):
Mutex (short CS): 2.1s (4,762 tickets/sec)
Atomic CAS: 1.8s (5,556 tickets/sec)
Sharded (8 shards): 1.2s (8,333 tickets/sec)
Sharding wins for high contention, but at the cost of complexity.
When Optimization Goes Wrong
Here’s a tempting but broken optimization:
func (ts *TicketSeller) SellTicket() bool {
	// "Fast path" without the lock
	if ts.available <= 0 {
		return false
	}

	ts.mu.Lock()
	defer ts.mu.Unlock()
	if ts.available > 0 {
		ts.available--
		ts.sold++
		return true
	}
	return false
}
Problem: The unlocked read ts.available <= 0 is a data race! Even though we check again inside the lock, reading without synchronization is undefined behavior in Go’s memory model.
The race detector will catch this:
go test -race
==================
WARNING: DATA RACE
Read at 0x00c000012090 by goroutine 7:
Lesson: Don’t try to be clever. Correctness first.
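For the record, a fast path can be made legal: do the unlocked read with an atomic load, and make every access to that variable atomic as well, including the write under the lock. A sketch of the idea (not code from this article):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type TicketSeller struct {
	mu        sync.Mutex
	available int64 // every access is atomic, so the unlocked read is race-free
	sold      int64 // only touched under mu
}

func (ts *TicketSeller) SellTicket() bool {
	// Race-free fast path: an atomic load is permitted by the memory model.
	// It may be slightly stale, so it only short-circuits the common
	// "sold out" case; the real check-then-act still happens under the lock.
	if atomic.LoadInt64(&ts.available) <= 0 {
		return false
	}

	ts.mu.Lock()
	defer ts.mu.Unlock()
	if atomic.LoadInt64(&ts.available) > 0 {
		atomic.AddInt64(&ts.available, -1) // atomic write, matching the atomic reads
		ts.sold++
		return true
	}
	return false
}

func main() {
	ts := &TicketSeller{available: 1000}
	var wg sync.WaitGroup
	for i := 0; i < 1500; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ts.SellTicket()
		}()
	}
	wg.Wait()
	fmt.Printf("Sold: %d, Available: %d\n", ts.sold, atomic.LoadInt64(&ts.available)) // Sold: 1000, Available: 0
}
```

This passes the race detector, but measure before adopting it: once the event is sold out, the fast path saves lock traffic; before that, it adds an extra atomic load per sale.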
Testing for Correctness Under Load
func TestNoOversell(t *testing.T) {
	const (
		tickets = 1000
		buyers  = 10000
	)

	seller := NewTicketSeller(tickets)
	var wg sync.WaitGroup
	var sold atomic.Int64

	for i := 0; i < buyers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if seller.SellTicket() {
				sold.Add(1)
			}
		}()
	}
	wg.Wait()

	if sold.Load() != tickets {
		t.Errorf("Sold %d tickets, expected exactly %d", sold.Load(), tickets)
	}
	if seller.Available() != 0 {
		t.Errorf("Available: %d, expected 0", seller.Available())
	}
}
Run with: go test -race -count=100
The -count=100 flag runs the test 100 times to catch timing-dependent bugs.
Advanced: Priority Queue for VIP Tickets
What if VIP buyers should get priority?
type PriorityTicketSeller struct {
	mu       sync.Mutex
	vipPool  int
	genPool  int
	vipQueue chan chan bool
	genQueue chan chan bool
}

func NewPrioritySeller(vip, general int) *PriorityTicketSeller {
	s := &PriorityTicketSeller{
		vipPool:  vip,
		genPool:  general,
		vipQueue: make(chan chan bool, 100),
		genQueue: make(chan chan bool, 100),
	}
	go s.processSales()
	return s
}

func (s *PriorityTicketSeller) processSales() {
	for {
		select {
		case result := <-s.vipQueue:
			// VIP has priority
			s.mu.Lock()
			if s.vipPool > 0 {
				s.vipPool--
				s.mu.Unlock()
				result <- true
			} else if s.genPool > 0 {
				// VIP can buy from the general pool
				s.genPool--
				s.mu.Unlock()
				result <- true
			} else {
				s.mu.Unlock()
				result <- false
			}
		default:
			// No VIP requests waiting, serve general
			select {
			case result := <-s.genQueue:
				s.mu.Lock()
				if s.genPool > 0 {
					s.genPool--
					s.mu.Unlock()
					result <- true
				} else {
					s.mu.Unlock()
					result <- false
				}
			default:
				time.Sleep(10 * time.Microsecond)
			}
		}
	}
}

func (s *PriorityTicketSeller) SellVIPTicket() bool {
	result := make(chan bool)
	s.vipQueue <- result
	return <-result
}

func (s *PriorityTicketSeller) SellGeneralTicket() bool {
	result := make(chan bool)
	s.genQueue <- result
	return <-result
}
This uses channels to implement priority: VIP requests are always processed first.
Real-World Applications
- E-commerce: Flash sales, limited edition releases
- Event Ticketing: Concerts, sports, conferences
- Resource Allocation: Cloud computing (instances, IPs), database connections
- Rate Limiting: API quotas, request throttling
Common Pitfalls
- Unlocked reads for “optimization”: Always causes data races
- Locks too coarse: Kills performance
- Locks too fine: Allows race conditions
- Ignoring the race detector: It exists for a reason
- Over-optimizing early: Correctness first, measure before optimizing
Best Practices
- Start simple: Mutex with short critical section
- Measure before optimizing: Profile to find actual bottlenecks
- Use the race detector: Always. In development, testing, and CI/CD
- Consider atomic operations: For simple counters and flags
- Shard for high contention: But measure the benefit
- Document your invariants: What properties must always hold?
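Documenting an invariant pairs well with asserting it. A sketch, assuming the mutex-based seller from earlier (the total field and checkInvariant helper are illustrative additions), that panics the moment ticket conservation is violated:

```go
package main

import (
	"fmt"
	"sync"
)

type TicketSeller struct {
	mu        sync.Mutex
	total     int // invariant: sold + available == total, always
	available int
	sold      int
}

// checkInvariant panics if the conservation invariant is violated.
// Call it under the lock, in tests or debug builds.
func (ts *TicketSeller) checkInvariant() {
	if ts.sold+ts.available != ts.total {
		panic(fmt.Sprintf("invariant violated: sold=%d available=%d total=%d",
			ts.sold, ts.available, ts.total))
	}
}

func (ts *TicketSeller) SellTicket() bool {
	ts.mu.Lock()
	defer ts.mu.Unlock()
	defer ts.checkInvariant() // runs before Unlock, so still under the lock

	if ts.available > 0 {
		ts.available--
		ts.sold++
		return true
	}
	return false
}

func main() {
	ts := &TicketSeller{total: 2, available: 2}
	fmt.Println(ts.SellTicket(), ts.SellTicket(), ts.SellTicket()) // true true false
}
```

If the invariant ever fires in a stress test, you have found a synchronization bug long before an overselling incident does.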
Benchmarking Tools
func BenchmarkTicketSeller(b *testing.B) {
	// TicketSeller is assumed to be an interface: interface{ SellTicket() bool }
	sellers := map[string]TicketSeller{
		"Mutex":   NewMutexSeller(1000000),
		"Atomic":  NewAtomicSeller(1000000),
		"Sharded": NewShardedSeller(1000000, 8),
	}

	for name, seller := range sellers {
		b.Run(name, func(b *testing.B) {
			b.RunParallel(func(pb *testing.PB) {
				for pb.Next() {
					seller.SellTicket()
				}
			})
		})
	}
}
Run with: go test -bench=. -benchmem -cpuprofile=cpu.prof
Then analyze: go tool pprof cpu.prof
Conclusion
The ticket seller problem teaches us:
- Check-then-act is dangerous: Race conditions lurk in “simple” logic
- Locks are not inherently slow: It’s how you use them
- Atomic operations are powerful: But limited to simple cases
- Sharding reduces contention: At the cost of complexity
- Measure before optimizing: Intuition often misleads
The key insight: There’s no one-size-fits-all solution. The best approach depends on:
- Your throughput requirements
- Your consistency requirements
- Your complexity budget
- Your contention patterns
Start with the simplest correct solution (mutex with short critical section), then optimize only if measurements show a need.
Remember: Selling 1,001 tickets when you have 1,000 is infinitely worse than selling them slowly.