← Monte Carlo Pi | Series Overview | Mandelbrot Set →


The Problem: Finding Primes with Filters

The Sieve of Eratosthenes is an ancient algorithm for finding prime numbers. The concurrent version creates a pipeline of filters: each prime spawns a goroutine that filters out its multiples.

The Algorithm:

  1. Generate sequence: 2, 3, 4, 5, 6, 7, 8, 9, 10, …
  2. Take first number (2), it’s prime, filter all multiples of 2
  3. Take next number (3), it’s prime, filter all multiples of 3
  4. Take next number (5), it’s prime, filter all multiples of 5
  5. Repeat until desired count

The Beauty: Each prime creates its own filter. Numbers flow through a pipeline of increasingly selective filters. What passes through all filters must be prime.

The Problem: Goroutine explosion. How many goroutines for primes up to 1 million?

The Classic Concurrent Sieve

Here’s the elegant version from Go’s early documentation:

package main

import "fmt"

// Generate natural numbers starting from 2
func generate(ch chan<- int) {
    for i := 2; ; i++ {
        ch <- i
    }
}

// Filter out multiples of 'prime'
func filter(in <-chan int, out chan<- int, prime int) {
    for {
        num := <-in
        if num%prime != 0 {
            out <- num
        }
    }
}

// The sieve: generate primes using pipeline of filters
func sieve() {
    ch := make(chan int) // Create initial channel
    go generate(ch)      // Start generating numbers

    for {
        prime := <-ch // First number through filter is prime
        fmt.Println(prime)

        ch1 := make(chan int)
        go filter(ch, ch1, prime) // Create filter for this prime
        ch = ch1                  // Next iteration reads from filtered output
    }
}

func main() {
    sieve() // Run forever (Ctrl+C to stop)
}

Output:

2
3
5
7
11
13
17
19
23
...

Elegant! But let’s see what happens under the hood.

The Goroutine Explosion

Let’s count goroutines as we find primes:

import (
    "fmt"
    "runtime"
)

func countingSieve(n int) []int {
    ch := make(chan int)
    go generate(ch)

    primes := make([]int, 0, n)

    for i := 0; i < n; i++ {
        prime := <-ch
        primes = append(primes, prime)

        ch1 := make(chan int)
        go filter(ch, ch1, prime)
        ch = ch1

        if i%100 == 0 {
            fmt.Printf("Found %d primes, goroutines: %d\n",
                i, runtime.NumGoroutine())
        }
    }

    return primes
}

func main() {
    primes := countingSieve(1000)
    fmt.Printf("Found %d primes\n", len(primes))
    fmt.Printf("Final goroutines: %d\n", runtime.NumGoroutine())
}

Output:

Found 0 primes, goroutines: 3
Found 100 primes, goroutines: 103
Found 200 primes, goroutines: 203
Found 300 primes, goroutines: 303
...
Found 900 primes, goroutines: 903

Over 1,000 goroutines for 1,000 primes: each prime spawns its own filter goroutine (plus main and the generator). For primes up to 1 million (78,498 primes), that’s 78,498 goroutines!

While Go can handle this, it’s inefficient. Each goroutine has overhead:

  • Stack space (minimum 2KB, can grow)
  • Scheduler bookkeeping
  • Channel communication

Measuring Performance

func BenchmarkConcurrentSieve(b *testing.B) {
    for i := 0; i < b.N; i++ {
        countingSieve(1000)
    }
}

func BenchmarkSequentialSieve(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sequentialSieve(1000)
    }
}

// Traditional sequential sieve for comparison
func sequentialSieve(n int) []int {
    limit := 10000 // Holds the first 1,229 primes; panics for n > 1,229
    isPrime := make([]bool, limit)
    for i := range isPrime {
        isPrime[i] = true
    }

    primes := make([]int, 0, n)

    for i := 2; len(primes) < n; i++ {
        if isPrime[i] {
            primes = append(primes, i)
            for j := i * i; j < limit; j += i {
                isPrime[j] = false
            }
        }
    }

    return primes
}

Results:

BenchmarkConcurrentSieve-8      50     35.2 ms/op    2.1 MB/op
BenchmarkSequentialSieve-8    5000      0.3 ms/op    0.1 MB/op

The concurrent version is over 100x slower! Channel communication overhead dominates.

Optimization 1: Bounded Pipeline

Instead of unlimited goroutines, use a fixed-size pool:

func boundedSieve(n, maxFilters int) []int {
    ch := make(chan int, 100) // Buffered for performance
    go generate(ch)

    primes := make([]int, 0, n)
    filters := 0

    for i := 0; i < n; i++ {
        prime := <-ch
        primes = append(primes, prime)

        // Only create filter if under limit
        if filters < maxFilters {
            ch1 := make(chan int, 100)
            go filter(ch, ch1, prime)
            ch = ch1
            filters++
        }
    }

    return primes
}

Problem: After maxFilters, we stop creating filters. Numbers pass through without proper sieving. We get incorrect results.

We need a hybrid approach.

Optimization 2: Hybrid Sieve

Use concurrent pipeline for small primes, sequential sieving for larger:

func hybridSieve(n int) []int {
    const pipelineSize = 20 // Use pipeline for first 20 primes

    ch := make(chan int, 100)
    go generate(ch)

    primes := make([]int, 0, n)

    // Pipeline phase: first 'pipelineSize' primes
    for i := 0; i < pipelineSize && i < n; i++ {
        prime := <-ch
        primes = append(primes, prime)

        ch1 := make(chan int, 100)
        go filter(ch, ch1, prime)
        ch = ch1
    }

    // Sequential phase: use existing primes to filter
    for len(primes) < n {
        candidate := <-ch
        isPrime := true

        // Check divisibility by known primes
        for _, p := range primes {
            if p*p > candidate {
                break // Optimization: only check up to sqrt
            }
            if candidate%p == 0 {
                isPrime = false
                break
            }
        }

        if isPrime {
            primes = append(primes, candidate)
        }
    }

    return primes
}

Benchmark:

BenchmarkConcurrentSieve-8        50     35.2 ms/op
BenchmarkHybridSieve-8          3000      0.5 ms/op
BenchmarkSequentialSieve-8      5000      0.3 ms/op

70x faster than pure concurrent, still slower than sequential, but much more reasonable.

Optimization 3: Buffered Channels

Channels block on every send/receive. Buffering helps:

func bufferedSieve(n, bufSize int) []int {
    ch := make(chan int, bufSize) // Large buffer
    go generate(ch)

    primes := make([]int, 0, n)

    // Pipeline phase: only the first 20 primes get filters
    for i := 0; i < 20 && i < n; i++ {
        prime := <-ch
        primes = append(primes, prime)

        ch1 := make(chan int, bufSize)
        go filter(ch, ch1, prime)
        ch = ch1
    }

    // Sequential phase, as in hybridSieve. Without this check, composites
    // like 73*73 would slip past the first 20 filters.
    for len(primes) < n {
        candidate := <-ch
        isPrime := true
        for _, p := range primes {
            if p*p > candidate {
                break
            }
            if candidate%p == 0 {
                isPrime = false
                break
            }
        }
        if isPrime {
            primes = append(primes, candidate)
        }
    }

    return primes
}

Benchmark (buffer size 1000):

BenchmarkBufferedSieve-8        5000      0.4 ms/op

Buffering reduces contention, improves throughput.

The Goroutine Limit Problem

What’s the maximum number of primes we can find with the classic approach?

func stressTest() {
    counts := []int{100, 1000, 10000, 100000}

    for _, count := range counts {
        start := time.Now()

        ch := make(chan int, 1000)
        go generate(ch)

        primes := 0
        for i := 0; i < count; i++ {
            prime := <-ch
            primes++

            ch1 := make(chan int, 1000)
            go filter(ch, ch1, prime)
            ch = ch1
        }

        duration := time.Since(start)
        goroutines := runtime.NumGoroutine()

        fmt.Printf("Primes: %6d | Time: %8s | Goroutines: %6d\n",
            count, duration, goroutines)
    }
}

Output:

Primes:    100 | Time:    1.2ms | Goroutines:    102
Primes:   1000 | Time:   15.8ms | Goroutines:   1002
Primes:  10000 | Time:  456.3ms | Goroutines:  10002
Primes: 100000 | Time:   8.234s | Goroutines: 100002

Time grows non-linearly! The scheduler overhead with 100,000 goroutines is substantial.

Optimization 4: Worker Pool for Filtering

Instead of one goroutine per prime, use a worker pool:

type filterTask struct {
    prime int
    num   int
}

type filterResult struct {
    num     int
    isPrime bool
}

func workerPoolSieve(n, workers int) []int {
    tasks := make(chan filterTask, 1000)
    results := make(chan filterResult, 1000)

    // Start worker pool
    for i := 0; i < workers; i++ {
        go filterWorker(tasks, results)
    }

    // Generator
    go func() {
        for i := 2; ; i++ {
            tasks <- filterTask{num: i}
        }
    }()

    primes := make([]int, 0, n)
    knownPrimes := make([]int, 0, n)

    for len(primes) < n {
        task := <-tasks
        isPrime := true

        // Check against known primes
        for _, p := range knownPrimes {
            if p*p > task.num {
                break
            }
            if task.num%p == 0 {
                isPrime = false
                break
            }
        }

        if isPrime {
            primes = append(primes, task.num)
            knownPrimes = append(knownPrimes, task.num)
        }
    }

    return primes
}

func filterWorker(tasks <-chan filterTask, results chan<- filterResult) {
    // Nothing useful to do here: the main goroutine already performs the
    // divisibility check. Worse, every task a worker drains from 'tasks'
    // is a candidate the main loop never sees.
    for range tasks {
    }
}

This doesn’t map well to the sieve. The workers compete with the main loop for candidates, and the per-number divisibility check is too cheap to be worth dispatching. The beauty of the pipeline approach is that each filter is independent; a worker pool doesn’t capture that elegance.

When Pipeline Shines: Prime Gaps

Find gaps between consecutive primes:

func findPrimeGaps(targetGap int) {
    ch := make(chan int, 100)
    go generate(ch)

    prev := 0

    for {
        prime := <-ch

        if prev > 0 {
            gap := prime - prev
            if gap >= targetGap {
                fmt.Printf("Gap of %d: %d to %d\n", gap, prev, prime)
                return
            }
        }

        prev = prime

        ch1 := make(chan int, 100)
        go filter(ch, ch1, prime)
        ch = ch1
    }
}

The pipeline naturally produces primes in order, making sequential analysis easy.

Practical Sieve: Segmented Approach

For large-scale prime generation, use segmented sieving:

func segmentedSieve(limit int) []int {
    segmentSize := 10000
    allPrimes := []int{}

    // Find small primes first (sequential). Note that sequentialSieve takes a
    // prime *count*, not an upper bound, so this over-generates slightly; the
    // surplus primes are harmless since any p with p*p > high marks nothing.
    smallPrimes := sequentialSieve(int(math.Sqrt(float64(limit))) + 1)

    var mu sync.Mutex
    var wg sync.WaitGroup

    // Process segments in parallel
    for low := 2; low <= limit; low += segmentSize {
        high := low + segmentSize - 1
        if high > limit {
            high = limit
        }

        wg.Add(1)
        go func(low, high int) {
            defer wg.Done()

            // Sieve this segment using smallPrimes
            segmentPrimes := sieveSegment(low, high, smallPrimes)

            mu.Lock()
            allPrimes = append(allPrimes, segmentPrimes...)
            mu.Unlock()
        }(low, high)
    }

    wg.Wait()

    // Sort (segments may complete out of order)
    sort.Ints(allPrimes)

    return allPrimes
}

func sieveSegment(low, high int, primes []int) []int {
    size := high - low + 1
    isPrime := make([]bool, size)
    for i := range isPrime {
        isPrime[i] = true
    }

    for _, p := range primes {
        // Find first multiple of p in [low, high]
        start := ((low + p - 1) / p) * p
        if start < p*p {
            start = p * p
        }

        for j := start; j <= high; j += p {
            isPrime[j-low] = false
        }
    }

    result := []int{}
    for i := 0; i < size; i++ {
        if isPrime[i] && low+i >= 2 {
            result = append(result, low+i)
        }
    }

    return result
}

Benchmark (primes up to 1,000,000):

Sequential:  42ms
Segmented:   15ms (8 workers)

2.8x speedup! This is practical parallelism.

Visualization: Pipeline Depth

func visualizePipeline(n int) {
    ch := make(chan int, 10)
    go generate(ch)

    for i := 0; i < n; i++ {
        prime := <-ch
        depth := i + 1 // Filters in the pipeline once this prime adds its own

        fmt.Printf("Prime %4d: %6d (pipeline depth: %4d)\n",
            i+1, prime, depth)

        ch1 := make(chan int, 10)
        go filter(ch, ch1, prime)
        ch = ch1
    }
}

Output:

Prime    1:      2 (pipeline depth:    1)
Prime    2:      3 (pipeline depth:    2)
Prime    3:      5 (pipeline depth:    3)
Prime    4:      7 (pipeline depth:    4)
Prime    5:     11 (pipeline depth:    5)
...

The 1000th prime (7919) has to pass through the filters of all 999 preceding primes. That’s a lot of channel hops!

Memory Analysis

func memoryProfile(n int) {
    var m runtime.MemStats

    runtime.ReadMemStats(&m)
    before := m.Alloc

    primes := countingSieve(n)

    runtime.ReadMemStats(&m)
    after := m.Alloc

    fmt.Printf("Found %d primes\n", len(primes))
    fmt.Printf("Memory used: %.2f MB\n", float64(after-before)/(1024*1024))
    fmt.Printf("Goroutines: %d\n", runtime.NumGoroutine())
}

Output (1000 primes):

Found 1000 primes
Memory used: 2.34 MB
Goroutines: 1002

Each goroutine + channel pair costs ~2KB. At scale, this adds up.

Testing Correctness

func TestSieveCorrectness(t *testing.T) {
    tests := []struct {
        n        int
        expected []int
    }{
        {5, []int{2, 3, 5, 7, 11}},
        {10, []int{2, 3, 5, 7, 11, 13, 17, 19, 23, 29}},
    }

    for _, tt := range tests {
        result := countingSieve(tt.n)
        if !reflect.DeepEqual(result, tt.expected) {
            t.Errorf("countingSieve(%d) = %v, want %v",
                tt.n, result, tt.expected)
        }
    }
}

func TestSieveVsSequential(t *testing.T) {
    n := 100

    concurrent := countingSieve(n)
    sequential := sequentialSieve(n)

    if !reflect.DeepEqual(concurrent, sequential) {
        t.Errorf("Concurrent and sequential sieves disagree")
    }
}

Real-World Applications

  1. Cryptography: RSA key generation, primality testing
  2. Number Theory Research: Prime gaps, twin primes, Mersenne primes
  3. Educational: Teaching concurrency and algorithms
  4. Benchmarking: Stress-testing schedulers and channel performance

Common Pitfalls

  1. Unbounded goroutines: Creates performance problems at scale
  2. Unbuffered channels: Severe performance penalty
  3. No early termination: Pipeline runs forever if not carefully controlled
  4. Ignoring memory: Thousands of goroutines consume significant RAM
  5. Assuming speedup: Pipeline overhead often exceeds parallelism benefit

Best Practices

  1. Hybrid approaches: Combine concurrent and sequential as appropriate
  2. Buffer channels: Reduces blocking, improves throughput
  3. Limit pipeline depth: Use segmented or worker pool approaches for scale
  4. Benchmark realistically: Compare against optimized sequential code
  5. Know when not to use it: Pipeline is elegant, not always efficient

Conclusion

The concurrent Sieve of Eratosthenes is:

  • Elegant: Beautiful demonstration of Go’s channel-based concurrency
  • Educational: Perfect for learning pipelines and goroutine communication
  • Impractical: Usually slower than sequential for actual prime generation
  • Inspiring: Shows how to think about problems in terms of concurrent pipelines

Key lessons:

  • Goroutine overhead is real: Thousands of goroutines have cost
  • Channel communication isn’t free: Buffering helps, but there’s still overhead
  • Elegance ≠ Performance: Beautiful code isn’t always fast code
  • Hybrid is best: Combine concurrent and sequential techniques

The sieve teaches us that concurrency is a tool, not a goal. Use it where it makes sense, not just because you can.

For production prime generation, use:

  • Sequential sieve for small ranges
  • Segmented parallel sieve for large ranges
  • Libraries (math/big for primality testing)

But for learning Go’s concurrency model? The pipeline sieve is pure gold.

