The Problem: Finding Primes with Filters
The Sieve of Eratosthenes is an ancient algorithm for finding prime numbers. The concurrent version creates a pipeline of filters: each prime spawns a goroutine that filters out its multiples.
The Algorithm:
- Generate the sequence: 2, 3, 4, 5, 6, 7, 8, 9, 10, …
- Take the first number (2): it’s prime, so filter out all multiples of 2
- Take the next surviving number (3): it’s prime, so filter out all multiples of 3
- Take the next surviving number (5): it’s prime, so filter out all multiples of 5
- Repeat until you have the desired count of primes
The Beauty: Each prime creates its own filter. Numbers flow through a pipeline of increasingly selective filters, and whatever passes every filter must be prime. For example, 9 survives the 2-filter but is caught by the 3-filter, while 11 passes every filter and pops out the far end as a new prime.
The Problem: Goroutine explosion. How many goroutines for primes up to 1 million?
The Classic Concurrent Sieve
Here’s the elegant version from Go’s early documentation:
package main

import "fmt"

// Generate natural numbers starting from 2
func generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i
	}
}

// Filter out multiples of 'prime'
func filter(in <-chan int, out chan<- int, prime int) {
	for {
		num := <-in
		if num%prime != 0 {
			out <- num
		}
	}
}

// The sieve: generate primes using pipeline of filters
func sieve() {
	ch := make(chan int) // Create initial channel
	go generate(ch)      // Start generating numbers
	for {
		prime := <-ch // First number through filter is prime
		fmt.Println(prime)
		ch1 := make(chan int)
		go filter(ch, ch1, prime) // Create filter for this prime
		ch = ch1                  // Next iteration reads from filtered output
	}
}

func main() {
	sieve() // Run forever (Ctrl+C to stop)
}
Output:
2
3
5
7
11
13
17
19
23
...
Elegant! But let’s see what happens under the hood.
The Goroutine Explosion
Let’s count goroutines as we find primes:
import "runtime"
func countingSieve(n int) []int {
ch := make(chan int)
go generate(ch)
primes := make([]int, 0, n)
for i := 0; i < n; i++ {
prime := <-ch
primes = append(primes, prime)
ch1 := make(chan int)
go filter(ch, ch1, prime)
ch = ch1
if i%100 == 0 {
fmt.Printf("Found %d primes, goroutines: %d\n",
i, runtime.NumGoroutine())
}
}
return primes
}
func main() {
primes := countingSieve(1000)
fmt.Printf("Found %d primes\n", len(primes))
fmt.Printf("Final goroutines: %d\n", runtime.NumGoroutine())
}
Output:
Found 0 primes, goroutines: 3
Found 100 primes, goroutines: 103
Found 200 primes, goroutines: 203
Found 300 primes, goroutines: 303
...
Found 1000 primes, goroutines: 1003
1,003 goroutines for 1,000 primes. Each prime spawns a filter goroutine. For primes up to 1 million (78,498 primes), that’s 78,498 goroutines!
While Go can handle this, it’s inefficient. Each goroutine carries overhead (a rough measurement sketch follows this list):
- Stack space (minimum 2KB, can grow)
- Scheduler bookkeeping
- Channel communication
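To put a rough number on the first of these costs, here is a small measurement sketch that parks a batch of idle goroutines and compares stack memory before and after. Treat the result as a ballpark figure; it varies by Go version and platform.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	const n = 10000

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	block := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-block }() // park each goroutine on a channel receive
	}
	time.Sleep(100 * time.Millisecond) // crude: give the goroutines time to start
	runtime.GC()
	runtime.ReadMemStats(&after)

	// StackSys approximates memory obtained from the OS for goroutine stacks
	fmt.Printf("~%d bytes of stack per parked goroutine\n",
		(after.StackSys-before.StackSys)/n)

	close(block) // release the parked goroutines
}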
Measuring Performance
func BenchmarkConcurrentSieve(b *testing.B) {
	for i := 0; i < b.N; i++ {
		countingSieve(1000)
	}
}

func BenchmarkSequentialSieve(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sequentialSieve(1000)
	}
}
// Traditional sequential sieve for comparison.
// Note: it takes a prime *count*, not an upper bound. The hard-coded limit
// of 10,000 covers the first 1,229 primes, which is plenty here.
func sequentialSieve(n int) []int {
	limit := 10000 // Upper bound estimate
	isPrime := make([]bool, limit)
	for i := range isPrime {
		isPrime[i] = true
	}
	primes := make([]int, 0, n)
	for i := 2; len(primes) < n; i++ {
		if isPrime[i] {
			primes = append(primes, i)
			for j := i * i; j < limit; j += i {
				isPrime[j] = false
			}
		}
	}
	return primes
}
Results:
BenchmarkConcurrentSieve-8      50    35.2 ms/op    2.1 MB/op
BenchmarkSequentialSieve-8    5000     0.3 ms/op    0.1 MB/op
The concurrent version is over 100x slower! The channel communication overhead dominates.
Optimization 1: Bounded Pipeline
Instead of unlimited goroutines, use a fixed-size pool:
func boundedSieve(n, maxFilters int) []int {
	ch := make(chan int, 100) // Buffered for performance
	go generate(ch)
	primes := make([]int, 0, n)
	filters := 0
	for i := 0; i < n; i++ {
		prime := <-ch
		primes = append(primes, prime)
		// Only create a filter if under the limit
		if filters < maxFilters {
			ch1 := make(chan int, 100)
			go filter(ch, ch1, prime)
			ch = ch1
			filters++
		}
	}
	return primes
}
Problem: After maxFilters, we stop creating filters. Numbers pass through without proper sieving. We get incorrect results.
We need a hybrid approach.
Optimization 2: Hybrid Sieve
Use concurrent pipeline for small primes, sequential sieving for larger:
func hybridSieve(n int) []int {
	const pipelineSize = 20 // Use the pipeline for the first 20 primes
	ch := make(chan int, 100)
	go generate(ch)
	primes := make([]int, 0, n)

	// Pipeline phase: first 'pipelineSize' primes
	for i := 0; i < pipelineSize && i < n; i++ {
		prime := <-ch
		primes = append(primes, prime)
		ch1 := make(chan int, 100)
		go filter(ch, ch1, prime)
		ch = ch1
	}

	// Sequential phase: use existing primes to filter
	for len(primes) < n {
		candidate := <-ch
		isPrime := true
		// Check divisibility by known primes
		for _, p := range primes {
			if p*p > candidate {
				break // Optimization: only check up to sqrt(candidate)
			}
			if candidate%p == 0 {
				isPrime = false
				break
			}
		}
		if isPrime {
			primes = append(primes, candidate)
		}
	}
	return primes
}
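The hybrid benchmark follows the same pattern as the earlier ones; here is a sketch, assuming the same 1000-prime workload used above.
func BenchmarkHybridSieve(b *testing.B) {
	for i := 0; i < b.N; i++ {
		hybridSieve(1000)
	}
}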
Benchmark:
BenchmarkConcurrentSieve-8      50    35.2 ms/op
BenchmarkHybridSieve-8        3000     0.5 ms/op
BenchmarkSequentialSieve-8    5000     0.3 ms/op
Roughly 70x faster than the pure concurrent version: still slower than sequential, but far more reasonable.
Optimization 3: Buffered Channels
Unbuffered channels synchronize on every send and receive. Buffering lets the sender run ahead of the receiver:
func bufferedSieve(n, bufSize int) []int {
	ch := make(chan int, bufSize) // Large buffer reduces blocking on sends
	go generate(ch)
	primes := make([]int, 0, n)
	for len(primes) < n {
		candidate := <-ch
		if len(primes) < 20 { // Only pipeline the first 20 primes
			primes = append(primes, candidate) // Arrives in order, so it is prime
			ch1 := make(chan int, bufSize)
			go filter(ch, ch1, candidate)
			ch = ch1
			continue
		}
		// Past the pipeline, trial-divide (as in hybridSieve) so results stay correct
		isPrime := true
		for _, p := range primes {
			if p*p > candidate {
				break
			}
			if candidate%p == 0 {
				isPrime = false
				break
			}
		}
		if isPrime {
			primes = append(primes, candidate)
		}
	}
	return primes
}
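A matching benchmark sketch, assuming the same 1000-prime workload and the buffer size of 1000 used in the result below:
func BenchmarkBufferedSieve(b *testing.B) {
	for i := 0; i < b.N; i++ {
		bufferedSieve(1000, 1000)
	}
}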
Benchmark (buffer size 1000):
BenchmarkBufferedSieve-8 5000 0.4 ms/op
Buffering reduces contention and improves throughput.
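To isolate the effect of buffering from the sieve itself, here is a small micro-benchmark sketch comparing an unbuffered channel with a buffered one; absolute numbers depend on hardware and Go version.
func BenchmarkUnbufferedChannel(b *testing.B) {
	ch := make(chan int)
	go func() {
		for i := 0; i < b.N; i++ {
			ch <- i // every send waits for the matching receive
		}
	}()
	for i := 0; i < b.N; i++ {
		<-ch
	}
}

func BenchmarkBufferedChannel(b *testing.B) {
	ch := make(chan int, 1000)
	go func() {
		for i := 0; i < b.N; i++ {
			ch <- i // sends only block when the buffer is full
		}
	}()
	for i := 0; i < b.N; i++ {
		<-ch
	}
}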
The Goroutine Limit Problem
What’s the maximum number of primes we can find with the classic approach?
func stressTest() {
	counts := []int{100, 1000, 10000, 100000}
	for _, count := range counts {
		start := time.Now()
		ch := make(chan int, 1000)
		go generate(ch)
		for i := 0; i < count; i++ {
			prime := <-ch
			ch1 := make(chan int, 1000)
			go filter(ch, ch1, prime)
			ch = ch1
		}
		duration := time.Since(start)
		goroutines := runtime.NumGoroutine()
		fmt.Printf("Primes: %6d | Time: %8s | Goroutines: %6d\n",
			count, duration, goroutines)
	}
}
Output:
Primes:    100 | Time:    1.2ms | Goroutines:    102
Primes:   1000 | Time:   15.8ms | Goroutines:   1002
Primes:  10000 | Time:  456.3ms | Goroutines:  10002
Primes: 100000 | Time:   8.234s | Goroutines: 100002
Time grows non-linearly! The scheduler overhead with 100,000 goroutines is substantial.
Optimization 4: Worker Pool for Filtering
Instead of one goroutine per prime, use a worker pool:
type filterTask struct {
	prime int
	num   int
}

type filterResult struct {
	num     int
	isPrime bool
}

func workerPoolSieve(n, workers int) []int {
	tasks := make(chan filterTask, 1000)
	results := make(chan filterResult, 1000)

	// Start worker pool
	for i := 0; i < workers; i++ {
		go filterWorker(tasks, results)
	}

	// Generator
	go func() {
		for i := 2; ; i++ {
			tasks <- filterTask{num: i}
		}
	}()

	primes := make([]int, 0, n)
	knownPrimes := make([]int, 0, n)
	for len(primes) < n {
		// Note: the main loop and the workers compete for the same tasks channel
		task := <-tasks
		isPrime := true
		// Check against known primes
		for _, p := range knownPrimes {
			if p*p > task.num {
				break
			}
			if task.num%p == 0 {
				isPrime = false
				break
			}
		}
		if isPrime {
			primes = append(primes, task.num)
			knownPrimes = append(knownPrimes, task.num)
		}
	}
	return primes
}

func filterWorker(tasks <-chan filterTask, results chan<- filterResult) {
	for range tasks {
		// A worker would check divisibility here, but in this design the main
		// goroutine does it (simpler) -- the workers just drain tasks away
	}
}
Actually, this doesn’t map well to the sieve: the workers and the main loop end up competing for the same tasks channel, and the main goroutine still does all the divisibility checks. The beauty of the pipeline approach is that each filter is independent; a worker pool doesn’t capture that elegance.
When Pipeline Shines: Prime Gaps
Find gaps between consecutive primes:
func findPrimeGaps(targetGap int) {
	ch := make(chan int, 100)
	go generate(ch)
	prev := 0
	for {
		prime := <-ch
		if prev > 0 {
			gap := prime - prev
			if gap >= targetGap {
				fmt.Printf("Gap of %d: %d to %d\n", gap, prev, prime)
				return
			}
		}
		prev = prime
		ch1 := make(chan int, 100)
		go filter(ch, ch1, prime)
		ch = ch1
	}
}
The pipeline naturally produces primes in order, making sequential analysis easy.
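The same property makes other in-order analyses just as simple. Here is a sketch (a hypothetical findTwinPrimes helper, reusing generate and filter from the classic sieve) that reports twin primes as they stream past.
// Report twin primes (consecutive primes that differ by 2) up to a limit
func findTwinPrimes(limit int) {
	ch := make(chan int, 100)
	go generate(ch)
	prev := 0
	for {
		prime := <-ch
		if prime > limit {
			return
		}
		if prev > 0 && prime-prev == 2 {
			fmt.Printf("Twin primes: %d, %d\n", prev, prime)
		}
		prev = prime
		ch1 := make(chan int, 100)
		go filter(ch, ch1, prime)
		ch = ch1
	}
}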
Practical Sieve: Segmented Approach
For large-scale prime generation, use segmented sieving:
func segmentedSieve(limit int) []int {
	segmentSize := 10000
	allPrimes := []int{}

	// Find small primes first (sequentially). Note: sequentialSieve takes a
	// prime *count*, so this requests the first sqrt(limit)+1 primes, which
	// comfortably includes every prime up to sqrt(limit) for this range.
	smallPrimes := sequentialSieve(int(math.Sqrt(float64(limit))) + 1)

	var mu sync.Mutex
	var wg sync.WaitGroup

	// Process segments in parallel
	for low := 2; low <= limit; low += segmentSize {
		high := low + segmentSize - 1
		if high > limit {
			high = limit
		}
		wg.Add(1)
		go func(low, high int) {
			defer wg.Done()
			// Sieve this segment using smallPrimes
			segmentPrimes := sieveSegment(low, high, smallPrimes)
			mu.Lock()
			allPrimes = append(allPrimes, segmentPrimes...)
			mu.Unlock()
		}(low, high)
	}
	wg.Wait()

	// Sort (segments may complete out of order)
	sort.Ints(allPrimes)
	return allPrimes
}
func sieveSegment(low, high int, primes []int) []int {
	size := high - low + 1
	isPrime := make([]bool, size)
	for i := range isPrime {
		isPrime[i] = true
	}
	for _, p := range primes {
		// Find the first multiple of p in [low, high], but start no lower
		// than p*p so that p itself is never marked composite
		start := ((low + p - 1) / p) * p
		if start < p*p {
			start = p * p
		}
		for j := start; j <= high; j += p {
			isPrime[j-low] = false
		}
	}
	result := []int{}
	for i := 0; i < size; i++ {
		if isPrime[i] && low+i >= 2 {
			result = append(result, low+i)
		}
	}
	return result
}
Benchmark (primes up to 1,000,000):
Sequential: 42ms
Segmented: 15ms (8 workers)
2.8x speedup! This is practical parallelism.
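As a rough usage sketch (not the benchmark harness itself; timings vary by machine and GOMAXPROCS), running the segmented sieve over the same range looks like this:
func main() {
	start := time.Now()
	primes := segmentedSieve(1_000_000)
	// Expect 78,498 primes, matching the count mentioned earlier
	fmt.Printf("Segmented sieve: %d primes in %v\n", len(primes), time.Since(start))
}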
Visualization: Pipeline Depth
func visualizePipeline(n int) {
	ch := make(chan int, 10)
	go generate(ch)
	for i := 0; i < n; i++ {
		prime := <-ch
		depth := i + 1 // Pipeline depth
		fmt.Printf("Prime %4d: %6d (pipeline depth: %4d)\n",
			i+1, prime, depth)
		ch1 := make(chan int, 10)
		go filter(ch, ch1, prime)
		ch = ch1
	}
}
Output:
Prime    1:      2 (pipeline depth:    1)
Prime    2:      3 (pipeline depth:    2)
Prime    3:      5 (pipeline depth:    3)
Prime    4:      7 (pipeline depth:    4)
Prime    5:     11 (pipeline depth:    5)
...
The 1000th prime (7919) has to travel through the filters for all 999 earlier primes before it reaches main. That’s a lot of channel hops!
Memory Analysis
func memoryProfile(n int) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	before := m.Alloc

	primes := countingSieve(n)

	runtime.ReadMemStats(&m)
	after := m.Alloc

	fmt.Printf("Found %d primes\n", len(primes))
	fmt.Printf("Memory used: %.2f MB\n", float64(after-before)/(1024*1024))
	fmt.Printf("Goroutines: %d\n", runtime.NumGoroutine())
}
Output (1000 primes):
Found 1000 primes
Memory used: 2.34 MB
Goroutines: 1002
Each goroutine-plus-channel pair costs roughly 2KB here. Extrapolating to primes up to 1 million (78,498 filters), that is on the order of 150-200 MB of overhead before any useful work is done.
Testing Correctness
func TestSieveCorrectness(t *testing.T) {
	tests := []struct {
		n        int
		expected []int
	}{
		{5, []int{2, 3, 5, 7, 11}},
		{10, []int{2, 3, 5, 7, 11, 13, 17, 19, 23, 29}},
	}
	for _, tt := range tests {
		result := countingSieve(tt.n)
		if !reflect.DeepEqual(result, tt.expected) {
			t.Errorf("countingSieve(%d) = %v, want %v",
				tt.n, result, tt.expected)
		}
	}
}

func TestSieveVsSequential(t *testing.T) {
	n := 100
	concurrent := countingSieve(n)
	sequential := sequentialSieve(n)
	if !reflect.DeepEqual(concurrent, sequential) {
		t.Errorf("Concurrent and sequential sieves disagree")
	}
}
Real-World Applications
- Cryptography: RSA key generation, primality testing
- Number Theory Research: Prime gaps, twin primes, Mersenne primes
- Educational: Teaching concurrency and algorithms
- Benchmarking: Stress-testing schedulers and channel performance
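For the cryptography use case, production code typically leans on the standard library rather than a hand-rolled sieve. As a minimal sketch, math/big’s ProbablyPrime handles primality testing (the conclusion below points to the same package).
package main

import (
	"fmt"
	"math/big"
)

func main() {
	n := big.NewInt(7919) // the 1000th prime
	// With n = 0, ProbablyPrime applies a Baillie-PSW test and is
	// guaranteed correct for inputs below 2^64 (Go 1.8+)
	fmt.Println(n.ProbablyPrime(0)) // true
}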
Common Pitfalls
- Unbounded goroutines: Creates performance problems at scale
- Unbuffered channels: Severe performance penalty
- No early termination: The pipeline runs forever unless it is carefully controlled (see the cancellation sketch after this list)
- Ignoring memory: Thousands of goroutines consume significant RAM
- Assuming speedup: Pipeline overhead often exceeds parallelism benefit
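One standard remedy for the early-termination pitfall is the done-channel cancellation pattern. Below is a sketch with hypothetical generateDone and filterDone variants of the earlier generate and filter; every send or receive is paired with a check on a done channel so the whole pipeline can be shut down.
func generateDone(done <-chan struct{}, ch chan<- int) {
	for i := 2; ; i++ {
		select {
		case ch <- i:
		case <-done:
			return
		}
	}
}

func filterDone(done <-chan struct{}, in <-chan int, out chan<- int, prime int) {
	for {
		select {
		case num := <-in:
			if num%prime != 0 {
				select {
				case out <- num:
				case <-done:
					return
				}
			}
		case <-done:
			return
		}
	}
}
The caller closes done once it has collected enough primes, and every goroutine in the chain unblocks and returns.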
Best Practices
- Hybrid approaches: Combine concurrent and sequential as appropriate
- Buffer channels: Reduces blocking, improves throughput
- Limit pipeline depth: Use segmented or worker pool approaches for scale
- Benchmark realistically: Compare against optimized sequential code
- Know when not to use it: Pipeline is elegant, not always efficient
Conclusion
The concurrent Sieve of Eratosthenes is:
- Elegant: Beautiful demonstration of Go’s channel-based concurrency
- Educational: Perfect for learning pipelines and goroutine communication
- Impractical: Usually slower than sequential for actual prime generation
- Inspiring: Shows how to think about problems in terms of concurrent pipelines
Key lessons:
- Goroutine overhead is real: Thousands of goroutines have cost
- Channel communication isn’t free: Buffering helps, but there’s still overhead
- Elegance ≠ Performance: Beautiful code isn’t always fast code
- Hybrid is best: Combine concurrent and sequential techniques
The sieve teaches us that concurrency is a tool, not a goal. Use it where it makes sense, not just because you can.
For production prime generation, use:
- Sequential sieve for small ranges
- Segmented parallel sieve for large ranges
- Libraries (math/big for primality testing)
But for learning Go’s concurrency model? The pipeline sieve is pure gold.