Posts

Go Concurrency Pattern: The Ticket Seller

The Problem: Selling Tickets That Don’t Exist
A concert has 1,000 tickets. Multiple goroutines handle sales concurrently. Each seller checks whether tickets remain and, if so, decrements the count and sells one. Sounds simple, right? Two sellers check simultaneously. Both see “1 ticket remaining.” Both sell. You’ve just sold a ticket that doesn’t exist, and the count now reads -1. Congratulations, you’ve discovered the check-then-act race condition, one of the most common concurrency bugs in the wild. ...
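One common way to close the check-then-act gap described above is to make the check and the decrement a single critical section. The following is a minimal sketch, not the post's own solution: the `TicketOffice` type, the `sellAll` helper, and the seller and attempt counts are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// TicketOffice is a hypothetical type for this sketch. The mutex makes
// "check that tickets remain" and "decrement" one indivisible step.
type TicketOffice struct {
	mu        sync.Mutex
	remaining int
}

// Sell reports whether a ticket was sold. It never lets the count go below zero.
func (o *TicketOffice) Sell() bool {
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.remaining <= 0 {
		return false // sold out
	}
	o.remaining--
	return true
}

// sellAll starts `sellers` goroutines that each try `attempts` sales and
// returns how many tickets were actually sold.
func sellAll(tickets, sellers, attempts int) int {
	office := &TicketOffice{remaining: tickets}
	sold := make([]int, sellers) // one slot per seller, so no shared writes
	var wg sync.WaitGroup
	for i := 0; i < sellers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < attempts; j++ {
				if office.Sell() {
					sold[id]++
				}
			}
		}(i)
	}
	wg.Wait()
	total := 0
	for _, n := range sold {
		total += n
	}
	return total
}

func main() {
	// 50 sellers x 40 attempts = 2,000 attempts against 1,000 tickets.
	fmt.Println("tickets sold:", sellAll(1000, 50, 40)) // prints "tickets sold: 1000"
}
```

Removing the lock and running with `go run -race` is a quick way to see the original bug reported by the race detector.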

Profitability Metrics: Beyond 'Are We Making Money?'

The WeWork Paradox: $47B Valuation with Negative Unit Economics
In 2019, WeWork was valued at $47 billion. They had beautiful offices in major cities worldwide. Revenue was growing at triple-digit rates. They were “revolutionizing” real estate. ...

WebRTC in Go: Peer-to-Peer Real-Time Communication

What is WebRTC?
WebRTC (Web Real-Time Communication) enables peer-to-peer audio, video, and data sharing directly between browsers and native applications. Unlike traditional client-server models, WebRTC allows clients to communicate directly with each other after establishing a connection through a signaling server. ...

The LLM Development Workflow: A Data-Centric View

Introduction: It’s All About the Data
The secret to building great language models isn’t just architecture or compute: it’s data. Every decision in the LLM lifecycle revolves around data: What data do we train on? How do we clean and filter it? How do we align the model with human preferences? How do we measure success? Let’s trace the complete journey from raw text to a production-ready model, with data at the center. ...

Go Concurrency Pattern: The Sieve of Eratosthenes Pipeline

The Problem: Finding Primes with Filters
The Sieve of Eratosthenes is an ancient algorithm for finding prime numbers. The concurrent version creates a pipeline of filters: each prime spawns a goroutine that filters out its multiples.
The Algorithm:
1. Generate sequence: 2, 3, 4, 5, 6, 7, 8, 9, 10, …
2. Take first number (2), it’s prime, filter all multiples of 2
3. Take next number (3), it’s prime, filter all multiples of 3
4. Take next number (5), it’s prime, filter all multiples of 5
5. Repeat until desired count
The Beauty: Each prime creates its own filter. Numbers flow through a pipeline of increasingly selective filters. What passes through all filters must be prime. ...
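The pipeline described above can be sketched with one goroutine per stage connected by channels. This is an illustrative version under stated assumptions, not necessarily the post's code: the function names `generate`, `filter`, and `firstPrimes` and the `done` channel used for shutdown are choices made for this sketch.

```go
package main

import "fmt"

// generate emits 2, 3, 4, ... until done is closed.
func generate(done <-chan struct{}) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 2; ; i++ {
			select {
			case out <- i:
			case <-done:
				return
			}
		}
	}()
	return out
}

// filter forwards every value from in that is not a multiple of prime.
func filter(done <-chan struct{}, in <-chan int, prime int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			if n%prime == 0 {
				continue // drop multiples of this stage's prime
			}
			select {
			case out <- n:
			case <-done:
				return
			}
		}
	}()
	return out
}

// firstPrimes returns the first count primes. Each prime read from the
// head of the pipeline adds one more filter stage behind it.
func firstPrimes(count int) []int {
	done := make(chan struct{})
	defer close(done) // lets every stage exit instead of leaking goroutines

	primes := make([]int, 0, count)
	ch := generate(done)
	for len(primes) < count {
		p := <-ch // the first value to survive all filters is prime
		primes = append(primes, p)
		ch = filter(done, ch, p)
	}
	return primes
}

func main() {
	fmt.Println(firstPrimes(10)) // prints [2 3 5 7 11 13 17 19 23 29]
}
```

The design is elegant rather than fast: every candidate passes through one channel hop per earlier prime, so a plain array-based sieve will outperform it for large counts.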

Customer Economics: The LTV/CAC Framework That Predicts Success

The $100M Mistake
In 2011, a promising e-commerce startup raised $100M in funding. Their revenue was growing 20% month-over-month. The press loved them. Investors were excited. Eighteen months later, they shut down. ...

gRPC Streaming in Go: High-Performance Inter-Service Communication

What is gRPC Streaming?
gRPC (gRPC Remote Procedure Calls) is a high-performance, open-source RPC framework that uses HTTP/2 for transport and Protocol Buffers for serialization, and provides built-in support for streaming. Unlike traditional request-response RPCs, gRPC streaming enables long-lived connections where either party can send multiple messages over time. ...

Unpacking KV Cache Optimization: MLA and GQA Explained

Introduction: The Memory Wall
Modern LLMs can process context windows of 100K+ tokens. But there’s a hidden cost: the KV cache. The memory required to store key-value pairs for attention grows linearly with sequence length and batch size, and at long contexts it adds up fast. This creates a bottleneck:
Memory: with long contexts and large batches, the KV cache can consume more memory than the model weights
Bandwidth: Moving KV cache data becomes the primary latency source
Cost: Serving long-context models requires expensive high-memory GPUs
Two innovations address this: Grouped Query Attention (GQA) and Multi-Head Latent Attention (MLA). They reduce KV cache size, typically by 4-8× depending on configuration, while largely maintaining quality. ...
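A common back-of-the-envelope estimate makes the cache size concrete. The symbols and the example configuration below are assumptions introduced for illustration and do not come from the post: $B$ is batch size, $T$ is the number of cached tokens, $L$ is the number of layers, $n_{kv}$ is the number of key/value heads, $d_{head}$ is the head dimension, and $b$ is bytes per element.

```latex
% Factor 2 accounts for storing both keys and values.
M_{\mathrm{KV}} = 2 \cdot B \cdot T \cdot L \cdot n_{kv} \cdot d_{head} \cdot b

% Hypothetical configuration: L = 32, n_{kv} = 32, d_{head} = 128, b = 2 (fp16), B = 1
% Per token: 2 \cdot 32 \cdot 32 \cdot 128 \cdot 2 = 524{,}288 bytes (0.5 MiB)
% At T = 100{,}000 tokens: 524{,}288 \cdot 10^{5} \approx 52.4 GB

% GQA with 8 key/value heads shared across the 32 query heads (n_{kv} = 8):
M_{\mathrm{KV}}^{\mathrm{GQA}} = \tfrac{8}{32}\, M_{\mathrm{KV}} \approx 13.1 \text{ GB}
```

In this estimate GQA shrinks the cache by the ratio of query heads to key/value heads, here 4×. MLA takes a different route, caching a compressed latent representation instead of full per-head keys and values, so its savings do not follow from this formula directly.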

Go Concurrency Pattern: Monte Carlo Pi Estimation

The Problem: Computing Pi by Throwing Darts
Imagine a square dartboard with a circle inscribed inside it. Throw random darts at the square. The ratio of darts landing inside the circle to total darts thrown approaches π/4. Why?
Mathematics:
Square side length: 2 (from -1 to 1)
Square area: 4
Circle radius: 1
Circle area: π × 1² = π
Ratio: π/4
Throw 1 million darts, multiply the fraction that land inside the circle by 4, and you’ve estimated π. More darts = better estimate. This is Monte Carlo simulation: using randomness to solve deterministic problems. ...
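The dart-throwing estimate above splits naturally across goroutines, since every dart is independent. The sketch below is one possible arrangement rather than the post's implementation: the `estimatePi` name, the worker count, and the per-worker seeding scheme are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sync"
)

// estimatePi throws roughly totalDarts darts at the square [-1, 1] x [-1, 1],
// split evenly across workers, and returns 4 * (fraction inside the unit circle).
func estimatePi(totalDarts, workers int, seed int64) float64 {
	perWorker := totalDarts / workers
	hits := make([]int, workers) // one slot per worker, written once at the end
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each worker owns its generator, so there is no shared lock.
			rng := rand.New(rand.NewSource(seed + int64(id)))
			inside := 0
			for i := 0; i < perWorker; i++ {
				x := rng.Float64()*2 - 1
				y := rng.Float64()*2 - 1
				if x*x+y*y <= 1 {
					inside++
				}
			}
			hits[id] = inside
		}(w)
	}
	wg.Wait()

	total := 0
	for _, h := range hits {
		total += h
	}
	return 4 * float64(total) / float64(perWorker*workers)
}

func main() {
	est := estimatePi(1_000_000, 8, 42)
	fmt.Printf("estimate: %.5f (error %.5f)\n", est, math.Abs(est-math.Pi))
}
```

The error shrinks roughly with the square root of the number of darts, so each extra decimal digit of accuracy costs about 100 times more samples.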

WebSockets in Go: Building Real-Time Bidirectional Communication

What are WebSockets?
WebSockets provide full-duplex, bidirectional communication channels over a single TCP connection. Unlike HTTP’s request-response model, WebSockets enable both client and server to send messages independently at any time, making them ideal for real-time applications. ...