What is Distributed Tracing?
Distributed tracing tracks requests as they flow through microservices, providing visibility into performance bottlenecks, service dependencies, and error propagation in distributed systems.
Key Concepts:
- Trace: End-to-end journey of a request across services
- Span: Single unit of work within a trace
- Context Propagation: Carrying trace information across boundaries
- Sampling: Controlling which traces to collect
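To make the concepts above concrete, here is a minimal standard-library sketch of what context propagation carries between services: a W3C Trace Context `traceparent` header holding the trace ID, the calling span's ID, and the sampling flag. The `SpanContext` type and the `FormatTraceparent` and `ParseTraceparent` helpers are hypothetical names for illustration, not the OpenTelemetry API, and the parser skips some checks a real propagator performs (for example, rejecting all-zero IDs).

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// SpanContext holds the identifiers that travel between services.
// The type and field names are illustrative, not the OpenTelemetry API.
type SpanContext struct {
	TraceID string // 32 hex characters, shared by every span in the trace
	SpanID  string // 16 hex characters, identifies the calling span
	Sampled bool   // whether the caller decided to record this trace
}

// FormatTraceparent renders a version-00 "traceparent" header value.
func FormatTraceparent(sc SpanContext) string {
	flags := "00"
	if sc.Sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", sc.TraceID, sc.SpanID, flags)
}

// ParseTraceparent extracts the span context from a version-00 header value.
func ParseTraceparent(h string) (SpanContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return SpanContext{}, errors.New("unsupported traceparent format")
	}
	if len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return SpanContext{}, errors.New("invalid traceparent field length")
	}
	for _, p := range parts[1:] {
		if _, err := hex.DecodeString(p); err != nil {
			return SpanContext{}, errors.New("traceparent fields must be hex")
		}
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return SpanContext{
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&0x01 == 0x01,
	}, nil
}

func main() {
	parent := SpanContext{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:  "00f067aa0ba902b7",
		Sampled: true,
	}
	header := FormatTraceparent(parent)
	fmt.Println(header)

	got, err := ParseTraceparent(header)
	fmt.Println(got.TraceID == parent.TraceID, got.Sampled, err)
}
```

A downstream service that parses this header starts its own span with the same trace ID and uses the received span ID as the parent, which is how spans from different processes end up in one trace.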
Why OpenTelemetry?
OpenTelemetry (OTel) is the industry standard for observability:
- Vendor-neutral: Works with Jaeger, Zipkin, DataDog, etc.
- Comprehensive: Traces, metrics, and logs
- Language support: SDKs for all major languages
- Context propagation: Propagators and instrumentation libraries carry trace context across HTTP and gRPC calls
Real-World Use Cases
- Performance debugging: Find slow services in request chains
- Error tracking: Trace errors across service boundaries
- Dependency mapping: Visualize service relationships
- SLA monitoring: Track request latency distributions
- Capacity planning: Identify bottlenecks and hotspots
Basic OpenTelemetry Setup
Installation
go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/sdk
# Note: upstream has deprecated this Jaeger exporter in favor of OTLP exporters
go get go.opentelemetry.io/otel/exporters/jaeger
go get go.opentelemetry.io/otel/exporters/stdout/stdouttrace
Simple Tracer Setup
package main
import (
"context"
"fmt"
"log"
"time"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
"go.opentelemetry.io/otel/trace"
)
func initTracer() (*sdktrace.TracerProvider, error) {
// Create stdout exporter for development
exporter, err := stdouttrace.New(
stdouttrace.WithPrettyPrint(),
)
if err != nil {
return nil, err
}
// Create resource with service information
res, err := resource.Merge(
resource.Default(),
resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceName("my-service"),
semconv.ServiceVersion("1.0.0"),
),
)
if err != nil {
return nil, err
}
// Create tracer provider
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exporter),
sdktrace.WithResource(res),
)
// Set global tracer provider
otel.SetTracerProvider(tp)
return tp, nil
}
func main() {
tp, err := initTracer()
if err != nil {
log.Fatal(err)
}
defer func() {
if err := tp.Shutdown(context.Background()); err != nil {
log.Printf("Error shutting down tracer provider: %v", err)
}
}()
// Get tracer
tracer := otel.Tracer("example")
// Create a span
ctx := context.Background()
ctx, span := tracer.Start(ctx, "main-operation")
defer span.End()
// Add attributes
span.SetAttributes(
attribute.String("user.id", "user-123"),
attribute.Int("items.count", 5),
)
// Do work
doWork(ctx, tracer)
fmt.Println("Trace completed")
}
func doWork(ctx context.Context, tracer trace.Tracer) {
ctx, span := tracer.Start(ctx, "do-work")
defer span.End()
// Simulate work
time.Sleep(100 * time.Millisecond)
span.AddEvent("Work completed", trace.WithAttributes(
attribute.String("status", "success"),
))
}
Jaeger Integration
Jaeger Exporter Setup
package main
import (
"context"
"log"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/exporters/jaeger"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
)
func initJaegerTracer(serviceName string) (*sdktrace.TracerProvider, error) {
// Create Jaeger exporter
exporter, err := jaeger.New(
jaeger.WithCollectorEndpoint(
jaeger.WithEndpoint("http://localhost:14268/api/traces"),
),
)
if err != nil {
return nil, err
}
// Create resource
res, err := resource.Merge(
resource.Default(),
resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceName(serviceName),
semconv.ServiceVersion("1.0.0"),
semconv.DeploymentEnvironment("production"),
),
)
if err != nil {
return nil, err
}
// Create tracer provider with sampling
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exporter),
sdktrace.WithResource(res),
sdktrace.WithSampler(sdktrace.AlwaysSample()),
)
otel.SetTracerProvider(tp)
// Register W3C Trace Context and Baggage propagators. The default global
// propagator is a no-op, so without this otelhttp would not carry trace
// context between services.
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
propagation.TraceContext{},
propagation.Baggage{},
))
return tp, nil
}
func main() {
tp, err := initJaegerTracer("my-go-service")
if err != nil {
log.Fatal(err)
}
defer tp.Shutdown(context.Background())
// Use tracer
tracer := otel.Tracer("my-component")
ctx := context.Background()
ctx, span := tracer.Start(ctx, "example-operation")
defer span.End()
// Your application logic here
}
Running Jaeger with Docker
docker run -d --name jaeger \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-p 5775:5775/udp \
-p 6831:6831/udp \
-p 6832:6832/udp \
-p 5778:5778 \
-p 16686:16686 \
-p 14268:14268 \
-p 14250:14250 \
-p 9411:9411 \
jaegertracing/all-in-one:latest
Access Jaeger UI at: http://localhost:16686
HTTP Server Instrumentation
Automatic HTTP Middleware
package main
import (
"context"
"fmt"
"log"
"net/http"
"time"
"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/trace"
)
func main() {
// Initialize tracer (using previous initJaegerTracer)
tp, err := initJaegerTracer("http-server")
if err != nil {
log.Fatal(err)
}
defer tp.Shutdown(context.Background())
// Create handler
handler := http.HandlerFunc(handleRequest)
// Wrap with OpenTelemetry middleware
wrappedHandler := otelhttp.NewHandler(handler, "http-server")
// Start server
log.Println("Server starting on :8080")
log.Fatal(http.ListenAndServe(":8080", wrappedHandler))
}
func handleRequest(w http.ResponseWriter, r *http.Request) {
// Get span from context
span := trace.SpanFromContext(r.Context())
// Add custom attributes
span.SetAttributes(
attribute.String("user.agent", r.UserAgent()),
attribute.String("http.client_ip", r.RemoteAddr),
)
// Do work
processRequest(r.Context())
fmt.Fprintf(w, "Request processed successfully")
}
func processRequest(ctx context.Context) {
tracer := otel.Tracer("processor")
ctx, span := tracer.Start(ctx, "process-request")
defer span.End()
// Simulate database call
queryDatabase(ctx)
// Simulate external API call
callExternalAPI(ctx)
}
func queryDatabase(ctx context.Context) {
tracer := otel.Tracer("database")
ctx, span := tracer.Start(ctx, "query-database")
defer span.End()
span.SetAttributes(
attribute.String("db.system", "postgresql"),
attribute.String("db.statement", "SELECT * FROM users WHERE id = $1"),
)
// Simulate query
time.Sleep(50 * time.Millisecond)
}
func callExternalAPI(ctx context.Context) {
tracer := otel.Tracer("external")
ctx, span := tracer.Start(ctx, "call-external-api")
defer span.End()
span.SetAttributes(
attribute.String("http.url", "https://api.example.com/data"),
attribute.String("http.method", "GET"),
)
// Simulate API call
time.Sleep(100 * time.Millisecond)
}
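The handler above records `r.RemoteAddr` as the client IP, but in `net/http` that value has the form `IP:port`. A minimal sketch of stripping the port before setting the attribute follows; `clientIP` is a hypothetical helper name, and behind a proxy or load balancer the value is the proxy's address rather than the end user's.

```go
package main

import (
	"fmt"
	"net"
)

// clientIP returns the host part of an address in "host:port" form, as found
// in http.Request.RemoteAddr. If the value cannot be split (for example, it
// has no port), it is returned unchanged. clientIP is a hypothetical helper.
func clientIP(remoteAddr string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

func main() {
	fmt.Println(clientIP("203.0.113.7:54321"))  // IPv4 with port
	fmt.Println(clientIP("[2001:db8::1]:8080")) // bracketed IPv6 with port
}
```

In the handler, the attribute would then be set with `attribute.String("http.client_ip", clientIP(r.RemoteAddr))`.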
HTTP Client Instrumentation
Traced HTTP Client
package main
import (
"context"
"fmt"
"io"
"log"
"net/http"
"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
)
type TracedHTTPClient struct {
client *http.Client
}
func NewTracedHTTPClient() *TracedHTTPClient {
return &TracedHTTPClient{
client: &http.Client{
Transport: otelhttp.NewTransport(http.DefaultTransport),
},
}
}
func (c *TracedHTTPClient) Get(ctx context.Context, url string) (string, error) {
tracer := otel.Tracer("http-client")
ctx, span := tracer.Start(ctx, "http.get")
defer span.End()
span.SetAttributes(
attribute.String("http.url", url),
attribute.String("http.method", "GET"),
)
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
span.RecordError(err)
return "", err
}
resp, err := c.client.Do(req)
if err != nil {
span.RecordError(err)
return "", err
}
defer resp.Body.Close()
span.SetAttributes(
attribute.Int("http.status_code", resp.StatusCode),
)
body, err := io.ReadAll(resp.Body)
if err != nil {
span.RecordError(err)
return "", err
}
return string(body), nil
}
func main() {
tp, err := initJaegerTracer("http-client")
if err != nil {
log.Fatal(err)
}
defer tp.Shutdown(context.Background())
client := NewTracedHTTPClient()
ctx := context.Background()
response, err := client.Get(ctx, "https://api.github.com/users/golang")
if err != nil {
log.Fatal(err)
}
fmt.Printf("Response length: %d bytes\n", len(response))
}
Database Instrumentation
SQL Database Tracing
package main
import (
"context"
"database/sql"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
// PostgreSQL driver, registered with database/sql through its side effects.
// lib/pq does not emit spans itself, so spans are created manually below.
_ "github.com/lib/pq"
)
type TracedDB struct {
db *sql.DB
}
func NewTracedDB(connString string) (*TracedDB, error) {
db, err := sql.Open("postgres", connString)
if err != nil {
return nil, err
}
return &TracedDB{db: db}, nil
}
func (tdb *TracedDB) QueryUser(ctx context.Context, userID int) (*User, error) {
tracer := otel.Tracer("database")
ctx, span := tracer.Start(ctx, "query-user")
defer span.End()
query := "SELECT id, name, email FROM users WHERE id = $1"
// Add database-specific attributes
span.SetAttributes(
semconv.DBSystemPostgreSQL,
semconv.DBStatement(query),
attribute.Int("db.user_id", userID),
)
row := tdb.db.QueryRowContext(ctx, query, userID)
var user User
err := row.Scan(&user.ID, &user.Name, &user.Email)
if err != nil {
span.RecordError(err)
return nil, err
}
span.SetAttributes(
attribute.String("db.result.name", user.Name),
)
return &user, nil
}
func (tdb *TracedDB) InsertUser(ctx context.Context, user *User) error {
tracer := otel.Tracer("database")
ctx, span := tracer.Start(ctx, "insert-user")
defer span.End()
query := "INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id"
span.SetAttributes(
semconv.DBSystemPostgreSQL,
semconv.DBStatement(query),
attribute.String("db.user.name", user.Name),
)
err := tdb.db.QueryRowContext(ctx, query, user.Name, user.Email).Scan(&user.ID)
if err != nil {
span.RecordError(err)
return err
}
span.SetAttributes(
attribute.Int("db.user.id", user.ID),
)
return nil
}
type User struct {
ID int
Name string
Email string
}
Microservices Tracing
Service A (API Gateway)
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"net/http"
"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
)
func main() {
tp, err := initJaegerTracer("api-gateway")
if err != nil {
log.Fatal(err)
}
defer tp.Shutdown(context.Background())
mux := http.NewServeMux()
mux.HandleFunc("/api/order", handleOrder)
handler := otelhttp.NewHandler(mux, "api-gateway")
log.Println("API Gateway starting on :8080")
log.Fatal(http.ListenAndServe(":8080", handler))
}
func handleOrder(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
tracer := otel.Tracer("api-gateway")
ctx, span := tracer.Start(ctx, "handle-order")
defer span.End()
orderID := r.URL.Query().Get("id")
span.SetAttributes(attribute.String("order.id", orderID))
// Call user service
user, err := getUserInfo(ctx, "user-123")
if err != nil {
span.RecordError(err)
http.Error(w, "Failed to get user info", http.StatusInternalServerError)
return
}
// Call order service
order, err := getOrderDetails(ctx, orderID)
if err != nil {
span.RecordError(err)
http.Error(w, "Failed to get order", http.StatusInternalServerError)
return
}
response := map[string]interface{}{
"user": user,
"order": order,
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(response)
}
func getUserInfo(ctx context.Context, userID string) (map[string]string, error) {
tracer := otel.Tracer("api-gateway")
ctx, span := tracer.Start(ctx, "get-user-info")
defer span.End()
span.SetAttributes(attribute.String("user.id", userID))
// Call user service at :8081
client := &http.Client{
Transport: otelhttp.NewTransport(http.DefaultTransport),
}
url := fmt.Sprintf("http://localhost:8081/user?id=%s", userID)
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, err
}
resp, err := client.Do(req)
if err != nil {
span.RecordError(err)
return nil, err
}
defer resp.Body.Close()
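var user map[string]string
// Surface malformed responses instead of silently returning an empty map
if err := json.NewDecoder(resp.Body).Decode(&user); err != nil {
span.RecordError(err)
return nil, err
}
return user, nil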
}
func getOrderDetails(ctx context.Context, orderID string) (map[string]interface{}, error) {
tracer := otel.Tracer("api-gateway")
ctx, span := tracer.Start(ctx, "get-order-details")
defer span.End()
span.SetAttributes(attribute.String("order.id", orderID))
// Call order service at :8082
client := &http.Client{
Transport: otelhttp.NewTransport(http.DefaultTransport),
}
url := fmt.Sprintf("http://localhost:8082/order?id=%s", orderID)
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, err
}
resp, err := client.Do(req)
if err != nil {
span.RecordError(err)
return nil, err
}
defer resp.Body.Close()
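var order map[string]interface{}
// Surface malformed responses instead of silently returning an empty map
if err := json.NewDecoder(resp.Body).Decode(&order); err != nil {
span.RecordError(err)
return nil, err
}
return order, nil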
}
Custom Instrumentation
Manual Span Management
package main
import (
"context"
"time"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
"go.opentelemetry.io/otel/trace"
)
func processOrder(ctx context.Context, orderID string) error {
tracer := otel.Tracer("order-processor")
ctx, span := tracer.Start(ctx,
"process-order",
trace.WithSpanKind(trace.SpanKindInternal),
trace.WithAttributes(
attribute.String("order.id", orderID),
),
)
defer span.End()
// Add event
span.AddEvent("Validation started")
// Validate order
if err := validateOrder(ctx, orderID); err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, "Validation failed")
return err
}
span.AddEvent("Validation completed")
// Process payment
if err := processPayment(ctx, orderID); err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, "Payment failed")
return err
}
// Ship order
if err := shipOrder(ctx, orderID); err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, "Shipping failed")
return err
}
// The description is only kept for error statuses, so it is left empty here
span.SetStatus(codes.Ok, "")
return nil
}
func validateOrder(ctx context.Context, orderID string) error {
tracer := otel.Tracer("order-processor")
ctx, span := tracer.Start(ctx, "validate-order")
defer span.End()
// Simulate validation
time.Sleep(50 * time.Millisecond)
span.SetAttributes(
attribute.Bool("validation.passed", true),
)
return nil
}
func processPayment(ctx context.Context, orderID string) error {
tracer := otel.Tracer("payment-processor")
ctx, span := tracer.Start(ctx, "process-payment")
defer span.End()
span.SetAttributes(
attribute.String("payment.gateway", "stripe"),
attribute.Float64("payment.amount", 99.99),
)
// Simulate payment
time.Sleep(100 * time.Millisecond)
return nil
}
func shipOrder(ctx context.Context, orderID string) error {
tracer := otel.Tracer("shipping")
ctx, span := tracer.Start(ctx, "ship-order")
defer span.End()
span.SetAttributes(
attribute.String("shipping.carrier", "fedex"),
attribute.String("shipping.tracking", "1Z999AA10123456784"),
)
// Simulate shipping
time.Sleep(75 * time.Millisecond)
return nil
}
Error Tracking
Comprehensive Error Recording
package main
import (
"context"
"errors"
"fmt"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
"go.opentelemetry.io/otel/trace"
)
var (
ErrNotFound = errors.New("resource not found")
ErrUnauthorized = errors.New("unauthorized")
ErrInvalidInput = errors.New("invalid input")
)
func fetchResource(ctx context.Context, resourceID string) error {
tracer := otel.Tracer("resource-fetcher")
ctx, span := tracer.Start(ctx, "fetch-resource")
defer span.End()
span.SetAttributes(
attribute.String("resource.id", resourceID),
)
// Simulate error
err := ErrNotFound
if err != nil {
// Record error with attributes
span.RecordError(err,
trace.WithAttributes(
attribute.String("error.type", "not_found"),
attribute.String("resource.id", resourceID),
),
)
// Set span status
span.SetStatus(codes.Error, err.Error())
return fmt.Errorf("failed to fetch resource: %w", err)
}
span.SetStatus(codes.Ok, "")
return nil
}
func handleError(ctx context.Context, err error) {
tracer := otel.Tracer("error-handler")
ctx, span := tracer.Start(ctx, "handle-error")
defer span.End()
// Classify error
var errorType string
switch {
case errors.Is(err, ErrNotFound):
errorType = "not_found"
case errors.Is(err, ErrUnauthorized):
errorType = "unauthorized"
case errors.Is(err, ErrInvalidInput):
errorType = "invalid_input"
default:
errorType = "unknown"
}
span.SetAttributes(
attribute.String("error.type", errorType),
attribute.String("error.message", err.Error()),
)
span.RecordError(err)
}
Sampling Strategies
Custom Sampling
package main
import (
"go.opentelemetry.io/otel/sdk/trace"
)
func setupSampling() *trace.TracerProvider {
// Always sample for development
// sampler := trace.AlwaysSample()
// Never sample (effectively disables trace collection)
// sampler := trace.NeverSample()
// Sample 10% of traces, decided from the trace ID alone
sampler := trace.TraceIDRatioBased(0.1)
// Parent-based sampling: follow the parent span's decision when there is one
// and apply the 10% ratio only to root spans, which keeps traces complete
// sampler := trace.ParentBased(trace.TraceIDRatioBased(0.1))
tp := trace.NewTracerProvider(
trace.WithSampler(sampler),
)
return tp
}
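To illustrate why ratio-based sampling can be decided from the trace ID alone, here is a minimal standard-library sketch. It is not the SDK's code: `shouldSample` is a hypothetical function that assumes the trace ID is random and treats part of it as a number compared against a threshold. Because the decision is a pure function of the trace ID, every service configured with the same ratio reaches the same verdict for a given trace.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// shouldSample reports whether a trace with the given 32-character hex trace
// ID falls inside the sampled fraction. It is a hypothetical sketch of
// deterministic ratio sampling, not the OpenTelemetry SDK implementation.
func shouldSample(traceIDHex string, fraction float64) (bool, error) {
	id, err := hex.DecodeString(traceIDHex)
	if err != nil || len(id) != 16 {
		return false, fmt.Errorf("invalid trace ID %q", traceIDHex)
	}
	if fraction >= 1 {
		return true, nil
	}
	if fraction <= 0 {
		return false, nil
	}
	// Interpret the low 8 bytes as a number in [0, 2^63) and sample the
	// trace when that number falls below fraction * 2^63.
	x := binary.BigEndian.Uint64(id[8:16]) >> 1
	bound := uint64(fraction * (1 << 63))
	return x < bound, nil
}

func main() {
	const traceID = "4bf92f3577b34da6a3ce929d0e0e4736"
	for _, f := range []float64{0.1, 0.5, 0.7, 1.0} {
		ok, err := shouldSample(traceID, f)
		fmt.Println(f, ok, err)
	}
}
```

For this example trace ID the low 8 bytes sit at roughly 64% of the range, so the trace is dropped at ratios of 0.1 and 0.5 and kept at 0.7 and 1.0.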
Best Practices
1. Consistent Naming
// Good: Consistent span naming
tracer.Start(ctx, "service.operation")
tracer.Start(ctx, "database.query")
tracer.Start(ctx, "http.request")
// Bad: Inconsistent naming
tracer.Start(ctx, "DoSomething")
tracer.Start(ctx, "query_db")
tracer.Start(ctx, "HTTP-Request")
2. Add Meaningful Attributes
span.SetAttributes(
attribute.String("user.id", userID),
attribute.Int("order.item_count", itemCount),
attribute.Float64("order.total", total),
attribute.Bool("order.expedited", true),
)
3. Use Span Events for Milestones
span.AddEvent("Validation started")
span.AddEvent("Payment processed", trace.WithAttributes(
attribute.String("payment.id", paymentID),
))
span.AddEvent("Order shipped")
4. Always Defer span.End()
ctx, span := tracer.Start(ctx, "operation")
defer span.End() // Ensures span is closed even on panic
Performance Considerations
- Sampling: Use appropriate sampling rates for production
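- Batch Export: Tune the batch span processor's queue size, batch size, and export interval
- Resource Limits: Cap the number of attributes, events, and links per span with the SDK's span limits
- Overhead: Usually small with batching and sampling enabled, but it grows with span volume, so measure it under your own load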
- Context Size: Keep span attributes reasonably sized
Conclusion
Distributed tracing with OpenTelemetry provides crucial visibility into microservices architecture, enabling faster debugging and better understanding of system behavior.
Key Takeaways:
- OpenTelemetry is the industry standard for distributed tracing
- Instrumentation libraries cover common layers such as HTTP and gRPC; other code, like the database calls above, can be instrumented manually
- Context propagation carries trace information across services
- Jaeger provides excellent visualization of traces
- Proper sampling reduces overhead while maintaining visibility
This completes the Go Concurrency Patterns series! Review the Series Overview to explore all patterns.