What is Distributed Tracing?

Distributed tracing tracks requests as they flow through microservices, providing visibility into performance bottlenecks, service dependencies, and error propagation in distributed systems.

Key Concepts:

  • Trace: End-to-end journey of a request across services
  • Span: Single unit of work within a trace
  • Context Propagation: Carrying trace information across boundaries
  • Sampling: Controlling which traces to collect
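
To make context propagation concrete: over HTTP, trace information usually travels in the W3C `traceparent` header, whose version-00 value has the form `version-traceid-spanid-flags`. The sketch below parses and formats that value using only the standard library. The `TraceContext` type and both function names are illustrative assumptions, not OpenTelemetry APIs; in real code the SDK's propagators do this work.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TraceContext holds the fields carried by a W3C traceparent header.
// Hypothetical type for illustration; not part of OpenTelemetry.
type TraceContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool   // lowest bit of the flags byte
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return false
		}
	}
	return true
}

// FormatTraceparent renders tc as a version-00 traceparent value.
// Only the sampled flag is preserved in this sketch.
func FormatTraceparent(tc TraceContext) string {
	flags := "00"
	if tc.Sampled {
		flags = "01"
	}
	return "00-" + tc.TraceID + "-" + tc.SpanID + "-" + flags
}

// ParseTraceparent extracts the trace context from a version-00 value.
func ParseTraceparent(h string) (TraceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceContext{}, fmt.Errorf("unsupported traceparent %q", h)
	}
	if !isLowerHex(parts[1], 32) || !isLowerHex(parts[2], 16) || !isLowerHex(parts[3], 2) {
		return TraceContext{}, fmt.Errorf("malformed traceparent %q", h)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return TraceContext{}, err
	}
	return TraceContext{TraceID: parts[1], SpanID: parts[2], Sampled: flags&1 == 1}, nil
}

func main() {
	tc, err := ParseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(tc.TraceID, tc.SpanID, tc.Sampled)
	fmt.Println(FormatTraceparent(tc))
}
```

Every service that receives this header can start its spans under the same trace ID, which is what lets a backend stitch spans from different services into one trace.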

Why OpenTelemetry?

OpenTelemetry (OTel) is the industry standard for observability:

  • Vendor-neutral: Works with Jaeger, Zipkin, Datadog, etc.
  • Comprehensive: Traces, metrics, and logs
  • Language support: SDKs for all major languages
  • Context propagation: Standard propagators, plus instrumentation libraries for HTTP, gRPC, and databases

Real-World Use Cases

  • Performance debugging: Find slow services in request chains
  • Error tracking: Trace errors across service boundaries
  • Dependency mapping: Visualize service relationships
  • SLA monitoring: Track request latency distributions
  • Capacity planning: Identify bottlenecks and hotspots

Basic OpenTelemetry Setup

Installation

go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/sdk
go get go.opentelemetry.io/otel/exporters/jaeger
go get go.opentelemetry.io/otel/exporters/stdout/stdouttrace
go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
go get github.com/lib/pq

Simple Tracer Setup

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
	"go.opentelemetry.io/otel/trace"
)

func initTracer() (*sdktrace.TracerProvider, error) {
	// Create stdout exporter for development
	exporter, err := stdouttrace.New(
		stdouttrace.WithPrettyPrint(),
	)
	if err != nil {
		return nil, err
	}

	// Create resource with service information
	res, err := resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName("my-service"),
			semconv.ServiceVersion("1.0.0"),
		),
	)
	if err != nil {
		return nil, err
	}

	// Create tracer provider
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(res),
	)

	// Set global tracer provider
	otel.SetTracerProvider(tp)

	return tp, nil
}

func main() {
	tp, err := initTracer()
	if err != nil {
		log.Fatal(err)
	}
	defer func() {
		if err := tp.Shutdown(context.Background()); err != nil {
			log.Printf("Error shutting down tracer provider: %v", err)
		}
	}()

	// Get tracer
	tracer := otel.Tracer("example")

	// Create a span
	ctx := context.Background()
	ctx, span := tracer.Start(ctx, "main-operation")
	defer span.End()

	// Add attributes
	span.SetAttributes(
		attribute.String("user.id", "user-123"),
		attribute.Int("items.count", 5),
	)

	// Do work
	doWork(ctx, tracer)

	fmt.Println("Trace completed")
}

func doWork(ctx context.Context, tracer trace.Tracer) {
	ctx, span := tracer.Start(ctx, "do-work")
	defer span.End()

	// Simulate work
	time.Sleep(100 * time.Millisecond)

	span.AddEvent("Work completed", trace.WithAttributes(
		attribute.String("status", "success"),
	))
}

Jaeger Integration

Jaeger Exporter Setup

package main

import (
	"context"
	"log"

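	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
)

func initJaegerTracer(serviceName string) (*sdktrace.TracerProvider, error) {
	// Create Jaeger exporter.
	// Note: this exporter package is deprecated upstream. Recent Jaeger
	// versions accept OTLP directly, so prefer an OTLP exporter for new code.
	exporter, err := jaeger.New(
		jaeger.WithCollectorEndpoint(
			jaeger.WithEndpoint("http://localhost:14268/api/traces"),
		),
	)
	if err != nil {
		return nil, err
	}

	// Create resource
	res, err := resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(serviceName),
			semconv.ServiceVersion("1.0.0"),
			semconv.DeploymentEnvironment("production"),
		),
	)
	if err != nil {
		return nil, err
	}

	// Create tracer provider with sampling
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(res),
		sdktrace.WithSampler(sdktrace.AlwaysSample()),
	)

	otel.SetTracerProvider(tp)

	// Register the W3C Trace Context and Baggage propagators. Without this,
	// the global propagator is a no-op and otelhttp will not carry trace
	// context between services.
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))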

	return tp, nil
}

func main() {
	tp, err := initJaegerTracer("my-go-service")
	if err != nil {
		log.Fatal(err)
	}
	defer tp.Shutdown(context.Background())

	// Use tracer
	tracer := otel.Tracer("my-component")

	ctx := context.Background()
	ctx, span := tracer.Start(ctx, "example-operation")
	defer span.End()

	// Your application logic here
}

Running Jaeger with Docker

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 14250:14250 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest

Access Jaeger UI at: http://localhost:16686

HTTP Server Instrumentation

Automatic HTTP Middleware

package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"time"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

func main() {
	// Initialize tracer (using previous initJaegerTracer)
	tp, err := initJaegerTracer("http-server")
	if err != nil {
		log.Fatal(err)
	}
	defer tp.Shutdown(context.Background())

	// Create handler
	handler := http.HandlerFunc(handleRequest)

	// Wrap with OpenTelemetry middleware
	wrappedHandler := otelhttp.NewHandler(handler, "http-server")

	// Start server
	log.Println("Server starting on :8080")
	log.Fatal(http.ListenAndServe(":8080", wrappedHandler))
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
	// Get span from context
	span := trace.SpanFromContext(r.Context())

	// Add custom attributes
	span.SetAttributes(
		attribute.String("user.agent", r.UserAgent()),
		attribute.String("http.client_ip", r.RemoteAddr),
	)

	// Do work
	processRequest(r.Context())

	fmt.Fprintf(w, "Request processed successfully")
}

func processRequest(ctx context.Context) {
	tracer := otel.Tracer("processor")

	ctx, span := tracer.Start(ctx, "process-request")
	defer span.End()

	// Simulate database call
	queryDatabase(ctx)

	// Simulate external API call
	callExternalAPI(ctx)
}

func queryDatabase(ctx context.Context) {
	tracer := otel.Tracer("database")

	ctx, span := tracer.Start(ctx, "query-database")
	defer span.End()

	span.SetAttributes(
		attribute.String("db.system", "postgresql"),
		attribute.String("db.statement", "SELECT * FROM users WHERE id = $1"),
	)

	// Simulate query
	time.Sleep(50 * time.Millisecond)
}

func callExternalAPI(ctx context.Context) {
	tracer := otel.Tracer("external")

	ctx, span := tracer.Start(ctx, "call-external-api")
	defer span.End()

	span.SetAttributes(
		attribute.String("http.url", "https://api.example.com/data"),
		attribute.String("http.method", "GET"),
	)

	// Simulate API call
	time.Sleep(100 * time.Millisecond)
}

HTTP Client Instrumentation

Traced HTTP Client

package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

type TracedHTTPClient struct {
	client *http.Client
}

func NewTracedHTTPClient() *TracedHTTPClient {
	return &TracedHTTPClient{
		client: &http.Client{
			Transport: otelhttp.NewTransport(http.DefaultTransport),
		},
	}
}

func (c *TracedHTTPClient) Get(ctx context.Context, url string) (string, error) {
	tracer := otel.Tracer("http-client")

	ctx, span := tracer.Start(ctx, "http.get")
	defer span.End()

	span.SetAttributes(
		attribute.String("http.url", url),
		attribute.String("http.method", "GET"),
	)

	req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
	if err != nil {
		span.RecordError(err)
		return "", err
	}

	resp, err := c.client.Do(req)
	if err != nil {
		span.RecordError(err)
		return "", err
	}
	defer resp.Body.Close()

	span.SetAttributes(
		attribute.Int("http.status_code", resp.StatusCode),
	)

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		span.RecordError(err)
		return "", err
	}

	return string(body), nil
}

func main() {
	tp, err := initJaegerTracer("http-client")
	if err != nil {
		log.Fatal(err)
	}
	defer tp.Shutdown(context.Background())

	client := NewTracedHTTPClient()

	ctx := context.Background()
	response, err := client.Get(ctx, "https://api.github.com/users/golang")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Response length: %d bytes\n", len(response))
}

Database Instrumentation

SQL Database Tracing

package main

import (
	"context"
	"database/sql"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"

	// PostgreSQL driver, registered with database/sql via blank import.
	// It does not emit spans itself; the methods below create them manually.
	_ "github.com/lib/pq"
)

type TracedDB struct {
	db *sql.DB
}

func NewTracedDB(connString string) (*TracedDB, error) {
	db, err := sql.Open("postgres", connString)
	if err != nil {
		return nil, err
	}

	return &TracedDB{db: db}, nil
}

func (tdb *TracedDB) QueryUser(ctx context.Context, userID int) (*User, error) {
	tracer := otel.Tracer("database")

	ctx, span := tracer.Start(ctx, "query-user")
	defer span.End()

	// Add database-specific attributes
	span.SetAttributes(
		semconv.DBSystemPostgreSQL,
		semconv.DBStatement("SELECT id, name, email FROM users WHERE id = $1"),
		attribute.Int("db.user_id", userID),
	)

	query := "SELECT id, name, email FROM users WHERE id = $1"
	row := tdb.db.QueryRowContext(ctx, query, userID)

	var user User
	err := row.Scan(&user.ID, &user.Name, &user.Email)
	if err != nil {
		span.RecordError(err)
		return nil, err
	}

	span.SetAttributes(
		attribute.String("db.result.name", user.Name),
	)

	return &user, nil
}

func (tdb *TracedDB) InsertUser(ctx context.Context, user *User) error {
	tracer := otel.Tracer("database")

	ctx, span := tracer.Start(ctx, "insert-user")
	defer span.End()

	span.SetAttributes(
		semconv.DBSystemPostgreSQL,
		semconv.DBStatement("INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id"),
		attribute.String("db.user.name", user.Name),
	)

	query := "INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id"
	err := tdb.db.QueryRowContext(ctx, query, user.Name, user.Email).Scan(&user.ID)
	if err != nil {
		span.RecordError(err)
		return err
	}

	span.SetAttributes(
		attribute.Int("db.user.id", user.ID),
	)

	return nil
}

type User struct {
	ID    int
	Name  string
	Email string
}

Microservices Tracing

Service A (API Gateway)

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func main() {
	tp, err := initJaegerTracer("api-gateway")
	if err != nil {
		log.Fatal(err)
	}
	defer tp.Shutdown(context.Background())

	mux := http.NewServeMux()
	mux.HandleFunc("/api/order", handleOrder)

	handler := otelhttp.NewHandler(mux, "api-gateway")

	log.Println("API Gateway starting on :8080")
	log.Fatal(http.ListenAndServe(":8080", handler))
}

func handleOrder(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	tracer := otel.Tracer("api-gateway")

	ctx, span := tracer.Start(ctx, "handle-order")
	defer span.End()

	orderID := r.URL.Query().Get("id")
	span.SetAttributes(attribute.String("order.id", orderID))

	// Call user service
	user, err := getUserInfo(ctx, "user-123")
	if err != nil {
		span.RecordError(err)
		http.Error(w, "Failed to get user info", http.StatusInternalServerError)
		return
	}

	// Call order service
	order, err := getOrderDetails(ctx, orderID)
	if err != nil {
		span.RecordError(err)
		http.Error(w, "Failed to get order", http.StatusInternalServerError)
		return
	}

	response := map[string]interface{}{
		"user":  user,
		"order": order,
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

func getUserInfo(ctx context.Context, userID string) (map[string]string, error) {
	tracer := otel.Tracer("api-gateway")

	ctx, span := tracer.Start(ctx, "get-user-info")
	defer span.End()

	span.SetAttributes(attribute.String("user.id", userID))

	// Call user service at :8081
	client := &http.Client{
		Transport: otelhttp.NewTransport(http.DefaultTransport),
	}

	url := fmt.Sprintf("http://localhost:8081/user?id=%s", userID)
	req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
	if err != nil {
		return nil, err
	}

	resp, err := client.Do(req)
	if err != nil {
		span.RecordError(err)
		return nil, err
	}
	defer resp.Body.Close()

	var user map[string]string
	if err := json.NewDecoder(resp.Body).Decode(&user); err != nil {
		span.RecordError(err)
		return nil, err
	}

	return user, nil
}

func getOrderDetails(ctx context.Context, orderID string) (map[string]interface{}, error) {
	tracer := otel.Tracer("api-gateway")

	ctx, span := tracer.Start(ctx, "get-order-details")
	defer span.End()

	span.SetAttributes(attribute.String("order.id", orderID))

	// Call order service at :8082
	client := &http.Client{
		Transport: otelhttp.NewTransport(http.DefaultTransport),
	}

	url := fmt.Sprintf("http://localhost:8082/order?id=%s", orderID)
	req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
	if err != nil {
		return nil, err
	}

	resp, err := client.Do(req)
	if err != nil {
		span.RecordError(err)
		return nil, err
	}
	defer resp.Body.Close()

	var order map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&order); err != nil {
		span.RecordError(err)
		return nil, err
	}

	return order, nil
}
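
Service A relies on `otelhttp` to hand the trace context to the user and order services. Mechanically, that handoff is a request header written on the client side and read on the server side. The following is a stdlib-only sketch of that handoff; `inject`, `extract`, `downstreamHandler`, and `roundTrip` are hypothetical names standing in for what the instrumentation and a configured propagator would do, and the request is served in-process rather than over the network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// traceparentHeader is the W3C Trace Context header name.
const traceparentHeader = "traceparent"

// inject writes the trace context onto an outgoing request (client side).
func inject(req *http.Request, traceparent string) {
	req.Header.Set(traceparentHeader, traceparent)
}

// extract reads the trace context from an incoming request (server side).
func extract(r *http.Request) string {
	return r.Header.Get(traceparentHeader)
}

// downstreamHandler stands in for a downstream service. It echoes the trace
// context it received so the caller can confirm the handoff worked.
func downstreamHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, extract(r))
}

// roundTrip builds a request, injects the header, serves it in-process,
// and returns what the downstream handler saw.
func roundTrip(traceparent string) string {
	req := httptest.NewRequest(http.MethodGet, "/user?id=user-123", nil)
	inject(req, traceparent)
	rec := httptest.NewRecorder()
	downstreamHandler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(roundTrip("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
}
```

If the downstream service sees an empty header, its spans start a new trace instead of joining the caller's, which shows up in the tracing UI as disconnected traces.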

Custom Instrumentation

Manual Span Management

package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

func processOrder(ctx context.Context, orderID string) error {
	tracer := otel.Tracer("order-processor")

	ctx, span := tracer.Start(ctx,
		"process-order",
		trace.WithSpanKind(trace.SpanKindInternal),
		trace.WithAttributes(
			attribute.String("order.id", orderID),
		),
	)
	defer span.End()

	// Add event
	span.AddEvent("Validation started")

	// Validate order
	if err := validateOrder(ctx, orderID); err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, "Validation failed")
		return err
	}

	span.AddEvent("Validation completed")

	// Process payment
	if err := processPayment(ctx, orderID); err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, "Payment failed")
		return err
	}

	// Ship order
	if err := shipOrder(ctx, orderID); err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, "Shipping failed")
		return err
	}

	span.SetStatus(codes.Ok, "") // the description is only recorded for codes.Error
	return nil
}

func validateOrder(ctx context.Context, orderID string) error {
	tracer := otel.Tracer("order-processor")

	ctx, span := tracer.Start(ctx, "validate-order")
	defer span.End()

	// Simulate validation
	time.Sleep(50 * time.Millisecond)

	span.SetAttributes(
		attribute.Bool("validation.passed", true),
	)

	return nil
}

func processPayment(ctx context.Context, orderID string) error {
	tracer := otel.Tracer("payment-processor")

	ctx, span := tracer.Start(ctx, "process-payment")
	defer span.End()

	span.SetAttributes(
		attribute.String("payment.gateway", "stripe"),
		attribute.Float64("payment.amount", 99.99),
	)

	// Simulate payment
	time.Sleep(100 * time.Millisecond)

	return nil
}

func shipOrder(ctx context.Context, orderID string) error {
	tracer := otel.Tracer("shipping")

	ctx, span := tracer.Start(ctx, "ship-order")
	defer span.End()

	span.SetAttributes(
		attribute.String("shipping.carrier", "fedex"),
		attribute.String("shipping.tracking", "1Z999AA10123456784"),
	)

	// Simulate shipping
	time.Sleep(75 * time.Millisecond)

	return nil
}

Error Tracking

Comprehensive Error Recording

package main

import (
	"context"
	"errors"
	"fmt"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

var (
	ErrNotFound     = errors.New("resource not found")
	ErrUnauthorized = errors.New("unauthorized")
	ErrInvalidInput = errors.New("invalid input")
)

func fetchResource(ctx context.Context, resourceID string) error {
	tracer := otel.Tracer("resource-fetcher")

	ctx, span := tracer.Start(ctx, "fetch-resource")
	defer span.End()

	span.SetAttributes(
		attribute.String("resource.id", resourceID),
	)

	// Simulate error
	err := ErrNotFound

	if err != nil {
		// Record error with attributes
		span.RecordError(err,
			trace.WithAttributes(
				attribute.String("error.type", "not_found"),
				attribute.String("resource.id", resourceID),
			),
		)

		// Set span status
		span.SetStatus(codes.Error, err.Error())

		return fmt.Errorf("failed to fetch resource: %w", err)
	}

	span.SetStatus(codes.Ok, "")
	return nil
}

func handleError(ctx context.Context, err error) {
	tracer := otel.Tracer("error-handler")

	ctx, span := tracer.Start(ctx, "handle-error")
	defer span.End()

	// Classify error
	var errorType string
	switch {
	case errors.Is(err, ErrNotFound):
		errorType = "not_found"
	case errors.Is(err, ErrUnauthorized):
		errorType = "unauthorized"
	case errors.Is(err, ErrInvalidInput):
		errorType = "invalid_input"
	default:
		errorType = "unknown"
	}

	span.SetAttributes(
		attribute.String("error.type", errorType),
		attribute.String("error.message", err.Error()),
	)

	span.RecordError(err)
}

Sampling Strategies

Custom Sampling

package main

import (
	"go.opentelemetry.io/otel/sdk/trace"
)

func setupSampling() *trace.TracerProvider {
	// Always sample for development
	// sampler := trace.AlwaysSample()

	// Never sample (turns trace collection off)
	// sampler := trace.NeverSample()

	// Sample 10% of traces
	sampler := trace.TraceIDRatioBased(0.1)

	// Parent-based sampling (inherit from parent span)
	// sampler := trace.ParentBased(trace.TraceIDRatioBased(0.1))

	tp := trace.NewTracerProvider(
		trace.WithSampler(sampler),
	)

	return tp
}
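
To see why ratio-based sampling keeps traces whole, it helps to look at how such a decision can be made. The sketch below is a simplified, stdlib-only illustration of the idea, not the SDK's code: it derives a number from the trace ID and compares it with a threshold, so every service that sees the same trace ID reaches the same answer. `shouldSample` and `decide` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shouldSample makes a deterministic ratio-based decision from the trace ID.
// The same trace ID and ratio always give the same result.
func shouldSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Treat the low 8 bytes as a 63-bit number and compare it with
	// ratio * 2^63.
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	upper := uint64(ratio * float64(1<<63))
	return x < upper
}

// decide sketches parent-based sampling: a span with a parent inherits the
// parent's decision, and only root spans consult the ratio.
func decide(hasParent, parentSampled bool, traceID [16]byte, ratio float64) bool {
	if hasParent {
		return parentSampled
	}
	return shouldSample(traceID, ratio)
}

func main() {
	var low, high [16]byte // low stays all zeros
	for i := range high {
		high[i] = 0xff
	}
	fmt.Println(shouldSample(low, 0.1), shouldSample(high, 0.1)) // true false
	fmt.Println(decide(true, false, low, 1.0))                   // false: inherits the parent's decision
}
```

In a multi-service setup, wrapping the ratio sampler in a parent-based one is usually the safer choice, because downstream services then follow the decision made at the root instead of sampling independently.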

Best Practices

1. Consistent Naming

// Good: Consistent span naming
tracer.Start(ctx, "service.operation")
tracer.Start(ctx, "database.query")
tracer.Start(ctx, "http.request")

// Bad: Inconsistent naming
tracer.Start(ctx, "DoSomething")
tracer.Start(ctx, "query_db")
tracer.Start(ctx, "HTTP-Request")

2. Add Meaningful Attributes

span.SetAttributes(
	attribute.String("user.id", userID),
	attribute.Int("order.item_count", itemCount),
	attribute.Float64("order.total", total),
	attribute.Bool("order.expedited", true),
)

3. Use Span Events for Milestones

span.AddEvent("Validation started")
span.AddEvent("Payment processed", trace.WithAttributes(
	attribute.String("payment.id", paymentID),
))
span.AddEvent("Order shipped")

4. Always Defer span.End()

ctx, span := tracer.Start(ctx, "operation")
defer span.End() // Ensures span is closed even on panic
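
The claim about panics follows from how `defer` works in Go: deferred calls run while the stack unwinds. A minimal sketch with a stand-in span type (`fakeSpan` is hypothetical and only records whether `End` was called) shows the behavior without needing the SDK:

```go
package main

import "fmt"

// fakeSpan is a stand-in for a real span; it only tracks whether End ran.
type fakeSpan struct{ ended bool }

// End marks the span as finished.
func (s *fakeSpan) End() { s.ended = true }

// risky defers End and then panics, like a handler hitting a bug mid-request.
func risky(s *fakeSpan) {
	defer s.End()
	panic("boom")
}

// runRisky calls risky, recovers the panic, and reports whether the span
// was ended along with the recovered value.
func runRisky() (ended bool, recovered any) {
	s := &fakeSpan{}
	func() {
		defer func() { recovered = recover() }()
		risky(s)
	}()
	return s.ended, recovered
}

func main() {
	ended, recovered := runRisky()
	fmt.Println(ended, recovered) // true boom
}
```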

Performance Considerations

  • Sampling: Use appropriate sampling rates for production
  • Batch Export: Configure batch size and interval
  • Resource Limits: Cap attributes, events, and links per span using the SDK's span limits
  • Overhead: Usually small with sampling and batch export, but measure it under your own workload
  • Context Size: Keep span attributes reasonably sized
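
The batch size and interval settings are easier to tune once it is clear what they control. The sketch below is a stdlib-only model of a batching exporter, not the SDK's batch span processor: items are buffered and flushed when the buffer reaches `maxBatch` or when `interval` elapses, whichever comes first. All names are hypothetical. In the Go SDK the corresponding knobs are options such as `sdktrace.WithMaxExportBatchSize` and `sdktrace.WithBatchTimeout`, passed to `sdktrace.WithBatcher`.

```go
package main

import (
	"fmt"
	"time"
)

// batcher buffers items and hands them to an export function in groups.
type batcher struct {
	in   chan string
	done chan struct{}
}

// newBatcher starts a goroutine that flushes when maxBatch items have
// accumulated, when interval elapses, or on shutdown.
func newBatcher(maxBatch int, interval time.Duration, export func([]string)) *batcher {
	b := &batcher{in: make(chan string), done: make(chan struct{})}
	go func() {
		defer close(b.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		var buf []string
		flush := func() {
			if len(buf) > 0 {
				export(buf)
				buf = nil // start a fresh slice so exported batches are not reused
			}
		}
		for {
			select {
			case item, ok := <-b.in:
				if !ok { // channel closed: flush what is left and stop
					flush()
					return
				}
				buf = append(buf, item)
				if len(buf) >= maxBatch {
					flush()
				}
			case <-ticker.C:
				flush()
			}
		}
	}()
	return b
}

// Add queues one item. In this sketch it blocks until the worker accepts it.
func (b *batcher) Add(item string) { b.in <- item }

// Shutdown flushes remaining items and waits for the worker to exit.
func (b *batcher) Shutdown() {
	close(b.in)
	<-b.done
}

// batchSizes runs items through a batcher with a long interval and returns
// the size of each exported batch.
func batchSizes(items []string, maxBatch int) []int {
	var sizes []int
	b := newBatcher(maxBatch, time.Hour, func(batch []string) {
		sizes = append(sizes, len(batch))
	})
	for _, it := range items {
		b.Add(it)
	}
	b.Shutdown()
	return sizes
}

func main() {
	fmt.Println(batchSizes([]string{"a", "b", "c", "d", "e"}, 2)) // [2 2 1]
}
```

Larger batches and longer intervals mean fewer export calls but more memory held and more spans at risk if the process exits without a clean shutdown, which is why the earlier examples defer `tp.Shutdown`.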

Conclusion

Distributed tracing with OpenTelemetry provides crucial visibility into microservices architecture, enabling faster debugging and better understanding of system behavior.

Key Takeaways:

  • OpenTelemetry is the industry standard for distributed tracing
  • Instrumentation libraries are available for HTTP, gRPC, and databases
  • Context propagation carries trace information across services
  • Jaeger provides excellent visualization of traces
  • Proper sampling reduces overhead while maintaining visibility

This completes the Go Concurrency Patterns series! Review the Series Overview to explore all patterns.

