The Choreography Problem: Hope-Driven Development

For years, we’ve built distributed systems using choreography—Service A fires an event into the void, hoping Service B hears it. Service B processes it and fires another event, hoping Service C is listening. When something fails (and it will), we’re left scrambling through logs across multiple services, trying to piece together what happened.

This is hope-driven development, and it’s fundamentally broken.

Enter Temporal and the concept of Durable Execution—a paradigm shift that replaces hope with guarantees.

The Old Way: Choreography (Event-Driven)

What is Choreography?

In choreographed systems, services communicate through events. No central coordinator exists—each service reacts to events and publishes new ones. It’s decentralized, loosely coupled, and… a nightmare to debug.

Classic Example: E-commerce Order Flow

Customer places order
    ↓ (fires OrderCreated event)
Payment Service (hopes it receives event)
    ↓ (fires PaymentProcessed event)
Inventory Service (hopes it receives event)
    ↓ (fires InventoryReserved event)
Shipping Service (hopes it receives event)
    ↓ (fires OrderShipped event)
Notification Service (hopes it receives event)

The Choreography Implementation (Go)

// The old way - Choreography with events
package main

import (
    "context"
    "encoding/json"
    "log"
    "time"

    "github.com/segmentio/kafka-go"
)

// Events
type OrderCreated struct {
    OrderID    string    `json:"order_id"`
    CustomerID string    `json:"customer_id"`
    Amount     float64   `json:"amount"`
    Items      []string  `json:"items"`
    CreatedAt  time.Time `json:"created_at"`
}

type PaymentProcessed struct {
    OrderID       string `json:"order_id"`
    TransactionID string `json:"transaction_id"`
    Success       bool   `json:"success"`
}

type InventoryReserved struct {
    OrderID string   `json:"order_id"`
    Items   []string `json:"items"`
    Success bool     `json:"success"`
}

// Order Service - Fires and forgets
type OrderService struct {
    kafkaWriter *kafka.Writer
}

func (s *OrderService) CreateOrder(ctx context.Context, customerID string, amount float64, items []string) error {
    orderID := generateOrderID()

    event := OrderCreated{
        OrderID:    orderID,
        CustomerID: customerID,
        Amount:     amount,
        Items:      items,
        CreatedAt:  time.Now(),
    }

    data, _ := json.Marshal(event)

    // Fire the event and HOPE someone receives it
    err := s.kafkaWriter.WriteMessages(ctx, kafka.Message{
        Key:   []byte(orderID),
        Value: data,
        Topic: "order.created",
    })

    if err != nil {
        // What now? The order is created but event failed to publish
        log.Printf("Failed to publish event: %v", err)
        // Do we rollback? Retry? Give up?
        return err
    }

    log.Printf("Order created: %s (hopefully someone is listening...)", orderID)
    return nil
}

// Payment Service - Listens and hopes
type PaymentService struct {
    kafkaReader *kafka.Reader
    kafkaWriter *kafka.Writer
}

func (s *PaymentService) StartListening(ctx context.Context) {
    for {
        msg, err := s.kafkaReader.ReadMessage(ctx)
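        if err != nil {
            if ctx.Err() != nil {
                return // Shutting down; without this check the loop spins forever
            }
            log.Printf("Error reading message: %v", err)
            continue // Transient error? Broker down? We can't tell from here
        }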

        var event OrderCreated
        if err := json.Unmarshal(msg.Value, &event); err != nil {
            log.Printf("Failed to unmarshal: %v", err)
            continue // Lost another message
        }

        // Process payment
        success := s.processPayment(ctx, event.OrderID, event.Amount)

        // Fire another event and HOPE
        paymentEvent := PaymentProcessed{
            OrderID:       event.OrderID,
            TransactionID: generateTransactionID(),
            Success:       success,
        }

        data, _ := json.Marshal(paymentEvent)
        s.kafkaWriter.WriteMessages(ctx, kafka.Message{
            Value: data,
            Topic: "payment.processed",
        })

        // If this fails, we charged the customer but nobody knows
        // If we crash here, was payment processed or not?
    }
}

func (s *PaymentService) processPayment(ctx context.Context, orderID string, amount float64) bool {
    // Call payment gateway
    // What if this times out?
    // What if it succeeds but we crash before publishing the event?
    // What if the payment gateway charges the customer but returns an error?

    time.Sleep(2 * time.Second) // Simulate payment processing
    return true // Or is it? Who knows!
}

// Inventory Service - More hoping
type InventoryService struct {
    kafkaReader *kafka.Reader
    kafkaWriter *kafka.Writer
}

func (s *InventoryService) StartListening(ctx context.Context) {
    for {
        msg, err := s.kafkaReader.ReadMessage(ctx)
        if err != nil {
            continue
        }

        var event PaymentProcessed
        json.Unmarshal(msg.Value, &event)

        if !event.Success {
            // Payment failed, but do we know if inventory was already reserved?
            // Do we need to unreserve? How do we know?
            continue
        }

        // Reserve inventory
        success := s.reserveInventory(ctx, event.OrderID)

        // Yet another event to fire and hope
        inventoryEvent := InventoryReserved{
            OrderID: event.OrderID,
            Success: success,
        }

        data, _ := json.Marshal(inventoryEvent)
        s.kafkaWriter.WriteMessages(ctx, kafka.Message{
            Value: data,
            Topic: "inventory.reserved",
        })
    }
}

func (s *InventoryService) reserveInventory(ctx context.Context, orderID string) bool {
    // What if we crash after reserving but before publishing event?
    // Inventory is locked forever!
    return true
}

The Problems with Choreography

1. No Visibility

Where is order #12345 in the process? You have to:

  • Check Kafka for events
  • Search logs across 5+ services
  • Hope timestamps align
  • Pray nothing got lost

2. Error Handling Nightmare

  • Payment succeeded but event publish failed—now what?
  • Inventory reserved but service crashed before publishing—inventory locked forever
  • Network partition—events arrive out of order
  • Duplicate events—did we charge the customer twice?
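One common mitigation for the duplicate-event case is an idempotent consumer that remembers which event IDs it has already handled. The sketch below is a minimal in-memory illustration only: `Deduper`, `HandleOnce`, and the event ID scheme are hypothetical names, and a real service would need a durable store shared across instances.

```go
package main

import (
	"fmt"
	"sync"
)

// Deduper remembers event IDs that have already been handled.
// Hypothetical helper: in production this set would live in a database,
// ideally updated in the same transaction as the side effect.
type Deduper struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]bool)}
}

// HandleOnce runs handle only the first time eventID is seen.
// It reports whether handle was actually invoked.
func (d *Deduper) HandleOnce(eventID string, handle func() error) (bool, error) {
	d.mu.Lock()
	defer d.mu.Unlock()

	if d.seen[eventID] {
		return false, nil // duplicate delivery: skip the side effect
	}
	if err := handle(); err != nil {
		return true, err // not marked as seen, so a redelivery will retry
	}
	d.seen[eventID] = true
	return true, nil
}

func main() {
	d := NewDeduper()
	charges := 0
	charge := func() error { charges++; return nil }

	// The broker delivers the same OrderCreated event twice.
	d.HandleOnce("order-ABC123", charge)
	d.HandleOnce("order-ABC123", charge)

	fmt.Println("charges:", charges) // prints "charges: 1"
}
```

Every consumer in the chain needs its own version of this bookkeeping, which is part of why the error handling spreads across services.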

3. No Retries or Timeouts

  • How long do we wait for the next service?
  • What if a service is down?
  • Who retries? When? How many times?

4. Debugging is Detective Work

[2025-01-18 10:23:15] OrderService: Created order ABC123
[2025-01-18 10:23:16] PaymentService: Processing payment for ABC123
[2025-01-18 10:23:18] PaymentService: Payment successful
[2025-01-18 10:23:19] ???
[2025-01-18 10:24:00] Customer: Where's my order?

What happened between 10:23:19 and 10:24:00? Nobody knows.

5. Compensation is Manual

When something fails halfway through:

  • Who orchestrates the rollback?
  • How do we know what to roll back?
  • What if rollback fails?

The New Way: Durable Execution with Temporal

What is Temporal?

Temporal provides durable execution—your code runs to completion, guaranteed, even if:

  • Servers crash
  • Networks partition
  • Processes restart
  • Days or months pass

It’s not event-driven. It’s not choreography. It’s orchestration with guarantees.

Key Concepts

1. Workflows: Durable functions that coordinate business logic
2. Activities: Individual units of work that can fail and retry
3. Workers: Execute workflows and activities
4. Temporal Server: Ensures durability and orchestration

The Temporal Implementation (Go)

// The new way - Durable Execution with Temporal
package main

import (
    "context"
    "fmt"
    "time"

    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/temporal" // needed for temporal.RetryPolicy
    "go.temporal.io/sdk/worker"
    "go.temporal.io/sdk/workflow"
)

// ===== Domain Models =====
type OrderRequest struct {
    CustomerID string
    Amount     float64
    Items      []string
}

type OrderResult struct {
    OrderID       string
    TransactionID string
    TrackingID    string
    Status        string
}

// ===== Workflow Definition =====
// This is the ORCHESTRATOR - it coordinates everything
func OrderWorkflow(ctx workflow.Context, req OrderRequest) (*OrderResult, error) {
    logger := workflow.GetLogger(ctx)
    orderID := workflow.Now(ctx).Format("ORD-20060102-150405")

    // Configuration
    activityOptions := workflow.ActivityOptions{
        StartToCloseTimeout: 30 * time.Second,
        RetryPolicy: &temporal.RetryPolicy{
            InitialInterval:    time.Second,
            BackoffCoefficient: 2.0,
            MaximumInterval:    time.Minute,
            MaximumAttempts:    3,
        },
    }
    ctx = workflow.WithActivityOptions(ctx, activityOptions)

    logger.Info("Starting order workflow", "orderID", orderID)

    var result OrderResult
    result.OrderID = orderID

    // Step 1: Process Payment
    var transactionID string
    err := workflow.ExecuteActivity(ctx, ProcessPayment, orderID, req.Amount).Get(ctx, &transactionID)
    if err != nil {
        logger.Error("Payment failed", "error", err)
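        // Note: when a workflow returns an error, Temporal records the failure
        // and discards the result value, so callers see only the error
        result.Status = "payment_failed"
        return &result, err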
    }
    result.TransactionID = transactionID
    logger.Info("Payment successful", "transactionID", transactionID)

    // Step 2: Reserve Inventory
    err = workflow.ExecuteActivity(ctx, ReserveInventory, orderID, req.Items).Get(ctx, nil)
    if err != nil {
        logger.Error("Inventory reservation failed", "error", err)

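        // COMPENSATION: Refund payment (the refund activity is retried under the same policy)
        var refundID string
        if cerr := workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, &refundID); cerr != nil {
            logger.Error("Refund failed, manual intervention needed", "error", cerr)
        }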

        result.Status = "inventory_failed_refunded"
        return &result, err
    }
    logger.Info("Inventory reserved")

    // Step 3: Create Shipment
    var trackingID string
    err = workflow.ExecuteActivity(ctx, CreateShipment, orderID, req.Items).Get(ctx, &trackingID)
    if err != nil {
        logger.Error("Shipment creation failed", "error", err)

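        // COMPENSATION: Release inventory and refund
        if cerr := workflow.ExecuteActivity(ctx, ReleaseInventory, orderID).Get(ctx, nil); cerr != nil {
            logger.Error("Inventory release failed, manual intervention needed", "error", cerr)
        }
        var refundID string
        if cerr := workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, &refundID); cerr != nil {
            logger.Error("Refund failed, manual intervention needed", "error", cerr)
        }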

        result.Status = "shipping_failed_compensated"
        return &result, err
    }
    result.TrackingID = trackingID
    logger.Info("Shipment created", "trackingID", trackingID)

    // Step 4: Send Confirmation
    err = workflow.ExecuteActivity(ctx, SendConfirmationEmail, req.CustomerID, orderID, trackingID).Get(ctx, nil)
    if err != nil {
        // Email failure is not critical - we'll retry but not rollback
        logger.Warn("Failed to send confirmation email", "error", err)
    }

    result.Status = "completed"
    logger.Info("Order workflow completed successfully")
    return &result, nil
}

// ===== Activity Implementations =====
// These are the ACTUAL WORK - each can fail and retry independently

func ProcessPayment(ctx context.Context, orderID string, amount float64) (string, error) {
    // Call actual payment gateway
    // If this fails, Temporal will retry based on retry policy
    // If server crashes mid-execution, Temporal will retry on another worker

    fmt.Printf("Processing payment for order %s: $%.2f\n", orderID, amount)
    time.Sleep(2 * time.Second) // Simulate payment processing

    // Simulate occasional failures
    // Temporal will automatically retry

    transactionID := fmt.Sprintf("TXN-%d", time.Now().Unix())
    fmt.Printf("Payment successful: %s\n", transactionID)
    return transactionID, nil
}

func RefundPayment(ctx context.Context, transactionID string) (string, error) {
    fmt.Printf("Refunding payment: %s\n", transactionID)
    time.Sleep(1 * time.Second)

    refundID := fmt.Sprintf("REF-%d", time.Now().Unix())
    fmt.Printf("Refund successful: %s\n", refundID)
    return refundID, nil
}

func ReserveInventory(ctx context.Context, orderID string, items []string) error {
    fmt.Printf("Reserving inventory for order %s: %v\n", orderID, items)
    time.Sleep(1 * time.Second)

    // Simulate inventory check
    // If this fails, payment will be automatically refunded

    fmt.Printf("Inventory reserved for order %s\n", orderID)
    return nil
}

func ReleaseInventory(ctx context.Context, orderID string) error {
    fmt.Printf("Releasing inventory for order %s\n", orderID)
    time.Sleep(500 * time.Millisecond)
    return nil
}

func CreateShipment(ctx context.Context, orderID string, items []string) (string, error) {
    fmt.Printf("Creating shipment for order %s\n", orderID)
    time.Sleep(1500 * time.Millisecond)

    trackingID := fmt.Sprintf("TRACK-%d", time.Now().Unix())
    fmt.Printf("Shipment created: %s\n", trackingID)
    return trackingID, nil
}

func SendConfirmationEmail(ctx context.Context, customerID, orderID, trackingID string) error {
    fmt.Printf("Sending confirmation email to customer %s\n", customerID)
    time.Sleep(500 * time.Millisecond)
    return nil
}

// ===== Worker Setup =====
func main() {
    // Create Temporal client
    c, err := client.Dial(client.Options{
        HostPort: "localhost:7233",
    })
    if err != nil {
        panic(err)
    }
    defer c.Close()

    // Create worker
    w := worker.New(c, "order-processing", worker.Options{})

    // Register workflow and activities
    w.RegisterWorkflow(OrderWorkflow)
    w.RegisterActivity(ProcessPayment)
    w.RegisterActivity(RefundPayment)
    w.RegisterActivity(ReserveInventory)
    w.RegisterActivity(ReleaseInventory)
    w.RegisterActivity(CreateShipment)
    w.RegisterActivity(SendConfirmationEmail)

    // Start worker
    err = w.Run(worker.InterruptCh())
    if err != nil {
        panic(err)
    }
}

Starting a Workflow

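// client/main.go
// Assumes OrderRequest, OrderResult, and OrderWorkflow are available here,
// for example from a package shared with the worker.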
package main

import (
    "context"
    "fmt"
    "log"

    "go.temporal.io/sdk/client"
)

func main() {
    c, err := client.Dial(client.Options{
        HostPort: "localhost:7233",
    })
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()

    // Start workflow
    workflowOptions := client.StartWorkflowOptions{
        ID:        "order-12345",
        TaskQueue: "order-processing",
    }

    req := OrderRequest{
        CustomerID: "customer-456",
        Amount:     129.99,
        Items:      []string{"laptop", "mouse", "keyboard"},
    }

    we, err := c.ExecuteWorkflow(context.Background(), workflowOptions, OrderWorkflow, req)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Started workflow ID: %s, RunID: %s\n", we.GetID(), we.GetRunID())

    // Wait for result
    var result OrderResult
    err = we.Get(context.Background(), &result)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Order completed: %+v\n", result)
}

Why Temporal Changes Everything

1. Complete Visibility

# See the entire workflow state
temporal workflow show -w order-12345

# What the event history tells you (illustrative summary, not literal CLI output):
# - Current step: "CreateShipment"
# - Payment: Completed (TXN-12345)
# - Inventory: Reserved
# - Shipment: In Progress (Attempt 2/3)

No more log diving. No more guessing.

2. Automatic Retries

RetryPolicy: &temporal.RetryPolicy{
    InitialInterval:    time.Second,
    BackoffCoefficient: 2.0,
    MaximumInterval:    time.Minute,
    MaximumAttempts:    3,
}

Activities retry automatically. You configure it once, Temporal handles it forever.

3. Built-in Compensation

// If inventory fails, automatically refund
if err != nil {
    workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, nil)
    return err
}

Compensation is ordinary workflow code: explicit, testable, and executed durably like any other step.

4. Survives Crashes

  • Worker crashes during payment? The activity times out and is retried on another worker, and the workflow continues from its recorded history
  • Process restarts? Workflow resumes automatically
  • Network partition? Workflow waits and continues when connectivity returns
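The mechanism behind this is replay: the result of every completed step is recorded in an event history, and after a crash the workflow function is run again, with recorded results returned immediately instead of re-executing the step. The toy model below illustrates the idea only; `History`, `Step`, and `orderFlow` are hypothetical and far simpler than Temporal's actual implementation.

```go
package main

import "fmt"

// History records the result of each completed step, keyed by step name.
// In a real system this would be persisted by the server, not kept in memory.
type History struct {
	results map[string]string
	Calls   int // how many times a step body actually ran
}

func NewHistory() *History {
	return &History{results: make(map[string]string)}
}

// Step returns the recorded result if the step already completed;
// otherwise it runs fn and records the result on success.
func (h *History) Step(name string, fn func() (string, error)) (string, error) {
	if res, ok := h.results[name]; ok {
		return res, nil // replay: skip the side effect
	}
	h.Calls++
	res, err := fn()
	if err != nil {
		return "", err
	}
	h.results[name] = res
	return res, nil
}

// orderFlow is the "workflow": plain sequential code built from steps.
func orderFlow(h *History, crashAtShipping bool) (string, error) {
	txn, err := h.Step("payment", func() (string, error) { return "TXN-1", nil })
	if err != nil {
		return "", err
	}
	track, err := h.Step("shipping", func() (string, error) {
		if crashAtShipping {
			return "", fmt.Errorf("worker crashed")
		}
		return "TRACK-1", nil
	})
	if err != nil {
		return "", err
	}
	return txn + "/" + track, nil
}

func main() {
	h := NewHistory()

	_, err := orderFlow(h, true) // first run dies after payment completed
	fmt.Println("first run:", err)

	out, _ := orderFlow(h, false) // rerun with the same history
	fmt.Println("second run:", out)
	fmt.Println("step bodies executed:", h.Calls) // 3: payment once, shipping twice
}
```

The important property is that the payment step body runs once even though the flow runs twice. This is also why workflow code has to be deterministic: a replay must ask for the same steps in the same order.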

5. Human-in-the-Loop Workflows

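// Ask a human for approval, then wait for the answer (hours, days, weeks)
err := workflow.ExecuteActivity(ctx, RequestApproval, orderID).Get(ctx, nil)
if err != nil {
    return err
}

// The answer arrives as a signal; "approval" is an illustrative signal name
var approved bool
approvalCh := workflow.GetSignalChannel(ctx, "approval")
timer := workflow.NewTimer(ctx, 30*24*time.Hour) // Give up after 30 days

// Workflow can wait for MONTHS without holding resources
selector := workflow.NewSelector(ctx)
selector.AddReceive(approvalCh, func(c workflow.ReceiveChannel, more bool) {
    c.Receive(ctx, &approved)
})
selector.AddFuture(timer, func(f workflow.Future) {}) // Timeout leaves approved == false
selector.Select(ctx)

if !approved {
    // Compensate
}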

Try doing this with event choreography!

6. Testability

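// Test the entire workflow with mocked activities
// Imports: testing, go.temporal.io/sdk/testsuite,
// github.com/stretchr/testify/mock, github.com/stretchr/testify/require
func TestOrderWorkflow(t *testing.T) {
    testSuite := &testsuite.WorkflowTestSuite{}
    env := testSuite.NewTestWorkflowEnvironment()

    env.RegisterActivity(ProcessPayment)
    env.RegisterActivity(ReserveInventory)
    env.RegisterActivity(CreateShipment)
    env.RegisterActivity(SendConfirmationEmail)

    // Mock every activity on the happy path; one matcher per argument, including ctx
    env.OnActivity(ProcessPayment, mock.Anything, mock.Anything, mock.Anything).Return("TXN-123", nil)
    env.OnActivity(ReserveInventory, mock.Anything, mock.Anything, mock.Anything).Return(nil)
    env.OnActivity(CreateShipment, mock.Anything, mock.Anything, mock.Anything).Return("TRACK-123", nil)
    env.OnActivity(SendConfirmationEmail, mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(nil)

    env.ExecuteWorkflow(OrderWorkflow, OrderRequest{
        CustomerID: "test",
        Amount:     100,
    })

    require.True(t, env.IsWorkflowCompleted())
    require.NoError(t, env.GetWorkflowError())
}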

Real-World Use Cases

1. E-commerce Order Processing

Multiple steps with payments, inventory, shipping—Temporal ensures completion.

2. User Onboarding

Multi-day workflows with email verification, document uploads, approvals.

3. ETL Pipelines

Extract → Transform → Load with automatic retries and error handling.

4. Subscription Management

Trial periods, payment retries, cancellations—all in one durable workflow.

5. Regulatory Compliance

Multi-step approval processes with complete audit trails.

6. IoT Device Provisioning

Device registration, firmware updates, configuration—with retries over days.

Choreography vs Durable Execution

Aspect         | Choreography (Old) | Durable Execution (Temporal)
Coordination   | Hope-driven        | Guaranteed
Visibility     | Log diving         | Built-in UI
Retries        | Manual             | Automatic
Compensation   | Ad-hoc             | Explicit
State          | Scattered          | Centralized
Debugging      | Nightmare          | Simple
Testing        | Complex            | Built-in
Long-running   | Impossible         | Native
Crash Recovery | Manual             | Automatic

When to Use Temporal

Use Temporal when:

  • ✅ Multi-step workflows with dependencies
  • ✅ Need guaranteed completion
  • ✅ Compensation/rollback required
  • ✅ Long-running processes (hours, days, months)
  • ✅ Human-in-the-loop workflows
  • ✅ Need complete visibility and audit trails
  • ✅ Complex error handling and retries

Stick with Events when:

  • ❌ Simple, fire-and-forget notifications
  • ❌ True decoupling is more important than guarantees
  • ❌ High-throughput, low-latency event streaming
  • ❌ Broadcasting to many consumers

Advanced Patterns

Child Workflows

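// Parent workflow spawns one child workflow per order
func BulkOrderWorkflow(ctx workflow.Context, orders []OrderRequest) error {
    parentID := workflow.GetInfo(ctx).WorkflowExecution.ID
    futures := make([]workflow.ChildWorkflowFuture, 0, len(orders))

    for i, order := range orders {
        childOptions := workflow.ChildWorkflowOptions{
            // OrderRequest has no OrderID field, so derive a unique ID from the parent
            WorkflowID: fmt.Sprintf("%s-order-%d", parentID, i),
        }
        childCtx := workflow.WithChildOptions(ctx, childOptions)

        // Each order runs as an independent workflow, started in parallel
        futures = append(futures, workflow.ExecuteChildWorkflow(childCtx, OrderWorkflow, order))
    }

    // Wait for every child; returning early would close the parent while children still run
    for _, f := range futures {
        if err := f.Get(ctx, nil); err != nil {
            return err
        }
    }
    return nil
}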

Signals (External Events)

// Workflow can receive external signals
func OrderWorkflow(ctx workflow.Context, req OrderRequest) error {
    // Wait for payment confirmation signal
    var paymentConfirmed bool
    workflow.GetSignalChannel(ctx, "payment-confirmed").Receive(ctx, &paymentConfirmed)

    if !paymentConfirmed {
        return errors.New("payment not confirmed")
    }

    // Continue workflow
    return nil
}

// Send signal from outside (c is the client.Client returned by client.Dial)
err := c.SignalWorkflow(ctx, "order-12345", "", "payment-confirmed", true)

Queries (Read State)

// Query workflow state without modifying it
func OrderWorkflow(ctx workflow.Context, req OrderRequest) error {
    var currentStatus string

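    // Register query handler
    err := workflow.SetQueryHandler(ctx, "get-status", func() (string, error) {
        return currentStatus, nil
    })
    if err != nil {
        return err
    }

    currentStatus = "processing-payment"
    // ... rest of workflow
    return nil
}

// Query from outside (c is the client.Client returned by client.Dial)
resp, err := c.QueryWorkflow(ctx, "order-12345", "", "get-status")
if err != nil {
    log.Fatal(err)
}
var status string
if err := resp.Get(&status); err != nil {
    log.Fatal(err)
}
fmt.Printf("Current status: %s\n", status)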

Getting Started

1. Install Temporal Server:

# Using Docker Compose
git clone https://github.com/temporalio/docker-compose.git
cd docker-compose
docker compose up

# Or using Temporal CLI
brew install temporal
temporal server start-dev

2. Install Temporal SDK:

go get go.temporal.io/sdk

3. Access Temporal UI:

http://localhost:8233 (dev server started with the Temporal CLI; the Docker Compose setup serves the UI on port 8080)

See all workflows, execution history, current state, and more.

Conclusion

Choreography was a noble experiment, but it’s fundamentally flawed: services fire events into the void and hope others receive them, with no visibility, no guarantees, and nightmarish debugging.

Temporal’s Durable Execution changes the game:

  • ✅ Workflows run to completion, guaranteed
  • ✅ Automatic retries and error handling
  • ✅ Built-in compensation and rollback
  • ✅ Complete visibility and audit trails
  • ✅ Survives crashes, restarts, and partitions
  • ✅ Simple testing and debugging
  • ✅ Human-in-the-loop support
  • ✅ Long-running workflows (days, months, years)

Stop hoping. Start guaranteeing.

The shift from choreography to durable execution isn’t just a technical upgrade—it’s a fundamental rethinking of how we build reliable distributed systems. Temporal makes the complex simple and the impossible possible.

Additional Resources