The Choreography Problem: Hope-Driven Development

For years, we’ve built distributed systems using choreography: Service A fires an event into the void, hoping Service B hears it. Service B processes it and fires another event, hoping Service C is listening. When something fails (and it will), we’re left scrambling through logs across multiple services, trying to piece together what happened.

This is hope-driven development, and it’s fundamentally broken.

Enter Temporal and the concept of Durable Execution, a paradigm shift that replaces hope with guarantees.

The Old Way: Choreography (Event-Driven)

What is Choreography?

In choreographed systems, services communicate through events. No central coordinator exists; each service reacts to events and publishes new ones. It’s decentralized, loosely coupled, and… a nightmare to debug.

Classic Example: E-commerce Order Flow

Customer places order
    ↓ (fires OrderCreated event)
Payment Service (hopes it receives event)
    ↓ (fires PaymentProcessed event)
Inventory Service (hopes it receives event)
    ↓ (fires InventoryReserved event)
Shipping Service (hopes it receives event)
    ↓ (fires OrderShipped event)
Notification Service (hopes it receives event)

The Choreography Implementation (Go)

// The old way - Choreography with events
package main

import (
    "context"
    "encoding/json"
    "log"
    "time"

    "github.com/segmentio/kafka-go"
)

// Events
type OrderCreated struct {
    OrderID    string    `json:"order_id"`
    CustomerID string    `json:"customer_id"`
    Amount     float64   `json:"amount"`
    Items      []string  `json:"items"`
    CreatedAt  time.Time `json:"created_at"`
}

type PaymentProcessed struct {
    OrderID       string `json:"order_id"`
    TransactionID string `json:"transaction_id"`
    Success       bool   `json:"success"`
}

type InventoryReserved struct {
    OrderID string   `json:"order_id"`
    Items   []string `json:"items"`
    Success bool     `json:"success"`
}

// Order Service - Fires and forgets
type OrderService struct {
    kafkaWriter *kafka.Writer
}

func (s *OrderService) CreateOrder(ctx context.Context, customerID string, amount float64, items []string) error {
    orderID := generateOrderID()

    event := OrderCreated{
        OrderID:    orderID,
        CustomerID: customerID,
        Amount:     amount,
        Items:      items,
        CreatedAt:  time.Now(),
    }

    data, _ := json.Marshal(event)

    // Fire the event and HOPE someone receives it
    err := s.kafkaWriter.WriteMessages(ctx, kafka.Message{
        Key:   []byte(orderID),
        Value: data,
        Topic: "order.created",
    })

    if err != nil {
        // What now? The order is created but event failed to publish
        log.Printf("Failed to publish event: %v", err)
        // Do we rollback? Retry? Give up?
        return err
    }

    log.Printf("Order created: %s (hopefully someone is listening...)", orderID)
    return nil
}

// Payment Service - Listens and hopes
type PaymentService struct {
    kafkaReader *kafka.Reader
    kafkaWriter *kafka.Writer
}

func (s *PaymentService) StartListening(ctx context.Context) {
    for {
        msg, err := s.kafkaReader.ReadMessage(ctx)
        if err != nil {
            log.Printf("Error reading message: %v", err)
            continue // What about the message we just lost?
        }

        var event OrderCreated
        if err := json.Unmarshal(msg.Value, &event); err != nil {
            log.Printf("Failed to unmarshal: %v", err)
            continue // Lost another message
        }

        // Process payment
        success := s.processPayment(ctx, event.OrderID, event.Amount)

        // Fire another event and HOPE
        paymentEvent := PaymentProcessed{
            OrderID:       event.OrderID,
            TransactionID: generateTransactionID(),
            Success:       success,
        }

        data, _ := json.Marshal(paymentEvent)
        s.kafkaWriter.WriteMessages(ctx, kafka.Message{
            Value: data,
            Topic: "payment.processed",
        })

        // If this fails, we charged the customer but nobody knows
        // If we crash here, was payment processed or not?
    }
}

func (s *PaymentService) processPayment(ctx context.Context, orderID string, amount float64) bool {
    // Call payment gateway
    // What if this times out?
    // What if it succeeds but we crash before publishing the event?
    // What if the payment gateway charges the customer but returns an error?

    time.Sleep(2 * time.Second) // Simulate payment processing
    return true // Or is it? Who knows!
}

// Inventory Service - More hoping
type InventoryService struct {
    kafkaReader *kafka.Reader
    kafkaWriter *kafka.Writer
}

func (s *InventoryService) StartListening(ctx context.Context) {
    for {
        msg, err := s.kafkaReader.ReadMessage(ctx)
        if err != nil {
            continue
        }

        var event PaymentProcessed
        json.Unmarshal(msg.Value, &event)

        if !event.Success {
            // Payment failed, but do we know if inventory was already reserved?
            // Do we need to unreserve? How do we know?
            continue
        }

        // Reserve inventory
        success := s.reserveInventory(ctx, event.OrderID)

        // Yet another event to fire and hope
        inventoryEvent := InventoryReserved{
            OrderID: event.OrderID,
            Success: success,
        }

        data, _ := json.Marshal(inventoryEvent)
        s.kafkaWriter.WriteMessages(ctx, kafka.Message{
            Value: data,
            Topic: "inventory.reserved",
        })
    }
}

func (s *InventoryService) reserveInventory(ctx context.Context, orderID string) bool {
    // What if we crash after reserving but before publishing event?
    // Inventory is locked forever!
    return true
}

The Problems with Choreography

1. No Visibility

Where is order #12345 in the process? You have to:

  • Check Kafka for events
  • Search logs across 5+ services
  • Hope timestamps align
  • Pray nothing got lost

2. Error Handling Nightmare

  • Payment succeeded but the event publish failed: now what?
  • Inventory reserved but the service crashed before publishing: inventory locked forever
  • Network partition: events arrive out of order
  • Duplicate events: did we charge the customer twice?
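
The duplicate-event case is worth making concrete. One common mitigation in choreographed systems is an idempotent consumer that remembers which order IDs it has already handled and skips repeats. The sketch below shows the idea in plain Go using only the standard library. The names `Charger` and `Charge` and the in-memory map are hypothetical; a real service would need a durable store shared by every instance, which is exactly the kind of plumbing each team ends up writing by hand.

```go
package main

import (
	"fmt"
	"sync"
)

// Charger charges each order at most once, even if the same
// OrderCreated event is delivered several times.
type Charger struct {
	mu      sync.Mutex
	charged map[string]bool // order IDs already charged (in-memory, hypothetical)
	total   float64         // running total of applied charges
}

func NewCharger() *Charger {
	return &Charger{charged: make(map[string]bool)}
}

// Charge returns true if the charge was applied and false if the
// order ID had already been seen (a duplicate delivery).
func (c *Charger) Charge(orderID string, amount float64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.charged[orderID] {
		return false
	}
	c.charged[orderID] = true
	c.total += amount
	return true
}

// Total reports the sum of all applied charges.
func (c *Charger) Total() float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.total
}

func main() {
	c := NewCharger()
	fmt.Println(c.Charge("ABC123", 100)) // first delivery: true
	fmt.Println(c.Charge("ABC123", 100)) // duplicate delivery: false
	fmt.Println(c.Total())               // 100
}
```

Note that this only protects one step. Every consumer in the chain needs its own version of the same check.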

3. No Retries or Timeouts

  • How long do we wait for the next service?
  • What if a service is down?
  • Who retries? When? How many times?

4. Debugging is Detective Work

[2025-01-18 10:23:15] OrderService: Created order ABC123
[2025-01-18 10:23:16] PaymentService: Processing payment for ABC123
[2025-01-18 10:23:18] PaymentService: Payment successful
[2025-01-18 10:23:19] ???
[2025-01-18 10:24:00] Customer: Where's my order?

What happened between 10:23:19 and 10:24:00? Nobody knows.

5. Compensation is Manual

When something fails halfway through:

  • Who orchestrates the rollback?
  • How do we know what to roll back?
  • What if rollback fails?

The New Way: Durable Execution with Temporal

What is Temporal?

With durable execution, Temporal guarantees that your code runs to completion even if:

  • Servers crash
  • Networks partition
  • Processes restart
  • Days or months pass

It’s not event-driven. It’s not choreography. It’s orchestration with guarantees.

Key Concepts

  1. Workflows: Durable functions that coordinate business logic
  2. Activities: Individual units of work that can fail and retry
  3. Workers: Execute workflows and activities
  4. Temporal Server: Ensures durability and orchestration

The Temporal Implementation (Go)

// The new way - Durable Execution with Temporal
package main

import (
    "context"
    "fmt"
    "time"

    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/temporal" // needed for temporal.RetryPolicy
    "go.temporal.io/sdk/worker"
    "go.temporal.io/sdk/workflow"
)

// ===== Domain Models =====
type OrderRequest struct {
    CustomerID string
    Amount     float64
    Items      []string
}

type OrderResult struct {
    OrderID       string
    TransactionID string
    TrackingID    string
    Status        string
}

// ===== Workflow Definition =====
// This is the ORCHESTRATOR - it coordinates everything
func OrderWorkflow(ctx workflow.Context, req OrderRequest) (*OrderResult, error) {
    logger := workflow.GetLogger(ctx)
    orderID := workflow.Now(ctx).Format("ORD-20060102-150405")

    // Configuration
    activityOptions := workflow.ActivityOptions{
        StartToCloseTimeout: 30 * time.Second,
        RetryPolicy: &temporal.RetryPolicy{
            InitialInterval:    time.Second,
            BackoffCoefficient: 2.0,
            MaximumInterval:    time.Minute,
            MaximumAttempts:    3,
        },
    }
    ctx = workflow.WithActivityOptions(ctx, activityOptions)

    logger.Info("Starting order workflow", "orderID", orderID)

    var result OrderResult
    result.OrderID = orderID

    // Step 1: Process Payment
    var transactionID string
    err := workflow.ExecuteActivity(ctx, ProcessPayment, orderID, req.Amount).Get(ctx, &transactionID)
    if err != nil {
        logger.Error("Payment failed", "error", err)
        result.Status = "payment_failed"
        return &result, err
    }
    result.TransactionID = transactionID
    logger.Info("Payment successful", "transactionID", transactionID)

    // Step 2: Reserve Inventory
    err = workflow.ExecuteActivity(ctx, ReserveInventory, orderID, req.Items).Get(ctx, nil)
    if err != nil {
        logger.Error("Inventory reservation failed", "error", err)

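        // COMPENSATION: Refund payment automatically
        var refundID string
        if cerr := workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, &refundID); cerr != nil {
            // The refund itself exhausted its retries; surface it instead of dropping it
            logger.Error("Refund failed; manual follow-up needed", "error", cerr)
        }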

        result.Status = "inventory_failed_refunded"
        return &result, err
    }
    logger.Info("Inventory reserved")

    // Step 3: Create Shipment
    var trackingID string
    err = workflow.ExecuteActivity(ctx, CreateShipment, orderID, req.Items).Get(ctx, &trackingID)
    if err != nil {
        logger.Error("Shipment creation failed", "error", err)

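        // COMPENSATION: Release inventory and refund
        if cerr := workflow.ExecuteActivity(ctx, ReleaseInventory, orderID).Get(ctx, nil); cerr != nil {
            logger.Error("Inventory release failed; manual follow-up needed", "error", cerr)
        }
        var refundID string
        if cerr := workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, &refundID); cerr != nil {
            logger.Error("Refund failed; manual follow-up needed", "error", cerr)
        }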

        result.Status = "shipping_failed_compensated"
        return &result, err
    }
    result.TrackingID = trackingID
    logger.Info("Shipment created", "trackingID", trackingID)

    // Step 4: Send Confirmation
    err = workflow.ExecuteActivity(ctx, SendConfirmationEmail, req.CustomerID, orderID, trackingID).Get(ctx, nil)
    if err != nil {
        // Email failure is not critical - we'll retry but not rollback
        logger.Warn("Failed to send confirmation email", "error", err)
    }

    result.Status = "completed"
    logger.Info("Order workflow completed successfully")
    return &result, nil
}

// ===== Activity Implementations =====
// These are the ACTUAL WORK - each can fail and retry independently

func ProcessPayment(ctx context.Context, orderID string, amount float64) (string, error) {
    // Call actual payment gateway
    // If this fails, Temporal will retry based on retry policy
    // If server crashes mid-execution, Temporal will retry on another worker

    fmt.Printf("Processing payment for order %s: $%.2f\n", orderID, amount)
    time.Sleep(2 * time.Second) // Simulate payment processing

    // Simulate occasional failures
    // Temporal will automatically retry

    transactionID := fmt.Sprintf("TXN-%d", time.Now().Unix())
    fmt.Printf("Payment successful: %s\n", transactionID)
    return transactionID, nil
}

func RefundPayment(ctx context.Context, transactionID string) (string, error) {
    fmt.Printf("Refunding payment: %s\n", transactionID)
    time.Sleep(1 * time.Second)

    refundID := fmt.Sprintf("REF-%d", time.Now().Unix())
    fmt.Printf("Refund successful: %s\n", refundID)
    return refundID, nil
}

func ReserveInventory(ctx context.Context, orderID string, items []string) error {
    fmt.Printf("Reserving inventory for order %s: %v\n", orderID, items)
    time.Sleep(1 * time.Second)

    // Simulate inventory check
    // If this fails, payment will be automatically refunded

    fmt.Printf("Inventory reserved for order %s\n", orderID)
    return nil
}

func ReleaseInventory(ctx context.Context, orderID string) error {
    fmt.Printf("Releasing inventory for order %s\n", orderID)
    time.Sleep(500 * time.Millisecond)
    return nil
}

func CreateShipment(ctx context.Context, orderID string, items []string) (string, error) {
    fmt.Printf("Creating shipment for order %s\n", orderID)
    time.Sleep(1500 * time.Millisecond)

    trackingID := fmt.Sprintf("TRACK-%d", time.Now().Unix())
    fmt.Printf("Shipment created: %s\n", trackingID)
    return trackingID, nil
}

func SendConfirmationEmail(ctx context.Context, customerID, orderID, trackingID string) error {
    fmt.Printf("Sending confirmation email to customer %s\n", customerID)
    time.Sleep(500 * time.Millisecond)
    return nil
}

// ===== Worker Setup =====
func main() {
    // Create Temporal client
    c, err := client.Dial(client.Options{
        HostPort: "localhost:7233",
    })
    if err != nil {
        panic(err)
    }
    defer c.Close()

    // Create worker
    w := worker.New(c, "order-processing", worker.Options{})

    // Register workflow and activities
    w.RegisterWorkflow(OrderWorkflow)
    w.RegisterActivity(ProcessPayment)
    w.RegisterActivity(RefundPayment)
    w.RegisterActivity(ReserveInventory)
    w.RegisterActivity(ReleaseInventory)
    w.RegisterActivity(CreateShipment)
    w.RegisterActivity(SendConfirmationEmail)

    // Start worker
    err = w.Run(worker.InterruptCh())
    if err != nil {
        panic(err)
    }
}

Starting a Workflow

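// client/main.go
// Note: OrderWorkflow, OrderRequest, and OrderResult must be visible here,
// for example by importing them from a package shared with the worker.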
package main

import (
    "context"
    "fmt"
    "log"

    "go.temporal.io/sdk/client"
)

func main() {
    c, err := client.Dial(client.Options{
        HostPort: "localhost:7233",
    })
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()

    // Start workflow
    workflowOptions := client.StartWorkflowOptions{
        ID:        "order-12345",
        TaskQueue: "order-processing",
    }

    req := OrderRequest{
        CustomerID: "customer-456",
        Amount:     129.99,
        Items:      []string{"laptop", "mouse", "keyboard"},
    }

    we, err := c.ExecuteWorkflow(context.Background(), workflowOptions, OrderWorkflow, req)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Started workflow ID: %s, RunID: %s\n", we.GetID(), we.GetRunID())

    // Wait for result
    var result OrderResult
    err = we.Get(context.Background(), &result)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Order completed: %+v\n", result)
}

Why Temporal Changes Everything

1. Complete Visibility

# See the entire workflow state
temporal workflow show -w order-12345

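# Illustrative summary of what the event history reveals
# (the command prints the recorded event history, not this exact text):
# - Current step: "CreateShipment"
# - Payment: Completed (TXN-12345)
# - Inventory: Reserved
# - Shipment: In Progress (Attempt 2/3)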

No more log diving. No more guessing.

2. Automatic Retries

RetryPolicy: &temporal.RetryPolicy{
    InitialInterval:    time.Second,
    BackoffCoefficient: 2.0,
    MaximumInterval:    time.Minute,
    MaximumAttempts:    3,
}

Activities retry automatically. You configure the policy once, and Temporal applies it on every failure until the attempts run out.

3. Built-in Compensation

// If inventory fails, automatically refund
if err != nil {
    workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, nil)
    return err
}

Compensation is explicit and testable, and because it runs inside the workflow, it is carried through crashes and retries like any other step.

4. Survives Crashes

  • Server crashes during payment? Temporal continues from where it left off
  • Process restarts? Workflow resumes automatically
  • Network partition? Workflow waits and continues when connectivity returns
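
How can a workflow continue from where it left off? The core idea is that the result of every completed step is recorded, and after a crash the workflow function runs again from the top with recorded results substituted for the steps that already finished. The toy model below illustrates that idea in plain Go. `History`, `Step`, and `runOrder` are hypothetical, and this is a deliberate simplification rather than a description of Temporal's internals.

```go
package main

import (
	"errors"
	"fmt"
)

// History stores the recorded result of each completed step, by name.
type History struct {
	results map[string]string
	Calls   int // how many times real work was actually performed
}

func NewHistory() *History {
	return &History{results: make(map[string]string)}
}

// Step returns the recorded result if the step already completed;
// otherwise it runs the work and records the result on success.
func (h *History) Step(name string, work func() (string, error)) (string, error) {
	if res, ok := h.results[name]; ok {
		return res, nil // replayed from history, the work is not repeated
	}
	h.Calls++
	res, err := work()
	if err != nil {
		return "", err
	}
	h.results[name] = res
	return res, nil
}

// runOrder is the "workflow": plain sequential code built from steps.
func runOrder(h *History, shipmentUp bool) (string, error) {
	txn, err := h.Step("payment", func() (string, error) { return "TXN-1", nil })
	if err != nil {
		return "", err
	}
	track, err := h.Step("shipment", func() (string, error) {
		if !shipmentUp {
			return "", errors.New("shipping service unavailable")
		}
		return "TRACK-1", nil
	})
	if err != nil {
		return "", err
	}
	return txn + "/" + track, nil
}

func main() {
	h := NewHistory()
	_, err := runOrder(h, false) // first run fails at the shipment step
	fmt.Println("first run error:", err)

	out, _ := runOrder(h, true) // rerun from the top with the same history
	fmt.Println(out, "real calls:", h.Calls)
}
```

The second run performs three real calls in total: payment once and shipment twice. The customer is not charged again because the payment result comes from the history. This is also why workflow code has to be deterministic: a rerun must ask for the same steps in the same order.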

5. Human-in-the-Loop Workflows

// Request approval (RequestApproval needs a timeout long enough for a human to respond)
var approved bool
err := workflow.ExecuteActivity(ctx, RequestApproval, orderID).Get(ctx, &approved)
if err != nil {
    return err
}

// A workflow can sleep for MONTHS without tying up a worker
workflow.Sleep(ctx, 30*24*time.Hour) // Wait 30 days

if !approved {
    // Compensate
}

Try doing this with event choreography!

6. Testability

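// Test the entire workflow
// Imports assumed: "testing", "github.com/stretchr/testify/mock",
// "github.com/stretchr/testify/require", "go.temporal.io/sdk/testsuite"
func TestOrderWorkflow(t *testing.T) {
    testSuite := &testsuite.WorkflowTestSuite{}
    env := testSuite.NewTestWorkflowEnvironment()

    env.RegisterActivity(ProcessPayment)
    env.RegisterActivity(ReserveInventory)
    env.RegisterActivity(CreateShipment)
    env.RegisterActivity(SendConfirmationEmail)

    // Mock activity behavior: one matcher per activity argument, including ctx
    env.OnActivity(ProcessPayment, mock.Anything, mock.Anything, mock.Anything).Return("TXN-123", nil)
    env.OnActivity(ReserveInventory, mock.Anything, mock.Anything, mock.Anything).Return(nil)
    env.OnActivity(CreateShipment, mock.Anything, mock.Anything, mock.Anything).Return("TRACK-123", nil)
    env.OnActivity(SendConfirmationEmail, mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(nil)

    env.ExecuteWorkflow(OrderWorkflow, OrderRequest{
        CustomerID: "test",
        Amount:     100,
    })

    // A workflow that ended with an error also counts as completed, so check both
    require.True(t, env.IsWorkflowCompleted())
    require.NoError(t, env.GetWorkflowError())
}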

Real-World Use Cases

1. E-commerce Order Processing

Multiple steps across payments, inventory, and shipping, with Temporal driving each order to completion.

2. User Onboarding

Multi-day workflows with email verification, document uploads, approvals.

3. ETL Pipelines

Extract → Transform → Load with automatic retries and error handling.

4. Subscription Management

Trial periods, payment retries, and cancellations, all in one durable workflow.

5. Regulatory Compliance

Multi-step approval processes with complete audit trails.

6. IoT Device Provisioning

Device registration, firmware updates, and configuration, with retries that can span days.

Choreography vs Durable Execution

| Aspect         | Choreography (Old) | Durable Execution (Temporal) |
|----------------|--------------------|------------------------------|
| Coordination   | Hope-driven        | Guaranteed                   |
| Visibility     | Log diving         | Built-in UI                  |
| Retries        | Manual             | Automatic                    |
| Compensation   | Ad-hoc             | Explicit                     |
| State          | Scattered          | Centralized                  |
| Debugging      | Nightmare          | Simple                       |
| Testing        | Complex            | Built-in                     |
| Long-running   | Impossible         | Native                       |
| Crash Recovery | Manual             | Automatic                    |

When to Use Temporal

Use Temporal when:

  • ✅ Multi-step workflows with dependencies
  • ✅ Need guaranteed completion
  • ✅ Compensation/rollback required
  • ✅ Long-running processes (hours, days, months)
  • ✅ Human-in-the-loop workflows
  • ✅ Need complete visibility and audit trails
  • ✅ Complex error handling and retries

Stick with Events when:

  • ❌ Simple, fire-and-forget notifications
  • ❌ True decoupling is more important than guarantees
  • ❌ High-throughput, low-latency event streaming
  • ❌ Broadcasting to many consumers

Advanced Patterns

Child Workflows

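// Parent workflow spawns one child workflow per order
func BulkOrderWorkflow(ctx workflow.Context, orders []OrderRequest) error {
    parentID := workflow.GetInfo(ctx).WorkflowExecution.ID

    var futures []workflow.ChildWorkflowFuture
    for i, order := range orders {
        childOptions := workflow.ChildWorkflowOptions{
            // OrderRequest has no OrderID field, so build the ID from the parent ID and index
            WorkflowID: fmt.Sprintf("%s-order-%d", parentID, i),
        }
        childCtx := workflow.WithChildOptions(ctx, childOptions)

        // Each order runs as an independent workflow
        futures = append(futures, workflow.ExecuteChildWorkflow(childCtx, OrderWorkflow, order))
    }

    // Wait for every child: under the default parent close policy,
    // children still running when the parent returns are terminated
    for _, f := range futures {
        if err := f.Get(ctx, nil); err != nil {
            return err
        }
    }
    return nil
}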

Signals (External Events)

// Workflow can receive external signals
func OrderWorkflow(ctx workflow.Context, req OrderRequest) error {
    // Block until a "payment-confirmed" signal arrives
    var paymentConfirmed bool
    workflow.GetSignalChannel(ctx, "payment-confirmed").Receive(ctx, &paymentConfirmed)

    if !paymentConfirmed {
        return errors.New("payment not confirmed")
    }

    // Continue workflow
    return nil
}

// Send signal from outside (c is the client.Client returned by client.Dial;
// an empty run ID targets the latest run of the workflow)
err := c.SignalWorkflow(context.Background(), "order-12345", "", "payment-confirmed", true)

Queries (Read State)

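// Query workflow state without modifying it
func OrderWorkflow(ctx workflow.Context, req OrderRequest) error {
    var currentStatus string

    // Register query handler
    err := workflow.SetQueryHandler(ctx, "get-status", func() (string, error) {
        return currentStatus, nil
    })
    if err != nil {
        return err
    }

    currentStatus = "processing-payment"
    // ... rest of workflow
    return nil
}

// Query from outside (c is the client.Client returned by client.Dial).
// QueryWorkflow returns an encoded value; decode it with Get.
resp, err := c.QueryWorkflow(context.Background(), "order-12345", "", "get-status")
if err != nil {
    log.Fatal(err)
}
var status string
if err := resp.Get(&status); err != nil {
    log.Fatal(err)
}
fmt.Printf("Current status: %s\n", status)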

Getting Started

1. Install Temporal Server:

# Using Docker
curl -L https://temporal.io/docker-compose.yml | docker-compose -f - up

# Or using Temporal CLI
brew install temporal
temporal server start-dev

2. Install Temporal SDK:

go get go.temporal.io/sdk

3. Access Temporal UI:

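http://localhost:8233 (the default for temporal server start-dev; the Docker setup may expose the UI on a different port)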

See all workflows, execution history, current state, and more.

Conclusion

Choreography was a noble experiment, but it’s fundamentally flawed: services fire events into the void, hoping others receive them, with no visibility, no guarantees, and nightmarish debugging.

Temporal’s Durable Execution changes the game:

  • ✅ Workflows run to completion, guaranteed
  • ✅ Automatic retries and error handling
  • ✅ Built-in compensation and rollback
  • ✅ Complete visibility and audit trails
  • ✅ Survives crashes, restarts, and partitions
  • ✅ Simple testing and debugging
  • ✅ Human-in-the-loop support
  • ✅ Long-running workflows (days, months, years)

Stop hoping. Start guaranteeing.

The shift from choreography to durable execution isn’t just a technical upgrade; it’s a fundamental rethinking of how we build reliable distributed systems. Temporal makes the complex simple and the impossible possible.

Additional Resources