The Choreography Problem: Hope-Driven Development
For years, we’ve built distributed systems using choreography—Service A fires an event into the void, hoping Service B hears it. Service B processes it and fires another event, hoping Service C is listening. When something fails (and it will), we’re left scrambling through logs across multiple services, trying to piece together what happened.
This is hope-driven development, and it’s fundamentally broken.
Enter Temporal and the concept of Durable Execution—a paradigm shift that replaces hope with guarantees.
The Old Way: Choreography (Event-Driven)
What is Choreography?
In choreographed systems, services communicate through events. No central coordinator exists—each service reacts to events and publishes new ones. It’s decentralized, loosely coupled, and… a nightmare to debug.
Classic Example: E-commerce Order Flow
```text
Customer places order
   ↓ (fires OrderCreated event)
Payment Service (hopes it receives event)
   ↓ (fires PaymentProcessed event)
Inventory Service (hopes it receives event)
   ↓ (fires InventoryReserved event)
Shipping Service (hopes it receives event)
   ↓ (fires OrderShipped event)
Notification Service (hopes it receives event)
```
The Choreography Implementation (Go)
```go
// The old way - Choreography with events
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// Events
type OrderCreated struct {
	OrderID    string    `json:"order_id"`
	CustomerID string    `json:"customer_id"`
	Amount     float64   `json:"amount"`
	Items      []string  `json:"items"`
	CreatedAt  time.Time `json:"created_at"`
}

type PaymentProcessed struct {
	OrderID       string `json:"order_id"`
	TransactionID string `json:"transaction_id"`
	Success       bool   `json:"success"`
}

type InventoryReserved struct {
	OrderID string   `json:"order_id"`
	Items   []string `json:"items"`
	Success bool     `json:"success"`
}

// Order Service - Fires and forgets
type OrderService struct {
	kafkaWriter *kafka.Writer // Writer.Topic left unset, because each message names its topic
}

func (s *OrderService) CreateOrder(ctx context.Context, customerID string, amount float64, items []string) error {
	orderID := generateOrderID()
	event := OrderCreated{
		OrderID:    orderID,
		CustomerID: customerID,
		Amount:     amount,
		Items:      items,
		CreatedAt:  time.Now(),
	}
	data, _ := json.Marshal(event)

	// Fire the event and HOPE someone receives it
	err := s.kafkaWriter.WriteMessages(ctx, kafka.Message{
		Key:   []byte(orderID),
		Value: data,
		Topic: "order.created",
	})
	if err != nil {
		// What now? The order is created but the event failed to publish
		log.Printf("Failed to publish event: %v", err)
		// Do we roll back? Retry? Give up?
		return err
	}
	log.Printf("Order created: %s (hopefully someone is listening...)", orderID)
	return nil
}

// Payment Service - Listens and hopes
type PaymentService struct {
	kafkaReader *kafka.Reader // configured with a consumer GroupID
	kafkaWriter *kafka.Writer
}

func (s *PaymentService) StartListening(ctx context.Context) {
	for {
		// With a consumer group, ReadMessage commits the offset automatically,
		// before we have processed the message
		msg, err := s.kafkaReader.ReadMessage(ctx)
		if err != nil {
			if ctx.Err() != nil {
				return // shutting down
			}
			log.Printf("Error reading message: %v", err)
			continue // Transient or permanent? We just keep looping
		}

		var event OrderCreated
		if err := json.Unmarshal(msg.Value, &event); err != nil {
			log.Printf("Failed to unmarshal: %v", err)
			continue // Offset already committed: this message is gone
		}

		// Process payment
		success := s.processPayment(ctx, event.OrderID, event.Amount)

		// Fire another event and HOPE
		paymentEvent := PaymentProcessed{
			OrderID:       event.OrderID,
			TransactionID: generateTransactionID(),
			Success:       success,
		}
		data, _ := json.Marshal(paymentEvent)
		// The publish error is ignored: if this fails, we charged the customer but nobody knows.
		// If we crash before this line, was the payment processed or not?
		_ = s.kafkaWriter.WriteMessages(ctx, kafka.Message{
			Value: data,
			Topic: "payment.processed",
		})
	}
}

func (s *PaymentService) processPayment(ctx context.Context, orderID string, amount float64) bool {
	// Call payment gateway
	// What if this times out?
	// What if it succeeds but we crash before publishing the event?
	// What if the payment gateway charges the customer but returns an error?
	time.Sleep(2 * time.Second) // Simulate payment processing
	return true                 // Or is it? Who knows!
}

// Inventory Service - More hoping
type InventoryService struct {
	kafkaReader *kafka.Reader
	kafkaWriter *kafka.Writer
}

func (s *InventoryService) StartListening(ctx context.Context) {
	for {
		msg, err := s.kafkaReader.ReadMessage(ctx)
		if err != nil {
			if ctx.Err() != nil {
				return
			}
			continue
		}

		var event PaymentProcessed
		// Unmarshal error ignored: a corrupt payload looks exactly like a failed payment
		_ = json.Unmarshal(msg.Value, &event)
		if !event.Success {
			// Payment failed, but do we know if inventory was already reserved?
			// Do we need to unreserve? How do we know?
			continue
		}

		// Reserve inventory
		success := s.reserveInventory(ctx, event.OrderID)

		// Yet another event to fire and hope
		inventoryEvent := InventoryReserved{
			OrderID: event.OrderID,
			Success: success,
		}
		data, _ := json.Marshal(inventoryEvent)
		_ = s.kafkaWriter.WriteMessages(ctx, kafka.Message{
			Value: data,
			Topic: "inventory.reserved",
		})
	}
}

func (s *InventoryService) reserveInventory(ctx context.Context, orderID string) bool {
	// What if we crash after reserving but before publishing the event?
	// Inventory is locked forever!
	return true
}

// Hypothetical ID helpers so the example compiles; main() and the Kafka wiring are omitted
func generateOrderID() string       { return fmt.Sprintf("ORD-%d", time.Now().UnixNano()) }
func generateTransactionID() string { return fmt.Sprintf("TXN-%d", time.Now().UnixNano()) }
```
The Problems with Choreography
1. No Visibility
Where is order #12345 in the process? You have to:
- Check Kafka for events
- Search logs across 5+ services
- Hope timestamps align
- Pray nothing got lost
2. Error Handling Nightmare
- Payment succeeded but event publish failed—now what?
- Inventory reserved but service crashed before publishing—inventory locked forever
- Network partition—events arrive out of order
- Duplicate events—did we charge the customer twice?
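Of these, duplicate delivery has a well-known local mitigation that applies to any at-least-once consumer (retried Temporal activities included): give every message a stable key and have the handler remember which keys it has already processed. A minimal sketch, assuming the order ID is a suitable key; `IdempotentHandler` and `NewIdempotentHandler` are hypothetical names, and the in-memory map stands in for durable storage that a real service would update together with the side effect.

```go
package main

import "sync"

// IdempotentHandler wraps a message handler so that each key is processed
// successfully at most once. Hypothetical helper for illustration: the seen
// set lives in memory here, so a process restart would forget it.
type IdempotentHandler struct {
	mu     sync.Mutex
	seen   map[string]bool
	handle func(key string) error
}

// NewIdempotentHandler builds a handler around the real side effect
// (for example, charging the customer for an order).
func NewIdempotentHandler(handle func(key string) error) *IdempotentHandler {
	return &IdempotentHandler{seen: make(map[string]bool), handle: handle}
}

// Process runs the wrapped handler unless the key was already handled
// successfully. It reports whether the handler actually ran.
func (h *IdempotentHandler) Process(key string) (bool, error) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.seen[key] {
		return false, nil // duplicate delivery: skip the side effect
	}
	if err := h.handle(key); err != nil {
		return true, err // not marked as seen, so a redelivery can try again
	}
	h.seen[key] = true
	return true, nil
}
```

Holding the mutex across the handler call keeps the check and the side effect atomic at the cost of serializing all messages; a database-backed version could get the same effect from a unique constraint on the key.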
3. No Retries or Timeouts
- How long do we wait for the next service?
- What if a service is down?
- Who retries? When? How many times?
4. Debugging is Detective Work
```text
[2025-01-18 10:23:15] OrderService: Created order ABC123
[2025-01-18 10:23:16] PaymentService: Processing payment for ABC123
[2025-01-18 10:23:18] PaymentService: Payment successful
[2025-01-18 10:23:19] ???
[2025-01-18 10:24:00] Customer: Where's my order?
```
What happened between 10:23:19 and 10:24:00? Nobody knows.
5. Compensation is Manual
When something fails halfway through:
- Who orchestrates the rollback?
- How do we know what to roll back?
- What if rollback fails?
The New Way: Durable Execution with Temporal
What is Temporal?
Temporal provides durable execution—your code runs to completion, guaranteed, even if:
- Servers crash
- Networks partition
- Processes restart
- Days or months pass
It’s not event-driven. It’s not choreography. It’s orchestration with guarantees.
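The guarantee rests on replay: the Temporal server records each completed step in an event history, and after a crash a worker runs the workflow function again from the top, feeding it the recorded results instead of re-executing those steps. The toy model below illustrates only that idea in plain Go; `History`, `Executor`, `Run`, and `orderWorkflow` are hypothetical names, not SDK APIs, and the simulated crash happens between steps (an activity interrupted mid-call would instead be retried).

```go
package main

import "fmt"

// History is a toy stand-in for the event history the Temporal server keeps:
// the recorded results of steps that already completed, in order.
type History struct {
	results []string
}

// Executor runs one attempt of a workflow function against a History.
type Executor struct {
	history  *History
	next     int      // position of the next step in the history
	Executed []string // steps that really ran (were not replayed) in this attempt
}

// Run replays the recorded result for this step if there is one; otherwise it
// calls fn, records the result, and returns it.
func (e *Executor) Run(name string, fn func() (string, error)) (string, error) {
	if e.next < len(e.history.results) {
		r := e.history.results[e.next]
		e.next++
		return r, nil // replayed from history: fn is not called again
	}
	r, err := fn()
	if err != nil {
		return "", err
	}
	e.history.results = append(e.history.results, r)
	e.next++
	e.Executed = append(e.Executed, name)
	return r, nil
}

// orderWorkflow requests the same steps in the same order on every run.
// crashBefore simulates a worker dying before that step index (-1 means no crash).
func orderWorkflow(e *Executor, crashBefore int) (string, error) {
	steps := []string{"ProcessPayment", "ReserveInventory", "CreateShipment"}
	var last string
	for i, step := range steps {
		if i == crashBefore {
			return "", fmt.Errorf("worker crashed before %s", step)
		}
		name := step
		r, err := e.Run(name, func() (string, error) { return name + ":ok", nil })
		if err != nil {
			return "", err
		}
		last = r
	}
	return last, nil
}
```

In the second attempt the payment result is replayed and only the remaining steps execute. This is also why workflow code must be deterministic (hence `workflow.Now` rather than `time.Now` in `OrderWorkflow`): replay only lines up if the function requests the same steps in the same order every time.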
Key Concepts
1. Workflows: Durable functions that coordinate business logic
2. Activities: Individual units of work that can fail and retry
3. Workers: Execute workflows and activities
4. Temporal Server: Ensures durability and orchestration
The Temporal Implementation (Go)
```go
// The new way - Durable Execution with Temporal
package main

import (
	"context"
	"fmt"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// ===== Domain Models =====
type OrderRequest struct {
	CustomerID string
	Amount     float64
	Items      []string
}

type OrderResult struct {
	OrderID       string
	TransactionID string
	TrackingID    string
	Status        string
}

// ===== Workflow Definition =====
// This is the ORCHESTRATOR - it coordinates everything.
// Workflow code must be deterministic, hence workflow.Now instead of time.Now.
func OrderWorkflow(ctx workflow.Context, req OrderRequest) (*OrderResult, error) {
	logger := workflow.GetLogger(ctx)
	orderID := workflow.Now(ctx).Format("ORD-20060102-150405")

	// Configuration: every activity below gets this timeout and retry policy
	activityOptions := workflow.ActivityOptions{
		StartToCloseTimeout: 30 * time.Second,
		RetryPolicy: &temporal.RetryPolicy{
			InitialInterval:    time.Second,
			BackoffCoefficient: 2.0,
			MaximumInterval:    time.Minute,
			MaximumAttempts:    3,
		},
	}
	ctx = workflow.WithActivityOptions(ctx, activityOptions)
	logger.Info("Starting order workflow", "orderID", orderID)

	var result OrderResult
	result.OrderID = orderID

	// Step 1: Process Payment
	var transactionID string
	err := workflow.ExecuteActivity(ctx, ProcessPayment, orderID, req.Amount).Get(ctx, &transactionID)
	if err != nil {
		logger.Error("Payment failed", "error", err)
		result.Status = "payment_failed"
		return &result, err
	}
	result.TransactionID = transactionID
	logger.Info("Payment successful", "transactionID", transactionID)

	// Step 2: Reserve Inventory
	err = workflow.ExecuteActivity(ctx, ReserveInventory, orderID, req.Items).Get(ctx, nil)
	if err != nil {
		logger.Error("Inventory reservation failed", "error", err)
		// COMPENSATION: refund the payment
		var refundID string
		if cerr := workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, &refundID); cerr != nil {
			logger.Error("Refund failed", "error", cerr)
		}
		result.Status = "inventory_failed_refunded"
		return &result, err
	}
	logger.Info("Inventory reserved")

	// Step 3: Create Shipment
	var trackingID string
	err = workflow.ExecuteActivity(ctx, CreateShipment, orderID, req.Items).Get(ctx, &trackingID)
	if err != nil {
		logger.Error("Shipment creation failed", "error", err)
		// COMPENSATION: release inventory, then refund (reverse order of the steps)
		if cerr := workflow.ExecuteActivity(ctx, ReleaseInventory, orderID).Get(ctx, nil); cerr != nil {
			logger.Error("Inventory release failed", "error", cerr)
		}
		var refundID string
		if cerr := workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, &refundID); cerr != nil {
			logger.Error("Refund failed", "error", cerr)
		}
		result.Status = "shipping_failed_compensated"
		return &result, err
	}
	result.TrackingID = trackingID
	logger.Info("Shipment created", "trackingID", trackingID)

	// Step 4: Send Confirmation
	err = workflow.ExecuteActivity(ctx, SendConfirmationEmail, req.CustomerID, orderID, trackingID).Get(ctx, nil)
	if err != nil {
		// Email failure is not critical: the retry policy still applies,
		// but a final failure does not roll back the order
		logger.Warn("Failed to send confirmation email", "error", err)
	}

	result.Status = "completed"
	logger.Info("Order workflow completed successfully")
	return &result, nil
}

// ===== Activity Implementations =====
// These are the ACTUAL WORK - each can fail and retry independently
func ProcessPayment(ctx context.Context, orderID string, amount float64) (string, error) {
	// Call the actual payment gateway here.
	// If this returns an error, Temporal retries it according to the retry policy.
	// If the worker crashes mid-execution, Temporal retries it on another worker
	// once the StartToCloseTimeout expires.
	fmt.Printf("Processing payment for order %s: $%.2f\n", orderID, amount)
	time.Sleep(2 * time.Second) // Simulate payment processing
	transactionID := fmt.Sprintf("TXN-%d", time.Now().Unix())
	fmt.Printf("Payment successful: %s\n", transactionID)
	return transactionID, nil
}

func RefundPayment(ctx context.Context, transactionID string) (string, error) {
	fmt.Printf("Refunding payment: %s\n", transactionID)
	time.Sleep(1 * time.Second)
	refundID := fmt.Sprintf("REF-%d", time.Now().Unix())
	fmt.Printf("Refund successful: %s\n", refundID)
	return refundID, nil
}

func ReserveInventory(ctx context.Context, orderID string, items []string) error {
	fmt.Printf("Reserving inventory for order %s: %v\n", orderID, items)
	time.Sleep(1 * time.Second)
	// Simulate inventory check. If this returns an error (after retries),
	// the workflow's compensation refunds the payment.
	fmt.Printf("Inventory reserved for order %s\n", orderID)
	return nil
}

func ReleaseInventory(ctx context.Context, orderID string) error {
	fmt.Printf("Releasing inventory for order %s\n", orderID)
	time.Sleep(500 * time.Millisecond)
	return nil
}

func CreateShipment(ctx context.Context, orderID string, items []string) (string, error) {
	fmt.Printf("Creating shipment for order %s\n", orderID)
	time.Sleep(1500 * time.Millisecond)
	trackingID := fmt.Sprintf("TRACK-%d", time.Now().Unix())
	fmt.Printf("Shipment created: %s\n", trackingID)
	return trackingID, nil
}

func SendConfirmationEmail(ctx context.Context, customerID, orderID, trackingID string) error {
	fmt.Printf("Sending confirmation email to customer %s\n", customerID)
	time.Sleep(500 * time.Millisecond)
	return nil
}

// ===== Worker Setup =====
func main() {
	// Create Temporal client
	c, err := client.Dial(client.Options{
		HostPort: "localhost:7233",
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// Create worker polling the "order-processing" task queue
	w := worker.New(c, "order-processing", worker.Options{})

	// Register workflow and activities
	w.RegisterWorkflow(OrderWorkflow)
	w.RegisterActivity(ProcessPayment)
	w.RegisterActivity(RefundPayment)
	w.RegisterActivity(ReserveInventory)
	w.RegisterActivity(ReleaseInventory)
	w.RegisterActivity(CreateShipment)
	w.RegisterActivity(SendConfirmationEmail)

	// Start worker; blocks until interrupted (Ctrl+C)
	err = w.Run(worker.InterruptCh())
	if err != nil {
		panic(err)
	}
}
```
Starting a Workflow
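```go
// client/main.go
package main

import (
	"context"
	"fmt"
	"log"

	"go.temporal.io/sdk/client"
)

// Mirrors of the worker's types. Temporal's default data converter encodes
// them as JSON, so the field names must match the worker's definitions.
type OrderRequest struct {
	CustomerID string
	Amount     float64
	Items      []string
}

type OrderResult struct {
	OrderID       string
	TransactionID string
	TrackingID    string
	Status        string
}

func main() {
	c, err := client.Dial(client.Options{
		HostPort: "localhost:7233",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Start workflow
	workflowOptions := client.StartWorkflowOptions{
		ID:        "order-12345",
		TaskQueue: "order-processing", // must match the worker's task queue
	}
	req := OrderRequest{
		CustomerID: "customer-456",
		Amount:     129.99,
		Items:      []string{"laptop", "mouse", "keyboard"},
	}
	// This separate program does not import the worker's OrderWorkflow function,
	// so the workflow is referenced by its registered name
	we, err := c.ExecuteWorkflow(context.Background(), workflowOptions, "OrderWorkflow", req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Started workflow ID: %s, RunID: %s\n", we.GetID(), we.GetRunID())

	// Wait for result
	var result OrderResult
	err = we.Get(context.Background(), &result)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Order completed: %+v\n", result)
}
```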
Why Temporal Changes Everything
1. Complete Visibility
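```bash
# Print the workflow's full event history
temporal workflow show -w order-12345
# Summarize its current state, including pending activities and their attempt counts
temporal workflow describe -w order-12345
# Together they tell you, for example:
# - Current step: "CreateShipment"
# - Payment: Completed (TXN-12345)
# - Inventory: Reserved
# - Shipment: In Progress (Attempt 2/3)
```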
No more log diving. No more guessing.
2. Automatic Retries
```go
RetryPolicy: &temporal.RetryPolicy{ // temporal = go.temporal.io/sdk/temporal
	InitialInterval:    time.Second,
	BackoffCoefficient: 2.0,
	MaximumInterval:    time.Minute,
	MaximumAttempts:    3, // total attempts, including the first one
},
```
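Activities retry automatically. You declare the policy once in the activity options, and Temporal schedules each retry with exponential backoff until the activity succeeds or the attempts run out.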
3. Built-in Compensation
```go
// If inventory reservation fails, refund the payment
if err != nil {
	workflow.ExecuteActivity(ctx, RefundPayment, transactionID).Get(ctx, nil)
	return err
}
```
Compensation is ordinary workflow code: explicit, testable, and durable, so a rollback interrupted by a crash resumes instead of being lost.
4. Survives Crashes
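- Worker crashes during payment? The activity is retried on another worker once its timeout expires, and the workflow resumes from its recorded history
- Process restarts? The workflow is replayed and resumes automatically
- Network partition? The workflow waits and continues when connectivity returns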
5. Human-in-the-Loop Workflows
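```go
// Ask for approval, then wait hours, days, or weeks for the answer to arrive as a signal
err := workflow.ExecuteActivity(ctx, RequestApproval, orderID).Get(ctx, nil)
var approved bool
timerCtx, cancelTimer := workflow.WithCancel(ctx)
selector := workflow.NewSelector(ctx)
selector.AddReceive(workflow.GetSignalChannel(ctx, "approval"), func(c workflow.ReceiveChannel, more bool) {
	c.Receive(ctx, &approved)
	cancelTimer() // answer arrived: stop the deadline timer
})
// A durable timer holds no worker resources; no answer within 30 days counts as a rejection
selector.AddFuture(workflow.NewTimer(timerCtx, 30*24*time.Hour), func(workflow.Future) {})
selector.Select(ctx)
if err != nil || !approved {
	// Compensate
}
```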
Try doing this with event choreography!
6. Testability
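```go
import (
	"testing"

	"github.com/stretchr/testify/mock"
	"github.com/stretchr/testify/require"
	"go.temporal.io/sdk/testsuite"
)

// Test the entire workflow, with every activity it calls mocked
func TestOrderWorkflow(t *testing.T) {
	testSuite := &testsuite.WorkflowTestSuite{}
	env := testSuite.NewTestWorkflowEnvironment()
	// Mock activity behavior: one matcher per activity parameter, including context.Context
	env.OnActivity(ProcessPayment, mock.Anything, mock.Anything, mock.Anything).Return("TXN-123", nil)
	env.OnActivity(ReserveInventory, mock.Anything, mock.Anything, mock.Anything).Return(nil)
	env.OnActivity(CreateShipment, mock.Anything, mock.Anything, mock.Anything).Return("TRACK-123", nil)
	env.OnActivity(SendConfirmationEmail, mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(nil)
	env.ExecuteWorkflow(OrderWorkflow, OrderRequest{
		CustomerID: "test",
		Amount:     100,
		Items:      []string{"laptop"},
	})
	require.True(t, env.IsWorkflowCompleted())
	require.NoError(t, env.GetWorkflowError())
	var result OrderResult
	require.NoError(t, env.GetWorkflowResult(&result))
	require.Equal(t, "completed", result.Status)
}
```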
Real-World Use Cases
1. E-commerce Order Processing
Multiple steps with payments, inventory, shipping—Temporal ensures completion.
2. User Onboarding
Multi-day workflows with email verification, document uploads, approvals.
3. ETL Pipelines
Extract → Transform → Load with automatic retries and error handling.
4. Subscription Management
Trial periods, payment retries, cancellations—all in one durable workflow.
5. Regulatory Compliance
Multi-step approval processes with complete audit trails.
6. IoT Device Provisioning
Device registration, firmware updates, configuration—with retries over days.
Choreography vs Durable Execution
| Aspect | Choreography (Old) | Durable Execution (Temporal) |
|---|---|---|
| Coordination | Hope-driven | Guaranteed |
| Visibility | Log diving | Built-in UI |
| Retries | Manual | Automatic |
| Compensation | Ad-hoc | Explicit |
| State | Scattered | Centralized |
| Debugging | Nightmare | Simple |
| Testing | Complex | Built-in |
| Long-running | Hand-rolled timers and state | Native |
| Crash Recovery | Manual | Automatic |
When to Use Temporal
Use Temporal when:
- ✅ Multi-step workflows with dependencies
- ✅ Need guaranteed completion
- ✅ Compensation/rollback required
- ✅ Long-running processes (hours, days, months)
- ✅ Human-in-the-loop workflows
- ✅ Need complete visibility and audit trails
- ✅ Complex error handling and retries
Stick with Events when:
- ❌ Simple, fire-and-forget notifications
- ❌ True decoupling is more important than guarantees
- ❌ High-throughput, low-latency event streaming
- ❌ Broadcasting to many consumers
Advanced Patterns
Child Workflows
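```go
// Parent workflow spawns one child workflow per order
func BulkOrderWorkflow(ctx workflow.Context, orders []OrderRequest) error {
	parentID := workflow.GetInfo(ctx).WorkflowExecution.ID
	var futures []workflow.ChildWorkflowFuture
	for i, order := range orders {
		// OrderRequest has no OrderID field, so derive a unique child ID from the parent ID and index
		childCtx := workflow.WithChildOptions(ctx, workflow.ChildWorkflowOptions{
			WorkflowID: fmt.Sprintf("%s-order-%d", parentID, i),
		})
		// Each order runs as an independent workflow with its own history
		futures = append(futures, workflow.ExecuteChildWorkflow(childCtx, OrderWorkflow, order))
	}
	// Wait for every child: by default, children still running when the parent closes are terminated
	for _, f := range futures {
		var result OrderResult
		if err := f.Get(ctx, &result); err != nil {
			return err
		}
	}
	return nil
}
```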
Signals (External Events)
```go
// Workflow can receive external signals
func OrderWorkflow(ctx workflow.Context, req OrderRequest) error {
	// Block until the "payment-confirmed" signal arrives
	var paymentConfirmed bool
	workflow.GetSignalChannel(ctx, "payment-confirmed").Receive(ctx, &paymentConfirmed)
	if !paymentConfirmed {
		return errors.New("payment not confirmed")
	}
	// Continue workflow
	return nil
}

// Send signal from outside (c is the client.Client returned by client.Dial)
err := c.SignalWorkflow(context.Background(), "order-12345", "", "payment-confirmed", true)
```
Queries (Read State)
```go
// Query workflow state without modifying it
func OrderWorkflow(ctx workflow.Context, req OrderRequest) error {
	currentStatus := "started"
	// Register query handler
	if err := workflow.SetQueryHandler(ctx, "get-status", func() (string, error) {
		return currentStatus, nil
	}); err != nil {
		return err
	}
	currentStatus = "processing-payment"
	// ... rest of workflow
	return nil
}

// Query from outside (c is the client.Client returned by client.Dial)
resp, err := c.QueryWorkflow(context.Background(), "order-12345", "", "get-status")
if err != nil {
	log.Fatal(err)
}
var status string
if err := resp.Get(&status); err != nil {
	log.Fatal(err)
}
fmt.Printf("Current status: %s\n", status)
```
Getting Started
1. Install Temporal Server:
```bash
# Using Docker (official docker-compose setup)
git clone https://github.com/temporalio/docker-compose.git
cd docker-compose
docker compose up

# Or using the Temporal CLI (local development server)
brew install temporal
temporal server start-dev
```
2. Install Temporal SDK:
```bash
go get go.temporal.io/sdk
```
3. Access Temporal UI:
http://localhost:8233 (for `temporal server start-dev`; the docker-compose setup serves the UI at http://localhost:8080)
See all workflows, execution history, current state, and more.
Conclusion
Choreography was a noble experiment, but it’s fundamentally flawed. Services firing events into the void, hoping others receive them, with no visibility, no guarantees, and nightmare debugging.
Temporal’s Durable Execution changes the game:
- ✅ Workflows run to completion, guaranteed
- ✅ Automatic retries and error handling
- ✅ Built-in compensation and rollback
- ✅ Complete visibility and audit trails
- ✅ Survives crashes, restarts, and partitions
- ✅ Simple testing and debugging
- ✅ Human-in-the-loop support
- ✅ Long-running workflows (days, months, years)
Stop hoping. Start guaranteeing.
The shift from choreography to durable execution isn’t just a technical upgrade—it’s a fundamental rethinking of how we build reliable distributed systems. Temporal makes the complex simple and the impossible possible.