Saga Pattern in Microservices: Orchestration vs Choreography Complete Guide

The saga pattern is a widely used approach to managing distributed transactions across multiple microservices. When a business operation spans several services, each with its own database, a single ACID transaction cannot cover all of them. A saga instead breaks the operation into a sequence of local transactions: each step either succeeds and triggers the next step, or fails and triggers compensating transactions that undo the steps already completed. This guide covers both the orchestration and choreography approaches, with implementation examples, failure-handling strategies, and guidance on when to use each.

The fundamental challenge in microservices is maintaining data consistency across service boundaries. In a monolith, a single database transaction can atomically update orders, inventory, and payments. In a microservices architecture, each service owns its data, and a distributed transaction coordinator is rarely a practical option at scale: two-phase commit (2PC) blocks participants while they wait for the coordinator's decision, makes the coordinator a single point of failure, and holds locks across network round trips, which hurts throughput under high load. The saga pattern emerged as the practical alternative, trading strong consistency for eventual consistency with explicit compensation logic.
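To make the mechanism concrete, here is a minimal in-process sketch of the idea described above: steps run in order, and a failure triggers the compensations of the completed steps in reverse. The names `SagaStep`, `SimpleSaga`, and `SimpleSagaDemo` are hypothetical and not part of any framework. A real saga would also persist its progress and call remote services.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** One saga step: a local transaction plus the action that undoes it. */
record SagaStep(String name, Runnable action, Runnable compensation) {}

final class SimpleSaga {
    /**
     * Runs the steps in order. If a step throws, the steps that already
     * completed are compensated in reverse order.
     *
     * @return true if every step succeeded, false if the saga was rolled back
     */
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step); // most recently completed step is on top
            } catch (RuntimeException ex) {
                // The failed step itself is not compensated, only earlier ones.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}

final class SimpleSagaDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = SimpleSaga.run(List.of(
            new SagaStep("order",
                () -> log.add("create order"), () -> log.add("cancel order")),
            new SagaStep("inventory",
                () -> log.add("reserve stock"), () -> log.add("release stock")),
            new SagaStep("payment",
                () -> { throw new IllegalStateException("card declined"); },
                () -> log.add("refund"))));
        // Prints: false [create order, reserve stock, release stock, cancel order]
        System.out.println(ok + " " + log);
    }
}
```

This sketch assumes compensations themselves do not fail. Handling that case is the hard part of a real implementation.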

Choreography-Based Sagas: Event-Driven Approach

In choreography, services communicate through events without a central coordinator. Each service listens for events, performs its local transaction, and publishes new events to trigger the next step. If a step fails, the service publishes a failure event, and the services that have already completed their steps react to it by running their compensating transactions. This approach is simpler for straightforward workflows of three or four steps but becomes difficult to follow as saga complexity increases, because the workflow is defined only implicitly by which service subscribes to which event.

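// Order Service: initiates the saga
@Service
@RequiredArgsConstructor // Lombok (assumed, as with Order.builder()): constructor injection for the final fields
public class OrderService {
    private final KafkaTemplate<String, Object> kafka;
    private final OrderRepository repository;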

    @Transactional
    public Order createOrder(CreateOrderRequest request) {
        // Step 1: Create order in PENDING state
        var order = Order.builder()
            .customerId(request.customerId())
            .items(request.items())
            .totalAmount(calculateTotal(request.items()))
            .status(OrderStatus.PENDING)
            .build();

        order = repository.save(order);

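        // Publish event to trigger next saga step.
        // Caveat: this is a dual write. The send is not part of the database
        // transaction, so the event can go out even if the commit later fails,
        // or be lost if the broker is unavailable. A transactional outbox is
        // the usual remedy.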
        kafka.send("order-events", new OrderCreatedEvent(
            order.getId(),
            order.getCustomerId(),
            order.getTotalAmount(),
            order.getItems().stream()
                .map(i -> new ItemReservation(i.getProductId(), i.getQuantity()))
                .toList()
        ));

        return order;
    }

    // Handle the payment outcome: saga step 3 succeeded or failed
    @KafkaListener(topics = "payment-events",
        groupId = "order-service")
    public void onPaymentEvent(PaymentEvent event) {
        switch (event) {
            case PaymentCompletedEvent e -> {
                repository.updateStatus(e.orderId(), OrderStatus.CONFIRMED);
                kafka.send("order-events",
                    new OrderConfirmedEvent(e.orderId()));
            }
            case PaymentFailedEvent e -> {
                // Compensating transaction: cancel order
                repository.updateStatus(e.orderId(), OrderStatus.CANCELLED);
                // Trigger inventory release
                kafka.send("order-events",
                    new OrderCancelledEvent(e.orderId(), e.reason()));
            }
        }
    }

    // Handle inventory failure — saga step 2 failed
    @KafkaListener(topics = "inventory-events",
        groupId = "order-service")
    public void onInventoryEvent(InventoryEvent event) {
        if (event instanceof InventoryReservationFailedEvent e) {
            repository.updateStatus(e.orderId(), OrderStatus.CANCELLED);
            // No further compensation needed — payment not attempted
        }
    }
}

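// Inventory Service: saga step 2
@Service
@RequiredArgsConstructor // Lombok (assumed): constructor injection for the final fields
public class InventoryService {
    private final InventoryRepository inventoryRepository;
    private final KafkaTemplate<String, Object> kafka;
    private final TransactionTemplate tx;

    @KafkaListener(topics = "order-events",
        groupId = "inventory-service")
    public void onOrderEvent(OrderEvent event) {
        switch (event) {
            case OrderCreatedEvent e -> reserveInventory(e);
            case OrderCancelledEvent e -> releaseInventory(e); // compensation, not shown
            default -> {} // Ignore other order events
        }
    }

    private void reserveInventory(OrderCreatedEvent event) {
        try {
            // TransactionTemplate rather than @Transactional: Spring's proxy-based
            // @Transactional has no effect on private or self-invoked methods.
            tx.executeWithoutResult(status -> {
                for (var item : event.items()) {
                    inventoryRepository.reserve(
                        item.productId(), item.quantity());
                }
            });
            kafka.send("inventory-events",
                new InventoryReservedEvent(event.orderId()));
        } catch (InsufficientStockException ex) {
            // The exception (assumed unchecked) escaped the transaction, so any
            // partial reservations for this order were rolled back.
            kafka.send("inventory-events",
                new InventoryReservationFailedEvent(
                    event.orderId(), ex.getMessage()));
        }
    }
}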
Figure: Choreography sagas use event-driven communication without a central coordinator
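The event flow can be easier to see without the Kafka plumbing. The following is a simplified in-memory sketch, with a synchronous `EventBus` standing in for the broker and all three services collapsed into one class for brevity. Every name here (`EventBus`, `ChoreographySketch`, the event records, the `cardLimit` rule) is an illustrative assumption, not part of the services shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Minimal synchronous event bus standing in for a message broker. */
final class EventBus {
    private final List<Consumer<Object>> subscribers = new ArrayList<>();

    void subscribe(Consumer<Object> subscriber) { subscribers.add(subscriber); }

    void publish(Object event) {
        for (Consumer<Object> s : List.copyOf(subscribers)) s.accept(event);
    }
}

record OrderCreated(String orderId, int quantity, long amount) {}
record InventoryReserved(String orderId, long amount) {}
record PaymentCompleted(String orderId) {}
record PaymentFailed(String orderId) {}
record OrderCancelled(String orderId) {}

final class ChoreographySketch {
    final Map<String, String> orderStatus = new HashMap<>();           // order service state
    private final Map<String, Integer> reservations = new HashMap<>(); // inventory service state
    private final EventBus bus;
    int stock;

    ChoreographySketch(EventBus bus, int initialStock, long cardLimit) {
        this.bus = bus;
        this.stock = initialStock;

        // Inventory service: reserve on OrderCreated, release on OrderCancelled.
        bus.subscribe(e -> {
            if (e instanceof OrderCreated created) {
                stock -= created.quantity();
                reservations.put(created.orderId(), created.quantity());
                bus.publish(new InventoryReserved(created.orderId(), created.amount()));
            } else if (e instanceof OrderCancelled cancelled) {
                Integer qty = reservations.remove(cancelled.orderId());
                if (qty != null) stock += qty; // compensation; a repeated event is a no-op
            }
        });

        // Payment service: charge once stock is reserved, or report the failure.
        bus.subscribe(e -> {
            if (e instanceof InventoryReserved reserved) {
                if (reserved.amount() > cardLimit) {
                    bus.publish(new PaymentFailed(reserved.orderId()));
                } else {
                    bus.publish(new PaymentCompleted(reserved.orderId()));
                }
            }
        });

        // Order service: confirm on success, cancel (and announce it) on failure.
        bus.subscribe(e -> {
            if (e instanceof PaymentCompleted completed) {
                orderStatus.put(completed.orderId(), "CONFIRMED");
            } else if (e instanceof PaymentFailed failed) {
                orderStatus.put(failed.orderId(), "CANCELLED");
                bus.publish(new OrderCancelled(failed.orderId()));
            }
        });
    }

    void placeOrder(String orderId, int quantity, long amount) {
        orderStatus.put(orderId, "PENDING");
        bus.publish(new OrderCreated(orderId, quantity, amount));
    }
}
```

No component in this sketch knows the whole workflow. The order of steps emerges from the subscriptions, which is both the appeal of choreography and the reason it becomes hard to trace as steps are added.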

Orchestration-Based Sagas: Centralized Approach

In orchestration, a central saga orchestrator manages the workflow, telling each service what to do and handling the compensation logic when failures occur. The orchestrator maintains the saga state machine, making it the single source of truth for saga progress. Additionally, orchestration makes complex workflows with conditional logic, parallel steps, and sophisticated error handling much easier to implement and debug.

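// Saga Orchestrator: manages the entire workflow
@Service
@Slf4j                   // Lombok (assumed): provides the `log` field used in compensate()
@RequiredArgsConstructor // Lombok (assumed): constructor injection for the final fields
public class OrderSagaOrchestrator {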
    private final SagaStateRepository sagaRepo;
    private final OrderServiceClient orderClient;
    private final InventoryServiceClient inventoryClient;
    private final PaymentServiceClient paymentClient;
    private final ShippingServiceClient shippingClient;

    // Deliberately not @Transactional: a database transaction should not stay
    // open across remote calls, and progress must be committed after every step
    // so that a restarted orchestrator can see how far the saga got.
    public SagaResult executeSaga(CreateOrderRequest request) {
        var saga = SagaState.create(request);
        sagaRepo.save(saga);

        try {
            // Step 1: Create order
            saga.setStep("CREATE_ORDER");
            var order = orderClient.createOrder(request);
            saga.setOrderId(order.id());
            saga.markStepCompleted("CREATE_ORDER");
            sagaRepo.save(saga);

            // Step 2: Reserve inventory
            saga.setStep("RESERVE_INVENTORY");
            inventoryClient.reserve(order.id(), request.items());
            saga.markStepCompleted("RESERVE_INVENTORY");
            sagaRepo.save(saga);

            // Step 3: Process payment
            saga.setStep("PROCESS_PAYMENT");
            var payment = paymentClient.charge(
                order.id(), request.paymentMethod(), order.total());
            saga.setPaymentId(payment.id());
            saga.markStepCompleted("PROCESS_PAYMENT");
            sagaRepo.save(saga);

            // Step 4: Schedule shipping
            saga.setStep("SCHEDULE_SHIPPING");
            shippingClient.schedule(order.id(), request.shippingAddress());
            saga.markStepCompleted("SCHEDULE_SHIPPING");

            saga.setStatus(SagaStatus.COMPLETED);
            sagaRepo.save(saga);
            return SagaResult.success(order.id());

        } catch (SagaStepException ex) {
            // Assumes the service clients wrap their failures in SagaStepException
            return compensate(saga, ex);
        }
    }

    private SagaResult compensate(SagaState saga, SagaStepException ex) {
        saga.setStatus(SagaStatus.COMPENSATING);
        sagaRepo.save(saga); // persist first so a crash mid-compensation is recoverable
        var failedStep = saga.getCurrentStep();
        var completedSteps = saga.getCompletedSteps();

        // Compensate in reverse order
        if (completedSteps.contains("SCHEDULE_SHIPPING")) {
            try {
                shippingClient.cancel(saga.getOrderId());
            } catch (Exception e) {
                log.error("Shipping compensation failed", e);
                saga.addCompensationFailure("SCHEDULE_SHIPPING", e);
            }
        }

        if (completedSteps.contains("PROCESS_PAYMENT")) {
            try {
                paymentClient.refund(saga.getPaymentId());
            } catch (Exception e) {
                log.error("Payment compensation failed", e);
                saga.addCompensationFailure("PROCESS_PAYMENT", e);
            }
        }

        if (completedSteps.contains("RESERVE_INVENTORY")) {
            try {
                inventoryClient.release(saga.getOrderId());
            } catch (Exception e) {
                log.error("Inventory compensation failed", e);
                saga.addCompensationFailure("RESERVE_INVENTORY", e);
            }
        }

        if (completedSteps.contains("CREATE_ORDER")) {
            try {
                orderClient.cancel(saga.getOrderId());
            } catch (Exception e) {
                log.error("Order compensation failed", e);
                saga.addCompensationFailure("CREATE_ORDER", e);
            }
        }

        saga.setStatus(saga.hasCompensationFailures()
            ? SagaStatus.COMPENSATION_FAILED
            : SagaStatus.COMPENSATED);
        sagaRepo.save(saga);

        return SagaResult.failure(failedStep, ex.getMessage());
    }
}

Orchestration vs Choreography: Decision Framework

Choosing between orchestration and choreography depends on saga complexity, team structure, and operational requirements. Choreography works well for simple, linear workflows with 3-4 steps where all services are maintained by the same team. However, orchestration is superior for complex workflows with conditional logic, parallel steps, timeouts, and sophisticated error handling. Moreover, orchestration provides a single place to understand the entire workflow, making debugging and monitoring significantly easier.

Decision Framework: Orchestration vs Choreography

Choose CHOREOGRAPHY when:
+ Simple linear workflow (3-4 steps)
+ All services maintained by same team
+ Low coupling between services is critical
+ Event-driven architecture already in place
+ Simple compensation logic (just undo)

Choose ORCHESTRATION when:
+ Complex workflow (5+ steps)
+ Conditional branching or parallel steps needed
+ Multiple teams own different services
+ Sophisticated error handling and retries needed
+ Business wants workflow visibility and monitoring
+ Compensation logic is complex (partial rollbacks)
+ SLA requirements need timeout management

Hybrid approach:
+ Use orchestration for the main saga flow
+ Use choreography for cross-cutting concerns
  (notifications, audit logging, analytics)

Handling Compensation Failures

The hardest part of implementing sagas is handling failures during compensation. If a compensating transaction fails, you have a partially compensated saga — an inconsistent state that requires manual intervention or automated recovery. Therefore, design compensating transactions to be idempotent and retriable. Additionally, implement a saga recovery process that periodically scans for stuck sagas and retries failed compensations.
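As a rough sketch of such a recovery process, the job below retries pending compensations for sagas that have been idle longer than a threshold and records which steps are already undone so that no step is compensated twice. `CompensatingSaga`, `SagaRecoveryJob`, and the `BiPredicate` compensator are hypothetical names. A real job would load sagas from the saga store and be triggered by a scheduler.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

/** Hypothetical persisted view of a saga that is being rolled back. */
final class CompensatingSaga {
    final String id;
    final List<String> stepsToUndo;             // already in reverse order
    final Set<String> undone = new HashSet<>(); // compensations that succeeded
    Instant lastAttempt;

    CompensatingSaga(String id, List<String> stepsToUndo, Instant lastAttempt) {
        this.id = id;
        this.stepsToUndo = List.copyOf(stepsToUndo);
        this.lastAttempt = lastAttempt;
    }

    boolean isCompensated() { return undone.containsAll(stepsToUndo); }
}

final class SagaRecoveryJob {
    private final Duration stuckAfter;
    // Returns false (or throws) when the compensation for (sagaId, step) fails.
    private final BiPredicate<String, String> compensator;

    SagaRecoveryJob(Duration stuckAfter, BiPredicate<String, String> compensator) {
        this.stuckAfter = stuckAfter;
        this.compensator = compensator;
    }

    /** Retries stuck sagas; returns how many became fully compensated in this pass. */
    int recover(List<CompensatingSaga> sagas, Instant now) {
        int recovered = 0;
        for (CompensatingSaga saga : sagas) {
            if (saga.isCompensated()) continue;
            if (Duration.between(saga.lastAttempt, now).compareTo(stuckAfter) < 0) continue;
            saga.lastAttempt = now;
            for (String step : saga.stepsToUndo) {
                if (saga.undone.contains(step)) continue; // already undone: never repeat
                if (!tryCompensate(saga, step)) break;    // keep reverse order; retry next pass
            }
            if (saga.isCompensated()) recovered++;
        }
        return recovered;
    }

    private boolean tryCompensate(CompensatingSaga saga, String step) {
        try {
            if (!compensator.test(saga.id, step)) return false;
        } catch (RuntimeException ex) {
            return false; // treated as transient; retried on the next pass
        }
        saga.undone.add(step);
        return true;
    }
}
```

Stopping at the first failed compensation preserves the reverse ordering. Whether that is required depends on the business rules, and a saga that keeps failing should eventually be escalated for manual intervention rather than retried indefinitely.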

Figure: Complex saga workflows benefit from orchestration’s centralized control and monitoring

Saga State Machine Design

A well-designed saga state machine is the foundation of reliable orchestration. Each saga has a defined set of states and transitions, making it easy to reason about the saga’s progress and handle edge cases. Furthermore, persisting the saga state to a database provides durability — if the orchestrator crashes mid-saga, it can recover and resume from the last persisted state.

// Saga state machine with explicit states.
// Named OrderSagaState to avoid clashing with the SagaState entity
// used by the orchestrator.
public enum OrderSagaState {
    STARTED,
    ORDER_CREATED,
    INVENTORY_RESERVED,
    PAYMENT_PROCESSED,
    SHIPPING_SCHEDULED,
    COMPLETED,

    // Compensation states
    COMPENSATING_SHIPPING,
    COMPENSATING_PAYMENT,
    COMPENSATING_INVENTORY,
    COMPENSATING_ORDER,
    COMPENSATED,
    COMPENSATION_FAILED;

    // Forward transition after the current step succeeds
    public OrderSagaState nextStep() {
        return switch (this) {
            case STARTED -> ORDER_CREATED;
            case ORDER_CREATED -> INVENTORY_RESERVED;
            case INVENTORY_RESERVED -> PAYMENT_PROCESSED;
            case PAYMENT_PROCESSED -> SHIPPING_SCHEDULED;
            case SHIPPING_SCHEDULED -> COMPLETED;
            default -> throw new IllegalStateException(
                "No next step from " + this);
        };
    }

    // Backward transition: where to start or continue compensating
    public OrderSagaState compensationStep() {
        return switch (this) {
            case STARTED -> COMPENSATED; // nothing to undo yet
            case SHIPPING_SCHEDULED -> COMPENSATING_SHIPPING;
            case PAYMENT_PROCESSED -> COMPENSATING_PAYMENT;
            case INVENTORY_RESERVED -> COMPENSATING_INVENTORY;
            case ORDER_CREATED -> COMPENSATING_ORDER;
            case COMPENSATING_SHIPPING -> COMPENSATING_PAYMENT;
            case COMPENSATING_PAYMENT -> COMPENSATING_INVENTORY;
            case COMPENSATING_INVENTORY -> COMPENSATING_ORDER;
            case COMPENSATING_ORDER -> COMPENSATED;
            // Terminal states must not silently turn into COMPENSATED
            case COMPLETED, COMPENSATED, COMPENSATION_FAILED ->
                throw new IllegalStateException(
                    "No compensation step from " + this);
        };
    }
}
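Resuming after a crash can be sketched as follows, assuming the set of completed step names is what gets persisted. `ResumableSaga` is a hypothetical helper, and the in-memory `completed` set stands in for the saga's database row.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class ResumableSaga {
    // Insertion order is execution order.
    private final Map<String, Runnable> steps = new LinkedHashMap<>();
    // Stands in for the persisted saga state loaded at startup.
    private final Set<String> completed;

    ResumableSaga(Set<String> persistedCompleted) {
        this.completed = new LinkedHashSet<>(persistedCompleted);
    }

    ResumableSaga step(String name, Runnable action) {
        steps.put(name, action);
        return this;
    }

    /** Runs every step not yet recorded as completed, recording each as it finishes. */
    void runOrResume() {
        for (Map.Entry<String, Runnable> entry : steps.entrySet()) {
            if (completed.contains(entry.getKey())) continue; // done before the crash
            entry.getValue().run();
            completed.add(entry.getKey()); // a real orchestrator would save to the database here
        }
    }

    Set<String> completedSteps() { return Set.copyOf(completed); }
}
```

A crash between running a step and recording it means the step runs again on resume, so forward steps need to be idempotent just like compensations.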

Monitoring and Observability for Sagas

Production sagas require comprehensive monitoring. Track saga duration, step failure rates, compensation frequency, and stuck saga counts. Furthermore, implement distributed tracing with correlation IDs that span all saga participants, making it possible to trace the entire saga execution across services. Additionally, create alerts for sagas that exceed expected duration or enter compensation failure states.
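The measurements listed above might be collected along these lines. `SagaMetrics` is a hypothetical in-memory class using only the JDK. In production these counters would more likely be exported through whatever metrics library the services already use, with the saga ID doubling as the correlation ID in logs and traces.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class SagaMetrics {
    private final Map<String, Instant> inFlight = new ConcurrentHashMap<>(); // sagaId -> start time
    private final Map<String, LongAdder> stepFailures = new ConcurrentHashMap<>();
    private final LongAdder compensations = new LongAdder();

    void sagaStarted(String sagaId, Instant at) { inFlight.put(sagaId, at); }

    void sagaFinished(String sagaId, boolean compensated) {
        inFlight.remove(sagaId);
        if (compensated) compensations.increment();
    }

    void stepFailed(String step) {
        stepFailures.computeIfAbsent(step, k -> new LongAdder()).increment();
    }

    long failuresFor(String step) {
        LongAdder counter = stepFailures.get(step);
        return counter == null ? 0 : counter.sum();
    }

    long compensationCount() { return compensations.sum(); }

    /** Sagas still running after maxDuration: candidates for an alert. */
    long stuckCount(Instant now, Duration maxDuration) {
        return inFlight.values().stream()
            .filter(start -> Duration.between(start, now).compareTo(maxDuration) > 0)
            .count();
    }
}
```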

Testing Saga Implementations

Testing sagas requires verifying both the happy path and every possible failure scenario. For each step in the saga, test what happens when that step fails and verify that all previous steps are properly compensated. Furthermore, test concurrent saga execution to ensure that compensating transactions don’t interfere with each other. Use Testcontainers with Kafka and databases to create realistic integration tests.
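One way to cover every failure point systematically is a small harness that fails each step in turn and checks that exactly the earlier steps were compensated, in reverse order. `SagaFailureMatrix` and `ToySaga` below are hypothetical. In a real test the `runFailingAt` function would drive the actual saga with a fault injected at the given step and return the compensations it observed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

final class SagaFailureMatrix {
    /**
     * For each step index, runs the saga with that step failing and checks
     * that the observed compensations are the earlier steps in reverse order.
     */
    static void verify(List<String> steps, IntFunction<List<String>> runFailingAt) {
        for (int failAt = 0; failAt < steps.size(); failAt++) {
            List<String> expected = new ArrayList<>(steps.subList(0, failAt));
            Collections.reverse(expected);
            List<String> actual = runFailingAt.apply(failAt);
            if (!expected.equals(actual)) {
                throw new AssertionError("When " + steps.get(failAt)
                    + " fails: expected compensations " + expected
                    + " but observed " + actual);
            }
        }
    }
}

/** Stand-in for the saga under test. */
final class ToySaga {
    static final List<String> STEPS = List.of(
        "CREATE_ORDER", "RESERVE_INVENTORY", "PROCESS_PAYMENT", "SCHEDULE_SHIPPING");

    /** Returns the compensations performed when the step at failAt fails. */
    static List<String> runFailingAt(int failAt) {
        List<String> compensated = new ArrayList<>();
        for (int i = failAt - 1; i >= 0; i--) {
            compensated.add(STEPS.get(i));
        }
        return compensated;
    }
}
```

Usage is a single call, `SagaFailureMatrix.verify(ToySaga.STEPS, ToySaga::runFailingAt)`, which throws an `AssertionError` naming the first failure point whose compensations are wrong.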

Key Takeaways

The saga pattern in microservices provides eventual consistency across service boundaries through sequences of local transactions with compensating actions. Choose choreography for simple linear workflows and orchestration for complex business processes. Design compensating transactions to be idempotent, implement saga recovery for stuck states, and invest heavily in monitoring and observability. The most successful saga implementations start simple, handle failure cases explicitly, and evolve their complexity based on actual production requirements.
