Saga Pattern: Managing Distributed Transactions in Microservices

Distributed systems can’t use traditional database transactions that span multiple services. When an order involves an inventory service, a payment service, and a shipping service, each with its own database, you need a different approach. The saga pattern coordinates multi-service transactions through a sequence of local transactions, with compensation logic for failures. This guide covers both the choreography and orchestration approaches with practical implementation examples.

Why Traditional Transactions Don’t Work in Microservices

In a monolithic application, a single database transaction can atomically update inventory, charge payment, and create a shipment. If anything fails, everything rolls back. This ACID guarantee breaks down in microservices because each service owns its own database — there’s no shared transaction coordinator that can lock rows across multiple databases simultaneously.

Two-phase commit (2PC) exists but has serious drawbacks for microservices: every participant must be available for the transaction to commit, a single unavailable service blocks the entire transaction, and the coordinator becomes a single point of failure. 2PC also holds locks across services from the prepare phase until commit, which hurts throughput under load. The saga pattern replaces this with eventual consistency: each service completes its local transaction and publishes an event, and compensating transactions undo completed work when a later step fails.

The saga pattern trades strong consistency for availability and performance. Instead of “everything succeeds or nothing succeeds atomically,” you get “everything eventually succeeds, or the steps that already completed are compensated.” For most business operations, such as order processing, booking systems, and account creation, this trade-off is acceptable and far more practical at scale.
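
The trade-off can be made concrete with a minimal sketch. It assumes each step is a pair of functions, an action and a compensation. The names `runSaga` and `demoSteps`, and the in-memory log that stands in for three services, are hypothetical and exist only for illustration.

```javascript
// Minimal saga sketch: run local steps in order and, if one fails,
// undo the steps that already completed, newest first.
// `runSaga` and `demoSteps` are hypothetical names, not a library API.
async function runSaga(steps) {
  const completed = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
    } catch (error) {
      // Compensate completed steps in reverse order
      for (const done of completed.reverse()) {
        await done.compensation();
      }
      return { ok: false, failedAt: step.name, reason: error.message };
    }
  }
  return { ok: true };
}

// In-memory stand-ins for three services' local transactions.
// Each action or compensation just records what it did in `log`.
function demoSteps(log) {
  return [
    {
      name: 'reserve_inventory',
      action: async () => { log.push('inventory reserved'); },
      compensation: async () => { log.push('inventory released'); }
    },
    {
      name: 'charge_payment',
      action: async () => { log.push('payment charged'); },
      compensation: async () => { log.push('payment refunded'); }
    },
    {
      name: 'create_shipment',
      // Simulated failure in the last step
      action: async () => { throw new Error('no carrier available'); },
      compensation: async () => { log.push('shipment cancelled'); }
    }
  ];
}

const demoLog = [];
runSaga(demoSteps(demoLog)).then((result) => {
  console.log(result);
  console.log(demoLog);
});
```

In this run the shipment step fails, so the log ends with the payment refund followed by the inventory release. The failed step itself is never compensated, because it never completed.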

Figure: Sagas coordinate multi-service transactions through local transactions and compensation.

Choreography: Event-Driven Sagas

In choreography-based sagas, each service listens for events and decides independently what to do next. There’s no central coordinator; the saga emerges from the interaction of autonomous services. This approach is simpler to implement initially and works well for sagas with 2-4 steps.

// Order Service — starts the saga
class OrderService {
  async createOrder(orderData) {
    // Step 1: Create order in PENDING state
    const order = await this.orderRepo.create({
      ...orderData,
      status: 'PENDING'
    });

    // Publish event — inventory service will react
    await this.eventBus.publish('order.created', {
      orderId: order.id,
      items: order.items,
      customerId: order.customerId,
      totalAmount: order.totalAmount
    });

    return order;
  }

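  // Compensation: payment failed, so cancel the order.
  // The payment.failed payload must include customerId for the notification.
  async handlePaymentFailed(event) {
    await this.orderRepo.updateStatus(event.orderId, 'CANCELLED');
    // Notify customer
    await this.notificationService.send(event.customerId,
      'Your order has been cancelled due to payment failure.');
  }

  // Compensation: inventory could not be reserved, so cancel the order
  // instead of leaving it in PENDING
  async handleInventoryFailed(event) {
    await this.orderRepo.updateStatus(event.orderId, 'CANCELLED');
  }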
}

// Inventory Service — reacts to order.created
class InventoryService {
  async handleOrderCreated(event) {
    try {
      // Step 2: Reserve inventory
      await this.inventoryRepo.reserve(event.items);
    } catch (error) {
      // Reservation failed: tell the order service to cancel
      await this.eventBus.publish('inventory.failed', {
        orderId: event.orderId,
        reason: error.message
      });
      return;
    }

    // Publish outside the try block so that a publish error is not
    // reported as a failed reservation
    await this.eventBus.publish('inventory.reserved', {
      orderId: event.orderId,
      items: event.items,
      customerId: event.customerId,
      totalAmount: event.totalAmount
    });
  }

  // Compensation: undo reservation if payment fails
  async handlePaymentFailed(event) {
    await this.inventoryRepo.release(event.items);
  }
}

// Payment Service — reacts to inventory.reserved
class PaymentService {
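  async handleInventoryReserved(event) {
    let payment;
    try {
      // Step 3: Charge payment
      payment = await this.paymentGateway.charge({
        customerId: event.customerId,
        amount: event.totalAmount
      });
    } catch (error) {
      // Payment failed: triggers the compensation chain.
      // items and customerId are included so that other services can
      // release the reservation and notify the customer.
      await this.eventBus.publish('payment.failed', {
        orderId: event.orderId,
        items: event.items,
        customerId: event.customerId,
        reason: error.message
      });
      return;
    }

    // Publish outside the try block so that a publish error is not
    // mistaken for a declined charge
    await this.eventBus.publish('payment.completed', {
      orderId: event.orderId,
      paymentId: payment.id
    });
  }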
}

The downside of choreography is that the saga logic is scattered across services. As sagas grow beyond 4-5 steps, it becomes difficult to understand the complete flow, debug failures, and ensure all compensation paths are covered. Additionally, circular event dependencies can create infinite loops if not carefully managed.
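
To make the event chain in this section easier to follow, the sketch below wires simplified handlers to an in-memory bus and records every published event. It is not a real message broker: `InMemoryEventBus`, the stock and balance tables, and the single-item order payload are assumptions made for illustration. It runs the failure path, in which a declined payment releases the reserved stock and cancels the order.

```javascript
// In-memory stand-in for a message broker (hypothetical, illustration only).
// Handlers run inline during publish(); a real broker delivers asynchronously.
class InMemoryEventBus {
  constructor() {
    this.handlers = new Map();
    this.trace = []; // every published event type, in order
  }
  subscribe(type, handler) {
    const list = this.handlers.get(type) || [];
    list.push(handler);
    this.handlers.set(type, list);
  }
  async publish(type, payload) {
    this.trace.push(type);
    for (const handler of this.handlers.get(type) || []) {
      await handler(payload);
    }
  }
}

async function runChoreographyDemo() {
  const bus = new InMemoryEventBus();
  const orders = new Map();      // orderId -> status
  const stock = { widget: 5 };   // available units per item
  const balance = { alice: 10 }; // funds per customer

  // Inventory service: reserve on order.created, release on payment.failed
  bus.subscribe('order.created', async (e) => {
    if (stock[e.item] < e.quantity) {
      return bus.publish('inventory.failed', { orderId: e.orderId });
    }
    stock[e.item] -= e.quantity;
    await bus.publish('inventory.reserved', e);
  });
  bus.subscribe('payment.failed', async (e) => {
    stock[e.item] += e.quantity;
  });

  // Payment service: charge on inventory.reserved
  bus.subscribe('inventory.reserved', async (e) => {
    if (balance[e.customerId] < e.totalAmount) {
      return bus.publish('payment.failed', e);
    }
    balance[e.customerId] -= e.totalAmount;
    await bus.publish('payment.completed', { orderId: e.orderId });
  });

  // Order service: the final status depends on which event arrives
  bus.subscribe('payment.completed', async (e) => orders.set(e.orderId, 'CONFIRMED'));
  bus.subscribe('payment.failed', async (e) => orders.set(e.orderId, 'CANCELLED'));
  bus.subscribe('inventory.failed', async (e) => orders.set(e.orderId, 'CANCELLED'));

  // Start the saga: alice cannot afford this order, so payment fails
  orders.set('o-1', 'PENDING');
  await bus.publish('order.created', {
    orderId: 'o-1', customerId: 'alice', item: 'widget', quantity: 2, totalAmount: 50
  });

  return { trace: bus.trace, status: orders.get('o-1'), stock: stock.widget };
}

runChoreographyDemo().then((result) => console.log(result));
```

Even with three services, the complete flow is only visible in the recorded trace, since no single handler describes it. Choreographed sagas become harder to follow for the same reason as they grow.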

Orchestration: Centralized Saga Coordination

Orchestration-based sagas use a dedicated saga orchestrator that manages the sequence of steps, tracks state, and handles failures. The orchestrator tells each service what to do rather than services reacting to events autonomously. Consequently, the entire saga flow is visible in one place, making it easier to understand, test, and debug.

// Saga Orchestrator — manages the complete flow
class OrderSagaOrchestrator {
  constructor(orderService, inventoryService, paymentService,
              shippingService) {
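    // Each execute() is assumed to resolve to an object holding the
    // fields its compensate() needs (orderId, paymentId, shipmentId);
    // execute() below merges that object into the saga data.
    this.steps = [
      {
        name: 'create_order',
        execute: async (data) => {
          const order = await orderService.createOrder(data);
          return { orderId: order.id };
        },
        compensate: (data) => orderService.cancelOrder(data.orderId)
      },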
      {
        name: 'reserve_inventory',
        execute: (data) => inventoryService.reserve(data.items),
        compensate: (data) => inventoryService.release(data.items)
      },
      {
        name: 'process_payment',
        execute: (data) => paymentService.charge(data),
        compensate: (data) => paymentService.refund(data.paymentId)
      },
      {
        name: 'arrange_shipping',
        execute: (data) => shippingService.createShipment(data),
        compensate: (data) => shippingService.cancelShipment(
          data.shipmentId)
      }
    ];
  }

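  async execute(orderData) {
    // crypto.randomUUID() is available as a global in recent Node.js
    // releases and modern browsers; otherwise import randomUUID from
    // 'node:crypto'
    const sagaLog = { id: crypto.randomUUID(), status: 'RUNNING',
      completedSteps: [], data: orderData };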

    for (const step of this.steps) {
      try {
        const result = await step.execute(sagaLog.data);
        sagaLog.data = { ...sagaLog.data, ...result };
        sagaLog.completedSteps.push(step.name);
        await this.saveSagaState(sagaLog);
      } catch (error) {
        sagaLog.status = 'COMPENSATING';
        sagaLog.failedStep = step.name;
        sagaLog.error = error.message;
        await this.compensate(sagaLog);
        return { success: false, error: error.message };
      }
    }

    sagaLog.status = 'COMPLETED';
    await this.saveSagaState(sagaLog);
    return { success: true, data: sagaLog.data };
  }

  async compensate(sagaLog) {
    // Execute compensations in reverse order
    const stepsToCompensate = [...sagaLog.completedSteps].reverse();
    let allCompensated = true;
    for (const stepName of stepsToCompensate) {
      const step = this.steps.find(s => s.name === stepName);
      try {
        await step.compensate(sagaLog.data);
      } catch (compError) {
        // Log compensation failure: needs manual intervention.
        // Keep going so the remaining steps are still undone.
        console.error(`Compensation failed for ${stepName}:`,
          compError);
        allCompensated = false;
        sagaLog.status = 'COMPENSATION_FAILED';
        await this.alertOps(sagaLog);
      }
    }
    // Report COMPENSATED only if every compensation succeeded;
    // otherwise the COMPENSATION_FAILED status is preserved
    if (allCompensated) {
      sagaLog.status = 'COMPENSATED';
    }
    await this.saveSagaState(sagaLog);
  }
}

Figure: Orchestration centralizes saga logic; the complete flow is visible in one place.

Implementing with Kafka: Reliable Event Delivery

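Both choreography and orchestration sagas need reliable event delivery. Apache Kafka is a common choice because it provides durable event streams that are ordered within each partition. Kafka’s exactly-once semantics apply to processing that reads from and writes to Kafka itself; side effects in external systems, such as a service’s own database, should be treated as at-least-once. The transactional outbox pattern makes the database write and the recording of the event atomic, and a separate process then publishes the recorded events.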

// Transactional outbox pattern — atomic DB write + event
class OutboxEventPublisher {
  async publishWithOutbox(dbTransaction, event) {
    // Write event to outbox table within the same DB transaction
    await dbTransaction.query(
      'INSERT INTO outbox (event_type, payload, status) VALUES ($1, $2, $3)',
      [event.type, JSON.stringify(event.data), 'PENDING']
    );
    // Separate process polls outbox and publishes to Kafka
    // This ensures atomicity: if the DB write fails,
    // the event is never published
  }
}

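// Outbox poller: runs as a separate process.
// Assumes `db` is a node-postgres style client, whose query() resolves
// to { rows }, and `kafka` is a producer wrapper defined elsewhere.
class OutboxPoller {
  async poll() {
    const { rows: events } = await db.query(
      'SELECT * FROM outbox WHERE status = $1 ORDER BY created_at LIMIT 100',
      ['PENDING']
    );

    for (const event of events) {
      await kafka.produce(event.event_type, event.payload);
      // A crash between produce and this UPDATE republishes the event
      // on the next poll, so consumers see at-least-once delivery
      await db.query(
        'UPDATE outbox SET status = $1 WHERE id = $2',
        ['PUBLISHED', event.id]
      );
    }
  }
}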

Idempotency is critical for saga reliability. Network failures and retries can cause duplicate event delivery, so every saga step must be idempotent: processing the same event twice should have the same effect as processing it once. Use unique saga IDs and check whether a step has already been completed before executing it again.
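
A minimal sketch of that check follows. It assumes each event carries a saga ID and that completed steps are recorded under a key made of the saga ID and the step name. The `Map` stands in for a database table with a unique constraint, and `IdempotentPaymentHandler` and `chargeFn` are hypothetical names.

```javascript
// Idempotent step handler sketch (all names are hypothetical).
// In a real service the processed-step record would live in the service's
// own database, written in the same local transaction as the step's effect.
class IdempotentPaymentHandler {
  constructor(chargeFn) {
    this.chargeFn = chargeFn;
    this.processed = new Map(); // "sagaId:step" -> stored result
  }

  async handle(event) {
    const key = `${event.sagaId}:process_payment`;
    if (this.processed.has(key)) {
      // Duplicate delivery: return the earlier result, do not charge again
      return this.processed.get(key);
    }
    const result = await this.chargeFn(event.customerId, event.amount);
    this.processed.set(key, result);
    return result;
  }
}

// Demo: the same event delivered twice results in a single charge
async function runIdempotencyDemo() {
  let charges = 0;
  const handler = new IdempotentPaymentHandler(async (customerId, amount) => {
    charges += 1;
    return { paymentId: `pay-${charges}`, customerId, amount };
  });
  const event = { sagaId: 'saga-42', customerId: 'alice', amount: 50 };
  const first = await handler.handle(event);
  const second = await handler.handle(event); // redelivered duplicate
  return { charges, samePayment: first.paymentId === second.paymentId };
}

runIdempotencyDemo().then((result) => console.log(result));
```

This sketch handles sequential redelivery only. If two copies of an event can be processed concurrently, the check and the write need to be atomic, for example through a unique constraint on the key.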

Choosing Between Choreography and Orchestration

Use choreography when your saga has 2-4 steps, the services are owned by different teams who want autonomy, and the flow is unlikely to change frequently. Use orchestration when the saga has 5+ steps, you need clear visibility into the complete flow, complex branching logic is required, or the business logic changes frequently. In practice, many production systems use orchestration because the debugging and monitoring benefits outweigh the additional infrastructure complexity.

Figure: Choose choreography for simple flows, orchestration for complex multi-step transactions.

In conclusion, the saga pattern is essential for managing distributed transactions in microservices. Start with orchestration for most use cases — the visibility and debuggability are worth the additional infrastructure. Always implement compensation logic, use the transactional outbox pattern for reliable event delivery, and make every step idempotent. Your distributed system will be more resilient and far easier to operate.
