Handling Transactions Across Microservices

The #1 Microservice Interview Problem: Distributed Transactions

In a monolith, a transaction is easy. You wrap 3 database updates in a single BEGIN TRANSACTION...COMMIT. If one fails, everything rolls back. This is an ACID transaction.
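For contrast, here is the monolith case as a minimal sketch using Python's built-in sqlite3 module. The table names and the fail_shipping flag are hypothetical. The point is that a failure in the third insert rolls back the first two automatically, because all three share one local transaction.

```python
import sqlite3

def place_order(conn: sqlite3.Connection, fail_shipping: bool) -> None:
    """Three writes in one ACID transaction: all commit or none do."""
    # The connection's context manager commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (1, 'Created')")
        conn.execute("INSERT INTO payments (order_id, amount) VALUES (1, 49.99)")
        if fail_shipping:
            raise RuntimeError("shipping insert failed")  # simulated failure
        conn.execute("INSERT INTO shipments (order_id) VALUES (1)")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE payments (order_id INTEGER, amount REAL);
    CREATE TABLE shipments (order_id INTEGER);
""")

try:
    place_order(conn, fail_shipping=True)
except RuntimeError:
    pass

# The failed third step also undid the first two.
counts = [conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
          for table in ("orders", "payments", "shipments")]
print(counts)  # [0, 0, 0]
```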

In microservices, this approach breaks down. Your OrderService, PaymentService, and ShippingService each own a separate database, so no single local transaction can span all three, and you cannot simply 'lock' them all at once.

The Scenario: A new e-commerce order.

OrderService: Creates an 'Order' (local transaction 1).
PaymentService: Takes the payment (local transaction 2).
ShippingService: Creates a shipment (local transaction 3).

The Problem: What if Steps 1 and 2 succeed, but Step 3 (Shipping) fails? The user has paid, but the item won't ship. The system is now in an inconsistent, broken state. We need a way to 'roll back' the payment and the order.

The Solution: The Saga Pattern. A Saga is a sequence of local transactions. When one local transaction completes, it triggers the next. If any step fails, the Saga is responsible for executing compensating transactions to 'undo' the preceding work.
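The forward-then-compensate idea can be sketched in a few lines of Python. This is a simplified, single-process illustration: SagaStep and run_saga are hypothetical names, each action and compensation stands in for a call to a separate service, and nothing is persisted, so a crash mid-saga would lose track of progress (real saga implementations record each step durably).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction
    compensation: Callable[[], None]  # the 'undo' for a completed action

def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Undo only the steps that actually succeeded, newest first.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True

# Usage: the third step fails, so steps 2 and 1 are compensated.
log: list[str] = []

def out_of_stock() -> None:
    raise RuntimeError("out of stock")

ok = run_saga([
    SagaStep("CreateOrder", lambda: log.append("CreateOrder"), lambda: log.append("CancelOrder")),
    SagaStep("TakePayment", lambda: log.append("TakePayment"), lambda: log.append("RefundPayment")),
    SagaStep("CreateShipment", out_of_stock, lambda: log.append("CancelShipment")),
])
print(ok, log)
# False ['CreateOrder', 'TakePayment', 'RefundPayment', 'CancelOrder']
```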

This article explains what Sagas are, the two ways to build them (Choreography vs. Orchestration), and how they handle failures.

Why 'Two-Phase Commit' (2PC) is Not the Answer

Interview Question: 'Why can't we just use a two-phase commit (2PC) for microservices?'

This is a common question to check your architectural knowledge. 2PC is a classic 'distributed transaction' protocol.

Answer: 'A two-phase commit is a 'chatty' protocol that tries to enforce ACID properties across distributed databases. It's considered an anti-pattern in modern microservices for two main reasons:'

It Requires Locking (It's Synchronous): 2PC works by having a 'coordinator' ask every participating service to 'prepare' (each one votes and locks the resources the transaction touches) and then, only if all of them vote yes, to 'commit'. While a service is 'prepared', those database resources stay locked. In a high-availability, distributed system, this is a performance killer: if one service is slow, all the other services must wait, holding their locks.
It Violates Autonomy: The core idea of microservices is that each service is autonomous and independent. 2PC couples them together tightly. The coordinator also becomes a single point of failure: if it crashes after the prepare phase, every service that voted to commit is left holding its locks, unable to decide on its own whether to commit or roll back.

The simple answer: '2PC is too slow, synchronous, and 'chatty'. It couples services together and creates blocking locks, which kills performance and availability—the very things microservices are trying to achieve.'
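To make the blocking problem concrete, here is a toy, in-memory simulation of 2PC. Participant and two_phase_commit are hypothetical names; a real implementation (for example, XA transactions) also writes durable logs and communicates over the network, which this sketch omits. It shows that participants hold their locks from the prepare phase until the coordinator decides, so a coordinator crash between the two phases leaves them stuck.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    can_commit: bool = True
    locked: bool = False
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote. A 'yes' voter locks its resources and must
        # hold the locks until the coordinator's decision arrives.
        if not self.can_commit:
            self.state = "aborted"  # votes 'no'; nothing stays locked
            return False
        self.locked = True
        self.state = "prepared"
        return True

    def commit(self) -> None:
        self.state, self.locked = "committed", False

    def rollback(self) -> None:
        self.state, self.locked = "rolled back", False

def two_phase_commit(participants: list[Participant],
                     coordinator_crashes: bool = False) -> str:
    votes = [p.prepare() for p in participants]  # phase 1: prepare
    if coordinator_crashes:
        # Decision never sent: 'prepared' participants keep their locks.
        return "unknown"
    if all(votes):                               # phase 2: commit
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:                       # phase 2: abort
        if p.state == "prepared":
            p.rollback()
    return "aborted"

services = [Participant("Order"), Participant("Payment"), Participant("Shipping")]
print(two_phase_commit(services, coordinator_crashes=True))  # unknown
print([p.locked for p in services])  # [True, True, True]
```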

Saga Pattern 1: Choreography (The 'Event-Based' Model)

Interview Question: 'What is Saga Choreography?'

Answer: 'Choreography is a decentralized way to implement a Saga. There is no central coordinator. Instead, each service, after completing its local transaction, emits an event. Other services listen for these events and are triggered to perform their own part of the process.'

Analogy: It's like a chain reaction or dominoes. One domino falls (emits an event), which hits the next one, and so on.

Our E-Commerce Example (Choreography):

We would use a message bus like RabbitMQ or Kafka.

Client calls OrderService.
OrderService: Saves order to its DB, then emits an ORDER_CREATED event.
PaymentService: Listens for ORDER_CREATED. It receives the event, takes the payment, then emits a PAYMENT_PROCESSED event.
ShippingService: Listens for PAYMENT_PROCESSED. It receives the event, creates a shipment, and emits a SHIPMENT_CREATED event.
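The choreography flow above can be sketched with an in-memory event bus standing in for RabbitMQ or Kafka. EventBus and the handler names are hypothetical, and dispatch here is synchronous, whereas a real broker delivers events asynchronously and may redeliver them.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""
    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.history: list[str] = []

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.history.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()

# Each service knows only the events it consumes and emits, not the other services.
def order_service_create(order_id: str) -> None:
    # ... save the order in OrderService's own DB ...
    bus.publish("ORDER_CREATED", {"order_id": order_id})

def payment_on_order_created(event: dict) -> None:
    # ... take the payment in PaymentService's own DB ...
    bus.publish("PAYMENT_PROCESSED", event)

def shipping_on_payment_processed(event: dict) -> None:
    # ... create the shipment in ShippingService's own DB ...
    bus.publish("SHIPMENT_CREATED", event)

bus.subscribe("ORDER_CREATED", payment_on_order_created)
bus.subscribe("PAYMENT_PROCESSED", shipping_on_payment_processed)

order_service_create("order-123")
print(bus.history)  # ['ORDER_CREATED', 'PAYMENT_PROCESSED', 'SHIPMENT_CREATED']
```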

Pros & Cons:

Pro: Very simple and decoupled. Services don't need to know about each other; they only know about events.
Pro: No single point of failure (no coordinator).
Con: Very hard to track. 'What is the current status of Order 123?' You have to check the logs of 3 different services.
Con: Risk of 'cyclic dependencies' where services end up in an infinite loop of events.

The simple answer: 'It's a decentralized saga where services trigger each other by emitting and listening to events, with no central controller. It's simple, but can be hard to debug.'

Saga Pattern 2: Orchestration (The 'Coordinator' Model)

Interview Question: 'What is Saga Orchestration?'

Answer: 'Orchestration is a centralized way to implement a Saga. You have a new component, the Saga Orchestrator (or 'Coordinator'), which is responsible for telling each service what to do. All the logic and sequencing is in this one place.'

Analogy: The orchestrator is the 'conductor' of an 'orchestra'. It explicitly tells the 'violins' (PaymentService) when to play and the 'drums' (ShippingService) when to play.

Our E-Commerce Example (Orchestration):

The client's request starts the orchestrator.

SagaOrchestrator: Starts the 'Create Order' saga.
SagaOrchestrator: Calls OrderService -> 'Create your order'.
OrderService: Saves order, then replies 'Done'.
SagaOrchestrator: Calls PaymentService -> 'Take payment for this order'.
PaymentService: Takes payment, replies 'Done'.
SagaOrchestrator: Calls ShippingService -> 'Ship this order'.
ShippingService: Creates shipment, replies 'Done'.
SagaOrchestrator: Marks the saga as 'Complete'.

Tools like AWS Step Functions or Azure Logic Apps are often used as orchestrators.
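The orchestration steps above can be sketched as a single orchestrator class that owns the sequence and records a status per order. SagaOrchestrator and the three service functions are hypothetical stand-ins for remote calls. Compensation is omitted here (a failed reply just records where the saga stopped), and a real orchestrator would persist its status in a durable store rather than a dict.

```python
from typing import Callable

# Hypothetical service clients; in practice these would be HTTP/gRPC
# calls or commands sent over a queue.
def order_service(order_id: str) -> str:
    return "Done"  # saves the order, then replies

def payment_service(order_id: str) -> str:
    return "Done"  # takes the payment, then replies

def shipping_service(order_id: str) -> str:
    return "Done"  # creates the shipment, then replies

class SagaOrchestrator:
    def __init__(self) -> None:
        self.status: dict[str, str] = {}  # order_id -> current saga status

    def create_order_saga(self, order_id: str) -> None:
        # All sequencing logic lives here, not in the services.
        steps: list[tuple[str, Callable[[str], str]]] = [
            ("Creating order", order_service),
            ("Taking payment", payment_service),
            ("Shipping", shipping_service),
        ]
        for label, call in steps:
            self.status[order_id] = label  # recorded before each call
            if call(order_id) != "Done":
                self.status[order_id] = f"Failed at: {label}"
                return
        self.status[order_id] = "Complete"

    def get_status(self, order_id: str) -> str:
        return self.status.get(order_id, "Unknown")

orchestrator = SagaOrchestrator()
orchestrator.create_order_saga("123")
print(orchestrator.get_status("123"))  # Complete
```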

Pros & Cons:

Pro: All the workflow (sequencing) logic is in one place. Easy to understand, manage, and debug.
Pro: Easy to track the status. You just ask the orchestrator, 'What's the status of Order 123?'.
Con: Can become a 'God Class'. You risk putting all your business logic back into one smart 'monolith' that just tells dumb services what to do.
Con: It's another component to build and manage.

Handling Failures: Compensating Transactions

Interview Question: 'A step in your saga fails. What happens next?'

This is the most important part of the saga pattern. Your answer must include the term Compensating Transaction.

Answer: 'A saga handles failure by executing compensating transactions, which are 'undo' operations for a step that has already succeeded. If a step fails, the saga must undo all preceding work by running the compensating transactions of the completed steps in reverse order.'

Our E-Commerce Example (Failure):

Let's say the ShippingService fails because the item is out of stock.

OrderService -> CreateOrder: Success
PaymentService -> TakePayment: Success
ShippingService -> CreateShipment: FAILURE!

The Saga (whether orchestrated or choreographed) must now go into 'rollback' mode:

Run PaymentService's compensation: RefundPayment. This 'undoes' step 2.
Run OrderService's compensation: CancelOrder (e.g., set order status to 'Cancelled'). This 'undoes' step 1.

The system is now back in a consistent (though not 'successful') state. The customer has been refunded, and the order is cancelled.
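In the choreographed version, the rollback is itself a chain of events. The sketch below assumes hypothetical SHIPMENT_FAILED, PAYMENT_REFUNDED, and ORDER_CANCELLED events, with in-memory dicts in place of each service's database. Note that the reverse order (refund before cancel) falls out of which service listens to which event, rather than being coded in one place.

```python
from collections import defaultdict
from typing import Callable

handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
events: list[str] = []

def publish(event_type: str, payload: dict) -> None:
    events.append(event_type)
    for handler in handlers[event_type]:
        handler(payload)

# Each service's own "database" (hypothetical in-memory stand-ins).
orders = {"123": "Created"}
payments = {"123": "Paid"}
stock_available = 0  # assumption for the demo: the item is out of stock

def shipping_on_payment_processed(event: dict) -> None:
    if stock_available > 0:
        publish("SHIPMENT_CREATED", event)
    else:
        publish("SHIPMENT_FAILED", event)

def payment_on_shipment_failed(event: dict) -> None:
    payments[event["order_id"]] = "Refunded"   # compensates TakePayment
    publish("PAYMENT_REFUNDED", event)

def order_on_payment_refunded(event: dict) -> None:
    orders[event["order_id"]] = "Cancelled"    # compensates CreateOrder
    publish("ORDER_CANCELLED", event)

handlers["PAYMENT_PROCESSED"].append(shipping_on_payment_processed)
handlers["SHIPMENT_FAILED"].append(payment_on_shipment_failed)
handlers["PAYMENT_REFUNDED"].append(order_on_payment_refunded)

# Steps 1 and 2 already succeeded; PaymentService announces it.
publish("PAYMENT_PROCESSED", {"order_id": "123"})
print(events)
# ['PAYMENT_PROCESSED', 'SHIPMENT_FAILED', 'PAYMENT_REFUNDED', 'ORDER_CANCELLED']
print(orders["123"], payments["123"])  # Cancelled Refunded
```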

Key Design Point:

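For every 'local transaction' you add to a saga (e.g., TakePayment), you must also design and build its corresponding 'compensating transaction' (e.g., RefundPayment). A compensating transaction must be idempotent (running it five times has the same effect as running it once) and should be designed so that it cannot fail permanently: if it hits a transient error, the saga retries it until it succeeds, because there is no further step that could compensate for a failed compensation.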
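The two properties work together: because RefundPayment is idempotent, it is safe to retry it until it succeeds. A minimal sketch, assuming a hypothetical in-memory PaymentService that records which payments it has already refunded, and a simulated payment provider that times out a configurable number of times before succeeding:

```python
from typing import Callable

class PaymentService:
    """Hypothetical in-memory service with an idempotent compensation."""
    def __init__(self, transient_failures: int = 0) -> None:
        self.refunded_ids: set[str] = set()   # payments already compensated
        self.refunds_issued = 0
        self._failures_left = transient_failures  # simulates a flaky provider

    def refund_payment(self, payment_id: str) -> None:
        if payment_id in self.refunded_ids:
            return  # already refunded: duplicate or retried calls are no-ops
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("payment provider timed out")
        # ... call the payment provider to issue the refund ...
        self.refunds_issued += 1
        self.refunded_ids.add(payment_id)

def run_compensation(fn: Callable[[str], None], arg: str, max_attempts: int = 10) -> None:
    """Retry a compensation until it succeeds; safe only because it is idempotent."""
    for attempt in range(1, max_attempts + 1):
        try:
            fn(arg)
            return
        except ConnectionError:
            if attempt == max_attempts:
                raise  # e.g. park the saga for manual intervention

svc = PaymentService(transient_failures=2)
run_compensation(svc.refund_payment, "pay-1")  # fails twice, then succeeds
for _ in range(5):
    svc.refund_payment("pay-1")  # e.g. redelivered messages
print(svc.refunds_issued)  # 1
```

In a real service, the "already refunded?" check and the refund record should be written atomically (for example, in the same local transaction, or guarded by a unique constraint on the payment ID); otherwise two concurrent retries could both pass the check. Production retries would also add backoff between attempts.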

The Big Trade-off: ACID vs. BASE (Eventual Consistency)

Interview Question: 'Sagas sound complex. What's the big trade-off you make?'

This is the final, high-level theory question. Your answer proves you understand the consequences of this pattern.

Answer: 'You are trading ACID transactions for BASE properties. The most important part of this is that you are giving up immediate consistency for eventual consistency.'

ACID (Monolith)

Atomic
Consistent
Isolated
Durable

When you hit 'Commit', the data is 100% consistent for everyone, instantly. This is strong consistency.

BASE (Microservices / Sagas)

Basically Available (The system is highly available)
Soft State (The state can change over time)
Eventual Consistency

What is Eventual Consistency?

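It means there can be a short window of time during which the data across your services is inconsistent. Once the saga finishes (and no new updates arrive), all services converge on the same state.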

Example:

10:00:00 AM: OrderService creates the order.
10:00:01 AM: PaymentService takes the payment.
10:00:02 AM: ... The saga is in flight. The ShippingService has not been called yet.

At 10:00:02 AM, what happens if the customer looks at their dashboard? The PaymentService says 'Paid', but the ShippingService says 'Not Shipped'. The data is temporarily inconsistent!

Once the ShippingService has been called (10:00:03 AM in this example), the dashboard shows 'Shipped': the data has become consistent again.
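The timeline above can be reproduced with a generator that pauses after each local transaction, letting a 'dashboard' read the services' data while the saga is in flight. All names and the in-memory stores are hypothetical, and the timestamps are just labels.

```python
from typing import Iterator

# Each service's own store (hypothetical in-memory stand-ins).
orders: dict[str, str] = {}
payments: dict[str, str] = {}
shipments: dict[str, str] = {}

def create_order_saga(order_id: str) -> Iterator[str]:
    """Pauses after each local transaction; the gaps are where the saga is in flight."""
    orders[order_id] = "Created"
    yield "10:00:00 order created"
    payments[order_id] = "Paid"
    yield "10:00:01 payment taken"
    shipments[order_id] = "Shipped"
    yield "10:00:03 shipment created"

def dashboard(order_id: str) -> tuple[str, str]:
    # Reads each service's own data, as a customer-facing view would.
    return payments.get(order_id, "Not paid"), shipments.get(order_id, "Not Shipped")

saga = create_order_saga("123")
next(saga)  # 10:00:00
next(saga)  # 10:00:01
print(dashboard("123"))  # ('Paid', 'Not Shipped')  temporarily inconsistent
for _ in saga:
    pass  # let the remaining steps run
print(dashboard("123"))  # ('Paid', 'Shipped')  converged
```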

The simple answer: 'The trade-off is giving up the immediate, strong consistency of ACID for eventual consistency. This is a business-level decision. You have to accept that for a few seconds, the state of the system might be inconsistent, but it will eventually resolve. This is the price you pay for the scalability and resilience of microservices.'
