Handling Transactions Across Microservices
The #1 Microservice Interview Problem: Distributed Transactions
In a monolith, a transaction is easy. You wrap 3 database updates in a single BEGIN TRANSACTION...COMMIT. If one fails, everything rolls back. This is an ACID transaction.
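For comparison, here is a minimal sketch of that monolith case using Python's built-in sqlite3 module with an in-memory database. The table names and the `place_order` function are hypothetical. The point is that all three inserts share one local transaction, so a failure before the last insert also rolls back the first two.

```python
import sqlite3

def place_order(conn: sqlite3.Connection, fail_shipping: bool) -> None:
    """Three updates in one local ACID transaction (hypothetical schema)."""
    # Using the connection as a context manager commits on success and
    # rolls back the open transaction if the block raises an exception.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (1, 'CREATED')")
        conn.execute("INSERT INTO payments (order_id, amount) VALUES (1, 42.0)")
        if fail_shipping:
            raise RuntimeError("shipping failed")  # simulated failure of step 3
        conn.execute("INSERT INTO shipments (order_id) VALUES (1)")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE payments (order_id INTEGER, amount REAL);
    CREATE TABLE shipments (order_id INTEGER);
""")

try:
    place_order(conn, fail_shipping=True)
except RuntimeError:
    pass

# Every table is still empty: the order and payment rows were rolled back too.
counts = [conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("orders", "payments", "shipments")]
print(counts)  # [0, 0, 0]
```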
In microservices, this is not possible with a local transaction. Your OrderService, PaymentService, and ShippingService each have their own separate database, so no single BEGIN TRANSACTION...COMMIT can cover all three, and you cannot simply 'lock' all three at once.
The Scenario: A new e-commerce order.
- OrderService: Creates an 'Order' (local transaction 1).
- PaymentService: Takes the payment (local transaction 2).
- ShippingService: Creates a shipment (local transaction 3).
The Problem: What if Steps 1 and 2 succeed, but Step 3 (Shipping) fails? The user has paid, but the item won't ship. The system is now in an inconsistent, broken state. We need a way to 'roll back' the payment and the order.
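A minimal sketch of why this happens, assuming each service owns its own database (modelled here as three separate in-memory SQLite connections; all names are hypothetical). Each step commits in its own database, so the failure in step 3 has no way to undo steps 1 and 2.

```python
import sqlite3

# Hypothetical setup: each service owns its own, separate database.
order_db = sqlite3.connect(":memory:")
payment_db = sqlite3.connect(":memory:")
shipping_db = sqlite3.connect(":memory:")
order_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
payment_db.execute("CREATE TABLE payments (order_id INTEGER, amount REAL)")
shipping_db.execute("CREATE TABLE shipments (order_id INTEGER)")

def create_order(order_id: int) -> None:
    with order_db:  # local transaction 1: commits in OrderService's DB only
        order_db.execute("INSERT INTO orders VALUES (?, 'CREATED')", (order_id,))

def take_payment(order_id: int, amount: float) -> None:
    with payment_db:  # local transaction 2: commits in PaymentService's DB only
        payment_db.execute("INSERT INTO payments VALUES (?, ?)", (order_id, amount))

def create_shipment(order_id: int, in_stock: bool) -> None:
    with shipping_db:  # local transaction 3
        if not in_stock:
            raise RuntimeError("out of stock")
        shipping_db.execute("INSERT INTO shipments VALUES (?)", (order_id,))

def row_count(db: sqlite3.Connection, table: str) -> int:
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

create_order(1)
take_payment(1, 42.0)
try:
    create_shipment(1, in_stock=False)
except RuntimeError:
    pass  # step 3 rolled back, but steps 1 and 2 were already committed elsewhere

print(row_count(order_db, "orders"), row_count(payment_db, "payments"),
      row_count(shipping_db, "shipments"))  # 1 1 0
```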
The Solution: The Saga Pattern. A Saga is a sequence of local transactions. When one local transaction completes, it triggers the next. If any step fails, the Saga is responsible for executing compensating transactions to 'undo' the preceding work.
The sections below explain what Sagas are, the two ways to build them (Choreography vs. Orchestration), and how they handle failures.
Why 'Two-Phase Commit' (2PC) is Not the Answer
Interview Question: 'Why can't we just use a two-phase commit (2PC) for microservices?'
This is a common question to check your architectural knowledge. 2PC is a classic 'distributed transaction' protocol.
Answer: 'A two-phase commit is a 'chatty' protocol that tries to enforce ACID properties across distributed databases. It's considered an anti-pattern in modern microservices for two main reasons:'
- It Requires Locking (It's Synchronous): 2PC works by having a 'coordinator' tell all participating services to 'prepare' (durably record that they are able to commit and lock the affected data) and then, once every participant has voted yes, to 'commit'. While a service is 'prepared', those resources stay locked. In a high-availability, distributed system, this is a performance killer: if one service is slow, all the other services must wait, holding their locks.
- It Violates Autonomy: The core idea of microservices is that each service is autonomous and independent. 2PC couples them together tightly. The coordinator also becomes a single point of failure: if it crashes after the services have prepared, they are left holding their locks, unable to decide on their own whether to commit or roll back.
The simple answer: '2PC is too slow, synchronous, and 'chatty'. It couples services together and creates blocking locks, which kills performance and availability—the very things microservices are trying to achieve.'
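To make the blocking problem concrete, here is a toy, in-memory simulation of the two phases. It is an illustrative sketch, not a real 2PC implementation: there is no durable log, no timeout, and no recovery. `Participant`, `two_phase_commit`, and the `locked` flag (standing in for row locks) are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """Toy 2PC participant; `locked` stands in for locks held while prepared."""
    name: str
    can_commit: bool = True
    locked: bool = False
    state: str = "INIT"

    def prepare(self) -> bool:
        # Phase 1: vote. A YES vote means holding locks until the coordinator decides.
        if not self.can_commit:
            self.state = "ABORTED"
            return False
        self.locked, self.state = True, "PREPARED"
        return True

    def commit(self) -> None:
        self.state, self.locked = "COMMITTED", False

    def abort(self) -> None:
        self.state, self.locked = "ABORTED", False

def two_phase_commit(participants: list[Participant],
                     crash_before_phase_two: bool = False) -> str:
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if crash_before_phase_two:
        # The coordinator dies here: prepared participants keep their locks
        # and cannot safely decide on their own whether to commit or abort.
        return "UNKNOWN"
    # Phase 2: a unanimous YES commits everywhere; anything else aborts everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMITTED"
    for p in participants:
        p.abort()
    return "ABORTED"

services = [Participant("order"), Participant("payment"), Participant("shipping")]
print(two_phase_commit(services, crash_before_phase_two=True))  # UNKNOWN
print([p.locked for p in services])  # [True, True, True]
```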
Saga Pattern 1: Choreography (The 'Event-Based' Model)
Interview Question: 'What is Saga Choreography?'
Answer: 'Choreography is a decentralized way to implement a Saga. There is no central coordinator. Instead, each service, after completing its local transaction, emits an event. Other services listen for these events and are triggered to perform their own part of the process.'
Analogy: It's like a chain reaction or dominoes. One domino falls (emits an event), which hits the next one, and so on.
Our E-Commerce Example (Choreography):
We would use a message bus like RabbitMQ or Kafka.
- Client calls OrderService.
- OrderService: Saves the order to its DB, then emits an ORDER_CREATED event.
- PaymentService: Is listening for ORDER_CREATED. It receives the event, takes the payment, then emits a PAYMENT_PROCESSED event.
- ShippingService: Is listening for PAYMENT_PROCESSED. It receives the event, creates a shipment, and emits a SHIPMENT_CREATED event.
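The flow above can be sketched with a tiny in-memory event bus. `EventBus` and the handler names are hypothetical stand-ins. A real broker such as RabbitMQ or Kafka delivers events asynchronously and across processes, whereas this sketch calls subscribers synchronously so the chain reaction is easy to follow.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """In-memory stand-in for a message broker (hypothetical, synchronous)."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # every event published, in order

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()

# Each service only knows which events it consumes and which it emits.
def order_service_create(order_id: int) -> None:
    # (save the order in OrderService's own DB here)
    bus.publish("ORDER_CREATED", {"order_id": order_id})

def payment_on_order_created(event: dict) -> None:
    # (take the payment in PaymentService's own DB here)
    bus.publish("PAYMENT_PROCESSED", event)

def shipping_on_payment_processed(event: dict) -> None:
    # (create the shipment in ShippingService's own DB here)
    bus.publish("SHIPMENT_CREATED", event)

bus.subscribe("ORDER_CREATED", payment_on_order_created)
bus.subscribe("PAYMENT_PROCESSED", shipping_on_payment_processed)

order_service_create(123)
print(bus.log)  # ['ORDER_CREATED', 'PAYMENT_PROCESSED', 'SHIPMENT_CREATED']
```

No service calls another directly: the sequence exists only in the subscriptions, which is also why tracing the state of a single order means piecing together events from several services.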
Pros & Cons:
- Pro: Very simple and decoupled. Services don't need to know about each other; they only know about events.
- Pro: No single point of failure (no coordinator).
- Con: Very hard to track. 'What is the current status of Order 123?' You have to check the logs of 3 different services.
- Con: Risk of 'cyclic dependencies' where services end up in an infinite loop of events.
The simple answer: 'It's a decentralized saga where services trigger each other by emitting and listening to events, with no central controller. It's simple, but can be hard to debug.'
Saga Pattern 2: Orchestration (The 'Coordinator' Model)
Interview Question: 'What is Saga Orchestration?'
Answer: 'Orchestration is a centralized way to implement a Saga. You have a new component, the Saga Orchestrator (or 'Coordinator'), which is responsible for telling each service what to do. All the logic and sequencing is in this one place.'
Analogy: The orchestrator is the 'conductor' of an 'orchestra'. It explicitly tells the 'violins' (PaymentService) when to play and the 'drums' (ShippingService) when to play.
Our E-Commerce Example (Orchestration):
The client's request starts the orchestrator.
- SagaOrchestrator: Starts the 'Create Order' saga.
- SagaOrchestrator: Calls OrderService -> 'Create your order'.
- OrderService: Saves the order, then replies 'Done'.
- SagaOrchestrator: Calls PaymentService -> 'Take payment for this order'.
- PaymentService: Takes payment, replies 'Done'.
- SagaOrchestrator: Calls ShippingService -> 'Ship this order'.
- ShippingService: Creates shipment, replies 'Done'.
- SagaOrchestrator: Marks the saga as 'Complete'.
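A minimal sketch of the orchestrator's happy path, assuming each service call is a plain function that returns normally on success (in practice it would be an HTTP or RPC call, or a step in a workflow engine). `SagaOrchestrator` and the service method names are hypothetical, and failure handling is deliberately left out.

```python
from typing import Callable

class SagaOrchestrator:
    """Hypothetical orchestrator: calls each service in turn and records status."""

    def __init__(self, steps: list[tuple[str, Callable[[int], None]]]) -> None:
        self.steps = steps
        self.status: dict[int, str] = {}  # order_id -> current saga status

    def run(self, order_id: int) -> str:
        self.status[order_id] = "STARTED"
        for name, call_service in self.steps:
            self.status[order_id] = f"RUNNING:{name}"
            call_service(order_id)  # returning normally means the service replied "Done"
        self.status[order_id] = "COMPLETE"
        return self.status[order_id]

calls: list[str] = []  # records which service was told to do what, in order

orchestrator = SagaOrchestrator([
    ("CreateOrder", lambda oid: calls.append(f"OrderService.create({oid})")),
    ("TakePayment", lambda oid: calls.append(f"PaymentService.charge({oid})")),
    ("CreateShipment", lambda oid: calls.append(f"ShippingService.ship({oid})")),
])

print(orchestrator.run(123))  # COMPLETE
print(calls)
# ['OrderService.create(123)', 'PaymentService.charge(123)', 'ShippingService.ship(123)']
```

Because the orchestrator records a status per order, answering "What's the status of Order 123?" is a lookup in one place rather than a search through three services' logs.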
Tools like AWS Step Functions or Azure Logic Apps are often used as orchestrators.
Pros & Cons:
- Pro: All the business logic is in one place. Easy to understand, manage, and debug.
- Pro: Easy to track the status. You just ask the orchestrator, 'What's the status of Order 123?'.
- Con: Can become a 'God Class'. You risk putting all your business logic back into one smart 'monolith' that just tells dumb services what to do.
- Con: It's another component to build and manage.
Handling Failures: Compensating Transactions
Interview Question: 'A step in your saga fails. What happens next?'
This is the most important part of the saga pattern. Your answer must include the term Compensating Transaction.
Answer: 'A saga handles failure by executing compensating transactions, which are 'undo' operations for a step that has already succeeded. If a step fails, the saga must roll back all preceding work by running the compensating transactions of the already-completed steps in reverse order.'
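The rule "compensate the completed steps in reverse order" fits in a few lines. This is a minimal sketch, assuming each step is a pair of plain callables (the local transaction and its compensation) and that a failed step has already rolled back its own local transaction, so only earlier steps need undoing. `SagaStep` and `run_saga` are hypothetical names; a production saga would also persist its progress so it can resume after a crash.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # the local transaction
    compensate: Callable[[], None]  # its "undo" (compensating transaction)

def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step rolled back its own local transaction, so only
            # the steps that already succeeded are undone, newest first.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True

log: list[str] = []

def create_shipment() -> None:
    raise RuntimeError("item out of stock")  # simulated failure of step 3

steps = [
    SagaStep("CreateOrder", lambda: log.append("CreateOrder"),
             lambda: log.append("CancelOrder")),
    SagaStep("TakePayment", lambda: log.append("TakePayment"),
             lambda: log.append("RefundPayment")),
    SagaStep("CreateShipment", create_shipment,
             lambda: log.append("CancelShipment")),  # not needed: this step never succeeded
]

succeeded = run_saga(steps)
print(succeeded)  # False
print(log)  # ['CreateOrder', 'TakePayment', 'RefundPayment', 'CancelOrder']
```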
Our E-Commerce Example (Failure):
Let's say the ShippingService fails because the item is out of stock.
- OrderService -> CreateOrder: Success
- PaymentService -> TakePayment: Success
- ShippingService -> CreateShipment: FAILURE!
The Saga (whether orchestrated or choreographed) must now go into 'rollback' mode:
- Run PaymentService's compensation: RefundPayment. This 'undoes' step 2.
- Run OrderService's compensation: CancelOrder (e.g., set the order status to 'Cancelled'). This 'undoes' step 1.
The system is now back in a consistent (though not 'successful') state. The customer has been refunded, and the order is cancelled.
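The same rollback can happen without a coordinator. In a choreographed saga, the failing service emits a failure event, and each earlier service listens for the event that tells it to compensate. This sketch uses a synchronous in-memory dispatcher; the event names SHIPMENT_FAILED, PAYMENT_REFUNDED, and ORDER_CANCELLED and the per-service dictionaries are hypothetical.

```python
from collections import defaultdict
from typing import Callable

handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
events: list[str] = []  # every event published, in order

def publish(event_type: str, payload: dict) -> None:
    events.append(event_type)
    for handler in handlers[event_type]:
        handler(payload)

# Hypothetical state owned by each service after steps 1 and 2 succeeded.
orders = {123: "CREATED"}
payments = {123: "CHARGED"}

def shipping_on_payment_processed(payload: dict) -> None:
    # ShippingService cannot create the shipment (e.g. out of stock).
    publish("SHIPMENT_FAILED", payload)

def payment_on_shipment_failed(payload: dict) -> None:
    payments[payload["order_id"]] = "REFUNDED"  # compensation: RefundPayment
    publish("PAYMENT_REFUNDED", payload)

def order_on_payment_refunded(payload: dict) -> None:
    orders[payload["order_id"]] = "CANCELLED"  # compensation: CancelOrder
    publish("ORDER_CANCELLED", payload)

handlers["PAYMENT_PROCESSED"].append(shipping_on_payment_processed)
handlers["SHIPMENT_FAILED"].append(payment_on_shipment_failed)
handlers["PAYMENT_REFUNDED"].append(order_on_payment_refunded)

publish("PAYMENT_PROCESSED", {"order_id": 123})
print(events)  # ['PAYMENT_PROCESSED', 'SHIPMENT_FAILED', 'PAYMENT_REFUNDED', 'ORDER_CANCELLED']
print(orders[123], payments[123])  # CANCELLED REFUNDED
```

The compensations still run in reverse order (payment first, then order), but here the ordering comes from which event each service listens for rather than from a central loop.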
Key Design Point:
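For every 'local transaction' you add to a saga (e.g., TakePayment), you must also design and build its corresponding 'compensating transaction' (e.g., RefundPayment). A compensating transaction must be idempotent (running it 5 times has the same effect as running it once), because it may be retried after timeouts or duplicate messages. It should also be designed so that it cannot fail for business reasons; if it fails for a technical reason, it is simply retried until it succeeds.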
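A minimal sketch of an idempotent RefundPayment, assuming the service records refunds keyed by order ID. `PaymentService`, `refund_payment`, and `provider_calls` (a counter standing in for calls to an external payment provider) are hypothetical. In a real service, the check-and-record would be protected by a unique constraint or a database transaction so that two concurrent retries cannot both refund.

```python
class PaymentService:
    """Hypothetical service illustrating an idempotent compensating transaction."""

    def __init__(self) -> None:
        self.charges: dict[int, float] = {123: 42.0}  # order_id -> amount charged
        self.refunds: dict[int, float] = {}           # order_id -> amount refunded
        self.provider_calls = 0  # stands in for calls to an external payment provider

    def refund_payment(self, order_id: int) -> None:
        # Already refunded: do nothing, so retries and redelivered messages are safe.
        if order_id in self.refunds:
            return
        amount = self.charges.get(order_id)
        if amount is None:
            # Nothing was charged, so there is nothing to undo; succeed instead of failing.
            return
        self.provider_calls += 1
        self.refunds[order_id] = amount

payments = PaymentService()
for _ in range(5):  # e.g. the saga retries after timeouts
    payments.refund_payment(123)
payments.refund_payment(999)  # no charge recorded: a harmless no-op

print(payments.refunds, payments.provider_calls)  # {123: 42.0} 1
```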
The Big Trade-off: ACID vs. BASE (Eventual Consistency)
Interview Question: 'Sagas sound complex. What's the big trade-off you make?'
This is the final, high-level theory question. Your answer proves you understand the consequences of this pattern.
Answer: 'You are trading ACID transactions for BASE properties. The most important part of this is that you are giving up immediate consistency for eventual consistency.'
ACID (Monolith)
- Atomic
- Consistent
- Isolated
- Durable
When you hit 'Commit', the data is 100% consistent for everyone, instantly. This is strong consistency.
BASE (Microservices / Sagas)
- Basically Available (The system is highly available)
- Soft State (The state can change over time)
- Eventual Consistency
What is Eventual Consistency?
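It means there is a period of time (usually short) during which the data across your services is inconsistent. Once the saga finishes and no new updates arrive, every service converges to the same, consistent state.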
Example:
- 10:00:00 AM: OrderService creates the order.
- 10:00:01 AM: PaymentService takes the payment.
- 10:00:02 AM: ... The saga is in flight. The ShippingService has not been called yet.
At 10:00:02 AM, what happens if the customer looks at their dashboard? The OrderService says 'Paid', but the ShippingService says 'Not Shipped'. The data is temporarily inconsistent!
By 10:00:03 AM, the ShippingService has been called and the data has become consistent (it shows 'Shipped').
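The timeline above can be replayed in a few lines. This is a toy sketch: the two dictionaries stand in for data owned by OrderService and ShippingService, and `dashboard` is a hypothetical read that queries each service separately, as a dashboard spanning several services would have to.

```python
from typing import Callable

# Hypothetical state owned by two different services.
order_status = {123: "Created"}
shipping_status = {123: "Not Shipped"}

def dashboard(order_id: int) -> tuple[str, str]:
    # Reads each service separately; there is no single snapshot across services.
    return order_status[order_id], shipping_status[order_id]

def mark_paid() -> None:
    order_status[123] = "Paid"

def ship() -> None:
    shipping_status[123] = "Shipped"

saga_steps: list[Callable[[], None]] = [mark_paid, ship]

views = [dashboard(123)]
for step in saga_steps:
    step()  # one local transaction at a time
    views.append(dashboard(123))

print(views)
# [('Created', 'Not Shipped'), ('Paid', 'Not Shipped'), ('Paid', 'Shipped')]
```

The middle view is the temporarily inconsistent, in-flight state; the last view is the converged one.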
The simple answer: 'The trade-off is giving up the immediate, strong consistency of ACID for eventual consistency. This is a business-level decision. You have to accept that for a few seconds, the state of the system might be inconsistent, but it will eventually resolve. This is the price you pay for the scalability and resilience of microservices.'


