Failures in the Two-Phase Commit Protocol

Learn how 2PC behaves under node or network faults.

In a distributed system, a node (the coordinator or a cohort) can fail at any point during a 2PC transaction. The network can also delay or lose messages, or partition the participants. In this lesson, we will learn how 2PC behaves when different faults occur.

Cohort failure

Let's analyze a few failure scenarios. For instance, in the illustration below, if one of the cohorts fails in the prepare phase, the coordinator cannot commit because it needs affirmative votes from all cohorts. So, the coordinator will abort the transaction if any cohort is unavailable (that is, does not respond before the timeout). This requirement adversely affects the system's availability, as the failure of a single node can prevent transactions from committing.

Cohort failure
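
The coordinator's rule in the prepare phase can be sketched in a few lines of Python. This is only an illustration, not a full 2PC implementation: the names `Vote`, `Decision`, and `decide` are hypothetical, and a cohort that times out is assumed to be represented by a missing entry in the `votes` mapping.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def decide(votes, cohorts):
    """Return the coordinator's decision after the prepare phase.

    `votes` maps a cohort name to its Vote. A cohort that did not
    respond before the timeout has no entry (assumption of this sketch).
    """
    for cohort in cohorts:
        # A missing vote (timeout) is treated the same as an explicit NO.
        if votes.get(cohort) is not Vote.YES:
            return Decision.ABORT
    # Every cohort voted YES, so the transaction may commit.
    return Decision.COMMIT
```

For example, `decide({"c1": Vote.YES}, ["c1", "c2"])` returns `Decision.ABORT` because `c2` never answered, which is how a single unavailable cohort blocks a commit.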

Coordinator failure

2PC ensures that all transaction cohorts agree on committing or aborting the transaction. However, if the coordinator crashes before sending the commit request (that is, after completing the prepare phase but before starting the commit phase), cohorts are left uncertain.

In the event of a coordinator failure before sending prepare requests, cohorts can safely abort the transaction. However, if a cohort has already received a prepare request and voted affirmatively, it must wait for the coordinator's decision on whether to commit or abort the transaction. In case of a coordinator crash, the cohort is left in doubt and cannot unilaterally abort or commit.
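
The cohort's side of this rule can be sketched as a small state check. The state names and the `on_coordinator_timeout` function below are hypothetical and chosen for illustration. The sketch assumes a cohort detects a coordinator failure only through a timeout.

```python
from enum import Enum


class CohortState(Enum):
    INIT = "init"            # no prepare request received yet
    PREPARED = "prepared"    # voted yes, waiting for the coordinator's decision
    COMMITTED = "committed"
    ABORTED = "aborted"


def on_coordinator_timeout(state):
    """Return the cohort's next state when the coordinator stops responding."""
    if state is CohortState.INIT:
        # The cohort has not voted yet, so aborting unilaterally is safe.
        return CohortState.ABORTED
    # A PREPARED cohort is in doubt: it can neither commit nor abort on
    # its own, so it keeps waiting. Final states stay as they are.
    return state
```

Here, `on_coordinator_timeout(CohortState.PREPARED)` returns `CohortState.PREPARED`: the cohort remains blocked until it learns the coordinator's decision.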

The scenario is depicted in the figure below. In this instance, the coordinator is determined to commit, and Cohort 1 gets the commit request. ...