Retry Pattern
Learn the Retry design pattern and its usage.
Intent
The Retry pattern enables applications to handle anticipated transient failures by transparently retrying a failed operation, in the expectation that a later attempt will succeed. It is also known as the Transient Fault Handling pattern.
Context and problem
By their very nature, integration applications interact with other systems over the network. With dynamic cloud-based environments becoming the norm and the microservices architectural style partitioning applications into more granular services, successful service communication has become a fundamental prerequisite for many distributed applications. Services that communicate with other services must be able to handle transient failures in downstream systems transparently and continue operating without disruption. Transient failures include infrastructure-level faults, loss of network connectivity, timeouts, and throttling applied by busy services. These conditions occur infrequently and are typically self-correcting, so retrying the operation usually succeeds.
Forces and solution
Reproducing and explaining transient failures can be difficult, because they may be caused by a combination of factors that occur irregularly and involve external systems. Tools such as “Chaos Monkey” can be used to simulate unpredictable system outages and test the application's resiliency if needed. A good strategy for dealing with transient failures is to retry the operation and hope that it will succeed (if the error is truly transient, it will succeed; keep calm and keep retrying).
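The retry strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the `TransientError` class, the `retry` helper, and its `max_attempts`, `delay_seconds`, and `sleep` parameters are hypothetical names introduced here, and the caller is assumed to know which failures count as transient.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures expected to clear on their own."""


def retry(operation, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is used up.

    Only TransientError triggers another attempt; any other exception
    propagates immediately, because retrying would not help.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # attempts exhausted; surface the last failure
            sleep(delay_seconds)  # fixed pause before the next attempt


# Usage: an operation that fails twice with a simulated timeout, then succeeds.
attempts_seen = {"count": 0}


def flaky_call():
    attempts_seen["count"] += 1
    if attempts_seen["count"] < 3:
        raise TransientError("simulated timeout")
    return "ok"


result = retry(flaky_call, max_attempts=5)
```

The caller sees only the final outcome, so the retries stay transparent. The `sleep` function is passed in as a parameter so that tests can replace it and run without waiting.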
There are a few areas to consider when implementing retry logic.
- Which failures to retry: Certain service operations, such as HTTP calls and relational database interactions, are potential candidates for retry logic, but further analysis is needed before implementing it. A relational database may reject a connection attempt because it is throttling excessive resource usage, or reject an SQL insert operation because of a concurrent modification. Retrying in these situations could be successful. If a relational database rejects a connection because of wrong credentials, or an SQL insert operation fails because of a foreign key constraint, retrying the operation will not help. Similarly, retrying after a connection or response timeout may help with HTTP calls, but retrying a SOAP Fault caused by a business error does not make any sense. So, we must choose our retries carefully.
- How often to retry: Once the need for a retry has been identified, the specific retry policy must be tuned to suit the nature of both applications: the service consumer with the retry logic and the service provider with the transient failure. For example, suppose a real-time integration service fails to process a request. In that case, ...