CODE CRAFT

A Concise Guide to Transient Fault Handling Application Block

Transient errors are intermittent errors caused by a short-lived outage of a specific resource or service. In most cases, if you retry the operation after a few seconds, the error disappears. Transient errors are often beyond the control of the application programmer. However, every attempt should be made to make the application robust enough to handle them.

 

In my previous post, Transient Errors Are Evil – How to Handle them With Exponential Backoff in C#, we looked at simple retry logic that retried transient errors with exponential back-off. In this post, we’re going to solve the same problem in a fancier way using the Transient Fault Handling Application Block, a.k.a. Topaz. Looking at the Topaz information page on MSDN, one might get the idea that you can only use Topaz against Azure cloud services. This is not true, and in this post we’re going to see how Topaz can be used to tackle transient faults in ANY application.

 

Components of Transient Fault Handling Block

The Transient Fault Handling Application Block makes your application more robust by providing the logic for handling transient faults. This logic is provided by specifying two things – a detection strategy and a retry strategy.

Detection Strategy: The detection strategy identifies which errors are transient and should therefore be retried. This is typically done by creating a class that implements the ITransientErrorDetectionStrategy interface. For example, in the case of a WebRequest, I might want to retry on all web exceptions, or only on web exceptions that return a specific error code. Note that if you’re using Azure services, you should probably use one of the built-in detection strategies.

Retry Strategy: The retry strategy specifies how many times to retry the failed operation and at what intervals. The built-in retry strategies allow you to specify that retries should happen at fixed intervals, at intervals that increase by the same amount each time, or at intervals that increase exponentially with some random variation. The following table from MSDN shows examples of all three strategies.

[Table from MSDN: Retry Strategy examples]
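As a rough sketch, the three built-in strategies can be constructed as shown below. The class name RetryStrategyExamples, the method name BuiltInStrategies and all retry counts and intervals are illustrative values picked for this example, not figures taken from MSDN.

```csharp
using System;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

static class RetryStrategyExamples
{
    // Hypothetical helper that builds one instance of each built-in strategy.
    public static RetryStrategy[] BuiltInStrategies()
    {
        // Fixed interval: up to 3 retries, each scheduled 2 seconds apart.
        RetryStrategy fixedInterval = new FixedInterval(3, TimeSpan.FromSeconds(2));

        // Incremental: up to 3 retries, starting at 1 second and growing by
        // 2 seconds with every further retry.
        RetryStrategy incremental = new Incremental(3,
            TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));

        // Exponential back-off: up to 3 retries, bounded between 2 and 20 seconds,
        // with a 1 second delta that is randomized on each retry.
        RetryStrategy exponential = new ExponentialBackoff(3,
            TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(1));

        return new[] { fixedInterval, incremental, exponential };
    }

    static void Main()
    {
        foreach (RetryStrategy strategy in BuiltInStrategies())
            Console.WriteLine(strategy.GetType().Name);
    }
}
```

Any of the three can be handed to a retry policy; only the spacing between attempts changes.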

Finally, the detection strategy and the retry strategy are combined into a retry policy, which determines which errors should be retried and how often they should be retried.

RETRY POLICY = DETECTION STRATEGY + RETRY STRATEGY

Step By Step Guide To Using The Transient Fault Handling Application Block

Step # 1: Add the Transient Fault Handling Application Block to Your Solution – follow the MSDN guidelines for obtaining and installing the NuGet package.

Step # 2: Define the Detection Strategy

In our case, we'll retry on any web exception. To do this, we define a class that implements the ITransientErrorDetectionStrategy interface.

class WebExceptionDetectionStrategy : ITransientErrorDetectionStrategy
{
    // Treat every WebException as transient; any other exception is not retried.
    public bool IsTransient(Exception ex)
    {
        return ex is WebException;
    }
}
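If retrying on every web exception is too broad, a narrower strategy might look something like the sketch below. The class name SelectiveWebExceptionDetectionStrategy and the particular statuses treated as transient (timeouts, connection failures, HTTP 503 and 504) are assumptions made for this example; pick the conditions that make sense for the service you are calling.

```csharp
using System;
using System.Net;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Hypothetical variant: only some web exceptions are considered transient.
class SelectiveWebExceptionDetectionStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex)
    {
        var webException = ex as WebException;
        if (webException == null)
            return false;   // not a web exception at all, so never retry

        // Network-level failures: treat timeouts and connection failures as transient.
        if (webException.Status == WebExceptionStatus.Timeout ||
            webException.Status == WebExceptionStatus.ConnectFailure)
            return true;

        // The server answered with an error: inspect the HTTP status code.
        var response = webException.Response as HttpWebResponse;
        if (response == null)
            return false;

        // Retry only on codes that usually signal a temporary condition.
        return response.StatusCode == HttpStatusCode.ServiceUnavailable   // 503
            || response.StatusCode == HttpStatusCode.GatewayTimeout;      // 504
    }
}
```

With this in place, something like a 404 or a 401 fails immediately instead of being retried three times for no benefit.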

Step # 3: Define the Retry Strategy

We’ll define our retry strategy so that it uses a randomized exponential back-off algorithm and retries up to 3 times.

var retryStrategy = new ExponentialBackoff(3, TimeSpan.FromSeconds(2),
                        TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(1));

The first parameter specifies the number of retries before failing the operation. The next two parameters specify the minimum and maximum back-off time respectively. Finally, the last parameter is the delta used to grow the delay between retries; it is randomized by +/- 20% to avoid numerous clients all retrying simultaneously.
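To get a feel for how these four numbers interact, here is a small stand-alone sketch that does not use Topaz at all. It assumes the delay is computed as min(minBackoff + (2^retry - 1) * jitteredDelta, maxBackoff), with the delta jittered to between 80% and 120% of its value. That formula, the BackoffSketch class and the Delay method are assumptions for illustration, not the library's documented behavior.

```csharp
using System;

static class BackoffSketch
{
    // Assumed formula: min(min + (2^retry - 1) * jitteredDelta, max),
    // where jitteredDelta is drawn from 80%..120% of delta.
    // "retry" is zero-based: 0 for the first retry, 1 for the second, and so on.
    public static TimeSpan Delay(int retry, TimeSpan min, TimeSpan max,
                                 TimeSpan delta, Random random)
    {
        double jitteredDeltaMs = delta.TotalMilliseconds * (0.8 + 0.4 * random.NextDouble());
        double delayMs = min.TotalMilliseconds + (Math.Pow(2, retry) - 1) * jitteredDeltaMs;

        // Never wait longer than the configured maximum.
        return TimeSpan.FromMilliseconds(Math.Min(delayMs, max.TotalMilliseconds));
    }

    static void Main()
    {
        var random = new Random();
        for (int retry = 0; retry < 3; retry++)
        {
            TimeSpan delay = Delay(retry, TimeSpan.FromSeconds(2),
                TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(1), random);
            Console.WriteLine("Retry {0}: wait about {1:F1}s", retry + 1, delay.TotalSeconds);
        }
    }
}
```

With the values from this post, the sketch yields a wait of 2 seconds for the first retry, somewhere between 2.8 and 3.2 seconds for the second, and between 4.4 and 5.6 seconds for the third. The 20 second cap only kicks in if you allow more retries.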

Step # 4: Combine the retry strategy and detection strategy into a retry policy

var retryPolicy = new RetryPolicy<WebExceptionDetectionStrategy>(retryStrategy);

Step # 5: Call the ExecuteAction method on the retryPolicy object with your custom operation passed in as a delegate.

retryPolicy.ExecuteAction(() => ExecuteHTTPGET("https://microsoft.sharepoint.com"));
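Two optional extras are worth knowing about at this point: the policy raises a Retrying event before each retry, and ExecuteAction has an overload that hands the delegate's return value back to the caller. The sketch below shows both. The names RetryLoggingExample, BuildPolicy and GetStatusCode are hypothetical helpers invented for this example, and the detection strategy is repeated so that the snippet compiles on its own.

```csharp
using System;
using System.Net;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Same detection strategy as in Step # 2, repeated to keep the sketch self-contained.
class WebExceptionDetectionStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex) { return ex is WebException; }
}

static class RetryLoggingExample
{
    // Hypothetical helper: builds the policy and wires up retry logging.
    public static RetryPolicy BuildPolicy()
    {
        var retryStrategy = new ExponentialBackoff(3, TimeSpan.FromSeconds(2),
            TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(1));
        var retryPolicy = new RetryPolicy<WebExceptionDetectionStrategy>(retryStrategy);

        // Raised before each retry; handy for logging what went wrong and how long we wait.
        retryPolicy.Retrying += (sender, args) =>
            Console.WriteLine("Retry {0} after {1}: {2}",
                args.CurrentRetryCount, args.Delay, args.LastException.Message);

        return retryPolicy;
    }

    // Hypothetical helper: returns the HTTP status code of a GET request.
    static int GetStatusCode(string requestUri)
    {
        var request = (HttpWebRequest)WebRequest.Create(requestUri);
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return (int)response.StatusCode;
        }
    }

    static void Main()
    {
        RetryPolicy retryPolicy = BuildPolicy();

        // The Func<TResult> overload returns whatever the delegate returns.
        int statusCode = retryPolicy.ExecuteAction<int>(
            () => GetStatusCode("https://microsoft.sharepoint.com"));
        Console.WriteLine(statusCode);
    }
}
```

If all retries are exhausted, the last exception is rethrown to the caller, so you still need a try/catch around the call.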

Putting it all together

The complete code listing is given below for reference:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Net;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

namespace TOPAZ
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                // Define the Retry Strategy
                var retryStrategy = new ExponentialBackoff(3, TimeSpan.FromSeconds(2),
                    TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(1));

                // Define the Retry Policy (detection strategy + retry strategy)
                var retryPolicy = new RetryPolicy<WebExceptionDetectionStrategy>(retryStrategy);

                // Execute the Action
                retryPolicy.ExecuteAction(() => ExecuteHTTPGET("https://microsoft.sharepoint.com"));
               
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
                throw;
            }

        }

        //HTTP GET Operation we want to retry multiple times
        static void ExecuteHTTPGET(string requestUri)
        {
            Console.WriteLine(DateTime.Now);
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUri);
            request.KeepAlive = false;
            request.Method = "GET";

            HttpWebResponse webResponse = (HttpWebResponse)request.GetResponse();
            int requestStatus = (int)webResponse.StatusCode;
            webResponse.Close();
        }

    }

    //The Detection Strategy
    class WebExceptionDetectionStrategy : ITransientErrorDetectionStrategy
    {
        // Treat every WebException as transient; any other exception is not retried.
        public bool IsTransient(Exception ex)
        {
            return ex is WebException;
        }
    }
}

When To Use Topaz Instead Of Custom Retry Logic?

Topaz is preferable to custom retry logic whenever you're using Azure services or want a clear separation between your detection strategy and retry strategy. There's a little overhead compared to custom retry logic, but the code clarity and the built-in detection and retry strategies make it worthwhile.