
Techniques to Implement Fault Tolerance and Resilience in ASP.NET Applications

Fault tolerance and fault resilience are two related concepts that describe how an application handles failures and recovers from them.


Fault tolerance means that users of the service never observe a fault, although they may observe delays.


Fault resilience means that a fault may be observed, but it affects only uncommitted data. For example, the database may respond with an error when the application attempts to commit a transaction.


Techniques to Implement Fault Tolerance and Resilience in ASP.NET Applications

There are different techniques to implement fault tolerance and resilience in ASP.NET applications, depending on the type and source of the failure. Some of the common techniques are:

  1. Retry Pattern

  2. Circuit Breaker Pattern

  3. Timeout Pattern

  4. Rate-limiting Pattern

  5. Bulkhead Pattern

  6. Infrastructure-based Resiliency

  7. Fallback Pattern

Technique 1: Retry Pattern

This pattern involves retrying a failed operation a certain number of times, with a delay between each attempt, until it succeeds or reaches a limit. This can help you to overcome temporary failures, such as network issues or temporary unavailability of a service. The Polly library is a popular .NET library that provides various retry policies for HTTP requests.


Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, Rate-limiting, and Fallback in a fluent and thread-safe manner. Depending on the version, Polly targets .NET Framework 4.x and .NET Standard 1.0, 1.1, and 2.0 (the latter covers .NET Core and later). Polly can be integrated with IHttpClientFactory, the factory abstraction for creating HttpClient instances in ASP.NET applications, through extension methods that attach policies to the HttpClient objects the factory creates. Polly also supports asynchronous and synchronous execution, event hooks, custom policy types, and a policy registry.
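
As a rough illustration of that integration, the sketch below registers a named HttpClient and attaches a retry policy to it. It assumes the Microsoft.Extensions.Http.Polly NuGet package is referenced; the client name "catalog" and the base address are placeholders invented for this example. In an ASP.NET application the same calls would typically be made on builder.Services rather than on a standalone ServiceCollection.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;

// Standalone container for illustration; ASP.NET apps would use builder.Services.
var services = new ServiceCollection();

// "catalog" and the base address are placeholder values (assumptions).
services.AddHttpClient("catalog", client =>
    {
        client.BaseAddress = new Uri("https://example.com/");
    })
    // Reacts to HttpRequestException, HTTP 5xx, and HTTP 408 responses.
    .AddTransientHttpErrorPolicy(policy =>
        policy.WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))));

using var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();

// Every request sent through this client now passes through the retry policy.
HttpClient httpClient = factory.CreateClient("catalog");
```

Attaching the policy at registration time keeps the retry logic out of the calling code, which only asks the factory for the named client.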


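The following code shows how to use Polly to retry an HTTP request up to three times with exponential backoff:

using System;
using System.Net.Http;
using Polly;

// Create a Polly policy that retries with exponential backoff (2, 4, then 8 seconds).
// It reacts only to HttpRequestException; error status codes such as 503 are returned as-is.
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, retryAttempt =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));

// Use the policy to execute an HTTP request
// (httpClient is an existing HttpClient instance, for example one created by IHttpClientFactory)
var response = await retryPolicy.ExecuteAsync(async () =>
{
    // Build a new message for every attempt, because a request message cannot be sent twice
    var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/api");
    return await httpClient.SendAsync(request);
});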

Advantages of Retry Pattern:

  1. Improves reliability and availability by overcoming temporary errors or service unavailability.

  2. Reduces the impact of failures on the user experience by hiding or minimizing errors.

  3. Easy to implement using existing libraries or frameworks, such as Polly.

Disadvantages of Retry Pattern:

  1. It can increase the load on the service and the latency of operations.

  2. It sends additional requests, which can put more pressure on a struggling service and cause further errors.

  3. If the operation is not idempotent, or if the error is not transient, retrying can cause unexpected results.
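
One common way to soften the extra load is to add a random jitter to each delay, so that many clients that failed at the same moment do not all retry at the same moment. The sketch below is only an illustration: BackoffWithJitter and maxJitterMs are hypothetical names, and the one-second jitter ceiling is an arbitrary assumption.

```csharp
using System;
using System.Collections.Generic;

// Returns 2^attempt seconds plus a random extra delay below maxJitterMs milliseconds.
static TimeSpan BackoffWithJitter(int attempt, Random random, int maxJitterMs = 1000)
{
    var exponential = TimeSpan.FromSeconds(Math.Pow(2, attempt));
    var jitter = TimeSpan.FromMilliseconds(random.Next(0, maxJitterMs));
    return exponential + jitter;
}

// Compute the delays for three attempts (fixed seed only to keep the demo repeatable).
var random = new Random(42);
var delays = new List<TimeSpan>();
for (int attempt = 1; attempt <= 3; attempt++)
{
    delays.Add(BackoffWithJitter(attempt, random));
}

foreach (var delay in delays)
{
    Console.WriteLine($"Next retry in {delay.TotalSeconds:F2} s");
}
```

A function like this can be passed as the delay provider of WaitAndRetryAsync, for example `attempt => BackoffWithJitter(attempt, random)`, in place of the plain exponential formula.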


Technique 2: Circuit Breaker Pattern

This pattern involves breaking the connection between a client and a service when the service is failing or overloaded, and preventing further requests until the service recovers. This can help avoid cascading failures and reduce the load on the service. The Polly library also provides circuit breaker policies for HTTP requests.


For example, the following code shows how to use Polly to break the circuit after two consecutive failures and wait for 10 seconds before allowing another request:

// Create a Polly policy that breaks the circuit after two consecutive failures.
// Reuse the same policy instance across calls, because the circuit state lives in the instance.
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(2, TimeSpan.FromSeconds(10));

// Use the policy to execute an HTTP request.
// While the circuit is open, ExecuteAsync throws BrokenCircuitException without calling the service.
var response = await circuitBreakerPolicy.ExecuteAsync(async () =>
{
    var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/api");
    return await httpClient.SendAsync(request);
});


Advantages of Circuit Breaker Pattern:

  1. Can prevent cascading failures and reduce the load on a failing service.

  2. Can be combined with a fallback response or action for the time a service is unavailable.

Disadvantages of Circuit Breaker Pattern:

  1. May increase the complexity and code required in the application layer by introducing an intermediary service or component.

  2. Requires careful tuning and monitoring of parameters such as failure threshold, timeout duration, and recovery test frequency.
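
To make parameters such as the failure threshold and the break duration more concrete, here is a hand-written sketch of the closed, open, and half-open states behind the pattern. It is not how Polly implements its circuit breaker; SimpleCircuitBreaker, CircuitOpenException, and the injected clock are hypothetical names for illustration, and the class is deliberately not thread-safe.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Thrown when a call is rejected because the circuit is open.
public sealed class CircuitOpenException : Exception
{
    public CircuitOpenException() : base("Circuit is open; call rejected.") { }
}

// Minimal, single-threaded illustration of the circuit breaker state machine.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTime> _clock; // injected so the behavior can be tested
    private int _consecutiveFailures;
    private DateTime _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan breakDuration, Func<DateTime> clock)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
        _clock = clock;
    }

    public T Execute<T>(Func<T> action)
    {
        if (State == CircuitState.Open)
        {
            // Reject immediately until the break duration has elapsed.
            if (_clock() - _openedAt < _breakDuration) throw new CircuitOpenException();
            State = CircuitState.HalfOpen; // allow one trial call through
        }

        try
        {
            T result = action();
            _consecutiveFailures = 0;
            State = CircuitState.Closed; // success closes the circuit
            return result;
        }
        catch
        {
            _consecutiveFailures++;
            // A failed trial call, or too many consecutive failures, opens the circuit.
            if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
            {
                State = CircuitState.Open;
                _openedAt = _clock();
            }
            throw;
        }
    }
}
```

With a threshold of 2 and a break duration of 10 seconds, two failed calls in a row open the circuit, calls during the next 10 seconds are rejected without reaching the service, and the first call after that acts as a trial that either closes or reopens the circuit.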


Retry Pattern and Circuit Breaker Pattern are both provided by the Polly library, so it helps to understand the difference between them:

Retry Pattern | Circuit Breaker Pattern
--- | ---
It is based on the expectation that the operation will eventually succeed | It is based on preventing an operation that is likely to fail
It can improve the reliability and availability of an application by overcoming temporary errors | It can improve the stability and performance of an application by protecting it from operations that are likely to fail
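
Because the two patterns address different situations, they are often combined. The sketch below wraps a circuit breaker in a retry policy using Polly's PolicyWrap. It assumes the Polly 7.x API used elsewhere in this article; the failing delegate stands in for a real HTTP call, and the short retry delays are chosen only to keep the demo fast.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Inner policy: open the circuit for 10 seconds after two consecutive failures.
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(2, TimeSpan.FromSeconds(10));

// Outer policy: retry up to three times with a short, growing delay.
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(100 * attempt));

// The first argument is the outermost policy, so each retry passes through the breaker.
var resilientPolicy = Policy.WrapAsync(retryPolicy, circuitBreakerPolicy);

int calls = 0;
bool circuitBroke = false;
try
{
    await resilientPolicy.ExecuteAsync(async () =>
    {
        calls++;
        await Task.CompletedTask;
        // Simulated failing dependency (stand-in for a real HTTP call).
        throw new HttpRequestException("simulated failure");
    });
}
catch (BrokenCircuitException)
{
    // The retry policy does not handle this exception type, so retrying stops here.
    circuitBroke = true;
    Console.WriteLine($"Circuit opened after {calls} calls to the dependency.");
}
```

In this arrangement the retry policy deals with short glitches, while the breaker stops the retries from hammering a dependency that keeps failing.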


Technique 3: Timeout Pattern

This pattern involves placing a limit on the duration for which a caller can wait for a response from a service. This can help avoid wasting resources and blocking threads on long-running or hanging operations. Here again, the Polly library is used to provide timeout policies for HTTP requests.


For example, the following code shows how to use Polly to set a timeout of 10 seconds for an HTTP request:

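// Create a Polly policy that sets a timeout of 10 seconds
var timeoutPolicy = Policy.TimeoutAsync(10);

// Use the policy to execute an HTTP request.
// Polly's default (optimistic) timeout works through the CancellationToken it hands to the delegate,
// so the token must be passed on to SendAsync. On timeout, ExecuteAsync throws TimeoutRejectedException.
var response = await timeoutPolicy.ExecuteAsync(async ct =>
{
    var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/api");
    return await httpClient.SendAsync(request, ct);
}, CancellationToken.None); // CancellationToken comes from System.Threading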

Advantages of Timeout Pattern:

  1. It aborts requests that are unlikely to succeed within an acceptable response time.

  2. Can also prevent cascading failures by stopping requests that are held up by an overloaded service.

  3. Can be combined with a fallback response.

Disadvantages of Timeout Pattern:

  1. Requires tuning and monitoring of parameters such as duration, frequency, and scope.

  2. It can cause unexpected results, for example when the caller gives up on an operation that still completes on the server.
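
The mechanism behind a cooperative timeout can be sketched with the base class library alone. The example below is an illustration rather than Polly's implementation; WithTimeoutAsync and SlowOperationAsync are hypothetical helpers. It shows why the operation has to observe the cancellation token for the timeout to take effect.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for a slow call; Task.Delay observes the token and stops when it is cancelled.
static async Task<string> SlowOperationAsync(TimeSpan duration, CancellationToken ct)
{
    await Task.Delay(duration, ct);
    return "done";
}

// Runs the operation with a token that is cancelled automatically after the timeout.
static async Task<string> WithTimeoutAsync(
    Func<CancellationToken, Task<string>> operation, TimeSpan timeout)
{
    using var cts = new CancellationTokenSource(timeout);
    try
    {
        return await operation(cts.Token);
    }
    catch (OperationCanceledException) when (cts.IsCancellationRequested)
    {
        throw new TimeoutException($"Operation exceeded {timeout.TotalMilliseconds} ms.");
    }
}

// Finishes well within the limit.
string fast = await WithTimeoutAsync(
    ct => SlowOperationAsync(TimeSpan.FromMilliseconds(10), ct), TimeSpan.FromSeconds(5));

// Exceeds the limit and is cancelled.
bool timedOut = false;
try
{
    await WithTimeoutAsync(
        ct => SlowOperationAsync(TimeSpan.FromSeconds(5), ct), TimeSpan.FromMilliseconds(50));
}
catch (TimeoutException)
{
    timedOut = true;
}

Console.WriteLine($"fast = {fast}, timedOut = {timedOut}");
```

An operation that ignores the token would keep running past the limit, which is why the token should be forwarded to every cancellable call inside the delegate.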


Technique 4: Rate-limiting Pattern

This pattern involves limiting the number of requests that can be sent or received by a service per unit of time. This can help prevent overloading or exhausting the resources of a service. The Polly library provides rate-limiting policies for HTTP requests.


For example, the following code shows how to use Polly to limit the number of requests to 100 per minute:

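// Create a Polly policy that limits the rate to 100 executions per minute
// (the rate-limit policy is available in recent Polly 7.x releases)
var rateLimitPolicy = Policy
    .RateLimitAsync(100, TimeSpan.FromMinutes(1));

// Use the policy to execute an HTTP request.
// When the rate is exceeded, ExecuteAsync throws RateLimitRejectedException instead of sending the request.
var response = await rateLimitPolicy.ExecuteAsync(async () =>
{
    var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/api");
    return await httpClient.SendAsync(request);
});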

Advantages of Rate-limiting Pattern:

  1. Protects the service from denial-of-service (DoS) attacks or abusive behavior by rejecting requests that exceed the allowed rate.

  2. It can prevent resource starvation or degradation.

  3. It can enforce fair usage policies or business rules by differentiating between different types or levels of clients or users.

Disadvantages of Rate-limiting Pattern:

  1. It can negatively affect the user experience and satisfaction by blocking legitimate requests or causing errors.

  2. It can require tuning and monitoring of the rate-limiting parameters, such as rate, time interval, and scope.
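
The parameters named above (rate, time interval, scope) can be seen in a hand-written fixed-window limiter. This is a simplified sketch and not Polly's algorithm; FixedWindowLimiter and its injected clock are hypothetical names, and one instance represents one scope, for example one client or one endpoint.

```csharp
using System;

// Allows at most `limit` acquisitions per window; the counter resets when a new window starts.
public sealed class FixedWindowLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Func<DateTime> _clock; // injected so the behavior can be tested
    private readonly object _gate = new object();
    private DateTime _windowStart;
    private int _count;

    public FixedWindowLimiter(int limit, TimeSpan window, Func<DateTime> clock)
    {
        _limit = limit;
        _window = window;
        _clock = clock;
        _windowStart = clock();
    }

    // Returns true if the caller may proceed, false if the request should be rejected.
    public bool TryAcquire()
    {
        lock (_gate)
        {
            DateTime now = _clock();
            if (now - _windowStart >= _window)
            {
                // A new window has begun: reset the counter.
                _windowStart = now;
                _count = 0;
            }

            if (_count >= _limit) return false;
            _count++;
            return true;
        }
    }
}
```

A caller would check TryAcquire() before doing the work and return an error such as HTTP 429 when it yields false. Fixed windows are simple but allow bursts at window boundaries, which is one reason the parameters need tuning.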


The timeout pattern and the rate-limiting pattern can be used together or separately, depending on the needs and goals of an application. The following comparison shows how the two techniques differ:

Timeout Pattern | Rate-limiting Pattern
--- | ---
It is based on the duration of the request | It is based on the frequency of requests
It affects individual requests | It affects a group of requests
It can improve the responsiveness of an application | It can improve the availability of an application


Technique 5: Bulkhead Pattern

This pattern involves isolating critical resources or services from each other so that a failure in one does not affect the others. This can help improve the availability and performance of the system. The Polly library also provides bulkhead policies for limiting concurrency and queueing of requests.


For example, the following code shows how to use Polly to limit the number of concurrent requests to a service to 10 and queue up to 20 additional requests:

// Create a Polly policy that allows 10 concurrent executions and queues up to 20 more
var bulkheadPolicy = Policy
    .BulkheadAsync(10, 20);

// Use the policy to execute an HTTP request.
// When both the execution slots and the queue are full, ExecuteAsync throws BulkheadRejectedException.
var response = await bulkheadPolicy.ExecuteAsync(async () =>
{
    var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/api");
    return await httpClient.SendAsync(request);
});

Advantages of Bulkhead Pattern:

  1. It enforces fair usage policies between different clients or users by assigning different capacities or quotas.

  2. It can prevent cascading failure.

Disadvantages of Bulkhead Pattern:

  1. Requires monitoring of parameters such as capacity, scope, and isolation level.

  2. It can increase complexity by introducing additional services or components.
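
The capacity parameter can be illustrated with a plain SemaphoreSlim. The sketch below is a simplified stand-in for a bulkhead, not Polly's implementation: it has no queue, the capacity of 2 is an arbitrary choice for the demo, and CallThroughBulkheadAsync is a hypothetical helper.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Two execution slots; every call must hold a slot while it runs.
var slots = new SemaphoreSlim(2, 2);

async Task<string> CallThroughBulkheadAsync(Func<Task<string>> action)
{
    // A zero timeout means: reject right away if no slot is free.
    if (!await slots.WaitAsync(TimeSpan.Zero))
    {
        throw new InvalidOperationException("Bulkhead full; call rejected.");
    }

    try
    {
        return await action();
    }
    finally
    {
        slots.Release(); // free the slot even if the action failed
    }
}

// Two calls that stay in flight until the gate is completed.
var gate = new TaskCompletionSource<string>();
Task<string> first = CallThroughBulkheadAsync(() => gate.Task);
Task<string> second = CallThroughBulkheadAsync(() => gate.Task);

// A third call finds no free slot and is rejected.
bool rejected = false;
try
{
    await CallThroughBulkheadAsync(() => Task.FromResult("third"));
}
catch (InvalidOperationException)
{
    rejected = true;
}

// Let the in-flight calls finish, which frees both slots again.
gate.SetResult("ok");
await Task.WhenAll(first, second);
Console.WriteLine($"rejected = {rejected}, free slots = {slots.CurrentCount}");
```

Giving each downstream dependency its own semaphore (or its own bulkhead policy) is what provides the isolation: a slow dependency can exhaust only its own slots.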


Technique 6: Infrastructure-based Resiliency Pattern

This technique involves using external components or services to provide resiliency features, such as load balancing, health checks, failover, and autoscaling. This can help reduce the complexity and code required in the application layer. For example, Azure Kubernetes Service (AKS) is a platform that can run ASP.NET applications in containers and provide infrastructure-based resiliency.


The following YAML file shows how to deploy an ASP.NET application to AKS with a load balancer service, liveness and readiness probes, and a horizontal pod autoscaler:

apiVersion: apps/v1
kind: Deployment
metadata:
    name: aspnet-app
spec:
    replicas: 2 # initial number of pods
    selector:
        matchLabels:
            app: aspnet-app
    template:
        metadata:
            labels:
                app: aspnet-app
        spec:
            containers:
                - name: aspnet-app
                  image: myregistry.azurecr.io/aspnet-app:v1 # image from Azure Container Registry
                  ports:
                      - containerPort: 80 # expose port 80 for HTTP traffic
                  resources:
                      requests:
                          cpu: 250m # example value; the autoscaler needs a CPU request to compute utilization
                  livenessProbe: # check if the container is alive every 10 seconds
                      httpGet:
                          path: /health/live # use health check endpoint
                          port: 80
                      initialDelaySeconds: 10
                      periodSeconds: 10
                  readinessProbe: # check if the container is ready to serve requests every 10 seconds
                      httpGet:
                          path: /health/ready # use health check endpoint
                          port: 80
                      initialDelaySeconds: 10
                      periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
    name: aspnet-app-service
spec:
    selector:
        app: aspnet-app
    ports:
        - protocol: TCP
          port: 80 # expose port 80 for HTTP traffic
          targetPort: 80
    type: LoadBalancer # create a load balancer service
---
apiVersion: autoscaling/v2 # autoscaling/v2beta2 was removed in Kubernetes 1.26
kind: HorizontalPodAutoscaler
metadata:
    name: aspnet-app-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: aspnet-app
    minReplicas: 2 # minimum number of pods
    maxReplicas: 10 # maximum number of pods
    metrics:
        - type: Resource # use CPU utilization as the metric for scaling
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 50 # scale up or down when CPU utilization is above or below 50%
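
The probes in this manifest only work if the application actually serves /health/live and /health/ready. A minimal sketch of how an ASP.NET Core application might expose them with the built-in health checks middleware is shown below; the check name "dependencies" and the "ready" tag are assumptions made for this example, and the check itself is a placeholder that always reports healthy.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Placeholder readiness check; a real one would probe a database or downstream service.
builder.Services.AddHealthChecks()
    .AddCheck("dependencies", () => HealthCheckResult.Healthy(), tags: new[] { "ready" });

var app = builder.Build();

// Liveness: runs no checks, so it only confirms that the process can answer requests.
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false
});

// Readiness: runs only the checks tagged "ready".
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.Run();
```

Keeping liveness free of dependency checks avoids restarting healthy pods when a downstream service is down, while readiness takes the pod out of the load balancer until its dependencies respond again.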

Advantages of Infrastructure-based Resiliency:

  1. It can improve the availability, scalability, and performance of the application by leveraging the features of the platform or service.

  2. It can reduce the development and maintenance effort by avoiding custom code for resiliency logic.

  3. It can simplify the deployment and configuration process by using declarative syntax and tools.

Disadvantages of Infrastructure-based Resiliency:

  1. It can introduce dependencies on external components or services that may have their own limitations, costs, or risks.

  2. It can reduce the control and visibility over the resiliency behavior and metrics of the application.

  3. It can require additional skills and knowledge to use the platform or service effectively and securely.


Technique 7: Fallback Pattern

This pattern involves defining an alternative action or response when a service fails or is unavailable. This can help provide graceful degradation of service or a default value to the caller.


For example, the following code shows how to use Polly to return a cached response when an HTTP request fails:

// Create a Polly policy that returns a fallback response when the request throws
var fallbackPolicy = Policy<HttpResponseMessage>
    .Handle<Exception>()
    .FallbackAsync(async (ct) =>
    {
        // Return a cached response from local storage.
        // GetCachedResponseAsync is a helper you implement; it returns Task<HttpResponseMessage>.
        var cachedResponse = await GetCachedResponseAsync();
        return cachedResponse;
    });

// Use the policy to execute an HTTP request
var response = await fallbackPolicy.ExecuteAsync(async () =>
{
    var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/api");
    return await httpClient.SendAsync(request);
});

Advantages of Fallback Pattern:

  1. It can reduce the complexity by using a proxy or an invocation handler to implement the fallback logic.

  2. It can use the existing libraries that offer fallback features such as Polly.

Disadvantages of Fallback Pattern:

  1. You need to carefully monitor the fallback condition, fallback value, and fallback frequency.

  2. It can introduce new challenges in terms of performance, security, or compliance with different regulations or standards.


Conclusion

Implementing fault tolerance and resilience in ASP.NET applications is important to ensure uninterrupted service, enhance user experiences, and protect businesses from costly downtime. By employing the techniques discussed in this article, developers can build applications that can withstand failures, recover from errors, and provide reliable and consistent service to users.
