Polly .NET: Practical Resilience Guide for 2026
Matteo Migliore

Matteo Migliore is an entrepreneur and software architect with over 25 years of experience developing .NET-based solutions and evolving enterprise-grade application architectures.

He has led enterprise projects, trained hundreds of developers, and helped companies of all sizes simplify complexity by turning software into profit for their business.

This guide is part of the complete section on C# and modern software development with .NET.

Every .NET application that calls external services eventually collides with the reality of distributed systems: variable latencies, timeouts, rate limits, temporary downtime. This is not a question of whether it happens, but when. And when it happens in production, the difference between a system that crashes and one that degrades gracefully is often measured in tens of thousands of euros of lost revenue.

Code without resilience handling responds to these events in binary fashion: it works or it throws an exception. Resilient code responds intelligently: it retries where it makes sense, stops where it does not, and returns to a safe state when a service is unavailable. That difference is not just technical: it is architectural.

Polly is the library that has standardized this approach in .NET for over a decade. With Polly v8 fully integrated into the modern .NET ecosystem through Microsoft.Extensions.Http.Resilience, there is no longer any reason to handle resilience manually. This guide shows you how to use it properly: with real code, real scenarios, and the pitfalls that only those who have used it in production know about.

Whether you are building microservices, applications that call external APIs, AI agents that depend on LLM providers, or any distributed system, this guide applies. The code examples target Polly v8 with .NET 8 and 9, but the concepts apply equally to projects migrating from earlier versions.

What is Polly and why every .NET application in production needs it

Polly is an open source .NET library that implements the most established resilience patterns in software engineering: retry, circuit breaker, timeout, bulkhead isolation, rate limiter, and fallback. It was created in 2013 by Michael Wolfenden and is today one of the most downloaded .NET Foundation projects on NuGet.

The answer to "why do I need it?" is simple: because any call to an external system can fail, and how you handle that failure determines the perceived quality of your system. An application that stops working when the database is slow for 30 seconds is a poor-quality application, regardless of how elegant its internal code is.

The failures Polly handles fall into three categories:

  • Transient errors: network timeouts, refused connections, temporary 503s. These typically resolve on their own within seconds or a few attempts.
  • Overloaded services: rate limits (429), elevated latencies, degraded responses. Here you need to reduce traffic, not increase it with retries.
  • Extended failures: services down, dependencies unavailable for minutes or hours. The system must degrade gracefully without impacting features that do not depend on the failing service.

Polly does not prevent failures; it handles them. The difference is that a system that handles failures stays operational even when its dependencies do not.

In practical terms: an e-commerce site that keeps serving product pages even when the payment gateway is temporarily unreachable, an ERP system that keeps logistics running even when an external provider is in maintenance, an AI chatbot that responds with a reasonable fallback message instead of crashing when the OpenAI API is under load.

Polly v8 and the Resilience Pipeline: the new API compared to previous versions

Those who used Polly in versions v5, v6 or v7 know the approach based on Policy.Handle<Exception>().WaitAndRetryAsync() with explicit wrapping. It worked, but had some limitations: composing policies was verbose, DI container integration was manual, and telemetry required additional configuration.

Polly v8 completely rewrites the API, introducing the Resilience Pipeline as the central construct. A pipeline is an ordered sequence of resilience strategies that wraps an operation. Each strategy in the pipeline intercepts the operation, applies its logic, and passes control to the next strategy.

To get started, add the required packages:

<!-- For HTTP client with HttpClientFactory (recommended for ASP.NET Core) -->
<PackageReference Include="Microsoft.Extensions.Http.Resilience" Version="8.10.0" />

<!-- For advanced scenarios or non-HTTP operations -->
<PackageReference Include="Polly.Extensions" Version="8.10.0" />
<PackageReference Include="Polly.Core" Version="8.10.0" />

The key difference in the v8 API compared to v7:

// OLD APPROACH (Polly v7)
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, retryAttempt =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));

var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));

// Wrapping was explicit and order was counterintuitive
var policy = Policy.WrapAsync(retryPolicy, circuitBreakerPolicy);
var result = await policy.ExecuteAsync(() => _httpClient.GetStringAsync(url));

// NEW APPROACH (Polly v8)
var pipeline = new ResiliencePipelineBuilder<HttpResponseMessage>()
    .AddRetry(new RetryStrategyOptions<HttpResponseMessage> { MaxRetryAttempts = 3 })
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage> { FailureRatio = 0.5 })
    .AddTimeout(TimeSpan.FromSeconds(10))
    .Build();

var result = await pipeline.ExecuteAsync(
    async ct => await _httpClient.GetAsync(url, ct),
    cancellationToken);

With the new API, the order in which you add strategies to the pipeline matches the order in which they are traversed, from outside to inside. This is more intuitive and reduces configuration errors.

Retry policy: configuring retries intelligently

Retry is the simplest pattern and the one most frequently misconfigured. The basic logic is obvious: if an operation fails, retry it. The devil is in the details: how many times, with what interval, for which types of errors, and what to do when the final retry also fails.

Exponential backoff and jitter

The most common mistake is using a fixed interval between retries. With a fixed interval, if 100 clients fail simultaneously (for example after a brief server downtime), they all retry at exactly the same moment, generating a traffic spike that can be worse than the original problem. This phenomenon is called the "thundering herd."

The correct solution combines exponential backoff (increasing delay with each attempt) with jitter (random variation):

builder.Services.AddHttpClient<IMyServiceClient, MyServiceClient>()
    .AddResilienceHandler("my-service-retry", (pipeline, context) =>
    {
        // Resolve a logger from DI: a field like "_logger" is not available
        // in top-level Program.cs code.
        var logger = context.ServiceProvider
            .GetRequiredService<ILoggerFactory>()
            .CreateLogger("MyServiceResilience");

        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 4,
            Delay = TimeSpan.FromSeconds(2),
            MaxDelay = TimeSpan.FromSeconds(30),
            BackoffType = DelayBackoffType.Exponential, // 2s, 4s, 8s, 16s (before jitter)
            UseJitter = true, // Adds random variation so clients do not retry in lockstep

            ShouldHandle = args => args.Outcome switch
            {
                { Exception: HttpRequestException } => PredicateResult.True(),
                { Result.StatusCode: HttpStatusCode.TooManyRequests } => PredicateResult.True(),
                { Result.StatusCode: HttpStatusCode.ServiceUnavailable } => PredicateResult.True(),
                { Result.StatusCode: HttpStatusCode.BadGateway } => PredicateResult.True(),
                { Result.StatusCode: HttpStatusCode.GatewayTimeout } => PredicateResult.True(),
                _ => PredicateResult.False()
            },

            OnRetry = args =>
            {
                logger.LogWarning(
                    "Retry {AttemptNumber} for {OperationKey}. Delay: {Delay}ms. Reason: {Outcome}",
                    args.AttemptNumber,
                    args.Context.OperationKey,
                    args.RetryDelay.TotalMilliseconds,
                    args.Outcome.Exception?.Message ?? args.Outcome.Result?.StatusCode.ToString());
                return ValueTask.CompletedTask;
            }
        });
    });
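To build intuition for what these settings produce, here is a standalone sketch of exponential backoff with jitter in plain C#, with no Polly dependency. The ±25% factor is a simplification for illustration; Polly's exponential jitter actually uses a decorrelated jitter algorithm, but the effect is the same: retries spread out instead of firing in lockstep.

```csharp
using System;

public class BackoffDemo
{
    // Simplified exponential backoff with jitter: baseDelay * 2^attempt,
    // capped at maxDelay, then scaled by a random factor in [0.75, 1.25).
    public static TimeSpan NextDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random rng)
    {
        double exponential = baseDelay.TotalSeconds * Math.Pow(2, attempt);
        double capped = Math.Min(exponential, maxDelay.TotalSeconds);
        double jitterFactor = 1.0 + (rng.NextDouble() - 0.5) * 0.5; // 0.75 .. 1.25
        return TimeSpan.FromSeconds(capped * jitterFactor);
    }

    static void Main()
    {
        var rng = new Random(42);
        var baseDelay = TimeSpan.FromSeconds(2);
        var maxDelay = TimeSpan.FromSeconds(30);

        // Four attempts: nominally 2s, 4s, 8s, 16s, each nudged by jitter.
        for (int attempt = 0; attempt < 4; attempt++)
        {
            var delay = NextDelay(attempt, baseDelay, maxDelay, rng);
            Console.WriteLine($"Attempt {attempt + 1}: wait ~{delay.TotalSeconds:F1}s");
        }
    }
}
```

Run it a few times and you will see each client compute slightly different delays, which is precisely what breaks up the thundering herd.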

Circuit Breaker: protecting the system from cascading failures

The circuit breaker takes its name from the electrical device: when current exceeds a safety threshold, the circuit opens and interrupts the flow. In software, when a service exceeds a failure threshold, the circuit breaker "opens" and blocks calls to that service, protecting both the calling system and the called one.

The problem it solves is subtle but critical: without a circuit breaker, a retry policy responds to failures by increasing traffic toward an already struggling service. This creates a vicious cycle where the service that struggles to respond is further overwhelmed by retry requests, making the situation worse.

The three states of the circuit breaker

  • Closed: normal state, calls pass through. The circuit breaker monitors the failure rate.
  • Open: the failure rate has exceeded the threshold. Calls are blocked immediately with BrokenCircuitException, without even attempting the real call.
  • Half-Open: after the break period, a limited number of test calls are allowed. If they succeed, the circuit breaker returns to Closed. If they fail, it returns to Open.

pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
    FailureRatio = 0.5,
    MinimumThroughput = 10,
    SamplingDuration = TimeSpan.FromSeconds(30),
    BreakDuration = TimeSpan.FromSeconds(60),

    OnOpened = args =>
    {
        _logger.LogError(
            "Circuit breaker OPENED for service. Reason: {Outcome}. Duration: {Duration}s",
            args.Outcome.Exception?.Message ?? args.Outcome.Result?.StatusCode.ToString(),
            args.BreakDuration.TotalSeconds);
        return ValueTask.CompletedTask;
    },

    OnClosed = args =>
    {
        _logger.LogInformation("Circuit breaker CLOSED. Service has recovered.");
        return ValueTask.CompletedTask;
    }
});
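The state machine is easier to internalize as a toy implementation. The sketch below uses a consecutive-failure counter instead of Polly's failure-ratio sampling window, takes the clock as a parameter for testability, and ignores thread safety; the ToyCircuitBreaker name is ours, not a Polly type.

```csharp
using System;

public class ToyCircuitBreaker
{
    public enum State { Closed, Open, HalfOpen }

    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private int _consecutiveFailures;
    private DateTime _openedAt;

    public State CurrentState { get; private set; } = State.Closed;

    public ToyCircuitBreaker(int failureThreshold, TimeSpan breakDuration)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
    }

    public T Execute<T>(Func<T> action, Func<DateTime> clock)
    {
        if (CurrentState == State.Open)
        {
            // After the break period, allow a probe call (Half-Open);
            // otherwise fail fast without touching the downstream service.
            if (clock() - _openedAt >= _breakDuration)
                CurrentState = State.HalfOpen;
            else
                throw new InvalidOperationException("Circuit is open; call blocked.");
        }

        try
        {
            var result = action();
            // Success (in Closed or Half-Open) resets the breaker.
            _consecutiveFailures = 0;
            CurrentState = State.Closed;
            return result;
        }
        catch
        {
            _consecutiveFailures++;
            // A failed probe, or too many consecutive failures, opens the circuit.
            if (CurrentState == State.HalfOpen || _consecutiveFailures >= _failureThreshold)
            {
                CurrentState = State.Open;
                _openedAt = clock();
            }
            throw;
        }
    }
}
```

Polly's real implementation adds the sampling duration, minimum throughput and failure ratio shown above, but the Closed/Open/Half-Open transitions are exactly these.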

Timeout policy: enforcing explicit limits on slow operations

Timeouts are the simplest defense mechanism and the most often neglected. The default HttpClient timeout in .NET is 100 seconds: in production, this means a call that never responds can tie up a connection from the pool (and keep the caller waiting) for nearly two minutes.

With Polly you can define two distinct timeout levels: one per attempt and one for the entire operation (including retries). This distinction is fundamental.

builder.Services.AddHttpClient<IOrderServiceClient, OrderServiceClient>()
    .AddResilienceHandler("order-service", pipeline =>
    {
        // TOTAL timeout for the entire operation (including all retries)
        pipeline.AddTimeout(TimeSpan.FromSeconds(15));

        // Retry: max 2 additional attempts
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            Delay = TimeSpan.FromSeconds(1),
            BackoffType = DelayBackoffType.Exponential
        });

        // Timeout per SINGLE attempt
        pipeline.AddTimeout(TimeSpan.FromSeconds(4));
    });

Order matters: the total timeout must be added before the retry in the chain, and the per-attempt timeout after. This way the total timeout wraps the entire retry operation, while the per-attempt one resets with each new attempt.
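As a sanity check on the numbers above: the worst case for this configuration is three attempts of 4 seconds each plus exponential backoff delays of 1s and 2s, i.e. 3 × 4 + 1 + 2 = 15 seconds, which is exactly the total timeout. A standalone sketch of that budget arithmetic (plain C#, no Polly involved; TimeoutBudget is an illustrative name):

```csharp
using System;
using System.Linq;

public class TimeoutBudget
{
    // Worst-case duration: every attempt hits its per-attempt timeout,
    // plus the exponential backoff delay before each retry.
    public static TimeSpan WorstCase(TimeSpan perAttempt, TimeSpan baseDelay, int maxRetries)
    {
        int attempts = maxRetries + 1; // initial call + retries
        double delays = Enumerable.Range(0, maxRetries)
            .Sum(i => baseDelay.TotalSeconds * Math.Pow(2, i)); // 1s, 2s, ...
        return TimeSpan.FromSeconds(attempts * perAttempt.TotalSeconds + delays);
    }

    static void Main()
    {
        var worst = WorstCase(TimeSpan.FromSeconds(4), TimeSpan.FromSeconds(1), maxRetries: 2);
        Console.WriteLine($"Worst case: {worst.TotalSeconds}s"); // 3*4 + (1+2) = 15s
    }
}
```

If your total timeout is lower than this worst case, later retries are silently cut off; if it is much higher, you are just waiting longer than the retry policy can ever use.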

Bulkhead and Rate Limiter: limiting concurrency to protect resources

The bulkhead (named after ship bulkheads) is a pattern that isolates resources: it limits how many concurrent requests can go to a particular service, and optionally how many can queue while waiting. The goal is to prevent a slow service from consuming all available resources (thread pool, connections, memory) and impacting other application features.

// Bulkhead: max 10 concurrent requests to the inventory service,
// with a queue of max 20 waiting requests.
// Requires the Polly.RateLimiting package; the options type comes
// from System.Threading.RateLimiting.
pipeline.AddConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 10,
    QueueLimit = 20,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});
// When both the permits and the queue are exhausted, the pipeline
// rejects the call with a RateLimiterRejectedException instead of
// letting requests pile up indefinitely.
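Under the hood, a concurrency limiter behaves much like a semaphore with bounded admission. A minimal, Polly-free sketch of the same idea (the ToyBulkhead class is illustrative only; the real limiter also manages a waiting queue and its ordering):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class ToyBulkhead
{
    private readonly SemaphoreSlim _slots;

    public ToyBulkhead(int maxConcurrent) =>
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action)
    {
        // Reject immediately if no slot is free (equivalent to QueueLimit = 0).
        if (!await _slots.WaitAsync(TimeSpan.Zero))
            throw new InvalidOperationException("Bulkhead full: request rejected.");

        try { return await action(); }
        finally { _slots.Release(); } // Free the slot even if the action throws.
    }
}
```

With a queue limit greater than zero you would instead wait up to a bound before rejecting, which is what System.Threading.RateLimiting's concurrency limiter does for you.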

Fallback: what to do when everything else fails

Fallback is the last line of defense in the resilience pipeline. When retry, circuit breaker and everything else fail to get a positive response, the fallback defines what to return. It can be a default value, data read from cache, a partial response, or a more meaningful structured error than a generic exception.

pipeline.AddFallback(new FallbackStrategyOptions<IEnumerable<ProductDto>>
{
    ShouldHandle = new PredicateBuilder<IEnumerable<ProductDto>>()
        .Handle<BrokenCircuitException>()
        .Handle<TimeoutRejectedException>()
        .Handle<HttpRequestException>()
        .HandleResult(r => r == null),

    FallbackAction = async args =>
    {
        _logger.LogWarning(
            "Fallback activated for {OperationKey}. Loading data from cache.",
            args.Context.OperationKey);

        var cached = await _cache.GetAsync<List<ProductDto>>("catalog:all");
        if (cached?.Count > 0)
            return Outcome.FromResult<IEnumerable<ProductDto>>(cached);

        _logger.LogError("Cache empty. Returning empty list as extreme fallback.");
        return Outcome.FromResult<IEnumerable<ProductDto>>(Enumerable.Empty<ProductDto>());
    }
});

Combining policies in a pipeline: the order that matters and pitfalls to avoid

When combining multiple strategies in a pipeline, the order in which you add them determines the order in which they are traversed. In Polly v8, strategies are added from outside to inside: the first added is the outermost.

Recommended order for a complete pipeline, from outside to inside:

  1. Fallback: outermost, catches any unhandled exception from all other layers
  2. Total timeout: limits the maximum time for the entire operation including retries
  3. Retry: retries the operation on failure
  4. Circuit Breaker: intercepts retries and blocks calls when the service is in fault state
  5. Per-attempt timeout: innermost, applies to each individual attempt

builder.Services.AddHttpClient<IInventoryClient, InventoryClient>()
    .AddResilienceHandler("inventory-pipeline", pipeline =>
    {
        // 1. Fallback (outermost)
        // Fallback uses the generic options type from Polly.Core
        pipeline.AddFallback(new FallbackStrategyOptions<HttpResponseMessage>
        {
            FallbackAction = args => Outcome.FromResultAsValueTask(
                new HttpResponseMessage(HttpStatusCode.OK)
                {
                    Content = JsonContent.Create(new { fromCache = true, items = Array.Empty<object>() })
                }),
            ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                .Handle<BrokenCircuitException>()
                .Handle<TimeoutRejectedException>()
        });

        // 2. Total timeout: 20 seconds for the entire operation
        pipeline.AddTimeout(TimeSpan.FromSeconds(20));

        // 3. Retry: max 3 attempts with exponential backoff
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(2),
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });

        // 4. Circuit Breaker: opens after 50% failures on 10+ calls
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            MinimumThroughput = 10,
            SamplingDuration = TimeSpan.FromSeconds(30),
            BreakDuration = TimeSpan.FromSeconds(60)
        });

        // 5. Per-attempt timeout: 4 seconds
        pipeline.AddTimeout(TimeSpan.FromSeconds(4));
    });

The most common pitfall is placing the circuit breaker before the retry. With this ordering, the circuit breaker sees only the final result of the retry operation (not each individual attempt), losing the ability to open quickly during a series of failures.

Polly with HttpClientFactory in ASP.NET Core: the 2026 standard pattern

The integration of Polly with HttpClientFactory in ASP.NET Core is the recommended way to use HTTP resilience in any modern application. HttpClientFactory manages the lifecycle of HttpClient instances, avoiding the socket exhaustion problems of manual creation, and the integration with Polly via AddResilienceHandler adds resilience transparently.

var builder = WebApplication.CreateBuilder(args);

// Option 1: Standard Resilience Handler (sensible defaults)
builder.Services.AddHttpClient("external-api")
    .ConfigureHttpClient(c => c.BaseAddress = new Uri("https://api.external-service.com"))
    .AddStandardResilienceHandler(options =>
    {
        options.Retry.MaxRetryAttempts = 4;
        options.CircuitBreaker.BreakDuration = TimeSpan.FromSeconds(30);
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(20);
    });

// Option 2: Custom Resilience Handler (full control)
builder.Services.AddHttpClient<IECommerceApiClient, ECommerceApiClient>()
    .ConfigureHttpClient(c =>
    {
        c.BaseAddress = new Uri(builder.Configuration["Services:ECommerce:BaseUrl"]!);
        c.DefaultRequestHeaders.Add("X-Api-Key",
            builder.Configuration["Services:ECommerce:ApiKey"]);
    })
    .AddResilienceHandler("ecommerce-resilience", (pipeline, context) =>
    {
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(1),
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });

        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            MinimumThroughput = 8,
            SamplingDuration = TimeSpan.FromSeconds(30),
            BreakDuration = TimeSpan.FromSeconds(45)
        });

        pipeline.AddTimeout(TimeSpan.FromSeconds(8));
    });

Observability with Polly: logging, metrics and telemetry with OpenTelemetry

A resilience pipeline that produces no metrics is like a circuit breaker with no status indicator: you do not know when it opens, how often retries are triggered, or whether your timeouts are correctly calibrated. In production, Polly observability is essential.

Polly v8 has native integration with System.Diagnostics.Metrics and OpenTelemetry. Simply add the telemetry listener and metrics are emitted automatically for every strategy in the pipeline.

builder.Services
    .AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddRuntimeInstrumentation()
            .AddMeter("Polly") // Automatically adds Polly metrics
            .AddPrometheusExporter();
    })
    .WithTracing(tracing =>
    {
        tracing
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation();
        // Note: Polly v8 emits logs and metrics natively, but does not register
        // its own ActivitySource; per-request spans come from the HTTP instrumentation.
    });

Polly automatically emits the following key metrics:

  • resilience.polly.strategy.events: counts strategy events (retries, circuit breaker state changes, timeouts, and so on) with tags for the pipeline name and event type.
  • resilience.polly.strategy.attempt.duration: histogram of individual attempt durations.
  • resilience.polly.pipeline.execution.duration: histogram of end-to-end pipeline execution durations.
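If you want to inspect these counters without standing up a full OpenTelemetry pipeline, a System.Diagnostics.Metrics.MeterListener can subscribe to the "Polly" meter directly. The sketch below publishes a simulated counter under the same meter name so it runs standalone; in a real application Polly's own strategies would be the publishers:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public class PollyMeterDemo
{
    // Subscribes to any meter named "Polly" and records long measurements.
    public static List<string> Record(Action publish)
    {
        var seen = new List<string>();
        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            if (instrument.Meter.Name == "Polly")
                l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
            seen.Add($"{instrument.Name} += {value}"));
        listener.Start();
        publish();
        return seen;
    }

    static void Main()
    {
        // Simulated stand-in for Polly's real meter, so the snippet runs alone.
        var events = Record(() =>
        {
            using var meter = new Meter("Polly");
            meter.CreateCounter<long>("resilience.polly.strategy.events").Add(1);
        });
        events.ForEach(Console.WriteLine);
    }
}
```

The same listener pattern is handy in integration tests to assert that a retry actually fired, without scraping logs.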

Polly for AI agents: resilience for LLM provider calls

If you are building AI agents with Semantic Kernel or RAG systems that call LLM providers, Polly becomes even more critical. LLM providers have peculiar failure patterns: aggressive rate limits with 60-second windows, highly variable latencies (from 500ms to 60s depending on response length), model updates that cause brief periods of unavailability.

For a deep dive into building AI agents with .NET and Semantic Kernel, we have a dedicated article. Here we focus on how Polly protects these calls.

builder.Services.AddHttpClient("openai-client")
    .ConfigureHttpClient(c =>
    {
        c.BaseAddress = new Uri("https://api.openai.com");
        c.Timeout = Timeout.InfiniteTimeSpan; // Polly manages timeouts
    })
    .AddResilienceHandler("llm-resilience", pipeline =>
    {
        // Total timeout for the entire chain (including retries)
        pipeline.AddTimeout(TimeSpan.FromSeconds(90));

        // Retry with Retry-After header support
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            ShouldHandle = args => new ValueTask<bool>(
                args.Outcome.Result?.StatusCode == HttpStatusCode.TooManyRequests ||
                args.Outcome.Result?.StatusCode == HttpStatusCode.ServiceUnavailable ||
                args.Outcome.Exception is HttpRequestException
            ),
            DelayGenerator = args =>
            {
                if (args.Outcome.Result?.Headers.RetryAfter?.Delta is TimeSpan retryAfter)
                    return new ValueTask<TimeSpan?>(retryAfter + TimeSpan.FromSeconds(1));

                return new ValueTask<TimeSpan?>(
                    TimeSpan.FromSeconds(Math.Pow(2, args.AttemptNumber + 1)));
            }
        });

        // Circuit breaker: open after 60% failures on 5 calls in 2 minutes
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.6,
            MinimumThroughput = 5,
            SamplingDuration = TimeSpan.FromMinutes(2),
            BreakDuration = TimeSpan.FromMinutes(3)
        });

        // Per-attempt timeout for LLM
        pipeline.AddTimeout(TimeSpan.FromSeconds(60));
    });
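The Retry-After handling in the DelayGenerator above deserves a closer look: the header can carry either a delta in seconds or an absolute HTTP date, and the typed RetryConditionHeaderValue exposes both forms. A standalone sketch of that logic using only System.Net.Http (RetryAfterDemo is our name, not a Polly type):

```csharp
using System;
using System.Net;
using System.Net.Http;

public class RetryAfterDemo
{
    // Extracts a delay from a rate-limited response, mirroring the
    // DelayGenerator logic: honor Retry-After when present, else fall back.
    public static TimeSpan DelayFor(HttpResponseMessage response, TimeSpan fallback)
    {
        var retryAfter = response.Headers.RetryAfter;
        if (retryAfter?.Delta is TimeSpan delta)
            return delta;                               // "Retry-After: 30"
        if (retryAfter?.Date is DateTimeOffset date)    // "Retry-After: <http-date>"
            return date - DateTimeOffset.UtcNow;
        return fallback;
    }

    static void Main()
    {
        var response = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
        response.Headers.Add("Retry-After", "30");
        Console.WriteLine(DelayFor(response, TimeSpan.FromSeconds(2))); // 00:00:30
    }
}
```

Honoring the provider's own hint is almost always better than a blind exponential schedule: the server is telling you exactly when your quota window resets.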

Polly is not optional for .NET applications in production: it is a requirement. Every call to an external system without a resilience policy is an unhandled failure point that will eventually affect your users.

Frequently asked questions

What is Polly and what problems does it solve?

Polly is an open source .NET library that implements resilience patterns like retry, circuit breaker, timeout, bulkhead and fallback. It makes applications more robust against transient errors, unavailable external services and overload conditions, preventing a single failure from cascading through the system.

What changes in Polly v8 compared to previous versions?

Polly v8 introduces the Resilience Pipeline as the central construct, replacing individual policies. The new API is deeply integrated with Microsoft.Extensions.Http.Resilience and the modern .NET ecosystem. The old Policy.Handle and WrapAsync approach is still supported for backward compatibility, but the Resilience Pipeline is the recommended path forward. With v8 you define pipelines using ResiliencePipelineBuilder and configure them via AddResilienceHandler in ASP.NET Core.

How do I configure a retry policy in Polly v8?

With Polly v8 you use AddRetry() on the ResiliencePipelineBuilder. You can configure MaxRetryAttempts, Delay, BackoffType (constant, linear or exponential) and ShouldHandle to specify which exceptions or results trigger a retry. Enabling UseJitter adds random variation that prevents the thundering herd problem when many clients retry simultaneously.

What is the circuit breaker pattern and when should I use it?

The circuit breaker pattern automatically stops calls to a failing service, preventing you from overwhelming a system already under stress. It has three states: Closed (normal operation), Open (calls are blocked without even being attempted) and Half-Open (a limited number of attempts are allowed to verify recovery). Use it whenever you call external services or databases that can become overloaded.

How do I integrate Polly with HttpClient in ASP.NET Core?

In ASP.NET Core you register the resilience pipeline using AddResilienceHandler() on IHttpClientBuilder. Every HTTP request made through that client automatically benefits from retry, circuit breaker and timeout, without having to manually manage the pipeline in application code.

Matteo Migliore

Throughout his career, he has worked with organizations such as Cotonella, Il Sole 24 Ore, FIAT and NATO, leading teams in developing scalable platforms and modernizing complex legacy ecosystems.

He has trained hundreds of developers and supported companies of all sizes in turning software into a competitive advantage, reducing technical debt and achieving measurable business results.
