What is Polly and how do you use it to build resilient .NET applications?
Polly is the de facto standard library for resilience in .NET: it implements retry, circuit breaker, timeout, bulkhead and fallback with a fluent API and deep integration with ASP.NET Core and HttpClientFactory.
With Polly v8 you build a declarative Resilience Pipeline that wraps risky operations. The pipeline automatically handles transient failures without polluting application code.
- Retry: automatically retries with exponential backoff and jitter
- Circuit Breaker: blocks calls to services in a fault state
- Timeout: enforces explicit limits on slow operations
- Fallback: returns a default value when everything else fails
ASP.NET Core integration: builder.Services.AddHttpClient("MyClient").AddResilienceHandler("my-pipeline", b => { ... });

Every .NET application that calls external services eventually collides with the reality of distributed systems: variable latencies, timeouts, rate limits, temporary downtime. This is not a question of whether it happens, but when. And when it happens in production, the difference between a system that crashes and one that degrades gracefully is often measured in tens of thousands of euros of lost revenue.
Code without resilience handling responds to these events in binary fashion: it works or it throws an exception. Resilient code responds intelligently: it retries where it makes sense, stops where it does not, and returns to a safe state when a service is unavailable. That difference is not just technical: it is architectural.
Polly is the library that has standardized this approach in .NET for over a decade. With Polly v8 fully integrated into the modern .NET ecosystem through Microsoft.Extensions.Http.Resilience, there is no longer any reason to handle resilience manually. This guide shows you how to use it properly: with real code, real scenarios, and the pitfalls that only those who have used it in production know about.
Whether you are building microservices, applications that call external APIs, AI agents that depend on LLM providers, or any distributed system, this guide applies. The code examples target Polly v8 with .NET 8 and 9, but the concepts apply equally to projects migrating from earlier versions.
What is Polly and why every .NET application in production needs it
Polly is an open source .NET library that implements the most established resilience patterns in software engineering: retry, circuit breaker, timeout, bulkhead isolation, rate limiter, and fallback. It was created in 2013 by Michael Wolfenden and is today one of the most downloaded .NET Foundation projects on NuGet.
The answer to "why do I need it?" is simple: because any call to an external system can fail, and how you handle that failure determines the perceived quality of your system. An application that stops working when the database is slow for 30 seconds is a poor-quality application, regardless of how elegant its internal code is.
The failures Polly handles fall into three categories:
- Transient errors: network timeouts, refused connections, temporary 503s. These typically resolve on their own within seconds or a few attempts.
- Overloaded services: rate limits (429), elevated latencies, degraded responses. Here you need to reduce traffic, not increase it with retries.
- Extended failures: services down, dependencies unavailable for minutes or hours. The system must degrade gracefully without impacting features that do not depend on the failing service.
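As a conceptual illustration of these categories (a hypothetical helper, not a Polly API), the first two can often be distinguished from a single HTTP status code; extended failures, by contrast, only become visible over time, which is exactly what the circuit breaker tracks:

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of Polly: maps a status code to the
// failure category above, following this article's classification.
// "Extended" failures cannot be detected from one response alone --
// they emerge as a pattern over time (the circuit breaker's job).
string Classify(HttpStatusCode status) => status switch
{
    HttpStatusCode.RequestTimeout or        // 408
    HttpStatusCode.BadGateway or            // 502
    HttpStatusCode.ServiceUnavailable or    // 503 (temporary)
    HttpStatusCode.GatewayTimeout           // 504
        => "Transient",                     // retry with backoff usually resolves these
    HttpStatusCode.TooManyRequests          // 429
        => "Overloaded",                    // honor Retry-After and reduce traffic instead
    _ => "NotTransient"                     // e.g. 404, 400: retrying will not help
};
```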
Polly does not prevent failures, it handles them. The difference is that a system that handles failures stays operational even when its dependencies do not.
In practical terms: an e-commerce site that keeps serving product pages even when the payment gateway is temporarily unreachable, an ERP system that keeps logistics running even when an external provider is in maintenance, an AI chatbot that responds with a reasonable fallback message instead of crashing when the OpenAI API is under load.
Polly v8 and the Resilience Pipeline: the new API compared to previous versions
Those who used Polly in versions v5, v6 or v7 know the approach based on Policy.Handle<Exception>().WaitAndRetryAsync() with explicit wrapping. It worked, but had some limitations: composing policies was verbose, DI container integration was manual, and telemetry required additional configuration.
Polly v8 completely rewrites the API, introducing the Resilience Pipeline as the central construct. A pipeline is an ordered sequence of resilience strategies that wraps an operation. Each strategy in the pipeline intercepts the operation, applies its logic, and passes control to the next strategy.
To get started, add the required packages:
```xml
<!-- For HTTP client with HttpClientFactory (recommended for ASP.NET Core) -->
<PackageReference Include="Microsoft.Extensions.Http.Resilience" Version="8.10.0" />

<!-- For advanced scenarios or non-HTTP operations -->
<PackageReference Include="Polly.Extensions" Version="8.10.0" />
<PackageReference Include="Polly.Core" Version="8.10.0" />
```
The key difference in the v8 API compared to v7:
```csharp
// OLD APPROACH (Polly v7)
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));

var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));

// Wrapping was explicit and order was counterintuitive
var policy = Policy.WrapAsync(retryPolicy, circuitBreakerPolicy);
var result = await policy.ExecuteAsync(() => _httpClient.GetStringAsync(url));

// NEW APPROACH (Polly v8)
var pipeline = new ResiliencePipelineBuilder<HttpResponseMessage>()
    .AddRetry(new RetryStrategyOptions<HttpResponseMessage> { MaxRetryAttempts = 3 })
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage> { FailureRatio = 0.5 })
    .AddTimeout(TimeSpan.FromSeconds(10))
    .Build();

var result = await pipeline.ExecuteAsync(
    async ct => await _httpClient.GetAsync(url, ct),
    cancellationToken);
```
With the new API, the order in which you add strategies to the pipeline matches the order in which they are traversed, from outside to inside. This is more intuitive and reduces configuration errors.
Retry policy: configuring retries intelligently
Retry is the simplest pattern and the one most frequently misconfigured. The basic logic is obvious: if an operation fails, retry it. The devil is in the details: how many times, with what interval, for which types of errors, and what to do when the final retry also fails.
Exponential backoff and jitter
The most common mistake is using a fixed interval between retries. With a fixed interval, if 100 clients fail simultaneously (for example after a brief server downtime), they all retry at exactly the same moment, generating a traffic spike that can be worse than the original problem. This phenomenon is called the "thundering herd."
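To make the mechanism concrete, here is a minimal sketch in plain C# (independent of Polly, which implements a more sophisticated algorithm internally) of how exponential backoff combined with jitter spreads retry delays apart:

```csharp
using System;

// Minimal illustration, NOT Polly's internal algorithm: exponential
// backoff doubles the delay each attempt; jitter multiplies it by a
// random factor so simultaneous clients do not retry in lockstep.
static TimeSpan BackoffWithJitter(int attempt, TimeSpan baseDelay, Random rng)
{
    // Exponential: baseDelay * 2^attempt (attempt is 0-based)
    double seconds = baseDelay.TotalSeconds * Math.Pow(2, attempt);

    // Jitter: random factor in [0.75, 1.25], i.e. +/- 25% of the delay
    double jittered = seconds * (0.75 + rng.NextDouble() * 0.5);

    return TimeSpan.FromSeconds(jittered);
}
```

With a 2-second base delay, attempt 0 lands somewhere between 1.5s and 2.5s, attempt 3 between 12s and 20s: a hundred clients failing together no longer retry at the same instant.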
The correct solution combines exponential backoff (increasing delay with each attempt) with jitter (random variation):
```csharp
builder.Services.AddHttpClient<IMyServiceClient, MyServiceClient>()
    .AddResilienceHandler("my-service-retry", pipeline =>
    {
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 4,
            Delay = TimeSpan.FromSeconds(2),
            MaxDelay = TimeSpan.FromSeconds(30),
            BackoffType = DelayBackoffType.Exponential, // 2s, 4s, 8s, 16s (with jitter)
            UseJitter = true, // Adds random variation (+/- 25% of delay)
            ShouldHandle = args => args.Outcome switch
            {
                { Exception: HttpRequestException } => PredicateResult.True(),
                { Result.StatusCode: HttpStatusCode.TooManyRequests } => PredicateResult.True(),
                { Result.StatusCode: HttpStatusCode.ServiceUnavailable } => PredicateResult.True(),
                { Result.StatusCode: HttpStatusCode.BadGateway } => PredicateResult.True(),
                { Result.StatusCode: HttpStatusCode.GatewayTimeout } => PredicateResult.True(),
                _ => PredicateResult.False()
            },
            OnRetry = args =>
            {
                _logger.LogWarning(
                    "Retry {AttemptNumber} for {OperationKey}. Delay: {Delay}ms. Reason: {Outcome}",
                    args.AttemptNumber,
                    args.Context.OperationKey,
                    args.RetryDelay.TotalMilliseconds,
                    args.Outcome.Exception?.Message ?? args.Outcome.Result?.StatusCode.ToString());
                return ValueTask.CompletedTask;
            }
        });
    });
```
Circuit Breaker: protecting the system from cascading failures
The circuit breaker takes its name from the electrical device: when current exceeds a safety threshold, the circuit opens and interrupts the flow. In software, when a service exceeds a failure threshold, the circuit breaker "opens" calls to that service, protecting both the calling system and the called one.
The problem it solves is subtle but critical: without a circuit breaker, a retry policy responds to failures by increasing traffic toward an already struggling service. This creates a vicious cycle where the service that struggles to respond is further overwhelmed by retry requests, making the situation worse.
The three states of the circuit breaker
- Closed: normal state, calls pass through. The circuit breaker monitors the failure rate.
- Open: the failure rate has exceeded the threshold. Calls are blocked immediately with BrokenCircuitException, without even attempting the real call.
- Half-Open: after the break period, a limited number of test calls are allowed. If they succeed, the circuit breaker returns to Closed. If they fail, it returns to Open.
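These transitions can be sketched as a simplified state machine in plain C# (an illustration of the concept only: Polly's real implementation tracks a failure ratio over a sampling window and limits the number of probe calls in Half-Open):

```csharp
using System;

// Simplified circuit breaker state machine, for illustration only.
var state = "Closed";
int consecutiveFailures = 0;
const int FailureThreshold = 3;
DateTime openedAt = default;
TimeSpan breakDuration = TimeSpan.FromSeconds(30);

bool AllowCall(DateTime now)
{
    if (state == "Open" && now - openedAt >= breakDuration)
        state = "HalfOpen";        // break period elapsed: allow a probe call
    return state != "Open";        // Open blocks immediately (BrokenCircuitException in Polly)
}

void OnSuccess()
{
    consecutiveFailures = 0;
    state = "Closed";              // a successful probe closes the circuit
}

void OnFailure(DateTime now)
{
    consecutiveFailures++;
    if (state == "HalfOpen" || consecutiveFailures >= FailureThreshold)
    {
        state = "Open";            // failed probe or threshold reached: open the circuit
        openedAt = now;
    }
}
```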
```csharp
pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
    FailureRatio = 0.5,
    MinimumThroughput = 10,
    SamplingDuration = TimeSpan.FromSeconds(30),
    BreakDuration = TimeSpan.FromSeconds(60),
    OnOpened = args =>
    {
        _logger.LogError(
            "Circuit breaker OPENED for service. Reason: {Outcome}. Duration: {Duration}s",
            args.Outcome.Exception?.Message ?? args.Outcome.Result?.StatusCode.ToString(),
            args.BreakDuration.TotalSeconds);
        return ValueTask.CompletedTask;
    },
    OnClosed = args =>
    {
        _logger.LogInformation("Circuit breaker CLOSED. Service has recovered.");
        return ValueTask.CompletedTask;
    }
});
```
Timeout policy: enforcing explicit limits on slow operations
Timeouts are the simplest defense mechanism and the most often neglected. The default HttpClient timeout in .NET is 100 seconds: in production, this means a call that never responds keeps the request pending, and potentially holds a connection from the pool, for nearly two minutes.
With Polly you can define two distinct timeout levels: one per attempt and one for the entire operation (including retries). This distinction is fundamental.
```csharp
builder.Services.AddHttpClient<IOrderServiceClient, OrderServiceClient>()
    .AddResilienceHandler("order-service", pipeline =>
    {
        // TOTAL timeout for the entire operation (including all retries)
        pipeline.AddTimeout(TimeSpan.FromSeconds(15));

        // Retry: max 2 additional attempts
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            Delay = TimeSpan.FromSeconds(1),
            BackoffType = DelayBackoffType.Exponential
        });

        // Timeout per SINGLE attempt
        pipeline.AddTimeout(TimeSpan.FromSeconds(4));
    });
```
Order matters: the total timeout must be added before the retry in the chain, and the per-attempt timeout after. This way the total timeout wraps the entire retry operation, while the per-attempt one resets with each new attempt.
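A quick sanity check of the numbers above, in plain C# without Polly: the total timeout should cover the worst case in which every attempt hits its per-attempt timeout, plus the backoff delays between attempts:

```csharp
using System;
using System.Linq;

// Worst case for the pipeline above: 3 attempts (1 initial + 2 retries),
// each hitting the 4s per-attempt timeout, plus exponential delays of
// 1s and 2s between attempts. The 15s total timeout covers exactly that.
// (Jitter, if enabled, would add some slack to budget for.)
TimeSpan WorstCase(int maxRetryAttempts, TimeSpan perAttemptTimeout, TimeSpan baseDelay)
{
    int totalAttempts = maxRetryAttempts + 1;
    double attemptSeconds = totalAttempts * perAttemptTimeout.TotalSeconds;
    double delaySeconds = Enumerable.Range(0, maxRetryAttempts)
        .Sum(i => baseDelay.TotalSeconds * Math.Pow(2, i)); // 1s, 2s, ...
    return TimeSpan.FromSeconds(attemptSeconds + delaySeconds);
}

// WorstCase(2, 4s, 1s) -> 3 * 4s + (1s + 2s) = 15s
```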
Bulkhead and Rate Limiter: limiting concurrency to protect resources
The bulkhead (named after ship bulkheads) is a pattern that isolates resources: it limits how many concurrent requests can go to a particular service, and optionally how many can queue while waiting. The goal is to prevent a slow service from consuming all available resources (thread pool, connections, memory) and impacting other application features.
Note that in Polly v8 the concurrency limiter is configured with ConcurrencyLimiterOptions from System.Threading.RateLimiting (via the Polly.RateLimiting package); when the queue is full, further calls are rejected with RateLimiterRejectedException:

```csharp
// Bulkhead: max 10 concurrent requests to the inventory service,
// with a queue of max 20 waiting requests.
pipeline.AddConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 10,       // max concurrent executions
    QueueLimit = 20,        // max queued requests
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});
// When the queue is full, the call is rejected with RateLimiterRejectedException,
// which you can handle with a fallback strategy or at the call site.
```
Fallback: what to do when everything else fails
Fallback is the last line of defense in the resilience pipeline. When retry, circuit breaker and everything else fail to get a positive response, the fallback defines what to return. It can be a default value, data read from cache, a partial response, or a more meaningful structured error than a generic exception.
```csharp
pipeline.AddFallback(new FallbackStrategyOptions<IEnumerable<ProductDto>>
{
    ShouldHandle = new PredicateBuilder<IEnumerable<ProductDto>>()
        .Handle<BrokenCircuitException>()
        .Handle<TimeoutRejectedException>()
        .Handle<HttpRequestException>()
        .HandleResult(r => r == null),
    FallbackAction = async args =>
    {
        _logger.LogWarning(
            "Fallback activated for {OperationKey}. Loading data from cache.",
            args.Context.OperationKey);

        var cached = await _cache.GetAsync<List<ProductDto>>("catalog:all");
        if (cached?.Count > 0)
            return Outcome.FromResult<IEnumerable<ProductDto>>(cached);

        _logger.LogError("Cache empty. Returning empty list as extreme fallback.");
        return Outcome.FromResult<IEnumerable<ProductDto>>(Enumerable.Empty<ProductDto>());
    }
});
```
Combining policies in a pipeline: the order that matters and pitfalls to avoid
When combining multiple strategies in a pipeline, the order in which you add them determines the order in which they are traversed. In Polly v8, strategies are added from outside to inside: the first added is the outermost.
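The outside-in semantics can be demonstrated with plain delegate composition (a conceptual sketch, not Polly's implementation): each "strategy" wraps the next, so the first one added runs first on the way in and last on the way out:

```csharp
using System;
using System.Collections.Generic;

// Conceptual sketch of pipeline composition (not Polly internals):
// each wrapper decorates the inner action, so the first added is outermost.
var log = new List<string>();

Func<Action, string, Action> wrap = (inner, name) => () =>
{
    log.Add($"{name}: before");
    inner();
    log.Add($"{name}: after");
};

Action operation = () => log.Add("operation");

// Added in order: Timeout (first = outermost), Retry, CircuitBreaker (innermost)
Action pipeline = wrap(wrap(wrap(operation, "CircuitBreaker"), "Retry"), "Timeout");
pipeline();
// Execution order: Timeout in, Retry in, CircuitBreaker in,
// operation, CircuitBreaker out, Retry out, Timeout out
```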
Recommended order for a complete pipeline, from outside to inside:
- Fallback: outermost, catches any unhandled exception from all other layers
- Total timeout: limits the maximum time for the entire operation including retries
- Retry: retries the operation on failure
- Circuit Breaker: intercepts retries and blocks calls when the service is in fault state
- Per-attempt timeout: innermost, applies to each individual attempt
```csharp
builder.Services.AddHttpClient<IInventoryClient, InventoryClient>()
    .AddResilienceHandler("inventory-pipeline", pipeline =>
    {
        // 1. Fallback (outermost)
        pipeline.AddFallback(new FallbackStrategyOptions<HttpResponseMessage>
        {
            FallbackAction = args => Outcome.FromResultAsValueTask(
                new HttpResponseMessage(HttpStatusCode.OK)
                {
                    Content = JsonContent.Create(new { fromCache = true, items = Array.Empty<object>() })
                }),
            ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                .Handle<BrokenCircuitException>()
                .Handle<TimeoutRejectedException>()
        });

        // 2. Total timeout: 20 seconds for the entire operation
        pipeline.AddTimeout(TimeSpan.FromSeconds(20));

        // 3. Retry: max 3 attempts with exponential backoff
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(2),
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });

        // 4. Circuit Breaker: opens after 50% failures on 10+ calls
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            MinimumThroughput = 10,
            SamplingDuration = TimeSpan.FromSeconds(30),
            BreakDuration = TimeSpan.FromSeconds(60)
        });

        // 5. Per-attempt timeout: 4 seconds
        pipeline.AddTimeout(TimeSpan.FromSeconds(4));
    });
```
The most common pitfall is placing the circuit breaker before the retry. With this ordering, the circuit breaker sees only the final result of the retry operation (not each individual attempt), losing the ability to open quickly during a series of failures.
Polly with HttpClientFactory in ASP.NET Core: the 2026 standard pattern
The integration of Polly with HttpClientFactory in ASP.NET Core is the recommended way to use HTTP resilience in any modern application. HttpClientFactory manages the lifecycle of HttpClient instances, avoiding the socket exhaustion problems of manual creation, and the integration with Polly via AddResilienceHandler adds resilience transparently.
```csharp
var builder = WebApplication.CreateBuilder(args);

// Option 1: Standard Resilience Handler (sensible defaults)
builder.Services.AddHttpClient("external-api")
    .ConfigureHttpClient(c => c.BaseAddress = new Uri("https://api.external-service.com"))
    .AddStandardResilienceHandler(options =>
    {
        options.Retry.MaxRetryAttempts = 4;
        options.CircuitBreaker.BreakDuration = TimeSpan.FromSeconds(30);
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(20);
    });

// Option 2: Custom Resilience Handler (full control)
builder.Services.AddHttpClient<IECommerceApiClient, ECommerceApiClient>()
    .ConfigureHttpClient(c =>
    {
        c.BaseAddress = new Uri(builder.Configuration["Services:ECommerce:BaseUrl"]!);
        c.DefaultRequestHeaders.Add("X-Api-Key", builder.Configuration["Services:ECommerce:ApiKey"]);
    })
    .AddResilienceHandler("ecommerce-resilience", (pipeline, context) =>
    {
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(1),
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });

        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            MinimumThroughput = 8,
            SamplingDuration = TimeSpan.FromSeconds(30),
            BreakDuration = TimeSpan.FromSeconds(45)
        });

        pipeline.AddTimeout(TimeSpan.FromSeconds(8));
    });
```
Observability with Polly: logging, metrics and telemetry with OpenTelemetry
A resilience pipeline that produces no metrics is like a circuit breaker with no status indicator: you do not know when it opens, how often retries are triggered, or whether your timeouts are correctly calibrated. In production, Polly observability is essential.
Polly v8 has native integration with System.Diagnostics.Metrics and OpenTelemetry. Simply add the telemetry listener and metrics are emitted automatically for every strategy in the pipeline.
```csharp
builder.Services
    .AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddRuntimeInstrumentation()
            .AddMeter("Polly") // Automatically adds Polly metrics
            .AddPrometheusExporter();
    })
    .WithTracing(tracing =>
    {
        tracing
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddSource("Polly"); // Polly emits spans for each attempt
    });
```
Polly automatically emits the following key metrics:
- resilience.polly.strategy.events: counts events by type (retry, circuit-breaker-state-change, timeout, etc.) with tags for pipeline name and event type.
- resilience.polly.strategy.duration: histogram of execution durations.
- resilience.polly.strategy.attempts: number of attempts per operation.
Polly for AI agents: resilience for LLM provider calls
If you are building AI agents with Semantic Kernel or RAG systems that call LLM providers, Polly becomes even more critical. LLM providers have peculiar failure patterns: aggressive rate limits with 60-second windows, highly variable latencies (from 500ms to 60s depending on response length), model updates that cause brief periods of unavailability.
For a deep dive into building AI agents with .NET and Semantic Kernel, we have a dedicated article. Here we focus on how Polly protects these calls.
```csharp
builder.Services.AddHttpClient("openai-client")
    .ConfigureHttpClient(c =>
    {
        c.BaseAddress = new Uri("https://api.openai.com");
        c.Timeout = Timeout.InfiniteTimeSpan; // Polly manages timeouts
    })
    .AddResilienceHandler("llm-resilience", pipeline =>
    {
        // Total timeout for the entire chain (including retries)
        pipeline.AddTimeout(TimeSpan.FromSeconds(90));

        // Retry with Retry-After header support
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            ShouldHandle = args => new ValueTask<bool>(
                args.Outcome.Result?.StatusCode == HttpStatusCode.TooManyRequests ||
                args.Outcome.Result?.StatusCode == HttpStatusCode.ServiceUnavailable ||
                args.Outcome.Exception is HttpRequestException),
            DelayGenerator = args =>
            {
                if (args.Outcome.Result?.Headers.RetryAfter?.Delta is TimeSpan retryAfter)
                    return new ValueTask<TimeSpan?>(retryAfter + TimeSpan.FromSeconds(1));

                return new ValueTask<TimeSpan?>(
                    TimeSpan.FromSeconds(Math.Pow(2, args.AttemptNumber + 1)));
            }
        });

        // Circuit breaker: open after 60% failures on 5 calls in 2 minutes
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.6,
            MinimumThroughput = 5,
            SamplingDuration = TimeSpan.FromMinutes(2),
            BreakDuration = TimeSpan.FromMinutes(3)
        });

        // Per-attempt timeout for LLM
        pipeline.AddTimeout(TimeSpan.FromSeconds(60));
    });
```

Polly is not optional for .NET applications in production: it is a requirement. Every call to an external system without a resilience policy is an unhandled failure point that will eventually affect your users.
Frequently asked questions
What is Polly and what is it used for?
Polly is an open source .NET library that implements resilience patterns like retry, circuit breaker, timeout, bulkhead and fallback. It makes applications more robust against transient errors, unavailable external services and overload conditions, preventing a single failure from cascading through the system.
What changes in Polly v8 compared to previous versions?
Polly v8 introduces the Resilience Pipeline as the central construct, replacing individual Policies. The new API is deeply integrated with Microsoft.Extensions.Http.Resilience and the modern .NET ecosystem. The old Policy.Handle and WrapAsync approach is still supported as a legacy API, but the pipeline model is the recommended one for new code. With v8 you define pipelines using ResiliencePipelineBuilder and configure them via AddResilienceHandler in ASP.NET Core.
How do you configure a retry policy in Polly v8?
With Polly v8 you use AddRetry() on the ResiliencePipelineBuilder. You can configure MaxRetryAttempts, Delay, BackoffType (constant, linear or exponential) and ShouldHandle to specify which exceptions or results trigger a retry. Enabling UseJitter adds random variation that prevents the thundering herd problem when many clients retry simultaneously.
What is the circuit breaker pattern and when should you use it?
The circuit breaker pattern automatically stops calls to a failing service, preventing you from overwhelming a system already under stress. It has three states: Closed (normal operation), Open (calls are blocked without even being attempted) and Half-Open (a limited number of attempts are allowed to verify recovery). Use it whenever you call external services or databases that can become overloaded.
How do you integrate Polly with HttpClientFactory in ASP.NET Core?
In ASP.NET Core you register the resilience pipeline using AddResilienceHandler() on IHttpClientBuilder. Every HTTP request made through that client automatically benefits from retry, circuit breaker and timeout, without having to manually manage the pipeline in application code.
