AI Agents in .NET with Semantic Kernel: production guide

Q: How do you build an AI agent in .NET?

Building an AI agent in .NET with Semantic Kernel means giving an LLM model the ability to invoke real functions in your system, call APIs, read databases, perform calculations, and reason autonomously about how to combine them to achieve an objective. The fundamental components are: the Kernel , central orchestrator, plugins , the functions the agent can invoke, ChatHistory , conversational context, memory , persistent context between conversations, and the ChatCompletionAgent , the actual agent with its instructions. The most critical feature for production agents is observability: every tool call must be logged with parameters and results, and irreversible actions must always pass through a human approval mechanism.

How do you build an AI agent in .NET?

Building an AI agent in .NET with Semantic Kernel means giving an LLM model the ability to invoke real functions in your system, call APIs, read databases, perform calculations, and reason autonomously about how to combine them to achieve an objective.

The fundamental components are: the Kernel, central orchestrator, plugins, the functions the agent can invoke, ChatHistory, conversational context, memory, persistent context between conversations, and the ChatCompletionAgent, the actual agent with its instructions.

The most critical feature for production agents is observability: every tool call must be logged with parameters and results, and irreversible actions must always pass through a human approval mechanism.

AI Agents in .NET with Semantic Kernel: production guide

Matteo Migliore is an entrepreneur and software architect with over 25 years of experience developing .NET-based solutions and evolving enterprise-grade application architectures.

He has led enterprise projects, trained hundreds of developers, and helped companies of all sizes simplify complexity by turning software into profit for their business.

A manager walks into the Monday morning meeting with a fixed idea.

He saw a demo on LinkedIn: an AI agent that handles customer support for a company, responds to requests, opens internal tickets, updates the CRM, sends notifications.

All autonomously. All in real time.

And now he is looking at you, the senior developer in the room, with that question already forming on his lips: "Can we do this too?"

Silence. Because you already know how it ends.

That demo ran on a fake dataset, with a model costing thirty cents per call, no logging, no security, nothing that holds up after the first week in production with real users.

But you cannot say it that bluntly. You need to say something intelligent.

And right now, if you have not already built an AI agent that actually works, you do not have that answer.

This is the situation dozens of developers find themselves in every week in 2026.

It is no longer a matter of curiosity toward an emerging technology: it is a matter of professional competitiveness.

Companies are no longer looking for people who know AI "in theory."

They want people who can build systems that work, that integrate with existing infrastructure, that do not cost a fortune in tokens, and that do not break when a user writes something unexpected.

And the gap between those who can do it and those who can only do the demo is measured today in salary differences, promotions, and project assignments.

The problem is that nearly all AI agent tutorials teach you how to build demos.

They show a kernel with two plugins, an agent that queries Wikipedia or calculates the current date, an example that works locally and has never seen a real user.

Useful for understanding the basic mechanics, useless for anyone who needs to decide whether and how to bring an agent to production on a corporate system with security, observability, and sustainable operational cost requirements.

Everyone talks about AI agents. Almost no one knows how to build them for real.

And this is where tools like Semantic Kernel and .NET come into play, along with the architectural skills to use them professionally.

If you have already read our introductory article on Semantic Kernel, here you will find what comes next: not the first steps, but the real problems that emerge when you stop experimenting and start thinking in terms of production.

What AI agents really are and why they change the way software is built

There is a phrase I often hear in development teams: "We have already integrated AI, we use ChatGPT via API."

Fair enough.

But using an LLM API and building an AI agent are so different they almost should not share the same name.

With a classic LLM call, flow control is entirely in your hands.

You send text, you receive a text response, you decide what to do with it.

The model is a passive tool: it executes when called, responds only to what you ask.

If you want it to fetch data from a database, you do that before the call. If you want it to update a record, you do that after the response.

All decision logic stays in your code, deterministic and testable like any other piece of software.

An AI agent works in a radically different way.

Instead of you deciding the sequence of operations, you provide the agent with a set of functions it can invoke autonomously.

The model reasons about which function to call, with which parameters, evaluates the intermediate result, and decides whether its goal has been reached or whether another step is needed.

You define the system's capabilities. The model decides the execution strategy.

This is a shift of responsibility with profound implications for how software is designed, tested, and brought to production.

This pattern, known as ReAct (Reasoning and Acting), is the foundation of all modern agent-based systems.

In Semantic Kernel it is implemented natively: the model receives the list of available tools with their descriptions, decides which to invoke, executes the calls, reasons about the results, and continues until it considers the task complete.

Each cycle generates new messages in the conversation history, which are sent back to the model along with the tool results.

An AI agent is not an improved chatbot: it is a component that makes decisions.

Designing a system that makes decisions is a different activity from designing a system that answers questions.

Why companies are investing in AI agents in 2026

In sectors where case variability is high, such as customer support, document analysis, and integration between heterogeneous systems, development teams have always written enormous amounts of code to handle exceptions, variants, and unexpected scenarios.

Every new edge case required new code, new tests, new maintenance.

A well-designed agent covers this variability with a set of general functions and behavioral instructions, rather than infinite conditional branches.

The practical difference for a team: with traditional LLM calls, code grows linearly with problem complexity.

With an agent, you delegate planning of intermediate steps to the model.

The code you write defines what the system can do, not how it must do it in every possible scenario.

For high-variability problems, this drastically reduces the code surface to maintain.

It is not magic and it is not risk-free, but the ratio between complexity of the problem solved and code to write is significantly better for a specific class of problems.

When using an agent makes sense and when it is overkill

Agents shine where the number of possible combinations of steps is too high to be explicitly coded.

You can read this concept more concretely by translating it into real situations:

When the flow changes based on user input
When intermediate steps are not predictable upfront
When the number of combinations grows rapidly
When writing everything in code would lead to hundreds of conditions

Do not use an agent when the workflow has fixed and known steps: classic imperative code is more predictable, more testable, and much cheaper in terms of tokens and latency.

A good heuristic: if you can draw a complete flowchart of the problem with fewer than ten decision nodes, you probably do not need an agent.

If the diagram requires hundreds of branches to cover all real cases, an agent is the right choice.

LLM vs AI agents: the difference that determines your system architecture

AI agents in .NET for adaptive flows and robust decision-making

The confusion between LLMs and AI agents is understandable, because both use a language model as their central component.

But the architectural difference is radical, and choosing the wrong approach for a given problem leads to systems that are too expensive, too fragile, or both.

A system based on direct LLM calls has a simple structure: your code calls the model, interprets the response, acts accordingly.

The flow is deterministic and completely under your control.

It is the right choice for the vast majority of AI use cases: answering questions on a document domain, classifying text, generating structured content, summarizing information.

Simple, predictable, testable with standard tools.

An agent-based system has a fundamentally different structure: the model drives the flow, not the other way around.

The agent receives a goal, not a question. It decides which steps to execute, in which order, with which parameters.

The outcome is not predictable in advance because it depends on the model's runtime decisions, which in turn depend on intermediate results.

This makes agent-based systems powerful for complex problems and harder to debug and test compared to traditional code.

If you want to make this difference truly operational, you can read it like this:

Aspect	Classic LLM Call	AI Agent
Flow control	Managed by application code	Delegated to the model
Determinism	High	Low
Debugging	Linear and predictable	Log-based and reasoning-driven
Costs	Controllable	Variable and potentially high
Adaptability	Limited	High
When to use it	Predictable flows	High-variability problems

Understanding this difference allows you to make a conscious architectural choice, not to pick the most advanced technology available.

In many cases that seem to require an agent, a classic pipeline with some integrated LLM calls works better: it is faster, costs less, and is testable with standard tools.

An agent is worth the complexity when adaptability is genuinely necessary, not as the default for any AI integration.

Viewing your system as "AI-powered" because it uses an agent instead of a direct call is a priority mistake many teams make.

The hidden cost of the wrong choice

An agent that invokes five functions to answer a simple question uses far more tokens than a single direct LLM call.

The conversation history grows with each iteration, further increasing the cost.

In production, with hundreds of sessions per day, the difference between choosing an agent or a classic call can translate into very different operational costs.

Before choosing the architecture, answer this question: is the solution path to my problem variable enough that I cannot explicitly code it?

If the answer is yes, an agent makes sense. If the steps are predictable and stable, a traditional pipeline is the better choice.

There is also a less obvious cost: debugging complexity.

In practice, when working with an agent, debugging stops being linear:

You no longer follow a deterministic flow line by line
You need to reconstruct the model's reasoning from the logs
You need to understand why it chose one tool over another
You need to distinguish whether the problem is in the code or in the model's behavior

With imperative code, you know exactly which line produced a certain behavior.

With an agent, debugging changes its nature entirely.

You have to reconstruct the model's reasoning from the invocation logs, understand why it chose a certain tool over another, and determine whether the problem is in the plugin, the system instructions, or the model itself.

This additional complexity must be compensated by a real benefit in flexibility or capability.

If you are reading this section with the feeling that "ok, it is clearer now..." but you still would not know how to make an architectural decision on a real project, stop for a moment.

Because that is exactly where the gap forms.

Between those who understand the difference between LLMs and agents, and those who know how to use it to design systems that do not collapse in production.

The truth is that no tutorial teaches you this step.

They teach you to use the tools. Not to reason like someone who makes decisions that impact budgets, stability, and careers.

This is where everything changes.

If you want to stop being the one who "implements" and start being the one who decides how to truly build AI systems, take a look at the full program: AI Programming Course.

It is not a course just to learn how to use AI.

It is a path to become the professional others consult when decisions become critical.

Stop improvising with AI

Why Semantic Kernel has become the standard for AI agents in .NET

In 2025, if you had wanted to build an AI agent in Python you would have had LangChain, LlamaIndex, AutoGen, and a dozen other frameworks to choose from.

In .NET, the choice was less obvious.

But over the past twelve months, Semantic Kernel has consolidated a dominant position in the enterprise .NET ecosystem, and there are concrete reasons for this adoption that go beyond the fact that it is made by Microsoft.

The first reason is integration with the Azure ecosystem.

Semantic Kernel integrates natively with Azure OpenAI, Azure AI Search, Azure AI Foundry, and Application Insights.

For companies already on Azure, this means zero extra configuration for authentication, logging, and monitoring.

This is not a minor detail: it reduces weeks of infrastructure configuration work that in a typical project is billed at a daily rate.

The second reason is LLM provider neutrality.

The same code that runs with Azure OpenAI in production runs with Ollama locally during development, and potentially with Anthropic's Claude or Google's Gemini if requirements change.

The abstraction is concrete: when you change the provider, you change the kernel configuration, not the agent code.

For a team that must maintain the system for years, this flexibility has real economic value.

The third reason is integration with the standard .NET dependency injection.

Semantic Kernel uses IServiceCollection and the builder pattern you already know. Plugins receive dependencies via constructor like any other service. It does not impose architectural patterns different from those already used in the project.

For those who want to explore enterprise AI programming with Semantic Kernel, this ecosystem is the exact starting point.

For a senior .NET team, the framework feels familiar from day one.

The Agents.Core package is separate from the core precisely because those who use the framework only for LLM calls do not have to carry the agents code.

This separation reflects the philosophy of the framework: every advanced feature is an optional layer on top of the core.

The project structure that scales

Separating plugins, agents, and filters into dedicated folders is not an organizational preference: it is the structure that allows adding a new plugin without touching the agent code, modifying instructions without impacting plugins, and adding logging or security filters without changing the application logic.

An Agents/ folder for agent definitions with their system instructions, Plugins/ for the functions agents can invoke, Filters/ for observability and security, Infrastructure/ for the kernel factory: this is a structure that is understood at first glance, maintained over time, and explained to a new team member in five minutes.

These details make the difference when the project is six months old and the plugin list has grown to twenty entries.

How to design effective plugins for your AI agents

Plugins are the most critical part of any agent.

Every design mistake translates into an agent that invokes functions in the wrong context, with incorrect parameters, or that fails to complete tasks it should handle without difficulty.

Designing plugins well is the skill that more than any other separates an agent that works in a demo from one that holds up in production.

The fundamental principle, and the most counterintuitive for those coming from traditional development: the descriptions in the attributes are not documentation for the development team.

They are instructions for the LLM model. The model reads them to decide if and when to invoke each function.

A vague description generates incorrect invocations. A description that explains the context of use generates a reliable agent.

The parameter includes format examples that help the model correctly extract the ID from the user's message.

Here is what truly makes this type of plugin effective:

The description explains when to use the function, not just what it does
Parameters include concrete input examples
Error messages are written for the end user
Each invocation is logged for debugging
The usage context is explicit and unambiguous

The error message is written for the end user, not the developer. The cancellation token is propagated to allow cancellation in case of timeout.

Each invocation is logged with the parameters for debugging.

How many functions to expose to an agent

The most capable models handle lists of twenty to thirty functions well.

More economical models have worse performance when the list is too long: they start confusing tools, invoking the wrong ones, ignoring the relevant ones.

The practical rule: keep the number of functions under fifteen to twenty per agent.

If more are needed, consider dynamic plugin selection based on conversation context rather than having them all always available.

A common mistake in teams starting with agents: exposing internal domain APIs as plugins, with the same names and signatures used in the application code.

Plugin functions must be designed for the LLM model, not the developer.

Names must be clear in natural language. Descriptions must contextualize the use. Return messages must be understandable as plain text.

A function called GetOrderV2ByIdFromCrmAsync with a generic description is a joke for the model.

A function called GetOrderStatusAsync with a description explaining when to use it is a precise tool.

Building and configuring the AI agent in .NET with Semantic Kernel

With the kernel configured and the plugins registered, creating the agent is technically simple.

The real complexity lies in the system instructions: they are the most important part of the entire architecture and almost always the most underestimated.

They define the behavioral contract of the agent: what it can do, what it must never do, how it should communicate, how to handle edge cases.

Any behavior not specified in the instructions becomes non-deterministic.

The model will handle those cases in the way it considers most appropriate based on its training, and this rarely matches what you want in an enterprise system.

Effective instructions use explicit lists for permissions and constraints, specify the response format when relevant, include guidance on how to handle error situations and out-of-scope requests.

A gap in the instructions is a vector of non-deterministic behavior that will surface sooner or later, inevitably, at the worst possible moment.

The FunctionChoiceBehavior.Auto() parameter enables agent behavior: the model autonomously decides if and when to invoke the available functions.

Without this configuration, the agent responds with text only without invoking anything, behaving like a standard LLM call.

MaximumAutoInvokeAttempts defines how many invocation cycles are allowed before the model must conclude: keep this explicit in production, do not rely on the default value.

The instructions every production agent should have

A production agent needs instructions that cover at least four areas:

Context and role (who it is and which system it works for)
Explicit capabilities (what it can do)
Explicit constraints (what it must never do)
Instructions for error handling and edge cases (how to behave when a tool fails or the request is out of scope)

An uncovered area creates a behavioral gap.

Behavioral gaps are found in production, not during development.

Testing instructions with a suite of edge case scenarios, including prompt injection attempts, before release is a practice that cannot be skipped.

Configuring an agent is easy.

Write two instructions, enable auto invoke, connect a few plugins, and everything seems to work.

Then production arrives.

And there you understand that the problem was not "getting it to start." It was building something that does not break.

Incomplete instructions. Unpredictable behaviors. Costs that explode without apparent reason.

This is where the world divides.

On one side those who followed tutorials. On the other those who learned to truly design systems.

If you want to avoid months of trial, error, and unstable systems, and learn an approach that works even when the context becomes real, look here: AI Programming Course.

Because the difference is not knowing how to use Semantic Kernel. It is knowing what to do with it when you no longer have room for error.

Build agents that don't break in production

AI agent memory: session context and persistence across conversations

Semantic Kernel in .NET manages secure agent memory

AI agent memory is divided into two levels with very different characteristics: conversation history, which lasts for the duration of a session, and persistent memory, which survives across different sessions.

Understanding this distinction and managing it correctly is one of the areas where the most frequent production errors occur, with consequences ranging from runaway token costs to privacy violations.

Conversation history is an object containing all messages exchanged between user, agent, and functions during a session.

It is the mechanism that transforms a sequence of independent LLM calls into a real conversation with context memory.

Without it, every message is treated as a new conversation.

The agent remembers nothing from the previous turn, cannot reference information provided earlier, and cannot complete multi-step tasks that span multiple dialogue turns.

The problem that emerges in production is that the history grows.

Every function invocation adds messages. Every agent response adds messages.

After dozens of turns the context becomes enormous, token costs grow proportionally, and the risk of exceeding the model's maximum context becomes real.

For long conversations, a deliberate reduction strategy is necessary.

Strategies for managing conversation history growth

The simplest strategy is the sliding window: keep only the last N messages, discarding the oldest.

Simple to implement, but loses context for very long conversations.

A window of twenty to thirty messages is a good starting point for most enterprise cases: it covers medium-complexity conversations without letting the context grow uncontrolled.

The alternative strategy is summarization: before discarding the oldest messages, summarize them with a model call and insert the summary as a system message.

This maintains semantic context but adds the cost of an extra LLM call.

Worth it in scenarios where conversations last hours and accumulated context is critically important for response quality.

At this point, the choice between the two strategies becomes much more concrete:

Strategy	How it works	Advantages	Disadvantages	When to use it
Sliding window	Keeps only the last N messages	Simple, fast, inexpensive	Loses historical context	Short or medium conversations
Summarization	Summarizes old messages	Preserves semantic context	Additional LLM cost	Long and complex conversations

Persistent memory across sessions

For many enterprise use cases, something more than session history is needed.

An agent that remembers user preferences across sessions, that knows the relevant documents for the domain, that retrieves context from relevant past conversations.

Semantic Kernel provides an abstraction layer over vector databases to save and retrieve memories via semantic search.

Instead of a query that looks for exact matches, you search by meaning: a query like "how does this customer prefer to receive packages" returns relevant results even if those memories do not contain exactly those words.

The choice of vector backend depends on existing infrastructure: Azure AI Search is the natural choice for those already working on Microsoft Azure, while solutions like Qdrant or Weaviate are better suited to on-premises deployments running on owned infrastructure without depending on external cloud services.

Changing the backend does not require changing the agent code.

To explore the topic of semantic memory and the retrieval-augmented generation pattern, read our article on AI memory and RAG for .NET applications.

To understand how LLMs communicate with external context systems according to the emerging standard, also read our article on the Model Context Protocol.

How to prevent infinite loops and unexpected behavior in AI agents

There is a moment that almost every developer who has worked seriously with AI agents remembers: you look at the cost dashboard and notice an anomalous spike.

You open the logs and find an agent that invoked the same plugin seven consecutive times, with slightly different parameters each time, never arriving at a final response.

The reason, you discover after twenty minutes of debugging, is an ambiguous description in one of the functions that put the model in a circular reasoning loop.

Meanwhile, the token cost for that single session is ten times what was expected.

Infinite loops in AI agents are not a rare exception: they are one of the most common problems during development, and one of the most costly when they appear in production.

They almost always arise from these scenarios:

The agent cannot reach the goal but keeps trying
A plugin returns an error and the agent repeats it without adapting
System instructions do not clearly define when to stop
Tool descriptions are ambiguous and generate decision loops

The maximum invocation limit is the first line of defense.

Do not rely on the framework's default value: set it deliberately based on the type of agent and the complexity of the tasks it needs to handle.

A customer support agent rarely needs more than six to eight invocations to respond to a request.

A complex data analysis agent might need fifteen.

Set the limit deliberately, monitor the average number of invocations per session in production, and lower it if you see sessions frequently reaching it without legitimate reason.

The second cause of unexpected behavior is incomplete system instructions.

If the agent does not know how to handle a specific case, the model improvises. Sometimes it improvises well.

Often not.

Always test instructions with a suite of edge cases before release: out-of-scope requests, errors in data returned by plugins, malformed inputs, ambiguous requests that could be interpreted two different ways.

Patterns for detecting and interrupting anomalous behavior

Beyond the invocation limit, implement a monitoring filter that tracks the invocation pattern in the current conversation.

If the same plugin is invoked with the same parameters three consecutive times, the agent is probably in a loop.

A filter that detects this pattern and interrupts it with an explicit error message is much more useful than a generic timeout.

It allows the agent to respond to the user with a meaningful message instead of silently timing out.

Input sanitization and prompt injection resistance

If users can enter free text that ends up in the agent's context, there is a risk of prompt injection.

Inputs crafted to try to modify the agent's behavior, make it step outside its scope, or get it to execute unauthorized actions.

Always sanitize user input before inserting it into the conversation.

System instructions must explicitly include guidance on how the agent should react to manipulation attempts.

Context contamination in multi-tenant systems deserves separate attention: conversation history and persistent memory must be isolated per user or per session.

Accidentally sharing context between different users is a serious privacy violation and is difficult to detect without structured logs.

Human oversight and safety: the human-in-the-loop pattern with Semantic Kernel

An AI agent without explicit limits is like a junior developer with admin access: intelligent but dangerous.

Never expose to an agent functions that execute irreversible actions without a human approval mechanism.

Deleting records, sending emails, processing payments, modifying system configurations: these actions require a pattern where the agent proposes the action and waits for confirmation before executing it.

This is not a recommended best practice: it is a non-negotiable requirement for any production system that wants to be reliable and compliant with corporate regulations.

The technical mechanism in Semantic Kernel for implementing this pattern is the function invocation filter.

It is a class that intercepts every plugin call before execution.

It is registered in the kernel once and automatically applies to all functions, which means you do not have to modify the code of individual plugins to add the security check.

How to classify actions by risk level

A practical taxonomy for functions exposed to an agent distinguishes three categories.

Read-only actions, such as fetching data, consulting catalogs, performing searches, can be invoked freely without approval.

Reversible low-impact actions, such as updating preferences, saving drafts, creating tickets, may require only logging without an explicit block.

Irreversible or high-impact actions, such as sending communications, processing payments, deleting data, modifying configurations, always require human approval before execution.

This classification must be documented, shared with the team, and updated every time a new plugin is added.

Every function added to the system must be categorized before release.

This is not a bureaucratic activity: it is the minimum governance that allows you to know what the agent can do autonomously and what it cannot, and to answer management's questions when something goes wrong.

Production observability: logs, traces, and metrics

An AI agent in production without observability is like an application without logs: when something goes wrong, you have no means to understand what happened.

Semantic Kernel natively supports OpenTelemetry: each invocation generates spans with standard attributes compatible with Azure Monitor, Jaeger, Grafana, and any compatible backend.

In a system with multiple agents, distributed tracing is the only practical way to reconstruct the full flow of a conversation.

The metrics to monitor in production are: number of invocations per conversation, token cost per session, plugin error rate, latency distribution.

These numbers help understand where agents get stuck most often and whether the operational cost is sustainable over time.

Multi-agent systems in .NET: orchestrating teams of specialized AI

For scenarios where a single agent is not sufficient, Semantic Kernel supports agent groups: multiple specialized agents that collaborate to achieve a common goal.

Each agent has a specific role, a set of functions consistent with that role, and instructions that define its behavior within the group.

It is a powerful pattern, with proportionally higher operational costs and management complexity compared to a single agent.

The pattern is particularly useful for processing pipelines where steps require different expertise.

One agent collects and analyzes data, a second produces an executive report, a third verifies the output quality.

Or for review systems where one agent generates a proposal and a second critiques it, iterating until the result meets the established criteria.

Two configurations are critical for any agent group: the selection strategy, which decides which agent speaks at each turn, and the termination strategy, which decides when the group has completed the work.

The sequential selection strategy, where agents take turns in a fixed order, is the simplest and most predictable.

LLM-based selection, where a model decides which agent is most appropriate to respond based on context, adds costs and latency without necessarily improving results.

For most cases, start with sequential selection and evaluate later whether something more sophisticated is needed.

A concrete use case: code review with two specialized agents

A real-world case in enterprise environments: automated code review of a pull request.

An analyst agent has access to the code and static analysis tools: it identifies potential issues, evaluates complexity, suggests refactoring.

A reviewer agent has access to quality guidelines and the history of past reviews: it evaluates the quality of the analyst's analysis, verifies that security, performance, and testability have been considered, approves or requests further investigation.

The cycle continues until the reviewer is satisfied or the iteration limit is reached.

The key point is that each agent has its own kernel with the plugins appropriate to its role.

The Analyst does not have access to the reviewer's tools and vice versa.

This separation reduces the risk that an agent invokes tools inappropriate for its role and makes the system's behavior more predictable.

The operational cost of multi-agent systems

Multi-agent systems with review cycles are powerful but costly.

Each iteration generates multiple LLM calls. In production, monitor the average number of iterations before termination and the average cost per session.

Some practical optimizations: use a cheaper model for the agent that generates drafts and a more capable one only for the reviewer.

Set a hard limit of two to three maximum iterations to contain costs without sacrificing quality.

Multi-agent systems with review cycles represent the state of the art for complex tasks, but their operational cost requires explicit governance: per-session budget, usage metrics, and alert thresholds are part of the architecture as much as the code itself.

Multiple agents are fascinating. As long as they stay on paper.

Because when you actually start orchestrating multiple agents, problems emerge that no simple example ever shows you: coordination, out-of-control costs, inconsistent decisions, nearly impossible debugging.

And at that point you understand one thing. You are no longer "using AI." You are designing a complex system.

And if you do not have a method, it turns against you.

If you want to learn how to manage this complexity without getting lost, and build systems that truly scale (technically and professionally), come here: AI Programming Course.

Not to do more complex things. But to do them in a way that works when it truly matters.

Manage agents without chaos

How to test an AI agent: strategies for non-deterministic systems

Testing an AI agent is fundamentally different from testing deterministic code.

You cannot expect the same input to always produce the same output: LLM models have a stochastic component. Text output varies between executions.

But you can test the behavioral properties of the system: does the agent invoke the right functions in the appropriate situations? Does it never execute critical actions without approval?

Does it maintain context correctly across turns? Does it respond with a sensible message when a plugin fails?

The most effective strategy is to decouple the test of agent behavior from the test of plugin logic.

For plugins, classic unit tests in C#: they are functions with defined inputs and outputs, completely deterministic.

For the agent, mock the plugins and verify that the correct functions are invoked with reasonable parameters in response to specific inputs.

This allows you to isolate problems: if the agent does not work correctly in a test, you know the problem is in the agent's behavior, not in the plugin logic.

The pattern is to verify that the right plugins were invoked with reasonable parameters, not that the text output is identical to an expected value.

Functional behavior, which functions are called and with which parameters, is much more stable than text output and is reliably verifiable even considering the stochastic nature of the model.

Behavioral regression testing

For end-to-end integration tests, maintain a set of representative conversational scenarios with expected inputs and outputs, run periodically against the real model.

These serve to detect behavioral regressions introduced by LLM model updates, changes to system instructions, or new plugins that interfere with existing ones.

A behavioral regression is when the agent stops doing something it was doing correctly, or starts doing something it should not, without your code having changed.

It happens when the provider updates the base model, when new plugins are added that the model prefers to existing ones in certain contexts, or when system instructions for one agent are modified without verifying the impact on others.

Without a regression test suite, you do not notice until a user complaint arrives.

These tests are expensive in tokens and slow: they should not run on every build, but at least weekly in staging and always before a production release.

When to use an AI agent and when it is the wrong choice for the problem

After seeing how to build complex agents, it is important to take a step back.

AI agents are not the right solution for every problem involving an LLM.

Using them indiscriminately leads to systems that are more expensive, slower, harder to debug, and often worse, not better, than simpler solutions.

Signs that an agent is the right choice: the problem has high variability in solution paths.

The steps to execute depend on the results of previous steps in ways not predictable upfront.

Coding all possible cases in imperative code would require hundreds of conditional branches that are difficult to maintain over time.

Signs that an agent is excessive: the workflow has fixed and predictable steps.

The operations are primarily calls to external APIs without complex decision logic.

The additional latency introduced by the agent-model-plugin cycle is not acceptable for the use case, and the token cost per session exceeds the value of the service delivered.

In these cases, a simpler solution works better on every dimension: cost, speed, reliability, maintainability.

Alternatives to agents for the most common cases

For many scenarios that seem to require an agent, a simpler solution works better.

A system with direct LLM calls and manual routing based on text classification is sufficient for customer support with limited scope and predictable steps.

A processing pipeline with fixed steps in imperative code is more predictable and testable for deterministic workflows.

A RAG system, with semantic retrieval over a document base and response generation, is the right choice when the problem is answering questions about documents, not executing actions on systems.

Agents add value when the problem requires genuine adaptive planning, not when the task is answering questions with context retrieved from a database.

Confusing the two scenarios is one of the most common mistakes in teams starting to work with AI systems.

For a comparison between different architectures, also read our analysis on vibe coding and AI-assisted development.

The AI agent market in 2026 and the skills that make the difference

With .NET and Semantic Kernel you bring AI to production

The AI agent market in 2026 has stopped being one of experimentation.

Medium and large companies are bringing to production the first real agent-based systems, primarily in three areas: customer support automation, document analysis and classification, integration between legacy systems and new cloud platforms.

In manufacturing, financial, and distribution sectors, the most frequent requests involve agents that integrate with existing ERP and CRM systems, that meet stringent security requirements even with sensitive data, and that provide complete logs of all actions performed for regulatory compliance.

It is not a coincidence that these sectors are the most active: they are also the ones with the highest process variability and the highest cost of personnel dedicated to handling exceptional cases that a traditional automated system cannot cover.

The professional profile these companies are looking for is not the data scientist who trains models.

It is the senior developer who knows how to design integrated AI systems with existing infrastructure.

Someone with a solid .NET background, who understands architectural patterns for distributed systems, knows when to use an agent and when not to, and has the experience to bring it to production with the reliability and observability guarantees required by enterprise environments.

The gap between those who build demos and those who build systems

Semantic Kernel is today the de facto standard for AI agents in enterprise .NET environments.

Its adoption has grown significantly over the past year thanks to integration with Azure AI Foundry and support for the MCP protocol.

Developers who know it in depth, not just the basics, have a concrete advantage in today's market. But framework knowledge alone is not enough.

The most common gap in teams starting with agents is this: they know how to build a prototype that works in a demo, but struggle to bring it to production with the reliability, observability, and security requirements that companies demand.

The difference is not technical in the strict sense: it is architectural.

It concerns decisions made before writing the first line of code: where to set the agent's autonomy boundaries, how to structure plugins to be effective for the model, which metrics to monitor, how to handle edge cases and irreversible actions.

How to effectively train on .NET AI agents

Training on agent-based systems requires a hands-on approach.

Reading documentation and watching tutorials solves the first-steps problem, but does not prepare you to handle real production problems.

Real problems include unexpected model behavior, out-of-control costs, behavioral regressions after an update, and security incidents related to prompt injection or context contamination.

The most effective path combines solid architectural foundations with practice on real scenarios that mirror those encountered in an enterprise project.

To explore how Semantic Kernel integrates into the .NET ecosystem and how to structure an adoption plan for a team, read our article on the Semantic Kernel framework.

Do you want to learn how to design and deploy robust AI agent systems in .NET with a solid architectural approach?

Discover BestDeveloper's training program for AI systems architects.

If you have read this article to the end, one thing is clear. You are no longer in the phase where trying AI is enough.

You have understood where the real problems are. You have seen what happens when you move from demo to production.

You have seen how quickly a seemingly simple system can become unmanageable.

And above all you have understood this: the real gap is not between those who use AI and those who do not. It is between those who know how to build systems that hold up, and those who do not.

You can keep trying. Accumulating attempts. Hoping that eventually everything will come together.

Or you can make the leap.

Not the technical one. The mental one. Moving from a developer who integrates tools to a professional who designs AI systems with criteria.

If this is the step you want to take, here is the program: AI Programming Course.

It is not yet another course to "learn AI."

It is a program built for those who want to stop being an executor, and start reasoning like someone who makes decisions that impact architecture, costs, and careers.

If you are still looking for yet another tutorial, that is fine.

But know that out there, they do not pay for people who can do a demo. They pay for those who can make things work when they truly matter.

Frequently asked questions

An AI agent is a system that uses an LLM model to reason about an objective and make autonomous decisions about which actions to take to achieve it. Unlike a simple LLM call, an agent has access to tools, functions it can invoke, can reason about intermediate results, and plan multiple steps.

An OpenAI client handles HTTP communication with the API. Semantic Kernel adds an orchestration layer: plugin management, tools, persistent memory, automatic step planning, AutoInvokeKernelFunctions, support for multiple collaborating agents, and integration with Azure AI and other providers.

Yes, with appropriate precautions. Production agents require: explicit limits on available tools, logging of every action taken, human approval mechanisms for irreversible actions, human-in-the-loop, error handling and infinite loop prevention, and rigorous testing of edge cases.

Yes. Semantic Kernel supports OpenAI, Azure OpenAI, Ollama, local models, HuggingFace, and through custom connectors any model with a compatible API. In 2026, support was extended to Anthropic's Claude and Google's Gemini.

Semantic Kernel provides the Memory Store, an abstraction layer over vector databases such as Azure AI Search, Qdrant, Chroma, or Pinecone. You use the VectorStore to save and retrieve memories relevant to the current conversation through semantic search. The pattern involves saving conversation fragments as vector embeddings and retrieving them based on semantic similarity with the current message.

The human-in-the-loop pattern is implemented by creating special tools that instead of directly executing an action return a proposal to the user and wait for confirmation. In Semantic Kernel, an IFunctionInvocationFilter is used to intercept calls to critical functions, log the proposal, and suspend execution until confirmation arrives. Irreversible actions, deletions, payments, email sending, must always go through this pattern.

Leave your details in the form below