Published on May 21, 2026

Designing Workflow-Level Guardrails on Top of Azure AI Foundry with elsai Guardrails

Executive summary

Azure AI Foundry has moved quickly from an interesting developer preview into something that genuinely matters at the enterprise level. Over 80,000 enterprises now run workloads on the platform, including 80 per cent of Fortune 500 companies, according to Microsoft's fiscal year 2025 annual report. It gives teams access to more than 11,000 models, a unified agent service, and deep integrations with the rest of the Azure ecosystem. For teams that have been waiting for a stable, enterprise-grade foundation to build LLM applications on, Foundry has become the answer.

But there is a gap that Foundry, by design, does not close, and it matters considerably once an LLM workflow moves beyond a sandbox and into production. The platform gives you the infrastructure to run models. It does not give you a programmable, workflow-level safety layer that intercepts every input and output, checks it against your organization's rules, and blocks or flags what does not pass. That layer has to be built separately. And for most enterprise teams, figuring out how to build it robustly is one of the harder parts of production deployment.

This is what elsai Guardrails is designed to solve. It sits between your application logic and the LLM, wrapping every call with configurable checks that run in real time, before the model sees the input and before the response reaches the user. Think of it as the compliance and enforcement layer that Azure AI Foundry does not provide out of the box.

Why the gap exists - and why it matters 

Azure AI Foundry is built for flexibility. It supports OpenAI models, Anthropic Claude, Meta Llama, Mistral, and dozens of others. It handles orchestration, deployment, scaling, and AI observability. What it does not do is make opinionated decisions about what should or should not pass through your LLM workflows at the content level. That is intentional: Foundry is a platform, not a policy engine.

The problem is that enterprise LLM deployments need a policy engine. They need one because the surface area of risk in an LLM workflow is wide and non-obvious. Prompt injection attacks, where malicious input tries to override system instructions, are not hypothetical. PHI and PII leakage through model outputs is a documented compliance risk, particularly in healthcare and financial services workloads. Jailbreak attempts against customer-facing agents happen at scale. And when an LLM is connected to a database and generating SQL, a syntactically incorrect or semantically dangerous query can cause serious downstream damage.

None of these risks are addressed by infrastructure-level controls. They require checks that operate at the content layer, on the text going in and the text coming out, for every single request, with thresholds you can tune and logs you can audit.

As of 2025, only about 2 per cent of organizations have deployed agent-based systems at real operational scale, while most remain stuck in pilots. One of the primary reasons is that enterprises cannot get comfortable with the governance and safety picture. Guardrails are what close that gap.

What elsai Guardrails actually does

elsai Guardrails wraps your LLM calls through a simple Python interface. Once configured, every call to your Azure OpenAI deployment, or any other provider, runs through a configurable set of checks before the model processes the input and before the response reaches your application. The checks are fast: the platform targets sub-100ms latency so that guardrails do not become a bottleneck in your workflow.

Here is what the wrapper looks like in practice when configured for an Azure OpenAI deployment:

from elsai_guardrails.guardrails import LLMRails, RailsConfig

yaml_content = """
llm:
  engine: "azure_openai"
  endpoint: "https://your-resource.openai.azure.com"
  api_version: "2024-02-15-preview"
  api_key: "your-api-key"
  model: "gpt-4"

guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  toxicity_threshold: 0.7
  block_toxic: true
  block_sensitive_data: true
"""

config = RailsConfig.from_content(yaml_content=yaml_content)
rails = LLMRails(config=config)

result = rails.generate(
    messages=[{"role": "user", "content": user_input}],
    return_details=True
)

The configuration above runs toxicity detection, sensitive data protection, and semantic content classification on both the input and output of every call. The threshold for toxicity is tunable: a threshold of 0.7 means the system blocks content only when it is at least 70 per cent confident that the content is toxic. Organizations in regulated industries typically set tighter (lower) thresholds; general-purpose applications may allow more latitude.
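The threshold rule described above can be sketched in a few lines. This illustrates the semantics only: `should_block` and the sample scores are hypothetical names and values, not part of the elsai Guardrails API, and in practice the score would come from whatever toxicity classifier the guardrail layer uses.

```python
def should_block(toxicity_score: float, threshold: float = 0.7) -> bool:
    """Return True when the toxicity score meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return toxicity_score >= threshold


# Hypothetical classifier scores, for illustration only.
for score in (0.35, 0.70, 0.92):
    print(score, "blocked" if should_block(score) else "allowed")
```

Lowering the threshold makes the check stricter, because content is blocked at a lower level of classifier confidence.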

The six checks that matter most in production 

The elsai Guardrails library currently ships with six core checks. Each one addresses a distinct category of risk that appears consistently in production LLM deployments. 

Toxicity detection

This runs on both inputs and outputs, scoring content for harmful, offensive, or inappropriate language. The toxicity threshold is configurable so teams can calibrate sensitivity to their use case. A customer service agent and an internal developer tool do not need the same threshold. 

PHI and PII detection 

For any workflow that processes protected health information (PHI) or personally identifiable information (PII), such as healthcare applications, HR tools, and financial services assistants, this check identifies and redacts sensitive data before it gets logged, stored, or passed to a model. This is not a nice-to-have in HIPAA-adjacent workloads. It is a compliance requirement.

Sensitive data detection 

Beyond PHI and PII detection, this check catches financial data, credentials, API keys, and other high-value information that should not appear in LLM inputs or outputs. In workflows where users can upload documents or paste text directly, this check prevents accidental exposure through the model.
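To make the redaction idea behind the PHI/PII and sensitive data checks concrete, here is a minimal sketch using regular expressions. It is not how elsai Guardrails detects sensitive data; the pattern set, the `redact` function, and the placeholder labels are all assumptions chosen for illustration, and production detectors cover far more formats.

```python
import re

# Illustrative patterns only (assumed formats): real detectors handle many
# more cases and use context, not just regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [LABEL] placeholder and list the labels that fired."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


clean, labels = redact("Contact jane.doe@example.com, SSN 123-45-6789")
print(clean)
print(labels)
```

Returning the list of labels alongside the redacted text gives the caller something to log for audit without storing the sensitive values themselves.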

Jailbreak detection

Jailbreak attempts use crafted prompts to bypass model safety measures and extract harmful outputs. This check uses semantic routing to identify patterns that signal a jailbreak attempt, regardless of how the prompt is phrased. This matters especially for customer-facing agents, where the user population is large, and adversarial inputs will eventually appear.

Prompt injection detection 

Distinct from jailbreaks, prompt injection attacks try to hijack an agent's behavior through malicious content embedded in external data (documents, database results, emails, or web content) that the agent retrieves as part of its task. As Azure AI Foundry workloads increasingly involve agentic retrieval and multi-step reasoning over external sources, this check becomes critical.

SQL syntax validation 

When an LLM generates SQL queries for execution against a database, a malformed or semantically dangerous query can cause significant downstream harm. elsai Guardrails validates generated SQL against seven major dialects, including PostgreSQL, MySQL, SQLite, and SQL Server, before execution, catching errors before they reach the database layer. This is especially relevant for text-to-SQL and natural language database query applications built on Foundry.
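The idea of validating generated SQL before it runs can be sketched with Python's built-in `sqlite3` module. This is not the library's multi-dialect validator: it covers SQLite only, and `validate_sql`, `BLOCKED_PREFIXES`, and the sample schema are hypothetical names introduced for this example.

```python
import sqlite3

# Statement types this sketch refuses outright, even when syntactically valid.
BLOCKED_PREFIXES = ("drop", "delete", "truncate", "alter", "update", "insert")


def validate_sql(query: str, schema_ddl: str) -> tuple[bool, str]:
    """Check a generated query against a throwaway in-memory SQLite schema."""
    stripped = query.strip().rstrip(";").strip()
    if not stripped:
        return False, "empty query"
    # Naive check: also rejects semicolons inside string literals.
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    if stripped.lower().startswith(BLOCKED_PREFIXES):
        return False, "statement type is not allowed"
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        # EXPLAIN compiles the statement without running it, so syntax errors
        # and unknown tables or columns surface here.
        conn.execute(f"EXPLAIN {stripped}")
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


schema = "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);"
print(validate_sql("SELECT id, total FROM orders WHERE total > 100", schema))
print(validate_sql("SELEC id FROM orders", schema))
print(validate_sql("DROP TABLE orders", schema))
```

Compiling against an empty copy of the schema catches both syntax errors and references to tables or columns that do not exist, without touching the real database.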

Off-topic detection: Keeping agents focused 

One of the more practically useful features in the current release is off-topic detection. Enterprise LLM deployments often involve scoped agents: a customer support bot that should only answer questions about your product, a legal research assistant that should stay within a defined domain, a clinical decision support tool that should not wander into general medical advice.

Without enforcement, users quickly discover that agents will answer questions outside their intended scope, which creates both compliance risk and quality problems. Off-topic detection lets you define the allowed topics for an agent and block inputs that fall outside them:

guardrails:
  check_off_topic: true
  block_off_topic: true
  allowed_topics:
    - name: "Customer Support"
      description: "Product questions and customer service"

This is especially valuable when building specialized agents on top of Azure AI Foundry's multi-agent orchestration layer. Each agent in a workflow can have its own topic scope, enforced at the guardrail level rather than through fragile prompt engineering. 

How this fits into an Azure AI Foundry workflow

The architecture is straightforward. Your application sends user input to the elsai Guardrails wrapper. The wrapper runs the configured input checks and, if the input passes, forwards the request to your Azure OpenAI deployment via the standard API. When the response comes back, the wrapper runs the configured output checks before returning the result to your application. Failed checks return a structured result that your application can handle by blocking the response, routing to a fallback, or flagging for human review.
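The request flow described above can be sketched as a small wrapper function. This shows the general pattern only, not the elsai Guardrails internals: `GuardedResult`, `guarded_call`, `no_password`, and `fake_model` are hypothetical names, and the model is a stand-in callable rather than a real Azure OpenAI client.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A check takes text and returns None on pass, or a short reason on failure.
Check = Callable[[str], Optional[str]]


@dataclass
class GuardedResult:
    blocked: bool
    stage: Optional[str] = None  # "input" or "output" when blocked
    reasons: list[str] = field(default_factory=list)
    response: Optional[str] = None


def run_checks(text: str, checks: list[Check]) -> list[str]:
    """Run every check and collect the reasons for any that failed."""
    return [reason for check in checks if (reason := check(text)) is not None]


def guarded_call(user_input, model, input_checks, output_checks) -> GuardedResult:
    """Check the input, call the model only if it passes, then check the output."""
    reasons = run_checks(user_input, input_checks)
    if reasons:
        return GuardedResult(blocked=True, stage="input", reasons=reasons)
    response = model(user_input)
    reasons = run_checks(response, output_checks)
    if reasons:
        return GuardedResult(blocked=True, stage="output", reasons=reasons)
    return GuardedResult(blocked=False, response=response)


def no_password(text: str) -> Optional[str]:
    """Toy check: flag any text that mentions a password."""
    return "mentions a password" if "password" in text.lower() else None


def fake_model(prompt: str) -> str:
    """Stand-in for a model call."""
    return f"Echo: {prompt}"


result = guarded_call("my password is hunter2", fake_model, [no_password], [no_password])
if result.blocked:
    print(f"Blocked at {result.stage}: {result.reasons}")
else:
    print(result.response)
```

Because a blocked input returns before the model is called, a failed input check costs no model tokens, and the caller can branch on `stage` and `reasons` to choose between a fallback message and human review.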

Because elsai Guardrails supports Azure OpenAI natively alongside OpenAI, Anthropic, Google Gemini, and AWS Bedrock, teams that run multi-model workflows on Foundry can apply consistent guardrail policies across different model providers without maintaining separate safety implementations for each one. The configuration is YAML-based and can be managed as code alongside the rest of your deployment configuration.

In a healthcare deployment, for example, the EHR stays the system of record and the agent becomes the system of action. The guardrail is the enforcement layer that makes sure the agent only acts within the boundaries the organization has defined.

What this means for enterprise deployments on Azure

Azure AI Foundry's control plane gives teams model access management, evaluations, and CI/CD integration. Microsoft Defender and Entra ID provide identity and access controls at the infrastructure level. These are necessary components of enterprise AI governance. But they operate at a different layer than content-level safety.

The checks that elsai Guardrails runs happen at the request level, on the actual text of every interaction. They complement rather than duplicate what Foundry's control plane provides. Infrastructure-level governance tells you who can use a model and how they can use it. Workflow-level guardrails tell you what is actually passing through it.

For enterprises that have built internal AI policies (and most large organizations have, or are in the process of doing so), elsai Guardrails gives teams a way to implement those policies programmatically rather than through documentation and training. If your policy says that PHI should not be processed through external LLMs without redaction, that policy becomes an automated check rather than a manual review step.

The library is released under the MIT License, runs against SOC2-compliant infrastructure, and ships with async support for high-throughput production workloads. For teams already running on Azure, the integration path is straightforward: install the package, point it at your existing Azure OpenAI deployment, configure your checks in YAML, and wrap your existing generate calls with the rails interface.

Azure AI Foundry is an excellent foundation for enterprise LLM applications. The production readiness gap most teams encounter is not at the infrastructure layer; it is at the content and policy layer. elsai Guardrails is the piece that fills that gap.

FAQ

Does elsai Guardrails replace Azure AI Foundry's built-in safety features?

No. elsai Guardrails operates at the content and workflow level — checking the text of inputs and outputs in real time. Azure AI Foundry's safety features operate at the infrastructure and access-control level. The two are complementary and designed to work together. 

Which Azure OpenAI models are supported? 

elsai Guardrails works with any Azure OpenAI deployment, including GPT-4, GPT-4o, and GPT-3.5-Turbo. Configuration is done at the YAML level, so switching between model deployments does not require code changes to your guardrail implementation. 

How does elsai Guardrails handle PHI in healthcare workflows? 

The PHI/PII detection check identifies and redacts sensitive personal health information and personally identifiable information before it is passed to the model or returned in a response. For HIPAA-adjacent workloads, this check should run on both inputs and outputs. All actions are logged for audit purposes. 

Can I run different guardrail configurations for different agents in a multi-agent workflow? 

Yes. Each agent or workflow can be initialized with its own RailsConfig. This means a customer-facing agent and an internal developer tool can have different toxicity thresholds, topic scopes, and blocking behaviors — all managed through separate YAML configurations. 
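One way to keep per-agent policies as code is to render a separate guardrails section for each agent. The sketch below uses only the configuration keys shown earlier in this article; `render_guardrails_yaml`, the agent names, and the threshold values are hypothetical. An `llm:` section would still need to be added before passing each string to `RailsConfig.from_content(yaml_content=...)`.

```python
def render_guardrails_yaml(toxicity_threshold: float, topics: dict[str, str]) -> str:
    """Render a guardrails section from the keys shown earlier in this article.

    Topic names and descriptions are assumed not to contain double quotes.
    """
    lines = [
        "guardrails:",
        "  input_checks: true",
        "  output_checks: true",
        "  check_toxicity: true",
        f"  toxicity_threshold: {toxicity_threshold}",
        "  block_toxic: true",
    ]
    if topics:
        lines += [
            "  check_off_topic: true",
            "  block_off_topic: true",
            "  allowed_topics:",
        ]
        for name, description in topics.items():
            lines += [f'    - name: "{name}"', f'      description: "{description}"']
    return "\n".join(lines) + "\n"


# Hypothetical agents with different thresholds and topic scopes.
AGENT_POLICIES = {
    "support_bot": render_guardrails_yaml(
        0.5, {"Customer Support": "Product questions and customer service"}
    ),
    "dev_tool": render_guardrails_yaml(0.8, {}),
}

print(AGENT_POLICIES["support_bot"])
```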

What happens when a check fails? Does the call just return an error?

When a check fails, elsai Guardrails returns a structured result that includes which check failed and why. Your application code decides how to handle it, whether that means returning a fallback message to the user, routing to a human review queue, or logging the event for audit purposes. The system is designed to give your application meaningful information about failures rather than just blocking silently.

Is elsai Guardrails suitable for high-volume production workloads? 

Yes. The platform targets sub-100ms latency for guardrail checks and ships with full async support. For workloads with high request volumes, the async API allows for concurrent guardrail evaluation without blocking your application thread. The service runs on SOC2-compliant infrastructure.
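Concurrent evaluation of the kind described above can be illustrated with `asyncio`. This sketch does not use the elsai Guardrails async API, whose exact method names are not shown in this article; the check functions here are hypothetical stand-ins that sleep briefly to simulate a classifier call.

```python
import asyncio


async def check_toxicity(text: str) -> tuple[str, bool]:
    await asyncio.sleep(0.01)  # stand-in for a classifier call
    return "toxicity", "idiot" not in text.lower()


async def check_sensitive_data(text: str) -> tuple[str, bool]:
    await asyncio.sleep(0.01)  # stand-in for a detector call
    return "sensitive_data", "password" not in text.lower()


async def evaluate(text: str) -> dict[str, bool]:
    """Run all checks concurrently and map each check name to pass (True) or fail (False)."""
    results = await asyncio.gather(check_toxicity(text), check_sensitive_data(text))
    return dict(results)


async def evaluate_many(texts: list[str]) -> list[dict[str, bool]]:
    """Evaluate a batch of requests concurrently, preserving input order."""
    return await asyncio.gather(*(evaluate(t) for t in texts))


print(asyncio.run(evaluate_many(["hello there", "my password is hunter2"])))
```

Because the checks await independently, the total latency for one request is close to that of the slowest check rather than the sum of all checks.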

elsai

Enterprise AI governance platform for agentic workflows. Transform your operations with confidence.

Offices

USA

UK

Australia

UAE

India

© 2026 elsai. All rights reserved.
