
Guardrails

Input/Output Validation & Safety Mechanisms for LLM Applications

What are LLM Guardrails?

Guardrails are safety mechanisms that validate, filter, and control both the inputs sent to LLMs and the outputs they generate. They act as protective boundaries to ensure AI systems behave safely, ethically, and within defined constraints.

"Guardrails are programmable, rule-based systems that sit between users and foundation models to help ensure AI behavior aligns with organizational and societal expectations."

Input Validation

Block malicious prompts, injections, jailbreaks

Output Filtering

Ensure safe, accurate, compliant responses

Conversation Control

Guide dialogues within defined boundaries

Why Guardrails Matter

Safety & Compliance

Prevent generation of harmful, biased, or inappropriate content. Ensure compliance with regulations like GDPR, HIPAA, or industry-specific requirements.

Security Protection

Defend against prompt injection attacks, jailbreaking attempts, and data exfiltration. Protect sensitive information and prevent unauthorized actions.

Focus & Relevance

Keep conversations on-topic and within the intended use case. Prevent off-topic responses that could confuse users or expose the system to misuse.

Quality Assurance

Validate output format, consistency, and accuracy. Catch hallucinations, factual errors, or responses that don't meet quality standards.

Types of Guardrails

Type | Description | Examples
Content Filters | Block harmful or inappropriate content | Toxicity detection, hate speech filtering
Topic Rails | Restrict conversations to allowed topics | Only answer about product features
Format Validators | Ensure outputs match expected structure | JSON schema validation, length limits
PII Protection | Detect and mask sensitive data | SSNs, credit cards, email addresses
Fact Checking | Verify accuracy against knowledge bases | RAG grounding, citation verification
Jailbreak Detection | Identify and block bypass attempts | Prompt injection, role-playing attacks
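To make the "PII Protection" row concrete, here is a minimal, illustrative regex-based masker. This is a sketch only: the `PII_PATTERNS` dictionary and `mask_pii` function are hypothetical names, and real deployments use dedicated detectors (such as the validators covered later in this article) rather than hand-rolled regexes, which miss many PII formats.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace each detected PII span with a labeled redaction marker."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(mask_pii("Reach me at jane@example.com, SSN 123-45-6789"))
# Reach me at [EMAIL REDACTED], SSN [SSN REDACTED]
```

The same shape (scan, classify, substitute) underlies production PII rails; the difference is the quality of the detector behind the substitution step.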

Guardrails Architecture

User Input → Input Rails → LLM → Output Rails → Safe Response
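The pipeline above can be sketched in a few lines. Everything here is hypothetical scaffolding (the `RailResult` type and `run_guarded` function are not from any library): each rail either blocks the text, rewrites it, or passes it through, and a blocked request never reaches the model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RailResult:
    allowed: bool   # False => stop the pipeline and return a fallback
    text: str       # possibly rewritten input or output

Rail = Callable[[str], RailResult]

def run_guarded(prompt: str, input_rails: list[Rail],
                llm: Callable[[str], str], output_rails: list[Rail]) -> str:
    # Input rails run before the model ever sees the prompt.
    for rail in input_rails:
        res = rail(prompt)
        if not res.allowed:
            return "Sorry, I can't help with that request."
        prompt = res.text
    response = llm(prompt)
    # Output rails run before the user ever sees the response.
    for rail in output_rails:
        res = rail(response)
        if not res.allowed:
            return "Sorry, I can't share that response."
        response = res.text
    return response
```

Real frameworks add async execution, per-rail configuration, and logging, but the control flow is the same: rails wrap the model call on both sides.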

Guardrails Tools Comparison

NeMo Guardrails

by NVIDIA
  • Colang dialogue modeling language
  • Programmable conversation rails
  • Open-source & production-ready
  • LangChain integration
View on GitHub

Guardrails AI

Open Source
  • 50+ pre-built validators
  • Pydantic-style validation
  • Automatic fix/retry logic
  • Guardrails Hub ecosystem
Visit Guardrails AI

Llama Guard

by Meta
  • Fine-tuned Llama for safety
  • Customizable safety policies
  • Multi-turn conversation support
  • Input & output classification
View on HuggingFace

Bedrock Guardrails

by AWS
  • Managed service, no code needed
  • Content filtering & topic blocking
  • PII detection built-in
  • Works with any Bedrock model
AWS Bedrock Guardrails

Best Practices

Do This

  • Layer multiple guardrails (defense in depth)
  • Validate both inputs AND outputs
  • Log all guardrail triggers for analysis
  • Provide helpful fallback responses
  • Regularly update rules for new attack vectors
  • Test with adversarial prompts regularly

Avoid This

  • Relying only on the LLM's built-in safety
  • Using simple keyword blocklists alone
  • Blocking too aggressively (high false positives)
  • Ignoring edge cases and creative bypasses
  • Exposing guardrail logic to end users
  • Assuming guardrails are 100% effective
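Several of the "Do This" points (layering checks, logging triggers, helpful fallbacks) can be combined in one small input screen. This is a hedged sketch, not a production filter: the `BLOCKLIST` contents, `MAX_INPUT_CHARS` limit, and `screen_input` function are all invented for illustration, and a keyword blocklist alone is exactly what the "Avoid This" list warns against; it belongs as one cheap layer among several.

```python
import logging

log = logging.getLogger("guardrails")

BLOCKLIST = ("ignore previous instructions",)  # hypothetical patterns
MAX_INPUT_CHARS = 4000                         # hypothetical limit

def screen_input(prompt: str) -> tuple[bool, str]:
    """Return (allowed, message). Every trigger is logged for later analysis,
    and blocked users get a helpful fallback rather than a raw error."""
    # Layer 1: cheap structural check
    if len(prompt) > MAX_INPUT_CHARS:
        log.info("guardrail_trigger: input_too_long")
        return False, "Your message is too long; please shorten it and try again."
    # Layer 2: known attack patterns (one layer of many, never the only one)
    lowered = prompt.lower()
    if any(p in lowered for p in BLOCKLIST):
        log.info("guardrail_trigger: blocklist_match")
        return False, "I can't help with that, but I'm happy to answer product questions."
    return True, prompt
```

In a layered setup, a semantic classifier or an LLM-based judge would run after these cheap checks, so most traffic never pays the expensive path.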

Quick Start: Guardrails AI

# Install the package, then pull the validators from the Guardrails Hub
# (shell commands):
#   pip install guardrails-ai
#   guardrails hub install hub://guardrails/toxic_language
#   guardrails hub install hub://guardrails/detect_pii

from guardrails import Guard
from guardrails.hub import ToxicLanguage, DetectPII

# Create a guard that chains both validators
guard = Guard().use_many(
    ToxicLanguage(on_fail="exception"),
    DetectPII(pii_entities=["EMAIL_ADDRESS", "US_SSN"], on_fail="fix"),
)

# Validate LLM output; on_fail="fix" masks any detected PII in place
result = guard.validate("Contact me at john@email.com for details")
print(result.validated_output)
# The email address is masked in the validated output

See full documentation: docs.guardrailsai.com

NeMo Guardrails (Colang)

# config.yml
models:
  - type: main
    engine: openai
    model: gpt-4

# rails.co (Colang file)
define user ask about competitors
    "What about competitor X?"
    "Is Y better than you?"
    "Compare with Z product"

define bot refuse competitor discussion
    "I can only discuss our own products and services. How can I help you with those?"

# Flows are named; this one links the user intent to the refusal
define flow competitor question
    user ask about competitors
    bot refuse competitor discussion
See full documentation: NeMo Guardrails Docs
