Guardrails
Input/Output Validation & Safety Mechanisms for LLM Applications
What are LLM Guardrails?
Guardrails are safety mechanisms that validate, filter, and control both the inputs sent to LLMs and the outputs they generate. They act as protective boundaries to ensure AI systems behave safely, ethically, and within defined constraints.
"Guardrails are programmable, rule-based systems that sit between users and foundation models to help ensure AI behavior aligns with organizational and societal expectations."
Input Validation
Block malicious prompts, injections, jailbreaks
Output Filtering
Ensure safe, accurate, compliant responses
Conversation Control
Guide dialogues within defined boundaries
Why Guardrails Matter
Safety & Compliance
Prevent generation of harmful, biased, or inappropriate content. Ensure compliance with regulations like GDPR, HIPAA, or industry-specific requirements.
Security Protection
Defend against prompt injection attacks, jailbreaking attempts, and data exfiltration. Protect sensitive information and prevent unauthorized actions.
Focus & Relevance
Keep conversations on-topic and within the intended use case. Prevent off-topic responses that could confuse users or expose the system to misuse.
Quality Assurance
Validate output format, consistency, and accuracy. Catch hallucinations, factual errors, or responses that don't meet quality standards.
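The format-validation part of quality assurance can be sketched with the standard library alone. The required keys and length limit below are illustrative assumptions, not a standard schema:

```python
import json

REQUIRED_KEYS = {"answer", "sources"}  # expected response fields (illustrative)
MAX_LENGTH = 2000                      # arbitrary length limit for this sketch

def validate_response(raw: str) -> dict:
    """Reject LLM output that is too long, malformed, or missing required keys."""
    if len(raw) > MAX_LENGTH:
        raise ValueError("response exceeds length limit")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    return data
```

On failure, a production system would typically retry the LLM call with the error message appended rather than simply raising.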
Types of Guardrails
| Type | Description | Examples |
|---|---|---|
| Content Filters | Block harmful or inappropriate content | Toxicity detection, hate speech filtering |
| Topic Rails | Restrict conversations to allowed topics | Only answer about product features |
| Format Validators | Ensure outputs match expected structure | JSON schema validation, length limits |
| PII Protection | Detect and mask sensitive data | SSN, credit cards, email addresses |
| Fact Checking | Verify accuracy against knowledge bases | RAG grounding, citation verification |
| Jailbreak Detection | Identify and block bypass attempts | Prompt injection, role-playing attacks |
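As one concrete example from the table, PII protection can start as simple regex masking. Production systems typically use NER-based detectors (e.g. Microsoft Presidio) instead; the patterns below are deliberately simplified illustrations:

```python
import re

# Simplified PII patterns; real detectors handle far more formats and locales
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def mask_pii(text: str) -> str:
    """Replace each detected PII span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```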
Guardrails Tools Comparison
NeMo Guardrails
by NVIDIA
- Colang dialogue modeling language
- Programmable conversation rails
- Open-source & production-ready
- LangChain integration
Guardrails AI
Open Source
- 50+ pre-built validators
- Pydantic-style validation
- Automatic fix/retry logic
- Guardrails Hub ecosystem
Llama Guard
by Meta
- Fine-tuned Llama for safety
- Customizable safety policies
- Multi-turn conversation support
- Input & output classification
Bedrock Guardrails
by AWS
- Managed service, no code needed
- Content filtering & topic blocking
- PII detection built-in
- Works with any Bedrock model

Best Practices
Do This
- Layer multiple guardrails (defense in depth)
- Validate both inputs AND outputs
- Log all guardrail triggers for analysis
- Provide helpful fallback responses
- Regularly update rules for new attack vectors
- Test with adversarial prompts regularly
Avoid This
- Relying only on the LLM's built-in safety
- Using simple keyword blocklists alone
- Blocking too aggressively (high false positives)
- Ignoring edge cases and creative bypasses
- Exposing guardrail logic to end users
- Assuming guardrails are 100% effective
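Several of these practices (layering checks, logging every trigger, returning a helpful fallback) combine naturally into a small pipeline. The two checks below are hypothetical stand-ins for real detectors:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("guardrails")

FALLBACK = "Sorry, I can't help with that. Could you rephrase?"

def run_guards(text: str, guards) -> str:
    """Run checks in order (defense in depth); log triggers; fall back on failure."""
    for guard in guards:
        ok, reason = guard(text)
        if not ok:
            log.warning("guard %s triggered: %s", guard.__name__, reason)
            return FALLBACK
    return text

# Two illustrative layered checks
def length_guard(text):
    return (len(text) <= 4000, "input too long")

def keyword_guard(text):
    banned = {"rm -rf", "drop table"}
    hit = next((b for b in banned if b in text.lower()), None)
    return (hit is None, f"banned phrase: {hit}")
```

The logged trigger records are exactly what the "log all guardrail triggers" practice asks for: they feed later analysis of false positives and new attack vectors.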
Quick Start: Guardrails AI
```bash
# Install the package, then pull the validators from the Guardrails Hub
pip install guardrails-ai
guardrails hub install hub://guardrails/toxic_language
guardrails hub install hub://guardrails/detect_pii
```

```python
# Define a guard
from guardrails import Guard
from guardrails.hub import ToxicLanguage, DetectPII

# Create a guard with multiple validators
guard = Guard().use_many(
    ToxicLanguage(on_fail="exception"),
    DetectPII(pii_entities=["EMAIL_ADDRESS", "US_SSN"], on_fail="fix"),
)

# Validate LLM output; with on_fail="fix", detected PII is masked
result = guard.validate("Contact me at john@email.com for details")
print(result.validated_output)
# The email address is replaced with a placeholder in the validated output
```
See full documentation: docs.guardrailsai.com
NeMo Guardrails (Colang)
# config.yml
models:
- type: main
engine: openai
model: gpt-4
# rails.co (Colang file)
define user ask about competitors
"What about competitor X?"
"Is Y better than you?"
"Compare with Z product"
define bot refuse competitor discussion
"I can only discuss our own products and services.
How can I help you with those?"
define flow
user ask about competitors
bot refuse competitor discussion
See full documentation: NeMo Guardrails Docs