
Guardrails

Input/Output Validation & Safety Mechanisms for LLM Applications

What are LLM Guardrails?

Guardrails are safety mechanisms that validate, filter, and control both the inputs sent to LLMs and the outputs they generate. They act as protective boundaries to ensure AI systems behave safely, ethically, and within defined constraints.

"Guardrails are programmable, rule-based systems that sit between users and foundation models to help ensure AI behavior aligns with organizational and societal expectations."

Input Validation

Block malicious prompts, injections, jailbreaks

Output Filtering

Ensure safe, accurate, compliant responses

Conversation Control

Guide dialogues within defined boundaries

Why Guardrails Matter

Safety & Compliance

Prevent generation of harmful, biased, or inappropriate content. Ensure compliance with regulations like GDPR, HIPAA, or industry-specific requirements.

Security Protection

Defend against prompt injection attacks, jailbreaking attempts, and data exfiltration. Protect sensitive information and prevent unauthorized actions.

Focus & Relevance

Keep conversations on-topic and within the intended use case. Prevent off-topic responses that could confuse users or expose the system to misuse.

Quality Assurance

Validate output format, consistency, and accuracy. Catch hallucinations, factual errors, or responses that don't meet quality standards.

Types of Guardrails

Type | Description | Examples
Content Filters | Block harmful or inappropriate content | Toxicity detection, hate speech filtering
Topic Rails | Restrict conversations to allowed topics | Only answer about product features
Format Validators | Ensure outputs match expected structure | JSON schema validation, length limits
PII Protection | Detect and mask sensitive data | SSNs, credit cards, email addresses
Fact Checking | Verify accuracy against knowledge bases | RAG grounding, citation verification
Jailbreak Detection | Identify and block bypass attempts | Prompt injection, role-playing attacks
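To make the "PII Protection" row concrete, here is a minimal, illustrative regex-based masker. This is a sketch only: the `PII_PATTERNS` dictionary and `mask_pii` function are hypothetical names, and real deployments use dedicated detectors (such as the validators covered later in this article) rather than hand-rolled regexes, which miss many PII formats.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace each detected PII span with a labeled redaction marker."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(mask_pii("Reach me at jane@example.com, SSN 123-45-6789"))
# Reach me at [EMAIL REDACTED], SSN [SSN REDACTED]
```

The same shape (scan, classify, substitute) underlies production PII rails; the difference is the quality of the detector behind the substitution step.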

Guardrails Architecture

User Input → Input Rails → LLM → Output Rails → Safe Response
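The pipeline above can be sketched in a few lines. Everything here is hypothetical scaffolding (the `RailResult` type and `run_guarded` function are not from any library): each rail either blocks the text, rewrites it, or passes it through, and a blocked request never reaches the model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RailResult:
    allowed: bool   # False => stop the pipeline and return a fallback
    text: str       # possibly rewritten input or output

Rail = Callable[[str], RailResult]

def run_guarded(prompt: str, input_rails: list[Rail],
                llm: Callable[[str], str], output_rails: list[Rail]) -> str:
    # Input rails run before the model ever sees the prompt.
    for rail in input_rails:
        res = rail(prompt)
        if not res.allowed:
            return "Sorry, I can't help with that request."
        prompt = res.text
    response = llm(prompt)
    # Output rails run before the user ever sees the response.
    for rail in output_rails:
        res = rail(response)
        if not res.allowed:
            return "Sorry, I can't share that response."
        response = res.text
    return response
```

Real frameworks add async execution, per-rail configuration, and logging, but the control flow is the same: rails wrap the model call on both sides.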

Guardrails Tools Comparison

NeMo Guardrails

by NVIDIA
  • Colang dialogue modeling language
  • Programmable conversation rails
  • Open-source & production-ready
  • LangChain integration
View on GitHub

Guardrails AI

Open Source
  • 50+ pre-built validators
  • Pydantic-style validation
  • Automatic fix/retry logic
  • Guardrails Hub ecosystem
Visit Guardrails AI

Llama Guard

by Meta
  • Fine-tuned Llama for safety
  • Customizable safety policies
  • Multi-turn conversation support
  • Input & output classification
View on HuggingFace

Bedrock Guardrails

by AWS
  • Managed service, no code needed
  • Content filtering & topic blocking
  • PII detection built-in
  • Works with any Bedrock model
AWS Bedrock Guardrails

Best Practices

Do This

  • Layer multiple guardrails (defense in depth)
  • Validate both inputs AND outputs
  • Log all guardrail triggers for analysis
  • Provide helpful fallback responses
  • Regularly update rules for new attack vectors
  • Test with adversarial prompts regularly

Avoid This

  • Relying only on the LLM's built-in safety
  • Using simple keyword blocklists alone
  • Blocking too aggressively (high false positives)
  • Ignoring edge cases and creative bypasses
  • Exposing guardrail logic to end users
  • Assuming guardrails are 100% effective
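Several of the "Do This" points (layering checks, logging triggers, helpful fallbacks) can be combined in one small input screen. This is a hedged sketch, not a production filter: the `BLOCKLIST` contents, `MAX_INPUT_CHARS` limit, and `screen_input` function are all invented for illustration, and a keyword blocklist alone is exactly what the "Avoid This" list warns against; it belongs as one cheap layer among several.

```python
import logging

log = logging.getLogger("guardrails")

BLOCKLIST = ("ignore previous instructions",)  # hypothetical patterns
MAX_INPUT_CHARS = 4000                         # hypothetical limit

def screen_input(prompt: str) -> tuple[bool, str]:
    """Return (allowed, message). Every trigger is logged for later analysis,
    and blocked users get a helpful fallback rather than a raw error."""
    # Layer 1: cheap structural check
    if len(prompt) > MAX_INPUT_CHARS:
        log.info("guardrail_trigger: input_too_long")
        return False, "Your message is too long; please shorten it and try again."
    # Layer 2: known attack patterns (one layer of many, never the only one)
    lowered = prompt.lower()
    if any(p in lowered for p in BLOCKLIST):
        log.info("guardrail_trigger: blocklist_match")
        return False, "I can't help with that, but I'm happy to answer product questions."
    return True, prompt
```

In a layered setup, a semantic classifier or an LLM-based judge would run after these cheap checks, so most traffic never pays the expensive path.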

Quick Start: Guardrails AI

# Install the package, then pull the validators from the Guardrails Hub
# (shell commands):
#   pip install guardrails-ai
#   guardrails hub install hub://guardrails/toxic_language
#   guardrails hub install hub://guardrails/detect_pii

from guardrails import Guard
from guardrails.hub import ToxicLanguage, DetectPII

# Create a guard that chains both validators
guard = Guard().use_many(
    ToxicLanguage(on_fail="exception"),
    DetectPII(pii_entities=["EMAIL_ADDRESS", "US_SSN"], on_fail="fix"),
)

# Validate LLM output; on_fail="fix" masks any detected PII in place
result = guard.validate("Contact me at john@email.com for details")
print(result.validated_output)
# The email address is masked in the validated output

See full documentation: docs.guardrailsai.com

NeMo Guardrails (Colang)

# config.yml
models:
  - type: main
    engine: openai
    model: gpt-4

# rails.co (Colang file)
define user ask about competitors
    "What about competitor X?"
    "Is Y better than you?"
    "Compare with Z product"

define bot refuse competitor discussion
    "I can only discuss our own products and services. How can I help you with those?"

# Flows are named; this one links the user intent to the refusal
define flow competitor question
    user ask about competitors
    bot refuse competitor discussion
See full documentation: NeMo Guardrails Docs
