Model Governance
Lifecycle Management, Versioning & Compliance for LLMs in Production
What is Model Governance?
Model governance encompasses the policies, processes, and controls for managing AI/LLM models throughout their lifecycle. It ensures models are deployed responsibly, perform as expected, comply with regulations, and can be audited and rolled back when needed.
"AI governance is the framework of policies, procedures, and standards for responsible AI development and deployment. It includes model lifecycle management, risk assessment, and ongoing monitoring."
- **Versioning:** track changes to models and prompts over time
- **Approval:** review gates before changes reach production
- **Monitoring:** watch performance and behavior after deployment
- **Compliance:** stay aligned with regulatory requirements
Model Lifecycle Stages
Key Governance Components
| Component | Purpose | Key Actions |
|---|---|---|
| Model Registry | Central catalog of all models and versions | Register, version, discover, deprecate |
| Prompt Registry | Version control for prompts and templates | Track, test, rollback prompts |
| Evaluation Suite | Automated testing before deployment | Run benchmarks, safety tests, regression |
| Approval Workflow | Human review gates for critical changes | Request, review, approve, reject |
| Deployment Controls | Staged rollouts and rollback capability | Canary, blue-green, rollback |
| Model Card | Documentation of model capabilities/limits | Document, publish, review |
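A model registry can start as little more than a keyed catalog with a lifecycle status per version. A minimal in-memory sketch (class and field names are illustrative, not any specific product's API):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelEntry:
    name: str
    version: str
    status: str = "registered"  # registered, deployed, deprecated
    registered_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class ModelRegistry:
    """In-memory catalog keyed by (name, version)."""

    def __init__(self):
        self._models = {}

    def register(self, name: str, version: str) -> ModelEntry:
        entry = ModelEntry(name=name, version=version)
        self._models[(name, version)] = entry
        return entry

    def deprecate(self, name: str, version: str):
        self._models[(name, version)].status = "deprecated"

    def versions(self, name: str) -> list:
        # All versions registered under a model name, oldest first
        return [v for (n, v) in self._models if n == name]
```

In production this would be backed by a database, but the same register/discover/deprecate operations from the table above are the core of it.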
Model Card Template
# Model Card: Customer Support Assistant v2.1
## Model Overview
- Name: customer-support-assistant
- Version: 2.1.0
- Base Model: GPT-4-turbo
- Last Updated: 2024-01-15
- Owner: AI Platform Team
## Intended Use
- Primary Use: Customer service Q&A for product support
- Users: Customer service representatives, end customers
- Out of Scope: Medical advice, legal guidance, financial decisions
## Training Data
- Sources: Support tickets (2020-2023), FAQ database, product docs
- Data Volume: 500K examples
- PII Handling: All PII removed during preprocessing
## Performance Metrics
- Accuracy: 94.2% on test set
- Hallucination Rate: 2.1%
- Latency (p95): 1.2s
## Limitations & Risks
- May struggle with very recent product updates
- Not suitable for billing disputes requiring system access
- Should not be used for escalation decisions
## Ethical Considerations
- Tested for bias across customer demographics
- Escalation to human for sensitive topics
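Model cards are most useful when they are enforced, not just written. One lightweight option is a CI check that fails when required sections are missing; a sketch assuming cards are stored as markdown (the section list here is illustrative):

```python
# Minimal CI check: flag a model card that is missing required sections.
REQUIRED_SECTIONS = [
    "## Model Overview",
    "## Intended Use",
    "## Performance Metrics",
    "## Limitations & Risks",
]

def missing_sections(card_text: str) -> list:
    """Return the required headings absent from a model card."""
    return [s for s in REQUIRED_SECTIONS if s not in card_text]
```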
Prompt Versioning
```python
# Prompt Registry Implementation
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Dict

@dataclass
class PromptVersion:
    prompt_id: str
    version: str
    template: str
    created_at: datetime
    created_by: str
    status: str  # draft, review, approved, deprecated
    metrics: Optional[Dict] = None

class PromptRegistry:
    def __init__(self, db):
        self.db = db

    def register(self, prompt_id: str, template: str, author: str) -> PromptVersion:
        """Register a new prompt version (starts as a draft)."""
        current = self.get_latest(prompt_id)
        new_version = self._increment_version(current.version if current else "0.0.0")
        prompt = PromptVersion(
            prompt_id=prompt_id,
            version=new_version,
            template=template,
            created_at=datetime.now(timezone.utc),
            created_by=author,
            status="draft",
        )
        self.db.save(prompt)
        return prompt

    def submit_for_review(self, prompt_id: str, version: str):
        """Move a draft into review so it can be approved."""
        prompt = self.db.get(prompt_id, version)
        if prompt.status != "draft":
            raise ValueError("Only drafts can be submitted for review")
        prompt.status = "review"
        self.db.save(prompt)

    def approve(self, prompt_id: str, version: str, approver: str):
        """Approve a prompt for production use."""
        prompt = self.db.get(prompt_id, version)
        if prompt.status != "review":
            raise ValueError("Prompt must be in review status")
        prompt.status = "approved"
        self.db.save(prompt)
        self._log_approval(prompt, approver)

    def get_production(self, prompt_id: str) -> PromptVersion:
        """Get the latest approved version for production."""
        return self.db.query(
            prompt_id=prompt_id,
            status="approved",
        ).order_by("version").last()

    def get_latest(self, prompt_id: str) -> Optional[PromptVersion]:
        """Latest version regardless of status, or None if unregistered."""
        return self.db.query(prompt_id=prompt_id).order_by("version").last()

    def _increment_version(self, version: str) -> str:
        """Bump the minor component, e.g. 1.2.0 -> 1.3.0."""
        major, minor, _ = (int(part) for part in version.split("."))
        return f"{major}.{minor + 1}.0"

    def _log_approval(self, prompt: PromptVersion, approver: str):
        """Record who approved what, for the audit trail."""
        self.db.log("approval", prompt.prompt_id, prompt.version, approver)
```
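One caveat with `order_by("version").last()`: if the backing store compares version strings lexicographically, the ordering breaks once any component reaches two digits, because "1.10.0" sorts before "1.9.0" as a string. A small numeric sort key avoids this (a sketch, independent of any particular store):

```python
# Lexicographic string comparison mis-sorts semantic versions:
# "1.10.0" < "1.9.0" as strings. Compare parsed integer tuples instead.
def version_key(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

versions = ["1.9.0", "1.10.0", "1.2.0"]
latest = max(versions, key=version_key)   # numeric comparison: "1.10.0"
wrong = max(versions)                     # string comparison: "1.9.0"
```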
Deployment Strategies
Canary Deployment
Route a small % of traffic to new version. Monitor for issues before full rollout.
Blue-Green Deployment
Run two identical environments. Switch traffic instantly between them.
Feature Flags
Toggle features for specific users, orgs, or percentages without redeployment.
A/B Testing
Compare two versions with different user groups to measure impact.
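Canary routing and A/B bucketing can both be as simple as deterministic hashing of a stable user ID, so each user consistently lands on the same version across requests. A minimal sketch (percentage split and version names are hypothetical):

```python
import hashlib

def assign_version(user_id: str, canary_percent: int) -> str:
    """Deterministically route a stable slice of users to the canary.

    Hashing the user ID into one of 100 buckets keeps assignment
    sticky per user without storing any routing state.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "v2.1-canary" if bucket < canary_percent else "v2.0-stable"
```

Raising `canary_percent` gradually (5 → 25 → 100) widens the rollout without redeploying, and setting it to 0 is an instant rollback.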
Best Practices
Do This
- Maintain a model registry with all versions
- Version prompts alongside code
- Require approval for production changes
- Document model capabilities/limitations
- Run evals before every deployment
- Have rollback capability ready
Avoid This
- Deploying without evaluation gates
- Hardcoding model versions in code
- Skipping model cards / documentation
- Big-bang deployments (all traffic at once)
- Ignoring deprecation timelines
- Lacking audit trail for changes
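An evaluation gate before deployment can start as a simple threshold check over metrics like those in the model card above. A sketch (metric names and thresholds are illustrative, not prescriptive):

```python
THRESHOLDS = {
    "accuracy": 0.90,            # minimum acceptable
    "hallucination_rate": 0.05,  # maximum acceptable
}

def passes_eval_gate(metrics: dict) -> bool:
    """Block deployment unless every gated metric meets its threshold.

    Missing metrics default to failing values, so an incomplete
    eval run cannot slip through the gate.
    """
    return (
        metrics.get("accuracy", 0.0) >= THRESHOLDS["accuracy"]
        and metrics.get("hallucination_rate", 1.0) <= THRESHOLDS["hallucination_rate"]
    )
```

Wired into CI, a failing gate stops the rollout automatically, which covers both "run evals before every deployment" and "deploying without evaluation gates" above.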