Model Governance
Lifecycle Management, Versioning & Compliance for LLMs in Production
What is Model Governance?
Model governance encompasses the policies, processes, and controls for managing AI/LLM models throughout their lifecycle. It ensures models are deployed responsibly, perform as expected, comply with regulations, and can be audited and rolled back when needed.
"AI governance is the framework of policies, procedures, and standards for responsible AI development and deployment. It includes model lifecycle management, risk assessment, and ongoing monitoring."
- **Versioning:** track changes to models and prompts over time
- **Approval:** review gates before changes reach production
- **Monitoring:** watch performance and behavior after deployment
- **Compliance:** stay aligned with regulatory requirements
Model Lifecycle Stages
Key Governance Components
| Component | Purpose | Key Actions |
|---|---|---|
| Model Registry | Central catalog of all models and versions | Register, version, discover, deprecate |
| Prompt Registry | Version control for prompts and templates | Track, test, rollback prompts |
| Evaluation Suite | Automated testing before deployment | Run benchmarks, safety tests, regression |
| Approval Workflow | Human review gates for critical changes | Request, review, approve, reject |
| Deployment Controls | Staged rollouts and rollback capability | Canary, blue-green, rollback |
| Model Card | Documentation of model capabilities/limits | Document, publish, review |
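A model registry can start as little more than a keyed catalog with a lifecycle status per version. A minimal in-memory sketch (class and field names are illustrative, not any specific product's API):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelEntry:
    name: str
    version: str
    status: str = "registered"  # registered, deployed, deprecated
    registered_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class ModelRegistry:
    """In-memory catalog keyed by (name, version)."""

    def __init__(self):
        self._models = {}

    def register(self, name: str, version: str) -> ModelEntry:
        entry = ModelEntry(name=name, version=version)
        self._models[(name, version)] = entry
        return entry

    def deprecate(self, name: str, version: str):
        self._models[(name, version)].status = "deprecated"

    def versions(self, name: str) -> list:
        # All versions registered under a model name, oldest first
        return [v for (n, v) in self._models if n == name]
```

In production this would be backed by a database, but the same register/discover/deprecate operations from the table above are the core of it.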
Model Card Template
# Model Card: Customer Support Assistant v2.1
## Model Overview
- Name: customer-support-assistant
- Version: 2.1.0
- Base Model: GPT-4-turbo
- Last Updated: 2024-01-15
- Owner: AI Platform Team
## Intended Use
- Primary Use: Customer service Q&A for product support
- Users: Customer service representatives, end customers
- Out of Scope: Medical advice, legal guidance, financial decisions
## Training Data
- Sources: Support tickets (2020-2023), FAQ database, product docs
- Data Volume: 500K examples
- PII Handling: All PII removed during preprocessing
## Performance Metrics
- Accuracy: 94.2% on test set
- Hallucination Rate: 2.1%
- Latency (p95): 1.2s
## Limitations & Risks
- May struggle with very recent product updates
- Not suitable for billing disputes requiring system access
- Should not be used for escalation decisions
## Ethical Considerations
- Tested for bias across customer demographics
- Escalation to human for sensitive topics
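Model cards are most useful when they are enforced, not just written. One lightweight option is a CI check that fails when required sections are missing; a sketch assuming cards are stored as markdown (the section list here is illustrative):

```python
# Minimal CI check: flag a model card that is missing required sections.
REQUIRED_SECTIONS = [
    "## Model Overview",
    "## Intended Use",
    "## Performance Metrics",
    "## Limitations & Risks",
]

def missing_sections(card_text: str) -> list:
    """Return the required headings absent from a model card."""
    return [s for s in REQUIRED_SECTIONS if s not in card_text]
```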
Prompt Versioning
```python
# Prompt Registry Implementation
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Dict

@dataclass
class PromptVersion:
    prompt_id: str
    version: str
    template: str
    created_at: datetime
    created_by: str
    status: str  # draft, review, approved, deprecated
    metrics: Optional[Dict] = None

class PromptRegistry:
    def __init__(self, db):
        self.db = db

    def register(self, prompt_id: str, template: str, author: str) -> PromptVersion:
        """Register a new prompt version (starts as a draft)."""
        current = self.get_latest(prompt_id)
        new_version = self._increment_version(current.version if current else "0.0.0")
        prompt = PromptVersion(
            prompt_id=prompt_id,
            version=new_version,
            template=template,
            created_at=datetime.now(timezone.utc),
            created_by=author,
            status="draft",
        )
        self.db.save(prompt)
        return prompt

    def submit_for_review(self, prompt_id: str, version: str):
        """Move a draft into review so it can be approved."""
        prompt = self.db.get(prompt_id, version)
        if prompt.status != "draft":
            raise ValueError("Only drafts can be submitted for review")
        prompt.status = "review"
        self.db.save(prompt)

    def approve(self, prompt_id: str, version: str, approver: str):
        """Approve a prompt for production use."""
        prompt = self.db.get(prompt_id, version)
        if prompt.status != "review":
            raise ValueError("Prompt must be in review status")
        prompt.status = "approved"
        self.db.save(prompt)
        self._log_approval(prompt, approver)

    def get_production(self, prompt_id: str) -> PromptVersion:
        """Get the latest approved version for production."""
        return self.db.query(
            prompt_id=prompt_id,
            status="approved",
        ).order_by("version").last()

    def get_latest(self, prompt_id: str) -> Optional[PromptVersion]:
        """Latest version regardless of status, or None if unregistered."""
        return self.db.query(prompt_id=prompt_id).order_by("version").last()

    def _increment_version(self, version: str) -> str:
        """Bump the minor component, e.g. 1.2.0 -> 1.3.0."""
        major, minor, _ = (int(part) for part in version.split("."))
        return f"{major}.{minor + 1}.0"

    def _log_approval(self, prompt: PromptVersion, approver: str):
        """Record who approved what, for the audit trail."""
        self.db.log("approval", prompt.prompt_id, prompt.version, approver)
```
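One caveat with `order_by("version").last()`: if the backing store compares version strings lexicographically, the ordering breaks once any component reaches two digits, because "1.10.0" sorts before "1.9.0" as a string. A small numeric sort key avoids this (a sketch, independent of any particular store):

```python
# Lexicographic string comparison mis-sorts semantic versions:
# "1.10.0" < "1.9.0" as strings. Compare parsed integer tuples instead.
def version_key(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

versions = ["1.9.0", "1.10.0", "1.2.0"]
latest = max(versions, key=version_key)   # numeric comparison: "1.10.0"
wrong = max(versions)                     # string comparison: "1.9.0"
```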
Deployment Strategies
Canary Deployment
Route a small % of traffic to new version. Monitor for issues before full rollout.
Blue-Green Deployment
Run two identical environments. Switch traffic instantly between them.
Feature Flags
Toggle features for specific users, orgs, or percentages without redeployment.
A/B Testing
Compare two versions with different user groups to measure impact.
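Canary routing and A/B bucketing can both be as simple as deterministic hashing of a stable user ID, so each user consistently lands on the same version across requests. A minimal sketch (percentage split and version names are hypothetical):

```python
import hashlib

def assign_version(user_id: str, canary_percent: int) -> str:
    """Deterministically route a stable slice of users to the canary.

    Hashing the user ID into one of 100 buckets keeps assignment
    sticky per user without storing any routing state.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "v2.1-canary" if bucket < canary_percent else "v2.0-stable"
```

Raising `canary_percent` gradually (5 → 25 → 100) widens the rollout without redeploying, and setting it to 0 is an instant rollback.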
Best Practices
Do This
- Maintain a model registry with all versions
- Version prompts alongside code
- Require approval for production changes
- Document model capabilities/limitations
- Run evals before every deployment
- Have rollback capability ready
Avoid This
- Deploying without evaluation gates
- Hardcoding model versions in code
- Skipping model cards / documentation
- Big-bang deployments (all traffic at once)
- Ignoring deprecation timelines
- Lacking audit trail for changes
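An evaluation gate before deployment can start as a simple threshold check over metrics like those in the model card above. A sketch (metric names and thresholds are illustrative, not prescriptive):

```python
THRESHOLDS = {
    "accuracy": 0.90,            # minimum acceptable
    "hallucination_rate": 0.05,  # maximum acceptable
}

def passes_eval_gate(metrics: dict) -> bool:
    """Block deployment unless every gated metric meets its threshold.

    Missing metrics default to failing values, so an incomplete
    eval run cannot slip through the gate.
    """
    return (
        metrics.get("accuracy", 0.0) >= THRESHOLDS["accuracy"]
        and metrics.get("hallucination_rate", 1.0) <= THRESHOLDS["hallucination_rate"]
    )
```

Wired into CI, a failing gate stops the rollout automatically, which covers both "run evals before every deployment" and "deploying without evaluation gates" above.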