GenAIHub

Model Governance

Lifecycle Management, Versioning & Compliance for LLMs in Production

What is Model Governance?

Model governance encompasses the policies, processes, and controls for managing AI/LLM models throughout their lifecycle. It ensures models are deployed responsibly, perform as expected, comply with regulations, and can be audited and rolled back when needed.

"AI governance is the framework of policies, procedures, and standards for responsible AI development and deployment. It includes model lifecycle management, risk assessment, and ongoing monitoring."

  • Versioning: track changes
  • Approval: review gates
  • Monitoring: performance watch
  • Compliance: regulatory alignment

Model Lifecycle Stages

Development → Evaluation → Approval → Deployment → Monitoring → Retirement
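
The stages above can be sketched as a small state machine that rejects invalid jumps. The transition table below is an assumption (e.g. allowing monitoring to loop back to development); adapt it to your own process:

```python
from enum import Enum

class Stage(Enum):
    DEVELOPMENT = "development"
    EVALUATION = "evaluation"
    APPROVAL = "approval"
    DEPLOYMENT = "deployment"
    MONITORING = "monitoring"
    RETIREMENT = "retirement"

# Allowed transitions; monitoring can loop back into development
ALLOWED = {
    Stage.DEVELOPMENT: {Stage.EVALUATION},
    Stage.EVALUATION: {Stage.APPROVAL, Stage.DEVELOPMENT},
    Stage.APPROVAL: {Stage.DEPLOYMENT, Stage.DEVELOPMENT},
    Stage.DEPLOYMENT: {Stage.MONITORING},
    Stage.MONITORING: {Stage.DEVELOPMENT, Stage.RETIREMENT},
    Stage.RETIREMENT: set(),
}

def advance(current: Stage, target: Stage) -> Stage:
    """Move a model to the next stage, rejecting invalid jumps."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target
```

Encoding the lifecycle this way means a model cannot silently skip evaluation or approval on its way to deployment.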

Key Governance Components

Component           | Purpose                                    | Key Actions
--------------------|--------------------------------------------|------------------------------------------
Model Registry      | Central catalog of all models and versions | Register, version, discover, deprecate
Prompt Registry     | Version control for prompts and templates  | Track, test, rollback prompts
Evaluation Suite    | Automated testing before deployment        | Run benchmarks, safety tests, regression
Approval Workflow   | Human review gates for critical changes    | Request, review, approve, reject
Deployment Controls | Staged rollouts and rollback capability    | Canary, blue-green, rollback
Model Card          | Documentation of model capabilities/limits | Document, publish, review
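
As a sketch of the first component, a minimal in-memory model registry could look like this. The `ModelEntry` and `ModelRegistry` names are illustrative; a production registry would be backed by a database:

```python
from dataclasses import dataclass

@dataclass
class ModelEntry:
    name: str
    version: str
    status: str = "active"  # active | deprecated

class ModelRegistry:
    """Toy in-memory registry illustrating register/deprecate/discover."""

    def __init__(self):
        self._models = {}  # (name, version) -> ModelEntry

    def register(self, name: str, version: str) -> ModelEntry:
        entry = ModelEntry(name, version)
        self._models[(name, version)] = entry
        return entry

    def deprecate(self, name: str, version: str) -> None:
        self._models[(name, version)].status = "deprecated"

    def discover(self, name: str) -> list:
        """All active versions of a model, for consumers to choose from."""
        return [e for (n, _), e in self._models.items()
                if n == name and e.status == "active"]
```

The same four verbs from the table (register, version, discover, deprecate) map directly onto the methods.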

Model Card Template

# Model Card: Customer Support Assistant v2.1

## Model Overview
- Name: customer-support-assistant
- Version: 2.1.0
- Base Model: GPT-4-turbo
- Last Updated: 2024-01-15
- Owner: AI Platform Team

## Intended Use
- Primary Use: Customer service Q&A for product support
- Users: Customer service representatives, end customers
- Out of Scope: Medical advice, legal guidance, financial decisions

## Training Data
- Sources: Support tickets (2020-2023), FAQ database, product docs
- Data Volume: 500K examples
- PII Handling: All PII removed during preprocessing

## Performance Metrics
- Accuracy: 94.2% on test set
- Hallucination Rate: 2.1%
- Latency (p95): 1.2s

## Limitations & Risks
- May struggle with very recent product updates
- Not suitable for billing disputes requiring system access
- Should not be used for escalation decisions

## Ethical Considerations
- Tested for bias across customer demographics
- Escalation to human for sensitive topics
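
A model card is only useful if it stays complete. One lightweight safeguard is a CI-style check that flags cards missing required sections; the `missing_sections` helper below is a hypothetical sketch keyed to the template above:

```python
# Section headings every model card must contain (taken from the template)
REQUIRED_SECTIONS = [
    "Model Overview", "Intended Use", "Training Data",
    "Performance Metrics", "Limitations & Risks", "Ethical Considerations",
]

def missing_sections(card_markdown: str) -> list:
    """Return the required '## ...' sections absent from a model card."""
    return [s for s in REQUIRED_SECTIONS if f"## {s}" not in card_markdown]
```

Running this in the approval pipeline blocks a model whose documentation has gone stale.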

Prompt Versioning

# Prompt Registry Implementation
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Dict

@dataclass
class PromptVersion:
    prompt_id: str
    version: str
    template: str
    created_at: datetime
    created_by: str
    status: str  # draft, review, approved, deprecated
    metrics: Optional[Dict] = None

class PromptRegistry:
    def __init__(self, db):
        self.db = db
    
    def register(self, prompt_id: str, template: str, author: str) -> PromptVersion:
        """Register a new prompt version, starting in draft status"""
        current = self.get_latest(prompt_id)
        new_version = self._increment_version(current.version if current else "0.0.0")
        
        prompt = PromptVersion(
            prompt_id=prompt_id,
            version=new_version,
            template=template,
            created_at=datetime.now(timezone.utc),
            created_by=author,
            status="draft"
        )
        self.db.save(prompt)
        return prompt
    
    def submit_for_review(self, prompt_id: str, version: str):
        """Move a draft into review so it becomes eligible for approval"""
        prompt = self.db.get(prompt_id, version)
        if prompt.status != "draft":
            raise ValueError("Only drafts can be submitted for review")
        prompt.status = "review"
        self.db.save(prompt)
    
    def approve(self, prompt_id: str, version: str, approver: str):
        """Approve a prompt for production use"""
        prompt = self.db.get(prompt_id, version)
        if prompt.status != "review":
            raise ValueError("Prompt must be in review status")
        prompt.status = "approved"
        self.db.save(prompt)
        self._log_approval(prompt, approver)
    
    def get_latest(self, prompt_id: str) -> Optional[PromptVersion]:
        """Most recent version in any status, or None if unregistered"""
        return self.db.query(prompt_id=prompt_id).order_by("version").last()
    
    def get_production(self, prompt_id: str) -> PromptVersion:
        """Get latest approved version for production"""
        return self.db.query(
            prompt_id=prompt_id, 
            status="approved"
        ).order_by("version").last()
    
    @staticmethod
    def _increment_version(version: str) -> str:
        """Bump the patch component of a MAJOR.MINOR.PATCH version"""
        major, minor, patch = (int(p) for p in version.split("."))
        return f"{major}.{minor}.{patch + 1}"
    
    def _log_approval(self, prompt: PromptVersion, approver: str):
        # Audit-trail hook; replace with durable logging in production
        print(f"{prompt.prompt_id}@{prompt.version} approved by {approver}")
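
One subtlety in `get_production`: ordering version strings lexicographically breaks once any component reaches two digits ("0.10.0" sorts before "0.2.0"). A sort key that parses the components numerically avoids this:

```python
def version_key(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into ints so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))

versions = ["0.2.0", "0.10.0", "1.0.0"]
latest = max(versions, key=version_key)  # "1.0.0"
```

If your database sorts version columns as plain strings, apply a key like this in application code or store the components in separate numeric columns.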

Deployment Strategies

Canary Deployment

Route a small % of traffic to new version. Monitor for issues before full rollout.

v1: 95% traffic → v2: 5% traffic
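
Per-request canary routing can be as simple as a weighted coin flip. This is a sketch; real routing usually happens at the load balancer or gateway, and the version labels here are placeholders:

```python
import random

STABLE, CANARY = "v1-stable", "v2-canary"

def route_request(canary_fraction: float = 0.05) -> str:
    """Randomly send a small slice of traffic to the canary version."""
    return CANARY if random.random() < canary_fraction else STABLE
```

Pair this with monitoring on the canary slice so you can drop `canary_fraction` to zero the moment error rates diverge.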

Blue-Green Deployment

Run two identical environments. Switch traffic instantly between them.

Blue (v1) ←→ Green (v2)
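
In code, the blue-green cutover is just a pointer flip between two live environments (illustrative sketch; the environment contents are placeholders):

```python
class BlueGreenRouter:
    """Two identical environments; 'live' is a pointer that flips instantly."""

    def __init__(self):
        self.environments = {"blue": "model-v1", "green": "model-v2"}
        self.live = "blue"

    def switch(self) -> None:
        # Instant cutover; the idle environment stays warm for rollback
        self.live = "green" if self.live == "blue" else "blue"

    def serve(self) -> str:
        return self.environments[self.live]
```

Because the idle environment keeps running, rollback is a second `switch()` rather than a redeploy.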

Feature Flags

Toggle features for specific users, orgs, or percentages without redeployment.

if feature_enabled("gpt4"): use_gpt4()
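
A flag check should be deterministic per user, not a fresh coin flip per request. One common approach is hashing the (flag, user) pair into a 0-99 bucket; the `feature_enabled` signature below is an assumption for illustration, not any specific flag library's API:

```python
import hashlib

def feature_enabled(flag: str, user_id: str, rollout_pct: int,
                    allowlist: frozenset = frozenset()) -> bool:
    """Stable per-user flag: allowlisted users are always on; others
    land in a 0-99 bucket derived from a hash of (flag, user_id)."""
    if user_id in allowlist:
        return True
    bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct
```

Raising `rollout_pct` then only ever turns the feature on for more users; nobody who already has it loses it mid-session.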

A/B Testing

Compare two versions with different user groups to measure impact.

Group A → v1 | Group B → v2
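
Group assignment should likewise be deterministic so a user never flips between variants mid-experiment; hashing the (experiment, user) pair gives a stable 50/50 split (sketch):

```python
import hashlib

def assign_group(user_id: str, experiment: str) -> str:
    """Stable 50/50 split: the same user always sees the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

Salting the hash with the experiment name means the same user can land in different groups across different experiments, avoiding correlated cohorts.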


Best Practices

Do This

  • Maintain a model registry with all versions
  • Version prompts alongside code
  • Require approval for production changes
  • Document model capabilities/limitations
  • Run evals before every deployment
  • Have rollback capability ready

Avoid This

  • Deploying without evaluation gates
  • Hardcoding model versions in code
  • Skipping model cards / documentation
  • Big-bang deployments (all traffic at once)
  • Ignoring deprecation timelines
  • Lacking audit trail for changes
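
The last point is cheap to fix: an append-only JSON-lines log gives every governance action a timestamped record. The `log_change` helper here is a hypothetical sketch; production systems would write to durable, tamper-evident storage:

```python
import json
import time

def log_change(event: str, actor: str, details: dict,
               path: str = "audit.log") -> None:
    """Append one JSON record per governance action (approval, deploy, rollback)."""
    record = {"ts": time.time(), "event": event, "actor": actor,
              "details": details}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Called from the approval workflow and deployment controls, this answers "who changed what, and when" during an audit.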
