Model Selection & Model Card
Purpose
A standardized model card documenting the model's purpose, capabilities, limitations, performance, and ethical considerations.
Related Controls
[LIST RELATED CONTROLS — e.g., governance, security, or compliance controls this model card supports]
1. Model Overview
Provide basic identification and purpose information.
Model Name: [MODEL NAME]
Version: [VERSION]
Date: [DATE]
Model Type: Classification / Regression / Generation / Embedding / Agent / Other
Provider: Internal / [VENDOR NAME]
Owner: [NAME], [ROLE TITLE]
Purpose: [One paragraph describing what business problem this model solves]
Intended Users: [Who will use this model — roles, teams]
Intended Use Cases: [Specific approved use cases]
Out-of-Scope Uses: [Explicitly prohibited or unsupported uses]
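Because the card is standardized, the overview fields above can also be kept in machine-readable form so tooling can check that every required field is filled in before a model ships. A minimal sketch; the field list, field names, and example values are illustrative assumptions, not part of any prescribed schema.

```python
# Hypothetical machine-readable model-card fragment and a validator
# that reports required fields left blank. Field names are assumptions.
REQUIRED_FIELDS = [
    "model_name", "version", "date", "model_type", "provider",
    "owner", "purpose", "intended_users", "intended_use_cases",
    "out_of_scope_uses",
]

def validate_card(card: dict) -> list:
    """Return the names of required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not card.get(f)]

# An incomplete card: the validator lists what still needs filling in.
card = {
    "model_name": "fraud-scorer",   # illustrative value
    "version": "1.2.0",
    "model_type": "Classification",
}
missing = validate_card(card)
```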
2. Model Architecture & Training
Document technical details about the model's architecture and training.
Architecture: [Model architecture — e.g., Transformer, CNN, Gradient Boosting, LLM API]
Base Model: [If fine-tuned, identify the base model — e.g., Claude 3.5 Sonnet, GPT-4o, Llama 3]
Training Data:
- Sources: [List data sources used for training]
- Size: [Dataset size — records, tokens, images]
- Date Range: [Time period covered by training data]
- Data Classification: [Classification level of training data]
Fine-tuning:
- Method: [Full fine-tune, LoRA, RLHF, prompt tuning, none]
- Dataset: [Fine-tuning dataset description]
- Hyperparameters: [Key hyperparameters if applicable]
Inference Requirements:
- Compute: [CPU/GPU requirements]
- Memory: [RAM/VRAM requirements]
- Latency: [Expected response time]
3. Performance Metrics
Document quantitative performance on relevant benchmarks and test sets.
| Metric | Test Set | Score | Threshold | Pass/Fail |
|---|---|---|---|---|
| Accuracy | [TEST SET] | [SCORE] | [MIN] | [PASS/FAIL] |
| Precision | [TEST SET] | [SCORE] | [MIN] | [PASS/FAIL] |
| Recall | [TEST SET] | [SCORE] | [MIN] | [PASS/FAIL] |
| F1 Score | [TEST SET] | [SCORE] | [MIN] | [PASS/FAIL] |
| Latency (p50) | Production | [SCORE ms] | [MAX ms] | [PASS/FAIL] |
| Latency (p99) | Production | [SCORE ms] | [MAX ms] | [PASS/FAIL] |
| Throughput | Load Test | [SCORE req/s] | [MIN req/s] | [PASS/FAIL] |
Performance Notes: [Any caveats, known performance degradation scenarios, or edge cases]
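The classification and latency metrics in the table can be computed from a labeled test set and recorded response times. A minimal sketch for a binary classifier; the example data is illustrative.

```python
# Compute accuracy/precision/recall/F1 from true vs. predicted labels,
# and a nearest-rank latency percentile (p50, p99) from samples.
from math import ceil

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def latency_percentile(samples_ms, pct):
    """Nearest-rank percentile (e.g., pct=50 for p50, 99 for p99)."""
    ordered = sorted(samples_ms)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative data, not real measurements.
m = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
p99 = latency_percentile([120, 95, 210, 130, 88], 99)
```

Production-grade reporting would normally use a metrics library rather than hand-rolled counts, but the arithmetic above is what the table's columns record.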
4. Limitations & Risks
Be transparent about what the model cannot do and about the risks it carries.
Known Limitations
- [LIMITATION — e.g., "Model performs poorly on languages other than English"]
- [LIMITATION — e.g., "Accuracy degrades for inputs longer than 4096 tokens"]
- [LIMITATION — e.g., "Model may hallucinate citations and references"]
Bias Assessment
- Tested for bias across: [Protected attributes tested — gender, race, age, etc.]
- Results: [Summary of bias testing results]
- Mitigations: [What was done to address identified biases]
Risk Classification
- Risk Tier: Low / Medium / High
- Data Exposure Risk: [Assessment]
- Fairness Risk: [Assessment]
- Safety Risk: [Assessment]
- Mitigations in Place: [List active mitigations]
5. Approval & Lifecycle
Document approval status and ongoing lifecycle management.
Selection Justification
Why this model was selected over alternatives:
- [REASON — e.g., "Best accuracy/cost tradeoff for our use case"]
- [REASON — e.g., "Vendor meets our security and privacy requirements"]
- [REASON — e.g., "Compatible with existing infrastructure"]
Alternatives Considered:
- [MODEL] — Rejected because [REASON]
- [MODEL] — Rejected because [REASON]
Approvals
- Technical Review: [NAME] — [DATE]
- Security Review: [NAME] — [DATE]
- Business Approval: [NAME] — [DATE]
Lifecycle
- Deployment Date: [DATE]
- Next Review Date: [DATE]
- Retirement Criteria: [What would trigger model replacement or decommission]
- Monitoring: [How model performance is tracked in production]
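The monitoring line above can be made concrete as a rolling-window check that alerts when production accuracy falls below the threshold recorded in this card. A minimal sketch; the window size, threshold, and outcome stream are illustrative assumptions.

```python
# Hypothetical production monitor: keep a sliding window of
# correct/incorrect outcomes and flag a breach when windowed
# accuracy drops below the card's threshold.
from collections import deque

class AccuracyMonitor:
    def __init__(self, window=100, threshold=0.90):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect
        self.threshold = threshold

    def record(self, correct):
        self.outcomes.append(1 if correct else 0)

    def breached(self):
        """True once windowed accuracy falls below the threshold."""
        if not self.outcomes:
            return False
        return sum(self.outcomes) / len(self.outcomes) < self.threshold

# Illustrative stream: 8 correct, 2 incorrect in a 10-item window.
mon = AccuracyMonitor(window=10, threshold=0.8)
for ok in [True] * 8 + [False] * 2:
    mon.record(ok)
breach_before = mon.breached()  # 8/10 = 0.8, not below threshold
mon.record(False)               # window slides: 7/10 = 0.7
breach_after = mon.breached()
```

A breach would feed the "Retirement Criteria" decision: sustained degradation below threshold is a typical trigger for retraining or replacement.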