Model Selection & Model Card


Purpose

A standardized model card documenting the model's purpose, capabilities, limitations, performance, and ethical considerations.

Related Controls

  • ISO A.4
  • NIST MP-2

1. Model Overview

Provide basic identification and purpose information.

Model Name: [MODEL NAME]

Version: [VERSION]

Date: [DATE]

Model Type: Classification / Regression / Generation / Embedding / Agent / Other

Provider: Internal / [VENDOR NAME]

Owner: [NAME], [ROLE TITLE]

Purpose: [One paragraph describing what business problem this model solves]

Intended Users: [Who will use this model — roles, teams]

Intended Use Cases: [Specific approved use cases]

Out-of-Scope Uses: [Explicitly prohibited or unsupported uses]
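Some teams keep these overview fields machine-readable alongside the prose card so they can be validated and queried. A minimal sketch in Python, assuming a simple in-house schema (all field names and example values are illustrative, not a standard):

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ModelOverview:
    """Machine-readable mirror of the Section 1 fields (illustrative schema)."""
    model_name: str
    version: str
    model_type: str          # e.g. "Classification", "Generation"
    provider: str            # "Internal" or a vendor name
    owner: str
    purpose: str
    intended_users: list = field(default_factory=list)
    intended_use_cases: list = field(default_factory=list)
    out_of_scope_uses: list = field(default_factory=list)

# Hypothetical example values, for illustration only.
card = ModelOverview(
    model_name="fraud-scorer",
    version="1.2.0",
    model_type="Classification",
    provider="Internal",
    owner="Jane Doe, ML Lead",
    purpose="Flags likely-fraudulent transactions for manual review.",
    intended_users=["Fraud operations team"],
    intended_use_cases=["Transaction triage"],
    out_of_scope_uses=["Automated account closure"],
)
print(asdict(card)["model_name"])
```

Keeping the card as a dataclass (or equivalent YAML/JSON) makes it easy to lint for missing fields before approval.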

2. Model Architecture & Training

Document technical details about the model's architecture and training.

Architecture: [Model architecture — e.g., Transformer, CNN, Gradient Boosting, LLM API]

Base Model: [If fine-tuned, identify the base model — e.g., Claude 3.5 Sonnet, GPT-4o, Llama 3]

Training Data:

  • Sources: [List data sources used for training]
  • Size: [Dataset size — records, tokens, images]
  • Date Range: [Time period covered by training data]
  • Data Classification: [Classification level of training data]

Fine-tuning:

  • Method: [Full fine-tune, LoRA, RLHF, prompt tuning, none]
  • Dataset: [Fine-tuning dataset description]
  • Hyperparameters: [Key hyperparameters if applicable]

Inference Requirements:

  • Compute: [CPU/GPU requirements]
  • Memory: [RAM/VRAM requirements]
  • Latency: [Expected response time]
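The latency figure above (and the p50/p99 rows in Section 3) can be computed directly from raw request timings. A sketch using the nearest-rank percentile convention, one common choice among several (the sample values are illustrative):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value at or above rank ceil(pct% of n)."""
    ordered = sorted(samples)
    k = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[k - 1]

# Illustrative request latencies in milliseconds.
latencies_ms = [120, 135, 110, 480, 125, 140, 115, 130, 138, 122]
print(percentile(latencies_ms, 50))  # p50 for these samples: 125
print(percentile(latencies_ms, 99))  # p99 for these samples: 480
```

Note that p99 is dominated by the single slow request, which is exactly why the card asks for tail latency rather than only the median.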

3. Performance Metrics

Document quantitative performance on relevant benchmarks and test sets.

| Metric        | Test Set   | Score   | Threshold    | Pass/Fail   |
|---------------|------------|---------|--------------|-------------|
| Accuracy      | [TEST SET] | [SCORE] | [MIN]        | [PASS/FAIL] |
| Precision     | [TEST SET] | [SCORE] | [MIN]        | [PASS/FAIL] |
| Recall        | [TEST SET] | [SCORE] | [MIN]        | [PASS/FAIL] |
| F1 Score      | [TEST SET] | [SCORE] | [MIN]        | [PASS/FAIL] |
| Latency (p50) | Production | [SCORE] | [MAX ms]     | [PASS/FAIL] |
| Latency (p99) | Production | [SCORE] | [MAX ms]     | [PASS/FAIL] |
| Throughput    | Load Test  | [SCORE] | [MIN req/s]  | [PASS/FAIL] |

Performance Notes: [Any caveats, known performance degradation scenarios, or edge cases]
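The Pass/Fail column reduces to a comparison against each threshold: "min" thresholds (accuracy, throughput) pass when the score meets or exceeds them, while "max" thresholds (latency) pass when the score stays at or below them. A minimal sketch of that logic, with illustrative scores and thresholds:

```python
def evaluate(metrics):
    """metrics maps name -> (score, threshold, 'min' or 'max'); returns name -> pass?"""
    results = {}
    for name, (score, threshold, kind) in metrics.items():
        results[name] = score >= threshold if kind == "min" else score <= threshold
    return results

# Illustrative values only; real scores come from the test sets named above.
report = evaluate({
    "accuracy":    (0.91, 0.90, "min"),
    "f1":          (0.88, 0.85, "min"),
    "latency_p99": (420.0, 500.0, "max"),  # milliseconds
})
print(report)  # every metric passes for these illustrative values
```

Automating this check keeps the table's Pass/Fail column honest across re-evaluations.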

4. Limitations & Risks

Be transparent about what the model cannot do and about known risks.

Known Limitations

  • [LIMITATION — e.g., "Model performs poorly on languages other than English"]
  • [LIMITATION — e.g., "Accuracy degrades for inputs longer than 4096 tokens"]
  • [LIMITATION — e.g., "Model may hallucinate citations and references"]

Bias Assessment

  • Tested for bias across: [Protected attributes tested — gender, race, age, etc.]
  • Results: [Summary of bias testing results]
  • Mitigations: [What was done to address identified biases]
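One common way to quantify the bias testing described above is a demographic parity gap: the spread in positive-prediction rates across groups. A sketch assuming binary predictions and per-record group labels (the data and group names are illustrative, and this is one fairness metric among many):

```python
def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = {}
    for g in set(groups):
        group_preds = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(group_preds) / len(group_preds)
    return max(rates.values()) - min(rates.values())

# Illustrative data: group A is predicted positive far more often than group B.
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_gap(preds, groups)
print(round(gap, 2))  # 0.5: group A rate 0.75 vs group B rate 0.25
```

A large gap does not prove unfairness on its own, but it flags where the deeper assessment above should focus.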

Risk Classification

  • Risk Tier: Low / Medium / High
  • Data Exposure Risk: [Assessment]
  • Fairness Risk: [Assessment]
  • Safety Risk: [Assessment]
  • Mitigations in Place: [List active mitigations]

5. Approval & Lifecycle

Document approval status and ongoing lifecycle management.

Selection Justification

Why this model was selected over alternatives:

  1. [REASON — e.g., "Best accuracy/cost tradeoff for our use case"]
  2. [REASON — e.g., "Vendor meets our security and privacy requirements"]
  3. [REASON — e.g., "Compatible with existing infrastructure"]

Alternatives Considered:

  • [MODEL] — Rejected because [REASON]
  • [MODEL] — Rejected because [REASON]

Approvals

  • Technical Review: [NAME] — [DATE]
  • Security Review: [NAME] — [DATE]
  • Business Approval: [NAME] — [DATE]

Lifecycle

  • Deployment Date: [DATE]
  • Next Review Date: [DATE]
  • Retirement Criteria: [What would trigger model replacement or decommission]
  • Monitoring: [How model performance is tracked in production]
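The Monitoring item above implies an automated production check. A minimal sketch that triggers a review when rolling accuracy over recent predictions falls below the documented threshold (the window size, threshold, and class name are illustrative assumptions):

```python
from collections import deque

class AccuracyMonitor:
    """Rolling accuracy check over the most recent `window` labeled outcomes."""

    def __init__(self, threshold, window=100):
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)

    def record(self, correct):
        """Record one outcome; returns True when a review should be triggered."""
        self.outcomes.append(correct)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data to judge yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.threshold

monitor = AccuracyMonitor(threshold=0.90, window=4)
flags = [monitor.record(ok) for ok in [True, True, False, False]]
print(flags)  # last entry True: rolling accuracy 0.5 fell below 0.90
```

In practice the trigger would feed the review process in the Lifecycle section rather than act automatically.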