AI Threat Model
Purpose
A structured threat-modeling template for AI systems, covering attack surfaces, threat actor profiles, and mitigation strategies.
Related Controls: [LINKS TO APPLICABLE CONTROL FRAMEWORKS OR CONTROL IDS]
1. System Overview
Describe the AI system being threat modeled, its components, and trust boundaries.
System Name: [SYSTEM NAME]
Version: [VERSION]
Date: [DATE]
Threat Modeler: [NAME], [ROLE TITLE]
System Description: [Brief description of what the system does, its architecture, and key components]
Components:
- Frontend/UI: [Description]
- API Layer: [Description]
- AI Model/Service: [Description]
- Data Store: [Description]
- External Integrations: [Description]
Trust Boundaries:
- User ↔ Application
- Application ↔ AI Service
- AI Service ↔ Data Store
- Application ↔ External APIs
2. Threat Actor Profiles
Identify who might attack this system and their capabilities.
| Actor | Motivation | Capability | Access Level |
|---|---|---|---|
| External Attacker | Data theft, disruption, financial gain | Prompt injection, API abuse, social engineering | Unauthenticated / public endpoints |
| Malicious User | Bypass restrictions, extract data, abuse service | Prompt manipulation, jailbreaking, rate limit abuse | Authenticated user |
| Compromised Insider | Data exfiltration, sabotage | Direct system access, model poisoning, configuration changes | Privileged access |
| Supply Chain | Backdoor, data collection | Compromised model, poisoned training data, malicious dependencies | Vendor/integration level |
| Automated Agent | Resource exhaustion, data harvesting | Scripted attacks, bot networks, recursive prompts | API access |
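Several capabilities in the table above (rate limit abuse, scripted attacks, bot networks) are blunted by per-client rate limiting. The sketch below is a minimal token-bucket limiter; the `TokenBucket` class and its parameters are illustrative assumptions, not part of this template.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch, not production-ready)."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity            # maximum burst size
        self.tokens = float(capacity)       # current token balance
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# With capacity 2 and no refill, the third request in a burst is rejected.
bucket = TokenBucket(capacity=2, refill_per_sec=0.0)
results = [bucket.allow() for _ in range(3)]  # [True, True, False]
```

In practice the bucket would be keyed per API key or client IP, and refill rates tuned per actor tier.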
3. Attack Surface Analysis
Enumerate AI-specific attack vectors using STRIDE or a similar methodology.
| Attack Vector | STRIDE Category | Description | Likelihood | Impact | Risk |
|---|---|---|---|---|---|
| Prompt Injection (Direct) | Tampering | User crafts input to override system instructions | | | |
| Prompt Injection (Indirect) | Tampering | Malicious content in retrieved data manipulates model behavior | | | |
| Training Data Poisoning | Tampering | Adversary corrupts training or fine-tuning data | | | |
| Model Inversion | Information Disclosure | Attacker extracts training data from model responses | | | |
| Sensitive Data Leakage | Information Disclosure | Model reveals PII, credentials, or proprietary data | | | |
| Denial of Service | Denial of Service | Resource exhaustion via large or recursive prompts | | | |
| Jailbreaking | Elevation of Privilege | Bypass content filters and safety guardrails | | | |
| Agent Hijacking | Elevation of Privilege | Manipulate AI agent to perform unauthorized actions | | | |
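The Likelihood, Impact, and Risk columns are typically filled from a qualitative scoring matrix. Below is a minimal sketch of one such mapping, assuming a 3x3 Low/Medium/High scale; the thresholds are assumptions, so substitute your organization's risk matrix.

```python
# Hypothetical helper for deriving the Risk column from Likelihood x Impact.
# The numeric scale and thresholds are assumptions, not a standard.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def risk_rating(likelihood: str, impact: str) -> str:
    """Map qualitative likelihood and impact to a qualitative risk rating."""
    score = LEVELS[likelihood] * LEVELS[impact]
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"

rating = risk_rating("High", "Medium")  # -> "High"
```

A score-based mapping keeps ratings consistent across reviewers, at the cost of hiding judgment calls inside the thresholds.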
4. Mitigations
For each identified threat, document the mitigation strategy and its current status.
| Threat | Mitigation | Control Type | Status | Owner |
|---|---|---|---|---|
| Prompt Injection | Input sanitization, system prompt hardening, output filtering | Preventive | | |
| Data Leakage | Output scanning for PII/secrets, data classification enforcement | Detective | | |
| DoS | Rate limiting, token limits, request size limits, timeout enforcement | Preventive | | |
| Model Theft | Access controls, API key rotation, usage monitoring | Preventive | | |
| Jailbreaking | Content filters, output validation, monitoring for policy violations | Detective | | |
| Agent Hijacking | Least privilege, tool whitelisting, human approval gates | Preventive | | |
Control Types: Preventive (stops the attack before it succeeds), Detective (detects an attack in progress or after the fact), Corrective (responds to and recovers from the attack)
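A detective control such as the output scanning listed for Data Leakage can be sketched as a pattern scan over model output before it reaches the user. The patterns below are deliberately minimal examples, not production-grade PII or secret detection.

```python
import re

# Illustrative output-scanning sketch (a Detective control from the table above).
# These patterns are toy examples; real deployments use vetted detector libraries.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),   # AWS access key ID shape
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN shape
}

def scan_output(text: str) -> list[str]:
    """Return the names of any sensitive-data patterns found in model output."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

findings = scan_output("Contact admin@example.com with key AKIA1234567890ABCDEF")
# findings -> ["email", "aws_key"]
```

A hit would typically block or redact the response and emit an alert, turning the detective control into a corrective one.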
5. Residual Risk & Sign-Off
Document remaining risks after mitigations and get formal approval.
Residual Risks
- [RISK — e.g., "Novel prompt injection techniques may bypass current filters"]
  - Likelihood after mitigation: [Low/Medium/High]
  - Accepted by: [NAME, TITLE]
- [RISK — e.g., "Model may generate plausible but incorrect outputs"]
  - Likelihood after mitigation: [Low/Medium/High]
  - Accepted by: [NAME, TITLE]
Approval
- Threat Model Review: [NAME] — [DATE] — Approved / Requires Revision
- Security Sign-Off: [NAME] — [DATE]
- System Owner Acceptance: [NAME] — [DATE]
Next Review
- Scheduled: [DATE] (or triggered by significant system change, new threat intelligence, or security incident)