AI Threat Model
Purpose
A structured threat-modeling template for AI systems, covering attack surfaces, threat actor profiles, and mitigation strategies.
Related Controls: [LINKS TO APPLICABLE CONTROL FRAMEWORKS OR CONTROL IDS]
1. System Overview
Describe the AI system being threat modeled, its components, and trust boundaries.
System Name: [SYSTEM NAME]
Version: [VERSION]
Date: [DATE]
Threat Modeler: [NAME], [ROLE TITLE]
System Description: [Brief description of what the system does, its architecture, and key components]
Components:
- Frontend/UI: [Description]
- API Layer: [Description]
- AI Model/Service: [Description]
- Data Store: [Description]
- External Integrations: [Description]
Trust Boundaries:
- User ↔ Application
- Application ↔ AI Service
- AI Service ↔ Data Store
- Application ↔ External APIs
2. Threat Actor Profiles
Identify who might attack this system and their capabilities.
| Actor | Motivation | Capability | Access Level |
|---|---|---|---|
| External Attacker | Data theft, disruption, financial gain | Prompt injection, API abuse, social engineering | Unauthenticated / public endpoints |
| Malicious User | Bypass restrictions, extract data, abuse service | Prompt manipulation, jailbreaking, rate limit abuse | Authenticated user |
| Compromised Insider | Data exfiltration, sabotage | Direct system access, model poisoning, configuration changes | Privileged access |
| Supply Chain | Backdoor, data collection | Compromised model, poisoned training data, malicious dependencies | Vendor/integration level |
| Automated Agent | Resource exhaustion, data harvesting | Scripted attacks, bot networks, recursive prompts | API access |
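Several capabilities in the table above (rate limit abuse, scripted attacks, bot networks) are blunted by per-client rate limiting. The sketch below is a minimal token-bucket limiter; the `TokenBucket` class and its parameters are illustrative assumptions, not part of this template.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch, not production-ready)."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity            # maximum burst size
        self.tokens = float(capacity)       # current token balance
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# With capacity 2 and no refill, the third request in a burst is rejected.
bucket = TokenBucket(capacity=2, refill_per_sec=0.0)
results = [bucket.allow() for _ in range(3)]  # [True, True, False]
```

In practice the bucket would be keyed per API key or client IP, and refill rates tuned per actor tier.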
3. Attack Surface Analysis
Enumerate AI-specific attack vectors using STRIDE or a similar methodology.
| Attack Vector | STRIDE Category | Description | Likelihood | Impact | Risk |
|---|---|---|---|---|---|
| Prompt Injection (Direct) | Tampering | User crafts input to override system instructions | | | |
| Prompt Injection (Indirect) | Tampering | Malicious content in retrieved data manipulates model behavior | | | |
| Training Data Poisoning | Tampering | Adversary corrupts training or fine-tuning data | | | |
| Model Inversion | Information Disclosure | Attacker extracts training data from model responses | | | |
| Sensitive Data Leakage | Information Disclosure | Model reveals PII, credentials, or proprietary data | | | |
| Denial of Service | Denial of Service | Resource exhaustion via large or recursive prompts | | | |
| Jailbreaking | Elevation of Privilege | Bypass content filters and safety guardrails | | | |
| Agent Hijacking | Elevation of Privilege | Manipulate AI agent to perform unauthorized actions | | | |
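The Likelihood, Impact, and Risk columns are typically filled from a qualitative scoring matrix. Below is a minimal sketch of one such mapping, assuming a 3x3 Low/Medium/High scale; the thresholds are assumptions, so substitute your organization's risk matrix.

```python
# Hypothetical helper for deriving the Risk column from Likelihood x Impact.
# The numeric scale and thresholds are assumptions, not a standard.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def risk_rating(likelihood: str, impact: str) -> str:
    """Map qualitative likelihood and impact to a qualitative risk rating."""
    score = LEVELS[likelihood] * LEVELS[impact]
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"

rating = risk_rating("High", "Medium")  # -> "High"
```

A score-based mapping keeps ratings consistent across reviewers, at the cost of hiding judgment calls inside the thresholds.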
4. Mitigations
For each identified threat, document the mitigation strategy and its current status.
| Threat | Mitigation | Control Type | Status | Owner |
|---|---|---|---|---|
| Prompt Injection | Input sanitization, system prompt hardening, output filtering | Preventive | | |
| Data Leakage | Output scanning for PII/secrets, data classification enforcement | Detective | | |
| DoS | Rate limiting, token limits, request size limits, timeout enforcement | Preventive | | |
| Model Theft | Access controls, API key rotation, usage monitoring | Preventive | | |
| Jailbreaking | Content filters, output validation, monitoring for policy violations | Detective | | |
| Agent Hijacking | Least privilege, tool whitelisting, human approval gates | Preventive | | |
Control Types: Preventive (stops the attack before it succeeds), Detective (detects an attack in progress or after the fact), Corrective (responds to and recovers from the attack)
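A detective control such as the output scanning listed for Data Leakage can be sketched as a pattern scan over model output before it reaches the user. The patterns below are deliberately minimal examples, not production-grade PII or secret detection.

```python
import re

# Illustrative output-scanning sketch (a Detective control from the table above).
# These patterns are toy examples; real deployments use vetted detector libraries.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),   # AWS access key ID shape
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN shape
}

def scan_output(text: str) -> list[str]:
    """Return the names of any sensitive-data patterns found in model output."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

findings = scan_output("Contact admin@example.com with key AKIA1234567890ABCDEF")
# findings -> ["email", "aws_key"]
```

A hit would typically block or redact the response and emit an alert, turning the detective control into a corrective one.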
5. Residual Risk & Sign-Off
Document remaining risks after mitigations and get formal approval.
Residual Risks
- [RISK — e.g., "Novel prompt injection techniques may bypass current filters"]
  - Likelihood after mitigation: [Low/Medium/High]
  - Accepted by: [NAME, TITLE]
- [RISK — e.g., "Model may generate plausible but incorrect outputs"]
  - Likelihood after mitigation: [Low/Medium/High]
  - Accepted by: [NAME, TITLE]
Approval
- Threat Model Review: [NAME] — [DATE] — Approved / Requires Revision
- Security Sign-Off: [NAME] — [DATE]
- System Owner Acceptance: [NAME] — [DATE]
Next Review
- Scheduled: [DATE] (or triggered by significant system change, new threat intelligence, or security incident)