AI Red Team Playbook

Tier 2 SECURE

What This Requires

Develop and execute AI-specific red team scenarios covering prompt injection, jailbreaking, data extraction, model inversion, and agent hijacking. Conduct an engagement annually or before each major release. Remediate high-severity findings before release.

Why It Matters

Static security testing misses creative, adaptive attacks. Red teaming simulates real-world adversaries and uncovers edge cases that automated scans do not. Findings drive the hardening roadmap.

How To Implement

Define Scenarios

Create 5-10 attack scenarios, for example:

  • Prompt injection to leak data
  • Jailbreak to bypass content filters
  • Agent hijacking via tool manipulation
  • Model inversion to extract training data
  • Denial of service (DoS) via recursive prompts
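A scenario catalog kept as structured data lets each engagement run the same checks and makes the playbook auditable. A minimal sketch in Python (the `Scenario` fields, IDs, and success criteria are illustrative assumptions, not part of this control):

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One red-team attack scenario from the playbook."""
    sid: str
    name: str
    category: str          # e.g. "prompt-injection", "jailbreak"
    success_criteria: str  # what counts as a successful exploit

# Illustrative catalog mirroring the five example scenarios above.
PLAYBOOK = [
    Scenario("RT-1", "Prompt injection to leak data", "prompt-injection",
             "Model reveals system prompt or other users' data"),
    Scenario("RT-2", "Jailbreak to bypass content filters", "jailbreak",
             "Model produces output the content filter should block"),
    Scenario("RT-3", "Agent hijacking via tool manipulation", "agent-hijacking",
             "Agent invokes a tool with attacker-controlled arguments"),
    Scenario("RT-4", "Model inversion to extract training data", "model-inversion",
             "Model reproduces verbatim training records"),
    Scenario("RT-5", "DoS via recursive prompts", "dos",
             "Single request consumes unbounded compute or tokens"),
]

def by_category(category: str) -> list[Scenario]:
    """Filter the playbook for scenarios in one attack category."""
    return [s for s in PLAYBOOK if s.category == category]
```

Keeping success criteria alongside each scenario removes ambiguity later about whether an attack "worked."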

Assemble Team

Recruit internal security engineers or hire external AI red team consultants. Provide access to the target system (a staging environment or an isolated production replica).

Execute & Document

The red team executes attacks over 1-2 weeks. Document successes (exploits found), failures (defenses held), and near-misses. Rate findings by severity (CVSS or an internal scale).
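If CVSS is the rating scale, findings can be recorded with their qualitative band derived from the base score. A sketch (the CVSS v3.x score-to-band mapping is standard; the `Finding` fields are an illustrative assumption for this playbook):

```python
from dataclasses import dataclass

def severity_band(cvss_score: float) -> str:
    """Map a CVSS v3.x base score (0.0-10.0) to its qualitative rating."""
    if not 0.0 <= cvss_score <= 10.0:
        raise ValueError("CVSS base score must be between 0.0 and 10.0")
    if cvss_score == 0.0:
        return "none"
    if cvss_score < 4.0:
        return "low"
    if cvss_score < 7.0:
        return "medium"
    if cvss_score < 9.0:
        return "high"
    return "critical"

@dataclass
class Finding:
    """One documented red-team result: exploit, held defense, or near-miss."""
    scenario_id: str
    outcome: str        # "exploit", "defense-held", or "near-miss"
    cvss_score: float
    notes: str = ""

    @property
    def severity(self) -> str:
        return severity_band(self.cvss_score)
```

Recording defenses that held, not just exploits, gives the report the "failures" evidence this step calls for.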

Remediate

Fix high/critical findings before the prod release. Track medium/low findings in the backlog with a target fix date. Re-test fixes before closing them.
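The "fix high/critical before prod release" rule can be enforced as a release gate in CI. A minimal sketch, assuming findings are exported as records with `id`, `status`, and `severity` fields (those field names are an assumption, not prescribed by this control):

```python
# Severities that block a production release per the remediation rule.
BLOCKING = {"high", "critical"}

def release_blockers(findings: list[dict]) -> list[str]:
    """Return IDs of open findings severe enough to block release."""
    return [f["id"] for f in findings
            if f["status"] == "open" and f["severity"] in BLOCKING]

def can_release(findings: list[dict]) -> bool:
    """True only when no open high/critical findings remain."""
    return not release_blockers(findings)
```

Wiring `can_release` into the release pipeline turns the policy into a check that cannot be skipped by accident.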

Evidence & Audit

  • Red team playbook with defined scenarios
  • Engagement scope and rules of engagement
  • Red team report with findings and severity ratings
  • Remediation plan and completion records
  • Re-test validation results

Related Controls