AI Incident Response Plan

Plan MONITOR

Purpose

An AI-specific incident response plan covering detection, triage, containment, and post-incident review procedures.

Related Controls

NIST MS-4

1. Purpose

Define the plan's scope and activation criteria.

This plan establishes procedures for detecting, responding to, and recovering from incidents involving AI systems at [ORGANIZATION NAME]. It supplements the general IT incident response plan with AI-specific procedures.

Plan Owner: [ROLE TITLE], [DEPARTMENT]

Effective Date: [DATE]

Activation Criteria: Any event that disrupts AI system availability, compromises AI system integrity, exposes data through AI channels, or causes AI systems to produce harmful outputs.
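For teams that route alerts automatically, the activation criteria can be expressed as a simple predicate. The sketch below is a minimal illustration only. The `AIEvent` record and its field names are hypothetical and would need to map onto whatever signals your monitoring tooling actually emits.

```python
from dataclasses import dataclass


@dataclass
class AIEvent:
    """Hypothetical event record; field names are illustrative, not a standard schema."""
    availability_disrupted: bool = False  # AI system unavailable or degraded
    integrity_compromised: bool = False   # model, prompts, or outputs tampered with
    data_exposed: bool = False            # data exposed through AI inputs or outputs
    harmful_output: bool = False          # system produced harmful content


def meets_activation_criteria(event: AIEvent) -> bool:
    """Return True if any one of the plan's four activation criteria is met."""
    return any((
        event.availability_disrupted,
        event.integrity_compromised,
        event.data_exposed,
        event.harmful_output,
    ))


if __name__ == "__main__":
    print(meets_activation_criteria(AIEvent(data_exposed=True)))  # True
    print(meets_activation_criteria(AIEvent()))                   # False
```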

2. Incident Type Definitions

Classify AI-specific incident types with severity levels.

Incident Type | Category | Default Severity | Examples
Prompt Injection Attack | Security | High | System prompt extracted, unauthorized actions executed
Data Leakage via AI | Privacy | Critical | PII exposed in model outputs, training data extracted
Model Drift/Degradation | Quality | Medium | Accuracy drops below threshold, biased outputs detected
AI Service Outage | Availability | High | Model endpoint unresponsive, API provider outage
Harmful Output | Safety | High | Toxic, misleading, or dangerous content generated
Agent Runaway | Operational | Critical | Agent exceeding limits, unauthorized resource access, cascading failures
Supply Chain Compromise | Security | Critical | Compromised model, malicious dependency, vendor breach
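A minimal sketch of how the default severities in this table might be encoded for automated triage, assuming Python 3.8 or later. The `Severity`, `INCIDENT_TYPES`, and `triage` names are hypothetical. The optional override reflects the Incident Commander confirming or adjusting severity during triage.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class Severity(IntEnum):
    """Lower value = more urgent, matching Severity 1 (Critical) in Section 4."""
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


# (category, default severity) per incident type, transcribed from the table.
INCIDENT_TYPES: Dict[str, Tuple[str, Severity]] = {
    "Prompt Injection Attack": ("Security", Severity.HIGH),
    "Data Leakage via AI": ("Privacy", Severity.CRITICAL),
    "Model Drift/Degradation": ("Quality", Severity.MEDIUM),
    "AI Service Outage": ("Availability", Severity.HIGH),
    "Harmful Output": ("Safety", Severity.HIGH),
    "Agent Runaway": ("Operational", Severity.CRITICAL),
    "Supply Chain Compromise": ("Security", Severity.CRITICAL),
}


def triage(incident_type: str, override: Optional[Severity] = None) -> Severity:
    """Return the default severity for a type, or the Incident Commander's override."""
    if incident_type not in INCIDENT_TYPES:
        raise ValueError(f"unknown incident type: {incident_type!r}")
    _category, default = INCIDENT_TYPES[incident_type]
    return override if override is not None else default


if __name__ == "__main__":
    print(triage("Data Leakage via AI").name)                         # CRITICAL
    print(triage("Harmful Output", override=Severity.CRITICAL).name)  # CRITICAL
```

Numbering severities with `IntEnum` means `min()` returns the most urgent level when one event matches several incident types.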

3. Response Team Roles

Define the incident response team structure and responsibilities.

Role | Responsibilities | Primary | Backup
Incident Commander | Overall coordination, decision authority, stakeholder communication | [NAME, PHONE] | [NAME, PHONE]
AI Technical Lead | AI-specific diagnosis, model behavior analysis, prompt forensics | [NAME, PHONE] | [NAME, PHONE]
Security Analyst | Attack analysis, evidence preservation, threat assessment | [NAME, PHONE] | [NAME, PHONE]
Operations | System health, log collection, service restoration | [NAME, PHONE] | [NAME, PHONE]
Communications | Internal/external communications, customer notification | [NAME, PHONE] | [NAME, PHONE]

4. Response Procedures

Step-by-step procedures for each severity level.

Severity 1 (Critical) — Response within 15 minutes

  1. Detect: Alert received via monitoring / user report / automated detection
  2. Triage: Incident Commander confirms severity, activates response team
  3. Contain: Immediately disable the affected AI system or isolate it from the network
  4. Notify: Alert executive leadership, legal, and affected stakeholders
  5. Investigate: Preserve logs, analyze attack vector or failure mode
  6. Remediate: Fix root cause, apply patches, update controls
  7. Restore: Re-enable system with enhanced monitoring
  8. Review: Conduct post-incident review within 48 hours

Severity 2 (High) — Response within 1 hour

  1. Detect & Triage: Confirm severity and assign to response team
  2. Contain: Apply targeted mitigation (rate limit, input filter, output block)
  3. Investigate: Analyze scope and impact
  4. Remediate: Apply fix in staging, test, deploy
  5. Review: Post-incident review within 1 week

Severity 3 (Medium) — Response within 4 hours

  1. Detect & Triage: Log incident, assign owner
  2. Investigate: Determine root cause
  3. Remediate: Schedule fix in next sprint
  4. Review: Include in weekly security review
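The response and review windows above reduce to simple date arithmetic. This sketch assumes the response clock starts at detection and the post-incident review clock starts at resolution. The plan does not state either anchor explicitly, so adjust to your own definitions. The function and table names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Response windows per severity number (1 = Critical, 2 = High, 3 = Medium).
RESPONSE_WINDOW = {1: timedelta(minutes=15), 2: timedelta(hours=1), 3: timedelta(hours=4)}

# Review windows, assumed to be measured from resolution. Severity 3 is None because
# it goes to the weekly security review rather than a fixed deadline.
REVIEW_WINDOW = {1: timedelta(hours=48), 2: timedelta(weeks=1), 3: None}


def response_deadline(severity: int, detected_at: datetime) -> datetime:
    """Latest time the response must begin, assuming the clock starts at detection."""
    return detected_at + RESPONSE_WINDOW[severity]


def review_deadline(severity: int, resolved_at: datetime) -> Optional[datetime]:
    """Post-incident review due date, or None when handled in the weekly review."""
    window = REVIEW_WINDOW[severity]
    return None if window is None else resolved_at + window


if __name__ == "__main__":
    detected = datetime(2024, 3, 1, 9, 50)
    print(response_deadline(1, detected))                   # 2024-03-01 10:05:00
    print(review_deadline(2, datetime(2024, 3, 1, 12, 0)))  # 2024-03-08 12:00:00
    print(review_deadline(3, datetime(2024, 3, 1, 12, 0)))  # None
```

For example, a Severity 1 incident detected at 09:50 needs a response under way by 10:05, and a Severity 2 incident resolved at noon on 1 March 2024 should be reviewed by noon on 8 March.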

5. Escalation Matrix

Define when and to whom incidents are escalated.

Severity | Initial Response | Escalation Path (if unresolved) | Escalation Timeframe | Executive Notification
Critical | Incident Commander + Full Team | CISO → CTO → CEO | Immediate | Within 15 minutes
High | On-Call Engineer + AI Technical Lead | Incident Commander → CISO | 1 hour | Within 4 hours
Medium | On-Call Engineer | Team Lead → AI Program Lead | 4 hours | Weekly summary
Low | Assigned Engineer | Team Lead | Next business day | Monthly summary
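One way to check the matrix programmatically is sketched below. It treats the timeframe column as the time an incident may stay unresolved before escalating, and it interprets "Next business day" as the same clock time on the next weekday with public holidays ignored. Both readings are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

# Severity -> (escalation path, time allowed unresolved before escalating).
# None marks "Next business day", which next_business_day() computes.
ESCALATION = {
    "Critical": (["CISO", "CTO", "CEO"], timedelta(0)),  # Immediate
    "High": (["Incident Commander", "CISO"], timedelta(hours=1)),
    "Medium": (["Team Lead", "AI Program Lead"], timedelta(hours=4)),
    "Low": (["Team Lead"], None),
}


def next_business_day(ts: datetime) -> datetime:
    """Same clock time on the next Monday-Friday date; public holidays are ignored."""
    nxt = ts + timedelta(days=1)
    while nxt.weekday() >= 5:  # weekday(): 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def escalation_due_at(severity: str, opened_at: datetime) -> datetime:
    """Time at which an unresolved incident of this severity must be escalated."""
    _path, threshold = ESCALATION[severity]
    return next_business_day(opened_at) if threshold is None else opened_at + threshold


def escalation_targets(severity: str, opened_at: datetime, now: datetime,
                       resolved: bool = False) -> List[str]:
    """Escalation path if still unresolved at or past the threshold, else []."""
    if resolved or now < escalation_due_at(severity, opened_at):
        return []
    return list(ESCALATION[severity][0])


if __name__ == "__main__":
    opened = datetime(2024, 3, 1, 10, 0)  # a Friday
    print(escalation_targets("High", opened, opened + timedelta(minutes=30)))  # []
    print(escalation_targets("High", opened, opened + timedelta(hours=1)))
    # ['Incident Commander', 'CISO']
    print(escalation_due_at("Low", opened))  # 2024-03-04 10:00:00
```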

6. Communication Templates

Pre-drafted communications for rapid incident notification.

Internal Notification (Severity 1-2)

Subject: [SEVERITY] AI Incident — [SYSTEM NAME] — [SHORT DESCRIPTION]

Status: Active / Contained / Resolved

Impact: [Description of user/business impact]

Current Actions: [What the team is doing right now]

Next Update: [TIME]

External/Customer Notification (if required)

Subject: Service Update — [SYSTEM NAME]

Body: We are aware of an issue affecting [SERVICE]. Our team is actively working to resolve the situation. [SPECIFIC IMPACT]. We expect to provide an update by [TIME]. We apologize for any inconvenience.

Post-Resolution Notification

Subject: Resolved — [SYSTEM NAME] Incident

Body: The incident affecting [SERVICE] has been resolved as of [TIME]. Root cause: [BRIEF SUMMARY]. A full post-incident review will be completed within [TIMEFRAME].
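Filling templates by hand under pressure invites missing fields. A minimal sketch using Python's standard `string.Template` renders the internal notification and fails loudly if a placeholder is absent or the status is not one of the three listed values. The placeholder names are hypothetical stand-ins for the bracketed fields, and the sample values are invented for illustration.

```python
from string import Template
from typing import Dict, Tuple

ALLOWED_STATUS = {"Active", "Contained", "Resolved"}

# Layout follows the internal notification template; "\u2014" is its em dash separator.
SUBJECT = Template("[$severity] AI Incident \u2014 $system_name \u2014 $short_description")
BODY = Template(
    "Status: $status\n"
    "Impact: $impact\n"
    "Current Actions: $current_actions\n"
    "Next Update: $next_update\n"
)


def render_internal_notification(fields: Dict[str, str]) -> Tuple[str, str]:
    """Return (subject, body). Raises ValueError on a bad status, KeyError on a missing field."""
    if fields.get("status") not in ALLOWED_STATUS:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUS)}")
    # Template.substitute() raises KeyError for any placeholder not supplied.
    return SUBJECT.substitute(fields), BODY.substitute(fields)


if __name__ == "__main__":
    subject, body = render_internal_notification({
        "severity": "Severity 1",
        "system_name": "Support Chatbot",         # invented example values
        "short_description": "PII in model outputs",
        "status": "Contained",
        "impact": "Some users saw other customers' account details.",
        "current_actions": "Endpoint disabled; logs preserved for forensics.",
        "next_update": "14:00 UTC",
    })
    print(subject)
    print(body)
```

The same pattern extends to the external and post-resolution templates by defining additional `Template` objects.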

7. Plan Review History

Track plan reviews, tabletop exercises, and updates.

Date | Activity | Participants | Findings | Changes Made
[DATE] | Plan Review | [NAMES] | [FINDINGS] | [CHANGES]
[DATE] | Tabletop Exercise | [NAMES] | [FINDINGS] | [CHANGES]
[DATE] | Live Incident Lessons | [NAMES] | [FINDINGS] | [CHANGES]

Tabletop Exercise Schedule

  • Frequency: Semi-annual (minimum)
  • Scenarios: Rotate through all incident types over a 2-year cycle
  • Next Exercise: [DATE]
  • Scenario: [PLANNED SCENARIO]
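At the minimum semi-annual frequency, a 2-year cycle contains four exercises, so covering all seven incident types means pairing some scenarios. The sketch below distributes types round-robin so each appears exactly once per cycle. The function name and the resulting pairings are illustrative assumptions, not a prescribed rotation.

```python
from typing import List

INCIDENT_TYPES = [
    "Prompt Injection Attack",
    "Data Leakage via AI",
    "Model Drift/Degradation",
    "AI Service Outage",
    "Harmful Output",
    "Agent Runaway",
    "Supply Chain Compromise",
]


def plan_rotation(incident_types: List[str], exercises: int = 4) -> List[List[str]]:
    """Assign types round-robin across the cycle's exercises; each type appears once."""
    if exercises < 1:
        raise ValueError("a cycle needs at least one exercise")
    return [incident_types[i::exercises] for i in range(exercises)]


if __name__ == "__main__":
    # 2 years x 2 exercises per year = 4 exercises per cycle at the minimum frequency.
    for n, scenarios in enumerate(plan_rotation(INCIDENT_TYPES), start=1):
        print(f"Exercise {n}: {', '.join(scenarios)}")
```

Running more than the minimum number of exercises simply spreads the types more thinly; pass the actual count as `exercises`.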