Post-Incident Review Report
Purpose
A template for post-incident reviews covering the incident timeline, root cause analysis, corrective actions, and lessons learned.
1. Incident Summary
Capture key facts about the incident for quick reference.
Incident ID: INC-[NNN]
Incident Title: [SHORT DESCRIPTION]
Date/Time Detected: [DATE TIME]
Date/Time Resolved: [DATE TIME]
Duration: [HOURS:MINUTES]
Severity: Critical / High / Medium / Low
Affected Systems: [SYSTEM NAMES]
Affected Users: [NUMBER / SCOPE]
Incident Commander: [NAME]
Report Author: [NAME], [ROLE TITLE]
Report Date: [DATE]
Impact Summary: [2-3 sentences describing the business and user impact]
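The Duration field is the elapsed time between Date/Time Detected and Date/Time Resolved. A minimal Python sketch of that arithmetic follows. It assumes both timestamps are recorded as ISO 8601 strings in the same timezone, which the template itself does not prescribe. The function name and sample times are hypothetical.

```python
from datetime import datetime


def incident_duration(detected: str, resolved: str) -> str:
    """Return the time from detection to resolution as HOURS:MINUTES.

    Assumes both values are ISO 8601 strings in the same timezone
    (a hypothetical convention; the template does not fix a format).
    """
    start = datetime.fromisoformat(detected)
    end = datetime.fromisoformat(resolved)
    if end < start:
        raise ValueError("resolution time precedes detection time")
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    # Hours are not wrapped at 24, so a 30-hour incident reads "30:00".
    return f"{hours}:{minutes:02d}"


# Hypothetical detection and resolution times spanning midnight.
print(incident_duration("2024-05-01T09:15", "2024-05-02T11:47"))  # 26:32
```

Leaving hours unwrapped keeps multi-day incidents unambiguous in the HOURS:MINUTES field.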
2. Timeline of Events
Reconstruct the full incident timeline from detection to resolution.
| Time | Event | Action Taken | By |
|---|---|---|---|
| [TIME] | First anomaly detected by monitoring | Alert triggered, on-call notified | Automated |
| [TIME] | On-call engineer acknowledges alert | Begins investigation | [NAME] |
| [TIME] | Root cause identified | Incident Commander notified, team assembled | [NAME] |
| [TIME] | Containment action taken | [SPECIFIC ACTION — e.g., AI service disabled] | [NAME] |
| [TIME] | Fix implemented and deployed | [SPECIFIC FIX] | [NAME] |
| [TIME] | Service restored and verified | Monitoring confirmed normal operation | [NAME] |
| [TIME] | All-clear communicated to stakeholders | Resolution notification sent | [NAME] |
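Once the timeline is filled in, the intervals between its rows show where response time was spent. The sketch below derives time to acknowledge, time to contain, and time to restore from timeline rows. The event labels, metric names, and timestamps are hypothetical illustrations that loosely mirror the rows above, and timestamps are assumed to be ISO 8601 strings in one timezone.

```python
from datetime import datetime

# Hypothetical timeline rows: (ISO 8601 timestamp, event label).
# Labels loosely mirror the template rows; times are illustrative only.
TIMELINE = [
    ("2024-05-01T09:15", "detected"),
    ("2024-05-01T09:19", "acknowledged"),
    ("2024-05-01T09:52", "root_cause_identified"),
    ("2024-05-01T10:05", "contained"),
    ("2024-05-01T11:30", "fix_deployed"),
    ("2024-05-01T11:47", "restored"),
    ("2024-05-01T12:00", "all_clear"),
]


def minutes_between(events: dict, start: str, end: str) -> int:
    """Whole minutes between two labelled events (KeyError if a label is missing)."""
    delta = datetime.fromisoformat(events[end]) - datetime.fromisoformat(events[start])
    return int(delta.total_seconds() // 60)


def response_metrics(timeline) -> dict:
    # Parse before comparing so ordering does not depend on string format.
    parsed = [datetime.fromisoformat(ts) for ts, _ in timeline]
    if parsed != sorted(parsed):
        raise ValueError("timeline rows must be in chronological order")
    events = {label: ts for ts, label in timeline}
    return {
        "time_to_acknowledge": minutes_between(events, "detected", "acknowledged"),
        "time_to_contain": minutes_between(events, "detected", "contained"),
        "time_to_restore": minutes_between(events, "detected", "restored"),
    }


print(response_metrics(TIMELINE))
# {'time_to_acknowledge': 4, 'time_to_contain': 50, 'time_to_restore': 152}
```

Measuring every interval from detection makes the figures comparable across incidents; the chronological-order check catches rows entered out of sequence.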
3. Root Cause Analysis
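Use the 5 Whys technique to identify the true root cause: starting from the incident itself, ask "why" of each answer in turn until you reach a systemic cause rather than a symptom.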
5 Whys Analysis
Why 1: Why did the incident occur?
→ [ANSWER — e.g., "The AI agent executed an unauthorized API call"]
Why 2: Why did that happen?
→ [ANSWER — e.g., "The agent's permission boundary did not restrict that API endpoint"]
Why 3: Why was it not restricted?
→ [ANSWER — e.g., "The endpoint was added after the permission matrix was last reviewed"]
Why 4: Why wasn't the permission matrix updated?
→ [ANSWER — e.g., "No process to trigger permission review when new endpoints are added"]
Why 5: Why is there no trigger process?
→ [ANSWER — e.g., "Permission reviews are calendar-based (quarterly), not event-based"]
Root Cause Statement
[Clear, specific statement of the root cause — e.g., "Agent permission reviews are calendar-based rather than triggered by system changes, creating windows where new capabilities are not governed."]
4. Contributing Factors
Identify factors that contributed to the incident but are not the root cause.
Process Factors
- [FACTOR — e.g., "Change management did not include permission review as a checklist item"]
- [FACTOR — e.g., "No automated detection for agent behavior outside permitted boundaries"]
Technical Factors
- [FACTOR — e.g., "Agent framework does not enforce permission boundaries at runtime"]
- [FACTOR — e.g., "Monitoring did not alert on the unauthorized API call pattern"]
Human Factors
- [FACTOR — e.g., "Team unfamiliar with the new API endpoint's sensitivity level"]
- [FACTOR — e.g., "Alert fatigue caused initial monitoring signals to be dismissed"]
Note: This analysis is blameless. The goal is to improve systems and processes, not assign personal blame.
5. What Went Well
Acknowledge effective response elements to reinforce good practices.
- [POSITIVE — e.g., "Incident was detected within 5 minutes of occurrence"]
- [POSITIVE — e.g., "Response team assembled quickly and communicated effectively"]
- [POSITIVE — e.g., "Containment action prevented further data exposure"]
- [POSITIVE — e.g., "Rollback procedure worked as documented"]
6. What Didn't Go Well
Identify areas where the response could be improved.
- [NEGATIVE — e.g., "Took 30 minutes to identify root cause due to insufficient logging"]
- [NEGATIVE — e.g., "Customer communication was delayed by 1 hour"]
- [NEGATIVE — e.g., "Runbook for this scenario did not exist"]
- [NEGATIVE — e.g., "Monitoring alert was too noisy, causing initial dismissal"]
7. Corrective Actions
Define specific, measurable, and time-bound corrective actions.
| Action ID | Corrective Action | Owner | Deadline | Status | Verification Method |
|---|---|---|---|---|---|
| CA-001 | [ACTION — e.g., "Add permission review to change management checklist"] | [NAME] | [DATE] | Open / In Progress / Complete | [HOW TO VERIFY] |
| CA-002 | [ACTION — e.g., "Implement runtime permission enforcement in agent framework"] | [NAME] | [DATE] | Open / In Progress / Complete | [HOW TO VERIFY] |
| CA-003 | [ACTION — e.g., "Add monitoring rule for unauthorized API call patterns"] | [NAME] | [DATE] | Open / In Progress / Complete | [HOW TO VERIFY] |
| CA-004 | [ACTION — e.g., "Create runbook for agent boundary violation incidents"] | [NAME] | [DATE] | Open / In Progress / Complete | [HOW TO VERIFY] |
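Corrective actions are only useful if each row actually names an owner, a deadline, a status, and a verification method. A minimal Python sketch of a completeness check follows. The `CorrectiveAction` record, the `find_gaps` helper, the rule that a value starting with "[" is an unfilled placeholder, and the sample owner and dates are all hypothetical conventions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Status values taken from the corrective-action table.
ALLOWED_STATUSES = {"Open", "In Progress", "Complete"}


@dataclass
class CorrectiveAction:
    action_id: str
    description: str
    owner: str
    deadline: date
    status: str
    verification: str


def find_gaps(action: CorrectiveAction, today: date) -> list[str]:
    """List problems that keep an action from being specific and time-bound."""
    gaps = []
    for field_name in ("description", "owner", "verification"):
        value = getattr(action, field_name).strip()
        # Heuristic: template placeholders such as "[NAME]" start with "[".
        if not value or value.startswith("["):
            gaps.append(f"{action.action_id}: {field_name} missing or still a placeholder")
    if action.status not in ALLOWED_STATUSES:
        gaps.append(f"{action.action_id}: unknown status {action.status!r}")
    if action.status != "Complete" and action.deadline < today:
        gaps.append(f"{action.action_id}: overdue (deadline {action.deadline.isoformat()})")
    return gaps


# Hypothetical rows; "J. Doe" and all dates are illustrative.
actions = [
    CorrectiveAction("CA-001", "Add permission review to change management checklist",
                     "J. Doe", date(2024, 6, 1), "In Progress",
                     "Audit the checklist on the next five changes"),
    CorrectiveAction("CA-002", "Implement runtime permission enforcement in agent framework",
                     "[NAME]", date(2024, 5, 1), "Open", ""),
]

for a in actions:
    for gap in find_gaps(a, today=date(2024, 5, 15)):
        print(gap)
# CA-002: owner missing or still a placeholder
# CA-002: verification missing or still a placeholder
# CA-002: overdue (deadline 2024-05-01)
```

Running a check like this before each review meeting flags rows that are still placeholders or have slipped past their deadline.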
8. Lessons Learned Summary
Distill key takeaways that should be shared across the organization.
Key Lessons
- [LESSON TITLE]: [DESCRIPTION — e.g., "Permission reviews must be event-driven, not just calendar-based. Any change to system capabilities should trigger a permission review."]
- [LESSON TITLE]: [DESCRIPTION — e.g., "Runtime enforcement of agent boundaries is essential. Documentation-only controls are insufficient for autonomous systems."]
- [LESSON TITLE]: [DESCRIPTION — e.g., "Monitoring must be tuned to detect behavioral anomalies, not just system health metrics."]
Distribution List
This report is distributed to:
- AI Governance Committee
- Incident Response Team
- System Owner(s)
- Engineering Team Lead(s)
- [ADDITIONAL STAKEHOLDERS]
Classification: Confidential — Internal Use Only