SLA & Performance Baselines
What This Requires
Define SLAs for AI systems (uptime, latency, error rate, throughput) and establish performance baselines. Monitor against the SLAs and alert on violations. Review and adjust the baselines quarterly based on actual performance.
Why It Matters
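Without SLAs, teams have no objective criteria for judging whether an AI system is performing acceptably. Baselines describe normal behavior, so deviations from them reveal performance degradation caused by, for example, model drift or infrastructure issues.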
How To Implement
Define SLAs
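For each AI system, set SLAs for uptime, p99 latency, error rate, and throughput, for example: uptime ≥99.9%, p99 latency <500 ms, error rate <1%, throughput ≥100 QPS. Align each target with the system's business requirements; the values above are illustrative.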
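One way to keep SLA definitions reviewable is to record them as data rather than prose. The sketch below is a minimal illustration, assuming the example targets above and a 30-day window for converting the uptime target into a downtime budget; the class, its fields, and the system name `support-chatbot` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLA:
    """SLA targets for one AI system (hypothetical structure, example values)."""
    system: str
    uptime_pct: float = 99.9        # minimum availability over the window, in percent
    latency_p99_ms: float = 500.0   # p99 latency must stay below this
    error_rate_pct: float = 1.0     # maximum share of failed requests, in percent
    throughput_qps: float = 100.0   # sustained queries per second the system must handle

    def downtime_budget_minutes(self, window_days: int = 30) -> float:
        """Downtime allowed in the window by the uptime target (assumes a 30-day window by default)."""
        window_minutes = window_days * 24 * 60
        return window_minutes * (1 - self.uptime_pct / 100)


sla = SLA(system="support-chatbot")
print(round(sla.downtime_budget_minutes(), 1))  # 43.2
```

The arithmetic makes the cost of a target concrete: 99.9% uptime over 30 days (43,200 minutes) leaves about 43.2 minutes of downtime.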
Baseline Establishment
Run load tests and collect two weeks of production metrics. Calculate the baseline for each metric at the median (p50) and p99. Document the results in the runbook.
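The baseline calculation can be sketched as below, assuming raw per-request latencies are available. It uses the nearest-rank percentile definition; monitoring backends that estimate percentiles from histograms may report slightly different values. The function names and sample data are hypothetical.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing so integer pct values give an exact rank.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_baseline(samples_ms: list[float]) -> dict[str, float]:
    """Baseline statistics for the runbook: median (p50) and p99."""
    return {"p50_ms": percentile(samples_ms, 50), "p99_ms": percentile(samples_ms, 99)}


# Hypothetical data: 1,000 request latencies of 1..1000 ms.
samples = [float(ms) for ms in range(1, 1001)]
print(latency_baseline(samples))  # {'p50_ms': 500.0, 'p99_ms': 990.0}
```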
Monitoring & Alerting
Configure alerts for SLA violations: uptime below 99.9% over the SLA measurement window, p99 latency above 500 ms, or error rate above 1%. Route alerts to the on-call engineer via PagerDuty or Opsgenie.
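The alert conditions above reduce to threshold checks on metrics aggregated over a window. This minimal sketch assumes those aggregates are already computed; the `WindowMetrics` class, `THRESHOLDS`, and `sla_violations` are hypothetical, and delivery to PagerDuty or Opsgenie is replaced by `print` rather than guessing at either service's API.

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    """Aggregated metrics for one evaluation window (hypothetical structure)."""
    total_requests: int
    failed_requests: int
    latency_p99_ms: float
    uptime_pct: float


# Example thresholds from the SLA section; adjust per system.
THRESHOLDS = {"uptime_pct_min": 99.9, "latency_p99_ms_max": 500.0, "error_rate_pct_max": 1.0}


def sla_violations(m: WindowMetrics, t: dict[str, float] = THRESHOLDS) -> list[str]:
    """Return a human-readable message for each SLA threshold the window breaches."""
    violations = []
    if m.uptime_pct < t["uptime_pct_min"]:
        violations.append(f"uptime {m.uptime_pct:.3f}% < {t['uptime_pct_min']}%")
    if m.latency_p99_ms > t["latency_p99_ms_max"]:
        violations.append(f"latency p99 {m.latency_p99_ms:.0f} ms > {t['latency_p99_ms_max']:.0f} ms")
    # Treat an empty window as 0% errors rather than dividing by zero.
    error_rate = 100 * m.failed_requests / m.total_requests if m.total_requests else 0.0
    if error_rate > t["error_rate_pct_max"]:
        violations.append(f"error rate {error_rate:.2f}% > {t['error_rate_pct_max']}%")
    return violations


window = WindowMetrics(total_requests=10_000, failed_requests=150, latency_p99_ms=620.0, uptime_pct=99.95)
for v in sla_violations(window):
    print("ALERT:", v)  # in production, route to the on-call paging tool instead
```

Here the window breaches the latency and error-rate thresholds (1.5% errors) but not the uptime target.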
Quarterly Review
Each quarter, review SLA performance against the baseline. Adjust baselines where justified (e.g., baseline latency increased because new features were added), and record each change in the change log.
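A quarterly review step might compare the new baseline with the previous one and produce a change-log entry only when the shift is material. The sketch below assumes a 10% tolerance, which is an illustrative choice rather than a requirement of this control; `review_baseline` and the example reason are hypothetical.

```python
from datetime import date
from typing import Optional


def review_baseline(metric: str, previous: float, current: float,
                    reason: str, tolerance_pct: float = 10.0) -> Optional[dict]:
    """Return a change-log entry if the baseline moved more than tolerance_pct, else None."""
    if previous <= 0:
        raise ValueError("previous baseline must be positive")
    change_pct = 100 * (current - previous) / previous
    if abs(change_pct) <= tolerance_pct:
        return None  # within tolerance: keep the existing baseline
    return {
        "date": date.today().isoformat(),
        "metric": metric,
        "old_baseline": previous,
        "new_baseline": current,
        "change_pct": round(change_pct, 1),
        "reason": reason,
    }


entry = review_baseline("latency_p99_ms", previous=320.0, current=410.0,
                        reason="added retrieval step before generation")
print(entry)
```

Requiring a `reason` for every entry keeps the change log useful as audit evidence: a 28.1% jump in p99 latency is recorded together with the feature change that explains it.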
Evidence & Audit
- SLA definitions per AI system
- Baseline establishment methodology and results
- Monitoring dashboards showing SLA metrics
- Alert configuration and incident history
- Quarterly review records with baseline adjustments