Preventing Excessive AI Agency
What This Requires
Prevent excessive agency by limiting agent autonomy: require human approval for destructive actions, cap iteration depth, enforce timeouts, and monitor for runaway behavior.
Why It Matters
Autonomous agents can spiral into harmful behavior (infinite loops, mass deletions, unintended side effects). Guardrails ensure humans remain in control.
How To Implement
Human-in-the-Loop
For high-risk actions (deleting data, deploying code, sending external messages), require human approval. Pause agent execution until approval is granted via a UI or API.
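A minimal sketch of an approval gate, assuming the agent routes every tool call through a single entry point. The names `ApprovalGate`, `HIGH_RISK_ACTIONS`, `request`, and `decide` are hypothetical and chosen for illustration only. A real deployment would persist pending tickets and authenticate the approver.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical action names; a real system would load these from the agency policy.
HIGH_RISK_ACTIONS = {"delete_data", "deploy_code", "send_external_message"}


@dataclass
class PendingAction:
    action: str
    args: Dict[str, Any]
    execute: Callable[..., Any]


class ApprovalGate:
    """Runs low-risk actions immediately and parks high-risk ones for a human."""

    def __init__(self) -> None:
        self._pending: Dict[str, PendingAction] = {}

    def request(self, action: str, args: Dict[str, Any],
                execute: Callable[..., Any]) -> Dict[str, Any]:
        if action not in HIGH_RISK_ACTIONS:
            return {"status": "executed", "result": execute(**args)}
        # High-risk: do not execute. Store the call and hand back a ticket
        # that a UI or API endpoint can use to approve or deny it.
        ticket = str(uuid.uuid4())
        self._pending[ticket] = PendingAction(action, args, execute)
        return {"status": "pending_approval", "ticket": ticket}

    def decide(self, ticket: str, approved: bool) -> Dict[str, Any]:
        # pop() raises KeyError for unknown or already-decided tickets,
        # so one approval cannot be replayed.
        pending = self._pending.pop(ticket)
        if not approved:
            return {"status": "denied", "action": pending.action}
        return {"status": "executed", "result": pending.execute(**pending.args)}
```

In this sketch the agent loop treats a `pending_approval` response as a signal to stop and wait. The approval UI or API handler then calls `decide` with the ticket.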
Iteration Limits
Cap agent reasoning loops (e.g., a maximum of 10 iterations). This prevents infinite loops in which the agent repeatedly calls the same tool without making progress.
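One way to sketch this is a bounded loop that also counts identical consecutive tool calls. The function `run_agent_loop`, the `step` callback contract, and the repeat threshold of 3 are assumptions made for illustration. Only the cap of 10 iterations comes from the example above.

```python
from typing import Callable, Hashable, Optional, Tuple

MAX_ITERATIONS = 10      # example cap from the text
MAX_REPEATED_CALLS = 3   # assumed threshold for "no progress"


class IterationLimitExceeded(Exception):
    pass


class NoProgressDetected(Exception):
    pass


def run_agent_loop(
    step: Callable[[int], Tuple[Hashable, Optional[str]]],
    max_iterations: int = MAX_ITERATIONS,
    max_repeats: int = MAX_REPEATED_CALLS,
) -> str:
    """Drive the agent one step at a time.

    step(i) is a hypothetical callback returning (tool_call, final_answer).
    A non-None final_answer ends the loop.
    """
    last_call = None
    repeats = 0
    for iteration in range(1, max_iterations + 1):
        tool_call, answer = step(iteration)
        if answer is not None:
            return answer
        # Count how many times in a row the same call was issued.
        if tool_call == last_call:
            repeats += 1
        else:
            last_call, repeats = tool_call, 1
        if repeats >= max_repeats:
            raise NoProgressDetected(
                f"same call issued {repeats} times in a row: {tool_call!r}"
            )
    raise IterationLimitExceeded(f"no answer after {max_iterations} iterations")
```

Comparing whole tool calls (name plus arguments) is a deliberately crude progress check. It catches the exact-repeat case named above but not loops that alternate between two calls.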
Timeouts
Enforce an execution timeout (e.g., 5 minutes per request). Abort the agent if the timeout is exceeded, and return an error to the user with the option to retry.
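A minimal sketch of a per-request timeout, assuming the agent runs as an asyncio coroutine. It uses the standard library's `asyncio.wait_for`, which cancels the awaited task when the deadline passes. The wrapper name `run_with_timeout` and the shape of the returned dictionary are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

REQUEST_TIMEOUT_SECONDS = 300  # 5 minutes, the example value from the text


async def run_with_timeout(
    agent: Callable[[], Awaitable[Any]],
    timeout: float = REQUEST_TIMEOUT_SECONDS,
) -> Dict[str, Any]:
    """Run the agent coroutine, aborting it if it exceeds the timeout."""
    try:
        result = await asyncio.wait_for(agent(), timeout=timeout)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        # wait_for has already cancelled the agent task at this point.
        return {
            "status": "error",
            "error": "timeout",
            "retryable": True,
            "message": f"Agent exceeded {timeout} seconds and was aborted.",
        }
```

Cancellation only takes effect at `await` points, so blocking synchronous work inside the agent would not be interrupted by this approach. Such work would need a separate process or thread with its own kill mechanism.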
Runaway Detection
Monitor for suspicious patterns: excessive API calls (e.g., more than 100 per minute), repeated errors, or long runs with no successful tool calls. When a pattern is detected, automatically pause the agent and alert an operator.
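These rules can be sketched as a small detector that the agent runtime consults after every tool or API call. The class `RunawayDetector`, its method names, and the consecutive-error threshold of 5 are assumptions for illustration. The 100 calls per minute limit is the example value from the text. The `alert` callback stands in for whatever paging or notification channel is actually used.

```python
import time
from collections import deque
from typing import Callable, Deque


class RunawayDetector:
    """Tracks call rate and consecutive failures; pauses the agent on a breach."""

    def __init__(
        self,
        max_calls_per_minute: int = 100,   # example value from the text
        max_consecutive_errors: int = 5,   # assumed threshold
        alert: Callable[[str], None] = print,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls_per_minute = max_calls_per_minute
        self.max_consecutive_errors = max_consecutive_errors
        self._alert = alert
        self._clock = clock
        self._calls: Deque[float] = deque()
        self._consecutive_errors = 0
        self.paused = False

    def record_call(self, success: bool) -> bool:
        """Record one call. Returns True if the agent may continue."""
        now = self._clock()
        self._calls.append(now)
        # Drop timestamps that fell out of the 60-second sliding window.
        while self._calls and now - self._calls[0] > 60:
            self._calls.popleft()
        self._consecutive_errors = 0 if success else self._consecutive_errors + 1

        if len(self._calls) > self.max_calls_per_minute:
            self._pause(f"call rate exceeded {self.max_calls_per_minute}/min")
        elif self._consecutive_errors >= self.max_consecutive_errors:
            self._pause(f"{self._consecutive_errors} consecutive failed calls")
        return not self.paused

    def _pause(self, reason: str) -> None:
        if not self.paused:  # alert once per pause, not on every later call
            self.paused = True
            self._alert(f"agent auto-paused: {reason}")
```

In this sketch the detector stays paused until an operator creates a new instance or resets the flag, which keeps the resume decision with a human.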
Evidence & Audit
- Agency policy defining approval requirements for high-risk actions
- Implementation of human approval gates (code, UI workflow)
- Iteration limits and timeout configuration
- Runaway detection rules and alert logs
- Test results showing limits enforced