Prompt Injection Prevention
What This Requires
Implement technical controls to prevent prompt injection: input validation, context isolation, privilege separation, and output filtering. Test defenses with adversarial prompts.
Why It Matters
Prompt injection is the AI equivalent of SQL injection. Attackers manipulate prompts to bypass restrictions, leak data, or gain unauthorized access. Defense-in-depth is essential.
How To Implement
Input Validation
Block known injection patterns: "Ignore previous instructions", "You are now DAN", excessive newlines. Use regex or ML-based classifier to detect attacks.
Context Isolation
Separate system instructions from user input using delimiters (e.g., XML tags, triple quotes). Use LLM features like OpenAI's message roles (system vs. user).
Privilege Separation
Limit agent permissions. If agent only needs read access, don't grant write. Enumerate allowed tools in system prompt.
Output Filtering
Monitor outputs for sensitive patterns (SSN, API keys). If detected, block response and alert security team. Use regex or DLP library.
Evidence & Audit
- Input validation rules (regex, ML classifier config)
- Sample prompts showing context isolation implementation
- Agent privilege configuration (RBAC, tool allowlists)
- Output filtering rules and test results
- Adversarial testing reports