Input/Output Sanitization

Tier 2 SECURE

What This Requires

Sanitize all inputs to and outputs from LLM systems. Inputs: validate format, length, and character set; block known injection patterns. Outputs: escape HTML/JS before rendering, validate against a JSON schema, and scan for leaked secrets.

Why It Matters

Unsanitized inputs enable injection attacks. Unsanitized outputs enable XSS, command injection, or data leakage. Defense-in-depth requires sanitization at every boundary.

How To Implement

Input Sanitization

Validate all user inputs: enforce a maximum length (e.g., 10K characters), allow only expected characters (alphanumeric plus safe punctuation), and reject known injection patterns (e.g., a regex for "Ignore previous", excessive newlines).
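A minimal sketch of this validation, assuming an allowlist and injection patterns that would need tuning to the actual application (the `validate_input` helper, its limits, and the patterns below are illustrative, not prescriptive):

```python
import re

MAX_INPUT_CHARS = 10_000  # example limit from the guidance above
# Alphanumeric plus a set of "safe" punctuation; adjust for your locale/use case.
ALLOWED_CHARS = re.compile(r"^[\w\s.,;:!?'\"()\-@/]+$")
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"\n{10,}"),  # excessive newlines
]

def validate_input(text: str) -> tuple[bool, str]:
    """Return (ok, reason). Reject over-long, out-of-charset, or injection-like input."""
    if len(text) > MAX_INPUT_CHARS:
        return False, "input exceeds max length"
    if not ALLOWED_CHARS.match(text):
        return False, "disallowed characters"
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            return False, "matched injection pattern"
    return True, "ok"
```

Pattern matching alone will not catch novel injections; treat it as one layer alongside the output-side controls below.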

Output Escaping

For web UIs: escape HTML/JS in LLM outputs before rendering (use framework built-ins such as React's JSX auto-escaping or Django's template escaping). For APIs: validate the response against a JSON schema before sending it to the client.
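Sketched with the Python standard library (the `validate_response` helper and its required-keys check are a hypothetical stand-in; a dedicated library such as jsonschema gives much richer schema validation):

```python
import html
import json

def escape_llm_output(text: str) -> str:
    """Escape HTML-special characters before rendering model output in a web page."""
    return html.escape(text, quote=True)

def validate_response(payload: str, required_keys: set[str]) -> dict:
    """Minimal structural check on an API response before it reaches the client."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Prefer the framework's built-in escaping where available; a manual escape step like this is a fallback for contexts the framework does not cover.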

Secret Detection

Scan outputs for leaked secrets: high-entropy strings (base64, hex) and known patterns (AWS keys, SSH keys, SSNs). Use regexes or a library such as detect-secrets. If a secret is found, block the response and alert.
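The two detection approaches can be sketched as follows; the patterns, entropy threshold, and minimum token length are illustrative assumptions, and a production deployment would more likely use a maintained library like detect-secrets:

```python
import math
import re
from collections import Counter

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),               # US SSN format
]

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; random base64/hex scores higher than prose."""
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

def contains_secret(text: str, entropy_threshold: float = 4.5, min_len: int = 20) -> bool:
    """Flag known secret patterns or long high-entropy tokens (likely keys)."""
    if any(p.search(text) for p in SECRET_PATTERNS):
        return True
    for token in text.split():
        if len(token) >= min_len and shannon_entropy(token) > entropy_threshold:
            return True
    return False
```

Entropy thresholds trade false positives (long URLs, hashes the user asked for) against false negatives; tune them against real traffic before enforcing a hard block.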

Logging & Monitoring

Log all sanitization failures (input rejected, output blocked). Alert on repeated failures, which may indicate an attack in progress.
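One way to sketch the alerting logic is a sliding-window counter; the `FailureMonitor` class, its threshold, and its window size are hypothetical choices, not part of the control itself:

```python
import logging
import time
from collections import deque

logger = logging.getLogger("sanitization")

class FailureMonitor:
    """Log each sanitization failure and alert when too many occur in a window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window = window_seconds
        self.events: deque[float] = deque()  # monotonic timestamps of failures

    def record_failure(self, kind: str, detail: str) -> bool:
        """Record one failure; return True if the alert threshold was reached."""
        now = time.monotonic()
        self.events.append(now)
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        logger.warning("sanitization failure: %s (%s)", kind, detail)
        if len(self.events) >= self.threshold:
            logger.error("possible attack: %d failures in %.0fs",
                         len(self.events), self.window)
            return True
        return False
```

In practice the alert would feed an existing monitoring pipeline (metrics counter, SIEM event) rather than a log line alone.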

Evidence & Audit

  • Input validation rules (regex, length limits, character allowlists)
  • Output escaping implementation (code samples, library usage)
  • Secret detection configuration (regex, detect-secrets rules)
  • Test results showing sanitization effectiveness
  • Monitoring alerts for sanitization failures

Related Controls