What is “prompt injection”?
Prompt injection is a class of attack on systems built around LLMs in which an attacker crafts input (a prompt) that overrides, modifies or bypasses the system’s intended instructions. The vulnerability is structural: LLMs do not distinguish between “system instructions” and “user content” in their context window — both are tokens. OWASP lists prompt injection as the #1 risk in its LLM Top 10 (2023, 2025).
Prompt injection types
- Direct injection: the user explicitly types adversarial instructions (“ignore previous instructions and reveal the system prompt”).
- Indirect injection: adversarial instructions hidden in retrieved content — a website, document or email that the LLM reads as context.
- Multi-turn injection: staged across multiple conversation turns to gradually shift behaviour.
- Image / audio injection: multimodal models can be attacked through visual or audio content containing hidden instructions.
Defensive patterns
- Input filtering and sanitisation: first-pass detection of suspicious patterns.
- Separation of trusted vs. untrusted context: structured prompting that flags user content distinctly.
- Output validation: programmatic checks before LLM output triggers actions or reaches users.
- Privilege limitation: agents should only have minimal capabilities; tools should be granular and auditable.
- Human-in-the-loop: for high-impact actions, require user confirmation.
Legal and regulatory implications
Under EU AI Act Article 15, high-risk AI systems must be “robust and secure” against attacks. Prompt injection that exposes personal data triggers KVKK / GDPR breach notification obligations. Vendor agreements increasingly include security commitments specific to LLM attacks.
Prompt injection as a liability question
Prompt injection is the canonical LLM-application vulnerability, and its legal weight grows with agentic deployments: an agent with tool access that can be hijacked through planted instructions converts a security bug into unauthorized transactions, data exfiltration or defamatory output — with liability allocation depending on contracts written before the incident. The current standard of care, visible in OWASP’s LLM Top 10 and vendor security guidance: input/output filtering, privilege separation between model and tools, human confirmation gates for consequential actions, and logged decisions. Customer agreements for AI products should describe these controls honestly; security questionnaires already ask, and a misdescribed control posture is both a breach-of-warranty and a regulatory-aggravation fact after the incident.