May 30, 2026

Prompt Injection

🇹🇷Türk hukuk bağlamı arıyorsanız bu kavramın Türkçe versiyonu:Prompt Injection Saldırısı →

What is “prompt injection”?

Prompt injection is a class of attack on systems built around LLMs in which an attacker crafts input (a prompt) that overrides, modifies or bypasses the system’s intended instructions. The vulnerability is structural: LLMs do not distinguish between “system instructions” and “user content” in their context window — both are tokens. OWASP lists prompt injection as the #1 risk in its LLM Top 10 (2023, 2025).

Prompt injection types

Direct injection: the user explicitly types adversarial instructions (“ignore previous instructions and reveal the system prompt”).
Indirect injection: adversarial instructions hidden in retrieved content — a website, document or email that the LLM reads as context.
Multi-turn injection: staged across multiple conversation turns to gradually shift behaviour.
Image / audio injection: multimodal models can be attacked through visual or audio content containing hidden instructions.

Defensive patterns

Input filtering and sanitisation: first-pass detection of suspicious patterns.
Separation of trusted vs. untrusted context: structured prompting that flags user content distinctly.
Output validation: programmatic checks before LLM output triggers actions or reaches users.
Privilege limitation: agents should only have minimal capabilities; tools should be granular and auditable.
Human-in-the-loop: for high-impact actions, require user confirmation.

Legal and regulatory implications

Under EU AI Act Article 15, high-risk AI systems must be “robust and secure” against attacks. Prompt injection that exposes personal data triggers KVKK / GDPR breach notification obligations. Vendor agreements increasingly include security commitments specific to LLM attacks.

Prompt injection as a liability question

Prompt injection is the canonical LLM-application vulnerability, and its legal weight grows with agentic deployments: an agent with tool access that can be hijacked through planted instructions converts a security bug into unauthorized transactions, data exfiltration or defamatory output — with liability allocation depending on contracts written before the incident. The current standard of care, visible in OWASP’s LLM Top 10 and vendor security guidance: input/output filtering, privilege separation between model and tools, human confirmation gates for consequential actions, and logged decisions. Customer agreements for AI products should describe these controls honestly; security questionnaires already ask, and a misdescribed control posture is both a breach-of-warranty and a regulatory-aggravation fact after the incident.