LLM01: Prompt Injection
An attacker manipulates an LLM's behaviour by inserting instructions into the prompt — directly ('ignore previous instructions') or indirectly (a malicious instruction planted in a webpage, document, or RAG corpus the LLM reads). Traditional input validation fails because LLMs are trained to follow natural-language instructions. Defenses: separate system and user prompts using structured APIs, sanitize and filter inputs, treat external content as untrusted, constrain agent tool access.
LLM02: Insecure Output Handling
An application passes LLM output to a downstream system without sanitization. Rendering as HTML → XSS. Passing as SQL → SQL injection. Executing as shell → RCE. Sending as email → spam/phishing. Defense: treat every LLM output as untrusted user input. Encode, parameterise, validate against an allow-list before downstream use.
LLM03: Training Data Poisoning
An attacker introduces malicious content into the dataset used to train or fine-tune an LLM, causing backdoor behaviour on specific triggers. Most production apps using OpenAI / Anthropic / Google API models inherit base-model defenses, but the risk applies to your fine-tuning corpus and your RAG sources. Defense: verify training data provenance, isolate fine-tuning sets, monitor outputs for trigger-based anomalies.
LLM04: Model Denial of Service
An attacker submits prompts that consume disproportionate compute, hitting context-window limits, triggering loops, or running up an unbounded bill. Defense: per-user prompt rate limits, max-token enforcement, timeout budgets per request, billing alerts on cost-per-user anomalies.
LLM05: Supply Chain Vulnerabilities
Risk in the LLM dependency tree — frameworks (LangChain, LlamaIndex), vector DB clients, embedding libraries, model artifacts. A compromised library shipped through normal package management can exfiltrate prompts or steer outputs. Defense: pin model versions, audit dependency tree, subscribe to provider security advisories, use SBOM for AI pipelines.
LLM06: Sensitive Information Disclosure
The LLM reveals secrets, PII, training data, or system prompts because they were included in the prompt or fine-tuning. Defense: never send secrets into the prompt unless absolutely necessary, redact at the boundary, classify RAG documents and gate retrieval by user permission, audit logs for accidental disclosure.
LLM07: Insecure Plugin Design
A plugin or tool the LLM can invoke has weak input validation, exposes admin-level operations, or trusts LLM-generated parameters without verification. Defense: tight tool input schemas, server-side validation of every tool invocation, no admin operations exposed to LLM-callable surface, audit every tool call.
LLM08: Excessive Agency
An LLM agent has more permissions, tools, or autonomy than the task requires. When (not if) it is prompt-injected, the blast radius is the full permission set. Defense: least-privilege tool access, human-in-the-loop for high-risk actions, scoped credentials per invocation, dry-run mode for destructive operations, default-deny on unrecognised intents.
LLM09: Overreliance
Users or downstream systems trust LLM output as authoritative when it should be reviewed. Hallucinations enter business logic, code, medical advice, legal text. Defense: human-in-the-loop checkpoints for high-stakes outputs, display confidence + provenance, never present LLM output without traceable sourcing for regulated domains.
LLM10: Model Theft
An attacker exfiltrates a proprietary model — weights, fine-tuning data, system prompts — through API abuse, side-channel attacks, or insider access. Defense: rate-limit + monitor for distillation patterns, segment model artifacts by access tier, encrypt at rest, audit every model-export action, watermark outputs where feasible.
How to test an LLM application for these risks
Testing LLM applications is hybrid manual + automated. Manual red-team explores novel prompt vectors — role-play, multi-turn escalation, indirect injection via documents the agent reads. Automated probes — Garak, PyRIT, Bachao.AI's LLM probe suite — run thousands of variations against your endpoint and report regressions on every deploy. Coverage should span OWASP LLM Top 10, MITRE ATLAS, and AI Village's indirect-prompt-injection corpus. Treat findings as standard P1/P2 security bugs, with a defined SLA for remediation.