Five documents. Ninety percent. The numbers sound like hype until you see the papers. Researchers have shown that injecting as few as five adversarial documents into a RAG knowledge base—one per target question—can push attack success rates past 90% when the system retrieves the top five chunks. That’s not against a toy corpus. It holds across benchmarks like NQ and MS-MARCO, with retrievers like Contriever, DPR, and ANCE, and against models from GPT-4o to LLaMA2 and DeepSeek. The takeaway isn’t that RAG is broken. It’s that retrieval has become the main lever: control what gets into the top-k, and you usually control what the model says. OWASP’s 2025 Top 10 for LLM Applications now codifies this under LLM08: Vector and Embedding Weaknesses. If your threat model still treats the vector store as a “safe” layer between raw text and the LLM, it’s time to change that.
Retrieval is the bottleneck, not the corpus
RAG has two stages: retrieve, then generate. Security teams often focus on the generator—prompt injection, output validation, jailbreaks. The retriever is treated as a relevance engine. You index documents, embed queries, and pull the nearest neighbors. The implicit assumption is that the retriever is neutral: it just finds “relevant” content. Relevance is defined by embedding similarity. If an attacker can craft documents that are both highly similar to the target query and contain the answer or behavior they want, they don’t need to touch the model. They only need to win the ranking game.
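The ranking game can be made concrete with a toy retriever. The bag-of-words "embedding" below is a stand-in for a real dense encoder (Contriever, DPR, and the like); the corpus, queries, and poisoned passage are all invented for illustration. The mechanics are the point: a passage crafted to mirror the query's wording outscores genuine documents and enters the top-k, without any access to the model.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real retriever would produce a
    # dense vector, but the ranking logic downstream is the same.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, corpus, k=5):
    # Rank documents by embedding similarity to the query and keep k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(corpus[d])), reverse=True)
    return ranked[:k]

corpus = {
    "d1": "The capital of France is Paris, a major European city.",
    "d2": "Paris hosts the Louvre and the Eiffel Tower.",
    "d3": "France borders Spain, Germany, and Italy.",
}
query = "What is the capital of France?"
baseline = top_k(query, corpus, k=2)  # genuine documents win

# A poisoned passage that echoes the query's exact wording ranks above
# every genuine document while carrying the attacker's payload.
corpus["poison"] = "What is the capital of France? The capital of France is Lyon."
after = top_k(query, corpus, k=2)
```

Real attacks optimize against the actual retriever rather than echoing the query verbatim, but the objective is identical: maximize similarity to the target query while carrying the payload.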
That’s exactly what recent poisoning work exploits. In PoisonedRAG (USENIX Security 2025) and the follow-on CPA-RAG framework, the attack is formalized as three conditions. The poisoned document must be retrieved (show up in top-k). It must drive the generation (the LLM produces the attacker’s target answer when that document is in context). And it must be concealed—fluent, non-repetitive, and resilient to defenses like perplexity filtering and duplicate-text detection. Early attacks failed on the third condition: gradient-based or template-heavy texts were easy to spot. CPA-RAG and similar methods use prompt-based generation across multiple LLMs plus retriever-guided optimization so that the poisoned passages look like normal, query-relevant content. They’re not obviously adversarial. They just happen to nudge the model toward a chosen output.
So the “five documents” result isn’t magic. It’s the default setting in the experiments: five adversarial passages per target query, each tuned for retrieval and generation. Under top-5 retrieval, at least one of those five is usually in the context window. If that one passage is persuasive enough, the model follows it. The attack success rate is then simply the frequency with which the poisoned content both gets retrieved and dominates the answer. On NQ with k=5, CPA-RAG reaches 92% ASR in a black-box setting—no access to model weights or retriever parameters—and still holds an ~5 percentage point edge over prior black-box baselines when defenses are applied. The same style of attack has been demonstrated against a commercial RAG system on Alibaba’s BaiLian platform. This isn’t lab-only.
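The ASR metric itself is simple to state. A minimal sketch, with invented field names for the per-query trial records: success requires both conditions at once, retrieval into the context and control of the answer.

```python
def attack_success_rate(trials):
    """ASR as the poisoning papers report it: the fraction of target
    queries where a poisoned passage made it into the retrieved context
    AND the model emitted the attacker's chosen answer."""
    hits = sum(
        1 for t in trials
        if t["poison_in_top_k"] and t["model_answer"] == t["target_answer"]
    )
    return hits / len(trials)

# Invented trial data: 9 of 10 target queries succeed on both conditions.
trials = (
    [{"poison_in_top_k": True, "model_answer": "Lyon", "target_answer": "Lyon"}] * 9
    + [{"poison_in_top_k": False, "model_answer": "Paris", "target_answer": "Lyon"}]
)
asr = attack_success_rate(trials)
```

A trial where the poison is retrieved but the model ignores it, or where the model would have given the target answer anyway without the poison in context, counts against the attacker in the stricter variants of this metric.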
Why “we only use embeddings” doesn’t protect you
A common reassurance in RAG design is that the system “only” stores vector embeddings, not raw text—as if that were a security boundary. Two things undermine that.
First, retrieval manipulation doesn’t require inverting embeddings. The attacker isn’t trying to reconstruct your documents from vectors. They’re injecting their documents (or influencing what gets into the corpus) so that their content embeds close to the queries they care about. The embedding space is the battlefield. Whoever controls the text that gets embedded and indexed can shape what the retriever returns. So the “safe proxy” idea—that vectors are opaque and therefore harmless—applies to a different threat (embedding inversion). It doesn’t apply to poisoning. For poisoning, the vector store is exactly where the fight happens: poisoned text is embedded like any other; it just happens to be optimized to rank highly for chosen queries and to carry a payload that steers the LLM.
Second, OWASP LLM08 explicitly calls out both data poisoning and embedding inversion under Vector and Embedding Weaknesses. Even if your immediate concern is “someone poisons our corpus,” the same control surface—how vectors are generated, stored, and retrieved—is where the framework expects you to look. Inadequate access controls, multi-tenant leakage, and poisoned or manipulated data in the knowledge base all sit under LLM08. Treating the retriever as “just search” and the vector DB as “not sensitive” undercounts the risk.
What actually works (and what doesn’t)
Perplexity filtering and duplicate-text removal are the usual first-line defenses. They help against crude, repetitive, or obviously unnatural poison content. They’re not enough. CPA-RAG and similar methods are explicitly designed to produce fluent, diverse passages that pass those checks. In the reported experiments, attack success stays high even when those defenses are on. You need to think beyond “filter weird-looking chunks.”
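To see why duplicate-text removal alone falls short, here's a minimal near-duplicate check using word-shingle Jaccard similarity, one common approach (the passages are invented). A copy-pasted template is caught immediately; two fluent paraphrases of the same false claim share almost no shingles and sail through.

```python
def shingles(text, n=3):
    # Overlapping n-word windows; the usual unit for near-dup detection.
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicate(t1, t2, threshold=0.5):
    return jaccard(shingles(t1), shingles(t2)) >= threshold

# Crude template spam: trivially flagged.
template = "click here claim your prize now click here claim your prize now"

# Two fluent paraphrases of the same false claim: no shared shingles,
# so a duplicate filter sees two unrelated passages.
poison_a = "The capital of France was moved to Lyon in 2024."
poison_b = "Official records now list Lyon as the French capital."
```

The same asymmetry holds for perplexity filtering: LLM-generated poison is, by construction, low-perplexity text.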
Ingestion and provenance. If the corpus can be edited by outsiders (public wikis, user uploads, third-party feeds), treat every ingested document as untrusted. Validate structure and content before embedding; consider integrity checks or signing for high-assurance sources. That won’t stop a determined insider or a compromised pipeline, but it raises the bar for mass injection.
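One piece of the provenance story can be sketched with a stdlib HMAC gate at the embedding step. The key handling and function names here are hypothetical; the point is the shape of the control: documents from a high-assurance source carry a signature, and anything that fails verification never reaches the index.

```python
import hashlib
import hmac

# Hypothetical per-source key; in practice this lives in a secret
# manager, not in code.
SIGNING_KEY = b"example-per-source-key"

def sign_document(doc_bytes, key):
    """Producer side: attach an HMAC-SHA256 tag when publishing a doc."""
    return hmac.new(key, doc_bytes, hashlib.sha256).hexdigest()

def verify_before_embedding(doc_bytes, signature, key):
    """Ingestion side: refuse to embed anything whose tag doesn't verify.
    compare_digest avoids timing side channels on the comparison."""
    expected = hmac.new(key, doc_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

doc = b"Quarterly security policy, revision 7."
sig = sign_document(doc, SIGNING_KEY)
```

This only authenticates the channel, not the content: a trusted source that is itself compromised will sign poison happily. It closes the mass-injection path, not the insider path.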
Retrieval-stage defenses. Research on retrieval-stage defenses (e.g., partitioning or masking suspicious tokens, attention-based filters) shows they can reduce attack success, but adaptive attackers can still reach on the order of 35% success in some settings. Mitigations, not silver bullets. Deploy them as part of a stack, and monitor retrieval logs for patterns (e.g., the same few chunks dominating for many distinct queries).
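The monitoring idea is cheap to prototype. A sketch, assuming your retrieval log can be reduced to (query, top-k chunk IDs) pairs: a chunk that reaches the top-k for a large share of distinct queries is exactly the signature of a passage optimized to rank everywhere, and worth a human look.

```python
from collections import defaultdict

def dominant_chunks(retrieval_log, query_share=0.2):
    """Flag chunk IDs that appear in the top-k for more than
    `query_share` of distinct queries. retrieval_log is a list of
    (query, top_k_chunk_ids) pairs; the threshold is a tuning knob."""
    seen_for = defaultdict(set)
    for query, chunk_ids in retrieval_log:
        for chunk_id in chunk_ids:
            seen_for[chunk_id].add(query)
    total_queries = len({query for query, _ in retrieval_log})
    return sorted(
        chunk_id
        for chunk_id, queries in seen_for.items()
        if len(queries) / total_queries > query_share
    )

# Invented log: "p1" dominates 6 of 10 distinct queries; each "d{i}"
# appears for only one.
log = [(f"q{i}", (["p1"] if i < 6 else []) + [f"d{i}"]) for i in range(10)]
```

Legitimate hub documents (glossaries, FAQs) will also trip this, so treat hits as review candidates, not automatic removals.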
Access control and isolation. LLM08’s mitigations emphasize fine-grained access control and logical partitioning of the vector store. In multi-tenant RAG, one tenant must not be able to query or influence another tenant’s vectors. Same for different applications or trust boundaries sharing one index. Limit who can write to the knowledge base and who can read embeddings or run similarity search. Audit retrieval and ingestion.
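The key design choice for isolation is that the tenant filter is enforced inside the index, derived from the authenticated caller, never accepted from a client-supplied filter. A minimal sketch (the class and its similarity scoring are invented; a production system would push the same predicate into the vector DB's metadata filtering):

```python
class TenantScopedIndex:
    """Toy vector store enforcing LLM08-style logical partitioning:
    every read and write is scoped to the authenticated caller's tenant."""

    def __init__(self):
        self._docs = []  # (tenant_id, doc_id, text)

    def add(self, caller_tenant, doc_id, text):
        # Writes land in the caller's own partition; there is no API
        # surface for writing into someone else's.
        self._docs.append((caller_tenant, doc_id, text))

    def search(self, caller_tenant, query, k=5):
        # The tenant predicate is applied server-side, BEFORE scoring,
        # so cross-tenant vectors never enter the candidate set.
        words = set(query.lower().split())
        candidates = [
            (doc_id, len(words & set(text.lower().split())))
            for tenant, doc_id, text in self._docs
            if tenant == caller_tenant
        ]
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        return [doc_id for doc_id, _ in candidates[:k]]

index = TenantScopedIndex()
index.add("tenant_a", "a1", "internal pricing sheet for product launch")
index.add("tenant_b", "b1", "internal pricing sheet for product launch")
```

The failure mode this prevents is the common one: a shared index where isolation is a `filter` parameter the client is trusted to send.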
Don’t over-trust retrieved context. Architecturally, the generator is trained to rely on retrieved context. That’s by design. But you can still add checks: validate or re-rank chunks before they’re sent to the LLM, cap the influence of a single document, or require multiple sources for sensitive claims. None of this is free (latency, complexity), but for high-stakes domains it’s part of the tradeoff.
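Two of those checks, capping a single source's influence and requiring corroboration, fit in a few lines. A sketch, assuming retrieved chunks carry a source identifier (field names invented): at most `max_per_source` chunks from any one source reach the prompt, and the caller learns whether enough distinct sources agree to treat the answer as corroborated.

```python
from collections import defaultdict

def build_context(retrieved, max_per_source=2, min_sources=2):
    """Cap per-source influence on the prompt and report whether the
    surviving context draws on enough distinct sources. `retrieved` is
    a ranked list of {"source": ..., "text": ...} chunks."""
    by_source = defaultdict(list)
    for chunk in retrieved:
        if len(by_source[chunk["source"]]) < max_per_source:
            by_source[chunk["source"]].append(chunk)
    kept = [c for chunks in by_source.values() for c in chunks]
    corroborated = len(by_source) >= min_sources
    return kept, corroborated

# Five chunks from one source: only two survive, and the context is
# flagged as uncorroborated for sensitive claims.
flooded = [{"source": "wiki", "text": f"chunk {i}"} for i in range(5)]
mixed = [
    {"source": "wiki", "text": "claim A"},
    {"source": "internal-docs", "text": "claim A restated"},
]
```

This directly blunts the five-documents attack: even if all five poisoned passages share a source and reach the top-k, only a capped number make it into the prompt, and the answer is marked as single-sourced.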
LLM08 in practice
LLM08:2025 Vector and Embedding Weaknesses is the formal umbrella. It groups unauthorized access and data leakage, cross-tenant and federation conflicts, embedding inversion, data poisoning, and behavior alteration under one risk category. For RAG, the practical entries are: poisoning (malicious or biased documents in the index), inversion (recovering source text from vectors), and access/leakage (who can read or write the store). The “five poisoned documents at 90%” result is a concrete instance of the poisoning leg. It shows that a small, well-crafted set of documents can manipulate outputs at scale. When you map your controls to the OWASP list, LLM08 should drive at least: (1) strict ingestion and provenance for anything that gets embedded, (2) access control and tenant isolation for the vector store and any embedding API, and (3) monitoring and testing for retrieval manipulation—e.g., red-team style queries to see if injected or borderline content can dominate answers.
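The red-team leg of that mapping can be as simple as a canary probe: inject a deliberately borderline document, then measure how often it reaches the top-k for a set of target queries. A sketch, where `search` is whatever function fronts your retriever (its signature here is an assumption):

```python
def retrieval_canary_rate(search, canary_id, target_queries, k=5):
    """Red-team probe: the fraction of target queries for which an
    injected canary document reaches the top-k. A high rate means a
    single crafted passage can dominate retrieval for that query set."""
    wins = sum(1 for q in target_queries if canary_id in search(q, k))
    return wins / len(target_queries)
```

Run it per release of the ingestion pipeline, with canaries styled after the fluent poison in the literature rather than obvious spam; a rate that climbs over time is an early signal that ranking is getting easier to game.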
Bottom line
Five poisoned documents can manipulate your RAG system most of the time. Not because the model is weak. Because retrieval is a single point of control. Optimize for retrieval and generation together, keep the poison fluent, and the rest is engineering. Vector embeddings are not a safe proxy for “we don’t expose raw text”—they’re the mechanism by which retrieval is decided. OWASP LLM08 is the place where that reality shows up in the standard risk list. Plan for retrieval manipulation the same way you plan for prompt injection: assume the corpus and the vector store are in scope, and design so that a handful of adversarial passages can’t own the top-k.
Assessing RAG and vector store security? We do AI system risk reviews and retrieval architecture assessments. Get in touch.