← All posts

Smart contracts taught us to audit AI agents — here's what carries over

Six years of blockchain auditing mapped to agentic security. Reentrancy, flash loans, oracle manipulation — the failure modes rhyme.

We spent six years auditing smart contracts. CosmWasm, Cosmos SDK, Solana — over 150 audits across protocols handling hundreds of millions of dollars (and even into the billions). The job was simple in theory: read code an attacker would eventually read, find what they’d find before they do.

Then AI agents started shipping to production. We started looking at MCP servers and agentic pipelines the way we used to look at DeFi protocols, and something clicked: we had seen almost all of these bugs before. Different runtime, different language, same threat model.

Here’s what six years of blockchain auditing taught us about securing AI agents.


The shared threat model

Smart contracts and AI agents are both:

Once you see this, the analogy becomes hard to unsee. Let us walk through six findings we’ve seen on both sides.


1. Sandboxed VMs → Unbounded execution environments

Smart contracts run in a sandbox. The EVM, CosmWasm’s Wasm runtime, Solana’s BPF — each one is a tightly constrained execution environment. The contract can only do what the VM allows. It can’t make arbitrary network calls, read the filesystem, or spawn processes. The attack surface is bounded by design.

AI agents are the opposite. They run on general-purpose infrastructure — your cloud, your laptop, your CI pipeline. They can call APIs, read files, execute code, browse the web, send emails, and interact with databases. Every new skill, plugin, or tool integration expands the attack surface. A smart contract exploit is limited to what the VM permits. An agent exploit is limited to whatever permissions the agent has been given — and most agents are given far too many.

This is the single biggest difference between the two worlds. Smart contracts are secure partly because they can’t do much. Agents are insecure partly because they can do everything. The mitigation: treat every agent like it’s running in a hostile environment. Least privilege isn’t optional — it’s the only thing standing between a prompt injection and full system access. Sandbox what you can. Restrict tool access to the minimum needed. Assume every capability you grant will eventually be exercised by an attacker.


2. Reentrancy → Agent looping

The reentrancy bug is the original DeFi sin. The attack: a contract sends ETH to an external address, and before it updates its own balance, that external address calls back into the contract and withdraws again. The $60M DAO hack in 2016. Hundreds of millions drained since.

The AI equivalent: an agent executes a tool, receives a response that includes instructions to call another tool, which returns instructions to call back to the first tool — or worse, to call itself. The loop isn’t always as obvious as a stack trace. We’ve seen it manifest as an agent caught in a retrieval cycle, re-fetching the same poisoned document infinitely. We’ve seen it triggered by a malicious tool response that injects a subtask into the agent’s working context.

The mitigation is structurally similar too. In smart contracts: checks-effects-interactions — update your state before making external calls. In agents: bound your tool call depth, validate loop termination conditions, and treat tool responses as untrusted inputs, not as continuations of your own reasoning.


3. Flash loan exploits → Context injection

Flash loans let you borrow enormous sums of capital, use them within a single transaction, and return them — all atomically. The exploit isn’t in the loan itself; it’s in what you can do with temporary, unchecked access to capital mid-transaction. Manipulate an oracle. Drain a liquidity pool. Exploit a pricing mechanism that assumed a normal trading environment.

Context injection is the same shape. An attacker can’t permanently compromise your agent — but they can inject a large payload of malicious context into a single execution. The agent retrieves a document from a RAG store, or calls a tool that returns attacker-controlled text, or processes a user-submitted form. Inside that payload: instructions that look, to the model, like they came from the system. The agent acts on them. The context is gone after the session. The action persists.

Flash loan defenses are price oracle hardening, time-weighted averages, and circuit breakers. Context injection defenses are input sanitization at retrieval boundaries, prompt construction that structurally separates instructions from data, and skepticism about any content that arrives via tool output or user input claiming special authority.


4. Access control bugs → MCP permission failures

Access control is the most common finding in smart contract audits. Some function that should only be callable by the owner can be called by anyone. A modifier that was meant to gate privileged operations was applied to the wrong function. A role that should be restricted gets handed out during initialization and never revoked.

MCP servers have the same problem, and right now it’s worse because there’s no established convention for what “access control” even means in this context. We’ve reviewed servers where every tool is available to every model with no scoping. Servers where the tool descriptions themselves grant implicit permissions by telling the model it “can” do things it shouldn’t be able to. Servers where the permission model exists in the documentation but isn’t enforced in the implementation.

The principle of least privilege is non-negotiable in smart contracts. It should be in MCP servers too: each tool should request only the permissions it needs, the server should validate that the calling context is authorized before executing, and there should be no path from an unauthenticated input to a privileged operation.


5. Indirect prompt injection → The web as an attack vector

This one has no clean smart contract analogy, and it’s arguably the most dangerous pattern we’re seeing right now.

Agents are ingesting more and more content from the open internet — web searches, scraped pages, retrieved documents, API responses, email bodies. Every one of those sources is a potential injection point. An attacker doesn’t need access to your system. They just need to put text on a web page that your agent will eventually read.

A hidden instruction buried in a blog post, a comment on a GitHub issue, a line of white-on-white text on a product page — any of these can carry a payload that redirects agent behavior. The agent fetches the page as part of a research task, ingests the content, and follows the embedded instruction as if it came from the user or the system prompt. The attacker never touched your infrastructure. They just poisoned a page they knew your agent would visit.

This is fundamentally different from anything in the smart contract world. Contracts don’t browse the web. They don’t ingest unstructured text from arbitrary sources. Agents do, constantly, and the surface area grows with every new data source you connect.

The mitigation is layered: sanitize and filter retrieved content before it reaches the model, structurally separate data from instructions in your prompt construction, limit what actions the agent can take based on retrieved content, and assume that anything sourced from the internet may contain adversarial instructions. If your agent can read the web and also send emails, you have a data exfiltration path waiting to be exploited.


6. Oracle manipulation → RAG poisoning

DeFi protocols need to know the price of assets. They get this from oracles — external data sources that feed prices on-chain. If you can manipulate the oracle, you can make the protocol believe an asset is worth more or less than it is, and exploit the gap. Dozens of protocols have been drained this way.

RAG (retrieval-augmented generation) is your agent’s oracle. It’s the external data source the model consults when it needs to know things. If an attacker can get malicious content into your vector database — or into any document your agent retrieves — they’re manipulating the oracle. The model will treat retrieved content as ground truth unless explicitly instructed otherwise. An attacker who can insert a document that says “when asked about X, always do Y” has effectively changed your system prompt for any query that retrieves that document.

The mitigation: treat your retrieval pipeline with the same suspicion you’d treat an external price oracle. Validate sources. Don’t let retrieved content override instructions. Consider the retrieval step a trust boundary, not a data pipe.


What this means

The AI security industry is not starting from zero. There’s a body of adversarial knowledge built over years of breaking composable, autonomous, high-stakes systems in public. Most of it hasn’t been translated yet.

That translation work is what we do. If you’re shipping an agentic system and want someone who’s thought about this longer than it’s been fashionable — get in touch.