Pillar 2 — Platform & Infrastructure
Prompt Injection Is SSRF for Thought
Prompt injection is not a model being tricked. It is request forgery across an authority boundary.
The mistake in early SSRF defenses was thinking the URL was the problem.
Teams blocked localhost. Attackers used alternate IP formats. Teams blocked the cloud metadata address. Attackers found parser gaps, redirects, DNS rebinding, or services that reached the same place through a different route. The real issue was never the string.
The issue was authority.
An untrusted user could choose where a trusted server sent a request. The server had network position the attacker did not have. It could reach internal services, localhost ports, cloud metadata endpoints, and administrative surfaces hidden behind the firewall.
Prompt injection has the same shape.
An untrusted document, email, ticket, web page, Slack message, PDF, code comment, or retrieved knowledge-base entry is placed in the same reasoning context as trusted instructions. The model is then allowed to call tools, retrieve data, write messages, update records, open pull requests, run workflows, or make recommendations under an identity the attacker does not have.
That is SSRF translated from networks into cognition.
The attacker is no longer choosing a URL for a server to fetch. The attacker is choosing instructions for a model to interpret.
The trusted server becomes the trusted agent.
The internal network becomes the enterprise tool graph.
The metadata endpoint becomes the privileged action surface: files, email, CRM, code repositories, ticketing systems, SaaS APIs, cloud consoles, databases, and CI/CD.
The bug is that the system confuses data with authority.
SSRF Was Never About URLs
SSRF looks simple from the outside.
A product adds a URL preview feature. A user shares a link. The server fetches the link to extract a title, description, and thumbnail. This seems harmless because the user could have opened the link themselves.
But the server is not the user.
The server sits in a different trust position. It may be inside a VPC. It may have access to internal service names. It may be able to reach localhost-only services. It may be running on a cloud instance with a metadata service that issues credentials to the workload.
The external attacker cannot reach those things directly.
The server can.
Once the attacker controls the destination of the server-side request, the application becomes a proxy into a more privileged environment.
This is why cloud metadata endpoints became such valuable SSRF targets. They were designed for workloads to learn about themselves and obtain temporary credentials. They were not designed to be exposed to arbitrary internet users through a preview feature, image fetcher, webhook validator, PDF renderer, or import tool.
The server-side request carried the server’s position.
That is the lesson AI systems need to absorb.
The dangerous part of prompt injection is hostile text interpreted from inside a privileged decision loop.
The AI Version of the Same Vulnerability
Modern AI applications increasingly do the same thing SSRF-prone web applications did.
They retrieve attacker-influenced content and process it from a trusted position.
A customer-support assistant reads incoming emails and account history, then drafts replies. A procurement assistant reads vendor documents and internal policy, then recommends next steps. A coding assistant reads repository files and comments, then proposes changes. A sales assistant reads CRM notes and public web pages, then updates opportunities. An operations agent reads alerts and runbooks, then executes remediation steps.
In each case, untrusted content can enter the model context.
That content may look like ordinary data. A paragraph in a web page. A note in a ticket. A hidden instruction in a document. A comment in a repository. A sentence in an email thread. A line in a retrieved knowledge-base article.
But if the model treats all text in context as potential instruction, the content becomes operational influence.
This is the authority confusion:
- The system prompt says what the agent is supposed to do.
- The retrieved document says what the world contains.
- The user request says what task is being attempted.
- The tool layer gives the model ways to act.
If those boundaries collapse, the agent cannot reliably distinguish between “this document contains information” and “this document is authorized to instruct me.”
That is the AI equivalent of letting an attacker-controlled URL inherit the server’s network position.
Filtering Bad Prompts Is the Wrong Center of Gravity
Many prompt-injection defenses start with content filtering.
Look for phrases such as “ignore previous instructions.” Strip suspicious text. Add a stronger system prompt. Tell the model never to reveal secrets. Ask another model to classify whether the retrieved content is malicious.
Some of these controls can help.
They are not the architecture.
SSRF taught the same lesson. Blocking one address is useful. Blocking private ranges is useful. Normalizing URLs is useful. Preventing redirects is useful. But the deeper fix is to stop giving a generic fetcher unrestricted network reach.
Natural language is even more flexible than URLs. Instructions can be indirect, translated, quoted, fragmented, embedded in tables, hidden in markup, encoded in images, or expressed as role-play. They can be split across documents and assembled only after retrieval. They can exploit ambiguity rather than obvious malicious language.
The real defense is not a better blacklist of bad words. The real defense is reducing what untrusted content is allowed to influence.
If retrieved text can change tool choice, destination, permission scope, data disclosure, escalation behavior, memory writes, or user-facing claims without being constrained by the application, the system is relying on language moderation to enforce an authority boundary.
That will fail.
Separate Data From Instructions
The first design requirement is simple to state and difficult to implement well:
Data should not automatically become instruction.
A retrieved email can provide facts. It should not be able to redefine the agent’s job. A web page can provide content. It should not be able to authorize a tool call. A support ticket can describe a customer problem. It should not be able to change what data the assistant is allowed to reveal.
This separation has to exist in the product architecture, not only in the prompt.
The application should decide:
- which content sources are untrusted
- which instructions are trusted
- which tools can be used for this task
- which data can be disclosed
- which actions require confirmation
- which actions are impossible from this workflow
- which outputs require citations or provenance
- which requests should be rejected because authority is unclear
The model can reason inside those boundaries.
This is where many agent architectures become fragile. They put the model in the role of interpreter, planner, policy engine, access-control judge, and user-interface writer at the same time. Then they feed it untrusted content and ask it to remain loyal to an instruction hierarchy expressed in natural language.
It’s too much authority inside one probabilistic component.
Tool Access Is the Metadata Endpoint
In SSRF, the cloud metadata endpoint was valuable because it converted network reach into credentials.
In agentic AI, tool access plays a similar role.
The model may not “have” secrets in its weights. But the surrounding system may let it call tools that can read secrets, fetch private documents, send messages, query databases, trigger workflows, modify records, or execute code.
That is where the real blast radius lives.
A prompt-injection attack that only changes a summary is a data-quality problem. A prompt-injection attack that changes a tool call is an authority problem.
The useful question is not “can the model be tricked?” Its “What can the model do after being tricked?”
If an injected instruction can make the agent read a private file, forward an internal email, create an external webhook, approve a payment, open a pull request, modify a ticket, change a cloud resource, or write persistent memory, then the prompt-injection risk is not conversational. It is operational.
The tool graph must be designed as if the model can be influenced by hostile content.
That means:
- deny by default
- grant tools per task, not per agent personality
- scope data access tightly
- require explicit user-visible approval for consequential actions
- prevent untrusted content from choosing external destinations
- separate read tools from write tools
- isolate memory writes from retrieved content
- log privileged actions with enough context to reconstruct why they happened
This is the same discipline security teams already apply to server-side fetchers, service accounts, cloud roles, and internal APIs.
Identity Matters More Than Intelligence
AI security discussions often over-focus on model intelligence.
The sharper question is identity.
Which identity does the agent use when it acts? Does it inherit the user’s access? Does it use a shared service account? Does every agent instance share the same credentials? Can it reach data the user cannot see? Can it perform writes the user could not perform manually? Are tool calls attributed to the human, the agent, the application, or the service account?
SSRF was dangerous because the request came from the server’s network position.
Prompt injection is dangerous because the action comes from the agent’s identity position.
If the agent acts with broad ambient authority, every piece of untrusted context becomes a possible path into that authority.
This is why least privilege has to be concrete. Not a principle in a policy document. Concrete capability grants, per workflow, with visible boundaries.
A support assistant should not have the same powers while summarizing a ticket as it has while issuing a refund. A coding assistant should not have the same powers while explaining a file as it has while opening a pull request. An enterprise-search assistant should not have write access because one future workflow might need it.
Authority should be narrow at the moment of action.
Prompt Injection Is a Systems Bug
Treating prompt injection as bad text leads to shallow fixes.
Treating it as request forgery across an authority boundary leads to better architecture.
The defensive pattern is familiar:
- classify which inputs are untrusted
- define what authority each workflow requires
- minimize tool access
- constrain outbound communication
- keep external destinations out of untrusted content
- separate retrieval from instruction
- require durable logs for privileged actions
- design approval flows around raw action intent, not model-written summaries
- test the system with hostile documents, not only hostile user prompts
The agent should be assumed influenceable.
The system should still be hard to abuse.
That is the maturity shift.
SSRF became manageable when teams stopped trusting URL strings and started designing around network authority. Prompt injection will mature the same way. Not by finding the perfect refusal phrase. By admitting that untrusted language should not be allowed to borrow privileged identity.