Pillar 4 — Engineering Intelligence
The New Malware Analysis Is Agent Behavior Analysis
As agents gain tools, skills, memory, and identity, defenders need to analyze behavior, not just prompts.
Malware analysis begins with distrust. You do not double-click the sample on your laptop and hope for the best. You identify it, hash it, inspect strings, examine imports, run it in an isolated lab, capture network traffic, watch filesystem changes, look for persistence, and revert the machine when the analysis is done.
The discipline exists because adversarial software lies about what it is. AI agents are moving into the same territory.
A skill package may present itself as a productivity helper. A connector may look like a harmless bridge to a SaaS tool. A local automation module may promise to summarize files or manage tickets. Underneath, it may request broad permissions, call remote services, modify persistent memory, read local files, instruct the agent to use tools in a particular sequence, or operate under a user’s identity.
Treat agent extensions as software supply-chain artifacts. Treat prompts as executable influence. Treat memory as persistence. Treat tool calls as behavior. Then analyze them with the seriousness defenders already bring to malware.
The Agent Ecosystem Has an Execution Layer
Early AI security focused heavily on model output. Did the model hallucinate? Did it reveal sensitive text? Did it follow a malicious prompt? Did it generate unsafe content? Those questions still matter. They are no longer enough.
Agents now have execution layers: tools, plugins, skills, MCP servers, local scripts, browser automation, IDE integrations, SaaS connectors, workflow actions, memory stores, and long-running background tasks. That means the security object has changed. The thing to analyze is not only the prompt. It is the behavior of a composed software system.
A skill can define how tools are orchestrated. A connector can expand reach into documents, tickets, calendars, code, email, customer records, and internal APIs. A memory system can preserve state across sessions. A remote instruction source can change what the agent does after installation. A local script can turn a natural-language request into file, network, or shell activity.
This starts to resemble other ecosystems defenders already understand:
- browser extensions
- npm packages
- CI/CD actions
- IDE extensions
- SaaS integrations
- endpoint automation
- malware droppers
The package may be small. The authority it inherits may be large.
Static Review Is Only the First Pass
Malware triage starts with basic facts. What is this file? What platform does it target? What strings are visible? What libraries or APIs does it use? Does it contact a network destination? Does it reference persistence locations? Does it look packed or obfuscated? Is it known already?
Agent triage needs an equivalent first pass:
- What permissions does the skill or connector request?
- What tools can it invoke?
- What files can it read or write?
- What domains can it contact?
- What secrets or tokens are available in its runtime?
- What persistent memory can it modify?
- What instructions are embedded in markdown, config, comments, or remote content?
- What scripts or binaries does it run?
- What dependencies does it install?
- What user-facing explanation does it provide compared with what it can actually do?
Malicious behavior can be conditional. It may activate only when a specific file exists, when a certain environment variable is present, when the user asks a particular kind of question, when the date changes, when a remote prompt changes, or when the agent has accumulated enough context to make the action valuable.
Static review can tell you what the artifact contains. It cannot prove what the agent will do.
Behavior Is the Security Signal
Malware analysts run samples because behavior reveals what code hides. The same principle applies to agents.
An agent extension should be observed under controlled conditions. Give it fake files, fake credentials, fake tickets, fake customer data, fake repositories, fake SaaS APIs, and fake memory. Then watch what it attempts to read, where it tries to connect, what it writes, which tools it invokes, what state it preserves, and whether its explanation matches its behavior.
The analysis target is not “did the model say something suspicious?”
The target is operational behavior:
- Did it request data outside the task?
- Did it attempt outbound communication?
- Did it write memory unexpectedly?
- Did it create or modify files unrelated to the user request?
- Did it try to access credentials?
- Did it alter configuration?
- Did it call tools in a sequence that changes the risk profile?
- Did it hide consequential actions behind harmless language?
- Did it behave differently when a document contained adversarial instructions?
This is where agent behavior analysis becomes distinct from ordinary application testing.
The same natural-language request can produce different behavior depending on context, memory, retrieved documents, tool availability, and model interpretation. That makes deterministic unit testing insufficient. Defenders need behavioral runs.
Build Agent Detonation Labs
Malware analysis uses isolated labs because real environments are too valuable to risk. Agent analysis needs the same habit. An agent detonation lab should be disposable, instrumented, and fake-rich.
Disposable means the environment can be reset. No durable credentials. No connection to real production systems. No access to the analyst’s personal files or corporate tokens.
Instrumented means the lab captures behavior. Tool calls, filesystem diffs, network destinations, process activity, memory writes, prompt/context boundaries, retrieved documents, and approval screens should all be recorded.
Fake-rich means the environment contains plausible targets. Empty sandboxes miss behavior because there is nothing interesting to touch. A useful lab includes seeded documents, honeytokens, fake API keys, mock SaaS endpoints, sample repositories, synthetic tickets, realistic logs, and controlled adversarial content.
The point is to answer concrete questions:
- What does the agent try to enumerate?
- What does it treat as valuable?
- What does it send outward?
- What state survives the session?
- What actions occur without explicit user intent?
- What happens when untrusted documents contain instructions?
- What happens when the agent sees credentials it should ignore?
- What happens when a remote service returns hostile content?
Memory Is Persistence
Malware analysts care about persistence because malware that survives reboot becomes part of the machine’s future Agent memory deserves the same scrutiny. Memory can preserve preferences, project facts, user habits, task context, and workflow state. That can make agents useful. It can also make compromise durable.
If untrusted content can write to memory, it can influence future sessions. If a malicious skill can store instructions, shortcuts, external destinations, poisoned facts, or hidden preferences, the attack may outlive the original interaction.
Deleting a bad message may not clean the system. Removing a connector may not remove memory it already wrote. Revoking a tool may not erase a poisoned preference that affects future decisions. Rebuilding a session may not help if long-term memory is shared across workflows.
Memory needs analysis controls:
- what wrote this memory
- when it was written
- which source influenced it
- who approved it
- which workflows can read it
- whether it can trigger tool use
- how it is deleted
- how it is reviewed after an incident
Persistent state is operational power. It should not be treated as chat history with better branding.
Tool Calls Are System Calls
In malware analysis, system calls matter because they show what the program actually does.
Open file. Read registry. Create process. Connect socket. Write data. Modify startup entry.
Agent tool calls are the equivalent behavioral layer.
Search documents. Read file. Query CRM. Send message. Create ticket. Execute command. Update memory. Open pull request. Call webhook. Retrieve secrets. Modify configuration.
Tool-call transcripts should be treated as first-class security evidence.
The transcript should capture the exact tool, inputs, outputs, calling context, identity used, user approval state, and source content that influenced the call. A pretty natural-language conversation is not enough.
When an incident occurs, nobody should have to infer behavior from the final answer. They should be able to reconstruct the action path.
This also changes testing. A safe-looking final response can hide risky intermediate behavior. The agent may have read more than necessary, contacted external services, or written memory even if the final user-facing text is benign.
Marketplaces Need Sandboxes, Not Just Scanners
Agent ecosystems will create marketplaces. Marketplaces create supply-chain risk.
Static scanning will help, especially for obvious dangerous permissions, suspicious domains, known malicious dependencies, and unsafe scripts. But scanners that only pattern-match text or manifests will miss behavior expressed through natural language, remote prompts, conditional logic, and tool orchestration.
Agent marketplaces will need the equivalent of malware sandboxes.
Before installation, an extension should be tested in controlled environments. During use, its behavior should be monitored. After updates, it should be retested. Permissions should be compared against observed behavior. Network destinations should be tracked. Memory writes should be reviewed. High-risk behavior should require stronger publisher trust and stronger runtime controls.
This is how mature ecosystems usually evolve. At first, trust is social. Then incidents happen. Then signing, permissions, scanning, runtime monitoring, reputation, revocation, and incident response become part of the ecosystem. Agent systems should skip fewer of those lessons.
The Defensive Shift
AI security will not mature if it remains obsessed with whether a model said the wrong thing. Agents are becoming systems with behavior. That behavior needs analysis.
The defensive questions are practical:
- What did it request?
- What did it read?
- What did it write?
- What did it call?
- What did it remember?
- What did it send?
- What identity did it use?
- What changed after it ran?
- What would it have done if the fake credential were real?
That is a malware-analysis mindset applied to agentic systems. Any software system with autonomy, identity, tools, persistence, and exposure to untrusted input deserves behavioral scrutiny before it is allowed to operate in a real environment.