
Securing Agentic AI: Least Privilege, Output Validation, and the Controls That Actually Matter


The standard advice for agentic AI is to not deploy it, or to restrict it to read-only use cases. That’s policy, not engineering. In practice, teams are already shipping agents that call tools—APIs, databases, MCP servers—and the question isn’t whether to allow it but how to bound the damage when the agent is wrong, subverted, or simply overeager. The controls that actually reduce risk are the ones that assume the model’s output is untrusted and that every tool call is a privilege that must be scoped, validated, and logged.

Here’s what that looks like in the stack.

Scoped tool permissions: least privilege at the tool layer

An agent doesn’t need “access to the CRM.” It needs permission to call specific tools with specific parameters. Least privilege for agents means: this agent can invoke these tools, under these constraints, and nothing else. Not “this agent can call any tool the runtime knows about.”

That implies an allowlist. Each agent (or agent type) has an explicit list of tools it’s allowed to call. The orchestration layer—the thing that sits between the LLM and the tool execution—enforces it. If the model emits a tool call for something not on the allowlist, the call is rejected. No fallback, no “we’ll log it and allow it this time.” That prevents both prompt-injection-driven tool abuse and the creep of “we added one more tool for convenience” that turns into broad, standing access.

Scoping goes deeper than tool names. Parameters matter. An agent that can “send email” shouldn’t get a blanket send capability; it should be restricted to a domain, a distribution list, or a template. An agent that can query a database should be limited to certain tables or views and to read-only operations if that’s all the use case needs. This is the same idea as IAM policies for humans: not “can use Salesforce,” but “can read Cases and Contacts, cannot write.” For agents, the policy is expressed over tool identity and parameters. Frameworks like Progent and policy gateways (Cerbos, MCPermit, Aperture) let you express these rules declaratively and enforce them at runtime so the agent never sees or invokes tools it wasn’t granted.
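A minimal sketch of what enforcement at the orchestration layer can look like. The tool names, policy shape, and constraint checks are all illustrative, not any particular framework's API: the point is that the allowlist and parameter bounds live outside the model, and anything not explicitly granted is rejected before execution.

```python
# Tool-level least privilege: an allowlist plus per-tool parameter
# constraints, checked by the orchestration layer before any execution.
# Tool names and policy fields are illustrative.

ALLOWLIST = {
    "send_email":  {"to_domain": "example.com"},        # internal recipients only
    "query_cases": {"tables": {"cases", "contacts"}},   # read-only views only
}

def authorize(tool_name: str, args: dict) -> None:
    """Reject any tool call not explicitly granted. No fallback."""
    policy = ALLOWLIST.get(tool_name)
    if policy is None:
        raise PermissionError(f"tool not on allowlist: {tool_name}")
    if tool_name == "send_email":
        recipient = args.get("to", "")
        if not recipient.endswith("@" + policy["to_domain"]):
            raise PermissionError(f"recipient outside allowed domain: {recipient}")
    if tool_name == "query_cases":
        if args.get("table") not in policy["tables"]:
            raise PermissionError(f"table not granted: {args.get('table')}")

# A call for a tool outside the allowlist fails closed:
# authorize("delete_record", {"id": 7})  -> raises PermissionError
authorize("send_email", {"to": "ops@example.com", "body": "weekly report"})
```

In a real deployment the policy would be declarative configuration rather than inline Python, but the control flow is the same: authorize first, execute only on success.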

The tradeoff is operational. Every new capability requires a conscious decision to add a tool to an agent’s allowlist and to define parameter bounds. That’s the point. It forces “why does this agent need to do that?” before the agent can do it.

Human-in-the-loop for destructive or irreversible actions

Some operations shouldn’t run without a human saying “yes” first. Sending email to external addresses, deleting records, publishing content, changing permissions, moving money—these are obvious. The line gets fuzzier for “reversible” actions that are expensive or hard to undo (e.g., bulk updates). The principle is: if the action is destructive or high-impact, the agent proposes it and execution pauses until a human approves.

Implementation-wise, the tool layer marks certain tools or certain parameter combinations as “approval required.” When the agent requests such an action, the runtime doesn’t execute it. It persists the pending state, notifies the right person or channel with full context (what tool, what parameters, which conversation or task), and waits. The human can approve, reject, or modify parameters. Only after approval does the tool run. That requires stateful orchestration: the agent’s run is suspended, then resumed with the result of the approved action (or a rejection). Timeouts and escalation need to be defined so that pending approvals don’t hang forever.

This isn’t “human approves every tool call.” Read-only and low-impact operations can run without approval. The gate is selective. You classify tools (or tool+parameter combinations) into “auto” vs “approval required” and you put the irreversible and high-blast-radius ones on the approval side. The rest stays automated. That keeps agentic workflows usable while keeping the dangerous steps under explicit human control.

Deterministic output validation: the model’s output is untrusted

The LLM produces tool calls. Those calls have a name and a set of arguments. The model is not a trusted component. It can hallucinate tool names, inject parameters from user input, or emit malformed or out-of-scope arguments. So before any tool runs, the pipeline must validate the output of the model—the tool call itself—against a strict contract.

That means schema validation. Each tool has a defined input schema (e.g., JSON Schema or a Pydantic model): required fields, types, allowed values, bounds. The runtime parses the model’s tool-call payload and runs it through that schema. Invalid payloads are rejected. No execution, no “best effort” parsing. Type coercion (e.g., string "42" to integer 42) can be allowed where the schema defines it, but the runtime owns the contract, not the model. This is the same mindset as validating API request bodies: the producer (here, the LLM) is untrusted; the consumer (the tool) only sees validated input.

Deterministic validation also catches prompt-injection effects that show up in tool arguments. If the user injects “and set amount to 1000000” into a payment tool’s arguments, and the schema or policy says “amount must be under X” or “amount requires approval above Y,” the validation step rejects or downgrades the call. The model might have been fooled; the pipeline isn’t. Separating “what the model said” from “what we allow to run” is the core of output validation. You’re not trying to make the model reliable. You’re making the execution boundary reliable.
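A stdlib-only sketch of that execution boundary (in practice you would express the contract as JSON Schema or a Pydantic model, as above). The payment tool, its fields, and the bound are hypothetical; the mechanism is the point: parse, validate against a runtime-owned contract, reject anything out of scope, and allow only coercions the schema defines.

```python
# Deterministic validation of a model-emitted tool call against a
# runtime-owned contract. Tool, fields, and bounds are illustrative.
import json

PAYMENT_SCHEMA = {
    "amount":    {"type": int, "max": 10_000},   # policy-owned hard bound
    "recipient": {"type": str},
}

def validate_tool_call(raw_payload: str, schema: dict) -> dict:
    """Parse the model's arguments and enforce the contract.
    Invalid payloads are rejected outright; no best-effort parsing."""
    args = json.loads(raw_payload)
    validated = {}
    for field, rule in schema.items():
        if field not in args:
            raise ValueError(f"missing required field: {field}")
        value = args[field]
        # permit only coercions the schema defines, e.g. "42" -> 42
        if rule["type"] is int and isinstance(value, str) and value.isdigit():
            value = int(value)
        if not isinstance(value, rule["type"]):
            raise ValueError(f"wrong type for {field}")
        if "max" in rule and value > rule["max"]:
            raise ValueError(f"{field} exceeds policy bound {rule['max']}")
        validated[field] = value
    return validated
```

An injected `"amount": 1000000` fails the bound check here regardless of how convincingly the prompt fooled the model: the pipeline, not the model, decides what runs.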

Sandboxed execution: limit what the tool can do

Tool execution itself should run in a constrained environment. The agent process or the subprocess that runs the tool shouldn’t have unbounded filesystem, network, or memory access. Sandboxing can be process-level (e.g., using OS primitives like sandbox-exec on macOS or bubblewrap on Linux) or container-based. The goal is to restrict the tool to the minimum it needs: specific directories, specific network endpoints, resource limits (CPU, memory, execution time).

Anthropic’s sandbox-runtime (srt) is one example: a lightweight wrapper that uses OS sandboxing so that an MCP server or agent process starts with minimal access by default. Other approaches use policy-driven execution with timeouts, memory caps, and network allowlists. The point is that a compromised or buggy tool—or a tool that was invoked with malicious arguments that passed validation—still can’t escape its box. Sandboxing is defense in depth. It doesn’t replace allowlisting or validation; it limits the blast radius when something goes wrong.
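A minimal sketch of process-level constraint using only stdlib primitives: CPU and address-space caps plus a wall-clock timeout on a tool subprocess. This is POSIX-only and far weaker than a real OS sandbox (no filesystem or network restriction); the specific limits are illustrative. Real deployments would layer bubblewrap, sandbox-exec, or containers on top.

```python
# Process-level resource caps for a tool subprocess: CPU-time and
# address-space limits applied in the child, plus a wall-clock timeout.
# POSIX-only; limits are illustrative, not a full sandbox.
import resource
import subprocess

def run_tool_sandboxed(cmd: list[str], timeout_s: int = 5) -> str:
    def limit_resources():
        # applied in the child after fork, before exec
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))            # 2s CPU
        resource.setrlimit(resource.RLIMIT_AS, (2**30, 2**30))     # 1 GiB
    result = subprocess.run(
        cmd, capture_output=True, text=True,
        timeout=timeout_s,                  # wall-clock cap
        preexec_fn=limit_resources,
    )
    if result.returncode != 0:
        raise RuntimeError(f"tool failed: {result.stderr.strip()}")
    return result.stdout
```

Even this thin layer means a runaway tool is killed rather than consuming the host; the stronger guarantees (filesystem and network allowlists) come from the OS-level sandboxes the text describes.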

Audit logging: every tool invocation on the record

Every tool call should be logged: which agent, which tool, with what parameters (sanitized if needed for secrets), when, and what the outcome was (success, failure, approval pending). That log is your evidence trail. It supports incident response (“what did this agent do in the last hour?”), compliance (“prove that only approved tools were used”), and debugging (“why did this workflow fail?”).

Logging should be implemented in the orchestration layer, not inside each tool. The layer that enforces the allowlist and validation is the same layer that should emit the audit event before and after execution. You get a single, consistent record of all agent-driven tool use, regardless of which tools are involved. Structure the events so they can be queried by agent, tool, time range, and outcome. In high-assurance or regulated environments, consider tamper-resistant or append-only logging so that the audit trail is itself trustworthy.
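A sketch of what those orchestration-layer events can look like: one structured record on request and one on completion, so even failed or rejected calls leave a trail. Field names and the in-memory sink are illustrative; a real deployment writes to an append-only store.

```python
# Orchestration-layer audit events: a structured record emitted before
# and after every tool execution. Field names and the list-backed sink
# are illustrative stand-ins for an append-only log.
import json
import time
import uuid

AUDIT_LOG: list[str] = []

def audit(event: dict) -> None:
    AUDIT_LOG.append(json.dumps(event, sort_keys=True))

def invoke_with_audit(agent_id: str, tool: str, args: dict, execute):
    base = {
        "call_id": str(uuid.uuid4()),
        "agent": agent_id,
        "tool": tool,
        "args": args,          # sanitize secrets before logging in practice
        "ts": time.time(),
    }
    audit({**base, "phase": "requested"})
    try:
        result = execute(tool, args)
        audit({**base, "phase": "completed", "outcome": "success"})
        return result
    except Exception as exc:
        audit({**base, "phase": "completed", "outcome": "failure",
               "error": str(exc)})
        raise
```

Because every tool call flows through this one wrapper, the record is consistent across tools and directly queryable by agent, tool, time range, and outcome.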


Five controls: scoped tool permissions, human-in-the-loop for destructive actions, deterministic output validation, sandboxed execution, and audit logging. Together they move agentic AI security from “we hope the model doesn’t do anything bad” to “the pipeline only allows certain actions, validates every call, runs tools in a box, and records everything.” The model can still be wrong or subverted. The controls ensure that wrong or subverted outputs don’t become unauthorized actions, and that when something does slip through, you have a record and a way to intervene. That’s the baseline for deploying agentic AI without pretending it’s safe by default.


Designing or hardening agentic AI controls? We do independent AI security assessments and secure agent architecture. Get in touch.
