the point

Before giving an autonomous agent more authority, check what the runtime actually lets it do: tools, identity, approvals, escalation, logs, and recovery.

what to check before expanding access

If an agent can send messages, change records, issue refunds, open pull requests, or trigger jobs, the policy is not the boundary. The runtime is.

A policy can say what the agent should do. The runtime decides what the agent can actually do when no one is watching: which tool it can call, under which identity, with which inputs, after which approval, and with what recovery path if the action is wrong.

That is the part teams often underbuild.

support: Draft a refund note without issuing the refund.

engineering: Open a pull request without merge or deploy authority.

revops: Prepare outreach without sending to protected accounts.

what can the agent do?

Be specific. “Handle support tickets” is too vague. List the actual actions: read, draft, queue, send, issue credit, change status, update CRM, open PR, merge, deploy, disable workflow.

A safer design separates suggestion, preparation, execution, and override. The agent can help at one level without receiving authority at the next level.
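One way to picture that separation is as ordered authority levels checked at the moment of action. The sketch below is illustrative only: the level names come from the paragraph above, while the action names, the `REQUIRED_LEVEL` mapping, and the assumption that levels form a simple ladder are hypothetical. A real system may grant each action separately.

```python
from enum import IntEnum


class Authority(IntEnum):
    """Hypothetical authority levels, ordered from least to most power."""
    SUGGEST = 1   # describe what could be done
    PREPARE = 2   # build a draft or queued item; nothing takes effect
    EXECUTE = 3   # make the change take effect
    OVERRIDE = 4  # bypass a rule or reverse someone else's decision


# Illustrative mapping from action name to the minimum level it needs.
REQUIRED_LEVEL = {
    "suggest_refund": Authority.SUGGEST,
    "draft_refund_note": Authority.PREPARE,
    "issue_refund": Authority.EXECUTE,
    "waive_refund_limit": Authority.OVERRIDE,
}


def is_permitted(granted: Authority, action: str) -> bool:
    """Allow an action only if the granted level covers it."""
    required = REQUIRED_LEVEL.get(action)
    if required is None:
        return False  # actions not listed are denied by default
    return granted >= required


support_agent = Authority.PREPARE
print(is_permitted(support_agent, "draft_refund_note"))  # True
print(is_permitted(support_agent, "issue_refund"))       # False
```

The useful property is the default: an action nobody listed is refused, so new capabilities have to be granted on purpose.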

which tools and records can it reach?

Tool access should be scoped by task, identity, tenant, environment, and risk level.

The unit of control is not “the agent.” It is the specific tool call under a specific identity in a specific context.
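A minimal sketch of what checking a single tool call could look like, assuming grants are stored as plain records. The `ToolCall` and `Grant` types, their field names, and the three risk labels are hypothetical and only mirror the dimensions listed above.

```python
from dataclasses import dataclass

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    identity: str      # credential the call runs under
    task: str
    tenant: str
    environment: str
    risk: str          # one of the RISK_ORDER keys


@dataclass(frozen=True)
class Grant:
    tool: str
    identity: str
    tasks: frozenset
    tenants: frozenset
    environments: frozenset
    max_risk: str


def is_in_scope(call: ToolCall, grants: list) -> bool:
    """Return True only if some grant covers every dimension of the call."""
    for grant in grants:
        if (
            grant.tool == call.tool
            and grant.identity == call.identity
            and call.task in grant.tasks
            and call.tenant in grant.tenants
            and call.environment in grant.environments
            and RISK_ORDER[call.risk] <= RISK_ORDER[grant.max_risk]
        ):
            return True
    return False  # no matching grant means the call is out of scope


grants = [
    Grant(
        tool="crm.update",
        identity="svc-support-agent",
        tasks=frozenset({"ticket_followup"}),
        tenants=frozenset({"acme"}),
        environments=frozenset({"production"}),
        max_risk="low",
    )
]
call = ToolCall("crm.update", "svc-support-agent", "ticket_followup",
                "acme", "production", "low")
print(is_in_scope(call, grants))  # True
```

The same agent making the same call for a different tenant, or at a higher risk level, fails the check without any change to the prompt.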

what requires approval?

High-impact transitions need a different path than low-risk drafting or retrieval.

Sending external messages, changing money, modifying account permissions, deleting data, granting access, merging code, deploying, or overriding policy should not rely on model discretion alone.

The approval screen also needs enough context for a real decision: what the agent plans to do, what object will change, which tool will execute, whose authority the tool uses, and whether the action can be reversed.
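The context list above can be treated as required fields on the approval request itself. This is a sketch under assumptions: the `ApprovalRequest` type and its field names are hypothetical, and "unknown reversibility" is modeled as `None`.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class ApprovalRequest:
    planned_action: str        # what the agent plans to do
    target_object: str         # what object will change
    tool: str                  # which tool will execute
    acting_identity: str       # whose authority the tool uses
    reversible: Optional[bool]  # None means nobody has determined this yet


def missing_context(request: ApprovalRequest) -> List[str]:
    """Name every field an approver would need that is empty or unknown."""
    return [
        f.name
        for f in fields(request)
        if getattr(request, f.name) in (None, "")
    ]


request = ApprovalRequest(
    planned_action="issue refund of 40.00",
    target_object="order-1182",
    tool="",
    acting_identity="svc-support-agent",
    reversible=None,
)
print(missing_context(request))  # ['tool', 'reversible']
```

A request with gaps can be held back before it reaches a person, so the approver is never asked to decide on partial information.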

what forces escalation?

Escalation should not depend only on model uncertainty.

Use runtime conditions too: missing data, conflicting records, protected account flags, unusual spend, repeated tool failures, stale state, policy exceptions, and attempts to use tools outside the agent’s assigned scope.

If the system cannot decide whether to proceed safely, that is an exception path. It should not become a prompt tweak.
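Those conditions can be evaluated in ordinary code, outside the model. A minimal sketch, assuming the runtime already tracks the values below; the `RunState` fields and the default thresholds are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunState:
    missing_fields: List[str] = field(default_factory=list)
    conflicting_records: bool = False
    protected_account: bool = False
    spend: float = 0.0
    spend_limit: float = 100.0
    tool_failures: int = 0
    state_age_seconds: float = 0.0
    policy_exception: bool = False
    out_of_scope_attempt: bool = False


def escalation_reasons(state: RunState,
                       max_failures: int = 3,
                       max_state_age_seconds: float = 300.0) -> List[str]:
    """Collect every runtime condition that should force a handoff."""
    reasons = []
    if state.missing_fields:
        reasons.append("missing data: " + ", ".join(state.missing_fields))
    if state.conflicting_records:
        reasons.append("conflicting records")
    if state.protected_account:
        reasons.append("protected account")
    if state.spend > state.spend_limit:
        reasons.append("unusual spend")
    if state.tool_failures >= max_failures:
        reasons.append("repeated tool failures")
    if state.state_age_seconds > max_state_age_seconds:
        reasons.append("stale state")
    if state.policy_exception:
        reasons.append("policy exception")
    if state.out_of_scope_attempt:
        reasons.append("out-of-scope tool attempt")
    return reasons


state = RunState(protected_account=True, tool_failures=3)
print(escalation_reasons(state))
# ['protected account', 'repeated tool failures']
```

An empty list means the run may continue. Anything else goes to a person, with the reasons attached.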

who can stop it?

Every production agent needs a stop path people understand.

That might mean pausing a run, disabling a tool, revoking a credential, pausing a queue, rolling back a workflow change, or forcing high-risk actions into manual review.

The owner matters. A kill switch that only one platform engineer understands is not useful to the support, sales, or release team that depends on the agent.
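As a rough illustration, a stop path can be a small piece of shared state that every tool call consults first. The `StopControls` class and its method names are hypothetical, and a production version would need durable storage and access control, which this sketch leaves out.

```python
class StopControls:
    """In-memory record of paused runs and disabled tools (illustrative only)."""

    def __init__(self):
        self.paused_runs = {}     # run_id -> (who paused it, why)
        self.disabled_tools = {}  # tool name -> (who disabled it, why)

    def pause_run(self, run_id, by, reason):
        self.paused_runs[run_id] = (by, reason)

    def disable_tool(self, tool, by, reason):
        self.disabled_tools[tool] = (by, reason)

    def may_call(self, run_id, tool):
        """Check before every tool call; either stop blocks the call."""
        return run_id not in self.paused_runs and tool not in self.disabled_tools


controls = StopControls()
controls.disable_tool("billing.refund", by="support-lead", reason="duplicate refunds")

print(controls.may_call("run-7", "billing.refund"))  # False
print(controls.may_call("run-7", "crm.update"))      # True
```

Recording who stopped what, and why, keeps the stop path legible to the team that depends on the agent and not only to the person who built it.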

what logs prove what happened?

Audit logs should show more than final output.

They should preserve the request, state, tool calls, tool inputs, tool outputs, approvals, denials, escalation triggers, policy versions, human interventions, and final action.

When something goes wrong, the team needs to know whether the failure came from the model, permissions, stale data, retrieval, tool output, or approval flow.
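A sketch of a per-run audit record that keeps the decision path and not only the outcome. The `AuditRecord` type, the event kinds, and the field names are hypothetical; the point is that each step is appended in order together with the policy version in force.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AuditRecord:
    run_id: str
    request: str
    policy_version: str
    events: list = field(default_factory=list)

    def record(self, kind, **detail):
        """Append one step; seq preserves the order things happened in."""
        self.events.append({"seq": len(self.events), "kind": kind, "detail": detail})

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


audit = AuditRecord(run_id="run-7", request="refund order-1182", policy_version="v3")
audit.record("tool_call", tool="orders.lookup", inputs={"order_id": "order-1182"})
audit.record("tool_output", tool="orders.lookup", output={"status": "delivered"})
audit.record("escalation", reasons=["protected account"])
audit.record("approval", approver="support-lead", decision="denied")

print(audit.to_json())
```

With the inputs, outputs, escalation, and denial all in one ordered record, the question of where a failure started can be answered by reading the record.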

how does the team recover?

Recovery should be designed before the agent gets broader permissions.

Can the team reverse a bad account change? Void a mistaken refund? Pull back an email campaign? Revert a pull request? Disable the affected tool without stopping the whole workflow? Find every object touched by the run?

If no one owns the bad-action path, the agent is not production-safe.
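The last question, finding every object a run touched, is easier if each logged change names its own undo step. A minimal sketch, assuming change events are plain dictionaries; the keys, action names, and undo names are hypothetical.

```python
def recovery_plan(events, run_id):
    """Split a run's changes into undo steps and items needing manual follow-up."""
    touched = [e for e in events if e["run_id"] == run_id]
    undo_steps, manual = [], []
    for event in reversed(touched):  # undo the most recent change first
        if event.get("undo"):
            undo_steps.append((event["undo"], event["object_id"]))
        else:
            manual.append(event["object_id"])  # no known way to reverse this
    return undo_steps, manual


events = [
    {"run_id": "run-1", "object_id": "account-17", "action": "change_status", "undo": "restore_status"},
    {"run_id": "run-1", "object_id": "email-204", "action": "send_email", "undo": None},
    {"run_id": "run-2", "object_id": "account-99", "action": "change_status", "undo": "restore_status"},
    {"run_id": "run-1", "object_id": "refund-8", "action": "issue_refund", "undo": "void_refund"},
]

undo_steps, manual = recovery_plan(events, "run-1")
print(undo_steps)  # [('void_refund', 'refund-8'), ('restore_status', 'account-17')]
print(manual)      # ['email-204']
```

The manual list is the part that needs an owner: a sent email cannot be undone by code, so someone has to be responsible for the follow-up.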

the Monday-morning test

Pick one live or planned agent workflow. Then answer these questions:

  • What may the agent do without approval?

  • What may it prepare but not execute?

  • What may it never do?

  • Which tools and records can it reach?

  • Which identity does each tool call use?

  • What input checks run before a sensitive call?

  • Which actions require confirmation or approval?

  • What conditions force escalation?

  • Who receives the alert?

  • Who can pause the run?

  • Who can disable the tool?

  • Which logs prove the decision path?

  • How does the team recover from a bad action?

If the team cannot answer those questions, the agent does not need more autonomy yet. It needs a better runtime boundary.

Use the checklist before giving an agent broader permissions.
