Human Approval
Keep high-cost, high-risk, high-impact decisions inside a human-approval flow.
Context
As autonomous execution widens, costly actions, irreversible changes, and externally visible decisions risk being made without oversight. Approving everything bottlenecks the system; approving nothing produces incidents.
Problem
Without defined approval gates, agents either run high-risk actions on their own or ask for approval on trivia until the flow stalls. Tacit approval criteria cause inconsistent decisions across agents and break audit trails.
Forces
- More approval gates are safer but slower; fewer are faster but riskier.
- Asynchronous approval lets other work proceed but delays the gated action until a reviewer responds; synchronous approval resolves the decision sooner but stalls the agent while it waits, which bottlenecks throughput.
- Auto-approval rules are efficient but, when wrong, let dangerous actions through.
Solution
Classify which actions require approval by cost, risk, and blast radius:
- Low risk: auto-approve.
- Medium risk: post-hoc review.
- High risk: pre-approval.
Embed the approval flow into the agent's execution loop, defaulting to asynchronous so other work continues while approval is pending. Log every approval event.
For classifying approval decisions, the four threat types from Anthropic's [Claude Code Auto Mode] are a useful reference:
- Overeager: the goal is met but the allowed scope is exceeded.
- Honest mistakes: misreading resource scope or ownership.
- Prompt injection: malicious instructions embedded in tool output.
- Model misalignment: pursuing independent goals.
The classifier applies a conservative default: "anything the agent chose autonomously is unapproved until the user has explicitly allowed it."
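The three-tier classification described in this section can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Action` fields, the 0 to 2 scoring scale, and the "worst dimension wins" rule are all assumptions introduced here for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AUTO_APPROVE = "auto_approve"        # low risk
    POST_HOC_REVIEW = "post_hoc_review"  # medium risk
    PRE_APPROVAL = "pre_approval"        # high risk


@dataclass(frozen=True)
class Action:
    name: str
    cost: int          # hypothetical score: 0 = low, 1 = medium, 2 = high
    risk: int          # same hypothetical scale
    blast_radius: int  # same hypothetical scale


def classify(action: Action) -> Tier:
    """Map an action to an approval tier using its worst-scoring dimension."""
    worst = max(action.cost, action.risk, action.blast_radius)
    if worst >= 2:
        return Tier.PRE_APPROVAL
    if worst == 1:
        return Tier.POST_HOC_REVIEW
    return Tier.AUTO_APPROVE


# Every classification is recorded so the decision can be audited later.
audit_log: list[dict] = []


def gate(action: Action) -> Tier:
    """Classify an action and log the approval event."""
    tier = classify(action)
    audit_log.append({"action": action.name, "tier": tier.value})
    return tier
```

Taking the maximum across dimensions is one conservative choice among several: an action that is cheap but has a wide blast radius still lands in pre-approval. A weighted score would also work, at the cost of being harder to explain in an audit.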
Judgment question
When should the agent stop and hand off to a human?
Application scenario
Illustrative scenario: figures and company names on this page are hypothetical, used to explain the pattern, and are not measured data.
Approval matrix for a customer-support system:
- Low risk (auto-approve): general inquiries, FAQ answers, shipping-status lookup.
- Medium risk (post-hoc review): partial refunds under ₩100,000, coupon issuance, shipping-address change.
- High risk (pre-approval): full refunds over ₩100,000, contract termination, statements of legal responsibility, comparative statements about competitors.
When the Escalation Agent classifies a case as high risk, it submits the draft response and rationale to the approval queue and continues handling other inquiries while waiting. Approval/denial outcomes and reasons feed back into the periodic review of the classification rules themselves.
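The matrix above could be encoded as a lookup table. The sketch below is illustrative only: the case-type identifiers and the `approval_tier` function are hypothetical names. The matrix does not say how to treat partial refunds of ₩100,000 or more, full refunds under that amount, or unlisted case types, so the sketch assumes they fall back to pre-approval, in line with the conservative default.

```python
REFUND_THRESHOLD_KRW = 100_000

LOW = "auto_approve"
MEDIUM = "post_hoc_review"
HIGH = "pre_approval"

# Case types taken from the matrix; the identifier spellings are hypothetical.
MATRIX = {
    "general_inquiry": LOW,
    "faq_answer": LOW,
    "shipping_status_lookup": LOW,
    "coupon_issuance": MEDIUM,
    "shipping_address_change": MEDIUM,
    "contract_termination": HIGH,
    "legal_responsibility_statement": HIGH,
    "competitor_comparison": HIGH,
}


def approval_tier(case_type: str, refund_amount_krw: int = 0,
                  full_refund: bool = False) -> str:
    """Return the approval tier for a support case."""
    if case_type == "refund":
        # Partial refunds under the threshold are reviewed after the fact.
        if not full_refund and refund_amount_krw < REFUND_THRESHOLD_KRW:
            return MEDIUM
        # Assumption: every other refund combination requires pre-approval.
        return HIGH
    # Assumption: case types missing from the matrix require pre-approval.
    return MATRIX.get(case_type, HIGH)
```

Keeping the matrix as data rather than branching logic makes the periodic review of the classification rules a matter of editing one table.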
How it breaks
Without approval criteria, the agent may auto-approve a full refund or send a response that admits legal liability. The opposite failure, requiring approval on every response, extends average handling time from 2 hours to 8 and drives customer churn. Tacit criteria lead the day shift and the night shift to make different approval calls on similar cases, so decisions become inconsistent.
Implementation pattern bridge
- Human-in-the-Loop
Asynchronous approval queues and approve/deny callbacks are the core implementation elements. The default is an asynchronous structure that lets other work continue while approval is pending.
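A minimal sketch of an asynchronous approval queue with an approve/deny callback, using Python's `asyncio`. The `ApprovalQueue` class, its method names, and the case identifiers are hypothetical and exist only for this example; a production queue would also need persistence, timeouts, and reviewer authentication.

```python
import asyncio


class ApprovalQueue:
    """In-memory approval queue; every event is appended to `log`."""

    def __init__(self):
        self._pending = {}
        self.log = []

    def submit(self, request_id, draft, rationale):
        """Queue a draft for review and return a future for the decision."""
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        self.log.append(("submitted", request_id, rationale))
        return future

    def resolve(self, request_id, approved, reason):
        """Approve/deny callback invoked by the human reviewer."""
        self.log.append(("approved" if approved else "denied", request_id, reason))
        self._pending.pop(request_id).set_result(approved)


async def reviewer(queue):
    # Stands in for a human who responds some time later.
    await asyncio.sleep(0.01)
    queue.resolve("case-1", approved=False, reason="needs manager sign-off")


async def run_demo():
    queue = ApprovalQueue()
    handled = []
    decision = queue.submit("case-1", "Draft: full refund", "high-risk refund")
    review_task = asyncio.create_task(reviewer(queue))
    # The agent keeps handling other inquiries while approval is pending.
    for inquiry in ("faq-7", "shipping-3"):
        handled.append(inquiry)
        await asyncio.sleep(0)
    approved = await decision  # wait only when the gated action is next
    await review_task
    return handled, approved, queue.log
```

Running `asyncio.run(run_demo())` handles both low-risk inquiries before the denial arrives. The future returned by `submit` is what keeps the gated action from blocking the rest of the loop.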
Academic References
- Practices for Governing Agentic AI Systems — OpenAI
- Model AI Governance Framework for Agentic AI — IMDA (Singapore)