Human Approval

Install

mkdir -p <your-project>/.claude/skills/reopt-human-approval
cp skills/patterns/reopt-human-approval/SKILL.md <your-project>/.claude/skills/reopt-human-approval/SKILL.md

Copy this repo's SKILL.md into your project; the skill becomes available in your Claude Code session immediately.

---
name: reopt-human-approval
description: Use when you need to insert human approval into irreversible or high-impact decisions. Covers defining the approval classification (auto / post-hoc / pre-approval), designing the asynchronous approval queue, and responding to prompt-injection and misalignment threats.
---

# Human Approval

OCLS phase: **SHARPEN** · Keep high-cost, high-risk, high-impact decisions inside a human-approval flow.

## Core rules

- Classify which actions require approval by cost, risk, and blast radius.
- Low-risk → auto-approve. Medium-risk → post-hoc review. High-risk → pre-approval.
- Embed the approval flow into the execution loop, defaulting to an asynchronous structure that lets other work continue while approval is pending.
- Log every approval event.
- Account for four threat types:
  1. Overeager: meets the goal but exceeds the allowed scope.
  2. Honest mistakes: misreads resource scope or ownership.
  3. Prompt injection: malicious instructions embedded in tool output.
  4. Model misalignment: pursues independent goals.
- Conservative default: "anything the agent chose autonomously is unapproved until the user has explicitly allowed it."
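
The logging rule and the conservative default can be sketched together. This is a minimal sketch, not the pattern's own API: `ApprovalLog`, `ApprovalEvent`, `checkExplicitlyAllowed`, and the idea of an allow-list keyed by action kind are all assumptions made for illustration.

```typescript
// Hypothetical types for illustration; none of these names come from the pattern.
type Decision = "approved" | "denied" | "pending";

interface ApprovalEvent {
  actionId: string;
  kind: string;
  decision: Decision;
  decidedBy: "user" | "policy";
  at: string; // ISO-8601 timestamp
}

class ApprovalLog {
  private readonly events: ApprovalEvent[] = [];

  record(event: ApprovalEvent): void {
    this.events.push(event);
  }

  all(): readonly ApprovalEvent[] {
    return this.events;
  }
}

// Conservative default: an action kind stays unapproved ("pending") unless the
// user has explicitly allowed it. Every check is logged, whichever way it goes.
function checkExplicitlyAllowed(
  action: { id: string; kind: string },
  allowedKinds: ReadonlySet<string>,
  log: ApprovalLog,
  now: () => Date = () => new Date(),
): Decision {
  const decision: Decision = allowedKinds.has(action.kind) ? "approved" : "pending";
  log.record({
    actionId: action.id,
    kind: action.kind,
    decision,
    decidedBy: "policy",
    at: now().toISOString(),
  });
  return decision;
}
```

Passing `now` as a parameter keeps the timestamp injectable, which makes the log easier to test; a real deployment would likely write to durable storage rather than an in-memory array.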

## Judgment question

**When should the agent stop and hand off to a human?**

## Application check

1. Is the approval matrix (low / medium / high) documented?
2. Is an asynchronous approval queue implemented?
3. Are approve/deny events written to a log?
4. Are there rules that cover each of the four threat types?
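
Checks 1 and 4 are easier to verify when the matrix and the threat rules exist as data. The sketch below is one possible shape, assuming hypothetical names (`approvalMatrix`, `ThreatRule`, `uncoveredThreats`) that are not part of the pattern itself.

```typescript
// Hypothetical names; a sketch of how checks 1 and 4 could be made testable.
type RiskTier = "low" | "medium" | "high";
type ApprovalMode = "auto" | "post-hoc" | "pre-approval";
type Threat = "overeager" | "honest-mistake" | "prompt-injection" | "misalignment";

// Check 1: the matrix lives in one place as data, so it can be reviewed and diffed.
const approvalMatrix: Record<RiskTier, ApprovalMode> = {
  low: "auto",
  medium: "post-hoc",
  high: "pre-approval",
};

interface ThreatRule {
  threat: Threat;
  rule: string; // human-readable statement of the mitigation
}

const ALL_THREATS: readonly Threat[] = [
  "overeager",
  "honest-mistake",
  "prompt-injection",
  "misalignment",
];

// Check 4: report which threat types have no rule yet.
function uncoveredThreats(rules: readonly ThreatRule[]): Threat[] {
  const covered = new Set(rules.map((r) => r.threat));
  return ALL_THREATS.filter((t) => !covered.has(t));
}
```

Calling `uncoveredThreats` with a partial rule list returns the threat types that still need a rule, so the check can run in CI instead of relying on a manual read-through.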

## Code example

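```typescript
// Minimal shapes assumed by this example; adapt them to your own domain types.
interface Action { kind: string; cost: number; novel?: boolean }
interface Result { ok: boolean }
interface Ticket { id: string }
type Outcome = Result | { pending: true; ticketId: string };

// Runs the action; assumed to be supplied by the host application.
declare function execute(action: Action): Promise<Result>;

// Approval classification
type RiskTier = "low" | "medium" | "high";

function classifyApproval(action: Action): RiskTier {
  if (action.cost > 500000 || action.kind === "legal-statement") return "high";
  if (action.cost > 100000 || action.kind === "refund" || action.novel) return "medium";
  return "low";
}

// Asynchronous approval queue
interface ApprovalQueue {
  // Resolves once the request is queued, not once it is approved.
  enqueuePreApproval(action: Action): Promise<Ticket>;
  // Records an already-executed action for post-hoc review.
  enqueuePostReview(action: Action, result: Result): Promise<void>;
}

async function executeWithApproval(action: Action, queue: ApprovalQueue): Promise<Outcome> {
  const tier = classifyApproval(action);
  if (tier === "low") return execute(action);
  if (tier === "medium") {
    const result = await execute(action);
    await queue.enqueuePostReview(action, result);
    return result;
  }
  // high: queue for pre-approval and return a pending ticket so other work
  // keeps running; the action itself is not executed here.
  const ticket = await queue.enqueuePreApproval(action);
  return { pending: true, ticketId: ticket.id };
}
```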
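
The example above returns a pending ticket for high-risk actions but leaves the queue itself abstract. The following is a minimal in-memory sketch of such a queue, for tests or prototypes only. `InMemoryApprovalQueue`, `openTicket`, `decide`, and `status` are hypothetical names, and a production queue would need persistence, approver authentication, and an audit log.

```typescript
// Hypothetical shapes for illustration; adapt to your own Action/Result types.
interface Action { kind: string; cost: number }
interface Result { ok: boolean }
interface Ticket { id: string }
type TicketStatus = "pending" | "approved" | "denied";

class InMemoryApprovalQueue {
  private readonly tickets = new Map<string, { action: Action; status: TicketStatus }>();
  private readonly reviews: Array<{ action: Action; result: Result }> = [];
  private nextId = 1;

  // Synchronous core: record the request and hand back a ticket.
  openTicket(action: Action): Ticket {
    const id = `ticket-${this.nextId++}`;
    this.tickets.set(id, { action, status: "pending" });
    return { id };
  }

  // Resolves once the request is queued, not once a human has decided.
  async enqueuePreApproval(action: Action): Promise<Ticket> {
    return this.openTicket(action);
  }

  async enqueuePostReview(action: Action, result: Result): Promise<void> {
    this.reviews.push({ action, result });
  }

  // Called from the human-facing side. Returns the action to execute when
  // approved, or undefined when denied. A ticket can be decided only once.
  decide(ticketId: string, approved: boolean): Action | undefined {
    const entry = this.tickets.get(ticketId);
    if (!entry) throw new Error(`unknown ticket: ${ticketId}`);
    if (entry.status !== "pending") {
      throw new Error(`ticket already ${entry.status}: ${ticketId}`);
    }
    entry.status = approved ? "approved" : "denied";
    return approved ? entry.action : undefined;
  }

  status(ticketId: string): TicketStatus | undefined {
    return this.tickets.get(ticketId)?.status;
  }

  pendingReviewCount(): number {
    return this.reviews.length;
  }
}
```

Having `decide` return the action, rather than executing it, keeps execution in the caller's hands: the agent loop can pick up approved actions whenever it next polls, which is what lets other work continue while a ticket is pending.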

## Antipatterns

**Customer support**: without approval criteria, the agent auto-approves full refunds or sends a response that admits legal liability. The opposite — requiring approval on every response — extends average handling time from 2 to 8 hours and drives customer churn.

**Infrastructure automation**: when an auto-deploy agent runs a prod rollback or a DB migration on its own, irreversible incidents can follow. Actions whose blast radius is prod, user data, or security settings must default to pre-approval.
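
The blast-radius rule can be layered on top of a cost-based tier. This is a sketch under assumptions: the `BlastRadius` values `"sandbox"` and `"staging"` and the function name `applyBlastRadius` are illustrative, and only prod, user data, and security settings come from the text above.

```typescript
type RiskTier = "low" | "medium" | "high";

// "sandbox" and "staging" are hypothetical examples of unprotected areas.
type BlastRadius = "sandbox" | "staging" | "prod" | "user-data" | "security-settings";

// Areas where any action defaults to pre-approval, regardless of cost.
const PRE_APPROVAL_RADII: ReadonlySet<BlastRadius> = new Set<BlastRadius>([
  "prod",
  "user-data",
  "security-settings",
]);

// Raises the tier to "high" when the blast radius is protected; otherwise
// keeps the tier that cost and risk classification already produced.
// It never lowers a tier.
function applyBlastRadius(base: RiskTier, radius: BlastRadius): RiskTier {
  return PRE_APPROVAL_RADII.has(radius) ? "high" : base;
}
```

Because the function can only raise a tier, a cheap action such as a one-line DB migration still lands in pre-approval when it targets prod.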

## Invocation example

```
"Insert Human Approval into RefundAgent.
Low-risk (auto): refunds ≤ ₩100,000.
Medium-risk (post-hoc review): over ₩100,000 up to ₩500,000, or novel cases.
High-risk (pre-approval): > ₩500,000, legal statements, bulk refunds.
Use an asynchronous queue so other inquiries keep moving while high-risk
approvals are pending."
```

## Related patterns

- Evaluation and Guardrails — the evaluation basis that triggers approval
- Responsibility Partitioning — the responsibility boundary of the approver (human)