Cost Control

Install

cp skills/patterns/reopt-cost-control/SKILL.md <your-project>/.claude/skills/reopt-cost-control/SKILL.md

Copy this repo's file into your project and the resource activates in your Claude Code session immediately.

---
name: reopt-cost-control
description: Setting per-agent and per-module cost ceilings, model routing (high-risk = strong model, low-risk = lightweight), token budget management. When cost is unpredictable or spiking.
---

# Cost Control

OCLS phase: **LAYER** · Manage token budget, model selection, and call frequency structurally to control the cost curve.

## Core rules

- **Route-based model selection**: strong models on high-stakes decision paths; lightweight models on low-risk paths like classification and filtering.
- **Per-agent budget allocation**: assign a token budget per agent and module, track usage in real time, fall back to a lightweight model or escalate to a human on exhaustion.
- **Cache and batching strategy**: cache results for repeated input patterns; batch async-capable work.
- Cost data is a primary input to the OCLS SHARPEN loop. On cost anomalies, re-tune model assignments and budget boundaries.

## Judgment question

**Does this reasoning step really need this model?**

## Application check

1. Are model assignments per route declared?
2. Are per-agent and per-module token budgets set?
3. Is the cache policy (what to cache, for how long) defined?
4. Is there a fallback path on budget exhaustion?

## Code example

```typescript
// Route-based model routing
type Risk = "high" | "medium" | "low";

function selectModel(route: { risk: Risk }): string {
  if (route.risk === "high") return "claude-opus-4-7";     // legal, payment, security
  if (route.risk === "medium") return "claude-sonnet-4-6"; // response generation
  return "claude-haiku-4-5";                               // classification, filter
}

// Per-agent budget + fallback
class BudgetedAgent {
  private used = 0;
  constructor(private budget: number, private llm: LLMClient) {}

  async call(input: Input): Promise<Output> {
    if (this.used > this.budget * 0.8) {
      return this.fallback(input); // fall back to lightweight at 80%
    }
    const model = selectModel(input.route);
    const r = await this.llm.call(model, input);
    this.used += r.tokensUsed;
    return r;
  }

  private async fallback(input: Input) {
    return this.llm.call("claude-haiku-4-5", input);
  }
}

// Cache — reuse identical input patterns within TTL
const cacheKey = hash(input);
const cached = await cache.get(cacheKey);
if (cached && !cached.expired) return cached.result;
```

## Antipatterns

**Customer support**: operating without cost tracking makes "why did spend double this month" unanswerable. Using the same model across all agents burns expensive reasoning on simple classification. Operating without budget ceilings means traffic spikes turn into unbounded cost.

**Search / RAG systems**: recomputing every embedding per query compounds cost linearly. Cache frequent query and document embeddings, and add invalidation rules so only changed documents are reprocessed.

## Invocation example

```
"Every agent currently uses the same strong model, so monthly cost is 3× budget.
Apply reopt-cost-control: lightweight for Intake, keep the strong model on
Response, and add per-agent token budgets with a fallback path triggered at 80%."
```

## Related patterns

- Module Contract — declare per-module cost data in the contract
- Evaluation and Guardrails — optimize cost without losing quality