Implementing Reopt Agentic Governance: from assessment to system

Case Study

How a production SaaS running eight AI agents used reopt architecture's principles to diagnose its governance gaps and then close them with universal audit logging, persistent approval records, MCP tracing and rate limiting, agent constraints, and an admin dashboard.


Background — assess first, then implement

Reopt is a B2B SaaS platform that manages brand, content, and customer operations in a single workspace. It is built with Next.js 16 and React 19 in a Turborepo monorepo of 35 packages and four apps. Eight AI agents handle document generation, brand strategy, customer analysis, page building, and workflow automation.

Diagnosing the existing system against reopt architecture's eight patterns surfaced four distinct types of governance debt:

  • Observability Gap: audit logging existed only for BrandDefinition, so there was no consistent audit trail across domains.
  • Contract Gap: approval decisions lived only in UI state and were never persisted.
  • Authority Sprawl: MCP handlers had neither usage tracking nor rate limiting.
  • Validation Gap: agent execution had no structural limits.

This article walks through the implementation that built a governance system on top of those diagnostic results.

Responsibility Partitioning — Agent Registry + Versioning

Reopt's eight agents each own a distinct responsibility scope and register themselves with a global registry. The governance work added two things: version management and execution constraints. Every agent now carries a semver version and bounds its execution through a constraints field.

type AgentDefinition = {
  id: string;
  displayName: string;
  description: string;
  version?: string;               // semver — "1.0.0"

  buildSystemPrompt: (ctx: AgentContext) => string | Promise<string>;
  createTools: (params: { context: AgentContext; dataStream: ... }) => ToolSet;
  activeToolNames: (ctx: AgentContext) => string[];

  defaultModel: string;
  allowedModels?: string[] | null;
  uiFeatures: AgentUIFeatures;

  constraints?: {
    maxToolCallsPerTurn?: number;   // upper bound on tool calls per turn (default 10)
    maxTokensPerSession?: number;   // upper bound on tokens per session
    maxSteps?: number;              // maximum execution steps (default 5)
  };
};

AgentDefinition — the agent contract extended with version and execution constraints
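The constraints field only declares limits; the execution loop still has to enforce them. Below is a minimal sketch of one way to do that, assuming hypothetical `resolveConstraints` and `ConstraintGuard` helpers (not Reopt's code). It applies the defaults from the type comments (10 tool calls per turn, 5 steps), treats maxSteps as a per-turn budget (an assumption; the type only says "maximum execution steps"), and applies no token cap unless one is declared.

```typescript
// Hypothetical enforcement of AgentDefinition.constraints; names are illustrative.
type AgentConstraints = {
  maxToolCallsPerTurn?: number;
  maxTokensPerSession?: number;
  maxSteps?: number;
};

type ResolvedConstraints = Required<AgentConstraints>;

// Defaults taken from the comments on the constraints field.
const DEFAULT_MAX_TOOL_CALLS_PER_TURN = 10;
const DEFAULT_MAX_STEPS = 5;

function resolveConstraints(c: AgentConstraints = {}): ResolvedConstraints {
  return {
    maxToolCallsPerTurn: c.maxToolCallsPerTurn ?? DEFAULT_MAX_TOOL_CALLS_PER_TURN,
    maxTokensPerSession: c.maxTokensPerSession ?? Infinity, // no default cap is documented
    maxSteps: c.maxSteps ?? DEFAULT_MAX_STEPS,
  };
}

class ConstraintGuard {
  private readonly limits: ResolvedConstraints;
  private toolCallsThisTurn = 0;
  private stepsThisTurn = 0;
  private tokensThisSession = 0;

  constructor(limits: ResolvedConstraints) {
    this.limits = limits;
  }

  // Per-turn counters reset; the token budget spans the whole session.
  startTurn(): void {
    this.toolCallsThisTurn = 0;
    this.stepsThisTurn = 0;
  }

  // Returns false once the step budget is spent; the loop should stop.
  tryStep(): boolean {
    if (this.stepsThisTurn >= this.limits.maxSteps) return false;
    this.stepsThisTurn += 1;
    return true;
  }

  // Returns false when another tool call would exceed the per-turn cap.
  tryToolCall(): boolean {
    if (this.toolCallsThisTurn >= this.limits.maxToolCallsPerTurn) return false;
    this.toolCallsThisTurn += 1;
    return true;
  }

  // Records usage; returns false once the session token budget is exceeded.
  addTokens(n: number): boolean {
    this.tokensThisSession += n;
    return this.tokensThisSession <= this.limits.maxTokensPerSession;
  }
}
```

For example, an agent that declares only `{ maxSteps: 2 }` gets two steps per turn and still inherits the default cap of ten tool calls.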

AgentContext now carries agentId, so agent identity threads through the entire call chain, from tool execution to audit logging. Along the path route handler → agent selection → context creation → tool factory → audit log, the agentId is never lost.
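One way to keep agentId unbroken is to set it once, at context creation, and have each tool factory close over that context. The agent identity then never travels through model-generated tool arguments. Here is a minimal sketch under that assumption; `createContext`, `makeUpdateTagTool`, and the in-memory `auditSink` are hypothetical stand-ins for the real route handler, tool factory, and logToolAction.

```typescript
// Hypothetical, simplified shapes; the real AgentContext carries much more.
type AgentContext = { workspaceId: string; userId: string; agentId: string };

type AuditEntry = { agentId: string; toolId: string; entityId: string };
const auditSink: AuditEntry[] = []; // stands in for logToolAction

// Context creation: the only place agentId is set.
function createContext(workspaceId: string, userId: string, agentId: string): AgentContext {
  return { workspaceId, userId, agentId };
}

// Tool factory: the tool closes over ctx, so agentId comes from the server-side
// context, never from arguments the model generates.
function makeUpdateTagTool(ctx: AgentContext) {
  return (args: { customerId: string; tag: string }) => {
    // (the actual mutation would happen here)
    auditSink.push({ agentId: ctx.agentId, toolId: "updateCustomerTag", entityId: args.customerId });
  };
}

// route handler → agent selection → context creation → tool factory → audit log
const ctx = createContext("ws_1", "user_1", "cx");
const updateTag = makeUpdateTagTool(ctx);
updateTag({ customerId: "cus_42", tag: "vip" });
```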

Module Contract — Tool Registry expansion

As the Tool Registry grew from 15 to 32 entries, the metadata grew alongside it. On top of the existing input/output declarations, the contract now includes failure modes, whether the tool calls an LLM internally, and a per-hour rate limit.

type ToolDefinition = {
  id: string;
  displayName: string;
  category: ToolCategory;
  action: ToolAction;
  needsApproval: boolean;
  parameters?: ToolParamDefinition[];
  outputFields?: ToolParamDefinition[];
  executeSummary?: string;
  detailUrlPattern?: string;

  // fields added in the governance implementation
  involvesLLM?: boolean;           // a tool that calls Claude internally
  rateLimitPerHour?: number;       // per-tool hourly call cap
  failureModes?: Array<{
    code: string;
    retryable: boolean;
    userMessage: string;
  }>;
};

ToolDefinition — the contract with failure modes, LLM-call flag, and rate limit

The involvesLLM flag is the key to cost tracking. Tools such as requestSuggestions, createDocument, and createPostDraft call Claude internally, so a single agent action incurs LLM cost twice: once for the agent's own model call and once for the nested call inside the tool. With this flag, the cost dashboard can identify this LLM-in-LLM pattern.
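To make the flag concrete, here is a minimal sketch of how a cost report could count nested LLM calls per agent by joining tool-call events against the registry. It assumes tool calls are available as simple events tagged with agentId and toolId; `ToolCallEvent` and `countNestedLLMCalls` are hypothetical names, and the agent IDs are illustrative.

```typescript
type ToolMeta = { id: string; involvesLLM?: boolean };
type ToolCallEvent = { agentId: string; toolId: string };

// Counts, per agent, how many tool calls triggered a nested (LLM-in-LLM) model call.
function countNestedLLMCalls(registry: ToolMeta[], events: ToolCallEvent[]): Map<string, number> {
  const llmTools = new Set(registry.filter((t) => t.involvesLLM).map((t) => t.id));
  const counts = new Map<string, number>();
  for (const e of events) {
    if (!llmTools.has(e.toolId)) continue; // plain tools add no nested LLM cost
    counts.set(e.agentId, (counts.get(e.agentId) ?? 0) + 1);
  }
  return counts;
}

const registry: ToolMeta[] = [
  { id: "createDocument", involvesLLM: true },
  { id: "requestSuggestions", involvesLLM: true },
  { id: "updateCustomerTag" }, // assumed to be a plain mutation with no nested model call
];
```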

failureModes moves refusal conditions, which used to be scattered across the code, into the registry. reopt architecture's judgment question, "under what conditions must this module refuse or fail?", now has a declarative answer.
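Because failure modes are data, callers can look them up instead of hard-coding messages. Below is a minimal sketch, assuming a hypothetical `resolveFailure` helper with a conservative fallback (not retryable, generic message) for codes a tool does not declare. The `RATE_LIMITED` and `ENTITY_NOT_FOUND` codes are illustrative, not Reopt's.

```typescript
type FailureMode = { code: string; retryable: boolean; userMessage: string };
type ToolWithFailures = { id: string; failureModes?: FailureMode[] };

// Fallback for undeclared codes: do not retry, show a generic message.
const UNKNOWN_FAILURE: Omit<FailureMode, "code"> = {
  retryable: false,
  userMessage: "The action could not be completed.",
};

function resolveFailure(tool: ToolWithFailures, code: string): FailureMode {
  const declared = tool.failureModes?.find((m) => m.code === code);
  return declared ?? { code, ...UNKNOWN_FAILURE };
}

const publishPost: ToolWithFailures = {
  id: "publishPost",
  failureModes: [
    { code: "RATE_LIMITED", retryable: true, userMessage: "Too many requests. Try again shortly." },
    { code: "ENTITY_NOT_FOUND", retryable: false, userMessage: "The post no longer exists." },
  ],
};
```

An agent loop can then retry only when `retryable` is true and otherwise show `userMessage` to the user.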

Human Approval — persistent approval records

Previously, an approval decision lived only in the UI message state (approval-pending → approval-responded). Once the chat ended, there was no way to trace who had approved what. The AiToolApprovalRecord model fills that gap.

model AiToolApprovalRecord {
  id          String   @id @default(cuid())
  workspaceId String
  chatId      String                // in which conversation
  messageId   String                // on which message
  agentId     String                // which agent
  toolId      String                // requested which tool
  toolArgs    Json                  // with which arguments
  status      String   @default("pending")  // pending | approved | denied
  resolvedAt  DateTime?             // when the decision was made
  createdAt   DateTime @default(now())

  @@index([workspaceId, status])
  @@index([chatId])
}

AiToolApprovalRecord — the persistent record of an approval decision

When the approval flow intercepts a tool call that needs approval, it creates a pending record. The user's response then updates the record to approved or denied and sets resolvedAt. This makes questions such as "what was the cx agent's denial rate last month?" and "which tool is denied most often?" answerable.
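The lifecycle, and the kind of question it answers, can be sketched without a database. Below is a minimal in-memory sketch, assuming hypothetical `createPending`, `resolveApproval`, and `denialRate` helpers. In production these would be Prisma writes and queries against AiToolApprovalRecord, and the record here is trimmed to the fields the example needs.

```typescript
type ApprovalStatus = "pending" | "approved" | "denied";

type ApprovalRecord = {
  id: string;
  agentId: string;
  toolId: string;
  status: ApprovalStatus;
  resolvedAt: Date | null;
  createdAt: Date;
};

const records: ApprovalRecord[] = [];
let nextId = 1;

// Called when the approval flow intercepts a tool call that needs approval.
function createPending(agentId: string, toolId: string): ApprovalRecord {
  const rec: ApprovalRecord = {
    id: `apr_${nextId++}`,
    agentId,
    toolId,
    status: "pending",
    resolvedAt: null,
    createdAt: new Date(),
  };
  records.push(rec);
  return rec;
}

// Called with the user's response; only a pending record can be resolved.
function resolveApproval(id: string, decision: "approved" | "denied"): void {
  const rec = records.find((r) => r.id === id);
  if (!rec || rec.status !== "pending") throw new Error(`no pending approval ${id}`);
  rec.status = decision;
  rec.resolvedAt = new Date();
}

// Share of resolved records (pending ones excluded) for one agent that were denied.
function denialRate(agentId: string, since: Date): number {
  const resolved = records.filter(
    (r) => r.agentId === agentId && r.status !== "pending" && r.createdAt >= since,
  );
  if (resolved.length === 0) return 0;
  return resolved.filter((r) => r.status === "denied").length / resolved.length;
}
```

Rejecting a second resolution of the same record keeps a decision from being silently overwritten after the fact.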

Decision Traceability — universal audit logging

Change history used to be recorded only in the BrandDefinitionChange table; customer tags, content publishing, and CMS property changes were not tracked at all. AiToolAuditLog provides a single, consistent audit log for data-mutating tools across domains.

model AiToolAuditLog {
  id          String   @id @default(cuid())
  workspaceId String
  agentId     String                // which agent
  toolId      String                // with which tool
  userId      String                // in which user's session
  entityType  String                // which entity
  entityId    String                // with which ID
  action      String                // create | update | delete
  before      Json?                 // state before the change
  after       Json?                 // state after the change
  approvalId  String?               // reference to the approval record
  durationMs  Int?                  // execution duration
  createdAt   DateTime @default(now())

  @@index([workspaceId, createdAt(sort: Desc)])
  @@index([agentId, createdAt(sort: Desc)])
  @@index([entityType, entityId])
}

AiToolAuditLog — universal audit log that threads agent, tool, and entity together

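import { Prisma, PrismaClient } from "@prisma/client";

type PrismaTransaction = Prisma.TransactionClient;

// logToolAction: audit helper that works inside and outside a transaction
export async function logToolAction(
  client: PrismaClient | PrismaTransaction,
  params: {
    workspaceId: string;
    agentId: string;
    toolId: string;
    userId: string;
    entityType: string;
    entityId: string;
    action: "create" | "update" | "delete";
    before?: Prisma.InputJsonValue;  // snapshot before the change
    after?: Prisma.InputJsonValue;   // snapshot after the change
    approvalId?: string;
    durationMs?: number;
  },
): Promise<void> {
  try {
    await client.aiToolAuditLog.create({ data: params });
  } catch (err) {
    // Audit is observation, not blocking: report the failure, never rethrow.
    // Inside a transaction on PostgreSQL, a failed insert still aborts the
    // surrounding transaction even though the error is caught here.
    console.error("logToolAction failed", err);
  }
}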

logToolAction — an audit helper compatible with Prisma transactions

Two design decisions matter. First, the table has no foreign key to the workspace, because audit logs must survive workspace deletion (retention-first). Second, a logging failure must not stop tool execution (fire-and-forget). Auditing observes execution; it never blocks it.

It currently applies to six data-mutating tools: updateCustomerTag, createCustomerTask, updatePostTags, updatePostProperties, savePostDraft, and publishPost. The before/after snapshot records exactly what changed, and approvalId links each change to the approval under which it ran.
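The pattern around each of these tools can be sketched as a wrapper: read a snapshot, run the mutation, read the snapshot again, and hand both to the audit writer without awaiting it. This is a minimal sketch under assumptions. `withEntityAudit`, `loadSnapshot`, and `writeAudit` are hypothetical names, `writeAudit` stands in for logToolAction, and skipping the "before" read for creates and the "after" read for deletes is a design choice of the sketch.

```typescript
type AuditParams = {
  toolId: string;
  entityType: string;
  entityId: string;
  action: "create" | "update" | "delete";
  before?: unknown;
  after?: unknown;
  approvalId?: string;
  durationMs?: number;
};

async function withEntityAudit<T>(
  params: Omit<AuditParams, "before" | "after" | "durationMs">,
  loadSnapshot: () => Promise<unknown>,
  mutate: () => Promise<T>,
  writeAudit: (p: AuditParams) => Promise<void>,
): Promise<T> {
  const start = Date.now();
  // A create has nothing to snapshot before; a delete has nothing after.
  const before = params.action === "create" ? undefined : await loadSnapshot();
  const result = await mutate();
  const after = params.action === "delete" ? undefined : await loadSnapshot();
  // Fire-and-forget: not awaited, and a failed write is only reported,
  // so audit problems never fail or delay the tool call itself.
  void writeAudit({ ...params, before, after, durationMs: Date.now() - start })
    .catch((err) => console.error("audit log failed", err));
  return result;
}
```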

MCP Governance — tracing, rate limiting, audit

MCP handlers previously had no usage tracking, rate limiting, or audit logging, so whatever external agents such as Claude Code did was a complete black box. Three layers closed this Observability Gap.

// 1. McpToolInvocation — call records
model McpToolInvocation {
  id          String   @id @default(cuid())
  workspaceId String
  clientId    String              // which client
  userId      String
  toolName    String              // which tool
  status      String              // success | error
  durationMs  Int?
  createdAt   DateTime @default(now())
  // args and result are intentionally excluded for sensitive-data risk

  @@index([workspaceId, createdAt(sort: Desc)])
  @@index([clientId, createdAt(sort: Desc)])
}

McpToolInvocation — records of external-agent tool calls

// 2. MCP rate limiting — Redis sliding window
const READ_LIMIT  = 100;  // 100 per minute
const WRITE_LIMIT = 10;   // 10 per minute

// Separate read and write for differentiated limits
const WRITE_TOOLS = new Set(["reopt_eav_record_bulk_update"]);

// Fail-open on Redis failure — a rate-limiter outage does not block tools
// 429 responses include the Retry-After header

MCP Rate Limiting — differentiated read/write rate limits
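The fragment above names the policy but not the mechanics. Below is a minimal sketch of a sliding-window-log limiter with the same read/write split, fail-open behavior, and Retry-After value. To keep it runnable it uses an in-memory `WindowStore` instead of Redis; a Redis version typically keeps a sorted set of timestamps per key and must make the check-and-add atomic, for example with MULTI or a Lua script. `checkRateLimit`, `WindowStore`, and the per-client key format are hypothetical.

```typescript
const WINDOW_MS = 60_000;
const READ_LIMIT = 100; // per minute
const WRITE_LIMIT = 10; // per minute
const WRITE_TOOLS = new Set(["reopt_eav_record_bulk_update"]);

// Stand-in for Redis: a per-key log of request timestamps.
interface WindowStore {
  // Drops timestamps at or before windowStart and returns the ones that remain.
  recent(key: string, windowStart: number): number[];
  add(key: string, ts: number): void;
}

class MemoryWindowStore implements WindowStore {
  private data = new Map<string, number[]>();
  recent(key: string, windowStart: number): number[] {
    const kept = (this.data.get(key) ?? []).filter((t) => t > windowStart);
    this.data.set(key, kept);
    return kept;
  }
  add(key: string, ts: number): void {
    const list = this.data.get(key) ?? [];
    list.push(ts);
    this.data.set(key, list);
  }
}

type LimitResult = { allowed: true } | { allowed: false; retryAfterSec: number };

function checkRateLimit(store: WindowStore, clientId: string, toolName: string, now: number = Date.now()): LimitResult {
  const isWrite = WRITE_TOOLS.has(toolName);
  const limit = isWrite ? WRITE_LIMIT : READ_LIMIT;
  const key = `mcp:rl:${clientId}:${isWrite ? "write" : "read"}`;
  try {
    const recent = store.recent(key, now - WINDOW_MS);
    if (recent.length >= limit) {
      // A slot opens when the oldest call in the window expires.
      const retryAfterMs = Math.min(...recent) + WINDOW_MS - now;
      return { allowed: false, retryAfterSec: Math.max(1, Math.ceil(retryAfterMs / 1000)) };
    }
    store.add(key, now);
    return { allowed: true };
  } catch {
    // Fail-open: if the store is unavailable, let the call through.
    return { allowed: true };
  }
}
```

A handler that receives `allowed: false` would answer with 429 and set Retry-After to `retryAfterSec`.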

// 3. withAudit: non-blocking audit wrapper
export function withAudit(handler: McpHandler): McpHandler {
  return async (params) => {
    const start = Date.now();
    // fire-and-forget: the log write is not awaited, so it never delays the response,
    // and .catch() keeps a failed write from becoming an unhandled promise rejection
    // (which Node.js treats as fatal by default since v15)
    const record = (status: "success" | "error") => {
      void logMcpInvocation({ ...params, status, durationMs: Date.now() - start })
        .catch((e) => console.error("MCP audit log failed", e));
    };
    try {
      const result = await handler(params);
      record("success");
      return result;
    } catch (err) {
      record("error");
      throw err;
    }
  };
}

withAudit — a non-blocking wrapper that prevents audit failures from blocking tool execution

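The three layers are composed in the order rate-limit → audit → handler: the rate limiter blocks excessive calls first, the calls that pass are audited, and then the handler runs. One consequence is that calls rejected with a 429 never reach the audit layer, so withAudit never records them. Each layer is independent: because the rate limiter fails open and the audit wrapper never blocks, a failure in either one does not take down the handler.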
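Below is a minimal sketch of that composition. It assumes handlers are plain async functions and uses hypothetical `withRateLimit`, `withAuditStub`, and `compose` names. Both layers are stubbed so that the ordering itself is visible, including the fact that a rejected call never reaches the audit layer.

```typescript
type Handler = (toolName: string) => Promise<string>;
type Middleware = (next: Handler) => Handler;

class RateLimitError extends Error {
  retryAfterSec: number;
  constructor(retryAfterSec: number) {
    super("rate limited");
    this.retryAfterSec = retryAfterSec;
  }
}

const trace: string[] = []; // records which layers a call passed through

const withRateLimit = (isAllowed: (tool: string) => boolean): Middleware => (next) => async (tool) => {
  trace.push("rate-limit");
  if (!isAllowed(tool)) throw new RateLimitError(60); // surfaced to the client as 429 + Retry-After
  return next(tool);
};

const withAuditStub: Middleware = (next) => async (tool) => {
  trace.push("audit");
  return next(tool);
};

// compose(a, b)(h) === a(b(h)): the first layer listed runs first.
const compose = (...layers: Middleware[]) => (h: Handler): Handler =>
  layers.reduceRight((acc, layer) => layer(acc), h);

const handler: Handler = async (tool) => {
  trace.push("handler");
  return `ran ${tool}`;
};

const pipeline = compose(withRateLimit((t) => t !== "blocked_tool"), withAuditStub)(handler);
```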

Governance Dashboard — operational visibility

Audit logs and MCP traces provide no governance if no one can see them. Two dashboards were added to the admin app.

  • AI Audit page — audit logs and approval records separated into tabs. Filter by agentId, toolId, status, and date. Expandable before/after JSON view. 50 entries per page.
  • MCP Usage page — summary cards for total calls, success rate, and average response time. A chart of the top 20 tools by call frequency. Recent call logs. 7 / 30 / 90 day range selector.
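The MCP Usage summary cards and top-tools chart are aggregations over McpToolInvocation rows. Below is a minimal sketch over in-memory rows, assuming a hypothetical `summarizeMcpUsage` function. Rows without durationMs are excluded from the average; in production this would more likely be a SQL aggregate.

```typescript
type Invocation = {
  toolName: string;
  status: "success" | "error";
  durationMs?: number | null;
  createdAt: Date;
};

type UsageSummary = {
  totalCalls: number;
  successRate: number; // 0..1
  avgDurationMs: number | null;
  topTools: Array<{ toolName: string; calls: number }>;
};

function summarizeMcpUsage(rows: Invocation[], rangeDays: 7 | 30 | 90, now = new Date(), topN = 20): UsageSummary {
  const since = now.getTime() - rangeDays * 24 * 60 * 60 * 1000;
  const inRange = rows.filter((r) => r.createdAt.getTime() >= since);
  const successes = inRange.filter((r) => r.status === "success").length;
  const timed = inRange.filter((r) => r.durationMs != null); // untimed rows skip the average
  const byTool = new Map<string, number>();
  for (const r of inRange) byTool.set(r.toolName, (byTool.get(r.toolName) ?? 0) + 1);
  return {
    totalCalls: inRange.length,
    successRate: inRange.length ? successes / inRange.length : 0,
    avgDurationMs: timed.length ? timed.reduce((s, r) => s + (r.durationMs ?? 0), 0) / timed.length : null,
    topTools: [...byTool.entries()]
      .map(([toolName, calls]) => ({ toolName, calls }))
      .sort((a, b) => b.calls - a.calls)
      .slice(0, topN),
  };
}
```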

Agent Usage Statistics rolls up daily totals (session count, tokens, credits) from the AiCreditLedger and Chat tables. This is the foundation for tracking cost per agent.
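The rollup itself is a group-by on (agent, day). Below is a minimal sketch under several assumptions: ledger entries have already been joined to their chat, so each carries agentId, chatId, tokens, credits, and a timestamp (a hypothetical `LedgerEntry` shape, since the real AiCreditLedger and Chat columns are not shown here); one chat counts as one session; and days are bucketed in UTC.

```typescript
type LedgerEntry = {
  agentId: string;
  chatId: string;
  tokens: number;
  credits: number;
  createdAt: Date;
};

type DailyAgentUsage = {
  agentId: string;
  day: string; // YYYY-MM-DD in UTC
  sessions: number; // distinct chats on that day
  tokens: number;
  credits: number;
};

function rollupDaily(entries: LedgerEntry[]): DailyAgentUsage[] {
  const groups = new Map<string, { usage: DailyAgentUsage; chats: Set<string> }>();
  for (const e of entries) {
    const day = e.createdAt.toISOString().slice(0, 10); // UTC calendar day
    const key = `${e.agentId}|${day}`;
    let g = groups.get(key);
    if (!g) {
      g = { usage: { agentId: e.agentId, day, sessions: 0, tokens: 0, credits: 0 }, chats: new Set() };
      groups.set(key, g);
    }
    g.chats.add(e.chatId);
    g.usage.tokens += e.tokens;
    g.usage.credits += e.credits;
    g.usage.sessions = g.chats.size;
  }
  return [...groups.values()].map((g) => g.usage);
}
```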

These dashboards enable the SHARPEN phase of the OCLS loop. To adjust boundaries from operational data, you must first be able to see operational data.

Remaining work

The governance implementation closed much of the Observability Gap and Contract Gap, but work remains.

  • No risk-tier classification: every tool that requires approval gets the same level of approval. A low/medium/high risk matrix is needed.
  • No cross-agent collaboration: agent-to-agent handoffs and context transfer are not supported. The transition to Stage 3 (Multi-Agent Collaboration) is still ahead.
  • No rationale: audit logs record what changed, not why the decision was made.
  • Limited audit coverage: only 6 of the 32 registered tools have audit logging. It needs to be extended to the create/update/delete tools in the Canvas and Document domains.

On reopt architecture's evolution scale, Reopt sits at the transition between Stage 2 (Responsibility Split) and Stage 3 (Multi-Agent Collaboration). With the governance infrastructure in place, there is now a foundation for designing inter-agent collaboration rules.

Conclusion — structure isn't finished in one pass

Reopt's governance implementation proceeded in two steps. First, the existing system was diagnosed with reopt architecture's patterns to identify the gaps. Second, those gaps were filled at the infrastructure level. The universal audit log (AiToolAuditLog), persistent approval records (AiToolApprovalRecord), MCP tracing and rate limiting (McpToolInvocation + Redis rate limiter), agent constraints, and the admin dashboard are the result.

Three core design principles drove it: retention-first (logs outlive the original data), fire-and-forget (observation does not block execution), fail-open (an infrastructure failure does not collapse the feature). These principles came from the production-SaaS reality that governance must not sacrifice performance or stability.

As reopt architecture's OCLS loop says, governance is not finished in one pass; it improves iteratively against operational data. This implementation completed the first loop. The data collected in the SHARPEN phase will trigger the next OWN phase — risk-tier classification, cross-agent collaboration, and rationale recording.

Tags

reopt, production, audit, approval, MCP, governance, guardrails

Related patterns

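  • Responsibility Partitioning: clearly define who owns each outcome and where responsibility boundaries lie.
  • Module Contract: declare the conditions, permissions, and failure paths of each execution unit as a contract.
  • Context Routing: design the information flow so that each actor receives only the information it needs.
  • Human Approval: keep high-cost, high-risk, high-impact decisions inside a human approval flow.
  • Decision Traceability: record the basis for judgments, the reasons for choices, and collaboration paths as structured logs.
  • Evaluation and Guardrails: separate acceptable judgments from risky ones using evaluation criteria and safety rules.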