Building the Agent Catalog architecture with opt-harness

Eric Han, Co-founder / CTO, Reopt AI

April 14, 2026 · 11 min read

A practical case study of using @reopt-ai/opt-harness, the design-governance harness package, to translate the Agent Catalog's architecture principles into a real product implementation.

When I wrote this, I believed that more general, stronger rules would keep agent behavior in line. So I designed the contracts and policies carefully, and even built the modules and tooling to enforce them, aiming for a powerful design harness.

It didn't play out that way. The smarter the model and the stronger the contract, the less room there was to adapt. Running the loop didn't improve the design; if anything, problem-solving got worse. Strong contracts were caging the model's judgment rather than guiding it.

So I shelved this module and changed direction. Instead of piling up tooling for the harness, I now aim for a shape where governance falls out naturally as you solve the problem. These days, at reopt design, I'm paring the modules and tools down and rebuilding them within the narrower scope of design.

This article stays up as a record of the thinking from before that turn.

Overview — what a design-governance harness is

reopt architecture is a methodology for designing AI-native systems through agent-level responsibility, module contracts, and operational governance. But a methodology, by itself, is not code. For principles to seep into product code, you need infrastructure that enforces and measures them.

opt-harness is the package that fills that gap — a design-governance harness. It solves the problems CSS tokens alone cannot — app-shell structure, workspace recipes, page rhythm, state-UX consistency, engine-adapter integration — through a declarative manifest and a policy system.

This article maps opt-harness's core concepts onto reopt architecture's principles and walks through the implementation with real code.

Manifest-Driven Architecture

opt-harness starts with the HarnessManifest. It is a static, declarative descriptor of one product surface. Just as the Module Contract pattern says "define a module's input/output and responsibility explicitly," HarnessManifest declares the contract of the entire app as a single object.

interface HarnessManifest {
  id: string;                    // unique app identifier
  label: string;                 // human-readable name
  description?: string;
  audience?: "internal" | "external";
  routeGroups?: readonly HarnessRouteGroup[];
  defaults?: HarnessPolicy;      // policy defaults
  contract?: HarnessContractRegistry;  // audit rules
}

HarnessManifest — the descriptor that declares the app's structural contract

The manifest is compiled by the createHarnessApp factory. During compilation, policy defaults are resolved, the contract registry is validated, and theme tokens are generated. The result, CompiledHarnessManifest, is injected at runtime through HarnessProvider.

import { createHarnessApp } from "@reopt-ai/opt-harness/core";

const app = createHarnessApp({
  id: "agent-catalog",
  label: "reopt architecture",
  audience: "external",
  defaults: {
    density: "comfortable",
    contentWidth: "normal",
    navigationMode: "stacked",
  },
  contract: {
    designDocument: { requiredSections: ["Overview", "Recipes"] },
    rolloutTargets: [
      { kind: "layout", relativePath: "app/layout.tsx" },
      {
        kind: "recipe-screen",
        relativePath: "app/patterns/page.tsx",
        expectedWorkspace: "ListWorkspace",
        expectedRecipe: "list",
      },
    ],
  },
});

createHarnessApp — compiles the manifest into a runtime-usable form

This structure mirrors how the Governance layer in reopt architecture's four-layer model organizes and controls the lower layers. The manifest declares the app's governance boundaries; the recipes and slots below it operate within those boundaries.

Five canonical workspace recipes

opt-harness classifies every product screen as one of five canonical recipes. A recipe defines the screen's semantic intent and enforces a layout contract that fits it. This is the Responsibility Partitioning pattern in practice — "don't pile features on without role and boundary."

list — screens for scanning, filtering, and bulk-acting on data rows. Default width: wide. Required slots: header, content.
detail — screens for inspecting a single resource. Default width: normal. Required slots: header, content.
editor — document authoring and review. Save/draft state is a first-class concern. Default width: wide.
dashboard — overview, triage, metrics, next actions. Default width: full.
landing — public pages. Hero, CTA stack. Default width: full.

Recipe selection is also programmatic. The selectRecipe function takes signals from the screen and returns the most appropriate recipe.

import { selectRecipe } from "@reopt-ai/opt-harness/core";

selectRecipe({ hasDataGrid: true });        // → "list"
selectRecipe({ hasEditor: true });          // → "editor"
selectRecipe({ primaryAction: "inspect" }); // → "detail"
selectRecipe({ isPublicFacing: true });     // → "landing"
selectRecipe({});                           // → "dashboard" (fallback)

selectRecipe — signal-based recipe selection heuristic
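To make the heuristic concrete, here is a minimal sketch of how signal-based selection could work. It is not the package source: the function name selectRecipeSketch is hypothetical, the signal names are taken from the calls above, and the precedence between signals is my assumption.

```typescript
// Hypothetical sketch, not the package source. Signal names follow the
// calls shown above; the precedence order between signals is an assumption.
type RecipeName = "list" | "detail" | "editor" | "dashboard" | "landing";

interface RecipeSignals {
  hasDataGrid?: boolean;
  hasEditor?: boolean;
  primaryAction?: string;
  isPublicFacing?: boolean;
}

function selectRecipeSketch(signals: RecipeSignals): RecipeName {
  // Check signals in a fixed order so combined inputs resolve deterministically.
  if (signals.isPublicFacing) return "landing";
  if (signals.hasEditor) return "editor";
  if (signals.hasDataGrid) return "list";
  if (signals.primaryAction === "inspect") return "detail";
  // No recognized signal: fall back to the overview recipe.
  return "dashboard";
}

// "editor" under the assumed precedence (editor is checked before data grid).
console.log(selectRecipeSketch({ hasDataGrid: true, hasEditor: true }));
```

A fixed precedence keeps the result predictable when a screen sends more than one signal, which is the case the single-signal examples do not cover.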

Each recipe also carries metadata for AI agents: intent (why this recipe exists), selectionHeuristics (signals that should pick this recipe), and antiPatterns (common mistakes on this recipe). When an agent automates screen design, this metadata helps it avoid wrong choices.

Phase-Aware AI Agent Protocol

opt-harness's most distinctive design is that the agent protocol is aware of the development phase. Just as reopt architecture's OCLS loop (Own → Contract → Layer → Sharpen) is the cyclical model for governance design, opt-harness defines four phases: scaffold → implement → polish → audit.

[Diagram: opt-harness Agent Protocol, the four-phase development loop (scaffold → implement → polish → audit)]

The key is that the information exposed to the agent is limited by phase. In scaffold only recipe selection is exposed; in implement only the slots and adapters of the locked recipe; in polish only policy options; in audit only audit findings and remediation patterns.

import { generateHarnessContext } from "@reopt-ai/opt-harness/core";

// scaffold phase: expose only recipe information
const scaffoldCtx = generateHarnessContext(manifest, {
  phase: "scaffold",
});

// implement phase: expose only the chosen recipe's slots
const implCtx = generateHarnessContext(manifest, {
  phase: "implement",
  lockedRecipe: "list",
});

// audit phase: expose only audit findings and remediation patterns
const auditCtx = generateHarnessContext(manifest, {
  phase: "audit",
});

generateHarnessContext — phase-scoped agent context generation

This design reflects a core OCLS-loop insight: exposing every piece of information at once degrades the agent's decision accuracy. Providing only the relevant information per phase reduces token overhead, lowers decision noise, and reduces hallucination risk.
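The phase scoping described above can be sketched as a function that projects a full context down to what one phase is allowed to see. This is an illustration under stated assumptions, not how generateHarnessContext is implemented: FullHarnessContext, PhaseScopedContext, and scopeContextToPhase are hypothetical names, and the fields are simplified to string lists.

```typescript
// Hypothetical sketch of phase-scoped context, not the package implementation.
type HarnessPhase = "scaffold" | "implement" | "polish" | "audit";

interface FullHarnessContext {
  recipes: string[];
  slotsByRecipe: Record<string, string[]>;
  policyOptions: string[];
  auditFindings: string[];
}

// Each phase gets its own shape, so out-of-phase fields cannot leak by accident.
type PhaseScopedContext =
  | { phase: "scaffold"; recipes: string[] }
  | { phase: "implement"; recipe: string; slots: string[] }
  | { phase: "polish"; policyOptions: string[] }
  | { phase: "audit"; auditFindings: string[] };

function scopeContextToPhase(
  full: FullHarnessContext,
  phase: HarnessPhase,
  lockedRecipe?: string,
): PhaseScopedContext {
  switch (phase) {
    case "scaffold":
      return { phase, recipes: full.recipes };
    case "implement": {
      // The implement phase only makes sense once a recipe has been locked.
      if (!lockedRecipe) throw new Error("implement phase requires a locked recipe");
      return { phase, recipe: lockedRecipe, slots: full.slotsByRecipe[lockedRecipe] ?? [] };
    }
    case "polish":
      return { phase, policyOptions: full.policyOptions };
    case "audit":
      return { phase, auditFindings: full.auditFindings };
  }
}
```

Modeling the result as a discriminated union means the type system, not convention, decides what each phase can read.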

Policy Resolution System

In opt-harness, Policy is the implementation of governance constraints. Just as the Evaluation & Guardrails pattern says "declare not only what the agent must do but what it must never do at design time," HarnessPolicy declaratively restricts what the UI is allowed to do.

interface HarnessPolicy {
  density: "comfortable" | "compact";
  contentWidth: "narrow" | "normal" | "wide" | "full";
  navigationMode: "sidebar" | "stacked";
  motionPolicy: "full" | "reduced";
  stateLabels: HarnessStateLabels;
  panelBehavior: HarnessPanelBehavior;
  adapters: {
    datagrid: { chrome: "card" | "plain" };
    editor: { chrome: "card" | "plain" };
  };
  theme: HarnessThemeConfig;
}

HarnessPolicy — UI behavior limits declared as data

Policy resolution is hierarchical. The manifest's defaults provide the baseline; runtime overrides overwrite it. resolveHarnessPolicy merges the two inputs to produce the final ResolvedHarnessPolicy. This is similar to the CSS cascade, but type-safe and verifiable.

density — gap and size scale. comfortable (default) for generous padding; compact for high-density data views.
contentWidth — maximum page width. Recipes have defaults, but policy can override.
navigationMode — sidebar or stacked (top fixed). Decided by app scale.
motionPolicy — animation intensity. reduced minimizes transitions for accessibility.
stateLabels — global text for loading, empty, and error states.
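The hierarchical resolution described above can be sketched as a layered merge. This is a simplified illustration, not resolveHarnessPolicy itself: PolicySketch covers only four of the fields, resolvePolicySketch is a hypothetical name, and the built-in fallback values other than density are assumptions.

```typescript
// Hypothetical sketch of layered policy resolution, not the package implementation.
interface PolicySketch {
  density: "comfortable" | "compact";
  contentWidth: "narrow" | "normal" | "wide" | "full";
  navigationMode: "sidebar" | "stacked";
  motionPolicy: "full" | "reduced";
}

// Assumed built-in fallbacks; only density's default is stated in the article.
const BUILT_IN_POLICY: PolicySketch = {
  density: "comfortable",
  contentWidth: "normal",
  navigationMode: "sidebar",
  motionPolicy: "full",
};

function resolvePolicySketch(
  manifestDefaults: Partial<PolicySketch>,
  runtimeOverrides: Partial<PolicySketch> = {},
): Readonly<PolicySketch> {
  // Later layers win: built-in values, then manifest defaults, then runtime overrides.
  return Object.freeze({ ...BUILT_IN_POLICY, ...manifestDefaults, ...runtimeOverrides });
}

const resolvedSketch = resolvePolicySketch(
  { density: "comfortable", contentWidth: "normal", navigationMode: "stacked" },
  { contentWidth: "wide" },
);
console.log(resolvedSketch.contentWidth);   // "wide" (runtime override)
console.log(resolvedSketch.navigationMode); // "stacked" (manifest default)
```

Because every field is a closed union, an invalid override such as contentWidth: "huge" fails at compile time, which is the "type-safe and verifiable" difference from the CSS cascade.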

An ESLint rule prevents bypassing the policy. For instance, using a Tailwind class like max-w-* directly produces a warning — every page width must go through policy.contentWidth. This is "structure is governance" in practice.
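The check behind that warning can be approximated with a standalone function. This is not the package's ESLint rule, which I am not reproducing here; findMaxWidthBypass is a hypothetical helper that only shows the kind of class-name match such a rule would need, including variant-prefixed classes like md:max-w-5xl.

```typescript
// Hypothetical helper, not the package's ESLint rule.
// Returns every Tailwind max-width utility found in a className string.
function findMaxWidthBypass(className: string): string[] {
  return className
    .split(/\s+/)
    // Allow any number of variant prefixes (md:, hover:, ...) before max-w-.
    .filter((token) => /^(?:[\w-]+:)*max-w-/.test(token));
}

console.log(findMaxWidthBypass("mx-auto max-w-3xl md:max-w-5xl p-4"));
// ["max-w-3xl", "md:max-w-5xl"]
```

A real rule would run this kind of match over className attributes in the syntax tree and report each hit with a pointer to policy.contentWidth.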

Slot-Based Composition

A recipe defines the contract of named slots. There are six slots: header, toolbar, filters, content, aside, and footer. Which of them are required and which are optional varies by recipe. The structure lets an agent reason about the layout without rendering.

[Diagram: the ListWorkspace slot structure, with required/optional marked on each node label]

<ListWorkspace
  header={<PageHeader title="Pattern Catalog" />}
  toolbar={<FilterBar categories={categoryOrder} />}
  filters={<ActiveFilters selected={selectedCategory} />}
  content={
    <HarnessDataGridAdapter
      loading={isLoading}
      empty={patterns.length === 0 ? { title: "No patterns" } : null}
      error={error}
    >
      <PatternGrid patterns={patterns} />
    </HarnessDataGridAdapter>
  }
  aside={<PatternPreview selected={selectedPattern} />}
  footer={<BulkActions />}
/>

Slot-based composition — independent components placed into each slot

The slot contract is the UI-level answer to reopt architecture's core question, "who owns this outcome?" The header slot owns the page identity, the content slot owns the core data, and the aside slot owns the supplementary context. With clear owners, the blast radius of any change is also clear.
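Because the slot contract is plain data, it can be checked without rendering anything. The sketch below assumes a simplified contract shape; SlotContract, LIST_SLOT_CONTRACT, and validateSlotsSketch are hypothetical names, and treating the four non-required list slots as optional is my assumption.

```typescript
// Hypothetical sketch of a slot-contract check, not the package implementation.
type SlotName = "header" | "toolbar" | "filters" | "content" | "aside" | "footer";

interface SlotContract {
  required: readonly SlotName[];
  optional: readonly SlotName[];
}

// The article states that list requires header and content;
// the optional set here is an assumption.
const LIST_SLOT_CONTRACT: SlotContract = {
  required: ["header", "content"],
  optional: ["toolbar", "filters", "aside", "footer"],
};

function validateSlotsSketch(
  contract: SlotContract,
  provided: readonly string[],
): { missing: SlotName[]; unexpected: string[] } {
  const allowed = new Set<string>([...contract.required, ...contract.optional]);
  return {
    // Required slots that the screen did not fill.
    missing: contract.required.filter((slot) => !provided.includes(slot)),
    // Slots the screen filled that the recipe does not define.
    unexpected: provided.filter((slot) => !allowed.has(slot)),
  };
}

console.log(validateSlotsSketch(LIST_SLOT_CONTRACT, ["toolbar", "sidebar"]));
// { missing: ["header", "content"], unexpected: ["sidebar"] }
```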

MCP Handlers — an agent interface with reads and writes

opt-harness provides more than 15 server-side MCP (Model Context Protocol) handlers. Early on there were only read handlers; through governance improvements, write handlers for recording decisions, requesting approval, and registering agents have been added. Just as the Module Contract pattern says "communication between modules is by explicit contract," MCP handlers are the bidirectional contract between agents and the harness.

listHarnessManifests — return a summary list of every registered app manifest.
getHarnessRecipes — return the full definition of the five canonical recipes.
getHarnessRecipeDetail — return detailed metadata for a specific recipe.
resolveHarnessPolicyMCP — resolve the final policy from manifest and overrides.
getHarnessCompletenessScore — compute a 0–100 score from audit findings.
getStructuredHarnessContext — return phase-scoped agent context as JSON.
getHarnessCoverage — return the route-coverage ratio.
searchHarnessPatternsHandler — keyword search across 50+ patterns.
recordHarnessDecision — write a governance decision to the audit log (write).
getHarnessDecisionLog — query decision history filtered by agent or code.
getHarnessScoreHistory — return score time series with trend and regression info.
requestHarnessApproval — request approval for a guard code (write).
resolveHarnessApproval — accept or reject a pending approval (write).
registerHarnessAgent — register an agent session and report conflicts (write).
detectHarnessConflicts — detect conflicts between registered agents.

Adding write handlers was a pivotal shift. The agent moved from a passive reader of the harness's state to an active participant that records decisions, requests approvals, and registers itself. This is the infrastructure-level realization of the Decision Traceability pattern's requirement that "every decision must be traceable."

Completeness Scoring — quantifying scalable governance

reopt architecture says, "if you can't measure it, it isn't governance." opt-harness's Completeness Scoring is that principle's implementation. It takes audit findings as input and produces a weighted score (0–100) per category. Seven default categories are provided, and HarnessCompletenessConfig lets you inject domain-specific custom categories and weights.

layout (20%) — HarnessProvider existence, manifest binding, layout-rule compliance.
recipe-contract (25%) — correct workspace component usage, required slots present.
adapter-ownership (15%) — correct wrapping of DataGrid/Editor and custom adapters.
state-ux (10%) — consistent handling of loading, empty, and error states.
design-document (10%) — DESIGN.md exists and contains required sections.
scaffold-bundle (10%) — completeness of scaffolding outputs.
accessibility (10%) — ARIA landmarks, heading hierarchy, skip links.

import { computeCompletenessScore } from "@reopt-ai/opt-harness/core";

const score = computeCompletenessScore(auditFindings, totalCheckCount);
// → {
//     score: 78,
//     categories: {
//       layout: { score: 100, weight: 20, findings: [] },
//       "recipe-contract": { score: 60, weight: 25, findings: [...] },
//       "adapter-ownership": { score: 85, weight: 15, findings: [...] },
//       ...
//     }
//   }

computeCompletenessScore — weighted score across seven categories

This score sets the agent's remediation priority. A low recipe-contract score (weight 25%) is fixed first; accessibility (weight 10%) waits.
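The weighting and the resulting priority can be sketched as follows. This is an assumption-laden illustration, not computeCompletenessScore: it takes per-category scores as given rather than deriving them from findings, the names overallScore and remediationOrder are hypothetical, ranking by weighted shortfall is my reading of "remediation priority," and the sample scores are invented.

```typescript
// Hypothetical sketch of weighted scoring, not the package implementation.
interface CategoryScoreSketch {
  category: string;
  score: number;  // 0-100 for this category
  weight: number; // relative weight; does not have to sum to 100
}

function overallScore(categories: readonly CategoryScoreSketch[]): number {
  const totalWeight = categories.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight === 0) return 0;
  const weightedSum = categories.reduce((sum, c) => sum + c.score * c.weight, 0);
  // Dividing by the total weight keeps the result on a 0-100 scale.
  return Math.round(weightedSum / totalWeight);
}

function remediationOrder(categories: readonly CategoryScoreSketch[]): string[] {
  return categories
    .map((c) => ({ category: c.category, lostPoints: (100 - c.score) * c.weight }))
    .filter((c) => c.lostPoints > 0)
    .sort((a, b) => b.lostPoints - a.lostPoints)
    .map((c) => c.category);
}

// Illustrative scores only; the weights are the seven defaults listed above.
const sampleCategories: CategoryScoreSketch[] = [
  { category: "layout", score: 100, weight: 20 },
  { category: "recipe-contract", score: 60, weight: 25 },
  { category: "adapter-ownership", score: 80, weight: 15 },
  { category: "state-ux", score: 100, weight: 10 },
  { category: "design-document", score: 100, weight: 10 },
  { category: "scaffold-bundle", score: 100, weight: 10 },
  { category: "accessibility", score: 50, weight: 10 },
];

console.log(overallScore(sampleCategories));     // 82
console.log(remediationOrder(sampleCategories)); // ["recipe-contract", "accessibility", "adapter-ownership"]
```

Note that accessibility has the lowest raw score here but still ranks behind recipe-contract, because the shortfall is multiplied by the category weight.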

// Injecting a custom category from the manifest
const config: HarnessCompletenessConfig = {
  categories: [
    {
      category: "security",
      weight: 30,
      codes: ["missing-auth-check", "exposed-api-key"],
    },
  ],
  replaceDefaults: false,  // add to the default seven (true uses only custom)
};

const score = computeCompletenessScore(findings, totalChecks, config);

HarnessCompletenessConfig — inject domain-specific custom categories and weights

Pattern Index & Contract Registry

opt-harness indexes more than 50 design patterns. Recipes, slots, adapters, policies, antipatterns, and audit rules are all indexed in a searchable form. When an agent asks "how do I build a screen with a data grid and filters?", the pattern index returns the relevant recipes and slot contracts.

import { searchHarnessPatterns } from "@reopt-ai/opt-harness/core";

const patterns = searchHarnessPatterns("datagrid filter", 3);
// → [
//   { tag: "recipe", name: "list", description: "Scan, filter, ..." },
//   { tag: "slot", name: "filters", description: "Active filter chips ..." },
//   { tag: "adapter", name: "HarnessDataGridAdapter", description: "..." },
// ]

searchHarnessPatterns — keyword search across 50+ design patterns
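A keyword search of this kind can be sketched in a few lines. This is not the package's search: searchPatternsSketch is a hypothetical name, the ranking by number of matched terms is an assumption, and the sample entries are illustrative rather than taken from the real index.

```typescript
// Hypothetical sketch of keyword search over a pattern index.
interface PatternEntry {
  tag: string;
  name: string;
  description: string;
}

function searchPatternsSketch(
  index: readonly PatternEntry[],
  query: string,
  limit: number,
): PatternEntry[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return index
    .map((entry) => {
      const haystack = `${entry.tag} ${entry.name} ${entry.description}`.toLowerCase();
      // Rank by how many query terms appear anywhere in the entry.
      const hits = terms.filter((term) => haystack.includes(term)).length;
      return { entry, hits };
    })
    .filter((scored) => scored.hits > 0)
    .sort((a, b) => b.hits - a.hits)
    .slice(0, limit)
    .map((scored) => scored.entry);
}

// Illustrative entries, not the package's index.
const sampleIndex: PatternEntry[] = [
  { tag: "recipe", name: "list", description: "Scan and filter rows in a datagrid" },
  { tag: "slot", name: "filters", description: "Active filter chips" },
  { tag: "policy", name: "density", description: "Gap and size scale" },
  { tag: "adapter", name: "HarnessDataGridAdapter", description: "Wraps a data grid with state handling" },
];

console.log(searchPatternsSketch(sampleIndex, "datagrid filter", 3).map((p) => p.name));
// ["list", "filters", "HarnessDataGridAdapter"]
```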

The Contract Registry is a set of audit targets. Each target is a machine-readable rule: "this file must use this workspace," "this route must contain this code snippet."

layout — HarnessProvider and manifest binding exist in the root layout.
recipe-screen — a specific route uses the expected workspace component.
fullscreen-tool — a fullscreen tool wraps HarnessFullscreenToolSurface.
a11y — accessibility rules: ARIA landmarks, heading hierarchy, skip links.
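To show what "machine-readable rule" means for the first two target kinds, here is a minimal sketch that audits one file's source text against one target. It rests on several assumptions: auditTargetSketch and the Sketch types are hypothetical, the finding code missing-harness-provider is invented for illustration, and plain text matching stands in for whatever analysis the package actually performs.

```typescript
// Hypothetical sketch of auditing one contract target, not the package implementation.
interface LayoutTargetSketch {
  kind: "layout";
  relativePath: string;
}

interface RecipeScreenTargetSketch {
  kind: "recipe-screen";
  relativePath: string;
  expectedWorkspace: string;
  expectedRecipe: string;
}

type RolloutTargetSketch = LayoutTargetSketch | RecipeScreenTargetSketch;

interface AuditFindingSketch {
  code: string;
  relativePath: string;
  message: string;
}

function auditTargetSketch(target: RolloutTargetSketch, source: string): AuditFindingSketch[] {
  const component = target.kind === "layout" ? "HarnessProvider" : target.expectedWorkspace;
  // Component names are plain identifiers, so no regex escaping is needed here.
  // Text matching is a simplification; it does not parse the JSX.
  if (new RegExp(`<${component}\\b`).test(source)) return [];
  return [
    {
      // "missing-harness-provider" is an invented code for this sketch.
      code: target.kind === "layout" ? "missing-harness-provider" : "missing-workspace-component",
      relativePath: target.relativePath,
      message: `${target.relativePath} must render <${component}>`,
    },
  ];
}
```

The returned findings have the same role as the auditFindings input to Completeness Scoring: an empty list means the target's contract holds.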

These rules are used at build time (ESLint), at runtime (Completeness Scoring), and in CI/CD (SARIF output, PR review comments). They realize the Decision Traceability pattern's "every design decision must be traceable" at the infrastructure level.

Decision Traceability — the implementation of decision tracking

The Decision Traceability pattern asks, "can this agent's judgment be explained after the fact?" opt-harness implements this with an append-only Decision Log. Every decision is recorded with who (agentId), when (decidedAt), and why (rationale), and can be queried later per agent or per code.

import { createDecisionLog } from "@reopt-ai/opt-harness/core";

const log = createDecisionLog();

// An agent records a recipe-selection decision
log.record({
  code: "accept-recipe",
  kind: "accept",
  attribution: {
    agentId: "architect-agent",
    decidedAt: new Date().toISOString(),
    rationale: "Data grid is the primary content, so the list recipe is chosen",
  },
  findingCode: "missing-workspace-component",
  context: { recipe: "list", screen: "/orders" },
});

// Post-hoc queries
log.findByAgent("architect-agent");  // every decision by a specific agent
log.findByCode("accept-recipe");     // every decision under a specific code
log.latestForCode("missing-workspace-component"); // latest decision

createDecisionLog — append-only decision log for governance audit trails

HarnessDecisionRecord supports four decision kinds: accept, override, defer, reject. An agent can accept an audit result, override it with rationale, defer it for later, or reject it. This structured record closes reopt architecture's Observability Gap — the state where reasoning paths and decision rationale are not recorded, making post-hoc analysis impossible.
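An append-only log with these query methods can be sketched with a closure over a private array. This is a simplified stand-in, not createDecisionLog: the names carry a Sketch suffix to mark them as hypothetical, the context field is omitted, and matching latestForCode against either code or findingCode is my assumption based on the calls shown above.

```typescript
// Hypothetical sketch of an append-only decision log, not the package implementation.
type DecisionKind = "accept" | "override" | "defer" | "reject";

interface DecisionRecordSketch {
  code: string;
  kind: DecisionKind;
  attribution: { agentId: string; decidedAt: string; rationale: string };
  findingCode?: string;
}

function createDecisionLogSketch() {
  // Private array: callers can append and query, but never edit or delete.
  const records: DecisionRecordSketch[] = [];
  return {
    record(entry: DecisionRecordSketch): void {
      // Store a frozen copy so later mutation of the caller's object cannot alter history.
      records.push(Object.freeze({ ...entry, attribution: Object.freeze({ ...entry.attribution }) }));
    },
    findByAgent(agentId: string): DecisionRecordSketch[] {
      return records.filter((r) => r.attribution.agentId === agentId);
    },
    findByCode(code: string): DecisionRecordSketch[] {
      return records.filter((r) => r.code === code);
    },
    // Assumption: the lookup key may be a decision code or a finding code.
    latestForCode(code: string): DecisionRecordSketch | undefined {
      return [...records].reverse().find((r) => r.code === code || r.findingCode === code);
    },
    all(): readonly DecisionRecordSketch[] {
      return [...records];
    },
  };
}
```

Returning copies from the query methods keeps the internal array unreachable, which is what makes the log append-only in practice rather than by convention.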

Multi-Agent Governance — conflict detection and approval

The transition from a single-agent model to a multi-agent environment is a key turning point in reopt architecture's evolution stages. opt-harness supports this transition through Agent Registry and Approval Gate.

import { createAgentRegistry } from "@reopt-ai/opt-harness/core";

const registry = createAgentRegistry();

// Register agent sessions
registry.register("architect", "scaffold", "list");
registry.register("implementer", "implement", "list");

// Detect conflict — scaffold + implement on the same recipe concurrently
const conflicts = registry.detectConflicts();
// → [{ agents: ["architect", "implementer"],
//      conflictType: "recipe",
//      message: "Agents have conflicting phases on recipe \"list\"." }]

// Check conflicts before executing an instruction
const check = registry.checkInstruction("new-agent", {
  action: "configure-policy",
  agentId: "new-agent",
});
// → would warn about a policy conflict if a polish-phase agent were registered

createAgentRegistry — multi-agent session management and conflict detection

There are three conflict rules. Two agents both implementing the same recipe is a conflict. Scaffold and implement running on the same recipe concurrently is a conflict. Two agents polishing (tuning policy) concurrently is a conflict. These rules surface as warnings to assist the agent's judgment.
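The three rules translate into a pairwise check over registered sessions. The sketch below is an illustration, not the registry's implementation: detectConflictsSketch and the Sketch types are hypothetical, the message strings are my own, and I assume the recipe rules only apply when both sessions name the same recipe.

```typescript
// Hypothetical sketch of the three conflict rules, not the package implementation.
type AgentPhaseSketch = "scaffold" | "implement" | "polish" | "audit";

interface AgentSessionSketch {
  agentId: string;
  phase: AgentPhaseSketch;
  recipe?: string;
}

interface ConflictSketch {
  agents: [string, string];
  conflictType: "recipe" | "policy";
  message: string;
}

function detectConflictsSketch(sessions: readonly AgentSessionSketch[]): ConflictSketch[] {
  const conflicts: ConflictSketch[] = [];
  sessions.forEach((a, i) => {
    // Compare each session only with the ones after it, so every pair is checked once.
    sessions.slice(i + 1).forEach((b) => {
      const phases = [a.phase, b.phase].sort().join("+");
      const sameRecipe = a.recipe !== undefined && a.recipe === b.recipe;
      if (sameRecipe && (phases === "implement+implement" || phases === "implement+scaffold")) {
        // Rules 1 and 2: two implementers, or scaffold plus implement, on one recipe.
        conflicts.push({
          agents: [a.agentId, b.agentId],
          conflictType: "recipe",
          message: `Conflicting phases on recipe "${a.recipe}"`,
        });
      } else if (phases === "polish+polish") {
        // Rule 3: two agents tuning policy at the same time.
        conflicts.push({
          agents: [a.agentId, b.agentId],
          conflictType: "policy",
          message: "Two agents are tuning policy concurrently",
        });
      }
    });
  });
  return conflicts;
}
```

The function returns data rather than throwing, which matches the article's point that these rules surface as warnings and leave the decision to the agent.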

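// validateAgentInstruction is assumed to be exported from the same entry point
import { createApprovalGate, validateAgentInstruction } from "@reopt-ai/opt-harness/core";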

const gate = createApprovalGate({
  requireApproval: ["bootstrap_bundle_required", "design_alignment_required"],
});

// Request approval for a high-risk action
const req = gate.requestApproval(
  "bootstrap_bundle_required",
  "implementer-agent",
);
// → { id: "approval-1", status: "pending", ... }

// A human reviews and approves
gate.approve(req.id, "human-reviewer");

// validateAgentInstruction validates approval automatically
const result = validateAgentInstruction(
  { action: "create-screen", recipe: "list", approvalId: req.id },
  manifest, "implement", gate, registry,
);

createApprovalGate — human approval workflow integrated into validation

Approval Gate implements reopt architecture's Human Approval pattern. The default is requireApproval: [] — no approval required, preserving backward compatibility. Registering a high-risk guard code forces human approval for that action. validateAgentInstruction integrates phase gating, approval validation, and agent-conflict detection into one validation pipeline.
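The gate's behavior, including the empty default, can be sketched as a small state machine over pending requests. This is a hypothetical stand-in for createApprovalGate: the isAllowed method and the reject method are my additions for illustration, and the one-way pending → approved/rejected transition is an assumption.

```typescript
// Hypothetical sketch of an approval gate, not the package implementation.
type ApprovalStatus = "pending" | "approved" | "rejected";

interface ApprovalRequestSketch {
  id: string;
  guardCode: string;
  requestedBy: string;
  status: ApprovalStatus;
  resolvedBy?: string;
}

function createApprovalGateSketch(options: { requireApproval?: readonly string[] } = {}) {
  // An empty set means nothing is gated, mirroring the requireApproval: [] default.
  const guarded = new Set<string>(options.requireApproval ?? []);
  const requests = new Map<string, ApprovalRequestSketch>();
  let nextId = 1;

  function settle(id: string, reviewer: string, status: "approved" | "rejected"): ApprovalRequestSketch {
    const request = requests.get(id);
    if (!request) throw new Error(`Unknown approval request: ${id}`);
    // Assumption: a request can be resolved only once.
    if (request.status !== "pending") throw new Error(`Request ${id} is already ${request.status}`);
    request.status = status;
    request.resolvedBy = reviewer;
    return request;
  }

  return {
    requestApproval(guardCode: string, agentId: string): ApprovalRequestSketch {
      const request: ApprovalRequestSketch = {
        id: `approval-${nextId++}`,
        guardCode,
        requestedBy: agentId,
        status: "pending",
      };
      requests.set(request.id, request);
      return request;
    },
    approve: (id: string, reviewer: string) => settle(id, reviewer, "approved"),
    reject: (id: string, reviewer: string) => settle(id, reviewer, "rejected"),
    // Ungated codes always pass; gated codes need an approved request for that same code.
    isAllowed(guardCode: string, approvalId?: string): boolean {
      if (!guarded.has(guardCode)) return true;
      const request = approvalId ? requests.get(approvalId) : undefined;
      return request?.status === "approved" && request.guardCode === guardCode;
    },
  };
}
```

Tying the approval to its guard code prevents an approval granted for one action from being reused for another.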

Temporal Scoring — the time axis of governance

In reopt architecture's OCLS loop, the SHARPEN phase says "operational data updates the contract." opt-harness's Score Tracker implements that feedback loop. It tracks scores over time, detects regression, and reports improvement trends.

import { computeCompletenessScore, createScoreTracker } from "@reopt-ai/opt-harness/core";

const tracker = createScoreTracker();

// Record a score — scoredAt, previousScore, delta are generated automatically
tracker.record(computeCompletenessScore(findings1, 20), "audit-agent");
// → { score: 72, scoredAt: "2026-04-14T...", previousScore: undefined }

tracker.record(computeCompletenessScore(findings2, 20), "audit-agent");
// → { score: 85, delta: 13, previousScore: 72 }

tracker.record(computeCompletenessScore(findings3, 20), "audit-agent");
// → { score: 78, delta: -7, previousScore: 85 }

// Trend analysis
tracker.trend();         // → "regressing"
tracker.hasRegression(); // → true
tracker.latest();        // → { score: { score: 78, ... }, scoredAt: "..." }

createScoreTracker — time-series score tracking and regression detection

When Score Tracker detects a regression, the next improvement cycle is triggered. This is the SHARPEN→OWN feedback of the OCLS loop. "The score dropped from 85 to 78" becomes "which change violated the recipe contract?", and new owner assignments and contract revisits begin.
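The delta and regression logic can be sketched as follows. This is a hypothetical stand-in for createScoreTracker that records plain numbers instead of full score objects. Two definitions here are assumptions: trend() looks only at the latest delta, and hasRegression() reports any drop in the recorded history. The "improving" and "stable" labels are also invented; the article only shows "regressing".

```typescript
// Hypothetical sketch of time-series score tracking, not the package implementation.
interface ScoreEntrySketch {
  score: number;
  scoredAt: string;
  recordedBy: string;
  previousScore?: number;
  delta?: number;
}

// The clock is injectable so the sketch can be tested with fixed timestamps.
function createScoreTrackerSketch(now: () => Date = () => new Date()) {
  const entries: ScoreEntrySketch[] = [];
  return {
    record(score: number, recordedBy: string): ScoreEntrySketch {
      const previous = entries[entries.length - 1];
      const base = { score, scoredAt: now().toISOString(), recordedBy };
      // The first entry has no predecessor, so it carries no previousScore or delta.
      const entry: ScoreEntrySketch =
        previous === undefined
          ? base
          : { ...base, previousScore: previous.score, delta: score - previous.score };
      entries.push(entry);
      return entry;
    },
    latest(): ScoreEntrySketch | undefined {
      return entries[entries.length - 1];
    },
    // Assumption: any drop anywhere in the history counts as a regression.
    hasRegression(): boolean {
      return entries.some((e) => (e.delta ?? 0) < 0);
    },
    // Assumption: the trend reflects only the most recent change.
    trend(): "improving" | "regressing" | "stable" {
      const delta = entries[entries.length - 1]?.delta ?? 0;
      return delta > 0 ? "improving" : delta < 0 ? "regressing" : "stable";
    },
  };
}

const trackerSketch = createScoreTrackerSketch();
trackerSketch.record(72, "audit-agent");
trackerSketch.record(85, "audit-agent");
const lastEntry = trackerSketch.record(78, "audit-agent");
console.log(lastEntry.delta, trackerSketch.trend(), trackerSketch.hasRegression());
// -7 regressing true
```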

Combine Completeness Scoring's custom categories, the Decision Log's rationale, Agent Registry's conflict records, and Score Tracker's time series, and you have the infrastructure to structurally detect and resolve all four agentic-debt types reopt architecture defines (Authority Sprawl, Contract Gap, Observability Gap, Validation Gap).

Conclusion — structure is governance, and governance evolves

opt-harness is a case of implementing reopt architecture's core thesis, "Agents Scale by Structure," as product infrastructure. The initial version secured design-time governance (manifest, recipes, slots, policy); over time it evolved into operational governance (decision tracking, multi-agent coordination, approval, time-series scoring). The evolution itself is OCLS-loop practice.

The manifest declares the app's governance boundary (Module Contract).
Recipes classify screen responsibility (Responsibility Partitioning).
Slots define component ownership (Own Every Outcome).
Policy limits allowed behavior (Evaluation & Guardrails).
The Phase-Aware Protocol narrows the agent's judgment scope per phase (Contract First).
The Decision Log makes every decision traceable (Decision Traceability).
The Agent Registry detects multi-agent conflicts (Context Routing).
The Approval Gate forces human approval on high-risk actions (Human Approval).
The Score Tracker detects governance regression and completes the feedback loop (Sharpen in Operation).

In an era when agents proliferate, the product UI those agents produce needs the same level of governance. opt-harness enforces that governance at the infrastructure level, so even as agents multiply, the system stays explainable and operable. Structure is governance, and governance evolves in operation.

Related patterns

Module Contract: Declare the conditions, authority, and failure paths of every execution unit.
Responsibility Partitioning: Name the owner of each outcome and draw the boundary around it.