Architecture

Premises of the AI-Native Product

The reopt architecture rests on the premise that AI-native products differ fundamentally from traditional software.

P.01

Non-deterministic execution

The same input can produce different outputs. The design discipline is therefore not to guarantee outputs but to declare judgment conditions and refusal boundaries.

P.02

Duality of ownership

The owner of an outcome can be a human or an AI. Human ownership allows contextual judgment; AI ownership requires every condition to be made explicit. This distinction determines how strict the governance must be. Human ownership is also more than defensive responsibility: setting an ambitious goal and connecting domains is the active half of ownership.

P.03

Balance of autonomy and control

AI's value comes from autonomous judgment, but unchecked autonomy produces agentic debt. To raise autonomy, you must first define the structure.

P.04

Speed is the default, structure is the choice

In an era where AI can ship an MVP in a day, speed is not the differentiator. Only structure separates a sustainable product from a one-off demo.

P.05

Structure is governance

Adding governance as a separate process invites teams to bypass it. Governance must come from the product's own structure to operate naturally.

P.06

Sovereignty lives in the loop, not the model

Models get swapped; the learning loop remains. When a human sets direction and AI scales it, that expertise accrues as ours no matter which model arrives. What you control is not the model but the structure you build on top of it.

GOAL

Goals of this architecture

Keep the ownership of every outcome explainable at any scale.
Declare judgment conditions and failure paths as contracts, so that structure itself becomes governance.
Use the OCLS loop so that operational data continuously corrects the structure.
Let the whole team (product, engineering, operations) communicate in the same structural vocabulary.

NON-GOAL

Non-goals

Does not prescribe the use of any particular framework or library.
Does not cover prompt-engineering technique.
Does not recommend model selection or fine-tuning strategy.
Does not require changes to organizational structure or HR systems.

The Trap of Speed Without Structure

Building a working product is easy now. The hard part is keeping it explainable and controllable as time passes. Without structure, speed turns into poison.

Prompts alone do not become structure

No matter how carefully you tune the prompt, it still cannot answer who owns the outcome or what happens when it fails.

The faster you ship, the faster it becomes a black box

A demo takes a day. But once cost, quality, approval, and accountability become entangled, within three months no one can explain the product.

There is no path from demo to production

The first demo is easy. Without a structural path to team-scale operation and iterative improvement, the project stops at the demo.

Why four layers

Each layer answers a question the previous one cannot. Module alone cannot answer "who owns this outcome," so Agent is required. Agent alone cannot answer "how do they collaborate," so Collaboration rules are required. Collaboration alone cannot answer "is the product evolving safely," so Governance is required.

Each layer resolves a question the previous one cannot

Layer	Responsibility	Contract	Operations	Observed metrics (examples)	Protocol reference
Module	The product's minimum execution unit. Performs one capability and returns a result.	Declares input conditions, output format, authority scope, and refusal conditions as a contract. Because the contract is decoupled from the model, swapping the model leaves the contract intact.	Tracks call count, cost, output quality, and failure reasons.	Average cost per call, failure rate, average response time, contract-violation count	MCP Tool Definition — input/output schemas play the same role as the MCP tool schema
Agent	The actor that owns and judges outcomes. Uses one or more modules to reach a goal.	Declares goal, authority scope, delegation policy, and termination condition.	Records judgment rationale, execution steps, and approval events.	Goal-completion rate, average handle time, collaboration frequency, escalation rate	A2A Agent Card — declares the agent's goal, authority, and termination condition
Collaboration	Defines role allocation, information transfer, and flow coordination between actors.	Defines collaboration rules, information-transfer scope, and cost-attribution rules.	Monitors bottlenecks, wait time, rework, and collaboration failures.	Collaboration-failure rate, average wait time, rework frequency	A2A Task Lifecycle — collaboration flow, state transitions, information transfer
Governance	Enforces evaluation, approval, cost control, and policy so the product evolves safely. It treats the model as a replaceable part and accrues expertise into internal traces and private evals to preserve model sovereignty.	Provides guardrails, approval policy, evaluation criteria, and correction criteria.	Monitors quality variance, policy violations, and human-intervention points.	Guardrail-block count, approval wait time, quality-score distribution, policy-violation rate	Infrastructure-level guardrails (OPA, RBAC) — a policy-enforcement layer above protocol

The point of the four layers is not to slice the product more finely. It is to make explainable who owns the outcome, who absorbs the failure, and how improvement iterates.
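
The Agent row of the table can be made concrete with a small declaration object. The following is a minimal sketch, not the reopt implementation: `AgentDeclaration`, its field names, and the choice of a step budget as the termination condition are all assumptions made for illustration, and the structure does not follow the A2A Agent Card schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDeclaration:
    """Hypothetical agent contract: goal, authority, delegation, termination."""
    name: str
    goal: str
    authority: frozenset        # modules this agent may call (authority scope)
    may_delegate_to: frozenset  # agents it may hand work to (delegation policy)
    max_steps: int              # termination condition, assumed to be a step budget


def may_call(agent: AgentDeclaration, module: str) -> bool:
    """An agent may only call modules inside its declared authority scope."""
    return module in agent.authority


def may_delegate(agent: AgentDeclaration, other_agent: str) -> bool:
    """Delegation is allowed only to agents named in the delegation policy."""
    return other_agent in agent.may_delegate_to


def should_terminate(agent: AgentDeclaration, steps_taken: int, goal_reached: bool) -> bool:
    """Stop when the goal is reached or the step budget is exhausted."""
    return goal_reached or steps_taken >= agent.max_steps


# Example declaration with made-up names.
triage = AgentDeclaration(
    name="triage-agent",
    goal="route each support ticket to the right queue",
    authority=frozenset({"classify_ticket", "lookup_customer"}),
    may_delegate_to=frozenset({"billing-agent"}),
    max_steps=5,
)
```

Because the declaration is plain data, it can be reviewed, diffed, and checked before any model is called.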

Design Principles

If the principles below stop holding, the reopt architecture regresses into yet another bundle of feature-driven automation.

Own Every Outcome

Design around responsibility units, not features

An agent is not a function caller — it is a continuously explainable owner of outcomes.

Contract First

A module is not a callable tool — it is an execution unit with a contract

Only when a module has a contract does the structure support evaluation, replacement, authority control, and testing.

Layer, Then Scale

Scale is possible only when governed through classification and structure

As agents multiply, structuring them into categories, layers, and boundaries is what keeps governance intact and scale sustainable.

Sharpen in Operation

Architecture is a system that evolves in operation

Responsibility boundaries and policies must keep adjusting from failure logs and evaluation outcomes.

Thinking Vocabulary

Four thinking tools, one per phase of the OCLS governance loop. Each concept below is a lens for design decisions and is tagged with the phase it belongs to.

OWN

Own Every Outcome

Assign an owner to every outcome, and name whether that owner is a human or an AI. Human ownership leaves room for contextual judgment; AI ownership requires every condition to be declared as a contract. The type of owner determines how strict the governance must be.

Before building, assign an owner to every outcome and mark whether it's a human or an AI.

Is the owner of this outcome a human or an AI? And when something goes wrong, who is the escalation target?
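
One way to keep these questions answerable is an explicit ownership registry. This is a sketch under stated assumptions: the `Ownership` record, the function names, and the example entries are hypothetical and exist only to show the shape of the data.

```python
from dataclasses import dataclass
from enum import Enum


class OwnerKind(Enum):
    HUMAN = "human"
    AI = "ai"


@dataclass(frozen=True)
class Ownership:
    outcome: str            # the outcome being owned
    owner: str              # name of the owning person or agent
    kind: OwnerKind         # human or AI
    escalation_target: str  # who takes over when something goes wrong


def unowned_outcomes(outcomes: list, registry: list) -> list:
    """Return outcomes that have no registered owner, preserving input order."""
    owned = {entry.outcome for entry in registry}
    return [outcome for outcome in outcomes if outcome not in owned]


def needs_explicit_contract(entry: Ownership) -> bool:
    """AI ownership requires every condition to be declared as a contract."""
    return entry.kind is OwnerKind.AI


# Example registry with made-up names.
registry = [
    Ownership("refund_decision", "refund-agent", OwnerKind.AI, "support-lead"),
    Ownership("pricing_policy", "product-manager", OwnerKind.HUMAN, "head-of-product"),
]
```

Running `unowned_outcomes` over the list of outcomes a product produces gives a quick design-review check: anything it returns has no owner yet.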

OWN

Aim the Compute

Without human direction, compute runs in circles. Ownership is not only bearing responsibility — it is setting an ambitious goal and connecting domains. As token capital grows, the human capital that sets direction grows more valuable, not less. When a human recognizes the pattern, AI makes that expertise replicable and scalable.

Before handing work to AI, declare the goal and the connections only a human can set.

Who sets the direction of this outcome? With the human judgment removed, what is this loop spinning toward? If there is no answer, the compute is running in circles.

CONTRACT

Contract First

Declaring input, output, authority, and refusal conditions before implementation makes evaluation, replacement, and control possible. The more an AI owns the outcome, the stricter and more explicit the contract must be. The contract is the language of governance.

If you can write the contract, you've understood the responsibility.

What input must this module refuse? When you can answer, the contract is complete.
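
A module contract of this kind can be sketched as a wrapper that sits between the caller and whichever model is in use. Everything named here (`ModuleContract`, `run_module`, the `summarize` example, and its specific limits) is an illustrative assumption; the authority scope is declared but not enforced in this sketch.

```python
from dataclasses import dataclass
from typing import Callable


class Refused(Exception):
    """Raised when an input hits a refusal condition or fails the input condition."""


class ContractViolation(Exception):
    """Raised when the model's output does not match the declared output format."""


@dataclass(frozen=True)
class ModuleContract:
    name: str
    accepts: Callable[[str], bool]       # input condition
    refuses: Callable[[str], bool]       # refusal condition
    valid_output: Callable[[str], bool]  # output format check
    authority: frozenset                 # authority scope (declared only)


def run_module(contract: ModuleContract, model: Callable[[str], str], text: str) -> str:
    """Run any model behind the contract; the contract is unchanged by a model swap."""
    if contract.refuses(text):
        raise Refused(f"{contract.name}: input matches a refusal condition")
    if not contract.accepts(text):
        raise Refused(f"{contract.name}: input fails the input condition")
    output = model(text)
    if not contract.valid_output(output):
        raise ContractViolation(f"{contract.name}: output breaks the declared format")
    return output


# Example contract with made-up limits.
summarize = ModuleContract(
    name="summarize",
    accepts=lambda t: 0 < len(t) <= 2000,
    refuses=lambda t: "password" in t.lower(),
    valid_output=lambda out: 0 < len(out) <= 200,
    authority=frozenset({"read:document"}),
)


def stub_model(text: str) -> str:
    """Stand-in for a real model call."""
    return text[:50]
```

Since `run_module` takes the model as a parameter, replacing `stub_model` with another callable leaves the contract and its checks untouched, and a `ContractViolation` gives a runtime signal that can be counted.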

CONTRACT

Sovereign Over the Model

The model is a borrowed part; sovereignty lives in the learning system you stack on top of it. Swap the generalist model, and the company's expertise must remain in your contracts, traces, and internal evals. The test of sovereignty is not picking the best model — it is owning a learning loop that works no matter which model arrives.

Write down what survives a model swap. If nothing survives, you have no sovereignty.

Does this system's expertise live in the model, or in our contracts and traces? Do we evaluate on our own business outcomes (a private eval) rather than an external benchmark?
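
A private eval can be sketched as a set of business-outcome checks that any candidate model must pass before a swap. The names `EvalCase`, `pass_rate`, and `survives_swap`, the two stub models, and the rule "the candidate must do at least as well as the current model" are assumptions for illustration, not a prescribed method.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    passes: Callable[[str], bool]  # check against our own business outcome


def pass_rate(model: Model, cases: list) -> float:
    """Fraction of private eval cases the model passes."""
    if not cases:
        raise ValueError("a private eval needs at least one case")
    passed = sum(1 for case in cases if case.passes(model(case.prompt)))
    return passed / len(cases)


def survives_swap(current: Model, candidate: Model, cases: list) -> bool:
    """Assumed rule: the candidate must score at least as well as the current model."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases)


# The eval cases belong to the company, not to either model.
PRIVATE_EVAL = [
    EvalCase("refund 500, over limit", lambda out: out == "escalate"),
    EvalCase("refund 20, within limit", lambda out: out == "approve"),
]


def model_a(prompt: str) -> str:
    """Stub model that respects the refund limit."""
    return "escalate" if "over limit" in prompt else "approve"


def model_b(prompt: str) -> str:
    """Stub model that approves everything."""
    return "approve"
```

The eval cases are what survives the swap: both stub models can be discarded while `PRIVATE_EVAL` keeps encoding what a correct outcome means for the business.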

LAYER

Layer, Then Scale

Even as agents multiply, structuring them into categories, layers, and boundaries preserves governance. Layering is the scaling strategy. Multi-agent scaling research from Google and MIT (2026, as reported by InfoQ) showed empirically that task-dependency structure is the deciding factor — for tasks with heavy tool coordination, multi-agent overhead actually degrades performance, and the optimal collaboration strategy varies by task.

Before creating a new agent, decide where it sits within the existing categories.

Which of the four layers does this agent operate in, and what is its relationship to the existing agents?

SHARPEN

Sharpen in Operation

Tuning boundaries from operational data lets governance evolve with reality. Anthropic's [Demystifying Evals for AI Agents] frames eval-driven development: defining evaluation first surfaces requirement ambiguity before implementation. The moment a capability eval graduates to a regression eval is the signal that the boundary has stabilized. And as Anthropic's [Scaling Managed Agents] notes, assumptions go stale as models improve — upgrading to a new model generation is itself a SHARPEN trigger to re-tune boundaries and model assignments.

Start with provisional boundaries and let operational data drive splits, merges, and re-categorization.

Do recent operational signals call for boundary adjustment? Handoff failures, accountability gaps, cost anomalies, and upgrading to a new model generation are those signals. Swap the model, but keep the boundaries you learned from its traces — that is model sovereignty.
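
The idea of a capability eval graduating to a regression eval can be expressed as a simple rule over recent pass rates. The rule below (every one of the last `window` runs at or above `threshold`) and its default values are assumptions chosen for illustration; neither the source nor the cited Anthropic material prescribes them.

```python
def has_graduated(pass_rates: list, threshold: float = 0.95, window: int = 3) -> bool:
    """True when each of the last `window` eval runs met `threshold`.

    pass_rates is ordered oldest to newest, each value between 0.0 and 1.0.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(pass_rates) < window:
        return False  # not enough history to call the boundary stable
    return all(rate >= threshold for rate in pass_rates[-window:])
```

Under this sketch, one plausible reading of the model-upgrade trigger is to clear the pass-rate history when a new model generation is adopted, so that each boundary has to earn its regression status again.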

OCLS Loop

Own → Contract → Layer → Sharpen. A cyclical model for governance design. Each pass sharpens the governance boundaries.

Contract First and Sharpen in Operation are not in conflict. Contracts are provisional, and operational data updates them. Signals surfaced in SHARPEN (handoff failures, accountability gaps, cost anomalies) trigger the next OWN phase, where existing contracts and boundaries are revisited.

OWN → CONTRACT → LAYER → SHARPEN → ↻
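
The cycle can be sketched as a small state machine. The signal names and the choice to stay in SHARPEN until a signal appears are assumptions of this sketch; the source only states that SHARPEN signals trigger the next OWN phase.

```python
from enum import Enum


class Phase(Enum):
    OWN = "OWN"
    CONTRACT = "CONTRACT"
    LAYER = "LAYER"
    SHARPEN = "SHARPEN"


_ORDER = list(Phase)  # Enum iteration preserves definition order

# Hypothetical signal names for the triggers listed in the text.
REOPEN_SIGNALS = frozenset({"handoff_failure", "accountability_gap", "cost_anomaly"})


def next_phase(phase: Phase) -> Phase:
    """Advance one step around the loop, wrapping SHARPEN back to OWN."""
    return _ORDER[(_ORDER.index(phase) + 1) % len(_ORDER)]


def advance(phase: Phase, signals: set) -> Phase:
    """Stay in SHARPEN until an operational signal reopens OWN; otherwise move on."""
    if phase is Phase.SHARPEN and not (set(signals) & REOPEN_SIGNALS):
        return Phase.SHARPEN
    return next_phase(phase)
```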

OCLS in the Development Lifecycle

The traditional SDLC does not handle the non-deterministic execution and governance requirements of agents. The OCLS loop operates as the governance layer of the Agentic SDLC (ASDLC). OWN + CONTRACT run inside the Design Loop, LAYER inside the Run Loop, and SHARPEN inside the Governance Loop. Key distinction: ASDLC declares not only "what the agent must do" but "what it must never do" — refusal conditions and guardrails — at design time.

Design Review Checklist

Applying each concept's guiding question to its layer turns it into concrete validation questions. Use this checklist in design reviews, sprint kickoffs, and incident retrospectives.

CONTRACT

Module

Are this module's input, output, and failure conditions explicitly defined?
What input must this module refuse?
Can a contract violation be detected at runtime?

OWN

Agent

Can this agent's responsibility scope be stated in one sentence?
Are its goal, authority scope, and termination condition declared?
When this agent fails, who is the escalation target?

LAYER

Collaboration

Is the context passed at handoff the minimum required?
Are inter-agent collaboration rules (order, conditions, recovery paths) explicit?
Is a retry or alternate path defined for handoff failure?
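
The Collaboration questions above can be sketched as a handoff helper that passes only the required context and falls back to an alternate path after repeated failure. The function names, the use of `RuntimeError` as the failure signal, and the default of two attempts are assumptions for illustration.

```python
from typing import Callable

Handler = Callable[[dict], str]


def minimal_context(context: dict, required: set) -> dict:
    """Keep only the keys the receiving agent needs; fail loudly if one is missing."""
    missing = set(required) - set(context)
    if missing:
        raise KeyError(f"handoff context is missing {sorted(missing)}")
    return {key: context[key] for key in sorted(required)}


def hand_off(
    context: dict,
    required: set,
    primary: Handler,
    fallback: Handler,
    max_attempts: int = 2,
) -> str:
    """Try the primary receiver up to max_attempts times, then use the alternate path."""
    payload = minimal_context(context, required)
    for _ in range(max_attempts):
        try:
            return primary(payload)
        except RuntimeError:
            continue  # treated as a transient handoff failure in this sketch
    return fallback(payload)
```

Filtering before the call means extra fields such as a full chat history never reach the receiving agent unless its handoff rule names them.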

SHARPEN

Governance

Are evaluation criteria defined quantitatively?
Are human-approval criteria for high-risk actions explicit?
Can every decision be traced after the fact?
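
Explicit approval criteria and after-the-fact traceability can be combined in one gate that records every decision, including the blocked ones. `ApprovalGate`, `Decision`, and the action names are hypothetical; a real system would persist the trail rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str
    high_risk: bool
    approved_by: Optional[str]  # None when no human approved
    executed: bool


@dataclass
class ApprovalGate:
    high_risk_actions: frozenset              # explicit human-approval criteria
    trail: list = field(default_factory=list)  # in-memory audit trail

    def request(self, action: str, approved_by: Optional[str] = None) -> bool:
        """Allow low-risk actions; allow high-risk ones only with a named approver."""
        high_risk = action in self.high_risk_actions
        executed = (not high_risk) or approved_by is not None
        self.trail.append(Decision(action, high_risk, approved_by, executed))
        return executed
```

Counting the entries in `trail` where `executed` is false gives a rough guardrail-block count, one of the example Governance metrics.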

Agentic Debt — the debt that accrues when speed outpaces structure

Technical debt accrued at low speed. Agentic debt accrues at high speed. The faster AI builds, the faster a structureless product becomes a black box.

OWN

Authority Sprawl

As agents multiply, who holds which authority becomes untraceable. Agent identities, access scopes, and execution rights expand tacitly, producing security incidents and accountability gaps at the same time.

CONTRACT

Contract Gap

Modules and agents run without documented input/output, refusal conditions, or failure modes. Side effects of a module swap or prompt change become unpredictable, and there is no basis to define evaluation criteria.

LAYER

Observability Gap

Reasoning paths, decision rationale, and handoff reasons go unrecorded. When something fails, the cause cannot be pinpointed and the entire reasoning process turns opaque.

SHARPEN

Validation Gap

Evaluation criteria and guardrails are missing or exist only as ritual. Quality variance widens, dangerous behavior is found only after the fact, and the feedback loop for improvement is broken.

Evolution Path

1. Single-Agent Start

Start AI automation on a small scale and accumulate baseline contracts and logs.

Transition signals

One agent juggles several roles and you can no longer tell which role caused a failure.
Prompt changes spill into unrelated capabilities.
You have enough per-module logs (call count, cost, success/failure) to track each module independently.

Core deliverables

Per-module draft input/output contracts
Baseline execution logs (call count, cost, success/failure)
List of points where roles conflict
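
The baseline execution logs listed above are enough to derive per-module metrics such as average cost per call and failure rate. This is a minimal sketch; the `CallLog` record and the metric key names are assumptions, and the cost unit is left unspecified.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallLog:
    module: str
    cost: float      # cost of one call, in whatever unit the team tracks
    succeeded: bool


def module_metrics(logs: list) -> dict:
    """Aggregate call count, average cost per call, and failure rate per module."""
    grouped = defaultdict(list)
    for log in logs:
        grouped[log.module].append(log)

    metrics = {}
    for module, entries in grouped.items():
        calls = len(entries)
        failures = sum(1 for entry in entries if not entry.succeeded)
        metrics[module] = {
            "calls": calls,
            "avg_cost": sum(entry.cost for entry in entries) / calls,
            "failure_rate": failures / calls,
        }
    return metrics


# Example logs with made-up values.
logs = [
    CallLog("summarize", 0.5, True),
    CallLog("summarize", 1.5, False),
    CallLog("classify", 0.25, True),
]
```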

Enterprise: Security baseline (agent identity, least privilege, baseline audit logging)

2. Responsibility Separation

Separate conflicting roles such as planning, execution, review, and deployment to sharpen responsibility boundaries.

Transition signals

Handoffs between separated agents repeatedly drop or duplicate context.
Parallel work is required that a single agent cannot perform.
Each agent reaches a state where it can be evaluated and improved independently.

Core deliverables

Per-agent responsibility statements
Handoff rules and context-passing schemas
Per-agent independent evaluation criteria

Enterprise: Compliance boundaries (per-role data access scope, PII handling policy)

3. Multi-Agent Collaboration

Define collaboration rules and information flow to stabilize the product flow.

Transition signals

Collaboration flow is stable but quality variance is wide or cost is unpredictable.
Agents start acting beyond their authority, or human-approval situations recur.
Scale demands explicit enforcement of governance rules.

Core deliverables

Stabilized collaboration flow diagrams
Context-routing rules
State/memory separation policy

Enterprise: Integration with existing systems (SSO/OIDC, existing approval workflow integration, API gateway)

4. Governance by Design

Bake evaluation, approval, cost control, and policy enforcement into the product's baseline structure.

Transition signals

The OCLS loop runs as a standing operating cadence; boundaries and contracts update on a regular schedule.
Governance metrics (quality score, cost, policy-violation rate) are tracked steadily.

Core deliverables

Automated evaluation and guardrail pipelines
Approval classification matrix and escalation rules
OCLS-based recurring review process

Enterprise: Automated audit (RBAC, audit-trail automation, recurring governance reviews)

Operating Loops

Design Loop

Lock responsibility boundaries first — role design, contract definition, failure-scenario specification.

Run Loop

Collect execution logs, quality scores, and cost signals to identify bottlenecks and unnecessary collaboration. These traces are institutional memory that survives a model swap.

Governance Loop

Adjust evaluation criteria, approval policies, and exception handling to scale the system safely.