CO: Full Specification
CO: A Core Thesis
Section titled “CO: A Core Thesis”Cognitive Orchestration
A Methodology for Structuring Human-AI Collaboration
Author: Dr. Jack Hong, Singapore Management University
Version: 1.1 | March 2026
License: CC BY 4.0
Disclosure
Section titled “Disclosure”I developed Cognitive Orchestration and built the Kailash ecosystem that implements it. The entire Kailash stack is released under the Apache 2.0 open-source license, meaning the source code is publicly available, and anyone is legally entitled to use it, modify it, and build commercial products with it, permanently and irrevocably.
Two patent applications are pending (PCT/SG2024/050503 and P251088SG; favorable IPRP received, national phase filings in progress in Singapore and the United States; no patent has been granted). The Apache 2.0 license includes an automatic patent grant (Section 3 of the Apache License) providing users a perpetual, royalty-free patent license for the contributed code, with defensive termination if the user initiates patent litigation.
This paper is published under CC BY 4.0 (Creative Commons Attribution). The methodology has been implemented and tested in software development (COC).
Part I: The Problem That Model Improvement Cannot Solve
Section titled “Part I: The Problem That Model Improvement Cannot Solve”AI Capability Is Commodity
Section titled “AI Capability Is Commodity”In 2024, GPT-4 was the clear leader among large language models. By 2026, Claude, Gemini, and numerous open-source models match or exceed its performance across most practical tasks. The gaps continue to narrow. Prices continue to fall. If your competitive advantage depends on which model you use, you have no competitive advantage.
The real differentiator is what surrounds the model: institutional knowledge, organizational context, hard-won lessons, conventions that emerged from failure. This context is specific to your organization, your domain, your history. It cannot be downloaded. It cannot be replicated by a competitor switching to a better model.
The Brilliant New Hire With Zero Onboarding
Section titled “The Brilliant New Hire With Zero Onboarding”Consider what happens when you hire a brilliant new professional in any field. A new compliance officer does not know that your organization’s interpretation of “material risk” was shaped by a regulatory action five years ago that appears in no textbook. A new financial analyst does not know that the quarterly forecast model has a manual override for the Asian markets because of a data source that lags by six days.
These professionals are not incompetent. They lack context. Good organizations solve this with onboarding: documentation, mentoring, review processes, institutional memory made transmissible.
Now consider what the industry did with AI. It took the most capable “new hire” in the history of knowledge work and gave it zero onboarding. The output looks competent. It follows general best practices. And it silently violates your organizational standards in ways that will cost months to untangle.
Three Failure Modes
Section titled “Three Failure Modes”This failure manifests identically across every domain where AI operates without institutional context. The surface details differ. The structural pattern does not.
Failure Mode 1: Amnesia. AI systems operate within a context window — a fixed amount of information they can hold in working memory. As interactions grow, the AI begins to forget your patterns and reverts to its training data’s patterns. The compliance AI applies textbook interpretations instead of your established positions. The financial AI uses standard assumptions instead of your calibrated models. No error. No warning. Just drift.
Failure Mode 2: Convention Drift. Every mature organization has conventions from hard-won experience. The AI knows none of them. It has been trained on the open internet. When your conventions conflict with generic conventions, the AI follows the generic ones.
Failure Mode 3: Safety Blindness. AI models optimize for the most direct path to a completed task. Safety is almost never the most direct path. Proper approval workflows are less direct than bypassing them. Conservative interpretations are less direct than optimistic ones. The AI takes the efficient path because nothing in its context tells it why the inefficient path exists.
These three failure modes are not model deficiencies. They are context deficiencies. They appear in codegen, compliance, finance, operations, and every other domain where AI operates without institutional knowledge. No amount of model improvement will solve them, because the knowledge they require is yours, not the model’s.
The Domain-Agnostic Root Cause
Section titled “The Domain-Agnostic Root Cause”The software development community encountered these failure modes first. The term “vibe coding” (Karpathy, 2025) captured the pattern: developers describing intent and accepting whatever the AI produced, with predictable quality collapse in production systems. Cognitive Orchestration for Codegen (COC) was the first systematic response (Hong, 2026d), demonstrating that the solution was architectural, not model-dependent.
But the three failure modes are not specific to software development. A compliance team using AI without organizational conventions suffers the same failure as a development team using AI without coding standards. The surface differs. The structure is identical.
Cognitive Orchestration (CO) extracts the domain-agnostic principles and architecture from COC. COC was the first domain application. CO is the generalization.
Part II: Eight First Principles
Section titled “Part II: Eight First Principles”CO rests on eight principles. Each is domain-agnostic. Each is stated in falsifiable form.
Principle 1: The Institutional Knowledge Thesis
Section titled “Principle 1: The Institutional Knowledge Thesis”Statement: The primary determinant of AI output quality is not model capability but the institutional context surrounding the model.
Two organizations using the same model will produce dramatically different output quality if one has invested in making its institutional knowledge machine-readable and the other has not. The organization with better context, using a weaker model, will outperform the organization with weaker context using a stronger model.
This principle draws on Polanyi’s insight that “we know more than we can tell” (Polanyi, 1966). The tacit knowledge of an organization is precisely what AI lacks and precisely what makes institutional output valuable. CO is an architecture for making this tacit knowledge explicit and machine-readable, while acknowledging that some will always resist articulation.
Falsification condition: If, across ten or more controlled deployments, structured institutional context produces no statistically significant quality improvement, the thesis should be abandoned.
Principle 2: The Brilliant New Hire Principle
Section titled “Principle 2: The Brilliant New Hire Principle”Statement: An AI system without organizational context is functionally equivalent to a brilliant new hire on their first day — capable but contextless, productive but undisciplined, fast but unaware of what burned you last quarter.
This is not a metaphor. It is a diagnostic. It predicts the specific failure modes (amnesia, convention drift, safety blindness) and prescribes the remedy (structured onboarding as machine-readable institutional knowledge). It also predicts where AI will succeed without CO: where institutional context is minimal and convention violations are cheap. Production systems with organizational standards are not such domains.
Principle 3: Three Failure Modes
Section titled “Principle 3: Three Failure Modes”Statement: AI systems operating without institutional context will fail along three predictable fault lines: amnesia (forgetting instructions as context fills), convention drift (following generic practices instead of yours), and safety blindness (optimizing for task completion over organizational safety requirements).
These are structural consequences. Context windows are finite, so amnesia is inevitable over long interactions. Training data reflects generic practices, so convention drift is the default. Task completion is the optimization target, so safety is deprioritized unless explicitly enforced. The term is “safety blindness,” not “security blindness,” because the failure extends beyond security to any organizational safety concern: regulatory compliance, data handling, approval workflows, risk management. Security is one dimension of safety.
Falsification condition: If AI systems consistently maintain conventions, retain instructions, and prioritize safety without explicit enforcement, these failure modes should be reclassified from structural to incidental.
Principle 4: The Human-on-the-Loop Position
Section titled “Principle 4: The Human-on-the-Loop Position”Statement: The optimal position for humans in AI-augmented work is neither in-the-loop (approving every action) nor out-of-the-loop (absent from the process), but on-the-loop: defining the operating envelope, observing execution patterns, and refining boundaries based on what they learn (Parasuraman & Riley, 1997).
The human-on-the-loop contributes what AI cannot: institutional judgment about boundaries, contextual wisdom about organizational values, and the ethical reasoning that determines not just what can be done but what should be done. The human’s primary value is not executing tasks. It is defining and maintaining the context that makes AI-executed tasks trustworthy.
Critical caveat: Defining effective constraints for complex AI behavior is itself difficult. Human-on-the-loop is aspirational architecture, not guaranteed control.
Principle 5: Deterministic Enforcement Over Probabilistic Compliance
Section titled “Principle 5: Deterministic Enforcement Over Probabilistic Compliance”Statement: Critical organizational rules must be enforced deterministically, outside the AI’s context window, independent of whether the AI “remembers” the rule.
AI models are probabilistic. They follow instructions most of the time. “Most of the time” is not acceptable for security rules, regulatory requirements, or safety-critical procedures. CO distinguishes between soft enforcement (rules the AI follows probabilistically) and hard enforcement (deterministic mechanisms that block violations regardless of the AI’s state). This is the difference between telling a new hire “please follow this policy” and building a system that prevents policy violations at the infrastructure level.
The asymmetric structure of the five layers reflects a theoretical constraint. Under incomplete contracts (Hart, 1995), most orchestration decisions require semantic judgment because the specification cannot anticipate every contingency; the AI must interpret goals, route tasks, and assess completion in situations the rule-writer could not foresee. Under absent wealth effects (Fama & Jensen, 1983), safety decisions require deterministic enforcement because the system lacks incentive alignment for self-governance; it optimizes for task completion, not organizational safety. The boundary between semantic and deterministic mechanisms is not a design preference. It is a structural consequence of delegation to systems that reason but do not bear consequences for how they reason.
Principle 6: Bainbridge’s Irony Applied
Section titled “Principle 6: Bainbridge’s Irony Applied”Statement: The more AI handles routine execution, the more critical it becomes that humans maintain deep understanding of the work being executed.
Bainbridge’s “Ironies of Automation” (1983) documented a paradox: automating a process requires skilled operators for exceptional cases, but automation degrades the skills needed to handle those exceptions.
CO is designed with this irony in mind. Every layer is an investment in human understanding. Defining intent routing forces articulation of domain structure. Maintaining context forces articulation of tacit knowledge. Designing guardrails forces identification of non-negotiable rules. The practitioner who builds CO layers becomes a deeper expert, not a shallower one. This is by design.
Principle 7: Knowledge Compounds
Section titled “Principle 7: Knowledge Compounds”Statement: Institutional knowledge captured in machine-readable form compounds across sessions, practitioners, and time periods. Each successful interaction makes the next one more effective.
Stateless AI interactions start from zero every session. The same mistakes are made. The same corrections are applied. Nothing accumulates.
CO creates a compounding loop: observations from AI-assisted work are captured, patterns are identified, successful patterns are formalized into institutional knowledge, and that knowledge improves future AI interactions. Over months, the gap between a CO-equipped organization and a stateless one widens, not because the model improves, but because the institutional knowledge deepens.
This is the strategic argument: model capability is commodity and depreciates with each new release. Institutional knowledge is specific and appreciates with each successful application.
Principle 8: Authentic Voice and Responsible Co-Authorship
Section titled “Principle 8: Authentic Voice and Responsible Co-Authorship”Statement: CO exists to produce genuinely excellent work through human-AI collaboration. The human directs; the AI assists. The output reflects the human’s intellectual contribution, not the tool that helped produce it. CO practitioners disclose AI assistance per venue and institution requirements. They do not conceal it.
Statistical AI detection tools (Originality.ai, GPTZero, Turnitin) use proxy signals: perplexity, burstiness, vocabulary frequency. These proxies measure stylistic uniformity, not intellectual origin. A human who writes formulaic prose triggers false positives. Liang et al. (2023) found false positive rates exceeding 61% for non-native English speakers. The tools measure the wrong thing.
CO addresses the real question: did a human direct this work? The deliberation records, approval gates, and journals create an auditable trail of genuine human intellectual engagement. This trail is stronger evidence of authentic co-authorship than any statistical proxy. A researcher using CO produces higher quality work than one who free-rides on AI, because CO forces articulation (Principle 1), deliberation (Principle 4), teaching (Principle 6), and defense at every step. The architecture makes free-riding harder, not easier.
Writing style constraints in CO implementations exist to mitigate detection bias in tools whose statistical methods flag stylistic patterns that have no bearing on work quality. The distinction matters: concealing AI use is dishonest; mitigating detection bias in tools that flag stylistic patterns irrelevant to intellectual contribution is not.
Falsification condition: Abandon this principle if AI detection tools achieve reliable detection of intellectual origin (not just stylistic patterns) with false positive rates below 5% across native and non-native English speakers.
Part III: The Five-Layer Architecture
Section titled “Part III: The Five-Layer Architecture”CO defines five layers. Each addresses a specific structural problem. Together, they form the architecture for human-on-the-loop work across any domain.
+--------------------------------------------------+| Layer 5: LEARNING || Observe -> Capture -> Evolve knowledge |+--------------------------------------------------+| Layer 4: INSTRUCTIONS || Structured workflows with approval gates |+--------------------------------------------------+| Layer 3: GUARDRAILS || Deterministic enforcement, not suggestions |+--------------------------------------------------+| Layer 2: CONTEXT || Your living institutional handbook |+--------------------------------------------------+| Layer 1: INTENT || Specialized agents, not generalist AI |+--------------------------------------------------+By 2026, multiple commercial AI tools — Claude Code (Anthropic), Cursor, Windsurf, and others — have independently converged on mechanisms that map to CO Layers 1-3: agent specialization, context engineering, and deterministic hooks. This convergence is not a threat to CO. It is validation. When independent teams, building for different markets, arrive at the same architectural conclusions, it confirms that these mechanisms are necessary rather than incidental. The term “context engineering” is now used industry-wide to describe what CO calls Layer 2. The fact that everyone agrees on the need does not diminish the contribution of formalizing the architecture.
What remains unaddressed — and what CO uniquely contributes — is the governance methodology: structured workflows with quality gates (Layer 4), organizational learning with human-gated knowledge formalization (Layer 5), and the integration of all five layers into a coherent system with formal trust infrastructure (EATP). Part VI examines this relationship in detail.
For each layer, I describe the domain-agnostic problem, the CO principle it enacts, the COC proof-of-concept, and a sketch of how the same layer applies in other domains.
Layer 1: Intent — The Role
Section titled “Layer 1: Intent — The Role”The problem: A single generalist AI applied to every task produces generalist output. It applies its broadest training rather than deep domain-specific knowledge. This is the organizational equivalent of asking one person to handle legal, finance, operations, and engineering — they can attempt all of them, but they excel at none.
The CO principle: Route tasks to domain-specialized agents, each configured with deep knowledge of a specific area. Stop treating AI as a monolith. Treat it as a team.
COC proof: The Kailash COC implementation defines 29 agent specializations across seven development phases. Each agent carries curated domain knowledge, uses tools appropriate to its specialty, and operates at a model tier matched to its cognitive demands. A security reviewer does not write frontend code. A database specialist does not design UIs. The routing is explicit and systematic.
Cross-domain: A compliance practice might define agents for regulatory interpretation, policy drafting, audit preparation, and incident assessment. A finance function might define agents for transaction analysis, forecasting, reporting, and advisory. Each carries institutional knowledge relevant to its specialty — organizational judgment, not generic domain knowledge.
Effective organizations do not ask one person to be everything. They specialize, and they route work to the appropriate specialist. Layer 1 makes this routing explicit for AI.
Layer 2: Context — The Library
Section titled “Layer 2: Context — The Library”The problem: AI training data is stale the moment training ends. Your internal conventions, organizational decisions, and institutional lessons are not in any training set. The AI defaults to generic knowledge because it has no access to yours.
The CO principle: Replace the AI’s generic training data with your living institutional handbook. Make your organizational knowledge the primary reference, not a secondary afterthought.
COC proof: The Kailash COC implementation structures organizational knowledge in a progressive disclosure hierarchy: a master directive loaded every session, quick-reference indexes (10-50 lines), topic-specific files (50-250 lines), and deep reference material pulled in only when needed. Over 100 files follow this pattern across 25 skill directories. The AI loads minimum context for the current task and pulls deeper reference only when required.
Two sub-principles govern this layer. Framework-First: before building from scratch, check whether existing organizational building blocks already handle the task. The principle is composition over creation. Single Source of Truth: each piece of institutional knowledge lives in exactly one place, with no contradictions.
This is context engineering, as distinct from prompt engineering. Prompt engineering optimizes a single interaction. Context engineering builds organizational knowledge that persists across every interaction, with every practitioner, across every session.
Cross-domain: A compliance team’s Layer 2 might include regulatory interpretation guides, precedent databases, and policy templates reflecting organizational risk appetite. A finance team’s might include calibrated models, institutional parameter choices, and exception handling procedures. In every case, the content is the team’s accumulated judgment made machine-readable, not generic domain knowledge.
Layer 3: Guardrails — The Supervisor
Section titled “Layer 3: Guardrails — The Supervisor”The problem: Context tells the AI what to do. But AI models are probabilistic. They follow instructions most of the time, not all of the time. For critical rules — regulatory requirements, safety procedures, authorization limits — “most of the time” is not acceptable.
The CO principle: Deterministic enforcement outside the AI’s context window. Not “the AI should follow this rule” but “the system will not allow the AI to violate this rule.”
COC proof: The Kailash COC implementation operates in two tiers. Tier 1 (Rules): soft enforcement through directive files the AI interprets. Tier 2 (Hooks): hard enforcement through deterministic scripts that run on every tool call, outside the model’s context window, intercepting violations before they reach the codebase.
The most important mechanism is the anti-amnesia hook: a script that fires on every user message, re-injecting critical rules fresh into the AI’s context. It survives context window compression because it runs outside the model’s memory. Critical rules employ defense in depth: five to eight independent enforcement layers, so that if any four fail, the fifth catches the violation.
The CARE connection: This is the Trust Plane applied to practice. In the CARE framework (Hong, 2026a), the Trust Plane is where humans define boundaries that AI cannot override. Layer 3 is that concept made operational. The AI has full autonomy within the envelope. The envelope itself is not negotiable.
Cross-domain: A compliance team’s Layer 3 might hard-enforce: no external communication without human review, mandatory disclaimers on client-facing advice, data classification boundaries. A finance team’s might hard-enforce: transaction limits, segregation of duties, regulatory format validation. The common pattern: critical rules get deterministic enforcement, not probabilistic compliance.
Layer 4: Instructions — The Operating Procedures
Section titled “Layer 4: Instructions — The Operating Procedures”The problem: Even with the right specialist, the right context, and the right guardrails, complex tasks fail when there is no procedural discipline. The AI tackles the hardest part first when it should start with analysis. It produces output before confirming the approach. It claims completion without evidence.
The CO principle: Give the AI a structured methodology with approval gates between phases, evidence requirements for completion, and mandatory delegation rules.
COC proof: The Kailash COC implementation provides a seven-phase workflow with quality gates at four points. The AI cannot skip phases. Evidence-based completion requires file-and-line proof for every claim. The approval gates create natural points for human judgment: review the analysis, approve the plan, verify the evidence. This is human-on-the-loop applied to the work cycle.
The meta-process point: CO does not prescribe a specific N-phase workflow for every domain. It prescribes structured processes with approval gates, evidence requirements, and mandatory checkpoints appropriate to the domain. The principle — structured methodology with human checkpoints — is universal.
Layer 5: Learning — The Performance Review
Section titled “Layer 5: Learning — The Performance Review”The problem: Most AI-assisted workflows are stateless. Every session starts from zero. The AI learns nothing from yesterday’s successes or failures. Institutional knowledge does not grow from AI-assisted work — it atrophies.
The CO principle: Observe what works, capture successful patterns, and evolve new capabilities over time. Institutional knowledge compounds with every session.
COC proof: The Kailash COC implementation follows an Observe-Capture-Evolve pipeline. Observation logs patterns, errors, and fixes. Pattern analysis identifies recurring sequences with confidence scores. Evolution suggests formalization of high-confidence patterns into institutional artifacts — but only as suggestions requiring human approval. No pattern becomes institutional knowledge without a human deciding it should.
Over months, new team members inherit not just existing knowledge but the distilled judgment of everyone who has worked with the system.
The critical constraint: The human approves what becomes institutional knowledge. The system proposes. The human disposes. This is the most direct expression of human-on-the-loop: the practitioner’s primary value is not the output of any single session but the institutional knowledge they create and refine across all sessions.
Part IV: The Process Model
Section titled “Part IV: The Process Model”CO is a methodology, not a product. Implementing it requires roles, artifacts, and quality criteria. This section describes the meta-process for building a CO practice.
Five roles emerge from the five layers:
- CO Architect: Designs the overall five-layer architecture. The most senior role, requiring the deepest institutional knowledge.
- Domain Specialist: Contributes institutional knowledge to Layer 2. Not a technology role — a domain expertise role.
- Guardrail Engineer: Implements deterministic enforcement in Layer 3. A technology role requiring understanding of both enforcement mechanisms and the organizational rules they protect.
- Process Designer: Defines structured workflows in Layer 4 — phases, gates, evidence requirements, delegation rules.
- Knowledge Curator: Maintains Layer 5. Reviews proposed pattern formalizations. Ensures the knowledge base remains accurate and non-contradictory. An ongoing role, not a one-time setup.
Artifacts
Section titled “Artifacts”Each layer produces specific artifacts:
| Layer | Artifacts |
|---|---|
| Intent (L1) | Agent definitions, routing rules, model tier assignments |
| Context (L2) | Master directive, skill indexes, topic files, reference documentation |
| Guardrails (L3) | Soft rule files, hard enforcement scripts, defense-in-depth mapping |
| Instructions (L4) | Phase definitions, gate criteria, evidence templates, delegation rules |
| Learning (L5) | Observation logs, pattern analyses, proposed evolutions, approved formalizations |
Quality Criteria
Section titled “Quality Criteria”A CO implementation is assessed on five dimensions:
Coverage: What proportion of organizational institutional knowledge is machine-readable? This is never 100%. Some tacit knowledge resists articulation. The question is whether the most critical knowledge is captured.
Enforcement reliability: Do critical rules have hard enforcement? Is defense in depth applied to the highest-risk rules? A single enforcement mechanism for a critical rule is a single point of failure.
Workflow discipline: Are approval gates respected? Does the AI produce evidence at each gate? Can a human reviewer verify claims without reading raw output?
Learning velocity: How quickly do successful patterns become institutional knowledge? Are new practitioners inheriting accumulated wisdom? Is the knowledge base growing or stagnating?
Practitioner depth: Are the humans building CO layers becoming deeper domain experts? If the practitioners are becoming shallower — more dependent on the AI, less capable of independent judgment — the implementation has inverted Bainbridge’s irony rather than resolving it.
Part V: CO in the Trinity
Section titled “Part V: CO in the Trinity”CO does not stand alone. It is one of three interconnected frameworks, each addressing a distinct question about human-AI collaboration.
CARE (Philosophy) "What is the human for?" | +-- EATP (Trust Protocol) | "How do we keep the human accountable?" | +-- CO (Methodology) "How does the human structure AI's work?" | +-- COC (Codegen) — mature, in production +-- COR (Research) — in production +-- COL (Learning) — in development +-- COL-F (Finance) — in production +-- COE (Education) — in production +-- COG (Governance) — in production +-- COComp (Compliance) — in productionHow CO Inherits from CARE
Section titled “How CO Inherits from CARE”CARE (Hong, 2026a) established the philosophical foundation: the Dual Plane Model, the Mirror Thesis, and the Human-on-the-Loop position. CO makes the third of these operational.
CARE’s Trust Plane becomes CO’s Layer 3 (Guardrails): the normative separation between trust and execution becomes an architectural separation. CARE’s Mirror Thesis manifests in CO as the practitioner’s evolving role: the CO Architect who designs the five layers does not write the output but defines what makes the output trustworthy. The mirror reveals that this architectural contribution is the human’s unique value.
CARE’s eight principles map to CO’s structure:
| CARE Principle | CO Expression |
|---|---|
| Full Autonomy as Baseline | AI executes within the five-layer envelope |
| Human Choice of Engagement | Approval gates, not continuous monitoring |
| Transparency as Foundation | Audit trails, evidence requirements |
| Continuous Operation | AI maintains quality; humans intervene at checkpoints |
| Human Accountability Preserved | Every output traces to human-defined context |
| Graceful Degradation | Escalation when AI reaches competence boundaries |
| Evolutionary Trust | Learning layer evolves institutional knowledge |
| Purpose Alignment | Master directive establishes organizational purpose |
How CO Connects to EATP
Section titled “How CO Connects to EATP”EATP (Hong, 2026b) provides the trust infrastructure that CO’s Layer 3 can leverage for enterprise deployments. EATP’s Genesis Record parallels CO’s master directive (organizational root of authority). EATP’s Constraint Envelopes — Financial, Operational, Temporal, Data Access, Communication — provide a rigorous framework for CO’s guardrails. EATP’s Audit Anchors provide the evidentiary foundation that CO’s evidence-based completion requires.
The relationship is complementary. EATP answers: “Can we prove this action was authorized by a human?” CO answers: “Did we structure the work so the output is trustworthy?” An action can be authorized but poorly structured, or well-structured but unauthorized. Organizations need both.
Why They Are Peers, Not Layers
Section titled “Why They Are Peers, Not Layers”CARE, EATP, and CO are siblings, not a stack. An organization can adopt CO without EATP (structured methodology without formal trust chains) or EATP without CO (formal accountability without structured methodology). Both work better together, but neither depends on the other.
Part VI: Relationship to AI Execution Tools
Section titled “Part VI: Relationship to AI Execution Tools”The Convergence as Validation
Section titled “The Convergence as Validation”A conformance analysis conducted in March 2026 assessed Claude Code CLI — arguably the most architecturally complete AI execution tool — against every normative requirement in the CO specification. The results are instructive.
Claude Code achieves CO-L1L2L3 conformance (Layers 1-3, with caveats) when properly configured. It provides agent directories (Layer 1), context files with progressive disclosure (Layer 2), and hooks that run outside the model’s context window (Layer 3). These are genuine architectural mechanisms, not superficial feature mapping. The hook system, in particular, provides exactly the deterministic enforcement that Layer 3 requires.
Cursor and Windsurf follow the same pattern: agent specialization, context files, enforcement rules. The convergence is real. These are not features copied from CO — they are independent solutions to the same structural problems that CO identifies. This makes the convergence stronger evidence, not weaker.
CO-L1L2L3 conformance means these layers are becoming commodity. This paper does not claim novelty for agent specialization, context engineering, or deterministic hooks. The novel contribution is what sits above: the methodology for using these primitives as part of an integrated governance system.
Where Execution Tools Stop
Section titled “Where Execution Tools Stop”The same conformance analysis found MUST-level failures at Layers 4 and 5. No shipping execution tool implements:
Layer 4: Structured workflows with quality gates. Claude Code provides slash commands (invocation shortcuts that load context). CO Layer 4 requires workflow orchestration: phase state machines, enforced approval gates that the AI cannot bypass, evidence-based completion requiring file-and-line proof, mandatory delegation rules, and phase-skip prevention. Commands say “here is a shortcut to load context.” Layer 4 says “you cannot proceed until gate conditions are satisfied.” These are fundamentally different. The Claude Code team’s own internal analysis identifies eight pain points in the current command system, including no state tracking, no validation, and no persistence of phase completion.
Layer 5: Organizational learning pipeline. Claude Code provides auto-memory (unstructured free text managed by the model). CO Layer 5 requires an observe-capture-evolve pipeline: structured observation logging with categories, pattern analysis with confidence scoring, proposed formalizations that require human approval before becoming institutional knowledge, and inheritance mechanisms so new practitioners benefit from accumulated wisdom. Auto-memory is session continuity (pick up where you left off). Layer 5 is organizational learning (patterns observed across many sessions, scored, and formalized with human gates). The Kailash COC implementation builds this entire pipeline as custom scripts running on Claude Code’s hook infrastructure — the pipeline is 100% user-built, not platform-provided.
The Two-Era Framework
Section titled “The Two-Era Framework”The relationship between execution tools and governance methodology becomes clear when viewed as two distinct eras:
| Era 1: Execution | Era 2: Institutionalization | |
|---|---|---|
| Question | ”Can the AI do it?" | "Should the AI do it, and can we prove who authorized it?” |
| Architecture | Execution plane only | Trust Plane + Execution Plane |
| Knowledge | Session context (volatile) | Institutional knowledge (compounds) |
| Governance | Settings and permissions | Constraint envelopes and trust lineage |
| Human role | Approver or absent | Architect of the operating envelope |
| Failure model | ”The AI made an error" | "Which human defined the boundary that permitted this?” |
| Asset | Model capability (depreciates) | Institutional knowledge (appreciates) |
| Scope | My project, my session | My organization, across years |
Era 1 is where every AI tool vendor operates. Era 2 is where CO, CARE, and EATP operate. The two eras are complementary, not competitive. An organization needs execution tools (Era 1) and governance methodology (Era 2). Neither substitutes for the other.
The PMBOK Analogy
Section titled “The PMBOK Analogy”The convergence of execution primitives makes CO more relevant, not less. This follows a pattern established in other fields:
Every organization has access to project management software. PMI’s PMBOK (Project Management Body of Knowledge) is still published and widely adopted because the methodology for using tools is different from the tools themselves. Every organization has access to spreadsheets. Financial modeling methodologies are still taught because knowing Excel is different from knowing how to build a reliable financial model.
CO occupies the same layer. It does not compete with Claude Code, Cursor, or any other tool. It provides the methodology for using any tool effectively for human-AI collaboration. The more tools converge on similar primitives, the more valuable a shared methodology becomes, because the methodology transfers across tools.
CO Conformance Comparison
Section titled “CO Conformance Comparison”The following table summarizes CO conformance for a representative execution tool (Claude Code CLI) against the full CO specification:
| CO Layer | Execution Tool Coverage | Nature of Gap | Novelty Status |
|---|---|---|---|
| L1: Intent | Partial | Routing is probabilistic (model-decided), not deterministic; no scope enforcement | Converging |
| L2: Context | Substantial | Strongest layer; no contradiction detection across knowledge sources | Converging |
| L3: Guardrails | Met (CO), Not Met (EATP) | Hook architecture meets CO requirements; no typed constraint dimensions or verification gradient | Converging (CO); Unique (EATP) |
| L4: Instructions | MUST Failure | No workflow engine, no phase state machine, no gate enforcement, no evidence validation | Unoccupied |
| L5: Learning | MUST Failure | No structured observation, no confidence scoring, no formalization pipeline | Unoccupied |
| Process Model | Not addressed | Roles, quality criteria, adoption phases, conformance levels | Unique |
| Domain Generalization | Not addressed | Application template, cross-domain pattern, failure mode mapping | Unique |
| EATP Integration | ~5% | No trust lineage, no delegation chains, no cryptographic attestation | Unique |
The pattern is clear. Layers 1-3 are converging toward commodity — execution tool vendors will continue to improve these mechanisms. Layers 4-5, the process model, domain generalization, and EATP integration are CO’s genuinely unique contribution. No execution tool has an incentive to build these because they are not tool features; they are organizational methodology.
Five Propositions
Section titled “Five Propositions”The philosophical position that distinguishes CO from execution tool engineering rests on five propositions:
Proposition 1: Model capability is commodity. Institutional knowledge is the differentiator. Two organizations using the same model produce dramatically different output quality depending on whether institutional knowledge is machine-readable (CO Principle 1). Execution tools provide the mechanism for context. CO provides the methodology for building, maintaining, and evolving institutional knowledge as an organizational asset.
Proposition 2: Trust is human. Execution is shared. The Trust Plane (accountability, authority, values, boundaries) is permanently human. The Execution Plane (task completion, coordination) is shared with AI (CARE Dual Plane Model). Execution tools operate entirely within the Execution Plane. Their “trust” mechanisms are execution-layer access controls, not governance architecture.
Proposition 3: Accountability requires architecture, not just settings. When an AI agent acts, the question “by what authority?” must have a verifiable answer terminating at a human decision (EATP trust lineage). Execution tools answer “is this tool allowed?” (access control). EATP answers “who authorized this action, within what constraints, and can we prove it to an auditor?” (trust governance).
Proposition 4: Knowledge compounds. Sessions do not. Stateless AI interactions start from zero every session. The observe-capture-evolve pipeline creates institutional knowledge that compounds across sessions, practitioners, and time (CO Principle 7). Model capability depreciates with each new release. Institutional knowledge appreciates with each successful application.
Proposition 5: The human’s value is defining context, not executing tasks. When AI handles execution, it reveals what the human contributes beyond task completion: ethical judgment, relationship capital, contextual wisdom, creative synthesis, emotional intelligence, cultural navigation (CARE Mirror Thesis). The human is the CO Architect — the person who designs the five layers — not the person who writes the output.
These propositions are not claims about what execution tools should become. They are claims about what organizations need beyond execution tools. The governance layer is complementary to the execution layer, not a replacement for it.
Part VII: Domain Applications
Section titled “Part VII: Domain Applications”COC: The Proof of Concept
Section titled “COC: The Proof of Concept”Cognitive Orchestration for Codegen is the first and most mature domain application of CO. It has been implemented in the Kailash open-source ecosystem and documented in a companion paper (Hong, 2026d).
COC demonstrates that the five-layer architecture is not theoretical. The implementation contains 29 agents (Layer 1), over 100 knowledge files across 25 skill directories (Layer 2), 8 rule files and 8 hook scripts including the anti-amnesia mechanism (Layer 3), a seven-phase workflow with 12 slash commands (Layer 4), and an observe-capture-evolve learning pipeline (Layer 5).
The results are specific and verifiable. AI-generated code follows organizational conventions rather than internet conventions. Critical security rules survive context window compression. New practitioners inherit accumulated institutional patterns. The details are in the COC thesis; what matters here is that the architecture works when implemented with discipline.
CO for Compliance and Finance: Sketches
Section titled “CO for Compliance and Finance: Sketches”These applications have not been built. They are proposed, not proven. I include them because the structural parallels are clear enough to be useful, and because honest acknowledgment of their unproven status is more valuable than omitting them.
Both domains suffer the same three failure modes. In compliance: amnesia erases regulatory interpretation history, convention drift applies textbook interpretations instead of established positions, safety blindness takes the most direct analytical path rather than the path that satisfies auditors. The pattern in finance is structurally identical.
The five-layer instantiation follows the same pattern in both:
| Layer | Compliance | Finance |
|---|---|---|
| Intent | Regulatory interpreters, policy drafters, audit preparers | Transaction analysts, forecasters, risk assessors |
| Context | Interpretation history, precedent databases, policy templates | Calibrated models, parameter choices, exception procedures |
| Guardrails | No external communication without review, mandatory disclaimers, data classification | Transaction limits, segregation of duties, format validation |
| Instructions | Intake, research, interpretation, review — each with evidence gates | Data validation, analysis, risk assessment, approval — each with evidence gates |
| Learning | Interpretation accuracy, audit finding patterns, regulatory trends | Forecast accuracy, model calibration drift, exception resolution |
Whether these instantiations work in practice is an empirical question that requires implementation and testing.
Part VIII: Adoption Path
Section titled “Part VIII: Adoption Path”Organizations adopt CO through five phases. These are not prescriptive — skip what does not apply — but they represent a tested sequence.
Phase 1: Audit. Map your institutional knowledge. What is documented versus tacit? Which rules are critical (hard enforcement) versus advisory (soft)? Where does your practice diverge from generic practice? This produces a gap analysis: the distance between your institutional knowledge and what is currently machine-readable.
Phase 2: Foundation. Build Layers 1 and 2. Define agent specializations. Encode critical institutional knowledge. Start with the master directive — the single document that establishes baseline rules for every AI interaction. This phase does not require technology beyond what exists today.
Phase 3: Enforcement. Build Layer 3. Implement hard enforcement for critical rules. Design defense in depth for the highest-risk rules. Deploy anti-amnesia mechanisms to ensure critical context survives long interactions.
Phase 4: Process. Build Layer 4. Define domain-specific workflow phases, approval gates, evidence requirements, and delegation rules. Test against real work. Refine based on where evidence is insufficient and where human reviewers cannot verify claims.
Phase 5: Learning. Build Layer 5. Deploy observation mechanisms. Establish the capture-and-evolution pipeline. Define who curates the knowledge base. This phase is ongoing. It does not end.
An organization can derive value from Phase 2 onward. Full five-layer implementation is not a prerequisite for improvement. Each layer adds value independently.
Part IX: What This Does Not Solve
Section titled “Part IX: What This Does Not Solve”Honest Limitations
Section titled “Honest Limitations”Novel judgment under genuine uncertainty. CO structures the routine so that humans can focus on the novel. It does not structure the novel itself. When the organization confronts a genuinely novel challenge, the decision remains a deeply human activity.
Emergent system behavior. Layer 3 catches local violations. It does not catch emergent problems from the interaction of individually correct actions: cascading failures, unintended consequences of multiple agents operating within their individual envelopes. These require system-level thinking that CO’s layer-by-layer architecture does not provide.
Organizational culture and politics. CO structures the technical relationship between practitioners and AI. It says nothing about who decides what becomes a guardrail, how to maintain professional expertise when much of the output is machine-generated, or how to navigate the politics of who controls the CO architecture.
Tacit knowledge that resists articulation. Polanyi’s insight cuts both ways. CO can make some institutional knowledge machine-readable, but some is genuinely tacit — the experienced practitioner’s “feel” for when something is wrong. CO structures the environment so that tacit judgment is applied at the right moments (approval gates, escalation triggers), but the judgment itself remains irreducibly human.
Model-specific limitations. CO reduces the frequency and severity of failures; it does not eliminate them. The AI will still occasionally produce incorrect output that passes all guardrails.
Layers 1-3 are converging commodity. By 2026, the execution primitives that map to CO Layers 1-3 — agent specialization, context engineering, and deterministic hooks — are independently implemented by multiple commercial tools (Part VI). CO does not claim novelty for these mechanisms. CO’s unique contribution is the governance methodology (Layers 4-5), the process model, the domain generalization framework, and the integration with formal trust infrastructure (EATP). Organizations that adopt only Layers 1-3 are implementing what every serious AI tool already provides. The distinctive value begins at Layer 4.
The bootstrap problem. Building CO layers requires the institutional knowledge that CO is designed to make available. Organizations with weak institutional knowledge have the most to gain and the least capacity to build it. CO helps organizations make existing knowledge available to AI. It cannot create knowledge that does not exist.
Falsifiability Criteria
Section titled “Falsifiability Criteria”CO’s central claims are testable. I propose specific failure criteria:
- Institutional Knowledge Thesis: Abandon if structured context produces no quality improvement across ten or more deployments.
- Three Failure Modes: Reclassify from structural to incidental if AI systems maintain conventions and prioritize safety without explicit enforcement.
- Five-Layer Architecture: Revise if all five layers show no improvement over fewer layers or alternative architectures, across five or more domains.
- Learning Compounds: Abandon if knowledge bases show no growth or no quality correlation across deployments sustained six or more months.
- Domain Generalizability: Narrow scope if CO implementations outside codegen consistently fail to produce quality improvements, across three or more domains.
These thresholds are proposed, not definitive. Independent researchers should establish appropriate methodology.
References
Section titled “References”Anthropic. (2026). “Claude Code: Documentation and Architecture.” Available at https://docs.anthropic.com/en/docs/claude-code.
Bainbridge, L. (1983). Ironies of Automation. Automatica, 19(6), 775-779.
Fama, E. F., & Jensen, M. C. (1983). Separation of Ownership and Control. Journal of Law and Economics, 26(2), 301-325.
Haidt, J. (2001). The Emotional Dog and Its Rational Tail: A Social Intuitionist Approach to Moral Judgment. Psychological Review, 108(4), 814-834.
Hart, O. (1995). Firms, Contracts, and Financial Structure. Oxford University Press.
Hong, J. (2026). “Constraint Theater: Governance Without Wealth Effects.” Submitted to American Economic Review. Theoretical foundation; CO addresses the specification quality problem (the q parameter).
Hong, J. (2026a). “CARE: Collaborative Autonomous Reflective Enterprise — A Core Thesis.” White Paper Series, Version 2.1. Terrene Foundation. https://github.com/terrene-foundation/publications/blob/main/CARE-Core-Thesis.pdf.
Hong, J. (2026b). “Enterprise Agent Trust Protocol (EATP) — A Core Thesis.” White Paper Series, Version 2.2. Terrene Foundation. https://github.com/terrene-foundation/publications/blob/main/EATP-Core-Thesis.pdf.
Hong, J. (2026d). “Cognitive Orchestration for Codegen (COC) — A Core Thesis.” White Paper Series, Version 1.1. Terrene Foundation. https://github.com/terrene-foundation/publications/blob/main/COC-Core-Thesis.pdf.
Hong, J. (2026e). “The Constrained Organization: An Organizational Model for Enterprise AI Governance.” White Paper Series, Version 1.0. Terrene Foundation. https://github.com/terrene-foundation/publications/blob/main/Constrained-Organization-Thesis.pdf.
Hong, J. (2026f). “PACT: Principled Architecture for Constrained Trust — A Core Thesis.” White Paper Series, Version 0.8. Terrene Foundation. https://github.com/terrene-foundation/publications/blob/main/PACT-Core-Thesis.pdf.
Hong, J. (2026, March). “CO/COC vs Claude Code CLI: Deep Conformance Analysis.” Terrene Foundation Internal Analysis.
Karpathy, A. (2025). “Vibe Coding.” Personal blog post, February 2025.
Klein, G. (1998). Sources of Power: How People Make Decisions. MIT Press.
Parasuraman, R., & Riley, V. (1997). Humans and Automation: Use, Misuse, Disuse, Abuse. Human Factors, 39(2), 230-253.
Perry, N., Srivastava, M., Kumar, D., & Boneh, D. (2023). Do Users Write More Insecure Code with AI Assistants? ACM CCS 2023.
Polanyi, M. (1966). The Tacit Dimension. University of Chicago Press.
Project Management Institute. (2021). A Guide to the Project Management Body of Knowledge (PMBOK Guide), 7th Edition. PMI.
A Note on How This Paper Was Produced
Section titled “A Note on How This Paper Was Produced”This paper was written using the process it describes.
The initial draft (v1.0) was generated by a team of AI agents:
- research analysts that explored the knowledge base and source repositories, wrote structured drafts, and cross-referenced claims against anchor documents.
- A separate alignment-critic agent red-teamed the output, identifying issues including factual errors, overstatements, and inconsistencies with source material.
- A debate-expert agent challenged the draft’s arguments and flagged unverifiable claims.
The v1.1 update incorporated findings from a six-agent conformance analysis (Hong, 2026) that assessed CO requirements against Claude Code CLI’s architecture. The analysis confirmed L1-L3 convergence and L4-L5 gaps, leading to the Two-Era Framework, the Five Propositions, and the conformance comparison table added in Part VI.
The author directed every iteration. Instructions included:
- remove claims that cannot be verified
- strip out references to structures that do not yet exist
- simplify the disclosure to plain facts
- explain technical terms for non-technical readers
- ensure the paper reads as a standards proposition rather than marketing
- acknowledge convergence honestly and claim novelty only for what is genuinely unimplemented elsewhere
The author made direct edits to tone, phrasing, and substance throughout. Multiple rounds of revision followed, each narrowing the gap between what the agents produced and what the author judged to be honest and useful.
The AI agents drafted. The author defined the boundaries, evaluated the output, caught what the agents missed, and decided what stayed. That is human-on-the-loop in practice, not a theoretical model, but the process that produced this document.
The tools used were Claude Code (Anthropic) with specialized subagents for research, alignment checking, and adversarial review. The full conversation history, including every instruction, every correction, and every piece of content the author rejected, is retained by the author.
Version History
Section titled “Version History”| Version | Date | Changes |
|---|---|---|
| 1.0 | March 2026 | Initial thesis: seven first principles, five-layer architecture, three failure modes, conformance criteria, process model, domain application template. COC as reference implementation. Conformance analysis against Claude Code CLI |
| 1.1 | March 2026 | Added Principle 8 (Authentic Voice and Responsible Co-Authorship). Seven domain applications documented (COC, COR, COL, COL-F, COE, COG, COComp). Principle count updated throughout. CORP renamed to COR |
This paper is Hong (2026c), derived from the theoretical foundation in Hong, J. (2026). “Constraint Theater: Governance Without Wealth Effects.” CO addresses the specification quality problem (the q parameter in the formal model). See also: Hong, J. (2026a). “CARE: A Core Thesis” for governance philosophy. Hong, J. (2026b). “EATP: A Core Thesis” for trust verification. Hong, J. (2026d). “COC: A Core Thesis” for the first domain application. Hong, J. (2026e). “The Constrained Organization” for institutional design. Hong, J. (2026f). “PACT: A Core Thesis” for organizational architecture.