type: adr id: 0053-structured-spec-and-review-system status: accepted created: 2026-06-07 updated: 2026-06-07 supersedes: superseded_by:
ADR-0053: corpus is a structured specification & review system (review-as-exceptions)
Context
Recent agent-evaluation evidence is consistent on one point: the surrounding system matters as much as the model. Harness/configuration choice alone moved aggregate scores by 23.8 points on a fixed task set and model pool [HARNESSBENCH]; an automatic-harness method lifted Terminal-Bench 2 pass@1 69.7%→77.0% with the gains coming from tools, middleware, and memory — not the system prompt [AHE]; a standardized harness + trace inspection is what made agent evaluation tractable at all [HAL]; and frontier agents still score < 65% on hard, verifiable CLI tasks [TERMBENCH]. In parallel, ambiguous requirements degrade code generation and models can't reliably self-resolve the ambiguity [ORCHID]. And the productivity story is not what it feels like: a 2025 RCT measured experienced developers ~19% slower with AI while they believed they were faster [METR], and industry data finds adoption raises both throughput and instability absent a control layer [DORA2025].
Read together, these say what corpus should be, plainly: not "the smartest agent," but the layer that makes agent work legible, safe, and reproducible — clear specs in, reviewable evidence out, with a verification gate as the control the evidence says adoption lacks. The framing the framework had drifted toward (spec-repo discipline, obligations, verdicts) is correct but under-stated; this ADR names the position.
Crucially, the substance already exists and needs no new machinery: the merge gate
(0035), the WAIVED lifecycle verdict, status.md (desired vs.
observed), source authority (0031), the trace/review separation, and
the recognized source-classes (docs/artifacts/README.md §3) are all in place. This is a positioning
reframe over existing parts, not a new layer.
Decision
- Position corpus as a structured specification & review system for agentic software work: it turns messy inputs into verifiable specs, specs into bounded agent work, and large agent output into reviewable evidence.
- Foreground "review-as-exceptions" as the merge-gate payoff — the productivity unlock and the control
layer. A reviewer inspects failed/unverified obligations, unauthorized changes, high-risk surfaces, and
promotion decisions, not every generated line. This is framing mapped onto the existing
review.mdsections (the merge-gate predicate,## Unauthorized changes, theRISK high/criticaldual-judge rule,## Promotion queue) — it restates no semantics. - No new artifacts, directories, roles, or runtime. No
docs/enterprise/, no "Spec Author" role, nointake/status/ledger/policy/scaffold dirs — the substance (merge gate,WAIVED, status, source authority, source-classes) already exists. The only structural addition is an optional## Source inputssection in the spec contract + template (provenance for upstream intent, drawn from the existing source-classes), with zero tool-specific connectors. - Honor the overclaim ban. The public language avoids "compiler," "trust agents automatically," "makes review obsolete," "regenerates code from specs," and "the new SDLC." Understated = clear and brief, not informal: the SOL/verdict rigor stays.
This refines 0050,
0051, and 0052 (a positioning
layer over the spec-repo discipline + per-feature homes) and keeps .agents/
(0049). It changes no closed set, the SOL
grammar, the nine steps, the verdicts, or the artifact set.
Alternatives considered
| Alternative | Why rejected |
|---|---|
Add an enterprise layer — docs/enterprise/, a "Spec Author" role, intake/status/ledger/policy/ dirs | The useful substance already exists (spec-repo discipline 0050/0051, WAIVED, status.md, source-authority, promotion-protocol, source-classes). New dirs/roles add ceremony without new capability — against the minimality discipline. |
| Build tool-specific intake connectors (Jira/Notion/Linear/…) | Couples the framework to vendors and adds a runtime surface. The agnostic source-classes + an optional ## Source inputs link table capture upstream provenance with zero connectors. |
| Expand the APS improve-operation set / high-risk-word list (per the source brief) | The improve-operation set is a frozen closed set of 10 (conformance check A15); changing it breaks count reconciliation for no gain. |
| Market as a "harness platform" / "compiler" | Overclaims, and the evidence rewards the opposite: the harness matters internally [HARNESSBENCH][AHE], but users buy clarity, safety, and reviewable evidence — not a harness brand. |
Consequences
- Positive: a plain, evidence-grounded position (the control/verification layer); review-as-exceptions makes the payoff legible to a reviewer drowning in agent diffs; the framing is defensible against the perceived-vs-real gap [METR] and the adoption-instability finding [DORA2025].
- Negative: a bounded doc sweep —
positioning.md, the rootREADMEon-ramp,passes/review.mdforegrounding, the spec contract/template## Source inputssection, and thesources.mdgrounding. Done as part of this change. - Neutral: no closed set, grammar, step, verdict, or artifact changes — only framing, one optional spec section, and the evidence base grow.
Status
Accepted (v0.1). The positioning/README/review/spec-contract edits and the sources.md additions are this
change. No follow-on migration.
Affected obligations / constraints
- Refines: 0050, 0051, 0052 (positioning over the existing discipline + homes).
- Keeps: 0049 (
.agents/, no mount). - Grounded by: [HARNESSBENCH], [AHE], [HAL], [TERMBENCH], [ORCHID], [METR], [DORA2025].
- Does NOT change: any closed set, the SOL grammar, the nine steps, the verdicts, or the artifact set.
Ledger note (2026-06-11): refined by ADR-0057; intake-directory deferral partially superseded by ADR-0061.
Ledger note (2026-06-12) — evidence and its limits: the Context figures (HARNESSBENCH, AHE, HAL, TERMBENCH, METR) are preprint-tier corroboration, cited here as preliminary evidence, not settled results; METR's uplift figure in particular should not be cited as settled (see the caveats in
../research/sources.md).
Ready to run the loop on your own repo? Get started — copy the kit and write your first spec.