---
type: adr
id: 0043-checkable-documents
status: proposed
created: 2026-06-04
updated: 2026-06-06
supersedes:
superseded_by:
---
ADR-0043: Checkable documents — what is lintable, and with what
Context
corpus has a controlled obligation language (SOL, 0027) and a blocking lint pass (0029, 0034), both scoped to spec.md. A recurring design question — swept with live web verification and recorded in — is whether the lintable / structured-checkability discipline should extend to the other agent-interpreted artifacts: the source documents (audit.md, research.md, bug-report.md), the working artifacts (finding.md, the prose of review.md, memory/), and the repo-context files (AGENTS.md, the SKILL.md set).
The naïve reading — "extend lintable structure to those docs ⇒ more structure ⇒ more reliable" — is contradicted by the strongest, most recent measured evidence, which converges on a sharper partition: structure helps when it is typed answer-slots and checkable evidence binding on the frame; it hurts when it is more prose or rigid schema wrapped around free reasoning; and the reliability lever is a deterministic external check, never the model judging or voting on itself. This ADR records that partition so future passes know what to add — and, more load-bearingly, what not to add.
Decision
The lintable surface is partitioned as follows.
1. Obligation-blocks and the blocking obligation-lint gate stay spec-only. Only `spec.md` carries `REQ`/`CONSTRAINT`/`INVARIANT`/`INTERFACE`/`QUESTION` blocks, and only it is subject to the blocking `SOL-S`/`SOL-M`/`SOL-V`/`SOL-O` layers and the blocking `SOL-P001`–`P008` set behind the CLARIFY gate (§8, §9, §11.6). No obligation grammar runs on an audit, finding, research doc, or review. This is the regime where typed structure measurably helps (typed answer-slots and output contracts, [FORMATFREE], [SCOT]); pushing it into free-reasoning documents is the regime where it hurts, and would collapse the epistemic-stance partition (0030, §29.1).
2. Other agent-interpreted artifacts are checkable along a subtractive + checkable dimension, never additive structure. The checkable properties are integrity properties of the frame, not new prose schema: provenance-resolution (a fact-shaped claim carries a resolving evidence anchor), evidence-before-conclusion ordering, staleness / conflict, and minimality / anti-bloat. Adding structured agent prose is a measured liability: over-specified context files reduce success and raise cost ([AGENTSMD-HARM]), and most added skill docs are inert, with some actively harmful via staleness ([SWESKILLS]). What helps is machine-checkable / executable, not machine-readable ([ORACLESWE], [EVIBOUND]).
3. Structure binds the frame and metadata; the reasoning body stays free-form. An agent reasons free-form, then emits the structured artifact (decouple: never format while reasoning). Format restriction degrades reasoning purely by ordering the answer before the reason, and decoupling recovers the loss ([FORMATFREE], [FORMATTAX]). Evidence precedes the claim it supports ([ATTRFIRST]); a conclusion-slot (e.g. a `VERDICT` line) is the output a pass emits after it has reasoned, which is the "classification" regime where structure is safe.
4. Enforcement is deterministic, never an LLM judge or a vote. A document-integrity check either resolves a referent (the `file:line` exists, the citation/URL resolves, the `content_hash` still matches) or it is an advisory smell. Agent agreement, self-consistency, and LLM self-critique are not correctness signals: judge agreement collapses without a reference ([NOFREE]), voting amplifies correlated errors ([CORRELATED], [CONSENSUS]), and models cannot reliably self-correct without external feedback ([SELFCORRECT]). This is the deterministic-check spine the framework already rests on (VERIFY BY, 0038; empirical-proof, 0008).
5. Namespace and severity. These checks join the one unified `SOL-<LAYER><NNN>` namespace as `SOL-P` (prose-layer) codes applied across the artifact set; there is no sixth layer (0034's five-layers↔passes invariant is preserved). A document-integrity check MAY block only when it is backed by a deterministic resolving check (an unresolved provenance anchor, a `content_hash`-detected staleness); a heuristic / requirements-smell check is advisory only, because lexical smell detection is precision-bounded (~48–59%, [SMELLS]). The blocking obligation-lint gate of point 1 is untouched and stays spec-only.
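A minimal sketch of the severity rule above (a document-integrity check may block only when a deterministic resolving check backs it), assuming a hypothetical anchor format `path:line` and a hypothetical record of each cited file's SHA-256 `content_hash`. The `Finding` type, the check names, and the vague-word list are illustrative assumptions, not the framework's `SOL-P` codes, which each implementing pass assigns against §8.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Finding:
    check: str     # hypothetical check name, not an assigned SOL-P code
    severity: str  # "blocking" or "advisory"
    message: str


def resolve_file_line(root: Path, anchor: str) -> Finding | None:
    """Deterministic: does a `path:line` anchor point at an existing line?"""
    path_part, sep, line_part = anchor.rpartition(":")
    if not sep or not line_part.isdigit():
        return Finding("provenance-unresolved", "blocking", f"malformed anchor {anchor!r}")
    target = root / path_part
    if not target.is_file():
        return Finding("provenance-unresolved", "blocking", f"missing file {path_part}")
    n_lines = len(target.read_text(encoding="utf-8").splitlines())
    if not 1 <= int(line_part) <= n_lines:
        return Finding("provenance-unresolved", "blocking",
                       f"{anchor} is outside lines 1..{n_lines}")
    return None


def check_staleness(root: Path, rel_path: str, recorded_hash: str) -> Finding | None:
    """Deterministic: has the cited file changed since its SHA-256 was recorded?"""
    target = root / rel_path
    if not target.is_file():
        return Finding("stale-evidence", "blocking", f"{rel_path} no longer exists")
    if hashlib.sha256(target.read_bytes()).hexdigest() != recorded_hash:
        return Finding("stale-evidence", "blocking", f"{rel_path} changed since it was cited")
    return None


def heuristic_smell(text: str) -> Finding | None:
    """Heuristic: lexical smells are precision-bounded, so never blocking."""
    vague = ("as appropriate", "if possible", "etc.")  # illustrative word list only
    hits = [w for w in vague if w in text.lower()]
    if hits:
        return Finding("vague-phrasing", "advisory", f"vague terms: {', '.join(hits)}")
    return None


def gate(findings: list[Finding | None]) -> bool:
    """A document passes unless some deterministic check returned a blocking finding."""
    return not any(f is not None and f.severity == "blocking" for f in findings)
```

The design choice the sketch encodes is that severity is a property of the check, not of the document: only the two resolving checks can ever emit `"blocking"`, so an advisory smell cannot fail the gate no matter how many times it fires.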
This record refines, and does not supersede, ADRs 0027, 0028, 0030, 0032, 0034, 0035, and 0037. It is the gate for the implementing passes (provenance-enforced lint; the evidence-before-conclusion rule; the staleness/conflict lint; the minimality discipline; the inter-agent contract) catalogued in ; this ADR settles what is lintable and with what, not the individual code assignments, which each implementing pass fixes against §8.
Alternatives considered
| Alternative | Why rejected |
|---|---|
| Extend the SOL obligation grammar (blocks) into audits / findings / research | The harmful regime: rigid schema around free reasoning degrades it ([FORMATFREE]); it blurs the epistemic-stance boundary the memory-injection literature motivates ([MINJA]) and collapses the stance partition (0030, §29.1). |
| Raise reliability by adding richer structured agent prose / context files | Measured net-negative: over-specified context costs more for less ([AGENTSMD-HARM]); most added skill docs are inert and some harmful ([SWESKILLS]). The win condition for extending the lint is subtractive. |
| Grade agent docs with an LLM judge or a multi-agent vote | Not a verifier: judge agreement collapses without a reference ([NOFREE]); voting amplifies correlated errors that grow with capability ([CORRELATED], [CONSENSUS]). |
| Add a sixth lint layer for documents | Breaks the five-layers↔passes invariant (0034); document-integrity checks fit the existing SOL-P prose layer as advisory codes. |
| Leave every non-spec artifact unchecked | The artifacts agents read most (findings, audits, memory, AGENTS.md) carry exactly the staleness and unresolved-provenance failure modes ([SWESKILLS], [CITECHECK]); a cheap deterministic check there is high-leverage. |
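The rejection of judge-and-vote grading can be made concrete with a toy model, chosen here purely for illustration (it is not the model used in [CORRELATED] or [CONSENSUS]). Each of `n` judges is correct with probability `p`; with probability `rho` they all copy one shared draw (fully correlated errors), otherwise they judge independently. The function name and `rho` are assumptions of this sketch.

```python
from math import comb


def majority_accuracy(p: float, n: int = 3, rho: float = 0.0) -> float:
    """P(a majority of n judges is correct) under a toy shared-error mixture.

    With probability rho every judge copies one shared draw (correct w.p. p);
    otherwise the n judges are independent, each correct w.p. p.
    n is assumed odd so that a strict majority always exists.
    """
    k_min = n // 2 + 1
    independent = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )
    return rho * p + (1 - rho) * independent
```

With `p = 0.7` and three independent judges the vote is right 78.4% of the time; at `rho = 1` it is right exactly 70% of the time, no better than one judge, even though the judges now agree unanimously on every item. Agreement rises as the vote's value falls, which is why the enforcement rule relies on a check that resolves a referent instead.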
Consequences
Positive
- The framework states what not to add, which is the load-bearing result: future passes cannot drift toward additive document schema in the name of "more rigor."
- A cheap, deterministic, mostly-advisory check surface for the most-read artifacts (provenance resolves, nothing stale, evidence before claim, minimal) — without touching the obligation-lint gate or the stance partition.
- Aligns the document layer with the framework's validated spine: an external deterministic check is the lever (0038, 0008), matching where the multi-agent-failure evidence puts the leverage (the system-design/specification and verification layers together ≈ 63% of failures, [MAST]; contracts + verification cut failures 64–70%, [SEMAP]).
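Minimality / anti-bloat is the least mechanical of the listed properties. A minimal sketch of what an advisory version could look like, assuming the input is a set of context files (e.g. `AGENTS.md` plus the `SKILL.md` set) read into memory: it reports instructions duplicated across files and files over a line budget. The 200-line budget, the whitespace/case normalization, and the function name are illustrative assumptions, not measured thresholds or framework rules; because the check is heuristic it only reports and never blocks.

```python
from collections import defaultdict


def minimality_smells(files: dict[str, str], max_lines: int = 200) -> list[str]:
    """Advisory-only anti-bloat smells over a set of context files.

    `max_lines` is an illustrative budget, not a measured threshold.
    """
    smells: list[str] = []
    seen: dict[str, set[str]] = defaultdict(set)  # normalized line -> files containing it
    for name, text in files.items():
        # Normalize whitespace and case so trivially reworded copies still match.
        lines = [" ".join(ln.split()).lower() for ln in text.splitlines()]
        # Skip blank lines and markdown headings; only instruction lines count.
        lines = [ln for ln in lines if ln and not ln.startswith("#")]
        if len(lines) > max_lines:
            smells.append(f"{name}: {len(lines)} lines exceeds budget {max_lines}")
        for ln in lines:
            seen[ln].add(name)
    for ln, names in sorted(seen.items()):
        if len(names) > 1:
            smells.append(f"duplicated across {sorted(names)}: {ln!r}")
    return smells
```

The check points in the subtractive direction the Decision asks for: each smell it raises is resolved by deleting text, never by adding structure.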
Negative
- The `SOL-P` advisory catalogue grows (append-only with tombstoning, §8.1.1); authors meet a few more advisory codes.
- "Frame structured, reasoning free" is a boundary a reviewer or future tool applies by judgment; only the resolving checks (provenance, staleness) are fully mechanical. The ordering and minimality checks are partly heuristic and therefore advisory.
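To show why the ordering check stays advisory, here is a minimal sketch assuming a hypothetical `finding.md` layout in which the conclusion-slot is a line beginning `VERDICT` and evidence anchors look like `path/to/file.ext:123`. The anchor regex and the function name are assumptions; a document can satisfy both tests and still reason badly, or fail them for a benign reason, so the result is reported and never blocks.

```python
import re

# Hypothetical anchor shape: a path with a file extension, then ":<line>".
ANCHOR = re.compile(r"\b[\w./-]+\.\w+:\d+\b")


def ordering_smells(doc: str) -> list[str]:
    """Advisory-only: does evidence precede the VERDICT conclusion-slot?"""
    lines = [ln for ln in doc.splitlines() if ln.strip()]
    verdict_idx = [i for i, ln in enumerate(lines) if ln.lstrip().startswith("VERDICT")]
    if not verdict_idx:
        return ["no VERDICT line"]
    smells = []
    first = verdict_idx[0]
    # Evidence before conclusion: at least one anchor above the first VERDICT.
    if not any(ANCHOR.search(ln) for ln in lines[:first]):
        smells.append("no evidence anchor before the VERDICT line")
    # Reason-then-emit: the conclusion-slot is the last thing the pass writes.
    if first != len(lines) - 1:
        smells.append("VERDICT is not the last line emitted")
    return smells
```

Note that the check constrains only the frame (where the anchors and the `VERDICT` line sit); the reasoning lines between them are never parsed.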
Neutral / tradeoffs
- This is scope + severity guidance, not a new construct. The blocking obligation-lint gate, the five lint layers, the seven block types, the artifact set, the verdict set, and every canonical count are unchanged.
Evidence and its limits (§0.7)
- Stance separation is threat-motivated design, not a measured reliability gain. [MINJA] measures the attack (memory injection); corpus's provenance/stance defense is sound, field-aligned design — its reliability delta is not separately measured. Claimed as design, not result.
- No controlled "spec-first measurably wins" study exists in the confirmed sources; the only hands-on numbers show heavyweight spec-driven development is slower. The discipline this ADR gates must stay cheap and load-bearing, never ceremonial.
- Smell-style prose checks are advisory only (~40%+ false-positive floor, [SMELLS]); blocking precision is reachable only against the defined SOL grammar (point 1), which is exactly why obligation-lint stays spec-only.
- Every preprint-grounded point above is corroboration of a direction, not the grounding of a `MUST` (the load-bearing claims rest on the peer-reviewed entries).
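The two [SMELLS] figures quoted in this ADR are one measurement seen from opposite sides: among the items a smell check flags, the false-positive share is the complement of precision.

$$
\frac{FP}{TP + FP} \;=\; 1 - \frac{TP}{TP + FP} \;=\; 1 - \text{precision} \;\in\; [\,1 - 0.59,\ 1 - 0.48\,] \;=\; [\,0.41,\ 0.52\,]
$$

Even at the best reported precision, roughly two flags in five are spurious. That is the "~40%+ false-positive floor" above, and it is why such checks cannot block.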
Status
Proposed / parked (not yet in force). This records the direction for a checkable-document layer;
no document-integrity lint rule has been built — lint.md still scopes the SOL-P layer to
spec prose only. Nothing here modifies the live lint pass yet. The design note + backlog live in
.agents/lintable-docs-improvement-plan.md; promote this to accepted only when (and if) the rules land.
Affected obligations / constraints
- Proposes (not yet applied): the lintable-document partition; the subtractive-not-additive doctrine for non-spec artifacts; the deterministic-not-judge enforcement rule; the frame-structured / reasoning-free + reason-then-emit rule.
- Would modify (when implemented, not yet): the `SOL-P` advisory layer's scope (spec-prose-only → the artifact set) and its severity rule (a document-integrity check blocks only when backed by a deterministic resolving check).
- Refines: 0027, 0028, 0030, 0032, 0034, 0035, 0037. Relates to §8, §9, §11.6, §20.3 / §29.1, §23, §26.
- Does NOT change: the obligation grammar, the blocking obligation-lint gate (spec-only), the five lint layers, the seven block types, the verdict set, or any canonical count.
Ledger note (2026-06-11): refined by ADR-0063.