type: adr id: adr-0094 status: accepted created: 2026-06-22 updated: 2026-06-22
ADR-0094 — Decomposition into small untangled units; review scrutiny weighted by diffusion, churn, and change-type
Context
Agent-written changes are cheap to produce and easy to make large. corpus's leverage is the handoff — what the agent is told to emit and what the human (or an independent reviewer) must actually look at. Two recurring questions had no grounded answer in the canon:
- How should a task be shaped? The docs said "a small cleanup is still a task" but never made small, single-concern, untangled, refactor-separated the primary decomposition rule.
- Where should review scrutiny concentrate? Field reports framed it as "greenfield (lighter) vs brownfield (heavier)." That framing does not hold up against the evidence.
A web-verified evidence pass (corpus-works #53; sources below, each verified June 2026 with honest tiers) settled the principle — and reframed the discriminator from new-vs-old to diffusion / churn / change-type.
- Small, self-contained changes are the best-grounded result in code review. Defect-detection effectiveness is best on small changes — a ~200–400 LOC band, falling past ~60–90 min ([[SMARTBEAR]] — industry dataset, cite as heuristic). Corroborated: useful-comment proportion drops as files-per-change rises ([[BOSU15]]); modern review converged on changes of tens of lines ([[RIGBY13]]); "one self-contained change," ~100 reasonable / ~1000 too large, refactor separate from behavior change ([[GOOGLESMALLCL]]).
- Untangling buys cleaner reviews, not more bugs caught. A controlled experiment (n=28) found decomposition yields fewer false positives and more context-seeking, with the number of defects found unchanged ([[DIBIASE19]]). The honest reason to split is reviewability, not detection.
- Risk rises with diffusion and differs by change-type. P(a change induces a failure) rises with size and diffusion (files/modules/subsystems touched) and depends on change type ([[MOCKUS00]]); ~40% of fault-fix changes introduce new defects while a one-line change is <~4% ([[PURUSHOTHAMAN05]]); faults cluster in churned code — change-count predicts faults better than size ([[GRAVES00]]).
- Counter-evidence to "net-new is safe": once size and total changes are controlled, change type (new feature vs improvement) has no significant effect on later defects — size/churn dominates ([[HINDLE11]]). So "greenfield" must never justify skipping review on a large or high-diffusion change.
Decision
-
Small, single-concern, untangled, refactor-separated is the primary task-decomposition rule (
docs/06-creating-tasks.md, thesplit-workguide). A task carries one concern; a refactor lands in its own task/commit, separate from the behavior change it enables (the small-cleanup exception in [[GOOGLESMALLCL]] stands). Level: convention. -
An oversized-packet heuristic is named as a toolable signal (
corpus check/ a step-bar): flag a task or review packet whose changed-LOC and files-touched/diffusion exceed a band, anchored to the [[SMARTBEAR]] 200–400 LOC heuristic with files-touched as a first-class signal ([[BOSU15]]). Thresholds ship as heuristics with provenance, never enforced law. Level: toolable (the checker iscorpus-cli); not built here — this ADR specifies it, corpus-cli implements. -
Review scrutiny is weighted by change-type / diffusion / churn, not greenfield-vs-brownfield (
docs/05-brownfield-and-change-plans.md,docs/06). Concentrate human / independent-reviewer attention on (a) fault-fix / modification changes, (b) high-diffusion connective tissue touching many files/subsystems/interfaces, (c) high-churn loci. A lighter lane for net-new code is acceptable only when it is also small and low-diffusion, and still demands genuine engagement on the risky seams. The Hindle caveat is explicit: never let "greenfield" excuse skipping review on a large or high-diffusion change ([[HINDLE11]]). Level: checklist. -
Decomposition's benefit is framed honestly as cleaner reviews, not more bugs caught ([[DIBIASE19]]). The docs do not claim splitting catches more defects.
Consequences
- The "connective-tissue" instinct (split out the wiring so the human reviews only what must be reviewed) is kept — but named precisely as coupling/diffusion, the version the evidence supports, not "new vs old."
- No
checks.yamlrule and no contract-version bump land with this ADR: the oversized-packet flag is specified-not-shipped (acorpus-clitoolable, demand-gated), consistent with the honesty framework (ADR-0063) — name the checker, never claim enforcement before a tool ships. - Thresholds are heuristics: [[SMARTBEAR]] is a single 2006 vendor dataset, and [[HINDLE11]] is scoped to four Apache Java projects — the docs cite the band as guidance with its provenance and caveats, never as a measured optimum.
Propagation
docs/06-creating-tasks.md (task-shaping rule + named oversized-packet flag), docs/05-brownfield-and-change-plans.md
(risk-weighted review + Hindle caveat), the kit split-work guide (the small/untangled lever),
docs/research/sources.md (the nine entries above). The corpus-cli oversized-packet check is the
toolable follow-up (tracked in corpus-works #61 §B), not shipped by this ADR.
Update (2026-06-22) — the oversized-packet band measured, deferred (ADR-0097)
The named oversized-packet toolable was built and measured: a raw changed-LOC/files band cannot be
both useful and low-false-positive for code task diffs (≈15% FP at 600 LOC — feature-with-tests shares
the 600–1200 LOC range; a 0-FP band of ≥1500 never fires on the population it targets). So the
band-based check is specified-not-shipped (id C018 reserved for a future decomposition-predictive
signal), and corpus review surfaces the diff size as neutral info instead — the "size as a signal"
intent in the only honest form the data supports. Full measurement in ADR-0097.
Ready to run the loop on your own repo? Get started — copy the kit and write your first spec.