Skip to main content
sourceSource: corpus/docs/tutorial/02-task-and-run.mdModified: 2026-06-23

Walk the loop · Task and Run

Works today — plain markdown plus your agent; no corpus tooling required.

You have a spec. specs/checkout/spec.md (SPEC-checkout, status ready) carries one requirement — AC-001 — Expired session returns 409 — with a runnable check on it. Holding a spec is not the same as handing work to someone. This page is the two steps that do the handing:

  • Task — you write the packet that bounds the work: which requirement, which files, what must not change, how it gets verified.
  • Run — the work happens in an isolated branch, and a run summary with real test output comes back.

By the end you will have produced tasks/checkout-expiry.md and you will know exactly what a finished run looks like — including the one thing that separates a Pass from an Unverified.

About the test command. corpus ships no sample repo, so the npm run test:integration command on this page never literally runs against any code here — shop-api is the worked reference, not a downloadable project. You produce the corpus artifacts for real; you run the actual loop on your own next real change. Everything below is a real artifact you can copy.


Step 3 · Task: bound the work

A spec says what should be true. A task hands one slice of it to one pair of hands — agent or human — with a wall around it. The packet is where the handoff is written down so you can inspect it. The full reasoning lives in Creating tasks; here you just fill the template.

Now create the file. Copy templates/task.md to tasks/checkout-expiry.md and fill it in. Here is the whole packet for our one requirement:

---
type: task
id: TASK-checkout-expiry
source:
  - SPEC-checkout
scope: [AC-001]
status: ready
---

# Task: Expired checkout session returns 409

## Source

- Spec: `specs/checkout/spec.md` (SPEC-checkout)

## Scope

Implement or preserve:

- AC-001 — A checkout session older than 30 minutes returns 409 SESSION_EXPIRED, never a 5xx.

## Do not change

- the `sessions` table schema

## Affected areas

- `src/checkout/`, `test/`

## Verify

- [ ] `npm run test:integration -- expired-session` (AC-001)

## Agent instructions

<!-- The standing rules ship in templates/task.md — copy them verbatim,
     do not rewrite them per task. -->

Three lines are doing the real work, and each maps to a part of the template explained on the Creating tasks page:

  • scope: [AC-001] — exactly the requirement from your spec, by id. A task never invents requirements of its own; if the work turns out to need more, the agent's instruction is to stop and say why, not improvise.
  • The "Do not change" linethe sessions table schema. This is the scope wall: name the adjacent, tempting thing the work should leave alone. It is the most valuable line in many packets.
  • The Verify commandnpm run test:integration -- expired-session, copied straight from your spec's Verify with: line. One runnable command per requirement is the part an agent benefits from most.

The agent instructions are not yours to write. The standing rules — read the sources first, stay in scope or stop and say why, run every Verify item and paste real output, self-review the diff, leave a run summary — ship in the template and travel with every task, so every agent gets the same brief. Copy the ## Agent instructions block from templates/task.md verbatim; do not retype or edit them here. (See the section on the standing rules.)

Stop-point check. Open tasks/checkout-expiry.md. You should see: scope: [AC-001] in the frontmatter, a "Do not change" line naming the sessions table schema, exactly one line under Verify, and the standing Agent instructions block copied from the template unchanged. That is a complete packet — one requirement, one wall, one check. You have produced the handoff.


Step 4 · Run: hand it off in isolation

Run is the step where the packet leaves your hands. You can hand it to whatever does the work — Claude Code, Codex, Cursor, Aider, or a human colleague. corpus does not run agents and does not care which one you use; the packet is plain markdown, and anything that can read a file can work from it (Running agents).

Make an isolated branch

Run each task in its own git worktree, on its own branch off the base. This keeps the work off your main checkout, maps one task to one branch to one review, and makes abandoning a bad run as cheap as deleting a directory (why this hygiene pays off).

Now run this (against your own repo, on your own next change):

git worktree add -b corpus/checkout-expiry ../shop-api--checkout-expiry main

That gives you a fresh checkout in ../shop-api--checkout-expiry on a new branch corpus/checkout-expiry. This is a convention — nothing here enforces it; an agent told to edit in place will edit in place. (The optional corpus worktree sets up the isolated branch and checkout for you; you still launch your own agent inside it — the reference CLI prepares the loop, it does not run your agent.)

Hand off the packet

Point whoever is doing the work — agent or human — at the task file and let them follow the sources from there. No prompt preamble is needed; the standing rules are already in the packet. For an agent, the one-line handoff is:

Read tasks/checkout-expiry.md and do what it says.

A human colleague reads the same file and works from the same Scope, "Do not change", and Verify line. (The optional corpus run launches a configured agent in the task's worktree and records the launch; it never becomes the agent — the coding loop is still the agent's, never corpus's.)

What comes back

The worker does two things with the packet on the way out. First, it runs each Verify item and pastes that command's real output directly under the item — that is where the evidence lives. Then the last standing instruction has it fill the packet's ## Run summary section — changed files, out-of-scope edits, blocked questions, and one line per Verify command that points back at the paste rather than repeating it. Both shapes are owned by the task template and explained in What you get back — don't redraw them. A finished, green packet for our task looks like this, Verify section first:

## Verify

- [x] `npm run test:integration -- expired-session` (AC-001)

      Test Suites: 1 passed, 1 total
      Tests:       3 passed, 3 total
## Run summary

- Changed files: src/checkout/expiry.ts, src/api/errors.ts, test/integration/expired-session.test.ts
- Verify results:
  - npm run test:integration -- expired-session (AC-001) → PASS (output under Verify above)
- Out-of-scope edits: none
- Blocked questions: none

Notice what makes that packet trustworthy: the actual test output is pasted in under the Verify item, and the summary just indexes it — never re-pastes it. This is the one rule on this page worth memorizing, and it decides the review result a step later:

"Tests passed" with no pasted output is not evidence. A done-claim with the command's real output behind it can become a Pass at review. A bare "all tests passed" with nothing pasted is recorded as Unverified — not Pass, not Fail — until someone produces the output (the evidence rule). Insist on the paste while the worktree still exists and a re-run costs seconds; later it is archaeology.

The worker also self-reviews its diff as a skeptic before leaving the summary — that yields fixes, never a result; the Pass/Fail call belongs to the next step, made by someone who didn't write the code. See self-review before handoff for why an agent grading its own work can't issue the verdict.

Stop-point check. You should have: a worktree on branch corpus/checkout-expiry, the packet handed off with the one-line prompt, and a returned packet where the integration-test output is pasted under its Verify item (green) and the ## Run summary line cites it. If the summary says "passed" but no output sits under the Verify item, it is not done — ask for the paste before tearing anything down. When the output is there, you are holding a run ready to judge.


What you produced

  • tasks/checkout-expiry.md — a complete task packet: one requirement in scope, one "Do not change" wall, one Verify command, the standing agent instructions from the template.
  • An isolated branch (corpus/checkout-expiry) and a returned packet with real test output pasted under the Verify item and a Run summary that indexes it — the raw material the review step reads.

Next

You have a finished run with pasted evidence and an untrustworthy-by-default done-claim sitting in it. The next step turns that into a Pass: Review.

Ready to run the loop on your own repo? Get started — copy the kit and write your first spec.