Skip to main content
SKILL.md — open Agent Skills format

Writing a skill

A skill is a markdown file your agent reads when the work matches. It is instructions with a load trigger — and like any prompt, its structure determines whether it actually fires.

This page distills the evidence-backed authoring guide from the corpus-skills docs. No shortcuts. No silver bullets. Just a format that makes the agent more likely to do the right thing.

layout.txt — what a skill folder looks like
pwrcheckevidence
terminal

# A minimal skill

skills/write-feature/
  SKILL.md
  references/
    task-template.md

# A persona needs no references folder

skills/persona-skeptic/
  SKILL.md

The open Agent Skills spec defines the frontmatter and progressive-disclosure model: metadata is always present, the body loads when the description matches, and references load on demand.

description.yaml — the load-bearing line

Activation: the directive description

Agents scan the description to decide whether to load the skill. A self-published 650-trial measurement (one author, not peer-reviewed) reported passive descriptions activating far less reliably than directive ones with an explicit exclusion clause. Treat the exact percentages as a hint, not a law — but the direction is consistent and the fix costs nothing, so we took it.

Passive (weak)

Use when implementing a feature from a spec. Encodes the discipline — read the spec, survey patterns, halt on ambiguity, validate after every batch, paste output.

activates less reliably (self-reported)

Directive (strong)

Implement a feature from a spec. ALWAYS apply when the user asks to implement, build, or add a feature, when a spec doc is referenced, or when acceptance criteria are named — even with no spec. Do not code before surveying patterns, invent a requirement, or refactor in passing. Skip defect fixes, behavior-preserving refactors, deliberate rewrites, migrations, performance tuning, and test-only work.

activates far more reliably (self-reported)

Four required clauses

  • WHAT.Open with an imperative verb: “Implement a feature”, “Back every claim”, “Judge another agent's work”.
  • ALWAYS. Name concrete triggers, including implicit signals the user might not say literally.
  • Do not. Block the default bypass — the thing the agent would do if it skipped the skill.
  • Skip for. Name task types, not sibling skill names. This prevents directive saturation without assuming another skill is installed.

Target 350–600 characters. Hard cap 800. The open spec allows 1024, but the tighter limit is a forcing function for clarity.

body.md — the rules that fire after activation

Body anatomy

Activation is necessary but not sufficient. The body must be shaped so the agent acts on it. Long contexts suffer from U-shaped attention: the start and end are remembered; the middle fades.

Numbered rules with rationales

Each rule pairs an imperative with the failure mode it prevents. Rationales let the model extend the rule to cases the author didn't imagine.

Anti-patterns section

Negative examples are load-bearing. Document the temptations the agent should refuse, not just the happy path.

References one hop away

Anything deeper than a single `references/*.md` file risks partial reads and silent omissions.

Length budgets

  • ~200 lines is the practical target.
  • 500 linesis the hard cap, backed by Anthropic's own guidance.
  • If a skill grows past 200 lines, the question is “what moves to references?”, not “should I raise the limit?”.
pwrcheckevidence
SKILL.md

---

name: write-feature

description: Implement a feature from a spec...

---

# Skill: write-feature

## Purpose

<2–3 sentences. The failure mode this skill prevents.>

## Core rules

### 1. Read the packet first

### 2. Map every AC before coding

...

## What does not belong

## Anti-patterns

## Bundled resources

execution.log — make the invisible visible

Forced visible output

Skills have two reliability problems: activation and execution. A loaded skill can still skip late-stage verification steps because they produce no visible content. The fix is structural: every verification must leave a marker the next reader can see.

empirical-proof

Every claim in ## Self-reviewgets its own verbatim pasted output. No paraphrase, no “✅ all passing”.

write-testing

Flip each new test's assertion: it must fail; restore it: it must pass. Paste the fail-then-pass transition.

The same mechanism drives Reflexion's +11 pp gain on HumanEval: a written artefact converts an implicit signal into a durable one. If the rule's compliance would otherwise be invisible, force it to produce output.

self-containment.md — no sibling dependencies

Self-containment

A user who installs only write-feature does not have empirical-proof in context. Therefore write-featuremust read correctly on its own. If a related discipline is load-bearing, restate it inline — a sibling may be mentioned only as an optional “if installed” note.

The AGENTS.md contract

Skills name abstract command slots — cmdTest, cmdLint, cmdValidate— never concrete commands. The consuming repo's AGENTS.md Commands table supplies the implementations. An empty slot means ask; a skill never invents a command.

This split is what makes a skill portable: the guide carries the discipline, your repo carries the toolchain.

task-template.md — externalised working memory

Task templates

Long-running tasks accumulate intermediate findings, abandoned hypotheses, and decisions. The task file is the agent's working memory, written to disk. It is not the deliverable; it is the scratchpad. Keep it gitignored, local, and personal.

The MIHPSG rubric

Ship a references/task-template.md only when at least three of these six criteria hold:

  • Multi-session — likely to span more than one agent session
  • Iteration — validate → fix → re-validate cycles
  • Hypothesis tracking — multiple competing explanations
  • Multi-stage plan — four or more distinct phases
  • State separate from deliverable — working state is not the final artefact
  • Verification checks — pasted command output required

Personas and cross-cutting quality checks like empirical-proof deliberately ship no template — their discipline lives in the body and surfaces inside whichever workflow's task file is in play.

scope.txt — what stays out

Scope

A skill must be useful in any consumer repo, by itself, with no implicit dependencies. The catalogue is deliberately narrow so it stays portable.

  • Engineering-domain prescriptions

    auth-patterns, caching guides

  • Stack-specific skills

    react-19-best-practices, postgres-index-patterns

  • Internal product knowledge

    business-logic wikis, vendor runbooks

  • Automation or scripts

    CI workflows, eval harnesses, build generators

  • Core / loader skills

    personas-core, write-core

  • Always-on skills

    anything that tries to load for every request

The right home for stack-specific or internal knowledge is the consuming repo's AGENTS.md, not a skill.

next steps

Read the full reference

This page is a summary. The full reasoning, controlled studies, and sample skills live in the repositories.