Writing a skill
A skill is a markdown file your agent reads when the work matches. It is instructions with a load trigger — and like any prompt, its structure determines whether it actually fires.
This page distills the evidence-backed authoring guide from the corpus-skills docs. No shortcuts. No silver bullets. Just a format that makes the agent more likely to do the right thing.
# A minimal skill
skills/write-feature/
SKILL.md
references/
task-template.md
# A persona needs no references folder
skills/persona-skeptic/
SKILL.md
The open Agent Skills spec defines the frontmatter and progressive-disclosure model: metadata is always present, the body loads when the description matches, and references load on demand.
Activation: the directive description
Agents scan the description to decide whether to load the skill. A self-published 650-trial measurement (one author, not peer-reviewed) reported passive descriptions activating far less reliably than directive ones with an explicit exclusion clause. Treat the exact percentages as a hint, not a law — but the direction is consistent and the fix costs nothing, so we took it.
Passive (weak)
Use when implementing a feature from a spec. Encodes the discipline — read the spec, survey patterns, halt on ambiguity, validate after every batch, paste output.
activates less reliably (self-reported)
Directive (strong)
Implement a feature from a spec. ALWAYS apply when the user asks to implement, build, or add a feature, when a spec doc is referenced, or when acceptance criteria are named — even with no spec. Do not code before surveying patterns, invent a requirement, or refactor in passing. Skip defect fixes, behavior-preserving refactors, deliberate rewrites, migrations, performance tuning, and test-only work.
activates far more reliably (self-reported)
Four required clauses
- WHAT.Open with an imperative verb: “Implement a feature”, “Back every claim”, “Judge another agent's work”.
- ALWAYS. Name concrete triggers, including implicit signals the user might not say literally.
- Do not. Block the default bypass — the thing the agent would do if it skipped the skill.
- Skip for. Name task types, not sibling skill names. This prevents directive saturation without assuming another skill is installed.
Target 350–600 characters. Hard cap 800. The open spec allows 1024, but the tighter limit is a forcing function for clarity.
Body anatomy
Activation is necessary but not sufficient. The body must be shaped so the agent acts on it. Long contexts suffer from U-shaped attention: the start and end are remembered; the middle fades.
Numbered rules with rationales
Each rule pairs an imperative with the failure mode it prevents. Rationales let the model extend the rule to cases the author didn't imagine.
Anti-patterns section
Negative examples are load-bearing. Document the temptations the agent should refuse, not just the happy path.
References one hop away
Anything deeper than a single `references/*.md` file risks partial reads and silent omissions.
Length budgets
- ~200 lines is the practical target.
- 500 linesis the hard cap, backed by Anthropic's own guidance.
- If a skill grows past 200 lines, the question is “what moves to references?”, not “should I raise the limit?”.
---
name: write-feature
description: Implement a feature from a spec...
---
# Skill: write-feature
## Purpose
<2–3 sentences. The failure mode this skill prevents.>
## Core rules
### 1. Read the packet first
### 2. Map every AC before coding
...
## What does not belong
## Anti-patterns
## Bundled resources
Forced visible output
Skills have two reliability problems: activation and execution. A loaded skill can still skip late-stage verification steps because they produce no visible content. The fix is structural: every verification must leave a marker the next reader can see.
empirical-proof
Every claim in ## Self-reviewgets its own verbatim pasted output. No paraphrase, no “✅ all passing”.
write-testing
Flip each new test's assertion: it must fail; restore it: it must pass. Paste the fail-then-pass transition.
The same mechanism drives Reflexion's +11 pp gain on HumanEval: a written artefact converts an implicit signal into a durable one. If the rule's compliance would otherwise be invisible, force it to produce output.
Self-containment
A user who installs only write-feature does not have empirical-proof in context. Therefore write-featuremust read correctly on its own. If a related discipline is load-bearing, restate it inline — a sibling may be mentioned only as an optional “if installed” note.
The AGENTS.md contract
Skills name abstract command slots — cmdTest, cmdLint, cmdValidate— never concrete commands. The consuming repo's AGENTS.md Commands table supplies the implementations. An empty slot means ask; a skill never invents a command.
This split is what makes a skill portable: the guide carries the discipline, your repo carries the toolchain.
Task templates
Long-running tasks accumulate intermediate findings, abandoned hypotheses, and decisions. The task file is the agent's working memory, written to disk. It is not the deliverable; it is the scratchpad. Keep it gitignored, local, and personal.
The MIHPSG rubric
Ship a references/task-template.md only when at least three of these six criteria hold:
- Multi-session — likely to span more than one agent session
- Iteration — validate → fix → re-validate cycles
- Hypothesis tracking — multiple competing explanations
- Multi-stage plan — four or more distinct phases
- State separate from deliverable — working state is not the final artefact
- Verification checks — pasted command output required
Personas and cross-cutting quality checks like empirical-proof deliberately ship no template — their discipline lives in the body and surfaces inside whichever workflow's task file is in play.
Scope
A skill must be useful in any consumer repo, by itself, with no implicit dependencies. The catalogue is deliberately narrow so it stays portable.
Engineering-domain prescriptions
auth-patterns, caching guides
Stack-specific skills
react-19-best-practices, postgres-index-patterns
Internal product knowledge
business-logic wikis, vendor runbooks
Automation or scripts
CI workflows, eval harnesses, build generators
Core / loader skills
personas-core, write-core
Always-on skills
anything that tries to load for every request
The right home for stack-specific or internal knowledge is the consuming repo's AGENTS.md, not a skill.
Read the full reference
This page is a summary. The full reasoning, controlled studies, and sample skills live in the repositories.