Architecture
How @contentful/skill-kit works internally. For the author-facing API, see the API Reference. For the full specification, see SPEC.md.
CLI Protocol
A built skill is invoked by agents via Bash — one call per step. The SDK supports two invocation modes: session mode (recommended) which uses a temp file for communication, and stateless mode (fallback) which passes everything via CLI args and stdout.
Session mode (recommended)
Session mode moves protocol data to a JSONL temp file, reducing noise in the agent’s Bash output and eliminating the growing --history flag.
1. Start — Agent creates a session:
scripts/run --params '{"repoPath":"."}' --host claude-code --session new
Returns a minimal SessionPointer to stdout:
{ "sessionId": "abc123", "file": "/tmp/skill-kit-abc123.jsonl", "line": 2 }
The agent reads line 2 from the session file (via the host’s Read tool) to get the prompt, schema, and preamble.
2. Write output — Agent appends its response to the session file:
echo '{"type":"output","step":"diagnose","output":{"checks":[...]}}' >> /tmp/skill-kit-abc123.jsonl
3. Advance — Agent calls advance with just the session ID:
scripts/run advance --session abc123
Returns a line number (e.g., 4). The agent reads that line for the next prompt or done signal.
4. Repeat until the line contains "type":"done".
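The agent-side read loop can be sketched roughly as follows. This is a minimal illustration, not SDK code: `readSessionLine` and `isDoneRecord` are hypothetical helper names, line numbers are assumed to be 1-indexed, and the `header` and `prompt` record types in the sample file are assumptions (only `output` and `done` appear in the examples above). The sketch works on an in-memory string rather than the real temp file.

```typescript
// Hypothetical agent-side helpers; not part of the SDK.
// Assumption: session lines are 1-indexed and each line holds one JSON object.
type SessionRecord = { type: string; [key: string]: unknown };

function readSessionLine(fileContent: string, line: number): SessionRecord {
  const raw = fileContent.split("\n")[line - 1];
  if (raw === undefined || raw.trim() === "") {
    throw new Error(`session file has no record at line ${line}`);
  }
  return JSON.parse(raw) as SessionRecord;
}

// The loop stops once the line returned by `advance` carries "type":"done".
function isDoneRecord(record: SessionRecord): boolean {
  return record.type === "done";
}

// Sample file: header, first prompt, agent output, done signal.
// The "header" and "prompt" type values are assumed for illustration.
const sessionFile = [
  '{"type":"header","sessionId":"abc123"}',
  '{"type":"prompt","step":"diagnose","prompt":"Inspect the repository."}',
  '{"type":"output","step":"diagnose","output":{"checks":[]}}',
  '{"type":"done","finalOutput":{"summary":"ok"}}',
].join("\n");

const firstPrompt = readSessionLine(sessionFile, 2);
const finished = isDoneRecord(readSessionLine(sessionFile, 4));
```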
Stateless mode (fallback)
In stateless mode, the agent passes the full conversation history on every invocation. JSON goes to stdout.
1. Start — Agent calls scripts/run (defaults to start):
scripts/run --params '{"repoPath":"."}' --host claude-code
Returns a PromptResult:
{
"kind": "prompt",
"preamble": "In this session you will follow a structured workflow...",
"step": "diagnose",
"prompt": "Inspect the repository and report failed health checks.",
"schema": { "type": "object", "properties": { "checks": {} } }
}
The preamble is emitted once. It establishes session-wide conventions (XML tag-to-tool mappings, rendering rules) so that later step prompts can be shorter.
2. Advance — Agent submits output, gets next step:
scripts/run advance \
--step diagnose \
--output '{"checks":[{"name":"ci","status":"fail","detail":"no config"}]}' \
--params '{"repoPath":"."}' \
--history '[{"step":"diagnose","response":{...}}]' \
--host claude-code
Returns another PromptResult (next step) or a DoneResult:
{
"kind": "done",
"done": true,
"finalOutput": { "summary": "..." },
"completed": { "step": "report", "output": {} }
}
3. Validation error — If the agent’s output doesn’t match the step schema:
{
"kind": "error",
"error": "validation",
"step": "diagnose",
"message": "Expected object, received string",
"retry": true
}
The agent retries with corrected output. The retry: true flag tells the agent the step hasn’t advanced.
Result type discrimination — All result types carry a kind field ('prompt', 'done', 'error', 'redirect') that serves as a discriminant for the CliResult union. The SDK exports type guard helpers: isPrompt(r), isDone(r), isError(r), isRedirect(r).
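A caller might branch on the `kind` discriminant along these lines. The interfaces below are a local sketch reconstructed from the JSON examples above, not the SDK's exported types; the shape of the redirect result is not shown in this document, so its `target` field is an assumption, and `summarize` is a hypothetical helper.

```typescript
// Local sketch of the CliResult union; field lists follow the JSON examples above.
interface PromptResult { kind: "prompt"; step: string; prompt: string; schema: unknown; preamble?: string }
interface DoneResult { kind: "done"; done: true; finalOutput: unknown; completed: { step: string; output: unknown } }
interface ErrorResult { kind: "error"; error: string; step: string; message: string; retry: boolean }
// Assumption: the redirect shape is not documented here; `target` is illustrative.
interface RedirectResult { kind: "redirect"; target: string }
type CliResult = PromptResult | DoneResult | ErrorResult | RedirectResult;

// Type guards mirroring the names the SDK is said to export.
const isPrompt = (r: CliResult): r is PromptResult => r.kind === "prompt";
const isDone = (r: CliResult): r is DoneResult => r.kind === "done";
const isError = (r: CliResult): r is ErrorResult => r.kind === "error";
const isRedirect = (r: CliResult): r is RedirectResult => r.kind === "redirect";

// Hypothetical helper: the switch narrows each case to its concrete type.
function summarize(r: CliResult): string {
  switch (r.kind) {
    case "prompt": return `run step ${r.step}`;
    case "done": return "workflow finished";
    case "error": return r.retry ? `retry step ${r.step}` : `failed: ${r.message}`;
    case "redirect": return `redirect to ${r.target}`;
  }
}
```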
CLI Flags
| Flag | Required | Description |
|---|---|---|
| --params | On start | JSON string validated against the skill’s params schema |
| --step | On advance | Name of the step being submitted. Not needed with --session in file mode |
| --output | On advance | JSON string containing the agent’s response. Not needed with --session in file mode |
| --history | On advance | JSON array of { step, response, actionResult? }. Not needed with --session |
| --host | Optional | Host identifier: claude-code, codex, opencode, gemini-cli, cline, roo-code, kilo-code, cursor, amp, generic |
| --tools | Optional | Comma-separated list of available tools (merged with host registry; authoritative with --subagent). Only needed on start, because session mode stores tools in the header |
| --subagent | Optional | Boolean flag. Indicates a subagent with a genuine tool subset, so --tools becomes authoritative (no registry merge) |
| --session | Optional | new (start) or session ID (advance). Enables session mode |
| --session-dir | Optional | Directory for session files. Default: OS temp directory |
| --output-mode | Optional | file (default) or flag. How the agent passes step output in session mode |
Why stateless is still supported
No persistent processes, no stdin piping, no subprocess lifecycle management. The agent makes sequential Bash calls and parses JSON — a pattern every agent host supports today. Statelessness also enables horizontal scaling, resumable workflows, and retry/redo logic without session corruption.
History replay is cheap. The engine reconstructs state (store, visit counts) from history data without re-executing actions or observers. Session mode uses the same replay mechanism — it just reads history from the file instead of CLI args.
MCP Transport
The CLI protocol works everywhere but requires multiple visible tool calls per step (Bash + file Read). The MCP transport wraps the same SkillEngine as a long-lived stdio MCP server, reducing each workflow step to a single MCP tool call.
How it works
scripts/run mcp --host claude-code
The binary starts an MCP stdio server (via @modelcontextprotocol/sdk). Two tools are registered:
- start: creates a WorkflowEngine, returns the first step prompt with preamble
- advance: calls engine.advance(), auto-advances prompt-less steps, returns next prompt or done
State lives in memory — no JSONL files, no history replay. Session IDs link start and advance calls. When the workflow reaches done, the session is removed.
Relation to CLI mode
Both transports share the same engine layer (WorkflowEngine, SubskillEngine, autoAdvance, generatePreamble). No MCP-specific engine logic exists. The MCP entry point is a thin adapter that maps MCP tool calls to engine method calls.
The generated SKILL.md includes MCP instructions first, with CLI as a labeled fallback. The agent chooses which mode based on whether MCP tools are in its tool list.
The Host-Aware Primitive System
The architectural constraint
The skill CLI cannot call tools. It cannot invoke MCP methods. It cannot cause the host to render UI. Only the model can do those things, and only in response to prose it reads.
When the SDK wants the model to use AskUserQuestion on Claude Code, all it can do is return XML that the preamble has mapped to that tool. The model reads the XML tag, consults the preamble’s tag-to-tool table, calls the tool, and passes the answer back on the next invocation. The answer shape is still enforced by the step’s ArkType schema.
Everything in the host-aware system is downstream of this constraint. Primitives render XML tags; the preamble maps each tag to the host’s tools. The “capability system” is a lookup table that picks which tool to advertise per tag.
Two mechanisms
Preamble at session start. Generated once per skill invocation. The preamble is a markdown table mapping XML tags to host-specific tools:
| Tag | Tool | How to use |
|---|---|---|
| <ask-user> | AskUserQuestion | Present <option> children as choices via the tool… |
| <checklist> | TaskCreate | Register <item> children via the tool… |
| <subagent> | Agent | Spawn isolated agent for enclosed task via the tool… |
The preamble is generated per host — different tools in the Tool column, same tags and semantics. resolveTools(handshake) matches each primitive’s tool list against the host’s available tools to populate the table. Later step prompts can be shorter because the preamble has set the context.
Preambles are best-effort — the model may forget them under context pressure. The XML tags themselves are self-describing, which helps even without preamble context. Preambles optimize the common case; the tag structure guards correctness.
Per-step XML rendering. For any step using a primitive (passed to prompt directly or composed via act.* in a prompt function), the SDK renders the primitive as an XML tag. An askUser step emits:
<ask-user type="structured" question="Which target?">
<option value="production" label="Production"></option>
<option value="staging" label="Staging"></option>
</ask-user>
The preamble tells the model how to handle <ask-user> — on Claude Code, use the AskUserQuestion tool; on a host without a structured-question tool, present as a numbered list. No tool names appear in the XML itself.
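The rendering step for this example could look roughly like the sketch below. It is not the SDK's render() implementation: `renderAskUser`, `escapeAttr`, and the config shape are hypothetical, and the escaping rules are an assumption about what a careful renderer would do.

```typescript
// Hypothetical renderer; the SDK's own render() may differ in escaping and layout.
interface AskUserOption { value: string; label: string }
interface AskUserConfig { question: string; options: AskUserOption[] }

// Escape characters that would break a double-quoted XML attribute.
function escapeAttr(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Emits only tag structure; no host tool names appear in the output.
function renderAskUser(config: AskUserConfig): string {
  const options = config.options.map(
    (o) => `<option value="${escapeAttr(o.value)}" label="${escapeAttr(o.label)}"></option>`,
  );
  return [
    `<ask-user type="structured" question="${escapeAttr(config.question)}">`,
    ...options,
    "</ask-user>",
  ].join("\n");
}

const askUserXml = renderAskUser({
  question: "Which target?",
  options: [
    { value: "production", label: "Production" },
    { value: "staging", label: "Staging" },
  ],
});
```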
Primitive registry and tool resolution
Each primitive is defined with definePrimitive() (src/primitives/primitive.ts):
interface RenderContext {
skillName?: string;
}
interface Primitive<TInput, TConfig, TTools> {
readonly tag: string; // XML tag name (e.g., 'ask-user')
readonly tools: TTools; // tool names to match against host
create(input: TInput): TConfig;
render(config: TConfig, ctx?: RenderContext): string;
preambleRow(tool: string | undefined): PreambleRow;
}
The render method accepts an optional RenderContext. The engine passes { skillName } when calling renderPrimitive, allowing primitives like subagent to emit the skill name in the no-recurse attribute.
The registry (src/primitives/registry.ts) manages all five primitives and exposes two functions:
- renderPrimitive(config, ctx?): dispatches to the correct primitive’s render() method by config.kind, forwarding an optional RenderContext (e.g., { skillName }). Returns the XML string.
- resolveTools(handshake): three-way tool resolution: (1) no explicit tools → use host registry; (2) explicit tools + isSubagent → authoritative, no registry merge; (3) explicit tools without isSubagent → union with host registry. Returns a ToolResolver map from tag name to matched tool (or undefined).
Tool resolution uses a three-way strategy based on the isSubagent flag on the Handshake. Top-level agents that under-report their tools (common in practice) get the registry union, so their preamble includes all known host tools. Subagents that pass --subagent get authoritative resolution — only their explicitly reported tools are used, preventing the preamble from referencing tools the subagent does not actually have.
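The three-way strategy can be illustrated with a small sketch. Everything here is a simplified stand-in: `resolveToolsSketch`, the `Handshake` fields, the registry contents, and the per-primitive tool lists are assumptions chosen for the example, not the SDK's actual data.

```typescript
// Simplified stand-ins; field names and registry contents are assumptions.
interface Handshake { host: string; tools?: string[]; isSubagent?: boolean }
interface PrimitiveSpec { tag: string; tools: string[] }
type ToolResolver = Map<string, string | undefined>;

const HOST_REGISTRY: Record<string, string[]> = {
  "claude-code": ["AskUserQuestion", "TaskCreate", "Agent"],
};

const PRIMITIVES: PrimitiveSpec[] = [
  { tag: "ask-user", tools: ["AskUserQuestion", "question"] },
  { tag: "checklist", tools: ["TaskCreate", "todo"] },
  { tag: "subagent", tools: ["Agent", "task"] },
];

function resolveToolsSketch(h: Handshake): ToolResolver {
  const registry = HOST_REGISTRY[h.host] ?? [];
  let available: string[];
  if (h.tools === undefined) {
    available = registry;                                         // (1) no explicit tools
  } else if (h.isSubagent) {
    available = h.tools;                                          // (2) authoritative subset
  } else {
    available = Array.from(new Set([...registry, ...h.tools]));  // (3) union with registry
  }
  const resolver: ToolResolver = new Map();
  for (const p of PRIMITIVES) {
    // First tool in the primitive's list that the host actually offers, else undefined.
    resolver.set(p.tag, p.tools.find((t) => available.includes(t)));
  }
  return resolver;
}
```

With this sketch, a top-level agent that reports only `Agent` still resolves `ask-user` to `AskUserQuestion`, while a subagent reporting the same list gets `undefined` for that tag.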
To add a new primitive, define it with definePrimitive() and add it to the ALL_PRIMITIVES array in the registry.
Prompt assembly pipeline
When the engine builds a step’s prompt, it follows this sequence:
1. resolvePromptValue: Evaluates the step’s prompt config. If it’s a function, calls it with PromptContext (which includes act and system builders). If it’s a string, array, or single segment, uses it directly. Returns PromptReturn.
2. normalizePieces: Wraps the result in an array of PromptPiece values. A single string or segment becomes a one-element array.
3. assemblePieces: Walks the array in author order, wrapping each piece in its XML tag:
   - Plain strings become <prompt>\n...\n</prompt>.
   - SystemSegment ({ kind: 'system' }) becomes <system>...</system>.
   - ActSegment ({ kind: 'act' }) is rendered via renderPrimitive() to its XML tag (e.g., <ask-user>, <checklist>, <subagent>).
   - ViewSegment ({ kind: 'view' }) becomes <rendered>...</rendered> (with optional name attribute).

   Non-empty results are joined with double newlines.

The author controls ordering. A prompt function returning [system`…`, act.checklist(…), 'Build the game.'] produces <system>, then <checklist>, then <prompt>.
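The normalize and assemble stages can be sketched as below. The segment shapes are simplified assumptions (the real ActSegment carries a primitive config and is rendered by renderPrimitive(); here a `tag` and `body` stand in for that), and treating an empty string as an empty result is also an assumption.

```typescript
// Simplified segment shapes; field names other than `kind` are assumptions.
type SystemSegment = { kind: "system"; text: string };
type ActSegment = { kind: "act"; tag: string; body: string }; // stand-in for a primitive config
type ViewSegment = { kind: "view"; content: string; name?: string };
type PromptPiece = string | SystemSegment | ActSegment | ViewSegment;

// A single string or segment becomes a one-element array.
function normalizePieces(value: PromptPiece | PromptPiece[]): PromptPiece[] {
  return Array.isArray(value) ? value : [value];
}

function renderPiece(piece: PromptPiece): string {
  if (typeof piece === "string") {
    return piece === "" ? "" : `<prompt>\n${piece}\n</prompt>`;
  }
  switch (piece.kind) {
    case "system":
      return `<system>${piece.text}</system>`;
    case "act":
      // Stand-in for renderPrimitive(): emit the primitive's XML tag.
      return `<${piece.tag}>${piece.body}</${piece.tag}>`;
    case "view":
      return piece.name
        ? `<rendered name="${piece.name}">${piece.content}</rendered>`
        : `<rendered>${piece.content}</rendered>`;
  }
}

// Author order is preserved; empty results are dropped before joining.
function assemblePieces(pieces: PromptPiece[]): string {
  return pieces.map(renderPiece).filter((s) => s !== "").join("\n\n");
}

const assembled = assemblePieces(
  normalizePieces([
    { kind: "system", text: "Be terse." },
    { kind: "act", tag: "checklist", body: "<item>Plan</item>" },
    "Build the game.",
  ]),
);
```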
Why primitives matter
Three reasons, all load-bearing:
- Centralized tuning. The SDK owns the XML rendering and preamble mappings that produce reliable behavior per host. One tuning pass benefits every skill.
- Host portability. Authors write intent once; the SDK translates per host. No hardcoded tool names to break on a different agent.
- SDK improvements propagate. Better phrasing for Codex six months from now ships as an SDK update. Every skill using primitives gets it for free.
Skills written against primitives inherit prompt-engineering work done in the SDK. Skills written as raw prose don’t.
Escape hatch
For cases where author-written prose must be host-aware, PromptContext.host.toolsAvailable is available in prompt functions:
prompt: ({ host }) => {
if (host.toolsAvailable.includes('WebSearch')) {
return 'Search the web for recent CVEs affecting this dependency.';
}
return 'Check the changelog for known security issues.';
},
The lint rule no-host-tool-names enforces that raw tool names only appear inside host.toolsAvailable.includes() guards.
Known host tool inventories
Claude Code: AskUserQuestion, EnterPlanMode, ExitPlanMode, TaskCreate, TaskUpdate, TaskList, TaskGet, Agent, Skill, Read, Edit, Write, Bash, Glob, Grep, WebFetch, WebSearch, TodoWrite, SendMessage, Monitor, LSP, NotebookEdit, EnterWorktree, ExitWorktree.
Codex CLI: shell, apply_patch, update_plan, web_search, view_image, exec_command, write_stdin, ToolRequestUserInput, CollabAgent.
OpenCode: bash, read, write, edit, apply_patch, glob, grep, codesearch, lsp, webfetch, websearch, question, todo, task, plan, skill.
Gemini CLI: shell, read-file, write-file, edit, glob, grep, web-search, web-fetch, ask-user, enter-plan-mode, exit-plan-mode, write-todos, agent, tracker-create-task, tracker-update-task, memory, activate-skill, complete-task.
Cline / Roo Code / Kilo Code: execute_command, read_file, write_to_file, edit_file, apply_diff, apply_patch, search_files, list_files, codebase_search, ask_followup_question, attempt_completion, new_task, switch_mode, update_todo_list, and host-specific additions.
Cursor: codebase_search, read_file, edit_file, run_terminal_command, file_search, grep_search, list_dir.
Amp: shell, read, write, edit.
The Build Pipeline
skill-kit build <entry.ts> -o <dir> produces a distributable, agentskills.io-compliant skill directory.
Build modes
The --mode flag selects the bundling strategy:
- --mode bun (default): Compiles platform-specific executables via bun build --compile. Standalone, no runtime dependency, ~50-100MB per target.
- --mode node: Bundles into a single .mjs file via esbuild. Requires Node.js >= 24 at runtime, ~100-500KB.
Pipeline steps
1. Load: Import the entry file, extract the default export (must be a SkillDefinition or ReferenceDefinition).
2. Validate: Run lint checks (cycle guards, schema consistency).
3. Generate wrapper: Create a temporary entry point that imports the skill and calls main() (or compositeMain() if the skill has subskills) from @contentful/skill-kit/cli.
4. Bundle: Mode-dependent:
   - Bun mode: For each target platform, run bun build --compile --target bun-<platform>. Individual target failures don’t halt the pipeline; zero successful targets does.
   - Node mode: Run esbuild to produce a single .mjs bundle with all dependencies inlined.
5. Generate scripts/run: Shell wrapper (mode-dependent: platform dispatcher for bun, Node version check for node).
6. Generate SKILL.md: Agent-facing documentation with invocation instructions, step descriptions, and reference pointers.
7. Generate package.json: Name, version, and any fields from the skill’s package config. Merges with existing package.json in the output directory. When resolveVersion: true, reads the version from the nearest ancestor package.json.
8. Copy references/: Markdown files from the source references/ directory.
9. Clean up: Remove temporary wrapper files.
Output structure
Bun mode:
<dir>/
  SKILL.md                 ← Agent reads this first
  package.json
  scripts/
    run                    ← Detects OS/arch, delegates to binary
  bin/
    <name>-darwin-arm64    ← macOS Apple Silicon
    <name>-linux-x64       ← Linux x86_64
  references/
    *.md                   ← Bundled content files
Default targets: darwin-arm64 and linux-x64. Override with --targets. Use --single for current-platform-only dev builds.
Node mode:
<dir>/
  SKILL.md                 ← Agent reads this first
  package.json
  scripts/
    run                    ← Checks Node ≥ 24, runs bundle
  bin/
    <name>.mjs             ← Single ESM bundle
  references/
    *.md                   ← Bundled content files
The scripts/run wrapper
Agents call scripts/run, never bin/ directly. The wrapper varies by mode but the contract is identical.
Bun mode — detects OS/architecture, selects the correct binary:
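#!/usr/bin/env bash
set -euo pipefail
# Assumed: SKILL_DIR is the skill root, one level above scripts/ (its definition is not shown in this excerpt).
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Normalize OS and architecture to the names used in the binary file names.
OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m)"
case "$ARCH" in x86_64) ARCH="x64" ;; aarch64|arm64) ARCH="arm64" ;; esac
BIN="$SKILL_DIR/bin/<name>-${OS}-${ARCH}"
exec "$BIN" "$@"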
Node mode — checks Node version, sets SKILL_DIR for reference resolution:
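#!/usr/bin/env bash
set -euo pipefail
# Assumed: SKILL_DIR is the skill root, one level above scripts/ (its definition is not shown in this excerpt).
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Extract the major version of the installed Node.js.
NODE_VERSION="$(node -e 'process.stdout.write(process.versions.node.split(".")[0])')"
if [ "$NODE_VERSION" -lt 24 ] 2>/dev/null; then
echo "error: Node.js >= 24 required" >&2; exit 1
fi
# Exported so the bundle can resolve references/ relative to the skill root.
export SKILL_DIR
exec node "$SKILL_DIR/bin/<name>.mjs" "$@"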
This decouples the skill’s contract (SKILL.md references scripts/run) from its internal layout.
The compile/bundle step
The SDK generates a temporary entry point:
import skill from './skill';
import { main } from '@contentful/skill-kit/cli';
main(skill);
Bun mode: bun build --compile bundles everything — the SDK, ArkType, the skill code — into a single self-contained executable. The SDK itself has no Bun runtime dependency; Bun is used only as a build tool.
Node mode: esbuild bundles the same tree into a single .mjs file. All dependencies are inlined; only Node.js built-ins are external.
Engine Internals
The WorkflowEngine (src/runtime/engine.ts) is the core state machine.
Lifecycle
Constructor — Takes a SkillDefinition, Handshake, params, and optional ReferenceLoader. Initializes the state store.
start() — Validates the skill structure (parent sentinels resolved, cycle guards present). Generates the preamble. Fires onStepStart. Returns the first step’s PromptResult. If the entry step has no prompt, it auto-advances through the step lifecycle without agent interaction (useful for computation-only routing steps).
advance(stepName, rawOutput) — The main loop:
1. Validate response against step’s ArkType schema. Steps without a response schema skip validation.
2. If invalid: fire onStepValidationFailed, return ValidationErrorResult with retry: true.
3. Map action input via action.mapInput (if configured), or use validated response directly.
4. Execute action (if configured). Action receives typed input and AbortSignal.
5. Compute save: call the save callback (if configured) with { response, actionResult, store, params }. The return value is { step?, ...subStoreWrites }. The step property determines what gets stored as the step result (priority: save().step > action output > response). Additional keys are deep-merged into the corresponding sub-stores via applySave().
6. Freeze the step result object.
7. Append the step result to store.steps. Deep-merge any sub-store writes into their top-level store properties.
8. Fire onStepComplete.
9. Resolve next step:
   - { terminal: true } / terminal: fire onSkillComplete, return DoneResult.
   - NextBranch[]: evaluate branches in order, first match wins.
   - Function: call with { response, actionResult, attempts, params, store }, get step name.
   - 'self': rewrite to current step name.
   - Apply maxVisits / onMaxVisits throttle.
10. Fire onTransition.
11. If the next step has no prompt, auto-advance through it immediately (no agent round-trip needed).
12. Return next step’s PromptResult.
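The "resolve next step" part of the loop can be sketched as follows. This is an illustration under several assumptions: the `NextBranch` shape (`when`/`to`), the reduced context object, applying the visit limit to the target step, and throwing when a limit is hit without an onMaxVisits fallback are all guesses rather than documented behavior. Only the default limit of 10 visits comes from this document.

```typescript
// Assumed shapes for illustration; the SDK's real types are richer.
interface NextContext { response: unknown; attempts: number }
interface NextBranch { when: (ctx: NextContext) => boolean; to: string }
type Next = { terminal: true } | NextBranch[] | ((ctx: NextContext) => string) | string;
interface StepSketch { next: Next; maxVisits?: number; onMaxVisits?: string }

const DEFAULT_MAX_VISITS = 10; // implicit runtime limit mentioned for unguarded cycles

// Returns the next step name, or null when the workflow is done.
function resolveNextSketch(
  current: string,
  steps: Record<string, StepSketch>,
  ctx: NextContext,
  visits: Map<string, number>,
): string | null {
  const step = steps[current];
  if (step === undefined) throw new Error(`unknown step: ${current}`);
  const next = step.next;

  let target: string | undefined;
  if (typeof next === "string") target = next;
  else if (typeof next === "function") target = next(ctx);
  else if (Array.isArray(next)) target = next.find((b) => b.when(ctx))?.to; // first match wins
  else return null; // { terminal: true }

  if (target === undefined) throw new Error(`no branch matched in step ${current}`);
  if (target === "self") target = current; // 'self' rewrites to the current step name

  // Throttle: assumed to apply to the step about to be visited.
  const seen = visits.get(target) ?? 0;
  const limit = steps[target]?.maxVisits ?? DEFAULT_MAX_VISITS;
  if (seen >= limit) {
    const fallback = steps[target]?.onMaxVisits;
    if (fallback === undefined) throw new Error(`step ${target} exceeded ${limit} visits`);
    return fallback;
  }
  visits.set(target, seen + 1);
  return target;
}
```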
History replay
replayHistory(history) reconstructs engine state from a previous execution. It validates each entry’s response against the step schema and re-executes save callbacks to rebuild both step results and sub-store state, but does not re-execute actions or fire observers. This is how the stateless protocol works — each advance call replays the full history to rebuild state before processing the new step.
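A stripped-down version of the replay idea is sketched below. The step shape, the `validate` predicate standing in for an ArkType schema, and the returned state object are assumptions made for the example; sub-store writes and observers are omitted.

```typescript
// Simplified stand-ins; the real engine validates with ArkType schemas.
interface HistoryEntry { step: string; response: unknown; actionResult?: unknown }
interface ReplayStep {
  validate: (response: unknown) => boolean;
  save?: (args: { response: unknown; actionResult: unknown }) => { step?: unknown };
  action?: () => unknown; // present on the step, but never invoked during replay
}
interface ReplayedState {
  results: Array<{ step: string; result: unknown }>;
  visits: Map<string, number>;
}

function replayHistorySketch(
  steps: Record<string, ReplayStep>,
  history: HistoryEntry[],
): ReplayedState {
  const state: ReplayedState = { results: [], visits: new Map() };
  for (const entry of history) {
    const step = steps[entry.step];
    if (step === undefined) throw new Error(`history references unknown step ${entry.step}`);
    if (!step.validate(entry.response)) {
      throw new Error(`history entry for ${entry.step} fails validation`);
    }
    // Re-run save with the recorded action result; the action itself is not re-executed.
    const saved = step.save?.({ response: entry.response, actionResult: entry.actionResult });
    // Priority: save().step > recorded action output > response.
    const result = saved?.step ?? entry.actionResult ?? entry.response;
    state.results.push(Object.freeze({ step: entry.step, result }));
    state.visits.set(entry.step, (state.visits.get(entry.step) ?? 0) + 1);
  }
  return state;
}
```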
StateStore
Append-only store with two namespaces. Step results live under store.steps — each step appends a StepResult record containing the step name, response, actionResult, and computed result. The steps namespace provides a typed accessor (StepsView) via a Proxy that maps property access to step-name lookups. Guaranteed steps (on all paths from entry) are non-optional; branch targets require ?.. Methods like steps.all(step) and steps.ran(step) provide loop and existence queries. Values are frozen after append.
Top-level sub-stores hold domain-structured state populated by save callbacks. When a save callback returns keys beyond step, those are deep-merged into the corresponding sub-store properties via applySave(). Multiple steps can write to the same sub-store — writes accumulate via deep merge, allowing incremental state building across the workflow.
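The accumulation behavior can be illustrated with a small deep-merge sketch. `deepMerge` and `applySaveSketch` are hypothetical helpers; in particular, replacing arrays wholesale rather than concatenating them is an assumption, since the merge semantics for arrays are not described here.

```typescript
type Plain = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Plain {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merges `source` into a copy of `target`; neither input is mutated.
// Assumption: arrays and primitives from `source` replace the existing value.
function deepMerge(target: Plain, source: Plain): Plain {
  const out: Plain = { ...target };
  for (const key of Object.keys(source)) {
    const existing = out[key];
    const incoming = source[key];
    out[key] = isPlainObject(existing) && isPlainObject(incoming)
      ? deepMerge(existing, incoming)
      : incoming;
  }
  return out;
}

// Keys other than `step` in a save() return value are treated as sub-store writes.
function applySaveSketch(store: Plain, saved: Plain): Plain {
  const writes: Plain = {};
  for (const key of Object.keys(saved)) {
    if (key !== "step") writes[key] = saved[key];
  }
  return deepMerge(store, writes);
}

// Two steps writing to the same `repo` sub-store accumulate their fields.
let subStores: Plain = {};
subStores = applySaveSketch(subStores, { step: { ok: false }, repo: { ci: { configured: false } } });
subStores = applySaveSketch(subStores, { repo: { ci: { provider: "gh" }, tests: 3 } });
```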
Observer dispatch
Observers fire sequentially and are awaited (they can be async), but failures are caught and logged — they never block workflow execution. Observers receive read-only snapshots of engine state.
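A dispatch loop with these properties might look like the sketch below. The `Observer` signature, the shallow frozen snapshot, and the injectable `log` parameter are assumptions for illustration; the SDK's observer interface is not shown in this document.

```typescript
// Assumed observer signature; the snapshot is a shallow, frozen copy of state.
type Observer = (
  event: string,
  snapshot: Readonly<Record<string, unknown>>,
) => void | Promise<void>;

async function dispatchObservers(
  observers: Observer[],
  event: string,
  state: Record<string, unknown>,
  log: (message: string) => void = console.error,
): Promise<void> {
  const snapshot = Object.freeze({ ...state });
  for (const observer of observers) {
    try {
      // Awaited one at a time, so observers run in registration order.
      await observer(event, snapshot);
    } catch (err) {
      // A failing observer is logged and skipped; the workflow keeps going.
      const reason = err instanceof Error ? err.message : String(err);
      log(`observer failed on ${event}: ${reason}`);
    }
  }
}
```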
Composite skill routing
When a composite skill’s step returns a next target that doesn’t exist in the local step map (e.g., 'subskill:doctor'), the engine returns a RedirectResult instead of throwing. The composite entry point (compositeMain) intercepts this:
- subskill:X: looks up the sub-skill registration, calls paramsMap(response, store) to produce params, creates a new WorkflowEngine for the sub-skill, and returns its first PromptResult with the step name prefixed (doctor/diagnose).
- topic:X: loads the topic content via ReferenceLoader and returns a DoneResult.
On subsequent advance calls, the composite entry checks whether the step name contains /. If it does, it routes to the corresponding sub-skill engine (with history filtered and unprefixed). The engines themselves are unaware of the composite layer — each operates on its own SkillDefinition with unprefixed step names.
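The prefix-based routing can be sketched with two small helpers. `routeStepName` and `historyForSubskill` are hypothetical names, and splitting on the first `/` is an assumption about how prefixed names are parsed.

```typescript
interface RoutedStep { subskill: string | null; step: string }

// "doctor/diagnose" routes to sub-skill "doctor", step "diagnose".
// Names without "/" belong to the dispatcher itself.
function routeStepName(stepName: string): RoutedStep {
  const slash = stepName.indexOf("/");
  if (slash === -1) return { subskill: null, step: stepName };
  return { subskill: stepName.slice(0, slash), step: stepName.slice(slash + 1) };
}

interface HistoryItem { step: string; response: unknown }

// Keep only the entries for one sub-skill and strip the prefix, so the
// sub-skill engine sees plain step names.
function historyForSubskill(history: HistoryItem[], subskill: string): HistoryItem[] {
  const prefix = `${subskill}/`;
  return history
    .filter((h) => h.step.startsWith(prefix))
    .map((h) => ({ ...h, step: h.step.slice(prefix.length) }));
}
```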
Direct sub-skill access (scripts/run doctor --params '{}') bypasses the dispatcher entirely and creates the sub-skill engine directly.
Lint System
checkSkill(skill, rootDir) runs static analysis on a skill definition. Returns an array of LintDiagnostic objects:
interface LintDiagnostic {
rule: string;
severity: 'error' | 'warning';
message: string;
step?: string;
file?: string;
}
Rules
cycle-guard (warning/error) — Warns when circular step transitions (self-loops and multi-step cycles) lack maxVisits + onMaxVisits; an implicit runtime limit of 10 visits applies. Errors when the cycle-guard configuration itself is invalid (e.g., onMaxVisits targets a non-existent step). Enforced at validation time, before the engine runs.
no-host-tool-names (error) — Steps must not reference host tool names directly (e.g., AskUserQuestion, apply_patch, TodoWrite) in prompts without guarding behind host.toolsAvailable.includes('ToolName'). Scans both string prompts and function .toString() output. The guard pattern exempts the reference.
primitive-schema-mismatch (error/warning) — For steps with askUser structured type: errors if option values are missing from the output ArkType enum, warns if the enum has values not present in the options list.
orphan-references (warning) — Files in the references/ directory that aren’t mentioned in any step prompt. May indicate dead content.
unknown-tool-names (warning) — host.toolsAvailable.includes() calls that reference tool names not in the known registry (40+ tools across Claude Code, Codex, and OpenCode).
host-branching-density (warning) — Multiple steps branching on host.toolsAvailable.includes(). Suggests a missing SDK primitive — if several steps need host-specific logic, the pattern should probably be elevated to a primitive.
composite-step-name (error) — Dispatcher step names containing /, which conflicts with sub-skill step namespacing.
composite-duplicate-subskill / composite-duplicate-topic (error) — Duplicate names in sub-skill or topic registrations.
For composite skills, checkSkill recursively lints each registered sub-skill. Diagnostics from sub-skills are prefixed with [subskill:<name>] for clarity.
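As an illustration of how a rule like cycle-guard could be implemented, the sketch below walks a simplified step graph. It is not the SDK's checker: the graph shape (static `next` lists with 'self' already rewritten to step names) is an assumption, and so is the policy that a cycle counts as guarded when at least one step on it has both maxVisits and onMaxVisits.

```typescript
interface LintDiagnostic {
  rule: string;
  severity: "error" | "warning";
  message: string;
  step?: string;
  file?: string;
}

// Assumed graph shape: each step lists the step names it may transition to.
interface GraphStep { next: string[]; maxVisits?: number; onMaxVisits?: string }

function checkCycleGuardsSketch(steps: Record<string, GraphStep>): LintDiagnostic[] {
  const diagnostics: LintDiagnostic[] = [];
  const isGuarded = (name: string): boolean => {
    const s = steps[name];
    return s !== undefined && s.maxVisits !== undefined && s.onMaxVisits !== undefined;
  };

  for (const [name, step] of Object.entries(steps)) {
    // Invalid guard configuration is an error.
    if (step.onMaxVisits !== undefined && steps[step.onMaxVisits] === undefined) {
      diagnostics.push({
        rule: "cycle-guard",
        severity: "error",
        message: `onMaxVisits targets unknown step "${step.onMaxVisits}"`,
        step: name,
      });
    }
    if (isGuarded(name)) continue;

    // Look for a path back to `name` that passes only through unguarded steps.
    const stack = [...step.next];
    const seen = new Set<string>();
    let cyclic = false;
    while (stack.length > 0) {
      const current = stack.pop() as string;
      if (current === name) { cyclic = true; break; }
      if (seen.has(current) || isGuarded(current)) continue;
      seen.add(current);
      stack.push(...(steps[current]?.next ?? []));
    }
    if (cyclic) {
      diagnostics.push({
        rule: "cycle-guard",
        severity: "warning",
        message: "step is part of a cycle without maxVisits + onMaxVisits",
        step: name,
      });
    }
  }
  return diagnostics;
}
```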
Design Decisions
These are non-negotiable choices with specific rationale. For the full list, see SPEC.md §13.
State is append-only. Prior step results are never mutated; each step appends its result to the store. This enables history replay: the engine can reconstruct state from data without re-executing side effects.
Cycles have implicit bounds. The cycle guard validator detects potential cycles and applies a default runtime limit (10 visits). Explicit maxVisits + onMaxVisits provides control over the fallback behavior. Unguarded cycles are a lint warning, not a load-time error — the runtime safety net prevents infinite loops.
Actions are declared, not inferred. Any CLI-side side effect must exist as a named action() with typed input/output schemas. No implicit I/O in step callbacks.
Steps are named string keys. The state machine is inspectable as data. Transitions reference step names as strings, not closures. This makes the workflow diffable, serializable, and debuggable.
Schemas are ArkType. One validator, one source of truth, native TypeScript types. No pluggable schema systems. The SDK re-exports type so skills don’t need a separate ArkType dependency.
Prose stays prose. The SDK structures when prose is shown and what contract it satisfies. It never replaces prose with code. Nodes contain freely-written instructions; transitions between nodes are typed and explicit.