Architecture

How @contentful/skill-kit works internally. For the author-facing API, see the API Reference. For the full specification, see SPEC.md.


CLI Protocol

A built skill is invoked by agents via Bash, one call per step. The SDK supports two invocation modes: session mode (recommended), which uses a temp file for communication, and stateless mode (fallback), which passes everything via CLI args and stdout.

Session mode (recommended)

Session mode moves protocol data to a JSONL temp file, reducing noise in the agent’s Bash output and eliminating the ever-growing --history flag.

1. Start — Agent creates a session:

scripts/run --params '{"repoPath":"."}' --host claude-code --session new

Returns a minimal SessionPointer to stdout:

{ "sessionId": "abc123", "file": "/tmp/skill-kit-abc123.jsonl", "line": 2 }

The agent reads line 2 from the session file (via the host’s Read tool) to get the prompt, schema, and preamble.

2. Write output — Agent appends its response to the session file:

echo '{"type":"output","step":"diagnose","output":{"checks":[...]}}' >> /tmp/skill-kit-abc123.jsonl

3. Advance — Agent calls advance with just the session ID:

scripts/run advance --session abc123

Returns a line number (e.g., 4). The agent reads that line for the next prompt or done signal.

4. Repeat until the line contains "type":"done".
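The agent-side half of this loop can be sketched in a few lines. The helpers below are hypothetical and not part of the SDK. They assume that line numbers are 1-indexed and that every line of the session file holds one JSON object with a type field, as the examples above suggest.

```typescript
// Hypothetical agent-side helpers; not part of @contentful/skill-kit.
// Assumption: session-file lines are 1-indexed and each holds one JSON object.

interface SessionLine {
  type: string;
  [key: string]: unknown;
}

/** Return the parsed JSON object on the given 1-indexed line of a JSONL document. */
function lineAt(contents: string, line: number): SessionLine {
  const lines = contents.split("\n");
  const raw = lines[line - 1];
  if (raw === undefined || raw.trim() === "") {
    throw new Error(`session file has no line ${line}`);
  }
  return JSON.parse(raw) as SessionLine;
}

/** Build the line the agent appends in step 2 ("Write output"). */
function outputLine(step: string, output: unknown): string {
  return JSON.stringify({ type: "output", step, output });
}

/** Step 4: the loop ends once the line the CLI points at has type "done". */
function isDoneLine(entry: SessionLine): boolean {
  return entry.type === "done";
}
```

Keeping these helpers free of file I/O means the same logic works whether the host reads the file through a Read tool or through a shell command.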

Stateless mode (fallback)

In stateless mode, the agent passes the full conversation history on every invocation. JSON goes to stdout.

1. Start — Agent calls scripts/run (defaults to start):

scripts/run --params '{"repoPath":"."}' --host claude-code

Returns a PromptResult:

{
  "kind": "prompt",
  "preamble": "In this session you will follow a structured workflow...",
  "step": "diagnose",
  "prompt": "Inspect the repository and report failed health checks.",
  "schema": { "type": "object", "properties": { "checks": {} } }
}

The preamble is emitted once. It establishes session-wide conventions (XML tag-to-tool mappings, rendering rules) so that later step prompts can be shorter.

2. Advance — Agent submits output, gets next step:

scripts/run advance \
  --step diagnose \
  --output '{"checks":[{"name":"ci","status":"fail","detail":"no config"}]}' \
  --params '{"repoPath":"."}' \
  --history '[{"step":"diagnose","response":{...}}]' \
  --host claude-code

Returns another PromptResult (next step) or a DoneResult:

{
  "kind": "done",
  "done": true,
  "finalOutput": { "summary": "..." },
  "completed": { "step": "report", "output": {} }
}

3. Validation error — If the agent’s output doesn’t match the step schema:

{
  "kind": "error",
  "error": "validation",
  "step": "diagnose",
  "message": "Expected object, received string",
  "retry": true
}

The agent retries with corrected output. The retry: true flag tells the agent the step hasn’t advanced.

Result type discrimination — All result types carry a kind field ('prompt', 'done', 'error', 'redirect') that serves as a discriminant for the CliResult union. The SDK exports type guard helpers: isPrompt(r), isDone(r), isError(r), isRedirect(r).
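To make the discriminant concrete, here is a minimal local mirror of the result shapes shown above, with guards that behave like the exported helpers. The interfaces are reconstructed from the JSON examples and may omit fields. The target field on RedirectResult and the nextAction helper are assumptions made for illustration.

```typescript
// Hypothetical local mirror of the result shapes shown above. The real SDK
// types may carry more fields; RedirectResult's `target` field is an assumption.

interface PromptResult {
  kind: "prompt";
  preamble?: string; // emitted once, on the first prompt
  step: string;
  prompt: string;
  schema: unknown;
}

interface DoneResult {
  kind: "done";
  done: true;
  finalOutput: unknown;
  completed: { step: string; output: unknown };
}

interface ValidationErrorResult {
  kind: "error";
  error: "validation";
  step: string;
  message: string;
  retry: boolean;
}

interface RedirectResult {
  kind: "redirect";
  target: string; // assumed field name
}

type CliResult = PromptResult | DoneResult | ValidationErrorResult | RedirectResult;

// Type guards narrow the union by checking the discriminant.
const isPrompt = (r: CliResult): r is PromptResult => r.kind === "prompt";
const isDone = (r: CliResult): r is DoneResult => r.kind === "done";
const isError = (r: CliResult): r is ValidationErrorResult => r.kind === "error";
const isRedirect = (r: CliResult): r is RedirectResult => r.kind === "redirect";

/** Decide what a caller does next from a parsed stdout payload. */
function nextAction(r: CliResult): string {
  if (isError(r)) return r.retry ? `retry ${r.step}` : `abort: ${r.message}`;
  if (isDone(r)) return "finish";
  if (isRedirect(r)) return `redirect to ${r.target}`;
  return `run step ${r.step}`; // only PromptResult remains
}
```

Because every branch narrows on kind, adding a fifth result type to the union would surface as a compile error in nextAction rather than a silent fallthrough.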

CLI Flags

| Flag | Required | Description |
| --- | --- | --- |
| --params | On start | JSON string validated against the skill’s params schema |
| --step | On advance | Name of the step being submitted. Not needed with --session in file mode |
| --output | On advance | JSON string: the agent’s response. Not needed with --session in file mode |
| --history | On advance | JSON array of { step, response, actionResult? }. Not needed with --session |
| --host | Optional | Host identifier: claude-code, codex, opencode, gemini-cli, cline, roo-code, kilo-code, cursor, amp, generic |
| --tools | Optional | Comma-separated list of available tools (merged with host registry; authoritative with --subagent). Only needed on start; session mode stores tools in the header |
| --subagent | Optional | Boolean flag. Indicates a subagent with a genuine tool subset; --tools becomes authoritative (no registry merge) |
| --session | Optional | new (start) or session ID (advance). Enables session mode |
| --session-dir | Optional | Directory for session files. Default: OS temp directory |
| --output-mode | Optional | file (default) or flag. How the agent passes step output in session mode |

Why stateless is still supported

No persistent processes, no stdin piping, no subprocess lifecycle management. The agent makes sequential Bash calls and parses JSON — a pattern every agent host supports today. Statelessness also enables horizontal scaling, resumable workflows, and retry/redo logic without session corruption.

History replay is cheap. The engine reconstructs state (store, visit counts) from history data without re-executing actions or observers. Session mode uses the same replay mechanism — it just reads history from the file instead of CLI args.


MCP Transport

The CLI protocol works everywhere but requires multiple visible tool calls per step (Bash + file Read). The MCP transport wraps the same SkillEngine as a long-lived stdio MCP server, reducing each workflow step to a single MCP tool call.

How it works

scripts/run mcp --host claude-code

The binary starts an MCP stdio server (via @modelcontextprotocol/sdk). Two tools are registered:

State lives in memory — no JSONL files, no history replay. Session IDs link start and advance calls. When the workflow reaches done, the session is removed.
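The in-memory lifecycle can be pictured as a small registry keyed by session ID. This is a sketch under stated assumptions, not the SDK's MCP adapter: SessionRegistry and EngineLike are hypothetical names, and the MCP server wiring is left out entirely.

```typescript
// Hypothetical in-memory session registry; names are illustrative, not the SDK's.

interface EngineLike {
  /** Returns the next payload; `done` marks the end of the workflow. */
  advance(step: string, output: unknown): { done: boolean; payload: unknown };
}

class SessionRegistry {
  private readonly sessions = new Map<string, EngineLike>();
  private counter = 0;

  /** On a start call: keep the engine alive under a fresh session ID. */
  start(engine: EngineLike): string {
    this.counter += 1;
    const id = `s${this.counter}`;
    this.sessions.set(id, engine);
    return id;
  }

  /** On an advance call: look the engine up and drop it once the workflow is done. */
  advance(id: string, step: string, output: unknown): unknown {
    const engine = this.sessions.get(id);
    if (!engine) throw new Error(`unknown session: ${id}`);
    const result = engine.advance(step, output);
    if (result.done) this.sessions.delete(id);
    return result.payload;
  }

  has(id: string): boolean {
    return this.sessions.has(id);
  }
}
```

Since the engine object survives between calls, nothing has to be replayed; the trade-off is that sessions are lost if the server process exits.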

Relation to CLI mode

Both transports share the same engine layer (WorkflowEngine, SubskillEngine, autoAdvance, generatePreamble). No MCP-specific engine logic exists. The MCP entry point is a thin adapter that maps MCP tool calls to engine method calls.

The generated SKILL.md includes MCP instructions first, with CLI as a labeled fallback. The agent chooses the mode based on whether MCP tools appear in its tool list.


The Host-Aware Primitive System

The architectural constraint

The skill CLI cannot call tools. It cannot invoke MCP methods. It cannot cause the host to render UI. Only the model can do those things, and only in response to prose it reads.

When the SDK wants the model to use AskUserQuestion on Claude Code, all it can do is return XML that the preamble has mapped to that tool. The model reads the XML tag, consults the preamble’s tag-to-tool table, calls the tool, and passes the answer back on the next invocation. The answer shape is still enforced by the step’s ArkType schema.

Everything in the host-aware system is downstream of this constraint. Primitives render XML tags; the preamble maps each tag to the host’s tools. The “capability system” is a lookup table that picks which tool to advertise per tag.

Two mechanisms

Preamble at session start. Generated once per skill invocation. The preamble is a markdown table mapping XML tags to host-specific tools:

| Tag | Tool | How to use |
| --- | --- | --- |
| <ask-user> | AskUserQuestion | Present <option> children as choices via the tool… |
| <checklist> | TaskCreate | Register <item> children via the tool… |
| <subagent> | Agent | Spawn isolated agent for enclosed task via the tool… |

The preamble is generated per host — different tools in the Tool column, same tags and semantics. resolveTools(handshake) matches each primitive’s tool list against the host’s available tools to populate the table. Later step prompts can be shorter because the preamble has set the context.

Preambles are best-effort — the model may forget them under context pressure. The XML tags themselves are self-describing, which helps even without preamble context. Preambles optimize the common case; the tag structure guards correctness.

Per-step XML rendering. For any step using a primitive (passed to prompt directly or composed via act.* in a prompt function), the SDK renders the primitive as an XML tag. An askUser step emits:

<ask-user type="structured" question="Which target?">
  <option value="production" label="Production"></option>
  <option value="staging" label="Staging"></option>
</ask-user>

The preamble tells the model how to handle <ask-user> — on Claude Code, use the AskUserQuestion tool; on a host without a structured-question tool, present as a numbered list. No tool names appear in the XML itself.
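A renderer that produces the tag above might look like the following. The AskUserConfig shape is inferred from the XML and is an assumption, as is the attribute escaping; the SDK's own render method may differ.

```typescript
// Illustrative renderer; the config shape is an assumption modeled on the XML above.

interface AskUserConfig {
  type: "structured";
  question: string;
  options: { value: string; label: string }[];
}

/** Escape characters that are unsafe inside a double-quoted XML attribute. */
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** Render the primitive as a host-neutral tag: no tool names appear here. */
function renderAskUser(config: AskUserConfig): string {
  const options = config.options.map(
    (o) => `  <option value="${escapeAttr(o.value)}" label="${escapeAttr(o.label)}"></option>`,
  );
  return [
    `<ask-user type="${config.type}" question="${escapeAttr(config.question)}">`,
    ...options,
    "</ask-user>",
  ].join("\n");
}
```

The renderer never consults the host. Host-specific behavior lives only in the preamble, which is why the same step output works on any agent.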

Primitive registry and tool resolution

Each primitive is defined with definePrimitive() (src/primitives/primitive.ts):

interface RenderContext {
  skillName?: string;
}

interface Primitive<TInput, TConfig, TTools> {
  readonly tag: string; // XML tag name (e.g., 'ask-user')
  readonly tools: TTools; // tool names to match against host
  create(input: TInput): TConfig;
  render(config: TConfig, ctx?: RenderContext): string;
  preambleRow(tool: string | undefined): PreambleRow;
}

The render method accepts an optional RenderContext. The engine passes { skillName } when calling renderPrimitive, allowing primitives like subagent to emit the skill name in the no-recurse attribute.

The registry (src/primitives/registry.ts) manages all five primitives and exposes two functions:

Tool resolution uses a three-way strategy based on the isSubagent flag on the Handshake. Top-level agents that under-report their tools (common in practice) get the registry union, so their preamble includes all known host tools. Subagents that pass --subagent get authoritative resolution — only their explicitly reported tools are used, preventing the preamble from referencing tools the subagent does not actually have.
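The two resolution paths described above (registry union for top-level agents, authoritative list for subagents) can be sketched as follows. Everything here is a stand-in: the Handshake fields, the tiny HOST_REGISTRY, and the candidate lists in PRIMITIVE_TOOLS are assumptions chosen for illustration, and the real resolveTools signature may differ.

```typescript
// Hypothetical sketch of the two resolution paths described above.

interface Handshake {
  host: string;
  toolsAvailable: string[]; // tools the agent reported via --tools
  isSubagent: boolean; // set by --subagent
}

// Tiny stand-in for the host registry (the real inventories are much longer).
const HOST_REGISTRY: Record<string, string[]> = {
  "claude-code": ["AskUserQuestion", "TaskCreate", "Agent", "Bash", "Read"],
};

// Tag -> candidate tools, in preference order (illustrative values).
const PRIMITIVE_TOOLS: Record<string, string[]> = {
  "ask-user": ["AskUserQuestion", "question"],
  checklist: ["TaskCreate", "TodoWrite"],
  subagent: ["Agent", "task"],
};

function effectiveTools(h: Handshake): Set<string> {
  // Subagents: only the explicitly reported tools count.
  if (h.isSubagent) return new Set(h.toolsAvailable);
  // Top-level agents: union of the registry and whatever was reported.
  return new Set([...(HOST_REGISTRY[h.host] ?? []), ...h.toolsAvailable]);
}

/** For each primitive tag, pick the first candidate tool the host offers. */
function resolveTools(h: Handshake): Record<string, string | undefined> {
  const available = effectiveTools(h);
  const resolved: Record<string, string | undefined> = {};
  for (const [tag, candidates] of Object.entries(PRIMITIVE_TOOLS)) {
    resolved[tag] = candidates.find((tool) => available.has(tool));
  }
  return resolved;
}
```

An undefined entry corresponds to a preamble row with no tool, where the model falls back to plain prose (for example, a numbered list instead of a structured question).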

To add a new primitive, define it with definePrimitive() and add it to the ALL_PRIMITIVES array in the registry.

Prompt assembly pipeline

When the engine builds a step’s prompt, it follows this sequence:

  1. resolvePromptValue — Evaluates the step’s prompt config. If it’s a function, calls it with PromptContext (which includes act and system builders). If it’s a string, array, or single segment, uses it directly. Returns PromptReturn.

  2. normalizePieces — Wraps the result in an array of PromptPiece values. A single string or segment becomes a one-element array.

  3. assemblePieces — Walks the array in author order, wrapping each piece in its XML tag:

    • Plain strings become <prompt>\n...\n</prompt>.
    • SystemSegment ({ kind: 'system' }) becomes <system>...</system>.
    • ActSegment ({ kind: 'act' }) is rendered via renderPrimitive() to its XML tag (e.g., <ask-user>, <checklist>, <subagent>).
    • ViewSegment ({ kind: 'view' }) becomes <rendered>...</rendered> (with optional name attribute).

    Non-empty results are joined with double newlines.

The author controls ordering. A prompt function returning [system`…`, act.checklist(…), 'Build the game.'] produces a <system> block, then a <checklist> block, then Build the game. wrapped in <prompt>, in that order.
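The normalize and assemble stages can be sketched as two small pure functions. The segment types below are simplified assumptions (the real PromptPiece union may carry more fields), and renderAct is a stand-in for renderPrimitive().

```typescript
// Illustrative types; the SDK's real PromptPiece union may differ in detail.

type SystemSegment = { kind: "system"; text: string };
type ActSegment = { kind: "act"; tag: string; body: string };
type ViewSegment = { kind: "view"; name?: string; text: string };
type PromptPiece = string | SystemSegment | ActSegment | ViewSegment;

/** Step 2: a single string or segment becomes a one-element array. */
function normalizePieces(value: PromptPiece | PromptPiece[]): PromptPiece[] {
  return Array.isArray(value) ? value : [value];
}

/** Stand-in for renderPrimitive(): wraps the body in the primitive's tag. */
function renderAct(piece: ActSegment): string {
  return `<${piece.tag}>\n${piece.body}\n</${piece.tag}>`;
}

function renderPiece(piece: PromptPiece): string {
  if (typeof piece === "string") {
    // Empty strings render to nothing so they can be filtered out later.
    return piece.trim() === "" ? "" : `<prompt>\n${piece}\n</prompt>`;
  }
  if (piece.kind === "system") return `<system>${piece.text}</system>`;
  if (piece.kind === "act") return renderAct(piece);
  const name = piece.name ? ` name="${piece.name}"` : "";
  return `<rendered${name}>${piece.text}</rendered>`;
}

/** Step 3: render in author order, drop empty results, join with blank lines. */
function assemblePieces(pieces: PromptPiece[]): string {
  return pieces
    .map(renderPiece)
    .filter((s) => s !== "")
    .join("\n\n");
}
```

Because assemblePieces never reorders its input, the position of a system segment or primitive in the returned array is exactly its position in the final prompt.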

Why primitives matter

Three reasons, all load-bearing:

  1. Centralized tuning. The SDK owns the XML rendering and preamble mappings that produce reliable behavior per host. One tuning pass benefits every skill.
  2. Host portability. Authors write intent once; the SDK translates per host. No hardcoded tool names to break on a different agent.
  3. SDK improvements propagate. Better phrasing for Codex six months from now ships as an SDK update. Every skill using primitives gets it for free.

Skills written against primitives inherit prompt-engineering work done in the SDK. Skills written as raw prose don’t.

Escape hatch

For cases where author-written prose must be host-aware, PromptContext.host.toolsAvailable is available in prompt functions:

prompt: ({ host }) => {
  if (host.toolsAvailable.includes('WebSearch')) {
    return 'Search the web for recent CVEs affecting this dependency.';
  }
  return 'Check the changelog for known security issues.';
},

The lint rule no-host-tool-names enforces that raw tool names only appear inside host.toolsAvailable.includes() guards.

Known host tool inventories

Claude Code: AskUserQuestion, EnterPlanMode, ExitPlanMode, TaskCreate, TaskUpdate, TaskList, TaskGet, Agent, Skill, Read, Edit, Write, Bash, Glob, Grep, WebFetch, WebSearch, TodoWrite, SendMessage, Monitor, LSP, NotebookEdit, EnterWorktree, ExitWorktree.

Codex CLI: shell, apply_patch, update_plan, web_search, view_image, exec_command, write_stdin, ToolRequestUserInput, CollabAgent.

OpenCode: bash, read, write, edit, apply_patch, glob, grep, codesearch, lsp, webfetch, websearch, question, todo, task, plan, skill.

Gemini CLI: shell, read-file, write-file, edit, glob, grep, web-search, web-fetch, ask-user, enter-plan-mode, exit-plan-mode, write-todos, agent, tracker-create-task, tracker-update-task, memory, activate-skill, complete-task.

Cline / Roo Code / Kilo Code: execute_command, read_file, write_to_file, edit_file, apply_diff, apply_patch, search_files, list_files, codebase_search, ask_followup_question, attempt_completion, new_task, switch_mode, update_todo_list, and host-specific additions.

Cursor: codebase_search, read_file, edit_file, run_terminal_command, file_search, grep_search, list_dir.

Amp: shell, read, write, edit.


The Build Pipeline

skill-kit build <entry.ts> -o <dir> produces a distributable, agentskills.io-compliant skill directory.

Build modes

The --mode flag selects the bundling strategy:

Pipeline steps

  1. Load — Import the entry file, extract the default export (must be a SkillDefinition or ReferenceDefinition).
  2. Validate — Run lint checks (cycle guards, schema consistency).
  3. Generate wrapper — Create a temporary entry point that imports the skill and calls main() (or compositeMain() if the skill has subskills) from @contentful/skill-kit/cli.
  4. Bundle — Mode-dependent:
    • Bun mode: For each target platform, run bun build --compile --target bun-<platform>. Individual target failures don’t halt the pipeline; zero successful targets does.
    • Node mode: Run esbuild to produce a single .mjs bundle with all dependencies inlined.
  5. Generate scripts/run — Shell wrapper (mode-dependent: platform dispatcher for bun, Node version check for node).
  6. Generate SKILL.md — Agent-facing documentation with invocation instructions, step descriptions, and reference pointers.
  7. Generate package.json — Name, version, and any fields from the skill’s package config. Merges with existing package.json in the output directory. When resolveVersion: true, reads the version from the nearest ancestor package.json.
  8. Copy references/ — Markdown files from the source references/ directory.
  9. Clean up — Remove temporary wrapper files.

Output structure

Bun mode:

<dir>/
  SKILL.md               ← Agent reads this first
  package.json
  scripts/
    run                  ← Detects OS/arch, delegates to binary
  bin/
    <name>-darwin-arm64  ← macOS Apple Silicon
    <name>-linux-x64     ← Linux x86_64
  references/
    *.md                 ← Bundled content files

Default targets: darwin-arm64 and linux-x64. Override with --targets. Use --single for current-platform-only dev builds.

Node mode:

<dir>/
  SKILL.md               ← Agent reads this first
  package.json
  scripts/
    run                  ← Checks Node ≥ 24, runs bundle
  bin/
    <name>.mjs           ← Single ESM bundle
  references/
    *.md                 ← Bundled content files

The scripts/run wrapper

Agents call scripts/run, never bin/ directly. The wrapper varies by mode but the contract is identical.

Bun mode — detects OS/architecture, selects the correct binary:

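#!/usr/bin/env bash
set -euo pipefail
# Assumed: resolve the skill root from this script's location (scripts/run sits one level below it).
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Normalize OS and architecture to the names used in the binary file names.
OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m)"
case "$ARCH" in x86_64) ARCH="x64" ;; aarch64|arm64) ARCH="arm64" ;; esac
BIN="$SKILL_DIR/bin/<name>-${OS}-${ARCH}"
exec "$BIN" "$@"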

Node mode — checks Node version, sets SKILL_DIR for reference resolution:

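#!/usr/bin/env bash
set -euo pipefail
# Assumed: resolve the skill root from this script's location (scripts/run sits one level below it).
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Compare only the major version number.
NODE_VERSION="$(node -e 'process.stdout.write(process.versions.node.split(".")[0])')"
if [ "$NODE_VERSION" -lt 24 ] 2>/dev/null; then
  echo "error: Node.js >= 24 required" >&2; exit 1
fi
# Exported so the bundle can resolve references/ at runtime.
export SKILL_DIR
exec node "$SKILL_DIR/bin/<name>.mjs" "$@"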

This decouples the skill’s contract (SKILL.md references scripts/run) from its internal layout.

The compile/bundle step

The SDK generates a temporary entry point:

import skill from './skill';
import { main } from '@contentful/skill-kit/cli';
main(skill);

Bun mode: bun build --compile bundles everything — the SDK, ArkType, the skill code — into a single self-contained executable. The SDK itself has no Bun runtime dependency; Bun is used only as a build tool.

Node mode: esbuild bundles the same tree into a single .mjs file. All dependencies are inlined; only Node.js built-ins are external.


Engine Internals

The WorkflowEngine (src/runtime/engine.ts) is the core state machine.

Lifecycle

Constructor — Takes a SkillDefinition, Handshake, params, and optional ReferenceLoader. Initializes the state store.

start() — Validates the skill structure (parent sentinels resolved, cycle guards present). Generates the preamble. Fires onStepStart. Returns the first step’s PromptResult. If the entry step has no prompt, it auto-advances through the step lifecycle without agent interaction (useful for computation-only routing steps).

advance(stepName, rawOutput) — The main loop:

  1. Validate response against step’s ArkType schema. Steps without a response schema skip validation.
  2. If invalid: fire onStepValidationFailed, return ValidationErrorResult with retry: true.
  3. Map action input via action.mapInput (if configured), or use validated response directly.
  4. Execute action (if configured). Action receives typed input and AbortSignal.
  5. Compute save — call the save callback (if configured) with { response, actionResult, store, params }. The return value is { step?, ...subStoreWrites }. The step property determines what gets stored as the step result (priority: save().step > action output > response). Additional keys are deep-merged into the corresponding sub-stores via applySave().
  6. Freeze the step result object.
  7. Append the step result to store.steps. Deep-merge any sub-store writes into their top-level store properties.
  8. Fire onStepComplete.
  9. Resolve next step:
    • { terminal: true } / terminal — fire onSkillComplete, return DoneResult.
    • NextBranch[] — evaluate branches in order, first match wins.
    • Function — call with { response, actionResult, attempts, params, store }, get step name.
    • 'self' — rewrite to current step name.
    • Apply maxVisits / onMaxVisits throttle.
  10. Fire onTransition.
  11. If the next step has no prompt, auto-advance through it immediately (no agent round-trip needed).
  12. Return next step’s PromptResult.
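Step 9 carries most of the branching logic, so a compact sketch may help. The shapes below are assumptions: the when and to fields on NextBranch, the choice to check maxVisits on the target step, and the error thrown when no onMaxVisits fallback exists are all illustrative rather than taken from the SDK.

```typescript
// Hypothetical shapes for the `next` config; `when`, `to`, and the visit
// bookkeeping are assumptions made for this sketch.

interface NextContext {
  response: unknown;
  attempts: number;
}

interface NextBranch {
  when: (ctx: NextContext) => boolean;
  to: string;
}

type Next = string | { terminal: true } | NextBranch[] | ((ctx: NextContext) => string);

interface StepConfig {
  next: Next;
  maxVisits?: number;
  onMaxVisits?: string;
}

type Resolution = { done: true } | { done: false; step: string };

function resolveNext(
  current: string,
  steps: Record<string, StepConfig>,
  visits: Record<string, number>,
  ctx: NextContext,
): Resolution {
  const config = steps[current];
  if (!config) throw new Error(`unknown step: ${current}`);
  const next = config.next;

  let target: string | undefined;
  if (typeof next === "function") target = next(ctx);
  else if (Array.isArray(next)) target = next.find((b) => b.when(ctx))?.to; // first match wins
  else if (typeof next === "object") return { done: true }; // { terminal: true }
  else target = next;

  if (target === undefined) throw new Error(`no branch matched after "${current}"`);
  if (target === "terminal") return { done: true };
  if (target === "self") target = current; // rewrite to the current step name

  // Throttle: once the target has used up its visits, divert to its fallback.
  const targetConfig = steps[target];
  if (targetConfig?.maxVisits !== undefined && (visits[target] ?? 0) >= targetConfig.maxVisits) {
    if (targetConfig.onMaxVisits === undefined) {
      throw new Error(`"${target}" exceeded maxVisits and has no onMaxVisits`);
    }
    return { done: false, step: targetConfig.onMaxVisits };
  }
  return { done: false, step: target };
}
```

Resolving to a plain step name keeps the result serializable, which matches the design decision that transitions are strings rather than closures.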

History replay

replayHistory(history) reconstructs engine state from a previous execution. It validates each entry’s response against the step schema and re-executes save callbacks to rebuild both step results and sub-store state, but does not re-execute actions or fire observers. This is how the stateless protocol works — each advance call replays the full history to rebuild state before processing the new step.
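A minimal sketch of replay, assuming each history entry carries the recorded actionResult so the action never has to run again. The validate and save fields stand in for the step's ArkType schema and save callback; the types and the simplified save signature are hypothetical.

```typescript
// Hypothetical replay sketch. `validate` and `save` stand in for the step's
// ArkType schema and save callback; actions and observers are never invoked.

interface HistoryEntry {
  step: string;
  response: unknown;
  actionResult?: unknown; // recorded result, reused instead of re-running the action
}

interface ReplayStep {
  validate: (response: unknown) => boolean;
  save?: (args: { response: unknown; actionResult: unknown }) => unknown;
}

interface ReplayedState {
  steps: { step: string; result: unknown }[];
  visits: Record<string, number>;
}

function replayHistory(
  history: HistoryEntry[],
  steps: Record<string, ReplayStep>,
): ReplayedState {
  const state: ReplayedState = { steps: [], visits: {} };
  for (const entry of history) {
    const step = steps[entry.step];
    if (!step) throw new Error(`unknown step in history: ${entry.step}`);
    if (!step.validate(entry.response)) {
      throw new Error(`history entry for "${entry.step}" fails its schema`);
    }
    // Same priority as a live advance: save() result > action output > response.
    const saved = step.save?.({ response: entry.response, actionResult: entry.actionResult });
    const result = saved ?? entry.actionResult ?? entry.response;
    state.steps.push(Object.freeze({ step: entry.step, result }));
    state.visits[entry.step] = (state.visits[entry.step] ?? 0) + 1;
  }
  return state;
}
```

The cost of replay is one schema check and one save call per entry, with no side effects, which is why replaying the full history on every advance stays cheap.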

StateStore

Append-only store with two namespaces. Step results live under store.steps: each step appends a StepResult record containing the step name, response, actionResult, and computed result. The steps namespace provides a typed accessor (StepsView) via a Proxy that maps property access to step-name lookups. Guaranteed steps (those on every path from the entry step) are non-optional; branch targets require optional chaining (?.). Methods like steps.all(step) and steps.ran(step) provide loop and existence queries. Values are frozen after append.

Top-level sub-stores hold domain-structured state populated by save callbacks. When a save callback returns keys beyond step, those are deep-merged into the corresponding sub-store properties via applySave(). Multiple steps can write to the same sub-store — writes accumulate via deep merge, allowing incremental state building across the workflow.
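The interplay between the append-only step log and the deep-merged sub-stores can be sketched as below. SketchStore is a hypothetical class, not the SDK's StateStore. It omits the Proxy-based StepsView, and its merge rule (objects merge recursively, everything else is replaced) is an assumption.

```typescript
// Hypothetical sketch of the two namespaces; not the SDK's StateStore class.

type Plain = { [key: string]: unknown };

function isPlain(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Recursively merge `patch` into `base` without mutating either. */
function deepMerge(base: Plain, patch: Plain): Plain {
  const out: Plain = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    const existing = out[key];
    out[key] = isPlain(existing) && isPlain(value) ? deepMerge(existing, value) : value;
  }
  return out;
}

class SketchStore {
  private readonly records: { step: string; result: unknown }[] = [];
  private subStores: Plain = {};

  /** Mirrors a save return of { step?, ...subStoreWrites }. */
  applySave(stepName: string, save: Plain): void {
    const { step, ...writes } = save;
    this.records.push(Object.freeze({ step: stepName, result: step }));
    this.subStores = deepMerge(this.subStores, writes);
  }

  /** Every result recorded for a step, oldest first (useful for loops). */
  all(step: string): unknown[] {
    return this.records.filter((r) => r.step === step).map((r) => r.result);
  }

  /** Whether the step has run at least once. */
  ran(step: string): boolean {
    return this.records.some((r) => r.step === step);
  }

  get sub(): Plain {
    return this.subStores;
  }
}
```

Two steps writing to the same sub-store key accumulate rather than overwrite, which is what allows incremental state building across a workflow.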

Observer dispatch

Observers fire sequentially and are awaited (they can be async), but failures are caught and logged — they never block workflow execution. Observers receive read-only snapshots of engine state.

Composite skill routing

When a composite skill’s step returns a next target that doesn’t exist in the local step map (e.g., 'subskill:doctor'), the engine returns a RedirectResult instead of throwing. The composite entry point (compositeMain) intercepts this:

  1. subskill:X — looks up the sub-skill registration, calls paramsMap(response, store) to produce params, creates a new WorkflowEngine for the sub-skill, and returns its first PromptResult with the step name prefixed (doctor/diagnose).
  2. topic:X — loads the topic content via ReferenceLoader and returns a DoneResult.

On subsequent advance calls, the composite entry checks whether the step name contains /. If it does, it routes to the corresponding sub-skill engine (with history filtered and unprefixed). The engines themselves are unaware of the composite layer — each operates on its own SkillDefinition with unprefixed step names.

Direct sub-skill access (scripts/run doctor --params '{}') bypasses the dispatcher entirely and creates the sub-skill engine directly.
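The prefixing scheme boils down to a few string operations. The helpers below are hypothetical names that mirror the behavior described above: splitting a prefixed step name, filtering and unprefixing history for one sub-skill, and classifying a redirect target.

```typescript
// Hypothetical helpers mirroring the prefixing scheme described above.

interface HistoryItem {
  step: string;
  response: unknown;
}

/** Split "doctor/diagnose" into its sub-skill and local step name. */
function parseStepName(step: string): { subskill?: string; step: string } {
  const slash = step.indexOf("/");
  if (slash === -1) return { step }; // dispatcher-level step
  return { subskill: step.slice(0, slash), step: step.slice(slash + 1) };
}

/** Keep one sub-skill's entries and strip the prefix so its engine sees local names. */
function historyFor(subskill: string, history: HistoryItem[]): HistoryItem[] {
  const prefix = `${subskill}/`;
  return history
    .filter((h) => h.step.startsWith(prefix))
    .map((h) => ({ ...h, step: h.step.slice(prefix.length) }));
}

/** Classify a `next` target that is missing from the local step map. */
function parseRedirect(target: string): { kind: "subskill" | "topic"; name: string } | undefined {
  const match = /^(subskill|topic):(.+)$/.exec(target);
  if (!match) return undefined;
  return { kind: match[1] as "subskill" | "topic", name: match[2] };
}
```

This is also why the composite-step-name lint rule forbids / in dispatcher step names: a slash is the only signal the router uses to tell the two layers apart.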


Lint System

checkSkill(skill, rootDir) runs static analysis on a skill definition. Returns an array of LintDiagnostic objects:

interface LintDiagnostic {
  rule: string;
  severity: 'error' | 'warning';
  message: string;
  step?: string;
  file?: string;
}

Rules

cycle-guard (warning/error) — Warns when circular step transitions (self-loops and multi-step cycles) lack maxVisits + onMaxVisits; an implicit runtime limit of 10 visits applies. Errors when the cycle-guard configuration itself is invalid (e.g., onMaxVisits targets a non-existent step). Enforced at validation time, before the engine runs.

no-host-tool-names (error) — Steps must not reference host tool names directly (e.g., AskUserQuestion, apply_patch, TodoWrite) in prompts without guarding behind host.toolsAvailable.includes('ToolName'). Scans both string prompts and function .toString() output. The guard pattern exempts the reference.

primitive-schema-mismatch (error/warning) — For steps with askUser structured type: errors if option values are missing from the output ArkType enum, warns if the enum has values not present in the options list.

orphan-references (warning) — Files in the references/ directory that aren’t mentioned in any step prompt. May indicate dead content.

unknown-tool-names (warning) — host.toolsAvailable.includes() calls that reference tool names not in the known registry (40+ tools across Claude Code, Codex, and OpenCode).

host-branching-density (warning) — Multiple steps branching on host.toolsAvailable.includes(). Suggests a missing SDK primitive — if several steps need host-specific logic, the pattern should probably be elevated to a primitive.

composite-step-name (error) — Dispatcher step names containing /, which conflicts with sub-skill step namespacing.

composite-duplicate-subskill / composite-duplicate-topic (error) — Duplicate names in sub-skill or topic registrations.

For composite skills, checkSkill recursively lints each registered sub-skill. Diagnostics from sub-skills are prefixed with [subskill:<name>] for clarity.
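As an illustration of how a rule such as cycle-guard can work, here is a simplified sketch over a static step graph. The GraphStep shape is an assumption, and the check is deliberately naive: it flags every unguarded step that sits on a cycle, whereas a real implementation might accept one guard per cycle.

```typescript
// Hypothetical sketch of the cycle-guard rule; the graph shape is an assumption.

interface LintDiagnostic {
  rule: string;
  severity: "error" | "warning";
  message: string;
  step?: string;
}

interface GraphStep {
  next: string[]; // statically known transition targets
  maxVisits?: number;
  onMaxVisits?: string;
}

/** True if `start` can reach itself by following one or more transitions. */
function onCycle(start: string, steps: Record<string, GraphStep>): boolean {
  const seen = new Set<string>();
  const stack = [...(steps[start]?.next ?? [])];
  while (stack.length > 0) {
    const name = stack.pop() as string;
    if (name === start) return true;
    if (seen.has(name)) continue;
    seen.add(name);
    stack.push(...(steps[name]?.next ?? []));
  }
  return false;
}

function checkCycleGuards(steps: Record<string, GraphStep>): LintDiagnostic[] {
  const diagnostics: LintDiagnostic[] = [];
  for (const [name, step] of Object.entries(steps)) {
    if (step.onMaxVisits !== undefined && !(step.onMaxVisits in steps)) {
      // Invalid guard configuration is an error.
      diagnostics.push({
        rule: "cycle-guard",
        severity: "error",
        step: name,
        message: `onMaxVisits targets unknown step "${step.onMaxVisits}"`,
      });
    } else if (onCycle(name, steps) && (step.maxVisits === undefined || step.onMaxVisits === undefined)) {
      // A missing guard is only a warning: the implicit runtime limit still applies.
      diagnostics.push({
        rule: "cycle-guard",
        severity: "warning",
        step: name,
        message: "cycle without maxVisits + onMaxVisits; implicit limit of 10 visits applies",
      });
    }
  }
  return diagnostics;
}
```

Because steps and transitions are plain data, the rule needs no execution: a depth-first walk over step names is enough to find self-loops and multi-step cycles.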


Design Decisions

These are non-negotiable choices with specific rationale. For the full list, see SPEC.md §13.

State is append-only. Prior step results are never mutated; each step appends its result to the store. This enables history replay: the engine can reconstruct state from data without re-executing side effects.

Cycles have implicit bounds. The cycle guard validator detects potential cycles and applies a default runtime limit (10 visits). Explicit maxVisits + onMaxVisits provides control over the fallback behavior. Unguarded cycles are a lint warning, not a load-time error — the runtime safety net prevents infinite loops.

Actions are declared, not inferred. Any CLI-side side effect must exist as a named action() with typed input/output schemas. No implicit I/O in step callbacks.

Steps are named string keys. The state machine is inspectable as data. Transitions reference step names as strings, not closures. This makes the workflow diffable, serializable, and debuggable.

Schemas are ArkType. One validator, one source of truth, native TypeScript types. No pluggable schema systems. The SDK re-exports type so skills don’t need a separate ArkType dependency.

Prose stays prose. The SDK structures when prose is shown and what contract it satisfies. It never replaces prose with code. Nodes contain freely-written instructions; transitions between nodes are typed and explicit.