Version: v0.1.1

MCP Integration

The MCP (Model Context Protocol) integration allows mekara scripts to run inside Claude Code as custom slash commands. Users type /command in Claude Code, and the command executes through mekara's step-based execution model.

The .mekara/scripts/nl/ directory (symlinked as .claude/commands/) contains the natural language script sources that become slash commands when the mekara MCP server is configured.

Architecture

One Process Per Claude Code Instance

Each Claude Code instance spawns its own mekara mcp process via stdio. This simplifies the design:

Only one script executes at a time per MCP server process
No session IDs needed - just maintain current execution stack
No session management complexity

Module Structure

src/mekara/mcp/
├── __init__.py      # Package init
├── server.py        # FastMCP server, tool definitions, state
└── executor.py      # Pull-based execution with stack for nested scripts

MCP Tools

The server exposes five tools via FastMCP:

Script execution flow:

start(name, arguments) - Start executing a script. Runs auto steps until first LLM step, NL script, or completion.
continue_compiled_script(outputs) - Continue after completing an LLM step in a compiled script. Pass an empty dict when no outputs are expected. Errors if an NL script is pending.
finish_nl_script() - Signal completion of a natural language script. Errors if an LLM step (not NL script) is pending.
status() - Get current execution state including pending step info (uses the pending step's format() method for display).

Why Separate Tools for Compiled vs NL Scripts?

LLMs often get confused when executing an NL script: they call continue_script after completing a single step of the NL script, rather than calling it once to signal that the whole script is complete. Continuing a compiled script and finishing an NL script are therefore exposed as two clearly separate tools, in order to:

make LLMs aware of the different semantics of continuing versus completing a script, and
keep a clean mental model for the humans maintaining the code.
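The mutual exclusion between the two continuation tools can be sketched as follows. This is a minimal illustration, not the code in server.py: the stand-in pending classes, the ToolUsageError name, and the return strings are assumptions, and the PendingNLFallback case is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class PendingLlmStep:
    """Hypothetical stand-in: an LLM step of a compiled script awaits completion."""
    prompt: str


@dataclass
class PendingNLScript:
    """Hypothetical stand-in: an entire natural language script awaits completion."""
    content: str


class ToolUsageError(Exception):
    """Raised when the wrong continuation tool is called (hypothetical name)."""


def continue_compiled_script(pending: object, outputs: dict) -> str:
    # Only valid while an LLM step of a compiled script is pending.
    if isinstance(pending, PendingNLScript):
        raise ToolUsageError("An NL script is pending; call finish_nl_script instead.")
    if not isinstance(pending, PendingLlmStep):
        raise ToolUsageError("Nothing is pending.")
    return f"continued with {len(outputs)} output(s)"


def finish_nl_script(pending: object) -> str:
    # Only valid while a whole NL script is pending.
    if isinstance(pending, PendingLlmStep):
        raise ToolUsageError("An LLM step is pending; call continue_compiled_script instead.")
    if not isinstance(pending, PendingNLScript):
        raise ToolUsageError("Nothing is pending.")
    return "NL script finished"
```

Because each tool rejects the other tool's pending state, a premature call surfaces as an explicit error instead of silently advancing the wrong frame.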

Project customization:

write_bundled(name, force) - Write a bundled skill or standard to the local .agents/ directory for project-level customization. Auto-detects whether name refers to a skill (written to .agents/skills/) or a standard (written to .agents/standards/). Use the standard: prefix to force standard lookup when the name is ambiguous. Used by the /customize command to get bundled source on disk for editing. See Customizing Bundled Content for the full workflow. Errors if a local override already exists (unless force=True).

Pull-Based Executor with Stack

The McpScriptExecutor in executor.py provides pull-based execution control with support for nested scripts:

run_until_llm() - Execute steps until hitting an LLM step or completion. Automatically handles call_script steps by pushing/popping execution frames.
continue_after_llm(outputs) - Resume after LLM step completion

The executor maintains a stack of execution frames, where each frame represents a script. Frames use a type union ExecutionFrame = CompiledScriptFrame | NLScriptFrame to represent compiled vs NL scripts:

ScriptFrame (base class) - Holds shared metadata (script_name, working_dir, resolved_target, arguments)
CompiledScriptFrame - Holds a generator, current step, step index, and exception state for compiled scripts
NLScriptFrame - Holds no generator or step tracking; NL content is loaded on demand
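One way to picture the frame hierarchy described above is with dataclasses. The field names follow the descriptions in this section, but the field types and defaults are assumptions, and resolved_target is typed as Any in place of the real ResolvedTarget.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any, Generator, Optional, Union


@dataclass
class ScriptFrame:
    """Shared metadata for every frame on the stack."""
    script_name: str
    working_dir: Path
    resolved_target: Any  # placeholder for the real ResolvedTarget type
    arguments: str


@dataclass
class CompiledScriptFrame(ScriptFrame):
    """Tracks a running compiled script through its step generator."""
    generator: Optional[Generator[Any, Any, Any]] = None
    current_step: Any = None
    step_index: int = 0
    exception: Optional[BaseException] = None


@dataclass
class NLScriptFrame(ScriptFrame):
    """No generator or step tracking; NL content is loaded on demand."""


# The union lets callers branch on frame kind with isinstance().
ExecutionFrame = Union[CompiledScriptFrame, NLScriptFrame]
```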

Executed steps tracking: The executor maintains recently_executed_steps: list[ExecutedStep] which accumulates steps executed since the last LLM invocation. When run_until_llm() returns a RunResult, it includes these steps and clears the list.

Each ExecutedStep includes the step's output (stdout + stderr) in its output field. This allows formatting code to show output immediately after each step, rather than accumulating output globally. For Auto steps, the output is extracted from the AutoResult.stdout and AutoResult.stderr fields. AutoException output can be empty if an exception occurs before any output is captured.
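The accumulate-then-clear behavior can be sketched like this. StepLog and its record()/drain() methods are hypothetical helpers for illustration; in the executor described here the list lives directly on the executor and is cleared when run_until_llm() returns.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutedStep:
    name: str
    command: str
    output: str = ""  # stdout + stderr captured for this step only


@dataclass
class RunResult:
    executed_steps: list[ExecutedStep]


@dataclass
class StepLog:
    """Hypothetical holder for the recently_executed_steps list."""
    recently_executed_steps: list[ExecutedStep] = field(default_factory=list)

    def record(self, step: ExecutedStep) -> None:
        self.recently_executed_steps.append(step)

    def drain(self) -> RunResult:
        # Hand over the accumulated steps and start a fresh list, so the next
        # RunResult only reports steps executed after this point.
        steps, self.recently_executed_steps = self.recently_executed_steps, []
        return RunResult(executed_steps=steps)
```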

Output formatting: The _format_executed_steps() function in server.py displays output using XML tags for clear demarcation:

### Steps executed:
- `test/random[0]`: ✓ `shuf -i 1-100 -n 1`

  <output>
  42
  </output>

- `test/random[1]`: ✓ `echo $((42 * 2))`

  <output>
  84
  </output>

The <output></output> tags make it immediately clear which output came from which step, and the indentation (2 spaces) aligns the output with the list item content for readability.
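A formatter producing the layout above might look roughly like the following. It is a sketch rather than the actual _format_executed_steps(): it takes plain (label, command, output) tuples, only renders the success marker, and omitting the output block for empty output is an assumption.

```python
def format_executed_steps(steps: list[tuple[str, str, str]]) -> str:
    """Render (label, command, output) triples as a list with <output> blocks."""
    lines = ["### Steps executed:"]
    for label, command, output in steps:
        lines.append(f"- `{label}`: ✓ `{command}`")
        if output.strip():
            lines.append("")
            lines.append("  <output>")
            # Indent each output line by two spaces to align with the list item.
            lines.extend(f"  {line}" for line in output.rstrip("\n").splitlines())
            lines.append("  </output>")
            lines.append("")
    return "\n".join(lines).rstrip("\n")
```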

When a call_script step is encountered:

The current frame is preserved on the stack
The nested script is loaded and a new frame is pushed
Execution continues in the nested script
When the nested script completes, the frame is popped and the parent resumes

This differs from the full ScriptExecutor which handles the entire execution loop internally. The MCP executor gives explicit control over advancement so Claude Code can handle llm steps.
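The pull-based loop with frame push/pop can be illustrated with plain generators. This toy MiniExecutor is an assumption-laden sketch: steps are encoded as (kind, payload) tuples, auto steps are only logged rather than run, and calling run_until_llm() again stands in for continue_after_llm().

```python
from typing import Callable, Iterator, Optional

# Hypothetical step encoding: ("auto", cmd), ("llm", prompt), ("call_script", name).
Step = tuple[str, str]


class MiniExecutor:
    def __init__(self, scripts: dict[str, Callable[[], Iterator[Step]]]) -> None:
        self.scripts = scripts
        self.stack: list[Iterator[Step]] = []  # one generator per frame
        self.log: list[str] = []

    def push_script(self, name: str) -> None:
        self.stack.append(self.scripts[name]())

    def run_until_llm(self) -> Optional[str]:
        """Advance until an llm step (return its prompt) or completion (None)."""
        while self.stack:
            try:
                kind, payload = next(self.stack[-1])
            except StopIteration:
                self.stack.pop()  # nested script done: the parent resumes
                continue
            if kind == "auto":
                self.log.append(payload)  # a real executor would run the command
            elif kind == "call_script":
                self.push_script(payload)  # parent frame stays on the stack
            elif kind == "llm":
                return payload  # hand control back to the caller
        return None


def parent() -> Iterator[Step]:
    yield ("auto", "echo parent-1")
    yield ("call_script", "child")
    yield ("llm", "summarize")
    yield ("auto", "echo parent-2")


def child() -> Iterator[Step]:
    yield ("auto", "echo child-1")


executor = MiniExecutor({"parent": parent, "child": child})
executor.push_script("parent")
prompt = executor.run_until_llm()  # stops at the parent's llm step
```

The caller decides when to call run_until_llm() again, which is the point of the pull-based design: the LLM step is handled outside the executor before execution advances.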

Natural Language Script Handling

Natural language scripts get their own frames on the stack using NLScriptFrame. This is a distinct type from CompiledScriptFrame, sharing the base ScriptFrame metadata.

When an NL script is started (either via start() or through a call_script step):

The NL script frame is pushed with the resolved target and arguments
The pending property loads the NL script file content
The first instance of $ARGUMENTS is substituted with the actual arguments
A PendingNLScript is returned
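Substituting only the first $ARGUMENTS occurrence maps directly onto str.replace with a count of 1. The function name below is hypothetical; only the first-occurrence behavior comes from the description above.

```python
def substitute_arguments(content: str, arguments: str) -> str:
    """Replace only the first $ARGUMENTS placeholder with the actual arguments."""
    # The third argument limits the replacement to a single occurrence.
    return content.replace("$ARGUMENTS", arguments, 1)
```

Later occurrences are left verbatim, so an NL script that mentions $ARGUMENTS again further down is not rewritten a second time.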

When finish_nl_script() is called:

An ExecutedStep for the NL completion is recorded
The NLScriptFrame is popped
If the parent frame (always a CompiledScriptFrame) has a CallScript as its current_step, the executor advances the parent past that step
run_until_llm() continues execution in the parent script
_handle_run_result() formats the output, including the NL completion step

This design ensures nested NL scripts work correctly—when another script is pushed onto the stack after an NL script, the NL script's frame is preserved and can be properly resumed when the nested script completes.

Compiled Script Context and Exception Fallback

For compiled scripts, the executor shows the original NL source once per script: the first time the script requires LLM interaction (an llm step or an exception fallback). The source is loaded from the resolved target and the first $ARGUMENTS is substituted, then the frame is marked as having shown context so it isn't repeated.

If an auto step raises an exception, the executor wraps it in an AutoException, records it in the executed step list, and places the compiled frame into fallback mode. The pending state becomes PendingNLFallback, which includes the exception details and the original NL source so the LLM can complete the task manually. Completion uses finish_nl_script(), which pops the failed compiled frame. If there is a parent frame with a CallScript as its current step, the parent is halted with an error PendingLlmStep (not advanced) — the same behavior as any other nested script failure.

Duplicate Instruction Pitfall

NL commands use different wording for the completion instruction: "When you have completed this command" (not "step"), because the LLM must complete the entire command, not just one step within it. The executor adds this instruction to the Llm step's prompt.

_format_llm_step() in server.py also adds a completion instruction for all Llm steps. To avoid duplication, it checks if the prompt already contains call `continue_compiled_script` and skips adding its own instruction if so.
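The duplicate check amounts to a substring test before appending. In this sketch the marker string matches the one named above, while the wording of the appended instruction is an assumption.

```python
COMPLETION_MARKER = "call `continue_compiled_script`"


def format_llm_step(prompt: str) -> str:
    """Append a completion instruction unless the prompt already carries one."""
    if COMPLETION_MARKER in prompt:
        return prompt  # the executor already added its own instruction
    # Hypothetical wording; only the marker text is taken from the description.
    return f"{prompt}\n\nWhen you have completed this step, {COMPLETION_MARKER}."
```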

Hook Integration

Mekara uses two Claude Code hooks to ensure compiled scripts are executed via MCP.

UserPromptSubmit Hook

The mekara hook reroute-user-commands command handles the UserPromptSubmit hook, which fires when a user types a /command:

Reads prompt from stdin (JSON format from Claude Code)
Checks if prompt starts with / followed by a command name
Normalizes colons to slashes (Claude Code uses : as path separator)
Uses resolve_target() to check if it's a compiled mekara script
If yes, outputs instructions directing Claude to use MCP tools
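The parsing and normalization steps of this hook can be sketched as below. It assumes the hook's JSON input carries the user text in a "prompt" field; the resolve_target() lookup and the emitted instructions are omitted, and extract_command is a hypothetical name.

```python
import json
from typing import Optional


def extract_command(hook_input: str) -> Optional[str]:
    """Return the slash-separated command name, or None if not a /command."""
    prompt = json.loads(hook_input).get("prompt", "")
    # Require "/" immediately followed by a non-space command name.
    if len(prompt) < 2 or not prompt.startswith("/") or prompt[1].isspace():
        return None
    name = prompt[1:].split(maxsplit=1)[0]
    # Claude Code uses ":" as the path separator; the filesystem uses "/".
    return name.replace(":", "/")
```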

PreToolUse Hook

The mekara hook reroute-agent-commands command handles the PreToolUse hook, which fires when Claude attempts to use the Skill tool. This prevents Claude from bypassing MCP when it internally decides to invoke a compiled script:

Checks if the tool being invoked is the Skill tool
Gets the skill name from the tool input
Uses resolve_target() to check if it's a compiled mekara script
If yes, outputs a permissionDecision: "deny" response that tells Claude to use mcp__mekara__start instead

This is essential for nested script invocations. Without the PreToolUse hook, when a parent script's llm step instructs Claude to call /child-script, Claude might use the Skill tool directly instead of MCP start, which would bypass the stack-based execution model.
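A deny response for this hook could be built as in the following sketch. It assumes the hook input exposes tool_name and tool_input, that the skill name sits under a "skill" key (an unverified assumption), and that the decision is returned inside a hookSpecificOutput object; is_compiled is a hypothetical callback standing in for resolve_target().

```python
import json
from typing import Callable, Optional


def deny_if_compiled(hook_input: str, is_compiled: Callable[[str], bool]) -> Optional[str]:
    """Return a JSON deny decision for compiled scripts, or None to allow."""
    event = json.loads(hook_input)
    if event.get("tool_name") != "Skill":
        return None
    # Assumption: the skill name is stored under tool_input["skill"].
    raw = str(event.get("tool_input", {}).get("skill", ""))
    skill = raw.lstrip("/").replace(":", "/")
    if not skill or not is_compiled(skill):
        return None
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": f"Use mcp__mekara__start to run '{skill}'.",
        }
    })
```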

Key Implementation Details

Script Resolution vs Working Directory

CRITICAL: The working_dir parameter passed to push_script() is ONLY for auto step execution, NOT for script resolution.

Why this matters: When working_dir is used for script resolution instead of just auto step execution, it can cause find_project_root() to walk up from the wrong location and find the wrong project root. This results in:

Loading scripts from wrong precedence level (e.g., bundled instead of local)
Potentially loading wrong script versions (NL-only instead of compiled)
Silent failures that are difficult to diagnose

Script resolution must therefore always use the current project context from cwd, while working_dir only affects where auto steps execute. During VCR replay, working_dir can point to nonexistent locations (such as deleted worktrees) because no shell commands are actually executed; recorded results are replayed instead.

Debugging Symptom

If VCR tests fail with ValueError: VCR replay event mismatch. Expected McpToolOutputEvent, got AutoStepEvent, one possible cause is that the wrong script version was loaded. When a compiled script is incorrectly loaded as NL-only (due to a wrong base_dir in script resolution), the NL script returns immediately without executing auto steps, leaving AutoStepEvents unconsumed in the cassette. The result is the VCR event mismatch error above, even though the root cause is a silent, incorrect script resolution.

Colon/Slash Normalization

Claude Code represents nested commands like /test/random as test:random internally. The hook and MCP server normalize colons to slashes for filesystem lookup, but ResolvedTarget.name uses the canonical colon format (e.g., test:nested) for display in execution history and stack traces.
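The two directions of the conversion are simple string replacements. The helper names below are hypothetical, and stripping a leading slash is an assumption about how names arrive.

```python
def to_lookup_path(name: str) -> str:
    """Colon form (test:nested) to the slash form used for filesystem lookup."""
    return name.lstrip("/").replace(":", "/")


def to_display_name(name: str) -> str:
    """Slash form (test/nested) to the canonical colon form used for display."""
    return name.lstrip("/").replace("/", ":")
```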

LLM Control Flow Warning

The start() response always begins with a FUNDAMENTAL PRINCIPLE message:

⚠️ FUNDAMENTAL PRINCIPLE: You called mcp__mekara__start — that means surrendering control to the mekara script runner entirely. The script runner owns ALL control flow. Every step, including manually-executed NL scripts, must advance through the script runner. You MUST NOT continue manually after any step — always call the appropriate continuation tool (finish_nl_script or continue_compiled_script).

and PendingNLScript responses include a CRITICAL notice:

⚠️ CRITICAL: Even though you execute this set of instructions manually, overall control flow is still managed by the script runner. You MUST call finish_nl_script when done with this script — do NOT continue with remaining parts of the parent script on your own. All script execution must go through the script runner.

Why this is necessary

Without these warnings, LLMs sometimes make the following mistake: they receive NL script instructions (e.g., from /merge-main as a nested step), follow the instructions manually, and then continue executing the parent script's remaining steps themselves — never calling finish_nl_script. This abandons the script runner mid-execution.

Error Handling

All failures halt the current (or parent) frame with an error PendingLlmStep via _halt_frame_with_error(). Three failure types use this path:

Auto step failure (non-zero exit code or error): halts the current frame with the step name, command, and error detail
Script load failure (ScriptLoadError): the parent frame is halted with the script name and load error
Nested script exception fallback: after finish_nl_script() completes the fallback, the parent frame is halted with the nested script name and exception

In all cases, the halted frame's current_step is replaced with the error Llm step, which becomes pending and is surfaced to the LLM to handle.
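The shared halt path can be sketched as a single helper that swaps the frame's current step for an error step. The Frame and Llm classes and the message wording are simplified assumptions, not the real types.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Llm:
    prompt: str  # hypothetical minimal LLM step


@dataclass
class Frame:
    script_name: str
    current_step: Any = None


def halt_frame_with_error(frame: Frame, detail: str) -> Llm:
    """Replace the frame's current step with an error step for the LLM to handle."""
    error_step = Llm(prompt=f"Script '{frame.script_name}' halted: {detail}")
    # The error step becomes the pending step because pending state is
    # derived from the frame's current_step.
    frame.current_step = error_step
    return error_step
```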

State Management

MekaraServer encapsulates all server state. The executor always exists and is created during server initialization.

The executor holds ALL execution state directly as fields:

working_dir: Path - Current working directory
stack: list[ExecutionFrame] - Execution stack (compiled and NL frames)
recently_executed_steps: list[ExecutedStep] - Steps executed since last LLM invocation

The pending property is computed from the top frame—it's not stored separately. This eliminates state synchronization bugs by deriving pending state from the stack instead of tracking it redundantly.
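Deriving pending state from the top frame can be sketched with a read-only property. The frame classes and the string return values here are simplified assumptions; the real property returns pending-step objects.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class CompiledFrame:
    llm_prompt: Optional[str] = None  # set while an llm step awaits completion


@dataclass
class NLFrame:
    content: str


@dataclass
class Executor:
    stack: list[Union[CompiledFrame, NLFrame]] = field(default_factory=list)

    @property
    def pending(self) -> Optional[str]:
        # Nothing is stored: the answer is recomputed from the stack each time.
        if not self.stack:
            return None
        top = self.stack[-1]
        if isinstance(top, NLFrame):
            return f"nl_script: {top.content}"
        if top.llm_prompt is not None:
            return f"llm_step: {top.llm_prompt}"
        return None
```

Popping a frame changes what pending returns without any extra bookkeeping, which is why there is no separate field to keep in sync.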

The server should never track execution state separately—it always delegates to self.executor for state queries.

The server instance is created inside run_server() along with the FastMCP registration.

Start Tool Semantics

start() is intentionally always callable, even when a script is already running. This enables manual nested invocation: when a parent script's LLM step instructs Claude to call another script, Claude uses start(), which pushes the new script (compiled or NL) onto the existing stack.

When start() is called while a script is already running:

Parent script hits an llm step and waits
The llm step's instructions tell Claude to call /child-script
Claude calls start("child-script")
The child script is pushed onto the parent's execution stack
Child script runs to completion (or its own llm step)
When child completes, it pops and the parent's llm step becomes pending again (Claude must then call continue_compiled_script() with an outputs dict to advance the parent)

When a nested script completes, the executor checks the parent frame's current_step to determine whether the invocation was automatic or manual. Automatic invocation (via CallScript) results in automatic advancement of the parent's execution frame; manual invocation (via Llm or NL script step) requires manual advancement.
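The automatic-versus-manual decision can be sketched as a check on the parent's current step. The step classes and the (index, still_pending) return shape are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallScript:
    target: str


@dataclass
class Llm:
    prompt: str


def on_child_complete(parent_step: object, step_index: int) -> tuple[int, bool]:
    """Return (new_step_index, parent_still_pending) after a nested script ends."""
    if isinstance(parent_step, CallScript):
        # Automatic invocation: advance the parent past the call_script step.
        return step_index + 1, False
    # Manual invocation: the parent's step stays pending until the LLM
    # advances it explicitly.
    return step_index, True
```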

The server is VCR-agnostic: it receives an AutoExecutorProtocol at construction time and doesn't know whether it's an AutoExecutor or a VcrAutoExecutor. For VCR testing, VcrMekaraServer wraps MekaraServer and records/replays MCP tool inputs and outputs at the boundary.

run_server() checks for the MEKARA_VCR_CASSETTE environment variable. When set, it creates a VcrMekaraServer in record mode instead of a plain MekaraServer. This enables recording cassettes when running Claude Code with the MCP server (e.g., MEKARA_VCR_CASSETTE=path.yaml claude).
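The environment check reduces to a lookup like the one below. The function only returns the class name as a string, and treating an empty value the same as an unset variable is an assumption.

```python
import os
from typing import Mapping


def choose_server_kind(env: Mapping[str, str]) -> str:
    """Return which server class would be constructed for this environment."""
    # Assumption: an unset or empty variable means plain operation.
    if env.get("MEKARA_VCR_CASSETTE"):
        return "VcrMekaraServer"
    return "MekaraServer"


kind = choose_server_kind(os.environ)
```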

Limitations

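One execution stack at a time: Each MCP server process maintains a single execution stack. Calling start() while a script is running nests the new script on that stack (see Start Tool Semantics); scripts never run concurrently.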
State not persisted: If Claude Code restarts, the MCP server restarts and loses state. The user must re-run the script.
